Scientific Ramblings

Monday, October 19, 2015

Refining the FAIR principles

During the 2015 Biohackathon, Mark Wilkinson, a few others and I got together to discuss and refine the FAIR principles, as originally published on the Force11 website. Our goal was to clarify the principles so that they are orthogonal to one another and could be used to assess an implementation and determine the degree to which it conforms to them. Below is a first draft of these revised principles, subject to further elaboration.

FAIR Principles (proposed)

Preamble

One of the grand challenges of data-intensive science is to facilitate knowledge discovery by assisting humans and machines in their discovery of, access to, integration and analysis of, task-appropriate scientific data and their associated algorithms and workflows. Here, we describe FAIR: a minimal set of guiding principles to make data Findable, Accessible, Interoperable, and Re-usable.

To be Findable:

F1. (meta)data are assigned a globally unique and eternally persistent identifier

F2. data are described with rich metadata

F3. (meta)data are registered or indexed in a searchable resource

F4. metadata specify the data identifier

To be Accessible:

A1. (meta)data are retrievable by their identifier using a standardized communications protocol

A1.1. the protocol is open, free, and universally implementable

A1.2. the protocol allows for an authentication and authorization procedure, where necessary

A2. metadata are eternally accessible, even when the data are no longer available

To be Interoperable:

I1. (meta)data use a formal, accessible, shared, and broadly applicable language for knowledge representation

I2. (meta)data use vocabularies that follow FAIR principles

I3. (meta)data include qualified references to other (meta)data

To be Re-usable:

R1. (meta)data have a plurality of accurate and relevant attributes

R1.1. (meta)data are released with a clear and accessible data usage license

R1.2. (meta)data are associated with their provenance

R1.3. (meta)data meet domain-relevant community standards

Update: We have updated the FAIR principles on the Force11 page to the ones you see here.
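Since one aim was principles that can be used to assess an implementation, here is a minimal sketch of how such an assessment might be scored, assuming each principle is reduced to a yes/no check over a simple metadata record. The `MetadataRecord` fields, the set of "open" protocols and the choice of which principles to check are hypothetical simplifications for illustration, not an official FAIR metric; several principles (e.g. global uniqueness and eternal persistence in F1) cannot be verified from a record alone.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class MetadataRecord:
    # Hypothetical fields, each mapped to the principle it illustrates.
    identifier: Optional[str] = None                      # F1
    data_identifier: Optional[str] = None                 # F4
    indexed_in: List[str] = field(default_factory=list)   # F3
    protocol: Optional[str] = None                        # A1 / A1.1
    license: Optional[str] = None                         # R1.1
    provenance: Optional[str] = None                      # R1.2


# Assumption: treat these as open, free and universally implementable (A1.1).
OPEN_PROTOCOLS = {"http", "https", "ftp"}

CHECKS: Dict[str, Callable[[MetadataRecord], bool]] = {
    # Only checks that an identifier is present; uniqueness and
    # persistence cannot be verified from the record itself.
    "F1": lambda r: bool(r.identifier),
    "F3": lambda r: len(r.indexed_in) > 0,
    "F4": lambda r: bool(r.data_identifier),
    "A1.1": lambda r: r.protocol in OPEN_PROTOCOLS,
    "R1.1": lambda r: bool(r.license),
    "R1.2": lambda r: bool(r.provenance),
}


def assess(record: MetadataRecord) -> Dict[str, bool]:
    """Run every check and report which principles the record satisfies."""
    return {principle: check(record) for principle, check in CHECKS.items()}


def conformance(record: MetadataRecord) -> float:
    """Fraction of the checked principles that the record satisfies."""
    results = assess(record)
    return sum(results.values()) / len(results)


# Hypothetical example: findable and accessible, but no license or provenance.
example = MetadataRecord(
    identifier="http://example.org/metadata/1",
    data_identifier="http://example.org/data/1",
    indexed_in=["example-registry"],
    protocol="https",
)

if __name__ == "__main__":
    print(assess(example))
    print(round(conformance(example), 2))
```

Keeping each principle as an independent check mirrors the goal of orthogonality: a record can fail R1.1 without that affecting its score on F1 or A1.1, and the degree of conformance falls out as a simple fraction.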

Thursday, March 21, 2013

Evaluation of ontology-powered scientific research as a means to assess and improve ontology quality

I gave a talk today on the ontology forum. The bottom line is that the structure, content and use of an ontology affect scientific research, and that improvements in ontology quality can be task-driven.

Evaluation of ontology-powered scientific research as a means to assess and improve ontology quality from Michel Dumontier

Friday, October 14, 2011

Google Maps: break in the road bug!

Dear Google Maps,
I'm trying to plan a trip [1] in and around the Atacama Desert while I'm here in Chile, but I have found an interesting bug. Basically, there is a break in the road (between A and B) that prevents Maps from routing along that road. To reproduce, see http://g.co/maps/zwq8y

There's also a break on the road directly east of that break point.

I'd report the problem to you if only there were a "Report a Problem" link anywhere on this page, despite your claims [2] to the contrary.

Best,

m.

[1] http://g.co/maps/3zwqt
[2] http://maps.google.com/support/bin/answer.py?hl=en&answer=162873&topic=1687362

Sunday, September 25, 2011

Provenance: what is it and how should we formalize it?

As a testament to the growing recognition of the importance of provenance for (e-)science, I'm glad to see that the W3C incubator group worked hard on the issues and made it possible to establish a W3C provenance interchange Working Group.

A good starting point:

"provenance is often represented as metadata, but not all metadata is necessarily provenance"

http://www.w3.org/2005/Incubator/prov/XGR-prov-20101214/#Provenance_and_Metadata

but

"Descriptive metadata of a resource only becomes part of its provenance when one also specifies its relationship to deriving the resource."

does not adequately describe how to identify the conditions under which that happens.

and

"Provenance of a resource is a record that describes entities and processes involved in producing and delivering or otherwise influencing that resource"

contains elements that are undefined (record), uncertain (are processes not also entities?), narrow (producing/delivering) and broad (influencing).

Of course, I appreciate the difficulty of crafting a good definition, and I understand that useful work can be done with this one. Still, I will take the opportunity to express my thoughts on the matter.

I think there are two key aspects to provenance (similar to what is suggested here: http://www.springerlink.com/content/edf0k68ccw3a22hu/):

1. How did the resource come about? (relates to creation and justification)

- important for reproducibility (which is an element of science)

- includes attribution (who created the resource), creation (process that generated the resource), reproduction (process in which a copy was made), derivation (process in which the resource was generated from some resource or portion of a resource), versioning (process of keeping count of sequential derivations)

2. What is the history of the resource (from the point of creation)?

- important for authenticity

- includes origin, possession and the acts of transfer

Both have implications for trust, and can be used for accountability, among other things.

I find this part on recommendations of a provenance framework quite nice:

http://www.w3.org/2005/Incubator/prov/XGR-prov-20101214/#A_Roadmap_for_Provenance_on_the_Web

but I get less excited when I see the collection of "provenance concepts"

http://www.w3.org/2005/Incubator/prov/XGR-prov-20101214/#Recommendations (section 4)

particularly because we need to simplify the discourse so that we consider

an event (for 1 above)

- participants (and their roles; e.g. agents, targets, products)

- locations

- time instants (e.g. action timestamps) and durations (processual attributes)

and a sequence of events (for both 1 and 2 above)

This would certainly help to produce a specification with a minimal set of classes and relations for expressing this kind of information.
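To make the proposal a little more concrete, here is a minimal sketch of such an event-centred model using Python dataclasses: an event has role-qualified participants, a location, a time instant and an optional duration, and a resource's provenance is a time-ordered sequence of events. The class names, the roles ("agent", "product", "target", "recipient") and the example data are hypothetical choices for illustration; this is not the W3C incubator group's model or any published vocabulary.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass(frozen=True)
class Participant:
    entity: str  # identifier of a person, dataset, instrument, ...
    role: str    # hypothetical roles: "agent", "product", "target", "recipient"


@dataclass
class Event:
    kind: str                              # e.g. "creation", "derivation", "transfer"
    participants: List[Participant]
    location: Optional[str]
    start: datetime                        # time instant at which the event began
    duration: Optional[timedelta] = None   # processual attribute, if known

    def with_role(self, role: str) -> List[str]:
        """Entities that take part in this event in the given role."""
        return [p.entity for p in self.participants if p.role == role]


@dataclass
class ProvenanceTrace:
    resource: str
    events: List[Event] = field(default_factory=list)

    def add(self, event: Event) -> None:
        """Record an event, keeping the sequence in chronological order."""
        self.events.append(event)
        self.events.sort(key=lambda e: e.start)

    def creation(self) -> Optional[Event]:
        """Aspect 1: the first event in which the resource appears as a product."""
        for e in self.events:
            if self.resource in e.with_role("product"):
                return e
        return None

    def custody_chain(self) -> List[str]:
        """Aspect 2: the creating agents, then each recipient of a transfer."""
        chain: List[str] = []
        created = self.creation()
        if created is not None:
            chain.extend(created.with_role("agent"))
        for e in self.events:
            if e.kind == "transfer" and self.resource in e.with_role("target"):
                chain.extend(e.with_role("recipient"))
        return chain


# Hypothetical example; events are added out of order on purpose.
trace = ProvenanceTrace("dataset-v1")
trace.add(Event("transfer",
                [Participant("dataset-v1", "target"), Participant("bob", "recipient")],
                "lab-B", datetime(2011, 9, 2)))
trace.add(Event("creation",
                [Participant("alice", "agent"), Participant("dataset-v1", "product")],
                "lab-A", datetime(2011, 9, 1), timedelta(hours=3)))
```

Both aspects fall out of the same two classes: creation is just the event in which the resource is a product, and the history of possession is read off the ordered sequence of transfer events, so no separate vocabulary of "provenance concepts" is needed for this simple case.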

Now, I'm writing this late at night, and I appreciate that I may not have considered all the issues that the provenance group (along with others who have written about the subject) has, but perhaps there are still some good discussions to be had with respect to provenance and how we formally represent it, as it is of strategic importance to the HCLSIG in our current and future efforts.

Thursday, September 22, 2011

A letter to gmail: attachments

Dear Gmail team,

First, thank you for making it possible for me to see my unread mail; I sent you this idea some time ago, and I'm glad that you listened.

Second, I'm now somehow at 50% of my allocated capacity, and what I need is a way not only to filter my mail by attachment (which I can do!), but also to sort it from largest to smallest attachment (which I can't do). Once I can sort attachments by size, I can start deleting the big ones and free up more room! YAY!

m.