Monday, October 19, 2015

Refining the FAIR principles

During the 2015 Biohackathon, Mark Wilkinson, a few others, and I got together to discuss and refine the FAIR principles, as originally published on the Force11 website. Our goal was to clarify the principles so that they are naturally orthogonal and can be used to assess an implementation and determine the degree to which it conforms to them. Below is a first draft of the revised principles, subject to further elaboration.

FAIR Principles (proposed)
One of the grand challenges of data-intensive science is to facilitate knowledge discovery by assisting humans and machines in their discovery of, access to, and integration and analysis of task-appropriate scientific data and their associated algorithms and workflows. Here, we describe FAIR - a minimal set of guiding principles to make data Findable, Accessible, Interoperable, and Re-usable.

To be Findable:
F1. (meta)data are assigned a globally unique and eternally persistent identifier
F2. data are described with rich metadata
F3. (meta)data are registered or indexed in a searchable resource
F4. metadata specify the data identifier
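
As a rough illustration of F1, F2 and F4, a metadata record could be sketched as a plain Python dictionary with a small check over it. The field names, the example.org identifiers and the findability_gaps helper are all assumptions made up for this sketch; the principles do not prescribe a schema, and F3 cannot be checked from the record alone.

```python
# Hypothetical metadata record; field names and identifiers are illustrative only.
metadata_record = {
    "identifier": "https://example.org/metadata/123",   # F1: the metadata's own identifier
    "data_identifier": "https://example.org/data/123",  # F4: identifier of the described data
    "title": "Example gene expression dataset",         # F2: descriptive metadata
    "keywords": ["gene expression", "human"],
}


def findability_gaps(record):
    """Return the F-principles this record visibly fails (F3 is not checkable locally)."""
    gaps = []
    if not record.get("identifier"):
        gaps.append("F1")
    # Crude stand-in for "rich": at least one field besides the two identifiers.
    descriptive = [k for k in record if k not in ("identifier", "data_identifier")]
    if not descriptive:
        gaps.append("F2")
    if not record.get("data_identifier"):
        gaps.append("F4")
    return gaps


print(findability_gaps(metadata_record))  # prints []
```

A record with only a title, such as {"title": "x"}, would be reported as missing F1 and F4.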

To be Accessible:
A1. (meta)data are retrievable by their identifier using a standardized communications protocol
A1.1. the protocol is open, free, and universally implementable
A1.2. the protocol allows for an authentication and authorization procedure, where necessary
A2. metadata are eternally accessible, even when the data are no longer available
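
One way to picture A2 is a resolver that still answers for the metadata after the data has been withdrawn. The sketch below uses in-memory dictionaries in place of a real repository; the store names, the identifiers and the resolve function are assumptions for illustration, and a real system would expose this over a standardized protocol such as HTTP (A1).

```python
# Hypothetical in-memory stores standing in for a repository (illustrative names only).
METADATA = {
    "https://example.org/metadata/123": {
        "data_identifier": "https://example.org/data/123",
        "title": "Example dataset",
    },
}
DATA = {}  # the data itself has been withdrawn


def resolve(identifier):
    """Look up an identifier and report what is still retrievable."""
    if identifier in DATA:
        return {"status": "available", "content": DATA[identifier]}
    if identifier in METADATA:
        return {"status": "available", "content": METADATA[identifier]}
    # The data is gone, but a metadata record may still describe it (A2).
    for meta_id, record in METADATA.items():
        if record["data_identifier"] == identifier:
            return {"status": "data no longer available", "described_by": meta_id}
    return {"status": "unknown identifier"}
```

Resolving the withdrawn data identifier here points the caller at the surviving metadata record instead of failing outright.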

To be Interoperable:
I1. (meta)data use a formal, accessible, shared, and broadly applicable language for knowledge representation
I2. (meta)data use vocabularies that follow FAIR principles
I3. (meta)data include qualified references to other (meta)data

To be Re-usable:
R1. (meta)data have a plurality of accurate and relevant attributes
R1.1. (meta)data are released with a clear and accessible data usage license
R1.2. (meta)data are associated with their provenance
R1.3. (meta)data meet domain-relevant community standards

Update: We have updated the FAIR principles on the Force11 page to the ones you see here.


The Cycling Bard said...

Does a web page with some RDFa or microdata markup count as a searchable and indexed resource? Or must it be a data-specific repository (Zenodo?)?

Michel Dumontier said...

The main idea is that other people can i) discover the dataset without you telling them where it is located and ii) use filters on the metadata fields to precisely find the dataset of interest. So, as long as there is a search engine that has appropriately indexed the web page to enable such functionality, this would be fine.
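
The kind of filtering meant here can be sketched with a toy index. The records, their field names (organism, assay) and the search helper are hypothetical; a real search engine or registry would expose its own query interface.

```python
# Hypothetical index of metadata records, as a search engine or registry might hold.
index = [
    {"identifier": "https://example.org/metadata/1", "organism": "human", "assay": "RNA-seq"},
    {"identifier": "https://example.org/metadata/2", "organism": "mouse", "assay": "RNA-seq"},
    {"identifier": "https://example.org/metadata/3", "organism": "human", "assay": "microarray"},
]


def search(records, **filters):
    """Return the records whose metadata fields match every given filter exactly."""
    return [r for r in records if all(r.get(k) == v for k, v in filters.items())]


hits = search(index, organism="human", assay="RNA-seq")
print([r["identifier"] for r in hits])  # prints ['https://example.org/metadata/1']
```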

Alasdair J G Gray said...

A couple of questions:

1) Can anything ever be eternally persistent?

2) What is a data identifier?

Michel Dumontier said...

1) can anything be eternally persistent?
The idea is to think hard about the identifier system put in place; the caveat is "to the extent possible".

2) what is a data identifier
The key here is that the metadata and the data each have their own identifier, so that they may be referred to unambiguously (e.g. so that the data or the metadata can be retrieved).

Peter Wittenburg said...

Hallo Michel,

As promised, here is a comment on your principles from someone who is deeply involved in RDA and thus interacting with various communities.
- I am very much in line with your principles, and it would be good if we started referring to each other. Please have a look at the RDA DFT model, which supports your main statements.
You can find the core document here:
You can find all documents of the DFT group (a lot of reading) under RDA DFT.
- This model simply states the basics, and we found that it would be great if it were implemented in the relevant software. As far as I can see, it states the same as you do, in other words.
- With respect to accessibility, we are now starting a group that specifies a protocol, and we should be in close interaction with you.
- With respect to interoperability and re-usability, other RDA groups are active, and I will pass along your web page. In particular, we are all starting to request that syntax be registered and semantics be defined, both in open registries.

In December we will have a chairs meeting in Washington DC, and it would be great if you could join so that we can sit together. It will be on 8/9 December at NIST in Gaithersburg. If you are interested, let me know.


Michel Dumontier said...

Hi Peter,
That's great! Thanks for the pointers - those definitions will certainly be useful.
I'm currently in Dagstuhl discussing the Data Documentation Initiative (DDI) and reviewing their principles to find shared features. Feel free to send me more details about the meeting in December.

Michel Dumontier said...

Thanks for your comments. We've now updated the FAIR principles on the Force11 website. We welcome comments there as well.

Peter Wittenburg said...

Hallo Michel,

Great - do we all agree that we need to start referring to each other?
This is so important, since we can show that from different points of view, from different domains, etc., we come to almost the same conclusions.

I will have a look at your statements and will add a reference to them on the RDA DFT site.


Michel Dumontier said...

Hi Peter,
I've added a link to the RDA DFT in the notes. We'll work this up for the narrative and in our paper.



Núria Queralt Rosinach said...

Hi Michel,

I would like to share some comments/suggestions:


F5. metadata specify the repository/registry where the (meta)data are stored/registered
F6. (meta)data are associated with their (meta)dataset identifier and/or vice versa

A3. publication of (meta)data accompanied by access APIs

I4. (meta)data are created, processed, and analysed with FAIR workflows and software, i.e. provide FAIR support for creating/annotating digital objects.

R1.4. data specify the metadata identifier

For the Findable and Re-usable principles, I would specify that (meta)data have machine- and human-readable descriptions.


Michel Dumontier said...

Hi Nuria,
Thanks for your comments!

F5 -> I think this goes the other way; repositories index the metadata so that the dataset is searchable.

F6 -> the point here is that metadata needs to contain the data identifier so that the data can be retrieved.

A3 -> again, the burden of linking is on the resources that use it, rather than on the resource itself. It is the APIs that need to indicate what data/metadata they are APIs for.

I4 -> FAIR in, FAIR out :)

R1.4 -> there can be many metadata records for the same data, and because many data formats have no capacity to include such information, the relationship must be from metadata to data.
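
To make the direction of that link concrete, here is a minimal sketch in which several metadata records point at the same data, and the reverse view (data to metadata) is derived from them rather than stored in the data file. The record layout, the identifiers and the metadata_by_data helper are assumptions for illustration.

```python
from collections import defaultdict

# Hypothetical metadata records; two of them describe the same data.
metadata_records = [
    {"identifier": "https://example.org/metadata/a", "data_identifier": "https://example.org/data/1"},
    {"identifier": "https://example.org/metadata/b", "data_identifier": "https://example.org/data/1"},
    {"identifier": "https://example.org/metadata/c", "data_identifier": "https://example.org/data/2"},
]


def metadata_by_data(records):
    """Derive the data -> metadata view from the metadata -> data links."""
    view = defaultdict(list)
    for record in records:
        view[record["data_identifier"]].append(record["identifier"])
    return dict(view)


view = metadata_by_data(metadata_records)
print(view["https://example.org/data/1"])
```

The data files themselves stay untouched; only an index over the metadata needs to know which records describe which data.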