Scientific Ramblings: 2009

Wednesday, July 22, 2009

The trouble with ontology is ... the lack of "shared understanding"

This kind of analysis goes to the heart of one of the problems I have with monolithic enterprises that trivially assume agreement will come from some shared understanding of "reality". Nothing could be further from reality:

http://www.nytimes.com/2009/07/21/science/21angier.html?_r=1&ref=science

Monday, June 8, 2009

Setting out clear Ontology Design Principles

I've documented them here:

http://code.google.com/p/semanticscience/wiki/ODP

Friday, May 22, 2009

Critique of OBO Foundry Principles

The OBO Foundry aims to create a suite of orthogonal interoperable reference ontologies in the biomedical domain. They have outlined their principles here:

http://www.obofoundry.org/wiki/index.php/OBO_Foundry_Principles

In reading some of these, I found that they poorly express true principles of ontology design. I provide here a brief critique of some of the contentious points:

"3. The ontologies possesses a unique identifier space within the OBO Foundry. The identifier uniquely and persistently identifies a definition, which itself unambiguous identifies some type of biological entity. The identifier is for the definition: it is NOT the name and it is NOT an identifier for the name.

There are systems that use alphanumeric id's - eg MetaCyc. This should be dis-encouraged, especially as these have semantic content."

This mixes up a number of issues. An identifier is a symbol for an entity and should be guaranteed unique in its lexical space, unlike human-readable names, which are not required to be unique. It therefore doesn’t matter whether the identifier is numeric, alphanumeric or alphabetic, and the latter part of this principle, referring to alphanumeric MetaCyc ids, is pure nonsense. What *matters* is the description of the entity, and whether that textual description stays stable. What does OBOF say about when a description changes by even one word? Should a new identifier be crafted? How does one assess whether the previous identifier is in fact compatible with the new one? Should one be directed to use the new identifier, or is it possible that the semantics are *fundamentally* different? These are far more important questions to address.
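To make the point concrete, here is a minimal Python sketch. Everything in it is hypothetical: the `TermRegistry` class, the identifiers and the definitions are made up for illustration and are not part of OBO or MetaCyc. It only shows that uniqueness holds for any identifier format, and that the real decision arises when a definition changes.

```python
class TermRegistry:
    """Hypothetical registry mapping opaque identifiers to textual definitions."""

    def __init__(self):
        self._definitions = {}

    def register(self, identifier, definition):
        # Uniqueness is enforced on the identifier string itself,
        # regardless of whether it is numeric, alphanumeric or alphabetic.
        if identifier in self._definitions:
            raise ValueError(f"identifier already in use: {identifier}")
        self._definitions[identifier] = definition

    def definition_changed(self, identifier, proposed_definition):
        # A changed definition is only detected here; whether it needs a new
        # identifier is a policy question that the code cannot answer.
        return self._definitions[identifier] != proposed_definition


registry = TermRegistry()
registry.register("0000001", "A made-up definition of some entity.")      # numeric
registry.register("EX-ALPHA-1", "A made-up definition of another one.")   # alphanumeric
registry.register("example", "A made-up definition of a third entity.")   # alphabetic

changed = registry.definition_changed(
    "0000001", "A made-up definition of some biological entity."
)
```

All three identifier styles behave identically as keys. The open question is what to do when `changed` is true.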

"6. The ontology must be orthogonal to other ontologies already lodged within OBO. For each domain, there should be convergence upon a single reference ontology that is recommended for use by those who wish to become involved with the Foundry initiative"

This is a contestable claim. Given that there is no universal agreement on many biological terms, any given ontology will not necessarily capture the semantics of what one wants to express. Anyone familiar with the word "gene" can easily demonstrate this as a case in point.

"7. The ontologies include textual definitions for all terms."

Textual descriptions aren't really useful unless they succinctly capture the essence of the entity in question. For instance, definitions in the OWL version of BFO are incomprehensible to many people (certainly to my undergrad students). In many other cases the textual descriptions can be shown to be either overly vague or constraining in unrealistic ways.

Einstein is often quoted as saying "Make everything as simple as possible, but not simpler" - a good mantra in crafting term descriptions is "Be as accurate as possible, while not adding superfluous information or imposing unnecessary constraints."

"9. The ontology is well documented."

Be more specific - What does "well documented" mean?

"10. The ontology has a plurality of independent users."

This is another unreasonable demand. The defining characteristic is that for every ontology, there exist requirements (possibly in the form of use cases) that the ontology can be demonstrated to satisfy. Ultimately, an ontology should have demonstrated utility. Paraphrasing the film Field of Dreams - If you build it (and have shown it to be useful), they will come.

"11. The ontology will be developed collaboratively with other OBO Foundry members."

A long-standing myth is that ontologies need to be developed collaboratively - but we have found that such an approach is in fact wholly unproductive. What is productive is collecting use cases, undertaking focused development, and conducting a peer review and refinement process in which the needs of the community can be publicly solicited and addressed. This kind of procedure is in place at the W3C, and results in high quality standards. The OBO Foundry should consider setting up such a facility, with open calls for review across all relevant mailing lists, including quality assessment, additions/removals etc - particularly before things get published as a so-called "standard".

Monday, April 6, 2009

ChemAxon - Chemical Ontology

Posted at: http://wwmm.ch.cam.ac.uk/blogs/adams/?p=195

Making the distinction between a substance and a molecule is indeed important and valuable from an ontological perspective, particularly when it comes to reasoning about the domain. The distinction is often blurred simply because it is more pragmatic not to consider them different. Indeed, most chemists might agree that while there is a conceptual distinction, they don’t want to navigate through a set of high-brow concepts to find such simple (indirect) relationships. Have you considered the impact of representing knowledge in this way with respect to usability?

Another important issue is that of identity - you are making the argument that every different feature effectively warrants a different identifier. That is, if I want to make a statement about glucose in one form versus another, I need two different identifiers. This will lead to enormous numbers of different “concepts”, which may affect reasoning capability (especially in OWL!) and potentially also lead to sparsely populated knowledge bases. An alternative is to capture the semantics of non-structural features in relations to the main component. For instance, I could reuse the “Glucose” class by adding additional restrictions, in the context of some process or experimental result, e.g. in my experiment I found glucose to be in its chair form. Indeed, expressing the behaviour (or structural conformation in this case) with respect to the context lends itself to modular reuse and also improves our representation of knowledge.
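As a rough illustration of this alternative, here is a minimal Python sketch. It is not OWL and not taken from either ontology. The class names, the `ex:Glucose` identifier and the experiment labels are all assumptions made for the example. The idea is that a single Glucose class is reused, and the conformation is attached to a contextual observation rather than minted as a new concept.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChemicalClass:
    """Hypothetical stand-in for an ontology class such as Glucose."""
    identifier: str
    label: str


@dataclass(frozen=True)
class Observation:
    """A statement about a class that holds only within a given context."""
    subject: ChemicalClass
    context: str      # e.g. an experiment or process (made-up labels below)
    features: tuple   # (relation, value) pairs acting as extra restrictions


def observe(subject, context, **features):
    # Attach non-structural features to the context, not to a new class.
    return Observation(subject, context, tuple(sorted(features.items())))


GLUCOSE = ChemicalClass("ex:Glucose", "glucose")

obs_chair = observe(GLUCOSE, "experiment-1", conformation="chair")
obs_boat = observe(GLUCOSE, "experiment-2", conformation="boat")

# Both statements point at the same class identifier.
identifiers_used = {obs.subject.identifier for obs in (obs_chair, obs_boat)}
```

Two conformations are recorded, yet only one identifier is in use; the per-feature approach would have needed two.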

A couple of pointers that you might find interesting:

1 - Contextual knowledge representation - http://dumontierlab.com/pdf/2008_OWLEDEU_MR.pdf

2 - Biochemical Identifiers - http://www.slideshare.net/micheldumontier/accurate-biochemical-knowledge-starting-with-precise-structurebased-criteria-for-molecular-identity

We’ve also worked on chemical ontologies in OWL - you can see them here:

http://dumontierlab.com/index.php?page=ontologies

It would be nice to merge the two and work towards a comprehensive OWL-based ontology for the chemistry domain. Let me know if that would interest you.