Wednesday, July 13, 2011
Sabbatical Interview
http://cualumni.carleton.ca/magazine/summer-2011/parting-shots/
Monday, July 4, 2011
Breaking Down the Sabbatical
July 3-14: Cambridge, UK (EBI - Dietrich Rebholz-Schuhmann, University of Cambridge - Robert Hoehndorf)
July 14-18: Vienna, Austria (Bio-ontologies, ISMB Tutorial)
July 18-August 7: Malta+Italy (Valletta, Catania, Salerno, Pompeii, Rome, Florence, Venice, Bologna, Pisa)
August 7-10: Finland (Tampere, Helsinki)
August 10-18: Iceland
August 18-29: Kyoto + Tokyo, Japan (Biohackathon 2011)
August 29-September 3: Madrid (Ontology Engineering Group : Alexander De Leon)
September 3-7: Heidelberg, Germany (COMBINE)
September 7-12: Ottawa (defense: Leonid Chepelev -> success!)
September 13-17: Nancy, France (INRIA/LORIA: Adrien Coulet)
September 19: Volendam (OpenPHACTS/Gen2Phen meeting on open data)
September 17-October 1: Paris, St Malo, Bordeaux, London (travel with parents)
October 1-11: Ottawa, Canada (defense: Natalia Villanueva-Rosales -> success!)
October 11-December 1: Concepcion, Chile (Universidad de Concepcion: Leo Ferres)
December 1-December 23: Santiago, San Pedro de Atacama, Mendoza, Buenos Aires, Colonia, Punta del Diablo and Montevideo
December 24-January 31: Toronto, Ottawa
February - March: Stanford University (Nigam Shah, Mark Musen)
April - June : India, Nepal, Bhutan, Thailand, Singapore ?
Sabbatical 2011-2012: Formalizing Scientific Discourse
Background: Advancing knowledge in the life sciences involves experimentally testing hypotheses and interpreting the results based on prior scientific work. In generating a valid hypothesis, biologists face the overwhelming challenge of collecting, evaluating and integrating large and increasing amounts of different kinds of information about organisms, cells, genes and proteins from thousands of articles, hundreds of databases and dozens of tools. A biologist’s ability to efficiently construct and evaluate a hypothesis over current knowledge requires that i) knowledge, data and hypotheses are formally represented so they may be reasoned about, ii) adequate software exists to manage and query formal knowledge, and iii) data can be obtained by searching relevant databases and invoking the right analytical tools. The inability to efficiently discover relevant information can negatively impact scientific research directions and proposed activities. Methods for facilitating the construction and evaluation of hypotheses against the current state of knowledge could translate into greater scientific insight and increased productivity. Innovative approaches for knowledge discovery could be applied to data on the emerging Semantic Web and be transformative on a global scale.
Proposed Activities
1. Text to triples: The purpose of this leg of the sabbatical is to gain an understanding of the current state of the art in natural language processing and to develop skills in producing high-quality triples by parsing scientific text.
Time Frame: July 2011-September 2011
Location: European Bioinformatics Institute, Hinxton, UK.
Host: Dr. Rebholz-Schuhmann
2. Formalizing equations: The purpose of this leg of the sabbatical is to investigate the ontology of equations, represent scientific equations using Semantic Web technologies (principally the Rule Interchange Format), and implement semantic web services that serve to compute over formalized scientific equations.
Time Frame: October 2011-December 2011
Location: Universidad de Concepcion, Concepcion, Chile.
Host: Dr. Leo Ferres
3. Formalizing Research Hypotheses: The purpose of this leg of the sabbatical is to explore the formalization of hypotheses concerning disease. Specifically, I will extract meaningful facts from AlzForum and integrate these with resources from the National Center for Biomedical Ontology (NCBO) and Bio2RDF, our large-scale Semantic Web project.
Time Frame: January 2012-March 2012
Location: Stanford University, Palo Alto, California, USA.
Host: Dr. Mark Musen
4. Integrated Framework for Knowledge Discovery: The last leg of the sabbatical will focus on maximizing interoperability between text, equations, ontologies and database-derived facts. I will use SADI, our Semantic Web services framework, to achieve this objective.
Time Frame: April 2012-June 2012
Location: India, Thailand, Singapore
Scientific Value and Broader Beneficial Impacts
The development and application of efficient strategies for knowledge discovery is a major goal in bioinformatics. My research into new strategies for the representation and evaluation of scientific hypotheses using ontologies, scientific text, data and bioinformatic services will create a novel platform that will significantly contribute to scientific productivity and ultimately improve our understanding of biology. The proposed sabbatical will provide me with new skills that will be used to train a future generation of young scientists in the areas of formal knowledge representation, text mining and the Semantic Web. Ultimately, it is expected that the sabbatical will cultivate new partnerships with leading scientists and open new doors to work with industry and government agencies.
Friday, June 4, 2010
SADI
(modified from an email that Mark Wilkinson sent)
SADI is a very lightweight "standard" (set of best-practices, really) for modeling and providing Web Services. It uses standards from the W3C Semantic Web initiative - in particular, it uses OWL for types, and RDF for instance data.
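To make the split between types and instance data concrete, here is a minimal sketch in plain Python. It only imitates the shape of RDF triples (subject, predicate, object); the `http://example.org/` namespace, the resource names and the `types_of` helper are made up for illustration and are not part of SADI. A real deployment would use an RDF library and OWL class definitions.

```python
# The standard RDF predicate that links an instance to its type.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Hypothetical namespace, used only for this illustration.
EX = "http://example.org/"

# Instance data as (subject, predicate, object) triples.
triples = {
    (EX + "geneX", RDF_TYPE, EX + "Gene"),
    (EX + "transcriptX1", RDF_TYPE, EX + "Transcript"),
    (EX + "geneX", EX + "transcribedInto", EX + "transcriptX1"),
}


def types_of(resource, graph):
    """Return the set of types that the graph asserts for a resource."""
    return {o for s, p, o in graph if s == resource and p == RDF_TYPE}
```

Because every resource carries its type in the data itself, a client can ask what a thing is (here, `types_of(EX + "geneX", triples)`) without any service-specific code.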
This is critical advantage #1 for SADI over traditional Web Services frameworks - in traditional XML-based Web Services, you still must code your client software to access each service, since the machine cannot interpret the service interfaces. In SADI, we can design ONE piece of software to access all resources exposed as SADI services - "one ring to rule them all!" (And we already have several different "rings" that expose SADI data in different ways.)
Critical advantage #2 is a bit more obscure and hard to describe, but is likely to be the more important in the long-run. In SADI, data is "grounded" in explicit semantics. This means that all data in SADI carries with it information about what TYPE of data it is, and how that data relates to other data (e.g. genes transcribed into transcripts translated into proteins which regulate genes: Gene, Transcript, Protein are all data types, and "transcribed", "translated", "regulate" are relationships between them). With this explicit (and extensive!) grounding in semantics, we can start asking our machines to do a lot of the interpretation for us. For example, "what gene regulates gene X" is a nonsensical question biologically, but it's a question that biologists ask all the time! With a solid grounding in semantics, the machine would be able to follow the logical pathway above and say "well, to answer that question, I am going to have to go through transcripts and proteins to get there" and then automatically construct the pipeline of services that get to the answer. This is just one example of how Semantics can be used to facilitate question-answering.
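The gene-regulation example can be sketched as a toy path search. This is not how SADI itself is implemented (SADI relies on OWL descriptions and a service registry); it is just a minimal Python illustration, with a hypothetical `SERVICES` table and `find_pipeline` function, of how typed inputs and outputs let a machine chain services together.

```python
from collections import deque

# Hypothetical service descriptions: (input type, relationship, output type).
SERVICES = [
    ("Gene", "transcribed into", "Transcript"),
    ("Transcript", "translated into", "Protein"),
    ("Protein", "regulates", "Gene"),
]


def find_pipeline(source, target, services=SERVICES):
    """Breadth-first search for the shortest non-empty chain of services
    leading from the source type to the target type, or None if no chain exists."""
    queue = deque([(source, [])])
    visited = set()
    while queue:
        current, path = queue.popleft()
        for step in services:
            in_type, _, out_type = step
            if in_type != current:
                continue
            new_path = path + [step]
            if out_type == target:
                return new_path
            if out_type not in visited:
                visited.add(out_type)
                queue.append((out_type, new_path))
    return None
```

Asking for a pipeline from `"Gene"` to `"Gene"` yields the three-step chain through Transcript and Protein, which is the route the machine has to take to answer "what gene regulates gene X".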
There are several tutorials available.
For an overview of what it can do: http://www.slideshare.net/markmoby/sadi-swsip-09
Then go to http://sadiframework.org to find the more specific tutorials on how to deploy services.
The current list of available services is at http://sadiframework.org/registry/services/ and that list will be growing rapidly over the next year (we have committed to having at least 400 more services, but I suspect that we'll go far beyond that number!)