Background: Advancing knowledge in the life sciences involves experimentally testing hypotheses and interpreting the results in light of prior scientific work. To generate a valid hypothesis, biologists face the overwhelming challenge of collecting, evaluating and integrating large and growing amounts of heterogeneous information about organisms, cells, genes and proteins from thousands of articles, hundreds of databases and dozens of tools. For a biologist to efficiently construct and evaluate a hypothesis against current knowledge, i) knowledge, data and hypotheses must be formally represented so that they can be reasoned about, ii) adequate software must exist to manage and query formal knowledge, and iii) data must be obtainable by searching relevant databases and invoking the right analytical tools. The inability to efficiently discover relevant information can adversely affect the direction of scientific research and the activities proposed to pursue it. Methods that facilitate the construction and evaluation of hypotheses against the current state of knowledge could translate into greater scientific insight and increased productivity. Innovative approaches to knowledge discovery could be applied to data on the emerging Semantic Web and have a transformative impact on a global scale.
1. Text to triples: The purpose of this leg of the sabbatical is to gain an understanding of the current state of the art in natural language processing and to develop skills in producing high-quality triples by parsing scientific text.
Time Frame: July 2011-September 2011
Location: European Bioinformatics Institute, Hinxton, UK.
Host: Dr. Rebholz-Schuhmann
2. Formalizing equations: The purpose of this leg of the sabbatical is to investigate the ontology of equations, to represent scientific equations using Semantic Web technologies (principally the Rule Interchange Format), and to implement Semantic Web services that compute over the formalized equations.
Time Frame: October 2011-December 2011
Location: Universidad de Concepción, Concepción, Chile.
Host: Dr. Leo Ferres
3. Formalizing research hypotheses: The purpose of this leg of the sabbatical is to explore the formalization of hypotheses concerning disease. Specifically, I will extract meaningful facts from AlzForum and integrate them with resources from the National Center for Biomedical Ontology (NCBO) and Bio2RDF, our large-scale Semantic Web project.
Time Frame: January 2012-March 2012
Location: Stanford University, Stanford, California, USA.
Host: Dr. Mark Musen
4. Integrated framework for knowledge discovery: The last leg of the sabbatical will focus on maximizing interoperability between text, equations, ontologies and database-derived facts. I will use SADI, our Semantic Web services framework, to achieve this objective.
Time Frame: April 2012-June 2012
Location: India, Thailand, Singapore
Scientific Value and Broader Beneficial Impacts
The development and application of efficient strategies for knowledge discovery is a major goal in bioinformatics. My research into new strategies for representing and evaluating scientific hypotheses using ontologies, scientific text, data and bioinformatics services will create a novel platform that will contribute significantly to scientific productivity and, ultimately, improve our understanding of biology. The proposed sabbatical will provide me with new skills that I will use to train a new generation of young scientists in formal knowledge representation, text mining and the Semantic Web. Finally, the sabbatical is expected to cultivate new partnerships with leading scientists and to open new opportunities to work with industry and government agencies.