Scientific Ramblings

Wednesday, July 13, 2011

Sabbatical Interview

Earlier this year I was interviewed about my sabbatical plans, and we did a photo shoot as well. Have a look at the article, which includes my rabbits featured on the monitor screen :)

http://cualumni.carleton.ca/magazine/summer-2011/parting-shots/

Monday, July 4, 2011

Breaking Down the Sabbatical

It's hard to believe that it's already been 7 years since I started at Carleton University. I remember thinking about going on sabbatical - a major feature of becoming a University professor. Well, now the time has come, and along with the proposal, I've scheduled quite a bit of travel to visit with colleagues and to take some well-deserved holiday time. So here's the tentative schedule:

July 3-14: Cambridge, UK (EBI - Dietrich Rebholz-Schuhmann, University of Cambridge - Robert Hoehndorf)
July 14-18: Vienna, Austria (Bio-ontologies, ISMB Tutorial)
July 18-August 7: Malta+Italy (Valletta, Catania, Salerno, Pompeii, Rome, Florence, Venice, Bologna, Pisa)
August 7-10: Finland (Tampere, Helsinki)
August 10-18: Iceland
August 18-29: Kyoto + Tokyo, Japan (Biohackathon 2011)
August 29-September 3: Madrid (Ontology Engineering Group: Alexander De Leon)
September 3-7: Heidelberg, Germany (COMBINE)
September 7-12: Ottawa (defense: Leonid Chepelev -> success!)
September 13-17: Nancy, France (INRIA/LORIA: Adrien Coulet)
September 19: Volendam (OpenPHACTS/Gen2Phen meeting on open data)
September 17-October 1: Paris, St Malo, Bordeaux, London (travel with parents)
October 1-11: Ottawa, Canada (defense: Natalia Villanueva-Rosales -> success!)
October 11-December 1: Concepcion, Chile (Universidad de Concepcion: Leo Ferres)
December 1-December 23: Santiago, San Pedro de Atacama, Mendoza, Buenos Aires, Colonia, Punta del Diablo and Montevideo
December 24-January 31: Toronto, Ottawa
February - March: Stanford University (Nigam Shah, Mark Musen)
April - June: India, Nepal, Bhutan, Thailand, Singapore?

Sabbatical 2011-2012: Formalizing Scientific Discourse

Objective: The goal of my research program is to enable biologists to compose and evaluate scientific hypotheses using a diverse set of informational sources (ontology, database, text, equations, and web services). The purpose of my 2011-2012 sabbatical is to develop expertise in formalizing scientific discourse, with a particular focus on formalizing textual descriptions and mathematical equations such that they interoperate with knowledge represented in databases and structured documents. In particular, I am interested in using high quality facts derived from text and dynamic computation from formalized equations to answer questions and provide evidence for scientific hypotheses.

Background: Advancing knowledge in the life sciences involves experimentally testing hypotheses and interpreting the results based on prior scientific work. In generating a valid hypothesis, biologists face the overwhelming challenge of collecting, evaluating and integrating large and increasing amounts of different kinds of information about organisms, cells, genes and proteins from thousands of articles, hundreds of databases and dozens of tools. A biologist’s ability to efficiently construct and evaluate a hypothesis over current knowledge requires that i) knowledge, data and hypotheses are formally represented so they may be reasoned about, ii) adequate software exists to manage and query formal knowledge, and iii) data can be obtained by searching relevant databases and invoking the right analytical tools. The inability to efficiently discover relevant information can negatively impact scientific research directions and proposed activities. Methods for facilitating the construction and evaluation of hypotheses against the current state of knowledge could translate into greater scientific insight and increased productivity. Innovative approaches for knowledge discovery could be applied to data on the emerging Semantic Web and be transformative on a global scale.

Proposed Activities
1. Text to triples: The purpose of this leg of the sabbatical is to gain an understanding of the current state of the art in natural language processing and develop skill in producing high quality triples from parsing scientific text.
Time Frame: July 2011-September 2011
Location: European Bioinformatics Institute, Hinxton, UK.
Host: Dr. Rebholz-Schuhmann

2. Formalizing equations: The purpose of this leg of the sabbatical is to investigate the ontology of equations, represent scientific equations using Semantic Web technologies (principally the Rule Interchange Format), and implement semantic web services that serve to compute over formalized scientific equations.
Time Frame: October 2011-December 2011
Location: Universidad de Concepcion, Concepcion, Chile.
Host: Dr. Leo Ferres

3. Formalizing Research Hypotheses: The purpose of this leg of the sabbatical is to explore the formalization of hypotheses concerning disease. Specifically, I will extract meaningful facts from AlzForum and integrate these with resources from the National Center for Biomedical Ontology (NCBO) and Bio2RDF, our large scale Semantic Web project.
Time Frame: January 2012-March 2012
Location: Stanford University, Palo Alto, California, USA.
Host: Dr. Mark Musen

4. Integrated Framework for Knowledge Discovery: The last leg of the sabbatical will focus on maximizing interoperability between text, equations, ontologies and database-derived facts. I will use SADI, our Semantic Web services framework, to achieve this objective.
Time Frame: April 2012-June 2012
Location: India, Thailand, Singapore

Scientific Value and Broader Beneficial Impacts
The development and application of efficient strategies for knowledge discovery is a major goal in bioinformatics. My research into new strategies for the representation and evaluation of scientific hypotheses using ontologies, scientific text, data and bioinformatic services will create a novel platform that will significantly contribute to scientific productivity and ultimately improve our understanding of biology. The proposed sabbatical will provide me with new skills that will be used to train a future generation of young scientists in the areas of formal knowledge representation, text mining and the Semantic Web. Ultimately, it is expected that the sabbatical will cultivate new partnerships with leading scientists and open new doors to work with industry and government agencies.

Friday, June 4, 2010

SADI

(modified from an email that Mark Wilkinson sent)

SADI is a very lightweight "standard" (set of best-practices, really) for modeling and providing Web Services. It uses standards from the W3C Semantic Web initiative - in particular, it uses OWL for types, and RDF for instance data.

SADI is used to expose "resources" to the world in a manner that can be discovered automatically, and accessed automatically, by software. Those resources might be data inside databases (where SADI replaces the traditional Web Query page), or they might be analytical algorithms that consume data, chug away on it, and return output data. In both cases, the interfaces are structurally identical, so from the perspective of the client software, it doesn't have to know or care whether it is trying to get data out of a database or out of an analytical tool - the question/query structure is the same, and moreover, it is completely predictable.

This is critical advantage #1 for SADI over traditional Web Services frameworks: in traditional XML-based Web Services, you still must code your client software to access each service, since the service interfaces cannot be interpreted by the machine. In SADI, we can design ONE piece of software to access all resources exposed as SADI services - "one ring to rule them all!" (And we already have several different "rings" that expose SADI data in different ways.)

Critical advantage #2 is a bit more obscure and hard to describe, but is likely to be the more important in the long run. In SADI, data is "grounded" in explicit semantics. This means that all data in SADI carries with it information about what TYPE of data it is, and how that data relates to other data (e.g. genes transcribed into transcripts translated into proteins which regulate genes: Gene, Transcript, Protein are all data types, and "transcribed", "translated", "regulate" are relationships between them). With this explicit (and extensive!) grounding in semantics, we can start asking our machines to do a lot of the interpretation for us. For example, "what gene regulates gene X" is a nonsensical question biologically, but it's a question that biologists ask all the time! With a solid grounding in semantics, the machine would be able to follow the logical pathway above and say "well, to answer that question, I am going to have to go through transcripts and proteins to get there" and then automatically construct the pipeline of services that get to the answer. This is just one example of how semantics can be used to facilitate question-answering.
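To make the pipeline idea a little more concrete, here is a toy sketch in JavaScript. It is not SADI's actual discovery code: the service names, the description fields and the findPipeline helper are all invented for illustration. Each hypothetical service declares the type it consumes and the type it produces, and a breadth-first search chains services until the requested type is reached.

```javascript
// Hypothetical service descriptions (not real SADI services):
// each one consumes data of one type and produces data of another.
const services = [
  { name: 'geneToTranscript', input: 'Gene', output: 'Transcript', relation: 'transcribed into' },
  { name: 'transcriptToProtein', input: 'Transcript', output: 'Protein', relation: 'translated into' },
  { name: 'proteinToRegulatedGene', input: 'Protein', output: 'Gene', relation: 'regulates' },
];

// Breadth-first search for the shortest non-empty chain of services
// that turns data of type `from` into data of type `to`.
function findPipeline(serviceList, from, to) {
  const queue = [{ type: from, path: [] }];
  const visited = new Set();
  while (queue.length > 0) {
    const { type, path } = queue.shift();
    for (const service of serviceList) {
      if (service.input !== type) continue;
      const nextPath = [...path, service.name];
      if (service.output === to) return nextPath;
      if (!visited.has(service.output)) {
        visited.add(service.output);
        queue.push({ type: service.output, path: nextPath });
      }
    }
  }
  return null; // no chain of services connects the two types
}

// The Gene-to-Gene chain behind a "gene regulates gene" question.
console.log(findPipeline(services, 'Gene', 'Gene').join(' -> '));
// prints "geneToTranscript -> transcriptToProtein -> proteinToRegulatedGene"
```

The point of the sketch is only that the chain is found from the declared types and never hard-coded in the client, which is the kind of reasoning that explicit semantics makes possible.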

There are several tutorials available.

For an overview of what it can do: http://www.slideshare.net/markmoby/sadi-swsip-09

Then go to http://sadiframework.org to find the more specific tutorials on how to deploy services.

The current list of available services is at http://sadiframework.org/registry/services/ and that list will be growing rapidly over the next year (we have committed to having at least 400 more services, but I suspect that we'll go far beyond that number!)

Friday, May 7, 2010

Getting SNORQL to work with Virtuoso

SNORQL is an AJAX SPARQL browser that makes it easy to i) see if your queries work and ii) navigate your linked data. SNORQL comes packaged with D2R Server, but a few modifications are needed to make it work when it is installed in a directory or on a port different from that of the SPARQL endpoint.

Do the following to make SNORQL work with some host-located endpoint:

1. Download and install SNORQL. SNORQL comes as part of the D2R distribution. Download this and extract the snorql folder from the webapps directory into some folder on your host, preferably one that is already accessible by the web server (e.g. in the htdocs directory). If you want to put SNORQL in a different folder, you must add an entry like this to the httpd.conf file:

Alias /snorql /usr/var/snorql

<Directory /usr/var/snorql>

Options None

AllowOverride None

Order allow,deny

Allow from all

</Directory>

2. Configure the Apache server as a proxy to the endpoint. If the Apache server and the endpoint listen on different ports, you need to make them appear the same for the AJAX requests to work, because browsers restrict such requests to the same host and port as the page. Add this to your httpd.conf file:

ProxyRequests Off

<Proxy *>

Order deny,allow

Allow from all

</Proxy>

ProxyPass /sparql http://localhost:8890/sparql

ProxyPassReverse /sparql http://localhost:8890/sparql

3. Configure SNORQL to use the endpoint.

Edit the snorql.js file and replace

this._endpoint = document.location.href.match(/^([^?]*)snorql\//)[1] + 'sparql';

with

this._endpoint = 'http://localhost/sparql';
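For context, the original line derives the endpoint from the address of the SNORQL page itself: it takes everything before "snorql/" and appends "sparql". The sketch below shows what that expression yields for two page addresses. The deriveEndpoint wrapper and the sample URLs are made up for illustration and are not part of snorql.js.

```javascript
// Same regular expression as in snorql.js, applied to a string argument
// instead of document.location.href so it can run outside a browser.
function deriveEndpoint(href) {
  const match = href.match(/^([^?]*)snorql\//);
  // The original line indexes the match directly, so it would throw
  // if the address contained no "snorql/" segment; this wrapper returns null.
  return match === null ? null : match[1] + 'sparql';
}

// Sample page addresses (hypothetical).
console.log(deriveEndpoint('http://localhost:2020/snorql/'));
// prints "http://localhost:2020/sparql"
console.log(deriveEndpoint('http://localhost/tools/snorql/?describe=x'));
// prints "http://localhost/tools/sparql"
```

In other words, the default assumes the endpoint sits right next to the snorql folder on the same host and port. Hard-coding the address removes that assumption, at the cost of having to edit the file again if the endpoint moves.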

4. Open your browser to the SNORQL URL (e.g. http://localhost/snorql) to query and navigate the results :-)
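If the SNORQL page loads but queries fail, it can help to check the proxied endpoint on its own. Here is a minimal sketch, assuming the proxy from step 2 and the endpoint address from step 3. The buildSparqlUrl and runQuery helpers and the sample query are hypothetical; "query" is the parameter name defined by the SPARQL protocol, and the JSON results format is assumed to be supported by the endpoint.

```javascript
// Build the GET URL for a SPARQL query against an endpoint.
function buildSparqlUrl(endpoint, sparql) {
  const url = new URL(endpoint);
  url.searchParams.set('query', sparql); // percent-encodes the query text
  return url.toString();
}

// Send the query and parse the results as JSON.
// Assumption: the endpoint honours this Accept header.
// Requires a runtime with a global fetch (a browser, or a recent Node.js).
async function runQuery(endpoint, sparql) {
  const response = await fetch(buildSparqlUrl(endpoint, sparql), {
    headers: { Accept: 'application/sparql-results+json' },
  });
  if (!response.ok) {
    throw new Error('Endpoint returned HTTP ' + response.status);
  }
  return response.json();
}

// Print the URL that a request through the proxy would use.
const sampleQuery = 'SELECT * WHERE { ?s ?p ?o } LIMIT 1';
console.log(buildSparqlUrl('http://localhost/sparql', sampleQuery));
```

Pasting the printed URL into a browser, or calling runQuery with the same arguments, should return a result if the proxy is forwarding requests to the endpoint.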