OntoQuest: Exploring Ontological Data Made Easy

Authors: Li Chen, Maryann Martone, Amarnath Gupta, Lisa Fong, Mona Wong-Barnum

Background
• Many application domains in the natural sciences are rapidly building ontologies
  • To attempt to standardize the vocabulary of their domains
  • To record known relationships that have been established from years of scientific research in the discipline
  • To use the ontology as the common framework to exchange, assimilate and compare information
    • Experimental data collected by research groups
    • Curated data compiled from the literature
  • To establish relationships with data and ontologies from other domains to achieve interoperability and information integration

The Problem / Requirement
• Need a system
  • To explore the ontology itself (OWL)
  • To relate the terms and relationships in an ontology to data sources (RDBMS, RDF, XML)
  • To explore multiple data sources as part of the ontology exploration process (instance inference)
  • To update the databases through the ontology exploration tool (instance inference triggered by update)
  • To update the ontology and propagate the effects of the update to the mappings between data sources and the ontology (mapping change triggered by update)

OntoQuest
• Ingests any OWL-expressed ontology
  • Uses IBM’s IODT tool (modified) to shred the OWL ontology to a schema
• Instances of ontology classes may reside locally or be accessed from remote sources
• Provides the ability for ontology exploration
  • By traversal of any transitive relationship
  • By SPARQL queries
• Allows data exploration through ontology classes
• Allows single-instance updates
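
Traversal of a transitive relationship can be illustrated with a small sketch. This is not OntoQuest code: the edge list below is a hypothetical in-memory stand-in for the repository, and the class names are borrowed from the demo ontology purely for illustration.

```python
from collections import deque

# Hypothetical (child, parent) edges for a transitive relation such as subclass-of.
SUBCLASS_OF = [
    ("Dendrite", "Neuron_Compartment"),
    ("Dendrite_Tree", "Dendrite"),
    ("Axon", "Neuron_Compartment"),
]


def descendants(root, edges):
    """Return every class reachable from root by following the relation downward."""
    children = {}
    for child, parent in edges:
        children.setdefault(parent, []).append(child)

    seen = set()
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for child in children.get(node, []):
            if child not in seen:  # guard against revisiting shared or cyclic nodes
                seen.add(child)
                queue.append(child)
    return seen
```

Calling `descendants("Neuron_Compartment", SUBCLASS_OF)` collects both direct and indirect subclasses, which is what expanding a hierarchy in the GUI amounts to.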

OntoQuest Builds on IODT
• Our system is developed on top of an IBM integrated ontology toolkit, which
  • Implements a high-performance ontology repository built on a relational database
  • Supports a subset of W3C’s OWL and the SPARQL query language
  • Uses a description logic reasoner for class-level inference and a set of logic rules translated from DLP (Description Logic Programs) for instance-level inference
    • Hence, inference completeness and soundness on DLP can be guaranteed
  • Has a back-end database schema designed for efficient querying and inference, with performance superior to Jena, Sesame, etc.

[Figure: system architecture. Components shown: Biologist-Friendly GUI, SKIL APIs, Updater, Query Mediator, Reasoner, Cache, IBM ToolKit; SQL connections to the data sources.]

System Development Facts
• OntoQuest has a domain-user-friendly GUI and a library of customized APIs
  • Updater: enables inserting classes and instances incrementally into the ontology repository
  • Query Mediator: forms the user’s request as a query against the global view; decomposes it into sub-queries in the form of SQL and SPARQL and sends them to CCDB and CKB; reassembles the results and renders an appropriate view (e.g. a graphic) for the user
  • Reasoner: executes rules to compute indirect class memberships and properties
  • Cache: further enhances system efficiency by caching or prefetching frequent query results
• The system is still under development; some functionality is incomplete or needs improvement
  • e.g., propagation of ontology updates
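
The Query Mediator's decompose-and-reassemble flow can be sketched as follows. This is a minimal illustration, not the OntoQuest implementation: the function names, the table and column names, the instance ids, and the choice of which sub-query goes to which store are all assumptions, and the two backends are stubs.

```python
def mediate(class_name, run_sparql, run_sql):
    """Split one request into two sub-queries and merge the answers."""
    # Sub-query 1 (SPARQL): which instances belong to the class?
    sparql = f"SELECT ?i WHERE {{ ?i a <{class_name}> }}"
    instance_ids = run_sparql(sparql)

    # Sub-query 2 (SQL): which images exist for that class?
    # Table and column names are hypothetical.
    sql = "SELECT instance_id, image_url FROM images WHERE class_name = ?"
    images = dict(run_sql(sql, (class_name,)))

    # Reassemble: one record per instance, with its image if one was found.
    return [{"instance": i, "image": images.get(i)} for i in instance_ids]


# Stub backends standing in for the real stores (placeholder data).
def fake_sparql(query):
    return ["inst_1", "inst_2"]


def fake_sql(query, params):
    return [("inst_1", "image_1")]


result = mediate("Dendrite", fake_sparql, fake_sql)
```

Passing the backends in as callables keeps the mediator independent of any particular store, so the same merge logic can be exercised against stubs.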

Data Integration with OntoQuest
• For every class,
  • the ids of the instances of the class are tracked from the respective data stores and maintained locally
  • a mapping is used to fetch instances of the class from the relevant store to a local instance store on demand
  • only the properties that are associated with the ontology classes are retrieved, in a GAV (global-as-view) fashion
  • all other properties are available (for now) only by letting the user query the data source directly
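
The on-demand, mapping-driven fetch described above might look like the following sketch. It assumes an in-memory SQLite database as a stand-in for a remote source; the table, columns, rows, and the `InstanceStore` class are hypothetical and are not part of OntoQuest.

```python
import sqlite3

# Stand-in for a remote data source (hypothetical table and rows).
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE dendrite_obs (id TEXT, diameter_um REAL, notes TEXT)")
source.executemany(
    "INSERT INTO dendrite_obs VALUES (?, ?, ?)",
    [("inst_1", 1.0, "note a"), ("inst_2", 2.0, "note b")],
)

# GAV-style mapping: each ontology class is defined as a query over the source.
# Only columns tied to ontology properties are selected; 'notes' stays behind.
MAPPINGS = {"Dendrite": "SELECT id, diameter_um FROM dendrite_obs ORDER BY id"}


class InstanceStore:
    """Local instance store that pulls a class's instances on first request."""

    def __init__(self, source, mappings):
        self.source = source
        self.mappings = mappings
        self.local = {}       # class name -> rows already fetched
        self.fetch_count = 0  # number of round trips to the source

    def instances(self, class_name):
        if class_name not in self.local:
            rows = self.source.execute(self.mappings[class_name]).fetchall()
            self.local[class_name] = rows
            self.fetch_count += 1
        return self.local[class_name]


store = InstanceStore(source, MAPPINGS)
first = store.instances("Dendrite")
second = store.instances("Dendrite")  # served from the local store
```

The second call does not touch the source, and the unmapped `notes` column never reaches the local store; reading it would require querying the source directly.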

The Application Setting for this Demo
• The Ontology
  • Developed by the neuroscientists in our group
  • Describes the subcellular anatomy of the nervous system, including cell types and their subcellular properties and multicellular domains
  • The knowledge base was constructed as a directed graph using the open source tool Protégé (http://protege.stanford.edu), a freely available knowledge management tool written in Java.
  • The ontology is expressed in OWL-DL
  • Since OWL-DL supports description logic, inferences are made from the property rules
    • e.g., protein Kv3.2 is located in the plasma membrane; if an instance of axon terminal expresses Kv3.2, then it must have a plasma membrane.
• Data Sources
  • A Derby data store for literature-curated instances of subcellular anatomy (CKB)
  • A relational (MySQL) source containing experimental data from CCDB
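
The Kv3.2 example can be written as a single forward-chaining rule over triples. This is only a sketch of the idea: the predicate names (`located_in`, `expresses`, `has_component`) and the instance id are hypothetical, and the actual system relies on a description logic reasoner and DLP-derived rules rather than this code.

```python
def infer_components(triples):
    """Rule: X expresses M and M located_in C  =>  X has_component C."""
    facts = set(triples)
    located = [(s, o) for s, p, o in facts if p == "located_in"]

    inferred = set()
    for s, p, o in facts:
        if p == "expresses":
            for molecule, compartment in located:
                if molecule == o:
                    inferred.add((s, "has_component", compartment))
    # Report only facts that were not already asserted.
    return inferred - facts


facts = [
    ("Kv3.2", "located_in", "Plasma_Membrane"),
    ("axon_terminal_1", "expresses", "Kv3.2"),
]
new_facts = infer_components(facts)
```

Here `new_facts` contains the single derived statement that `axon_terminal_1` has a plasma membrane.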

[Figure: Subcellular Ontology diagram. Labels shown include Subcellular Space, Nerve Cell (Neuron, Glia, Microglia, Macroglia), Multi-cellular Domain (Intercellular Junction, Extracellular Space, Synaptic Cleft, Glomerulus, Neuropil, Node of Ranvier, Pinceau), Compartment (Dendrite, Axon, Cell body, Spine), Component (Organelle, Cytoskeleton, Cilium, Specialization, Inclusion, Plasma Membrane, Cytoplasm) and Property (Morphometrics, Shape, Distribution, Orientation). A Dendritic Spine example shows Shaft, Post synaptic, SER, Actin Filament, Ribosome and PSD. Legend: Molecule, Compartment; edge types subclass and has-a.]

Demo Scenarios

Step 0: startup screen

Step 1: click to show subclass hierarchy by default

Step 1: other options for expanding different types of hierarchies, e.g., the compartment types for Neuroepithelial_Cell and those for Neuron

Step 2: get the detailed info (instances and properties) of the subclass Dendrite of Neuron_Compartment

Step 2a: accessing the property values for the selected class

Step 2b: the CCDB image page corresponding to the selected instance Dendritic_Tree_1 is shown here

Step 2’: some concepts (such as Cellular_Dependent_Continuant here) have properties but no instances in CKB

Step 3: right-clicking a concept in the hierarchy pops up a list of view functions to choose from

Step 4: aggregate the has_Component values of all Dendrite instances; the last row shows a statistical summary. Note that the instances of Dendrite include those of its subclasses (such as Dendrite_Tree).
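
The aggregation in Step 4 can be sketched as below, assuming simple in-memory dictionaries in place of the ontology repository. The instance ids and component lists are placeholder data, not values from CKB or CCDB.

```python
from collections import Counter

# Hypothetical class hierarchy and per-instance has_Component values.
SUBCLASSES = {"Dendrite": ["Dendrite_Tree"]}
INSTANCES = {
    "Dendrite": {"d1": ["Microtubule", "Mitochondrion"]},
    "Dendrite_Tree": {"t1": ["Microtubule"]},
}


def all_instances(class_name):
    """Instances of a class, including those of its subclasses."""
    merged = dict(INSTANCES.get(class_name, {}))
    for sub in SUBCLASSES.get(class_name, []):
        merged.update(all_instances(sub))
    return merged


def aggregate_components(class_name):
    """Count how many instances list each component, plus a summary count."""
    instances = all_instances(class_name)
    counts = Counter(c for comps in instances.values() for c in comps)
    return {"instances": len(instances), "counts": dict(counts)}


summary = aggregate_components("Dendrite")
```

Because `all_instances` recurses through subclasses, the `Dendrite_Tree` instance contributes to the `Dendrite` totals, mirroring the behaviour noted in the step.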

Step 5: drill down to view instances of Dendrite_Tree and aggregate over several numeric property values

SPARQL Query

Add an Instance

Edit Instance Properties

Ontology Store Properties

“Rules” for cellular assembly
• What are the cellular components of a dendrite?
  • 29 instances of dendrite
    1. Microtubules
    2. Mitochondria
    3. Hypolemmal cisternae
    4. Plasma membrane
    5. Smooth endoplasmic reticulum
    6. Rough endoplasmic reticulum
    7. Polyribosomes
    8. Neurofilaments
  • Average diameter = 3.2 µm
  • Average length = 150 µm
• How many dendrites does a Purkinje cell have?
  • 3 instances of Purkinje cell dendritic tree
    1. Avg branch order = 22
    2. Number of primary dendrites = 1.3
    3. Avg number of branches = 760
** Computes aggregate properties from instances
