The Art of Building Useful Ontologies Barry Smith http://ontologist.com
where in the body ? where in the cell ? what kind of organism ? what kind of disease process ?
How to create broad-coverage semantic annotation systems for biomedicine? • Unified Medical Language System, Semantic Web, ontowiki, ... • let a million flowers bloom, • and rely for integration on post hoc mappings • problem: what to do with weeds? • problem: how to support reasoning across the annotated data?
For science, an alternative approach • based on prospective standardization • designed to support annotation of data in ways that will support reasoning with this data
The OBO Foundry • a family of interoperable gold standard biomedical reference ontologies built around the Gene Ontology at its core • http://obofoundry.org
A prospective standard • designed to guarantee interoperability of ontologies from the very start (and to keep out weeds) • initial set of 10 criteria • tested in the annotation of: scientific literature, model organism databases, life science experimental results
Ontology | Scope | URL | Custodians
Cell Ontology (CL) | cell types from prokaryotes to mammals | obo.sourceforge.net/cgi-bin/detail.cgi?cell | Jonathan Bard, Michael Ashburner, Oliver Hofman
Chemical Entities of Biological Interest (ChEBI) | molecular entities | ebi.ac.uk/chebi | Paula Dematos, Rafael Alcantara
Common Anatomy Reference Ontology (CARO) | anatomical structures in human and model organisms | (under development) | Melissa Haendel, Terry Hayamizu, Cornelius Rosse, David Sutherland
Foundational Model of Anatomy (FMA) | structure of the human body | fma.biostr.washington.edu | JLV Mejino Jr., Cornelius Rosse
Functional Genomics Investigation Ontology (FuGO) | design, protocol, data instrumentation, and analysis | fugo.sf.net | FuGO Working Group
Gene Ontology (GO) | cellular components, molecular functions, biological processes | www.geneontology.org | Gene Ontology Consortium
Phenotypic Quality Ontology (PaTO) | qualities of biomedical entities | obo.sourceforge.net/cgi-bin/detail.cgi?attribute_and_value | Michael Ashburner, Suzanna Lewis, Georgios Gkoutos
Protein Ontology (PrO) | protein types and modifications | (under development) | Protein Ontology Consortium
Relation Ontology (RO) | relations | obo.sf.net/relationship | Barry Smith, Chris Mungall
RNA Ontology (RnaO) | three-dimensional RNA structures | (under development) | RNA Ontology Consortium
Sequence Ontology (SO) | properties and features of nucleic sequences | song.sf.net | Karen Eilbeck
Building out from the original GO
Ontologies being built to satisfy Foundry principles ab initio • Clinical Trial Ontology (CTO) • Common Anatomy Reference Ontology (CARO, DB1 & DB2) • Mosquito Anatomy Ontology (MAO) • Ontology for Biomedical Investigations (OBI) • Phenotypic Quality Ontology (PATO, DB1 & DB2) • Protein Ontology (PRO) • Relation Ontology (RO) • RNA Ontology (RnaO)
Foundry Ontologies in planning phase • Biobank/Biorepository Ontology (BrO, part of OBI) • Environment Ontology (EnvO) • Fish Multi-Species Anatomy Ontology (funding received; no acronym yet) • Infectious Disease Ontology (IDO) • Mouse Adult Neurogenesis Ontology (MANGO) • Xenopus Anatomy Ontology (XAO)
CRITERIA • OPENNESS: The ontology is open and available to be used by all. • FORMAL LANGUAGE: The ontology is in, or can be instantiated in, a common formal language. • ORTHOGONALITY: The developers of the ontology agree in advance to collaborate with developers of other OBO Foundry ontologies where domains overlap. • CONVERGENCE: The developers agree to work towards a single ontology for each domain. http://obofoundry.org/
CRITERIA • UPDATE: The developers of each ontology commit to its maintenance in light of scientific advance, and to soliciting community feedback for its improvement. • IDENTIFIERS: The ontology possesses a unique identifier space within OBO. • VERSIONING: The ontology provider has procedures for identifying distinct successive versions. • DEFINITIONS: The ontology includes textual definitions for all terms.
CRITERIA • CLEARLY BOUNDED: The ontology has a clearly specified and clearly delineated content. • DOCUMENTATION: The ontology is well-documented. • USERS: The ontology has a plurality of independent users. • COMMON ARCHITECTURE: The ontology uses relations which are unambiguously defined following the pattern of definitions laid down in the OBO Relation Ontology.
ORTHOGONALITY • annotations can be additive • ontologies do not need to create tiny theories of anatomy or chemistry within themselves • modularity ensures division of labor amongst domain experts
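
One way to picture "annotations can be additive" is the minimal sketch below. It assumes each ontology covers its own domain, so annotations made independently against different ontologies can be merged by simple set union. The function name, the curator variables and the term labels are hypothetical and are not taken from any ontology release.

```python
def merge_annotations(*sources):
    """Union annotations from independent sources, keyed by ontology prefix."""
    merged = {}
    for source in sources:
        for ontology, labels in source.items():
            # Terms from the same ontology accumulate; nothing is overwritten.
            merged.setdefault(ontology, set()).update(labels)
    return merged


# Hypothetical annotations of one data item by three independent curators,
# each working with a different ontology (labels are illustrative only).
anatomy_curator = {"FMA": {"liver"}}
cell_curator = {"CL": {"hepatocyte"}}
go_curator = {"GO": {"mitochondrion", "lipid transport"}}

combined = merge_annotations(anatomy_curator, cell_curator, go_curator)
for ontology in sorted(combined):
    print(ontology, sorted(combined[ontology]))
```

Because the domains do not overlap in this sketch, no curator has to restate anatomy or chemistry inside their own ontology; the merge step is the whole integration.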
compare: legends for maps
ontologies are legends for data
natural language labels organized in a graph-theoretic structure, designed to make the data • cognitively accessible to human beings • algorithmically accessible to machines • linked up to other data resources because the same labels have been used
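
As a rough illustration of labels organized in a graph and made "algorithmically accessible", the sketch below stores a few is_a links in a dictionary and uses them to retrieve data annotated with a term or with any more specific term. The hierarchy shown is a simplified, illustrative fragment, and the gene names are hypothetical; neither is copied from a GO release.

```python
# Illustrative is_a edges: each label maps to its direct parents.
IS_A = {
    "sphingolipid transporter activity": ["lipid transporter activity"],
    "lipid transporter activity": ["transporter activity"],
    "transporter activity": ["molecular function"],
    "molecular function": [],
}


def ancestors(term, is_a=IS_A):
    """Return every label reachable from `term` by following is_a edges."""
    seen = set()
    stack = list(is_a.get(term, []))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(is_a.get(parent, []))
    return seen


def annotated_with(term, annotations, is_a=IS_A):
    """Return items annotated with `term` or with any of its descendants."""
    return sorted(
        item
        for item, label in annotations.items()
        if label == term or term in ancestors(label, is_a)
    )


# Hypothetical annotations: data item -> ontology label.
annotations = {
    "geneA": "sphingolipid transporter activity",
    "geneB": "molecular function",
}
print(annotated_with("transporter activity", annotations))
```

A query for the general label finds data annotated with the more specific one, which is what lets separately built resources link up when they share the same labels.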
The OBO Foundry Idea GlyProt MouseEcotope sphingolipid transporter activity DiabetInGene GluChem
The OBO Foundry Idea GlyProt MouseEcotope Holliday junction helicase complex DiabetInGene GluChem
Five bangs for your GO buck • science base • cross-species database integration (human, mouse, fly ...) • cross-granularity database integration • through links to the entities in biological reality • semantic searchability links people to software
Applications for which GO has already been used: • integrating genomic and proteomic information from different organisms • finding functional similarities in genes that are overexpressed or underexpressed in diseases and as we age • predicting the likelihood that a particular gene is involved in diseases that haven't yet been mapped to specific genes • verifying models of genetic, metabolic and gene product interaction networks
Google: April 23, 2007 • ontology 14.80 Mill. • “Gene Ontology” 0.96 Mill. • “Dublin Core” Ontology 0.65 Mill. • SUMO Ontology 0.30 Mill.
Reasons why GO has been successful • It is a system for prospective standardization built from the ground up on the basis of what works and of real needs by domain specialists • It is built on the basis of community consensus but with considerable central leadership in imposition of best practice – authority is the only way to yield a coordinated system of interoperable ontologies • Subject to continuous update of content, documentation and formal architecture – updates every night • In such a way as to ensure backwards compatibility with prior annotations • Initially low-tech to encourage users, with movement to more powerful formal approaches (including OWL-DL – though GO community still recommending caution)
GO has learned the lessons of successful cooperation • Clear documentation • The ontology terms chosen are already familiar • Fully open source (no secrets, thorough testing in manifold combinations with other ontologies) • Subjected to considerable third-party critique • Embraces simple rules and simple technology wherever possible, but in such a way as to create an evolutionary path to logical reasoning and integration
Prospective standardization is a good thing • Prospective standardization is the only thing which will work in mission critical domains • Prospective standardization means that certain limits to tolerance must be imposed and that authorities must be recognized
But not every prospective standardization is a good thing • ISO 15926-2 • http://ontology.buffalo.edu/bfo/west.pdf • Proceedings of FOIS 2006
How Not to Build Useful Ontologies • ISO/FDIS 15926-2 • Lifecycle integration of process plant data including oil and gas production facilities
Heh ! Let’s reinvent the wheel
What ISO 15926 Part 2 says • What it is ... • rigorous 4D ontology • a full ISO standard (2003) • 201 entities in upper ontology • some 50,000 entities in all • limited axiomatisation • significant industrial support from Oil and Process industries • ... good for: • integrating diverse information systems • engineering applications • applications involving time and space • managing change • integrating/analyzing mid-level ontologies
2006 NIST Upper Ontology Summit • March 14-15, 2006, Gaithersburg, MD • ISO 15926 proposed for general use as an upper level ontology – for ‘integrating diverse information systems’ and ‘integrating [and] analyzing mid-level ontologies’ without restriction. • Matthew West, “ISO 15926 – Integration of Lifecycle Data” http://ontolog.cim3.net/file/work/UpperOntologySummit/UO-Summit-Meeting_20050315/UOS--west_20060315.ppt
ISO 15926 as foundation • ISO 15926 Entity Types can be referenced in one subject area from another • Total of 1546 entity types so far. • Diagram labels (in extraction order): Common Objects Common Interest Properties Time Intentional Thing Locations Products and Materials Subject Areas Agreements Project/Activity Organizations Buy/Sell Accounts General Management CRM Carrier Demand Process Areas Manufacture Transport Constraint Movement
“The purpose of ISO 15926 is to provide a Lingua Franca for computer systems, thereby integrating the information produced by them.”
The importance of consensus-based uptake • An ontology is like a telephone network: it is designed to support exchange of information. • Its value depends on the number of users who agree to adopt and to help maintain this common network. • Thus it depends also on the existence of a straightforward learning path for new users, and of clear and easily accessible documentation.
The importance of consensus-based uptake • is even greater in the case of an upper level ontology • which is designed to support exchange of information about all subjects
This is not a problem of money • Robust ontologies need to be thoroughly tested by being critically examined and pulled apart, and above all by being combined dynamically with other artifacts in real use cases à la GO • An upper ontology should not be proprietary
Confusion of Data Models and Ontologies • Data Model • mass of plunger: an integer • location of plunger: a string • Ontology • mass of plunger: a quality (which can be measured) • location of plunger: a place (which can have a name)
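
The plunger contrast can be sketched in code. In the minimal example below, the first class types its fields by storage format only, as a data model would, while the second types them by the kind of entity they refer to. All class names, field names and values are hypothetical and are not drawn from ISO 15926 or from any OBO ontology.

```python
from dataclasses import dataclass


# Data-model view: each field is typed by how it is stored.
@dataclass
class PlungerRecord:
    mass: int        # just a number; unit and meaning live outside the record
    location: str    # just a string; nothing says it names a place


# Ontology view: each field is typed by what kind of entity it is.
@dataclass(frozen=True)
class Quality:
    kind: str        # which quality is meant, e.g. "mass"
    value: float     # result of one measurement of that quality
    unit: str        # unit in which the measurement is expressed


@dataclass(frozen=True)
class Place:
    name: str        # a place can have a name; the name is not the place


@dataclass
class Plunger:
    mass: Quality
    location: Place


def describe(plunger):
    """Render the ontology-view instance as a short sentence fragment."""
    m = plunger.mass
    return f"{m.kind} {m.value} {m.unit} at {plunger.location.name}"


record = PlungerRecord(mass=250, location="pump housing")
plunger = Plunger(Quality("mass", 250.0, "g"), Place("pump housing"))
print(describe(plunger))
```

Both objects carry the same raw values; only the second says what those values are values of.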
Is ISO 15926 an ontology or a data model? • I do not know
Principle of intelligibility • an ontology that is advocated for general use should be understandable to its intended users. Its features should be explained in clear, simple English, extended where necessary with technical terms.
First Great Mystery • Of the 201 terms included in the ISO 15926 upper-level ontology, 88 are of the form ‘class of X’, for example: class_of_composite_material, class_of_compound, class_of_dimension_for_shape, class_of_feature, class_of_feature_whole_part, class_of_functional_object, class_of_inanimate_physical_object, class_of_indirect_connection
Definition of ‘class’ • A <class> is a <thing> that is an understanding of the nature of things and that divides things into those which are members of the class and those which are not according to one or more criteria. • Example: ‘Centrifugal pump is a <class>’.
What logic governs classes in ISO 15926? • Not, say, ZF set theory • but the theory of ‘non-well-founded sets’ devised for the special purposes of logical modeling of certain non-terminating computational processes; • allows sets to contain themselves, thereby generating infinitely descending chains of the form: • … ∈ A ∈ A ∈ A ∈ A ∈ A ∈ A ∈ A ∈ A ∈ A ∈ A
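
The difference between well-founded and non-well-founded membership can be imitated in a few lines. The sketch below uses Python lists as a loose stand-in for sets, which is an assumption made purely for illustration: a list that contains itself plays the role of a set A with A ∈ A, and a helper counts how far a membership chain can be followed downward. Nothing here is taken from ISO 15926.

```python
def descend(collection, max_steps):
    """Step from a collection to its first member, up to max_steps times.

    Returns the number of steps actually taken. The walk stops early when it
    reaches something that is not a non-empty list.
    """
    steps = 0
    current = collection
    while steps < max_steps and isinstance(current, list) and current:
        current = current[0]
        steps += 1
    return steps


# Well-founded stand-in: every downward chain bottoms out at the empty list.
well_founded = [[[]]]

# Non-well-founded stand-in: a list that is its own member, like A = {A}.
a = []
a.append(a)

print(descend(well_founded, 100))  # chain ends after a few steps
print(descend(a, 100))             # chain never ends; only the cap stops it
print(a[0] is a)                   # the member of `a` is `a` itself
```

In ZF set theory the axiom of foundation rules out the second case, so every downward walk of this kind terminates.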
The principle of simple tools • An ontology is an artifact created to support exchange of information; it is not the place to try out the latest new bits of mathematics you learned about last week
But worse • ISO 15926 complicates its theory of classes by allowing classes with both actual and possible members: • ‘Although there is only one <class> that has no members, there can be a <class> that has no members in the actual world, but which does have members in other possible worlds.’ • No standard theory of modal logic is addressed by ISO 15926
The principle of re-using available resources • if an ontology deals with what is dealt with perfectly well already in some recognized resource, then it should utilize this recognized resource.