
The Art of Building Useful Ontologies

Barry Smith


Presentation Transcript


  1. The Art of Building Useful Ontologies Barry Smith http://ontologist.com

  4. where in the body? where in the cell?

  5. where in the body? where in the cell? what kind of organism?

  6. where in the body? where in the cell? what kind of organism? what kind of disease process?

  7. How to create broad-coverage semantic annotation systems for biomedicine?
     • Unified Medical Language System, Semantic Web, ontowiki, ...
     • let a million flowers bloom,
     • and rely for integration on post hoc mappings
     problem: what to do with weeds?
     problem: how to support reasoning across the annotated data?

  8. For science: an alternative approach
     • based on prospective standardization, designed to support annotation of data in ways that support reasoning with this data

  9. The OBO Foundry
     • a family of interoperable gold-standard biomedical reference ontologies with the Gene Ontology at its core
     • http://obofoundry.org

  10. A prospective standard designed to guarantee interoperability of ontologies from the very start (and to keep out weeds). Initial set of 10 criteria, tested in the annotation of:
      • scientific literature
      • model organism databases
      • life science experimental results

  11. Ontologies, their scope, URLs and custodians:

      | Ontology | Scope | URL | Custodians |
      |---|---|---|---|
      | Cell Ontology (CL) | cell types from prokaryotes to mammals | obo.sourceforge.net/cgi-bin/detail.cgi?cell | Jonathan Bard, Michael Ashburner, Oliver Hofman |
      | Chemical Entities of Biological Interest (ChEBI) | molecular entities | ebi.ac.uk/chebi | Paula Dematos, Rafael Alcantara |
      | Common Anatomy Reference Ontology (CARO) | anatomical structures in human and model organisms | (under development) | Melissa Haendel, Terry Hayamizu, Cornelius Rosse, David Sutherland |
      | Foundational Model of Anatomy (FMA) | structure of the human body | fma.biostr.washington.edu | JLV Mejino Jr., Cornelius Rosse |
      | Functional Genomics Investigation Ontology (FuGO) | design, protocol, data instrumentation, and analysis | fugo.sf.net | FuGO Working Group |
      | Gene Ontology (GO) | cellular components, molecular functions, biological processes | www.geneontology.org | Gene Ontology Consortium |
      | Phenotypic Quality Ontology (PaTO) | qualities of biomedical entities | obo.sourceforge.net/cgi-bin/detail.cgi?attribute_and_value | Michael Ashburner, Suzanna Lewis, Georgios Gkoutos |
      | Protein Ontology (PrO) | protein types and modifications | (under development) | Protein Ontology Consortium |
      | Relation Ontology (RO) | relations | obo.sf.net/relationship | Barry Smith, Chris Mungall |
      | RNA Ontology (RnaO) | three-dimensional RNA structures | (under development) | RNA Ontology Consortium |
      | Sequence Ontology (SO) | properties and features of nucleic sequences | song.sf.net | Karen Eilbeck |

  12. Building out from the original GO

  13. Ontologies being built to satisfy Foundry principles ab initio
      • Clinical Trial Ontology (CTO)
      • Common Anatomy Reference Ontology (CARO, DB1 & DB2)
      • Mosquito Anatomy Ontology (MAO)
      • Ontology for Biomedical Investigations (OBI)
      • Phenotypic Quality Ontology (PATO, DB1 & DB2)
      • Protein Ontology (PRO)
      • Relation Ontology (RO)
      • RNA Ontology (RnaO)

  14. Foundry ontologies in planning phase
      • Biobank/Biorepository Ontology (BrO, part of OBI)
      • Environment Ontology (EnvO)
      • Fish Multi-Species Anatomy Ontology (funding received; no acronym yet)
      • Infectious Disease Ontology (IDO)
      • Mouse Adult Neurogenesis Ontology (MANGO)
      • Xenopus Anatomy Ontology (XAO)

  15. CRITERIA (http://obofoundry.org/)
      • OPENNESS: The ontology is open and available to be used by all.
      • FORMAL LANGUAGE: The ontology is in, or can be instantiated in, a common formal language.
      • ORTHOGONALITY: The developers of the ontology agree in advance to collaborate with developers of other OBO Foundry ontologies where domains overlap.
      • CONVERGENCE: The developers agree to work towards a single ontology for each domain.

  16. CRITERIA
      • UPDATE: The developers of each ontology commit to its maintenance in light of scientific advance, and to soliciting community feedback for its improvement.
      • IDENTIFIERS: The ontology possesses a unique identifier space within OBO.
      • VERSIONING: The ontology provider has procedures for identifying distinct successive versions.
      • DEFINITIONS: The ontology includes textual definitions for all terms.

  17. CRITERIA
      • CLEARLY BOUNDED: The ontology has a clearly specified and clearly delineated content.
      • DOCUMENTATION: The ontology is well-documented.
      • USERS: The ontology has a plurality of independent users.
      • COMMON ARCHITECTURE: The ontology uses relations which are unambiguously defined following the pattern of definitions laid down in the OBO Relation Ontology.

  18. ORTHOGONALITY
      • annotations can be additive
      • ontologies do not need to create tiny theories of anatomy or chemistry within themselves
      • modularity ensures division of labor amongst domain experts

  19. compare: legends for maps

  20. ontologies are legends for data

  21. natural language labels organized in a graph-theoretic structure, designed to make the data
      • cognitively accessible to human beings
      • algorithmically accessible to machines
      • linked up to other data resources because the same labels have been used
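      The idea of labels organized in a graph, reused across data resources, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the Foundry's tooling: the edge list, the record identifiers (`db_a:record1`, `db_b:record7`) and the function names are hypothetical, and the `is_a` edges are illustrative rather than copied from any Gene Ontology release.

```python
from collections import defaultdict

# Hypothetical mini-ontology: (child label, relation, parent label).
# The edges are illustrative only, not taken from a GO release.
EDGES = [
    ("sphingolipid transporter activity", "is_a", "lipid transporter activity"),
    ("lipid transporter activity", "is_a", "transporter activity"),
    ("transporter activity", "is_a", "molecular function"),
]

# Hypothetical annotations from two separate databases that use the same labels.
ANNOTATIONS = {
    "db_a:record1": "sphingolipid transporter activity",
    "db_b:record7": "lipid transporter activity",
}


def ancestors(label, edges=EDGES):
    """Return every label reachable from `label` by following edges upward."""
    parents = defaultdict(set)
    for child, _relation, parent in edges:
        parents[child].add(parent)
    seen = set()
    stack = [label]
    while stack:
        node = stack.pop()
        for parent in parents[node]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def records_annotated_with(label, annotations=ANNOTATIONS):
    """Find records annotated with `label` or with any more specific label."""
    return sorted(
        record
        for record, used_label in annotations.items()
        if used_label == label or label in ancestors(used_label)
    )


print(records_annotated_with("lipid transporter activity"))
```

      Because both hypothetical databases draw on the same labels, a query for the broader label retrieves records from both without any post hoc mapping between them.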

  22. The OBO Foundry Idea [diagram labels: GlyProt, MouseEcotope, sphingolipid transporter activity, DiabetInGene, GluChem]

  23. The OBO Foundry Idea [diagram labels: GlyProt, MouseEcotope, Holliday junction helicase complex, DiabetInGene, GluChem]

  24. Five bangs for your GO buck
      • science base
      • cross-species database integration (human, mouse, fly ...)
      • cross-granularity database integration
      • through links to the entities in biological reality
      • ⇒ semantic searchability links people to software

  25. Applications for which GO has already been used:
      • integrating genomic and proteomic information from different organisms
      • finding functional similarities in genes that are overexpressed or underexpressed in diseases and as we age
      • predicting the likelihood that a particular gene is involved in diseases that haven't yet been mapped to specific genes
      • verifying models of genetic, metabolic and gene product interaction networks

  26. Google: April 23, 2007
      • ontology: 14.80 million
      • “Gene Ontology”: 0.96 million
      • “Dublin Core” Ontology: 0.65 million
      • SUMO Ontology: 0.30 million

  27. The OBO Foundry Idea [diagram labels: GlyProt, MouseEcotope, sphingolipid transporter activity, DiabetInGene, GluChem]

  28. Reasons why GO has been successful
      • It is a system for prospective standardization built from the ground up on the basis of what works and of the real needs of domain specialists
      • It is built on the basis of community consensus but with considerable central leadership in the imposition of best practice: authority is the only way to yield a coordinated system of interoperable ontologies
      • It is subject to continuous update of content, documentation and formal architecture, with updates every night
      • Updates are made in such a way as to ensure backwards compatibility with prior annotations
      • It was initially low-tech to encourage users, with movement to more powerful formal approaches (including OWL-DL, though the GO community still recommends caution)

  29. GO has learned the lessons of successful cooperation
      • Clear documentation
      • The ontology terms chosen are already familiar
      • Fully open source (no secrets, thorough testing in manifold combinations with other ontologies)
      • Subjected to considerable third-party critique
      • Embraces simple rules and simple technology wherever possible, but in such a way as to create an evolutionary path to logical reasoning and integration

  30. Prospective standardization is a good thing
      • Prospective standardization is the only thing which will work in mission-critical domains
      • Prospective standardization means that certain limits to tolerance must be imposed and authorities must be recognized

  31. Prospective standardization is a good thing

  32. But not every prospective standardization is a good thing
      • ISO 15926-2
      • http://ontology.buffalo.edu/bfo/west.pdf
      • Proceedings of FOIS 2006

  33. How Not to Build Useful Ontologies
      • ISO/FDIS 15926-2
      • Lifecycle integration of process plant data including oil and gas production facilities

  34. Heh! Let’s reinvent the wheel

  35. What ISO 15926 Part 2 says
      What it is ...
      • rigorous 4D ontology
      • a full ISO standard (2003)
      • 201 entities in upper ontology
      • some 50,000 entities in all
      • limited axiomatisation
      • significant industrial support from Oil and Process industries
      ... good for:
      • integrating diverse information systems
      • engineering applications
      • applications involving time and space
      • managing change
      • integrating/analyzing mid-level ontologies

  36. 2006 NIST Upper Ontology Summit
      • March 14-15, 2006, Gaithersburg, MD
      • ISO 15926 proposed for general use as an upper level ontology, for ‘integrating diverse information systems’ and ‘integrating [and] analyzing mid-level ontologies’ without restriction.
      • Matthew West, “ISO 15926 – Integration of Lifecycle Data” http://ontolog.cim3.net/file/work/UpperOntologySummit/UO-Summit-Meeting_20050315/UOS--west_20060315.ppt

  37. ISO 15926 as foundation. ISO 15926 entity types can be referenced in one subject area from another. Total of 1546 entity types so far. [diagram labels: Common Objects; Common Interest; Properties; Time; Intentional Thing; Locations; Products and Materials; Subject Areas; Agreements; Project/Activity; Organizations; Buy/Sell; Accounts; General Management; CRM; Carrier; Demand; Process Areas; Manufacture; Transport; Constraint; Movement]

  38. “The purpose of ISO 15926 is to provide a Lingua Franca for computer systems, thereby integrating the information produced by them.”

  39. The importance of consensus-based uptake
      • An ontology is like a telephone network: it is designed to support exchange of information.
      • Its value depends on the number of users who agree to adopt and to help maintain this common network.
      • Thus it depends also on the existence of a straightforward learning path for new users, and of clear and easily accessible documentation.

  40. The importance of consensus-based uptake
      • is even greater in the case of an upper level ontology
      • which is designed to support exchange of information about all subjects

  41. This is not a problem of money
      • Robust ontologies need to be thoroughly tested by being critically examined and pulled apart, and above all by being combined dynamically with other artifacts in real use cases à la GO
      • An upper ontology should not be proprietary

  42. Confusion of Data Models and Ontologies
      • Data Model
        • mass of plunger: an integer
        • location of plunger: a string
      • Ontology
        • mass of plunger: a quality (which can be measured)
        • location of plunger: a place (which can have a name)
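      The contrast can be made concrete with a short Python sketch. Every name here (`PlungerRecord`, `Quality`, `Place`, `Plunger`, `to_ontology_view`) is hypothetical, and the default unit of grams is an assumption made only so the example runs; none of it comes from ISO 15926 or from the talk.

```python
from dataclasses import dataclass


# Data-model view: fields are typed only by how they are stored.
@dataclass
class PlungerRecord:
    mass: int       # an integer; the unit and the thing measured stay implicit
    location: str   # a string; nothing says that it names a place


# Ontology view: fields are typed by the kind of entity they refer to.
@dataclass(frozen=True)
class Quality:
    kind: str       # which quality this is, e.g. "mass"
    value: float    # the result of measuring it
    unit: str       # the unit of measurement


@dataclass(frozen=True)
class Place:
    name: str       # a place can have a name, but is not identical to it


@dataclass
class Plunger:
    mass: Quality
    location: Place


def to_ontology_view(record: PlungerRecord, unit: str = "g") -> Plunger:
    """Lift a storage-level record into entity-level terms.

    The unit is an assumption supplied by the caller, because the
    data-model record does not carry it.
    """
    return Plunger(
        mass=Quality(kind="mass", value=float(record.mass), unit=unit),
        location=Place(name=record.location),
    )


plunger = to_ontology_view(PlungerRecord(mass=12, location="bay 3"))
print(plunger)
```

      The conversion function has to be told the unit, which shows what the data-model view leaves out: the integer alone does not say what was measured or how.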

  43. Is ISO 15926 an ontology or a data model?
      • I do not know

  44. Principle of intelligibility
      • An ontology that is advocated for general use should be understandable to its intended users. Its features should be explained in clear, simple English, extended where necessary with technical terms.

  45. First Great Mystery
      • Of the 201 terms included in the ISO 15926 upper-level ontology, 88 are of the form ‘class of X’, for example:
        • class_of_composite_material
        • class_of_compound
        • class_of_dimension_for_shape
        • class_of_feature
        • class_of_feature_whole_part
        • class_of_functional_object
        • class_of_inanimate_physical_object
        • class_of_indirect_connection

  46. Definition of ‘class’
      • A <class> is a <thing> that is an understanding of the nature of things and that divides things into those which are members of the class and those which are not according to one or more criteria.
      • Example: ‘Centrifugal pump is a <class>’.

  47. What logic governs classes in ISO 15926?
      • Not, say, ZF set theory
      • but the theory of ‘non-well-founded sets’, devised for the special purposes of logical modeling of certain non-terminating computational processes;
      • allows sets to contain themselves, thereby generating infinitely descending chains of the form:
      • … A ∈ A ∈ A ∈ A ∈ A ∈ A ∈ A ∈ A ∈ A ∈ A
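      For reference, the contrast can be stated compactly. In ZF the axiom of foundation rules out self-membership, whereas non-well-founded set theories (for example Aczel's, which replaces foundation with an anti-foundation axiom) admit a set that is its own sole member. The notation below is the standard textbook formulation and is not taken from ISO 15926 itself.

```latex
% ZF axiom of foundation: every non-empty set has an \in-minimal element
\forall x \,\bigl( x \neq \emptyset \rightarrow \exists y \,( y \in x \wedge y \cap x = \emptyset ) \bigr)

% Consequence in ZF: no set is a member of itself
\forall x \,( x \notin x )

% Non-well-founded case: a set that is its own sole member,
% which yields an infinitely descending membership chain
A = \{A\} \quad\Longrightarrow\quad \cdots \in A \in A \in A
```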

  48. The principle of simple tools
      • An ontology is an artifact created to support exchange of information; it is not the place to try out the latest new bits of mathematics you learned about last week

  49. But worse
      • ISO 15926 complicates its theory of classes by allowing classes with both actual and possible members:
      • ‘Although there is only one <class> that has no members, there can be a <class> that has no members in the actual world, but which does have members in other possible worlds.’
      • No standard theory of modal logic is addressed by ISO 15926

  50. The principle of re-using available resources
      • If an ontology deals with what is dealt with perfectly well already in some recognized resource, then it should utilize this recognized resource.
