
How to Build an Ontology

How to Build an Ontology. Barry Smith, http://ontology.buffalo.edu/smith. Everywhere databases are being created, too often in such a way that the data is siloed, leading to massive expense in integrating data in ad hoc ways.



Presentation Transcript


  1. How to Build an Ontology • Barry Smith • http://ontology.buffalo.edu/smith

  2. Everywhere databases are being created • too often in such a way that the data is siloed • leading to massive expense in integrating data in ad hoc ways • if the data could be collected on the basis of shared controlled vocabularies from the start, much of this massive expense could be avoided

  3. Uses of ‘ontology’ in PubMed abstracts

  4. By far the most successful: GO (Gene Ontology)

  5. Consequences of the Human Genome Project • we can match gene sequences very effectively, for example finding patterns shared between humans and mice • but we can make sense of these gene sequences only if we know • where in the cell they occur • with what molecular functions they are associated • to what biological processes they contribute

  6. GO provides a controlled system of terms for use in annotating (describing, tagging) data • multi-species, multi-disciplinary, open source • contributing to the cumulativity of scientific results obtained by distinct research communities • compare use of kilograms, meters, seconds in formulating experimental results
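As a rough illustration of what annotating with a controlled system of terms might look like in practice, the Python sketch below accepts an annotation only when its term belongs to a fixed vocabulary. The two term labels are taken from later slides; the `EX:` identifiers, the record names, and the `annotate` helper are hypothetical and are not real GO identifiers or any GO tool's API.

```python
# Hypothetical controlled vocabulary. The labels appear on later slides;
# the EX: identifiers are invented for illustration, not real GO IDs.
VOCABULARY = {
    "EX:0001": "sphingolipid transporter activity",
    "EX:0002": "Holliday junction helicase complex",
}


def annotate(record_id, term_id):
    """Return an annotation, rejecting any term outside the vocabulary."""
    if term_id not in VOCABULARY:
        raise ValueError(f"unknown term: {term_id!r}")
    return {"record": record_id, "term": term_id, "label": VOCABULARY[term_id]}


if __name__ == "__main__":
    print(annotate("protein-42", "EX:0001"))
    try:
        # Free text is refused, which is what keeps annotations comparable.
        annotate("protein-43", "sphingolipid transport")
    except ValueError as err:
        print(err)
```

Refusing free text at entry time is the design choice that matters here: two communities that both pass this check are guaranteed to be using the same terms.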

  7. Hierarchical view representing relations between represented types

  8. [Diagram: hierarchy of anatomical types linked by is_a and part_of relations. Node labels: Anatomical Structure, Anatomical Space, Organ, Organ Part, Organ Subdivision, Organ Component, Organ Cavity, Organ Cavity Subdivision, Tissue, Serous Sac, Serous Sac Cavity, Serous Sac Cavity Subdivision, Pleural Sac, Pleura (Wall of Sac), Pleural Cavity, Parietal Pleura, Visceral Pleura, Mediastinal Pleura, Interlobar recess, Mesothelium of Pleura]

  9. US $100 million invested in literature and data curation using GO • over 11 million annotations relating gene products described in the UniProt, Ensembl and other databases to terms in the GO • experimental results reported in 52,000 scientific journal articles manually annotated by expert biologists using GO

  10. GO has learned the lessons of successful cooperation • Clear documentation • The terms chosen are already familiar • Fully open source (allows thorough testing in manifold combinations with other ontologies) • Subjected to constant third-party critique • Updated every night

  11. ontologies used to annotate databases • databases: GlyProt, MouseEcotope, DiabetInGene, GluChem • example term: sphingolipid transporter activity

  12. annotation using common ontologies yields integration of databases • databases: GlyProt, MouseEcotope, DiabetInGene, GluChem • example term: Holliday junction helicase complex
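One way to picture how shared annotation yields integration is the Python sketch below: records from separate databases are grouped under the term they share, with no pairwise mapping between the databases. The database names come from the slide; the record ids, the `EX:` term identifiers, and the `integrate` helper are hypothetical.

```python
from collections import defaultdict

# Hypothetical records. Database names come from the slide; record ids and
# the EX: term identifiers are invented for illustration.
GLYPROT = [{"id": "gp-1", "term": "EX:0002"}, {"id": "gp-2", "term": "EX:0001"}]
MOUSE_ECOTOPE = [{"id": "me-7", "term": "EX:0002"}]


def integrate(**databases):
    """Group (database, record id) pairs under the term they are annotated with."""
    by_term = defaultdict(list)
    for db_name, records in databases.items():
        for record in records:
            by_term[record["term"]].append((db_name, record["id"]))
    return dict(by_term)


if __name__ == "__main__":
    merged = integrate(GlyProt=GLYPROT, MouseEcotope=MOUSE_ECOTOPE)
    # Records from both databases meet under the shared term.
    print(merged["EX:0002"])
```

The join key is the ontology term itself, so adding a third or fourth database costs one more argument rather than a new ad hoc mapping.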

  13. annotation using common ontologies can yield integration of image data

  14. annotation using common ontologies can support comparison of image data

  15. annotation with Gene Ontology • supports reusability of data • supports search of data by humans • supports reasoning with data by humans and machines • but the method works only to the degree that many, many people use the GO to annotate their data

  16. GO has been amazingly successful in overcoming the data balkanization problem • but it covers only generic biological entities of three sorts: • cellular components • molecular functions • biological processes • and it does not provide representations of diseases, symptoms, …

  17. Original OBO Foundry ontologies (Gene Ontology in yellow)

  18. environments are here Environment Ontology

  19. order

  20. Ontology success stories, and some reasons for failure chaos

  21. http://obofoundry.org

  22. The OBO Foundry: a step-by-step, evidence-based approach to expand the GO • Developers commit to working to ensure that, for each domain, there is community convergence on a single ontology • and agree in advance to collaborate with developers of ontologies in adjacent domains • http://obofoundry.org

  23. OBO Foundry Principles • Common governance (coordinating editors) • Common training • Common architecture • simple shared top level ontology • shared Relation Ontology: www.obofoundry.org/ro

  24. [Diagram: hierarchy of anatomical types linked by is_a and part_of relations. Node labels: Anatomical Structure, Anatomical Space, Organ, Organ Part, Organ Subdivision, Organ Component, Organ Cavity, Organ Cavity Subdivision, Tissue, Serous Sac, Serous Sac Cavity, Serous Sac Cavity Subdivision, Pleural Sac, Pleura (Wall of Sac), Pleural Cavity, Parietal Pleura, Visceral Pleura, Mediastinal Pleura, Interlobar recess, Mesothelium of Pleura]

  25. Open Biomedical Ontologies Foundry • Seeks to create high-quality, validated terminology modules across all of the life sciences, which will • provide one ontology for each domain, so there is no need for mappings • stay close to the language use of experts • be evidence-based • incorporate a strategy for motivating potential developers and users • be revisable as science advances

  26. Benefits of coordination • Can profit from lessons learned through mistakes made by others • Can more easily reuse what is made by others • Can more easily inspect and criticize results of others’ work • Can more easily train people to do the necessary work

  27. BFO Top-Level Ontology • Continuant • Independent Continuant • Dependent Continuant • Occurrent (always dependent on one or more independent continuants)

  28. OBO Foundry coverage [Table; axes: relation to time, granularity]

  29. List of BFO users http://www.ifomis.org/bfo/users

  30. BFO Users

  31. How to build an ontology • import BFO into Protégé • work with domain experts to create an initial mid-level classification • find the ~50 most commonly used terms corresponding to types in reality • arrange these terms into an informal is_a hierarchy according to the principle: A is_a B means that every instance of A is an instance of B • fill in missing terms to give a complete hierarchy • (leave it to domain experts to populate the lower levels of the hierarchy)
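The hierarchy-building steps on this slide can be sketched in a few lines of Python. This is a minimal illustration, not a substitute for Protégé or BFO: the term names are borrowed from a later slide in this deck, while the exact parent links and the helper names (`ancestors`, `is_a`, `roots`) are assumptions made for the example.

```python
# Child -> parent links for an informal is_a hierarchy. Term names are
# borrowed from a later slide; this arrangement of them is an assumption.
IS_A = {
    "organism": "object",
    "animal": "organism",
    "mammal": "animal",
    "frog": "animal",
    "cat": "mammal",
    "siamese": "cat",
}


def ancestors(term):
    """Walk parent links upward and return every more general term."""
    chain = []
    while term in IS_A:
        term = IS_A[term]
        chain.append(term)
    return chain


def is_a(child, parent):
    """True when every instance of `child` is also an instance of `parent`."""
    return child == parent or parent in ancestors(child)


def roots(terms):
    """Terms without a parent; more than one root hints at missing terms."""
    return {t for t in terms if t not in IS_A}


if __name__ == "__main__":
    print(is_a("siamese", "animal"))
    print(is_a("frog", "mammal"))
    print(roots(set(IS_A) | set(IS_A.values())))
```

Because is_a is read as "every instance of A is an instance of B", it is transitive, which is why the check walks the whole chain of parents. The `roots` helper is one simple way to spot the gaps that the "fill in missing terms" step is meant to close.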

  32. Example: The Cell Ontology

  33. Basic distinction among entities • type vs. instance • (science text vs. diary) • (human being vs. Tom Cruise) • (science diagram vs. photograph)

  34. Terms in ontologies denote types (‘universals’) • it is generalizations that are important = types, kinds, species

  35. Catalog vs. inventory

  36. types vs. instances

  37. names of instances

  38. names of types

  39. An ontology is a representation of types • We learn about types in reality from looking at the results of scientific experiments in the form of scientific theories • experiments relate to what is particular; science describes what is general

  40. [Diagram: a hierarchy of types (object, organism, animal, mammal, cat, siamese, frog) contrasted with their instances]

  41. Ontologies are here

  42. or here

  43. Ontologies represent general structures in reality (leg)

  44. Ontologies do not represent concepts in people’s heads

  45. They represent types in reality

  46. Inventory vs. Catalog: Two kinds of representational artifact • Databases represent instances • Ontologies represent types

  47. How do we know which general terms designate types? • Types are repeatables: • cell, electron, weapon, F16, citizen, refugee, ... • Instances are one-off: Bill Clinton, this laptop
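The catalog/inventory split can be made concrete with a small Python sketch: the ontology holds only names of types, and each database row is a one-off instance that points at one of those types. The type and instance names come from the slides; pairing Bill Clinton with citizen, and the `Instance` and `instances_of` names, are assumptions made for illustration.

```python
from dataclasses import dataclass

# The ontology (catalog): names of repeatable types only.
TYPES = {"citizen", "cell", "electron"}


@dataclass(frozen=True)
class Instance:
    """One database row (inventory): a particular and the type it instantiates."""
    name: str
    instance_of: str

    def __post_init__(self):
        # An instance may only be typed with a term the ontology provides.
        if self.instance_of not in TYPES:
            raise ValueError(f"{self.instance_of!r} is not a type in the ontology")


# The database (inventory): one-off particulars. The pairing is assumed.
DATABASE = [Instance("Bill Clinton", "citizen")]


def instances_of(type_name):
    """List the names of all recorded particulars of the given type."""
    return [row.name for row in DATABASE if row.instance_of == type_name]


if __name__ == "__main__":
    print(instances_of("citizen"))
    print(instances_of("cell"))
```

Note that a type can be present in the catalog with no rows in the inventory; the ontology says what kinds of things there are, not which particular ones have been recorded.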

  48. BFO Top-Level Ontology • Continuant • Independent Continuant • Dependent Continuant • Occurrent (always dependent on one or more independent continuants)

  49. Two kinds of entities • occurrents (processes, events, happenings) • continuants (objects, qualities, states...)

  50. You are a continuant • Your life is an occurrent • You are 3-dimensional • Your life is 4-dimensional
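The continuant/occurrent distinction can be sketched as two Python data types. This is only an illustration of the idea on these slides, not BFO's formal definition: the class and field names are hypothetical, and the years are made-up sample values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Continuant:
    """Wholly present at each moment it exists (for example, a person)."""
    name: str


@dataclass(frozen=True)
class Occurrent:
    """Unfolds over an interval and depends on at least one continuant."""
    name: str
    start_year: int
    end_year: int
    participants: tuple

    def __post_init__(self):
        # Occurrents are always dependent on one or more continuants.
        if not self.participants:
            raise ValueError("an occurrent needs at least one continuant")
        if self.end_year < self.start_year:
            raise ValueError("end_year precedes start_year")

    def duration(self):
        """Temporal extent: the dimension a continuant does not have."""
        return self.end_year - self.start_year


if __name__ == "__main__":
    you = Continuant("you")
    # Sample years are invented for illustration.
    your_life = Occurrent("your life", 1980, 2060, (you,))
    print(your_life.duration())
```

Only the occurrent carries a start and an end, which mirrors the slide's contrast between a 3-dimensional you and a 4-dimensional life.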
