Semantics for Biodiversity

Semantics for Biodiversity. Barry Smith http://ontology.buffalo.edu/smith. A brief history of the Semantic Web. html demonstrated the power of the Web to allow sharing of information

Presentation Transcript


  1. Semantics for Biodiversity • Barry Smith • http://ontology.buffalo.edu/smith

  2. A brief history of the Semantic Web • HTML demonstrated the power of the Web to allow sharing of information • can we use semantic technology to create a Web 2.0 which would allow algorithmic reasoning with online information based on XML, RDF and above all OWL (Web Ontology Language)? • can we use RDF and OWL to break down silos, and create useful integration of on-line data and information?

  3. People tried, but the more they were successful, the more they failed • OWL breaks down data silos via controlled vocabularies for the description of data dictionaries • Unfortunately the very success of this approach led to the creation of multiple, new, semantic silos – because multiple ontologies are being created in ad hoc ways

  4. Ontology success stories, and some reasons for failure A fragment of the “Linked Open Data” in the biomedical domain

  5. What you get with ‘mappings’ all phenotypes (excess hair loss, duck feet)

  6. What you get with ‘mappings’ HPO: all phenotypes (excess hair loss, duck feet ...) NCIT: all organisms

  7. What you get with ‘mappings’ all phenotypes (excess hair loss, duck feet) all organisms allose (a form of sugar)

  8. What you get with ‘mappings’ all phenotypes (excess hair loss, duck feet) all organisms allose (a form of sugar) Acute Lymphoblastic Leukemia (A.L.L.)

  9. Mappings are hard • They are fragile, and expensive to maintain • Need a new authority to maintain, yielding new risk of forking • The goal should be to minimize the need for mappings • Invest resources in disjoint ontology modules which work well together – reduce need for mappings to minimum possible
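
The 'mappings' slides above suggest what goes wrong when terms from different vocabularies are linked by label alone. A minimal sketch of that failure follows, assuming a purely lexical matcher; the vocabulary names (`vocab_A` to `vocab_D`) and the exact labels are invented for illustration and are not taken from HPO, NCIT or any real release.

```python
import re

# Toy label table. Source names and label spellings are hypothetical;
# the comments give the intended meaning described on the slides.
TERMS = [
    ("vocab_A", "All"),     # root node: all phenotypes
    ("vocab_B", "All"),     # root node: all organisms
    ("vocab_C", "ALL"),     # abbreviation for allose, a sugar
    ("vocab_D", "A.L.L."),  # acute lymphoblastic leukemia
]


def normalize(label: str) -> str:
    """Lowercase the label and drop everything except letters and digits."""
    return re.sub(r"[^a-z0-9]", "", label.lower())


def lexical_mappings(terms):
    """Group sources whose normalized labels are identical."""
    groups = {}
    for source, label in terms:
        groups.setdefault(normalize(label), []).append(source)
    # Keep only labels shared by more than one source.
    return {key: sources for key, sources in groups.items() if len(sources) > 1}


mappings = lexical_mappings(TERMS)
print(mappings)
```

All four terms collapse into one group even though they denote four unrelated things, which is the kind of result that then needs manual, ongoing repair.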

  10. Why should you care? • you need to create systems for data mining and text processing which will yield useful digitally coded output • if the codes you use are constantly in need of ad hoc repair huge resources will be wasted • relevant data will not be found • serious reasoning will be defeated from the start

  11. How to do it right? • how to create an incremental, evolutionary process, where what is good survives, and what is bad fails • where the number of ontologies needing to be linked is small • where links are stable • create a scenario in which people will find it profitable to reuse ontologies, terminologies and coding systems which have been tried and tested

  12. Uses of ‘ontology’ in PubMed abstracts

  13. By far the most successful: GO (Gene Ontology)

  14. GO provides a controlled system of terms for use in annotating (describing, tagging) data • multi-species, multi-disciplinary, open source • contributing to the cumulativity of scientific results obtained by distinct research communities • compare use of kilograms, meters, seconds in formulating experimental results
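
One way to picture the cumulativity claim: if distinct communities tag their data with the same controlled term identifiers, their results can be pooled with a simple lookup. The sketch below assumes annotations stored as (source, item, term ID) records; every source name, item name and identifier (`EX:0001`, `EX:0002`) is a placeholder, not a real GO term or gene.

```python
from collections import defaultdict

# (data source, annotated item, term ID); all values are placeholders.
ANNOTATIONS = [
    ("mouse_lab", "mouse_gene_1", "EX:0001"),
    ("fly_lab", "fly_gene_7", "EX:0001"),
    ("fly_lab", "fly_gene_9", "EX:0002"),
    ("yeast_lab", "yeast_gene_3", "EX:0001"),
]


def index_by_term(annotations):
    """Pool annotations from all sources under their shared term ID."""
    index = defaultdict(list)
    for source, item, term_id in annotations:
        index[term_id].append((source, item))
    return dict(index)


by_term = index_by_term(ANNOTATIONS)
for source, item in by_term["EX:0001"]:
    print(source, item)
```

No translation step between the three sources is needed, because the term ID plays the role that a shared unit such as the kilogram plays for measurements.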

  15. Hierarchical view representing relations between represented types

  16. Organ Part Organ Subdivision Anatomical Space Anatomical Structure Organ Cavity Subdivision Organ Cavity Organ Organ Component Serous Sac Tissue Serous Sac Cavity Subdivision Serous Sac Cavity is_a Pleural Sac Pleura(Wall of Sac) Pleural Cavity part_of Parietal Pleura Visceral Pleura Interlobar recess Mediastinal Pleura Mesothelium of Pleura

  17. Reasons why GO has been successful • It is a system for prospective standardization built with coherent top level but with content contributed and monitored by domain specialists • Based on community consensus • Updated every night • Clear versioning principles ensure backwards compatibility; prior annotations do not lose their value • Initially low-tech to encourage users, with movement to more powerful formal approaches (including OWL-DL – though GO community still recommending caution)

  18. GO has learned the lessons of successful cooperation • Clear documentation • Fully open source (allows thorough testing in manifold combinations with other ontologies) • Subjected to considerable third-party critique • Rapid turnaround tracker and help desk • Usable also for education • The terms chosen are already familiar

  19. natural language labels to make the data cognitively accessible to human beings

  20. GO has been amazingly successful in overcoming the data balkanization problem but it covers only generic biological entities of three sorts: • cellular components • molecular functions • biological processes and it does not provide representations of diseases, symptoms, …

  21. Original OBO Foundry ontologies (Gene Ontology in yellow)

  22. environments are here Environment Ontology

  23. http://obofoundry.org

  24. http://obofoundry.org

  25. The OBO Foundry: a step-by-step, evidence-based approach to expand the GO • Developers commit to working to ensure that, for each domain, there is community convergence on a single ontology • and agree in advance to collaborate with developers of ontologies in adjacent domains. http://obofoundry.org

  26. OBO Foundry Principles • Common governance (coordinating editors) • Common training • Common architecture • simple shared top level ontology (Basic Formal Ontology) • shared Relation Ontology: www.obofoundry.org/ro

  27. Open Biomedical Ontologies Foundry Seeks to create high quality, validated terminology modules across all of the life sciences which will be • close to language use of experts • evidence-based • incorporate a strategy for motivating potential developers and users • revisable as science advances • modularity: one ontology for each domain

  28. Modularity • ensures • annotations can be additive • no need for mappings • division of labor amongst domain experts • high value of training in any given module • lessons learned in one module can benefit work on other modules • incentivization of those responsible for individual modules

  29. The Modular Approach • Create a small set of plug-and-play ontologies as stable monohierarchies with a high likelihood of being reused • Create ontologies incrementally • Reuse existing ontology resources • Use these ontologies incrementally in annotating heterogeneous data • Annotating = arms length approach; the data and data-models themselves remain as they are
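
The "stable monohierarchies" requirement can be checked mechanically. A minimal sketch, assuming an ontology's is_a links are available as (child, parent) pairs; the term names are invented and the function name `multi_parent_terms` is hypothetical, not part of any OBO tool.

```python
def multi_parent_terms(is_a_edges):
    """Return terms with more than one is_a parent, with their parents sorted."""
    parents = {}
    for child, parent in is_a_edges:
        parents.setdefault(child, set()).add(parent)
    return {child: sorted(ps) for child, ps in parents.items() if len(ps) > 1}


# Invented edge list for illustration only.
EDGES = [
    ("term_b", "term_a"),
    ("term_c", "term_a"),
    ("term_d", "term_b"),
    ("term_d", "term_c"),  # second parent: breaks the monohierarchy
]

violations = multi_parent_terms(EDGES)
print(violations)
```

An empty result means every term has at most one asserted parent, which is the property the slide asks for.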

  30. Logical standards can be only part of the solution OWL … bring benefits primarily on the side of syntax (language) What we need are standards on the semantics (content) side (via top-level ontologies), including standards for • top-level ontologies • common relations (part_of …) • relation of lower-level ontologies to each other and to the higher levels

  31. 120+ ontology projects using BFO http://www.ifomis.org/bfo/ • Open Biomedical Ontologies Foundry • Ontology for General Medical Science • eagle-I, VIVO, CTSAconnect • AstraZeneca • Elsevier

  32. How a common upper level ontology can help resist ontology chaos • something to teach • training (expertise) is portable • each new ontology you confront will be more easily understood at the level of content • and more easily criticized, error-checked • provides starting-point for domain-ontology development • provides platform for tool-building and innovations • lessons learned in building and using one ontology can potentially benefit other ontologies • promote shareability of data across disciplinary and other boundaries

  33. Basic Formal Ontology (BFO) top level mid-level domain level OBO Foundry Modular Organization

  34. BFO A simple top-level ontology to support information integration in scientific research No overlap with domain ontologies (organism, person, society, information, …) Based on realism No abstracta Tested in many natural science domains

  35. Basic Formal Ontology Continuant Occurrent process, event Independent Continuant entity Dependent Continuant property property depends on bearer

  36. depends_on Continuant Occurrent process, event Independent Continuant thing Dependent Continuant property event depends on participant

  37. Basic Formal Ontology continuant occurrent biological processes independent continuant cellular component dependent continuant molecular function

  38. roles, qualities Continuant Occurrent process, event Independent Continuant Dependent Continuant Quality Disposition
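
The top-level categories shown on the preceding slides can be written down as a small parent table. This is only a sketch of the fragment visible in the transcript (including the placement of the three GO branches from slide 37), not the full BFO; the lowercase spellings and the helper names `ancestors` and `is_a` are assumptions made for the example.

```python
# child -> parent; None marks the two top-level categories on the slides.
PARENT = {
    "continuant": None,
    "occurrent": None,
    "independent continuant": "continuant",
    "dependent continuant": "continuant",
    "quality": "dependent continuant",
    "disposition": "dependent continuant",
    # The three GO branches, placed as on slide 37.
    "cellular component": "independent continuant",
    "molecular function": "dependent continuant",
    "biological process": "occurrent",
}


def ancestors(term):
    """Walk up the parent table and return the chain of ancestors, nearest first."""
    chain = []
    parent = PARENT[term]
    while parent is not None:
        chain.append(parent)
        parent = PARENT[parent]
    return chain


def is_a(term, candidate):
    """True if `term` is `candidate` or falls under it in the table."""
    return term == candidate or candidate in ancestors(term)


print(ancestors("molecular function"))
print(is_a("biological process", "continuant"))
```

Because every domain term hangs from one of these few categories, a question such as "is this a continuant or an occurrent?" has the same answer in every ontology built on the shared top level.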

  39. instance_of types Continuant Occurrent process, event Independent Continuant thing Dependent Continuant property .... ..... ....... instances

  40. RELATION TO TIME GRANULARITY rationale of OBO Foundry coverage

  41. Example: The Cell Ontology

  42. Example: Ontology for General Medical Science

  43. http://code.google.com/p/ogms/

  44. coronary heart disease in nature, no sharp boundaries here CHD in phase of early lesions and small fibrous plaques CHD in phase of asymptomatic (‘silent’) infarction CHD in phase of surface disruption of plaque unstable angina stable angina instantiates at t1 instantiates at t2 instantiates at t3 instantiates at t4 instantiates at t5 John’s coronary heart disease

  45. human in nature, no sharp boundaries here embryo fetus neonate infant child adult instantiates at t1 instantiates at t2 instantiates at t3 instantiates at t4 instantiates at t5 instantiates at t6 John
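
The "instantiates at t1 ... t6" pattern on the two slides above is time-indexed instantiation: one particular (John) instantiates different types at different times. A minimal sketch follows, assuming the times t1 to t6 are encoded as the integers 1 to 6 and that each stage holds until the next begins. The sharp cut-offs are a simplification of the code; the slide itself stresses that nature has no sharp boundaries here.

```python
from bisect import bisect_right

# (start time, type instantiated from that time on); integer times are an assumption.
JOHN = [
    (1, "embryo"),
    (2, "fetus"),
    (3, "neonate"),
    (4, "infant"),
    (5, "child"),
    (6, "adult"),
]


def instantiates_at(history, t):
    """Return the type instantiated at time t, or None before the first record."""
    starts = [start for start, _ in history]
    i = bisect_right(starts, t)
    if i == 0:
        return None
    return history[i - 1][1]


print(instantiates_at(JOHN, 3))
```

The same lookup would apply to the phases of John's coronary heart disease on the previous slide, with the disease rather than the person as the particular.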

  46. A disease is a disposition produces bears realized_in etiological process disorder disposition pathological process produces diagnosis interpretive process signs & symptoms abnormal bodily features produces used_in recognized_as

  47. Cirrhosis - environmental exposure • Symptoms & Signs • used_in • Interpretive process • produces • Hypothesis - rule out cirrhosis • suggests • Laboratory tests • produces • Test results - elevated liver enzymes in serum • used_in • Interpretive process • produces • Result - diagnosis that patient X has a disorder that bears the disease cirrhosis • Etiological process - phenobarbital-induced hepatic cell death • produces • Disorder - necrotic liver • bears • Disposition (disease) - cirrhosis • realized_in • Pathological process - abnormal tissue repair with cell proliferation and fibrosis that exceed a certain threshold; hypoxia-induced cell death • produces • Abnormal bodily features • recognized_as • Symptoms - fatigue, anorexia • Signs - jaundice, splenomegaly
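
The chain of relations on the two slides above can be held as subject-relation-object triples and followed from cause to diagnosis. This is a sketch of the general schema only, with the ordering read off the cirrhosis example; the function name `trace` is hypothetical, and each node is assumed to have a single outgoing relation.

```python
# (subject, relation, object) triples for the general disease schema.
TRIPLES = [
    ("etiological process", "produces", "disorder"),
    ("disorder", "bears", "disposition"),
    ("disposition", "realized_in", "pathological process"),
    ("pathological process", "produces", "abnormal bodily features"),
    ("abnormal bodily features", "recognized_as", "signs & symptoms"),
    ("signs & symptoms", "used_in", "interpretive process"),
    ("interpretive process", "produces", "diagnosis"),
]


def trace(triples, start):
    """Follow the single outgoing relation from each node, starting at `start`."""
    outgoing = {subject: (relation, obj) for subject, relation, obj in triples}
    path = [start]
    seen = {start}
    node = start
    while node in outgoing:
        relation, obj = outgoing[node]
        path.extend([relation, obj])
        if obj in seen:  # guard against cycles in malformed input
            break
        seen.add(obj)
        node = obj
    return path


print(" -> ".join(trace(TRIPLES, "etiological process")))
```

Substituting the cirrhosis values from the slide (for example "necrotic liver" for the disorder) gives an instance-level version of the same chain.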

  48. Dispositions and Predispositions Some dispositions are predispositions to other dispositions.

  49. HNPCC - genetic pre-disposition • Etiological process - inheritance of a mutant mismatch repair gene • produces • Disorder - chromosome 3 with abnormal hMLH1 • bears • Disposition (disease) - Lynch syndrome • realized_in • Pathological process - abnormal repair of DNA mismatches • produces • Disorder - mutations in proto-oncogenes and tumor suppressor genes with microsatellite repeats (e.g. TGF-beta R2) • bears • Disposition (disease) - non-polyposis colon cancer • realized_in • Symptoms (including pain)

  50. Ontology modules extending OGMS • Sleep Domain Ontology (SDO) • Ontology of Medically Relevant Social Entities (OMRSE) • Vital Sign Ontology (VSO) • Mental Disease Ontology (MD) • Neurological Disease Ontology (ND) • Infectious Disease Ontology (IDO)