A biological ontology is:

Presentation Transcript


  1. A biological ontology is: a machine-interpretable representation of some aspect of biological reality • What kinds of things exist? • What are the relationships between these things? (Figure: eye is_a sense organ; eye develops_from eye disc; ommatidium part_of eye)
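The slide's example can be held in a program as typed edges. Below is a minimal Python sketch, assuming a plain list of (subject, relation, object) triples; `EDGES` and `related` are hypothetical names, not part of any ontology library.

```python
# Each edge is (subject, relation, object); relation names are those on the slide.
EDGES = [
    ("eye", "is_a", "sense organ"),
    ("eye", "develops_from", "eye disc"),
    ("ommatidium", "part_of", "eye"),
]


def related(subject: str, relation: str) -> list[str]:
    """Return every object linked to `subject` by `relation`."""
    return [obj for subj, rel, obj in EDGES if subj == subject and rel == relation]


print(related("eye", "is_a"))            # ['sense organ']
print(related("ommatidium", "part_of"))  # ['eye']
```

Keeping the relation on every edge is what lets a machine separate "what kinds of things exist" (the nodes) from "how they are related" (the typed edges).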

  2. Three practical considerations • Acknowledge the substantive sociological and operational issues involved in building a community • Follow the principles of ontology construction hygiene • Annotation is the rate-limiting step; it can be facilitated by improved ontologies

  3. Meeting the goal: drawing inferences (Figure: gene products A, B, C and D in human, Xenopus and Drosophila, linked to direct evidence (PMID:5555, PMID:4444) and to indirect evidence (further PubMed and SP identifiers); a question mark marks an annotation that must be inferred)

  4. Overview • Following basic rules helps make better ontologies • We will work through the principles-based treatment of relations in ontologies, to show how ontologies can become more reliable and more powerful

  5. Why do we need rules for good ontology? • Ontologies must be intelligible both to humans (for annotation) and to machines (for reasoning and error-checking) • Unintuitive rules for classification lead to entry errors (problematic links) • Facilitate training of curators • Overcome obstacles to alignment with other ontology and terminology systems • Enhance harvesting of content through automatic reasoning systems

  6. The Challenge of Univocity: people call the same thing by different names (Figure: taction, tactition, tactile sense: which term?)

  7. Univocity: GO uses one term and many characterized synonyms (Figure: taction, tactition and tactile sense all map to perception of touch; GO:0050975)
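One way to see why a single term with many synonyms helps: every synonym can resolve to the same identifier. A minimal Python sketch, assuming a hand-built dictionary (the names `TERMS` and `build_index` are hypothetical; the ID and labels are from the slide):

```python
# One term per ID with its characterized synonyms (labels as on the slide).
TERMS = {
    "GO:0050975": {
        "name": "perception of touch",
        "synonyms": ["taction", "tactition", "tactile sense"],
    },
}


def build_index(terms: dict) -> dict[str, str]:
    """Map each lower-cased name or synonym to the single term ID it denotes."""
    index: dict[str, str] = {}
    for term_id, term in terms.items():
        for label in [term["name"], *term["synonyms"]]:
            index[label.lower()] = term_id
    return index


INDEX = build_index(TERMS)
print(INDEX["tactile sense"])  # GO:0050975
```

Whatever word an annotator or user types, the lookup lands on one ID, so annotations made under different names are not scattered across several terms.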

  8. The Challenge of Univocity: people use the same words to describe different things (Figure: three different images, each labelled bud initiation)

  9. Bud initiation? How is a computer to know?

  10. Univocity: GO adds “sensu” descriptors to discriminate among organisms (Figure: bud initiation sensu dentists; bud initiation sensu yeasties; bud initiation sensu agronomists)
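The effect of a "sensu" descriptor can be sketched as turning one ambiguous string into several distinct names. This is a hypothetical Python illustration (the entries and the helper `qualified_name` are made up to mirror the slide), not how GO stores its terms.

```python
from collections import defaultdict

# Hypothetical entries: one bare label, three meanings, told apart by "sensu".
SENSU_TERMS = [
    ("bud initiation", "dentists"),
    ("bud initiation", "yeasties"),
    ("bud initiation", "agronomists"),
]


def qualified_name(label: str, sensu: str) -> str:
    """Build a unique term name such as 'bud initiation sensu yeasties'."""
    return f"{label} sensu {sensu}"


# Group qualified names by their bare label to expose the ambiguity.
by_label: dict[str, list[str]] = defaultdict(list)
for label, sensu in SENSU_TERMS:
    by_label[label].append(qualified_name(label, sensu))

print(len(by_label["bud initiation"]))  # 3 candidate meanings for one string
print(by_label["bud initiation"][1])    # bud initiation sensu yeasties
```

A computer given only "bud initiation" has three candidates; given the qualified name, it has exactly one.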

  11. The importance of synonyms for utility: how do we represent the function of tRNA? • Biologically, what does a tRNA do? • It identifies the codon and inserts the amino acid into the growing polypeptide. Term (under Molecular_function): triplet codon-amino acid adaptor activity. GO definition: Mediates the insertion of an amino acid at the correct point in the sequence of a nascent polypeptide chain during protein synthesis. Synonym: tRNA

  12. The Challenge of Positivity: some organelles are membrane-bound. A centrosome is not a membrane-bound organelle, but it may still be considered an organelle.

  13. The Challenge of Positivity: sometimes absence is a distinction in a biologist’s mind (Figure: non-membrane-bound organelle, GO:0043228; membrane-bound organelle, GO:0043227)

  14. Positivity • Note the logical difference between “non-membrane-bound organelle” and “not a membrane-bound organelle” • The latter includes everything that is not a membrane-bound organelle, including things that are not organelles at all!
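The difference can be made concrete with sets: a positive class restricts organelles, while a negated class is a complement over everything. A minimal Python sketch over a toy universe; the members and their grouping are illustrative assumptions, not GO's actual membership.

```python
# A toy universe; membership is illustrative only.
UNIVERSE = {"mitochondrion", "nucleus", "centrosome", "ribosome", "tRNA", "kidney"}
ORGANELLES = {"mitochondrion", "nucleus", "centrosome", "ribosome"}
MEMBRANE_BOUND = {"mitochondrion", "nucleus"}

# "non-membrane-bound organelle": an organelle that lacks a membrane.
non_membrane_bound_organelle = ORGANELLES - MEMBRANE_BOUND

# "not a membrane-bound organelle": the complement over everything.
not_membrane_bound_organelle = UNIVERSE - (ORGANELLES & MEMBRANE_BOUND)

print(sorted(non_membrane_bound_organelle))  # ['centrosome', 'ribosome']
print(sorted(not_membrane_bound_organelle))  # ['centrosome', 'kidney', 'ribosome', 'tRNA']
```

The kidney and the tRNA land in the negated class even though neither is an organelle, which is exactly why GO names the positive class.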

  15. The Challenge of Objectivity: database users want to know when we don’t know anything (exhaustiveness with respect to knowledge) • We don’t know anything about the ligand that binds this type of GPCR • We don’t know anything about a gene product with respect to these

  16. Objectivity • How can we use GO to annotate gene products when we know that we don’t have any information about them? • Currently GO has terms in each ontology to describe “unknown” • An alternative might be to annotate genes to root nodes and use an evidence code to indicate that we have no data • Similar strategies could be used for things like receptors whose ligand is unknown
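The alternative on this slide can be sketched as an annotation record whose "unknown" meaning comes from the evidence code rather than from a term. GO's ND ("no biological data available") evidence code is used this way with the three root terms; the Python record type and helper names below are hypothetical.

```python
from dataclasses import dataclass

# The root terms of GO's three ontologies.
ROOTS = {
    "biological_process": "GO:0008150",
    "molecular_function": "GO:0003674",
    "cellular_component": "GO:0005575",
}


@dataclass(frozen=True)
class Annotation:
    gene_product: str
    term_id: str
    evidence: str  # evidence code; "ND" = no biological data available


def annotate_unknown(gene_product: str) -> list[Annotation]:
    """Record 'curated, but no data found' against each root node."""
    return [Annotation(gene_product, root, "ND") for root in ROOTS.values()]


def is_unknown(ann: Annotation) -> bool:
    """An annotation means 'unknown' only when it is to a root node with ND."""
    return ann.term_id in ROOTS.values() and ann.evidence == "ND"


anns = annotate_unknown("geneX")  # "geneX" is a hypothetical identifier
print(all(is_unknown(a) for a in anns))  # True
```

With this design no class in the ontology carries a negative meaning; "we looked and found nothing" lives entirely in the annotation.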

  17. Single Inheritance • GO has a lot of is_a diamonds • Some are due to incompleteness of the graph • Some are due to a mixture of dissimilar classes within the graph at the same level

  18. Is_a diamond in GO Process (Figure: larval locomotory behavior is_a locomotory behavior and is_a larval behavior; both are is_a behavior)

  19. Is_a diamond in GO Function (Figure: GTPase activator activity is_a enzyme activator activity and is_a GTPase regulator activity; both are is_a enzyme regulator activity)

  20. Is_a diamond in GO Cellular Component (Figure: non-membrane-bound intracellular organelle is_a non-membrane-bound organelle and is_a intracellular organelle; both are is_a organelle)
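Diamonds like these can be found mechanically: look for a term with two is_a parents that share an ancestor. A minimal Python sketch over the three examples above, assuming a child-to-parents dictionary; the function names are hypothetical.

```python
from itertools import combinations

# is_a edges from the three slides: child -> set of direct parents.
IS_A = {
    "locomotory behavior": {"behavior"},
    "larval behavior": {"behavior"},
    "larval locomotory behavior": {"locomotory behavior", "larval behavior"},
    "enzyme activator activity": {"enzyme regulator activity"},
    "GTPase regulator activity": {"enzyme regulator activity"},
    "GTPase activator activity": {"enzyme activator activity", "GTPase regulator activity"},
    "intracellular organelle": {"organelle"},
    "non-membrane-bound organelle": {"organelle"},
    "non-membrane-bound intracellular organelle": {
        "intracellular organelle", "non-membrane-bound organelle"},
}


def ancestors(graph: dict[str, set[str]], term: str) -> set[str]:
    """All terms reachable from `term` by following is_a upward."""
    seen: set[str] = set()
    stack = list(graph.get(term, ()))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(graph.get(parent, ()))
    return seen


def diamonds(graph: dict[str, set[str]]) -> list[tuple[str, str, str]]:
    """Return (child, parent1, parent2) wherever two parents share an ancestor."""
    found = []
    for child, parents in graph.items():
        for p1, p2 in combinations(sorted(parents), 2):
            if ancestors(graph, p1) & ancestors(graph, p2):
                found.append((child, p1, p2))
    return found


for child, p1, p2 in diamonds(IS_A):
    print(f"{child}: {p1} + {p2}")
```

A check like this only flags candidates; deciding whether a diamond reflects an incomplete graph or a mixture of dissimilar classes remains a curator's call.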

  21. Technically the diamonds are correct, but they could be eliminated. What do these pairs have in common? • locomotory behavior / larval behavior • GTPase regulator activity / enzyme activator activity • non-membrane-bound organelle / intracellular organelle

  22. They are all differentiated from the parent term by a different factor • locomotory behavior / larval behavior: type of behavior vs. what is behaving • GTPase regulator activity / enzyme activator activity: what is regulated vs. type of regulator • non-membrane-bound organelle / intracellular organelle: type of organelle vs. location of organelle

  23. Insert an intermediate grouping term (Figure: behavior, with the grouping terms behavior of a thing and descriptive behavior inserted above larval behavior and locomotory behavior, which remain the parents of larval locomotory behavior)

  24. Why insert terms that no one would use? (Figure: behavior, with children locomotory behavior, larval behavior, rhythmic behavior and adult behavior) By the structure of this graph, locomotory behavior has the same relationship to larval behavior as it has to rhythmic behavior

  25. Why insert terms that no one would use? (Figure: behavior, with the intermediate terms descriptive behavior and behavior of a thing grouping locomotory/rhythmic behavior and larval/adult behavior, respectively) This type of single-step differentiation of terms between levels would allow us to use distances between nodes and levels to compare similarity. And in fact, locomotory behavior/rhythmic behavior and larval behavior/adult behavior group naturally
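The point about distances can be checked with a small computation. This Python sketch uses shortest-path length as the distance; the presentation does not name a metric, so that choice is an assumption, as is the assignment of leaves to the two grouping terms.

```python
from collections import deque


def distance(edges: list[tuple[str, str]], a: str, b: str) -> int:
    """Shortest path length between two terms, ignoring edge direction (BFS)."""
    adjacency: dict[str, set[str]] = {}
    for child, parent in edges:
        adjacency.setdefault(child, set()).add(parent)
        adjacency.setdefault(parent, set()).add(child)
    queue, seen = deque([(a, 0)]), {a}
    while queue:
        node, d = queue.popleft()
        if node == b:
            return d
        for neighbour in adjacency.get(node, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, d + 1))
    raise ValueError(f"{a!r} and {b!r} are not connected")


LEAVES = ("locomotory behavior", "rhythmic behavior", "larval behavior", "adult behavior")

# Flat graph: every leaf hangs directly off "behavior".
FLAT = [(leaf, "behavior") for leaf in LEAVES]

# Grouped graph: intermediate grouping terms (leaf assignment is assumed).
GROUPED = [
    ("descriptive behavior", "behavior"),
    ("behavior of a thing", "behavior"),
    ("locomotory behavior", "descriptive behavior"),
    ("rhythmic behavior", "descriptive behavior"),
    ("larval behavior", "behavior of a thing"),
    ("adult behavior", "behavior of a thing"),
]

for name, graph in (("flat", FLAT), ("grouped", GROUPED)):
    print(name,
          distance(graph, "locomotory behavior", "rhythmic behavior"),
          distance(graph, "locomotory behavior", "larval behavior"))
# flat 2 2
# grouped 2 4
```

In the flat graph the two pairs are indistinguishable; once the grouping terms exist, the naturally related pair is measurably closer.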

  26. GO Definitions • A definition written by a biologist: necessary and sufficient conditions; a written definition (not computable) • Graph structure: necessary conditions; formal (computable)

  27. Relationships and definitions • The set of necessary conditions is determined by the graph • This can be considered a partial definition • Important considerations: • Placement in the graph: selecting parents • Appropriate relationships to different parents • True path violations
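The necessary conditions carried by the graph are what make the true path rule work: whatever is true of a term must hold along every path up to the root, so an annotation to a term implies annotation to all of its ancestors. A true path violation is a parent link for which that inference would be false. A minimal Python sketch of the propagation, using a hypothetical toy graph and gene identifier:

```python
# Hypothetical toy graph: child -> direct is_a parents.
IS_A = {
    "locomotory behavior": {"behavior"},
    "larval behavior": {"behavior"},
    "larval locomotory behavior": {"locomotory behavior", "larval behavior"},
}


def ancestors(graph: dict[str, set[str]], term: str) -> set[str]:
    """All terms reachable from `term` by following is_a upward."""
    seen: set[str] = set()
    stack = list(graph.get(term, ()))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(graph.get(parent, ()))
    return seen


def propagate(graph: dict[str, set[str]],
              annotations: dict[str, set[str]]) -> dict[str, set[str]]:
    """Apply the true path rule: annotation to a term implies all its ancestors."""
    return {
        gene: set(terms).union(*(ancestors(graph, t) for t in terms))
        for gene, terms in annotations.items()
    }


direct = {"geneX": {"larval locomotory behavior"}}  # "geneX" is hypothetical
implied = propagate(IS_A, direct)
print(sorted(implied["geneX"]))
# ['behavior', 'larval behavior', 'larval locomotory behavior', 'locomotory behavior']
```

This is why parent placement matters: every parent you attach becomes a claim about every gene product annotated below it.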

  28. The importance of relationships • Cyclin-dependent protein kinase • The complex has a catalytic and a regulatory subunit • How do we represent these activities (functions) in the ontology? • Do we need a new relationship type (regulates)? (Figure: under Molecular_function, catalytic activity > protein kinase activity > protein Ser/Thr kinase activity > cyclin-dependent protein kinase activity, and enzyme regulator activity > protein kinase regulator activity > cyclin-dependent protein kinase regulator activity)

  29. Relationships provide the computable definitions • The definition of a class lower down in the hierarchy is provided by specifying the parent of the class together with the relevant differentiae. • The differentiae tell us what marks out instances of the defined class within the wider parent class.

  30. Structured definitions contain both genus and differentiae: Essence = Genus + Differentiae. Example, neuron cell differentiation: • Genus: differentiation (processes whereby a relatively unspecialized cell acquires the specialized features of ...) • Differentiae: acquires features of a neuron
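A genus-plus-differentiae definition is easy to hold as data and render two ways: as readable text and as OBO-style `intersection_of` lines, which is how OBO format expresses necessary-and-sufficient definitions. This Python sketch is hypothetical: the class name is made up, it uses labels where real OBO files use IDs, and the relation name `acquires_features_of` is taken from the slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StructuredDefinition:
    """Essence = genus + differentiae."""
    term: str
    genus: str                                  # the parent class
    differentiae: tuple[tuple[str, str], ...]   # (relation, filler) pairs

    def text(self) -> str:
        """Render a human-readable definition."""
        diffs = " and ".join(f"{rel.replace('_', ' ')} {filler}"
                             for rel, filler in self.differentiae)
        return f"{self.term} is a {self.genus} that {diffs}"

    def obo_lines(self) -> list[str]:
        """Render OBO-style intersection_of lines (labels stand in for IDs)."""
        lines = [f"intersection_of: {self.genus}"]
        lines += [f"intersection_of: {rel} {filler}" for rel, filler in self.differentiae]
        return lines


neuron_diff = StructuredDefinition(
    term="neuron cell differentiation",
    genus="differentiation",
    differentiae=(("acquires_features_of", "neuron"),),
)
print(neuron_diff.text())
# neuron cell differentiation is a differentiation that acquires features of neuron
```

Both renderings come from the same structure, so the written definition and the computable one cannot drift apart.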

  31. Alignment of the two ontologies will permit the generation of consistent and complete definitions
      id: CL:0000062
      name: osteoblast
      def: "A bone-forming cell which secretes an extracellular matrix. Hydroxyapatite crystals are then deposited into the matrix to form bone." [MESH:A.11.329.629]
      is_a: CL:0000055
      relationship: develops_from CL:0000008
      relationship: develops_from CL:0000375
      GO + Cell type = New definition
      Osteoblast differentiation: processes whereby an osteoprogenitor cell or a cranial neural crest cell acquires the specialized features of an osteoblast, a bone-forming cell which secretes extracellular matrix.
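Because the cell type entry is a structured stanza, a program can pull out the `develops_from` links that feed the new definition. A deliberately minimal Python sketch, assuming simple `tag: value` lines; it ignores OBO comments, escapes and trailing modifiers, so for real files an established parser (for example the Python packages pronto or obonet) is the safer choice. The function names are hypothetical.

```python
# The CL stanza from the slide.
STANZA = '''\
id: CL:0000062
name: osteoblast
def: "A bone-forming cell which secretes an extracellular matrix. Hydroxyapatite crystals are then deposited into the matrix to form bone." [MESH:A.11.329.629]
is_a: CL:0000055
relationship: develops_from CL:0000008
relationship: develops_from CL:0000375
'''


def parse_stanza(text: str) -> dict[str, list[str]]:
    """Parse 'tag: value' lines into tag -> values (tags may repeat)."""
    tags: dict[str, list[str]] = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        tag, _, value = line.partition(":")  # split on the first colon only
        tags.setdefault(tag.strip(), []).append(value.strip())
    return tags


def targets(tags: dict[str, list[str]], relation: str) -> list[str]:
    """IDs this term points to via `relation` in its relationship: lines."""
    out = []
    for value in tags.get("relationship", []):
        rel, target = value.split()[:2]
        if rel == relation:
            out.append(target)
    return out


osteoblast = parse_stanza(STANZA)
print(osteoblast["name"])                    # ['osteoblast']
print(targets(osteoblast, "develops_from"))  # ['CL:0000008', 'CL:0000375']
```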

  32. Alignment of the two ontologies will permit the generation of consistent and complete definitions
      id: GO:0001649
      name: osteoblast differentiation
      synonym: osteoblast cell differentiation
      genus: differentiation GO:0030154 (differentiation)
      differentium: acquires_features_of CL:0000062 (osteoblast)
      definition (text): Processes whereby a relatively unspecialized cell acquires the specialized features of an osteoblast, the mesodermal cell that gives rise to bone
      Formal definitions with necessary and sufficient conditions, in both human-readable and computer-readable forms

  33. Basis in Reality: since GO represents a science, it actually represents paradigms. It is therefore essential that GO be able to change! • GO is designed by a consortium • As long as egos don’t get in the way, GO represents universals rather than concepts • Large-scale developments of the GO are the result of compromise • Gene annotators have a large say in GO content • Annotators are experts in their fields • Annotators constantly read the scientific literature

  34. Classes and Instances • When should we create a new class as opposed to making multiple annotations? • When the biology represents a universal principle. Receptor signaling protein tyrosine kinase activity does not represent receptor signaling protein activity and tyrosine kinase activity independently.

  35. Consequences of inconsistencies • Hard to synchronize manually • can be automated • currently requires text mining • Inconsistent user-search results • Problems likely to resurface with other ontologies embedded in the GO • chemical, protein • multiple species-specific anatomical ontologies • OBO ontologies should inform GO

  36. Combinatorial annotation • Allow combinatorial annotations in appropriate contexts • Phenotypic annotation, e.g. “tail fin placement ventralized at stage pharyngula:prim25” • Combinatorial GO annotation, e.g. “activation of MAPK in mesenchymal cell” • The new AmiGO will allow for these annotations
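A combinatorial annotation can be stored as a primary term plus qualifying terms instead of a single pre-composed term. This Python sketch is hypothetical: the class, its connector words and its methods are illustrative and are not AmiGO's or any annotation format's actual model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CompositeAnnotation:
    """A primary term qualified by context terms (a post-composed annotation)."""
    primary: str
    qualifiers: tuple[tuple[str, str], ...] = ()  # (connector, term); connector may be ""

    def label(self) -> str:
        """Render the combination as a readable phrase."""
        parts = [self.primary]
        parts += [f"{connector} {term}".strip() for connector, term in self.qualifiers]
        return " ".join(parts)

    def mentions(self, term: str) -> bool:
        """True if `term` is the primary term or any qualifier term."""
        return term == self.primary or any(t == term for _, t in self.qualifiers)


fin = CompositeAnnotation(
    "tail fin placement",
    (("", "ventralized"), ("at stage", "pharyngula:prim25")),
)
mapk = CompositeAnnotation("activation of MAPK", (("in", "mesenchymal cell"),))

print(fin.label())   # tail fin placement ventralized at stage pharyngula:prim25
print(mapk.label())  # activation of MAPK in mesenchymal cell
```

Keeping the parts separate means a query such as "everything in mesenchymal cells" can use `mentions` without a dedicated term for every combination.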

  37. Animal disease models (Figure: animal models: mutant gene → mutant or missing protein → mutant phenotype)

  38. Animal disease models (Figure: humans: mutant gene → mutant or missing protein → mutant phenotype (disease); animal models: mutant gene → mutant or missing protein → mutant phenotype (disease model))

  41. (Figure: SHH-/+ and SHH-/- (human) compared with shh-/+ and shh-/- (zebrafish))

  42. Phenotype (clinical sign) = entity + attribute

  43. Phenotype (clinical sign) = entity + attribute: P1 = eye + hypoteloric

  44. Phenotype (clinical sign) = entity + attribute: P1 = eye + hypoteloric; P2 = midface + hypoplastic

  45. Phenotype (clinical sign) = entity + attribute: P1 = eye + hypoteloric; P2 = midface + hypoplastic; P3 = kidney + hypertrophied

  46. Phenotype (clinical sign) = entity + attribute: P1 = eye + hypoteloric; P2 = midface + hypoplastic; P3 = kidney + hypertrophied (Entities from ZFIN: eye, midface, kidney; attributes from PATO: hypoteloric, hypoplastic, hypertrophied)

  47. Phenotype (clinical sign) = entity + attribute • Entity: anatomical ontology, cell & tissue ontology, developmental ontology, Gene Ontology (biological process, molecular function, cellular component) • Attribute: PATO (phenotype and trait ontology)

  48. Phenotype (clinical sign) = entity + attribute: P1 = eye + hypoteloric; P2 = midface + hypoplastic; P3 = kidney + hypertrophied. Syndrome (disease) = P1 + P2 + P3 = holoprosencephaly
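The entity + attribute scheme turns each clinical sign into a small, comparable record and a syndrome into a set of them. A minimal Python sketch using the slide's P1 to P3; the overlap score and the extra "tail + curved" observation are hypothetical, chosen only to show how a model organism's phenotypes could be compared with a human syndrome.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Phenotype:
    entity: str     # e.g. an anatomy term (ZFIN anatomy on the slide)
    attribute: str  # e.g. a PATO quality


P1 = Phenotype("eye", "hypoteloric")
P2 = Phenotype("midface", "hypoplastic")
P3 = Phenotype("kidney", "hypertrophied")

# Syndrome = P1 + P2 + P3, as on the slide.
HOLOPROSENCEPHALY = frozenset({P1, P2, P3})


def overlap(observed: set, syndrome: frozenset) -> float:
    """Naive score: fraction of the syndrome's phenotypes that were observed."""
    return len(observed & syndrome) / len(syndrome)


model = {P1, P2, Phenotype("tail", "curved")}  # hypothetical observations
print(round(overlap(model, HOLOPROSENCEPHALY), 2))  # 0.67
```

Because entities and attributes come from shared ontologies, the same records can be compared across species even when the underlying gene names differ.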

  49. (Figure: human holoprosencephaly; zebrafish shh; zebrafish oep)
