Emily Dimmer edimmer@ebi.ac.uk GOA group European Bioinformatics Institute

Presentation Transcript


  1. Gene Ontology (GO) Emily Dimmer edimmer@ebi.ac.uk GOA group European Bioinformatics Institute Wellcome Trust Genome Campus Cambridge UK

  2. GO Tutorial Outline: • Introduction to GO • Description of the GO ontologies • How groups annotate to GO • Practical: • Investigating the GO and OBO web sites • Browsing the GO using the AmiGO Browser. • Open Biomedical Ontologies • How GO is being used • Available Tools • GO slims • Practical: • Creating your own GO slim

  6. Why is GO needed? THE PROBLEM: • Huge body of knowledge with an extremely large vocabulary to describe it • Vocabulary used is poorly defined • i.e. one word can have different meanings • or one concept can have different names • Biological systems are complex and our knowledge of such systems is incomplete RESULT: Large databases which are difficult to manage and impossible to mine computationally

  7. What is GO? • A (part of the) solution: • GO: • “a controlled vocabulary that can be applied to all organisms even as knowledge of gene and protein roles in cells is accumulating and changing”

  8. What can scientists do with GO? • Access gene product functional information • Provide a link between biological knowledge and … • gene expression profiles • proteomics data • Find how much of a proteome is involved in a process/function/component in the cell • using a GO slim • (a slimmed-down version of GO to summarize biological attributes of a proteome) • Map GO terms and incorporate manual GOA annotation into own databases • to enhance your dataset • or to validate automated ways of deriving information about gene function (text-mining).

  9. Taction Tactition Tactile sense ?

  10. Tactile sense Taction Tactition perception of touch ; GO:0050975

  11. GO: Three (Orthogonal) Ontologies • Molecular Function: elemental activity or task • e.g. DNA binding, catalysis of a reaction • Biological Process: broad objective or goal • e.g. mitosis, signal transduction, metabolism • Cellular Component: location or complex • e.g. nucleus, ribosome

  15. How does GO work? • Provides a standard, species-neutral way of representing biology • GO covers ‘normal’ functions and processes • No pathological processes • No experimental conditions

  16. Content of GO • Molecular Function: 7,493 terms • Biological Process: 9,640 terms • Cellular Component: 1,634 terms • Total: 18,767 terms • Definitions: 16,696 (93.9 %)

  17. What is GO? • NOT a system of nomenclature or a list of gene products • GO doesn’t attempt to cover all aspects of biology or evolutionary relationships (see Open Biomedical Ontologies, http://obo.sourceforge.net) • NOT a dictated standard • NOT a way to unify databases

  18. http://www.geneontology.org Reactome

  19. Anatomy of a GO term • GO terms are composed of: • Term name • Unique GO ID • Definition (93 % of GO terms are defined) • Synonyms (optional) • Database references (optional) • Relationships to other GO terms
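
The parts of a GO term listed on this slide can be sketched as a small record type. This is only an illustration, not the GO Consortium's data model: the class name `GOTerm`, its field names and the `matches` helper are hypothetical, and the example values are the ones shown on slides 9 and 10.

```python
from dataclasses import dataclass, field


@dataclass
class GOTerm:
    """Hypothetical container mirroring the parts of a GO term."""
    go_id: str                      # unique GO ID
    name: str                       # term name
    definition: str = ""            # may be empty: not every term is defined
    synonyms: list = field(default_factory=list)       # optional
    xrefs: list = field(default_factory=list)          # optional database references
    relationships: list = field(default_factory=list)  # (relation, parent GO ID) pairs


# Values taken from slides 9 and 10; no definition is given there.
touch = GOTerm(
    go_id="GO:0050975",
    name="perception of touch",
    synonyms=["taction", "tactition", "tactile sense"],
)


def matches(term, query):
    """True if the query equals the term name or one of its synonyms."""
    q = query.strip().lower()
    return q == term.name.lower() or q in (s.lower() for s in term.synonyms)


print(matches(touch, "Tactition"))
```

Searching by synonym is what lets three different words for the same concept resolve to a single GO ID.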

  20. I. The GO Ontologies: Ontologies • “Ontologies provide controlled, consistent vocabularies to describe concepts and relationships, thereby enabling knowledge sharing” (Gruber 1993)

  21. Ontology applications Can be used to: • Formalise the representation of biological knowledge • Describe a common and defined vocabulary for database annotation • Standardise database submissions • Provide unified access to information through ontology-based querying of databases, both human and computational • Improve management and integration of data within databases • Facilitate data mining

  22. Ontology Structure [Figure: three nodes connected by edges] • Ontologies can be represented as graphs, where the vertices (nodes and leaves) are connected by edges. • The nodes are concepts in the ontology. • The edges are the relationships between the concepts.

  23. Ontology Structure • The Gene Ontology is structured as a hierarchical directed acyclic graph (DAG). • Terms are linked by two relationships • is-a • part-of • Terms can have more than one parent
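
A minimal sketch of such a graph, assuming a plain dictionary that maps each term to its (relationship, parent) pairs. The term names come from the example figure on slide 25, but the exact edges are an assumption made for illustration, and the names `PARENTS`, `ancestors` and `is_acyclic` are hypothetical.

```python
# Each term maps to a list of (relationship, parent) pairs.
# Edges are assumed for illustration; only the term names come from the slides.
PARENTS = {
    "cell": [],
    "membrane": [("part_of", "cell")],
    "chloroplast": [("part_of", "cell")],
    "mitochondrial membrane": [("is_a", "membrane")],
    # A term with two parents: this is what makes the graph a DAG, not a tree.
    "chloroplast membrane": [("is_a", "membrane"), ("part_of", "chloroplast")],
}


def ancestors(term, parents=PARENTS):
    """Collect every term reachable by following parent edges upwards."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for _relation, parent in parents[current]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def is_acyclic(parents=PARENTS):
    """A graph is acyclic if no term is its own ancestor."""
    return all(term not in ancestors(term, parents) for term in parents)


print(sorted(ancestors("chloroplast membrane")))
print(is_acyclic())
```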

  24. Simple hierarchies (Trees) vs. Directed Acyclic Graphs

  25. Directed Acyclic Graph [Figure: example graph with the terms cell, membrane, chloroplast, mitochondrial membrane and chloroplast membrane, linked by is-a and part-of relationships]

  26. True Path Rule • The path from a child term all the way up to its top-level parent(s) must always be true [Figure: example hierarchy with the terms cell, cytoplasm, chromosome, nucleus and nuclear chromosome, where nuclear chromosome appears under both chromosome and nucleus; legend: is-a, part-of]
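
One practical consequence of the true path rule is that annotating a gene product to a term implies every ancestor of that term as well. The sketch below shows this on the terms from the slide; the parent links are an assumption about the figure, and `PARENTS` and `implied_terms` are hypothetical names.

```python
# Assumed parent links for the slide's example terms (illustration only).
PARENTS = {
    "cell": [],
    "cytoplasm": ["cell"],
    "nucleus": ["cell"],
    "chromosome": ["cell"],
    "nuclear chromosome": ["chromosome", "nucleus"],
}


def implied_terms(term, parents=PARENTS):
    """Return the term plus every term on any path up to the root."""
    implied = {term}
    for parent in parents[term]:
        implied |= implied_terms(parent, parents)
    return implied


# Annotating to 'nuclear chromosome' must also be true for all of these:
print(sorted(implied_terms("nuclear chromosome")))
```

If chromosome were placed under cytoplasm in this sketch, an annotation to nuclear chromosome would wrongly imply cytoplasm, which is the kind of error the rule is meant to prevent.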

  27. Ensuring Stability in a Dynamic Ontology • Terms become obsolete when they are removed or redefined • GO IDs are never deleted • For each term, a comment is added to explain why the term is now obsolete [Figure: Biological Process, Molecular Function and Cellular Component, each shown alongside a matching branch: Obsolete Biological Process, Obsolete Molecular Function, Obsolete Cellular Component]

  28. Access to the Gene Ontology • Downloads • formats available: • OBO • GO • XML • OWL • MySQL • (http://www.geneontology.org/GO.downloads) • Web-based tools • AmiGO • (http://www.godatabase.org) • QuickGO • (http://www.ebi.ac.uk/ego)

  29. II. Annotating to GO • Use of GO terms to represent the activities and localizations of gene products. • Basic information needed: • 1. Database object (e.g. a protein or gene identifier) • e.g. Q9ARH1 • 2. Reference ID • e.g. PubMed ID: 12374299 • 3. GO term ID • e.g. GO:0004674 • 4. Evidence code • e.g. TAS
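
The four pieces of information above can be sketched as a record with a basic sanity check. The class name `Annotation`, its field names and the validation rules are hypothetical; the check assumes GO IDs have the form `GO:` followed by seven digits, as in every ID shown in these slides.

```python
import re
from dataclasses import dataclass

# Assumption: GO IDs look like 'GO:' plus seven digits, as in the slides.
GO_ID_PATTERN = re.compile(r"^GO:\d{7}$")


@dataclass(frozen=True)
class Annotation:
    """Hypothetical record holding the minimum needed for one annotation."""
    db_object_id: str   # 1. database object, e.g. a protein identifier
    reference: str      # 2. reference ID
    go_id: str          # 3. GO term ID
    evidence: str       # 4. evidence code

    def __post_init__(self):
        if not GO_ID_PATTERN.match(self.go_id):
            raise ValueError(f"malformed GO ID: {self.go_id!r}")
        for name in ("db_object_id", "reference", "evidence"):
            if not getattr(self, name):
                raise ValueError(f"missing required field: {name}")


# Example values taken from the slide.
perk1 = Annotation("Q9ARH1", "PMID:12374299", "GO:0004674", "TAS")
print(perk1)
```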

  30. GenNav: http://etbsun2.nlm.nih.gov:8000/perl/gennav.pl

  31. J. Clark et al. Plant Physiology 2005 (in press)

  32. Two types of GO Annotation: • Electronic Annotation • Manual Annotation • All annotations must: • be attributed to a source. • indicate what evidence was found to support the GO term-gene/protein association.

  33. Electronic Annotation • Provides large coverage • High quality • BUT annotations tend to use high-level GO terms and provide little detail.

  34. Electronic Annotation • Assignment of GO terms to gene products using existing information within database entries • Manual mapping of GO terms to concepts external to GO (‘translation tables’). • Proteins then electronically annotated with the relevant GO term(s). • Automatic sequence analyses to transfer annotations between highly similar gene products

  35. Electronic Annotation • Fatty acid biosynthesis (Swiss-Prot keyword) → GO: fatty acid biosynthesis (GO:0006633) • EC:6.4.1.2 (EC number) → GO: acetyl-CoA carboxylase activity (GO:0003989) • IPR000438: Acetyl-CoA carboxylase carboxyl transferase beta subunit (InterPro entry) → GO: acetyl-CoA carboxylase activity (GO:0003989) • MF_00527: Putative 3-methyladenine DNA glycosylase (HAMAP) → GO: DNA repair (GO:0006281)
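
The translation-table approach can be sketched as a lookup from external concepts to GO IDs. The four mappings are the ones shown on this slide; the key prefixes (`SP_KW:`, `InterPro:`, `HAMAP:`), the table layout and the function name are assumptions for illustration and do not reproduce the format of the real mapping files.

```python
from collections import defaultdict

# External concept -> GO IDs, using the four examples from the slide.
# The key prefixes are a hypothetical naming convention.
TRANSLATION_TABLE = {
    "SP_KW:Fatty acid biosynthesis": ["GO:0006633"],
    "EC:6.4.1.2": ["GO:0003989"],
    "InterPro:IPR000438": ["GO:0003989"],
    "HAMAP:MF_00527": ["GO:0006281"],
}


def electronic_annotations(cross_references, table=TRANSLATION_TABLE):
    """Map an entry's cross-references to GO IDs, keeping the supporting sources."""
    supported = defaultdict(list)
    for xref in cross_references:
        for go_id in table.get(xref, []):
            supported[go_id].append(xref)
    return dict(supported)


# A hypothetical database entry carrying three of the concepts above.
entry_xrefs = ["SP_KW:Fatty acid biosynthesis", "EC:6.4.1.2", "InterPro:IPR000438"]
for go_id, sources in sorted(electronic_annotations(entry_xrefs).items()):
    print(go_id, "IEA", "from", ", ".join(sources))
```

Keeping the list of sources per GO ID preserves the attribution that every annotation needs, and shows when two independent mappings agree on the same term.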

  36. Mappings of external concepts to GO http://www.geneontology.org/GO.indices.shtml

  37. Evaluation of the precision of electronic annotation techniques (InterPro2GO, SPKW2GO, EC2GO) • Compared a manually curated test set of GO-annotated proteins with the electronic annotations • InterPro2GO = most coverage • EC2GO = 67 % of predictions exactly match the manual GO annotation. • 91-100 % of the time, the 3 mappings predicted GO terms within the same lineage Camon et al. BMC Bioinformatics 2005 in press

  38. Manual Annotation • High-quality, specific gene/gene product associations made, using: • Peer-reviewed papers • Evidence codes to grade evidence BUT it is very time-consuming and requires trained biologists

  39. Finding GO terms …for B. napus PERK1 protein (Q9ARH1) • Source text (PubMed ID: 12374299): “In this study, we report the isolation and molecular characterization of the B. napus PERK1 cDNA, that is predicted to encode a novel receptor-like kinase. We have shown that like other plant RLKs, the kinase domain of PERK1 has serine/threonine kinase activity. In addition, the location of a PERK1-GFP fusion protein to the plasma membrane supports the prediction that PERK1 is an integral membrane protein…these kinases have been implicated in early stages of wound response…” • Key phrases: serine/threonine kinase activity; integral membrane protein; wound response • Function: protein serine/threonine kinase activity GO:0004674 • Component: integral to plasma membrane GO:0005887 • Process: response to wounding GO:0009611

  40. GO Evidence Codes • IDA: • Enzyme assays • In vitro reconstitution (transcription) • Immunofluorescence • Cell fractionation • TAS: • In the literature source the original experiments referred to are traceable (referenced). [Slide legend: * With column required; manually annotated]

  41. GO Evidence Codes • An additional identifier is needed for annotations using certain evidence codes • IGI: • a gene identifier for the "other" gene involved in the interaction • IPI: • a gene or protein identifier for the "other" protein involved in the interaction • IC: • GO term from another annotation used as the basis of a curator inference [Slide legend: * With column required; manually annotated]
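
A check for the rule on this slide might look like the sketch below. It assumes only what the slide states, namely that IGI, IPI and IC annotations need an identifier in the With column; the names `WITH_REQUIRED` and `missing_with` are hypothetical, and other evidence codes are simply not checked.

```python
# Evidence codes that this slide says need an extra identifier in the With column.
WITH_REQUIRED = {"IGI", "IPI", "IC"}


def missing_with(evidence, with_field):
    """True if the evidence code needs a With value but none was supplied."""
    return evidence in WITH_REQUIRED and not with_field.strip()


# An IPI annotation naming the interacting protein passes the check.
print(missing_with("IPI", "UniProt:P50995"))
# The same annotation without the identifier is flagged.
print(missing_with("IPI", ""))
# TAS is not in the list on this slide, so an empty With value is accepted.
print(missing_with("TAS", ""))
```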

  42. …some extra things: • Annotation of a gene product to one ontology is independent of its annotation to other ontologies. • Only terms reflecting a normal activity or location are used for annotation. • Usage of ‘unknown’ GO terms • (e.g. Molecular function unknown GO:0005554)

  43. …some extra things: Qualifier Information • A set of ‘Qualifier’ terms is also available to curators to modify the interpretation of an annotation (recorded in the Qualifier column). • Allowable values: • 1. NOT • a gene product is not associated with the GO term • to document conflicting claims in the literature. • 2. Contributes to • distinguishes between individual subunit functions and whole complex functions • (used with GO Function Ontology) • 3. Colocalizes with • Transiently or peripherally associated with an organelle or complex • where the resolution of an assay is not accurate. • (used with GO Component Ontology)
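
The qualifier rules above can be sketched as a table of which qualifier may appear with which ontology. The underscore spellings `contributes_to` and `colocalizes_with` and the one-letter aspect codes (F, P, C) are assumptions about how the values are written in annotation files; the slide does not restrict NOT to any ontology, so it is allowed with all three here.

```python
# Qualifier -> ontology aspects it may be used with.
# F = Molecular Function, P = Biological Process, C = Cellular Component (assumed codes).
QUALIFIER_ASPECTS = {
    "NOT": {"F", "P", "C"},
    "contributes_to": {"F"},     # used with the Function ontology
    "colocalizes_with": {"C"},   # used with the Component ontology
}


def qualifier_is_valid(qualifier, aspect):
    """An empty qualifier is always fine; otherwise it must suit the aspect."""
    if qualifier == "":
        return True
    return aspect in QUALIFIER_ASPECTS.get(qualifier, set())


print(qualifier_is_valid("contributes_to", "F"))
print(qualifier_is_valid("colocalizes_with", "P"))
```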

  47. Accessing annotations to the Gene Ontology • 1. Downloads • Annotations – gene association files • Ontologies and annotations – MySQL and XML • 2. Web-based access • AmiGO • (http://www.godatabase.org) • QuickGO • (http://www.ebi.ac.uk/ego) • …among others…

  48. Gene Association File (a dash marks an empty field)

      Columns 1-9:
      DB       DB_Object_ID  DB_Object_Symbol  Qualifier  GOid        DB:Reference   Evidence  With            Aspect
      UniProt  P06703        S106_HUMAN        -          GO:0008083  GOA:spkw       IEA       -               F
      UniProt  P06703        S106_HUMAN        NOT        GO:0007409  PMID:12152788  NAS       -               P
      UniProt  P06703        S106_HUMAN        -          GO:0005515  PMID:12577318  IPI       UniProt:P50995  F

      Columns 10-15 (same three rows):
      DB_Object_Name  DB_Object_Synonym  DB_Object_Type  taxon       Date      Assigned by
      Calcyclin       IPI00027463        protein         taxon:9606  20040426  UniProt
      Calcyclin       IPI00027463        protein         taxon:9606  20030721  UniProt
      Calcyclin       IPI00027463        protein         taxon:9606  20030721  UniProt

      • via web (GO consortium page) • http://www.geneontology.org/GO.current.annotations.shtml
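
Reading such a file can be sketched as follows, assuming the fields are tab-separated in the column order shown on this slide and that lines starting with `!` are comments. The function names are hypothetical, and the sample rows are the three from the slide.

```python
# Column order as listed on the slide.
COLUMNS = [
    "DB", "DB_Object_ID", "DB_Object_Symbol", "Qualifier", "GOid",
    "DB:Reference", "Evidence", "With", "Aspect", "DB_Object_Name",
    "DB_Object_Synonym", "DB_Object_Type", "taxon", "Date", "Assigned_by",
]


def parse_line(line):
    """Split one tab-separated line into a column-name -> value dictionary."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} fields, got {len(fields)}")
    return dict(zip(COLUMNS, fields))


def parse(lines):
    """Parse all data lines, skipping blanks and '!' comment lines (assumed)."""
    return [parse_line(l) for l in lines if l.strip() and not l.startswith("!")]


def row(*fields):
    return "\t".join(fields)


common = ("Calcyclin", "IPI00027463", "protein", "taxon:9606")
sample = [
    "!hypothetical header comment",
    row("UniProt", "P06703", "S106_HUMAN", "", "GO:0008083", "GOA:spkw",
        "IEA", "", "F", *common, "20040426", "UniProt"),
    row("UniProt", "P06703", "S106_HUMAN", "NOT", "GO:0007409", "PMID:12152788",
        "NAS", "", "P", *common, "20030721", "UniProt"),
    row("UniProt", "P06703", "S106_HUMAN", "", "GO:0005515", "PMID:12577318",
        "IPI", "UniProt:P50995", "F", *common, "20030721", "UniProt"),
]

for record in parse(sample):
    print(record["GOid"], record["Evidence"], record["Qualifier"] or "-")
```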

  49. http://www.geneontology.org/GO.current.annotations.shtml

  50. Summary • GO is still being developed and updated; it requires a serious and ongoing effort. • The biological community is involved • New model organism databases are joining the GO Consortium annotation effort
