The Gene Ontology and its insertion into UMLS

Learn about the Gene Ontology (GO), a set of three structured vocabularies used for functional annotation of gene products. GO is dynamic and carries cross-references to external databases. Find out how GO is being inserted into the Unified Medical Language System (UMLS) as a source vocabulary, and how this could enrich UMLS and improve information retrieval.

Presentation Transcript


  1. The Gene Ontology and its insertion into UMLS • Jane Lomax

  2. The Gene Ontology • Set of three structured vocabularies • Provide functional annotation of gene products • Dynamic • Cross-references to external databases

  3. The vocabularies • Molecular function — elemental activity or task • Biological process — broad objective or goal • Cellular component — location or complex

  6. The vocabularies • Molecular function — elemental activity or task • nuclease, DNA binding, microtubule motor • Biological process — broad objective or goal • mitosis, signal transduction, metabolism • Cellular component — location or complex • nucleus, ribosome

  7. GO structure • Directed acyclic graph (DAG) • Allows multiple parentage
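
To illustrate what multiple parentage means in a DAG, here is a minimal Python sketch. The term names and edges are a toy fragment chosen for illustration (an assumption, not copied from GO), and `ancestors` is a hypothetical helper name.

```python
# Toy fragment of a DAG; names and edges are illustrative assumptions.
# Each term maps to the list of its direct parents.
parents = {
    "cellular component": [],                          # root: no parents
    "nucleus": ["cellular component"],
    "chromosome": ["cellular component"],
    "nuclear chromosome": ["nucleus", "chromosome"],   # multiple parentage
}


def ancestors(term, parents):
    """Return every term reachable from `term` by following parent links."""
    seen = set()
    stack = list(parents[term])
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents[current])
    return seen


print(sorted(ancestors("nuclear chromosome", parents)))
# prints ['cellular component', 'chromosome', 'nucleus']
```

In a strict tree a term would have exactly one parent; the list-valued mapping is what lets a term sit under more than one.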

  8. True-path rule • Every path from a node back to the root must be biologically accurate
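
Checking the true-path rule is a biological judgement, so code cannot do it alone. What code can do is list every path from a term to the root so that each one can be reviewed. The sketch below assumes a toy graph with hypothetical edges; `paths_to_root` is an illustrative helper, not part of any GO tool.

```python
# Toy graph; names and edges are illustrative assumptions.
parents = {
    "cellular component": [],
    "nucleus": ["cellular component"],
    "chromosome": ["cellular component"],
    "nuclear chromosome": ["nucleus", "chromosome"],
}


def paths_to_root(term, parents):
    """Return every path from `term` up to a root, as lists of term names."""
    if not parents[term]:
        return [[term]]  # a root is a complete path on its own
    paths = []
    for parent in parents[term]:
        for tail in paths_to_root(parent, parents):
            paths.append([term] + tail)
    return paths


# Each printed path is one statement a curator would need to find accurate.
for path in paths_to_root("nuclear chromosome", parents):
    print(" -> ".join(path))
```

With multiple parentage the number of paths grows quickly, which is why a single misplaced parent can make many descendant annotations inaccurate.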

  9. Relationship types • is_a • subclass: a is a type of b • part_of • physical part of (component) • sub-process of (process)
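
The two relationship types can be modelled as labels on the edges of the graph. The following is a sketch only: the `Edge` type, `make_edge`, `parents_of`, and the example edges are all assumptions made for illustration.

```python
from typing import NamedTuple

# The two relationship types named on the slide.
RELATION_TYPES = {"is_a", "part_of"}


class Edge(NamedTuple):
    child: str
    relation: str
    parent: str


def make_edge(child, relation, parent):
    """Build an edge, rejecting any relationship type other than the two above."""
    if relation not in RELATION_TYPES:
        raise ValueError(f"unknown relationship type: {relation}")
    return Edge(child, relation, parent)


# Illustrative edges (hypothetical, not copied from GO).
edges = [
    make_edge("nuclear chromosome", "is_a", "chromosome"),  # subclass
    make_edge("nucleolus", "part_of", "nucleus"),           # physical part
    make_edge("prophase", "part_of", "mitosis"),            # sub-process
]


def parents_of(term, relation, edges):
    """Return the parents of `term` reached through one relationship type."""
    return [e.parent for e in edges if e.child == term and e.relation == relation]


print(parents_of("nucleolus", "part_of", edges))  # prints ['nucleus']
```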

  10. What makes up a GO term? • term name • go_id • definition and definition dbxref • GO synonym • general dbxref • comment
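
The fields listed above map naturally onto a small record type. This is a sketch under assumptions: the class name `GOTerm`, the field names, the placeholder definition, and the `EXAMPLE:` cross-reference values are hypothetical. The ID check assumes the usual `GO:` prefix followed by seven digits.

```python
import re
from dataclasses import dataclass, field
from typing import List


@dataclass
class GOTerm:
    go_id: str                                                    # go_id
    name: str                                                     # term name
    definition: str = ""                                          # definition
    definition_dbxrefs: List[str] = field(default_factory=list)  # definition dbxref
    synonyms: List[str] = field(default_factory=list)            # GO synonym
    dbxrefs: List[str] = field(default_factory=list)             # general dbxref
    comment: str = ""                                             # comment


def is_valid_go_id(go_id):
    """Check the assumed ID shape: 'GO:' followed by exactly seven digits."""
    return re.fullmatch(r"GO:\d{7}", go_id) is not None


term = GOTerm(
    go_id="GO:0003677",
    name="DNA binding",
    definition="(placeholder text, not the official definition)",
    definition_dbxrefs=["EXAMPLE:def-source"],  # hypothetical value
    dbxrefs=["EXAMPLE:0001"],                   # hypothetical value
)

print(term.name, is_valid_go_id(term.go_id))  # prints DNA binding True
```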

  11. GO cross-links • Cross-references within GO • EC • RESID • MetaCyc • Mappings • SWISS-PROT keywords • Links in other databases • InterPro • UMLS/MeSH – in progress

  12. Why insert GO into UMLS? • A rich, widely used source for expanding UMLS • Can be used to improve areas of MeSH • Potential for ‘non-fuzzy’ text mining using GO terms • MeSH terms manually assigned to papers

  13. Unified Medical Language System (UMLS) • Research project maintained by the National Library of Medicine (NLM) • Aims to • allow computers to ‘understand’ biomedical meaning • improve retrieval and integration of computer-readable info • Has three ‘Knowledge sources’: • UMLS Metathesaurus • SPECIALIST lexicon • semantic network

  14. Knowledge sources • UMLS Metathesaurus • links multiple source vocabularies into unified concepts, includes MeSH (Medical Subject Headings) • GO to become source vocabulary • SPECIALIST lexicon • provides biomedical/English lexical info • semantic network • for categorizing concepts

  15. Inserting GO into UMLS • inversion • converting GO to correct format for UMLS • insertion • inserting GO using matching algorithms • editing • all concepts containing GO term reviewed by hand
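
The slides do not describe the matching algorithms used during insertion. As a stand-in, the sketch below uses naive normalized string matching between GO term names and the names already attached to existing concepts. Everything in it is an assumption for illustration: the function names, the `C-EXAMPLE-…` concept ids, and the sample data.

```python
def normalize(name):
    """Lowercase and collapse punctuation and whitespace for a crude comparison."""
    cleaned = "".join(ch if ch.isalnum() else " " for ch in name.lower())
    return " ".join(cleaned.split())


def match_terms(go_names, concept_names):
    """Map each GO term name to an existing concept id, or None if unmatched.

    `concept_names` maps a concept id to the names it already carries.
    """
    index = {}
    for concept_id, names in concept_names.items():
        for name in names:
            index.setdefault(normalize(name), concept_id)
    return {go_name: index.get(normalize(go_name)) for go_name in go_names}


# Hypothetical existing concepts and GO term names.
existing = {
    "C-EXAMPLE-1": ["Mitosis"],
    "C-EXAMPLE-2": ["Signal Transduction", "signal-transduction"],
}
go_names = ["mitosis", "signal transduction", "microtubule motor"]

matches = match_terms(go_names, existing)
matched = [name for name, concept in matches.items() if concept is not None]
print(f"{len(matched)} of {len(go_names)} GO names matched an existing concept")
# prints 2 of 3 GO names matched an existing concept
```

In this sketch a matched name would join an existing concept and an unmatched one would form a concept of its own. Either way, the editing step above still calls for review by hand.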

  16. Statistics • Approximately 23% of GO terms ‘match’ something in another source vocabulary • 23.03% of GO terms are in concepts with other sources • 76.97% of GO terms are in concepts where they are the only source

  17. Statistics • % of GO terms in concepts with other sources, by GO vocabulary • chart labels: biological process, cellular component, molecular function • chart values: 4.6%, 27.8%, 45.2%

  18. Statistics • % of GO terms in concepts with other sources, by source • MSH2003_2002_08_14 (Medical Subject Headings): 19.74% • CSP2002 (Computer Retrieval of Information on Scientific Projects Thesaurus): 7.34% • SNMI98 (Systematized Nomenclature of Human and Veterinary Medicine): 11.05% • diagram labels: SNOMED, CRISP, GO, MeSH

  19. Labelled parts of an example concept • concept name • concept id • definition • MeSH atoms • GO atoms • contexts • EC number • relationships to other concepts

  20. Challenges with insertion • GO synonyms • As GO has evolved, not all GO synonyms are strictly synonymous any more • GO enzymes • GO separates enzyme function from enzyme ‘complexes’; most vocabularies don’t • Semantic types • What semantic types now apply to concepts with GO atoms?

  21. Future of insertion • Hoped that GO can be released with UMLS early next year • dependent on ironing out problems • Maintenance of insertion • GO changes continually, so there will be large differences between UMLS releases

  22. www.geneontology.org • FlyBase & Berkeley Drosophila Genome Project • Saccharomyces Genome Database • PomBase (Sanger Institute) • Rat Genome Database • Genome Knowledge Base (CSHL) • The Institute for Genomic Research • Compugen, Inc • The Arabidopsis Information Resource • WormBase • DictyBase • Mouse Genome Informatics • Swiss-Prot/TrEMBL/InterPro • Pathogen Sequencing Unit (Sanger Institute) • National Library of Medicine • Alexa McCray • Stuart Nelson • Bill Hole • Oak Ridge Institute for Science and Education • National Library of Medicine • U.S. Department of Energy • The Gene Ontology Consortium is supported by an R01 grant from the National Human Genome Research Institute (NHGRI) [grant HG02273]. SGD is supported by a P41, National Resources, grant from the NHGRI [grant HG01315]; MGD by a P41 from the NHGRI [grant HG00330]; GXD by the National Institute of Child Health and Human Development [grant HD33745]; FlyBase by a P41 from the NHGRI [grant HG00739] and by the Medical Research Council, London. TAIR is supported by the National Science Foundation [grant DBI-9978564]. WormBase is supported by a P41, National Resources, grant from the NHGRI [grant HG02223]; RGD is supported by an R01 grant from the NHLBI [grant HL64541]; DictyBase is supported by an R01 grant from the NIGMS [grant GM064426].
