Joined up ontologies: incorporating the Gene Ontology into the UMLS
The Gene Ontology (GO)
• Controlled vocabulary for describing molecular biology
• Hierarchical
• Multiple parentage allowed
• Defined terms
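Because a term may have more than one parent, GO is a directed acyclic graph rather than a strict tree. The sketch below assumes a plain dictionary from each term to its direct parents. The term names are a simplified, illustrative fragment of the molecular function ontology, and `PARENTS` and `ancestors` are hypothetical names, not part of any GO tool:

```python
from typing import Dict, List, Set

# Illustrative, simplified fragment of the molecular function ontology.
# Each term lists its direct parents; the last entry has two of them.
PARENTS: Dict[str, List[str]] = {
    "molecular_function": [],
    "helicase activity": ["molecular_function"],
    "DNA helicase activity": ["helicase activity"],
    "ATP-dependent helicase activity": ["helicase activity"],
    "ATP-dependent DNA helicase activity": [
        "DNA helicase activity",
        "ATP-dependent helicase activity",
    ],
}


def ancestors(term: str, parents: Dict[str, List[str]]) -> Set[str]:
    """Return every term reachable from `term` by following parent links upward."""
    seen: Set[str] = set()
    stack: List[str] = list(parents.get(term, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, []))
    return seen


if __name__ == "__main__":
    # "helicase activity" is reached through both parents but reported once.
    for name in sorted(ancestors("ATP-dependent DNA helicase activity", PARENTS)):
        print(name)
```

The `seen` set is what makes multiple parentage safe: a shared ancestor reached by two paths is visited only once.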
Structure of GO (Created using the tool GenNav, developed at NLM)
The ontologies: questions asked about a gene product
• What does it do? Molecular function
• What processes is it involved in? Biological process
• Where does it act? Cellular component
Gene annotation: assigning GO terms to gene products
• Genes or gene products
• GO terms "linked" to gene products
• Gene products annotated to all 3 ontologies
• May be linked to more than one term in each ontology
(Figure: an example gene product annotated to nucleus, regulation of transcription, ATP-dependent helicase and DNA binding)
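One way to picture these annotations is as records that each link one gene product to one term in one of the three ontologies. This is a minimal sketch under that assumption; the `Annotation` record, the identifier `gp_example` and the helper functions are hypothetical, and real annotation files carry GO IDs and evidence codes rather than bare term names:

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Set

ASPECTS = ("molecular_function", "biological_process", "cellular_component")


@dataclass(frozen=True)
class Annotation:
    gene_product: str  # hypothetical identifier from a contributing database
    term: str          # GO term name (real annotations cite a GO:nnnnnnn ID)
    aspect: str        # one of ASPECTS


def terms_by_aspect(annotations: List[Annotation], gene_product: str) -> Dict[str, Set[str]]:
    """Group one gene product's GO terms by ontology; each ontology may hold several terms."""
    grouped: Dict[str, Set[str]] = defaultdict(set)
    for ann in annotations:
        if ann.gene_product == gene_product:
            grouped[ann.aspect].add(ann.term)
    return dict(grouped)


def covers_all_aspects(annotations: List[Annotation], gene_product: str) -> bool:
    """True if the gene product has at least one term in each of the three ontologies."""
    return set(terms_by_aspect(annotations, gene_product)) == set(ASPECTS)


# The example gene product from the slide, with a made-up identifier.
EXAMPLE = [
    Annotation("gp_example", "nucleus", "cellular_component"),
    Annotation("gp_example", "regulation of transcription", "biological_process"),
    Annotation("gp_example", "ATP-dependent helicase activity", "molecular_function"),
    Annotation("gp_example", "DNA binding", "molecular_function"),
]

if __name__ == "__main__":
    print(terms_by_aspect(EXAMPLE, "gp_example"))
    print(covers_all_aspects(EXAMPLE, "gp_example"))
```

The molecular function entry ends up holding two terms, which is the "more than one term in each ontology" case from the slide.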
Queries across databases
(Figure: gene products from fly, rat, yeast and mouse annotated to shared GO terms such as nuclease, signal transducer, cytoplasm, membrane, toxin catabolism, osmosensory signaling pathway, DNA binding, nucleus, helicase, mitotic cell cycle and regulation of transcription)
Find me all gene products with 'DNA binding activity'…
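Because every database annotates with the same GO terms, a single term lookup can span all of them. A minimal sketch, assuming annotations from several databases have been pooled into one list; the records below are toy data echoing the slide, and `find_by_term` is a hypothetical helper:

```python
from typing import Dict, List, NamedTuple


class Annotation(NamedTuple):
    source: str        # contributing database / organism
    gene_product: str  # hypothetical identifier
    term: str          # shared GO term name


# Toy annotations echoing the terms on the slide; not real data.
ANNOTATIONS: List[Annotation] = [
    Annotation("fly", "fly_gp1", "DNA binding"),
    Annotation("fly", "fly_gp2", "signal transducer"),
    Annotation("rat", "rat_gp1", "nuclease"),
    Annotation("yeast", "yeast_gp1", "DNA binding"),
    Annotation("yeast", "yeast_gp1", "nucleus"),
    Annotation("mouse", "mouse_gp1", "helicase"),
]


def find_by_term(annotations: List[Annotation], term: str) -> Dict[str, List[str]]:
    """Collect gene products annotated to exactly `term`, grouped by source database."""
    hits: Dict[str, List[str]] = {}
    for ann in annotations:
        if ann.term == term:
            hits.setdefault(ann.source, []).append(ann.gene_product)
    return hits


if __name__ == "__main__":
    print(find_by_term(ANNOTATIONS, "DNA binding"))
```

This exact-match version misses gene products annotated only to more specific terms; handling those requires walking the ontology graph.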
Associating gene products with different levels of the ontology (Created using the tool GenNav, developed at NLM)
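Gene products can be annotated at whatever level of detail the evidence supports, so a query on a general term should also return products annotated to more specific terms beneath it. This is consistent with GO's "true path rule" (an annotation to a term also holds for its parents). A minimal sketch under simplified assumptions; the term fragment, the gene products `gp_a` to `gp_c` and the helper names are all hypothetical:

```python
from typing import Dict, List, Set

# Simplified fragment: each term lists its direct parents.
PARENTS: Dict[str, List[str]] = {
    "binding": [],
    "nucleic acid binding": ["binding"],
    "DNA binding": ["nucleic acid binding"],
    "RNA binding": ["nucleic acid binding"],
}

# Hypothetical gene products, each annotated at a different level of detail.
DIRECT: Dict[str, Set[str]] = {
    "gp_a": {"DNA binding"},
    "gp_b": {"nucleic acid binding"},
    "gp_c": {"RNA binding"},
}


def ancestors_or_self(term: str, parents: Dict[str, List[str]]) -> Set[str]:
    """The term itself plus every term above it in the graph."""
    result = {term}
    stack = [term]
    while stack:
        for parent in parents.get(stack.pop(), []):
            if parent not in result:
                result.add(parent)
                stack.append(parent)
    return result


def annotated_to(query: str, direct: Dict[str, Set[str]], parents: Dict[str, List[str]]) -> List[str]:
    """Gene products annotated to `query` directly or to any more specific term below it."""
    return sorted(
        gp
        for gp, terms in direct.items()
        if any(query in ancestors_or_self(t, parents) for t in terms)
    )


if __name__ == "__main__":
    print(annotated_to("nucleic acid binding", DIRECT, PARENTS))  # all three
    print(annotated_to("DNA binding", DIRECT, PARENTS))           # only gp_a
```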
GO and other systems
• Useful to equate GO with other systems
• Mapping files, e.g. ec2go
• References in GO, as dbxrefs, e.g. BioCyc
• References in other systems, e.g. BRENDA (in process), UMLS Metathesaurus
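Mapping files such as ec2go are plain text, so loading one is a short parsing job. This sketch assumes the line layout `EXTERNAL_ID > GO:term name ; GO:nnnnnnn` with `!` marking comment lines; check that assumption against the current file header. The sample line is illustrative only, and `parse_external2go` is a hypothetical helper:

```python
import io
from typing import Dict, List, TextIO, Tuple


def parse_external2go(handle: TextIO) -> Dict[str, List[Tuple[str, str]]]:
    """Map each external ID to its (GO ID, GO term name) pairs.

    Assumes lines of the form 'EXTERNAL_ID > GO:term name ; GO:nnnnnnn'
    and treats lines beginning with '!' as comments.
    """
    mapping: Dict[str, List[Tuple[str, str]]] = {}
    for raw in handle:
        line = raw.strip()
        if not line or line.startswith("!"):
            continue
        external, sep, rest = line.partition(" > ")
        name_part, sep2, go_id = rest.rpartition(" ; ")
        if not sep or not sep2:
            continue  # skip lines that do not match the assumed layout
        name = name_part[3:] if name_part.startswith("GO:") else name_part
        mapping.setdefault(external.strip(), []).append((go_id.strip(), name.strip()))
    return mapping


# Illustrative input only; check entries against the current ec2go file.
SAMPLE = """! comment line
EC:3.6.4.12 > GO:DNA helicase activity ; GO:0003678
"""

if __name__ == "__main__":
    print(parse_external2go(io.StringIO(SAMPLE)))
```

Values are lists because one external identifier may map to several GO terms.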
GO into UMLS
• Unified Medical Language System
• Long-term project at NLM
• Three parts: SPECIALIST Lexicon; Semantic Network; Metathesaurus
• Metathesaurus interrelates biomedical vocabularies
• Includes ~60 vocabularies, including SNOMED and MeSH
Inserting GO into UMLS
• Inversion: converting GO to the correct format for UMLS
• Insertion: inserting GO using matching algorithms
• Editing: all concepts containing a GO term reviewed by hand
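The insertion step can be pictured as matching normalized GO term names against names already in the Metathesaurus. A match proposes the GO term as a synonym of an existing concept; no match proposes a new concept. NLM's actual matching uses its own lexical normalization tools. This stand-in only lowercases, drops punctuation and sorts words. The concept IDs and names are hypothetical:

```python
import re
from typing import Dict, List, Tuple


def normalize(name: str) -> str:
    """Crude normalization: lowercase, keep alphanumeric words, sort them."""
    return " ".join(sorted(re.findall(r"[a-z0-9]+", name.lower())))


def match_go_terms(
    go_names: List[str], concepts: Dict[str, List[str]]
) -> Tuple[Dict[str, str], List[str]]:
    """Split GO term names into those matching an existing concept and those needing a new one."""
    index: Dict[str, str] = {}
    for concept_id, names in concepts.items():
        for n in names:
            index.setdefault(normalize(n), concept_id)
    matched: Dict[str, str] = {}
    unmatched: List[str] = []
    for go_name in go_names:
        concept_id = index.get(normalize(go_name))
        if concept_id is None:
            unmatched.append(go_name)  # candidate new concept
        else:
            matched[go_name] = concept_id  # candidate synonym of an existing concept
    return matched, unmatched


# Hypothetical concept IDs and names; not real Metathesaurus content.
CONCEPTS = {"C_EXAMPLE_1": ["DNA Helicases", "Helicase, DNA"]}

if __name__ == "__main__":
    # Both outputs are proposals only: every concept that ends up
    # containing a GO term is still reviewed by hand.
    print(match_go_terms(["DNA helicase", "osmosensory signaling pathway"], CONCEPTS))
```

Sorting the words lets "Helicase, DNA" and "DNA helicase" meet at the same key. Crude matches like this are one reason the hand-editing step is needed.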
Statistics: percentage of GO terms sharing a concept with another source, by source
• MeSH (MSH2003_2002_08_14, Medical Subject Headings): 19.74 %
• CRISP (CSP2002, Computer Retrieval of Information on Scientific Projects Thesaurus): 7.34 %
• SNOMED (SNMI98, Systematized Nomenclature of Human and Veterinary Medicine): 11.05 %
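One plausible reading of these percentages is the fraction of concepts containing a GO term that also contain a term from the given source. That reading is an assumption. The sketch below computes it from a toy map of concept to source abbreviations; the data are invented and `share_by_source` is a hypothetical helper:

```python
from typing import Dict, Set


def share_by_source(concept_sources: Dict[str, Set[str]], target: str = "GO") -> Dict[str, float]:
    """Percentage of concepts containing `target` that also contain each other source."""
    with_target = [sources for sources in concept_sources.values() if target in sources]
    if not with_target:
        return {}
    counts: Dict[str, int] = {}
    for sources in with_target:
        for src in sources - {target}:
            counts[src] = counts.get(src, 0) + 1
    return {src: 100.0 * n / len(with_target) for src, n in counts.items()}


# Toy data: concept ID -> source abbreviations of the terms it contains.
TOY = {
    "c1": {"GO", "MSH"},
    "c2": {"GO", "MSH", "SNMI"},
    "c3": {"GO"},
    "c4": {"GO", "CSP"},
    "c5": {"MSH"},  # no GO term, so not counted
}

if __name__ == "__main__":
    print(sorted(share_by_source(TOY).items()))  # [('CSP', 25.0), ('MSH', 50.0), ('SNMI', 25.0)]
```

Under this reading the percentages need not sum to 100. A concept can hold terms from several sources, and some GO concepts match no other source at all.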
Potential applications
• Mining abstracts using GO terms
(Figure: the GO term DNA helicase (GO:0003678) linked through UMLS to a MeSH term, giving a GO <-> MeSH connection)
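Once GO terms sit in UMLS concepts alongside MeSH terms, a GO term's synonyms can be used to tag free text. This is a minimal dictionary-lookup sketch. It assumes a hand-built lexicon in which "DNA Helicases" stands in for the MeSH name reached through the shared concept; the lexicon, abstract text and `tag_abstract` helper are hypothetical:

```python
import re
from typing import Dict, List

# Hypothetical lexicon: GO ID -> strings to look for. "DNA Helicases" stands in
# for a MeSH heading reached through the shared UMLS concept (an assumption).
LEXICON: Dict[str, List[str]] = {
    "GO:0003678": ["DNA helicase", "DNA Helicases"],
}


def tag_abstract(text: str, lexicon: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Return, per GO ID, the phrases from `lexicon` found in `text` (case-insensitive, whole words)."""
    found: Dict[str, List[str]] = {}
    for go_id, names in lexicon.items():
        for name in names:
            pattern = r"\b" + re.escape(name) + r"\b"
            hits = re.findall(pattern, text, flags=re.IGNORECASE)
            if hits:
                found.setdefault(go_id, []).extend(hits)
    return found


ABSTRACT = "We purified a DNA helicase from yeast. Both DNA helicases require ATP."

if __name__ == "__main__":
    print(tag_abstract(ABSTRACT, LEXICON))
```

The word-boundary anchors stop "DNA helicase" from also matching inside "DNA helicases", so each occurrence is credited to one lexicon entry.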
Status of inserting GO into UMLS
• Molecular function ontology already inserted
• Hope to insert the other two ontologies by April
• Release GO with UMLS by the end of the year
www.geneontology.org
• FlyBase & Berkeley Drosophila Genome Project
• Saccharomyces Genome Database
• PomBase (Sanger Institute)
• Rat Genome Database
• Genome Knowledge Base (CSHL)
• The Institute for Genomic Research
• Compugen, Inc.
• The Arabidopsis Information Resource
• WormBase
• DictyBase
• Mouse Genome Informatics
• Swiss-Prot/TrEMBL/InterPro
• Pathogen Sequencing Unit (Sanger Institute)
• National Library of Medicine: Alexa McCray, Stuart Nelson, Bill Hole
• Oak Ridge Institute for Science and Education
• National Library of Medicine
• U.S. Department of Energy

The Gene Ontology Consortium is supported by an R01 grant from the National Human Genome Research Institute (NHGRI) [grant HG02273]. SGD is supported by a P41, National Resources, grant from the NHGRI [grant HG01315]; MGD by a P41 from the NHGRI [grant HG00330]; GXD by the National Institute of Child Health and Human Development [grant HD33745]; FlyBase by a P41 from the NHGRI [grant HG00739] and by the Medical Research Council, London. TAIR is supported by the National Science Foundation [grant DBI-9978564]. WormBase is supported by a P41, National Resources, grant from the NHGRI [grant HG02223]; RGD is supported by an R01 grant from the NHLBI [grant HL64541]; DictyBase is supported by an R01 grant from the NIGMS [grant GM064426].