
EECS 730 Introduction to Bioinformatics Function

Luke Huan, Electrical Engineering and Computer Science, http://people.eecs.ku.edu/~jhuan/


Presentation Transcript


  1. EECS 730 Introduction to Bioinformatics: Function • Luke Huan, Electrical Engineering and Computer Science • http://people.eecs.ku.edu/~jhuan/

  2. Overview • Gene ontology • Challenges • What is gene ontology • Constructing the gene ontology • Text mining, natural language processing and information extraction: an introduction • Summary

  3. Ontology • <philosophy> A systematic account of Existence. • <artificial intelligence> (From philosophy) An explicit formal specification of how to represent the objects, concepts and other entities that are assumed to exist in some area of interest and the relationships that hold among them. • <information science> The hierarchical structuring of knowledge about things by subcategorising them according to their essential (or at least relevant and/or cognitive) qualities. • This is an extension of the previous senses of "ontology" (above) which has become common in discussions about the difficulty of maintaining subject indices. • The philosophy of indexing everything in existence?

  4. Aristotle’s (384-322 BC) Ontology • Substance • plants, animals, ... • Quality • Quantity • Relation • Where • When • Position • Having • Action • Passion

  5. Ontology and -informatics • In information sciences, ontology is better defined as: “a domain of knowledge, represented by facts and their logical connections, that can be understood by a computer”. (J. Bard, BioEssays, 2003) • “Ontologies provide controlled, consistent vocabularies to describe concepts and relationships, thereby enabling knowledge sharing” (Gruber, 1993)

  6. Information Exchange in Bio-sciences • Basic challenges: • Definition, definition, definition • What is a name? • What is a function?

  7. Cell

  8. Cell

  9. Cell

  10. Cell

  11. Cell • Image from http://microscopy.fsu.edu

  12. What’s in a name? • The same name can be used to describe different concepts

  13. What’s in a name? • Glucose synthesis • Glucose biosynthesis • Glucose formation • Glucose anabolism • Gluconeogenesis • All refer to the process of making glucose from simpler components
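A controlled vocabulary addresses the synonym problem by mapping every name to one shared identifier. The following is a minimal sketch of that idea, not how any GO tool is implemented: the identifier `EXAMPLE:0000001` and the helper names `normalize` and `same_concept` are hypothetical, introduced only for illustration.

```python
# Hypothetical identifier: a stand-in for a real accession, not an actual GO ID.
CANONICAL_ID = "EXAMPLE:0000001"

# Every name from the slide points at the same concept.
SYNONYMS = {
    "glucose synthesis": CANONICAL_ID,
    "glucose biosynthesis": CANONICAL_ID,
    "glucose formation": CANONICAL_ID,
    "glucose anabolism": CANONICAL_ID,
    "gluconeogenesis": CANONICAL_ID,
}


def normalize(name):
    """Return the shared identifier for a free-text name, or None if unknown."""
    key = " ".join(name.lower().split())  # ignore case and extra whitespace
    return SYNONYMS.get(key)


def same_concept(name_a, name_b):
    """True when both names resolve to the same known identifier."""
    id_a, id_b = normalize(name_a), normalize(name_b)
    return id_a is not None and id_a == id_b


print(same_concept("Glucose synthesis", "gluconeogenesis"))
```

Once two databases annotate with the identifier rather than the free-text name, comparing their records reduces to comparing identifiers.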

  14. What’s in a name? • The same name can be used to describe different concepts • A concept can be described using different names → Comparison is difficult, in particular across species or across databases

  15. What is Function? The Hammer Example • Function (what) / Process (why) • Drive nail (into wood) / Carpentry • Drive stake (into soil) / Gardening • Smash roach / Pest Control • Clown’s juggling object / Entertainment

  16. Information Explosion

  17. Entering the Genome Sequencing Era • Eukaryotic genome sequences (organism: year, genome size in Mb, number of genes) • Yeast (S. cerevisiae): 1996, 12 Mb, 6,000 genes • Worm (C. elegans): 1998, 97 Mb, 19,100 genes • Fly (D. melanogaster): 2000, 120 Mb, 13,600 genes • Plant (A. thaliana): 2001, 125 Mb, 25,500 genes • Human (H. sapiens, 1st draft): 2001, ~3000 Mb, ~35,000 genes

  18. What is the Gene Ontology? A Common Language for Annotation of Genes from Yeast, Flies and Mice …and Plants and Worms …and Humans …and anything else!

  19. http://www.geneontology.org/

  20. What is the Gene Ontology? • Gene annotation system • Controlled vocabulary that can be applied to all organisms • Organism independent • Used to describe gene products (proteins and RNA) in any organism

  21. The 3 Gene Ontologies • Molecular Function = elemental activity/task • the tasks performed by individual gene products; examples are carbohydrate binding and ATPase activity • Biological Process = biological goal or objective • broad biological goals, such as mitosis or purine metabolism, that are accomplished by ordered assemblies of molecular functions • Cellular Component = location or complex • subcellular structures, locations, and macromolecular complexes; examples include nucleus, telomere, and RNA polymerase II holoenzyme

  22. Cellular Component • where a gene product acts

  23. Cellular Component

  24. Cellular Component

  25. Cellular Component • Enzyme complexes in the component ontology refer to places, not activities.

  26. Molecular Function • insulin binding • insulin receptor activity

  27. Molecular Function • activities or “jobs” of a gene product • Example: glucose-6-phosphate isomerase activity

  28. Molecular Function • A gene product may have several functions; a function term refers to a single reaction or activity, not a gene product. • Sets of functions make up a biological process.

  29. Biological Process • a commonly recognized series of events • Example: cell division

  30. Biological Process • Example: transcription

  31. Biological Process • Metabolism: degradation or synthesis of biomolecules

  32. Biological Process • Development: how a group of cells becomes a tissue

  33. Biological Process • Example: social behavior

  34. Ontology applications • Can be used to: • Formalise the representation of biological knowledge • Standardise database submissions • Provide unified access to information through ontology-based querying of databases, both human and computational • Improve management and integration of data within databases. • Facilitate data mining

  35. Gene Ontology Structure • Ontologies can be represented as directed acyclic graphs (DAG), where the nodes are connected by edges • Nodes = terms in biology • Edges = relationships between the terms • is-a • part-of
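The DAG structure can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the data model of any GO software: the class name `OntologyDAG` and its methods are hypothetical, and the example edges are illustrative rather than taken from a GO release. The point it shows is that a term may have more than one parent, and that an edge is rejected if it would create a cycle.

```python
from collections import defaultdict

ALLOWED_RELATIONS = ("is-a", "part-of")


class OntologyDAG:
    """Terms are nodes; each edge points from a child term to a parent term."""

    def __init__(self):
        # child term -> list of (relationship, parent term)
        self._parents = defaultdict(list)

    def ancestors(self, term):
        """Return every term reachable by following edges upward from `term`."""
        seen = set()
        stack = [term]
        while stack:
            current = stack.pop()
            for _relation, parent in self._parents.get(current, []):
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
        return seen

    def add_edge(self, child, relation, parent):
        """Add child --relation--> parent, refusing edges that break acyclicity."""
        if relation not in ALLOWED_RELATIONS:
            raise ValueError(f"unknown relationship: {relation!r}")
        if child == parent or child in self.ancestors(parent):
            raise ValueError("edge would create a cycle")
        self._parents[child].append((relation, parent))


# Illustrative edges only; not taken from a GO release.
dag = OntologyDAG()
dag.add_edge("membrane", "part-of", "cell")
dag.add_edge("chloroplast", "part-of", "cell")
dag.add_edge("chloroplast membrane", "is-a", "membrane")
dag.add_edge("chloroplast membrane", "part-of", "chloroplast")

print(sorted(dag.ancestors("chloroplast membrane")))
```

Because "chloroplast membrane" has two parents, the structure is a graph rather than a tree, which is why annotations made to a child term can be propagated to several ancestors at once.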

  36. Parent-Child Relationships • Chromosome • Cytoplasmic chromosome • Mitochondrial chromosome • Nuclear chromosome • Plastid chromosome • A child is a subset or an instance of a parent’s elements

  37. Parent-Child Relationships • Diagram nodes: cell, membrane, chloroplast, mitochondrial membrane, chloroplast membrane • Edge types: is-a, part-of

  38. Annotation in GO • A gene product is usually a protein but can be a functional RNA • An annotation is a piece of information associated with a gene product • A GO annotation is a Gene Ontology term associated with a gene product

  39. Terms, Definitions, IDs • Term: MAPKKK cascade (mating sensu Saccharomyces) • GO ID: GO:0007244 • Definition: OBSOLETE. MAPKKK cascade involved in transduction of mating pheromone signal, as described in Saccharomyces. • Evidence code: how the annotation was made • Definition_reference: PMID:9561267

  40. Annotation Example • Gene product: nek2 • GO term: centrosome (GO:0005813) • Evidence code: IDA (Inferred from Direct Assay) • Reference: PMID: 11956323
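The four parts of this annotation map naturally onto a small record type. The sketch below is an assumption-laden illustration, not the format of a real GO annotation file: the class `GOAnnotation` and its field names are hypothetical. The only format rule it enforces is that a GO identifier is "GO:" followed by seven digits.

```python
import re
from dataclasses import dataclass

# A GO identifier is the prefix "GO:" followed by exactly seven digits.
GO_ID_PATTERN = re.compile(r"^GO:\d{7}$")


@dataclass(frozen=True)
class GOAnnotation:
    """One association between a gene product and a GO term (hypothetical type)."""

    gene_product: str
    go_id: str
    go_term: str
    evidence_code: str
    reference: str

    def __post_init__(self):
        # Reject malformed identifiers early, before they reach a database.
        if not GO_ID_PATTERN.match(self.go_id):
            raise ValueError(f"malformed GO identifier: {self.go_id!r}")


# The annotation from the slide.
nek2 = GOAnnotation(
    gene_product="nek2",
    go_id="GO:0005813",
    go_term="centrosome",
    evidence_code="IDA",
    reference="PMID:11956323",
)

print(nek2)
```

Keeping the evidence code and the reference on every record is what lets a reader trace an annotation back to the experiment or publication that supports it.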

  41. GO Annotation

  42. GO Annotation

  43. GO Annotation

  44. Evidence Code • Indicates the type of evidence in the cited source that supports the association between the gene product and the GO term • http://www.geneontology.org/GO.evidence.html

  45. Types of evidence codes • Experimental codes: IDA, IMP, IGI, IPI, IEP • Computational codes: ISS, IEA, RCA, IGC • Author statement: TAS, NAS • Other codes: IC, ND • Two types of annotation • Manual annotation • Electronic annotation
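The grouping of evidence codes is easy to encode as a lookup table, which then supports filtering annotations by how they were produced. This is a sketch under assumptions: the function names are hypothetical, the second example record (`geneX`) is invented for illustration, and the manual/electronic split assumes that IEA (Inferred from Electronic Annotation) is the code used for annotations made without curator review.

```python
# Evidence codes grouped exactly as listed on the slide.
EVIDENCE_GROUPS = {
    "experimental": {"IDA", "IMP", "IGI", "IPI", "IEP"},
    "computational": {"ISS", "IEA", "RCA", "IGC"},
    "author statement": {"TAS", "NAS"},
    "other": {"IC", "ND"},
}


def evidence_group(code):
    """Return the group name for an evidence code, or None if unrecognized."""
    code = code.upper()
    for group, codes in EVIDENCE_GROUPS.items():
        if code in codes:
            return group
    return None


def is_electronic(code):
    """Assumption: only IEA marks an annotation made without curator review."""
    return code.upper() == "IEA"


def experimental_only(annotations):
    """Keep (gene product, GO ID, evidence code) triples backed by experiments."""
    return [a for a in annotations if evidence_group(a[2]) == "experimental"]


annotations = [
    ("nek2", "GO:0005813", "IDA"),   # from the annotation example slide
    ("geneX", "GO:0005813", "IEA"),  # hypothetical record, for illustration
]

print(experimental_only(annotations))
```

Filtering on evidence group is a common first step when an analysis should rest only on experimentally supported annotations.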

  46. Beyond GO – Open Biomedical Ontologies • Orthogonal to existing ontologies to facilitate combinatorial approaches • Share unique identifier space • Include definitions

  47. Gene Ontology and Text Mining • Derive ontology from text data • More general goal: understand text data automatically

  48. Finding GO terms for B. napus PERK1 protein (Q9ARH1) • Abstract excerpt (PubMed ID: 12374299): “In this study, we report the isolation and molecular characterization of the B. napus PERK1 cDNA, that is predicted to encode a novel receptor-like kinase. We have shown that like other plant RLKs, the kinase domain of PERK1 has serine/threonine kinase activity. In addition, the location of a PERK1-GTP fusion protein to the plasma membrane supports the prediction that PERK1 is an integral membrane protein…these kinases have been implicated in early stages of wound response…” • Function: protein serine/threonine kinase activity (GO:0004674) • Component: integral to plasma membrane (GO:0005887) • Process: response to wounding (GO:0009611)
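A naive way to find GO terms in an abstract is to scan for phrases. The sketch below is deliberately simplistic and is not how a curator or a real text-mining system works: the trigger phrases are hypothetical, hand-picked for this one excerpt, while the GO IDs and term names are the three listed on the slide. It also shows why matching on the official term name alone is not enough.

```python
# Sentences copied from the PERK1 abstract excerpt on the slide.
SENTENCES = [
    "We have shown that like other plant RLKs, the kinase domain of PERK1 "
    "has serine/threonine kinase activity",
    "PERK1 is an integral membrane protein",
    "these kinases have been implicated in early stages of wound response",
]

# Hypothetical trigger phrases mapped to the (GO ID, term name) pairs
# given on the slide.
TRIGGERS = {
    "serine/threonine kinase activity": (
        "GO:0004674", "protein serine/threonine kinase activity"),
    "integral membrane protein": (
        "GO:0005887", "integral to plasma membrane"),
    "wound response": (
        "GO:0009611", "response to wounding"),
}


def find_go_terms(sentences):
    """Return sorted (GO ID, term name) pairs whose trigger phrase occurs."""
    text = " ".join(sentences).lower()
    return sorted(term for phrase, term in TRIGGERS.items() if phrase in text)


def exact_name_hits(sentences):
    """Return GO IDs whose official term name appears verbatim in the text."""
    text = " ".join(sentences).lower()
    return sorted(go_id for go_id, name in TRIGGERS.values() if name in text)


print(find_go_terms(SENTENCES))
print(exact_name_hits(SENTENCES))
```

The hand-picked phrases recover all three terms, but none of the official term names occurs verbatim in the excerpt ("wound response" versus "response to wounding", for example). That gap between how authors write and how terms are named is what motivates the natural language processing methods introduced next.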

  49. Mining Text Data • Data Mining / Knowledge Discovery over four kinds of data: structured data, multimedia, free text, hypertext • Hypertext: <a href>Frank Rizzo</a> bought <a href>this home</a> from <a href>Lake View Real Estate</a> in <b>1992</b>. <p>... • Free text: Frank Rizzo bought his home from Lake View Real Estate in 1992. He paid $200,000 under a 15-year loan from MW Financial. • Structured data: HomeLoan (Loanee: Frank Rizzo; Lender: MWF; Agency: Lake View; Amount: $200,000; Term: 15 years) • Multimedia: Loans($200K,[map],...) • (Taken from ChengXiang Zhai, CS 397cxz, UIUC, CS – Fall 2003)

  50. Bag-of-Tokens Approaches • Documents: “Four score and seven years ago our fathers brought forth on this continent, a new nation, conceived in Liberty, and dedicated to the proposition that all men are created equal. Now we are engaged in a great civil war, testing whether that nation, or …” • Feature extraction turns each document into a token set: nation – 5, civil – 1, war – 2, men – 2, died – 4, people – 5, Liberty – 1, God – 1, … • Loses all order-specific information! • Severely limits context!
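The bag-of-tokens idea fits in a few lines of Python. This is a minimal sketch, assuming a very simple tokenizer (lowercase, letters only); the function name `bag_of_tokens` is hypothetical. The counts it produces cover only the opening quoted here, so they are smaller than the slide's counts, which appear to be taken over the whole speech.

```python
import re
from collections import Counter

OPENING = (
    "Four score and seven years ago our fathers brought forth on this "
    "continent, a new nation, conceived in Liberty, and dedicated to the "
    "proposition that all men are created equal. Now we are engaged in a "
    "great civil war, testing whether that nation"
)


def bag_of_tokens(text):
    """Lowercase the text, split it into word tokens, and count each token."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


counts = bag_of_tokens(OPENING)

# Two different word orders collapse to the same bag.
original = bag_of_tokens("our fathers brought forth a new nation")
shuffled = bag_of_tokens("a new nation brought forth our fathers")

print(counts["nation"], counts["civil"])
print(original == shuffled)
```

The second comparison makes the slide's warning concrete: once a sentence is reduced to token counts, who brought forth whom can no longer be recovered.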
