
CSCI6904 Genomics and Biological Computing


zahur

Presentation Transcript


  1. CSCI6904 Genomics and Biological Computing. Lecture 3 – Conceptual Biology: Cells, Gene Circuits

  2. Overview. Computing in biological systems: Cells compute information and react programmatically to various situations. We will take a brief look at what a cell is and how cells “compute”. Evolutionary emergence of networks: These circuits of gene products arise in a stochastic manner. We will take a quick look at how this random walk results in a combinatorial strategy for evolving solutions. Investigating networks: None of these networks is visible; investigating the relationships in the physical world is a resource-consuming operation. Building knowledge models of cells using text mining: We present a test case called GeneWays.

  3. Cells

  4. Scope of molecular biology. Molecular biology tries to organize a stochastically evolved system comprising hundreds of thousands of components. None of these components can be seen, even under the most powerful microscopes. They are usually present at the 10^-8 to 10^-12 gram scale. They degrade in a matter of seconds to hours. The bottom line: everything we know about this system comes from fragments of information, many of which are going to be refuted over time.

  5. Cells as processors

  6. Scope of biological research. Research is usually structured such that individual contributions can be pieced together into a “pathway”.

  7. Scope of biological research. Research is usually structured such that individual contributions can be pieced together into a “pathway”. [Figure labels: essential oils (plants), sugar, amino acids, eye pigments, vitamin K, sex hormones, bile]

  8. Networks. How do they come into being? Combinatorial assembly during a stochastic process. What is done to understand the main pathways? Grasping even the smallest facts about one edge in the graph is a feat.

  9. Evolutionary quandary. Intelligent design opposition to evolution of complex systems. [Figure: pathway diagram with labels a, b, g and A, B, C, D]

  10. Evolutionary quandary. Intelligent design opposition to evolution of complex systems. [Figure: pathway diagram with labels a, b, g and A, B, C, D; annotation: “Useless metabolites”]

  11. Evolutionary quandary. Intelligent design opposition to evolution of complex systems. [Figure: A and D only; annotation: “Impossible”]

  12. Evolutionary quandary. Intelligent design opposition to evolution of complex systems. [Figure: pathway diagram with labels a, b, g and A, B, C, D] Therefore, the pathway A->D had to be designed by an intelligent entity which had the knowledge of the intended purpose of the pathway!

  13. Closer look at high-level gene organization. A modular system: proteins can be broken down into domains. A combinatorial effect: domains can assemble in a combinatorial fashion to try out a vast array of potential biological activities.

  14. Proteins are made of domains. Proteins are organized into domains. http://www.ncbi.nlm.nih.gov Transcription factor eF1eF1/ (PDB: 1IJF)

  15. Proteins are made of domains. Domains have several interesting properties. http://www.ncbi.nlm.nih.gov Transcription factor eF1eF1/ (PDB: 1IJF)

  16. Proteins are made of domains. Domains fold onto themselves such that it is possible to express them separately (in most cases). They are small relative to actual proteins, which may make it easier for them to fold rapidly into the right conformation. Transcription factor eF1eF1/ (PDB: 1IJF)

  17. Proteins are made of domains. They usually provide a biological function through binding or catalysis. Transcription factor eF1eF1/ (PDB: 1IJF)

  18. A stochastic process

  19. A molecular network = An interaction

  20. Interfaces are expensive to evolve. Interfaces are very sensitive to mutation as they must provide a perfect match. Transcription factor eF1/ (PDB: 1IJF)

  21. Network of metabolites. Metabolites essentially form a network with a scale-free property, which parallels the stochastic assembly of domains. At least, this appears to be true with the data available so far. http://www.genego.com/about/products.shtml Rzhetsky and Gomez, 2001. Bioinformatics, 17:988-996

  22. Evolutionary quandary. Back to our A to D problem. [Figure: pathway diagram with labels a, b, g and A, B, C, D] An observed pathway is therefore simply a path connecting an input molecule and a required output. Each edge can be seen as a gene product (protein). Overall, the pathway offers some kind of advantage to the host organism. With positive selection, the pathway gets better and looks as if it was designed for a specific purpose.
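
A minimal sketch of the "pathway as a path" idea: metabolites are nodes, each edge is a gene product, and a pathway is any route from an input molecule to a required output. The reaction table and the enzyme names below are hypothetical placeholders, not a transcription of the slide's figure.

```python
from collections import deque

# Hypothetical reaction graph: (substrate, product) -> gene product catalysing the step.
REACTIONS = {
    ("A", "B"): "enzyme_1",
    ("B", "C"): "enzyme_2",
    ("C", "D"): "enzyme_3",
    ("B", "E"): "enzyme_4",  # a side branch that does not lead to D
}

def find_pathway(reactions, source, target):
    """Breadth-first search; returns the shortest list of
    (substrate, product, enzyme) steps, or None if target is unreachable."""
    adjacency = {}
    for (substrate, product), enzyme in reactions.items():
        adjacency.setdefault(substrate, []).append((product, enzyme))

    queue = deque([(source, [])])
    visited = {source}
    while queue:
        node, steps = queue.popleft()
        if node == target:
            return steps
        for product, enzyme in adjacency.get(node, []):
            if product not in visited:
                visited.add(product)
                queue.append((product, steps + [(node, product, enzyme)]))
    return None

pathway = find_pathway(REACTIONS, "A", "D")
```

Nothing in the search knows about a "purpose": the route from A to D is just one of the paths that happen to exist in the graph, which is the point the slide makes.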

  23. Scope of biological research. Density of knowledge-generating statements per article with respect to source journals.

  24. Nature of the problem: building a global model from plain-English text sources (size, complexity). What is done in the GeneWays project: the workflow of their integrated system. Where it becomes a bioinformatics problem. What I think it really means in the long run: the relationship between research and researchers. (The right information system will be the next big thing.)

  25. Human limitations and data-heavy and knowledge-heavy disciplines. Synthesizing; hypothesis building; visualizing; record keeping; motivation; modeling; knowledge; streamlining; structuring; (directing); (changing the way research is communicated?)

  26. In knowledge-intensive fields, the connection between investigators and background information is thinning. [Diagram labels: Experiment, Hypothesis, Information (data, concepts), Motivation, Data, Knowledge, Bioinformatics, Computational Biology; annotation: “This arrow does not scale up as quickly as the others”]

  27. Scope of GeneWays. Build a model of molecular biology from plain-English publications. Allow a more holistic approach to hypothesis formulation.

  28. Scope of GeneWays. ~3 million statements; 150K full-text articles.

  29. Scope of GeneWays. What are we looking for, ultimately? Protein A binds gene B; gene B regulates gene C; gene C expresses protein D; protein D inactivates protein A.
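
One way to picture these four statements is as (subject, verb, object) triples forming a directed graph. The sketch below stores them that way and follows the chain from protein A to see that it closes into a feedback loop. The triple representation and the `follow_chain` helper are illustrative assumptions, not the GeneWays data model, and the helper assumes at most one outgoing statement per subject.

```python
# The four statements from the slide, as (subject, verb, object) triples.
STATEMENTS = [
    ("protein A", "binds", "gene B"),
    ("gene B", "regulates", "gene C"),
    ("gene C", "expresses", "protein D"),
    ("protein D", "inactivates", "protein A"),
]

def follow_chain(statements, start):
    """Follow outgoing statements from `start` until the chain ends or a node repeats.
    Returns (steps, is_loop), where is_loop is True if the chain returns to `start`."""
    next_edge = {subject: (verb, obj) for subject, verb, obj in statements}
    steps, node, seen = [], start, set()
    while node in next_edge and node not in seen:
        seen.add(node)
        verb, obj = next_edge[node]
        steps.append((node, verb, obj))
        node = obj
    return steps, bool(steps) and node == start

steps, is_loop = follow_chain(STATEMENTS, "protein A")
```

Individually, each statement may come from a different article; the loop only becomes visible once they are assembled into one model.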

  30. Scope of GeneWays. Document sorting; term identification; disambiguation; information extraction; ontology; visualization.

  31. Details of GeneWays. Document sorting: from abstracts, using either clustering (unsupervised) or Naïve Bayes. The system uses a mixture of methods to achieve the binary classification relevant / irrelevant.
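
As a rough illustration of the Naïve Bayes option, here is a small multinomial classifier with add-one smoothing over whitespace tokens. The training abstracts are invented toy examples, and nothing here reflects the actual features or mixture of methods used in GeneWays.

```python
import math
from collections import Counter

def train_naive_bayes(labeled_docs):
    """labeled_docs: iterable of (text, label). Returns (word_counts, doc_counts, vocab)."""
    word_counts, doc_counts, vocab = {}, Counter(), set()
    for text, label in labeled_docs:
        tokens = text.lower().split()
        word_counts.setdefault(label, Counter()).update(tokens)
        doc_counts[label] += 1
        vocab.update(tokens)
    return word_counts, doc_counts, vocab

def classify(text, model):
    """Return the label with the highest log-posterior; unseen words are ignored."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, -math.inf
    for label, counts in word_counts.items():
        score = math.log(doc_counts[label] / total_docs)  # log prior
        denominator = sum(counts.values()) + len(vocab)   # add-one smoothing
        for token in text.lower().split():
            if token in vocab:
                score += math.log((counts[token] + 1) / denominator)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Toy training set (hypothetical abstracts).
TRAIN = [
    ("kinase phosphorylates protein substrate", "relevant"),
    ("gene expression regulates transcription factor", "relevant"),
    ("conference registration fee deadline", "irrelevant"),
    ("editorial board announcement deadline", "irrelevant"),
]
MODEL = train_naive_bayes(TRAIN)
label = classify("protein kinase regulates gene", MODEL)
```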

  32. Details of GeneWays. Tagging terms: especially hard in biology(?). Methods: morphological rules, grammatical rules, rules/dictionary methods, SVM, HMM, Naïve Bayes, decision trees. Recall is in the 70s to 80s (percent).

  33. Details of GeneWays. Tagging terms: HTML -> XML-like format.

  34. Details of GeneWays. Tagging terms, vertices: Gene, Protein, Geneorprotein, Process, Smallmolecules, Species, Complex, Disease, Domain (protein).

  35. Details of GeneWays. Tagging terms, edges: N-acylate, acetylate, N-glycosylate, O-glycosylate, bind, degrade, (de-)methylate, (de-)phosphorylate, [make|break] bond, express, transcribe, release, interact, substitute, … n = 125 (2001).

  36. Details of GeneWays. Learning new verbs: the AVAD system. χ² statistics of occurrence of terms before and after tagged items. Log-likelihood test based on frequency of occurrence in corpus-specific literature. “Co-localize” and “synergize” were discovered using AVAD.
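
The slide does not give AVAD's exact formulation, so the following is only a generic sketch of how a χ² statistic could score a candidate verb: a 2x2 contingency table compares how often the word occurs next to tagged items versus elsewhere. The table layout and the counts are assumptions made for illustration.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic (no continuity correction) for the table
        [[a, b],
         [c, d]]
    Hypothetical layout used here:
      a = candidate word next to a tagged item    b = candidate word elsewhere
      c = other words next to a tagged item       d = other words elsewhere
    """
    n = a + b + c + d
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    if denominator == 0:
        return 0.0  # a row or column is empty; no association can be measured
    return n * (a * d - b * c) ** 2 / denominator

# Invented counts: the candidate word sits next to tagged items far more often
# than its overall frequency would predict, so the statistic is large.
score = chi_square_2x2(30, 10, 970, 8990)
```

A word whose score exceeds a chosen threshold would be proposed as a new relationship verb for review; the threshold itself is a design choice the slide does not specify.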

  37. Nomenclature. There are obscure ways to agree: “Protein kinase A phosphorylates protein B” is the same as:

  38. Nomenclature. There are obscure ways, period. Genes named “Forever Young” in Arabidopsis thaliana (mustard family) and “Mothers against decapentaplegic” in fruit fly.

  39. Never mind the jargon! Fight fire with fire: they developed a method that uses BLAST, a popular sequence database search algorithm, to mine for biological terms (Krauthammer et al., 2000. Gene, 259:245-252).

  40. Never mind the jargon! Fight fire with fire: N-(2-Hydroxyethyl)piperazine-N'-(2-ethanesulfonic acid) (HEPES); 2-(N-Morpholino)ethanesulfonic acid (MES); 3-(N-Morpholino)propanesulfonic acid (MOPS); N-tris[Hydroxymethyl]methyl-3-aminopropanesulfonic acid (TAPS); tris(Hydroxymethyl)aminomethane (TRIS).

  41. Details of GeneWays. Disambiguation: “il2” and “interleukin-2” can both be used to refer to the gene, the protein, or the mRNA.

  42. Details of GeneWays. Disambiguation: use the canonical name as much as possible. Learn semantic classes.
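
A toy sketch of the two steps: map surface forms to one canonical name, then guess whether a mention refers to the gene, the protein, or the mRNA from words in its sentence. The synonym table and cue words are invented for illustration; the slide says the semantic classes are learned, whereas this sketch hard-codes them for brevity.

```python
# Hypothetical synonym table; a real system would draw on curated resources.
CANONICAL = {
    "il2": "interleukin-2",
    "il-2": "interleukin-2",
    "interleukin 2": "interleukin-2",
    "interleukin-2": "interleukin-2",
}

# Hypothetical context cues per semantic class.
CLASS_CUES = {
    "gene": {"promoter", "locus", "allele"},
    "mrna": {"mrna", "transcript"},
    "protein": {"receptor", "secreted", "phosphorylated"},
}

def canonicalize(term):
    """Return the canonical name for a term, or the term itself if unknown."""
    key = " ".join(term.lower().split())
    return CANONICAL.get(key, term)

def guess_semantic_class(sentence):
    """Pick the class with the most cue words in the sentence; fall back to the
    undecided 'geneorprotein' class when no cue is present."""
    tokens = set(sentence.lower().replace(",", " ").split())
    scores = {cls: len(tokens & cues) for cls, cues in CLASS_CUES.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "geneorprotein"
```

For example, `canonicalize("IL2")` gives `"interleukin-2"`, and `guess_semantic_class("il2 mRNA levels increased")` gives `"mrna"`.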

  43. Details of GeneWays. Information extraction: correlation methods, HMM, formal grammar (lexicon). GeneWays uses the NLP system GENIES, which attempts complete parsing, then defaults to segmenting and partial parsing.

  44. Details of the NLP system. GENIES (GENomics Information Extraction System). Based on MedLEE (a medical NLP system). The term-tagging component uses rules and external knowledge. Handles nested relationships, and normalized and agentive forms of verbs: inhibit, inhibition and inhibitor.

  45. Details of GeneWays. Information simplification: convert nested relationships into a collection of binary statements.
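
A minimal sketch of what such a simplification could look like, assuming a nested relationship is represented as a (subject, verb, object) tuple whose subject or object may itself be such a tuple. Each inner relationship becomes its own binary statement and is referred to by an identifier such as "S1". The representation and identifier scheme are assumptions, not the GeneWays simplifier.

```python
def simplify(relation, statements=None):
    """Flatten a nested (subject, verb, object) relation into binary statements.

    Returns (reference, statements). `reference` is the plain term for a string,
    or an id like 'S2' for a relation; `statements` lists the binary statements
    in the order they were emitted (inner relations first)."""
    if statements is None:
        statements = []
    if isinstance(relation, str):
        return relation, statements
    subject, verb, obj = relation
    subject_ref, _ = simplify(subject, statements)
    object_ref, _ = simplify(obj, statements)
    statements.append((subject_ref, verb, object_ref))
    return f"S{len(statements)}", statements

# "Protein A inhibits the phosphorylation of protein C by protein B."
nested = ("protein A", "inhibits", ("protein B", "phosphorylates", "protein C"))
_, binary_statements = simplify(nested)
# binary_statements == [("protein B", "phosphorylates", "protein C"),
#                       ("protein A", "inhibits", "S1")]
```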

  46. Details of GeneWays. Ontology: knowledge models.

  47. Uses for GeneWays. Visualization: synthesis and querying facility. The only filter described at the time of publication is based on the number of statements supporting an edge.

  48. Uses for GeneWays. Visualization: synthesis and querying facility.
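
A support-count filter of this kind can be sketched in a few lines: count how many extracted statements back each edge and keep the edges at or above a threshold. The statements and the threshold value are hypothetical.

```python
from collections import Counter

def filter_by_support(statements, min_support):
    """Keep edges backed by at least `min_support` statements.
    `statements` is a list of (subject, verb, object) triples, one per extraction."""
    support = Counter(statements)
    return {edge: count for edge, count in support.items() if count >= min_support}

# Hypothetical extractions: the same edge can be stated in several articles.
EXTRACTED = [
    ("protein A", "binds", "gene B"),
    ("protein A", "binds", "gene B"),
    ("protein A", "binds", "gene B"),
    ("gene B", "regulates", "gene C"),
]
kept = filter_by_support(EXTRACTED, min_support=2)
```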

  49. Validation of GeneWays. Expert review: 125 statements out of 2500 were erroneous or “phantoms”. Of these 125: 100 were due to term identification; 12 were NLP errors; 5 were simplifier errors; 8 were actually correct! System’s precision: 95%. Expert’s precision: 93.5%. Such a system should be seen as a means to enrich

  50. Validation of GeneWays. Redundancy: redundant statements are not necessarily “more true”. Redundancy can be due to indirect relationships.
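
The system's precision follows directly from the counts on the slide, and the error breakdown sums to the 125 flagged statements. The short check below reproduces that arithmetic; the variable and function names are illustrative only.

```python
TOTAL_STATEMENTS = 2500
FLAGGED = 125  # statements the expert marked as erroneous or "phantoms"

# Breakdown of the 125 flagged statements, as listed on the slide.
CAUSES = {
    "term identification": 100,
    "NLP errors": 12,
    "simplifier errors": 5,
    "actually correct": 8,
}

def precision(total, erroneous):
    """Fraction of statements not flagged as erroneous."""
    return (total - erroneous) / total

system_precision = precision(TOTAL_STATEMENTS, FLAGGED)  # 2375 / 2500 = 0.95

# Share of each cause among the flagged statements.
shares = {cause: count / FLAGGED for cause, count in CAUSES.items()}
```

Term identification accounts for 100 of the 125 flagged statements (80%), which suggests it is the step where improvements would pay off most.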
