1 / 35

Overture (Motivating bioinformatics)

Overture (Motivating bioinformatics). CS 498 SS Saurabh Sinha. Special issue of journal Science, July 1, 2005. .

ellis
Télécharger la présentation

Overture (Motivating bioinformatics)

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Overture(Motivating bioinformatics) CS 498 SS Saurabh Sinha

  2. Special issue of journal Science, July 1, 2005.

  3. >What Is the Universe Made Of?>What is the Biological Basis of Consciousness?>Why Do Humans Have So Few Genes?>To What Extent Are Genetic Variation and Personal Health Linked?>Can the Laws of Physics Be Unified?>How Much Can Human Life Span Be Extended?>What Controls Organ Regeneration?>How Can a Skin Cell Become a Nerve Cell?>How Does a Single Somatic Cell Become a Whole Plant?>How Does Earth's Interior Work?>Are We Alone in the Universe?>How and Where Did Life on Earth Arise?>What Determines Species Diversity?>What Genetic Changes Made Us Uniquely Human?>How Are Memories Stored and Retrieved?>How Did Cooperative Behavior Evolve?>How Will Big Pictures Emerge from a Sea of Biological Data?>How Far Can We Push Chemical Self-Assembly?>What Are the Limits of Conventional Computing?>Can We Selectively Shut Off Immune Responses?>Do Deeper Principles Underlie Quantum Uncertainty and Nonlocality?>Is an Effective HIV Vaccine Feasible?>How Hot Will the Greenhouse World Be?>What Can Replace Cheap Oil -- and When?>Will Malthus Continue to Be Wrong?

  4. Where does bioinformatics come in ?

  5. “Why do humans have so few genes ?”

  6. A simple organism Environmental signal GENE Response (protein) Raw materials

  7. A simple organism GENE1 GENE2 GENE3

  8. GENE1 GENE2 GENE3 GENE4 GENE5 GENE6 GENE7 GENE8 GENE9 GENE10 A simple organism

  9. GENE1 GENE2 GENE3 GENE4 GENE5 GENE6 GENE7 GENE8 GENE9 GENE10 A complex organism Complex circuit of interactions

  10. Regulatory networks • Genes are switches, transcription factors are (one type of) input signals, proteins are outputs • Proteins (outputs) may be transcription factors and hence become signals for other genes (switches) • This may be the reason why humans have so few genes (the circuit, not the number of switches, carries the complexity) • Bioinformatics can unravel such networks, given the genome (DNA sequence) and gene activity information

  11. Decoding the regulatory network: Method 1 • Find patterns (“motifs”) in DNA sequence that occur more often than expected by chance • These are likely to be binding sites for transcription factors • Knowing these can tell us if a gene is regulated by a transcription factor (i.e., the “switch”)

  12. How to find motifs ? • One method called “Gibbs sampling” • A special kind of “Markov chain Monte Carlo” sampling technique popular in physics and computer science • We shall see such an approach (Lawrence et al. 1992) later in the course

  13. Decoding the regulatory network: Method 2

  14. Common problem in bio- informatics: Integration of different types of information, in principled manner More on method 2 • Paper from Nature journal, Sep 2004. • We shall see this later in the course • Integrates multiple, heterogeneous sources of information • Multiple species conservation • What is functional is probably conserved • How to find “conserved” sequence ? (Next topic) • “ChIP-on-chip” data • Which transcription factors bind which sequences ? A high throughput, low resolution measurement

  15. Sequence alignment • The staple of a bioinformatician’s diet • Are two sequences similar ? Are portions of two sequences similar ? Which portions are these ? • Several algorithms that use dynamic programming, a popular technique in computer science • Realistic versions of the problem are NP-hard, approximation algorithms abound. • Can we handle sequences of length ~109 ?

  16. Sequence alignment • Blanchette et al. (2004) in the journal Genome Research • We shall see this paper later in the course • Efficient algorithm for genome-wide multiple sequence alignment • Uses the “divide and conquer” approach, as well as “dynamic programming” • Can help in reconstruction of “ancestral genomes” • Jurassic Park MMVI ? ! ? !

  17. On counting genes • The original question was “Why do humans have so few genes?” • How do we know how many genes there are in the human genome ? (And where they are in the genome) • Experiments can be designed, but bioinformatics plays a major role

  18. Gene prediction • The task of predicting the locations of genes in a new genome (“annotation”) • Several gene prediction software exist • We shall see one such approach (Lukashin & Borodovsky 1998, in the journal Nucleic Acids Research) later in the course. • The more sophisticated ones use Hidden Markov models (HMM) and multiple species comparison

  19. “What controls organ regeneration ?”“How does a single somatic cell become a whole plant ?”

  20. Regeneration is mysterious, but do we understand the “generation” part of “regeneration” • Developmental biology • The timeline from a single cell (with genetic material from mother and father) to a multicellular embryo, and to an adult • A paradox : All cells in the adult body have the same DNA, then how come different cells are different ?

  21. … and to this ? Drosophila (fruitfly) How does a single cell lead to this ? …

  22. Answer: Regulatory networks (Again !) • Bioinformatics used to scan entire genome for regions that participate in “segmenting” the embryo • Hidden Markov models, a popular technique in signal processing, used to detect such regions • Rajewsky et al. (2002) in the journal “BMC Bioinformatics” • Multiple species comparison aids discovery • Evolutionary models to integrate multiple species information in the HMM framework • Sinha et al. (2004) in the journal “BMC Bioinformatics”

  23. “How did cooperative behavior evolve?”

  24. A related question • What is the genetic (molecular) basis of social behavior ? • Social behavior in honey bees • Young worker bees are nurses in the hive; older ones go out to forage • This behavioral maturation is determined by needs of colony • Shortage of foragers => some nurses will become foragers prematurely • Bees respond to social cues. • What is the genetic basis of this ?

  25. Bioinformatics of social behavior • UIUC team (Sinha, Ling, Whitfield, Zhai, Robinson) scanned the genome to understand this • Attend the BIO seminar to learn about this • Regulatory network of social behavior • Statistical tools, such as Hypergeometric test, partial correlation analysis, threshold-based classification, support vector machine classifier, etc. used for this project

  26. “How will big pictures emerge from a sea of biological data?”

  27. The sea • Genomes: 3 x 109 bp of human genome • Similar numbers for other genomes: mouse, rat, dog, chicken, chimp etc. • Microarray: snapshots of 1000s of genes’ activities at one time and condition. Thousands of microarrays. • ChIP-on-chip data: measurements of a transcription factor’s binding affinity for 1000s of genes (promoters).

  28. Big pictures • An example: Segal et al. (2004) in the journal Nature Genetics • We shall see this work later in the course • “A module map showing conditional activity of expression modules in cancer” (Segal et al.) • Integrates sequence, motif, and microarray data through statistical tests to predict involvement of particular genes in particular types of cancer

  29. The sea of biological data • Biological literature, capturing decades of painstaking experimental work on genetics and molecular biology • Can we glean useful information from this vast body of knowledge ? • Biological literature mining. • Natural language processing • Text Information Retrieval (statistical approaches) • Ling et al. (2005) in Pacific Symposium on Biocomputing. On “Gene summarization” • We shall see this paper later in the course

  30. Some other challenges • Protein structure prediction • Can we predict the 3-D structure of a protein from its amino acid sequence ? • Why ? • One good reason: structure gives clues about function. If we can tell the structure, we can perhaps tell the function • We can design amino acid sequences that will fold into proteins that do what we want them to do. Drug design !! • One approach: neural networks, a popular technique in comptuer science, applied to this problem (Jones, 1999 in the J. of Mol Bio.)

  31. Some other challenges • “Metagenomics” • Most studies to date are on genomes of one species • A sample from the soil contains hundreds of bacteria, thousands of viruses. Can we study all of these ? • Bioinformatics is indispensable !! • New type of data, new types of algorithms

  32. Many more challenges • New types of data come due to technological breakthroughs in biology • High throughput data carries unprecedented amount of information • Too much noise • Bioinformatics removes the noise and reveals the truth

  33. Bioinformatics • Is not about one problem (e.g., designing better computer chips, better compilers, better graphics, better networks, better operating systems, etc.) • Is about a family of very different problems, all related to biology, all related to each other • How can computers help solve any of this family of problems ?

  34. Bioinformatics and You • You can learn the tools of bioinformatics • These tools owe their origin to computer science, information theory, probability theory, statistics, etc. • You can learn the language of biology, enough to understand what the problems are • You can apply the tools to these problems and contribute to science

More Related