History and Advances in Genome Sequencing

Explore the history of genome sequencing, from the first bacterial genomes to the sequencing of the human genome. Discover the latest developments in ancient DNA sequencing and the current state of genome sequencing projects.

kingv

Presentation Transcript


  1. Genome sizes (sample)

  2. Some genomics history • 1995: first bacterial genome, Haemophilus influenzae, 1.8 Mbp, sequenced at TIGR • first use of whole-genome shotgun for a bacterium • Fleischmann et al. 1995 became most-cited paper of the year • 2869 citations to date • 1995-6: 2nd and 3rd bacteria published by TIGR: Mycoplasma genitalium, Methanococcus jannaschii • 1996: first eukaryote, S. cerevisiae (yeast), 13 Mbp, sequenced by a consortium of (mostly European) labs • 1997: E. coli finished (7th bacterial genome) • 1998-2001: T. pallidum (syphilis), B. burgdorferi (Lyme disease), M. tuberculosis, Vibrio cholerae, Neisseria meningitidis, Streptococcus pneumoniae, Chlamydia pneumoniae [all at TIGR] • 2000: fruit fly, Drosophila melanogaster • 2000: first plant genome, Arabidopsis thaliana • 2001: human genome, first draft • 2002: malaria genome, Plasmodium falciparum • 2002: anthrax genome, Bacillus anthracis • TODAY (Sept 4, 2008): • 744 complete microbial genomes! • 1199 microbial genomes in progress! • 476 eukaryotic genomes in progress!

  3. New directions: sequencing ancient DNA (some assembly required)

  4. J. P. Noonan et al., Science 309, 597-599 (2005)

  5. Fig. 1. Schematic illustration of the ancient DNA extraction and library construction process J. P. Noonan et al., Science 309, 597-599 (2005) Published by AAAS

  6. Fig. 2. Characterization of two independent cave bear genomic libraries. Predicted origin of 9035 clones from library CB1 (A) and 4992 clones from library CB2 (B) are shown, as determined by BLAST comparison to GenBank and environmental sequence databases. Other refers to viral or plasmid-derived DNAs. Distribution of sequence annotation features in 6,775 nucleotides of carnivore sequence from library CB1 (C) and 20,086 nucleotides of carnivore sequence from library CB2 (D) are shown as determined by alignment to the July 2004 dog genome assembly. J. P. Noonan et al., Science 309, 597-599 (2005) Published by AAAS

  7. Fig. 1. Characterization of the mammoth metagenomic library, including percentage of read distributions to various taxa H. N. Poinar et al., Science 311, 392-394 (2006) Published by AAAS

  8. Journals • The very best: • Science • www.sciencemag.org • Nature • www.nature.com/nature • PLoS Biology • www.plosbiology.org

  9. Bioinformatics Journals • Bioinformatics • bioinformatics.oxfordjournals.org • BMC Bioinformatics • www.biomedcentral.com/bioinformatics • PLoS Computational Biology • compbiol.plosjournals.org • Journal of Computational Biology • www.liebertpub.com/cmb

  10. Radically new journals • PLoS ONE • www.plosone.org • Biology Direct • www.biology-direct.com • Reviewers’ comments are public • Both journals can be annotated by readers • Papers can be negative results, confirmations of other results, or brand new

  11. Genomics Journals • Genome Biology • genomebiology.com • Genome Research • www.genome.org • Nucleic Acids Research • nar.oxfordjournals.org • BMC Genomics • www.biomedcentral.com/bmcgenomics

  12. Before assembly… • … we need to cover a basic sequence alignment algorithm

  13. Sequence Alignment • When we have very similar sequences: • Closely related species • Very little changed sequence • Small differences can be very important • Computationally “easy” to align • Assembly ONLY deals with these • When sequences are not so similar: • Distantly related species • Most positions changed • Sequences that are most highly conserved are under the strongest selective (evolutionary) pressure. • E.g., some genes in humans and E. coli clearly have a common ancestor, so the proteins can be aligned • Computationally “difficult” to align

  14. Sequence Alignment • Algorithms for sequence alignment • Choose best alignment, subject to some mutation model. • A common (but overly simplistic) model for DNA mutations is called “edits”, which counts the number of substitutions, insertions and deletions. • The resulting alignment suggests a possible “history” for the sequence. This slide and subsequent alignment slides courtesy of Nathan Edwards, available at www.umiacs.umd.edu/~nedwards/teaching/CMSC858E_Fall_2005/

  15. Example Alignments • ACGTCTAG • ||*****^ • ACTCTAG- • 2 matches, 5 mismatches, 1 not aligned

  16. Example Alignments • ACGTCTAG • ^**||||| • -ACTCTAG • 5 matches, 2 mismatches, 1 not aligned

  17. Example Alignments • ACGTCTAG • ||^||||| • AC-TCTAG • 7 matches, 0 mismatches, 1 not aligned • Edit distance here = 1
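The match, mismatch, and "not aligned" counts on the three example slides can be checked mechanically. The following is a minimal Python sketch, not part of the original slides; the function name `alignment_stats` and the use of `-` as the gap symbol are assumptions made for illustration.

```python
def alignment_stats(top: str, bottom: str, gap: str = "-"):
    """Count (matches, mismatches, gap columns) of a gapped pairwise alignment."""
    if len(top) != len(bottom):
        raise ValueError("aligned rows must have the same length")
    matches = mismatches = gaps = 0
    for a, b in zip(top, bottom):
        if a == gap or b == gap:
            gaps += 1          # column where one sequence is "not aligned"
        elif a == b:
            matches += 1
        else:
            mismatches += 1
    return matches, mismatches, gaps


# The three alignments of ACGTCTAG against ACTCTAG shown on the slides.
for bottom in ("ACTCTAG-", "-ACTCTAG", "AC-TCTAG"):
    print(bottom, alignment_stats("ACGTCTAG", bottom))
```

Under the edit model used here, the cost of an alignment is mismatches plus gap columns, so the third alignment (0 mismatches, 1 gap) is the one that gives edit distance 1.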

  18. Example Alignments • ...AACTGAGTTTACGCGCATAGA... • |^^^||^|^^| • T---CG-A--G • Many equally good alignments! • Even exact matching sequence can be found (at random) in long enough sequences

  19. Global Alignment problem • Given two related sequences, S (length n) and T (length m), find an alignment of S and T. • Edit distance: minimum number of substitutions, insertions and deletions.

  20. Dynamic Programming for pairwise alignment

  21. Dynamic Programming Formulation • Definition: Let D(i,j) be the edit distance of the alignment of S[1...i] and T[1...j]. • Edit distance of S and T, then, is D(n,m). • Dynamic programming solves the global alignment problem by computing D(i,j) for all i=0...n and j=0...m.

  22. Recurrence Relation for D • Computation of D is a recursive/iterative process. • D(i,j) in terms of D(i’,j’) for i’ ≤ i and j’ ≤ j, with (i’,j’) ≠ (i,j). • Base conditions for D(i,j): • D(i,0) = i, for all i = 0,...,n • D(0,j) = j, for all j = 0,...,m

  23. Recurrence relation for D • For i > 0, j > 0: • D(i,j) = min { • D(i-1,j) + 1, • D(i,j-1) + 1, • D(i-1,j-1) + δ(S(i),T(j)) } • where δ(a,b) = 0 if a = b, and 1 otherwise

  24. Dynamic programming • D(i,j) is computed by optimally solving sub-problems • The optimal value of D(i,j) is the minimum over three optimally solved subproblems, each extended by the cost of one edit step (0 or 1)

  25. Using the recurrence • We could code this as a recursive function call... • ...but an exponential number of function evaluations • each position explores 3 alternatives • There are only (n+1)x(m+1) pairs i and j • We must be evaluating D(i,j) multiple times • Why not cache the results?
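To make the point about repeated evaluations concrete, here is a minimal Python sketch (not from the original slides) that codes the recurrence directly and counts how often it is evaluated, with and without caching. The function names are hypothetical, δ is assumed to be 0 for equal characters and 1 otherwise, and Python strings are 0-indexed, so S(i) becomes `s[i - 1]`.

```python
from functools import lru_cache


def edit_distance_naive(s: str, t: str):
    """Direct recursion on D(i,j); returns (distance, number of calls)."""
    calls = 0

    def d(i: int, j: int) -> int:
        nonlocal calls
        calls += 1
        if i == 0:
            return j                      # base condition D(0,j) = j
        if j == 0:
            return i                      # base condition D(i,0) = i
        delta = 0 if s[i - 1] == t[j - 1] else 1
        return min(d(i - 1, j) + 1, d(i, j - 1) + 1, d(i - 1, j - 1) + delta)

    return d(len(s), len(t)), calls


def edit_distance_cached(s: str, t: str):
    """Same recursion with cached results; returns (distance, distinct evaluations)."""

    @lru_cache(maxsize=None)
    def d(i: int, j: int) -> int:
        if i == 0:
            return j
        if j == 0:
            return i
        delta = 0 if s[i - 1] == t[j - 1] else 1
        return min(d(i - 1, j) + 1, d(i, j - 1) + 1, d(i - 1, j - 1) + delta)

    distance = d(len(s), len(t))
    # Each cache miss is one (i, j) pair whose value was actually computed.
    return distance, d.cache_info().misses


print(edit_distance_naive("ACGTCTAG", "ACTCTAG"))
print(edit_distance_cached("ACGTCTAG", "ACTCTAG"))
```

The cached version can compute at most (n+1)(m+1) distinct values, while the call count of the naive version grows much faster as the sequences get longer.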

  26. Using the recurrence • Compute D(i,j) bottom up. • Store the intermediate results in a table (the table we already saw). • Start with smallest (i,j) = (1,1). • Compute D(i,j) • after • D(i-1,j), D(i,j-1), and D(i-1,j-1) have been determined. • (n+1)(m+1) cells to fill, so O(nm) time.
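The bottom-up computation described above might be sketched in Python as follows. This is an illustrative sketch rather than the slides' own code: the name `edit_distance_table` is hypothetical, the table is stored as a list of rows indexed by i, and δ is assumed to be 0 for equal characters and 1 otherwise.

```python
def edit_distance_table(s: str, t: str):
    """Fill the (n+1) x (m+1) table D bottom up and return it."""
    n, m = len(s), len(t)
    D = [[0] * (m + 1) for _ in range(n + 1)]

    # Base conditions: aligning a prefix against the empty string.
    for i in range(n + 1):
        D[i][0] = i
    for j in range(m + 1):
        D[0][j] = j

    # Row by row, so D[i-1][j], D[i][j-1] and D[i-1][j-1] are already known.
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            delta = 0 if s[i - 1] == t[j - 1] else 1
            D[i][j] = min(D[i - 1][j] + 1,
                          D[i][j - 1] + 1,
                          D[i - 1][j - 1] + delta)
    return D


table = edit_distance_table("ACGTCTAG", "ACTCTAG")
for row in table:
    print(row)
print("edit distance:", table[-1][-1])
```

Each cell is filled once with a constant amount of work, which is where the O(nm) running time comes from. The full table is kept here because the traceback needs it; if only the distance is wanted, two rows suffice.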

  27. Traceback • Our dynamic programming table helps us compute the edit distance “score” • We need the actual alignment corresponding to this edit distance • The corresponding alignment can be read off, by doing a little extra accounting.

  28. Traceback • If D(i,j) == D(i-1,j) + 1, Pointer(i,j) = (i-1,j) • If D(i,j) == D(i,j-1) + 1, Pointer(i,j) = (i,j-1) • If D(i,j) == D(i-1,j-1) + δ(S(i),T(j)), Pointer(i,j) = (i-1,j-1) • Break ties arbitrarily, or keep multiple pointers

  29. Traceback • Follow the pointers from cell (n,m). • Any path to (0,0) corresponds to the (reverse of the) edits of the optimal alignment • “horizontal” pointers: insertion in S • “vertical” pointers: insertion in T • “diagonal” pointers: match or substitution • Once the table is filled, an optimal alignment can be recovered in O(n+m) time.
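A traceback along these lines can be sketched in Python as below. This is a minimal sketch under stated assumptions, not the slides' implementation: instead of storing pointers, it re-derives them from the filled table while walking back from (n,m); ties are broken in favour of the diagonal move (an arbitrary choice); gaps are written as `-`; and the function name `align` is hypothetical.

```python
def align(s: str, t: str):
    """Return (edit distance, gapped s, gapped t) for one optimal global alignment."""
    n, m = len(s), len(t)

    # Fill the table exactly as in the bottom-up computation.
    D = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        D[i][0] = i
    for j in range(m + 1):
        D[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            delta = 0 if s[i - 1] == t[j - 1] else 1
            D[i][j] = min(D[i - 1][j] + 1, D[i][j - 1] + 1, D[i - 1][j - 1] + delta)

    # Walk back from (n, m) to (0, 0); each step emits one alignment column.
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        delta = 0 if (i > 0 and j > 0 and s[i - 1] == t[j - 1]) else 1
        if i > 0 and j > 0 and D[i][j] == D[i - 1][j - 1] + delta:
            top.append(s[i - 1]); bottom.append(t[j - 1])   # match or substitution
            i, j = i - 1, j - 1
        elif i > 0 and D[i][j] == D[i - 1][j] + 1:
            top.append(s[i - 1]); bottom.append("-")        # s has an unaligned character
            i -= 1
        else:
            top.append("-"); bottom.append(t[j - 1])        # t has an unaligned character
            j -= 1

    # Columns were collected from the end, so reverse them.
    return D[n][m], "".join(reversed(top)), "".join(reversed(bottom))


distance, top, bottom = align("ACGTCTAG", "ACTCTAG")
print(distance)
print(top)
print(bottom)
```

The walk takes at most n+m steps because every step decreases i, j, or both. A different tie-breaking rule can return a different alignment with the same edit distance.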

  30. Original references • T.F. Smith and M.S. Waterman, Identification of common molecular subsequences. J. Molecular Biology (1981), 147(1):195-7. • Altschul SF, Gish W, Miller W, Myers EW, and Lipman DJ. Basic local alignment search tool. J. Molecular Biology (1990), 215(3):403-10. • - 24,113 citations!
