The Past, Present, and Future of DNA Sequencing

Craig A. Praul, Co-Director, Genomics Core Facility, Huck Institutes of the Life Sciences, Penn State University.


Presentation Transcript


  1. The Past, Present, and Future of DNA Sequencing. Craig A. Praul, Co-Director, Genomics Core Facility, Huck Institutes of the Life Sciences, Penn State University

  2. A very short history of DNA sequencing

  3. "I started from the conviction that, if different DNA species exhibited different biological activities, there should also exist chemically demonstrable differences between deoxyribonucleic acids." Erwin Chargaff

  4. Milestones • First isolation of DNA : 1869 (Friedrich Miescher) • Composition of nucleic acids; tetranucleotide theory : 1909 - 1940 (Phoebus Levene) • G=C and A=T; however, the G/C and A/T content of different organisms varies : 1950 (Erwin Chargaff) • G/C content measured by annealing : 1968 (Mandel and Marmur) • Maxam-Gilbert and Sanger sequencing : 1977 • Next-generation sequencing : 2005

  5. Genomes Sequenced • Virus – 3222 (Bacteriophage phi X 174, 5386 nt – 1977) • Bacteria – 2289 (Haemophilus influenzae, 1.8 x 10^6 nt – 1995) • Eukarya – 168 (S. cerevisiae, 1.2 x 10^7 nt – 1996; H. sapiens, 3 x 10^9 nt – 2001) • Archaea – 152 (Methanococcus jannaschii, 1.7 x 10^6 nt – 1996)

  6. Next-Generation Sequencing. Source: Liu et al. Journal of Biomedicine and Biotechnology, Volume 2012 (2012), Article ID 251364, 11 pages. doi:10.1155/2012/251364

  7. Changes in instrument capacity. Source: ER Mardis. Nature 470, 198-203 (2011). doi:10.1038/nature09796

  8. Sequencing Cost Source - NHGRI : http://www.genome.gov/sequencingcosts/

  9. Central Dogma of Molecular Biology (James Watson version, 1965): DNA → RNA → Protein. So once we have the genomic DNA sequence of a species, we have all of the information there is? Really?

  10. No, not really.

  11. Illumina HiSeq and MiSeq • Massively parallel • HiSeq : 150 or 180 million reads per lane • MiSeq : 15 million reads per run • Intermediate read length • HiSeq : 100 nt or 150 nt • MiSeq : 250 nt • High total output per run • HiSeq : 90 GB or 288 GB • MiSeq : 8 GB

  12. Sequencing Types • Single read • Paired-end read • Mate-pair read

  13. Library Types • Many different library preps : DNA, mate-pair, mRNA, miRNA, ChIP • Fragmentation • DNA : 300 – 500 nt • RNA : 150 – 200 nt • Attachment of appropriate adapters • Complex : flow cell binding, F & R sequencing, BC • Custom : Avoid if possible • Removal of dimers/small inserts • Amplification (or not)

  14. Applications • De novo sequencing (genomes, transcriptomes) • Resequencing (genomes, exomes, custom sequence capture) • RNA-seq (mRNA, miRNA, degradome) • ChIP-seq • Methyl-seq • RIP-seq • Amplicon

  15. De Novo Experimental Design • Estimate of genome size • Coverage (30 x – 100 x) • Sequencing type (paired-end or mate-pair) • Example: 100 MB genome, 100 x 100 nt paired-end reads, 30 x coverage • (100 MB) x (30 x coverage) = 3 GB • 3 GB / (200 nt for each pair of paired-end reads) = 15 million read pairs • Replicates
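The de novo example on slide 15 can be reproduced with a few lines of Python. This is only a sketch of the arithmetic on the slide, not a tool from the presentation; the function name `read_pairs_needed` and its parameters are hypothetical, and it assumes every base of every read contributes to coverage.

```python
def read_pairs_needed(genome_size_nt, coverage, read_length_nt, reads_per_fragment=2):
    """Estimate how many fragments must be sequenced to reach a target coverage.

    With reads_per_fragment=2 (paired-end) the result is a number of read pairs.
    Assumption: every sequenced base is usable (no filtering, no duplicates).
    """
    total_bases = genome_size_nt * coverage                   # e.g. 100 MB x 30 = 3 GB
    bases_per_fragment = read_length_nt * reads_per_fragment  # e.g. 2 x 100 nt = 200 nt
    # Integer ceiling division, so a partial fragment counts as one more fragment.
    return (total_bases + bases_per_fragment - 1) // bases_per_fragment


# Slide example: 100 MB genome, 30 x coverage, 100 nt paired-end reads.
pairs = read_pairs_needed(100_000_000, 30, 100)
print(pairs)  # 15000000, i.e. 15 million read pairs
```

Raising `coverage` to 100 (the upper end of the range on the slide) scales the requirement linearly, to 50 million read pairs for the same genome.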

  16. Resequencing : Sequence Capture

  17. RNA-seq Experimental Design • Estimate of transcriptome size (1-5% of genome ?) • Coverage (30 x ?) • mRNA or rRNA-depleted RNA • Relative abundance of transcripts you are interested in • Sequencing type (single read or paired-end) • Simple transcriptome vs. complex transcriptome • Splice variants • Example: 3 GB genome, 100 nt single reads • (3 GB genome) x (5% transcriptome) = 150 MB transcriptome • (150 MB transcriptome) x (30 x coverage) = 4.5 GB total sequence • 4.5 GB / (100 nt for each read) = 45 million reads • Replicates : Yes!!!! • Biological, not technical
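The RNA-seq estimate on slide 17 follows the same pattern, with one extra step for the transcribed fraction of the genome. The sketch below is illustrative only: the function name `rnaseq_reads_needed` and its parameters are hypothetical, and it assumes uniform coverage across transcripts, which real transcriptomes do not have (abundant transcripts take most of the reads).

```python
def rnaseq_reads_needed(genome_size_nt, transcribed_percent, coverage, read_length_nt):
    """Estimate the number of single reads needed per sample for RNA-seq.

    transcribed_percent is the share of the genome assumed to be transcribed
    (the slide suggests 1-5 %). Assumption: coverage is spread evenly.
    """
    transcriptome_nt = genome_size_nt * transcribed_percent // 100  # 3 GB x 5 % = 150 MB
    total_nt = transcriptome_nt * coverage                          # 150 MB x 30 = 4.5 GB
    # Integer ceiling division: a partial read still requires a whole read.
    return (total_nt + read_length_nt - 1) // read_length_nt


# Slide example: 3 GB genome, 5 % transcribed, 30 x coverage, 100 nt single reads.
reads = rnaseq_reads_needed(3_000_000_000, 5, 30, 100)
print(reads)  # 45000000, i.e. 45 million reads per sample
```

The result is a per-sample figure, so the total for an experiment is this number multiplied by the number of biological replicates and conditions.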

  18. ChIP-Seq http://www.nature.com/nmeth/journal/v4/n8/images/nmeth0807-613-F1.gif

  19. RIP-seq Source : http://openi.nlm.nih.gov/imgs/rescaled512/3269675_ijms-13-00097f6.png

  20. Methyl-seq • 20 different types of base modifications in DNA are known, and there are perhaps 200 modifications of RNA

  21. Experimental Space: Next-Gen Platform • PacBio : 0.075 x 10^6 reads/sample, 1000 – 3000 nt • Whole transcript • Roche 454 FLX+ : 0.5 - 1 x 10^6 reads/sample, 800 - 1000 nt • Small – medium genome de novo sequencing • Long amplicon • Transcriptome • PGM : 1 - 2 x 10^6 reads per sample, 400 nt • Small genome de novo • Medium amplicon • MiSeq : 1 - 2 x 10^6 reads per sample, 50 – 250 nt • Small genome de novo • Small amplicon • HiSeq : 10 - 100 x 10^6 reads per sample, 50 – 150 nt • Counting applications : RNA-seq, ChIP-seq, RIP-seq, Methyl-seq • Large genome de novo and resequencing

  22. Experimental Space: The Relevancy of “Classic” Techniques. Differential Gene Expression • Northern blotting (1977) : 1 probe – 20 samples • Dot blots (1987) : 100s of probes – 1 sample • RT-PCR (1992) : 100s of probes – 10 - 100 samples • Microarrays (1995) : 100,000s of probes – 1 sample • Next-gen sequencing (2005) : 10 - 100 x 10^6 reads – 1 sample

  23. The Future • More Reads • Longer Reads • Faster Sequencing • Cheaper Sequencing • New Applications
