
Next Generation Sequencing Technologies

Lecture Outline: Odds and Ends from Next Generation Sequencing Lecture 1; Third Generation Sequencing; The Poisson Distribution and Fold Coverage; Aligning Reads to a Reference - A Practical Introduction. Read Length is Not As Important For Resequencing. Jay Shendure. Sequencing a complete cancer genome.



Presentation Transcript


    1. Next Generation Sequencing Technologies Rob Mitra 5488 Lecture 1/24/10

    3. Read Length is Not As Important For Resequencing

    4. Sequencing a complete cancer genome

    5. What did we learn? The mutation spectrum carries a signature of damage by UV light. Transcription-coupled repair is an important DNA repair mechanism. Plausible candidate genes.

    6. What will we learn?

    7. What will we learn, part 2? In non-small cell lung cancer, the tyrosine kinase domain of EGFR is mutated in some patients (often non-smokers). This mutation activates an anti-apoptotic pathway. The tyrosine kinase inhibitor Gefitinib is effective in patients with this mutation (60% show no growth or remission after 1 year versus 7% on chemotherapy). Very few side effects! Personalized therapy.

    8. Pacific Biosciences: A Third Generation Sequencing Technology

    12. Real Time Sequencing

    13. How did they do? 150 bp circular template. ~93% raw accuracy. At 15x coverage, 99.3% accuracy. Still early days.

    14. Where are they going? Commercial specifications: 80,000 zero mode waveguides; 10-15 minute runs; throughput ~140 MB per hour* (currently Illumina is at ~100 MB per hour, expect 625 MB per hour this summer). Methyl C? Dark reads? Phi29, so long read lengths possible (3 kb now, up to 70 kb later?). Ease of sample prep. Camera costs.

    15. Math Aside: Sequencing coverage calculations. Let's say you need a base to be sequenced 5x for an accurate base call. If you sequence at 10x coverage, how much of the genome will be sequenced at least 5 times?

    16. Poisson Distribution

    18. Example: Average coverage = 5x. The probability of a given base being sequenced exactly 10 times is $5^{10} e^{-5} / 10! = 0.018$, so about 2% of bases will have exactly 10x coverage.
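
The slide's arithmetic can be checked with a few lines of Python. This is a minimal sketch, assuming coverage at a base follows a Poisson distribution with the average coverage as its mean; the function name `poisson_pmf` is chosen here for illustration and is not from the lecture.

```python
import math

def poisson_pmf(k: int, mean_coverage: float) -> float:
    """Probability that a base is sequenced exactly k times,
    assuming Poisson-distributed coverage with the given mean."""
    return mean_coverage ** k * math.exp(-mean_coverage) / math.factorial(k)

# Average coverage 5x, exactly 10 reads over a given base.
p_exactly_10 = poisson_pmf(10, 5.0)
print(f"{p_exactly_10:.3f}")  # 0.018
```

Real libraries usually show more variation than a Poisson model predicts (for example from GC bias), so this is best read as an idealized estimate.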

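    19. Math Aside: Sequencing coverage calculations. If you sequence at 10x coverage, how much of the genome will be sequenced at least 5 times? $1 - [f(0,10) + f(1,10) + f(2,10) + f(3,10) + f(4,10)] = 0.97$, where $f(k,10)$ is the Poisson probability of a base being sequenced exactly $k$ times at an average coverage of 10x.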
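
The "at least 5 times" answer is one minus the probability of seeing 0 through 4 reads. A short sketch of that calculation follows, again assuming Poisson-distributed coverage; `fraction_covered_at_least` is a hypothetical helper name introduced for this example.

```python
import math

def poisson_pmf(k: int, mean_coverage: float) -> float:
    """Poisson probability of exactly k reads at a base."""
    return mean_coverage ** k * math.exp(-mean_coverage) / math.factorial(k)

def fraction_covered_at_least(min_depth: int, mean_coverage: float) -> float:
    """Expected fraction of bases sequenced at least min_depth times."""
    # Sum the probabilities of every depth below the threshold,
    # then take the complement.
    below = sum(poisson_pmf(k, mean_coverage) for k in range(min_depth))
    return 1.0 - below

print(round(fraction_covered_at_least(5, 10.0), 2))  # 0.97
```

Calling the same function with other mean coverages shows how quickly the under-covered fraction shrinks as sequencing depth increases.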

    20. Nuts and Bolts of Sequencing. Resequencing: map reads back to genome; call bases. RNA-seq: map reads back to genome; count tags to determine gene expression levels. ChIP-Seq: map reads back to genome; peaks determine binding sites.

    21. Mapping Reads Back. Hash table (lookup table): FAST, but requires perfect matches. Array scanning: can handle mismatches, but not gaps. Dynamic programming (Smith-Waterman, Forward, Viterbi): handles indels; mathematically optimal solution; slow (most programs use hash mapping as a prefilter). Burrows-Wheeler Transform (BW Transform): FAST (memory efficient), but lacks sensitivity for gaps/mismatches.
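
To make the "hash mapping as a prefilter" idea concrete, here is a toy sketch in Python: a k-mer hash table finds candidate positions by exact seed match, and Smith-Waterman dynamic programming then scores the read against a small window around each hit. It does not reproduce any particular aligner; the function names, scoring values (match +2, mismatch -1, gap -2), the padding, and the toy sequences are all assumptions made for illustration.

```python
from collections import defaultdict

def build_kmer_index(reference: str, k: int) -> dict:
    """Hash table mapping every k-mer in the reference to its start positions."""
    index = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        index[reference[pos:pos + k]].append(pos)
    return index

def smith_waterman_score(a: str, b: str, match: int = 2,
                         mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score between a and b (linear gap penalty)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: scores never drop below zero.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best

def map_read(read: str, reference: str, index: dict, k: int, pad: int = 2):
    """Return (score, window_start) for the best seed hit, or None if no hit."""
    seed = read[:k]  # the first k bases must match the reference exactly
    best = None
    for hit in index.get(seed, []):
        start = max(0, hit - pad)
        end = min(len(reference), hit + len(read) + pad)
        score = smith_waterman_score(read, reference[start:end])
        if best is None or score > best[0]:
            best = (score, start)
    return best

# Toy data, invented for this example.
reference = "TTGACCGTAGGCTAACGTTAGC"
index = build_kmer_index(reference, k=4)
exact_hit = map_read("CCGTAGGCTA", reference, index, k=4)
mismatch_hit = map_read("CCGTAGACTA", reference, index, k=4)
no_hit = map_read("AAAAAAAAAA", reference, index, k=4)
```

The hash lookup is fast but fails outright when the seed itself contains an error, which is the trade-off the slide describes. The dynamic programming step tolerates mismatches and gaps but is only affordable because it runs on a short window.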

    22. Aligners Evaluated

    23. CPU Time 2M Reads to Hs36: SE/PE Benchmark: Maq (~8 hours)

    24. Placement: 2M Reads on Hs36 Most aligners place ~80% of reads uniquely in Hs36.

    25. PE Mode Increases Unique Mapping

    26. What we find: BW transform algorithms (Bowtie) are great for RNA-seq and ChIP-Seq. To find SNPs and INDELs, we prefer Maq, or an in-house alignment program that uses a seed-based approach followed by DP.

    27. A Case Study in Resequencing

    28. Exon Capture

    29. Four individuals is enough!

    30. A Case Study in RNA-seq

    31. Workflow. Reads per kilobase of exonic sequence per million mapped reads (RPKM).
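
The expression measure named on this slide is a simple normalization of a read count by exon length and sequencing depth. A minimal sketch follows; the function name and the example numbers are hypothetical and do not come from the lecture.

```python
def rpkm(read_count: int, exon_length_bp: int, total_mapped_reads: int) -> float:
    """Reads per kilobase of exonic sequence per million mapped reads."""
    kilobases = exon_length_bp / 1_000
    millions_mapped = total_mapped_reads / 1_000_000
    return read_count / kilobases / millions_mapped

# Hypothetical gene: 500 reads over 2,000 bp of exon, 10 million mapped reads.
example_rpkm = rpkm(500, 2_000, 10_000_000)
print(example_rpkm)  # 25.0
```

Dividing by exon length keeps long genes from looking more highly expressed simply because they collect more reads, and dividing by the number of mapped reads makes runs of different depth comparable.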

    33. RNA-Seq Versus Microarray: splice forms, allele-specific expression, accuracy, dynamic range.
