
Error model for massively parallel (454) DNA sequencing


Presentation Transcript


  1. Error model for massively parallel (454) DNA sequencing. Sriram Raghuraman (working with Haixu Tang and Justin Choi)

  2. Sequencing Preparation • Randomly fragment the entire genome • Nebulize fragments • Add adapters • Attach to DNA capture beads in a water-in-oil emulsion • PCR-amplify fragments attached to beads • Place beads bound to multiple copies of the same fragment in a PicoTiterPlate • Add enzymes including polymerase and luciferase

  3. Sequencing Process • Place plates in a sequencer • Wash nucleotides (A, C, G, T) in series over the plate • When a complementary nucleotide enters a well, the template strand is extended by DNA polymerase • Addition of the nucleotide releases light, which is recorded by a CCD camera • Hundreds of thousands of beads are sequenced in parallel • Source: Genome sequencing in microfabricated high-density picolitre reactors, Nature 437, 376-380 (15 September 2005)

  4. Speed of sequencing • ~25 million bases at >=99% accuracy in a 4 hour run • ~230,000 reads • Average read length 110 bases
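As a quick consistency check on the throughput figures above, the read count and average read length can be multiplied out. This is only arithmetic on the approximate numbers quoted on the slide; the variable names are illustrative.

```python
# Approximate figures quoted on the slide.
reads_per_run = 230_000
mean_read_length = 110   # bases
run_hours = 4

# Total bases per run and per hour implied by those figures.
bases_per_run = reads_per_run * mean_read_length
bases_per_hour = bases_per_run / run_hours

print(f"{bases_per_run:,} bases per run")
print(f"{bases_per_hour:,.0f} bases per hour")
```

The product comes to about 25.3 million bases, in line with the "~25 million bases" quoted for a 4 hour run.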

  5. Data Sets (Newbler) • 984766 reads aligned by Newbler • Bases 98878209 • Matches 97793963 (98.90%) • Mismatches 10643 (0.01%) • Inserts 368332 (0.37%) • Deletes 668451 (0.67%) • ‘N’ terms 36820 (0.03%)

  6. Data Set (Sanger) • Staphylococcus aureus subsp. aureus COL from NCBI Assembly Archive • 50000 reads • Bases 27173366 • Matches 27094113 (99.70%) • Mismatches 71203 (0.26%) • Inserts 1827 (0.006%) • Deletes 6223 (0.02%)
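The percentages on the two data set slides can be recomputed from the raw counts. The sketch below does that and also derives two quantities the slides do not state directly: the gap-to-mismatch ratio and the mean read length (bases divided by reads). The dictionary layout and function name are assumptions made for illustration. The slide percentages appear to be truncated rather than rounded, so the last digit may differ slightly.

```python
# Counts copied from the Newbler and Sanger data set slides.
DATASETS = {
    "Newbler (454)": {
        "reads": 984766, "bases": 98878209, "match": 97793963,
        "mismatch": 10643, "insert": 368332, "delete": 668451, "N": 36820,
    },
    "Sanger": {
        "reads": 50000, "bases": 27173366, "match": 27094113,
        "mismatch": 71203, "insert": 1827, "delete": 6223,
    },
}

CATEGORIES = ("match", "mismatch", "insert", "delete", "N")


def summarize(counts):
    """Return percentage rates and derived ratios for one data set."""
    bases = counts["bases"]
    rates = {c: 100.0 * counts.get(c, 0) / bases for c in CATEGORIES}
    gaps = counts["insert"] + counts["delete"]
    return {
        "rates_percent": rates,
        "gap_to_mismatch": gaps / counts["mismatch"],
        "mean_read_length": bases / counts["reads"],
        # True when the listed categories add up to the base total.
        "accounted": sum(counts.get(c, 0) for c in CATEGORIES) == bases,
    }


for name, counts in DATASETS.items():
    s = summarize(counts)
    print(name)
    for category, pct in s["rates_percent"].items():
        print(f"  {category:9s} {pct:8.4f}%")
    print(f"  gaps per mismatch: {s['gap_to_mismatch']:.2f}")
    print(f"  mean read length:  {s['mean_read_length']:.1f}")
```

In both data sets the listed categories sum exactly to the base total. The gap-to-mismatch ratio makes the contrast between the two technologies explicit: gaps dominate in the 454 reads, mismatches in the Sanger reads.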

  7. Length Distributions • Newbler reads are shorter than Sanger reads • Newbler • Average read length ~100 bases • Sanger • Average read length ~545 bases

  8. Accuracy % • Newbler reads show a prevalence of gaps as compared to mismatches • Newbler mismatches are indirect • AA-CT • AAG-T • Sanger reads contain more mismatches than gaps

  9. Biases in Substitutions and Gaps

  10. Substitutions

  11. The case for homogeneous gaps

  12. Homogeneous gaps • Newbler reads often exhibit homogeneous gaps • Insertions • R:-CGGGATCAGTGATGGCGTACGTTTACCGGGTTAAAAGAGGGCCGG • G:-CGGGATCAGTGATG-CG-A--TT--CCGG-TTAAA-GAGG-C-GG • Deletions • R:-TTTACA-TCGTGGTCGTGACAC-ATCGACACTGTAT-AAAA-CCAT • G:-TTT-CAATC-TGGTCGTGACACCATCGACACTGTATTAAAAACCAT
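To make the notion concrete, the sketch below classifies each gap column of a pairwise alignment. It assumes a working definition that the slides do not spell out: a gap is "homogeneous" when the base opposite the gap equals the nearest base to its left or right in the same sequence, that is, the gap lengthens or shortens a run of identical bases. The function names are hypothetical, and the leading "-" after the "R:" and "G:" labels on the slide is treated as a label separator and dropped.

```python
def nearest_base(seq, i, step):
    """Nearest non-gap character to position i in direction step, or None."""
    j = i + step
    while 0 <= j < len(seq):
        if seq[j] != "-":
            return seq[j]
        j += step
    return None


def classify_gaps(read, genome):
    """Return (column, kind, is_homogeneous) for every gap column.

    kind is "insert" when the read has a base the genome lacks,
    and "delete" when the genome has a base the read lacks.
    """
    if len(read) != len(genome):
        raise ValueError("aligned strings must have equal length")
    result = []
    for i, (r, g) in enumerate(zip(read, genome)):
        if g == "-" and r != "-":
            seq, kind = read, "insert"
        elif r == "-" and g != "-":
            seq, kind = genome, "delete"
        else:
            continue  # aligned column, or gap on both rows
        base = seq[i]
        neighbours = (nearest_base(seq, i, -1), nearest_base(seq, i, +1))
        result.append((i, kind, base in neighbours))
    return result


# Insertion example from the slide (R = read, G = genome).
read = "CGGGATCAGTGATGGCGTACGTTTACCGGGTTAAAAGAGGGCCGG"
genome = "CGGGATCAGTGATG-CG-A--TT--CCGG-TTAAA-GAGG-C-GG"

gaps = classify_gaps(read, genome)
homogeneous = [g for g in gaps if g[2]]
print(f"{len(homogeneous)} of {len(gaps)} gap columns are homogeneous")
```

Under this definition not every gap in the slide's example is homogeneous, which matches the slide's wording that such gaps occur "often" rather than always.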

  13. Insert Transitions

  14. Delete Transitions

  15. Insert Strings

  16. Delete Strings

  17. Some examples • Blast 1st hit • CTCCGCATC-AAAG....TTT-GATGCGGAG • CTCCGCATCCAAAG....TTTGGATGCGGAG • Newbler Alignment • CCTCCGCATC-AAAG....TTTG-ATGCGGAG • C-TCCGCATCCAAAG....TTTGGATGCGGAG • No difference between homogeneous and regular gaps as far as BLAST is concerned

  18. Markov Model

  19. General Ideas • Incorporate provisions for homogeneous gaps • Train model on Newbler data • A Markov model that accounts for homogeneous gaps should perform better than one that doesn’t (i.e. BLAST)

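  20. Markov model state diagram. State labels: G-, T-, C-, A-, -T, -G, -C, -A (base paired with a gap); TT, GG, CC, AA (identical bases); AG, AC, AT; MM (MM = mismatch)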

  21. Procedure • Get initial, transition and emission probabilities from Newbler reads • Use Markov model to perform pairwise alignment of unaligned reads by employing Viterbi’s algorithm • Compare results to BLAST alignment of same reads
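The alignment step of this procedure can be sketched with a textbook three-state pair HMM (match, gap in one sequence, gap in the other) decoded with the Viterbi algorithm in log space. This is not the model from the slides, which uses a larger state set to capture homogeneous gaps and probabilities trained on Newbler reads. Every parameter value below (`DELTA`, `EPSILON`, `P_SAME`) is a hypothetical placeholder, and the function name is illustrative.

```python
import math

NEG_INF = float("-inf")

# Hypothetical parameters for illustration only (not trained values).
DELTA = 0.05     # P(open a gap | match state)
EPSILON = 0.30   # P(extend a gap | gap state)
P_SAME = 0.97    # total emission mass on identical base pairs in state M

LOG_MATCH = math.log(P_SAME / 4)            # 4 identical base pairs
LOG_MISMATCH = math.log((1 - P_SAME) / 12)  # 12 non-identical base pairs
LOG_GAP_EMIT = math.log(0.25)               # uniform base opposite a gap

# Log transition probabilities. X emits a base of x only, Y a base of y only.
TRANS = {
    ("M", "M"): math.log(1 - 2 * DELTA),
    ("X", "M"): math.log(1 - EPSILON),
    ("Y", "M"): math.log(1 - EPSILON),
    ("M", "X"): math.log(DELTA),
    ("X", "X"): math.log(EPSILON),
    ("M", "Y"): math.log(DELTA),
    ("Y", "Y"): math.log(EPSILON),
}


def viterbi_align(x, y):
    """Global alignment of x and y; returns (aligned_x, aligned_y, log_score)."""
    n, m = len(x), len(y)
    v = {s: [[NEG_INF] * (m + 1) for _ in range(n + 1)] for s in "MXY"}
    ptr = {s: [[None] * (m + 1) for _ in range(n + 1)] for s in "MXY"}
    v["M"][0][0] = 0.0  # start in the match state with log-probability 0

    def best(prev_states, target, i, j):
        # Best predecessor state for entering `target` from cell (i, j).
        return max((v[s][i][j] + TRANS[s, target], s) for s in prev_states)

    for i in range(n + 1):
        for j in range(m + 1):
            if i > 0 and j > 0:
                score, prev = best("MXY", "M", i - 1, j - 1)
                emit = LOG_MATCH if x[i - 1] == y[j - 1] else LOG_MISMATCH
                v["M"][i][j], ptr["M"][i][j] = score + emit, prev
            if i > 0:
                score, prev = best("MX", "X", i - 1, j)
                v["X"][i][j], ptr["X"][i][j] = score + LOG_GAP_EMIT, prev
            if j > 0:
                score, prev = best("MY", "Y", i, j - 1)
                v["Y"][i][j], ptr["Y"][i][j] = score + LOG_GAP_EMIT, prev

    # Trace back from the best-scoring state in the final cell.
    state = max("MXY", key=lambda s: v[s][n][m])
    score = v[state][n][m]
    ax, ay, i, j = [], [], n, m
    while i > 0 or j > 0:
        prev = ptr[state][i][j]
        if state == "M":
            ax.append(x[i - 1]); ay.append(y[j - 1]); i -= 1; j -= 1
        elif state == "X":
            ax.append(x[i - 1]); ay.append("-"); i -= 1
        else:
            ax.append("-"); ay.append(y[j - 1]); j -= 1
        state = prev
    return "".join(reversed(ax)), "".join(reversed(ay)), score


aligned_read, aligned_ref, log_score = viterbi_align("ACGGT", "ACGT")
print(aligned_read)
print(aligned_ref)
```

Extending this sketch toward the model described in the slides would mean replacing the single gap states with per-base states and estimating the initial, transition and emission probabilities from the Newbler alignments instead of fixing them by hand.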

  23. Results

  24. Results

  25. Limitations • Global Alignment only • Local Alignment hinges on good alignment extension metric/method
