
PDCB BioC for HTS topic Understanding the tech. 02

PDCB BioC for HTS topic Understanding the tech. 02. LCG Leonardo Collado Torres lcollado@wintergenomic.com lcollado@ibt.unam.mx September 2nd, 2010. Topics: Basecalling, Quality Filtering, FASTQ format, Error rates, A range of problems / reports


Presentation Transcript


  1. PDCB BioC for HTS topic Understanding the tech. 02 LCG Leonardo Collado Torres lcollado@wintergenomic.com lcollado@ibt.unam.mx September 2nd, 2010

  2. Topics • Basecalling • Quality Filtering • FASTQ format • Error rates • A range of problems / reports • Fragment of James Huntley’s ppt on best practices

  3. Basecalling: Illumina

  4. Cross-talk

  5. SWIFT: cross-talk correction

  6. Phasing and Prephasing options

  7. Some warnings!

  8. Describe each case

  9. Quality Filtering: Purity and Chastity

  10. What artifact can be derived from this step?

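  11. FASTQ format • Line 1: @ followed by the sequence id • Line 2: the sequence • Line 3: + followed by the quality id (often left empty) • Line 4: the quality values, encoded as ASCII characters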
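The four-line record layout described on this slide can be sketched in Python as follows. This is a minimal illustration, not code from the presentation: the names `FastqRecord` and `parse_fastq` are hypothetical, and the sketch assumes each record occupies exactly four lines (no wrapped sequences).

```python
from typing import Iterable, Iterator, NamedTuple


class FastqRecord(NamedTuple):
    """One FASTQ record (hypothetical container for this sketch)."""
    seq_id: str    # text after '@' on line 1
    sequence: str  # line 2
    qual_id: str   # text after '+' on line 3, often empty
    quality: str   # line 4, one ASCII character per base


def parse_fastq(lines: Iterable[str]) -> Iterator[FastqRecord]:
    """Yield records from a stream with exactly four lines per record."""
    stripped = (line.rstrip("\r\n") for line in lines)
    while True:
        header = next(stripped, None)
        if header is None:
            return  # end of input
        if header == "":
            continue  # tolerate blank lines between records
        sequence = next(stripped, None)
        plus = next(stripped, None)
        quality = next(stripped, None)
        if quality is None:
            raise ValueError("truncated FASTQ record")
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError("malformed FASTQ record near: " + header)
        if len(sequence) != len(quality):
            raise ValueError("sequence and quality lengths differ")
        yield FastqRecord(header[1:], sequence, plus[1:], quality)


example = ["@read1\n", "ACGT\n", "+\n", "IIII\n"]
records = list(parse_fastq(example))
```

The length check is there because every base must have exactly one quality character; a mismatch usually points to a truncated or corrupted file.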

  12. Originally…

  13. Q to error probability (p) formulas • Qphred • Qsolexa1.3
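The formulas on this slide did not survive extraction. As a sketch, assuming they are the standard definitions Q_phred = -10 log10(p) and Q_solexa = -10 log10(p / (1 - p)), the conversions look like this in Python. All function names are illustrative.

```python
import math


def phred_to_p(q: float) -> float:
    """Error probability from a Phred score: p = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def p_to_phred(p: float) -> float:
    """Phred score from an error probability: Q = -10 log10(p)."""
    return -10 * math.log10(p)


def solexa_to_p(q: float) -> float:
    """Error probability from a Solexa score, which encodes the odds p/(1-p)."""
    odds = 10 ** (-q / 10)
    return odds / (1 + odds)


def p_to_solexa(p: float) -> float:
    """Solexa score from an error probability: Q = -10 log10(p / (1 - p))."""
    return -10 * math.log10(p / (1 - p))


def solexa_to_phred(q: float) -> float:
    """Convert a Solexa score to the equivalent Phred score."""
    return 10 * math.log10(10 ** (q / 10) + 1)
```

Under these definitions the two scales nearly coincide at high quality and diverge at low quality: Solexa scores can be negative, while Phred scores cannot.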

  14. FASTQ types • What is the quickest way to distinguish fastq-sanger from fastq-illumina? • Tip: Check the ASCII table
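One possible answer to the question on this slide is to look at the lowest ASCII code in the quality strings. The sketch below assumes the usual offsets (33 for fastq-sanger, 64 for fastq-illumina 1.3+, and 64 with scores down to -5 for the older Solexa encoding, labeled here as fastq-solexa). The function name is hypothetical, and the result is only a guess: a file made up entirely of very high Sanger qualities would be misclassified.

```python
from typing import Iterable


def guess_fastq_encoding(quality_strings: Iterable[str]) -> str:
    """Guess the quality encoding from the lowest ASCII code observed."""
    codes = [ord(ch) for qual in quality_strings for ch in qual]
    if not codes:
        raise ValueError("no quality characters to inspect")
    lowest = min(codes)
    if lowest < 59:
        # Characters below ';' (59) only occur with an offset of 33.
        return "fastq-sanger"
    if lowest < 64:
        # ';' to '?' correspond to Solexa scores -5 to -1 with offset 64.
        return "fastq-solexa"
    # '@' (64) or above: consistent with Illumina 1.3+ (offset 64).
    return "fastq-illumina"
```

In practice it is worth scanning many reads rather than a handful, since low-quality characters that give the encoding away may be rare.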

  15. phred.R

  16. It is NOT clear what quality values of 1 and 2 mean in Illumina (version 1.5+)

  17. FASTQ in color space (CS) • Base 1 does not include a quality value! (It’s a 0)

  18. Error rates

  19. Illumina vs SOLiD: % per cycle

  20. Illumina vs SOLiD: number of errors

  21. Understanding 454 (GS20) a bit more

  22. 454 error types

  23. 454 errors

  24. Presence of Ns correlates with error rate (454)

  25. Illumina vs SOLiD

  26. Helicos

  27. A range of problems / reports • Aligned to the wrong reference • Did not use the correct quality encoding • Barcodes are trimmed or have mismatches • Trimming the 1st and last base → losing barcodes • GC bias • Sample degradation will affect your data!
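The barcode problems listed on this slide can be illustrated with a small Python sketch. It assumes, purely for illustration, that the barcode sits at the very start of the read and that one mismatch is tolerated; `hamming` and `assign_barcode` are hypothetical helper names, not part of any pipeline mentioned in the presentation.

```python
from typing import Optional, Sequence


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("strings must have the same length")
    return sum(x != y for x, y in zip(a, b))


def assign_barcode(read: str, barcodes: Sequence[str],
                   max_mismatches: int = 1) -> Optional[str]:
    """Return the single barcode matching the read prefix, else None."""
    hits = []
    for barcode in barcodes:
        prefix = read[:len(barcode)]
        if len(prefix) < len(barcode):
            continue  # read is shorter than the barcode
        if hamming(prefix, barcode) <= max_mismatches:
            hits.append(barcode)
    # Reject reads that match no barcode or more than one.
    return hits[0] if len(hits) == 1 else None


barcodes = ["ACGT", "TGCA"]
intact = assign_barcode("ACGTTTTT", barcodes)        # exact match
one_mismatch = assign_barcode("ACGATTTT", barcodes)  # one sequencing error
trimmed = assign_barcode("CGTTTTT", barcodes)        # first base trimmed away
```

With these toy barcodes, the read whose first base was trimmed no longer matches either barcode within one mismatch, which is the failure mode the slide warns about.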
