Omixon Workshops

Presentation Transcript


  1. Omixon Workshops: Considerations for Analyzing Targeted NGS Data - Introduction. Tim Hague, CEO

  2. Targeted Data

  3. Introduction • There are many mapping, alignment and variant calling algorithms • Most of these were developed for whole-genome sequencing and, to some extent, for population genetics studies

  4. Premise • In contrast, NGS-based diagnostics deals with particular genes or mutations of an individual • Different diagnostic targets present specific challenges

  5. Goal Present analysis issues related to differences in: • Sequencing technologies • Targeting technologies • Target specifics • Pseudogenes and segmental duplication

  6. NGS Sequencers • Illumina • Ion Torrent • Roche 454 • (SOLiD)

  7. Mind The Gap • Moore B, Hu H, Singleton M, De La Vega FM, Reese MG, Yandell M. Genet Med. 2011 Mar;13(3):210-7.

  8. Sequencing Technology Differences: • Homopolymer error rates • G/C content errors • Read length • Sequencing protocols (single vs paired reads)
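Two of the platform-dependent error sources listed above, homopolymer runs and G/C content, are properties of the sequence itself, so it is easy to profile where they occur in a read or target. The sketch below is a minimal illustration, not part of any Omixon tool; the function names and the `min_len=4` run threshold are hypothetical choices.

```python
from itertools import groupby

def gc_content(seq: str) -> float:
    """Fraction of G or C bases in seq (case-insensitive); 0.0 for an empty sequence."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return sum(base in "GC" for base in seq) / len(seq)

def homopolymer_runs(seq: str, min_len: int = 4) -> list[tuple[int, str, int]]:
    """List (0-based start, base, length) for every single-base run of at least min_len."""
    runs = []
    pos = 0
    for base, group in groupby(seq.upper()):
        length = sum(1 for _ in group)
        if length >= min_len:
            runs.append((pos, base, length))
        pos += length
    return runs

read = "ACGTTTTTTGCGCAAAAC"  # made-up read
print(round(gc_content(read), 3))  # 0.389
print(homopolymer_runs(read))      # [(3, 'T', 6), (13, 'A', 4)]
```

Runs such as the poly-T above are where the Roche 454 and Ion Torrent platforms, which are known for higher homopolymer error rates than Illumina, are most likely to miscount bases.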

  9. Targeting Methods • PCR primers (e.g. amplicons) • Hybridization probes (e.g. exome kits)

  10. Targeting Technology Differences: • Exactly matching regions vs regions with SNPs Result: • Reads need to be mapped against whole chromosomes, not only the target regions, to avoid false positives
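Why target-only mapping produces false positives can be shown with a toy example. Assume, hypothetically, a 20 bp target and a paralogous copy elsewhere in the genome that differs from it at a single base. A read that really came from the paralog finds its best placement in the target when the paralog is missing from the reference, and its one mismatch then looks like a SNP. The exhaustive ungapped search below is only a stand-in for a real aligner; the sequences and names are invented for illustration.

```python
from typing import Optional

def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))

def best_hit(read: str, references: dict[str, str]) -> Optional[tuple[str, int, int]]:
    """Exhaustive ungapped placement (a toy stand-in for a real aligner).

    Returns (reference name, 0-based offset, mismatches) for the placement with
    the fewest mismatches; ties keep the first placement found."""
    best = None
    for name, seq in references.items():
        for offset in range(len(seq) - len(read) + 1):
            mismatches = hamming(read, seq[offset:offset + len(read)])
            if best is None or mismatches < best[2]:
                best = (name, offset, mismatches)
    return best

# Hypothetical sequences: the paralog differs from the target at index 10 (C -> T).
target = "TTGACCGATGCAGTCCATGA"
paralog = "TTGACCGATGTAGTCCATGA"
read = paralog[2:18]  # a read that actually originates from the paralog

print(best_hit(read, {"target": target}))                     # ('target', 2, 1): apparent SNP
print(best_hit(read, {"target": target, "paralog": paralog})) # ('paralog', 2, 0): no variant
```

With the whole reference available, the read is placed on its true source and the spurious mismatch disappears.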

  11. Analysis Targets Differences: • Rate of polymorphism • Repetitive structures • Mutation profiles • G/C content • Single genes vs multi-gene complexes

  12. [Figure: BRCA1/2, HLA and CFTR compared (values shown: 1/2000, 1/29, 1/2000); distributions of insertions and deletions; distribution of repeat elements]

  13. Segmental Duplications • Sometimes called Low Copy Repeats (LCRs) • Highly homologous, >95% sequence identity • Rare in most mammals • Comprise a large portion of the human genome (and other primate genomes) • Important for understanding HLA
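The ">95% sequence identity" figure is a percent identity between two aligned copies. Definitions vary (whether gap columns count, which length is the denominator); the sketch below assumes one simple convention, identical non-gap columns divided by all alignment columns, and the sequences are made up.

```python
def percent_identity(aln_a: str, aln_b: str) -> float:
    """Percent identity of two aligned sequences of equal length ('-' marks a gap).

    Convention assumed here: identical, non-gap columns / all alignment columns."""
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have the same length")
    if not aln_a:
        return 0.0
    matches = sum(a == b and a != "-" for a, b in zip(aln_a, aln_b))
    return 100.0 * matches / len(aln_a)

copy_a = "ACGT" * 10                    # hypothetical 40 bp duplicated segment
copy_b = copy_a[:5] + "T" + copy_a[6:]  # second copy with one substitution
identity = percent_identity(copy_a, copy_b)
print(identity, identity > 95)          # 97.5 True
```

At 97.5% identity, if the differences were spread evenly, a 100 bp read from one copy would differ from the other copy at only about 2 to 3 positions, which is why reads from segmental duplications can map ambiguously.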

  14. Segmental Duplications • Many LCRs are concentrated in "hotspots" • Recombination in these regions is responsible for a wide range of disorders, including: • Charcot-Marie-Tooth syndrome type 1A • Hereditary neuropathy with liability to pressure palsies • Smith-Magenis syndrome • Potocki-Lupski syndrome

  15. Data Analysis Tools Differences: • Detection rates of complex variants (sensitivity) • False positive rates (accuracy) • Speed • Ease of use [Image: Data analysis shouldn’t be like this!]
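Sensitivity and false positive rates are usually measured by comparing a tool's calls against a trusted truth set. Because true negatives are ill-defined for variant calling, the sketch below reports the false discovery rate (FP / (TP + FP)) as its false positive measure; that choice, the tuple representation of a variant and the example coordinates are all assumptions. Exact matching is also a simplification: equivalent representations of the same indel would be counted as mismatches unless calls are normalized first.

```python
Variant = tuple[str, int, str, str]  # (chrom, pos, ref, alt); hypothetical representation

def compare_calls(called: set[Variant], truth: set[Variant]) -> dict[str, float]:
    """Exact-match comparison of a call set against a truth set."""
    tp = len(called & truth)  # true positives: called and in the truth set
    fp = len(called - truth)  # false positives: called but not in the truth set
    fn = len(truth - called)  # false negatives: truth variants that were missed
    return {
        "tp": tp,
        "fp": fp,
        "fn": fn,
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,
        "false_discovery_rate": fp / (tp + fp) if tp + fp else 0.0,
    }

truth = {("chr17", 100, "A", "G"), ("chr17", 200, "C", "T"), ("chr17", 300, "AT", "A")}
called = {("chr17", 100, "A", "G"), ("chr17", 300, "AT", "A"), ("chr17", 400, "G", "A")}
print(compare_calls(called, truth))  # tp=2, fp=1, fn=1; sensitivity and FDR are 2/3 and 1/3
```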

  16. “Depending upon which tool you use, you can see pretty big differences between even the same genome called with different tools—nearly as big as the two Life Tech/Illumina genomes.” Mark Yandell in BioIT-World.com, June 8, 2011

  17. Examples • Missing variants • SNPs, a DNP and deletions

  18. Identify More Valid Variants

  19. Find Homopolymer Indels
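Homopolymer indels are hard not only because of sequencing error but also because the same event has several equivalent positions: deleting any one base of a run gives the same read sequence, so different tools may report different coordinates for one deletion. A common convention is to left-align such indels (for example, `bcftools norm` does this for VCF files). The sketch below left-aligns a bare deletion on 0-based coordinates; it ignores VCF's padding-base convention, and the reference sequence is made up.

```python
def left_align_deletion(ref: str, start: int, length: int) -> int:
    """Return the leftmost start giving the same result as deleting ref[start:start + length].

    Shifting a deletion one base left leaves the resulting sequence unchanged exactly
    when the base just before it equals the last deleted base. Positions are 0-based."""
    while start > 0 and ref[start - 1] == ref[start + length - 1]:
        start -= 1
    return start

ref = "GATTTTTC"  # hypothetical reference with a 5 bp poly-T run at indices 2-6
# Deleting any single T yields the same read sequence ...
print({ref[:s] + ref[s + 1:] for s in range(2, 7)})  # {'GATTTTC'}
# ... so a caller may report the deletion at any of these positions.
print(left_align_deletion(ref, 6, 1))  # 2
```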

  20. Examples • Coverage differences

  21. Four Times Exon Coverage [coverage track scales: 0-96 vs 0-432]

  22. Higher Exome Coverage [coverage track scales: 0-10 vs 0-24]
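Coverage comparisons like these depend on how mapped reads are counted over the target. The sketch below computes per-base depth over one region with a difference array and then summarizes it; the half-open 0-based intervals, the exon coordinates and the reads are hypothetical, and real coverage tools also filter by mapping quality, base quality and duplicate status, which is not modeled here.

```python
def per_base_coverage(reads: list[tuple[int, int]], region_start: int, region_end: int) -> list[int]:
    """Read depth at each position of the half-open region [region_start, region_end).

    Reads are half-open (start, end) intervals in the same coordinate system."""
    size = region_end - region_start
    diff = [0] * (size + 1)
    for start, end in reads:
        s = max(start, region_start) - region_start  # clip the read to the region
        e = min(end, region_end) - region_start
        if s < e:
            diff[s] += 1
            diff[e] -= 1
    depth, running = [], 0
    for i in range(size):
        running += diff[i]
        depth.append(running)
    return depth

def coverage_summary(depth: list[int], min_depth: int) -> tuple[float, float]:
    """Mean depth and fraction of positions with depth >= min_depth."""
    if not depth:
        return 0.0, 0.0
    mean = sum(depth) / len(depth)
    covered = sum(d >= min_depth for d in depth) / len(depth)
    return mean, covered

reads = [(95, 105), (100, 110), (104, 120), (200, 250)]  # the last read misses the exon
depth = per_base_coverage(reads, 100, 110)
print(depth)                       # [2, 2, 2, 2, 3, 2, 2, 2, 2, 2]
print(coverage_summary(depth, 3))  # (2.1, 0.1)
```

A tool that maps more of the reads overlapping a target raises both numbers, which is the kind of difference the coverage tracks above compare.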

  23. First Conclusion Read accuracy is not the limiting factor in accurate variant analysis

  24. Example - Dense Region of SNPs

  25. Second Conclusion As variant density increases the performance of most tools goes down
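One plausible mechanism behind this conclusion: many short-read aligners start from exact or near-exact seed matches, and a read overlapping a dense cluster of true variants may contain no clean seed at all, so it is left unmapped or misplaced before variant calling even begins. The sketch below assumes a hypothetical aligner that needs one exact 20 bp seed in a 100 bp read; actual seeding strategies differ from tool to tool.

```python
def has_exact_seed(read_len: int, mismatch_positions: list[int], seed_len: int) -> bool:
    """True if the read contains seed_len consecutive bases with no mismatch against
    the reference (mismatch positions are 0-based within the read)."""
    bounds = [-1] + sorted(mismatch_positions) + [read_len]
    # Bases strictly between consecutive mismatches form a clean stretch of length b - a - 1.
    return any(b - a - 1 >= seed_len for a, b in zip(bounds, bounds[1:]))

print(has_exact_seed(100, [50], 20))                      # True: sparse variants
print(has_exact_seed(100, [14, 29, 44, 59, 74, 89], 20))  # False: one variant every 15 bp
```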

  26. Variant Calling • There are a few popular variant callers: GATK, SAMtools mpileup, VarScan • The most comprehensive (GATK) has a whole pipeline, including a base quality score recalibration step and an indel realignment step • These recalibration and realignment steps are highly recommended before any variant calling • Deduplication and removal of non-primary alignments may also be required
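The last preprocessing step on this slide can be sketched with SAM FLAG bits: secondary (0x100) and supplementary (0x800) records are non-primary, and 0x400 marks duplicates. In practice this is done with established tools, for example `samtools view -F 0x904` to drop unmapped, secondary and supplementary records, and Picard MarkDuplicates for deduplication. The `Alignment` class, the helper names and the position-only duplicate key below are hypothetical simplifications, not how those tools work internally.

```python
from dataclasses import dataclass

# Bit values from the FLAG field of the SAM format specification.
UNMAPPED = 0x4
REVERSE = 0x10
SECONDARY = 0x100
DUPLICATE = 0x400
SUPPLEMENTARY = 0x800

@dataclass
class Alignment:
    qname: str
    flag: int
    rname: str
    pos: int   # 1-based leftmost mapping position (SAM column 4)
    mapq: int

def keep_for_calling(aln: Alignment) -> bool:
    """Drop unmapped, secondary, supplementary and already-flagged duplicate records."""
    return (aln.flag & (UNMAPPED | SECONDARY | SUPPLEMENTARY | DUPLICATE)) == 0

def naive_dedup(alns: list[Alignment]) -> list[Alignment]:
    """Keep the highest-MAPQ record per (reference, position, strand).

    Simplification: dedicated tools also account for clipping, the 5' end of
    reverse-strand reads and mate positions."""
    best: dict[tuple[str, int, bool], Alignment] = {}
    for aln in alns:
        key = (aln.rname, aln.pos, bool(aln.flag & REVERSE))
        if key not in best or aln.mapq > best[key].mapq:
            best[key] = aln
    return list(best.values())

alns = [
    Alignment("r1", 0, "chr13", 1000, 60),
    Alignment("r2", 0, "chr13", 1000, 37),     # same start and strand as r1
    Alignment("r3", 16, "chr13", 1000, 60),    # reverse strand, kept separately
    Alignment("r1", 256, "chr13", 5000, 0),    # secondary alignment
    Alignment("r4", 2048, "chr13", 8000, 20),  # supplementary alignment
]
print([a.qname for a in naive_dedup([a for a in alns if keep_for_calling(a)])])  # ['r1', 'r3']
```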

  27. IndelRealigner Problem

  28. Variants That Can Be Hard to Find • DNPs • TNPs • Small indels next to SNPs • 30+ bp indels • Homopolymer indels • Homopolymer indel and SNP together • Indels in palindromes • Dense regions of variants
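DNPs and TNPs are easy to miss because a caller that reports each base separately emits two or three SNPs where the underlying event is one multi-nucleotide substitution, and the combined change can alter a codon differently from either SNP alone. The sketch below merges single-base calls at consecutive positions on the same chromosome; it assumes, hypothetically, that adjacent calls lie on the same haplotype, which a real tool would need to confirm from read-level phasing. The tuple format and coordinates are made up.

```python
def merge_adjacent_snvs(snvs: list[tuple[str, int, str, str]]) -> list[tuple[str, int, str, str]]:
    """Merge (chrom, pos, ref, alt) single-base calls at consecutive positions into
    multi-nucleotide calls (two bases = DNP, three = TNP). Phasing is not checked."""
    merged: list[tuple[str, int, str, str]] = []
    for chrom, pos, ref, alt in sorted(snvs):
        if merged:
            p_chrom, p_pos, p_ref, p_alt = merged[-1]
            # Extend the previous call if this one starts right where it ends.
            if p_chrom == chrom and p_pos + len(p_ref) == pos:
                merged[-1] = (chrom, p_pos, p_ref + ref, p_alt + alt)
                continue
        merged.append((chrom, pos, ref, alt))
    return merged

calls = [("chr7", 101, "C", "T"), ("chr7", 100, "G", "A"), ("chr7", 102, "A", "G"), ("chr7", 200, "T", "C")]
print(merge_adjacent_snvs(calls))  # [('chr7', 100, 'GCA', 'ATG'), ('chr7', 200, 'T', 'C')]
```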

  29. Contact Tim Hague, CEO Omixon Biocomputing Solutions Tim.Hague@omixon.com +36 70 318 4878

  30. Download our Omixon Target™ Evaluation Version Today omixon.com
