
NGS cancer genomics data processing and analysis

Somak Roy, MD, Clinical Fellow, Division of Urologic Surgical Pathology, University of Pittsburgh Medical Center. Outline: introduction to NGS technology, buzz words, bioinformatics analysis, laboratory workflow and information management, and QA.



Presentation Transcript


  1. NGS cancer genomics data processing and analysis • Somak Roy, MD, Clinical Fellow, Division of Urologic Surgical Pathology, University of Pittsburgh Medical Center

  2. Outline • Introduction to NGS technology • Buzz words • Bioinformatics analysis • Laboratory workflow and information management • QA

  3. Background • Next generation sequencing (NGS) technology is rapidly evolving. • Massively parallel processing. • A dramatic decrease in the cost of sequencing has led to widespread use. http://www.genome.gov/sequencingcosts/

  4. Applications of NGS in Cancer Genomics • Gene fusion detection • Mutation profiling • Structural variants • Copy number variations • Epigenetic profiling

  5. NGS in clinical domain

  6. Theme of DNA Sequencing • Sequence the sample DNA to obtain a string of characters (A, T, G, C) • Compare the obtained sequence to the reference sequence (the expected normal) • Any deviation from the reference, whether a single base or multiple bases, is a variant
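The compare-to-reference idea can be sketched in Python for the simplest possible case. This is a minimal illustration that assumes the sample sequence is already aligned base-for-base to the reference, with no insertions or deletions. The function name `find_substitutions` is hypothetical. The two example sequences are prefixes of the sequences shown in slide 17.

```python
def find_substitutions(reference: str, sample: str) -> list[tuple[int, str, str]]:
    """Return (1-based position, reference base, sample base) for every mismatch.

    Assumes both sequences are already aligned position by position and have
    equal length, so only single-base substitutions are detected (no indels).
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (pos, ref_base, alt_base)
        for pos, (ref_base, alt_base) in enumerate(zip(reference, sample), start=1)
        if ref_base != alt_base
    ]


if __name__ == "__main__":
    ref = "ATTGCGCTATTATAGCTCTAGAGAAAAG"
    smp = "ATTGCGCTATTATAGCTCTAGGGAAAAG"
    print(find_substitutions(ref, smp))  # [(22, 'A', 'G')]
```

Real pipelines do not compare one sequence. They compare many short, overlapping reads, which is why alignment, depth and variant frequency matter in the later steps.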

  7. Evolution of Sequencing • Sanger • Shotgun approach • Next generation sequencing

  8. Semiconductor Sequencing • Robison. Nat Biotechnol 2011;29:805-7 • Rothberg et al. Nature. 2011;475:348-52

  9. Optics-based Sequencing • Arch Pathol Lab Med. 2012;136:000–000; doi: 10.5858/arpa.2012-0107-RA

  10. NGS data processing elements • Signal processing • Alignment / mapping • Assembly / de novo • Variant calling • Annotation / visualization • Reporting, storage and sharing of results

  11. Signal processing

  12. Signal Processing – Non-optical • Example base-called reads (figure): • CCGCTAGCTATATTATATCGGGGCCCTAGATAGCTAGATATAGAGGGCTCTAGAGATCGATAGCTAGAG • CTAGCTCGCCGGGGCCCTAGAGTATATTATAGGCTCTAGAGATCGATAGCTGATAGCTAGATATAAGAG • ATATAAGCGCGGCTCGATCGGTCTAGAGAGGCCCTAGAGTATATTACTAGCTTAAGCTGATAGCTAGAG

  13. Signal Processing – Optical • Example base-called reads (figure: the same three reads as in slide 12)

  14. Signal Processing – Homopolymers • Semiconductor sequencing and pyrosequencing technology
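Homopolymers are difficult for flow-based chemistries such as semiconductor sequencing and pyrosequencing. The sketch below shows why.

It uses a deliberately simplified model, which is an assumption: each flow delivers one nucleotide, the measured signal is proportional to the number of identical bases incorporated, and the caller rounds that signal to an integer. The repeating flow order `"TACG"` and the function name `call_bases` are hypothetical. Real base callers also model phasing and signal decay.

```python
def call_bases(flow_signals: list[float], flow_order: str = "TACG") -> str:
    """Convert per-flow signal intensities into a base sequence.

    Simplified model: rounding each flow's signal gives the homopolymer
    length for the nucleotide delivered in that flow.
    """
    bases = []
    for i, signal in enumerate(flow_signals):
        nucleotide = flow_order[i % len(flow_order)]  # nucleotide used in this flow
        length = max(0, round(signal))                # 0 means no incorporation
        bases.append(nucleotide * length)
    return "".join(bases)


if __name__ == "__main__":
    # Clean signals: 1 T, 0 A, 2 C, 1 G
    print(call_bases([1.02, 0.05, 1.97, 0.98]))  # TCCG
    # Long homopolymers: 4.4 and 4.6 differ by under 5%, yet call 4 vs 5 bases
    print(call_bases([4.4]), call_bases([4.6]))  # TTTT TTTTT
```

The step from n to n + 1 bases is a shrinking fraction of the total signal as n grows. Small amounts of noise therefore change the called length of long homopolymer runs.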

  15. Take a Peek into FASTQ! • Header: sequence ID, additional info • Sequence • Optional header • Quality score: Phred score / Phred-like score, one per base call, $Q = -10\log_{10} p$, where $p$ is the probability that the base call is wrong
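A FASTQ record has four lines: the `@` header, the sequence, a `+` line that may repeat the header, and the quality string with one character per base. A minimal reader might look like the sketch below. It assumes unwrapped four-line records, which is the common layout. The `FastqRecord` and `read_fastq` names are hypothetical, and the example read is invented.

```python
import io
from dataclasses import dataclass
from typing import Iterator, TextIO


@dataclass
class FastqRecord:
    read_id: str   # header line without the leading '@'
    sequence: str  # base calls
    quality: str   # one ASCII-encoded quality character per base


def read_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield records from a FASTQ stream made of unwrapped 4-line records."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return  # end of input
        sequence = handle.readline().rstrip("\n")
        plus = handle.readline().rstrip("\n")  # optional repeat of the header
        quality = handle.readline().rstrip("\n")
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        if len(sequence) != len(quality):
            raise ValueError("sequence and quality lengths differ")
        yield FastqRecord(header[1:], sequence, quality)


if __name__ == "__main__":
    text = "@read_001 example\nGATTACA\n+\nIIIII#5\n"
    for rec in read_fastq(io.StringIO(text)):
        print(rec.read_id, rec.sequence, rec.quality)
```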

  16. Take a Peek into FASTQ! • $Q = -10\log_{10} p$ • $30 = -10\log_{10}(10^{-3})$ • $20 = -10\log_{10}(10^{-2})$ • What are these characters? ASCII format • $67 = -10\log_{10}(p = ?)$
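The quality characters are ASCII codes with an offset. The common convention is Phred+33, used by Sanger and current Illumina output; older Illumina pipelines used an offset of 64. Under Phred+33, the character `C` (ASCII 67) therefore encodes Q = 34, not Q = 67.

The helper names below are hypothetical. The offset is a parameter because the right value depends on the sequencer and software version.

```python
import math


def phred_from_error(p: float) -> float:
    """Q = -10 * log10(p), where p is the probability the base call is wrong."""
    return -10 * math.log10(p)


def error_from_phred(q: float) -> float:
    """Inverse relation: p = 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)


def decode_quality(quality: str, offset: int = 33) -> list[int]:
    """Map each ASCII quality character to a Phred score (Phred+33 by default)."""
    return [ord(ch) - offset for ch in quality]


if __name__ == "__main__":
    print(round(phred_from_error(1e-3)))  # 30
    print(round(phred_from_error(1e-2)))  # 20
    q = decode_quality("C")[0]            # ord('C') == 67, so Q = 67 - 33 = 34
    print(q, error_from_phred(q))         # 34, p ≈ 0.000398
```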

  17. Mapping, Assembly & Variant Identification • (Figure: reads piled up against the reference sequence ATTGCGCTATTATAGCTCTAGAGAAAAGCGCTAGCGGGCCCGCGATAGCTAGCG, some carrying a G variant) • Depth of coverage: the number of reads covering a position (5x) • Variant frequency: variant reads / depth, Var (G) frequency = 3/5 (60%)
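Depth of coverage and variant frequency can be computed from a pile-up with a short Python sketch. It assumes ungapped reads given as hypothetical `(start, sequence)` pairs on a 0-based reference. A real tool would read positions and CIGAR strings from a BAM file instead. The reads below are invented so that 3 of 5 reads carry G, matching the 60% in the slide.

```python
from collections import Counter


def pileup_at(reads: list[tuple[int, str]], position: int) -> Counter:
    """Count the bases observed at a 0-based reference position.

    Each read is (start, sequence): an ungapped alignment beginning at
    `start` on the reference. Gapped alignments would need the CIGAR string.
    """
    counts = Counter()
    for start, seq in reads:
        offset = position - start
        if 0 <= offset < len(seq):
            counts[seq[offset]] += 1
    return counts


def variant_frequency(counts: Counter, alt: str) -> float:
    depth = sum(counts.values())  # depth of coverage at this position
    return counts[alt] / depth if depth else 0.0


if __name__ == "__main__":
    # Invented reads over the reference CTAGAGAAAAG; position 4 (0-based) is A.
    reads = [(0, "CTAGGGAAAAG"), (2, "AGGGAAAAG"), (3, "GAGAAAAG"),
             (1, "TAGAGAAA"), (4, "GGAAAAG")]
    counts = pileup_at(reads, 4)
    print(counts)                          # Counter({'G': 3, 'A': 2})
    print(variant_frequency(counts, "G"))  # 0.6
```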

  18. Mapping / Alignment • Mapping algorithms • Dynamic programming algorithms: Needleman-Wunsch, Smith-Waterman • Heuristic algorithms: BLAST • Newer algorithms for NGS data: modified hash-table method, modified seed-and-extend method, Burrows-Wheeler transformation • Next-Generation DNA Sequencing Informatics. Ed. Brown SM. Cold Spring Harbor Laboratory Press. 2013
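Smith-Waterman is the local-alignment member of the dynamic programming family listed above. A compact Python version with a linear gap penalty is sketched below.

The scoring values (match = 2, mismatch = -1, gap = -2) are arbitrary choices for illustration, not those of any particular tool. The table has one cell per pair of positions, so the cost grows with the product of the two sequence lengths. That cost is why NGS mappers use hash-table or Burrows-Wheeler indexes to find candidate locations before doing any detailed alignment.

```python
def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1,
                   gap: int = -2) -> tuple[int, str, str]:
    """Local alignment by dynamic programming (linear gap penalty).

    Returns the best local score and the two aligned substrings ('-' = gap).
    """
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Scores never drop below 0: a local alignment may start anywhere.
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)
    # Trace back from the best cell until a zero cell is reached.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        score = H[i][j]
        if score == H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif score == H[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


if __name__ == "__main__":
    print(smith_waterman("TTTACGTTT", "ACGT"))  # (8, 'ACGT', 'ACGT')
    print(smith_waterman("CTCTAGAGAAAAGCG", "CTCTAGGAAAAGCG"))
```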

  19. Mapping / Alignment • Mapping algorithms: ungapped; gapped (better for indel detection) • Mapping applications: BWA • Bowtie • SOAP2 • ELAND • MAQ • T-map • Pabinger et al. Briefings in Bioinformatics. Jan 2013
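BWA and Bowtie are built on the Burrows-Wheeler transformation. It lets an indexed reference be searched for a read in time proportional to the read length. The sketch below builds the transform naively from sorted rotations and counts exact matches by FM-index backward search.

This is a toy version with hypothetical function names. Real aligners precompute occurrence tables and a sampled suffix array so they can also report match positions, and they tolerate mismatches and gaps.

```python
def bwt(text: str) -> str:
    """Burrows-Wheeler transform via sorted rotations ('$' marks the end)."""
    text += "$"
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    return "".join(rot[-1] for rot in rotations)


def count_occurrences(bwt_str: str, pattern: str) -> int:
    """FM-index backward search: number of exact occurrences of pattern."""
    # C[ch] = number of characters in the text that sort before ch.
    C = {}
    for idx, ch in enumerate(sorted(bwt_str)):
        C.setdefault(ch, idx)

    def occ(ch: str, k: int) -> int:
        # Occurrences of ch in bwt_str[:k] (naive; real tools use lookup tables).
        return bwt_str[:k].count(ch)

    lo, hi = 0, len(bwt_str)  # current range of matching sorted rotations
    for ch in reversed(pattern):
        if ch not in C:
            return 0
        lo = C[ch] + occ(ch, lo)
        hi = C[ch] + occ(ch, hi)
        if lo >= hi:
            return 0
    return hi - lo


if __name__ == "__main__":
    b = bwt("GATTACA")
    print(b)                             # ACTGA$TA
    print(count_occurrences(b, "TA"))    # 1
    print(count_occurrences(b, "A"))     # 3
    print(count_occurrences(b, "CAT"))   # 0
```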

  20. Mapping / Alignment – QC • Each aligned read is assigned a probability (p-value) that its placement is wrong, reported as the MAPPING QUALITY SCORE, based on: base quality scores; position of mismatches • Issues with mapping short reads: gaps due to true indels; heterogeneity in coverage across the genome (Poisson distribution) • Next-Generation DNA Sequencing Informatics. Ed. Brown SM. Cold Spring Harbor Laboratory Press. 2013
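The Poisson model of coverage can be sketched as follows. Mean depth is c = N·L/G for N reads of length L over a target of size G, following Lander-Waterman. The probability that a given base reaches depth k or more then follows from the Poisson distribution.

Treat this as an idealized lower bound on heterogeneity. Real data, especially from targeted capture, is usually more uneven than Poisson because of biases such as GC content. The function names are hypothetical, and the direct formula is only suitable for moderate depths.

```python
import math


def mean_coverage(num_reads: int, read_length: int, target_size: int) -> float:
    """Expected depth c = N * L / G (Lander-Waterman)."""
    return num_reads * read_length / target_size


def fraction_at_least(k: int, mean_depth: float) -> float:
    """P(depth >= k) when depth is Poisson-distributed with the given mean."""
    p_below = sum(math.exp(-mean_depth) * mean_depth ** i / math.factorial(i)
                  for i in range(k))
    return 1.0 - p_below


if __name__ == "__main__":
    c = mean_coverage(num_reads=1000, read_length=100, target_size=100_000)
    print(c)                        # 1.0
    print(fraction_at_least(1, c))  # about 0.63: ~37% of bases get no reads at 1x
    print(fraction_at_least(20, 30.0))
```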

  21. Variant identification • Pertains to detection of SNVs, indels, structural variants and CNVs • Different applications exist • Stand-alone applications (input: aligned reads in BAM / SAM format) • Integrated with the alignment process
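Stand-alone callers read aligned reads in SAM (text) or BAM (compressed binary) format. The sketch below parses one SAM text line into the fields a caller uses.

It follows the SAM specification's 11 mandatory tab-separated fields. MAPQ is defined there as −10·log10 of the probability that the mapping position is wrong, with 255 meaning unavailable. The `Alignment` class and helper names are hypothetical, and the example line is invented. BAM files would normally be read with a library such as pysam rather than by hand.

```python
import re
from dataclasses import dataclass

SAM_FIELDS = ["qname", "flag", "rname", "pos", "mapq", "cigar",
              "rnext", "pnext", "tlen", "seq", "qual"]
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


@dataclass
class Alignment:
    qname: str
    flag: int
    rname: str
    pos: int                       # 1-based leftmost mapping position
    mapq: int                      # -10*log10 P(mapping position is wrong); 255 = n/a
    cigar: list[tuple[int, str]]   # e.g. [(5, 'M'), (2, 'I')]
    seq: str


def parse_sam_line(line: str) -> Alignment:
    """Parse the 11 mandatory fields of one SAM alignment line (tags ignored)."""
    fields = dict(zip(SAM_FIELDS, line.rstrip("\n").split("\t")))
    cigar = [(int(n), op) for n, op in CIGAR_RE.findall(fields["cigar"])]
    return Alignment(fields["qname"], int(fields["flag"]), fields["rname"],
                     int(fields["pos"]), int(fields["mapq"]), cigar, fields["seq"])


def reference_span(aln: Alignment) -> int:
    """Reference bases covered: M, D, N, =, X consume the reference; I, S do not."""
    return sum(n for n, op in aln.cigar if op in "MDN=X")


if __name__ == "__main__":
    line = "\t".join(["read1", "0", "chr1", "100", "60", "5M2I3M1D4M",
                      "*", "0", "0", "ACGTAGGACGACGT", "I" * 14])
    aln = parse_sam_line(line)
    print(aln.rname, aln.pos, aln.mapq, reference_span(aln))  # chr1 100 60 13
```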

  22. Variant identification • Variant callers • GATK • VCFtools • SAMtools • Dindel • ATLAS-2 • CONTRA • ExomeCNV • BreakDancer • CLEVER • BreakPointer • … • Pabinger et al. Briefings in Bioinformatics. Jan 2013

  23. Variant identification – QC • A variety of filters: depth of coverage; base quality score; mapping quality score; presence of gaps and homopolymer runs; forward/reverse (strand) bias • Next-Generation DNA Sequencing Informatics. Ed. Brown SM. Cold Spring Harbor Laboratory Press. 2013
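The filters listed above can be combined into a rule-based check that reports why a call fails. All threshold values below are hypothetical and for illustration only; each laboratory must set and validate its own. The gap filter is omitted for brevity.

Strand bias is approximated here by the fraction of variant reads on the dominant strand. Production callers often use a statistical test instead, for example Fisher's exact test on strand counts.

```python
from dataclasses import dataclass


@dataclass
class VariantCall:
    depth: int                  # reads covering the position
    alt_forward: int            # variant-supporting reads on the forward strand
    alt_reverse: int            # variant-supporting reads on the reverse strand
    mean_base_quality: float
    mean_mapping_quality: float
    homopolymer_length: int     # longest single-base run adjacent to the variant


# Hypothetical thresholds, not recommendations.
MIN_DEPTH = 100
MIN_BASE_Q = 20
MIN_MAP_Q = 30
MAX_HOMOPOLYMER = 5
MAX_STRAND_FRACTION = 0.9  # flag if >90% of variant reads come from one strand


def filter_reasons(v: VariantCall) -> list[str]:
    """Return the names of all filters the call fails (empty list = PASS)."""
    reasons = []
    if v.depth < MIN_DEPTH:
        reasons.append("low_depth")
    if v.mean_base_quality < MIN_BASE_Q:
        reasons.append("low_base_quality")
    if v.mean_mapping_quality < MIN_MAP_Q:
        reasons.append("low_mapping_quality")
    if v.homopolymer_length > MAX_HOMOPOLYMER:
        reasons.append("homopolymer")
    alt_total = v.alt_forward + v.alt_reverse
    if alt_total and max(v.alt_forward, v.alt_reverse) / alt_total > MAX_STRAND_FRACTION:
        reasons.append("strand_bias")
    return reasons


if __name__ == "__main__":
    print(filter_reasons(VariantCall(250, 20, 18, 32.0, 55.0, 2)))  # []
    print(filter_reasons(VariantCall(40, 30, 0, 35.0, 60.0, 7)))
    # ['low_depth', 'homopolymer', 'strand_bias']
```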

  24. Annotation • Crucial step in the analysis pipeline • 1522648G>A ? • 8124526T>A • 44512584G>C • 55124785GA>CC • 2544856_2544860 AATGC .. • Public / custom databases • Nomenclature • Biological implication (Gene, transcript and protein level) • Genotype-phenotype correlations • Prognostic implication • Predictive implication
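The bare calls on this slide show why annotation is needed. A string like `1522648G>A` gives only a position and two alleles, with no chromosome, genome build, gene, transcript or protein change.

The sketch below shows only the first step: splitting that shorthand into position, reference allele and alternate allele, and labeling the substitution type. The regular expression, the SNV/MNV labels and the `classify` name are assumptions for illustration. A real pipeline would normalize to HGVS nomenclature on a defined reference build and then query public or custom databases. The elided range notation on the slide is not handled.

```python
import re

# Hypothetical pattern for shorthand such as "1522648G>A" or "55124785GA>CC".
SUBSTITUTION_RE = re.compile(r"^(\d+)([ACGT]+)>([ACGT]+)$")


def classify(shorthand: str) -> dict:
    """Split position/ref/alt shorthand and label the substitution type."""
    match = SUBSTITUTION_RE.match(shorthand.replace(" ", ""))
    if not match:
        raise ValueError(f"unrecognised notation: {shorthand!r}")
    pos, ref, alt = int(match[1]), match[2], match[3]
    if len(ref) == len(alt) == 1:
        kind = "SNV"
    elif len(ref) == len(alt):
        kind = "MNV"  # multi-nucleotide substitution
    else:
        kind = "complex/indel"
    # Still missing: chromosome, genome build, gene, transcript, protein change.
    return {"position": pos, "ref": ref, "alt": alt, "type": kind}


if __name__ == "__main__":
    for call in ["1522648G>A", "8124526T>A", "44512584G>C", "55124785GA>CC"]:
        print(call, classify(call))
```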

  25. Variant Classification

  26. Visualization • Makes the data easier to interpret

  27. User interface • (Figure: user interface screenshot with callouts for Levels 1 to 5)

  28. User interface

  29. Result Reporting, Management and Sharing • An area of active development in clinical laboratories • No consensus yet on format and data points • CAP / CDC / ACMG / AMP recommendations for reporting • Major issues in the clinical implementation of NGS: variant management; LIS / NGS system interoperability; transmission of results to the EMR; knowledgebase development; generation of a data warehouse; proficiency testing

  30. Whole genome sequencing • Whole exome sequencing • Targeted sequencing

  31. Future Direction • Huge scope for NGS in the research and clinical domains • Better technology: quantum bioinformatics? • Better information management systems • Large, highly curated public-domain knowledgebases • Better and more affordable healthcare

  32. Thank You • Questions?
