Download
exome sequencing data analysis n.
Skip this Video
Loading SlideShow in 5 Seconds..
Exome Sequencing Data Analysis PowerPoint Presentation
Download Presentation
Exome Sequencing Data Analysis

Exome Sequencing Data Analysis

151 Vues Download Presentation
Télécharger la présentation

Exome Sequencing Data Analysis

- - - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript

  1. Exome Sequencing Data Analysis Chunhua Yan, Ph.D. NCI Center for Biomedical Informatics and Information Technology (CBIIT) 3/18/2015

  2. Sample preparation Genomic DNA Exome sequencing workflow Library construction Sequencing Illumina, Roche, Life Tech. Raw reads FASTQ, QC report Bwa-Mem, Novoalign, SAMtools Mapped reads Bam files, Alignment stats MuTect, VarScan, GATK Possible mutations and indels ANNOVAR, MutSig, HotNet, MEMo Annotated and filtered functional variants Manual review, Target re-sequencing Confirmed novel functional variants

  3. Next generation sequencing (NGS) • Platform • Productivity • Exomesequencing • Experimental design • Mutation study resources

  4. Sanger sequencing was the 20th century technology

  5. Next generation sequencing technology is just a decade old

  6. Comparison of sequencing methods https://www.qiagen.com/us/resources/e-learning/webinars/next-generation-sequencing/

  7. DNA sequencing – The past decade Mardis, E.R. (2011) Nature. 470(7333):198-203. .

  8. http://www.genome.gov/sequencingcosts/

  9. http://dnatech.genomecenter.ucdavis.edu/next-generation-sequencing/http://dnatech.genomecenter.ucdavis.edu/next-generation-sequencing/

  10. Next generation sequencing (NGS) • Exome sequencing • Benefit • Capture technology • Experimental design • Mutation study resource

  11. NextGenerationSequencing Applications Genomics Transcript- omics Epi- genomics • Dna-seq • ChIP-Seq, Methyl-Seq • RNA-seq • Global mapping of DNA-protein interactions • DNA methylation • Histone modification • Expression level • Novel transcripts • Fusion transcripts • Splice variants • Mutation, SNVs, • Indels, • CNVs, • Translocation

  12. Whole Exome Sequencing, Why? • Focuses on the part of the genome we understand best, the exons of genes • About 85% of known mutations in Mendelian diseases affect the exome • Nonsense, missense, splice, indel mutations • Depending on the annotation and coverage of flanking sequencing: ~35-60Mb => 1-2% of human genomes • There are ~200,000 coding exons in ~20,000 genes • A whole exome is 1/6 the cost of whole genome and 1/15 the amount of data Biesecker, L. et al. (2011) Genome Biology 12:128

  13. Exomesequencing balances the coverage and cost Sanger Targeted Exome Whole Genome • Accurate • Cheapperexon • Highturn‐around • Optimizationpossible • Lowchance ofincidentalfindings • Easy analysis • Nobiasforgenes • Standardizedworkflow • Re‐useofperformedexomesto interpretnew ones • No sequencing bias • Detect SVs and SNVs • Lowdiagnosticyieldforgenetically heterogeneousdiseases • Designandre‐designrequired • Differentdesignsfordifferentdisorders • No non-coding regions • Sequencing bias • Incidental findings • Data analysis bottleneck • Interpretation of non-coding regions • Expensive, time-consuming http://webinar.sciencemag.org/webinar/archive/exome-sequencing-today%E2%80%99s-lab

  14. Exome sequencing detects mutations http://www.stjude.org

  15. Somatic mutation calls require tumor-normal paired samples normal tumor

  16. The SureSelect Target Enrichment workflow http://www.genomics.agilent.com/article.jsp?pageId=3083

  17. Comparison of four exomecapture systems Chilamakuri, C. et al. (2014) BMC Genomics. 15:449..

  18. Coverage of exome capture systems (33.3Mb coding exons) Chilamakuri, C. et al. (2014) BMC Genomics. 15:449..

  19. Next generation sequencing (NGS) • Exomesequencing • Experimental design • Sample size • Sequencing coverage • Mutation study resource

  20. Complexity of the cancer genome OVCAR-3, NCI60 cell line, Ovarian cancer http://www.ncbi.nlm.nih.gov/projects/sky/skyquery.cgi

  21. WholeexomeDNAsources • Tumor DNA • Fresh frozen (FF) • Paraffin embedded tissue (FFPE) • Cell line • Xenograft • Normal tissue • Blood • Neighboring tissue • Clinical information • DOB • Date of diagnosis • Malignancy stage • Location of primary tumor • Location of metastatic tumor • Therapies

  22. The Number of samples needed to detect significantly mutated genes Lawrence, M.S. et al. (2014) Nature. 505(7484):495-501

  23. Variants detected in exome sequencing data from the paired FF/FFPE samples Hedegaard, J. et al. (2014) PLoSONE 9(5): e98187.

  24. Sequencing terminology 1. Insert 2. Read 3. Single Read (SR) 4. Paired End (PE) 5. Multiplexing 6. Flowcell 7. Lane Normand, R. et al. (2013) Methods Mol Biol. 1038:1-26.

  25. Sequencing Coverage Average coverage = read length * number of mapped reads/genome size Normand, R. et al. (2013) Methods Mol Biol. 1038:1-26.

  26. Diploid genome and coverage Wendl, M.C. et al. (2008) BMC Bioinformatics. 9:239.

  27. Polyploid genome and coverage Wendl, M.C. et al. (2008) BMC Bioinformatics. 9:239.

  28. Single cell exome sequencing demonstrates the sample heterogeneity Triple-negative breast cancer (TNBC) Wang, Y. et al. (2014) Nature. 512(7513):155-60.

  29. Certain mutations only occur in a subset of TNBC cell populations Wang, Y. et al. (2014) Nature. 512(7513):155-60.

  30. Clonal evolution in the TNBC inferred from single cell exomeand copy number data Wang, Y. et al. (2014) Nature. 512(7513):155-60.

  31. High coverage is needed for low tumor fraction samples. Ding, L. et al. (2014) Nat Rev Genet. 15(8):556-70

  32. Mutation detection sensitivity as a function of sequencing depth and allelic fraction Cibulskis, K. et al. (2013) Nat Biotechnol. 31(3):213-9.

  33. Steps to Bring in Projects to CCR–SF Make sure to consult your bioinformatics for experiment design https://ostr.cancer.gov/sequence/request Courtesy of Yongmei Zhao, CCR-SF

  34. Next generation sequencing (NGS) • Exomesequencing • Experimental design • Mutation study resources • Genome in a Bottle • DREAM mutation challenge

  35. Genome in a Bottle Consortium • No widely accepted set of metrics to characterize the fidelity of variant calls from NGS… • Genome in a Bottle Consortium is developing standards to address this… • well-characterized human genomes as Reference Materials (RMs) • characterized and disseminated by NIST • tools and methods to use these RMs • Global Alliance for Genomics and Health Benchmarking Team http://genomeinabottle.org

  36. The data sets for NA12878 are available at the Genome in a Bottle ftp site at NCBI ftp://ftp-trace.ncbi.nih.gov/giab/ftp/data/NA12878 Zook, J.M. et al. (2014) Nat Biotechnol. 32(3):246-51.

  37. Integration Methods to Establish Benchmark Variant Calls Zook, JM et al (2014) Nat Biotechnol. 32(3):246-51. http://www.slideshare.net/GenomeInABottle/presentations

  38. ~2.7M high confident snps are detected by multiple algorithms Exome SNPs Exome SNPs Zook, J.M. et al. (2014) Nat Biotechnol. 32(3):246-51.

  39. The mutation caller performance varies drastically 16 LUSC tumor-normal exome-seq pairs Kim, S.Y., Speed, T.P. (2013) BMC Bioinformatics. 10;14:189.

  40. http://dreamchallenges.org/

  41. Challenge Data and Assessment http://dreamchallenges.org/

  42. In silico challenge #4 (SEPT, 2014) Tumor sample data, early 2015. http://dreamchallenges.org/

  43. What can we learn from exome sequencing Mudvari, P. et al. (2013) J Metabolomics Syst Biol. 1(1). pii: 7.