An Introduction to Next Generation Sequencing

Presentation Transcript

  1. An Introduction to Next Generation Sequencing. Hanlee Ji, M.D., Stanford University

  2. Overview • Principles of next generation DNA sequencing • Analysis of genetic variation and research applications

  3. Advances in DNA sequencing technology M. Stratton et al. Nature 458 (2009)

  4. Applications • Identifying genetic variants • Whole genome • Exome • Subsets • Transcriptomes (e.g. RNASeq) • ChIP-seq • Epigenomes (methylation) • Many others!

  5. Sequencing-by-synthesis • Individual DNA molecules from a “sequencing library”. • Sequencing via multiple cycles of nucleotide incorporation. • Solid phase support • High density reads using a photodetector (e.g. CCDs) or solid state system • Images per cycle provide sequence data. J. Shendure and H. Ji. Nat Biotech (2008)

  6. Sequencing-by-ligation • Complete Genomics • DNA nanoballs from circles • Combinatorial probe anchor ligation • 10 base reads adjacent to 8 anchor sites • 31- to 35-base mate-paired reads Drmanac et al. Science (2010)

  7. Solid state detection of DNA synthesis • “Nanowell” solid-state detection Rothberg et al. Nature (2011)

  8. New technologies: single molecule sequencing • Single molecule detection • Pacific Biosciences • Sequencing by synthesis • Single base incorporation • “Nanowell” sequencing-by-synthesis

  9. Nanopore sequencing • DNA inserted in a nanopore in lipid membrane • Speed control provided by a phi29 DNA polymerase • Translocation via an electrical field and polymerase • DNA sequence via changes in the ionic current

  10. Issues with next generation DNA sequencing • Higher sequencing error rates • <0.1% to 10% or greater depending on sequencing chemistry and configuration • Systematic bias based on approach • Short sequence reads (<250 bases) • Massive data output • Data storage management • Variant calling analysis
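
The error rates on this slide can be related to base quality scores. The sketch below assumes the widely used Phred scale, Q = -10 log10(p); the slide itself does not name a scale, and the function names are illustrative only.

```python
import math


def phred_to_error_prob(q: float) -> float:
    """Convert a Phred quality score Q to a per-base error probability."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p: float) -> float:
    """Convert a per-base error probability to a Phred quality score."""
    if not 0 < p <= 1:
        raise ValueError("error probability must be in (0, 1]")
    return -10 * math.log10(p)


# Error rates spanning the range quoted on the slide (0.1% to 10%).
for rate in (0.001, 0.01, 0.10):
    print(f"error rate {rate:.1%} corresponds to about Q{error_prob_to_phred(rate):.0f}")
```

Under this assumption, a 0.1% error rate sits at about Q30 and a 10% error rate at about Q10.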

  11. Aspects of next generation sequencing • DNA sequencing library preparation • Processing of sequence reads • Types of reads (e.g. “mate pairs”) • Alignment • Fold coverage • Assembly • Variant calling

  12. Overview of the process in whole genome sequencing D Koboldt et al. Briefings in Bioinformatics (2010)

  13. Sequencing library preparation – 454 system

  14. Sequencing process

  15. Sequencing data generation and analysis D Koboldt et al. Briefings in Bioinformatics (2010)

  16. Quality metrics to improve variant calls • Sequencing fold coverage based on alignment. • Higher fold coverage required in cancer genomes • Elimination of duplicate reads. • Bottlenecks which propagate errors from DNA amplification. • Using high quality base calls • Quality scores 30 or higher • Repeat sequences in genomes. • Significance or confidence values for variants
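
Two of the metrics above, fold coverage and duplicate removal, can be sketched in a few lines. This is a simplified illustration, not the method of any particular tool: the `Read` record, its fields, and the rule of keeping the highest-quality read per start position and strand are all assumptions made for the example.

```python
from typing import Iterable, List, NamedTuple


class Read(NamedTuple):
    """Hypothetical minimal aligned-read record used only for this sketch."""
    chrom: str
    start: int
    strand: str
    mean_quality: float


def fold_coverage(n_reads: int, read_length: int, genome_size: int) -> float:
    """Average fold coverage = total sequenced bases / size of the target."""
    return n_reads * read_length / genome_size


def remove_duplicates(reads: Iterable[Read]) -> List[Read]:
    """Keep one read per (chrom, start, strand), preferring the highest mean quality."""
    best = {}
    for read in reads:
        key = (read.chrom, read.start, read.strand)
        if key not in best or read.mean_quality > best[key].mean_quality:
            best[key] = read
    return list(best.values())


# 1.2 billion reads of 100 bases over a 3 Gb haploid reference.
print(fold_coverage(1_200_000_000, 100, 3_000_000_000))
```

Real pipelines usually mark duplicates on paired-end coordinates rather than on a single start position; the single-end key here is kept deliberately simple.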

  17. DNA sequence data format and visualization • Sequence alignment map (SAM) • Viewing “pileups”
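
As a rough illustration of the SAM format, the sketch below splits one alignment line into its mandatory tab-separated fields and decodes a few flag bits. The example record is made up, and the quality decoding assumes the standard Phred+33 ASCII encoding.

```python
def parse_sam_line(line: str) -> dict:
    """Parse the 11 mandatory fields of a single SAM alignment line."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 11:
        raise ValueError("a SAM alignment line has at least 11 fields")
    flag = int(fields[1])
    return {
        "qname": fields[0],
        "flag": flag,
        "rname": fields[2],
        "pos": int(fields[3]),            # 1-based leftmost mapping position
        "mapq": int(fields[4]),
        "cigar": fields[5],
        "seq": fields[9],
        "quals": [ord(c) - 33 for c in fields[10]],  # Phred+33 encoding
        "is_unmapped": bool(flag & 0x4),
        "is_reverse": bool(flag & 0x10),
        "is_duplicate": bool(flag & 0x400),
    }


# Hypothetical record: a 4-base read aligned to the reverse strand of chr18.
example = "read1\t16\tchr18\t1000\t60\t4M\t*\t0\t0\tGATC\tIIII"
record = parse_sam_line(example)
print(record["rname"], record["pos"], record["is_reverse"], record["quals"])
```

A pileup view is then essentially these records stacked by reference position.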

  18. Genetic variation • Point mutations • Nonsynonymous versus synonymous • Insertion / deletions (indels) • Copy number variations (CNVs) • Structural variants (SV) • Intrachromosomal • Large indels • Duplications • Inversions • Interchromosomal • Balanced translocations • Imbalanced translocations

  19. Single nucleotide variants from cancer genomes S. P. Shah et al. Nature 461 (2009)

  20. Variant callers • Genome Analysis Toolkit • VarScan • SAMtools • SNVMix • Others…

  21. Single nucleotide mutations • Silent = synonymous • Substitution = nonsynonymous • Nonsense = premature stop http://commons.wikimedia.org
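
The three categories above can be told apart by translating the reference and mutated codons. The sketch below uses the standard genetic code; the function name and the extra "stop-lost" label are illustrative additions, not part of the slide.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_codon_change(ref_codon: str, alt_codon: str) -> str:
    """Classify a single-codon substitution by its effect on the protein."""
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    if ref_aa == alt_aa:
        return "synonymous"   # silent: same amino acid
    if alt_aa == "*":
        return "nonsense"     # premature stop codon
    if ref_aa == "*":
        return "stop-lost"    # original stop codon removed
    return "missense"         # nonsynonymous amino acid substitution


print(classify_codon_change("CTT", "CTC"))  # Leu -> Leu
print(classify_codon_change("CGT", "CAT"))  # Arg -> His
print(classify_codon_change("CGA", "TGA"))  # Arg -> stop
```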

  22. Transitions versus transversion mutations • Transitions • A <-> G • C <-> T • Transversions • A <-> T • A <-> C • G <-> T • G <-> C Ding et al. Nature (2010)
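
The rule behind this slide is that a transition stays within one chemical class (purine to purine, or pyrimidine to pyrimidine), while a transversion crosses classes. A minimal sketch, with illustrative function names:

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_substitution(ref: str, alt: str) -> str:
    """Return 'transition' or 'transversion' for a single-base substitution."""
    ref, alt = ref.upper(), alt.upper()
    valid = PURINES | PYRIMIDINES
    if ref not in valid or alt not in valid or ref == alt:
        raise ValueError("expected two different bases from A, C, G, T")
    same_class = {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES
    return "transition" if same_class else "transversion"


def ti_tv_ratio(substitutions) -> float:
    """Ratio of transitions to transversions over (ref, alt) pairs."""
    labels = [classify_substitution(r, a) for r, a in substitutions]
    n_tv = labels.count("transversion")
    if n_tv == 0:
        raise ValueError("no transversions observed; ratio is undefined")
    return labels.count("transition") / n_tv


print(classify_substitution("A", "G"))
print(classify_substitution("G", "T"))
```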

  23. Small insertion and deletions

  24. Targeting strategies for resequencing genomic subsets • In-solution capture (e.g. molecular inversion probes) • Array-based hybridization capture • In-solution hybridization capture

  25. Rapid targeted mutation analysis from cancer genomes [Figure, panel a: Preparation, Processing, Sequence Data; target-specific oligonucleotide, flow cell, single-adaptor library. Panel b: Step 1, primer-probe preparation; Step 2, target capture (hybridization, extension and denaturation); Step 3, cluster preparation. Labels: immobilized DNA, primer-probe, immobilized primers ‘C’ and ‘D’, sequencing primers 1 and 2.]

  26. “Onconomic” diagnostic mutation analysis • Rapid mutation analysis for point-of-care use • Analysis of identified cancer drivers • Determination of pathogenic mutations • Example: nonsense mutation in SMAD4 (normal versus tumor)

  27. Visualizing sequence 1.5 Mb region on Chromosome 18 SNP genotyping

  28. Whole genome sequencing M. Stratton et al. Nature 458 (2009)

  29. A needle in a human genome haystack? • A human genome has 23 pairs of chromosomes. • 6 billion individual DNA basepairs per diploid genome. • A single basepair error can be a disease mutation. ..GATC..ERROR..TTCCAA..

  30. Exome sequencing M. Clark et al. Nature Biotechnology (2012)

  31. A cancer family pedigree [Pedigree figure. Labels: AP; 43 y/o; 42 y/o. Legend: Male, Female, No Cancer, Colorectal Cancer, Colon Polyps.]

  32. Assessment of a cancer family – unaffected versus affected [Figure labels: Father, Mother, AP]

  33. Exome sequencing analysis for identifying inherited disease • Identify the variants unique to the affected members. [Figure: variant lists (1 to 11, etc.) for Father, Mother and AP; variant 9 is highlighted under “AP’s unique family variants”.]
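
The filtering step on this slide amounts to set arithmetic: keep variants shared by all affected members and discard any seen in an unaffected member. The sketch below uses made-up variant identifiers 1 to 11 echoing the figure; which relative is affected is a hypothetical assignment for the example, not taken from the slide.

```python
def candidate_variants(affected, unaffected):
    """Variants present in every affected member and in no unaffected member."""
    affected = [set(v) for v in affected]
    if not affected:
        raise ValueError("at least one affected member is required")
    shared = set.intersection(*affected)
    seen_in_unaffected = set().union(*(set(v) for v in unaffected))
    return shared - seen_in_unaffected


# Hypothetical variant calls per family member.
ap = set(range(1, 12))                        # affected proband carries variants 1..11
affected_relative = set(range(1, 12))         # an affected relative also carries variant 9
unaffected_relative = set(range(1, 12)) - {9}  # an unaffected relative lacks variant 9

print(candidate_variants([ap, affected_relative], [unaffected_relative]))
```

In practice the identifiers would be chromosome, position and allele tuples from exome variant calls rather than small integers.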

  34. Interpretation of genetic variants • Effects of substitutions on the translated protein predicted bioinformatically • SIFT – probability that a substitution is tolerated • < 0.05 is deleterious. • PolyPhen – categorical definitions • “benign”, “possibly damaging” and “probably damaging” • Protein structural mapping • Example: IDH1 mapping of Arg132 cancer mutation. Parsons et al., Science (2008)
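
The SIFT cutoff quoted above can be applied as a simple threshold. The helper below is a hypothetical illustration and does not call SIFT itself; it only labels a score that has already been computed.

```python
def sift_call(score: float, threshold: float = 0.05) -> str:
    """Label a SIFT score: below the threshold is predicted deleterious."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("SIFT scores are probabilities between 0 and 1")
    return "deleterious" if score < threshold else "tolerated"


for score in (0.00, 0.04, 0.05, 0.80):
    print(score, sift_call(score))
```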

  35. Sequence assembly • Assembling fragments of random sequence to form a set of larger contiguous sequences (contigs). • Used to assemble de novo genomes of new organisms. • Useful for reconstructing regions of high complexity such as SVs. Zerbino DR, Birney E, Genome Research, 18 (2010)
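
To make the idea of merging fragments into contigs concrete, here is a toy greedy overlap assembler. It is a deliberately simple stand-in, not the de Bruijn graph approach of the cited Zerbino and Birney paper, and it ignores sequencing errors, reverse complements and repeats. The reads are invented for the example.

```python
def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of a that equals a prefix of b (0 if < min_len)."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads, min_overlap: int = 3):
    """Repeatedly merge the pair of sequences with the longest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_len, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_overlap)
                    if k > best_len:
                        best_len, best_i, best_j = k, i, j
        if best_len == 0:
            return contigs  # nothing left to merge
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for idx, c in enumerate(contigs) if idx not in (best_i, best_j)]
        contigs.append(merged)


# Three overlapping 8-base reads sampled from one 15-base sequence.
print(greedy_assemble(["ATGGCGTA", "GCGTACGT", "ACGTTAGC"]))
```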

  36. Metagenomic characterization of bacterial flora

  37. Copy number from genome sequencing • Genome shotgun sequencing comparison. • Copy number variation derived directly from sequence reads. • 15 Kb windows with sequence tag counting Campbell et al., Nature Genetics, (2008)
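
Window-based tag counting can be sketched as follows: bin read start positions into 15 Kb windows and compare tumor with normal counts after scaling by library size. The pseudocount and the log2 ratio output are assumptions added for this example and are not described on the slide.

```python
import math
from collections import Counter


def count_reads_per_window(positions, window_size: int = 15_000) -> Counter:
    """Count read start positions falling in each fixed-size window."""
    return Counter(pos // window_size for pos in positions)


def log2_ratios(tumor_positions, normal_positions,
                window_size: int = 15_000, pseudocount: float = 1.0) -> dict:
    """Per-window log2(tumor/normal) read-count ratio, scaled by library size."""
    if not tumor_positions or not normal_positions:
        raise ValueError("both samples need at least one read")
    tumor = count_reads_per_window(tumor_positions, window_size)
    normal = count_reads_per_window(normal_positions, window_size)
    t_total, n_total = len(tumor_positions), len(normal_positions)
    ratios = {}
    for w in sorted(set(tumor) | set(normal)):
        # The pseudocount avoids log(0) and division by zero in empty windows.
        t = (tumor[w] + pseudocount) / t_total
        n = (normal[w] + pseudocount) / n_total
        ratios[w * window_size] = math.log2(t / n)
    return ratios


# Toy data: the first window is relatively enriched in the tumor sample.
tumor_reads = [100] * 20 + [16_000] * 20
normal_reads = [100] * 10 + [16_000] * 30
print(log2_ratios(tumor_reads, normal_reads))
```

Positive values suggest relative gain and negative values relative loss in that window; real analyses also correct for GC content and mappability.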

  38. Copy number variations (CNVs) from genomic sequencing Breast cancer – Chromosome 1 Genomic sequence analysis Array CGH CNV analysis

  39. Structural variations in human genomes Deletion Duplication Inversion Intrachromosomal Insertion Translocation Interchromosomal http://commons.wikimedia.org

  40. Structural variation • Mate pair sequences dependent on the genomic DNA insertion size (population). [Figure labels: 300 nts, 300 nts. Normal: intact region, Exon n, Exon n+1. Tumor: deleted region, Exon n.]

  41. Genomic deletion analysis • Breast cancer genome sequencing. • Mate pair sequences used in indel analysis. • Changes in the location of mapped reads that are not concordant with the sequencing library insert size. [Figure panels: Normal, Primary, Metastasis, Xenograft] Ding et al., Nature (2010)
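
The discordance test described above can be sketched by comparing the mapped distance between mates with the expected library insert size. The 300-base insert size echoes the earlier structural variation slide; the tolerance, the record layout and the labels are assumptions for illustration only.

```python
from typing import NamedTuple


class MatePair(NamedTuple):
    """Hypothetical record holding the mapped positions of both mates."""
    chrom1: str
    pos1: int
    chrom2: str
    pos2: int


def classify_pair(pair: MatePair, expected_insert: int = 300, tolerance: int = 100) -> str:
    """Flag mate pairs whose mapped distance disagrees with the library insert size."""
    if pair.chrom1 != pair.chrom2:
        return "interchromosomal"
    span = abs(pair.pos2 - pair.pos1)
    if span > expected_insert + tolerance:
        return "possible deletion"   # mates map farther apart than the library allows
    if span < expected_insert - tolerance:
        return "possible insertion"  # mates map closer together than expected
    return "concordant"


pairs = [
    MatePair("chr1", 10_000, "chr1", 10_310),
    MatePair("chr1", 20_000, "chr1", 25_000),
    MatePair("chr8", 5_000, "chr20", 7_000),
]
for p in pairs:
    print(p, classify_pair(p))
```

A single discordant pair is weak evidence; callers typically require several independent pairs supporting the same breakpoint.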

  42. Structural variants from small cell lung cancer genome Duplication Inversion Campbell et al., Nature Genetics, (2008)

  43. Translocations in colorectal cancer genomes • Balanced translocations between chr 8 and 20 p arms • Structural changes can only be delineated based on sequencing Bass et al., Nature Genetics (2011)

  44. Cancer transcriptome sequencing (RNASeq) • Mate pair analysis from prostate cancer mRNA • Identification of reads indicating gene fusions. N Palanisamy et al., Nat Med 16 (2007)

  45. Sequenced cancer genomes – non-small cell lung Lee et al. Nature 465 (2010)

  46. Whole genome analysis of colorectal cancer • Cancer Genome Atlas analysis of colon adenocarcinoma • “Circos” plots of whole genome data

  47. Gene expression and RNASeq

  48. ChIP-seq

  49. Ultrasensitive mutation detection • Robust detection of 1 mutant allele from 1,000 wildtype alleles in heterogeneous mixtures • Application to viral infections • Analysis of cancer point mutations Flaherty et al., Nucleic Acids Research, 2012

  50. Deep resequencing for rare variants • Derived from reassortment of swine and human flu in swine • More than 214 countries in 2009 • More than 622,482 infections confirmed • 18,449 deaths confirmed by WHO Smith et al., Nature 2009