
Genome-wide Copy Number Analysis

Qunyuan Zhang, Ph.D., Division of Statistical Genomics, Department of Genetics & Center for Genome Sciences, Washington University School of Medicine. 02-08-2006. Course: M 21-621 Computational Statistical Genetics.


Presentation Transcript


  1. Genome-wide Copy Number Analysis. Qunyuan Zhang, Ph.D., Division of Statistical Genomics, Department of Genetics & Center for Genome Sciences, Washington University School of Medicine. 02-08-2006. Course: M 21-621 Computational Statistical Genetics

  2. Four Questions
     • What is Copy Number?
     • What can Copy Number tell us?
     • How to measure/quantify Copy Number?
     • How to analyze Copy Number?

  3. What is Copy Number?
     • Gene Copy Number: The gene copy number (also "copy number variants" or CNVs) is the number of copies of a particular gene in the genotype of an individual. Recent evidence shows that the gene copy number can be elevated in cancer cells. For instance, the EGFR copy number can be higher than normal in non-small cell lung cancer. … Elevating the copy number of a particular gene can increase the expression of the protein that it encodes. (From Wikipedia, www.wikipedia.org)

  4. DNA Copy Number
     A Copy Number Variant (CNV) represents a copy number change involving a DNA fragment that is ~1 kilobase or larger. (From Nature Reviews Genetics, Feuk et al. 2006)
     • DNA Copy Number ≠ DNA Tandem Repeat Number (e.g. microsatellites, <10 bases)
     • DNA Copy Number ≠ RNA Copy Number
     • RNA Copy Number = Gene Expression Level (DNA → transcription → mRNA)
     • Copy Number is the number of copies of a particular fragment of a nucleic acid molecular chain. It refers to DNA Copy Number in most publications.

  5. What can Copy Number tell us?
     • Genetic Diversity/Polymorphisms
       - restriction fragment length polymorphism (RFLP)
       - amplified fragment length polymorphism (AFLP)
       - random amplification of polymorphic DNA (RAPD)
       - variable number of tandem repeat (VNTR; e.g., mini- and microsatellite)
       - single nucleotide polymorphism (SNP)
       - presence/absence of transposable elements
       - …
       - structural alterations (e.g., deletions, duplications, inversions …)
       - DNA copy number variant (CNV)
     • Association with phenotypes/diseases, genes/genetic factors

  6. Genetic Alterations in Tumor Cells (DNA Copy Number Changes)
     • Normal cell: CN=2
     • Mechanisms: homologous repeats, segmental duplications, chromosomal rearrangements, duplicative transpositions, non-allelic recombinations, ……
     • Tumor cells: deletion (CN=0, CN=1), unchanged (CN=2), amplification (CN=3, CN=4)

  7. How to measure/quantify Copy Number?
     • Quantitative Polymerase Chain Reaction (Q-PCR): DNA amplification (dNTPs, primers, Taq polymerase, fluorescent dye), one fragment each time
       - less CN → PCR amplification → less DNA → low fluorescent intensity
       - more CN → PCR amplification → more DNA → high fluorescent intensity
     • Microarray: DNA hybridization (dNTPs, primers, Taq polymerase, fluorescent dye), multiple/different fragments in a mixed pool
       - less CN → PCR amplification → less DNA → hybridization to arrayed probes → low intensities
       - more CN → PCR amplification → more DNA → hybridization to arrayed probes → high intensities

  8. Microarray: From Image to Copy Number
     • Affymetrix Mapping 250K Sty-I chip: ~250K probe sets, ~250K SNPs, one probe set (24 probes) per SNP
     • Chip images compared: Tumor vs. Normal
     • More DNA copy number → more DNA hybridization → higher intensity
     • Regions shown: CN=2 (no change), Deletion (CN=1, CN=0), Amplification (CN>2)

  9. How to Analyze Copy Number?
     • A Real Example
       - ~400 cancer patients: normal tissue & tumor tissue (~400 pairs, ~800 DNA samples)
       - Affymetrix 250K Sty-I Human Mapping SNP Array
       - DNA hybridization signals (intensities on chip images)
       - Genotype calling → SNP genotypes → LOH analysis (genotypic changes)
       - DNA copy number analysis (DNA copy number changes)

  10. General Procedures for Copy Number Analysis
     • Finished chips
     • (scanner) → Raw image data [.DAT files] (experiment info [.EXP])
     • (image processing software) → Probe level raw intensity data [.CEL files]
     • Preprocessing (with chip description file [.CDF]): Background adjustment, Normalization, Summarization
     • Summarized intensity data
     • Raw copy number (CN) data [log ratio of tumor/normal intensities]
     • Significance test of CN changes; Estimation of CN; Smoothing and boundary determination
     • Concurrent regions among population; Amplification and deletion frequencies among populations; Association analysis

  11. Background Adjustment/Correction
     • Reduces unevenness of a single chip
     • Makes intensities of different positions on a chip comparable
     • [Figure: chip image before adjustment and after adjustment]
     • Corrected Intensity (S') = Observed Intensity (S) - Background Intensity (B)
     • For each region i, B(i) = mean of the lowest 2% of intensities in region i (Affymetrix MAS 5.0)
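The subtraction S' = S - B(i) can be sketched in a few lines of Python. This is a simplified illustration, not the MAS 5.0 implementation: it assumes the chip is a flat list of intensities split into contiguous regions, and the function name `background_correct` and its parameters `n_regions` and `low_fraction` are hypothetical.

```python
def background_correct(intensities, n_regions=4, low_fraction=0.02):
    """Subtract a per-region background from a flat list of probe intensities.

    Assumption: regions are contiguous chunks of the list. A real chip uses a
    2-D grid of zones; this sketch only illustrates S' = S - B(i).
    """
    if not intensities:
        return []
    n = len(intensities)
    region_size = -(-n // n_regions)  # ceiling division
    corrected = []
    for start in range(0, n, region_size):
        region = intensities[start:start + region_size]
        # B(i): mean of the lowest 2% of intensities in region i (at least one probe)
        k = max(1, int(len(region) * low_fraction))
        background = sum(sorted(region)[:k]) / k
        corrected.extend(s - background for s in region)
    return corrected


if __name__ == "__main__":
    chip = [10, 12, 11, 50, 100, 105, 103, 200]
    print(background_correct(chip, n_regions=2))
```

With very small regions the lowest 2% rounds down to zero probes, so the sketch falls back to the single lowest intensity in the region.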

  12. Background Adjustment/Correction
     • Eliminates non-specific hybridization signal
     • Obtains accurate intensity values for specific hybridization
     • [Figure: probe set layout: sense or antisense strands, 25 oligonucleotide probes, quartet, probe set]
     • Methods: PM only, PM-MM, Ideal MM, etc.

  13. Normalization
     • Reduces technical variation between chips
     • Makes intensities from different chips comparable
     • [Figure: intensity distributions before normalization and after normalization]
     • Standardization: S' = (S - Mean of S) / (STD of S), so that S' ~ N(0, 1)
     • Other methods: Base Line Array (linear); Quantile Normalization; Contrast Normalization; etc.
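As an illustration of two of the methods named on this slide, the following Python sketch standardizes one chip and quantile-normalizes a set of chips. It is a minimal sketch under stated assumptions: each chip is a plain list of intensities of equal length, ties are not averaged, and the function names are hypothetical rather than taken from any package.

```python
import statistics


def zscore_normalize(chip):
    """Standardize one chip: S' = (S - mean of S) / (SD of S)."""
    mean = statistics.mean(chip)
    sd = statistics.stdev(chip)
    return [(s - mean) / sd for s in chip]


def quantile_normalize(chips):
    """Give every chip the same intensity distribution.

    The reference distribution is the mean of the sorted intensities across
    chips; each value is replaced by the reference value of the same rank.
    """
    n = len(chips[0])
    sorted_chips = [sorted(chip) for chip in chips]
    reference = [statistics.mean(column) for column in zip(*sorted_chips)]
    normalized = []
    for chip in chips:
        order = sorted(range(n), key=lambda i: chip[i])  # probe indices by rank
        out = [0.0] * n
        for rank, index in enumerate(order):
            out[index] = reference[rank]
        normalized.append(out)
    return normalized


if __name__ == "__main__":
    print(zscore_normalize([1, 2, 3]))
    print(quantile_normalize([[5, 2, 3], [4, 1, 9]]))
```

After quantile normalization, every chip contains exactly the same set of values, only in a different probe order.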

  14. Summarization
     • Combines the multiple probe intensities for each probe set to produce a summarized value for subsequent analyses.
     • Average methods: PM only or PM-MM, allele specific or non-specific
     • Model-based method: Li & Wong, 2001 (Gene Expression Index)

  15. Raw Copy Number Data
     • S: summarized raw intensity
     • S': log transformation, S' = log2(S)
     • [Figure: intensity distributions before log transformation (S) and after log transformation (log(S))]
     • Raw CN: log ratio of tumor/normal intensities, CN = S'tumor - S'normal = log2(Stumor/Snormal)
     • Pair design: Snormal = S of the paired normal sample
     • Group design: Snormal = average S of the group of normal samples
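The raw CN definition for both designs can be written out as a short Python sketch. The function names `raw_cn_pair` and `raw_cn_group` are hypothetical, and the inputs are assumed to be lists of summarized intensities, one value per SNP, in the same SNP order.

```python
import math


def raw_cn_pair(tumor, normal):
    """Pair design: raw CN = log2(S_tumor / S_normal) for each SNP."""
    return [math.log2(t / n) for t, n in zip(tumor, normal)]


def raw_cn_group(tumor, normals):
    """Group design: S_normal is the per-SNP average over all normal samples."""
    reference = [sum(column) / len(column) for column in zip(*normals)]
    return raw_cn_pair(tumor, reference)


if __name__ == "__main__":
    # A tumor intensity twice the normal one gives +1; half gives -1.
    print(raw_cn_pair([200, 100, 50], [100, 100, 100]))
    print(raw_cn_group([400], [[100], [300]]))
```

On this scale a raw CN of 0 means no change relative to the normal reference, positive values suggest amplification, and negative values suggest deletion.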

  16. Individual Level Analysis
     Analysis for each individual sample (or each sample pair)
     • Significance test of CN amplification and deletion
     • Boundary finding (smoothing and segmentation)
     • CN estimation

  17. Intensities and Raw CNs, Chr. 1 (Pair #101). Black: Normal, Red: Tumor, Green: Tumor - Normal

  18. Significance Test for Copy Number Changes: -log(p) values, Chr. 1, Pair #101
     • Window-based t test
     • Window size = 0.5 Mbp (~30 SNPs); N = number of SNPs in the window
     • t = (Mean CN of window / SD of window) × √N ~ t(df = N - 1)
     • [Figure: -log(p) plotted against window position (Mbp)]
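A minimal Python sketch of the window statistic follows. It assumes SNP positions in base pairs and raw CN values as parallel lists, groups SNPs into fixed 0.5 Mbp bins, and returns only the t statistic; the function name `window_t_statistics` and the choice to skip windows with fewer than two SNPs or zero variance are assumptions of this sketch. Converting t to a p-value requires the tail probability of a t distribution with N - 1 degrees of freedom, for example `scipy.stats.t.sf`, which is left out to keep the example dependency-free.

```python
import math
import statistics


def window_t_statistics(positions_bp, raw_cn, window_bp=500_000):
    """One-sample t statistic (mean CN against 0) for each fixed-size window.

    Returns a list of (window_start_bp, n_snps, t) tuples.
    """
    windows = {}
    for position, cn in zip(positions_bp, raw_cn):
        windows.setdefault(position // window_bp, []).append(cn)

    results = []
    for index in sorted(windows):
        values = windows[index]
        n = len(values)
        if n < 2:
            continue  # SD is undefined for a single SNP
        sd = statistics.stdev(values)
        if sd == 0:
            continue  # t is undefined when all values are identical
        t = statistics.mean(values) / sd * math.sqrt(n)
        results.append((index * window_bp, n, t))
    return results


if __name__ == "__main__":
    positions = [0, 100_000, 200_000, 600_000, 700_000]
    cn = [0.1, 0.2, 0.3, 1.0, 1.0]
    for start, n, t in window_t_statistics(positions, cn):
        print(start, n, round(t, 3))
```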

  19. Genome-wide Raw CN Changes (Pair #105)

  20. Genome-wide Window-based Test of CN Changes (Pair #105): -log(p)

  21. Segmentation
     BioConductor R Packages (www.bioconductor.org)
     • GLAD package, adaptive weights smoothing (AWS) method
     • DNAcopy package, circular binary segmentation method

  22. CN Estimation: Hidden Markov Model (HMM)
     Software: CNAT (www.affymetrix.com); dChip (www.dchip.org); CNAG (www.genome.umin.jp)
     Model, by position (… SNP_i, SNP_i+1, SNP_i+2, SNP_i+3, SNP_i+4 …):
     - hidden status: unknown CN at each SNP (CN=?)
     - observed status: raw CN at each SNP (log ratio of intensities)
     CN estimation: finding a sequence of CN values which maximizes the likelihood of the observed raw CN. Algorithm: Viterbi algorithm (can be iterative).
     The following information/assumptions are needed:
     • Background probabilities: overall probabilities of possible CN values. P(CN=x); x = -2, -1, 0, 1, 2, 3, …, n (usually n < 10)
     • Transition probabilities: probabilities of CN values of each SNP conditional on the previous one. P(CN_i+1 = x | CN_i = y); x = -2, -1, 0, 1, 2, 3, …, or n; y = -2, -1, 0, 1, 2, 3, …, or n
     • Emission probabilities: probabilities of observed raw CN values of each SNP conditional on the hidden/unknown/true CN status. P(log ratio < x | CN = y) = f(x | CN = y); x = any real number; y = -2, -1, 0, 1, 2, 3, …, or n
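The Viterbi step can be illustrated with a small Python sketch. Everything numeric here is an assumption made for illustration and is not taken from CNAT, dChip, or CNAG: the hidden states are CN = 1 to 4, the background probabilities are uniform, the transition matrix has a single "stay" probability, and the emission density is Gaussian with mean log2(CN/2) and a fixed standard deviation. The function name `viterbi_cn` is hypothetical.

```python
import math


def viterbi_cn(log_ratios, states=(1, 2, 3, 4), stay_prob=0.99, sd=0.2):
    """Most likely sequence of hidden CN states for a series of raw CN values."""
    if not log_ratios:
        return []
    n_states = len(states)
    # Background probabilities: uniform over the states (assumption).
    log_start = [math.log(1.0 / n_states)] * n_states
    # Transition probabilities: stay with stay_prob, otherwise switch uniformly.
    switch = (1.0 - stay_prob) / (n_states - 1)
    log_trans = [[math.log(stay_prob if i == j else switch)
                  for j in range(n_states)] for i in range(n_states)]

    def log_emission(x, cn):
        # Gaussian log density centred on the expected log ratio log2(CN / 2).
        mu = math.log2(cn / 2)
        return -0.5 * ((x - mu) / sd) ** 2 - math.log(sd * math.sqrt(2 * math.pi))

    score = [log_start[s] + log_emission(log_ratios[0], states[s])
             for s in range(n_states)]
    backpointers = []
    for x in log_ratios[1:]:
        new_score, pointers = [], []
        for j in range(n_states):
            best_i = max(range(n_states), key=lambda i: score[i] + log_trans[i][j])
            pointers.append(best_i)
            new_score.append(score[best_i] + log_trans[best_i][j]
                             + log_emission(x, states[j]))
        score = new_score
        backpointers.append(pointers)

    # Trace back from the best final state.
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return [states[s] for s in path]


if __name__ == "__main__":
    raw = [0.0, 0.05, -0.02, 0.6, 0.55, 0.62, 0.58, 0.6, 0.0, 0.01, -0.03]
    print(viterbi_cn(raw))
```

A high stay probability makes the estimated CN sequence piecewise constant, so a single noisy SNP is unlikely to produce a spurious one-SNP segment. Working in log space avoids numerical underflow on long chromosomes.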

  23. HMM Estimation of CN for Chr. 1 (Pair #101). Black: Normal Intensities, Red: Tumor Intensities, Green: Tumor - Normal, Blue: HMM estimated CNs in Tumor Tissue (levels shown: CN=1, CN=2, CN=3, CN=4)

  24. Population Level Analysis
     Analysis for the whole group (or sub-group) of samples
     • Overall significance test
     • Amplification and deletion frequencies summarization
     • Common/concurrent region finding
     • Associations (with mutations, LOHs, clinical variables …)

  25. Genome-wide Raw CN Changes (average over ~400 pairs)

  26. Raw CN Changes of Chr. 14 (average over ~400 pairs)

  27. Sliding Window Analysis
     • [Figure: overlapping windows 1, 2, 3, …, k, …, N along the SNP sequence]
     • Each window (k) contains 30 consecutive SNPs (k, k+1, k+2, k+3, …, k+29)
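The window definition above, where window k covers SNPs k to k+29, can be sketched as a running mean in Python. The function name `sliding_window_means` is hypothetical, and averaging is only one possible per-window summary; the input is assumed to be raw CN values ordered by genomic position.

```python
def sliding_window_means(values, window=30):
    """Mean of every run of `window` consecutive values, stepping one SNP at a time."""
    if window <= 0 or len(values) < window:
        return []
    total = sum(values[:window])
    means = [total / window]
    for k in range(1, len(values) - window + 1):
        # Slide by one SNP: add the value entering the window, drop the one leaving.
        total += values[k + window - 1] - values[k - 1]
        means.append(total / window)
    return means


if __name__ == "__main__":
    print(sliding_window_means([1, 2, 3, 4, 5], window=3))
```

Updating a running total keeps the cost linear in the number of SNPs, which matters with ~250K SNPs per chip.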

  28. Genome-wide Raw Copy Number Changes (sliding window plot, averaged over ~400 pairs)

  29. Sliding Window Test of Significance of CN Changes: -log(p) values, based on ~400 pairs

  30. CN Change Frequencies in Population (Chr. 14, ~400 pairs). Black: Freq.(CN>0); Red: Freq.(CN>0, significant amplification at 0.01 level); Green: Freq.(CN<0, significant deletion at 0.01 level)
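The three frequencies in this plot can be computed per SNP (or per window) as in the Python sketch below. It assumes two hypothetical inputs that are not specified on the slide: `raw_cn[sample][snp]` holding raw CN values and `p_values[sample][snp]` holding the matching significance-test p-values.

```python
def change_frequencies(raw_cn, p_values, alpha=0.01):
    """Per-SNP fractions of samples with CN > 0, significant gain, significant loss."""
    n_samples = len(raw_cn)
    gain, significant_amp, significant_del = [], [], []
    # zip(*matrix) walks the matrices column by column, i.e. SNP by SNP.
    for cn_column, p_column in zip(zip(*raw_cn), zip(*p_values)):
        pairs = list(zip(cn_column, p_column))
        gain.append(sum(cn > 0 for cn, _ in pairs) / n_samples)
        significant_amp.append(sum(cn > 0 and p < alpha for cn, p in pairs) / n_samples)
        significant_del.append(sum(cn < 0 and p < alpha for cn, p in pairs) / n_samples)
    return gain, significant_amp, significant_del


if __name__ == "__main__":
    raw_cn = [[0.5, -0.4], [0.3, -0.6], [-0.1, 0.2], [0.7, -0.5]]
    p_values = [[0.001, 0.2], [0.5, 0.005], [0.3, 0.4], [0.009, 0.001]]
    print(change_frequencies(raw_cn, p_values))
```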

  31. Population Level Segmentation Analysis (~400 pairs). Circular Binary Segmentation approach, Bioconductor package DNAcopy

  32. Segmentation of Chr. 14 (average result of ~400 pairs)

  33. Visualization of Concurrent Regions of Chr. 14 (~400 pairs). Axes: samples, positions

  34. Group-specific Analysis. Black: non-smokers, Red: non-smokers

  35. Separate Tumor Samples from Normal Samples Using Six Chromosomal Peaks with Significant CN Changes (Classification Based on Raw CN). Groups: Tumor, Normal

  36. Mapping Known Cancer-related Genes onto the Copy Number Map

  37. Software
     • Affymetrix Chips (www.affymetrix.com)
     • Illumina Chips (www.illumina.com)
     • CNAT (www.affymetrix.com); dChip (www.dchip.org); CNAG (www.genome.umin.jp)
     • GenePattern (www.broad.mit.edu/cancer/software/genepattern/)
     • BioConductor R Packages (www.bioconductor.org)
       - GLAD package, adaptive weights smoothing (AWS) method
       - DNAcopy package, circular binary segmentation method
     • Windows? Unix? Parallel Computation?

  38. References
     • R Gentleman et al. Bioinformatics and Computational Biology Solutions Using R and Bioconductor. Springer, 2005
     • JL Freeman et al. Genome Research 2006; 16:949-961
     • J Huang et al. Hum Genomics 2004; 1(4):287-99
     • X Zhao et al. Cancer Research 2004; 64:3060-3071
     • Y Nannya et al. Cancer Research 2005; 65:6071-6079
     • … see google …

  39. Acknowledgements
     • Aldi Kraja, Li Ding, Ingrid Borecki, John Osborne, Michael Province, Ken Chen
     • Division of Statistical Genomics; Medical Sequencing Group; Center for Genome Sciences; Washington University School of Medicine
