
Global Variation in Copy Number in the Human Genome

Global Variation in Copy Number in the Human Genome. Nature, Genome Research, Genome Research, 2006. Speaker: Yao-Ting Huang. References: Redon et al. Global variation in copy number in the human genome. Nature, 2006.

Presentation Transcript


  1. Global Variation in Copy Number in the Human Genome • Nature, Genome Research, Genome Research, 2006 • Speaker: Yao-Ting Huang

  2. References • Redon et al. Global variation in copy number in the human genome. Nature, 2006. • Fiegler et al. Accurate and reliable high-throughput detection of copy number variation in the human genome. Genome Research, 2006. • Komura et al. Genome-wide detection of human copy number variations using high-density DNA oligonucleotide arrays. Genome Research, 2006. • Komura et al. Noise reduction from genotyping microarrays using probe level information. In Silico Biology, 2006. • Price et al. SW-Array: a dynamic programming solution for the identification of copy-number changes in genomic DNA using array comparative genome hybridization data. NAR, 2005.

  3. Copy Number Variation (CNV) • A copy number variation (CNV) is a DNA segment at least 1 kb long that is present at a variable copy number compared with a reference genome. • CNVs are speculated to arise from non-allelic homologous recombination. • CNVs may disrupt genes, alter gene dosage, and confer risk for complex diseases (e.g., susceptibility to HIV-1 infection).
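The definition above can be written down as a tiny data structure plus a classifier. This is a minimal sketch: the `Segment` class, its half-open coordinate convention, and the `classify_segment` helper are hypothetical names for illustration, and the reference of 2 copies is an assumption (it fits autosomes, not every chromosome).

```python
from dataclasses import dataclass

# The 1 kb minimum comes from the slide; a diploid reference of 2 copies
# is an assumption of this sketch.
MIN_CNV_LENGTH_BP = 1_000
REFERENCE_COPIES = 2


@dataclass
class Segment:
    """A genomic segment with an estimated copy number (half-open interval)."""
    chrom: str
    start: int  # 0-based start, inclusive
    end: int    # end, exclusive
    copy_number: int


def classify_segment(seg: Segment, reference_copies: int = REFERENCE_COPIES) -> str:
    """Label a segment as a CNV gain or loss, or say why it is not a CNV."""
    if seg.end - seg.start < MIN_CNV_LENGTH_BP:
        return "too short"
    if seg.copy_number < reference_copies:
        return "loss"
    if seg.copy_number > reference_copies:
        return "gain"
    return "no change"


print(classify_segment(Segment("chr1", 10_000, 12_500, 1)))  # loss
print(classify_segment(Segment("chr1", 10_000, 10_400, 3)))  # too short
```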

  4. Examples of CNVs (1) [Figure: two families, each with paternal copy # = 2 and maternal copy # = 2 at a segment (… ATAATAC …); one offspring has copy # = 1 (deletion), the other has copy # = 3 (duplication).]

  5. Examples of CNVs (2) [Figure: paternal copy # = 3 and maternal copy # = 3; offspring with copy # = 2 and copy # = 4.] • It is hard to tell the actual type of a CNV (Mendelian inheritance, deletion, or duplication), even within a family.
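Why the CNV type is hard to tell within a family can be checked by enumeration. The sketch below assumes that each parent transmits one whole haplotype (no recombination or new mutation inside the CNV) and that a parent's total copy number may be split between its two haplotypes in any way; the function names are hypothetical.

```python
from itertools import product


def haplotype_splits(total):
    """All unordered ways to split a diploid copy number into two haplotype copy numbers."""
    return [(a, total - a) for a in range(total // 2 + 1)]


def offspring_copy_numbers(paternal, maternal):
    """Copy numbers an offspring can receive: one haplotype from each parent."""
    return sorted({p + m for p, m in product(paternal, maternal)})


def mendelian_explanations(paternal_total, maternal_total, offspring_total):
    """Parental haplotype configurations that explain the offspring's copy number
    by ordinary transmission, i.e. without a new deletion or duplication."""
    return [
        (pat, mat)
        for pat in haplotype_splits(paternal_total)
        for mat in haplotype_splits(maternal_total)
        if offspring_total in offspring_copy_numbers(pat, mat)
    ]


# Slide 4: parents 2 + 2, offspring 1. With two 1+1 parents this would be a
# new deletion, but a 0+2 parent makes it ordinary Mendelian inheritance.
print(mendelian_explanations(2, 2, 1))  # [((0, 2), (1, 1)), ((1, 1), (0, 2))]
# Slide 5: parents 3 + 3; offspring 2 and 4 both fit 1+2 x 1+2 inheritance.
print(offspring_copy_numbers((1, 2), (1, 2)))  # [2, 3, 4]
```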

  6. Use of Two Array Platforms • (1) Whole Genome TilePath array (covering 93.7% of euchromatin); (2) Affymetrix 500K SNP array.

  7. Results • A total of 1,447 CNVs were identified by merging the calls from the two arrays. • 913 CNVs were detected with the tiling array and 980 with the SNP genotyping array. • These CNVs cover 360 Mb (about 12%) of the human genome. • The mean CNV size is 341 kb on the tiling array and 206 kb on the SNP array. • The large-insert clones (~170 kb) on the tiling array tend to make CNVs appear larger than they are.
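The clone-size effect in the last bullet can be illustrated with a toy model. This sketch rests on a strong simplifying assumption: any clone that overlaps a CNV is called as altered, so the reported CNV is the union of the overlapping clones. The end-to-end tiling path and the `reported_span` helper are hypothetical; only the ~170 kb clone size comes from the slide.

```python
def reported_span(cnv_start, cnv_end, clone_starts, clone_len):
    """Span reported if every clone overlapping the CNV is called as altered.

    Simplifying assumption: a clone is flagged whenever it overlaps the CNV
    at all, so the call always extends to whole-clone boundaries.
    """
    hits = [s for s in clone_starts if s < cnv_end and s + clone_len > cnv_start]
    if not hits:
        return None
    return min(hits), max(hits) + clone_len


CLONE_LEN = 170_000  # approximate insert size quoted on the slide
clones = range(0, 1_000_000, CLONE_LEN)  # hypothetical end-to-end tiling path

for start, end in [(200_000, 250_000), (150_000, 200_000)]:
    lo, hi = reported_span(start, end, clones, CLONE_LEN)
    print(f"true {end - start:,} bp -> reported {hi - lo:,} bp")
# true 50,000 bp -> reported 170,000 bp
# true 50,000 bp -> reported 340,000 bp
```

A 50 kb CNV inside one clone is reported at clone size, and one straddling a clone boundary doubles again.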

  8. Strengths and Weaknesses of the Two Arrays • The 500K SNP array is better at detecting smaller CNVs. • The tiling array has more power than the SNP array in segmentally duplicated regions.

  9. Location of CNVs • CNVs are preferentially located outside of genes and ultra-conserved elements.

  10. Other Results • 48% of gaps in the human genome assembly are flanked or overlapped by CNVs. • 24% of the 1,447 CNVs are associated with segmental duplications. • Some segmental duplications are themselves CNVs and are thus not fixed in the human genome. • 12% of the 1,447 CNVs were validated by locus-specific quantitative assays (e.g., quantitative PCR).

  11. Linkage Disequilibrium between bi-allelic CNVs and Tag SNPs • Linkage disequilibrium between bi-allelic CNVs and flanking SNPs can guide the selection of tag SNPs; e.g., the copy number of CNV1 can be predicted by SNP2. • For such CNVs, a single SNP array is sufficient to detect both SNPs and CNVs.

  12. Linkage Disequilibrium between bi-allelic CNVs and Tag SNPs • Linkage disequilibrium between bi-allelic CNVs and flanking SNPs can guide the selection of tag SNPs. • E.g., suppose SNP2 is selected as the tag SNP. [Figure: SNP2 allele C shown with copy # = 3 and allele A with copy # = 2.]
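The tag-SNP idea on slides 11 and 12 can be sketched with the classical r² measure computed from phased haplotypes. Everything here is hypothetical: the haplotype data, the 0/1 coding, the helper names, and the r² ≥ 0.8 cutoff (a commonly used tagging threshold, not one stated on the slides).

```python
def r_squared(locus_a, locus_b):
    """Classical r^2 between two bi-allelic loci, from phased haplotypes coded 0/1."""
    n = len(locus_a)
    p_a = sum(locus_a) / n
    p_b = sum(locus_b) / n
    p_ab = sum(1 for a, b in zip(locus_a, locus_b) if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b  # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    return 0.0 if denom == 0 else d * d / denom


def pick_tag_snp(cnv, snps, threshold=0.8):
    """Return (name, r^2) of the best flanking SNP, or None if none reaches threshold."""
    name, r2 = max(((snp, r_squared(cnv, h)) for snp, h in snps.items()), key=lambda t: t[1])
    return (name, r2) if r2 >= threshold else None


# Hypothetical haplotypes: CNV coded 1 = duplicated haplotype, 0 = reference;
# SNP coded 1 = allele C, 0 = allele A.
cnv = [1, 1, 1, 0, 0, 0, 0, 0]
snps = {
    "SNP1": [1, 0, 1, 0, 1, 0, 1, 0],
    "SNP2": [1, 1, 1, 0, 0, 0, 0, 0],
    "SNP3": [1, 1, 0, 0, 0, 0, 0, 1],
}
print(pick_tag_snp(cnv, snps))  # ('SNP2', 1.0)
```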

  13. Results of Linkage Disequilibrium around bi-allelic CNVs • 51% of CNVs in non-African populations have tag SNPs, whereas only 22% of CNVs in the African population can be tagged. • Duplications would generate linkage disequilibrium at the acceptor locus rather than the donor locus. • Phase I of the HapMap project has a paucity of SNPs in segmentally duplicated regions, where CNVs are enriched. • Given the false-positive CNVs included and the uncertainty of CNV boundaries, these results are biased (Conrad et al., Nat. Genet., 2006).

  14. Linkage Disequilibrium around multi-allelic CNVs [Figure: SNP1 with allele C or A next to a CNV with copy # = 0, 1, 2, or 3.] • Linkage disequilibrium between a multi-allelic CNV and each flanking SNP is computed as the square of Pearson's correlation coefficient. • No SNPs in strong linkage disequilibrium were found. • Comparing a bi-allelic SNP with a multi-allelic CNV in this way is a mistake.
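Squared Pearson correlation measures only linear association, which is one way a bi-allelic SNP can fail to capture a multi-allelic CNV. In the hypothetical genotype data below, copy number is fully determined by the underlying haplotypes in both cases, yet r² is 1 in the first case and 0 in the second, because allele A sits on haplotypes carrying either 0 or 2 copies. This is an illustration only, not necessarily the specific flaw the speaker has in mind; the data and helper name are made up.

```python
def pearson_r2(x, y):
    """Squared Pearson correlation between two numeric vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return 0.0 if sxx == 0 or syy == 0 else sxy * sxy / (sxx * syy)


# SNP dosage = number of C alleles (0, 1, 2); CNV value = diploid copy number.
# Case 1 (hypothetical): C haplotypes carry 2 copies, A haplotypes carry 1.
dosage1 = [0, 1, 2, 1, 0, 2]
copies1 = [2 + g for g in dosage1]
# Case 2 (hypothetical): A haplotypes carry 0 or 2 copies, C haplotypes carry 1.
dosage2 = [0, 0, 0, 1, 1, 2]
copies2 = [2, 0, 4, 1, 3, 2]  # AA(0+2), AA(0+0), AA(2+2), AC(0+1), AC(2+1), CC(1+1)

print(round(pearson_r2(dosage1, copies1), 6))  # 1.0
print(round(pearson_r2(dosage2, copies2), 6))  # 0.0
```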

  15. Lunch Break - Method • Intensity preprocessing • CNV detection • Copy number inference

  16. Intensity Preprocessing • The signal intensity can be skewed by • the length of the restriction enzyme fragment, • the GC content of the probe sequence, • the GC content of the restriction fragment, or • affinity differences among SNP genotypes (e.g., AA, AC, CC). • Probe selection, noise reduction, and normalization are performed at this stage (Komura et al., In Silico Biology, 2006).
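As one illustration of the normalization step, the sketch below removes a GC-content bias by median-centering probes within GC bins. This is a generic technique chosen for illustration, not the procedure of Komura et al.; the function name, the 10 equal-width bins, and the toy data are all hypothetical.

```python
from collections import defaultdict
from statistics import median


def gc_bin_correct(log_intensities, gc_fractions, n_bins=10):
    """Remove GC-content bias by subtracting each GC bin's median log-intensity.

    Hypothetical correction: probes are grouped into equal-width GC bins and
    each probe is centered on the median of its bin.
    """
    bins = [min(int(gc * n_bins), n_bins - 1) for gc in gc_fractions]
    by_bin = defaultdict(list)
    for b, value in zip(bins, log_intensities):
        by_bin[b].append(value)
    bin_median = {b: median(values) for b, values in by_bin.items()}
    return [value - bin_median[b] for b, value in zip(bins, log_intensities)]


gc = [0.3, 0.3, 0.3, 0.7, 0.7, 0.7]
raw = [1.0, 1.1, 0.9, 2.0, 2.1, 1.9]  # GC-rich probes read about 1 unit brighter
print([round(v, 2) for v in gc_bin_correct(raw, gc)])  # [0.0, 0.1, -0.1, 0.0, 0.1, -0.1]
```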

  17. CNV Detection • For each pair of samples, we examine the log2 intensity ratio at each SNP position. [Figure: log2 intensity ratio (-2 to 2) vs. SNP position, with regions marked as relative gain of copy, no copy number change, and relative loss of copy.]
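The expected log2 ratios for each copy number follow from the ratio definition if one assumes, idealistically, that intensity is proportional to copy number and that the reference sample is diploid. Real array signals are noisy and need not follow this; the helper names are hypothetical.

```python
import math


def log2_ratios(test, reference):
    """Per-SNP log2 intensity ratio of a test sample against a reference sample."""
    return [math.log2(t / r) for t, r in zip(test, reference)]


def expected_log2_ratio(copy_number, reference_copies=2):
    """Idealized ratio if intensity were proportional to copy number (an assumption)."""
    if copy_number == 0:
        return float("-inf")  # no signal at all in this idealized model
    return math.log2(copy_number / reference_copies)


for cn in range(5):
    print(cn, f"{expected_log2_ratio(cn):.3f}")
# 0 -inf
# 1 -1.000
# 2 0.000
# 3 0.585
# 4 1.000
```

So a one-copy loss sits near -1 and a one-copy gain only near +0.585, which makes gains harder to separate from noise.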

  18. CNV Detection • A CNV is detected by finding a cluster of consecutive SNPs with sufficiently high (or low) ratios. [Figure: log2 intensity ratio vs. SNP position.]
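A naive way to find clusters of sufficiently high (or low) ratios is to report runs of consecutive SNPs beyond a cutoff. The sketch below is a hypothetical baseline (the cutoff of 0.3 and minimum run of 3 SNPs are arbitrary), run on the values visible on slide 19 with the elided stretches dropped. A single value below the cutoff (0.26 at index 3) breaks up the gain region, which is one motivation for scoring whole segments instead.

```python
def threshold_runs(ratios, cutoff=0.3, min_len=3):
    """Runs of >= min_len consecutive SNPs whose ratio is beyond +/-cutoff.

    Returns (start, end, kind) tuples with inclusive indices; kind is
    "gain" or "loss". A naive, hypothetical baseline.
    """
    def state(r):
        return 1 if r >= cutoff else -1 if r <= -cutoff else 0

    runs = []
    start, sign = 0, 0
    for i, r in enumerate(ratios):
        s = state(r)
        if s != sign:  # the current run ends just before position i
            if sign != 0 and i - start >= min_len:
                runs.append((start, i - 1, "gain" if sign > 0 else "loss"))
            start, sign = i, s
    if sign != 0 and len(ratios) - start >= min_len:  # run reaching the last SNP
        runs.append((start, len(ratios) - 1, "gain" if sign > 0 else "loss"))
    return runs


ratios = [0, 0.54, 1.21, 0.26, 2.34, 0, 0.1, -1.43, -0.2, -2.4, -2.6, -1.83]
print(threshold_runs(ratios))  # [(9, 11, 'loss')]
```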

  19. CNV Detection • The log2 intensity ratios at all SNPs, ordered by SNP position, can be regarded as a sequence of real numbers, e.g., 0, 0.54, 1.21, 0.26, 2.34, …, 0, 0.1, -1.43, -0.2, …, -2.4, -2.6, -1.83. • We seek a contiguous subsequence with the maximum sum.

  20. CNV Detection • A dynamic programming algorithm called SW-Array (NAR, 2005) is used to find this subsequence. • The linear-time algorithm for the maximum-sum subsequence problem was described by Bentley in 1984.
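The maximum-sum subsequence on slides 19 and 20 can be found in a single linear scan (the Bentley/Kadane algorithm). This sketch is not SW-Array itself: we assume, as a simplification, that a constant `threshold` is subtracted from every ratio so that near-zero noise scores negatively, and we omit SW-Array's significance testing. The threshold of 0.2 and the toy data (values visible on slide 19, elided stretches dropped) are hypothetical. Losses are found by negating the ratios.

```python
def max_scoring_segment(ratios, threshold):
    """Contiguous segment maximizing sum(ratio - threshold), via a Kadane-style scan.

    Returns (start, end, score) with inclusive indices.
    """
    best = (0, -1, float("-inf"))
    run_sum, run_start = 0.0, 0
    for i, r in enumerate(ratios):
        score = r - threshold
        if run_sum <= 0:  # a non-positive running sum can only hurt: restart here
            run_sum, run_start = score, i
        else:
            run_sum += score
        if run_sum > best[2]:
            best = (run_start, i, run_sum)
    return best


ratios = [0, 0.54, 1.21, 0.26, 2.34, 0, 0.1, -1.43, -0.2, -2.4, -2.6, -1.83]
gain = max_scoring_segment(ratios, threshold=0.2)
loss = max_scoring_segment([-r for r in ratios], threshold=0.2)
print(gain[:2], loss[:2])  # (1, 4) (7, 11)
```

Unlike a per-SNP cutoff, the dip to 0.26 at index 3 no longer splits the gain segment, because it is outweighed by its neighbors.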

  21. Copy Number Inference • Each such cluster implies a putative CNV. • However, we still don't know the exact copy number. [Figure: log2 intensity ratio vs. SNP position.]

  22. Pairwise Comparison for All Samples • The detection algorithm above is repeated for each pair of samples (e.g., sample a / sample b).

  23. Copy Number Inference • The largest group of samples sharing the same copy number is called the diploid group. • This diploid group serves as the reference representing two copies. • The authors assume that mutation events are rare, so two copies should be the most frequent state in the population.
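The diploid-group idea can be sketched as follows. The pairwise rule is a hypothetical stand-in for running CNV detection on each pair of samples: two samples are treated as having the same copy number when their log2 ratios (against one common reference) differ by less than `tol`, and groups are the connected components of that relation, found with union-find. The ratios are made up so that samples c, d, and e form the largest group, mirroring slide 25.

```python
from itertools import combinations


def diploid_group(log2_ratio, tol=0.3):
    """Largest group of samples linked by 'no copy-number difference' calls.

    Hypothetical pairwise rule: two samples are treated as equal when their
    log2 ratios differ by less than tol. Groups are connected components.
    """
    parent = {s: s for s in log2_ratio}

    def find(s):
        while parent[s] != s:
            parent[s] = parent[parent[s]]  # path halving
            s = parent[s]
        return s

    for a, b in combinations(log2_ratio, 2):  # every pair of samples
        if abs(log2_ratio[a] - log2_ratio[b]) < tol:
            parent[find(a)] = find(b)

    groups = {}
    for s in log2_ratio:
        groups.setdefault(find(s), []).append(s)
    return max(groups.values(), key=len)


# Hypothetical log2 ratios of samples a-e against one common reference sample.
ratios = {"a": -0.95, "b": 0.53, "c": 0.00, "d": 0.07, "e": -0.04}
print(diploid_group(ratios))  # ['c', 'd', 'e']
```

Connected components can chain samples that differ by more than `tol` end to end; a stricter variant would require every pair in the group to agree.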

  24. Steps of Copy Number Inference

  25. Copy Number Inference • Samples c, d, and e are the largest group.

  26. Copy Number Inference • The copy numbers of samples a and b are inferred by comparing their intensity ratios with the average ratio of the diploid group.
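Using hypothetical log2 ratios in which samples c, d, and e form the diploid group, each sample's copy number can be estimated from how far its ratio lies from the diploid group's mean ratio. This sketch assumes that intensity scales linearly with copy number, so a difference of x in log2 ratio means a factor of 2^x in copies; the `max_cn` cap and the helper name are hypothetical.

```python
from statistics import mean


def infer_copy_numbers(log2_ratio, diploid, max_cn=8):
    """Copy number of each sample relative to the diploid group's mean ratio.

    Assumes (hypothetically) that intensity scales linearly with copy number,
    so a shift of x in log2 ratio corresponds to a factor 2**x in copies.
    """
    baseline = mean(log2_ratio[s] for s in diploid)
    return {s: min(max_cn, round(2 * 2 ** (r - baseline))) for s, r in log2_ratio.items()}


ratios = {"a": -0.95, "b": 0.53, "c": 0.00, "d": 0.07, "e": -0.04}
print(infer_copy_numbers(ratios, ["c", "d", "e"]))
# {'a': 1, 'b': 3, 'c': 2, 'd': 2, 'e': 2}
```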

  27. Concluding Remarks • The authors identified 1,447 CNVs using whole-genome tiling and SNP genotyping arrays. • Given the low resolution of their arrays and the flaws in their methods, I believe JJ's results should be much more promising. • Linkage disequilibrium between CNVs and SNPs requires more sophisticated statistics and algorithms.
