Population Approaches to Detecting and Genotyping Copy Number Variation. Lachlan Coin July 2010. Outline. Population-haplotype approach to CNV detecting and genotyping Application to SNP and CGH data Application to NGS sequence data. cnvHap approach to CNV discovery and genotyping.
Population Approaches to Detecting and Genotyping Copy Number Variation Lachlan Coin July 2010
Outline
• Population-haplotype approach to CNV detection and genotyping
• Application to SNP and CGH data
• Application to NGS sequence data
cnvHap approach to CNV discovery and genotyping
Coin et al., Nature Methods 7, 541-546 (2010)
cnvHap models haploid CN transitions
• Specify a per-base global transition rate matrix
• The rate matrix is multiplied by a position-specific scalar rate
• Values are trained using EM, following the approach of Klosterman et al. used in Xrate for finding substitution rates
[Figure: rate matrix with axes "copy number from" and "copy number to", states 0 1 2 3 4 …, entries q00, q10, …]
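The bullets above can be illustrated with a small sketch. It assumes the scaled rate matrix is turned into transition probabilities by the usual continuous-time Markov chain matrix exponential over the distance between probes; the slide does not state this, and the function names, the nearest-neighbour rate structure and all numeric rates are hypothetical placeholders rather than cnvHap's trained values.

```python
# Sketch only: haploid copy-number transition probabilities from a rate matrix.
# Rates and the +/-1 step structure are illustrative assumptions.

def mat_mul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_exp(m, squarings=10, terms=12):
    """exp(m) by scaling and squaring with a truncated Taylor series."""
    n = len(m)
    scale = 2.0 ** squarings
    small = [[x / scale for x in row] for row in m]
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms + 1):
        term = mat_mul(term, small)
        term = [[x / k for x in row] for row in term]
        result = [[result[i][j] + term[i][j] for j in range(n)]
                  for i in range(n)]
    for _ in range(squarings):
        result = mat_mul(result, result)
    return result

def make_rate_matrix(n_states, step_rate):
    """Toy global rate matrix: only +/-1 copy changes; each row sums to zero."""
    q = [[0.0] * n_states for _ in range(n_states)]
    for i in range(n_states):
        for j in (i - 1, i + 1):
            if 0 <= j < n_states:
                q[i][j] = step_rate
        q[i][i] = -sum(q[i])  # diagonal is still 0 here, so this sums off-diagonals
    return q

def transition_probs(q, site_rate, distance_bp):
    """P = exp(Q * site_rate * distance); rows are 'from', columns are 'to'."""
    scaled = [[x * site_rate * distance_bp for x in row] for row in q]
    return mat_exp(scaled)

Q = make_rate_matrix(5, 1e-6)                 # copy-number states 0..4
P = transition_probs(Q, site_rate=2.0, distance_bp=1000)
print([round(p, 6) for p in P[2]])            # transitions out of copy number 2
```

A larger position-specific rate makes a copy-number change between adjacent probes more likely, which is one way a model of this kind could concentrate breakpoints at particular positions.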
Cluster positions are modelled using a linear model. The model is fitted using ridge regression, carried out at each iteration of the EM algorithm.
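As a rough illustration of a ridge-regression step inside EM, the sketch below fits a straight line relating copy number to cluster centre, weighting each observation by a posterior weight from the E-step. The single-predictor form, the unpenalised intercept, the function name and the toy numbers are all assumptions made for illustration; the slide does not specify the covariates cnvHap uses.

```python
# Sketch only: weighted ridge fit of cluster centre = intercept + slope * copy_number.
# The penalty applies to the slope; the intercept is left unpenalised (an assumption).

def weighted_ridge_line(x, y, w, lam):
    total = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / total
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / total
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
    slope = sxy / (sxx + lam)      # lam > 0 shrinks the slope towards zero
    intercept = y_bar - slope * x_bar
    return intercept, slope

# Toy M-step input: copy number, observed intensity, E-step posterior weight.
copy_number = [0, 1, 2]
intensity = [1.0, 3.0, 5.0]
posterior = [1.0, 1.0, 1.0]

print(weighted_ridge_line(copy_number, intensity, posterior, lam=0.0))  # (1.0, 2.0)
print(weighted_ridge_line(copy_number, intensity, posterior, lam=2.0))  # (2.0, 1.0)
```

The penalty keeps the fitted cluster positions stable when few samples carry a given copy number, which is the usual motivation for ridge over ordinary least squares in this setting.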
Combined Illumina and Agilent arrays
[Figure: three pairs of panels labelled Illumina and Agilent]
Improved CNV genotyping accuracy
[Figure: cumulative frequency of squared Pearson correlation]
A deletion at 16p11.2 in a patient with ‘extreme obesity’
• estimated by aCGH to be 546 kb-700 kb
• flanked by segmental duplication (>99% sequence identity)
• probably arises by NAHR, implying the deletion is 739 kb
• BMI = 29.2 kg/m² at age 7½
• learning difficulties, delayed speech
[Figure: log2 ratio (+1 to -3) along chromosome 16 from 28.9 Mb to 30.7 Mb, with tracks for MLPA probes and segmental duplication]
RG Walters et al. Nature 463, 671-675 (2010) doi:10.1038/nature08727
16p11.2 deletions in obesity and population cohorts

Cohort                                          Obese     Lean/Normal Weight
French child obesity case:control               4/643     0/530
British extreme early-onset obesity (SCOOP)     3/931     -
French adult obesity case:control               4/705     0/669
French bariatric surgery patients               2/141     -
Swedish discordant siblings                     2/159     0/140
Population cohorts (NFBC1966, CoLaus, EGPUT)    3/1592    1/6235

Obesity: P = 5.8 × 10⁻⁷, OR = 29.8 [3.9–225]
Morbid obesity: P = 6.4 × 10⁻⁸, OR = 43.0 [5.6–329]
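For readers unfamiliar with how an odds ratio and an exact P value arise from counts like these, here is a naive worked sketch that simply pools every row of the table into one 2×2 table. This pooling is an assumption made purely for illustration; the statistics quoted on the slide come from the published analysis, and this calculation is not expected to reproduce them. Function names are hypothetical.

```python
# Sketch only: crude pooled odds ratio and one-sided Fisher exact test.
from math import comb

def odds_ratio(a, b, c, d):
    """a, b = carriers / non-carriers among cases; c, d = same among controls."""
    return (a * d) / (b * c)

def fisher_one_sided(a, b, c, d):
    """P(at least `a` carriers among cases) under the hypergeometric null."""
    cases, controls, carriers = a + b, c + d, a + c
    denom = comb(cases + controls, carriers)
    upper = min(carriers, cases)
    return sum(comb(cases, k) * comb(controls, carriers - k)
               for k in range(a, upper + 1)) / denom

# Pooled counts read off the table (deletions / individuals).
obese = [(4, 643), (3, 931), (4, 705), (2, 141), (2, 159), (3, 1592)]
lean = [(0, 530), (0, 669), (0, 140), (1, 6235)]

a = sum(k for k, _ in obese)            # carriers among obese
b = sum(n for _, n in obese) - a        # non-carriers among obese
c = sum(k for k, _ in lean)             # carriers among lean/normal
d = sum(n for _, n in lean) - c         # non-carriers among lean/normal

print(a, b, c, d)
print(odds_ratio(a, b, c, d), fisher_one_sided(a, b, c, d))
```

Pooling ignores differences between cohorts, so a stratified analysis would normally be preferred for data of this shape.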
Loess curves are fitted to remove residual spatial variation in coverage
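A minimal sketch of this kind of correction follows, assuming a local linear loess with tricube weights fitted to depth against genomic position, with the fitted trend then subtracted. The span fraction, the function names and the choice to re-centre on the mean depth are illustrative assumptions, not details from the talk.

```python
# Sketch only: local linear loess smoother and trend removal for coverage.

def tricube(u):
    return (1.0 - u ** 3) ** 3 if u < 1.0 else 0.0

def loess(x, y, frac=0.3):
    """Fitted value at each x from a weighted linear fit to its nearest neighbours."""
    n = len(x)
    k = min(n, max(3, int(round(frac * n))))
    fitted = []
    for x0 in x:
        nearest = sorted(range(n), key=lambda i: abs(x[i] - x0))[:k]
        d_max = max(abs(x[i] - x0) for i in nearest) or 1.0
        w = [tricube(abs(x[i] - x0) / d_max) for i in nearest]
        sw = sum(w)
        xb = sum(wi * x[i] for wi, i in zip(w, nearest)) / sw
        yb = sum(wi * y[i] for wi, i in zip(w, nearest)) / sw
        sxx = sum(wi * (x[i] - xb) ** 2 for wi, i in zip(w, nearest))
        sxy = sum(wi * (x[i] - xb) * (y[i] - yb) for wi, i in zip(w, nearest))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(yb + slope * (x0 - xb))
    return fitted

def remove_trend(x, depth, frac=0.3):
    """Subtract the loess trend and re-centre on the mean depth."""
    trend = loess(x, depth, frac)
    mean_depth = sum(depth) / len(depth)
    return [d - t + mean_depth for d, t in zip(depth, trend)]

positions = [float(i) for i in range(20)]
depth = [30.0 + 0.5 * p for p in positions]   # toy coverage with a smooth drift
print(remove_trend(positions, depth))
```

In practice the span has to be wide relative to the CNVs of interest; otherwise the smoother absorbs the copy-number signal along with the spatial artefact.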
Detecting CNVs with NGS data
[Figure panels: depth/haploid coverage; B-allele frequency]
NGS versus CGH data
[Figure panels: NGS data, chrom1:350mb-351mb; CGH data, chrom1:350mb-351mb]
NGS amplification
[Figure: depth/coverage]
Polyploid phasing and imputation
[Figure panels: switch error rate; imputation error rate]
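The two metrics named on this slide can be sketched as follows. The version shown is the common diploid definition of switch error; how the talk generalises it to ploidy > 2 is not stated, so treat this as background rather than the method used. Function names and the toy haplotypes are hypothetical.

```python
# Sketch only: diploid switch error rate and a simple imputation error rate.

def switch_error_rate(true_h1, true_h2, inf_h1, inf_h2):
    """Fraction of consecutive heterozygous site pairs whose phase flips."""
    orientation = []
    for t1, t2, i1, i2 in zip(true_h1, true_h2, inf_h1, inf_h2):
        if t1 == t2:
            continue                    # homozygous sites carry no phase
        if (i1, i2) == (t1, t2):
            orientation.append(0)
        elif (i1, i2) == (t2, t1):
            orientation.append(1)
        # sites with a genotype error are skipped
    if len(orientation) < 2:
        return 0.0
    switches = sum(1 for p, q in zip(orientation, orientation[1:]) if p != q)
    return switches / (len(orientation) - 1)

def imputation_error_rate(true_genotypes, imputed_genotypes):
    """Fraction of masked genotypes that were imputed incorrectly."""
    wrong = sum(1 for t, g in zip(true_genotypes, imputed_genotypes) if t != g)
    return wrong / len(true_genotypes)

true_h1, true_h2 = [0, 1, 0, 1, 0], [1, 0, 1, 0, 1]
inf_h1, inf_h2 = [0, 1, 1, 0, 1], [1, 0, 0, 1, 0]   # one switch after site 2
print(switch_error_rate(true_h1, true_h2, inf_h1, inf_h2))   # 0.25
print(imputation_error_rate([0, 1, 2, 1], [0, 1, 1, 1]))     # 0.25
```

With more than two haplotypes per individual there are more ways for the inferred haplotypes to be permuted between sites, which is consistent with phasing becoming harder as ploidy increases.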
Conclusions
• Population-haplotype model enables joint CNV discovery and genotyping using array data
• Preliminary results indicate this will also help with NGS data
• Combining information from multiple platforms improves sensitivity
• Imputation still works for ploidy > 2, but phasing becomes more difficult
Acknowledgements
Evangelos Bellos, Shu-Yi Su, Robin Walters, David Balding (UCL), Rob Sladek (McGill), Julian Asher, Alex Blakemore, Adam de Smith, Philippe Froguel, Julia El-Sayed Moustafa