On genome-wide association studies (GWAS)
association • linkage disequilibrium • population structure
case/control design • single nucleotide polymorphism data
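SNP illustrated on two chromosome copies (both strands shown; the variant base pair is set apart by spaces):
Chromosome 1: TTCAGTCAGATCC T AGCCC
              AAGTCAGTCTAGG A TCGGG
Chromosome 2: TTCAGTCAGATCC C AGCCC
              AAGTCAGTCTAGG G TCGGG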
Population structure explained part of the significant +11.2% inflation of test statistics we observed in an analysis of 6,322 nonsynonymous SNPs in 816 cases of type 1 diabetes and 877 population-based controls from Great Britain. The remainder of the inflation resulted from differential bias in genotype scoring between case and control DNA samples, which originated from two laboratories, causing false-positive associations.
David G Clayton, Neil M Walker, Deborah J Smyth, Rebecca Pask, Jason D Cooper, Lisa M Maier, Luc J Smink, Alex C Lam, Nigel R Ovington, Helen E Stevens, Sarah Nutland, Joanna M M Howson, Malek Faham, Martin Moorhead, Hywel B Jones, Matthew Falkowski, Paul Hardenbol, Thomas D Willis & John A Todd. Population structure, differential bias and genomic control in a large-scale, case-control association study. Nature Genetics 37, 1243-1246 (2005). Published online 9 October 2005. doi:10.1038/ng1653
Genomic Control (Devlin and Roeder)
• premise: population structure inflates the variance of the test statistic under the null
• ideally, Y_i^2 ~ chi-square(1)
• under structure, Y_i^2 ~ lambda * chi-square(1), where lambda is the inflation factor
• so use T_i = Y_i^2 / lambda.hat
• lambda.hat = median(Y_i^2) / [median of chi-square(1) under the null, about 0.455]
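The estimator on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the published implementation: the function names, the simulated data, and the option to floor lambda.hat at 1 (a common convention, assumed here) are not from the slides.

```python
import random
import statistics

# Median of a chi-square distribution with 1 degree of freedom (about 0.455).
CHI2_1DF_MEDIAN = 0.4549364


def genomic_control_lambda(chi2_stats):
    """Estimate the inflation factor as median(Y_i^2) / null median."""
    return statistics.median(chi2_stats) / CHI2_1DF_MEDIAN


def genomic_control_adjust(chi2_stats, floor_at_one=True):
    """Return (lambda_hat, adjusted statistics T_i = Y_i^2 / lambda_hat)."""
    lam = genomic_control_lambda(chi2_stats)
    # Assumption: a lambda_hat below 1 is truncated to 1, so the
    # correction never makes the statistics larger.
    divisor = max(lam, 1.0) if floor_at_one else lam
    return lam, [y2 / divisor for y2 in chi2_stats]


def simulate_inflated_stats(n_snps, true_lambda, seed=0):
    """Hypothetical null statistics: true_lambda times a squared N(0, 1) draw."""
    rng = random.Random(seed)
    return [true_lambda * rng.gauss(0.0, 1.0) ** 2 for _ in range(n_snps)]


if __name__ == "__main__":
    stats = simulate_inflated_stats(200_000, true_lambda=1.11, seed=1)
    lam, adjusted = genomic_control_adjust(stats)
    print("lambda.hat before adjustment:", round(lam, 3))
    print("lambda.hat after adjustment: ",
          round(genomic_control_lambda(adjusted), 3))
```

The median is used rather than the mean because a handful of truly associated SNPs with very large statistics barely move it.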
Handling population structure
• genomic control (Devlin & Roeder)
• structured association (Pritchard et al.)
• principal components (Price et al.)
The Wellcome Trust Case Control Consortium. Genome-wide association study of 14,000 cases of seven common diseases and 3,000 shared controls. Nature 447, 661-678 (7 June 2007). doi:10.1038/nature05911. Received 26 March 2007; accepted 11 May 2007.
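• UK population; European ancestry
• seven diseases (BD, CAD, CD, HT, RA, T1D, T2D: bipolar disorder, coronary artery disease, Crohn's disease, hypertension, rheumatoid arthritis, type 1 diabetes, type 2 diabetes); 50 research groups
• 2,000 cases per disease
• 3,000 common controls (two distinct sets)
• Affymetrix 500K mapping array set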
Quality Control
• 16,179 samples included (809 dropped after checks for contamination and non-Caucasian ancestry)
• 469,557 SNPs included (93.8%)
• average call rate 99.63%
• 392,575 SNPs have MAF > 1%
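The two per-SNP quantities on this slide, call rate and minor allele frequency (MAF), can be computed as in the sketch below. The genotype coding (0, 1 or 2 copies of one allele, `None` for a failed call), the function names, and the `min_call_rate` threshold are assumptions for illustration; the slides give only the 1% MAF cut-off and do not state the study's actual exclusion rules.

```python
def call_rate(genotypes):
    """Fraction of samples with a non-missing genotype call."""
    called = sum(g is not None for g in genotypes)
    return called / len(genotypes)


def minor_allele_frequency(genotypes):
    """MAF from genotypes coded as 0, 1 or 2 copies of one allele."""
    called = [g for g in genotypes if g is not None]
    if not called:
        return 0.0
    freq = sum(called) / (2 * len(called))  # two alleles per sample
    return min(freq, 1.0 - freq)


def passes_qc(genotypes, min_call_rate=0.95, min_maf=0.01):
    """Hypothetical filter: min_call_rate is an illustrative value only."""
    return (call_rate(genotypes) >= min_call_rate
            and minor_allele_frequency(genotypes) > min_maf)


if __name__ == "__main__":
    snp = [0] * 95 + [1] * 3 + [None] * 2
    print(call_rate(snp), minor_allele_frequency(snp), passes_qc(snp))
```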
There may be important population structure that is not well captured by current geographical region of residence. Present implementations of strongly model-based approaches such as STRUCTURE (refs 11, 12) are impracticable for data sets of this size, and we reverted to the classical method of principal components (refs 13, 14), using a subset of 197,175 SNPs chosen to reduce inter-locus linkage disequilibrium. Nevertheless, four of the first six principal components clearly picked up effects attributable to local linkage disequilibrium rather than genome-wide structure. The remaining two components show the same predominant geographical trend from NW to SE but, perhaps unsurprisingly, London is set somewhat apart.
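To make the principal components step concrete, here is a small pure-Python sketch that extracts the first principal component of a genotype matrix by power iteration. It is not the consortium's pipeline: the scaling of each SNP by sqrt(p(1-p)), the helper names, and the simulated two-group data are all assumptions chosen for illustration, and a real analysis would use a numerical library and an LD-pruned SNP subset as the passage describes.

```python
import random


def standardize_columns(genotypes):
    """Center each SNP at 2p and scale by sqrt(p(1-p)), p = allele frequency."""
    n = len(genotypes)
    cols = []
    for j in range(len(genotypes[0])):
        col = [row[j] for row in genotypes]
        p = sum(col) / (2 * n)
        if p <= 0.0 or p >= 1.0:
            continue  # a monomorphic SNP carries no information
        scale = (p * (1.0 - p)) ** 0.5
        cols.append([(g - 2.0 * p) / scale for g in col])
    return [list(row) for row in zip(*cols)]


def first_principal_component(genotypes, n_iter=50, seed=0):
    """Leading eigenvector of X X^T (samples by samples) via power iteration."""
    x = standardize_columns(genotypes)
    n, m = len(x), len(x[0])
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in range(n)]
    for _ in range(n_iter):
        w = [sum(x[i][j] * v[i] for i in range(n)) for j in range(m)]  # X^T v
        v = [sum(x[i][j] * w[j] for j in range(m)) for i in range(n)]  # X w
        norm = sum(c * c for c in v) ** 0.5
        v = [c / norm for c in v]
    return v


def simulate_two_groups(n_per_group=20, n_snps=200,
                        freq_a=0.2, freq_b=0.8, seed=1):
    """Hypothetical data: two groups with different allele frequencies."""
    rng = random.Random(seed)

    def draw(p):
        return [sum(rng.random() < p for _ in range(2)) for _ in range(n_snps)]

    return ([draw(freq_a) for _ in range(n_per_group)]
            + [draw(freq_b) for _ in range(n_per_group)])


if __name__ == "__main__":
    pc1 = first_principal_component(simulate_two_groups())
    print("group A scores:", [round(s, 2) for s in pc1[:5]])
    print("group B scores:", [round(s, 2) for s in pc1[20:25]])
```

In this toy setting the two simulated groups receive scores of opposite sign on the first component, which is the kind of ancestry axis that can then be entered as a covariate in the association test.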
The overall effect of population structure on our association results seems to be small, once recent migrants from outside Europe are excluded. Estimates of over-dispersion of the association trend test statistics (usually denoted λ; ref. 15) ranged from 1.03 and 1.05 for RA and T1D, respectively, to 1.08–1.11 for the remaining diseases. Some of this over-dispersion could be due to factors other than structure, and this possibility is supported by the fact that inclusion of the two ancestry informative principal components as covariates in the association tests reduced the over-dispersion estimates only slightly (Supplementary Table 6), as did stratification by geographical region. This impression is confirmed on noting that P values with and without correction for structure are similar (Supplementary Fig. 9). We conclude that, for most of the genome, population structure has at most a small confounding effect in our study, and as a consequence the analyses reported below do not correct for structure. In principle, apparent associations in the few genomic regions identified in Table 1 as showing strong geographical differentiation should be interpreted with caution, but none arose in our analyses.
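The per-SNP statistic whose over-dispersion is being estimated here is a 1-degree-of-freedom trend test. The sketch below assumes this is the Cochran-Armitage trend test with additive weights (0, 1, 2); the function name and the example counts are hypothetical, not taken from the study.

```python
def armitage_trend_chi2(case_counts, control_counts, weights=(0, 1, 2)):
    """Cochran-Armitage trend chi-square (1 df) from genotype counts.

    case_counts and control_counts hold the number of samples carrying
    0, 1 and 2 copies of the allele being tested.
    """
    n = [r + s for r, s in zip(case_counts, control_counts)]
    total_cases = sum(case_counts)
    total_controls = sum(control_counts)
    total = total_cases + total_controls

    # T = sum_i t_i * (r_i * S - s_i * R)
    t_stat = sum(
        t * (r * total_controls - s * total_cases)
        for t, r, s in zip(weights, case_counts, control_counts)
    )
    # Var(T) = (R * S / N) * [N * sum(t_i^2 n_i) - (sum(t_i n_i))^2]
    sum_t2n = sum(t * t * ni for t, ni in zip(weights, n))
    sum_tn = sum(t * ni for t, ni in zip(weights, n))
    variance = (total_cases * total_controls / total
                * (total * sum_t2n - sum_tn ** 2))
    return t_stat ** 2 / variance


if __name__ == "__main__":
    # Hypothetical counts for genotypes with 0, 1 and 2 copies of the allele.
    print(armitage_trend_chi2((30, 50, 20), (50, 40, 10)))
```

Computing this statistic for every SNP and taking the ratio of its median to the chi-square(1) null median gives the over-dispersion estimate λ quoted in the passage.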