Topic #3: Linkage Disequilibrium, Haplotypes & Tagging
University of Wisconsin Genetic Analysis Workshop, June 2011
Overview
• Fate of a new mutation
• Linkage Disequilibrium (LD)
• Measurement
• Indirect association
• SNP selection based on LD
• Haplotypes
• SNP selection by tagging
• Practical: SNP selection using Haploview
Introduction of a Mutation into a Population
[Figure: chromosomes drawn as strings of alleles coded 1 or 2, shown over successive generations along a TIME axis]
Haplotype Concept
• The sequence 111212 at this location becomes a signature for the chromosome carrying the mutation
• Haplotype: the alleles inherited together at linked loci on the same chromosome
• The 111212 haplotype will not be a perfect marker of disease, because of:
  • Other chromosomes that may already have carried 111212 at the time the mutation arose
  • New mutations
  • Recombination
Indirect Association
• Each of the alleles in the 111212 haplotype is also expected to be indirectly associated with carrying the mutation
• Indirect association is an association of a marker with phenotype that is non-causal, being based on linkage disequilibrium (LD)
Linkage Disequilibrium (LD)
• Mendel's Second Law: alleles at different loci assort independently
• Linkage Disequilibrium (LD): population-level association of alleles at linked loci
How LD is Measured
• A locus: alleles A1 or A2
• B locus: alleles B1 or B2
• LD: population-level association between linked loci
• Let P(A1) = p_A1, P(B1) = p_B1, and P(A1B1) = p_A1B1
• D = p_A1B1 - p_A1 p_B1, which equals 0 if the two loci are independent
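The definition of D above can be checked with a few lines of Python. This is a minimal sketch: the haplotype counts are made-up illustrative numbers, and the function name `ld_coefficient` is hypothetical, not part of Haploview or any other package.

```python
def ld_coefficient(p_a1b1, p_a1, p_b1):
    """D = p_A1B1 - p_A1 * p_B1 (zero when the two loci are independent)."""
    return p_a1b1 - p_a1 * p_b1


# Hypothetical haplotype counts in a sample of 100 chromosomes.
counts = {"A1B1": 40, "A1B2": 10, "A2B1": 10, "A2B2": 40}
total = sum(counts.values())

p_a1 = (counts["A1B1"] + counts["A1B2"]) / total   # frequency of allele A1
p_b1 = (counts["A1B1"] + counts["A2B1"]) / total   # frequency of allele B1
p_a1b1 = counts["A1B1"] / total                    # frequency of haplotype A1B1

d = ld_coefficient(p_a1b1, p_a1, p_b1)
print(round(d, 4))  # 0.15
```

Under independence the A1B1 haplotype would have frequency 0.5 × 0.5 = 0.25; the observed 0.40 exceeds that by D = 0.15.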
Common LD Measures
• |D|
  • Preferred measure for population geneticists
  • Maximum value is bounded by the marginal allele frequencies
• D' = |D| / Dmax
  • D' varies between 0 and 1
  • Does not have an easy interpretation; 1.0 is achieved whenever one cell of the 2×2 haplotype table (e.g., an off-diagonal) is zero
• r² (also written Δ²) = D² / [p_A1 (1 - p_A1) p_B1 (1 - p_B1)]
  • Has several interpretations:
    • It is the squared (phi) correlation, so it lies in [0, 1]
    • It equals χ²/N
  • Directly related to power for indirect association
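The three measures can be computed side by side from the four haplotype counts. The sketch below uses the standard definition of Dmax (the largest |D| attainable given the allele frequencies), which the slide does not spell out; the function name `ld_measures` and the counts are illustrative assumptions.

```python
def ld_measures(n11, n12, n21, n22):
    """Return (D, D', r2) from counts of haplotypes A1B1, A1B2, A2B1, A2B2."""
    n = n11 + n12 + n21 + n22
    if n == 0:
        raise ValueError("no haplotypes observed")
    p_a1 = (n11 + n12) / n
    p_b1 = (n11 + n21) / n
    if not (0 < p_a1 < 1 and 0 < p_b1 < 1):
        raise ValueError("both loci must be polymorphic")

    d = n11 / n - p_a1 * p_b1
    # Largest |D| the marginals allow, which depends on the sign of D.
    if d >= 0:
        d_max = min(p_a1 * (1 - p_b1), (1 - p_a1) * p_b1)
    else:
        d_max = min(p_a1 * p_b1, (1 - p_a1) * (1 - p_b1))
    d_prime = abs(d) / d_max
    r2 = d * d / (p_a1 * (1 - p_a1) * p_b1 * (1 - p_b1))
    return d, d_prime, r2


# Moderate LD: all four haplotypes are present.
print(ld_measures(40, 10, 10, 40))   # D ≈ 0.15, D' ≈ 0.6, r2 ≈ 0.36

# One off-diagonal cell is zero: D' reaches 1.0 while r2 stays near 0.33.
print(ld_measures(50, 0, 25, 25))
```

The second call illustrates why D' = 1 is hard to interpret: it only says that one haplotype is absent, whereas r² still reports that one locus predicts the other imperfectly.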
Allelic Association
• Direct Association
  • Initially it was thought that we could pick the genes, and the (single) genetic variant within each gene, that were relevant for disease
• Indirect Association
  • The existence of LD opens up the possibility of testing by indirect association: we do not need to test the causal variant itself, only to genotype a marker that is in high LD with it
Indirect and Direct Allelic Association
[Figure: two panels. Direct Association shows the D locus alone; Indirect Association shows the D locus flanked by markers M1, M2 and M3]
• Direct association: assess the relationship of the D locus to phenotype directly; expect D to be a functional polymorphism in a candidate gene
• Indirect association: assess the relationship of the D locus indirectly by determining whether markers (Mi) are associated with disease; the Mi do not need to be functional
Martin, E.R. et al. (2000). SNPing away at complex disease … AJHG 67: 383-394
Dawson, E. et al. (2002). A first-generation LD map of 22. Nature 418: 544-547
Population Differences Weiss, K.M & Clark, A.G. (2002). Trends in Genetics, 18(1):19-24.
Recombination Hotspots
• Hotspots typically span 1-2 kb
Kauppi, L., Jeffreys, A. J., & Keeney, S. (2004). Where the crossovers are: Recombination distributions in mammals. Nature Reviews Genetics, 5, 413-424
Two- and Three-locus Haplotypes
• APOE locus and haplotypes containing APOE
Martin, E.R. et al. (2000). SNPing away at complex disease … AJHG 67: 383-394
Two- and Three-locus Haplotypes
• The 3-locus haplotype gives a stronger signal than the individual markers
Martin, E.R. et al. (2000). SNPing away at complex disease … AJHG 67: 383-394
SNP Selection by Tagging
• Basic rationale:
  • The power to detect a causal SNP genotyped directly in a sample of size N is equivalent to the power to detect it through a tagging SNP in a sample of size N/r², where r² is measured between the tagging SNP and the causal SNP
• Tagging SNP selection:
  • Based on some reference sample (HapMap)
  • Two overarching strategies
    • Pairwise tagging
    • Multimarker tagging
de Bakker, P. I. W., et al. (2005). Efficiency and power in genetic association studies. Nature Genetics, 37(11), 1217-1223.
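The N/r² rule translates directly into a sample-size inflation factor. The following sketch only applies that arithmetic; the function name `tag_sample_size` and the example numbers are illustrative assumptions, and it is not a full power calculation.

```python
import math


def tag_sample_size(n_causal, r2):
    """Sample size needed when testing a tag SNP instead of the causal SNP.

    n_causal: sample size that gives the desired power at the causal SNP.
    r2: LD (r squared) between the tag SNP and the causal SNP.
    """
    if not 0 < r2 <= 1:
        raise ValueError("r2 must be in (0, 1]")
    # The small offset guards against floating-point noise before rounding up.
    return math.ceil(n_causal / r2 - 1e-9)


for r2 in (1.0, 0.8, 0.5):
    print(r2, tag_sample_size(1000, r2))
# 1.0 1000
# 0.8 1250
# 0.5 2000
```

At r² = 0.8 the study needs 25% more subjects than it would if the causal SNP were typed directly.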
Reference Sample: HapMap (www.hapmap.org)
• HapMap Phase 1:
  • SNP selection strategy (yield ~1 million):
    • >1 common SNP every 5 kb, total of 1.3 million before QC
    • MAF > .05
    • Some priority for non-synonymous cSNPs
  • Sample: N = 270 (269) individuals from 4 populations
    • 30 trios of Europeans from Utah (CEU)
    • 45 unrelated Han Chinese (CHB)
    • 45 unrelated Japanese (JPT)
    • 30 Yoruban trios from Nigeria (YRI)
Reference Sample: HapMap (www.hapmap.org)
• Phase 2:
  • 2.1 million additional SNPs
  • Total now averages ~1 SNP per kb; >98% of common variants within 5 kb
  • Focus still on MAF > .05
  • Average max r² of untyped common SNPs to a typed SNP
Reference Sample: HapMap (www.hapmap.org)
• Phase 3:
  • Expand to N = 1115 in 11 ancestral groups
  • 2.1 million additional SNPs
* Sample consists of family triples
[Screenshot: HapMap download settings for COMT. Callouts: Phase, Release and Build (HapMap 3, Release 2); Region in NCBI B36]
Using Haploview to Identify Tagging SNPs for COMT
• Download data from HapMap
  • Choose HapMap Download, Phase 3, and Release 2
  • Choose population
  • Choose chromosome (22) and region (NCBI B36/hg18; positions in kb)
  • Transcription starts at 18309; I will start at 18304
  • Transcription ends at 18337; I will end at 18340
• Haploview analysis
  • Get LD plot
  • Run Tagger (pairwise)
  • Force include/exclude
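To make the idea of pairwise tagging with forced inclusion and exclusion concrete, here is a simplified greedy sketch. It is not Haploview's Tagger implementation: the function name, the SNP IDs, and the r² values are all hypothetical, and the r² threshold of 0.8 is simply a commonly used default.

```python
def greedy_pairwise_tags(snps, r2, threshold=0.8,
                         force_include=(), force_exclude=()):
    """Greedy pairwise tagging sketch.

    snps: list of SNP IDs.
    r2: dict mapping frozenset({snp_a, snp_b}) -> pairwise r squared.
    Returns (tags, untagged): chosen tag SNPs and SNPs no tag captures.
    """
    def captures(tag, snp):
        # A SNP captures itself, or any SNP it is in high enough LD with.
        return tag == snp or r2.get(frozenset((tag, snp)), 0.0) >= threshold

    tags = list(force_include)
    uncovered = {s for s in snps if not any(captures(t, s) for t in tags)}
    candidates = [s for s in snps
                  if s not in force_exclude and s not in tags]

    while uncovered and candidates:
        # Pick the candidate that captures the most still-uncovered SNPs.
        best = max(candidates,
                   key=lambda c: sum(captures(c, s) for s in uncovered))
        gained = {s for s in uncovered if captures(best, s)}
        if not gained:
            break  # nothing left can be captured (e.g. excluded singletons)
        tags.append(best)
        candidates.remove(best)
        uncovered -= gained
    return tags, sorted(uncovered)


# Hypothetical SNPs and pairwise r2 values (unlisted pairs count as 0).
snps = ["rs1", "rs2", "rs3", "rs4", "rs5"]
r2 = {
    frozenset(("rs1", "rs2")): 0.95,
    frozenset(("rs2", "rs3")): 0.85,
    frozenset(("rs1", "rs3")): 0.70,
    frozenset(("rs4", "rs5")): 0.60,
}

print(greedy_pairwise_tags(snps, r2))
# (['rs2', 'rs4', 'rs5'], [])
print(greedy_pairwise_tags(snps, r2, force_exclude=("rs5",)))
# (['rs2', 'rs4'], ['rs5'])
```

Here rs2 alone tags rs1 and rs3, while rs4 and rs5 are only in moderate LD with each other and so each must be typed; excluding rs5 (for example, because its assay fails design) leaves it untagged.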
LD Plot Available from SNPInfo (http://manticore.niehs.nih.gov/)
Conclusions
• Alleles at linked loci tend to be inherited together; the resulting population-level association is known as linkage disequilibrium (LD)
• Because recombination is not uniform, the genome has a "block-like" haplotype structure
• You do not need to have the "causal variant" in your genotyped set if it is adequately tagged
• A major strategy for SNP selection is to ensure adequate coverage (r² > .8) of common genetic variants in a gene, which can be done with Haploview