
Molecular & Genetic Epi 217 Association Studies

Molecular & Genetic Epi 217 Association Studies. John Witte. Overview: More on Aggregation / Heritability; Association Studies; Design; Population Stratification; Family-based Studies; Analysis; Candidate Gene Studies; Resources; Selecting ‘tag’ SNPs; Pathways. Recurrence Risks, $\lambda_s$.


Presentation Transcript


  1. Molecular & Genetic Epi 217: Association Studies • John Witte

  2. Overview • More on Aggregation / Heritability • Association Studies • Design • Population Stratification • Family-based Studies • Analysis • Candidate Gene Studies • Resources • Selecting ‘tag’ SNPs • Pathways

  3. Recurrence Risks, $\lambda_s$ • Alzheimer Disease: 3-4 • Rheumatoid Arthritis: 12 • Schizophrenia: 13 • Type I Diabetes: 15 • Multiple Sclerosis: 20-30 • Neural Tube Defects: 25-50 • Autism: 75-150

  4. Limitations of recurrence risks (Figure 4.1, p. 53) • Recurrence risks depend on mode of inheritance and disease frequency. • Single gene diseases have high recurrence risks. • More common complex diseases have lower values (e.g., CHD). • With recurrence risks alone, it is hard to distinguish genetic from environmental effects.

  5. Heritability Analysis • Evaluates the genetic contribution to a trait Y in terms of variance explained. • Y = Genetics + Environment • Var(Y) = overall variation in phenotype Y • Broad sense heritability: $H^2 = \mathrm{Var}(G) / \mathrm{Var}(Y)$, where $\mathrm{Var}(G)$ = genetic part of variance = $V_A + V_D$ (Additive + Dominance) • Narrow sense heritability: $h^2 = V_A / \mathrm{Var}(Y)$, the proportion of phenotypic variance that is explained only by additive genetic effects.
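
As a small worked illustration of the two ratios on this slide, the sketch below computes broad and narrow sense heritability from variance components. The function name, the variance values, and the assumption that Var(Y) is simply additive + dominance + environmental variance (no interaction or covariance terms) are illustrative choices, not part of the lecture.

```python
def heritability(v_additive, v_dominance, v_environment):
    """Return (broad-sense H^2, narrow-sense h^2) from variance components.

    Assumption of this sketch: Var(Y) = VA + VD + VE, with no
    gene-environment covariance or interaction terms.
    """
    var_y = v_additive + v_dominance + v_environment
    if var_y <= 0:
        raise ValueError("total phenotypic variance must be positive")
    broad = (v_additive + v_dominance) / var_y   # H^2 = Var(G) / Var(Y)
    narrow = v_additive / var_y                  # h^2 = VA / Var(Y)
    return broad, narrow


# Hypothetical variance components, chosen only for illustration.
H2, h2 = heritability(v_additive=4.0, v_dominance=1.0, v_environment=5.0)
print(f"H^2 = {H2:.2f}, h^2 = {h2:.2f}")
```

Because the dominance variance is counted in the numerator of $H^2$ but not of $h^2$, the narrow sense value can never exceed the broad sense value.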

  6. Process of Genetic Epidemiology • Defining the Phenotype • Migrant Studies • Familial Aggregation • Segregation • Linkage Analysis • Association Studies • Cloning • Fine Mapping • Characterization

  7. 2. Association Studies

  8. Association Studies • Use of association studies is rapidly expanding, reflecting a number of laudable properties, including their: • Ease, since one need not collect large pedigrees; and • Potential for being more powerful than conventional linkage-based approaches.

  9. Linkage vs. Association (Risch & Merikangas, Science 1996)

  10. Association Study Approaches • Direct vs Indirect • Candidate genes: • Functional • All common variants • Exome Arrays • All common variants in genome (GWAS) • All variants in genes/genome (sequencing) • Expensive

  11. Genomics Revolution • Human Genome Project: 13 years, $3B for 1 sequence • Now: 1 week, $10K (> 500 times faster, < 1/100,000th the cost!) • Soon: 1 hour, $1K (#1 Innovation, 2010) • Source: The Economist, 2010

  12. Study Design: Control Selection • A critical aspect of association studies is that controls should be selected from the cases’ source population. • That is, controls should be those individuals who, if they were diseased, would become cases.

  13. Population stratification • A form of confounding in genetic association studies caused by genetic differences between cases and controls unrelated to disease but due to sampling them from populations of different ancestries

  14. Population Stratification • [Diagram: Sub-population, Gene, Disease; the sub-population is linked to the gene through allele frequency and to the disease through the rate ratio, RpR] • Confounding bias that may occur if one’s sample is composed of sub-populations with different: • allele frequencies; and • disease rates (RpR) • Cases are more likely than controls to arise from the sub-population with the higher baseline disease rate. • Cases and controls will have different allele frequencies regardless of whether the locus is causal.
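
The mechanism described on this slide can be checked with expected counts. The sketch below assumes two hypothetical sub-populations, A and B, in which carrier status and disease are independent (the locus is not causal), but both carrier frequency and baseline disease risk differ between them. All numbers and function names are made up for illustration.

```python
def odds_ratio(case_carrier, case_non, control_carrier, control_non):
    """Odds ratio for carrier status from the four cells of a 2x2 table."""
    return (case_carrier * control_non) / (case_non * control_carrier)


def expected_table(n, carrier_freq, disease_risk):
    """Expected 2x2 counts when carrier status and disease are independent."""
    cases = n * disease_risk
    controls = n - cases
    return (cases * carrier_freq, cases * (1 - carrier_freq),
            controls * carrier_freq, controls * (1 - carrier_freq))


# Hypothetical sub-populations: (size, carrier frequency, baseline disease risk)
subpops = {"A": (100_000, 0.10, 0.01), "B": (100_000, 0.50, 0.05)}

tables = {name: expected_table(*params) for name, params in subpops.items()}
stratum_ors = {name: odds_ratio(*cells) for name, cells in tables.items()}

# Pool the two sub-populations cell by cell, ignoring ancestry.
pooled_cells = [sum(cells) for cells in zip(*tables.values())]
pooled_or = odds_ratio(*pooled_cells)

print("within-stratum ORs:", stratum_ors)
print("pooled OR:", round(pooled_or, 2))
```

Within each sub-population the odds ratio is 1 by construction, yet the pooled odds ratio is well above 1, because cases come disproportionately from sub-population B, where carriers are also more common.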

  15. Correcting for population stratification • 1. Genomic control (PMID: 11315092) • Non-central chi-square: $\lambda$ = mean of all $\chi^2$ tests in the sample • Adjust all test statistics for inflation, using the empirical chi-square distribution ($\chi^2_{\text{new}} = \chi^2_{\text{old}} / \lambda$) • Critiques: an average across the genome, so it may over- or under-correct individual tests • 2. Structured association • Structure (PMID: 10835412, PMID: 10827107) • Adjust regression model for ancestral group membership • Must specify number of groups; can be slow to implement • 3. Principal Components • Eigenstrat (PMID: 16862161) • Adjust regression model for principal component values, which serve as a proxy for ancestry
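
The genomic control adjustment in item 1 can be sketched in a few lines. The function below follows the slide in taking $\lambda$ as the mean of the 1 d.f. test statistics; the optional median-based estimate (median divided by roughly 0.4549, the median of a 1 d.f. chi-square) and the floor of $\lambda$ at 1 are common conventions added here as assumptions, and the input statistics are hypothetical.

```python
from statistics import mean, median

# Median of a chi-square distribution with 1 d.f. (approximate constant).
CHI2_1DF_MEDIAN = 0.4549


def genomic_control(chi2_stats, use_median=False):
    """Return (lambda, adjusted statistics) for 1 d.f. chi-square tests.

    By default lambda is the mean of the statistics, as on the slide
    (the expected mean of a 1 d.f. chi-square is 1). With use_median=True,
    lambda is the median divided by its expected value instead.
    Assumption of this sketch: lambda is floored at 1 so that statistics
    are never inflated by the correction.
    """
    if use_median:
        lam = median(chi2_stats) / CHI2_1DF_MEDIAN
    else:
        lam = mean(chi2_stats)
    lam = max(lam, 1.0)
    return lam, [x / lam for x in chi2_stats]


# Hypothetical genome-wide test statistics, for illustration only.
stats = [0.2, 0.9, 1.5, 2.4, 5.0]
lam, adjusted = genomic_control(stats)
print("lambda:", lam)
print("adjusted:", adjusted)
```

Every statistic is divided by the same $\lambda$, which is the basis of the critique on the slide: a single genome-wide average can over- or under-correct any individual test.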

  16. Genomic Control

  17. Quantile-Quantile (QQ) Plots

  18. Family-Based Association Studies • [Pedigree diagrams: Siblings, Parents, Cousins]

  19. Continuum of Assoc Study Designs • Population-based • “Ethnicity” Matched • Structured Assoc • Family-based • Trade-off along the continuum: population stratification (bias) versus overmatching (efficiency) • Sharing of genes & envt. • Efficiency • Also, recruitment issues • [Diagram: Subpopulation, Gene, Disease]

  20. Association Analysis • Simple chi-square test comparing genotype frequencies (2 d.f.) • Called a co-dominant analysis
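
A minimal sketch of the 2 d.f. co-dominant test follows: a Pearson chi-square on a 2x3 table of case and control genotype counts. The genotype counts are hypothetical, and the function assumes every genotype column has a non-zero total. For 2 d.f. the upper-tail p-value has the closed form exp(-x/2), so no statistics library is needed.

```python
from math import exp


def genotype_chi2(cases, controls):
    """Pearson chi-square on a 2x3 genotype table.

    cases, controls: counts in the order (GG, GT, TT).
    Returns (statistic, p-value) for the 2 d.f. co-dominant test.
    Assumes every genotype column has a non-zero total.
    """
    table = [list(cases), list(controls)]
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    # For 2 d.f. the chi-square survival function is exactly exp(-x / 2).
    return stat, exp(-stat / 2)


# Hypothetical genotype counts (GG, GT, TT), for illustration only.
stat, p_value = genotype_chi2(cases=(50, 100, 50), controls=(100, 80, 20))
print(f"chi-square = {stat:.2f}, p = {p_value:.2e}")
```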

  21. Genetic Model • ORs depend on genetic model (assuming positive association) • Genotype ORs: GG = 1, GT = r, TT = R • $R = r = 1$: not risk allele • $R > r = 1$: recessive • $R = r > 1$: dominant • $R = r^2 > 1$: log additive

  22. Tests of association • If genetic model known: • Collapse genotypes into 2x2 table, 1 d.f. test • Trend test for log additive • Use logistic regression: coding; covariates • Rarely know genetic model: • Use all three models (dom, rec, log additive) • Compare fit with the co-dominant (2 d.f.) model (LR test) • Cannot use LR test to compare models with each other, as they are not nested • Model with best fit and smallest P is best? • Use permutation test here (MAX test)
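
The 1 d.f. tests listed on this slide can be sketched directly from genotype counts. The code below collapses a 2x3 table into a 2x2 table under a dominant or recessive model and also computes the Cochran-Armitage trend statistic with allele-count weights (0, 1, 2), which is the usual trend test for a log additive model. It assumes T is the risk allele, uses hypothetical counts, and does not implement the permutation-based MAX test.

```python
from math import erfc, sqrt


def chi2_1df_pvalue(stat):
    """Upper-tail p-value for a chi-square statistic with 1 d.f."""
    return erfc(sqrt(stat / 2))


def collapsed_chi2(cases, controls, model):
    """Pearson chi-square (1 d.f.) after collapsing (GG, GT, TT) to 2x2.

    model="dominant":  GG versus GT+TT
    model="recessive": GG+GT versus TT
    Assumes T is the risk allele.
    """
    if model not in ("dominant", "recessive"):
        raise ValueError("model must be 'dominant' or 'recessive'")
    split = 1 if model == "dominant" else 2
    a, b = sum(cases[:split]), sum(cases[split:])
    c, d = sum(controls[:split]), sum(controls[split:])
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


def trend_chi2(cases, controls, weights=(0, 1, 2)):
    """Cochran-Armitage trend statistic (1 d.f.) for (GG, GT, TT) counts."""
    totals = [r + s for r, s in zip(cases, controls)]
    n_cases, n_controls = sum(cases), sum(controls)
    n = n_cases + n_controls
    sum_wr = sum(w * r for w, r in zip(weights, cases))
    sum_wn = sum(w * t for w, t in zip(weights, totals))
    sum_w2n = sum(w * w * t for w, t in zip(weights, totals))
    numerator = n * (n * sum_wr - n_cases * sum_wn) ** 2
    denominator = n_cases * n_controls * (n * sum_w2n - sum_wn ** 2)
    return numerator / denominator


# Hypothetical genotype counts (GG, GT, TT), for illustration only.
cases, controls = (20, 30, 50), (50, 30, 20)
results = {
    "dominant": collapsed_chi2(cases, controls, "dominant"),
    "recessive": collapsed_chi2(cases, controls, "recessive"),
    "log additive (trend)": trend_chi2(cases, controls),
}
for name, stat in results.items():
    print(f"{name}: chi-square = {stat:.2f}, p = {chi2_1df_pvalue(stat):.2e}")
```

Picking the smallest of the three p-values without correction overstates significance, which is why the slide points to a permutation test for the maximum statistic.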

  23. 3. Candidate Gene Studies • Selection of candidates: Linkage regions? Biological support? • “I am interested in a candidate gene and have samples ready to study. What SNPs do I genotype?”

  24. Candidate Gene: Where do I Start? • Location: What chromosome? What position on the chr? • Exons/UTR: How many exons? UTR regions? • Size: How large is the gene? • Use UCSC genome browser.

  25. SNP Picking: Things to Consider • Validation: What is the quality of the SNPs? • Informativity: Are these SNPs informative in my population? How common are they? Location? • Potentially Functional: Do these SNPs have a potential biological impact? Missense variants? • Previously Associated: Have previous studies found SNPs in the candidate gene associated with the outcome?

  26. SNP Picking: Validation

  27. SNP Picking: Validation

  28. SNP Picking: Validation

  29. SNP Picking: Informative

  30. SNP Picking: Potentially Functional C677T

  31. SNP Picking: Previously Associated

  32. MTHFR Summary • Location: chr1 • Size: 20,329 bp • Exons: 12 • Potentially Functional: 5 missense, of which 3 have MAF > 5% • Previously Associated: 3 (C677T, A1298C, A2756G)

  33. MTHFR SNPs • 102 SNPs across MTHFR: too many SNPs to genotype! • http://genome.ucsc.edu/cgi-bin/hgGateway

  34. Too many MTHFR SNPs. Solution: Tag SNP Selection • SNPs are correlated (aka Linkage Disequilibrium) • [Figure: six SNPs (1: A/T, 2: G/A, 3: G/C, 4: T/C, 5: G/C, 6: A/C) shown on example haplotypes, with groups of SNPs in high $r^2$] • Pairwise tagging: SNP 1, SNP 3, SNP 6 (3 tags in total) • Test for association: SNP 1, SNP 3, SNP 6 • Carlson et al. (2004) AJHG 74:106
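
Pairwise tagging can be sketched as a greedy procedure over a matrix of pairwise $r^2$ values: repeatedly pick the SNP that captures the most still-untagged SNPs at or above a threshold. This is a simplified illustration in the spirit of the cited approach, not a reimplementation of it. The LD groups, the $r^2$ values, and the function name are hypothetical, and the grouping of the six SNPs is assumed rather than read from the figure.

```python
def greedy_tag_snps(names, r2, threshold=0.8):
    """Greedy pairwise tagging.

    names: list of SNP names; r2: square matrix (list of lists) of
    pairwise r^2 values in the same order. Returns the chosen tag names.
    Ties are broken in favour of the SNP that appears first in `names`.
    """
    untagged = set(range(len(names)))
    tags = []
    while untagged:
        def captured(i):
            # SNPs still untagged that SNP i would capture (itself included).
            return {j for j in untagged if j == i or r2[i][j] >= threshold}

        best = max(sorted(untagged), key=lambda i: len(captured(i)))
        tags.append(names[best])
        untagged -= captured(best)
    return tags


# Hypothetical LD structure: SNPs in the same group have r^2 = 0.9,
# SNPs in different groups have r^2 = 0.1.
names = ["SNP1", "SNP2", "SNP3", "SNP4", "SNP5", "SNP6"]
group = {"SNP1": 0, "SNP2": 0, "SNP3": 1, "SNP4": 1, "SNP5": 1, "SNP6": 2}
r2 = [[1.0 if a == b else (0.9 if group[a] == group[b] else 0.1)
       for b in names] for a in names]

tags = greedy_tag_snps(names, r2)
print("tags:", sorted(tags))
```

With this assumed structure the procedure returns one tag per LD group, three tags for six SNPs, matching the count on the slide.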

  35. Coverage: Measurement Error in TagSNPs

  36. Common Measures of Coverage • Threshold Measures • e.g., 73% of SNPs in the complete set are in LD with at least one SNP in the genotyping set at $r^2 > 0.8$ • Average Measures • e.g., Average maximum $r^2 = 0.84$
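
Both coverage measures can be computed from the same quantity: for each SNP in the complete set, the maximum $r^2$ with any SNP in the genotyping set. The sketch below assumes a square matrix of pairwise $r^2$ values and a list of tag indices; the matrix and names are hypothetical.

```python
def coverage(r2, tag_indices, threshold=0.8):
    """Return (threshold coverage, average maximum r^2).

    r2: square matrix of pairwise r^2 for the complete SNP set.
    tag_indices: indices of the SNPs in the genotyping set.
    Threshold coverage is the fraction of SNPs whose best r^2 with any
    tag exceeds the threshold; tags count, since r^2 with themselves is 1.
    """
    max_r2 = [max(r2[i][t] for t in tag_indices) for i in range(len(r2))]
    threshold_coverage = sum(m > threshold for m in max_r2) / len(max_r2)
    average_max_r2 = sum(max_r2) / len(max_r2)
    return threshold_coverage, average_max_r2


# Hypothetical pairwise r^2 matrix for four SNPs; SNP 0 is the only tag.
r2_matrix = [
    [1.0, 0.9, 0.6, 0.3],
    [0.9, 1.0, 0.5, 0.2],
    [0.6, 0.5, 1.0, 0.4],
    [0.3, 0.2, 0.4, 1.0],
]
frac, avg = coverage(r2_matrix, tag_indices=[0])
print(f"threshold coverage = {frac:.0%}, average maximum r^2 = {avg:.2f}")
```

The two measures can disagree: a set with many SNPs just below the threshold scores poorly on the threshold measure while still having a respectable average.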

  37. Coverage and Sample Size • Sample size required for Direct Association: $n$ • Sample size for Indirect Association: $n^* = n / r^2$ • For $r^2 = 0.8$, increase is 25% • For $r^2 = 0.5$, increase is 100%
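
The two percentages on this slide follow directly from $n^* = n / r^2$: the relative increase is $1/r^2 - 1$. A short check, using a hypothetical direct-association sample size of 1,000:

```python
def indirect_sample_size(n_direct, r2):
    """Sample size needed when testing a tag SNP with the given r^2
    to the causal variant, relative to testing the variant directly."""
    if not 0 < r2 <= 1:
        raise ValueError("r2 must be in (0, 1]")
    return n_direct / r2


n_direct = 1000  # hypothetical sample size for direct association
for r2 in (0.8, 0.5):
    n_star = indirect_sample_size(n_direct, r2)
    increase = (n_star / n_direct - 1) * 100
    print(f"r^2 = {r2}: n* = {n_star:.0f} ({increase:.0f}% increase)")
```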

  38. Tag SNPs Database Resources • http://www.hapmap.org • http://gvs.gs.washington.edu/GVS/index.jsp

  39. HapMap • Re-sequencing to discover millions of additional SNPs; deposited to dbSNP. • SNPs from dbSNP were genotyped • Looked for 1 SNP every 5kb • SNP Validation • Polymorphic • Frequency • Haplotype and Linkage Disequilibrium Estimation • LD tagging SNPs

  40. HapMap Phase III Populations • ASW African ancestry in Southwest USA • CEU Utah residents with Northern and Western European ancestry from the CEPH collection • CHB Han Chinese in Beijing, China • CHD Chinese in Metropolitan Denver, Colorado • GIH Gujarati Indians in Houston, Texas • JPT Japanese in Tokyo, Japan • LWK Luhya in Webuye, Kenya • MEX Mexican ancestry in Los Angeles, California • MKK Maasai in Kinyawa, Kenya • TSI Toscani in Italia • YRI Yoruba in Ibadan, Nigeria

  41. Tag SNPs: HapMap

  42. Tag SNPs: HapMap

  43. Tag SNPs: HapMap & Haploview http://www.broad.mit.edu/mpg/haploview/

  44. Tag SNPs: HapMap & Haploview

  45. Tag SNPs: HapMap & Haploview

  46. Tag SNPs: HapMap & Haploview

  47. Tag SNPs: HapMap & Haploview

  48. Tag SNPs: HapMap Summary • Identified 33 common MTHFR SNPs (MAF > 5%) among Caucasians • Forced in 3 potentially functional/previously associated SNPs • Identified tags based on pairwise tagging • 15 tag SNPs could capture all 33 MTHFR SNPs (mean $r^2$ = 97%) • Note: number of SNPs required varies from gene to gene and from population to population

  49. 4. Pathways • Complex diseases: Many causes = many causal pathways! • [Diagram nodes: Physical activity, Genetic susceptibility, Obesity, Hyperlipidemia, Diet, Diabetes, Vulnerable plaques, Hypertension, Atherosclerosis, MI]
