
Association Analysis

Association Analysis. University of Louisville, Center for Genetics and Molecular Medicine, January 11, 2008. Dana Crawford, PhD, Vanderbilt University Center for Human Genetics Research.


Presentation Transcript


  1. Association Analysis University of Louisville Center for Genetics and Molecular Medicine January 11, 2008 Dana Crawford, PhD Vanderbilt University Center for Human Genetics Research

  2. Association Analysis Outline • Study Design • SNPs versus Haplotypes • Analysis Methods • Candidate Gene • Whole Genome Analysis • Replication and Function

  3. Study Design Does your trait or phenotype have a genetic component? • Segregation analysis • Recurrence risks • Heritability • Other sources of evidence for a genetic component

  4. Classic Segregation Analysis • Determines if a major gene is involved • Compares data to Mendelian models, such as • Autosomal dominant • Autosomal recessive • X-linked • Results can be used as parameters for linkage analysis (e.g. parametric LOD) • Subject to ascertainment bias Note: More complex methods needed for complex traits

  5. Recurrence Risks The chance that a disease present in the family will recur in that family “Lightning striking twice” If recurrence risk is greater in the family compared with unrelated individuals, the disease has a “genetic” component Suggests familial aggregation

  6. Recurrence Risks Measured using the risk ratio (λ)
      Sibling risk ratio: $\lambda_s = \dfrac{\text{sibling recurrence risk}}{\text{population prevalence}}$
      Cystic fibrosis: $\lambda_s = 0.25/0.0004 = 625$ (about 500 at a prevalence of 0.0005)
      Huntington disease: $\lambda_s = 0.50/0.0001 = 5000$
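The risk ratio on this slide is a single division, sketched below with the Huntington disease figures quoted above. The function name `sibling_risk_ratio` is made up for illustration.

```python
def sibling_risk_ratio(sibling_risk: float, population_prevalence: float) -> float:
    """Return lambda_s = sibling recurrence risk / population prevalence."""
    if population_prevalence <= 0:
        raise ValueError("population prevalence must be positive")
    return sibling_risk / population_prevalence


# Huntington disease values from the slide: 50% sibling risk, prevalence 1 in 10,000.
print(round(sibling_risk_ratio(0.50, 0.0001)))
```

A value far above 1 indicates familial aggregation; on its own it does not separate shared genes from shared environment.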

  7. Recurrence Risks: Complex traits λ here is for first-degree relatives Merikangas and Risch (2003) Science 302:599-601.

  8. Heritability The proportion of phenotypic variation in a population attributable to genetic variation Heritability measured as $h^2$ Think “twin studies” (Can also be family studies) Quantitative traits

  9. Heritability and Quantitative Traits Determined by genes and environment Example: Height [Figure: height of boys and girls (Mexican Americans, Blacks, Whites), NHANES 1971-1974 versus NHANES 1999-2002] Freedman et al (2006) Obesity 14:301-308

  10. Heritability and Quantitative Traits
      Trait variation = genetic + environment: $\sigma_T^2 = \sigma_G^2 + \sigma_E^2$
      Genetic variation = additive + dominance: $\sigma_G^2 = \sigma_a^2 + \sigma_d^2$
      Environmental variation = familial/household + random/individual: $\sigma_E^2 = \sigma_f^2 + \sigma_e^2$
      Broad-sense heritability: $h_B^2 = \sigma_G^2 / \sigma_T^2$
      Narrow-sense heritability: $h_N^2 = \sigma_a^2 / \sigma_T^2$
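As a small worked example of the variance decomposition on this slide, the sketch below computes broad- and narrow-sense heritability from four variance components. The numbers are invented for illustration and do not come from the presentation.

```python
def heritabilities(var_additive, var_dominance, var_familial, var_individual):
    """Return (broad-sense, narrow-sense) heritability from variance components."""
    var_genetic = var_additive + var_dominance        # sigma_G^2
    var_environment = var_familial + var_individual   # sigma_E^2
    var_total = var_genetic + var_environment         # sigma_T^2
    if var_total <= 0:
        raise ValueError("total variance must be positive")
    return var_genetic / var_total, var_additive / var_total


# Hypothetical variance components (arbitrary units).
broad, narrow = heritabilities(40.0, 10.0, 20.0, 30.0)
print(f"broad-sense h2 = {broad:.2f}, narrow-sense h2 = {narrow:.2f}")
```

Narrow-sense heritability can never exceed broad-sense heritability, because the additive variance is one part of the total genetic variance.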

  11. Heritability and Twin Studies $h^2 = 2(r_{MZ} - r_{DZ})$, where r is the correlation coefficient of the trait within twin pairs Monozygotic twins share ~100% of their genetic material Dizygotic twins share ~50% of their genetic material on average

  12. Heritability and Twin Studies
      Trait            r(MZ)  r(DZ)  Reference
      Cholesterol      0.76   0.39   Fenger et al
      SBP              0.60   0.32   Evans et al
      BMI              0.67   0.32   Schousboe et al
      Perceived pitch  0.67   0.44   Drayna et al

  13. Heritability: Is everything genetic?
      Trait          r(MZ)  r(DZ)  Reference
      Vote choice    0.81   0.69   Hatemi et al
      Religiousness  0.62   0.42   Koenig et al
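The twin correlations listed on slide 12 can be plugged into the formula from slide 11, h2 = 2(rMZ – rDZ). The sketch below does only that arithmetic; the function name `twin_heritability` is illustrative, and the estimates inherit the usual assumptions of the twin design (for example, equally shared environments for both twin types).

```python
def twin_heritability(r_mz: float, r_dz: float) -> float:
    """Heritability estimate from twin correlations: h2 = 2 * (r_MZ - r_DZ)."""
    return 2.0 * (r_mz - r_dz)


# (r_MZ, r_DZ) pairs copied from the slide.
twin_correlations = {
    "Cholesterol": (0.76, 0.39),
    "SBP": (0.60, 0.32),
    "BMI": (0.67, 0.32),
    "Perceived pitch": (0.67, 0.44),
}

for trait, (r_mz, r_dz) in twin_correlations.items():
    print(f"{trait}: h2 ~ {twin_heritability(r_mz, r_dz):.2f}")
```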

  14. Other Evidence For A Genetic Component Monogenic disorders Example: Phenotype of interest is sensitivity to warfarin dosing, but there are no heritability estimates Solution: Rare, familial disorder of warfarin resistance

  15. Other Evidence For A Genetic Component Case Reports Example: Phenotype of interest is susceptibility to Neisseria meningitidis (prevalence: 1/100,000) Solution: Case report of recurrent N. meningitidis in patient

  16. Other Evidence For A Genetic Component Other good arguments… • Animal models • Biochemistry or biological pathways • Expression data • Previous genetic association studies

  17. Study Design How well can you diagnose the disease or measure the trait? • Narrow definitions better than all-inclusive definitions • There are many paths that lead to the same phenotype • Avoid misclassification and measurement error • Direct measurement versus recall/survey data or indirect proxies • Be aware of age of onset • Can your control become a case over time? Arguably most important step in study design

  18. Target Phenotypes: Disease or Quantitative Trait? [Figure: pathway diagram with the labels LDLR, Diet, LDL-C, IL6, Acute Illness, CRP and MI] Note: SNPs associated with quantitative traits may not be associated with clinical endpoint Carlson et al. (2004) Nature 429:446-452

  19. Study Design How many cases and controls will you need to detect an association? Statistical Power • Null hypothesis: all alleles confer equal risk • Given that a risk allele exists, how likely is a study to reject the null? • Study sample size ideally determined before you begin to recruit and genotype

  20. Study Design What are the thresholds/variables in a general power calculation? • Statistical significance • Significance = p(false positive) • Traditional threshold 5% (Note: significance threshold for 1 SNP tested) • Statistical power • Power = 1 - p(false negative) • Traditional threshold 80% • Traditional thresholds balance confidence in results against reasonable sample size
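To make the significance and power thresholds concrete, here is a minimal sketch of a power calculation for a simple allelic case/control comparison. It uses a normal approximation for the difference between two allele frequencies and is not the method implemented in any particular power package; the function names, sample sizes and odds ratio are all assumptions chosen for illustration.

```python
from math import sqrt
from statistics import NormalDist


def case_allele_freq(control_maf: float, odds_ratio: float) -> float:
    """Allele frequency in cases implied by the control frequency and an allelic OR."""
    return odds_ratio * control_maf / (1.0 - control_maf + odds_ratio * control_maf)


def allelic_test_power(n_cases, n_controls, control_maf, odds_ratio, alpha=0.05):
    """Approximate power of a two-sided allelic test (two-proportion z-test on alleles)."""
    p_case = case_allele_freq(control_maf, odds_ratio)
    alleles_case, alleles_ctrl = 2 * n_cases, 2 * n_controls  # two alleles per person
    p_pooled = (alleles_case * p_case + alleles_ctrl * control_maf) / (alleles_case + alleles_ctrl)
    se_null = sqrt(p_pooled * (1 - p_pooled) * (1 / alleles_case + 1 / alleles_ctrl))
    se_alt = sqrt(p_case * (1 - p_case) / alleles_case
                  + control_maf * (1 - control_maf) / alleles_ctrl)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    # The small contribution from the opposite tail is ignored.
    return NormalDist().cdf((abs(p_case - control_maf) - z_crit * se_null) / se_alt)


# Hypothetical design: 500 cases, 500 controls, allelic OR of 1.5, alpha of 5%.
for maf in (0.05, 0.15, 0.25, 0.35, 0.45):
    print(f"MAF {maf:.2f}: power ~ {allelic_test_power(500, 500, maf, 1.5):.2f}")
```

Running the loop over a range of allele frequencies shows how strongly power depends on quantities that are unknown when the study is planned.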

  21. Study Design • Power Calculation Resources • Quanto (hydra.usc.edu/gxe/) • Supports quantitative, discrete traits (unrelated and family based) • Genetic Power Calculator (pngu.mgh.harvard.edu/~purcell/gpc/) • Supports discrete traits, variance components, quantitative traits for linkage and association studies • (List of other software: linkage.rockefeller.edu/soft/)

  22. Study Design How can you maximize power for your study? • Large sample size • Better estimate of variability or risk • Chance of misclassification / measurement error • Large genetic effect size • SNP risk allele with large odds ratio or explains a lot of trait variance • This is unknown at beginning of study • Risk SNP is common • This is unknown at beginning of study • Calculate power for a range of common MAFs (5-45%) • Genotype the risk SNP directly • Risk SNP is unknown at beginning of study • Remember tagSNPs are imperfect proxies • Adjust sample size by $1/r^2$

  23. Study Design Power calculation example: Cases: Adverse reaction (wheezing) to flu vaccination Controls: Vaccinated children with no adverse reactions [Figure: power by MAF, calculated using Quanto 1.1.1]
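The last bullet on slide 22 is a simple scaling rule: if a tagSNP is correlated with the unknown risk SNP at r2 < 1, the sample size computed for direct genotyping is multiplied by 1/r2. A minimal sketch, with a hypothetical function name and example numbers:

```python
from math import ceil


def tag_adjusted_sample_size(n_direct: int, r2: float) -> int:
    """Scale a sample size computed for the risk SNP itself by 1/r2 for a tagSNP proxy."""
    if not 0.0 < r2 <= 1.0:
        raise ValueError("r2 must be in (0, 1]")
    return ceil(n_direct / r2)


# Hypothetical: 1,000 subjects needed if the risk SNP were typed directly.
for r2 in (1.0, 0.8, 0.5):
    print(f"r2 = {r2}: about {tag_adjusted_sample_size(1000, r2)} subjects")
```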

  24. Study Design Power calculation example: Immunogenicity to influenza A (H5N1) vaccine Calculated using Quanto 1.1.1

  25. Study Design Why are you considering an association study instead of linkage? • Linkage analysis is powerful for disorders with • Discernible pattern of inheritance • Rare alleles w/ large genetic effect sizes • High penetrance • Not powerful for disorders that • have complex pattern of inheritance • are common • have many risk alleles with small effect sizes • have low penetrance

  26. Study Design Common variant/common disease hypothesis • Common genetic variants confer susceptibility • Risk-conferring alleles ancient; common across most populations • Risk-conferring allele has small effect • Multiple risk alleles expected for common disease; also environment

  27. Study Design Should you design a candidate gene or whole genome study? • Candidate gene association study • Interrogate specific genes or regions • Based on previous knowledge or biological plausibility • Hypothesis testing • Whole genome association study • Interrogate the “entire” genome • No previous knowledge required • Hypothesis generation

  28. Candidate gene association studies • Choose gene based on previous knowledge • Gene function • Biological pathway • Previous linkage or association study • Choose DNA variations for genotyping • Direct association approach • Indirect association approach

  29. Direct Candidate Gene Association Study Genotype “functional” SNPs Example: Nonsynonymous SNPs Collins et al (1997) Science 278:1580-1581

  30. Direct Candidate Gene Association Study Problem: We don’t know what is functional and what is not functional Botstein and Risch (2003) Nat Genet 33 Suppl:228-37.

  31. Direct Candidate Gene Association Study What would we miss? Functional synonymous SNPs in MDR1 alter P-glycoprotein activity Komar (2007) Science 315:466-467

  32. Direct Candidate Gene Association Study What would we miss? • Non-coding SNPs or DNA variations in • Introns • Intergenic regulatory regions • 99% of the human genome is non-coding

  33. Indirect Candidate Gene Association Study • Genotype a fraction of all SNPs regardless of “function” • Rely on SNP-SNP correlations (linkage disequilibrium) to capture information for SNPs not genotyped Kruglyak (2005) Nat Genet 37:1299-1300

  34. Indirect Candidate Gene Association Study Linkage disequilibrium (LD) Measured by $r^2$:
      $$r^2 = \frac{[f(A_1B_1) - f(A_1)f(B_1)]^2}{f(A_1)f(A_2)f(B_1)f(B_2)}$$
      $r^2 = 0$: SNPs are independent
      $r^2 = 1$: SNPs are perfectly correlated AND have the same minor allele frequency
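The r2 definition on this slide can be evaluated directly from one haplotype frequency and two allele frequencies. In the sketch below the function name `ld_r2` and the example frequencies are illustrative; f(A2) and f(B2) are taken as 1 - f(A1) and 1 - f(B1), which assumes biallelic SNPs.

```python
def ld_r2(f_a1b1: float, f_a1: float, f_b1: float) -> float:
    """r2 between two biallelic SNPs from the A1B1 haplotype and allele frequencies."""
    d = f_a1b1 - f_a1 * f_b1                       # disequilibrium coefficient D
    denom = f_a1 * (1 - f_a1) * f_b1 * (1 - f_b1)  # f(A1) f(A2) f(B1) f(B2)
    if denom == 0:
        raise ValueError("both SNPs must be polymorphic")
    return d * d / denom


# Independent SNPs: the haplotype frequency equals the product of allele frequencies.
print(f"{ld_r2(0.12, 0.3, 0.4):.3f}")
# A1 always travels with B1 and both alleles have the same frequency.
print(f"{ld_r2(0.3, 0.3, 0.3):.3f}")
# A1 always travels with B1 but the allele frequencies differ, so r2 falls below 1.
print(f"{ld_r2(0.2, 0.2, 0.4):.3f}")
```

The third case illustrates the slide's point that r2 = 1 requires matching allele frequencies as well as complete correlation.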

  35. Indirect Candidate Gene Association Study Using LD to pick “tagSNPs” CRP, European-descent: 10 SNPs with MAF >5% are captured by 4 tagSNPs at $r^2 > 0.80$

  36. Indirect Candidate Gene Association Study “tagSNPs” are population specific CRP, European-descent: 4 tagSNPs CRP, African-descent: 10 tagSNPs

  37. Indirect Candidate Gene Association Study • “tagSNPs” are population specific • Merge sets for “cosmopolitan” set http://gvs.gs.washington.edu/GVS/

  38. Indirect Candidate Gene Association Study Multiple testing • Testing many SNPs for association with disease status • No consensus on correcting p-value • Bonferroni • False Discovery Rate • Need to replicate findings in independent study

  39. Indirect Candidate Gene Association Study: Pros and Cons • Can interrogate all common SNPs in gene • SNPs must be known and genotypes available to calculate LD and pick tagSNPs • Multiple testing within a gene • Limited to previous knowledge

  40. Whole Genome Association Study Platforms: Affymetrix GeneChips, Illumina Infinium assay • Can now genotype 100K – 1 million SNPs • Coverage depends on platform and chip • tagSNPs capturing HapMap common SNPs • Genic SNPs overrepresented • Conserved non-coding SNPs represented • Evenly spaced across genome

  41. Whole Genome Association Study • Same study design and challenges as candidate gene • Mostly case-control (retrospective) • Multiple testing • Data storage and higher-order interaction testing issues • Hypothesis generation tool (replication)
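One common way to pick tagSNPs at an r2 threshold is greedy binning: repeatedly choose the SNP that captures the most still-untagged SNPs, then drop everything it captures. The sketch below illustrates that generic idea only; it is not necessarily the procedure used to produce the CRP example, and the SNP names and r2 values are invented.

```python
def pick_tag_snps(snps, r2, threshold=0.8):
    """Greedy tagSNP selection; r2 maps frozenset({snp_a, snp_b}) to pairwise r2."""
    def captures(a, b):
        # A SNP captures itself and any SNP correlated with it above the threshold.
        return a == b or r2.get(frozenset((a, b)), 0.0) > threshold

    untagged = list(snps)
    tags = []
    while untagged:
        # Choose the SNP capturing the most untagged SNPs (ties: first in list order).
        best = max(untagged, key=lambda s: sum(captures(s, t) for t in untagged))
        tags.append(best)
        untagged = [t for t in untagged if not captures(best, t)]
    return tags


# Hypothetical pairwise r2 values; pairs not listed are treated as r2 = 0.
snps = ["snp1", "snp2", "snp3", "snp4", "snp5"]
r2 = {
    frozenset(("snp1", "snp2")): 0.95,
    frozenset(("snp1", "snp3")): 0.85,
    frozenset(("snp2", "snp3")): 0.90,
    frozenset(("snp4", "snp5")): 0.60,
}
tags = pick_tag_snps(snps, r2)
print(tags)
```

Because pairwise r2 differs between populations, the same procedure run on another population's data can return a different and often larger tag set.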
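The two corrections named on this slide can be sketched in a few lines. Bonferroni compares each p-value with alpha divided by the number of tests; for the False Discovery Rate, the Benjamini-Hochberg step-up procedure is shown here as one widely used choice (the slide does not say which FDR method is meant). The p-values are invented.

```python
def bonferroni(p_values, alpha=0.05):
    """Flag tests that pass the Bonferroni-corrected threshold alpha / m."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def benjamini_hochberg(p_values, fdr=0.05):
    """Flag tests rejected by the Benjamini-Hochberg step-up procedure."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k whose sorted p-value is at most k * fdr / m.
    max_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * fdr / m:
            max_rank = rank
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_rank:
            rejected[idx] = True
    return rejected


# Hypothetical p-values for five SNPs tested in one gene.
p_values = [0.001, 0.012, 0.025, 0.038, 0.20]
print(bonferroni(p_values))
print(benjamini_hochberg(p_values))
```

With these numbers Bonferroni keeps only the smallest p-value, while the FDR procedure keeps four, which illustrates why the choice of correction changes the list of SNPs carried forward to replication.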

  42. Case/Control Study Designs For either candidate gene or whole genome Manolio et al. Nature Reviews Genetics 7, 812–820 (October 2006)

  43. Case/Control Study Designs: Pros and Cons
      Study         Pros               Cons
      Case/Control  Easier to collect  Subject to bias
                    Less expensive     No risk estimates
      Prospective   Risk estimates     Harder to collect
                                       More expensive
                                       Subject to bias
      For rare outcomes, case/control design may be only option

  44. Case/Control Study Designs: Pros and Cons Types of bias • Bias in selection of cases • Those that are currently living • Miss fatal or short episodes of disease • Might miss mild diseases • Referral/admission bias • Non-response bias • Exposure suspicion bias • Family information bias • Recall bias Often ignored in genetic association studies Manolio et al. Nature Reviews Genetics 7, 812–820 (October 2006)

  45. Analysis Methods Genotype QC • Test for departures from Hardy-Weinberg Equilibrium • Test for gender inconsistencies • Eliminate very rare SNPs (no power) • Eliminate SNPs with low genotyping efficiency • Eliminate samples with low genotyping efficiency
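The Hardy-Weinberg check in the first bullet can be sketched as a 1-degree-of-freedom chi-square goodness-of-fit test on genotype counts. This is a minimal illustration with invented counts and a hypothetical function name; for rare genotypes an exact test is generally preferred over the chi-square approximation.

```python
from math import erfc, sqrt


def hwe_chi_square(n_major_hom: int, n_het: int, n_minor_hom: int):
    """Chi-square test (1 df) for departure from Hardy-Weinberg proportions."""
    n = n_major_hom + n_het + n_minor_hom
    p = (2 * n_major_hom + n_het) / (2 * n)   # major allele frequency
    q = 1.0 - p
    observed = (n_major_hom, n_het, n_minor_hom)
    expected = (p * p * n, 2 * p * q * n, q * q * n)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    p_value = erfc(sqrt(chi2 / 2))            # upper tail of chi-square with 1 df
    return chi2, p_value


# Hypothetical genotype counts (major homozygote, heterozygote, minor homozygote).
print(hwe_chi_square(360, 480, 160))   # counts that match HWE proportions
print(hwe_chi_square(450, 300, 250))   # heterozygote deficit
```

In practice the test is often run in controls only, since a true association can itself distort genotype proportions in cases.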

  46. Analysis Methods What statistical methods do you use to analyze your data? • SNP by SNP (borrowed from epidemiology) • Chi-square and Fisher’s exact • 2x2 table • 2x3 table • Logistic and linear regression • Covariates • Haplotypes • Haplo.stats and regression • Interactions • Traditional regression • MDR (Ritchie et al)

  47. Analysis Methods The Case/Control Study
                     Case  Control
      Minor allele   A     B
      Major allele   C     D
      Odds ratio (OR) = ratio of the odds of the minor allele in cases (A/C) to the odds in controls (B/D)
      OR = (A/C) / (B/D) = (A*D)/(B*C)

  48. Analysis Methods For genotypes, set homozygous for major allele (A) as “referent” genotype, and calculate 2 odds ratios:
      Heterozygote versus referent:
            Case  Control
      Aa    A     B
      AA    C     D
      Minor-allele homozygote versus referent:
            Case  Control
      aa    A     B
      AA    C     D

  49. Analysis Methods Case/control: Interpretation of Odds Ratio 1.0 – Referent >1.0 – Greater odds of disease compared with controls <1.0 – Lesser odds of disease compared with controls Confidence intervals: a range of values likely to contain the true OR OR does not measure risk*
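The odds ratio from the 2x2 allele table is the cross-product (A*D)/(B*C). The sketch below adds a 95% confidence interval based on the standard error of log(OR), which is a standard large-sample approximation rather than something stated on the slide; the allele counts are invented.

```python
from math import exp, log, sqrt


def allelic_odds_ratio(a: int, b: int, c: int, d: int, z: float = 1.96):
    """OR = (a*d)/(b*c) with an approximate 95% CI from the SE of log(OR).

    a, b: minor allele counts in cases and controls
    c, d: major allele counts in cases and controls
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive for this approximation")
    odds_ratio = (a * d) / (b * c)
    se_log_or = sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = exp(log(odds_ratio) - z * se_log_or)
    upper = exp(log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical allele counts: 200 cases and 200 controls, two alleles each.
odds_ratio, lower, upper = allelic_odds_ratio(a=120, b=80, c=280, d=320)
print(f"OR = {odds_ratio:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```

An interval that excludes 1.0 corresponds to a nominally significant association at the 5% level for that single SNP.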
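The two genotype odds ratios described on this slide each compare one genotype with the AA referent. A minimal sketch, using invented genotype counts and a hypothetical helper name:

```python
def genotype_odds_ratios(cases: dict, controls: dict, referent: str = "AA") -> dict:
    """Odds ratio of each non-referent genotype versus the referent genotype."""
    results = {}
    for genotype in cases:
        if genotype == referent:
            continue
        # Cross-product of the 2x2 table: genotype row over referent row.
        results[genotype] = (cases[genotype] * controls[referent]) / (
            controls[genotype] * cases[referent]
        )
    return results


# Hypothetical genotype counts in 200 cases and 200 controls.
cases = {"AA": 100, "Aa": 80, "aa": 20}
controls = {"AA": 140, "Aa": 50, "aa": 10}
ors = genotype_odds_ratios(cases, controls)
print({genotype: round(value, 2) for genotype, value in ors.items()})
```

Comparing the heterozygote and homozygote odds ratios gives a rough indication of whether the data look additive, dominant or recessive.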

  50. Analysis Methods Prospective cohort • Disease free at beginning of study • Followed over time for disease (“incident”) • Follow “exposed” and “unexposed” groups • Gold-standard study design
