Genome-wide Studies: Association


rachel

Presentation Transcript


  1. Genome-wide Studies: Association

  2. Genome-wide Association Studies • 1. History • Linkage vs. Association • Power/Sample Size • 2. SNPs and The International HapMap Project • 3. Direct vs. Indirect Association • Using Linkage Disequilibrium to reduce genotyping • 4. SNP selection, Coverage, Study Designs • 5. Genotyping Platforms • 6. Early (recent) GWA Studies

  3. Gene Mapping Study Designs • Positional Cloning • Linkage Analysis • Linkage Disequilibrium-based Fine Mapping • Candidate Gene Association • Requires a biological hypothesis

  4. Risch and Merikangas 1996: Sample Size for Association < Sample Size for Linkage

  5. Risch and Merikangas 1996

  6. What Risch and Merikangas proposed: • 5 genetic polymorphisms per gene • 100,000 genes (1996 estimate) • 5 × 100,000 = 500,000 genotypes per subject • Candidate Gene Study Design • All genes are candidates • Direct or Sequence-based approach • Causal variant is one of the variants tested

  7. Sample Size Required • Linkage Analysis with affected sib pairs • Transmission Disequilibrium Test (TDT) • TDT with affected sib pairs

  8. Affected Sib Pair Linkage Analysis • 2 siblings/family • Both sibs affected • Alleles shared identical by descent (IBD) at the marker locus • Under no linkage, expect 50% marker allele sharing on average

  9. Identity By Descent [Figure: parental alleles (A/a) transmitted to two siblings; the sib pair shares 2, 1, 1, or 0 alleles IBD]

  10. Identity By Descent: Expected number of alleles shared IBD = 2 × 25% + 1 × 50% + 0 × 25% = 1 allele, i.e. 50% sharing
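The 25% / 50% / 25% split for 2, 1, and 0 shared alleles can be checked by brute-force enumeration. This is a minimal sketch, assuming a fully informative marker (each parent's two alleles can be told apart) and that each parent transmits either allele to each sib with probability 1/2, independently; the function name ibd_distribution is hypothetical.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def ibd_distribution() -> dict:
    """Exact distribution of the number of alleles two full sibs share IBD.

    Assumes each parent's two alleles are distinguishable (labelled 0 and 1)
    and that each parent transmits either allele to each sib with
    probability 1/2, independently.
    """
    counts = Counter()
    # pat1/mat1: which paternal/maternal allele sib 1 received; same for sib 2.
    for pat1, mat1, pat2, mat2 in product((0, 1), repeat=4):
        shared = int(pat1 == pat2) + int(mat1 == mat2)
        counts[shared] += 1
    total = sum(counts.values())  # 16 equally likely transmission patterns
    return {k: Fraction(n, total) for k, n in sorted(counts.items())}


dist = ibd_distribution()
expected = sum(k * p for k, p in dist.items())
print(dist)                    # {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}
print(expected, expected / 2)  # 1 1/2  (one allele on average = 50% sharing)
```

Exact fractions are used instead of floats so the 1-allele expectation comes out exactly rather than approximately.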

  11. Risch and Merikangas 1996

  12. Sample Size Calculation [Table columns: Exposure Frequency, Effect Size, Identity By Descent (IBDM), Sample Size Required]

  13. Sample Size Calculation [Table columns: Exposure Frequency, Effect Size, Identity By Descent (IBDM), Sample Size Required; rows annotated as high IBD sharing vs. low IBD sharing]

  14. TDT: Transmitted alleles vs. non-transmitted alleles [Figure: trio with parents M1M2 and M2M2 and an affected child M1M2]

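  15. TDT: Transmitted alleles vs. non-transmitted alleles • $\mathrm{TDT} = \frac{(n_{12} - n_{21})^2}{n_{12} + n_{21}}$, where $n_{12}$ is the number of heterozygous M1M2 parents who transmit M1 and $n_{21}$ the number who transmit M2 • Asymptotically $\chi^2$ with 1 degree of freedom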

  16. TDT: Transmitted alleles vs. non-transmitted alleles [Figure: trio with parents M1M2 and M2M2 and an affected child M1M2]

  17. TDT for this one trio: $\mathrm{TDT} = \frac{(1 - 0)^2}{1 + 0} = 1$, p-value = 0.32

  18. TDT for one hundred trios: $\mathrm{TDT} = \frac{(50 - 45)^2}{50 + 45} = \frac{25}{95} \approx 0.26$, p-value ≈ 0.61
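The TDT calculation can be sketched in a few lines: count transmissions from heterozygous parents, then form the statistic and its 1-df chi-square p-value. This is a minimal sketch, assuming a biallelic marker with alleles labelled M1 and M2 and trios with an affected child; the helper names transmissions and tdt are hypothetical. The p-value uses the fact that a 1-df chi-square variable is the square of a standard normal, so P(χ² > x) = erfc(√(x/2)).

```python
import math
from typing import Tuple

Genotype = Tuple[str, str]


def transmissions(parent1: Genotype, parent2: Genotype, child: Genotype,
                  allele1: str = "M1", allele2: str = "M2") -> Tuple[int, int]:
    """Return this trio's contribution (n12, n21) at a biallelic marker.

    n12 counts heterozygous parents who transmitted allele1 and n21 those
    who transmitted allele2; homozygous parents are uninformative.
    """
    c1, c2 = child
    # Try both ways of assigning the child's two alleles to the parents.
    # For a biallelic marker every valid assignment gives the same totals.
    for from_p1, from_p2 in ((c1, c2), (c2, c1)):
        if from_p1 in parent1 and from_p2 in parent2:
            n12 = n21 = 0
            for parent, sent in ((parent1, from_p1), (parent2, from_p2)):
                if set(parent) == {allele1, allele2}:  # heterozygous parent
                    if sent == allele1:
                        n12 += 1
                    else:
                        n21 += 1
            return n12, n21
    raise ValueError("child genotype is inconsistent with the parents")


def tdt(n12: int, n21: int) -> Tuple[float, float]:
    """TDT statistic and its asymptotic p-value (chi-square with 1 df)."""
    if n12 + n21 == 0:
        raise ValueError("no heterozygous (informative) parents")
    stat = (n12 - n21) ** 2 / (n12 + n21)
    # 1-df chi-square is Z**2, so P(chi2 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# The trio from the slides: parents M1M2 and M2M2, child M1M2.
n12, n21 = transmissions(("M1", "M2"), ("M2", "M2"), ("M1", "M2"))
print(n12, n21)       # 1 0
print(tdt(n12, n21))  # approx (1.0, 0.317)
print(tdt(50, 45))    # approx (0.263, 0.608)
```

Only the heterozygous parent is informative in the slide's trio, which is why a single trio contributes n12 = 1 and n21 = 0.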

  19. Risch and Merikangas 1996 TDT

  20. Conclusions • Linkage • Good for Large Effect Sizes • Genome-wide Association • Good for Modest Effect Sizes

  21. Two Hypotheses • Common Disease-Common Variant • Common variants • Small to modest effects • Rare Variant • Rare variants • Larger effects

  22. Large samples needed for rare alleles

  23. How many common variants are there? • Millions of SNPs • Microsatellites, Insertions/Deletions, Copy Number Polymorphisms, Inversions • Numbers depend on: • Population under study • Minimum allele frequency • 5% is “common” • Less than 5% requires very large samples
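Why alleles below 5% need very large samples can be illustrated with a generic case-control power calculation. This sketch swaps in the standard normal-approximation formula for comparing two allele frequencies (not the Risch and Merikangas calculation); the allelic odds ratio of 1.3, α = 5 × 10⁻⁸, and 80% power are hypothetical illustration values, and alleles_per_group is a hypothetical name.

```python
from statistics import NormalDist


def alleles_per_group(maf: float, odds_ratio: float,
                      alpha: float = 5e-8, power: float = 0.8) -> float:
    """Approximate chromosomes needed per group (cases, controls) for an
    allelic case-control test, via the two-proportion normal approximation.

    maf is the risk-allele frequency in controls; the case frequency
    follows from the allelic odds ratio.
    """
    p0 = maf
    p1 = odds_ratio * p0 / (1 + p0 * (odds_ratio - 1))  # case allele frequency
    p_bar = (p0 + p1) / 2
    z_a = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided test
    z_b = NormalDist().inv_cdf(power)
    num = (z_a * (2 * p_bar * (1 - p_bar)) ** 0.5
           + z_b * (p0 * (1 - p0) + p1 * (1 - p1)) ** 0.5) ** 2
    return num / (p1 - p0) ** 2


for maf in (0.01, 0.05, 0.20):
    n = alleles_per_group(maf, odds_ratio=1.3) / 2  # 2 chromosomes per person
    print(f"MAF {maf:.2f}: about {n:,.0f} cases and as many controls")
```

For a fixed odds ratio the required sample size scales roughly as 1 / (p(1 − p)), so dropping the allele frequency from 5% to 1% multiplies it several-fold.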

  24. The Number of SNPs in the Human Genome (HapMap). Note: this is an underestimate.

  25. Ethnic/Racial Variation in SNP frequency

  26. Multiple testing and genotyping “The number of tests we have used as the basis for our calculations (1,000,000) is likely to be far larger than necessary if one allows for linkage disequilibrium, which could substantially reduce the required number of markers and families needed for initial screening.”
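The cost of testing on this scale can be made concrete with a Bonferroni correction. This is a minimal sketch, assuming a family-wise error rate of 0.05 split evenly over independent 1-df tests; the function names are hypothetical. LD makes real tests correlated, so the effective number of tests, and hence the penalty, is smaller, which is the point of the quotation.

```python
from statistics import NormalDist


def bonferroni_threshold(n_tests: int, family_alpha: float = 0.05) -> float:
    """Per-test significance level keeping family-wise error <= family_alpha."""
    return family_alpha / n_tests


def chi2_1df_critical(alpha: float) -> float:
    """Critical value of a 1-df chi-square test: the squared two-sided z cutoff."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return z * z


for n_tests in (500_000, 1_000_000):
    a = bonferroni_threshold(n_tests)
    print(f"{n_tests:>9,} tests: alpha = {a:.1e}, "
          f"chi2 must exceed {chi2_1df_critical(a):.1f}")
```

With 1,000,000 tests the per-test level is 5 × 10⁻⁸, so a 1-df statistic such as the TDT must exceed roughly 29.7 to be declared significant.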

  27. Direct vs. Indirect: Sequence-based vs. Map-based

  28. Coverage • Percent of all SNPs captured by genotyped SNPs • More genotyped SNPs = better coverage

  29. Linkage Disequilibrium [Figure: haplotypes AB and ab compared with Ab and aB]
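The standard pairwise LD measures can be computed directly from the four haplotype frequencies of two biallelic SNPs. This is a minimal sketch assuming the haplotype frequencies are already known (in practice they are estimated from genotypes); ld_stats is a hypothetical name.

```python
def ld_stats(p_AB: float, p_Ab: float, p_aB: float, p_ab: float):
    """Return (D, D', r^2) for two biallelic SNPs from haplotype frequencies."""
    if abs(p_AB + p_Ab + p_aB + p_ab - 1.0) > 1e-9:
        raise ValueError("haplotype frequencies must sum to 1")
    p_A = p_AB + p_Ab  # frequency of allele A at SNP 1
    p_B = p_AB + p_aB  # frequency of allele B at SNP 2
    if p_A in (0.0, 1.0) or p_B in (0.0, 1.0):
        raise ValueError("both SNPs must be polymorphic")
    d = p_AB - p_A * p_B  # equivalently p_AB * p_ab - p_Ab * p_aB
    # D' scales |D| by its largest possible value given the allele frequencies.
    if d >= 0:
        d_max = min(p_A * (1 - p_B), (1 - p_A) * p_B)
    else:
        d_max = min(p_A * p_B, (1 - p_A) * (1 - p_B))
    d_prime = abs(d) / d_max
    r2 = d * d / (p_A * (1 - p_A) * p_B * (1 - p_B))
    return d, d_prime, r2


print(ld_stats(0.5, 0.0, 0.0, 0.5))      # (0.25, 1.0, 1.0): only AB and ab
print(ld_stats(0.25, 0.25, 0.25, 0.25))  # (0.0, 0.0, 0.0): no LD
print(ld_stats(0.4, 0.1, 0.0, 0.5))      # D' = 1 but r^2 is only about 0.67
```

The last case shows why r² rather than D' is used for indirect association: D' = 1 whenever one haplotype is missing, while r² also requires the two SNPs to have matching allele frequencies.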

  30. Measuring Coverage [Figure: each SNP in the Complete Set is matched to its maximum r² with any SNP in the Genotyping Set, e.g. 0.8, 0.6, 0.2]
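Coverage as described here can be computed from a table of pairwise r² values. This is a minimal sketch, assuming r² values are stored in a dict keyed by unordered SNP pairs and that a genotyped SNP covers itself; the SNP names and r² values (reusing 0.8, 0.6, 0.2 from the figure) are hypothetical.

```python
def max_r2(snp, genotyped, r2):
    """Best r^2 between `snp` and any genotyped SNP (1.0 if it is genotyped)."""
    if snp in genotyped:
        return 1.0
    return max((r2.get(frozenset((snp, g)), 0.0) for g in genotyped), default=0.0)


def coverage(all_snps, genotyped, r2, threshold=0.8):
    """Fraction of SNPs in the complete set with max r^2 >= threshold."""
    all_snps = list(all_snps)
    covered = sum(1 for s in all_snps if max_r2(s, genotyped, r2) >= threshold)
    return covered / len(all_snps)


# Hypothetical complete set of 4 SNPs, of which only s1 is genotyped.
snps = ["s1", "s2", "s3", "s4"]
r2_table = {
    frozenset(("s1", "s2")): 0.8,
    frozenset(("s1", "s3")): 0.6,
    frozenset(("s1", "s4")): 0.2,
}
print(coverage(snps, {"s1"}, r2_table, threshold=0.8))  # 0.5
print(coverage(snps, {"s1"}, r2_table, threshold=0.5))  # 0.75
```

The reported coverage depends on the r² cutoff as much as on the genotyping set, so the two should always be quoted together.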

  31. Indirect Association and LD • Sample size required for Direct Association, n • Sample size for Indirect Association = n / r² • For r² = 0.8, increase is 25% • For r² = 0.5, increase is 100%
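The n / r² rule reflects that the test's non-centrality at a marker is roughly r² times that at the causal SNP, so matching power needs 1 / r² times as many subjects. A minimal sketch of the arithmetic; the direct-association sample size of 1,000 is a hypothetical value and indirect_sample_size is a hypothetical name.

```python
def indirect_sample_size(n_direct: float, r2: float) -> float:
    """Sample size needed when testing a marker in LD (r^2) with the causal SNP."""
    if not 0 < r2 <= 1:
        raise ValueError("r2 must be in (0, 1]")
    return n_direct / r2


for r2 in (1.0, 0.8, 0.5):
    n = indirect_sample_size(1000, r2)
    print(f"r2 = {r2}: n = {n:.0f} ({n / 1000 - 1:.0%} increase)")
```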

  32. The HapMap Project • Initial Goal: • 600,000 SNPs for indirect association studies • LD information between SNPs • Phase 1: 1 million SNPs • Phase 2: additional 2.9 million SNPs

  33. HapMap • SNPs from dbSNP were genotyped • Aimed for 1 SNP every 5 kb • SNP Validation • Polymorphic • Frequency • Linkage Disequilibrium Estimation • LD tagging SNPs

  34. HapMap • 270 subjects • 45 Chinese • 45 Japanese • 90 Yoruba and 90 European-American • 30 trios each (2 parents, 1 child)

  35. Number of SNPs needed to capture all SNPs • Depends on: • Population studied • Minor allele frequency of causal SNP • Level of LD (r²) used as a cutoff • For Example: • Caucasians, Asians, Africans • Minor Allele Frequency ≥ 5% • r² ≥ 0.8
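How the r² cutoff drives the number of SNPs needed can be illustrated with a simple greedy pairwise tagger. This is a simplified sketch, not a reimplementation of any particular published tool: it repeatedly picks the SNP that captures the most not-yet-captured SNPs at r² ≥ threshold. The function name, SNP names, and r² values are hypothetical.

```python
def greedy_tags(snps, r2, threshold=0.8):
    """Greedy pairwise tag-SNP selection (simplified sketch).

    A SNP is captured by a tag if it is the tag itself or has
    r^2 >= threshold with it. Ties go to the first SNP in sorted order.
    """
    snps = sorted(snps)

    def captured_by(tag):
        return {s for s in snps
                if s == tag or r2.get(frozenset((s, tag)), 0.0) >= threshold}

    untagged = set(snps)
    tags = []
    while untagged:
        # Choose the SNP that captures the most still-untagged SNPs.
        best = max(snps, key=lambda t: len(captured_by(t) & untagged))
        tags.append(best)
        untagged -= captured_by(best)
    return tags


# Hypothetical pairwise r^2 values for five SNPs (unlisted pairs have r^2 = 0).
r2_table = {
    frozenset(("s1", "s2")): 0.90,
    frozenset(("s2", "s3")): 0.85,
    frozenset(("s1", "s3")): 0.70,
    frozenset(("s4", "s5")): 0.95,
}
snps = ["s1", "s2", "s3", "s4", "s5"]
print(greedy_tags(snps, r2_table, threshold=0.8))  # ['s2', 's4']
print(greedy_tags(snps, r2_table, threshold=0.9))  # ['s1', 's4', 's3']
```

Raising the cutoff from 0.8 to 0.9 turns two tags into three in this toy example, the same trade-off behind the platform choices of r² ≥ 0.8 versus ≥ 0.7.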

  36. The Number of SNPs in the Human Genome (extrapolating from the HapMap). Note: this is an underestimate.

  37. Genotyping Platforms • Affymetrix 500K • Pseudo-random SNPs • Illumina 550K, 650K • HapMap-based LD-tagging SNPs • r² ≥ 0.8 for some SNPs, ≥ 0.7 for others • ParAllele 20K • Nonsynonymous SNPs

  38. Multistage Study Designs

  39. One- and Two-Stage GWA Designs [Figure: in the one-stage design, all N samples are genotyped for all M SNPs; in the two-stage design, the Stage 1 samples are genotyped for all M SNPs and the Stage 2 samples only for a subset of markers]

  40. One-Stage Design vs. Two-Stage Design [Figure: a two-stage design analysed either by joint analysis, combining Stage 1 and Stage 2 samples, or by replication-based analysis, testing the Stage 2 samples on their own]

  41. Multistage Designs • Joint analysis has more power than replication • p-value threshold in Stage 1 must be liberal • Lower cost, but no gain in power over a one-stage design • http://www.sph.umich.edu/csg/abecasis/CaTS/index.html
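The joint-versus-replication comparison can be explored with a small Monte Carlo simulation for one causal SNP. This is a simplified sketch under stated assumptions, not the method implemented by the CaTS calculator linked above: mu (the expected z statistic if all samples were genotyped), the 50% Stage 1 fraction, the Stage 1 threshold of 0.01, and 300,000 markers are hypothetical values; the joint statistic is the sample-size-weighted combination √π·z₁ + √(1 − π)·z₂; and both analyses use plain Bonferroni thresholds rather than exactly adjusted ones.

```python
import random
from statistics import NormalDist

ND = NormalDist()


def one_stage_power(mu: float, n_markers: int, alpha: float = 0.05) -> float:
    """Power of a two-sided test at the genome-wide Bonferroni level."""
    crit = ND.inv_cdf(1 - alpha / n_markers / 2)
    return (1 - ND.cdf(crit - mu)) + ND.cdf(-crit - mu)


def two_stage_power(mu, pi_samples, alpha1, n_markers, alpha=0.05,
                    n_sims=20_000, seed=1):
    """Monte Carlo (replication_power, joint_power) for one causal SNP.

    pi_samples: fraction of samples genotyped in Stage 1.
    alpha1: Stage 1 p-value threshold for following a SNP up in Stage 2.
    """
    crit1 = ND.inv_cdf(1 - alpha1 / 2)
    n_followed = max(1, round(n_markers * alpha1))  # approx. SNPs taken forward
    crit_rep = ND.inv_cdf(1 - alpha / n_followed / 2)  # Stage 2 tested alone
    crit_joint = ND.inv_cdf(1 - alpha / n_markers / 2)  # genome-wide level
    w1, w2 = pi_samples ** 0.5, (1 - pi_samples) ** 0.5
    rng = random.Random(seed)
    rep = joint = 0
    for _ in range(n_sims):
        z1 = rng.gauss(mu * w1, 1.0)  # Stage 1 statistic
        if abs(z1) < crit1:
            continue  # SNP not followed up
        z2 = rng.gauss(mu * w2, 1.0)  # Stage 2 statistic
        rep += abs(z2) >= crit_rep
        joint += abs(w1 * z1 + w2 * z2) >= crit_joint
    return rep / n_sims, joint / n_sims


def relative_genotyping_cost(pi_samples: float, alpha1: float) -> float:
    """Genotypes needed relative to a one-stage design (approximate)."""
    return pi_samples + (1 - pi_samples) * alpha1


rep, joint = two_stage_power(mu=6.0, pi_samples=0.5, alpha1=0.01,
                             n_markers=300_000)
print(f"one-stage: {one_stage_power(6.0, 300_000):.2f}, "
      f"replication: {rep:.2f}, joint: {joint:.2f}, "
      f"cost: {relative_genotyping_cost(0.5, 0.01):.3f}")
```

Because the joint statistic has the same distribution as the one-stage statistic but is only computed for SNPs passing Stage 1, its power can approach but not exceed one-stage power, while the genotyping cost drops to about half.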

  42. GWA studies have been published • Myocardial Infarction: gene-based SNPs • Age-related Macular Degeneration: Affymetrix 500K • Parkinson’s Disease: Perlegen 198K chip, 1,793 SNPs in second stage

  43. GWA studies have been published

  44. Myocardial Infarction

  45. Functional SNP approach

  46. Myocardial Infarction Results

  47. Macular Degeneration

  48. Macular Degeneration Results

  49. Macular Degeneration

  50. Macular Degeneration • Small sample: 96 cases, 50 controls • Sparse SNP set • Under a previous linkage peak • Missed other loci
