Hunting Disease Genes in the Wilds of the Genome -- II. Richard A. Spritz, M.D. April 8, 2010 richard.spritz@ucdenver.edu 303-724-3107. HMGP. Why Find Disease Genes?. The Future? Personalized Medicine.
The Future? Personalized Medicine • Optimized individualized treatments based on genetic diagnosis of disease susceptibilities • Preventative treatments tailored to one’s specific disease risks (“personalized medicine”)
How Do You Find Disease Genes? I. Hypothesis-driven approaches • Candidate gene association • Candidate gene sequencing • Most hypotheses wrong! II. Hypothesis-free approaches • Genomewide association • (Genomewide expression) • Genomewide sequencing: exome, full genome
Common, Complex Diseases • Asthma • Autism • Obesity • Preterm birth • Cleft lip/palate • IBD • Diabetes • Cancers • Common traits like height
Common, Complex Diseases: Utility of Experimental Approaches [Figure: risk allele frequency (common to rare) plotted against effect size (odds ratio, small to large), with regions labeled GWAS, re-sequencing, and linkage]
Hypothesis-Driven Approaches: Candidate genes • Depends on a biological hypothesis (biological candidate) or a positional hypothesis / information (positional candidate) • Sometimes successful in Mendelian disorders • Low yield in polygenic, multifactorial (“complex”) disorders: pathogenic sequence variants not obvious, often present in normal individuals • Most hypotheses wrong!
Candidate Gene Association Study • Concept: Causal disease variation in a gene suggested by known biology is ‘tagged’ by nearby polymorphic DNA markers; test for co-occurrence. • Because: DNA sequence variations very close together on the same piece of DNA tend not to be separated by recombination over long periods, and so are non-randomly co-inherited (“linkage disequilibrium”). • Therefore: Genotype known variants in a candidate gene as surrogates for unknown disease-causing variants; can’t discover ‘new’ genes; most hypotheses wrong!
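The ‘tagging’ idea can be made concrete with the two usual linkage disequilibrium measures, D and r². The sketch below is illustrative only: the function name `ld_stats` and all frequencies are hypothetical and do not come from the slides.

```python
def ld_stats(p_ab, p_a, p_b):
    """Return (D, r_squared) for two biallelic markers.

    p_ab: frequency of haplotypes carrying allele A at marker 1 and allele B at marker 2
    p_a, p_b: overall frequencies of allele A and allele B (each strictly between 0 and 1)
    """
    d = p_ab - p_a * p_b  # deviation from independent co-inheritance
    r_squared = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, r_squared


# Hypothetical frequencies, chosen only for illustration.
d_tagged, r2_tagged = ld_stats(p_ab=0.30, p_a=0.30, p_b=0.30)  # A always travels with B
d_indep, r2_indep = ld_stats(p_ab=0.09, p_a=0.30, p_b=0.30)    # A and B are independent

print(f"tagged:      D={d_tagged:.3f}, r^2={r2_tagged:.3f}")
print(f"independent: D={d_indep:.3f}, r^2={r2_indep:.3f}")
```

An r² near 1 means the genotyped marker is a good surrogate for the ungenotyped variant; an r² near 0 means it carries essentially no information about it.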
Candidate Gene Association Studies • Typically compares SNP allele (or genotype) frequencies in cases versus controls (“case-control” study design) • Easy statistics (Fisher exact test, Chi-square) • Must Bonferroni correct for multiple-testing • Must ethnically match cases and controls • Easy, cheap • Most powerful for common risk alleles • Can detect common alleles with small allele-specific effects (i.e. “complex”, polygenic traits) • Most common published type of “genetic study” • Most hypotheses wrong!
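As a rough illustration of the “easy statistics” above, the sketch below runs a chi-square test on a 2x2 table of allele counts and applies a Bonferroni correction. The counts and the number of SNPs tested are invented for the example; the P value uses the exact relation for a chi-square with 1 degree of freedom, so only the standard library is needed.

```python
import math


def allelic_chi_square(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """Pearson chi-square (1 df, no continuity correction) on a 2x2 allele-count table."""
    n = case_alt + case_ref + ctrl_alt + ctrl_ref
    row_case = case_alt + case_ref
    row_ctrl = ctrl_alt + ctrl_ref
    col_alt = case_alt + ctrl_alt
    col_ref = case_ref + ctrl_ref
    cross = case_alt * ctrl_ref - case_ref * ctrl_alt
    chi2 = n * cross ** 2 / (row_case * row_ctrl * col_alt * col_ref)
    # Upper-tail probability of a chi-square with 1 df is erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical study: 1000 cases and 1000 controls (2000 alleles each).
chi2, p_value = allelic_chi_square(case_alt=700, case_ref=1300, ctrl_alt=600, ctrl_ref=1400)

n_snps_tested = 10                   # assumed number of SNPs genotyped in the candidate gene
threshold = 0.05 / n_snps_tested     # Bonferroni-corrected significance threshold
print(f"chi2={chi2:.2f}, P={p_value:.2e}, significant={p_value < threshold}")
```

The difficulty flagged on the slides is the choice of `n_snps_tested`: in a gene-by-gene literature the true number of tests performed is rarely known.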
Two Fatal Flaws in Gene-by-Gene Case-Control Design • Must apply multiple-testing correction; true denominator often not known • Must ethnically match cases & controls; otherwise, differences in allele frequencies may reflect different genetic backgrounds of cases vs. controls, not disease association • Matching is difficult or impossible: even in a “homogeneous” population, occult admixture (“stratification”) can lead to false positives • Even true associations vary between populations • ~96% of published positive case-control associations are false positives due to population stratification and publication bias
“Population stratification” and false-positive case-control genetic association studies [Figure: Population 1 and Population 2 (blue/green just indicates overall genetic background), disease, and an admixed study population 1/2; in “Prof. Wizard’s Case-Control Study”, cases and controls are drawn from the admixed population: “Eureka!”]
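A small numerical sketch of how stratification can manufacture an association. All counts are invented: within each hypothetical population the allele has the same frequency in cases and controls, but cases are drawn mostly from the population in which the allele is more common.

```python
def odds_ratio(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """Allelic odds ratio from a 2x2 table of allele counts."""
    return (case_alt * ctrl_ref) / (case_ref * ctrl_alt)


# Hypothetical allele counts: (case_alt, case_ref, ctrl_alt, ctrl_ref).
# Population 1: allele frequency 0.6 in both cases and controls.
# Population 2: allele frequency 0.2 in both cases and controls.
strata = {
    "population_1": (480, 320, 120, 80),
    "population_2": (40, 160, 160, 640),
}

# No association inside either population.
per_stratum = {name: odds_ratio(*counts) for name, counts in strata.items()}

# Pooling the two populations, as an unmatched study would, creates one.
pooled_counts = tuple(sum(column) for column in zip(*strata.values()))
pooled = odds_ratio(*pooled_counts)

print(per_stratum)
print(f"pooled counts={pooled_counts}, pooled OR={pooled:.2f}")
```

The pooled odds ratio is well above 1 even though the allele has no effect in either population, which is the false positive the figure caricatures.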
“Family-based” association studies: • Compare allele transmission from parents to patients • Much less prone to false-positives • Require nuclear families; difficult for adult disease (parents often not available/living)
“Family-Based” Association Studies • Avoids stratification; each family is its own control • “Transmission disequilibrium test” (TDT) compares how often heterozygous parents transmit a marker allele to affected offspring in “trios” against the theoretical 50%
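The TDT reduces to a McNemar-type statistic on transmissions from heterozygous parents. The sketch below assumes the standard form, (b - c)² / (b + c) with 1 degree of freedom; the transmission counts are hypothetical.

```python
import math


def tdt(transmitted, untransmitted):
    """Transmission disequilibrium test.

    transmitted:   times heterozygous parents passed the marker allele to an affected child
    untransmitted: times heterozygous parents passed the other allele instead
    Returns (chi-square with 1 df, P value). Under no association, each count
    is expected to be half of the total.
    """
    b, c = transmitted, untransmitted
    chi2 = (b - c) ** 2 / (b + c)
    p_value = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square, 1 df
    return chi2, p_value


# Hypothetical trio data: 100 informative (heterozygous) parental transmissions.
chi2, p_value = tdt(transmitted=60, untransmitted=40)
print(f"TDT chi2={chi2:.2f}, P={p_value:.4f}")
```

Because transmitted and untransmitted alleles come from the same parents, genetic background is matched by construction.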
Hypothesis-Free Approaches Genome-Wide Association Studies (GWAS) • Relatively recent approach (>300 published): • Genotype hundreds of thousands to millions of SNPs across genome using microarrays; extremely expensive • Case-control or family-based (trio) design • Requires no hypotheses about pathogenesis; can discover new genes • Can discover common alleles with small effects • Can provide very fine localization
Hypothesis-free approaches: Genome-wide association studies (GWAS) • Can apply appropriate multiple-testing correction: “genomewide significance” P < 5 × 10⁻⁸ • Still requires ethnic matching of cases and controls, but can correct for population stratification (“principal components” analysis; genomic inflation factor, “genomic control”) • Can discover new, unknown genes; power similar to candidate gene case-control study • Case-control “associations” require independent confirmation
The Genomewide Association Study (GWAS) Manolio TA. N Engl J Med 2010;363:166-176.
Meta-Analysis of Genomewide Association Studies Manolio TA. N Engl J Med 2010;363:166-176.
Genomewide Dataset: “Quantile-Quantile (QQ) Plot” [Figure: two QQ plots, one with genomic inflation factor 1.11 and one with genomic inflation factor 1.00] • Correct test statistics by the “genomic control” method
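A minimal sketch of the genomic control correction, assuming the usual definition of the genomic inflation factor as the median observed 1-df chi-square statistic divided by its expected median under the null (about 0.455). The five statistics are made up and far too few for a real study; flooring the factor at 1 is a common convention, labeled here as an assumption.

```python
import statistics

CHI2_1DF_MEDIAN = 0.4549364  # median of a chi-square distribution with 1 df


def genomic_inflation_factor(chi2_stats):
    """Lambda: median observed statistic over the median expected under the null."""
    return statistics.median(chi2_stats) / CHI2_1DF_MEDIAN


def genomic_control(chi2_stats):
    """Divide every statistic by lambda (never inflating: lambda is floored at 1)."""
    lam = max(genomic_inflation_factor(chi2_stats), 1.0)
    return [x / lam for x in chi2_stats]


# Hypothetical per-SNP chi-square statistics.
observed = [0.1, 0.3, 0.5049794, 0.9, 2.0]
lam_before = genomic_inflation_factor(observed)
corrected = genomic_control(observed)
lam_after = genomic_inflation_factor(corrected)
print(f"lambda before={lam_before:.2f}, after={lam_after:.2f}")
```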
Genome-Wide Association Studies: “Manhattan plot” [Figure: per-SNP -log(P values) across the genome for association of SNP allele frequency differences between patients with generalized vitiligo and controls (all Caucasian)]
Genome-Wide Association Studies • Very large number of SNPs tested (500,000 – 2,000,000) presents a huge multiple-testing problem; requires at least ~1000 cases and ~1000 controls • Many SNPs are in linkage disequilibrium (i.e. correlated); simple Bonferroni correction is too strict (assumes independence) • Can minimize the number of SNPs genotyped by genotyping “tagSNPs” (SNPs that ‘tag’ specific haplotype blocks from HapMap) • “Significant” associations require confirmation by an independent follow-up association study of specific SNPs to reduce multiple-testing complexity
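To show the scale of the multiple-testing problem, the sketch below computes simple Bonferroni thresholds for the SNP counts quoted above, assuming a family-wise alpha of 0.05 and treating every SNP as an independent test (which, as noted, is too strict when SNPs are correlated).

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold that keeps the family-wise error rate at alpha."""
    return alpha / n_tests


ALPHA = 0.05  # assumed family-wise error rate
thresholds = {n: bonferroni_threshold(ALPHA, n) for n in (500_000, 1_000_000, 2_000_000)}

for n_snps, threshold in thresholds.items():
    print(f"{n_snps:>9,} SNPs -> per-SNP threshold {threshold:.1e}")
```

One million independent tests at alpha 0.05 gives the 5 × 10⁻⁸ figure usually quoted as genomewide significance.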
Personalized Medicine: The case of the ‘missing heritability’ • Disease risk genes found by GWAS account for only a small fraction of genetic risk (Type 1 diabetes: ~50 genes, ~6.5% of genetic risk) • Are there a virtually unlimited number of additional genes, each conferring small additional risk? Maybe, but probably not • Have we under-estimated the fraction of genetic risk already accounted for? Maybe. GWAS misses rare risk alleles • Have we over-estimated the total genetic component of risk? Maybe, but not ten-fold
Hypotheses of Common, “Complex” Disease • Common disease, common variant hypothesis (Reich & Lander, 2001) • versus • Rare variant hypothesis (Pritchard, 2001; Pritchard and Cox, 2002)
Complex Diseases: Utility of Experimental Approaches [Figure: risk allele frequency (common to rare) plotted against effect size (odds ratio, small to large), with regions labeled GWAS, re-sequencing, and linkage]
Personalized Medicine: The case of the ‘missing heritability’ • Disease risk genes found by GWAS account for only a small fraction of genetic risk (Type 1 diabetes: ~50 genes, ~6.5% of genetic risk) • Implies that detailed prediction via personalized medicine may not be realistic • Are there a virtually unlimited number of additional genes, each conferring small additional risk? Maybe, but probably not • Have we under-estimated the fraction of genetic risk already accounted for? Maybe. GWAS misses rare risk alleles • Have we over-estimated the total genetic component of risk? Maybe, but not ten-fold • What does that mean for personalized medicine? Will it work? Maybe. Odds ratio vs. population attributable risk
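The contrast between odds ratio and population attributable risk can be sketched with Levin's formula for the population attributable fraction. Everything numeric here is hypothetical, and the relative risk is used in place of the odds ratio, an approximation that is reasonable only when the disease is rare.

```python
def population_attributable_fraction(exposed_freq, relative_risk):
    """Levin's formula: fraction of cases attributable to the risk factor.

    exposed_freq:  fraction of the population carrying the risk allele
    relative_risk: risk in carriers divided by risk in non-carriers
    """
    excess = exposed_freq * (relative_risk - 1.0)
    return excess / (1.0 + excess)


# Hypothetical alleles, chosen only to contrast the two situations.
common_small = population_attributable_fraction(exposed_freq=0.40, relative_risk=1.2)
rare_large = population_attributable_fraction(exposed_freq=0.001, relative_risk=10.0)

print(f"common allele, small effect: PAF={common_small:.3f}")
print(f"rare allele, large effect:   PAF={rare_large:.4f}")
```

In this toy comparison the common, weak allele accounts for more disease in the population than the rare, strong one, yet it tells an individual carrier very little, which is the tension the slide points at.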
Deep re-sequencing Combined hypothesis-based and hypothesis-free approaches • High-throughput DNA sequencing • Biological candidate genes • GWAS signals (specific genes or genes within regions) • Must distinguish potentially causal variants from non-pathological variation (1000 Genomes Project data will help) • Prioritize for follow-up functional analyses
Exome/Genome sequencing: Hypothesis-free approach • High-throughput DNA sequencing of the genome or the exome (~1% of genome) • Must distinguish potentially causal variants from non-pathological variation (1000 Genomes Project data will help) • Predict based on Mendelian inheritance • Compare across unrelated families • Prioritize for follow-up functional analyses
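“Compare across unrelated families” amounts to asking which genes carry a candidate variant in every family. A minimal sketch, with placeholder family and gene names that are purely hypothetical:

```python
def shared_candidate_genes(genes_by_family):
    """Genes that carry at least one candidate variant in every family."""
    gene_sets = [set(genes) for genes in genes_by_family.values()]
    if not gene_sets:
        return set()
    return set.intersection(*gene_sets)


# Hypothetical per-family lists of genes with rare, potentially causal variants.
genes_by_family = {
    "family_1": ["GENE_A", "GENE_B", "GENE_C"],
    "family_2": ["GENE_B", "GENE_C", "GENE_D"],
    "family_3": ["GENE_C", "GENE_E"],
}

shared = shared_candidate_genes(genes_by_family)
print(shared)
```

A strict intersection assumes a single causal gene shared by all families; with locus heterogeneity one would instead rank genes by the number of families in which they are hit.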
Variant Prioritization in Exome/Genome Sequencing • Missense (non-synonymous) substitutions: most rare (<1%) missense may be deleterious (MAQ, Bowtie, SOAP2) • Nonsense, frameshift mutations • Splice junction mutations • Exonic splice enhancer mutations (SKIPPY) • INDELs, CNVs, translocations (GSNAP) • ENSEMBL Regulatory Feature variants
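One way to picture the prioritization step is a filter on allele frequency followed by a ranking on predicted consequence. The sketch below is not the output of any tool named on the slide: the `Variant` record, the severity ranking, the gene names, and the 1% cutoff as a hard filter are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    gene: str            # hypothetical gene label
    consequence: str     # predicted effect on the transcript
    allele_freq: float   # frequency in a reference population


# Assumed ranking: lower number = examined first.
SEVERITY = {"nonsense": 0, "frameshift": 0, "splice_junction": 1, "missense": 2}


def prioritize(variants, max_freq=0.01):
    """Keep rare variants with a ranked consequence, most severe and rarest first."""
    kept = [v for v in variants if v.consequence in SEVERITY and v.allele_freq < max_freq]
    return sorted(kept, key=lambda v: (SEVERITY[v.consequence], v.allele_freq))


# Hypothetical call set.
calls = [
    Variant("GENE_A", "missense", 0.002),
    Variant("GENE_B", "synonymous", 0.001),   # dropped: consequence not ranked
    Variant("GENE_C", "nonsense", 0.0005),
    Variant("GENE_D", "missense", 0.15),      # dropped: too common
    Variant("GENE_E", "splice_junction", 0.004),
]

ranked = prioritize(calls)
print([v.gene for v in ranked])
```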