Prediction Models Using Genomic Profiling

Prediction Models Using Genomic Profiling H. Zhang E. Warner D. Zhao

GENOMIC PROFILING
• Testing genes at multiple loci simultaneously
• Complex diseases have complex causal pathways
• High number of weak predictors
• Single genes are of limited predictive value

QUESTIONS OF INTEREST
• Is testing low-risk genes at multiple loci clinically useful, i.e. does it have adequate discriminative accuracy?
• Can we predict individual genetic risk from GWAS?
• What is the value of genomic profiling for assessing susceptibility to common diseases and targeting interventions?

Predictive testing for complex diseases using multiple genes: Fact or fiction?
A. Cecile J. W. Janssens, Yurii S. Aulchenko, Stefano Elefante, Gerard J. J. M. Borsboom, Ewout W. Steyerberg, Cornelia M. van Duijn
Genet Med 2006;8(7):395-400

METHODS
• Simulation test: 100,000 subjects, up to 400 genes
• Each gene: two alleles
• Hardy-Weinberg equilibrium
• Disease risk associated with genetic profiles: Bayes’ theorem
• Multiplicative model
• No LD between genes
• All genes predictive of the disease are known

METHODS (CONTINUED)
• Part I: genes had the same risk allele frequencies and the same effect on the disease (same ORs)
• Part II: genes with varying ORs and allele frequencies
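The Part I setup (constant allele frequency and OR) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' code: the function name `simulate_profiles`, the per-allele odds ratio, and the use of `baseline_risk` as the risk for a subject carrying zero risk alleles are all choices made here for illustration. The paper's exact calibration via Bayes’ theorem may differ.

```python
import numpy as np

def simulate_profiles(n_subjects, n_genes, risk_allele_freq, odds_ratio,
                      baseline_risk, seed=0):
    """Simulate genetic profiles and disease status (hypothetical sketch).

    Assumptions: biallelic genes in Hardy-Weinberg equilibrium, no LD,
    and a multiplicative model in which each risk allele multiplies the
    odds of disease by `odds_ratio`.
    """
    rng = np.random.default_rng(seed)

    # Under HWE the risk-allele count at each gene is Binomial(2, p).
    # Drawing genes independently corresponds to "no LD between genes".
    allele_counts = rng.binomial(2, risk_allele_freq,
                                 size=(n_subjects, n_genes))

    # Multiplicative model on the odds scale.
    baseline_odds = baseline_risk / (1.0 - baseline_risk)
    odds = baseline_odds * odds_ratio ** allele_counts.sum(axis=1)

    # Convert odds back to an individual disease risk.
    risk = odds / (1.0 + odds)

    # Draw disease status from each subject's own risk.
    diseased = rng.random(n_subjects) < risk
    return risk, diseased

# Example: 10,000 subjects, 40 genes, illustrative parameter values.
risk, diseased = simulate_profiles(10_000, 40, 0.3, 1.1, 0.02)
```

The returned risks and outcomes are the inputs needed to estimate discriminative accuracy. Part II would replace the scalar frequency and OR with per-gene arrays.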

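DISCRIMINATIVE ACCURACY
• AUC (area under the ROC curve)
• Probability that the test correctly identifies the diseased subject in a randomly chosen pair of one diseased and one non-diseased subject
• AUC of 0.5 means no discrimination; AUC of 1.0 means perfect discrimination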
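The pairwise interpretation of the AUC can be computed directly by comparing every diseased subject with every non-diseased subject. The sketch below is illustrative; the function name `auc_concordance` is hypothetical, and ties are counted as half a correct ranking, which is the usual convention.

```python
import numpy as np

def auc_concordance(risk, diseased):
    """AUC as the fraction of case-control pairs ranked correctly."""
    risk = np.asarray(risk, dtype=float)
    diseased = np.asarray(diseased, dtype=bool)

    cases = risk[diseased]
    controls = risk[~diseased]

    # Difference in predicted risk for every (case, control) pair.
    diff = cases[:, None] - controls[None, :]

    # A pair is concordant when the case has the higher predicted risk;
    # tied pairs contribute one half.
    concordant = np.sum(diff > 0) + 0.5 * np.sum(diff == 0)
    return concordant / diff.size

# Two cases (0.35, 0.8) and two controls (0.1, 0.4):
# three of the four pairs are ranked correctly.
auc = auc_concordance([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1])
print(auc)  # 0.75
```

The pairwise matrix uses memory proportional to cases times controls, so for very large samples a rank-based formula would be preferable.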

Examples of AUC

FIG. 1. DISCRIMINATIVE ACCURACY OF GENETIC PROFILING (CONSTANT ORs)

FIG. 2. DISCRIMINATIVE ACCURACY OF GENETIC PROFILING (VARYING ORs)

FIG. 3. RELATIONSHIP BETWEEN HERITABILITY AND DISCRIMINATIVE ACCURACY

CONCLUSIONS
• Discriminative accuracy depends on
  • Number of genes
  • Frequency of risk alleles
  • Risk associated with the genotypes
  • Heritability (few strong predictors or large number of common susceptibility genes)
• Level of discriminative accuracy required for clinical application depends on the goal of testing, burden of disease, cost, treatment availability, etc.

Prediction of individual genetic risk to disease from genome-wide association studies
Naomi R. Wray, Michael E. Goddard, Peter M. Visscher
Genome Res 2007;17:1520-1528

Purpose
• Research question
  • Can we identify high-risk genetic profiles consisting of multiple risk alleles, each with a small effect at its locus?
• Aims
  • Investigate the relationship between the RR of genetic loci and the number of loci that contribute to disease risk
  • Investigate the number of loci underlying a complex disease of a given prevalence and heritability
  • Simulate a case-control study to investigate the prediction of genetic risk of disease from multiple loci in a genome-wide association study (GWAS)
  • Use SNPs selected from the simulation to see how accurately they predict risk in a random sample

Methods
• Repeated simulations
• Parameters
  • Disease prevalence: 0.05 or 0.10
  • Heritability: 0.1 or 0.2
  • Allele frequency distribution: uniform (common disease-common variant) or U-shaped (neutral allele hypothesis)
• GWAS
  • 500,000 SNPs
  • Number of disease risk loci: 10, 20, 50, 100, 300, 1000
  • 1,000 or 10,000 cases and controls
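The two allele frequency distributions can be illustrated with a short sampler. This is a sketch under assumptions: the slides do not give the exact form of the U-shaped distribution, so a density proportional to 1/(p(1-p)) is assumed here, and the function name `sample_allele_freqs` is hypothetical. The bounds 0.01 and 0.99 match the frequency range stated among the study's assumptions.

```python
import numpy as np

def sample_allele_freqs(n_loci, shape="uniform", lo=0.01, hi=0.99, seed=0):
    """Draw risk-allele frequencies for `n_loci` loci (illustrative)."""
    rng = np.random.default_rng(seed)

    if shape == "uniform":
        # Common disease-common variant: all frequencies equally likely.
        return rng.uniform(lo, hi, size=n_loci)

    if shape == "u-shaped":
        # Assumed density proportional to 1 / (p * (1 - p)).
        # Its CDF is linear in logit(p), so sampling logit(p) uniformly
        # and transforming back gives the desired distribution.
        def logit(p):
            return np.log(p / (1.0 - p))

        z = rng.uniform(logit(lo), logit(hi), size=n_loci)
        return 1.0 / (1.0 + np.exp(-z))

    raise ValueError("shape must be 'uniform' or 'u-shaped'")

uniform_freqs = sample_allele_freqs(10_000, "uniform")
ushaped_freqs = sample_allele_freqs(10_000, "u-shaped")
```

Under the U-shaped assumption most loci have a rare or a very common risk allele, which is why the two settings can give different relationships between the number of loci and their relative risks.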

Results: # of Loci and Average Relative Risk

Results: Risk Allele Models

Results: Selected SNPs

Results: Accuracy

Assumptions and Limitations
• True causal SNPs were always included in GWAS
• All genetic variance was attributable to variants of frequency 0.01 to 0.99
• No population stratification
• All genotypes are in Hardy-Weinberg equilibrium
• No LD between SNPs
• Did not consider gene-gene or gene-environment interactions

Conclusions
• Prediction of genetic risk is possible, even if there are hundreds of risk variants, each of small effect
• Genomic profiling may not be appropriate for rare diseases
• Implementation of these procedures doesn’t require knowledge of the causal mechanism

An epidemiologic assessment of genomic profiling for measuring susceptibility to common diseases and targeting interventions
Muin J. Khoury, Quanhe Yang, Marta Gwinn, Julian Little, W. Dana Flanders
Genet Med 2004;6(1):38-47

Purpose
• Standard epidemiological methods already provide information about important exposure-disease associations that can be used to reduce disease burden
• What value does genomic profiling (genetic testing to predict susceptibility) add to the usual epidemiological methods?

Two aspects of “value”
• Clinical value: individual level
  • Clinical validity: can genetic testing help predict who will and who will not develop disease (future disease positives and negatives)?
  • Clinical utility: can genetic testing help lower disease risk for people with a “positive” genetic test?
  • See the Task Force on Genetic Testing and the Secretary's Advisory Committee on Genetic Testing (references in paper)
• Public health value: population level
  • Public health utility: how does the reduction of disease burden in the population based on genetic profiling compare with that of population-wide interventions?

General Methods
• Model: Risk = Baseline + Gene1 + Gene2 + Gene3 + Modifiable exposure
• Posited hypothetical but “likely” data by varying the following parameters:
  • Lifetime risk of a disease
  • Number of loci in genetic test
  • Frequency of genotypes
  • Strength of association between these loci and the disease
  • Strength of association between exposure and the disease
• Assessed value for these hypothetical data by calculating the impact of a targeted intervention on the exposure

CLINICAL VALIDITY AND UTILITY
• Technical validity is assumed

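Public health utility
• Calculate the ratio of PAFt (the reduction of disease burden due to a targeted intervention) to PAF (the population attributable fraction, i.e. the reduction achievable by a population-wide intervention)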
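A toy calculation shows how the PAFt-to-PAF ratio behaves. Everything in this sketch is an assumption made for illustration and is not taken from the paper: a single binary genetic test, a binary modifiable exposure independent of genotype, risk that is additive on the absolute scale with an optional synergy term, and an intervention that fully removes the exposure. The function name `paf_ratio` and all parameter values are hypothetical.

```python
def paf_ratio(p_genotype, p_exposure, baseline, gene_effect,
              exposure_effect, interaction=0.0):
    """Return (PAF, PAFt, PAFt / PAF) under a toy additive-risk model.

    Assumed model (not the paper's exact formulation):
        risk = baseline + gene_effect * G + exposure_effect * E
               + interaction * G * E
    with G and E independent binary indicators.
    """
    # Overall disease risk in the population.
    total_risk = (baseline
                  + gene_effect * p_genotype
                  + exposure_effect * p_exposure
                  + interaction * p_genotype * p_exposure)

    # Risk removed if the exposure is eliminated in everyone.
    removed_all = p_exposure * (exposure_effect + interaction * p_genotype)

    # Risk removed if the exposure is eliminated only in test-positives.
    removed_targeted = p_genotype * p_exposure * (exposure_effect + interaction)

    paf = removed_all / total_risk
    paf_t = removed_targeted / total_risk
    return paf, paf_t, paf_t / paf

# No synergy: the targeted intervention captures only the share of
# exposed people who test positive.
paf, paf_t, ratio = paf_ratio(0.1, 0.3, 0.05, 0.02, 0.05)

# Strong synergy between genotype and exposure raises the ratio.
_, _, ratio_synergy = paf_ratio(0.1, 0.3, 0.05, 0.02, 0.05, interaction=0.2)
```

In this toy model the ratio equals the genotype frequency when there is no synergy and rises as the gene-exposure interaction grows, which is the sense in which targeted interventions gain value under stronger synergy.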

Implications
• There are other parameters that could be varied:
  • Higher synergy will lead to higher predictive values and population impact
  • Epistasis will lead to higher predictive values
• Tension between targeted and population interventions
• Screening may be a good compromise: population-wide intervention of education and awareness + targeted intervention
• Genetic testing has different added values under different conditions, and epidemiological methods can be used to determine the extent of its added value

Summary
• Genetic tests should involve multiple loci
• Discriminative accuracy improves with higher heritability
• Number of loci needed to accurately identify association
  • Function of heritability, prevalence and RR
• Complicated relationships between accuracy, allele effect sizes and allele frequency

Summary (continued)
• The accuracy of the predictive models in the presence of gene-gene/environment interactions may be overstated
• Genetic testing must be applied to all subjects and can be resource-intensive
