Genetic variation in the genome


International HapMap Project (http://www.hapmap.org)

International HapMap Project (http://www.hapmap.org): biomedical applications
• Genotype data available for different ethnic groups
• Selection of tagSNPs for association studies -> potential for whole-genome association studies
• Assessment of statistical significance and interpretation of results
• Study of the less common alleles
• Study of structural variation
• Pharmacogenomics

Genetic variation databases

Association studies: phenotypic effect of SNPs
Human genetic & phenotypic diversity database

                         Genotype                Phenotype
                         SNP1   SNP2   SNP3      Trait i   Disease 1         ...
Sequence individual 1    G/T    A/A    G/C       x1        Healthy
Sequence individual 2    A/C    C/C    T/T       x2        Cervical cancer
...                      ...    ...    ...

-> Estimation of the phenotypic effect

Biobanks: large-scale cohort studies
• USA
• deCODE (Iceland)
• Estonia
• Germany
• Canada
• Japan
• China

Association Studies

Association Studies
• Study design
• Statistical analyses

1st phase: Design Study designs

2nd phase: Statistical analysis Statistical analysis methods

2nd phase: statistical analyses in association studies
Steps:
• Data validation
• Genetic description
  • Unidimensional (SNP by SNP)
  • Multidimensional
• Test for genotype-phenotype association
  • SNP by SNP
  • Multi-SNP / haplotype / tagSNP
• Power assessment
• Predictive model

Statistical analyses in association studies
Step: data validation (error sources: sampling, genotyping)
• Checking against SNPref
• Hardy-Weinberg proportions (separately for controls and cases)
• Consistency among samples
• Stratification (genetic markers)

Hardy-Weinberg test
• Diallelic SNP: alleles A and a, with relative frequencies p and q
• Genotypic HW proportions: AA, Aa and aa occur in proportions $p^2$, $2pq$ and $q^2$
• Genotype frequencies: SNP rs1137933
• Three test statistics:
  (i) the Pearson ($\chi^2$) test statistic
  (ii) the likelihood ratio test statistic (G test)
  (iii) an exact test

Example of Hardy-Weinberg test: genotypes at SNP rs1137933 (controls)

p = f(C) = f(CC) + f(CT)/2 = (2·76 + 38) / (2·129) ≈ 0.736
q = 1 − p ≈ 0.264

Genotype          CC      CT      TT      Total
Number, obs       76      38      15      129 = N
Frequency, exp    p²      2pq     q²      1.00
Number, exp       p²N     2pqN    q²N     N
Number, exp       70.0    50.1    8.9     129

Pearson ($\chi^2$) test statistic: $X^2 = \sum_i (O_i - E_i)^2 / E_i$
Likelihood ratio (G) test statistic: $G = 2 \sum_i O_i \ln(O_i / E_i)$
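The Hardy-Weinberg calculation can be sketched in a few lines of Python. This is a minimal illustration, not the software used for the slides. It assumes the observed counts are 76 and 15 for the two homozygotes and 38 for the heterozygote, which is the assignment that reproduces the expected counts 70.0, 50.1 and 8.9 shown on the slide. The function name `hardy_weinberg_test` is hypothetical.

```python
import math

def hardy_weinberg_test(n_AA, n_Aa, n_aa):
    """Hardy-Weinberg test for one diallelic SNP (hypothetical helper).

    Returns the allele frequency p, the expected genotype counts,
    the Pearson X^2 statistic, the G statistic and the X^2 p-value.
    """
    n = n_AA + n_Aa + n_aa
    p = (2 * n_AA + n_Aa) / (2 * n)      # frequency of allele A
    q = 1 - p
    observed = [n_AA, n_Aa, n_aa]
    expected = [p * p * n, 2 * p * q * n, q * q * n]
    x2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # G = 2 * sum(O * ln(O / E)); empty classes contribute zero.
    g = 2 * sum(o * math.log(o / e) for o, e in zip(observed, expected) if o > 0)
    # 1 degree of freedom: 3 classes - 1 - 1 estimated allele frequency.
    # For 1 df the chi-square upper tail equals erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(x2 / 2))
    return p, expected, x2, g, p_value

# Assumed assignment of the slide's counts: 76 and 15 homozygotes, 38 heterozygotes.
p, expected, x2, g, p_value = hardy_weinberg_test(76, 38, 15)
print("p =", round(p, 3))
print("expected counts =", [round(e, 1) for e in expected])
print("X2 =", round(x2, 2), "G =", round(g, 2), "p-value =", round(p_value, 4))
```

The exact test listed as option (iii) is not covered by this sketch.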

Example of Hardy-Weinberg Test SNP rs1137933

Genetic description: SNP by SNP (SNP rs1137933)
• Genotype frequencies
• Allele frequencies

Genetic description: multi-SNP
Haplotype inference

Genotypes: a/t g/c -> possible haplotypes:
a) a g and t c
b) a c and t g

Haplotype 1  acgtagcatcgtatgcgttagacgggggggtagcaccagtacag
Haplotype 2  acgtagcatcgtatgcgttagacgggggggtagcaccagtacag
Haplotype 3  acgtagcatcgtatgcgttagacgggggggtagcaccagtacag
Haplotype 4  acgtagcatcgtttgcgttagacgggggggtagcaccagtacag
Haplotype 5  acgtagcatcgtttgcgttagacgggggggtagcaccagtacag
Haplotype 6  acgtagcatcgtttgcgttagacggcatggcaccggcagtacag
Haplotype 7  acgtagcatcgtttgcgttagacggcatggcaccggcagtacag
Haplotype 8  acgtagcatcgtttgcgttagacggcatggcaccggcagtacag
Haplotype 9  acgtagcatcgtttgcgttagacggcatggcaccggcagtacag
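The phase ambiguity for the a/t g/c genotype can be illustrated with a short Python sketch that enumerates every haplotype pair compatible with a set of unphased genotypes. It only lists the possibilities; it does not estimate which phase is more likely, which is what haplotype inference methods do. The function name `possible_haplotype_pairs` is hypothetical.

```python
from itertools import product

def possible_haplotype_pairs(genotypes):
    """Enumerate the distinct haplotype pairs compatible with unphased genotypes.

    genotypes: list of (allele1, allele2) tuples, one tuple per SNP.
    """
    pairs = set()
    # At each SNP, choose which of the two alleles sits on the first chromosome.
    for choice in product((0, 1), repeat=len(genotypes)):
        hap1 = "".join(g[c] for g, c in zip(genotypes, choice))
        hap2 = "".join(g[1 - c] for g, c in zip(genotypes, choice))
        # Sort so that (hap1, hap2) and (hap2, hap1) count as the same pair.
        pairs.add(tuple(sorted((hap1, hap2))))
    return sorted(pairs)

# Double heterozygote from the slide: a/t at the first SNP, g/c at the second.
print(possible_haplotype_pairs([("a", "t"), ("g", "c")]))
```

An individual heterozygous at k SNPs has 2^(k-1) compatible pairs, which is why phase must be inferred statistically when many SNPs are involved.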

Haplotype frequency estimates

Genetic description: multi-SNP
Linkage disequilibrium measure (Lewontin's D')

         B1                  B2                  Total
A1       p11 = p1q1 + D      p12 = p1q2 - D      p1
A2       p21 = p2q1 - D      p22 = p2q2 + D      p2
Total    q1                  q2                  1

$D' = D / D_{max}$
$r = D / \sqrt{p_1 p_2 q_1 q_2}$
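As a rough illustration of these measures, the sketch below computes D, D' and r from the frequency of the A1B1 haplotype and the two allele frequencies. The function name and the input values are hypothetical and are not taken from the slides. The choice of Dmax follows the usual convention: min(p1q2, p2q1) when D is positive and min(p1q1, p2q2) when D is negative.

```python
import math

def linkage_disequilibrium(p11, p1, q1):
    """Return (D, D', r) for two diallelic loci (hypothetical helper).

    p11: frequency of haplotype A1B1
    p1:  frequency of allele A1
    q1:  frequency of allele B1
    Assumes both loci are polymorphic (0 < p1 < 1 and 0 < q1 < 1).
    """
    p2, q2 = 1 - p1, 1 - q1
    d = p11 - p1 * q1
    # D_max depends on the sign of D.
    if d >= 0:
        d_max = min(p1 * q2, p2 * q1)
    else:
        d_max = min(p1 * q1, p2 * q2)
    d_prime = d / d_max if d_max > 0 else 0.0
    r = d / math.sqrt(p1 * p2 * q1 * q2)
    return d, d_prime, r

# Hypothetical frequencies, for illustration only.
d, d_prime, r = linkage_disequilibrium(p11=0.5, p1=0.6, q1=0.7)
print("D =", round(d, 3), "D' =", round(d_prime, 3), "r^2 =", round(r * r, 3))
```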

Linkage disequilibrium representation
• Recombination hotspot
• Linkage blocks
• Associated sites
• TagSNPs

Statistical analyses in association studies
Steps:
• Data validation
• Genetic description
  • Unidimensional (SNP by SNP)
  • Multidimensional
• Test for genotype-phenotype association
  • SNP by SNP
  • Multi-SNP / haplotype / tagSNP
• Power assessment
• Predictive model

Case-control study

SNP           Case            Control         Type
SNP1 (G/C)    40% G, 60% C    40% G, 60% C    Neutral SNP
SNP2 (A/T)    100% A, 0% T    0% A, 100% T    Mendelian SNP
SNP3 (T/G)    80% T, 20% G    60% T, 40% G    QTL SNP
...
SNPn

Genetic-phenotype association -> guilty by association

Test for association (SNP by SNP): chi-square independence test, SNP rs1137933

Genotypic:
• Chi-square (2 df) = 9.71**, p = 0.00779
• G (likelihood ratio) (2 df) = 9.67**, p = 0.00795

Allelic:
• Chi-square (1 df) = 0.07, p = 0.79134
• G (likelihood ratio) (1 df) = 0.07, p = 0.79134
• Odds ratio (OR) = 1.05
• Risk ratio (RR) = 1.02
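The sketch below shows how a Pearson chi-square independence statistic can be computed from a contingency table, and how the p-values reported for rs1137933 follow from the statistics for 1 and 2 degrees of freedom. It uses only the Python standard library. The function names are hypothetical, and the genotype table in the usage lines is invented for illustration because the slide does not give the underlying counts.

```python
import math

def pearson_chi2(table):
    """Pearson chi-square statistic and degrees of freedom for an R x C table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, df

def chi2_p_value(statistic, df):
    """Upper-tail chi-square probability, closed forms for 1 and 2 df only."""
    if df == 1:
        return math.erfc(math.sqrt(statistic / 2))
    if df == 2:
        return math.exp(-statistic / 2)
    raise ValueError("this sketch only handles 1 or 2 degrees of freedom")

# Statistics reported on the slide.
print(round(chi2_p_value(9.71, 2), 5))   # genotypic test, 2 df
print(round(chi2_p_value(0.07, 1), 5))   # allelic test, 1 df

# Hypothetical genotype counts (rows: cases, controls; columns: three genotypes).
hypothetical_table = [[30, 50, 20], [40, 45, 15]]
stat, df = pearson_chi2(hypothetical_table)
print(round(stat, 2), df, round(chi2_p_value(stat, df), 4))
```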

Odds and odds ratio
• The odds (Spanish: oportunidad) of an event is the ratio of probabilities p / (1 − p), where p is the probability of the event.
• Example: if a disease has a 1 in 5 probability of occurring for a given genotype (i.e. 0.2 or 20%), then the odds are 0.2 / (1 − 0.2) = 0.2 / 0.8 = 0.25.
• The odds ratio (Spanish: oportunidad relativa) is defined as the ratio of the odds of an event occurring in one group to the odds of it occurring in another group. These groups might be case and control groups, or any other dichotomous classification. So if the probabilities of the event in the two groups are p (first group) and q (second group), then the odds ratio is:

$OR = \frac{p / (1 - p)}{q / (1 - q)} = \frac{p (1 - q)}{q (1 - p)}$

Odds ratio (Spanish: razón de posibilidades)
• The ratio a/c is the odds of exposure observed in the case group.
• The ratio b/d is the odds of exposure in the control group, so OR = (a/c) / (b/d).
• OR = 2.2 -> 2.2:1. The odds of the effect (disease) are 2.2 times higher when the other variable (SNP allele) is present than when it is absent.

Relative risk (RR, risk ratio)
RR = incidence rate in the exposed / incidence rate in the unexposed

Odds ratio = (210/100) / (250/300) = 2.52
Relative risk = (210/460) / (100/400) = 1.83
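The arithmetic for odds, odds ratio and relative risk can be checked with a small Python sketch. The cell labels are an assumption read off the fractions on the slides: a = 210 exposed cases, b = 250 exposed non-cases, c = 100 unexposed cases, d = 300 unexposed non-cases. The function names are hypothetical.

```python
def odds(p):
    """Odds of an event with probability p."""
    return p / (1 - p)

def odds_ratio(a, b, c, d):
    """Odds of exposure among cases (a/c) over odds among controls (b/d)."""
    return (a / c) / (b / d)

def relative_risk(a, b, c, d):
    """Incidence in the exposed, a/(a+b), over incidence in the unexposed, c/(c+d)."""
    return (a / (a + b)) / (c / (c + d))

print(round(odds(0.2), 2))                          # 0.25
print(round(odds_ratio(210, 250, 100, 300), 2))     # 2.52
print(round(relative_risk(210, 250, 100, 300), 2))  # 1.83
```

The two measures differ here because the outcome is common in this table; the odds ratio approximates the relative risk only when the outcome is rare.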

Controlling for other independent variables
Genotypic test, SNP rs1137933, stratified by sex (♀ ♂):
• Chi-square (2 df) = 7.59*, p = 0.02248; G (likelihood ratio) (2 df) = 7.5*, p = 0.02352
• Chi-square (2 df) = 1.95, p = 0.37719; G (likelihood ratio) (2 df) = 1.98, p = 0.37158

Test for association (multi-SNP)
Test for association between haplotype and response (disease), or between tagSNP and response

Logistic regression
• A statistical regression model for binary dependent variables. It can be regarded as a generalized linear model that uses the logit function as its link function and whose errors follow a binomial distribution.
• For observations i = 1, ..., n, the logarithm of the odds (probability divided by one minus the probability) of the outcome is modeled as a linear function of the explanatory variables X1 to Xk. It can be written as
  $\operatorname{logit}(p_i) = \ln\frac{p_i}{1 - p_i} = \alpha + \beta_1 x_{1,i} + \dots + \beta_k x_{k,i}$
• The parameter estimates β are interpreted through $e^{\beta}$, the multiplicative effect on the odds. For a dichotomous explanatory variable such as sex, $e^{\beta}$ (the antilog of β) is the estimated odds ratio of having the outcome when males and females are compared.
• The parameters α, β1, ..., βk are usually estimated by maximum likelihood.

Logistic regression is a predictive tool. If the logit coefficient β1 = 2.303, then the corresponding odds ratio (the exponential function, $e^{\beta_1}$) is 10. That is, when the independent variable increases by one unit, the odds that the dependent variable equals 1 increase by a factor of 10, with the other variables held constant.
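A short Python sketch can confirm the β1 = 2.303 example: raising the explanatory variable by one unit multiplies the odds by $e^{\beta_1}$, whatever the intercept. The intercept value and the function names below are hypothetical and chosen only for illustration; no model is fitted here.

```python
import math

def logistic_probability(alpha, betas, xs):
    """P(Y = 1) under a logistic model with intercept alpha and coefficients betas."""
    logit = alpha + sum(b * x for b, x in zip(betas, xs))
    return 1 / (1 + math.exp(-logit))

def odds(p):
    """Odds of an event with probability p."""
    return p / (1 - p)

beta1 = 2.303   # coefficient from the slide
alpha = -1.0    # hypothetical intercept, not from the slides

p_at_0 = logistic_probability(alpha, [beta1], [0])
p_at_1 = logistic_probability(alpha, [beta1], [1])

print("exp(beta1) =", round(math.exp(beta1), 2))
print("odds ratio for a one-unit increase =", round(odds(p_at_1) / odds(p_at_0), 2))
```

The probabilities themselves depend on the intercept, but the ratio of the odds does not, which is why the coefficient is reported as an odds ratio.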

Links
• http://bioinfo.iconcologia.net/SNPstats (Web tool for association studies)
• http://www.mep.ki.se/genestat/tl/genass_ldmap (Tutorial for association studies)
• http://linkage.rockefeller.edu/soft (Software for genetic analysis)
• http://www.broad.mit.edu/personal/jcbarret/haploview (Haploview)
• http://www.genome.gov/26525384 (Catalog of published GWA studies)
• http://geneticassociationdb.nih.gov (Database of human disease association studies)

Association studies: web resource http://bioinfo.iconcologia.net/index.php?module=Snpstats

SNP           Patients        Control
SNP1 (G/C)    40% G, 60% C    40% G, 60% C
SNP2 (A/T)    100% A, 0% T    0% A, 100% T
SNP3 (T/G)    80% T, 20% G    60% T, 40% G
...
SNPn

Genetic association -> guilty by association

Today we can carry out association analyses of thousands of SNPs, which can reveal the genetic basis of diseases.

Translation of genetic-phenotypic information into clinical practice (D.R. Bentley, 2004, Nature 429: 440-445)



Bioinformatics and Genomics