Computational Challenges in Whole-Genome Association Studies
Ion Mandoiu
Computer Science and Engineering Department, University of Connecticut
Approaches to Disease Gene Mapping
• Association analysis (cases vs. controls)
  • χ²-test
  • Genome-wide scans made possible by recent progress in SNP genotyping technologies
• Linkage analysis
  • LOD := log10(L(θ)/L(1/2))
  • Very successful for Mendelian diseases (cystic fibrosis, Huntington’s, …)
  • Low power to detect genes with small relative risk in complex diseases [RischMerikangas’96]
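The two statistics named on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the analysis pipeline behind the talk: the function names `allelic_chi_square` and `lod_score` are hypothetical, the χ² test is the basic allelic test on a 2×2 table of allele counts (1 degree of freedom), and the LOD score assumes phase-known meioses so that L(θ) = θ^r (1−θ)^(n−r) for r recombinants out of n.

```python
import math


def allelic_chi_square(case_counts, control_counts):
    """Pearson chi-square test (1 df) on a 2x2 table of allele counts.

    case_counts, control_counts: (minor allele count, major allele count).
    Returns (chi2 statistic, p-value).
    """
    a, b = case_counts
    c, d = control_counts
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("degenerate table: a row or column sums to zero")
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # For 1 degree of freedom, P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


def lod_score(recombinants, meioses, theta):
    """LOD := log10(L(theta) / L(1/2)) for phase-known meioses.

    Assumes L(theta) = theta^r * (1 - theta)^(n - r), where r is the number
    of recombinants among n informative meioses (a simplifying assumption).
    """
    if not 0.0 < theta <= 0.5:
        raise ValueError("theta must lie in (0, 0.5]")
    r, n = recombinants, meioses
    log_l_theta = r * math.log10(theta) + (n - r) * math.log10(1.0 - theta)
    log_l_half = n * math.log10(0.5)
    return log_l_theta - log_l_half


if __name__ == "__main__":
    # Made-up allele counts, for illustration only.
    print(allelic_chi_square((30, 70), (20, 80)))
    print(lod_score(1, 10, 0.1))
```

In a genome-wide scan the χ² test would be repeated for every typed SNP, so the resulting p-values need a multiple-testing correction before they are interpreted.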
Computational Challenges
• Detecting genotyping errors
• Imputation of missing genotypes
• Imputation of untyped genotypes based on a reference population (e.g., HapMap)
• Haplotype inference and haplotype-based association tests
• Modeling gene-gene interactions
• Handling structural variation data provided by new sequencing technologies
• Optimal multi-stage study design
Genotype Error Detection
• A real problem despite advances in technology
• In [KMP07] we proposed efficient methods for error detection in trio data, based on a log-likelihood ratio (LLR) approach combined with an HMM of haplotype diversity
• In ongoing work we seek to improve error detection accuracy by using low-level data such as typing confidence scores
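To make the likelihood-ratio idea concrete, here is a heavily simplified single-SNP sketch. It is not the method of [KMP07]: the HMM of haplotype diversity is replaced by the assumption that parental alleles are drawn independently with minor-allele frequency `p`, and the names `trio_likelihood` and `error_llr` are hypothetical. A genotype is a candidate error when changing it makes the trio much more likely.

```python
import math
from itertools import product


def trio_likelihood(gm, gf, gc, p):
    """P(mother=gm, father=gf, child=gc) at one SNP.

    Genotypes are minor-allele counts (0, 1, 2). Assumes independent parental
    alleles with minor-allele frequency p and Mendelian transmission.
    """
    total = 0.0
    for a1, a2, b1, b2 in product((0, 1), repeat=4):
        if a1 + a2 != gm or b1 + b2 != gf:
            continue
        prob = 1.0
        for allele in (a1, a2, b1, b2):
            prob *= p if allele else 1.0 - p
        # Each parent transmits either of its two alleles with probability 1/2.
        for ta, tb in product((a1, a2), (b1, b2)):
            if ta + tb == gc:
                total += 0.25 * prob
    return total


def error_llr(trio, who, p):
    """log10 of (best likelihood after changing one genotype) / (original).

    trio: (mother, father, child) genotypes; who: index of the genotype tested.
    Returns +inf when the original trio is impossible but a change fixes it,
    and -inf when no change to this genotype yields a possible trio.
    """
    original = trio_likelihood(*trio, p)
    best_alt = 0.0
    for g in (0, 1, 2):
        if g == trio[who]:
            continue
        alt = list(trio)
        alt[who] = g
        best_alt = max(best_alt, trio_likelihood(*alt, p))
    if best_alt == 0.0:
        return -math.inf
    if original == 0.0:
        return math.inf
    return math.log10(best_alt / original)


if __name__ == "__main__":
    # Child genotype 2 with both parents 0 is a Mendelian inconsistency.
    print(error_llr((0, 0, 2), who=2, p=0.2))
```

With a single SNP this only catches Mendelian inconsistencies and very unlikely genotypes. Modeling whole haplotypes, as the slide describes, lets neighboring SNPs contribute evidence for errors that are Mendelian-consistent.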
Genotype Imputation
• Current genotyping platforms cover <1 million of ~10 million SNPs, so the causal variant is unlikely to be assayed directly
• Untyped SNPs can be imputed based on linkage disequilibrium information inferred from high-density datasets such as HapMap
• Maximum likelihood approach:
  • probabilities computed using HMM
[Figure panels: allele frequency, typed genotypes; allele frequency, imputed genotypes]
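The imputation idea can be illustrated with a toy two-SNP model. This sketch does not use an HMM: it assumes a reference panel of phased haplotypes over one typed (tag) SNP and one untyped SNP, pairs haplotypes at random, and picks the untyped genotype with the highest conditional probability given the typed genotype. The function name `impute_genotype` and the example panel are hypothetical.

```python
from collections import Counter
from itertools import product


def impute_genotype(reference_haplotypes, typed_genotype):
    """Impute an untyped SNP genotype from one typed SNP.

    reference_haplotypes: list of (tag_allele, untyped_allele), alleles 0/1,
        standing in for a phased reference panel.
    typed_genotype: minor-allele count (0, 1, 2) observed at the tag SNP.
    Returns (most likely untyped genotype, {genotype: probability}).
    """
    counts = Counter(reference_haplotypes)
    n = len(reference_haplotypes)
    freq = {hap: count / n for hap, count in counts.items()}

    # Sum probabilities of ordered haplotype pairs compatible with the
    # typed genotype, grouped by the untyped genotype they imply.
    scores = {0: 0.0, 1: 0.0, 2: 0.0}
    for (h1, f1), (h2, f2) in product(freq.items(), repeat=2):
        if h1[0] + h2[0] == typed_genotype:
            scores[h1[1] + h2[1]] += f1 * f2

    total = sum(scores.values())
    if total == 0.0:
        raise ValueError("typed genotype cannot be explained by the reference panel")
    probabilities = {g: s / total for g, s in scores.items()}
    best = max(probabilities, key=probabilities.get)
    return best, probabilities


if __name__ == "__main__":
    # Made-up panel in which the tag and untyped alleles are strongly correlated.
    panel = [(0, 0)] * 6 + [(1, 1)] * 3 + [(1, 0)] * 1
    print(impute_genotype(panel, typed_genotype=2))
```

An HMM generalizes this by conditioning on all typed SNPs in a region at once rather than on a single tag SNP.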
Acknowledgements & Advertisement
• Justin Kennedy, Bogdan Pasaniuc
• NSF funding (Awards 0546457 and 0543365)
DIMACS Workshop on Computational Issues in Genetic Epidemiology
August 21-22, 2008, DIMACS Center, CoRE Building, Rutgers University
Presented under the auspices of the DIMACS/BioMaPS/MB Center Special Focus on Information Processing in Biology.
Organizers: Andrew Scott Allen, Duke University; Ion Mandoiu, University of Connecticut; Dan Nicolae, University of Chicago; Yi Pan, Georgia State University; Alex Zelikovsky, Georgia State University