Computational Challenges in Whole-Genome Association Studies

Ion Mandoiu
Computer Science and Engineering Department
University of Connecticut

Approaches to Disease Gene Mapping
• Association analysis (cases vs. controls)
  • χ²-test
  • Genome-wide scans made possible by recent progress in SNP genotyping technologies
• Linkage analysis
  • LOD := log10(L(θ)/L(1/2))
  • Very successful for Mendelian diseases (cystic fibrosis, Huntington’s, …)
  • Low power to detect genes with small relative risk in complex diseases [RischMerikangas’96]
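The two statistics named on this slide can be sketched in a few lines. This is a minimal illustration, not the analysis used in the talk. It assumes an allelic 2x2 table (minor/major allele counts in cases and controls) for the χ²-test, and phase-known meioses with a simple binomial likelihood L(θ) = θ^k (1-θ)^(n-k) for the LOD score. The function names `allelic_chi_square` and `lod_score` are hypothetical.

```python
import math


def allelic_chi_square(case_counts, control_counts):
    """Pearson chi-square test (1 degree of freedom) on a 2x2 allele-count table.

    case_counts and control_counts are (minor_allele_count, major_allele_count).
    Returns (statistic, p_value).
    """
    a, b = case_counts
    c, d = control_counts
    n = a + b + c + d
    margins = (a + b, c + d, a + c, b + d)
    if 0 in margins:
        raise ValueError("every row and column of the table must be non-empty")
    stat = n * (a * d - b * c) ** 2 / math.prod(margins)
    # For 1 degree of freedom, P(X > stat) = erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


def lod_score(recombinants, meioses, theta):
    """LOD := log10(L(theta) / L(1/2)) for k recombinants among n meioses.

    Assumes phase-known meioses, so L(theta) = theta^k * (1 - theta)^(n - k).
    """
    if not 0 < theta <= 0.5:
        raise ValueError("theta must lie in (0, 0.5]")
    if not 0 <= recombinants <= meioses:
        raise ValueError("recombinants must lie between 0 and meioses")
    k, n = recombinants, meioses
    return (k * math.log10(theta)
            + (n - k) * math.log10(1 - theta)
            - n * math.log10(0.5))


if __name__ == "__main__":
    # Made-up counts, for illustration only.
    stat, p = allelic_chi_square((30, 70), (15, 85))
    print(f"chi-square = {stat:.3f}, p = {p:.4f}")
    print(f"LOD at theta = 0.1: {lod_score(1, 10, 0.1):.3f}")
```

By convention, a LOD score of 3 or more is usually taken as evidence for linkage. In a genome-wide scan the χ²-test is repeated for every SNP, so the p-values need a multiple-testing correction.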

Computational Challenges
• Detecting genotyping errors
• Imputation of missing genotypes
• Imputation of untyped genotypes based on reference population (e.g., HapMap)
• Haplotype inference and haplotype-based association tests
• Modeling gene-gene interactions
• Handling structural variation data provided by new sequencing technologies
• Optimal multi-stage study design

Genotype Error Detection
• A real problem despite advances in technology
• In [KMP07] we proposed efficient methods for error detection in trio data, based on a log-likelihood ratio (LLR) approach combined with an HMM of haplotype diversity
• In ongoing work we seek to improve error detection accuracy by using low-level data such as typing confidence scores
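For orientation, the simplest form of error detection in trio data is a Mendelian consistency check, sketched below. This is not the LLR/HMM method of [KMP07]. It only flags SNPs where the child's genotype cannot be inherited from the parents, and it misses every error that happens to be Mendelian-consistent, which is the gap likelihood-based methods aim to close. Genotypes are assumed to be coded as the number of minor alleles (0, 1, 2), with `None` for a missing call. The function names are hypothetical.

```python
def possible_child_genotypes(father, mother):
    """Set of child genotypes consistent with Mendelian inheritance.

    Genotypes are coded as the number of minor alleles: 0, 1 or 2.
    """
    # Alleles (0 = major, 1 = minor) that a parent of each genotype can transmit.
    gametes = {0: {0}, 1: {0, 1}, 2: {1}}
    return {a + b for a in gametes[father] for b in gametes[mother]}


def mendelian_inconsistencies(trio_genotypes):
    """Return the indices of SNPs whose trio genotypes are inconsistent.

    trio_genotypes is a list of (father, mother, child) tuples, one per SNP.
    SNPs with a missing call (None) in any member are skipped.
    """
    flagged = []
    for index, (father, mother, child) in enumerate(trio_genotypes):
        if None in (father, mother, child):
            continue
        if child not in possible_child_genotypes(father, mother):
            flagged.append(index)
    return flagged


if __name__ == "__main__":
    # Made-up trio genotypes, for illustration only.
    trio = [(0, 0, 0), (0, 0, 1), (2, 2, 1), (1, 1, 2), (0, 2, 1), (0, 2, 0)]
    print(mendelian_inconsistencies(trio))
```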

Genotype Imputation
• Current genotyping platforms cover <1 million of the ~10 million SNPs, so the causal variant is unlikely to be assayed directly
• Untyped SNPs can be imputed based on linkage disequilibrium information inferred from high-density datasets such as HapMap
• Maximum likelihood approach, with probabilities computed using an HMM
[Figure: allele frequency, typed genotypes vs. allele frequency, imputed genotypes]
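The idea of maximum likelihood imputation from linkage disequilibrium can be illustrated with a deliberately small model. The sketch below replaces the HMM with a two-SNP model: one typed "tag" SNP and one untyped "target" SNP, with haplotype frequencies estimated from a phased reference panel and random pairing of haplotypes (Hardy-Weinberg equilibrium). All of these are simplifying assumptions, and the function names `haplotype_freqs` and `impute_genotype` are hypothetical.

```python
from collections import Counter
from itertools import product


def haplotype_freqs(reference_haplotypes):
    """Estimate two-SNP haplotype frequencies from a phased reference panel.

    Each haplotype is a (tag_allele, target_allele) tuple with alleles 0 or 1.
    """
    counts = Counter(reference_haplotypes)
    total = sum(counts.values())
    return {hap: count / total for hap, count in counts.items()}


def impute_genotype(tag_genotype, freqs):
    """Most likely genotype at the untyped SNP given the typed tag genotype.

    Genotypes are minor-allele counts (0, 1, 2). Haplotypes are assumed to
    pair at random. Returns (best_genotype, posterior_probabilities).
    """
    posterior = {0: 0.0, 1: 0.0, 2: 0.0}
    # Sum over ordered haplotype pairs compatible with the tag genotype.
    for (hap1, f1), (hap2, f2) in product(freqs.items(), repeat=2):
        if hap1[0] + hap2[0] == tag_genotype:
            posterior[hap1[1] + hap2[1]] += f1 * f2
    norm = sum(posterior.values())
    if norm == 0:
        raise ValueError("tag genotype is impossible under the reference panel")
    posterior = {g: prob / norm for g, prob in posterior.items()}
    best = max(posterior, key=posterior.get)
    return best, posterior


if __name__ == "__main__":
    # Made-up reference panel of 10 haplotypes, for illustration only.
    panel = [(0, 0)] * 6 + [(1, 1)] * 3 + [(1, 0)] * 1
    freqs = haplotype_freqs(panel)
    for tag in (0, 1, 2):
        best, post = impute_genotype(tag, freqs)
        print(tag, best, {g: round(prob, 3) for g, prob in post.items()})
```

An HMM generalizes this by conditioning on all typed SNPs in a region at once rather than on a single tag SNP.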

Acknowledgements & Advertisement
• Justin Kennedy, Bogdan Pasaniuc
• NSF funding (Awards 0546457 and 0543365)

DIMACS Workshop on Computational Issues in Genetic Epidemiology
August 21 - 22, 2008
DIMACS Center, CoRE Building, Rutgers University
Presented under the auspices of the DIMACS/BioMaPS/MB Center Special Focus on Information Processing in Biology.
Organizers: Andrew Scott Allen, Duke University; Ion Mandoiu, University of Connecticut; Dan Nicolae, University of Chicago; Yi Pan, Georgia State University; Alex Zelikovsky, Georgia State University
