WHI Imputation

Presentation Transcript


  1. WHI Imputation

  2. Target GWAS data
     • WHIMS+, ~5,000-6,000 samples, Illumina OmniExpress
     • GARNET, ~5,000 samples, Illumina Omni
     • Hipfx, ~4,000-5,000 samples, Illumina 550K-610K
     • SHARE, ~12,000 samples, Affy 6 (already imputed)
     • GECCO, ~4,000 samples, Illumina 550K, 550K Duo, 610K, 300K (already imputed)
     • WHI NCI GWAS, ~1,000 samples; not yet cleaned and will not be included in the current batch of imputations

  3. Reference Panel
     • Reference panel: 1000G Phase I Integrated Release Version 2 haplotypes, downloaded from the MACH website
     • 1,092 samples (AFR 246; AMR 181; ASN 286; EUR 379). We used all populations because this has been shown to increase the imputation quality of rare variants.
     • 36,648,992 SNPs; 3,660,720 indels

  4. Imputation steps
     • Convert the GWAS data from genome build hg18 to hg19
     • Match the strand in the GWAS data to the 1000 Genomes reference panel (forward strand)
     • Phase each study separately using Beagle
     • Impute each study using minimac. Chromosomes will be split into small chunks to make the imputation tractable; neighbouring chunks will overlap, which improves accuracy near the chunk edges.
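
The chunking step can be illustrated with a minimal Python sketch. The slides do not give the chunk size or the overlap width, so the values below (5 Mb cores with 500 kb of flanking overlap) and the function name `chunk_chromosome` are assumptions chosen only for illustration. Each chunk has a core region, whose imputed genotypes would be kept, and a wider window that borrows flanking sequence from its neighbours.

```python
def chunk_chromosome(chrom_length, core_size, overlap):
    """Split a chromosome into consecutive core regions with overlapping windows.

    Returns a list of (core_start, core_end, window_start, window_end) tuples
    in 1-based inclusive base-pair coordinates. The window extends the core by
    `overlap` bp on each side, clipped to the chromosome ends.
    """
    chunks = []
    core_start = 1
    while core_start <= chrom_length:
        core_end = min(core_start + core_size - 1, chrom_length)
        window_start = max(1, core_start - overlap)
        window_end = min(chrom_length, core_end + overlap)
        chunks.append((core_start, core_end, window_start, window_end))
        core_start = core_end + 1
    return chunks


# Hypothetical sizes: a 12 Mb region, 5 Mb cores, 500 kb overlap per side.
for chunk in chunk_chromosome(12_000_000, 5_000_000, 500_000):
    print(chunk)
```

Imputing over the wider window but keeping only the core region's results is one common way to benefit from the overlap; whether this pipeline trims to the core in the same way is not stated in the slides.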

  5. Expected output
     • The imputation program will output the imputed dosage and the imputation quality measure R2 for each SNP.
     • Previously, when imputing the GECCO data, we set the R2 cutoff for filtering SNPs by MAF category:
         • For MAF>0.01: R2>0.3
         • For 0.005<MAF<0.01: R2>0.5
         • For MAF<0.005: R2>0.99
     • Around 9M SNPs and 1.2M indels remained after filtering
     • Comparison of SNPs that pass the filtering for HapMap and 1000 Genomes imputation
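
As a rough illustration, the MAF-dependent R2 filter above could be written as the following Python sketch. The function name `passes_r2_filter` is hypothetical, and the slides do not say which category a variant falls into when its MAF is exactly 0.005 or 0.01; this sketch assumes both boundary values belong to the middle category.

```python
def passes_r2_filter(maf, r2):
    """Return True if a variant passes the MAF-dependent R2 cutoff.

    Cutoffs follow the slide: MAF > 0.01 needs R2 > 0.3,
    0.005 < MAF < 0.01 needs R2 > 0.5, and MAF < 0.005 needs R2 > 0.99.
    Assumption: MAF values of exactly 0.005 or 0.01 use the middle cutoff.
    """
    if maf > 0.01:
        return r2 > 0.3
    if maf >= 0.005:
        return r2 > 0.5
    return r2 > 0.99


# Hypothetical variants as (MAF, R2) pairs.
variants = [(0.20, 0.35), (0.007, 0.40), (0.001, 0.995)]
kept = [v for v in variants if passes_r2_filter(*v)]
print(kept)
```

The stricter cutoff for rarer variants reflects that low-frequency alleles are harder to impute, so a higher R2 is required before they are retained.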

  6. Timeline
     • Imputation will take approximately half a month per study
     • The imputation should therefore be finished by the end of May 2013, assuming we get access to the GWAS data in a timely manner.
