Whole Genome Association Analysis with PLINK

Dug Yeo Han, PhD, Discipline of Nutrition, School of Medical Sciences, The University of Auckland. PLINK: a whole genome association analysis toolset, integrated with gPLINK and Haploview.

Presentation Transcript


  1. Whole Genome Association Analysis with PLINK
     Dug Yeo Han, PhD, Discipline of Nutrition, School of Medical Sciences, The University of Auckland

  2. PLINK
     • Whole genome association analysis toolset
     • Integration with gPLINK and Haploview
     • Developed by Shaun Purcell at the Center for Human Genetic Research, Massachusetts General Hospital, and the Broad Institute of Harvard & MIT
     • A command-line program

  3. PLINK integration with gPLINK and Haploview

  4. Basic Data Files
     PLINK uses two basic data files: the PED file and the MAP file.

  5. Basic Data File – PED file
     • A white-space (space or tab) delimited file
     • The first six columns are mandatory:
        • Family ID
        • Individual ID
        • Paternal ID
        • Maternal ID
        • Sex
        • Phenotype
     • The genotype columns follow (SNP 1 to SNP N), with two allele columns per SNP

  6. PED file example
     CH18526 NA18526 0 0 2 1 G G C C T T A A G G G G T A G G T G C C T T T T C A C C A C G G C C
     CH18524 NA18524 0 0 1 1 G G C C T T A A G G A G A A G A G G C C T T C T A A C C C C G G C C
     CH18529 NA18529 0 0 2 1 C G C C T T C A G G G G T A G G T G C C T T C T A A C C A C A G C C
     …
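
The PED layout above can be illustrated with a minimal Python sketch that splits one line into the six mandatory columns and a list of per-SNP allele pairs. The sketch assumes two allele columns per SNP, as in the example rows. `PedRecord` and `parse_ped_line` are hypothetical names and are not part of PLINK.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class PedRecord:
    # Hypothetical container for one PED line; not a PLINK data structure.
    family_id: str
    individual_id: str
    paternal_id: str
    maternal_id: str
    sex: str
    phenotype: str
    genotypes: List[Tuple[str, str]]  # one (allele 1, allele 2) pair per SNP


def parse_ped_line(line: str) -> PedRecord:
    """Split a white-space delimited PED line into the six mandatory
    columns and a list of per-SNP allele pairs."""
    fields = line.split()  # str.split() with no argument splits on spaces and tabs
    if len(fields) < 6:
        raise ValueError("a PED line needs the six mandatory columns")
    alleles = fields[6:]
    if len(alleles) % 2:
        raise ValueError("genotype columns must come in allele pairs")
    pairs = [(alleles[k], alleles[k + 1]) for k in range(0, len(alleles), 2)]
    return PedRecord(*fields[:6], genotypes=pairs)


row = ("CH18526 NA18526 0 0 2 1 G G C C T T A A G G G G T A G G T G "
       "C C T T T T C A C C A C G G C C")
record = parse_ped_line(row)
print(record.individual_id, record.sex, len(record.genotypes), record.genotypes[6])
# NA18526 2 17 ('T', 'A')
```

The first example row holds 17 allele pairs. This matches the 17 markers in the MAP example that follows.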

  7. Basic Data Format – MAP file
     • Each line describes a single marker
     • Must contain exactly 4 columns:
        • Chromosome (1-22, X, Y, or 0 for unplaced)
        • rs# or SNP identifier
        • Genetic distance (morgans)
        • Base-pair position (bp units)

  8. PLINK - MAP file example
     Chr  SNP         Genetic distance  Base-pair position (bp)
     8    rs17121574  12.7991           12799052
     8    rs754238    12.8481           12848056
     8    rs11203962  12.8484           12848438
     8    rs6999231   12.8623           12862253
     8    rs17178729  12.867            12867001
     8    rs10105623  12.8683           12868315
     8    rs2460915   12.8704           12870407
     8    rs7835221   12.8781           12878098
     8    rs2460911   12.8953           12895289
     8    rs12156420  12.9146           12914557
     8    rs17786052  12.9224           12922389
     8    rs529983    12.9426           12942555
     8    rs630969    12.9458           12945844
     8    rs2460914   12.9581           12958068
     8    rs607499    12.9619           12961886
     8    rs634228    12.9633           12963283
     8    rs556531    12.9893           12989321
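
The four-column MAP layout can be checked with a short Python sketch. It parses each line, accepts only the chromosome codes listed on the MAP slide, and confirms that the PED and MAP files describe the same number of SNPs. The sketch assumes the i-th MAP line describes the i-th allele pair of every PED line. PLINK accepts some chromosome codes beyond those listed. `parse_map_line` and `check_marker_count` are hypothetical helper names.

```python
from typing import List, Tuple

# Chromosome codes listed on the slide; PLINK itself accepts some further codes.
VALID_CHROMOSOMES = {str(c) for c in range(1, 23)} | {"X", "Y", "0"}

Marker = Tuple[str, str, float, int]  # (chromosome, SNP id, genetic distance, bp position)


def parse_map_line(line: str) -> Marker:
    """Parse one four-column MAP line."""
    fields = line.split()
    if len(fields) != 4:
        raise ValueError(f"expected 4 columns, got {len(fields)}: {line!r}")
    chrom, snp_id, distance, position = fields
    if chrom not in VALID_CHROMOSOMES:
        raise ValueError(f"unexpected chromosome code: {chrom!r}")
    return chrom, snp_id, float(distance), int(position)


def check_marker_count(n_genotype_pairs: int, markers: List[Marker]) -> None:
    """Each MAP line describes one allele pair in every PED line, in the same
    order, so the two counts must agree."""
    if n_genotype_pairs != len(markers):
        raise ValueError(f"PED has {n_genotype_pairs} SNPs but MAP lists {len(markers)}")


example = """8 rs17121574 12.7991 12799052
8 rs754238 12.8481 12848056
8 rs11203962 12.8484 12848438"""
markers = [parse_map_line(line) for line in example.splitlines()]
positions = [m[3] for m in markers]
print(positions == sorted(positions))  # True: these rows are in bp order
check_marker_count(3, markers)  # passes silently
```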

  9. Converting Text Format to Binary Format
     Text format:
        • mydata.ped
        • mydata.map
     Binary format:
        • Binary PED file (mydata.bed)
        • Extended MAP file (mydata.bim)
        • Family and phenotype information, i.e. the first six PED columns (mydata.fam)
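
The binary PED file stores genotypes far more compactly than the text PED file. The Python sketch below follows our reading of the PLINK 1 `.bed` layout; check it against the PLINK file-format documentation before relying on it. Under that reading, the file starts with the bytes 0x6C 0x1B, followed by 0x01 for SNP-major order. Each genotype then takes two bits, packed four per byte with the first individual in the lowest-order bits. Each SNP is padded to a whole byte. The dosage convention (copies of the first `.bim` allele) and the function names are illustrative assumptions.

```python
from typing import List, Optional

# Magic number plus mode byte (0x01 = SNP-major) at the start of a PLINK 1 .bed file.
BED_HEADER = bytes([0x6C, 0x1B, 0x01])

# 2-bit codes mapped to copies of the first .bim allele; None = missing genotype.
CODE_TO_DOSAGE = {0b00: 2, 0b01: None, 0b10: 1, 0b11: 0}
DOSAGE_TO_CODE = {v: k for k, v in CODE_TO_DOSAGE.items()}


def encode_bed(snps: List[List[Optional[int]]], n_individuals: int) -> bytes:
    """Pack per-SNP dosage lists into SNP-major .bed bytes."""
    bytes_per_snp = (n_individuals + 3) // 4  # four genotypes per byte, padded
    out = bytearray(BED_HEADER)
    for dosages in snps:
        if len(dosages) != n_individuals:
            raise ValueError("one dosage per individual is required")
        block = bytearray(bytes_per_snp)
        for i, dosage in enumerate(dosages):
            # Individual i occupies bits 2*(i % 4) and 2*(i % 4) + 1 of byte i // 4.
            block[i // 4] |= DOSAGE_TO_CODE[dosage] << (2 * (i % 4))
        out += block
    return bytes(out)


def decode_bed(data: bytes, n_individuals: int, n_snps: int) -> List[List[Optional[int]]]:
    """Unpack SNP-major .bed bytes back into per-SNP dosage lists."""
    if data[:3] != BED_HEADER:
        raise ValueError("not a SNP-major PLINK 1 .bed stream")
    bytes_per_snp = (n_individuals + 3) // 4
    if len(data) != 3 + bytes_per_snp * n_snps:
        raise ValueError("length does not match the stated dimensions")
    snps = []
    for s in range(n_snps):
        start = 3 + s * bytes_per_snp
        block = data[start:start + bytes_per_snp]
        snps.append([CODE_TO_DOSAGE[(block[i // 4] >> (2 * (i % 4))) & 0b11]
                     for i in range(n_individuals)])
    return snps


raw = encode_bed([[2, 1, 0, None, 2]], n_individuals=5)  # one SNP, five people
print(raw.hex())  # 6c1b017800
print(decode_bed(raw, n_individuals=5, n_snps=1))  # [[2, 1, 0, None, 2]]
```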

  10. Basic data manipulation

  11. Quality Control
     Filters for quality control:
        • Individual genotyping rate
        • SNP genotyping rate
        • Allele, genotype, haplotype frequencies
        • Hardy-Weinberg test
        • Mendel errors
     Tests for non-random missingness
     Individual homozygosity estimates
     …
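
Several of the quality-control quantities above can be computed from a genotype matrix, as in the Python sketch below. The matrix holds one row per individual and one column per SNP. Each entry counts copies of allele 1, and `None` marks a missing genotype. For Hardy-Weinberg, the sketch swaps in a simple 1-df chi-square goodness-of-fit test. To our understanding, PLINK's own Hardy-Weinberg test is an exact test, so the p-values here will not match PLINK's output exactly. All function names are hypothetical.

```python
import math
from typing import List, Optional, Tuple

# Rows are individuals, columns are SNPs; each entry counts copies of allele 1
# (0, 1 or 2), and None marks a missing genotype.
Genotypes = List[List[Optional[int]]]


def individual_call_rates(g: Genotypes) -> List[float]:
    """Fraction of SNPs successfully genotyped for each individual."""
    return [sum(x is not None for x in row) / len(row) for row in g]


def snp_call_rates(g: Genotypes) -> List[float]:
    """Fraction of individuals successfully genotyped at each SNP."""
    return [sum(row[j] is not None for row in g) / len(g) for j in range(len(g[0]))]


def allele1_frequency(column: List[Optional[int]]) -> float:
    """Frequency of allele 1 among called genotypes (assumes at least one call)."""
    called = [x for x in column if x is not None]
    return sum(called) / (2 * len(called))


def hwe_chisq(column: List[Optional[int]]) -> Tuple[float, float]:
    """1-df chi-square goodness-of-fit test of Hardy-Weinberg proportions.
    Returns (statistic, p-value). This is not PLINK's exact test."""
    called = [x for x in column if x is not None]
    n = len(called)
    observed = [called.count(2), called.count(1), called.count(0)]
    p = (2 * observed[0] + observed[1]) / (2 * n)
    q = 1 - p
    expected = [n * p * p, 2 * n * p * q, n * q * q]
    # Skip empty expected classes (monomorphic SNP) to avoid dividing by zero.
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Upper tail of chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    return chi2, math.erfc(math.sqrt(chi2 / 2))


g = [[2, None], [1, 0]]
print(individual_call_rates(g), snp_call_rates(g))  # [0.5, 1.0] [1.0, 0.5]
print(hwe_chisq([1] * 100))  # all heterozygotes: large statistic, tiny p-value
```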

  12. Command for Quality Control
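
The quality-control commands on this slide are not preserved in the transcript. PLINK exposes this kind of filtering through options such as `--mind` (per-individual missingness), `--geno` (per-SNP missingness) and `--maf` (minor allele frequency). The Python sketch below shows the idea behind such filters on a toy matrix. The thresholds are illustrative, the function and parameter names are hypothetical, and the sketch does not attempt to reproduce PLINK's exact order of operations.

```python
from typing import List, Optional, Tuple

Genotypes = List[List[Optional[int]]]  # rows = individuals, columns = SNPs


def qc_filter(g: Genotypes, max_ind_missing: float = 0.1,
              max_snp_missing: float = 0.1,
              min_maf: float = 0.01) -> Tuple[List[int], List[int]]:
    """Return (kept individual indices, kept SNP indices).
    The default thresholds are illustrative, not recommendations."""
    n_snps = len(g[0])
    # 1. Individuals: drop anyone missing more than max_ind_missing of their SNPs.
    kept_ind = [i for i, row in enumerate(g)
                if sum(x is None for x in row) / n_snps <= max_ind_missing]
    kept_snp = []
    for j in range(n_snps):
        column = [g[i][j] for i in kept_ind]
        called = [x for x in column if x is not None]
        # 2. SNPs: drop if missing in too many of the retained individuals.
        if not called or 1 - len(called) / len(column) > max_snp_missing:
            continue
        # 3. SNPs: drop if the minor allele frequency is below min_maf.
        freq = sum(called) / (2 * len(called))
        if min(freq, 1 - freq) < min_maf:
            continue
        kept_snp.append(j)
    return kept_ind, kept_snp


g = [[2, 0, 1],
     [1, 0, 1],
     [None, None, None],
     [0, 0, 2]]
print(qc_filter(g))  # ([0, 1, 3], [0, 2])
```

The third individual fails the missingness filter. The second SNP is monomorphic among the remaining individuals, so it fails the minor allele frequency filter.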

  13. Association Analysis
     • Population-based
        • Allelic, trend, genotypic, Fisher's exact
     • Stratified tests
     • Multilocus tests
     • Haplotype estimation
     • Set-based tests
     • Epistasis
     …
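
Two of the population-based tests listed above can be sketched in Python. The first is the allelic test, a Pearson chi-square on the 2x2 table of allele counts in cases and controls, together with the allele-1 odds ratio. The second is the Cochran-Armitage trend test on genotype counts. To our understanding, PLINK reports tests of this kind through options such as `--assoc` and `--model`. This is an independent sketch, not PLINK's implementation. It uses no continuity correction and assumes no zero cells. The function names are hypothetical.

```python
import math
from typing import Sequence, Tuple


def chi2_1df_pvalue(chi2: float) -> float:
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2))


def allelic_test(case_a1: int, case_a2: int,
                 ctrl_a1: int, ctrl_a2: int) -> Tuple[float, float, float]:
    """Pearson chi-square (no continuity correction) on the 2x2 table of
    allele counts, plus the allele-1 odds ratio. Assumes no zero cells."""
    table = [[case_a1, case_a2], [ctrl_a1, ctrl_a2]]
    total = case_a1 + case_a2 + ctrl_a1 + ctrl_a2
    row_sums = [sum(row) for row in table]
    col_sums = [case_a1 + ctrl_a1, case_a2 + ctrl_a2]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_sums[i] * col_sums[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected
    odds_ratio = (case_a1 * ctrl_a2) / (case_a2 * ctrl_a1)
    return chi2, chi2_1df_pvalue(chi2), odds_ratio


def trend_test(cases: Sequence[int], controls: Sequence[int],
               weights: Sequence[float] = (0, 1, 2)) -> Tuple[float, float]:
    """Cochran-Armitage trend test. Counts are ordered by genotype
    (0, 1, 2 copies of allele 1); weights give the additive score."""
    n = [r + s for r, s in zip(cases, controls)]
    n_cases, n_controls = sum(cases), sum(controls)
    total = n_cases + n_controls
    sum_wr = sum(w * r for w, r in zip(weights, cases))
    sum_wn = sum(w * k for w, k in zip(weights, n))
    sum_w2n = sum(w * w * k for w, k in zip(weights, n))
    numerator = total * (total * sum_wr - n_cases * sum_wn) ** 2
    denominator = n_cases * n_controls * (total * sum_w2n - sum_wn ** 2)
    chi2 = numerator / denominator
    return chi2, chi2_1df_pvalue(chi2)


print(allelic_test(30, 70, 50, 50))  # chi2 = 25/3, odds ratio = 3/7
print(trend_test((10, 20, 30), (30, 20, 10)))  # chi2 = 20.0
```

The trend test scores genotypes additively (weights 0, 1, 2), so it does not assume Hardy-Weinberg equilibrium in the way the allele-count table implicitly does.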

  14. Association Analysis

  15. Haplotype Testing

  16. Results
     • After quality control, 836 subjects (348 patients with Crohn's disease (CD) and 488 controls) and 128,970 SNPs were included in the analysis.
     • 34 SNPs showed genome-wide significant evidence of association with Crohn's disease.
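
The slides do not state which threshold defined genome-wide significance. Purely as a worked illustration, a Bonferroni correction of a 5% family-wise error rate over the 128,970 SNPs analysed would give the following per-SNP threshold. This is not necessarily the threshold used in the study. The widely used conventional threshold of 5 × 10^-8 is stricter still.

```latex
\alpha_{\text{SNP}} = \frac{\alpha}{m} = \frac{0.05}{128\,970} \approx 3.9 \times 10^{-7}
```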

  17. Manhattan plot from our study
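
A Manhattan plot places every SNP along the genome on the x-axis, with chromosomes laid end to end, and plots -log10(p) on the y-axis. The most strongly associated SNPs therefore appear as the tallest points. The Python sketch below computes only the plot coordinates, leaving the drawing to any plotting library. It assumes numeric chromosome codes, and it takes each chromosome's width to be the largest position observed on it. The function name and toy data are hypothetical and are not results from the study.

```python
import math
from typing import Dict, List, Tuple


def manhattan_coordinates(results: List[Tuple[int, int, float]]) -> List[Tuple[int, float]]:
    """Turn (chromosome, bp position, p-value) tuples into (x, y) points.
    Chromosomes are laid end to end in numeric order; each one's width is taken
    as the largest position observed on it. y is -log10(p)."""
    width: Dict[int, int] = {}
    for chrom, bp, _ in results:
        width[chrom] = max(width.get(chrom, 0), bp)
    offset: Dict[int, int] = {}
    running = 0
    for chrom in sorted(width):
        offset[chrom] = running
        running += width[chrom]
    return [(offset[chrom] + bp, -math.log10(p)) for chrom, bp, p in results]


# Hypothetical toy results, not data from the study.
toy = [(1, 100, 0.01), (1, 300, 1e-8), (2, 50, 0.5)]
print(manhattan_coordinates(toy))
```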

  18. Significant SNPs

  19. Thank you
