Fill ing Missing Genotypes Using Haplotypes
150 likes | 309 Vues
Fill ing Missing Genotypes Using Haplotypes. Genotypes / Haplotypes. Genotypes indicate how many copies of each allele were inherited Haplotypes indicate which alleles are on which chromosome Observed genotypes partitioned into the two unknown haplotypes
Fill ing Missing Genotypes Using Haplotypes
E N D
Presentation Transcript
Genotypes / Haplotypes • Genotypes indicate how many copies of each allele were inherited • Haplotypes indicate which alleles are on which chromosome • Observed genotypes partitioned into the two unknown haplotypes • Pedigree haplotyping uses relatives • Population haplotyping finds matching allele patterns
Filling missing genotypes • Predict unknown SNP from known • Measure 3,000, predict 43,000 SNP • Measure 50,000, predict 500,000 • Measure each haplotype at highest density only a few times • Predict dam from progeny SNP • Increase reliabilities for less cost
Haplotyping Programfindhap.f90 • Begin with population haplotyping • Divide chromosomes into segments, ~250 SNP / segment • List haplotypes by genotype match • Similar to fastPhase, IMPUTE • End with pedigree haplotyping • Detect crossover, fix noninheritance • Impute nongenotyped ancestors
Recent Program Revisions • Improved imputation and GEBV reliability since 9WCGALP paper • Changes since January 2010 • Use known haplotype if second is unknown • Use current instead of base frequency • Combine parent haplotypes if crossover is detected • Begin search with parent or grandparent haplotypes • Store 2 most popular progeny haplotypes
Example Bull: O-StyleUSA137611441, Sire = O-Man • Read genotypes and pedigrees • Write haplotype segments found • List paternal / maternal inheritance • List crossover locations
Pedigree HaplotypingAB allele coding Genotypes: OMan BB,AA,AA,AB,AA,AB,AB,AA,AA,AB Ostyle BB,AA,AA,AB,AB,AA,AA,AA,AA,AB Haplotypes: OStyle (pat) B A A _ A A A A A _ OStyle (mat) B A A _ B A A A A _
Allele and Segment Coding • Genotypes • 0 = BB, 1 = AB or BA, 2 = AA • 5 = missing • Haplotypes • 0 = B, 1 = not known, 2 = A • Segment storage (example) • O-Style has haplotype numbers 5 and 8 • O-Man has haplotype numbers 8 and 21 • O-Style got haplotype number 5 from dam
Most Frequent Haplotypes1st segment of chromosome 15 1 5.16% 022222222020020022002020200020000200202000022022222202220 2 4.37% 022020220202200020022022200002200200200000200222200002202 3 4.36% 022020022202200200022020220000220202200002200222200202220 4 3.67% 022020222020222002022022202020000202220000200002020002002 5 3.66% 022222222020222022020200220000020222202000002020220002022 6 3.65% 022020022202200200022020220000220202200002200222200202222 7 3.51% 022002222020222022022020220200222002200000002022220002220 8 3.42% 022002222002220022022020220020200202202000202020020002020 9 3.24% 022222222020200000022020220020200202202000202020020002020 10 3.22% 022002222002220022002020002220000202200000202022020202220 Most frequent haplotype in Holsteins had 4,316 copies = .0516 * 41,822 animals * 2 chromosomes each
Population Haplotyping Steps • Put first genotype into haplotype list • Check next genotype against list • Do any homozygous loci conflict? • If haplotype conflicts, continue search • If match, fill any unknown SNP with homozygote • 2nd haplotype = genotype minus 1st haplotype • Search for 2nd haplotype in rest of list • If no match in list, add to end of list • Sort list to put frequent haplotypes 1st
Check New Genotype Against List1st segment of chromosome 15 Check genotype: 022112222011221022021110220010110212202000102020120002021 5.16% 022222222020020022002020200020000200202000022022222202220 4.37% 022020220202200020022022200002200200200000200222200002202 4.36% 022020022202200200022020220000220202200002200222200202220 3.67% 022020222020222002022022202020000202220000200002020002002 3.66% 022222222020222022020200220000020222202000002020220002022 3.65% 022020022202200200022020220000220202200002200222200202222 3.51% 022002222020222022022020220200222002200000002022220002220 3.42% 022002222002220022022020220020200202202000202020020002020 3.24% 022222222020200000022020220020200202202000202020020002020 3.22% 022002222002220022002020002220000202200000202022020202220 Subtract 1st haplotype from genotype to get 2nd: 022002222002220022022020220020200202202000202020020002020
Conclusions • Missing genotypes can be filled easily • Population and pedigree haplotyping can both process long segments efficiently • Imputing 500,000 SNP for 33,414 Holsteins required 3 Gbyte memory, 3 CPU hours • Program findhap.f90 implemented for April 2010 routine evaluation • Several recent improvements to accuracy • Ready to include lower or higher density genotypes in evaluations