Reconstructing Kinship Relationships in Wild Populations

Bhaskar DasGupta (UIC). Reconstructing Kinship Relationships in Wild Populations. "I do not believe that the accident of birth makes people sisters and brothers. It makes them siblings. Gives them mutuality of parentage." (Maya Angelou). Mary Ashley (UIC). Tanya Berger-Wolf (UIC).

Presentation Transcript


  1. Bhaskar DasGupta (UIC). Reconstructing Kinship Relationships in Wild Populations. "I do not believe that the accident of birth makes people sisters and brothers. It makes them siblings. Gives them mutuality of parentage." (Maya Angelou) • Mary Ashley (UIC), Tanya Berger-Wolf (UIC), W. Art Chaovalitwongse (Rutgers), Ashfaq Khokhar (UIC), Chun-An (Joe) Chou (Rutgers), Priya Govindan (Rutgers), Saad Sheikh (Ecole Polytechnique), Isabel Caballero (UIC), Alan Perez-Rathke (UIC)

  2. Microsatellites (STR) [figure: alleles differ in the number of CA repeats, e.g. allele #1 = CACACACA, #2 = CACACACACACA, #3 = CACACACACACACA; possible genotypes 1/1, 2/2, 3/3, 1/2, 1/3, 2/3] • Advantages: • Codominant (easy inference of genotypes and allele frequencies) • Many alleles per locus (high heterozygosity) • Possible to estimate other population parameters • Cheaper than SNPs • But: • Few loci • And: • Large families • Self-mating • …
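Because microsatellites are codominant, both alleles of every individual are observed, so allele frequencies at a locus can be estimated by plain counting. A minimal sketch, assuming genotypes are encoded as pairs of allele labels; the function name `allele_frequencies` is illustrative, not taken from the slides:

```python
from collections import Counter


def allele_frequencies(genotypes):
    """Estimate allele frequencies at one locus from codominant genotypes.

    Each genotype is a pair of allele labels such as (1, 3). Because both
    alleles are observed directly, every individual contributes two counts.
    """
    counts = Counter()
    for first, second in genotypes:
        counts[first] += 1
        counts[second] += 1
    total = sum(counts.values())
    return {allele: n / total for allele, n in sorted(counts.items())}


if __name__ == "__main__":
    # The six genotypes shown on the slide for alleles #1, #2 and #3.
    sample = [(1, 1), (2, 2), (1, 2), (1, 3), (2, 3), (3, 3)]
    print(allele_frequencies(sample))  # each allele appears 4 times out of 12
```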

  3. Siblings: two children with the same parents. Question: given a set of children, find the sibling groups. Diploid siblings: at each locus the father carries alleles (a/b), the mother (c/d), and a child (e/f) receives one allele from the father and one from the mother.
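The single-locus inheritance rule can be written down directly: a child's unordered genotype must pair one paternal allele with one maternal allele. A small sketch under that rule; the helper names are hypothetical, and in the sibship problem itself the parents are unknown, which is exactly why the reconstruction is hard:

```python
from itertools import product


def offspring_genotypes(father, mother):
    """All child genotypes at one locus: one allele from each parent.

    Genotypes are unordered, so each one is returned as a sorted tuple.
    """
    return {tuple(sorted((p, m))) for p, m in product(father, mother)}


def is_compatible(child, father, mother):
    """True if the child (e/f) can arise from father (a/b) and mother (c/d)."""
    return tuple(sorted(child)) in offspring_genotypes(father, mother)


if __name__ == "__main__":
    father, mother = (1, 2), (3, 4)
    print(sorted(offspring_genotypes(father, mother)))
    print(is_compatible((3, 1), father, mother))  # 1 from father, 3 from mother
    print(is_compatible((1, 2), father, mother))  # both alleles paternal: impossible
```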

  4. Why Reconstruct Sibling Relationships? Used in: conservation biology, animal management, molecular ecology, genetic epidemiology Necessary for: estimating heritability of quantitative characters, characterizing mating systems and fitness. • But: hard to sample parent/offspring pairs. Sampling cohorts of juveniles is easier

  5. The Problem. Example sibling groups: {2, 4, 5, 6}, {1, 3}, {7, 8}

  6. Existing Methods

  7. Inheritance Rules. At a locus the father carries (a/b), the mother (c/d), and child i carries (ei/fi), for children 1, 2, 3, …, n. • 4-allele rule: siblings have at most 4 distinct alleles at a locus. • 2-allele rule: at a locus, a sibling group satisfies a + R ≤ 4, where a is the number of distinct alleles and R is the number of alleles that appear with 3 other alleles or are homozygous.

  8. Our Approach: Mendelian Constraints • 4-allele rule: siblings have at most 4 different alleles at a locus. Yes: 3/3, 1/3, 1/5, 1/6. No: 3/3, 1/3, 1/5, 1/6, 3/2 • 2-allele rule: at a locus, a sibling group satisfies a + R ≤ 4 (a = number of distinct alleles; R = number of alleles that appear with 3 others or are homozygous). Yes: 3/3, 1/3, 1/5. No: 3/3, 1/3, 1/5, 1/6
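Both rules are per-locus tests on the genotypes of a candidate sibling group. The sketch below implements them as stated on the slides, with "appear with 3 others" read as "paired with at least 3 distinct other alleles"; the function names are illustrative assumptions:

```python
from collections import defaultdict


def four_allele_ok(genotypes):
    """4-allele rule: a sibling group shows at most 4 distinct alleles at a locus."""
    return len({allele for g in genotypes for allele in g}) <= 4


def two_allele_ok(genotypes):
    """2-allele rule: a + R <= 4 at a locus, where a is the number of distinct
    alleles and R counts the alleles that are homozygous in some individual or
    appear together with at least 3 other distinct alleles."""
    partners = defaultdict(set)
    homozygous = set()
    for x, y in genotypes:
        if x == y:
            homozygous.add(x)
        else:
            partners[x].add(y)
            partners[y].add(x)
    alleles = {allele for g in genotypes for allele in g}
    r = sum(1 for allele in alleles
            if allele in homozygous or len(partners[allele]) >= 3)
    return len(alleles) + r <= 4


if __name__ == "__main__":
    examples = {
        "4-allele yes": [(3, 3), (1, 3), (1, 5), (1, 6)],
        "4-allele no": [(3, 3), (1, 3), (1, 5), (1, 6), (3, 2)],
        "2-allele yes": [(3, 3), (1, 3), (1, 5)],
        "2-allele no": [(3, 3), (1, 3), (1, 5), (1, 6)],
    }
    for name, group in examples.items():
        print(name, four_allele_ok(group), two_allele_ok(group))
```

Since R ≥ 0, any group passing the 2-allele rule also passes the 4-allele rule. The slide's 4-allele "yes" example fails the 2-allele rule (a = 4, R = 2), which is why it reappears as the 2-allele "no" example.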

  9. Our Approach: Sibling Reconstruction • Given: n diploid individuals sampled at l loci • Find: the minimum number of 2-allele sets that contain all individuals • NP-complete even when sibsets are known to have at most 3 members; 1.0065 approximation gap (Ashley et al. '09) • ILP formulation (Chaovalitwongse et al. '07, '10) • Minimum Set Cover based algorithm with optimal solution, using CPLEX (Berger-Wolf et al. '07) • Parallel implementation (Sheikh, Khokhar, BW '10)
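For intuition, the optimization can be solved exactly on toy inputs by exhaustive search. This is not the authors' Set Cover/ILP method (which the slide says is solved with CPLEX); it is a brute-force stand-in, exponential in n. It relies on the observation that any subset of a 2-allele-consistent group is still consistent (removing individuals can only lower a and R), so a minimum cover can be taken to be a partition. Names and the genotype encoding are assumptions:

```python
from collections import defaultdict


def two_allele_ok(genotypes):
    """2-allele rule at one locus: (#distinct alleles) + R <= 4."""
    partners, homozygous = defaultdict(set), set()
    for x, y in genotypes:
        if x == y:
            homozygous.add(x)
        else:
            partners[x].add(y)
            partners[y].add(x)
    alleles = {a for g in genotypes for a in g}
    r = sum(1 for a in alleles if a in homozygous or len(partners[a]) >= 3)
    return len(alleles) + r <= 4


def consistent(group, individuals):
    """A group is feasible if it obeys the 2-allele rule at every locus."""
    n_loci = len(individuals[group[0]])
    return all(two_allele_ok([individuals[i][locus] for i in group])
               for locus in range(n_loci))


def min_sibling_partition(individuals):
    """Exact minimum number of 2-allele groups (backtracking; toy sizes only)."""
    best = None

    def place(i, groups):
        nonlocal best
        if best is not None and len(groups) >= len(best):
            return  # adding individuals never reduces the group count
        if i == len(individuals):
            best = [list(g) for g in groups]
            return
        for g in groups:  # try each existing group
            g.append(i)
            if consistent(g, individuals):
                place(i + 1, groups)
            g.pop()
        groups.append([i])  # or open a new group
        place(i + 1, groups)
        groups.pop()

    place(0, [])
    return best


if __name__ == "__main__":
    # Hypothetical data: 4 individuals at 2 loci, genotypes as allele pairs.
    inds = [[(1, 2), (5, 5)], [(1, 2), (5, 6)], [(3, 4), (7, 7)], [(3, 4), (7, 8)]]
    print(min_sibling_partition(inds))
```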

  10. Canonical families [figure: example genotype sets over alleles 1 to 4]

  11. Aside: Minimum Set Cover • Given: a universe U = {1, 2, …, n} and a collection of sets S = {S1, S2, …, Sm}, where each Si ⊆ U • Find: the smallest number of sets in S whose union is U • Minimum Set Cover is NP-hard and (1 + ln n)-approximable; this bound is sharp
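The (1 + ln n) guarantee comes from the classic greedy rule: repeatedly take the set that covers the most still-uncovered elements. A minimal sketch of that textbook algorithm (not the optimal CPLEX-based solver mentioned earlier); in sibship reconstruction the candidate sets would be the 2-allele-consistent groups:

```python
def greedy_set_cover(universe, sets):
    """Greedy (1 + ln n)-approximation for Minimum Set Cover.

    Repeatedly picks the set covering the most still-uncovered elements
    and returns the indices of the chosen sets.
    """
    uncovered = set(universe)
    chosen = []
    while uncovered:
        best = max(range(len(sets)), key=lambda i: len(uncovered & sets[i]))
        if not uncovered & sets[best]:
            raise ValueError("the sets do not cover the universe")
        chosen.append(best)
        uncovered -= sets[best]
    return chosen


if __name__ == "__main__":
    family_candidates = [{1, 2, 3}, {2, 4}, {3, 4}, {4, 5}]
    print(greedy_set_cover(range(1, 6), family_candidates))
```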

  12. Are we done? Challenges: • No ground truth available • Growing number of methods • Biologists need (one) reliable reconstruction • Genotyping errors • Answer: Consensus. "Consensus is what many people say in chorus but do not believe as individuals." Abba Eban (1915-2002), Israeli diplomat, in The New Yorker, 23 Apr 1990

  13. Consensus Methods • Combine multiple solutions to a problem to generate one unified solution, C: S* → S • Based on Social Choice Theory • Commonly used where the real solution is not known, e.g. phylogenetic trees [figure: solutions S1, S2, ..., Sk combined by Consensus into a single solution S]

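  14. Error-Tolerant Approach (Sheikh et al. '08) [figure: the sibling reconstruction algorithm is applied to the data at Locus 1, Locus 2, Locus 3, ..., Locus l, producing solutions S1, S2, ..., Sk that are combined by Consensus into a single solution S]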

  15. Distance-based Consensus (Sheikh et al. '08) • Algorithm: • Compute a consensus solution S = {g1, ..., gk} from the input solutions S1, S2, ..., Sk • Search for a good solution near S, guided by a quality function fq and a distance function fd • NP-hard for any fd, fq, or an arbitrary linear combination of them

  16. A Greedy Approach: Algorithm • Compute a strict consensus • While the total distance is not too large, merge the two sibgroups with minimal (total) distance • Quality: fq = n - |C| • Distance from solution C to C': fd(C, C') = sum of costs of merging groups in C to obtain C' = sum of costs of assigning individuals to groups • Cost of assigning an individual to a group: benefit = alleles and allele pairs shared; cost = minimum edit distance
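The control loop of the greedy approach can be sketched independently of the merge cost. Each merge lowers |C| and so raises fq = n - |C|. In this sketch the allele-based benefit/edit-distance cost from the slide is replaced by a hypothetical stand-in (how many input solutions separate each cross pair of individuals), and `max_total` is an assumed threshold for "not too large":

```python
from itertools import combinations


def make_disagreement_cost(solutions):
    """Stand-in merge cost (an assumption, not the slide's allele-based cost):
    count, over all input solutions, the cross pairs (x in A, y in B) that the
    solution places in different groups."""
    labels = [{x: k for k, group in enumerate(sol) for x in group}
              for sol in solutions]

    def cost(a, b):
        return sum(1 for lab in labels for x in a for y in b if lab[x] != lab[y])

    return cost


def greedy_merge(groups, cost, max_total):
    """Repeatedly merge the cheapest pair of groups while the accumulated
    cost stays within max_total (a hypothetical stopping threshold)."""
    groups = [frozenset(g) for g in groups]
    total = 0
    while len(groups) > 1:
        (i, j), c = min(
            (((i, j), cost(groups[i], groups[j]))
             for i, j in combinations(range(len(groups)), 2)),
            key=lambda item: item[1],
        )
        if total + c > max_total:
            break
        total += c
        merged = groups[i] | groups[j]
        groups = [g for k, g in enumerate(groups) if k not in (i, j)] + [merged]
    return groups, total


if __name__ == "__main__":
    S1 = [{1, 2, 3}, {4, 5}, {6, 7}]
    S2 = [{1, 2, 3}, {4}, {5, 6, 7}]
    S3 = [{1, 2}, {3, 4, 5}, {6, 7}]
    strict = [{1, 2}, {3}, {4}, {5}, {6, 7}]
    print(greedy_merge(strict, make_disagreement_cost([S1, S2, S3]), max_total=1))
```

With this stand-in cost the three example solutions from the deck lead to merging {4} and {5} first, whereas the slides' example, driven by genotype data not shown, ends with {3, 6, 7}: the merge cost, not the loop, determines the result.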

  17. Auto Greedy Consensus • Change costs to average per-locus costs • Compare the maximum group error on a per-locus basis • Treat cost and benefit independently • To qualify, a merge must satisfy: • Cost ≤ maxcost • Benefit ≥ minbenefit • Benefit = maximum benefit among possible merges

  18. A Greedy Approach (example) • S1 = { {1,2,3}, {4,5}, {6,7} } • S2 = { {1,2,3}, {4}, {5,6,7} } • S3 = { {1,2}, {3,4,5}, {6,7} } • Strict consensus: S = { {1,2}, {3}, {4}, {5}, {6,7} } • After greedy merging: S = { {1,2}, {3,6,7}, {4}, {5} }
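The strict consensus keeps two individuals together only if every input solution puts them in the same group, i.e. it is the common refinement (meet) of the input partitions. A sketch, assuming each solution is a list of sets over the same individuals; the function name is illustrative:

```python
def strict_consensus(solutions):
    """Group individuals that share a group in every input solution."""
    labels = [{x: k for k, group in enumerate(sol) for x in group}
              for sol in solutions]
    blocks = {}
    for x in sorted(labels[0]):
        # Individuals with identical group labels in all solutions stay together.
        key = tuple(lab[x] for lab in labels)
        blocks.setdefault(key, set()).add(x)
    return sorted(blocks.values(), key=min)


if __name__ == "__main__":
    S1 = [{1, 2, 3}, {4, 5}, {6, 7}]
    S2 = [{1, 2, 3}, {4}, {5, 6, 7}]
    S3 = [{1, 2}, {3, 4, 5}, {6, 7}]
    print(strict_consensus([S1, S2, S3]))  # [{1, 2}, {3}, {4}, {5}, {6, 7}]
```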

  19. Testing and Validation: Protocol • Get a dataset with known sibgroups (real or simulated) • Find sibgroups using our algorithm • Compare the solutions • Partition distance (Gusfield '03), computed as an assignment problem • Compare to other sibship methods • Family Finder, COLONY
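The partition distance is the minimum number of individuals that must be removed for two partitions to agree, which equals n minus the largest total overlap under a one-to-one matching of groups. This sketch enumerates all matchings, so it only suits a handful of groups; for realistic sizes the same matching is an assignment problem (e.g. `scipy.optimize.linear_sum_assignment` with `maximize=True` on the overlap matrix). The function name and example data are assumptions:

```python
from itertools import permutations


def partition_distance(p, q):
    """n minus the best one-to-one group overlap between partitions p and q."""
    p = [set(g) for g in p]
    q = [set(g) for g in q]
    n = sum(len(g) for g in p)
    size = max(len(p), len(q))
    # Pad with empty groups so that every group can be matched.
    p += [set() for _ in range(size - len(p))]
    q += [set() for _ in range(size - len(q))]
    best = max(
        sum(len(p[i] & q[perm[i]]) for i in range(size))
        for perm in permutations(range(size))
    )
    return n - best


if __name__ == "__main__":
    truth = [{2, 4, 5, 6}, {1, 3}, {7, 8}]   # the groups from the example slide
    found = [{2, 4, 5}, {1, 3, 6}, {7, 8}]   # hypothetical reconstruction
    print(partition_distance(truth, found))  # moving individual 6 fixes it
```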

  20. Test Data • Salmon (Salmo salar), Herbinger et al., 1999: 351 individuals, 6 families, 4 loci; no missing alleles • Shrimp (Penaeus monodon), Jerry et al., 2006: 59 individuals, 13 families, 7 loci; some missing alleles • Ants (Leptothorax acervorum), Hammond et al., 2001: ants are a haplodiploid species; the data consist of 377 diploid worker ants • Simulated populations of juveniles for a range of values of the number of parents, offspring per parent, alleles per locus, and number of loci, and of the distributions of those

  21. Experimental Protocol • Generate F females and M males (F = M = 5, 10, 20) • Each with l loci (l = 2, 4, 6, 8, 10) • Each locus with a alleles (a = 10, 15) • Generate f families (f = 5, 10, 20) • For each family, select a female and a male uniformly at random • For each parent pair, generate o offspring (o = 5, 10) • For each offspring, at each locus, choose the allele outcome uniformly at random • Introduce random errors
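The protocol translates into a short simulator. Two details are not specified on the slide and are assumptions here: parental alleles are drawn uniformly from 1..a, and the error model replaces each allele with a random one at probability `error_rate`. Parents are drawn with replacement, so two families may share a parent (or a parent pair):

```python
import random


def simulate(n_females, n_males, n_loci, n_alleles, n_families, n_offspring,
             error_rate=0.0, seed=None):
    """Simulate juveniles following the protocol above (a sketch).

    Returns (offspring, family_of, parents): offspring[i] is a list of
    (paternal, maternal) allele pairs, family_of[i] its family index, and
    parents[f] the (mother, father) genotypes of family f.
    """
    rng = random.Random(seed)

    def random_adult():
        # Assumption: parental alleles are drawn uniformly from 1..n_alleles.
        return [(rng.randint(1, n_alleles), rng.randint(1, n_alleles))
                for _ in range(n_loci)]

    females = [random_adult() for _ in range(n_females)]
    males = [random_adult() for _ in range(n_males)]
    offspring, family_of, parents = [], [], []
    for family in range(n_families):
        mother, father = rng.choice(females), rng.choice(males)
        parents.append((mother, father))
        for _ in range(n_offspring):
            child = []
            for locus in range(n_loci):
                # Mendelian inheritance: one allele from each parent, uniformly.
                alleles = [rng.choice(father[locus]), rng.choice(mother[locus])]
                for k in range(2):
                    # Hypothetical error model: replace an allele at random.
                    if rng.random() < error_rate:
                        alleles[k] = rng.randint(1, n_alleles)
                child.append(tuple(alleles))
            offspring.append(child)
            family_of.append(family)
    return offspring, family_of, parents


if __name__ == "__main__":
    kids, fams, _ = simulate(5, 5, n_loci=4, n_alleles=10, n_families=5,
                             n_offspring=5, seed=1)
    print(len(kids), fams[:10])
```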

  22. Results

  23. Results

  24. Conclusions • Combinatorial algorithms with minimal assumptions • Behaves well on real and simulated data • Better than other methods with few loci and few large families • Error tolerant • Useful, high demand • New and improved: • Efficient implementation (Perez-Rathke et al., in submission) • Other objectives (bio vs math) (Ashley et al. '10) • Other genealogical relationships (Sheikh et al. '09, '10) • Different combinatorial approach (Brown & B-W, '10) • Pedigree amalgamation