ABC

ABC. The method: practical overview. Index. Applications of ABC in population genetics Motivation for the application of ABC ABC approach Characteristics of an ABC methodology Algorithm of an ABC inference Limitations of the ABC approach Typical ABC run Present work


Presentation Transcript


  1. ABC The method: practical overview

  2. Index • Applications of ABC in population genetics • Motivation for the application of ABC • ABC approach • Characteristics of an ABC methodology • Algorithm of an ABC inference • Limitations of the ABC approach • Typical ABC run • Present work • Compare the ABC algorithm with MCMC • Study the use of different summary statistics • Study the use of ABC in more complex scenarios • “State of the art” of the software • Future developments

  4. Application of ABC in population genetics [Figure: population diagram with labels Popanc, Pop1, Pop2, Pop3, Pop4]

  6. Motivation for the application of ABC • Two processes are usually considered important in determining population structure: - Gene flow; - Population splitting. • Most often these processes are modelled and inferred separately; • Recent advances by Nielsen and Wakeley (2001) and Hey and Nielsen (2004), which use Markov Chain Monte Carlo (MCMC) in a two-population scenario, make it possible to study both processes at the same time; • An Approximate Bayesian Computation (ABC) method developed by Beaumont (2006) deals with the same problem but in a three-population scenario. The idea is to avoid problems associated with MCMC, such as poor mixing and long convergence times, but the method relies on a couple of approximations. The aim of this study is to see how good these approximations are.

  7. Motivation for the application of ABC Background using MCMC: • Wakeley, Hey (1997, Genetics) - developed an algorithm to estimate historical demographic parameters. • Nielsen, Wakeley (2001, Genetics) - developed an MCMC algorithm to infer demographic parameters in an “Isolation with Migration” model. • Hey, Nielsen (2004, Genetics) - present the IM program (software that uses the previously developed MCMC algorithm). • Hey et al. (2004, Mol. Ecol.) - introduce changes to the IM software (HapSTR data can be used). • Won, Hey (2005, Mol. Biol. Evol.) - present a case study in 3 populations of chimpanzees. • Hey (2005, PLoS Biol.) - the peopling of the Americas; introduces changes to the IM software (founder population size can be inferred).

  8. Motivation for the application of ABC Background using ABC: • Tavaré et al. (1997, Genetics) - presented a simulation-based algorithm to infer specific demographic parameters. • Pritchard et al. (1999, MBE) - introduce the first ABC approach, with a rejection step, to estimate demographic parameters. • Beaumont et al. (2002, Genetics) - introduce a regression method within an ABC framework to estimate demographic parameters. • Marjoram et al. (2003, PNAS) - use MCMC without likelihoods within an ABC framework. • Beaumont (2006, “Simulation, Genetics, and Human Prehistory”) - uses regression-based ABC to estimate demographic parameters within an “Isolation with Migration” model for microsatellites in three populations. • Hickerson et al. (2006, in press) - compare ABC with IM in two-population studies for sequence data.

  10. ABC approach • Characteristics of an ABC methodology Replace the data with summary statistics: • Summarize a large amount of data into a few representative values • By replacing the data with summary statistics, it is easier to decide how ‘similar’ data sets are to each other. Get the posterior distribution by sampling values from it: • Simulate samples (F_i, D_i) from the joint density p(F, D): • First sample from the prior: F_i ~ p(F) • Then simulate the data, given F_i: D_i ~ p(D | F_i) • The posterior distribution, p(F | D) = p(D, F) / p(D), for any given D, can be estimated by the proportion of all simulated points that correspond to that particular D and F, divided by the proportion of points corresponding to D (ignoring F).
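The counting argument on this slide can be illustrated with a toy model. The sketch below is not from the presentation: it assumes a hypothetical parameter F restricted to three values with a uniform prior, and data D given by the number of successes in 10 Bernoulli trials with success probability F. It only shows how p(F | D) follows from joint counts divided by the marginal count of D.

```python
import random
from collections import Counter

def estimate_posterior(observed, n_sims=100_000, n_trials=10, seed=1):
    """Estimate p(F | D = observed) by counting simulated (F, D) pairs."""
    rng = random.Random(seed)
    grid = [0.2, 0.5, 0.8]              # hypothetical discrete support for F
    joint = Counter()
    for _ in range(n_sims):
        f = rng.choice(grid)            # F_i ~ p(F), uniform over the grid
        d = sum(rng.random() < f for _ in range(n_trials))  # D_i ~ p(D | F_i)
        joint[(f, d)] += 1
    # p(F | D) = p(D, F) / p(D): joint count over the marginal count of D
    marginal = sum(c for (_, d), c in joint.items() if d == observed)
    return {f: joint[(f, observed)] / marginal for f in grid}

posterior = estimate_posterior(observed=7)
for f, p in posterior.items():
    print(f"p(F = {f} | D = 7) is about {p:.3f}")
```

With real genetic data, exact matches on D are essentially never produced, which is why the data are replaced with summary statistics and compared within a tolerance.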

  12. ABC approach • Algorithm of an ABC inference [Figure: axes Parameter, F and SummStats, S; labels: Set of priors (F); parameter draws F1, F2, F3, F4; Obtained genetic data, in (Nordborg, 2001); Get summary statistics (S); observed value s’; Joint distribution (S, F)]

  13. ABC approach • Algorithm of an ABC inference By extracting the points near the real data set we obtain the posterior. [Figure: axes Parameter, F1 and SummStats, S; labels: Joint distribution (S, F); observed value s’; Posterior distribution p(F1 | S = s’)]

  15. ABC approach • Limitations • Natural limitation due to lack of information in data sets • Limitation on the number of summary statistics used • Limitation on the calculation of summary statistics (time consuming) • Limitation on the time consumption of the simulation step

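  17. ABC approach • Limitation on the number of summary statistics used [Figure, two panels. “Summary Statistics = 1”: F against S, with the observed value s’ and the points (F, S = s’). “Summary Statistics = 2”: F against S1 and S2, with the observed values s’1 and s’2 and the points (F, S1 = s’1, S2 = s’2)]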
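One way to see why the number of summary statistics is limiting: for a fixed tolerance, the share of simulations that land near the observed values shrinks quickly as statistics are added. The sketch below is an illustration only. It assumes hypothetical standardized statistics that are independent standard normal draws, places the observed value at the origin, and uses an arbitrary tolerance of 0.5.

```python
import math
import random

def acceptance_rate(n_stats, tol=0.5, n_sims=20_000, seed=3):
    """Fraction of simulations whose statistics fall within tol of the observed point."""
    rng = random.Random(seed)
    accepted = 0
    for _ in range(n_sims):
        # Euclidean distance between simulated statistics and the observed value (origin)
        dist = math.sqrt(sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(n_stats)))
        accepted += dist <= tol
    return accepted / n_sims

rates = {k: acceptance_rate(k) for k in (1, 2, 4, 8)}
for k, r in rates.items():
    print(f"{k} summary statistics: acceptance rate {r:.4f}")
```

Keeping the same number of accepted points with more statistics therefore requires either many more simulations or a wider tolerance.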

  20. ABC approach • Typical ABC run • Step 1: simulation • Step 2: getting the posterior distribution • Step 3: estimation • Choosing the priors • Choosing the summary statistics • Choosing a “rejection” method for the simulated data

  21. ABC approach • Typical ABC run Rejection method (Pritchard et al., 1999) [Figure: axes Parameter, F and SummStats, S; labels: s’, “real” data; d, tolerance; Posterior distribution p(F | S)]
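The rejection step can be sketched as follows. This is a minimal illustration, not the software described in the presentation: the model (a normal mean with a uniform prior on 0 to 10), the single summary statistic (the sample mean), and the tolerance value are all assumptions chosen to keep the example small.

```python
import random
import statistics

def simulate_summary(f, rng, n=20):
    """Simulate a data set given parameter F and reduce it to one summary statistic."""
    data = [rng.gauss(f, 1.0) for _ in range(n)]   # hypothetical model: Normal(F, 1)
    return statistics.fmean(data)

def abc_rejection(s_obs, d=0.1, n_sims=20_000, seed=7):
    """Keep the parameter draws whose simulated statistic lies within d of s_obs."""
    rng = random.Random(seed)
    accepted = []
    for _ in range(n_sims):
        f = rng.uniform(0.0, 10.0)      # draw from the (assumed) uniform prior
        s = simulate_summary(f, rng)    # simulate and summarize
        if abs(s - s_obs) <= d:         # reject unless close to the "real" data
            accepted.append(f)
    return accepted

posterior_sample = abc_rejection(s_obs=4.0)
print(len(posterior_sample), "accepted; posterior mean is about",
      round(statistics.fmean(posterior_sample), 2))
```

A smaller d gives a closer approximation to p(F | S = s’) but accepts fewer points, so the choice trades accuracy against simulation time.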

  22. ABC approach • Typical ABC run Local Linear Multiple Regression adjustment and Weighting (Beaumont et al., 2002) [Figure: axes Parameter, F and SummStats, S; labels: s’, “real” data; Regression; Weighting; Posterior distribution p(F | S)]

  23. ABC approach • Typical ABC run [Equations not preserved in the transcript; the slide labels are:] • Linear multiple regression: E[P(F | S = s)] • Vector of standardized summstats • Correlation coefficients vector • Local weighting: we want to minimize the least-squares error, where the weights come from an Epanechnikov kernel over a spherical acceptance region
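The equations of this slide are missing from the transcript. Assuming they follow the local-linear regression of Beaumont et al. (2002), which the previous slide cites, they would take roughly the form below, written with the symbols used here (F for the parameter, s’ for the observed summary statistics, d for the tolerance). The coefficient names α and β and the normalizing constant c are notation borrowed for this sketch, not taken from the slides.

```latex
% Linear model for the accepted points i = 1, ..., m
F_i = \alpha + (\mathbf{s}_i - \mathbf{s}')^{T}\boldsymbol{\beta} + \varepsilon_i

% Weighted least-squares criterion to be minimized
\sum_{i=1}^{m} \left\{ F_i - \alpha - (\mathbf{s}_i - \mathbf{s}')^{T}\boldsymbol{\beta} \right\}^{2}
  K_d\!\left( \lVert \mathbf{s}_i - \mathbf{s}' \rVert \right)

% Epanechnikov kernel; it is zero outside the spherical acceptance region of radius d
K_d(t) =
\begin{cases}
  c\, d^{-1} \left( 1 - (t/d)^{2} \right), & t \le d \\
  0, & t > d
\end{cases}
```

Under this reading, the fitted intercept α estimates the posterior mean E[F | S = s’].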

  24. ABC approach • Typical ABC run Least squares gives an estimate of the posterior mean. To obtain samples from the posterior distribution we adjust the parameter values as [equation not preserved in the transcript]. That is, we are assuming that the conditional mean of the parameter is a linear function of the summary statistics, but all other moments remain the same.
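A minimal sketch of the adjustment step, assuming it takes the form given by Beaumont et al. (2002): each accepted value F_i is replaced by F_i minus β(s_i − s’), where β is the fitted slope. To stay dependency-free, the example uses a single summary statistic, so the multiple regression reduces to a weighted simple regression. The toy model (S equals F plus normal noise) and all function names are hypothetical.

```python
import random

def epanechnikov(t, d):
    """Unnormalized Epanechnikov weight: 1 - (t/d)^2 within tolerance d, else 0."""
    return 1.0 - (t / d) ** 2 if t <= d else 0.0

def regression_adjust(params, stats, s_obs, d):
    """Weighted regression of F on (S - s'), then shift accepted F values to S = s'."""
    pts = [(f, s - s_obs, epanechnikov(abs(s - s_obs), d))
           for f, s in zip(params, stats)]
    pts = [(f, x, w) for f, x, w in pts if w > 0.0]   # points inside the tolerance
    sw = sum(w for _, _, w in pts)
    mx = sum(w * x for _, x, w in pts) / sw           # weighted mean of S - s'
    mf = sum(w * f for f, _, w in pts) / sw           # weighted mean of F
    beta = (sum(w * (x - mx) * (f - mf) for f, x, w in pts)
            / sum(w * (x - mx) ** 2 for _, x, w in pts))
    alpha = mf - beta * mx                            # intercept: posterior-mean estimate
    adjusted = [(f - beta * x, w) for f, x, w in pts] # adjusted values with their weights
    return alpha, beta, adjusted

rng = random.Random(11)
params = [rng.uniform(0.0, 10.0) for _ in range(20_000)]   # draws from a uniform prior
stats = [f + rng.gauss(0.0, 0.5) for f in params]          # toy simulated statistic
alpha, beta, adjusted = regression_adjust(params, stats, s_obs=4.0, d=1.0)
print(f"posterior mean estimate {alpha:.2f}, slope {beta:.2f}, {len(adjusted)} points")
```

The adjusted values, used together with their kernel weights, serve as an approximate sample from the posterior; the adjustment allows a wider tolerance than plain rejection for the same accuracy, provided the linearity assumption holds locally.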

  26. Present Work One simple case: • 6 parameters to be estimated + m (mutation rate) [Figure: model diagram with labels Popanc (Neanc), tev1, t, m, m1, m2, Pop1 (Ne1), Pop2 (Ne2)]

  27. Present Work Summary Statistics used • Sequence Data: • mean of pairwise differences • number of segregating sites • number of haplotypes • Each statistic is computed in each population and in both populations joined together
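The three statistics listed on this slide can be computed as in the sketch below. It assumes aligned sequences of equal length stored as strings; the two small samples are invented for illustration and the function names are hypothetical.

```python
from itertools import combinations

def mean_pairwise_differences(seqs):
    """Average number of differing sites over all pairs of sequences."""
    diffs = [sum(a != b for a, b in zip(x, y)) for x, y in combinations(seqs, 2)]
    return sum(diffs) / len(diffs)

def segregating_sites(seqs):
    """Number of alignment columns showing more than one allele."""
    return sum(len(set(column)) > 1 for column in zip(*seqs))

def n_haplotypes(seqs):
    """Number of distinct sequences in the sample."""
    return len(set(seqs))

def summarize(seqs):
    return {
        "mean_pairwise": mean_pairwise_differences(seqs),
        "segregating_sites": segregating_sites(seqs),
        "haplotypes": n_haplotypes(seqs),
    }

pop1 = ["ACGTA", "ACGTA", "ACGAA"]   # invented sample from population 1
pop2 = ["TCGTA", "TCGTC"]            # invented sample from population 2

# Each statistic in each population and in both populations joined together
summaries = {"pop1": summarize(pop1), "pop2": summarize(pop2),
             "joined": summarize(pop1 + pop2)}
print(summaries)
```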

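  28. Present Work Simulated “real” data and Prior information [Figure: one panel per parameter, showing the “real” value and the plotted range. Ne1 = 1000 (0 to 10000); Ne2 = 1000 (0 to 10000); Neanc = 1000 (0 to 10000); Mig1 = 0.01 (0 to 0.05); Mig2 = 0.01 (0 to 0.05); Tev = 500 (0 to 5000). Legend: “real” data, ABC method, prior distribution, MCMC method]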

  29. Present Work Ne1, no migration [Figure: panels sim1 to sim10]

  30. Present Work Ne2, no migration [Figure: panels sim1 to sim10]

  31. Present Work Neanc, no migration [Figure: panels sim1 to sim10]

  32. Present Work Te1, no migration [Figure: panels sim1 to sim10]

  33. Present Work ABC vs MCMC [Figure panels. Data 1 (no migration), Simulation 7: Ne1, Ne2, Neanc, Tev. Data 2 (migration = 0.01), Simulation 9: Ne1, Ne2, Neanc, Mig1, Mig2, Tev]

  34. Present Work ABC vs MCMC (500 000 iter, tol = 0.02) [Tables not preserved in the transcript: MISE for no migration; MISE for migration = 0.01]

  36. Present Work Summary Statistics used • Sequence Data: • mean of pairwise differences • number of segregating sites • number of haplotypes • variance of pairwise differences • Shannon’s index • number of singletons • Each statistic is computed in each population and in both populations joined together
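The three additional statistics can be sketched as below. The slide does not define them, so the definitions here are assumptions: the variance is the sample variance of the pairwise differences, Shannon’s index is computed on haplotype frequencies with natural logarithms, and a singleton is a variable site where some allele is carried by exactly one sequence. The sample sequences are invented.

```python
import math
from collections import Counter
from itertools import combinations

def variance_pairwise_differences(seqs):
    """Sample variance of the number of differences over all sequence pairs."""
    diffs = [sum(a != b for a, b in zip(x, y)) for x, y in combinations(seqs, 2)]
    mean = sum(diffs) / len(diffs)
    return sum((d - mean) ** 2 for d in diffs) / (len(diffs) - 1)

def shannon_index(seqs):
    """Shannon's index on haplotype frequencies: -sum(p * ln p)."""
    n = len(seqs)
    return -sum((c / n) * math.log(c / n) for c in Counter(seqs).values())

def n_singletons(seqs):
    """Number of variable sites where some allele appears in exactly one sequence."""
    return sum(1 for column in zip(*seqs)
               if len(set(column)) > 1 and 1 in Counter(column).values())

sample = ["ACGTA", "ACGTA", "ACGAA", "TCGTA", "TCGTC"]   # invented joined sample
extra = {
    "var_pairwise": variance_pairwise_differences(sample),
    "shannon": shannon_index(sample),
    "singletons": n_singletons(sample),
}
print(extra)
```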

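  37. Present Work Simulated “real” data and Prior information [Figure: one panel per parameter, showing the “real” value and the plotted range. Ne1 = 1000 (0 to 10000); Ne2 = 1000 (0 to 10000); Neanc = 1000 (0 to 10000); Mig1 = 0.01 (0 to 0.05); Mig2 = 0.01 (0 to 0.05); Tev = 500 (0 to 5000). Legend: “real” data, prior distribution, MCMC based method, standard, previous + var pairwise dif, previous + Shannon’s, previous + singletons]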

  38. Present Work Summary Statistics (500 000 iter, tol = 0.02) [Figure panels. Data 1 (no migration), Simulation 7: Ne1, Ne2, Neanc, Tev. Data 2 (migration = 0.01), Simulation 9: Ne1, Ne2, Neanc, Mig1, Mig2, Tev]

  39. Present Work Summary Statistics (7 000 000 iter, tol = 0.02) [Figure panels. Data 1 (no migration), Simulation 7: Ne1, Ne2, Neanc, Tev. Data 2 (migration = 0.01), Simulation 9: Ne1, Ne2, Neanc, Mig1, Mig2, Tev]

  40. Present Work Summary Statistics (7 000 000 iter, tol = 0.02) [Tables not preserved in the transcript: MISE for no migration; MISE for migration = 0.01]

  41. Present Work Summary Statistics (7 000 000 iter, tol = 0.02) [Tables not preserved in the transcript: adjusted R2 for no migration; adjusted R2 for migration = 0.01]

  43. Three populations model • 11 parameters to be estimated + topology + m (mutation rate) [Figure: model diagram with labels Popanc1 (Neanc1), tev1, Popanc2 (Neanc2), tev2, m, manc, m1, m2, m3, Pop1 (Ne1), Pop2 (Ne2), Pop3 (Ne3)]

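  44. Present Work Simulated “real” data and Prior information [Figure: one panel per parameter, showing the “real” value and the plotted range. Mig1 = 0.01 (0 to 0.05); Ne1 = 1000 (0 to 10000); Ne2 = 1000 (0 to 10000); Ne3 = 1000 (0 to 10000); Neanc2 = 1000 (0 to 10000); Neanc1 = 1000 (0 to 10000); Tev1 = 1500 (0 to 5000); Mig2 = 0.01 (0 to 0.05); Mig3 = 0.01 (0 to 0.05); Miganc = 0.01 (0 to 0.05); Tev2 = 500 (0 to 5000). Legend: free top, fixed top]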

  45. Present Work Three Populations model (no migration) [Figure panels. Data 1 (no migration), Simulation 7: Ne1, Ne2, Ne3, Neanc2, Neanc1, Tev1, Tev2. Topology: ((2,3)1)]

  46. Present Work Three Populations model (migration = 0.01) [Figure panels. Data 2 (migration = 0.01), Simulation 6: Mig1, Ne1, Ne2, Ne3, Neanc2, Neanc1, Tev1, Mig2, Mig3, Miganc, Tev2. Topology: ((1,2)3)]

  47. Present Work Three Populations model (500 000 iter, tol = 0.02) [Tables not preserved in the transcript: no migration; migration = 0.01]

  48. Conclusions: • ABC is up to 2 orders of magnitude faster for a single locus • ABC modes are similar to MCMC modes, but overall precision is lower • No substantial improvement with more summary statistics • No substantial improvement with more iterations • ABC is able to consider more complex scenarios, but the ability to infer parameters is reduced when considering migration

  50. Present Work • The user-friendly version of the program (initial stage) • Features of the program: • Use of heredity scalars for each locus • Use of different types of DNA data at the same time (microsatellite and DNA sequence) • Use of an unlimited number of populations within an IM model • Use of different combinations of 7 different summary statistics for each DNA data type • Freeware, with source code available (soon)
