
Christian Arnold Bioinformatics Group, University of Leipzig Bioinformatics Herbstseminar

Three Weeks of Experience at the formatics Institute. Christian Arnold Bioinformatics Group, University of Leipzig Bioinformatics Herbstseminar October 23rd, 2009. Content. The 10kTrees Project Phylogenetic Targeting Acknowledgements. 1. The 10kTrees Project. Goals.


Presentation Transcript


  1. Three Weeks of Experience at the formatics Institute Christian Arnold Bioinformatics Group, University of Leipzig Bioinformatics Herbstseminar October 23rd, 2009

  2. Content • The 10kTrees Project • Phylogenetic Targeting • Acknowledgements

  3. 1. The 10kTrees Project

  4. Goals • Updated primate phylogeny that includes phylogenetic uncertainty • Use the newest available sequence data, include as many primate species as possible, and update regularly • Using Bayesian methods, produce a set of >=10,000 primate-wide trees (with branch lengths) that are appropriate for taxonomically broad comparative research on primate behavior, ecology and morphology • Make it accessible to other researchers

  5. Methodology

  6. Version 1 vs. Version 2

  7. Preliminary consensus tree (figure) • Color key: green = Cercopithecines, blue = Hominoids, red = Platyrrhines, yellow = Tarsiers, brown = Strepsirrhines • Rooted with Galeopterus variegatus

  8. The 10kTrees Website http://10ktrees.fas.harvard.edu/

  9. Current Progress • Submitted to Evolutionary Anthropology, in press. • Will be presented at the AAPA conference (April 2010) in Albuquerque, New Mexico • Version 2 is almost finished • Available at http://10kTrees.fas.harvard.edu

  10. Summary • Bayesian approach is time-consuming, but works well, even though the data matrix is very sparse • Increased number of sequences in Version 2 dramatically reduces the need for constraints and improves the quality of tree and branch length estimates • Ongoing project • Total number of downloaded trees since June 2009: 95,800

  11. 2. Phylogenetic Targeting

  12. Which species should we study?

  13. Goals • For which species should we collect data in order to increase the size of comparative data sets?

  14. Example 1/2 • Hypothesis: Two characters (x and y) show correlated evolution • Goal: Test this hypothesis comparatively (e.g. by using phylogenetically independent contrasts and correlation tests) • Problem 1: Data have only been collected for x, but not for y • Solution 1: Collect data for y and test the hypothesis • Problem 2: From which species should we collect data for y? • Solution 2: Phylogenetic targeting!?
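
The comparative test named on this slide can be illustrated with a minimal sketch of phylogenetically independent contrasts (Felsenstein's standard recursion) followed by a correlation through the origin. This is not the speaker's code: the four-species tree, the branch lengths, the trait values in `X_TRAIT` and `Y_TRAIT`, and the function names are all hypothetical and chosen only to keep the example small.

```python
import math

# Hypothetical tree as nested tuples (an assumption for illustration):
# a leaf is (name, branch_length); an internal node is ((left, right), branch_length).
TREE = (
    (
        ((("A", 1.0), ("B", 1.0)), 1.0),
        ((("C", 1.0), ("D", 1.0)), 1.0),
    ),
    0.0,
)

# Hypothetical trait values for the two characters x and y.
X_TRAIT = {"A": 1.0, "B": 3.0, "C": 6.0, "D": 10.0}
Y_TRAIT = {"A": 2.0, "B": 5.0, "C": 13.0, "D": 19.0}


def contrasts(node, trait):
    """Return (node value, adjusted branch length, list of standardized contrasts)."""
    label, length = node
    if isinstance(label, str):  # leaf: observed value, no contrasts yet
        return trait[label], length, []
    x_left, v_left, c_left = contrasts(label[0], trait)
    x_right, v_right, c_right = contrasts(label[1], trait)
    # Standardized contrast between the two daughter lineages.
    contrast = (x_left - x_right) / math.sqrt(v_left + v_right)
    # Ancestral value: average weighted by inverse branch length.
    value = (x_left / v_left + x_right / v_right) / (1 / v_left + 1 / v_right)
    # Lengthen the branch below this node to reflect estimation uncertainty.
    adjusted = length + v_left * v_right / (v_left + v_right)
    return value, adjusted, c_left + c_right + [contrast]


def contrast_correlation(cx, cy):
    """Correlation of two contrast vectors, forced through the origin."""
    numerator = sum(a * b for a, b in zip(cx, cy))
    denominator = math.sqrt(sum(a * a for a in cx) * sum(b * b for b in cy))
    return numerator / denominator


cx = contrasts(TREE, X_TRAIT)[2]
cy = contrasts(TREE, Y_TRAIT)[2]
r = contrast_correlation(cx, cy)
```

A fully bifurcating tree with n species yields n - 1 contrasts, so every species for which y is still missing removes information from this test. That is the motivation for choosing carefully which species to measure next.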

  15. Example 2/2 • Brain size / cognitive data per species: 4 / ?, 9 / 7, 10 / ?, 3 / ?, 2 / ? • Collecting new data is time-consuming and expensive…

  16. Methods • Systematically generate all possible pairwise comparisons • For every pairwise comparison, calculate character differences for the two species that form the pair and assign a score • Determine set of phylogenetically independent pairs that maximizes the sum of all selected pair scores (maximal pairing)
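
The three steps above can be sketched as a brute-force search on a toy tree. This is only an illustration under stated assumptions, not the Phylogenetic Targeting implementation: the tree in `PARENT`, the trait values in `X`, and the scoring rule (absolute difference in the measured character) are hypothetical, and two pairs are treated as phylogenetically independent when the tree paths connecting them share no branch.

```python
from itertools import combinations

# Hypothetical toy tree given as child -> parent; "root" has no parent.
PARENT = {
    "A": "n1", "B": "n1",
    "C": "n2", "D": "n2",
    "n1": "root", "n2": "root",
}
# Hypothetical values of the character that has already been measured.
X = {"A": 4.0, "B": 9.0, "C": 10.0, "D": 3.0}


def ancestors(node):
    """Nodes from `node` up to the root, inclusive."""
    chain = [node]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def path_edges(a, b):
    """Branches on the path between leaves a and b, each named by its child node."""
    up_a, up_b = ancestors(a), ancestors(b)
    # Branches that lie on exactly one of the two root paths form the a-b path.
    return frozenset(set(up_a[:-1]) ^ set(up_b[:-1]))


def score(a, b):
    """Assumed scoring rule: absolute difference in the measured character."""
    return abs(X[a] - X[b])


def maximal_pairing(leaves):
    """Exhaustively search for independent pairs with the largest total score."""
    pairs = [(score(a, b), path_edges(a, b), (a, b))
             for a, b in combinations(leaves, 2)]
    best = (0.0, [])

    def extend(start, used, total, chosen):
        nonlocal best
        if total > best[0]:
            best = (total, list(chosen))
        for i in range(start, len(pairs)):
            pair_score, edges, pair = pairs[i]
            if used.isdisjoint(edges):  # no shared branch: pairs are independent
                chosen.append(pair)
                extend(i + 1, used | edges, total + pair_score, chosen)
                chosen.pop()

    extend(0, frozenset(), 0.0, [])
    return best


total, chosen = maximal_pairing(sorted(X))
```

On this toy tree the search selects the two sister pairs (A, B) and (C, D) with a total score of 12.0, because every cross-clade pair uses both branches leading to the root. The exhaustive search grows exponentially with the number of species, so it only serves to make the objective concrete.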

  17. Maximal pairing: Example

  18. Decomposition of the maximal pairing Time complexity: , for balanced trees:

  19. Simulation results 1/2 Detecting correlated character evolution, based on selection of 12 species • Random (Rnd) selection of species: Type 1 errors close to nominal level; power ~40%, independent of number of taxa; uses 67% of available variation • Phylogenetic targeting (PT) induced selection of species: Type 1 errors close to nominal level; power 67-81%, increases with number of taxa; uses 89% of available variation

  20. Simulation results 2/2 (figure) • Fraction of available variation after sampling, plotted against the number of selected species (12, 18, 24), comparing PT and Rnd

  21. Current Progress • A revised version will be resubmitted to American Naturalist in the not too distant future • TODO: Extend simulations and clarify some issues • Available at http://phylotargeting.fas.harvard.edu

  22. Summary • A focused selection of species can save valuable time and money • Phylogenetic targeting provides a very flexible approach and can address different questions in the context of limited resources • Dynamic programming algorithms are everywhere

  23. 3. Acknowledgements

  24. Thanks! • Harvard University • Max-Planck Institute for Evolutionary Anthropology • University of Leipzig • Charlie Nunn • Luke Matthews • Peter F. Stadler

  25. Any Questions? Thank you for your attention! Questions? If not: Cheers (it’s early, but not too early…)
