BCB 444/544

BCB 444/544. Lecture 31 Phylogenetics – Character-Based Methods #31_Nov05. Required Reading ( before lecture). Fri Oct 30 - Lecture 30 Phylogenetic – Distance-Based Methods Chp 11 - pp 142 – 169 Mon Nov 5 - Lecture 31 Phylogenetics – Parsimony and ML Chp 11 - pp 142 – 169

Presentation Transcript


  1. BCB 444/544 Lecture 31 Phylogenetics – Character-Based Methods #31_Nov05 BCB 444/544 F07 ISU Terribilini #31- Phylogenetics - Character-Based Methods

  2. Required Reading (before lecture) Fri Oct 30 - Lecture 30 Phylogenetics – Distance-Based Methods • Chp 11 - pp 142 – 169 Mon Nov 5 - Lecture 31 Phylogenetics – Parsimony and ML • Chp 11 - pp 142 – 169 Wed Nov 7 - Lecture 32 Machine Learning Fri Nov 9 - Lecture 33 Functional and Comparative Genomics • Chp 17 and Chp 18

  3. Assignments & Announcements Mon Oct 29 - HW#5 HW#5 = Hands-on exercises with phylogenetics and tree-building software Due: Mon Nov 5 (not Fri Nov 1 as previously posted)

  4. BCB 544 Only: New Homework Assignment 544 Extra#2 Due: √PART 1 - ASAP PART 2 - meeting prior to 5 PM Fri Nov 2 Part 1 - Brief outline of Project, email to Drena & Michael after response/approval, then: Part 2 - More detailed outline of project Read a few papers and summarize status of problem Schedule meeting with Drena & Michael to discuss ideas

  5. Seminars this Week BCB List of URLs for Seminars related to Bioinformatics: http://www.bcb.iastate.edu/seminars/index.html • Nov 7 Wed - BBMB Seminar 4:10 in 1414 MBB • Sharon Roth Dent, MD Anderson Cancer Center • Role of chromatin and chromatin modifying proteins in regulating gene expression • Nov 8 Thurs - BBMB Seminar 4:10 in 1414 MBB • Jianzhi George Zhang, U. Michigan • Evolution of new functions for proteins • Nov 9 Fri - BCB Faculty Seminar 2:10 in 102 Sci I • Amy Andreotti, ISU • Something about NMR

  6. Chp 11 – Phylogenetic Tree Construction Methods and Programs SECTION IV MOLECULAR PHYLOGENETICS Xiong: Chp 11 Phylogenetic Tree Construction Methods and Programs • Distance-Based Methods • Character-Based Methods • Phylogenetic Tree Evaluation • Phylogenetic Programs

  7. Tree Construction • Two main categories of tree building methods • Distance-based • Overall similarity between sequences • Character-based • Consider the entire MSA

  8. Summary of Distance-Based Methods • Clustering-based methods: • Computationally very fast and can handle large datasets that other methods cannot • Not guaranteed to find the best tree • Optimality-based methods: • Better overall accuracies • Computationally slow • All distance-based methods lose all sequence information and cannot infer the most likely state at an internal node

  9. Character-Based Methods • Based directly on the sequence characters in the MSA rather than overall distances • Count mutational events accumulated on sequences • Evolutionary dynamics of each character can be studied and ancestral sequences inferred • Two popular approaches • Parsimony • Maximum Likelihood (ML)

  10. Parsimony • Parsimony is based on Occam’s Razor – the simplest explanation is most likely correct • Goal: Find the tree that allows evolution of the sequences with the fewest changes

  11. Parsimony • Parsimony score of a tree: The smallest (weighted) number of steps required by the tree • Two parsimony problems: • Large Parsimony problem: Find the tree with the lowest parsimony score • Small Parsimony problem: Given a tree, find its parsimony score • Use the small parsimony problem to solve the large parsimony problem

  12. Algorithms for Small Parsimony • Fitch’s algorithm: • Based on set operations • Evolutionary steps have the same weight • Sankoff’s algorithm: • Based on dynamic programming • Allows steps to have different weights • Both algorithms compute the minimum (weighted) number of steps a tree requires at a given site

  13. Fitch’s Algorithm Example
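
The worked figure for this slide did not survive in the transcript. As a stand-in, here is a minimal sketch of Fitch's set-based pass for a single site. The nested-tuple tree encoding, the taxon names `a`–`d`, and the example site are assumptions made for illustration, not taken from the lecture.

```python
def fitch(tree, states):
    """Return (state_set, steps) for one site on a rooted binary tree.

    tree   -- a leaf name (str) or a pair (left, right) of subtrees
    states -- dict mapping each leaf name to its character at this site
    """
    if isinstance(tree, str):
        # A leaf contributes its observed character and no steps.
        return {states[tree]}, 0
    left_set, left_steps = fitch(tree[0], states)
    right_set, right_steps = fitch(tree[1], states)
    common = left_set & right_set
    if common:
        # Children agree on at least one state: no extra step needed.
        return common, left_steps + right_steps
    # Children disagree: take the union and count one substitution.
    return left_set | right_set, left_steps + right_steps + 1


# Hypothetical example: four taxa, one site.
tree = (("a", "b"), ("c", "d"))
site = {"a": "A", "b": "G", "c": "C", "d": "T"}
root_set, steps = fitch(tree, site)
print(steps)  # 3
```

Summing `steps` over all alignment columns gives the parsimony score of the tree when every change has the same weight.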

  14. Sankoff’s Algorithm • Allows for different weights for different evolutionary steps • Transitions (A <-> G or C <-> T) are more probable than transversions, so give a lower weight to transitions

  15. Sankoff’s Algorithm Example
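
The example figure is missing from the transcript, so the following is only a sketch of the dynamic-programming idea. The weights (transition = 1, transversion = 2), the nested-tuple tree encoding, and the example site are all assumptions chosen for illustration.

```python
import math

BASES = "ACGT"
PURINES = {"A", "G"}


def cost(i, j, transition=1, transversion=2):
    """Assumed step weights: transitions are cheaper than transversions."""
    if i == j:
        return 0
    same_class = (i in PURINES) == (j in PURINES)
    return transition if same_class else transversion


def sankoff(tree, states):
    """Return {base: minimum cost of the subtree if its root carries that base}."""
    if isinstance(tree, str):
        # A leaf can only carry its observed base.
        return {b: (0 if b == states[tree] else math.inf) for b in BASES}
    child_tables = [sankoff(child, states) for child in tree]
    # For each candidate base, pay the cheapest (change + subtree) cost per child.
    return {
        b: sum(min(cost(b, c) + table[c] for c in BASES) for table in child_tables)
        for b in BASES
    }


# Hypothetical example: four taxa, one site.
tree = (("a", "b"), ("c", "d"))
site = {"a": "A", "b": "G", "c": "C", "d": "T"}
root_table = sankoff(tree, site)
best_score = min(root_table.values())
print(best_score)  # 4: two transitions plus one transversion
```

The traceback step then walks back down from the cheapest root base, picking at each child the base that achieved the minimum.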

  16. Sankoff’s Algorithm Traceback

  17. Searching for a Most Parsimonious Tree • Solving the large parsimony problem requires searching all possible trees (or does it?) • Exhaustive search (exact) • Branch-and-Bound (exact) • Heuristic search methods (not exact)

  18. Exhaustive Search • Build the only possible unrooted tree for three taxa (can be randomly chosen) • Try all possible places to add the fourth taxon and score each tree • Try all places to add the fifth taxon to the trees and score again …
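
One way to sketch this enumeration: fix the first taxon as an attachment point, represent the rest of the tree as nested pairs, and add each new taxon on every edge in turn. The encoding and the function names `insertions` and `all_trees` are assumptions for illustration; scoring each tree is left out.

```python
def insertions(tree, new):
    """Yield every tree obtained by attaching `new` to one edge of `tree`."""
    yield (tree, new)  # attach on the edge directly above this node
    if not isinstance(tree, str):
        left, right = tree
        for t in insertions(left, new):
            yield (t, right)
        for t in insertions(right, new):
            yield (left, t)


def all_trees(taxa):
    """Enumerate all unrooted binary trees on `taxa` (at least three taxa).

    Each result is a nested pair over taxa[1:]; hanging taxa[0] off its
    root gives the corresponding unrooted tree.
    """
    trees = [(taxa[1], taxa[2])]  # the single tree for the first three taxa
    for new in taxa[3:]:
        trees = [t for tree in trees for t in insertions(tree, new)]
    return trees


print(len(all_trees(list("abcd"))))   # 3
print(len(all_trees(list("abcde"))))  # 15
```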

  19. Why Finding a True Tree is Difficult • The number of possible trees grows faster than exponentially with the number of species (or sequences) $n$ • Number of rooted trees: $N_r = \frac{(2n-3)!}{2^{n-2}\,(n-2)!}$ • Number of unrooted trees: $N_u = \frac{(2n-5)!}{2^{n-3}\,(n-3)!}$ • To find the best tree, you must explore all possibilities (or must you?)
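
To get a feel for how quickly these counts grow, the two formulas can be evaluated directly. This is a small sketch; the function names are chosen here for illustration.

```python
from math import factorial


def num_rooted(n):
    """Rooted binary trees on n >= 2 taxa: (2n-3)! / (2^(n-2) (n-2)!)."""
    return factorial(2 * n - 3) // (2 ** (n - 2) * factorial(n - 2))


def num_unrooted(n):
    """Unrooted binary trees on n >= 3 taxa: (2n-5)! / (2^(n-3) (n-3)!)."""
    return factorial(2 * n - 5) // (2 ** (n - 3) * factorial(n - 3))


for n in (3, 4, 5, 10):
    print(n, num_rooted(n), num_unrooted(n))
# 3 3 1
# 4 15 3
# 5 105 15
# 10 34459425 2027025
```

With only ten taxa there are already about two million unrooted trees, which is why exhaustive search is limited to small datasets.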

  20. Adding the Fourth Taxon

  21. Adding the Fifth Taxon

  23. Branch and Bound • Similar to exhaustive search except that we maintain the score of the best tree obtained so far • If the score of the current partial tree exceeds the current best score, backtrack and take the next available path • Main idea: The parsimony score of a tree can only increase (or stay the same) as we add another taxon

  24. Branch and Bound • When a tip of the search tree is reached, the tree is either optimal (and retained) or suboptimal (and rejected) • When all paths leading from the initial 3-taxon tree have been explored, the algorithm terminates, and all most parsimonious trees will have been identified

  25. Branch and Bound

  26. Branch and Bound • One way to find a reasonable upper bound quickly: • Use UPGMA or NJ to build a complete tree • Calculate the parsimony score of this tree and use it as the initial upper bound in our search (no most parsimonious tree can score worse than it)
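
The search described on the branch-and-bound slides can be sketched as follows. This is a minimal illustration under stated assumptions: trees are nested pairs hung off the first taxon, all steps have unit weight (Fitch scoring), and the function names, the optional `bound` parameter, and the toy alignment are hypothetical.

```python
def fitch_steps(tree, site):
    """Fitch pass for one site; returns (state_set, steps)."""
    if isinstance(tree, str):
        return {site[tree]}, 0
    (ls, lc), (rs, rc) = fitch_steps(tree[0], site), fitch_steps(tree[1], site)
    common = ls & rs
    return (common, lc + rc) if common else (ls | rs, lc + rc + 1)


def score(tree, alignment):
    """Parsimony score of a (possibly partial) tree summed over all sites."""
    n_sites = len(next(iter(alignment.values())))
    return sum(
        fitch_steps(tree, {t: s[i] for t, s in alignment.items()})[1]
        for i in range(n_sites)
    )


def insertions(tree, new):
    """Yield every tree obtained by attaching `new` to one edge of `tree`."""
    yield (tree, new)
    if not isinstance(tree, str):
        for t in insertions(tree[0], new):
            yield (t, tree[1])
        for t in insertions(tree[1], new):
            yield (tree[0], t)


def branch_and_bound(alignment, bound=float("inf")):
    """Return (best score, all trees achieving it); `bound` is a starting bound."""
    taxa = sorted(alignment)
    best = {"score": bound, "trees": []}

    def extend(subtree, remaining):
        full = (taxa[0], subtree)
        s = score(full, alignment)
        if s > best["score"]:
            return  # adding taxa can never lower the score, so prune here
        if not remaining:
            if s < best["score"]:
                best["score"], best["trees"] = s, []
            best["trees"].append(full)
            return
        for t in insertions(subtree, remaining[0]):
            extend(t, remaining[1:])

    extend((taxa[1], taxa[2]), taxa[3:])
    return best["score"], best["trees"]


# Hypothetical toy alignment (five taxa, four sites).
alignment = {"a": "AACG", "b": "AACT", "c": "GACT", "d": "GGTT", "e": "GGTG"}
best_score, best_trees = branch_and_bound(alignment)
```

Passing the parsimony score of a quickly built tree as `bound` lets the search prune from the very first branches instead of waiting until it reaches a complete tree.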

  27. Heuristic Search • Shortcuts have been designed to reduce the search space • Idea: Build a tree quickly (by NJ or some other fast method) and rearrange parts of it to explore some of the possible trees • Branch swapping • Nearest neighbor interchange • Subtree pruning and regrafting • Tree bisection and reconnection

  28. Nearest-Neighbor Interchange
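
The figure for this slide is not in the transcript. As a rough illustration: an internal branch joins four subtrees, two on each side, and a nearest-neighbor interchange swaps one subtree across that branch. That gives exactly two alternative arrangements per internal branch. The function name and tuple encoding below are assumptions.

```python
def nni_neighbors(a, b, c, d):
    """Given an internal branch separating (a, b) from (c, d),
    return the two trees reachable by one nearest-neighbor interchange."""
    return [
        ((a, c), (b, d)),  # swap b with c
        ((a, d), (b, c)),  # swap b with d
    ]


# a, b, c, d may themselves be whole subtrees, not just single taxa.
for neighbor in nni_neighbors("A", "B", ("C1", "C2"), "D"):
    print(neighbor)
```

A heuristic search would score each neighbor, move to the better one if it improves on the current tree, and repeat over all internal branches until no swap helps.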

  29. Subtree Pruning and Regrafting

  30. Tree Bisection and Reconnection

  31. Stepwise Addition – Another Heuristic • A greedy method • Start with a 3-taxon tree • Add one taxon at a time • Keep only the best tree found so far • No guarantee of optimality, but may provide a good starting point for a search

  32. Maximum Likelihood Method • ML is based on a Markov model of evolution • Observed: The species labeling the leaves • Hidden: The ancestral states • Transition probabilities: The mutation probabilities • Assumptions: • Only mutations are allowed • Sites are independent

  33. Models of Evolution at a Site • Transition probability matrix: $M = [m_{ij}]$, $i, j \in \{A, C, T, G\}$, where $m_{ij} = \mathrm{Prob}(i \to j \text{ mutation in 1 time unit})$ • Branches may have different lengths

  34. The Probability of an Assignment • Tree: internal nodes T (root), G, T; leaves A, G, C, T • $\text{Probability} = m_{TG} \cdot m_{GA} \cdot m_{GG} \cdot m_{TT} \cdot m_{TC} \cdot m_{TT}$
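
A quick numerical check of this product, as a sketch only. The matrix values (0.91 for no change, 0.03 for each change) are made up for illustration, and every branch is assumed to have length one time unit.

```python
BASES = "ACGT"

# Hypothetical one-step transition matrix: each row sums to 1.
M = {i: {j: (0.91 if i == j else 0.03) for j in BASES} for i in BASES}


def assignment_probability(m, x, y, z, leaves=("A", "G", "C", "T")):
    """Product of branch probabilities for root x, internal nodes y and z.

    y is the parent of the first two leaves, z of the last two.
    """
    l1, l2, l3, l4 = leaves
    return m[x][y] * m[y][l1] * m[y][l2] * m[x][z] * m[z][l3] * m[z][l4]


# The slide's assignment: root T, internal nodes G and T.
p = assignment_probability(M, "T", "G", "T")
print(p)
```

Three of the six branches involve a change and three do not, so under these assumed values the product is $0.03^3 \cdot 0.91^3$.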

  35. Ancestral Reconstruction: Most Likely Assignment • Tree: internal nodes X (root), Y, Z; leaves A, G, C, T • $L^* = \max_{X,Y,Z} \{ m_{XY} \cdot m_{YA} \cdot m_{YG} \cdot m_{XZ} \cdot m_{ZC} \cdot m_{ZT} \}$ • Compute using Viterbi algorithm

  36. Likelihood of a Tree • Tree: internal nodes X (root), Y, Z; leaves A, G, C, T • $L = \sum_{X,Y,Z} \{ m_{XY} \cdot m_{YA} \cdot m_{YG} \cdot m_{XZ} \cdot m_{ZC} \cdot m_{ZT} \}$ • Compute using forward algorithm
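
The maximum over ancestral assignments and the sum over them share the same structure, so both can be computed by brute force over all 64 assignments or by a bottom-up pass that combines each subtree once. The sketch below does both for comparison. The matrix values are hypothetical, all branches are assumed to have unit length, and, following the slide formulas, no prior is placed on the root state.

```python
from itertools import product

BASES = "ACGT"
# Hypothetical one-step transition matrix (rows sum to 1).
M = {i: {j: (0.91 if i == j else 0.03) for j in BASES} for i in BASES}


def term(x, y, z):
    """Probability of one assignment: Y has leaves A, G and Z has leaves C, T."""
    return M[x][y] * M[y]["A"] * M[y]["G"] * M[x][z] * M[z]["C"] * M[z]["T"]


# Brute force: enumerate every (X, Y, Z).
assignments = list(product(BASES, repeat=3))
best_assignment = max(assignments, key=lambda xyz: term(*xyz))
l_star = term(*best_assignment)                       # most likely assignment
likelihood = sum(term(*xyz) for xyz in assignments)   # likelihood of the tree


def bottom_up(combine):
    """Same quantities via one pass from the leaves to the root.

    combine=max gives the Viterbi-style value, combine=sum the forward-style value.
    """
    per_root = []
    for x in BASES:
        left = combine(M[x][y] * M[y]["A"] * M[y]["G"] for y in BASES)
        right = combine(M[x][z] * M[z]["C"] * M[z]["T"] for z in BASES)
        per_root.append(left * right)
    return combine(per_root)


print(l_star, bottom_up(max))
print(likelihood, bottom_up(sum))
```

The bottom-up version touches each branch once per pair of states, so its cost grows linearly with the number of taxa, whereas brute force grows exponentially with the number of internal nodes.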

  37. Maximum Likelihood Comments • ML is robust • ML converges to the correct answer as more data is added • Can put in a Bayesian statistical framework to obtain a distribution of possible phylogenies • ML can be slow

  38. Phylogenetic Tree Evaluation • Bootstrapping • Jackknifing • Bayesian Simulation • Statistical difference tests (are two trees significantly different?) • Kishino-Hasegawa Test (paired t-test) • Shimodaira-Hasegawa Test (χ² test)

  39. Bootstrapping • A bootstrap sample is obtained by sampling sites randomly with replacement • Obtain a data matrix with same number of taxa and number of characters as original one • Construct trees for samples • For each branch in original tree, compute fraction of bootstrap samples in which that branch appears • Assigns a bootstrap support value to each branch • Idea: If a grouping has a lot of support, it will be supported by at least some positions in most of the bootstrap samples
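
The resampling and support-counting steps above can be sketched as follows. Tree construction itself is left out. The alignment, the representation of a tree as a set of clades (each a `frozenset` of taxon names), and the function names are assumptions for illustration.

```python
import random


def bootstrap_sample(alignment, rng):
    """Resample alignment columns with replacement, keeping the same size."""
    n_sites = len(next(iter(alignment.values())))
    cols = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {taxon: "".join(seq[c] for c in cols) for taxon, seq in alignment.items()}


def branch_support(original_clades, replicate_clades):
    """Fraction of replicate trees that contain each clade of the original tree."""
    n = len(replicate_clades)
    return {
        clade: sum(clade in rep for rep in replicate_clades) / n
        for clade in original_clades
    }


# Hypothetical toy alignment.
alignment = {"a": "AACG", "b": "AACT", "c": "GACT", "d": "GGTT"}
rng = random.Random(0)
replicate = bootstrap_sample(alignment, rng)

# Hypothetical clade sets, as if four replicate trees had been built.
ab, ac = frozenset("ab"), frozenset("ac")
support = branch_support({ab}, [{ab}, {ab}, {ac}, {ab}])
print(support[ab])  # 0.75
```

In practice each replicate alignment is passed to the same tree-building method used for the original data, and the clades of each resulting tree are collected before computing support.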

  40. Bootstrapping Comments • Bootstrapping doesn’t really assess the accuracy of a tree; it only indicates the consistency of the data • To get reliable statistics, bootstrapping needs to be done on your tree 500 – 1000 times. This is a big problem if your tree took a few days to construct

  41. Jackknifing • Another resampling technique • Randomly delete half of the sites in the dataset • Construct new tree with this smaller dataset, see how often taxa are grouped • Advantage – sites aren’t duplicated • Disadvantage – again really only measuring consistency of the data
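
For comparison with bootstrapping, a jackknife replicate keeps a random subset of columns without replacement. This is a minimal sketch; the function name, the `keep_fraction` parameter, and the toy alignment are assumptions.

```python
import random


def jackknife_sample(alignment, rng, keep_fraction=0.5):
    """Keep a random subset of columns (no duplicates), preserving their order."""
    n_sites = len(next(iter(alignment.values())))
    n_keep = max(1, int(n_sites * keep_fraction))
    cols = sorted(rng.sample(range(n_sites), n_keep))
    return {taxon: "".join(seq[c] for c in cols) for taxon, seq in alignment.items()}


# Hypothetical toy alignment.
alignment = {"a": "AACG", "b": "AACT", "c": "GACT", "d": "GGTT"}
replicate = jackknife_sample(alignment, random.Random(0))
print(replicate)
```

Each replicate is half the length of the original alignment, so trees are built from less data than in a bootstrap replicate.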

  42. Bayesian Simulation • Using a Bayesian ML method to produce a tree automatically calculates the probability of many trees during the search • Most trees sampled in the Bayesian ML search are near an optimal tree

  43. Phylogenetic Programs • Huge list at: • http://evolution.genetics.washington.edu/phylip/software.html • PAUP* - one of the most popular programs, commercial, Mac and Unix only, nice user interface • PHYLIP – free, multiplatform, a bit difficult to use but web servers make it easier • WebPhylip – another interface for PHYLIP online

  44. Phylogenetic Programs • TREE-PUZZLE – uses a heuristic to allow ML on large datasets, also available as a web server • PHYML – web based, uses a fast hill-climbing heuristic • MrBayes – Bayesian program, fast and can handle large datasets, multiplatform download • BAMBE – web based Bayesian program

  45. Final Comments on Phylogenetics • No method is perfect • Different methods make very different assumptions • If multiple methods using different assumptions come up with similar results, we should trust the results more than any single method
