
Molecular Phylogenetics


Presentation Transcript


  1. Molecular Phylogenetics

  2. Phylogenetic trees are about visualizing evolutionary relationships • “Nothing in Biology Makes Sense Except in the Light of Evolution” (Theodosius Dobzhansky, 1900-1975)

  3. Phylogeny • Hypothesis of evolutionary relationships • Phylogenetic tree = graphical summary of evolutionary history • We have been using trees throughout the semester • Now we will examine how to construct them • Phylogeny is only an estimate

  4. Phylogenetics • Under Darwin’s hypothesis of common descent, species in the same genus stem from a recent common ancestor • Hierarchical classification reflects not a mystical ordering of the universe, but rather a real historical process

  5. Phylogenies • Species tree (how are my species related?) • contains only one representative from each species • when did speciation take place? • all nodes indicate speciation events • Gene tree (how are my genes related?) • may contain several genes from the same species (e.g., copies produced by gene duplication) • nodes represent either speciation or gene duplication events

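  6. Phylogenetic Trees • Diagram consisting of branches (edges) and nodes [Figure: tree with terminal nodes A, B, C, D and E, interior nodes, branches (edges) and the root of the tree labeled] • Each branch defines a split (bipartition) of the taxa, e.g., the split separating A and B from C, D and E, also written AB|CDE or portrayed as **---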
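The two split notations on this slide can be sketched in Python. This is a minimal illustration that assumes single-character taxon labels listed in a fixed order. The names `split_as_bar` and `split_as_stars` are hypothetical helpers and do not come from any phylogenetics package.

```python
def split_as_bar(taxa, side):
    """Write a split (bipartition) as 'AB|CDE'; assumes single-character taxon labels."""
    side = set(side)
    if not side <= set(taxa):
        raise ValueError("split contains taxa that are not in the tree")
    left = "".join(t for t in taxa if t in side)
    right = "".join(t for t in taxa if t not in side)
    return f"{left}|{right}"


def split_as_stars(taxa, side):
    """Write the same split as '**---': '*' for taxa on one side of the branch, '-' for the rest."""
    side = set(side)
    if not side <= set(taxa):
        raise ValueError("split contains taxa that are not in the tree")
    return "".join("*" if t in side else "-" for t in taxa)


taxa = ["A", "B", "C", "D", "E"]
print(split_as_bar(taxa, {"A", "B"}))    # AB|CDE
print(split_as_stars(taxa, {"A", "B"}))  # **---
```

A split and its complement describe the same branch. Writing {C, D, E} as `--***` therefore gives the same bipartition as `**---`.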

  7. Unrooted vs. rooted trees

  8. Rooting a Phylogeny • Several methods used to identify polarity • Most commonly used is the outgroup method • The character state of the target taxa is compared with that of a relative that diverged earlier • Outgroup represents the ancestral state • Identify outgroup from other phylogenetic studies or fossil data • Good to use several outgroups at once

  9. Rooting Using an Outgroup • 1. The outgroup should be a sequence (or set of sequences or taxon) known to be less closely related to the rest of the sequences (taxa) than they are to each other • 2. It should ideally be as closely related as possible to the rest of the sequences (taxa) while still satisfying condition 1 • The root must be somewhere between the outgroup and the rest (either on the node or in a branch) • The POINT of rooting (using an outgroup) is to include the ancestor of the group of interest in the phylogeny!

  10. Terms • Clade: A set of species (or sequences) which includes all of the species (or sequences) derived from a single common ancestor • Monophyly • Polyphyly • Paraphyly

  11. Cladograms VS. Phylograms • Cladogram • Only shows you the relationships between taxa • Branch lengths provide no data! • Phylogram • Shows you relationships AND the amount of change (evolution) inferred along each branch • Therefore, branch lengths are very important!

  12. Cladogram

  13. Phylogram [sometimes Phenogram] (branch lengths mean something)

  14. Cladograms VS. Phylograms [Figure: the same six species (Species A to F) drawn as a cladogram and as a phylogram; scale bar: 5 changes]

  15. Phylogenetics Terms • Monophyletic Group • All members are believed to stem from a single common ancestor, and the group includes this common ancestor • Paraphyletic Group • Group that is monophyletic except that some descendants of the common ancestor have been removed • Polyphyletic Group • Consisting of unrelated lineages, each more closely related to other lineages not placed in the taxon
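The definition of a monophyletic group translates into a simple test: a group is monophyletic exactly when it equals the full set of descendants of one node. The sketch below assumes a rooted tree encoded as nested Python tuples, a hypothetical encoding chosen for brevity. It tests only for monophyly. Telling paraphyly from polyphyly also requires knowing where the common ancestor sits, and this check does not attempt that.

```python
def leaf_set(tree):
    """Return the set of taxon names below a node of a nested-tuple tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))


def clades(tree):
    """Yield the leaf set of every node, i.e. every clade, of a rooted tree."""
    yield leaf_set(tree)
    if not isinstance(tree, str):
        for child in tree:
            yield from clades(child)


def is_monophyletic(tree, group):
    """True if `group` is exactly the set of all descendants of a single node."""
    return frozenset(group) in set(clades(tree))


# Hypothetical rooted tree: (((A,B),C),(D,E))
tree = ((("A", "B"), "C"), ("D", "E"))
print(is_monophyletic(tree, {"A", "B", "C"}))  # True
print(is_monophyletic(tree, {"A", "B"}))       # True
print(is_monophyletic(tree, {"B", "C"}))       # False: their common ancestor is also the ancestor of A
print(is_monophyletic(tree, {"C", "D"}))       # False
```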

  16. Cladistic Methods • Techniques that identify monophyletic groups based on synapomorphies • Synapomorphies define evolutionary branching points • Autapomorphies and ancestral characters do not • Must be able to identify homology of traits and direction of change through time (Polarization)

  17. Homology • The features of organisms almost always evolve from pre-existing features of their ancestors • Unlikely that features arise de novo from nothing…

  18. Homology • Homologous features are derived from a common ancestor • Organs of two organisms are homologous if they have been inherited (and perhaps modified) from a single organ of a common ancestor • A character may be homologous among species but a character state may not • The 5-toed state is homologous in humans and lizards, but the 3-toed state is not homologous in guinea pigs and sloths • The wings of birds and bats are not homologous as wings (they arose by convergent evolution), although their forelimbs in general are homologous structures

  19. Maximum Parsimony (Cladistic) • Occam’s Razor: “Entia non sunt multiplicanda praeter necessitatem” (Entities should not be multiplied beyond necessity), William of Occam (1300-1349) • The best tree is the one that requires the fewest substitutions

  20. Parsimony and Phylogeny • Most closely related taxa should have the most traits in common • Assume that traits are independent, heritable, and variable in target taxa • Traits may be DNA sequence, presence or absence of skeletal elements or floral parts, mode of embryonic development, etc. • Traits scored in different taxa must be homologous

  21. Parsimony and Phylogeny • Shared derived characters (ONLY) are used to deduce the branching patterns of the tree • Synapomorphy • Synapomorphies are used to attach two branches at a NODE on the tree

  22. Molecular Synapomorphies

  23. Molecular Synapomorphies
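A common way to screen an alignment for potential molecular synapomorphies is to flag parsimony-informative sites. Such a site has at least two different states, each present in at least two sequences. A site whose variation is confined to single sequences (autapomorphies) adds the same number of steps to every tree, so it cannot favour one topology. This is a sketch on a hypothetical four-sequence alignment. Gaps and ambiguity codes would be treated here as ordinary extra states, which real programs usually handle differently.

```python
from collections import Counter


def classify_sites(alignment):
    """Label each alignment column 'constant', 'informative' or 'uninformative'.

    A column is parsimony-informative when at least two character states
    each occur in at least two sequences.
    """
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("sequences must be aligned to the same length")
    labels = []
    for i in range(length):
        counts = Counter(seq[i] for seq in alignment)
        if len(counts) == 1:
            labels.append("constant")
        elif sum(1 for n in counts.values() if n >= 2) >= 2:
            labels.append("informative")
        else:
            # Variable, but only singletons (autapomorphies) differ from the common state
            labels.append("uninformative")
    return labels


# Hypothetical alignment: one string per sequence
alignment = ["AAGT",
             "AAGC",
             "ACTT",
             "ACTG"]
print(classify_sites(alignment))  # ['constant', 'informative', 'informative', 'uninformative']
```

In this toy alignment, columns 1 and 2 are shared derived states that both group sequences 1 and 2 apart from sequences 3 and 4.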

  24. Parsimony and Phylogeny • Traits may revert to an ancestral form because of mutation or selection (reversal) • This may destroy phylogenetic signal and lead to reconstruction of misleading relationships • Convergence and reversal are collectively known as homoplasy

  25. Molecular Homoplasy via Reversal

  26. Parsimony and Phylogeny • Homoplasy • Creates noise in the data • Some characters give conflicting information about relationships • Systematists try to minimize homoplasy in a data set • Choose characters that evolve slowly relative to age of taxa

  27. Parsimony and Phylogeny • Parsimony minimizes total amount of evolutionary change in a tree • Synapomorphies are usually more common than convergence and reversal • Most parsimonious trees minimize homoplasy to give best estimate of phylogeny

  28. Fitch (equal-weighted) parsimony • The data for site 1 are shown on one tree topology for all 16 (4 × 4) possible combinations of nucleotide states at the 2 interior nodes • The character length for this site is 2, the smallest number of changes among these 16 combinations
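The brute-force procedure described on this slide is easy to reproduce for a single site on an unrooted four-taxon tree. Try all 4 × 4 = 16 state assignments for the two interior nodes and keep the minimum number of changes. The tip states below are hypothetical, because the slide's site 1 data are not in the transcript. The sketch assumes tips 1 and 2 join interior node X and tips 3 and 4 join interior node Y.

```python
from itertools import product

NUCLEOTIDES = "ACGT"


def quartet_changes(tips, x, y):
    """Count changes on the unrooted tree ((t1,t2)X,(t3,t4)Y) for interior states x and y."""
    t1, t2, t3, t4 = tips
    # Each comparison is True (1) when the branch carries a change
    return (t1 != x) + (t2 != x) + (t3 != y) + (t4 != y) + (x != y)


def character_length(tips):
    """Score all 16 interior-state combinations; the minimum is the character length."""
    scores = {(x, y): quartet_changes(tips, x, y)
              for x, y in product(NUCLEOTIDES, repeat=2)}
    best = min(scores.values())
    optimal = [combo for combo, s in scores.items() if s == best]
    return best, optimal, scores


# Hypothetical tip states for one site
best, optimal, scores = character_length(("A", "C", "G", "G"))
print(len(scores), best, optimal)  # 16 2 [('A', 'G'), ('C', 'G'), ('G', 'G')]
```

Several interior assignments can tie for the minimum. The character length is unaffected by which one is chosen.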

  29. Tree length (or tree score) [Figure: three alternative tree topologies for taxa A, B, C and D, with tree lengths 237, 241 and 225; 225 is the best (shortest) and 241 the worst] • Total steps = 2 + 1 + 2 + 2 + . . . + 1 = 237, where each term is the character length of one site (2 for site 1, 1 for site 2, and so on) • This value is used to compare this tree topology to other tree topologies (smaller is better)
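Enumerating every interior assignment quickly becomes impractical. Fitch's algorithm gives the same character length without that enumeration: it passes state sets from the tips towards the root and counts one change whenever two child sets do not overlap.

The sketch below sums those per-site lengths into a tree length and compares the three possible unrooted topologies for taxa A to D. The alignment is a hypothetical toy example. Each unrooted quartet is passed in as one arbitrarily rooted version, which is safe because the Fitch length does not depend on where the root is placed.

```python
def fitch(tree, states):
    """Return (state set, number of changes) for one site via Fitch's algorithm.

    `tree` is a rooted binary tree of nested 2-tuples with taxon names at the tips;
    `states` maps each taxon to its character state at this site.
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    set_l, changes_l = fitch(left, states)
    set_r, changes_r = fitch(right, states)
    common = set_l & set_r
    if common:
        return common, changes_l + changes_r
    # No shared state: take the union and count one change
    return set_l | set_r, changes_l + changes_r + 1


def tree_length(tree, alignment):
    """Sum the Fitch character lengths over all sites (smaller is better)."""
    n_sites = len(next(iter(alignment.values())))
    total = 0
    for i in range(n_sites):
        site = {taxon: seq[i] for taxon, seq in alignment.items()}
        total += fitch(tree, site)[1]
    return total


# Hypothetical alignment of four taxa, six sites
alignment = {"A": "ACGTTA", "B": "ACGTCA", "C": "GTGATA", "D": "GTCACA"}
topologies = {
    "AB|CD": (("A", "B"), ("C", "D")),
    "AC|BD": (("A", "C"), ("B", "D")),
    "AD|BC": (("A", "D"), ("B", "C")),
}
lengths = {name: tree_length(t, alignment) for name, t in topologies.items()}
print(lengths)                        # {'AB|CD': 6, 'AC|BD': 8, 'AD|BC': 9}
print(min(lengths, key=lengths.get))  # AB|CD
```

Here AB|CD is the shortest tree. Sites 0, 1 and 3 each cost one step on it and two on the alternatives. Site 4 instead favours AC|BD, so on the best tree it represents homoplasy.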

  30. Phylogenetic Characters • Which characters should be used to reconstruct the correct phylogeny? • Morphological characters • e.g., skeleton • For fossils, only morphological characters can be used • Morphological characters are difficult to use because a taxonomic expert is needed • Molecular characters • Allozymes, RFLPs, DNA sequences • MUST CHOOSE A MOLECULAR MARKER THAT IS APPROPRIATE • The best molecular marker is one that has plenty of variation (= phylogenetic signal) yet not too much homoplasy (not too variable!)

  31. Phylogenetic Characters • Which characters should be used to reconstruct the phylogeny? • Molecular data have the advantage that they can be rapidly collected and scored • However, homoplasy is difficult to identify because there are only four bases: G, A, T, C • Multiple types of data (including multiple gene sequences) are often the best

  32. What sequences should I use for organism phylogenies? • Slowly evolving vs. fast evolving • rRNA • Mitochondrial • Nuclear • Chloroplast

  33. Other Phylogenetic methods Parsimony is not the only method for estimating phylogenetic relationships!…

  34. Some pitfalls of Parsimony… • It can take quite a long time to compute a parsimony estimate of a phylogeny… • Also, parsimony may be very error prone when: • rates of evolution are variable • very divergent species (or OTUs) are compared, because it does not account well for homoplasy…
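One reason exhaustive parsimony searches take so long is that the number of possible trees grows super-exponentially with the number of taxa. For n taxa there are (2n - 5)!! distinct unrooted binary topologies and (2n - 3)!! rooted ones, where !! is the product of odd numbers up to that value. The short sketch below computes those standard counts and also shows how many more rooted than unrooted trees there are (slide 7). The helper names are illustrative only.

```python
def num_unrooted_trees(n):
    """Unrooted binary topologies for n >= 3 labelled taxa: (2n-5)!! = 1*3*5*...*(2n-5)."""
    if n < 3:
        raise ValueError("need at least 3 taxa")
    count = 1
    for k in range(3, 2 * n - 4, 2):
        count *= k
    return count


def num_rooted_trees(n):
    """Rooted binary topologies for n >= 2 taxa: (2n-3)!!, i.e. the unrooted count for n+1 taxa."""
    if n < 2:
        raise ValueError("need at least 2 taxa")
    return num_unrooted_trees(n + 1)


for n in (4, 10, 20):
    print(f"{n} taxa: {num_rooted_trees(n):,} rooted, {num_unrooted_trees(n):,} unrooted")
```

Because exhaustive search soon becomes impossible, programs typically fall back on heuristic searches for larger data sets. Such searches are not guaranteed to find the most parsimonious tree.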

  35. Other Phylogenetic methods • Other reconstruction methods • Distance (Phenetic) methods • e.g.: Neighbor joining and UPGMA • Based on clustering technique • Based on overall similarity • Not a cladistic method • Uses differences (distances) among character states to group taxa

  36. Using Distance Methods to Reconstruct Phylogenetic Relationships Species with the LEAST genetic distance (or other distance) between them are assumed to be CLOSE relatives However, there are MANY cases where this may NOT be true!

  37. Distance-Based Methods (UPGMA, Neighbor Joining, etc..) • Distance methods are typically very very fast and easy to use to estimate a phylogenetic tree • However, they are not cladistic because they do not look for synapomorphies, but rather overall similarity… • This means this method is also susceptible to lots of error when a dataset has lots of homoplasy…

  38. Distance methods • Normally fast and simple • e.g. UPGMA, Neighbour Joining, Minimum Evolution, Fitch-Margoliash
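UPGMA is the simplest of the distance methods listed here. It repeatedly joins the two closest clusters and places their common node at half the distance between them, so it assumes a constant rate of evolution (a molecular clock). Neighbor joining relaxes that assumption and is not shown. This is a teaching sketch on a hypothetical, clock-like distance matrix, not a replacement for a tested implementation.

```python
def upgma(names, dist):
    """Cluster taxa with UPGMA and return the tree as a Newick string.

    names: taxon labels; dist: symmetric matrix (list of lists) of pairwise distances.
    """
    # Active clusters: id -> (Newick text, number of taxa, height of the cluster's node)
    clusters = {i: (name, 1, 0.0) for i, name in enumerate(names)}
    # Distances between active clusters, keyed by (smaller id, larger id)
    d = {(i, j): float(dist[i][j])
         for i in range(len(names)) for j in range(i + 1, len(names))}
    next_id = len(names)
    while len(clusters) > 1:
        # 1. Join the closest pair of clusters.
        (i, j), dij = min(d.items(), key=lambda item: item[1])
        del d[(i, j)]
        text_i, size_i, h_i = clusters.pop(i)
        text_j, size_j, h_j = clusters.pop(j)
        # 2. Put the new node at half their distance (the clock assumption).
        h = dij / 2
        text = f"({text_i}:{h - h_i:g},{text_j}:{h - h_j:g})"
        # 3. New distances: size-weighted average of the two merged clusters.
        for k in clusters:
            d_ik = d.pop((min(i, k), max(i, k)))
            d_jk = d.pop((min(j, k), max(j, k)))
            d[(k, next_id)] = (size_i * d_ik + size_j * d_jk) / (size_i + size_j)
        clusters[next_id] = (text, size_i + size_j, h)
        next_id += 1
    text, _, _ = next(iter(clusters.values()))
    return text + ";"


# Hypothetical distances between four taxa
names = ["A", "B", "C", "D"]
dist = [[0, 2, 4, 6],
        [2, 0, 4, 6],
        [4, 4, 0, 6],
        [6, 6, 6, 0]]
print(upgma(names, dist))  # (D:3,(C:2,(A:1,B:1):1):1);
```

Because these made-up distances are perfectly ultrametric, UPGMA recovers them exactly. Each branch length equals half a pairwise distance minus the height of the node below it. With rate variation among lineages, UPGMA can join the wrong pairs.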

  39. Correction for multiple hits • Only differences can be observed directly, not distances • All distance methods rely (crucially) on this correction • A great many models are used for nucleotide sequences (e.g., JC, K2P, HKY, REV), and distances can also be estimated by maximum likelihood • Amino-acid sequences are far more complicated! • Accuracy falls off drastically for highly divergent sequences

  40. Distance methods Attempts to account for multiple hits using models in distance methods (observed vs. estimated amount of evol. distance)
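The simplest such correction comes from the Jukes-Cantor (JC) model. For an observed proportion of differing sites p, the estimated number of substitutions per site is d = -3/4 ln(1 - 4p/3). The sketch below uses illustrative helper names. It shows how the gap between observed and estimated distance widens as sequences diverge, and why the estimate breaks down as p approaches 0.75.

```python
import math


def p_distance(seq1, seq2):
    """Proportion of aligned sites that differ (the observed difference)."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    diffs = sum(a != b for a, b in zip(seq1, seq2))
    return diffs / len(seq1)


def jukes_cantor(p):
    """JC estimate of substitutions per site, correcting an observed p-distance for multiple hits."""
    if p >= 0.75:
        raise ValueError("p >= 0.75: the JC distance is undefined (saturation)")
    return -0.75 * math.log(1 - 4 * p / 3)


for p in (0.05, 0.25, 0.5, 0.7):
    print(f"observed p = {p:.2f}  ->  JC distance = {jukes_cantor(p):.3f}")
```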

  41. Other Phylogenetic methods • Maximum likelihood assumes a particular model of sequence evolution and calculates how probable the observed character data are given each tree and its branch lengths • Uses all data, even autapomorphies and invariant sites • Uses models of evolution designed to capture a pattern of change across characters (e.g., DNA) • Allows us to account for complex patterns of nucleotide evolution across regions of genes that may evolve very differently (thus, not all types of changes are weighted evenly in determining the phylogeny…) • Let's look at an example… although we will save more heated discussions of patterns for Bayesian MCMCMC (Metropolis-coupled MCMC) methods…

  42. Within vs. Between Gene Variation [Figure: relative rate of substitution (G-T = 1; axis from 0 to 1.8) for the transversions A-C, A-T and C-G, plotted along the length of the genome across Gene 1 and Gene 2]

  43. Maximum Likelihood Methods • Likelihood methods are among the most accurate methods for reconstructing phylogenies! • However, they are VERY VERY computationally intensive: a tree with 30 species may take several days, and one with 100 species may take several months! • New likelihood methods employing Bayesian statistics along with Markov chain Monte Carlo (MCMC) algorithms are helping to solve this problem and are the cutting edge of phylogeny reconstruction these days…

  44. Likelihood Methods • Requires a model of evolution • Each substitution has an associated likelihood given a branch of a certain length • A function is derived to represent the likelihood of the data given the tree, branch lengths and additional parameters • So, the tree we get from ML is “the phylogeny that is most likely to have produced the observed data (under the model of evolution selected)”

  45. The Likelihood Criterion • Given two trees, the one maximizing the probability of the observed data is best • Site likelihood: probability of the data for one site conditional on the tree and the assumed model of evolution • Tree score: sum of site log-likelihoods (the term score is also the general term for the derivative of lnL) • Unlike parsimony tree lengths, log-likelihoods are comparable across models as well as trees
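The likelihood criterion can be seen in the smallest possible case: two aligned sequences joined by one branch of length t (expected substitutions per site), under the JC model mentioned earlier. Each site likelihood is the equilibrium frequency of one state times the JC probability of changing into the other state over t. The tree score is the sum of site log-likelihoods, and t is chosen to maximize it. This is an illustrative sketch on hypothetical sequences. The golden-section search is simply one easy way to maximize a one-parameter function.

```python
import math


def site_likelihood(x, y, t):
    """JC probability of seeing states x and y at the two ends of a branch of length t."""
    e = math.exp(-4.0 * t / 3.0)
    p_xy = 0.25 + 0.75 * e if x == y else 0.25 - 0.25 * e
    return 0.25 * p_xy  # equilibrium frequency of x times transition probability x -> y


def log_likelihood(seq1, seq2, t):
    """Tree score: the sum of site log-likelihoods."""
    return sum(math.log(site_likelihood(a, b, t)) for a, b in zip(seq1, seq2))


def ml_branch_length(seq1, seq2, lo=1e-6, hi=5.0, iters=100):
    """Golden-section search for the t maximizing lnL (lnL has a single peak here)."""
    g = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    for _ in range(iters):
        c = b - g * (b - a)
        d = a + g * (b - a)
        if log_likelihood(seq1, seq2, c) > log_likelihood(seq1, seq2, d):
            b = d  # the maximum lies in [a, d]
        else:
            a = c  # the maximum lies in [c, b]
    return (a + b) / 2


s1, s2 = "ACGTACGTAC", "ACGTACGTTT"  # hypothetical sequences, 2 of 10 sites differ
t_hat = ml_branch_length(s1, s2)
print(f"ML branch length = {t_hat:.4f}, lnL = {log_likelihood(s1, s2, t_hat):.4f}")
```

For this model, the maximum-likelihood branch length coincides with the Jukes-Cantor distance, about 0.2326 for p = 0.2, which gives a handy check.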

  46. Models can be made more parameter-rich to increase their realism • The most common additional parameters are: • A correction to allow different substitution rates for each type of nucleotide change • A correction for the proportion of sites that are unable to change (invariable sites) • A correction for variable rates across the sites that can change • The values of the additional parameters are estimated during the analysis (e.g., in PAUP)

  47. A gamma distribution can be used to model site rate heterogeneity
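In these models, site rates are usually taken from a gamma distribution with mean 1, so its shape parameter (often written alpha) alone controls how unequal the rates are. A small alpha gives many nearly invariant sites plus a few fast ones, while a large alpha gives nearly equal rates. The sketch below only samples continuous rates with Python's `random.gammavariate` to show that effect. Phylogenetics programs commonly approximate the distribution with a small number of discrete rate categories instead. The helper names are illustrative.

```python
import random
import statistics


def sample_site_rates(alpha, n_sites, seed=1):
    """Draw relative rates for n_sites from a gamma distribution with shape alpha and mean 1.

    random.gammavariate(alpha, beta) has mean alpha*beta and variance alpha*beta**2,
    so beta = 1/alpha gives mean 1 and variance 1/alpha.
    """
    rng = random.Random(seed)
    return [rng.gammavariate(alpha, 1.0 / alpha) for _ in range(n_sites)]


def share_below(rates, threshold):
    """Fraction of sites whose relative rate is below `threshold` (nearly invariant sites)."""
    return sum(r < threshold for r in rates) / len(rates)


for alpha in (0.2, 1.0, 10.0):
    rates = sample_site_rates(alpha, 100_000)
    print(f"alpha={alpha:>4}: mean={statistics.fmean(rates):.2f}, "
          f"variance={statistics.pvariance(rates):.2f}, "
          f"share of sites with rate < 0.1 = {share_below(rates, 0.1):.2f}")
```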

  48. Long Branches Attract • In a set of sequences evolving at different rates, the rapidly evolving sequences are drawn together • Distance methods are VERY VERY prone to making this error • Parsimony is also prone to this error • Likelihood methods employ an ‘informed’ view of character change (a model), which helps identify situations that probably represent homoplasy, thus decreasing LBA (long-branch attraction)

  49. Phylogenetic Methods… • It is useful to use a variety of tree reconstruction methods • If methods are congruent you have more confidence in your reconstructions!
