This lecture (CS566, Lecture 13) delves into the field of phylogenetics, emphasizing its foundational motivation – understanding the evolutionary relationships between sequences and species. Key concepts cover the tree-like structures of species' evolution, the significance of mutations over generations, and how phylogenetic trees serve as guides for multiple sequence alignment. Various algorithms such as Parsimony, UPGMA, and Neighbor Joining are discussed, alongside considerations like molecular clocks and heuristic approaches to tree generation. The aim is to provide insights into the methodologies used to reconstruct evolutionary histories.
Phylogenetics • Motivation • Concepts • Algorithms Lecture 13 CS566
Motivation
• Life arose just once - “Thou art my brethren but a few sequences removed”
• Phylogenetic trees = Topologies of evolutionary relationships between sequences and, possibly, species - “Is it man or cow that is the true heir of the fabulous treasures of the woolly mammoth dynasty?”
• Phylogenetic tree as guide tree for multiple sequence alignment (déjà vu)
Concepts
• Mutation and Evolution
  • Mutations that persist over generations = Evolution
• Tree, not a lattice
  • Each species arose just once
• Species phylogeny (often) != Sequence phylogeny
  • Sequences evolve at different rates
    • Within a single species
    • Between different species
    • Within a single sequence
  • Especially in bacteria, horizontal transfer (“Napster’s been around for ages”) is quite common
Concepts
• Molecular clock assumption
  • Sequences drift apart at a constant rate
  • Aka edge length proportional to time
  • Aka satisfaction of ultrametricity
    • For any 3 sequences: (all pairwise distances are equal) xor (2 distances are equal, and the third one is smaller)
  • If true, then
    • All path lengths from root to leaf nodes are equal
• Additivity
  • Distance metric chosen is
    • A true distance (fulfils the triangle inequality)
    • Such that the cumulative sum of edge lengths along the path between 2 sequences equals the distance between those 2 sequences
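Both assumptions can be tested directly on a distance matrix. The sketch below is illustrative only: `is_ultrametric` applies the three-sequence rule from the slide, while `is_additive` uses the standard four-point condition, which the slides do not state explicitly. The function names, the tolerance `tol`, and the two example matrices are assumptions made for this example.

```python
from itertools import combinations

def is_ultrametric(d, tol=1e-9):
    """Three-point rule: in every triple, the two largest distances are equal."""
    for i, j, k in combinations(range(len(d)), 3):
        a, b, c = sorted((d[i][j], d[i][k], d[j][k]))
        if abs(c - b) > tol:
            return False
    return True

def is_additive(d, tol=1e-9):
    """Four-point condition: of the three pairwise sums, the two largest are equal."""
    for i, j, k, l in combinations(range(len(d)), 4):
        sums = sorted((d[i][j] + d[k][l], d[i][k] + d[j][l], d[i][l] + d[j][k]))
        if abs(sums[2] - sums[1]) > tol:
            return False
    return True

# Hypothetical distances that obey a molecular clock
clock = [[0, 2, 6, 6],
         [2, 0, 6, 6],
         [6, 6, 0, 4],
         [6, 6, 4, 0]]

# Hypothetical distances from a tree with unequal branch lengths (no clock)
no_clock = [[0, 3, 6, 7],
            [3, 0, 7, 8],
            [6, 7, 0, 5],
            [7, 8, 5, 0]]

print(is_ultrametric(clock), is_additive(clock))        # True True
print(is_ultrametric(no_clock), is_additive(no_clock))  # False True
```

The second matrix fits a tree exactly but fails the clock test, which is the situation where neighbor joining is preferred over UPGMA.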
Concepts
• Heuristic forays into intractable space
• Start with pairwise “distances”
• Path length = Distance (~Evolutionary time)
• Work from the leaves toward internal nodes to generate the tree
  • (opposite of binary tree generation)
• “It’s easier to be rootless than to be rooted”
• Binary tree approximation of higher order trees
• Edges do not imply direct links (missing links/incomplete data), only a representation of sequence evolution
Algorithms
• Parsimony (Character-based)
• Distance based methods
  • Neighbor joining
  • UPGMA
• Maximum Likelihood
[Figure label: Increasing Sequence Similarity]
Algorithms
• UPGMA (Unweighted pair group method with arithmetic averages)
  • Caveat if molecular clock not applicable: “If my cousin looks more like me than my brother, he must be my lost brother, and perhaps my brother my cousin?”
• Neighbor joining
  • “Give me additive distances, and I shall give thee a tree, even if some sequences morph faster than others”
• Parsimony
  • “It’s just a bruise, not Kaposi’s sarcoma!”
• Maximum Likelihood
  • “Given the facts, Watson, the answer is elementary!”
UPGMA
• Easiest to use if molecular clock and additivity are valid
• Initially: no. of clusters = no. of sequences = no. of leaf nodes
• Inter cluster distance = Average pairwise distance
• While (no. of clusters > 1) {
    Connect pair of closest clusters (at distance d) with intermediate node at distance d/2 from each of them
  }
• Caveat: Satisfies minimal distance requirement, but may result in spurious topologies because of the constant-rate evolution assumption
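The loop above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `upgma`, the nested-tuple tree format `((subtree, branch_length), (subtree, branch_length))`, and the example matrix are all assumptions chosen for this example.

```python
from itertools import combinations

def upgma(names, d):
    """Return (tree, height) for a symmetric distance matrix d (list of lists)."""
    # cluster id -> (subtree, indices of member sequences, height of its node)
    clusters = {i: (name, [i], 0.0) for i, name in enumerate(names)}
    next_id = len(names)

    def avg_dist(a, b):
        # Inter-cluster distance = average over all cross-cluster sequence pairs
        ma, mb = clusters[a][1], clusters[b][1]
        return sum(d[x][y] for x in ma for y in mb) / (len(ma) * len(mb))

    while len(clusters) > 1:
        a, b = min(combinations(sorted(clusters), 2), key=lambda p: avg_dist(*p))
        dist = avg_dist(a, b)
        tree_a, members_a, height_a = clusters.pop(a)
        tree_b, members_b, height_b = clusters.pop(b)
        height = dist / 2  # new node sits d/2 above the leaves of both clusters
        # Branch length = height of new node minus height of the child node
        tree = ((tree_a, height - height_a), (tree_b, height - height_b))
        clusters[next_id] = (tree, members_a + members_b, height)
        next_id += 1

    (tree, _, height), = clusters.values()
    return tree, height

# Hypothetical clock-like distances for four sequences
names = ["A", "B", "C", "D"]
d = [[0, 2, 6, 6],
     [2, 0, 6, 6],
     [6, 6, 0, 4],
     [6, 6, 4, 0]]
tree, height = upgma(names, d)
print(tree)
print(height)  # 3.0
```

Because every join places the new node at d/2, all root-to-leaf paths come out equal, which is exactly why the method misleads when the sequences did not evolve at a constant rate.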
Parsimony
• Parsimony (“Miserliness in model space”): Pick the simplest explanation that fits the facts - “If I hear a blood-curdling scream, it’s just one of my sons trying to kill the other, not an invasion by aliens!”
• Every possible tree evaluated in terms of total number of steps needed to convert each sequence to another
• Practical for only a few sequences
• High percentage of similarity a prerequisite
• Neither identical nor ‘completely different’ sequence positions are useful
• Each difference should represent a single step (WYSIWYG) and not a ‘full circle’ or ‘non-shortest route’
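To see why evaluating every possible tree is practical for only a few sequences, it helps to count the candidates. The number of distinct unrooted binary trees on n labelled leaves is (2n-5)!!, a standard result that the slides do not state; the function name below is an assumption for this sketch.

```python
def num_unrooted_trees(n):
    """Number of unrooted binary trees on n >= 3 labelled leaves: (2n-5)!!."""
    if n < 3:
        raise ValueError("need at least 3 leaves")
    count = 1
    for k in range(3, 2 * n - 4, 2):  # odd factors 3, 5, ..., 2n-5
        count *= k
    return count

for n in (4, 5, 10):
    print(n, num_unrooted_trees(n))
# 4 3
# 5 15
# 10 2027025
```

Four sequences give only 3 trees to compare, but ten sequences already give over two million.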
Parsimony
  123456789…………
• ACCEFAHIKLKNPR
• ACCEFGHILLLNPR
• ACDEFGHIKLINPK
• AADEFGHILLNNPK
    *     *    *
[Figure: Candidate tree for position 3, with leaves 1 (C), 2 (C), 3 (D), 4 (D) and the inner label "D C"]
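The useful positions in this alignment can be found mechanically. A minimal sketch, assuming the usual definition of a parsimony-informative column (at least two different residues, each occurring in at least two sequences); the function name is hypothetical.

```python
from collections import Counter

def informative_positions(seqs):
    """1-based columns where at least two residues each occur at least twice."""
    result = []
    for pos, column in enumerate(zip(*seqs), start=1):
        counts = Counter(column)
        if sum(1 for c in counts.values() if c >= 2) >= 2:
            result.append(pos)
    return result

seqs = ["ACCEFAHIKLKNPR",
        "ACCEFGHILLLNPR",
        "ACDEFGHIKLINPK",
        "AADEFGHILLNNPK"]
print(informative_positions(seqs))  # [3, 9, 14]
```

Position 2 (C, C, C, A) is excluded because the single A costs one step on every tree, and position 11 (K, L, I, N) is excluded because all four residues differ.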
Parsimony
• 3 sets of 3 trees each compared (3 marked positions × 3 possible unrooted trees for 4 sequences)
• The one with the lowest total number of substitutions is selected
• Refinements:
  • Branch and bound: Abandon a tree if a subtree has a higher score than the current minimal-score tree
  • Heuristic branch-pattern representatives
  • Non-boolean costs: Transversion > transition OR use of amino-acid substitution matrices
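The comparison of the three candidate trees can be sketched as follows. The slides do not name a counting procedure; this example uses Fitch's method with unit (boolean) costs as one common choice. The function names, the topology labels, and the encoding of trees as nested tuples of 0-based sequence indices are assumptions. Each unrooted four-sequence tree is rooted on its internal edge, which does not change the substitution count.

```python
def fitch_cost(tree, column):
    """Minimum number of substitutions for one alignment column (Fitch's method)."""
    def visit(node):
        if isinstance(node, int):            # leaf: index into the column
            return {column[node]}, 0
        (left_set, left_cost), (right_set, right_cost) = visit(node[0]), visit(node[1])
        common = left_set & right_set
        if common:                           # children agree: no extra step
            return common, left_cost + right_cost
        return left_set | right_set, left_cost + right_cost + 1
    return visit(tree)[1]

def best_topology(seqs, topologies):
    """Score every topology over all columns; return (best name, all scores)."""
    scores = {name: sum(fitch_cost(tree, col) for col in zip(*seqs))
              for name, tree in topologies.items()}
    return min(scores, key=scores.get), scores

seqs = ["ACCEFAHIKLKNPR",
        "ACCEFGHILLLNPR",
        "ACDEFGHIKLINPK",
        "AADEFGHILLNNPK"]

# The three unrooted trees for sequences 1-4 (indices 0-3)
topologies = {"12|34": ((0, 1), (2, 3)),
              "13|24": ((0, 2), (1, 3)),
              "14|23": ((0, 3), (1, 2))}

best, scores = best_topology(seqs, topologies)
print(best)    # 12|34
print(scores)  # {'12|34': 9, '13|24': 10, '14|23': 11}
```

Only positions 3, 9 and 14 produce different counts across the trees; every other column adds the same number of steps to each total.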
Neighbor Joining
• Generates unrooted tree, allowing for unequal branches
• Given: Distance matrix for sequences
• Steps: Repeat 1-3 till all branches generated
  1. Take closest sequences i, j
  2. Find branch lengths between i and j by treating remaining sequences as composite (C)
     • Calculate average i-C and j-C distances
     • Calculate branch lengths i and j
  3. Treat ij as composite sequence now and generate new distance table
• Generate multiple trees by starting with different pairs
• Compare resulting trees in terms of best fit to original distance matrix
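One round of steps 1-3 might look like the sketch below. It follows the procedure as described on the slide (closest pair, remaining sequences averaged into a composite C); the commonly used Saitou-Nei formulation of neighbor joining instead selects the pair with a rate-corrected criterion, which is not shown here. The function name, the dict-of-dicts table format, the three-point branch-length formula, and the use of a plain mean for the new table are assumptions. The input needs at least three sequences.

```python
def join_closest_pair(dist):
    """One round: pick closest pair, compute its two branch lengths, shrink table."""
    names = sorted(dist)
    pairs = [(a, b) for idx, a in enumerate(names) for b in names[idx + 1:]]
    i, j = min(pairs, key=lambda p: dist[p[0]][p[1]])        # step 1
    rest = [k for k in names if k not in (i, j)]

    # Step 2: average distance from i and from j to the composite C
    d_ic = sum(dist[i][k] for k in rest) / len(rest)
    d_jc = sum(dist[j][k] for k in rest) / len(rest)
    # Three-point formula for the branches from i and j to their shared node
    len_i = (dist[i][j] + d_ic - d_jc) / 2
    len_j = (dist[i][j] + d_jc - d_ic) / 2

    # Step 3: replace i and j by one composite entry (assumed: mean of the two)
    merged = f"({i},{j})"
    new = {a: {b: dist[a][b] for b in rest if b != a} for a in rest}
    new[merged] = {}
    for k in rest:
        avg = (dist[i][k] + dist[j][k]) / 2
        new[merged][k] = avg
        new[k][merged] = avg
    return (i, len_i), (j, len_j), new

# Hypothetical additive distances (tree with unequal branch lengths)
dist = {"A": {"B": 3, "C": 6, "D": 7},
        "B": {"A": 3, "C": 7, "D": 8},
        "C": {"A": 6, "B": 7, "D": 5},
        "D": {"A": 7, "B": 8, "C": 5}}

branch_i, branch_j, table = join_closest_pair(dist)
print(branch_i, branch_j)  # ('A', 1.0) ('B', 2.0)
print(table["(A,B)"])      # {'C': 6.5, 'D': 7.5}
```

Unlike UPGMA, the two branches leaving the new node are allowed to differ (1.0 and 2.0 here), which is how the method accommodates sequences that change at different rates.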
Rooting trees
• Based on a “proxy ancestor”
  • Include a distant relative (“outgroup”) as the proxy ancestor
  • Add the outgroup as the last node
  • Point of attachment of outgroup represents root
• Diameter center
  • Place root at center of longest path through tree
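The “diameter center” rule can be sketched on an unrooted tree stored as an adjacency dictionary. This is an illustration under assumptions: the function name `midpoint_root`, the `{node: {neighbor: branch_length}}` format, and the example tree are all hypothetical, and branch lengths are assumed positive.

```python
def midpoint_root(adj):
    """Return (u, v, offset): the root lies on edge u-v, `offset` away from u."""
    def farthest(start):
        # Depth-first walk that tracks the distance and path to every node
        best = (0.0, [start])
        stack = [(start, None, 0.0, [start])]
        while stack:
            node, parent, dist, path = stack.pop()
            if dist > best[0]:
                best = (dist, path)
            for nxt, length in adj[node].items():
                if nxt != parent:
                    stack.append((nxt, node, dist + length, path + [nxt]))
        return best

    start = next(iter(adj))
    _, first_path = farthest(start)                # reaches one end of the longest path
    diameter, path = farthest(first_path[-1])      # the longest path itself
    walked = 0.0
    for u, v in zip(path, path[1:]):
        if walked + adj[u][v] >= diameter / 2:     # midpoint falls on this edge
            return u, v, diameter / 2 - walked
        walked += adj[u][v]

# Hypothetical unrooted tree: leaves A-D, internal nodes x and y
adj = {"A": {"x": 1}, "B": {"x": 2}, "C": {"y": 2}, "D": {"y": 3},
       "x": {"A": 1, "B": 2, "y": 3},
       "y": {"C": 2, "D": 3, "x": 3}}

print(midpoint_root(adj))  # ('y', 'x', 1.0)
```

Here the longest path runs from D to B (length 8), so the root is placed 4 units from each end, which lands on the internal edge between y and x.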
Summary
• Parsimony and ML based approaches computationally intensive; scalability poor
• Neighbor joining adequate if additivity assumption is valid
• UPGMA adequate if both molecular clock and additivity assumptions are valid for given set of sequences
Summary
• Phylogenetics useful to understand sequence evolution
• Phylogenetics makes sense for
  • sequences with a high percentage of sequence identity
  • sequences not subject to ‘selection’
• Sequence tree not the same as species tree