
Introduction to Phylogenetic Trees

BMI/CS 576, www.biostat.wisc.edu/bmi576.html. Sushmita Roy, sroy@biostat.wisc.edu. Oct 9th, 2012.


Presentation Transcript


  1. Introduction to Phylogenetic Trees BMI/CS 576 www.biostat.wisc.edu/bmi576.html Sushmita Roy sroy@biostat.wisc.edu Oct 9th, 2012

  2. Phylogenetic inference: task definition • Given • data characterizing a set of species/genes • Do • infer a phylogenetic tree that accurately characterizes the evolutionary lineages among the species/genes

  3. What is a tree? • undirected case: a connected graph without cycles • directed case: underlying undirected graph is a tree (sometimes requires indegree(v) ≤ 1 for all v) • in that case every node except the root has exactly one parent (predecessor)

  4. Phylogenetic tree basics • leaves represent things (genes, species, individuals/strains) being compared • the term taxon (taxa plural) is used to refer to these when they represent species and broader classifications of organisms • internal nodes are hypothetical ancestral units • in a rooted tree, path from root to a node represents an evolutionary path • the root represents the common ancestor • an unrooted tree specifies relationships among things, but not from an ancestor

  5. Motivation • Why construct phylogenetic trees? • to understand lineage of various species • to understand how various functions evolved • to inform multiple alignments • to identify what is most conserved/important in some class of sequences • to identify what is under accelerated evolution

  6. Hox genes • Specify body patterning (anterior-posterior patterning). • Exhibit co-linearity. • Homologous genes acting in an apparently homologous way across the animal kingdom. (Ferrier & Minguillón, 2003)

  7. Example species tree: 29 mammals • Numbers on the branches give the number of substitutions per 100 bp. • Image from Lindblad-Toh et al., 2011

  8. Genetic Analysis of Lice Supports Direct Contact between Modern and Archaic Humans. D. Reed et al., PLoS Biology 2(11), November 2004. • inferred phylogeny of lice species closely parallels accepted phylogeny of their hosts • can phylogeny of lice tell us something about evolution of hosts?

  9. Genetic Analysis of Lice Supports Direct Contact between Modern and Archaic Humans. D. Reed et al., PLoS Biology 2(11), November 2004. • a more detailed phylogenetic analysis of human lice species shows two quite separate clades (subtrees) • lice lineages seem to have diverged when the lineage of H. sapiens diverged from an extinct human lineage

  10. Genetic Analysis of Lice Supports Direct Contact between Modern and Archaic Humans. D. Reed et al., PLoS Biology 2(11), November 2004. • this phylogeny supports a theory of human evolution in which • H. erectus and the ancestors of H. sapiens had little or no contact for a long period of time • there was contact between H. erectus and H. sapiens as late as 30,000 years ago

  11. Data for building trees • trees can be constructed from various types of data • morphological features (e.g. # legs), fossils • DNA/protein sequences

  12. Rooted vs. unrooted trees [figure: an unrooted tree and a rooted tree with numbered nodes; the rooted tree is drawn along a time axis]

  13. Number of possible trees • given n sequences, there are $\prod_{i=3}^{n} (2i-5)$ possible unrooted trees • and $(2n-3)\prod_{i=3}^{n} (2i-5)$ possible rooted trees

  14. Number of possible trees
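To see how quickly these counts grow, here is a small Python sketch. It assumes the standard closed forms for bifurcating trees on n labeled leaves, (2n-5)!! unrooted and (2n-3)!! rooted; the function names are illustrative and not taken from the slides.

```python
def num_unrooted_trees(n: int) -> int:
    """Unrooted bifurcating trees on n labeled leaves: product of (2i - 5) for i = 3..n."""
    count = 1
    for i in range(3, n + 1):
        count *= 2 * i - 5
    return count


def num_rooted_trees(n: int) -> int:
    """Rooted bifurcating trees: a root can be placed on any of the 2n - 3 edges."""
    if n < 2:
        return 1
    return (2 * n - 3) * num_unrooted_trees(n)


if __name__ == "__main__":
    for n in (3, 4, 5, 10, 20):
        print(n, num_unrooted_trees(n), num_rooted_trees(n))
```

For n = 10 this gives 2,027,025 unrooted and 34,459,425 rooted trees, which is why exhaustive enumeration is impractical beyond a handful of taxa.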

  15. Phylogenetic tree approaches • three general types of methods • distance: find tree that accounts for estimated evolutionary distances • parsimony: find the tree that requires minimum number of changes to explain the data • maximum likelihood: find the tree that maximizes the likelihood of the data

  16. Representing distances in rooted and unrooted trees [figure: a rooted tree with a height axis and an unrooted tree with edge lengths, both over leaves A, B, C, D, E; annotated with dist(A,C) = 8 and dist(A,D) = 5] • distances represented by summed height of edges to reach common ancestor • distances represented by summed length of edges to reach common ancestor

  17. Distance-based approaches • given: an $n \times n$ matrix $M$ where $M_{ij}$ is the distance between taxa i and j • do: build an edge-weighted tree such that the distance between leaves i and j corresponds to $M_{ij}$ [figure: example tree over leaves A, E, D, B, C]

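  18. Where do we get distances? • commonly obtained from sequence alignments, e.g. from the fraction of mismatched positions in the alignment of sequence i with sequence j • to consider evolutionary time between sequences, this fraction is corrected for multiple substitutions at the same site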
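The formulas on this slide are not preserved in the transcript. As a hedged illustration, the sketch below computes the fraction of mismatches in a pairwise alignment and applies the Jukes-Cantor correction, $d = -\frac{3}{4}\ln(1 - \frac{4}{3}f)$. Jukes-Cantor is an assumption here (it is one common correction for DNA, not necessarily the one the lecture used), and the function names are hypothetical.

```python
import math


def mismatch_fraction(a: str, b: str) -> float:
    """Fraction of mismatches among aligned, ungapped columns of two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    # Skip columns where either sequence has a gap character.
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no ungapped columns to compare")
    return sum(x != y for x, y in pairs) / len(pairs)


def jukes_cantor(f: float) -> float:
    """Jukes-Cantor distance for a mismatch fraction f (assumed model, DNA only)."""
    if f >= 0.75:
        return math.inf  # saturated: the correction is undefined at or above 3/4
    return -0.75 * math.log(1.0 - 4.0 * f / 3.0)


if __name__ == "__main__":
    f = mismatch_fraction("ACGTACGT", "ACGAACGA")
    print(f, jukes_cantor(f))
```

The corrected distance is always at least as large as the raw mismatch fraction, because some observed matches hide multiple substitutions.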

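  19. Distance metrics • properties of a distance metric • non-negativity: $d(x_i, x_j) \ge 0$, with $d(x_i, x_i) = 0$ • symmetry: $d(x_i, x_j) = d(x_j, x_i)$ • triangle inequality: $d(x_i, x_j) \le d(x_i, x_k) + d(x_k, x_j)$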

  20. The UPGMA method (Unweighted Pair Group Method using Arithmetic Averages) • given ultrametric data, UPGMA will reconstruct the tree T that is consistent with the data • basic idea: • iteratively pick two taxa/clusters and merge them • create new node in tree for merged cluster • distance $d_{ij}$ between clusters $C_i$ and $C_j$ of taxa is defined as $d_{ij} = \frac{1}{|C_i||C_j|} \sum_{p \in C_i,\, q \in C_j} d_{pq}$ (avg. distance between pairs of taxa from each cluster)

  21. UPGMA algorithm
      assign each taxon to its own cluster
      define one leaf for each taxon; place it at height 0
      while more than two clusters
          determine two clusters i, j with smallest $d_{ij}$
          define a new cluster $C_k = C_i \cup C_j$
          define a node k with children i and j; place it at height $d_{ij}/2$
          replace clusters i and j with k
          compute distance between k and other clusters
      join last two clusters, i and j, by root at height $d_{ij}/2$

  22. UPGMA • given a new cluster $C_k$ formed by merging $C_i$ and $C_j$ • we can calculate the distance between $C_k$ and any other cluster $C_l$ as follows: $d_{kl} = \frac{d_{il}|C_i| + d_{jl}|C_j|}{|C_i| + |C_j|}$

  23. UPGMA example [figures: initial state and state after one merge, over leaves A, E, D, B, C with a height axis from 1 to 4]

  24. UPGMA example (cont.) [figures: after two merges, after three merges, and final state, over leaves A, E, D, B, C with a height axis from 1 to 4]
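The UPGMA procedure can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the course's reference implementation: distances are passed as a dict keyed by `frozenset({a, b})`, trees are returned as nested tuples, a merged node is placed at half the distance between the two clusters it joins, and the loop simply runs until one cluster remains (equivalent to joining the last two clusters at the root). The name `upgma` and the toy data are hypothetical.

```python
from itertools import combinations


def upgma(names, dist):
    """Return (root, height): root is a nested-tuple tree, height maps each node to its height."""
    size = {n: 1 for n in names}        # cluster -> number of taxa it contains
    height = {n: 0.0 for n in names}    # leaves sit at height 0
    d = {frozenset(p): dist[frozenset(p)] for p in combinations(names, 2)}
    while len(size) > 1:
        # Pick the two closest clusters.
        i, j = min(combinations(size, 2), key=lambda p: d[frozenset(p)])
        k = (i, j)                       # new internal node with children i and j
        height[k] = d[frozenset((i, j))] / 2
        # Size-weighted average keeps d equal to the mean pairwise taxon distance.
        for m in size:
            if m not in (i, j):
                d[frozenset((k, m))] = (
                    size[i] * d[frozenset((i, m))] + size[j] * d[frozenset((j, m))]
                ) / (size[i] + size[j])
        size[k] = size[i] + size[j]
        del size[i], size[j]
    (root,) = size
    return root, height


# Toy ultrametric data (made up for illustration).
names = ["A", "B", "C", "D"]
dist = {
    frozenset("AB"): 2.0, frozenset("AC"): 4.0, frozenset("BC"): 4.0,
    frozenset("AD"): 8.0, frozenset("BD"): 8.0, frozenset("CD"): 8.0,
}
root, height = upgma(names, dist)
print(root, height[root])
```

On this toy input A and B merge first at height 1, C joins at height 2, and D joins at the root at height 4.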

  25. UPGMA relies on the molecular clock assumption • Sequences diverge at the same rate at different points in the phylogeny • Distance from any leaf to root is the same.

  26. The molecular clock assumption & ultrametric data • The molecular clock assumption: sequences diverge at the same rate at every point in the phylogeny. • This assumption is not generally true: selection pressures vary across time periods, organisms, genes within an organism, and regions within a gene • if it does hold, then the data is said to be ultrametric

  27. The molecular clock assumption & ultrametric data • ultrametric data: for any triplet of sequences i, j, k, the distances are either all equal, or two are equal and the remaining one is smaller [figure: example tree over leaves A, E, D, B, C with a height axis from 1 to 4]
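The triplet condition is easy to test directly. The sketch below checks every triple of taxa in a symmetric distance matrix and requires the two largest of the three pairwise distances to be equal; the function name, the tolerance parameter, and the example matrices are illustrative assumptions.

```python
from itertools import combinations


def is_ultrametric(d, tol=1e-9):
    """True if, for every triple, the two largest pairwise distances are equal."""
    n = len(d)
    for i, j, k in combinations(range(n), 3):
        _, mid, largest = sorted([d[i][j], d[i][k], d[j][k]])
        if abs(largest - mid) > tol:
            return False
    return True


# Made-up example: distances consistent with a molecular clock.
clock_like = [
    [0, 2, 4, 8],
    [2, 0, 4, 8],
    [4, 4, 0, 8],
    [8, 8, 8, 0],
]
# Made-up example: additive distances from a tree with unequal branch lengths.
not_clock_like = [
    [0.0, 0.3, 0.6, 0.5],
    [0.3, 0.0, 0.5, 0.6],
    [0.6, 0.5, 0.0, 0.9],
    [0.5, 0.6, 0.9, 0.0],
]
print(is_ultrametric(clock_like), is_ultrametric(not_clock_like))
```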

  28. Neighbor joining • unlike UPGMA • doesn’t make molecular clock assumption • produces unrooted trees • does assume additivity: distance between pair of leaves is sum of lengths of edges connecting them • like UPGMA, constructs a tree by iteratively joining subtrees • two key differences • how pair of subtrees to be merged is selected on each iteration • how distances are updated after each merge

  29. Picking pairs of nodes to join in NJ • at each step, we pick a pair of nodes to join; should we pick a pair with minimal $d_{ij}$? • suppose the real tree looks like this and we're picking the first pair of nodes to join [figure: unrooted tree over leaves A, B, C, D with edge lengths 0.1, 0.1, 0.1, 0.4, 0.4] • wrong decision to join A and B: need to consider distance of pair to other leaves

  30. Picking pairs of nodes to join in NJ • to avoid this, pick pair to join based on $D_{ij} = d_{ij} - (r_i + r_j)$ [Saitou & Nei '87; Studier & Keppler '88], where $r_i = \frac{1}{|L|-2} \sum_{m \in L} d_{im}$ and L is the set of leaves
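A small numeric sketch shows why the corrected criterion matters. It assumes the usual neighbor-joining form $D_{ij} = d_{ij} - (r_i + r_j)$ with $r_i = \frac{1}{|L|-2}\sum_{m} d_{im}$, and a hypothetical four-leaf tree in which A and D are neighbors (edges 0.1 and 0.4), B and C are neighbors (edges 0.1 and 0.4), and the internal edge has length 0.1. That topology is an assumption chosen to resemble the slide's figure, not a reading of it.

```python
from itertools import combinations

leaves = ["A", "B", "C", "D"]
# Path lengths in the assumed tree: (A:0.1, D:0.4) --0.1-- (B:0.1, C:0.4)
d = {
    frozenset("AB"): 0.3, frozenset("AC"): 0.6, frozenset("AD"): 0.5,
    frozenset("BC"): 0.5, frozenset("BD"): 0.6, frozenset("CD"): 0.9,
}


def closest_pair(leaves, d):
    """Pair with the smallest raw distance d_ij."""
    return min(combinations(leaves, 2), key=lambda p: d[frozenset(p)])


def nj_pair(leaves, d):
    """Pair minimizing D_ij = d_ij - (r_i + r_j)."""
    r = {
        i: sum(d[frozenset((i, m))] for m in leaves if m != i) / (len(leaves) - 2)
        for i in leaves
    }
    return min(
        combinations(leaves, 2),
        key=lambda p: d[frozenset(p)] - (r[p[0]] + r[p[1]]),
    )


print("smallest d_ij:", closest_pair(leaves, d))
print("smallest D_ij:", nj_pair(leaves, d))
```

The raw minimum picks A and B, which are not neighbors in the assumed tree, while the corrected criterion picks one of the true neighbor pairs (A, D) or (B, C), which are tied.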

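  31. Updating distances in neighbor joining [figure: leaves i and j joined at new node k, with another node m] • given a new internal node k, the distance to another node m is given by: $d_{km} = \frac{1}{2}(d_{im} + d_{jm} - d_{ij})$

  32. Updating distances in neighbor joining [figure: leaves i and j joined at new node k, with another node m] • can calculate the distance from a leaf to its parent node in the same way: $d_{ik} = \frac{1}{2}(d_{ij} + d_{im} - d_{jm})$

  33. Updating distances in neighbor joining • we can generalize this so that we take into account the distance to all other leaves: $d_{ik} = \frac{1}{2}(d_{ij} + r_i - r_j)$, where $r_i = \frac{1}{|L|-2} \sum_{m \in L} d_{im}$ and L is the set of leaves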

  34. Neighbor joining algorithm
      define the tree T = set of leaf nodes
      L = T
      while more than two subtrees in T
          pick the pair i, j in L with minimal $D_{ij} = d_{ij} - (r_i + r_j)$
          add to T a new node k joining i and j
          determine new distances $d_{ik}$, $d_{jk}$, and $d_{km}$ for all other m in L
          remove i and j from L and insert k (treat it like a leaf)
      join two remaining subtrees, i and j, with edge of length $d_{ij}$
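The algorithm above can be sketched as follows. This is a minimal illustration assuming the standard neighbor-joining formulas: selection by $D_{ij} = d_{ij} - (r_i + r_j)$, branch length $d_{ik} = \frac{1}{2}(d_{ij} + r_i - r_j)$, and update $d_{km} = \frac{1}{2}(d_{im} + d_{jm} - d_{ij})$. The function name, the `node0`, `node1`, ... labels for internal nodes, and the toy distances are hypothetical.

```python
from itertools import combinations


def neighbor_joining(names, dist):
    """Return the unrooted tree as a list of (node, node, edge_length) tuples."""
    d = dict(dist)              # frozenset({a, b}) -> distance; extended as nodes are added
    active = list(names)        # the set L of nodes currently treated as leaves
    edges = []
    next_id = 0
    while len(active) > 2:
        r = {
            i: sum(d[frozenset((i, m))] for m in active if m != i) / (len(active) - 2)
            for i in active
        }
        i, j = min(
            combinations(active, 2),
            key=lambda p: d[frozenset(p)] - (r[p[0]] + r[p[1]]),
        )
        k = f"node{next_id}"    # assumes no taxon is already named like this
        next_id += 1
        d_ij = d[frozenset((i, j))]
        d_ik = 0.5 * (d_ij + r[i] - r[j])
        edges.append((i, k, d_ik))
        edges.append((j, k, d_ij - d_ik))
        for m in active:
            if m not in (i, j):
                d[frozenset((k, m))] = 0.5 * (
                    d[frozenset((i, m))] + d[frozenset((j, m))] - d_ij
                )
        active = [m for m in active if m not in (i, j)] + [k]
    i, j = active
    edges.append((i, j, d[frozenset((i, j))]))
    return edges


# Toy additive distances (made up): (A:0.1, D:0.4) --0.1-- (B:0.1, C:0.4)
names = ["A", "B", "C", "D"]
dist = {
    frozenset("AB"): 0.3, frozenset("AC"): 0.6, frozenset("AD"): 0.5,
    frozenset("BC"): 0.5, frozenset("BD"): 0.6, frozenset("CD"): 0.9,
}
for edge in neighbor_joining(names, dist):
    print(edge)
```

Because the toy distances are additive, the recovered edge lengths match the generating tree; which internal node gets which label depends on how ties are broken.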

  35. Testing for additivity • for every set of four leaves i, j, k, and l, two of the distances $d_{ij} + d_{kl}$, $d_{ik} + d_{jl}$, and $d_{il} + d_{jk}$ must be equal and not less than the third [figure: the three ways of pairing leaves 1, 2, 3, 4 in a four-leaf tree]
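This four-point condition can be checked mechanically. The sketch below tests every quartet in a symmetric distance matrix and requires the two largest of the three pairwise sums to be equal; the function name, tolerance, and example matrices are illustrative assumptions.

```python
from itertools import combinations


def is_additive(d, tol=1e-9):
    """True if every quartet of leaves satisfies the four-point condition."""
    n = len(d)
    for i, j, k, l in combinations(range(n), 4):
        sums = sorted([d[i][j] + d[k][l], d[i][k] + d[j][l], d[i][l] + d[j][k]])
        # The two largest sums must be equal (and so not less than the third).
        if abs(sums[2] - sums[1]) > tol:
            return False
    return True


# Made-up additive example: path lengths in the tree (A:0.1, D:0.4) --0.1-- (B:0.1, C:0.4)
tree_like = [
    [0.0, 0.3, 0.6, 0.5],
    [0.3, 0.0, 0.5, 0.6],
    [0.6, 0.5, 0.0, 0.9],
    [0.5, 0.6, 0.9, 0.0],
]
# Same matrix with one distance perturbed, which breaks additivity.
perturbed = [row[:] for row in tree_like]
perturbed[0][1] = perturbed[1][0] = 0.5
print(is_additive(tree_like), is_additive(perturbed))
```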

  36. Rooting trees • finding a root in an unrooted tree is sometimes accomplished by using an outgroup • outgroup: a species known to be more distantly related to remaining species than they are to each other • edge joining the outgroup to the rest of the tree is best candidate for root position [figure: unrooted tree with numbered nodes, with the outgroup and the candidate root edge marked]

  37. Rooting trees • chimpanzee lice used as outgroup in human lice study

  38. Comments on distance-based methods • if the given distance data is ultrametric (and these distances represent real distances), then UPGMA will identify the correct tree • if the data is additive (and these distances represent real distances), then neighbor joining will identify the correct tree • otherwise, the methods may not recover the correct tree, but they may still be reasonable heuristics • neighbor joining is commonly used
