Phylogenetic trees

Phylogenetic trees. Sushmita Roy, BMI/CS 576, www.biostat.wisc.edu/bmi576/, sroy@biostat.wisc.edu, Oct 1st, 2013. Key concepts in this section: What are phylogenies or phylogenetic trees? Terminology such as extant, ancestral, branch point, branch length, orthologs, paralogs, taxon


Presentation Transcript


  1. Phylogenetic trees Sushmita Roy BMI/CS 576 www.biostat.wisc.edu/bmi576/ sroy@biostat.wisc.edu Oct 1st, 2013

  2. Key concepts in this section • What are phylogenies or phylogenetic trees? • Terminology such as extant, ancestral, branch point, branch length, orthologs, paralogs, taxon • Why build phylogenetic trees? • How to build phylogenetic trees? • Distance-based methods • Parsimony methods • Minimize the number of changes • Probabilistic methods • Find the tree that best explains the data using probabilistic models

  3. What are phylogenetic trees? • A tree that describes evolutionary relationships among entities • Species, genes, strains • Leaves represent extant entities • Internal nodes represent ancestral species • Such a tree is inferred from observations in existing organisms.

  4. The tree of life aims to represent the phylogeny of all species on earth From http://tellapallet.com/tree_of_life.htm

  5. Phylogenetic tree of 29 mammals Lindblad-Toh et al., 2011, Nature

  6. Why phylogenetic trees? • Understand how organisms are related • Do humans share a more recent common ancestor with chimpanzees or with gorillas? • Ask how closely organisms are related • Humans and chimpanzees shared a common ancestor about 5 million years ago (mya) • Provide insight into the evolutionary history of species • How specific functions have evolved • Language evolution • Identify signatures of sequence conservation • Conjecture the fate of specific regions of the genome • Will the human Y chromosome disappear? • Inform multiple sequence alignments

  7. Orthologs and paralogs • Orthologs: • Two sequences in two species that have a common ancestor • Diverged due to a speciation event • Used to create a “species tree” • Paralogs: • Two sequences in the same species that arose from a gene duplication event • Captured in a “gene tree”.

  8. Phylogenetic tree basics • Leaves represent the things (genes, species, individuals/strains) being compared • The term taxon (plural taxa) is used for these when they represent species and broader classifications of organisms • For example, if the taxa are species, the tree is a species tree • Internal nodes are hypothetical ancestral units • Phylogenetic trees can be rooted or unrooted • The root represents the common ancestor • In a rooted tree, the path from the root to a node represents an evolutionary path • Gives directionality to evolutionary time • An unrooted tree specifies relationships among taxa, but not the direction of descent from an ancestor

  9. Tree basics • [Figure: an unrooted tree and a rooted tree, with leaf nodes (extant), internal nodes (ancestral), branches and branch lengths labelled] • Each tree topology represents a different evolutionary history • For a species tree, internal nodes represent speciation events

  10. Internal nodes represent ancestral species Tree of Life project (http://tolweb.org/tree/)

  11. Rooting a tree • An unrooted tree can be converted to a rooted tree using an outgroup species • Outgroup: a species known to be more distantly related to all the other species than those species are to each other • Find the branch where the outgroup attaches to the tree • Placing the root on that branch gives the rooted tree

  12. Tree counting • A rooted binary tree with n leaf nodes has • n-1 internal nodes • 2n-2 edges/branches • An unrooted binary tree with n leaf nodes has • n-2 internal nodes • 2n-3 edges/branches • A root can be added to any of these 2n-3 branches, so each unrooted tree gives 2n-3 rooted trees • For three taxa there is one unrooted tree and three rooted trees

  13. Tree counting • [Figure: an unrooted tree on taxa 1, 2, 3; the possible positions for the root; the resulting three rooted trees]

  14. Tree counting • Instead of adding a root, we could add a branch for the (n+1)th taxon • [Figure: taxon 4 attached to each of the three branches of the unrooted tree on taxa 1, 2, 3, giving three unrooted trees]

  15. Tree counting • With four leaves, each unrooted tree has five branches • Adding a fifth leaf on each of these branches gives five trees with five leaves • Thus we have 3*5 = 15 unrooted trees with five leaves • In general, for n leaves we can have • (1)(3)(5)...(2n-5) unrooted trees
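
The product in the last bullet can be computed directly. Below is a minimal Python sketch (the function names are illustrative, not from the lecture) that counts unrooted binary trees with n labelled leaves and, using the 2n-3 possible root positions per unrooted tree from slide 12, the number of rooted trees:

```python
def num_unrooted_trees(n: int) -> int:
    """(1)(3)(5)...(2n-5): unrooted binary trees with n labelled leaves (n >= 3)."""
    if n < 3:
        raise ValueError("need at least 3 leaves")
    count = 1
    for factor in range(3, 2 * n - 4, 2):  # odd factors 3, 5, ..., 2n-5
        count *= factor
    return count


def num_rooted_trees(n: int) -> int:
    """Each unrooted tree has 2n-3 branches on which a root can be placed."""
    return num_unrooted_trees(n) * (2 * n - 3)


if __name__ == "__main__":
    for n in range(3, 11):
        print(n, num_unrooted_trees(n), num_rooted_trees(n))
```

For example, there are 3 unrooted trees for four taxa, 15 for five, and already 2,027,025 for ten.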

  16. Constructing phylogenetic trees • Three types of methods • Distance based methods • Parsimony methods • Probabilistic approaches • Most methods start with pairwise distance methods • We have already seen one method!

  17. Methods for phylogenetic tree reconstruction • Distance-based methods • UPGMA • Neighbor joining • Assume additivity and sometimes a “molecular clock” • Additivity means that the distance between two nodes equals the sum of the branch lengths on the path connecting them • Alignment-based methods • Parsimony • Probabilistic

  18. Defining distance between sequences • Fractional alignment difference for two sequences i and j • $p_{ij} = m_{ij}/L_{ij}$ • $m_{ij}$: number of mismatches; $L_{ij}$: number of aligned sites compared • Gives an estimate of changes per site • Assumes that each site has changed at most once • Therefore underestimates the distance between sequences • Assumes all sequences change at the same rate • Jukes-Cantor distance • The simplest evolutionary distance
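
As a sketch of slide 18, assuming aligned DNA sequences in which columns with a gap ('-') are simply skipped (an assumption, not part of the slide), the fractional difference and the standard Jukes-Cantor correction $d = -\frac{3}{4}\ln\left(1 - \frac{4}{3}p\right)$ can be computed as follows. The function names and the sequence pair are illustrative.

```python
import math


def p_distance(seq_i: str, seq_j: str) -> float:
    """Fractional difference p_ij = m_ij / L_ij between two aligned sequences."""
    if len(seq_i) != len(seq_j):
        raise ValueError("sequences must be aligned to the same length")
    # Assumption: alignment columns with a gap '-' in either sequence are skipped
    sites = [(a, b) for a, b in zip(seq_i, seq_j) if a != "-" and b != "-"]
    if not sites:
        raise ValueError("no ungapped columns to compare")
    m_ij = sum(1 for a, b in sites if a != b)  # number of mismatches
    return m_ij / len(sites)  # L_ij = number of compared columns


def jukes_cantor(p: float) -> float:
    """Jukes-Cantor distance for DNA: d = -(3/4) ln(1 - (4/3) p)."""
    if p >= 0.75:
        return math.inf  # correction is undefined once p reaches the random-sequence level
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


if __name__ == "__main__":
    p = p_distance("ACGTACGTAC", "ACGTTCGTAA")  # hypothetical aligned pair
    print(p, jukes_cantor(p))
```

For this hypothetical pair, p = 0.2 while the Jukes-Cantor distance is about 0.233; the correction grows with p because more sites are likely to have changed more than once.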

  19. UPGMA relies on the molecular clock assumption • Sequences diverge at the same rate at all points in the phylogeny • The distance from any leaf to the root is the same • If this holds, the data are said to be ultrametric

  20. The molecular clock assumption & ultrametric data • Ultrametric data: for any triplet of sequences i, j, k, the distances are either all equal, or two are equal and the remaining one is smaller • [Figure: example ultrametric tree on leaves A, B, C, D, E]
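
The triplet condition translates directly into a check over all triples of a distance matrix. A minimal sketch, assuming the distances are given as a symmetric list of lists; the function name and the tolerance parameter tol (for floating-point comparison) are illustrative:

```python
from itertools import combinations


def is_ultrametric(d, tol=1e-9):
    """Three-point check on a symmetric distance matrix d (list of lists)."""
    for i, j, k in combinations(range(len(d)), 3):
        _, middle, largest = sorted((d[i][j], d[i][k], d[j][k]))
        # All three equal, or two equal and the third smaller:
        # either way the two largest distances must coincide.
        if largest - middle > tol:
            return False
    return True


if __name__ == "__main__":
    # Hypothetical clock-like distances for four sequences
    clock_like = [[0, 2, 6, 10], [2, 0, 6, 10], [6, 6, 0, 10], [10, 10, 10, 0]]
    print(is_ultrametric(clock_like))  # True
```

Lowering a single entry (for example setting the distance between the first and third sequences to 5) breaks the condition for that triplet, and the check returns False.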

  21. Problems with the molecular clock assumption • [Figure: the actual tree on leaves 1, 2, 3, 4 compared with the tree constructed by UPGMA]

  22. Neighbor joining • The ultrametric property is too strong • Most sequences diverge at different rates • A more relaxed requirement is that of additivity • The distance between a pair of species/nodes is equal to the sum of the branch lengths on the path between them • Constructs trees using a similar idea to UPGMA • That is, it repeatedly picks a pair of nodes and joins them • Produces unrooted trees

  23. How to select nodes for merging? • [Figure: tree on leaves A, B, C, D with branch lengths 0.1, 0.1, 0.1, 0.4 and 0.4] • Given all pairwise distances for n sequences • $d_{ij}$ denotes the distance between nodes i and j • Should we select the node pair with the smallest $d_{ij}$? Should we merge A and B?

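  24. Need to correct for long branches • L: current set of leaves • $r_i = \frac{1}{|L|-2}\sum_{k \in L} d_{ik}$: average distance of node i from the other nodes • $D_{ij} = d_{ij} - (r_i + r_j)$: corrected distance used to choose the pair to join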

  25. Defining the distance to a new node • [Figure: nodes i and j joined by a new node k; another node m] • Given $d_{ij}$, $d_{im}$ and $d_{jm}$, how do we calculate $d_{km}$, the distance from m to the new node k?

  26. Algorithm for NJ • Initialization • Let T be the set of leaf nodes • L = T • Estimate $r_i$ for all i in L • Estimate $D_{ij}$ for all pairs i, j in L • Iteration • Pick a pair i, j from L such that $D_{ij}$ is smallest • Define a new node k • Estimate $d_{ik}$ and $d_{jk}$, and add an edge between k and i and an edge between k and j • Add k to T and to L, and remove i and j from L • Estimate $d_{km}$ for every other node m in L, then re-estimate $r_m$ and $D_{mn}$ for all nodes m, n in L • Terminate • If L has two nodes, add the edge between these two
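
The loop above can be sketched in Python. This is an illustration under stated assumptions, not the course's reference code: pairs are chosen with $r_i = \frac{1}{|L|-2}\sum_{k \in L} d_{ik}$ and $D_{ij} = d_{ij} - (r_i + r_j)$, and the branch-length and distance updates are the standard neighbor-joining ones (Saitou and Nei), which the slides leave as a question: $d_{ik} = \frac{1}{2}(d_{ij} + r_i - r_j)$, $d_{jk} = d_{ij} - d_{ik}$ and $d_{km} = \frac{1}{2}(d_{im} + d_{jm} - d_{ij})$. The dict-of-dicts input, the internal node names node0, node1, ... and the example matrix are hypothetical.

```python
from itertools import combinations


def neighbor_joining(d):
    """d: symmetric dict of dicts, d[i][j] = distance between leaves i and j (>= 2 leaves).
    Returns the unrooted tree as (node, node, branch_length) edges."""
    d = {i: dict(row) for i, row in d.items()}  # working copy; new nodes are added to it
    L = list(d)  # current set of active nodes
    edges = []
    next_id = 0
    while len(L) > 2:
        n = len(L)
        # r_i: distances from i to the other active nodes, summed and divided by |L| - 2
        r = {i: sum(d[i][k] for k in L if k != i) / (n - 2) for i in L}
        # Join the pair with the smallest D_ij = d_ij - (r_i + r_j); ties go to the first pair
        i, j = min(combinations(L, 2), key=lambda p: d[p[0]][p[1]] - r[p[0]] - r[p[1]])
        k = f"node{next_id}"
        next_id += 1
        # Branch lengths from the new node k to i and j
        d_ik = 0.5 * (d[i][j] + r[i] - r[j])
        edges += [(k, i, d_ik), (k, j, d[i][j] - d_ik)]
        # Distance from k to every other active node m
        d[k] = {}
        for m in L:
            if m not in (i, j):
                d[k][m] = d[m][k] = 0.5 * (d[i][m] + d[j][m] - d[i][j])
        # Replace i and j by k in the active set
        L = [m for m in L if m not in (i, j)] + [k]
    # Terminate: connect the last two nodes
    a, b = L
    edges.append((a, b, d[a][b]))
    return edges


# Hypothetical additive distances from the tree A-u:1, B-u:2, u-v:5, C-v:3, D-v:4
example_distances = {
    "A": {"B": 3, "C": 9, "D": 10},
    "B": {"A": 3, "C": 10, "D": 11},
    "C": {"A": 9, "B": 10, "D": 7},
    "D": {"A": 10, "B": 11, "C": 7},
}

if __name__ == "__main__":
    for u, v, length in neighbor_joining(example_distances):
        print(u, v, length)
```

On this hypothetical matrix the sketch joins A and B to node0 (lengths 1.0 and 2.0), C and D to node1 (3.0 and 4.0), and node0 to node1 (5.0), i.e. it recovers the tree the distances were generated from. Note that the smallest raw distance here, $d_{AB} = 3$, happens to be a true neighbor pair; on a tree like the one on slide 23 the $D_{ij}$ correction is what prevents joining non-neighbors.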

  27. An example with neighbor joining • Consider 5 sequences: A, B, C, D, E • Distance matrix: [Table: pairwise distances between A, B, C, D and E] • What is the tree inferred by the neighbor-joining algorithm?

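  28. Can we check for additivity? • Four-point condition: for any four leaves i, j, k, l with distances $d_{ij}, d_{ik}, d_{il}, d_{jk}, d_{jl}, d_{kl}$, form the three sums of two distances $d_{ij} + d_{kl}$, $d_{ik} + d_{jl}$ and $d_{il} + d_{jk}$ • [Figure: the three ways of splitting i, j, k, l into two pairs] • For additive data, two of these sums are equal and at least as large as the third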
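
The additivity check can be applied to every quartet of leaves in the same way as the ultrametric check on triplets. A minimal sketch, assuming a symmetric list-of-lists distance matrix; the function name and the tolerance parameter tol are illustrative. It treats the data as additive when, for every quartet, the two largest of the three pair sums are equal:

```python
from itertools import combinations


def is_additive(d, tol=1e-9):
    """Four-point check on a symmetric distance matrix d (list of lists)."""
    for i, j, k, l in combinations(range(len(d)), 4):
        pair_sums = sorted((d[i][j] + d[k][l], d[i][k] + d[j][l], d[i][l] + d[j][k]))
        # Additive (tree-like) data: the two largest sums are equal
        if pair_sums[2] - pair_sums[1] > tol:
            return False
    return True


if __name__ == "__main__":
    # Hypothetical distances from the tree A-u:1, B-u:2, u-v:5, C-v:3, D-v:4
    tree_like = [[0, 3, 9, 10], [3, 0, 10, 11], [9, 10, 0, 7], [10, 11, 7, 0]]
    print(is_additive(tree_like))  # True
```

For this hypothetical matrix the three pair sums are 10, 20 and 20, so the condition holds; the same matrix fails the ultrametric check, since additivity is the weaker requirement.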
