
Phylogeny (part III)

Doug Raiford. Lesson 10: Phylogeny (part III). Review: of the three methods, distance and parsimony have been covered, which leaves maximum likelihood. Review of distance: UPGMA is hierarchical clustering that starts by finding the closest two taxa and combining them. Review of parsimony.



Presentation Transcript


  1. Doug Raiford Lesson 10 Phylogeny (part III) Phylogenetics Part III

  2. Review • Three methods • Have looked at two: distance and parsimony • That leaves maximum likelihood

  3. Review distance • UPGMA: hierarchical clustering • Start by finding the closest two taxa • Combine the closest pair [Figure: tree with leaves A, B, C, D]
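
The clustering loop on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the lecture's own code: the function name `upgma`, the `frozenset`-keyed distance dictionary, and the distances for A, B, C, D are all assumptions made for the example. It returns only the tree shape as a nested string and does not compute branch lengths.

```python
from itertools import combinations

def upgma(names, dist):
    """Cluster taxa by repeatedly merging the closest pair.

    names: list of leaf labels.
    dist:  dict mapping frozenset({x, y}) -> distance (hypothetical input format).
    Returns the tree topology as a nested string such as "(C,(A,B))".
    """
    clusters = {name: 1 for name in names}  # cluster label -> number of leaves
    d = dict(dist)                          # working copy; grows as clusters merge
    while len(clusters) > 1:
        # Start by finding the closest two clusters.
        a, b = min(combinations(clusters, 2), key=lambda pair: d[frozenset(pair)])
        size_a, size_b = clusters.pop(a), clusters.pop(b)
        merged = f"({a},{b})"
        # Distance to the combined pair is the size-weighted average.
        for c in clusters:
            d[frozenset((merged, c))] = (
                size_a * d[frozenset((a, c))] + size_b * d[frozenset((b, c))]
            ) / (size_a + size_b)
        clusters[merged] = size_a + size_b
    return next(iter(clusters))

# Made-up distances for the four taxa on the slide.
example = {
    frozenset("AB"): 2, frozenset("AC"): 6, frozenset("AD"): 10,
    frozenset("BC"): 6, frozenset("BD"): 10, frozenset("CD"): 10,
}
tree = upgma(["A", "B", "C", "D"], example)
```

With these made-up distances, A and B merge first, then C joins them, and D is attached last.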

  4. Review parsimony • Looks at each column of an MSA and attempts to find the tree that describes it with the fewest changes • Builds a consensus tree [Figure: sequences AGCT, AACT, AACT, AACT; tree for the second column with internal nodes labeled "A" or "A or G" and a change count of 0 or 1 on each branch]
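
The per-column counting can be sketched with Fitch's small-parsimony rule, which is one standard way to score a fixed tree and is assumed here rather than taken from the slides. The names `fitch` and `parsimony_score` and the tuple encoding of the tree are hypothetical; the four sequences are the ones shown on the slide.

```python
def fitch(node, column):
    """Return (candidate states, number of changes) for a subtree.

    node:   an int (index of a leaf sequence) or a (left, right) tuple.
    column: the characters of one MSA column, one per sequence.
    """
    if isinstance(node, int):
        return {column[node]}, 0
    left_states, left_cost = fitch(node[0], column)
    right_states, right_cost = fitch(node[1], column)
    common = left_states & right_states
    if common:
        # Children agree on at least one state: no change needed here.
        return common, left_cost + right_cost
    # Children disagree: keep both options ("A or G") and count one change.
    return left_states | right_states, left_cost + right_cost + 1

def parsimony_score(tree, sequences):
    """Sum the minimum number of changes over every column of the alignment."""
    return sum(fitch(tree, column)[1] for column in zip(*sequences))

sequences = ["AGCT", "AACT", "AACT", "AACT"]
tree = ((0, 1), (2, 3))  # assumed topology: sequences 0+1 and 2+3 are siblings
score = parsimony_score(tree, sequences)
```

Only the second column (G, A, A, A) needs a substitution, so this tree scores one change in total. Comparing scores across candidate trees and keeping the lowest is the parsimony criterion.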

  5. UPGMA and varying rates • Averages distances to the combined pair • Does not accurately reflect branch lengths when rates vary • Can lead to inaccurate trees [Figure: tree with leaves A, B, C, D]

  6. Other distance methods • Given a tree, the distance between any pair is the sum of the branches on the path between them • All pairwise distances are known (from the distance matrix) • Fitch and Margoliash came up with a way to determine each of the branch lengths (unrooted trees only) [Figure: unrooted tree with labels A through I and endpoints X and Y; path XY: A, C, E, G, I]

  7. Pseudo code • Find the two closest taxa • Find the average distance from each of these to all other organisms • Solve the simultaneous equations for the branch lengths • Combine the closest pair, and continue [Figure: tree with leaves A, B, C, D, E and branches a, b, c, d, e, f, g]
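
The simultaneous equations in this pseudo code have a closed-form solution when three points meet at one node: if d(A,B) = a + b, d(A,C) = a + c and d(B,C) = b + c, then a = (d(A,B) + d(A,C) - d(B,C)) / 2, and similarly for b and c. The sketch below applies that to the first step only, treating "all other organisms" as a single averaged point. The function names, the input format, and the distances are assumptions for illustration.

```python
from itertools import combinations
from statistics import mean

def three_point_branches(d_ab, d_ac, d_bc):
    """Solve a+b=d_ab, a+c=d_ac, b+c=d_bc for the three branch lengths."""
    a = (d_ab + d_ac - d_bc) / 2
    b = (d_ab + d_bc - d_ac) / 2
    c = (d_ac + d_bc - d_ab) / 2
    return a, b, c

def first_step(names, dist):
    """Find the closest pair and the lengths of the two branches leading to it.

    dist maps frozenset({x, y}) -> distance (hypothetical input format).
    """
    a, b = min(combinations(names, 2), key=lambda p: dist[frozenset(p)])
    rest = [n for n in names if n not in (a, b)]
    # Average distance from each member of the pair to all other organisms.
    d_a_rest = mean(dist[frozenset((a, x))] for x in rest)
    d_b_rest = mean(dist[frozenset((b, x))] for x in rest)
    len_a, len_b, _ = three_point_branches(dist[frozenset((a, b))], d_a_rest, d_b_rest)
    return a, b, len_a, len_b

# Made-up distances: A and B are closest, and B is slightly farther from the rest.
example = {
    frozenset("AB"): 22, frozenset("AC"): 39, frozenset("AD"): 39,
    frozenset("BC"): 41, frozenset("BD"): 41, frozenset("CD"): 50,
}
result = first_step(["A", "B", "C", "D"], example)
```

Unlike the UPGMA average, the two branches to A and B come out unequal (10 and 12 here), which is how this approach accommodates differing rates.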

  8. Maximum likelihood • Similar to parsimony in that it is performed on each column of the MSA • Given the known mutation rates… • All possible trees considered • Examined one column at a time • Uses a probability instead of a count • Maximize the probability of a tree match [Figure: tree with leaves A, B, C, D, E and branches a, b, c, d, e, f, g]

  9. Maximum likelihood • Can determine substitution rates from base composition • What substitution rates would result in the base composition staying the same? • Overall rate = N/L = rate for a single nucleotide • Equilibrium: A↓ = C→A + G→A + T→A • N = (A→A)fA + (A→C)fA + … + (T→T)fT • Simplifying assumptions • A→C = C→A • Proportion of each nucleotide stays the same • Once again, simultaneous equations

  10. Another view • Better seen in matrix form • All rates sum to N • Also, A↓ = C→A + G→A + T→A

  11. Given rates… • A tree is generated for a column • Each branch represents a substitution • Each branch has a rate • Rate is similar to a probability • In this case the rate is not mutations per unit time • Derived from the number of mutations divided by the number of nucleotides • Can be thought of as the “probability of a mutation at any given nucleotide” • Rate ≈ probability

  12. Probability of a tree • The probability (likelihood) that a tree is the result of the given set of mutations • Product rule: multiply the rates of each of the branches [Figure: tree with leaves A, B, C, D, E and branches a, b, c, d, e, f, g]

  13. Combined probability • Generate a probability for each column for each tree • Columns are treated as independent, so the combined probability is the product of these probabilities (equivalently, the sum of their logarithms)

      atgccgca-actgccgcaggagatcaggactttcatgaatatcatcatgcgtggga-ttcag
      acctccatacgtgccccaggagatctggactttcacc---tggatcatgcgaccgtacctac
      t-atgg-t-cgtgccgcaggagatcaggactttca-gt--g-aatcatctgg-cgc--c-aa
      t--tcgt-ac-tgccccaggagatctggactttcaaa---ca-atcatgcgcc-g-tc-tat
      aattccgtacgtgccgcaggagatcaggactttcag-t--a-tatcatctgtc-ggc--tag
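
A rough sketch of how the product rule and the per-column combination fit together is given below. It follows the slides' simplified view, in which each branch of a column's tree already has a probability; full maximum likelihood implementations also sum over the unknown ancestral states, which is omitted here. Because columns are treated as independent, per-column probabilities multiply, and the code adds logarithms to avoid underflow on long alignments. All names and numbers are invented for illustration.

```python
import math

def column_probability(branch_probs):
    """Product rule: multiply the probabilities on every branch of the tree."""
    return math.prod(branch_probs)

def tree_log_likelihood(columns):
    """Combine columns by summing logs (same ranking as multiplying probabilities).

    columns: one list of branch probabilities per MSA column.
    """
    return sum(math.log(column_probability(col)) for col in columns)

# Two hypothetical candidate trees, each scored on the same two columns.
candidate_trees = {
    "tree_1": [[0.9, 0.9, 0.1], [0.9, 0.9, 0.9]],
    "tree_2": [[0.9, 0.1, 0.1], [0.9, 0.9, 0.9]],
}
best = max(candidate_trees, key=lambda name: tree_log_likelihood(candidate_trees[name]))
```

Here `tree_1` needs one fewer unlikely branch in the first column, so it has the higher likelihood.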

  14. Adjusting distances • Might a mutation occur and then revert? • A simple count would not catch this • The higher the degree of mutation, the greater the probability of a reversion • Distances would be slightly greater than the simple count • Must account for reversions

  15. Jukes and Cantor • Increases the distance based upon this probability • K: substitutions per site • p: fraction of nucleotides that differ between two sequences • Example: p = 0.08 → K = 0.0846
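
The slide transcript does not preserve the formula itself. The standard Jukes-Cantor correction is K = -(3/4) ln(1 - (4/3) p), and it reproduces the slide's 0.08 → 0.0846 example, so it is assumed here. The helper `p_distance`, which skips gapped positions, is a hypothetical convenience added for the example.

```python
import math

def p_distance(seq_a, seq_b):
    """Fraction of aligned, ungapped sites at which two sequences differ."""
    pairs = [(x, y) for x, y in zip(seq_a, seq_b) if x != "-" and y != "-"]
    return sum(x != y for x, y in pairs) / len(pairs)

def jukes_cantor(p):
    """Estimated substitutions per site, allowing for multiple hits and reversions."""
    return -0.75 * math.log(1 - 4 * p / 3)

k = jukes_cantor(0.08)                          # about 0.0846
k_pair = jukes_cantor(p_distance("AGCT", "AACT"))
```

The corrected distance is always at least as large as p, and the gap widens as p grows, which matches the point that more divergent sequences hide more reversions.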

  16. Kimura two parameter • Jukes and Cantor assumed a single mutation rate • Transversions are less likely than transitions • A and G are purines, T and C are pyrimidines • Mutations that stay within a family (transitions) are more likely • Crossing families is called a transversion • P: fraction of transitions • Q: fraction of transversions
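
As with Jukes and Cantor, the formula is not in the transcript. The usual Kimura two-parameter distance is K = -(1/2) ln(1 - 2P - Q) - (1/4) ln(1 - 2Q), which is assumed in the sketch below. The counting helper and its name are hypothetical.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

def transition_transversion_fractions(seq_a, seq_b):
    """Return (P, Q): fractions of sites showing transitions and transversions."""
    transitions = transversions = sites = 0
    for x, y in zip(seq_a.upper(), seq_b.upper()):
        if x not in "ACGT" or y not in "ACGT":
            continue  # skip gaps and ambiguous characters
        sites += 1
        if x == y:
            continue
        if {x, y} <= PURINES or {x, y} <= PYRIMIDINES:
            transitions += 1    # stays within the purine or pyrimidine family
        else:
            transversions += 1  # crosses between families
    return transitions / sites, transversions / sites

def kimura_2p(P, Q):
    """Kimura two-parameter distance in substitutions per site."""
    return -0.5 * math.log(1 - 2 * P - Q) - 0.25 * math.log(1 - 2 * Q)

P, Q = transition_transversion_fractions("AGCT", "AACT")  # one G/A transition
distance = kimura_2p(P, Q)
```

When transversions are exactly twice as frequent as transitions (Q = 2P), the expression collapses to the Jukes-Cantor value for p = P + Q, which is a handy sanity check.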

  17. When use which? • Page 247 [Flowchart: Choose set of related sequences → Obtain MSA → Is there strong sequence similarity? Yes: max parsimony. Otherwise: medium similarity (clearly recognizable)? Yes: distance. Otherwise: max likelihood]

  18. Summary • Branch length is related to, but not equal to, time • MSAs are central due to differing mutation rates • 3 approaches • Distance • Parsimony • Maximum likelihood [Figure: the method-selection flowchart from slide 17]

  19. For non-bioinformaticians • Learned a clustering technique • Hierarchical • Great for exploratory data analysis • Learned some tree growth properties • Exposed to some statistical analysis • Maximum likelihood • Relationship of rates to probabilities

  20. Phylogenetics Part III

  21. DNA and proteins • If working with RNA, must take covariance into account

  22. Gene duplication events • Paralogs can be exploited • Evolve in a coordinated way • Find an organism that is similar but split off before the gene duplication event took place

  23. Can convert score to distance • D = -log(S)
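
A small sketch of the conversion follows. It assumes S has been normalized to a similarity in the range (0, 1], so that identical sequences give a distance of zero, and it assumes the natural logarithm; the slide specifies neither, and the function name is hypothetical.

```python
import math

def score_to_distance(similarity):
    """Convert a normalized similarity score S in (0, 1] to D = -log(S)."""
    if not 0 < similarity <= 1:
        raise ValueError("similarity must be normalized to the range (0, 1]")
    return -math.log(similarity)

identical = score_to_distance(1.0)   # no distance between identical sequences
close = score_to_distance(0.9)
far = score_to_distance(0.5)
```

Lower similarity maps to a larger distance, and the result can be fed into a distance method such as UPGMA.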

  24. Go back to distance • Did a simple count of differences divided by the number of sites • Can use an alignment score (if normalized for length) • What if a site mutates and then mutates back? • Jukes and Cantor • Kimura two parameter
