Phylogeny (part III)

Doug Raiford Lesson 10 Phylogeny (part III) Phylogenetics Part III

Review
• Three methods
• Have looked at two: distance and parsimony
• That leaves maximum likelihood

Review distance
• UPGMA: hierarchical clustering
• Start by finding the closest two taxa
• Combine the closest pair, then repeat
[Figure: tree joining taxa A, B, C, D]
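
The clustering loop above can be sketched in a few lines of Python. This is a minimal illustration, not the lecture's own code: the function name `upgma`, the `frozenset`-keyed distance table, and the example distances are all assumptions made for the sketch, and branch lengths are not computed.

```python
from itertools import combinations

def upgma(names, dist):
    """Cluster taxa with UPGMA and return the topology as nested tuples.

    names: list of taxon labels
    dist:  dict mapping frozenset({a, b}) -> distance between taxa a and b
    """
    # Map each current cluster (a label or a nested tuple) to its size.
    clusters = {name: 1 for name in names}
    d = dict(dist)
    while len(clusters) > 1:
        # Find the closest pair among the current clusters.
        a, b = min(combinations(clusters, 2), key=lambda p: d[frozenset(p)])
        merged = (a, b)
        na, nb = clusters.pop(a), clusters.pop(b)
        # Distance from the merged cluster to each remaining cluster is the
        # size-weighted average of the two old distances.
        for c in clusters:
            d[frozenset((merged, c))] = (
                na * d[frozenset((a, c))] + nb * d[frozenset((b, c))]
            ) / (na + nb)
        clusters[merged] = na + nb
    return next(iter(clusters))

# Hypothetical distances, for illustration only.
example = {
    frozenset(("A", "B")): 2, frozenset(("A", "C")): 6,
    frozenset(("A", "D")): 10, frozenset(("B", "C")): 6,
    frozenset(("B", "D")): 10, frozenset(("C", "D")): 10,
}
tree = upgma(["A", "B", "C", "D"], example)
```

With these made-up distances, A and B are joined first, then C, then D.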

Review parsimony
• Looks at each column of an MSA and attempts to find the tree that explains it with the fewest substitutions
• Builds a consensus tree
[Figure: example with sequences AGCT, AACT, AACT, AACT; internal nodes are labeled "A or a G", and branches carry costs such as "0 if A" and "1 if A"]
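
Counting substitutions for one column on a fixed tree can be sketched as follows. The slide does not name an algorithm; this sketch assumes Fitch's small-parsimony method, and the names `fitch_count`, `parsimony_score`, and the leaf labels `s1` to `s4` are hypothetical. The four sequences are the ones shown on the slide.

```python
def fitch_count(tree, column):
    """Minimum substitutions for one alignment column on a fixed binary tree.

    tree:   nested 2-tuples of leaf names, e.g. (("s1", "s2"), ("s3", "s4"))
    column: dict mapping leaf name -> nucleotide in this column
    Returns (set of possible states at this node, substitution count).
    """
    if isinstance(tree, str):          # leaf: state is known, no cost
        return {column[tree]}, 0
    left_states, left_cost = fitch_count(tree[0], column)
    right_states, right_cost = fitch_count(tree[1], column)
    common = left_states & right_states
    if common:                         # children agree: no new substitution
        return common, left_cost + right_cost
    # Children disagree: either state is possible, at the cost of one change.
    return left_states | right_states, left_cost + right_cost + 1

def parsimony_score(tree, seqs):
    """Sum the per-column substitution counts over the whole alignment."""
    length = len(next(iter(seqs.values())))
    return sum(
        fitch_count(tree, {name: s[i] for name, s in seqs.items()})[1]
        for i in range(length)
    )

seqs = {"s1": "AGCT", "s2": "AACT", "s3": "AACT", "s4": "AACT"}
tree = (("s1", "s2"), ("s3", "s4"))
score = parsimony_score(tree, seqs)
```

Only the second column varies (G versus A), so this tree needs a single substitution. Comparing `parsimony_score` across candidate trees would pick the most parsimonious one.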

UPGMA and varying rates
• Averages distances to the combined pair
• Doesn’t accurately reflect branch lengths when mutation rates vary between lineages
• Can lead to inaccurate trees
[Figure: tree with taxa A, B, C, D]

Other distance methods
• Given a tree, can calculate the path through the branches that connect any given pair
• All pairwise distances are known (from the distance matrix)
• Fitch and Margoliash came up with a way to determine each of the branch lengths (unrooted trees only)
[Figure: unrooted tree with branches labeled A to I; the path from X to Y runs through branches A, C, E, G, I]

Pseudo code
• Find the two closest taxa
• Find the average distance from each of these to all other organisms
• Solve the simultaneous equations for the branch lengths
• Combine the closest pair, and continue
[Figure: tree with taxa A to E and branch lengths a to g]
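
The simultaneous equations in the pseudo code can be written out for one step. With the closest pair A and B and the remaining taxa treated as a single averaged group C, the three unknown branch lengths satisfy a + b = d(A,B), a + c = d(A,C), b + c = d(B,C). The sketch below solves them directly; the function names and the distance values are assumptions for illustration only.

```python
def three_point_branches(d_ab, d_ac, d_bc):
    """Solve a + b = d_ab, a + c = d_ac, b + c = d_bc for (a, b, c)."""
    a = (d_ab + d_ac - d_bc) / 2
    b = (d_ab + d_bc - d_ac) / 2
    c = (d_ac + d_bc - d_ab) / 2
    return a, b, c

def mean_distance(taxon, others, dist):
    """Average distance from one taxon to every taxon in `others`."""
    return sum(dist[frozenset((taxon, o))] for o in others) / len(others)

# Hypothetical distances; A and B are assumed to be the closest pair.
dist = {
    frozenset(("A", "B")): 22,
    frozenset(("A", "C")): 39, frozenset(("A", "D")): 41,
    frozenset(("B", "C")): 41, frozenset(("B", "D")): 43,
}
others = ["C", "D"]
d_a_rest = mean_distance("A", others, dist)   # (39 + 41) / 2
d_b_rest = mean_distance("B", others, dist)   # (41 + 43) / 2
a, b, c = three_point_branches(dist[frozenset(("A", "B"))], d_a_rest, d_b_rest)
```

Here `a` and `b` are the branches leading to A and B, and `c` is the distance from their common node to the averaged group of remaining taxa.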

Maximum likelihood
• Similar to parsimony in that it is performed on each column of the MSA
• Given the known mutation rates…
• All possible trees considered
• Examined one column at a time
• Probability instead of count
• Maximize the probability that a tree matches the data
[Figure: tree with taxa A to E and branch lengths a to g]

Maximum likelihood
• Can determine substitution rates from base composition
• What substitution rates would result in the base composition staying the same?
• Overall rate = N/L = rate for a single nucleotide
• Equilibrium: A↓ = C→A + G→A + T→A
• N = (A→A)fA + (A→C)fA + … + (T→T)fT
• Simplifying assumptions
• A→C = C→A
• Proportion of each nucleotide stays the same
• Once again, simultaneous equations

Another view
• Better seen in matrix form
• All rates sum to N
• Also, A↓ = C→A + G→A + T→A

Given rates…
• Tree generated for a column
• Each branch represents a substitution
• Each substitution has a rate
• Rate is similar to a probability
• In this case the rate is not mutations per unit time
• Derived from the number of mutations divided by the number of nucleotides
• Can be thought of as “probability of a mutation at any given nucleotide”
• Rate ≈ probability

Probability of a tree
• The probability (likelihood) that a tree is the result of the given set of mutations
• Product rule: multiply the rates of each of the branches
[Figure: tree with taxa A to E and branch lengths a to g]

Combined probability
• Generate a probability for each column for each tree
• Combined probability is the product of these per-column probabilities (equivalently, the sum of their logarithms)
atgccgca-actgccgcaggagatcaggactttcatgaatatcatcatgcgtggga-ttcag
acctccatacgtgccccaggagatctggactttcacc---tggatcatgcgaccgtacctac
t-atgg-t-cgtgccgcaggagatcaggactttca-gt--g-aatcatctgg-cgc--c-aa
t--tcgt-ac-tgccccaggagatctggactttcaaa---ca-atcatgcgcc-g-tc-tat
aattccgtacgtgccgcaggagatcaggactttcag-t--a-tatcatctgtc-ggc--tag
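
The product rule and the combination across columns can be sketched with made-up numbers. This is a simplified illustration: it multiplies per-column probabilities by summing their logarithms (which avoids underflow on long alignments), and it skips the summation over unknown ancestral states that a full maximum likelihood method performs. The names `column_probability`, `tree_log_likelihood`, and the candidate values are hypothetical.

```python
import math

def column_probability(branch_probs):
    """Product rule: multiply the probabilities on every branch of the tree."""
    return math.prod(branch_probs)

def tree_log_likelihood(columns):
    """Combine columns: sum of log column probabilities (log of their product)."""
    return sum(math.log(column_probability(bp)) for bp in columns)

# Hypothetical per-branch probabilities: two candidate trees, three columns each.
candidates = {
    "tree_1": [[0.9, 0.9, 0.8], [0.9, 0.7, 0.9], [0.8, 0.9, 0.9]],
    "tree_2": [[0.9, 0.5, 0.8], [0.6, 0.7, 0.9], [0.8, 0.9, 0.4]],
}

# The tree with the highest combined (log) probability is preferred.
best = max(candidates, key=lambda name: tree_log_likelihood(candidates[name]))
```

Because the logarithm is monotonic, ranking trees by summed log probability gives the same winner as ranking by the raw product.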

Adjusting distances
• Might not a mutation occur and then revert?
• A simple count of differences would not catch this
• The higher the degree of mutation, the greater the probability of a reversion
• True distances would be slightly greater than the counted ones
• Must account for reversions

Jukes and Cantor
• Increased distance based upon the probability of reversion
• K: substitutions per site
• p: fraction of nucleotides that differ between two sequences
• Example: p = 0.08 → K ≈ 0.0846
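
The slide's formula did not survive in this transcript. The standard Jukes-Cantor correction is K = -(3/4) ln(1 - (4/3) p), which reproduces the slide's example of 0.08 becoming roughly 0.0846. A minimal sketch, with the function name `jukes_cantor` chosen here for illustration:

```python
import math

def jukes_cantor(p):
    """Jukes-Cantor corrected distance K (substitutions per site).

    p: observed fraction of sites that differ between two sequences.
    The correction is undefined for p >= 0.75 (saturated sequences).
    """
    if not 0 <= p < 0.75:
        raise ValueError("p must satisfy 0 <= p < 0.75")
    return -0.75 * math.log(1 - 4 * p / 3)

k = jukes_cantor(0.08)   # slightly larger than the raw count of 0.08
```

The corrected value grows faster than p as sequences diverge, reflecting the increasing chance of multiple or reverted substitutions at the same site.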

Kimura two parameter
• Jukes and Cantor assumed a single mutation rate
• Transversions are less likely than transitions
• A and G are purines; T and C are pyrimidines
• Mutations that stay within a family (transitions) are more likely
• Crossing families is called a transversion
• P: fraction of sites that differ by a transition
• Q: fraction of sites that differ by a transversion
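
The formula itself is missing from this transcript. The standard Kimura two-parameter distance is K = (1/2) ln(1 / (1 - 2P - Q)) + (1/4) ln(1 / (1 - 2Q)). A minimal sketch follows; the function name `kimura_2p` and the example fractions are assumptions for illustration.

```python
import math

def kimura_2p(P, Q):
    """Kimura two-parameter distance (substitutions per site).

    P: fraction of sites that differ by a transition
    Q: fraction of sites that differ by a transversion
    """
    if 1 - 2 * P - Q <= 0 or 1 - 2 * Q <= 0:
        raise ValueError("P and Q are too large for the correction to apply")
    return -0.5 * math.log(1 - 2 * P - Q) - 0.25 * math.log(1 - 2 * Q)

# Hypothetical observation: 6% transitions and 2% transversions.
k = kimura_2p(0.06, 0.02)
```

When transitions make up one third of the differences and transversions two thirds (P = p/3, Q = 2p/3), the expression reduces to the Jukes-Cantor correction, which is a useful sanity check.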

When to use which?
• Page 247
• Choose a set of related sequences
• Obtain an MSA
• Is there strong sequence similarity? Yes: max parsimony
• Otherwise, is there medium (clearly recognizable) similarity? Yes: distance
• Otherwise: max likelihood

Summary
• Branch length is related to, but not equal to, time
• MSAs are central because mutation rates differ
• 3 approaches
• Distance
• Parsimony
• Maximum likelihood
[Figure: method-selection flowchart, repeated from the previous slide]

For non-bioinformaticians
• Learned a clustering technique
• Hierarchical
• Great for exploratory data analysis
• Learned some tree growth properties
• Exposed to some statistical analysis
• Maximum likelihood
• Relationship of rates to probabilities

DNA and proteins
• If working with RNA, must take covariance into account

Gene duplication events
• Paralogs can be exploited
• Evolve in a coordinated way
• Find an organism that is similar but split off before the gene duplication event took place

Can convert score to distance
• D = -log(S)
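
A small sketch of this conversion. It assumes S is a similarity score already normalized to the range (0, 1] and that the logarithm is natural; neither detail is stated on the slide, and the function name is hypothetical.

```python
import math

def score_to_distance(s):
    """Convert a normalized similarity score S in (0, 1] to a distance D = -ln(S)."""
    if not 0 < s <= 1:
        raise ValueError("s must be in the range (0, 1]")
    return -math.log(s)

d_identical = score_to_distance(1.0)   # identical sequences: distance 0
d_half = score_to_distance(0.5)        # lower similarity: larger distance
```

Under this assumption, identical sequences map to a distance of zero and the distance grows without bound as the similarity approaches zero.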

Go back to distance
• Did a simple count of differences divided by the number of sites
• Can use alignment score (if normalized for length)
• What if a site mutates and then mutates back?
• Jukes and Cantor
• Kimura two parameter
