
Models of Evolution



Presentation Transcript


  1. Models of Evolution Majid Kazemian

  2. Introduction • Probabilistic model of indels • Model of an arbitrary distribution of indel lengths (TKF model) • MCALIGN • We have already seen the above models in the course • Models of nucleotide substitution • Jukes Cantor model • Kimura model

  3. Phylogeny Tree • Given a set of sequences $x^{\bullet} = (x_1, x_2, \dots, x_n)$, the goal is to infer the phylogenetic tree • Suppose that • n = number of species • T = topology of the tree • $t^{\bullet}$ = the edge lengths of the tree • We want to compute $\Pr(x^{\bullet} \mid T, t^{\bullet})$

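  4. A simple example • Suppose we have the following phylogeny tree [Figure: a tree with leaves x1, x2, x3, internal nodes x4 and x5, and edge lengths t1 to t5]. Then the joint probability of the sequences at all nodes is the probability of the root sequence times one factor per edge: $\Pr(x_1, \dots, x_5 \mid T, t^{\bullet}) = \Pr(x_{\text{root}}) \prod_{i \neq \text{root}} \Pr(x_i \mid x_{a(i)}, t_i)$, where $a(i)$ is the parent of node $i$ and $t_i$ is the length of the edge above it • So to calculate $\Pr(x^{\bullet} \mid T, t^{\bullet})$ we need $\Pr(x \mid y, t)$, the probability that $y$ evolves to $x$ in time $t$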

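  5. Substitution • Assume that • Indels do not occur • Each position of the sequence evolves independently • Then, for an ancestor $y = y_1 y_2 \dots y_L$ and a descendant $x = x_1 x_2 \dots x_L$, $\Pr(x \mid y, t) = \prod_{j=1}^{L} \Pr(x_j \mid y_j, t)$ • where $\Pr(x_j \mid y_j, t)$ is the probability of a change from $y_j$ to $x_j$ in time $t$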
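The site-independence assumption turns the sequence probability into a sum of per-site log-probabilities. The sketch below assumes, for illustration only, that the per-site model is the Jukes-Cantor probability covered later in these slides (rate `alpha`); the function names are hypothetical.

```python
import math

def jc_prob(x: str, y: str, t: float, alpha: float = 1.0) -> float:
    """Hypothetical per-site model: Jukes-Cantor Pr(x | y, t) with rate alpha."""
    decay = math.exp(-4.0 * alpha * t)
    return 0.25 * (1 + 3 * decay) if x == y else 0.25 * (1 - decay)

def seq_log_prob(desc: str, anc: str, t: float, site_prob=jc_prob) -> float:
    """log Pr(x | y, t) = sum_j log Pr(x_j | y_j, t): no indels, independent sites."""
    if len(desc) != len(anc):
        raise ValueError("no indels: sequences must have equal length")
    return sum(math.log(site_prob(xj, yj, t)) for xj, yj in zip(desc, anc))
```

Working in log space avoids underflow for long sequences; `site_prob` is a parameter so any other per-site model can be plugged in.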

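  6. Substitution Matrix • $S(t)$ is the $4 \times 4$ matrix of substitution probabilities, with entry $S(t)_{ab} = \Pr(b \mid a, t)$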

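  7. The assumption of the model [Figure: on a time axis, residue y_j evolves to a_j over an interval t1, and a_j evolves to x_j over a further interval t2, for a total time t1+t2] • Multiplicative property • S(t1)S(t2) = S(t1+t2) • This requirement holds if the transition probabilities are stationary and Markovian • Intuitively, the probability of going from $a_j$ at time $t_1$ to $x_j$ at time $t_1+t_2$ depends only on $a_j$ and the elapsed time $(t_1+t_2) - t_1 = t_2$, so summing over the intermediate residue gives $\Pr(x_j \mid y_j, t_1+t_2) = \sum_{a} \Pr(a \mid y_j, t_1)\,\Pr(x_j \mid a, t_2)$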
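A numerical check of the multiplicative property, sketched under the assumption that the substitution matrix is the matrix exponential $S(t) = e^{Rt}$ of a rate matrix R (here the Jukes-Cantor R with rate `alpha`). The exponential is computed with a plain truncated Taylor series, which is adequate only for small, well-scaled matrices like these.

```python
def mat_mul(a, b):
    """Product of two square matrices stored as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def mat_exp(m, terms=40):
    """exp(M) by a truncated Taylor series: sum over k of M^k / k!."""
    n = len(m)
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]          # current term M^k / k!, starting at I
    for k in range(1, terms):
        term = [[v / k for v in row] for row in mat_mul(term, m)]
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return result

def jc_rate_matrix(alpha):
    """Jukes-Cantor rates: alpha off the diagonal, -3*alpha on it."""
    return [[-3.0 * alpha if i == j else alpha for j in range(4)] for i in range(4)]

def S(t, alpha=1.0):
    """Assumed form S(t) = exp(R t)."""
    return mat_exp([[r * t for r in row] for row in jc_rate_matrix(alpha)])

t1, t2 = 0.2, 0.3
lhs = mat_mul(S(t1), S(t2))
rhs = S(t1 + t2)
max_err = max(abs(lhs[i][j] - rhs[i][j]) for i in range(4) for j in range(4))
```

`max_err` is at the level of floating-point rounding, and every row of `S(t)` sums to 1, as a stochastic matrix must.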

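  8. Jukes Cantor Model • Every substitution occurs at the same rate α, so the rate matrix is $R = \begin{pmatrix} -3\alpha & \alpha & \alpha & \alpha \\ \alpha & -3\alpha & \alpha & \alpha \\ \alpha & \alpha & -3\alpha & \alpha \\ \alpha & \alpha & \alpha & -3\alpha \end{pmatrix}$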

  9. Jukes Cantor Model (cont.) • In a small amount of time ε, the probability of a substitution is linear in ε. This means that the probability of two or more substitutions within ε (for example, going from Ai to Aj and back to Ai) is negligible. • S(ε) ≈ I + Rε
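One way to see what S(ε) ≈ I + Rε buys us: splitting a time t into n small steps ε = t/n and multiplying the step matrices approximates S(t) better as n grows. This sketch compares the diagonal entry against the standard Jukes-Cantor closed form $r_t = \tfrac14(1 + 3e^{-4\alpha t})$; `alpha = 1` and `t = 0.5` are arbitrary illustrative values.

```python
import math

def mat_mul(a, b):
    """Product of two square matrices stored as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def jc_rate_matrix(alpha):
    """Jukes-Cantor rates: alpha off the diagonal, -3*alpha on it."""
    return [[-3.0 * alpha if i == j else alpha for j in range(4)] for i in range(4)]

def compound_small_steps(alpha, t, n):
    """Approximate S(t) by n copies of S(eps) ~ I + R*eps, with eps = t / n."""
    eps = t / n
    R = jc_rate_matrix(alpha)
    step = [[(1.0 if i == j else 0.0) + R[i][j] * eps for j in range(4)] for i in range(4)]
    result = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
    for _ in range(n):
        result = mat_mul(result, step)
    return result

alpha, t = 1.0, 0.5
exact_r = 0.25 * (1 + 3 * math.exp(-4 * alpha * t))   # JC diagonal entry r_t
errors = [abs(compound_small_steps(alpha, t, n)[0][0] - exact_r) for n in (10, 100, 1000)]
```

The error shrinks roughly in proportion to 1/n, which is what the first-order approximation predicts.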

  10. Jukes Cantor Model (cont.) • Is S(t) similar to S(ε)?
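A sketch of how S(t) relates to S(ε), assuming the multiplicative property and the small-ε approximation S(ε) ≈ I + Rε from the previous slides:

```latex
S(t+\varepsilon) = S(t)\,S(\varepsilon) \approx S(t)\,(I + R\varepsilon)
\;\Longrightarrow\;
\frac{S(t+\varepsilon) - S(t)}{\varepsilon} \approx S(t)\,R
\;\Longrightarrow\;
S'(t) = S(t)\,R,\quad S(0) = I
\;\Longrightarrow\;
S(t) = e^{Rt}
```

So S(t) is not equal to S(ε) in form, but it is generated by the same rate matrix R; for JC, the symmetry of R carries over to S(t).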

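  11. Jukes Cantor Model (cont.) • We know that S(t) has the following form (why?): $S(t) = \begin{pmatrix} r_t & s_t & s_t & s_t \\ s_t & r_t & s_t & s_t \\ s_t & s_t & r_t & s_t \\ s_t & s_t & s_t & r_t \end{pmatrix}$, with $r_t + 3 s_t = 1$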

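  12. Jukes Cantor Model (cont.) • Here $r_t$ is the probability of no change and $s_t$ the probability of each specific substitution. Solving the differential equations $S'(t) = S(t)\,R$, that is, $r_t' = -3\alpha r_t + 3\alpha s_t$ and $s_t' = \alpha r_t - \alpha s_t$ with $r_0 = 1$, $s_0 = 0$, gives $r_t = \tfrac14\left(1 + 3e^{-4\alpha t}\right)$ and $s_t = \tfrac14\left(1 - e^{-4\alpha t}\right)$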
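A quick numerical sanity check that the Jukes-Cantor closed form satisfies $S'(t) = S(t)\,R$, starts at the identity and tends to the uniform distribution. The derivative is approximated by a central difference; the step `h` and the test times are arbitrary choices for this sketch.

```python
import math

def jc_S(t, alpha=1.0):
    """Closed-form Jukes-Cantor S(t): r_t on the diagonal, s_t elsewhere."""
    decay = math.exp(-4.0 * alpha * t)
    r = 0.25 * (1 + 3 * decay)
    s = 0.25 * (1 - decay)
    return [[r if i == j else s for j in range(4)] for i in range(4)]

def jc_R(alpha=1.0):
    """Jukes-Cantor rate matrix."""
    return [[-3.0 * alpha if i == j else alpha for j in range(4)] for i in range(4)]

def mat_mul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def derivative_residual(t, alpha=1.0, h=1e-6):
    """Largest entry of |S'(t) - S(t) R|, with S'(t) from a central difference."""
    up, down = jc_S(t + h, alpha), jc_S(t - h, alpha)
    deriv = [[(up[i][j] - down[i][j]) / (2 * h) for j in range(4)] for i in range(4)]
    target = mat_mul(jc_S(t, alpha), jc_R(alpha))
    return max(abs(deriv[i][j] - target[i][j]) for i in range(4) for j in range(4))
```

A residual near zero for several values of t and α supports the closed form; it is a check, not a proof.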

  13. More advanced models • The J-C model made highly “symmetric” assumptions in its formulation of the rate matrix R • In reality, for example, “transitions” are more common than “transversions” • What are these? Purine = A or G. Pyrimidine = C or T. A transition is a substitution within the same category; a transversion is a substitution across categories • Purines are similarly sized, and pyrimidines are similarly sized, so a nucleotide is more likely to be replaced by one of similar size • The “Kimura” model captures this transition/transversion bias
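To make the purine/pyrimidine distinction concrete, this sketch classifies each aligned pair of two gap-free DNA sequences as identical, a transition or a transversion. The function names are hypothetical, and only the letters A, C, G, T are accepted.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

def classify(a: str, b: str) -> str:
    """Label an aligned nucleotide pair as identical, a transition or a transversion."""
    if a not in PURINES | PYRIMIDINES or b not in PURINES | PYRIMIDINES:
        raise ValueError(f"unexpected nucleotide pair {a!r}/{b!r}")
    if a == b:
        return "identical"
    # Transition: both purines or both pyrimidines; transversion: one of each.
    return "transition" if (a in PURINES) == (b in PURINES) else "transversion"

def count_substitutions(s1: str, s2: str) -> dict:
    """Tally pair types over two gap-free aligned sequences of equal length."""
    if len(s1) != len(s2):
        raise ValueError("sequences must be aligned without gaps")
    counts = {"identical": 0, "transition": 0, "transversion": 0}
    for a, b in zip(s1, s2):
        counts[classify(a, b)] += 1
    return counts
```

For example, `count_substitutions("AGCTA", "GACTC")` finds two transitions (A/G, G/A), one transversion (A/C) and two identical pairs.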

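  14. Kimura Model • The rate matrix R is given by (nucleotides ordered A, G, C, T; α is the transition rate and β the rate of each transversion): $R = \begin{pmatrix} -\alpha-2\beta & \alpha & \beta & \beta \\ \alpha & -\alpha-2\beta & \beta & \beta \\ \beta & \beta & -\alpha-2\beta & \alpha \\ \beta & \beta & \alpha & -\alpha-2\beta \end{pmatrix}$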

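  15. Kimura Model (cont.) • We know that S(t) should look like this (why?): $S(t) = \begin{pmatrix} r_t & s_t & u_t & u_t \\ s_t & r_t & u_t & u_t \\ u_t & u_t & r_t & s_t \\ u_t & u_t & s_t & r_t \end{pmatrix}$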

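  16. Kimura Model (cont.) • Here $r_t$, $s_t$ and $u_t$ are the probabilities of no change, of a given transition and of a given transversion. Again, by solving the differential equations (as we did for the JC model) we have $r_t = \tfrac14\left(1 + e^{-4\beta t} + 2e^{-2(\alpha+\beta)t}\right)$, $s_t = \tfrac14\left(1 + e^{-4\beta t} - 2e^{-2(\alpha+\beta)t}\right)$, $u_t = \tfrac14\left(1 - e^{-4\beta t}\right)$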
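The Kimura solution can be checked the same way as the JC one. This sketch builds S(t) from $r_t$, $s_t$, $u_t$ with the assumed nucleotide order A, G, C, T; the test values of α and β are arbitrary.

```python
import math

ORDER = "AGCT"  # assumed ordering: purines A, G first, then pyrimidines C, T

def kimura_S(t: float, alpha: float, beta: float) -> list:
    """Kimura S(t): r_t for no change, s_t for a transition, u_t for each transversion."""
    e4b = math.exp(-4.0 * beta * t)
    e2ab = math.exp(-2.0 * (alpha + beta) * t)
    r = 0.25 * (1 + e4b + 2 * e2ab)
    s = 0.25 * (1 + e4b - 2 * e2ab)
    u = 0.25 * (1 - e4b)

    def entry(x, y):
        if x == y:
            return r
        # same class (both purines or both pyrimidines) means a transition
        return s if (x in "AG") == (y in "AG") else u

    return [[entry(x, y) for y in ORDER] for x in ORDER]
```

Rows sum to 1, S(0) is the identity, and for large t every entry tends to 1/4: the stationary distribution is still uniform, which motivates the extensions on the next slide.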

  17. Even more advanced models (cont.) • To get to greater levels of realism: • The Kimura model still has a uniform stationary distribution, which is not true of real data • One extension: the purine-to-pyrimidine substitution probability differs from the pyrimidine-to-purine substitution probability • This leads to a non-uniform stationary distribution • The “HKY” model captures this bias

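  18. Inferring Phylogeny for two sequences [Figure: a root residue a with two descendant leaves x1 and x2 on edges of length t1 and t2] • Let's go back to the original problem: we wanted to compute $\Pr(x^{\bullet} \mid T, t^{\bullet})$ • In the case of two sequences without gaps we have $\Pr(x_1, x_2 \mid T, t_1, t_2) = \prod_{u} \sum_{a} q_a \Pr(x_{1,u} \mid a, t_1)\,\Pr(x_{2,u} \mid a, t_2)$, where $u$ runs over the sites and $q_a$ is the probability of residue $a$ at the root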
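A direct transcription of the two-sequence formula into code, assuming the Jukes-Cantor model and a uniform root distribution $q_a = 1/4$ (both assumptions of this sketch); it returns the log-likelihood to avoid underflow.

```python
import math

NUCS = "ACGT"

def jc_prob(child, parent, t, alpha=1.0):
    """Jukes-Cantor Pr(child | parent, t) at one site."""
    decay = math.exp(-4.0 * alpha * t)
    return 0.25 * (1 + 3 * decay) if child == parent else 0.25 * (1 - decay)

def pair_log_likelihood(x1, x2, t1, t2, q=None, alpha=1.0):
    """log of prod_u sum_a q_a Pr(x1_u | a, t1) Pr(x2_u | a, t2)."""
    if len(x1) != len(x2):
        raise ValueError("gap-free alignment required")
    q = q or {a: 0.25 for a in NUCS}   # assumed uniform root distribution
    total = 0.0
    for b, c in zip(x1, x2):
        site = sum(q[a] * jc_prob(b, a, t1, alpha) * jc_prob(c, a, t2, alpha) for a in NUCS)
        total += math.log(site)
    return total
```

Under these assumptions the result depends on t1 and t2 only through t1 + t2, which is why the example that follows can estimate only their sum.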

  19. A simple example • Suppose that • x1=C C G G C C G C G C G • x2=C G G G C C G G C C G

  20. A simple example (cont.) • Assume JC model • Our goal is to find the tree topology, t1 and t2

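  21. A simple example (cont.) • Suppose that n1 is the number of CC and GG pairs and n2 is #CG + #GC pairs • Because the JC matrix is symmetric and multiplicative and $q_a = \tfrac14$, each site contributes $\sum_a \tfrac14 S(t_1)_{a\,x_{1,u}}\,S(t_2)_{a\,x_{2,u}} = \tfrac14 S(t_1+t_2)_{x_{1,u}\,x_{2,u}}$, so $\Pr(x_1, x_2 \mid T, t_1, t_2) = \left(\tfrac14 r_{t_1+t_2}\right)^{n_1}\left(\tfrac14 s_{t_1+t_2}\right)^{n_2}$, where $r$ and $s$ are the diagonal and off-diagonal JC entries • Only the product $\alpha(t_1+t_2)$ can be identified from the data • If α is known, we can find t1 + t2 by simple maximum likelihood • α is estimated from two closely related species for which we assume t1 + t2 = 1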
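The maximum-likelihood step can be done in closed form. Writing $\tau = \alpha(t_1+t_2)$ and $p = e^{-4\tau}$, the log-likelihood is $n_1 \log(1+3p) + n_2 \log(1-p)$ plus a constant; setting its derivative in p to zero gives $p = 1 - \tfrac43 \cdot n_2/(n_1+n_2)$. This sketch generalizes n2 to all mismatched pairs (for the slide's sequences these are exactly the CG and GC pairs); the function names are hypothetical.

```python
import math

def count_pairs(x1, x2):
    """n1 = identical aligned pairs, n2 = mismatched pairs."""
    if len(x1) != len(x2):
        raise ValueError("gap-free alignment required")
    n1 = sum(a == b for a, b in zip(x1, x2))
    return n1, len(x1) - n1

def log_likelihood(tau, n1, n2):
    """log[(r/4)^n1 (s/4)^n2] for JC, with tau = alpha * (t1 + t2) > 0."""
    p = math.exp(-4.0 * tau)
    return n1 * math.log((1 + 3 * p) / 16) + n2 * math.log(-math.expm1(-4.0 * tau) / 16)

def ml_tau(n1, n2):
    """Closed-form maximiser: exp(-4 tau) = 1 - (4/3) * n2 / (n1 + n2)."""
    p = 1.0 - (4.0 / 3.0) * n2 / (n1 + n2)
    return math.inf if p <= 0 else -0.25 * math.log(p)
```

For x1 = CCGGCCGCGCG and x2 = CGGGCCGGCCG this gives n1 = 8, n2 = 3 and $\hat\tau = -\tfrac14 \ln(7/11) \approx 0.113$; dividing by a known α gives the estimate of t1 + t2. If three quarters or more of the sites differ, the likelihood has no finite maximum.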

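  22. Inferring Phylogeny for n sequences • Numbering the leaves $1, \dots, n$ and the root $2n-1$: $\Pr(x^{\bullet} \mid T, t^{\bullet}) = \sum_{x_{n+1}, \dots, x_{2n-1}} q_{x_{2n-1}} \prod_{i=1}^{2n-2} \Pr(x_i \mid x_{a(i)}, t_i)$, where the sum runs over all possible internal node assignments and $a(i)$ is the parent of node $i$ • How do we infer the topology and $t^{\bullet}$ for n sequences? • How can we compute this probability efficiently?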
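To see why efficiency matters, here is the formula evaluated literally for a single site: enumerate every residue assignment to the internal nodes and add up the products. The 3-leaf tree, its edge lengths and the JC model are hypothetical choices for this sketch; the cost grows as $4^{n-1}$ for n leaves.

```python
import itertools
import math

NUCS = "ACGT"

def jc_prob(child, parent, t, alpha=1.0):
    """Jukes-Cantor Pr(child | parent, t) for a single site."""
    decay = math.exp(-4.0 * alpha * t)
    return 0.25 * (1 + 3 * decay) if child == parent else 0.25 * (1 - decay)

# Hypothetical tree: leaves 1, 2, 3; node 4 is the parent of 1 and 2;
# node 5 (the root) is the parent of 3 and 4. EDGE[i] is the edge above node i.
PARENT = {1: 4, 2: 4, 3: 5, 4: 5}
EDGE = {1: 0.1, 2: 0.2, 3: 0.3, 4: 0.05}
INTERNAL = (4, 5)
ROOT = 5

def site_likelihood_bruteforce(leaves, q=None):
    """Sum over internal residues of q[root] * prod_i Pr(x_i | x_a(i), t_i)."""
    q = q or {a: 0.25 for a in NUCS}
    total = 0.0
    for internal in itertools.product(NUCS, repeat=len(INTERNAL)):
        x = dict(leaves)                       # leaf residues, e.g. {1: "A", 2: "C", 3: "A"}
        x.update(zip(INTERNAL, internal))
        p = q[x[ROOT]]
        for node, parent in PARENT.items():
            p *= jc_prob(x[node], x[parent], EDGE[node])
        total += p
    return total
```

Summing the result over all 64 leaf patterns gives 1, a useful check that the factorization is a proper distribution.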

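  23. Dynamic Programming • The recursion: $P(L_k \mid a) = \left(\sum_b P(b \mid a, t_i)\,P(L_i \mid b)\right)\left(\sum_c P(c \mid a, t_j)\,P(L_j \mid c)\right)$ is the probability of all leaves below node k given that the residue at k is a, where i and j are the children of k • At a leaf, $P(L_k \mid a)$ is 1 if a is the observed residue and 0 otherwise; finally $\Pr(x^{\bullet} \mid T, t^{\bullet}) = \sum_a q_a\,P(L_{2n-1} \mid a)$ • How do we estimate $(T, t^{\bullet})$? ML estimation?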
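The recursion above (Felsenstein's pruning algorithm) can be sketched for one site as follows. The tree layout, edge lengths and JC model are hypothetical illustration choices; each node does O(4 × 4) work per child, instead of enumerating all internal assignments.

```python
import itertools
import math

NUCS = "ACGT"

def jc_prob(child, parent, t, alpha=1.0):
    """Jukes-Cantor Pr(child | parent, t) for a single site."""
    decay = math.exp(-4.0 * alpha * t)
    return 0.25 * (1 + 3 * decay) if child == parent else 0.25 * (1 - decay)

# Hypothetical tree: node 5 is the root with children 3 and 4; node 4 has children 1 and 2.
CHILDREN = {4: (1, 2), 5: (3, 4)}
ROOT = 5
EDGE = {1: 0.1, 2: 0.2, 3: 0.3, 4: 0.05}      # EDGE[i] is the edge above node i

def partial_likelihood(k, leaves, edge):
    """P(L_k | a) for every residue a at node k."""
    if k not in CHILDREN:                      # leaf: indicator of the observed residue
        return {a: 1.0 if a == leaves[k] else 0.0 for a in NUCS}
    result = {a: 1.0 for a in NUCS}
    for child in CHILDREN[k]:
        below = partial_likelihood(child, leaves, edge)   # computed once per child
        for a in NUCS:
            result[a] *= sum(jc_prob(b, a, edge[child]) * below[b] for b in NUCS)
    return result

def site_likelihood(leaves, edge, q=None):
    """Pr(leaf residues | T, t) = sum_a q_a P(L_root | a)."""
    q = q or {a: 0.25 for a in NUCS}           # assumed uniform root distribution
    at_root = partial_likelihood(ROOT, leaves, edge)
    return sum(q[a] * at_root[a] for a in NUCS)
```

With all edge lengths set to zero, identical leaves get probability $q_A = 1/4$ and mismatched leaves get 0, as expected.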

  24. How to infer topology? • The naïve way is to enumerate all topologies and, for each one, solve the ML estimation of the edge lengths with numerical approaches (such as Newton's method) • This is infeasible for many species, because the number of rooted binary topologies for n species, (2n−3)!!, grows super-exponentially • The idea is instead to infer the topology with a sampling technique
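A small helper showing how fast enumeration becomes hopeless: the number of rooted binary topologies with n labelled leaves is (2n−3)!!, and the number of unrooted ones is (2n−5)!!.

```python
def double_factorial(k: int) -> int:
    """k!! = k * (k - 2) * (k - 4) * ... down to 1 or 2; 1 for k <= 1."""
    result = 1
    while k > 1:
        result *= k
        k -= 2
    return result

def rooted_topologies(n: int) -> int:
    """Rooted binary trees with n labelled leaves: (2n - 3)!!."""
    if n < 2:
        raise ValueError("need at least 2 leaves")
    return double_factorial(2 * n - 3)

def unrooted_topologies(n: int) -> int:
    """Unrooted binary trees with n labelled leaves: (2n - 5)!!."""
    if n < 3:
        raise ValueError("need at least 3 leaves")
    return double_factorial(2 * n - 5)
```

Already at 10 species there are 34,459,425 rooted topologies, each needing its own numerical edge-length optimization.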

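  25. Metropolis Sampling • We have $P(T, t^{\bullet} \mid x^{\bullet}) = \dfrac{P(x^{\bullet} \mid T, t^{\bullet})\,P(T, t^{\bullet})}{P(x^{\bullet})}$ • We must define a proposal mechanism and an acceptance/rejection rule to move from one tree to another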

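  26. Proposal distribution • Propose a new tree $(T', t'^{\bullet})$ from a symmetric proposal distribution • Accept it with probability $\min\left(1, \dfrac{P(x^{\bullet} \mid T', t'^{\bullet})\,P(T', t'^{\bullet})}{P(x^{\bullet} \mid T, t^{\bullet})\,P(T, t^{\bullet})}\right)$; the unknown $P(x^{\bullet})$ cancels in the ratio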
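A toy illustration of the Metropolis rule. The slides sample over topologies and edge lengths; this sketch samples only the single parameter τ = α(t1 + t2) of the two-sequence JC example, with a hypothetical Exp(1) prior (a flat prior would give an improper posterior here, since the likelihood does not vanish as τ grows). The Gaussian random-walk proposal is symmetric, so the acceptance probability is just the posterior ratio.

```python
import math
import random

def log_posterior(tau, n1, n2, prior_rate=1.0):
    """Unnormalised log posterior: JC pair log-likelihood + hypothetical Exp(prior_rate) prior."""
    if tau <= 0:
        return -math.inf                       # outside the support: always rejected
    p = math.exp(-4.0 * tau)
    log_lik = n1 * math.log((1 + 3 * p) / 16) + n2 * math.log(-math.expm1(-4.0 * tau) / 16)
    return log_lik - prior_rate * tau

def metropolis(n1, n2, steps=20000, step_size=0.1, start=0.5, seed=0):
    """Random-walk Metropolis: accept a proposal with probability min(1, posterior ratio)."""
    rng = random.Random(seed)
    tau, current = start, log_posterior(start, n1, n2)
    samples, accepted = [], 0
    for _ in range(steps):
        proposal = tau + rng.gauss(0.0, step_size)   # symmetric proposal
        candidate = log_posterior(proposal, n1, n2)
        if candidate >= current or rng.random() < math.exp(candidate - current):
            tau, current = proposal, candidate
            accepted += 1
        samples.append(tau)                    # a rejected move repeats the current state
    return samples, accepted / steps
```

For the slide's counts (n1 = 8, n2 = 3), discarding an initial burn-in and averaging the remaining samples gives a posterior mean for τ; for trees, the same accept/reject step applies with proposals that also change the topology.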

  27. Two comments • We assumed that the columns of the genome evolve independently under the same model, but some regions evolve faster and some slower • We assumed that there are no gaps • We need to model gaps (e.g., with a pair HMM)

  28. Reference • Richard Durbin, Sean R. Eddy, Anders Krogh, Graeme Mitchison, Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids, sections 8.1 to 8.5
