Additive Distances Between DNA Sequences

sonja

Presentation Transcript


  1. Additive Distances Between DNA Sequences. MPI, June 2012

  2. Additive evolutionary distance: the number of substitutions which occurred during the sequence evolution. [Figure: substitutions at site 1, site 2, and site 3.] Some substitutions are hidden, due to overwriting. Therefore, the exact number of substitutions is usually larger than the number of observed changes.

  3. Edge weight = expected number of substitutions per site. [Figure: edge (u, v) with weight 0.321 substitutions per site.]

  4. Interleaf distances: sum of edge weights. [Figure: tree with leaves u and v; edge labels 0.3, 0.5, 0.42; d(u,v) = 1.12.] When the exact number of substitutions between any two sequences is known, NJ (and any other algorithm which reconstructs trees from the exact distances) returns the correct evolutionary tree.

  5. Estimating the number of substitutions from observed substitutions requires a substitution model: JC [Jukes and Cantor 1969]; Kimura 2-Parameter (K2P) [Kimura 1980]; HKY [Hasegawa, Kishino and Yano 1985]; TN [Tamura and Nei 1993]; GTR: generalised time-reversible [Tavaré 1986]; …and more…

  6. Distance estimation in the Jukes Cantor model

  7. Jukes Cantor model: all substitutions are equally likely. JC generic rate matrix: t is the expected number of substitutions per site. [Figure: edge (u, v) labelled t_uv, with the rate matrix R_uv; the matrix itself is not preserved.]

  8. Rate matrix R; by the theory of Markov processes, this yields the substitution matrix P. [The matrices R and P are not preserved.]
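For reference, the textbook Jukes Cantor forms of these two matrices can be written out as below. This is a sketch of the standard model, not a reproduction of the slide's lost equations. It assumes the rate matrix is normalised so that t is the expected number of substitutions per site, and that rows and columns are ordered A, C, G, T.

```latex
R_{uv} \;=\; t_{uv}
\begin{pmatrix}
-1 & \tfrac13 & \tfrac13 & \tfrac13 \\
\tfrac13 & -1 & \tfrac13 & \tfrac13 \\
\tfrac13 & \tfrac13 & -1 & \tfrac13 \\
\tfrac13 & \tfrac13 & \tfrac13 & -1
\end{pmatrix},
\qquad
P_{uv} \;=\; e^{R_{uv}},
\qquad
P_{uv}[i,j] \;=\;
\begin{cases}
\tfrac14 + \tfrac34\, e^{-4 t_{uv}/3}, & i = j,\\[4pt]
\tfrac14 - \tfrac14\, e^{-4 t_{uv}/3}, & i \neq j.
\end{cases}
```

Under these assumptions, the probability that a site differs between u and v is the sum of the three off-diagonal entries in a row, 3/4 (1 - e^{-4 t_uv / 3}).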

  9. JC distance estimation: first estimate the substitution matrix, i.e. an estimation of P_uv from the observed substitutions.

  10. Estimate t from the estimation of p(t) by “reverse engineering”: solve the formula for p(t).
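The "reverse engineering" step can be sketched in Python as follows. The sketch assumes that p(t) denotes the probability that a site differs between the two sequences, which under the standard JC model is p(t) = 3/4 (1 - e^{-4t/3}); the slide's own formula is not preserved, so this reading is an assumption. The function name `jc_distance` is hypothetical.

```python
import math


def jc_distance(seq_u: str, seq_v: str) -> float:
    """Estimate the expected number of substitutions per site between two
    aligned sequences, assuming the standard Jukes Cantor model."""
    if len(seq_u) != len(seq_v) or not seq_u:
        raise ValueError("sequences must be aligned and non-empty")

    # Estimate p(t): the observed fraction of sites that differ.
    mismatches = sum(a != b for a, b in zip(seq_u, seq_v))
    p_hat = mismatches / len(seq_u)

    # Solving p = 3/4 * (1 - exp(-4t/3)) for t has no finite solution
    # once p_hat reaches 3/4 (the sequences look saturated).
    if p_hat >= 0.75:
        return math.inf

    # Invert the formula for p(t): t = -3/4 * ln(1 - 4p/3).
    return -0.75 * math.log(1.0 - 4.0 * p_hat / 3.0)
```

Because the correction accounts for hidden substitutions, the estimate is never smaller than the observed fraction of differing sites.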

  11. Checking the effect of estimation errors in reconstructing quartets

  12. Quartet reconstruction = finding the correct split. Quartets are trees with four leaves. They have three possible (fully resolved) topologies, called splits. [Figure: the three splits AB|CD, AC|BD, and AD|BC.] Distance methods resolve splits by the 4-point method.

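  13. The 4-point method. [Figure: quartet on leaves A, B, C, D whose separating edge has weight w_sep.] The 4-point condition: [equation not preserved]. The 4-point condition for estimated distances: [equation not preserved].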
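A minimal sketch of the 4-point method in Python, using the standard formulation rather than the slide's lost equations. For additive distances on a quartet with split AB|CD, d(A,B) + d(C,D) is smaller than the other two pairings, which are equal to each other and exceed it by twice the separating edge weight. For estimated distances the usual rule is to pick the pairing with the smallest sum. The function name `four_point_split` and the example edge weights are hypothetical.

```python
def four_point_split(d):
    """Pick the split whose pair-sum is smallest.

    `d` maps frozenset({x, y}) to the distance between leaves x and y,
    for leaves named 'A', 'B', 'C', 'D'.
    """
    sums = {
        "AB|CD": d[frozenset("AB")] + d[frozenset("CD")],
        "AC|BD": d[frozenset("AC")] + d[frozenset("BD")],
        "AD|BC": d[frozenset("AD")] + d[frozenset("BC")],
    }
    return min(sums, key=sums.get)


# Hypothetical additive example: pendant edges 0.1, separating edge 0.05.
example = {
    frozenset("AB"): 0.2, frozenset("CD"): 0.2,    # same side of the split
    frozenset("AC"): 0.25, frozenset("BD"): 0.25,  # cross the separating edge
    frozenset("AD"): 0.25, frozenset("BC"): 0.25,
}
print(four_point_split(example))  # AB|CD
```

In the example the two larger sums exceed the smallest by 0.1, which is twice the assumed separating edge weight of 0.05.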

  14. Evaluate the accuracy of reconstructing quartets using evolutionary distances. [Figure: rooted model quartet on leaves A, B, C, D; t is “evolutionary time”.] The diameter of the quartet is 22t.

  15. Phase A: simulate evolution. [Figure: quartet on leaves A, B, C, D.]

  16. Phase B: reconstruct the split by the 4p condition. Compute the distances between the sequences and apply the 4p condition. Is the reconstruction correct? Repeat this process 10,000 times and count the number of failures.
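The two phases can be sketched in Python as below. This is an illustration under stated assumptions, not the experiment from the slides: it uses the JC model with the standard JC distance correction, a symmetric quartet with true split AB|CD, and hypothetical edge lengths `pendant` and `internal`, because the edge lengths of the model quartet are not preserved in the transcript. Ties and saturated (infinite) distances are counted as failures, which is also an assumption.

```python
import math
import random

BASES = "ACGT"


def evolve(seq, t, rng):
    """Evolve a sequence along an edge with t expected substitutions per site (JC)."""
    # Under JC, a site ends up different with probability 3/4 * (1 - e^{-4t/3}),
    # and each of the three other bases is then equally likely.
    p_change = 0.75 * (1.0 - math.exp(-4.0 * t / 3.0))
    return [
        rng.choice([b for b in BASES if b != base]) if rng.random() < p_change else base
        for base in seq
    ]


def jc_distance(x, y):
    """JC-corrected distance between two aligned sequences."""
    p_hat = sum(a != b for a, b in zip(x, y)) / len(x)
    if p_hat >= 0.75:
        return math.inf  # saturated: no finite estimate
    return -0.75 * math.log(1.0 - 4.0 * p_hat / 3.0)


def simulate_once(n_sites, pendant, internal, rng):
    """Return True if the 4-point rule recovers the true split AB|CD."""
    # Phase A: simulate evolution. JC is time-reversible with a uniform
    # stationary distribution, so starting at one end of the internal edge is valid.
    left = [rng.choice(BASES) for _ in range(n_sites)]
    right = evolve(left, internal, rng)
    leaves = {
        "A": evolve(left, pendant, rng),
        "B": evolve(left, pendant, rng),
        "C": evolve(right, pendant, rng),
        "D": evolve(right, pendant, rng),
    }

    # Phase B: compute distances and apply the 4-point condition.
    def d(x, y):
        return jc_distance(leaves[x], leaves[y])

    correct = d("A", "B") + d("C", "D")
    return correct < d("A", "C") + d("B", "D") and correct < d("A", "D") + d("B", "C")


def failure_fraction(n_trials, n_sites, pendant, internal, seed=0):
    """Fraction of simulated quartets for which the reconstruction failed."""
    rng = random.Random(seed)
    failures = sum(
        not simulate_once(n_sites, pendant, internal, rng) for _ in range(n_trials)
    )
    return failures / n_trials
```

Calling `failure_fraction(10_000, n_sites, pendant, internal)` for a range of edge lengths would mirror the repeat-and-count procedure described in the slide; the fixed seed only makes a run reproducible.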

  17. This test was applied to the model quartet with various diameters … For each diameter, mark the fraction (percentage) of the simulations in which the reconstruction failed (next slide).

  18. Performance of K2P distances in resolving quartets, small diameters: 0.01-0.2. [Figure: template quartet.]

  19. Performance for larger diameters: “site saturation”.

  20. Repeat this experiment on the Hasegawa tree. • Assume the JC model. • Reconstruct by the NJ algorithm (use any variant of NJ available in MATLAB).

  21. Hasegawa Tree
