
Presentation Transcript


  1. http://creativecommons.org/licenses/by-sa/2.0/

  2. CIS786, Lecture 4 Usman Roshan

3. Iterated local search: escape local optima by perturbation. (Diagram: local search reaches a local optimum; a perturbation moves away from it; local search then restarts from the output of the perturbation.)
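The loop sketched on this slide can be written generically. Below is a minimal Python sketch of iterated local search, assuming hypothetical local_search, perturb, and score callables that stand in for the actual MP heuristic, the perturbation move, and the parsimony score; it is illustrative only, not the lecture's implementation.

```python
def iterated_local_search(start_tree, local_search, perturb, score, iterations=100):
    """Minimal ILS loop: perturb the current local optimum, then re-run local search."""
    best = local_search(start_tree)           # reach an initial local optimum
    for _ in range(iterations):
        perturbed = perturb(best)             # escape the local optimum by perturbation
        candidate = local_search(perturbed)   # local search on the output of the perturbation
        if score(candidate) < score(best):    # keep the better tree (lower MP score is better)
            best = candidate
    return best
```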

  4. ILS for MP • We saw that ratchet improves upon iterative improvement • We saw that TNT’s sophisticated and faster implementation outperforms ratchet and PAUP* implementations • But can we do even better?

  5. Disk Covering Methods (DCMs) • DCMs are divide-and-conquer booster methods. They divide the dataset into small subproblems, compute subtrees using a given base method, merge the subtrees, and refine the supertree. • DCMs to date • DCM1: for improving statistical performance of distance-based methods. • DCM2: for improving heuristic search for MP and ML • DCM3: latest, fastest, and best (in accuracy and optimality) DCM

6. DCM2 technique for speeding up MP searches: 1. Decompose sequences into overlapping subproblems. 2. Compute subtrees using a base method. 3. Merge subtrees using the Strict Consensus Merge (SCM). 4. Refine to make the tree binary.
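As a rough illustration of the four steps above, a schematic Python sketch follows; decompose, base_method, strict_consensus_merge, and refine_to_binary are hypothetical stand-ins for the actual routines (e.g. a PAUP*/TNT search as the base method and SCM as the merger).

```python
def dcm2_boost(sequences, decompose, base_method, strict_consensus_merge, refine_to_binary):
    """Schematic DCM2 booster pipeline (all helpers are hypothetical stand-ins)."""
    subproblems = decompose(sequences)                    # 1. overlapping subproblems
    subtrees = [base_method(sub) for sub in subproblems]  # 2. one subtree per subproblem
    supertree = strict_consensus_merge(subtrees)          # 3. merge the subtrees with SCM
    return refine_to_binary(supertree, sequences)         # 4. refine the supertree to a binary tree
```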

7. DCM1 and DCM2 decompositions • DCM1 decomposition: NJ gets better accuracy on small-diameter subproblems • DCM2 decomposition: getting a smaller number of smaller subproblems speeds up the solution

  8. Supertree Methods

9. Strict Consensus Merger (example figure)

10. Tree Refinement (example figure)

11. The big question: why DCMs? Can DCMs improve upon existing methods such as neighbor joining, PAUP*, or TNT?

  12. Improving sequence length requirements of NJ • Can DCM1 improve upon NJ? • We examine this question under simulation

  13. DCM1(NJ)

  14. DCM1(NJ)

  15. Computing tree for one threshold

  16. Recall simulation studies

  17. Experimental results • True tree selection (phase II of DCM1) • Uniformly random trees • Birth-death random trees • Sequence length requirements on birth-death random trees

  18. Comparing tree selection techniques

  19. Error rates on uniform random trees

20. Error as a function of evolutionary rate (curves: NJ, DCM1-NJ+MP)

21. Sequence length requirements as a function of evolutionary rates (100 taxa, 90% accuracy)

22. Sequence length requirements as a function of evolutionary rates (400 taxa, 90% accuracy)

23. Sequence length requirements as a function of the number of taxa (curves: DCM1-NJ+MP, NJ)

24. Conclusion • DCM1-NJ+MP improves upon NJ on large and divergent settings • Why did it work? • Smaller datasets with low evolutionary diameters AND a reliable supertree method → accurate subtrees (on the subsets) → accurate supertree

25. Conclusion • DCM1-NJ+MP improves upon NJ on large and divergent settings • Why did it work? • Smaller datasets with low evolutionary diameters AND a reliable supertree method → accurate subtrees (on the subsets) → accurate supertree • But can we improve upon MP heuristics, particularly on large datasets?

  26. Previously we saw a comparison of DCM components for solving MP • DCM2 better than DCM1 decomposition • SCM better than MRP (in DCM context) • Constrained refinement better than Inferred Ancestral States technique • Higher thresholds take longer but can produce better trees

  27. Comparison of DCM components for solving MP • DCM2 better than DCM1 decomposition • SCM better than MRP (in DCM context) • Constrained refinement better than Inferred Ancestral States technique • Higher thresholds take longer but can produce better trees • Can DCM2 improve over TNT? (TNT is state of the art in solving MP---very fast routines for TBR)

  28. I. Comparison of DCMs (1,322 sequences) Base method is the TNT-ratchet.

  29. I. Comparison of DCMs (1,322 sequences) Base method is the TNT-ratchet.

  30. I. Comparison of DCMs (4583 sequences) Base method is the TNT-ratchet.

  31. I. Comparison of DCMs (4583 sequences) Base method is the TNT-ratchet. DCM2 takes almost 10 hours to produce a tree and is too slow to run on larger datasets.

32. DCM2 decomposition on 500 rbcL genes (Zilla dataset) • DCM2 decomposition • Blue: separator • Red: subset 1 • Pink: subset 2 • Visualization produced by the graphviz program, which draws the graph according to specified distances • Nodes: species in the dataset • Distances: p-distances (Hamming) between the DNA sequences • Separator is very large • Subsets are very large • Scattered subsets

  33. Doesn’t look anything like this

34. DCM3 decomposition
• DCM2 input: distance matrix d, threshold q, sequences S. DCM2 algorithm: 1a. Compute a threshold graph G using q and d. 1b. Perform a minimum-weight triangulation of G.
• DCM3 input: guide tree T on S, sequences S. DCM3 algorithm: 1. Compute a short quartet graph G using T; G is provably triangulated. 2. Find the separator X in G that minimizes max_i |A_i ∪ X|, where A_1, ..., A_k are the connected components of G − X. 3. Output the subproblems A_1 ∪ X, ..., A_k ∪ X.
• DCM3 advantage: it is faster and produces smaller subproblems than DCM2
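To make the separator criterion concrete, here is a small Python sketch of the max_i |A_i ∪ X| objective described above; components_of is a hypothetical helper that returns the connected components of G − X.

```python
def separator_cost(components, X):
    """max_i |A_i ∪ X| over the connected components A_i of G − X."""
    return max(len(set(A) | set(X)) for A in components)

def best_separator(candidate_separators, components_of):
    """Pick the separator X minimizing max_i |A_i ∪ X|; components_of(X) returns the components of G − X."""
    return min(candidate_separators, key=lambda X: separator_cost(components_of(X), X))
```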

  35. DCM3 decomposition - example

  36. Approx centroid-edge DCM3 decomposition – example • Locate the centroid edge e (O(n) time) • Set the closest leaves around e to be the separator (O(n) time) • Remaining leaves in subtrees around e form the subsets (unioned with the separator)
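A rough Python sketch of these three steps follows, assuming a hypothetical guide-tree object with edges(), leaves(), leaf_bipartition(), distance_to_edge(), and subtrees_around() methods; only the control flow is meant to mirror the slide, not any particular implementation.

```python
def approx_centroid_edge_decomposition(tree, separator_size):
    """Approximate centroid-edge DCM3 decomposition (hypothetical tree API)."""
    # 1. Centroid edge: the edge whose removal splits the leaf set most evenly.
    def imbalance(edge):
        left, right = tree.leaf_bipartition(edge)
        return abs(len(left) - len(right))
    e = min(tree.edges(), key=imbalance)

    # 2. Separator: the leaves closest to the centroid edge.
    nearest = sorted(tree.leaves(), key=lambda leaf: tree.distance_to_edge(leaf, e))
    separator = set(nearest[:separator_size])

    # 3. Subsets: leaves of each subtree around e, unioned with the separator.
    return separator, [set(leaves) | separator for leaves in tree.subtrees_around(e)]
```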

37. Time to compute DCM3 decompositions • An optimal DCM3 decomposition takes O(n³) time to compute, the same as for DCM2 • The centroid-edge DCM3 decomposition can be computed in O(n²) time • An approximate centroid-edge decomposition can be computed in O(n) time (from here on we assume we are using the approximate centroid-edge decomposition)

38. DCM2 decomposition on 500 rbcL genes (Zilla dataset) • DCM2 decomposition • Blue: separator • Red: subset 1 • Pink: subset 2 • Visualization produced by the graphviz program, which draws the graph according to specified distances • Nodes: species in the dataset • Distances: p-distances (Hamming) between the DNA sequences • Separator is very large • Subsets are very large • Scattered subsets

39. DCM3 decomposition on 500 rbcL genes (Zilla dataset) • DCM3 decomposition • Blue: separator (and subset) • Red: subset 2 • Pink: subset 3 • Yellow: subset 4 • Visualization produced by the graphviz program, which draws the graph according to specified distances • Nodes: species in the dataset • Distances: p-distances (Hamming) between the DNA sequences • Separator is small • Subsets are small • Compact subsets

40. Comparison of DCMs (plot: average MP score above optimal, shown as a percentage of the optimal, versus hours from 0 to 24; curves: TNT, DCM2, DCM3, Rec-DCM3) • Dataset: 4583 actinobacteria ssu rRNA from RDP. Base method is the TNT-ratchet. • DCM2 takes almost 10 hours to produce a tree and is too slow to run on larger datasets. • DCM3 followed by TNT-ratchet doesn't improve over TNT • Recursive-DCM3 followed by TNT-ratchet doesn't improve over TNT

41. Local optima are a problem (figure: cost landscape over the space of phylogenetic trees, showing a local optimum and the global optimum)

42. Local optima are a problem (plot: average MP score above optimal, shown as a percentage of the optimal, versus hours)

43. Iterated local search: escape local optima by perturbation. (Diagram: local search reaches a local optimum; a perturbation moves away from it; local search then restarts from the output of the perturbation.)

44. Iterated local search: Recursive-Iterative-DCM3. (Diagram: local search reaches a local optimum; Recursive-DCM3 serves as the perturbation; local search then restarts from the output of Recursive-DCM3.)
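Viewed this way, Rec-I-DCM3 is the iterated local search loop of slide 43 with Recursive-DCM3 playing the role of the perturbation. The Python sketch below is a hedged illustration under that reading; recursive_dcm3, base_method (e.g. TNT-ratchet), and score are hypothetical stand-ins, not the actual software interfaces.

```python
def rec_i_dcm3(sequences, start_tree, recursive_dcm3, base_method, score, iterations=10):
    """Rec-I-DCM3 as iterated local search (all helpers are hypothetical stand-ins)."""
    best = start_tree
    for _ in range(iterations):
        # "Perturbation": decompose around the current tree, solve subproblems, merge, refine.
        merged = recursive_dcm3(sequences, guide_tree=best, base_method=base_method)
        # "Local search": resume the global heuristic (e.g. TNT-ratchet) from the merged tree.
        candidate = base_method(sequences, start_tree=merged)
        if score(candidate) <= score(best):   # keep the tree with the better MP score
            best = candidate
    return best
```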

45. Comparison of DCMs for solving MP (plot: average MP score above optimal, shown as a percentage of the optimal, versus hours from 0 to 24; curves: TNT, DCM2, DCM3, Rec-DCM3, Rec-I-DCM3). Rec-I-DCM3(TNT-ratchet) improves upon unboosted TNT-ratchet

  46. I. Comparison of DCMs (13,921 sequences) Base method is the TNT-ratchet.

  47. I. Comparison of DCMs (13,921 sequences) Base method is the TNT-ratchet.

  48. I. Comparison of DCMs (13,921 sequences) Base method is the TNT-ratchet.

  49. I. Comparison of DCMs (13,921 sequences) Base method is the TNT-ratchet.

  50. I. Comparison of DCMs (13,921 sequences) Base method is the TNT-ratchet. Note the improvement in DCMs as we move from the default to recursion to iteration to recursion+iteration.
