
Species Trees & Constraint Programming

Species Trees & Constraint Programming. Ongoing work with Ian Gent, Barbara Smith, Wu Wei (Christine). The Tree of Life: a central goal of systematics is to construct the tree of life, a tree that represents the relationship between all living things, including constraint programmers.


Presentation Transcript


  1. Species Trees & Constraint Programming

  2. Ongoing work with Ian Gent, Barbara Smith, Wu Wei (Christine)

  3. The Tree of Life • A central goal of systematics • construct the tree of life • a tree that represents the relationship between all living things • including constraint programmers • The leaf nodes of the tree are species • The interior nodes are hypothesized species • extinct, where species diverged

  4. Properties of a Species Tree • We have a set of leaf nodes, each labelled with a species • the interior nodes have no labels • each interior node has 2 children and one parent • except the root (it has no parent) • if we have n leaf nodes we then have n - 1 interior nodes • it is a bifurcating tree

  5. Super Trees • We are given two trees, T1 and T2 • T1 has leaf set S1 and T2 has leaf set S2 • remember, leaves are species! • But S1 and S2 have a non-empty intersection • why? How can that happen? • We want to combine T1 and T2 • so, why is that a problem?

  6. Most Recent Common Ancestors (mrca) [Figure: tree with leaves a, b, c] We have 3 species, a, b, and c • mrca(a,b) ≠ mrca(a,c) • mrca(a,b) ≠ mrca(b,c) • mrca(a,c) = mrca(b,c) • Species a and b are more closely related to each other than they are to c • The most recent common ancestor of a and b is further from the root than the most recent common ancestor of a and c (and b and c)
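The mrca relationship can be checked mechanically. The sketch below is illustrative only: it assumes a tree is written as nested Python tuples with strings as leaves (a representation not taken from the slides), and the helper names `leaves` and `mrca_depth` are hypothetical.

```python
def leaves(tree):
    """Return the set of leaf labels under a nested-tuple tree."""
    if isinstance(tree, str):
        return {tree}
    left, right = tree
    return leaves(left) | leaves(right)


def mrca_depth(tree, x, y, depth=0):
    """Distance from the root to the most recent common ancestor of x and y.

    Assumes x and y are two distinct leaves that both occur in the tree.
    """
    if isinstance(tree, str):
        raise ValueError("x and y must be two distinct leaves")
    for child in tree:
        under = leaves(child)
        if x in under and y in under:
            # Both species sit below this child, so the mrca is deeper.
            return mrca_depth(child, x, y, depth + 1)
    # The two species separate here: this node is their mrca.
    return depth


# The triple ab|c: a and b are grouped, c hangs off the root.
tree = (("a", "b"), "c")
ab = mrca_depth(tree, "a", "b")
ac = mrca_depth(tree, "a", "c")
bc = mrca_depth(tree, "b", "c")
```

Here `ab` is larger than `ac` and `bc`, which are equal: mrca(a,b) is further from the root than the shared ancestor with c.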

  7. Triples (and Fans) • Species trees are frequently presented as a set of triples (and fans) [Figure: example trees over leaves a, b, c, d]

  8. Triples (and Fans) [Figure: example trees over leaves a, b, c, d]

  9. BreakUp & OneTree (circa 1996) • Algorithm BreakUp takes a species tree and produces a set of rooted triples R that define that tree. • Algorithm OneTree takes a set of species and a set of rooted triples, and builds a tree that respects those triples, or reports that no tree exists (in polytime). • OneTree is a specialisation of Build, an algorithm proposed by Aho, Sagiv, Szymanski, and Ullman in 1981
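To make "a set of rooted triples that define a tree" concrete, here is a minimal sketch that lists every triple a tree induces. This is not the BreakUp algorithm itself, which as far as the slide says produces a set that defines the tree and need not list all of them; the nested-tuple tree representation and the name `induced_triples` are assumptions for illustration. A triple xy|z is written as the tuple `(x, y, z)`.

```python
from itertools import combinations


def leaves(tree):
    """Return the set of leaf labels under a nested-tuple tree."""
    if isinstance(tree, str):
        return {tree}
    left, right = tree
    return leaves(left) | leaves(right)


def induced_triples(tree):
    """Return all rooted triples (x, y, z), meaning xy|z, induced by the tree."""
    if isinstance(tree, str):
        return set()
    left, right = tree
    triples = induced_triples(left) | induced_triples(right)
    # At this node, any pair taken from one side is closer to each
    # other than either is to a species on the other side.
    for near, far in ((left, right), (right, left)):
        for x, y in combinations(sorted(leaves(near)), 2):
            for z in leaves(far):
                triples.add((x, y, z))
    return triples


tree = ((("a", "b"), "c"), "d")
triples = induced_triples(tree)
```

For a bifurcating tree every choice of three species yields exactly one triple, so the four-leaf example gives four triples.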

  10. The Flavour of OneTree • Given a set of species S and rooted triples R • produce a node N • construct a graph G • with vertices in S • and edge (x,y) if triple xy|z is in R • if G is a single component fail • else recursively build • on the left with one component • with S’ and R’ (the set of species and triples in that component) • on the right, with the other components
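The steps above can be sketched in a few lines. This is a hedged illustration of the flavour only, not the authors' implementation: trees are returned as nested tuples, `None` signals failure, a single species is treated as a leaf (a base case the slide does not spell out), and the name `one_tree` is hypothetical. A triple xy|z is the tuple `(x, y, z)`.

```python
def one_tree(species, triples):
    """Build a binary tree respecting the rooted triples, or return None."""
    species = set(species)
    if len(species) == 1:
        return next(iter(species))  # assumed base case: a lone species is a leaf

    # R': keep only triples whose three species are all present.
    inside = [t for t in triples if all(s in species for s in t)]

    # Graph G: vertices are species, edge (x, y) for each triple xy|z.
    adj = {s: set() for s in species}
    for x, y, _ in inside:
        adj[x].add(y)
        adj[y].add(x)

    # Connected components of G by depth-first search, in a fixed order.
    components, seen = [], set()
    for s in sorted(species):
        if s in seen:
            continue
        comp, stack = set(), [s]
        while stack:
            v = stack.pop()
            if v not in comp:
                comp.add(v)
                stack.extend(adj[v] - comp)
        seen |= comp
        components.append(comp)

    if len(components) == 1:
        return None  # G is a single component: fail

    # One component goes left, all the other components go right.
    left = one_tree(components[0], inside)
    right = one_tree(species - components[0], inside)
    if left is None or right is None:
        return None
    return (left, right)


ok = one_tree("abcd", [("a", "b", "c"), ("a", "c", "d")])
bad = one_tree("abcd", [("a", "b", "c"), ("b", "c", "d"), ("c", "d", "a")])
```

With triples ab|c and ac|d the call succeeds; with {ab|c, bc|d, cd|a} the graph is one component at the top level, so no tree exists.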

  11. The Flavour of OneTree [Figure: worked example over species a, b, c, d]

  12. Min-cut Super Trees • What happens if OneTree fails? • Min-cut gives us the best it can • by breaking some triples (resulting in fans) • by excluding some species • There are polytime algorithms for this • but they are greedy and biased

  13. Constraint Programming solutions to building a species tree from a set of rooted triples

  14. A naïve constraint encoding • n-1 variables as interior nodes • v[i] = j ⇔ parent(v[i]) = v[j] • no loops/cycles • Barbara used set variables (ILOG) • Patrick used specialised constraint (Choco) • Francois then encoded set variables! • n variables as leaf nodes • each takes a value respecting triples • I am sparing you (and me) the details

  15. Why was this a naïve constraint encoding? • It produced the right number of trees when no triples • the Catalan number • symmetry breaking • It would produce a tree if one existed • A 2 stage process • (1) build a tree from the interior nodes • there are Catalan many of these • (2) given an “interior tree” place the leaf nodes • there are n! ways to do this • if step (2) fails generate the next interior tree in (1) Yikes! That’s expensive. Imagine {ab|c,bc|d,cd|a}
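A rough feel for "Yikes!" can be had by multiplying the two stages together. The sketch below is a back-of-envelope bound, not a measurement of the encoding: it assumes the Catalan number meant is C(n-1), the number of binary tree shapes with n leaves, and the function names are hypothetical.

```python
from math import comb, factorial


def catalan(k):
    """k-th Catalan number: 1, 1, 2, 5, 14, 42, ..."""
    return comb(2 * k, k) // (k + 1)


def naive_search_bound(n):
    """Loose upper bound on (interior tree, leaf placement) pairs for n species.

    Assumption: C(n-1) interior trees in stage (1), n! placements in stage (2).
    """
    return catalan(n - 1) * factorial(n)


bound_4 = naive_search_bound(4)    # 5 shapes * 24 placements
bound_10 = naive_search_bound(10)  # grows very quickly with n
```

When the triples are inconsistent, as in {ab|c, bc|d, cd|a}, the two-stage search has to exhaust this whole space before it can report failure.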

  16. Ultrametric Trees & Species Trees What is an ultrametric tree? • We are given a 2d symmetric matrix D • D[i,j] is the time of divergence of species i and j • D[i,j] is mrca(i,j), labeled with the time of divergence • D[i,j] is the value of mrca(i,j) • Build a bifurcating tree • n leaves and n - 1 interior nodes • interior nodes labeled with entries from D • any path from the root is a strictly decreasing sequence

  17. Ultrametric Trees: here’s one I (well, Dan Gusfield actually) prepared earlier [Figure: ultrametric tree with interior node labels 8, 5, 3, 3 and leaves D, B, C, A, E] Note: if the sequence increases, we have a min-ultrametric tree

  18. Ultrametric Matrix: necessary & sufficient conditions • cannot have more than n - 1 distinct values • because there are n - 1 interior nodes • For every 3 indices i,j,k • there is a tie for the maximum between D[i,j], D[i,k], D[j,k] • Given an ultrametric matrix, an ultrametric tree can be constructed in O(n^2) … see Dan Gusfield’s book “Algorithms on Strings, Trees, and Sequences”
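The three-index condition is easy to test directly. The sketch below checks only that condition (the tie for the maximum), and the two small matrices are made-up examples, not the one from Gusfield's figure; the name `is_ultrametric` is hypothetical.

```python
from itertools import combinations


def is_ultrametric(D):
    """Check the three-point condition on a symmetric matrix D.

    For every i, j, k the two largest of D[i][j], D[i][k], D[j][k] must tie.
    """
    n = len(D)
    for i, j, k in combinations(range(n), 3):
        values = sorted((D[i][j], D[i][k], D[j][k]))
        if values[1] != values[2]:
            return False
    return True


# Hypothetical 3-species example: species 0 and 1 diverged at time 3,
# and both diverged from species 2 at time 8.
good = [[0, 3, 8],
        [3, 0, 8],
        [8, 8, 0]]

# Here the maximum (8) is unique, so no tree can explain the matrix.
bad = [[0, 3, 8],
       [3, 0, 5],
       [8, 5, 0]]
```

`is_ultrametric(good)` holds and `is_ultrametric(bad)` does not.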

  19. A CP encoding of D • We have a 2 dimensional matrix of constrained integer variables D • We must ensure that for any i,j,k the following holds [constraint shown as an image on the original slide] • Think isosceles triangles, allowing equilateral • An ultrametric space, composed of isosceles triangles
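The constraint itself did not survive in the transcript. A plausible reconstruction, offered as an assumption rather than the slide's exact wording, follows from the min-ultrametric convention of the next slides (a tie for the minimum, with all three equal allowed for the equilateral case):

```latex
\[
\bigl(D_{ij} > D_{ik} = D_{jk}\bigr)
\;\lor\;
\bigl(D_{ik} > D_{ij} = D_{jk}\bigr)
\;\lor\;
\bigl(D_{jk} > D_{ij} = D_{ik}\bigr)
\;\lor\;
\bigl(D_{ij} = D_{ik} = D_{jk}\bigr)
\qquad \text{for all distinct } i, j, k
\]
```

Each of the first three disjuncts is an isosceles triangle with one strictly longer side; the fourth is the equilateral case.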

  20. A CP encoding of D • Any instantiation of the variables in D is now guaranteed to be min-ultrametric • We get a Catalan number of min-ultrametric solutions

  21. How can we exploit this? [Figure: tree with leaves i, j, k] • We are given triples and fans, but not distances! • But we can consider a triple ij|k as a constraint • This over-rides the disjunctions posted across the matrix • Note: our tree is min-ultrametric!

  22. The CP encoding (contd) • we have the “blanket” disjunctive constraint to ensure min-ultrametric • triples are constraints that break the disjunctions • a solution (if one exists) is min-ultrametric respecting triples • we can then produce tree from the matrix, as a post process • NOTE: we need a pre-process to break up trees into triples
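The shape of this encoding can be tried on tiny instances without a solver. The sketch below is an assumption-laden stand-in: brute-force enumeration replaces constraint propagation and search, the domain 1..n-1 for each D[i][j] is a guess based on the n - 1 interior nodes, a triple ij|k is read as D[i][j] > D[i][k] = D[j][k], and the names `min_ultrametric` and `solve` are hypothetical.

```python
from itertools import combinations, product


def min_ultrametric(D, i, j, k):
    """Blanket constraint: the two smallest of the three entries tie."""
    a, b, _ = sorted((D[i][j], D[i][k], D[j][k]))
    return a == b


def solve(n, triples):
    """Return a matrix D satisfying all constraints, or None.

    Species are numbered 0..n-1; a triple (i, j, k) means ij|k.
    """
    pairs = list(combinations(range(n), 2))
    for values in product(range(1, n), repeat=len(pairs)):
        D = [[0] * n for _ in range(n)]
        for (i, j), v in zip(pairs, values):
            D[i][j] = D[j][i] = v
        if not all(min_ultrametric(D, i, j, k)
                   for i, j, k in combinations(range(n), 3)):
            continue
        # Each triple picks out one disjunct of the blanket constraint.
        if all(D[i][j] > D[i][k] == D[j][k] for i, j, k in triples):
            return D
    return None


consistent = solve(3, [(0, 1, 2)])                        # ab|c
inconsistent = solve(4, [(0, 1, 2), (1, 2, 3), (2, 3, 0)])  # {ab|c, bc|d, cd|a}
```

The first call returns a matrix in which D[0][1] is strictly larger than the other two entries; the second returns `None`, since the three triples force a cyclic chain of strict inequalities. Turning a solution matrix into a tree is the post-process mentioned above and is not shown here.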

  23. So where are we? • Good question: • we have not yet tried real data • we have a number of different micro-encodings • Are we in P for decision? • Not sure yet • How about optimisation? • We can see a way, by introducing penalties • Wu Wei is coding up BreakUp and OneTree • so we have something real to compare with • We need real data to check this out • I need to get funding for this • write a grant proposal with DRG I think!

  24. Questions?
