
Flowers, Bees, and Algorithms: Adventures in Cophylogenetics


Presentation Transcript


  1. Flowers, Bees, and Algorithms: Adventures in Cophylogenetics Ran Libeskind-Hadas Department of Computer Science Harvey Mudd College Joint work with: Mike Charleston (Univ. of Sydney) Chris Conow (USC) Ben Cousins (Clemson) Daniel Fielder (HMC) John Peebles (HMC) Tselil Schramm (HMC) Anak Yodpinyanee (HMC)

  2. Integrated CS/Bio Course • Send e-mail to: ran@cs.hmc.edu

  3. Overview • A 75-minute “research lecture” to first-year students in our CS/Bio intro course • Show first-year students that what they’ve learned is relevant to current research • Showcase research done with senior students • What have they done so far? • Biology: Genes, alignment, phylogenetic trees, RNA folding • CS: Programming, recursion, “memoization”

  4. Specifically… • Pairwise global alignment and RNA folding • Why you should care • Designed and implemented recursive solutions • Why are they slow? • How do we make them faster? • “Memoization” idea • Wow, that’s fast! (but no actual analysis yet) • Designed and implemented “memoized” versions • Used their implementations to investigate questions • Around 10 lines of Python code!
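
  To give a sense of scale, here is a minimal sketch of the memoized global alignment idea in roughly the advertised ten lines of Python; the scoring values are illustrative assumptions, not the course's actual parameters:

      from functools import lru_cache

      MATCH, MISMATCH, GAP = 1, -1, -1          # illustrative scores, not the course's

      def align_score(x, y):
          @lru_cache(maxsize=None)              # the "memoization" step
          def best(i, j):
              # best score for aligning x[i:] against y[j:]
              if i == len(x): return GAP * (len(y) - j)
              if j == len(y): return GAP * (len(x) - i)
              sub = (MATCH if x[i] == y[j] else MISMATCH) + best(i + 1, j + 1)
              return max(sub, GAP + best(i + 1, j), GAP + best(i, j + 1))
          return best(0, 0)

      print(align_score("ACGT", "AGT"))         # 2 with these scores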

  5. Specifically… • Phylogenetic trees • Why you should care • Implemented simple algorithm (e.g. UPGMA) • Used their implementation to answer questions… • Existence and relative merits of other algorithms (mention maximum likelihood… but it’s slow!)
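
  For reference, a compact sketch of the UPGMA idea the students implement: repeatedly merge the two closest clusters and average their distances, weighted by cluster size. The dictionary-based representation and the Newick-style output string are illustrative choices, not the course's starter code, and branch lengths are omitted:

      def upgma(dist, names):
          # dist maps a pair of leaf names to their distance, e.g. ("A", "B"): 2
          d = {frozenset(pair): v for pair, v in dist.items()}
          clusters = {name: 1 for name in names}            # cluster -> number of leaves
          while len(clusters) > 1:
              # find the two closest clusters currently alive
              a, b = min(((x, y) for x in clusters for y in clusters if x < y),
                         key=lambda p: d[frozenset(p)])
              merged, size = f"({a},{b})", clusters[a] + clusters[b]
              for c in clusters:
                  if c not in (a, b):
                      # UPGMA rule: size-weighted average of the old distances
                      d[frozenset((merged, c))] = (clusters[a] * d[frozenset((a, c))] +
                                                   clusters[b] * d[frozenset((b, c))]) / size
              del clusters[a], clusters[b]
              clusters[merged] = size
          return next(iter(clusters))                       # Newick-like tree string

      dist = {("A", "B"): 2, ("A", "C"): 4, ("B", "C"): 4,
              ("A", "D"): 6, ("B", "D"): 6, ("C", "D"): 6}
      print(upgma(dist, ["A", "B", "C", "D"]))              # (((A,B),C),D)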

  6. A 75-minute lecture in 30 minutes (or less)

  7. Actual 75-minute lecture starts here! (Also a chapter in new B4B) Cophylogenetics “ I can understand how a flower and a bee might slowly become, either simultaneously or one after the other, modified and adapted in the most perfect manner to each other, by the continued preservation of individuals presenting mutual and slightly favourable deviations of structure.” Charles Darwin, The Origin of Species

  8. Obligate Mutualism of Figs and Fig Wasps • ovipositor • From Cophylogeny of the Ficus Microcosm, A. Jackson, 2004

  9. The Cophylogeny Problem • From Hafner MS and Nadler SA, Phylogenetic trees support the coevolution of parasites and their hosts. Nature 1988, 332:258-259

  10. Indigobirds and Finches • High level of host specificity (e.g. mouth markings) www.indigobirds.com

  11. The Question… Given a host tree, parasite tree, and tip mapping, what is the most plausible mapping between the trees and is it suggestive of coevolution? This seems to be a “hard” problem!

  12. Measuring the “Hardness” of Computational Problems There are three kinds of problems… Easy Hard Impossible!

  13. “Easy” Problems • Sorting a list of n numbers: [42, 3, 17, 26, … , 100] • Multiplying two n×n matrices • [Figure: example matrices being multiplied]
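
  Both of these fit in a few lines of Python; the numbers below are just an illustration:

      # Sorting a list of n numbers: roughly n log n comparisons
      print(sorted([42, 3, 17, 26, 100]))

      # Multiplying two n x n matrices with the textbook triple loop: roughly n^3 steps
      def matmul(A, B):
          n = len(A)
          return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
                  for i in range(n)]

      print(matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))     # [[19, 22], [43, 50]]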

  14. Global Alignment is “easy”! • Reminder of the 2^n running time of the recursive alignment • Informally motivate the n^2 running time of the memoized version

  15. “Hard” Problems • Snowplows of Northern Minnesota • [Map: Burrsburg, Frostbite City, Tundratown, Shiversville, Freezeapolis]

  16. “Hard” Problems • Snowplows of Northern Minnesota • [Map: Burrsburg, Frostbite City, Tundratown, Shiversville, Freezeapolis] • Brute-force? Greed?
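
  A brute-force sketch for the snowplow route: try every ordering of the towns and keep the cheapest round trip. The distances below are made-up placeholders; the point is that the number of orderings grows factorially, so this approach collapses as soon as the map gets large:

      from itertools import permutations

      towns = ["Burrsburg", "Frostbite City", "Tundratown", "Shiversville", "Freezeapolis"]

      # Hypothetical symmetric road distances (miles), indexed by town position
      dist = [[ 0, 12, 19,  8, 15],
              [12,  0,  9, 14,  7],
              [19,  9,  0, 11, 16],
              [ 8, 14, 11,  0, 10],
              [15,  7, 16, 10,  0]]

      def brute_force_route(dist):
          n, best = len(dist), None
          for order in permutations(range(1, n)):           # fix town 0 as the depot
              route = (0,) + order + (0,)
              cost = sum(dist[a][b] for a, b in zip(route, route[1:]))
              if best is None or cost < best[0]:
                  best = (cost, route)
          return best

      print(brute_force_route(dist))    # fine for 5 towns; hopeless for 50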

  17. n^2 versus 2^n • The Ran-O-Matic performs 10^9 operations/sec • n^2: n = 10 takes 100 operations (< 1 sec), n = 30 takes 900 (< 1 sec), n = 50 takes 2,500 (< 1 sec), n = 70 takes 4,900 (< 1 sec) • 2^n so far: n = 10 takes 1,024 operations (< 1 sec), n = 30 takes ~10^9 (1 sec)

  18. n^2 versus 2^n • The table grows: 2^n at n = 50 is ~10^15 operations (13 days)

  19. n^2 versus 2^n • The table grows again: 2^n at n = 70 is ~10^21 operations (37 trillion years)

  20. n^2 versus 2^n • Computers double in speed every 2 years. Let’s just wait 10 years! 37 trillion years ->

  21. n^2 versus 2^n • Computers double in speed every 2 years. Let’s just wait 10 years! 37 trillion years -> 37 billion years!

  22. Snowplows and Travelling Salesperson Revisited! • NP-complete problems [cloud diagram]: Travelling Salesperson Problem, Snowplow Problem, Protein Folding, Multiple Sequence Alignment, Phylogenetic trees by maximum likelihood • Tens of thousands of other known problems go in this cloud!!

  23. “I can’t find an efficient algorithm. I guess I’m too dumb.” Cartoon courtesy of “Computers and Intractability: A Guide to the Theory of NP-completeness” by M. Garey and D. Johnson

  24. “I can’t find an efficient algorithm because no such algorithm is possible!” Cartoon courtesy of “Computers and Intractability: A Guide to the Theory of NP-completeness” by M. Garey and D. Johnson

  25. “I can’t find an efficient algorithm, but neither can all these famous people.” Cartoon courtesy of “Computers and Intractability: A Guide to the Theory of NP-completeness” by M. Garey and D. Johnson

  26. $1 million • The Clay Millennium Prize for resolving P vs. NP (Vinay Deolalikar announced a claimed proof in 2010)

  27. Coping with NP-completeness… • Brute force • Ad hoc heuristics • Metaheuristics • Approximation algorithms

  28. Obligate Mutualism of Figs and Fig Wasps • ovipositor • From Cophylogeny of the Ficus Microcosm, A. Jackson, 2004

  29. The Cophylogeny Problem… • [Figure: host tree and parasite tree, tips labeled a–e]

  30. The Cophylogeny Problem • [Figure: host tree and parasite tree, with tip associations shown]

  31. Input and Possible Solutions • [Figure: the input trees alongside candidate reconstructions]

  32. Event Cost Model: cospeciation • [Figure: reconstructions with the cospeciation events marked]

  33. Event Cost Model: duplication • [Figure: reconstruction with the duplication event marked]

  34. Event Cost Model: host-switch • [Figure: reconstruction with the host-switch event marked]

  35. Event Cost Model: loss • [Figure: reconstructions with the loss events marked]

  36. Event Cost Model • One reconstruction: Cost = cospeciation + host-switch + loss • The other: Cost = duplication + cospeciation + 3 * loss • [Figure: the two reconstructions with their events marked]

  37. Some typical costs • Cost = 8 for the duplication reconstruction: duplication (+2) + cospeciation (+0) + 3 losses (+2 each) • Cost = 5 for the host-switch reconstruction: cospeciation (+0) + host-switch (+3) + loss (+2) • [Figure: the two reconstructions annotated with per-event costs]
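
  In code, scoring a reconstruction under an event-cost model is just a weighted sum of event counts. The sketch below uses the per-event costs as reconstructed above (cospeciation 0, duplication 2, host-switch 3, loss 2); treat them as illustrative rather than canonical:

      # Per-event costs (illustrative; taken from the "typical costs" slide above)
      COSTS = {"cospeciation": 0, "duplication": 2, "host_switch": 3, "loss": 2}

      def reconstruction_cost(event_counts, costs=COSTS):
          # total cost = sum over event types of (cost of event) * (number of occurrences)
          return sum(costs[event] * count for event, count in event_counts.items())

      # The two candidate reconstructions from the slides, expressed as event counts
      print(reconstruction_cost({"duplication": 1, "cospeciation": 1, "loss": 3}))   # 8
      print(reconstruction_cost({"cospeciation": 1, "host_switch": 1, "loss": 1}))   # 5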

  38. This problem is hard! • How hard? NP-complete! (Joint work with Charleston, Ovadia, Conow, Fielder) • The host-switches are the culprits

  39. Existing Methods

  40. A Metaheuristic Approach • Fix a timing • We can solve the problem optimally for a given timing using Dynamic Programming (Memoization)

  41. Dynamic Programming • Compute Cost[a, su, 2] • [Figure: parasite tree with nodes a, b, c; host tree with internal nodes r, s, t, u and tips v, w, x, y, drawn against time slices t = 0 through t = 4]

  42. Dynamic Programming • Compute Cost[a, su, 2] from the child subproblems Cost[b, tw, 3] and Cost[c, uy, 4] • [Figure: same trees with the two child placements marked]

  43. Dynamic Programming • Compute Cost[a, su, 2]: one candidate placement reaches the children via a host-switch and two losses • [Figure: the host-switch and losses marked on the host tree, with Cost[b, tw, 3] and Cost[c, uy, 4] labeled]

  44. Dynamic Programming • Candidate for Cost[a, su, 2]: Cost[b, tw, 3] + Cost[c, uy, 4] + 2 * loss + host-switch • [Figure: the same placement with its events marked]
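
  Assembling that candidate value in code is a one-liner over the memo table; the table entries and event costs below are placeholders invented for illustration:

      LOSS, HOST_SWITCH = 2, 3                          # placeholder event costs
      cost = {("b", "tw", 3): 4, ("c", "uy", 4): 0}     # hypothetical subproblem values

      # Candidate for Cost[a, su, 2], mirroring the formula on the slide
      candidate = cost[("b", "tw", 3)] + cost[("c", "uy", 4)] + 2 * LOSS + HOST_SWITCH
      print(candidate)                                  # 4 + 0 + 4 + 3 = 11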

  45. Dynamic Programming Running Time • O(n^3) cells to fill in • O(n^2) positions for first child • O(n^2) positions for second child • O(n) to count #losses from each child, but this is precomputable • O(n^3 x (n^2 x n^2)) = O(n^7) total

  46. Dynamic Programming Running Time • O(n^3) cells to fill in • O(n^2) positions for first child • O(n^2) positions for second child • O(n) to count #losses from each child, but this is precomputable • O(n^3 x (n^2 x n^2)) = O(n^7) total • Can be improved to O(n^3)

  47. Genetic Algorithm
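
  The transcript does not spell out the genetic algorithm, but a bare-bones skeleton of the metaheuristic from slide 40 — evolve candidate timings and score each one by running the dynamic program — might look like this. All four callables (random_timing, crossover, mutate, dp_cost) are hypothetical problem-specific operators the caller supplies:

      import random

      def genetic_search(random_timing, crossover, mutate, dp_cost,
                         pop_size=50, generations=100, mutation_rate=0.1):
          # Evolve a population of timings; lower reconstruction cost is better.
          population = [random_timing() for _ in range(pop_size)]
          for _ in range(generations):
              population.sort(key=dp_cost)
              parents = population[:pop_size // 2]      # keep the best half
              children = []
              while len(parents) + len(children) < pop_size:
                  a, b = random.sample(parents, 2)      # pick two surviving timings
                  child = crossover(a, b)               # recombine them
                  if random.random() < mutation_rate:
                      child = mutate(child)             # occasionally perturb the child
                  children.append(child)
              population = parents + children
          return min(population, key=dp_cost)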

  48. Existing Software

  49. The Fig/Wasp Challenge

  50. Results
