Algorithms research

Presentation Transcript


  1. Algorithms research. Tandy Warnow, UT-Austin

  2. “Algorithms group”
     • UT-Austin: Warnow, Hunt
     • UCB: Rao, Karp, Papadimitriou, Russell, Myers
     • UCSD: Huelsenbeck
     • UNM: Moret, Bader, Williams
     • External participants: Mossel (UCB), Huson (Germany), Steel (NZ), and others

  3. Main research foci
     • Solving maximum parsimony and maximum likelihood more effectively
     • “Fast converging methods”
     • Gene order and content phylogeny
     • Reticulate evolution
     • Multiple sequence alignment at the genomic level

  4. GRAPPA (Genome Rearrangement Analysis under Parsimony and other Phylogenetic Algorithms) http://www.cs.unm.edu/~moret/GRAPPA/
     • Heuristics for NP-hard optimization problems
     • Fast polynomial-time distance-based methods
     • Contributors: U. New Mexico, U. Texas at Austin, Università di Bologna (Italy)
     • Poster: Jijun Tang

  5. Maximum Parsimony on Rearranged Genomes (MPRG)
     [Figure: example tree with labeled edge lengths; total length = 18]
     • The leaves are rearranged genomes.
     • Find the tree that minimizes the total number of rearrangement events.
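
As a rough illustration of how a tree is scored in this setting, the sketch below computes the breakpoint distance between two signed linear gene orders and sums it over the edges of a tree. It assumes the genomes at internal nodes are already known; finding them, and finding the best tree, is the hard part of the problem. The breakpoint distance is only one of the measures mentioned on the following slides (the other being inversions), and the function names and toy genomes are illustrative, not taken from GRAPPA.

```python
def adjacencies(genome):
    """Return the signed adjacencies of a linear gene order.

    The genome is framed with sentinel genes 0 and n+1 so that the two
    ends also count as adjacencies.
    """
    framed = [0] + list(genome) + [len(genome) + 1]
    adj = set()
    for a, b in zip(framed, framed[1:]):
        adj.add((a, b))
        adj.add((-b, -a))  # the same adjacency read on the reverse strand
    return adj


def breakpoint_distance(g1, g2):
    """Count the adjacencies of g1 that do not occur in g2."""
    framed = [0] + list(g1) + [len(g1) + 1]
    adj2 = adjacencies(g2)
    return sum((a, b) not in adj2 for a, b in zip(framed, framed[1:]))


def tree_length(edges, genomes):
    """Sum the breakpoint distances over all edges of a labeled tree."""
    return sum(breakpoint_distance(genomes[u], genomes[v]) for u, v in edges)


# Toy example (hypothetical data): a star tree with internal node X.
genomes = {
    "X": [1, 2, 3, 4, 5],
    "A": [1, 2, 3, 4, 5],
    "B": [1, -3, -2, 4, 5],   # one inversion of the block 2..3
    "C": [1, 2, 3, -5, -4],   # one inversion of the block 4..5
}
edges = [("X", "A"), ("X", "B"), ("X", "C")]

print(breakpoint_distance(genomes["X"], genomes["B"]))  # 2
print(tree_length(edges, genomes))                      # 4
```

A single inversion creates at most two breakpoints, which is why the two measures are related but not interchangeable.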

  6. Benchmark gene order dataset: Campanulaceae
     • 12 genomes + 1 outgroup (Tobacco), 105 gene segments
     • NP-hard optimization problems: breakpoint and inversion phylogenies
     1997: BPAnalysis (Blanchette and Sankoff): 200 years (est.)

  7. Benchmark gene order dataset: Campanulaceae
     • 12 genomes + 1 outgroup (Tobacco), 105 gene segments
     • NP-hard optimization problems: breakpoint and inversion phylogenies
     1997: BPAnalysis (Blanchette and Sankoff): 200 years (est.)
     2000: GRAPPA v1.1 on the 512-processor Los Lobos Supercluster machine: 2 minutes (200,000-fold speedup per processor)

  8. Benchmark gene order dataset: Campanulaceae
     • 12 genomes + 1 outgroup (Tobacco), 105 gene segments
     • NP-hard optimization problems: breakpoint and inversion phylogenies
     1997: BPAnalysis (Blanchette and Sankoff): 200 years (est.)
     2000: GRAPPA v1.1 on the 512-processor Los Lobos Supercluster machine: 2 minutes (200,000-fold speedup per processor)
     2003: Latest version of GRAPPA: 2 minutes on a single processor (1-billion-fold speedup per processor)

  9. Reticulate Evolution
     • Group leader: Randy Linder
     • Software: (1) producing random networks, (2) simulating sequences down networks, (3) performance evaluation of methods, (4) inferring reticulate networks
     • Current reconstruction methods limited to one reticulation event
     • Poster: Luay Nakhleh

  10. 20-taxon 1-hybrid network. 0.1 scaling factor.

  11. MP/ML heuristics
     • Disk-Covering Methods (DCMs): divide-and-conquer strategies that boost the performance of base methods for MP/ML (Warnow)
     • MrBayes (Huelsenbeck)
     • New I-DCM3 technique improves upon the Ratchet and TBR
     • Poster: Usman Roshan (DCM-MP)

  12. Gutell dataset: 854 rRNA sequences
     Iterative-DCM3 trials find trees of MP score 103210 in 30 hours, whereas ratchet500 trials take 45 hours to find trees of the same score.
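
For context, an MP score like the one quoted above is the minimum number of state changes needed to explain the sequences on a given tree. For a fixed tree it can be computed site by site with Fitch's algorithm; the search over tree topologies is the NP-hard part that heuristics such as I-DCM3 and the Ratchet address. The sketch below is a minimal version under stated assumptions: rooted binary trees written as nested tuples, aligned sequences of equal length, and unit cost per change. The function names and toy data are hypothetical and are not drawn from the presentation's software.

```python
def fitch_site(node, states):
    """Fitch's algorithm for one site.

    Returns (candidate state set, number of changes) for the subtree at node.
    A leaf is a string name; an internal node is a pair (left, right).
    """
    if isinstance(node, str):
        return {states[node]}, 0
    left, right = node
    left_set, left_cost = fitch_site(left, states)
    right_set, right_cost = fitch_site(right, states)
    common = left_set & right_set
    if common:
        # The children can agree on a state: no extra change is needed.
        return common, left_cost + right_cost
    # The children disagree: take the union and pay for one change.
    return left_set | right_set, left_cost + right_cost + 1


def parsimony_score(tree, sequences):
    """Sum the Fitch cost over all aligned sites."""
    length = len(next(iter(sequences.values())))
    total = 0
    for i in range(length):
        column = {name: seq[i] for name, seq in sequences.items()}
        _, cost = fitch_site(tree, column)
        total += cost
    return total


# Toy alignment (hypothetical data): four taxa, four sites.
sequences = {"A": "ACGT", "B": "ACGA", "C": "TCGA", "D": "TCGT"}

print(parsimony_score((("A", "B"), ("C", "D")), sequences))  # 3
print(parsimony_score((("A", "C"), ("B", "D")), sequences))  # 4
```

Scoring one tree is linear in the number of taxa and sites; comparing heuristics by the time needed to reach a given score, as on this slide, measures how quickly each one navigates the space of trees.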

  13. Other planned projects (partial list)
     • Multiple Sequence Alignment (Myers and Williams)
     • Steiner Tree algorithms: error bounds and new heuristics (Rao)
     • MCMC methods (Russell and Huelsenbeck)
     • Symbolic representation of data (Hunt)
     • Parallel algorithms (Bader and Williams)

  14. Questions for group
     • How should we measure performance?
     • How should we use simulated data?
     • How should we use real datasets?
     • How can we study criteria (MP, ML, etc.) as opposed to methods?
     • Should we sponsor DIMACS-style challenges?
     • Others? (Please bring questions, comments, and answers to the break-out session.)
