Algorithms Research
Tandy Warnow, UT-Austin
“Algorithms group”
• UT-Austin: Warnow, Hunt
• UCB: Rao, Karp, Papadimitriou, Russell, Myers
• UCSD: Huelsenbeck
• UNM: Moret, Bader, Williams
• External participants: Mossel (UCB), Huson (Germany), Steel (NZ), and others
Main research foci
• Solving maximum parsimony and maximum likelihood more effectively
• “Fast converging methods”
• Gene order and content phylogeny
• Reticulate evolution
• Multiple sequence alignment at the genomic level
GRAPPA (Genome Rearrangement Analysis under Parsimony and other Phylogenetic Algorithms)
http://www.cs.unm.edu/~moret/GRAPPA/
• Heuristics for NP-hard optimization problems
• Fast polynomial-time distance-based methods
• Contributors: U. New Mexico, U. Texas at Austin, Università di Bologna (Italy)
• Poster: Jijun Tang
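The slide does not say which distance-based methods GRAPPA provides. As a generic illustration only, here is a minimal sketch of neighbor joining (Saitou and Nei), a standard polynomial-time distance-based tree builder. It assumes a symmetric matrix of pairwise genome distances (for gene-order data these could be breakpoint or inversion distances); the function name `neighbor_joining` and the Newick-style output string are our own choices, not GRAPPA's interface.

```python
def neighbor_joining(labels, matrix):
    """Minimal neighbor-joining sketch (illustrative, not GRAPPA's code).

    labels: list of at least 3 taxon names; matrix: symmetric list of lists
    of pairwise distances in the same order. Returns a Newick-style string.
    """
    labels = list(labels)
    d = [row[:] for row in matrix]  # work on a copy of the caller's matrix
    while len(labels) > 3:
        n = len(labels)
        r = [sum(row) for row in d]  # row sums used by the Q-criterion
        # Pick the pair (i, j) that minimizes Q(i, j) = (n - 2) d(i, j) - r(i) - r(j).
        best = None
        for i in range(n):
            for j in range(i + 1, n):
                q = (n - 2) * d[i][j] - r[i] - r[j]
                if best is None or q < best[0]:
                    best = (q, i, j)
        _, i, j = best
        # Branch lengths from the new internal node u to i and to j.
        li = 0.5 * d[i][j] + (r[i] - r[j]) / (2 * (n - 2))
        lj = d[i][j] - li
        new_label = f"({labels[i]}:{li:g},{labels[j]}:{lj:g})"
        # Distances from u to every remaining node k.
        keep = [k for k in range(n) if k not in (i, j)]
        new_row = [0.5 * (d[i][k] + d[j][k] - d[i][j]) for k in keep]
        d = [[d[a][b] for b in keep] for a in keep]
        for row, du in zip(d, new_row):
            row.append(du)
        d.append(new_row + [0.0])
        labels = [labels[k] for k in keep] + [new_label]
    # Join the last three nodes at a single central node.
    a, b, c = labels
    la = 0.5 * (d[0][1] + d[0][2] - d[1][2])
    lb = d[0][1] - la
    lc = d[0][2] - la
    return f"({a}:{la:g},{b}:{lb:g},{c}:{lc:g});"


# Hypothetical additive distances generated by the tree ((A:1,B:2):1,(C:3,D:1)).
M = [
    [0, 3, 5, 3],
    [3, 0, 6, 4],
    [5, 6, 0, 4],
    [3, 4, 4, 0],
]
print(neighbor_joining(["A", "B", "C", "D"], M))  # (C:3,D:1,(A:1,B:2):1);
```

On an additive matrix like this one, neighbor joining recovers the generating tree and its branch lengths. Each of the roughly n merge steps scans all pairs, so the sketch runs in O(n^3) time, which is what makes such methods fast next to the NP-hard criteria.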
Maximum Parsimony on Rearranged Genomes (MPRG)
[Figure: example tree on genomes A to F with edge lengths 3, 3, 6, 4, 2; total length = 18]
• The leaves are rearranged genomes.
• Find the tree that minimizes the total number of rearrangement events.
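The MPRG objective can be made concrete with a small sketch. The length of a labeled tree is the sum of its edge lengths (the numbers in the figure, 3 + 3 + 6 + 4 + 2, add up to the stated 18). Counting true rearrangement events such as inversions needs more machinery, so as a stand-in this sketch uses the simpler breakpoint distance: the number of gene adjacencies in one genome that are absent from the other. It assumes circular genomes with identical gene content, written as lists of signed integers, and it takes the internal-node gene orders as given; inferring those orders is the hard part of MPRG. All function names are hypothetical.

```python
def adjacencies(genome):
    """Signed adjacencies of a circular genome, normalized so (a, b) and (-b, -a) match."""
    n = len(genome)
    adj = set()
    for i in range(n):
        a, b = genome[i], genome[(i + 1) % n]  # wrap around: the genome is circular
        adj.add(min((a, b), (-b, -a)))  # reading the other strand gives (-b, -a)
    return adj


def breakpoint_distance(g1, g2):
    """Number of adjacencies of g1 that do not occur in g2 (equal gene content assumed)."""
    return len(adjacencies(g1) - adjacencies(g2))


def tree_length(edges, labels, dist=breakpoint_distance):
    """Total length of a labeled tree: the sum of dist over its edges."""
    return sum(dist(labels[u], labels[v]) for u, v in edges)


# Hypothetical example: a star tree whose center M has an assumed (not inferred) gene order.
labels = {
    "M": [1, 2, 3, 4, 5],
    "A": [1, 2, 3, 4, 5],
    "B": [1, -3, -2, 4, 5],  # one inversion of genes 2..3
    "C": [1, 2, 3, -5, -4],  # one inversion of genes 4..5
}
edges = [("M", "A"), ("M", "B"), ("M", "C")]
print(tree_length(edges, labels))  # 4
```

Each single inversion here creates two breakpoints, which is why the breakpoint-based length (4) is twice the number of inversions (2) in this example; in general the two measures can disagree more substantially.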
Benchmark gene order dataset: Campanulaceae
• 12 genomes + 1 outgroup (Tobacco), 105 gene segments
• NP-hard optimization problems: breakpoint and inversion phylogenies
1997: BPAnalysis (Blanchette and Sankoff): 200 years (est.)
2000: Using GRAPPA v1.1 on the 512-processor Los Lobos Supercluster machine: 2 minutes (200,000-fold speedup per processor)
2003: Using the latest version of GRAPPA: 2 minutes on a single processor (1-billion-fold speedup per processor)
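Part of the difficulty is the size of tree space: there are (2n - 5)!! = 3 × 5 × ... × (2n - 5) unrooted binary trees on n labeled leaves, and an exact search must account for every one of them under the breakpoint or inversion criterion. A minimal sketch (hypothetical function name) that counts them for the 13 Campanulaceae genomes (12 plus the Tobacco outgroup):

```python
def num_unrooted_binary_trees(n):
    """Number of unrooted binary trees on n labeled leaves: (2n - 5)!! for n >= 3."""
    if n < 3:
        raise ValueError("need at least 3 leaves")
    count = 1
    for k in range(3, 2 * n - 4, 2):  # multiply 3 * 5 * ... * (2n - 5)
        count *= k
    return count


print(f"{num_unrooted_binary_trees(13):,}")  # 13,749,310,575
```

The count only indicates the scale of an exhaustive search; it says nothing about how any particular program prunes or scores that space.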
Reticulate Evolution
• Group leader: Randy Linder
• Software: (1) producing random networks, (2) simulating sequences down networks, (3) performance evaluation of methods, (4) inferring reticulate networks
• Current reconstruction methods are limited to one reticulation event
• Poster: Luay Nakhleh
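The slide lists simulating sequences down networks among the planned software but does not fix a model. Below is a minimal sketch under loudly stated assumptions: a hypothetical network with a single reticulation (hybrid node H with parents X and Y), a crude substitution step in which each site changes with probability p to a uniformly chosen different base, and a hybrid that inherits each site from X with probability gamma and from Y otherwise. The topology, the function names, and the parameters `p` and `gamma` are all illustrative. Richer models are common in practice, for example continuous-time substitution models, or letting whole genes rather than single sites follow one parent.

```python
import random

BASES = "ACGT"


def mutate(seq, p, rng):
    """Copy seq; each site changes, with probability p, to a different random base."""
    out = []
    for base in seq:
        if rng.random() < p:
            out.append(rng.choice([b for b in BASES if b != base]))
        else:
            out.append(base)
    return "".join(out)


def simulate_network(length=20, p=0.1, gamma=0.5, seed=0):
    """Simulate sequences down a fixed toy network with one reticulation.

    Hypothetical topology: R -> X, R -> Y; X -> A, X -> H; Y -> H, Y -> C; H -> B.
    Returns the leaf sequences A, B, C.
    """
    rng = random.Random(seed)  # seeded for reproducible simulations
    seqs = {"R": "".join(rng.choice(BASES) for _ in range(length))}
    seqs["X"] = mutate(seqs["R"], p, rng)
    seqs["Y"] = mutate(seqs["R"], p, rng)
    seqs["A"] = mutate(seqs["X"], p, rng)
    seqs["C"] = mutate(seqs["Y"], p, rng)
    # Hybrid node H: mutate along each incoming edge, then take each site
    # from the X side with probability gamma, otherwise from the Y side.
    from_x = mutate(seqs["X"], p, rng)
    from_y = mutate(seqs["Y"], p, rng)
    seqs["H"] = "".join(a if rng.random() < gamma else b for a, b in zip(from_x, from_y))
    seqs["B"] = mutate(seqs["H"], p, rng)
    return {leaf: seqs[leaf] for leaf in ("A", "B", "C")}


for name, seq in simulate_network(length=30, seed=1).items():
    print(name, seq)
```

Simulated leaf sequences like these, with the generating network known, are what a performance study of reconstruction methods would compare inferred networks against.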
MP/ML heuristics
• Disk-Covering Methods (DCMs): divide-and-conquer strategies that boost the performance of base methods for MP/ML (Warnow)
• MrBayes (Huelsenbeck)
• New I-DCM3 technique improves upon the Ratchet and TBR
• Poster: Usman Roshan (DCM-MP)
Gutell dataset: 854 rRNA sequences
Iterative-DCM3 trials find trees with MP score 103,210 in 30 hours, whereas ratchet500 trials take 45 hours to find trees of the same score.
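The MP score quoted above is the parsimony length of a tree: the minimum number of character changes needed to explain the aligned sequences on it. For a fixed tree this can be computed in linear time with Fitch's algorithm; the expensive part, which I-DCM3 and the ratchet address, is searching over trees. A minimal sketch, assuming a rooted binary tree written as nested tuples of leaf names (our own hypothetical representation) and treating every alignment symbol, including gaps, as an ordinary state:

```python
def fitch_site(tree, leaf_state):
    """Return (state_set, cost) for one site on a rooted binary tree.

    tree is a leaf name (str) or a pair (left, right);
    leaf_state maps leaf name -> character at this site.
    """
    if isinstance(tree, str):
        return {leaf_state[tree]}, 0
    left, right = tree
    s1, c1 = fitch_site(left, leaf_state)
    s2, c2 = fitch_site(right, leaf_state)
    common = s1 & s2
    if common:  # children can agree: no extra change needed
        return common, c1 + c2
    return s1 | s2, c1 + c2 + 1  # disagreement costs one change


def parsimony_score(tree, alignment):
    """Sum of Fitch costs over all sites; alignment maps leaf name -> sequence."""
    length = len(next(iter(alignment.values())))
    total = 0
    for i in range(length):
        column = {name: seq[i] for name, seq in alignment.items()}
        total += fitch_site(tree, column)[1]
    return total


aln = {"A": "AAG", "B": "AAC", "C": "GTC", "D": "GTT"}  # hypothetical toy alignment
print(parsimony_score((("A", "B"), ("C", "D")), aln))  # 4
print(parsimony_score((("A", "C"), ("B", "D")), aln))  # 6
```

The parsimony score of an unrooted tree does not depend on where it is rooted, so any rooting can be used for scoring; an MP search simply prefers the tree with the lower score, here ((A,B),(C,D)).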
Other planned projects (partial list)
• Multiple Sequence Alignment (Myers and Williams)
• Steiner Tree algorithms: error bounds and new heuristics (Rao)
• MCMC methods (Russell and Huelsenbeck)
• Symbolic representation of data (Hunt)
• Parallel algorithms (Bader and Williams)
Questions for group
• How should we measure performance?
• How should we use simulated data?
• How should we use real datasets?
• How can we study criteria (MP, ML, etc.) as opposed to methods?
• Should we sponsor DIMACS-style challenges?
• Others? (Please bring questions, comments, and answers to the break-out session.)