RNAsim/CRIMSON Algorithm Benchmark Suite. U Penn: Junhyong Kim, Sampath Kannan, Susan Davidson, Steve Fisher, Sheng Guo; U Texas: David Hillis, Lauren Meyers, Tracey Heath, Derrick Zwickl; NC State: Spencer Muse; Florida State: Mark Holder; Yale: Paul Turner.
Goal: Develop validated datasets of sufficient complexity and scale to realistically benchmark the latest tree-inference algorithms
Benchmark Infrastructure (diagram components) • Model Characterization • Simulators: Character Evolution Simulators; Tree Topology Simulators; Others (Tree/Char Combined, Experimental Evolution, Virtual Cell, etc.) • Model Sampling • Taxon Sampling Database • Data Subset with Associated Subtree • Format Translators • RNAsim, CRIMSON, PAUP*, etc.
Benchmark Scheme • Generate a very large dataset (>10⁶ positions) over a very large tree (>10⁶ taxa) using various models of evolution • Store the data in a database • Retrieve subsets of the data by various sampling schemes
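The store-then-retrieve part of this scheme can be illustrated with a minimal sketch. It assumes a hypothetical single-table schema (`leaf(name, seq)`) and uses Python's built-in `sqlite3` in memory as a stand-in for the shared database; `build_store` and `random_subset` are illustrative names, not part of CRIMSON.

```python
import random
import sqlite3

def build_store(sequences):
    """Load {leaf name: sequence} into an in-memory SQLite table (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE leaf (name TEXT PRIMARY KEY, seq TEXT NOT NULL)")
    conn.executemany("INSERT INTO leaf (name, seq) VALUES (?, ?)", sequences.items())
    conn.commit()
    return conn

def random_subset(conn, k, seed):
    """Retrieve k leaves chosen uniformly at random; a fixed seed makes the subset reproducible."""
    names = [row[0] for row in conn.execute("SELECT name FROM leaf ORDER BY name")]
    chosen = random.Random(seed).sample(names, k)
    placeholders = ",".join("?" * len(chosen))
    rows = conn.execute(f"SELECT name, seq FROM leaf WHERE name IN ({placeholders})", chosen)
    return dict(rows.fetchall())

if __name__ == "__main__":
    store = build_store({f"taxon{i}": "ACGU" * 5 for i in range(1000)})
    print(sorted(random_subset(store, k=10, seed=42)))
```

Drawing the names with a seeded generator, rather than `ORDER BY RANDOM()`, keeps each replicate sample reproducible. For very large subsets the `IN (...)` list would need to be split into chunks, since SQLite limits the number of bound parameters per statement.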
RNA macro-evolution simulation (Sheng Guo, Lisan Wang) • Incorporates secondary-structure constraints and indels, using a simulator based on edit mutations. A set of edit operators is implemented, such as a stem edit, each of which operates on the evolving string with a characteristic waiting time. The ancestral molecule is based on a known rRNA gene with a putative known secondary structure. Evolution of the secondary structure is tracked. • Edit operators (diagram, ancestor to descendant): delete stem pair, change base, initiate new stem, insert base, delete base, add stem pair
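Edit operators that each fire after a characteristic waiting time can be sketched as a continuous-time event simulation: with exponential waiting times, the time to the next edit is exponential in the summed rates, and the operator that fires is chosen in proportion to its rate. This sketch assumes per-molecule rates (the values are hypothetical) and implements only the three base-level operators; the stem operators and the secondary-structure constraints of RNAsim are not modeled.

```python
import random

BASES = "ACGU"

def change_base(seq, rng):
    """Substitute one position with a different base."""
    i = rng.randrange(len(seq))
    return seq[:i] + rng.choice([b for b in BASES if b != seq[i]]) + seq[i + 1:]

def insert_base(seq, rng):
    """Insert a random base at a random position (either end included)."""
    i = rng.randrange(len(seq) + 1)
    return seq[:i] + rng.choice(BASES) + seq[i:]

def delete_base(seq, rng):
    """Delete one position, never emptying the molecule."""
    if len(seq) <= 1:
        return seq
    i = rng.randrange(len(seq))
    return seq[:i] + seq[i + 1:]

# Hypothetical per-molecule rates (events per unit branch length).
RATES = {change_base: 1.0, insert_base: 0.05, delete_base: 0.05}

def evolve(seq, branch_length, rates, rng):
    """Apply edit operators along one branch; return the descendant and the event log."""
    ops = list(rates)
    weights = [rates[op] for op in ops]
    total = sum(weights)
    t, events = 0.0, []
    while True:
        t += rng.expovariate(total)  # waiting time until the next edit of any kind
        if t > branch_length:
            return seq, events
        op = rng.choices(ops, weights=weights)[0]  # which operator fires
        seq = op(seq, rng)
        events.append(op.__name__)

if __name__ == "__main__":
    desc, log = evolve("GGGAAACCCUUUGGGAAACCC", 5.0, RATES, random.Random(7))
    print(desc, log)
```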
Fixation probability as a function of fitness • Parameters: Ne: effective population size; μ: neutral mutation rate; s: fitness change • Cases: Neutral; Advantageous (s > 0) / Deleterious (s < 0); Compensatory Mutation
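One standard way to write fixation probability as a function of fitness is Kimura's diffusion approximation, u(s) = (1 − e^(−2s)) / (1 − e^(−4·Ne·s)), which tends to the neutral value 1/(2Ne) as s → 0. The sketch assumes a diploid population whose census size equals Ne, a new mutant starting at frequency 1/(2Ne), and this particular parameterization of s; the slide does not state which formula RNAsim uses. The substitution rate then multiplies the 2·Ne·μ new mutations per generation by u(s).

```python
import math

def fixation_probability(s, n_e):
    """Kimura's approximation u(s) = (1 - exp(-2s)) / (1 - exp(-4 Ne s)), assuming N = Ne."""
    if s == 0:
        return 1.0 / (2 * n_e)  # neutral limit: the initial frequency 1/(2 Ne)
    x = -4.0 * n_e * s
    if x > 700:
        # Strongly deleterious: exp(x) would overflow, so evaluate in log space.
        a = -2.0 * s  # positive here because s < 0
        log_numerator = a + math.log1p(-math.exp(-a))  # log(exp(a) - 1)
        return math.exp(log_numerator - x)  # denominator exp(x) - 1 is ~exp(x)
    # expm1 keeps precision when s is tiny
    return math.expm1(-2.0 * s) / math.expm1(x)

def substitution_rate(s, n_e, mu):
    """Expected fixations per generation: 2 Ne mu new mutants, each fixing with probability u(s)."""
    return 2 * n_e * mu * fixation_probability(s, n_e)

if __name__ == "__main__":
    for s in (-0.001, 0.0, 0.001):
        print(s, fixation_probability(s, 1000), substitution_rate(s, 1000, 1e-8))
```

In the neutral case the substitution rate reduces to μ, which is a quick sanity check on any implementation.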
Calibration on Empirical Data [figure panels: Simulated RNA; 100 Eukaryotic ssRNA]
Example: Pairwise Similarity of 1,000 Locally Optimal ML Trees (MDS plot) [figure panels: Empirical Data, RNAsim, ROSE, SeqGen]
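The slide does not say how similarity between trees was measured; one common choice is the Robinson-Foulds distance, the number of bipartitions found in one tree but not the other. The resulting matrix is the kind of input an MDS plot is built from. This sketch assumes trees given as nested tuples of leaf names and treats them as unrooted; `splits` and `rf_matrix` are hypothetical helpers.

```python
def splits(tree, taxa):
    """Non-trivial bipartitions of a tree given as nested tuples of leaf names.

    Each split is stored as the side that excludes a fixed reference taxon,
    so the same bipartition always has the same representation (the tree is
    treated as unrooted).
    """
    taxa = frozenset(taxa)
    ref = min(taxa)
    found = set()

    def walk(node):
        if isinstance(node, str):
            return frozenset([node])
        cluster = frozenset().union(*(walk(child) for child in node))
        side = taxa - cluster if ref in cluster else cluster
        if 1 < len(side) < len(taxa) - 1:  # skip trivial splits
            found.add(side)
        return cluster

    walk(tree)
    return found

def rf_matrix(trees, taxa):
    """Pairwise Robinson-Foulds distances: size of the symmetric difference of split sets."""
    split_sets = [splits(t, taxa) for t in trees]
    return [[len(a ^ b) for b in split_sets] for a in split_sets]

if __name__ == "__main__":
    t1 = ((("a", "b"), "c"), ("d", "e"))
    t2 = ((("a", "c"), "b"), ("d", "e"))
    print(rf_matrix([t1, t2], "abcde"))  # [[0, 2], [2, 0]]
```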
1 Million Leaves (Tracey Heath; birth-death model with variable rates) • 20 replicate data partitions simulated and stored at SDSC
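The slide names a birth-death model with variable rates but gives no parameters. The sketch below shows only the simplest constant-rate case and tracks the number of lineages rather than the full tree: each of the n lineages speciates at rate λ and goes extinct at rate μ, so the waiting time to the next event is exponential with rate n(λ + μ). Rate variation and tree construction are left out, and the function name is hypothetical.

```python
import random

def simulate_birth_death(birth_rate, death_rate, target_leaves, rng):
    """Run a constant-rate birth-death process from one lineage.

    Stops when the process reaches `target_leaves` lineages or goes extinct;
    returns (lineage count, elapsed time).
    """
    n, t = 1, 0.0
    total = birth_rate + death_rate
    while 0 < n < target_leaves:
        t += rng.expovariate(n * total)  # next event among n lineages
        if rng.random() < birth_rate / total:
            n += 1  # speciation
        else:
            n -= 1  # extinction
    return n, t

if __name__ == "__main__":
    rng = random.Random(2024)
    survived = sum(simulate_birth_death(1.0, 0.5, 1000, rng)[0] > 0 for _ in range(200))
    print(f"{survived} of 200 runs reached 1000 lineages")
```

With a nonzero death rate many runs die out early; rerunning until the target size is reached (conditioning on survival) is one common way to obtain trees with a fixed number of leaves.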
CRIMSON (Stephen Fisher, Susan Davidson, Junhyong Kim) • Facilitates the extraction of subtrees from very large phylogenetic trees • Trees are loaded into a shared database (Oracle or MySQL) • Extensive tree-sampling options • Saves query output to NEXUS or PHYLIP files • Can include PAUP* commands in query output files • Comprehensive graphical dialogs • Command-line interface allowing Python-like scripting • Displays trees with the Walrus 3D viewer
Query Options
• Species Selection
  • Select All
  • Random Selection
  • Select By Temporal Depth
    • Same number of samples per subtree
    • Weight sampling of subtrees by number of leaves
  • Select By Species Level
    • Same number of samples per subtree
    • Weight sampling of subtrees by number of leaves
  • Manual Selection
• Sequence Selection
  • Select All
  • Random Selection
  • Manual Selection
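The temporal-depth options can be illustrated with a small sketch: cutting the tree at a depth threshold splits it into the subtrees hanging below that depth, and leaves are then drawn either with the same number per subtree or with each subtree's share proportional to its number of leaves. The `Node` structure, the function names, and the largest-remainder rounding are assumptions for illustration, not CRIMSON's actual implementation.

```python
from __future__ import annotations

import random
from dataclasses import dataclass, field

@dataclass
class Node:
    name: str
    depth: float  # distance from the root
    children: list[Node] = field(default_factory=list)

def leaves(node):
    """All leaf names below (and including) this node."""
    if not node.children:
        return [node.name]
    return [name for child in node.children for name in leaves(child)]

def subtrees_at_depth(node, threshold):
    """Leaf sets of the subtrees rooted at the first nodes at or below `threshold`."""
    if node.depth >= threshold or not node.children:
        return [leaves(node)]
    return [g for child in node.children for g in subtrees_at_depth(child, threshold)]

def sample_equal(groups, per_group, rng):
    """Same number of samples per subtree (all leaves if a subtree is smaller)."""
    return [name for g in groups for name in rng.sample(g, min(per_group, len(g)))]

def sample_weighted(groups, total, rng):
    """Allocate `total` samples to subtrees in proportion to their leaf counts."""
    n_leaves = sum(len(g) for g in groups)
    shares = [total * len(g) / n_leaves for g in groups]
    quotas = [int(s) for s in shares]
    # Give the leftover samples to the subtrees with the largest fractional shares.
    order = sorted(range(len(groups)), key=lambda i: shares[i] - quotas[i], reverse=True)
    for i in order[: total - sum(quotas)]:
        quotas[i] += 1
    return [name for g, q in zip(groups, quotas) for name in rng.sample(g, min(q, len(g)))]

if __name__ == "__main__":
    tree = Node("root", 0.0, [
        Node("A", 1.0, [Node(f"a{i}", 3.0) for i in range(4)]),
        Node("B", 2.0, [Node("b0", 3.0), Node("b1", 3.0)]),
    ])
    groups = subtrees_at_depth(tree, 1.0)
    rng = random.Random(0)
    print(sample_equal(groups, 1, rng), sample_weighted(groups, 3, rng))
```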
Depth Threshold Distribution [figure: depth levels L-1 through L-8]
Current Benchmarking Effort • Sample #1: 10 leaves per sampled tree; taxon sampling repeated 40 times per replicate data partition • Sample #2: 100 leaves per sampled tree; taxon sampling repeated 30 times per replicate data partition • Sample #3: 1,000 leaves per sampled tree; taxon sampling repeated 20 times per replicate data partition • Sample #4: 10,000 leaves per sampled tree; taxon sampling repeated 10 times per replicate data partition
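This sampling plan implies a fixed amount of work per replicate data partition. The tally below assumes that the replicate data partitions are the 20 stored at SDSC for the million-leaf tree; the constant names are illustrative.

```python
# (leaves per sampled tree, repetitions per replicate data partition)
SAMPLING_PLAN = [(10, 40), (100, 30), (1_000, 20), (10_000, 10)]
REPLICATE_PARTITIONS = 20  # assumption: the 20 partitions stored at SDSC

trees_per_partition = sum(reps for _, reps in SAMPLING_PLAN)
leaves_per_partition = sum(size * reps for size, reps in SAMPLING_PLAN)

print(trees_per_partition, leaves_per_partition)  # 100 123400
print(trees_per_partition * REPLICATE_PARTITIONS,
      leaves_per_partition * REPLICATE_PARTITIONS)  # 2000 2468000
```

Most of the leaf total comes from Sample #4 (100,000 of 123,400 leaves per partition), even though it has the fewest repetitions.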
Algorithms (to be expanded) • Neighbor Joining (PAUP*): breakties=random • Parsimony (PAUP*): set maxtrees=200 increase=no; hsearch timelimit=432000; contree all /strict=no majrule=yes • RAxML (raxmlHPC): -f a -# 100 -m GTRGAMMA
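Because CRIMSON can write PAUP* commands into its query output files, these settings can be assembled programmatically. In this sketch the helper names, the alignment file name, the run name, and the seed are hypothetical. The `-s`, `-n`, `-p`, and `-x` options are added because a RAxML run needs an input alignment, a run name, and random seeds (`-x` for the rapid bootstrap that `-f a` performs); the slide does not give their values.

```python
import shlex

def paup_block(commands):
    """Wrap PAUP* commands in a NEXUS `begin paup; ... end;` block."""
    body = "\n".join(f"  {cmd};" for cmd in commands)
    return f"begin paup;\n{body}\nend;\n"

NJ_COMMANDS = ["nj breakties=random"]
PARSIMONY_COMMANDS = [
    "set maxtrees=200 increase=no",
    "hsearch timelimit=432000",  # 432000 s = 5 days, if the limit is in seconds
    "contree all /strict=no majrule=yes",  # majority-rule consensus of all trees in memory
]

def raxml_argv(alignment, run_name, seed):
    """Argument list for the RAxML run on the slide (-f a -# 100 -m GTRGAMMA)."""
    return [
        "raxmlHPC", "-f", "a", "-#", "100", "-m", "GTRGAMMA",
        "-s", alignment, "-n", run_name,   # input alignment and run name (hypothetical values)
        "-p", str(seed), "-x", str(seed),  # parsimony and rapid-bootstrap seeds (hypothetical)
    ]

if __name__ == "__main__":
    print(paup_block(PARSIMONY_COMMANDS))
    print(shlex.join(raxml_argv("sample_100taxa.phy", "sample1", 12345)))
```

Passing the argument list directly to `subprocess.run(..., check=True)` rather than through a shell avoids quoting issues with `-#`.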
Computational Difficulty of Dataset Versus Accuracy [figure; time axis labels: sec, hr]
RAxML Computation Time (Heuristic Search) over 30 Replicates of Random 100-Taxon Trees
Thanks to: Susan Davidson, Steve Fisher, Sheng Guo, David Hillis, Tracey Heath, Lisan Wang, Yifeng Zhang, Derrick Zwickl • Please ask and talk to: Steve Fisher, Sheng Guo, Lisan Wang • Please see the CRIMSON demo by Steve Fisher