Compiling R for Performance in Bioinformatics Applications

John Garvin, John Mellor-Crummey, Bradley Broom, Ken Kennedy
{garvin,johnmc,broom,ken}@cs.rice.edu

The R Language
• For statistical computations
• Widely used in bioinformatics
• Variants: S, S-PLUS
• Open source (GPL)
• Similar to Matlab, Mathematica, Octave, Ellpack
• Interpreted, high-level

R Advantages
• Intuitive programming
• Quick turnaround time
• Convenient domain-specific libraries
• A few lines of R code can replace a page of C code
• No CS degree required!

R Disadvantage
• Poor performance
  • Big reason: interpreted, not compiled
  • No whole program to optimize
• Researchers must painstakingly rewrite in C or Fortran for performance

Goal
• Turn the R interpreter into an optimizing R compiler
• Implement full language features while achieving good performance

Problem
• R code from M.D. Anderson Cancer Center
• Experiment design problem
  • 1000-patient trial
  • Discover when results are meaningful
• In the R interpreter: a matter of minutes
• Hand-coded in C: a matter of seconds

Approach
• First: translate R to C
• Next: compiler optimizations on the C code

Part 1: R to C Compiler
• Related to multi-staging (Taha) and partial evaluation
• Generate C code that performs the same actions as the R interpreter
  • Reverts to interpreting the parse tree when necessary
• Goal: implement the full R language
• Integrate with the interpreter infrastructure
  • Interpreted R code can call compiled code and vice versa
• With generated C code, optimizations become possible

Part 2: Optimization
• Telescoping Languages (Kennedy)
  • Specialization
  • Domain-specific libraries
• Open64 infrastructure
  • Useful for specifying transformations
• Advanced profiling (Froyd)

Status (as of 10/9/2003)
• Plain compilation into C: done
  • Same speed as the interpreter
• Next: optimizations
  • Preliminary tests show potential

Future Optimizations
• Improve allocation
  • LISP-like lists
  • Reduce vector allocation
• Type specialization (McCosh)
  • Matrix size and shape analysis
• Slice hoisting (Chauhan)
  • Combine allocations
• Control flow
  • Interpreter: returns are jumps
  • More detailed control flow
• Variable definition and lookup
  • Explicit environments
  • Lookup is especially expensive
  • Use the target language

Conclusion
• An optimizing compiler for R is possible
• Compiled R can enable large productivity gains

Acknowledgements
• Special thanks to Arun Chauhan, Nathan Froyd, Cheryl McCosh, the people at M.D. Anderson, and Walid Taha
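
The "Part 1" slide describes generating C code that performs the same actions as the R interpreter and falling back to the parse tree when necessary. The toy C sketch below illustrates that contrast under stated assumptions: it is not the authors' compiler and does not use R's real internals, and the node kinds (N_NUM, N_VAR_X, N_ADD, N_MUL) and the functions eval_node, compiled_expr and compiled_with_fallback are hypothetical. eval_node walks a tree and dispatches on every node, as an interpreter would; compiled_expr is what a translator might emit for the fixed expression x * 3 + 1; compiled_with_fallback emits the part it understood as C and hands an untranslated subtree back to the interpreter.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy parse tree; R's own interpreter uses different structures. */
typedef enum { N_NUM, N_VAR_X, N_ADD, N_MUL } Kind;

typedef struct Node {
    Kind kind;
    double value;            /* constant value, used when kind == N_NUM */
    const struct Node *lhs;  /* operands, used by N_ADD and N_MUL */
    const struct Node *rhs;
} Node;

/* Interpreter: re-walks the tree and switches on the node kind on every call. */
double eval_node(const Node *n, double x) {
    assert(n != NULL);
    switch (n->kind) {
    case N_NUM:   return n->value;
    case N_VAR_X: return x;
    case N_ADD:   return eval_node(n->lhs, x) + eval_node(n->rhs, x);
    case N_MUL:   return eval_node(n->lhs, x) * eval_node(n->rhs, x);
    }
    assert(0 && "unknown node kind");
    return 0.0;
}

/* What a translator might emit for the fixed expression  x * 3 + 1:
   the same arithmetic, with no tree walk and no dispatch. */
double compiled_expr(double x) {
    return x * 3.0 + 1.0;
}

/* Partial translation: the understood part (x * 3) is emitted as C, and an
   untranslated subtree is handed back to the interpreter at run time. */
double compiled_with_fallback(const Node *untranslated, double x) {
    return x * 3.0 + eval_node(untranslated, x);
}
```

For a tree representing x * 3 + 1, eval_node(&tree, 2.0) and compiled_expr(2.0) both return 7.0. Since the slides report that plain compilation ran at the same speed as the interpreter, removing dispatch as compiled_expr does should be read as the kind of specialization the planned optimizations target, not as a measured result.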

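The "Improve allocation" bullets (reduce vector allocation, combine allocations) can be illustrated with a vectorized expression such as y <- a * x + b. In a naive operator-by-operator evaluation, a * x produces a temporary vector that is read once and then discarded. The C sketch below, assuming double vectors of equal, nonzero length n, contrasts that with a fused loop that allocates only the result. It is a generic loop-fusion example with hypothetical function names (naive_axpb, fused_axpb), not the authors' slice-hoisting transformation.

```c
#include <assert.h>
#include <stdlib.h>

/* Naive, operator-by-operator evaluation of  y <- a * x + b :
   a * x is materialized in a temporary vector, read once, then freed. */
double *naive_axpb(const double *a, const double *x, const double *b, size_t n) {
    assert(n > 0);                            /* sketch assumes non-empty vectors */
    double *tmp = malloc(n * sizeof *tmp);    /* temporary for a * x */
    double *y = malloc(n * sizeof *y);        /* result of tmp + b */
    if (tmp == NULL || y == NULL) {
        free(tmp);
        free(y);
        return NULL;
    }
    for (size_t i = 0; i < n; i++)
        tmp[i] = a[i] * x[i];
    for (size_t i = 0; i < n; i++)
        y[i] = tmp[i] + b[i];
    free(tmp);
    return y;                                 /* caller frees */
}

/* Fused evaluation: one loop and one allocation; the temporary never exists. */
double *fused_axpb(const double *a, const double *x, const double *b, size_t n) {
    assert(n > 0);
    double *y = malloc(n * sizeof *y);
    if (y == NULL)
        return NULL;
    for (size_t i = 0; i < n; i++)
        y[i] = a[i] * x[i] + b[i];
    return y;                                 /* caller frees */
}
```

With a = {1, 2, 3}, x = {4, 5, 6} and b = {1, 1, 1}, both functions return {5, 11, 19}; the fused version makes one allocation instead of two and one pass over the data instead of two.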
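
The "Variable definition and lookup" bullets note that the interpreter keeps explicit environments and that lookup is especially expensive. The C sketch below models that with hypothetical Binding and Env structs (R's actual environment representation differs): env_lookup searches each frame of the chain by string comparison, while compiled_add shows the alternative of mapping R variables onto ordinary C variables in the target language. That mapping is only valid when the compiler can show that each name always resolves to the same binding; all function names here are illustrative.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of explicit environments: each frame holds a list of
   name/value bindings and points to its enclosing frame. */
typedef struct Binding {
    const char *name;
    double value;
    const struct Binding *next;
} Binding;

typedef struct Env {
    const Binding *bindings;
    const struct Env *parent;   /* enclosing environment, NULL at the top */
} Env;

/* Interpreter-style lookup: compares strings frame by frame until the name
   is found. Returns NULL if the name is unbound. */
const double *env_lookup(const Env *env, const char *name) {
    assert(name != NULL);
    for (; env != NULL; env = env->parent)
        for (const Binding *b = env->bindings; b != NULL; b = b->next)
            if (strcmp(b->name, name) == 0)
                return &b->value;
    return NULL;
}

/* Interpreted  n + total : two chain searches on every evaluation. */
double interpreted_add(const Env *env) {
    const double *n = env_lookup(env, "n");
    const double *total = env_lookup(env, "total");
    assert(n != NULL && total != NULL);
    return *n + *total;
}

/* Compiled form: the names have been resolved ahead of time to plain C
   variables, which the C compiler can keep in registers. */
double compiled_add(double n, double total) {
    return n + total;
}
```

With n bound to 5 in an inner frame and total bound to 10 in its parent, interpreted_add(&inner) and compiled_add(5.0, 10.0) both return 15.0, but only the first searches the environment chain with strcmp on every call.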