This presentation describes work toward an optimizing R compiler aimed at improving performance for bioinformatics applications. R is popular for statistical analysis because it is easy to program and has rich domain-specific libraries, but it performs poorly because it is interpreted. The approach translates R code into C, applies compiler optimizations to the generated C, and aims to support the full language. The motivating case is an experiment-design computation for a 1000-patient trial from M.D. Anderson Cancer Center, which takes minutes in the R interpreter but seconds when hand-coded in C. As of 10/9/2003, plain compilation to C ran at the same speed as the interpreter, and preliminary tests suggested that optimization could let researchers get fast execution while keeping R's ease of use.
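The idea of translating R into C code that performs the same actions as the interpreter can be pictured with a minimal sketch. This is not the authors' compiler: `Value`, `Binding`, `Node`, `rt_lookup`, `rt_add`, `interp_eval` and `compiled_x_plus_y` are hypothetical stand-ins, and R's real runtime (SEXP objects, recycling, garbage collection) is far richer. The generated function skips the parse-tree walk but still calls the same generic runtime routines, which is one plausible reason why unoptimized compiled code would run at roughly interpreter speed.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified runtime: every value is a boxed double
   vector and an environment is a linked list of name/value bindings. */
typedef struct {
    size_t len;
    double *data;
} Value;

typedef struct Binding {
    const char *name;
    Value *value;
    struct Binding *next;
} Binding;

/* Runtime routine shared by interpreter and generated code:
   find a variable by walking the binding list. */
static Value *rt_lookup(Binding *env, const char *name) {
    for (; env != NULL; env = env->next)
        if (strcmp(env->name, name) == 0)
            return env->value;
    return NULL; /* R would signal "object not found" here */
}

/* Runtime routine: elementwise `+`. Operands must have equal length
   (R's recycling rule is omitted). Results are never freed, since
   this sketch has no garbage collector. */
static Value *rt_add(const Value *a, const Value *b) {
    assert(a != NULL && b != NULL && a->len == b->len);
    Value *r = malloc(sizeof *r);
    assert(r != NULL);
    r->len = a->len;
    r->data = malloc((a->len ? a->len : 1) * sizeof *r->data);
    assert(r->data != NULL);
    for (size_t i = 0; i < a->len; i++)
        r->data[i] = a->data[i] + b->data[i];
    return r;
}

/* A two-kind parse tree: symbols and binary `+`. */
typedef enum { NODE_SYM, NODE_ADD } Kind;

typedef struct Node {
    Kind kind;
    const char *name;        /* used when kind == NODE_SYM */
    const struct Node *lhs;  /* used when kind == NODE_ADD */
    const struct Node *rhs;
} Node;

/* Interpreter: dispatches on the parse tree every time it runs. */
static Value *interp_eval(const Node *n, Binding *env) {
    switch (n->kind) {
    case NODE_SYM:
        return rt_lookup(env, n->name);
    case NODE_ADD:
        return rt_add(interp_eval(n->lhs, env), interp_eval(n->rhs, env));
    }
    return NULL;
}

/* What a translator might emit for the R expression `x + y`:
   no tree walk, but the same lookups and the same generic `+`. */
static Value *compiled_x_plus_y(Binding *env) {
    Value *x = rt_lookup(env, "x");
    Value *y = rt_lookup(env, "y");
    return rt_add(x, y);
}
```

Because both paths share `rt_lookup` and `rt_add`, removing the tree walk alone saves little in this sketch; larger gains depend on optimizations that replace those generic calls.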
Compiling R for Performance in Bioinformatics Applications
John Garvin, John Mellor-Crummey, Bradley Broom, Ken Kennedy
{garvin,johnmc,broom,ken}@cs.rice.edu

The R Language
• For statistical computations
• Widely used in bioinformatics
• Variants: S, S-PLUS
• Open source (GPL)
• Similar to Matlab, Mathematica, Octave, Ellpack
• Interpreted, high-level

R Advantages
• Intuitive programming
• Quick turnaround time
• Convenient domain-specific libraries
• A few lines of R code can replace a page of C code
• No CS degree required!

R Disadvantage
• Poor performance
  • Main reason: R is interpreted, not compiled
  • No whole program to optimize
• Researchers must painstakingly rewrite code in C or Fortran for performance

Goal
• Turn the R interpreter into an optimizing R compiler
• Implement full language features while achieving good performance

Problem
• R code from M.D. Anderson Cancer Center
• Experiment design problem
  • 1000-patient trial
  • Discover when results are meaningful
• In the R interpreter: a matter of minutes
• Hand-coded in C: a matter of seconds

Approach
• First: translate R to C
• Next: apply compiler optimizations to the C code

Part 1: R to C Compiler
• Related to multi-staging (Taha) and partial evaluation
• Generate C code that performs the same actions as the R interpreter
• Fall back to interpreting the parse tree when necessary
• Goal: implement the full R language
• Integrate with the interpreter infrastructure
  • Interpreted R code can call compiled code and vice versa
• With C code, optimizations become possible

Part 2: Optimization
• Telescoping Languages (Kennedy)
  • Specialization
  • Domain-specific libraries
• Open64 infrastructure
  • Useful for specifying transformations
• Advanced profiling (Froyd)

Status (as of 10/9/2003)
• Plain compilation into C: done
  • Same speed as the interpreter
• Next: optimizations
  • Preliminary tests show potential

Future Optimizations
• Improve allocation
  • LISP-like lists
  • Reduce vector allocation
• Type specialization (McCosh)
  • Matrix size and shape analysis
• Slice hoisting (Chauhan)
  • Combine allocations
• Control flow
  • Interpreter: returns are implemented as jumps
  • More detailed control flow
• Variable definition and lookup
  • Explicit environments
  • Lookup is especially expensive
  • Use target language

Conclusion
• An optimizing compiler for R is possible
• Compiled R can enable large productivity gains

Acknowledgements
• Special thanks to Arun Chauhan, Nathan Froyd, Cheryl McCosh, the people at M.D. Anderson, and Walid Taha
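The "Variable definition and lookup" items can be illustrated with a summation loop. This sketch assumes a hypothetical explicit environment represented as a linked list of scalar bindings (R's real environments hold SEXPs, optionally hashed). A literal translation routes every read and write of `s` and `i` through a lookup; if analysis can show the variables are never reached through the environment (no `eval`, `get`, `assign`, or closures capturing them), they can become plain C locals. `Env`, `env_get`, `env_set`, `sum_with_env` and `sum_with_locals` are hypothetical names.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical explicit environment: a linked list of scalar bindings. */
typedef struct Binding {
    const char *name;
    double value;
    struct Binding *next;
} Binding;

typedef struct {
    Binding *head;
} Env;

static Binding *env_find(Env *env, const char *name) {
    for (Binding *b = env->head; b != NULL; b = b->next)
        if (strcmp(b->name, name) == 0)
            return b;
    return NULL;
}

static double env_get(Env *env, const char *name) {
    Binding *b = env_find(env, name);
    assert(b != NULL); /* R would signal "object not found" */
    return b->value;
}

/* Define a new binding or overwrite an existing one, like R's `<-`. */
static void env_set(Env *env, const char *name, double value) {
    Binding *b = env_find(env, name);
    if (b == NULL) {
        b = malloc(sizeof *b);
        assert(b != NULL);
        b->name = name;
        b->next = env->head;
        env->head = b;
    }
    b->value = value;
}

static void env_free(Env *env) {
    while (env->head != NULL) {
        Binding *next = env->head->next;
        free(env->head);
        env->head = next;
    }
}

/* Literal translation of
     s <- 0; for (i in seq_along(v)) s <- s + v[i]
   where every read and write of s and i goes through the environment. */
static double sum_with_env(const double *v, size_t n) {
    Env env = {NULL};
    env_set(&env, "s", 0.0);
    for (size_t k = 1; k <= n; k++) {
        env_set(&env, "i", (double)k); /* R indices are 1-based */
        size_t i = (size_t)env_get(&env, "i");
        env_set(&env, "s", env_get(&env, "s") + v[i - 1]);
    }
    double s = env_get(&env, "s");
    env_free(&env);
    return s;
}

/* Same loop once s and i are known to be local scalars: they become
   C variables and no lookup happens at run time. */
static double sum_with_locals(const double *v, size_t n) {
    double s = 0.0;
    for (size_t i = 1; i <= n; i++)
        s += v[i - 1];
    return s;
}
```

The second version is only legal when the compiler can prove nothing else observes the environment; where it cannot, the slides' fallback of interpreting the parse tree still applies.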
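For "Reduce vector allocation", one well-known technique is loop fusion to eliminate temporaries; it is used here purely as an illustration and may not be the transformation the authors planned. Evaluating `z <- a * b + c` one operator at a time allocates a temporary for `a * b` and another vector for the result, while a compiler that knows all three operands are double vectors of the same length (the kind of fact type and shape analysis could supply) can emit one loop with one allocation. `alloc_vec`, `n_allocs` and the `eval_*` functions are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* Counts vector allocations so the two strategies can be compared. */
static int n_allocs = 0;

static double *alloc_vec(size_t n) {
    double *p = malloc((n ? n : 1) * sizeof *p);
    assert(p != NULL);
    n_allocs++;
    return p;
}

/* Interpreter-style operators: each returns a freshly allocated vector. */
static double *vec_mul(const double *a, const double *b, size_t n) {
    double *r = alloc_vec(n);
    for (size_t i = 0; i < n; i++)
        r[i] = a[i] * b[i];
    return r;
}

static double *vec_add(const double *a, const double *b, size_t n) {
    double *r = alloc_vec(n);
    for (size_t i = 0; i < n; i++)
        r[i] = a[i] + b[i];
    return r;
}

/* z <- a * b + c evaluated one operator at a time: a temporary
   holds a * b, so two vectors are allocated. */
static double *eval_unfused(const double *a, const double *b,
                            const double *c, size_t n) {
    double *tmp = vec_mul(a, b, n);
    double *z = vec_add(tmp, c, n);
    free(tmp);
    return z;
}

/* Fused version, valid only if a, b and c are known to be double
   vectors of length n: one loop, one allocation, no temporary. */
static double *eval_fused(const double *a, const double *b,
                          const double *c, size_t n) {
    double *z = alloc_vec(n);
    for (size_t i = 0; i < n; i++)
        z[i] = a[i] * b[i] + c[i];
    return z;
}
```

Both functions compute the same values; the fused one also makes a single pass over the data instead of two.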
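As we understand it, slice hoisting (Chauhan) addresses arrays that grow inside loops by computing their final size ahead of the loop so they can be allocated once. The sketch below is a simplified illustration under that assumption, not the published algorithm. In R, `v[i] <- x` with `i > length(v)` extends `v`; a naive runtime reallocates and copies on every extension, whereas the hoisted version allocates once. `Vec`, `vec_assign_grow`, `squares_grow` and `squares_hoisted` are hypothetical, and new slots are filled with 0.0 where R would use NA.

```c
#include <assert.h>
#include <stdlib.h>

/* Counts allocations so the two strategies can be compared. */
static int n_allocs = 0;

typedef struct {
    size_t len;
    double *data;
} Vec;

/* Naive v[i] <- x (1-based i): extend by copying into a new buffer
   whenever i exceeds the current length. */
static void vec_assign_grow(Vec *v, size_t i, double x) {
    assert(i >= 1);
    if (i > v->len) {
        double *p = malloc(i * sizeof *p);
        assert(p != NULL);
        n_allocs++;
        for (size_t k = 0; k < v->len; k++)
            p[k] = v->data[k];
        for (size_t k = v->len; k < i; k++)
            p[k] = 0.0; /* R would fill with NA */
        free(v->data);
        v->data = p;
        v->len = i;
    }
    v->data[i - 1] = x;
}

/* R: v <- numeric(0); for (i in seq_len(n)) v[i] <- i^2 */
static Vec squares_grow(size_t n) {
    Vec v = {0, NULL};
    for (size_t i = 1; i <= n; i++)
        vec_assign_grow(&v, i, (double)(i * i));
    return v;
}

/* Same loop after the final size n has been computed before it:
   a single allocation, and no bounds growth inside the loop. */
static Vec squares_hoisted(size_t n) {
    Vec v = {n, NULL};
    v.data = malloc((n ? n : 1) * sizeof *v.data);
    assert(v.data != NULL);
    n_allocs++;
    for (size_t i = 1; i <= n; i++)
        v.data[i - 1] = (double)(i * i);
    return v;
}
```

For `n` iterations the growing version performs `n` allocations and copies O(n²) elements in total, while the hoisted version performs one allocation.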