This technical guide by Vipin Sachdeva from IBM's Computational Science Division presents methods for improving R's performance through hardware optimization, compiler selection, and external BLAS libraries. It details the benefits of using different Intel and GNU compilers, showing how compiler flags and optimized libraries like Intel MKL and GotoBLAS2 impact execution speed. The interplay between compiler settings and external dependencies is emphasized, focusing on performance benchmarks and configuration steps necessary for optimal R setup.
Compiling and Using the “best” R
Vipin Sachdeva
IBM Computational Science Division
Improving R performance
• Performance improvements:
  • Hardware (number of cores etc.)
    • Intel Q6600 quad-core @ 2.4 GHz
  • Compilers
    • Intel versus GNU
    • Compiler flags (unoptimized versus optimized)
  • Libraries (BLAS)
    • netlib BLAS, GotoBLAS2, Intel MKL, Intel MKL-SMP
Benchmark for R
• R-benchmark-25.R
  • http://r.research.att.com/benchmarks/R-benchmark-25.R
• Measures timings for
  • B = A' * A
  • C = A / B'
  • Eigenvalues, Determinant, Cholesky, Inverse (BLAS)
• Needs the SuppDists package
• ./Rscript --vanilla R-benchmark-25.R
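A minimal sketch of how the run command above might be wrapped so that each R build leaves a log of its timings behind. The helper name `run_benchmark` and the log file name are illustrative and not part of the original slides; the sketch assumes the benchmark script has already been downloaded and SuppDists is installed.

```shell
#!/bin/sh
# run_benchmark RSCRIPT SCRIPT LOG
# Runs the benchmark with a given Rscript binary and keeps a copy of the output.
# (run_benchmark is a hypothetical helper name, not part of R.)
run_benchmark() {
    rscript=$1   # path to the Rscript of the build under test
    script=$2    # path to R-benchmark-25.R
    log=$3       # file that receives a copy of the timings
    "$rscript" --vanilla "$script" | tee "$log"
}

# Example (not executed here):
#   run_benchmark ./Rscript R-benchmark-25.R timings-unoptimized.log
```

Keeping one log per build makes it easier to compare the unoptimized, optimized and BLAS-linked builds later.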
Base R
• ./configure --prefix=/home/vsachde/R-install

  Source directory:          .
  Installation directory:    /home/vsachde/R-project/all-R/GNU-R/R-native-unoptimized

  C compiler:                gcc -std=gnu99 -g -O2
  Fortran 77 compiler:       gfortran -g -O
  C++ compiler:              g++ -g -O2
  Fortran 90/95 compiler:    gfortran -g -O
  Obj-C compiler:

  Interfaces supported:      X11, tcltk
  External libraries:        readline
  Additional capabilities:   PNG, JPEG, TIFF, NLS, cairo
  Options enabled:           static R library, shared BLAS, R profiling, Java

  Recommended packages:      yes

(Callouts on the output: compiler flags; GNU compilers; external libraries being used)
Somewhat Optimized R
• export optim_flags="-O3 -funroll-loops -ffast-math -march=core2"
• CC="gcc" CFLAGS="$optim_flags" CXX="g++" CXXFLAGS="$optim_flags" F77="gfortran" FFLAGS="$optim_flags" FC="gfortran" FCFLAGS="$optim_flags" ./configure --prefix=$installdir

  C compiler:                gcc -std=gnu99 -O3 -funroll-loops -ffast-math -march=core2
  Fortran 77 compiler:       gfortran -O3 -funroll-loops -ffast-math -march=core2
  C++ compiler:              g++ -O3 -funroll-loops -ffast-math -march=core2
  Fortran 90/95 compiler:    gfortran -O3 -funroll-loops -ffast-math -march=core2

• Compilers can be changed through the variables CC, CXX, F77
• CC=icc CXX=icpc F77=ifort will use the Intel compilers.
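Switching between compiler families, as described above, can be sketched with a small helper that prints the variable assignments for one family. The name `compiler_vars` is hypothetical, and `FC=ifort` for the Intel case is an assumption: the slides name only CC, CXX and F77 for Intel.

```shell
#!/bin/sh
# compiler_vars FAMILY
# Prints the compiler assignments for one compiler family.
# (compiler_vars is a hypothetical helper; FC=ifort is an assumption,
#  the slides only name CC, CXX and F77 for the Intel case.)
compiler_vars() {
    case $1 in
        gnu)   echo "CC=gcc CXX=g++ F77=gfortran FC=gfortran" ;;
        intel) echo "CC=icc CXX=icpc F77=ifort FC=ifort" ;;
        *)     echo "unknown compiler family: $1" >&2; return 1 ;;
    esac
}

# Example (not executed here): configure with the Intel compilers.
#   env $(compiler_vars intel) ./configure --prefix="$installdir"
```

After a build, `R CMD config CC` and `R CMD config CFLAGS` report the compiler and flags that R was actually configured with.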
Linking external BLAS with R
• R uses unoptimized routines for linear algebra if it is not linked with an external BLAS.
• ./configure --with-blas=<location of BLAS lib>
  (configure tries to link the BLAS library)
• Various sources of BLAS
  • Netlib BLAS: generic and unoptimized
  • GotoBLAS2: optimized and multi-threaded
  • Intel MKL: optimized library from Intel (sequential)
  • Intel MKL-SMP: multi-threaded
  • Many others, including ACML and ATLAS.
• Kernel performance varies with the BLAS library used.
Linking external BLAS with R
• If everything goes well:

  Source directory:          .
  Installation directory:    /home/vsachde/R-project/all-R/GNU-R/R-netlib-blas

  C compiler:                gcc -std=gnu99 -O3 -funroll-loops -ffast-math -march=core2
  Fortran 77 compiler:       gfortran -O3 -funroll-loops -ffast-math -march=core2
  C++ compiler:              g++ -O3 -funroll-loops -ffast-math -march=core2
  Fortran 90/95 compiler:    gfortran -O3 -funroll-loops -ffast-math -march=core2
  Obj-C compiler:

  Interfaces supported:      X11, tcltk
  External libraries:        readline, BLAS(generic)
  Additional capabilities:   PNG, JPEG, TIFF, NLS, cairo
  Options enabled:           static R library, R profiling, Java

  Recommended packages:      yes

(Callout on the output: BLAS(generic) under External libraries shows that BLAS was linked in properly)
Linking external BLAS with R
• What does --with-blas do?
  • configure links a small test program (conftest) against the library and checks that dgemm_ resolves.

  configure:28567: checking for dgemm_ in /home/vsachde/R-project/all-blas/GNU-blas/netlib-blas/libblas_GNU.a
  configure:28588: gcc -std=gnu99 -o conftest -g -O2 -I/usr/local/include -L/usr/local/lib64 conftest.c /home/vsachde/R-project/all-blas/GNU-blas/netlib-blas/libblas_GNU.a -lgfortran -lm -ldl -lm >&5
  configure:28595: result: yes

• If the above linking step fails
  • The installation won't fail, but BLAS will not be linked in.
  • The summary at the end won't show external BLAS linking.
  • Search for dgemm in config.log and look for errors.
• Advice: compile static libraries, as they are easier to link.
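The advice to search config.log for dgemm can be sketched as a small helper that prints the result of the link test. The name `blas_check_result` is hypothetical, and the sketch assumes the log has the `checking for dgemm_ ... result: yes` shape shown above; if configure ran several dgemm_ checks, only the first result is reported.

```shell
#!/bin/sh
# blas_check_result CONFIG_LOG
# Prints the result ("yes" or "no") of the first dgemm_ link test in config.log.
# (blas_check_result is a hypothetical helper name.)
blas_check_result() {
    awk '
        /checking for dgemm_/ { seen = 1; next }    # start of the BLAS test
        seen && /result:/     { print $NF; exit }   # first result after it
    ' "$1"
}

# Example (not executed here):
#   blas_check_result config.log
```

When the answer is "no", the compiler and linker errors sit between the "checking for dgemm_" line and the "result:" line in config.log.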
Linking with different BLAS
• Netlib BLAS
  • Download the source from netlib.org; unoptimized.
• GotoBLAS2
  • Download from the TACC website
  • Optimized and multi-threaded
  • Turn off CPU throttling to compile.
• Intel MKL
  • Sequential and SMP
• The linking step is the same for most BLAS libraries except the Intel ones.
Linking with Intel MKL libs
• export MKLPATH=/opt/intel/Compiler/11.1/072/mkl/lib/em64t/
• Intel MKL sequential:
  --with-blas="-Wl,--start-group $MKLPATH/libmkl_intel_lp64.a $MKLPATH/libmkl_sequential.a $MKLPATH/libmkl_core.a -Wl,--end-group -lpthread"
• Intel MKL SMP:
  --with-blas="-Wl,--start-group $MKLPATH/libmkl_intel_lp64.a $MKLPATH/libmkl_intel_thread.a $MKLPATH/libmkl_core.a -Wl,--end-group -liomp5 -lpthread"
• Intel MKL SMP and GotoBLAS2 should show performance improvements on the quad-core (run 4 threads).
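The two MKL link lines above differ only in the threading layer and the extra runtime library, so they can be generated from one helper. This is a sketch: `mkl_blas_arg` is a hypothetical name, and the library file names are simply those shown on the slide for MKL 11.1, so other MKL versions may need different ones.

```shell
#!/bin/sh
# mkl_blas_arg MODE MKLPATH
# Prints the --with-blas argument for static MKL, sequential or threaded (smp).
# (mkl_blas_arg is a hypothetical helper; library names are those on the slide.)
mkl_blas_arg() {
    mode=$1
    dir=${2%/}    # drop one trailing slash so the paths join cleanly
    case $mode in
        sequential) layer=libmkl_sequential.a;   extra="-lpthread" ;;
        smp)        layer=libmkl_intel_thread.a; extra="-liomp5 -lpthread" ;;
        *) echo "mode must be sequential or smp" >&2; return 1 ;;
    esac
    echo "--with-blas=-Wl,--start-group $dir/libmkl_intel_lp64.a $dir/$layer $dir/libmkl_core.a -Wl,--end-group $extra"
}

# Example (not executed here):
#   ./configure --prefix="$installdir" "$(mkl_blas_arg smp "$MKLPATH")"
```

Quoting the command substitution passes the whole link line to configure as a single --with-blas argument.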
Performance: BLAS
• Benchmark run time went down by 15-20X through compilers, compiler options and hardware (4 threads).
• Revolution R uses Intel MKL-SMP.
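The multi-threaded libraries only help if they actually run 4 threads on the quad-core. A minimal sketch of pinning the thread count before a run follows; the environment variable names come from the OpenMP, MKL and GotoBLAS2 documentation rather than from the slides, and which one is read depends on the library linked into R.

```shell
#!/bin/sh
# Pin the BLAS thread count to the number of cores (4 on the quad-core Q6600).
# Which variable is read depends on the library linked into R.
threads=4
export OMP_NUM_THREADS=$threads    # OpenMP runtime setting, read by threaded MKL
export MKL_NUM_THREADS=$threads    # MKL-specific setting
export GOTO_NUM_THREADS=$threads   # GotoBLAS2-specific setting

# Example (not executed here):
#   ./Rscript --vanilla R-benchmark-25.R
```

Setting `threads=1` gives a single-threaded baseline for the same build, which separates the gain from threading from the gain from the optimized kernels.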
Results
• Generic R can be optimized for performance.
• Intel MKL libraries give the best performance, with the freely available GotoBLAS2 a close second.
• Experiment with LAPACK as well.
• Question: how important is performance for R users?