This session, led by Dr. Robert Sinkovits, shows how to combine MPI (Message Passing Interface) and OpenMP (Open Multi-Processing) for high-performance parallel computing on the Blue Horizon supercomputer. It covers compiling and linking: the available compilers, the options you need, the options to avoid, and example programs with makefiles. Attendees will learn how to manage processor allocations and task parallelism so that the 8-way SMP nodes are used efficiently.
Parallelism combining OpenMP and MPI
Robert Sinkovits, Ph.D.
sinkovit@sdsc.edu
NPACI Parallel Computing Institute
August 28 - September 1, 2000
San Diego Supercomputer Center
Outline
• Purpose
  • Show how we combine MPI and OpenMP
• Compiling and linking
  • Available compilers
  • Required options
  • Options not to use
• Example programs
• Makefiles
Hardware description
• Blue Horizon (horizon.npaci.edu)
  • 144 IBM SP High Nodes
  • 144 8-way SMP nodes for a total of 1152 processors
  • 6.4 GB/s on-node memory bandwidth
  • 4 GB/node main memory
• Power3 processors
  • 222 MHz, 4 floating-point ops per cycle
  • 888 MFLOPS/processor for a total of ~1.023 Teraflops
• Nodes currently connected with IBM switch
  • 115 MB/second
  • Maximum 4 MPI tasks/node in US (User Space) mode when using the switch
Why combine MPI and OpenMP?
• We have 8 processors/node but can only use 4 for MPI tasks
• MPI requires multiple copies of data; OpenMP threads on a node share one copy
• Program to the hardware
• Some applications have limited task parallelism
• Keep data in cache (maybe)
Compilers
• IBM
  • Fortran: xlf, xlf90
  • Fortran with MPI: mpxlf, mpxlf90
  • For OpenMP/SMP support, append _r to the compiler command
    • xlf_r, xlf90_r, mpxlf_r, mpxlf90_r
    • OpenMP is supported for all flavors of Fortran
  • C/C++: xlc, xlC
  • C/C++ with MPI: mpcc, mpCC
  • For SMP support, append _r to the compiler command
    • xlc_r, xlC_r, mpcc_r, mpCC_r
    • For OpenMP support in C, add -qsmp=omp
  • OpenMP is not supported directly in C++
    • C++ can call C or Fortran OpenMP subroutines
Compilers
• KAI
  • guidef90, guidef77
  • guidec, guidec++
  • "MP" scripts derived from the IBM versions:
    • kai_mpcc_r, kai_mpCC_r, kai_mpxlf90_r, kai_mpxlf_r
    • These are in /usr/local/apps/KAI_mpi
• Warning: the compile-line option -qalias=ALLPtrs causes wrong answers
A useful subroutine
• Routine: thread_bind()
  • Causes threads to be bound to processors
  • Source can be found at:
    www.npaci.edu/BlueHorizon/source/thread_bind.c
  • Needs to be called in a parallel critical region after MPI_Init
  • Does not have much effect with the KAI compiler

Fortran:
	!$OMP PARALLEL
	!$OMP CRITICAL
	      call thread_bind()
	!$OMP END CRITICAL
	!$OMP END PARALLEL

C:
	#pragma omp parallel
	#pragma omp critical
	thread_bind();
Examples using OpenMP and MPI
• C and Fortran: bothc.c, bothf.f
  • Creates arrays of random numbers
  • Sums in an OpenMP parallel for/do
  • Sums across processors using MPI_Reduce
  • IBM version calls thread_bind()
• C++: both_C.C
  • KAI only
  • Same functionality as bothc.c
• C++ calling C: callc.C
  • IBM only
  • C++ calls a C routine that does OpenMP, returning the number of threads
  • MPI_Reduce finds the total number of threads
makefile (IBM targets):

apps: ibm_apps kai_apps
ibm_apps: bothc.ibm callc.ibm bothf.ibm
kai_apps: bothc.kai both_C.kai bothf.kai

OP=-O3 -qarch=auto -qtune=auto
IBM_SMP=-qsmp=omp
IBM_C_OP=-qalias=ALLPtrs
IBM_F_OP=
KAI_SMP=
KAI_C_OP=
KAI_F_OP=

bothf.ibm: bothf.f bind.o
	mpxlf90_r $(IBM_SMP) $(OP) $(IBM_F_OP) bothf.f bind.o -o bothf.ibm

bothc.ibm: bothc.c bind.o
	mpcc_r $(IBM_SMP) $(OP) $(IBM_C_OP) bothc.c bind.o -o bothc.ibm

callc.ibm: callc.C bind.o do_OpenMP.o
	mpCC_r $(IBM_SMP) $(OP) $(IBM_C_OP) callc.C \
	  do_OpenMP.o bind.o -o callc.ibm

bind.o: bind.c
	mpcc_r $(IBM_SMP) $(OP) $(IBM_C_OP) bind.c -c -o bind.o

do_OpenMP.o: do_OpenMP.c
	mpcc_r $(IBM_SMP) $(OP) $(IBM_C_OP) do_OpenMP.c -c -o do_OpenMP.o
makefile (KAI targets):

apps: ibm_apps kai_apps
ibm_apps: bothc.ibm callc.ibm bothf.ibm
kai_apps: bothc.kai both_C.kai bothf.kai

OP=-O3 -qarch=auto -qtune=auto
IBM_SMP=-qsmp=omp
IBM_C_OP=-qalias=ALLPtrs
IBM_F_OP=
KAI_SMP=
KAI_C_OP=
KAI_F_OP=

bothc.kai: bothc.c dummy.o
	kai_mpcc_r $(KAI_SMP) $(OP) $(KAI_C_OP) bothc.c dummy.o -o bothc.kai

both_C.kai: both_C.C
	kai_mpCC_r $(KAI_SMP) $(OP) $(KAI_C_OP) both_C.C -o both_C.kai

bothf.kai: bothf.f dummy.o
	kai_mpxlf90_r $(KAI_SMP) $(OP) $(KAI_F_OP) bothf.f dummy.o -o bothf.kai

dummy.o: dummy.c
	cc -c dummy.c
Output

tf173i % make ibm_apps
mpcc_r -qsmp=omp -O3 -qarch=auto -qtune=auto -qalias=ALLPtrs bind.c -c -o bind.o
mpcc_r -qsmp=omp -O3 -qarch=auto -qtune=auto -qalias=ALLPtrs bothc.c bind.o -o bothc.ibm
1500-036: (I) Optimization level 3 has the potential to alter the semantics of a program.
Please refer to documentation on -O3 and the STRICT option for more information.
mpcc_r -qsmp=omp -O3 -qarch=auto -qtune=auto -qalias=ALLPtrs do_OpenMP.c -c -o do_OpenMP.o
mpCC_r -qsmp=omp -O3 -qarch=auto -qtune=auto -qalias=ALLPtrs callc.C do_OpenMP.o bind.o -o callc.ibm
1540-5200 (W) The option "threaded" is not supported.
mpxlf90_r -qsmp=omp -O3 -qarch=auto -qtune=auto bothf.f bind.o -o bothf.ibm
** hello   === End of Compilation 1 ===
** do_seed === End of Compilation 2 ===
1501-510  Compilation successful for file bothf.f.
tf173i %
Output

tf173i % make kai_apps
cc -c dummy.c
kai_mpcc_r -O3 -qarch=auto -qtune=auto bothc.c dummy.o -o bothc.kai
kai_mpCC_r -O3 -qarch=auto -qtune=auto both_C.C -o both_C.kai
C++ prelinker: warning: library "libmpi_r{.so,.a}" does not exist in the specified library directories
C++ prelinker: warning: library "libvtd_r{.so,.a}" does not exist in the specified library directories
kai_mpxlf90_r -O3 -qarch=auto -qtune=auto bothf.f dummy.o -o bothf.kai
"bothf.f", 1500-036 (I) Optimization level 3 has the potential to alter the semantics of a program.
Please refer to documentation on -O3 and the STRICT option for more information.
** hello   === End of Compilation 1 ===
** do_seed === End of Compilation 2 ===
1501-510  Compilation successful for file bothf.f.
Target "apps" is up to date.
tf173i %