This session, led by Dr. Robert Sinkovits, shows how to combine MPI (Message Passing Interface) and OpenMP (Open Multi-Processing) for high-performance parallel computing on the Blue Horizon supercomputer. It covers compiling and linking: the available compilers, the options you need, the options to avoid, and example programs with makefiles. Attendees will learn how to manage processor allocations and task parallelism so that the 8-way SMP nodes are used efficiently.
Parallelism combining OpenMP and MPI
Robert Sinkovits, Ph.D.
sinkovit@sdsc.edu
NPACI Parallel Computing Institute
August 28 - September 1, 2000
San Diego Supercomputer Center
Outline
• Purpose
  • Show how we combine MPI and OpenMP
• Compiling and linking
  • Available compilers
  • Required options
  • Options not to use
• Example programs
• Makefiles
Hardware description
• Blue Horizon (horizon.npaci.edu)
  • 144 IBM SP High Nodes
  • 144 8-way SMP nodes for a total of 1152 processors
  • 6.4 GB/s on-node memory bandwidth
  • 4 GB/node main memory
• Power3 processors
  • 222 MHz, 4 floating-point ops per cycle
  • 888 MFLOPS/processor for a total of ~1.023 Teraflops
• Nodes currently connected with IBM switch
  • 115 MB/second
  • Maximum 4 MPI tasks/node in US (User Space) mode when using the switch
Why combine MPI and OpenMP?
• We have 8 processors/node but can only use 4 for MPI tasks
• MPI requires multiple copies of data; OpenMP threads on a node share one copy
• Program to the hardware
• Some applications have limited task parallelism
• Keep data in cache (maybe)
Compilers
• IBM
  • Fortran: xlf, xlf90
  • Fortran with MPI: mpxlf, mpxlf90
  • For OpenMP/SMP support, append _r to the compiler command
    • xlf_r, xlf90_r, mpxlf_r, mpxlf90_r
    • OpenMP is supported for all flavors of Fortran
  • C/C++: xlc, xlC
  • C/C++ with MPI: mpcc, mpCC
  • For SMP support, append _r to the compiler command
    • xlc_r, xlC_r, mpcc_r, mpCC_r
    • For OpenMP support in C, add -qsmp=omp
  • OpenMP is not supported directly in C++
    • C++ can call C or Fortran OpenMP subroutines
Compilers
• KAI
  • guidef90, guidef77
  • guidec, guidec++
  • "MP" scripts derived from the IBM versions:
    • kai_mpcc_r, kai_mpCC_r, kai_mpxlf90_r, kai_mpxlf_r
    • These are in /usr/local/apps/KAI_mpi
• Warning: the compile-line option -qalias=ALLPtrs causes wrong answers
A useful subroutine
• Routine: thread_bind()
  • Causes threads to be bound to processors
  • Source can be found at:
    www.npaci.edu/BlueHorizon/source/thread_bind.c
  • Needs to be called in a parallel critical region after MPI_Init
  • Does not have much effect with the KAI compiler

Fortran:
	!$OMP PARALLEL
	!$OMP CRITICAL
	      call thread_bind()
	!$OMP END CRITICAL
	!$OMP END PARALLEL

C:
	#pragma omp parallel
	#pragma omp critical
	thread_bind();
Examples using OpenMP and MPI
• C and Fortran: bothc.c, bothf.f
  • Creates arrays of random numbers
  • Sums in an OpenMP parallel for/do
  • Sums across processors using MPI_Reduce
  • IBM version calls thread_bind()
• C++: both_C.C
  • KAI only
  • Same functionality as bothc.c
• C++ calling C: callc.C
  • IBM only
  • C++ calls a C routine that does OpenMP, returning the number of threads
  • MPI_Reduce finds the total number of threads
makefile (IBM targets):

apps: ibm_apps kai_apps
ibm_apps: bothc.ibm callc.ibm bothf.ibm
kai_apps: bothc.kai both_C.kai bothf.kai

OP=-O3 -qarch=auto -qtune=auto
IBM_SMP=-qsmp=omp
IBM_C_OP=-qalias=ALLPtrs
IBM_F_OP=
KAI_SMP=
KAI_C_OP=
KAI_F_OP=

bothf.ibm: bothf.f bind.o
	mpxlf90_r $(IBM_SMP) $(OP) $(IBM_F_OP) bothf.f bind.o -o bothf.ibm

bothc.ibm: bothc.c bind.o
	mpcc_r $(IBM_SMP) $(OP) $(IBM_C_OP) bothc.c bind.o -o bothc.ibm

callc.ibm: callc.C bind.o do_OpenMP.o
	mpCC_r $(IBM_SMP) $(OP) $(IBM_C_OP) callc.C \
	  do_OpenMP.o bind.o -o callc.ibm

bind.o: bind.c
	mpcc_r $(IBM_SMP) $(OP) $(IBM_C_OP) bind.c -c -o bind.o

do_OpenMP.o: do_OpenMP.c
	mpcc_r $(IBM_SMP) $(OP) $(IBM_C_OP) do_OpenMP.c -c -o do_OpenMP.o
makefile (KAI targets):

apps: ibm_apps kai_apps
ibm_apps: bothc.ibm callc.ibm bothf.ibm
kai_apps: bothc.kai both_C.kai bothf.kai

OP=-O3 -qarch=auto -qtune=auto
IBM_SMP=-qsmp=omp
IBM_C_OP=-qalias=ALLPtrs
IBM_F_OP=
KAI_SMP=
KAI_C_OP=
KAI_F_OP=

bothc.kai: bothc.c dummy.o
	kai_mpcc_r $(KAI_SMP) $(OP) $(KAI_C_OP) bothc.c dummy.o -o bothc.kai

both_C.kai: both_C.C
	kai_mpCC_r $(KAI_SMP) $(OP) $(KAI_C_OP) both_C.C -o both_C.kai

bothf.kai: bothf.f dummy.o
	kai_mpxlf90_r $(KAI_SMP) $(OP) $(KAI_F_OP) bothf.f dummy.o -o bothf.kai

dummy.o: dummy.c
	cc -c dummy.c
Output

tf173i % make ibm_apps
mpcc_r -qsmp=omp -O3 -qarch=auto -qtune=auto -qalias=ALLPtrs bind.c -c -o bind.o
mpcc_r -qsmp=omp -O3 -qarch=auto -qtune=auto -qalias=ALLPtrs bothc.c bind.o -o bothc.ibm
1500-036: (I) Optimization level 3 has the potential to alter the semantics of a program.
Please refer to documentation on -O3 and the STRICT option for more information.
mpcc_r -qsmp=omp -O3 -qarch=auto -qtune=auto -qalias=ALLPtrs do_OpenMP.c -c -o do_OpenMP.o
mpCC_r -qsmp=omp -O3 -qarch=auto -qtune=auto -qalias=ALLPtrs callc.C do_OpenMP.o bind.o -o callc.ibm
1540-5200 (W) The option "threaded" is not supported.
mpxlf90_r -qsmp=omp -O3 -qarch=auto -qtune=auto bothf.f bind.o -o bothf.ibm
** hello   === End of Compilation 1 ===
** do_seed === End of Compilation 2 ===
1501-510  Compilation successful for file bothf.f.
tf173i %
Output

tf173i % make kai_apps
cc -c dummy.c
kai_mpcc_r -O3 -qarch=auto -qtune=auto bothc.c dummy.o -o bothc.kai
kai_mpCC_r -O3 -qarch=auto -qtune=auto both_C.C -o both_C.kai
C++ prelinker: warning: library "libmpi_r{.so,.a}" does not exist in the specified library directories
C++ prelinker: warning: library "libvtd_r{.so,.a}" does not exist in the specified library directories
kai_mpxlf90_r -O3 -qarch=auto -qtune=auto bothf.f dummy.o -o bothf.kai
"bothf.f", 1500-036 (I) Optimization level 3 has the potential to alter the semantics of a program.
Please refer to documentation on -O3 and the STRICT option for more information.
** hello   === End of Compilation 1 ===
** do_seed === End of Compilation 2 ===
1501-510  Compilation successful for file bothf.f.
Target "apps" is up to date.
tf173i %