
OpenMP



  1. OpenMP Majdi Baddourah https://computing.llnl.gov/tutorials/mpi/

  2. What is OpenMP
  • An Application Program Interface (API) that may be used to explicitly direct multi-threaded, shared memory parallelism
  • Comprised of three primary API components:
    • Compiler Directives
    • Runtime Library Routines
    • Environment Variables
  • Portable: the API is specified for C/C++ and Fortran
  • Standardized: jointly defined and endorsed by a group of major computer hardware and software vendors
  • What does OpenMP stand for? Open Multi-Processing

  3. C / C++ - General Code Structure

  #include <omp.h>

  main () {
    int var1, var2, var3;

    /* Serial code */

    /* Beginning of parallel section. Fork a team of threads.
       Specify variable scoping */
    #pragma omp parallel private(var1, var2) shared(var3)
    {
      /* Parallel section executed by all threads */

      /* All threads join master thread and disband */
    }

    /* Resume serial code */
  }

  4. Fork - Join Model
  OpenMP uses the fork-join model of parallel execution: a program begins as a single master thread, which forks a team of threads at each parallel region; when the region ends, the threads synchronize and join, leaving only the master thread.
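  A minimal sketch (not from the original slides) of the fork-join pattern with two parallel regions; the thread count and output order depend on the runtime:

  #include <omp.h>
  #include <stdio.h>

  int main () {
    printf("serial: master thread only\n");

    #pragma omp parallel              /* fork: a team of threads runs the next statement */
    printf("region 1: thread %d\n", omp_get_thread_num());
                                      /* join: implicit barrier, back to the master */
    printf("serial again: master thread only\n");

    #pragma omp parallel              /* fork a second team */
    printf("region 2: thread %d\n", omp_get_thread_num());

    return 0;                         /* join, then the program ends */
  }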

  5. C / C++ Directives Format

  #pragma omp directive-name [clause, ...] newline

  Example:
  #pragma omp parallel default(shared) private(beta,pi)

  6. Parallel Region Example

  #include <omp.h>
  #include <stdio.h>

  main () {
    int nthreads, tid;

    /* Fork a team of threads with each thread having a private tid variable */
    #pragma omp parallel private(tid)
    {
      /* Obtain and print thread id */
      tid = omp_get_thread_num();
      printf("Hello World from thread = %d\n", tid);

      /* Only master thread does this */
      if (tid == 0) {
        nthreads = omp_get_num_threads();
        printf("Number of threads = %d\n", nthreads);
      }
    } /* All threads join master thread and terminate */
  }

  7. Work-Sharing Construct
  A work-sharing construct divides the execution of the enclosed code region among the members of the thread team; it does not launch new threads. The main forms are the DO / for directive (shares loop iterations), the SECTIONS directive (divides work into discrete sections), and the SINGLE directive (serializes a block to one thread).

  8. Work-Sharing Constructs: DO / for Directive

  #pragma omp for [clause ...] newline
    schedule (type [,chunk])
    ordered
    private (list)
    firstprivate (list)
    lastprivate (list)
    shared (list)
    reduction (operator: list)
    collapse (n)
    nowait

    for_loop
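  The schedule clause is the one most often tuned. Below is a small sketch (not from the slides) that prints which thread executes each iteration under a dynamic schedule with chunks of 2; the mapping varies from run to run:

  #include <omp.h>
  #include <stdio.h>

  int main () {
    /* hand out iterations two at a time to whichever thread is free */
    #pragma omp parallel for schedule(dynamic, 2)
    for (int i = 0; i < 8; i++)
      printf("iteration %d ran on thread %d\n", i, omp_get_thread_num());
    return 0;
  }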

  9. For Directive Example

  #include <omp.h>
  #define CHUNKSIZE 100
  #define N 1000

  main () {
    int i, chunk;
    float a[N], b[N], c[N];

    /* Some initializations */
    for (i=0; i < N; i++)
      a[i] = b[i] = i * 1.0;
    chunk = CHUNKSIZE;

    #pragma omp parallel shared(a,b,c,chunk) private(i)
    {
      /* nowait: threads skip the implied barrier at the end of the loop */
      #pragma omp for schedule(dynamic,chunk) nowait
      for (i=0; i < N; i++)
        c[i] = a[i] + b[i];
    } /* end of parallel section */
  }

  10. SECTIONS Directive
  • The SECTIONS directive is a non-iterative work-sharing construct. It specifies that the enclosed section(s) of code are to be divided among the threads in the team.
  • Independent SECTION directives are nested within a SECTIONS directive. Each SECTION is executed once by a thread in the team. Different sections may be executed by different threads. A thread may execute more than one section if it is quick enough and the implementation permits it.

  11. SECTIONS Directive

  #pragma omp sections [clause ...] newline
    private (list)
    firstprivate (list)
    lastprivate (list)
    reduction (operator: list)
    nowait
  {
    #pragma omp section newline
      structured_block
    #pragma omp section newline
      structured_block
  }

  12. SECTIONS Directive

  #include <omp.h>
  #define N 1000

  main () {
    int i;
    float a[N], b[N], c[N], d[N];

    /* Some initializations */
    for (i=0; i < N; i++) {
      a[i] = i * 1.5;
      b[i] = i + 22.35;
    }

    #pragma omp parallel shared(a,b,c,d) private(i)
    {
      #pragma omp sections nowait
      {
        #pragma omp section
        for (i=0; i < N; i++)
          c[i] = a[i] + b[i];

        #pragma omp section
        for (i=0; i < N; i++)
          d[i] = a[i] * b[i];
      } /* end of sections */
    } /* end of parallel section */
  }

  13. CRITICAL Construct

  #include <omp.h>

  main() {
    int x;
    x = 0;

    #pragma omp parallel shared(x)
    {
      /* only one thread at a time may execute the increment,
         preventing a race on the shared variable x */
      #pragma omp critical
      x = x + 1;
    } /* end of parallel section */
  }
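  For a simple update like this one, the ATOMIC directive is a lighter-weight alternative to CRITICAL. A minimal sketch (not from the original slides):

  #include <omp.h>
  #include <stdio.h>

  int main () {
    int x = 0;

    #pragma omp parallel shared(x)
    {
      /* atomic protects just this single memory update */
      #pragma omp atomic
      x = x + 1;
    }

    printf("x = %d\n", x);   /* equals the number of threads in the team */
    return 0;
  }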

  14. THREADPRIVATE Directive
  • The THREADPRIVATE directive is used to make global file scope variables (C/C++) or common blocks (Fortran) local and persistent to a thread through the execution of multiple parallel regions.
  • The COPYIN clause can be used on a parallel directive to initialize the threadprivate variables from the master thread's copies.

  15. Threadprivate Directive Example

  #include <omp.h>
  #include <stdio.h>

  int a, b, i, tid;
  float x;
  #pragma omp threadprivate(a, x)

  main () {
    /* Explicitly turn off dynamic threads */
    omp_set_dynamic(0);

    printf("1st Parallel Region:\n");
    #pragma omp parallel private(b,tid)
    {
      tid = omp_get_thread_num();
      a = tid;
      b = tid;
      x = 1.1 * tid + 1.0;
      printf("Thread %d: a,b,x= %d %d %f\n", tid, a, b, x);
    } /* end of parallel section */

    printf("************************************\n");
    printf("Master thread doing serial work here\n");
    printf("************************************\n");

    printf("2nd Parallel Region:\n");
    #pragma omp parallel private(tid)
    {
      tid = omp_get_thread_num();
      printf("Thread %d: a,b,x= %d %d %f\n", tid, a, b, x);
    } /* end of parallel section */
  }

  16. Results

  1st Parallel Region:
  Thread 0: a,b,x= 0 0 1.000000
  Thread 2: a,b,x= 2 2 3.200000
  Thread 3: a,b,x= 3 3 4.300000
  Thread 1: a,b,x= 1 1 2.100000
  ************************************
  Master thread doing serial work here
  ************************************
  2nd Parallel Region:
  Thread 0: a,b,x= 0 0 1.000000
  Thread 3: a,b,x= 3 0 4.300000
  Thread 1: a,b,x= 1 0 2.100000
  Thread 2: a,b,x= 2 0 3.200000

  The threadprivate variables a and x keep their per-thread values across the two parallel regions. In contrast, b was private in the first region (its per-thread copies vanished at the join) and is shared in the second, so it still holds the global initial value 0.

  17. Reduction Clause Example

  #include <omp.h>
  #include <stdio.h>

  main () {
    int i, n, chunk;
    float a[100], b[100], result;

    /* Some initializations */
    n = 100;
    chunk = 10;
    result = 0.0;
    for (i=0; i < n; i++) {
      a[i] = i * 1.0;
      b[i] = i * 2.0;
    }

    #pragma omp parallel for \
      default(shared) private(i) \
      schedule(static,chunk) \
      reduction(+:result)
    for (i=0; i < n; i++)
      result = result + (a[i] * b[i]);

    printf("Final result= %f\n", result);
  }
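  The reduction clause effectively gives each thread a private copy of result, initialized to the operator's identity (0 for +), and combines the copies when the region ends. A hand-written equivalent (a sketch, not from the slides) makes this visible:

  #include <omp.h>
  #include <stdio.h>

  int main () {
    float a[100], b[100], result = 0.0;
    for (int i = 0; i < 100; i++) { a[i] = i * 1.0; b[i] = i * 2.0; }

    #pragma omp parallel shared(a, b, result)
    {
      float partial = 0.0;              /* per-thread accumulator */

      #pragma omp for
      for (int i = 0; i < 100; i++)
        partial = partial + (a[i] * b[i]);

      #pragma omp critical              /* combine the partial sums safely */
      result = result + partial;
    }

    printf("Final result= %f\n", result);
    return 0;
  }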

  18. Clauses / Directives Summary

  19. Environment Variables
  • OMP_SCHEDULE
  • OMP_NUM_THREADS
    setenv OMP_NUM_THREADS 5
  • OMP_DYNAMIC
    setenv OMP_DYNAMIC TRUE
  • OMP_NESTED
  • OMP_STACKSIZE
  • OMP_WAIT_POLICY
  • OMP_MAX_ACTIVE_LEVELS
  • OMP_THREAD_LIMIT
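  These variables set defaults that the runtime library can report back. A minimal sketch (not from the slides) querying a few of them from C:

  #include <omp.h>
  #include <stdio.h>

  int main () {
    /* values reflect OMP_NUM_THREADS, OMP_DYNAMIC, OMP_NESTED, etc. */
    printf("max threads     = %d\n", omp_get_max_threads());
    printf("dynamic threads = %d\n", omp_get_dynamic());
    printf("nested regions  = %d\n", omp_get_nested());
    return 0;
  }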

  20. Compiler Flags
  Common flags to enable OpenMP (check your compiler's documentation): GNU gcc/gfortran -fopenmp; Intel icc/ifort -qopenmp (older versions: -openmp); PGI pgcc/pgf90 -mp; IBM XL xlc/xlf -qsmp=omp. For example: gcc -fopenmp hello.c -o hello

  21. Amdahl's Law
  Amdahl's Law bounds the speedup obtainable from parallelization: if a fraction P of a program can run in parallel on N threads, then

    speedup = 1 / ((1 - P) + P / N)

  so the serial fraction (1 - P) limits the speedup no matter how many threads are used.
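  A quick worked example (numbers chosen for illustration, not from the slides): with P = 0.95 and N = 8 threads,

    speedup = 1 / ((1 - 0.95) + 0.95 / 8) = 1 / (0.05 + 0.11875) = 1 / 0.16875 ≈ 5.93

  and even as N grows without bound, the speedup can never exceed 1 / 0.05 = 20.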

  22. Example: counting the primes from 2 to n

  int i;
  int j;
  int prime;
  int total = 0;
  /* n (the upper bound) is assumed to be set elsewhere */

  # pragma omp parallel \
    shared ( n ) \
    private ( i, j, prime ) \
    reduction ( + : total )

  # pragma omp for
  for ( i = 2; i <= n; i++ )
  {
    prime = 1;
    for ( j = 2; j < i; j++ )
    {
      if ( i % j == 0 )
      {
        prime = 0;
        break;
      }
    }
    total = total + prime;
  }
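  The trial-division work grows with i, so the iterations have very uneven cost. A schedule clause on the loop directive (an optional variant, not in the original slide; the chunk size 100 is illustrative) can balance the load:

  # pragma omp for schedule ( dynamic, 100 )
  for ( i = 2; i <= n; i++ ) { ... }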

  23. Example: summing f(x) at n evenly spaced points in [a, b]

  /* a, b (interval endpoints), n, i, x, and f() are assumed
     to be declared elsewhere */
  total = 0.0;

  # pragma omp parallel shared ( a, b, n ) private ( i, x ) reduction ( + : total )

  # pragma omp for
  for ( i = 0; i < n; i++ )
  {
    x = ( ( n - i - 1 ) * a + ( i ) * b ) / ( n - 1 );
    total = total + f ( x );
  }

  Each thread accumulates its share of the sample points into a private copy of total, which the reduction combines at the end; scaling the sum by (b - a) / n would turn it into a simple quadrature estimate of the integral of f.
