
CS 591x Cluster Computing and Programming Parallel Computers


Presentation Transcript


  1. CS 591x Cluster Computing and Programming Parallel Computers Introduction to OpenMP

  2. Recall… • Three paradigms for parallel software • Distributed memory • Shared memory • Data Parallel • As the names imply – • Distributed memory paradigm suited for distributed memory architectures • Shared memory paradigm suited for SMP architectures (shared memory)

  3. Distributed Memory • No common memory space • No automatic sharing of data across processes • All data sharing is done by explicit message passing • MPI

  4. Shared Memory • Processes share common memory space • Data sharing via common memory space • No explicit message passing required • OpenMP

  5. SMP vs Distributed Memory Architectures • SMP – several/many processors share a common memory pool • No explicit message passing • Performance very good – relative to message passing • So, why not just SMPs…

  6. SMPs • SMPs with a large number of processors tend to be very expensive • Processor/cache/interconnect technology limits the number of processors that can be connected in an SMP • The memory/processor interconnect can be a bottleneck – limiting performance

  7. SMPs – Multiport memory [diagram: several processors, each connected through its own port to a shared memory bank]

  8. SMPs – Bus architecture [diagram: several processors and memory modules attached to a single shared bus]

  9. SMP – Crossbar switches [diagram: processors connected to memory modules through a crossbar switch, allowing multiple simultaneous paths]

  10. SMPs • SMPs with 64 or 128 processors are considered very large • Distributed memory systems can scale up to hundreds or thousands of processors • The fastest computers are distributed memory systems • Still… SMPs are very powerful.

  11. PSC – Rachel and Jonas • 64 processors • 1.67 GHz EV7 processors • 256 GB of shared memory

  12. OpenMP • OpenMP – application programming interface standard for SMP parallel programming • Widely accepted standard, not widely implemented • API for Fortran, C, C++ • Implementations • Portland Group Fortran, C, C++ • Intel Fortran, C, C++

  13. OpenMP Concepts • OpenMP is not a language; it is more like an extension to existing languages – Fortran, C, C++ • Parallelization is implemented in a program through compiler directives… • Fortran – directives • C/C++ – pragmas • … plus a limited number of runtime functions… • … and environment variables

  14. OpenMP Concepts • Pragmas/directives are parallelization instructions to the compiler • Pragmas/directives start with sentinels • Fortran: !$omp [command], c$omp [command], *$omp [command] • C/C++: #pragma omp [command]

  15. OpenMP Concepts • Parallel program blocks • blocks of code… • completely contained within an OpenMP construct • start/end with { and } (just like C) • one entry point, one exit point • no breaks or jumps

  16. OpenMP Concepts • Processes vs threads • Process – executable code with a memory space independent of other (possibly similar) executable code • Thread – executable code that shares memory space with other executable code • A group of threads in a program is called a "team"

  17. OpenMP Concepts • Fork and join • Fork – the process spawns additional threads of execution • Join – the threads close (merge back) at the completion of the parallel task

  18. Fork and join execution model [diagram: in MPI, processes p0–p3 run for the entire job; in OpenMP, a master thread forks a team of threads t0–t3 at each parallel region and joins them when the region ends]

  19. Compiling OpenMP programs • Portland Group C/C++: pgcc -mp myprog.c -o myprog • Intel C/C++: icc -openmp myprog.c -o myprog

  20. Running OpenMP programs • Set environment variables • OMP_NUM_THREADS=n • NCPUS=n • e.g. export OMP_NUM_THREADS=4 • PBS/Torque script:
  #!/bin/sh
  #PBS -l nodes=2,ppn=2
  export OMP_NUM_THREADS=4
  ./myprog

  21. Sample OpenMP Program
  #include <stdio.h>
  #include "omp.h"

  int main(int argc, char** argv) {
      printf("Hello world from an openMP program.\n");
      #pragma omp parallel
      {
          printf("  from thread number %d\n", omp_get_thread_num());
      }
      printf("this part is sequential.\n");
      return 0;
  }

  22. OpenMP – Three parts • Environment Variables • Run-time Library Functions • Pragmas/Directives

  23. OpenMP – Environment Variables • OMP_NUM_THREADS • specifies the number of threads to run during the execution of an OpenMP program • determines the number of threads assigned to a job regardless of the number of processors in the system • default = 1 • export OMP_NUM_THREADS=4 • NCPUS does the same thing

  24. OpenMP – Environment Variables • OMP_SCHEDULE • defines the type of iteration scheduling used in OMP for and OMP Parallel for loops • options are • static • guided • dynamic
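
A minimal sketch (not on the slide) of how OMP_SCHEDULE takes effect: a loop marked with the standard schedule(runtime) clause, using the parallel for construct that appears later in the deck, picks up whatever schedule the environment variable names, so the same binary can be rerun under static, dynamic, or guided scheduling.

  /* run with e.g.:  export OMP_SCHEDULE="dynamic,4"; ./a.out */
  #include <stdio.h>
  #include <omp.h>

  int main(void) {
      int i, a[100];
      #pragma omp parallel for schedule(runtime)   /* schedule taken from OMP_SCHEDULE */
      for (i = 0; i < 100; i++) {
          a[i] = i * i;                            /* independent iterations */
      }
      printf("a[99] = %d\n", a[99]);
      return 0;
  }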

  25. OpenMP – Environment Variables • MPSTKZ • increases the size of the stacks used by threads in parallel regions • may be needed if threads have a lot of private variables or… • if functions within parallel regions have a lot of local variable storage • the value is an integer followed by "M" or "m", meaning megabytes • export MPSTKZ=8M

  26. OpenMP – Run-time Library Functions • Always use • #include <omp.h> • int omp_get_num_threads(void); • returns the number of threads in the team running the parallel region from which it is called int thcount=omp_get_num_threads();

  27. OpenMP – Run-time Library Functions • void omp_set_num_threads(int n); • sets the number of threads for the next parallel region • must be called before parallel region • if called from within parallel regions it is undefined • has precedence over OMP_NUM_THREADS environment variable omp_set_num_threads(6);
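
A small sketch tying these two calls together: omp_set_num_threads() is issued in serial code, and omp_get_num_threads() then reports the team size from inside the region (in serial code it would return 1). omp_get_thread_num(), covered on the next slide, is used here only so that a single thread prints.

  #include <stdio.h>
  #include <omp.h>

  int main(void) {
      omp_set_num_threads(6);          /* request 6 threads for the next parallel region */
      #pragma omp parallel
      {
          if (omp_get_thread_num() == 0)            /* let one thread report */
              printf("team size: %d\n", omp_get_num_threads());
      }
      return 0;
  }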

  28. OpenMP – Run-time Library Functions • int omp_get_thread_num(void); • returns a thread number for a thread within a team • thread numbers run from 0 (root thread) to omp_get_num_threads()-1 • if called in serial region returns 0 • similar to MPI_Comm_rank(…) int mythread_num=omp_get_thread_num();

  29. OpenMP – Run-time Library Functions • int omp_get_max_threads(void); • returns the max number of threads a job can have • returns max number of threads even if in serial region • can be changed with omp_set_num_threads(n); int max_t=omp_get_max_threads();

  30. OpenMP – Run-time Library Functions • int omp_in_parallel(void); • returns non-zero if it is called within a parallel region… • returns zero if not in a parallel region int p_or_pnot=omp_in_parallel();
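
A short sketch contrasting the two query functions above: omp_get_max_threads() can be read from serial code, while omp_in_parallel() distinguishes serial from parallel execution.

  #include <stdio.h>
  #include <omp.h>

  int main(void) {
      /* serial part: in_parallel is 0, max_threads reflects OMP_NUM_THREADS */
      printf("max threads = %d, in parallel? %d\n",
             omp_get_max_threads(), omp_in_parallel());
      #pragma omp parallel
      {
          if (omp_get_thread_num() == 0)
              printf("inside region, in parallel? %d\n", omp_in_parallel());  /* non-zero */
      }
      return 0;
  }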

  31. OpenMP – Run-time Library Functions • Defined but not implemented • void omp_set_dynamic(int dyn); • int omp_get_dynamic(void); • void omp_set_nested(int nested); • int omp_get_nested(void);

  32. OpenMP – Run-time Library Functions • to be continued….

  33. OpenMP – Pragmas/directives • General format • #pragma omp pragma_name [clauses] • where pragma_name is one of the pragma command names • clauses is one or more options or qualifying parameters for the pragma

  34. OpenMP – Pragmas/directives #pragma omp parallel {parallel region of c/c++ code } -- declares a parallel region and spawns threads to execute the region in parallel

  35. OpenMP – Pragmas/Directives • In Fortran – !$OMP PARALLEL [clause] Fortran code block !$OMP END PARALLEL

  36. OpenMP – Pragmas/directives • Sample program – revisited
  #include <stdio.h>
  #include "omp.h"

  int main(int argc, char** argv) {
      printf("Hello world from an openMP program.\n");
      #pragma omp parallel
      {
          printf("  from thread number %d\n", omp_get_thread_num());
      }
      printf("this part is sequential.\n");
      return 0;
  }

  37. OpenMP – Pragmas/directives • #pragma omp parallel [clauses] • clauses… • private(list) • shared(list) • default(private | shared | none) • firstprivate(list) • reduction(operator:list) • copyin(list) • if (scalar_expression)

  38. OpenMP – omp parallel clauses • private(list) • list is a list of variables • variables in the list are private (local) to each thread • shared(list) • all variables in the list are shared among the threads

  39. OpenMP – omp parallel clauses • default(shared | none) • defines the default state of variables in the parallel region • shared means that variables are shared among threads unless otherwise stated • none means there is no default - all variables must be defined shared, private… • firstprivate(list) • private but initialize from object prior to parallel region
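
A small sketch of these scoping clauses (the variable names are made up for illustration): tid is private, so each thread must set its own copy before use, while y is firstprivate and starts in every thread with the value it had in the serial code; default(none) forces every variable used in the region to be scoped explicitly.

  #include <stdio.h>
  #include <omp.h>

  int main(void) {
      int tid = 0, y = 20;
      #pragma omp parallel default(none) private(tid) firstprivate(y)
      {
          tid = omp_get_thread_num();                  /* private: undefined on entry, set here */
          printf("thread %d sees y = %d\n", tid, y);   /* firstprivate: y starts at 20 */
      }
      return 0;
  }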

  40. OpenMP – omp parallel clauses • reduction(operator: list) • performs reduction operation on variables in the list • specific reduction operation defined by operator • operators – +, *, -, &, |, ^, &&, ||
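
A minimal reduction sketch: each thread gets a private copy of sum, initialized appropriately for the + operator, and the copies are combined into the shared variable when the region ends.

  #include <stdio.h>
  #include <omp.h>

  int main(void) {
      int sum = 0;
      #pragma omp parallel reduction(+:sum)
      {
          sum = omp_get_thread_num() + 1;   /* each thread contributes once */
      }
      /* with 4 threads this prints 1+2+3+4 = 10 */
      printf("combined sum = %d\n", sum);
      return 0;
  }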

  41. OpenMP – omp parallel clauses • copyin(list) • variables in list must appear in a threadprivate directive • copies the variable value from the master thread into each thread's private variable • if (scalar_expression) • if the expression evaluates to non-zero – executes the region in parallel • if it evaluates to zero – executes the region on a single thread
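
A sketch of the if clause (the variable n and the threshold are made up): when the scalar expression is false, the region still executes, but on a single thread, avoiding thread start-up cost for small problems.

  #include <stdio.h>
  #include <omp.h>

  int main(void) {
      int n = 50;                        /* hypothetical problem size */
      #pragma omp parallel if (n > 1000)
      {
          /* n <= 1000 here, so the team has just one thread */
          printf("threads in region: %d\n", omp_get_num_threads());
      }
      return 0;
  }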

  42. OpenMP Pragmas/Directives • #pragma omp threadprivate (list) • declares variables that are private/local to threads • … but persistent across multiple parallel sections • must appear immediately after variable declaration

  43. OpenMP - threadprivate • The copyin clause – copyin (a, b) • copies in (initializes) values of threadprivate variables in parallel region • used in parallel directives
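
A minimal sketch combining threadprivate and copyin (counter is an illustrative name): each thread keeps its own persistent copy of the file-scope variable, and copyin seeds every copy from the master thread's value on entry to the region.

  #include <stdio.h>
  #include <omp.h>

  int counter = 0;                      /* file-scope variable */
  #pragma omp threadprivate(counter)    /* one persistent copy per thread */

  int main(void) {
      counter = 100;                    /* set in the master thread */
      #pragma omp parallel copyin(counter)
      {
          counter += omp_get_thread_num();   /* every copy starts at 100 */
          printf("thread %d: counter = %d\n", omp_get_thread_num(), counter);
      }
      return 0;
  }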

  44. OpenMP – Pragmas/Directives • #pragma omp for • distributes the work of a for loop across the threads in a team… • …if there are no serial dependencies in the loop calculations • a for loop must follow this pragma (DO in Fortran) • this pragma must be within a parallel region
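
A sketch of omp for inside an existing parallel region: the region creates the team, and the for pragma then splits the loop iterations across it (the loop variable is made private automatically).

  #include <stdio.h>
  #include <omp.h>

  int main(void) {
      int i, a[16];
      #pragma omp parallel
      {
          #pragma omp for                 /* iterations divided among the team */
          for (i = 0; i < 16; i++) {
              a[i] = 2 * i;
          }
      }                                   /* threads join here */
      printf("a[15] = %d\n", a[15]);
      return 0;
  }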

  45. OpenMP – Pragmas/Directives • Clauses • private(list) • shared(list) • lastprivate(list) • reduction(operator:list) • schedule(kind[,chunk]) • ordered • nowait – no barrier at the end of for loop

  46. OpenMP – Pragmas/directives • #pragma omp parallel for [clauses] • performs a for loop within a parallel region • distributes the work of the loop across threads • if there are no serial dependencies in the loop calculations

  47. OpenMP – Pragmas/Directives • #pragma omp parallel for – clauses • private(list) • shared(list) • default(shared | none) • firstprivate(list) • variables in list are private in each thread, but they are initialized by object in serial region

  48. OpenMP – Pragmas/Directives • Parallel for clauses • lastprivate(list) • variables in the list are private in each thread, but the last thread to set the variables updates their shared counterpart in the serial program • reduction(operator:list) • copyin(list)
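
A tiny sketch of lastprivate: last is private inside the loop, but the thread that executes the sequentially final iteration copies its value back to the serial variable.

  #include <stdio.h>
  #include <omp.h>

  int main(void) {
      int i, last = -1;
      #pragma omp parallel for lastprivate(last)
      for (i = 0; i < 8; i++) {
          last = i;                      /* private copy in each thread */
      }
      printf("last = %d\n", last);       /* the final iteration wrote 7 */
      return 0;
  }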

  49. OpenMP – Pragmas/Directive • parallel for clauses • if (scalar_expression) • ordered • specifies that a section in a parallel region will be executed in the same order as if it were executed on a serial computer • schedule(kind, chunk)
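
A sketch of schedule and ordered together (the chunk size of 2 is arbitrary): dynamic scheduling hands out small chunks of iterations on demand, and the ordered construct inside the loop body, which the ordered clause enables, keeps the printed output in loop order.

  #include <stdio.h>
  #include <omp.h>

  int main(void) {
      int i;
      #pragma omp parallel for schedule(dynamic, 2) ordered
      for (i = 0; i < 8; i++) {
          #pragma omp ordered            /* this block runs in iteration order */
          printf("iteration %d\n", i);
      }
      return 0;
  }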

  50. OpenMP – Pragmas/Directives
  #include <stdio.h>
  #include "omp.h"

  int main(int argc, char** argv) {
      int a[12] = {1,2,3,4,5,6,7,8,9,10,11,12};
      int i;
      #pragma omp parallel for shared(a)
      for (i = 0; i < 12; i++) {
          a[i] = a[i] + 100;
      }
      for (i = 0; i < 12; i++) {
          printf("here is a[%d] -- %d\n", i, a[i]);
      }
      return 0;
  }
