
Parallel Programming On the IUCAA Clusters


Presentation Transcript


  1. Parallel Programming On the IUCAA Clusters Sunu Engineer

  2. IUCAA Clusters • The Cluster – Cluster of Intel Machines on Linux • Hercules – Cluster of HP ES45 quad processor nodes • References: http://www.iucaa.ernet.in/

  3. The Cluster • Four Single Processor Nodes with 100 Mbps Ethernet interconnect. • 1.4 GHz, Intel Pentium 4 • 512 MB RAM • Linux 2.4 Kernel (Redhat 7.2 Distribution) • MPI – LAM 6.5.9 • PVM – 3.4.3

  4. Hercules • Four quad processor nodes with Memory Channel interconnect • 1.25 GHz Alpha 21264D RISC Processor • 4 GB RAM • Tru64 5.1A with TruCluster software • Native MPI • LAM 7.0 • PVM 3.4.3

  5. Expected Computational Performance • ES45 Cluster • Processor ~ 679/960 • System GFLOPS ~ 30 • Algorithm/Benchmark Used – Specint/float/HPL • Intel Cluster • Processor ~ 512/590 • System GFLOPS ~ 2 • Algorithm/Benchmark Used – Specint/float/HPL

  6. Parallel Programs • Move towards large scale distributed programs • Larger class of problems with higher resolution • Enhanced levels of details to be explored • …

  7. The Starting Point • Model → Single Processor Program → Multiprocessor Program • Model → Multiprocessor Program

  8. Decomposition of a Single Processor Program • Temporal • Initialization • Control • Termination • Spatial • Functional • Modular • Object based

  9. Multi Processor Programs • Spatial delocalization – Dissolving the boundary • Single spatial coordinate - Invalid • Single time coordinate - Invalid • Temporal multiplicity • Multiple streams at different rates w.r.t. an external clock.

  10. In comparison • Multiple points of initialization • Distributed control • Multiple points and times of termination • Distribution of the activity in space and time

  11. Breaking up a problem

  12. Yet Another way

  13. And another

  14. Amdahl’s Law
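  The transcript does not carry the figure from this slide. For reference, the usual statement of the law, with P the parallelizable fraction of the program and N the number of processors:

      S(N) = \frac{1}{(1 - P) + P/N}, \qquad \lim_{N \to \infty} S(N) = \frac{1}{1 - P}

  so the serial fraction 1 - P bounds the achievable speedup no matter how many processors are added.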

  15. Degrees of refinement • Fine parallelism • Instruction level • Program statement level • Loop level • Coarse parallelism • Process level • Task level • Region level

  16. Patterns and Frameworks • Patterns - Documented solutions to recurring design problems. • Frameworks – Software and hardware structures implementing the infrastructure

  17. Processes and Threads • From heavy multitasking to lightweight multitasking on a single processor • Isolated memory spaces to shared memory space

  18. Posix Threads in Brief • pthread_create(pthread_t *id, const pthread_attr_t *attributes, void *(*thread_function)(void *), void *arguments) • pthread_exit • pthread_join • pthread_self • pthread_mutex_init • pthread_mutex_lock/unlock • Link with -lpthread (put together in the sketch below)
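  A minimal runnable sketch of the calls listed above (the thread body, counter, and mutex are illustrative, not from the slide):

      #include <stdio.h>
      #include <pthread.h>

      #define NTHREADS 4

      static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
      static int counter = 0;

      /* Each thread bumps a shared counter; the mutex serializes the update. */
      static void *thread_function(void *arg)
      {
          (void)arg;                       /* unused */
          pthread_mutex_lock(&lock);
          counter++;
          printf("counter is now %d\n", counter);
          pthread_mutex_unlock(&lock);
          return NULL;
      }

      int main(void)
      {
          pthread_t id[NTHREADS];
          int i;

          for (i = 0; i < NTHREADS; i++)
              pthread_create(&id[i], NULL, thread_function, NULL);
          for (i = 0; i < NTHREADS; i++)
              pthread_join(id[i], NULL);   /* wait for each thread to finish */
          return 0;
      }

  As the slide notes, this builds with the pthreads library linked in, e.g. cc threads.c -lpthread.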

  19. Multiprocessing architectures • Symmetric Multiprocessing • Shared memory • Space Unified • Different temporal streams • OpenMP standard

  20. OpenMP Programming • Set of directives to the compiler to express shared memory parallelism • Small library of functions • Environment variables. • Standard language bindings defined for FORTRAN, C and C++

  21. An OpenMP example (Fortran and C)

      C     An OpenMP program
            program openmp
            integer omp_get_thread_num
      !$OMP PARALLEL
            print *, "Hello world from", omp_get_thread_num()
      !$OMP END PARALLEL
            stop
            end

      /* The same OpenMP example in C */
      #include <stdio.h>
      #include <omp.h>

      int main(int argc, char **argv)
      {
      #pragma omp parallel
          {
              printf("Hello World from %d\n", omp_get_thread_num());
          }
          return (0);
      }
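  A note on building these (the switches below are common conventions, not taken from the slides): OpenMP is enabled with a compiler flag, e.g. f90 -omp or cc -omp with the Tru64 compilers, or -fopenmp with recent GNU compilers; the thread count is then chosen with the OMP_NUM_THREADS environment variable, e.g. export OMP_NUM_THREADS=4.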

  22. OpenMP directives: Parallel and Work sharing • OMP parallel [clauses] • OMP do [clauses] • OMP sections [clauses] • OMP section • OMP single

  23. Combined work sharing / Synchronization • OMP parallel do • OMP parallel sections • OMP master • OMP critical • OMP barrier • OMP atomic • OMP flush • OMP ordered • OMP threadprivate

  24. OpenMP Directive clauses • shared(list) • private(list)/threadprivate • firstprivate/lastprivate(list) • default(private|shared|none) (Fortran) • default(shared|none) (C/C++) • reduction(operator|intrinsic : list) • copyin(list) • if(expr) • schedule(type[,chunk]) • ordered/nowait (several of these combined in the sketch below)
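  A short C sketch putting a few of these clauses together (the array, bounds, and chunk size are invented for illustration):

      #include <stdio.h>
      #include <omp.h>

      #define N 1000

      int main(void)
      {
          double x[N], sum = 0.0;
          int i;

          /* default(none) forces every variable's scope to be stated;
             each thread's partial sum is combined by the reduction;
             iterations are handed out statically in chunks of 100. */
          #pragma omp parallel for default(none) shared(x) private(i) \
                  reduction(+:sum) schedule(static,100)
          for (i = 0; i < N; i++) {
              x[i] = (double)i;
              sum += x[i];
          }
          printf("sum = %f\n", sum);
          return 0;
      }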

  25. OpenMP Library functions • omp_get/set_num_threads() • omp_get_max_threads() • omp_get_thread_num() • omp_get_num_procs() • omp_in_parallel() • omp_get/set_(dynamic/nested)() • omp_init/destroy/test_lock() • omp_set/unset_lock()
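  A small sketch exercising a few of the query functions (the printed labels are mine, not from the slide):

      #include <stdio.h>
      #include <omp.h>

      int main(void)
      {
          omp_set_num_threads(4);                       /* request four threads */
          printf("processors available: %d\n", omp_get_num_procs());
          printf("in parallel here?   : %d\n", omp_in_parallel());  /* 0 outside a region */

          #pragma omp parallel
          {
              /* inside the region each thread sees its own id */
              printf("thread %d of %d\n",
                     omp_get_thread_num(), omp_get_num_threads());
          }
          return 0;
      }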

  26. OpenMP environment variables • OMP_SCHEDULE • OMP_NUM_THREADS • OMP_DYNAMIC • OMP_NESTED

  27. OpenMP Reduction and Atomic Operators • Reduction : +,-,*,&,|,&&,|| • Atomic : ++,--,+,*,-,/,&,>>,<<,|
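  A sketch contrasting the two mechanisms (variable names are illustrative): a reduction combines per-thread partial results once at the end, while atomic serializes each individual update of a shared location:

      #include <stdio.h>

      int main(void)
      {
          int i, n = 1000000;
          double sum = 0.0;
          int hits = 0;

          #pragma omp parallel for reduction(+:sum)
          for (i = 1; i <= n; i++)
              sum += 1.0 / i;          /* partial sums merged once per thread */

          #pragma omp parallel for
          for (i = 0; i < n; i++) {
              #pragma omp atomic       /* serialize just this one update */
              hits++;
          }
          printf("sum = %f, hits = %d\n", sum, hits);
          return 0;
      }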

  28. Simple loops

          do I = 1, N
             z(I) = a * x(I) + y
          end do

    !$OMP parallel do
          do I = 1, N
             z(I) = a * x(I) + y
          end do

  29. Data Scoping • Loop index private by default • Declare as shared, private or reduction

  30. Private variables

    !$OMP parallel do private(a,b,c)
          do I = 1, m
             do j = 1, n
                b = f(I)
                c = k(j)
                call abc(a, b, c)
             end do
          end do

      The equivalent directive in C: #pragma omp parallel for private(a,b,c) (see the sketch below)
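  A self-contained C rendering of the same idea (f, k, and abc here are stand-in definitions, not the slide's routines):

      #include <stdio.h>

      /* stand-ins for the slide's f, k and abc */
      static double f(int i) { return (double)i; }
      static double k(int j) { return 2.0 * j; }
      static void   abc(double a, double b, double c)
      {
          if (b + c < a) printf("unexpected\n");   /* placeholder work */
      }

      int main(void)
      {
          int i, j, m = 100, n = 100;
          double a = 1.0, b, c;

          /* b, c and the inner index j must be private: sharing them
             would let threads overwrite each other's temporaries */
          #pragma omp parallel for private(j, b, c) firstprivate(a)
          for (i = 1; i <= m; i++)
              for (j = 1; j <= n; j++) {
                  b = f(i);
                  c = k(j);
                  abc(a, b, c);
              }
          return 0;
      }

  Note that in C the inner index j must be listed explicitly; only the loop variable of the parallel for itself is private by default.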

  31. Dependencies • Data dependencies (lexical/dynamic extent) • Flow dependencies • Classifying and removing the dependencies • Non-removable dependencies • Examples:

          do I = 2, n
             a(I) = a(I) + a(I-1)
          end do

          do I = 2, N, 2
             a(I) = a(I) + a(I-1)
          end do

      The first loop carries a flow dependence – iteration I reads the value iteration I-1 just wrote – so its iterations cannot run concurrently; in the stride-2 loop a(I-1) is always an odd-indexed element that no iteration writes, so the iterations are independent.
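  The same pair of loops in C, as a sketch; only the second is safe to parallelize:

      #include <stdio.h>

      #define N 16

      int main(void)
      {
          double a[N];
          int i;
          for (i = 0; i < N; i++) a[i] = 1.0;

          /* Flow dependence: iteration i reads what i-1 just wrote,
             so this loop must run in order (not parallelizable). */
          for (i = 1; i < N; i++)
              a[i] = a[i] + a[i-1];

          /* Stride 2: a[i-1] is an odd-index element no iteration
             writes, so the iterations are independent. */
          #pragma omp parallel for
          for (i = 2; i < N; i += 2)
              a[i] = a[i] + a[i-1];

          printf("a[N-1] = %f\n", a[N-1]);
          return 0;
      }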

  32. Making sure everyone has enough work • Parallel overhead – thread creation and synchronization vs. work done in the loop • !$OMP parallel do schedule(dynamic,3) • schedule types – static, dynamic, guided, runtime (see the sketch below)
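  A sketch of the trade-off with deliberately uneven iterations (the work function is invented): schedule(dynamic,3) hands three iterations at a time to whichever thread is free, at the cost of extra synchronization.

      #include <stdio.h>
      #include <omp.h>

      /* iterations get progressively more expensive */
      static double work(int i)
      {
          double s = 0.0;
          int k;
          for (k = 0; k < i * 1000; k++)
              s += 1.0 / (k + 1);
          return s;
      }

      int main(void)
      {
          double total = 0.0;
          int i;

          /* static scheduling would hand the last thread the most work;
             dynamic,3 rebalances the load at run time */
          #pragma omp parallel for schedule(dynamic,3) reduction(+:total)
          for (i = 0; i < 100; i++)
              total += work(i);

          printf("total = %f\n", total);
          return 0;
      }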

  33. Parallel regions – from fine to coarse parallelism • !$OMP parallel • threadprivate and copyin • Work sharing constructs – do, sections, section, single • Synchronization – critical, atomic, barrier, ordered, master (combined in the sketch below)
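  One coarse-grained region enclosing several of these constructs, as an illustrative sketch:

      #include <stdio.h>
      #include <omp.h>

      #define N 1000

      int main(void)
      {
          double x[N];
          double sum = 0.0;
          int i;

          #pragma omp parallel private(i)
          {
              #pragma omp single       /* exactly one thread prints this */
              printf("threads: %d\n", omp_get_num_threads());

              #pragma omp for          /* iterations shared among threads */
              for (i = 0; i < N; i++)
                  x[i] = (double)i;

              #pragma omp for reduction(+:sum)
              for (i = 0; i < N; i++)
                  sum += x[i];
              /* the implicit barrier after each 'for' keeps the phases ordered */

              #pragma omp master       /* only thread 0, no implied barrier */
              printf("sum = %f\n", sum);
          }
          return 0;
      }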

  34. To distributed memory systems • MPI, PVM, BSP …
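  For the distributed-memory systems the slide points to, the canonical MPI starting point (a sketch; build and launch commands vary by installation, e.g. mpicc hello.c and mpirun -np 4 a.out under LAM/MPI):

      #include <stdio.h>
      #include <mpi.h>

      int main(int argc, char **argv)
      {
          int rank, size;

          MPI_Init(&argc, &argv);                 /* start the MPI runtime */
          MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* this process's id */
          MPI_Comm_size(MPI_COMM_WORLD, &size);   /* total process count */

          printf("Hello from process %d of %d\n", rank, size);

          MPI_Finalize();
          return 0;
      }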

  35. Some Parallel Libraries Existing parallel libraries and toolkits include: • PUL, the Parallel Utilities Library from EPCC. • The Multicomputer Toolbox from Tony Skjellum and colleagues at LLNL and MSU. • The Portable, Extensible Toolkit for Scientific Computation (PETSc) from ANL. • ScaLAPACK from ORNL and UTK. • ESSL and PESSL on AIX. • PBLAS, PLAPACK, ARPACK.
