
Principles of Parallel Programming, First Edition, by Calvin Lin and Lawrence Snyder






Presentation Transcript


  1. Chapter 7: MPI and Other Local View Languages. Principles of Parallel Programming, First Edition, by Calvin Lin and Lawrence Snyder

  2. Figure 7.1 An MPI solution to the Count 3s problem.

  3. Figure 7.1 An MPI solution to the Count 3s problem. (cont.)

  4. Code Spec 7.1 MPI_Init().

  5. Code Spec 7.2 MPI_Finalize().

  6. Code Spec 7.3 MPI_Comm_size().

  7. Code Spec 7.4 MPI_Comm_rank().

  8. Code Spec 7.5 MPI_Send().

  9. Code Spec 7.6 MPI_Recv().

  10. Code Spec 7.7 MPI_Reduce().

  11. Code Spec 7.8 MPI_Scatter().

  12. Code Spec 7.8 MPI_Scatter(). (cont.)

  13. Figure 7.2 Replacement code (for lines 16–48 of Figure 7.1) to distribute data using a scatter operation.

  14. Code Spec 7.9 MPI_Gather().

  15. Figure 7.3 Each message must be copied as it moves across four address spaces, each contributing to the overall latency.

  16. Code Spec 7.10 MPI_Scan().

  17. Code Spec 7.11 MPI_Bcast(). MPI routine to broadcast data from one root process to all other processes in the communicator.

  18. Code Spec 7.12 MPI_Barrier().

  19. Code Spec 7.13 MPI_Wtime().

  20. Figure 7.4 Example of collective communication within a group.

  21. Code Spec 7.14 MPI_Comm_group().

  22. Code Spec 7.15 MPI_Group_incl().

  23. Code Spec 7.16 MPI_Comm_create().

  24. Figure 7.5 A 2D relaxation replaces—on each iteration—all interior values by the average of their four nearest neighbors.

  25. Figure 7.6 MPI code for the main loop of the 2D SOR computation.

  26. Figure 7.6 MPI code for the main loop of the 2D SOR computation. (cont.)

  27. Figure 7.6 MPI code for the main loop of the 2D SOR computation. (cont.)

  28. Figure 7.7 Depiction of dynamic work redistribution in MPI.

  29. Figure 7.8 A 2D SOR MPI program using non-blocking sends and receives.

  30. Figure 7.8 A 2D SOR MPI program using non-blocking sends and receives. (cont.)

  31. Figure 7.8 A 2D SOR MPI program using non-blocking sends and receives. (cont.)

  32. Code Spec 7.17 MPI_Waitall().

  33. Figure 7.9 Creating a derived data type.

  34. Partitioned Global Address Space (PGAS) Languages • Higher level of abstraction • Built on top of distributed-memory clusters • The cluster is treated as a single address space • Allows definition of global data structures • Must still distinguish local from global data • No need to manage message-passing details or distributed data structures • Built on a more efficient one-sided communication substrate

  35. Main PGAS Languages • Co-Array Fortran • https://bluewaters.ncsa.illinois.edu/caf • Unified Parallel C • http://upc.lbl.gov/ • Titanium • http://titanium.cs.berkeley.edu/

  36. Co-Array Fortran (CAF) • Extends Fortran • Originally called F-- • Elegant and simple • Uses co-arrays (communication arrays) • real, dimension(n,n)[p,*] :: a, b, c • a, b, and c are co-arrays • Memory for a co-array is distributed across the processes, as determined by the bracketed co-dimensions in the declaration

  37. Unified Parallel C (UPC) • Global view of the address space • Shared arrays are distributed in a cyclic or block-cyclic arrangement (which aids load balancing) • Extends C's pointers with four combinations of private and shared: • private pointer to private data • private pointer to shared data • shared pointer to private data • shared pointer to shared data

  38. UPC pointers • Private pointer into private space: int *p1; • Private pointer into shared space: shared int *p2; • Shared pointer into private space: int *shared p3; • Shared pointer into shared space: shared int *shared p4;

  39. UPC • Has a forall construct, upc_forall • Distributes the iterations of a normal C for loop across all threads • A global operation, whereas most other UPC operations are local

  40. Titanium • Extends Java • Object-oriented • Adds regions, which support safe memory management • Unordered iteration: foreach • Allows concurrency over multiple indices in a block
