
Types of Parallelism


Presentation Transcript


  1. Types of Parallelism
     • Overt
       - Parallelism is visible to the programmer
       - Difficult to do (right)
       - Large improvements in performance
     • Covert
       - Parallelism is not visible to the programmer
       - Compiler responsible for parallelism
       - Easy to do
       - Small improvements in performance

  2. Parallel Architectures
     • For a long time, parallel programs were written with a specific architecture in mind
       - Programs would only run on one type of machine, perhaps even on only one machine
       - A programmer had to fit a problem to a specific architecture
     • Over time, programmers started writing programs in a particular style
       - The programs are then mapped onto a specific machine by a compiler

  3. Problem Architectures
     • Synchronous (SIMD)
       - The same operation is performed on all data points at the same time
     • Loosely Synchronous (SPMD)
       - The same operations are performed by all processors, but they need not be done at exactly the same time
       - Not synchronized at the computer clock cycle, but only macroscopically, "every now and then"
     • Asynchronous (MPMD)
       - Every processor executes its own instructions on its own data

  4. SIMD
     [Diagram: a single controller issues one program to processors P0 through Pn-1, all connected by an interconnection network]

  5. SPMD
     [Diagram: the same program runs on each of processors P0 through Pn-1 – but no longer strictly synchronized – connected by an interconnection network]

  6. MPMD
     [Diagram: a different program (Prog0 through Progn-1) runs on each of processors P0 through Pn-1, connected by an interconnection network]

  7. Processes
     • One can view a parallel program as consisting of a number of independent processes
       - These processes are mapped to the physical processors
       - Ideally(?) one process per processor
       - You can also think of these as threads, although technically threads are a different sort of beast
       - For program development we do not really care about the mapping
     • Two ways to create processes
       - Static
         · All processes are specified before execution
         · The system executes a fixed number of processes
         · In a world where there is a mapping between process and processor, this is the only view that makes sense
       - Dynamic
         · Processes can be created at runtime
         · More powerful, but incurs overhead at runtime

  8. Communication
     • Communication is vital in any kind of distributed application
     • Initially most people wrote their own protocols
       - Tower of Babel effect
     • Eventually standards appeared
       - Parallel Virtual Machine (PVM)
       - Message Passing Interface (MPI)

  9. Message Passing
     • In basic message passing, processes coordinate activities by explicitly sending and receiving messages
     • Commonly used in distributed-memory MIMD systems
     • Programming in an MP environment can be achieved by
       - Designing a special parallel language
         · Occam
       - Extending an existing sequential language to include MP constructs
         · Inmos C
       - Using a middleware layer that, in conjunction with an existing language, provides MP facilities
         · MPI
         · Parallel Java
         · PVM

  10. Synchronous Message Passing
     [Diagram: if recv(x) is posted first, it blocks until the matching send(2, x) is complete; if send(2, y) is issued first, it blocks until the matching recv(y) is complete; either way the matching pair forms a synchronization point]

  11. Asynchronous Message Passing
     [Diagram: send(2, x) copies the message to a buffer and continues; the matching recv(x) may or may not block; there is no synchronization point between send(2, y) and recv(y); an MPI sketch of both styles follows below]
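
These two behaviours map onto MPI's send variants (MPI itself is introduced later in these slides). A minimal sketch, assuming two ranks in MPI_COMM_WORLD; the rank numbers, the tag 0, and the single-int payload are illustrative choices:

    #include <mpi.h>

    /* Sketch: synchronous vs. buffered/asynchronous sends from rank 0 to
       rank 1. The rank numbers, tag 0, and int payload are illustrative. */
    void send_styles( int myRank )
    {
        int x = 42, y;
        MPI_Status status;

        if ( myRank == 0 ) {
            /* Synchronous send: does not complete until the matching
               receive has started, so it creates a synchronization point */
            MPI_Ssend( &x, 1, MPI_INT, 1, 0, MPI_COMM_WORLD );

            /* Standard send: MPI may copy the message into an internal
               buffer and return at once, as in the asynchronous picture;
               it is free to block if no buffer space is available */
            MPI_Send( &x, 1, MPI_INT, 1, 0, MPI_COMM_WORLD );
        } else if ( myRank == 1 ) {
            /* The receives block until a matching message arrives */
            MPI_Recv( &y, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &status );
            MPI_Recv( &y, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &status );
        }
    }

MPI_Ssend gives the synchronous behaviour of slide 10; the standard MPI_Send is allowed to behave like the buffered, asynchronous picture of slide 11.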

  12. Broadcast
     [Diagram: one process sends the same data from its buffer to every process P0 through P3; may or may not be synchronous]

  13. Multicast
     [Diagram: like broadcast, one process sends the same data from its buffer to P0 through P3; multicast targets a selected group of processes rather than every process; may or may not be synchronous]

  14. Scatter
     [Diagram: one process splits the data in its buffer and sends a different piece to each of P0 through P3; may or may not be synchronous]

  15. Gather
     [Diagram: each of P0 through P3 sends its piece of data to one process, which collects the pieces into its buffer; may or may not be synchronous]
     (MPI's collective calls for broadcast, scatter, and gather are sketched below.)
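
In MPI the broadcast, scatter, and gather patterns are single collective calls. A minimal sketch, assuming a four-process job with root rank 0; those choices, and the one-int-per-process payload, are illustrative:

    #include <mpi.h>

    /* Sketch: MPI collectives for the patterns above. A 4-process job,
       root rank 0, and one int per process are illustrative assumptions. */
    void collectives( int myRank )
    {
        int value = 0;          /* broadcast payload                    */
        int piece = 0;          /* each process's slice of the scatter  */
        int table[ 4 ];         /* root's send/receive buffer (4 ranks) */

        if ( myRank == 0 ) {
            value = 99;
            table[ 0 ] = 10;  table[ 1 ] = 20;  table[ 2 ] = 30;  table[ 3 ] = 40;
        }

        /* Broadcast: every process ends up with the root's value */
        MPI_Bcast( &value, 1, MPI_INT, 0, MPI_COMM_WORLD );

        /* Scatter: the root hands one element of table to each process */
        MPI_Scatter( table, 1, MPI_INT, &piece, 1, MPI_INT, 0, MPI_COMM_WORLD );

        /* Gather: the root collects one element back from each process */
        MPI_Gather( &piece, 1, MPI_INT, table, 1, MPI_INT, 0, MPI_COMM_WORLD );
    }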

  16. Reduction
     • A method to calculate a value under an associative, commutative operation (e.g., sum, product, minimum, maximum, ...) in log P steps
     • Think of summing the values in a tree
     [Diagram: a binary tree with leaves 10, 5, 15, 4, 6, 8, 1, 7; the partial sums 15, 19, 14, 8 and then 34, 22 appear at the internal nodes, with the total 56 at the root]

  17. Reduction
     • If each node is a process...
       - Pairs of nodes at the bottom add their values and pass the result to their parent
       - Pairs at the next level do the same
       - Repeat until the total reaches the root
     [Diagram: the same summation tree, with leaves 10, 5, 15, 4, 6, 8, 1, 7 and the total 56 at the root]

  18. Reduction
     • Instead of a tree, consider the reduction across a group of eight processors
       Processor:  7   6   5   4   3   2   1   0
       Value:      10  5   15  4   6   8   1   7

  19. Reduction
     • Sum to the even processors: each odd-numbered processor sends its value to the even-numbered processor next to it
       Processor:  7   6   5   4   3   2   1   0
       Value:      10  15  15  19  6   14  1   8

  20. Reduction
     • Repeat: the surviving partial sums are combined on the processors whose rank is a multiple of four
       Processor:  7   6   5   4   3   2   1   0
       Value:      10  15  15  34  6   14  1   22

  21. Reduction
     • Repeat one last time: processor 4 sends its partial sum to processor 0, which now holds the total
       Processor:  7   6   5   4   3   2   1   0
       Value:      10  15  15  34  6   14  1   56

  22. Think Binary
     [Diagram: the eight processors labeled with their ranks in binary, 111 down to 000]

  23.–28. Step 1, Step 2, Step 3
     [Diagrams: the three communication steps shown on the binary ranks. In Step 1, ranks 001, 011, 101, and 111 each send to the partner whose rank differs only in the lowest bit; in Step 2, ranks 010 and 110 send to the partner differing in the middle bit; in Step 3, rank 100 sends to rank 000, which then holds the total]

  29. Programming It
     • Mask: 001 – each process computes (rank AND mask)
       Rank:        111 110 101 100 011 010 001 000
       Rank & 001:  001 000 001 000 001 000 001 000
     • A nonzero result means the process sends this step; zero means it receives

  30. Programming It
     • Mask: 010 – the processes still active (110, 100, 010, 000) compute (rank AND mask)
       Rank:        110 100 010 000
       Rank & 010:  010 000 010 000

  31. Programming It
     • Mask: 100 – the two remaining processes compute (rank AND mask)
       Rank:        100 000
       Rank & 100:  100 000

  32. Reduction
     • Okay, now I know who sends when, but...
     • How do I know who to send to?

  33. Programming It
     • Mask: 001 – bitwise AND picks the senders; bitwise XOR with the mask gives each sender its destination
       Rank:        111 110 101 100 011 010 001 000
       Rank & 001:  001 000 001 000 001 000 001 000
       Rank ^ 001:  110     100     010     000

  34. Programming It
     • Mask: 010 – among the active processes, bitwise AND again picks the senders and bitwise XOR gives their destinations
       Rank:        110 100 010 000
       Rank & 010:  010 000 010 000
       Rank ^ 010:  100     000

  35. Programming It
     • Mask: 100 – the last step: 100 sends its partial sum to 000 (the full loop is sketched below)
       Rank:        100 000
       Rank & 100:  100 000
       Rank ^ 100:  000
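
Putting the mask slides together, the whole reduction is a short loop: at each step, the bitwise AND says whether a process sends or receives, and the bitwise XOR names its partner. A sketch, assuming the number of processes is a power of two; the function and variable names and the tag 0 are illustrative, not the course's prescribed solution:

    #include <mpi.h>

    /* Sketch: sum-reduction onto rank 0 using the mask scheme above.
       Assumes the number of processes is a power of two; the function
       and variable names and the tag 0 are illustrative choices. */
    int tree_sum( int myValue, int myRank, int numProcs )
    {
        int sum = myValue;
        int received;
        int partner;
        int mask;
        MPI_Status status;

        for ( mask = 1; mask < numProcs; mask <<= 1 ) {
            if ( ( myRank & ( mask - 1 ) ) != 0 )
                break;                        /* already sent in an earlier step */
            partner = myRank ^ mask;          /* bitwise XOR names the partner   */
            if ( ( myRank & mask ) != 0 ) {
                /* bitwise AND is nonzero: send the partial sum and drop out */
                MPI_Send( &sum, 1, MPI_INT, partner, 0, MPI_COMM_WORLD );
            } else {
                /* bitwise AND is zero: receive the partner's sum and add it */
                MPI_Recv( &received, 1, MPI_INT, partner, 0, MPI_COMM_WORLD, &status );
                sum += received;
            }
        }
        return sum;                           /* rank 0 ends up with the total */
    }

Only rank 0 never takes the send branch, so after the last step it alone holds the full total.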

  36. Reduce
     [Diagram: P0 through P3 each contribute their data; the values are combined with an operator (here +) into one process's buffer; may or may not be synchronous; the corresponding MPI call is sketched below]
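
In practice the hand-written loop above is rarely needed, because MPI packages the whole pattern as a collective. A minimal sketch; root rank 0 and the MPI_SUM operation are illustrative choices:

    #include <stdio.h>
    #include <mpi.h>

    /* Sketch: the same sum delivered by the MPI_Reduce collective.
       Root rank 0 and the MPI_SUM operation are illustrative choices. */
    void collective_sum( int myValue, int myRank )
    {
        int total = 0;

        /* Every process contributes myValue; rank 0 receives the sum */
        MPI_Reduce( &myValue, &total, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD );

        if ( myRank == 0 )
            printf( "total = %d\n", total );
    }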

  37. What is MPI?
     • A message passing library specification
       - Message-passing model
       - Not a compiler specification (i.e., not a language)
       - Not a specific product
     • Designed for parallel computers, clusters, and heterogeneous networks
     • Lets users, tool writers, and library developers concentrate on their own code rather than on low-level communication code
     • API
     • Middleware

  38. The MPI Process
     • Development began in early 1992
     • Open process / broad participation
       - IBM, Intel, TMC, Meiko, Cray, Convex, nCUBE
       - PVM, p4, Express, Linda, ...
       - Laboratories, universities, government
     • Final version of the draft in May 1994
     • Public and vendor implementations are now widely available

  39. Why Message Passing?
     • Message passing is a mature paradigm
       - CSP was developed in 1978
       - Well understood
       - Relatively easy to match to distributed hardware
     • Goal was to provide a full-featured, portable system
       - Modularity
       - Peak performance
       - Portability
       - Heterogeneity
       - Performance measurement tools

  40. Features
     • Communicators
       - A collection of processes that can send messages to each other
     • Point-to-point communication
     • Collective communication
       - Barrier synchronization
       - Broadcast
       - Gather/scatter data
       - All-to-all exchange of data
       - Global reduction
       - Scan across all members of a communicator

  41. Bare bones MPI Program

     #include <mpi.h>

     int main( int argc, char **argv )
     {
         // Non-MPI stuff can go here

         MPI_Init( &argc, &argv );

         // Your parallel code goes here

         MPI_Finalize();

         // Non-MPI stuff can go here

         return 0;
     }

  42. Odds and Ends
     • Even though processes are running on different processors, you can print using printf()
       - No promise about ordering
       - Very useful for debugging
     • Supposedly scanf() works as well
       - Be sure to use the -i option
     • Although it appears that argc and argv do what you expect, in some implementations they do not work
       - Send messages instead
     • Be careful with random number generators
       - If every process seeds with the same value, the numbers will not be very random (see the sketch below)
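
One common fix for the seeding problem is to fold the process rank into the seed. A small sketch; the exact offset formula is an illustrative choice:

    #include <stdlib.h>
    #include <time.h>
    #include <mpi.h>

    /* Sketch: give each process a different random stream by folding its
       rank into the seed. The exact formula is an illustrative choice. */
    void seed_per_rank( void )
    {
        int myRank;
        MPI_Comm_rank( MPI_COMM_WORLD, &myRank );
        srand( (unsigned) time( NULL ) + 1000 * myRank );
    }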

  43. Communicators
     • Many MPI calls require a communicator
     • A communicator is a collection of processes that can send messages to each other
       - Think of a communicator as defining a group
       - Only processes in the same communicator can communicate
       - Allows you to segment your communication traffic (see the sketch below)
     • Every process belongs to the MPI_COMM_WORLD communicator
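
One way to carve MPI_COMM_WORLD into smaller groups is MPI_Comm_split, which is standard MPI although it is not covered on these slides. A sketch; splitting on even/odd rank is an illustrative choice:

    #include <mpi.h>

    /* Sketch: split MPI_COMM_WORLD into two smaller communicators,
       one for even ranks and one for odd ranks (illustrative choice). */
    void split_world( void )
    {
        int worldRank, subRank;
        MPI_Comm half;

        MPI_Comm_rank( MPI_COMM_WORLD, &worldRank );

        /* Processes passing the same "color" end up in the same communicator */
        MPI_Comm_split( MPI_COMM_WORLD, worldRank % 2, worldRank, &half );

        /* Ranks are renumbered from 0 within the new communicator */
        MPI_Comm_rank( half, &subRank );

        MPI_Comm_free( &half );
    }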

  44. Getting Information
     • You can gather information about your environment
     • MPI_Comm_rank( communicator, &retVal );
       - Returns your rank – the first process gets 0
     • MPI_Get_processor_name( str_array, &length );
       - Returns information about the processor
       - The name buffer should hold MPI_MAX_PROCESSOR_NAME characters

  45. HelloWorldPrint.c

     #include <stdio.h>
     #include <mpi.h>

     int main( int argc, char** argv )
     {
         int myRank;
         int nameLen;
         char myName[ MPI_MAX_PROCESSOR_NAME ];

         /* Initialize MPI */
         MPI_Init( &argc, &argv );

         /* Obtain information about the process */
         MPI_Comm_rank( MPI_COMM_WORLD, &myRank );
         MPI_Get_processor_name( myName, &nameLen );

         /* Standard print */
         printf( "Hello world from process #%d on %s\n", myRank, myName );

         /* Terminate MPI */
         MPI_Finalize();

         return 0;
     }

  46. Compiling Parallel Programs
     • All clusters within the CS department are running Sun's HPC software
       - Contains a variety of tools – including MPI
     • Everything (including documentation) is in /opt/SUNWhpc
       - Executables are in /opt/SUNWhpc/bin
       - You should probably add that to your path
     • Note that only the "clusters" have this software installed
     • See http://www.cs.rit.edu/~ark/runningpj.shtml for details
     • Compile MPI C programs using mpcc:

       mpcc HelloWorldPrint.c -o hello -lmpi

  47. CS Parallel Resources
     • SMP parallel computers
       - paradise/parasite – 8 processors, 1.35 GHz clock, 16 GB RAM
       - paradox/paragon – 4 processors, 450 MHz clock, 4 GB RAM
     • Cluster parallel computer
       - paranoia.cs.rit.edu (296 MHz clock, 192 MB RAM)
       - 32 backend computers (thug01 through thug32) – each an UltraSPARC-IIe CPU, 650 MHz clock, 1 GB RAM
       - 100-Mbps switched Ethernet backend interconnection network
     • Hybrid SMP cluster parallel computer (not for class use)
       - tardis.cs.rit.edu (650 MHz clock, 512 MB RAM)
       - 10 backend computers (dr00 through dr09) – each with two AMD Opteron processors (four CPUs), 2.6 GHz clock, 8 GB RAM
       - 1-Gbps switched Ethernet backend interconnection network

  48. Running Parallel Programs
     • Rules of engagement
       - Use the paradox and paradise machines to run SMP parallel programs
       - Use the java mprun command on the paranoia machine to run MPI cluster parallel programs; do not use the mprun command directly
       - Run Parallel Java cluster programs on the paranoia machine
       - Details at http://www.cs.rit.edu/~ark/runningpj.shtml
     • Account setup
       - You need to set up your account so you can ssh to the parallel machines without specifying a password
       - You need to include the Parallel Java libraries in your classpath

  49. Sample Run

     paranoia> mpcc HelloWorldPrint.c -o hello -lmpi
     paranoia> java mprun -np 6 hello
     Job 2, thug05, thug06, thug07, thug08, thug09, thug10
     Hello world from process #1 on thug06 SunOS 5.9 SUNW,Sun-Blade-100 Sun_Microsystems
     Hello world from process #2 on thug07 SunOS 5.9 SUNW,Sun-Blade-100 Sun_Microsystems
     Hello world from process #3 on thug08 SunOS 5.9 SUNW,Sun-Blade-100 Sun_Microsystems
     Hello world from process #4 on thug09 SunOS 5.9 SUNW,Sun-Blade-100 Sun_Microsystems
     Hello world from process #5 on thug10 SunOS 5.9 SUNW,Sun-Blade-100 Sun_Microsystems
     Hello world from process #0 on thug05 SunOS 5.9 SUNW,Sun-Blade-100 Sun_Microsystems
     paranoia>

  50. Sending/Receiving Messages
     • MPI places messages inside "envelopes"
     • Point-to-point messages are sent/received using
       - MPI_Send( buffer, count, type, dest, tag, comm );
       - MPI_Recv( buffer, count, type, src, tag, comm, &status );
     • These are blocking calls
       - They return when the buffer is available/full (see the sketch below)
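
A small sketch of a point-to-point exchange using these calls; the ranks, the tag 7, and the int payload are illustrative choices:

    #include <stdio.h>
    #include <mpi.h>

    /* Sketch: rank 0 sends one int to rank 1, which prints it.
       The ranks, tag 7, and payload are illustrative choices. */
    void ping( int myRank )
    {
        int payload = 123;
        MPI_Status status;

        if ( myRank == 0 ) {
            MPI_Send( &payload, 1, MPI_INT, 1, 7, MPI_COMM_WORLD );
        } else if ( myRank == 1 ) {
            MPI_Recv( &payload, 1, MPI_INT, 0, 7, MPI_COMM_WORLD, &status );
            printf( "rank 1 got %d from rank 0\n", payload );
        }
    }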
