
Ace104 Lecture 8


Presentation Transcript


  1. Ace104 Lecture 8: Tightly Coupled Components, MPI (Message Passing Interface)

  2. Motivation • To this point we have focused on highly granular, loosely coupled components via web services (i.e., using SOAP/XML/HTTP) • Some components need to couple more tightly, for example because of • The rate and volume of data exchange • The granularity of interfaces • These components are normally controlled in a unified “back-end” environment, so inter-component security is a less prominent issue

  3. Multi-grained services • Tight coupling implies fine granularity, but not necessarily an RPC architectural style • Real-world architectures are built of multi-grained components • Low-granularity, loosely coupled components communicating via web services • These components are themselves made up of high-granularity (sub)components communicating via some more efficient mechanism • Java RMI • Raw sockets • MPI, etc.

  4. Role of MPI -- HPC is not all • One good example of this is speeding up numerical operations by parallelization • Risk management, option pricing, data mining, flow simulation, etc. • These faster components can then be coupled via web services (e.g. this is the common architectural model of Grid Computing) • However, tight coupling is more general than parallel computing • It can be used for any sub-service where performance matters, and has gained popularity recently in this area

  5. Standardization • Parallel computing community has resorted to “community-based” standards • HPF • MPI • OpenMP? • Some commercial products are becoming “de facto” standards, but only because they are portable • TotalView parallel debugger, PBS batch scheduler

  6. Risks of Standardization • Failure to involve all stakeholders can result in standard being ignored • application programmers • researchers • vendors • Premature standardization can limit production of new ideas by shutting off support for further research projects in the area

  7. Models for Parallel Computation • Shared memory (load, store, lock, unlock) • Message Passing (send, receive, broadcast, ...) • Transparent (compiler works magic) • Directive-based (compiler needs help) • Others (BSP, OpenMP, ...) • Task farming (scientific term for large transaction processing)

  8. The Message-Passing Model • A process is (traditionally) a program counter and address space • Processes may have multiple threads (program counters and associated stacks) sharing a single address space • Message passing is for communication among processes, which have separate address spaces • Interprocess communication consists of • synchronization • movement of data from one process’s address space to another’s

  9. What is MPI? • A message-passing library specification • extended message-passing model • not a language or compiler specification • not a specific implementation or product • For parallel computers, clusters, and heterogeneous networks • Full-featured • Designed to provide access to advanced parallel hardware for end users, library writers, and tool developers

  10. Where Did MPI Come From? • Early vendor systems (Intel’s NX, IBM’s EUI, TMC’s CMMD) were not portable (or very capable) • Early portable systems (PVM, p4, TCGMSG, Chameleon) were mainly research efforts • Did not address the full spectrum of issues • Lacked vendor support • Were not implemented at the most efficient level • The MPI Forum organized in 1992 with broad participation by: • vendors: IBM, Intel, TMC, SGI, Convex, Meiko • portability library writers: PVM, p4 • users: application scientists and library writers • finished in 18 months

  11. Novel Features of MPI • Communicators encapsulate communication spaces for library safety • Datatypes reduce copying costs and permit heterogeneity • Multiple communication modes allow precise buffer management • Extensive collective operations for scalable global communication • Process topologies permit efficient process placement, user views of process layout • Profiling interface encourages portable tools

  12. MPI References • The Standard itself: • at http://www.mpi-forum.org • All official MPI releases, in both PostScript and HTML • Books: • Using MPI: Portable Parallel Programming with the Message-Passing Interface, 2nd Edition, by Gropp, Lusk, and Skjellum, MIT Press, 1999. Also Using MPI-2, with R. Thakur • MPI: The Complete Reference, 2 vols., MIT Press, 1999 • Designing and Building Parallel Programs, by Ian Foster, Addison-Wesley, 1995 • Parallel Programming with MPI, by Peter Pacheco, Morgan-Kaufmann, 1997 • Other information on the Web: • at http://www.mcs.anl.gov/mpi • pointers to lots of material, including other talks and tutorials, a FAQ, and other MPI pages

  13. send/recv • Basic MPI functionality • MPI_Send(void *buf, int count, MPI_Datatype type, int dest, int tag, MPI_Comm comm) • MPI_Recv(void *buf, int count, MPI_Datatype type, int src, int tag, MPI_Comm comm, MPI_Status *stat) • The receive fills in an MPI_Status struct with at least the following fields • stat.MPI_SOURCE • stat.MPI_TAG • stat.MPI_ERROR
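
  A minimal sketch of a blocking send/recv pair using these calls (assumes the program is launched with at least two processes; the message value and tag are illustrative, not from the lecture code):

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char *argv[]) {
        int rank, value;
        MPI_Status stat;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        if (rank == 0) {
            value = 42;
            /* send one int to rank 1 with tag 0 */
            MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
        } else if (rank == 1) {
            /* receive one int from rank 0; stat records source, tag, error */
            MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &stat);
            printf("rank 1 received %d from rank %d\n", value, stat.MPI_SOURCE);
        }

        MPI_Finalize();
        return 0;
    }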

  14. Blocking vs. non-blocking • The send/recv functions on the previous slide are referred to as blocking point-to-point communication • MPI also has non-blocking send/recv functions that will be studied next class – MPI_Isend, MPI_Irecv • The semantics of the two are very different – you must be very careful to understand the rules in order to write safe programs

  15. Blocking recv • Semantics of blocking recv • A blocking receive can be started whether or not a matching send has been posted • A blocking receive returns only after its receive buffer contains the newly received message • A blocking receive can complete before the matching send has completed (but only after it has started)

  16. Blocking send • Semantics of blocking send • Can start whether or not a matching recv has been posted • Returns only after the data in the send buffer is safe to be overwritten • This can mean that the data was either buffered or sent directly to the receiving process • Which happens is up to the implementation • Very strong implications for writing safe programs

  17. Examples

    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    if (rank == 0) {
        MPI_Send(sendbuf, count, MPI_DOUBLE, 1, tag, comm);
        MPI_Recv(recvbuf, count, MPI_DOUBLE, 1, tag, comm, &stat);
    } else if (rank == 1) {
        MPI_Recv(recvbuf, count, MPI_DOUBLE, 0, tag, comm, &stat);
        MPI_Send(sendbuf, count, MPI_DOUBLE, 0, tag, comm);
    }

  Is this program safe? Why or why not?
  Yes, this is safe even if no buffer space is available!

  18. Examples

    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    if (rank == 0) {
        MPI_Recv(recvbuf, count, MPI_DOUBLE, 1, tag, comm, &stat);
        MPI_Send(sendbuf, count, MPI_DOUBLE, 1, tag, comm);
    } else if (rank == 1) {
        MPI_Recv(recvbuf, count, MPI_DOUBLE, 0, tag, comm, &stat);
        MPI_Send(sendbuf, count, MPI_DOUBLE, 0, tag, comm);
    }

  Is this program safe? Why or why not?
  No, this will always deadlock!

  19. Examples

    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    if (rank == 0) {
        MPI_Send(sendbuf, count, MPI_DOUBLE, 1, tag, comm);
        MPI_Recv(recvbuf, count, MPI_DOUBLE, 1, tag, comm, &stat);
    } else if (rank == 1) {
        MPI_Send(sendbuf, count, MPI_DOUBLE, 0, tag, comm);
        MPI_Recv(recvbuf, count, MPI_DOUBLE, 0, tag, comm, &stat);
    }

  Is this program safe? Why or why not?
  Often, but not always! It depends on buffer space.

  20. Message order • Messages in MPI are said to be non-overtaking • That is, messages sent from one process to another process are guaranteed to arrive in the order they were sent • However, nothing is guaranteed about the relative order of messages sent from different processes, regardless of when the sends were initiated

  21. Illustration of message ordering (figure): three processes, P0 (send), P1 (recv), and P2 (send); each send is labeled with its destination and tag (e.g. dest = 1, tag = 1) and each receive with the source and tag it matches, where * denotes a wildcard (MPI_ANY_SOURCE / MPI_ANY_TAG).

  22. Another example

    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    if (rank == 0) {
        MPI_Send(buf1, count, MPI_FLOAT, 2, tag, comm);
        MPI_Send(buf2, count, MPI_FLOAT, 1, tag, comm);
    } else if (rank == 1) {
        MPI_Recv(buf2, count, MPI_FLOAT, 0, tag, comm, &stat);
        MPI_Send(buf2, count, MPI_FLOAT, 2, tag, comm);
    } else if (rank == 2) {
        MPI_Recv(buf1, count, MPI_FLOAT, MPI_ANY_SOURCE, tag, comm, &stat);
        MPI_Recv(buf2, count, MPI_FLOAT, MPI_ANY_SOURCE, tag, comm, &stat);
    }

  23. Illustration of previous code (figure): rank 0 posts two sends, rank 1 posts a recv followed by a send, and rank 2 posts two recvs using MPI_ANY_SOURCE. Which message will arrive first at rank 2? Impossible to say!

  24. Progress • Progress • If a pair of matching send/recv has been initiated, at least one of the two operations will complete, regardless of any other actions in the system • send will complete, unless recv is satisfied by another message • recv will complete, unless message sent is consumed by another matching recv

  25. Fairness • MPI makes no guarantee of fairness • If MPI_ANY_SOURCE is used, a sent message may repeatedly be overtaken by other messages (from different processes) that match the same receive.

  26. Send Modes • To this point, we have studied blocking send routines using standard mode • In standard mode, the implementation determines whether buffering occurs • This has major implications for writing safe programs

  27. Other send modes • MPI includes three other send modes that give the user explicit control over buffering. • These are: buffered, synchronous, and ready modes. • Corresponding MPI functions • MPI_Bsend • MPI_Ssend • MPI_Rsend

  28. MPI_Bsend • Buffered Send: allows the user to explicitly create buffer space and attach the buffer to send operations: • MPI_Bsend(void *buf, int count, MPI_Datatype type, int dest, int tag, MPI_Comm comm) • Note: these are the same arguments as standard send • MPI_Buffer_attach(void *buf, int size); • Creates buffer space to be used with Bsend • MPI_Buffer_detach(void *buf, int *size); • Note: in the detach case the void* argument is really a pointer to the buffer address, so that MPI can return the address of the detached buffer • Note: the detach call blocks until buffered messages have been safely sent • Note: It is up to the user to properly manage the buffer and ensure space is available for any Bsend call
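
  A minimal sketch of buffered send with explicit attach/detach (sizing the buffer with MPI_Pack_size plus MPI_BSEND_OVERHEAD follows the standard's recommendation; the function and variable names are illustrative):

    #include <mpi.h>
    #include <stdlib.h>

    void buffered_send_example(double *data, int count, int dest, MPI_Comm comm) {
        int pack_size, buf_size;
        void *buf, *old_buf;

        /* ask MPI how much space one message needs, add the per-message overhead */
        MPI_Pack_size(count, MPI_DOUBLE, comm, &pack_size);
        buf_size = pack_size + MPI_BSEND_OVERHEAD;

        buf = malloc(buf_size);
        MPI_Buffer_attach(buf, buf_size);     /* hand the buffer to MPI */

        MPI_Bsend(data, count, MPI_DOUBLE, dest, 0, comm);

        /* detach blocks until buffered messages have been transmitted */
        MPI_Buffer_detach(&old_buf, &buf_size);
        free(old_buf);
    }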

  29. MPI_Ssend • Synchronous Send • Ensures that no buffering is used • Couples the send and receive operations – the send cannot complete until the matching receive is posted and the message is fully copied to the remote processor • Very good for testing the buffer safety of a program

  30. MPI_Rsend • Ready Send • Matching receive must be posted before send, otherwise program is incorrect • Can be implemented to avoid handshake overhead when program is known to meet this condition • Not very typical + dangerous

  31. Implementation observations • MPI_Send could be implemented as MPI_Ssend, but this would be weird and undesirable • MPI_Rsend could be implemented as MPI_Ssend, but this would eliminate any performance enhancements • Standard mode (MPI_Send) is most likely to be efficiently implemented

  32. MPI’s Non-blocking Operations • Non-blocking operations return (immediately) “request handles” that can be tested and waited on: • MPI_Isend(start, count, datatype, dest, tag, comm, &request) • MPI_Irecv(start, count, datatype, src, tag, comm, &request) • MPI_Wait(&request, &status) • One can also test without waiting: MPI_Test(&request, &flag, &status)
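
  A minimal sketch of overlapping a non-blocking receive with computation (the do_local_work callback is a hypothetical stand-in for useful work, not part of the lecture code):

    #include <mpi.h>

    /* post a receive, overlap it with local work, then wait for completion */
    void overlap_example(double *buf, int count, void (*do_local_work)(void)) {
        MPI_Request request;
        MPI_Status status;

        /* post the receive immediately; it proceeds in the background */
        MPI_Irecv(buf, count, MPI_DOUBLE, MPI_ANY_SOURCE, 0,
                  MPI_COMM_WORLD, &request);

        do_local_work();            /* work that does not touch buf */

        /* block only when the received data is actually needed */
        MPI_Wait(&request, &status);
    }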

  33. Multiple Completions • It is sometimes desirable to wait on multiple requests: MPI_Waitall(count, array_of_requests, array_of_statuses) MPI_Waitany(count, array_of_requests, &index, &status) MPI_Waitsome(incount, array_of_requests, &outcount, array_of_indices, array_of_statuses) • There are corresponding versions of test for each of these.
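
  A brief sketch of MPI_Waitall (assumes the requests would be filled in by real MPI_Isend/MPI_Irecv calls; initializing them to MPI_REQUEST_NULL just keeps the fragment well defined):

    enum { NREQ = 4 };
    MPI_Request reqs[NREQ];
    MPI_Status  stats[NREQ];
    int i;

    for (i = 0; i < NREQ; i++)
        reqs[i] = MPI_REQUEST_NULL;   /* replace with posted Isend/Irecv requests */

    /* block until every request in the array has completed */
    MPI_Waitall(NREQ, reqs, stats);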

  34. Embarrassingly parallel examples • Mandelbrot set • Monte Carlo methods • Image manipulation

  35. Embarrassingly Parallel • Also referred to as naturally parallel • Each processor works on its own sub-chunk of data independently • Little or no communication required

  36. Mandelbrot Set • Creates pretty and interesting fractal images with a simple recursive formula zk+1 = zk * zk + c • Both z and c are complex numbers • For each point c we iterate this formula until either • A specified maximum number of iterations has occurred, or • The magnitude of z surpasses 2 • In the former case the point is taken to be in the Mandelbrot set • In the latter case it is not in the Mandelbrot set
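
  A minimal sketch of the per-point iteration just described (this is not the lecture's mandelbrot.c; escape_time and MAX_ITER are illustrative names):

    #include <complex.h>

    #define MAX_ITER 1000

    /* returns MAX_ITER if c appears to be in the set, otherwise the
       iteration count at which |z| exceeded 2 */
    int escape_time(double complex c) {
        double complex z = 0.0;
        int k;
        for (k = 0; k < MAX_ITER; k++) {
            if (cabs(z) > 2.0)
                return k;
            z = z * z + c;      /* z_{k+1} = z_k * z_k + c */
        }
        return MAX_ITER;
    }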

  37. Parallelizing Mandelbrot Set • What are the major defining features of problem? • Each point is computed completely independently of every other point • Load balancing issues – how to keep procs busy • Strategies for Parallelization?

  38. Mandelbrot Set Simple Example • See mandelbrot.c and mandelbrot_par.c for simple serial and parallel implementations • Think about how load balancing could be better handled

  39. Monte Carlo Methods • Generic description of a class of methods that uses random sampling to estimate values of integrals, etc. • A simple example is to estimate the value of pi

  40. Using Monte Carlo to Estimate π • Randomly sample points in a square that encloses a circle (figure) • The fraction of randomly selected points that lie in the circle approaches the ratio of the areas • The ratio of the area of the circle to the area of the square is π/4 • So what is the value of π?

  41. Parallelizing Monte Carlo • What are general features of algorithm? • Each sample is independent of the others • Memory is not an issue – master-slave architecture? • Getting independent random numbers in parallel is an issue. How can this be done?
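
  A minimal sketch of a parallel Monte Carlo estimate of π combined with MPI_Reduce (seeding each rank's C rand() with its rank is a simple but statistically naive answer to the independent-random-numbers question; all names are illustrative, not the lecture's code):

    #include <mpi.h>
    #include <stdio.h>
    #include <stdlib.h>

    int main(int argc, char *argv[]) {
        long n_local = 1000000, hits = 0, total_hits = 0, i;
        int rank, nprocs;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

        srand(rank + 1);                 /* naive per-rank seeding */
        for (i = 0; i < n_local; i++) {
            double x = rand() / (double)RAND_MAX;
            double y = rand() / (double)RAND_MAX;
            if (x * x + y * y <= 1.0)    /* point falls inside the quarter circle */
                hits++;
        }

        /* sum the hit counts from all ranks onto rank 0 */
        MPI_Reduce(&hits, &total_hits, 1, MPI_LONG, MPI_SUM, 0, MPI_COMM_WORLD);

        if (rank == 0)
            printf("pi is approximately %f\n",
                   4.0 * total_hits / (double)(n_local * nprocs));

        MPI_Finalize();
        return 0;
    }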

  42. MPI Datatypes • The data in a message to send or receive is described by a triple (address, count, datatype), where • An MPI datatype is recursively defined as: • predefined, corresponding to a data type from the language (e.g., MPI_INT, MPI_DOUBLE) • a contiguous array of MPI datatypes • a strided block of datatypes • an indexed array of blocks of datatypes • an arbitrary structure of datatypes • There are MPI functions to construct custom datatypes, in particular ones for subarrays
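
  A brief sketch of constructing a strided-block datatype, here describing one column of a row-major matrix (N, column_t, and send_column are illustrative names, not from the lecture):

    #include <mpi.h>

    #define N 8

    /* one column of a row-major N x N double matrix:
       N blocks of 1 double, separated by a stride of N doubles */
    void send_column(double matrix[N][N], int col, int dest, MPI_Comm comm) {
        MPI_Datatype column_t;

        MPI_Type_vector(N, 1, N, MPI_DOUBLE, &column_t);
        MPI_Type_commit(&column_t);

        /* send the whole column as a single element of the new type */
        MPI_Send(&matrix[0][col], 1, column_t, dest, 0, comm);

        MPI_Type_free(&column_t);
    }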

  43. MPI Tags • Messages are sent with an accompanying user-defined integer tag, to assist the receiving process in identifying the message • Messages can be screened at the receiving end by specifying a specific tag, or not screened by specifying MPI_ANY_TAG as the tag in a receive • Some non-MPI message-passing systems have called tags “message types”. MPI calls them tags to avoid confusion with datatypes

  44. MPI is Simple • Many MPI programs can be written using just these six functions, only two of which are non-trivial: • MPI_INIT • MPI_FINALIZE • MPI_COMM_SIZE • MPI_COMM_RANK • MPI_SEND • MPI_RECV
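
  A minimal sketch using only these six functions: a token passed around a ring of processes (assumes at least two processes; the token value and tag are arbitrary placeholders):

    #include <mpi.h>
    #include <stdio.h>

    /* rank 0 starts the token; each rank forwards it to the next rank */
    int main(int argc, char *argv[]) {
        int rank, size, token;
        MPI_Status stat;

        MPI_Init(&argc, &argv);
        MPI_Comm_size(MPI_COMM_WORLD, &size);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        if (rank == 0) {
            token = 1;
        } else {
            MPI_Recv(&token, 1, MPI_INT, rank - 1, 0, MPI_COMM_WORLD, &stat);
            token++;
        }
        MPI_Send(&token, 1, MPI_INT, (rank + 1) % size, 0, MPI_COMM_WORLD);

        if (rank == 0) {
            MPI_Recv(&token, 1, MPI_INT, size - 1, 0, MPI_COMM_WORLD, &stat);
            printf("token made it around %d ranks: %d\n", size, token);
        }

        MPI_Finalize();
        return 0;
    }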

  45. Collective Operations in MPI • Collective operations are called by all processes in a communicator • MPI_BCAST distributes data from one process (the root) to all others in a communicator • MPI_REDUCE combines data from all processes in communicator and returns it to one process • In many numerical algorithms, SEND/RECEIVE can be replaced by BCAST/REDUCE, improving both simplicity and efficiency

  46. Example: PI in C - 1

    #include "mpi.h"
    #include <stdio.h>
    #include <math.h>

    int main(int argc, char *argv[])
    {
        int done = 0, n, myid, numprocs, i, rc;
        double PI25DT = 3.141592653589793238462643;
        double mypi, pi, h, sum, x, a;

        MPI_Init(&argc, &argv);
        MPI_Comm_size(MPI_COMM_WORLD, &numprocs);
        MPI_Comm_rank(MPI_COMM_WORLD, &myid);
        while (!done) {
            if (myid == 0) {
                printf("Enter the number of intervals: (0 quits) ");
                scanf("%d", &n);
            }
            MPI_Bcast(&n, 1, MPI_INT, 0, MPI_COMM_WORLD);
            if (n == 0) break;

  47. Example: PI in C - 2

            h = 1.0 / (double) n;
            sum = 0.0;
            for (i = myid + 1; i <= n; i += numprocs) {
                x = h * ((double)i - 0.5);
                sum += 4.0 / (1.0 + x*x);
            }
            mypi = h * sum;
            MPI_Reduce(&mypi, &pi, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
            if (myid == 0)
                printf("pi is approximately %.16f, Error is %.16f\n",
                       pi, fabs(pi - PI25DT));
        }
        MPI_Finalize();
        return 0;
    }

  48. Alternative Set of 6 Functions • Using collectives: • MPI_INIT • MPI_FINALIZE • MPI_COMM_SIZE • MPI_COMM_RANK • MPI_BCAST • MPI_REDUCE

  49. Buffers • When you send data, where does it go? One possibility (figure): the user data on process 0 is copied into a local buffer, moved across the network into a local buffer on process 1, and finally copied into the user data on process 1.

  50. Avoiding Buffering • It is better to avoid copies (figure): the user data on process 0 moves across the network directly into the user data on process 1 • This requires that MPI_Send wait on delivery, or that MPI_Send return before the transfer is complete and we wait later.
