1 / 19

Evaluating Parallel Programs

Evaluating Parallel Programs. Cluster Computing, UNC-Charlotte, B. Wilkinson. Sequential execution time , t s : Estimate by counting computational steps of best sequential algorithm.

aden
Télécharger la présentation

Evaluating Parallel Programs

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Evaluating Parallel Programs Cluster Computing, UNC-Charlotte, B. Wilkinson.

  2. Sequential execution time, ts: Estimate by counting computational steps of best sequential algorithm. Parallel execution time, tp: In addition to number of computational steps, tcomp, need to estimate communication overhead, tcomm: tp = tcomp + tcomm

  3. Computational Time Count number of computational steps. When more than one process executed simultaneously, count computational steps of most complex process. Generally, function of n and p, i.e. tcomp = f (n, p) Often break down computation time into parts. Then tcomp = tcomp1 + tcomp2 + tcomp3 + … Analysis usually done assuming that all processors are same and operating at same speed.

  4. Communication Time Many factors, including network structure. As a first approximation, use tcomm = tstartup + ntdata tstartup -- startup time, essentially time to send a message with no data. Assumed to be constant. tdata -- transmission time to send one data word, also assumed constant, and there are n data words.

  5. Idealized Communication Time Star tup time The equation to compute the communication time ignore the fact that the source and destination may not be directly linked in a real system so that the message may pass through intermediate nodes. It is also assumed that the overhead incurred by including information other than data in the packet is constant and can be part of startup time. Number of data items ( n )

  6. Final communication time, tcomm Summation of communication times of all sequential messages from one process, i.e. tcomm = tcomm1 + tcomm2 + tcomm3 + … Communication patterns of all processes assumed same and take place together so that only one process need be considered. Both tstartup and tdata, measured in units of one computational step, so that can add tcomp and tcomm together to obtain parallel execution time, tp.

  7. Communication Time of Broadcast/Gather • If broadcast is done through single shared wire for Ethernet, the time complexity is O(1) for single data item and O(w) if w data items. • If binary tree is used as the underlying network structure and 1-to-N fan-out broadcast is used, then what about communication cost for p final destinations (leaf nodes) using w messages? • We assume the left and right child will receive the message from their parent in a sequential way. However, at each level, different parent nodes will send out the message at the same time.

  8. 1-to-N fan-out Broadcast • tcomm = 2 (log p) (tstartup + wtdata ) • It depends on number of levels and number of nodes at each level. • For a binary tree and p final destinations at the leave level.

  9. Benchmark Factors With ts, tcomp, and tcomm, can establish speedup factor and computation/communication ratio for a particular algorithm/implementation: Both functions of number of processors, p, and number of data elements, n.

  10. Factors give indication of scalability of parallel solution with increasing number of processors and problem size. Computation/communication ratio will highlight effect of communication with increasing problem size and system size. We wish to have dominant factor in computation instead of communication, as n increases, communication can be ignored and adding more processors can improve the performance.

  11. Example • Adding n numbers using two computers, each adding n/2 numbers each. • Numbers initially held in one computer. Computer 1 Computer 2 Send n/2 numbers tcomm = tstartup +(n/2)tdata Add up n/2 numbers tcomp = n/2 Send result back tcomm = tstartup + tdata tcomp = 1 Add partial sums

  12. Overall tcomm = 2tstartup +(n/2 + 1)tdata = O(n) tcomp = n/2 + 1 = O(n) Computation/Communication ratio = O(1)

  13. Another problem Computation time complexity = O(n2) Communication time complexity = O(n) Computation/Communication ratio = O(n)

  14. Cost Cost = (execution time ) x (number of processors) Cost of sequential computation = ts Cost of parallel computation = tp x p Cost-optimal algorithm When parallel computation cost is proportional to sequential computation: Cost = tp x p = k x ts k is a constant

  15. Example Suppose ts = O(n log n) for the best sequential program where n = number of data item p = number of processors For cost optimality if tp = O(n log n) / p = O(n/p log n) Not cost optimal if tp = O(n^2/p ) A parallel algorithm is cost-optimal if parallel time complexity times the number of processors equals the sequential time complexity.

  16. Evaluating programs • Measuring the execution time • Time-complexity analysis gives an insight into the parallel algorithm and is useful in comparing different algorithms. We want to know how the algorithm actually performs in a real system. • We can measure the elapsed time between two points in the code in seconds. • System calls, such as clock(), time(), or gettimeofday() or MPI_Wtime() • Example: L1: time(&t1); . . L2: time(&t2); elapsed_time = difftime(t2, t1);

  17. Communication Time by the Ping-Pong Method • Point-to-point communication time of a specific system can be found using the ping-pong method. • One process p0 sends a message to another process, say p1. Immediately upon receiving the message, p1 sends the message back to p0. The time is divided by two to obtain an estimate of the time of one-way communication. For example, at p0: time(&t1); send(&x, p1); recv(&x, p1); time(&t2); elapsed_time = 0.5* difftime(t2, t1);

  18. Profilling • A profile of a program is a histogram or graph showing the time spent on different part of the program. Showing the number of times certain source code are executed. • It can help to identify certain hot spot places in a program visited many times during the execution. These places could be optimized first.

  19. Program Profile Histogram Statement number of region of program

More Related