
Online Computation of Critical Paths for Multithreaded Languages


Presentation Transcript


  1. Online Computation of Critical Paths for Multithreaded Languages • Yoshihiro Oyama, Kenjiro Taura, Akinori Yonezawa • University of Tokyo • HIPS 2000

  2. Presentation Outline • What is a critical path? • Background & Overview • Our work • Target language • Critical path computation algorithm • Instrumentation scheme • Experimental results • Related work

  3. What is a Critical Path (CP)? • The longest execution path • Nodes: sequential program parts • Edges: fork/sync points [Figure: example DAG whose nodes carry the weights 2, 4, 1, 3, 2, 3, 1, 6, 7, 2, 5, 8; CP length: 31]
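
To make "the longest execution path" concrete, here is a minimal Python sketch that finds the most expensive path through a small node-weighted DAG. The graph `DAG`, its node names, and its costs are invented for illustration; it is not the DAG drawn on the slide.

```python
from functools import lru_cache

# Hypothetical DAG: node -> (cost of the sequential part, successor nodes).
DAG = {
    "a": (2, ["b", "c"]),
    "b": (4, ["d"]),
    "c": (1, ["d"]),
    "d": (3, []),
}


@lru_cache(maxsize=None)
def longest_from(node):
    """Return (length, path) of the most expensive path starting at node."""
    cost, successors = DAG[node]
    best_len, best_path = 0, ()
    for nxt in successors:
        length, path = longest_from(nxt)
        if length > best_len:
            best_len, best_path = length, path
    return cost + best_len, (node,) + best_path


cp_length, cp_nodes = longest_from("a")
print(cp_length, cp_nodes)
```

This offline version needs the whole graph in memory, which is exactly what the online scheme in this talk avoids.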

  4. Benefits of Getting CPs (1/2) • CP info gives us • Performance upper bound = Exec. time lower bound = $\lim_{PE \to \infty} \{\text{exec. time}\}$ • Important parts in need of tuning

  5. Benefits of Getting CPs (2/2) • CP info is useful for • Tuning • CP is short → Overhead should be reduced • Otherwise → CP should be shortened • Performance prediction • $T_P = T_1 / P + T_\infty$ (by Cilk group) • Exec. time is close to CP length → More processors: futile
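
The prediction formula on this slide can be tried out with a few lines of Python. This is only a sketch: the function name `predicted_time` and the sample values for T1 (total work) and T∞ (CP length) are assumptions chosen to show the trend, not measurements from the talk.

```python
def predicted_time(t1, t_inf, p):
    """Predicted execution time on p processors: T_P = T_1 / P + T_inf."""
    return t1 / p + t_inf


# Hypothetical program: 64 s of total work, critical path of 2 s.
T1, T_INF = 64.0, 2.0
for p in (1, 4, 16, 64):
    print(p, predicted_time(T1, T_INF, p))
```

As P grows, the prediction approaches T∞, which is why adding processors stops helping once the execution time is close to the CP length.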

  6. Presentation Outline • What is a critical path? • Background & Overview • Our work • Target language • Critical path computation algorithm • Instrumentation scheme • Experimental results • Related work

  7. This Work • Computing critical paths • Primary targets: • Multithreaded languages • Shared-memory machines • On-the-fly • Not using tracefiles • Source code instrumentation

  8. Background (Shortcoming of Existing Work) • Cilk [Frigo et al. 98] • Provides online computation of CPs • Supports fork-join synchronization only • Unrealistic setting • Fork: zero cost • Join: zero cost

  9. Contribution • Developed an algorithm for computing CPs • It deals with languages with threads and synchronization via first-class data • Not limited to the fork-join model • It takes fork / communication cost into account • It gives the length of each subpath in a CP • Helps us “pinpoint” important program parts • Demonstrated its usefulness through experiments on an SMP

  10. CP Info Example • Displaying a sequence of all subpaths in a CP
      frame entry point --- frame exit point              time
      =============================================================
      main() --- move_mols(mols,100)                       741 usec
      spawn                                                 10 usec
      move_mols(mols,n) --- spawn move_one_mol(mols[i])     39 usec
      spawn                                                 10 usec
      move_one_mol(molp) --- return                       4982 usec
      communication                                         15 usec
      v = recv(r) --- send(s, v*2)                         128 usec
      communication                                         15 usec
      u = recv(s) --- die                                 1207 usec
      =============================================================
      critical path length                                7147 usec
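
As a rough illustration of how such a listing can guide tuning, the sketch below sums the subpath times shown on this slide and picks out the dominant subpath. The list name `subpaths` and the other variable names are made up for this example; only the labels and times come from the slide.

```python
# (subpath label, time in usec), copied from the slide's listing.
subpaths = [
    ("main() --- move_mols(mols,100)", 741),
    ("spawn", 10),
    ("move_mols(mols,n) --- spawn move_one_mol(mols[i])", 39),
    ("spawn", 10),
    ("move_one_mol(molp) --- return", 4982),
    ("communication", 15),
    ("v = recv(r) --- send(s, v*2)", 128),
    ("communication", 15),
    ("u = recv(s) --- die", 1207),
]

# The CP length is the sum of all subpath times.
total = sum(usec for _, usec in subpaths)

# The subpath that contributes most is the first tuning candidate.
dominant_label, dominant_usec = max(subpaths, key=lambda row: row[1])
dominant_share = 100.0 * dominant_usec / total

print(total, dominant_label, round(dominant_share, 1))
```

The sum reproduces the reported critical path length of 7147 usec, and the `move_one_mol(molp) --- return` subpath accounts for roughly 70% of it.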

  11. Presentation Outline • What is a critical path? • Background & Overview • Our work • Target language • Critical path computation algorithm • Instrumentation scheme • Experimental results • Related work

  12. Target Language • Sequential language (C, Scheme, …) + Threads: spawn f(x1,…,xn) + Channels • are first-class sync. media • can express locks, barriers, and monitors [Figure: thread th1 executes send(r,8); thread th2 executes v = recv(r) and obtains 8 through channel r]

  13. Sample Program
      main() {                // Beginning of Program
        spawn sum(r,vec);
        ...
        v = recv(r);
        ...
        die;                  // End of Program
      }
      sum(r,vec) {
        ...
        ...
        send(r,ans);
      }

  14. Presentation Outline • What is a critical path? • Background & Overview • Our work • Target language • Critical path computation algorithm • Instrumentation scheme • Experimental results • Related work

  15. Behavior of Sample Program • DAG-structured execution • Nodes: fork & sync. points • Edges: inter-node dependencies [Figure: thread main passes through spawn sum(r,vec), v=recv(r), and die; thread sum(r,vec) runs up to send(r,ans)]

  16. Three Kinds of Edges (Dependencies) • Arithmetic edges • Spawn edges • Communication edges [Figure: the DAG of the sample program (main: spawn sum(r,vec), v=recv(r), die; sum(r,vec): send(r,ans)) with edge values 8, 14, 2, 3, 9, 5]

  17. CP Computation Algorithm: Basic Idea • DAG not constructed • Each thread keeps only the longest path up to the current program point [Figure: Path1 and Path2 both reach recv in main; one of them is marked "thrown away"]

  18. Key Questions • How to determine edge values? • How to compute CP without constructing DAG? • How to manage CP info? • How to keep the longest path?

  19. Determining Edge Values • Computing the amount of time that elapsed after leaving the previous node [Figure: nodes X, Y, Z with t1=time(), t2=time(), t3=time() taken at the nodes; edge values 8 and 6]
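
A small sketch of this timing idea follows. The class name `EdgeTimer` and its `leave_node` method are hypothetical, and `time.perf_counter` stands in for whatever timer the real system uses. A fake clock returning 0, 8, 14 plays the role of t1, t2, t3 so that the result is deterministic.

```python
import time


class EdgeTimer:
    """Measures the time that elapsed since the previous node was left."""

    def __init__(self, clock=time.perf_counter):
        self._clock = clock
        self._last = clock()          # corresponds to t1 = time()

    def leave_node(self):
        """Return the value of the edge just traversed and restart timing."""
        now = self._clock()
        elapsed = now - self._last
        self._last = now
        return elapsed


# Deterministic demo: the fake clock yields t1 = 0, t2 = 8, t3 = 14.
ticks = iter([0, 8, 14])
timer = EdgeTimer(clock=lambda: next(ticks))
edge_xy = timer.leave_node()   # t2 - t1
edge_yz = timer.leave_node()   # t3 - t2
print(edge_xy, edge_yz)
```

Passing the clock in as a parameter is a design choice made here for testability; in production use the default clock would be left in place.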

  20. Extending CP with Arithmetic Edge • CP info = a sequence of edge info • The amount of time in nodes: NOT accounted [Figure: nodes X (label L1), Y (label L2), Z (label L3) with edge values 8 and 6. At X: CP=({…},{…},{…}). At Y: CP=({…},{…},{…}, {L1,L2,8}). At Z: CP=({…},{…},{…}, {L1,L2,8}, {L2,L3,6})]
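
One way to picture "CP info = a sequence of edge info" is as an immutable tuple of edges. The sketch below is an assumption about representation, not the authors' data structure; the names `Edge`, `add_arith_edge`, and `length` are invented here, and the path starts empty because the earlier edges are elided on the slide.

```python
from typing import NamedTuple, Tuple


class Edge(NamedTuple):
    src: str    # label of the node the edge leaves
    dst: str    # label of the node the edge enters
    cost: int   # measured time between the two nodes


def add_arith_edge(cp: Tuple[Edge, ...], src: str, dst: str, cost: int):
    """Return a new CP with one arithmetic edge appended."""
    return cp + (Edge(src, dst, cost),)


def length(cp: Tuple[Edge, ...]) -> int:
    """CP length is the sum of edge costs; time spent in nodes is ignored."""
    return sum(edge.cost for edge in cp)


cp = ()                                   # earlier edges omitted
cp = add_arith_edge(cp, "L1", "L2", 8)    # reaching Y
cp = add_arith_edge(cp, "L2", "L3", 6)    # reaching Z
print(cp, length(cp))
```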

  21. Extending CP with Spawn Edge [Figure: nodes X, Y, Z around a spawn; the paths are annotated CP=({…},{…},{…}), CP=({…},{…},{…}), and CP=({…},{…},{…}, {…,…,Cspawn })]
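
A possible reading of this slide is that the spawning thread keeps its path unchanged while the new thread starts from the parent's path plus one edge of cost Cspawn. The sketch below encodes that reading; it is an interpretation, and the function `spawn`, the labels, and the value of `C_SPAWN` are assumptions.

```python
C_SPAWN = 10  # hypothetical spawn cost, in the same unit as the other edges


def spawn(parent_cp, spawn_label, child_entry_label):
    """Return (parent's CP, child's CP) right after a spawn.

    Edges are plain (src, dst, cost) tuples. The parent's path is left
    untouched; the child's path gains one spawn edge.
    """
    child_cp = parent_cp + ((spawn_label, child_entry_label, C_SPAWN),)
    return parent_cp, child_cp


parent_before = (("L1", "L2", 8),)
parent_after, child = spawn(parent_before, "L2", "child_entry")
print(parent_after, child)
```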

  22. Extending CP with Communication Edge • Piggyback a sent value with CP [Figure: send transmits [v, CPsend] with CPsend=({…},{…}); at recv the path becomes CPsend=({…},{…}, {…,…,Ccomm })]

  23. Keeping the Longest Path (Throwing Shorter Paths Away) • CP = max(CPsend, CPrecv) [Figure: send transmits [v, CPsend]; at recv, CPsend=({…},{…}, {…,…,Ccomm }) is compared with the receiver's own CPrecv]
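
The piggyback-and-keep-the-maximum step can be sketched in a few lines. Everything named here (`send`, `recv`, `length`, the labels, and the value of `C_COMM`) is hypothetical; the sketch only shows the idea that a message carries the sender's path and that the receiver continues with whichever path is longer.

```python
C_COMM = 15  # hypothetical communication cost


def length(cp):
    """Sum of edge costs; edges are (src, dst, cost) tuples."""
    return sum(cost for _, _, cost in cp)


def send(value, cp_send):
    """A message is the value piggybacked with the sender's CP."""
    return (value, cp_send)


def recv(message, cp_recv, send_label, recv_label):
    """Return the value and the longer of the two candidate paths."""
    value, cp_send = message
    via_sender = cp_send + ((send_label, recv_label, C_COMM),)
    return value, max(via_sender, cp_recv, key=length)


msg = send(8, (("sum_entry", "send", 100),))
v1, cp1 = recv(msg, (("main_entry", "recv", 40),), "send", "recv")
v2, cp2 = recv(msg, (("main_entry", "recv", 200),), "send", "recv")
print(length(cp1), length(cp2))
```

In the first call the sender's path (100 + 15) wins; in the second the receiver's own path (200) is kept and the sender's is thrown away.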

  24. Presentation Outline • What is a critical path? • Background & Overview • Our work • Target language • Critical path computation algorithm • Instrumentation scheme • Experimental results • Related work

  25. Instrumentation • Source-to-source transformation • Independent of the implementation details • Ex. management of activation frames • Instrumentation code is inserted into • Sends, recvs, spawns • Entry/exit points of functions

  26. Transformation Rule Example
      Original:
        l: v = recv(r);
      Transformed:
        t = time() - et;                       // Compute CP up to recv
        [v, cp’] = recv(r);                    // Receive a value piggybacked with CP
        cp’’ = addCommEdge(cp’);               // Extend CP with comm. edge
        if (t + length(cp) < length(cp’)) {    // Compare the two CPs
          cp = cp’;                            // Use the sender’s CP
          el = l; et = time();
        } else {
          et = time() - t;                     // Use the receiver’s CP
        }
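
The rule above can be mimicked in Python with explicit timestamps, which makes the bookkeeping of `cp`, `el`, and `et` easier to follow. This is a sketch under assumptions: `ThreadState`, `instrumented_recv`, the `"send"` label, and `C_COMM` are invented names, the two `now_*` arguments replace the calls to `time()`, and the comparison uses the path already extended with the communication edge (as in the picture on slide 23), whereas the slide text prints `cp’` there.

```python
from dataclasses import dataclass

C_COMM = 15  # hypothetical communication cost


def length(cp):
    """Sum of edge costs; edges are (src, dst, cost) tuples."""
    return sum(cost for _, _, cost in cp)


@dataclass
class ThreadState:
    cp: tuple   # longest path up to the last node
    el: str     # label of the last node
    et: float   # start time of the current edge


def instrumented_recv(state, message, label, now_before, now_after):
    """Instrumented form of `label: v = recv(r)`."""
    t = now_before - state.et                          # time since last node
    value, cp_sender = message                         # value with sender's CP
    cp_ext = cp_sender + (("send", label, C_COMM),)    # add comm. edge
    if t + length(state.cp) < length(cp_ext):
        # Sender's path is longer: adopt it and start a new edge here.
        state.cp, state.el, state.et = cp_ext, label, now_after
    else:
        # Receiver's path is longer: keep it and skip the waiting time.
        state.et = now_after - t
    return value


a = ThreadState(cp=(), el="main", et=0.0)
va = instrumented_recv(a, (8, (("sum", "send", 100),)), "l", 30.0, 120.0)

b = ThreadState(cp=(), el="main", et=0.0)
vb = instrumented_recv(b, (8, (("sum", "send", 5),)), "l", 30.0, 120.0)
print(a, b)
```

In the else branch, shifting `et` to `now_after - t` means the 30 time units spent computing before the receive stay on the current edge, while the time spent blocked inside the receive is not counted.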

  27. Discussion (1/2) -- Nondeterminism -- • DAG shape varies between different runs • Comm. edges may connect different pairs • The amounts of time for each part vary (e.g., cache effects) [Figure: sends and recvs paired differently in two runs; a part between X and Y measured as 28 and as 5]

  28. Discussion (2/2) -- What We Compute as CP -- • CP of a DAG created in an actual run • Programs may give different CPs in different runs • Other reasonable ways?

  29. Presentation Outline • What is a critical path? • Background & Overview • Our work • Target language • Critical path computation algorithm • Instrumentation scheme • Experimental results • Related work

  30. Experiments • Schematic: concurrent OO language [Taura et al. 96] • Sun Ultra Enterprise 10000 • UltraSPARC x 64 • Apps: • Prime • Natural Language Parser • Raytrace • Timer function: gethrtime()

  31. Purpose of Experiments • Checking that execution times get close to computed CPs • Measuring how large the instrumentation overhead is

  32. Raytrace • From a run on only one processor, we could predict the best achievable performance

  33. Prime • Small (< 5%) difference between the actual execution time and the predicted execution time

  34. Information Useful for Future Tuning of Prime • Gathering primes into a list → 95% of CP • Dividing prime candidates by smaller primes → 5% of CP

  35. Natural Language Parser

  36. Information Useful for Future Tuning of NL Parser • Application of lexical rules → 4% of CP • Application of production rules → 96% of CP

  37. Instrumentation Overhead (Execution Time on One Processor)

  38. Presentation Outline • What is a critical path? • Background & Overview • Our work • Target language • Critical path computation algorithm • Instrumentation scheme • Experimental results • Related work

  39. Related Work (1/2) • Cilk • Breakdown of CP not shown • CP info: not detailed enough for tuning (Which function should we tune?)
      % foo -nproc 10 20
      result: 524288
      Running time on 10 procs: 416.33 ms
      Total work = 3.94 s
      Critical path = 1.08 ms
      Parallelism = 2800.92
      %

  40. Related Work (2/2) • Paradyn [Hollingsworth 98] • Main target is message-passing programs • It does not display all subpaths in CP • Tracefile-based offline scheme (Dimemas [Pallas] etc.) • Tracefile contains the parameters and the timings of all communication operations • Required memory/storage is very large

  41. Summary (1/2) • Scheme for online CP computation • Supports synchronization via first-class data • Piggybacking communicated values with CP info • Keeping the maximum of two paths in receives • Takes spawn/communication cost into account • Shows all subpaths in CP • Attaching subpath info in each CP update

  42. Summary (2/2) • CP info we compute • Helps predict multiprocessor (MP) performance • Small (< 10%) difference between • Actual execution time • Predicted execution time • Gives a useful guide to tuning • Prime: Tune list construction part! • Parser: Tune production rule application part!

  43. Future Work • More precise performance prediction • Taking thread mapping into account • Adaptive optimization using CP info • Time-consuming optimizations are applied to the parts included in CP

  44. Any Comments?
