
Analyses and Optimizations for Multithreaded Programs




Presentation Transcript


  1. Analyses and Optimizations for Multithreaded Programs John Whaley IBM Tokyo Research Laboratory Martin Rinard, Alex Salcianu, Brian Demsky MIT Laboratory for Computer Science

  2. Motivation • Threads are Ubiquitous • Parallel Programming for Performance • Manage Multiple Connections • System Structuring Mechanism • Overhead • Thread Management • Synchronization • Opportunities • Improved Memory Management

  3. What This Talk is About • New Abstraction: Parallel Interaction Graph • Points-To Information • Reachability and Escape Information • Interaction Information • Caller-Callee Interactions • Starter-Startee Interactions • Action Ordering Information • Analysis Algorithm • Analysis Uses (synchronization elimination, stack allocation, per-thread heap allocation)

  4. Outline • Example • Analysis Representation and Algorithm • Lightweight Threads • Results • Conclusion

  5. Sum Sequence of Numbers 9 8 1 5 3 7 2 6

  6. 1 5 3 7 2 6 9 8 Group in Subsequences

  7. 1 5 3 7 2 6 9 8 + + + + 10 17 8 6 Sum Subsequences (in Parallel)

  8. 1 5 3 7 2 6 9 8 + + + + 17 10 8 6 Add Sums Into Accumulator Accumulator 0

  9. 1 5 3 7 2 6 9 8 + + + + 17 10 8 6 Add Sums Into Accumulator Accumulator 17

  10. 1 5 3 7 2 6 9 8 + + + + 17 10 8 6 Add Sums Into Accumulator Accumulator 23

  11. 1 5 3 7 2 6 9 8 + + + + 17 10 8 6 Add Sums Into Accumulator Accumulator 33

  12. 1 5 3 7 2 6 9 8 + + + + 17 10 8 6 Add Sums Into Accumulator Accumulator 41

  13. Common Schema • Set of tasks • Chunk tasks to increase granularity • Tasks have both • Independent computation • Updates to shared data

  14. Realization in Java class Accumulator { int value = 0; synchronized void add(int v) { value += v; } }

  15. 0 2 6 Realization in Java class Task extends Thread { Vector work; Accumulator dest; Task(Vector w, Accumulator d) { work = w; dest = d; } public void run() { int sum = 0; Enumeration e = work.elements(); while (e.hasMoreElements()) sum += ((Integer) e.nextElement()).intValue(); dest.add(sum); } } Task work dest Vector Accumulator

  16. 0 2 6 Realization in Java class Task extends Thread { Vector work; Accumulator dest; Task(Vector w, Accumulator d) { work = w; dest = d; } public void run() { int sum = 0; Enumeration e = work.elements(); while (e.hasMoreElements()) sum += ((Integer) e.nextElement()).intValue(); dest.add(sum); } } Enumeration Task work dest Vector Accumulator

  17. Realization in Java void generateTask(int l, int u, Accumulator a) { Vector v = new Vector(); for (int j = l; j < u; j++) v.addElement(new Integer(j)); Task t = new Task(v,a); t.start(); } void generate(int n, int m, Accumulator a) { for (int i = 0; i < n; i++) generateTask(i*m, (i+1)*m, a); }
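The slides never show how the generated tasks are awaited before the accumulator is read. The sketch below is a hypothetical driver (class name `Driver` and method `generate` returning the result are invented, and `List` stands in for the slides' `Vector`); it keeps the Task threads so it can join them all before reading the total:

```java
import java.util.ArrayList;
import java.util.List;

class Accumulator {
    int value = 0;
    synchronized void add(int v) { value += v; }
}

class Task extends Thread {
    List<Integer> work;
    Accumulator dest;
    Task(List<Integer> w, Accumulator d) { work = w; dest = d; }
    public void run() {
        int sum = 0;
        for (int x : work) sum += x;   // independent computation
        dest.add(sum);                 // update to shared data
    }
}

class Driver {
    // Split [0, n*m) into n chunks of m numbers, sum each in its own thread.
    static int generate(int n, int m) throws InterruptedException {
        Accumulator a = new Accumulator();
        List<Task> tasks = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            List<Integer> v = new ArrayList<>();
            for (int j = i * m; j < (i + 1) * m; j++) v.add(j);
            Task t = new Task(v, a);
            t.start();
            tasks.add(t);
        }
        for (Task t : tasks) t.join(); // wait before reading the result
        return a.value;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(generate(4, 2)); // sums 0..7, prints 28
    }
}
```

Without the joins, `main` could read `a.value` while some tasks are still running; the joins are what the slides' schema leaves implicit.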

  18. Task Generation Accumulator 0

  19. Task Generation Accumulator 0 Vector

  20. 2 Task Generation Accumulator 0 Vector

  21. 2 6 Task Generation Accumulator 0 Vector

  22. 2 6 Task Generation Task work dest Accumulator 0 Vector

  23. 2 8 6 9 Task Generation Task work dest Accumulator 0 Vector Vector

  24. 2 8 6 9 Task Generation Task work dest Accumulator 0 Vector dest work Task Vector

  25. 1 2 8 6 5 9 Task Generation Task work dest Accumulator 0 Vector dest dest Task work work Task Vector Vector

  26. Analysis

  27. Analysis Overview • Interprocedural • Interthread • Flow-sensitive • Statement ordering within thread • Action ordering between threads • Compositional, Bottom Up • Explicitly Represent Potential Interactions Between Analyzed and Unanalyzed Parts • Partial Program Analysis

  28. Analysis Result for run Method public void run() { int sum = 0; Enumeration e = work.elements(); while (e.hasMoreElements()) sum += ((Integer) e.nextElement()).intValue(); dest.add(sum); } this Enumeration Task • Abstraction: Points-to Graph • Nodes Represent Objects • Edges Represent References work dest Vector Accumulator

  29. Analysis Result for run Method public void run() { int sum = 0; Enumeration e = work.elements(); while (e.hasMoreElements()) sum += ((Integer) e.nextElement()).intValue(); dest.add(sum); } this Enumeration Task • Inside Nodes • Objects Created Within Current Analysis Scope • One Inside Node Per Allocation Site • Represents All Objects Created At That Site work dest Vector Accumulator

  30. Analysis Result for run Method public void run() { int sum = 0; Enumeration e = work.elements(); while (e.hasMoreElements()) sum += ((Integer) e.nextElement()).intValue(); dest.add(sum); } this Enumeration Task • Outside Nodes • Objects Created Outside Current Analysis Scope • Objects Accessed Via References Created Outside Current Analysis Scope work dest Vector Accumulator

  31. Analysis Result for run Method public void run() { int sum = 0; Enumeration e = work.elements(); while (e.hasMoreElements()) sum += ((Integer) e.nextElement()).intValue(); dest.add(sum); } this Enumeration Task • Outside Nodes • One per Static Class Field • One per Parameter • One per Load Statement • Represents Objects Loaded at That Statement work dest Vector Accumulator

  32. Analysis Result for run Method public void run() { int sum = 0; Enumeration e = work.elements(); while (e.hasMoreElements()) sum += ((Integer) e.nextElement()).intValue(); dest.add(sum); } this Enumeration Task • Inside Edges • References Created Inside Current Analysis Scope work dest Vector Accumulator

  33. Analysis Result for run Method public void run() { int sum = 0; Enumeration e = work.elements(); while (e.hasMoreElements()) sum += ((Integer) e.nextElement()).intValue(); dest.add(sum); } this Enumeration Task • Outside Edges • References Created Outside Current Analysis Scope • Potential Interactions in Which Analyzed Part Reads Reference Created in Unanalyzed Part work dest Vector Accumulator

  34. Concept of Escaped Node • Escaped Nodes Represent Objects Accessible Outside Current Analysis Scope • parameter nodes, load nodes • static class field nodes • nodes passed to unanalyzed methods • nodes reachable from unanalyzed but started threads • nodes reachable from escaped nodes • Node is Captured if it is Not Escaped

  35. Why Escaped Concept is Important • Completeness of Analysis Information • Complete information for captured nodes • Potentially incomplete for escaped nodes • Lifetime Implications • Captured nodes are inaccessible when analyzed part of the program terminates • Memory Management Optimizations • Stack allocation • Per-Thread Heap Allocation
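The captured/escaped distinction can be seen in a small invented example (not from the slides): an object that never leaves its method is captured and could be stack-allocated, while storing into a static field or returning a reference makes the object escape.

```java
// Illustrative only: which allocations a points-to escape analysis
// could classify as captured vs. escaped.
class EscapeExample {
    static StringBuilder shared;   // anything stored here escapes via
                                   // a static class field node

    static int captured() {
        // Never stored to an escaped node, never returned, never passed
        // to unanalyzed code: this object is captured, so its lifetime
        // is bounded by the method and it could be stack-allocated.
        StringBuilder local = new StringBuilder();
        local.append("hi");
        return local.length();
    }

    static StringBuilder escaped() {
        StringBuilder b = new StringBuilder();
        shared = b;   // escapes through the static field
        return b;     // also escapes through the return value
    }
}
```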

  36. Intrathread Dataflow Analysis • Computes a points-to escape graph for each program point • Points-to escape graph is a triple <I, O, e> • I - set of inside edges • O - set of outside edges • e - escape information for each node

  37. Dataflow Analysis • Initial state: I: formals point to parameter nodes, classes point to class nodes; O: Ø • Transfer functions: I′ = (I − Kill_I) ∪ Gen_I, O′ = O ∪ Gen_O • Confluence operator is ∪

  38. Intraprocedural Analysis • Must define transfer functions for: • copy statement l = v • load statement l1 = l2.f • store statement l1.f = l2 • return statement return l • object creation site l = new cl • method invocation l = l0.op(l1…lk)

  39. copy statement l = v Kill_I = edges(I, l) Gen_I = {l} × succ(I, v) I′ = (I − Kill_I) ∪ Gen_I Existing edges l v

  40. copy statement l = v Kill_I = edges(I, l) Gen_I = {l} × succ(I, v) I′ = (I − Kill_I) ∪ Gen_I Generated edges l v
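The copy transfer function above is just set manipulation: kill l's inside edges, then point l at everything v points to. A minimal invented model (string node ids, a `Map` for the inside-edge set I; not the paper's implementation) might look like this:

```java
import java.util.*;

// Sketch of the copy-statement transfer function l = v:
//   Kill_I = edges(I, l),  Gen_I = {l} x succ(I, v),
//   I' = (I - Kill_I) U Gen_I
class CopyTransfer {
    // Inside edges from local variables, I : var -> set of node ids.
    static Map<String, Set<String>> copy(Map<String, Set<String>> I,
                                         String l, String v) {
        // Copy I so the input graph for the previous program point
        // is left untouched.
        Map<String, Set<String>> out = new HashMap<>();
        for (Map.Entry<String, Set<String>> e : I.entrySet())
            out.put(e.getKey(), new HashSet<>(e.getValue()));
        // Killing l's old edges and generating {l} x succ(I, v)
        // amounts to overwriting l's successor set with v's.
        out.put(l, new HashSet<>(I.getOrDefault(v, Set.of())));
        return out;
    }
}
```

Copies are strong updates: a local variable names exactly one reference, so the old edges can be killed outright.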

  41. load statement l1 = l2.f S_E = {n2 ∈ succ(I, l2) | escaped(n2)} S_I = ∪{succ(I, n2, f) | n2 ∈ succ(I, l2)} case 1: l2 does not point to an escaped node (S_E = Ø) Kill_I = edges(I, l1) Gen_I = {l1} × S_I Existing edges l1 f l2

  42. load statement l1 = l2.f S_E = {n2 ∈ succ(I, l2) | escaped(n2)} S_I = ∪{succ(I, n2, f) | n2 ∈ succ(I, l2)} case 1: l2 does not point to an escaped node (S_E = Ø) Kill_I = edges(I, l1) Gen_I = {l1} × S_I Generated edges l1 f l2

  43. load statement l1 = l2.f case 2: l2 does point to an escaped node (S_E ≠ Ø) Kill_I = edges(I, l1) Gen_I = {l1} × (S_I ∪ {n}) Gen_O = (S_E × {f}) × {n} where n is the load node for l1 = l2.f Existing edges l1 l2

  44. load statement l1 = l2.f case 2: l2 does point to an escaped node (S_E ≠ Ø) Kill_I = edges(I, l1) Gen_I = {l1} × (S_I ∪ {n}) Gen_O = (S_E × {f}) × {n} where n is the load node for l1 = l2.f Generated edges l1 n f l2

  45. store statement l1.f = l2 Gen_I = (succ(I, l1) × {f}) × succ(I, l2) I′ = I ∪ Gen_I Existing edges l1 l2

  46. store statement l1.f = l2 Gen_I = (succ(I, l1) × {f}) × succ(I, l2) I′ = I ∪ Gen_I Generated edges l1 f l2
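Note that the store transfer has no kill set: a node may stand for many run-time objects, so the analysis must perform a weak update, only adding f-edges. A minimal invented sketch in the same style as the slides' equations (string node ids; not the paper's implementation):

```java
import java.util.*;

// Sketch of the store-statement transfer function l1.f = l2:
//   Gen_I = (succ(I, l1) x {f}) x succ(I, l2),  I' = I U Gen_I
class StoreTransfer {
    // vars : local variable -> set of node ids (inside edges from vars)
    // heap : "node.field"   -> set of target node ids (inside edges
    //        between nodes); mutated in place, edges are only added.
    static void store(Map<String, Set<String>> vars,
                      Map<String, Set<String>> heap,
                      String l1, String f, String l2) {
        for (String n1 : vars.getOrDefault(l1, Set.of()))
            heap.computeIfAbsent(n1 + "." + f, k -> new HashSet<>())
                .addAll(vars.getOrDefault(l2, Set.of()));
    }
}
```

Contrast with the copy and load cases, which can kill l1's edges: those update a variable (one reference), while a store updates a heap node that may summarize many objects.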

  47. object creation site l = new cl Kill_I = edges(I, l) Gen_I = {<l, n>} where n is the inside node for l = new cl Existing edges l

  48. object creation site l = new cl Kill_I = edges(I, l) Gen_I = {<l, n>} where n is the inside node for l = new cl Generated edges n l

  49. Method Call • Analysis of a method call: • Start with points-to escape graph before the call site • Retrieve the points-to escape graph from analysis of callee • Map outside nodes of callee graph to nodes of caller graph • Combine callee graph into caller graph • Result is the points-to escape graph after the call site

  50. a t v Start With Graph Before Call Points-to Escape Graph before call to t = new Task(v,a)
