
Analyses and Optimizations for Multithreaded Programs




Presentation Transcript


  1. Analyses and Optimizations for Multithreaded Programs John Whaley IBM Tokyo Research Laboratory Martin Rinard, Alex Salcianu, Brian Demsky MIT Laboratory for Computer Science

  2. Motivation • Threads are Ubiquitous • Parallel Programming for Performance • Manage Multiple Connections • System Structuring Mechanism • Overhead • Thread Management • Synchronization • Opportunities • Improved Memory Management

  3. What This Talk is About • New Abstraction: Parallel Interaction Graph • Points-To Information • Reachability and Escape Information • Interaction Information • Caller-Callee Interactions • Starter-Startee Interactions • Action Ordering Information • Analysis Algorithm • Analysis Uses (synchronization elimination, stack allocation, per-thread heap allocation)

  4. Outline • Example • Analysis Representation and Algorithm • Lightweight Threads • Results • Conclusion

  5. Sum Sequence of Numbers 9 8 1 5 3 7 2 6

  6. 1 5 3 7 2 6 9 8 Group in Subsequences

  7. 1 5 3 7 2 6 9 8 + + + + 10 17 8 6 Sum Subsequences (in Parallel)

  8. 1 5 3 7 2 6 9 8 + + + + 17 10 8 6 Add Sums Into Accumulator Accumulator 0

  9. 1 5 3 7 2 6 9 8 + + + + 17 10 8 6 Add Sums Into Accumulator Accumulator 17

  10. 1 5 3 7 2 6 9 8 + + + + 17 10 8 6 Add Sums Into Accumulator Accumulator 23

  11. 1 5 3 7 2 6 9 8 + + + + 17 10 8 6 Add Sums Into Accumulator Accumulator 33

  12. 1 5 3 7 2 6 9 8 + + + + 17 10 8 6 Add Sums Into Accumulator Accumulator 41

  13. Common Schema • Set of tasks • Chunk tasks to increase granularity • Tasks have both • Independent computation • Updates to shared data

  14. Realization in Java class Accumulator { int value = 0; synchronized void add(int v) { value += v; } }

  15. 0 2 6 Realization in Java class Task extends Thread { Vector work; Accumulator dest; Task(Vector w, Accumulator d) { work = w; dest = d; } public void run() { int sum = 0; Enumeration e = work.elements(); while (e.hasMoreElements()) sum += ((Integer) e.nextElement()).intValue(); dest.add(sum); } } Task work dest Vector Accumulator

  16. 0 2 6 Realization in Java class Task extends Thread { Vector work; Accumulator dest; Task(Vector w, Accumulator d) { work = w; dest = d; } public void run() { int sum = 0; Enumeration e = work.elements(); while (e.hasMoreElements()) sum += ((Integer) e.nextElement()).intValue(); dest.add(sum); } } Enumeration Task work dest Vector Accumulator

  17. Realization in Java void generateTask(int l, int u, Accumulator a) { Vector v = new Vector(); for (int j = l; j < u; j++) v.addElement(new Integer(j)); Task t = new Task(v,a); t.start(); } void generate(int n, int m, Accumulator a) { for (int i = 0; i < n; i++) generateTask(i*m, (i+1)*m, a); }
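The slides never show how the generated tasks are awaited before the accumulator is read. The sketch below is a hypothetical driver (class name `Driver` and method `generate` returning the result are invented, and `List` stands in for the slides' `Vector`); it keeps the Task threads so it can join them all before reading the total:

```java
import java.util.ArrayList;
import java.util.List;

class Accumulator {
    int value = 0;
    synchronized void add(int v) { value += v; }
}

class Task extends Thread {
    List<Integer> work;
    Accumulator dest;
    Task(List<Integer> w, Accumulator d) { work = w; dest = d; }
    public void run() {
        int sum = 0;
        for (int x : work) sum += x;   // independent computation
        dest.add(sum);                 // update to shared data
    }
}

class Driver {
    // Split [0, n*m) into n chunks of m numbers, sum each in its own thread.
    static int generate(int n, int m) throws InterruptedException {
        Accumulator a = new Accumulator();
        List<Task> tasks = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            List<Integer> v = new ArrayList<>();
            for (int j = i * m; j < (i + 1) * m; j++) v.add(j);
            Task t = new Task(v, a);
            t.start();
            tasks.add(t);
        }
        for (Task t : tasks) t.join(); // wait before reading the result
        return a.value;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(generate(4, 2)); // sums 0..7, prints 28
    }
}
```

Without the joins, `main` could read `a.value` while some tasks are still running; the joins are what the slides' schema leaves implicit.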

  18. Task Generation Accumulator 0

  19. Task Generation Accumulator 0 Vector

  20. 2 Task Generation Accumulator 0 Vector

  21. 2 6 Task Generation Accumulator 0 Vector

  22. 2 6 Task Generation Task work dest Accumulator 0 Vector

  23. 2 8 6 9 Task Generation Task work dest Accumulator 0 Vector Vector

  24. 2 8 6 9 Task Generation Task work dest Accumulator 0 Vector dest work Task Vector

  25. 1 2 8 6 5 9 Task Generation Task work dest Accumulator 0 Vector dest dest Task work work Task Vector Vector

  26. Analysis

  27. Analysis Overview • Interprocedural • Interthread • Flow-sensitive • Statement ordering within thread • Action ordering between threads • Compositional, Bottom Up • Explicitly Represent Potential Interactions Between Analyzed and Unanalyzed Parts • Partial Program Analysis

  28. Analysis Result for run Method public void run() { int sum = 0; Enumeration e = work.elements(); while (e.hasMoreElements()) sum += ((Integer) e.nextElement()).intValue(); dest.add(sum); } this Enumeration Task • Abstraction: Points-to Graph • Nodes Represent Objects • Edges Represent References work dest Vector Accumulator

  29. Analysis Result for run Method public void run() { int sum = 0; Enumeration e = work.elements(); while (e.hasMoreElements()) sum += ((Integer) e.nextElement()).intValue(); dest.add(sum); } this Enumeration Task • Inside Nodes • Objects Created Within Current Analysis Scope • One Inside Node Per Allocation Site • Represents All Objects Created At That Site work dest Vector Accumulator

  30. Analysis Result for run Method public void run() { int sum = 0; Enumeration e = work.elements(); while (e.hasMoreElements()) sum += ((Integer) e.nextElement()).intValue(); dest.add(sum); } this Enumeration Task • Outside Nodes • Objects Created Outside Current Analysis Scope • Objects Accessed Via References Created Outside Current Analysis Scope work dest Vector Accumulator

  31. Analysis Result for run Method public void run() { int sum = 0; Enumeration e = work.elements(); while (e.hasMoreElements()) sum += ((Integer) e.nextElement()).intValue(); dest.add(sum); } this Enumeration Task • Outside Nodes • One per Static Class Field • One per Parameter • One per Load Statement • Represents Objects Loaded at That Statement work dest Vector Accumulator

  32. Analysis Result for run Method public void run() { int sum = 0; Enumeration e = work.elements(); while (e.hasMoreElements()) sum += ((Integer) e.nextElement()).intValue(); dest.add(sum); } this Enumeration Task • Inside Edges • References Created Inside Current Analysis Scope work dest Vector Accumulator

  33. Analysis Result for run Method public void run() { int sum = 0; Enumeration e = work.elements(); while (e.hasMoreElements()) sum += ((Integer) e.nextElement()).intValue(); dest.add(sum); } this Enumeration Task • Outside Edges • References Created Outside Current Analysis Scope • Potential Interactions in Which Analyzed Part Reads Reference Created in Unanalyzed Part work dest Vector Accumulator

  34. Concept of Escaped Node • Escaped Nodes Represent Objects Accessible Outside Current Analysis Scope • parameter nodes, load nodes • static class field nodes • nodes passed to unanalyzed methods • nodes reachable from unanalyzed but started threads • nodes reachable from escaped nodes • Node is Captured if it is Not Escaped

  35. Why Escaped Concept is Important • Completeness of Analysis Information • Complete information for captured nodes • Potentially incomplete for escaped nodes • Lifetime Implications • Captured nodes are inaccessible when analyzed part of the program terminates • Memory Management Optimizations • Stack allocation • Per-Thread Heap Allocation
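The captured/escaped distinction can be seen in a small invented example (not from the slides): an object that never leaves its method is captured and could be stack-allocated, while storing into a static field or returning a reference makes the object escape.

```java
// Illustrative only: which allocations a points-to escape analysis
// could classify as captured vs. escaped.
class EscapeExample {
    static StringBuilder shared;   // anything stored here escapes via
                                   // a static class field node

    static int captured() {
        // Never stored to an escaped node, never returned, never passed
        // to unanalyzed code: this object is captured, so its lifetime
        // is bounded by the method and it could be stack-allocated.
        StringBuilder local = new StringBuilder();
        local.append("hi");
        return local.length();
    }

    static StringBuilder escaped() {
        StringBuilder b = new StringBuilder();
        shared = b;   // escapes through the static field
        return b;     // also escapes through the return value
    }
}
```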

  36. Intrathread Dataflow Analysis • Computes a points-to escape graph for each program point • Points-to escape graph is a triple <I, O, e> • I - set of inside edges • O - set of outside edges • e - escape information for each node

  37. Dataflow Analysis • Initial state: I: formals point to parameter nodes, classes point to class nodes; O: Ø • Transfer functions: I′ = (I − Kill_I) ∪ Gen_I, O′ = O ∪ Gen_O • Confluence operator is ∪

  38. Intraprocedural Analysis • Must define transfer functions for: • copy statement l = v • load statement l1 = l2.f • store statement l1.f = l2 • return statement return l • object creation site l = new cl • method invocation l = l0.op(l1…lk)

  39. copy statement l = v Kill_I = edges(I, l) Gen_I = {l} × succ(I, v) I′ = (I − Kill_I) ∪ Gen_I Existing edges l v

  40. copy statement l = v Kill_I = edges(I, l) Gen_I = {l} × succ(I, v) I′ = (I − Kill_I) ∪ Gen_I Generated edges l v
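The copy transfer function above is just set manipulation: kill l's inside edges, then point l at everything v points to. A minimal invented model (string node ids, a `Map` for the inside-edge set I; not the paper's implementation) might look like this:

```java
import java.util.*;

// Sketch of the copy-statement transfer function l = v:
//   Kill_I = edges(I, l),  Gen_I = {l} x succ(I, v),
//   I' = (I - Kill_I) U Gen_I
class CopyTransfer {
    // Inside edges from local variables, I : var -> set of node ids.
    static Map<String, Set<String>> copy(Map<String, Set<String>> I,
                                         String l, String v) {
        // Copy I so the input graph for the previous program point
        // is left untouched.
        Map<String, Set<String>> out = new HashMap<>();
        for (Map.Entry<String, Set<String>> e : I.entrySet())
            out.put(e.getKey(), new HashSet<>(e.getValue()));
        // Killing l's old edges and generating {l} x succ(I, v)
        // amounts to overwriting l's successor set with v's.
        out.put(l, new HashSet<>(I.getOrDefault(v, Set.of())));
        return out;
    }
}
```

Copies are strong updates: a local variable names exactly one reference, so the old edges can be killed outright.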

  41. load statement l1 = l2.f S_E = {n2 ∈ succ(I, l2) | escaped(n2)} S_I = ∪{succ(I, n2, f) | n2 ∈ succ(I, l2)} case 1: l2 does not point to an escaped node (S_E = Ø) Kill_I = edges(I, l1) Gen_I = {l1} × S_I Existing edges l1 f l2

  42. load statement l1 = l2.f S_E = {n2 ∈ succ(I, l2) | escaped(n2)} S_I = ∪{succ(I, n2, f) | n2 ∈ succ(I, l2)} case 1: l2 does not point to an escaped node (S_E = Ø) Kill_I = edges(I, l1) Gen_I = {l1} × S_I Generated edges l1 f l2

  43. load statement l1 = l2.f case 2: l2 does point to an escaped node (S_E ≠ Ø) Kill_I = edges(I, l1) Gen_I = {l1} × (S_I ∪ {n}) Gen_O = (S_E × {f}) × {n} where n is the load node for l1 = l2.f Existing edges l1 l2

  44. load statement l1 = l2.f case 2: l2 does point to an escaped node (S_E ≠ Ø) Kill_I = edges(I, l1) Gen_I = {l1} × (S_I ∪ {n}) Gen_O = (S_E × {f}) × {n} where n is the load node for l1 = l2.f Generated edges l1 n f l2

  45. store statement l1.f = l2 Gen_I = (succ(I, l1) × {f}) × succ(I, l2) I′ = I ∪ Gen_I Existing edges l1 l2

  46. store statement l1.f = l2 Gen_I = (succ(I, l1) × {f}) × succ(I, l2) I′ = I ∪ Gen_I Generated edges l1 f l2
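Note that the store transfer has no kill set: a node may stand for many run-time objects, so the analysis must perform a weak update, only adding f-edges. A minimal invented sketch in the same style as the slides' equations (string node ids; not the paper's implementation):

```java
import java.util.*;

// Sketch of the store-statement transfer function l1.f = l2:
//   Gen_I = (succ(I, l1) x {f}) x succ(I, l2),  I' = I U Gen_I
class StoreTransfer {
    // vars : local variable -> set of node ids (inside edges from vars)
    // heap : "node.field"   -> set of target node ids (inside edges
    //        between nodes); mutated in place, edges are only added.
    static void store(Map<String, Set<String>> vars,
                      Map<String, Set<String>> heap,
                      String l1, String f, String l2) {
        for (String n1 : vars.getOrDefault(l1, Set.of()))
            heap.computeIfAbsent(n1 + "." + f, k -> new HashSet<>())
                .addAll(vars.getOrDefault(l2, Set.of()));
    }
}
```

Contrast with the copy and load cases, which can kill l1's edges: those update a variable (one reference), while a store updates a heap node that may summarize many objects.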

  47. object creation site l = new cl Kill_I = edges(I, l) Gen_I = {<l, n>} where n is the inside node for l = new cl Existing edges l

  48. object creation site l = new cl Kill_I = edges(I, l) Gen_I = {<l, n>} where n is the inside node for l = new cl Generated edges n l

  49. Method Call • Analysis of a method call: • Start with points-to escape graph before the call site • Retrieve the points-to escape graph from analysis of callee • Map outside nodes of callee graph to nodes of caller graph • Combine callee graph into caller graph • Result is the points-to escape graph after the call site

  50. a t v Start With Graph Before Call Points-to Escape Graph before call to t = new Task(v,a)
