
Speeding Up Dataflow Analysis Using Flow-Insensitive Pointer Analysis

Stephen Adams, Tom Ball, Manuvir Das, Sorin Lerner, Mark Seigle, Westley Weimer. Microsoft Research, University of Washington, UC Berkeley.


Presentation Transcript


  1. Speeding Up Dataflow Analysis Using Flow-Insensitive Pointer Analysis. Stephen Adams, Tom Ball, Manuvir Das, Sorin Lerner, Mark Seigle, Westley Weimer. Microsoft Research, University of Washington, UC Berkeley

  2. Motivation • Static analysis for program verification • Complex dataflow analyses are popular • SLAM, ESP, BLAST, CQual, … • Flow-Sensitive • Interprocedural • Expensive! • Cut down on “data flow facts” • Without losing anything important

  3. General Idea • If complex analysis is worse than O(N) • And you have a cheap analysis that • Is O(N) • Reduces N • Then composing them saves time
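A rough worked illustration of the composition argument, using made-up numbers: the cubic cost, the constant c, and the factor-of-ten reduction are assumptions for the sake of arithmetic, not figures from the talk. Suppose the complex analysis costs about N^3 steps, and the cheap analysis costs cN steps while shrinking the input to N/10.

```latex
\[
T_{\text{direct}} = N^{3},
\qquad
T_{\text{composed}} = cN + \left(\frac{N}{10}\right)^{3} = cN + \frac{N^{3}}{1000}.
\]
```

For large N the linear term is negligible next to the cubic one, so under these assumptions the composed pipeline is close to 1000 times cheaper than running the complex analysis directly.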

  4. Value Flow Graph (VFG) • Variant of a points-to graph • Encodes the flow of values in the program • Conservative approximation • Lightweight, fast to compute and query • Early queries can safely reduce the data-flow facts and program points considered • Like slicing a program with respect to value flow

  5. Computing a VFG • Use a subtyping-based pointer analysis • We used One-Level Flow [Das] • Process all assignments • Not just those involving pointers • Represent constant values explicitly • Put them in the graph • Label graph with source locations • Encodes program slices

  6. Example Points-To Graph
       1: int a, *x;
       2: x = &a;
       3: *x = 7;
     [Figure: points-to graph for this program over x and a. Legend: points-to edge; source “address” node; expression node.]

  7. One Level Flow Graph
       1: int a, *x;
       2: x = &a;
       3: *x = 7;
     [Figure: one-level flow graph for this program over x and a. Legend: flow edge; points-to edge; source “address” node; expression node.]

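  8. Value Flow Graph
       1: int a, *x;
       2: x = &a;
       3: *x = 7;
     [Figure: value flow graph for this program over x, a, and the constant 7, with edges labeled by source line (2, 3, and 2,3). Legend: flow edge; points-to edge; source “address” node; expression node.]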

  9. VFG Properties • Computed in almost-linear time • Get points-to sets from the VFG in linear time: backwards reachability via flow edges, gathering up all variables • Get value flow from the VFG in linear time: backwards reachability via flow edges, following points-to edges up one level

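  10. VFG Query: Points-To of x
        1: int a, *x;
        2: x = &a;
        3: *x = 7;
      [Figure: value flow graph for this program over x, a, and the constant 7, with edges labeled by source line (2, 3, and 2,3). Legend: flow edge; points-to edge; source “address” node; expression node.]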

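  11. VFG Query: Value Flow into a
        1: int a, *x;
        2: x = &a;
        3: *x = 7;
      [Figure: value flow graph for this program over x, a, and the constant 7, with edges labeled by source line (2, 3, and 2,3). Legend: flow edge; points-to edge; source “address” node; expression node.]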
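The two queries on the three-line example can be sketched as backward reachability over flow edges. This is a simplified model, not the authors' data structure: the node names, the adjacency-matrix encoding, and the choice to record `*x = 7` as a flow edge directly into `a` (the only cell `x` can point to) are all assumptions made for illustration.

```c
#include <stdbool.h>

/* Hypothetical nodes for: 1: int a, *x;  2: x = &a;  3: *x = 7; */
enum node_id { ADDR_A, VAR_X, VAR_A, CONST_7, NUM_NODES };

/* flow_line[s][d] != 0 means "a value may flow from s to d";
 * the stored number is the source line that created the edge. */
static const int flow_line[NUM_NODES][NUM_NODES] = {
    [ADDR_A][VAR_X]  = 2,  /* x = &a;  the address of a flows into x       */
    [CONST_7][VAR_A] = 3,  /* *x = 7;  7 flows into the cell x points to   */
};

/* Mark every node from which a value may reach `target` (target included). */
void backward_reach(enum node_id target, bool seen[NUM_NODES])
{
    if (seen[target])
        return;
    seen[target] = true;
    for (int src = 0; src < NUM_NODES; src++)
        if (flow_line[src][target] != 0)
            backward_reach((enum node_id)src, seen);
}

/* True if a value may flow from `from` to `to` along flow edges. */
bool may_flow(enum node_id from, enum node_id to)
{
    bool seen[NUM_NODES] = { false };
    backward_reach(to, seen);
    return seen[from];
}
```

In this model, `may_flow(ADDR_A, VAR_X)` answers the points-to query for x, and `may_flow(CONST_7, VAR_A)` answers the value-flow query for a. Each query visits every node at most once.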

  12. VFG Summary • Computed in almost-linear time • Queries complete in linear time • Approximates flow of values in program • Two applications that benefit: ESP and SLAM

  13. Application 1: ESP • Verification tool for large C++ programs • Tracks “typestate” of values • Encoded as Finite State Machine • Special Error state • Core: interprocedural data-flow engine • Flow sensitive: state at every point • Performed bottom-up on call graph • Requires function summaries
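A typestate property with a special error state can be written as a small transition table. The sketch below is a hypothetical file-handle property: the states, events, and transitions are assumptions chosen for illustration and are not taken from ESP itself.

```c
/* Hypothetical typestate property for a file handle. */
enum state { CLOSED, OPENED, ERROR, NUM_STATES };
enum event { EV_OPEN, EV_WRITE, EV_CLOSE, NUM_EVENTS };

/* next_state[s][e] is the state reached from s on event e.
 * ERROR is absorbing: once entered, it is never left. */
static const enum state next_state[NUM_STATES][NUM_EVENTS] = {
    /*            EV_OPEN  EV_WRITE  EV_CLOSE */
    [CLOSED] = {  OPENED,  ERROR,    ERROR  },
    [OPENED] = {  ERROR,   OPENED,   CLOSED },
    [ERROR]  = {  ERROR,   ERROR,    ERROR  },
};

/* Run the machine over a sequence of events, starting in CLOSED. */
enum state run_trace(const enum event *trace, int len)
{
    enum state s = CLOSED;
    for (int i = 0; i < len; i++)
        s = next_state[s][trace[i]];
    return s;
}
```

A flow-sensitive analysis would track such a state for each value at every program point. The function above only replays a single concrete trace, which is enough to show the encoding: open, write, close ends in `CLOSED`, while writing after close ends in `ERROR`.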

  14. ESP Function Summaries • Consider stateful memory locations • Summarize function behavior for each loc • Reducing number of locs would be good! • But C has evil casts, so types cannot be used • Worst case set of locations: • All globals and formal parameters • Everything transitively reachable from there

  15. Reduce Location Set • Location L needs to be considered in F if both hold: • Some expression E has its state changed in F • The value held by L at entry to F can flow into E • Assuming state-changing ops are known • Query VFG to find values that flow in

  16. ESP Example
        FILE *e, *f, *g, *h;
        void foo() {
          FILE **p;
          int a = (int)h;
          if (…) p = &e;
          else p = &f;
          *p = fopen(…);
        }
      Locations to consider for foo() summary: { e, *e, f, *f, g, *g, h, *h }

  17. ESP Example
        FILE *e, *f, *g, *h;
        void foo() {
          FILE **p;
          int a = (int)h;
          if (…) p = &e;
          else p = &f;
          *p = fopen(…);
        }
      (1) Compute VFG
      (2) Query value flow on *p
      (3) Reduced locations to consider for foo() summary: { e, f }
      (4) Reduce lines to consider for dataflow
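The reduction from eight locations to { e, f } can be sketched as one backward-reachability query. This is a simplified model of foo(): the node names and the edge list are assumptions, and the query only asks which globals' addresses can flow into p, which is enough to decide which globals `*p = fopen(…)` can change.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical nodes; the four address nodes must stay first and in order. */
enum node { N_ADDR_E, N_ADDR_F, N_ADDR_G, N_ADDR_H, N_P, N_H, N_A, NUM_NODES };

struct edge { int src, dst; };

/* Flow edges read off the body of foo(). */
static const struct edge flow_edges[] = {
    { N_ADDR_E, N_P },  /* p = &e;        */
    { N_ADDR_F, N_P },  /* p = &f;        */
    { N_H,      N_A },  /* a = (int)h;    */
};

/* Worklist version of backward reachability over flow edges. */
static void backward_reach(int target, bool seen[NUM_NODES])
{
    int work[NUM_NODES], top = 0;
    seen[target] = true;
    work[top++] = target;
    while (top > 0) {
        int n = work[--top];
        for (size_t i = 0; i < sizeof flow_edges / sizeof flow_edges[0]; i++) {
            int s = flow_edges[i].src;
            if (flow_edges[i].dst == n && !seen[s]) {
                seen[s] = true;      /* each node is pushed at most once */
                work[top++] = s;
            }
        }
    }
}

/* Fills keep[0..3] for globals e, f, g, h; returns how many are kept. */
int locations_for_foo(bool keep[4])
{
    bool seen[NUM_NODES] = { false };
    backward_reach(N_P, seen);
    int count = 0;
    for (int g = 0; g < 4; g++) {
        keep[g] = seen[N_ADDR_E + g];
        count += keep[g];
    }
    return count;
}
```

Here g is never mentioned in foo(), and h only flows into the integer a, so neither reaches p and both drop out of the summary.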

  18. ESP Results • FILE * output in GCC • 140 KLOC, 2149 functions, 66 files, 1068 globals • VFG Queries take 200 seconds • Reduce average number of locations per function summary from 1100 to <1 • Median of 15 for functions with >0 • Verification takes 15 minutes • Infeasible otherwise

  19. Application 2: SLAM • Validates temporal safety properties • Boolean abstraction • Interprocedural dataflow analysis • Counterexample-driven refinement • Convert C program to Boolean program • Exhaustive dataflow analysis • No errors? Program is safe. • Real error? Program has a bug. • False error? Add predicates, repeat.

  20. Boolean Programs
        C Program          Boolean Program
        int x,y;           bool p,q;
        x = 5;             p = 1;
        y = 6;             q = 1;
        x = x * 2;         p = 0; q = 0;
        y = y * 2;         q = 1;
        assert(x<y)        assert(q)
      Predicates (important!): p means “x == 5”, q means “x < y”
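One way to read the slide is to run both programs in lockstep and check that each Boolean variable always equals the predicate it stands for. The sketch below does exactly that for this one example; the function name is hypothetical, and the Boolean assignments are copied from the slide rather than derived automatically.

```c
#include <stdbool.h>

/* Returns true if, after every statement, p and q agree with the
 * predicates they abstract (p: x == 5, q: x < y) and the final
 * Boolean assertion holds. */
bool abstraction_agrees(void)
{
    int x, y;
    bool p, q;
    bool ok = true;

    x = 5;      p = true;               /* x == 5 now holds            */
    ok = ok && (p == (x == 5));

    y = 6;      q = true;               /* 5 < 6                       */
    ok = ok && (p == (x == 5)) && (q == (x < y));

    x = x * 2;  p = false; q = false;   /* x is 10: not 5, and 10 >= 6 */
    ok = ok && (p == (x == 5)) && (q == (x < y));

    y = y * 2;  q = true;               /* y is 12: 10 < 12            */
    ok = ok && (p == (x == 5)) && (q == (x < y));

    return ok && q;                     /* assert(q) mirrors assert(x<y) */
}
```

The check passes for this program because the two predicates happen to capture everything the assertion depends on, which is why the choice of predicates matters.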

  21. SLAM Predicates • Hard to come up with good predicates • Counterexample-driven refinement • Picks good predicates • Is very slow • Taking all possible predicates • Is even slower • Want “all the useful” predicates

  22. Speeding Up SLAM • For a simple subset of C • Similar to “Copy Constants” • Use VFG to find a sufficient set of predicates • Provably sufficient for this subset • If this set fails to prove the real program • Fall back on counterexample-driven refinement

  23. A Simple Language
        s ::= vi = n               // constants
            | vi = vj              // variable copy
            | if (*) s1 else s2    // condition ignored
            | vi = fun(vj, …)      // function call
            | return(vi)           // function return
            | assert(vi» vj)       // safety property

  24. Predicate Discovery • High-level idea • Each flow edge in the VFG means “values may flow from X to Y” • Add predicates to see if they do • For each assert(vi» vj) • Consider the chain of values flowing to vi, vj • Add an equality predicate for each link • Use constants to resolve scoping

  25. SLAM Example
        int sel(int f) {
          int r;
          if (*) r = f;
          else r = 3;
          return(r);
        }
        void main() {
          int a,b,c;
          a = 1;
          b = sel(a);
          if (*) c = 2;
          else c = 4;
          assert(b > c);
        }
      [Figure: value flow graph over b, r, f, a, c and the constants 1, 2, 3, 4.]

  26. Predicates For “b”
        int sel(int f) {
          int r;
          if (*) r = f;
          else r = 3;
          return(r);
        }
        void main() {
          int a,b,c;
          a = 1;
          b = sel(a);
          if (*) c = 2;
          else c = 4;
          assert(b > c);
        }
      [Figure: the part of the value flow graph that reaches b, over b, r, f, a and the constants 3 and 1.]
      Predicates:
        b == r
        r == 3
        r == f
        f == a
        a == 1

  27. Predicates For “b”
        int sel(int f) {
          int r;
          if (*) r = f;
          else r = 3;
          return(r);
        }
        void main() {
          int a,b,c;
          a = 1;
          b = sel(a);
          if (*) c = 2;
          else c = 4;
          assert(b > c);
        }
      [Figure: the part of the value flow graph that reaches b, over b, r, f, a and the constants 3 and 1.]
      Predicates:
        b == r
        r == 3
        r == f
        f == a  // no scope!
        a == 1

  28. Predicates For “b”
        int sel(int f) {
          int r;
          if (*) r = f;
          else r = 3;
          return(r);
        }
        void main() {
          int a,b,c;
          a = 1;
          b = sel(a);
          if (*) c = 2;
          else c = 4;
          assert(b > c);
        }
      [Figure: the part of the value flow graph that reaches b, over b, r, f, a and the constants 3 and 1.]
      Predicates:               With scoping resolved using constants:
        b == r                    b == r
        r == 3                    r == 3
        r == f                    r == f
        f == a  // no scope!      f == 1, f == 3
        a == 1                    a == 1, a == 3
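The chain walk that produces the first predicate list for b can be sketched as a traversal of flow edges that emits one equality per edge. The edge list is read off the example program by hand, and the string-based node names, the buffer sizes, and the function name are assumptions. The sketch does not perform the scope-resolution step, so it still emits `f == a`.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

struct edge { const char *src, *dst; };   /* "a value may flow from src to dst" */

/* Hypothetical flow edges for the sel()/main() example. */
static const struct edge flow_edges[] = {
    { "r", "b" },   /* b = sel(a): the returned r flows into b */
    { "3", "r" },   /* r = 3;                                  */
    { "f", "r" },   /* r = f;                                  */
    { "a", "f" },   /* sel(a): actual a flows into formal f    */
    { "1", "a" },   /* a = 1;                                  */
    { "2", "c" },   /* c = 2;                                  */
    { "4", "c" },   /* c = 4;                                  */
};

enum { NUM_EDGES = sizeof flow_edges / sizeof flow_edges[0], PRED_LEN = 16 };

/* Writes one "dst == src" predicate per flow edge that reaches `var`.
 * Returns the number of predicates written (at most max_out). */
int predicates_for(const char *var, char out[][PRED_LEN], int max_out)
{
    const char *work[NUM_EDGES + 1];      /* one push per edge, plus var */
    bool used[NUM_EDGES] = { false };
    int head = 0, tail = 0, count = 0;

    work[tail++] = var;
    while (head < tail) {
        const char *dst = work[head++];
        for (int i = 0; i < NUM_EDGES; i++) {
            if (used[i] || strcmp(flow_edges[i].dst, dst) != 0)
                continue;
            used[i] = true;
            if (count < max_out)
                snprintf(out[count++], PRED_LEN, "%s == %s",
                         dst, flow_edges[i].src);
            work[tail++] = flow_edges[i].src;   /* keep walking backwards */
        }
    }
    return count;
}
```

Calling `predicates_for("b", out, 8)` fills `out` with the five predicates from the first list, in the same order. Marking edges rather than nodes as visited keeps the walk finite even if the graph has cycles.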

  29. Why does this work? • Simple language • No arithmetic, etc. • Just copying around initial values • Knowing final values of variables • Completely decides safety condition • Still related to real life • Cannot do arithmetic on locks, FILE *s, device driver status codes, etc.

  30. Some SLAM Results The generated predicates cover between two-thirds and all of the necessary predicates. Since each SLAM refinement iteration generates 3-7 missing predicates, the net performance increase is more than linear. Predicates can be specialized or simplified if the assert() condition is a common relational operator (e.g., x==y, x<y, x==5).

  31. Conclusions • Complex interprocedural analyses can benefit from inexpensive value-flow • VFG encodes value flow • Constructed and queried quickly • Prune the set of dataflow facts and program points considered • Large net performance increase
