Control Flow Analysis for Functional and Object-Oriented Languages

Learn about function pointers, constraint-based program analysis, and techniques for control flow analysis in languages like C++, Java, and ML. Gain insight into the dynamic dispatch problem and explore the formal specifications of 0-CFA.

  1. Control Flow Analysis • Mooly Sagiv • http://www.math.tau.ac.il/~sagiv/courses/pa.html • Tel Aviv University • 640-6706 • Sunday 18-21, Schreiber 8 • Monday 10-12, Schreiber 317 • Textbook: Chapter 3 (Simplified + OO)

  2. Goals • Understand the problem of Control Flow Analysis • In Functional Languages • In Object Oriented Languages • Function Pointers • Learn the Constraint Based Program Analysis Technique • General • Usage for Control Flow Analysis • Algorithms • Systems • Similarities between Problems & Techniques

  3. Outline • A Motivating Example (OO) • The Control Flow Analysis Problem • A Formal Specification • Set Constraints • Solving Constraints • Adding Dataflow information • Adding Context Information • Back to the Motivating Example • Conclusions

  4. A Motivating Example
      class Vehicle extends Object {
        int position = 10;
        void move(x1 : int) { position = position + x1; }
      }
      class Car extends Vehicle {
        int passengers;
        void await(v : Vehicle) {
          if (v.position < position) then v.move(position - v.position);
          else self.move(10);
        }
      }
      class Truck extends Vehicle {
        void move(x2 : int) { if (x2 < 55) position = position + x2; }
      }
      void main {
        Car c; Truck t; Vehicle v1;
        new c; new t;
        v1 := c;
        c.passengers := 2;
        c.move(60); v1.move(70); c.await(t);
      }

  5. The Control Flow Analysis (CFA) Problem • Given a program in a functional programming language with higher-order functions (functions can serve as parameters and return values) • Find out for each function invocation which functions may be applied • Obvious in C without function pointers • Difficult in C++, Java and ML • The Dynamic Dispatch Problem

  6. An ML Example let f = fn x => x 1 ; g = fn y => y + 2 ; h = fn z => z + 3; in (f g) + (f h)

  7. An ML Example let f = fn x => /* {g, h} */ x 1 ; g = fn y => y + 2 ; h = fn z => z + 3; in (f g) + (f h)
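
A rough Python analogue of the ML example above, offered only as an illustration. The wrapper `traced_f` and the set `seen` are hypothetical helpers that do not appear in the slides. The sketch records at run time which functions reach the parameter `x`; a control flow analysis aims to compute a safe superset of that set without running the program.

```python
def f(x):
    return x(1)      # the callee depends on what flows into x

def g(y):
    return y + 2

def h(z):
    return z + 3

seen = set()         # names of the functions observed in parameter x

def traced_f(x):
    """Hypothetical wrapper: log the argument, then behave like f."""
    seen.add(x.__name__)
    return f(x)

result = traced_f(g) + traced_f(h)
```

Here `seen` ends up holding the names of `g` and `h`, matching the annotation `{g, h}` on the slide.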

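  8. The Language FUN • Notations • $e \in \mathrm{Exp}$ // expressions (or labeled terms) • $t \in \mathrm{Term}$ // terms (or unlabeled terms) • $f, x \in \mathrm{Var}$ // variables • $c \in \mathrm{Const}$ // constants • $op \in \mathrm{Op}$ // binary operators • $l \in \mathrm{Lab}$ // labels • Abstract Syntax • $e ::= t^{l}$ • $t ::= c \mid x \mid \mathtt{fn}\ x \Rightarrow e$ // function definition $\mid \mathtt{fun}\ f\ x \Rightarrow e$ // recursive function definition $\mid e_1\ e_2$ // function application $\mid \mathtt{if}\ e_0\ \mathtt{then}\ e_1\ \mathtt{else}\ e_2 \mid \mathtt{let}\ x = e_1\ \mathtt{in}\ e_2 \mid e_1\ op\ e_2$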
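
One possible way to represent FUN's labeled abstract syntax in code, sketched in Python. The class and field names (`Fn`, `param`, `orelse`, and so on) and the helper `labels` are choices made for this sketch, not part of the slides. Each `Exp` pairs a term with its label, mirroring the rule that an expression is a labeled term.

```python
from dataclasses import dataclass
from typing import Union

@dataclass(frozen=True)
class Const:
    value: int

@dataclass(frozen=True)
class Var:
    name: str

@dataclass(frozen=True)
class Fn:                 # fn x => e
    param: str
    body: "Exp"

@dataclass(frozen=True)
class Fun:                # fun f x => e (recursive)
    name: str
    param: str
    body: "Exp"

@dataclass(frozen=True)
class App:                # e1 e2
    func: "Exp"
    arg: "Exp"

@dataclass(frozen=True)
class If:                 # if e0 then e1 else e2
    cond: "Exp"
    then: "Exp"
    orelse: "Exp"

@dataclass(frozen=True)
class Let:                # let x = e1 in e2
    name: str
    bound: "Exp"
    body: "Exp"

@dataclass(frozen=True)
class Op:                 # e1 op e2
    op: str
    left: "Exp"
    right: "Exp"

Term = Union[Const, Var, Fn, Fun, App, If, Let, Op]

@dataclass(frozen=True)
class Exp:                # a labeled term
    term: Term
    label: int

def labels(e: Exp) -> list:
    """Collect every label occurring in e, in sorted order."""
    found = [e.label]
    for value in vars(e.term).values():
        if isinstance(value, Exp):
            found.extend(labels(value))
    return sorted(found)

# Two identity functions, the first applied to the second, labels 1..5.
simple = Exp(App(Exp(Fn("x", Exp(Var("x"), 1)), 2),
                 Exp(Fn("y", Exp(Var("y"), 3)), 4)), 5)
```

Frozen dataclasses are hashable, so terms built this way could be stored directly in the sets used as abstract values.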

  9. A Simple Example • $((\mathtt{fn}\ x \Rightarrow x^{1})^{2}\ (\mathtt{fn}\ y \Rightarrow y^{3})^{4})^{5}$

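  10. An Example which Loops • $(\mathtt{let}\ g = (\mathtt{fun}\ f\ x \Rightarrow (f^{1}\ (\mathtt{fn}\ y \Rightarrow y^{2})^{3})^{4})^{5}\ \mathtt{in}\ (g^{6}\ (\mathtt{fn}\ z \Rightarrow z^{7})^{8})^{9})^{10}$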

  11. The 0-CFA Problem • Compute for every program a pair $(C, \rho)$ where: • $C$ is the abstract cache associating abstract values with labeled program points • $\rho$ is the abstract environment associating abstract values with variables • Formally • $v \in \mathrm{Val} = \mathcal{P}(\mathrm{Term})$ // abstract values • $\rho \in \mathrm{Env} = \mathrm{Var} \to \mathrm{Val}$ // abstract environment • $C \in \mathrm{Cache} = \mathrm{Lab} \to \mathrm{Val}$ // abstract cache • For a function application $(t_1^{l_1}\ t_2^{l_2})^{l}$, $C(l_1)$ determines the functions that can be applied • These maps are finite for a given program • No context is considered for parameters

  12. Possible Solutions for $((\mathtt{fn}\ x \Rightarrow x^{1})^{2}\ (\mathtt{fn}\ y \Rightarrow y^{3})^{4})^{5}$

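  13. $(\mathtt{let}\ g = (\mathtt{fun}\ f\ x \Rightarrow (f^{1}\ (\mathtt{fn}\ y \Rightarrow y^{2})^{3})^{4})^{5}\ \mathtt{in}\ (g^{6}\ (\mathtt{fn}\ z \Rightarrow z^{7})^{8})^{9})^{10}$ • Shorthand • $sf = \mathtt{fun}\ f\ x \Rightarrow (f^{1}\ (\mathtt{fn}\ y \Rightarrow y^{2})^{3})^{4}$ • $id_y = \mathtt{fn}\ y \Rightarrow y^{2}$ • $id_z = \mathtt{fn}\ z \Rightarrow z^{7}$ • $C(1) = \{sf\}$ • $C(2) = \emptyset$ • $C(3) = \{id_y\}$ • $C(4) = \emptyset$ • $C(5) = \{sf\}$ • $C(6) = \{sf\}$ • $C(7) = \emptyset$ • $C(8) = \{id_z\}$ • $C(9) = \emptyset$ • $C(10) = \emptyset$ • $\rho(x) = \{id_y, id_z\}$ • $\rho(y) = \emptyset$ • $\rho(z) = \emptyset$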

  14. Relationship to Dataflow Analysis • Expressions are side effect free • no entry/exit • A single environment • Represents information at different points via maps • A single value for all occurrences of a variable • Function applications act similar to assignments • “Definition” - Function abstraction is created • “Use” - Function is applied

  15. A Formal Specification of 0-CFA • A Boolean function $\models$ defines when a solution is acceptable • $(C, \rho) \models e$ means that $(C, \rho)$ is acceptable for the expression $e$ • Define $\models$ by structural induction on $e$ • Every function is analyzed once • Every acceptable solution is sound (conservative) • Many acceptable solutions • Generate a set of constraints • Obtain the least acceptable solution by solving the constraints

  16. Syntax Directed 0-CFA (Simple Expressions) • [const] $(C, \rho) \models c^{l}$ always • [var] $(C, \rho) \models x^{l}$ if $\rho(x) \subseteq C(l)$

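  17. Syntax Directed 0-CFA (Function Abstraction) • [fn] $(C, \rho) \models (\mathtt{fn}\ x \Rightarrow e)^{l}$ if: $(C, \rho) \models e$ and $\mathtt{fn}\ x \Rightarrow e \in C(l)$ • [fun] $(C, \rho) \models (\mathtt{fun}\ f\ x \Rightarrow e)^{l}$ if: $(C, \rho) \models e$ and $\mathtt{fun}\ f\ x \Rightarrow e \in C(l)$ and $\mathtt{fun}\ f\ x \Rightarrow e \in \rho(f)$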

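  18. Syntax Directed 0-CFA (Function Application) • [app] $(C, \rho) \models (t_1^{l_1}\ t_2^{l_2})^{l}$ if: • $(C, \rho) \models t_1^{l_1}$ • $(C, \rho) \models t_2^{l_2}$ • for all $\mathtt{fn}\ x \Rightarrow t_0^{l_0} \in C(l_1)$: $C(l_2) \subseteq \rho(x)$ and $C(l_0) \subseteq C(l)$ • for all $\mathtt{fun}\ f\ x \Rightarrow t_0^{l_0} \in C(l_1)$: $C(l_2) \subseteq \rho(x)$ and $C(l_0) \subseteq C(l)$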

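  19. Syntax Directed 0-CFA (Other Constructs) • [if] $(C, \rho) \models (\mathtt{if}\ t_0^{l_0}\ \mathtt{then}\ t_1^{l_1}\ \mathtt{else}\ t_2^{l_2})^{l}$ if: $(C, \rho) \models t_0^{l_0}$, $(C, \rho) \models t_1^{l_1}$, $(C, \rho) \models t_2^{l_2}$, $C(l_1) \subseteq C(l)$ and $C(l_2) \subseteq C(l)$ • [let] $(C, \rho) \models (\mathtt{let}\ x = t_1^{l_1}\ \mathtt{in}\ t_2^{l_2})^{l}$ if: $(C, \rho) \models t_1^{l_1}$, $(C, \rho) \models t_2^{l_2}$, $C(l_1) \subseteq \rho(x)$ and $C(l_2) \subseteq C(l)$ • [op] $(C, \rho) \models (t_1^{l_1}\ op\ t_2^{l_2})^{l}$ if: $(C, \rho) \models t_1^{l_1}$ and $(C, \rho) \models t_2^{l_2}$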
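
The acceptability rules can be read as a recursive checker. The following Python sketch is one such reading, under assumptions that are not in the slides: terms are nested tuples such as `("fn", x, body)`, a labeled expression is a pair `(term, label)`, and abstract values contain only function terms. The names `acceptable` and `fn_parts` are hypothetical.

```python
def fn_parts(t):
    """Return (parameter, body) for a fn or fun term."""
    return (t[1], t[2]) if t[0] == "fn" else (t[2], t[3])

def acceptable(C, rho, e):
    """Check whether (C, rho) is acceptable for e, by induction on e."""
    term, l = e
    kind = term[0]
    if kind == "const":
        return True
    if kind == "var":
        return rho[term[1]] <= C[l]
    if kind == "fn":
        return acceptable(C, rho, term[2]) and term in C[l]
    if kind == "fun":
        return (acceptable(C, rho, term[3]) and term in C[l]
                and term in rho[term[1]])
    if kind == "app":
        e1, e2 = term[1], term[2]
        if not (acceptable(C, rho, e1) and acceptable(C, rho, e2)):
            return False
        for t in C[e1[1]]:          # every function that may be applied
            x, body = fn_parts(t)
            if not (C[e2[1]] <= rho[x] and C[body[1]] <= C[l]):
                return False
        return True
    if kind == "if":
        e0, e1, e2 = term[1:]
        return (all(acceptable(C, rho, s) for s in (e0, e1, e2))
                and C[e1[1]] <= C[l] and C[e2[1]] <= C[l])
    if kind == "let":
        x, e1, e2 = term[1:]
        return (acceptable(C, rho, e1) and acceptable(C, rho, e2)
                and C[e1[1]] <= rho[x] and C[e2[1]] <= C[l])
    if kind == "op":                # ("op", symbol, e1, e2)
        return acceptable(C, rho, term[2]) and acceptable(C, rho, term[3])
    raise ValueError(kind)

# The simple example: identity on x (label 2) applied to identity on y (label 4).
idx = ("fn", "x", (("var", "x"), 1))
idy = ("fn", "y", (("var", "y"), 3))
simple = (("app", (idx, 2), (idy, 4)), 5)

least = ({1: {idy}, 2: {idx}, 3: set(), 4: {idy}, 5: {idy}},
         {"x": {idy}, "y": set()})
too_small = ({1: {idy}, 2: {idx}, 3: set(), 4: {idy}, 5: set()},
             {"x": {idy}, "y": set()})
```

The pair `least` passes the check, while `too_small` fails it because the result of the application at label 5 omits the function that the body of the applied function may return.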

  20. Possible Solutions for $((\mathtt{fn}\ x \Rightarrow x^{1})^{2}\ (\mathtt{fn}\ y \Rightarrow y^{3})^{4})^{5}$

  21. Set Constraints • A set of rules of the form: • $lhs \subseteq rhs$ • $\{t\} \subseteq rhs' \Rightarrow lhs \subseteq rhs$ (conditional constraint) • $lhs$, $rhs$, $rhs'$ are • terms • $C(l)$ • $\rho(x)$ • The least solution $(C, \rho)$ can be found iteratively • start with empty sets • add terms when needed • Efficient cubic graph-based solution
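
The iterative scheme on this slide (start with empty sets, add terms when needed) can be sketched as a naive fixed-point loop in Python. This is not the cubic graph-based algorithm. The tuple encoding of constraints and the names `solve`, `idx` and `idy` are assumptions of this sketch, and the constraint list is written out by hand for the simple example in which the identity on x (label 2) is applied to the identity on y (label 4).

```python
def solve(constraints):
    """Least solution of a list of set constraints by naive iteration.

    A node is ("C", label) or ("r", variable); ("lit", t) stands for {t}.
    ("sub", lhs, rhs)             encodes  lhs is a subset of rhs
    ("cond", t, guard, lhs, rhs)  encodes  t in guard  implies  lhs subset rhs
    """
    sol = {}

    def val(node):
        if node[0] == "lit":
            return {node[1]}
        return sol.setdefault(node, set())   # start with empty sets

    changed = True
    while changed:
        changed = False
        for c in constraints:
            if c[0] == "cond":
                _, t, guard, lhs, rhs = c
                if t not in val(guard):
                    continue                 # premise not satisfied so far
            else:
                _, lhs, rhs = c
            missing = val(lhs) - val(rhs)
            if missing:
                sol[rhs] |= missing          # add terms when needed
                changed = True
    return sol

constraints = [
    ("sub", ("lit", "idx"), ("C", 2)),
    ("sub", ("r", "x"), ("C", 1)),
    ("sub", ("lit", "idy"), ("C", 4)),
    ("sub", ("r", "y"), ("C", 3)),
    ("cond", "idx", ("C", 2), ("C", 4), ("r", "x")),
    ("cond", "idx", ("C", 2), ("C", 1), ("C", 5)),
    ("cond", "idy", ("C", 2), ("C", 4), ("r", "y")),
    ("cond", "idy", ("C", 2), ("C", 3), ("C", 5)),
]
solution = solve(constraints)
```

Each pass rescans every constraint, so the loop is simple but slower than a worklist over a constraint graph; it terminates because sets only grow and the universe of terms is finite.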

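  22. Syntax Directed Constraint Generation (Part I) • $\mathcal{C}_*[\![c^{l}]\!] = \emptyset$ • $\mathcal{C}_*[\![x^{l}]\!] = \{\rho(x) \subseteq C(l)\}$ • $\mathcal{C}_*[\![(\mathtt{fn}\ x \Rightarrow e)^{l}]\!] = \mathcal{C}_*[\![e]\!] \cup \{\{\mathtt{fn}\ x \Rightarrow e\} \subseteq C(l)\}$ • $\mathcal{C}_*[\![(\mathtt{fun}\ f\ x \Rightarrow e)^{l}]\!] = \mathcal{C}_*[\![e]\!] \cup \{\{\mathtt{fun}\ f\ x \Rightarrow e\} \subseteq C(l)\} \cup \{\{\mathtt{fun}\ f\ x \Rightarrow e\} \subseteq \rho(f)\}$ • $\mathcal{C}_*[\![(t_1^{l_1}\ t_2^{l_2})^{l}]\!] = \mathcal{C}_*[\![t_1^{l_1}]\!] \cup \mathcal{C}_*[\![t_2^{l_2}]\!] \cup \{\{t\} \subseteq C(l_1) \Rightarrow C(l_2) \subseteq \rho(x) \mid t = \mathtt{fn}\ x \Rightarrow t_0^{l_0} \in \mathrm{Term}_*\} \cup \{\{t\} \subseteq C(l_1) \Rightarrow C(l_0) \subseteq C(l) \mid t = \mathtt{fn}\ x \Rightarrow t_0^{l_0} \in \mathrm{Term}_*\} \cup \{\{t\} \subseteq C(l_1) \Rightarrow C(l_2) \subseteq \rho(x) \mid t = \mathtt{fun}\ f\ x \Rightarrow t_0^{l_0} \in \mathrm{Term}_*\} \cup \{\{t\} \subseteq C(l_1) \Rightarrow C(l_0) \subseteq C(l) \mid t = \mathtt{fun}\ f\ x \Rightarrow t_0^{l_0} \in \mathrm{Term}_*\}$ • Here $\mathrm{Term}_*$ is the set of terms occurring in the program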

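  23. Syntax Directed Constraint Generation (Part II) • $\mathcal{C}_*[\![(\mathtt{if}\ t_0^{l_0}\ \mathtt{then}\ t_1^{l_1}\ \mathtt{else}\ t_2^{l_2})^{l}]\!] = \mathcal{C}_*[\![t_0^{l_0}]\!] \cup \mathcal{C}_*[\![t_1^{l_1}]\!] \cup \mathcal{C}_*[\![t_2^{l_2}]\!] \cup \{C(l_1) \subseteq C(l)\} \cup \{C(l_2) \subseteq C(l)\}$ • $\mathcal{C}_*[\![(\mathtt{let}\ x = t_1^{l_1}\ \mathtt{in}\ t_2^{l_2})^{l}]\!] = \mathcal{C}_*[\![t_1^{l_1}]\!] \cup \mathcal{C}_*[\![t_2^{l_2}]\!] \cup \{C(l_1) \subseteq \rho(x)\} \cup \{C(l_2) \subseteq C(l)\}$ • $\mathcal{C}_*[\![(t_1^{l_1}\ op\ t_2^{l_2})^{l}]\!] = \mathcal{C}_*[\![t_1^{l_1}]\!] \cup \mathcal{C}_*[\![t_2^{l_2}]\!]$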
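
A sketch of syntax-directed constraint generation in Python, covering both parts of the definition. Everything about the encoding is an assumption of this sketch: terms are nested tuples such as `("fn", x, body)`, a labeled expression is a pair `(term, label)`, a plain constraint is `("sub", lhs, rhs)`, and a conditional one is `("cond", t, guard, lhs, rhs)`, with `("C", l)` and `("r", x)` naming the cache and environment entries and `("lit", t)` standing for the singleton set containing t.

```python
# Positions of the labeled sub-expressions inside each kind of term.
CHILDREN = {"const": (), "var": (), "fn": (2,), "fun": (3,),
            "app": (1, 2), "if": (1, 2, 3), "let": (2, 3), "op": (2, 3)}

def children(term):
    return [term[i] for i in CHILDREN[term[0]]]

def functions(e):
    """All fn and fun terms occurring in e (the function part of Term*)."""
    term, _ = e
    found = {term} if term[0] in ("fn", "fun") else set()
    for child in children(term):
        found |= functions(child)
    return found

def gen(e, funs):
    """Constraints for e, given the function terms of the whole program."""
    term, l = e
    kind = term[0]
    cs = set()
    for child in children(term):
        cs |= gen(child, funs)
    if kind == "var":
        cs.add(("sub", ("r", term[1]), ("C", l)))
    elif kind == "fn":
        cs.add(("sub", ("lit", term), ("C", l)))
    elif kind == "fun":
        cs.add(("sub", ("lit", term), ("C", l)))
        cs.add(("sub", ("lit", term), ("r", term[1])))
    elif kind == "app":
        l1, l2 = term[1][1], term[2][1]
        for t in funs:
            x, body = (t[1], t[2]) if t[0] == "fn" else (t[2], t[3])
            cs.add(("cond", t, ("C", l1), ("C", l2), ("r", x)))
            cs.add(("cond", t, ("C", l1), ("C", body[1]), ("C", l)))
    elif kind == "if":
        cs.add(("sub", ("C", term[2][1]), ("C", l)))
        cs.add(("sub", ("C", term[3][1]), ("C", l)))
    elif kind == "let":
        cs.add(("sub", ("C", term[2][1]), ("r", term[1])))
        cs.add(("sub", ("C", term[3][1]), ("C", l)))
    return cs                        # const and op add nothing of their own

# The simple example: identity on x (label 2) applied to identity on y (label 4).
idx = ("fn", "x", (("var", "x"), 1))
idy = ("fn", "y", (("var", "y"), 3))
simple = (("app", (idx, 2), (idy, 4)), 5)
constraints = gen(simple, functions(simple))
```

For this example the generator yields eight constraints: one per variable occurrence, one per function abstraction, and two conditional constraints for each function that might be applied at label 5.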

  24. Set Constraints for $((\mathtt{fn}\ x \Rightarrow x^{1})^{2}\ (\mathtt{fn}\ y \Rightarrow y^{3})^{4})^{5}$

  25. Iterative Solution to the Set Constraints for $((\mathtt{fn}\ x \Rightarrow x^{1})^{2}\ (\mathtt{fn}\ y \Rightarrow y^{3})^{4})^{5}$

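  26. Adding Data Flow Information • Dataflow values can affect control flow analysis • Example: $(\mathtt{let}\ f = (\mathtt{fn}\ x \Rightarrow (\mathtt{if}\ (x^{1} > 0^{2})^{3}\ \mathtt{then}\ (\mathtt{fn}\ y \Rightarrow y^{4})^{5}\ \mathtt{else}\ (\mathtt{fn}\ z \Rightarrow 5^{6})^{7})^{8})^{9}\ \mathtt{in}\ ((f^{10}\ 3^{11})^{12}\ 0^{13})^{14})^{15}$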

  27. Adding Data Flow Information • Add a finite set of “abstract” values per program: $\mathrm{Data}$ • Update $\mathrm{Val} = \mathcal{P}(\mathrm{Term} \cup \mathrm{Data})$ • $\rho \in \mathrm{Env} = \mathrm{Var} \to \mathrm{Val}$ // abstract environment • $C \in \mathrm{Cache} = \mathrm{Lab} \to \mathrm{Val}$ // abstract cache • Generate extra constraints for data • Obtain a more precise solution • A special case of a product domain (4.4) • The combination of two analyses may be more precise than both • For some programs it may even be more efficient

  28. Adding Dataflow Information (Sign Analysis) • Sign analysis • Add a finite set of “abstract” values per program: $\mathrm{Data} = \{P, N, TT, FF\}$ • Update $\mathrm{Val} = \mathcal{P}(\mathrm{Term} \cup \mathrm{Data})$ • $d_c$ is the abstract value that represents a constant $c$ • $d_{3} = \{P\}$ • $d_{-7} = \{N\}$ • $d_{true} = \{TT\}$ • $d_{false} = \{FF\}$ • Every operator is conservatively interpreted
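
A small Python sketch of what a conservative interpretation of operators over signs might look like. The slide lists only positive, negative and the two truth values; the extra element `"Z"` for zero is an assumption added here so that constants such as 0 have an abstraction. The function names `d`, `plus1`, `gt1` and `lift` are likewise hypothetical.

```python
NUM = {"P", "Z", "N"}     # "Z" (zero) is an assumption of this sketch

def d(c):
    """Abstract value representing the constant c."""
    if c is True:
        return {"TT"}
    if c is False:
        return {"FF"}
    if c > 0:
        return {"P"}
    if c < 0:
        return {"N"}
    return {"Z"}

def plus1(s, t):
    """Possible signs of a sum whose operands have signs s and t."""
    if s == "Z":
        return {t}
    if t == "Z" or s == t:
        return {s}
    return set(NUM)       # positive plus negative: any sign is possible

def gt1(s, t):
    """Possible truth values of a > b when a has sign s and b has sign t."""
    if s == t and s != "Z":
        return {"TT", "FF"}          # e.g. 3 > 2 holds but 2 > 3 does not
    order = {"N": 0, "Z": 1, "P": 2}
    return {"TT"} if order[s] > order[t] else {"FF"}

def lift(op1, a, b):
    """Apply op1 to every pair of signs drawn from the abstract values a, b."""
    out = set()
    for s in a & NUM:
        for t in b & NUM:
            out |= op1(s, t)
    return out
```

With these definitions, comparing the abstraction of 3 against the abstraction of 0 yields only `"TT"`, which is the kind of fact that lets a combined analysis rule out one branch of a conditional.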

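  29. Syntax Directed Constraint Generation (Part I) • $\mathcal{C}_*[\![c^{l}]\!] = \{d_c \subseteq C(l)\}$ • $\mathcal{C}_*[\![x^{l}]\!] = \{\rho(x) \subseteq C(l)\}$ • $\mathcal{C}_*[\![(\mathtt{fn}\ x \Rightarrow e)^{l}]\!] = \mathcal{C}_*[\![e]\!] \cup \{\{\mathtt{fn}\ x \Rightarrow e\} \subseteq C(l)\}$ • $\mathcal{C}_*[\![(\mathtt{fun}\ f\ x \Rightarrow e)^{l}]\!] = \mathcal{C}_*[\![e]\!] \cup \{\{\mathtt{fun}\ f\ x \Rightarrow e\} \subseteq C(l)\} \cup \{\{\mathtt{fun}\ f\ x \Rightarrow e\} \subseteq \rho(f)\}$ • $\mathcal{C}_*[\![(t_1^{l_1}\ t_2^{l_2})^{l}]\!] = \mathcal{C}_*[\![t_1^{l_1}]\!] \cup \mathcal{C}_*[\![t_2^{l_2}]\!] \cup \{\{t\} \subseteq C(l_1) \Rightarrow C(l_2) \subseteq \rho(x) \mid t = \mathtt{fn}\ x \Rightarrow t_0^{l_0} \in \mathrm{Term}_*\} \cup \{\{t\} \subseteq C(l_1) \Rightarrow C(l_0) \subseteq C(l) \mid t = \mathtt{fn}\ x \Rightarrow t_0^{l_0} \in \mathrm{Term}_*\} \cup \{\{t\} \subseteq C(l_1) \Rightarrow C(l_2) \subseteq \rho(x) \mid t = \mathtt{fun}\ f\ x \Rightarrow t_0^{l_0} \in \mathrm{Term}_*\} \cup \{\{t\} \subseteq C(l_1) \Rightarrow C(l_0) \subseteq C(l) \mid t = \mathtt{fun}\ f\ x \Rightarrow t_0^{l_0} \in \mathrm{Term}_*\}$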

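  30. Syntax Directed Constraint Generation (Part II) • $\mathcal{C}_*[\![(\mathtt{if}\ t_0^{l_0}\ \mathtt{then}\ t_1^{l_1}\ \mathtt{else}\ t_2^{l_2})^{l}]\!] = \mathcal{C}_*[\![t_0^{l_0}]\!] \cup \mathcal{C}_*[\![t_1^{l_1}]\!] \cup \mathcal{C}_*[\![t_2^{l_2}]\!] \cup \{d_{true} \subseteq C(l_0) \Rightarrow C(l_1) \subseteq C(l)\} \cup \{d_{false} \subseteq C(l_0) \Rightarrow C(l_2) \subseteq C(l)\}$ • $\mathcal{C}_*[\![(\mathtt{let}\ x = t_1^{l_1}\ \mathtt{in}\ t_2^{l_2})^{l}]\!] = \mathcal{C}_*[\![t_1^{l_1}]\!] \cup \mathcal{C}_*[\![t_2^{l_2}]\!] \cup \{C(l_1) \subseteq \rho(x)\} \cup \{C(l_2) \subseteq C(l)\}$ • $\mathcal{C}_*[\![(t_1^{l_1}\ op\ t_2^{l_2})^{l}]\!] = \mathcal{C}_*[\![t_1^{l_1}]\!] \cup \mathcal{C}_*[\![t_2^{l_2}]\!] \cup \{C(l_1)\ \widehat{op}\ C(l_2) \subseteq C(l)\}$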

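  31. Adding Context Information • The analysis does not distinguish between different occurrences of a variable (monovariant analysis) • Example: $(\mathtt{let}\ f = (\mathtt{fn}\ x \Rightarrow x^{1})^{2}\ \mathtt{in}\ ((f^{3}\ f^{4})^{5}\ (\mathtt{fn}\ y \Rightarrow y^{6})^{7})^{8})^{9}$ • Source-to-source transformation can help (but may lead to code explosion) • Example rewritten: $\mathtt{let}\ f_1 = \mathtt{fn}\ x_1 \Rightarrow x_1\ \mathtt{in}\ \mathtt{let}\ f_2 = \mathtt{fn}\ x_2 \Rightarrow x_2\ \mathtt{in}\ (f_1\ f_2)\ (\mathtt{fn}\ y \Rightarrow y)$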

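  32. Simplified K-CFA • Records the last k dynamic calls (for some fixed k) • Similar to the call string approach • Remembers the context in which an expression is evaluated • $\mathrm{Val}$ is now $\mathcal{P}(\mathrm{Term}) \times \mathrm{Contexts}$ • $\rho \in \mathrm{Env} = \mathrm{Var} \times \mathrm{Contexts} \to \mathrm{Val}$ • $C \in \mathrm{Cache} = \mathrm{Lab} \times \mathrm{Contexts} \to \mathrm{Val}$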

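  33. 1-CFA • $(\mathtt{let}\ f = (\mathtt{fn}\ x \Rightarrow x^{1})^{2}\ \mathtt{in}\ ((f^{3}\ f^{4})^{5}\ (\mathtt{fn}\ y \Rightarrow y^{6})^{7})^{8})^{9}$ • Contexts • $[\,]$: the empty context • $[5]$: the application at label 5 • $[8]$: the application at label 8 • Polyvariant Control Flow • $C(1, [5]) = \rho(x, [5]) = C(2, [\,]) = C(3, [\,]) = \rho(f, [\,]) = (\{\mathtt{fn}\ x \Rightarrow x^{1}\}, [\,])$ • $C(1, [8]) = \rho(x, [8]) = C(7, [\,]) = C(8, [\,]) = C(9, [\,]) = (\{\mathtt{fn}\ y \Rightarrow y^{6}\}, [\,])$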
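
The effect of contexts on the cache can be illustrated with a few lines of Python. This is only a sketch: `push`, `record` and `analyse_calls` are hypothetical names, abstract values are simplified to plain sets of strings, and the two facts recorded for label 1 are copied by hand from the slide rather than computed by an analysis.

```python
def push(context, label, k):
    """Extend a call-string context with label, keeping the last k entries."""
    if k == 0:
        return ()
    return (context + (label,))[-k:]

def record(cache, label, context, values):
    """Add values to the cache entry for (label, context)."""
    cache.setdefault((label, context), set()).update(values)

def analyse_calls(k):
    """Record what reaches label 1 (the body x) under the two calls."""
    cache = {}
    record(cache, 1, push((), 5, k), {"fn x"})   # call at label 5 binds x to fn x
    record(cache, 1, push((), 8, k), {"fn y"})   # call at label 8 binds x to fn y
    return cache

polyvariant = analyse_calls(1)    # separate entries for contexts (5,) and (8,)
monovariant = analyse_calls(0)    # one merged entry for the empty context
```

With k = 1 the two calls keep separate entries for label 1, while with k = 0 they collapse into a single entry holding both functions, which is the loss of precision that contexts are meant to avoid.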

  34. The Motivating Example
      class Vehicle extends Object {
        int position = 10;
        void move(x1 : int) { position = position + x1; }
      }
      class Car extends Vehicle {
        int passengers;
        void await(v : Vehicle) {
          if (v.position < position) then v.move(position - v.position);
          else self.move(10);
        }
      }
      class Truck extends Vehicle {
        void move(x2 : int) { if (x2 < 55) position = position + x2; }
      }
      void main {
        Car c; Truck t; Vehicle v1;
        new c; new t;
        v1 := c;
        c.passengers := 2;
        c.move(60); v1.move(70); c.await(t);
      }
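
To make the dynamic dispatch question concrete, the sketch below resolves the `move` and `await` calls of the example in Python. It is an illustration under stated assumptions: the tables `PARENT` and `METHODS` transcribe the class hierarchy by hand, and `may_hold` lists the classes that may reach each variable, also derived by hand from `main` rather than computed by a constraint solver. All names are hypothetical.

```python
PARENT = {"Vehicle": None, "Car": "Vehicle", "Truck": "Vehicle"}
METHODS = {"Vehicle": {"move"}, "Car": {"await"}, "Truck": {"move"}}

def lookup(cls, method):
    """Find the class whose definition of method an object of cls runs."""
    while cls is not None:
        if method in METHODS[cls]:
            return cls + "." + method
        cls = PARENT[cls]
    raise LookupError(method)

# Classes that may reach each variable, derived by hand from main:
# new c, new t, v1 := c, and the call c.await(t), which binds the
# receiver self and the parameter v inside Car.await.
may_hold = {"c": {"Car"}, "t": {"Truck"}, "v1": {"Car"},
            "self": {"Car"}, "v": {"Truck"}}

def targets(receiver, method):
    """Methods that a call receiver.method(...) may invoke."""
    return {lookup(cls, method) for cls in may_hold[receiver]}
```

Under these assumptions `c.move`, `v1.move` and `self.move` can only reach `Vehicle.move`, while `v.move` inside `await` can only reach `Truck.move`, so each call site has a single possible target.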

  35. Missing Material • Efficient Cubic Solution to Set Constraints www.cs.berkeley.edu/Research/Aiken/bane.html • Experimental results for OO www.cs.washington.edu/research/projects/cecil • Operational Semantics for FUN (3.2.1) • Defining acceptability without structural induction • More precise treatment of termination (3.2.2) • Needs Co-Induction (greatest fixed point) • Using general lattices as Dataflow values instead of powersets (3.5.2) • Lower-bounds • Decidability of JOP • Polynomiality

  36. Conclusions • Set constraints are quite useful • A Uniform syntax • Can even deal with pointers • But semantic foundation is still based on abstract interpretation • Techniques used in functional and imperative (OO) programming are similar • Control and data flow analysis are related

