This lecture, part of the Spring 2014 course Program Analysis and Verification, covers abstract interpretation techniques essential for static analysis. It first reviews semantic domains, posets, complete lattices, join and meet operators, and the construction of new lattices from existing ones (Cartesian products, disjunctive completions, relational products, and finite maps). It then turns to solving monotone equation systems: fixed points, Tarski's and Kleene's fixed point theorems, and the chaotic iteration algorithm. A software package built on the Soot framework provides the infrastructure for implementing the analyses.
Spring 2014 Program Analysis and Verification. Lecture 10: Abstract Interpretation II. Roman Manevich, Ben-Gurion University
Previously • Semantic domains • Preorders • Partial orders (posets) • Pointed posets • Ascending/descending chains • The height of a poset • Join and Meet operators • Complete lattices • Constructing new lattices from old • Abstract Interpretation package – domains
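A taxonomy of semantic domain types • Complete lattice $(D, \sqsubseteq, \sqcup, \sqcap, \bot, \top)$: join/meet exist for every subset of $D$ • Lattice $(D, \sqsubseteq, \sqcup, \sqcap, \bot, \top)$: join/meet exist for every finite subset of $D$ (alternatively, binary join/meet); $\top$ is the meet of the empty set and $\bot$ is the join of the empty set • Join semilattice $(D, \sqsubseteq, \sqcup, \bot)$ and meet semilattice $(D, \sqsubseteq, \sqcap, \top)$ • Complete partial order (CPO) $(D, \sqsubseteq, \bot)$: poset with LUB for all ascending chains • Partial order (poset) $(D, \sqsubseteq)$: reflexive, transitive, and anti-symmetric: $d \sqsubseteq d'$ and $d' \sqsubseteq d$ implies $d = d'$ • Preorder $(D, \sqsubseteq)$: reflexive: $d \sqsubseteq d$; transitive: $d \sqsubseteq d'$ and $d' \sqsubseteq d''$ implies $d \sqsubseteq d''$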
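Cartesian product of complete lattices • For two complete lattices $L_1 = (D_1, \sqsubseteq_1, \sqcup_1, \sqcap_1, \bot_1, \top_1)$ and $L_2 = (D_2, \sqsubseteq_2, \sqcup_2, \sqcap_2, \bot_2, \top_2)$ • Define the poset $L_{cart} = (D_1 \times D_2, \sqsubseteq_{cart}, \sqcup_{cart}, \sqcap_{cart}, \bot_{cart}, \top_{cart})$ as follows: • $(x_1, x_2) \sqsubseteq_{cart} (y_1, y_2)$ iff $x_1 \sqsubseteq_1 y_1$ and $x_2 \sqsubseteq_2 y_2$ • $\sqcup_{cart} = ?$, $\sqcap_{cart} = ?$, $\bot_{cart} = ?$, $\top_{cart} = ?$ • Lemma: $L_{cart}$ is a complete lattice • Define the Cartesian constructor $L_{cart} = Cart(L_1, L_2)$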
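The open questions on this slide have a standard componentwise answer. The following Python sketch illustrates it for binary join and meet only; the names `Lattice`, `cart`, and `powerset` are hypothetical helpers introduced for this example and are not part of the course package.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Lattice:
    """A lattice given by its order, binary join/meet, bottom and top."""
    leq: Callable[[Any, Any], bool]
    join: Callable[[Any, Any], Any]
    meet: Callable[[Any, Any], Any]
    bottom: Any
    top: Any


def powerset(universe: frozenset) -> Lattice:
    """Powerset lattice of a finite universe, ordered by inclusion."""
    return Lattice(
        leq=lambda a, b: a <= b,
        join=lambda a, b: a | b,
        meet=lambda a, b: a & b,
        bottom=frozenset(),
        top=universe,
    )


def cart(l1: Lattice, l2: Lattice) -> Lattice:
    """Cartesian product: every operation is applied componentwise."""
    return Lattice(
        leq=lambda x, y: l1.leq(x[0], y[0]) and l2.leq(x[1], y[1]),
        join=lambda x, y: (l1.join(x[0], y[0]), l2.join(x[1], y[1])),
        meet=lambda x, y: (l1.meet(x[0], y[0]), l2.meet(x[1], y[1])),
        bottom=(l1.bottom, l2.bottom),
        top=(l1.top, l2.top),
    )


# Two small example lattices (assumed for illustration only).
signs = powerset(frozenset({"neg", "zero", "pos"}))
parity = powerset(frozenset({"even", "odd"}))
prod = cart(signs, parity)

x = (frozenset({"pos"}), frozenset({"even"}))
y = (frozenset({"zero"}), frozenset({"even", "odd"}))
joined = prod.join(x, y)
met = prod.meet(x, y)
```

Joins and meets of arbitrary subsets follow the same componentwise pattern, which is the idea behind the lemma.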
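Disjunctive completion • For a complete lattice $L = (D, \sqsubseteq, \sqcup, \sqcap, \bot, \top)$ • Define the powerset lattice $L_\vee = (2^D, \sqsubseteq_\vee, \sqcup_\vee, \sqcap_\vee, \bot_\vee, \top_\vee)$ • $\sqsubseteq_\vee = ?$, $\sqcup_\vee = ?$, $\sqcap_\vee = ?$, $\bot_\vee = ?$, $\top_\vee = ?$ • Lemma: $L_\vee$ is a complete lattice • $L_\vee$ contains all subsets of $D$, which can be thought of as disjunctions of the corresponding predicates • Define the disjunctive completion constructor $L_\vee = Disj(L)$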
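Relational product of lattices • $L_1 = (D_1, \sqsubseteq_1, \sqcup_1, \sqcap_1, \bot_1, \top_1)$ and $L_2 = (D_2, \sqsubseteq_2, \sqcup_2, \sqcap_2, \bot_2, \top_2)$ • $L_{rel} = (2^{D_1 \times D_2}, \sqsubseteq_{rel}, \sqcup_{rel}, \sqcap_{rel}, \bot_{rel}, \top_{rel})$ as follows: • $L_{rel} = Disj(Cart(L_1, L_2))$ • Lemma: $L_{rel}$ is a complete lattice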
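Finite maps • For a complete lattice $L = (D, \sqsubseteq, \sqcup, \sqcap, \bot, \top)$ and finite set $V$ • Define the poset $L_{V \to L} = (V \to D, \sqsubseteq_{V \to L}, \sqcup_{V \to L}, \sqcap_{V \to L}, \bot_{V \to L}, \top_{V \to L})$ as follows: • $f_1 \sqsubseteq_{V \to L} f_2$ iff for all $v \in V$, $f_1(v) \sqsubseteq f_2(v)$ • $\sqcup_{V \to L} = ?$, $\sqcap_{V \to L} = ?$, $\bot_{V \to L} = ?$, $\top_{V \to L} = ?$ • Lemma: $L_{V \to L}$ is a complete lattice • Define the map constructor $L_{V \to L} = Map(V, L)$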
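As with the Cartesian product, the usual way to fill in the missing operations is pointwise. The sketch below assumes that the underlying lattice is a powerset of states (sets of integers here) and represents a map as a Python dict; `map_leq`, `map_join`, and `map_bottom` are hypothetical names used only for this illustration.

```python
def map_leq(f1, f2):
    """f1 is below f2 iff f1(v) is below f2(v) for every key v."""
    return all(f1[v] <= f2[v] for v in f1)


def map_join(f1, f2):
    """Pointwise join; both maps are assumed to share the same keys."""
    return {v: f1[v] | f2[v] for v in f1}


def map_bottom(nodes):
    """The least map sends every key to the bottom element (empty set)."""
    return {v: frozenset() for v in nodes}


NODES = ["n0", "n1"]  # assumed finite set V
f1 = {"n0": frozenset({1}), "n1": frozenset()}
f2 = {"n0": frozenset({2}), "n1": frozenset({3})}
joined = map_join(f1, f2)
```

With `NODES` read as the nodes of a control-flow graph, `joined` is an element of a lattice shaped like Map(V, L).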
The collecting lattice • Lattice for a given control-flow node $v$: $L_v = (2^{State}, \subseteq, \cup, \cap, \emptyset, State)$ • Lattice for entire control-flow graph with nodes $V$: $L_{CFG} = Map(V, L_v)$ • We will use this lattice as a baseline for static analysis and define abstractions of its elements
Software package: paver142 • Built on top of the Soot compiler framework for Java • Download from web-site • Includes all necessary Soot jar files
[Figure: structure of the package] Example analyses • Soot-specific utilities • Infrastructure for implementing static analysis
Today • Solving monotone systems • Fixed-points • Vanilla static analysis algorithm • Chaotic iteration
Abstract interpretation via abstraction • Generalizes axiomatic verification • [Diagram: the collecting semantics of statement S maps a set of states to a set of states; abstraction maps each set of states to an abstract representation of sets of states, and the abstract semantics of S maps abstract representations to abstract representations. Axiomatic analogue: {P} S {Q} and sp(S, P)]
Abstract interpretation via concretization • [Diagram: the abstract semantics of statement S maps an abstract representation of sets of states to another abstract representation; concretization maps each abstract representation to a set of states, and the collecting semantics of S maps sets of states to sets of states. Axiomatic analogue: for {P} S {Q}, the sets models(P), models(sp(S, P)), and models(Q)]
Missing knowledge • Collecting semantics • Abstract semantics • Connection between collecting semantics and abstract semantics • Algorithm to compute abstract semantics
The collecting lattice (sets of states) • Lattice for a given control-flow node $v$: $L_v = (2^{State}, \subseteq, \cup, \cap, \emptyset, State)$ • Lattice for entire control-flow graph with nodes $V$: $L_{CFG} = Map(V, L_v)$ • We will use this lattice as a baseline for static analysis and define abstractions of its elements
Collecting semantics as equation system • [Figure: CFG with nodes entry, if x > 0, x := x-1, and exit; its edges are labeled R[0] to R[4]] • A vector of variables R[0, 1, 2, 3, 4] • $R[0] = \{x \in \mathbb{Z}\}$ // established input • $R[1] = R[0] \cup R[4]$ • $R[2] = R[1] \cap \{s \mid s(x) > 0\}$ (semantic function for assume x>0) • $R[3] = R[1] \cap \{s \mid s(x) \le 0\}$ • $R[4] = [\![x := x-1]\!]\, R[2]$ (semantic function for x:=x-1 lifted to sets of states) • A (recursive) system of equations
General definition • [Figure: CFG with nodes entry, if x > 0, x := x-1, and exit; its edges are labeled R[0] to R[4]] • A vector of variables R[0, …, k], one per input/output of a node • R[0] is for entry • For node n with multiple predecessors add equation $R[n] = \bigcup \{R[k] \mid k \text{ is a predecessor of } n\}$ • For an atomic operation node with input R[m], statement S, and output R[n], add equation $R[n] = [\![S]\!]\, R[m]$ • Transform if b then S1 else S2 to (assume b; S1) or (assume $\neg$b; S2)
Static analysis • $R[0] = \{x \in \mathbb{Z}\}$ // established input • $R[1] = R[0] \cup R[4]$ • $R[2] = [\![assume\ x > 0]\!]\, R[1]$ • $R[3] = [\![assume\ x \le 0]\!]\, R[1]$ • $R[4] = [\![x := x-1]\!]\, R[2]$ • $R[0]^\# = \{x \in \mathbb{Z}\}^\#$ • $R[1]^\# = R[0]^\# \sqcup R[4]^\#$ • $R[2]^\# = [\![assume\ x > 0]\!]^\#\, R[1]^\#$ • $R[3]^\# = [\![assume\ x \le 0]\!]^\#\, R[1]^\#$ • $R[4]^\# = [\![x := x-1]\!]^\#\, R[2]^\#$ • Given a system of equations for the collecting semantics, a static analysis solves a corresponding system of equations over an abstract domain • Questions: • What is the relation between the solutions? (Next lecture) • How do you solve the second system? (This lecture)
Equation systems in general • Let L be a complete lattice $(D, \sqsubseteq, \sqcup, \sqcap, \bot, \top)$ • Let R be a vector of analysis variables $R[0, …, n] \in D \times \dots \times D$ • Let F be a vector of functions $f[0], …, f[n]$, each of type $f[i] : D \times \dots \times D \to D$ • A system of equations: $R[0] = f[0](R[0], …, R[n])$, …, $R[n] = f[n](R[0], …, R[n])$ • In vector notation $R = F(R)$ • For $R[i] = f[i](R)$, usually f[i] reads only a small subset D[i] of R; we say that R[i] depends on D[i] • Example: • $R[0] = \{x \in \mathbb{Z}\}$ // established input • $R[1] = R[0] \cup R[4]$ • $R[2] = R[1] \cap \{s \mid s(x) > 0\}$ • $R[3] = R[1] \cap \{s \mid s(x) \le 0\}$ • $R[4] = [\![x := x-1]\!]\, R[2]$ • Questions: • Does a solution always exist? • If so, is it unique? • If so, is it computable?
Equation systems in general • A solution of the system $R = F(R)$, if one exists, is a fixed point of $F$
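Monotone functions • Let $L_1 = (D_1, \sqsubseteq_1)$ and $L_2 = (D_2, \sqsubseteq_2)$ be two posets • A function $f : D_1 \to D_2$ is monotone if for every pair $x, y \in D_1$, $x \sqsubseteq_1 y$ implies $f(x) \sqsubseteq_2 f(y)$ • A special case: $L_1 = L_2 = (D, \sqsubseteq)$ and $f : D \to D$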
Monotone function • [Figure: f maps the poset $L_1$ into the poset $L_2$; elements x and y of $L_1$ are sent to f(x) and f(y) in $L_2$]
Important cases of monotonicity • Join: $f(X, Y) = X \sqcup Y$ is monotone in each operand • Prove it! • Set lifting function: for a set $X$ and any function $g$, $F(X) = \{ g(x) \mid x \in X \}$ is monotone w.r.t. $\subseteq$ • Prove it! • Notice that the collecting semantics function is defined in terms of • Join (set union) • Semantic function for atomic statements lifted to sets of states • Conclusion: collecting semantics function is monotone
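On a small finite domain, the two claims above can be checked by brute force, which is a sanity check and not a proof. The sketch below enumerates all subsets of {0, 1, 2, 3}; the helper names `subsets`, `is_monotone`, and `lift` are assumptions made for this example.

```python
from itertools import combinations


def subsets(universe):
    """All subsets of a finite universe, as frozensets."""
    items = list(universe)
    return [frozenset(c)
            for r in range(len(items) + 1)
            for c in combinations(items, r)]


def is_monotone(f, elements, leq):
    """Check x <= y implies f(x) <= f(y) for all pairs of elements."""
    return all(leq(f(x), f(y))
               for x in elements for y in elements if leq(x, y))


def lift(g):
    """Lift a function on elements to a function on sets of elements."""
    return lambda xs: frozenset(g(x) for x in xs)


def subset(a, b):
    return a <= b


ELEMENTS = subsets(range(4))

decrement = lift(lambda x: x - 1)                    # set lifting of g(x) = x - 1
join_with_const = lambda xs: xs | frozenset({1})     # join with a fixed operand
complement = lambda xs: frozenset(range(4)) - xs     # a non-monotone function

lifting_ok = is_monotone(decrement, ELEMENTS, subset)
join_ok = is_monotone(join_with_const, ELEMENTS, subset)
complement_ok = is_monotone(complement, ELEMENTS, subset)
```

Set complement is included as a contrast: it reverses the inclusion order, so the check fails for it.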
Extensive/reductive functions • Let $L = (D, \sqsubseteq)$ be a poset • A function $f : D \to D$ is extensive if for every $x \in D$, we have that $x \sqsubseteq f(x)$ • A function $f : D \to D$ is reductive if for every $x \in D$, we have that $f(x) \sqsubseteq x$
Fixed points • $L = (D, \sqsubseteq, \sqcup, \sqcap, \bot, \top)$ • $f : D \to D$ monotone • $Fix(f) = \{ d \mid f(d) = d \}$ • $Red(f) = \{ d \mid f(d) \sqsubseteq d \}$ • $Ext(f) = \{ d \mid d \sqsubseteq f(d) \}$ • Theorem [Tarski 1955] • $lfp(f) = \sqcap Fix(f) = \sqcap Red(f) \in Fix(f)$ • $gfp(f) = \sqcup Fix(f) = \sqcup Ext(f) \in Fix(f)$ • Does a solution always exist? Yes • If so, is it unique? No, but it has least/greatest solutions • If so, is it computable? Under some conditions… • [Figure: the lattice with the labels Red(f), gfp, Fix(f), lfp, Ext(f), and $f^n(\bot)$]
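The theorem can be checked by exhaustive enumeration on a small powerset lattice. The sketch below assumes the universe {0, …, 5} and an illustrative monotone function `f` chosen so that it has more than one fixed point; none of these names come from the lecture.

```python
from functools import reduce
from itertools import combinations

UNIVERSE = frozenset(range(6))


def subsets(universe):
    """All subsets of a finite universe, as frozensets."""
    items = sorted(universe)
    return [frozenset(c)
            for r in range(len(items) + 1)
            for c in combinations(items, r)]


def f(xs):
    """A monotone function on subsets: always add 0, then shift by 2 mod 6."""
    return frozenset({0}) | frozenset((x + 2) % 6 for x in xs)


def meet_all(sets):
    """Meet of a family of subsets; the meet of the empty family is top."""
    return reduce(frozenset.intersection, sets, UNIVERSE)


def join_all(sets):
    """Join of a family of subsets; the join of the empty family is bottom."""
    return reduce(frozenset.union, sets, frozenset())


D = subsets(UNIVERSE)
fix = [d for d in D if f(d) == d]
red = [d for d in D if f(d) <= d]
ext = [d for d in D if d <= f(d)]

lfp = meet_all(red)
gfp = join_all(ext)
```

Enumerating all of D is only feasible because the universe is tiny, which is exactly why a more constructive method is needed for real analyses.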
Fixed point example • [Figure: the example CFG (program points 0 to 4) annotated with d on one side and with F(d) on the other; here F(d) = d, so d is a fixed point] • $R[0] = \{x \in \mathbb{Z}\}$ • $R[1] = R[0] \cup R[4]$ • $R[2] = R[1] \cap \{s \mid s(x) > 0\}$ • $R[3] = R[1] \cap \{s \mid s(x) \le 0\}$ • $R[4] = [\![x := x-1]\!]\, R[2]$
Pre-fixed point example • [Figure: the example CFG (program points 0 to 4) annotated with a pre-fixed point d on one side and with F(d) on the other; one annotation of d is {x < -5}] • $R[0] = \{x \in \mathbb{Z}\}$ • $R[1] = R[0] \cup R[4]$ • $R[2] = R[1] \cap \{s \mid s(x) > 0\}$ • $R[3] = R[1] \cap \{s \mid s(x) \le 0\}$ • $R[4] = [\![x := x-1]\!]\, R[2]$
Post-fixed point example • [Figure: the example CFG (program points 0 to 4) annotated with a post-fixed point d on one side and with F(d) on the other; one annotation of d is {x < 9}] • $R[0] = \{x \in \mathbb{Z}\}$ • $R[1] = R[0] \cup R[4]$ • $R[2] = R[1] \cap \{s \mid s(x) > 0\}$ • $R[3] = R[1] \cap \{s \mid s(x) \le 0\}$ • $R[4] = [\![x := x-1]\!]\, R[2]$
Recap • A system of equations of the form $R = F(R)$, where R draws its elements from a complete lattice $L = (D, \sqsubseteq, \sqcup, \sqcap, \bot, \top)$ • Tarski's fixed point theorem guarantees that a least fixed point exists: $lfp(F) = \sqcap Fix(F)$ • However, it is not an algorithm, since D is often infinite • Inefficient even when D is finite • We need a more constructive way of computing $lfp(F)$
Continuous functions • Let $L = (D, \sqsubseteq, \sqcup, \bot)$ be a complete partial order • Every ascending chain has a least upper bound • A function f is continuous if for every increasing chain $Y \subseteq D^*$, $f(\sqcup Y) = \sqcup \{ f(y) \mid y \in Y \}$ • Lemma: if f is continuous then f is monotone • Proof: assume $x \sqsubseteq y$. Therefore $x \sqcup y = y$. Then $f(y) = f(x \sqcup y) = f(x) \sqcup f(y)$, which means $f(x) \sqsubseteq f(y)$
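Kleene’s fixed point theorem • Let $L = (D, \sqsubseteq, \sqcup, \bot)$ be a complete partial order and $f : D \to D$ a continuous function. Then $lfp(f) = \bigsqcup_{n \in \mathbb{N}} f^n(\bot)$ • That is, take the ascending chain $\bot \sqsubseteq f(\bot) \sqsubseteq f(f(\bot)) \sqsubseteq … \sqsubseteq f^n(\bot) \sqsubseteq …$ and return the supremum • Why is this an ascending chain? • But how do you know if a function f is continuous?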
Continuity and ACC condition • Let $L = (D, \sqsubseteq, \sqcup, \bot)$ be a complete partial order • Every ascending chain has a least upper bound • L satisfies the ascending chain condition (ACC) if every ascending chain eventually stabilizes: $d_0 \sqsubseteq d_1 \sqsubseteq … \sqsubseteq d_n = d_{n+1} = d_{n+2} = …$ • Lemma: Monotone functions on posets satisfying ACC are continuous • Proof: We need to show that $f(\sqcup Y) = \sqcup \{ f(y) \mid y \in Y \}$ • Every ascending chain Y eventually stabilizes, $d_0 \sqsubseteq d_1 \sqsubseteq … \sqsubseteq d_n = d_{n+1} = …$, hence $d_n$ is the least upper bound of $\{d_0, d_1, …, d_n\}$, thus $f(\sqcup Y) = f(d_n)$ • From monotonicity of f we get that $f(d_0) \sqsubseteq f(d_1) \sqsubseteq … \sqsubseteq f(d_n) = f(d_{n+1}) = …$ Hence $f(d_n)$ is the least upper bound of $\{f(d_0), f(d_1), …, f(d_n)\}$, thus $\sqcup \{ f(y) \mid y \in Y \} = f(d_n)$
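Resulting algorithm • Kleene’s fixed point theorem gives a constructive method for computing lfp(f) over a poset with ACC when f is monotone • Mathematical definition: $lfp(f) = \bigsqcup_{n \in \mathbb{N}} f^n(\bot)$ • Algorithm: d := $\bot$; while f(d) $\neq$ d do d := f(d); return d • [Figure: the ascending chain $\bot$, $f(\bot)$, $f^2(\bot)$, …, $f^n(\bot)$ reaching lfp]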
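The loop above translates almost directly into code. The following Python sketch records the whole ascending chain so the iterates can be inspected; the reachability function `step` and the graph `SUCC` are hypothetical examples, chosen because the powerset of a finite node set trivially satisfies ACC.

```python
def kleene_chain(f, bottom):
    """Iterate f from bottom until a fixed point; return the whole chain."""
    chain = [bottom]
    while f(chain[-1]) != chain[-1]:
        chain.append(f(chain[-1]))
    return chain


# Assumed example graph: successors of each node.
SUCC = {"a": {"b"}, "b": {"c"}, "c": {"b"}, "d": {"a"}}


def step(reached):
    """Monotone function: the start node 'a' plus successors of reached nodes."""
    out = {"a"}
    for node in reached:
        out |= SUCC[node]
    return frozenset(out)


chain = kleene_chain(step, frozenset())
lfp = chain[-1]
```

Here `lfp` is the set of nodes reachable from `a`; node `d` never enters the chain because nothing reachable points to it.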
Vanilla algorithm • Problem Definition: • Lattice of properties L of finite height (ACC) • For each statement define a monotone transformer • Preparation: • Parse program into AST • Convert AST into CFG • Generate system of equations from CFG • Analysis: • Initialize each analysis variable with $\bot$ • Update all analysis variables of each equation until reaching a fixed point • Problem: non-incremental. Most variables don’t change.
Chaotic iteration • A worklist-based algorithm • Input: • A cpo $L = (D, \sqsubseteq, \sqcup, \bot)$ satisfying ACC • $L^n = L \times L \times \dots \times L$ • A monotone function $F : D^n \to D^n$ • A system of equations $\{ X[i] = F[i](X) \mid 1 \le i \le n \}$ • Output: lfp(F) • Algorithm:
    for i := 1 to n do X[i] := $\bot$
    WL := {1, …, n}
    while WL $\neq \emptyset$ do
        i := pop WL  // choose index non-deterministically
        N := F[i](X)
        if N $\neq$ X[i] then
            X[i] := N
            add all the indexes that directly depend on i to WL
            (X[j] depends on X[i] if F[j] contains X[i])
    return X
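A minimal Python sketch of the worklist algorithm follows, applied to the collecting semantics of the running example. Two assumptions are made for illustration: the integers are restricted to the finite range -2..3 so that sets of states are finite, and the dependency map `DEPS` is written by hand from the equations. The function and variable names are hypothetical.

```python
def chaotic_iteration(fs, deps, bottom):
    """Worklist solver: fs[i] computes X[i] from the vector X,
    deps[i] is the set of indexes whose equation reads X[i]."""
    x = [bottom] * len(fs)
    worklist = set(range(len(fs)))
    while worklist:
        i = worklist.pop()          # arbitrary choice of the next index
        new = fs[i](x)
        if new != x[i]:
            x[i] = new
            worklist |= deps[i]     # re-examine equations that depend on X[i]
    return x


# Assumption: a finite stand-in for "x in Z".
INPUT = frozenset(range(-2, 4))

# The five equations of the example loop, one function per variable.
EQUATIONS = [
    lambda r: INPUT,                                   # R[0]
    lambda r: r[0] | r[4],                             # R[1]
    lambda r: frozenset(s for s in r[1] if s > 0),     # R[2]: assume x > 0
    lambda r: frozenset(s for s in r[1] if s <= 0),    # R[3]: assume x <= 0
    lambda r: frozenset(s - 1 for s in r[2]),          # R[4]: x := x - 1
]

# DEPS[i] lists the variables whose equation mentions R[i].
DEPS = {0: {1}, 1: {2, 3}, 2: {4}, 3: set(), 4: {1}}

solution = chaotic_iteration(EQUATIONS, DEPS, frozenset())
```

Because every equation is monotone and the domain is finite, the result does not depend on the order in which indexes are popped.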
Chaotic iteration for static analysis • Specialize chaotic iteration for programs • Create a CFG for program • Choose a cpo of properties for the static analysis to infer: $L = (D, \sqsubseteq, \sqcup, \bot)$ • Define variables R[0, …, n] for input/output of each CFG node such that $R[i] \in D$ • For each node v let $v_{out}$ be the variable at the output of that node: $v_{out} = F[v](\bigsqcup \{ u_{out} \mid (u, v) \text{ is a CFG edge} \})$ • Make sure each F[v] is monotone • Variable dependence determined by outgoing edges in CFG
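To make the specialization concrete, here is a hedged Python sketch of a sign analysis for the running example loop. The abstract domain (sets of signs), the node names, and the transfer functions are assumptions made for this illustration and are not taken from the course package.

```python
NEG, ZERO, POS = "neg", "zero", "pos"
TOP = frozenset({NEG, ZERO, POS})
BOTTOM = frozenset()


def dec_sign(signs):
    """Abstract transformer for x := x - 1 on sets of signs."""
    out = set()
    if NEG in signs or ZERO in signs:
        out.add(NEG)                # negative or zero minus one is negative
    if POS in signs:
        out |= {ZERO, POS}          # positive minus one is zero or positive
    return frozenset(out)


# One monotone transformer per CFG node (hypothetical node names).
TRANSFER = {
    "entry": lambda _in: TOP,       # any integer input
    "head": lambda a: a,            # loop head: just the join of predecessors
    "assume_pos": lambda a: a & {POS},
    "assume_nonpos": lambda a: a & {NEG, ZERO},
    "dec": dec_sign,
}
EDGES = [("entry", "head"), ("head", "assume_pos"), ("head", "assume_nonpos"),
         ("assume_pos", "dec"), ("dec", "head")]


def analyze(transfer, edges):
    """Chaotic iteration where dependences follow outgoing CFG edges."""
    preds = {v: [] for v in transfer}
    succs = {v: [] for v in transfer}
    for u, v in edges:
        preds[v].append(u)
        succs[u].append(v)
    out = {v: BOTTOM for v in transfer}
    worklist = list(transfer)
    while worklist:
        v = worklist.pop()
        joined = frozenset().union(*(out[u] for u in preds[v]))
        new = transfer[v](joined)
        if new != out[v]:
            out[v] = new
            worklist.extend(succs[v])   # successors read the output of v
    return out


result = analyze(TRANSFER, EDGES)
```

At the loop exit (`assume_nonpos`) the analysis infers that x is negative or zero, which matches the concrete condition x ≤ 0.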