Spring 2014 Program Analysis and Verification Lecture 10: Abstract Interpretation II

Roman Manevich, Ben-Gurion University

Syllabus

Previously
• Semantic domains
• Preorders
• Partial orders (posets)
• Pointed posets
• Ascending/descending chains
• The height of a poset
• Join and Meet operators
• Complete lattices
• Constructing new lattices from old
• Abstract Interpretation package – domains

Abstract domain types

A taxonomy of semantic domain types
• Complete lattice $(D, \sqsubseteq, \sqcup, \sqcap, \bot, \top)$: join/meet exist for every subset of $D$
• Lattice $(D, \sqsubseteq, \sqcup, \sqcap, \bot, \top)$: join/meet exist for every finite subset of $D$ (alternatively, binary join/meet); $\top$ is the meet of the empty set and $\bot$ is the join of the empty set
• Join semilattice / meet semilattice: only joins, respectively only meets, are required
• Complete partial order (CPO) $(D, \sqsubseteq, \bot)$: poset with a LUB for all ascending chains
• Partial order (poset) $(D, \sqsubseteq)$: reflexive, transitive, and anti-symmetric: $d \sqsubseteq d'$ and $d' \sqsubseteq d$ implies $d = d'$
• Preorder $(D, \sqsubseteq)$: reflexive: $d \sqsubseteq d$; transitive: $d \sqsubseteq d'$ and $d' \sqsubseteq d''$ implies $d \sqsubseteq d''$

Composing domains

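Cartesian product of complete lattices
• For two complete lattices $L_1 = (D_1, \sqsubseteq_1, \sqcup_1, \sqcap_1, \bot_1, \top_1)$ and $L_2 = (D_2, \sqsubseteq_2, \sqcup_2, \sqcap_2, \bot_2, \top_2)$
• Define the poset $L_{cart} = (D_1 \times D_2, \sqsubseteq_{cart}, \sqcup_{cart}, \sqcap_{cart}, \bot_{cart}, \top_{cart})$ as follows:
• $(x_1, x_2) \sqsubseteq_{cart} (y_1, y_2)$ iff $x_1 \sqsubseteq_1 y_1$ and $x_2 \sqsubseteq_2 y_2$
• $\sqcup_{cart}$ = ? $\sqcap_{cart}$ = ? $\bot_{cart}$ = ? $\top_{cart}$ = ?
• Lemma: $L_{cart}$ is a complete lattice
• Define the Cartesian constructor $L_{cart} = Cart(L_1, L_2)$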
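One way to answer the question marks on this slide is to apply every operation component by component. The following is a minimal Python sketch of that idea, not code from the course package: the `Lattice` record, `cart`, and `powerset` are hypothetical names introduced here, and only binary join/meet are modelled.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Lattice:
    """Hypothetical record: order, binary join/meet, bottom and top."""
    leq: Callable[[Any, Any], bool]
    join: Callable[[Any, Any], Any]
    meet: Callable[[Any, Any], Any]
    bottom: Any
    top: Any


def cart(l1: Lattice, l2: Lattice) -> Lattice:
    """Cart(L1, L2): each operation works on pairs, one component at a time."""
    return Lattice(
        leq=lambda x, y: l1.leq(x[0], y[0]) and l2.leq(x[1], y[1]),
        join=lambda x, y: (l1.join(x[0], y[0]), l2.join(x[1], y[1])),
        meet=lambda x, y: (l1.meet(x[0], y[0]), l2.meet(x[1], y[1])),
        bottom=(l1.bottom, l2.bottom),
        top=(l1.top, l2.top),
    )


def powerset(universe: frozenset) -> Lattice:
    """Powerset lattice of a finite universe, ordered by inclusion."""
    return Lattice(
        leq=lambda a, b: a <= b,
        join=lambda a, b: a | b,
        meet=lambda a, b: a & b,
        bottom=frozenset(),
        top=universe,
    )


# Example: the product of two small powerset lattices.
P = powerset(frozenset({1, 2}))
Q = powerset(frozenset({"a"}))
PQ = cart(P, Q)
```

With this representation, `PQ.bottom` is the pair of empty sets and `PQ.top` is the pair of both universes.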

Disjunctive completion
• For a complete lattice $L = (D, \sqsubseteq, \sqcup, \sqcap, \bot, \top)$
• Define the powerset lattice $L_\vee = (2^D, \sqsubseteq_\vee, \sqcup_\vee, \sqcap_\vee, \bot_\vee, \top_\vee)$
• $\sqsubseteq_\vee$ = ? $\sqcup_\vee$ = ? $\sqcap_\vee$ = ? $\bot_\vee$ = ? $\top_\vee$ = ?
• Lemma: $L_\vee$ is a complete lattice
• $L_\vee$ contains all subsets of $D$, which can be thought of as disjunctions of the corresponding predicates
• Define the disjunctive completion constructor $L_\vee = Disj(L)$

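Relational product of lattices
• $L_1 = (D_1, \sqsubseteq_1, \sqcup_1, \sqcap_1, \bot_1, \top_1)$ and $L_2 = (D_2, \sqsubseteq_2, \sqcup_2, \sqcap_2, \bot_2, \top_2)$
• Define $L_{rel} = (2^{D_1 \times D_2}, \sqsubseteq_{rel}, \sqcup_{rel}, \sqcap_{rel}, \bot_{rel}, \top_{rel})$ as follows:
• $L_{rel} = Disj(Cart(L_1, L_2))$
• Lemma: $L_{rel}$ is a complete lattice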

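Finite maps
• For a complete lattice $L = (D, \sqsubseteq, \sqcup, \sqcap, \bot, \top)$ and a finite set $V$
• Define the poset $L_{V \to L} = (V \to D, \sqsubseteq_{V \to L}, \sqcup_{V \to L}, \sqcap_{V \to L}, \bot_{V \to L}, \top_{V \to L})$ as follows:
• $f_1 \sqsubseteq_{V \to L} f_2$ iff for all $v \in V$, $f_1(v) \sqsubseteq f_2(v)$
• $\sqcup_{V \to L}$ = ? $\sqcap_{V \to L}$ = ? $\bot_{V \to L}$ = ? $\top_{V \to L}$ = ?
• Lemma: $L_{V \to L}$ is a complete lattice
• Define the map constructor $L_{V \to L} = Map(V, L)$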
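The pointwise order defined on this slide suggests pointwise answers for the remaining operations as well. A small Python sketch under stated assumptions: finite maps are plain dictionaries over the same key set $V$, the element-level `leq` and `join` are passed in as functions, and all function names are illustrative rather than taken from the course package.

```python
def map_leq(f1, f2, leq):
    """f1 is below f2 iff f1[v] is below f2[v] for every key v."""
    return all(leq(f1[v], f2[v]) for v in f1)


def map_join(f1, f2, join):
    """Pointwise join of two maps with the same keys."""
    return {v: join(f1[v], f2[v]) for v in f1}


def map_const(keys, value):
    """Constant map: bottom (or top) of Map(V, L) when value is bottom (or top) of L."""
    return {v: value for v in keys}


# Example: V is a set of two CFG nodes, L is a lattice of sets of integers.
def subset(a, b):
    return a <= b


def union(a, b):
    return a | b


f1 = {"n1": frozenset({1}), "n2": frozenset()}
f2 = {"n1": frozenset({1, 2}), "n2": frozenset({0})}
joined = map_join(f1, f2, union)
bottom_map = map_const(["n1", "n2"], frozenset())
```

Here `f1` is below `f2` at every key, so their join is `f2` itself.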

The collecting lattice
• Lattice for a given control-flow node $v$: $L_v = (2^{State}, \subseteq, \cup, \cap, \emptyset, State)$
• Lattice for the entire control-flow graph with nodes $V$: $L_{CFG} = Map(V, L_v)$
• We will use this lattice as a baseline for static analysis and define abstractions of its elements

Implementation

Software package: paver142
• Built on top of the Soot compiler framework for Java
• Download from the course web site
• Includes all necessary Soot jar files

[Figure: package structure, showing example analyses, Soot-specific utilities, and infrastructure for implementing static analysis]

Existing analyses

Implementing abstract domains

Variable equalities analysis

Today
• Solving monotone systems
• Fixed-points
• Vanilla static analysis algorithm
• Chaotic iteration

Abstract interpretation via abstraction
• Generalizes axiomatic verification
[Diagram: under the collecting semantics, statement S maps a set of states to a set of states; under the abstract semantics, S maps an abstract representation of sets of states to another one; abstraction relates the two levels. The axiomatic counterpart is {P} S {Q} with sp(S, P).]

Abstract interpretation via concretization
[Diagram: under the abstract semantics, statement S maps an abstract representation of sets of states to another one; concretization maps each abstract representation to a set of states, which is compared with the result of the collecting semantics of S. The axiomatic counterpart is {P} S {Q}, relating models(P), models(sp(S, P)), and models(Q).]

Missing knowledge
• Collecting semantics
• Abstract semantics
• Connection between collecting semantics and abstract semantics
• Algorithm to compute abstract semantics

Review of collecting semantics

The collecting lattice (sets of states)
• Lattice for a given control-flow node $v$: $L_v = (2^{State}, \subseteq, \cup, \cap, \emptyset, State)$
• Lattice for the entire control-flow graph with nodes $V$: $L_{CFG} = Map(V, L_v)$
• We will use this lattice as a baseline for static analysis and define abstractions of its elements

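Collecting semantics as equation system
[Figure: CFG with a variable on each edge: entry, R[0], R[1], "if x > 0" with branches R[2] and R[3], "x := x-1" producing R[4], which flows back into R[1]; R[3] leads to exit]
• A vector of variables R[0, 1, 2, 3, 4]
• A (recursive) system of equations:
$R[0] = \{x \in \mathbb{Z}\}$ // established input
$R[1] = R[0] \cup R[4]$
$R[2] = R[1] \cap \{s \mid s(x) > 0\}$ // semantic function for assume x>0
$R[3] = R[1] \cap \{s \mid s(x) \le 0\}$
$R[4] = [\![x := x-1]\!]\, R[2]$ // semantic function for x:=x-1 lifted to sets of states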

General definition
[Figure: CFG of the running example with edge variables R[0] to R[4]]
• A vector of variables R[0, …, k], one per input/output of a node
• R[0] is for entry
• For a node n with multiple predecessors add the equation $R[n] = \bigcup \{R[k] \mid k \text{ is a predecessor of } n\}$
• For an atomic operation node with input R[m], statement S, and output R[n], add the equation $R[n] = [\![S]\!]\, R[m]$
• Transform "if b then S1 else S2" to (assume b; S1) or (assume ¬b; S2)
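The rules on this slide can be turned into a small equation generator. The sketch below is an illustration under several assumptions that are not in the original: a state is modelled as just the integer value of x, the input is restricted to the finite range -3..3 so that sets stay finite, the join rule and the atomic-statement rule are merged into one equation per variable, and all function names are hypothetical.

```python
def equations_from_cfg(num_vars, edges, entry_input):
    """Build one equation per variable R[n] from labelled CFG edges.

    edges: list of (m, transfer, n); R[n] is the union of transfer(R[m])
    over all incoming edges (join rule plus atomic-statement rule).
    """
    def make_equation(n):
        incoming = [(m, transfer) for (m, transfer, dst) in edges if dst == n]

        def equation(R):
            if n == 0:                    # R[0] is the entry variable
                return entry_input
            result = frozenset()          # union of the empty set
            for m, transfer in incoming:
                result |= transfer(R[m])
            return result

        return equation

    return [make_equation(n) for n in range(num_vars)]


def identity(states):
    return states


def assume_positive(states):              # assume x > 0
    return frozenset(x for x in states if x > 0)


def assume_non_positive(states):          # assume not (x > 0)
    return frozenset(x for x in states if x <= 0)


def decrement(states):                    # x := x-1 lifted to sets of states
    return frozenset(x - 1 for x in states)


INPUT = frozenset(range(-3, 4))           # assumed finite stand-in for x in Z
EDGES = [
    (0, identity, 1), (4, identity, 1),   # R[1] = R[0] U R[4]
    (1, assume_positive, 2),
    (1, assume_non_positive, 3),
    (2, decrement, 4),
]
F = equations_from_cfg(5, EDGES, INPUT)

# A candidate solution; applying every equation to it returns it unchanged.
fixed = [INPUT, INPUT, frozenset({1, 2, 3}),
         frozenset({-3, -2, -1, 0}), frozenset({0, 1, 2})]
```

Evaluating `[eq(fixed) for eq in F]` reproduces `fixed`, so this vector solves the generated system.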

Static analysis
• Given a system of equations for the collecting semantics:
$R[0] = \{x \in \mathbb{Z}\}$ // established input
$R[1] = R[0] \cup R[4]$
$R[2] = [\![\text{assume } x > 0]\!]\, R[1]$
$R[3] = [\![\text{assume } x \le 0]\!]\, R[1]$
$R[4] = [\![x := x-1]\!]\, R[2]$
• A static analysis solves a corresponding system of equations over an abstract domain:
$R[0]^\# = \{x \in \mathbb{Z}\}^\#$
$R[1]^\# = R[0]^\# \sqcup R[4]^\#$
$R[2]^\# = [\![\text{assume } x > 0]\!]^\#\, R[1]^\#$
$R[3]^\# = [\![\text{assume } x \le 0]\!]^\#\, R[1]^\#$
$R[4]^\# = [\![x := x-1]\!]^\#\, R[2]^\#$
• Questions:
• What is the relation between the solutions? (Next lecture)
• How do you solve the second system? (This lecture)

Solving equation systems

Equation systems in general
• Let $L$ be a complete lattice $(D, \sqsubseteq, \sqcup, \sqcap, \bot, \top)$
• Let $R$ be a vector of analysis variables $R[0, \ldots, n] \in D \times \cdots \times D$
• Let $F$ be a vector of functions of the type $F[i] : D \times \cdots \times D \to D$
• A system of equations
$R[0] = F[0](R[0], \ldots, R[n])$
…
$R[n] = F[n](R[0], \ldots, R[n])$
• In vector notation $R = F(R)$
• For $R[i] = F[i](R)$, usually $F[i]$ reads only a small subset of $R$, written D[i]. We say that R[i] depends on D[i]
• Example:
$R[0] = \{x \in \mathbb{Z}\}$ // established input
$R[1] = R[0] \cup R[4]$
$R[2] = R[1] \cap \{s \mid s(x) > 0\}$
$R[3] = R[1] \cap \{s \mid s(x) \le 0\}$
$R[4] = [\![x := x-1]\!]\, R[2]$
• Questions:
• Does a solution always exist?
• If so, is it unique?
• If so, is it computable?

Equation systems in general (continued)
• If a solution of $R = F(R)$ exists, it is a fixed point of $F$

Monotone systems

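Monotone functions
• Let $L_1 = (D_1, \sqsubseteq)$ and $L_2 = (D_2, \sqsubseteq)$ be two posets
• A function $f : D_1 \to D_2$ is monotone if for every pair $x, y \in D_1$, $x \sqsubseteq y$ implies $f(x) \sqsubseteq f(y)$
• A special case: $L_1 = L_2 = (D, \sqsubseteq)$ and $f : D \to D$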

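Monotone function
[Figure: f maps the poset $L_1$ to the poset $L_2$; elements $x \sqsubseteq y$ of $L_1$ are mapped to $f(x) \sqsubseteq f(y)$ in $L_2$]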

Important cases of monotonicity
• Join: $f(X, Y) = X \sqcup Y$ is monotone in each operand
• Prove it!
• Set lifting function: for a set $X$ and any function $g$, $F(X) = \{ g(x) \mid x \in X \}$ is monotone w.r.t. $\subseteq$
• Prove it!
• Notice that the collecting semantics function is defined in terms of
• Join (set union)
• Semantic function for atomic statements lifted to sets of states
• Conclusion: the collecting semantics function is monotone
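Before proving the two claims, they can be sanity-checked by brute force on a small powerset. This is only a test on one finite example, not a proof; the helper names `subsets`, `is_monotone`, and `lift` are hypothetical, and the order used is set inclusion.

```python
from itertools import combinations


def subsets(universe):
    """All subsets of a finite universe, as frozensets."""
    items = sorted(universe)
    return [frozenset(c)
            for r in range(len(items) + 1)
            for c in combinations(items, r)]


def is_monotone(f, domain):
    """Check that x <= y implies f(x) <= f(y) for all pairs in the domain."""
    return all(f(x) <= f(y) for x in domain for y in domain if x <= y)


def lift(g):
    """Set lifting: F(X) = { g(x) | x in X }."""
    return lambda X: frozenset(g(x) for x in X)


U = frozenset({-1, 0, 1, 2})
DOMAIN = subsets(U)

lifted_square = lift(lambda n: n * n)


def join_with_fixed(X):
    """Join (union) with one operand held fixed."""
    return X | frozenset({0})


def complement(X):
    """Set complement: an example of a function that is not monotone."""
    return U - X
```

On this domain `is_monotone` accepts the lifted function and the join, and rejects the complement.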

Fixed points

Extensive/reductive functions
• Let $L = (D, \sqsubseteq)$ be a poset
• A function $f : D \to D$ is extensive if for every $x \in D$, we have that $x \sqsubseteq f(x)$
• A function $f : D \to D$ is reductive if for every $x \in D$, we have that $f(x) \sqsubseteq x$

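Fixed points
• $L = (D, \sqsubseteq, \sqcup, \sqcap, \bot, \top)$
• $f : D \to D$ monotone
• $Fix(f) = \{ d \mid f(d) = d \}$
• $Red(f) = \{ d \mid f(d) \sqsubseteq d \}$
• $Ext(f) = \{ d \mid d \sqsubseteq f(d) \}$
• Theorem [Tarski 1955]
• $lfp(f) = \sqcap Fix(f) = \sqcap Red(f) \in Fix(f)$
• $gfp(f) = \sqcup Fix(f) = \sqcup Ext(f) \in Fix(f)$
• Does a solution always exist? Yes
• If so, is it unique? No, but it has least/greatest solutions
• If so, is it computable? Under some conditions…
[Figure: the lattice from $\bot$ to $\top$, with the regions Red(f), Fix(f), and Ext(f), the points gfp and lfp, and the chain $f^n(\bot)$ rising from $\bot$]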
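Tarski's characterization can be checked exhaustively on a tiny lattice. The sketch below assumes the powerset of {0, 1, 2} ordered by inclusion and a hypothetical monotone function `f` chosen only because it has more than one fixed point; it then computes Fix, Red, and Ext by enumeration.

```python
from functools import reduce
from itertools import combinations

U = frozenset({0, 1, 2})
D = [frozenset(c) for r in range(len(U) + 1)
     for c in combinations(sorted(U), r)]


def f(X):
    """A monotone function on subsets of U: always adds 0, always drops 1."""
    return frozenset({0}) | frozenset(x for x in X if x != 1)


fix = [d for d in D if f(d) == d]   # fixed points
red = [d for d in D if f(d) <= d]   # f(d) below d
ext = [d for d in D if d <= f(d)]   # d below f(d)


def meet_all(sets):
    """Meet (intersection) of a family; the meet of the empty family is top."""
    return reduce(lambda a, b: a & b, sets, U)


def join_all(sets):
    """Join (union) of a family; the join of the empty family is bottom."""
    return reduce(lambda a, b: a | b, sets, frozenset())


lfp = meet_all(red)
gfp = join_all(ext)
```

For this `f` the fixed points are {0} and {0, 2}, which are exactly the computed `lfp` and `gfp`.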

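Fixed point example
• $F(d) = d$: $d$ is a fixed point
[Figure: the CFG of the running example drawn twice, annotated with $d$ and with $F(d)$; both carry R[0] = R[1] = $\{x \in \mathbb{Z}\}$, R[2] = $\{x > 0\}$, R[3] = $\{x \le 0\}$, R[4] = $\{x \ge 0\}$]
$R[0] = \{x \in \mathbb{Z}\}$
$R[1] = R[0] \cup R[4]$
$R[2] = R[1] \cap \{s \mid s(x) > 0\}$
$R[3] = R[1] \cap \{s \mid s(x) \le 0\}$
$R[4] = [\![x := x-1]\!]\, R[2]$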

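Pre-fixed point example
• $d \sqsubseteq F(d)$: $d$ is a pre-fixed point
[Figure: the CFG of the running example drawn twice, annotated with $d$ and with $F(d)$; both carry R[0] = R[1] = $\{x \in \mathbb{Z}\}$, R[2] = $\{x > 0\}$, R[4] = $\{x \ge 0\}$; $d$ has R[3] = $\{x < -5\}$ while $F(d)$ has R[3] = $\{x \le 0\}$]
$R[0] = \{x \in \mathbb{Z}\}$
$R[1] = R[0] \cup R[4]$
$R[2] = R[1] \cap \{s \mid s(x) > 0\}$
$R[3] = R[1] \cap \{s \mid s(x) \le 0\}$
$R[4] = [\![x := x-1]\!]\, R[2]$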

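Post-fixed point example
• $F(d) \sqsubseteq d$: $d$ is a post-fixed point
[Figure: the CFG of the running example drawn twice, annotated with $d$ and with $F(d)$; both carry R[0] = R[1] = $\{x \in \mathbb{Z}\}$, R[2] = $\{x > 0\}$, R[4] = $\{x \ge 0\}$; $d$ has R[3] = $\{x < 9\}$ while $F(d)$ has R[3] = $\{x \le 0\}$]
$R[0] = \{x \in \mathbb{Z}\}$
$R[1] = R[0] \cup R[4]$
$R[2] = R[1] \cap \{s \mid s(x) > 0\}$
$R[3] = R[1] \cap \{s \mid s(x) \le 0\}$
$R[4] = [\![x := x-1]\!]\, R[2]$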

Recap
• A system of equations of the form $R = F(R)$, where $R$ draws its elements from a complete lattice $L = (D, \sqsubseteq, \sqcup, \sqcap, \bot, \top)$
• Tarski's fixed point theorem ensures that there exists a least fixed point: $lfp(f) = \sqcap Fix(f)$
• However, it is not an algorithm, since $D$ is often infinite
• It is inefficient even when $D$ is finite
• We need a more constructive way of computing $lfp(f)$

Computing the least fixed point

Continuous functions
• Let $L = (D, \sqsubseteq, \sqcup, \bot)$ be a complete partial order
• Every ascending chain has a least upper bound
• A function $f$ is continuous if for every ascending chain $Y \subseteq D$, $f(\sqcup Y) = \sqcup \{ f(y) \mid y \in Y \}$
• Lemma: if $f$ is continuous then $f$ is monotone
• Proof: assume $x \sqsubseteq y$. Therefore $\{x, y\}$ is an ascending chain and $x \sqcup y = y$. Then $f(y) = f(x \sqcup y) = f(x) \sqcup f(y)$, which means $f(x) \sqsubseteq f(y)$

Kleene's fixed point theorem
• Let $L = (D, \sqsubseteq, \sqcup, \bot)$ be a complete partial order and $f : D \to D$ a continuous function; then $lfp(f) = \sqcup_{n \in \mathbb{N}} f^n(\bot)$
• That is, take the ascending chain $\bot \sqsubseteq f(\bot) \sqsubseteq f(f(\bot)) \sqsubseteq \cdots \sqsubseteq f^n(\bot) \sqsubseteq \cdots$ and return the supremum
• Why is this an ascending chain?
• But how do you know if a function $f$ is continuous?
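The question of why the iterates form an ascending chain has a short inductive answer. It uses only two facts: $\bot$ is the least element of $D$, and $f$ is monotone (which follows from continuity by the lemma on continuous functions). A sketch of the argument:

```latex
\begin{align*}
\text{Base: } & \bot \sqsubseteq f(\bot)
  && \text{since } \bot \text{ is the least element of } D \\
\text{Step: } & f^{n}(\bot) \sqsubseteq f^{n+1}(\bot)
  \;\Longrightarrow\; f^{n+1}(\bot) \sqsubseteq f^{n+2}(\bot)
  && \text{by monotonicity of } f \\
\text{Hence: } & \bot \sqsubseteq f(\bot) \sqsubseteq f^{2}(\bot)
  \sqsubseteq \cdots \sqsubseteq f^{n}(\bot) \sqsubseteq \cdots
\end{align*}
```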

Continuity and ACC condition
• Let $L = (D, \sqsubseteq, \sqcup, \bot)$ be a complete partial order
• Every ascending chain has a least upper bound
• $L$ satisfies the ascending chain condition (ACC) if every ascending chain eventually stabilizes: $d_0 \sqsubseteq d_1 \sqsubseteq \cdots \sqsubseteq d_n = d_{n+1} = d_{n+2} = \cdots$
• Lemma: Monotone functions on posets satisfying ACC are continuous
• Proof: We need to show that $f(\sqcup Y) = \sqcup \{ f(y) \mid y \in Y \}$
• Every ascending chain $Y$ eventually stabilizes, $d_0 \sqsubseteq d_1 \sqsubseteq \cdots \sqsubseteq d_n = d_{n+1} = \cdots$, hence $d_n$ is the least upper bound of $\{d_0, d_1, \ldots, d_n\}$, thus $f(\sqcup Y) = f(d_n)$
• From monotonicity of $f$ we get that $f(d_0) \sqsubseteq f(d_1) \sqsubseteq \cdots \sqsubseteq f(d_n) = f(d_{n+1}) = \cdots$. Hence $f(d_n)$ is the least upper bound of $\{f(d_0), f(d_1), \ldots, f(d_n)\}$, thus $\sqcup \{ f(y) \mid y \in Y \} = f(d_n)$

Resulting algorithm
• Kleene's fixed point theorem gives a constructive method for computing $lfp(f)$ over a poset with ACC when $f$ is monotone
• Mathematical definition: $lfp(f) = \sqcup_{n \in \mathbb{N}} f^n(\bot)$
• Algorithm:
d := ⊥
while f(d) ≠ d do
    d := f(d)
return d
[Figure: the ascending chain ⊥, f(⊥), f²(⊥), …, fⁿ(⊥) climbing to lfp]
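The loop on this slide translates almost directly into Python. The sketch below evaluates f once per round instead of twice; the reachability example, including the `EDGES` graph and the function `step`, is a hypothetical illustration and not part of the lecture. Termination relies on the lattice being finite here, which implies ACC.

```python
def kleene_lfp(f, bottom):
    """Iterate bottom, f(bottom), f(f(bottom)), ... until the value stabilizes.

    Assumes f is monotone and that ascending chains stabilize (ACC);
    otherwise the loop may not terminate.
    """
    d = bottom
    while True:
        nxt = f(d)
        if nxt == d:
            return d
        d = nxt


# Example: nodes reachable from node 0, as a least fixed point over
# sets of nodes ordered by inclusion (bottom is the empty set).
EDGES = {0: [1], 1: [2], 2: [0], 3: [4], 4: []}


def step(X):
    """Monotone: the start node plus all successors of nodes in X."""
    return frozenset({0}) | frozenset(t for s in X for t in EDGES[s])


reachable = kleene_lfp(step, frozenset())
```

The iterates are {}, {0}, {0, 1}, {0, 1, 2}, after which the value no longer changes.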

Our very first generic static analysis algorithm

Vanilla algorithm
Definition:
• Lattice of properties $L$ of finite height (ACC)
• For each statement define a monotone transformer
Preparation:
• Parse program into AST
• Convert AST into CFG
• Generate system of equations from CFG
Analysis:
• Initialize each analysis variable with $\bot$
• Update all analysis variables of each equation until reaching a fixed point
Problem:
• Non-incremental: every round updates all variables, although most variables don't change

Chaotic iteration

Chaotic iteration
• Input:
• A cpo $L = (D, \sqsubseteq, \sqcup, \bot)$ satisfying ACC
• $L^n = L \times L \times \cdots \times L$
• A monotone function $f : D^n \to D^n$ with components $F[1], \ldots, F[n]$
• A system of equations $\{ X[i] = F[i](X) \mid 1 \le i \le n \}$
• Output: $lfp(f)$
• A worklist-based algorithm:
for i := 1 to n do X[i] := ⊥
WL := {1, …, n}
while WL ≠ ∅ do
    i := pop WL // choose index non-deterministically
    N := F[i](X)
    if N ≠ X[i] then
        X[i] := N
        add all the indexes that directly depend on i to WL
        (X[j] depends on X[i] if F[j] contains X[i])
return X
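A runnable sketch of the worklist algorithm, applied to the five equations of the running example. Several details are assumptions made for illustration: indices start at 0 rather than 1, a state is modelled as the integer value of x, the input is restricted to the finite range -3..3 so that the lattice is finite, and the dependency table `DEPS` is written by hand.

```python
def chaotic_iteration(F, deps, bottom):
    """Worklist-based least fixed point of X[i] = F[i](X).

    F:    list of functions, each mapping the whole vector X to a new X[i]
    deps: deps[i] lists the indices j whose equation F[j] reads X[i]
    """
    X = [bottom] * len(F)
    worklist = set(range(len(F)))
    while worklist:
        i = worklist.pop()            # arbitrary choice of index
        new = F[i](X)
        if new != X[i]:
            X[i] = new
            worklist.update(deps[i])  # re-evaluate equations that read X[i]
    return X


INPUT = frozenset(range(-3, 4))       # assumed finite stand-in for x in Z

F = [
    lambda R: INPUT,                                   # R[0]: established input
    lambda R: R[0] | R[4],                             # R[1] = R[0] U R[4]
    lambda R: frozenset(x for x in R[1] if x > 0),     # R[2]: assume x > 0
    lambda R: frozenset(x for x in R[1] if x <= 0),    # R[3]: assume x <= 0
    lambda R: frozenset(x - 1 for x in R[2]),          # R[4]: x := x-1
]
DEPS = {0: [1], 1: [2, 3], 2: [4], 3: [], 4: [1]}

R = chaotic_iteration(F, DEPS, frozenset())
```

The order in which `set.pop` returns indices is unspecified, but for monotone equations over a lattice with ACC every order reaches the same least fixed point.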

Chaotic iteration for static analysis
• Specialize chaotic iteration for programs
• Create a CFG for the program
• Choose a cpo of properties for the static analysis to infer: $L = (D, \sqsubseteq, \sqcup, \bot)$
• Define variables R[0, …, n] for the input/output of each CFG node such that $R[i] \in D$
• For each node $v$ let $v_{out}$ be the variable at the output of that node: $v_{out} = F[v](\sqcup \{ u_{out} \mid (u, v) \text{ is a CFG edge} \})$
• Make sure each $F[v]$ is monotone
• Variable dependence is determined by the outgoing edges in the CFG
