The Fixpoint Brush in The Art of Invariant Generation
Sumit Gulwani
Microsoft Research, Redmond, USA
sumitg@microsoft.com
Workshop on Invariant Generation, FLoC 2010

Art of Invariant Generation • Program Transformations • Reduce need for sophisticated invariant generation. • E.g., control-flow refinement (PLDI ‘09), loop-flattening/peeling (PLDI ‘10/POPL ’08) non-standard cut-points (PLDI ‘08) quantitative attributes instrumentation (POPL ’10). • Colorful Logic • Language of Invariants. • E.g., arithmetic, uninterpreted fns, lists/arrays. • Fixpoint Brush • Automatic generation of invariants in some shade of logic (such as conjunctive/k-disjunctive/predicate abstraction). • E.g., Iterative, Template/Constraint based, Proof rules.

Fixpoint Brush • Iterative and monotonic • Forward • Backward • Template/Constraint based • Proof rules • Iterative, but non-monotonic • Probabilistic Inference • Learning

Iterative Forward: Examples
Start with the precondition and propagate facts forward using the existential elimination operator until a fixpoint is reached.
• Data-flow analysis
  • Join at merge points
  • Finite abstract domains
• Abstract Interpretation
  • Join at merge points: $\mathrm{Join}(\phi_1, \phi_2)$ = strongest $\phi$ s.t. $\phi_1 \Rightarrow \phi$ and $\phi_2 \Rightarrow \phi$
  • Widening for infinite-height abstract domains
• Model Checking
  • Analyze all paths precisely without join at merge points
  • Finite abstractions (e.g., boolean abstraction) required
  • Counterexample-guided abstraction refinement
  • BDD-based data structures for efficiency

Difference Constraints
• Abstract element:
  • conjunction of constraints $x_i - x_j \le c_{ij}$
  • can be represented using a matrix $M$, where $M[i][j] = c_{ij}$
• Decide(M):
  • $M' := \mathrm{Saturate}(M)$
  • Saturation means closure under deductions of the form: $x - y \le c_1$ and $y - z \le c_2$ implies $x - z \le c_1 + c_2$
  • Declare unsat iff $\exists i: M'[i][i] < 0$
• Join(M1, M2):
  • $M'_1 := \mathrm{Saturate}(M_1)$; $M'_2 := \mathrm{Saturate}(M_2)$
  • Let $M_3$ be s.t. $M_3[i][j] = \max\{ M'_1[i][j], M'_2[i][j] \}$
  • return $M_3$
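The Decide and Join operations above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: saturation is implemented with a Floyd-Warshall style closure, constants are encoded through an extra "zero" variable at index 0, and the helper names (`saturate`, `is_unsat`, `join`, `top`) are hypothetical, not taken from the talk.

```python
import math

INF = math.inf  # "no constraint" between two variables

def top(n):
    """Matrix with no constraints: x_i - x_i <= 0, everything else unbounded."""
    return [[0 if i == j else INF for j in range(n)] for i in range(n)]

def saturate(m):
    """Close M under: x-y <= c1 and y-z <= c2 implies x-z <= c1+c2."""
    n = len(m)
    s = [row[:] for row in m]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if s[i][k] + s[k][j] < s[i][j]:
                    s[i][j] = s[i][k] + s[k][j]
    return s

def is_unsat(m):
    """Decide(M): unsat iff some diagonal entry of the saturation is negative."""
    s = saturate(m)
    return any(s[i][i] < 0 for i in range(len(s)))

def join(m1, m2):
    """Join(M1, M2): entrywise max of the two saturated matrices."""
    s1, s2 = saturate(m1), saturate(m2)
    n = len(s1)
    return [[max(s1[i][j], s2[i][j]) for j in range(n)] for i in range(n)]

# Assumed encoding: index 0 is a variable fixed to 0, index 1 is y, index 2 is z.
# M[i][j] = c means x_i - x_j <= c.
a = top(3)  # y = 0 and z = 2
a[1][0], a[0][1], a[2][0], a[0][2] = 0, 0, 2, -2
b = top(3)  # y = 1 and z = 3
b[1][0], b[0][1], b[2][0], b[0][2] = 1, -1, 3, -3

j = join(a, b)  # expected to describe 0 <= y <= 1 and z = y + 2
```

Joining the raw (unsaturated) matrices would lose the relation between y and z, which is why both inputs are saturated first.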

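Example: Abstract Interpretation using Difference Constraints
[Figure: flowchart of the program below, annotated with the abstract elements computed in successive iterations.]
• Program: y := 0; z := 2; while (y < 50) { y++; z++; } Assert (z=52)
• Entry: true; after initialization: $y=0 \wedge z=2$
• Loop head, successive iterates: $y=0 \wedge z=2$; $0 \le y \le 1 \wedge z=y+2$; $0 \le y \le 2 \wedge z=y+2$; $0 \le y \wedge z=y+2$; $0 \le y<51 \wedge z=y+2$
• True branch of y < 50: $y=0 \wedge z=2$; $0 \le y \le 1 \wedge z=y+2$; $0 \le y<50 \wedge z=y+2$
• After y++; z++: $y=1 \wedge z=3$; $1 \le y \le 2 \wedge z=y+2$; $1 \le y<51 \wedge z=y+2$
• False branch of y < 50: $y=50 \wedge z=y+2$, reaching Assert (z=52)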

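Combination: Example of Join Algorithm
• Inputs to $\mathrm{Join}_{la+uf}$: $z=a-1 \wedge y=F(a)$ and $z=b-1 \wedge y=F(b)$
• Left input, split by theory: $z=a-1 \wedge a=\langle a,b \rangle$ (la) and $y=F(a) \wedge a=\langle a,b \rangle$ (uf)
• Right input, split by theory: $z=b-1 \wedge b=\langle a,b \rangle$ (la) and $y=F(b) \wedge b=\langle a,b \rangle$ (uf)
• $\mathrm{Join}_{la}$ yields $\langle a,b \rangle = 1+z$; $\mathrm{Join}_{uf}$ yields $y=F(\langle a,b \rangle)$
• $\mathrm{Eliminate}_{uf+la}$ of $\{ \langle a,b \rangle \}$ yields $y=F(1+z)$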

Iterative Forward: References • Uninterpreted Functions • A Polynomial-Time Algorithm for Global Value Numbering; Gulwani, Necula; SAS ‘04 • Linear Arithmetic + Uninterpreted Functions • Combining Abstract Interpreters; Gulwani, Tiwari; PLDI ’06 • Theory of Arrays/Lists • Quantified Abstract Domains • Lifting Abstract Interpreters to Quantified Logical Domains; Gulwani, McCloskey, Tiwari; POPL ‘08 • Discovering Properties about Arrays in Simple Programs; Halbwachs, Péron; PLDI ‘08 • Shape Analysis • Parametric Shape Analysis via 3-Valued Logic; Sagiv, Reps, Wilhelm; POPL ‘99, TOPLAS ‘02

Iterative Forward: References • Theory of Arrays/Lists • Combination of Shape Analysis + Arithmetic • A Combination Framework for tracking partition sizes; Gulwani, Lev-Ami, Sagiv; POPL ’09 • Non-linear Arithmetic • User-defined axioms + Expression Abstraction • A Numerical Abstract Domain Based on Expression Abstraction and Max Operator with Application in Timing Analysis; Gulavani, Gulwani; CAV ’08 • Polynomial Equalities • An Abstract Interpretation Approach for Automatic Generation of Polynomial Invariants; Rodriguez-Carbonell, Kapur; SAS ‘04

Fixpoint Brush • Iterative and monotonic • Forward • Backward • Template/Constraint based • Proof-rules • Iterative, but non-monotonic • Probabilistic Inference • Learning

Iterative Backward
• Comparison with Iterative Forward
  • Positives: Can compute preconditions, goal-directed
  • Negatives: Requires assertions or template assertions
• Transfer function for an assignment node is easier.
  • Substitution takes the role of existential elimination.
• Transfer function for a conditional node is challenging.
  • Requires abductive reasoning/under-approximations.
  • $\mathrm{Abduct}(\phi, g)$ = weakest $\phi'$ s.t. $\phi' \wedge g \Rightarrow \phi$
  • Case-split reasoning as opposed to saturation-based, and hence typically not closed under conjunctions.
• Optimally weak solutions for negative unknowns as opposed to optimally strong solutions for positive unknowns.

Iterative Backward: References
• Program Verification using Templates over Predicate Abstraction; Srivastava, Gulwani; PLDI '09
• Assertion Checking Unified; Gulwani, Tiwari; VMCAI '07
• Computing Procedure Summaries for Interprocedural Analysis; Gulwani, Tiwari; ESOP '07

Template-based Invariant Generation
Goal-directed invariant generation for verification of a Hoare triple (Pre, Program, Post).
• Program: [Pre] while (c) S [Post], with loop invariant I
• VCGen produces the verification constraint (second-order) $\exists I\, \forall X$ over:
  • Base case: $Pre \Rightarrow I$
  • Precision: $I \wedge \neg c \Rightarrow Post$
  • Inductive case: $(I \wedge c)[S] \Rightarrow I$
• Key Idea: Reduce the second-order verification constraint to a first-order satisfiability constraint that can be solved using off-the-shelf SAT/SMT solvers.
  • Choose a template for I (specific color/shade in some logic).
  • Convert $\forall$ into $\exists$.

Key Idea in reducing $\forall$ to $\exists$ for various Domains
The trick for converting $\forall$ to $\exists$ is known for the following domains:
• Linear Arithmetic
  • Farkas Lemma
• Linear Arithmetic + Uninterpreted Fns.
  • Farkas Lemma + Ackermann's Reduction
• Non-linear Arithmetic
  • Gröbner Basis
• Predicate Abstraction
  • Boolean indicator variables + Cover Algorithm (Abduction)
• Quantified Predicate Abstraction
  • Boolean indicator variables + More general Abduction

Template-based Invariant Generation: References • Linear Arithmetic • Constraint-based Linear-relations analysis; Sankaranarayanan, Sipma, Manna; SAS ’04 • Program analysis as constraint solving; Gulwani, Srivastava, Venkatesan; PLDI ‘08 • Linear Arithmetic + Uninterpreted Fns. • Invariant synthesis for combined theories; Beyer, Henzinger, Majumdar, Rybalchenko; VMCAI ‘07 • Non-linear Arithmetic • Non-linear loop invariant generation using Gröbner bases; Sankaranarayanan, Sipma, Manna; POPL ’04 • Predicate Abstraction • Constraint-based invariant inference over predicate abstraction; Gulwani, Srivastava, Venkatesan; VMCAI ’09 • Quantified Predicate Abstraction • Program verification using templates over predicate abstraction; Srivastava, Gulwani; PLDI ‘09

Template-based Invariant Generation • Linear Arithmetic • Linear Arithmetic + Uninterpreted Fns. • Non-linear Arithmetic • Predicate Abstraction • Quantified Predicate Abstraction

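Farkas Lemma
$\forall X \left( \bigwedge_k (e_k \ge 0) \Rightarrow e \ge 0 \right)$ iff $\exists \lambda, \lambda_k \ge 0\ \forall X \left( e \equiv \lambda + \sum_k \lambda_k e_k \right)$
Example: Let's find a Farkas witness $\lambda, \lambda_1, \lambda_2$ for the implication $x \ge 2 \wedge y \ge 3 \Rightarrow 2x+y \ge 6$
• $\exists \lambda, \lambda_1, \lambda_2 \ge 0$ s.t. $\forall x,y\ [2x+y-6 \equiv \lambda + \lambda_1(x-2) + \lambda_2(y-3)]$
• Equating coefficients of x, y and constant terms, we get: $2 = \lambda_1 \wedge 1 = \lambda_2 \wedge -6 = \lambda - 2\lambda_1 - 3\lambda_2$
• which implies $\lambda = 1$, $\lambda_1 = 2$, $\lambda_2 = 1$.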
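The witness in the example above can be checked mechanically by equating coefficients. The following is a minimal sketch, assuming linear expressions are stored as coefficient dictionaries with the key "1" for the constant term; the function names are hypothetical.

```python
from fractions import Fraction

def combine(lam0, lams, hyps):
    """Return lam0 + sum_k lams[k] * hyps[k] as a coefficient dictionary."""
    out = {"1": Fraction(lam0)}
    for lam, expr in zip(lams, hyps):
        for var, coef in expr.items():
            out[var] = out.get(var, Fraction(0)) + Fraction(lam) * coef
    return out

def is_farkas_witness(hyps, concl, lam0, lams):
    """Check non-negativity of the multipliers and the identity
    concl == lam0 + sum_k lams[k] * hyps[k], coefficient by coefficient."""
    if lam0 < 0 or any(lam < 0 for lam in lams):
        return False
    rhs = combine(lam0, lams, hyps)
    keys = set(rhs) | set(concl)
    return all(rhs.get(k, 0) == concl.get(k, 0) for k in keys)

# Hypotheses x - 2 >= 0 and y - 3 >= 0; conclusion 2x + y - 6 >= 0.
hyps = [{"x": 1, "1": -2}, {"y": 1, "1": -3}]
concl = {"x": 2, "y": 1, "1": -6}

witness_ok = is_farkas_witness(hyps, concl, 1, [2, 1])
```

Finding the multipliers, rather than checking them, amounts to solving the same coefficient equations as a linear system with non-negativity constraints.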

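Solving 2nd order constraints using Farkas Lemma
$\exists I\, \forall X\ \phi_1(I,X)$
• Second-order to First-order
  • Assume I has some form, e.g., $\sum_j a_j x_j \ge 0$
  • $\exists I\, \forall X\ \phi_1(I,X)$ translates to $\exists a_j\, \forall X\ \phi_2(a_j,X)$
• First-order to "only existentially quantified"
  • Farkas Lemma helps translate $\forall$ to $\exists$
  • $\forall X \left( \bigwedge_k (e_k \ge 0) \Rightarrow e \ge 0 \right)$ iff $\exists \lambda, \lambda_k \ge 0\ \forall X \left( e \equiv \lambda + \sum_k \lambda_k e_k \right)$
  • Eliminate X from the polynomial equality by equating coefficients.
  • $\exists a_j\, \forall X\ \phi_2(a_j,X)$ translates to $\exists a_j\, \exists \lambda_k\ \phi_3(a_j,\lambda_k)$
• "Only existentially quantified" to SAT
  • Bit-vector modeling for integer variables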

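Example
• Program: [$n=1 \wedge m=1$] x := 0; y := 0; while (x < 100) { x := x+n; y := y+m; } [$y \ge 100$]
• VCGen produces the following constraints on the invariant I:
  • $n=m=1 \wedge x=y=0 \Rightarrow I$
  • $I \wedge x \ge 100 \Rightarrow y \ge 100$
  • $I \wedge x<100 \Rightarrow I[x \leftarrow x+n, y \leftarrow y+m]$
• Invariant template with three conjuncts: $a_0 + a_1x + a_2y + a_3n + a_4m \ge 0 \wedge b_0 + b_1x + b_2y + b_3n + b_4m \ge 0 \wedge c_0 + c_1x + c_2y + c_3n + c_4m \ge 0$
  • Satisfying solution: $a_2=b_0=c_4=1$, $a_1=b_3=c_0=-1$
  • Loop invariant: $y \ge x \wedge m \ge 1 \wedge n \le 1$
• Invariant template with two conjuncts: $a_0 + a_1x + a_2y + a_3n + a_4m \ge 0 \wedge b_0 + b_1x + b_2y + b_3n + b_4m \ge 0$
  • Satisfying solution: $a_2=b_4=1$, $a_1=b_3=-1$
  • Loop invariant: $y \ge x \wedge m \ge n$
• Invariant template with one conjunct: $a_0 + a_1x + a_2y + a_3n + a_4m \ge 0$
  • UNSAT: invalid triple or imprecise template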
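As a sanity check on the example above, the three verification conditions can be tested against candidate invariants by brute force. This is a bounded test over a small grid of integers, not a proof, and it is unrelated to how a constraint solver finds the coefficients; the grid bounds and function names are assumptions made for illustration.

```python
from itertools import product

def implies(p, q):
    return (not p) or q

def vcs_hold(inv):
    """Check the three verification conditions on a small grid of integers
    around the interesting boundaries 0 and 100 (bounded check only)."""
    xy_vals = list(range(-2, 3)) + list(range(97, 104))
    nm_vals = range(-2, 3)
    for x, y, n, m in product(xy_vals, xy_vals, nm_vals, nm_vals):
        init = implies(n == 1 and m == 1 and x == 0 and y == 0, inv(x, y, n, m))
        post = implies(inv(x, y, n, m) and x >= 100, y >= 100)
        step = implies(inv(x, y, n, m) and x < 100, inv(x + n, y + m, n, m))
        if not (init and post and step):
            return False
    return True

def inv3(x, y, n, m):
    """Three-conjunct solution: y >= x, m >= 1, n <= 1."""
    return y >= x and m >= 1 and n <= 1

def inv2(x, y, n, m):
    """Two-conjunct solution: y >= x, m >= n."""
    return y >= x and m >= n

def inv1(x, y, n, m):
    """One particular single-conjunct candidate; it is not inductive."""
    return y >= x
```

Here `inv1` is only one instance of the single-conjunct template, so its failure illustrates, but does not establish, the UNSAT result for that template.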

Template-based Invariant Generation • Linear Arithmetic • Linear Arithmetic + Uninterpreted Fns. • Non-linear Arithmetic • Predicate Abstraction • Quantified Predicate Abstraction

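Solving 2nd order constraints using Boolean Indicator Variables + Cover Algorithm
$\exists I\, \forall X\ \phi_1(I,X)$
• Second-order to First-order (Boolean indicator variables)
  • Assume I has the form $P_1 \vee P_2$, where $P_1, P_2 \subseteq P$
  • Let $b_{i,j}$ denote presence of $p_j \in P$ in $P_i$
  • Then, I can be written as $\left( \bigwedge_j b_{1,j} \Rightarrow p_j \right) \vee \left( \bigwedge_j b_{2,j} \Rightarrow p_j \right)$
  • $\exists I\, \forall X\ \phi_1(I,X)$ translates to $\exists b_{i,j}\, \forall X\ \phi_2(b_{i,j},X)$
  • Can generalize to k disjuncts
• First-order to "only existentially quantified"
  • Cover Algorithm helps translate $\forall$ to $\exists$
  • $\exists b_{i,j}\, \forall X\ \phi_2(b_{i,j},X)$ translates to the SAT formula $\exists b_{i,j}\ \phi_3(b_{i,j})$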

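Example
• Program: [$m > 0$] x := 0; y := 0; while (x < m) { x := x+1; y := y+1; } [$y=m$]
• VCGen produces the following constraints on the invariant I:
  • $m>0 \Rightarrow I[x \leftarrow 0, y \leftarrow 0]$
  • $I \wedge x \ge m \Rightarrow y=m$
  • $I \wedge x<m \Rightarrow I[x \leftarrow x+1, y \leftarrow y+1]$
• Suppose $P = \{ x \le y,\ x \ge y,\ x<y,\ x \le m,\ x \ge m,\ x<m,\ y \le m,\ y \ge m,\ y<m \}$ and k = 1
• Then, I has the form $P_1$, where $P_1 \subseteq P$:
  • (1) $m>0 \Rightarrow P_1[x \leftarrow 0, y \leftarrow 0]$
  • (2) $P_1 \wedge x \ge m \Rightarrow y=m$
  • (3) $P_1 \wedge x<m \Rightarrow P_1[x \leftarrow x+1, y \leftarrow y+1]$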

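Example
(1) $m>0 \Rightarrow P_1[x \leftarrow 0, y \leftarrow 0]$
• Applying $x \leftarrow 0, y \leftarrow 0$ to the predicates in P gives: $0 \le 0$, $0 \ge 0$, $0<0$; $0 \le m$, $0 \ge m$, $0<m$; $0 \le m$, $0 \ge m$, $0<m$
• Hence, (1) $\equiv$ $P_1$ doesn't contain $x<y$, $x \ge m$, $y \ge m$ $\equiv \neg b_{x \ge m} \wedge \neg b_{x<y} \wedge \neg b_{y \ge m}$
(2) $P_1 \wedge x \ge m \Rightarrow y=m$
• There are 3 maximally-weak choices for $P_1$ (computed using the Predicate Cover Algorithm): (i) $y \le m \wedge y \ge m$ (ii) $x<m$ (iii) $x \le y \wedge y \le m$
• Hence, (2) $\equiv$ $P_1$ contains at least one of the above combinations $\equiv b_{x<m} \vee (b_{y \le m} \wedge b_{y \ge m}) \vee (b_{x \le y} \wedge b_{y \le m})$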

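Example
• (1) $\neg b_{x \ge m} \wedge \neg b_{x<y} \wedge \neg b_{y \ge m}$
• (2) $b_{x<m} \vee (b_{y \le m} \wedge b_{y \ge m}) \vee (b_{x \le y} \wedge b_{y \le m})$
• (3) $(b_{y \le m} \Rightarrow (b_{y<m} \vee b_{y \le x})) \wedge \neg b_{x<m} \wedge \neg b_{y<m}$
• Each constraint is obtained from solving local/small SMT queries.
• SAT solver result: $b_{y \le x}$, $b_{y \le m}$, $b_{x \le y}$: true; rest: false (here $y \le x$ is the predicate $x \ge y$ of P)
• Program: [$m > 0$] x := 0; y := 0; while (x < m) { x := x+1; y := y+1; } [$y=m$]
• I: $(y=x \wedge y \le m)$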
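With only nine indicator variables, the final SAT step above is small enough to solve by enumeration. The sketch below stands in for a real SAT solver, encodes constraints (1) to (3) exactly as stated on the slide, and writes the indicator for $y \le x$ under the name of the equivalent predicate `x>=y`; all identifiers are hypothetical.

```python
from itertools import product

PREDS = ["x<=y", "x>=y", "x<y", "x<=m", "x>=m", "x<m", "y<=m", "y>=m", "y<m"]

def c1(b):
    """(1): P1 contains none of x<y, x>=m, y>=m."""
    return not b["x>=m"] and not b["x<y"] and not b["y>=m"]

def c2(b):
    """(2): P1 contains at least one maximally-weak combination."""
    return b["x<m"] or (b["y<=m"] and b["y>=m"]) or (b["x<=y"] and b["y<=m"])

def c3(b):
    """(3): inductiveness constraint over the indicator variables."""
    return ((not b["y<=m"]) or b["y<m"] or b["x>=y"]) and not b["x<m"] and not b["y<m"]

def solutions():
    """Enumerate all assignments to the indicators that satisfy (1), (2), (3)."""
    for bits in product([False, True], repeat=len(PREDS)):
        b = dict(zip(PREDS, bits))
        if c1(b) and c2(b) and c3(b):
            yield {p for p in PREDS if b[p]}

sols = sorted(solutions(), key=len)
smallest = sols[0]  # the solution that selects the fewest predicates
```

The smallest solution selects the same three predicates as the slide; the enumeration also shows that adding `x<=m` is consistent with the three constraints as written.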

Bonus Material Where can we go? Going beyond Invariant Generation with Constraint-based techniques…

Example: Bresenham's Line Drawing Algorithm
• Program: [$0<Y \le X$] v1 := 2Y-X; y := 0; x := 0; while (x ≤ X) { out[x] := y; if (v1 < 0) v1 := v1+2Y; else { v1 := v1+2(Y-X); y++; } x++; } return out; [$\forall k\, (0 \le k \le X \Rightarrow |out[k] - (Y/X)k| \le 1/2)$]
• Postcondition: The best fit line shouldn't deviate more than half a pixel from the real line, i.e., $|y - (Y/X)x| \le 1/2$

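Transition System Representation
• Program: [$0<Y \le X$] v1 := 2Y-X; y := 0; x := 0; while (x ≤ X) { ... } [$\forall k\, (0 \le k \le X \Rightarrow |out[k] - (Y/X)k| \le 1/2)$], where the loop body is the guarded transitions:
  • $v_1<0$: $out' = \mathrm{Update}(out,x,y) \wedge v_1' = v_1+2Y \wedge y'=y \wedge x'=x+1$
  • $v_1 \ge 0$: $out' = \mathrm{Update}(out,x,y) \wedge v_1' = v_1+2(Y-X) \wedge y'=y+1 \wedge x'=x+1$
• Or, equivalently: [Pre] $s_{entry}$; while ($g_{loop}$) { $g_{body1}$: $s_{body1}$; $g_{body2}$: $s_{body2}$; } [Post]
• Where:
  • $g_{body1}$: $v_1<0$
  • $g_{body2}$: $v_1 \ge 0$
  • $g_{loop}$: $x \le X$
  • $s_{entry}$: $v_1' = 2Y-X \wedge y'=0 \wedge x'=0$
  • $s_{body1}$: $out' = \mathrm{Update}(out,x,y) \wedge v_1' = v_1+2Y \wedge x'=x+1 \wedge y'=y$
  • $s_{body2}$: $out' = \mathrm{Update}(out,x,y) \wedge v_1' = v_1+2(Y-X) \wedge x'=x+1 \wedge y'=y+1$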

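Verification Constraint Generation & Solution
• Verification constraint:
  • $Pre \wedge s_{entry} \Rightarrow I'$
  • $I \wedge g_{loop} \wedge g_{body1} \wedge s_{body1} \Rightarrow I'$
  • $I \wedge g_{loop} \wedge g_{body2} \wedge s_{body2} \Rightarrow I'$
  • $I \wedge \neg g_{loop} \Rightarrow Post$
• Given Pre, Post, $g_{loop}$, $g_{body1}$, $g_{body2}$, $s_{body1}$, $s_{body2}$, we can find a solution for I using constraint-based techniques.
• I: $0<Y \le X \wedge v_1 = 2(x+1)Y - (2y+1)X \wedge 2(Y-X) \le v_1 \le 2Y \wedge \forall k\, (0 \le k \le x \Rightarrow |out[k] - (Y/X)k| \le 1/2)$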

The Surprise!
• Verification constraint:
  • $Pre \wedge s_{entry} \Rightarrow I'$
  • $I \wedge g_{loop} \wedge g_{body1} \wedge s_{body1} \Rightarrow I'$
  • $I \wedge g_{loop} \wedge g_{body2} \wedge s_{body2} \Rightarrow I'$
  • $I \wedge \neg g_{loop} \Rightarrow Post$
• What if we treat each g and s as unknowns like I?
  • We get a solution that has $g_{body1} = g_{body2} = false$.
  • This doesn't correspond to a valid transition system.
  • We can fix this by encoding $g_{body1} \vee g_{body2} = true$.
• We now get a solution that has $g_{loop} = true$.
  • This corresponds to a non-terminating loop.
  • We can fix this by encoding existence of a ranking function.
• We now discover each g and s along with I.
• We have gone from Invariant Synthesis to Program Synthesis.

Proof Rules Problem specific. Requires understanding of commonly used design patterns. Can provide the most scalable solution, if applicable. Case Study: Ranking function computation for estimating loop bounds.

Ranking Function
• Consider the loop: while (cond) X := F(X)
• Transition system representation s: $cond \wedge X' = F(X)$
• Example: The transition system representation of the loop while (x < n) {x++; n--;} is $x<n \wedge x'=x+1 \wedge n'=n-1$
• An integer-valued function r is a ranking function for transition s if
  • $s \Rightarrow (r>0)$
  • $s \Rightarrow r[X'/X] \le r-1$
• We denote this by Rank(r,s).
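The two conditions of Rank(r,s) can be tested on the example loop by enumerating states. In this sketch the candidate r = n - x is an assumption (the slide gives the transition but not a ranking function for it), and the check covers only a finite range of states.

```python
def step(x, n):
    """Transition s of `while (x < n) {x++; n--;}`.
    Returns the next state, or None when the loop guard fails."""
    if x < n:
        return x + 1, n - 1
    return None

def r(x, n):
    """Candidate ranking function (an assumption for this example)."""
    return n - x

def is_rank_on(states):
    """Check s => r > 0 and s => r[X'/X] <= r - 1 on the given states."""
    for x, n in states:
        nxt = step(x, n)
        if nxt is None:
            continue  # s does not hold in this state, nothing to check
        if not (r(x, n) > 0 and r(*nxt) <= r(x, n) - 1):
            return False
    return True

def iterations(x, n):
    """Number of times the loop body runs from the state (x, n)."""
    count = 0
    while (nxt := step(x, n)) is not None:
        x, n = nxt
        count += 1
    return count

grid = [(x, n) for x in range(-10, 11) for n in range(-10, 11)]
rank_ok = is_rank_on(grid)
```

Because r decreases by at least 1 per iteration and is positive whenever the loop runs, the iteration count from any state is bounded by max(0, r); for this loop r actually drops by 2 each time, so the bound is not tight.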

Finding Ranking Functions
• Iterative Forward
  • Instrument counter c and find an upper bound n. n-c is a ranking function. (Gulwani, Mehra, Chilimbi; POPL '09)
• Constraint-based
  • Assume a template $a_0 + \sum_i a_i x_i$ for the ranking function r and then solve for the $a_i$'s in the constraint $\exists a_i\, (s \Rightarrow (r>0 \wedge r[X'/X] \le r-1))$ using Farkas lemma
  • Goal directed
  • Complete PTIME method for synthesis of linear ranking fns. (Podelski, Rybalchenko; VMCAI '04)
• Proof Rules
  • "The Reachability Bound Problem"; Gulwani, Zuleger; PLDI '10

Arithmetic Iteration Patterns
• If $s \Rightarrow (e>0 \wedge e[X'/X] < e)$, then Rank(e,s).
  • Candidates for e: $e_1 - e_2$, for any conditionals $e_1 > e_2$ in s
  • Example: $n>0 \wedge n' \le n \wedge A[n] \ne A[n']$
  • Ranking function: n
• If $s \Rightarrow (e \ge 1 \wedge e[X'/X] \le e/2)$, then Rank(log e, s).
  • Candidates for e: $e_1/e_2$, for any conditionals $e_1 > e_2$ in s
  • Example: $i' = i/2 \wedge i>1$
  • Ranking function: log(i)
  • Example: $i' = 2 \times i \wedge i>0 \wedge n>i \wedge n'=n$
  • Ranking function: log(n/i)
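The logarithmic pattern can be illustrated on the doubling example: with e = n/i, each iteration halves e, so the iteration count is bounded in terms of log e. The sketch below uses base-2 logarithms and exact rational arithmetic; the concrete bound floor(log2 e) + 1 and the helper names are assumptions made for this illustration.

```python
from fractions import Fraction

def floor_log2(e):
    """floor(log2(e)) for a rational e >= 1, computed without floating point."""
    return (e.numerator // e.denominator).bit_length() - 1

def doubling_iterations(i, n):
    """Run the transition i' = 2*i, n' = n under the guard n > i (with i > 0)
    and count the iterations."""
    count = 0
    while n > i:
        i = 2 * i
        count += 1
    return count

def pattern_holds(i, n):
    """Check e >= 1 and e[X'/X] <= e/2 for e = n/i in a state where s holds."""
    e = Fraction(n, i)
    e_next = Fraction(n, 2 * i)
    return e >= 1 and e_next <= e / 2

checked = all(
    pattern_holds(i, n)
    and doubling_iterations(i, n) <= floor_log2(Fraction(n, i)) + 1
    for i in range(1, 21)
    for n in range(i + 1, 201)
)
```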

Bit-vector Iteration Pattern
• Input: bitvector b
• Program: while (BitScanForward(&id1,b)) { b := b | ((1 << id1)-1); if (BitScanForward(&id2,~x)) break; b := b & (~((1 << id2)-1)); }
• If $s \Rightarrow (\mathrm{RSB}(e') < \mathrm{RSB}(e) \wedge e \ne 0)$, then Rank(RSB(e),s)
  • Candidates for e: all bitvector variables x in s
  • Example: $x' = x \,\&\, (x-1) \wedge x \ne 0$
  • Ranking function: x
  • Ranking function: RSB(x)/2
• RSB: Position of rightmost significant 1-bit

Composing ranking functions
Let Rank($r_1$,$s_1$) and Rank($r_2$,$s_2$).
• Non-Interference NI($s_1$,$s_2$,$r_2$):
  • Non-enabling condition: $s_1 \circ s_2 = false$
  • Rank preserving condition: $s_1 \Rightarrow r_2[x'/x] \le r_2$
• If NI($s_1$,$s_2$,$r_2$) and NI($s_2$,$s_1$,$r_1$), then $\mathrm{Max}(0,r_1) + \mathrm{Max}(0,r_2)$ is a ranking fn for $s_1 \vee s_2$.
• If NI($s_1$,$s_2$,$r_2$), then ($r_1$,$r_2$) is a lexicographic ranking fn for $s_1 \vee s_2$.

Relating values before/after a loop
Useful for backward symbolic execution across loops.
• If $s \Rightarrow e' \le e + c$, then $e_{after} \le e_{before} + c \times \mathrm{Rank}(s)$
• Example: n := m; while (e) { …… n := n+3; …… }
  • $n_{after} \le n_{before} + 3 \times \mathrm{Rank}(s)$

Fixpoint Brush • Iterative and monotonic • Forward • Backward • Template/Constraint based • Proof-rules • Iterative, but non-monotonic • Probabilistic Inference • Learning

Recurrence Solving Techniques Undergraduate Textbook on Algorithms by Cormen, Leiserson, Rivest, Stein describes 3 fundamental methods for recurrence solving: • Iteration Method • Expands/unfolds the recurrence relation

Iteration Method
Expand (unfold) the recurrence and express it as a summation of terms.
Consider the recurrence: $T(n) = n + 3T(n/4)$
$T(n) = n + 3(n/4) + 9T(n/16)$
$\quad = n + 3(n/4) + 9(n/16) + 27T(n/64)$
$\quad \le n + 3(n/4) + 9(n/16) + 27(n/64) + \dots + 3^{\log_4 n}\,\Theta(1)$
$\quad \le \left[ n \sum_{i \ge 0} (3/4)^i \right] + \Theta(3^{\log_4 n})$
$\quad = 4n + o(n) = O(n)$
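The unfolding above can be replayed numerically. This sketch assumes integer division for n/4 and the base case T(0) = 0, neither of which is fixed by the slide, and compares the recursive definition with the unfolded sum and with the 4n bound.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def T(n):
    """T(n) = n + 3*T(n // 4), with T(0) = 0 (both choices are assumptions)."""
    if n == 0:
        return 0
    return n + 3 * T(n // 4)

def unfolded(n):
    """The same value written as the sum of 3^i * (n // 4^i) over all i
    with 4^i <= n, which is what the iteration method produces."""
    total, i = 0, 0
    while 4 ** i <= n:
        total += 3 ** i * (n // 4 ** i)
        i += 1
    return total

# The geometric series n * sum_i (3/4)^i bounds every unfolded sum by 4n.
bound_ok = all(T(n) <= 4 * n for n in range(1, 5000))
```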

Recurrence Solving Techniques vs. Our Fixpoint Brush
The undergraduate textbook on algorithms by Cormen, Leiserson, Rivest, and Stein describes 3 fundamental methods for recurrence solving. Example of a recurrence: $T(n) = T(n-1) + 2n \wedge T(0) = 0$
• Iteration Method
  • Expands/unfolds the recurrence relation
  • Similar to Iterative approach
• Substitution Method
  • Assumes a template for a closed form

Substitution Method
Involves guessing the form of the solution and then substituting it in the equations to find the constants.
Consider the recurrence: $T(n) = T(n-1) + 2n \wedge T(0) = 0$
• Let $T(n) \le an^2 + bn + c$ for some a, b, c
• $T(n) \le a(n-1)^2 + b(n-1) + c + 2n = an^2 + (b-2a+2)n + (a+c-b)$
• This implies $(b-2a+2) \le b$ and $(a+c-b) \le c$
• $T(0) = c$; this implies $c = 0$
• Hence, $b \ge a \ge 1$
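The constraints derived above can be checked against the recurrence directly. The sketch below computes T(n) by summation and tests a few template instances; the function names are hypothetical, and the check is over a finite range of n.

```python
def T(n):
    """T(n) = T(n-1) + 2n with T(0) = 0, computed by direct summation."""
    total = 0
    for k in range(1, n + 1):
        total += 2 * k
    return total

def template(a, b, c, n):
    """Guessed closed form a*n^2 + b*n + c."""
    return a * n * n + b * n + c

def constraints_hold(a, b, c):
    """The constraints from the substitution step: (b-2a+2) <= b,
    (a+c-b) <= c, and c = 0; together they say b >= a >= 1 and c = 0."""
    return (b - 2 * a + 2) <= b and (a + c - b) <= c and c == 0

def bounds_T(a, b, c, limit=200):
    """Check T(n) <= a*n^2 + b*n + c for all n up to the given limit."""
    return all(T(n) <= template(a, b, c, n) for n in range(limit + 1))
```

The tightest instance, a = b = 1 and c = 0, gives n^2 + n, which coincides with T(n) exactly.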

Recurrence Solving Techniques vs. Our Fixpoint Brush
The undergraduate textbook on algorithms by Cormen, Leiserson, Rivest, and Stein describes 3 fundamental methods for recurrence solving. Example of a recurrence: $T(n) = T(n-1) + 2n \wedge T(0) = 0$
• Iteration Method
  • Expands/unfolds the recurrence relation
  • Similar to Iterative approach
• Substitution Method
  • Assumes a template for a closed form
  • Similar to Template/Constraint-based approach
• Master's Method
  • Provides a cook-book solution for $T(n) = aT(n/b) + f(n)$

Master's Method
Provides a cook-book method for solving recurrences of the form $T(n) = aT(n/b) + f(n)$, where $a \ge 1$ and $b > 1$.
• If $f(n) = O(n^{\log_b a - \epsilon})$ for some constant $\epsilon > 0$, then $T(n) = \Theta(n^{\log_b a})$.
• If $f(n) = \Theta(n^{\log_b a})$, then $T(n) = \Theta(n^{\log_b a} \lg n)$.
• If $f(n) = \Omega(n^{\log_b a + \epsilon})$ for some constant $\epsilon > 0$, and if $a f(n/b) \le c f(n)$ for some constant $c < 1$ and all sufficiently large n, then $T(n) = \Theta(f(n))$.
Requires memorization of three cases, but then the solution of many recurrences can be determined quite easily, often with pencil and paper.
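For the common special case $f(n) = \Theta(n^k)$, the three cases reduce to comparing k with $\log_b a$. The following sketch handles only that special case; the function name and return convention are hypothetical. For polynomial f the regularity condition of the third case holds automatically, since $a f(n/b) = (a/b^k) f(n)$ and $a/b^k < 1$ exactly when $k > \log_b a$.

```python
import math

def master(a, b, k):
    """Classify T(n) = a*T(n/b) + Theta(n^k).
    Returns (case, p), meaning T(n) = Theta(n^p) in cases 1 and 3
    and T(n) = Theta(n^p * lg n) in case 2."""
    if a < 1 or b <= 1:
        raise ValueError("requires a >= 1 and b > 1")
    crit = math.log(a, b)  # the critical exponent log_b a
    if math.isclose(k, crit, rel_tol=1e-9, abs_tol=1e-12):
        return 2, crit
    if k < crit:
        return 1, crit
    return 3, k

merge_sort = master(2, 2, 1)      # T(n) = 2T(n/2) + n
iteration_ex = master(3, 4, 1)    # T(n) = 3T(n/4) + n
cubic = master(8, 2, 2)           # T(n) = 8T(n/2) + n^2
```

The second call agrees with the O(n) bound obtained by the iteration method for T(n) = n + 3T(n/4).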

Recurrence Solving Techniques vs. Our Fixpoint Brush
The undergraduate textbook on algorithms by Cormen, Leiserson, Rivest, and Stein describes 3 fundamental methods for recurrence solving. Example of a recurrence: $T(n) = T(n-1) + 2n \wedge T(0) = 0$
• Iteration Method
  • Expands/unfolds the recurrence relation
  • Similar to Iterative approach
• Substitution Method
  • Assumes a template for a closed form
  • Similar to Template/Constraint-based approach
• Master's Method
  • Provides a cook-book solution for $T(n) = aT(n/b) + f(n)$
  • Similar to Proof-rules approach

Fixpoint Brush • Iterative and monotonic • Forward • Backward • Template/Constraint based • Proof-rules • Iterative, but non-monotonic (Distance to fixed-point decreases in each iteration). • Probabilistic Inference • Learning

Example Proof of Validity
[Figure: flowchart with program points 1 to 8.]
• Program: entry with x = 0; y := 50; while (x < 100) { if (x < 50) x := x+1; else { x := x+1; y := y+1; } }; exit with y = 100

Probabilistic Inference for Program Verification • Initialize invariants at all program points to any element (from an abstract domain over which the proof exists) • Pick a program point (randomly) whose invariant is locally inconsistent & update it to make it less inconsistent.

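Consistency of an invariant I at program point $\pi$
I is consistent at $\pi$ iff $\mathrm{Post}(\pi) \Rightarrow I \wedge I \Rightarrow \mathrm{Pre}(\pi)$
• $\mathrm{Post}(\pi)$ is the strongest postcondition of "the invariants at the predecessors of $\pi$" at $\pi$
• $\mathrm{Pre}(\pi)$ is the weakest precondition of "the invariants at the successors of $\pi$" at $\pi$
• Example: point 1 carries invariant P and is followed by statement s, leading to point 2 with invariant I; point 2 branches on condition c to points 3 and 4, which carry invariants Q and R
  • Post(2) = StrongestPost(P,s)
  • Pre(2) = $(c \Rightarrow Q) \wedge (\neg c \Rightarrow R)$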
