From Under-approximations to Over-approximations and Back

Complementary material by Yuri Meshman (yurime@cs.technion.ac.il)

Example

Foo(int n):
    i=0,x=0;
    while(i<n)
        if (i <= 2)
            x = 0;
        else x = i;
        i = i + 1;
    if (x < 0)
        ERROR
    return;

Assume we have the code example above. In this case the ERROR label is not reachable, and we want to prove that with predicate abstraction. First step: we want to know which locations are reachable.
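As a sanity check before any abstraction, the example can be transcribed into Python and run on a range of inputs. This is only a test over an arbitrarily chosen range, not a proof, and the function name `foo` and the returned flag are choices made for this sketch.

```python
def foo(n: int) -> bool:
    """Run the example program; return True if the ERROR label would be reached."""
    i, x = 0, 0
    while i < n:
        if i <= 2:
            x = 0
        else:
            x = i
        i = i + 1
    # The ERROR label is guarded by the test x < 0.
    return x < 0


# Try a small, arbitrarily chosen range of inputs (an assumption of this sketch).
error_inputs = [n for n in range(-50, 51) if foo(n)]
print(error_inputs)
```

No input in the tested range reaches ERROR, because x is only ever assigned 0 or a value of i that is greater than 2.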

ARG Definition

We want to build an abstract reachability graph (ARG) for the example program. In the graph example the nodes are v1, v2, v3, v4, v5, v6, v2', v7, v3', v8, v9.

The ARG includes a map from nodes to control locations (several nodes can map to the same pc). In the graph example, node vi maps to control reaching line i of the code. Apostrophes are used to distinguish different nodes mapped to the same revisited line (e.g. v2 and v2').

The ARG includes a map from edges (E) to actions (instructions) of the program. In the graph example, the edge (v1, v2) maps to “i=0,x=0;”.

The ARG includes a map from nodes (V) to formulas over program variables. In the graph example there are two options:
• Option 1: all labels are true, which represents reachable locations.
• Option 2: general formulas over variables, which abstract the variable values reaching this location.

The ARG includes an ancestor relation over the nodes. It is used to define the fixed point and covered vertices: if a node is covered by another node, we don't need to explore more iterations of the loop from it. In the graph example, a node is covered by another node if:
• it is dominated by that node (all paths to it pass through that node),
• both map to the same code line,
• its label is subsumed by the label of that node.

The ARG includes a fixed linearization of the topological order. It gives us the order in which to traverse the graph. In the graph example more than one such order is possible.
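The components listed on the ARG slides can be collected in a small data structure. The sketch below is one possible layout, not the paper's definition: the class and field names are invented here, labels are kept as plain text, and all edge actions except “i=0,x=0;” on (v1, v2) are filled in from the program text as an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class ARG:
    order: list      # nodes in one fixed linearization of the topological order
    action: dict     # edge (u, v) -> instruction or branch condition (as text)
    location: dict   # node -> code line it maps to
    label: dict = field(default_factory=dict)  # node -> formula (as text)

    def successors(self, u):
        """Nodes reachable from u by one edge, in insertion order."""
        return [v for (a, v) in self.action if a == u]

    def respects_order(self):
        """Every edge must go forward in the chosen linearization."""
        pos = {v: k for k, v in enumerate(self.order)}
        return all(pos[u] < pos[v] for (u, v) in self.action)


order = ["v1", "v2", "v3", "v4", "v5", "v6", "v2'", "v3'", "v7", "v8", "v9"]
action = {
    ("v1", "v2"): "i=0,x=0;",
    ("v2", "v3"): "assume i<n",      # assumed text for the loop-entry edge
    ("v2", "v7"): "assume !(i<n)",   # assumed text for the loop-exit edge
    ("v3", "v4"): "assume i<=2",
    ("v3", "v5"): "assume !(i<=2)",
    ("v4", "v6"): "x = 0;",
    ("v5", "v6"): "x = i;",
    ("v6", "v2'"): "i = i + 1;",
    ("v2'", "v3'"): "assume i<n",
    ("v2'", "v7"): "assume !(i<n)",
    ("v7", "v8"): "assume x<0",
    ("v7", "v9"): "assume !(x<0)",
}
location = {v: int(v[1]) for v in order}   # "v2'" and "v2" both map to line 2
arg = ARG(order, action, location, {v: "true" for v in order})
print(arg.successors("v2"), arg.respects_order())
```

Keeping the linearization as an explicit list makes the traversal order a property of the graph rather than of the exploration code.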

Post operator in abstract interpretation

• Post operator:
  • Given:
    • An abstract state u
    • An operation (instruction from code)
    • An abstraction level (such as a set of predicates)
  • Returns: the successor state abstraction.

Definition: Post(u,v) is computed from the abstraction of the state at u and from the instruction from the code on the edge (u,v), interpreted under the abstraction.

• Example
  • Assume you have predicates P1:(i<n) P2:(i<=n)
  • You want to know their values after “i=i+1” (P1', P2') on an abstract edge (u,v)
  • If all we know is that P1 was true before “i=i+1”, we don't know P1', but we know that P2' will be true.
  • If P1 was false, then i>=n held before “i=i+1”, so both P1' and P2' will be false after it.
  • And so on.

Written as formulas, for “i=i+1” on an abstract edge (u,v):
• P1' = if $\lnot P1$ then F else unknown
• P2' = if P1 then T else if $\lnot P1$ then F else unknown
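The two rules for “i=i+1” can be written as a small three-valued function and checked against concrete values. This is a sketch under stated assumptions: the names `post_increment` and `check_sound` and the string encoding of T, F and unknown are choices made here, and the check only covers a small bounded range of i and n.

```python
T, F, U = "T", "F", "unknown"


def post_increment(p1, p2):
    """Abstract effect of i = i + 1 on P1: i<n and P2: i<=n.

    p2 is accepted for symmetry but not used, as in the rules on the slide.
    """
    p1_next = F if p1 == F else U
    p2_next = T if p1 == T else (F if p1 == F else U)
    return p1_next, p2_next


def check_sound(limit=6):
    """Compare the abstract result with every concrete step in a small range."""
    for i in range(-limit, limit):
        for n in range(-limit, limit):
            p1 = T if i < n else F
            p2 = T if i <= n else F
            q1, q2 = post_increment(p1, p2)
            actual1 = T if i + 1 < n else F
            actual2 = T if i + 1 <= n else F
            # A definite abstract answer must agree with the concrete one.
            if q1 not in (actual1, U) or q2 not in (actual2, U):
                return False
    return True


print(post_increment(T, T), post_increment(F, T), check_sound())
```

An answer of unknown is always acceptable; the check only rejects a definite T or F that disagrees with the concrete execution.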

Post operator run example

P1:(i<n) P2:(i<=n)

The transition from v1 to v2 doesn't change the predicates: Post(v1,v2)=true.

The transition from v2 to v3 sets both predicates to true: Post(v2,v3) = $P1 \land P2$.

The transition from v3 to v4 or from v3 to v5 doesn't change the predicates, and neither does the transition from v4 to v6 or from v5 to v6. So their join at v6 is the same.

The transition from v6 to v2' is “i=i+1”, as previously discussed:
• P1' = if $\lnot P1$ then F else unknown
• P2' = if P1 then T else if $\lnot P1$ then F else unknown

Under approximation driven verification:

For UD, the Post operator will always return true, and we will see refinement using interpolants.

An initial node v1 is created and given the label true. v1 has a single successor, v2, which we will continue to explore.

v1 has a single successor, v2, and as previously mentioned the Post operator will return true for it. v2 has two possible successors, v3 and v7; we will continue to explore v3 for now.

In that fashion the exploration continues until it finishes the loop iteration and reaches the beginning of the loop a second time, at node v2'. v2' has two sons: v3', which indicates a second iteration of the loop, and v7, which indicates exiting the loop after one iteration or more.

The label of v3' is subsumed by the label of v3, meaning that the exploration of v3' will not provide new information and its label will be the same as the label of v3. In the graph this is indicated by the black arrow between the two nodes.

After we finish exploring all the paths, the label of the error node is not false. So we want to check:
• whether there is a concrete counterpart to the 2 abstract paths to the error node;
• if it is not reachable, use interpolants to find new labels that capture why those paths are not reachable.
We describe next how this Counter Example Guided Abstraction Refinement (CEGAR) phase is done.

Building a formula for CEGAR

We ignore all nodes and edges irrelevant to the abstract paths to err; the nodes that remain are v1, v2, v3, v4, v5, v6, v2', v7, v8. We add a boolean variable for each node; for convenience it has the name of the node. Intuitively, if the variables of all the nodes on a path are true, then this path is feasible under concrete execution. Next, we add formulas for edges, similar to the way it would be done for Bounded Model Checking.

We use Static Single Assignment (SSA) form. Definition: a program is in SSA form if an assignment to each variable appears at most once in its syntax. Therefore we rename variables for which assignments appear more than once. For example, “x” will be x0 at lines 1 to 3, will become x1 at line 4, x2 at line 5, etc.
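The renaming step can be illustrated on the assignments of the example, taken in textual order. This is a minimal sketch, not the tool's implementation: the function `to_ssa` is invented here, it ignores branches and joins, and it assumes that a variable that is never assigned (such as n) keeps its name.

```python
import re


def to_ssa(assignments):
    """Rename a textual list of (target, expression) assignments.

    Each assignment creates a new version of its target; uses on the
    right-hand side refer to the latest version created so far.
    """
    version = {}
    renamed = []
    for target, expr in assignments:
        # Rewrite the right-hand side before bumping the target's version.
        rhs = re.sub(
            r"[A-Za-z_]\w*",
            lambda m: f"{m.group(0)}{version[m.group(0)]}"
            if m.group(0) in version else m.group(0),
            expr,
        )
        version[target] = version.get(target, -1) + 1
        renamed.append((f"{target}{version[target]}", rhs))
    return renamed


program = [("i", "0"), ("x", "0"), ("x", "0"), ("x", "i"), ("i", "i + 1")]
for lhs, rhs in to_ssa(program):
    print(lhs, "=", rhs)
```

On the example this yields x0 for the initialization, x1 for “x = 0” in the branch, x2 for “x = i”, and i1 for the increment.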

The instruction “i = i + 1;” at line 6 translates to a formula on the edge leaving v6. We use the path formulas to capture error executions in the ARG: the formula for a node says that if the node is reached, then one of its outgoing edges is taken and the node at the end of that edge is reached. To avoid name conflicts, each time a variable appears on the left side of an assignment it receives a new subscript (this is SSA).

For the graph example we receive one such formula for each of the nodes on the paths to v8. Their conjunction, together with the requirement that v8 is reached, is UNSAT.
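The unsatisfiability claim can be illustrated by brute force on a bounded domain. The encoding below is an assumption of this sketch, not the formula from the slides: it covers one loop iteration followed by the error test, uses the SSA names x0 to x3, i0 and i1, adds a boolean `took_then` for the branch, and enumerates values from -2 to 2 only.

```python
from itertools import product


def prefix(n, i0, x0, x1, x2, x3, i1, took_then):
    """One loop iteration in SSA form, up to (not including) the error test."""
    init = i0 == 0 and x0 == 0                      # i=0,x=0;
    enter = i0 < n                                  # loop is entered
    then_b = took_then and i0 <= 2 and x1 == 0 and x3 == x1
    else_b = (not took_then) and i0 > 2 and x2 == i0 and x3 == x2
    step = i1 == i0 + 1                             # i = i + 1;
    leave = not (i1 < n)                            # loop is exited
    return init and enter and (then_b or else_b) and step and leave


domain = range(-2, 3)   # small bounded domain, an assumption of this sketch
assignments = list(product(*[domain] * 7, [True, False]))

# The path without the error test has a model, so it is not trivially UNSAT.
prefix_sat = any(prefix(*m) for m in assignments)
# Adding the error test x3 < 0 (m[5] is x3) leaves no model in this domain.
error_sat = any(prefix(*m) and m[5] < 0 for m in assignments)
print(prefix_sat, error_sat)
```

Within this bound the path up to the error test is satisfiable, while adding x3 < 0 makes it unsatisfiable. A bounded enumeration is only an illustration; the slides rely on a solver for the real check.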

Solving the formula for CEGAR

Definition: an interpolant for a pair of formulas $(A, B)$ whose conjunction is UNSAT is a formula $I$ such that:
1. $A \Rightarrow I$
2. $I \land B$ is UNSAT
3. $I$ is over the intersection of the variables of $A$ and $B$.
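The three conditions can be checked on a toy instance by enumeration. The formulas below are chosen for this sketch and are not the A and B of the slides: A describes the two branches of the loop body, B is the error test, and the candidate interpolant is x >= 0. The check runs over a small bounded domain, so it illustrates the definition rather than proving anything.

```python
from itertools import product


def A(i, x):
    """First part: the value of x after either branch of the loop body."""
    return (i <= 2 and x == 0) or (i > 2 and x == i)


def B(x):
    """Second part: the error test."""
    return x < 0


def I(x):
    """Candidate interpolant; it mentions only x, the variable shared by A and B."""
    return x >= 0


domain = range(-8, 9)   # bounded domain, an assumption of this sketch

# Condition 1: every model of A is a model of I.
a_implies_i = all(I(x) for i, x in product(domain, domain) if A(i, x))
# Condition 2: I and B have no common model.
i_and_b_unsat = not any(I(x) and B(x) for x in domain)
# A formula that is too strong fails condition 1: x == 0 excludes i = 3, x = 3.
too_strong_ok = all(x == 0 for i, x in product(domain, domain) if A(i, x))
print(a_implies_i, i_and_b_unsat, too_strong_ok)
```

Condition 3 holds by construction here, since `I` takes only the shared variable x as a parameter.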

Note: the following slides contain links to implementations of the formulas in iZ3 (for interpolants) and Z3 (for general formulas). Following a link opens the online Z3 or iZ3 tool, and pressing play on the opened site should calculate the solutions.

We have that $A \land B$ is UNSAT, where A and B are the two parts of the path formula marked on the graph. To derive a new label for a node, we can calculate an interpolant for A and B.

To derive a new label for a node we calculate an interpolant for A and B: http://rise4fun.com/iZ3/5b
The result, after transforming to NNF, is a disjunction. Informally it means that execution either reaches one node with one condition on the variables or reaches another node with another condition. The resulting formula needs cleaning to get a label for the node.

Cleaning the formula of CEGAR

We want to extract from the interpolant a formula over x3 for the label. Why x3? If we return to the equations we got the interpolants from, each SSA version of x is relevant for different nodes, and x3 is the version relevant for the node being labeled.

To extract the label, we quantify out all the variables that are out of scope and all the node-variables other than the variable of the node being labeled. To remove the variable of the node itself we set it to true. http://rise4fun.com/Z3/d8km
The result is the new label of the node.

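In the cleaning formula, the quantified set contains both the program variables and the boolean variables we added for the nodes; so far both kinds were written in the same way. The variables of a node are the variables relevant to that node.
Why do we quantify universally over things we want to disappear? For example, when we quantified over the variable of a preceding node, we wanted the invariant that holds at the labeled node regardless of whether the preceding node was reachable or not. So we search for a solution both for when that variable is true (reachable) and for when it is false.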

(From the paper.) The definition of the labels treats the cases k=1 and k=n separately. For any two nodes connected by an edge, the condition relating their labels uses the formula on that edge, as shown previously.

Back to Under approximation driven verification: after cleaning we get a new label for each node. If the label of v3' is no longer subsumed by the label of v3, we continue to explore v3' and iterations 2, 3, etc. of the loop, with the Post operator returning true as the label of each new node. In this case the label of v3' is still subsumed by the label of v3, so the algorithm terminates.

Over approximation driven verification:

Assume we started with the Post operator returning true, and the refinement stage returned labels as described before. We take the predicates it used and recalculate the Post operator as described before.

The statement “i=0,x=0;” sets both predicates to true, and they stay true through the rest of the program.

UFO:

In this paper the authors start with UD and, after CEGAR, continue with the new Post operator they get. Meaning, if the label of v3' were no longer subsumed by the label of v3, they would have continued exploring from v3' with the Post operator for the predicates found by refinement.

Boolean/Cartesian Predicate Abstraction

Boolean Predicate Abstraction: given n predicates, we represent them using boolean vectors with one entry per predicate. We will have $2^n$ possible states for each program counter location.

Cartesian Predicate Abstraction: we represent a cross product of per-predicate values. At each location we store, separately for each predicate, whether it is true or false. If the predicate can be both, we store “unknown”. (Note that “unknown” is now also part of the state.) This is a more compact representation than Boolean, but we lose precision.
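The loss of precision can be seen on two concrete states and the predicates P1:(i<n), P2:(i<=n). The sketch below is illustrative only: the function names, the encoding of a state as a pair (i, n), and the string "unknown" are choices made here.

```python
from itertools import product


def boolean_abstraction(states, predicates):
    """Set of truth-value vectors, one vector per concrete state."""
    return {tuple(p(s) for p in predicates) for s in states}


def cartesian_abstraction(states, predicates):
    """One entry per predicate: True, False, or 'unknown' if both values occur."""
    result = []
    for p in predicates:
        values = {p(s) for s in states}
        result.append(values.pop() if len(values) == 1 else "unknown")
    return tuple(result)


def compatible_vectors(cart):
    """All boolean vectors that a Cartesian abstract state allows."""
    options = [(True, False) if v == "unknown" else (v,) for v in cart]
    return set(product(*options))


predicates = [lambda s: s[0] < s[1],    # P1: i < n
              lambda s: s[0] <= s[1]]   # P2: i <= n
states = [(0, 5), (7, 5)]               # (i, n) pairs: one with i<n, one with i>n

boolean = boolean_abstraction(states, predicates)
cart = cartesian_abstraction(states, predicates)
print(boolean, cart, compatible_vectors(cart))
```

The Boolean abstraction keeps exactly the two vectors that occur, while the Cartesian one stores "unknown" for both predicates and therefore also allows the vectors (True, False) and (False, True), which no state in the set produces.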

Results
• 105 programs in the benchmark
• Compared with Wolverine http://www.cprover.org/wolverine/
• 5 versions of UFO:
  • Pure UD, called ufoNo (Post returns true)
  • With Cartesian Predicate abstraction, called ufoCP
  • With Boolean Predicate abstraction, called ufoBP
  • Pure OD with Cartesian Predicate abstraction, called CP
  • Pure OD with Boolean Predicate abstraction, called BP
• Reports the number of instances solved, both for instances that should verify (#Safe) and for instances where an error should be discovered (#Unsafe).
