Recursive Proofs for Inductive Tree Data-Structures

Xiaokang Qiu, with P. Madhusudan and Andrei Stefanescu • University of Illinois at Urbana-Champaign • POPL 2012, Philadelphia, PA, USA

functional verification of heap-manipulating programs • Expressive logics (separation logics, HOL, matching logic, etc.): expressive, used for proving programs correct • Decidable logics (LISBQ, CSL, STRANDdec, etc.): automatic, used for bug-finding • Our goal: keep expressiveness and preserve automaticity; give up decidability (sound but incomplete)

our strategy • Handle a logic that is very expressive (inevitably undecidable) • Identify a class C of simple and natural proofs such that • many correct programs can be proved using a proof in class C • the class C is effectively searchable (searching thoroughly for a proof in C is efficiently decidable) • In this paper, we apply the above strategy to inductive tree data-structures • Exhibit a procedure that can automatically prove routines on BSTs, red-black trees, AVL trees, binomial heaps, etc., written as imperative programs, fully functionally correct. [Figure: the class C inside the set of all proofs]

motivation for simple proofs • Assisted proof of BST-Search: • (unfold bst(x) and keys(x)) • (view bst and keys as uninterpreted functions, or use unification techniques [Suter et al., 2010])

our contribution • A new recursive extension of FOL, called DRYAD • combines quantifier-free logic with recursive predicates/functions defined on trees • recursive predicates/functions allow stating complex properties of heaps without explicit quantification • A VC-generation algorithm • given a recursive imperative program, with proof annotations in DRYAD (with several key restrictions) • symbolic execution of the program over a footprint structure • unfold recursive definitions to the frontier of the footprint • Check the validity of the generated VC • the scheme of formula abstraction: replace recursive definitions with uninterpreted functions
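To make "recursive predicates/functions defined on trees" concrete, here is a minimal Python sketch of two such definitions, keys and bst. The Node class, its field names, and the exact form of both definitions are assumptions for illustration; they are not DRYAD syntax and not the definitions used in the paper.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """Hypothetical binary tree node; None plays the role of nil."""
    key: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def keys(x):
    """keys(nil) = {} ; keys(x) = {x.key} U keys(x.left) U keys(x.right)."""
    if x is None:
        return frozenset()
    return frozenset({x.key}) | keys(x.left) | keys(x.right)


def bst(x):
    """bst(nil) holds; bst(x) holds if both subtrees are BSTs and
    every key on the left is smaller, every key on the right larger."""
    if x is None:
        return True
    return (bst(x.left) and bst(x.right)
            and all(k < x.key for k in keys(x.left))
            and all(k > x.key for k in keys(x.right)))


good = Node(5, Node(3), Node(8))
bad = Node(5, Node(7), Node(8))
print(sorted(keys(good)), bst(good), bst(bad))
```

Neither definition uses a quantifier over heap locations: the property of the whole tree is stated purely through recursion on the two subtrees, which is the point made on the slide.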

dryad example: AVL-search
bool find(Node t, Int v)
//@requires
//@ensures
{
  if (t = NIL) return false;
  tv := t.value;
  if (v = tv) return true;
  else if (v < tv) {
    w := t.left;
    r := find(w, v); }
  else {
    w := t.right;
    r := find(w, v); }
  return r;
}

pre- and post-conditions • Program functions: loc f(loc v, int z1, …, int zn) • Stringent restrictions: • only one input location parameter (v), which must subtend a tree • when a location is returned, either • old_v and ret_loc point to disjoint trees, or • old_v is “havoc”-ed (nothing is known about v and the locations reachable from v in the post-state) • A pre-condition is of the form tree(v) /\ ψ(v, z1, …, zn) • A post-condition for f(v, z1, …, zn) is of the form either • havoc(old_v) /\ ψ(old_v, old_z1, …, old_zn, ret_loc) or • old_v # ret_loc /\ ψ(old_v, old_z1, …, old_zn, ret_loc) [Figure: cases a) and b), showing v and ret_loc]

programs and basic blocks • We consider annotated imperative programs with recursion only (no while-loops, therefore no loop invariants) • We verify linear blocks of code, called basic blocks (conditionals are replaced with assume statements) [Figure: a program split into basic blocks bb1, bb2, bb3]
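The replacement of conditionals by assume statements can be sketched as a small path enumerator. The program encoding below (strings for simple statements, ("if", cond, then, else) tuples for conditionals) and the function name basic_blocks are hypothetical choices made for this sketch, not the tool's representation.

```python
def basic_blocks(stmts):
    """Return every straight-line path through stmts.

    A statement is either a string or a tuple
    ("if", cond, then_stmts, else_stmts). Each conditional is replaced
    by an assume of the condition (or its negation) on the chosen branch.
    """
    def done(path):
        # A path that already returned cannot be extended.
        return bool(path) and path[-1].startswith("return")

    paths = [[]]
    for s in stmts:
        new_paths = []
        for p in paths:
            if done(p):
                new_paths.append(p)
            elif isinstance(s, tuple) and s[0] == "if":
                _, cond, then_b, else_b = s
                for tail in basic_blocks(then_b):
                    new_paths.append(p + [f"assume ({cond})"] + tail)
                for tail in basic_blocks(else_b):
                    new_paths.append(p + [f"assume (!({cond}))"] + tail)
            else:
                new_paths.append(p + [s])
        paths = new_paths
    return paths


# The find routine from the AVL-search slide, in the encoding above.
find_body = [
    ("if", "t = nil", ["return false"], []),
    "tv := t.value",
    ("if", "v = tv", ["return true"],
     [("if", "v < tv",
       ["w := t.left", "r := find(w, v)"],
       ["w := t.right", "r := find(w, v)"])]),
    "return r",
]

blocks = basic_blocks(find_body)
for b in blocks:
    print("; ".join(b))
```

For find this yields four basic blocks: the nil case, the key-found case, and the two recursive cases. Each would be paired with the pre- and post-condition to form one Hoare triple.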

verification conditions • Each basic block bb gives a Hoare triple • (φpre, bb, φpost) • We track the evolution of the footprint (the portion of the heap touched explicitly by the program) • a footprint = a symbolic heap + a DRYAD formula • the symbolic heap is a graph structure denoting a portion of the concrete heap

symbolic heaps • A symbolic heap SH contains concrete nodes and symbolic nodes; symbolic nodes have no outgoing pointer or data fields. • A concrete heap CH with nodes N corresponds to SH if there is a homomorphism h such that • for every symbolic node s, h(s) is the root of a tree • for every pair of distinct symbolic nodes s and s’, h(s) and h(s’) are the roots of two disjoint trees [Figure: a concrete heap and a corresponding symbolic heap related by h, with concrete nodes C1, C2, symbolic nodes S1, S2, variables x, y, and l/r edges]

crucial property of sym heaps: checking tree-ness of nodes • We can determine that certain nodes subtend trees by checking the symbolic heap. • Lemma: If s is the root of a tree in the underlying graph of SH, then h(s) also subtends a tree in any corresponding concrete CH. [Figure: a symbolic heap with l/r edges, a cnil node, and the nodes that subtend trees marked T]
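The check on the symbolic heap graph can be sketched as a plain reachability walk. The dictionary encoding of the heap and the name subtends_tree are assumptions for this sketch: concrete nodes map to their fields, symbolic nodes have no entry (no outgoing fields), and None stands for nil.

```python
def subtends_tree(heap, root):
    """Return True if the nodes reachable from root form a tree.

    heap maps each concrete node to {field: successor}. A symbolic
    node has no entry in heap, so it is a leaf of the graph.
    """
    seen = set()
    stack = [root]
    while stack:
        n = stack.pop()
        if n is None:          # nil successor
            continue
        if n in seen:          # reached twice: sharing or a cycle
            return False
        seen.add(n)
        stack.extend(heap.get(n, {}).values())
    return True


tree_heap = {"c1": {"l": "s1", "r": "c2"}, "c2": {"l": "s2", "r": None}}
shared_heap = {"c1": {"l": "s1", "r": "s1"}}
cyclic_heap = {"c1": {"l": "c1", "r": None}}

print(subtends_tree(tree_heap, "c1"),
      subtends_tree(shared_heap, "c1"),
      subtends_tree(cyclic_heap, "c1"))
```

The lemma is what makes this purely graph-level check meaningful: since distinct symbolic nodes stand for roots of disjoint trees, a node that is a tree root in the symbolic graph is also one in every corresponding concrete heap.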

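expanding the footprint • Expand the footprint with respect to a symbolic node n • n is not nil • each new symbolic node is different from the others • unfold recursive definitions on n [Figure: the footprint before and after expansion; n gains l and r edges to new symbolic nodes nl and nr]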
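A rough sketch of the expansion step, under several assumptions: the footprint is a pair of a heap dictionary and a list of formula strings, nodes have exactly two pointer fields l and r, and keys* is the only recursive definition being unfolded. The names expand and all_nodes, the node-naming scheme, and the textual form of the facts are all hypothetical.

```python
def all_nodes(heap):
    """Every node mentioned in the heap, concrete or symbolic."""
    nodes = set(heap)
    for fields in heap.values():
        nodes.update(s for s in fields.values() if s is not None)
    return nodes


def expand(heap, facts, n):
    """Expand the footprint (heap, facts) with respect to symbolic node n.

    n becomes concrete with two fresh symbolic children, and the facts
    listed on the slide are recorded as plain strings.
    """
    if n in heap:
        raise ValueError(f"{n} is already concrete")
    existing = all_nodes(heap) | {n}
    nl, nr = f"{n}_l", f"{n}_r"          # hypothetical fresh names

    new_heap = dict(heap)
    new_heap[n] = {"l": nl, "r": nr}

    new_facts = list(facts)
    new_facts.append(f"{n} != nil")                       # n is not nil
    for fresh in (nl, nr):                                # new nodes differ
        for old in sorted(existing):                      # from existing ones
            new_facts.append(f"{fresh} != {old}")
    # Unfold one recursive definition at n (keys* only, as an example).
    new_facts.append(f"keys*({n}) = {{{n}.key}} U keys*({nl}) U keys*({nr})")
    return new_heap, new_facts


heap, facts = expand({}, ["avl*(n0)"], "n0")
print(heap)
print(facts)
```

After the call, recursive definitions are only left uninterpreted at the new frontier (n0_l and n0_r), which is the "unfold to the frontier of the footprint" idea from the contribution slide.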

handling function calls • (assume the function call does not havoc v) • keep φ, with some adaptation • incorporate the post-condition [Figure: the footprint before and after the call, with node m and the returned value r]

footprint evolving
( avl*(t) )
assume (t ≠ nil);
tv := t.value;
assume (tv ≠ v);
assume (v < tv);
w := t.left;
r := find(w, v);
return r;
( avl*(t) /\ keys*(t) = keys*(old_t) /\ h*(t) = h*(old_t) /\ … )
[Figure: the footprint after each statement. Symbolic heap with nodes n0, n1, n2 connected by l and r edges, with t and w pointing into it. Integer variables: v : i1, t.val : i2, tv : i4, ret : i5, r : i6]

formula abstraction • How to check the verification condition • (SH, φVC) ⇒ ψVC? • Procedure: • Drop SH after checking the tree-ness of nodes required by the post-condition, then check φVC ⇒ ψVC • Replace recursive predicates/functions with uninterpreted predicates/functions, obtaining φabs ⇒ ψabs • Check the validity of φabs ⇒ ψabs in the theory combining uninterpreted functions, linear arithmetic and sets/multisets of integers (decidable, NP-complete [Kuncak et al., 2010]) • Soundness: • If φabs ⇒ ψabs is valid, then φVC ⇒ ψVC is also valid. • Incompleteness: • E.g., height(x) = 3 ⇒ size(x) ≤ 7 is valid when interpreted, but invalid when uninterpreted.
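The incompleteness example can be checked by brute force. The sketch below assumes binary trees, with height counted in nodes (the empty tree has height 0) and size as the node count; these conventions are assumptions chosen so that the slide's bound of 7 holds, and the function names are hypothetical.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def shapes(h):
    """All binary tree shapes of height exactly h.

    A tree is None (empty, height 0) or a pair (left, right).
    """
    if h == 0:
        return (None,)
    top = shapes(h - 1)                                   # height h-1
    lower = [t for k in range(h - 1) for t in shapes(k)]  # height < h-1
    result = []
    for a in top:
        for b in top:
            result.append((a, b))
        for b in lower:
            result.append((a, b))
            result.append((b, a))
    return tuple(result)


def height(t):
    return 0 if t is None else 1 + max(height(t[0]), height(t[1]))


def size(t):
    return 0 if t is None else 1 + size(t[0]) + size(t[1])


# Interpreted: every tree of height 3 has at most 7 nodes.
interpreted_valid = all(size(t) <= 7 for t in shapes(3))

# Uninterpreted: height and size are arbitrary functions, so nothing
# forbids an assignment such as the one below, which falsifies the claim.
abstract_model = {"height(x)": 3, "size(x)": 8}
counterexample = (abstract_model["height(x)"] == 3
                  and not abstract_model["size(x)"] <= 7)

print(interpreted_valid, counterexample)
```

Soundness goes the other way: any formula valid for all interpretations of the function symbols is in particular valid for the recursive ones, so a "valid" answer on the abstracted formula can be trusted.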

experiments • We verify several inductive tree data-structures appearing in the classical textbook [CLRS: Cormen et al.] • Sorted list, Binary heap, Treap, AVL tree, Red-black tree, B-tree, and Binomial heap • Annotate each standard operation (insert/delete/rotate/merge) with pre- and post-conditions, specifying complex structural and data properties, e.g., for binomial-heap-merge, we check: • what is returned is still a binomial heap • the set of keys stored is the union of the two inputs • the order of the binomial heap increases by at most 1 • Examine the validity of the VCs in the uninterpreted theory, using a simple decision procedure that employs Z3, a state-of-the-art SMT solver.

experiment results http://cs.uiuc.edu/~madhu/dryad/

related work • Separation Logic + recursive predicates [Chin et al., 2011] • Formulas are quantified, employs Isabelle and Mona, and is less efficient. • Bedrock [Chlipala, 2011] • Mostly automated, requires proof tactics given by the user. • VeriFast [Jacobs & Piessens, 2008] • Partially automated tool that accepts proof tactics from the user.

conclusion • A scheme for finding simple and natural proofs automatically and efficiently for tree data-structures • Future work: • Extend beyond trees, to arbitrary data-structures? • Handling while-loops: Functional programs [Suter et al., 2010] → Imperative recursive programs [this paper] → Imperative while programs • Challenge: Can we build automatic procedures that can verify all data-structure algorithms we hand out to undergraduate CS students?
