Towards a language design for modular software verification

Aleks Nanevski, Microsoft Research, Cambridge. Joint work with Greg Morrisett (Harvard), Lars Birkedal (ITU Copenhagen), Amal Ahmed (TTI-Chicago). Workshop on Effects and Type Theory, Tallinn, December 13, 2007.

How to design a programming language from scratch with verification in mind? • Simple types have been very successful in preventing a class of programming errors. • But many errors are outside of their reach. • index-out-of-bounds • division-by-zero • invariants on mutable state, or almost anything involving effects • Can a language enforce these deeper properties? • While supporting usual features from programming practice. • Be conservative over simply-typed languages.

Two foundational approaches to program specification and verification • Hoare Logic • starts with an existing language • usually imperative, untyped, first-order • recent extensions to simply-typed functional languages [Honda’05],[Krishnaswami’06],[Birkedal’05] • Dependent type theory • targets pure higher-order lambda calculus • types may capture deep semantic properties of data • integer is even, list has 5 elements, etc. • I want to argue that we essentially want a combination of both.

What limitations of simple types to address? • Simple types cannot specify effects. • These operations are naturally partial, but here they must be “completed”: • perform run-time check • possibly raise exception • Simple types do not capture this partiality.

How to specify effect behavior? • Type-and-effect systems: refine the type with the effect annotation.

Semantic disconnect in type-and-effect systems • The following term would be labeled as throwing DivByZero in most type-and-effect systems. • Also, execution of div x n will repeat the check for n>0, even if it doesn’t need to. • Also, how to specify dynamically generated exceptions? • this immediately requires dependent types

How to reconnect type-and-effects with semantics? • Idea: draw effect annotations from logic. • y > 0 is a precondition that must be proved before running div x y. • we will also require postconditions, like in Hoare logic • and proofs • Important: Pre/post-conditions become embedded in types.

Why embed specifications into types? • Captures partiality • e.g., no need to define div x y in case y ≤ 0. • hence, strictly more expressive than Hoare Logic • Enables trade-offs between proving and efficiency • I.e. we can immediately define: • Uniform abstraction over terms, types, specs. • essential for information hiding and scalability • essential for higher-order and local state
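The trade-off between proving and efficiency can be sketched in Python. This is only an illustration under stated assumptions: Python has no static proofs, so an `assert` stands in for the proof obligation y > 0, and the names `div_checked` and `DivByZero` are hypothetical, not taken from the talk.

```python
class DivByZero(Exception):
    """Raised by the checked variant when the divisor is not positive."""


def div(x: int, y: int) -> int:
    # Precondition (the caller's proof obligation): y > 0.
    # The assert only stands in for a static proof of that fact.
    assert y > 0, "precondition y > 0 violated"
    return x // y


def div_checked(x: int, y: int) -> int:
    # Total variant: pays for one run-time test, then calls div in a
    # context where the precondition is known to hold.
    if y > 0:
        return div(x, y)
    raise DivByZero(f"divisor {y} is not positive")
```

A caller who can establish y > 0 uses `div` directly and avoids the repeated check; a caller who cannot falls back on `div_checked`.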

Which logic to use for specifications? • It should be able to support all kinds of programming features: • practical data structures (e.g., hash-tables). • higher-order functions, polymorphism. • pointers, aliasing, state ownership • recursion, callcc, IO, concurrency. • Thus, the logic had better be very expressive. • Type theory (like Coq) seems perfect. • But need to reconcile it with effects.

Hoare Type Theory (HTT) • Introduce a type corresponding to specs in Hoare Logic (for partial correctness). • Hoare type ST{P}x:A{Q} stands for • stateful programs with • precondition P • postcondition Q • result type A • The simply-typed fragment is (almost) core Haskell.

Hoare Type Theory (cont’d) • Fruitful combination of some fundamental PL ideas: • Dijkstra’s predicate transformer. • Curry-Howard isomorphism. • Monads (as in Haskell). • Separation Logic of Reynolds, O’Hearn, et al. • Provably compositional: • components can be specified and checked in isolation. • Prototype under construction as extension of Coq. • Execution by code extraction.

Dependent types and effects

Type theories are unsound if effects are added naively • Propositions like (10 < 0) are types. • Effectful programs can often be given any type (the "awkward squad" from Haskell): • divergence via infinite recursion • exceptions • mutable state • IO • concurrency • An effectful program can prove that (10 < 0)! • Hence, the system is inconsistent.

A solution: Monads • Like in Haskell, distinguish purity with types • pure fragment – the underlying type theory • e : nat • e is an integer value • e : ST nat • e is a delayed effectful computation. • when executed, it may change the state and diverge. • but since it is delayed, it is actually considered pure. • hence, can safely appear in types, predicates, proofs. • e : ST (10 < 0) • a computation which must diverge when executed.
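The idea of a delayed computation can be sketched in Python as a function from heaps to (result, heap) pairs. This is an assumption-laden model, not HTT itself: heaps are dictionaries, divergence is not modeled, and the helper names `unit`, `bind`, `read`, `write` are illustrative.

```python
from typing import Any, Callable, Dict, Tuple

Heap = Dict[int, Any]
# A delayed computation: nothing happens until it is applied to a heap.
ST = Callable[[Heap], Tuple[Any, Heap]]


def unit(v: Any) -> ST:
    # Returns v and leaves the heap unchanged.
    return lambda h: (v, h)


def bind(c: ST, k: Callable[[Any], ST]) -> ST:
    # Runs c, feeds its result to k, and runs the computation k returns.
    def run(h: Heap) -> Tuple[Any, Heap]:
        v, h1 = c(h)
        return k(v)(h1)
    return run


def read(x: int) -> ST:
    return lambda h: (h[x], h)


def write(x: int, v: Any) -> ST:
    # Builds a new heap instead of mutating the old one.
    return lambda h: (None, {**h, x: v})


# Building prog performs no effect; it is an ordinary pure value.
prog = bind(read(0), lambda xv: bind(write(0, xv + 1), lambda _: unit(xv)))
heap = {0: 5}
result, final = prog(heap)  # effects happen only here
```

Because `prog` is just a value until it is applied, it can be passed around, stored, or mentioned in a specification without running anything.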

Refine the monad with pre/post-condition to capture effectful behavior and partiality • Hoare type is a dependent (or indexed) monad. • Formation rule • ST{P}x:A{Q} : Type if • P : heap → Prop • A : Type • x:A |- Q : heap → heap → Prop, where heap = loc → option(Σ a:Type. a), and loc = nat. • Note: postcondition is binary relation on heaps. Variant of VDM notation.

Example: specify function that increments location contents and returns old value • where ptsto x v h (the points-to predicate, in the notation of the Coq development) is true if x points to v:A in h. • Note: before running inc x, must prove that x stores a nat. • because x may store a value of some other type. • because x may be a dangling pointer.

Implementation of inc in Haskell-style do-notation. • HTT implementation typechecks inc as follows: • Compute P, Q = weakest pre/strongest post for the do-block • Then emit obligation to prove the consequence:
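As a rough executable analogue, inc and a candidate spec can be written in Python. The program shape (read x, write x+1, return the old value) follows the tightest spec shown later in the talk; everything else is an assumption for illustration: heaps are dictionaries, "nat" is approximated by a non-negative `int`, and `run_checked` tests the pre/postcondition dynamically where HTT would discharge them as proof obligations.

```python
from typing import Any, Callable, Dict, Tuple

Heap = Dict[int, Any]  # loc = nat; a missing key models an unallocated location


def update(x: int, v: Any, h: Heap) -> Heap:
    new = dict(h)
    new[x] = v
    return new


def inc_pre(x: int) -> Callable[[Heap], bool]:
    # exists v : nat, ptsto x v i  (x is allocated and stores a nat)
    return lambda i: x in i and isinstance(i[x], int) and i[x] >= 0


def inc_post(x: int) -> Callable[[Any, Heap, Heap], bool]:
    # Binary postcondition: relates result r, initial heap i, final heap m.
    return lambda r, i, m: r == i[x] and m == update(x, r + 1, i)


def inc(x: int) -> Callable[[Heap], Tuple[int, Heap]]:
    def run(i: Heap) -> Tuple[int, Heap]:
        xv = i[x]                 # xv <- read x
        m = update(x, xv + 1, i)  # write x (xv + 1)
        return xv, m              # return xv
    return run


def run_checked(pre, post, cmd, i: Heap):
    # Dynamic stand-in for the static obligations of the Hoare type.
    if not pre(i):
        raise ValueError("precondition does not hold")
    r, m = cmd(i)
    assert post(r, i, m), "postcondition does not hold"
    return r, m
```

Running `run_checked(inc_pre(1), inc_post(1), inc(1), {1: 41, 2: "y"})` returns the old contents 41 and a heap in which location 1 holds 42; on a heap where location 1 is missing or holds a string, the precondition check rejects the call.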

Typing of primitive commands designed to compute weakest pre and strongest post • Memory read • (Strong) Memory update

Typing of primitive commands designed to compute weakest pre and strongest post • Memory allocation • Memory deallocation

Fixpoints are a little bit different… • Pre/posts must be given explicitly (for now) • Corresponds to giving loop invariants in Hoare Logic • But should be possible to write a rule that infers the strongest invariant! Future work.

Monadic primitives (unit) • Roughly, corresponds to Hoare Logic rule of variable assignment.

Monadic primitives (bind) • Rule of sequential composition (but higher-order) • Note: quantification over pre/posts and heaps is essential for obtaining tightest specs.
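The way bind composes specs can be sketched over a small finite universe in Python. The shapes of `bind_pre` and `bind_post` are read off the computed spec for inc shown later in the talk; the finite `values` and `heaps` lists replace the logical quantifiers and are purely an assumption of this sketch, as are the concrete read/write specs used to exercise it.

```python
def bind_pre(p1, q1, p2, values, heaps):
    # p1 i /\ forall x m, q1 x i m -> p2 x m
    def pre(i):
        return p1(i) and all(
            p2(x, m) for x in values for m in heaps if q1(x, i, m)
        )
    return pre


def bind_post(p1, q1, p2, q2, values, heaps):
    # exists x h, q1 x i h /\ q2 x y h m
    # (p1 and p2 are unused; they are kept to mirror the Coq argument order)
    def post(y, i, m):
        return any(
            q1(x, i, h) and q2(x, y, h, m) for x in values for h in heaps
        )
    return post


# Finite universe: one location 0 holding 0..4, plus the empty heap.
values = list(range(5))
heaps = [{}] + [{0: n} for n in range(5)]

# Spec of "read 0": location allocated; heap unchanged, result is contents.
p1 = lambda i: 0 in i
q1 = lambda y, i, m: m == i and i.get(0) == y

# Spec of "write 0 (xv + 1)", parameterised by the value xv that was read.
p2 = lambda xv, i: 0 in i
q2 = lambda xv, y, i, m: y is None and m == {**i, 0: xv + 1}

pre = bind_pre(p1, q1, p2, values, heaps)
post = bind_post(p1, q1, p2, q2, values, heaps)
```

Here `pre` holds exactly of the heaps in which location 0 is allocated, and `post(None, {0: 1}, {0: 2})` holds because the intermediate value 1 and intermediate heap `{0: 1}` witness the existential.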

Monadic primitives (Haskell-style do) • Rule of consequence • Interesting fact: “do” is not an ordinary coercion • it is an introduction form for Hoare type • bind is the corresponding elimination form

Example: counter • Allocate a private location x • Export function that increments x • Executing f ← counter; x0 ← f; x1 ← f; x2 ← f will bind 0,1,2 to x0,x1,x2, respectively. • What is the spec for counter?
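The behavior of counter, though not its HTT spec, can be sketched with a Python closure. This is an illustrative assumption: the one-element list `cell` plays the role of the private location x, and nothing outside the closure can reach it.

```python
def counter():
    cell = [0]  # private location x, initialised to 0

    def f():
        old = cell[0]      # read x
        cell[0] = old + 1  # write x (old + 1)
        return old         # return the old contents

    return f               # only the increment function is exported


f = counter()
x0, x1, x2 = f(), f(), f()
```

Each call to `counter()` allocates a fresh cell, so two counters never interfere; the specification problem discussed next is how to state this without naming the hidden location.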

A specification with nested Hoare types • Problem: x is out of scope in return type.

Hide private state by existential abstraction • Introduce invariant into code to hide how count is kept. • Another problem: • fst(f) 0 h states (x ↦ 0) h, but we lost connection with i • We will need Separation Logic to handle this.

Proving program correctness in HTT

Weakest pre and strongest post precisely capture the semantics of a program. • Problem: these may not be easy to read! • Remember the example 3-line program:

Here is the computed tightest spec for inc, in Coq syntax.

inc : forall x : loc, ST
  (fun i : heap =>
     (fun i0 : heap => exists v : nat, ptsto x v i0) i /\
     (forall (x0 : nat) (m : heap),
        (fun (y : nat) (i0 m0 : heap) => m0 = i0 /\ ptsto x y i0) x0 i m ->
        (fun (xv : nat) (i0 : heap) =>
           (fun i1 : heap => exists B : Type, exists w : B, ptsto x w i1) i0 /\
           (forall (x1 : unit) (m0 : heap),
              (fun (_ : unit) (i1 m1 : heap) => m1 = update x (xv + 1) i1) x1 i0 m0 ->
              (fun (_ : unit) (_ : heap) => True) x1 m0)) x0 m))
  (fun (y : nat) (i m : heap) =>
     exists x0 : nat, exists h : heap,
       (fun (y0 : nat) (i0 m0 : heap) => m0 = i0 /\ ptsto x y0 i0) x0 i h /\
       (fun (xv y0 : nat) (i0 m0 : heap) =>
          exists x1 : unit, exists h0 : heap,
            (fun (_ : unit) (i1 m1 : heap) => m1 = update x (xv + 1) i1) x1 i0 h0 /\
            (fun (_ : unit) (r : nat) (i1 f : heap) => r = xv /\ f = i1) x1 y0 h0 m0) x0 y h m)

Luckily, the spec has a lot of structure! • It literally represents the program as a predicate. • We apply the proving strategy from Hoare Logic: • symbolically evaluate the program, one step at a time. • at each step, discharge the verification condition that enables the next evaluation step. • With a twist: Evaluation/VC-generation can be implemented as a set of lemmas. • proving the lemmas verifies the VC-gen implementation.

Example lemma for symbolic evaluation (in Coq syntax) • If program starts with a read from location x: • first prove that x is initialized (ptsto x v i) • then proceed to prove the spec of the continuation. • Other lemmas similar (evals_bind_write, evals_bind_new…) • Applicable lemma can be determined by a tactic.

Lemma evals_bind_read :
  forall (A B : Type) (x : loc) (v : A)
         (p2 : A -> heap -> Prop)
         (q2 : A -> B -> heap -> heap -> Prop)
         (i : heap) (q : B -> heap -> Prop),
    ptsto x v i ->
    (p2 v i /\ forall y m, q2 v y i m -> q y m) ->
    (bind_pre (read_pre A x) (read_post A x) p2 i) /\
    forall y m, (bind_post (read_pre A x) (read_post A x) p2 q2 y i m) -> q y m.

Separation Logic

Large footprints in Hoare Logic • Let inc: • Q: What is known after inc runs in a heap with locations x and y? • A: Only that x ↦ v+1, but all info about y is lost. • Spec should explicitly say that y is not changed. • possible to write in ST, but quite inconvenient

Small footprints and Separation Logic • Specs should only describe what the program changes [O’Hearn,Reynolds,Pym,…] • If e : STsep{P}x:A{Q}, then e can run in • any heap containing a subheap i such that P i • diverges, or returns subheap m such that Q i m • part of initial heap outside i is not accessible. • Easier to use than large footprints, but more difficult meta theory.

Separation logic adds two new things: • Separating conjunction (easily definable in HTT): (P * Q) holds of heap h iff P and Q hold of disjoint parts of h • Frame rule of inference (in its standard Separation Logic form): if {P} e {Q} then {P * R} e {Q * R} • Can we add Frame rule to HTT? How to prove that Frame is sound?
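The definition of separating conjunction can be made concrete in Python by enumerating every way of splitting a finite heap into two disjoint parts. This is a sketch under assumptions: heaps are dictionaries, predicates are Boolean functions on heaps, and `splits`, `sep`, `points_to` and `emp` are illustrative names rather than HTT definitions.

```python
from itertools import combinations


def splits(h):
    # Yields every pair (h1, h2) of disjoint heaps whose union is h.
    locs = list(h)
    for k in range(len(locs) + 1):
        for left in combinations(locs, k):
            h1 = {l: h[l] for l in left}
            h2 = {l: h[l] for l in locs if l not in left}
            yield h1, h2


def sep(P, Q):
    # (P * Q) h  iff  P and Q hold of disjoint parts of h.
    return lambda h: any(P(h1) and Q(h2) for h1, h2 in splits(h))


def points_to(x, v):
    # Holds of exactly the singleton heap in which x stores v.
    return lambda h: h == {x: v}


def emp(h):
    # Holds of the empty heap only.
    return h == {}
```

For example, `sep(points_to(1, 5), points_to(2, 7))` holds of `{1: 5, 2: 7}`, while `sep(points_to(1, 5), points_to(1, 5))` holds of no heap, since one cell cannot be split between both conjuncts.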

Employ a type-theoretic idea to expedite… • Impose that well-typed programs must satisfy Frame! • Define new monad STsep, over ST: • Then re-type the stateful commands, using rule of consequence.

Programs remain the same, but specs become much simpler • Example: allocation • empty subheap is consumed and replaced by r ↦ v • r must be fresh (as new can’t access existing state) • Example: deallocation • subheap x ↦ - is consumed and replaced by empty. • Analogy with linear logic.
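The small-footprint reading of allocation and deallocation can be illustrated in Python on dictionary heaps. The freshness strategy (one past the largest allocated location) and the names `alloc` and `dealloc` are assumptions of this sketch, not the primitives of HTT.

```python
def alloc(v):
    def run(h):
        r = max(h, default=-1) + 1  # fresh: r is not a key of h
        m = dict(h)
        m[r] = v                    # the empty subheap becomes r |-> v
        return r, m
    return run


def dealloc(x):
    def run(h):
        if x not in h:
            raise KeyError(f"location {x} is not allocated")
        m = dict(h)
        del m[x]                    # the subheap x |-> - becomes empty
        return None, m
    return run


start = {0: 1, 3: 2}
r, after_alloc = alloc(9)(start)
_, after_free = dealloc(3)(after_alloc)
```

In both cases the rest of the heap (the frame) is carried through unchanged, which is exactly what the small specs leave unsaid.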

STsep monad correctly handles private state • Now (fst f) 0 replaces empty from the precondition. • Meaning: initial heap is extended with x ↦ 0

Meta-theoretic properties: soundness, compositionality, equations

Verification in HTT reduces to typechecking • Theorem: If e:ST{P}r:A{Q}, then e evaluates as expected. • Proved via Preservation and Progress lemmas. • but much more demanding! • Preservation: evaluation preserves types, normal forms, and postconditions. • e.g: if e:ST{T}r:int{r = 55} then e does produce 55. • Progress demands soundness of assertion logic • Requires a denotational model for HTT.

Type checking is syntax directed • Program properties independent of context. • No need for whole program reasoning. • Proofs by induction on program structure. • Program is a proof of its spec: • in the pure case, by Curry-Howard. • in the impure case, by weakest pre/strongest post. • Formal statements of compositionality • In the pure case, substitution principles. • In the impure case, Hoare’s rule of composition.

Denotational models • Denotation for e : ST{P}x:A{Q x} is a predicate transformer: • takes p : heap → Prop such that ∀h. p h → P h • returns q : A → heap → Prop such that ∀x h. q x h → ∃i. p i ∧ Q x i h • is monotone • Model suffices for soundness, but too large • e.g., does not support storing monads into heaps • also, requires showing monotonicity before taking fix. • Better, realizability model [Petersen,Birkedal’08]. • But not implemented in Coq, and this seems very hard to do!

Implementation, related work, future work, summary

Summary • HTT reflects effect information into types via Hoare-style pre/post conditions. • Generalization of monadic type-and-effect systems, but effect annotations are logical predicates over heaps. • Types determine in which context a program may be used (in a context satisfying the precondition). • This is a uniquely type-theoretic property, generalizing ordinary Hoare Logics. • Combines usefully with higher-order features of a type theory like Coq, to represent modes of use of state, like: • freshness, aliasing, ownership (via Separation Logic) • higher-order and shared local state (via existential abstraction).

Related work • Extended static checking: • ESC/Java, JML, Spec#, SPlint, Cyclone, Sage • Hoare-like annotations verified during typechecking. • Restrictive strategies for dealing with undecidability • Dependent types and effects • [Augustsson’98],[Mandelbaum’03],[Zhu,Xi’05],[Shao’05], [Sheard’05],[Westbrook’06],[Taha’07],[Condit’07]. • Programs and specs cannot share pure code (phase separation) • Hoare Logics for higher-order functions: • [Schröder’02],[Honda’05],[Krishnaswami’06],[Birkedal’04] • Simply-typed underlying languages (with effects) • Hoare triples do not integrate into types.

HTT in comparison to related work. [Figure: chart with axes "Programming features" and "Spec expressiveness". Labels: Java, C#, Haskell, O’Caml; Hoare specs (ESC, JML, Spec#, Cyclone); Light dependent types (Cayenne, DML, ATS, Omega); Typed lambda calculus; Dependent type theory (Coq, Epigram, NuPRL…); HTT; Fully verified software.]

Future work: gain more experience with implementation in Coq • A lot of scaffolding for verification is in place • symbolic evaluation lemmas • tactics for Separation Logic reasoning (were tricky to nail down at first; several wrong starts) • Getting ready to attack larger programs. • Probably start with libraries for imperative data structures. • Largest so far: Hash-table module, Stack module, Parsing combinators. • Experience encouraging: • proofs/code ratio quite large • but proofs were not difficult

Future work: other effects • First attempts at formulating Haskell-style monad for transactional concurrency. • Separate state into private and shared • Reasoning like O’Hearn’s concurrent separation logic • Hoare type is a 4-tuple STM{I}{P}x:A{Q} • I – invariant of shared state • Other notions of concurrency? Auxiliary variables, history/prophecy variables? Predicate transformers for concurrency? • IO monad? • Specifications must be limited to statements that are invariant against outside changes to the world. • Continuation monad? (first attempts made)

Future work: better models and axiomatizations • Can we encode equality over effectful code as some reasonable judgment? • Without having to implement involved categorical models.

Hopefully in future not too far, far away… [Figure: chart with axes "Programming features" and "Spec expressiveness". Labels: Java, C#, Haskell, O’Caml; Hoare specs (ESC, JML, Spec#, Cyclone); Light dependent types (Cayenne, DML, ATS, Omega); Typed lambda calculus; Dependent type theory (Coq, Epigram, NuPRL…); HTT; Fully verified software.]
