Type inference in type-based verification

Dimitrios Vytiniotis, Microsoft Research (dimitris@microsoft.com), May 2010

Software is hard to get right.* Which tools can help programmers write reliable code, and how can we make these tools more practical and effective to use? This talk: making programming language types more practical and effective.
* Toyota recalled 2010 models because of faulty software in the brakes. Upgrade your Prius!

Why invest in types? They are a demonstrably simple technology that can eliminate lots of bugs.
[Figure: number of bugs eliminated versus complexity, for programming language types (this talk), model checking, verification condition generation and constraint solving, model-driven development, and development with proof assistants.]
Other benefits:
• Integrated verification and development
• Early error detection
• Static checks mean fast runtime code
• Types force programmers to think about documentation
• Modular development
• They scale

A brief (hi)story of type expressivity
[Figure: a timeline in three parts. The context: simple types (1970; e.g. inc :: Int -> Int) and Hindley-Milner (ML, Haskell, F#; e.g. map :: (a->b) -> [a] -> [b]). My work on expressive types: GADTs (ICFP 2006, ICFP 2009), first-class polymorphism (ICFP 2006, JFP 2007, ICFP 2008, ML 2009), OutsideIn(X) (TLDI 2010; new JFP submission), type classes, type families. The future (2015): dependent types, …]

A brief (hi)story of type expressivity: keeping types practical
[Figure: the same timeline (the context, my work on expressive types, the future), with the theme "Keeping types practical".]

Types express properties
The same list can be given more and more precise types:
[1,2,3,4] :: { l :: List Int where forall i < length(l), l[i] <= 4 }
[1,2,3,4] :: ListWithLength 4 Int
[1,2,3,4] :: List NONEMPTY Int
[1,2,3,4] :: List Int
[1,2,3,4] :: IntList
[1,2,3,4] :: Object
Hindley-Milner [Hindley, Damas & Milner]: Haskell, ML, F#, also Java, C#, …
Our goal: increase expressivity … but keep the complexity low.

Keeping type annotation cost low
Full type inference (no user annotations at all) is extremely convenient [no type-induced pain]. Full type checking (explicit types everywhere, as in many traditional languages) is the other extreme. Increased expressivity requires more checking.
Hindley-Milner, full type inference:
  map f list = case list of
    nil -> nil
    h:tail -> cons (f h) (map f tail)
  inc x = x+1
Explicit types everywhere:
  map<S,T> (f :: S -> T) (list :: [S]) = case list of
    nil -> nil<T>
    h:tail -> cons<T> (f h) (map<S,T> f tail)
  Int inc(Int x) = x+1
Local inference in C#:
  StringBuilder sb = new StringBuilder(256);
  var sb = new StringBuilder(256);
How do we convince the type checker that programs are well-typed?
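
A minimal Haskell sketch of the two extremes, assuming GHC with the ScopedTypeVariables and TypeApplications extensions. The names myMap, myMapExplicit and incExplicit are hypothetical; Haskell's built-in list syntax ([] and :) stands in for the slide's nil and cons, and the @s @t type applications play the role of map<S,T>.

```haskell
{-# LANGUAGE ScopedTypeVariables, TypeApplications #-}

-- Full type inference: no annotations at all.
-- Hindley-Milner infers myMap :: (a -> b) -> [a] -> [b].
myMap f list = case list of
  []       -> []
  h : rest -> f h : myMap f rest

-- Explicit types everywhere: the signature, the empty list and the
-- recursive call all carry their types, like map<S,T> on the slide.
myMapExplicit :: forall s t. (s -> t) -> [s] -> [t]
myMapExplicit f list = case list of
  []       -> [] :: [t]
  h : rest -> f h : myMapExplicit @s @t f rest

-- Without an annotation, Haskell infers the more general
-- inc :: Num a => a -> a rather than Int -> Int.
inc x = x + 1

incExplicit :: Int -> Int
incExplicit x = x + 1
```

For example, myMap inc [1, 2, 3] evaluates to [2, 3, 4] without a single annotation.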

Keeping types predictable
Hindley-Milner scores perfectly here, with simple, robust, declarative typing rules, such as the rule for application:
  e :: s -> t    u :: s
  ---------------------
        e u :: t
and theorems that connect the typing rules to low-level algorithms, such as this one for inferring the type of e u:
  t <- infer e
  s <- infer u
  α <- fresh
  solve(t = s -> α)
  return α
An unpredictable system produces surprises like these:
  test1 = p                    -- ACCEPTED
  test2 = let f x = x in f p   -- REJECTED
  test1 = … p1 + p2 …          -- ACCEPTED
  test2 = … p2 + p1 …          -- REJECTED
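
The algorithmic reading of the application rule can be sketched as a tiny inference engine. This is a minimal, monomorphic sketch under stated assumptions: a hypothetical expression language with variables, integer literals, lambdas and application only (no let, hence no generalization), and first-order unification with an occurs check standing in for solve. All names are illustrative.

```haskell
-- Types: unification variables, Int, and functions.
data Type = TVar Int | TInt | TFun Type Type
  deriving (Eq, Show)

-- A hypothetical, minimal expression language.
data Expr = Var String | Lit Int | Lam String Expr | App Expr Expr

type Subst = [(Int, Type)]
type Env = [(String, Type)]

-- Apply a substitution, chasing chains of bound variables.
apply :: Subst -> Type -> Type
apply s t@(TVar n) = maybe t (apply s) (lookup n s)
apply s (TFun a b) = TFun (apply s a) (apply s b)
apply _ TInt = TInt

occurs :: Int -> Type -> Bool
occurs n (TVar m) = n == m
occurs n (TFun a b) = occurs n a || occurs n b
occurs _ TInt = False

-- solve(t = u): unify two types, extending the current substitution.
unify :: Type -> Type -> Subst -> Either String Subst
unify t1 t2 s = go (apply s t1) (apply s t2)
  where
    go (TVar n) (TVar m) | n == m = Right s
    go (TVar n) t
      | occurs n t = Left "occurs check failed"
      | otherwise  = Right ((n, t) : s)
    go t (TVar n) = go (TVar n) t
    go TInt TInt = Right s
    go (TFun a1 r1) (TFun a2 r2) = unify a1 a2 s >>= unify r1 r2
    go x y = Left ("cannot unify " ++ show x ++ " with " ++ show y)

-- infer threads the substitution and a fresh-variable counter.
infer :: Env -> Expr -> (Subst, Int) -> Either String (Type, Subst, Int)
infer env (Var x) (s, n) =
  maybe (Left ("unbound variable " ++ x)) (\t -> Right (t, s, n)) (lookup x env)
infer _ (Lit _) (s, n) = Right (TInt, s, n)
infer env (Lam x body) (s, n) = do
  let a = TVar n  -- fresh type for the parameter
  (tb, s1, n1) <- infer ((x, a) : env) body (s, n + 1)
  pure (TFun a tb, s1, n1)
infer env (App e u) (s, n) = do
  (t, s1, n1) <- infer env e (s, n)       -- t <- infer e
  (su, s2, n2) <- infer env u (s1, n1)    -- s <- infer u
  let alpha = TVar n2                     -- alpha <- fresh
  s3 <- unify t (TFun su alpha) s2        -- solve(t = s -> alpha)
  pure (alpha, s3, n2 + 1)                -- return alpha

-- Infer the type of a closed expression.
typeOf :: Expr -> Either String Type
typeOf e = do
  (t, s, _) <- infer [] e ([], 0)
  pure (apply s t)
```

Here typeOf (App (Lam "x" (Var "x")) (Lit 3)) yields Right TInt, while applying a literal, App (Lit 1) (Lit 2), is rejected with a unification error.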

A brief (hi)story of type expressivity Simple, predictable No user annotations Low expressivity GADTs ICFP 2006 ICFP 2009 First-class polymorphism ICFP 2006 JFP 2007 TLDI 2010 Simple Types 1970 Hindley-Milner ML, Haskell, F# OutsideIn(X) ICFP 2008 2015 ML 2009 Type families NEW: JFP submission Type classes • What are GADTs • Why they are difficult for type inference • Inference vs checking [ICFP 2006] • Simplifying and reducing annotations [ICFP 2009] • How to implement GADTs Dependent types … The context My work on expressive types The future

GADTs in the Glasgow Haskell Compiler (GHC)
  -- An Algebraic Datatype: Integer Lists
  data IList where
    Nil :: IList
    Cons :: Int -> IList -> IList
  -- A Generalized Algebraic Datatype (GADT)
  data IList f where
    Nil :: IList EMPTY
    Cons :: Int -> IList f -> IList NONEMPTY
  x = Cons 1 (Cons 2 Nil)   -- the type checker knows x :: IList NONEMPTY
  head :: IList NONEMPTY -> Int
  test0 = head x            -- ACCEPTED
  test1 = head Nil          -- REJECTED!
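
The slide's GADT can be written in today's GHC roughly as follows. This sketch assumes the DataKinds and KindSignatures extensions to make EMPTY and NONEMPTY type-level tags (the slide leaves their definition implicit), and renames head to the hypothetical headI to avoid a clash with the Prelude.

```haskell
{-# LANGUAGE GADTs, DataKinds, KindSignatures #-}

-- Type-level tags recording whether a list is empty.
data Shape = EMPTY | NONEMPTY

data IList (f :: Shape) where
  Nil  :: IList 'EMPTY
  Cons :: Int -> IList f -> IList 'NONEMPTY

-- No Nil equation is needed: Nil cannot have type IList 'NONEMPTY.
headI :: IList 'NONEMPTY -> Int
headI (Cons h _) = h

xs :: IList 'NONEMPTY
xs = Cons 1 (Cons 2 Nil)

test0 :: Int
test0 = headI xs

-- test1 = headI Nil   -- rejected: IList 'EMPTY does not match IList 'NONEMPTY
```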

Uses of GADTs
• The compiler enforces invariants via type checking:
  • tail :: ListWithLength (S n) -> ListWithLength n
  • compile :: Term SOURCE -> Maybe (Term TARGET)
• A significant number of research papers [Cheney & Hinze, Xi, Pottier & Simonet, Pottier & Régis-Gianas, Sulzmann & Stuckey, …]
• Verified compiler transformations, data structure implementations, reflection and generic programming, …
• Such a cool feature that people are using GADT-inspired tricks in other languages! For example, C. Russo and A. Kennedy have a C# encoding.
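
One way to realize the tail invariant in GHC, as a sketch: it assumes DataKinds, adds an element-type parameter a that the slide omits, and uses the hypothetical names LNil, LCons, safeTail and toList.

```haskell
{-# LANGUAGE GADTs, DataKinds, KindSignatures #-}

-- Unary natural numbers, promoted to the type level by DataKinds.
data Nat = Z | S Nat

-- A list whose length n is part of its type.
data ListWithLength (n :: Nat) a where
  LNil  :: ListWithLength 'Z a
  LCons :: a -> ListWithLength n a -> ListWithLength ('S n) a

-- tail only accepts lists of length S n, so the empty case
-- is ruled out by the type checker.
safeTail :: ListWithLength ('S n) a -> ListWithLength n a
safeTail (LCons _ rest) = rest

toList :: ListWithLength n a -> [a]
toList LNil = []
toList (LCons y rest) = y : toList rest
```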

Example: evaluation of an embedded DSL
A common example, also appearing in [Peyton Jones, Vytiniotis, Weirich, Washburn, ICFP 2006].
A non-GADT representation:
  data Term where
    ILit :: Int -> Term
    And :: Term -> Term -> Term
    IsZero :: Term -> Term
    ...
  data Val where
    IVal :: Int -> Val
    BVal :: Bool -> Val
  eval :: Term -> Val
  eval (ILit i) = IVal i
  eval (And t1 t2) = case eval t1 of
    IVal _ -> error
    BVal b1 -> case eval t2 of
      IVal _ -> error
      BVal b2 -> BVal (b1 && b2)
  ...
  f = eval (And (ILit 3) (IsZero (ILit 0)))   -- accepted, but fails at runtime
A GADT representation:
  data Term a where
    ILit :: Int -> Term Int
    And :: Term Bool -> Term Bool -> Term Bool
    IsZero :: Term Int -> Term Bool
    ...
  eval :: Term a -> a
  eval (ILit i) = i
  eval (And t1 t2) = eval t1 && eval t2
  ...
The GADT version represents only correct terms and allows tagless evaluation: efficient code.
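
A runnable version of the GADT representation, as a sketch: BLit and If are hypothetical constructors added so that complete terms can be built and evaluated.

```haskell
{-# LANGUAGE GADTs #-}

data Term a where
  ILit   :: Int -> Term Int
  BLit   :: Bool -> Term Bool                   -- hypothetical
  And    :: Term Bool -> Term Bool -> Term Bool
  IsZero :: Term Int -> Term Bool
  If     :: Term Bool -> Term a -> Term a -> Term a  -- hypothetical

-- Tagless evaluation: no Val wrapper and no runtime tag checks.
eval :: Term a -> a
eval (ILit i)    = i
eval (BLit b)    = b
eval (And t1 t2) = eval t1 && eval t2
eval (IsZero t)  = eval t == 0
eval (If c t e)  = if eval c then eval t else eval e

-- eval (And (ILit 3) (IsZero (ILit 0)))
--   rejected at compile time: ILit 3 :: Term Int, not Term Bool
```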

Type checking and GADTs
Type checking is possible with the help of programmer annotations:
  data Term a where
    ILit :: Int -> Term Int
  eval :: Term a -> a
  eval (ILit i) = i
  eval _ = …
The argument type Term a determines the term we analyze, and the result type a determines what we return. On the right-hand side of the first equation we must return a value of type a, yet i :: Int. That's fine, because pattern matching on ILit tells us that a ~ Int. Pattern matching introduces type equalities, available after the =: in the first branch we learn that a ~ Int.
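
The local equality that pattern matching introduces can be made visible with the standard Data.Type.Equality module. In this sketch, isIntTerm and asInt are hypothetical helpers; the type (:~:) and its constructor Refl come from base.

```haskell
{-# LANGUAGE GADTs, TypeOperators #-}

import Data.Type.Equality ((:~:)(..))

data Term a where
  ILit   :: Int -> Term Int
  IsZero :: Term Int -> Term Bool

-- Matching on ILit brings a ~ Int into scope, so Refl :: a :~: Int
-- type checks on that right-hand side (and only there).
isIntTerm :: Term a -> Maybe (a :~: Int)
isIntTerm (ILit _)   = Just Refl
isIntTerm (IsZero _) = Nothing

-- Matching on Refl makes the same equality available again:
-- v :: a is accepted where an Int is expected.
asInt :: Term a -> a -> Maybe Int
asInt t v = case isIntTerm t of
  Just Refl -> Just v
  Nothing   -> Nothing
```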

Type inference and GADTs
  data Term a where
    ILit :: Int -> Term Int
    ...
  -- Get a list of literals in this term
  getILit (ILit i) = [i]
  getILit _ = []
Haskell programmers often omit type signatures. BAD! One possible type of getILit is Term a -> [Int]. But if the equality a ~ Int is used, there is also another one: Term a -> [a]. Neither type is more general than the other.

A threat for modularity
  btrm :: Term Bool
  f1 = (getILit btrm) ++ [0]      -- works only with Term a -> [Int]
  f2 = (getILit btrm) ++ [True]   -- works only with Term a -> [a]
  test = let getILit (ILit i) = [i]
             getILit _ = []
         in ...                   -- and this one?
getILit has two different "specifications". We want a unique principal type that we infer once and use throughout the scope of the function.
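
The two specifications can each be written down with a signature, as in this sketch; getILitInt, getILitA and the IsZero constructor are hypothetical names chosen so that both versions can coexist in one module.

```haskell
{-# LANGUAGE GADTs #-}

data Term a where
  ILit   :: Int -> Term Int
  IsZero :: Term Int -> Term Bool

-- Unannotated, getILit has two incomparable types; we expect GHC
-- to reject it as ambiguous:
--   getILit (ILit i) = [i]
--   getILit _        = []

-- Specification 1: ignore the equality a ~ Int.
getILitInt :: Term a -> [Int]
getILitInt (ILit i) = [i]
getILitInt _        = []

-- Specification 2: use a ~ Int, so [i] :: [Int] is accepted as [a].
getILitA :: Term a -> [a]
getILitA (ILit i) = [i]
getILitA _        = []

btrm :: Term Bool
btrm = IsZero (ILit 0)

f1 :: [Int]
f1 = getILitInt btrm ++ [0]    -- needs Term a -> [Int]

f2 :: [Bool]
f2 = getILitA btrm ++ [True]   -- needs Term a -> [a]
```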

Separating checking and inference [ICFP 2006]
S. Peyton Jones, D. Vytiniotis, G. Washburn, S. Weirich
• Not all programs have principal types, so use annotations to let programmers decide.
• No annotation: do not use GADT equalities.
    getILit (ILit i) = [i]   -- inferred: Term a -> [Int]
• To use the other type, supply an annotation:
    getILit :: Term a -> [a]
    getILit (ILit i) = [i]
Annotations determine two interweaved modes of operation: checking mode and inference mode.

Discovering a complete implementation
Predictability mandates high-level declarative typing rules, so we needed to design a type system together with a sound and complete algorithm. [ICFP 2006] was the first work on type inference and GADTs to achieve this.
Theorem: There exists a provably decidable, sound and complete algorithm for the [ICFP 2006] type system.
That turned out to be possible because:
• Typing rules [and algorithm] can "switch" mode when they meet annotations
• The GADT checking problem is easy
• All non-GADT branches are typed as in Hindley-Milner
This is what GHC has implemented since 2006. It is extremely effective and popular: http://darcs.net, commercial users, …

[ICFP 2006] was a breakthrough, but …
  opt :: Term b -> Term b
  eval :: Term a -> a        -- typechecks
  eval x = case opt x of
    ILit i -> i
  eval :: Term a -> a        -- fails, because there is no type annotation for f
  eval x = let f x = x in
    case f (opt x) of
      ILit i -> i
To reduce the required annotations, [ICFP 2006] used some ad-hoc annotation propagation. Quite remarkable, BUT what about predictability? How to improve this?

The Outside-In solution
Schrijvers, Sulzmann, Peyton Jones, Vytiniotis [ICFP 2009]: perform full inference outside a GADT branch first, and then use what you learnt to go inside the branch.
  eval :: Term a -> a
  eval x = let f x = x in
    case f (opt x) of
      ILit i -> i
Working on the outside of the branch first determines that f (opt x) :: Term a.
The result: very aggressive type information discovery + a simpler "Outside-In" type system.
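
The example that failed under [ICFP 2006] can be tried in current GHC, which builds on the Outside-In approach; we expect it to be accepted without annotating f. In this sketch, the Add constructor and the constant-folding opt are hypothetical stand-ins for the slide's unspecified optimizer, and the inner binder is renamed to y to avoid shadowing x.

```haskell
{-# LANGUAGE GADTs #-}

data Term a where
  ILit :: Int -> Term Int
  Add  :: Term Int -> Term Int -> Term Int   -- hypothetical

-- Hypothetical optimizer: rewrites 0 + t to t, otherwise the identity.
opt :: Term b -> Term b
opt (Add (ILit 0) t) = opt t
opt t = t

-- f has no annotation. Outside the branches, f (opt x) :: Term a is
-- determined first; inside each branch, a ~ Int is then available.
eval :: Term a -> a
eval x = let f y = y in
  case f (opt x) of
    ILit i    -> i
    Add t1 t2 -> eval t1 + eval t2
```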

Simplifying and reducing annotations [ICFP 2009]
[Figure: sets labelled "All type-safe programs", "All programs with principal types" and "'Outside-In' type system", annotated "Modularity theorem".]
Theorem: There exists a provably decidable, sound and complete algorithm for the "Outside-In" type system in [ICFP 2009].
• Fewer annotations needed
• Predictability
• Forthcoming implementation in GHC; invited paper in a special issue of JFP
• "the system of this paper is the simplest proposal ever made to solve type inference for GADTs" [anonymous reviewer]

Inferring principal types in [ICFP 2009]
  data Term a where
    ILit :: Int -> Term Int
    If :: Term Bool -> Term a -> Term a -> Term a
  -- Get the least number in this term
  findLeast (ILit i) = i
  findLeast (If cond t1 t2) =
    let x1 = findLeast t1
        x2 = findLeast t2
    in if (x1 < x2) then x1 else x2
Because of (x1 < x2), findLeast must return Int. There is a principal type [and ICFP 2009 finds it]: Term a -> Int.
This is not due to arbitrarily choosing Term a -> Int, as [ICFP 2006] previously did for getILit; an unannotated getILit is REJECTED in [ICFP 2009]. No ad-hoc assumptions about programmer intentions.
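
A runnable version of findLeast, as a sketch: IsZero is a hypothetical constructor used only to build conditions. In Haskell, < is overloaded over Ord, so, unlike in the slide's reasoning, it does not by itself force Int; the sketch therefore states the principal type Term a -> Int explicitly.

```haskell
{-# LANGUAGE GADTs #-}

data Term a where
  ILit   :: Int -> Term Int
  IsZero :: Term Int -> Term Bool              -- hypothetical
  If     :: Term Bool -> Term a -> Term a -> Term a

-- Get the least number in this term. As on the slide, only the ILit
-- and If cases are handled, so findLeast is partial.
findLeast :: Term a -> Int
findLeast (ILit i) = i
findLeast (If _cond t1 t2) =
  let x1 = findLeast t1
      x2 = findLeast t2
  in if x1 < x2 then x1 else x2
```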

The algorithm in [ICFP 2009]
The type checker infers a partially known type, findLeast :: Term α -> β, for
  findLeast (ILit i) = i
  findLeast (If cond t1 t2) =
    let x1 = findLeast t1
        x2 = findLeast t2
    in if (x1 < x2) then x1 else x2
GADT branches introduce implication constraints that we must solve:
  (α ~ Int) => (β ~ Int)
Implication constraints may have many solutions, β := Int or β := α, which result in different types. Solving them is constraint abduction [Maher] or (rigid) E-unification [Degtyarev & Voronkov, Veanes, Gallier & Snyder, Gurevich]. Detecting incomparable solutions is only possible in special cases, and the known results about the complexity, or even the decidability, of the general problem are mostly negative. NOT VERY ENCOURAGING.
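
Checking both candidate solutions against the implication on its own shows why neither can be preferred: each one satisfies it, but the resulting types Term α -> Int and Term α -> α are not instances of one another.

```latex
\begin{aligned}
&(\alpha \sim \mathit{Int}) \Rightarrow (\beta \sim \mathit{Int})
  && \text{implication from the } \mathit{ILit} \text{ branch}\\
&\beta := \mathit{Int}:\quad (\alpha \sim \mathit{Int}) \Rightarrow (\mathit{Int} \sim \mathit{Int})
  && \text{holds trivially; type } \mathit{Term}\ \alpha \to \mathit{Int}\\
&\beta := \alpha:\quad (\alpha \sim \mathit{Int}) \Rightarrow (\alpha \sim \mathit{Int})
  && \text{holds by the assumption; type } \mathit{Term}\ \alpha \to \alpha
\end{aligned}
```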

Restricting implications for Outside-In
  findLeast (ILit i) = i
  findLeast (If cond t1 t2) =
    let x1 = findLeast t1
        x2 = findLeast t2
    in if (x1 < x2) then x1 else x2
Constraint A (from the ILit branch), interface [α,β]: (α ~ Int) => (β ~ Int)
Constraint B (from the If branch), interface [α,β]: (β ~ Int)
• Step 1: Introduce special constraints that record the interface of the branch with the outside.
• Step 2: Solve the non-implication constraint (B) first. This is easy, with no multitude of solutions to pick from: β := Int.
• Step 3: Substitute the solution into the implication constraint (A): [α] (α ~ Int) => (Int ~ Int).
• Step 4: Solve the remaining implications, keeping the interface variables fixed.
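
The four steps can be sketched as a toy solver. This is an illustration under strong simplifying assumptions, not the [ICFP 2009] algorithm: types are only variables and constants, implications contain only equalities, alpha plays the role of the rigid a in Term a, and every name is hypothetical. Simple constraints are solved first; implications are then checked using their givens only, without choosing values for interface variables.

```haskell
-- Types: interface variables or constants such as Int.
data Ty = TV String | TC String
  deriving (Eq, Show)

-- Plain equalities, or implications "givens => wanteds" from GADT branches.
data Ct
  = Ty :~ Ty
  | Implies [(Ty, Ty)] [Ct]
  deriving Show

infix 4 :~

type Subst = [(String, Ty)]

applyTy :: Subst -> Ty -> Ty
applyTy s t@(TV v) = maybe t (applyTy s) (lookup v s)
applyTy _ t = t

-- Unification for this flat type language (no occurs check needed).
unify :: Subst -> Ty -> Ty -> Maybe Subst
unify s a b = case (applyTy s a, applyTy s b) of
  (x, y) | x == y -> Just s
  (TV v, y)       -> Just ((v, y) : s)
  (x, TV v)       -> Just ((v, x) : s)
  _               -> Nothing

-- Step 2: solve the non-implication constraints first.
solveSimple :: Subst -> [Ct] -> Maybe Subst
solveSimple s [] = Just s
solveSimple s ((a :~ b) : cs) = unify s a b >>= \s' -> solveSimple s' cs
solveSimple s (_ : cs) = solveSimple s cs

-- Steps 3 and 4: substitute the outer solution into an implication, then
-- check each wanted using the givens only; interface variables stay fixed.
checkImplication :: Subst -> Ct -> Either String ()
checkImplication s (Implies givens wanteds) =
  case foldl addGiven (Just []) givens of
    Nothing    -> Right ()  -- contradictory givens: unreachable branch
    Just local -> mapM_ (checkWanted local) wanteds
  where
    addGiven acc (a, b) = acc >>= \l -> unify l (applyTy s a) (applyTy s b)
    resolve local = applyTy local . applyTy s
    checkWanted local (a :~ b)
      | resolve local a == resolve local b = Right ()
      | otherwise = Left ("ambiguous: cannot fix " ++ show a ++ " ~ " ++ show b)
    checkWanted _ c = Left ("nested implication not supported: " ++ show c)
checkImplication _ _ = Right ()

-- The Outside-In strategy: simple constraints first, then implications.
outsideIn :: [Ct] -> Either String Subst
outsideIn cts = do
  s <- maybe (Left "simple constraints are unsolvable") Right (solveSimple [] cts)
  mapM_ (checkImplication s) [c | c@(Implies _ _) <- cts]
  pure s

alpha, beta, int :: Ty
alpha = TV "alpha"
beta  = TV "beta"
int   = TC "Int"

-- findLeast: constraint B (beta ~ Int) and constraint A ((alpha ~ Int) => (beta ~ Int)).
findLeastCts :: [Ct]
findLeastCts = [beta :~ int, Implies [(alpha, int)] [beta :~ int]]

-- Only the implication: beta is left ambiguous and the solver refuses to guess.
implicationOnly :: [Ct]
implicationOnly = [Implies [(alpha, int)] [beta :~ int]]
```

With findLeastCts the solver returns beta := Int; with implicationOnly it reports an ambiguity instead of picking β := Int or β := α.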

A brief (hi)story of type expressivity
[Figure: the same timeline (the context, my work on expressive types, the future), repeated as a divider before the next part of the talk.]

The Hindley-Milner type system 25 years later
Reminder: Hindley-Milner does not need any annotations at all. How does all of the above affect our "golden standard" of modern type systems?
• We had to add user type annotations to HM to get GADTs
• Yet another reason for annotations is first-class polymorphism [THESIS TOPIC]:
  • QML: Explicit first-class polymorphism for ML [Russo, Vytiniotis, ML 2009]
  • FPH: First-class polymorphism for Haskell [Vytiniotis, Peyton Jones, Weirich, ICFP 2008]
  • Practical type inference for higher-rank types [Peyton Jones, Vytiniotis, Weirich, Shields, JFP 2007], the canonical reference for higher-rank type systems
  • Boxy Types [Vytiniotis, Peyton Jones, Weirich, ICFP 2006]
• … but are we also forced to remove anything?
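
A small illustration of why first-class polymorphism needs annotations, as a sketch assuming GHC's RankNTypes extension; applyToBoth is a hypothetical name. The argument f is used at two different list types, so it must itself be polymorphic, and plain Hindley-Milner inference would instead give f a single monomorphic type and reject the body.

```haskell
{-# LANGUAGE RankNTypes #-}

-- The rank-2 signature is required: f is applied to [Int] and to [Bool].
applyToBoth :: (forall x. [x] -> [x]) -> ([Int], [Bool]) -> ([Int], [Bool])
applyToBoth f (ns, bs) = (f ns, f bs)
```

Any polymorphic list function can be passed, for example applyToBoth reverse or applyToBoth (take 1).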

let generalization in Hindley-Milner
  main = let group x y = [x,y] in (group 0 1, group False False)
group is polymorphic. We can give it the generalized type
  group :: forall a. a -> a -> [a]
or defer the check to the call sites [Pottier, Sulzmann, HM(X)]:
  group :: forall a b. (a ~ b) => a -> b -> [a]
For some extensions [e.g. GHC's celebrated type families] we must allow deferring, because without deferring some definitions are hard to generalize.* … but is it practical to defer?
* trust me
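
Both candidate types for group can be written in Haskell. This is a sketch assuming GHC's GADTs and TypeOperators extensions for the (~) equality constraint; groupGen, groupDeferred and mainPair are hypothetical names (main is reserved for the program entry point).

```haskell
{-# LANGUAGE GADTs, TypeOperators #-}

-- The generalized type.
groupGen :: a -> a -> [a]
groupGen x y = [x, y]

-- The deferred type: a ~ b is checked at each call site.
groupDeferred :: (a ~ b) => a -> b -> [a]
groupDeferred x y = [x, y]

-- The slide's program: a local group used at two different types.
-- group has no free variables, so it is still generalized.
mainPair :: ([Int], [Bool])
mainPair = let group x y = [x, y] in (group 0 1, group False False)
```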

No generalization for let-bound definitions
  f :: a -> Term a -> Int
  f x y = let g b = x + 1 in
          case y of
            ILit i -> g ()   -- here a ~ Int
            ...
This is well-typed only if we defer the equality to the call site of g:
  g :: (a ~ Int) => b -> Int
errk??? The completeness proof reveals a nasty surprise: if the typing rules allow deferring, then the algorithm must not solve any equality [BAD!].
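
A sketch of writing the deferred type down explicitly in today's GHC, assuming ScopedTypeVariables so that the local signature can mention the outer a, and a hypothetical IsZero constructor that gives the case a second branch.

```haskell
{-# LANGUAGE GADTs, ScopedTypeVariables, TypeOperators #-}

data Term a where
  ILit   :: Int -> Term Int
  IsZero :: Term Int -> Term Bool   -- hypothetical

-- The slide's unannotated local g is expected to be rejected by GHC:
--   f x y = let g b = x + 1 in case y of ILit i -> g ()
-- because Num a is needed where a ~ Int is not yet known.

-- With an annotation, the equality is deferred to g's call sites.
f :: forall a. a -> Term a -> Int
f x y =
  let g :: (a ~ Int) => b -> Int
      g _ = x + 1
  in case y of
       ILit _   -> g ()   -- a ~ Int holds here, so the call is accepted
       IsZero _ -> 0
```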

The proposal [TLDI 2010]
D. Vytiniotis, S. Peyton Jones, T. Schrijvers [TLDI 2010]
The only complete algorithms are not practical, so: abandon generalization of local definitions.
• RADICAL: removing a basic ingredient of HM
• But not restrictive in practice: 127 lines affected in 95 Kloc of Haskell libraries (0.13%)!
• No expressivity loss: polymorphism can be recovered with annotations
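
Recovering local polymorphism with an annotation can be sketched as below. We enable MonoLocalBinds, GHC's flag for this restricted generalization of local definitions (it is also implied by GADTs and TypeFamilies); the name pairUp is hypothetical.

```haskell
{-# LANGUAGE ScopedTypeVariables, MonoLocalBinds #-}

-- g mentions the lambda-bound x, so under MonoLocalBinds it would not be
-- generalized: without the signature, g True and g 'c' would clash.
pairUp :: forall a. a -> ((a, Bool), (a, Char))
pairUp x =
  let g :: forall b. b -> (a, b)   -- the annotation recovers polymorphism
      g y = (x, y)
  in (g True, g 'c')
```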

OutsideIn(X) [TLDI 2010, new JFP submission]
Parameterize the "Outside-In" type system and infrastructure [implication constraints] by a constraint theory X and its solver, without losing inference. Do the hard work once.
Many recent extensions exhibit those problems:
• GADTs [previous slides]
• Type classes: sort :: forall a. Ord a => [a] -> [a]
• Type families: append :: forall n m. IList n -> IList m -> IList (Plus n m)
• Units of measure [Kennedy 94], implicit parameters, functional dependencies, impredicative polymorphism, …
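
The type-family example can be made concrete as follows. This is a sketch: it assumes DataKinds, a closed type family Plus over unary naturals, and an IList indexed by its length (the slide leaves both definitions implicit); Nat, Z, S and toList are hypothetical names.

```haskell
{-# LANGUAGE GADTs, DataKinds, KindSignatures, TypeFamilies #-}

data Nat = Z | S Nat

-- An integer list indexed by its length.
data IList (n :: Nat) where
  Nil  :: IList 'Z
  Cons :: Int -> IList n -> IList ('S n)

-- Type-level addition, as a closed type family.
type family Plus (n :: Nat) (m :: Nat) :: Nat where
  Plus 'Z     m = m
  Plus ('S n) m = 'S (Plus n m)

-- The solver must reason with Plus: the first equation needs
-- Plus 'Z m ~ m, the second Plus ('S n) m ~ 'S (Plus n m).
append :: IList n -> IList m -> IList (Plus n m)
append Nil         ys = ys
append (Cons x xs) ys = Cons x (append xs ys)

toList :: IList n -> [Int]
toList Nil         = []
toList (Cons x xs) = x : toList xs
```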

OutsideIn(X): new JFP submission
• A substantial article that brings together the results of a multi-year collaborative research program
• Many people involved over the years: Simon Peyton Jones, Tom Schrijvers (KU Leuven), Martin Sulzmann (Informatik Consulting Systems AG), Manuel Chakravarty (UNSW), Stephanie Weirich (Penn), Geoff Washburn (LogicBlox), …
• Bonus: a new glorious constraint solver to instantiate X, which improves on previous work and, for the first time, shows how to deal with all of GHC's tricky features

A brief (hi)story of type expressivity
[Figure: the same timeline (the context, my work on expressive types, the future), repeated as a divider before the concluding part of the talk.]

What we learned
We now know about:
• Local assumptions [ICFP 2006, ICFP 2009, TLDI 2010]
• Local definitions [TLDI 2010]
• Generalizing Outside-In with OutsideIn(X) [TLDI 2010]
Where to from here?

2015 (and ideas for collaborations!)
Towards practical pluggable type systems + inference!
[Figure: a DSL designer provides a theory and a solver for it; a DSL user imports the theory into a program; the compiler ("we") runs type checking/inference with OutsideIn(UnitTheory) and answers Yes/No, for programs with principal types.]
The DSL designer writes a theory of units of measure [Kennedy, ESOP94] and a solver for UnitTheory constraints:
  UnitTheory.thy
    constant kg, hp, sec, m
    axiom u*1 = u
    axiom u*v = v*u
    axiom …
The DSL user writes:
  import UnitTheory.thy
  data Vehicle = Vehicle { weight :: Int[kg]
                         , power :: Int[hp]
                         , ... }
  ...
Open problems:
• How to design syntactic language extensions
• How to trust the solver [proof checking, certificates?]
• How to type more programs with principal types [revisiting rigid E-unification, better constraint solvers, ideas from SMT solving]
• How to combine multiple theories and solvers [revisiting Nelson-Oppen]

Understanding and writing better software
• Past: What do GADTs mean? How many functions have type forall a. [a] -> a -> a, or forall a. Term a -> a -> a? [Vytiniotis & Weirich, MFPS XXIII; Vytiniotis & Weirich, JFP 2010]
• Past: PL proofs are tedious and error-prone, so mechanize them in proof assistants: the POPLMark Challenge [TPHOLS 2005]. I have been using Isabelle/HOL and Coq in recent work with Claudio Russo and Andrew Kennedy.
• Ongoing: Typed intermediate languages that better support type equalities and full-blown dependent types [with S. Weirich, S. Zdancewic, S. Peyton Jones]
• Ongoing: Adding probabilities to contracts, to combine static analysis with testing or statistical methods [with V. Koutavas, TCD]
• On the wish list: Macroscopically programming groups of agents of limited computational power

Q-A games for encoding and decoding
Imagine a binary format such that every bitstring encodes a non-empty set of type-safe CIL programs. Not easy to program from first principles!
[Figure: a tree of yes/no questions.]
• Instead, understand and program encoders using question-answer games
• A good coding scheme follows by asking good questions!
• Recent ICFP 2010 submission with A. Kennedy
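
A toy illustration of the question-answer idea, not the encoding from the ICFP 2010 submission: a game either knows the answer or asks a yes/no question, encoding records the answers, and decoding replays them. The Game type, encode, decode and rangeGame are hypothetical names; the example game asks "is it at most mid?" over an integer range, so every sufficiently long bitstring decodes to some valid value.

```haskell
-- A question-answer game: either we know the value, or we ask a
-- yes/no question and continue with the matching sub-game.
data Game a = Done a | Ask (a -> Bool) (Game a) (Game a)

-- Encoding a value: record the answer to each question asked.
encode :: Game a -> a -> [Bool]
encode (Done _) _ = []
encode (Ask q yes no) x =
  let b = q x in b : encode (if b then yes else no) x

-- Decoding: replay the answers; leftover bits are returned.
decode :: Game a -> [Bool] -> Maybe (a, [Bool])
decode (Done x) bs = Just (x, bs)
decode (Ask _ yes no) (b : bs) = decode (if b then yes else no) bs
decode (Ask _ _ _) [] = Nothing

-- A game for integers in [lo, hi]: each question halves the range,
-- and every leaf is a valid value.
rangeGame :: Int -> Int -> Game Int
rangeGame lo hi
  | lo >= hi  = Done lo
  | otherwise = Ask (<= mid) (rangeGame lo mid) (rangeGame (mid + 1) hi)
  where mid = (lo + hi) `div` 2
```

Good questions make good codes: halving the range keeps every answer sequence short and meaningful.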

Making good software easier to write
Programming language types are a demonstrably simple technology that can already eliminate lots of bugs at low complexity.
This talk: solving research problems to make types more effective and practical, so that they:
• Catch more bugs
• Require little user guidance
• Remain predictable and modular
