Type Inference in Programming Languages

Type Inference September 26, 2006

Type Inference
• Type inference is typically presented in two different forms:
  • Type inference rules: rules that define the type of each expression. They are clean and concise, and are needed to study the semantic properties (e.g., soundness) of the type system.
  • Type inference algorithm: needed by the compiler writer to deduce the type of each subexpression, or to deduce that the expression is ill typed.
• It is often nontrivial to derive an inference algorithm for a given set of rules, and there can be many different algorithms for one set of typing rules.
http://www.csg.csail.mit.edu/6.827

First ... A Simple Type System (no polymorphism)

λ-calculus with Constants & Letrec
There are no types in the syntax of the language!
• Expressions
  E ::= x | λx.E | E E
      | Cond (E, E, E)
      | PFk(E1,...,Ek)
      | CN0
      | CNk(E1,...,Ek)
      | CNk(SE1,...,SEk)      (not in initial terms)
      | let S in E
  PF1 ::= negate | not | ... | Prj1 | Prj2 | ...
  PF2 ::= + | ...
  CN0 ::= Number | Boolean
  CN2 ::= cons | ...
• Statements
  S ::= ε | x = E | S; S

A Simple Type System
• Types
  τ ::= ι            base types (int, bool, ...)
      | t            type variables
      | τ1 -> τ2     function types
• Type Environments
  TE ::= Identifiers → Types
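The type grammar above can be written down as data. The following is a minimal sketch, assuming a hypothetical encoding of types as tagged tuples; the helper names con, var, fun and show are illustrative and not part of the lecture.

```python
# Hypothetical encoding (an assumption of this sketch): each type is a tagged tuple.
def con(name):
    return ("con", name)       # base type, e.g. int or bool

def var(name):
    return ("var", name)       # type variable

def fun(a, b):
    return ("fun", a, b)       # function type a -> b

def show(t):
    """Render a type in the slide notation; -> associates to the right."""
    if t[0] in ("con", "var"):
        return t[1]
    _, a, b = t
    left = "(" + show(a) + ")" if a[0] == "fun" else show(a)
    return left + " -> " + show(b)

# A type environment maps identifiers to types.
TE = {
    "n": con("int"),
    "isZero": fun(con("int"), con("bool")),
    "plus": fun(con("int"), fun(con("int"), con("int"))),
}
```

Under this encoding, a type environment is an ordinary dictionary, so TE + {x : τ} corresponds to building a new dictionary with one extra binding.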

Simple Type Inference Rules
Typing: TE ├ e : τ

(App)    TE ├ e1 : τ -> τ'     TE ├ e2 : τ
         ----------------------------------
         TE ├ (e1 e2) : τ'

(Abs)    TE + {x : τ} ├ e : τ'
         ----------------------
         TE ├ λx.e : τ -> τ'

(Var)    (x : τ) ∈ TE
         ------------
         TE ├ x : τ

(Const)  typeof(c) = τ
         -------------
         TE ├ c : τ

(Let)    TE + {x : τ} ├ e1 : τ     TE + {x : τ} ├ e2 : τ'
         -------------------------------------------------
         TE ├ (let x = e1 in e2) : τ'

Simple Type Inference Rules (cont.)

(Cond)   TE ├ eB : bool     TE ├ e1 : τ     TE ├ e2 : τ
         -----------------------------------------------
         TE ├ Cond(eB, e1, e2) : τ

(PF)     TE ├ e1 : int     TE ├ e2 : int
         --------------------------------
         TE ├ (e1 + e2) : int
         (one such rule for each PF)

(CN)     TE ├ e1 : τ1     TE ├ e2 : τ2
         ------------------------------
         TE ├ CN(e1, e2) : CN(τ1, τ2)

(Pro)    TE ├ e : CN(τ1, τ2)
         --------------------
         TE ├ Prj1(e) : τ1
         (one set of such rules for each data structure type)

Simple Inference Algorithm
• W(TE, e) returns (S, τ) such that S(TE) ├ e : τ
• The type environment TE records the most general type of each identifier, while the substitution S records the changes in the type variables.
• Def W(TE, e) = Case e of
    x                = ...
    λx.e             = ...
    (e1 e2)          = ...
    let x = e1 in e2 = ...
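The algorithm relies on two operations the slides use without defining: composing substitutions (written S2 S1) and Unify. The following is a minimal sketch of both, assuming types are encoded as tagged tuples ("con", name), ("var", name) and ("fun", a, b); that encoding and the function names are assumptions of this sketch, not the lecture's definitions.

```python
def apply(S, t):
    """Apply substitution S (dict: variable name -> type) to type t."""
    if t[0] == "var":
        return S.get(t[1], t)
    if t[0] == "fun":
        return ("fun", apply(S, t[1]), apply(S, t[2]))
    return t

def compose(S2, S1):
    """The substitution written 'S2 S1' on the slides: S1 first, then S2."""
    out = {v: apply(S2, t) for v, t in S1.items()}
    for v, t in S2.items():
        out.setdefault(v, t)
    return out

def occurs(v, t):
    """True if type variable v appears anywhere inside type t."""
    if t[0] == "var":
        return t[1] == v
    if t[0] == "fun":
        return occurs(v, t[1]) or occurs(v, t[2])
    return False

def unify(a, b):
    """Return a most general unifier of a and b, or raise TypeError."""
    if a == b:
        return {}
    if a[0] == "var":
        if occurs(a[1], b):
            raise TypeError("occurs check failed")
        return {a[1]: b}
    if b[0] == "var":
        return unify(b, a)
    if a[0] == "fun" and b[0] == "fun":
        S1 = unify(a[1], b[1])
        S2 = unify(apply(S1, a[2]), apply(S1, b[2]))
        return compose(S2, S1)
    raise TypeError("cannot unify")
```

The occurs check rejects equations such as u = u -> Int, which have no finite solution.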

Simple Inference Algorithm (cont-1)
Def W(TE, e) = Case e of
  c       = ({}, Typeof(c))
  x       = if (x ∉ Dom(TE)) then Fail
            else let τ = TE(x)
                 in ({}, τ)
  λx.e    = let (S1, τ1) = W(TE + {x : u}, e)
            in (S1, S1(u) -> τ1)
  (e1 e2) = let (S1, τ1) = W(TE, e1);
                (S2, τ2) = W(S1(TE), e2);
                S3 = Unify(S2(τ1), τ2 -> u)
            in (S3 S2 S1, S3(u))
  let x = e1 in e2
          = let (S1, τ1) = W(TE + {x : u}, e1);
                S2 = Unify(S1(u), τ1);
                (S3, τ2) = W(S2 S1(TE) + {x : S2(τ1)}, e2)
            in (S3 S2 S1, τ2)
The u's represent new type variables.

Simple Inference Algorithm (cont-2)
  Cond(eB, e1, e2)
          = let (SB, τB) = W(TE, eB);
                (S1, τ1) = W(SB(TE), e1);
                (S2, τ2) = W(S1 SB(TE), e2);
                S3 = Unify(τB, bool);
                S4 = Unify(τ1, τ2)
            in (S4 S3 S2 S1 SB, S4 S3(τ2))
  (e1 + e2)
          = let (S1, τ1) = W(TE, e1);
                (S2, τ2) = W(S1(TE), e2);
                S3 = Unify(τ1, int);
                S4 = Unify(τ2, int)
            in (S4 S3 S2 S1, int)
  CN(e1, e2)
          = let (S1, τ1) = W(TE, e1);
                (S2, τ2) = W(S1(TE), e2)
            in (S2 S1, CN(τ1, τ2))
  Prj1(e) = let (S, τ) = W(TE, e);
                S1 = Unify(τ, CN(u1, u2))
            in (S1 S, S1(u1))
Note: bool and int in the Unify calls should probably be type constants.
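The cases of W for the simple type system can be sketched as one recursive function. This is an illustrative sketch under stated assumptions, not the lecture's implementation: expressions and types are encoded as tagged tuples, primitives such as eq0, pred and mul are supplied as identifiers in TE rather than as PF rules, and the Cond case threads substitutions through each step instead of following the slide's order.

```python
import itertools

INT, BOOL = ("con", "Int"), ("con", "Bool")
fresh_ids = itertools.count(1)

def fresh():
    return ("var", "u%d" % next(fresh_ids))      # a new type variable u

def apply(S, t):
    if t[0] == "var":
        return S.get(t[1], t)
    if t[0] == "fun":
        return ("fun", apply(S, t[1]), apply(S, t[2]))
    return t

def apply_env(S, TE):
    return {x: apply(S, t) for x, t in TE.items()}

def compose(S2, S1):                              # "S2 S1": S1 first, then S2
    out = {v: apply(S2, t) for v, t in S1.items()}
    for v, t in S2.items():
        out.setdefault(v, t)
    return out

def occurs(v, t):
    return t == ("var", v) or (t[0] == "fun" and (occurs(v, t[1]) or occurs(v, t[2])))

def unify(a, b):
    if a == b:
        return {}
    if a[0] == "var":
        if occurs(a[1], b):
            raise TypeError("occurs check failed")
        return {a[1]: b}
    if b[0] == "var":
        return unify(b, a)
    if a[0] == "fun" and b[0] == "fun":
        S1 = unify(a[1], b[1])
        return compose(unify(apply(S1, a[2]), apply(S1, b[2])), S1)
    raise TypeError("cannot unify")

def W(TE, e):
    tag = e[0]
    if tag == "const":                            # Typeof(c)
        return {}, (BOOL if isinstance(e[1], bool) else INT)
    if tag == "id":
        if e[1] not in TE:
            raise TypeError("unbound identifier " + e[1])
        return {}, TE[e[1]]
    if tag == "lam":                              # ("lam", x, body)
        u = fresh()
        S1, t1 = W({**TE, e[1]: u}, e[2])
        return S1, ("fun", apply(S1, u), t1)
    if tag == "app":                              # ("app", e1, e2)
        u = fresh()
        S1, t1 = W(TE, e[1])
        S2, t2 = W(apply_env(S1, TE), e[2])
        S3 = unify(apply(S2, t1), ("fun", t2, u))
        return compose(S3, compose(S2, S1)), apply(S3, u)
    if tag == "cond":                             # ("cond", eB, e1, e2)
        SB, tB = W(TE, e[1])
        S = compose(unify(tB, BOOL), SB)
        S1, t1 = W(apply_env(S, TE), e[2])
        S = compose(S1, S)
        S2, t2 = W(apply_env(S, TE), e[3])
        S4 = unify(apply(S2, t1), t2)
        return compose(S4, compose(S2, S)), apply(S4, t2)
    if tag == "let":                              # ("let", x, e1, e2), recursive
        u = fresh()
        S1, t1 = W({**TE, e[1]: u}, e[2])
        S2 = unify(apply(S1, u), t1)
        S21 = compose(S2, S1)
        S3, t2 = W({**apply_env(S21, TE), e[1]: apply(S2, t1)}, e[3])
        return compose(S3, S21), t2
    raise ValueError("unknown expression")

# Hypothetical initial environment for a factorial-style term.
PRIMS = {"eq0": ("fun", INT, BOOL), "pred": ("fun", INT, INT),
         "mul": ("fun", INT, ("fun", INT, INT))}
n, f = ("id", "n"), ("id", "f")
body = ("cond", ("app", ("id", "eq0"), n), ("const", 1),
        ("app", ("app", ("id", "mul"), n), ("app", f, ("app", ("id", "pred"), n))))
fact = ("let", "f", ("lam", "n", body), f)
```

Calling W(PRIMS, fact) returns a substitution together with the inferred type of the whole let expression. Because a let-bound identifier keeps a single type here, a term that uses one identifier at two different types is rejected.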

Type Inference: Example
A = let f = λn.B in f
B = cond ((eq0 n), 1, n*(f (pred n)))

W(∅, A) = ( [ ] , Int -> Int )
  W({f : u1}, λn.B) = ( [Int -> Int / u1] , Int -> Int )
    W({f : u1, n : u2}, B) = ( [Int -> Int / u1, Int / u2] , Int )
      W({f : u1, n : u2}, (eq0 n)) = ( [Int / u2] , Bool )
        Unify(u2, Int) = [Int / u2]
      W({f : u1, n : Int}, 1) = ( [ ] , Int )
      W({f : u1, n : Int}, n*(f (pred n))) = ( [Int -> Int / u1] , Int )
        W({f : u1, n : Int}, n) = ( [ ] , Int )
        W({f : u1, n : Int}, f (pred n)) = ( [Int -> u3 / u1] , u3 )
          Unify(u1, Int -> u3) = [Int -> u3 / u1]
        Unify(Int -> Int -> Int, Int -> u3 -> Int) = [Int / u3]
  W({f : Int -> Int}, f) = ( [ ] , Int -> Int )

Soundness
• The proposed type system is said to be sound if, whenever e : τ, e indeed evaluates to a value in τ.
• A method of proving soundness:
  • The semantics of the language is defined in terms of a value space that has integer values, Boolean values, etc. as subspaces.
  • Any expression that may evaluate to a special value "wrong" will not type check.
  • There is no type expression that denotes the subspace "wrong".
Such proofs tend to be difficult ...

Some observations
• A type system restricts the class of programs that are considered "legal".
• A term in the untyped λ-calculus may cause no errors and still not be typeable in a particular type system:
    let
      id = λx. x
    in
      ... (id True) ... (id 1) ...
  This term is not typeable in the simple type system we have discussed so far.

Polymorphic Types
    let
      id = λx. x
    in
      ... (id True) ... (id 1) ...
Constraints:
    id :: t1 -> t1
    id :: Int -> t2
    id :: Bool -> t3
Does not unify!!
Solution: generalize the type variable
    id :: ∀t1. t1 -> t1
• Different uses of a generalized type variable may be instantiated differently:
    id2 : Bool -> Bool
    id1 : Int -> Int
When can we generalize?

Now ... The Hindley-Milner Type System

A mini Language to study Hindley-Milner Types
• There are no types in the syntax of the language!
• The type of each subexpression is derived by the Hindley-Milner type inference algorithm.
• Expressions
  E ::= c                    constant
      | x                    variable
      | λx. E                abstraction
      | (E1 E2)              application
      | let x = E1 in E2     let-block

A Formal Type System
• Types
  τ ::= ι            base types
      | t            type variables
      | τ1 -> τ2     function types
• Type Schemes
  σ ::= τ
      | ∀t. σ
• Type Environments
  TE ::= Identifiers → Type Schemes
Note: all the ∀'s occur at the beginning of a type scheme, i.e., a type τ cannot contain a type scheme σ.

Instantiations
    σ = ∀t1...tn. τ
• A type scheme σ can be instantiated into a type τ' by substituting types for the bound variables of σ, i.e.,
    τ' = S τ   for some S s.t. Dom(S) ⊆ BV(σ)
  - τ' is said to be an instance of σ (σ > τ')
  - τ' is said to be a generic instance of σ when S maps variables to new variables.
• Example:
    σ = ∀t1. t1 -> t2
    t3 -> t2 is a generic instance of σ
    Int -> t2 is a non-generic instance of σ
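Instantiation can be sketched as a substitution restricted to the bound variables. The encoding below is an assumption of this sketch: a type scheme ∀t1...tn. τ is the tuple ("forall", [t1, ..., tn], τ), types are tagged tuples, and the names instance and generic_instance are hypothetical.

```python
import itertools

fresh_ids = itertools.count(1)

def apply(S, t):
    """Apply substitution S (dict: variable name -> type) to type t."""
    if t[0] == "var":
        return S.get(t[1], t)
    if t[0] == "fun":
        return ("fun", apply(S, t[1]), apply(S, t[2]))
    return t

def instance(scheme, S):
    """Instance of a scheme under S; S may only touch bound variables."""
    _, bound, body = scheme
    if not set(S) <= set(bound):
        raise ValueError("Dom(S) must be a subset of BV(scheme)")
    return apply(S, body)

def generic_instance(scheme):
    """Generic instance: every bound variable is mapped to a new variable."""
    _, bound, _ = scheme
    S = {t: ("var", "u%d" % next(fresh_ids)) for t in bound}
    return instance(scheme, S)

INT = ("con", "Int")
sigma = ("forall", ["t1"], ("fun", ("var", "t1"), ("var", "t2")))
non_generic = instance(sigma, {"t1": INT})     # Int -> t2
generic = generic_instance(sigma)              # (new variable) -> t2
```

The free variable t2 is left untouched in both cases, because it is not bound by the scheme.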

Generalization aka Closing
• Generalization introduces polymorphism.
• Quantify type variables that are free in τ but not free in the type environment (TE).
• Captures the notion of new type variables of τ.
    Gen(TE, τ) = ∀t1...tn. τ
    where { t1...tn } = FV(τ) - FV(TE)
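Gen can be sketched directly from the formula FV(τ) - FV(TE). The following assumes the same hypothetical encoding as before: types are tagged tuples, and a scheme is ("forall", [bound names], τ), with an empty list for an unquantified type.

```python
def ftv(t):
    """Free type variables of a type."""
    if t[0] == "var":
        return {t[1]}
    if t[0] == "fun":
        return ftv(t[1]) | ftv(t[2])
    return set()

def ftv_scheme(scheme):
    """Free type variables of a scheme: those of the body that are not bound."""
    _, bound, body = scheme
    return ftv(body) - set(bound)

def ftv_env(TE):
    """FV(TE): union of the free variables of every scheme in TE."""
    out = set()
    for scheme in TE.values():
        out |= ftv_scheme(scheme)
    return out

def gen(TE, t):
    """Gen(TE, t): quantify the variables free in t but not free in TE."""
    return ("forall", sorted(ftv(t) - ftv_env(TE)), t)

u1, u3 = ("var", "u1"), ("var", "u3")
TE = {"x": ("forall", [], u1)}
sigma = gen(TE, ("fun", u3, u1))    # quantifies u3 only; u1 is free in TE
```

Here u1 stays unquantified because it is still shared with the binding of x in the environment.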

HM Type Inference Rules
Typing: TE ├ e : τ

(App)    TE ├ e1 : τ -> τ'     TE ├ e2 : τ
         ----------------------------------
         TE ├ (e1 e2) : τ'

(Abs)    TE + {x : τ} ├ e : τ'
         ----------------------
         TE ├ λx.e : τ -> τ'

(Var)    (x : σ) ∈ TE     σ > τ
         ----------------------
         TE ├ x : τ

(Const)  typeof(c) > τ
         -------------
         TE ├ c : τ

(Let)    TE + {x : τ} ├ e1 : τ     TE + {x : Gen(TE, τ)} ├ e2 : τ'
         ----------------------------------------------------------
         TE ├ (let x = e1 in e2) : τ'

HM Inference Algorithm
Def W(TE, e) = Case e of
  c       = ({}, Typeof(c))
  x       = if (x ∉ Dom(TE)) then Fail
            else let ∀t1...tn.τ = TE(x)
                 in ( { }, [ui / ti] τ )
  λx.e    = let (S1, τ1) = W(TE + {x : u}, e)
            in (S1, S1(u) -> τ1)
  (e1 e2) = let (S1, τ1) = W(TE, e1);
                (S2, τ2) = W(S1(TE), e2);
                S3 = Unify(S2(τ1), τ2 -> u)
            in (S3 S2 S1, S3(u))
  let x = e1 in e2
          = let (S1, τ1) = W(TE + {x : u}, e1);
                S2 = Unify(S1(u), τ1);
                σ = Gen(S2 S1(TE), S2(τ1));
                (S3, τ2) = W(S2 S1(TE) + {x : σ}, e2)
            in (S3 S2 S1, τ2)
The u's represent new type variables.
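The five cases above can be sketched as a small program. This is an illustrative sketch under stated assumptions, not the lecture's implementation: expressions and types are tagged tuples, a scheme is a pair (bound names, type), Typeof(c) is limited to Python ints and bools, and all helper names are hypothetical.

```python
import itertools

INT, BOOL = ("con", "Int"), ("con", "Bool")
fresh_ids = itertools.count(1)

def fresh():
    return ("var", "u%d" % next(fresh_ids))       # a new type variable u

def apply(S, t):
    if t[0] == "var":
        return S.get(t[1], t)
    if t[0] == "fun":
        return ("fun", apply(S, t[1]), apply(S, t[2]))
    return t

def apply_env(S, TE):
    # Bound variables of a scheme are never substituted.
    return {x: (bs, apply({v: t for v, t in S.items() if v not in bs}, ty))
            for x, (bs, ty) in TE.items()}

def compose(S2, S1):                               # "S2 S1": S1 first, then S2
    out = {v: apply(S2, t) for v, t in S1.items()}
    for v, t in S2.items():
        out.setdefault(v, t)
    return out

def ftv(t):
    if t[0] == "var":
        return {t[1]}
    if t[0] == "fun":
        return ftv(t[1]) | ftv(t[2])
    return set()

def unify(a, b):
    if a == b:
        return {}
    if a[0] == "var":
        if a[1] in ftv(b):
            raise TypeError("occurs check failed")
        return {a[1]: b}
    if b[0] == "var":
        return unify(b, a)
    if a[0] == "fun" and b[0] == "fun":
        S1 = unify(a[1], b[1])
        return compose(unify(apply(S1, a[2]), apply(S1, b[2])), S1)
    raise TypeError("cannot unify")

def gen(TE, t):                                    # Gen(TE, t)
    env_vars = set().union(*(ftv(ty) - set(bs) for bs, ty in TE.values()))
    return (tuple(sorted(ftv(t) - env_vars)), t)

def W(TE, e):
    tag = e[0]
    if tag == "const":
        return {}, (BOOL if isinstance(e[1], bool) else INT)
    if tag == "id":                                # instantiate with new u's
        if e[1] not in TE:
            raise TypeError("unbound identifier " + e[1])
        bs, ty = TE[e[1]]
        return {}, apply({b: fresh() for b in bs}, ty)
    if tag == "lam":                               # ("lam", x, body)
        u = fresh()
        S1, t1 = W({**TE, e[1]: ((), u)}, e[2])
        return S1, ("fun", apply(S1, u), t1)
    if tag == "app":                               # ("app", e1, e2)
        u = fresh()
        S1, t1 = W(TE, e[1])
        S2, t2 = W(apply_env(S1, TE), e[2])
        S3 = unify(apply(S2, t1), ("fun", t2, u))
        return compose(S3, compose(S2, S1)), apply(S3, u)
    if tag == "let":                               # ("let", x, e1, e2)
        u = fresh()
        S1, t1 = W({**TE, e[1]: ((), u)}, e[2])
        S2 = unify(apply(S1, u), t1)
        S21 = compose(S2, S1)
        env = apply_env(S21, TE)
        sigma = gen(env, apply(S2, t1))            # generalize after e1 is done
        S3, t2 = W({**env, e[1]: sigma}, e[3])
        return compose(S3, S21), t2
    raise ValueError("unknown expression")

ident = ("lam", "x", ("id", "x"))
two_uses = ("let", "id", ident,
            ("app", ("lam", "a", ("app", ("id", "id"), ("const", 1))),
                    ("app", ("id", "id"), ("const", True))))
```

In this sketch the term two_uses applies id to both an Int and a Bool and still type checks, because the let case generalizes the type of id before checking the body. A λ-bound identifier receives the scheme ((), u) with no quantified variables, so a term such as λg. g g is still rejected.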

Hindley-Milner: Example
A = λx.B
B = let f = λy.x in (f 1, f True)

W(∅, A) = ( [ ] , u1 -> (u1,u1) )
  W({x : u1}, B) = ( [ ] , (u1,u1) )
    W({x : u1, f : u2}, λy.x) = ( [ ] , u3 -> u1 )
      W({x : u1, f : u2, y : u3}, x) = ( [ ] , u1 )
    Unify(u2, u3 -> u1) = [ (u3 -> u1) / u2 ]
    Gen({x : u1}, u3 -> u1) = ∀u3. u3 -> u1
    TE = {x : u1, f : ∀u3. u3 -> u1}
    W(TE, (f 1)) = ( [ ] , u1 )
      W(TE, f) = ( [ ] , u4 -> u1 )
      W(TE, 1) = ( [ ] , Int )
      Unify(u4 -> u1, Int -> u5) = [ Int / u4, u1 / u5 ]
    ...

Important Observations
• Do not generalize over type variables used elsewhere.
• Let is the only way of defining polymorphic constructs.
• Generalize the types of let-bound identifiers only after processing their definitions.

Properties of HM Type Inference
• It is sound with respect to the type system.
  • An inferred type is verifiable.
• It generates most general types of expressions.
  • Any verifiable type is inferred.
• Complexity
  • PSPACE-hard
  • DEXPTIME-complete
  • Nested let blocks

Extensions
• Type declarations
  • Sanity check; can relax restrictions
• Incremental type checking
  • The whole program is not given at the same time; inference must remain sound when the types of some functions are not known
• Typing references to mutable objects
  • The Hindley-Milner system is unsound for a language with refs (mutable locations)
• Overloading resolution

HM Limitations: λ-bound vs Let-bound Variables
Generic vs. non-generic type variables
• Only let-bound identifiers can be instantiated differently.
    let
      twice f x = f (f x)
    in
      twice twice succ 4
  vs.
    let
      twice f x = f (f x)
      foo g = (g g succ) 4
    in
      foo twice
  foo is not type correct!

Puzzle: Another set of Inference rules

(Gen)    TE ├ e : σ     t ∉ FV(TE)
         --------------------------
         TE ├ e : ∀t.σ

(Spec)   TE ├ e : ∀t.σ
         -------------------
         TE ├ e : σ [τ'/t]

(Var)    (x : σ) ∈ TE
         ------------
         TE ├ x : σ

(Let)    TE + {x : σ} ├ e1 : σ     TE + {x : σ} ├ e2 : τ'
         -------------------------------------------------
         TE ├ (let x = e1 in e2) : τ'

(App) and (Abs) rules remain unchanged.
Sound but no inference algorithm!

Next time: Overloading
