A type-checking algorithm

The task: (since we start with an empty H, why is the goal not just E?)
The rule set (revisited on the next page) is algorithmic: the rules are syntax-directed.
• For each expression, there is a unique potentially applicable rule
• For goals in the body of a proper rule:
  • The environment is determined by the head goal's environment
  • The expression is a direct sub-expression of the head goal's expression


To derive an algorithm, we make all conditions in the rules explicit: (the existential quantifier is not a problem here. Why?)
Why are the conditions placed in these positions?

The algorithm:
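A minimal sketch of how such a syntax-directed checker could look in OCaml. The datatypes, the exception, and the name type_of are assumptions made for illustration, not TFL's actual definitions. Functions take one argument, let is non-recursive, and operation names are omitted (they could be supplied through the initial environment).

```ocaml
(* Hypothetical TFL-like syntax; every name here is an assumption. *)
type ty =
  | TInt
  | TBool
  | TFun of ty * ty                      (* argument type, result type *)
  | TTuple of ty list

type expr =
  | Num of int
  | Bool of bool
  | Var of string
  | Lam of string * ty * expr            (* the parameter carries its declared type *)
  | App of expr * expr
  | If of expr * expr * expr
  | Tuple of expr list
  | Let of (string * expr) list * expr   (* non-recursive let *)

exception Type_error of string

(* The environment H maps variables to their types. *)
type env = (string * ty) list

(* One case per syntactic form: the expression alone selects the rule. *)
let rec type_of (h : env) (e : expr) : ty =
  match e with
  | Num _ -> TInt
  | Bool _ -> TBool
  | Var x ->
    (match List.assoc_opt x h with
     | Some t -> t
     | None -> raise (Type_error ("free variable " ^ x)))
  | Lam (x, t1, body) -> TFun (t1, type_of ((x, t1) :: h) body)
  | App (e1, e2) ->
    (match type_of h e1 with
     | TFun (t_in, t_out) ->
       if type_of h e2 = t_in then t_out
       else raise (Type_error "argument type mismatch")
     | _ -> raise (Type_error "applying a non-function"))
  | If (c, e1, e2) ->
    if type_of h c <> TBool then raise (Type_error "non-boolean test");
    let t1 = type_of h e1 in
    if type_of h e2 = t1 then t1
    else raise (Type_error "branches have different types")
  | Tuple es -> TTuple (List.map (type_of h) es)
  | Let (defs, body) ->
    (* Defining expressions are checked in h; the body in the extended environment. *)
    let h' = List.map (fun (x, d) -> (x, type_of h d)) defs @ h in
    type_of h' body
```

For a closed program, the initial call in this sketch would be type_of [] e. The side conditions (equality tests on types) sit exactly where the needed sub-expression types have just become available.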

Multi-argument functions are left to you.

Certain obvious optimizations are left to you.

For an expression without let, the algorithm works in two stages:
• From the root of the expression to the leaves, determining the type environment for each sub-expression
• From the leaves to the root, determining the types of the sub-expressions
[Figure labels: H, H1, H2]

For a let expression (or sub-expression):
• Apply the algorithm to the defining expressions to obtain their types
• Use these types to determine the type environment for the body
[Figure labels: H, let]

Claim: [correctness of the algorithm (w.r.t. the rules)]
Proof: by induction on E.
Since a well-typed expression has at most one type under an environment, the algorithm either returns a unique type or fails.
For a closed program E, the initial call is

Comment: The type checker is the component that discovers missing declarations (free-variable errors).
Example: (assume this is entered as the first line in an OCaml session)
let f = fun n -> if n=0 then 1 else n*f(n-1);;
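To make the example concrete: in OCaml the definition above is rejected with an "Unbound value f" error, because f is free in its own defining expression. A small sketch of the usual fix, using let rec so that f is in the environment while its body is checked:

```ocaml
(* The non-recursive form is rejected when entered first in a session,
   since f does not yet occur in the type environment:

     let f = fun n -> if n = 0 then 1 else n * f (n - 1);;

   With let rec, f is bound before the defining expression is checked. *)
let rec f = fun n -> if n = 0 then 1 else n * f (n - 1)

let () = Printf.printf "%d\n" (f 5)   (* prints 120 *)
```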

On implementation: A compiler constructs a symbol table:
• Types for declared variables (from the declarations)
• Hierarchical organization, reflecting the region hierarchy
Type-checking of a program phrase is performed with respect to the right place in the table.
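One way such a hierarchical table could be organized is sketched below in OCaml. The record layout and the names new_scope, declare, and lookup are hypothetical; real compilers differ in many details.

```ocaml
type ty = TInt | TBool | TFun of ty * ty

(* A scope holds the declarations of one region and a link to the enclosing region. *)
type scope = {
  decls : (string, ty) Hashtbl.t;
  parent : scope option;
}

let new_scope (parent : scope option) : scope =
  { decls = Hashtbl.create 16; parent }

let declare (s : scope) (x : string) (t : ty) : unit =
  Hashtbl.replace s.decls x t

(* Search the current region first, then the enclosing ones. *)
let rec lookup (s : scope) (x : string) : ty option =
  match Hashtbl.find_opt s.decls x with
  | Some t -> Some t
  | None ->
    (match s.parent with
     | Some p -> lookup p x
     | None -> None)

let () =
  let global = new_scope None in
  declare global "n" TInt;
  let inner = new_scope (Some global) in
  declare inner "b" TBool;
  assert (lookup inner "n" = Some TInt);   (* found in the enclosing region *)
  assert (lookup global "b" = None)        (* inner declarations are not visible outside *)
```

Checking a phrase "with respect to the right place in the table" then amounts to calling lookup on the scope of the region that contains the phrase.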

On the correctness of the rules
A well-typed program never goes wrong. Specifically, it:
• Never generates a run-time type error
• Halts only if it reaches an extended value (progress)
Together: type safety (some use "safety" only for the first)
We also prove that it:
• Never applies a function to an argument of the wrong type (according to the type declared for the function)
In the typed language, this is also a type error.

We prove this using transition semantics.
A problem: we deal with two worlds (TFL, FL), each with only some of the concepts.
TFL:
• Typed expressions
• Typing relation, well-typed expressions
• No transition semantics, no run-time type errors
FL:
• Untyped expressions
• No typings
• Transition semantics
• Run-time type errors
The two worlds are related by erase, which maps TFL expressions to FL expressions.

The solution: a transition relation for TFL: the same as the FL relation, but on typed expressions, except
The diagram (almost) commutes: (except when?)
Progress holds for this language as well!
Now transitions and run-time errors apply to the upper level, TFL.
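The erase mapping between the two worlds can be pictured as a function that drops type annotations. The OCaml sketch below assumes a small fragment of each language; the constructor names are hypothetical.

```ocaml
type ty = IntT | BoolT | FunT of ty * ty

(* TFL-like fragment: lambda parameters are annotated with types. *)
type texpr =
  | TNum of int
  | TVar of string
  | TLam of string * ty * texpr
  | TApp of texpr * texpr
  | TIf of texpr * texpr * texpr

(* FL-like fragment: the same shapes without types. *)
type expr =
  | Num of int
  | Var of string
  | Lam of string * expr
  | App of expr * expr
  | If of expr * expr * expr

(* erase keeps the structure of the expression and forgets the annotations. *)
let rec erase (e : texpr) : expr =
  match e with
  | TNum n -> Num n
  | TVar x -> Var x
  | TLam (x, _, body) -> Lam (x, erase body)
  | TApp (e1, e2) -> App (erase e1, erase e2)
  | TIf (c, e1, e2) -> If (erase c, erase e1, erase e2)
```

The commuting diagram then says, roughly, that taking a TFL step and erasing gives the same result as erasing and taking an FL step.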

Observation: A run-time type error w.r.t. t is not well-typed:
• v1 v2, where v1 is neither an operation nor a function (hence, by canonical forms, its type is not a function type)
• if v e2 e3, where v is a non-boolean
This covers all run-time type errors that have a transition to ER.
Corollary: An expression that contains a run-time type error (even if not as its selected sub-expression) is not well-typed.
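The two error shapes listed above can be recognized syntactically. A small OCaml sketch under simplifying assumptions: the expression type is a hypothetical FL-like fragment, and partially applied operations are not modeled as values.

```ocaml
type expr =
  | Num of int
  | Bool of bool
  | Prim of string              (* an operation name such as "+" *)
  | Var of string
  | Lam of string * expr
  | App of expr * expr
  | If of expr * expr * expr

let is_value (e : expr) : bool =
  match e with
  | Num _ | Bool _ | Prim _ | Lam _ -> true
  | Var _ | App _ | If _ -> false

(* True exactly for the two redex shapes that step to the error state. *)
let is_rt_type_error (e : expr) : bool =
  match e with
  | App (v1, v2) when is_value v1 && is_value v2 ->
    (match v1 with Prim _ | Lam _ -> false | _ -> true)
  | If (v, _, _) when is_value v ->
    (match v with Bool _ -> false | _ -> true)
  | _ -> false
```

For instance, is_rt_type_error (App (Num 3, Num 4)) holds, and no typing rule can assign Num 3 a function type, which is the content of the observation.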

Assume WT(E), but E's execution generates a type error.
Q: How do we show this is impossible?
A: We prove a type preservation property.
This property is the key to type safety.
How is it related to the informal introduction to static type checking?

A comment: We prove type preservation for regular expressions only; ER has no type (alternatives?)

Theorem [Type preservation / Subject reduction] (where E' is not ER)
Proof: by induction on the selection path of E
• This path never goes into a lambda, hence H is fixed
We first show the induction step (easier), then the basis (redexes).
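For reference, a standard way to state type preservation is given below. The turnstile and arrow notation are assumptions and may differ from the notation used in the course's rules.

```latex
\[
\text{If } H \vdash E : T \text{ and } E \Rightarrow E' \text{ with } E' \neq ER,
\text{ then } H \vdash E' : T .
\]
```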

Induction step: by cases on E
• E = E1 E2, stepping to E'1 E2 (the case for E2 is symmetric)
• If: a step on the test, similar
• Tuple: a step on a component, similar
• Let: a step on a defining expression, similar
Q: Where did we use inversion above?

Basis: (redexes)

The "difficult" case: function application
Lemma: [type preservation under substitution]
Intuitively simple; formally, by induction on E.
End of proof of type preservation.
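A common formulation of the substitution lemma, written here with assumed notation (E[v/x] for substituting v for x in E), is:

```latex
\[
\text{If } H,\, x : T_1 \vdash E : T \text{ and } H \vdash v : T_1,
\text{ then } H \vdash E[v/x] : T .
\]
```

This is what the function-application redex needs: the body E of the function was typed under the assumption x : T_1, and the argument v has exactly that type.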

Back to the original goal:
Corollary (of the preservation theorem): [correctness of the type rules]
Corollary:

Now, transfer correctness to the FL setting.
Corollary: If E (in TFL) is well-typed, then
• the FL execution of erase(E) will not generate a type error
The execution may generate other errors, or be infinite (one reason natural semantics has not been used for the correctness proof).

Discussion of TFL's type system
The type system is monomorphic:
• Each literal (number, boolean, operation name) has a unique type (transferred to the type checker by axiom (const))
• Variables have unique types, by their declarations (transferred to the type checker by axiom (var))
• Each well-typed expression has a unique type (by induction on expressions)
• Each value (including tuples and functions) has a unique type (if its expression is well-typed)
But are all our assumptions satisfied in real programming languages?

• The operation name + may be associated with two operations (on int, real) (ad-hoc polymorphism)
• The operation = has a polymorphic type: (an eq-type)
• The constant nil (the empty list, not yet introduced) has a polymorphic type
Q: Is the type system still monomorphic? In what sense?

On type equality (equivalence)
The type checker uses type equality tests (where?)
How is type equality defined?
• By structure of the types: structural equivalence
  • Types are equivalent if they have the same structure
• By name: name equivalence
  • Type names are associated (once) with some structure
  • Types are equivalent only if they have the same name
Type systems that use "by name" are called nominal. They include: Pascal, Java
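The difference between the two notions can be sketched in OCaml as follows. The type representation, the declaration list decls, and the function names are hypothetical, and declarations are assumed to be non-recursive.

```ocaml
type ty =
  | Int
  | Bool
  | Fun of ty * ty
  | Tuple of ty list
  | Named of string            (* a reference to a declared type name *)

(* Hypothetical declarations: each name is associated once with a structure. *)
let decls = [ ("celsius", Int); ("fahrenheit", Int) ]

(* Name equivalence (simplified): compare types as written, so a name equals only itself. *)
let name_equal (t1 : ty) (t2 : ty) : bool = t1 = t2

(* Structural equivalence: replace names by their structure, then compare.
   An undeclared name makes List.assoc raise Not_found. *)
let rec expand (t : ty) : ty =
  match t with
  | Named n -> expand (List.assoc n decls)
  | Fun (a, b) -> Fun (expand a, expand b)
  | Tuple ts -> Tuple (List.map expand ts)
  | Int | Bool -> t

let struct_equal (t1 : ty) (t2 : ty) : bool = expand t1 = expand t2
```

Here Named "celsius" and Named "fahrenheit" are structurally equal but not equal by name, which is the distinction a nominal type system enforces.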

Comments and Discussion
Dynamic typing:
• Only values (operations and functions included) are typed
• Type compatibility is determined at run time ("last minute")
• Types are general: (n-ary) function, list, …
• Functions can be applied to all arguments of the right arity
  (define id (lambda (x) x)) (id 3) (id id)
• Collections may be heterogeneous
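Dynamic typing can be imitated inside a statically typed language by tagging every run-time value. The OCaml sketch below illustrates the points above; the value type and helper names are assumptions for illustration only.

```ocaml
(* Every run-time value carries a tag, as in a dynamically typed language. *)
type value =
  | VNum of int
  | VBool of bool
  | VList of value list
  | VFun of (value -> value)

exception Runtime_type_error of string

(* Tags are inspected only when an operation actually needs them. *)
let add (a : value) (b : value) : value =
  match a, b with
  | VNum x, VNum y -> VNum (x + y)
  | _ -> raise (Runtime_type_error "+ expects numbers")

let apply (f : value) (arg : value) : value =
  match f with
  | VFun g -> g arg
  | _ -> raise (Runtime_type_error "applying a non-function")

(* id accepts any argument, including itself. *)
let id = VFun (fun x -> x)
let three = apply id (VNum 3)
let id_again = apply id id

(* A heterogeneous collection. *)
let mixed = VList [ VNum 3; VBool true; id ]
```

The tags are the extra storage, and the match in add and apply is the extra run-time check, that dynamic typing pays for its flexibility.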

Static - monomorphic:
• Values are typed (most, especially functions, are uni-typed)
• Cells are (uni-)typed (not seen in TFL)
• Cells can be assigned only values of their type
• Variables and expressions are statically typed (most expressions are uni-typed)
• Types must be detailed
  • Operations, functions: both in and out types; hence functions can be applied only to their in type
  • Collections: include the element type, hence must be homogeneous (and uni-typed) (exception: records)
• Conditional expressions are conservatively typed

Pros and Cons
Dynamic:
For: flexibility:
• non-restrictive types
• no fixed types for cells, expressions, …
Against:
• increased overhead
  • extra storage
  • extra run time
• reduced safety (late discovery of many errors)

Static (mono):
For:
• Increased performance
  • Reduced storage (no tags) (but modern programming languages may include tags for other reasons)
  • Fewer run-time checks
  • Data structure storage and access optimized by type
• Increased safety
• Improved documentation
• Early discovery of many errors

Against:
• Conservative checking, more type errors: some OK programs are rejected
• Monomorphism, restrictive types (uni-typed functions, homogeneous collections): reduced flexibility, non-generic programs, lack of reusability
Examples: one needs to write
• Uni-typed append functions, one for each type
• Uni-typed search tree procedures for each type
Since parameters must be declared, this cannot be avoided.
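The duplication can be illustrated in OCaml by annotating every parameter with a single type, which imitates a monomorphic system (OCaml itself would not require this). The function names are hypothetical.

```ocaml
(* With each parameter declared at one type, append is written once per element type. *)
let rec append_int (xs : int list) (ys : int list) : int list =
  match xs with
  | [] -> ys
  | x :: rest -> x :: append_int rest ys

let rec append_string (xs : string list) (ys : string list) : string list =
  match xs with
  | [] -> ys
  | x :: rest -> x :: append_string rest ys
```

The two bodies are identical; only the declared types differ.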

Many users do not accept these restrictions and prefer dynamically typed languages (e.g. scripting languages).
What are the possible solutions?
The programming language research/development community offers: polymorphic type systems

Kinds of polymorphic type systems:
• Parametric polymorphism (à la ML)
  • Values, in particular functions, have many types and are reusable
• Sub-type polymorphism (à la OO)
  • Values, in particular functions, have many types and are reusable
  • Collections can have elements of many types
In the last two decades, approaches to merging the two have been developed.
The war between the dynamic and static schools is still active.
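As a brief illustration of the parametric (ML-style) option, a single OCaml definition of append works for every element type. This is a sketch of the idea only, not of how the course will develop it.

```ocaml
(* One definition, usable at every element type 'a. *)
let rec append (xs : 'a list) (ys : 'a list) : 'a list =
  match xs with
  | [] -> ys
  | x :: rest -> x :: append rest ys

let ints = append [1; 2] [3]
let strings = append ["a"] ["b"; "c"]
```

The function has many types (int list -> int list -> int list, string list -> string list -> string list, and so on), while each individual list remains homogeneous.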

What next?
• We extend FL and TFL with various constructs: recursion, cells, … (depending on available time)
  For each, we examine semantics and typing
• We proceed to the environment model
In the future, we discuss ML-style parametric polymorphism, and hopefully also sub-type polymorphism.
