The secret life of typecheckers
Introduction • This presentation is modeled on a paper by Luca Cardelli (Bell Labs, 1985) • A general view of type-checking will be presented from the perspective of the programming language designer • We will explore type systems past, present and future
A little history • Type systems have been around longer than computers • In the 1920s David Hilbert started a program to formalize mathematics as strings of symbols manipulated by logic/grammar rules • The idea was to be able to “mechanically” prove things • Bertrand Russell had already run into the problems of self-reference and dealt with them by assigning entities to types • Entities of each type are built up from entities of the preceding type • In 1931 Kurt Gödel proved that any consistent formal system expressive enough to encode arithmetic is incomplete, ending Hilbert’s program as originally conceived • Application to programming languages • Computing involves representing and manipulating entities as strings of symbols • Problems of representation and self-reference crop up in numerous ways • We want to mechanically prove things about programs • Types support this
What are types, really? • Types come into play whenever we have a universe of diverse things with a similar representation • Bits in a computer’s memory • XML strings • DNA • If you consider these things in the absence of a type system, you have an “untyped universe” • This means there is really only one type (the memory word, the DNA base pair, etc.)
Operations in untyped universes • Any such universe has various operations that can be performed: • Adding and subtracting (bit strings) • Rendering HTML (XML) • Transcription/translation (DNA) • But these operations are only valid on subsets of the untyped universe • Some XML strings represent HTML documents and some don’t • Some DNA sequences represent valid genes and some don’t • What happens if you blow it? • Tumbolia, the land of dead hiccups and extinguished lightbulbs (Douglas Hofstadter) • The major purpose of a type system is to avoid embarrassing questions about representations, and to forbid situations where these questions might come up (Cardelli)
Type-checking and programming languages • Type-checking avoids these embarrassing questions • Assigns types to constants, operators, variables, and functions • Checks that every operation is performed on inputs of the correct type • Accepts programs that can be proven to have no type errors • A type-checker reads program code and says “ok” or “not ok and here’s why” • By comparison • An interpreter reads program code and executes the instructions • A compiler reads program code and translates it into a different representation of the same program
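To make the “ok” / “not ok and here’s why” idea concrete, here is a minimal sketch of a type-checker for a made-up expression language with only int and bool. Everything in it (Type, Expr, CheckResult, lit, node, check) is a hypothetical name invented for this illustration, not something from Cardelli’s paper.

```cpp
#include <memory>
#include <string>
#include <utility>

// Hypothetical two-type language: integers and booleans.
enum class Type { Int, Bool };

// An expression is a literal, or a binary operator applied to two expressions.
struct Expr {
    enum class Kind { IntLit, BoolLit, Add, Less };
    Kind kind;
    std::shared_ptr<Expr> lhs;  // used only by Add and Less
    std::shared_ptr<Expr> rhs;  // used only by Add and Less
};

using ExprPtr = std::shared_ptr<Expr>;

ExprPtr lit(Expr::Kind kind) {
    return std::make_shared<Expr>(Expr{kind, nullptr, nullptr});
}

ExprPtr node(Expr::Kind kind, ExprPtr lhs, ExprPtr rhs) {
    return std::make_shared<Expr>(Expr{kind, std::move(lhs), std::move(rhs)});
}

// The checker's verdict: "ok" plus a type, or "not ok" plus a reason.
struct CheckResult {
    bool ok;
    Type type;            // meaningful only when ok is true
    std::string message;  // the "here's why" when ok is false
};

// Assigns a type to every expression and checks each operator's inputs.
CheckResult check(const Expr& e) {
    switch (e.kind) {
    case Expr::Kind::IntLit:
        return {true, Type::Int, ""};
    case Expr::Kind::BoolLit:
        return {true, Type::Bool, ""};
    case Expr::Kind::Add:
    case Expr::Kind::Less: {
        CheckResult l = check(*e.lhs);
        if (!l.ok) return l;
        CheckResult r = check(*e.rhs);
        if (!r.ok) return r;
        if (l.type != Type::Int || r.type != Type::Int)
            return {false, Type::Int, "operands of + and < must both be int"};
        // int + int is an int; int < int is a bool.
        return {true, e.kind == Expr::Kind::Add ? Type::Int : Type::Bool, ""};
    }
    }
    return {false, Type::Int, "unknown expression kind"};
}
```

Note that check never evaluates anything: it only reads the program and reports a verdict, which is the contrast with interpreters and compilers drawn above.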
Type systems in programming languages • The term type system refers to the range of types that can be assigned to variables and values • Base types: int, float, double, etc. • User-defined types (e.g. classes, parameterized types, etc.) • Type systems are somewhat arbitrary, and inspired largely by the typical instruction sets of modern computers • You can create different type systems for the same language that are more or less expressive • Inexpressive type systems are frustrating; they either accept too many erroneous programs, or forbid too many correct ones • Expressive type systems are more precise, rejecting as many erroneous programs as possible and accepting a greater percentage of correct ones
Expressiveness and abstract data types • Imagine a type system that supports only the types int and object • Now you’re compiling this function: int foo (object o) { return o.bar(); } • Does the type system say yes or no? • If yes, we’re overly permissive – the type-checker doesn’t know whether the “bar” method is really available • If no, we’re overly restrictive • The type system needs to be more expressive – needs to include separate types for each class, etc. • Expressiveness means having a rich language of types enabling the type-checker to determine with the greatest possible precision whether it should accept programs or not
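A small C++ sketch of the same dilemma. Widget is a hypothetical class invented for this illustration, and void* stands in (loosely) for the slide’s catch-all object type.

```cpp
// With one type per class, the checker can see that bar() exists
// and that it returns an int, so foo is accepted with confidence.
struct Widget {
    int bar() const { return 42; }
};

int foo(const Widget& o) { return o.bar(); }

// With only a catch-all type, there is nothing to check the call against.
// A C++ compiler takes the restrictive option and rejects this:
//
//     int fooUntyped(void* o) { return o->bar(); }   // error: void* has no members
```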
Polymorphism and type inference • Polymorphism gives type-checkers an even bigger headache • Requires a major increase in expressiveness • What is the type of a generic List class? • What is the type of a generic Sort function? • Type checking is simplified by having programmers annotate programs with type information • However this gets painful as the type system becomes expressive • Solution is type inference – let the computer figure out all the types • The goal of type-checking research: Maximize the expressiveness of type systems while minimizing the need for programmers to annotate programs with complex type information
Examples • The best way to explore the subtleties of type systems is to work through examples • Let’s try a few…
Subtyping
    class Base { };
    class Derived : public Base { };

    int main() {
        Base *b = new Derived ();
        Derived *d = b;    // the line in question
    }
• Is this typesafe? • Should it be accepted by the compiler? • If you add a dynamic cast (i.e. add further annotations to help the compiler), will the compiler add a runtime check? Should it?
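For reference, a C++ compiler accepts the upcast (Derived* to Base*) implicitly and rejects the implicit downcast. A sketch of the dynamic cast variant follows; it assumes Base is given a virtual destructor, because dynamic_cast can only check a downcast at run time on polymorphic classes. Other, asBase and asDerived are hypothetical names added for the illustration.

```cpp
struct Base {
    virtual ~Base() {}  // makes Base polymorphic so dynamic_cast can inspect it
};
struct Derived : Base {};
struct Other : Base {};

// Upcast: always safe, so no annotation is needed.
Base* asBase(Derived* d) { return d; }

// Downcast: the programmer asks for it explicitly and the compiler
// inserts a run-time check. The result is nullptr if b does not
// actually point at a Derived.
Derived* asDerived(Base* b) { return dynamic_cast<Derived*>(b); }
```

So the answer to “will the compiler add a runtime check?” is yes for dynamic_cast. A static_cast would compile without any check and leave correctness to the programmer.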
Apples and oranges
    // from one header file
    struct Apple { int x; };
    void appleProcessingService (Apple* a) { }

    // from another header file
    struct Orange { int x; };

    // source file
    int main() {
        appleProcessingService (new Apple());
        appleProcessingService (new Orange ());
    }
• Is this typesafe? Should it be accepted by the compiler? Why (or why not)?
How about this one?
    // from header file, US edition of software
    struct Apple { int x; };
    void appleProcessingService (Apple* a) { }

    // from header file, French edition of software
    struct Pomme { int x; };

    // source file
    int main() {
        appleProcessingService (new Apple());
        appleProcessingService (new Pomme());
    }
• Is this typesafe? Should it be accepted by the compiler? Why (or why not)?
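The last two examples turn on whether types are compared by name or by structure. C++ compares struct types by name, so both the Orange and the Pomme calls are rejected even though all three structs have the same layout. A template, by contrast, accepts anything with the right shape. The sketch below makes appleProcessingService return the field so there is something to observe; fruitProcessingService is a hypothetical name added for the comparison.

```cpp
#include <type_traits>

struct Apple  { int x; };
struct Orange { int x; };
struct Pomme  { int x; };

// Typed by name: only an Apple* will do.
int appleProcessingService(Apple* a) { return a->x; }

// Identical layout does not make the pointer types interchangeable.
static_assert(!std::is_convertible<Orange*, Apple*>::value,
              "Orange* does not convert to Apple*");
static_assert(!std::is_convertible<Pomme*, Apple*>::value,
              "Pomme* does not convert to Apple*");

// Typed by structure: any Fruit with a member x convertible to int is accepted.
template <typename Fruit>
int fruitProcessingService(Fruit* f) { return f->x; }
```

Whether Pomme ought to be accepted is a design decision. The structural version lets the French edition through, but it lets Orange through as well.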
Math expressions
    int main() {
        int x = 123;
        int y = 234;
        int z = x / y;
    }
• Is this typesafe? Should it be accepted by the compiler?
Wouldn’t it be cool if… • We had a “rational” datatype?
    int main() {
        int w = 123;
        int x = 234;
        rational y = w / x;
        rational z = w ^ 0.5;
    }
• Any problems here?
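A minimal sketch of what a rational type could look like in C++. Rational, greatestCommonDivisor and makeRational are hypothetical names. The sketch assumes the denominator is never zero and does not normalize signs.

```cpp
// An exact fraction num/den, kept in lowest terms by makeRational.
struct Rational {
    long num;
    long den;
};

// Euclid's algorithm; the result is non-negative.
long greatestCommonDivisor(long a, long b) {
    while (b != 0) {
        long t = a % b;
        a = b;
        b = t;
    }
    return a < 0 ? -a : a;
}

// Assumption: d != 0. Division by zero is the subject of the next slides.
Rational makeRational(int n, int d) {
    long g = greatestCommonDivisor(n, d);
    return Rational{n / g, d / g};
}
```

With this, 123 / 234 as an int expression truncates to 0, while makeRational(123, 234) keeps the exact value 41/78. The second line of the slide is the harder problem: the square root of 123 is irrational, so no rational type can hold it exactly. In real C++, ^ is also bitwise XOR, which does not accept a floating-point operand.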
What kind of error is this?
    int main() {
        int x = 1;
        int y = 0;
        int z = x / y;
    }
• Could type systems help us here?
What if we introduced … • A “nonzero” datatype? • Say the compiler requires the divisor to be of type “nonzero”:
    int main() {
        int x = 1;
        nonzero y = 0;
        int z = x / y;
    }
• Good idea? Or not?
Fibonacci strikes back • Is this typesafe? Could a type-checker prove it?
    nonzero fib (int x) {
        if (x < 2) {
            return 1;
        } else {
            return fib(x-1) + fib(x-2);
        }
    }
• How about this?
    int inputAndParseNumberFromUser () { }

    int main() {
        nonzero x = inputAndParseNumberFromUser ();
    }
• Options?
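One of the options, sketched in C++: when the checker cannot prove a value is nonzero at compile time, move the proof to a single run-time check at the point where the value enters the typed world. NonZero, divide and parseNonZero are hypothetical names. parseNonZero takes a string and stands in for inputAndParseNumberFromUser, whose body the slide leaves out.

```cpp
#include <stdexcept>
#include <string>

// A value of this type is known to be nonzero, because the only way
// to construct one is through a constructor that checks.
class NonZero {
public:
    explicit NonZero(int v) : value_(v) {
        if (v == 0) throw std::invalid_argument("NonZero cannot hold 0");
    }
    int get() const { return value_; }

private:
    int value_;
};

// The divisor's type documents, and enforces, that it is not zero.
int divide(int x, NonZero y) { return x / y.get(); }

// User input is checked once, at the boundary. std::stoi throws
// std::invalid_argument for non-numeric text; the NonZero constructor
// throws the same exception type for "0".
NonZero parseNonZero(const std::string& text) {
    return NonZero(std::stoi(text));
}
```

This does not make fib provable. Adding two NonZero values can produce zero (1 + -1), so a checker would need to know the values are positive, which is a different and stronger type.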
User-constructed types • Data abstraction implies the ability for programmers to create new types • How do we express the type of variable foo in this example? struct { int x; float y; } foo; • Type theorists usually write the type something like this: (int, float) • The type of an array of integers would be: [int] • An array of arrays of integers would be: [[int]] • The type of a function with an int argument returning a float would be: int → float
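For comparison, this is roughly how the same four types can be spelled with C++ standard library templates. The alias names and the function half are hypothetical, chosen only for this illustration.

```cpp
#include <functional>
#include <tuple>
#include <type_traits>
#include <vector>

using Pair       = std::tuple<int, float>;      // (int, float)
using IntArray   = std::vector<int>;            // [int]
using IntMatrix  = std::vector<IntArray>;       // [[int]]
using IntToFloat = std::function<float(int)>;   // int → float

// A function whose type is int → float.
float half(int n) { return n / 2.0f; }

// Type constructors nest: the element type of [[int]] is [int].
static_assert(std::is_same<IntMatrix::value_type, IntArray>::value,
              "an array of arrays of int holds arrays of int");
```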
User-constructed types • The operators (), [], and → are type constructors • They take types as arguments and define new types • Once you have type constructors, your type system can contain as many types as you like • The type-checker has to cope with all of this, providing a syntax for programmers to write all these types if necessary
Polymorphism • When introducing polymorphic constructs into the language, type constructors are not enough • Type of the Length function for arrays of integers: • [int] → int • Type of the Length function for arrays of anything: • forall (T) { [T] → int } • Introducing polymorphic types into a type system is analogous to introducing functions into a programming language • The above type could also be written: • forall (U) { [U] → int } • U is a type variable and “forall” provides type abstraction • Use of “forall” is called universal quantification because any type can be plugged in to U • Polymorphic types can be specialized: • type V = forall (U) { [U] → int } • type W = V<string>
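In C++ the closest everyday counterpart to forall is a function template. A minimal sketch, using std::vector for the array type; LengthOfStrings and lengthOfStrings are hypothetical names for the specialized type and value.

```cpp
#include <string>
#include <vector>

// forall (U) { [U] → int }: one definition that works for every element type U.
template <typename U>
int length(const std::vector<U>& xs) {
    return static_cast<int>(xs.size());
}

// Specializing the polymorphic type at U = string gives an ordinary
// function type, [string] → int.
using LengthOfStrings = int (*)(const std::vector<std::string>&);
const LengthOfStrings lengthOfStrings = &length<std::string>;
```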
Why have type notation? • Why do we feel the need to write out these complicated types? • If you’re writing a function, you only need to write the types of the return value and arguments, not the type of the function itself • Two reasons: • If you’re programming with higher order functions (which we’ll be doing more of in the future) it’s helpful to write these types • These functions do have types, regardless of whether we’re writing them out – it would be nice to have a standard notation
Bounded quantification • Bounded quantification is the idea that only some types can be plugged in to U • For example, if you had a Length function which could only be used on arrays of different kinds of numbers, you could write: • type T = forall (U :: U <= number) { [U] → int } • U is constrained to be a number (or subtype thereof) • But what if you do this? • type V = T<string> • Is that a type mismatch?
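A sketch of a bound in C++11. It assumes std::is_arithmetic is an acceptable stand-in for “U <= number” (that trait also admits bool and the character types, so it is only an approximation). IsNumber and numericLength are hypothetical names.

```cpp
#include <string>
#include <type_traits>
#include <vector>

// The bound: which types may be plugged in for U.
template <typename U>
struct IsNumber : std::is_arithmetic<U> {};

// forall (U :: U <= number) { [U] → int }
template <typename U>
int numericLength(const std::vector<U>& xs) {
    static_assert(IsNumber<U>::value, "U must be a number type");
    return static_cast<int>(xs.size());
}

// T<string> violates the bound. Instantiating numericLength with
// std::string would stop compilation at the static_assert above.
static_assert(!IsNumber<std::string>::value, "string is not a number");
```

The rejection of T<string> happens while checking the type expression itself, before any value is involved. It is a mismatch, but one level up from ordinary type errors.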
Types and kinds • In the spirit of Russell, computer scientists generally like to keep these levels separate. • Higher-level types which ensure correctness of types are called “kinds” • This level of checking is referred to as “kind checking” • There are countless papers floating around with titles like “Is type a type?” and “A new programming language with type : type” • They are exploring the question of whether a type system can operate on itself, or whether levels should be kept separate.
C++ bounded quantification question
    template <typename T>
    class Copier {
        T myStruct;
    public:
        void copy () { myStruct.x = myStruct.y; }
    };

    struct IntPair { int x, y; };
    struct FloatPair { float x, y; };
    struct BogusPair { float x; char* y; };

    int main() {
        Copier<IntPair> cip;
        cip.copy();
        Copier<FloatPair> cfp;
        cfp.copy();
        Copier<BogusPair> cbp;
        // cbp.copy();
    }
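In C++ the declaration Copier<BogusPair> cbp compiles as written, because member functions of a class template are only instantiated when they are used. The error appears only if cbp.copy() is uncommented. The bound on T is implicit and checked late. A sketch of making the bound explicit follows. HasCopyableXY and CheckedCopier are hypothetical names, and the constructor and get() are added only so the result can be observed.

```cpp
#include <type_traits>
#include <utility>

struct IntPair   { int x, y; };
struct FloatPair { float x, y; };
struct BogusPair { float x; char* y; };

// The bound, written out: T must have members x and y such that
// "x = y" is a valid assignment. The doubled parentheses make decltype
// yield an lvalue reference type, which is what an assignment target needs.
template <typename T>
struct HasCopyableXY
    : std::is_assignable<decltype((std::declval<T&>().x)),
                         decltype((std::declval<T&>().y))> {};

template <typename T>
class CheckedCopier {
    static_assert(HasCopyableXY<T>::value, "T::y must be assignable to T::x");
    T myStruct;

public:
    explicit CheckedCopier(T initial) : myStruct(initial) {}
    void copy() { myStruct.x = myStruct.y; }
    const T& get() const { return myStruct; }
};
```

With this version, even declaring a CheckedCopier<BogusPair> is rejected, whether or not copy() is ever called. A type with no x or y member at all still produces a hard error inside the trait rather than a clean false.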
Existential quantification • The type of a function that takes an array of T and returns an integer, for some single type T: exists(T) { [T] → int } • At this point we have implicitly defined a type T • We know nothing about type T, except that… • A function of the above type could take a list of them and return an int • T is intuitively a little like a class • It is a type, and we don’t know anything about how it works, but we know a way in which we can use it • Universal and bounded quantification provide the theoretical basis for parameterized types • Existential quantification provides the theoretical basis for information hiding
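One common way to get the information-hiding effect in C++ is an abstract interface with a hidden implementation: clients know that some representation type exists and which operations it supports, and nothing else. The counter below is a hypothetical example invented for this sketch. It is a loose encoding of an existential type, not a formal one.

```cpp
#include <memory>

// What clients can see: "there exists some representation that
// supports next() and read()". The representation itself is not named.
class Counter {
public:
    virtual ~Counter() {}
    virtual void next() = 0;
    virtual int read() const = 0;
};

namespace {
// The hidden witness type. It could be swapped for another
// representation without any client code changing.
class IntCounter : public Counter {
    int count_ = 0;

public:
    void next() override { ++count_; }
    int read() const override { return count_; }
};
}  // namespace

// The only way to obtain a Counter. Callers never learn the concrete type.
std::unique_ptr<Counter> makeCounter() {
    return std::unique_ptr<Counter>(new IntCounter());
}
```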
Type inference • As type systems become more complicated, it becomes more burdensome for programmers to write out types • Would you write expressions like this? forall (U :: U <= number) { [U] → int } myFunction (…) { … } • No – you would just avoid higher order functions • The solution is type inference
Type inference • Allows programmers to omit type declarations and have the compiler infer them • Promises all of the expressiveness of dynamic languages, but with static type safety • Research in this area has come a long way, but there are still valid, type-safe programs which type inference engines cannot handle • Rudimentary type inference (for local variables) is coming to .NET with C# 3.0 • Given that type theory experts like Simon Peyton Jones are at Microsoft we can expect to see this area of .NET evolve rapidly
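C++11 offers a similarly rudimentary form of inference: auto for local variables and argument deduction for templates. A minimal sketch follows. The function largest is a hypothetical example and assumes a non-empty vector.

```cpp
#include <type_traits>
#include <vector>

// T is never written at the call site; the compiler infers it
// from the argument. Assumption: xs is not empty.
template <typename T>
T largest(const std::vector<T>& xs) {
    auto best = xs.front();        // inferred as T
    for (const auto& x : xs) {     // inferred as const T&
        if (best < x) best = x;
    }
    return best;
}

// The inferred result type is still fully static: passing a
// vector of double makes the call's type double.
static_assert(
    std::is_same<decltype(largest(std::vector<double>{})), double>::value,
    "result type is inferred from the argument");
```

The programmer writes no type at the call largest(v), yet a call with an element type that lacks operator< is still rejected at compile time.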
Ideas for the future • Continue improving type system expressiveness and type inference engines • How about having the type-checker interact with the programmer? e.g. • “Can I assume this will always be an odd number?” • “Can I assume that no instances of this class are constructed outside of this source tree?” • How about monitoring running programs to generate better type annotations for use in future compilations? • How about a graphical interface for creating and manipulating type information?
Conclusion • Type-checking is not a simple, tidy field • It’s a matter of tradeoffs and judgment • More expressiveness means that programming languages can become more powerful and polymorphic without compromising type safety • However more expressiveness = more pain for programmers • Working with higher-order functions is great, but not if you have to type 10 lines of type declarations for each line of code • Type inference is a promising solution • Holy grail is to provide all the power of dynamic languages like Lisp, Python, and Ruby with the type safety of C++ and no need to write a single type declaration