Programming Languages 2nd edition Tucker and Noonan

Programming Languages2nd editionTucker and Noonan Chapter 5 Types Types are the leaven of computer programming; they make it digestible. Robin Milner

5.1 Type Errors 5.2 Static and Dynamic Typing 5.3 Basic Types 5.4 Non-Basic Types-through 5.4.2 5.5 Recursive Data Types 5.6 Functions as Types 5.7 Type Equivalence 5.8 Subtypes 5.9 Polymorphism and Generics 5.10 Programmer-Defined Types

What is a Type? • A type is a collection of values and operations defined on those values. • Example: Integer type has values ..., -2, -1, 0, 1, 2, ... and operations +, -, *, /, <, ... • Boolean type has values true and false and operations ||, &&, !, … • Most types have a finite number of values due to fixed size allocation (e.g., can’t express all integers) • Exception: some languages (Java, Haskell) support “unlimited” integer sizes

How Many Types Are Needed? • Early languages, (Fortran, Algol, Cobol), built in all types – no way to define new ones. • If you needed a type color, for example, you could use integers (red = 1, black = 2, etc.) • But - no way to prevent illegal operations. • Types provide ways of effectively modeling a problem solution, so the user should be able to define new types.

5.1 Type Errors • Machine data carries no type information. • Example: 0100 0000 0101 1000 0000 0000 0000 0000 could represent • 3.375 (floating point) • 1,079,508,992 (a 32-bit integer) • 16472 & 0 (two 16-bit integers) • The ASCII characters @ X NUL NUL depending on the interpretation

A type error is an error that arises because an operation is attempted on a data type for which it is undefined. • Type errors are common in assembly language programming; e.g., add two floats using an integer machine operation • Type systems in high level languages reduce the number of type errors by formalizing what is and isn’t allowed. • A typesystem defines the bindings between a variable’s type, values, and the operations that can be performed on the values. • provides a basis for detecting type errors.

5.2 Static and Dynamic Typing • A type systems include constraints that cannot be expressed in BNF, EBNF, or any other notation system for context-free grammars. • Type checking is part of semantic processing • Some languages perform type checking at compile time (e.g, C/C++). (compile-time semantics) • Others (e.g., Perl, Python) perform type checking at run time. • Still others (e.g., Java) do both.

Static/Dynamic Definitions • A language is staticallytyped if the types of all variables are fixed when they are declared at compile time. • A language is dynamicallytyped if the type of a variable can vary at run time depending on the value assigned.

Definition: Strong Typing • A language is stronglytyped if its type system allows all type errors to be detected (either at compile time or at run time.) • Alternate definitions of strong typing exist, however • A strongly typed language can be either statically (Ada, Java) or dynamically (Scheme, Perl) typed • By this definition, C & C++ aren’t strongly typed • For example, union types aren’t type checked

5.3 Basic Types • Consist of atomic values (e.g., floats) • Sometimes called simple or primitive types. • Traditionally, correspond for the most part to data types provided by computers and occupy fixed amounts of memory, based on computer memory structure.

Common Memory Sizes for the Basic Types (On 32-bit Computers) • Nibble: 4 bits • Byte: 8 bits • Half-word: 16 bits • Word: 32 bits • Double word: 64 bits • Quad word: 128 bits In most languages, simple data types are limited to one of these sizes

Basic Types in C, Ada, Java Type C Ada Java Byte --- --- byte Integer short, int, long integer short, int, long Real float, double float, decimal float, double Character char character char Boolean---booleanboolean

Memory Requirements • In some languages some bindings of type to memory size are implementation dependent. • For example, in C/C++ the only language restriction for integers is that sizeOf(short)<=sizeOf(int)<= sizeOf(long) • Java defines size of each (half-word, word, double word)

Floating Point Representation • Floating point numbers (reals) are expressed as the product of a binary number and a power of 2. • IEEE standard is used on computers designed/built after 1980 • If s = sign bit, e = exponent (in excess 127 notation) and m is the mantissa, a single precision representation means (-1)s x 1.m x 2e-127

IEEE 754 Floating Point Number Representation Figure 5.1 (-1)s x 1.m x 2e-127

Floating Point Problems • Integers are exact, floats aren’t necessarily so. • e.g., since 0.2 cannot be represented exactly as a binary fraction, the floating point representation is not exact; result: 0.2 * 5 is not exactly 1.0 • Overflow and underflow are possible • Inaccuracies due to floating point arithmetic may affect results; for example, it’s possible thata + (b + c) ≠ (a + b) + c

Example: a = -1324x103, b = 1325 x 103, c = 5424 x 100 • Assume 4 significant figures for the fraction. (a + b) = -1324 x 103 + 1325 x 103 = 1 x 103 (a + b) + c = 1000 x 100 + 5424 x 100 = 6424 x 100 • Whereas (b + c) = 1325 x 103 + 0005 x 103 = 1330 x 103 a + (b + c) = -1324 x 103 + 1330 x 103 =6 x 103 = 6000 x 100 This is sometimes called representational error

Floating Point Range • Single precision floating point range: • +/- 1038 to 10-38 • Double precision: • +/- 10308 to 10-308 • Overflow: the number is too large to represent • Underflow: the number is so close to zero it cannot be represented

Overflow & Underflow • Arithmetic overflow may occur and give an incorrect result, regardless of whether the values are integers or floats. • Some machines generate an exception when this happens, some (e.g., Java Virtual Machine) don’t. • Underflow occurs when the absolute value of the exponent is too large to fit in the exponent field – only a problem with FP numbers • Some machines flag overflow, some (e.g., JVM, don’t. • With underflow, the number may be set to 0 or some other small number and the hardware may set some kind of status bit or interrupt flag.

Java’s BigInteger and BigDecimal type http://java.sun.com/j2se/1.5.0/docs/api/java/math/BigInteger.html arbitrary-precision integers, semantics of arithmetic operations exactly mimic those of Java's integer arithmetic operators http://java.sun.com/j2se/1.5.0/docs/api/java/math/BigDecimal.html arbitrary-precision signed decimal numbers, consist of an arbitrary precision integer unscaled value and a 32-bit integer scale. A non-negative scale represents the #of digits to the right of the decimal, a negative scale means to multiply the unscaled value by 10-scale. • Some applications; e.g., computer security, need very large numbers. • Not directly supported by hardware – software emulation.

Using Data Types in Expressions • When typed variables, literals, functions are used in programming language statements, several issues arise: • Overloading • Mixed-type expressions • Mixed-type assignments

Operator Overloading • An operator or function is overloaded when its meaning varies depending on the types of its operands or arguments or result. • More about function overloading later • Java: a + b • floating point add • integer add • string concatenation • What if a is an integer and b is a float?

Mixed Mode Expressions • Mixed mode: operands have different types; e.g., one operand an int, the other floating point • Why is it a problem? (Difference in hardware operations) • Most languages have type rules that define how mixed mode expressions will be treated. • Occur in assignments as well as expressions.

Mixed Mode & Type Conversions • Compilers insert code that performs temporary type conversions to make mixed type operands compatible. • Coercion: The conversions are done automatically by the language according to a set of type rules. (implicit conversions) • Casting: The conversion must be specifically requested by the user. (explicit conversions)

Type Conversions • A type conversion is a narrowing conversion if the result type permits fewer bits, thus potentially losing information. • e.g., float to int, or long to short • Otherwise it is termed a widening conversion.

Implicit v Explicit Conversions • Language design decision: Should type conversions be done automatically by the compiler (implicit) or should the language require a cast (explicit)? • Safest: implicit widening conversions are OK; narrowing conversions should be explicit.

Implicit v Explicit Type Conversions intsum; intn; float average; average = sum / n; float num1; float num2; intsum; sum = num1 + num2; C++ widening example C++ narrowing example

Other Basic Types – Design Decisions • Should the language include a Boolean type? • What character set should be used? • Traditionally, ASCII: 7-bit code stored in an 8-bit byte, which allows for 127 characters • ASCII Latin-1 variant uses 8 bits to encode 191 characters for languages with larger alphabets • EBCDIC: 8-bit code developed by IBM • Unicode: UTF-8, UTF-16, UTF-32. Java uses UTF-16 (16-bit code); currently has codes for about 50,000 characters.

5.4 Nonbasic Types • Nonbasic types: • Composed from other data types (e.g., arrays, lists, … ) • Also called structured types • Size of the values of these data types is not fixed by the language but is programmer determined. • Don’t correspond directly to hardware data types.

5.4.1 Enumeration Types • Provided by many languages as a way to improve program readability • Essentially, allow the programmer to assign names to integer values and then use the names in programming (e.g., color, shape, day, …)

Enumeration Types – C/C++ • Example: • enum Day {Monday, Tuesday, Wednesday, Thursday, Friday, Saturday, Sunday}; • Day myDay = Wednesday; • Values can be compared: • if (myDay == Sunday) ... • Implicit enumeration-type-to-integer conversion is okay; use a cast to go the other direction: • for(myDay = Monday; myDay < Friday; Day(myDay + 1))

Enumeration Types - Java • More powerful than C/C++ version because the Enum class inherits methods from the Java class Object, such as toString. • for (Day d : Day.values()) • System.out.println(d);

5.4.2 Pointers • Design decision: to have pointer types, or not? • C, C++, Ada, Pascal • Java??? • History • Early languages (Fortran, COBOL): static memory • Later languages added pointers for dynamic memory allocation (Pascal, C, C++, Ada) • Modern languages like Java do not have explicit pointers but still support dynamic memory allocation. Reference variables are the equivalent of pointer variables. • Scheme, Prolog, Python, dynamically typed languages in general also have implicit pointers.

Pointer Operations • Pointers and references support the construction of linked data structures/data types. • The value of a pointer type variable is a memory address. • C/C++ operators: • address-of (unary &) returns the address of a variable • dereferencing (unary *) returns the value of a reference • Corresponds to indirect addressing mode in machine code • For an arbitrary variable x: *(&x) = x

C/C++ Examples – Linked List Definitions struct Node { int key; struct Node* next; }; struct Node* head; Node is a recursive data structure

Fig 5.4: A Simple Linked List in C/C++

Pointer Pro and Cons • Pro: Able to build structures that are the “right” size & shape • Dynamic data allocation • “Linkability” • But:…. • Error-prone (mistakes in usage) • Memory leaks (from non-freed dynamic memory) • In C/C++, the duality between pointers and array references can be confusing

C Array/Pointer Example Semantically Identical Functions floatsum(floata[], intn) { inti; float s = 0.0; for (i = 0; i<n; i++) s += a[i]; return s; float sum(float *a, intn) { inti; float s = 0.0; for (i = 0; i<n; i++) s += *a++; return s; Array Notation Pointer Notation

strcpy: A “bad” (good?) C example void strcpy(char *p, char *q) { while (*p++ = *q++) ; // assignment expression ↑ } Strings p and q are arrays of characters ending in a NUL character, which is interpreted as false. (Note that if p is shorter than q, the array boundaries will be overrun.)

Reference Types in Java • In Java, all nonprimitive types are reference types • A variable from a reference type stores the address of a memory object; i.e., it is the equivalent of a pointer variable . • References differ from pointers in that they don’t require specific dereferencing operations. • Dynamic allocation and the construction of linked structures can be done with references as well as with pointers.

5.4.3Arrays and Lists • Arrays: • Sequence of values, accessible by index • All array values are from the same type • Design issues: • Number of dimensions/declaration syntax/index type • Are array dimensions static or dynamic? • Must the array be rectangular? • Are the boundaries of the index set determined by the language or the programmer? • Should the language enforce bounds checking?

Design Issues for Arrays • Declaration syntax/number of dimensions • Technically, C/C++ only allow one dimensionint C[4][3]; //array of 4 3-element rows • Compare to Pascal:TYPE Num = ARRAY[1..4, 1..3] OF Integer;VAR Num:C;and Fortran:integer C(4,3)early Fortran version: DIMENSION C(4,3) • Early languages set an upper limit on the number of dimensions

Array Design Issues • Index type • Usually restricted to integers • Pascal allows any integral type: integers, char, enum

Array Design Issues • Are the bounds of the index set fixed? • C-like languages: (0, …, n-1) • Fortran: defaults to (1, 2, …, n) but modifiable: • INTEGER Counts (-20:20) or twoD (0:3, 1:10) • Pascal & Ada: programmer determined • Ada: a : array(-4..5) of INTEGER; • Pascal: TYPE nums=ARRAY[10…19]OFIntegernums: Numbers

Array Design Issues • Are array dimensions static or dynamic? • Pascal: always static; array dimensions are part of the array type. • Java arrays are sized at runtime, don’t change thereafter • C/C++ permits runtime declaration of array bounds for dynamically declared arrays:int numbers[100];//static sized arrayint* intPtr;cin >> n;intPtr = new int[n];//dynamic array

Array Design Issues • Must the array be rectangular? (that is, should every row be the same length) • C# has jagged arrays, with rows of different length:An array of arrays. int [ ][ ] numbers = new int [3][ ]; // three rowsnumbers[0] = new int[3]; // # of elements in row 0 numbers[1] = new int[4]; // # of elements in row 1 numbers[2] = new int[2]; // # of elements in row 2 numbers[0][0] = 1; Numbers[0][1] = 2; // etc. • VB.Net also permits jagged arrays

Design Issues • Should the language enforce bounds checking? • Java and Ada do; Pascal provides it as an option • Tradeoff – safety versus efficiency • Overwriting array boundaries is one of the most common (and hardest to find) errors in a program • C & C++ generate a memory protection error if the reference is outside the boundaries of the program address space, otherwise no error message is given

Implementation Issues • Dope Vector: array data needed at compile time or run time or both; at a minimum: • Element size & type • Index type/range for each dimension • Start address of array in memory is also required and kept in the symbol table. • Used to calculate memory addresses, & do index range checking if included in language • Building dope vectors is part of the semantic analysis phase of translation

Example of Bounds Checking/Java • Consider the Java declarationint[ ] A = new int[n]; • At runtime, every reference A[i] must be checked: is 0 ≤ i < n ? • Portions of the dope vector must be present at run-time for checking. • C/C++ don’t do bounds checking; so array size not needed at runtime.

Dope Vectors For a one-dimensional array of integers, where integers take 4 bytes of storage • For a two-dimensional array of characters, where characters take 2 bytes

Programming Languages 2nd edition Tucker and Noonan