Symbols and Type-Checking

Symbols and Type-Checking CPSC 388 Ellen Walker Hiram College

Symbol Table is Central • For scanning & parsing • Distinguish “identifier” vs. “keyword” • Tree “decorations” during parsing • For semantic analysis • Insertions / deletions from declarations / end of scope • Type-checking and making sure variables are declared • Code generation • Associating addresses and/or labels with symbols

Symbol Table • Dictionary Data Structure (Hash table) • Insert / Lookup / Delete • Search key is symbol name • Additional attributes in node class (struct) • E.g. const (value), type, function/variable

A Note on Hash Functions • Hash function should use complete name (all characters) • Avoid collisions with “temp1, temp2, temp3…” • Include character positions (don’t simply add up characters) • Avoid collisions between “tempx” and “xtemp” • Use mod function often to avoid overflow • (a+b)%m = (a%m + b%m)%m and (a*b)%m = ((a%m)*(b%m))%m
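As a sketch of these rules (not code from the course), the C++ function below hashes a name with Horner's rule, so every character and its position contribute, and reduces modulo the table size at each step as the identities above allow. The names `hashName`, `SIZE`, and `SHIFT`, and the constants 211 and 31, are illustrative assumptions.

```cpp
#include <cassert>
#include <string>

// Illustrative constants (assumptions): a prime table size and a small multiplier.
constexpr unsigned SIZE = 211;
constexpr unsigned SHIFT = 31;

// Horner's rule: h = (...((c0*SHIFT + c1)*SHIFT + c2)...) mod SIZE.
// Every character is used, and its position changes its weight.
unsigned hashName(const std::string& name) {
    unsigned h = 0;
    for (unsigned char c : name) {
        // Reducing mod SIZE at every step keeps h < SIZE, so h * SHIFT + c
        // stays far below the unsigned limit.
        h = (h * SHIFT + c) % SIZE;
    }
    return h;
}
```

With these constants, “tempx” and “xtemp” land in different buckets, which a plain sum of character codes could not achieve.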

Symbol Types & Attributes • Constant • final int SIZE = 199 • (constant, type=int, value=199) • Variable • int a; • (variable, type=int)

More Types & Attributes • Structure • struct Entry {char *name; int count;}; • (structure, size=64 bits, assuming 32-bit pointers and ints) • Function • int myFun(char *foo, int bar) • (function, 2 parameters, char* + int)
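One hypothetical way to hold these attributes is a single record per symbol, stored in a standard hash-based dictionary keyed by name. `Kind`, `SymbolInfo`, and `SymbolTable` are illustrative names, and storing types as strings is a simplification; a real compiler would more likely point to type nodes.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical attribute record; the field names are illustrative only.
enum class Kind { Constant, Variable, Structure, Function };

struct SymbolInfo {
    Kind kind;
    std::string type;                     // declared type (return type for functions)
    long value = 0;                       // used for constants only
    unsigned sizeBits = 0;                // used for structures only
    std::vector<std::string> paramTypes;  // used for functions only
};

// The symbol table itself: a hash-based dictionary keyed by symbol name.
using SymbolTable = std::unordered_map<std::string, SymbolInfo>;
```

For example, `table["SIZE"] = {Kind::Constant, "int", 199};` records the constant from the previous slide.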

Declaration Before Use • Every symbol is declared before its first use • Declaration inserts all attributes into symbol table • On each use, look up the “id” in the table • If declared, all attributes available • Else compilation error • Allows for “one-pass” compilation

Implicit Declaration • Symbols are inserted into table when first seen • Default attributes (e.g. an undeclared C function returns int in C89, Fortran variable type chosen by first letter) • Attributes determined by use (e.g. lhs of assignment gets type of rhs of assignment)
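A minimal sketch of implicit declaration, assuming the symbol table simply maps names to type names: a name not yet in the table is inserted with its default type the first time it is seen, instead of causing an error. Fortran's default rule (names starting with I through N are INTEGER, all others REAL, absent an IMPLICIT statement) supplies that type. `implicitFortranType` and `lookupOrDeclare` are hypothetical helper names.

```cpp
#include <cassert>
#include <cctype>
#include <string>
#include <unordered_map>

// Fortran default typing: first letter I..N means INTEGER, otherwise REAL.
std::string implicitFortranType(const std::string& name) {
    assert(!name.empty());  // identifiers are never empty
    char first = static_cast<char>(std::toupper(static_cast<unsigned char>(name[0])));
    return (first >= 'I' && first <= 'N') ? "INTEGER" : "REAL";
}

// Implicit declaration: insert on first sight with the default type,
// then return the (possibly just inserted) type.
const std::string& lookupOrDeclare(std::unordered_map<std::string, std::string>& table,
                                   const std::string& name) {
    auto it = table.find(name);
    if (it == table.end())
        it = table.emplace(name, implicitFortranType(name)).first;
    return it->second;
}
```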

Scope / Lifetime of a Symbol • Scope: where is symbol visible? • Global • Within function • Within block ({…}) • Lifetime: when is memory allocated? • Static: for the entire program run • Automatic: only while the enclosing function or block is executing • Dynamic: explicit alloc/dealloc (run-time only)

Scope / Lifetime Example (C++)
#include <iostream>
int x;                     // x is global, static
int count() {
    static int t = 0;      // t is local to count, static
    return ++t;
}
int main() {
    std::cin >> x;
    for (int i = 0; i < x; i++)   // i is local to for, automatic
        count();
}

Nested Blocks
procedure A {
    int x       // visible in A but not B
    int y       // visible in A and B
    procedure B {
        int x   // visible in B only
        …
    }
}

Nested Blocks in Symbol Table • When variable becomes visible, insert into symbol table • In front of any other variable with same name • Innermost visible variable “shadows” all others • When variable is no longer visible, delete it • Outer declaration is uncovered

Implementations • Sorted list • New key must precede equal keys, stop at first match • Binary Tree • Always go left on equal, stop at NULL left child • Hash table • Insert at beginning of collision list, stop at first match
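The hash-table option can be sketched as follows, assuming chained buckets built from `std::forward_list` and string type names (both illustrative choices). `insert` pushes onto the front of the bucket so the innermost declaration shadows older ones, `lookup` stops at the first match, and `remove` deletes that first match at the end of a scope, uncovering any outer declaration. `ScopedSymtab` is a hypothetical class name.

```cpp
#include <cassert>
#include <cstddef>
#include <forward_list>
#include <optional>
#include <string>
#include <vector>

class ScopedSymtab {
    struct Entry { std::string name; std::string type; };
    std::vector<std::forward_list<Entry>> buckets;  // each bucket is a collision list

    std::forward_list<Entry>& bucketFor(const std::string& name) {
        std::size_t h = 0;
        for (unsigned char c : name) h = (h * 31 + c) % buckets.size();
        return buckets[h];
    }

public:
    explicit ScopedSymtab(std::size_t size = 211) : buckets(size) { assert(size > 0); }

    // Declaration becomes visible: insert at the FRONT of the collision list.
    void insert(const std::string& name, const std::string& type) {
        bucketFor(name).push_front({name, type});
    }

    // Stop at the first match: the innermost visible declaration.
    std::optional<std::string> lookup(const std::string& name) {
        for (const Entry& e : bucketFor(name))
            if (e.name == name) return e.type;
        return std::nullopt;
    }

    // End of scope: delete the innermost entry, uncovering the outer one.
    void remove(const std::string& name) {
        auto& bucket = bucketFor(name);
        auto prev = bucket.before_begin();
        for (auto it = bucket.begin(); it != bucket.end(); prev = it++) {
            if (it->name == name) { bucket.erase_after(prev); return; }
        }
    }
};
```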

Explicit Scope Operator • Some languages provide an explicit scope operator, e.g. String::last(“abc”) //don’t use a local last fn • To implement, each symbol needs a block id • E.g. name of enclosing function or class

Same-Level Duplicates • Disallowed in most languages • Look up symbol before adding • If symbol is in current block, error • Requires block id (or equivalent) in symbol table • If allowed, later declaration would shadow earlier one • Compiler implementation same as nesting • Code is very confusing!
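A sketch of the duplicate check, assuming each entry records the nesting level of its declaring block as the block id. This version keeps a per-name stack of declarations in a `std::unordered_map` rather than the chained buckets of the hash-table option; `BlockSymtab` and its methods are hypothetical names.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

class BlockSymtab {
    struct Decl { std::string type; int level; };
    std::unordered_map<std::string, std::vector<Decl>> table;  // back() = innermost
    std::vector<std::vector<std::string>> blocks;              // names declared per open block

public:
    BlockSymtab() : blocks(1) {}  // level 0 (global) is open from the start

    int level() const { return static_cast<int>(blocks.size()) - 1; }
    void enterBlock() { blocks.emplace_back(); }

    // End of scope: delete this block's declarations, uncovering outer ones.
    void exitBlock() {
        assert(level() > 0);
        for (const std::string& name : blocks.back()) {
            auto it = table.find(name);
            it->second.pop_back();
            if (it->second.empty()) table.erase(it);
        }
        blocks.pop_back();
    }

    // Returns false (a semantic error) if name is already declared in the
    // current block; a declaration in an outer block is simply shadowed.
    bool declare(const std::string& name, const std::string& type) {
        std::vector<Decl>& decls = table[name];
        if (!decls.empty() && decls.back().level == level()) return false;
        decls.push_back({type, level()});
        blocks.back().push_back(name);
        return true;
    }

    const std::string* lookup(const std::string& name) const {
        auto it = table.find(name);
        return it == table.end() ? nullptr : &it->second.back().type;
    }
};
```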

Sequential Evaluation?
int i = 5;
{
    int i = 7;
    int j = 1+i;   // j=8 if sequential,
    …              // j=6 if collateral (parallel)
}
• Collateral implementation might be more efficient (ML, LISP)
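The difference can be made concrete with two binding strategies over a toy environment. This is a hypothetical model, not how any particular language is implemented: a binding has the form `name = constant + ref`, sequential binding lets each right-hand side see the bindings before it, and collateral binding evaluates every right-hand side in the outer environment first. (For comparison, Lisp's `let` binds in parallel and `let*` binds sequentially.) `Binding`, `bindSequential`, and `bindCollateral` are illustrative names.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

using Env = std::map<std::string, int>;

// Hypothetical binding: name = constant + ref (ref empty means no variable term).
struct Binding { std::string name; int constant; std::string ref; };

int evalRhs(const Binding& b, const Env& env) {
    return b.constant + (b.ref.empty() ? 0 : env.at(b.ref));
}

// Sequential: each right-hand side sees the bindings made before it.
Env bindSequential(Env env, const std::vector<Binding>& bs) {
    for (const Binding& b : bs) env[b.name] = evalRhs(b, env);
    return env;
}

// Collateral: every right-hand side is evaluated in the OUTER environment,
// so the order of the bindings cannot matter.
Env bindCollateral(const Env& outer, const std::vector<Binding>& bs) {
    Env result = outer;
    for (const Binding& b : bs) result[b.name] = evalRhs(b, outer);
    return result;
}
```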

Recursive Declaration
int factorial(int x) {        // recursive function
    if (x > 0) return x * factorial(x - 1);
    else return 1;
}
class node {                  // recursive data structure
    int value;
    node *next;
};

Implementing Recursive Declarations • Get name into symbol table as soon as possible • Before finishing function or structure • E.g. decl: name ( args ) {/*update symtab*/} statement-block {/*generate code*/} • Once symbol is in table, it’s ok to use • Using a symbol is not re-declaring it! • Prototype also gets name into symbol table

Mutual Recursion & Prototypes
int B(int x);                 // prototype for B
int A(int x) {                // calls B
    // B is already in the symbol table from the prototype
    if (x > 0) return B(x - 1);
    return 0;
}
int B(int x) {                // calls A
    if (x != 1) return A(x / 2);
    return 1;
}
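To see why inserting the name early (or via a prototype) is enough, the checker below processes a hypothetical program one declaration at a time, in a single pass. Each `FunDecl` records only its name, whether it has a body, and which names its body calls; these are simplifying assumptions. The name is inserted before the calls are checked, so direct recursion works, and a prototype inserts a name without a body.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical model: a declaration is a name plus the names its body calls;
// a prototype is a declaration with no body.
struct FunDecl {
    std::string name;
    bool hasBody;
    std::vector<std::string> calls;
};

// One pass, top to bottom: returns every call to a not-yet-declared name.
std::vector<std::string> undeclaredCalls(const std::vector<FunDecl>& program) {
    std::set<std::string> symtab;
    std::vector<std::string> errors;
    for (const FunDecl& f : program) {
        symtab.insert(f.name);       // name goes in BEFORE the body is checked
        if (!f.hasBody) continue;    // a prototype only declares the name
        for (const std::string& callee : f.calls)
            if (!symtab.count(callee)) errors.push_back(callee);
    }
    return errors;
}
```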

Declaration Example (p. 311) • let declarations w/ initialization in exp • let x=3, y=5 in x+y • let x=3 in (let x=5 in x+1) • Attributes (for creating symbol tables) • symtab Current symbol table • nestl Current nesting level • err Boolean - is it an error? • intab/outtab Tables before/after declaration

Declaration Attribute Rules
S -> exp                 // initialization & finalization
    exp.symtab = emptytable
    exp.nestl = 0
    S.err = exp.err
exp -> id                // id must be in symbol table
    exp.err = not isin(exp.symtab, id.name)

Initialization Attribute Rule
decl -> id = exp
    exp.symtab = decl.intab      // current symbols
    exp.nestl = decl.nestl       // current nest level
    decl.outtab =                // output table w/ new id
        if (decl.intab == errtab) || exp.err || (lookup(decl.intab, id.name) == decl.nestl)
        then errtab
        else insert(decl.intab, id.name, decl.nestl)

Let Statement Attribute Rule
exp1 -> let dec-list in exp2
    dec-list.intab = exp1.symtab
    dec-list.nestl = exp1.nestl + 1     // nesting
    exp2.symtab = dec-list.outtab
    exp2.nestl = dec-list.nestl
    exp1.err = (dec-list.outtab == errtab) || exp2.err
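The attribute rules above can be transcribed into a recursive checker for a small version of the let-language; this is a sketch under stated assumptions, not the textbook's code. The symbol table maps each name to the nesting level of its innermost declaration, `std::nullopt` plays the role of `errtab`, the inherited attributes (`symtab`, `nestl`) become parameters, and the synthesized `err` is the return value. The rule for `dec-list` itself is not on the slides; the sketch assumes declarations are processed left to right, each seeing the table produced by the ones before it. `Exp`, `err`, `programErr`, and the builder functions are hypothetical names.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// name -> nesting level of its innermost declaration; nullopt stands for errtab.
using Symtab = std::map<std::string, int>;
using Table = std::optional<Symtab>;

struct Exp;
using ExpPtr = std::shared_ptr<Exp>;

// exp -> id | num | exp + exp | let dec-list in exp   (num values are not needed here)
struct Exp {
    enum Kind { Id, Num, Plus, Let } kind;
    std::string name;                                   // Id
    ExpPtr left, right;                                 // Plus
    std::vector<std::pair<std::string, ExpPtr>> decls;  // Let: id = exp, ...
    ExpPtr body;                                        // Let
};

ExpPtr id(std::string n) { return std::make_shared<Exp>(Exp{Exp::Id, std::move(n)}); }
ExpPtr num() { return std::make_shared<Exp>(Exp{Exp::Num}); }
ExpPtr plus(ExpPtr a, ExpPtr b) { return std::make_shared<Exp>(Exp{Exp::Plus, "", a, b}); }
ExpPtr let(std::vector<std::pair<std::string, ExpPtr>> ds, ExpPtr body) {
    return std::make_shared<Exp>(Exp{Exp::Let, "", nullptr, nullptr, std::move(ds), body});
}

// Computes exp.err from the inherited attributes exp.symtab and exp.nestl.
bool err(const ExpPtr& e, const Symtab& symtab, int nestl) {
    switch (e->kind) {
    case Exp::Id:   return symtab.count(e->name) == 0;   // not isin(symtab, id.name)
    case Exp::Num:  return false;
    case Exp::Plus: return err(e->left, symtab, nestl) || err(e->right, symtab, nestl);
    case Exp::Let: {
        Table tab = symtab;            // dec-list.intab = exp1.symtab (a copy)
        int level = nestl + 1;         // dec-list.nestl = exp1.nestl + 1
        for (const auto& [name, init] : e->decls) {      // decl -> id = exp
            if (!tab || err(init, *tab, level)) { tab = std::nullopt; continue; }
            auto it = tab->find(name);
            if (it != tab->end() && it->second == level) tab = std::nullopt;  // same-level duplicate
            else (*tab)[name] = level;                                        // insert (shadows outer)
        }
        // exp1.err = (dec-list.outtab == errtab) || exp2.err
        return !tab || err(e->body, *tab, level);
    }
    }
    return true;  // unreachable
}

// S -> exp: exp.symtab = emptytable, exp.nestl = 0, S.err = exp.err
bool programErr(const ExpPtr& e) { return err(e, Symtab{}, 0); }
```

Because the duplicate test compares against the current level only, `let x=3 in (let x=5 in x+1)` is accepted while `let x=3, x=5 in x` is rejected.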

Data Types - Definitions • Type • Class of possible values (w/operations) • Type inference • Determine result type based on input types • Type checking • Ensure specified types make sense • Assignment statements • Function calls (parameters)

Simple Data Types • Built-in (predefined) • Directly represented in memory (e.g. int, float, double) • Programmer-defined • Subrange (e.g. 1..10) • Enumerated (e.g. {SU, FA, SP})

Type Constructors • Array • Sequence of elements of the same type • One type, explicit size • Record / Struct • Collection of elements of varied types • Many types, implicit size • Union • Choice of types, implicit size (largest one)

More Type Constructors • Pointer / reference • Address of an object of given type • “Dereference” operation follows the pointer • Reference is automatically dereferenced • Function • Maps parameters (of given types) to return value (of given type)

And finally... • Class • Struct + member functions (methods) • Information hiding (public/private) • Inheritance • Polymorphism

Type Names • Define a name to represent a type • type hand = array[1..5] of card; (Pascal) • typedef vector<int>::iterator iter; (C++) • Programmer convenience • Another kind of symbol for the symbol table!

Types are structurally equivalent when they are... • Simple and identical • Arrays of the same size and equivalent element type • Structures of equivalent type elements in the same sequence • Assume equivalence for recursive tests! • Pointers to items of equivalent types
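A sketch of the structural test, assuming a hypothetical `Type` node whose `parts` point to element, field, or pointee types (field names are ignored, only types and order count). The set of pairs already under comparison implements “assume equivalence for recursive tests”: when a comparison cycles back to a pair being checked, it succeeds instead of recursing forever. `Type`, `typeEqualRec`, and `typeEqual` are illustrative names.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <utility>
#include <vector>

// Hypothetical type node: simple types have no parts; an array has one part
// (element type), a struct one per field, a pointer one (the pointee).
struct Type {
    enum Kind { Int, Char, Array, Struct, Pointer } kind;
    int size = 0;                    // array length
    std::vector<const Type*> parts;  // component types
};

using Assumed = std::set<std::pair<const Type*, const Type*>>;

bool typeEqualRec(const Type* a, const Type* b, Assumed& assumed) {
    if (a == b) return true;
    if (a->kind != b->kind) return false;
    if (a->kind == Type::Int || a->kind == Type::Char) return true;  // simple and identical
    if (a->kind == Type::Array && a->size != b->size) return false;
    if (a->parts.size() != b->parts.size()) return false;
    // Already comparing this pair further up the recursion: assume equivalent.
    if (!assumed.insert({a, b}).second) return true;
    for (std::size_t i = 0; i < a->parts.size(); ++i)
        if (!typeEqualRec(a->parts[i], b->parts[i], assumed)) return false;
    return true;
}

bool typeEqual(const Type* a, const Type* b) {
    Assumed assumed;  // fresh set of assumptions for each top-level query
    return typeEqualRec(a, b, assumed);
}
```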

Other kinds of equivalence • Name equivalence • Names must exactly match • More restrictive than structural • Declaration equivalence • Types match if names are the same or … • Types X and Y match if “X=Y” is explicitly declared in the code

Type Inference • Declarations cause the type of an id to be entered into the symbol table
var-decl -> id : type-exp
    insert(id.name, type-exp.type)   // associate type with id in the symbol table
• Assume an array, struct, or other constructed type has pointers to its parts

Type Checking (p. 330)
stmt -> id := exp
    if not typeEqual(lookup(id.name), exp.type) then type-error(stmt)
stmt -> if exp then stmt
    if not typeEqual(exp.type, boolean) then type-error(stmt)

Array Type Inference / Checking
type-exp1 -> array [num] of type-exp2
    type-exp1.type = new typenode("array", num.size, type-exp2.type)
exp1 -> exp2 [exp3]
    if (isArrayType(exp2.type) && typeEqual(exp3.type, integer))
        exp1.type = exp2.type.child1
    else type-error(exp1)
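These rules, together with the assignment rule from the type-checking slide, can be sketched over a hypothetical `TypeNode` holding a kind string, a size, and a `child1` element pointer. Returning `nullptr` or `false` stands in for `type-error`; `simpleType`, `arrayOf`, `subscriptType`, and `assignOk` are illustrative names.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

struct TypeNode {
    std::string kind;                  // "integer", "boolean", "array", ...
    int size = 0;                      // array length
    std::shared_ptr<TypeNode> child1;  // element type of an array
};
using TypePtr = std::shared_ptr<TypeNode>;

TypePtr simpleType(std::string kind) {
    return std::make_shared<TypeNode>(TypeNode{std::move(kind)});
}

// type-exp1 -> array [num] of type-exp2
TypePtr arrayOf(int num, TypePtr elem) {
    return std::make_shared<TypeNode>(TypeNode{"array", num, std::move(elem)});
}

// Structural comparison: same kind, and for arrays same size and element type.
bool typeEqual(const TypePtr& a, const TypePtr& b) {
    if (a->kind != b->kind) return false;
    if (a->kind != "array") return true;
    return a->size == b->size && typeEqual(a->child1, b->child1);
}

// exp1 -> exp2 [exp3]: returns exp1.type, or nullptr where the rule says type-error(exp1).
TypePtr subscriptType(const TypePtr& exp2Type, const TypePtr& exp3Type) {
    if (exp2Type->kind == "array" && typeEqual(exp3Type, simpleType("integer")))
        return exp2Type->child1;
    return nullptr;
}

// stmt -> id := exp: false where the rule says type-error(stmt).
bool assignOk(const TypePtr& idType, const TypePtr& expType) {
    return typeEqual(idType, expType);
}
```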

Overloading • Interpretation of a symbol depends on types of related subexpressions • 5.0 + 6.0 vs. “mystring: ” + “abc” • int max(int A[]) vs. double max(double A[]) • Type attributes from symbol table are needed to understand (and generate code for) • a+b • c = max(a,b)
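Resolving an overloaded symbol needs the argument types recorded in the symbol table. A minimal sketch, assuming a multimap from each symbol to its signatures and exact matching only (C++ itself also ranks candidates that need conversions, which this sketch does not attempt). `Signature`, `OverloadTable`, and `resolve` are hypothetical names.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical overload entry: parameter types and result type.
struct Signature {
    std::vector<std::string> params;
    std::string result;
};

// One symbol name may map to several signatures.
using OverloadTable = std::multimap<std::string, Signature>;

// Exact-match resolution: returns the matching signature, or nullptr
// when no overload is available (a semantic error).
const Signature* resolve(const OverloadTable& table, const std::string& name,
                         const std::vector<std::string>& argTypes) {
    auto [first, last] = table.equal_range(name);
    for (auto it = first; it != last; ++it)
        if (it->second.params == argTypes) return &it->second;
    return nullptr;
}
```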

Type Conversion • Type “upgrades” in mixed expressions • float + int -> float • Add rules to grammar • Type of expression is checked after each subexpression • If subexpression is “bigger”, upgrade expression type

Type Conversion in Assignment • Can be “upgrade” or “downgrade” • double x = 1+2; //upgraded from int • int z = 5 / 2.0; //2.5 truncated to 2 (info loss!) • Rule sets LHS type from declaration regardless of expression type • Coercion code must be compiled in • (Language designer’s decision whether compiler will do this)
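Both conversion rules can be sketched with a hypothetical ranking of numeric types: a mixed expression takes the bigger operand type, and an assignment keeps the declared LHS type and reports which coercion would have to be compiled in. `Num`, `binaryResult`, and `assignCoercion` are illustrative names, and a three-type ranking is a simplification.

```cpp
#include <cassert>
#include <string>

// Hypothetical ranking from "smallest" to "biggest".
enum class Num { Int = 0, Float = 1, Double = 2 };

// Mixed expression: upgrade to the bigger operand type.
Num binaryResult(Num left, Num right) { return left > right ? left : right; }

// Assignment: the LHS keeps its declared type; classify the coercion needed.
std::string assignCoercion(Num lhs, Num rhs) {
    if (lhs == rhs) return "none";
    return lhs > rhs ? "upgrade" : "downgrade";  // downgrade may lose information
}
```

For `int z = 5 / 2.0;`, the expression type is `Double`, so the assignment is classified as a downgrade.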

OO Type Conversion • “upgrade” = assignment of superclass to subclass • “downgrade” = assignment of subclass to superclass (with loss of info) • Very general algorithms exist, but are implemented in few languages

Result of Semantic Analysis • Complete symbol table(s) with attributes • Incorporating scoping rules • Additional attributes for grammar non-terminals • (mostly for building symbol tables) • Determination whether semantic errors have occurred (and where)

Semantic Errors • Undeclared symbol (in this scope) • Multiple declarations (in this scope) • Invalid type for statement • E.g. if (“not boolean”) … • Incompatible types in assignment • Incompatible types in function call / no overload available

Attributes of a type • Name (the symbol in the table) • Size (number of bytes taken up) • Type expression • Array element type and size • Structure components • Union alternative types
