410 likes | 539 Vues
This document explores the essential concepts of symbol tables and type-checking in compiler design. Key topics include the distinction between identifiers and keywords, the role of symbol tables during parsing, semantic analysis, and the procedures for managing scopes and variable lifetimes. It details the structure and implementation of symbol tables using dictionary data structures like hash tables, as well as the insertion, lookup, and deletion processes. Additionally, it covers attribute handling for constants, variables, and functions, providing examples and guidelines for declaration before use and recursive declarations.
E N D
Symbols and Type-Checking CPSC 388 Ellen Walker Hiram College
Symbol Table is Central • For scanning & parsing • Distinguish “identifier” vs. “keyword” • Tree “decorations” during parsing • For semantic analysis • Insertions / deletions from declarations / end of scope • Type-checking and making sure variables are declared • Code generation • Associating addresses and/or labels with symbols
Symbol Table • Dictionary Data Structure (Hash table) • Insert / Lookup / Delete • Search key is symbol name • Additional attributes in node class (struct) • E.g. const (value), type, function/variable
A Note on Hash Functions • Hash function should use complete name (all characters) • Avoid collisions with “temp1, temp2, temp3…” • Include character positions (don’t simply add up characters) • Avoid collisions “tempx” vs. “xtemp” • Use mod function often to avoid overflow • (a+b)%m = (a%m + b%m)%m
Symbol Types & Attributes • Constant • final int SIZE = 199 • (constant, type=int, value=199) • Variable • int a; • (variable, type=int)
More Types & Attributes • Structure • struct Entry{char *name; int count} • (structure, size=64bits) • Function • int myFun(char *foo, int bar) • (function, 2 parameters, char* + int)
Declaration Before Use • Every symbol is declared before its first use • Declaration inserts all attributes into symbol table • Look up new “id” in table • If declared, all attributes available • Else compilation error • Allows for “one-pass” compilation
Implicit Declaration • Symbols are inserted into table when first seen • Default attributes (e.g. C function returns int, Fortran variable type chosen by first letter) • Attributes determined by use (e.g. lhs of assignment gets type of rhs of assignment)
Scope / Lifetime of a Symbol • Scope: where is symbol visible? • Global • Within function • Within block ({…}) • Lifetime: when is memory allocated? • Static: from declaration on • Automatic: only when visible • Dynamic: explicit alloc/dealloc (run-time only)
Scope / Lifetime Example (C++) int x; //x is global, automatic int count(){ static t = 0; //t is local to count, static t++; } void main(){ cin >>x; for(int i=0;i<x;i++) //i is local to for, automatic count(); }}
Nested Blocks procedure A { int x //visible in A but not B int y //visible in A and B procedure B { int x //visible in B only … } }
Nested Blocks in Symbol Table • When variable becomes visible, insert into symbol table • Before any other variable with same name • Innermost visible variable “shadows” all others • When variable is no longer visible, delete • Outer value uncovered
Implementations • Sorted list • New key must precede equal keys, stop at first match • Binary Tree • Always go left on equal, stop at NULL left child • Hash table • Insert at beginning of collision list, stop at first match
Explicit Scope Operator • Some languages provide an explicit scope operator, eg. String::last(“abc”) //don’t use a local last fn • To implement, each symbol needs a block id • E.g. name of enclosing function or class
Same-Level Duplicates • Disallowed in most languages • Look up symbol before adding • If symbol is in current block, error • Requires block id (or equivalent) in symbol table • Later value would shadow earlier value • Compiler implementation same as nesting • Code is very confusing!
Sequential Evaluation? int i = 5; { int i = 7; int j = 1+i; // j=8 if sequential, … // j=6 if collateral (parallel) } • Collateral implementation might be more efficient (ML, LISP)
Recursive Declaration int factorial(int x){ //recursive function if (x>0) return x*factorial(x-1); else return x; } Class node{ //recursive data structure int value; node * next; }
Implementing Recursive Declarations • Get name into symbol table as soon as possible • Before finishing function or structure • E.g. decl: name ( args ) {/*update symtab*/} statement-block {/*generate code*/} • Once symbol is in table, it’s ok to use • Using a symbol is not re-declaring it! • Prototype also gets name into symbol table
Mutual Recursion & Prototypes int B(int x); //Prototype for B int A(int x){ //Calls B //B already in symbol table from prototype if (x>0) return B(x-1); } int B(int x){ //Calls A if (x!=1) return A(x/2); }
Declaration Example (p. 311) • let declarations w/ initialization in exp • let x=3,y=5 in z=x+y • let x=3 in (let x=5 in y=x+1) • Attributes (for creating symbol tables) • symtab Current symbol table • nestl Current nesting level • err Boolean - is it an error? • intab/outtab Tables before/after declaration
Declaration Attribute Rules S-> exp //initialization & finalization exp.symtab = emptytable exp.nestlevel = 0 S.err = exp.err exp -> id //id must be in symbol table exp.err = not isin(exp.symtab,id.name)
Initialization Attribute Rule decl->id=exp exp.symtab = decl.intab //current symbols exp.nestl = decl.nestl //current nest level decl.outtab = //output table w/ new id if(decl.intab == errtab) || exp.err || lookup(decl.intab, id.name) == decl.nestl then errtab else insert(decl.intab, id.name, decl.nestl)
Let Statement Attribute Rule exp1 -> let dec-list in exp2 dec-list.intab = exp1.symtab dec-list.nestl = exp1.nestl + 1 //nesting exp2.symtab = dec-list.outtab exp2.nestl = dec-list.nestl exp1.err = (dec-list.outtab == errtab) || exp2.err
Data Types - Definitions • Type • Class of possible values (w/operations) • Type inference • Determine result type based on input types • Type checking • Ensure specified types make sense • Assignment statements • Function calls (parameters)
Simple Data Types • Built-in (predefined) • Directly represented in memory (e.g. int, float, double) • Programmer-defined • Subrange (e.g. 1..10) • Enumerated (e.g. {SU, FA, SP})
Type Constructors • Array • Sequence of elements of the same type • One type, explicit size • Record / Struct • Collection of elements of varied types • Many types, implicit size • Union • Choice of types, implicit size (largest one)
More Type Constructors • Pointer / reference • Address of an object of given type • “Dereference” operation follows the pointer • Reference is automatically dereferenced • Function • Maps parameters (of given types) to return value (of given type)
And finally... • Class • Struct + member functions (methods) • Information hiding (public/private) • Inheritance • Polymorphism
Type Names • Define a name to represent a type • typedef hand = array[1..5] of card • typedef vector<int>::iterator iter • Programmer convenience • Another kind of symbol for the symbol table!
Types are structurally equivalent when they are... • Simple and identical • Arrays of the same size and equivalent element type • Structures of equivalent type elements in the same sequence • Assume equivalence for recursive tests! • Pointers to items of equivalent types
Other kinds of equivalence • Name equivalence • Names must exactly match • More restrictive than structural • Declaration equivalence • Types match if names are the same or … • Types X and Y match if “X=Y” is explicitly declared in the code
Type Inference • Declarations cause type of an id to be entered into a symbol table Var-decl-> id: type-exp insert(id.name, type-exp.type) //associate type to id in the symbol table • Assume an array, struct, type has pointers to its parts
Type Checking (p. 330) stmt -> id := exp If not (typeEqual (lookup(id.name),exp.type) type-error(stmt) stmt -> if exp then stmt If not (typeEqual (exp.type, boolean) type-error(stmt)
Array Type Inference / Checking type-exp1 -> array [num] of type-exp2 Type-exp1.type = new typenode (“array”, num.size, type-exp2.type) exp1 -> exp2 [exp3] if (isArrayType(exp2.type) && typeEqual(exp3.type, integer)) exp1.type = exp2.type.child1 else type-error(exp1)
Overloading • Interpretation of a symbol depends on types of related subexpressions • 5.0 + 6.0 vs. “mystring: “ + “abc” • int max (int A[]) vs. double max(double A[]) • Type attributes from symbol table needed to understand (gen. code for) • a+b • c = max(a,b)
Type Conversion • Type “upgrades” in mixed expressions • float + int -> float • Add rules to grammar • Type of expression is checked after each subexpression • If subexpression is “bigger”, upgrade expression type
Type Conversion in Assignment • Can be “upgrade” or “downgrade” • double x = 1+2; //upgraded from int • int z = 5 / 2.0 //2 (info loss!) • Rule sets LHS type from declaration regardless of expression type • Coercion code must be compiled in • (Language designer’s decision whether compiler will do this)
OO Type Conversion • “upgrade” = assignment of superclass to subclass • “downgrade” = assignment of subclass to superclass (with loss of info) • Very general algorithms exist, but are implemented in few languages
Result of Semantic Analysis • Complete symbol table(s) with attributes • Incorporating scoping rules • Additional attributes for grammar non-terminals • (mostly for building symbol tables) • Determination whether semantic errors have occurred (and where)
Semantic Errors • Undeclared symbol (in this scope) • Multiple declarations (in this scope) • Invalid type for statement • E.g. if (“not boolean”) … • Incompatible types in assignment • Incompatible types in function call / no overload available
Attributes of a type • Name (the symbol in the table) • Size (number of bytes taken up) • Type expression • Array element type and size • Structure components • Union alternative types