Semantic Analysis (Symbol Table and Type Checking)
Chapter 5
The Compiler So Far • Lexical analysis • Detects inputs with illegal tokens • Parsing • Detects inputs with ill-formed parse trees • Semantic analysis (contextual Analysis) • Catches all remaining errors
What’s Wrong? • Example 1 int y = x + 3; • Example 2 String y = “abc” ; y ++ ;
Why a Separate Semantic Analysis? • Parsing cannot catch some errors • Some language constructs are not context-free • Example: All used variables must have been declared (i.e., be in scope) • ex: { int x { .. { .. x ..} ..} ..} • Example: A method must be invoked with arguments of the proper number and types (i.e., typing) • ex: int f(int, int) {…} called by f(‘a’, 2.3, 1)
More problems require semantic analysis • Is x a scalar, an array, or a function? • Is x declared before it is used? • Is x defined before it is used? • Are any names declared but not used? • Which declaration of x does this reference? • Is an expression type-consistent? • Does the dimension of a reference match the declaration? • Where can x be stored? (heap, stack, . . . ) • Does *p reference the result of a malloc()? • Is an array reference in bounds? • Does function foo produce a constant value?
Why is semantic analysis hard? • need non-local information • answers depend on values, not on syntax • answers may involve computation
Symbol Tables • Symbol Tables (also called Environments) • Mapping IDs to Types and Locations • Definitions -> Insert in the table • Use -> Lookup ID • Scope • Where the IDs are “visible” • Ex: formal parameters, local variables in MiniJava -> inside the method where defined • (private) variables in a class -> inside the class • (public) method: visible anywhere (unless overridden)
Environments • A set of bindings ( -> ) Initial Env s0 class C { int a; int b; int c; Env s1 = s0 + {a -> int, b -> int, c -> int} public void m() { System.out.println(a+c); int j = a+b; Env s2 = s1 + {j -> int} String a = "hello"; Env s3 = s2 + {a -> String} System.out.println(a);
Environments (Cont’d) Env s3 = s2 + {a -> String} System.out.println(a); System.out.println(a); System.out.println(a); } Env s1 } Env s0
Implementing Environments • Functional Style • Keep the previous env and create a new one • To restore, discard the new one and go back to the old one • Imperative Style • Destructively update the env (symbol table) • Undo: needs an “undo stack”
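The imperative style with an undo stack can be sketched as below. This is a minimal illustration, not the book's implementation: the class name UndoEnv, the use of plain strings for names and types, and the MARK sentinel are all assumptions made for the example.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

/** Imperative environment: one mutable map plus an undo stack (hypothetical names). */
class UndoEnv {
    /** One undo record: the name that was bound and the type it shadowed (null if none). */
    private static final class Undo {
        final String name;
        final String oldType;
        Undo(String name, String oldType) { this.name = name; this.oldType = oldType; }
    }

    /** Sentinel pushed by beginScope so endScope knows where to stop. */
    private static final Undo MARK = new Undo(null, null);

    private final Map<String, String> bindings = new HashMap<>();
    private final Deque<Undo> undo = new ArrayDeque<>();

    void beginScope() { undo.push(MARK); }

    /** Destructively binds name to type, remembering what was there before. */
    void put(String name, String type) {
        undo.push(new Undo(name, bindings.get(name)));
        bindings.put(name, type);
    }

    String get(String name) { return bindings.get(name); }

    /** Pops undo records back to the most recent mark, restoring shadowed bindings. */
    void endScope() {
        while (!undo.isEmpty()) {
            Undo u = undo.pop();
            if (u == MARK) return;
            if (u.oldType == null) bindings.remove(u.name);
            else bindings.put(u.name, u.oldType);
        }
    }
}
```

With the class C example from the Environments slide: after put("a", "int"), beginScope(), put("a", "String"), a lookup of a gives String; after endScope() it gives int again.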
Multiple Symbol Tables : ML-style structure M = struct structure E = struct val a = 5 end s0 + s2 structure N = struct val b = 10 val a = E.a + b end s0 + s2 + s4 structure D = struct val d = E.a + N.a end end s7 Initial Env s0 s1 = {a -> int} s2 = {E -> s1 } s3 = {b -> int,a -> int} s4 = {N -> s3 } s5 = {d -> int} s6 = {D -> s5 } s7 = s2 + s4 + s6
Multiple Symbol Tables : Java-style Initial Env s0 s1 = {a -> int} s2 = {E -> s1 } s3 = {b -> int,a -> int} s4 = {N -> s3 } s5 = {d -> int} s6 = {D -> s5 } s7 = s2 + s4 + s6 package M; s7 class E { static int a = 5; } s7 class N { static int b = 10; static int a = E.a + b; } s7 class D { static int d = E.a + N.a; } s7 End s7
Implementation – Imperative Symbol Table (inefficient nondestructive update) • Using a Hash Table • Update: s’ = s + {d |-> t4} • Undo [Figure: hash table holding bindings a -> t1, b -> t3, c -> t2 and the new binding d -> t4] • See Appel Program 5.2 (p106)
Implementation – Functional Symbol Table • Efficient Functional Approach s’ = s + {a |-> t} would return [s + {a |-> t} ] • If implemented with a Hashtable would have to create O(n) buckets for each scope • Is this a good idea?
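Implementation - Tree • m1 = { bat |-> 1 , camel |-> 2, dog |-> 3 } • m2 = {m1 + emu |-> 42 } • How could this be implemented? • Want m2 from m1 in O(log n), without copying all n bindings [Figure: binary search trees m1 (dog 3, bat 1, camel 2) and m2 (dog 3, emu 42), where m2 shares the unchanged nodes of m1]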
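One way to get m2 from m1 cheaply is a persistent binary search tree that copies only the nodes on the path to the insertion point. The sketch below is an illustration under assumptions: the class name Tree and the int values are made up for the example, and the tree is not rebalanced, so the O(log n) bound holds only while it happens to stay balanced.

```java
/** Immutable binary search tree node mapping names to values (hypothetical sketch). */
final class Tree {
    final String key;
    final int value;
    final Tree left;
    final Tree right;

    Tree(String key, int value, Tree left, Tree right) {
        this.key = key;
        this.value = value;
        this.left = left;
        this.right = right;
    }

    /** Returns a new tree containing key -> value; the old tree t is left unchanged. */
    static Tree insert(Tree t, String key, int value) {
        if (t == null) return new Tree(key, value, null, null);
        int c = key.compareTo(t.key);
        // Copy only the node on the search path; the other subtree is shared.
        if (c < 0) return new Tree(t.key, t.value, insert(t.left, key, value), t.right);
        if (c > 0) return new Tree(t.key, t.value, t.left, insert(t.right, key, value));
        return new Tree(key, value, t.left, t.right); // same key: new binding replaces old
    }

    /** Returns the value bound to key, or null if key is not in the tree. */
    static Integer lookup(Tree t, String key) {
        while (t != null) {
            int c = key.compareTo(t.key);
            if (c == 0) return t.value;
            t = (c < 0) ? t.left : t.right;
        }
        return null;
    }
}
```

Building m1 by inserting dog, bat, camel and then calling Tree.insert(m1, "emu", 42) yields an m2 whose left subtree is the very same object as m1's left subtree, while m1 itself still has no binding for emu. Leaving a scope is then just a matter of going back to the old root.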
Symbols vs. Strings as table key • Symbol: • a wrapper for Strings • Symbol Representation • Comparing symbols for equality is fast. • Extracting an integer hash key is fast. • Comparing two symbols for “greater-than” is fast. • Properties: • Symbol s1,s2 => • s1 == s2 iff s1.equals(s2) iff s1.string == s2.string • public class Symbol { public String toString(); public static Symbol getSymbol(String n); }
symbol.Symbol public class Symbol { public String name; // Symbol cannot be constructed directly private Symbol(String n) { name = n; } public String toString() { return name; } private static Map map = new Hashtable(); public static Symbol getSymbol(String n) { // or symbol(..) in book String u = n.intern(); Symbol s = (Symbol) map.get(u); if (s == null) { s = new Symbol(u); map.put(u, s); } return s; } }
Symbol Table Implementation (efficient destructive update) • Using a Hash Table [Figure: hash table of Binder chains for symbols a, b, c, with the top: Symbol and marker: Binder links; unused links are null]
Some sample program(I) /** * The Table class is similar to java.util.Dictionary, * except that each key must be a Symbol and there is * a scope mechanism. */ public class Table { private java.util.Dictionary dict = new java.util.Hashtable(); private Symbol top; private Binder marks; public Table(){}
Some sample program(II) /** * Gets the object associated with the specified * symbol in the Table. */ public Object get(Symbol key) { Binder e = (Binder)dict.get(key); if (e==null) return null; else return e.value; } /** * Puts the specified value into the Table, * bound to the specified Symbol. */ public void put(Symbol key, Object value) { dict.put(key, new Binder(value, top, (Binder)dict.get(key))); top = key; }
Some sample program(III) /** * Remembers the current state of the Table. */ public void beginScope() {marks = new Binder(null,top,marks); top=null;} /** * Restores the table to what it was at the most recent * beginScope that has not already been ended. */ public void endScope() { while (top!=null) { Binder e = (Binder)dict.get(top); if (e.tail!=null) dict.put(top,e.tail); else dict.remove(top); top = e.prevtop; } top=marks.prevtop; marks=marks.tail; }
Some sample program(IV) package Symbol; class Binder { Object value; Symbol prevtop; Binder tail; Binder(Object v, Symbol p, Binder t) { value=v; prevtop=p; tail=t; } }
Type-Checking in MiniJava • Binding for type-checking in MiniJava • Variable and formal parameter • Var name <-> type of variable • Method • Method name <-> result type, parameters( including position information), local variables • Class • Class name <-> variables, method declaration, parent class
Symbol Table: example See Figure 5.7 on page 111 • Primitive types • int -> IntegerType() • boolean -> BooleanType() • Other types • int[] -> IntArrayType() • Class -> IdentifierType(String s)
A MiniJava Program and its symbol table (Figure 5.7) class B { C f; int[] j; int q; public int start(int p, int q) { int ret; int a; /* … */ return ret; } public boolean stop(int p) { /* … */ return false; } } class C { /* … */ } [Figure: symbol table. B: FIELDS f C, j int[], q int; METHODS start int (PARAMS p int, q int; LOCALS ret int, a int), stop boolean (PARAMS p int; LOCALS ….). C: ….]
SymbolTable : Real Story class SymbolTable { public SymbolTable(); public boolean addClass(String id, String parent); public Class getClass(String id); public boolean containsClass(String id); public Type getVarType(Method m, Class c, String id); public Method getMethod(String id, String classScope); public Type getMethodType(String id, String classScope); public boolean compareTypes(Type t1, Type t2); }
Be careful! • getVarType(Method m, Class c, String id) • In c.m, find variable id • Precedence: • Local variable in method • Parameter in parameter list • Variable in the class • Variable in the parent class • getMethod(), getMethodType() • May be defined in the parent Classes • compareTypes() • Primitive types : int, boolean, IntArrayType • Subtype : IdentifierType
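The lookup precedence listed for getVarType can be sketched as follows. This is a simplified stand-in, not the course's SymbolTable: the names MethodInfo, ClassInfo, and Scopes are hypothetical, types are plain strings, and the inheritance chain is assumed to be acyclic.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

/** Per-method tables (hypothetical stand-in for the Method class). */
class MethodInfo {
    final Map<String, String> params = new LinkedHashMap<>(); // keeps parameter order
    final Map<String, String> locals = new HashMap<>();
}

/** Per-class table (hypothetical stand-in for the Class class). */
class ClassInfo {
    final String parent; // null when the class extends nothing
    final Map<String, String> fields = new HashMap<>();
    ClassInfo(String parent) { this.parent = parent; }
}

class Scopes {
    final Map<String, ClassInfo> classes = new HashMap<>();

    /** Returns the declared type of id as seen from method m of class c, or null if undeclared. */
    String getVarType(MethodInfo m, ClassInfo c, String id) {
        if (m != null) {
            // 1. local variable, 2. formal parameter
            if (m.locals.containsKey(id)) return m.locals.get(id);
            if (m.params.containsKey(id)) return m.params.get(id);
        }
        // 3. field of the class, 4. field of a parent class (assumes no inheritance cycle)
        for (ClassInfo k = c; k != null; k = (k.parent == null) ? null : classes.get(k.parent)) {
            if (k.fields.containsKey(id)) return k.fields.get(id);
        }
        return null;
    }
}
```

Returning null for an undeclared name leaves the decision about the error message to the caller, which is one possible design; the real getVarType may report the error itself.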
SymbolTable : Class class Class { public Class(String id, String parent); public String getId(); public Type type(); public String parent(); public boolean addMethod(String id, Type type); public Method getMethod(String id); public boolean containsMethod(String id); public boolean addVar(String id, Type type); public Variable getVar(String id); public boolean containsVar(String id); }
SymbolTable : Variable class Variable { public Variable(String id, Type type); public String id(); public Type type(); }
SymbolTable : Method class Method { public Method(String id, Type type); public String getId(); public Type type(); public boolean addParam(String id, Type type); public Variable getParamAt(int i); public Variable getParam(String id); public boolean containsParam(String id); public boolean addVar(String id, Type type); public Variable getVar(String id); public boolean containsVar(String id); }
Type-Checking : Two Phases • Build Symbol Table • Type-check statements and expressions public class Main { public static void main(String [] args) { try { Program root = new MiniJavaParser(System.in).Program(); BuildSymbolTableVisitor v1 = new BuildSymbolTableVisitor(); v1.visit(root); new TypeCheckVisitor(v1.getSymTab()).visit(root); } catch (ParseException e) { System.out.println(e.toString()); } } }
BuildSymbolTableVisitor(); • See Program 5.8 on Page 112 public class BuildSymbolTableVisitor extends TypeDepthFirstVisitor { …. private Class currClass; private Method currMethod; …… // Type t; // Identifier i; public Type visit(VarDecl n) { Type t = visit(n.t); String id = n.i.toString();
BuildSymbolTableVisitor(); - Cont’d if (currMethod == null){ if (!currClass.addVar(id,t)){ error.complain(id + " is already defined in " + currClass.getId()); } } else { if (!currMethod.addVar(id,t)){ error.complain(id + " is already defined in " + currClass.getId() + "." + currMethod.getId()); } } return null; }
BuildSymbolTableVisitor() :TypeVisitor() public Type visit(MainClass n); public Type visit(ClassDeclSimple n); public Type visit(ClassDeclExtends n); public Type visit(VarDecl n); public Type visit(MethodDecl n); public Type visit(Formal n); public Type visit(IntArrayType n); public Type visit(BooleanType n); public Type visit(IntegerType n); public Type visit(IdentifierType n);
TypeCheckVisitor(SymbolTable); • See Program 5.9 on page 113 package visitor; import syntaxtree.*; public class TypeCheckVisitor extends DepthFirstVisitor { static Class currClass; static Method currMethod; static SymbolTable symbolTable; public TypeCheckVisitor(SymbolTable s){ symbolTable = s; }
TypeCheckVisitor(SymbolTable); - Cont’d // Identifier i; // Exp e; public void visit(Assign n) { Type t1 = symbolTable.getVarType(currMethod,currClass, n.i.toString()); Type t2 = n.e.accept( new TypeCheckExpVisitor(symbolTable) ); if (symbolTable.compareTypes(t1,t2)==false){ error.complain("Type error in assignment to " +n.i.toString()); } }
TypeCheckExpVisitor(SymbolTable) package visitor; import syntaxtree.*; public class TypeCheckExpVisitor extends TypeDepthFirstVisitor { // Exp e1,e2; public Type visit(Plus n) { if (! (n.e1.accept(this) instanceof IntegerType) ) { error.complain("Left side of Plus must be of type integer"); } if (! (n.e2.accept(this) instanceof IntegerType) ) { error.complain("Right side of Plus must be of type integer"); } return new IntegerType(); }
TypeCheckVisitor : Visitor() public void visit(MainClass n); public void visit(ClassDeclSimple n); public void visit(ClassDeclExtends n); public void visit(MethodDecl n); public void visit(If n); public void visit(While n); public void visit(Print n); public void visit(Assign n); public void visit(ArrayAssign n);
TypeCheckExpVisitor() : TypeVisitor() public Type visit(And n); // boolean public Type visit(LessThan n); // boolean public Type visit(Plus n); // int public Type visit(Minus n); public Type visit(Times n); public Type visit(ArrayLookup n); // int public Type visit(ArrayLength n); // int public Type visit(Call n); // result type public Type visit(IntegerLiteral n); // int public Type visit(True n); // boolean public Type visit(False n); public Type visit(IdentifierExp n); // symbol table lookup public Type visit(This n); // current class public Type visit(NewArray n); // int[] public Type visit(NewObject n); // public Type visit(Not n); // boolean
Overloading of Operators, …. • When operators are overloaded, the compiler must explicitly generate the code for the type conversion. • 2 + 2 • 2.0 + 3.4 • 2.4 + 4 • “abc” + 4 • For an assignment statement, both sides must have the same type. When we allow extension of classes, the right-hand side may also be a subtype of the left-hand side. • long x = (int) y + 3 • Person p = new Student();
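The subtype rule for assignments such as Person p = new Student(); can be sketched as a walk up the inheritance chain. This is a simplified stand-in for compareTypes, not its actual code: the class name TypeCompat, the method assignable, and the string encoding of types are assumptions, and only the three MiniJava primitive types are considered.

```java
import java.util.HashMap;
import java.util.Map;

class TypeCompat {
    /** Maps each class name to its parent's name; root classes map to null (hypothetical). */
    final Map<String, String> parentOf = new HashMap<>();

    static boolean isPrimitive(String t) {
        return t.equals("int") || t.equals("boolean") || t.equals("int[]");
    }

    /** True if a value of type actual may be assigned to a variable of type declared. */
    boolean assignable(String declared, String actual) {
        // Primitive types must match exactly.
        if (isPrimitive(declared) || isPrimitive(actual)) return declared.equals(actual);
        // Class types: actual must be declared itself or one of its descendants.
        for (String k = actual; k != null; k = parentOf.get(k)) {
            if (k.equals(declared)) return true;
        }
        return false;
    }
}
```

With Student registered as a child of Person, assignable("Person", "Student") holds but assignable("Student", "Person") does not, which matches the direction stated on the slide.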
Method Calls e.m(…) • Look up the method in the SymbolTable to get its parameter list and result type • Find m in the class of e (the static type of the receiver) • The parameter types must be matched against the actual arguments. • The result type becomes the type of the method call as a whole. • Etc, etc, …….
TypeChecking method call // Exp e; Identifier i; ExpList el; public Type visit(Call n) { Type rcvType = visit(n.e); if (!(rcvType instanceof IdentifierType)) error.complain(…); Method m = symbolTable.getMethod(n.i.toString(), rcvType.toString()); if (n.el.size() != m.getParamSize()) error.complain(…); for (int i = 0; i < n.el.size(); i++) { Type acType = visit(n.el.get(i)); Type fmType = m.getParamAt(i).type(); /* formal first, as in Assign */ if (!symbolTable.compareTypes(fmType, acType)) error.complain(…); } return m.type(); }
Error Handling • For a type error or an undeclared identifier, it should print an error message. • And must go on….. • Recovery from type errors? • Do as if it were correct. • Not a big deal in our homework. • Example: • int i = new C(); • int j = i + 1; • still need to insert i into symbol table as an integer so the rest can be typechecked..
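The recovery rule from the example (keep i in the table as an int even though its initializer is wrong) can be sketched like this. It is a toy illustration, not the homework's visitor code: the class RecoveringChecker, its two methods, and the string encoding of types are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class RecoveringChecker {
    final Map<String, String> env = new HashMap<>();
    final List<String> errors = new ArrayList<>();

    /** Checks "declaredType name = <expression of initType>;" and always records the binding. */
    void declare(String declaredType, String name, String initType) {
        if (!declaredType.equals(initType)) {
            errors.add("Type error in initializer of " + name);
        }
        // Recovery: carry on as if the declaration were correct.
        env.put(name, declaredType);
    }

    /** Type of "name + <int literal>"; reports any error but still answers int. */
    String plusInt(String name) {
        String t = env.get(name);
        if (t == null) {
            errors.add("Undeclared identifier " + name);
        } else if (!t.equals("int")) {
            errors.add("Left side of Plus must be of type integer");
        }
        return "int";
    }
}
```

For int i = new C(); int j = i + 1; this reports exactly one error (the initializer of i) and still type-checks the second line, instead of producing a cascade of follow-on messages.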