SYMBOL TABLES & CODE GENERATION FOR EXECUTABLES

SYMBOL TABLES

Compilers that produce an executable (or an object-module representation of one), as opposed to a program in an intermediate language, need to make use of a symbol table. In fact, for optimization purposes, all compilers do.

The symbol table records information about the identifiers in the source program, such as their names, types, number of dimensions, space assignments, etc.

To illustrate the use of symbol tables, let’s consider a simple compiler, where symbol_stack consists of integers, and the integer associated with an identifier on the stack is the index of the entry for that identifier in the symbol table.

Our symbol stack entries will provide pointers to the entries in the symbol table where the name of the identifier and the offset assigned to it in the data segment are stored.
• Negative numbers will be employed on the symbol stack as codes to denote the registers AX, BX, etc.

As identifiers are encountered in the source code, their names are packed into an array that we will call id_stack, defined as:

    char id_stack[1000];

• Since strings in C all end in a 00h byte, it is only necessary to specify where on id_stack a name begins in order to retrieve it.

The symbol table entry for a name does not contain the name itself, but instead a pointer to the beginning of the name on id_stack.
• The reason for this is that, since the symbol table is an array of fixed-size symbol table entries, we would otherwise have to provide space in each entry for the largest legal name size.
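The packing of names onto id_stack can be sketched as follows. This is only an illustration: the counter id_top and the helper push_name are hypothetical names that do not appear in the slides, and the overflow check is an added assumption.

```c
#include <string.h>

#define ID_STACK_SIZE 1000

char id_stack[ID_STACK_SIZE];   /* packed, NUL-terminated names */
int  id_top = 0;                /* hypothetical: next free position on id_stack */

/* Copy name onto id_stack and return the index where it begins,
   or -1 if there is not enough room left. */
int push_name(const char *name)
{
    size_t len = strlen(name) + 1;      /* include the terminating 00h byte */
    if (len > (size_t)(ID_STACK_SIZE - id_top))
        return -1;

    int start = id_top;
    memcpy(&id_stack[start], name, len);
    id_top += (int)len;
    return start;                       /* this is what a symbol table entry stores */
}
```

A name is later retrieved simply as &id_stack[start], since the 00h byte marks where it ends.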

When an identifier is encountered in the source code, the compiler has to search the symbol table to find the entry, if any, for it.
• Various methods have been investigated for making this process more efficient, such as the use of binary trees.

But the method of choice has been to derive a number called a hash code from an identifier, and then link all identifiers with the same hash code in a list, which we will refer to as a hash-list.

One method for evaluating a hash code is to add up the ASCII codes of the individual characters of the identifier, and then take as the hash code the remainder of this sum after division by a prime number, such as 127.

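The following is sample code for this purpose:

    int hash(char * name)
    {
        int hash_value = 0;
        int i = 0;
        while (name[i] != '\0') {
            /* cast so that a signed char can never make the sum negative */
            hash_value += (unsigned char) name[i];
            ++i;
        }
        return (hash_value % 127);
    }

In this scheme there are 127 hash-lists, one for each possible remainder 0 through 126.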
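As a quick check of the arithmetic, the function is repeated below in a form that compiles on its own, with two worked values in the comments. The identifiers "ab" and "ba" are chosen here purely for illustration; they are not from the slides.

```c
/* Sum of character codes, reduced modulo the prime 127. */
int hash(const char *name)
{
    int hash_value = 0;
    for (int i = 0; name[i] != '\0'; ++i)
        hash_value += (unsigned char)name[i];
    return hash_value % 127;
}

/* Worked values (ASCII: 'a' = 97, 'b' = 98):
     "ab": 97 + 98 = 195, and 195 % 127 = 68
     "ba": 98 + 97 = 195, and 195 % 127 = 68
   Names that are rearrangements of the same letters always share
   a hash code, so they always end up on the same hash-list. */
```

Collisions of this kind are exactly what the hash-lists are for: both names are kept, linked together under the same hash code.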

A simple symbol table could be defined as follows:

    typedef struct {
        int name_index;
        int offset;
        int hash_link;
    } symbol_table_entry;

    symbol_table_entry symbol_table[1000];

Here
• name_index is the index into id_stack where the name is stored,
• offset is the offset in the data segment assigned to the identifier, and
• hash_link is a pointer to the symbol table entry for the next identifier encountered, if any, with the same hash code.

The entries at symbol_table[0] through symbol_table[126] are reserved for the heads of the 127 hash-lists.

For example, if X1 is the first identifier encountered in the source with hash code (say) 30, then an entry for it will be made at symbol_table[30].
• If, later on, an identifier ZZ is encountered which also has hash code 30, then an entry will be made for ZZ at the next free index at or above 127 in symbol_table (indexes 0 through 126 being reserved for the list heads), and the hash_link in the entry for X1 will be changed from null to point instead to the entry for ZZ.

Within the rules section of the Lex definition file, the regular expression and associated code for an identifier may take a form such as the following:

    {letter}({letter}|{digit}|_)*    { yylval = find(yytext); return identifier; }

where the find function returns the index into symbol_table of the entry for the identifier, creating an entry if one doesn’t already exist. (The underscore is written without single quotes, since Lex treats single quotes as ordinary characters to be matched.)

The find function begins as follows:

    int find(char * name)
    {
        int j;
        j = hash(name);

and proceeds according to the flow diagram on the next slide.
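The flow diagram is not reproduced in this text, so the following is only one possible completion of find, consistent with the hash-list description above. Everything beyond the slides is an assumption: the marker NONE for an empty head or a null link, the counters id_top, next_free and next_offset, the helper init_symbol_table, the 2-byte size given to every identifier, and the choice to stop with an error when a table is full.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define HASH_SIZE  127
#define TABLE_SIZE 1000
#define NONE       (-1)   /* assumed marker: empty head slot / null hash_link */

typedef struct { int name_index; int offset; int hash_link; } symbol_table_entry;

symbol_table_entry symbol_table[TABLE_SIZE];
char id_stack[1000];
int id_top = 0;               /* assumed: next free position on id_stack */
int next_free = HASH_SIZE;    /* assumed: next free entry after the list heads */
int next_offset = 0;          /* assumed: next free data segment offset */

/* Must be called once before the first find(). */
void init_symbol_table(void)
{
    for (int i = 0; i < TABLE_SIZE; ++i) {
        symbol_table[i].name_index = NONE;
        symbol_table[i].offset = 0;
        symbol_table[i].hash_link = NONE;
    }
    id_top = 0;
    next_free = HASH_SIZE;
    next_offset = 0;
}

int hash(const char *name)
{
    int hash_value = 0;
    for (int i = 0; name[i] != '\0'; ++i)
        hash_value += (unsigned char)name[i];
    return hash_value % HASH_SIZE;
}

/* Return the symbol_table index for name, creating an entry if needed. */
int find(const char *name)
{
    int j = hash(name);
    int tail = NONE;          /* last entry of a non-empty hash-list */

    if (symbol_table[j].name_index != NONE) {
        for (;;) {            /* walk the hash-list looking for name */
            if (strcmp(&id_stack[symbol_table[j].name_index], name) == 0)
                return j;
            if (symbol_table[j].hash_link == NONE)
                break;
            j = symbol_table[j].hash_link;
        }
        tail = j;
    }

    /* Not found: check capacity before changing anything. */
    size_t len = strlen(name) + 1;
    if (len > sizeof id_stack - (size_t)id_top ||
        (tail != NONE && next_free >= TABLE_SIZE)) {
        fprintf(stderr, "symbol table overflow\n");
        exit(EXIT_FAILURE);
    }

    if (tail != NONE) {       /* list head taken: append a new entry */
        j = next_free++;
        symbol_table[tail].hash_link = j;
    }
    memcpy(&id_stack[id_top], name, len);
    symbol_table[j].name_index = id_top;
    symbol_table[j].offset = next_offset;
    symbol_table[j].hash_link = NONE;
    id_top += (int)len;
    next_offset += 2;         /* assumed: each identifier is one 16-bit word */
    return j;
}
```

With this sketch, after init_symbol_table(), a first call find("ab") fills the head slot at index 68, a following find("ba") collides and is placed at index 127, and a second find("ab") returns 68 again without creating anything.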

CODE GENERATION USING THE SYMBOL TABLE

Let’s consider the code required within the Yacc definition file of our simple compiler for addition. To avoid complications, let’s assume that the code for our arithmetic expressions requires the use of register AX only.

So on the symbol stack, non-negative numbers are indexes of entries for identifiers in symbol_table, and (say) -1 is used as a code for AX:

    expression : expression '+' term { C code as described below }

The C code should check whether $1 and $3 each denote an identifier or the register, and generate appropriate object code for each of the 4 cases.

Case where $1 and $3 are both identifiers: generate machine code corresponding to

    mov AX, symbol_table[$1].offset
    add AX, symbol_table[$3].offset

and set $$ = -1.

Case where $1 is negative (the left operand is already in AX) and $3 is an identifier: generate machine code corresponding to

    add AX, symbol_table[$3].offset

and set $$ = -1.
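The case analysis can be sketched as a C helper that the Yacc action might call. This is a hedged illustration, not the course's code: gen_add and REG_AX are hypothetical names, the output is assembly text with NASM-style memory operands rather than machine code, and the last two branches are assumptions of this sketch, since only the first two cases are spelled out above.

```c
#include <stdio.h>

#define REG_AX (-1)   /* symbol-stack code for register AX, as in the text */

typedef struct { int name_index; int offset; int hash_link; } symbol_table_entry;
symbol_table_entry symbol_table[1000];

/* Write assembly text for lhs + rhs into buf and return the symbol-stack
   code of the result, i.e. the value the Yacc action assigns to $$. */
int gen_add(int lhs, int rhs, char *buf, size_t size)
{
    if (lhs >= 0 && rhs >= 0) {
        /* both operands are identifiers in the data segment */
        snprintf(buf, size, "mov AX, [%d]\nadd AX, [%d]\n",
                 symbol_table[lhs].offset, symbol_table[rhs].offset);
    } else if (lhs == REG_AX && rhs >= 0) {
        /* left operand is already in AX */
        snprintf(buf, size, "add AX, [%d]\n", symbol_table[rhs].offset);
    } else if (lhs >= 0 && rhs == REG_AX) {
        /* assumption: addition commutes, so add the left operand to AX */
        snprintf(buf, size, "add AX, [%d]\n", symbol_table[lhs].offset);
    } else {
        /* assumption: both operands in AX would need a spill; not handled */
        snprintf(buf, size, "; unsupported: both operands in AX\n");
    }
    return REG_AX;    /* the sum is left in AX */
}
```

Inside the rule, the action could then read along the lines of { char code[64]; $$ = gen_add($1, $3, code, sizeof code); fputs(code, stdout); }, again purely as an illustration.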
