6. Phase 3 : Code Generation Part I. Overview of compilation. The unit directory. genProg.cxx . What you must do. Compiling, assembling, downloading and running C--. The Gnu assembler and the VxWorks dynamic linker. Monkey see, monkey do. gener.cxx . Structure of an M68K assembly file.
6. Phase 3 : Code Generation Part I • Overview of compilation. • The unit directory. • genProg.cxx. • What you must do. • Compiling, assembling, downloading and running C--. • The Gnu assembler and the VxWorks dynamic linker. • Monkey see, monkey do. • gener.cxx. • Structure of an M68K assembly file. • Declarations. • Statements.
Overview • A compiler is lexer + syner + gener. • You have written the lexer and syner. Now you write the gener. • Easiest of the three once you know what M68K code to generate. • And I tell you that bit. • [Diagram: the compiler reads C-- code from stdin and writes M68K code, or errors, to stdout.]
The Unit Directory • The unit directory for this phase is /usr/users/staff/aosc/cm049icp/phase3 • Among other things it contains the following : • genProg.cxx : the test bed program for phase 3. • gener.template : a template file for your phase 3 programs. • gener.h : the header file for phase 3. • trueval, falseval : constants to represent true and false in M68K code. • INT_MAX_16_BIT, INT_MIN_16_BIT : maximum and minimum values for 16 bit two's complement integers. • RPolish : struct for holding the Reverse Polish representation of C-- expressions. • makefile : the makefile for phase 3.
The Unit Directory II • gener : an executable for my phase 3 program. • tests/test*.c-- : testing programs for the demo. • rpolish.cxx : Reverse Polish conversion programs : RPolish *append(RPolish *rp1, RPolish *rp2) ; RPolish *toRPF(Factor *fact) ; RPolish *toRPT(Term *term) ; RPolish *toRPBE(BasicExp *bexp) ; RPolish *toRPE(Expression *expr) ; • partialExp.cxx : partial code generator for expressions : void genExpression(SymTab *st, Expression *expr, int &label, int &finalLabel) ; • Only handles literal constants. • Do a full implementation of genExpression after you've got the rest of it to work.
genProg.cxx • The test bed program is as follows : #include ".../phase2/syner.h" #include ".../phase3/gener.h" void main() { SymTab *st = NULL ; AST *ast = NULL ; int label = 0 ; synAnal(st, ast, label) ; generate(st, ast, label) ; /* You must write this subprogram. */ } • First calls synAnal to parse the C--, then calls generate to produce M68K code. • Input/Output is from/to stdin/stdout. • A 'real' compiler would use argc/argv and command line arguments to name files for Input/Output.
What You Must Do • Your implementation of generate must be in a file called gener.cxx in your directory. • Take a copy of makefile and gener.template. • Print out a copy of gener.h. • Print out a copy of rpolish.cxx. • Print out a copy of partialExp.cxx. • Useful commands : testphase3, demophase3. • They work as usual. • Your program’s output must be exactly the same as mine to get the marks.
Compiling, Assembling, Downloading & Running C-- • C-- program in prog.c-- : const string s = "Hello\n" ; { cout << s ; } • Make and run gener : jaguar> make gener jaguar> gener < prog.c-- > a.s jaguar> assem a jaguar> • Connect to VxWorks box and download and run : rlogin moloch (UNIX command) -> ld < a -> run Hello ->
M68K Assembler & VxWorks Dynamic Linker • assem is a shell script (in /usr/users/staff/aosc/bin) which calls the Gnu Motorola 68000 assembler. • Gnu assembler is a high level Macro-Assembler. • Supports medium level memory management. • Makes variables/constants very easy. • VxWorks has a dynamic linker. • Similar to NT except that it works. • M68K programs contain calls to library subroutines. • e.g. scanf, printf. • Run-time addresses of these subroutines are not known to the compiler. • When programs are downloaded the required addresses are automatically linked in.
Monkey See, Monkey Do • My code generator is in a file called gener in the unit directory for this phase. • /usr/users/staff/aosc/cm049icp/phase3 • To work out what assembly code you need to generate run gener on C-- source code and inspect the output that is produced. • More or less the approach I adopted using C source and the GNU C compiler, cc68k. • Took longer than I expected because GNU assembler uses non-standard M68K assembly code mnemonics and assembler directives. • Bloody idiots. • Rest of this lecture is just a few ‘handy hints’.
Monkey See, Monkey Do II • Monkey see, monkey do is standard in the industry. • Usually have to tweak the instruction set of one chip into the instruction set of another. • Tend to stick to a small set of instructions which are common to all chips. • e.g. MOVE, ADD, JMP etc. • Usually about 20% of a CISC chip’s instruction set. • Main reason for RISC chips. • Why provide lots of instructions that no-one uses? • RISC chips have a lot fewer instructions than CISC chips. • Fewer instructions means less tweaking means less work.
Top Level Structure For gener.cxx • Contents of gener.template : #include <iostream.h> #include <fstream.h> #include <iomanip.h> #include <ctype.h> #include <stddef.h> #include <stdlib.h> #include ".../lib/cstring.h" #include ".../phase2/syner.h" #include ".../phase3/rpolish.cxx" void genHeader() { cout << "genHeader\n" ; } void genFooter(int finalLabel) { cout << "genFooter\n" ; }
Top Level Structure For gener.cxx II void genDec(SymTab *st) { cout << "genDec\n" ; } void genDeclarations(SymTab *st) { cout << "genDeclarations\n" ; } #include ".../phase3/partialExp.cxx" /* Forward declaration. */ void genStatements(SymTab *st, AST *ast, int &label, int &finalLabel) ; void genIfSt(SymTab *st, AST *ast, int &label, int &finalLabel) { cout << "genIfSt\n" ; }
Top Level Structure For gener.cxx III void genWhileSt(SymTab *st, AST *ast, int &label, int &finalLabel) { cout << "genWhileSt\n" ; } void genCinSt(SymTab *st, AST *ast, int &label, int &finalLabel) { cout << "genCinSt\n" ; } void genCoutSt(SymTab *st, AST *ast, int &label, int &finalLabel) { cout << "genCoutSt\n" ; }
Top Level Structure For gener.cxx IV void genAssignSt(SymTab *st, AST *ast, int &label, int &finalLabel) { cout << "genAssignSt\n" ; } void genStatements(SymTab *st, AST *ast, int &label, int &finalLabel) { cout << "genStatements\n" ; }
Top Level Structure For gener.cxx V void generate(SymTab *st, AST *ast, int label) { int finalLabel = label++ ; genHeader() ; genDeclarations(st) ; genStatements(st, ast, label, finalLabel) ; genFooter(finalLabel) ; } // generate • finalLabel is used to label the error code for integer overflow. • Reserving it before any code is generated avoids using a 2-pass generator.
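The single-pass idea can be seen in a small sketch. Because finalLabel is reserved before any statement code is emitted, branches to it can be written out immediately, and the label itself is only defined at the very end. The helper emitCheckedStatement, the function generateSketch and the use of a string stream instead of cout are illustrative assumptions, not part of gener.template.

```cpp
#include <cassert>
#include <iostream>
#include <sstream>
#include <string>

// Hypothetical stand-in for one statement generator: it takes a fresh
// label for itself and emits a forward branch to the overflow handler.
void emitCheckedStatement(std::ostream &out, int &label, int finalLabel) {
    int here = label++;                       // fresh label for this statement
    out << 'L' << here << ":\n";
    out << "\tBGT L" << finalLabel << "\n";   // target is not defined yet
}

// Mirrors the shape of generate(): reserve finalLabel first, emit the
// statements, then define the label once in the footer.
std::string generateSketch(int label) {
    std::ostringstream out;
    int finalLabel = label++;                 // reserved before any code is emitted
    emitCheckedStatement(out, label, finalLabel);
    emitCheckedStatement(out, label, finalLabel);
    out << "\tRTS\n";
    out << 'L' << finalLabel << ":\n";        // defined only here, in one pass
    return out.str();
}
```

The assembler resolves the forward references to the label, so the generator never has to go back over its own output.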
Structure Of An M68K Assembler File • Assembler code file is made up of 3 parts : • Standard header part. • Specific assembly code generated from C-- source. • Standard footer part. • Standard header : #NO_APP _IOinteger: .asciz "%d" _Eintegeroverflow: .asciz "\n\nInteger Overflow!\n" | | Declarations go here. | .even .globl _run _run: • To find out what this means, RTFM.
Structure Of An M68K Assembler File II • Standard footer : RTS LfinalLabel: LINK A6,#0 PEA _Eintegeroverflow JBSR _printf ADDQ.W #4,SP UNLK A6 RTS • Code after the LfinalLabel label is the integer overflow handling code. • Obviously, use the value of finalLabel, not its name. • RTS : VxWorks calls the assembly code as a subprogram. • '\t' at start of all indented lines throughout assembly code. • No '\t' anywhere else (except in strings).
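A minimal sketch of the two fixed parts follows. It assumes that labels and #NO_APP start in column one, that directives and instructions are tab-indented, and that the library symbol is _printf as on the cout slide. The std::ostream parameter is also an assumption made for testability (the template versions write to cout). Compare against the output of the reference gener before relying on the exact layout.

```cpp
#include <cassert>
#include <iostream>
#include <sstream>
#include <string>

// Emits the standard header up to the point where declarations begin.
// Assumption: labels are unindented, directives are tab-indented.
void genHeader(std::ostream &out) {
    out << "#NO_APP\n";
    out << "_IOinteger:\n";
    out << "\t.asciz \"%d\"\n";
    out << "_Eintegeroverflow:\n";
    // Backslashes are doubled so the assembler sees the \n escapes.
    out << "\t.asciz \"\\n\\nInteger Overflow!\\n\"\n";
}

// Emits the standard footer; the numeric value of finalLabel follows the L.
void genFooter(std::ostream &out, int finalLabel) {
    out << "\tRTS\n";                         // normal return to VxWorks
    out << 'L' << finalLabel << ":\n";        // integer overflow handler
    out << "\tLINK A6,#0\n";
    out << "\tPEA _Eintegeroverflow\n";
    out << "\tJBSR _printf\n";
    out << "\tADDQ.W #4,SP\n";
    out << "\tUNLK A6\n";
    out << "\tRTS\n";
}
```

Passing std::cout as the stream gives the behaviour the template expects.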
Variable And Constant Declarations • genDec handles a single declaration. • C-- : int i1 = 0 ; int i2 ; const string str = "Hello\n" ; bool b1 = false ; bool b2 ; • M68K : .comm i1,4 .comm i2,4 Lstr: .asciz "Hello\n" .comm b1,4 .comm b2,4 • Note that strings are initialised on declaration, whereas ints and bools only have 4 bytes of storage reserved.
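The mapping above could be sketched as follows. The SymTab and Literal structs here are simplified stand-ins for the real ones in syner.h: the field litString and the std::ostream parameter are hypothetical, and the tab indentation of directives is an assumption to check against the reference output.

```cpp
#include <cassert>
#include <iostream>
#include <sstream>
#include <string>

// Simplified stand-ins for the phase 2 types (assumptions, not syner.h).
enum DataType { INTDATA, BOOLDATA, STRINGDATA };
struct Literal { std::string litString; };    // hypothetical field name
struct SymTab {
    std::string ident;
    DataType type;
    Literal *initialise;
    SymTab *next;
};

// Emits the declaration for a single symbol table entry.
void genDec(std::ostream &out, const SymTab *st) {
    if (st->type == STRINGDATA) {
        // Strings are initialised on declaration, under the label L<ident>.
        const std::string text =
            st->initialise != nullptr ? st->initialise->litString : std::string();
        out << 'L' << st->ident << ":\n";
        out << "\t.asciz \"" << text << "\"\n";
    } else {
        // ints and bools just reserve 4 bytes; they are initialised later.
        out << "\t.comm " << st->ident << ",4\n";
    }
}
```

The sketch assumes the string literal is stored with its escape sequences still in source form, so that it can be copied straight into the .asciz directive.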
Code For genDeclarations void genDeclarations(SymTab *st) { SymTab *stsave = st ; while (st != NULL) { genDec(st) ; st = st->next ; } cout << ".even\n" ; cout << ".globl _run\n" ; cout << "_run:\n" ; st = stsave ; while (st != NULL) { /* Initialise ints and bools here. */ st = st->next ; } }
Initialising ints and bools • int and bool constants and variables must be initialised when the program runs. • i.e. by M68K MOVE instructions. • In genDeclarations : if ((st->initialise != NULL) && (st->type != STRINGDATA)) { cout << "\tMOVE.W " ; if (st->type == INTDATA) cout << '#' << st->initialise->litInt ; else if (st->type == BOOLDATA) { if (st->initialise->litBool == "true") cout << '#' << trueval ; else if (st->initialise->litBool == "false") cout << '#' << falseval ; } cout << ',' << st->ident << endl ; }
genStatements • genStatements simply steps through the AST calling other subprograms to generate the code for individual statements : void genStatements(...) { while (ast != NULL) { if (ast->tag == IFST) genIfSt(st, ast, label, finalLabel) ; else if (ast->tag == WHILEST) genWhileSt(st, ast, label, finalLabel) ; else if (ast->tag == CINST) genCinSt(st, ast, label, finalLabel) ; else if (ast->tag == COUTST) genCoutSt(st, ast, label, finalLabel) ; else if (ast->tag == ASSIGNST) genAssignSt(st, ast, label, finalLabel) ; ast = ast->next ; } } // genStatements
cin Statements • C-- : cin >> invar ; • M68K : LINK A6,#-4 LEA A6@(-4),A0 MOVE.L A0,SP@- PEA _IOinteger JBSR _scanf ADDQ.W #8,SP MOVE.L A6@(-4),invar UNLK A6 MOVE.L invar,D0 CMP.L #INT_MAX_16_BIT,D0 BGT LfinalLabel CMP.L #INT_MIN_16_BIT,D0 BLT LfinalLabel
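As a rough sketch, the cin sequence can be emitted by one helper that takes the variable name and the overflow label. The function name genCin and the std::ostream parameter are assumptions; the constants are the 16 bit two's complement limits, written out as decimal immediates, which is also an assumption about the exact output format.

```cpp
#include <cassert>
#include <iostream>
#include <sstream>
#include <string>

// 16 bit two's complement limits (in the unit these come from gener.h).
const long INT_MAX_16_BIT = 32767;
const long INT_MIN_16_BIT = -32768;

// Emits the code that reads an integer into var and range-checks it.
void genCin(std::ostream &out, const std::string &var, int finalLabel) {
    out << "\tLINK A6,#-4\n"                  // 4 byte temporary on the stack
        << "\tLEA A6@(-4),A0\n"
        << "\tMOVE.L A0,SP@-\n"               // push address of the temporary
        << "\tPEA _IOinteger\n"               // push the "%d" format string
        << "\tJBSR _scanf\n"
        << "\tADDQ.W #8,SP\n"                 // pop both arguments
        << "\tMOVE.L A6@(-4)," << var << "\n"
        << "\tUNLK A6\n"
        << "\tMOVE.L " << var << ",D0\n"
        << "\tCMP.L #" << INT_MAX_16_BIT << ",D0\n"
        << "\tBGT L" << finalLabel << "\n"    // too large: overflow handler
        << "\tCMP.L #" << INT_MIN_16_BIT << ",D0\n"
        << "\tBLT L" << finalLabel << "\n";   // too small: overflow handler
}
```

Both range checks branch to the same handler, which is why generate reserves a single finalLabel.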
cout Statements • C-- : cout << outvar ; • M68K for strings : LINK A6,#-0 PEA Loutvar JBSR _printf ADDQ.W #4,SP UNLK A6 • M68K for ints : LINK A6,#-4 MOVE.L outvar,SP@- PEA _IOinteger JBSR _printf ADDQ.W #4,SP UNLK A6
Assignment Statements • C-- : var = expression ; • M68K : | Code to evaluate expression. MOVE.L D0,var • Code for the expression is generated by genExpression. • Convention : result of the expression will be left in D0. • Next lecture on how to write genExpression. For now just use the partial implementation from partialExp.cxx. • Can only use literal constants.
while Statements • C-- : while (condition) { statements } ; • M68K : Lstartlabel: | Code to evaluate condition. CMP.L #trueval,D0 BNE Lendlabel | Code to execute statements. JMP Lstartlabel Lendlabel: • Use the value of trueval, not its name, as the immediate operand.
while Statements II • Obviously, use the integer values of ast->whilest->startlabel and ast->whilest->endlabel rather than their names after the L. • Code to evaluate the condition expression is generated by genExpression. • Initially can only use boolean literal constants. • Code to execute statements is generated by genStatements. • genStatements must be forward declared as it is mutually recursive with genWhileSt and genIfSt.
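The mutual recursion can be sketched with cut-down types. Everything here is a simplification: the AST and WhileSt structs are stand-ins for the phase 2 ones, the body field is a hypothetical name, trueval is assumed to be 1, genCondition is a placeholder for genExpression that just loads a literal true, and the comparison is assumed to use the value of trueval as an immediate operand.

```cpp
#include <cassert>
#include <iostream>
#include <sstream>
#include <string>

const int trueval = 1;                        // assumed; real value is in gener.h

// Cut-down stand-ins for the phase 2 AST types (assumptions).
enum Tag { WHILEST, OTHERST };
struct AST;
struct WhileSt { int startlabel; int endlabel; AST *body; };
struct AST { Tag tag; WhileSt *whilest; AST *next; };

// Forward declaration: genWhileSt and genStatements call each other.
void genStatements(std::ostream &out, const AST *ast);

// Placeholder for genExpression: leaves a literal true in D0.
void genCondition(std::ostream &out) {
    out << "\tMOVE.L #" << trueval << ",D0\n";
}

void genWhileSt(std::ostream &out, const AST *ast) {
    const WhileSt *w = ast->whilest;
    out << 'L' << w->startlabel << ":\n";
    genCondition(out);
    out << "\tCMP.L #" << trueval << ",D0\n";
    out << "\tBNE L" << w->endlabel << "\n";  // leave the loop when false
    genStatements(out, w->body);              // recursive call for the body
    out << "\tJMP L" << w->startlabel << "\n";
    out << 'L' << w->endlabel << ":\n";
}

void genStatements(std::ostream &out, const AST *ast) {
    while (ast != nullptr) {
        if (ast->tag == WHILEST) genWhileSt(out, ast);
        else out << "\t| other statement\n";  // other generators omitted
        ast = ast->next;
    }
}
```

A nested loop exercises both directions of the recursion: the inner loop's code lands between the outer BNE and the outer JMP.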
if Statements • C-- : if (condition) { statements } ; • M68K : | Code to evaluate condition. CMP.L #trueval,D0 BNE Lendlabel | Code to execute statements. Lendlabel: • Obviously, use the integer value of ast->ifst->endlabel rather than its name after the L, and the value of trueval rather than its name. • Code to evaluate the condition expression is generated by genExpression. • Code to execute statements is generated by genStatements.
if Statements II • C-- : if (condition) { thenstatements } ; else { elsestatements } ; • M68K : | Code to evaluate condition. CMP.L #trueval,D0 BNE Lelselabel | Code to execute thenstatements. JMP Lendlabel Lelselabel: | Code to execute elsestatements. Lendlabel: • Obviously, use the integer values of ast->ifst->elselabel and ast->ifst->endlabel rather than their names after the L.
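The if/else layout can be sketched as a function that stitches together already generated pieces of code. The function name genIfElse, its string parameters, and trueval being 1 are all assumptions made for illustration. A real genIfSt would call genExpression and genStatements instead of receiving strings.

```cpp
#include <cassert>
#include <sstream>
#include <string>

const int trueval = 1;                        // assumed; real value is in gener.h

// Arranges condition, then-branch and else-branch code around the two labels.
std::string genIfElse(int elselabel, int endlabel,
                      const std::string &condCode,
                      const std::string &thenCode,
                      const std::string &elseCode) {
    std::ostringstream out;
    out << condCode;                          // leaves the result in D0
    out << "\tCMP.L #" << trueval << ",D0\n";
    out << "\tBNE L" << elselabel << "\n";    // false: skip to the else part
    out << thenCode;
    out << "\tJMP L" << endlabel << "\n";     // true: skip over the else part
    out << 'L' << elselabel << ":\n";
    out << elseCode;
    out << 'L' << endlabel << ":\n";
    return out.str();
}
```

For an if without an else, the same shape collapses to a single BNE to the end label with no JMP.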
Summary • Copy gener.template, makefile and gener (renamed dhgener) into your directory. • Print out gener.h, rpolish.cxx and partialExp.cxx. • Rename gener.template to gener.cxx. • Complete the stubs in gener.cxx in the following order : • genHeader, genFooter, genDeclarations, genDec, genCinSt, genCoutSt, genAssignSt, genIfSt, genWhileSt. • For now, assume all expressions are simply literal constants. • Use the genExpression in partialExp.cxx. • #included into gener.template.