
Code Generation

Code Generation. Compiler. Baojian Hua, bjhua@ustc.edu.cn. Front end: source code → lexical analyzer → tokens → parser → abstract syntax tree → semantic analyzer → IR. Back end: IR → instruction selector → Assem → register allocator → TempMap → instruction scheduler → Assem. Code Generation.


Presentation Transcript


  1. Code Generation. Compiler. Baojian Hua, bjhua@ustc.edu.cn

  2. Front End
     source code → lexical analyzer → tokens → parser → abstract syntax tree → semantic analyzer → IR

  3. Back End
     IR → instruction selector → Assem → register allocator → TempMap → instruction scheduler → Assem

  4. Code Generation
     • Generating code for some ISA
       • this course uses x86
     • Many components
       • instruction selection, register allocation, scheduling, …
     • Many different strategies
       • for now, we concentrate on a simple one: the stack machine
       • later in this course, we will turn to more advanced (and more sophisticated) ones

  5. What’s a stack machine?
     • A stack machine has only an operand stack and no (or few) registers
       • all computation is performed on the operand stack
       • the architecture is very simple and uniform
     • Long history:
       • dates back at least to the 1970s
       • renewed industry interest in recent years
       • Sun’s JVM, Microsoft’s CLR, etc.

  6. Stack Machine ISA: s86
     prog  -> instr prog
           ->
     instr -> push v
           -> pop id
           -> add
           -> sub
           -> times
           -> divide
     v     -> num
           -> id
     // Sample Program
     push 8
     push 2
     push x
     times
     sub
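The s86 instruction set is small enough to execute directly. The following is a minimal sketch of an interpreter in Python and is not part of the original slides: the name `run_s86`, the dictionary standing in for variable storage, and the choice of floor division for `divide` are all assumptions made for illustration.

```python
def run_s86(program, memory):
    """Run a list of s86 instruction strings; return the final operand stack.

    `memory` maps variable names to integers (a hypothetical stand-in
    for the machine's variable storage).
    """
    binary_ops = {
        "add": lambda a, b: a + b,
        "sub": lambda a, b: a - b,
        "times": lambda a, b: a * b,
        "divide": lambda a, b: a // b,  # assumption: floor division
    }
    stack = []
    for instr in program:
        op, *args = instr.split()
        if op == "push":
            v = args[0]
            # v -> num | id: push the literal, or the variable's current value
            stack.append(int(v) if v.isdigit() else memory[v])
        elif op == "pop":
            memory[args[0]] = stack.pop()
        elif op in binary_ops:
            right = stack.pop()  # the operand pushed last is on top
            left = stack.pop()
            stack.append(binary_ops[op](left, right))
        else:
            raise ValueError(f"unknown s86 instruction: {instr}")
    return stack


# The sample program from the slide, run with x = 3: computes 8 - 2*3.
result = run_s86(["push 8", "push 2", "push x", "times", "sub"], {"x": 3})
print(result)  # [2]
```

Note that the right operand is popped first: `sub` computes (second from top) minus (top), which is what makes `push 8; ...; sub` mean 8 minus the rest.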

  7. The simple expression language
     // recall our simple expression language
     exp -> num
         -> id
         -> exp + exp
         -> exp - exp
         -> exp * exp
         -> exp / exp
         -> (exp)
     // or in ML
     datatype exp
       = Int of int
       | Id of string
       | Add of exp * exp
       | Sub of exp * exp
       | Times of exp * exp
       | Divide of exp * exp
     // Sample Program
     8-2*x

  8. Code gen’ from exp to s86
     C (num)     = push num
     C (id)      = push id
     C (e1 + e2) = C (e1); C (e2); add
     C (e1 - e2) = C (e1); C (e2); sub
     C (e1 * e2) = C (e1); C (e2); times
     C (e1 / e2) = C (e1); C (e2); divide

  9. Code gen’ from exp to s86
     // or in ML (push, add, etc. stand for emitting that s86 instruction)
     fun C (e) =
       case e of
         Int i => push i
       | Id s => push s
       | Add (e1, e2) => (C (e1); C (e2); add)
       | … => (* similar *)

  10. Example
      C (8-2*x) = C (8); C (2*x); sub
                = push 8; C (2*x); sub
                = push 8; C (2); C (x); times; sub
                = …
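The rules C(...) and the expansion above translate almost line for line into a recursive function. Below is a minimal sketch in Python rather than ML; the tuple encoding of the syntax tree (`("num", 8)`, `("id", "x")`, `("sub", e1, e2)`, ...) and the name `compile_exp` are assumptions for illustration, not the course's actual representation.

```python
BINARY_OPS = {"add", "sub", "times", "divide"}


def compile_exp(e):
    """C(e): return the list of s86 instructions for expression tree e."""
    kind = e[0]
    if kind in ("num", "id"):
        # C(num) = push num, C(id) = push id
        return [f"push {e[1]}"]
    if kind in BINARY_OPS:
        # C(e1 op e2) = C(e1); C(e2); op
        _, e1, e2 = e
        return compile_exp(e1) + compile_exp(e2) + [kind]
    raise ValueError(f"unknown expression node: {kind}")


# 8 - 2*x, with multiplication binding tighter than subtraction
ast = ("sub", ("num", 8), ("times", ("num", 2), ("id", "x")))
code = compile_exp(ast)
print(code)  # ['push 8', 'push 2', 'push x', 'times', 'sub']
```

The output is the same instruction sequence as the sample s86 program on slide 6.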

  11. Moral
      • Code generation for a stack machine is dead simple
        • a set of recursive equations from the mathematical point of view
        • a recursive function from the CS point of view
        • think before you hack!
      • But we have more to say about:
        • variable storage
        • more language features
          • statements, declarations, functions, etc.

  12. Address space
      • The address space is the way programs use memory
        • highly architecture- and OS-dependent
        • the typical layout on 32-bit x86/Linux, from the highest address to the lowest:
      0xffffffff
          OS
      0xc0000000
          stack
          heap
          data
          text
      0x08048000
      0x00100000
          BIOS, VGA
      0x00000000

  13. Static Storage
      • Static storage is an area in the data section
        • a typical use is to hold C/C++ file-scope variables (static) and extern variables (global)
      • The exp language has only static variables, so all of them can be stored in the static section
        • this requires a pass to collect all variables
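The variable-collecting pass mentioned above is a plain tree walk. Here is a minimal sketch in Python, assuming a hypothetical tuple encoding of expressions (`("num", n)`, `("id", name)`, `(op, left, right)`); the function name `collect_vars` is likewise invented for illustration.

```python
def collect_vars(e, found=None):
    """Return the variable names used in expression tree e, in first-use order."""
    if found is None:
        found = []
    kind = e[0]
    if kind == "id":
        if e[1] not in found:  # record each variable only once
            found.append(e[1])
    elif kind != "num":
        # binary node: (op, left, right), so visit both subtrees
        collect_vars(e[1], found)
        collect_vars(e[2], found)
    return found


# y - 2*(x*y)
ast = ("sub", ("id", "y"), ("times", ("num", 2), ("times", ("id", "x"), ("id", "y"))))
names = collect_vars(ast)
print(names)  # ['y', 'x']
```

Each collected name would then get one slot in the data section.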

  14. Declarations
      // extend exp a bit
      prog -> decs exp
      decs -> int id; decs
           ->
      exp  -> …
      // or in ML
      datatype decs
        = T of {var: string, ty: tipe} list
      // Sample Program
      int x;
      8-2*x;

  15. Code gen’ rules
      D (int id; decs) = id: .int 0
                         D (decs)
      D ( )            =
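The D rules emit one zero-initialized word per declared variable and nothing for the empty list. A minimal sketch in Python follows; it assumes declarations arrive as a list of hypothetical `(name, type)` pairs and iterates over the list instead of recursing, which yields the same output as the two equations.

```python
def compile_decs(decs):
    """D(decs): one `id: .int 0` line per declaration; empty list gives no lines."""
    lines = []
    for name, ty in decs:
        if ty != "int":
            # the language on the slides only has int declarations
            raise ValueError(f"unsupported type: {ty}")
        lines.append(f"{name}: .int 0")
    return lines


data = compile_decs([("x", "int"), ("y", "int")])
print("\n".join(data))
```

For the sample program `int x; 8-2*x;` this would emit the single line `x: .int 0`.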

  16. Statement
      // extend the exp language by adding statements:
      s -> id = e;
        -> if (e) s else s
      // compile:
      CS (id = e;) = C (e); pop id

  17. Statement, cont’
      // s86 must also be extended (jz, jmp and labels)!
      // compile:
      CS (if (e) s1 else s2) =
            C (e)
            jz .Lfalse
          .Ltrue:
            CS (s1)
            jmp .Lend
          .Lfalse:
            CS (s2)
          .Lend:
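One detail the CS rule leaves implicit is that nested or repeated `if` statements need distinct labels. The sketch below, in Python, numbers the labels with a counter. Everything here beyond the rule itself is an assumption: the tuple encoding of statements, the names `compile_stm` and `compile_exp`, and the reading of `jz` as "pop the top of the operand stack and jump if it is zero".

```python
import itertools

_label_ids = itertools.count()  # source of fresh label numbers


def compile_exp(e):
    """C(e) for the tuple-encoded expression language."""
    if e[0] in ("num", "id"):
        return [f"push {e[1]}"]
    return compile_exp(e[1]) + compile_exp(e[2]) + [e[0]]


def compile_stm(s):
    """CS(s): return the s86 instruction list for statement s."""
    kind = s[0]
    if kind == "assign":  # ("assign", id, e): CS(id = e;) = C(e); pop id
        _, name, e = s
        return compile_exp(e) + [f"pop {name}"]
    if kind == "if":  # ("if", e, s1, s2)
        _, e, s1, s2 = s
        n = next(_label_ids)  # one number shared by this statement's labels
        return (
            compile_exp(e)
            + [f"jz .Lfalse{n}", f".Ltrue{n}:"]
            + compile_stm(s1)
            + [f"jmp .Lend{n}", f".Lfalse{n}:"]
            + compile_stm(s2)
            + [f".Lend{n}:"]
        )
    raise ValueError(f"unknown statement node: {kind}")


# if (x) y = 1; else y = 2;
stm = ("if", ("id", "x"), ("assign", "y", ("num", 1)), ("assign", "y", ("num", 2)))
code = compile_stm(stm)
print("\n".join(code))
```

Because the counter advances on every `if`, an `if` nested inside `s1` or `s2` receives its own label set and cannot clash with the enclosing one.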

  18. Moral
      • It’s also straightforward to translate other control structures in this style
        • while, for, switch, etc.
      • This kind of code generation is called recursive descent
        • may be done at parsing time
        • adopted in many compilers
        • read the offered article on Borland Turbo Pascal 3.0
          • you may safely ignore the Pascal-specific features

  19. From s86 to x86
      • How do we run the generated s86 code?
        • design a virtual machine
          • as we did in lab #1
          • this is also the approach of the JVM and the CLR
        • translate to native code and then execute it
          • so-called just-in-time (JIT) compilation
          • the dominant method for OO languages today…
      • Next, we discuss the second method
        • by mapping s86 to x86

  20. Operand Stack
      // x86 does not have a dedicated operand stack?
      // Solution 1: use the control stack: ebp, esp
      //   left to you.
      // Solution 2: make a fake operand stack, as in:
          .set PAGE, 4096
          .data
      opStack:
          .space PAGE, 0xcc
      top:
          .int opStack+PAGE
      // "top" points to the stack top, and the stack grows
      // down toward lower addresses
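To make the layout of Solution 2 concrete, here is a small Python model of the fake operand stack: one page filled with 0xcc, and a `top` pointer that starts just past the end of the page and moves down by 4 bytes per push. The class name `OperandStack`, the use of an offset instead of an absolute address, and the overflow/underflow checks are assumptions added for illustration.

```python
import struct

PAGE = 4096


class OperandStack:
    """Model of `opStack: .space PAGE, 0xcc` with `top: .int opStack+PAGE`."""

    def __init__(self):
        self.mem = bytearray(b"\xcc" * PAGE)
        self.top = PAGE  # offset from opStack; an empty stack points past the end

    def push(self, value):
        if self.top < 4:
            raise OverflowError("operand stack overflow")
        self.top -= 4  # the stack grows toward lower addresses
        struct.pack_into("<i", self.mem, self.top, value)  # 32-bit little-endian

    def pop(self):
        if self.top >= PAGE:
            raise IndexError("operand stack underflow")
        (value,) = struct.unpack_from("<i", self.mem, self.top)
        self.top += 4
        return value


s = OperandStack()
s.push(8)
s.push(2)
print(s.top)    # 4088
print(s.pop())  # 2
print(s.pop())  # 8
```

Filling the page with 0xcc makes untouched slots easy to spot when inspecting memory.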

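  21. Instructions
      // map fake s86 instructions to x86’s:
      .macro s86push x
          sub dword ptr [top], 4    // make room: move top down one word
          mov ebx, [top]            // ebx = address of the new top slot
          mov eax, \x               // load the operand into a register
          mov [ebx], eax            // store it on the operand stack
      .endm
      // others are similar.
      // Care must be taken to account for the machine constraints.
      // For instance, a memory-to-memory move is illegal on x86.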
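As a hedged illustration of how the other instructions might follow the `s86push` pattern, the Python sketch below expands `push`, `pop`, `add` and `sub` into x86 lines. The `push` expansion mirrors the macro on the slide; the `pop`, `add` and `sub` sequences are my own guesses at "similar" code, not taken from the course. The function name `s86_to_x86`, the bracketed `[x]` form for variables, and the omission of `times` and `divide` are all assumptions.

```python
def s86_to_x86(instr):
    """Expand one s86 instruction into a list of x86 lines (Intel syntax)."""
    op, *args = instr.split()
    if op == "push":
        v = args[0]
        src = v if v.isdigit() else f"[{v}]"  # immediate, or load from a variable
        return [
            "sub dword ptr [top], 4",
            "mov ebx, [top]",
            f"mov eax, {src}",  # via eax: no memory-to-memory mov on x86
            "mov [ebx], eax",
        ]
    if op == "pop":
        return [
            "mov ebx, [top]",
            "mov eax, [ebx]",  # eax = value on top of the operand stack
            f"mov [{args[0]}], eax",
            "add dword ptr [top], 4",
        ]
    if op in ("add", "sub"):
        return [
            "mov ebx, [top]",
            "mov eax, [ebx]",          # eax = right operand (top of stack)
            "add dword ptr [top], 4",  # pop it
            f"{op} [ebx+4], eax",      # left operand becomes left op right
        ]
    raise NotImplementedError(f"no expansion sketched for: {instr}")


for line in s86_to_x86("push 8") + s86_to_x86("push x") + s86_to_x86("sub"):
    print(line)
```

Every expansion routes values through `eax`, which is how the sketch respects the constraint that x86 cannot move directly from memory to memory.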
