This lecture covers the transformation of intermediate code in compiler design, focusing on key concepts such as regular expressions, lexical analysis, and context-free grammars. We also explore various intermediate representations (IR) including high-level, medium-level, and low-level IR, along with their advantages for instruction selection. The discussion includes translating control structures, function calls, and corresponding IR nodes, providing insights on generating efficient code structures from abstract syntax trees. Students should prepare for a preliminary exam and complete recent programming assignments.
CS412/413 Introduction to Compilers and Translators Spring ’99 Lecture 13: Transforming Intermediate Code
Administration
• Prelim 1 on Monday in class
  • topics covered: regular expressions, tokenizing, context-free grammars, LL & LR parsers, static semantics
• No class Wednesday March 3
• Programming Assignment 2 due Friday March 5
• Read: Appel 7, 8
CS 412/413 Introduction to Compilers and Translators -- Spring '99 Andrew Myers
Where we are
Source code (character stream)
  ↓ Lexical analysis (regular expressions)
Token stream
  ↓ Syntactic analysis (grammars)
Abstract syntax tree
  ↓ Semantic analysis (static semantics)
Abstract syntax tree + types
  ↓ Intermediate code generation (translation functions)
Intermediate code
Intermediate Code
• Abstract machine code in tree form
• Statements
  • MOVE, EXP, JUMP, CJUMP, SEQ, LABEL, RET
• Expressions
  • CONST, TEMP, OP, MEM, CALL, ESEQ, NAME
• 13 kinds of tree nodes vs. hundreds of Pentium instructions: easier to generate and to reason about
Intermediate Representations
• High-level IR (HIR): AST + extra node types
• Medium-level IR (MIR)
  • intermediate between AST and assembly
  • other MIRs exist (quadruples, UCODE)
  • advantage of tree IR: easy to generate, easier to do reasonable instruction selection
• Low-level IR (LIR): assembly code + extra pseudo-instructions
IR expressions
• CONST(i) : the integer constant i
• TEMP(t) : a temporary register t. The abstract machine has an infinite number of these
• OP(e1, e2) : one of the following operations
  • PLUS, MINUS, MUL, DIV, MOD
  • AND, OR, XOR, LSHIFT, RSHIFT, ARSHIFT
• MEM(e) : contents of the memory location with address e
• CALL(f, l) : result of function f applied to argument list l
• ESEQ(s, e) : result of e after statement s is executed
• NAME(n) : address of the statement labeled n
IR statements
• MOVE(e, dest) : move result of e into dest
  • dest = TEMP(t) : assign to temporary t
  • dest = MEM(e) : assign to the memory location with address e
• EXP(e) : evaluate e, discard result
• SEQ(s1, s2) : execute s1 and then s2
• JUMP(e) : jump to address e
• CJUMP(e, l1, l2) : jump to l1 or l2 depending on whether e is true or false
• LABEL(n) : a labeled statement (may be used in NAME, JUMP, CJUMP)
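The node kinds listed above can be written down as plain data types. The following is a minimal sketch in Java (16 or later, for records), not the course's actual class hierarchy: every type name, the `EQ` operator, and the `IRDemo` helpers are assumptions made for illustration. MOVE follows the slides' operand order, MOVE(e, dest).

```java
import java.util.List;

// Hypothetical encoding of the IR tree nodes; names are illustrative only.
interface IRExpr {}
interface IRStmt {}

// EQ is an assumption, added for comparisons such as b == 0.
enum OpKind { PLUS, MINUS, MUL, DIV, MOD, AND, OR, XOR, LSHIFT, RSHIFT, ARSHIFT, EQ }

// Expressions
record Const(int value) implements IRExpr {}
record Temp(String name) implements IRExpr {}
record Op(OpKind kind, IRExpr left, IRExpr right) implements IRExpr {}
record Mem(IRExpr address) implements IRExpr {}
record Call(IRExpr function, List<IRExpr> args) implements IRExpr {}
record Eseq(IRStmt stmt, IRExpr expr) implements IRExpr {}
record Name(String label) implements IRExpr {}

// Statements
record Move(IRExpr src, IRExpr dest) implements IRStmt {}
record Exp(IRExpr expr) implements IRStmt {}
record Seq(IRStmt first, IRStmt second) implements IRStmt {}
record Jump(IRExpr target) implements IRStmt {}
record CJump(IRExpr cond, Name ifTrue, Name ifFalse) implements IRStmt {}
record Label(String name) implements IRStmt {}

class IRDemo {
    // Contents of the stack slot at offset k from the frame pointer.
    // Modeling the frame pointer as TEMP("FP") is an assumption.
    static IRExpr local(int k) {
        return new Mem(new Op(OpKind.PLUS, new Temp("FP"), new Const(k)));
    }

    // MOVE with both operands in memory: copy slot `from` into slot `to`.
    static IRStmt copy(int from, int to) {
        return new Move(local(from), local(to));
    }
}
```

Records give structural equality for free, so two IR trees built the same way compare equal, which is convenient when testing translation functions.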
Translation
• Intermediate code generation is tree translation: abstract syntax tree → IR tree
• Each subtree of the AST is translated to a subtree in the IR tree
• Translation process is described by a translation function T[E, A]
Translation Example
• Variable: if (location v : k) ∈ A, then T[v, A] = MEM(PLUS(FP, CONST(k)))
• Equality: T[E1 == E2, A] = OP(==, T[E1, A], T[E2, A])
• Example: if (b==0) a = b;
[Figure: typed AST for the statement (if, ==, =, int a, int b, int 0, with boolean and int types) beside its IR tree, built from SEQ, CJUMP on ==, LABEL(L1), LABEL(L2), MOVE, MEM, + and CONST 0 nodes, with variables addressed at fp+4 and fp+8]
Translation Code
• Function T[E, A] corresponds to a translation method:
    abstract class ASTnode {
        abstract IRnode translate(SymTab A);
    }
• Note similarity to type-checking method:
    abstract Type typeCheck(SymTab A);
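As an illustration of how T[E, A] maps onto a `translate` method, here is a small sketch covering only variables, integer literals, and `==`. The `SymTab` shown (a map from variable names to frame-pointer offsets), the AST classes `Var`, `IntLit`, and `Equals`, and the string-valued operator names are all hypothetical; the slides only name `ASTnode`, `IRnode`, `SymTab`, and `translate`.

```java
import java.util.Map;

// Minimal IR nodes needed for this sketch (hypothetical names).
interface IRnode {}
record Const(int value) implements IRnode {}
record Temp(String name) implements IRnode {}
record Op(String kind, IRnode left, IRnode right) implements IRnode {}
record Mem(IRnode address) implements IRnode {}

// Assumed symbol table: variable name -> offset k from the frame pointer.
record SymTab(Map<String, Integer> offsets) {}

abstract class ASTnode {
    abstract IRnode translate(SymTab A);
}

class IntLit extends ASTnode {
    final int value;
    IntLit(int value) { this.value = value; }
    IRnode translate(SymTab A) { return new Const(value); }
}

class Var extends ASTnode {
    final String name;
    Var(String name) { this.name = name; }
    // T[v, A] = MEM(PLUS(FP, CONST(k))) where A records location v : k.
    // Throws NullPointerException if the variable is not in the table.
    IRnode translate(SymTab A) {
        int k = A.offsets().get(name);
        return new Mem(new Op("PLUS", new Temp("FP"), new Const(k)));
    }
}

class Equals extends ASTnode {
    final ASTnode left, right;
    Equals(ASTnode left, ASTnode right) { this.left = left; this.right = right; }
    // T[E1 == E2, A] = OP(==, T[E1, A], T[E2, A])
    IRnode translate(SymTab A) {
        return new Op("==", left.translate(A), right.translate(A));
    }
}
```

Each AST class translates its children recursively and wraps the results, so every AST subtree becomes one IR subtree, just as the type checker visits each subtree once.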
Translating control structure
• If, while, and return statements cause transfer of control within the program
• Idea: manage flow of control by introducing labels for statements and using CJUMP and JUMP statements to transfer control to those labels
Translating if
• Target code shape for if (E) S:
       CJUMP(T[E], t, f)
    t: T[S]
    f:
• T[ if (E) S ] = SEQ(CJUMP(T[E], NAME(t), NAME(f)), SEQ(LABEL(t), SEQ(T[S], LABEL(f)))) (where t, f are fresh labels)
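The rule for `if` can be sketched directly as a function that allocates two fresh labels and builds the nested SEQ. This is an illustrative sketch, not the course code: the record types, the `IfTranslator` class, and the counter-based label naming scheme are assumptions.

```java
// Minimal IR nodes needed for this sketch (hypothetical names).
interface IRExpr {}
interface IRStmt {}
record Temp(String name) implements IRExpr {}
record Name(String label) implements IRExpr {}
record Exp(IRExpr expr) implements IRStmt {}
record Seq(IRStmt first, IRStmt second) implements IRStmt {}
record CJump(IRExpr cond, Name ifTrue, Name ifFalse) implements IRStmt {}
record Label(String name) implements IRStmt {}

class IfTranslator {
    private int counter = 0;

    // Returns a label name not used before by this translator.
    String freshLabel(String hint) {
        return hint + (counter++);
    }

    // cond and body are the already-translated T[E] and T[S].
    IRStmt translateIf(IRExpr cond, IRStmt body) {
        String t = freshLabel("t");
        String f = freshLabel("f");
        return new Seq(new CJump(cond, new Name(t), new Name(f)),
               new Seq(new Label(t),
               new Seq(body, new Label(f))));
    }
}
```

Fresh labels matter because the same rule fires once per `if` in the program; reusing `t` and `f` would make jump targets ambiguous.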
Translating while
• Target code shape for while (E) S:
    loop: CJUMP(T[E], t, f)
    t:    T[S]
          JUMP loop
    f:
• T[ while (E) S ] = SEQ(LABEL(loop), CJUMP(T[E], NAME(t), NAME(f)), LABEL(t), T[S], JUMP(NAME(loop)), LABEL(f)) (where loop, t, f are fresh labels)
• SEQ with more than two arguments abbreviates right-nested binary SEQ nodes: SEQ(s1, s2, s3) = SEQ(s1, SEQ(s2, s3))
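The `while` rule can be sketched the same way, with a small helper that expands the n-ary SEQ shorthand into right-nested binary SEQ nodes. As before, the record types, the `WhileTranslator` class, and the label naming are illustrative assumptions.

```java
// Minimal IR nodes needed for this sketch (hypothetical names).
interface IRExpr {}
interface IRStmt {}
record Temp(String name) implements IRExpr {}
record Name(String label) implements IRExpr {}
record Exp(IRExpr expr) implements IRStmt {}
record Seq(IRStmt first, IRStmt second) implements IRStmt {}
record Jump(IRExpr target) implements IRStmt {}
record CJump(IRExpr cond, Name ifTrue, Name ifFalse) implements IRStmt {}
record Label(String name) implements IRStmt {}

class WhileTranslator {
    private int counter = 0;

    String freshLabel(String hint) {
        return hint + (counter++);
    }

    // Expands SEQ(s1, ..., sn) into SEQ(s1, SEQ(s2, ... sn)); needs n >= 1.
    static IRStmt seq(IRStmt... stmts) {
        IRStmt result = stmts[stmts.length - 1];
        for (int i = stmts.length - 2; i >= 0; i--) {
            result = new Seq(stmts[i], result);
        }
        return result;
    }

    // cond and body are the already-translated T[E] and T[S].
    IRStmt translateWhile(IRExpr cond, IRStmt body) {
        String loop = freshLabel("loop");
        String t = freshLabel("t");
        String f = freshLabel("f");
        return seq(new Label(loop),
                   new CJump(cond, new Name(t), new Name(f)),
                   new Label(t),
                   body,
                   new Jump(new Name(loop)),
                   new Label(f));
    }
}
```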
Function calls, returns
• Translate to the corresponding IR node
• Call: if (label id : l_id) ∈ A, then T[ id(E1, …, En), A ] = CALL(l_id, T[E1, A], …, T[En, A])
• Return: T[ return E, A ] = RET(T[E, A])
  • alternatively, = SEQ(MOVE(T[E, A], RV), JUMP(NAME(end)))
Progress
• Now have rules for transforming the AST into an intermediate representation
• Can apply this to the AST of each function definition to get the IR for that function
• Intermediate representation has many features not found in real assembly code
  • arbitrarily deep expression trees vs. 1-2 deep
  • ability to perform statements with side effects as part of an expression (ESEQ, CALL); undefined behavior
  • CJUMP is a two-way jump rather than fall-through
• Why do we allow this in IR at all?
Canonical form
• Idea: rewrite trees to get rid of constructs incompatible with assembly
  • arbitrarily deep expression trees -- deal with this later as part of instruction tiling
  • ESEQ & CALL nodes -- push ESEQ nodes upward in the tree until they become SEQ nodes, push all CALL nodes up, make a top-level backbone of SEQ nodes
  • CJUMP is a two-way jump rather than fall-through -- rewrite so the jump on false is always to the very next instruction
Canonical form
• In canonical form, all SEQ nodes go down the right chain: SEQ(s1, SEQ(s2, SEQ(s3, SEQ(s4, SEQ(s5, ...)))))
• Function is just one big SEQ containing all statements: SEQ(s1, s2, s3, s4, s5, …)
• Can translate to assembly more directly
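One way to obtain the right-chain shape is to flatten an arbitrary SEQ tree into a statement list by an in-order walk and then rebuild it right-nested. The sketch below assumes ESEQ nodes have already been removed; the record types and the `SeqFlattener` class are hypothetical names, and `Label` stands in for any non-SEQ statement.

```java
import java.util.ArrayList;
import java.util.List;

// Minimal IR nodes needed for this sketch (hypothetical names).
interface IRStmt {}
record Label(String name) implements IRStmt {}
record Seq(IRStmt first, IRStmt second) implements IRStmt {}

class SeqFlattener {
    // Lists the non-SEQ statements of s in execution order.
    static List<IRStmt> flatten(IRStmt s) {
        List<IRStmt> out = new ArrayList<>();
        collect(s, out);
        return out;
    }

    private static void collect(IRStmt s, List<IRStmt> out) {
        if (s instanceof Seq seq) {
            collect(seq.first(), out);
            collect(seq.second(), out);
        } else {
            out.add(s);
        }
    }

    // Rebuilds SEQ(s1, SEQ(s2, ...)) from a non-empty statement list.
    static IRStmt rightChain(List<IRStmt> stmts) {
        IRStmt result = stmts.get(stmts.size() - 1);
        for (int i = stmts.size() - 2; i >= 0; i--) {
            result = new Seq(stmts.get(i), result);
        }
        return result;
    }
}
```

In practice a compiler might keep the flat list and never rebuild the chain, since the list is already the "one big SEQ" form that maps onto a sequence of instructions.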
Non-canonical features
• ESEQ nodes put a statement node underneath an expression, as in ESEQ(S, E):
    int x = 1 + { while (y > 0) { … } z; }
• CALL nodes have side effects; must move to top level as EXP(CALL(…)) or MOVE(CALL(…)) to define behavior
ESEQ rewriting
• Want to move ESEQ nodes up to the top of the tree, where they can become SEQ nodes
• Idea: define transformation rules that take an IR tree and move ESEQ nodes to the top
• Goal: move side-effecting statements to the top of the tree without ripping apart expressions more than necessary -- leads to better code because expression patterns can be recognized and mapped to the instruction set
ESEQ Transformations
• Example transformations:
    ESEQ(s1, ESEQ(s2, e)) ⇒ ESEQ(SEQ(s1, s2), e)
    MOVE(ESEQ(s1, e), dest) ⇒ SEQ(s1, MOVE(e, dest))
    OP(ESEQ(s1, e1), e2) ⇒ ESEQ(s1, OP(e1, e2))
    OP(e1, ESEQ(s1, e2)) ⇒ ?
Rewriting expressions
• Can OP(e1, ESEQ(s1, e2)) be rewritten as ESEQ(s1, OP(e1, e2))?
• Example: is e1 + { a=0; e2 } the same as { a=0; e1 + e2 }?
• Not in general: if e1 reads a, moving s1 ahead of e1 changes the value of e1
Introducing temporaries
• If e1 does not commute with s1
  • i.e., {s1; e1; e2} ≠ {e1; s1; e2}
• Must save value of e1 in a temporary:
    OP(e1, ESEQ(s1, e2)) ⇒ ESEQ(SEQ(MOVE(e1, TEMP(t)), s1), OP(TEMP(t), e2))
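The two cases for OP(e1, ESEQ(s1, e2)) can be sketched as a single rewrite function. The `commutes` test here is a deliberately conservative assumption (only constants and label addresses are treated as commuting with every statement); a real compiler could use a more precise analysis. The record types, `EseqLifter`, and the temporary naming are hypothetical.

```java
// Minimal IR nodes needed for this sketch (hypothetical names).
interface IRExpr {}
interface IRStmt {}
record Const(int value) implements IRExpr {}
record Temp(String name) implements IRExpr {}
record Name(String label) implements IRExpr {}
record Op(String kind, IRExpr left, IRExpr right) implements IRExpr {}
record Eseq(IRStmt stmt, IRExpr expr) implements IRExpr {}
record Move(IRExpr src, IRExpr dest) implements IRStmt {}
record Seq(IRStmt first, IRStmt second) implements IRStmt {}

class EseqLifter {
    private int counter = 0;

    Temp freshTemp() {
        return new Temp("t" + (counter++));
    }

    // Conservative assumption: no statement can change the value of a
    // constant or a label address, so those commute with anything.
    static boolean commutes(IRStmt s, IRExpr e) {
        return e instanceof Const || e instanceof Name;
    }

    // Rewrites OP(e1, ESEQ(s1, e2)); any other expression is returned as is.
    IRExpr liftRight(IRExpr e) {
        if (e instanceof Op op && op.right() instanceof Eseq es) {
            IRExpr e1 = op.left();
            if (commutes(es.stmt(), e1)) {
                // Safe to run s1 first: ESEQ(s1, OP(e1, e2))
                return new Eseq(es.stmt(), new Op(op.kind(), e1, es.expr()));
            }
            // Save e1 before s1 runs:
            // ESEQ(SEQ(MOVE(e1, TEMP(t)), s1), OP(TEMP(t), e2))
            Temp t = freshTemp();
            return new Eseq(new Seq(new Move(e1, t), es.stmt()),
                            new Op(op.kind(), t, es.expr()));
        }
        return e;
    }
}
```

For `a + { a=0; b }`, the left operand reads `a`, so the sketch takes the second branch and evaluates `a` into a temporary before the assignment runs.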
General case
• When we move all ESEQ nodes to the top, an arbitrary expression node looks like:
    ESEQ(SEQ(s1, SEQ(s2, SEQ(s3, ...))), expr)
• ESEQ transformation takes an arbitrary expression node and returns a list of sub-statements to be executed plus a final expression
• ESEQ node is built as shown
Interface
    class CanonicalExpr {
        IRStmt[] pre_stmts;   // statements to execute first
        IRExpr expr;          // final expression, free of ESEQ nodes
    }
    abstract class IRExpr {
        abstract CanonicalExpr canonical();
    }
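To make the interface concrete, here is one possible sketch of `canonical()` for a handful of node types. It makes several simplifying assumptions that are not from the slides: `pre_stmts` is held as a `List` rather than an array, the statement inside an ESEQ is taken to be canonical already, a left operand is saved in a temporary whenever the right operand has pre-statements and the left is not a constant, and the node classes and `toString` formats are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

abstract class IRStmt {}

// Statements to run first, plus a final expression with no ESEQ inside.
class CanonicalExpr {
    final List<IRStmt> preStmts;
    final IRExpr expr;
    CanonicalExpr(List<IRStmt> preStmts, IRExpr expr) {
        this.preStmts = preStmts;
        this.expr = expr;
    }
}

abstract class IRExpr {
    abstract CanonicalExpr canonical();
}

class Const extends IRExpr {
    final int value;
    Const(int value) { this.value = value; }
    CanonicalExpr canonical() { return new CanonicalExpr(List.of(), this); }
    @Override public String toString() { return "CONST(" + value + ")"; }
}

class Temp extends IRExpr {
    private static int counter = 0;
    final String name;
    Temp(String name) { this.name = name; }
    static Temp fresh() { return new Temp("t" + (counter++)); }
    CanonicalExpr canonical() { return new CanonicalExpr(List.of(), this); }
    @Override public String toString() { return "TEMP(" + name + ")"; }
}

class Move extends IRStmt {
    final IRExpr src, dest;
    Move(IRExpr src, IRExpr dest) { this.src = src; this.dest = dest; }
    @Override public String toString() { return "MOVE(" + src + ", " + dest + ")"; }
}

class Eseq extends IRExpr {
    final IRStmt stmt;
    final IRExpr expr;
    Eseq(IRStmt stmt, IRExpr expr) { this.stmt = stmt; this.expr = expr; }
    CanonicalExpr canonical() {
        CanonicalExpr c = expr.canonical();
        List<IRStmt> pre = new ArrayList<>();
        pre.add(stmt);              // assumed to be canonical already
        pre.addAll(c.preStmts);
        return new CanonicalExpr(pre, c.expr);
    }
}

class Op extends IRExpr {
    final String kind;
    final IRExpr left, right;
    Op(String kind, IRExpr left, IRExpr right) {
        this.kind = kind; this.left = left; this.right = right;
    }
    CanonicalExpr canonical() {
        CanonicalExpr l = left.canonical();
        CanonicalExpr r = right.canonical();
        List<IRStmt> pre = new ArrayList<>(l.preStmts);
        IRExpr lhs = l.expr;
        // Conservative: save the left value if statements from the right
        // operand might change it (constants are always safe).
        if (!r.preStmts.isEmpty() && !(lhs instanceof Const)) {
            Temp t = Temp.fresh();
            pre.add(new Move(lhs, t));
            lhs = t;
        }
        pre.addAll(r.preStmts);
        return new CanonicalExpr(pre, new Op(kind, lhs, r.expr));
    }
    @Override public String toString() { return kind + "(" + left + ", " + right + ")"; }
}
```

Calling `canonical()` on the IR for `a + { a=0; b }` yields two pre-statements (save `a` in a fresh temporary, then the assignment) and a final PLUS over the temporary and `b`; wrapping the list and expression in ESEQ(SEQ(...), expr) gives the general-case shape.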
Conclusions
• AST statements for structured control flow like "if" and "while" can be translated to unstructured IR using JUMP, CJUMP, and LABEL nodes.
• Simple code transformations can bring the IR into a canonical form that has many of the properties of assembly code.