Intermediate Code Generation

Why use intermediate code?
• analysis is independent of the target language
• optimisation is independent of the target language
• porting to a new machine requires changing only one component of the compiler

We will generate three-address code, using syntax-directed definitions.

Three Address Code
• Statements in this language are of the form:
      x := y op z
  where x, y and z are names, constants or compiler-generated temporary variables, and op stands for any operator.
• A more complicated statement like
      d := a + b * c
  would have to be translated to
      t1 := b * c
      d := a + t1
  where t1 is a compiler-generated temporary variable.

expression:  a := b * c + b / c
postfix:     a b c * b c / + :=
syntax tree:
      :=
      ├── a
      └── +
          ├── *
          │   ├── b
          │   └── c
          └── /
              ├── b
              └── c
three-address code:
      t1 := b * c
      t2 := b / c
      t3 := t1 + t2
      a := t3
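The step from syntax tree to three-address code can be sketched as a post-order walk that allocates one temporary per operator node. The Python below is a minimal illustration, not part of the original slides; the tuple encoding of the tree and the name `to_tac` are assumptions made for this example.

```python
from itertools import count

def to_tac(tree):
    """Translate an assignment tree into a list of three-address statements.

    Hypothetical encoding: a leaf is a string name, an interior node is a
    tuple (op, left, right), and the root is (":=", target, expr).
    """
    temps = count(1)
    code = []

    def walk(node):
        # A leaf already names its value, so it needs no code.
        if isinstance(node, str):
            return node
        op, left, right = node
        l = walk(left)            # operands first (post-order)
        r = walk(right)
        t = f"t{next(temps)}"     # fresh compiler-generated temporary
        code.append(f"{t} := {l} {op} {r}")
        return t

    _, target, expr = tree
    code.append(f"{target} := {walk(expr)}")
    return code

# a := b * c + b / c
tree = (":=", "a", ("+", ("*", "b", "c"), ("/", "b", "c")))
for line in to_tac(tree):
    print(line)
```

Run on the tree above, the sketch prints the same four statements as the slide.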

Three-address statements
      x := y op z              assignment
      x := op y                unary assignment
      x := y                   copy
      goto L                   unconditional jump
      if x relop y goto L      conditional jump
      param x                  procedure call
      call p n                 procedure call
      return y                 procedure call
      x := y[i]                indexed assignment
      x[i] := y                indexed assignment

A Syntax-directed Translation
• To generate three-address code from source, we will use syntax-directed definitions.
• First, we will consider the language of assignments and expressions.
• S will have one attribute, "code", which will contain the three-address code fragment of the assignment.
• E will have two attributes:
      code  - the corresponding code fragment
      place - the name that will hold the value corresponding to E
• The notation gen(x ":=" y "+" z) represents the statement
      x := y + z
• The notation <fragment> || expr means: concatenate the expression onto the end of the code fragment.

1) S -> id := E
       S.code := E.code || gen(id.place ":=" E.place)
2) E1 -> E2 + E3
       E1.place := newtemp();
       E1.code := E2.code || E3.code || gen(E1.place ":=" E2.place "+" E3.place)
3) E1 -> E2 * E3
       E1.place := newtemp();
       E1.code := E2.code || E3.code || gen(E1.place ":=" E2.place "*" E3.place)
4) E1 -> -E2
       E1.place := newtemp();
       E1.code := E2.code || gen(E1.place ":=" "uminus" E2.place)
5) E1 -> ( E2 )
       E1.place := E2.place;
       E1.code := E2.code
6) E -> id
       E.place := id.place;
       E.code := ""

In rule 5 no new temporary is needed: the parenthesised expression shares the place of E2, since no code is generated to assign to a new name.
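The six rules can be read as a recursive function that returns the pair (place, code) for each expression node. The following Python is a minimal sketch under assumed conventions, not the slides' own implementation: nodes are tuples such as `("id", "b")`, `("neg", e)`, `("paren", e)` and `(op, e2, e3)`, code fragments are lists of strings, and `||` becomes list concatenation.

```python
from itertools import count

def translate_assignment(name, e):
    """Return S.code for  name := e  as a list of three-address statements."""
    temps = count(1)

    def newtemp():
        return f"t{next(temps)}"

    def gen(*parts):
        # One statement, wrapped in a list so fragments concatenate with +.
        return [" ".join(parts)]

    def expr(node):
        kind = node[0]
        if kind == "id":                       # rule 6: E -> id
            return node[1], []
        if kind == "paren":                    # rule 5: E1 -> ( E2 )
            return expr(node[1])
        if kind == "neg":                      # rule 4: E1 -> -E2
            p2, c2 = expr(node[1])
            p1 = newtemp()
            return p1, c2 + gen(p1, ":=", "uminus", p2)
        op, e2, e3 = node                      # rules 2 and 3
        p2, c2 = expr(e2)
        p3, c3 = expr(e3)
        p1 = newtemp()
        return p1, c2 + c3 + gen(p1, ":=", p2, op, p3)

    place, code = expr(e)                      # rule 1: S -> id := E
    return code + gen(name, ":=", place)

# a := b * c + b * -c
b, c = ("id", "b"), ("id", "c")
for line in translate_assignment("a", ("+", ("*", b, c), ("*", b, ("neg", c)))):
    print(line)
```

Because children are evaluated before a temporary is allocated for their parent, temporaries are numbered bottom-up, left to right.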

a := b * c + b * -c

S
├── a
├── :=
└── E8n
    ├── E3n
    │   ├── E1n ── b
    │   ├── *
    │   └── E2n ── c
    ├── +
    └── E7n
        ├── E4n ── b
        ├── *
        └── E6n
            ├── -
            └── E5n ── c

Constructing the Attributes
node    place    code
E1n     b
E2n     c
E3n     t1       E1n.code || E2n.code || t1 := b * c
E4n     b
E5n     c
E6n     t2       E5n.code || t2 := uminus c
E7n     t3       E4n.code || E6n.code || t3 := b * t2
E8n     t4       E3n.code || E7n.code || t4 := t1 + t3
S                E8n.code || a := t4

Flow of Control
We can extend that syntax-directed definition to handle flow-of-control statements:

S1 -> while E do S2
       S1.begin := newlabel();
       S1.after := newlabel();
       S1.code := gen(S1.begin ":") ||
                  E.code ||
                  gen("if" E.place "= 0 goto" S1.after) ||
                  S2.code ||
                  gen("goto" S1.begin) ||
                  gen(S1.after ":")

The attributes "begin" and "after" will hold labels, and newlabel() will return a new label.

labels         code
               ...
S1.begin:      E.code
               if E.place = 0 goto S1.after
               S2.code
               goto S1.begin
S1.after:      ...
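The layout above can be sketched as a function that splices already-translated fragments between two fresh labels. This Python is an illustration only; `make_newlabel`, `gen_while` and the `L1`, `L2` label format are assumed names, and the condition and body are passed in as pre-built lists of statements.

```python
from itertools import count

def make_newlabel():
    """Return a newlabel() function that yields L1, L2, ... (assumed format)."""
    labels = count(1)
    return lambda: f"L{next(labels)}"

def gen_while(e_code, e_place, s2_code, newlabel):
    """Build S1.code for  while E do S2  from the fragments of E and S2."""
    begin = newlabel()                           # S1.begin
    after = newlabel()                           # S1.after
    return (
        [f"{begin}:"]
        + e_code                                 # evaluate the condition
        + [f"if {e_place} = 0 goto {after}"]     # leave the loop when E is false
        + s2_code                                # loop body
        + [f"goto {begin}"]                      # re-test the condition
        + [f"{after}:"]
    )

# while n do n := n - 1   (E is just the name n, so E.code is empty)
loop = gen_while([], "n", ["t1 := n - 1", "n := t1"], make_newlabel())
for line in loop:
    print(line)
```

The condition is re-evaluated on every iteration because E.code sits between the begin label and the conditional jump.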

Looking up the Symbol Table
1) S -> id := E
       p := lookup(id.name);
       if p ≠ nil then emit(p ":=" E.place) else error
2) E1 -> E2 + E3
       E1.place := newtemp();
       emit(E1.place ":=" E2.place "+" E3.place)
3) E1 -> E2 * E3
       E1.place := newtemp();
       emit(E1.place ":=" E2.place "*" E3.place)
4) E1 -> -E2
       E1.place := newtemp();
       emit(E1.place ":= uminus" E2.place)
5) E1 -> ( E2 )
       E1.place := E2.place
6) E -> id
       p := lookup(id.name);
       if p ≠ nil then E.place := p else error

res := a * (alpha + -b)
Assume res, a, alpha and b have already been declared, and placed in the symbol table:

index    lexptr      token    attributes
5        -> res      ID_T
6        -> a        ID_T
7        -> alpha    ID_T
8        -> b        ID_T

processed string           attributes          output
res := a
res := E1                  E1.place = <6>
res := E1 * (alpha
res := E1 * (E2            E2.place = <7>
res := E1 * (E2 + -b
res := E1 * (E2 + -E3      E3.place = <8>
res := E1 * (E2 + E4       E4.place = <9>      <9> := uminus <8>
res := E1 * (E5            E5.place = <10>     <10> := <7> + <9>
res := E1 * (E5)
res := E1 * E6             E6.place = <10>
res := E7                  E7.place = <11>     <11> := <6> * <10>
S                                              <5> := <11>
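The trace can be reproduced with a small emit-based translator in which every place is a symbol-table index. The Python below is a sketch under stated assumptions: the `SymbolTable` class, its constructor arguments, and the choice that temporaries take the next free index (9 onwards here) are hypothetical, chosen only to match the example.

```python
class SymbolTable:
    """Hypothetical table mapping declared names to consecutive indices."""

    def __init__(self, names, first_index):
        self.index = {n: first_index + k for k, n in enumerate(names)}
        self.next_index = first_index + len(names)

    def lookup(self, name):
        return self.index.get(name)      # None plays the role of nil

    def newtemp(self):
        i = self.next_index              # temporaries get the next free slot
        self.next_index += 1
        return i

def translate(table, target, e):
    out = []

    def emit(text):
        out.append(text)

    def expr(node):
        kind = node[0]
        if kind == "id":                 # rule 6
            p = table.lookup(node[1])
            if p is None:
                raise NameError(f"undeclared identifier: {node[1]}")
            return p
        if kind == "paren":              # rule 5: share the inner place
            return expr(node[1])
        if kind == "neg":                # rule 4
            p2 = expr(node[1])
            p1 = table.newtemp()
            emit(f"<{p1}> := uminus <{p2}>")
            return p1
        op, e2, e3 = node                # rules 2 and 3
        p2, p3 = expr(e2), expr(e3)
        p1 = table.newtemp()
        emit(f"<{p1}> := <{p2}> {op} <{p3}>")
        return p1

    place = expr(e)
    p = table.lookup(target)             # rule 1
    if p is None:
        raise NameError(f"undeclared identifier: {target}")
    emit(f"<{p}> := <{place}>")
    return out

# res := a * (alpha + -b)
tree = ("*", ("id", "a"), ("paren", ("+", ("id", "alpha"), ("neg", ("id", "b")))))
for line in translate(SymbolTable(["res", "a", "alpha", "b"], 5), "res", tree):
    print(line)
```

Unlike the earlier definition, nothing is stored in a code attribute: each statement is emitted as soon as its production is reduced, so the output order is the reduction order.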

Arrays
We will store the elements of an array in a block of consecutive locations.
      A      is an array
      w      is the width of each element
      low    is the lower bound on the index
      base   is the address of A
• The ith element of A begins at location:
      base + (i - low) * w
  or
      i * w + (base - (low * w))
  where the second term, c = base - (low * w), does not depend on i and can be computed at compile time.
We then store c with A in the symbol table, and the address of A[i] then is
      c + (i * w)
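As a quick check that the two forms agree, the following Python sketch computes the address both ways. The function names and the sample values (base 1000, lower bound 1, width 4) are assumptions for illustration only.

```python
def element_address(base, low, w, i):
    """Direct form: base + (i - low) * w."""
    return base + (i - low) * w

def precomputed_c(base, low, w):
    """The index-independent part, c = base - low * w."""
    return base - low * w

base, low, w = 1000, 1, 4            # hypothetical array layout
c = precomputed_c(base, low, w)
for i in range(low, low + 5):
    # Both forms must give the same address for every index.
    assert element_address(base, low, w, i) == c + i * w
    print(i, c + i * w)
```

Only one multiplication and one addition remain at run time once c is stored in the symbol table.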

Multi-dimensional Arrays
We will consider arrays stored row by row.
      low1   is the lower bound on the first index
      low2   is the lower bound on the second index
      n2     is the number of values the second index can take (this equals its upper bound when low2 = 1)
• The address of A[i,j] is:
      base + ((i - low1) * n2 + (j - low2)) * w
  or
      ((i * n2) + j) * w + (base - ((low1 * n2) + low2) * w)
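The same check can be made for the two-dimensional, row-major case. In this Python sketch the function names and the sample array (3 rows by 4 columns, both lower bounds 1, width 4, base 0) are assumptions for illustration.

```python
def address_2d(base, low1, low2, n2, w, i, j):
    """Direct form: base + ((i - low1) * n2 + (j - low2)) * w."""
    return base + ((i - low1) * n2 + (j - low2)) * w

def precomputed_c_2d(base, low1, low2, n2, w):
    """The index-independent part: base - ((low1 * n2) + low2) * w."""
    return base - (low1 * n2 + low2) * w

base, low1, low2, n2, w = 0, 1, 1, 4, 4      # hypothetical A[1..3, 1..4]
c = precomputed_c_2d(base, low1, low2, n2, w)
for i in range(1, 4):
    for j in range(1, 5):
        # Row-major: consecutive j values are w bytes apart.
        assert address_2d(base, low1, low2, n2, w, i, j) == (i * n2 + j) * w + c
print(c, address_2d(base, low1, low2, n2, w, 2, 3))
```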

Grammar of Array References
The obvious grammar for indexing array elements is:
      L -> id [ Elist ] | id
      Elist -> Elist , E | E
We will, however, use a different grammar that allows us to build up the index limits as we construct the Elists:
      L -> Elist ] | id
      Elist -> Elist , E | id [ E
• We also need:
  attributes:
      Elist.ndim   - number of dimensions
      Elist.place  - temp value
      L.place      - position in symbol table
      L.offset     - offset into the array
  functions:
      limit(array, i)  - the limit of the ith dimension of the array
      c(array)         - returns the pre-computed formula
      width(array)     - returns w

The syntax-directed definition
1) S -> L := E
       if L.offset = null then
           emit(L.place ":=" E.place)
       else
           emit(L.place "[" L.offset "] :=" E.place)
2) E1 -> E2 + E3
       E1.place := newtemp();
       emit(E1.place ":=" E2.place "+" E3.place)
3) E1 -> ( E2 )
       E1.place := E2.place
4) E -> L
       if L.offset = null then
           E.place := L.place
       else begin
           E.place := newtemp();
           emit(E.place ":=" L.place "[" L.offset "]")
       end
5) L -> Elist ]
       L.place := newtemp();
       L.offset := newtemp();
       emit(L.place ":=" c(Elist.array))
       emit(L.offset ":=" Elist.place "*" width(Elist.array))

6) L -> id
       L.place := id.place
       L.offset := null
7) Elist1 -> Elist2 , E
       t := newtemp();
       m := Elist2.ndim + 1;
       emit(t ":=" Elist2.place "*" limit(Elist2.array, m))
       emit(t ":=" t "+" E.place);
       Elist1.array := Elist2.array;
       Elist1.place := t;
       Elist1.ndim := m
8) Elist -> id [ E
       Elist.array := id.place;
       Elist.place := E.place;
       Elist.ndim := 1
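To see how rules 1, 4, 5, 7 and 8 cooperate, the following Python sketch translates an assignment of the form x := A[i, j]. It is an illustration under assumptions, not the slides' implementation: the `ARRAYS` dictionary stands in for the symbol table, the sample array A[1..10, 1..20] with width 4 and base 0 is hypothetical, and index expressions are assumed to be already reduced to their places.

```python
from itertools import count

# Hypothetical symbol-table data for A[1..10, 1..20] with base = 0:
# c = base - (low1 * n2 + low2) * w = -(1 * 20 + 1) * 4 = -84.
ARRAYS = {"A": {"limits": [10, 20], "width": 4, "c": -84}}

def translate_array_read(target, array, index_places):
    """Emit code for  target := array[index_places...]."""
    temps = count(1)
    code = []

    def newtemp():
        return f"t{next(temps)}"

    info = ARRAYS[array]
    # Rule 8, Elist -> id [ E: start with the first index.
    place, ndim = index_places[0], 1
    # Rule 7, Elist1 -> Elist2 , E: fold in each further index.
    for e_place in index_places[1:]:
        t = newtemp()
        m = ndim + 1
        limit = info["limits"][m - 1]            # limit(array, m)
        code.append(f"{t} := {place} * {limit}")
        code.append(f"{t} := {t} + {e_place}")
        place, ndim = t, m
    # Rule 5, L -> Elist ]: load c and scale the index by the width.
    l_place, l_offset = newtemp(), newtemp()
    code.append(f"{l_place} := {info['c']}")
    code.append(f"{l_offset} := {place} * {info['width']}")
    # Rule 4, E -> L with a non-null offset: fetch the element.
    e_place = newtemp()
    code.append(f"{e_place} := {l_place}[{l_offset}]")
    # Rule 1, S -> L := E where the left side is a plain name.
    code.append(f"{target} := {e_place}")
    return code

for line in translate_array_read("x", "A", ["i", "j"]):
    print(line)
```

The loop mirrors the left-recursive Elist production: each additional index multiplies the running value by the limit of the new dimension before adding the index, which evaluates (i * n2) + j without knowing the number of dimensions in advance.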

Type Conversion
• We have seen before how to compute the type expression for complex expressions using more than one data type.
• It is the job of the compiler to construct the necessary three-address code to do any automatic type conversion required.
• We will assume that there are two basic types, integer and real, and we may have to convert integers to reals.
• We assume that there is a function
      inttoreal
  and two different "+" operators:
      int+
      real+

Semantic rule for E1 -> E2 + E3
      E1.place := newtemp();
      if E2.type = integer and E3.type = integer then
      begin
          emit(E1.place ":=" E2.place "int+" E3.place);
          E1.type := integer
      end
      else if E2.type = real and E3.type = real then
      begin
          emit(E1.place ":=" E2.place "real+" E3.place);
          E1.type := real
      end
      else if E2.type = integer and E3.type = real then
      begin
          u := newtemp();
          emit(u ":= inttoreal" E2.place);
          emit(E1.place ":=" u "real+" E3.place);
          E1.type := real
      end
      else if E2.type = real and E3.type = integer then
      begin
          u := newtemp();
          emit(u ":= inttoreal" E3.place);
          emit(E1.place ":=" E2.place "real+" u);
          E1.type := real
      end
      else E1.type := type_error;

In each mixed case it is the integer operand that is converted: E2 when E2 is the integer, E3 when E3 is the integer.

We would also require similar semantic rules for
      E1 -> E2 * E3
using operators "int*" and "real*".
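The four cases of the rule can be sketched in Python as follows. This is an illustration under assumptions: `make_translator` and `add` are hypothetical names, and each operand is passed as a (place, type) pair with the types written as the strings "integer" and "real".

```python
from itertools import count

def make_translator():
    """Return (add, code): add applies the rule for E1 -> E2 + E3,
    and code collects the emitted three-address statements."""
    temps = count(1)
    code = []

    def newtemp():
        return f"t{next(temps)}"

    def add(e2, e3):
        (p2, ty2), (p3, ty3) = e2, e3
        p1 = newtemp()                            # E1.place, allocated first
        if ty2 == "integer" and ty3 == "integer":
            code.append(f"{p1} := {p2} int+ {p3}")
            return p1, "integer"
        if ty2 == "real" and ty3 == "real":
            code.append(f"{p1} := {p2} real+ {p3}")
            return p1, "real"
        if ty2 == "integer" and ty3 == "real":
            u = newtemp()
            code.append(f"{u} := inttoreal {p2}")  # convert the left operand
            code.append(f"{p1} := {u} real+ {p3}")
            return p1, "real"
        if ty2 == "real" and ty3 == "integer":
            u = newtemp()
            code.append(f"{u} := inttoreal {p3}")  # convert the right operand
            code.append(f"{p1} := {p2} real+ {u}")
            return p1, "real"
        return p1, "type_error"

    return add, code

add, code = make_translator()
print(add(("x", "real"), ("i", "integer")))
for line in code:
    print(line)
```

Returning the type alongside the place lets the result feed directly into an enclosing addition, so a chain such as (i + j) + x converts only the intermediate integer sum.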

Generating the code
processed string                   attributes          output
id1 := id2 * (id3 + -id4)
id1 := E1 * (id3 + -id4)           E1.place = <6>
id1 := E1 * (E2 + -id4)            E2.place = <7>
id1 := E1 * (E2 + -E3)             E3.place = <8>
id1 := E1 * (E2 + E4)              E4.place = <9>      <9> := uminus <8>
id1 := E1 * (E5)                   E5.place = <10>     <10> := <7> + <9>
id1 := E1 * E6                     E6.place = <10>
id1 := E7                          E7.place = <11>     <11> := <6> * <10>
S                                                      <5> := <11>
