Advanced Code Optimization Techniques in Compiler Design

Lecture #11, May 10, 2007
• More optimizations (local, loop, global)
• Liveness analysis
• Spilling
• Problems with jumps
• Ranges
• Intervals
• Linear scan register allocation

More Optimizations
• Local Optimizations
    • Constant folding
    • Constant propagation
    • Copy propagation
    • Reduction in strength
    • Inlining
    • Common sub-expression elimination
• Loop Optimizations
    • Loop invariants
    • Reduction in strength due to induction variables
    • Loop unrolling
• Global Optimizations
    • Dead code elimination
    • Code motion
        • Reordering
        • Code hoisting

Constant Folding
• Subexpressions whose operands are all constants can be evaluated at compile time.
• E.g. X := 2 * 4. Rather than generating
    Movi 2 r5
    Movi 4 r7
    Prim "*" [r5,r7] r9
    . . .
  generate this instead:
    Movi 8 r9
    . . .
• Code like this is not ordinarily written by programmers, but it is often the result of translating index calculations.
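The folding step can be sketched in ML over a small expression tree. The `exp` datatype and the name `foldConst` are assumptions made for this illustration; they are not the IR used in the lecture.

```sml
(* Hypothetical expression type, for illustration only. *)
datatype exp = Const of int
             | Var of string
             | Add of exp * exp
             | Mul of exp * exp

(* Fold bottom-up: simplify the operands first, then combine them
   when both turned out to be constants. *)
fun foldConst (Mul (a, b)) =
      (case (foldConst a, foldConst b) of
           (Const x, Const y) => Const (x * y)
         | (a', b') => Mul (a', b'))
  | foldConst (Add (a, b)) =
      (case (foldConst a, foldConst b) of
           (Const x, Const y) => Const (x + y)
         | (a', b') => Add (a', b'))
  | foldConst e = e

(* 2 * 4 is folded; the variable operand is left alone. *)
val folded = foldConst (Add (Var "x", Mul (Const 2, Const 4)))
```

Here `folded` is `Add (Var "x", Const 8)`, which mirrors replacing the three-instruction sequence with a single `Movi 8`.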

Constant Propagation
• Sometimes we know a symbolic value is a constant, so we can propagate the constant and generate better code:
    1 step := 4
    2 total := 0
    3 i := 1
    4 total := x [i + step]
• Note that because every path to 4 passes through 1, 2 and 3, we know that i+step can be computed as (1+4), which is known at compile time to be 5.
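For straight-line code, the idea can be sketched as a single forward pass that remembers which variables currently hold known constants. The `exp` type, the names `subst` and `propagate`, and the variable `index` (standing in for the subscript i + step) are assumptions for this example; array indexing and control flow are left out.

```sml
(* Hypothetical expression type, for illustration only. *)
datatype exp = Const of int | Var of string | Add of exp * exp

(* Replace variables known to be constant, folding additions as we go. *)
fun subst env (Var v) =
      (case List.find (fn (n, _) => n = v) env of
           SOME (_, c) => Const c
         | NONE => Var v)
  | subst env (Add (a, b)) =
      (case (subst env a, subst env b) of
           (Const x, Const y) => Const (x + y)
         | (a', b') => Add (a', b'))
  | subst _ e = e

(* Walk the assignments (variable, expression) in program order.
   env holds the variables whose value is a known constant. *)
fun propagate _ [] = []
  | propagate env ((v, e) :: rest) =
      let
        val e' = subst env e
        (* v is reassigned, so forget whatever we knew about it. *)
        val others = List.filter (fn (n, _) => n <> v) env
        val env' = (case e' of Const c => (v, c) :: others | _ => others)
      in
        (v, e') :: propagate env' rest
      end

val result = propagate []
  [ ("step", Const 4), ("total", Const 0), ("i", Const 1)
  , ("index", Add (Var "i", Var "step")) ]
```

The last entry of `result` is `("index", Const 5)`. This only works because the code is straight-line; once jumps are involved, the "every path" condition from the slide has to be checked explicitly.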

Copy Propagation
• Assignments of one variable to another also propagate information:
    x := y
    . . .
    total := Z[x]
• Note that if my translation knows that y is stored in some register, R7, I can use R7 rather than fetching x from memory.
• Copy propagation may remove all references to x completely. This also makes the assignment to x dead code and a candidate for further optimization.

Reduction in Strength
• Some sequences of code can be replaced with simpler (or less expensive) sequences.
    x := 2 * y
  could be replaced by
    x := y + y
• Exponentiation by 2 can be replaced by a multiply: x^2 == x * x
• Multiplication by a power of 2 can be replaced by a shift.
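The multiply-to-shift case can be sketched as a rewrite over a small expression tree. The `exp` datatype (including the `Shl` shift-left node) and the helper names `log2` and `reduce` are assumptions for this example, and only a constant left operand is handled.

```sml
(* Hypothetical expression type; Shl (e, k) means e shifted left by k bits. *)
datatype exp = Const of int
             | Var of string
             | Add of exp * exp
             | Mul of exp * exp
             | Shl of exp * int

(* log2 n = SOME k when n = 2^k with k >= 1, otherwise NONE. *)
fun log2 n =
  let
    fun go (m, k) =
      if m = 1 then SOME k
      else if m mod 2 = 0 then go (m div 2, k + 1)
      else NONE
  in
    if n >= 2 then go (n, 0) else NONE
  end

(* Rewrite multiplication by a power of two into a shift. *)
fun reduce (Mul (Const n, e)) =
      (case log2 n of
           SOME k => Shl (reduce e, k)
         | NONE => Mul (Const n, reduce e))
  | reduce (Mul (a, b)) = Mul (reduce a, reduce b)
  | reduce (Add (a, b)) = Add (reduce a, reduce b)
  | reduce (Shl (e, k)) = Shl (reduce e, k)
  | reduce e = e

val shifted = reduce (Mul (Const 8, Var "y"))
val unchanged = reduce (Mul (Const 6, Var "y"))
```

`shifted` is `Shl (Var "y", 3)`, while `unchanged` stays a multiplication because 6 is not a power of two.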

Inlining
• Some calls to functions (especially primitives like +, -, *, absolute value, ord and char) can be inlined as a sequence of machine instructions instead of a call to a library routine.
    i := abs(j)
  can be translated as
        Bneg j L2
        Mov j i
        Br L3
    L2: Neg j R2
        Mov R2 i
    L3:

Common Sub-expressions
• Common subexpressions can be exploited by not duplicating code.
    x := z[j+2] - w[j+2]
  becomes
    T1 := j+2
    x := z[T1] - w[T1]
• Note that common subexpressions often occur even when they are not in the user-level code.
• E.g. subscript computations on two multi-dimensional arrays with the same dimensions will often have common subexpressions even if the indices into the arrays are completely different.

Loop Invariants
• Computations inside loops which remain invariant each time around the loop can be computed once outside the loop rather than each time around the loop.
    For i := 1 to N do { total := x[i] / sqr(n) + total }
  becomes
    T1 := sqr(n)
    For i := 1 to N do { total := x[i] / T1 + total }
• Note that index calculation may also introduce computations which are invariant around the loop.

Induction Variables and reduction in strength
• Variables which vary in a regular way around loops are called induction variables.
• For-loop variables are obvious cases, but implicit induction variables give much opportunity for optimization.
    For i := 1 to 10 do { k := i * 4; total := x[i] + w[k] }
• Note that k varies as a linear function of i.
    i := 1
    k := 4
    while i <= 10 do { total := x[i] + w[k]
                       i := i + 1; k := k + 4 }

Induction Variables (cont. 1)
    i := 1
    k := 4
    while i <= 10 do { total := x[i] + w[k]
                       i := i + 1; k := k + 4 }
• Note that x[i] and w[k] are computed by formulas like:
    r4 := addr(x) + lowbound(x) + i; load r4 r4
• Note that addr(x) + lowbound(x) can be moved outside the loop, and this introduces another induction variable.
    xptr := addr(x) + lowbound(x)
    i := 1
    k := 4
    while i <= 10 do { total := (xptr + i)* + w[k]
                       i := i + 1; k := k + 4 }

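Induction Variables (cont. 2)
    xptr := addr(x) + lowbound(x)
    wptr := addr(w) + lowbound(w)
    i := 1
    k := 4
    while i <= 10 do { total := (xptr + i)* + (wptr + k)*
                       i := i + 1; k := k + 4 }
• Now i and k are used only to step through the arrays, so the pointers themselves can serve as the induction variables. The bound is xptr + 9 so that the loop still runs 10 times.
    xptr := addr(x) + lowbound(x) + 1
    wptr := addr(w) + lowbound(w) + 4
    bound := xptr + 9
    while xptr <= bound do { total := xptr* + wptr*
                             xptr := xptr + 1; wptr := wptr + 4 }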

Loop Unrolling
• Loops with a low trip count can be unrolled. This does away with the loop initialization and the test for the termination condition.
    list := [1,2]
    while (list <> nil) do { total := total + hd(list); list := tl(list) }
  becomes
    total := total + hd(list)
    list := tl(list)
    total := total + hd(list)

Dead Code Elimination
• Automatic generation techniques often generate code that is unreachable.
    debug := false;
    if debug then print x;
    f(x);
• Because of constant propagation it is possible to tell at compile time that the then branch will never be executed.

Code Motion (reordering)
• Sometimes reordering statements that do not interfere allows other, more powerful optimizations to become applicable.
    Push R2
    Movi 7 R3
    Pop R4
  can be reordered to
    Movi 7 R3
    Push R2
    Pop R4
  and the now adjacent push and pop can be replaced by a move:
    Movi 7 R3
    Mov R2 R4
• Now copy propagation might remove R2 altogether.

Code Motion (Code Hoisting)
• Branches in code sometimes repeat identical calculations.
• These calculations can sometimes be “hoisted” before the branch, so they don’t have to be repeated.
• This saves space, but not time.
    if g(x) then x := (d*2) + w / k
            else x := (d*2) - w / j
  becomes
    T1 := (d*2);
    if g(x) then x := T1 + w / k
            else x := T1 - w / j
• Multi-branch “case” statements can make this quite a space saver.

Code Hoisting in Nested Languages
    Procedure P (a,b:int);
      var x : int = 5;
      function f(y:int):int;
        begin x := a+b - 3; return x + y end;
    begin
      for i := 1 to 3 do print f(i)
    end;
• Note that x := a+b - 3; could be hoisted out of the nested function, then hoisted out of the loop.
    Procedure P (a,b:int);
      var x : int = 5;
      function f(y:int):int;
        begin return x + y end;
    begin
      x := a+b - 3;
      for i := 1 to 3 do print f(i)
    end;

Register Allocation
• Task: Manage scarce resources (registers) in an environment with imperfect information (static program text) about dynamic program behavior.
• General aim is to keep frequently-used values in registers as much as possible, to lower memory traffic. Can have a large effect on program performance.
• A variety of approaches are possible, differing in sophistication and in scope of analysis used.

Spilling
• Allocator may be unable to keep every "live" variable in registers; must then "spill" variables to memory. Spilling adds new instructions, which often affects the allocation analysis, requiring a new iteration.
• If spilling is necessary, what should we spill? Some heuristics:
    • Don't spill variables used in inner loops.
    • Spill variables not used again for the "longest" time.
    • Spill variables which haven't been updated since last read from memory.

Simplistic approach
• Assume variables "normally" live in memory.
• Use existing (often redundant) fetches and stores present in IR1.
• So: only need to allocate registers to IR temporaries (T5 etc.).
• Ignore possibility of spills.
• Use simple linear scan register allocator based on liveness intervals.

Liveness
• To determine how long to keep a given variable (or temporary) in a register, we need to know the range of instructions for which the variable is live.
• A variable or temporary is live immediately following an instruction if its current value will be needed in the future (i.e., it will be used again, and it won't be changed before that use).

Helper functions
• We treat lists as sets, so we need functions that
    • Remove an element from a set
    • Union two sets
• Note there will only ever be one element with a given value in a set.
    fun remove x [] = []
      | remove x (y::ys) = if x=y then ys else y :: remove x ys;

    fun union [] ys = ys
      | union (x::xs) ys =
          if List.exists (fn z => z=x) ys
          then union xs ys
          else x :: (union xs ys)

Work backwards
    fun defines (s as MOVE(target,src)) live =
          union (varsOf src) (remove target live)
      | defines s live = union (varsOfSt s) live

    fun annLive (s::ss) live ans =
          annLive ss (defines s live) ((s,live)::ans)
      | annLive [] live ans = ans;

    fun live stmts = annLive (rev stmts) [] []

Jumps cause problems
        T1 := 0                  | T1 T3
    L1: T2 := T1 + 1             | T2 T3
        T3 := T3 + T2            | T2 T3
        T1 := (2 * T2)           | T1 T3
        if T1 < 1000 GOTO L1     | T3
        return T3                |
• Consider the above IR program. Each statement is labelled with the live set computed by the previous algorithm.
• But what happens if we jump back to L1? The second-to-last statement should still state that T1 is live, because it could still be used in the dynamic flow of the program.

Computing Ranges
• The result of computing the liveness is a list of live variables for each line number. E.g.
    [[TEMP 1, TEMP 3]
    ,[TEMP 1, TEMP 3]
    ,[TEMP 2, TEMP 3]
    ,[TEMP 2, TEMP 3]
    ,[TEMP 1, TEMP 3]
    ,[TEMP 3]
    ,[]]
• For each variable, we look for runs of consecutive line numbers in which that variable is live.

Computing ranges
• Given such a list, we compute the range for a single variable (e.g. Temp 1) by making a pass over the list.
• We repeat for each variable.
    [[TEMP 1,TEMP 3]
    ,[TEMP 1,TEMP 3]
    ,[TEMP 3,TEMP 2]
    ,[TEMP 2,TEMP 3]
    ,[TEMP 1,TEMP 3]
    ,[TEMP 3]
    ,[]]
    Range for Temp 1 [(5,5),(1,2)]
    Range for Temp 2 [(3,4)]
    Range for Temp 3 [(1,6)]

ML code
• The idea is to carry an option type indicating whether we are in an active run for the variable name. If we are in an active run and we find a line that doesn’t have the variable, end the run. If we’re not in a run and we find the variable, start a new run.
    fun range name [] line NONE ans = ans
      | range name [] line (SOME(x,y)) ans = (x,y)::ans
      | range name (rs::rss) line interval ans =
          (case (List.find (fn x => x=name) rs,interval) of
             (NONE,NONE) => range name rss (line+1) NONE ans
           | (NONE,SOME(x,y)) => range name rss (line+1) NONE ((x,y)::ans)
           | (SOME _,NONE) => range name rss (line+1) (SOME(line,line)) ans
           | (SOME _,SOME(x,y)) => range name rss (line+1) (SOME(x,line)) ans)

From Ranges to intervals
• Computing intervals from ranges is easy.
• Find the smallest start line number and the largest finish line number in a set of ranges.
• [(5,5),(1,2)] -> (1,5)
    fun max [x] = x
      | max (x::xs) = let val n = max xs in if x < n then n else x end;

    fun min [x] = x
      | min (x::xs) = let val n = min xs in if x < n then x else n end;

    fun interval ranges =
      let fun fst (x,y) = x
          fun snd (x,y) = y
      in (min (map fst ranges), max (map snd ranges)) end;

Live ranges and register allocation
    T1 (1,1) (4,5)
    T2 (2,3)
    T3 (1,5)
• Two variables with no overlapping ranges can share the same register. Note that T1 and T2 could be stored in the same physical register.
• APPROXIMATION TECHNIQUE: compute live intervals, which are the first and last statements at which a variable can be live. Coalesce the ranges.
    T1 (1,5)
    T2 (2,3)
    T3 (1,5)
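The sharing test can be sketched directly on the numbers from this slide. The function names `overlaps` and `canShare` are assumptions for this example, and ranges are treated as closed intervals of statement numbers.

```sml
(* Two closed ranges overlap when neither one ends before the other starts. *)
fun overlaps ((a, b), (c, d)) = a <= d andalso c <= b

(* Two temporaries can share a register when no pair of their ranges overlaps. *)
fun canShare rs1 rs2 =
  not (List.exists (fn r1 => List.exists (fn r2 => overlaps (r1, r2)) rs2) rs1)

(* Exact live ranges from the slide. *)
val t1 = [(1, 1), (4, 5)]
val t2 = [(2, 3)]
val t3 = [(1, 5)]

val exact = canShare t1 t2              (* T1 and T2 never live together *)
val coalesced = canShare [(1, 5)] t2    (* T1 approximated by its interval *)
```

`exact` is `true` but `coalesced` is `false`: the interval approximation is cheaper to compute, at the price of missing the chance to put T1 and T2 in one register.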

Linear Scan Register Allocation
1. Compute the start point and end point of the live interval for each temporary (Temp i). Store the intervals in a list in order of increasing start point.
    Range for Temp 1 [(5,5),(1,2)]   interval (1,5)
    Range for Temp 2 [(3,4)]         interval (3,4)
    Range for Temp 3 [(1,6)]         interval (1,6)
    [(TEMP 1,1,5),(TEMP 3,1,6),(TEMP 2,3,4)]
2. Initialize set active := [ ] and pool of free registers := all usable registers.
3. For each live interval i, in order of increasing start point:
    3.1 For each interval j in active, in order of increasing end point:
        • If endpoint[j] >= startpoint[i], break to step 3.2.
        • Remove j from active.
        • Add register[j] to pool of free registers.
    3.2 Set register[i] := next free register, and remove it from the pool. If the pool is already empty, we need to spill.
    3.3 Add i to active, sorted by increasing end point.
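The loop above can be sketched in ML as follows. This is a simplified illustration, not the lecture's implementation: temporaries and registers are plain integers, the names `insertByEnd`, `expire` and `linearScan` are assumptions, and a temporary that finds the pool empty is just marked `NONE` rather than triggering a real spill decision.

```sml
(* An interval is (temp, start, finish); active pairs a register with an interval. *)
type interval = int * int * int

(* Insert into active, which is kept sorted by increasing end point. *)
fun insertByEnd x [] = [x]
  | insertByEnd (x as (_, (_, _, e))) ((y as (_, (_, _, e'))) :: ys) =
      if e <= e' then x :: y :: ys else y :: insertByEnd x ys

(* Drop intervals that ended before start and return their
   registers to the free pool. *)
fun expire start (active, free) =
  case active of
      (r, (_, _, e)) :: rest =>
        if e >= start then (active, free)
        else expire start (rest, r :: free)
    | [] => ([], free)

(* intervals must already be sorted by increasing start point.
   Result: (temp, SOME register), or (temp, NONE) if no register was free. *)
fun linearScan (regs : int list) (intervals : interval list) =
  let
    fun go [] _ _ acc = rev acc
      | go ((i as (t, s, _)) :: rest) active free acc =
          let val (active', free') = expire s (active, free)
          in case free' of
                 [] => go rest active' free' ((t, NONE) :: acc)
               | r :: rs =>
                   go rest (insertByEnd (r, i) active') rs ((t, SOME r) :: acc)
          end
  in
    go intervals [] regs []
  end

(* The intervals from the slide, written as (temp, start, finish). *)
val example = [(1, 1, 5), (3, 1, 6), (2, 3, 4)]
val withThree = linearScan [0, 1, 2] example
val withTwo = linearScan [0, 1] example
```

With three registers every temporary gets one. With two, Temp 2 finds the pool empty, because the coalesced interval for Temp 1 still covers lines 3 and 4 even though Temp 1 is not actually live there.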

Fixing problems with Jumps
        T1 := 0                  | T1 T3
    L1: T2 := T1 + 1             | T2 T3
        T3 := T2 + T3            | T2 T3
        T1 := (2 * T2)           | T1 T3
        if T1 < 1000 GOTO L1     | T1 T3
        return T3                |
• To fix problems with jumps we break code into basic blocks.
• We use simple analysis within blocks (where the flow of control is simple)
• And more complex analysis between blocks.
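One common way to handle the flow of control is to iterate the liveness equations until nothing changes. The sketch below is an assumption about how that could look, not the analysis used in the lecture: it works per statement rather than per basic block, temporaries are plain integers, and each statement lists the temporaries it defines, the temporaries it uses, and the indices of its successors (numbered from 0).

```sml
type stmt = { defs : int list, uses : int list, succs : int list }

(* Lists as sets, in the same spirit as the helper functions earlier. *)
fun member x ys = List.exists (fn y => y = x) ys
fun union xs ys = xs @ List.filter (fn y => not (member y xs)) ys
fun diff xs ys = List.filter (fn x => not (member x ys)) xs
fun sameSet xs ys =
  length xs = length ys andalso List.all (fn x => member x ys) xs

(* The loop from the slide. *)
val prog : stmt list =
  [ { defs = [1], uses = [],     succs = [1] }      (* T1 := 0 *)
  , { defs = [2], uses = [1],    succs = [2] }      (* L1: T2 := T1 + 1 *)
  , { defs = [3], uses = [2, 3], succs = [3] }      (* T3 := T2 + T3 *)
  , { defs = [1], uses = [2],    succs = [4] }      (* T1 := 2 * T2 *)
  , { defs = [],  uses = [1],    succs = [1, 5] }   (* if T1 < 1000 GOTO L1 *)
  , { defs = [],  uses = [3],    succs = [] } ]     (* return T3 *)

(* Live after a statement: union of what is live before each successor. *)
fun liveOut liveIn succs =
  foldl (fn (s, acc) => union acc (List.nth (liveIn, s))) [] succs

(* One pass: live-before = uses + (live-after - defs). *)
fun step liveIn =
  map (fn ({defs, uses, succs} : stmt) =>
         union uses (diff (liveOut liveIn succs) defs))
      prog

(* Repeat until no live set changes. *)
fun fix liveIn =
  let val next = step liveIn
  in if ListPair.all (fn (a, b) => sameSet a b) (next, liveIn)
     then liveIn
     else fix next
  end

val liveIn = fix (map (fn _ => []) prog)
val afterIf = liveOut liveIn [1, 5]
```

`afterIf` contains both T1 and T3, matching the corrected label on the conditional jump in the slide.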
