Efficient Register Allocation Techniques in Compiler Construction

CSE P501 – Compiler Construction Register allocation constraints Local allocation Fast, but poorer code Global allocation Register coloring Jim Hogg - UW - CSE - P501

k • IR typically assumes infinite number of virtual registers • Real machine has k physical registers available • x86: k = 8 • x64: k = 16 • Goals • Produce correct code that uses k or less registers • Minimize added loads and stores • Minimize space needed for spilled values • Do this efficiently – O(n), O(n log n), maybe O(n2) Jim Hogg - UW - CSE - P501

Register Allocation • At each point in the code, pick which values to keep in registers • Insert code to move values between registers and memory • No additional transformations – scheduling already ran • But will usually re-run scheduling afterwards (to include the effects of spilling) • Minimize inserted save/restores ("spill" code) Jim Hogg - UW - CSE - P501

Allocation vs Assignment • Allocation: which values to keep in registers? • Assignment: which specific registers? • Compiler must do both Jim Hogg - UW - CSE - P501

Local Register Allocation • Applied to basic blocks (ie: "local") • Produces good register usage inside a block • But can have inefficiencies at boundaries between blocks • Two variations: top-down, bottom-up Jim Hogg - UW - CSE - P501

"Top-down" Local Allocation • We assume a RISC target machine ("load/store architecture") • load operands into two registers (if not already there) • perform instruction (eg: add, lshift) • store result back to memory (if required) • ie: operands for instructions MUST be in registers, rather than memory • (note, by contrast, that x86 allows operands directly from memory) • Principle: keep most heavily used values in registers • Priority = # of times virtual-register used (destination and/or source) in block • If more virtual registers than physical(the common case) • Reserve some registers for values allocated to memory • Need enough to address and load two operands and store result (R1 = R2 + R3) • Other registers dedicated to “hot” values • But are tied up for entire block with particular value, even if only needed for part of the block Jim Hogg - UW - CSE - P501

"Bottom-up" Local Allocation, 1 • Keep a list of free physical registers • initially all physical registers at beginning of block • except frame-pointer, stack-pointer - pre-allocated • Scan code, instruction-by-instruction • Allocate a register when one is needed • if one is free, allocate it • otherwise, pick a victim - eg: VR3 using R7 • save contents of R7 • allocate R7to requesting virtual register • Emit instructions to use physical register • Free physical register as soon as possible • In x = y op z, free y and z if no longer needed before allocating x Jim Hogg - UW - CSE - P501

Bottom-up Local Allocation, 2 • How to pick a victim? • Scan ahead thru code • Look for next use of each currently-allocated-to virtual register • Pick the virtual register whose next use is furthest off. Why? • Save the victim register's to memory (restore later) Example: CPU with 4 available registers Jim Hogg - UW - CSE - P501

Local "bottom-up" Register Allocation, -1 • ; load v2 from memory • ; load v3 from memory • v1 = v2 + v3 • ; load v5, v6 from memory • v4 = v5 - v6 • v7 = v2 - 29 • ; load v9 from memory • v8 = - v9 • v10 = v6 * v4 • v11 = v10 - v3 • Still in LIR. So lots (too many!) virtual registers required (v2, etc). • Grey instructions (1,2,4,7) load operands from memory into virtual registers. • We will ignore these going forward. Focus on mapping virtual to physical. Jim Hogg - UW - CSE - P501

Local "bottom-up" Register Allocation, 0 vRegNextRef pRegvReg v1 1 v2 1 v3 1 v4 2 v5 2 v6 2 v7 3 v8 4 v9 4 v10 5 v11 6 R1 - R2 - R3 - R4 - • v1 = v2 + v3 • v4 = v5 - v6 • v7 = v2 - 29 • v8 = - v9 • v10 = v6 * v4 • v11 = v10 - v3 Jim Hogg - UW - CSE - P501

Local "bottom-up" Register Allocation, 1 vRegNextRef pRegvReg v1 1 v2 13 v3 16 v4 2 v5 2 v6 2 v7 3 v8 4 v9 4 v10 5 v11 6 R1 v2 R2 v3 R3 v1 R4 - • v1 = v2 + v3 • v4 = v5 - v6 • v7 = v2 - 29 • v8 = - v9 • v10 = v6 * v4 • v11 = v10 - v3 R3 = R1 + R2 Jim Hogg - UW - CSE - P501

Local "bottom-up" Register Allocation, 2 vRegNextRef pRegvReg v1  v2 3 v3 6 v4 25 v5 2 v6 25 v7 3 v8 4 v9 4 v10 5 v11 6 R1 v2 R2 v3v4 R3 v1v6 R4 v5 • v1 = v2 + v3 • v4 = v5 - v6 • v7 = v2 - 29 • v8 = - v9 • v10 = v6 * v4 • v11 = v10 - v3 R3 = R1 + R2 ; spill R3 ; spill R2? - no - still clean R2 = R4 - R3 Jim Hogg - UW - CSE - P501

Local "bottom-up" Register Allocation, 3 vRegNextRef pRegvReg v1  v2 3 v3 6 v4 5 v5  v6 5 v7 3 v8 4 v9 4 v10 5 v11 6 R1 v2 R2 v4 R3 v6 R4 v5v7 • v1 = v2 + v3 • v4 = v5 - v6 • v7 = v2 - 29 • v8 = - v9 • v10 = v6 * v4 • v11 = v10 - v3 R3 = R1 + R2 ; spill R3 ; spill R2? - no! R2 = R4 - R3 ; spill R4? - no! R4 = R1 - 29 And so on . . . Jim Hogg - UW - CSE - P501

Bottom-Up Allocator • Invented about once per decade • Sheldon Best, 1955, for Fortran I • LasloBelady, 1965, for analyzing paging algorithms • William Harrison, 1975, ECS compiler work • Chris Fraser, 1989, LCC compiler • Vincenzo Liberatore, 1997, Rutgers • Will be reinvented again, no doubt • Many arguments for optimality of this Jim Hogg - UW - CSE - P501

Global Register Allocation • Standard technique is graph coloring • Use control-graph and dataflow-graph to derive interference graph • Nodes are live ranges (not registers!) • Edge between nodes when live at same time • Connected nodes cannot use same register • Then color the nodes in the graph • Two nodes connected by an edge may not have same color (ie: be allocated to same physical register) • If more than k colors are needed, insert spill code Jim Hogg - UW - CSE - P501

Graph Coloring, 1 1 2 3 4 5 • Assign a color to each node, such that • No adjacent nodes have same color Jim Hogg - UW - CSE - P501

Graph Coloring, 2 1 2 3 4 5 • 2 colors are enough for this graph • called a 2-coloring • this graph's chromatic number = 2 Jim Hogg - UW - CSE - P501

Graph Coloring, 3 1 1 2 2 3 4 3 4 5 5 small changes forces a 3rd color Jim Hogg - UW - CSE - P501

Live Ranges • Live Range of a variable is between a def and all subsequent possible uses • if a < 10, then a is live lines 1..5 • if a < 20, then a is live lines 1..10 • otherwise, a is live lines 1..14 • Mmm - clearly nonsense! We need a flowgraph for a correct specification • Note: line 14: a goes dead on RHS; d comes live on LHS. So death and birth happen 'half-way thru' the instruction • This example is simplified: variables are never re-def'd. And there are no temps 1 a = ... 2 b = ... 3 c = ... 4 a = a + b + c 5 if a < 10 then 6 d = c + 9 7 print(c) 8 elseif a < 20 then 9 e = 10 10 d = e + a 11 print(e) 12 else 13 f = 12 14 d = f + a 15 print(f) 16 endif 17 print(d) Jim Hogg - UW - CSE - P501

Live Ranges in Flowgraphs a a = ... b = ... c = ... a = a + b + c b c a = ... b = ... c = ... a = a + b + c if a < 10 then d = c + 9 print(c) elseif a < 20 then e = 10 d = e + a print(e) else f = 12 d = f + a print(f) endif print(d) a < 10 d = c + 8 print(c) a < 20 e = 10 d = e + a print(e) f = 12 d = f + a print(f) print(d) Jim Hogg - UW - CSE - P501

Register Interference Graph b f a d e c • If live range of 2 variables overlap, or interfere, they cannot use same register • Eg: a and b cannot both be stored in register R5 • Eg: c and ecan be stored in register R5 (at different, non-overlapping times) • How many registers will we need? - Guesses? Jim Hogg - UW - CSE - P501

Interference Graph - Coloring b f a Node Degree a 3 b 2 c 3 d 3 e 2 f 2 d e c • Problem: color this interference graph, where each color represents a different physical register • Solution: simplify graph: recursively remove easiest nodes (lowest degree) Jim Hogg - UW - CSE - P501

Interference Graph - Remove b b f a Node Degree a 2 b 2 c 2 d 3 e 2 f 2 d e c Removed: b Jim Hogg - UW - CSE - P501

Interference Graph - Remove a b f a Node Degree a 2 b 2 c 1 d 3 e 1 f 1 d e c Removed: b a Jim Hogg - UW - CSE - P501

Interference Graph - Remove c b f a Node Degree a 2 b 2 c 1 d 2 e 1 f 1 d e c Removed: b a c Jim Hogg - UW - CSE - P501

Interference Graph - Remove e b f a Node Degree a 2 b 2 c 1 d 2 e 1 f 1 d e c Removed: b a c e Jim Hogg - UW - CSE - P501

Interference Graph - Color! b f a d e c Removed: b a c e Jim Hogg - UW - CSE - P501

Interference Graph - Add e b f a d e c Removed: b a c Jim Hogg - UW - CSE - P501

Interference Graph - Add c b f a d e c Removed: b a Jim Hogg - UW - CSE - P501

Interference Graph - Add a b f a d e c Removed: b Jim Hogg - UW - CSE - P501

Interference Graph - Add b b f a d e c • 3 colors are enough • eg: red = R1, green = R2, blue = R3 • If less registers available - need to spill Jim Hogg - UW - CSE - P501

Live Ranges: More Realistic Example Register Interval • varp = ... • vw = [varp + 0] • v2 = 2 • vx = [varp + @x] • vy = [varp + @y] • vz = [varp + @z] • vw = vw * v2 • vw = vw * vx • vw = vw * vy • vw = vw * vz • [varp + 0] = vw varp 1..11 vw 2..7 vw 7..8 vw 8..9 vw 9..10 vw 10..11 v2 3..7 vx 4..8 vy 5..9 vz 6..10 In this example, virtual registers are re-def'd. Eg: vw participates in 5 live ranges Eg: vw [7..8] interferes with vx [4..8] but not vw [9..10] But no control flow!

Build Live Ranges, 1 • Use dataflow information to build interference graph • Nodes = live ranges • Add an edge for each pair of live ranges that overlap • Beware copy instructions: • rx = ry does not create interference between rxand ry : • can use same register if the ranges do not otherwise interfere Jim Hogg - UW - CSE - P501

Build Live Ranges, 2 • Construct the interference graph • Find live ranges – SSA • Build SSA form of IR • Live range initially includes single SSA name' • At a -function, form union of live ranges • Either rewrite code to use live range names or keep a mapping between SSA names and live-range names Jim Hogg - UW - CSE - P501

Simplify & Color Algorithm Simplify while graph cannot be colored remove easiest node (fewest neighbors = lowest degree) if degree of easiest node > k insert spill code remember that node into stack recalculate degree of each remaining node Color color the sole remaining node while stack is non-empty pop node from stack color newly expanded graph Jim Hogg - UW - CSE - P501

Coalescing Live Ranges • Idea: if two live ranges are connected by a copy operation (rx = ry) do not otherwise interfere, then coalesce • Rewrite all references to rx to use ry • Remove the copy instruction • Then fix up interference graph Jim Hogg - UW - CSE - P501

Advantages of Coalescing? • Makes the code smaller, faster (no copy operation) • Shrinks number of live ranges • Reduces the degree of any live range that interfered with both live ranges rx and ry • But: coalescing two live ranges can prevent coalescing of others, so ordering matters • coalesce most frequently executed ranges first (eg: inner loops) • Can have a substantial payoff – do it! Jim Hogg - UW - CSE - P501

Overall Structure More Coalescing Possible Find live ranges Build int. graph Coalesce Spill Costs Find Coloring No Spills Spills Insert Spills Jim Hogg - UW - CSE - P501

Real-Life Complications • Need to deal with irregularities in the register set • Some operations require dedicated registers • Eg: idivin x86 • Eg: split address/data registers in M68k • Register conventions like function results • Eg: use of registers across calls (return in EAX) • Different register classes • Eg: integer, floating-point, SIMD • Overlapping registers • Eg: x86 AL, AH, AX, EAX, RAX • Model by precoloring nodes, adding constraints in the graph, etc. Jim Hogg - UW - CSE - P501

Graph Representation • Recall: • Register allocation is one of the most important optimizations • Finding optimum solution for typical interference graph will take a million years • So we use heuristics to find a good solution a million times each day • Interference graph representation drives the time & space requirements for the allocator (may even dominate the entire compiler) • Not unknown to have ~5k nodes and ~1M edges Jim Hogg - UW - CSE - P501

Efficient Register Allocation Techniques in Compiler Construction