400 likes | 523 Vues
This document discusses the challenges and strategies associated with register allocation in compiler construction, particularly focusing on local and global allocation techniques. It highlights the constraints of physical register usage on real machines, such as x86 and x64 architectures, and outlines the goals of producing correct code that minimizes added loads and stores while efficiently managing spilled values. Key allocation methods, including top-down and bottom-up approaches, are explained, emphasizing the importance of making optimal use of available registers to enhance compiler performance.
E N D
CSE P501 – Compiler Construction Register allocation constraints Local allocation Fast, but poorer code Global allocation Register coloring Jim Hogg - UW - CSE - P501
k • IR typically assumes infinite number of virtual registers • Real machine has k physical registers available • x86: k = 8 • x64: k = 16 • Goals • Produce correct code that uses k or less registers • Minimize added loads and stores • Minimize space needed for spilled values • Do this efficiently – O(n), O(n log n), maybe O(n2) Jim Hogg - UW - CSE - P501
Register Allocation • At each point in the code, pick which values to keep in registers • Insert code to move values between registers and memory • No additional transformations – scheduling already ran • But will usually re-run scheduling afterwards (to include the effects of spilling) • Minimize inserted save/restores ("spill" code) Jim Hogg - UW - CSE - P501
Allocation vs Assignment • Allocation: which values to keep in registers? • Assignment: which specific registers? • Compiler must do both Jim Hogg - UW - CSE - P501
Local Register Allocation • Applied to basic blocks (ie: "local") • Produces good register usage inside a block • But can have inefficiencies at boundaries between blocks • Two variations: top-down, bottom-up Jim Hogg - UW - CSE - P501
"Top-down" Local Allocation • We assume a RISC target machine ("load/store architecture") • load operands into two registers (if not already there) • perform instruction (eg: add, lshift) • store result back to memory (if required) • ie: operands for instructions MUST be in registers, rather than memory • (note, by contrast, that x86 allows operands directly from memory) • Principle: keep most heavily used values in registers • Priority = # of times virtual-register used (destination and/or source) in block • If more virtual registers than physical(the common case) • Reserve some registers for values allocated to memory • Need enough to address and load two operands and store result (R1 = R2 + R3) • Other registers dedicated to “hot” values • But are tied up for entire block with particular value, even if only needed for part of the block Jim Hogg - UW - CSE - P501
"Bottom-up" Local Allocation, 1 • Keep a list of free physical registers • initially all physical registers at beginning of block • except frame-pointer, stack-pointer - pre-allocated • Scan code, instruction-by-instruction • Allocate a register when one is needed • if one is free, allocate it • otherwise, pick a victim - eg: VR3 using R7 • save contents of R7 • allocate R7to requesting virtual register • Emit instructions to use physical register • Free physical register as soon as possible • In x = y op z, free y and z if no longer needed before allocating x Jim Hogg - UW - CSE - P501
Bottom-up Local Allocation, 2 • How to pick a victim? • Scan ahead thru code • Look for next use of each currently-allocated-to virtual register • Pick the virtual register whose next use is furthest off. Why? • Save the victim register's to memory (restore later) Example: CPU with 4 available registers Jim Hogg - UW - CSE - P501
Local "bottom-up" Register Allocation, -1 • ; load v2 from memory • ; load v3 from memory • v1 = v2 + v3 • ; load v5, v6 from memory • v4 = v5 - v6 • v7 = v2 - 29 • ; load v9 from memory • v8 = - v9 • v10 = v6 * v4 • v11 = v10 - v3 • Still in LIR. So lots (too many!) virtual registers required (v2, etc). • Grey instructions (1,2,4,7) load operands from memory into virtual registers. • We will ignore these going forward. Focus on mapping virtual to physical. Jim Hogg - UW - CSE - P501
Local "bottom-up" Register Allocation, 0 vRegNextRef pRegvReg v1 1 v2 1 v3 1 v4 2 v5 2 v6 2 v7 3 v8 4 v9 4 v10 5 v11 6 R1 - R2 - R3 - R4 - • v1 = v2 + v3 • v4 = v5 - v6 • v7 = v2 - 29 • v8 = - v9 • v10 = v6 * v4 • v11 = v10 - v3 Jim Hogg - UW - CSE - P501
Local "bottom-up" Register Allocation, 1 vRegNextRef pRegvReg v1 1 v2 13 v3 16 v4 2 v5 2 v6 2 v7 3 v8 4 v9 4 v10 5 v11 6 R1 v2 R2 v3 R3 v1 R4 - • v1 = v2 + v3 • v4 = v5 - v6 • v7 = v2 - 29 • v8 = - v9 • v10 = v6 * v4 • v11 = v10 - v3 R3 = R1 + R2 Jim Hogg - UW - CSE - P501
Local "bottom-up" Register Allocation, 2 vRegNextRef pRegvReg v1 v2 3 v3 6 v4 25 v5 2 v6 25 v7 3 v8 4 v9 4 v10 5 v11 6 R1 v2 R2 v3v4 R3 v1v6 R4 v5 • v1 = v2 + v3 • v4 = v5 - v6 • v7 = v2 - 29 • v8 = - v9 • v10 = v6 * v4 • v11 = v10 - v3 R3 = R1 + R2 ; spill R3 ; spill R2? - no - still clean R2 = R4 - R3 Jim Hogg - UW - CSE - P501
Local "bottom-up" Register Allocation, 3 vRegNextRef pRegvReg v1 v2 3 v3 6 v4 5 v5 v6 5 v7 3 v8 4 v9 4 v10 5 v11 6 R1 v2 R2 v4 R3 v6 R4 v5v7 • v1 = v2 + v3 • v4 = v5 - v6 • v7 = v2 - 29 • v8 = - v9 • v10 = v6 * v4 • v11 = v10 - v3 R3 = R1 + R2 ; spill R3 ; spill R2? - no! R2 = R4 - R3 ; spill R4? - no! R4 = R1 - 29 And so on . . . Jim Hogg - UW - CSE - P501
Bottom-Up Allocator • Invented about once per decade • Sheldon Best, 1955, for Fortran I • LasloBelady, 1965, for analyzing paging algorithms • William Harrison, 1975, ECS compiler work • Chris Fraser, 1989, LCC compiler • Vincenzo Liberatore, 1997, Rutgers • Will be reinvented again, no doubt • Many arguments for optimality of this Jim Hogg - UW - CSE - P501
Global Register Allocation • Standard technique is graph coloring • Use control-graph and dataflow-graph to derive interference graph • Nodes are live ranges (not registers!) • Edge between nodes when live at same time • Connected nodes cannot use same register • Then color the nodes in the graph • Two nodes connected by an edge may not have same color (ie: be allocated to same physical register) • If more than k colors are needed, insert spill code Jim Hogg - UW - CSE - P501
Graph Coloring, 1 1 2 3 4 5 • Assign a color to each node, such that • No adjacent nodes have same color Jim Hogg - UW - CSE - P501
Graph Coloring, 2 1 2 3 4 5 • 2 colors are enough for this graph • called a 2-coloring • this graph's chromatic number = 2 Jim Hogg - UW - CSE - P501
Graph Coloring, 3 1 1 2 2 3 4 3 4 5 5 small changes forces a 3rd color Jim Hogg - UW - CSE - P501
Live Ranges • Live Range of a variable is between a def and all subsequent possible uses • if a < 10, then a is live lines 1..5 • if a < 20, then a is live lines 1..10 • otherwise, a is live lines 1..14 • Mmm - clearly nonsense! We need a flowgraph for a correct specification • Note: line 14: a goes dead on RHS; d comes live on LHS. So death and birth happen 'half-way thru' the instruction • This example is simplified: variables are never re-def'd. And there are no temps 1 a = ... 2 b = ... 3 c = ... 4 a = a + b + c 5 if a < 10 then 6 d = c + 9 7 print(c) 8 elseif a < 20 then 9 e = 10 10 d = e + a 11 print(e) 12 else 13 f = 12 14 d = f + a 15 print(f) 16 endif 17 print(d) Jim Hogg - UW - CSE - P501
Live Ranges in Flowgraphs a a = ... b = ... c = ... a = a + b + c b c a = ... b = ... c = ... a = a + b + c if a < 10 then d = c + 9 print(c) elseif a < 20 then e = 10 d = e + a print(e) else f = 12 d = f + a print(f) endif print(d) a < 10 d = c + 8 print(c) a < 20 e = 10 d = e + a print(e) f = 12 d = f + a print(f) print(d) Jim Hogg - UW - CSE - P501
Register Interference Graph b f a d e c • If live range of 2 variables overlap, or interfere, they cannot use same register • Eg: a and b cannot both be stored in register R5 • Eg: c and ecan be stored in register R5 (at different, non-overlapping times) • How many registers will we need? - Guesses? Jim Hogg - UW - CSE - P501
Interference Graph - Coloring b f a Node Degree a 3 b 2 c 3 d 3 e 2 f 2 d e c • Problem: color this interference graph, where each color represents a different physical register • Solution: simplify graph: recursively remove easiest nodes (lowest degree) Jim Hogg - UW - CSE - P501
Interference Graph - Remove b b f a Node Degree a 2 b 2 c 2 d 3 e 2 f 2 d e c Removed: b Jim Hogg - UW - CSE - P501
Interference Graph - Remove a b f a Node Degree a 2 b 2 c 1 d 3 e 1 f 1 d e c Removed: b a Jim Hogg - UW - CSE - P501
Interference Graph - Remove c b f a Node Degree a 2 b 2 c 1 d 2 e 1 f 1 d e c Removed: b a c Jim Hogg - UW - CSE - P501
Interference Graph - Remove e b f a Node Degree a 2 b 2 c 1 d 2 e 1 f 1 d e c Removed: b a c e Jim Hogg - UW - CSE - P501
Interference Graph - Color! b f a d e c Removed: b a c e Jim Hogg - UW - CSE - P501
Interference Graph - Add e b f a d e c Removed: b a c Jim Hogg - UW - CSE - P501
Interference Graph - Add c b f a d e c Removed: b a Jim Hogg - UW - CSE - P501
Interference Graph - Add a b f a d e c Removed: b Jim Hogg - UW - CSE - P501
Interference Graph - Add b b f a d e c • 3 colors are enough • eg: red = R1, green = R2, blue = R3 • If less registers available - need to spill Jim Hogg - UW - CSE - P501
Live Ranges: More Realistic Example Register Interval • varp = ... • vw = [varp + 0] • v2 = 2 • vx = [varp + @x] • vy = [varp + @y] • vz = [varp + @z] • vw = vw * v2 • vw = vw * vx • vw = vw * vy • vw = vw * vz • [varp + 0] = vw varp 1..11 vw 2..7 vw 7..8 vw 8..9 vw 9..10 vw 10..11 v2 3..7 vx 4..8 vy 5..9 vz 6..10 In this example, virtual registers are re-def'd. Eg: vw participates in 5 live ranges Eg: vw [7..8] interferes with vx [4..8] but not vw [9..10] But no control flow!
Build Live Ranges, 1 • Use dataflow information to build interference graph • Nodes = live ranges • Add an edge for each pair of live ranges that overlap • Beware copy instructions: • rx = ry does not create interference between rxand ry : • can use same register if the ranges do not otherwise interfere Jim Hogg - UW - CSE - P501
Build Live Ranges, 2 • Construct the interference graph • Find live ranges – SSA • Build SSA form of IR • Live range initially includes single SSA name' • At a -function, form union of live ranges • Either rewrite code to use live range names or keep a mapping between SSA names and live-range names Jim Hogg - UW - CSE - P501
Simplify & Color Algorithm Simplify while graph cannot be colored remove easiest node (fewest neighbors = lowest degree) if degree of easiest node > k insert spill code remember that node into stack recalculate degree of each remaining node Color color the sole remaining node while stack is non-empty pop node from stack color newly expanded graph Jim Hogg - UW - CSE - P501
Coalescing Live Ranges • Idea: if two live ranges are connected by a copy operation (rx = ry) do not otherwise interfere, then coalesce • Rewrite all references to rx to use ry • Remove the copy instruction • Then fix up interference graph Jim Hogg - UW - CSE - P501
Advantages of Coalescing? • Makes the code smaller, faster (no copy operation) • Shrinks number of live ranges • Reduces the degree of any live range that interfered with both live ranges rx and ry • But: coalescing two live ranges can prevent coalescing of others, so ordering matters • coalesce most frequently executed ranges first (eg: inner loops) • Can have a substantial payoff – do it! Jim Hogg - UW - CSE - P501
Overall Structure More Coalescing Possible Find live ranges Build int. graph Coalesce Spill Costs Find Coloring No Spills Spills Insert Spills Jim Hogg - UW - CSE - P501
Real-Life Complications • Need to deal with irregularities in the register set • Some operations require dedicated registers • Eg: idivin x86 • Eg: split address/data registers in M68k • Register conventions like function results • Eg: use of registers across calls (return in EAX) • Different register classes • Eg: integer, floating-point, SIMD • Overlapping registers • Eg: x86 AL, AH, AX, EAX, RAX • Model by precoloring nodes, adding constraints in the graph, etc. Jim Hogg - UW - CSE - P501
Graph Representation • Recall: • Register allocation is one of the most important optimizations • Finding optimum solution for typical interference graph will take a million years • So we use heuristics to find a good solution a million times each day • Interference graph representation drives the time & space requirements for the allocator (may even dominate the entire compiler) • Not unknown to have ~5k nodes and ~1M edges Jim Hogg - UW - CSE - P501