
A Global Progressive Register Allocator

David Ryan Koes, Seth Copen Goldstein. Carnegie Mellon University. {dkoes,seth}@cs.cmu.edu


Presentation Transcript


  1. A Global Progressive Register Allocator. David Ryan Koes, Seth Copen Goldstein. Carnegie Mellon University. {dkoes,seth}@cs.cmu.edu

  2. Register Allocation Problem
     • unbounded number of program variables
     • limited number of processor registers (eax, ebx, ecx, edx, esi, edi, esp, ebp) + slow memory
     • example code: … v = 1; w = v + 3; x = w + v; u = v; t = u + x; print(x); print(w); print(t); print(u); …
     • the register allocator must also handle: spill code optimization, register preferences, rematerialization, live range splitting, memory operands
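To make the pressure on registers in this example concrete, the following is a minimal sketch that computes a live range for each variable in the slide's straight-line code and counts how many variables are live at once. The tuple encoding of instructions and the function names `live_ranges` and `max_pressure` are assumptions made for illustration; they do not come from the presentation, and the sketch assumes each variable is defined exactly once.

```python
def live_ranges(instrs):
    """Map each variable to (definition index, last-use index).

    `instrs` is a list of (dest, uses) pairs; dest is None when the
    instruction defines nothing (e.g. print). Assumes one definition
    per variable and straight-line code.
    """
    ranges = {}
    for i, (dest, uses) in enumerate(instrs):
        for var in uses:
            start, _ = ranges[var]
            ranges[var] = (start, i)      # extend the range to this use
        if dest is not None:
            ranges[dest] = (i, i)         # range starts at the definition
    return ranges


def max_pressure(ranges, n_instrs):
    """Largest number of variables live between two adjacent instructions."""
    return max(
        sum(1 for start, end in ranges.values() if start <= i < end)
        for i in range(n_instrs)
    )


# The example from the slide, one tuple per statement.
program = [
    ("v", []),          # v = 1
    ("w", ["v"]),       # w = v + 3
    ("x", ["w", "v"]),  # x = w + v
    ("u", ["v"]),       # u = v
    ("t", ["u", "x"]),  # t = u + x
    (None, ["x"]),      # print(x)
    (None, ["w"]),      # print(w)
    (None, ["t"]),      # print(t)
    (None, ["u"]),      # print(u)
]

ranges = live_ranges(program)
pressure = max_pressure(ranges, len(program))
```

Under this encoding, four variables (w, x, u, t) are live at the same time right after `t = u + x`, so the values compete for the few registers available once some are reserved.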

  3. A More Principled Register Allocator
     • fully utilize machine description
     • explicit and expressive model of costs of allocation for given architecture
     • optimal solutions
     [Diagram labels: reg alloc, machine description]

  4. Multi-commodity Network Flow: An Expressive Model
     • Given network (directed graph) with
       • cost and capacity on each edge
       • sources & sinks for multiple commodities
     • Find lowest cost flow of commodities
     • NP-complete for integer flows
     [Example figure: commodities a and b; edges have unit capacity; edge labels 1 and 0]
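One common way to write the problem described on this slide as an integer program is sketched below. The notation is an assumption, not taken from the presentation: $N$ and $E$ are the nodes and edges, $K$ the commodities, $x^{k}_{ij}$ the flow of commodity $k$ on edge $(i,j)$, $c^{k}_{ij}$ its cost, $u_{ij}$ the edge capacity, and $b^{k}_{i}$ is $+1$ at the source of $k$, $-1$ at its sink, and $0$ elsewhere.

```latex
\begin{align*}
\min_{x} \quad & \sum_{k \in K} \sum_{(i,j) \in E} c^{k}_{ij}\, x^{k}_{ij} \\
\text{s.t.} \quad
  & \sum_{j:(i,j) \in E} x^{k}_{ij} - \sum_{j:(j,i) \in E} x^{k}_{ji} = b^{k}_{i}
      && \forall i \in N,\ k \in K \\
  & \sum_{k \in K} x^{k}_{ij} \le u_{ij}
      && \forall (i,j) \in E \\
  & x^{k}_{ij} \in \mathbb{Z}_{\ge 0}
      && \forall (i,j) \in E,\ k \in K
\end{align*}
```

The second constraint is the only one that couples the commodities. Without it, each commodity would reduce to an independent shortest-path problem.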

  5. Register Allocation as a MCNF
     • Variables → Commodities
     • Variable Definition → Source
     • Variable Last Use → Sink
     • Nodes → Allocation Classes (Reg/Mem/Const)
     • Register Limits → Node Capacities
     • Spill Costs → Edge Costs
     • Allocation → Flow
     Also need anti-variables to model persistent memory
     [Figure: network for variable a with nodes r0, r1, and mem; edge cost labels 1 and 3]

  6. Example
     Source Code:
       int example(int a, int b) { int d = 1; int c = a - b; return c+d; }
     Pre-alloc Assembly:
       MOVE 1 -> d
       SUB a,b -> c
       ADD c,d -> c
       MOVE c -> r0
     [Figure annotations: load cost, insn pref cost, mem access cost]

  7. Control Flow
     • MCNF can only represent straight-line code
     • need to link together networks from basic blocks
     New nodes to handle block entry/exit constraints
     [Figure: Split, Normal, and Merge nodes; labels a: %eax and a: mem]

  8. A More Principled Register Allocator
     • fully utilize machine description
     • explicit and expressive model of costs of allocation for given architecture: Global MCNF
     • optimal solutions: NP-hard, so use progressive solution technique
     • Technique: Lagrangian relaxation directed allocators
     [Diagram labels: reg alloc, machine description; plot axes: Allocation Quality, Compile Time]

  9. Solution Procedure
     • Compute Lagrangian prices using iterative subgradient optimization
       • guaranteed to converge to “optimal” prices
       • for linear relaxation of the problem
     • Prices used by allocator to find solution
       • solution improves as prices converge
       • two allocators
         • iterative heuristic allocator
         • simultaneous heuristic allocator
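The price computation named on this slide can be illustrated with a textbook projected subgradient step. This is a minimal sketch under stated assumptions, not the authors' implementation: the function name `update_prices`, the dictionary encoding, and the fixed step size are all hypothetical, and the slides do not say which step-size rule is used. Each capacitated element (a register at a program point, say) carries a non-negative price. The price rises when the current flows oversubscribe the element and falls toward zero when capacity is left unused.

```python
def update_prices(prices, usage, capacity, step):
    """One projected subgradient step on the Lagrangian prices.

    prices:   element -> current price (missing means 0.0)
    usage:    element -> number of variables routed through it
    capacity: element -> how many variables it can hold
    step:     positive step size (assumed given; the rule is not from the slides)
    """
    new_prices = {}
    for elem, cap in capacity.items():
        violation = usage.get(elem, 0) - cap   # subgradient component
        candidate = prices.get(elem, 0.0) + step * violation
        new_prices[elem] = max(0.0, candidate)  # prices stay non-negative
    return new_prices


# Two variables both want r0 while r1 is idle: r0 becomes more expensive,
# r1 cheaper, nudging the next round of per-variable shortest paths apart.
prices = update_prices(
    prices={"r0": 0.0, "r1": 1.0},
    usage={"r0": 2, "r1": 0},
    capacity={"r0": 1, "r1": 1},
    step=0.5,
)
```

In a full solver this step would alternate with re-solving one shortest-path problem per variable using edge costs augmented by the current prices, typically with a step size that shrinks over the iterations.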

  10. Solution Procedure
     • Advantages
       • iterative nature → progressive
       • Lagrangian relaxation theory provides means for computing a good lower bound
       • Can compute optimality bound
     • Disadvantages
       • No guarantee of finding optimal solution
       • Optimality bound poor if integrality gap large
     [Callout: 99% of the time integrality gap = 0]
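The lower bound and the optimality bound mentioned here follow from standard Lagrangian duality. The derivation below is a sketch in assumed notation, not taken from the slides: $x^{k}_{ij}$ is the flow of commodity $k$ on edge $(i,j)$, $c^{k}_{ij}$ its cost, $u_{ij}$ the edge capacity, $w_{ij} \ge 0$ the price on that capacity, $X$ the set of integer flows that satisfy flow conservation for each commodity separately, $z^{*}$ the optimal allocation cost, and $\bar{z}$ the cost of the best feasible allocation found so far.

```latex
\begin{align*}
L(w) &= \min_{x \in X} \; \sum_{k \in K} \sum_{(i,j) \in E} c^{k}_{ij}\, x^{k}_{ij}
        + \sum_{(i,j) \in E} w_{ij} \Bigl( \sum_{k \in K} x^{k}_{ij} - u_{ij} \Bigr),
        \qquad w \ge 0 \\
L(w) &\le z^{*} \le \bar{z} \\
0 &\le \bar{z} - z^{*} \le \bar{z} - L(w)
\end{align*}
```

The last line is the optimality bound: the allocator's solution is provably within $\bar{z} - L(w)$ of optimal, and it is proven optimal when the two values meet. A nonzero integrality gap keeps even the best $L(w)$ strictly below $z^{*}$, which is why the bound is poor when the gap is large.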

  11. Iterative Heuristic Allocator
     • Edges to/from memory cost 3
     • Allocation order: a, b, c, d
     • Cost: a = 0, b = 4, c = 0, d = -2
     • Total: 2
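The idea of allocating variables one at a time can be sketched as a shortest-path search per variable over a layered network, with each allocated variable consuming register capacity that later variables can no longer use. This is a minimal sketch under stated assumptions, not the authors' allocator: the names `edge_cost`, `cheapest_path`, and `iterative_allocate`, the `(start, end)` live-range encoding, and the register-to-register move cost of 1 are hypothetical. Only the memory edge cost of 3 comes from the slide. Lagrangian prices and instruction preference costs (which can be negative) are left out.

```python
MEM, SOURCE, SINK = "mem", "source", "sink"
MEM_EDGE_COST = 3   # from the slide: edges to/from memory cost 3
MOVE_COST = 1       # assumed register-to-register move cost


def edge_cost(src, dst):
    """Cost of moving a value from location src to location dst."""
    if src == dst:
        return 0
    if MEM in (src, dst):
        return MEM_EDGE_COST          # store to or load from memory
    if src == SOURCE or dst == SINK:
        return 0                      # defined in / last used from a register
    return MOVE_COST


def cheapest_path(start, end, registers, occupied):
    """Shortest path for one variable live over steps start..end (inclusive)."""
    best = {SOURCE: (0, [])}          # location -> (cost so far, path so far)
    for step in range(start, end + 1):
        free = [r for r in registers if (step, r) not in occupied] + [MEM]
        best = {
            loc: min(
                ((cost + edge_cost(prev, loc), path + [loc])
                 for prev, (cost, path) in best.items()),
                key=lambda cp: cp[0],
            )
            for loc in free
        }
    return min(
        ((cost + edge_cost(loc, SINK), path) for loc, (cost, path) in best.items()),
        key=lambda cp: cp[0],
    )


def iterative_allocate(live_ranges, registers):
    """Allocate variables in dict order; earlier variables block later ones."""
    occupied, result, total = set(), {}, 0
    for var, (start, end) in live_ranges.items():
        cost, path = cheapest_path(start, end, registers, occupied)
        for step, loc in zip(range(start, end + 1), path):
            if loc != MEM:
                occupied.add((step, loc))   # memory is treated as unlimited
        result[var] = (cost, path)
        total += cost
    return total, result


# One register, two overlapping variables: a takes r0, b pays for memory.
total, result = iterative_allocate({"a": (0, 2), "b": (1, 3)}, ["r0"])
```

Because each variable is routed greedily against whatever capacity remains, the result depends on the allocation order. In this run `a` is free and `b` pays 6 in memory edges, even though other orders or splits might be cheaper overall.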

  12. Simultaneous Heuristic Allocator
     • Edges to/from memory cost 3
     • Current cost values shown: -1, -3, -2

  13. Evaluation
     • Implemented in gcc 3.4.3 targeting x86
     • Optimize for code size
       • perfect static evaluation
       • important metric in its own right
     • MediaBench, MiBench, Spec95, Spec2000
       • over 10,000 functions

  14. Progressiveness [Plot legend: default allocator: 1121; graph allocator: 1422; CPLEX]

  15. Progressiveness [Plot legend: graph allocator; default allocator; CPLEX]

  16. Code Size: Progressive!

  17. Optimality [Plot labels: proven maximum distance from optimal; proven optimality]

  18. Compile Time Slowdown :-( 10x slower

  19. A More Principled Register Allocator
     • fully utilize machine description
     • explicit and expressive model of costs of allocation for given architecture: Global MCNF
     • approach optimality using progressive solution technique: Lagrangian directed allocators
     [Diagram labels: reg alloc, machine description]

  20. Questions?

  21. Accuracy of the Model: the Global MCNF model correctly predicts costs of register allocation within 2% for 71% of functions compiled

  22. Code Size

  23. Compile Time: Asymptotic Complexity. One iteration: O(nv)

  24. Code Performance

  25. Compile Time Slowdown :-( 10x slower
