
Compiler-Based Register Name Adjustment for Low-Power Embedded Processors



Presentation Transcript


  1. Compiler-Based Register Name Adjustment for Low-Power Embedded Processors. Discussion by Garo Bournoutian

  2. Introduction
     • Problem: high power consumption caused by bit transitions on the instruction bus
     • Current compilers allocate registers to minimize spill/fill code, without regard to bit transitions
     • Importance: growing number of mobile, battery-powered devices
     • The solution allows for longer battery life, larger die sizes
     • Approach: rearrange and rename registers in the code to minimize bit transitions

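  3. What is a bit transition?
     A bit transition is a bit position whose value changes between two consecutive instruction words on the bus.

       Assembly code        Instruction word (register fields)
       add r3, r2, r4       … 0011 0010 0100 …
       sub r6, r3, r5       … 0110 0011 0101 …
       sub r3, r2, r6       … 0011 0010 0110 …
       mul r4, r4, r5       … 0100 0100 0101 …

       Transitions per field:   7    4    5
       Total transitions: 16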
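The per-field counts on this slide can be reproduced with a few lines of Python. This is only a sketch: the helper names `hamming` and `transitions_per_field` are made up for illustration, and only the three 4-bit register fields are modeled (the opcode bits, which the slide elides, are ignored).

```python
def hamming(a: int, b: int) -> int:
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")


def transitions_per_field(words):
    """Count bit transitions between consecutive instructions, per field.

    words: list of equal-length tuples of register numbers,
           one tuple per instruction (dest, src1, src2).
    """
    fields = list(zip(*words))  # one column of register numbers per field
    return [
        sum(hamming(x, y) for x, y in zip(col, col[1:]))
        for col in fields
    ]


# Register fields of the four instructions on the slide (opcode bits omitted).
code = [
    (3, 2, 4),  # add r3, r2, r4
    (6, 3, 5),  # sub r6, r3, r5
    (3, 2, 6),  # sub r3, r2, r6
    (4, 4, 5),  # mul r4, r4, r5
]

per_field = transitions_per_field(code)
print(per_field, sum(per_field))  # [7, 4, 5] 16
```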

  4. An Example
     • The original code had a total of 16 transitions:
         add r3, r2, r4
         sub r6, r3, r5
         sub r3, r2, r6
         mul r4, r4, r5
     • The optimized code has a total of 10 transitions:
         add r6, r2, r4
         sub r7, r6, r5
         sub r6, r2, r7
         mul r4, r4, r5
     • Just renaming r3 to r6 and r6 to r7 yields a 37.5% reduction in bit transitions (16 to 10).
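As a quick check of the numbers in this example, the renaming can be applied mechanically and the transitions recounted. The sketch below uses hypothetical helpers (`count_transitions`, `rename`) and models only the register fields. It does not check liveness, so it assumes, as the slide implicitly does, that r7 is free and that the renaming is applied consistently.

```python
def count_transitions(words):
    """Total bit transitions between consecutive instructions, over all fields."""
    total = 0
    for prev, cur in zip(words, words[1:]):
        for a, b in zip(prev, cur):
            total += bin(a ^ b).count("1")
    return total


def rename(words, mapping):
    """Apply a register renaming simultaneously to every operand."""
    return [tuple(mapping.get(r, r) for r in w) for w in words]


original = [
    (3, 2, 4),  # add r3, r2, r4
    (6, 3, 5),  # sub r6, r3, r5
    (3, 2, 6),  # sub r3, r2, r6
    (4, 4, 5),  # mul r4, r4, r5
]

# The renaming from the slide: r3 -> r6 and r6 -> r7.
optimized = rename(original, {3: 6, 6: 7})

before = count_transitions(original)
after = count_transitions(optimized)
reduction = (before - after) / before
print(before, after, f"{reduction:.1%}")  # 16 10 37.5%
```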

  5. Formulation
     • Code in basic blocks must be mapped to numerical structures:
         ld  r5, (r1)0
         add r3, r2, r5
         add r4, r3, r2
         mul r3, r4, r3
         st  r3, (r7)10

  6. Heuristic Solution
     • Solving this problem for multiple basic blocks and literals is NP-Complete (Traveling Salesman Problem)
     • An effective, efficient heuristic solution for RNA requires two steps:
         • Register PerturBation (RPB): maximizes the distribution skew of register pairs
         • Register PermuTation (RPT): uses the frequencies of register pairs to minimize Hamming distance

  7. Register PerturBation
     • Commutativity Transformation
         • Applies to add, mul, and, or operations
         • No side effects on code performance
         • Linear time complexity
     • Dead Register Reassignment
           Before             After
           r1 ← r2, r3        r1 ← r2, r3
           r4 ← r1, r2        r2 ← r1, r2
           r2 ← r3, r4        r2 ← r3, r2
         • r2 is dead after the second instruction reads it, so it can take the place of r4
         • Linear time complexity
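The commutativity transformation can be sketched as a single linear pass over a basic block. Note the assumption: the slides say RPB aims to skew the register-pair distribution, whereas this sketch uses a simpler stand-in criterion and swaps the two source operands of a commutative instruction whenever that lowers the Hamming distance to the previous instruction's source fields. The instruction format `(opcode, dest, src1, src2)` and the function name `swap_commutative` are illustrative, not taken from the paper.

```python
# Opcodes listed on the slide as commutative.
COMMUTATIVE = {"add", "mul", "and", "or"}


def hamming(a: int, b: int) -> int:
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")


def swap_commutative(block):
    """Greedy one-pass operand swap for commutative instructions.

    block: list of (opcode, dest, src1, src2) with integer register numbers.
    Swapping src1/src2 of a commutative op does not change the result,
    so program semantics and instruction count are unaffected.
    """
    out = []
    for op, dest, s1, s2 in block:
        if out and op in COMMUTATIVE:
            _, _, p1, p2 = out[-1]  # source fields of the previous instruction
            keep = hamming(p1, s1) + hamming(p2, s2)
            swap = hamming(p1, s2) + hamming(p2, s1)
            if swap < keep:
                s1, s2 = s2, s1
        out.append((op, dest, s1, s2))
    return out


block = [
    ("sub", 6, 3, 5),  # sub r6, r3, r5
    ("add", 1, 5, 3),  # add r1, r5, r3  (commutative: may be swapped)
    ("sub", 2, 1, 4),  # sub r2, r1, r4  (not commutative: left alone)
]
result = swap_commutative(block)
print(result[1])  # ('add', 1, 3, 5)
```

Each instruction is visited once and compared only with its predecessor, which matches the linear time complexity stated on the slide.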

  8. Register PermuTation
     • Captures the utilization frequency of register/literal pairs by means of a Register Histogram Graph (RHG)
         • Directed graph
         • Nodes = registers/literals
         • Edge between two nodes whose registers are consecutive in the code
     • Iterative search finds the optimal encoding between each pair
     • Complexity of O(|E|·|R|^2), where R is the set of all registers and |E| is the number of edges
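To make the RHG idea concrete, the sketch below builds a weighted edge list from consecutive same-field register pairs and then searches for a cheaper register encoding. The search shown is a plain pairwise-swap local search chosen for brevity; it is an assumption, not the iterative procedure from the paper, and it may stop at a local optimum. All names (`build_rhg`, `cost`, `permute`) are hypothetical.

```python
from collections import Counter
from itertools import combinations


def hamming(a: int, b: int) -> int:
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")


def build_rhg(words):
    """Edge (a, b) -> how often register a is followed by b in the same field."""
    rhg = Counter()
    for prev, cur in zip(words, words[1:]):
        for a, b in zip(prev, cur):
            if a != b:  # identical registers never cause a transition
                rhg[(a, b)] += 1
    return rhg


def cost(rhg, enc):
    """Total bit transitions if register r is encoded as the number enc[r]."""
    return sum(w * hamming(enc[a], enc[b]) for (a, b), w in rhg.items())


def permute(rhg, registers):
    """Local search: swap the encodings of two registers while that helps."""
    enc = {r: r for r in registers}  # start from the identity encoding
    best = cost(rhg, enc)
    improved = True
    while improved:
        improved = False
        for a, b in combinations(registers, 2):
            enc[a], enc[b] = enc[b], enc[a]
            c = cost(rhg, enc)
            if c < best:
                best, improved = c, True
            else:
                enc[a], enc[b] = enc[b], enc[a]  # undo the swap
    return enc, best


code = [(3, 2, 4), (6, 3, 5), (3, 2, 6), (4, 4, 5)]  # example from slide 3
registers = list(range(8))                            # assumed 8-register file
rhg = build_rhg(code)
baseline = cost(rhg, {r: r for r in registers})       # 16, as on slide 3
enc, best = permute(rhg, registers)
print(baseline, best)
```

In this sketch, one sweep over all register pairs evaluates O(|R|^2) candidate swaps at O(|E|) cost each. Because the result is a permutation of register names, it would have to be applied consistently to every instruction that uses those registers.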

  9. Application of Heuristic
     • Applied primarily to major application loops
     • Special care is taken to preserve def-use chains between loops
     • Adds a trivial number of instructions at “hot spots”
     • Profile information may be used to prioritize the order in which basic blocks are visited
     • Can be used within a compilation system or as a stand-alone tool operating on binary code

  10. Experimental Results (How they supported their findings)
     • Used a modified version of SimpleScalar
     • Built a Control Flow Graph (CFG) for each “hot spot”
     • Generated basic block frequencies
     • Encapsulated RPB and RPT in a stand-alone module
     • Fed the generated CFG into this module
     • Ran the module on six different benchmarks
     • RPT improvement as high as 25%
     • RPB improvement as high as 44%

  11. Conclusions
     • Presented a compiler-driven, power-aware register name adjustment (RNA) algorithm
     • Formally defined as NP-Complete
     • Two efficient heuristics for attacking the problem:
         • RPB – commutativity and dead register reassignment
         • RPT – register pair frequencies and remappings
     • Significant power improvements resulting from compiler-based optimization (no additional hardware support needed)
     • Independent of ISA
     • Easily integrated within any compilation framework
