Compiler-Based Register Name Adjustment for Low-Power Embedded Processors
Discussion by Garo Bournoutian
Introduction
• Problem: High power consumption due to bit transitions on the instruction bus
• Current compilers allocate registers to minimize the amount of spill/fill code, without regard to bit transitions
• Importance: Growing number of mobile, battery-powered devices
• Solution allows for longer battery life and larger die sizes
• Approach: Rearrange and rename registers in the code to minimize bit transitions
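What is a bit transition?
• A bit transition occurs whenever a bit on the instruction bus flips between two consecutive instruction words; for each register field, the number of transitions is the Hamming distance between consecutive encodings.

Assembly code       Instruction word (register fields)
add r3, r2, r4      … 0011 0010 0100 …
sub r6, r3, r5      … 0110 0011 0101 …
sub r3, r2, r6      … 0011 0010 0110 …
mul r4, r4, r5      … 0100 0100 0101 …

Transitions per field (dst, src1, src2): 7, 4, 5
Total transitions: 16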
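The transition count above can be reproduced with a short Python sketch. It assumes 4-bit register fields and, like the slide, ignores the opcode bits elided as "…"; the function names are illustrative, not taken from the paper.

```python
def hamming(a: int, b: int) -> int:
    """Number of bit positions that differ between a and b."""
    return bin(a ^ b).count("1")

def transitions_per_field(words):
    """words: list of (dst, src1, src2) register numbers, one tuple per instruction.
    Returns the transitions accumulated in each field across consecutive words."""
    totals = [0, 0, 0]
    for prev, cur in zip(words, words[1:]):
        for f in range(3):
            totals[f] += hamming(prev[f], cur[f])
    return totals

# The four instructions from the slide, as register numbers
code = [
    (3, 2, 4),  # add r3, r2, r4
    (6, 3, 5),  # sub r6, r3, r5
    (3, 2, 6),  # sub r3, r2, r6
    (4, 4, 5),  # mul r4, r4, r5
]

per_field = transitions_per_field(code)
print(per_field, sum(per_field))  # [7, 4, 5] 16
```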
An Example
• The original code has a total of 16 transitions:
  add r3, r2, r4
  sub r6, r3, r5
  sub r3, r2, r6
  mul r4, r4, r5
• The optimized code has a total of 10 transitions:
  add r6, r2, r4
  sub r7, r6, r5
  sub r6, r2, r7
  mul r4, r4, r5
• Simply renaming r3 to r6 and r6 to r7 reduces bit transitions by 37.5% (from 16 to 10).
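A minimal sketch of the renaming step, under the same assumptions (4-bit register fields, opcode bits ignored). The rename is only legal here because r7 is not otherwise used in this fragment; `rename` and `total_transitions` are hypothetical helper names.

```python
def hamming(a, b):
    return bin(a ^ b).count("1")

def total_transitions(words):
    """Sum of per-field Hamming distances between consecutive (dst, src1, src2) words."""
    return sum(hamming(p, c) for prev, cur in zip(words, words[1:]) for p, c in zip(prev, cur))

def rename(words, mapping):
    """Apply a register renaming simultaneously to every field; unmapped registers stay."""
    return [tuple(mapping.get(r, r) for r in w) for w in words]

original = [(3, 2, 4), (6, 3, 5), (3, 2, 6), (4, 4, 5)]
renamed = rename(original, {3: 6, 6: 7})
print(renamed)  # [(6, 2, 4), (7, 6, 5), (6, 2, 7), (4, 4, 5)]

before, after = total_transitions(original), total_transitions(renamed)
print(before, after, f"{(before - after) / before:.1%}")  # 16 10 37.5%
```

Because the mapping is applied in one pass per field, r3 becomes r6 and the original r6 becomes r7 without the two renames interfering.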
Formulation
• The code in each basic block must be mapped to numerical structures, for example:
  ld r5, (r1)0
  add r3, r2, r5
  add r4, r3, r2
  mul r3, r4, r3
  st r3, (r7)10
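One way to picture this mapping, as a hypothetical sketch: each instruction becomes a tuple of field values, with register numbers as plain integers and literals tagged so they are not confused with registers. The operand syntax handled here (plain `rN` registers and `(rN)offset` memory operands) is an assumption based only on the five instructions shown.

```python
import re

# Assumed operand forms: a register "r5", or a memory operand "(r1)0"
# meaning base register r1 plus literal offset 0.
REG = re.compile(r"^r(\d+)$")
MEM = re.compile(r"^\(r(\d+)\)(-?\d+)$")

def parse(line):
    """Map one instruction to (opcode, fields); registers become ints and
    literals become ("lit", value) so they stay distinguishable."""
    op, _, rest = line.partition(" ")
    fields = []
    for tok in (t.strip() for t in rest.split(",")):
        if m := REG.match(tok):
            fields.append(int(m.group(1)))
        elif m := MEM.match(tok):
            fields.append(int(m.group(1)))            # base register
            fields.append(("lit", int(m.group(2))))   # literal offset
        else:
            raise ValueError(f"unrecognised operand: {tok!r}")
    return op, tuple(fields)

block = """ld r5, (r1)0
add r3, r2, r5
add r4, r3, r2
mul r3, r4, r3
st r3, (r7)10"""

for op, fields in map(parse, block.splitlines()):
    print(op, fields)
```

With this layout every instruction in the block yields three fields, so consecutive instructions can be compared field by field.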
Heuristic Solution
• Solving this problem for multiple basic blocks and literals is NP-Complete (it is related to the Traveling Salesman Problem)
• An effective, efficient heuristic for RNA (register name adjustment) requires two steps:
  • Register PerturBation (RPB): maximizes the distribution skew of register pairs
  • Register PermuTation (RPT): uses the frequencies of register pairs to minimize Hamming distance
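Register PerturBation
• Commutativity Transformation
  • Applies to add, mul, and, or operations (their source operands can be swapped)
  • No impact on code performance
  • Linear time complexity
• Dead Register Reassignment
  Before:            After:
  r1 ← r2, r3        r1 ← r2, r3
  r4 ← r1, r2        r2 ← r1, r2
  r2 ← r3, r4        r2 ← r3, r2
  • r2 is dead after it is read by the second instruction, so the value held in r4 can be placed in r2 instead
  • Linear time complexity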
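Both perturbations can be sketched in Python for a single straight-line basic block. This is an illustration under stated assumptions, not the paper's implementation: instructions are `(opcode, dst, src1, src2)` tuples, the commutativity pass greedily compares each instruction only with its predecessor, and the dead-register legality check works on one block with a caller-supplied live-out set. The opcode `add` in the example is hypothetical, since the figure omits opcodes.

```python
# Assumed instruction format: (opcode, dst, src1, src2) with integer registers.
COMMUTATIVE = {"add", "mul", "and", "or"}

def hamming(a, b):
    return bin(a ^ b).count("1")

def commute(block):
    """Commutativity transformation: swap the sources of a commutative op when
    that lowers transitions against the previous instruction's sources."""
    out = list(block[:1])
    for op, d, s1, s2 in block[1:]:
        _, _, p1, p2 = out[-1]
        keep = hamming(p1, s1) + hamming(p2, s2)
        swap = hamming(p1, s2) + hamming(p2, s1)
        if op in COMMUTATIVE and swap < keep:
            s1, s2 = s2, s1
        out.append((op, d, s1, s2))
    return out

def next_access(block, start, reg):
    """First (index, kind) at or after `start` touching reg. Reads are checked
    first because an instruction reads its sources before writing its dst."""
    for k in range(start, len(block)):
        if reg in block[k][2:]:
            return k, "read"
        if block[k][1] == reg:
            return k, "write"
    return None, None

def live_range_end(block, i, reg, live_out):
    """Index of the last read of the value block[i] writes into reg,
    or None if that value may still be live when the block ends."""
    end = i
    for k in range(i + 1, len(block)):
        if reg in block[k][2:]:
            end = k
        if block[k][1] == reg:
            return end
    return None if reg in live_out else end

def reassign(block, i, new, live_out=frozenset()):
    """Dead register reassignment: place the value defined at block[i] in
    register `new` instead. Returns the new block, or None if illegal."""
    old = block[i][1]
    end = live_range_end(block, i, old, live_out)
    if new == old or end is None:
        return None
    k, kind = next_access(block, i + 1, new)
    # `new` must be dead after i (next access is a write, or none and not
    # live-out) and must not be overwritten before the value's last read.
    if kind == "read" or (kind is None and new in live_out):
        return None
    if kind == "write" and k < end:
        return None
    out = list(block)
    op, _, a, b = out[i]
    out[i] = (op, new, a, b)
    for j in range(i + 1, end + 1):
        op, d, a, b = out[j]
        out[j] = (op, d, new if a == old else a, new if b == old else b)
    return out

# The figure's example (opcodes hypothetical)
example = [("add", 1, 2, 3), ("add", 4, 1, 2), ("add", 2, 3, 4)]
print(reassign(example, 1, 2))
# [('add', 1, 2, 3), ('add', 2, 1, 2), ('add', 2, 3, 2)]
```

`reassign(example, 1, 2)` reproduces the After column, while `reassign(example, 1, 3)` is rejected because r3 is still read by the third instruction. The commutativity pass is a single linear scan; this simple `reassign` rescans the block for each candidate, so it is not linear-time as written.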
Register PermuTation
• Captures the utilization frequency of register/literal pairs by means of a Register Histogram Graph (RHG)
  • Directed graph
  • Nodes = registers/literals
  • An edge connects two nodes whose registers appear consecutively in the code
• An iterative search finds the optimal encoding for each pair
• Complexity of O(|E|·|R|²), where R is the set of all registers and E is the set of edges
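A sketch of the RHG and a permutation search, with clearly stated assumptions: edges here connect registers in the same operand field of consecutive instructions, literals are left out, and the search is a plain pairwise-swap local search swapped in for the paper's iterative search. It happens to fit the stated bound, since one pass tries O(|R|²) swaps and scores each in O(|E|).

```python
from collections import Counter
from itertools import combinations

def hamming(a, b):
    return bin(a ^ b).count("1")

def build_rhg(words):
    """RHG sketch: edge (u, v) counts how often register v follows register u
    in the same operand field of consecutive instructions."""
    rhg = Counter()
    for prev, cur in zip(words, words[1:]):
        for u, v in zip(prev, cur):
            if u != v:  # identical registers never cause transitions
                rhg[(u, v)] += 1
    return rhg

def cost(rhg, code):
    """Total bit transitions when register r is encoded as code[r]."""
    return sum(w * hamming(code[u], code[v]) for (u, v), w in rhg.items())

def permute(rhg, registers):
    """Pairwise-swap local search over encodings (a stand-in, not the paper's search).
    Each pass tries O(|R|^2) swaps and scores each one in O(|E|)."""
    registers = list(registers)
    code = {r: r for r in registers}  # start from the identity encoding
    best = cost(rhg, code)
    improved = True
    while improved:
        improved = False
        for a, b in combinations(registers, 2):
            code[a], code[b] = code[b], code[a]
            c = cost(rhg, code)
            if c < best:
                best, improved = c, True
            else:
                code[a], code[b] = code[b], code[a]  # undo the swap
    return code, best

words = [(3, 2, 4), (6, 3, 5), (3, 2, 6), (4, 4, 5)]
rhg = build_rhg(words)
identity = {r: r for r in range(16)}
slide = {**identity, 3: 6, 6: 7, 7: 3}  # the renaming from the example slide
print(cost(rhg, identity), cost(rhg, slide))  # 16 10
code, best = permute(rhg, range(16))
print(best)
```

The slide's renaming corresponds to giving r3 code 6 and r6 code 7, with r7 taking the freed code 3 so the encoding stays a permutation. The local search only guarantees a cost no worse than its starting point and may stop in a local minimum.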
Application of Heuristic
• Applied primarily to major application loops
• Special care taken to preserve def-use chains between loops
• Adds only a trivial number of instructions at “hot spots”
• Profile information may be used to prioritize the order in which basic blocks are visited
• Can be used within a compilation system or as a stand-alone tool operating on binary code
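To illustrate why hot spots matter, a hypothetical sketch: weight each basic block's static transition count by its profiled execution count, then visit blocks hottest-first. The block names, counts, and helper functions below are invented for illustration only.

```python
def hamming(a, b):
    return bin(a ^ b).count("1")

def block_transitions(words):
    """Static transitions inside one block of (dst, src1, src2) register tuples."""
    return sum(hamming(p, c) for prev, cur in zip(words, words[1:]) for p, c in zip(prev, cur))

def dynamic_transitions(blocks):
    """blocks: list of (name, exec_count, words); weights each block by its profile count."""
    return sum(count * block_transitions(words) for _, count, words in blocks)

def visit_order(blocks):
    """Hottest blocks first, so effort goes where transitions recur most often."""
    return [name for name, _, _ in sorted(blocks, key=lambda b: b[1], reverse=True)]

# Invented profile: a hot loop body and a cold initialisation block
blocks = [
    ("init", 1, [(1, 1, 1), (2, 2, 2)]),
    ("loop", 1000, [(3, 2, 4), (6, 3, 5)]),
]
print(dynamic_transitions(blocks))  # 4006
print(visit_order(blocks))          # ['loop', 'init']
```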
Experimental Results (How they supported their findings)
• Used a modified version of SimpleScalar
• Built a Control Flow Graph (CFG) for each “hot spot”
• Generated basic block frequencies
• Encapsulated RPB and RPT in a stand-alone module
• Fed the generated CFG into this module
• Ran the module on six different benchmarks
• RPT improvement as high as 25%
• RPB improvement as high as 44%
Conclusions
• Presented a compiler-driven, power-aware register name adjustment (RNA) algorithm
• Formally defined the problem and showed it to be NP-Complete
• Two efficient heuristics for attacking the problem:
  • RPB: commutativity and dead register reassignment
  • RPT: register pair frequencies and remappings
• Significant power improvements from a purely compiler-based optimization (no additional hardware support needed)
• Independent of ISA
• Easily integrated within any compilation framework