Compiler-Directed instruction cache leakage optimizations

Discussed by Raid Ayoub, CSE Department

Outline
• Power consumption in CMOS
• Motivation
• Handling the problem at the compiler level
• Breaking the problem down at the loop level
• Examining various strategies for turn-off
• Loop-level optimizations
• Experimental results
• Summary

Leakage power in CMOS
• Leakage power: power consumed due to subthreshold leakage current
• Leakage power accounts for 30% of L1 cache power and 70% of L2 cache power at 0.13 µm
• Reducing leakage power improves battery lifetime and reliability
• Leakage power will become increasingly significant as Vdd scales down
• Handling leakage power at the circuit level: putting cell(s) in a low-leakage mode
  • State-destroying mode (Vdd close to 0)
    – Reduces static power dramatically
    – Destroys the stored state
    – Incurs time and power overhead when switching back to active mode
  • State-preserving mode (lowered Vdd)
    – Reduces static power
    – Preserves stored data
    – Incurs time and power overhead when switching back to active mode
[Figure: SRAM cell]
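The trade-off between the two low-leakage modes can be made concrete with a first-order energy model. This is a minimal sketch, not taken from the slides: `LineModeParams`, `idle_energy`, `best_mode`, and every parameter value are hypothetical, and the model counts only leakage during one idle interval plus the cost of bringing the line back into use.

```python
from dataclasses import dataclass


@dataclass
class LineModeParams:
    """Hypothetical per-line costs in arbitrary energy units (not measured values)."""
    p_active: float   # leakage per cycle while the line is fully powered
    p_drowsy: float   # leakage per cycle in the state-preserving (lowered Vdd) mode
    p_off: float      # leakage per cycle in the state-destroying (Vdd close to 0) mode
    e_wake: float     # one-off cost of switching the line back to active mode
    e_refetch: float  # cost of refilling the line from the next level after its state was lost


def idle_energy(idle_cycles: int, mode: str, p: LineModeParams, reused: bool = True) -> float:
    """Leakage plus transition energy for one line over an idle interval."""
    if mode == "active":
        return p.p_active * idle_cycles
    if mode == "drowsy":
        # Data survives, so a later reuse pays only the wake-up cost.
        return p.p_drowsy * idle_cycles + (p.e_wake if reused else 0.0)
    if mode == "off":
        # Data is lost, so a later reuse pays the wake-up cost plus a refill (a cache miss).
        return p.p_off * idle_cycles + ((p.e_wake + p.e_refetch) if reused else 0.0)
    raise ValueError(f"unknown mode: {mode}")


def best_mode(idle_cycles: int, p: LineModeParams, reused: bool = True) -> str:
    """Cheapest mode for an idle interval of the given length under this toy model."""
    return min(("active", "drowsy", "off"), key=lambda m: idle_energy(idle_cycles, m, p, reused))


if __name__ == "__main__":
    params = LineModeParams(p_active=1.0, p_drowsy=0.25, p_off=0.01, e_wake=5.0, e_refetch=200.0)
    for gap in (4, 100, 10_000):
        print(gap, best_mode(gap, params))
```

With these sample values, a 4-cycle gap favours staying active, a 100-cycle gap favours the state-preserving mode, and a 10,000-cycle gap (or a line that is never reused) favours the state-destroying mode, because the refill cost is only amortized over long idle periods.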

Handling leakage power of caches
• Challenges in managing the leakage-control modes of a cache:
  • Identifying inactive instructions at run time and placing them in leakage mode
  • Potential performance degradation when active instructions are put in leakage mode
  • Hardware overhead: power and latency
• Previous approaches: architectural-level techniques
  • Use hardware monitoring to manage the leakage modes of the cache
  • Limitations:
    – Hardware complexity
    – Power savings may be only moderate
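To illustrate what "hardware monitoring" has to track, here is a minimal sketch of a generic per-line idle-counter (cache-decay style) policy. It is a swapped-in illustration, not one of the specific prior schemes the slides compare against; `simulate_decay`, the trace, and the interval are hypothetical. It shows the source of the hardware cost: one counter per line, checked continuously, plus a wake-up penalty whenever the guess is wrong.

```python
def simulate_decay(trace, decay_interval):
    """Toy per-line idle-counter policy over a trace of cache-line IDs, one access per cycle.

    A line is treated as asleep (in leakage mode) at cycle `now` once more than
    `decay_interval` cycles have passed since its last access.  Returns
    (asleep_line_cycles, wakeups): total line-cycles spent asleep, and the number of
    accesses that found their line asleep and had to wake it.
    """
    last_access = {}  # per-line state the hardware would have to track: line -> last access cycle
    asleep_line_cycles = 0
    wakeups = 0
    for now, line in enumerate(trace):
        # Every tracked line is checked every cycle, i.e. one counter per line.
        for t in last_access.values():
            if now - t > decay_interval:
                asleep_line_cycles += 1
        if line in last_access and now - last_access[line] > decay_interval:
            wakeups += 1  # each wake-up adds latency and power overhead
        last_access[line] = now
    return asleep_line_cycles, wakeups


if __name__ == "__main__":
    trace = ["A", "B", "A", "B", "C", "C", "C", "A"]
    print(simulate_decay(trace, decay_interval=2))
```

A shorter interval puts lines to sleep sooner (more leakage saved) but raises the number of wake-ups; with a very long interval nothing ever sleeps.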

Proposed approach
• Use a compiler-based technique
  • Identify the last use of instructions and place them in leakage mode
  • Special instructions are used to place cache lines into leakage modes
• Goals:
  • Simplify hardware support
  • Improve power savings
• Two compiler-based strategies are studied:
  • Conservative
  • Optimistic
• Two leakage-saving mechanisms are used:
  • State-destroying and state-preserving modes

Compiler strategies
[Figure: loop nest in which loop body-I and loop body-II are enclosed by loop body-III]
• Turn-off instructions are applied at loop-level granularity
• Conservative:
  • The lines for loop body-I can be turned off only when exiting loop III
  • Less effective when loop III encloses the majority of the instructions in the code
• Optimistic:
  • The lines for loop body-I can be turned off every time loop I exits (assuming loop II takes a long time)
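The difference between the two strategies can be sketched on the nest from the figure. This is a simplified model under stated assumptions: `Loop`, `turnoff_point`, and `body_i_activity` are hypothetical names; "conservative" is generalized here to "wait for the outermost enclosing loop", which matches the slide only for this two-level nest; and each iteration of loop III is assumed to spend a fixed number of cycles in loop I and then in loop II.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Loop:
    """Node in a hypothetical loop-nest tree."""
    name: str
    parent: Optional["Loop"] = field(default=None, repr=False, compare=False)
    children: List["Loop"] = field(default_factory=list)

    def add(self, child: "Loop") -> "Loop":
        child.parent = self
        self.children.append(child)
        return child


def turnoff_point(loop: Loop, strategy: str) -> str:
    """Name of the loop whose exit triggers turning off `loop`'s cache lines."""
    if strategy == "conservative":
        # Wait until the outermost enclosing loop exits: the lines cannot be needed again in this nest.
        while loop.parent is not None:
            loop = loop.parent
        return loop.name
    if strategy == "optimistic":
        # Turn the lines off at every exit of the loop itself, even if it will be re-entered.
        return loop.name
    raise ValueError(f"unknown strategy: {strategy}")


def body_i_activity(n_outer: int, cycles_i: int, cycles_ii: int, strategy: str):
    """(active cycles, re-activations) of loop body-I's lines when loop III runs
    n_outer iterations, each spending cycles_i in loop I and cycles_ii in loop II."""
    if strategy == "conservative":
        # Lines stay on for all of loop III and are turned off once, at its exit.
        return n_outer * (cycles_i + cycles_ii), 0
    if strategy == "optimistic":
        # Lines are on only inside loop I but must be re-activated on every later entry.
        return n_outer * cycles_i, max(n_outer - 1, 0)
    raise ValueError(f"unknown strategy: {strategy}")


if __name__ == "__main__":
    loop_iii = Loop("III")
    loop_i = loop_iii.add(Loop("I"))
    loop_ii = loop_iii.add(Loop("II"))
    print(turnoff_point(loop_i, "conservative"), turnoff_point(loop_i, "optimistic"))
    print(body_i_activity(10, 100, 900, "conservative"), body_i_activity(10, 100, 900, "optimistic"))
```

When loop II dominates (900 of every 1,000 cycles here), the optimistic strategy cuts the active time of body-I's lines from 10,000 to 1,000 cycles at the price of 9 re-activations, which is why it pays off only if those re-activations are cheap relative to the leakage saved.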

Examining various strategies
• Experimental results show that:
  • Strategy 3 is the most successful one
  • Strategy 3 is competitive with other proposed schemes
  • Strategies 1 and 2 are less successful due to the overhead of cache misses caused by the state-destroying mode
• Hybrid strategy
  • Policy when exiting a loop:
    – If the loop will be visited again, put its cache lines in leakage mode (state-preserving)
    – If the loop is dead, turn its cache lines off (state-destroying)
  • The hybrid strategy performed similarly to strategy 3
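The slides do not say how the compiler decides whether a loop "will be visited again". One plausible way, sketched here as an assumption, is a reachability test on the control-flow graph: if the loop header is reachable from the loop's exit block, the loop may run again and its lines get the state-preserving mode; otherwise the loop is dead and its lines can be turned off. The CFG encoding, block names, and functions below are hypothetical.

```python
from collections import deque


def reachable(cfg, start, target):
    """Breadth-first search over a CFG given as {block: [successor blocks]}."""
    seen = {start}
    work = deque([start])
    while work:
        block = work.popleft()
        if block == target:
            return True
        for succ in cfg.get(block, []):
            if succ not in seen:
                seen.add(succ)
                work.append(succ)
    return False


def exit_mode(cfg, loop_header, exit_block):
    """Hybrid policy: keep state if the loop can run again, destroy it if the loop is dead."""
    if reachable(cfg, exit_block, loop_header):
        return "state-preserving"
    return "state-destroying"


# Hypothetical CFG for the nest in the figure: loops I and II inside loop III.
CFG = {
    "entry": ["h3"],
    "h3": ["h1", "x3"],  # loop III header: enter loop I or leave the nest
    "h1": ["b1", "x1"],  # loop I header
    "b1": ["h1"],        # loop I back edge
    "x1": ["h2"],        # exit of loop I falls into loop II
    "h2": ["b2", "x2"],  # loop II header
    "b2": ["h2"],        # loop II back edge
    "x2": ["h3"],        # back edge of loop III
    "x3": ["end"],
    "end": [],
}

if __name__ == "__main__":
    print(exit_mode(CFG, "h1", "x1"))  # loop I is re-entered on the next loop III iteration
    print(exit_mode(CFG, "h3", "x3"))  # nothing after the nest returns to loop III
```

Reachability errs on the safe side: a loop is declared dead only when no path leads back to it, so the state-destroying mode is never chosen for code that can still execute.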

Compiler optimizations
• Loop distribution
  – Fewer cache lines need to be active at any given time
  – Potential to reduce conflicts in the L2 cache and improve performance
  – May destroy data cache locality
• Loop fusion
  – Potential to enhance data locality
  – Increases the number of active lines at a given time
  – Potential to reduce how long lines stay active
[Figure: loop distribution and loop fusion of a loop whose body contains Body-I and Body-II]
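A small example makes the two transformations and their trade-off concrete. This is a minimal sketch under stated assumptions: the loop bodies, array names, and `peak_active_lines` are hypothetical, and the line counts are a toy model of instruction-cache occupancy, not results from the slides. The distributed version is legal here because Body-II only reads `a[i]` values that Body-I has already produced.

```python
def fused(a, b, c, n):
    """Single loop: the instructions of Body-I and Body-II are both live for the whole loop."""
    for i in range(n):
        a[i] = b[i] + 1   # Body-I
        c[i] = a[i] * 2   # Body-II


def distributed(a, b, c, n):
    """Same computation after loop distribution."""
    for i in range(n):    # loop 1: only Body-I's instructions are needed
        a[i] = b[i] + 1
    for i in range(n):    # loop 2: Body-I's lines could be turned off before this loop starts
        c[i] = a[i] * 2   # re-reads a[], which may have left the data cache if n is large


def peak_active_lines(lines_i: int, lines_ii: int, is_distributed: bool) -> int:
    """Instruction-cache lines that must be on at the same time (toy model)."""
    return max(lines_i, lines_ii) if is_distributed else lines_i + lines_ii


if __name__ == "__main__":
    n = 4
    b = [1, 2, 3, 4]
    a1, c1 = [0] * n, [0] * n
    a2, c2 = [0] * n, [0] * n
    fused(a1, b, c1, n)
    distributed(a2, b, c2, n)
    print(c1 == c2, c1)
    print(peak_active_lines(3, 5, True), peak_active_lines(3, 5, False))
```

Both versions compute the same result; distribution lowers the number of simultaneously active instruction lines (5 instead of 8 in this toy case) but traverses `a` twice, which is the data-locality cost noted on the slide.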

Compiler optimizations
[Figure: impact of various optimizations on leakage energy for adi]
• Experimental results show that loop distribution is the most successful optimization in reducing leakage energy

Summary
• The proposed approach delivers energy savings and energy-delay competitive with other schemes
• Applying loop optimizations improves energy savings
• Hardware support is simple
• Adding explicit turn-off instructions requires modifications to the ISA
