This research explores methods to optimize static power dissipation in superscalar processors by managing functional units. It addresses the issues of Instruction-Level Parallelism (ILP) distribution and idle periods during program execution, offering solutions like Dual Threshold Voltage and Power Gating. The study discusses a compiler approach for identifying suitable regions to turn off functional units, including how to handle latencies associated with power gating. Experimental results indicate an improvement in utilization by up to 90% with minimal performance degradation, showcasing effective strategies for static power optimization.
Optimizing Static Power Dissipation By Functional Units in Superscalar Processors
Siddharth Rele, Santosh Pande, Soner Onder, and Rajiv Gupta
Motivation and Research Goals
• ILP distributed non-uniformly throughout a program.
• Functional Units idle for prolonged periods of time during program execution.
• Dynamic power vs static power
Solving The Static Power Problem
• Dual Threshold Voltage
• Power Gating
  - Turn Off Devices by Cutting Off the Supply Voltage
• Turning On Latency
Power Gating
• Using hardware to identify idle regions
  - Additional Power Requirements
  - Warmup Time
• The Compiler Approach
  - Identify Suitable Regions
  - Determine the number and type of the Functional Units to be Turned Off
Issues
• Tolerating the Latency of Turning a Functional Unit Off
• Tolerating the Latency of Turning a Functional Unit On
• Dealing with Variable Length Idle Regions
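
The latency issues above suggest a simple break-even check: an idle region only pays off if it is longer than the combined turn-off and turn-on latencies. The following is a minimal sketch of that idea, not the authors' method. The type `GatingCost`, the functions `gated_cycles` and `worth_gating`, and the `min_gated` threshold are all hypothetical names introduced for illustration; the slides give no concrete latency values.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical gating latencies, in cycles (not taken from the slides). */
typedef struct {
    unsigned off_latency; /* cycles until the unit is actually powered down */
    unsigned on_latency;  /* cycles until the unit is usable again */
} GatingCost;

/* Cycles the unit really spends powered down inside an idle region,
 * or 0 when the region is too short to cover both latencies. */
unsigned gated_cycles(unsigned idle_cycles, GatingCost cost)
{
    unsigned overhead = cost.off_latency + cost.on_latency;
    return idle_cycles > overhead ? idle_cycles - overhead : 0;
}

/* Assumed policy: gate only if at least min_gated cycles are saved. */
bool worth_gating(unsigned idle_cycles, GatingCost cost, unsigned min_gated)
{
    assert(min_gated > 0);
    return gated_cycles(idle_cycles, cost) >= min_gated;
}
```

With illustrative latencies of 10 cycles (off) and 20 cycles (on), a 1000-cycle idle region leaves 970 gated cycles, while a 25-cycle region leaves none. Variable-length regions are the hard case, because `idle_cycles` is then only an estimate at the point where the decision is made.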
Architectural Support
• Power Aware Instruction Set
  - on and off directives as instruction suffixes
• On and Off Semantics for an OOO Superscalar Processor
  - processor-deadlock and pending-off stage
• Nullifying Spurious Off-On Pairs
Example
    mul.off
    if (x > 10) {
        wait = 0;
        while (1) {
            wait++;
            if (wait == 1000) break;
        }
    }
    mul.on
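
One reading of the example: when `x > 10` is false, `mul.off` is followed almost immediately by `mul.on`, so the pair is spurious and gating the multiplier would only cost latency. A pending-off stage gives the hardware a window in which to cancel such a pair. The state machine below is a rough sketch under that assumption, not the processor's actual semantics. The names `FuState`, `FuControl`, `fu_off`, `fu_on`, and `fu_cycle` are hypothetical, and the turn-on latency is not modeled.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { FU_ON, FU_PENDING_OFF, FU_OFF } FuState;

typedef struct {
    FuState  state;
    unsigned in_flight;       /* instructions still using the unit */
    unsigned nullified_pairs; /* off-on pairs cancelled before power-down */
} FuControl;

/* "off" directive: request power-down, but wait in the pending-off stage. */
void fu_off(FuControl *fu)
{
    assert(fu != NULL);
    if (fu->state == FU_ON)
        fu->state = FU_PENDING_OFF;
}

/* "on" directive: if the off is still pending, the pair is nullified. */
void fu_on(FuControl *fu)
{
    assert(fu != NULL);
    if (fu->state == FU_PENDING_OFF)
        fu->nullified_pairs++;
    fu->state = FU_ON;
}

/* One cycle: assume one in-flight instruction completes per cycle;
 * a pending off takes effect only once the unit has drained. */
void fu_cycle(FuControl *fu)
{
    assert(fu != NULL);
    if (fu->in_flight > 0)
        fu->in_flight--;
    else if (fu->state == FU_PENDING_OFF)
        fu->state = FU_OFF;
}
```

Calling `fu_off` and then `fu_on` with no intervening cycles leaves the unit on and increments `nullified_pairs`, which corresponds to the short path through the example. On the long path, repeated `fu_cycle` calls drain the unit and move it to `FU_OFF` before the `on` arrives.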
Compiler Support
• Identify Hot Blocks and corresponding Functional Unit Requirements
  - Access to ILP required when there are multiple Functional Units of the same type
• Computing the PAFG
  - identifying longer periods of time over which a functional unit can be turned off
• Place the off and on directives at the right places
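
The placement step can be illustrated on a deliberately simplified model: a straight-line sequence of blocks instead of the PAFG, which the slides do not define in detail. The sketch below merges consecutive blocks that do not need a given functional unit and keeps only runs whose estimated length reaches a threshold. The `off` directive would go before the first block of each run and the `on` directive after the last. `Block`, `Region`, `find_off_regions`, and `min_cycles` are hypothetical names, and the cycle estimates are assumed to come from profiling.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct {
    unsigned est_cycles; /* assumed profile estimate for the block */
    bool     uses_unit;  /* does the block need the functional unit? */
} Block;

typedef struct {
    size_t first; /* place "off" before this block */
    size_t last;  /* place "on" after this block */
} Region;

/* Collects maximal runs of blocks that do not use the unit and whose
 * total estimated cycles are at least min_cycles. Returns the number
 * of regions written to out (at most out_cap). */
size_t find_off_regions(const Block *blocks, size_t n, unsigned min_cycles,
                        Region *out, size_t out_cap)
{
    assert(blocks != NULL || n == 0);
    assert(out != NULL || out_cap == 0);

    size_t count = 0;
    size_t i = 0;
    while (i < n) {
        if (blocks[i].uses_unit) {
            i++;
            continue;
        }
        size_t start = i;
        unsigned long total = 0;
        while (i < n && !blocks[i].uses_unit) {
            total += blocks[i].est_cycles;
            i++;
        }
        if (total >= min_cycles && count < out_cap) {
            out[count].first = start;
            out[count].last = i - 1;
            count++;
        }
    }
    return count;
}
```

Merging adjacent idle blocks before applying the threshold is what lets two individually short blocks qualify together, which matches the stated goal of finding longer periods over which a unit can be turned off. A real implementation would have to work on the control-flow graph and account for branches and loops.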
Experimental Methodology
• Compiler: lcc
• Code Generator: lburg
• Cycle-level Simulator: FAST system
• Benchmarks:
  - MediaBench (rawcaudio.c, rawdaudio.c)
  - DSPstone (fir2dim.c, n-real-updates.c, fir.c)
  - SPEC95 (compress.c)
Experimental Results
• Utilization improved up to 90%
• Performance Degradation less than 1%
• Average Duration for which Units were turned off was quite long: several hundred to thousands of cycles
• Nullification Strategy quite effective for certain benchmarks