Hybrid-Scheduling: A Compile-Time Approach for Energy–Efficient Superscalar Processors

Madhavi Valluri and Lizy John, Laboratory for Computer Architecture, University of Texas at Austin (www.ece.utexas.edu/projects/ece/lca)


Presentation Transcript


  1. Hybrid-Scheduling: A Compile-Time Approach for Energy–Efficient Superscalar Processors
     Madhavi Valluri and Lizy John, Laboratory for Computer Architecture, University of Texas at Austin, www.ece.utexas.edu/projects/ece/lca

     Where does power go?
     [Figure: processor power breakdown chart(s), one labeled Alpha 21464; segment labels include Fetch, Caches, dcache, IALU, FPU, MMU, Exec, Mem, OoO issue, Rest]
     Out-of-order issue logic* used to identify parallel instructions to issue each cycle consumes ~25-50% of processor power
     * a.k.a. Dynamic Issue Logic

     OoO Issue Logic
     Main Components: Issue Queue & Reorder Buffer
     - Accessed multiple times each cycle
     - Highly associative
     - Multiple locations accessed
     - Large number of ports
     - Complex Select/Wakeup logic

     Motivating Example
     Loop: {
       (1) add R1, R2, R5;
       (2) sub R5, R2, R1;
       (3) st R1, 0(R15);
       (4) ld R9, 8(R24);
       (5) add R4, R9, R10;
     }
     Assumed operation latencies: ALU, 1 cycle; LD/ST, 2 cycles
     • CASE 1: Assume R15 and R24 depend on input data; potential address aliasing; cannot be resolved statically
     • CASE 2: R15 and R24 are starting addresses of two arrays a & b; no aliasing problem exists

     Compiler Schedule (case 1)
       Cycle 1   Instr (1)
       Cycle 2   Instr (2)
       Cycle 3   Instr (3)
       Cycle 4   Instr (4)
       Cycle 5   nop
       Cycle 6   Instr (5)
     SIX CYCLES PER ITERATION

     Dynamic Schedule
       Cycle 1   Instr (1)   Instr (4)
       Cycle 2   Instr (2)   nop
       Cycle 3   Instr (3)   Instr (5)
     THREE CYCLES PER ITERATION
     Dynamic schedule is 2X better than compiler schedule

     Compiler Schedule (case 2)
       Cycle 1   Instr (1)   Instr (4)
       Cycle 2   Instr (2)   nop
       Cycle 3   Instr (3)   Instr (5)
     THREE CYCLES PER ITERATION
     NO difference between dynamic schedule and compiler schedule!

     • Dynamic scheduling hardware justified in irregular regions of the program
     • In regular regions of programs (regions with well-structured control-flow, regular memory access patterns etc.), hardware scheduling is redundant
     • SOLUTION: Hybrid-Scheduling Approach

     The Hybrid-Scheduling Approach
     • Programs divided into low power static regions and high power dynamic regions
     • Code in S-Regions scheduled by compiler alone into packets of independent instructions
     • Instruction packets issued to functional units without any dynamic dependence checks
     • 2 modes of execution: Superscalar mode for dynamic regions + Static mode for S-Regions
     • Large savings in static mode since OoO logic bypassed
     [Figure: pipeline diagram with labels Program, Decode, Rename, O-o-O Issue, Reorder Buffer, Low Power Reorder Buffer, ALUs, Dynamic Region, Static Region, and "Special instruction to switch mode"]

     Comparing with Dynamic Resource Adaptation Schemes
     • Resource requirements of a program vary with program phase
     • Dynamic resource adaptation schemes typically lower processor resources when program IPC is low and increase resources when IPC is high
     • Hybrid-Scheduling scheme eliminates power-hungry resource use for high IPC (or high ILP) regions

     Preliminary Results (ISLPED 2003)
     • Benchmarks & S-Region Selection
       - Media & SpecFP benchmarks
       - Potential S-Regions: loops without function calls
       - Final S-Regions selected by profiling
       - 30-99% time spent in S-Regions
     • Experimental Results
       - 30% average energy savings
       - 3.6% average performance drop

     Future Research Directions
     • The real challenge: adapting Hybrid-Scheduling for irregular, SPECint-like applications
     • Minimal time spent in loops without function calls (previously explored type of S-Region)
     • Ongoing research: investigating alternate S-Regions (Hyperblocks, Superblocks and Traces)
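
The schedules in the motivating example can be reproduced with a small scheduling sketch. The code below is illustrative only and is not the authors' compiler or hardware model. It rests on several assumptions that are not stated on the poster: the last operand of add/sub is the destination (this gives the 1 → 2 → 3 dependence chain the schedules imply), the machine has two issue slots, only true register dependences and store-to-load ordering are modelled, a load that may alias an earlier store issues at least one cycle after it, and "cycles per iteration" means the issue cycle of the last instruction. The names `Instr`, `schedule`, and `cycles_per_iteration` are hypothetical.

```python
from dataclasses import dataclass

LATENCY = {"alu": 1, "mem": 2}  # operation latencies assumed on the poster
ISSUE_WIDTH = 2                 # assumption: two issue slots per cycle


@dataclass(frozen=True)
class Instr:
    name: str
    unit: str             # "alu" or "mem"
    reads: tuple          # registers read
    writes: tuple = ()    # registers written
    is_store: bool = False
    is_load: bool = False


# The loop body from the motivating example (destination assumed to be last).
LOOP = [
    Instr("(1)", "alu", ("R1", "R2"), ("R5",)),
    Instr("(2)", "alu", ("R5", "R2"), ("R1",)),
    Instr("(3)", "mem", ("R1", "R15"), is_store=True),
    Instr("(4)", "mem", ("R24",), ("R9",), is_load=True),
    Instr("(5)", "alu", ("R4", "R9"), ("R10",)),
]


def schedule(instrs, may_alias):
    """Place each instruction in its earliest legal issue cycle.

    may_alias=True models a compiler that cannot disambiguate the store
    and the load (case 1), so the load is kept behind the store.
    """
    issue = []   # issue cycle of each instruction, in program order
    used = {}    # cycle -> number of issue slots already taken
    for j, cur in enumerate(instrs):
        start = 1
        for i, prev in enumerate(instrs[:j]):
            # True (read-after-write) register dependence.
            if set(prev.writes) & set(cur.reads):
                start = max(start, issue[i] + LATENCY[prev.unit])
            # Conservative memory ordering: load stays after the store.
            if may_alias and prev.is_store and cur.is_load:
                start = max(start, issue[i] + 1)
        # Respect the issue width.
        while used.get(start, 0) >= ISSUE_WIDTH:
            start += 1
        used[start] = used.get(start, 0) + 1
        issue.append(start)
    return {ins.name: cycle for ins, cycle in zip(instrs, issue)}


def cycles_per_iteration(instrs, may_alias):
    """Issue cycle of the last-issued instruction in one iteration."""
    return max(schedule(instrs, may_alias).values())


if __name__ == "__main__":
    for label, alias in (("case 1 (possible aliasing)", True),
                         ("case 2 (no aliasing)", False)):
        print(label, schedule(LOOP, alias), cycles_per_iteration(LOOP, alias))
```

With `may_alias=True` the sketch serializes the load behind the store and needs six cycles; with `may_alias=False` it overlaps the load with instruction (1) and needs three. The three-cycle result also corresponds to the dynamic schedule, under the further assumption that the addresses do not actually alias at run time.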
