Razor: Dynamic Voltage Scaling Based on Circuit-Level Timing Speculation

Advanced Computer Architecture Laboratory, The University of Michigan: Dan Ernst, Nam Sung Kim, Shidhartha Das, Sanjay Pant, Rajeev Rao, Toan Pham, and Conrad Ziesler

Presentation Transcript


  1. Razor: Dynamic Voltage Scaling Based on Circuit-Level Timing Speculation
     Advanced Computer Architecture Laboratory, The University of Michigan
     Dan Ernst, Nam Sung Kim, Shidhartha Das, Sanjay Pant, Rajeev Rao, Toan Pham, and Conrad Ziesler
     Faculty Members: David Blaauw, Todd Austin, and Trevor Mudge
     Krisztián Flautner, ARM Ltd.
     December 3rd, 2003

  2. Dynamic Voltage Scaling and Design Uncertainty
     [Figure: Intra-die variations in ILD thickness]
     • DVS: adapting voltage/frequency to meet the performance demands of the workload
       - Lower processor voltage during periods of low utilization
       - Lower voltage is a Good Thing™ for power
     • Minimum voltage is limited by safety margins
       - Error-free operation must be guaranteed!
     • Technology trends are Maximizing the Minimums
       - Process and temperature variation
       - Capacitive and inductive noise
     • Key observation: worst-case conditions are also highly improbable
       - Significant gain for circuits optimized for the common case
       - Efficient mechanisms are needed to tolerate infrequent worst-case scenarios

  3. Shaving Voltage Margins with Razor
     [Figure: Traditional DVS vs. Zero margin vs. Sub-critical operation]
     • Goal: reduce voltage margins with in-situ error detection and correction for delay failures
     • Proposed approach:
       - Remove safety margins and tolerate occasional errors
       - Tune processor voltage based on error rate
       - Purposely run below critical voltage to capture data-dependent latency margins
     • Trade-off: voltage power savings vs. overhead of correction
       - Analogous to wireless power modulation

  4. Razor Timing Error Detection
     [Figure: main flip-flop and shadow latch sampling the same logic output on clk and clk_del]
     • Second sample of logic value used to validate earlier sample
     • Key design issues:
       - Maintaining pipeline forward progress
       - Meta-stable results in main flip-flop
       - Short path impact on shadow latch
       - Recovering pipeline state after errors
       - Power overhead of error detection and correction
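The double-sampling idea can be sketched as a timing classifier that asks when a path's output settles relative to the two sampling edges. This is a simplified sketch, not the authors' model. It ignores setup and hold times, and `Capture`, `classify_capture`, and the example delays are hypothetical. The 5 ns period matches the 200 MHz prototype target. The 2.5 ns `clk_del` offset is an assumed half-cycle delay.

```python
from enum import Enum


class Capture(Enum):
    """Outcome of one Razor sampling window (hypothetical names)."""
    CORRECT = "main flip-flop captured the settled value"
    DETECTED = "main flip-flop missed it; shadow latch caught it, error flagged"
    MISSED = "both samples too early; outside the detection window"


def classify_capture(path_delay_ns: float, clk_period_ns: float,
                     clk_del_offset_ns: float) -> Capture:
    """Compare a path's settling time against the clk and clk_del edges.

    Simplification: setup/hold times are ignored, so a sample is correct
    whenever the logic has settled by its sampling edge.
    """
    if path_delay_ns <= clk_period_ns:
        return Capture.CORRECT      # main flip-flop sample is already valid
    if path_delay_ns <= clk_period_ns + clk_del_offset_ns:
        return Capture.DETECTED     # second sample disagrees: timing error
    return Capture.MISSED           # the design must rule this case out


if __name__ == "__main__":
    for delay in (4.0, 6.0, 8.0):
        outcome = classify_capture(delay, clk_period_ns=5.0, clk_del_offset_ns=2.5)
        print(delay, outcome.name)
```

Lowering the supply voltage stretches path delays, which pushes occasional cycles from `CORRECT` into `DETECTED`. The `MISSED` case is the one the conservative shadow-latch design is meant to exclude.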

  5. Razor Short Path Constraint
     [Figure: main flip-flop and shadow latch with annotation "Hold Constraint (~1/2 cycle)"]
     • Second sample of logic value used to validate earlier sample
     • Key design issues:
       - Maintaining pipeline forward progress
       - Meta-stable results in main flip-flop
       - Short path impact on shadow latch
       - Recovering pipeline state after errors
       - Power overhead of error detection and correction

  6. Centralized Razor Pipeline Error Recovery
     [Figure: PC and IF/ID/EX/MEM/WB (reg/mem) pipeline with Razor FFs, per-stage error and recover signals, and a shared clock; cycles 0-6, inst1-inst6]
     • One-cycle penalty for timing failure
     • Global synchronization may be difficult for fast, complex designs

  7. Distributed Razor Pipeline Error Recovery
     [Figure: PC and IF/ID/EX/MEM (read-only)/WB (reg/mem) pipeline with Razor FFs and a stabilizer FF, per-stage error, bubble, and recover signals, and Flush Control issuing flushID; cycles 0-9, inst1-inst8]
     • Multiple-cycle penalty for timing failure
     • Scalable design, since all recovery communication is local
     • Builds on existing branch/data speculation recovery framework
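The throughput cost of the centralized and distributed recovery schemes can be compared with a simple cycles-per-instruction model, where each timing error adds a fixed penalty. This is an illustrative sketch under stated assumptions. The 1% error rate and the 4-cycle distributed penalty are hypothetical: the slides say only "one cycle" and "multiple cycles".

```python
def effective_ipc(base_ipc: float, errors_per_instruction: float,
                  penalty_cycles: float) -> float:
    """Throughput when every timing error costs a fixed number of extra cycles.

    Assumed model: CPI = 1/base_ipc + errors_per_instruction * penalty_cycles.
    """
    if base_ipc <= 0:
        raise ValueError("base_ipc must be positive")
    if not 0.0 <= errors_per_instruction <= 1.0:
        raise ValueError("errors_per_instruction must be in [0, 1]")
    cpi = 1.0 / base_ipc + errors_per_instruction * penalty_cycles
    return 1.0 / cpi


if __name__ == "__main__":
    rate = 0.01  # hypothetical: one timing error per 100 instructions
    print("centralized, 1-cycle penalty:", round(effective_ipc(1.0, rate, 1), 4))
    print("distributed, 4-cycle penalty (assumed):", round(effective_ipc(1.0, rate, 4), 4))
```

At a 1% error rate the two penalties give roughly 0.990 and 0.962 IPC. The model captures only the cycle cost. It does not capture the scalability of local recovery, which is the reason for accepting the larger penalty.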

  8. Error-Rate Studies – Hardware Measurement

  9. Error Rate Studies – Empirical Results
     [Figure annotations: "35% energy savings with 1.3% error"; "22% saving, once every 20 seconds!"]

  10. Error Rate Studies – SPICE-Level Simulations
     • Based on SPICE-level simulations of a Kogge-Stone adder
     [Figure annotation: 200 mV]

  11. Prototype Razor Implementation
     [Figure: Razor I layout, 3 mm × 3.3 mm, showing I-Cache, Register File, IF, ID, EX, MEM, WB, and D-Cache]
     • 4-stage 64-bit Alpha pipeline:
       - 200 MHz expected operation in 0.18 µm technology, 1.8 V, ~500 mW
       - Tunable via software from 50-200 MHz, 1.1-1.8 V
     • Razor applied to combinational logic
     • Razor overhead:
       - Total of 192 Razor flip-flops out of 2408 total (9%)
       - Error-free power overhead: ~3%

  12. Effects of Razor DVS
     [Figure: energy and pipeline throughput (IPC) vs. decreasing supply voltage; curves for energy of processor operations, energy of pipeline recovery, total energy with its optimum, and energy of the processor without Razor support]
     • Total energy: $E_{\text{total}} = E_{\text{proc}} + E_{\text{recovery}}$

  13. EX-Stage Analysis – Optimal Voltage Sweep
     • Recovery cost includes the energy to recover the entire pipeline (18× the energy of an add)
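A voltage sweep over $E_{\text{total}} = E_{\text{proc}} + E_{\text{recovery}}$ can be sketched with a toy model. The model charges each error 18 adds' worth of energy, as on the slide above, and lets all energies scale with $V_{dd}^2$. The error-rate curve (`error_rate`) is invented for illustration and is not the measured data. Treating each EX-stage operation as costing one add is also an assumption.

```python
def error_rate(vdd: float) -> float:
    """HYPOTHETICAL timing-error probability per operation.

    Rises tenfold for every 50 mV of supply reduction and saturates at 1.0
    at 1.2 V. Invented for illustration; not measured data.
    """
    return min(1.0, 10.0 ** (-(vdd - 1.2) / 0.05))


def total_energy(vdd: float, recovery_multiple: float = 18.0) -> float:
    """E_total = E_proc + E_recovery, normalized so one add costs vdd**2 units.

    Assumptions: each EX-stage operation costs about one add, and each
    timing error costs `recovery_multiple` adds to recover the pipeline.
    """
    e_add = vdd ** 2                  # dynamic energy scales with V^2
    e_proc = e_add
    e_recovery = error_rate(vdd) * recovery_multiple * e_add
    return e_proc + e_recovery


def optimal_voltage(v_min: float = 1.1, v_max: float = 1.8,
                    step: float = 0.005) -> tuple[float, float]:
    """Sweep the supply voltage and return (vdd, energy) at the minimum."""
    n = int(round((v_max - v_min) / step))
    grid = [v_min + i * step for i in range(n + 1)]
    best = min(grid, key=total_energy)
    return best, total_energy(best)


if __name__ == "__main__":
    v_opt, e_opt = optimal_voltage()
    print(f"optimal Vdd ~ {v_opt:.3f} V, energy {e_opt:.3f} "
          f"vs {total_energy(1.8):.3f} at 1.8 V")
```

The sweep reproduces the qualitative shape on slide 12. $E_{\text{proc}}$ falls as $V_{dd}^2$ and $E_{\text{recovery}}$ climbs as errors multiply, so their sum has an interior minimum below the error-free voltage. The specific optimum depends entirely on the invented error curve.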

  14. EX-Stage Analysis – Optimal Voltage Sweep

  15. Simulation Analysis – Energy-Optimal Voltage

  16. Simulation Analysis – Razor DVS Execution

  17. Simulation Analysis – Razor DVS Performance

  18. Conclusions
     [Figure: Razor flip-flop (main flip-flop, shadow latch on clk_del, comparator, Error_L) and distributed pipeline recovery (Razor FFs, stabilizer FF, error/bubble/recover signals, Flush Control issuing flushID)]
     • In-situ detection/correction of timing errors
       - Eliminate process, temperature, and safety margins
       - Tune processor voltage based on error rate
       - Purposely run below critical voltage to capture data-dependent latency margins
     • Implemented with architecture/circuit support
       - Double-sampling, metastability-tolerant Razor flip-flops validate logic results
       - Pipeline initiates recovery after circuit timing errors; no voltage/clock re-tuning needed
     • Trade-off: supply voltage power savings vs. overhead of correction
     • Running with error is good!

  19. Future Directions
     • Research opportunities
       - Razor for caches/memory and control logic
       - Voltage control algorithms, especially per-stage tuning
       - Typical-case energy-optimized designs (instead of worst-case latency-optimized)
     • Turnkey application of Razor technology
     • Prototype design, fabrication, evaluation
       - Razor I – Q4 2003 – Razor-ized combinational logic, global tuning
       - Razor II – Q3 2004 – Razor-ized caches and control logic, per-stage tuning
     • Other applications
       - Single-event upset (SEU) protection using Razor error detection/re-execution
       - Over-clocking for performance improvement (large gains among hobbyists)

  20. Questions?

  21. Back-up Slides

  22. Other Approaches to Dynamic Voltage Scaling
     [Figure: processor floorplan with Mem Control, Data Cache, IO Unit, Floating Point and Graphics, Ex Unit, Control Unit, Cache Control, L2 Tags, and L2 Cache]
     • Traditional DVS
       - Valid voltage/delay combinations “blessed” at design time
       - Approach leaves a significant amount of energy “on the table”
       - Temperature, process, data, and safety margins placed on voltage
     • Other approaches miss some margins
       - Slack detector: automatic tuning
       - ARM's Intelligent Energy Manager (IEM)
       - Processor voltage automatically tuned to external ambient conditions
       - Inverter chain designed to track the most restrictive critical path; margin still required

  23. Razor Flip-Flop Implementation
     [Figure: Razor FF between Logic Stage L1 and Logic Stage L2: main flip-flop on clk, shadow latch on clk_del, comparator producing Error, and a 0/1 multiplexer controlled by Error_L feeding the main flip-flop]
     • Compare latched data with shadow latch on delayed clock
     • Upon failure: place data from shadow latch in main latch
       - Ensure shadow latch is always correct using conservative design techniques
       - Correct value in shadow latch guarantees forward progress
     • Recover pipeline using microarchitectural recovery mechanism
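The compare-and-restore behavior of a single Razor flip-flop can be sketched at the cycle level. The sketch below is a behavioral model, not the transistor-level circuit of the next slide. The class and method names are hypothetical. The `error` flag is active-high for readability, whereas the slide's `Error_L` is active-low. Pipeline-level recovery is not modeled.

```python
from dataclasses import dataclass


@dataclass
class RazorFF:
    """Cycle-level behavioral sketch of one Razor flip-flop (not a circuit model)."""
    q: int = 0            # main flip-flop output, sampled on clk
    shadow: int = 0       # shadow latch value, sampled on clk_del
    error: bool = False   # comparator output (active-high in this sketch)

    def clock_edge(self, d: int) -> None:
        """Main flip-flop samples D on clk; at sub-critical Vdd this may be stale."""
        self.q = d

    def delayed_edge(self, d: int) -> None:
        """Shadow latch samples D on clk_del; the comparator checks both samples.

        Assumes the shadow latch always sees the settled value, as the
        conservative-design bullet requires.
        """
        self.shadow = d
        self.error = self.q != self.shadow

    def recover(self) -> None:
        """On failure, load the shadow-latch value into the main flip-flop."""
        if self.error:
            self.q = self.shadow
            self.error = False


if __name__ == "__main__":
    ff = RazorFF()
    ff.clock_edge(4)      # slow logic: the old value 4 is still on D at clk
    ff.delayed_edge(9)    # settled value 9 arrives before clk_del
    print("error:", ff.error)
    ff.recover()
    print("Q after recovery:", ff.q)
```

Restoring from the shadow latch is what guarantees forward progress. The faulting value is replaced by the correct one, so re-execution does not hit the same timing error.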

  24. Razor Flip-Flop Circuit
     [Figure: transistor-level Razor flip-flop with clk/clk_b, clk_del/clk_del_b, shadow latch, meta-stability detector (Inv_n, Inv_p), and Error_L output]

  25. Overcoming Short Path Constraints
     [Figure: clock and clock_del timing showing tdelay, thold, an intended path, and a short path; Razor_ff on long paths, ff on short paths, fast paths padded with extra delay]
     • Delayed clock imposes a short-path constraint: minimum path delay $> t_{\text{delay}} + t_{\text{hold}}$
     • Razor necessary only for latches on slow paths
     • Pad fast paths for latches with mixed path delays
     • Trade-off between DVS headroom and short path constraints
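The slide's placement rule can be sketched as a per-endpoint planner. It keeps plain flip-flops on fast endpoints, uses Razor flip-flops on slow ones, and pads short paths until minimum path delay $> t_{\text{delay}} + t_{\text{hold}}$. The `Endpoint` type, the `speculate_above_ns` threshold, and the 0.1 ns margin are hypothetical parameters introduced for illustration.

```python
from dataclasses import dataclass


@dataclass
class Endpoint:
    """A flip-flop and the range of path delays that feed it (hypothetical input)."""
    name: str
    min_delay_ns: float
    max_delay_ns: float


def plan_endpoint(ep: Endpoint, speculate_above_ns: float, t_delay_ns: float,
                  t_hold_ns: float, margin_ns: float = 0.1) -> tuple[str, float]:
    """Return (cell, padding_ns) for one endpoint.

    - If the slowest path stays at or below `speculate_above_ns`, keep a plain
      flip-flop: the endpoint never needs timing speculation.
    - Otherwise use a Razor flip-flop and pad short paths until
      min path delay > t_delay + t_hold (plus a small margin).
    """
    if ep.max_delay_ns <= speculate_above_ns:
        return ("ff", 0.0)
    bound = t_delay_ns + t_hold_ns
    if ep.min_delay_ns > bound:
        return ("razor_ff", 0.0)
    return ("razor_ff", bound - ep.min_delay_ns + margin_ns)


if __name__ == "__main__":
    for ep in (Endpoint("a", 0.5, 3.0), Endpoint("b", 3.0, 5.5), Endpoint("c", 1.0, 5.5)):
        print(ep.name, plan_endpoint(ep, speculate_above_ns=4.0,
                                     t_delay_ns=2.5, t_hold_ns=0.2))
```

The trade-off on the slide is visible in `t_delay_ns`. A later `clk_del` widens the detection window and so gives more DVS headroom, but it raises the bound and forces more padding on mixed-delay endpoints.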

  26. Hardware Measurement Setup
     [Figure: a 48-bit LFSR feeding 18×18 multipliers in Slow Pipeline A and Slow Pipeline B (clocked at clk/2) and a Fast Pipeline (clocked at clk, with a stabilize stage); results compared (!=) and mismatches counted by a 40-bit error counter]
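The measurement loop can be sketched in software. An LFSR supplies operand pairs, a reference product stands in for the half-rate slow pipelines, and a counter records mismatches from the fast pipeline. This is a sketch under several assumptions. The LFSR taps are taken from a commonly published maximal-length table and are not stated on the slide. The two slow pipelines are collapsed into one exact product. `fast_multiply` is a hypothetical hook for whatever fault behavior is being modeled.

```python
from typing import Callable

MASK18 = (1 << 18) - 1
MASK40 = (1 << 40) - 1
MASK48 = (1 << 48) - 1
TAPS = (48, 47, 21, 20)  # ASSUMED feedback taps (1-indexed bit positions)


def lfsr_step(state: int) -> int:
    """Advance a 48-bit Fibonacci LFSR by one clock: XOR the taps, shift left."""
    feedback = 0
    for tap in TAPS:
        feedback ^= (state >> (tap - 1)) & 1
    return ((state << 1) | feedback) & MASK48


def count_errors(fast_multiply: Callable[[int, int], int], cycles: int,
                 seed: int = 1) -> int:
    """Count cycles where the fast pipeline's product differs from the reference."""
    state = seed & MASK48
    if state == 0:
        raise ValueError("an all-zero LFSR state never changes")
    errors = 0
    for _ in range(cycles):
        a = state & MASK18                  # low 18 LFSR bits -> operand A
        b = (state >> 18) & MASK18          # next 18 bits -> operand B
        reference = a * b                   # slow pipelines, assumed error-free
        if fast_multiply(a, b) != reference:
            errors = (errors + 1) & MASK40  # 40-bit counter wraps around
        state = lfsr_step(state)
    return errors


if __name__ == "__main__":
    def flaky(a: int, b: int) -> int:
        # HYPOTHETICAL fault model: the top product bit is lost whenever it is set
        return (a * b) & ((1 << 35) - 1)

    print("errors:", count_errors(flaky, 100_000))
```

Running the fast path at full clock against a half-rate reference lets the error rate be read off directly as `errors / cycles` at each supply voltage.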

  27. Simulation Methodology
     • Challenge: instruction latency depends on circuit evaluation latency
       - May vary with changes in stage inputs, stage logic, voltage, temperature…
     • Dynamic timing simulation combines architectural/circuit simulation
       - Initial implementation utilized a hand-generated EX-stage circuit model
       - Effort ongoing to automate extraction/decomposition/integration into SimpleScalar

  28. Supply Voltage Control System
     [Figure: error signals from the pipeline produce E_sample (with reset); E_diff feeds a voltage control function that drives the voltage regulator supplying Vdd to the pipeline]
     • $E_{\text{diff}} = E_{\text{ref}} - E_{\text{sample}}$
     • Current design utilizes a very simple proportional control function
     • Control algorithm implemented in software
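A proportional controller built on $E_{\text{diff}} = E_{\text{ref}} - E_{\text{sample}}$ can be sketched as follows. When errors fall below the target, $E_{\text{diff}} > 0$ and the voltage is lowered. When errors exceed the target, the voltage is raised. The gain, the error-rate target, the clamp to the prototype's 1.1-1.8 V range, and the error curve used to close the loop are all assumptions. The sketch also treats $E_{\text{sample}}$ as a rate rather than a raw count per sampling interval.

```python
def control_step(vdd: float, e_sample: float, e_ref: float, gain: float,
                 v_min: float = 1.1, v_max: float = 1.8) -> float:
    """One proportional update of the supply voltage.

    E_diff = E_ref - E_sample. E_diff > 0 (fewer errors than targeted) lowers
    Vdd; E_diff < 0 (too many errors) raises it. The result is clamped to the
    assumed regulator range [v_min, v_max].
    """
    e_diff = e_ref - e_sample
    return min(v_max, max(v_min, vdd - gain * e_diff))


def hypothetical_error_rate(vdd: float) -> float:
    """Invented error curve (10x per 50 mV, 100% at 1.2 V); not measured data."""
    return min(1.0, 10.0 ** (-(vdd - 1.2) / 0.05))


def run_controller(steps: int = 500, vdd: float = 1.8, e_ref: float = 0.001,
                   gain: float = 5.0) -> float:
    """Close the loop against the invented error curve; return the final Vdd."""
    for _ in range(steps):
        vdd = control_step(vdd, hypothetical_error_rate(vdd), e_ref, gain)
    return vdd


if __name__ == "__main__":
    print(f"settled Vdd ~ {run_controller():.3f} V")
```

Starting from 1.8 V, the loop walks the supply down until the sampled error rate matches `e_ref`, then holds it there. No voltage margin is set at design time; the operating point comes from observed errors.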

  29. Pipeline Recovery
     [Figure: IF/ID/EX/MEM/MEM/WB timing diagram with clk and clk_d and delayed samples ID.d, EX.d, MEM.d; on an error the instruction is redone in MEM ("Redo instruction in MEM"), otherwise "No Error"]

  30. Voltage Scaling under Dynamic Workloads
     [Figure: utilization, frequency, and voltage (Vdd) vs. time]
     • Adapt frequency/voltage to performance demands of workload
       - Software-controlled processor speed
       - Lower processor voltage during periods of low operating frequency
     • Quadratic reduction in dynamic power and energy
     • Super-quadratic reduction in leakage
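The "quadratic reduction" bullet follows from the standard CMOS switching model, which the slide does not state explicitly. In that model, $\alpha$ is the activity factor, $C$ the switched capacitance, and $f$ the clock frequency. As a worked example, the prototype's endpoints are plugged in below. Pairing 1.1 V with 50 MHz is an assumption, since the slide gives only the two ranges.

```latex
P_{\text{dyn}} = \alpha\, C\, V_{dd}^{2}\, f,
\qquad
E_{\text{op}} = \alpha\, C\, V_{dd}^{2}

\frac{E_{\text{op}}(1.1\,\text{V})}{E_{\text{op}}(1.8\,\text{V})}
  = \left(\frac{1.1}{1.8}\right)^{2} \approx 0.373,
\qquad
\frac{P_{\text{dyn}}(1.1\,\text{V},\,50\,\text{MHz})}{P_{\text{dyn}}(1.8\,\text{V},\,200\,\text{MHz})}
  = 0.373 \times \frac{50}{200} \approx 0.093
```

Energy per operation depends only on $V_{dd}^2$. Power additionally falls with $f$, which is why scaling voltage and frequency together gives more than a quadratic power reduction. Leakage is not captured by this model.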

  31. Simulation Flow
     [Figure: flow linking High-level HDL Specification (PC, IF, ID, EX, MEM, WB, FFs), Circuit Extraction with Parasitics, Variable Voltage SDF generation, Power/Delay C-model, Architecture Specification, Voltage Control Algorithm, SimpleScalar + DTA, and Detailed Power/Delay Analysis]
     • Automatic creation of very detailed power/delay C-models

  32. Simulation Methodology
     • Dynamic timing simulation combines architectural/circuit simulation
       - Contrast with static timing simulation, which is only concerned with the critical path
     • SimpleScalar/Alpha architectural-level simulation
     • Gate-level simulation of per-stage logic blocks
       - Logic block model describes cells, local and global interconnect
       - Cells characterized with SPICE at varied slew/cap-load/voltage
       - Each cycle, the circuit simulator evaluates the delay of each stage's logic block
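The difference between dynamic and static timing can be illustrated with an adder whose delay depends on its operands. The sketch uses a ripple-carry adder as a simple stand-in. It is neither the Kogge-Stone adder from slide 10 nor the authors' gate-level simulator. The delay constants in `ex_stage_delay_ns` are hypothetical, chosen so that the static worst case (a carry rippling through all 64 bits) just fits a 5 ns cycle.

```python
import random


def longest_carry_chain(a: int, b: int, width: int = 64) -> int:
    """Length of the longest run of bit positions a carry ripples through in a + b."""
    longest = run = carry = 0
    for i in range(width):
        x = (a >> i) & 1
        y = (b >> i) & 1
        generate = x & y
        propagate = x ^ y
        if generate:
            run = 1                     # a new carry starts at this bit
        elif propagate and carry:
            run += 1                    # an incoming carry ripples through
        else:
            run = 0                     # carry killed, or no carry to pass on
        carry = generate | (propagate & carry)
        longest = max(longest, run)
    return longest


def ex_stage_delay_ns(a: int, b: int, base_ns: float = 0.5,
                      per_bit_ns: float = 0.07) -> float:
    """HYPOTHETICAL delay model: fixed overhead plus ripple time per carried bit."""
    return base_ns + per_bit_ns * longest_carry_chain(a, b)


if __name__ == "__main__":
    rng = random.Random(0)
    mask = (1 << 64) - 1
    worst = ex_stage_delay_ns(1, mask)  # static timing: full 64-bit ripple
    samples = [ex_stage_delay_ns(rng.getrandbits(64), rng.getrandbits(64))
               for _ in range(10_000)]
    print(f"static worst case {worst:.2f} ns, "
          f"dynamic mean {sum(samples) / len(samples):.2f} ns")
```

Random operands produce short carry chains, so the per-cycle delay usually sits far below the static worst case. Razor exploits this data-dependent slack.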

  33. Simulation Analysis – Razor DVS Execution

  34. Razor Demo

  35. More Details on Meta-Stability
     [Figure: Razor flip-flop with restore path (clk/clk_b, clk_del/clk_del_b) and timing of pos/neg, error, fail, and restore signals through a dynamic OR/latch, with bubble and flush actions]
     • Sub-critical operation invites meta-stability
     • Meta-stability detector itself can become meta-stable
       - Double-latch the error signal to obtain a sufficiently small probability
     • Flush entire pipe
     • No forward progress
     • Reduce frequency
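Why double-latching the error signal helps can be seen from the textbook synchronizer model, which is not given on the slide. In that model, $\tau$ is the latch's regeneration time constant, $t_r$ the time allowed for resolution, $T_0$ a technology-dependent capture window, and $f_{\text{clk}}, f_{\text{data}}$ the clock and data-transition rates:

```latex
P_{\text{unresolved}}(t_r) \approx e^{-t_r/\tau},
\qquad
\text{MTBF} \approx \frac{e^{\,t_r/\tau}}{T_0\, f_{\text{clk}}\, f_{\text{data}}}
```

A second latch stage gives the error signal extra resolution time $t_{\text{extra}}$, which multiplies the MTBF by $e^{t_{\text{extra}}/\tau}$. For example, an extra $10\tau$ improves it by $e^{10} \approx 2.2 \times 10^{4}$.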

  36. Short Path Failure
     [Figure: IF/ID/EX/MEM/WB timing diagram with clk, clk_d, and delayed samples ID.d, EX.d, MEM.d, showing short paths between inst1 (I1) and inst2 (I2) leading to an error]
