
The Laboratory for Computer Architecture at Virginia (LAVA)


Presentation Transcript


  1. The Laboratory for Computer Architecture at Virginia (LAVA) • Kevin Skadron • University of Virginia, Department of Computer Science

  2. Why We Care About Thermal Management... Source: Tom’s Hardware Guide, http://www6.tomshardware.com/cpu/01q3/010917/heatvideo-01.html

  3. Dynamic Thermal Management • Dynamically adjust execution to control temperature • Avoid catastrophic failure (heat sink, fan) • Permit the use of a less expensive thermal package • Design for less than the worst case • Package costs ~$1 / W above ~40 W • Peak power as high as 130 W in 1-2 generations (SIA roadmap) • Temperatures over 100°C

  4. Dynamic Thermal Management • Deal with “hot spots” • Localized heating occurs much faster than chip-wide • Chip-wide treatment is too conservative • Prove temperature will be safely bounded

  5. Thermal Modeling • Want a fine-grained model of temperature • Power dissipation: too indirect, not easy to measure in HW

  6. “Ohm’s Law” for Temperature • V ↔ temp • I ↔ power • R ↔ thermal resistance • C ↔ thermal capacitance • RC ↔ time constant • $\Delta V = \frac{I \cdot \Delta t}{C} - \frac{V \cdot \Delta t}{RC}$ • Lets us compute stepwise changes in temperature for any granularity at which we can get P, T, R, C • Steady state: V = IR (T = PR)
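As a sketch of where the stepwise form comes from, assume the standard single-node RC circuit: a current source I feeding a capacitor C in parallel with a resistor R to the reference. The circuit picture is an assumption added here for illustration; the slide itself states only the analogy and the result.

```latex
\begin{align*}
C\,\frac{dV}{dt} &= I - \frac{V}{R}
  && \text{(charge balance at the node)} \\
\Delta V &\approx \frac{I\,\Delta t}{C} - \frac{V\,\Delta t}{RC}
  && \text{(forward-Euler step of size } \Delta t\text{)} \\
\Delta V = 0 \;&\Rightarrow\; V = IR
  && \text{(steady state, i.e. } T = PR\text{)}
\end{align*}
```

The step is only a good approximation when Δt is small compared with the time constant RC.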

  7. Thermal Modeling • Use thermal resistance and capacitance of Si • Develop computationally efficient model based on lumped values: $\Delta T_i = \frac{P_i \cdot \Delta t}{C_i} - \frac{T_i \cdot \Delta t}{R_i C_i}$ • Integrate in Wattch (power/performance simulator) • Time evolution of temperature is driven by unit activities and power dissipations on a per-cycle basis • Detect hot spots and activate thermal response • Typical time constant: 10-100 µs
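A minimal Python sketch of the lumped per-block update, for illustration only. The `Block` class, the `find_hot_spots` helper, the block names, and every numeric value are hypothetical and are not taken from Wattch or from the slides. Units are arbitrary but consistent, and temperatures are rises above a reference temperature.

```python
from dataclasses import dataclass


@dataclass
class Block:
    """One lumped thermal node (hypothetical helper, not part of Wattch)."""
    name: str
    r: float            # thermal resistance R_i (assumed value)
    c: float            # thermal capacitance C_i (assumed value)
    temp: float = 0.0   # temperature rise above the reference

    def step(self, power: float, dt: float) -> float:
        # One stepwise update: dT_i = P_i*dt/C_i - T_i*dt/(R_i*C_i)
        self.temp += power * dt / self.c - self.temp * dt / (self.r * self.c)
        return self.temp


def find_hot_spots(blocks, powers, dt, steps, threshold):
    """Advance every block `steps` times; return names that ever exceed `threshold`."""
    hot = set()
    for _ in range(steps):
        for block in blocks:
            if block.step(powers[block.name], dt) > threshold:
                hot.add(block.name)
    return sorted(hot)


# Illustrative inputs: same power, different thermal resistance.
blocks = [Block("bpred", r=2.0, c=0.5), Block("lsq", r=1.0, c=0.5)]
powers = {"bpred": 5.0, "lsq": 5.0}
hot = find_hot_spots(blocks, powers, dt=0.01, steps=2000, threshold=8.0)
```

With these made-up numbers the block with the larger thermal resistance settles near P·R = 10 and is flagged, while the other settles near 5 and is not. A real model would also need coupling between neighbouring blocks, which this sketch ignores.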

  8. Fetch Toggling • Fetch toggling • disable fetch every N cycles • 4/5, 2/3, 1/2, 1/3, 1/5, … • [Figure: pipeline stages IF, ID, EX, MEM, WB]

  9. Fetch Toggling • Fetch toggling • disable fetch every N cycles • 4/5, 2/3, 1/2, 1/3, 1/5, … • [Figure: two rows of pipeline stages IF, ID, EX, MEM, WB]

  10. Fetch Toggling • Fetch toggling • disable fetch every N cycles • 4/5, 2/3, 1/2, 1/3, 1/5, … • How to set the fetch rate? • [Figure: two rows of pipeline stages IF, ID, EX, MEM, WB]
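One way to picture a toggling rate is as a per-cycle fetch-enable pattern. The sketch below assumes each fraction (4/5, 2/3, ...) is the share of cycles in which fetch is allowed; the slides do not say whether the fractions count enabled or disabled cycles. The function name `fetch_enable_pattern` and the accumulator scheme are hypothetical, not the hardware mechanism from the talk.

```python
from fractions import Fraction


def fetch_enable_pattern(duty, cycles):
    """Return a list of per-cycle flags; True means fetch is enabled that cycle.

    `duty` is the assumed fraction of cycles with fetch enabled. An accumulator
    spreads the enabled cycles as evenly as possible over time.
    """
    if not 0 <= duty <= 1:
        raise ValueError("duty must be between 0 and 1")
    pattern = []
    acc = Fraction(0)
    for _ in range(cycles):
        acc += duty
        if acc >= 1:
            acc -= 1
            pattern.append(True)    # fetch this cycle
        else:
            pattern.append(False)   # fetch disabled this cycle
    return pattern


four_fifths = fetch_enable_pattern(Fraction(4, 5), 10)
one_half = fetch_enable_pattern(Fraction(1, 2), 4)
```

Using `Fraction` keeps the accumulator exact, so the enabled share over a whole number of periods matches the requested rate.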

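  11. Feedback-Control of Fetch Toggling • Formal feedback control • PID: $m = K_c \left( e + K_I \int e \, dt + K_d \frac{de}{dt} \right)$ • easy to compute • toggling = f(m) • [Figure: feedback loop. The setpoint minus the measured T gives the error e; the controller outputs m; the actuator (I-fetch toggling) sets power P; the thermal dynamics produce temperature T; a temperature sensor feeds the measured T back.]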
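A minimal discrete-time sketch of the PID law on this slide, with the integral approximated by a running sum and the derivative by a backward difference. The class name, the gains, the sample period, and in particular the mapping `toggling_duty` (one possible f(m)) are assumptions for illustration; the slides do not specify them.

```python
class PIDController:
    """Discrete PID of the form m = Kc * (e + Ki * integral(e) + Kd * de/dt)."""

    def __init__(self, kc, ki, kd, dt):
        self.kc, self.ki, self.kd, self.dt = kc, ki, kd, dt
        self.integral = 0.0
        self.prev_error = None

    def update(self, setpoint, measured):
        error = setpoint - measured          # negative when too hot
        self.integral += error * self.dt     # rectangle-rule integral
        if self.prev_error is None:
            derivative = 0.0                 # no history on the first sample
        else:
            derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        return self.kc * (error + self.ki * self.integral + self.kd * derivative)


def toggling_duty(m, lowest=0.2, highest=1.0):
    """Hypothetical f(m): full-rate fetch plus m, clamped to an allowed range."""
    return max(lowest, min(highest, highest + m))


pid = PIDController(kc=2.0, ki=0.5, kd=0.1, dt=1.0)   # made-up gains
m1 = pid.update(setpoint=107.9, measured=108.0)       # slightly too hot
m2 = pid.update(setpoint=107.9, measured=107.9)       # back at the set point
duty = toggling_duty(m1)
```

The sketch has no anti-windup, so the integral term keeps growing while the output is clamped; a practical controller would likely need to handle that.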

  12. Other Thermal-Management Techniques • Fetch toggling • Fetch throttling • Decode throttling • Speculation control • Frequency/voltage scaling

  13. Per-Structure Response • Hot spots • Branch predictor (probed every cycle) • Load-store queue • L1 D-cache (for high-BW apps) • …most major structures are a hot spot for at least one SPEC2k app • Modified Wattch • Sampling rate: every 1000 cycles (RC of hot spots is 10-100 µs) • Base temp. of 100°C (SIA roadmap) • Emergency threshold of 108°C (Yuan/Hong SEMI-THERM ’01) • Set point of 107.9°C

  14. Thermal Modeling: Where to go from here? (i.e., lots of research questions) • Floor-planning issues and granularity of lumped R/C values • Thermal coupling among blocks • Response lag in temperature sensors • Validation techniques • Visualization • How to deal with large time scales?

  15. Thermal Management: Where to go from here? (i.e., lots more research questions) • New mechanisms • Characterize benchmarks • When to use frequency/voltage scaling • Faster HW techniques for sensing temperature changes • Robust response despite sensor lag • Hot spots • Temperature effects on leakage current • Joint control of temp., power, and performance

  16. Thermal Management: Where to go from here? (i.e., lots more research questions) • New mechanisms • When to use clock scaling • Robust response despite sensor lag • Temperature effects on leakage current • Joint control of temperature, power, and performance

  17. Summary • New tools for thermal management • Models • Mechanisms • Source: Tom’s Hardware Guide, http://www6.tomshardware.com/cpu/01q3/010917/heatvideo-01.html

  18. Backup slides

  19. Performance Loss • Performance loss reduced by 65%
