Explore innovative techniques and models for managing thermal issues in modern microprocessors, optimizing performance while ensuring temperature control. Discover insights on power modeling, clustering, and architecture-level thermal management. Learn about HotSpot projects and dynamic thermal modeling. Dive into the efficient design of packages and heat dissipation strategies for microprocessor components.
Overview • Motivation (Kevin) • Thermal issues (Kevin) • Power modeling (David) • Thermal management (David) • Optimal DTM (Lev) • Clustering (Antonio) • Power distribution (David) • What current chips do (Lev) • HotSpot (Kevin)
PowerPC G3 Microprocessor • On-chip temperature sensor (junction temperature) • Based on differential voltage change across 2 diodes of different sizes • Implemented in PowerPC G3/G4 processors • OS required for control • Instruction Cache Throttling used to dynamically lower junction temperature
Pentium III • On-die thermal diode • Coupled with board-level thermal diode sensor • Uses • Monitor long-term temperature and environmental trends • Provide indication of catastrophic failure
Pentium 4 • Thermal ramp rates ~50°C/second (over whole package) • Much too high for coarse-grained solutions • Thermal Monitor • Highly accurate on-die temperature sensing circuit • Fast-acting temperature control circuit (~50 ns) [Figure: Thermal Monitor block diagram with temperature sensing diode, reference current source, current comparator, and PROCHOT# output]
Pentium 4 -- Thermal Monitor • Trip point is calibrated at manufacturing time • Simple response • Turn processor clocks on/off at 50% duty cycle • For 1.5 GHz processor, ~2 μs on + ~2 μs off?
Pentium 4 -- Results • For 200 traces (TPC-C, SPEC, Microsoft) • Thermal design point can be reduced to 75% of true “max power” with minimal performance loss
Pentium 4 • Thermal monitors allow • Tradeoff between cost and performance • Cheaper package • More triggers, Less Performance • Expensive package • No triggers, no performance loss
Architecture-level Thermal Management • Dynamically adjust execution to control temperature • Avoid catastrophic failure (heat sink, fan) • Permit use of less expensive package • Design for less than the worst case • Package costs ~$1/W above ~40 W • Heat sinks, heat pipes, thinned wafers, fans • Fans reduce battery life • Peak power as high as 150 W now and > 200W in 1-2 generations • Temperatures over 100°C • More fundamentally -- there is a need for architecture-level thermal modeling • What’s actually going on in there?
HotSpot project • Collaboration between HPLP and LAVA Labs (ECE and CS depts., UVa) • Deal with “hot spots” • Localized heating occurs much faster than chip-wide heating • microseconds to milliseconds • Chip-wide treatment is too conservative • seconds to minutes • but there is significant lateral thermal coupling through the package • How do we model this? • Prove temperature will be safely bounded
Hot spots in Power4 Temperature “landscape”: space and time How to estimate early in the design cycle?
Thermal modeling • Want a fine-grained, dynamic model of temperature • At a granularity architects can reason about • That accounts for adjacency and package • That does not require detailed designs • That is fast enough for practical use • HotSpot- a compact model based on thermal R, C • Parameterized to automatically derive a model based on various • Architectures • Power models • Floorplans • Thermal Packages
Dynamic compact thermal model • Electrical-thermal duality • V ↔ temperature (T) • I ↔ power (P) • R ↔ thermal resistance (Rth) • C ↔ thermal capacitance (Cth) • RC time constant ↔ Rth·Cth • Kirchhoff current law gives the differential equation • Electrical domain: $I = C \, dV/dt + V/R$ • Thermal domain: $P = C_{th} \, dT/dt + T/R_{th}$, where $T = T_{hot} - T_{amb}$ • At higher granularities, P and T are vectors and Rth, Cth are circuit matrices
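As a rough illustration of the single-node thermal equation above, the sketch below evaluates its closed-form step response: starting from T = 0 (die at ambient) with constant power P, the rise is T(t) = P·Rth·(1 − exp(−t / (Rth·Cth))). The function name and the numeric values are assumptions chosen for illustration only, not parameters from the slides.

```python
import math


def temperature_rise(power_w, r_th, c_th, t_s):
    """Rise above ambient (K) at time t_s after a constant power step.

    Closed-form solution of P = Cth * dT/dt + T / Rth with T(0) = 0.
    """
    tau = r_th * c_th  # thermal RC time constant in seconds
    return power_w * r_th * (1.0 - math.exp(-t_s / tau))


# Illustrative (assumed) values: 40 W into 0.8 K/W and 50 J/K.
R_TH = 0.8
C_TH = 50.0
POWER = 40.0

steady_state = POWER * R_TH                                   # 32 K above ambient
after_one_tau = temperature_rise(POWER, R_TH, C_TH, R_TH * C_TH)  # ~63% of the way
```

After one time constant the node has covered about 63% of the distance to its steady-state rise P·Rth, which is why Rth·Cth sets the time scale of heating.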
Package we model [Figure: package cross-section showing heat sink, heat spreader, interface material, die, IC package, pins, and PCB]
Modeling the package • Thermal management allows for packaging alternatives/shortcuts/interactions • HotSpot needs a model of packaging • Basic thermal model: • Heat spreader • Heatsink • Interface materials (e.g. phase-change films) • Fan/Active cooler (TEC) • Thermal resistance due to convection • Constriction and bulk resistance for fins • Spreading constriction and bulk resistance for heatsink base and heat spreader • Thermal resistance for bonding material • Thermal capacitance heat spreader and heatsink
“Optimal” package • Default package is found using: • Power dissipation • Target temperature on chip • Chip area • Clock speed – high or low performance • Power dissipation and target temperature used to determine resistance value needed • Needs more work: modern packages are incredibly complex, yet there is still a need to model at higher levels Now: what can we do with HotSpot?
Equivalent vertical network • Diagram is simplified – peripheral nodes [Figure: vertical RC network through the layers chip, interface, spreader (with peripheral spreader nodes), interface + sink, and convection]
Vertical network parameters • Resistances • Determined by the corresponding areas and their cross-sectional thickness • R = resistivity × thickness / area • Capacitances • C = specific heat × thickness × area • Peripheral node areas [Figure: chip on the spreader, with north, south, east, and west peripheral spreader areas]
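The two formulas above can be sketched directly in code. The material constants here are assumptions for illustration (roughly silicon: thermal resistivity about 0.01 m·K/W and volumetric specific heat about 1.75e6 J/(m³·K)), and the block dimensions are hypothetical.

```python
def vertical_resistance(resistivity, thickness, area):
    """R = resistivity x thickness / area, in K/W (SI inputs)."""
    return resistivity * thickness / area


def vertical_capacitance(specific_heat, thickness, area):
    """C = specific heat x thickness x area, in J/K.

    specific_heat is volumetric, J/(m^3 K), so the product has units J/K.
    """
    return specific_heat * thickness * area


# Assumed, silicon-like constants; not taken from the slides.
RESISTIVITY = 0.01       # m K / W
SPECIFIC_HEAT = 1.75e6   # J / (m^3 K)

# Hypothetical 4 mm x 4 mm block on a 0.5 mm thick die.
area = 4e-3 * 4e-3
thickness = 0.5e-3

r_block = vertical_resistance(RESISTIVITY, thickness, area)      # ~0.3125 K/W
c_block = vertical_capacitance(SPECIFIC_HEAT, thickness, area)   # ~0.014 J/K
tau_block = r_block * c_block                                     # ~4.4 ms
```

With these assumed numbers the block's own time constant comes out in the millisecond range, consistent with the microseconds-to-milliseconds scale quoted for localized heating.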
Lateral resistances • Determined by the floorplan and the length of shared edges between adjacent blocks • "Heat Spreading and Conduction in Compressed Heatsinks", Jaana Behm and Jari Huttunen, in proceedings of the 10th International Flotherm User Conference, May 2001.
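Since lateral resistances depend on the length of shared edges, a first step is to compute that length for each pair of blocks. The sketch below does only that, for axis-aligned rectangles; the `Block` type and function name are hypothetical, and the mapping from shared length to a resistance value is left out because the slides do not give it.

```python
from typing import NamedTuple


class Block(NamedTuple):
    left_x: float
    bottom_y: float
    width: float
    height: float


def shared_edge_length(a, b, eps=1e-9):
    """Length of the edge shared by two axis-aligned blocks (0 if none)."""
    # Blocks abut left/right: measure the overlap of their y-ranges.
    if abs(a.left_x + a.width - b.left_x) < eps or abs(b.left_x + b.width - a.left_x) < eps:
        top = min(a.bottom_y + a.height, b.bottom_y + b.height)
        bottom = max(a.bottom_y, b.bottom_y)
        return max(0.0, top - bottom)
    # Blocks abut top/bottom: measure the overlap of their x-ranges.
    if abs(a.bottom_y + a.height - b.bottom_y) < eps or abs(b.bottom_y + b.height - a.bottom_y) < eps:
        right = min(a.left_x + a.width, b.left_x + b.width)
        left = max(a.left_x, b.left_x)
        return max(0.0, right - left)
    return 0.0


base = Block(0.0, 0.0, 2.0, 1.0)
right_neighbor = Block(2.0, 0.5, 1.0, 2.0)   # shares 0.5 of base's right edge
top_neighbor = Block(0.0, 1.0, 1.0, 1.0)     # shares 1.0 of base's top edge
far_away = Block(5.0, 5.0, 1.0, 1.0)         # not adjacent
```

Blocks that touch only at a corner report a shared length of zero, so they would not be treated as thermally adjacent in this sketch.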
Lateral resistances – contd... • Lengths used for silicon • Lengths used in the spreader
Our model (lateral and vertical) Interface material(not shown)
Temperature equations • Fundamental RC differential equation • $P = C \, dT/dt + T/R$ • Steady state • $dT/dt = 0$ • $P = T/R$ • When R and C are network matrices • Steady state: $T = R \times P$ • Modified transient equation • $dT/dt + (RC)^{-1} \times T = C^{-1} \times P$ • HotSpot software mainly solves these two equations
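For readers who want the intermediate step, the modified transient equation follows from the fundamental equation written in matrix form. This is a sketch of the algebra, assuming R and C are invertible network matrices and T and P are vectors:

```latex
\begin{aligned}
P &= C \,\frac{dT}{dt} + R^{-1} T
  && \text{(fundamental equation, matrix form)} \\
C^{-1} P &= \frac{dT}{dt} + C^{-1} R^{-1} T
  && \text{(left-multiply by } C^{-1}\text{)} \\
\frac{dT}{dt} + (RC)^{-1} T &= C^{-1} P
  && \text{(since } (RC)^{-1} = C^{-1} R^{-1}\text{)}
\end{aligned}
```

Setting $dT/dt = 0$ in the first line recovers the steady-state form $T = R \times P$.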
HotSpot • Time evolution of temperature is driven by unit activities and power dissipations averaged over 10K cycles • Power dissipations can come from any power simulator, act as “current sources” in RC circuit ('P' vector in the equations) • Simulation overhead in Wattch/SimpleScalar: < 1% • Requires models of • Floorplan: important for adjacency • Package: important for spreading and time constants • R and C matrices are derived from the above
Implementation • Primarily a circuit solver • Steady-state solution • Mainly matrix inversion – done in two steps • Decomposition of the matrix into lower and upper triangular matrices • Successive backward substitution of solved variables • Implements the pseudocode from CLR (Cormen, Leiserson, and Rivest, Introduction to Algorithms) • Transient solution • Inputs – current temperature and power • Output – temperature for the next interval • Computed using a fourth-order Runge-Kutta (RK4) method
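The two-step steady-state solve can be sketched as below. This is a minimal LU decomposition without pivoting followed by forward and backward substitution, written from the textbook description rather than from HotSpot's source; the function names and the two-node example network are assumptions. The system is posed as G·T = P, where G is the conductance matrix (the inverse of the resistance matrix R in T = R × P).

```python
def lu_decompose(a):
    """Split square matrix a into unit-lower L and upper U with a = L U (no pivoting)."""
    n = len(a)
    lower = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    upper = [[0.0] * n for _ in range(n)]
    for k in range(n):
        for j in range(k, n):
            upper[k][j] = a[k][j] - sum(lower[k][m] * upper[m][j] for m in range(k))
        if upper[k][k] == 0.0:
            raise ZeroDivisionError("zero pivot; this sketch does not pivot")
        for i in range(k + 1, n):
            partial = sum(lower[i][m] * upper[m][k] for m in range(k))
            lower[i][k] = (a[i][k] - partial) / upper[k][k]
    return lower, upper


def lu_solve(lower, upper, b):
    """Solve L U x = b by forward substitution, then backward substitution."""
    n = len(b)
    y = [0.0] * n
    for i in range(n):
        y[i] = b[i] - sum(lower[i][j] * y[j] for j in range(i))
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(upper[i][j] * x[j] for j in range(i + 1, n))) / upper[i][i]
    return x


# Hypothetical two-block network: each block has 1 K/W to ambient,
# and the blocks are coupled laterally by 0.5 K/W.
g_ambient, g_lateral = 1.0, 2.0
conductance = [[g_ambient + g_lateral, -g_lateral],
               [-g_lateral, g_ambient + g_lateral]]
power = [10.0, 0.0]  # only the first block dissipates power

lower, upper = lu_decompose(conductance)
temps = lu_solve(lower, upper, power)  # rises above ambient: about [6.0, 4.0]
```

The unpowered block still heats up by about 4 K in this example, which illustrates the lateral coupling between neighbors. Decomposing once and reusing L and U is convenient when the same network is solved for many power vectors.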
Transient solution • Solves differential equations of the form dT/dt + AT = B, where A and B are constants • In HotSpot, A is constant but B depends on the power dissipation • Solution – assume constant average power dissipation within an interval (10K cycles) and call RK4 at the end of each interval • In RK4, the current temperature (at t) is advanced in very small steps (t+h, t+2h, ...) until the next interval (10K cycles) • The `4` in RK4 is the order of the method: the accumulated error is O(h^4)
Transient solution contd... • The 4th-order error has to be within the required precision • The step size (h) has to be small enough even for the maximum slope of the temperature evolution curve • The transient solution of the differential equation is of the form $A e^{-Bt}$, where A and B depend on the RC network • Thus, the maximum value of the slope (A × B) is computed, and the step size is chosen accordingly
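The transient scheme described above can be sketched for a single node, where dT/dt + A·T = B with scalar A and B. This is the classical RK4 update with a fixed step count per interval; the function names and numeric values are assumptions, and the step-size selection from the maximum slope is not reproduced here.

```python
import math


def rk4_step(temp, a, b, h):
    """Advance T by one step of size h for dT/dt = b - a * T (classical RK4)."""
    def slope(t_value):
        return b - a * t_value

    k1 = slope(temp)
    k2 = slope(temp + 0.5 * h * k1)
    k3 = slope(temp + 0.5 * h * k2)
    k4 = slope(temp + h * k3)
    return temp + h * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0


def advance_interval(temp, a, b, interval, steps):
    """Advance T across one interval, holding b (the power term) constant."""
    h = interval / steps
    for _ in range(steps):
        temp = rk4_step(temp, a, b, h)
    return temp


# Illustrative (assumed) values: A = 2 per second, B = 10 K per second.
A, B = 2.0, 10.0
numeric = advance_interval(0.0, A, B, interval=1.0, steps=100)

# Exact solution for comparison: T(t) = B/A + (T0 - B/A) * exp(-A t).
exact = B / A * (1.0 - math.exp(-A * 1.0))
```

Because B is held constant within an interval, the exact solution is available for comparison, which makes a single-node case like this a convenient check on the step size.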
Validation • Validated and calibrated using MICRED test chips • 9x9 array of power dissipators and sensors • Compared to HotSpot configured with same grid, package • Within 7% for both steady-state and transient step-response • Interface material (chip/spreader) matters
Current features • Specification of arbitrary floorplans • Format of floorplan file: • One line per unit • Line format – <unit-name> \t <width> \t <height> \t <left-x> \t <bottom-y> \n • Takes a power trace file as input and outputs the corresponding temperature trace • Ability to modify package specifications (type of interface material, size and type of heat spreader and heat sink, etc.)
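A reader for the floorplan line format above might look like the following sketch. The `Unit` type, the function name, the sample unit names, and the dimensions are all hypothetical, and skipping blank lines is an assumption, since the slide specifies only the five tab-separated fields.

```python
from typing import NamedTuple


class Unit(NamedTuple):
    name: str
    width: float
    height: float
    left_x: float
    bottom_y: float


def parse_floorplan(text):
    """Parse lines of <unit-name>\\t<width>\\t<height>\\t<left-x>\\t<bottom-y>."""
    units = []
    for line in text.splitlines():
        if not line.strip():
            continue  # assumption: blank lines are ignored
        name, width, height, left_x, bottom_y = line.split("\t")
        units.append(Unit(name.strip(), float(width), float(height),
                          float(left_x), float(bottom_y)))
    return units


# Hypothetical two-unit floorplan, dimensions in meters.
SAMPLE = "icache\t0.004\t0.002\t0.0\t0.0\ndcache\t0.004\t0.002\t0.004\t0.0\n"
units = parse_floorplan(SAMPLE)
total_area = sum(u.width * u.height for u in units)
```

A line with the wrong number of fields raises a `ValueError` from the tuple unpacking, which is usually preferable to silently accepting a malformed floorplan.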
Modeled after an Alpha 21364 Current floorplan
Soon to be features • Grid model – RC network per grid cell instead of a block • Temperature models for wires, pads and interface material between heat sink and spreader • Better (more user friendly) floorplan specification • Automatic floorplan generation using classical floorplanning algorithms
Better floorplan specification • Floorplans of current microprocessors are structurally similar • Floorplans similar to MIPS R10K, Pentium, and the Alpha 21264 • Pipeline order corresponds to floorplan adjacency
Better floorplan specification • Sample specification (with % areas) that takes advantage of pipeline order
Automatic floorplan for architects • Why develop an architectural floorplanning tool? • Thermal modeling requires adjacency information. • Wire delays make performance depend on the floorplan. • Goal • Derive a realistic floorplan using only microarchitectural information • Trade off thermal efficiency against latency • Simulated annealing based floorplan optimization for thermal, delay and combined metrics • Current work. Results will be available soon
Sensors Caveat emptor: We are not well versed in sensor design; the following is a digest of information we have been able to collect from industry sources and the research literature.
Desirable Sensor Characteristics • Small area • Low Power • High Accuracy + Linearity • Easy access and low access time • Fast response time (slew rate) • Easy calibration • Low sensitivity to process and supply noise
PowerPC G3 • (Sanchez et al., Symp. on VLSI Circuits ‘97, COMPCON ‘97) • 0.35 μm, 2.5 V • Area: 0.2 mm² • Power: 10 mW • Precision: ±4.5° • Offset: 12° at process corners • Linearity: < ±4° • Based on thermal diodes and current mirrors
Types of Sensors (in approximate order of increasing ease to build) • Thermocouples – voltage output • Junction between wires of different materials; voltage at terminals is ∝ Tref – Tjunction • Often used for external measurements • Thermal diodes – voltage output • Biased p-n junction; voltage drop for a known current is temperature-dependent • Biased resistors (thermistors) – voltage output • Voltage drop for a known current is temperature-dependent • You can also think of this as varying R • Example: 1 kΩ metal “snake” • BiCMOS, CMOS – voltage or current output • Rely on reference voltage or current generated from a reference band-gap circuit; current-based designs often depend on temperature dependence of threshold
Thermal Sensors in PowerPC • On-chip temperature sensor (junction temperature) • Based on differential voltage change across 2 diodes of different sizes • Implemented in PowerPC G3/G4 processors • Instruction Cache Throttling used to dynamically lower junction temperature
Typical Sensor Configuration PTAT – Proportional to Absolute Temperature
Absolute Sensor 1 Syal, Lee, Ivanov, Altet, Online Testing Workshop, 2001 Schematics of Delta Vgs Current Reference (left) Generator and Delay Cell (right)
Sensors: Problem Issues • Poor control of CMOS transistor parameters • Noisy environment • Cross talk • Ground noise • Power supply noise • These can be reduced by making the sensor larger • This increases power dissipation • But we may want many sensors
“Reasonable” Values • Based on conversations with engineers at Sun, Intel, and HP (Alpha) • Linearity: not a problem for range of temperatures of interest • Slew rate: < 1 μs • This is the time it takes for the physical sensing process (e.g., current) to reach equilibrium • Sensor bandwidth: << 1 MHz, probably 100-200 kHz • This is the sampling rate; 100 kHz = 10 μs • Limited by slew rate but also A/D • Consider digitization using a counter
“Reasonable” Values: Precision • Mid 1980s: < 0.1° was possible • Precision • ± 3° is very reasonable • ± 2° is reasonable • ± 1° is feasible but expensive • < ± 1° is really hard • The limited precision of the G3 sensor seems to have been a design choice involving the digitization • Power: 10s of mW
Calibration • Accuracy vs. Precision • Analogous to mean vs. stdev • Calibration deals with accuracy • The main issue is to reduce inter-die variations in offset • Typically requires per-part testing and configuration • Basic idea: measure offset, store it, then subtract this from dynamic measurements
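The basic calibration idea (measure offset, store it, subtract it) can be sketched as follows. The function names, the averaging of several raw readings, and the numbers are assumptions for illustration; a real part would store the offset in per-die configuration at test time.

```python
def measure_offset(raw_readings, reference_temp):
    """Per-part offset: mean raw reading minus the known reference temperature."""
    return sum(raw_readings) / len(raw_readings) - reference_temp


def calibrated(raw_reading, stored_offset):
    """Correct a runtime reading with the offset stored at test time."""
    return raw_reading - stored_offset


# Hypothetical part held at a known 85 degrees during testing.
offset = measure_offset([87.0, 88.0, 89.0], reference_temp=85.0)  # reads 3 degrees high
runtime_temp = calibrated(93.0, offset)                           # corrected to 90 degrees
```

Subtracting a stored offset corrects accuracy (the mean error) only; the spread between individual readings, which is the precision, is unchanged.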
Dynamic Offset Cancelation • Rich area of research • Build circuit to continuously, dynamically detect offset and cancel it • Typically uses an op-amp • Has the advantage that it adapts to changing offsets • Has the disadvantage of more complex circuitry
Role of Precision • Suppose: • Junction temperature is J • Max variation in sensor is S • Thermal emergency is T • T = J – S • Spatial gradients • If sensors cannot be located exactly at hotspots, measured temperature may be G° lower than true hotspot • T = J – S – G
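A small worked example of the threshold relation above, with hypothetical numbers: the trigger temperature must be lowered by both the sensor variation S and the spatial gradient G so that the true hotspot never exceeds J.

```python
def trigger_threshold(junction_limit, sensor_variation, spatial_gradient=0.0):
    """T = J - S - G: the sensor reading at which to declare a thermal emergency."""
    return junction_limit - sensor_variation - spatial_gradient


# Hypothetical values in degrees: J = 85, S = 3, G = 2.
sensor_only = trigger_threshold(85.0, 3.0)         # 82.0
with_gradient = trigger_threshold(85.0, 3.0, 2.0)  # 80.0
```

Every degree of sensor imprecision or placement error is a degree of thermal headroom that cannot be used.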
Rate of change of temperature • Our FEM simulations suggest maximum 0.1° in about 25-100 μs • This is for power density < 1 W/mm², die thickness between 0.2 and 0.7 mm, and contemporary packaging • This means slew rate is not an issue • But sampling rate is!
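To see why the sampling rate matters, the sketch below converts the quoted rate of change into the worst-case temperature drift between two consecutive samples. The function name is hypothetical, and the sampling periods used (10 and 20 μs) are the ones quoted elsewhere in these slides.

```python
def drift_between_samples(delta_temp, rise_time_s, sampling_period_s):
    """Largest temperature change (degrees) that can occur between two samples."""
    slope = delta_temp / rise_time_s  # degrees per second
    return slope * sampling_period_s


# Fastest heating (0.1 degree in 25 us) with the slowest sampling (20 us).
worst_case = drift_between_samples(0.1, 25e-6, 20e-6)   # about 0.08 degrees
# Slowest heating (0.1 degree in 100 us) with the fastest sampling (10 us).
best_case = drift_between_samples(0.1, 100e-6, 10e-6)   # about 0.01 degrees
```

At these sampling periods the unobserved drift stays well below the 1-3 degree sensor precision discussed in the slides, whereas a much slower sampling rate would let the temperature move by a comparable amount between readings.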
Sensors Summary • Sensor precision cannot be ignored • Reducing operating threshold by 1-2 degrees will affect performance • Precision of 1° is conceivable but expensive • Maybe reasonable for a single sensor or a few • Precision of 2-3° is reasonable even for a moderate number of sensors • Power and area are probably negligible from the architecture standpoint • Sampling period <= 10-20 μs
HotSpot Summary • HotSpot is a simple, accurate, and fast architecture-level thermal model for microprocessors • Over 90 downloads so far • Ongoing active development – architecture-level floorplanning will be available soon • Download site • http://lava.cs.virginia.edu/HotSpot • Mailing list • www.cs.virginia.edu/mailman/listinfo/hotspot
Temperature-aware computing: Optimize performance subject to a thermal constraint