Automated Microprocessor Stressmark Generation

Ajay M. Joshi*, Lieven Eeckhout**, Lizy K. John*, Ciji Isen* • *The University of Texas at Austin • **Ghent University, Belgium • HPCA 2008, Feb 19, Salt Lake City, UT

Energy, power, power density, temperature, voltage variation, … • First-class design constraints • Embedded processors • High-performance processors • Understanding and analysis of primary importance • Average: typical • Maximum: worst-case

Why care about worst-case? • Processor must operate properly under extreme conditions • Examples • Max power → power supply, DPM • Max temperature → thermal package, DTM • Max dI/dt → power delivery • Localized max power → hot spots → circuit failure, timing errors, etc. • Max temperature differentials → sensor placement

How to characterize worst-case? • Stressmarks • Hand-coded synthetic stress codes • Examples • Max power: Alpha’s Toast • Max dI/dt: Alpha’s Thumper • Limitations • Time-consuming to develop • Requires intimate understanding of the system • Tied to a specific processor • Difficult to do in early design stages

A possible solution • Automatic stressmark generation • In two steps • BenchMaker • Generate synthetic benchmark from abstract workload model • StressMaker • Explore workload space by ‘turning knobs’ using BenchMaker and search for stressmarks

Outline • BenchMaker • Description • Evaluation • StressMaker • Description • Evaluation through case studies • Max-power, max single-cycle power, dI/dt • Related work • Conclusion and future work

BenchMaker • Figure: abstract workload model (instruction mix, ILP, I & D footprint, D stream strides, branch transition, BB size) → benchmark synthesizer → synthetic benchmark → hardware / simulator

Instruction mix (abstract workload model) • Fraction short int • Fraction long int • Fraction short fp • Fraction long fp • Fraction int loads • Fraction int stores • Fraction fp loads • Fraction fp stores

ILP (abstract workload model) • Probability for an inter-operation dependency distance of 1, 2, 3-4, 5-6, 7-8, 9-16, 17-32, or > 32

I & D stream behavior (abstract workload model) • Number of unique I & D addresses • Fraction of memory operations with a local stride (at 32-byte block level) of 0, 1, 2, …, 8, or greater than 8

Branch behavior (abstract workload model) • Probability for a transition rate of 0%-10%, 10%-20%, etc. • Avg and stdev of the basic block size distribution

Abstract workload model • Only 40 characteristics • Explicit goal • In contrast to prior work • Microarchitecture-independent
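The 40 characteristics can be tallied from the preceding slides: 8 instruction-mix fractions, 8 dependency-distance bins, 2 footprint counts, 10 stride bins, 10 branch-transition bins, and 2 basic-block statistics. A minimal Python sketch of such a model follows, assuming the transition-rate bins run in 10% steps up to 100%; the class name `AbstractWorkloadModel`, its field names, and its default values are hypothetical and not taken from BenchMaker.

```python
from dataclasses import dataclass, field
from typing import List

# Bin labels as listed on the slides (transition bins assume 10% steps to 100%).
DEP_DISTANCE_BINS = ["1", "2", "3-4", "5-6", "7-8", "9-16", "17-32", ">32"]
STRIDE_BINS = [str(s) for s in range(9)] + [">8"]  # local stride, 32-byte blocks
TRANSITION_BINS = [f"{10 * i}%-{10 * (i + 1)}%" for i in range(10)]


@dataclass
class AbstractWorkloadModel:
    """Hypothetical container for the microarchitecture-independent characteristics."""
    # Fractions of the 8 instruction classes (short/long int, short/long fp,
    # int loads/stores, fp loads/stores); should sum to 1.
    instruction_mix: List[float] = field(default_factory=lambda: [1 / 8] * 8)
    # Probability of each dependency-distance bin (ILP).
    dependency_distance: List[float] = field(default_factory=lambda: [1 / 8] * 8)
    # Number of unique instruction and data addresses (placeholder defaults).
    i_footprint: int = 1024
    d_footprint: int = 1024
    # Fraction of memory operations in each local-stride bin.
    stride: List[float] = field(default_factory=lambda: [1 / 10] * 10)
    # Probability of each branch-transition-rate bin.
    branch_transition: List[float] = field(default_factory=lambda: [1 / 10] * 10)
    # Basic-block size distribution (placeholder defaults).
    bb_size_mean: float = 6.0
    bb_size_stdev: float = 2.0

    def as_vector(self) -> List[float]:
        """Flatten the model into the knob vector a search algorithm can turn."""
        return (list(self.instruction_mix) + list(self.dependency_distance)
                + [float(self.i_footprint), float(self.d_footprint)]
                + list(self.stride) + list(self.branch_transition)
                + [self.bb_size_mean, self.bb_size_stdev])


if __name__ == "__main__":
    print(len(AbstractWorkloadModel().as_vector()))  # prints 40
```

Flattening everything into one vector is a design choice for the sketch: it gives a search procedure a single list of knobs to perturb.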

Synthetic benchmark generator • Program spine • Instruction types • Inter-operation dependencies • Stride assignment • Branch transition • Register assignment • Code generation • Figure: example program spine (add sub br add ld mul br add ld sub ld st br)
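The first generator steps can be sketched as follows: basic-block sizes are drawn from the basic-block size distribution, each slot is filled with an instruction type sampled from the instruction mix, and each instruction is tagged with a dependency distance sampled from the ILP distribution. This is a hypothetical Python sketch, assuming a normal distribution for block sizes; `generate_spine` is an illustrative name, and the stride, branch-transition, register-assignment, and code-generation steps are omitted.

```python
import random
from typing import Dict, List, Tuple


def generate_spine(mix: Dict[str, float], dep_dist: Dict[int, float],
                   bb_mean: float, bb_stdev: float, n_blocks: int,
                   seed: int = 0) -> List[Tuple[str, int]]:
    """Return a list of (instruction type, dependency distance) pairs.

    Hypothetical sketch: a dependency distance d means the instruction consumes
    the result of the instruction d positions earlier; branches carry no
    dependency here (distance 0).
    """
    rng = random.Random(seed)
    kinds, kind_weights = zip(*mix.items())
    dists, dist_weights = zip(*dep_dist.items())
    spine: List[Tuple[str, int]] = []
    for _ in range(n_blocks):
        # Block size drawn from a normal distribution (an assumption), clamped
        # so every block holds at least its terminating branch.
        size = max(1, round(rng.gauss(bb_mean, bb_stdev)))
        for _ in range(size - 1):
            op = rng.choices(kinds, kind_weights)[0]      # sample from the mix
            dist = rng.choices(dists, dist_weights)[0]    # sample from the ILP bins
            spine.append((op, dist))
        spine.append(("br", 0))  # each basic block ends in a branch
    return spine


if __name__ == "__main__":
    spine = generate_spine(
        mix={"add": 0.4, "sub": 0.2, "ld": 0.3, "st": 0.1},
        dep_dist={1: 0.5, 2: 0.3, 4: 0.2},
        bb_mean=4.0, bb_stdev=1.0, n_blocks=3)
    print(" ".join(op for op, _ in spine))
```

Seeding the generator makes the spine reproducible, which matters if the same abstract workload configuration is evaluated more than once during a search.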

Synthetic benchmark generator • Input: abstract workload model • Output: synthetic benchmark • C program with embedded assembly code • Benefit: synthetic benchmark converges after 10 million dynamic instructions

Experimental setup • sim-alpha validated Alpha 21264 simulator • Wattch for power modeling • HotSpot for thermal modeling • SPEC CPU2000 • 100M-instruction simulation points • Commercial workloads • SPECjbb2005, DBT2, DBMS

Synthetic clone benchmark preserves characteristics • Figure: IPC and EPI of each original benchmark vs. its synthetic clone for vpr, gcc, mcf, gzip, dbt2, twolf, bzip2, crafty, dbms, vortex, perlbmk, and jbb2005

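StressMaker • Figure: abstract workload space exploration picks an abstract workload configuration → BenchMaker generates a synthetic benchmark → the benchmark runs on a microprocessor model and is scored by an objective function (e.g., max power) → the loop repeats, ending with a stressmark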

Workload space exploration • Huge space • Heuristic search using a genetic algorithm • Bio-inspired algorithm • Reduces the likelihood of getting stuck in local optima • Iterative algorithm • Start from randomly generated solutions • Probabilistically retain solutions with the highest objective function values • Generate new solutions using crossover & mutation • End result: stressmark
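The iterative loop above can be sketched as a generic genetic algorithm over a vector of workload knobs normalized to [0, 1]. In this hypothetical Python sketch, `genetic_search` and its parameters are illustrative; selection is fitness-proportional (roulette wheel), crossover is single-point, and the best solution seen so far is always kept. In StressMaker the fitness would come from generating a synthetic benchmark with BenchMaker and simulating it; here any Python callable stands in for that step.

```python
import random
from typing import Callable, List

Genome = List[float]  # one value in [0, 1] per workload knob


def genetic_search(fitness: Callable[[Genome], float], n_knobs: int,
                   pop_size: int = 20, generations: int = 50,
                   mutation_rate: float = 0.1, seed: int = 0) -> Genome:
    """Hypothetical GA sketch; n_knobs must be at least 2 for crossover."""
    rng = random.Random(seed)
    # Start from randomly generated solutions.
    population = [[rng.random() for _ in range(n_knobs)] for _ in range(pop_size)]
    best = max(population, key=fitness)
    for _ in range(generations):
        scores = [fitness(g) for g in population]
        # Roulette-wheel selection: higher-scoring solutions are probabilistically
        # more likely to be retained as parents (weights shifted to stay positive).
        low = min(scores)
        weights = [s - low + 1e-9 for s in scores]
        parents = rng.choices(population, weights, k=pop_size)
        children: List[Genome] = []
        for i in range(0, pop_size, 2):
            a, b = parents[i], parents[(i + 1) % pop_size]
            cut = rng.randrange(1, n_knobs)  # single-point crossover
            for child in (a[:cut] + b[cut:], b[:cut] + a[cut:]):
                # Mutation: occasionally replace a knob with a fresh random value.
                children.append([rng.random() if rng.random() < mutation_rate else x
                                 for x in child])
        population = children[:pop_size]
        best = max(population + [best], key=fitness)  # never lose the best so far
    return best


if __name__ == "__main__":
    # Toy objective standing in for simulated power: peaks when every knob is 0.7.
    def toy_power(g: Genome) -> float:
        return -sum((x - 0.7) ** 2 for x in g)

    result = genetic_search(toy_power, n_knobs=5, seed=1)
    print([round(x, 2) for x in result], round(toy_power(result), 4))
```

Keeping the best-so-far solution (elitism) guarantees the returned stressmark candidate is never worse than anything already evaluated, at the cost of slightly reduced diversity.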

Max-power stressmark • Figure: per-component power (Watts) of the StressMaker stressmark vs. SPEC CPU / commercial benchmarks for lsq, alu, fetch, clock, icache, issue, bpred, regfile, dcache, window, rename, dispatch, dcache2, and resultbus; bar labels name the benchmarks art, mesa, SPECjbb2005, perlbmk, gzip, dbt2, eon, and mcf • 8-wide OOO processor; 81.5 Watts in total • Assuming Wattch (0.18 µm, 1.2 GHz, aggressive clock gating)

Max-power stressmark characteristics • Keep functional units busy • Uniform mix of instruction types • Keep issue logic busy • High ILP • No pipeline flushes • High branch predictability • Keep caches busy • Good locality → Similar to hand-crafted stressmarks [Gowan et al., DAC’98] [Vishwanath, Intel Tech Journal, 2000]

Evaluation of the genetic algorithm • Speed • Three orders of magnitude faster than exhaustive search • Effectiveness • Max-power stressmark through StressMaker achieves 99% of max-power stressmark through exhaustive search: 48 Watts for 4-wide OOO processor

Max single-cycle power • Estimate max instantaneous (single-cycle) current drawn from the power supply • StressMaker’s stressmark: 72W • Its average power consumption: 32W • [4-wide OOO processor] • Maximum power assuming all units are 100% active: 85W • StressMaker gets 85% of theoretical maximum
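The distinction between single-cycle and average power, and the 85% figure, can be reproduced from a per-cycle power trace. In this minimal Python sketch, `power_summary` is a hypothetical helper and the 3-cycle trace is made up solely to match the slide's 72 W peak and 32 W average; only the 85 W theoretical maximum comes from the slide.

```python
from typing import Sequence, Tuple


def power_summary(per_cycle_watts: Sequence[float],
                  theoretical_max_watts: float) -> Tuple[float, float, float]:
    """Return (max single-cycle power, average power, fraction of theoretical max)."""
    peak = max(per_cycle_watts)  # worst single cycle
    average = sum(per_cycle_watts) / len(per_cycle_watts)
    return peak, average, peak / theoretical_max_watts


if __name__ == "__main__":
    # Hypothetical trace chosen to reproduce the slide's 72 W peak / 32 W average.
    peak, average, fraction = power_summary([72.0, 16.0, 8.0], theoretical_max_watts=85.0)
    print(f"peak={peak} W, average={average} W, {fraction:.0%} of theoretical max")
    # prints: peak=72.0 W, average=32.0 W, 85% of theoretical max
```

The gap between the two numbers is why a max single-cycle stressmark is a separate search objective: a code can spike to 72 W in one cycle while averaging well under half of that.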

dI/dt stressmark • Current swings cause ripples in supply voltage • dI/dt stressmark alternates between high and low power consumption [Joseph et al., HPCA’03] [Alpha’s Thumper] • StressMaker • Generate N-insn max-power stressmark: 72W • Generate N-insn min-power stressmark: 16W • Concatenate both • Cyclic behavior with period 2N
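The dI/dt construction amounts to concatenating two equal-length blocks and repeating the result. A minimal Python sketch follows, treating instructions as placeholder strings; `didt_stressmark`, `power_swing`, and the `mul`/`nop` blocks are hypothetical, and in practice both halves would be StressMaker outputs. The 56 W swing is simply the difference between the slide's 72 W and 16 W figures; the actual supply-current swing also depends on the supply voltage (I = P / V) and on how quickly the transition occurs.

```python
from typing import List, Sequence


def didt_stressmark(max_block: Sequence[str], min_block: Sequence[str],
                    repetitions: int) -> List[str]:
    """Concatenate an N-insn max-power block and an N-insn min-power block, then repeat."""
    if len(max_block) != len(min_block):
        raise ValueError("both blocks must contain the same number N of instructions")
    period = list(max_block) + list(min_block)  # one period spans 2N instructions
    return period * repetitions


def power_swing(high_watts: float, low_watts: float) -> float:
    """Step in power between the two halves of each period."""
    return high_watts - low_watts


if __name__ == "__main__":
    # Hypothetical placeholder blocks with N = 4.
    code = didt_stressmark(["mul"] * 4, ["nop"] * 4, repetitions=2)
    print(code)
    print(power_swing(72.0, 16.0))  # prints 56.0 (slide values: 72 W and 16 W)
```

Choosing N sets the period of the power oscillation, so in a real flow N would be tuned so the oscillation lands near the power-delivery network's resonant frequency; that tuning is not shown here.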

Thermal stressmarks • Thermal hotspots • Max component power • Thermal differentials • Thermal sensor placement [Lee et al., ICCD’05] • Examples • L2 vs. I-fetch: 44.6 °C difference • No stress on L2, high ILP, high branch predictability • L2 vs. register remap: 48.4 °C difference • Lots of L2 accesses: stress L2 and minimal stress on register remap

Why automate the process? • Figure: power (Watts) of the 2-wide, 4-wide, and 8-wide OOO max-power stressmarks, each run on the 2-wide, 4-wide, and 8-wide OOO processors • A stressmark is processor-specific

Related work • VLSI test vectors • at circuit level, not at (micro)architectural level • Hand-crafted stressmarks • Current practice • Max-power, dI/dt, thermal hotspots, temp differentials • Performance model validation • Microbenchmarks • Benchmark synthesis • Statistical simulation

Conclusion: two contributions • BenchMaker • Abstract workload model • Generates proxies for real-life benchmarks • High accuracy • StressMaker • Automated stressmark generation • Case studies: max-power, max single-cycle power, dI/dt, thermal hotspots, etc.

Future work • Compare StressMaker against hand-crafted stressmarks • Fine-tune abstract workload model • Bit toggling data values and instruction opcodes • Interactions between threads and programs • Multi-threaded and multi-core processors

Thank you. Questions?
