Automated Microprocessor Stressmark Generation

Ajay M. Joshi*, Lieven Eeckhout**, Lizy K. John*, Ciji Isen*. *The University of Texas at Austin, **Ghent University, Belgium. HPCA 2008, Feb 19, Salt Lake City, UT.

Presentation Transcript


  1. Automated Microprocessor Stressmark Generation • Ajay M. Joshi*, Lieven Eeckhout**, Lizy K. John*, Ciji Isen* • *The University of Texas at Austin • **Ghent University, Belgium • HPCA 2008, Feb 19, Salt Lake City, UT

  2. Energy, power, power density, temperature, voltage variation, … • First-class design constraints • Embedded processors • High-performance processors • Understanding and analysis of primary importance • Average: typical • Maximum: worst-case

  3. Why care about worst-case? • Processor must operate properly under extreme conditions • Examples • Max power → power supply, DPM • Max temperature → thermal package, DTM • Max dI/dt → power delivery • Localized max power → hot spots → circuit failure, timing errors, etc. • Max temperature differentials → sensor placement

  4. How to characterize worst-case? • Stressmarks • Hand-coded synthetic stress codes • Examples • Max power: Alpha’s Toast • Max dI/dt: Alpha’s Thumper • Limitations • Time-consuming to develop • Requires intimate understanding of system • Tied to a specific processor • Difficult to do in early design stages

  5. A possible solution • Automatic stressmark generation • In two steps • BenchMaker • Generate synthetic benchmark from abstract workload model • StressMaker • Explore workload space by ‘turning knobs’ using BenchMaker and search for stressmarks

  6. Outline • BenchMaker • Description • Evaluation • StressMaker • Description • Evaluation through case studies • Max-power, max single-cycle power, dI/dt • Related work • Conclusion and future work

  7. BenchMaker • [Diagram: an abstract workload model (instruction mix, ILP, I & D footprint, D stream strides, branch transition, BB size) feeds a benchmark synthesizer, which produces a synthetic benchmark that runs on hardware or a simulator]

  8. Instruction mix • Fraction short int • Fraction long int • Fraction short fp • Fraction long fp • Fraction int loads • Fraction int stores • Fraction fp loads • Fraction fp stores

  9. ILP • Probability for an inter-operation dependency distance of 1, 2, 3-4, 5-6, 7-8, 9-16, 17-32, or > 32

  10. I & D stream behavior • No. unique I & D addresses • Fraction of memory operations with a local stride (at 32-byte block level) of 0, 1, 2, …, 8, or greater than 8

  11. Branch behavior • Probability for a transition rate of 0%-10%, 10%-20%, etc. • Avg and stdev of the basic block size distribution

  12. Abstract workload model • Only 40 characteristics • Explicit goal • In contrast to prior work • Microarchitecture-independent
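The figure of 40 characteristics can be checked against slides 8 to 11. The sketch below is one possible tally, not a list given by the authors: it assumes that the I and D footprints count as one value each and that the branch transition rate uses ten 10%-wide buckets. The names `MODEL_GROUPS` and `total_characteristics` are hypothetical.

```python
# Hypothetical breakdown of the abstract workload model into 40 numbers.
# Group names follow the slides; the per-group counts are this sketch's
# reading of them and are not stated explicitly in the presentation.
MODEL_GROUPS = {
    "instruction_mix": 8,          # short/long int, short/long fp, int/fp loads, int/fp stores
    "ilp_dependency_distance": 8,  # 1, 2, 3-4, 5-6, 7-8, 9-16, 17-32, >32
    "footprint": 2,                # assumed: unique I addresses, unique D addresses
    "d_stream_strides": 10,        # local stride 0..8, plus "greater than 8"
    "branch_transition_rate": 10,  # assumed: ten buckets (0%-10%, ..., 90%-100%)
    "basic_block_size": 2,         # average and standard deviation
}


def total_characteristics(groups):
    """Sum the number of values contributed by each group."""
    return sum(groups.values())


print(total_characteristics(MODEL_GROUPS))  # prints 40
```

Under these assumptions the groups add up to exactly the 40 characteristics named on slide 12.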

  13. Synthetic benchmark generator • Program spine • Instruction types • Inter-operation dependencies • Stride assignment • Branch transition • Register assignment • Code generation • [Diagram: example instruction sequence: add, sub, br; add, ld, mul, br; add, ld, sub, ld, st, br]
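The first three generator steps (program spine, instruction types, inter-operation dependencies) can be sketched as follows. This is a minimal illustration, not the authors' generator: the function names, the normal distribution for basic block sizes, and the toy instruction mix are all assumptions, and stride assignment, branch transition, register assignment, and code generation are left out.

```python
import random


def sample_weighted(rng, weights):
    """Pick one key from a {key: probability} dictionary."""
    keys = list(weights)
    return rng.choices(keys, weights=[weights[k] for k in keys], k=1)[0]


def build_spine(instruction_mix, dep_distance, bb_mean, bb_stdev, n_blocks, seed=0):
    """Build a list of basic blocks; each instruction is (opcode, dependency distance).

    Assumption: basic block sizes are drawn from a normal distribution
    described by the model's average and standard deviation.
    """
    rng = random.Random(seed)
    blocks = []
    for _ in range(n_blocks):
        size = max(1, round(rng.gauss(bb_mean, bb_stdev)))
        block = []
        for _ in range(size - 1):
            opcode = sample_weighted(rng, instruction_mix)
            distance = sample_weighted(rng, dep_distance)
            block.append((opcode, distance))
        block.append(("br", None))  # every basic block ends in a branch
        blocks.append(block)
    return blocks


# Toy model values, chosen only for illustration.
mix = {"add": 0.4, "mul": 0.1, "ld": 0.3, "st": 0.2}
deps = {1: 0.5, 2: 0.3, 4: 0.2}
spine = build_spine(mix, deps, bb_mean=5, bb_stdev=2, n_blocks=3)
for block in spine:
    print(" ".join(opcode for opcode, _ in block))
```

Each printed line is one basic block ending in `br`, similar in shape to the instruction sequence on the slide.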

  14. Synthetic benchmark generator • Input: abstract workload model • Output: synthetic benchmark • C program with embedded assembly code • Benefit: synthetic benchmark converges after 10 million dynamic instructions

  15. Experimental setup • sim-alpha: validated Alpha 21264 simulator • Wattch for power modeling • HotSpot for thermal modeling • SPEC CPU2000 • 100M simulation points • Commercial workloads • SPECjbb2005, DBT2, DBMS

  16. Synthetic clone benchmark preserves characteristics • [Figure: two bar charts comparing each original benchmark with its synthetic clone, one for IPC and one for EPI, across vpr, gcc, mcf, gzip, dbt2, twolf, bzip2, crafty, dbms, vortex, perlbmk, and jbb2005]

  17. Outline • BenchMaker • Description • Evaluation • StressMaker • Description • Evaluation using case studies • Max-power, max single-cycle power, dI/dt • Related work • Conclusion and future work

  18. StressMaker • [Diagram: abstract workload space exploration selects an abstract workload configuration; BenchMaker turns it into a synthetic benchmark; a microprocessor model evaluates the objective function (e.g., max power); the end result is a stressmark]

  19. Workload space exploration • Huge space • Heuristic search using genetic algorithm • Bio-inspired algorithm • Reduces likelihood of getting stuck in local optima • Iterative algorithm • Start from randomly generated solutions • Probabilistically retain solutions with highest objective function value • Generate new solutions using crossover & mutation • End result: stressmark
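The iterative loop on this slide can be sketched as a small genetic algorithm. This is a generic sketch under stated assumptions, not StressMaker's implementation: each solution is a list of knob values in [0, 1], selection uses two-way tournaments, the best solution is always carried over (elitism), and `toy_power` is a made-up stand-in for running a synthetic benchmark on a power simulator.

```python
import random


def pick_parent(rng, population, objective):
    """Tournament selection: the better of two random solutions survives."""
    a, b = rng.sample(population, 2)
    return a if objective(a) >= objective(b) else b


def genetic_search(objective, n_knobs, pop_size=20, generations=50,
                   mutation_rate=0.1, seed=0):
    """Search for knob settings that maximize `objective` (needs n_knobs >= 2)."""
    rng = random.Random(seed)
    # Start from randomly generated solutions.
    population = [[rng.random() for _ in range(n_knobs)] for _ in range(pop_size)]
    best = max(population, key=objective)
    for _ in range(generations):
        children = [best[:]]  # elitism: the best solution so far is never lost
        while len(children) < pop_size:
            p1 = pick_parent(rng, population, objective)
            p2 = pick_parent(rng, population, objective)
            cut = rng.randrange(1, n_knobs)  # single-point crossover
            child = p1[:cut] + p2[cut:]
            # Mutation: occasionally replace a knob with a fresh random value.
            child = [rng.random() if rng.random() < mutation_rate else gene
                     for gene in child]
            children.append(child)
        population = children
        best = max(population, key=objective)
    return best


def toy_power(knobs):
    """Hypothetical objective: peaks when every knob equals 0.7."""
    return -sum((k - 0.7) ** 2 for k in knobs)


stressmark_knobs = genetic_search(toy_power, n_knobs=6)
print([round(k, 2) for k in stressmark_knobs])
```

In the setting the slides describe, the objective would instead be the power (or another stress metric) reported by the microprocessor model for the benchmark that BenchMaker generates from the knob values.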

  20. Max-power stressmark • [Figure: per-component power (Watts) of the StressMaker stressmark versus SPEC CPU / commercial benchmarks (bars labeled art, mesa, SPECjbb2005, perlbmk, gzip, dbt2, eon, mcf) for the components lsq, alu, fetch, clock, icache, issue, bpred, regfile, dcache, window, rename, dispatch, dcache2, and resultbus] • 8-wide OOO processor; 81.5 Watts in total • assuming Wattch (0.18um, 1.2GHz, aggressive clock gating)

  21. Max-power stressmark characteristics • Keep functional units busy • Uniform mix of instruction types • Keep issue logic busy • High ILP • No pipeline flushes • High branch predictability • Keep caches busy • Good locality → similar to hand-crafted stressmarks [Gowan et al., DAC’98] [Vishwanath, Intel Tech Journal, 2000]

  22. Evaluation of genetic algorithm • Speed • Three orders of magnitude faster than exhaustive search • Effectiveness • Max-power stressmark through StressMaker achieves 99% of max-power stressmark through exhaustive search: 48 Watts for 4-wide OOO processor

  23. Max single-cycle power • Estimate max instantaneous (single-cycle) current drawn from the power supply • StressMaker’s stressmark: 72W • Its average power consumption: 32W • [4-wide OOO processor] • Maximum power assuming all units are 100% active: 85W • StressMaker gets 85% of theoretical maximum

  24. dI/dt stressmark • Current swings cause ripples in supply voltage • dI/dt stressmark alternates between high and low power consumption [Joseph et al., HPCA’03] [Alpha’s Thumper] • StressMaker • Generate N-instruction max-power stressmark: 72W • Generate N-instruction min-power stressmark: 16W • Concatenate both • Cyclic behavior with period 2N
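The concatenation step can be pictured with an idealized power trace. The sketch below assumes that each phase draws constant power and that one instruction corresponds to one time step, which a real processor would not guarantee; the function names are hypothetical and the 72W and 16W values are taken from the slide.

```python
def didt_power_trace(n, periods, high_watts=72.0, low_watts=16.0):
    """Idealized trace: n steps at high power, then n steps at low power, repeated."""
    one_period = [high_watts] * n + [low_watts] * n
    return one_period * periods


def largest_step(trace):
    """Largest power change between two consecutive time steps."""
    return max(abs(b - a) for a, b in zip(trace, trace[1:]))


trace = didt_power_trace(n=4, periods=2)
print(len(trace))           # prints 16
print(largest_step(trace))  # prints 56.0
```

The trace repeats every 2N steps, and the 56W swing between the two phases is what stresses the power delivery network in this simplified picture.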

  25. Thermal stressmarks • Thermal hotspots • Max component power • Thermal differentials • Thermal sensor placement [Lee et al., ICCD’05] • Examples • L2 vs. I-fetch: 44.6°C difference • No stress on L2, high ILP, high branch predictability • L2 vs. register remap: 48.4°C difference • Lots of L2 accesses: stress L2 and minimal stress on register remap

  26. Why automate the process? • [Figure: power (Watts) of the 2-wide, 4-wide, and 8-wide OOO max-power stressmarks, each shown for the 2-wide, 4-wide, and 8-wide OOO processors] • Stressmark is processor-specific

  27. Outline • BenchMaker • Description • Evaluation • StressMaker • Description • Evaluation using case studies • Max-power, max single-cycle power, dI/dt • Related work • Conclusion and future work

  28. Related work • VLSI test vectors • at circuit level, not at (micro)architectural level • Hand-crafted stressmarks • Current practice • Max-power, dI/dt, thermal hotspots, temp differentials • Performance model validation • Microbenchmarks • Benchmark synthesis • Statistical simulation

  29. Conclusion: two contributions • BenchMaker • Abstract workload model • Generates proxies for real-life benchmarks • High accuracy • StressMaker • Automated stressmark generation • Case studies: max-power, max single-cycle power, dI/dt, thermal hotspots, etc.

  30. Future work • Compare StressMaker against hand-crafted stressmarks • Fine-tune abstract workload model • Bit toggling data values and instruction opcodes • Interactions between threads and programs • Multi-threaded and multi-core processors

  31. Thank you. Questions?
