Explore alternative cores, parameter settings, and applications in a system-on-a-chip (SOC). Gate/RT-level simulation is too slow for this exploration, so a system-level method is needed to estimate the power of peripheral cores. The method has three steps: instruction-based system-level model creation, low-level per-instruction power evaluation, and back annotation of the system model. Power modes and a power-mode transition function are required for accurate power estimates. Results show the method's accuracy and speed, and the importance of selecting the right power modes. Future work includes trace-simulator-based and trace-analysis-based approaches for further speedup.
Instruction-based System-level Power Evaluation of System-on-a-chip Peripheral Cores

Joerg Henkel, NEC C&C Research, Princeton, New Jersey
Tony Givargis, Frank Vahid*, Dept. of Computer Science & Engineering, University of California, Riverside
*also with the Center for Embedded Computer Systems, UC Irvine

This work was supported by the National Science Foundation under grant #CCR-9876006, and by a Design Automation Conference graduate scholarship.
System-on-a-chip (SOC)
[Figure: an SOC (microprocessor, cache, memory, bridge, peripherals) running Application1 or Application2, with alternative peripheral cores (Peripheral1, Peripheral2, Peripheral2_a, Peripheral2_b, …) drawn from a core database]
• Want to explore alternative cores, parameter settings, and applications
• Gate/RT-level simulation is too slow
SOC: System-level Model vs. Gate-level Model
[Figure: an SOC system-level model and an SOC gate-level model, each with application, microprocessor, cache, memory, bridge, and peripherals]
• Still need a system-level method for peripherals
• 3-step method

SOC System-level Power Estimation
• Microprocessor
  • Tiwari/Malik/Wolfe 94: instruction set simulator
  • Marculescu/Pedram 96: instruction trace reduction
• Plus cache, memory & bus
  • Simunic/Benini/DeMicheli 99: extended instruction simulator
  • Givargis/Vahid/Henkel 99: trace reductions
Core Provider’s Step 1: Instruction-based System-Level Model Creation
[Figure: core database (UART, JPEG decode, …); the UART system model exposes the instructions Reset(), Enable_tx(), Enable_rx(), Send(), Receive()]
• System simulation models are already commonly used, and are required by the VSIA standard
• Executes ~1000x faster than a gate-level model
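As a rough illustration of Step 1, the sketch below models the UART as a Python class whose methods mirror the five instructions named on the slide. Everything beyond those instruction names (the class name `UartModel`, the `buffer_size` parameter, the enable flags, the FIFO behavior, and the `line_input` testbench hook) is a hypothetical assumption, not the core provider's actual model. It captures behavior only; no timing or power yet.

```python
from collections import deque
from typing import Optional


class UartModel:
    """Hypothetical instruction-level functional model of a UART core.

    Each public method stands for one core "instruction" from the slide
    (Reset, Enable_tx, Enable_rx, Send, Receive). Behavior only: no timing,
    no power.
    """

    def __init__(self, buffer_size: int = 2) -> None:
        self.buffer_size = buffer_size  # core parameter, in bytes (assumed)
        self.reset()

    def reset(self) -> None:
        # Return to the power-on state: both directions disabled, FIFOs empty.
        self.tx_enabled = False
        self.rx_enabled = False
        self.tx_buffer = deque()
        self.rx_buffer = deque()

    def enable_tx(self) -> None:
        self.tx_enabled = True

    def enable_rx(self) -> None:
        self.rx_enabled = True

    def send(self, byte: int) -> bool:
        """Queue a byte for transmission; refuse it if disabled or the buffer is full."""
        if not self.tx_enabled or len(self.tx_buffer) >= self.buffer_size:
            return False
        self.tx_buffer.append(byte & 0xFF)
        return True

    def receive(self) -> Optional[int]:
        """Pop the oldest received byte, or return None if none is available."""
        if not self.rx_enabled or not self.rx_buffer:
            return None
        return self.rx_buffer.popleft()

    def line_input(self, byte: int) -> bool:
        """Hypothetical testbench hook (not a core instruction): a byte arrives on the line."""
        if not self.rx_enabled or len(self.rx_buffer) >= self.buffer_size:
            return False
        self.rx_buffer.append(byte & 0xFF)
        return True
```

Because the model works at the level of whole instructions rather than signals and clock edges, it is the kind of model that can run far faster than a gate-level netlist.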
Core Provider’s Step 2: Low-level Per-instruction Power Evaluation
UART per-instruction energy (µJ) by buffer size:

UART instruction   2 bytes   4 bytes   8 bytes   16 bytes
Reset                 13        13        14        14
Enable_tx             23        25        24        24
Enable_rx             18        19        19        19
Send                  76        77        89       115
Receive               44        49        55        64

• Measure the power of the gate/layout model, per instruction
• Use a unique testbench per instruction; this may take hours or days
• The low-level model distinguishes cores from other SOC modules and enables accurate power estimation
• Must account for core parameters (e.g., buffer size)
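The Step 2 table can be captured as a lookup keyed by instruction and by the core parameter (buffer size). The sketch below is a minimal Python illustration under that assumption: the dictionary and the function name `instruction_energy_uj` are hypothetical, the numbers are the table's, and the µJ unit is inferred from the `uJtot` annotation used in Step 3.

```python
# Per-instruction energy (µJ) of the UART core, by buffer size in bytes,
# transcribed from the Step 2 table. Names are illustrative only.
UART_ENERGY_UJ = {
    "Reset":     {2: 13, 4: 13, 8: 14, 16: 14},
    "Enable_tx": {2: 23, 4: 25, 8: 24, 16: 24},
    "Enable_rx": {2: 18, 4: 19, 8: 19, 16: 19},
    "Send":      {2: 76, 4: 77, 8: 89, 16: 115},
    "Receive":   {2: 44, 4: 49, 8: 55, 16: 64},
}


def instruction_energy_uj(instruction: str, buffer_size: int) -> int:
    """Look up the characterized energy of one instruction for a given buffer size."""
    try:
        per_size = UART_ENERGY_UJ[instruction]
    except KeyError:
        raise ValueError(f"unknown UART instruction: {instruction!r}") from None
    if buffer_size not in per_size:
        # Only the measured parameter values are known to be valid.
        raise ValueError(
            f"buffer size {buffer_size} was not characterized; "
            f"choose one of {sorted(per_size)}"
        )
    return per_size[buffer_size]
```

Raising an error for an uncharacterized buffer size, rather than interpolating, is a deliberate choice in this sketch: the table only vouches for the parameter values that were actually measured.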
Core Provider’s Step 3: Back Annotation of System Model
Energy per instruction (µJ): Reset 13, Enable_tx 23, Enable_rx 18, Send 76, Receive 44
[Figure: core database (UART, JPEG decode, …) holding the back-annotated UART model]
Reset()     … uJtot += 13
Enable_tx() … uJtot += 23
Enable_rx() … uJtot += 18
Send()      … uJtot += 76
Receive()   … uJtot += 44
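A minimal sketch of the back-annotated model, assuming a Python rendering of the slide's pseudocode. The class name `AnnotatedUart` is hypothetical, the functional behavior (the slide's "…") is left out, and each instruction only adds its listed energy to the running total `uJtot`; these values match the 2-byte column of the Step 2 table.

```python
class AnnotatedUart:
    """Hypothetical back-annotated UART system model (Step 3).

    Each instruction would keep its functional behavior (elided here, as on
    the slide) and additionally adds its characterized energy to uJtot.
    """

    def __init__(self) -> None:
        self.uJtot = 0  # accumulated energy in microjoules

    def reset(self) -> None:
        # ... functional behavior elided ...
        self.uJtot += 13

    def enable_tx(self) -> None:
        self.uJtot += 23

    def enable_rx(self) -> None:
        self.uJtot += 18

    def send(self) -> None:
        self.uJtot += 76

    def receive(self) -> None:
        self.uJtot += 44
```

For example, the sequence Reset, Enable_tx, Send, Send accumulates 13 + 23 + 76 + 76 = 188 µJ. Note that the Reset instruction resets the core, not the energy counter.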
Core “Power Modes” Require Extra Effort by the Core Provider
Power-mode transition function: Mode 1 (Idle) → Mode 2 (Enabled) on Enable_tx or Enable_rx; Mode 2 (Enabled) → Mode 1 (Idle) on Reset

Mode 1: Idle, energy (µJ) by buffer size
UART instruction   2 bytes   4 bytes   8 bytes   16 bytes
Reset                 11        13        14        14
Enable_tx             27        32        31        31
Enable_rx             17        18        19        18
Send                  17        19        19        20
Receive               14        15        17        18

Mode 2: Enabled, energy (µJ) by buffer size
UART instruction   2 bytes   4 bytes   8 bytes   16 bytes
Reset                 13        13        14        14
Enable_tx             23        25        24        24
Enable_rx             18        19        19        19
Send                  76        77        89       115
Receive               44        49        55        64

• Unlike microprocessor instructions, certain peripheral core instructions can greatly change the power consumption of other instructions
• Must create a power-mode transition function, and measure power per instruction per mode
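The power modes and transition function can be sketched as a small state machine. This Python sketch assumes that each instruction is charged at the mode in effect when it is issued and that Send and Receive leave the mode unchanged; the names `ENERGY_UJ`, `next_mode`, and `trace_energy_uj` are hypothetical, while the energy values come from the two tables.

```python
from typing import Sequence

# Energy (µJ) per power mode, per instruction, per buffer size (bytes),
# transcribed from the two power-mode tables.
ENERGY_UJ = {
    "Idle": {
        "Reset":     {2: 11, 4: 13, 8: 14, 16: 14},
        "Enable_tx": {2: 27, 4: 32, 8: 31, 16: 31},
        "Enable_rx": {2: 17, 4: 18, 8: 19, 16: 18},
        "Send":      {2: 17, 4: 19, 8: 19, 16: 20},
        "Receive":   {2: 14, 4: 15, 8: 17, 16: 18},
    },
    "Enabled": {
        "Reset":     {2: 13, 4: 13, 8: 14, 16: 14},
        "Enable_tx": {2: 23, 4: 25, 8: 24, 16: 24},
        "Enable_rx": {2: 18, 4: 19, 8: 19, 16: 19},
        "Send":      {2: 76, 4: 77, 8: 89, 16: 115},
        "Receive":   {2: 44, 4: 49, 8: 55, 16: 64},
    },
}


def next_mode(mode: str, instruction: str) -> str:
    """Transition function: enabling moves to Enabled, Reset returns to Idle."""
    if instruction in ("Enable_tx", "Enable_rx"):
        return "Enabled"
    if instruction == "Reset":
        return "Idle"
    return mode  # assumption: Send/Receive do not change the mode


def trace_energy_uj(trace: Sequence[str], buffer_size: int, mode: str = "Idle") -> int:
    """Sum energy over an instruction trace, charging each instruction
    at the mode in effect when it is issued (an assumption)."""
    total = 0
    for instr in trace:
        total += ENERGY_UJ[mode][instr][buffer_size]
        mode = next_mode(mode, instr)
    return total
```

For the trace Reset, Send, Enable_tx, Send, Send, Reset with 2-byte buffers, starting in Idle, the sketch charges 11 + 17 + 27 + 76 + 76 + 13 = 220 µJ. Charging the same trace entirely from the Enabled table would give 277 µJ, which shows how much the choice of mode tables can move the estimate.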
User Performs System Simulation, Which Yields Power Data
[Figure: SOC (application, microprocessor, cache, memory, bridge, peripherals) assembled from the core database (UART, JPEG decode, …); the simulation reports the total energy]
• Simulation takes only seconds or minutes
Results: Image-decode Accelerator
• Examined 3 peripheral cores: UART, DMA, JPEG
• Compared our instruction-based system-level method with:
  • Gate-level simulation: slow but accurate
  • “Databook” RT-level: cycle-accurate simulation using databook average-power values

Energy (mJ); percentages are errors relative to gate-level
Core   Gate-level   “Databook” RT-level   Instr.-based system-level
UART       113          155 (37%)              115 (2%)
DMA        519          717 (38%)              493 (5%)
JPEG      1573         1793 (14%)             1550 (1%)

Simulation time: gate-level 40,980 sec; “Databook” RT-level 2,700 sec; instr.-based system-level 14 sec
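The error percentages can be re-derived from the energy values as a quick consistency check. This short Python sketch assumes gate-level simulation is the reference and uses absolute relative error; the variable and function names are illustrative.

```python
# Energy (mJ) per core, from the results above. Names are illustrative.
GATE_MJ = {"UART": 113, "DMA": 519, "JPEG": 1573}
DATABOOK_MJ = {"UART": 155, "DMA": 717, "JPEG": 1793}
INSTR_MJ = {"UART": 115, "DMA": 493, "JPEG": 1550}
SIM_TIME_S = {"gate": 40_980, "databook": 2_700, "instr": 14}


def rel_error_pct(estimate: float, reference: float) -> float:
    """Absolute error of an estimate relative to the gate-level reference, in percent."""
    return abs(estimate - reference) / reference * 100.0


for core, ref in GATE_MJ.items():
    db = rel_error_pct(DATABOOK_MJ[core], ref)
    ib = rel_error_pct(INSTR_MJ[core], ref)
    print(f"{core:5s} databook {db:5.1f}%   instr.-based {ib:4.1f}%")

speedup = SIM_TIME_S["gate"] / SIM_TIME_S["instr"]
print(f"speedup of instr.-based over gate-level: {speedup:.0f}x")
```

Rounded to whole percentages, the computed errors match the reported ones. The runtime ratio, 40,980 s / 14 s, is roughly 2,900x for this experiment.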
Results: Importance of Power Modes (UART example)
Gate-level energy: 113 mJ

Power modes   System-level energy (mJ)   Error
Single-mode           86                 23.0%
Two-modes            104                  8.6%
Four-modes           115                  1.7%

• Proper power-mode selection is critical for peripheral cores
• Too few modes, or the wrong modes, can lead to large errors
Conclusions
• The introduced instruction-based method is:
  • Accurate (less than 5% error)
  • Fast (1000x speedup over gate-level)
  • Compatible with the current core-based methodology
• The concept of power modes is necessary for accuracy
• Future work includes:
  • Trace-simulator-based approach (10x speedup)
  • Trace-analysis-based approach (100x speedup)