
Power Management for Chip-level Multiprocessing Processors

Presentation Transcript


  1. Power Management for Chip-level Multiprocessing Processors Kai Ma

  2. Background • Two ways to get better performance: 1. Scale frequency (faster) 2. Replicate on-chip resources (parallelism) • Chip Multiprocessing (CMP) vs Simultaneous Multithreading (SMT)

  3. SMT vs CMP

  4. Other justifications for CMP • Memory wall, ILP wall, power wall • Cache-coherency circuitry can run at a much higher clock rate on-chip than off-chip • Better signal integrity: on-chip signals travel shorter distances • Future: many cores (many specialized cores)

  5. Power management for CMP • Reduce operating costs for energy and cooling • Prolong battery life for portable and embedded systems • Reduce cooling requirements • Meet scalable performance targets • Control heat dissipation and hotspots

  6. Outline 1. An Analysis of Efficient Multi-Core Global Power Management Policies: Maximizing Performance for a Given Power Budget. Canturk Isci*, Alper Buyuktosunoglu*, Chen-Yong Cher*, Pradip Bose* and Margaret Martonosi (*IBM T.J. Watson Research Center, Yorktown Heights; Department of Electrical Engineering, Princeton University) 2. Variation-Aware Application Scheduling and Power Management for Chip Multiprocessors. Radu Teodorescu and Josep Torrellas (Department of Computer Science, University of Illinois at Urbana-Champaign)

  7. Outline • An Analysis of Efficient Multi-Core Global Power Management Policies: Maximizing Performance for a Given Power Budget 1. Contribution 2. Global Power Management 3. Global Power Management Policies: core modes, power and performance matrix 4. Experimental Result and Evaluation 5. Conclusion 6. Critique

  8. Contribution • Introduces global (chip-wide) power management for CMPs • Develops a static power management analysis tool • Evaluates different policies for CMP power management

  9. Global Power Management • A global manager monitors the power of every core and sets each core's operating (power) mode

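  10. Global Power Management Policies • Priority: slow down the cores running low-priority tasks first • PullHiPushLo: slow down the highest-power core and speed up the lowest-power core, balancing power across cores • MaxBIPS: predict the power and throughput (BIPS) of each power-mode combination and choose the combination that maximizes total BIPS within the power budget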
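
One possible reading of a single PullHiPushLo step is sketched below. It is a hypothetical illustration, not the paper's implementation: the function name `pull_hi_push_lo`, the integer mode indices (0 = fastest), and the rule "slow the hottest core when over budget, otherwise speed up the coolest core" are assumptions made for the example.

```python
from typing import List


def pull_hi_push_lo(core_power: List[float], modes: List[int],
                    budget: float, slowest_mode: int) -> List[int]:
    """One step of a PullHiPushLo-style policy (hypothetical sketch).

    modes[i] is core i's DVFS mode index: 0 is the fastest mode and
    slowest_mode the slowest. core_power[i] is core i's measured power.
    """
    new_modes = list(modes)
    if sum(core_power) > budget:
        # Over budget: pull down the highest-power core that can still slow down.
        candidates = [i for i in range(len(modes)) if modes[i] < slowest_mode]
        if candidates:
            hi = max(candidates, key=lambda i: core_power[i])
            new_modes[hi] += 1
    else:
        # Within budget: push up the lowest-power core that is not at full speed.
        candidates = [i for i in range(len(modes)) if modes[i] > 0]
        if candidates:
            lo = min(candidates, key=lambda i: core_power[i])
            new_modes[lo] -= 1
    return new_modes
```

For example, `pull_hi_push_lo([10.0, 20.0, 15.0], [0, 0, 0], 40.0, 2)` returns `[0, 1, 0]`: the chip is 5 W over budget, so the 20 W core drops one mode.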

  11. Core Power Modes • Underlying mechanism: DVFS • Mode-switch overhead: on the order of microseconds • Performance degradation: measured as the increase in benchmark execution time
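
Why DVFS modes save power can be seen from the standard first-order model of dynamic CMOS power. The sketch below assumes voltage and frequency are scaled together by the same factor s and ignores leakage; the activity factor α_act, capacitance C, and the value s = 0.85 are illustrative and not taken from the slides.

```latex
P_{\text{dyn}} \approx \alpha_{\text{act}}\, C\, V^{2} f
\qquad\Rightarrow\qquad
\frac{P'_{\text{dyn}}}{P_{\text{dyn}}}
  = \left(\frac{V'}{V}\right)^{2}\frac{f'}{f} = s^{3},
\qquad V' = sV,\; f' = sf

s = 0.85:\quad s^{3} \approx 0.614
\;\;(\approx 39\%\ \text{less dynamic power}),
\qquad \text{CPU-bound run time grows by at most } 1/s \approx 1.18\times
```

The cubic power reduction against an at-most-linear slowdown is what makes a slower mode attractive under a power budget.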

  12. Power and BIPS Matrices
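
Assuming the Power and BIPS matrices hold one predicted power value and one predicted BIPS value per (core, mode) pair, the MaxBIPS selection step can be sketched as an exhaustive search. The function name `max_bips` and the example numbers are hypothetical; the paper's prediction of the matrix entries is not reproduced here.

```python
from itertools import product
from typing import Optional, Sequence, Tuple


def max_bips(power: Sequence[Sequence[float]],
             bips: Sequence[Sequence[float]],
             budget: float) -> Optional[Tuple[int, ...]]:
    """Pick the mode combination with the highest total BIPS within the budget.

    power[i][m] and bips[i][m] are the predicted power and BIPS of core i
    in mode m. Returns None if no combination fits the budget.
    """
    n_cores, n_modes = len(power), len(power[0])
    best, best_bips = None, float("-inf")
    # Enumerate all n_modes ** n_cores mode assignments.
    for combo in product(range(n_modes), repeat=n_cores):
        total_power = sum(power[i][m] for i, m in enumerate(combo))
        if total_power > budget:
            continue  # violates the chip power budget
        total_bips = sum(bips[i][m] for i, m in enumerate(combo))
        if total_bips > best_bips:
            best, best_bips = combo, total_bips
    return best
```

With hypothetical matrices `power = [[20, 17, 12], [25, 21, 15]]`, `bips = [[2.0, 1.8, 1.5], [1.0, 0.9, 0.8]]` and a 38 W budget, the search returns `(0, 2)`: core 0 stays in its fastest mode and core 1 drops to its slowest.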

  13. Experimental Methodology • Benchmarks: SPEC CPU2000 • A trace-based CMP analysis tool built on IBM's Turandot simulator • Mode switch: 500 ns; statistics collection: 50 ns • During a mode switch, no instructions execute but power is still consumed
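
Because a mode switch retires no instructions yet still draws power, each switch costs both time and energy. The arithmetic is sketched below; only the 500 ns switch time comes from the slide, while the function `switch_cost`, the 500 µs control interval, and the 20 W core power are hypothetical values chosen for illustration.

```python
from typing import Tuple

SWITCH_TIME_S = 500e-9  # mode-switch latency from the slide (500 ns)


def switch_cost(interval_s: float, core_power_w: float) -> Tuple[float, float]:
    """Return (fraction of the interval lost, energy spent with no instructions retired)."""
    lost_fraction = SWITCH_TIME_S / interval_s
    wasted_energy_j = core_power_w * SWITCH_TIME_S
    return lost_fraction, wasted_energy_j


# Hypothetical: a 500 us control interval on a 20 W core.
frac, energy = switch_cost(500e-6, 20.0)  # about 0.1% of the interval, about 10 uJ
```

The lost fraction shrinks as the control interval grows, which is one reason mode decisions are made at a coarser granularity than the switch latency.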

  14. Static vs Dynamic

  15. Policy and Budget Curve

  16. Power Saving

  17. Power Management Result

  18. Trends under CMP Scaling • The gap between MaxBIPS and the oracle shrinks as the number of cores increases • Increasing the core count has a smaller impact on MaxBIPS • As CMPs scale, static per-core management is favored over chip-wide DVFS

  19. Conclusion • Global management is preferred • Dynamic management is preferred • MaxBIPS is efficient

  20. Critique • MaxBIPS: the number of mode combinations to predict grows as M^N for M modes and N cores, i.e., exponentially with the core count • The power/performance estimation matrices do not account for the mode-transition penalty • Temperature is not considered
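
To make the scalability concern concrete, the size of the MaxBIPS search space can be computed directly. M = 3 modes per core is an illustrative assumption.

```latex
\#\text{combinations} = M^{N},\qquad
M = 3:\quad N = 4 \to 81,\qquad N = 8 \to 6561,\qquad N = 16 \to 43{,}046{,}721
```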

  21. Outline Variation-Aware Application Scheduling and Power Management for Chip Multiprocessors 1. Background 2. Contribution 3. Algorithm 4. System Implementation 5. Evaluation 6. Conclusion 7. Critique

  22. Background • For CMPs, within-die process variation impacts: • Static power consumption • Maximum frequency

  23. Contribution • Propose variation-aware algorithms for application scheduling • Complement these algorithms with variation-aware DVFS

  24. CMP Configuration • High-level frequency and DVFS policy

  25. Algorithms

  26. Linear Programming • A technique for optimizing a linear objective function subject to linear equality and inequality constraints • Standard form: maximize cᵀx subject to Ax ≤ b and x ≥ 0, where c and b are known vectors, A is a known matrix, and x is the vector of variables

  27. Power Mode Selection: LinOpt • TP: average throughput • N: number of cores • i: core index, from 1 to N • a(i): constant that depends on the thread and the core • v(i): voltage of core i • b(i), c(i): constants that approximate the power-voltage relation, P(i) ≈ b(i)·v(i) + c(i) • Objective function: maximize TP = (1/N) Σ a(i)·v(i) • Constraints: Σ (b(i)·v(i) + c(i)) ≤ P_target; V_min ≤ v(i) ≤ V_max
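
Assuming the LinOpt formulation has only one chip-wide power constraint plus per-core voltage bounds, with a(i) ≥ 0 and b(i) > 0, the linear program reduces to a fractional knapsack. A greedy fill in order of a(i)/b(i) then solves it exactly. This is a swapped-in solution method for illustration, not necessarily the solver used in the paper, and `linopt_greedy` is a hypothetical name.

```python
from typing import List, Optional, Sequence


def linopt_greedy(a: Sequence[float], b: Sequence[float], c: Sequence[float],
                  p_target: float, v_min: float, v_max: float) -> Optional[List[float]]:
    """Maximize (1/N) * sum(a[i] * v[i]) subject to
    sum(b[i] * v[i] + c[i]) <= p_target and v_min <= v[i] <= v_max.

    Assumes a[i] >= 0 and b[i] > 0. With a single power constraint plus box
    bounds, the LP is a fractional knapsack, so greedy by a[i] / b[i] is optimal.
    """
    n = len(a)
    v = [v_min] * n
    remaining = p_target - sum(b[i] * v_min + c[i] for i in range(n))
    if remaining < 0:
        return None  # infeasible: even all-minimum voltages exceed the target
    # Spend the leftover power on cores with the most throughput per watt.
    for i in sorted(range(n), key=lambda k: a[k] / b[k], reverse=True):
        dv = min(v_max - v_min, remaining / b[i])
        v[i] += dv
        remaining -= b[i] * dv
        if remaining <= 0:
            break
    return v
```

With hypothetical constants `a = [3.0, 2.0]`, `b = [10.0, 10.0]`, `c = [1.0, 1.0]`, a 21 W target and voltages in [0.8, 1.0], core 0 is raised to 1.0 and core 1 gets the remaining budget, ending at about 0.9. The resulting throughput is `sum(ai * vi for ai, vi in zip(a, v)) / len(a)`.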

  28. Power Mode Selection: SAnn • Uses simulated annealing to solve the power mode selection problem • SAnn searches the space of possible core-voltage combinations • Compared to LinOpt: more accurate but more costly
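
A generic simulated-annealing search over discrete per-core voltage levels is sketched below. The function `sann`, the linear cooling schedule, the "one core moves one level" neighbourhood, and the caller-supplied throughput and power models are assumptions for illustration, not the paper's SAnn configuration.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple


def sann(levels: Sequence[float],
         throughput: Callable[[List[float]], float],
         power: Callable[[List[float]], float],
         p_target: float, n_cores: int,
         steps: int = 5000, t0: float = 1.0, seed: int = 0) -> Tuple[List[float], float]:
    """Simulated-annealing search over per-core voltage levels (hypothetical sketch)."""
    rng = random.Random(seed)

    def score(state: List[int]) -> float:
        volts = [levels[k] for k in state]
        if power(volts) > p_target:
            return float("-inf")  # infeasible: violates the power target
        return throughput(volts)

    state = [0] * n_cores  # start with every core at the lowest voltage level
    cur = score(state)
    best, best_score = list(state), cur
    for step in range(steps):
        temp = t0 * (1.0 - step / steps) + 1e-9  # linear cooling schedule
        cand = list(state)
        i = rng.randrange(n_cores)
        # Move one core up or down by one level, clamped to the valid range.
        cand[i] = min(max(cand[i] + rng.choice((-1, 1)), 0), len(levels) - 1)
        s = score(cand)
        if s == float("-inf"):
            continue  # reject infeasible neighbours outright
        # Accept improvements; accept worse moves with probability exp(delta / T).
        if s >= cur or rng.random() < math.exp((s - cur) / temp):
            state, cur = cand, s
            if cur > best_score:
                best, best_score = list(state), cur
    return [levels[k] for k in best], best_score
```

Unlike the LP relaxation, this search can plug in any nonlinear power or throughput model. That flexibility is the usual reason annealing is more accurate but more costly, since it must evaluate many candidate states.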

  29. System Implementation • The algorithm runs on a core or on a dedicated power management unit • At each OS scheduling interval, the OS assigns threads to cores using VarF&AppIPC • Every 10 ms, LinOpt runs and sets each core to its selected power mode
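
The slides do not spell out VarF&AppIPC. Assuming, as its name suggests, that it pairs the highest-IPC threads with the highest-frequency cores, the mapping step could be sketched as follows. The function `varf_appipc`, the thread names, and the frequency and IPC values are hypothetical.

```python
from typing import Dict


def varf_appipc(core_freq: Dict[int, float], thread_ipc: Dict[str, float]) -> Dict[str, int]:
    """Map the highest-IPC thread to the highest-frequency core, and so on (sketch)."""
    if len(thread_ipc) > len(core_freq):
        raise ValueError("more threads than cores")
    cores = sorted(core_freq, key=lambda k: core_freq[k], reverse=True)
    threads = sorted(thread_ipc, key=lambda t: thread_ipc[t], reverse=True)
    return dict(zip(threads, cores))


# Hypothetical per-core maximum frequencies (GHz) and per-thread IPCs.
mapping = varf_appipc({0: 3.6, 1: 4.0, 2: 3.8}, {"thread_a": 0.4, "thread_b": 1.6})
# mapping == {"thread_b": 1, "thread_a": 2}
```

The intuition behind such a pairing is that compute-bound (high-IPC) threads convert extra frequency into throughput more effectively than memory-bound ones.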

  30. Profiling for Implementation

  31. Evaluation Methodology • Variation: VARIUS model • Power: SESC + Wattch + HotLeakage • Temperature: HotSpot • Critical path model: 1. Logic path delay: a multiplier-like unit 2. Memory: SRAM 3. Interconnect: CACTI 4. Gate delay: alpha-power law
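
Under the alpha-power law, gate delay scales roughly as V / (V − Vth)^α, so a core's maximum frequency scales as (V − Vth)^α / V. This is why within-die variation in the threshold voltage Vth translates into per-core frequency differences. The sketch below uses α = 1.3 and a 0.3 V reference threshold purely as illustrative assumptions; `relative_fmax` is a hypothetical name.

```python
def relative_fmax(v: float, vth: float, alpha: float = 1.3,
                  v_ref: float = 1.0, vth_ref: float = 0.3) -> float:
    """Maximum frequency relative to a reference core, via the alpha-power law.

    Gate delay ~ V / (V - Vth)**alpha, so f_max ~ (V - Vth)**alpha / V.
    alpha and the reference point are illustrative assumptions.
    """
    def speed(v_: float, vth_: float) -> float:
        if v_ <= vth_:
            raise ValueError("supply voltage must exceed threshold voltage")
        return (v_ - vth_) ** alpha / v_

    return speed(v, vth) / speed(v_ref, vth_ref)


# A core whose Vth came out 50 mV high is slower at the same supply voltage.
slow = relative_fmax(1.0, 0.35)  # < 1.0
```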

  32. Workload • SPEC • Run different applications on different cores • 12 billion instructions

  33. Metrics • Total power • Average frequency of active cores • Throughput • Energy-delay² product (ED², which accounts for both time-to-solution and energy consumption) • Weighted throughput: each application's IPC normalized to its IPC at reference conditions
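
The two composite metrics can be computed as below. Summing the normalized IPCs over applications to form weighted throughput is an assumption about how per-application values are combined. The function names are hypothetical.

```python
from typing import Sequence


def ed2(energy_j: float, delay_s: float) -> float:
    """Energy-delay-squared product E * D**2 (lower is better)."""
    return energy_j * delay_s ** 2


def weighted_throughput(ipc: Sequence[float], ipc_ref: Sequence[float]) -> float:
    """Sum over applications of IPC normalized to the IPC at reference conditions."""
    if len(ipc) != len(ipc_ref):
        raise ValueError("need one reference IPC per application")
    return sum(x / r for x, r in zip(ipc, ipc_ref))
```

Squaring the delay in ED² weights performance more heavily than energy, so a configuration that saves energy by running much slower is penalized.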

  34. Evaluation • Power and frequency variation on one die

  35. Uniform Frequency & No DVFS • As the number of threads increases, no lightly used cores remain to map threads to

  36. Non-Uniform Frequency & No DVFS • Different cores run at different frequencies; by selecting the least-used cores, threads may end up on lower-frequency cores

  37. NoUniFreq+DVFS • Throughput: VarF&AppIPC+LinOpt is effective • Power: throughput gains are high when power targets are low

  38. LinOpt Granularity • The deviation between consumed power and the power target decreases as the interval between LinOpt runs increases

  39. Conclusion • Within-die variation substantially impacts static power consumption and maximum frequency • Variation-aware algorithms are proposed and analyzed; LinOpt is efficient

  40. Critique • How can thread mapping and power mode selection be decoupled? • Static and dynamic power consumption should be discussed separately • Thread mapping takes place only once; thread migration should be considered

  41. Comparison