
ECE 510 Brendan Crowley

ECE 510 Brendan Crowley. Paper Review, October 31, 2006. “Processor Power Reduction Via Single-ISA Heterogeneous Multi-Core Architectures”. Rakesh Kumar, Keith Farkas, Norman P. Jouppi, Partha Ranganathan, Dean M. Tullsen.



  1. ECE 510 Brendan Crowley Paper Review October 31, 2006

  2. “Processor Power Reduction Via Single-ISA Heterogeneous Multi-Core Architectures” Rakesh Kumar, Keith Farkas, Norman P. Jouppi, Partha Ranganathan, Dean M. Tullsen

  3. Presentation Overview • Introduction • The Architecture • Modeling the Architecture • Results • Critical Analysis / Conclusion

  4. Introduction • Background • Processors continue to have increased speed and transistor count as transistor sizes decrease • This leads to increased power consumption which causes problems • Heat dissipation • Chip failure • Battery life • Designers are always searching for new ways to decrease power consumption

  5. Introduction (2) • Most work on reducing power consumption falls under one of two categories: • Voltage and frequency scaling • “Gating” – the ability to turn on/off portions of the core • Some designs have included the use of multiple identical (homogeneous) cores • Others have included processors with co-processors that run a different instruction set

  6. Introduction (3) • The Main Idea • Different software applications have different resource requirements • This leads the authors to argue that core diversity is of greater value than uniformity • Therefore, the proposed design is a single-ISA heterogeneous multi-core architecture • Each core runs the same instruction set, but has different capabilities and performance characteristics

  7. The Architecture • One method is to take a family of previously designed cores, modify their interfaces, and combine them on one die • Each core executes the same instruction set but contains different resources, and therefore achieves different performance and energy efficiency on the same application

  8. The Architecture (2) • The operating system determines the application’s requirements and decides which core is best to use (which core will be the most energy efficient) • To accommodate a wide variety of applications, the cores should span a wide range of performance levels

  9. The Architecture (3) • Authors chose a 5-core design, using existing cores with a few changes: • Hypothetical single-threaded version of the EV8 (Alpha 21464), which they call the “EV8-” • MIPS R4700 • EV4 (Alpha 21064) • EV5 (Alpha 21164) • EV6 (Alpha 21264)

  10. The Architecture (4) • Assumptions • Each core has a private L1 data and instruction cache • All cores share an L2 cache, phase-locked-loop circuitry and pins • Implemented in 0.10 micron technology • One application running at a time (one thread running)

  11. The Architecture (5) • Relative core sizes

  12. The Architecture (6) • Different parts of a program may require different resources • To take full advantage of the core diversity it is necessary to switch between cores in the middle of program execution • This is done at operating system timeslice intervals, with user-state already saved to memory • If the OS decides to switch cores, the data is saved to the shared L2 cache, where the next core can retrieve it

  13. The Architecture (7) • The authors assume the unused cores are powered down to avoid static leakage and dynamic switching power • This means time must be spent powering up the cores • Experimental results show that this doesn’t affect performance when core-switching is done at OS timer intervals, even with pessimistic assumptions about power-up time and software overhead
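A rough amortization sketch shows why switching only at timeslice boundaries can hide the power-up cost. The timeslice length and per-switch cost below are hypothetical illustration values, not figures from the paper, and `switch_overhead_fraction` is a helper name invented for this example.

```python
def switch_overhead_fraction(switch_cost_us: float, timeslice_us: float,
                             switch_probability: float = 1.0) -> float:
    """Fraction of run time lost to core switches.

    switch_cost_us: time to power up the target core plus software overhead
    timeslice_us: OS timeslice length (switches happen only at its boundaries)
    switch_probability: fraction of timeslice boundaries that trigger a switch
    """
    if timeslice_us <= 0:
        raise ValueError("timeslice_us must be positive")
    expected_cost = switch_cost_us * switch_probability
    return expected_cost / (timeslice_us + expected_cost)


# Hypothetical numbers: 10 ms timeslice, 100 us per switch,
# and a switch at every single timeslice boundary (a pessimistic case).
worst_case = switch_overhead_fraction(100.0, 10_000.0)
print(f"overhead: {worst_case:.2%}")
```

With these assumed values the lost time stays near one percent even when every boundary triggers a switch, which is the kind of margin the slide's claim relies on.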

  14. Modeling the Architecture • Data on the EV8 was based on predictions and reported data • Data on the other cores came from published literature • The authors assume all of the Alpha cores run at 2.1 GHz (given the assumed 0.10 micron process) and the R4700 runs at 1 GHz

  15. Modeling the Architecture (2) • All architectures were modeled as accurately as possible on a highly detailed instruction-level simulator, using the configurations in the table below

  16. Modeling the Architecture (3) • The table below shows the area and peak power statistics of the cores • Areas were found from die photos • Total die area is approximately 400 mm²

  17. Modeling the Architecture (4) • Benchmark execution was simulated using SMTSIM • The simulator was modified to simulate a multi-core processor with a shared L2 cache • Assumes a single thread running on one core at a time • Switching cores requires flushing the active core’s pipeline and writing its L1 cache lines back to the L2 cache

  18. Results • The following figure shows results for the SPEC application applu • The Y-axis, IPS²/W, is essentially the inverse of the energy-delay product (labeled power-delay product, PDP, on these slides) • Constraint: • Never choose a core that sacrifices more than 50% performance relative to EV8- over an interval
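A short dimensional check of the Y-axis metric, writing I for instructions executed, t for execution time, and E for energy consumed (symbols introduced here for illustration only):

```latex
\[
\frac{\mathrm{IPS}^2}{\mathrm{W}}
  = \frac{(I/t)^2}{E/t}
  = \frac{I^2}{E \cdot t}
\]
```

For a fixed instruction count I, maximizing IPS²/W is therefore equivalent to minimizing E · t, that is, energy multiplied by delay.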

  19. Results (2)

  20. Results (3) • Compared to a single-core architecture, this design could ideally reduce the PDP by 74% • Combination of 25% performance loss and 81% energy savings • Could change the constraint to achieve greater PDP savings (sacrificing performance, of course) • Another design point gives 36% energy savings with 4% performance loss
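The quoted figures can be cross-checked by multiplying relative energy by relative delay. This sketch assumes that "performance loss" means a drop in throughput, so that delay grows by 1 / (1 - loss); that reading is an assumption, and `energy_delay_ratio` is a hypothetical helper name.

```python
def energy_delay_ratio(performance_loss: float, energy_savings: float) -> float:
    """Energy x delay of a design point, relative to the baseline (1.0).

    performance_loss: fractional drop in throughput (0.25 means 25% lower IPS)
    energy_savings: fractional drop in energy (0.81 means 81% less energy)
    """
    relative_delay = 1.0 / (1.0 - performance_loss)  # lower throughput, longer run
    relative_energy = 1.0 - energy_savings
    return relative_energy * relative_delay


# The two design points quoted on the slide.
for loss, savings in [(0.25, 0.81), (0.04, 0.36)]:
    ratio = energy_delay_ratio(loss, savings)
    print(f"loss={loss:.0%} savings={savings:.0%} -> reduction {1 - ratio:.1%}")
```

Under that assumption the first design point works out to a reduction of roughly 75%, in line with the 74% quoted, and the second to about 33%.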

  21. Results (4) • Could optimize other metrics besides PDP, depending on the design goals • Different power and performance tradeoffs can be made simply by changing the core switching algorithm (no need to change the hardware)
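A minimal sketch of how the tradeoff could be changed in software alone: the selection routine takes the objective and the performance constraint as parameters. The per-core throughput and power numbers are hypothetical placeholders, not data from the paper, and `pick_core` is an illustrative name rather than the authors' algorithm.

```python
from typing import Callable, Dict, Tuple

# (instructions per second, watts) observed per core over one interval.
# All numbers below are hypothetical placeholders, not data from the paper.
Sample = Tuple[float, float]


def ips2_per_watt(ips: float, watts: float) -> float:
    return ips * ips / watts


def ips_per_watt(ips: float, watts: float) -> float:
    return ips / watts


def pick_core(samples: Dict[str, Sample], baseline: str = "EV8-",
              max_perf_loss: float = 0.5,
              objective: Callable[[float, float], float] = ips2_per_watt) -> str:
    """Return the core maximizing `objective` among cores that keep at
    least (1 - max_perf_loss) of the baseline core's throughput."""
    floor = (1.0 - max_perf_loss) * samples[baseline][0]
    eligible = {name: s for name, s in samples.items() if s[0] >= floor}
    return max(eligible, key=lambda name: objective(*eligible[name]))


interval = {
    "EV8-": (2.0e9, 40.0),
    "EV6": (1.6e9, 15.0),
    "EV5": (1.1e9, 8.0),
    "EV4": (0.7e9, 4.0),
    "R4700": (0.3e9, 0.5),
}
print(pick_core(interval))                          # default: IPS^2/W, 50% limit
print(pick_core(interval, objective=ips_per_watt))  # different objective
print(pick_core(interval, max_perf_loss=0.9, objective=ips_per_watt))  # looser limit
```

With these made-up samples the three calls select different cores, illustrating that the objective and the constraint are policy choices made by the switching algorithm rather than properties of the hardware.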

  22. Critical Analysis / Conclusion • Many assumptions are made about things like frequency scaling, power consumption of cores, etc. • This paper only reports results for one benchmark application • Multiple cores/threads running at the same time would likely be used in practice • How would this affect the core switching complexity and latency?

  23. Critical Analysis / Conclusion (2) • This technique seems like a very good one • Homogeneous multi-core chips are already on the market • Potential for significant energy savings
