
Adaptive Mechanisms and Policies for Managing Cache Hierarchies in Chip Multiprocessors

Evan Speight, Hazim Shafi, Lixin Zhang, and Ram Rajamony. ISCA 2005.


Presentation Transcript


  1. Evan Speight, Hazim Shafi, Lixin Zhang, and Ram Rajamony. ISCA 2005. Adaptive Mechanisms and Policies for Managing Cache Hierarchies in Chip Multiprocessors

  2. Current Trend of CMP
  • Billions of transistors per chip
  • Massive multicore chips in the future (32 cores)
  • Private L1 and L2 caches for each core, forming a tile
  • Severe limitations on performance due to power budgets
  • Caches occupy a large area on chip, so good management of the cache hierarchy is highly desirable

  3. CMP Cache Solutions Proposed
  This paper proposes the following solutions:
  • Use the L3 cache as a victim cache for lines evicted from L2
  • Avoid unnecessary writebacks of clean lines
  • Use a peer L2 cache for victimized lines when writing back
  • Maintain reuse history for replaced lines to enable selective snarfing
  These techniques provided an average 13% performance improvement on commercial workloads, as shown later.

  4. Baseline Architecture used for the study

  5. Issues with Blind L3 Writeback
  • Dirty lines have to be written back
  • Writing back clean lines reduces the latency of subsequent accesses to the line (they can hit in L3 instead of memory)
  • The writeback is unnecessary, however, if the line already resides in another L2 or in the L3 cache
  • Such excessive writeback puts pressure on on-chip and off-chip bandwidth
  • Hence, writebacks need to be regulated

  6. Selective Writeback
  • Use a history table to hint at the presence of a line in L3
  • The table (a Write Back History Table, WBHT) is associated with each L2 cache
  • The table is updated/accessed on each writeback of a clean line
  • The table size is much smaller than the cache size
  • An LRU method is used to decide which lines' history is maintained (a sketch of such a table follows)
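A minimal C++ sketch of what such a per-L2 history table could look like, assuming a simple LRU-managed set of line addresses; the class name, capacity handling, and interface are illustrative assumptions, not details from the paper.

```cpp
// Hypothetical sketch of a per-L2 Write Back History Table (WBHT): a small,
// LRU-managed set of line addresses that were recently written back to L3.
// A hit is only a hint that the line is probably still in L3.
#include <cstddef>
#include <cstdint>
#include <list>
#include <unordered_map>

class WriteBackHistoryTable {
public:
    explicit WriteBackHistoryTable(std::size_t capacity) : capacity_(capacity) {}

    // Record that this clean line was just written back to L3.
    void insert(uint64_t line_addr) {
        auto it = map_.find(line_addr);
        if (it != map_.end()) {
            lru_.erase(it->second);           // refresh to most-recently-used
        } else if (map_.size() == capacity_) {
            map_.erase(lru_.back());          // evict the LRU entry
            lru_.pop_back();
        }
        lru_.push_front(line_addr);
        map_[line_addr] = lru_.begin();
    }

    // Hint: the line was written back before and is likely still in L3.
    bool contains(uint64_t line_addr) const { return map_.count(line_addr) != 0; }

    // Drop an entry, e.g. when the hint is known to have gone stale.
    void invalidate(uint64_t line_addr) {
        auto it = map_.find(line_addr);
        if (it == map_.end()) return;
        lru_.erase(it->second);
        map_.erase(it);
    }

private:
    std::size_t capacity_;
    std::list<uint64_t> lru_;   // most-recently-used address at the front
    std::unordered_map<uint64_t, std::list<uint64_t>::iterator> map_;
};
```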

  7. Selective Writeback Mechanism
  • If the line being written back is not present in L3, write it back to L3 and update the WBHT
  • On a subsequent replacement of the line, the WBHT is checked
  • If the line has an entry in the WBHT, the writeback is squashed
  • Note that the accuracy of the WBHT only affects performance, not correctness (see the sketch below)
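A sketch of that decision on a clean L2 eviction, reusing the hypothetical WriteBackHistoryTable above; the function and enum names are assumptions for illustration.

```cpp
// Selective-writeback decision for a clean line being evicted from L2,
// assuming the WriteBackHistoryTable sketched above.
enum class WritebackAction { SquashWriteback, WriteBackToL3 };

WritebackAction onCleanL2Eviction(WriteBackHistoryTable& wbht, uint64_t line_addr) {
    if (wbht.contains(line_addr)) {
        // The line was written back earlier and is likely still in L3, so the
        // writeback is squashed. A stale hint only costs a later memory fetch;
        // correctness is unaffected because the line is clean.
        return WritebackAction::SquashWriteback;
    }
    // First clean eviction of this line: send it to L3 and remember that.
    wbht.insert(line_addr);
    return WritebackAction::WriteBackToL3;
}
```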

  8. Potential Issues
  • L3 will replace lines due to capacity misses
  • If such a replaced line has a WBHT entry, the L2 will not write it back to L3
  • On a subsequent access to the line, it will have to be fetched from memory
  • Due to the size limit of the WBHT, an entry may be removed even though the line is present in L3
  • The writeback queue is occupied while the WBHT is accessed
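Slide 16 evaluates updating all WBHTs using the L3 snoop response; a hedged sketch of that idea, assuming the WriteBackHistoryTable above and a hypothetical notification hook on L3 evictions, could look like this.

```cpp
// Hypothetical mitigation for stale hints: when L3 evicts a line, notify every
// L2 so its WBHT entry for that line is dropped. The next clean eviction of
// the line will then be written back to L3 again instead of being squashed.
#include <vector>

void onL3Eviction(std::vector<WriteBackHistoryTable*>& all_wbhts,
                  uint64_t evicted_line_addr) {
    for (WriteBackHistoryTable* wbht : all_wbhts) {
        wbht->invalidate(evicted_line_addr);
    }
}
```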

  9. What If a Peer Cache Is Not Used for Writeback?
  • Greater writeback penalty, since L3 is off chip and hence has a longer access time
  • On subsequent accesses, the L3 latency comes into the picture
  • Power consumption is higher for off-chip accesses, which places more constraints on the overall design
  • Thus, using peer L2 caches for writebacks is desirable

  10. Factors to Account for When Using Peer Caches
  • Minimize negative interference at the recipient peer L2 cache
  • Make sure that only useful lines are retained on chip in peer L2 caches
  • Modifications are needed in the cache coherence protocol
  • Keep the cache controller hardware simple

  11. Mechanism for Using Peer Caches
  • Identify lines to be evicted in peer caches; invalid lines are preferred (see the victim-selection sketch below)
  • If there are none, choose shared lines for replacement
  • Use a table to indicate and select which lines are likely to be reused
  • If a peer cache has the line in a clean state, squash the writeback as a snoop response
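An illustrative sketch of the victim-selection priority inside a peer L2 that is asked to accept (snarf) a line; the state enum, struct layout, and function name are assumptions, and a real controller would work on tag/state arrays rather than a std::vector.

```cpp
// Choose which way of the peer L2 set to give up for a snarfed line:
// prefer an invalid way, then a shared line; otherwise decline the snarf
// so the writeback proceeds to L3 and useful data is not displaced.
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

enum class LineState { Invalid, Shared, Exclusive, Modified };

struct CacheWay {
    uint64_t  tag;
    LineState state;
};

std::optional<std::size_t> choosePeerVictim(const std::vector<CacheWay>& set) {
    for (std::size_t i = 0; i < set.size(); ++i)   // first choice: an invalid way
        if (set[i].state == LineState::Invalid) return i;
    for (std::size_t i = 0; i < set.size(); ++i)   // second choice: a shared line
        if (set[i].state == LineState::Shared) return i;
    return std::nullopt;                           // refuse rather than evict useful lines
}
```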

  12. Mechanism to Guess Line Reuse
  • On a writeback, allocate an entry in the reuse table
  • On a subsequent miss of the line, set its "use" bit if the line has an entry in the table
  • On a subsequent writeback of the line, consult the reuse table
  • If the use bit is set, initiate a snarf by the peer L2 caches (sketched below)
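A minimal sketch of that reuse-guessing flow; the ReuseTable class, its unbounded map (real hardware would use a small, LRU-managed table), and the method names are illustrative assumptions.

```cpp
// Reuse table sketch: allocate an entry when a line is written back, mark it
// when the line later misses (evidence the line was wanted again), and consult
// the mark on the next writeback to decide whether peer L2s should snarf it.
#include <cstdint>
#include <unordered_map>

class ReuseTable {
public:
    // Called on a writeback of the line: remember it, initially unmarked.
    void onWriteback(uint64_t line_addr) { table_.try_emplace(line_addr); }

    // Called on a miss of the line: if we remember it, it came back, so set the use bit.
    void onMiss(uint64_t line_addr) {
        auto it = table_.find(line_addr);
        if (it != table_.end()) it->second.used = true;
    }

    // Consulted on a subsequent writeback: snarf only lines with observed reuse.
    bool shouldSnarf(uint64_t line_addr) const {
        auto it = table_.find(line_addr);
        return it != table_.end() && it->second.used;
    }

private:
    struct Entry { bool used = false; };
    std::unordered_map<uint64_t, Entry> table_;  // hardware analogue: a small tagged table
};
```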

  13. Simulation Environment
  • IBM's Mambo simulator used to model the cache hierarchy
  • Coherence protocol used is similar to that of the IBM POWER4
  • Varying numbers of simultaneously outstanding load/write misses are simulated
  • Applications simulated:
    • Transaction Processing (TP)
    • Commercial Processing Workload (CPW2)
    • NotesBench
    • Trade2

  14. System Parameters

  15. Effects of Write Back History Table

  16. Runtime Improvements
  Runtime Improvement Over Baseline of the Write Back History Table; Runtime Improvement of Updating All WBHTs Using the L3 Snoop Response

  17. Effect of Varying WBHT Size
  Figure 4. Runtime of Varying L2 WBHT Sizes Normalized to 512-Entry WBHT System

  18. Effect of L2 Snarfing
  Runtime Improvement Over Baseline of Allowing L2 Snarfing

  19. Improvements from L2 Snarfing and Combined Mechanisms
  Runtime of Varying L2 Snarf Table Sizes Normalized to 512-Entry Snarf Table System; Runtime Improvement Over Baseline of Combined Tables

  20. Conclusion
  • These simple adaptive mechanisms have a positive effect on performance
  • The effect of combining both techniques is not additive
  • Even small history tables can remove more than half of the unnecessary writebacks
  • L2 snarfing resulted in fewer off-chip accesses
