
An Adaptive Shared/Private NUCA Cache Partitioning Scheme for CMPs






Presentation Transcript


  1. An Adaptive Shared/Private NUCA Cache Partitioning Scheme for CMPs Haakon Dybdahl, Per Stenström HPCA 2007

  2. CMP Caching • Extremes: private and shared caches • NUCA organizations • Shared caches: adaptive vs. uncontrolled sharing; pollution issues (Chang and Sohi, ISCA '06) • So, is custom cache partitioning better?

  3. Adaptive Partitioning • Dynamic sharing of last-level caches among cores • Private and shared cache partitions • Who needs more, gets more! • Overall goal – minimize total cache misses

  4. Issues to be considered? • How to estimate private/shared space for a core? • How to share the “shared space” among cores? • Replacement policy for shared spaces?

  5. Private/Shared Cache Partition Size • Private partition : Increase/decrease blocks per set, keep # of sets constant

  6. Private/Shared Cache Partition Size • Shared partition: estimate relative gain • Estimate the misses that would be avoided by adding one block per set • Estimate the misses that would be added by removing one block per set

  7. H/W Support • Core ID • Shadow tags • Counters • Max # of blocks in a set • Cost: shadow tags, core-ID bits, and counters

  8. Relative comparisons • How many cache misses can be avoided? • Shadow tags: one per set per core • Count hits in shadow tags • Count hits to LRU blocks • Re-evaluation every 2000 cycles • Compare core_with_most_hits_to_shadow_tags (1) with core_with_lowest_hits_to_LRU_block (2) • If 1 > 2, one cache block/set is reassigned to core 1
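The re-evaluation step on this slide can be sketched as follows. This is a hedged sketch of the decision logic, not the paper's hardware: the counter names (`shadow_tag_hits`, `lru_block_hits`) and the per-core `blocks_per_set` allocation dict are illustrative names, not identifiers from the paper.

```python
def repartition(shadow_tag_hits, lru_block_hits, blocks_per_set, max_blocks_per_set):
    """One re-evaluation step (run every 2000 cycles per the slide).

    shadow_tag_hits[c]: hits in core c's shadow tags -- misses that would
                        have been avoided with one more block per set.
    lru_block_hits[c]:  hits to core c's LRU blocks -- hits that would be
                        lost with one fewer block per set.
    Returns (gainer, loser) if a block/set is reassigned, else None.
    """
    gainer = max(shadow_tag_hits, key=shadow_tag_hits.get)
    loser = min(lru_block_hits, key=lru_block_hits.get)
    if gainer == loser:
        return None
    # Reassign only if the estimated gain exceeds the estimated loss,
    # the gainer is below its maximum allocation, and the loser still
    # has a block per set to give up.
    if (shadow_tag_hits[gainer] > lru_block_hits[loser]
            and blocks_per_set[gainer] < max_blocks_per_set
            and blocks_per_set[loser] > 0):
        blocks_per_set[gainer] += 1
        blocks_per_set[loser] -= 1
        return gainer, loser
    return None
```

Ties, update intervals, and saturation handling in the real design may differ; the point is that the decision needs only small per-core counters, matching the modest hardware cost listed on slide 7.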

  9. Managing Partitions • Private partition: LRU replacement • Some key events: • Cache hit in private L3: block found, reclassified as MRU • Cache hit in a neighboring L3: all neighboring caches checked in parallel; block brought into the private L3 with LRU replacement; the evicted private block is moved to the shared partition • Cache miss: block fetched from memory and placed in the private partition; the LRU block from the private partition is moved to the shared partition
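The event handling above can be sketched per set. This is a simplified model (assumption: one private region per set shown for a single core, with a shared overflow region); real hardware tracks LRU order with state bits, not Python dicts.

```python
from collections import OrderedDict

class PartitionedSet:
    """One cache set with a private region and a shared overflow region.

    Models the slide's three events: private hit, shared/neighbor hit,
    and miss. OrderedDict keeps LRU order (oldest entry first).
    """
    def __init__(self, private_ways, shared_ways):
        self.private = OrderedDict()
        self.shared = OrderedDict()
        self.private_ways = private_ways
        self.shared_ways = shared_ways

    def access(self, tag):
        if tag in self.private:               # cache hit in private L3
            self.private.move_to_end(tag)     # reclassify as MRU
            return "private_hit"
        if tag in self.shared:                # hit in shared partition
            del self.shared[tag]
            self._insert_private(tag)         # bring block into private
            return "shared_hit"
        self._insert_private(tag)             # miss: fetch from memory
        return "miss"

    def _insert_private(self, tag):
        if len(self.private) >= self.private_ways:
            victim, _ = self.private.popitem(last=False)  # evict private LRU...
            self._insert_shared(victim)                   # ...demote to shared
        self.private[tag] = True

    def _insert_shared(self, tag):
        if len(self.shared) >= self.shared_ways:
            self.shared.popitem(last=False)   # shared LRU falls out to memory
        self.shared[tag] = True
```

Note the design choice visible in the slide: blocks evicted from the private partition get a second chance in the shared partition before leaving the chip, which is what makes the shared space act as a victim buffer for all cores.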

  10. Shared partition block replacement Algorithm
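The algorithm figure on this slide did not survive extraction. As a placeholder, here is a plausible quota-aware victim-selection sketch consistent with the scheme's goal of controlled sharing; this is an assumption about the policy's shape, not the paper's exact algorithm, and all names here are hypothetical.

```python
def pick_shared_victim(blocks, quota):
    """Quota-aware victim selection for the shared partition (a sketch,
    not the paper's lost figure).

    blocks: list of (owner_core, age) tuples; larger age = closer to LRU.
    quota:  dict mapping core -> allowed number of shared blocks.
    Picks the oldest block owned by the core that most exceeds its quota;
    if no core is over quota, falls back to the globally oldest block.
    """
    usage = {}
    for core, _ in blocks:
        usage[core] = usage.get(core, 0) + 1
    over = {c: usage[c] - quota.get(c, 0)
            for c in usage if usage[c] > quota.get(c, 0)}
    if over:
        worst = max(over, key=over.get)            # most over-quota core
        candidates = [b for b in blocks if b[0] == worst]
    else:
        candidates = blocks
    return max(candidates, key=lambda b: b[1])     # oldest = LRU victim
```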

  11. Results • Single-threaded workloads on each core • 4 MB shared L3 vs. 1 MB private L3 per core • Workload characterization: last-level-cache sensitive vs. insensitive • Overall goal: maximize the harmonic mean (HM) of the IPCs of all 4 cores • Forms the basis for comparison
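The optimization target above is the harmonic mean of per-core IPCs, which, unlike the arithmetic mean, penalizes slowing down any single core:

```python
def harmonic_mean_ipc(ipcs):
    """Harmonic mean of per-core IPC values: n / sum(1/ipc_i).
    Dominated by the slowest core, so starving one core hurts the metric."""
    return len(ipcs) / sum(1.0 / x for x in ipcs)
```

For example, four cores at IPC 1.0 give HM = 1.0, while doubling two cores' IPC to 2.0 raises the HM only to 4/3, since the two slow cores dominate.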

  12. Speedups for last-level-cache-sensitive benchmarks

  13. Larger Caches: 8 MB L3

  14. Technology scaling • At smaller technology nodes, wire delay becomes more dominant

  15. Comparison with Chang and Sohi, ISCA '06 • Uncontrolled vs. adaptive partitioning

  16. Summary • Adaptive cache partitioning gives you: • Better performance • Less interference • Improved sharing • Can do more with less (cache)
