Performance analysis of In-Situ Communication, Co-operative caching and Capacity stealing

Siddhesh Mhambrey

Motivation
• "Optimizing Replication, Communication, and Capacity Allocation in CMPs" by Zeshan Chishti, Michael Powell and T. N. Vijaykumar
• Change in the latency-capacity tradeoff in CMPs
  • Faster on-chip communication
  • Limited capacity
• Ways to exploit the change
  • In-Situ Communication: cores communicate to reduce coherence misses
  • Controlled Replication: instead of copying data, a core just points to it
  • Capacity Stealing: a core makes use of empty space in a neighboring cache

In-Situ Communication
• Key idea: avoid invalidations of shared data by introducing an additional state (C)
• Approach
  • Modify the MESI protocol so that a read miss to a block held in M does not send the block to the Shared state
  • All sharers also enter the C state
  • This avoids the invalidation and the resulting coherence miss
  • If dirty data is present, go directly to the C state
• Advantages
  • Reduction in the number of invalidations and write-backs
  • Reduction in the number of misses
• Disadvantage: extra storage for the new state and increased complexity of the coherence scheme
[Figure: Modified MESI state diagram]
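The transitions listed on this slide can be sketched as a small state function. This is only an illustration under stated assumptions, not the simulator's code: the names `CoherenceState` and `ModifiedMesi` are hypothetical, and the handling of a remote write to a block in C (the copy stays valid) is an assumption inferred from "avoid invalidation".

```java
// Hypothetical sketch of the modified MESI transitions; all names are illustrative.
enum CoherenceState { M, E, S, I, C }

final class ModifiedMesi {
    // State a cached copy moves to when another core reads the block.
    static CoherenceState onRemoteRead(CoherenceState s) {
        switch (s) {
            case M: return CoherenceState.C; // modified protocol: skip S, enter C
            case E: return CoherenceState.S; // standard MESI behaviour
            default: return s;               // S, I and C are unchanged
        }
    }

    // State a cached copy moves to when another core writes the block.
    // Assumption: copies in C are not invalidated; all others are.
    static CoherenceState onRemoteWrite(CoherenceState s) {
        return s == CoherenceState.C ? CoherenceState.C : CoherenceState.I;
    }

    // State the requesting core enters on a read miss.
    static CoherenceState onReadMiss(boolean dirtyElsewhere, boolean cleanElsewhere) {
        if (dirtyElsewhere) return CoherenceState.C; // dirty data present: go directly to C
        if (cleanElsewhere) return CoherenceState.S;
        return CoherenceState.E;
    }

    // State a cached copy moves to when its own core writes the block.
    static CoherenceState onLocalWrite(CoherenceState s) {
        return s == CoherenceState.C ? CoherenceState.C : CoherenceState.M;
    }
}
```

In standard MESI, `onRemoteRead(M)` would return S and a later write would invalidate the other copy; here both copies stay in C, which is where the saving in invalidations comes from.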

Co-operative Caching
• Key idea: access data from a neighboring core's cache instead of going off-chip
• Approach: on a read/write miss
  • Check whether a duplicate copy exists on chip
  • If it does, incur only the local (on-chip) miss penalty
  • Otherwise, incur the off-chip miss penalty
• Advantage: reduction in latency
• Disadvantage: increased on-chip complexity and reduced scalability
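The miss-handling steps above could look roughly like the following. It is a minimal sketch, assuming each core's cache is just a set of block addresses with no capacity limit; `CooperativeLookup` and `Outcome` are hypothetical names, not part of the CSE 420 simulator.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: classify an access as a local hit, an on-chip miss
// (served by another core's cache), or an off-chip miss.
final class CooperativeLookup {
    enum Outcome { LOCAL_HIT, ON_CHIP_MISS, OFF_CHIP_MISS }

    private final List<Set<Long>> caches = new ArrayList<>();

    CooperativeLookup(int cores) {
        for (int i = 0; i < cores; i++) caches.add(new HashSet<>());
    }

    Outcome access(int core, long block) {
        Set<Long> local = caches.get(core);
        if (local.contains(block)) return Outcome.LOCAL_HIT;

        // Miss in the local cache: look for a duplicate copy on chip.
        boolean onChip = false;
        for (int i = 0; i < caches.size(); i++) {
            if (i != core && caches.get(i).contains(block)) {
                onChip = true;
                break;
            }
        }
        local.add(block); // fill the local cache (eviction is ignored in this sketch)
        return onChip ? Outcome.ON_CHIP_MISS : Outcome.OFF_CHIP_MISS;
    }
}
```

For example, if core 0 first touches a block it is an off-chip miss; the same block requested afterwards by core 1 is an on-chip miss, and a repeat access by core 1 is a local hit.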

Capacity Stealing
• Key idea: use spare capacity in a neighboring core's cache to hold data evicted from the local cache
• Approach
  • Change the replacement algorithm
  • Check whether empty space exists in a neighboring core's cache
  • If it does, store the block in that space; otherwise store it off-chip
• Advantage: reduction in the number of off-chip accesses
• Drawback: for multi-threaded workloads, it is difficult to find free space in a neighboring cache
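The modified replacement step can be sketched as below. This is an illustration only: `StealingCache` is a hypothetical class, each cache has a single neighbor, and victims are chosen in FIFO order to keep the example short (the simulator described later uses LRU).

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch of capacity stealing on eviction.
final class StealingCache {
    private final int capacity;
    private final Deque<Long> blocks = new ArrayDeque<>(); // head = oldest block
    StealingCache neighbor; // assumption: one neighbor per cache

    StealingCache(int capacity) { this.capacity = capacity; }

    boolean hasFreeSpace() { return blocks.size() < capacity; }

    boolean contains(long block) { return blocks.contains(block); }

    // Inserts a block; returns true if the evicted victim had to go off-chip.
    boolean insert(long block) {
        boolean wentOffChip = false;
        if (!hasFreeSpace()) {
            long victim = blocks.removeFirst();
            if (neighbor != null && neighbor.hasFreeSpace()) {
                neighbor.blocks.addLast(victim); // steal the neighbor's empty space
            } else {
                wentOffChip = true;              // no space on chip: write off-chip
            }
        }
        blocks.addLast(block);
        return wentOffChip;
    }
}
```

Once the neighbor fills up, every further eviction goes off-chip, which mirrors the drawback noted for multi-threaded workloads where all caches are busy.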

Simulation Methodology
• Cache simulator in Java from CSE 420, designed by Michael Jonas
• 4-core CMP
• 1K, 2K and 4K caches; direct-mapped, 2-way or 4-way set associative; write-back policy
• Protocols
  • Modified MESI
  • Modified MESI with Co-operative Caching
• Currently using LRU for replacement
• 3 sample access traces
  • Sample 1: limited range of addresses
  • Sample 2: wide range of addresses
  • Sample 3: arbitrary sample pattern from CSE 420
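For readers unfamiliar with the setup, a set-associative cache with LRU replacement can be modelled in a few lines of Java. This sketch is not the CSE 420 simulator; `LruSetCache` is a hypothetical name, and the block size is a free parameter because the slides do not state one.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a set-associative cache with LRU replacement.
final class LruSetCache {
    private final int numSets;
    private final int blockBytes;
    private final List<LinkedHashMap<Long, Boolean>> sets = new ArrayList<>();
    long accesses;
    long misses;

    LruSetCache(int sizeBytes, final int ways, int blockBytes) {
        this.blockBytes = blockBytes;
        this.numSets = sizeBytes / (ways * blockBytes);
        for (int i = 0; i < numSets; i++) {
            // accessOrder = true keeps entries ordered from least to most recently used.
            sets.add(new LinkedHashMap<Long, Boolean>(16, 0.75f, true) {
                @Override
                protected boolean removeEldestEntry(Map.Entry<Long, Boolean> eldest) {
                    return size() > ways; // evict the LRU block when the set overflows
                }
            });
        }
    }

    // Returns true on a hit, false on a miss (the block is then filled).
    boolean access(long address) {
        long block = address / blockBytes;
        LinkedHashMap<Long, Boolean> set = sets.get((int) (block % numSets));
        accesses++;
        boolean hit = set.get(block) != null; // get() also refreshes recency
        if (!hit) {
            misses++;
            set.put(block, Boolean.TRUE);
        }
        return hit;
    }
}
```

A 1K, 2-way cache with an assumed 16-byte block would be created as `new LruSetCache(1024, 2, 16)`; the miss rate is then `misses / (double) accesses` after replaying a trace through `access`.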

Current Results
• Miss rate
  • Modified MESI with CC: 40% better
  • Modified MESI: 30% better
• Invalidations: 15% reduction in the number of invalidations
[Figures: miss rate; invalidations; effect of cache size on miss rate; effect of associativity on miss rate]

Conclusion
• Modified MESI with Co-operative Caching gives almost a 40% performance benefit compared to the standard MESI protocol
• Co-operative Caching reduces the number of off-chip misses but does not affect invalidations
• Cache size has a significant effect on Modified MESI without Co-operative Caching
• On-chip power consumption is expected to increase with Modified MESI because of the added complexity

Thank you
Questions?
