This research explores the effects of Wrong-Path (WP) memory references in multiple Chip Multi-Processor (CMP) systems, building upon previous findings in uniprocessor systems. We investigate the positive impacts of prefetching, as well as negative consequences like increased cache pollution and extra coherence traffic. Using a simulation methodology with a detailed shared-memory model, we evaluate how WP memory references influence performance across SMPs and multi-CMPs. Our findings reveal significant increases in memory references and coherence traffic, highlighting the importance of accurately modeling WP effects in these systems.
Quantifying and Comparing the Impact of Wrong-Path Memory References in Multiple-CMP Systems • Ayse Yilmazer, University of Rhode Island • Resit Sendag, University of Rhode Island • Joshua J. Yi, Freescale Semiconductor, Inc.
Motivation • Previous work on Wrong-path (WP) effects in Uniprocessors • Positive Effects: Prefetching • Up to 20% better performance for 181.mcf (SPECint 2000) • Negative Effects: Pollution • L1 and L2 cache pollution • Extra traffic • Important to simulate WP, especially for some applications • How about WP effects in Multiple-CMP systems?
Outline • Wrong-Path Effects in SMPs and Multi-CMPs • Simulation Methodology • Evaluation Results • Conclusion
Wrong-path effects in SMPs – 0 / 4 • Broadcast (snoop)- and directory-based SMP systems • MSI, MOSI, MESI, MOESI cache coherence protocols • The same issues as in uniprocessors apply • Pollution effect • Prefetching effect • Extra cache/memory traffic • Unlike in uniprocessors, WP references also cause: • Extra coherence traffic: data, invalidations, write-backs, acknowledgements • Additional cache block state transitions
Wrong-path effects in SMPs – 1 / 4 • Replacements • [Figure: initial states; A is a wrong-path block; A speculatively replaces B]
Wrong-path effects in SMPs – 2 / 4 • Write-backs • [Figure: write-back of the dirty copy of B; M -> S; write-back of the dirty copy of A, only for MESI (or MSI)]
Wrong-path effects in SMPs – 3 / 4 • Invalidations • [Figure: P1 loses its write privileges for block A; P1 asks for permission to write and sends an invalidation]
Wrong-path effects in SMPs – 4 / 4 • Data/bus and coherence traffic increases • L1 references • L2 references • Coherence traffic: snoop and directory requests for data and invalidations • Power consumption increases • Due to extra cache references, coherence traffic and cache block state transitions • Resource contention • WP references compete with correct-path references for resources • Unlike in uniprocessors, service buffers fill up more often • Critical when there are many cache-to-cache transfers
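The write-back and invalidation effects on the preceding slides can be illustrated with a toy model. The following is a minimal sketch, not the simulator used in this work: it assumes a snooping MSI protocol, a single cache block and two caches, and the class `ToyMSI` and function `run` are hypothetical names introduced only for illustration. Cache 0 writes the block twice on the correct path; in the second scenario cache 1 issues one wrong-path load in between.

```python
M, S, I = "M", "S", "I"  # MSI states for one cache block


class ToyMSI:
    """Hypothetical snooping MSI model tracking a single block in n caches."""

    def __init__(self, n):
        self.state = [I] * n
        self.writebacks = 0
        self.invalidations = 0
        self.transitions = 0

    def _set(self, p, new_state):
        # Count only real state changes.
        if self.state[p] != new_state:
            self.state[p] = new_state
            self.transitions += 1

    def read(self, p):
        if self.state[p] != I:
            return  # hit, no bus activity
        for q in range(len(self.state)):
            if q != p and self.state[q] == M:
                self.writebacks += 1  # in MSI, M -> S writes the dirty copy back
                self._set(q, S)
        self._set(p, S)

    def write(self, p):
        if self.state[p] == M:
            return  # already has write permission
        for q in range(len(self.state)):
            if q != p and self.state[q] != I:
                if self.state[q] == M:
                    self.writebacks += 1
                self.invalidations += 1  # other copies must be invalidated
                self._set(q, I)
        self._set(p, M)


def run(with_wrong_path_load):
    """Return (write-backs, invalidations, state transitions) for the scenario."""
    caches = ToyMSI(n=2)
    caches.write(0)              # correct-path store by cache 0
    if with_wrong_path_load:
        caches.read(1)           # wrong-path load by cache 1
    caches.write(0)              # second correct-path store by cache 0
    return caches.writebacks, caches.invalidations, caches.transitions


print(run(False))  # (0, 0, 1)
print(run(True))   # (1, 1, 5)
```

In this toy scenario a single wrong-path load adds one write-back, one invalidation and four state transitions, even though the loaded data is never used.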
WP effects in Multiple-CMPs – 0 / 2 • [Figure: a CMP node and a 4-CMP system] • We studied inclusive L1 and L2 caches • The L2 cache also tracks the coherence state of cache blocks in L1
WP effects in Multiple-CMPs – 1 / 2 • State transitions on replacement of an SO line in the L2 cache • [Figure: state diagram with states SO, S, OIV, OIN, I]
WP effects in Multiple-CMPs – 2 / 2 • State transitions when an MT line in the L2 cache receives a WP request • [Figure: state diagram with states MT, MO, M, S, SO]
Experimental Methodology • GEMS simulator – Wisconsin Multifacet Group • Based on Virtutech Simics • Aggressive out-of-order superscalar processor • Detailed shared-memory model • We evaluate a 16-processor SPARC V9 system (configured as 4 and as 8 CMPs) running unmodified Solaris 9 • Evaluated a 2-level MOSI directory coherence protocol • MOSI: Modified, Owned, Shared, Invalid • We track speculatively generated memory references and mark them as wrong-path once the branch misprediction is known
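The last bullet above (tracking speculative references and marking them once the misprediction is known) can be sketched as follows. This is only an illustration under stated assumptions, not the GEMS implementation: `MemRef`, `WrongPathTracker` and `resolve_branch` are hypothetical names, and each instruction is assumed to carry a monotonically increasing fetch sequence number, so every reference fetched after a mispredicted branch and before its resolution is on the wrong path.

```python
from dataclasses import dataclass


@dataclass
class MemRef:
    seq: int                  # fetch-order sequence number (assumed monotonic)
    addr: int                 # referenced address
    wrong_path: bool = False  # set once a misprediction is known


class WrongPathTracker:
    """Hypothetical tracker for speculatively generated memory references."""

    def __init__(self):
        self.refs = []

    def reference(self, seq, addr):
        # Record every reference when it is issued; its path is not yet known.
        self.refs.append(MemRef(seq, addr))

    def resolve_branch(self, branch_seq, mispredicted):
        # On a misprediction, every reference recorded so far that is
        # younger than the branch was fetched down the wrong path.
        if mispredicted:
            for ref in self.refs:
                if ref.seq > branch_seq:
                    ref.wrong_path = True


tracker = WrongPathTracker()
tracker.reference(1, 0x100)                   # before the branch
tracker.reference(3, 0x200)                   # speculative, after branch 2
tracker.reference(4, 0x240)                   # speculative, after branch 2
tracker.resolve_branch(2, mispredicted=True)  # branch 2 was mispredicted
tracker.reference(5, 0x300)                   # fetched after recovery
wrong_path_addrs = [ref.addr for ref in tracker.refs if ref.wrong_path]
print([hex(a) for a in wrong_path_addrs])     # ['0x200', '0x240']
```

Marking by sequence number rather than by issue order keeps the sketch valid when references issue out of order, which matters for an out-of-order processor model.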
Evaluation Results 1 / 5 -- L1 and L2 Cache Traffic • [Figure panels: 4 CMPs, 8 CMPs] • Total memory references increase by 16% and 14% for 4 and 8 CMPs, respectively. • L2 cache references increase by 35% and 36%, respectively. • For em3d, the number of L1 misses increases by as much as 70%.
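The slides do not spell out how these percentages are computed. A minimal sketch, assuming each one is the number of wrong-path events divided by the number of correct-path events of the same kind; the function name and the counts below are made up for illustration and are not measured data.

```python
def percent_increase(correct_path_count, wrong_path_count):
    """Extra events caused by wrong-path execution, as a percentage of
    the correct-path count (assumed definition, see the lead-in)."""
    if correct_path_count <= 0:
        raise ValueError("correct_path_count must be positive")
    return 100 * wrong_path_count / correct_path_count


# Invented counts: 160,000 wrong-path references on top of
# 1,000,000 correct-path references.
print(percent_increase(1_000_000, 160_000))  # 16.0
```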
Evaluation Results 2 / 5 -- Coherence Traffic 4 CMPs 8 CMPs • Internal -- 36% External -- 30%
Evaluation Results 3 / 5 -- L1 and L2 cache replacements • L1 -- 30%, L2 -- 17% • Potential Cache Performance Impact
Evaluation Results 4 / 5 -- Write Misses • 4 CMPs: on average 7% • 8 CMPs: on average 4%
Evaluation Results 5 / 5 -- Cache Line State Transitions • 4 CMPs • Internal: 2% to 13% • External: 1% to 9% • 8 CMPs • Internal: 2% to 17% • External: 1% to 10%
Conclusion • It is important to model WP memory references in cache-coherent multi-CMP systems • In multi-CMPs, WP references not only affect the performance of individual processors through prefetching and pollution, but also affect the performance of the entire system by increasing • cache coherence transactions • cache block state transitions • write-backs • invalidations • resource contention • For a workload with many cache-to-cache transfers, WP references can significantly affect coherence actions.
The End Thank You !