
Caching in multiprocessor systems





Presentation Transcript


  1. Caching in multiprocessor systems
  Tiina Niklander
  AMICT 2009, Petrozavodsk, 19.5.2009

  2. Background
  • More transistors on one chip
    • Multiple cores
    • Larger caches
    • Multiple on-chip caches
    • More functionality (more functional units, dedicated multimedia/deciphering cells, integrated GPU)
  • Multiple cores introduce new design questions:
    • Cache organization
    • Private vs. shared caches
    • Cache coherence

  3. Cache organization
  • Common organization:
    • L1 is private
    • Last-level cache is shared
  • With three levels:
    • L1: private
    • L2: private or shared?
    • L3: shared

  4. Private vs. shared cache
  • Fully private, fully shared, or partially shared
    • Partially shared L2: each pair of cores shares an L2 bank
    • Fully shared L2: all cores can access all of L2
  F. Sibai: On the performance benefits of sharing and privatizing second and third-level cache memories in homogeneous multi-core architectures. Microprocessors and Microsystems 32 (2008), pp. 405-412

  5. Shared cache
  • Simple coherence (just one copy of each block)
  • Different latencies, depending on CPU and cache-bank location
  • Competition for cache access (a core may wait for another core)
  M. Kandemir, F. Li, M.J. Irwin, S.W. Son: A Novel Migration-Based NUCA Design for Chip Multiprocessors. In SC2008. IEEE, 2008, pp. 1-12
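The migration idea behind NUCA ("non-uniform cache architecture") designs like the one cited above can be illustrated with a toy model. The bank layout, latency numbers, and migration policy below are my illustrative assumptions, not details of the Kandemir et al. design:

```python
# Toy NUCA model: a shared cache is split into banks, and each bank has a
# different access latency per core. A block migrates toward the bank
# closest to the core that touches it most often.
# All names, numbers, and the policy are illustrative assumptions.

LATENCY = [          # LATENCY[core][bank], in cycles (made-up numbers)
    [2, 4, 6, 8],    # core 0 is closest to bank 0
    [8, 6, 4, 2],    # core 1 is closest to bank 3
]

class NucaCache:
    def __init__(self):
        self.bank_of = {}      # block address -> current bank
        self.hits = {}         # block address -> {core: access count}

    def access(self, core, addr):
        bank = self.bank_of.setdefault(addr, 0)   # new blocks land in bank 0
        counts = self.hits.setdefault(addr, {})
        counts[core] = counts.get(core, 0) + 1
        # Migrate the block to the dominant accessor's cheapest bank.
        top_core = max(counts, key=counts.get)
        best_bank = min(range(4), key=lambda b: LATENCY[top_core][b])
        self.bank_of[addr] = best_bank
        return LATENCY[core][bank]                # latency paid by this access

cache = NucaCache()
cache.access(1, 0x100)          # first touch is slow: block sits in bank 0
lat = cache.access(1, 0x100)    # after migration: core 1's closest bank
```

After the first access the block has migrated to bank 3, so core 1's second access costs 2 cycles instead of 8.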

  6. Private cache
  • No access competition, smaller latencies
  • But coherence becomes an issue!
    • Same data in multiple caches -> invalidate on write
  • Cache partitioning
    • Design time: fixed partitioning
    • Run time:
      • Fixed partitioning (a configuration issue)
      • Dynamic partitioning (based on current need)
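The run-time dynamic partitioning mentioned above can be sketched as a simple policy: periodically re-divide a fixed pool of cache ways among cores in proportion to their recent miss counts. The proportional rule and all numbers are my illustrative assumptions:

```python
# Toy dynamic cache partitioning: a fixed pool of cache ways is
# re-divided among cores in proportion to their recent miss counts.
# The policy and the numbers are illustrative assumptions.

def repartition(total_ways, misses):
    """Give each core at least one way; split the rest by miss share."""
    n = len(misses)
    base = [1] * n                       # minimum allocation per core
    spare = total_ways - n
    total_misses = sum(misses) or 1
    extra = [spare * m // total_misses for m in misses]
    # Hand out ways lost to integer rounding, neediest cores first.
    left = spare - sum(extra)
    order = sorted(range(n), key=lambda i: -misses[i])
    for k in range(left):
        extra[order[k % n]] += 1
    return [b + e for b, e in zip(base, extra)]

# Core 2 misses heavily, so it receives most of the ways.
print(repartition(16, [100, 100, 600, 200]))   # -> [2, 2, 9, 3]
```

A real allocator would use utility curves (misses as a function of allocated ways) rather than raw miss counts, but the shape of the decision is the same.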

  7. Cache coherence
  • Protocols: MESI, MSI, MOSI, MOESI
  • Invalidation message: RFO (read for ownership)
  • Each cache snoops the bus to monitor memory operations
  • States: M - modified, (O - owned), E - exclusive, S - shared, I - invalid
  • In the state-compatibility matrix (from Wikipedia), Y marks state pairs that two caches may hold for the same line simultaneously; N marks pairs that are not allowed
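The textbook MESI protocol named above can be written down as a transition table over one cache line. This is a sketch: real protocols also carry bus signals and write-back actions, which are only noted in comments here, and the (I, local read) entry is simplified:

```python
# Minimal MESI state machine for a single cache line, as a transition table.
# Events: local reads/writes and snooped bus reads / reads-for-ownership (RFO).
# Bus signalling and write-back mechanics are omitted; this is a sketch.

M, E, S, I = "M", "E", "S", "I"

TRANSITIONS = {
    # (state, event) -> next state
    (I, "local_read"):  S,   # simplified: assume another sharer exists
    (I, "local_write"): M,   # issues an RFO on the bus first
    (S, "local_read"):  S,
    (S, "local_write"): M,   # the RFO invalidates the other copies
    (E, "local_read"):  E,
    (E, "local_write"): M,   # silent upgrade: no bus traffic needed
    (E, "snoop_read"):  S,
    (M, "local_read"):  M,
    (M, "local_write"): M,
    (M, "snoop_read"):  S,   # must write the dirty line back
    (S, "snoop_read"):  S,
    # A snooped RFO invalidates any local copy.
    (M, "snoop_rfo"): I,
    (E, "snoop_rfo"): I,
    (S, "snoop_rfo"): I,
    (I, "snoop_rfo"): I,
}

def step(state, event):
    return TRANSITIONS[(state, event)]

# Core A writes an Exclusive line; core B's snooped RFO then invalidates it.
state = step(E, "local_write")    # E -> M
state = step(state, "snoop_rfo")  # M -> I (after writing the data back)
```

The E state is what distinguishes MESI from MSI: a core that knows it holds the only copy can write without any bus transaction.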

  8. (Distributed) cooperative caches
  • Add a directory structure
    • Knows the data locations in the local caches
  • Cache-to-cache copying
    • When the data is in another cache (the directory locates it)
    • On eviction (store the block temporarily in another cache)
  E. Herrero, J. González, R. Canal: Distributed Cooperative Caching. In PACT'08. ACM, 2008, pp. 134-142
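Both cooperative mechanisms above, cache-to-cache copying and eviction spilling, can be sketched around a directory. The structure names and the least-loaded spill policy are my illustrative assumptions, not the design of Herrero et al.:

```python
# Sketch of a cooperative cache: a directory records which private cache
# holds each block, enabling cache-to-cache copies on a miss, and a
# victim is "spilled" into a neighbouring cache instead of being dropped.
# All names and the spill policy are illustrative assumptions.

class Directory:
    def __init__(self, n_caches, capacity):
        self.caches = [set() for _ in range(n_caches)]
        self.capacity = capacity

    def owner(self, addr):
        for i, c in enumerate(self.caches):
            if addr in c:
                return i
        return None                       # block only in memory

    def fetch(self, core, addr):
        """Returns where the data came from: 'local', 'cache', or 'memory'."""
        if addr in self.caches[core]:
            return "local"
        src = self.owner(addr)
        self.insert(core, addr)
        return "memory" if src is None else "cache"   # cache-to-cache copy

    def insert(self, core, addr):
        cache = self.caches[core]
        if len(cache) >= self.capacity:
            victim = cache.pop()
            # Cooperative eviction: spill into the least-loaded neighbour.
            others = [i for i in range(len(self.caches)) if i != core]
            spare = min(others, key=lambda i: len(self.caches[i]))
            if len(self.caches[spare]) < self.capacity:
                self.caches[spare].add(victim)
        cache.add(addr)
```

Usage: after core 0 fetches a block from memory, core 1's fetch of the same block is satisfied by a cache-to-cache copy rather than another memory access.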

  9. New improvement ideas for cache performance 1/2
  • Split the cache for different tasks
  • Dynamically allocate cache areas
  • Software-controlled eviction
    • GOAL: a thread moves unneeded but strongly shared data to the shared cache, to improve the performance of other threads
    • A new evict instruction tells the processor to move some data from the private L1 or L2 to the shared L3
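The effect of such an evict hint can be shown with a toy two-level model: a producer pushes a finished block down to the shared cache, so the consumer gets a cheap shared-cache hit instead of a miss. All structures here are my illustrative assumptions, not the proposed hardware:

```python
# Sketch of the "evict hint" idea: a thread that is done with a heavily
# shared block demotes it from its private cache to the shared last-level
# cache, so other cores find it there without a coherence miss.
# The model and all names are illustrative assumptions.

class Hierarchy:
    def __init__(self):
        self.private = {0: set(), 1: set()}   # per-core private caches
        self.shared = set()                   # shared last-level cache

    def load(self, core, addr):
        if addr in self.private[core]:
            return "private_hit"
        if addr in self.shared:
            self.private[core].add(addr)
            return "shared_hit"               # cheap: no other core disturbed
        self.private[core].add(addr)
        return "miss"

    def evict_hint(self, core, addr):
        """Software-triggered demotion from private to shared cache."""
        self.private[core].discard(addr)
        self.shared.add(addr)

h = Hierarchy()
h.load(0, 0xB)            # core 0 produces the block: a miss
h.evict_hint(0, 0xB)      # done with it; push it down to the shared cache
result = h.load(1, 0xB)   # the consumer finds it there: "shared_hit"
```

Without the hint, core 1's load would miss and (in real hardware) trigger a coherence transaction against core 0's private copy.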

  10. New improvement ideas for cache performance 2/2
  • Helper threads
    • GOAL: an additional thread executes parts of the code ahead of the actual thread, to 'prefetch' data into the cache
  • Generate memory traces for the programmer
    • For tuning the software performance
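The helper-thread idea can be modelled without real concurrency: simulate a helper that touches the address trace a few steps ahead of the main thread and count the misses the main thread still sees. Timing is abstracted away, and the run-ahead distance and all names are my illustrative assumptions:

```python
# Toy model of helper-thread prefetching: the helper runs ahead of the
# main thread over the same address trace and warms the cache, so the
# main thread hits where it would otherwise miss. The run-ahead distance
# and the whole model are illustrative assumptions.

def run(trace, cache_size, helper_distance=0):
    cache, misses = set(), 0
    for i, addr in enumerate(trace):
        # Helper thread has already touched the next few addresses
        # (this sketch never evicts once the cache is full).
        for ahead in trace[i + 1:i + 1 + helper_distance]:
            if len(cache) < cache_size:
                cache.add(ahead)
        if addr not in cache:
            misses += 1
            cache.add(addr)
    return misses

trace = list(range(8)) * 2                   # 8 distinct blocks, touched twice
cold = run(trace, 16)                        # no helper: 8 cold misses
warm = run(trace, 16, helper_distance=2)     # helper: only the first access misses
```

Here `cold` is 8 and `warm` is 1: the helper cannot hide the very first miss (it starts from the same point), but it covers every later one.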

  11. Conclusion
  • Focus is on fine-tuning cache performance
    • Cache coherence itself was solved earlier
    • Coherence is not always used (some systems allow non-coherent operation)
  • L2 and L3 caches
    • Shared or private
    • Cache partitioning
  • Support for software-based improvements
    • Eviction hints
    • Traces
    • Prefetching (e.g. with a helper thread)

  12. References
  • S. Fide, S. Jenks: Proactive use of shared L3 caches to enhance cache communications in multi-core processors. IEEE Computer Architecture Letters, vol. 7 (2008), pp. 57-60
  • E. Herrero, J. González, R. Canal: Distributed Cooperative Caching. In Conf. on Parallel Architectures and Compilation Techniques, PACT'08. ACM, 2008, pp. 134-142
  • M. Kandemir, F. Li, M.J. Irwin, S.W. Son: A Novel Migration-Based NUCA Design for Chip Multiprocessors. In Proc. of the 2008 ACM/IEEE Conf. on Supercomputing. IEEE, 2008, pp. 1-12
  • L. Peng, et al.: Memory hierarchy performance measurement of commercial dual-core desktop processors. Journal of Systems Architecture 54 (2008), pp. 816-828
  • F. Sibai: On the performance benefits of sharing and privatizing second and third-level cache memories in homogeneous multi-core architectures. Microprocessors and Microsystems 32 (2008), pp. 405-412
  • J. Zhang, X. Fan, S.H. Liu: A Pollution Alleviate L2 Cache Replacement Policy for Chip Multiprocessor Architecture. In Int. Conf. on Networking, Architecture and Storage. IEEE, 2008, pp. 310-316
