
Various techniques around snoop caches


Presentation Transcript


  1. Various techniques around snoop caches. AMANO, Hideharu, Keio University. hunga@am.ics.keio.ac.jp. Textbook pp. 40-60.

  2. The role of a cache in bus connected multiprocessors • Improving the access latency of the shared memory • Reducing the bus congestion • Shared Cache: a single cache is shared by the PUs. • Private Cache: a cache is provided for each PU.

  3. Private (Snoop) Cache. Each PU provides its own private (snoop) cache; the caches are connected to the main memory through a large-bandwidth shared bus.

  4. Cache consistency (coherence) problem. Copies of the same line (A, A', A) held in different caches are not the same.

  5. Cache Consistency Protocol • Each cache keeps consistency by monitoring (snooping) bus transactions. • Write Through: every write also updates the shared memory; the frequent bus accesses degrade performance. • Write Back, Invalidate type: Basic (Synapse), Illinois, Berkeley. • Write Back, Update (Broadcast) type: Firefly, Dragon.

  6. Write Through Cache (Invalidation: data read out). States: I: Invalidated, V: Valid. Two PUs read the same line from the main memory; both copies become V.

  7. Write Through Cache (Invalidate: data written into). A PU writes; the write goes through to the shared memory, and the other caches, monitoring (snooping) the bus, invalidate their copies (V → I).

  8. Write Through Cache (Invalidate: direct write). The target line does not exist in the writing PU's cache; the data is written directly to the shared memory, and the other caches snoop the bus and invalidate their copies.

  9. Write Through Cache (Invalidate: fetch on write). The target line does not exist in the writing PU's cache; the line is first fetched, then written, and the other caches snoop the bus and invalidate their copies.

  10. Write Through Cache (Update). A PU writes; the other caches snooping the bus update their copies instead of invalidating them, so all copies stay V.
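
A minimal sketch of the two write-through snoop policies above (invalidate vs. update). The class and method names are my own, not from the textbook; the point is only that every CPU write goes through to memory and onto the bus, and each snooping cache either drops or overwrites its copy.

```python
VALID, INVALID = "V", "I"

class SharedBus:
    def __init__(self):
        self.caches = []
    def broadcast_write(self, addr, data, source):
        for cache in self.caches:
            if cache is not source:
                cache.snoop_write(addr, data)

class WriteThroughSnoopCache:
    def __init__(self, policy, bus, memory):
        self.policy = policy            # "invalidate" or "update"
        self.bus, self.memory = bus, memory
        self.lines = {}                 # address -> (state, data)
        bus.caches.append(self)

    def cpu_write(self, addr, data):
        self.lines[addr] = (VALID, data)
        self.memory[addr] = data        # write through to the shared memory
        self.bus.broadcast_write(addr, data, source=self)

    def snoop_write(self, addr, data):
        # Another cache's write was observed on the shared bus.
        if self.lines.get(addr, (INVALID, None))[0] == VALID:
            if self.policy == "invalidate":
                self.lines[addr] = (INVALID, None)
            else:                       # "update" policy: refresh the copy
                self.lines[addr] = (VALID, data)
```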

  11. Common Cache Structure (Direct Map). The address (e.g., 0011010) is split into a tag (0011) and an index (010); the index selects one entry of the cache directory (tag memory), and a hit is declared when the stored tag equals the address tag, in which case the data comes from the cache instead of the main memory.
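
A small sketch of the direct-mapped lookup on this slide. The 3-bit index width matches the 0011|010 example; everything else (names, array layout) is illustrative.

```python
INDEX_BITS = 3

def split_address(addr):
    index = addr & ((1 << INDEX_BITS) - 1)    # low bits select the directory entry
    tag = addr >> INDEX_BITS                  # remaining bits are the tag
    return tag, index

def lookup(directory, data_array, addr):
    tag, index = split_address(addr)
    if directory[index] is not None and directory[index] == tag:
        return True, data_array[index]        # hit: data supplied by the cache
    return False, None                        # miss: fetch from main memory

# Example matching the slide: address 0011010 has tag 0011 and index 010.
directory = [None] * (1 << INDEX_BITS)
data_array = [None] * (1 << INDEX_BITS)
directory[0b010] = 0b0011
data_array[0b010] = "data"
print(lookup(directory, data_array, 0b0011010))   # (True, 'data')
```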

  12. The structure of a snoop cache. The directory (tag memory) is duplicated or dual-ported so that it can be accessed simultaneously from both sides: bus transactions can be checked without disturbing the access from the CPU.

  13. The problem of the write through cache • In uniprocessors, the performance of a write through cache with well-designed write buffers is comparable to that of a write back cache. • In bus connected multiprocessors, however, the write through cache suffers from bus congestion.

  14. Basic Protocol. States attached to each line: C: Clean (consistent with the shared memory), D: Dirty, I: Invalidated. Two PUs read the same line; both copies become C.

  15. Basic Protocol (a PU writes the data). The writing PU's copy becomes D; an invalidation signal, an address-only bus transaction, invalidates the copies in the other caches (C → I).

  16. Basic Protocol (a PU reads out). The reading PU fetches the line (I → C); a copy held in D by another cache must first be written back to the shared memory.

  17. Basic Protocol (a PU writes into the line again). The writing PU's copy becomes D again, and the copies in the other caches are invalidated (→ I).
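
A rough sketch of the basic protocol of slides 14-17, using the C/D/I line states. Class and method names are my own; fetching the line on a write miss is omitted for brevity. The protocol actions follow the slides: a write sends an address-only invalidation, and a dirty line is written back before another PU reads it.

```python
C, D, I = "C", "D", "I"        # Clean, Dirty, Invalidated

class SharedBus:
    def __init__(self):
        self.caches = []
    def invalidate(self, addr, source):          # address-only transaction
        for c in self.caches:
            if c is not source:
                c.snoop_invalidate(addr)
    def read_miss(self, addr, source):
        for c in self.caches:
            if c is not source:
                c.snoop_read_miss(addr)

class BasicProtocolCache:
    def __init__(self, bus, memory):
        self.bus, self.memory = bus, memory
        self.state, self.data = {}, {}           # address -> state / value
        bus.caches.append(self)

    def read(self, addr):
        if self.state.get(addr, I) == I:
            self.bus.read_miss(addr, self)       # a dirty owner writes back first
            self.data[addr] = self.memory.get(addr)
            self.state[addr] = C
        return self.data[addr]

    def write(self, addr, value):
        if self.state.get(addr, I) != D:
            self.bus.invalidate(addr, self)      # kill the other copies
        self.state[addr] = D                     # memory is updated only at write back
        self.data[addr] = value

    def snoop_invalidate(self, addr):
        if addr in self.state:
            self.state[addr] = I

    def snoop_read_miss(self, addr):
        if self.state.get(addr) == D:
            self.memory[addr] = self.data[addr]  # write back before the other PU reads
            self.state[addr] = C
```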

  18. Illinois Protocol. States for each line: CE: Clean Exclusive, CS: Clean Sharable, DE: Dirty Exclusive, I: Invalidated. A line held by only one cache is CE; a line shared by several caches is CS.

  19. Illinois Protocol (the role of CE). When a PU writes into a line held in CE, the line simply changes to DE; since no other cache can hold a copy, no bus transaction is needed.
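
A small sketch of the point this slide makes: under the Illinois protocol a write to a CE line is a silent upgrade to DE, while a write to a shared or invalid line still needs the bus. The BusStub and function names are illustrative placeholders, not from the textbook.

```python
CE, CS, DE, I = "CE", "CS", "DE", "I"

class BusStub:
    """Placeholder for the shared bus; the real transactions are omitted."""
    def invalidate(self, addr): pass          # address-only invalidation
    def read_exclusive(self, addr): pass      # fetch the line, invalidating others

def illinois_write(state, addr, bus):
    """Return the new state of the written line."""
    if state in (CE, DE):
        return DE                 # silent upgrade: no bus transaction at all
    if state == CS:
        bus.invalidate(addr)      # other sharers must be invalidated first
        return DE
    bus.read_exclusive(addr)      # state == I: write miss, fetch exclusively
    return DE
```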

  20. Berkeley Protocol. States: OS: Owned Sharable, OE: Owned Exclusive, US: Unowned Sharable, I: Invalidated. Ownership means responsibility for the write back. Two PUs read the same line; both copies become US.

  21. Berkeley Protocol (a PU writes into). The writing PU's copy becomes OE and the other copies are invalidated (US → I); the invalidation is done as in the basic protocol.

  22. Berkeley Protocol (another PU reads out). An inter-cache transfer occurs: the owner supplies the line directly and changes from OE to OS, while the reader's copy becomes US. The US copy is not consistent with the shared memory, but it is not required to be written back, because the owner keeps that responsibility.
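
A sketch of the ownership idea of slides 20-22: the owner answers read misses with an inter-cache transfer and is the only cache that must write the line back, so an unowned (US) copy can simply be dropped. Names are illustrative; `write_back` stands for whatever memory-update action the real cache would perform.

```python
OS, OE, US, I = "OS", "OE", "US", "I"   # Owned/Unowned, Sharable/Exclusive

def snoop_read(state):
    """Reaction of one cache when it sees another PU's read on the bus.
    Returns (new_state, supplies_data)."""
    if state in (OE, OS):
        return OS, True          # owner performs the inter-cache transfer
    return state, False          # non-owners (US, I) stay quiet

def replace(state, write_back):
    """Replacement: only an owned line has to be written back."""
    if state in (OE, OS):
        write_back()             # the owner is responsible for memory consistency
    return I                     # a US copy may differ from memory, but its owner
                                 # still holds the up-to-date data, so just drop it
```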

  23. Firefly Protocol. States: CE: Clean Exclusive, CS: Clean Sharable, DE: Dirty Exclusive; I (Invalidated) is not used! When another PU reads a line held in CE, both copies become CS.

  24. Firefly Protocol (writing into a CS line). All caches holding the line and the shared memory are updated, like an update-type write through cache; the line stays CS.

  25. Firefly Protocol (the role of CE). As in the Illinois protocol, writing into a CE line requires no bus transaction; the line changes to DE.
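
A sketch of the Firefly write behaviour of slides 23-25: a write to a shared line is broadcast so every copy and the shared memory are updated, while a write to an exclusive line stays local. The BusStub and function names are illustrative.

```python
CE, CS, DE = "CE", "CS", "DE"            # no Invalidated state in Firefly

class BusStub:
    def broadcast_update(self, addr, data):
        pass                              # all sharers snoop and update their copies

def firefly_write(state, addr, data, bus, memory):
    """Return the new state of the written line."""
    if state == CS:
        bus.broadcast_update(addr, data)  # update every cached copy
        memory[addr] = data               # the shared memory is updated too
        return CS                         # the line stays Clean Sharable
    return DE                             # CE/DE: local write, no bus traffic
```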

  26. Dragon Protocol. States: OS: Owned Sharable, OE: Owned Exclusive, US: Unowned Sharable, UE: Unowned Exclusive. Ownership means responsibility for the write back. A line read by a single PU is UE; when another PU reads it, both copies become US.

  27. Dragon Protocol (a PU writes into a shared line). Only the corresponding cache lines are updated, not the shared memory; the writer's copy becomes OS. The US copies are not required to be written back.

  28. Dragon Protocol (another PU reads out). The owner supplies the line by a direct inter-cache transfer, as in the Berkeley protocol (OE → OS; the reader's copy becomes US).

  29. Dragon Protocol (the role of UE). Writing into a UE line needs no bus transaction, like CE in the Illinois protocol; the line changes to OE.
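
For contrast with the Firefly sketch above, here is the corresponding Dragon write sketch (slides 26-29): the update is sent only to the other caches, the shared memory is not touched, and the writer becomes the owner responsible for the eventual write back. Names are illustrative.

```python
OS, OE, US, UE = "OS", "OE", "US", "UE"

class BusStub:
    def broadcast_update(self, addr, data):
        pass                               # only the other caches update their copies

def dragon_write(state, addr, data, bus):
    """Return the new state of the written line."""
    if state in (US, OS):
        bus.broadcast_update(addr, data)   # the shared memory is NOT updated
        return OS                          # the writer becomes (or stays) the owner
    return OE                              # UE/OE: no bus transaction needed
```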

  30. MOESI Protocol Class. Each line state is a combination of the Valid, Owned, and Exclusive attributes: M: Modified, O: Owned, E: Exclusive, S: Sharable, I: Invalid.

  31. MOESI protocol class • Basic: MSI • Illinois: MESI • Berkeley: MOSI • Firefly: MES • Dragon: MOES. A theoretically well-defined model, although the details of a cache are not characterized in the model.
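
The mapping on this slide, written out as data so the state subsets can be compared directly. Only the MOESI letters each protocol uses are listed; the keys are the protocol names from the slide.

```python
MOESI_SUBSET = {
    "Basic (Synapse)": {"M", "S", "I"},        # MSI
    "Illinois":        {"M", "E", "S", "I"},   # MESI
    "Berkeley":        {"M", "O", "S", "I"},   # MOSI
    "Firefly":         {"M", "E", "S"},        # MES  (no Invalid state)
    "Dragon":          {"M", "O", "E", "S"},   # MOES (no Invalid state)
}
```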

  32. Invalidate vs. Update • Drawback of invalidate protocols: frequent writes to shared data cause bus congestion → ping-pong effect. • Drawback of update protocols: once a line is shared, every write must use the shared bus. • Improvements: Competitive Snooping, Variable Protocol Cache.

  33. Ping-pong effect (a PU writes into). The writer's copy becomes D (C → D) and the other copy is invalidated (C → I).

  34. Ping-pong effect (the other PU reads out). The dirty copy is written back (D → C) and the reader fetches the line (I → C).

  35. Ping-pong effect (the other PU writes again). Its copy becomes D (C → D) and the first PU's copy is invalidated (C → I).

  36. Ping-pong effect (the first PU reads again). The cache line goes back and forth between the caches iteratively → ping-pong effect.

  37. The drawback of update protocols (Firefly protocol). Once a line becomes CS, every write to it is sent on the bus even if the other copy is not used any more. False sharing also causes unnecessary bus transactions.

  38. Competitive Snooping. A line is updated n times, and then the other copies are invalidated (CS → I). The performance is degraded in some cases.
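
A sketch of the competitive snooping idea: a per-line budget tolerates a fixed number of broadcast updates, after which the copy is invalidated so an unused line stops consuming bus bandwidth. The value of N, the class name, and the budget-reset-on-read rule are illustrative assumptions.

```python
N = 4                                   # updates tolerated before invalidating

class CompetitiveLine:
    def __init__(self, data):
        self.data = data
        self.valid = True
        self.remaining_updates = N

    def snoop_update(self, data):
        """Called when another cache's write to this line is seen on the bus."""
        if not self.valid:
            return
        if self.remaining_updates > 0:
            self.data = data            # behave like an update protocol
            self.remaining_updates -= 1
        else:
            self.valid = False          # switch to invalidate behaviour

    def local_read(self):
        """A local reference shows the copy is still useful: reset the budget."""
        self.remaining_updates = N
        return self.data if self.valid else None
```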

  39. Write Once (Goodman Protocol). On the first write, the main memory is updated together with the invalidation of the other copies; only this first written data is transferred to the main memory, and subsequent writes stay in the cache.

  40. Read Broadcast (Berkeley). A write invalidates the other copies just as in the basic protocol (US → I); the writer's copy becomes OE.

  41. Read Broadcast. When the line is read again, the read data on the bus is also broadcast to the other invalidated caches (OE → OS; I → US).

  42. Cache Injection. The same line read on the bus is injected into the other caches as well (I → US).
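
A small sketch of how a snoop handler might implement read broadcast and cache injection: a copy held in I is refreshed from the read response seen on the bus, and with injection a line is taken in even when no copy was present. The function and flag names are illustrative assumptions, not from the textbook.

```python
US, I = "US", "I"

def snoop_read_response(lines, addr, data, injection=False):
    """lines: address -> (state, data) for one snoop cache."""
    if addr in lines and lines[addr][0] == I:
        lines[addr] = (US, data)        # read broadcast: refresh the invalidated copy
    elif addr not in lines and injection:
        lines[addr] = (US, data)        # cache injection: take the line anyway
    return lines
```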

  43. New Keio Protocol. The goal is to minimize communication outside of the chip. (Figure: a CE line becomes CS when it is shared inside the chip, without an external bus transaction.)

  44. New Keio Protocol. A DS (Dirty Shared) state avoids write backs as much as possible: when another PU reads a DE line, both copies become DS (DE → DS, I → DS) without updating the shared memory.

  45. Zombie Cache I. (Figure: a DEO line is replaced from an on-chip snoop cache and must leave the chip, over the large-bandwidth shared bus, to the main memory.)

  46. Zombie Cache II. (Figure: the same replacement, but an on-chip Z-2 Cache is placed between the snoop caches and the shared bus.)

  47. Cache systems for on-chip multiprocessors • Shared caches/shared registers are available. • High-bandwidth buses/switches are available. • A cache hierarchy is required. • Server/high performance: Stanford Hydra project, IBM Power4, Sun MAJC, Compaq Piranha. • Embedded chip multiprocessors.

  48. Possible cache structures (Stanford Hydra Project), from "Evaluation of Design Alternatives for a Multiprocessor Microprocessor", ISCA '96. Three alternatives are compared: shared L1, shared L2, and memory shared, built from four processors with 16KB 2-way L1 caches, 512KB (per-processor) or 2MB 2-way (shared) L2 caches, 1x4 and 4x4 crossbars, and 128-bit L2/system buses.

  49. Stanford’s Hydra Considerations in the design of Hydra CSL-TR-98-749, CPU CPU CPU CPU L1 I Cache L1D Cache L1 I Cache L1D Cache L1 I Cache L1D Cache L1 I Cache L1D Cache Mem.Cont. Mem.Cont. Mem.Cont. Mem.Cont. Write Through Bus(64b) Read/Replace Bus(256b) On-chipL2Cache Off-chipL3CacheInt. Rambus Memory interface I/O Bus Interface Cache SRAM Array DRAM Main Memory I/O

  50. Daytona (Lucent) • MESI protocol • RISC + DSP • Pipelined operation of the bus and memory controller • 128-bit STBus • 0.25 μm CMOS, 4.5 mm × 6 mm (small chip).
