Explore various techniques and roles of snoop caches in bus-connected multiprocessors, including shared and private cache concepts. Learn about cache consistency protocols and different cache update methods. Study write-through and write-back protocols of both the invalidate and the update type.
Various techniques around snoop caches. AMANO, Hideharu, Keio University, hunga@am.ics.keio.ac.jp. Textbook pp. 40-60
The role of a cache in bus-connected multiprocessors • Improving the access latency of the shared memory • Reducing bus congestion • Shared cache: the cache itself is shared. • Private cache: a cache is provided for each PU.
Private (Snoop) Cache: each PU provides its own private cache. [Figure: four PUs, each with a snoop cache, connected to the main memory by a large-bandwidth shared bus]
Cache consistency (coherence) problem: the data in each cache is not the same. [Figure: PUs with caches on a large-bandwidth shared bus with main memory; the copies of one line are shown as A, A' and A]
Cache Consistency Protocol • Each cache keeps consistency by monitoring (snooping) bus transactions. • Write Through: every write updates the shared memory; the frequent bus accesses degrade performance. • Write Back, Invalidate: Basic (Synapse), Illinois, Berkeley • Write Back, Update (Broadcast): Firefly, Dragon
Write-Through Cache (Invalidation: data is read out). States: I: Invalidated, V: Valid. [Figure: two PUs read the same line over the large-bandwidth shared bus; both copies are in state V]
Write-Through Cache (Invalidation: data is written). States: I: Invalidated, V: Valid. [Figure: a PU writes; the other cache monitors (snoops) the bus and its copy goes V → I, while the writer's copy stays V]
Write-Through Cache (Invalidation: Direct Write). The target cache line does not exist in the cache. States: I: Invalidated, V: Valid. [Figure: a PU writes directly to the main memory; the other cache monitors (snoops) the bus and its copy goes V → I]
Write-Through Cache (Invalidation: Fetch on Write). The cache line does not exist in the target cache, so the line is fetched first and then written. States: I: Invalidated, V: Valid. [Figure: a PU fetches and writes; the other cache monitors (snoops) the bus; state labels shown: I, V, V]
Write-Through Cache (Update). States: I: Invalidated, V: Valid. [Figure: a PU writes; the other cache monitors (snoops) the bus and its copy is updated, so both copies stay V]
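The write-through variants on the preceding slides can be compared with a small simulation. This is only a sketch under simplifying assumptions: one word per line, a line that is absent from `lines` stands for state I, and the class and parameter names (`Bus`, `WriteThroughCache`, `policy`, `fetch_on_write`) are hypothetical, not taken from the slides.

```python
class Bus:
    """Shared bus plus main memory; it broadcasts every write to the snoopers."""

    def __init__(self):
        self.memory = {}
        self.caches = []
        self.writes = 0  # number of write transactions seen on the bus

    def write(self, source, addr, value):
        self.writes += 1
        self.memory[addr] = value  # write through: memory is always updated
        for cache in self.caches:
            if cache is not source:
                cache.snoop_write(addr, value)


class WriteThroughCache:
    def __init__(self, bus, policy="invalidate", fetch_on_write=False):
        self.bus = bus
        self.policy = policy  # "invalidate" or "update"
        self.fetch_on_write = fetch_on_write
        self.lines = {}  # addr -> value; present means V, absent means I
        bus.caches.append(self)

    def read(self, addr):
        if addr not in self.lines:  # miss: fetch from the shared memory
            self.lines[addr] = self.bus.memory.get(addr, 0)
        return self.lines[addr]

    def write(self, addr, value):
        # Hit, or allocation on a write miss (fetch on write). With direct
        # write, a missing line is left out of the cache.
        if addr in self.lines or self.fetch_on_write:
            self.lines[addr] = value
        self.bus.write(self, addr, value)

    def snoop_write(self, addr, value):
        if addr in self.lines:
            if self.policy == "update":
                self.lines[addr] = value  # copy stays V with the new data
            else:
                del self.lines[addr]  # V -> I
```

In both policies every write costs one bus transaction, which is the congestion problem of write-through caches on a shared bus.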
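Common Cache Structure (Direct Map). [Figure: the address 0011010 is split into the tag 0011 and the index 010; the index selects an entry of the cache directory (tag memory), the stored tag 0011 is compared (=) with the address tag, and on a match (Yes) the data is read from the cache; the main memory is shown beside the cache]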
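The direct-map lookup can be sketched in a few lines. The sketch assumes that the 7-bit address 0011010 of the slide splits into a 4-bit tag (0011) and a 3-bit index (010); the names `split_address` and `DirectMappedCache` are hypothetical.

```python
INDEX_BITS = 3  # assumption: 3-bit index, as in the 0011|010 example


def split_address(addr):
    """Split a line address into (tag, index)."""
    index = addr & ((1 << INDEX_BITS) - 1)
    tag = addr >> INDEX_BITS
    return tag, index


class DirectMappedCache:
    def __init__(self):
        entries = 1 << INDEX_BITS
        self.directory = [None] * entries  # tag memory
        self.data = [None] * entries

    def fill(self, addr, value):
        tag, index = split_address(addr)
        self.directory[index] = tag
        self.data[index] = value

    def lookup(self, addr):
        """Return (hit, data); the stored tag is compared with the address tag."""
        tag, index = split_address(addr)
        if self.directory[index] == tag:
            return True, self.data[index]
        return False, None
```

Two addresses with the same index but different tags compete for the same entry, which is why the tag comparison is needed.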
The structure of a snoop cache. The directory can be accessed simultaneously from both sides, so bus transactions can be checked without interfering with accesses from the CPU. [Figure: the cache memory entity sits between the shared bus and the CPU, with a directory on each side; both directories hold the same contents (dual port)]
The Problem of the Write-Through Cache • In uniprocessors, the performance of a write-through cache with well-designed write buffers is comparable to that of a write-back cache. • In bus-connected multiprocessors, however, the write-through cache suffers from bus congestion.
Basic Protocol. States attached to each line: C: Clean (consistent with the shared memory), D: Dirty, I: Invalidated. [Figure: two PUs read the same line; both copies are in state C]
Basic Protocol (a PU writes the data). The invalidation signal is an address-only transaction. [Figure: the writer's copy goes C → D; the invalidation signal turns the other copy C → I]
Basic Protocol (a PU reads out). [Figure: the reader's copy goes I → C and the dirty copy goes D → C]
Basic Protocol (a PU writes again). [Figure: the writer's copy goes I → D and the other copy goes D → I]
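The C/D/I transitions of the basic protocol can be written as a small state machine for a single line. This is a sketch: the state changes follow the slides, while the names of the bus transactions in `bus_log` (`read`, `write_back`, `invalidate`, `read_invalidate`) are assumptions made for illustration.

```python
class BasicProtocol:
    """State of one line in each cache: 'C' (clean), 'D' (dirty), 'I' (invalidated)."""

    def __init__(self, n_caches):
        self.state = ["I"] * n_caches
        self.bus_log = []

    def _write_back_dirty_copy(self):
        # A dirty copy is written back to the shared memory and becomes clean.
        for pu, s in enumerate(self.state):
            if s == "D":
                self.bus_log.append("write_back")
                self.state[pu] = "C"

    def read(self, pu):
        if self.state[pu] != "I":
            return  # hit: no bus transaction
        self.bus_log.append("read")
        self._write_back_dirty_copy()
        self.state[pu] = "C"

    def write(self, pu):
        if self.state[pu] == "D":
            return  # hit on a dirty line: no bus transaction
        if self.state[pu] == "I":
            # Write miss: fetch the line and invalidate the other copies.
            self.bus_log.append("read_invalidate")
            self._write_back_dirty_copy()
        else:
            self.bus_log.append("invalidate")  # address-only transaction
        self.state = ["I"] * len(self.state)
        self.state[pu] = "D"
```

Replaying the slide sequence (two reads, a write, a read by the other PU, and further writes) reproduces the state pairs shown in the figures.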
Illinois's Protocol. States for each line: CE: Clean Exclusive, CS: Clean Sharable, DE: Dirty Exclusive, I: Invalidated. [Figure: PUs with snoop caches on the large-bandwidth shared bus; state labels shown: CE, CS, CS]
Illinois's Protocol (the role of CE). CE: Clean Exclusive, CS: Clean Sharable, DE: Dirty Exclusive, I: Invalidated. [Figure: a PU writes into a CE line, which goes CE → DE]
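The benefit of the CE state is that a write to a line held by only one cache needs no bus transaction. The sketch below models a single line under that rule; the class name `IllinoisLine` and the single `bus_transactions` counter are assumptions, and details such as which cache supplies the data are not modeled.

```python
class IllinoisLine:
    """One line under the CE/CS/DE/I states of the slides."""

    def __init__(self, n_caches):
        self.state = ["I"] * n_caches
        self.bus_transactions = 0

    def read(self, pu):
        if self.state[pu] != "I":
            return  # hit
        self.bus_transactions += 1
        shared = False
        for other, s in enumerate(self.state):
            if other != pu and s != "I":
                shared = True
                self.state[other] = "CS"  # a CE or DE holder becomes sharable
        self.state[pu] = "CS" if shared else "CE"

    def write(self, pu):
        s = self.state[pu]
        if s == "CE":
            self.state[pu] = "DE"  # CE -> DE without any bus transaction
            return
        if s == "DE":
            return
        self.bus_transactions += 1  # invalidate the other copies
        self.state = ["I"] * len(self.state)
        self.state[pu] = "DE"
```

In the basic protocol the same read-then-write sequence on private data would cost an extra invalidation, because a clean line cannot be known to be exclusive.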
Berkeley's protocol. States: OS: Owned Sharable, OE: Owned Exclusive, US: Unowned Sharable, I: Invalidated. Ownership → responsibility for write-back. [Figure: two PUs read the same line; both copies are in state US]
Berkeley's protocol (a PU writes). Invalidation is done as in the basic protocol. [Figure: the writer's copy goes US → OE and the other copy goes US → I]
Berkeley's protocol (a PU reads). Inter-cache transfer occurs! The line with US does not need to be written back; in this case, the line with US is not consistent with the shared memory. [Figure: the owner's copy goes OE → OS and the reader's copy goes I → US]
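Ownership in Berkeley's protocol can be sketched as follows. The state changes follow the slides; the `replace` method, the `memory_up_to_date` flag and the log strings are assumptions added to show when a write-back is actually needed.

```python
class BerkeleyLine:
    """One line under the OS/OE/US/I states of the slides."""

    def __init__(self, n_caches):
        self.state = ["I"] * n_caches
        self.memory_up_to_date = True
        self.log = []

    def read(self, pu):
        if self.state[pu] != "I":
            return  # hit
        owner = next(
            (i for i, s in enumerate(self.state) if s in ("OE", "OS")), None
        )
        if owner is None:
            self.log.append("memory_supplies")
        else:
            # Inter-cache transfer: the owner supplies the line and the
            # shared memory is not updated.
            self.log.append("cache_to_cache")
            self.state[owner] = "OS"
        self.state[pu] = "US"

    def write(self, pu):
        if self.state[pu] != "OE":
            self.log.append("invalidate")  # as in the basic protocol
            self.state = ["I"] * len(self.state)
        self.state[pu] = "OE"
        self.memory_up_to_date = False

    def replace(self, pu):
        # Only the owner is responsible for the write-back.
        if self.state[pu] in ("OE", "OS"):
            self.log.append("write_back")
            self.memory_up_to_date = True
        self.state[pu] = "I"
```

Replacing a US copy is free, while replacing the owned copy triggers the only write-back in the sequence.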
Firefly protocol. States: CE: Clean Exclusive, CS: Clean Sharable, DE: Dirty Exclusive. I (Invalidated) is not used! [Figure: a CE copy goes CE → CS when another cache obtains the line in state CS]
Firefly protocol (a PU writes into a CS line). All caches and the shared memory are updated, as in an update-type write-through cache. [Figure: a PU writes; both copies stay CS]
Firefly protocol (the role of CE). As in Illinois's protocol, writing into a CE line does not require a bus transaction. [Figure: a PU writes into a CE line, which goes CE → DE]
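The Firefly rules above can be sketched for a single line. Since there is no I state, `None` stands for "line not present". The sketch is simplified: replacement is not modeled, and a CS line never returns to an exclusive state; the names `FireflyLine` and `bus_updates` are hypothetical.

```python
class FireflyLine:
    """One line under the CE/CS/DE states of the slides (no I state)."""

    def __init__(self, n_caches):
        self.state = [None] * n_caches  # None: the line is not in this cache
        self.bus_updates = 0  # writes that had to be broadcast on the bus

    def read(self, pu):
        if self.state[pu] is not None:
            return  # hit
        holders = [i for i, s in enumerate(self.state) if s is not None]
        for i in holders:
            self.state[i] = "CS"  # CE or DE holders become sharable
        self.state[pu] = "CS" if holders else "CE"

    def write(self, pu):
        if self.state[pu] is None:
            self.read(pu)  # write miss: load the line first
        if self.state[pu] == "CS":
            # All sharers and the shared memory are updated; the line stays CS.
            self.bus_updates += 1
        else:
            self.state[pu] = "DE"  # CE or DE: no bus transaction
```

Every write to a CS line costs one bus update, while writes to a CE or DE line stay local.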
Dragon protocol. States: OS: Owned Sharable, OE: Owned Exclusive, US: Unowned Sharable, UE: Unowned Exclusive. Ownership → responsibility for write-back. [Figure: two PUs read the same line; the first copy goes UE → US and the second copy is loaded as US]
Dragon protocol (a PU writes). Only the corresponding cache lines are updated. The line with US does not need to be written back. [Figure: the writer's copy goes US → OS and the other copy stays US]
Dragon protocol (a PU reads). Direct inter-cache data transfer, as in Berkeley's protocol. [Figure: the owner's copy goes OE → OS and the reader's copy is loaded as US]
Dragon protocol (the role of UE). No bus transaction is needed, as with CE in Illinois's protocol. [Figure: a PU writes into a UE line, which goes UE → OE]
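The Dragon transitions can be sketched in the same style. `None` stands for "line not present", since Dragon has no invalidated state. The sketch is simplified: replacement is not modeled, a sharable line never returns to an exclusive state, and the log strings and the `memory_up_to_date` flag are assumptions.

```python
class DragonLine:
    """One line under the OS/OE/US/UE states of the slides."""

    def __init__(self, n_caches):
        self.state = [None] * n_caches  # None: the line is not in this cache
        self.memory_up_to_date = True
        self.log = []

    def read(self, pu):
        if self.state[pu] is not None:
            return  # hit
        holders = [i for i, s in enumerate(self.state) if s is not None]
        owned = any(self.state[i] in ("OE", "OS") for i in holders)
        self.log.append("cache_to_cache" if owned else "memory_supplies")
        for i in holders:
            # An owner keeps the ownership but becomes sharable.
            self.state[i] = "OS" if self.state[i] in ("OE", "OS") else "US"
        self.state[pu] = "US" if holders else "UE"

    def write(self, pu):
        if self.state[pu] is None:
            self.read(pu)  # write miss: load the line first
        if self.state[pu] in ("UE", "OE"):
            self.state[pu] = "OE"  # exclusive: no bus transaction
        else:
            # Sharable: the other caches are updated, the memory is not,
            # and the writer takes over the ownership.
            self.log.append("update")
            for i, s in enumerate(self.state):
                if i != pu and s is not None:
                    self.state[i] = "US"
            self.state[pu] = "OS"
        self.memory_up_to_date = False
```

Unlike Firefly, the update does not reach the shared memory, so exactly one cache (the owner) stays responsible for the eventual write-back.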
MOESI protocol class. States: M: Modified, O: Owned, E: Exclusive, S: Sharable, I: Invalid. [Figure: diagram relating the states to the attributes Valid, Owned, and Exclusive]
MOESI protocol class • Basic: MSI • Illinois: MESI • Berkeley: MOSI • Firefly: MES • Dragon: MOES. A theoretically well-defined model. The details of the cache are not characterized in the model.
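The classification can be checked mechanically by mapping each protocol's own state names onto MOESI letters. The per-state mapping below is an assumption inferred from the state definitions on the earlier slides (for example DE → M and OS → O); only the resulting class names are given in the list above.

```python
MOESI_ORDER = "MOESI"

# Assumed mapping from each protocol's state names to MOESI letters.
STATE_MAP = {
    "Basic": {"C": "S", "D": "M", "I": "I"},
    "Illinois": {"CE": "E", "CS": "S", "DE": "M", "I": "I"},
    "Berkeley": {"OE": "M", "OS": "O", "US": "S", "I": "I"},
    "Firefly": {"CE": "E", "CS": "S", "DE": "M"},
    "Dragon": {"OE": "M", "OS": "O", "US": "S", "UE": "E"},
}


def protocol_class(name):
    """Return the MOESI letters a protocol uses, in M-O-E-S-I order."""
    letters = set(STATE_MAP[name].values())
    return "".join(c for c in MOESI_ORDER if c in letters)


classes = {name: protocol_class(name) for name in STATE_MAP}
```

With this mapping, `classes` reproduces the five class names of the slide, which is a quick consistency check on the state definitions.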
Invalidate vs. Update • The drawback of the invalidate protocol: frequent writes to shared data cause bus congestion → ping-pong effect • The drawback of the update protocol: once a line is shared, every write must use the shared bus. • Improvements: Competitive Snooping, Variable Protocol Cache
Ping-pong effect (a PU writes). [Figure: the writer's copy goes C → D; the invalidation turns the other copy C → I]
Ping-pong effect (the other PU reads out). [Figure: the dirty copy goes D → C and the reader's copy goes I → C]
Ping-pong effect (the other PU writes again). [Figure: the writer's copy goes C → D; the invalidation turns the other copy C → I]
Ping-pong effect (a PU reads again). The cache line goes back and forth between the caches repeatedly → ping-pong effect. [Figure: the reader's copy goes I → C and the dirty copy goes D → C]
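The cost of the ping-pong effect can be estimated by counting bus transactions under the C/D/I rules of the basic protocol. This is a sketch: each miss or invalidation is counted as one transaction, and the function name `count_bus_transactions` and the access encoding are hypothetical.

```python
def count_bus_transactions(accesses, n_caches=2):
    """Count bus transactions for a list of (pu, op) accesses to one line.

    op is "R" or "W"; states follow the basic protocol (C, D, I).
    """
    state = ["I"] * n_caches
    count = 0
    for pu, op in accesses:
        if op == "R":
            if state[pu] == "I":
                count += 1  # read miss; a dirty copy becomes clean
                state = ["C" if s == "D" else s for s in state]
                state[pu] = "C"
        elif state[pu] != "D":
            count += 1  # invalidation (or fetch with invalidation)
            state = ["I"] * n_caches
            state[pu] = "D"
    return count


# Two PUs alternately write and read the same line, as on the slides.
ping_pong = [(0, "W"), (1, "R"), (1, "W"), (0, "R")] * 5
# One PU writes the same line 20 times without any sharer.
private = [(0, "W")] * 20

shared_cost = count_bus_transactions(ping_pong)
private_cost = count_bus_transactions(private)
```

In the ping-pong pattern every one of the 20 accesses uses the bus, while the private pattern needs a single transaction for the same number of accesses.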
The drawback of the update protocol (Firefly protocol). Once a line becomes CS, every write to it is sent over the bus even if the other copy is no longer used. False sharing causes unnecessary bus transactions. [Figure: a PU writes (W) into a CS line; the copy in the other cache is also CS]
Competitive Snooping. A copy is updated n times and then invalidated. The performance is degraded in some cases. [Figure: a PU writes (W) into a CS line; the copy in the other cache goes CS → I]
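Competitive snooping can be sketched with a per-line counter in the snooping cache. The sketch assumes that the counter counts consecutive updates received without a local access and that the copy invalidates itself when the counter reaches the threshold n; the names `CompetitiveSharer` and `bus_updates_for` are hypothetical.

```python
class CompetitiveSharer:
    """A remote copy of a line that gives up after `threshold` unused updates."""

    def __init__(self, threshold):
        self.threshold = threshold
        self.valid = True
        self.updates_since_use = 0
        self.refetches = 0

    def local_access(self):
        if not self.valid:
            self.valid = True  # miss: the line has to be fetched again
            self.refetches += 1
        self.updates_since_use = 0

    def snoop_update(self):
        if not self.valid:
            return
        self.updates_since_use += 1
        if self.updates_since_use >= self.threshold:
            self.valid = False  # update n times, then invalidate


def bus_updates_for(n_writes, threshold):
    """Bus updates caused by n_writes to a line whose sharer never uses it."""
    sharer = CompetitiveSharer(threshold)
    updates = 0
    for _ in range(n_writes):
        if sharer.valid:  # an update is broadcast only while a sharer exists
            updates += 1
            sharer.snoop_update()
    return updates
```

With an unused copy the bus traffic is bounded by n, but a sharer that accesses the line just after the invalidation pays an extra miss, which is one case where performance is degraded.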
Write Once (Goodman Protocol). The main memory is updated together with the invalidation. Only the first written data is transferred to the main memory. [Figure: a PU writes (W); state labels on the writer's copy: C, → CE, → D; the invalidation turns the other copy C → I]
Read Broadcast (Berkeley). Invalidation is the same as in the basic protocol. [Figure: a PU writes (W); its copy goes US → OE and the two other copies go US → I]
Read Broadcast. The read data is broadcast to the other invalidated caches. [Figure: a PU reads (R); the owner's copy goes OE → OS, the reader's copy goes I → US, and the other invalidated copy also goes I → US]
Cache injection. The same line is injected into the other caches. [Figure: a PU reads (R); the copies in three caches go I → US]
New Keio Protocol. Minimize communication outside of the chip. [Figure: PUs with snoop caches; state labels shown: CS and CE → CS; one transfer is marked with an X]
New Keio Protocol. Avoid write-back as far as possible: DS (Dirty Shared). [Figure: a PU reads (R); the dirty copy goes DE → DS and the reader's copy goes I → DS]
Zombie Cache I. [Figure: PUs with snoop caches on a chip, connected to the main memory by the large-bandwidth shared bus; labels shown: Replace, DEO]
Zombie Cache II. [Figure: PUs with snoop caches on a chip, connected to the main memory and a Z-2 cache by the large-bandwidth shared bus; labels shown: Replace, DEO]
Cache systems for on-chip multiprocessors • Shared caches and shared registers are available. • High-bandwidth buses and switches are available. • A cache hierarchy is required. • Server/High Performance: Stanford Hydra project, IBM Power4, Sun MAJC, Compaq Piranha • Embedded Chip-Multiprocessors
Possible cache structure (Stanford Hydra Project). From "Evaluation of Design Alternatives for a Multiprocessor Microprocessor," ISCA '96. [Figure: three design alternatives with four processors each, labeled L1 shared, L2 shared, and memory shared; components shown: 16K 2-way L1 caches, 1x4 and 4x4 crossbars, 512K 2-way L2 caches or a 2MB 2-way L2 on a 128b L2 bus, 64b and 128b data paths, and a 128b system bus]
Stanford's Hydra. From "Considerations in the Design of Hydra," CSL-TR-98-749. [Figure: four CPUs, each with an L1 I cache, an L1 D cache, and a memory controller, connected by a write-through bus (64b) and a read/replace bus (256b) to the on-chip L2 cache, the off-chip L3 cache interface, the Rambus memory interface, and the I/O bus interface; off chip: cache SRAM array, DRAM main memory, I/O]
Daytona (Lucent) • MESI protocol • RISC + DSP • Pipelined operation of the bus and the memory controller • 128-bit STBus • 0.25 μm CMOS, 4.5 mm × 6 mm (small chip)