1 / 105

Lecture 9 Outline

Lecture 12: Other Bus-Based Coherence Protocols Spring 2010 E. F. Gehringer, based on slides by Yan Solihin. Lecture 9 Outline. MESI protocol Dragon update-based protocol Impact of protocol optimizations. Lower-Level Protocol Choice.

lavada
Télécharger la présentation

Lecture 9 Outline

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Lecture 12: Other Bus-Based Coherence ProtocolsSpring 2010E. F. Gehringer, based on slides by Yan Solihin

  2. Lecture 9 Outline • MESI protocol • Dragon update-based protocol • Impact of protocol optimizations

  3. Lower-Level Protocol Choice • What transition should occur when a BusRd is observed in state M? • Change to S?  Assume I’ll reread it soon • Good for mostly read data • But what about “migratory” data? … • Change to I?  Assume another will write it (Synapse) • I read and write, then you read and write, then X reads and writes... • Sequent Symmetry and MIT Alewife use adaptive protocols

  4. MESI (4-state) Invalidation Protocol • Here’s a problem with the MSI protocol: • A {Rd, Wr} sequence causes two bus transactions • even when no one is sharing (e.g., serial program!) • BusRd (I  S) followed by BusRdX or BusUpgr (S  M) • In general, coherence traffic from serial programs is unacceptable • To avoid this, add a fourth state, Exclusive: • Invalid • Modified (dirty) • Shared (two or more caches may have copies) • Exclusive (only this cache has clean copy, same value as in memory) • How does the protocol decide whether I  E or I  S? • Need to check whether someone else has a copy • “Shared” signal on bus: wired-or line asserted in response to BusRd

  5. PrRd/– M S E I PrWr/– PrWr/BusRdX PrRd/BusRd(~S) PrRd/BusRd(S) PrRd/– MESI: Processor-Initiated Transactions PrRd/– PrWr/– PrWr/BusRdX Fill in the last two transitions here.

  6. M I E S BusRdX/Flush BusRd/Flush BusRd/Flush BusRdX/Flush BusRdX/Flush׳ BusRd/– BusRdX/– BusRd/Flush׳ MESI: Bus-Initiated Transactions Flush׳ means flush only if cache-to-cache sharing is used; only the cache responsible for supplying the data will do a flush.

  7. MESI State Transition Diagram • BusRd(S) means shared line asserted on BusRd transaction

  8. Flush vs. Flush' • Flush: mandatory • Flush' happens only when • Cache-to-cache sharing is used, and, • Only one cache flushes data

  9. Cache Cache Cache Trace P1 Read A Controller P1 Write A = 2 P3 Read A Snooper Snooper Snooper P3 Write A = 3 P1 Read A P3 Read A A = 1 P2 Read A Main memory MESI Visualization P1 Start state. All caches empty and main memory has A = 1. P2 P3 Bus

  10. Cache Cache Cache Trace P1 Read A Controller P1 Write A = 2 P3 Read A Snooper Snooper Snooper P3 Write A = 3 P1 Read A P1 Mem P1 PrRd A BusRd A returns data P3 Read A A = 1 P2 Read A Main memory Processor P1 Reads A P1 Processor P1 attempts to read A from its cache. P2 P3 Bus

  11. Cache Cache Cache Trace P1 Read A Controller P1 Write A = 2 P3 Read A Snooper Snooper Snooper P3 Write A = 3 P1 Read A P1 Mem P1 PrRd A BusRd A returns data P3 Read A A = 1 P2 Read A Main memory Processor P1 Reads A P1 Processor P1 issues a BusRd. P2 P3 Bus

  12. Cache Cache Cache Trace A = 1 E P1 Read A Controller P1 Write A = 2 P3 Read A Snooper Snooper Snooper P3 Write A = 3 P1 Read A P1 Mem P1 PrRd A BusRd A returns data P3 Read A A = 1 P2 Read A Main memory Processor P1 Reads A P1 Main memory returns data to processor P1 which updates its cache. P2 P3 Bus

  13. Cache Cache Cache Trace A = 1 E P1 Read A Controller P1 Write A = 2 P3 Read A Snooper Snooper Snooper P3 Write A = 3 P1 Read A P3 Read A A = 1 P2 Read A Main memory Processor P1 Reads A P1 Read operation completes. P2 P3 Bus

  14. Cache Cache Cache Trace A = 2 M P1 Read A Controller P1 Write A = 2 P3 Read A Snooper Snooper Snooper P3 Write A = 3 P1 Read A P1 PrWr A P3 Read A A = 1 P2 Read A Main memory Processor P1 Writes A = 2 P1 Processor P1 writes to its cache. P2 P3 M Bus One less bus request due to Exclusive state, esp. for serial programs

  15. Cache Cache Cache Trace A = 2 M P1 Read A Controller P1 Write A = 2 P3 Read A Snooper Snooper Snooper P3 Write A = 3 P1 Read A P3 Read A A = 1 P2 Read A Main memory Processor P1 Writes A = 2 P1 Write operation completes. P2 P3 Bus

  16. Cache Cache Cache Trace A = 2 M P1 Read A Bus Controller P1 Write A = 2 P3 Read A Snooper Snooper Snooper P3 Write A = 3 P1 Read A P1 P3 snoops BusRd PrRd A P3 Read A A = 1 P3 P1 BusRd A Flush P2 Read A Main memory Processor P3 Reads A P1 Processor P3 attempts to read A from its cache. P2 P3

  17. Cache Cache Cache Trace A = 2 M P1 Read A Bus Controller P1 Write A = 2 P3 Read A Snooper Snooper Snooper P3 Write A = 3 P1 Read A P1 P3 snoops BusRd PrRd A P3 Read A A = 1 P3 P1 BusRd A Flush P2 Read A Main memory Processor P3 Reads A P1 Processor P3 issues a BusRd. P2 P3

  18. Cache Cache Cache Trace A = 2 S P1 Read A Bus Controller P1 Write A = 2 P3 Read A Snooper Snooper Snooper P3 Write A = 3 P1 Read A P1 P3 snoops BusRd PrRd A P3 Read A A = 1 P3 P1 BusRd A Flush P2 Read A Main memory Processor P3 Reads A P1 Processor P1 snoops the BusRd from processor P3. P2 P3

  19. Cache Cache Cache Trace A = 2 A = 2 S S P1 Read A Bus Controller P1 Write A = 2 P3 Read A Snooper Snooper Snooper P3 Write A = 3 P1 Read A P1 P3 PrRd A snoops BusRd P3 Read A A = 2 P3 P1 BusRd A Flush P2 Read A Main memory Processor P3 Reads A P1 Processor P1 flushes, sending updated data to P3 and main memory. P2 P3

  20. Cache Cache Cache Trace A = 2 A = 2 S S P1 Read A Bus Controller P1 Write A = 2 P3 Read A Snooper Snooper Snooper P3 Write A = 3 P1 Read A P3 Read A A = 2 P2 Read A Main memory Processor P3 Reads A P1 Read operation completes. P2 P3

  21. Cache Cache Cache Trace A = 2 A = 2 S S P1 Read A Bus Controller P1 Write A = 2 P3 Read A Snooper Snooper Snooper P3 Write A = 3 P1 Read A P3 Read A A = 2 P2 Read A Main memory Processor P3 Writes A = 3 P1 Processor P3 writes to its cache. P2 P3 P3 PrWr A P3 BusUpgr P1 snoops BusUpgr

  22. Cache Cache Cache Trace A = 2 A = 2 S S P1 Read A Bus Controller P1 Write A = 2 P3 Read A Snooper Snooper Snooper P3 Write A = 3 P1 Read A P3 Read A A = 2 P2 Read A Main memory Processor P3 Writes A = 3 P1 Processor P3 issues a BusRd request. P2 P3 P3 PrWr A P3 BusUpgr P1 snoops BusUpgr Note: BusUpgr instead of BusRdX

  23. Cache Cache Cache Trace A = 2 A = 3 I M P1 Read A Bus Controller P1 Write A = 2 P3 Read A Snooper Snooper Snooper P3 Write A = 3 P1 Read A P3 Read A A = 2 P2 Read A Main memory Processor P3 Writes A = 3 P1 Processor P1 snoops the BusRd and invalidates its cache. P2 P3 P3 PrWr A P3 BusUpgr P1 snoops BusUpgr

  24. Cache Cache Cache Trace A = 3 A = 2 I M P1 Read A Bus Controller P1 Write A = 2 P3 Read A Snooper Snooper Snooper P3 Write A = 3 P1 Read A P3 Read A A = 2 P2 Read A Main memory Processor P3 Writes A = 3 P1 Write operation completes. P2 P3

  25. Cache Cache Cache Trace A = 2 A = 3 I M P1 Read A Bus Controller P1 Write A = 2 P3 Read A Snooper Snooper Snooper P3 Write A = 3 P1 Read A P3 Read A A = 2 P2 Read A Main memory Processor P1 Reads A P1 Processor P1 reads from its cache. P2 P3 P1 PrRd A P1 BusRd A P3 snoops BusRd P3 Flush

  26. Cache Cache Cache Trace A = 2 A = 3 I M P1 Read A Bus Controller P1 Write A = 2 P3 Read A Snooper Snooper Snooper P3 Write A = 3 P1 Read A P3 Read A A = 2 P2 Read A Main memory Processor P1 Reads A P1 Processor P1 issues a BusRd request. P2 P3 P1 PrRd A P1 BusRd A P3 snoops BusRd P3 Flush

  27. Cache Cache Cache Trace A = 2 A = 3 I M P1 Read A Bus Controller P1 Write A = 2 P3 Read A Snooper Snooper Snooper P3 Write A = 3 P1 Read A P3 Read A A = 2 P2 Read A Main memory Processor P1 Reads A P1 Processor P3 snoops the BusRd. P2 P3 P1 PrRd A P1 BusRd A P3 snoops BusRd P3 Flush

  28. Cache Cache Cache Trace A = 3 A = 3 S S P1 Read A Bus Controller P1 Write A = 2 P3 Read A Snooper Snooper Snooper P3 Write A = 3 P1 Read A P3 Read A A = 3 P2 Read A Main memory Processor P1 Reads A P1 Processor P3 flushes, updating processor P1, main memory and its own cache state. P2 P3 P1 PrRd A P1 BusRd A P3 snoops BusRd P3 Flush

  29. Cache Cache Cache Trace A = 3 A = 3 S S P1 Read A Bus Controller P1 Write A = 2 P3 Read A Snooper Snooper Snooper P3 Write A = 3 P1 Read A P3 Read A A = 3 P2 Read A Main memory Processor P1 Reads A P1 Read operation completes. P2 P3

  30. Cache Cache Cache Trace A = 3 A = 3 S S P1 Read A Bus Controller P1 Write A = 2 P3 Read A Snooper Snooper Snooper P3 Write A = 3 P1 Read A P3 Read A A = 3 P2 Read A Main memory Processor P3 Reads A P1 Processor P3 reads from its cache. P2 P3 P3 PrRd A P3 returns data

  31. Cache Cache Cache Trace A = 3 A = 3 S S P1 Read A Bus Controller P1 Write A = 2 P3 Read A Snooper Snooper Snooper P3 Write A = 3 P1 Read A P3 Read A A = 3 P2 Read A Main memory Processor P3 Reads A P1 Processor P3 returns valid data from its cache. P2 P3 P3 PrRd A P3 returns data

  32. Cache Cache Cache Trace A = 3 A = 3 S S P1 Read A Bus Controller P1 Write A = 2 P3 Read A Snooper Snooper Snooper P3 Write A = 3 P1 Read A P3 Read A A = 3 P2 Read A Main memory Processor P3 Reads A P1 Read operation completes. P2 P3

  33. Cache Cache Cache Trace A = 3 A = 3 S S P1 Read A Bus Controller P1 Write A = 2 P3 Read A Snooper Snooper Snooper P3 Write A = 3 P1 Read A P3 Read A A = 3 P2 Read A Main memory Processor P2 Reads A P1 Processor P2 reads from its cache. P2 P3 P2 PrRd A P2 BusRd A P1 Flush

  34. Cache Cache Cache Trace A = 3 A = 3 S S P1 Read A Bus Controller P1 Write A = 2 P3 Read A Snooper Snooper Snooper P3 Write A = 3 P1 Read A P3 Read A A = 3 P2 Read A Main memory Processor P2 Reads A P1 Processor P2 issues a BusRd request. P2 P3 P2 PrRd A P2 BusRd A P1 Flush

  35. Cache Cache Cache Trace A = 3 A = 3 S S P1 Read A Bus Controller P1 Write A = 2 P3 Read A Snooper Snooper Snooper P3 Write A = 3 P1 Read A P3 Read A A = 3 P2 Read A Main memory Processor P2 Reads A P1 Main memory controller observes the BusRd. P2 P3 A = 3 S X P2 PrRd A P2 BusRd A P1 Flush Referred to as Cache-to-cache transfer in Illinois MESI protocol

  36. Cache Cache Cache Trace A = 3 A = 3 A = 3 S S S P1 Read A Bus Controller P1 Write A = 2 P3 Read A Snooper Snooper Snooper P3 Write A = 3 P1 Read A P3 Read A A = 3 P2 Read A Main memory Processor P2 Reads A P1 Operation completes. P2 P3

  37. MESI Example (Cache-to-Cache Transfer) * Data from memory if no cache2cache transfer, BusRd/ –

  38. MESI Example (Cache-to-Cache Transfer+BusUpgr) * Data from memory if no cache2cache transfer, BusRd/ –

  39. Lower-Level Protocol Choices • Who supplies data on miss when not in M state: memory or cache? • Original, lllinois MESI: cache • assume cache faster than memory (cache-to-cache transfer) • Not necessarily true • Adds complexity • How does memory know it should supply data? (must wait for caches) • A selection algorithm is needed if multiple caches have valid data. • Useful in a distributed-memory system • May be cheaper to obtain from nearby cache than distant memory • Especially when constructed out of SMP nodes (Stanford DASH)

  40. Lecture 9 Outline • MESI protocol • Dragon update-based protocol • Impact of protocol optimizations

  41. Dragon Writeback Update Protocol • Four states • Exclusive-clean (E): Memory and I have it • Shared clean (Sc): I, others, and maybe memory, but I’m not owner • Shared modified (Sm): I and others but not memory, and I’m the owner • Sm and Sc can coexist in different caches, with at most one Sm • Modified or dirty (M): I and, no one else • On replacement: Sc can silently drop, Sm has to flush • No invalid state • If in cache, cannot be invalid • If not present in cache, can view as being in not-present or invalid state • New processor events: PrRdMiss, PrWrMiss • Introduced to specify actions when block not present in cache • New bus transaction: BusUpd • Broadcasts single word written on bus; updates other relevant caches

  42. E M PrRdMiss/BusRd(S) PrWrMiss/BusRd(~S) Dragon: Processor-Initiated Transactions PrRd/– PrRd/– Sc PrRdMiss/BusRd(~S) PrWr/BusUpd(S) PrWr/BusUpd(~S) PrWr/– PrWrMiss/ (BusRd(S);BusUpd) Sm PrWr/BusUpd(~S) PrRd/– PrRd/– Fill in the last two transitions here. PrWr/– PrWr/BusUpd(S)

  43. E M Dragon: Bus-Initiated Transactions BusRd/– BusUpd/Update BusRd/– Sc BusUpd/Update BusRd/Flush Sm BusRd/Flush

  44. PrRd/— BusUpd/Update PrRd/— BusRd/— Sc E PrRdMiss/ BusRd(S) PrRdMiss/ BusRd(S) PrWr/— PrWr/BusUpd(S) PrWr/ BusUpd(S) BusUpd/Update PrWrMiss/ (BusRd(S); BusUpd) PrWrMiss/ BusRd(S) BusRd/Flush Sm M PrWr/BusUpd(S) PrRd/— PrRd/— PrWr/BusUpd(S) PrWr/— BusRd/Flush Dragon State Transition Diagram

  45. Cache Cache Cache Trace P1 Read A Controller P1 Write A = 2 P3 Read A Snooper Snooper Snooper P3 Write A = 3 P1 Read A P3 Read A A = 1 P2 Read A Main memory Dragon Visualization P1 Start state. All caches empty and main memory has A = 1. P2 P3 Bus

  46. Cache Cache Cache Trace P1 Read A Controller P1 Write A = 2 P3 Read A Snooper Snooper Snooper P3 Write A = 3 P1 Read A P1 Mem P1 PrRd A BusRd A returns data P3 Read A A = 1 P2 Read A Main memory Processor P1 Reads A P1 Processor P1 attempts to read A from its cache. P2 P3 Bus

  47. Cache Cache Cache Trace P1 Read A Controller P1 Write A = 2 P3 Read A Snooper Snooper Snooper P3 Write A = 3 P1 Read A P1 Mem P1 PrRd A BusRd A returns data P3 Read A A = 1 P2 Read A Main memory Processor P1 Reads A P1 Processor P1 issues a BusRd. P2 P3 Bus

  48. Cache Cache Cache Trace A = 1 E P1 Read A Controller P1 Write A = 2 P3 Read A Snooper Snooper Snooper P3 Write A = 3 P1 Read A P1 Mem P1 PrRd A BusRd A returns data P3 Read A A = 1 P2 Read A Main memory Processor P1 Reads A P1 Main memory returns data to processor P1 which updates its cache. P2 P3 Bus

  49. Cache Cache Cache Trace A = 1 E P1 Read A Controller P1 Write A = 2 P3 Read A Snooper Snooper Snooper P3 Write A = 3 P1 Read A P3 Read A A = 1 P2 Read A Main memory Processor P1 Reads A P1 Read operation completes. P2 P3 Bus

  50. Cache Cache Cache Trace A = 2 M P1 Read A Controller P1 Write A = 2 P3 Read A Snooper Snooper Snooper P3 Write A = 3 P1 Read A P1 PrWr A P3 Read A A = 1 P2 Read A Main memory Processor P1 Writes A = 2 P1 Processor P1 writes to its cache. P2 P3 M Bus

More Related