Memory Consistency & Cache Coherence


Presentation Transcript


  1. MIMD/SPMD ORGANIZATIONS: THREAD-LEVEL PARALLELISM
Memory Consistency & Cache Coherence
Textbook: Sections 5.1-5.4, 7.1-7.3
• Semantics of the shared memory model
• Sequential consistency (SC)
• Conditions for maintaining SC
• Cache coherence (CC)
• Conditions for maintaining CC
• Anatomy of cache coherence protocols
PCOD: Lecture 4. Per Stenström (c) 2008, Sally A. McKee (c) 2009

  2. Design Space of Shared-Memory Machines
[Figure: Uniform Memory Access (UMA) systems, with processors (P) and memory modules (M) on a shared bus or a general interconnect, and Non-Uniform Memory Access (NUMA) systems on a general interconnect]
Caches attached to each processor are crucial. We therefore consider CC-UMA and CC-NUMA (cache-coherent) machines.

  3. Semantics of Shared Memory
Memory accesses form a consistent serial order. Implications:
• Reads and writes are carried out atomically
• Memory accesses from each processor appear in the serial order in the order of the program controlling that processor

  4. Example
Initially, A = B = 0
P0: A := 1; if B == 0 then critical section
P1: B := 1; if A == 0 then critical section
The programmer can infer that at most one process will enter the critical section.
Some possible interleavings (legal serial orders), where WA,0 denotes P0's write to A and RB,0 denotes P0's read of B:
• WA,0, RB,0, WB,1, RA,1: P0 enters the critical section
• WA,0, WB,1, RB,0, RA,1: no one enters the critical section
• WB,1, RA,1, WA,0, RB,0: P1 enters the critical section
An illegal interleaving, since each read precedes its own processor's write and so violates program order:
• RB,0, RA,1, WA,0, WB,1: both processes enter the critical section
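
A brute-force check of the programmer's inference: the Python sketch below enumerates every interleaving that preserves each processor's program order, executes each one with atomic memory operations, and records which processes read 0 and therefore enter the critical section. The operation encoding and the names `interleavings` and `entered` are illustrative, not from the lecture.

```python
from itertools import combinations

# Operations are (kind, location, value); reads carry value None.
P0 = [("W", "A", 1), ("R", "B", None)]  # A := 1; if B == 0 ...
P1 = [("W", "B", 1), ("R", "A", None)]  # B := 1; if A == 0 ...


def interleavings(p, q):
    """Yield every merge of p and q that keeps each list in program order."""
    n = len(p) + len(q)
    for slots in combinations(range(n), len(p)):
        order, i, j = [], 0, 0
        for k in range(n):
            if k in slots:  # position k holds P0's next operation
                order.append((0, p[i]))
                i += 1
            else:           # otherwise it holds P1's next operation
                order.append((1, q[j]))
                j += 1
        yield order


def entered(order):
    """Run one serial order atomically; return the processes that read 0."""
    mem = {"A": 0, "B": 0}
    inside = set()
    for proc, (kind, loc, val) in order:
        if kind == "W":
            mem[loc] = val
        elif mem[loc] == 0:  # the read returns 0, so this process enters
            inside.add(proc)
    return inside


outcomes = [entered(o) for o in interleavings(P0, P1)]
print(outcomes)  # [{0}, set(), set(), set(), set(), {1}]

# The illegal order from the slide: both reads precede both writes.
illegal = [(0, P0[1]), (1, P1[1]), (0, P0[0]), (1, P1[0])]
print(entered(illegal))  # {0, 1}
```

No legal serial order lets both processes in; only the illegal order, which runs each read ahead of its own processor's write, does.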

  5. Sequential Consistency (SC)
"the result of the execution is the same as if the operations of all processors are executed in some sequential order and the operations of each individual processor appear in the order specified by its program." (Lamport, 1979)
• SC execution: an execution of a program is SC if the results it produces are the same as those of some possible interleaving of the program orders
• SC system: a system is SC if every possible execution on that system is an SC execution
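
Read literally, the definition of an SC execution suggests an exhaustive (exponential-time) test: search for some interleaving of the program orders that reproduces every value the reads returned. A minimal Python sketch, assuming writes are encoded as ("W", location, value), reads as ("R", location, register), and an execution's result as a dict from register names to values; all names are hypothetical.

```python
def interleavings(progs, pos=None):
    """Yield every serial order that keeps each processor's program order."""
    if pos is None:
        pos = (0,) * len(progs)
    if all(p == len(prog) for p, prog in zip(pos, progs)):
        yield []
        return
    for i, prog in enumerate(progs):
        if pos[i] < len(prog):
            nxt = pos[:i] + (pos[i] + 1,) + pos[i + 1:]
            for rest in interleavings(progs, nxt):
                yield [prog[pos[i]]] + rest


def results(order, init):
    """Execute a serial order atomically; return the value each read returned."""
    mem, regs = dict(init), {}
    for kind, loc, arg in order:
        if kind == "W":
            mem[loc] = arg
        else:
            regs[arg] = mem[loc]
    return regs


def is_sc_execution(progs, observed, init):
    """True if some legal interleaving produces exactly the observed reads."""
    return any(results(o, init) == observed for o in interleavings(progs))


# The critical-section example: r0 is P0's read of B, r1 is P1's read of A.
progs = [[("W", "A", 1), ("R", "B", "r0")],
         [("W", "B", 1), ("R", "A", "r1")]]
init = {"A": 0, "B": 0}
print(is_sc_execution(progs, {"r0": 0, "r1": 1}, init))  # True
print(is_sc_execution(progs, {"r0": 0, "r1": 0}, init))  # False
```

An SC system, in these terms, is one on which `is_sc_execution` would hold for every execution the hardware can produce.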

  6. Definitions
Given an SC execution: WA,0, WB,1, RB,0, RA,1
• [Performing a write] A write operation is performed when a subsequent read can no longer return the value of an earlier write in the (legal) serial order
• [Performing a read] A read operation is performed when a subsequent write in the (legal) serial order can no longer affect the value read
Example:
• RB,0 returns the value of WB,1
• RA,1 returns the value of WA,0
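
The example's "returns the value of" relation can be computed mechanically: in a legal serial order, each read returns the value of the latest preceding write to the same location. The Python sketch below uses the slide's notation, with the third tuple field naming the issuing processor; `reads_from` is an illustrative name, and `None` stands for the initial value.

```python
def reads_from(order):
    """Map each read to the latest earlier write to the same location."""
    last_write = {}  # location -> most recent write seen so far
    mapping = {}
    for op in order:
        kind, loc, _proc = op
        if kind == "W":
            last_write[loc] = op
        else:
            mapping[op] = last_write.get(loc)  # None: the initial value
    return mapping


# The slide's SC execution: WA,0, WB,1, RB,0, RA,1
order = [("W", "A", 0), ("W", "B", 1), ("R", "B", 0), ("R", "A", 1)]
for read, write in reads_from(order).items():
    print(read, "returns the value of", write)
```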

  7. Sufficient Conditions for SC
Given all possible serial orders (interleavings) of memory operations for a program, if
• [Program order] all memory operations from a processor are performed in program order, and
• [Atomicity] each memory operation is performed atomically (instantaneously),
then the system is SC. The following example shows why atomicity is important:

  8. Data Races
Initially, A = B = 0
P0: A2 := 1; A1 := 2; B := 3
P1: C := B; if C = 3 then (D := A1; E := A2)
We can infer that if C = 3, then D = 2 and E = 1. The write B := 3 and the read C := B race with each other.
Note: none of these ordering constraints exist in sequential programs, where the three independent writes by P0 could be freely reordered.
Goal: implement SC but avoid costly serializations.
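
A small exhaustive check shows which orderings actually matter. The Python sketch below assumes A1 and A2 start at 0 and models P1's conditional reads as unconditional (reads have no side effects, and D and E are only inspected when C = 3); it tests the inference over every SC interleaving. Function names are hypothetical.

```python
def interleavings(progs, pos=None):
    """Yield every serial order that keeps each processor's program order."""
    if pos is None:
        pos = (0,) * len(progs)
    if all(p == len(prog) for p, prog in zip(pos, progs)):
        yield []
        return
    for i, prog in enumerate(progs):
        if pos[i] < len(prog):
            nxt = pos[:i] + (pos[i] + 1,) + pos[i + 1:]
            for rest in interleavings(progs, nxt):
                yield [prog[pos[i]]] + rest


def run(order):
    """Execute one serial order atomically; return the registers C, D, E."""
    mem, regs = {"A1": 0, "A2": 0, "B": 0}, {}
    for kind, loc, arg in order:
        if kind == "W":
            mem[loc] = arg
        else:
            regs[arg] = mem[loc]
    return regs


def inference_holds(p0, p1):
    """Check 'C == 3 implies D == 2 and E == 1' over every interleaving."""
    for order in interleavings([p0, p1]):
        r = run(order)
        if r["C"] == 3 and (r["D"], r["E"]) != (2, 1):
            return False
    return True


P0 = [("W", "A2", 1), ("W", "A1", 2), ("W", "B", 3)]
P1 = [("R", "B", "C"), ("R", "A1", "D"), ("R", "A2", "E")]
print(inference_holds(P0, P1))            # True
# Swapping the two independent writes to A1 and A2 is harmless ...
P0_swapped = [("W", "A1", 2), ("W", "A2", 1), ("W", "B", 3)]
print(inference_holds(P0_swapped, P1))    # True
# ... but letting B := 3 overtake them breaks the inference.
P0_reordered = [("W", "B", 3), ("W", "A2", 1), ("W", "A1", 2)]
print(inference_holds(P0_reordered, P1))  # False
```

Only the ordering of P0's earlier writes relative to the racing write to B has to be enforced; the writes to A1 and A2 may be reordered with each other, which is the kind of serialization an implementation would like to avoid.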

  9. Systems without Caches
[Figure: two example systems, with processors (P) and memory modules (M) holding A and B, connected by a shared bus or by a general interconnect]
• Atomicity is trivially met: performing a memory operation is indivisible
• SC can be violated by performing memory operations out of program order

  10. Violations of SC: Examples
P0: A := 1; if B = 0 then <critical section>
P1: B := 1; if A = 0 then <critical section>
[Figure: the two example systems, shared bus and general interconnect, with A and B in memory modules (M)]
Sequential consistency can be violated if memory operations are not carried out in program order. For example, if each processor's read of the other flag bypasses its own earlier write, both processes can read 0 and enter the critical section.

  11. Enforcing SC
[Figure: the shared-bus and general-interconnect systems, with A and B in memory modules (M)]
• Disallow bypassing in the write buffer
• Do not issue a new request into the interconnect until the previous one is performed (acknowledged)
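
The violation and its fix can both be reproduced with a tiny model checker. The Python sketch below assumes a FIFO write buffer per processor in front of a single memory; a schedule interleaves instruction issue with buffer drains. With bypassing allowed, a read may go to memory while the processor's own earlier write is still buffered (with store-to-load forwarding for the same location, an assumption); with bypassing disallowed, a read waits until the buffer is empty, which models waiting for the previous write to be performed. The names `outcomes` and `allow_bypass` are illustrative.

```python
def outcomes(progs, allow_bypass, init=None):
    """Explore every schedule of instruction issue and buffer drains.

    Returns the set of final register states as sorted (register, value) tuples.
    """
    init = dict(init or {"A": 0, "B": 0})
    results = set()

    def explore(pcs, bufs, mem, regs):
        finished = True
        for i, prog in enumerate(progs):
            if bufs[i]:
                # Drain processor i's oldest buffered write to memory.
                finished = False
                (loc, val), rest = bufs[i][0], bufs[i][1:]
                explore(pcs, bufs[:i] + (rest,) + bufs[i + 1:],
                        {**mem, loc: val}, regs)
            if pcs[i] < len(prog):
                finished = False
                kind, loc, arg = prog[pcs[i]]
                npcs = pcs[:i] + (pcs[i] + 1,) + pcs[i + 1:]
                if kind == "W":
                    # Writes enter the processor's FIFO write buffer.
                    nbufs = bufs[:i] + (bufs[i] + ((loc, arg),),) + bufs[i + 1:]
                    explore(npcs, nbufs, mem, regs)
                elif allow_bypass or not bufs[i]:
                    # The read proceeds; forward from own buffer if it holds loc.
                    fwd = [bv for (bl, bv) in bufs[i] if bl == loc]
                    val = fwd[-1] if fwd else mem[loc]
                    explore(npcs, bufs, mem, {**regs, arg: val})
        if finished:
            results.add(tuple(sorted(regs.items())))

    n = len(progs)
    explore((0,) * n, ((),) * n, init, {})
    return results


DEKKER = [[("W", "A", 1), ("R", "B", "r0")],
          [("W", "B", 1), ("R", "A", "r1")]]
BOTH_ENTER = (("r0", 0), ("r1", 0))
print(BOTH_ENTER in outcomes(DEKKER, allow_bypass=True))   # True
print(BOTH_ENTER in outcomes(DEKKER, allow_bypass=False))  # False
```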

  12. The Cache Coherence (CC) Problem
Most machines supporting a shared address space (SAS) model can hold multiple cached copies of the same location.
[Figure: processors (P), each with a private cache (C), sharing one memory]
The cache coherence problem: how to maintain the illusion of a coherent shared address space

  13. Two Cache Coherence Approaches
[Figure: processors (P) with caches (C) and memories (M) on an interconnection network, shown for write-invalidate and for write-update]
Write-invalidate
• A write by Pi invalidates all copies held by other processors Pj (j ≠ i)
• + Subsequent writes are local
• - A read from Pj results in a miss that brings the data from Pi
Write-update
• A write by Pi updates all copies held by other processors Pj (j ≠ i)
• - Subsequent writes are global
• + A read from Pj does not cause a miss
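
To make the +/- trade-offs concrete, the Python sketch below counts bus transactions for accesses to a single block under each policy. It assumes a read miss costs one transaction, an invalidating write (read-exclusive or invalidation) costs one, and an update broadcast costs one but is only sent when another cache holds a copy; these costs and the function name are illustrative simplifications.

```python
def bus_transactions(trace, policy, n=3):
    """Count bus transactions for (processor, "R"/"W") accesses to one block."""
    has_copy = [False] * n
    bus = 0
    for p, op in trace:
        others = any(has_copy[q] for q in range(n) if q != p)
        if op == "R":
            if not has_copy[p]:
                bus += 1                      # read miss: fetch the block
                has_copy[p] = True
        elif policy == "invalidate":
            if not has_copy[p] or others:
                bus += 1                      # read-exclusive or invalidation
            has_copy = [q == p for q in range(n)]  # only the writer keeps a copy
        else:  # "update"
            if not has_copy[p]:
                bus += 1                      # write miss: fetch the block
                has_copy[p] = True
            if others:
                bus += 1                      # broadcast the new value
    return bus


many_writes = [(0, "R"), (1, "R"), (0, "W"), (0, "W"), (0, "W"), (1, "R")]
ping_pong = [(0, "W"), (1, "R")] * 3
for name, trace in [("many_writes", many_writes), ("ping_pong", ping_pong)]:
    print(name, bus_transactions(trace, "invalidate"),
          bus_transactions(trace, "update"))
# many_writes 4 5
# ping_pong 6 4
```

Repeated writes by one processor favor write-invalidate, because after the first invalidation the writes stay local; a producer/consumer ping-pong favors write-update, because the consumer's reads never miss.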

  14. Semantic View of a Coherent Memory
Consider the following total order of accesses to location X, as seen by all processors (the subscript names the issuing processor, P1 to P4):
1. Wx,1 (P1 writes X)
2. Wx,2 (P2 writes X)
3. Rx,3 (P3 reads X)
4. Rx,4 (P4 reads X)
Cache coherence: the result of the execution is as if the writes to a single location were serialized
• Rx,3 and Rx,4 should return the value of the most recent write (Wx,2)
• Wx,1 should be performed before Wx,2
Formal definitions of "performed" are critical.
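
Write serialization can be phrased as a check on what each processor observes: there must be one order of the writes to X such that every processor's sequence of read values follows it. The Python sketch below assumes each write stores a distinct value (here P1 writes 1 and P2 writes 2, values chosen for illustration) and brute-forces the write order; `writes_serialized` is a hypothetical name.

```python
from itertools import groupby, permutations


def is_subsequence(seq, order):
    """True if seq appears in order with the same relative ordering."""
    it = iter(order)
    return all(v in it for v in seq)


def writes_serialized(writes, views, init=0):
    """True if one serial order of the writes to X explains every view.

    writes: the distinct values written to X.
    views: per processor, the successive values its reads of X returned.
    """
    for perm in permutations(writes):
        order = (init,) + perm
        # Consecutive repeated reads of the same value add no ordering info.
        if all(is_subsequence([v for v, _ in groupby(view)], order)
               for view in views):
            return True
    return False


# The slide's case: after Wx,1 and Wx,2, P3 and P4 both read the value 2.
print(writes_serialized([1, 2], [[2], [2]]))        # True
# Two processors that observe the two writes in opposite orders.
print(writes_serialized([1, 2], [[1, 2], [2, 1]]))  # False
```

This checks only the serialization half of coherence; whether each read returns the most recent write also depends on when writes are performed.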

  15. Definitions
• A memory request is issued when it is in transit in the memory system
• A write is performed w.r.t. a processor when a read request by that processor will return its value or the value of a later write (in a hypothetical serial order)
• A read is performed w.r.t. a processor when a subsequently issued write by that processor cannot alter the value that was read
Write serialization requires that writes be performed w.r.t. all processors before a subsequent read or write is performed w.r.t. any processor.
Performed w.r.t. all processors = globally performed

  16. Maintaining SC
Recall the following sufficient conditions:
• [Program order] all memory operations from a processor are performed in program order
• [Atomicity] each memory operation must be globally performed before a subsequent memory operation can be performed with respect to any processor
Cache coherence makes the second condition challenging.

  17. Example of Atomicity Violation
[Figure: write-update protocol; processors P1, P2 and P3, each with a cache (C)]
The programmer would infer that A = 1.
Violation of atomicity:
• P2's view: WA,1, WB,2
• P3's view: WB,2, WA,1
Problem: P2 reads the new value of A before the write to A is globally performed.
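
The program behind the figure is not recoverable from the slide. The Python sketch below assumes a common three-processor scenario consistent with the two views: P1 writes A := 1; P2 waits until it sees A = 1 and then writes B; P3 waits until it sees B and then reads A. Update messages reach each other cache after a per-destination latency (the latencies are arbitrary assumptions), and the waiting is scripted by hand rather than simulated as spin loops.

```python
import heapq

# Each processor's cache holds a copy of A and B (write-update protocol).
caches = {p: {"A": 0, "B": 0} for p in ("P1", "P2", "P3")}
pending = []  # min-heap of (arrival_time, destination, location, value)


def write(now, writer, loc, val, latency):
    """Write the local copy, then send an update to each other cache."""
    caches[writer][loc] = val
    for dst, delay in latency.items():
        heapq.heappush(pending, (now + delay, dst, loc, val))


def advance(now):
    """Apply every update that has arrived by time `now`."""
    while pending and pending[0][0] <= now:
        _, dst, loc, val = heapq.heappop(pending)
        caches[dst][loc] = val


write(0, "P1", "A", 1, {"P2": 1, "P3": 10})  # the update to P3 is slow
advance(1)                                   # P2 now sees A = 1 ...
write(1, "P2", "B", 1, {"P1": 1, "P3": 1})   # ... so P2 writes B
advance(2)                                   # P3 now sees B = 1 ...
print(caches["P2"]["A"], caches["P3"]["B"], caches["P3"]["A"])  # 1 1 0
```

P3 reads the stale A = 0 even though it has seen the write that causally followed WA,1; if P2 were not allowed to read A until every copy had been updated, P3 would read 1.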

  18. Relation Between SC and CC
• Cache coherence applies the atomicity and program order requirements with respect to a single location
• Sequential consistency presumes cache coherence but extends atomicity and program ordering to multiple locations

  19. Anatomy of Snoopy Cache Protocols
Design space:
• write-through / write-back invalidate
• write-through / write-back update
Basic write-back write-invalidate protocol (MSI protocol):
• cache states: M(odified), S(hared), I(nvalid)
• processor events: read and write
• bus transactions: read, read-exclusive (invalidate), write-back
• actions: update state, perform bus transaction, flush value onto bus

  20. A Write-Through Protocol
A simple write-through protocol:
• Writes invalidate all other copies
• All writes, and all reads to invalid blocks, are broadcast
• Assume that writes as well as read misses are atomic
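
A minimal Python sketch of this write-through, write-invalidate protocol for a single block, assuming no write-allocate (a write to an invalid block leaves it invalid) and modeling the bus as an append-only log whose order is the serial order of global operations. The class name and the transaction labels `BusRd` and `BusWr` are illustrative.

```python
class WriteThroughBus:
    """Single-block write-through, write-invalidate protocol."""

    def __init__(self, n):
        self.memory = 0
        self.valid = [False] * n
        self.value = [0] * n
        self.log = []  # serial order of bus transactions

    def read(self, p):
        if not self.valid[p]:                # read miss: broadcast on the bus
            self.log.append(("BusRd", p))
            self.value[p] = self.memory
            self.valid[p] = True
        return self.value[p]                 # read hit: purely local

    def write(self, p, v):
        self.log.append(("BusWr", p, v))     # every write goes on the bus
        self.memory = v                      # write through to memory
        for q in range(len(self.valid)):
            if q != p:
                self.valid[q] = False        # snooping caches invalidate
        if self.valid[p]:
            self.value[p] = v                # update own copy if present


bus = WriteThroughBus(2)
bus.read(0)
bus.read(1)
bus.write(0, 5)
print(bus.read(1))  # 5
print(bus.log)      # [('BusRd', 0), ('BusRd', 1), ('BusWr', 0, 5), ('BusRd', 1)]
```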

  21. Satisfying CC and SC
• [Program order] Satisfied if no memory operation is reordered
• [Atomicity] Satisfied because all global (bus) operations (invalidations and read misses) are atomic
• In the partial orders, program order for each individual processor is respected

  22. A Write-Invalidate Protocol (MSI)
[Figure: MSI state-transition diagram]
• The closer a state is to the top of the diagram, the more processor operations it serves locally
• Note: a write to a block in S(hared) results in a read-exclusive request
• Optimization: just send an invalidation (a bus upgrade)
Are coherence and sequential consistency maintained?
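
The MSI transitions can be written down as two tables: one for the requesting processor's own events and one for what each snooping cache does when it observes a bus transaction. The Python sketch below follows the slides' states and transactions, with the bus-upgrade optimization as an option; the table layout, the function name `access`, and the `upgrade` flag are hypothetical, and write-backs of replaced blocks are not modeled.

```python
# (state, processor event) -> (next state, bus transaction or None)
PROC = {
    ("I", "PrRd"): ("S", "BusRd"),
    ("I", "PrWr"): ("M", "BusRdX"),
    ("S", "PrRd"): ("S", None),
    ("S", "PrWr"): ("M", "BusRdX"),  # becomes BusUpgr with the optimization
    ("M", "PrRd"): ("M", None),
    ("M", "PrWr"): ("M", None),
}
# (state, observed bus transaction) -> (next state, flush value onto bus?)
SNOOP = {
    ("S", "BusRd"): ("S", False),
    ("S", "BusRdX"): ("I", False),
    ("S", "BusUpgr"): ("I", False),
    ("M", "BusRd"): ("S", True),
    ("M", "BusRdX"): ("I", True),
}


def access(states, p, event, upgrade=False):
    """Apply processor p's event; return (bus transaction, flushed)."""
    nxt, bus = PROC[(states[p], event)]
    if bus == "BusRdX" and upgrade and states[p] == "S":
        bus = "BusUpgr"  # already holds the data: only invalidate others
    flushed = False
    if bus:
        for q in range(len(states)):
            if q != p:  # every other cache snoops; I stays I
                states[q], f = SNOOP.get((states[q], bus), (states[q], False))
                flushed = flushed or f
    states[p] = nxt
    return bus, flushed


states = ["I", "I", "I"]
trace = [(0, "PrRd"), (1, "PrRd"), (0, "PrWr"), (1, "PrRd")]
log = [access(states, p, e, upgrade=True) for p, e in trace]
print(log)     # [('BusRd', False), ('BusRd', False), ('BusUpgr', False), ('BusRd', True)]
print(states)  # ['S', 'S', 'I']
```

The last read finds P0's block in M, so P0 flushes the dirty value onto the bus and drops to S.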

  23. Satisfying CC and SC
The MSI protocol guarantees coherence because:
• As in the write-through protocol, write and read misses are serialized by the bus
• Write hits are ordered w.r.t. the previous global write; therefore, write serialization is guaranteed
The MSI protocol guarantees SC because:
• The bus establishes a total order of memory accesses to all locations, but …
Note: carrying out bus transactions atomically is challenging in practice. We will come back to this issue later.

  24. A Write-Update Protocol
• A write operation updates all copies
Problem: we must guarantee that once one processor sees the new value, no other processor can access the old value.
Two-phase protocol:
• Lock all copies against read operations
• Update the copies and unlock them
This guarantees that either all processors see the old value or all see the new value.
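
The guarantee of the two-phase protocol can be checked by exhaustive search. The Python sketch below assumes one writer updating n copies from old value 0 to new value 1, one reader per copy that reads its copy once and must wait while that copy is locked, and atomic per-copy lock and update steps. For every schedule it records the values read in time order. Names are illustrative.

```python
def read_sequences(two_phase, n=3):
    """Return every time-ordered sequence of values read (one read per copy)."""
    locks = [("lock", i) for i in range(n)] if two_phase else []
    writer = locks + [("update", i) for i in range(n)]
    seqs = set()

    def explore(w, value, locked, pending, seen):
        if w == len(writer) and not pending:
            seqs.add(seen)
            return
        if w < len(writer):
            op, i = writer[w]
            if op == "lock":  # phase 1: lock copy i
                explore(w + 1, value, locked | {i}, pending, seen)
            else:             # phase 2: write new value into copy i, unlock it
                new_value = value[:i] + (1,) + value[i + 1:]
                explore(w + 1, new_value, locked - {i}, pending, seen)
        for r in pending:
            if r not in locked:  # a reader of a locked copy must wait
                explore(w, value, locked, pending - {r}, seen + (value[r],))

    explore(0, (0,) * n, frozenset(), frozenset(range(n)), ())
    return seqs


def old_before_new(seq):
    """True if every read of the old value precedes every read of the new."""
    return list(seq) == sorted(seq)


print(all(old_before_new(s) for s in read_sequences(two_phase=True)))   # True
print(all(old_before_new(s) for s in read_sequences(two_phase=False)))  # False
```

Without the locking phase, one reader can see the new value while a later reader still sees the old one, which is the situation the two-phase protocol rules out.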

  25. Memory Consistency Model as an Interface
[Figure: layers from programming models (multiprogramming, shared address space, message passing), through compilation or library and the communication abstraction, across the user/system boundary to operating systems support, across the hardware/software boundary to communication hardware, down to the physical communication medium]
• Relaxations from SC need not violate program correctness
• We will consider relaxed memory consistency models later
