Multiprocessors Synchronization and Memory Consistency

CSL718 : Multiprocessors Synchronization, Memory Consistency
17th April, 2006
Anshul Kumar, CSE IITD

Synchronization Problem
• Processes run on different processors independently
• At some point they need to know each other's status, for
  • communication
  • mutual exclusion, etc.
• A hardware primitive that performs an atomic read+write is required (e.g. test&set, exchange, fetch&increment)
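
For readers who want to try these primitives, C11's <stdatomic.h> offers each of them as a single atomic read-modify-write. The sketch below wraps them in hypothetical helpers (test_and_set, exchange_int, fetch_and_increment) and adds a small, equally hypothetical pthreads driver, run_increment_demo, showing that concurrent fetch&increments lose no updates. It assumes a C11 compiler and POSIX threads (compile with -pthread).

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* test&set: set the flag, return whether it was already set */
static bool test_and_set(atomic_flag *f) {
    return atomic_flag_test_and_set(f);
}

/* exchange: write new_val to *x, return the old value */
static int exchange_int(atomic_int *x, int new_val) {
    return atomic_exchange(x, new_val);
}

/* fetch&increment: add 1 to *x, return the old value */
static int fetch_and_increment(atomic_int *x) {
    return atomic_fetch_add(x, 1);
}

/* Demo driver: n threads (n <= 16) each apply fetch&increment 'iters' times.
 * Because each read+write is a single atomic operation, no increment is lost. */
static atomic_int shared;
static int demo_iters;

static void *incrementer(void *arg) {
    (void)arg;
    for (int i = 0; i < demo_iters; i++)
        fetch_and_increment(&shared);
    return NULL;
}

static int run_increment_demo(int n, int iters) {
    pthread_t t[16];
    atomic_store(&shared, 0);
    demo_iters = iters;
    for (int i = 0; i < n; i++) pthread_create(&t[i], NULL, incrementer, NULL);
    for (int i = 0; i < n; i++) pthread_join(t[i], NULL);
    return atomic_load(&shared);
}
```

With a plain int and a non-atomic `x = x + 1`, the same driver would typically return less than n × iters.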

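Spin Lock with Exchange Instr.
Lock: 0 indicates free and 1 indicates locked
Code to lock X:
        r2 ← 1
lockit: r2 ↔ X          ;atomic exchange
        if(r2≠0)lockit  ;already locked
Locks are cached for efficiency; coherence keeps the cached copies consistent. Spinning on the exchange writes X on every attempt and so generates coherence traffic even while the lock is held.
Better code to lock X (spin on reads of the cached copy, exchange only when the lock looks free):
lockit: r2 ← X          ;read lock
        if(r2≠0)lockit  ;not available
        r2 ← 1
        r2 ↔ X          ;atomic exchange
        if(r2≠0)lockit  ;already locked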
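
The "better" lock above is usually called test-and-test-and-set. A minimal C11 sketch follows, assuming C11 atomics and POSIX threads; the type xchg_lock and the driver run_xchg_lock_demo are hypothetical names. The inner loop spins with plain loads that hit in the locally cached copy, and the atomic exchange is attempted only when the lock looks free.

```c
#include <pthread.h>
#include <stdatomic.h>

typedef struct { atomic_int x; } xchg_lock;   /* 0 = free, 1 = locked */

static void xchg_lock_acquire(xchg_lock *l) {
    for (;;) {
        /* lockit: r2 ← X ; if(r2≠0)lockit  (read-only spin in the cache) */
        while (atomic_load_explicit(&l->x, memory_order_relaxed) != 0)
            ;
        /* r2 ← 1 ; r2 ↔ X  (atomic exchange) */
        if (atomic_exchange_explicit(&l->x, 1, memory_order_acquire) == 0)
            return;                     /* old value 0: we hold the lock */
        /* old value 1: another processor won the race; read-spin again */
    }
}

static void xchg_lock_release(xchg_lock *l) {
    atomic_store_explicit(&l->x, 0, memory_order_release);
}

/* Demo driver: n threads (n <= 16) each increment a plain counter iters times. */
static xchg_lock demo_lock;
static long demo_counter;
static int demo_iters;

static void *worker(void *arg) {
    (void)arg;
    for (int i = 0; i < demo_iters; i++) {
        xchg_lock_acquire(&demo_lock);
        demo_counter++;                 /* ordinary update, protected by the lock */
        xchg_lock_release(&demo_lock);
    }
    return NULL;
}

static long run_xchg_lock_demo(int n, int iters) {
    pthread_t t[16];
    atomic_store(&demo_lock.x, 0);
    demo_counter = 0;
    demo_iters = iters;
    for (int i = 0; i < n; i++) pthread_create(&t[i], NULL, worker, NULL);
    for (int i = 0; i < n; i++) pthread_join(t[i], NULL);
    return demo_counter;
}
```

For example, run_xchg_lock_demo(4, 20000) returns 80000 because every increment happens inside the lock.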

LD Locked & ST Conditional
Simpler to implement than a single atomic read-modify-write instruction. SC stores only if X has not been written since the matching LL; it sets its register to 1 on success and 0 on failure.
• atomic exchange using LL and SC
try:  r3 ← r2         ;move exchange value
      LL r1, X        ;load locked
      SC r3, X        ;store conditional
      if(r3=0)try     ;branch if store fails
      r2 ← r1         ;put loaded value in r2
• fetch&increment using LL and SC
try:  LL r1, X        ;load locked
      r3 ← r1 + 1     ;increment
      SC r3, X        ;store conditional
      if(r3=0)try     ;branch if store fails
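
Portable C has no LL/SC, so this sketch substitutes atomic_compare_exchange_weak in a retry loop, which mirrors the "if(r3=0)try" structure: a weak compare-exchange may fail spuriously, just as an SC may fail, and compilers for LL/SC machines often implement it with an LL/SC pair. One difference to keep in mind: compare-exchange succeeds whenever X still holds the expected value, even if it was written and restored in between, whereas SC fails on any intervening write. The function names are hypothetical.

```c
#include <stdatomic.h>

/* atomic exchange: store v into *x, return the previous value */
static int ll_sc_style_exchange(atomic_int *x, int v) {
    int old = atomic_load(x);                     /* like LL r1, X */
    /* On failure 'old' is refreshed with the current value of *x. */
    while (!atomic_compare_exchange_weak(x, &old, v))
        ;                                         /* like SC + if(r3=0)try */
    return old;                                   /* like r2 ← r1 */
}

/* fetch&increment: add 1 to *x, return the previous value */
static int ll_sc_style_fetch_inc(atomic_int *x) {
    int old = atomic_load(x);                     /* like LL r1, X */
    /* old + 1 is recomputed on every retry, like r3 ← r1 + 1 */
    while (!atomic_compare_exchange_weak(x, &old, old + 1))
        ;
    return old;
}
```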

Spin Lock with LL & SC
lockit: LL r2, X        ;load locked
        if(r2≠0)lockit  ;not available
        r2 ← 1
        SC r2, X        ;store cond
        if(r2=0)lockit  ;branch if store fails
A spin lock with exponential back-off reduces contention: after each failed attempt a processor waits for a delay that doubles (up to a limit) before retrying.
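
A minimal sketch of the exponential back-off variant, assuming C11 atomics and POSIX threads. Because portable C exposes no LL/SC, atomic_exchange stands in for the LL/SC pair; the constants BACKOFF_MIN and BACKOFF_MAX, the type backoff_lock and the driver run_backoff_demo are hypothetical, and real limits would be tuned per machine.

```c
#include <pthread.h>
#include <stdatomic.h>

/* Hypothetical tuning constants (busy-loop iterations). */
#define BACKOFF_MIN 4u
#define BACKOFF_MAX 1024u

typedef struct { atomic_int x; } backoff_lock;   /* 0 = free, 1 = locked */

static void backoff_lock_acquire(backoff_lock *l) {
    unsigned delay = BACKOFF_MIN;
    /* Old value 0 means we obtained the lock. */
    while (atomic_exchange_explicit(&l->x, 1, memory_order_acquire) != 0) {
        for (volatile unsigned i = 0; i < delay; i++)
            ;                       /* back off: leave the lock's line alone */
        if (delay < BACKOFF_MAX)
            delay *= 2;             /* double the wait after each failure */
    }
}

static void backoff_lock_release(backoff_lock *l) {
    atomic_store_explicit(&l->x, 0, memory_order_release);
}

/* Demo driver: n threads (n <= 16) each increment a plain counter iters times. */
static backoff_lock demo_lock;
static long demo_counter;
static int demo_iters;

static void *worker(void *arg) {
    (void)arg;
    for (int i = 0; i < demo_iters; i++) {
        backoff_lock_acquire(&demo_lock);
        demo_counter++;             /* protected by the lock */
        backoff_lock_release(&demo_lock);
    }
    return NULL;
}

static long run_backoff_demo(int n, int iters) {
    pthread_t t[16];
    atomic_store(&demo_lock.x, 0);
    demo_counter = 0;
    demo_iters = iters;
    for (int i = 0; i < n; i++) pthread_create(&t[i], NULL, worker, NULL);
    for (int i = 0; i < n; i++) pthread_join(t[i], NULL);
    return demo_counter;
}
```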

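Barrier Synchronization
lock(X)
if(count=0) release ← 0
count++
mycount ← count
unlock(X)
if(mycount=total){count ← 0; release ← 1}
else spin(release=1)
Problem: if the barrier is reused (e.g. inside a loop), a fast process can leave, re-enter and reset release to 0 before a slow process has seen release=1; the slow process then spins forever and the barrier deadlocks.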
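
A C11/pthreads sketch of this centralized barrier, safe only for one-shot use for the reason just given. A pthread mutex plays the role of lock(X), and release is atomic so that spinning on it is well defined in C. The names simple_barrier and run_simple_barrier_demo are hypothetical.

```c
#include <pthread.h>
#include <stdatomic.h>

typedef struct {
    pthread_mutex_t lock;   /* lock(X) / unlock(X) */
    int count;              /* arrivals so far, guarded by lock */
    atomic_int release;     /* set to 1 when everyone has arrived */
    int total;              /* number of participating threads */
} simple_barrier;

static void simple_barrier_wait(simple_barrier *b) {
    pthread_mutex_lock(&b->lock);
    if (b->count == 0)
        atomic_store(&b->release, 0);   /* first arrival closes the barrier */
    int mycount = ++b->count;           /* private copy, read under the lock */
    pthread_mutex_unlock(&b->lock);
    if (mycount == b->total) {          /* last arrival opens the barrier */
        b->count = 0;
        atomic_store(&b->release, 1);
    } else {
        while (atomic_load(&b->release) != 1)
            ;                           /* spin(release=1) */
    }
}

/* Demo driver: n threads (n <= 16) pass the barrier once. */
static simple_barrier demo_b = { PTHREAD_MUTEX_INITIALIZER, 0, 0, 0 };
static atomic_int arrived, saw_all;

static void *demo_worker(void *arg) {
    (void)arg;
    atomic_fetch_add(&arrived, 1);      /* work done before the barrier */
    simple_barrier_wait(&demo_b);
    if (atomic_load(&arrived) == demo_b.total)
        atomic_fetch_add(&saw_all, 1);  /* everyone's work is visible */
    return NULL;
}

/* Returns how many threads saw all n arrivals after the barrier. */
static int run_simple_barrier_demo(int n) {
    pthread_t t[16];
    demo_b.total = n;
    demo_b.count = 0;
    atomic_store(&arrived, 0);
    atomic_store(&saw_all, 0);
    for (int i = 0; i < n; i++) pthread_create(&t[i], NULL, demo_worker, NULL);
    for (int i = 0; i < n; i++) pthread_join(t[i], NULL);
    return atomic_load(&saw_all);
}
```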

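Improved Barrier Synch.
local_sense ← !local_sense
lock(X)
count++
mycount ← count
unlock(X)
if(mycount=total){count ← 0; release ← local_sense}
else {spin(release=local_sense)}
Sense reversal: successive barrier episodes wait for opposite values of release, so a process that races ahead into the next episode cannot trap one that is still leaving the current one.
A tree-based barrier reduces contention further.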
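
A sketch of the sense-reversing barrier in C11 with POSIX threads, assuming each thread keeps its own local_sense (here a bool passed by pointer, initially false). The driver reuses one barrier many times and checks that, after passing episode k, every thread sees all arrivals for episode k and none beyond episode k+1. All names are hypothetical.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

typedef struct {
    pthread_mutex_t lock;    /* lock(X) / unlock(X) */
    int count;               /* arrivals in the current episode */
    atomic_bool release;     /* flips once per episode */
    int total;               /* number of participating threads */
} sense_barrier;

static void sense_barrier_wait(sense_barrier *b, bool *local_sense) {
    *local_sense = !*local_sense;                /* local_sense ← !local_sense */
    pthread_mutex_lock(&b->lock);
    int mycount = ++b->count;                    /* count++, private copy */
    pthread_mutex_unlock(&b->lock);
    if (mycount == b->total) {                   /* last arrival */
        b->count = 0;                            /* reset before anyone leaves */
        atomic_store(&b->release, *local_sense); /* release ← local_sense */
    } else {
        while (atomic_load(&b->release) != *local_sense)
            ;                                    /* spin(release=local_sense) */
    }
}

/* Demo driver: n threads (n <= 16) pass the same barrier r times. */
static sense_barrier sb = { PTHREAD_MUTEX_INITIALIZER, 0, false, 0 };
static atomic_int arrived, errors;
static int rounds;

static void *sb_worker(void *arg) {
    (void)arg;
    bool local_sense = false;
    for (int k = 0; k < rounds; k++) {
        atomic_fetch_add(&arrived, 1);           /* arrival at episode k */
        sense_barrier_wait(&sb, &local_sense);
        int a = atomic_load(&arrived);
        if (a < (k + 1) * sb.total || a > (k + 2) * sb.total)
            atomic_fetch_add(&errors, 1);        /* barrier was violated */
    }
    return NULL;
}

/* Returns the number of violations observed (0 expected). */
static int run_sense_barrier_demo(int n, int r) {
    pthread_t t[16];
    sb.total = n;
    sb.count = 0;
    atomic_store(&sb.release, false);
    atomic_store(&arrived, 0);
    atomic_store(&errors, 0);
    rounds = r;
    for (int i = 0; i < n; i++) pthread_create(&t[i], NULL, sb_worker, NULL);
    for (int i = 0; i < n; i++) pthread_join(t[i], NULL);
    return atomic_load(&errors);
}
```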

Memory Consistency Problem
• When must a processor see a value that has been written by another processor? Must operations be atomic system-wide?
• Can memory operations be re-ordered?
Various models: http://rsim.cs.uiuc.edu/~sadve/Publications/models_tutorial.ps

Example
P1:                  P2:
A = 0                B = 0
...                  ...
A = 1                B = 1
L1: if(B=0) S1       L2: if(A=0) S2
Which of the statements S1 and S2 are executed?
With writes performed in program order, at most one of them; both S1 and S2 may be executed if writes are delayed (e.g. held in a write buffer).
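
The claim can be checked by brute force. This sketch (plain C, no threads) assigns each of the four events W(A=1), R(B), W(B=1), R(A) a distinct position in a global order and asks whether both reads can return 0. With delay_writes false, each processor's write must be visible before its own later read, i.e. sequential consistency; with delay_writes true that constraint is dropped, a simplified model of a write buffer (the reads go to other addresses, so no forwarding is involved). The function name is hypothetical.

```c
#include <stdbool.h>

/* Returns true if some global order lets R(B) and R(A) both return 0,
 * i.e. both S1 and S2 execute. Initially A = B = 0. */
static bool both_S1_and_S2_possible(bool delay_writes) {
    /* wa, rb: positions of P1's W(A=1) and R(B); wb, ra: P2's W(B=1), R(A) */
    for (int wa = 0; wa < 4; wa++)
    for (int rb = 0; rb < 4; rb++)
    for (int wb = 0; wb < 4; wb++)
    for (int ra = 0; ra < 4; ra++) {
        if (wa == rb || wa == wb || wa == ra || rb == wb || rb == ra || wb == ra)
            continue;                    /* positions must be distinct */
        if (!delay_writes && (wa > rb || wb > ra))
            continue;                    /* SC: each write precedes its own read */
        bool s1 = wb > rb;               /* P1 reads B before B = 1 is visible */
        bool s2 = wa > ra;               /* P2 reads A before A = 1 is visible */
        if (s1 && s2)
            return true;
    }
    return false;
}
```

Under SC the constraints wa < rb < wb < ra < wa form a cycle, so no order executes both; once writes may be delayed, the order R(B), R(A), W(A=1), W(B=1) does.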

Sequential Consistency
• The result of any execution is the same as if the operations of all processors were executed in some sequential order
• The operations of each processor occur in this sequence in the order specified by its program
- requires all memory operations to be atomic
- too restrictive, high overheads
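
In C11, atomic operations with the default memory_order_seq_cst are specified to behave sequentially consistently with respect to one another, so the earlier example written with such atomics must never run both S1 and S2. The sketch below repeats the example with two threads; the names are hypothetical, and a test run can only fail to find a violation, not prove SC. With memory_order_relaxed (or release/acquire) loads and stores, the both-zero outcome would be permitted.

```c
#include <pthread.h>
#include <stdatomic.h>

static atomic_int A, B;
static int r1, r2;            /* each written by one thread, read after join */

static void *p1(void *arg) {
    (void)arg;
    atomic_store(&A, 1);      /* A = 1 (seq_cst) */
    r1 = atomic_load(&B);     /* L1: S1 runs if r1 == 0 */
    return NULL;
}

static void *p2(void *arg) {
    (void)arg;
    atomic_store(&B, 1);      /* B = 1 (seq_cst) */
    r2 = atomic_load(&A);     /* L2: S2 runs if r2 == 0 */
    return NULL;
}

/* Runs the example 'trials' times; returns how many runs executed both S1 and S2. */
static int count_both_zero(int trials) {
    int both = 0;
    for (int i = 0; i < trials; i++) {
        atomic_store(&A, 0);
        atomic_store(&B, 0);
        pthread_t t1, t2;
        pthread_create(&t1, NULL, p1, NULL);
        pthread_create(&t2, NULL, p2, NULL);
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);
        if (r1 == 0 && r2 == 0)
            both++;
    }
    return both;
}
```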

Relaxing WR order Loads are allowed to overtake stores Write buffering is permitted • Total Store Ordering : Writes are atomic • Processor Consistency : Writes need not be atomic - Invalidations may gradually propagate Anshul Kumar, CSE IITD

Relaxing WR & WW order Partial Store Ordering • Loads are allowed to overtake stores • Writes can be re-ordered • Memory barrier or fence are used to explicitly order any operations Further improves the performance Anshul Kumar, CSE IITD

Examples
P1                  P2
A = 1;              while(flag=0);
flag = 1;           print A;
SC ensures that “1” is printed. TSO and PC also do so; PSO does not.

P1                  P2
A = 1;              print B;
B = 1;              print A;
SC ensures that if B is printed as “1” then A is also printed as “1”. TSO and PC also do so; PSO does not.
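
Under PSO the first example needs a fence between A = 1 and flag = 1 (and P2 needs its reads kept in order). A C11 sketch of the fenced version, assuming POSIX threads: the release fence orders A = 1 before the relaxed store to flag, and the acquire fence orders the flag read before the read of A, so C11 guarantees that 1 is "printed". Which hardware barrier instruction each fence becomes is up to the compiler. The names are hypothetical.

```c
#include <pthread.h>
#include <stdatomic.h>

static int A;                  /* ordinary data */
static atomic_int flag;        /* synchronization flag */
static int printed;            /* the value P2 "prints" */

static void *p1(void *arg) {
    (void)arg;
    A = 1;
    atomic_thread_fence(memory_order_release);   /* keep A = 1 ahead of flag = 1 */
    atomic_store_explicit(&flag, 1, memory_order_relaxed);
    return NULL;
}

static void *p2(void *arg) {
    (void)arg;
    while (atomic_load_explicit(&flag, memory_order_relaxed) == 0)
        ;                                        /* while(flag=0); */
    atomic_thread_fence(memory_order_acquire);   /* keep the read of A after it */
    printed = A;                                 /* print A */
    return NULL;
}

/* Runs the example 'trials' times; returns how many runs printed 1. */
static int run_flag_example(int trials) {
    int ones = 0;
    for (int i = 0; i < trials; i++) {
        A = 0;
        atomic_store(&flag, 0);
        pthread_t t1, t2;
        pthread_create(&t1, NULL, p1, NULL);
        pthread_create(&t2, NULL, p2, NULL);
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);
        if (printed == 1)
            ones++;
    }
    return ones;
}
```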

Examples - continued
P1          P2              P3
A = 1;      while(A=0);     while(B=0);
            B = 1;          print A;
SC ensures that “1” is printed. TSO and PSO also do so, but PC does not.

P1          P2
A = 1;      B = 1;
print B;    print A;
SC ensures that both can’t be printed as “0”. TSO, PC and PSO do not.

Relaxing All R/W Order
Weak Ordering or Weak Consistency
• Loads and stores are not restricted to follow an order
• Explicit synchronization primitives are used
• Synchronization primitives follow a strict order
• Easy to achieve
• Low overhead
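
In POSIX C code, pthread_mutex_lock and pthread_mutex_unlock act as this kind of explicit synchronization operation: ordinary loads and stores between them may be reordered freely by compiler or hardware, but not across them. A minimal sketch under that assumption; the variables x, y, ready and the function names are hypothetical.

```c
#include <pthread.h>

static pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
static int x, y;              /* ordinary shared data */
static int ready;             /* ordinary too; only touched while holding m */

static void *producer(void *arg) {
    (void)arg;
    pthread_mutex_lock(&m);   /* synch */
    x = 1;                    /* these ordinary stores may be reordered */
    y = 2;                    /* among themselves ... */
    ready = 1;
    pthread_mutex_unlock(&m); /* synch: ... but not past this point */
    return NULL;
}

static void *consumer(void *arg) {
    int *sum = arg;
    for (;;) {
        pthread_mutex_lock(&m);           /* synch */
        if (ready) {
            *sum = x + y;                 /* sees the producer's stores */
            pthread_mutex_unlock(&m);
            return NULL;
        }
        pthread_mutex_unlock(&m);         /* synch */
    }
}

static int run_weak_ordering_example(void) {
    int sum = 0;
    x = y = ready = 0;
    pthread_t tp, tc;
    pthread_create(&tc, NULL, consumer, &sum);
    pthread_create(&tp, NULL, producer, NULL);
    pthread_join(tp, NULL);
    pthread_join(tc, NULL);
    return sum;
}
```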

Release Consistency
• Further relaxation of weak ordering
• Synch primitives are divided into acquire and release operations
• R/W operations after an acquire cannot move before it, but those before it can be moved after it
• R/W operations before a release cannot move after it, but those after it can be moved before it
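
C11 labels individual atomic operations as acquire or release, which gives a close analogue of these rules. In this sketch, assuming POSIX threads, the writer's ordinary stores to buf cannot move after its release store, and the reader's ordinary loads cannot move before its acquire load, so the reader must see all eight values. The names buf, published and run_release_acquire_example are hypothetical.

```c
#include <pthread.h>
#include <stdatomic.h>

#define N 8
static int buf[N];             /* ordinary data written before the release */
static atomic_int published;   /* the synchronization variable */

static void *writer(void *arg) {
    (void)arg;
    for (int i = 0; i < N; i++)
        buf[i] = i * i;        /* may be reordered among themselves, not after the release */
    atomic_store_explicit(&published, 1, memory_order_release);   /* release */
    return NULL;
}

static void *reader(void *arg) {
    long *sum = arg;
    while (atomic_load_explicit(&published, memory_order_acquire) == 0)
        ;                      /* acquire: the reads below cannot move above it */
    long s = 0;
    for (int i = 0; i < N; i++)
        s += buf[i];
    *sum = s;
    return NULL;
}

static long run_release_acquire_example(void) {
    long sum = 0;
    atomic_store(&published, 0);
    for (int i = 0; i < N; i++)
        buf[i] = 0;
    pthread_t w, r;
    pthread_create(&w, NULL, writer, NULL);
    pthread_create(&r, NULL, reader, &sum);
    pthread_join(w, NULL);
    pthread_join(r, NULL);
    return sum;
}
```

The expected sum is 0 + 1 + 4 + 9 + 16 + 25 + 36 + 49 = 140.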

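WC and RC Comparison
WC                      RC
R/W … R/W   (1)         R/W … R/W   (1)
synch                   acquire
R/W … R/W   (2)         R/W … R/W   (2)
synch                   release
R/W … R/W   (3)         R/W … R/W   (3)
Under WC each synch separates the blocks strictly; under RC, block 1 may move past the acquire and block 3 may move before the release, but block 2 stays between them.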
