1 / 28

Memory Consistency Models (III)

Memory Consistency Models (III). Relaxing all orders. Retain control and data dependences within each thread Why? allow multiple, overlapping read operations non-blocking read, lock-up free caches and speculative execution. Relaxing all orders. Weak ordering

spencer
Télécharger la présentation

Memory Consistency Models (III)

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Memory Consistency Models (III)

  2. Relaxing all orders • Retain control and data dependences within each thread • Why? • allow multiple, overlapping read operations • non-blocking read, lock-up free caches and speculative execution

  3. Relaxing all orders • Weak ordering • synchronization operations wait for all previous mem ops to complete • arbitrary completion ordering between • Release Consistency • acquire: read operation to gain access to set of operations or variables • release: write operation to grant access to others • acquire must before following accesses • release must wait for preceeding accesses to complete

  4. Examples

  5. Preserved Orderings Weak Ordering Release Consistency read/write ° ° ° read/write read/write ° ° ° read/write Acquire 1 1 read/write ° ° ° read/write 2 Synch(R) read/write ° ° ° read/write 3 read/write ° ° ° read/write Release 2 Synch(W) read/write ° ° ° read/write 3

  6. Relaxing All Program Orders • Weak Ordering Model (WO) • Release Consistency Models (RCsc/RCpc) • Three Models proposed for commercial architectures Digital Alpha SPARC V9 relaxed memory order (RMO) IBM PowerPC

  7. Relaxing All Program Orders • All models allow a processor to read its own writes early • RCpc and IBM PowerPC are the only models which allow a read to return the value of another processor’s write early • Two types • WO, RCsc, and RCpc models distinguish memory operations based on their type, and provide stricter ordering constraints for some types of operations • Alpha, RMO, and PowerPC models provide explicit fence instructions for imposing pgm orders between various memory operations

  8. Pn P2 P1 M1 M2 Mn Weak Ordering • Two types of operations • data and synchronization operations Program Order R R W W Rs Ws Rs Ws Rs Rs Ws Ws R W R W R R W W R W R W + coherence + write atomicity

  9. Weak Ordering • All sub-operations of the first operation must complete before any sub-operations of the second operation to maintain the program order ( ) • Coherence requirement : all write sub-operations to the same location to complete in the same order across all memory copies • The read sub-operation and the sub-operations of the write (possibly from a different processor) whose value is returned by the read should complete before any of the sub-operations of the operation that follows the read in program order.( ) - Dubois et al. refer to this as the read being “globally performed”.

  10. Weak Ordering • Conflicting op: • access the same mem location and at least one of them is write • Competing op: • If it is possible for them to appear next to each other in a sequentially consistent total order • Synchronized program • All competing memory ops have been labeled as syn operations conflicting P1 A = 1; B = 1; Flag =1 P2 *(while (Flag==0);) .. = Flag . . .. = Flag u=A v=B po competing non-competing co

  11. shared competing non-competing sync (acq, rel) non-sync Pn P2 P1 M1 M2 Mn Release Consistency (RC) Program Order RCsc Rc Rc Wc Wc Rc Wc Rc Wc Racq Racq R W R W Wrel Wrel R R W W R W R W RCpc Rc Rc Wc Wc Rc Wc Rc Wc Racq Racq R W R W Wrel Wrel R R W W R W R W +coherence

  12. DEC Alpha • impose ordering solely through explicit fence instructions • two types of fences • the memory barrier (MB) • the write memory barrier (WMB) • the only model that enforces the program order between reads to the same location • implementation can safely exploit optimizations such as read forwarding

  13. P1 P2 Pn F2 F1 F1 F1 F1 M DEC Alpha Program Order R R W W W R W R R W R R W W R W R W

  14. P1 P2 Pn F1 F2 F3 F4 M SPARC V9 Relaxed Memory Order (RMO) • Extension of the TSO and PSO models • four types of fences • a four-bit opcode Program Order R R W W R W R W R R W W R W R W

  15. Pn P2 P1 F F F F M1 M2 Mn IBM PowerPC Model • multiple-copy semantics of memory to the programmer • the conceptual system is identical to that of WO • single fence instruction, called a SYNC Program Order R R W W R W R W R R W W R W R W + Coherence An approximate representation for the PowerPC model

  16. Simple Categorization of Relaxed Models

  17. Relationship among models SC IBM-370 TSO PC Stricter relation PSO WO Alpha RCSC RMO PowerPC RCPC

  18. Complete Relaxed Consistency Model • System specification • what program orders among mem operations are preserved • what mechanisms are provided to enforce order explicitly, when desired • Programmer’s interface • what program annotations are available • what ‘rules’ must be followed to maintain the illusion of SC • Translation mechanism

  19. Programmer’s Interface • weak ordering allows programmer to reason in terms of SC, as long as programs are ‘data race free’ • release consistency allows programmer to reason in terms of SC for “properly labeled programs” • lock is acquire • unlock is release • barrier is both • ok if no synchronization conveyed through ordinary variables

  20. Identifying Synch events • two memory operation in different threads conflict ifthey access same location and one is write • two conflicting operations compete if one may follow the other in a SC execution with no intervening memory operations on shared data • a parallel program is synchronized if all competing memory operations have been labeled as synchronization operations • perhaps differentiated into acquire and release • allows programmer to reason in terms of SC, rather than underlying potential reorderings

  21. p1 p2 write A write B test (flag) set (flag) test (flag) read A read B P.O. P.O. C.O. Example • Accesses to flag are competing • they constitute a Data Race • two conflicting accesses in different threads not ordered by intervening accesses • Accesses to A (or B) conflict, but do not compete • as long as accesses to flag are labeled as synchronizing

  22. Properly Synchronized Programs • All synchronization operations explicitly identified • All data accesses ordered though synchronizations • no data races! => Compiler generated programs from structured high-level parallel languages

  23. How should programs be labeled? • Data parallel statements a la HPF (Forall loop) • Library routines (lock, unlock, barrier) • Variable attributes for declarations • Annotations at the statement level

  24. READ READ WRITE READ WRITE READ WRITE WRITE Properly-Labeled Model - Level One (PL1) READ/WRITE ... READWRITE READ/WRITE ... READWRITE READ/WRITE ... READWRITE READ/WRITE ... READWRITE READ/WRITE ... READWRITE shared competing READ/WRITE ... READWRITE non-competing

  25. PL2 shared competing non-competing sync (acq, rel) non-sync Labels a1,a2: non-sync c1,c2(test): sync c1c2(set): non-sync d1,d2: non-competing f1,f2: non-sync g1,g2: sync P1 P2 a1: u= Bound a2: u=Bound b1: if (local_bound<u){ b2: if (local_bound<u){ c1: while (test&set(L)==1); c2: while (test&set(L)==1); d1: u=Bound; d2: u=Bound e1: if (local_bound<u) e2: if (local_bound<u) f1: Bound = local_bound; f2: Bound = local_bound; g1: L = 0; g2: L = 0; h1: } h2:} ...... .......

  26. test(READ) (sync) set(WRTIE) (non-sync) PL2 programs READ/WRITE .... READ/WRITE test(READ) (sync) set(WRTIE) (non-sync) READ/WRITE .... READ/WRITE READ/WRITE .... READ/WRITE READ/WRITE .... READ/WRITE READ/WRITE .... READ/WRITE unset(WRITE) (sync) unset(WRITE) (sync) READ/WRITE .... READ/WRITE

  27. Summary of Programmer Model • Contract between programmer and system: • programmer provides synchronized programs • system provides effective “sequential consistency” with more room for optimization • Allows portability over a range of implementations • Research on similar frameworks: • Properly-labeled (PL) programs - Gharachorloo 90 • Data-race-free (DRF) - Adve 90 • Unifying framework (PLpc) - Gharachorloo,Adve 92

  28. Interplay of Micro and multi processor design • Multiprocessors tend to inherit consistency model from their microprocessor • MIPS R10000 -> SGI Origin: SC • PPro -> NUMA-Q: PC • Sparc: TSO, PSO, RMO • Can weaken model or strengthen it • As micros get better at speculation and reordering it is easier to provide SC without as severe performance penalties • speculative execution • speculative loads • write-completion (precise interrupts)

More Related