
Scalable Reader-Writer Synchronization for Shared-Memory Multiprocessors


Presentation Transcript


  1. Scalable Reader-Writer Synchronization for Shared-Memory Multiprocessors Mellor-Crummey and Scott Presented by Robert T. Bauer

  2. Problem • Efficient SMMP Reader/Writer Synchronization

  3. Basics • Readers can “share” a data structure • Writers need exclusive access • Writes appear to be atomic • Issues: • Fairness: fair ⇒ every “process” eventually runs • Preference: • Reader preference ⇒ a writer can starve • Writer preference ⇒ a reader can starve

  4. Organization • Algorithm 1 – simple mutual exclusion • Algorithm 2 – RW with reader preference • Algorithm 3 – a fair lock • Algorithm 4 – local-only spinning (fair) • Algorithm 5 – local-only reader preference • Algorithm 6 – local-only writer preference • Conclusions and the paper’s contribution

  5. Algorithm 1 – just a spin lock • Idea is that processors spin on their own lock record • Lock records form a linked list • When a lock is released, the “next” processor waiting on the lock is signaled by passing it the lock • By using compare-and-swap when releasing, the algorithm guarantees FIFO ordering • Spinning is “local” by design

  6. Algorithm 1
     • Acquire lock:
         pred := fetch_and_store(L, I)
         pred ≠ nil ⇒ I->locked := true; pred->next := I; repeat while I->locked
     • Release lock:
         I->next = nil ⇒ compare_and_swap(L, I, nil) ⇒ return
         repeat while I->next = nil
         I->next->locked := false
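As a concrete rendering of slide 6, here is the queue lock sketched in C11 atomics; the type and function names (mcs_node, mcs_lock, mcs_acquire, mcs_release) are mine, and the paper itself gives Pascal-style pseudocode rather than C:

    #include <stdatomic.h>
    #include <stdbool.h>
    #include <stddef.h>

    typedef struct mcs_node {
        _Atomic(struct mcs_node *) next;    /* successor waiting behind us            */
        _Atomic bool               locked;  /* each processor spins on its own flag   */
    } mcs_node;

    typedef _Atomic(mcs_node *) mcs_lock;   /* tail of the waiter queue; NULL = free  */

    void mcs_acquire(mcs_lock *L, mcs_node *I) {
        atomic_store(&I->next, NULL);
        mcs_node *pred = atomic_exchange(L, I);     /* fetch_and_store: become the tail */
        if (pred != NULL) {                         /* lock held or contended           */
            atomic_store(&I->locked, true);
            atomic_store(&pred->next, I);           /* link behind the predecessor      */
            while (atomic_load(&I->locked))
                ;                                   /* spin only on our own record      */
        }
    }

    void mcs_release(mcs_lock *L, mcs_node *I) {
        mcs_node *succ = atomic_load(&I->next);
        if (succ == NULL) {
            mcs_node *expected = I;                 /* compare_and_swap: if we are still */
            if (atomic_compare_exchange_strong(L, &expected, NULL))
                return;                             /* the tail, the queue is now empty  */
            while ((succ = atomic_load(&I->next)) == NULL)
                ;                                   /* a successor is mid-enqueue        */
        }
        atomic_store(&succ->locked, false);         /* pass the lock along (FIFO)        */
    }

Releasing with compare_and_swap only when no successor is visible is what gives the FIFO hand-off the slide mentions.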

  7. Algorithm 2 – Simple RW lock with reader preference
     Bits 31..1 – count of interested readers; bit 0 – writer active?
     start_write – repeat until compare_and_swap(L, 0, 0x1)
     start_read – atomic_add(L, 2); repeat until (L & 0x1) = 0
     end_write – atomic_add(L, -1)
     end_read – atomic_add(L, -2)
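Slide 7 maps almost directly onto C11 atomics. A minimal sketch (the name rw_rp_lock is mine, and the count silently assumes fewer than 2^31 concurrent readers):

    #include <stdatomic.h>

    typedef _Atomic unsigned rw_rp_lock;   /* bit 0 = writer active, bits 31..1 = reader count; init to 0 */

    void start_write(rw_rp_lock *L) {
        unsigned expected = 0;
        while (!atomic_compare_exchange_weak(L, &expected, 0x1))
            expected = 0;                  /* retry until there are no readers and no writer */
    }

    void end_write(rw_rp_lock *L) { atomic_fetch_sub(L, 1); }   /* clear the writer-active bit */

    void start_read(rw_rp_lock *L) {
        atomic_fetch_add(L, 2);            /* announce this reader */
        while (atomic_load(L) & 0x1)
            ;                              /* wait only while a writer is active */
    }

    void end_read(rw_rp_lock *L) { atomic_fetch_sub(L, 2); }

Because a reader only waits on the writer-active bit, waiting writers cannot hold back new readers, which is exactly the reader preference (and potential writer starvation) noted on slide 3.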

  8. Algorithm 3 – Fair Lock
     Two packed words: requests = (reader count, writer count) and completions = (reader count, writer count)
     start_write –
       prev = fetch_clear_then_add(L->requests, MASK, 1)     // ++ write requests
       repeat until completions = prev                       // wait for previous readers and writers to go first
     end_write – clear_then_add(L->completions, MASK, 1)     // ++ write completions
     start_read –                                            // ++ read requests, get count of previous writers
       prev_writer = fetch_clear_then_add(L->requests, MASK, 1) & MASK
       repeat until (completions & MASK) = prev_writer       // wait for previous writers to go first
     end_read – clear_then_add(L->completions, MASK, 1)      // ++ read completions
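fetch_clear_then_add is not a standard atomic on commodity hardware, but the same idea can be sketched by packing both counts into one word so that a single fetch_add reads the pair while bumping one field. The layout below is my own, and it assumes counts stay below 2^16 (the paper's instruction clears overflow instead):

    #include <stdatomic.h>

    #define RD    0x00001u                /* one reader request/completion (low 16 bits)  */
    #define WR    0x10000u                /* one writer request/completion (high 16 bits) */
    #define WMASK 0xFFFF0000u             /* the writer field                              */

    typedef struct {
        _Atomic unsigned requests;        /* readers | writers that have asked for the lock */
        _Atomic unsigned completions;     /* readers | writers that have released it        */
    } fair_rw_lock;

    void start_write(fair_rw_lock *L) {
        unsigned prev = atomic_fetch_add(&L->requests, WR);   /* ++ write requests, snapshot both fields */
        while (atomic_load(&L->completions) != prev)
            ;                             /* wait for every earlier reader and writer */
    }

    void end_write(fair_rw_lock *L) { atomic_fetch_add(&L->completions, WR); }

    void start_read(fair_rw_lock *L) {
        unsigned prev_writers = atomic_fetch_add(&L->requests, RD) & WMASK;  /* ++ read requests */
        while ((atomic_load(&L->completions) & WMASK) != prev_writers)
            ;                             /* wait only for the writers that came first */
    }

    void end_read(fair_rw_lock *L) { atomic_fetch_add(&L->completions, RD); }

Readers that arrive together see the same writer snapshot and run concurrently, while any later writer sees their requests and waits for their completions: the FIFO-with-batched-readers fairness of slide 8.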

  9. So far so good, but … • Algorithms 2 and 3 spin on a shared memory location. • What we want is for the algorithms to spin on processor-local variables. • Note – results weren’t presented for Algorithms 2 and 3. We can guess their performance, though, since we know the general characteristics of contention.

  10. Algorithm 4 – Fair R/W Lock: Local-Only Spinning • Fairness algorithm: • read request granted when all previous write requests have completed • write request granted when all previous read and write requests have completed

  11. Lock and Local Data Layout
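As a reference for the cases that follow, the layout on this slide can be sketched as C structs. The packing below is my own choice; the key point, which the multi-reader step in Case 3 relies on, is that the blocked flag and the successor class share one word so a single compare_and_swap can test and change them together:

    #include <stdatomic.h>

    enum node_class { READING, WRITING };

    #define BLOCKED     0x1u              /* spin here until a predecessor clears it */
    #define SUCC_NONE   0x0u
    #define SUCC_READER 0x2u              /* a reader is queued right behind us      */
    #define SUCC_WRITER 0x4u              /* a writer is queued right behind us      */

    typedef struct qnode {
        enum node_class          class;   /* reading or writing                        */
        _Atomic(struct qnode *)  next;    /* successor in the queue                    */
        _Atomic unsigned         state;   /* blocked flag | successor class, one word  */
    } qnode;

    typedef struct {
        _Atomic(qnode *) tail;            /* most recent requester (nil = lock free)   */
        _Atomic unsigned reader_count;    /* readers currently holding the lock        */
        _Atomic(qnode *) next_writer;     /* writer waiting for the readers to drain   */
    } rw_fair_lock;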

  12. Case 1: Just a Read • Lock.tail → I; pred == nil • Upon exit: Lock.tail → I, Lock.reader_count == 1

  13. Case 1: Exit Read • next == nil; Lock.tail → I, so the compare_and_swap returns true • Lock.reader_count == 1, Lock.next_writer == nil • Upon exit: Lock.tail == nil, Lock.reader_count == 0

  14. Case 2: Overlapping Read • After the first read: Lock.tail → I1, Lock.reader_count == 1 • The second read finds pred not nil, with pred->class == reading and pred->state == [false, none] • Lock.reader_count == 2

  15. Case 2: Overlapping Read • After the 2nd read enters: Lock.tail → I2, I1->next == I2

  16. Case 2: Overlapping Reads • I1 finishes ⇒ next != nil • I2 finishes ⇒ Lock.tail = nil • The reader count goes to zero after I1 and I2 finish

  17. Case 3: Read Overlaps Write • The previous cases weren’t interesting, but they did help us get familiar with the data structures and (some of) the code. • Now we need to consider the case where a “write” has started, but a read is requested. The read should block (spin) until the write completes. • We need to “prove” that the spinning occurs on a locally cached memory location.
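Concretely, “spinning on a locally cached memory location” means the wait loop touches only the requester’s own qnode (using the layout sketched earlier); the only remote traffic is the single write that releases it. A generic sketch of the idiom, not the paper’s code:

    /* local-spin idiom: I->state lives in the waiter's own qnode */
    while (atomic_load(&I->state) & BLOCKED)
        ;   /* reads stay in this processor's cache; the predecessor clears BLOCKED exactly once */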

  18. Case 3: Read Overlaps Write – The Write • pred == nil, so blocked is reset to false • Upon exit: Lock.tail → I, Lock.next_writer = nil, I.class = writing, I.next = nil, I.blocked = false, I.successor_class = none

  19. Case 3: Read Overlaps Write – The Read • pred->class == writing • the read waits here (spinning locally) for the write to complete

  20. Case 3: Read Overlaps Write – The Write Completes • I.next → the read’s qnode, so the writer unblocks the reader • Yes, this works, but it is “uncomfortable” because concerns aren’t separated

  21. Case 3: What if there were more than one reader? • the new reader changes its predecessor reader’s state, then waits here • yes – the state was changed by the successor • so the predecessor unblocks the successor

  22. Case 4: Write Overlaps Read • Overlapping reads form a chain • The overlapping write “spins,” waiting for the read chain to complete • Reads that “enter” after the write has “entered,” but before the write completes (even while the write is “spinning”), form a chain following the write (as in Case 3)

  23. Case 4: Write Overlaps Read • the write waits here

  24. Algorithm 5 – Reader Preference R/W Lock, Local-Only Spinning • We’ll look at the Reader-Writer-Reader case and demonstrate that the second reader completes before the writer is signaled to start.

  25. 1st Reader • ++reader_count • WAFLAG == 0, so the test is false • the 1st reader just runs!

  26. Overlapping Write • queue the write • register writer interest – the result is not zero, since there is a reader • because there is a reader, the cas fails • the writer blocks here, waiting for a reader to set blocked = false

  27. 2nd Reader • still no active writer • ++reader_count

  28. Reader Completes • only the last reader will satisfy the equality test • the last reader to complete will set WAFLAG and unblock the writer
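The hand-off that slides 25-28 step through can be sketched as follows. This is a deliberate simplification, not the paper's Algorithm 5: all names are mine, writers are assumed to be serialized among themselves (the paper queues them on a writer list), and readers that arrive during a write still spin on the shared word, so only the writer's spin is local.

    #include <stdatomic.h>
    #include <stdbool.h>
    #include <stddef.h>

    #define WAFLAG  0x1u                  /* a writer holds the lock        */
    #define WIFLAG  0x2u                  /* a writer is parked, waiting    */
    #define RC_UNIT 0x4u                  /* one active reader (upper bits) */

    typedef struct { _Atomic bool blocked; } wnode;

    typedef struct {
        _Atomic unsigned  n;              /* reader count | WIFLAG | WAFLAG */
        _Atomic(wnode *)  writer;         /* the parked writer's local node */
    } rp_local_lock;

    void start_read(rp_local_lock *l) {
        atomic_fetch_add(&l->n, RC_UNIT);                  /* ++reader_count                  */
        while (atomic_load(&l->n) & WAFLAG)
            ;                                              /* wait only for an ACTIVE writer  */
    }

    void end_read(rp_local_lock *l) {
        unsigned prev = atomic_fetch_sub(&l->n, RC_UNIT);
        if (prev == (RC_UNIT | WIFLAG)) {                  /* last reader out, writer parked  */
            unsigned expect = WIFLAG;
            if (atomic_compare_exchange_strong(&l->n, &expect, WAFLAG)) {
                wnode *w = atomic_load(&l->writer);        /* published before WIFLAG was set */
                atomic_store(&w->blocked, false);          /* WAFLAG is set; unblock writer   */
            }
        }
    }

    void start_write(rp_local_lock *l, wnode *I) {         /* assumes one writer at a time    */
        atomic_store(&I->blocked, true);
        atomic_store(&l->writer, I);                       /* publish before raising WIFLAG   */
        if (atomic_fetch_or(&l->n, WIFLAG) == 0) {         /* register writer interest        */
            unsigned expect = WIFLAG;
            if (atomic_compare_exchange_strong(&l->n, &expect, WAFLAG))
                return;                                    /* no readers: run immediately     */
        }
        while (atomic_load(&I->blocked))
            ;                                              /* local spin; last reader wakes us */
    }

    void end_write(rp_local_lock *l) {
        atomic_fetch_and(&l->n, ~WAFLAG);                  /* let waiting readers proceed     */
    }

Even in this cut-down form the sequencing of the walkthrough is visible: the second reader gets in because only WAFLAG (not WIFLAG) blocks readers, and the last reader to leave is the one whose decrement exposes RC_UNIT | WIFLAG, so it sets WAFLAG and clears the writer's local flag.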

  29. Algorithm 6 – Writer Preference R/W Lock, Local-Only Spinning • We’ll look at the Writer-Reader-Writer case and demonstrate that the second writer completes before the reader is signaled to start.

  30. 1st Writer

  31. “set_next_writer” • the 1st writer marks “writer interested or active” • there are no readers, just the writer • so the writer should run

  32. 1st Writer • blocked = false, so the writer starts

  33. Reader • put the reader on the queue • “register” the reader and see whether there are writers • wait here for the writer to complete

  34. 2nd Writer • queue this write behind the other write and wait

  35. Writer Completes • start the queued write

  36. Last Writer Completes • clear the write flags • signal the readers

  37. Unblock Readers • ++reader_count and clear the readers-interested bits • no writers are waiting or active, so empty the “waiting” reader list • when this reader continues, it will unblock the “next” reader – which will unblock its “next” reader, etc. • the reader count gets bumped as each one starts
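The cascade described on slides 36-37 can be sketched in isolation. The names (rnode, reader_list, park_reader, release_readers) are mine, and this fragment shows only the hand-off of the waiting-reader list; it is not a complete writer-preference lock and omits the writer-interest flags a real reader would re-check before parking:

    #include <stdatomic.h>
    #include <stdbool.h>
    #include <stddef.h>

    typedef struct rnode {
        _Atomic bool            blocked;
        _Atomic(struct rnode *) next;
    } rnode;

    typedef struct {
        _Atomic(rnode *) waiting_readers;   /* list of parked readers        */
        _Atomic unsigned reader_count;      /* readers currently in the lock */
    } reader_list;

    /* a reader that found a writer interested or active parks itself */
    void park_reader(reader_list *l, rnode *I) {
        atomic_store(&I->blocked, true);
        rnode *head = atomic_load(&l->waiting_readers);
        do { atomic_store(&I->next, head); }               /* push onto the waiting list    */
        while (!atomic_compare_exchange_weak(&l->waiting_readers, &head, I));
        while (atomic_load(&I->blocked))
            ;                                              /* local spin on our own node    */
        atomic_fetch_add(&l->reader_count, 1);             /* "reader count gets bumped"    */
        rnode *nxt = atomic_load(&I->next);
        if (nxt) atomic_store(&nxt->blocked, false);       /* unblock the "next" reader,    */
    }                                                      /* which unblocks its next, etc. */

    /* the last writer to complete empties the waiting list and starts the cascade */
    void release_readers(reader_list *l) {
        rnode *head = atomic_exchange(&l->waiting_readers, NULL);
        if (head) atomic_store(&head->blocked, false);
    }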

  38. Results & Conclusion • The authors reported results for a different algorithm than the ones presented here. • The algorithms they measured against were more costly in a multiprocessor environment, so they argue that the algorithms presented here would perform better.

  39. Timing Results • Latency is costly because of the number of atomic operations each acquire and release performs.
