Shared Memory Consistency Models: A Tutorial

By Sarita Adve & Kourosh Gharachorloo. Slides by Jim Larson. Outline: concurrent programming on a uniprocessor; the effect of optimizations on a uniprocessor; the effect of the same optimizations on a multiprocessor.


Presentation Transcript


  1. Shared Memory Consistency Models: A Tutorial. By Sarita Adve & Kourosh Gharachorloo. Slides by Jim Larson.

  2. Outline • Concurrent programming on a uniprocessor • The effect of optimizations on a uniprocessor • The effect of the same optimizations on a multiprocessor • Methods for restoring sequential consistency • Conclusion

  3. Outline • Concurrent programming on a uniprocessor • The effect of optimizations on a uniprocessor • The effect of the same optimizations on a multiprocessor • Methods for restoring sequential consistency • Conclusion

  4. Dekker's Algorithm: Global Flags Init to 0 Process 1:: Flag1 = 1 If (Flag2 == 0) critical section Process 2:: Flag2 = 1 If (Flag1 == 0) critical section Flag1 = 0 Flag2 = 0

  5. Dekker's Algorithm: Global Flags Init to 0 Process 1:: Flag1 = 1 If (Flag2 == 0) critical section Process 2:: Flag2 = 1 If (Flag1 == 0) critical section Flag1 = 1 Flag2 = 0

  6. Dekker's Algorithm: Global Flags Init to 0 Process 1:: Flag1 = 1 If (Flag2 == 0) critical section Process 2:: Flag2 = 1 If (Flag1 == 0) critical section Flag1 = 1 Flag2 = 0

  7. Dekker's Algorithm: Global Flags Init to 0 Process 1:: Flag1 = 1 If (Flag2 == 0) critical section Process 2:: Flag2 = 1 If (Flag1 == 0) critical section Flag1 = 1 Flag2 = 0

  8. Dekker's Algorithm: Global Flags Init to 0 Process 1:: Flag1 = 1 If (Flag2 == 0) critical section Process 2:: Flag2 = 1 If (Flag1 == 0) critical section Flag1 = 1 Flag2 = 1

  9. Dekker's Algorithm: Global Flags Init to 0 Process 1:: Flag1 = 1 If (Flag2 == 0) critical section Process 2:: Flag2 = 1 If (Flag1 == 0) critical section Flag1 = 1 Flag2 = 1

  10. Dekker's Algorithm: Global Flags Init to 0 Process 1:: Flag1 = 1 If (Flag2 == 0) critical section Process 2:: Flag2 = 1 If (Flag1 == 0) critical section Flag1 = 1 Flag2 = 1 The critical section is protected. It works the same if Process 2 runs first: then Process 2 enters its critical section.

  11. Dekker's Algorithm: Global Flags Init to 0 Process 1:: Flag1 = 1 If (Flag2 == 0) critical section Process 2:: Flag2 = 1 If (Flag1 == 0) critical section Flag1 = 1 Flag2 = 0 Arbitrary interleaving of Processes

  12. Dekker's Algorithm: Global Flags Init to 0 Process 1:: Flag1 = 1 If (Flag2 == 0) critical section Process 2:: Flag2 = 1 If (Flag1 == 0) critical section Flag1 = 1 Flag2 = 1 Arbitrary interleaving of Processes

  13. Dekker's Algorithm: Global Flags Init to 0 Process 1:: Flag1 = 1 If (Flag2 == 0) critical section Process 2:: Flag2 = 1 If (Flag1 == 0) critical section Flag1 = 1 Flag2 = 1 Both processes can block, but the critical section remains protected. The deadlock can be fixed by extending the algorithm with turn-taking.
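The claim on slides 11 to 13 (any interleaving on a uniprocessor protects the critical section, although both processes may block) can be checked by brute force. The following is a minimal sketch, assuming each process is just one write followed by one read and that flags are never reset; the names `interleavings` and `run` are illustrative and not from the slides.

```python
from itertools import combinations

# Each process is modelled as a write of its own flag, then a read of the other's.
P1 = [("write", "Flag1"), ("read", "Flag2")]
P2 = [("write", "Flag2"), ("read", "Flag1")]


def interleavings(a, b):
    """Yield every merge of a and b that preserves each process's program order."""
    n = len(a) + len(b)
    for slots in combinations(range(n), len(a)):
        ia, ib = iter(a), iter(b)
        yield [(1, next(ia)) if i in slots else (2, next(ib)) for i in range(n)]


def run(schedule):
    """Execute one interleaving against a single shared memory."""
    flags = {"Flag1": 0, "Flag2": 0}
    enters = {1: False, 2: False}
    for pid, (kind, flag) in schedule:
        if kind == "write":
            flags[flag] = 1
        else:
            # A process enters only if the other flag is still 0.
            enters[pid] = flags[flag] == 0
    return enters[1], enters[2]


results = [run(s) for s in interleavings(P1, P2)]
both_enter = results.count((True, True))
both_block = results.count((False, False))
print(len(results), both_enter, both_block)  # 6 0 4
```

Of the six interleavings, none lets both processes in, and four leave both blocked, which is the deadlock that turn-taking is meant to remove.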

  14. Outline • Concurrent Programming on a Uniprocessor • The effect of optimizations on a Uniprocessor • The effect of the same optimizations on a Multiprocessor without Sequential Consistency • Methods for restoring Sequential Consistency • Conclusion

  15. Optimization: Write Buffer with Bypass (Single Processor Case). Speedup: a write takes 100 cycles, buffering takes 1 cycle, so buffer the write and keep going. Problem: what happens on a read from a location that has a buffered write pending?

  16. Dekker's Algorithm: Global Flags Init to 0 Process 1:: Flag1 = 1 If (Flag2 == 0) critical section Process 2:: Flag2 = 1 If (Flag1 == 0) critical section Flag1 = 0 Flag1 = 1 Flag2 = 0 Write Buffering

  17. Dekker's Algorithm: Global Flags Init to 0 Process 1:: Flag1 = 1 If (Flag2 == 0) critical section Process 2:: Flag2 = 1 If (Flag1 == 0) critical section Flag1 = 0 Flag2 = 1 Flag1 = 1 Flag2 = 0 Write Buffering

  18. Dekker's Algorithm: Global Flags Init to 0 Process 1:: Flag1 = 1 If (Flag2 == 0) critical section Process 2:: Flag2 = 1 If (Flag1 == 0) critical section Flag1 = 0 Flag2 = 1 Flag1 = 1 Flag2 = 0 Write Buffering Uh-Oh!

  19. Optimization: Write Buffer with Bypass. Speedup: a write takes 100 cycles, buffering takes 1 cycle. Rule: if a WRITE is issued, buffer it and keep executing. Unless: there is a READ from the same location (subsequent WRITEs don't matter); then wait for the WRITE to complete.
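The buffering rule can be sketched as a toy model. This is only an illustration under stated assumptions: the class name `BufferedProcessor`, the FIFO drain order, and the `stalls` counter are hypothetical and not taken from the slides; cycle counts are not modelled.

```python
from collections import deque


class BufferedProcessor:
    """A processor with a FIFO write buffer in front of memory (illustrative model)."""

    def __init__(self, memory):
        self.memory = memory
        self.buffer = deque()  # pending (location, value) writes, oldest first
        self.stalls = 0

    def write(self, location, value):
        # Rule: buffer the write and keep executing.
        self.buffer.append((location, value))

    def _pending(self, location):
        return any(loc == location for loc, _ in self.buffer)

    def read(self, location):
        # Unless: a read of a location with a pending write stalls until that
        # write has reached memory. Reads of other locations bypass the buffer.
        if self._pending(location):
            self.stalls += 1
            while self._pending(location):
                loc, value = self.buffer.popleft()
                self.memory[loc] = value
        return self.memory[location]


memory = {"Flag1": 0, "Flag2": 0}
cpu = BufferedProcessor(memory)    # both processes share this one processor

cpu.write("Flag1", 1)              # Process 1: Flag1 = 1 (buffered)
cpu.write("Flag2", 1)              # Process 2: Flag2 = 1 (buffered)
p2_sees_flag1 = cpu.read("Flag1")  # Process 2 stalls, then reads 1
p1_sees_flag2 = cpu.read("Flag2")  # Process 1 stalls, then reads 1

print(p1_sees_flag2, p2_sees_flag1, cpu.stalls)  # 1 1 2
```

Because both processes go through the same buffer, each read finds the other process's pending write and stalls, so neither process sees a stale 0.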

  20. Dekker's Algorithm: Global Flags Init to 0 Process 1:: Flag1 = 1 If (Flag2 == 0) critical section Process 2:: Flag2 = 1 If (Flag1 == 0) critical section STALL! Flag1 = 0 Flag2 = 1 Flag1 = 1 Flag2 = 0 Write Buffering Rule: If a WRITE is issued, buffer it and keep executing Unless: there is a READ from the same location (subsequent WRITEs don't matter), then wait for the WRITE to complete.

  21. Dekker's Algorithm: Global Flags Init to 0 Process 1:: Flag1 = 1 If (Flag2 == 0) critical section Process 2:: Flag2 = 1 If (Flag1 == 0) critical section Flag1 = 1 Flag2 = 1 Flag2 = 0 Write Buffering Rule: If a WRITE is issued, buffer it and keep executing Unless: there is a READ from the same location (subsequent WRITEs don't matter), then wait for the WRITE to complete.

  22. Dekker's Algorithm: Global Flags Init to 0 Process 1:: Flag1 = 1 If (Flag2 == 0) critical section Process 2:: Flag2 = 1 If (Flag1 == 0) critical section Flag1 = 0 Flag2 = 0 Does this work for Multiprocessors??

  23. Outline • Concurrent programming on a uniprocessor • The effect of optimizations on a uniprocessor • The effect of the same optimizations on a multiprocessor • Methods for restoring sequential consistency • Conclusion

  24. Dekker's Algorithm: Global Flags Init to 0 Process 1:: Flag1 = 1 If (Flag2 == 0) critical section Process 2:: Flag2 = 1 If (Flag1 == 0) critical section Flag1 = 0 Flag2 = 0 Does this work for Multiprocessors?

  25. Dekker's Algorithm: Global Flags Init to 0 Process 1:: Flag1 = 1 If (Flag2 == 0) critical section Process 2:: Flag2 = 1 If (Flag1 == 0) critical section Flag1 = 0 Flag2 = 0 Multiprocessor Case Rule: If a WRITE is issued, buffer it and keep executing Unless: there is a READ from the same location (subsequent WRITEs don't matter), then wait for the WRITE to complete.

  26. Dekker's Algorithm: Global Flags Init to 0 Process 1:: Flag1 = 1 If (Flag2 == 0) critical section Process 2:: Flag2 = 1 If (Flag1 == 0) critical section Flag1 = 0 Flag1 = 1 Flag2 = 0 Multiprocessor Case Rule: If a WRITE is issued, buffer it and keep executing Unless: there is a READ from the same location (subsequent WRITEs don't matter), then wait for the WRITE to complete.

  27. Dekker's Algorithm: Global Flags Init to 0 Process 1:: Flag1 = 1 If (Flag2 == 0) critical section Process 2:: Flag2 = 1 If (Flag1 == 0) critical section Flag1 = 0 Flag1 = 1 Flag2 = 1 Flag2 = 0 Multiprocessor Case Rule: If a WRITE is issued, buffer it and keep executing Unless: there is a READ from the same location (subsequent WRITEs don't matter), then wait for the WRITE to complete.

  28. Dekker's Algorithm: Global Flags Init to 0 Process 1:: Flag1 = 1 If (Flag2 == 0) critical section Process 2:: Flag2 = 1 If (Flag1 == 0) critical section Flag1 = 0 Flag1 = 1 Flag2 = 1 Flag2 = 0 Multiprocessor Case Rule: If a WRITE is issued, buffer it and keep executing Unless: there is a READ from the same location (subsequent WRITEs don't matter), then wait for the WRITE to complete.

  29. Dekker's Algorithm: Global Flags Init to 0 Process 1:: Flag1 = 1 If (Flag2 == 0) critical section Process 2:: Flag2 = 1 If (Flag1 == 0) critical section Flag1 = 0 Flag1 = 1 Flag2 = 1 What Now?? Flag2 = 0 Multiprocessor Case Rule: If a WRITE is issued, buffer it and keep executing Unless: there is a READ from the same location (subsequent WRITEs don't matter), then wait for the WRITE to complete.

  30. Dekker's Algorithm: Global Flags Init to 0 Process 1:: Flag1 = 1 If (Flag2 == 0) critical section Process 2:: Flag2 = 1 If (Flag1 == 0) critical section Flag1 = 0 Flag1 = 1 Flag2 = 1 Flag2 = 0 Multiprocessor Case Rule: If a WRITE is issued, buffer it and keep executing Unless: there is a READ from the same location (subsequent WRITEs don't matter), then wait for the WRITE to complete.

  31. Dekker's Algorithm: Global Flags Init to 0 Process 1:: Flag1 = 1 If (Flag2 == 0) critical section Process 2:: Flag2 = 1 If (Flag1 == 0) critical section Flag1 = 0 Flag1 = 1 Flag2 = 1 How Did That Happen?? Flag2 = 0 Multiprocessor Case Rule: If a WRITE is issued, buffer it and keep executing Unless: there is a READ from the same location (subsequent WRITEs don't matter), then wait for the WRITE to complete.

  32. What happens on a Processor stays on that Processor

  33. Dekker's Algorithm: Global Flags Init to 0 Process 1:: Flag1 = 1 If (Flag2 == 0) critical section Process 2:: Flag2 = 1 If (Flag1 == 0) critical section Flag1 = 0 Flag1 = 1 Flag2 = 1 Flag2 = 0 Processor 2 knows nothing about the write to Flag1, so has no reason to stall! Rule: If a WRITE is issued, buffer it and keep executing Unless: there is a READ from the same location (subsequent WRITEs don't matter), then wait for the WRITE to complete.
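The multiprocessor failure can be reproduced with the same kind of toy model, now with one private write buffer per processor. This is a sketch under assumptions: the `Processor` class and its `drain` method are hypothetical, and the schedule shown is just one of the possible executions.

```python
from collections import deque


class Processor:
    """A processor with its own private write buffer (illustrative model)."""

    def __init__(self, memory):
        self.memory = memory
        self.buffer = deque()  # pending (location, value) writes, oldest first

    def write(self, location, value):
        self.buffer.append((location, value))  # buffer and keep executing

    def _drain_one(self):
        loc, value = self.buffer.popleft()
        self.memory[loc] = value

    def read(self, location):
        # Stall only if THIS processor has a pending write to the location.
        while any(loc == location for loc, _ in self.buffer):
            self._drain_one()
        return self.memory[location]

    def drain(self):
        while self.buffer:
            self._drain_one()


memory = {"Flag1": 0, "Flag2": 0}
p1, p2 = Processor(memory), Processor(memory)

p1.write("Flag1", 1)                # sits in processor 1's buffer
p2.write("Flag2", 1)                # sits in processor 2's buffer
p1_enters = p1.read("Flag2") == 0   # no pending write to Flag2 on p1: no stall
p2_enters = p2.read("Flag1") == 0   # no pending write to Flag1 on p2: no stall
p1.drain()
p2.drain()

print(p1_enters, p2_enters)  # True True
```

Each processor checks only its own buffer, so neither read stalls and both processes enter the critical section, even though memory ends up with both flags set to 1.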

  34. A more general way to look at the Problem: Reordering of Reads and Writes (Loads and Stores).

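  35. Consider the instructions in these processes. Process 1:: Flag1 = 1 If (Flag2 == 0) critical section Process 2:: Flag2 = 1 If (Flag1 == 0) critical section Simplify as: Process 1 performs WX (write Flag1) then RY (read Flag2); Process 2 performs WY (write Flag2) then RX (read Flag1).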

  36. [Table: the 24 possible orderings of the four operations WX, WY, RX, RY, numbered 1 to 24.] There are 4! = 24 possible orderings. If either WX < RX or WY < RY, then the critical section is protected (correct behavior).

  37. [Table: the 24 possible orderings of the four operations WX, WY, RX, RY, numbered 1 to 24.] There are 4! = 24 possible orderings. If either WX < RX or WY < RY, then the critical section is protected (correct behavior). 18 of the 24 orderings are OK, but the other 6 are trouble!
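The 18-versus-6 count can be verified by enumeration. A small sketch, assuming an ordering is simply a permutation of the four operation names and that `protected` (a name chosen here) encodes the slide's condition:

```python
from itertools import permutations

OPS = ("WX", "WY", "RX", "RY")


def protected(order):
    """Slide condition: WX before RX, or WY before RY."""
    pos = {op: i for i, op in enumerate(order)}
    return pos["WX"] < pos["RX"] or pos["WY"] < pos["RY"]


orderings = list(permutations(OPS))
ok = [o for o in orderings if protected(o)]
bad = [o for o in orderings if not protected(o)]

print(len(orderings), len(ok), len(bad))  # 24 18 6
for order in bad:
    print(" < ".join(order))
```

The six failing orderings are exactly those in which both reads come before the write they should observe (RX before WX and RY before WY).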

  38. Consider another example...

  39. Global Data Initialized to 0 Process 1:: Data = 2000; Head = 1; Process 2:: While (Head == 0) {;} LocalValue = Data Memory Interconnect Head = 0 Data = 0 Write By-Pass: General Interconnect to multiple memory modules means write arrival in memory is indeterminate.

  40. Global Data Initialized to 0 Process 1:: Data = 2000; Head = 1; Process 2:: While (Head == 0) {;} LocalValue = Data Memory Interconnect Data = 2000 Head = 0 Data = 0 Write By-Pass: General Interconnect to multiple memory modules means write arrival in memory is indeterminate.

  41. Global Data Initialized to 0 Process 1:: Data = 2000; Head = 1; Process 2:: While (Head == 0) {;} LocalValue = Data Memory Interconnect Head = 1 Data = 2000 Head = 0 Data = 0 Write By-Pass: General Interconnect to multiple memory modules means write arrival in memory is indeterminate.

  42. Global Data Initialized to 0 Process 1:: Data = 2000; Head = 1; Process 2:: While (Head == 0) {;} LocalValue = Data Memory Interconnect Data = 2000 Head = 1 Data = 0 Write By-Pass: General Interconnect to multiple memory modules means write arrival in memory is indeterminate.

  43. Global Data Initialized to 0 Process 1:: Data = 2000; Head = 1; Process 2:: While (Head == 0) {;} LocalValue = Data Memory Interconnect Data = 2000 Head = 1 Data = 0 Write By-Pass: General Interconnect to multiple memory modules means write arrival in memory is indeterminate.

  44. Global Data Initialized to 0 Process 1:: Data = 2000; Head = 1; Process 2:: While (Head == 0) {;} LocalValue = Data Memory Interconnect Data = 2000 Wrong Data! Head = 1 Data = 0 Write By-Pass: General Interconnect to multiple memory modules means write arrival in memory is indeterminate.

  45. Global Data Initialized to 0 Process 1:: Data = 2000; Head = 1; Process 2:: While (Head == 0) {;} LocalValue = Data Memory Interconnect Head = 1 Data = 2000 Write By-Pass: General Interconnect to multiple memory modules means write arrival in memory is indeterminate. Fix: each write must be acknowledged before the same processor issues another write (or read).
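The effect of indeterminate write arrival, and of the acknowledgement fix, can be sketched as follows. This is a simplified model under assumptions: the function names are hypothetical, Process 2 is assumed to poll once after every write arrives, and an acknowledged write is modelled as one that reaches memory before the next is issued.

```python
PROGRAM_ORDER = [("Data", 2000), ("Head", 1)]  # Process 1's writes, in program order


def run(arrival_order):
    """Deliver Process 1's writes to memory in the given order.

    Process 2 polls Head after each arrival and copies Data as soon as
    Head is non-zero.
    """
    values = dict(PROGRAM_ORDER)
    memory = {"Data": 0, "Head": 0}
    local_value = None
    for location in arrival_order:
        memory[location] = values[location]
        if local_value is None and memory["Head"] != 0:
            local_value = memory["Data"]
    return local_value


def run_with_acknowledgements():
    """Fix: a write is not issued until the previous one is acknowledged,
    so arrival order equals program order."""
    return run([location for location, _ in PROGRAM_ORDER])


in_order = run(["Data", "Head"])      # interconnect happens to preserve order
reordered = run(["Head", "Data"])     # Head overtakes Data
fixed = run_with_acknowledgements()

print(in_order, reordered, fixed)  # 2000 0 2000
```

When Head overtakes Data, Process 2 leaves its loop and reads the stale value 0; forcing each write to complete before the next is issued removes that ordering.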

  46. Global Data Initialized to 0 Process 1:: Data = 2000; Head = 1; Process 2:: While (Head == 0) {;} LocalValue = Data Memory Interconnect Head = 0 Data = 0 Non-Blocking Reads: Lockup-free Caches, speculative execution, dynamic scheduling allow execution to proceed past a Read. Assume Writes are acknowledged.

  47. Global Data Initialized to 0 Process 1:: Data = 2000; Head = 1; Process 2:: While (Head == 0) {;} LocalValue = Data Memory Interconnect Head = 0 Data = 0 Non-Blocking Reads: Lockup-free Caches, speculative execution, dynamic scheduling allow execution to proceed past a Read. Assume Writes are acknowledged.

  48. Global Data Initialized to 0 Process 1:: Data = 2000; Head = 1; Process 2:: While (Head == 0) {;} LocalValue = Data Memory Interconnect Head = 0 Data = 0 Non-Blocking Reads: Lockup-free Caches, speculative execution, dynamic scheduling allow execution to proceed past a Read. Assume Writes are acknowledged.

  49. Global Data Initialized to 0 Process 1:: Data = 2000; Head = 1; Process 2:: While (Head == 0) {;} LocalValue = Data (0) Memory Interconnect Head = 0 Data = 0 Non-Blocking Reads: Lockup-free Caches, speculative execution, dynamic scheduling allow execution to proceed past a Read. Assume Writes are acknowledged.

  50. Global Data Initialized to 0 Process 1:: Data = 2000; Head = 1; Process 2:: While (Head == 0) {;} LocalValue = Data (0) Memory Interconnect Head = 0 Data = 2000 Non-Blocking Reads: Lockup-free Caches, speculative execution, dynamic scheduling allow execution to proceed past a Read. Assume Writes are acknowledged.
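The non-blocking read scenario on slides 46 to 50 can be sketched in the same style. This is a toy timeline, not a hardware model: the `run` function and its `non_blocking_reads` flag are hypothetical, and the early read stands in for whatever mechanism (lockup-free cache, speculation, dynamic scheduling) lets the read of Data proceed before the loop on Head finishes.

```python
def run(non_blocking_reads):
    """Return the value Process 2 stores in LocalValue."""
    memory = {"Data": 0, "Head": 0}

    early_data = None
    if non_blocking_reads:
        # The read of Data is issued before the read of Head has returned.
        early_data = memory["Data"]

    # Process 1 runs; each write is acknowledged before the next is issued.
    memory["Data"] = 2000
    memory["Head"] = 1

    # Process 2's loop now observes Head == 1 and exits.
    while memory["Head"] == 0:
        pass

    # With non-blocking reads the early (stale) value is used.
    return early_data if non_blocking_reads else memory["Data"]


stale = run(non_blocking_reads=True)
correct = run(non_blocking_reads=False)
print(stale, correct)  # 0 2000
```

Even with acknowledged writes, letting the read of Data complete ahead of the read of Head gives Process 2 the old value 0.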
