
Fence Complexity in Concurrent Algorithms



Presentation Transcript


  1. Fence Complexity in Concurrent Algorithms. Petr Kuznetsov, TU Berlin/DT-Labs

  2. STM is about ease of programming and efficiency. What is “efficient” in a concurrent system?

  3. Cost metrics
     • Space: used memory
       • Cheap
       • Advanced garbage collection
     • Time:
       • the number of reads and writes (per operation)
       • the number of stalls

  4. Relaxed memory models
     • Memory is much slower than the CPU
     • Read: check the cache -> read the memory
     • Write: invalidate the caches -> update the memory
     • To overcome “stalled writes”: reorder operations
     • Reordering may result in inconsistency

  5. What is inconsistency?
     Process P: Write(X,1); Read(Y)
     Process Q: Write(Y,1); Read(X)
     [Figure: timelines for P and Q with the events W(X,1), R(Y), W(Y,1), R(X)]

  6. Possible outcomes
     • Out-of-order
     • P reads before Q writes; Q reads after P writes
     • P reads after Q writes; Q reads before P writes
     [Figure: timelines for P and Q, one per outcome]
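The outcomes on this slide can be checked by brute force. The following is a minimal sketch, not part of the talk: it enumerates every interleaving of P and Q over a plain shared memory, first in program order and then with each process's read moved ahead of its write. The initial values X = Y = 0 and the helper names `interleavings` and `outcomes` are assumptions made for illustration.

```python
# Each operation is ("w", variable, value) or ("r", variable).
P = [("w", "X", 1), ("r", "Y")]
Q = [("w", "Y", 1), ("r", "X")]


def interleavings(p, q):
    """Yield every merge of p and q that preserves the order inside each list."""
    if not p:
        yield list(q)
        return
    if not q:
        yield list(p)
        return
    for rest in interleavings(p[1:], q):
        yield [p[0]] + rest
    for rest in interleavings(p, q[1:]):
        yield [q[0]] + rest


def outcomes(p_ops, q_ops):
    """Return the set of (value read by P, value read by Q) over all schedules."""
    p = [("P",) + op for op in p_ops]
    q = [("Q",) + op for op in q_ops]
    results = set()
    for schedule in interleavings(p, q):
        memory = {"X": 0, "Y": 0}  # assumed initial state
        read = {}
        for proc, kind, var, *value in schedule:
            if kind == "w":
                memory[var] = value[0]
            else:
                read[proc] = memory[var]
        results.add((read["P"], read["Q"]))
    return results


in_order = outcomes(P, Q)               # program order respected
reordered = outcomes(P[::-1], Q[::-1])  # each read moved before its write
```

Under these assumptions, `(0, 0)` (both processes read the old value) never appears in `in_order`, but it does appear in `reordered`. That is the inconsistent, out-of-order outcome.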

  7. Fixing out-of-order
     • Memory fences: read-after-write (RAW)
         write(X,1)
         fence() // enforce the order
         read(Y)
     [Figure: timelines for P and Q with the events W(X,1), R(Y), W(Y,1), R(X)]
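One way to see what the fence buys is to simulate a machine with per-process store buffers. The sketch below is an illustration under stated assumptions, not the model used in the talk: writes go into a FIFO buffer and reach memory later, a read looks in the process's own buffer first, and a fence can complete only when that buffer is empty. The function `explore` and the program encoding are hypothetical.

```python
def explore(programs, initial_memory):
    """Return every tuple of read results reachable on a store-buffer machine."""
    n = len(programs)
    results = set()

    def run(pcs, buffers, memory, reads):
        progressed = False
        for i in range(n):
            # Choice 1: drain the oldest buffered write of process i to memory.
            if buffers[i]:
                var, val = buffers[i][0]
                drained = buffers[:i] + (buffers[i][1:],) + buffers[i + 1:]
                run(pcs, drained, {**memory, var: val}, reads)
                progressed = True
            # Choice 2: execute the next instruction of process i, if any.
            if pcs[i] == len(programs[i]):
                continue
            op = programs[i][pcs[i]]
            advanced = pcs[:i] + (pcs[i] + 1,) + pcs[i + 1:]
            if op[0] == "write":
                new_buffer = buffers[i] + ((op[1], op[2]),)
                grown = buffers[:i] + (new_buffer,) + buffers[i + 1:]
                run(advanced, grown, memory, reads)
                progressed = True
            elif op[0] == "fence":
                # The fence completes only once this process's buffer is empty.
                if not buffers[i]:
                    run(advanced, buffers, memory, reads)
                    progressed = True
            else:  # "read": own buffered writes take precedence over memory
                pending = [v for x, v in buffers[i] if x == op[1]]
                value = pending[-1] if pending else memory[op[1]]
                run(advanced, buffers, memory, reads + ((i, value),))
                progressed = True
        if not progressed:  # all programs finished and all buffers drained
            results.add(tuple(value for _, value in sorted(reads)))

    run((0,) * n, ((),) * n, dict(initial_memory), ())
    return results


P = [("write", "X", 1), ("read", "Y")]
Q = [("write", "Y", 1), ("read", "X")]
P_FENCED = [P[0], ("fence",), P[1]]
Q_FENCED = [Q[0], ("fence",), Q[1]]

unfenced = explore([P, Q], {"X": 0, "Y": 0})
fenced = explore([P_FENCED, Q_FENCED], {"X": 0, "Y": 0})
```

In this model the unfenced programs can both read 0, while placing a fence between the write and the read in each process rules that outcome out.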

  8. Fixing out-of-order
     • Atomic operations: atomic-write-after-read (AWAR)
         atomic { read(Y) … write(X,1) }
     • E.g., CAS, TAS, Fetch&Add, …
     • RAW/AWAR fences take ~60 RMRs
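The AWAR pattern (read, then write, as one atomic step) can be sketched with a compare-and-swap cell. Python has no hardware CAS, so in this illustration a `threading.Lock` stands in for the atomicity that the hardware would provide. The class `AtomicCell` and the helpers `fetch_and_add` and `run_counters` are hypothetical names, not from the talk.

```python
import threading


class AtomicCell:
    """A shared cell whose compare_and_swap reads and writes in one atomic step."""

    def __init__(self, value=0):
        self._value = value
        self._lock = threading.Lock()  # stand-in for hardware atomicity

    def read(self):
        with self._lock:
            return self._value

    def compare_and_swap(self, expected, new):
        # Atomic { read the cell ... write the cell }: an AWAR.
        with self._lock:
            if self._value == expected:
                self._value = new
                return True
            return False


def fetch_and_add(cell, delta):
    """Fetch&Add built from a CAS retry loop; returns the previous value."""
    while True:
        old = cell.read()
        if cell.compare_and_swap(old, old + delta):
            return old


def run_counters(threads=4, increments=500):
    """Increment one shared cell from several threads and return the total."""
    cell = AtomicCell(0)

    def worker():
        for _ in range(increments):
            fetch_and_add(cell, 1)

    pool = [threading.Thread(target=worker) for _ in range(threads)]
    for t in pool:
        t.start()
    for t in pool:
        t.join()
    return cell.read()
```

Because each successful increment goes through one atomic read-then-write, no update is lost: `run_counters(4, 500)` returns 2000.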

  9. Our result
     • Any concurrent program in a certain class must use RAW/AWARs

  10. What programs?
      • Concurrent data types:
        • queues, counters, hash tables, trees, …
        • Non-commutative operations
        • Linearizable solo-terminating implementations
      • Mutual exclusion

  11. Non-commutative operations
      Operation A is non-commutative if there exists an operation B such that, applied to some state, A influences B and B influences A

  12. Example: Queue
      • enq(v): adds v to the end of the queue
      • deq(): dequeues the item at the head of the queue
      Q = 1;2
      • Q.deq():1; Q.deq():2 vs. Q.deq():2; Q.deq():1: the two deq() operations influence each other
      • Q.enq(3):ok; Q.deq():1 vs. Q.deq():1; Q.enq(3):ok: enq() is commutative
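The queue example can be replayed mechanically. The sketch below reads "A influences B" as "running A first changes the response B returns", which is an assumption about the slide's informal definition. The helpers `influences` and `non_commutative` are hypothetical names.

```python
from collections import deque


def enq(value):
    """Build an enq(value) operation; it always responds "ok"."""
    def op(queue):
        queue.append(value)
        return "ok"
    return op


def deq(queue):
    """Remove and return the head of the queue, or None if it is empty."""
    return queue.popleft() if queue else None


def influences(state, a, b):
    """True if running a first changes the response that b gets from state."""
    alone = deque(state)
    response_alone = b(alone)
    after_a = deque(state)
    a(after_a)
    return b(after_a) != response_alone


def non_commutative(state, a, b):
    """True if a and b influence each other when applied to state."""
    return influences(state, a, b) and influences(state, b, a)


Q = [1, 2]
deq_vs_deq = non_commutative(Q, deq, deq)     # each deq changes the other's result
enq_vs_deq = non_commutative(Q, enq(3), deq)  # responses are unaffected on Q = 1;2
```

On the empty queue, enq(3) does change what a later deq() returns, but deq() never changes the "ok" returned by enq(3). The influence is one-directional, so enq() still does not qualify as non-commutative under this reading.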

  13. Proof sketch
      • A non-commutative operation must write
      • Suppose not
      [Figure: two deq():1 operations on the queue 1;2, with a write w]
      There must be a write!

  14. Proof sketch
      • Let w be the first write
      • Suppose there are no AWARs
      • A(w): the longest atomic construct containing w
      [Figure: deq():1 on the queue 1;2, with the write w]
      w must be the first base-object event in A(w)!

  15. Proof sketch
      • Suppose there are no RAWs
      [Figure: two deq():1 operations on the queue 1;2, with A(w)]
      No RAW: no difference for deq()!

  16. Mutual exclusion
      Lock(): acquire the lock
      Unlock(): release the lock
      • (Mutex) No two processes hold the lock at the same time
      • (Deadlock-freedom) If at least one process executes Lock() and no active process fails, at least one process acquires the lock
      Two Lock() operations influence each other!

  17. Our result
      • In any implementation of mutual exclusion or of a concurrent data type with a non-commutative operation op, a complete execution of op or Lock() contains a RAW or AWAR
      • Every successful lock acquire incurs a RAW/AWAR fence
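For a concrete instance of the second bullet, consider a test-and-set spinlock, in which every acquisition ends with one successful atomic read-then-write. The sketch below is an illustration, not an algorithm from the talk: a `threading.Lock` stands in for the hardware atomicity of TAS, and the names `TASLock`, `awar_count`, `acquisitions` and `run_workers` are hypothetical.

```python
import threading
import time


class TASLock:
    """Test-and-set spinlock that counts its atomic read-then-write steps."""

    def __init__(self):
        self._held = False
        self._atomic = threading.Lock()  # stand-in for hardware atomicity
        self.awar_count = 0    # number of test-and-set steps executed
        self.acquisitions = 0  # number of successful lock() calls

    def _test_and_set(self):
        # Atomically read the old flag and write True: an AWAR.
        with self._atomic:
            self.awar_count += 1
            old = self._held
            self._held = True
            return old

    def lock(self):
        while self._test_and_set():
            time.sleep(0)  # yield to other threads while spinning
        self.acquisitions += 1  # only the current holder reaches this line

    def unlock(self):
        with self._atomic:
            self._held = False


def run_workers(threads=4, rounds=200):
    """Have several threads increment a counter inside the critical section."""
    lock = TASLock()
    total = [0]

    def worker():
        for _ in range(rounds):
            lock.lock()
            total[0] += 1  # critical section
            lock.unlock()

    pool = [threading.Thread(target=worker) for _ in range(threads)]
    for t in pool:
        t.start()
    for t in pool:
        t.join()
    return lock, total[0]
```

In this sketch `awar_count` is always at least `acquisitions`, since every acquisition performs at least one test-and-set. The result on the slide says that some such RAW or AWAR is unavoidable for any mutual exclusion implementation, not only for this one.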

  18. Why do we care?
      • Hardware design: what primitives must be optimized?
      • API design: returned values matter
        • Set with add returning fail vs. returning ok
      • Verification: early catch of obviously incorrect algorithms
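The API-design point can be made concrete with two versions of a set's add. The sketch below again reads "influences" as "changes the response of", which is an assumption, and the names `add_reporting`, `add_silent` and `influences` are hypothetical.

```python
def add_reporting(x):
    """Build an add(x) that responds "fail" when x is already in the set."""
    def op(s):
        if x in s:
            return "fail"
        s.add(x)
        return "ok"
    return op


def add_silent(x):
    """Build an add(x) that always responds "ok"."""
    def op(s):
        s.add(x)
        return "ok"
    return op


def influences(state, a, b):
    """True if running a first changes the response that b gets from state."""
    alone = set(state)
    response_alone = b(alone)
    after_a = set(state)
    a(after_a)
    return b(after_a) != response_alone


empty = set()
reporting_conflict = influences(empty, add_reporting(7), add_reporting(7))
silent_conflict = influences(empty, add_silent(7), add_silent(7))
```

Two reporting adds of the same element influence each other, so that add falls into the class covered by the result. The silent add does not, which is how the choice of return value affects the synchronization that an implementation needs.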

  19. What’s next?
      • Weaker primitives?
        • Idempotent Work Stealing [Michael et al., PPoPP’09]
      • Tight lower bounds?
        • How many RAW/AWAR fences are incurred?
      • Other patterns
        • Read-after-read
        • Write-after-write
        • Multi-RAW: write(Xi,1) collect(X1,..,Xn)

  20. References
      • H. Attiya, R. Guerraoui, D. Hendler, P. Kuznetsov, M. Michael, M. Vechev. Laws of Order: Expensive Synchronization in Concurrent Algorithms Cannot be Eliminated. In POPL 2011
      • Srivatsan’s talk on STM fence complexity, TR on the way

  21. QUESTIONS?
