
Wait-Free Queues with Multiple Enqueuers and Dequeuers

Concurrent FIFO queues support correct, i.e., linearizable, concurrent insertion and removal of elements; non-blocking implementations do so without ever blocking a thread, and lock-free and wait-free variants differ in the progress guarantees they provide. This work presents a helping mechanism that ensures wait-free operation of a concurrent queue, inspired by the doorway mechanism of the Bakery mutual exclusion algorithm. The helping mechanism assigns dynamic age-based priorities to operations and optimizes how threads assist one another, reducing redundant work.





Presentation Transcript


  1. Wait-Free Queues with Multiple Enqueuers and Dequeuers  Alex Kogan and Erez Petrank, Computer Science, Technion, Israel

  2. FIFO queues  One of the most fundamental and common data structures  [diagram: a queue holding 5, 3, 2, 9, with enqueue adding at the tail and dequeue removing from the head]

  3. Concurrent FIFO queues  A concurrent implementation supports “correct” concurrent adding and removing of elements  correct = linearizable  The access to the shared memory should be synchronized  [diagram: several threads apply enqueue and dequeue concurrently; a dequeue on an empty queue reports “empty!”]

  4. Non-blocking synchronization  No thread is blocked waiting for another thread to complete  e.g., no locks / critical sections  Progress guarantees:  Obstruction-freedom  progress is guaranteed only in the eventual absence of interference  Lock-freedom  among all threads trying to apply an operation, one will succeed  Wait-freedom  a thread completes its operation in a bounded number of steps

  5. Lock-freedom  Among all threads trying to apply an operation, one will succeed  an opportunistic approach: make attempts until succeeding  guarantees global progress, but all threads except one may starve  Many efficient and scalable lock-free queue implementations exist

  6. Wait-freedom  A thread completes its operation in a bounded number of steps  regardless of what other threads are doing  A highly desired property of any concurrent data structure  but commonly regarded as inefficient and too costly to achieve  Particularly important in several domains  real-time systems  systems operating under an SLA  heterogeneous environments

  7. Related work: existing wait-free queues  Limited concurrency  one enqueuer and one dequeuer [Lamport’83]  multiple enqueuers, one concurrent dequeuer [David’04]  multiple dequeuers, one concurrent enqueuer [Jayanti&Petrovic’05]  Universal constructions [Herlihy’91]  a generic method to transform any (sequential) object into a lock-free/wait-free concurrent object  expensive, impractical implementations  (Almost) no experimental results

  8. Related work: lock-free queue [Michael & Scott’96]  One of the most scalable and efficient lock-free implementations  Widely adopted by industry  part of the Java Concurrency package  Relatively simple and intuitive implementation  Based on a singly-linked list of nodes  [diagram: nodes 12 → 4 → 17 with head and tail pointers]
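As a reference for the following slides, here is a minimal Java sketch of the MS-queue layout just described: a singly-linked list starting at a dummy node, with head and tail updated by CAS. The names are illustrative, not the authors' code.

    import java.util.concurrent.atomic.AtomicReference;

    class MSQueue<T> {
        static class Node<T> {
            final T value;
            final AtomicReference<Node<T>> next = new AtomicReference<>(null);
            Node(T value) { this.value = value; }
        }

        // head always points to a dummy node; tail points at or near the last node
        final AtomicReference<Node<T>> head;
        final AtomicReference<Node<T>> tail;

        MSQueue() {
            Node<T> dummy = new Node<>(null);
            head = new AtomicReference<>(dummy);
            tail = new AtomicReference<>(dummy);
        }
    }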

  9. MS-queue brief review: enqueue  [diagram: enqueue(9) links a new node 9 after node 17 with a CAS on the last node’s next, then swings tail to the new node with a second CAS]

  10. MS-queue brief review: enqueue  [diagram: two concurrent enqueues (9 and 5); each new node is linked with a CAS on next, and tail is advanced by CAS]

  11. MS-queue brief review: dequeue  [diagram: dequeue returns 12 and swings head to the next node with a CAS]

  12. Our idea (in a nutshell)  Based on the lock-free queue by Michael & Scott  Helping mechanism  each operation is applied in a bounded time  “Wait-free” implementation scheme  each operation is applied exactly once

  13. Helping mechanism  Each operation is assigned a dynamic age-based priority  inspired by the Doorway mechanism used in the Bakery mutex  Each thread accessing the queue  chooses a monotonically increasing phase number  writes down its phase and operation info in a special state array  helps all threads with a non-larger phase to apply their operations  State entry per thread: phase: long, pending: boolean, enqueue: boolean, node: Node
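The state entry maps naturally onto a small immutable Java class; below is a minimal sketch, where the class name OpDesc is an assumption and Node is the node layout sketched after slide 24. The state array itself would be an AtomicReferenceArray<OpDesc> with one slot per thread; because a slot is replaced wholesale with CAS, helpers always see a consistent (phase, pending, enqueue, node) snapshot.

    class OpDesc {
        final long phase;       // age-based priority (the "doorway" number)
        final boolean pending;  // is the operation still in progress?
        final boolean enqueue;  // true = enqueue, false = dequeue
        final Node node;        // node to insert / node involved in the removal
        OpDesc(long phase, boolean pending, boolean enqueue, Node node) {
            this.phase = phase; this.pending = pending;
            this.enqueue = enqueue; this.node = node;
        }
    }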

  14. Helping mechanism in action  The state array, one entry per thread:

      thread:   0     1      2     3
      phase:    4     9      9     3
      pending:  true  false  true  false
      enqueue:  true  true   true  true
      node:     ref   null   ref   ref

  15. Helping mechanism in action  Thread 3 announces a new operation with phase 10 and scans the array; thread 0’s operation is pending with phase 4 ≤ 10: “I need to help!”

      thread:   0     1      2     3 (scanning)
      phase:    4     9      9     10
      pending:  true  false  true  true
      enqueue:  true  true   true  true
      node:     ref   null   ref   ref

  16. Helping mechanism in action  Thread 1’s entry is not pending: “I do not need to help!”

      thread:   0     1      2     3 (scanning)
      phase:    4     9      9     10
      pending:  true  false  true  true
      enqueue:  true  true   true  true
      node:     ref   null   ref   ref

  17. Helping mechanism in action  Thread 2 announces a dequeue with phase 11 > 10: “I do not need to help!”  Thread 0 is still pending with phase 4 ≤ 10: “I need to help!”

      thread:   0     1      2     3 (scanning)
      phase:    4     9      11    10
      pending:  true  false  true  true
      enqueue:  true  true   false true
      node:     ref   null   null  ref

  18. Helping mechanism in action  [state array as on the previous slide]  The number of operations that may linearize before any given operation is bounded  hence, wait-freedom

  19. Optimized helping  The basic scheme has two drawbacks:  the number of steps executed by each thread on every operation depends on n (the number of threads)  even when there is no contention  it creates scenarios where many threads help the same operations  e.g., when many threads access the queue concurrently  a large amount of redundant work  Optimization: help one thread at a time, in a cyclic manner  faster threads help slower peers in parallel  reduces the amount of redundant work
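A minimal sketch of this optimization, written as a method of the queue class sketched after slide 35, using the helpEnq/helpDeq helpers sketched after slides 35 and 41; the field name nextToHelp is an assumption (and would be thread-local in practice):

    int nextToHelp = 0;  // index of the peer to examine next

    void helpOne(long phase) {
        OpDesc desc = state.get(nextToHelp);
        if (desc.pending && desc.phase <= phase) {
            // help at most one slower peer per own operation
            if (desc.enqueue) helpEnq(nextToHelp, phase);
            else helpDeq(nextToHelp, phase);
        }
        nextToHelp = (nextToHelp + 1) % state.length();  // advance cyclically
    }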

  20. How to choose the phase numbers  Every time a thread t_i chooses a phase number, it is greater than the number of any thread that made its choice before t_i  defines a logical order on operations and provides wait-freedom  Like in the Bakery mutex:  scan through state  calculate the maximal phase value + 1  requires O(n) steps  [diagram: scanning entries with phases 4, 3 and 5 yields the new phase 6]  Alternative: use an atomic counter  requires O(1) steps
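Both choices can be sketched as small methods of the queue class sketched after slide 35 (a sketch over its state array, not the authors' verbatim code); the scan mirrors the Bakery doorway, while the counter trades it for a single fetch-and-increment:

    import java.util.concurrent.atomic.AtomicLong;

    long maxPhase() {                      // Bakery-style doorway: O(n)
        long max = -1;
        for (int i = 0; i < state.length(); i++)
            max = Math.max(max, state.get(i).phase);
        return max + 1;                    // greater than every earlier choice
    }

    final AtomicLong phaseCounter = new AtomicLong(-1);

    long counterPhase() {                  // atomic counter: O(1)
        return phaseCounter.incrementAndGet();
    }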

  21. “Wait-free” design scheme  Break each operation into three atomic steps  can be executed by different threads  cannot be interleaved  1. Initial change of the internal structure (concurrent operations realize that there is an operation-in-progress)  2. Updating the state of the operation-in-progress as being performed (linearized)  3. Fixing the internal structure (finalizing the operation-in-progress)

  22. Internal structures  [diagram: queue 1 → 2 → 4 with head and tail]  state array:

      thread:   0      1      2
      phase:    9      4      9
      pending:  false  false  false
      enqueue:  false  true   true
      node:     null   null   null

  23. Internal structures  Each node carries enqTid: int  holds the ID of the thread that performs / has performed the insertion of the node into the queue  [diagram: nodes 1 and 2 (enqTid 0) were enqueued by Thread 0; node 4 (enqTid 1) was enqueued by Thread 1; state array as before]

  24. Internal structures  Each node also carries deqTid: int  holds the ID of the thread that performs / has performed the removal of the node from the queue  [diagram: node 1 (deqTid 1) was dequeued by Thread 1; the other nodes have deqTid -1; state array as before]
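Putting slides 22-24 together, a node can be sketched in Java as follows (names are illustrative): enqTid is fixed at creation by the inserting thread, while deqTid starts at -1 and is claimed by CAS during a dequeue.

    import java.util.concurrent.atomic.AtomicInteger;
    import java.util.concurrent.atomic.AtomicReference;

    class Node {
        final int value;
        final AtomicReference<Node> next = new AtomicReference<>(null);
        final int enqTid;                                   // inserting thread (-1 for the dummy)
        final AtomicInteger deqTid = new AtomicInteger(-1); // removing thread, claimed by CAS

        Node(int value, int enqTid) {
            this.value = value;
            this.enqTid = enqTid;
        }
    }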

  25. enqueue operation (thread ID: 2)  Creating a new node  [diagram: queue 12 → 4 → 17; thread 2 allocates node 6 with enqTid 2; the state array is not yet updated]

  26. enqueue operation (thread ID: 2)  Announcing a new operation  [diagram: thread 2’s state entry is set to phase 10, pending true, enqueue true, node → 6]

  27. enqueue operation (thread ID: 2)  Step 1: Initial change of the internal structure  [diagram: a CAS links node 6 after node 17]

  28. enqueue operation (thread ID: 2)  Step 2: Updating the state of the operation-in-progress as being performed  [diagram: a CAS sets thread 2’s state entry to pending false]

  29. enqueue operation (thread ID: 2)  Step 3: Fixing the internal structure  [diagram: a CAS swings tail to node 6]

  30. enqueue operation (threads ID: 2 and ID: 0)  Step 1: Initial change of the internal structure  [diagram: node 6 is linked after node 17; a second enqueuer, thread 0, arrives]

  31. enqueue operation (threads ID: 2 and ID: 0)  Creating a new node  Announcing a new operation  [diagram: thread 0 allocates node 3 and sets its state entry to phase 11, pending true, enqueue true, node → 3]

  32. enqueue operation (threads ID: 2 and ID: 0)  Step 2: Updating the state of the operation-in-progress as being performed  [diagram: queue and state array before the CAS on thread 2’s entry]

  33. enqueue operation (threads ID: 2 and ID: 0)  Step 2: Updating the state of the operation-in-progress as being performed  [diagram: a CAS sets thread 2’s state entry to pending false]

  34. enqueue operation (threads ID: 2 and ID: 0)  Step 3: Fixing the internal structure  [diagram: a CAS swings tail to node 6]

  35. enqueue operation (threads ID: 2 and ID: 0)  Step 1: Initial change of the internal structure  [diagram: a CAS links thread 0’s node 3 after node 6]
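The enqueue walkthrough above corresponds to roughly the following Java sketch, built from the Node and OpDesc classes sketched earlier. It is a condensed reading of the three-step scheme, not the authors' verbatim code; error handling and the surrounding enq() wrapper are elided.

    import java.util.concurrent.atomic.AtomicReference;
    import java.util.concurrent.atomic.AtomicReferenceArray;

    class WFQueueSketch {
        final AtomicReference<Node> head, tail;
        final AtomicReferenceArray<OpDesc> state;

        WFQueueSketch(int numThreads) {
            Node dummy = new Node(0, -1);          // sentinel node
            head = new AtomicReference<>(dummy);
            tail = new AtomicReference<>(dummy);
            state = new AtomicReferenceArray<>(numThreads);
            for (int i = 0; i < numThreads; i++)
                state.set(i, new OpDesc(-1, false, true, null));
        }

        boolean isStillPending(int tid, long phase) {
            OpDesc d = state.get(tid);
            return d.pending && d.phase <= phase;
        }

        void helpEnq(int tid, long phase) {
            while (isStillPending(tid, phase)) {
                Node last = tail.get();
                Node next = last.next.get();
                if (last != tail.get()) continue;          // tail moved; retry
                if (next == null) {                        // queue in a stable state
                    if (isStillPending(tid, phase) &&
                        last.next.compareAndSet(null, state.get(tid).node)) { // step 1
                        helpFinishEnq();                   // steps 2 and 3
                        return;
                    }
                } else {
                    helpFinishEnq();   // first finish the enqueue already in progress
                }
            }
        }

        void helpFinishEnq() {
            Node last = tail.get();
            Node next = last.next.get();
            if (next == null) return;
            int tid = next.enqTid;     // owner of the half-finished insertion
            OpDesc cur = state.get(tid);
            if (last == tail.get() && state.get(tid).node == next) {
                // step 2: mark the operation as linearized in the state array
                state.compareAndSet(tid, cur,
                    new OpDesc(cur.phase, false, true, next));
                tail.compareAndSet(last, next);            // step 3: fix tail
            }
        }
    }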

  36. dequeue operation (thread ID: 2)  [diagram: queue 12 → 4 → 17; thread 2 is about to dequeue; no operation is pending in the state array]

  37. dequeue operation (thread ID: 2)  Announcing a new operation  [diagram: thread 2’s state entry is set to phase 10, pending true, enqueue false]

  38. dequeue operation (thread ID: 2)  Updating state to refer to the first node  [diagram: a CAS points thread 2’s node field at the first node, 12]

  39. dequeue operation (thread ID: 2)  Step 1: Initial change of the internal structure  [diagram: a CAS sets deqTid of node 12 from -1 to 2]

  40. dequeue operation (thread ID: 2)  Step 2: Updating the state of the operation-in-progress as being performed  [diagram: a CAS sets thread 2’s state entry to pending false]

  41. dequeue operation (thread ID: 2)  Step 3: Fixing the internal structure  [diagram: a CAS swings head from node 12 to node 4]
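A matching condensed sketch of the dequeue steps, written as further methods of the WFQueueSketch class above. The real method loops, re-validates head and tail, and handles the empty-queue case, all elided here; the value eventually returned is the one in the successor of the recorded sentinel node.

    void helpDeq(int tid, long phase) {
        Node first = head.get();
        Node next = first.next.get();
        if (next == null) return;              // empty-queue handling elided

        OpDesc cur = state.get(tid);
        if (cur.node != first && isStillPending(tid, phase)) {
            // point the state entry at the current sentinel (slide 38)
            OpDesc updated = new OpDesc(cur.phase, true, false, first);
            if (!state.compareAndSet(tid, cur, updated)) return;
        }
        first.deqTid.compareAndSet(-1, tid);   // step 1: claim the first node
        helpFinishDeq();                       // steps 2 and 3
    }

    void helpFinishDeq() {
        Node first = head.get();
        Node next = first.next.get();
        int tid = first.deqTid.get();          // owner of the pending removal
        if (tid == -1 || next == null) return;
        OpDesc cur = state.get(tid);
        if (first == head.get()) {
            // step 2: mark the operation as linearized
            state.compareAndSet(tid, cur,
                new OpDesc(cur.phase, false, false, cur.node));
            head.compareAndSet(first, next);   // step 3: swing head
        }
    }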

  42. Performance evaluation  Experimental setup:

      Architecture:  two 2.5 GHz quad-core Xeon E5420 processors / two 1.6 GHz quad-core Xeon E5310 processors
      # threads:     8 / 8 / 8
      RAM:           16GB / 16GB / 16GB
      OS:            CentOS 5.5 Server / Ubuntu 8.10 Server / RedHat Enterprise 5.3 Server
      Java:          Sun’s Java SE Runtime 1.6.0 update 22, 64-bit Server VM

  43. Benchmarks  Enqueue-Dequeue benchmark  the queue is initially empty  each thread iteratively performs an enqueue and then a dequeue  1,000,000 iterations per thread  50%-Enqueue benchmark  the queue is initialized with 1000 elements  each thread decides uniformly at random which operation to perform, with equal odds for enqueue and dequeue  1,000,000 operations per thread
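For concreteness, the two benchmark loops might look as follows; the WaitFreeQueue interface and the harness methods are assumptions of this sketch, not the authors' actual harness.

    import java.util.concurrent.ThreadLocalRandom;

    interface WaitFreeQueue { void enq(int v); int deq(); }

    class Harness {
        // Enqueue-Dequeue benchmark: each thread alternates enq and deq
        static void enqueueDequeue(WaitFreeQueue q) {
            for (int i = 0; i < 1_000_000; i++) {
                q.enq(i);
                q.deq();
            }
        }

        // 50%-Enqueue benchmark: each operation is chosen with equal odds
        static void fiftyPercentEnqueue(WaitFreeQueue q) {
            for (int i = 0; i < 1_000_000; i++) {
                if (ThreadLocalRandom.current().nextBoolean()) q.enq(i);
                else q.deq();
            }
        }
    }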

  44. Tested algorithms Compared implementations:  MS-queue  Base wait-free queue  Optimized wait-free queue  Opt 1: optimized helping (help one thread at a time)  Opt 2: atomic counter-based phase calculation  Measure completion time as a function of # threads

  45. Enqueue-Dequeue benchmark  TBD: add figures

  46. The impact of optimizations  TBD: add figures

  47. Optimizing further: false sharing  Arises on accesses to the state array  Resolved by stretching the state with dummy pads  TBD: add figures
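One way to realize the stretching, sketched below: space out the used slots of the state array so that entries of different threads fall on different cache lines. The padding factor is an assumption for illustration (16 references of 8 bytes each keep used slots at least 128 bytes apart).

    import java.util.concurrent.atomic.AtomicReferenceArray;

    class PaddedState {
        static final int PAD = 16;  // dummy slots between used entries

        final AtomicReferenceArray<OpDesc> state;

        PaddedState(int numThreads) {
            state = new AtomicReferenceArray<>(numThreads * PAD);
        }

        OpDesc get(int tid) { return state.get(tid * PAD); }

        boolean cas(int tid, OpDesc expect, OpDesc update) {
            return state.compareAndSet(tid * PAD, expect, update);
        }
    }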

  48. Optimizing further: memory management  Every attempt to update state is preceded by an allocation of a new record  these records can be reused when the attempt fails  (more) validation checks can be performed to reduce the number of failed attempts  When an operation is finished, remove the reference from state to the list node  helps the garbage collector

  49. Implementing the queue without GC  Apply the Hazard Pointers technique [Michael’04]  each thread is associated with hazard pointers  single-writer multi-reader registers  used by threads to point to objects they may access later  when an object should be deleted, a thread stores its address in a special stack  once in a while, it scans the stack and recycles an object only if no hazard pointer points to it  In our case, the technique can be applied with a slight modification in the dequeue method
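A minimal sketch of the hazard-pointer idea described above (after [Michael'04]); all names are illustrative, and a real implementation also needs per-thread retired lists, a scan threshold, and re-validation after publishing a pointer.

    import java.util.concurrent.atomic.AtomicReferenceArray;

    class HazardPointers {
        // one single-writer multi-reader slot per thread
        final AtomicReferenceArray<Object> hp;

        HazardPointers(int numThreads) {
            hp = new AtomicReferenceArray<>(numThreads);
        }

        // publish an object the thread may access later;
        // the caller must re-read and validate the object afterwards
        void protect(int tid, Object o) { hp.set(tid, o); }

        void clear(int tid) { hp.set(tid, null); }

        // a retired object may be recycled only if no hazard pointer refers to it
        boolean canRecycle(Object o) {
            for (int i = 0; i < hp.length(); i++)
                if (hp.get(i) == o) return false;
            return true;
        }
    }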

  50. Summary  The first wait-free queue implementation supporting multiple enqueuers and dequeuers  Wait-freedom incurs an inherent trade-off  it bounds the completion time of a single operation  but has a cost in the “typical” case  The additional cost can be reduced to a tolerable level  The proposed design scheme might be applicable to other wait-free data structures
