
Wait-Free Queues with Multiple Enqueuers and Dequeuers

Concurrent FIFO queues support correct, i.e., linearizable, concurrent insertion and removal of elements; non-blocking implementations do so without ever blocking a thread, and lock-free and wait-free variants differ in the progress guarantees they provide. This work presents a helping mechanism that ensures wait-free operation of a concurrent queue, inspired by the doorway mechanism of the Bakery mutual exclusion algorithm. The helping mechanism assigns dynamic age-based priorities to operations and optimizes how threads assist one another, reducing redundant work.





Presentation Transcript


  1. Wait-Free Queues with Multiple Enqueuers and Dequeuers  Alex Kogan and Erez Petrank, Computer Science, Technion, Israel

  2. FIFO queues  One of the most fundamental and common data structures  [diagram: a queue holding 5, 3, 2, 9, with enqueue adding at the tail and dequeue removing from the head]

  3. Concurrent FIFO queues  A concurrent implementation supports “correct” concurrent adding and removing of elements  correct = linearizable  The access to the shared memory should be synchronized  [diagram: several threads apply enqueue and dequeue concurrently; a dequeue on an empty queue reports “empty!”]

  4. Non-blocking synchronization  No thread is blocked waiting for another thread to complete  e.g., no locks / critical sections  Progress guarantees:  Obstruction-freedom  progress is guaranteed only in the eventual absence of interference  Lock-freedom  among all threads trying to apply an operation, one will succeed  Wait-freedom  a thread completes its operation in a bounded number of steps

  5. Lock-freedom  Among all threads trying to apply an operation, one will succeed  an opportunistic approach: make attempts until succeeding  guarantees global progress, but all threads except one may starve  Many efficient and scalable lock-free queue implementations exist

  6. Wait-freedom  A thread completes its operation in a bounded number of steps  regardless of what other threads are doing  A highly desired property of any concurrent data structure  but commonly regarded as inefficient and too costly to achieve  Particularly important in several domains  real-time systems  systems operating under an SLA  heterogeneous environments

  7. Related work: existing wait-free queues  Limited concurrency  one enqueuer and one dequeuer [Lamport’83]  multiple enqueuers, one concurrent dequeuer [David’04]  multiple dequeuers, one concurrent enqueuer [Jayanti&Petrovic’05]  Universal constructions [Herlihy’91]  a generic method to transform any (sequential) object into a lock-free/wait-free concurrent object  expensive, impractical implementations  (Almost) no experimental results

  8. Related work: lock-free queue [Michael & Scott’96]  One of the most scalable and efficient lock-free implementations  Widely adopted by industry  part of the Java Concurrency package  Relatively simple and intuitive implementation  Based on a singly-linked list of nodes  [diagram: nodes 12 → 4 → 17 with head and tail pointers]
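As a reference for the following slides, here is a minimal Java sketch of the MS-queue layout just described: a singly-linked list starting at a dummy node, with head and tail updated by CAS. The names are illustrative, not the authors' code.

    import java.util.concurrent.atomic.AtomicReference;

    class MSQueue<T> {
        static class Node<T> {
            final T value;
            final AtomicReference<Node<T>> next = new AtomicReference<>(null);
            Node(T value) { this.value = value; }
        }

        // head always points to a dummy node; tail points at or near the last node
        final AtomicReference<Node<T>> head;
        final AtomicReference<Node<T>> tail;

        MSQueue() {
            Node<T> dummy = new Node<>(null);
            head = new AtomicReference<>(dummy);
            tail = new AtomicReference<>(dummy);
        }
    }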

  9. MS-queue brief review: enqueue  [diagram: enqueue(9) links a new node 9 after node 17 with a CAS on the last node’s next, then swings tail to the new node with a second CAS]

  10. MS-queue brief review: enqueue  [diagram: two concurrent enqueues (9 and 5); each new node is linked with a CAS on next, and tail is advanced by CAS]

  11. MS-queue brief review: dequeue  [diagram: dequeue returns 12 and swings head to the next node with a CAS]

  12. Our idea (in a nutshell)  Based on the lock-free queue by Michael & Scott  Helping mechanism  each operation is applied in a bounded time  “Wait-free” implementation scheme  each operation is applied exactly once

  13. Helping mechanism  Each operation is assigned a dynamic age-based priority  inspired by the Doorway mechanism used in the Bakery mutex  Each thread accessing the queue  chooses a monotonically increasing phase number  writes down its phase and operation info in a special state array  helps all threads with a non-larger phase to apply their operations  State entry per thread: phase: long, pending: boolean, enqueue: boolean, node: Node
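The state entry maps naturally onto a small immutable Java class; below is a minimal sketch, where the class name OpDesc is an assumption and Node is the node layout sketched after slide 24. The state array itself would be an AtomicReferenceArray<OpDesc> with one slot per thread; because a slot is replaced wholesale with CAS, helpers always see a consistent (phase, pending, enqueue, node) snapshot.

    class OpDesc {
        final long phase;       // age-based priority (the "doorway" number)
        final boolean pending;  // is the operation still in progress?
        final boolean enqueue;  // true = enqueue, false = dequeue
        final Node node;        // node to insert / node involved in the removal
        OpDesc(long phase, boolean pending, boolean enqueue, Node node) {
            this.phase = phase; this.pending = pending;
            this.enqueue = enqueue; this.node = node;
        }
    }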

  14. Helping mechanism in action  The state array, one entry per thread:

      thread:   0     1      2     3
      phase:    4     9      9     3
      pending:  true  false  true  false
      enqueue:  true  true   true  true
      node:     ref   null   ref   ref

  15. Helping mechanism in action  Thread 3 announces a new operation with phase 10 and scans the array; thread 0’s operation is pending with phase 4 ≤ 10: “I need to help!”

      thread:   0     1      2     3 (scanning)
      phase:    4     9      9     10
      pending:  true  false  true  true
      enqueue:  true  true   true  true
      node:     ref   null   ref   ref

  16. Helping mechanism in action  Thread 1’s entry is not pending: “I do not need to help!”

      thread:   0     1      2     3 (scanning)
      phase:    4     9      9     10
      pending:  true  false  true  true
      enqueue:  true  true   true  true
      node:     ref   null   ref   ref

  17. Helping mechanism in action  Thread 2 announces a dequeue with phase 11 > 10: “I do not need to help!”  Thread 0 is still pending with phase 4 ≤ 10: “I need to help!”

      thread:   0     1      2     3 (scanning)
      phase:    4     9      11    10
      pending:  true  false  true  true
      enqueue:  true  true   false true
      node:     ref   null   null  ref

  18. Helping mechanism in action  [state array as on the previous slide]  The number of operations that may linearize before any given operation is bounded  hence, wait-freedom

  19. Optimized helping  The basic scheme has two drawbacks:  the number of steps executed by each thread on every operation depends on n (the number of threads)  even when there is no contention  it creates scenarios where many threads help the same operations  e.g., when many threads access the queue concurrently  a large amount of redundant work  Optimization: help one thread at a time, in a cyclic manner  faster threads help slower peers in parallel  reduces the amount of redundant work
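A minimal sketch of this optimization, written as a method of the queue class sketched after slide 35, using the helpEnq/helpDeq helpers sketched after slides 35 and 41; the field name nextToHelp is an assumption (and would be thread-local in practice):

    int nextToHelp = 0;  // index of the peer to examine next

    void helpOne(long phase) {
        OpDesc desc = state.get(nextToHelp);
        if (desc.pending && desc.phase <= phase) {
            // help at most one slower peer per own operation
            if (desc.enqueue) helpEnq(nextToHelp, phase);
            else helpDeq(nextToHelp, phase);
        }
        nextToHelp = (nextToHelp + 1) % state.length();  // advance cyclically
    }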

  20. How to choose the phase numbers  Every time a thread t_i chooses a phase number, it is greater than the number of any thread that made its choice before t_i  defines a logical order on operations and provides wait-freedom  Like in the Bakery mutex:  scan through state  calculate the maximal phase value + 1  requires O(n) steps  [diagram: scanning entries with phases 4, 3 and 5 yields the new phase 6]  Alternative: use an atomic counter  requires O(1) steps
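Both choices can be sketched as small methods of the queue class sketched after slide 35 (a sketch over its state array, not the authors' verbatim code); the scan mirrors the Bakery doorway, while the counter trades it for a single fetch-and-increment:

    import java.util.concurrent.atomic.AtomicLong;

    long maxPhase() {                      // Bakery-style doorway: O(n)
        long max = -1;
        for (int i = 0; i < state.length(); i++)
            max = Math.max(max, state.get(i).phase);
        return max + 1;                    // greater than every earlier choice
    }

    final AtomicLong phaseCounter = new AtomicLong(-1);

    long counterPhase() {                  // atomic counter: O(1)
        return phaseCounter.incrementAndGet();
    }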

  21. “Wait-free” design scheme  Break each operation into three atomic steps  can be executed by different threads  cannot be interleaved  1. Initial change of the internal structure (concurrent operations realize that there is an operation-in-progress)  2. Updating the state of the operation-in-progress as being performed (linearized)  3. Fixing the internal structure (finalizing the operation-in-progress)

  22. Internal structures  [diagram: queue 1 → 2 → 4 with head and tail]  state array:

      thread:   0      1      2
      phase:    9      4      9
      pending:  false  false  false
      enqueue:  false  true   true
      node:     null   null   null

  23. Internal structures  Each node carries enqTid: int  holds the ID of the thread that performs / has performed the insertion of the node into the queue  [diagram: nodes 1 and 2 (enqTid 0) were enqueued by Thread 0; node 4 (enqTid 1) was enqueued by Thread 1; state array as before]

  24. Internal structures  Each node also carries deqTid: int  holds the ID of the thread that performs / has performed the removal of the node from the queue  [diagram: node 1 (deqTid 1) was dequeued by Thread 1; the other nodes have deqTid -1; state array as before]
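Putting slides 22-24 together, a node can be sketched in Java as follows (names are illustrative): enqTid is fixed at creation by the inserting thread, while deqTid starts at -1 and is claimed by CAS during a dequeue.

    import java.util.concurrent.atomic.AtomicInteger;
    import java.util.concurrent.atomic.AtomicReference;

    class Node {
        final int value;
        final AtomicReference<Node> next = new AtomicReference<>(null);
        final int enqTid;                                   // inserting thread (-1 for the dummy)
        final AtomicInteger deqTid = new AtomicInteger(-1); // removing thread, claimed by CAS

        Node(int value, int enqTid) {
            this.value = value;
            this.enqTid = enqTid;
        }
    }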

  25. enqueue operation (thread ID: 2)  Creating a new node  [diagram: queue 12 → 4 → 17; thread 2 allocates node 6 with enqTid 2; the state array is not yet updated]

  26. enqueue operation (thread ID: 2)  Announcing a new operation  [diagram: thread 2’s state entry is set to phase 10, pending true, enqueue true, node → 6]

  27. enqueue operation (thread ID: 2)  Step 1: Initial change of the internal structure  [diagram: a CAS links node 6 after node 17]

  28. enqueue operation (thread ID: 2)  Step 2: Updating the state of the operation-in-progress as being performed  [diagram: a CAS sets thread 2’s state entry to pending false]

  29. enqueue operation (thread ID: 2)  Step 3: Fixing the internal structure  [diagram: a CAS swings tail to node 6]

  30. enqueue operation (threads ID: 2 and ID: 0)  Step 1: Initial change of the internal structure  [diagram: node 6 is linked after node 17; a second enqueuer, thread 0, arrives]

  31. enqueue operation (threads ID: 2 and ID: 0)  Creating a new node  Announcing a new operation  [diagram: thread 0 allocates node 3 and sets its state entry to phase 11, pending true, enqueue true, node → 3]

  32. enqueue operation (threads ID: 2 and ID: 0)  Step 2: Updating the state of the operation-in-progress as being performed  [diagram: queue and state array before the CAS on thread 2’s entry]

  33. enqueue operation (threads ID: 2 and ID: 0)  Step 2: Updating the state of the operation-in-progress as being performed  [diagram: a CAS sets thread 2’s state entry to pending false]

  34. enqueue operation (threads ID: 2 and ID: 0)  Step 3: Fixing the internal structure  [diagram: a CAS swings tail to node 6]

  35. enqueue operation (threads ID: 2 and ID: 0)  Step 1: Initial change of the internal structure  [diagram: a CAS links thread 0’s node 3 after node 6]
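The enqueue walkthrough above corresponds to roughly the following Java sketch, built from the Node and OpDesc classes sketched earlier. It is a condensed reading of the three-step scheme, not the authors' verbatim code; error handling and the surrounding enq() wrapper are elided.

    import java.util.concurrent.atomic.AtomicReference;
    import java.util.concurrent.atomic.AtomicReferenceArray;

    class WFQueueSketch {
        final AtomicReference<Node> head, tail;
        final AtomicReferenceArray<OpDesc> state;

        WFQueueSketch(int numThreads) {
            Node dummy = new Node(0, -1);          // sentinel node
            head = new AtomicReference<>(dummy);
            tail = new AtomicReference<>(dummy);
            state = new AtomicReferenceArray<>(numThreads);
            for (int i = 0; i < numThreads; i++)
                state.set(i, new OpDesc(-1, false, true, null));
        }

        boolean isStillPending(int tid, long phase) {
            OpDesc d = state.get(tid);
            return d.pending && d.phase <= phase;
        }

        void helpEnq(int tid, long phase) {
            while (isStillPending(tid, phase)) {
                Node last = tail.get();
                Node next = last.next.get();
                if (last != tail.get()) continue;          // tail moved; retry
                if (next == null) {                        // queue in a stable state
                    if (isStillPending(tid, phase) &&
                        last.next.compareAndSet(null, state.get(tid).node)) { // step 1
                        helpFinishEnq();                   // steps 2 and 3
                        return;
                    }
                } else {
                    helpFinishEnq();   // first finish the enqueue already in progress
                }
            }
        }

        void helpFinishEnq() {
            Node last = tail.get();
            Node next = last.next.get();
            if (next == null) return;
            int tid = next.enqTid;     // owner of the half-finished insertion
            OpDesc cur = state.get(tid);
            if (last == tail.get() && state.get(tid).node == next) {
                // step 2: mark the operation as linearized in the state array
                state.compareAndSet(tid, cur,
                    new OpDesc(cur.phase, false, true, next));
                tail.compareAndSet(last, next);            // step 3: fix tail
            }
        }
    }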

  36. dequeue operation (thread ID: 2)  [diagram: queue 12 → 4 → 17; thread 2 is about to dequeue; no operation is pending in the state array]

  37. dequeue operation (thread ID: 2)  Announcing a new operation  [diagram: thread 2’s state entry is set to phase 10, pending true, enqueue false]

  38. dequeue operation (thread ID: 2)  Updating state to refer to the first node  [diagram: a CAS points thread 2’s node field at the first node, 12]

  39. dequeue operation (thread ID: 2)  Step 1: Initial change of the internal structure  [diagram: a CAS sets deqTid of node 12 from -1 to 2]

  40. dequeue operation (thread ID: 2)  Step 2: Updating the state of the operation-in-progress as being performed  [diagram: a CAS sets thread 2’s state entry to pending false]

  41. dequeue operation (thread ID: 2)  Step 3: Fixing the internal structure  [diagram: a CAS swings head from node 12 to node 4]
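A matching condensed sketch of the dequeue steps, written as further methods of the WFQueueSketch class above. The real method loops, re-validates head and tail, and handles the empty-queue case, all elided here; the value eventually returned is the one in the successor of the recorded sentinel node.

    void helpDeq(int tid, long phase) {
        Node first = head.get();
        Node next = first.next.get();
        if (next == null) return;              // empty-queue handling elided

        OpDesc cur = state.get(tid);
        if (cur.node != first && isStillPending(tid, phase)) {
            // point the state entry at the current sentinel (slide 38)
            OpDesc updated = new OpDesc(cur.phase, true, false, first);
            if (!state.compareAndSet(tid, cur, updated)) return;
        }
        first.deqTid.compareAndSet(-1, tid);   // step 1: claim the first node
        helpFinishDeq();                       // steps 2 and 3
    }

    void helpFinishDeq() {
        Node first = head.get();
        Node next = first.next.get();
        int tid = first.deqTid.get();          // owner of the pending removal
        if (tid == -1 || next == null) return;
        OpDesc cur = state.get(tid);
        if (first == head.get()) {
            // step 2: mark the operation as linearized
            state.compareAndSet(tid, cur,
                new OpDesc(cur.phase, false, false, cur.node));
            head.compareAndSet(first, next);   // step 3: swing head
        }
    }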

  42. Performance evaluation  Experimental setup:

      Architecture:  two 2.5 GHz quad-core Xeon E5420 processors / two 1.6 GHz quad-core Xeon E5310 processors
      # threads:     8 / 8 / 8
      RAM:           16GB / 16GB / 16GB
      OS:            CentOS 5.5 Server / Ubuntu 8.10 Server / RedHat Enterprise 5.3 Server
      Java:          Sun’s Java SE Runtime 1.6.0 update 22, 64-bit Server VM

  43. Benchmarks  Enqueue-Dequeue benchmark  the queue is initially empty  each thread iteratively performs an enqueue and then a dequeue  1,000,000 iterations per thread  50%-Enqueue benchmark  the queue is initialized with 1000 elements  each thread decides uniformly at random which operation to perform, with equal odds for enqueue and dequeue  1,000,000 operations per thread
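For concreteness, the two benchmark loops might look as follows; the WaitFreeQueue interface and the harness methods are assumptions of this sketch, not the authors' actual harness.

    import java.util.concurrent.ThreadLocalRandom;

    interface WaitFreeQueue { void enq(int v); int deq(); }

    class Harness {
        // Enqueue-Dequeue benchmark: each thread alternates enq and deq
        static void enqueueDequeue(WaitFreeQueue q) {
            for (int i = 0; i < 1_000_000; i++) {
                q.enq(i);
                q.deq();
            }
        }

        // 50%-Enqueue benchmark: each operation is chosen with equal odds
        static void fiftyPercentEnqueue(WaitFreeQueue q) {
            for (int i = 0; i < 1_000_000; i++) {
                if (ThreadLocalRandom.current().nextBoolean()) q.enq(i);
                else q.deq();
            }
        }
    }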

  44. Tested algorithms Compared implementations:  MS-queue  Base wait-free queue  Optimized wait-free queue  Opt 1: optimized helping (help one thread at a time)  Opt 2: atomic counter-based phase calculation  Measure completion time as a function of # threads

  45. Enqueue-Dequeue benchmark  TBD: add figures

  46. The impact of optimizations  TBD: add figures

  47. Optimizing further: false sharing  Arises on accesses to the state array  Resolved by stretching the state with dummy pads  TBD: add figures
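One way to realize the stretching, sketched below: space out the used slots of the state array so that entries of different threads fall on different cache lines. The padding factor is an assumption for illustration (16 references of 8 bytes each keep used slots at least 128 bytes apart).

    import java.util.concurrent.atomic.AtomicReferenceArray;

    class PaddedState {
        static final int PAD = 16;  // dummy slots between used entries

        final AtomicReferenceArray<OpDesc> state;

        PaddedState(int numThreads) {
            state = new AtomicReferenceArray<>(numThreads * PAD);
        }

        OpDesc get(int tid) { return state.get(tid * PAD); }

        boolean cas(int tid, OpDesc expect, OpDesc update) {
            return state.compareAndSet(tid * PAD, expect, update);
        }
    }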

  48. Optimizing further: memory management  Every attempt to update state is preceded by an allocation of a new record  these records can be reused when the attempt fails  (more) validation checks can be performed to reduce the number of failed attempts  When an operation is finished, remove the reference from state to the list node  helps the garbage collector

  49. Implementing the queue without GC  Apply the Hazard Pointers technique [Michael’04]  each thread is associated with hazard pointers  single-writer multi-reader registers  used by threads to point to objects they may access later  when an object should be deleted, a thread stores its address in a special stack  once in a while, it scans the stack and recycles an object only if no hazard pointer points to it  In our case, the technique can be applied with a slight modification in the dequeue method
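A minimal sketch of the hazard-pointer idea described above (after [Michael'04]); all names are illustrative, and a real implementation also needs per-thread retired lists, a scan threshold, and re-validation after publishing a pointer.

    import java.util.concurrent.atomic.AtomicReferenceArray;

    class HazardPointers {
        // one single-writer multi-reader slot per thread
        final AtomicReferenceArray<Object> hp;

        HazardPointers(int numThreads) {
            hp = new AtomicReferenceArray<>(numThreads);
        }

        // publish an object the thread may access later;
        // the caller must re-read and validate the object afterwards
        void protect(int tid, Object o) { hp.set(tid, o); }

        void clear(int tid) { hp.set(tid, null); }

        // a retired object may be recycled only if no hazard pointer refers to it
        boolean canRecycle(Object o) {
            for (int i = 0; i < hp.length(); i++)
                if (hp.get(i) == o) return false;
            return true;
        }
    }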

  50. Summary  The first wait-free queue implementation supporting multiple enqueuers and dequeuers  Wait-freedom incurs an inherent trade-off  it bounds the completion time of a single operation  but has a cost in the “typical” case  The additional cost can be reduced to a tolerable level  The proposed design scheme might be applicable to other wait-free data structures
