Wait-Free Linked-Lists

Shahar Timnat, Anastasia Braginsky, Alex Kogan, Erez Petrank (Technion, Israel). Presented by Shahar Timnat. [Figure: sorted list -∞, 4, 6, 9, +∞]

Our Contribution • A fast, wait-free linked-list • The first wait-free list fast enough to be used in practice

Agenda • What is a wait-free linked-list? • Related work and existing tools • Wait-Free Linked-List design • Performance

Concurrent Data Structures • Allow several threads to read or modify the data-structure simultaneously • Increasing demands due to highly-parallel systems

Progress Guarantees • Obstruction Free – A thread running exclusively will make progress • Lock Free – At least one of the running threads will make progress • Wait Free – Every thread that gets the CPU will make progress

Wait Free Algorithms • Provides the strongest progress guarantee • Always desirable, particularly in real-time systems. • Relatively rare • Hard to design • Typically slower

The Linked List Interface • Following the traditional choice: a sorted list-based set of integers • insert(int x); delete(int x); contains(int x); [Figure: sorted list -∞, 4, 6, 9, +∞]
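
The interface above can be pinned down with a small sequential reference version. This is only a sketch of the set semantics (sorted, no duplicates, sentinel head and tail) that any concurrent version has to preserve; the names `IntSet` and `SequentialSortedList` are hypothetical and do not come from the slides, and nothing here is thread-safe.

```java
// Hypothetical names; a sequential reference sketch, not the wait-free algorithm.
interface IntSet {
    boolean insert(int x);   // true if x was absent and is now present
    boolean delete(int x);   // true if x was present and is now removed
    boolean contains(int x); // true if x is present
}

class SequentialSortedList implements IntSet {
    private static final class Node {
        final long key;      // long, so the sentinels sit outside the int range
        Node next;
        Node(long key, Node next) { this.key = key; this.next = next; }
    }

    // The two sentinels play the role of -inf and +inf in the figure.
    private final Node head =
        new Node(Long.MIN_VALUE, new Node(Long.MAX_VALUE, null));

    // Returns the last node whose key is strictly smaller than x.
    private Node findPred(int x) {
        Node pred = head;
        while (pred.next.key < x) pred = pred.next;
        return pred;
    }

    public boolean insert(int x) {
        Node pred = findPred(x);
        if (pred.next.key == x) return false;   // already in the set
        pred.next = new Node(x, pred.next);
        return true;
    }

    public boolean delete(int x) {
        Node pred = findPred(x);
        if (pred.next.key != x) return false;   // not in the set
        pred.next = pred.next.next;
        return true;
    }

    public boolean contains(int x) {
        return findPred(x).next.key == x;
    }
}
```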

Prior Wait-Free Lists • Only Universal Constructions • Non-scalable (by nature?) • Achieve good complexity, but poor performance • The state-of-the-art construction (Chuong, Ellen, Ramachandran) significantly underperforms our construction

Our wait-free versus a universal construction

Linked-Lists with Progress Guarantee • No practical wait-free linked-lists available • Lock-free linked-lists exist • Most notably: Harris’s linked-list

Existing Lock-Free List (by Harris) • Deletion in two steps • Logical: Mark the next field using a CAS • Physical: Remove the node [Figure: list 4, 6, 9]

Existing Lock-Free List (by Harris) • Use the least significant bit in each next field as a mark bit • The mark bit signals that a node is logically deleted • The node’s next field cannot be changed (the CAS will fail) if it is logically deleted
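
The two-step deletion can be sketched in Java, where there is no spare pointer bit to steal. The sketch assumes `java.util.concurrent.atomic.AtomicMarkableReference` as a stand-in for "next pointer plus mark bit"; the class names `HarrisNode` and `HarrisDelete` are hypothetical, and searching and retry loops are left out.

```java
import java.util.concurrent.atomic.AtomicMarkableReference;

// Hypothetical names; the mark bit is modeled by AtomicMarkableReference.
class HarrisNode {
    final int key;
    final AtomicMarkableReference<HarrisNode> next;
    HarrisNode(int key, HarrisNode next) {
        this.key = key;
        this.next = new AtomicMarkableReference<>(next, false);
    }
}

class HarrisDelete {
    // Step 1 (logical): set the mark on victim.next with a CAS.
    // Fails if the node is already marked or its successor changed.
    static boolean mark(HarrisNode victim) {
        HarrisNode succ = victim.next.getReference();
        return victim.next.compareAndSet(succ, succ, false, true);
    }

    // Step 2 (physical): swing pred.next from victim to victim's successor.
    // The expected mark is false, so the CAS fails if pred is itself deleted.
    static boolean snip(HarrisNode pred, HarrisNode victim) {
        HarrisNode succ = victim.next.getReference();
        return pred.next.compareAndSet(victim, succ, false, false);
    }
}
```

Because every CAS on a next field expects the mark to be false, any later attempt to change the next field of a marked node fails, which is the property the last bullet relies on.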

Help Mechanism • A common technique to achieve wait-freedom • Each thread declares in a designated state array the operation it desires • Many threads may attempt to execute it
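
A state array of this kind might look as follows. This is a sketch under assumptions: the names `HelpOp`, `HelpStatus`, `HelpDesc` and `StateArray` are hypothetical, descriptors are immutable so that every state change is a single CAS, and the ordering among declared operations (which a full wait-free scheme needs) is not modeled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicReferenceArray;

enum HelpOp { INSERT, DELETE, CONTAINS }
enum HelpStatus { PENDING, SUCCESS, FAILURE }

// Immutable descriptor: changing the status means swapping in a new object.
final class HelpDesc {
    final HelpOp op;
    final int key;
    final HelpStatus status;
    HelpDesc(HelpOp op, int key, HelpStatus status) {
        this.op = op; this.key = key; this.status = status;
    }
}

class StateArray {
    private final AtomicReferenceArray<HelpDesc> state;

    StateArray(int threads) { state = new AtomicReferenceArray<>(threads); }

    // The owner declares the operation it wants performed.
    void declare(int tid, HelpOp op, int key) {
        state.set(tid, new HelpDesc(op, key, HelpStatus.PENDING));
    }

    HelpDesc get(int tid) { return state.get(tid); }

    // Any thread may scan for operations that still need help.
    List<Integer> pending() {
        List<Integer> result = new ArrayList<>();
        for (int tid = 0; tid < state.length(); tid++) {
            HelpDesc d = state.get(tid);
            if (d != null && d.status == HelpStatus.PENDING) result.add(tid);
        }
        return result;
    }

    // Many helpers may try to report; only the first CAS from the
    // pending descriptor they observed takes effect.
    boolean report(int tid, HelpDesc seen, HelpStatus result) {
        if (seen.status != HelpStatus.PENDING) return false;
        return state.compareAndSet(tid, seen, new HelpDesc(seen.op, seen.key, result));
    }
}
```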

Help Mechanism - Difficulties • Multiple threads should be able to work concurrently on the same operation • Many potential races • Difficult to design • Usually slower

Complication: Deletion Owning • T1, T2 both attempt delete(6) • T1, T2 both declare in the state array • T3 sees T1's declaration and tries to help it, while T4 helps T2 [Figure: sorted list -∞, 4, 6, 9, +∞]

Complication: Deletion Owning • If both helpers T3, T4 “go to sleep” after the mark was done, which thread (T1 or T2) should return true and which false?

Solution: use a “success bit” • Each node holds an extra “success bit” (initially 0) • Potential owners compete to CAS it to 1 (no help in this part) • Note: the node is deleted before it is decided which thread owns its deletion

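The success bit can be sketched with a per-node atomic flag. The names `OwnedNode` and `DeletionOwnership` are hypothetical, and the sketch assumes the node has already been marked as deleted (by an owner or by a helper) before any thread tries to claim it.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical names; models only the ownership decision, not the list.
class OwnedNode {
    final int key;
    // The "success bit": initially 0 (false).
    final AtomicBoolean success = new AtomicBoolean(false);
    OwnedNode(int key) { this.key = key; }
}

class DeletionOwnership {
    // Called by each potential owner itself, never by a helper, once the
    // node is known to be deleted. Exactly one caller gets true and
    // returns true from delete(); every other caller returns false.
    static boolean claim(OwnedNode deleted) {
        return deleted.success.compareAndSet(false, true);
    }
}
```

In the T1/T2 scenario, both threads eventually call `claim` on node 6; whichever CAS lands first decides which delete(6) returns true, regardless of which helper performed the mark.
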
Helping an Insert Operation • Search • Direct • Insert • Report [Figure: list 4, 6, 9 and a new node 7. The operation's state record (Status: Pending, Operation: Insert, New node: a pointer to the new node) is replaced by a CAS with (Status: Success, Operation: Insert, New node: the same pointer).]
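
The four steps can be walked through in code. The sketch below is deliberately naive and is not the authors' algorithm: it uses a single state slot instead of a per-thread array, ignores deletions, and does not handle the races that the subsequent slides describe. All names (`InsNode`, `InsStatus`, `InsDesc`, `NaiveInsertHelp`) are hypothetical.

```java
import java.util.concurrent.atomic.AtomicMarkableReference;
import java.util.concurrent.atomic.AtomicReference;

class InsNode {
    final long key;
    final AtomicMarkableReference<InsNode> next =
        new AtomicMarkableReference<>(null, false);
    InsNode(long key) { this.key = key; }
}

enum InsStatus { PENDING, SUCCESS, FAILURE }

// Immutable state record, replaced as a whole by a CAS.
final class InsDesc {
    final InsStatus status;
    final InsNode node;
    InsDesc(InsStatus status, InsNode node) { this.status = status; this.node = node; }
}

class NaiveInsertHelp {
    final InsNode head = new InsNode(Long.MIN_VALUE);
    // One slot only; a full version would hold one slot per thread.
    final AtomicReference<InsDesc> state = new AtomicReference<>();

    NaiveInsertHelp() { head.next.set(new InsNode(Long.MAX_VALUE), false); }

    // May be run by the owner and by any number of helpers.
    void helpInsert() {
        while (true) {
            InsDesc desc = state.get();
            if (desc.status != InsStatus.PENDING) return;  // already reported
            InsNode node = desc.node;

            // Search: find pred.key < node.key <= curr.key.
            InsNode pred = head;
            InsNode curr = pred.next.getReference();
            while (curr.key < node.key) {
                pred = curr;
                curr = curr.next.getReference();
            }

            if (curr.key == node.key) {
                // Key already present: our own node means success.
                InsStatus r = (curr == node) ? InsStatus.SUCCESS : InsStatus.FAILURE;
                state.compareAndSet(desc, new InsDesc(r, node));   // Report
                continue;
            }

            node.next.set(curr, false);                            // Direct
            if (pred.next.compareAndSet(curr, node, false, false)) // Insert
                state.compareAndSet(desc, new InsDesc(InsStatus.SUCCESS, node)); // Report
        }
    }
}
```

The owner publishes a `PENDING` record and then calls `helpInsert()` itself; helpers that find the record call the same method, and the final status is read back from `state`.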

Incorrect Result Returned • Consider 2 threads helping insert(7) • T2 { found (6,7); CAS(state[tid], s, failure) } • T1 { found (6,9); node.next = &9; inserts new node; CAS(state[tid], s, success) } • T2 searches after T1 has inserted the new node, sees key 7 already in the list, and reports failure for an insert that actually succeeded [Figure: list 4, 6, 9 with new node 7]

Incorrect Result Returned 2 • T2 { found (6,7); CAS(->failure) } • T1 { found (6,9); node.next = &9; inserts new node; CAS(->success) } • T3 { Delete(7); Insert(7) } [Figure: list 4, 6, 7, 9 and a second node 7’ with the same key]

Ill-timed Direct • Consider 2 threads helping insert(7) • T2 { found (6,9); node.next = &9; inserts the new node; CAS(->success); ... Insert(8) (after 7) } • T1 { found (6,9); node.next = &9 } • If T1's delayed node.next = &9 runs after 8 was inserted after 7, it overwrites 7's link to 8 [Figure: list 4, 6, 9 with nodes 7 and 8 added]

More Races Exist • Additional races were handled in both the delete and insert operations • We constructed a formal proof for the correctness of the algorithm

Main Invariant • Each modification of a node’s next field falls into one of four categories • Marking (change the mark bit to true) • Snipping (removing a marked node) • Redirection (of an infant node) • Insertion (a non-infant to an infant) • Proof by induction and by following the code lines

Fast-Path-Slow-Path (Kogan and Petrank, PPoPP 2012) • Each thread: • Tries to complete the operation without help • Asks for help only if it failed due to contention • (Almost) as fast as the lock-free list • Gives the stronger wait-free guarantee
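
The control flow of this idea can be sketched as a small wrapper. It is a skeleton under assumptions, not the published method: the name `FastPathSlowPath`, the bound `MAX_FAST_ATTEMPTS`, and the two callbacks are hypothetical, and the slow path is taken to mean "declare the operation in the state array and run the helping code until it completes".

```java
import java.util.function.BooleanSupplier;

// Hypothetical skeleton of the fast-path / slow-path control flow.
class FastPathSlowPath {
    // Assumed bound on lock-free attempts before asking for help.
    static final int MAX_FAST_ATTEMPTS = 3;

    // fastAttempt: one lock-free attempt; returns true if the operation
    //              completed, false if it failed due to contention.
    // slowPath:    declares the operation and runs the wait-free helping
    //              code until the operation is complete.
    // Returns the name of the path that completed the operation.
    static String run(BooleanSupplier fastAttempt, Runnable slowPath) {
        for (int i = 0; i < MAX_FAST_ATTEMPTS; i++) {
            if (fastAttempt.getAsBoolean()) return "fast";
        }
        slowPath.run();
        return "slow";
    }
}
```

Under low contention almost every operation finishes inside the loop, which is why the cost stays close to the lock-free list; the bounded number of attempts is what keeps the overall operation wait-free once the slow path is.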
