Progress with Progress Guarantees
Erez Petrank - Technion
Based on joint work with Anastasia Braginsky, Alex Kogan, Madanlal Musuvathi, Filip Pizlo, and Bjarne Steensgaard.
Responsive Parallelism
• Develop responsive parallel systems by developing stronger algorithms and better system support.
• Main tool: progress guarantees.
• Guaranteed responsiveness, good scalability, avoiding deadlocks and livelocks.
• Particularly important in several domains:
  • real-time systems
  • interactive systems (including OS's)
  • systems operating under a service-level agreement.
• Always good to have, if performance is not harmed too much.
A Progress Guarantee
• Intuitively: "No matter which interleaving is scheduled, my program will make progress."
• "Progress" is something the developer defines, e.g., reaching specific lines of code.
Progress Guarantees
• Wait-Freedom
  • If you schedule enough steps of any thread, it will make progress.
  • A great guarantee, but difficult to achieve.
• Lock-Freedom
  • If you schedule enough steps across all threads, one of them will make progress.
  • The middle way?
• Obstruction-Freedom
  • If you let any thread run alone for enough steps, it will make progress.
  • Somewhat weak.
A small example follows.
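To make the distinction concrete, here is a minimal Java sketch (mine, not from the talk): a CAS retry loop on a shared counter. The loop is lock-free, because a failed CAS implies that some other thread's CAS succeeded, yet it is not wait-free, because one particular thread can lose every race indefinitely.

```java
import java.util.concurrent.atomic.AtomicLong;

class Counter {
    private final AtomicLong value = new AtomicLong();

    // Lock-free: every failed CAS means another thread's CAS succeeded,
    // so system-wide progress is guaranteed. But this particular thread
    // may retry forever under contention, so the loop is not wait-free.
    long incrementLockFree() {
        while (true) {
            long v = value.get();
            if (value.compareAndSet(v, v + 1)) {
                return v + 1;
            }
        }
    }
}
```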
This Talk
• Advances with progress guarantees; some of it is work in progress.
• Attempts at lock-free garbage collection [PLDI 08, ISMM 07]
• Bounded lock-freedom and lock-free system services [PLDI 09]
• A lock-free locality-conscious linked-list [ICDCN 11]
• A lock-free balanced tree [submitted]
• A wait-free queue [PPOPP 11]
• Making wait-freedom fast [submitted]
A Lock-Free Garbage Collector
• There is no such thing right now.
• A recent line of work on real-time GC allows moving objects concurrently (while the program accesses them).
• See Pizlo, Petrank, and Steensgaard [ISMM 2007, PLDI 2008].
What does it mean to "support" lock-freedom?
Services Supporting Lock-Freedom
• Consider system services: event counters, memory management, micro-code interpreters, caching, paging, etc.
• Normally, designs of lock-free algorithms assume that the underlying system operations do not foil lock-freedom.
Can We Trust Services to Support Lock-Freedom?
• Valois' lock-free linked-list algorithm has a well-known C++ implementation that uses the C++ new operator, whose allocator may acquire locks internally.
• A hardware problem: lock-free algorithms typically use CAS or LL/SC, but LL/SC implementations are typically weak: spurious failures are permitted.
• Background threads increase the chaos (if syncing on shared variables).
Conclusion: system support matters.
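One common remedy, sketched below with my own naming (this is not Valois' actual code), is to take allocation out of the hot path: pre-allocate nodes and recycle them through a CAS-based free list (a Treiber-style stack), so that no operation ever depends on a possibly-locking allocator.

```java
import java.util.concurrent.atomic.AtomicReference;

// A Treiber-style stack used as a free list: push() and pop() are
// lock-free, so node recycling never goes through a locking allocator.
class FreeList<T> {
    static class Node<T> { T item; Node<T> next; }

    private final AtomicReference<Node<T>> top = new AtomicReference<>();

    void push(Node<T> n) {
        Node<T> t;
        do {
            t = top.get();
            n.next = t;
        } while (!top.compareAndSet(t, n));
    }

    Node<T> pop() {
        Node<T> t, next;
        do {
            t = top.get();
            if (t == null) return null;   // free list exhausted
            next = t.next;
        } while (!top.compareAndSet(t, next));
        return t;
    }
}
```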
In the Paper
• Definition of a lock-free supporting service (including GC),
• Definition of a lock-free program that uses services,
• A composition theorem for the two.
• Also: bounded lock-freedom.
Musuvathi, Petrank, and Steensgaard. Progress Guarantee via Bounded Lock-Freedom. [PLDI 2009]
Open Questions for Progress Guarantees • Some important lock-free data structures are not known • Can we build a balanced tree? • Wait-free data structures are difficult to design. • Can we design some basic ones? • Wait-free implementations are slow. • Can we make them fast?
A Locality-Conscious Linked-List [ICDCN'11]
A Lock-Free B-Tree [Submitted]
Anastasia Braginsky & Erez Petrank
A B-Tree
• An important instance of balanced trees, suitable for file systems. Typically a large node with many children.
• Fast access to the leaves in a very short traversal.
• A node is split or merged when it becomes too dense or sparse.
[Figure: a B-tree whose root holds keys 9, 52, 92 and whose leaves hold the keys 3 through 92.]
Lock-Free Locality-Conscious Linked Lists
• List of constant-size "chunks", with minimal and maximal bounds on the number of elements contained.
• Each chunk consists of a small linked list.
• When a chunk gets too sparse or dense, it is split or merged with its preceding chunk (a sketch of the layout follows).
• Lock-free, locality-conscious, fast access, scalable.
[Figure: a list of chunks holding the sorted keys 3 through 92.]
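A minimal Java sketch of the chunk layout this suggests (field names and bounds are illustrative, not the paper's):

```java
import java.util.concurrent.atomic.AtomicLongArray;
import java.util.concurrent.atomic.AtomicReference;

// One chunk of the locality-conscious list: a bounded in-chunk linked
// list of entries plus a pointer to the next chunk. MIN/MAX bound how
// sparse or dense a chunk may get before a merge or split is triggered.
class Chunk {
    static final int MIN = 8, MAX = 32;   // assumed bounds

    // Each entry packs key, data, and a next-index into one long, so
    // that an entry can be updated with a single CAS.
    final AtomicLongArray entries = new AtomicLongArray(MAX);
    final AtomicReference<Chunk> nextChunk = new AtomicReference<>();
    volatile int entriesHead;             // index of the first entry
}
```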
Chunk Split or Merge
• A chunk that needs splitting or merging is frozen.
• No operations are applied to it anymore.
• New chunks (with the updated data) replace it.
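One way to realize the freeze, sketched under the assumption that each entry is a single CAS-able word (as in the packed-entry layout above): set a freeze bit in every entry, after which any CAS that expects an unfrozen value fails.

```java
import java.util.concurrent.atomic.AtomicLongArray;

class Freezer {
    static final long FREEZE_BIT = 1L << 63;

    // Mark every entry of a chunk as frozen. Afterwards, any CAS that
    // expects an unfrozen value fails, so no new operation can take
    // effect in the old chunk; threads that observe the bit instead
    // help complete the split/merge on the replacement chunks.
    static void freeze(AtomicLongArray entries) {
        for (int i = 0; i < entries.length(); i++) {
            long v;
            do {
                v = entries.get(i);
            } while ((v & FREEZE_BIT) == 0
                     && !entries.compareAndSet(i, v, v | FREEZE_BIT));
        }
    }
}
```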
A List Structure (2 Chunks)
[Figure: HEAD points to chunk A, whose NextChunk points to chunk B, whose NextChunk is NULL; each chunk reaches its internal list of key/data entries through EntriesHead.]
When No More Space for Insertion
[Figure: chunk A is full (keys 3, 6, 9, 12, 14, 25); it is marked frozen before being split.]
Split
[Figure: the frozen chunk A is replaced by two new chunks, C and D, each taking half of its entries; the list now runs HEAD, C, D, B.]
When a Chunk Gets Sparse
[Figure: chunk D has become sparse (a single key, 14); D is frozen as the merge master, and its preceding chunk C is frozen as the slave.]
Merge
[Figure: the frozen chunks C and D are replaced by a single new chunk E holding the entries of both (keys 3, 6, 9, 14).]
Extending to a Lock-Free B-Tree
• Each node holds a chunk, handling splits and merges.
• Design simplified by a stringent methodology.
• A monotonic life span of a node (infant, normal, frozen, & reclaimed) reduces the variety of possible states.
• Care is needed when replacing old nodes with new ones (while the data resides in both the old and the new node simultaneously).
• Care is needed when searching for a neighboring node to merge with, and when coordinating the merge.
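The monotonic life span, as a hedged Java sketch (names are illustrative): because the state only ever moves forward, the set of interleavings a correctness argument must consider shrinks dramatically.

```java
import java.util.concurrent.atomic.AtomicReference;

// A node's state only moves forward: INFANT -> NORMAL -> FROZEN -> RECLAIMED.
enum NodeState { INFANT, NORMAL, FROZEN, RECLAIMED }

class TreeNode {
    private final AtomicReference<NodeState> state =
            new AtomicReference<>(NodeState.INFANT);

    // CAS enforces that the state never moves backwards; a failed CAS
    // means another thread already advanced the node past 'from'.
    boolean advance(NodeState from, NodeState to) {
        assert from.ordinal() + 1 == to.ordinal();
        return state.compareAndSet(from, to);
    }
}
```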
Having built a lock-free B-Tree, let's look at a …
Wait-Free Queue
Kogan and Petrank [PPOPP'11]
FIFO Queues
[Figure: a queue holding 3, 2, 9; elements are enqueued at the tail and dequeued from the head.]
A fundamental and ubiquitous data structure.
Existing Wait-Free Queues
• There exist universal constructions, but they are too costly in time & space.
• There exist constructions with limited concurrency:
  • [Lamport 83] one enqueuer and one dequeuer.
  • [David 04] multiple enqueuers, one concurrent dequeuer.
  • [Jayanti & Petrovic 05] multiple dequeuers, one concurrent enqueuer.
A Wait-Free Queue
"Wait-Free Queues With Multiple Enqueuers and Dequeuers". Kogan & Petrank [PPOPP 2011]
• The first wait-free queue that is dynamic, parallel, and based on CASes.
• We extend the lock-free design of Michael & Scott, which is part of the Java concurrency package (sketched below).
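For reference, here is the Michael & Scott lock-free enqueue that the wait-free design extends, sketched in Java (the JDK's ConcurrentLinkedQueue is a variant of this algorithm):

```java
import java.util.concurrent.atomic.AtomicReference;

class MSQueue<T> {
    static class Node<T> {
        final T item;
        final AtomicReference<Node<T>> next = new AtomicReference<>();
        Node(T item) { this.item = item; }
    }

    private final AtomicReference<Node<T>> head, tail;

    MSQueue() {
        Node<T> dummy = new Node<>(null);   // sentinel node
        head = new AtomicReference<>(dummy);
        tail = new AtomicReference<>(dummy);
    }

    // Lock-free enqueue: link the new node after the last node, then
    // swing the tail. If the tail lags behind, help advance it first.
    void enqueue(T item) {
        Node<T> node = new Node<>(item);
        while (true) {
            Node<T> last = tail.get();
            Node<T> next = last.next.get();
            if (next == null) {
                if (last.next.compareAndSet(null, node)) {
                    tail.compareAndSet(last, node); // may fail harmlessly
                    return;
                }
            } else {
                tail.compareAndSet(last, next);     // help a delayed enqueuer
            }
        }
    }
}
```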
Building a Wait-Free Data Structure
• Universal construction skeleton:
  • Publish an operation before starting execution.
  • Next, help delayed threads, and then start executing the operation.
  • Eventually a stuck thread succeeds, because all threads help it.
• Solve for each specific data structure:
  • The interaction between threads running the same operation.
  • In particular, apply each operation exactly once and obtain a consistent return code.
The first makes it inefficient; the second makes it hard to design. A sketch of the skeleton follows.
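A hedged Java sketch of the announce-then-help skeleton (OpDesc and applyOnce are placeholders for the data-structure-specific part):

```java
import java.util.concurrent.atomic.AtomicReferenceArray;

// Skeleton of the announce-then-help pattern. applyOnce() must be
// written per data structure so that concurrent helpers apply each
// operation exactly once and agree on its return value.
class WaitFreeSkeleton {
    static class OpDesc { volatile boolean done; /* op args, result */ }

    final AtomicReferenceArray<OpDesc> announce;
    final int nThreads;

    WaitFreeSkeleton(int nThreads) {
        this.nThreads = nThreads;
        this.announce = new AtomicReferenceArray<>(nThreads);
    }

    void execute(int tid, OpDesc myOp) {
        announce.set(tid, myOp);           // 1. publish before executing
        for (int t = 0; t < nThreads; t++) {
            OpDesc op = announce.get(t);   // 2. help anyone still pending
            if (op != null && !op.done) applyOnce(op);
        }                                  // 3. my own op is covered by
    }                                      //    the loop, so it is done.

    void applyOnce(OpDesc op) { /* data-structure specific; sets op.done */ }
}
```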
Results • The naïve wait-free queue is 3x slower than the lock-free one. • Optimizations can reduce the ratio to 2x. Can we eliminate the overhead for the wait-free guarantee?
Making Wait-Freedom Fast
Wait-free algorithms are typically slower than lock-free ones (but guarantee more). Can we eliminate the overhead completely?
Kogan and Petrank [submitted]
Reducing the Help Overhead
• Standard method: help all previously registered operations before executing your own operation. But this goes over all threads.
• Alternative: help only one thread (cyclically), as sketched below.
• This is still wait-free! An operation can only be interrupted a limited number of times.
• Trade-off between efficiency and guarantee. But this is still costly!
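A sketch of the cheaper helping scheme (names are mine): each operation helps at most one announced operation, chosen round-robin, so a stuck thread is still reached after a bounded number of helping steps.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReferenceArray;

// Help only one announced operation per own operation, in round-robin
// order. After at most nThreads own operations, every announcement
// slot has been visited, so a stuck thread is eventually helped.
class CyclicHelper {
    static class OpDesc { volatile boolean done; }

    final AtomicReferenceArray<OpDesc> announce;
    final AtomicInteger helpIndex = new AtomicInteger();
    final int nThreads;

    CyclicHelper(int n) {
        nThreads = n;
        announce = new AtomicReferenceArray<>(n);
    }

    void helpOne() {
        int i = Math.floorMod(helpIndex.getAndIncrement(), nThreads);
        OpDesc op = announce.get(i);
        if (op != null && !op.done) applyOnce(op);
    }

    void applyOnce(OpDesc op) { /* data-structure specific; sets op.done */ }
}
```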
Why is Wait-Free Slow?
• Because it needs to spend a lot of time helping others.
• Main idea: typically, help is not needed.
  • Ask for help when you need it;
  • provide help infrequently.
• Teacher: why are you late for school again?
• Boy: I helped an old lady cross the street.
• Teacher: why did it take so long?
• Boy: because she didn't want to cross!
Fast-Path Slow-Path
• When executing an operation, start by running the fast lock-free implementation.
• Upon several failures, "switch" to the wait-free implementation:
  • ask for help,
  • keep trying.
• Once in a while, threads on the fast path check whether help is needed and provide it.
[Figure: a fast path and a slow path, with the ability to switch between modes.]
[Flowchart: Start → Do I need to help? If yes, help someone. Then apply my op on the fast path (at most n times). Success? If yes, return; if no, apply my op using the slow path, then return.]
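The flow above as a hedged Java sketch (MAX_FAILURES, tryFastPath, and the other names are assumptions, not the paper's code):

```java
// Fast-path/slow-path skeleton matching the flowchart: try the plain
// lock-free implementation a bounded number of times, then fall back
// to the wait-free slow path; fast-path threads check for help
// requests only rarely, so the common case pays almost nothing.
class FastSlow {
    interface Op {}

    static final int MAX_FAILURES = 16;  // assumed tuning constant (the "n" above)
    static final int HELP_PERIOD = 64;   // how rarely fast-path threads check for help requests

    private int opCount;                 // per-thread in a real implementation

    void apply(Op op) {
        if (++opCount % HELP_PERIOD == 0) {
            helpSomeoneIfNeeded();       // the rare "do I need to help?" check
        }
        for (int i = 0; i < MAX_FAILURES; i++) {
            if (tryFastPath(op)) return; // plain lock-free attempt
        }
        slowPath(op);                    // announce op, keep trying with help
    }

    boolean tryFastPath(Op op)    { /* one lock-free attempt */ return false; }
    void    slowPath(Op op)       { /* wait-free path with helping */ }
    void    helpSomeoneIfNeeded() { /* scan announcements, help one op */ }
}
```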
Results for the Queue
• There is no observable overhead!
• Implication: it is possible to avoid starvation at no additional cost, provided you can create a fast and a slow path and can switch between them.
Conclusion • Some advances on lock-free garbage collection. • Lock-free services, bounded lock-freedom. • A few new important data structures with progress guarantees: • Lock-free chunked linked-list • Lock-free B-Tree • Wait-free Queue • We have proposed a methodology to make wait-freedom as fast as lock-freedom.