Scalable Composite Abortable Locks for Multiprocessor Synchronization

Composite Abortable Locks (CALs)
Virendra J. Marathe, University of Rochester
Mark Moir and Nir Shavit, Sun Microsystems Labs

Mutual Exclusion Locks
• Used in multiprocessor synchronization
• Lock owner gets to execute the critical section
• Enforces waiting; leads to
  • Waiting for arbitrary time (real-time systems)
  • Deadlocks

Timeout Capability – Try locks
• Timeout and retry
• Obeys real-time constraints
• Helps avoid deadlocks (common solution in databases)
• CAL is a try-lock

Talk Outline
• Motivation
• Overview of CALs
• A CLH-based CAL
• Experimental Results
• Conclusion

Motivation: Issues in Try-lock Implementations
• Test-and-Set based Backoff locks (BLs)
  • Fast; easy to implement timeout capability
  • Don’t scale
• Queue based locks (QLs)
  • Scale very well
  • Not trivial to implement timeouts
  • Need separate memory management
  • High-to-unbounded worst case space overheads
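To illustrate why timeouts are easy for a backoff lock, here is a minimal sketch of a test-and-set try-lock in C11. It is not code from the talk: the names (`tas_trylock`, `tas_try_acquire`, `spin_delay`) are hypothetical, and the attempt budget is an assumed stand-in for a real wall-clock timeout.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical test-and-set try-lock; all names here are illustrative. */
typedef struct {
    atomic_bool held;   /* true while some thread owns the lock */
} tas_trylock;

static void tas_init(tas_trylock *l) {
    atomic_init(&l->held, false);
}

/* Crude delay loop standing in for a real pause/backoff primitive. */
static void spin_delay(unsigned iterations) {
    for (volatile unsigned i = 0; i < iterations; i++) { }
}

/* Try to acquire the lock, giving up after max_attempts failed
 * test-and-sets. The attempt budget is an assumed stand-in for a
 * wall-clock timeout. */
static bool tas_try_acquire(tas_trylock *l, unsigned max_attempts) {
    unsigned delay = 1;
    for (unsigned attempt = 0; attempt < max_attempts; attempt++) {
        /* exchange returns the old value: false means we took the lock */
        if (!atomic_exchange_explicit(&l->held, true, memory_order_acquire))
            return true;
        spin_delay(delay);
        if (delay < 1024)
            delay *= 2;               /* capped exponential backoff */
    }
    return false;   /* timed out; there is no queue node to clean up */
}

static void tas_release(tas_trylock *l) {
    assert(atomic_load(&l->held));    /* only an owner should release */
    atomic_store_explicit(&l->held, false, memory_order_release);
}
```

A thread that times out simply returns; it holds no shared state that another thread must tidy up. The cost is that every waiter hammers the same word, which is the scaling problem the slide points at.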

Our Approach
• Combine the best features of BLs and QLs
  • BLs: Low space overhead and no memory management
  • QLs: Scalability
• Key Insights
  • Need only front part of a QL for scaling
  • Distribute BL across multiple locations
• Key Difficulty
  • The two algorithms are quite different in structure

Overview of CAL
• Fixed-size Backoff array
• Nodes allotted to threads from this array
• Threads insert nodes into wait-queue
[Figure labels: Logical Queue, Ptr, Backoff Array (slots 1 2 3 4), Tail]

Lock Acquisition
• Node States: Free (F), Waiting (W), Released (R), Aborted (A)
• Node Acquisition (F → W)
  • Thread selects a node (maybe randomly)
  • Acquire exclusive ownership using a BL
• Enqueue and Wait as per queue mechanism (e.g. CLH, MCS)
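The node acquisition step (F → W) can be sketched as a CAS on a slot of a fixed-size array. This is a minimal illustration under stated assumptions, not the authors' code: the names (`cal_node`, `cal_backoff_array`, `cal_acquire_node`) are hypothetical, the probe order is a simple walk from a caller-supplied seed rather than a true random choice, and the backoff delay between attempts is omitted.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

enum node_state { NODE_FREE, NODE_WAITING, NODE_RELEASED, NODE_ABORTED };

#define BACKOFF_ARRAY_SIZE 4   /* the experiments in the talk use size 4 */

typedef struct {
    _Atomic int state;         /* holds one of enum node_state */
} cal_node;

typedef struct {
    cal_node nodes[BACKOFF_ARRAY_SIZE];
} cal_backoff_array;

static void cal_array_init(cal_backoff_array *a) {
    for (size_t i = 0; i < BACKOFF_ARRAY_SIZE; i++)
        atomic_init(&a->nodes[i].state, NODE_FREE);
}

/* Try to take exclusive ownership of one node (F -> W).
 * Probes slots starting at seed (assumed probe order), and returns
 * NULL once max_attempts probes have failed. */
static cal_node *cal_acquire_node(cal_backoff_array *a, unsigned seed,
                                  unsigned max_attempts) {
    assert(a != NULL);
    for (unsigned attempt = 0; attempt < max_attempts; attempt++) {
        cal_node *n = &a->nodes[(seed + attempt) % BACKOFF_ARRAY_SIZE];
        int expected = NODE_FREE;
        if (atomic_compare_exchange_strong(&n->state, &expected,
                                           NODE_WAITING))
            return n;          /* we own this node and may enqueue it */
        /* a real lock would back off here before the next probe */
    }
    return NULL;               /* no free node within the budget */
}
```

With four slots, at most four threads can hold nodes at once; any further thread keeps backing off on the array, which is what bounds the space overhead.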

Node Release
• Write R into node to release lock (W → R)
• Write A on timeout (W → A)
• Someone else will “reclaim” the node (R → F or A → F)
• Write F on timeout (W → F) only if the node was not enqueued
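The transitions listed above can be written down as three small helpers. This sketch only models the state changes on a single node; it does not show which thread performs the reclaim or when, since the slides do not spell that out. All names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

enum node_state { NODE_FREE, NODE_WAITING, NODE_RELEASED, NODE_ABORTED };

typedef struct {
    _Atomic int state;         /* holds one of enum node_state */
} cal_node;

/* Lock holder releases the lock: W -> R. */
static void cal_node_release(cal_node *n) {
    assert(atomic_load(&n->state) == NODE_WAITING);
    atomic_store_explicit(&n->state, NODE_RELEASED, memory_order_release);
}

/* Waiting thread gives up on timeout.
 * W -> A if the node is already in the queue, so that another thread
 * can reclaim it later; W -> F if it was never enqueued. */
static void cal_node_timeout(cal_node *n, bool enqueued) {
    assert(atomic_load(&n->state) == NODE_WAITING);
    atomic_store_explicit(&n->state,
                          enqueued ? NODE_ABORTED : NODE_FREE,
                          memory_order_release);
}

/* Another thread reclaims a finished node: R -> F or A -> F.
 * Returns false and leaves the node untouched if it is F or W. */
static bool cal_node_reclaim(cal_node *n) {
    int s = atomic_load_explicit(&n->state, memory_order_acquire);
    if (s != NODE_RELEASED && s != NODE_ABORTED)
        return false;
    atomic_store_explicit(&n->state, NODE_FREE, memory_order_release);
    return true;
}
```

The asymmetry in `cal_node_timeout` reflects the last bullet: a node that other threads may already be spinning on cannot be freed directly by its owner.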

CLH Queue Lock
• Node States: Free (F), Waiting (W), Released (R)
[Figure: CLH queue diagram. Labels: Lock Owner, Ptr, Tail, Dummy, New Thrd, CAS, Spin on previous. Node states shown: W, W, W, R, F, W]
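For readers unfamiliar with CLH, here is a minimal sketch of the plain (non-abortable) queue lock the slide refers to. The names are hypothetical. The enqueue uses a CAS loop on Tail to match the "CAS" label in the figure; textbook presentations often use an atomic swap instead. Marking the predecessor Free after acquisition is an assumption about how the F state is used.

```c
#include <assert.h>
#include <stdatomic.h>

enum clh_state { CLH_FREE, CLH_WAITING, CLH_RELEASED };

typedef struct clh_node {
    _Atomic int state;             /* holds one of enum clh_state */
} clh_node;

typedef struct {
    _Atomic(clh_node *) tail;      /* last node in the implicit queue */
} clh_lock;

/* The queue starts with a dummy node that is already Released. */
static void clh_init(clh_lock *l, clh_node *dummy) {
    atomic_init(&dummy->state, CLH_RELEASED);
    atomic_init(&l->tail, dummy);
}

/* Enqueue mine, then spin on the predecessor until it is Released.
 * Returns the predecessor node, which the caller may reuse later. */
static clh_node *clh_acquire(clh_lock *l, clh_node *mine) {
    atomic_store(&mine->state, CLH_WAITING);
    clh_node *pred = atomic_load(&l->tail);
    /* CAS loop: on failure pred is refreshed with the current tail */
    while (!atomic_compare_exchange_weak(&l->tail, &pred, mine)) { }
    while (atomic_load_explicit(&pred->state, memory_order_acquire)
           != CLH_RELEASED) { }    /* each waiter spins on its own pred */
    atomic_store(&pred->state, CLH_FREE);   /* pred is now reclaimable */
    return pred;
}

static void clh_release(clh_node *mine) {
    assert(atomic_load(&mine->state) == CLH_WAITING);
    atomic_store_explicit(&mine->state, CLH_RELEASED, memory_order_release);
}
```

Because each waiter spins on a different node, the lock scales well, but a waiter that wants to time out cannot just walk away: its successor is still spinning on its node. That is the difficulty the CAL design addresses with the Aborted state.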

CLH-based CAL
[Figure: CLH-based CAL diagram. Steps shown: Acquire, Release, Aborts, Cleanup. Labels: Lock Owner, Tail, Ver#, Ptr, Thrd A, Thrd B, Backoff, CAS, Timeout. Node states shown: F, W, R, A]

Low Contention Case Performance Problem
• Need 2 CASes in the critical path
  • Acquire Node
  • Enqueue in wait-queue

Optimization
• Eliminate extra CAS in no contention case
  • Use lsb of Tail to indicate if lock acquired
  • If not, flip the bit
  • Otherwise follow the CLH-based CAL algorithm
• Switch back to no contention case
  • If the lock releaser is the Tail, flip (using CAS) Tail to uncontended mode
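One possible reading of the fast path is sketched below: the low bit of the Tail word doubles as a "lock held" flag, so an uncontended acquire costs a single CAS. The slides do not give the exact encoding or the release protocol, so everything here (the `cal_tail` layout, `LOCKED_BIT`, the snapshot-based release) is an assumption for illustration only, and the slow path is left out.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define LOCKED_BIT ((uintptr_t)1)  /* assumed: lsb of Tail = lock held */

typedef struct {
    _Atomic uintptr_t tail;        /* pointer bits plus the flag bit */
} cal_tail;

/* Fast path: one CAS that sets the low bit of Tail.
 * On success, stores the Tail value seen (bit clear) in *tail_seen.
 * Returns false if the bit is already set or the CAS loses a race;
 * the caller would then fall back to the queue-based slow path. */
static bool cal_fast_acquire(cal_tail *t, uintptr_t *tail_seen) {
    uintptr_t old = atomic_load(&t->tail);
    if (old & LOCKED_BIT)
        return false;
    if (!atomic_compare_exchange_strong(&t->tail, &old, old | LOCKED_BIT))
        return false;
    *tail_seen = old;
    return true;
}

/* Fast release: clear the bit with a CAS, which succeeds only if
 * Tail has not changed since the acquire (nobody enqueued). */
static bool cal_fast_release(cal_tail *t, uintptr_t tail_seen) {
    assert((tail_seen & LOCKED_BIT) == 0);
    uintptr_t expected = tail_seen | LOCKED_BIT;
    return atomic_compare_exchange_strong(&t->tail, &expected, tail_seen);
}
```

Using the low bit relies on queue nodes being at least 2-byte aligned, so that bit is never part of a real pointer value.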

Experimentation
• Aimed at
  • Scalability, Preemption tolerance, Degradation
• Framework
  • Backoff array of size 4
  • 30-processor Sun Enterprise 6000 cache coherent machine with 366 MHz UltraSPARC II processors
  • Critical to non-critical work: 300:300 (nanoseconds)
  • Implementation in C, with Sun’s cc -O5
  • Comparison with Scott’s state-of-the-art nonblocking CLH try lock

Scalability

Effect of Preemption

Degradation with lower Timeouts (30 threads)

Conclusion
• Introduced the new CAL approach to implement try-locks
• Scalable
• Constant Space Overhead
• No extra memory management needed
• Preemption tolerant
• No fairness guarantees

Future Work
• Scalability: Stress test for huge number of threads
• Address fairness problem
• Other variants of CALs (e.g. MCS-lock)

Thanks!
