530 likes | 681 Vues
This presentation introduces LFthreads, a lock-free thread library designed to enhance multithreading performance in multiprocessor environments. It discusses lock-free synchronization strategies to prevent blocking issues such as convoy effects and priority inversion. The library aims to ensure constant progress for processor threads, even under adverse conditions. Key concepts include the correctness of concurrent objects, linearizability, and efficient thread synchronization primitives. The results of extensive experiments are shared, illustrating the benefits and effectiveness of LFthreads against traditional blocking mechanisms.
E N D
“Locking without blocking”LFthreads: A lock-free thread library Anders Gidenstam Joint work with: Marina Papatriantafilou Distributed Computing and Systems group Department of Computer Science and Engineering, Chalmers University of Technology, Göteborg, Sweden
Outline • Introduction • Lock-free synchronization • The Problem & Background • LFthreads • Overview • Lock-free thread-blocking synchronization • Experiments • Conclusions Anders Gidenstam, Max-Planck Institute for Computer Science
Synchronization on a shared object • Lock-free synchronization • Concurrent operations without enforcing mutual exclusion • Avoids: • Blocking (or busy waiting), convoy effects and priority inversion • Progress Guarantee • At least one operation always makes progress • Synchronization primitives • Built into CPU and memory system • Atomic read-modify-write (i.e. a critical section of one instruction) • Examples: Compare-and-Swap, Load-Linked / Store-Conditional Anders Gidenstam, Max-Planck Institute for Computer Science
O1 O2 O3 Correctness of a concurrent object • Desired semantics of a shared data object • Linearizability [Herlihy & Wing, 1990] • For each operation invocation there must be one single time instant during its duration where the operation appears to takeeffect. • The observed effectsshould be consistentwith a sequentialexecution of the operationsin that order. Anders Gidenstam, Max-Planck Institute for Computer Science
O1 O2 O3 O1 O3 O2 Correctness of a concurrent object • Desired semantics of a shared data object • Linearizability [Herlihy & Wing, 1990] • For each operation invocation there must be one single time instant during its duration where the operation appears to takeeffect. • The observed effectsshould be consistentwith a sequentialexecution of the operationsin that order. Anders Gidenstam, Max-Planck Institute for Computer Science
The Problem • Multithreading on a multiprocessor • Thread multiplexing and scheduling • Thread synchronization objects • Aim: The POSIX pthread API implemented in a lock-free way • Motivation: Why lock-free? • Processors should always be able to do useful work • In spite of others being slow: page faults, interrupts, h/w problems • Improved performance? Running threads Ready Queue Processors T2 T1 P1 T3 Waiting runnablethreads P2 T5 P3 T4 Anders Gidenstam, Max-Planck Institute for Computer Science
The Problem(s) • Contention on the ready queue • Non-blocking work-stealing; Hood,[Blumhofe et al, 1994],[Blumhofe et al, 1998] • Lesser Bear,[Oguma et al, 2001] • In LFthreads: a lock-free queue Ready Queue Processors T2 T1 P1 Dequeue Dequeue P2 Enqueue P3 T4 Anders Gidenstam, Max-Planck Institute for Computer Science
The Problem(s) • Blocking synchronization • Often undesirable • Hinders progress • Expensive context switches • Sometimes required • Waiting for some event • Legacy applications • Goal in LFthreads: • Lock-free for processors • Resilience to slow operations/processors Processors P1 T3 T5 P2 Block T5 P3 T4 Anders Gidenstam, Max-Planck Institute for Computer Science
The Problem(s) • Implementing blocking synchronization • Keep common case fast • Often no contention • User / kernel level split implementation • E.g. Linux, Solaris • Critical sections are short • Spinning v.s. blocking[Zahorjan et al, 1991] Processors P1 T3 P2 T5 Acquire P3 T4 Anders Gidenstam, Max-Planck Institute for Computer Science
The Problem(s) • Implementing blocking synchronization • Keep common case fast • Often no contention • User / kernel level split implementation • E.g. Linux, Solaris • Critical sections are short • Spinning v.s. blocking[Zahorjan et al, 1991] • Enlisting the scheduler • [Devi et al, 2006] • [Kontothanassis et al, 1997] Processors P1 T3 T2 P2 T5 Owner P3 T4 Block Anders Gidenstam, Max-Planck Institute for Computer Science
The Problem(s) • Implementing blocking synchronization • There can be concurrent operations – synchronization needed • Mutual exclusion? • A slow processor could force others to spin • Slow: e.g. page faults, interrupts, h/w problems Kernel partial solution: • spin lock + disable IRQ Processors P1 T3 T5 P2 Blocking Blocking P3 T4 Anders Gidenstam, Max-Planck Institute for Computer Science
More Related Work • Lock-free threading • Hood. R. Blumofe, D. Papadopoulos, 1998. • Lesser Bear. H. Oguma, Y. Nakayama, 2001. • Lock-free operating systems kernels • Synthesis. H. Massalin, 1992. • Cache Kernel. M. Greenwald, D. Cheriton, 1999. • A. Gavare, P. Tsigas, 2005. Anders Gidenstam, Max-Planck Institute for Computer Science
LFthreads – system overview Processors Ready Queue Running threads T2 T1 P1 T3 Waiting runnable threads P2 T5 T6 T4 P3 Synchronization object w. blocked threads Anders Gidenstam, Max-Planck Institute for Computer Science
Thread synchronization objects • Mutual exclusion object – mutex • Two states: unlocked / locked • Three operations for threads • lock(m) - Locks m. If m is already locked the thread is blocked. • trylock(m) - Tries to lock m. Returns false if m is already locked. • unlock(m) - Unlocks m. Anders Gidenstam, Max-Planck Institute for Computer Science
Mutex implementation • “Typical” mutex implementation • Note: The spin-lock protects the mutex_t record from concurrent updates by different processors • Operations cannot overlap • Processors might be forced to spin waiting for others type mutex_t isrecord state : enum (UNLOCKED, LOCKED); waiting : Queue of thread_t; slock : spin_lock_t; Anders Gidenstam, Max-Planck Institute for Computer Science
Lock-free mutex implementationThe Hand-off method type mutex_t isrecord count : integer; waiting : Lock-freeQueue of thread_t;hand-off : integer; lock(M) TA count 0 TX 0 Hand-off TY Anders Gidenstam, Max-Planck Institute for Computer Science
Lock-free mutex implementationThe Hand-off method type mutex_t isrecord count : integer; waiting : Lock-free Queue of thread_t; hand-off : integer; Increase Increase lock(M) TA count 1 TX 0 Hand-off TY Anders Gidenstam, Max-Planck Institute for Computer Science
Lock-free mutex implementationThe Hand-off method type mutex_t isrecord count : integer; waiting : Lock-free Queue of thread_t; hand-off : integer; Increase Increase lock(M) TA count 2 lock(M) TX 0 Hand-off TY Anders Gidenstam, Max-Planck Institute for Computer Science
Lock-free mutex implementationThe Hand-off method type mutex_t isrecord count : integer; waiting : Lock-free Queue of thread_t; hand-off : integer; Increase Enqueue lock(M) TA count 2 TX lock(M) TX 0 Hand-off TY Anders Gidenstam, Max-Planck Institute for Computer Science
Lock-free mutex implementationThe Hand-off method type mutex_t isrecord count : integer; waiting : Lock-free Queue of thread_t; hand-off : integer; lock(M) unlock(M) TA count 2 TX lock(M) TX 0 Hand-off TY Anders Gidenstam, Max-Planck Institute for Computer Science
Lock-free mutex implementationThe Hand-off method type mutex_t isrecord count : integer; waiting : Lock-free Queue of thread_t; hand-off : integer; lock(M) unlock(M) TA count 1 TX lock(M) TX 0 Hand-off TY Decrease Decrease Anders Gidenstam, Max-Planck Institute for Computer Science
Lock-free mutex implementationThe Hand-off method type mutex_t isrecord count : integer; waiting : Lock-free Queue of thread_t; hand-off : integer; lock(M) unlock(M) TA count 1 lock(M) TX 0 Hand-off TX TY Dequeue Enqueue on ready queue Dequeue and Enqueue on ready queue Anders Gidenstam, Max-Planck Institute for Computer Science
Lock-free mutex implementationThe Hand-off method type mutex_t isrecord count : integer; waiting : Lock-free Queue of thread_t; hand-off : integer; lock(M) unlock(M) TA count 1 lock(M) TX 0 Hand-off TX TY Dequeue and Enqueue on ready queue Dequeue Enqueue on ready queue Done! Anders Gidenstam, Max-Planck Institute for Computer Science
Lock-free mutex implementation:Slow lock (i) type mutex_t isrecord count : integer; waiting : Lock-free Queue of thread_t; hand-off : integer; lock(M) unlock(M) TA count 1 lock(M) TX 0 Hand-off TY Decrease Decrease Anders Gidenstam, Max-Planck Institute for Computer Science
Lock-free mutex implementation:Slow lock (i) type mutex_t isrecord count : integer; waiting : Lock-free Queue of thread_t; hand-off : integer; lock(M) unlock(M) TA count 1 lock(M) TX 0 Hand-off TY Decrease Uhu?! Empty? Anders Gidenstam, Max-Planck Institute for Computer Science
Lock-free mutex implementation:Slow lock (i) type mutex_t isrecord count : integer; waiting : Lock-free Queue of thread_t; hand-off : integer; lock(M) unlock(M) TA count 1 lock(M) TX TA Hand-off TY Set hand-off flag Decrease Anders Gidenstam, Max-Planck Institute for Computer Science
Lock-free mutex implementation:Slow lock (i) type mutex_t isrecord count : integer; waiting : Lock-free Queue of thread_t; hand-off : integer; lock(M) unlock(M) TA count 1 lock(M) TX TA Hand-off TY Set hand-off flag Still empty? Anders Gidenstam, Max-Planck Institute for Computer Science
Lock-free mutex implementation:Slow lock (i) type mutex_t isrecord count : integer; waiting : Lock-free Queue of thread_t; hand-off : integer; lock(M) unlock(M) TA count 1 lock(M) TX TA Hand-off TY Set hand-off flag Still empty? Done! Anders Gidenstam, Max-Planck Institute for Computer Science
Lock-free mutex implementation:Slow lock (ii) type mutex_t isrecord count : integer; waiting : Lock-free Queue of thread_t; hand-off : integer; lock(M) unlock(M) TA count 1 TX lock(M) TX TA Hand-off TY Set hand-off flag Still empty? Anders Gidenstam, Max-Planck Institute for Computer Science
Lock-free mutex implementation:Slow lock (ii) type mutex_t isrecord count : integer; waiting : Lock-free Queue of thread_t; hand-off : integer; lock(M) unlock(M) TA count 1 TX lock(M) TX TA Hand-off TY Set hand-off flag Still empty? Erase if unchanged Anders Gidenstam, Max-Planck Institute for Computer Science
Lock-free mutex implementation:Slow lock (ii) type mutex_t isrecord count : integer; waiting : Lock-free Queue of thread_t; hand-off : integer; lock(M) unlock(M) TA count 1 lock(M) TX 0 Hand-off TX TY Set hand-off flag Still empty? Erase if unchanged Dequeue and Enqueue on ready queue Dequeue Enqueue on ready queue Anders Gidenstam, Max-Planck Institute for Computer Science
Lock-free mutex implementation:Slow lock (ii) type mutex_t isrecord count : integer; waiting : Lock-free Queue of thread_t; hand-off : integer; lock(M) unlock(M) TA count 1 lock(M) TX 0 Hand-off TX TY Set hand-off flag Still empty? Erase if unchanged Dequeue and Enqueue on ready queue Dequeue Enqueue on ready queue Done! Anders Gidenstam, Max-Planck Institute for Computer Science
Lock-free mutex implementation:Slow lock (iii) type mutex_t isrecord count : integer; waiting : Lock-free Queue of thread_t; hand-off : integer; lock(M) unlock(M) TA count 1 TX lock(M) TX Z Hand-off TY Set hand-off flag Still empty? Erase if unchanged Anders Gidenstam, Max-Planck Institute for Computer Science
Lock-free mutex implementation:Slow lock (iii) type mutex_t isrecord count : integer; waiting : Lock-free Queue of thread_t; hand-off : integer; lock(M) unlock(M) TA count 1 TX lock(M) TX Z Hand-off TY Set hand-off flag Still empty? Erase if unchanged Done! Anders Gidenstam, Max-Planck Institute for Computer Science
Proof-of-concept implementation • LFthreads user-level thread library • Linux/IA32 platform • “Cloned” processes as virtual CPUs • Share the sameaddress space,file descriptor table etc • User-level thread contextsbased on • SUSv2 setcontext(2) / getcontext(2) • Non-preemptive scheduling (so far) • POSIX compliance a goal Virtual CPUs CPUs Threads VP1 T3 P1 Ready Queue VP2 T5 T2 T1 VP3 P2 VP4 T4 Kernel space User space Anders Gidenstam, Max-Planck Institute for Computer Science
Experimental evaluation • Micro benchmark • Threads competing for a critical section • High contention • Work: 1 -1 • Low contention • Work: 1 - 1000 • Configurations • LFthreads with 1, 2, 4, 8,16 virtual CPUs • lock-free mutex • spin-lock-based mutex • Platforms standard pthreads (2.6.x kernel) • 2x dual-core AMD Opteron processors Anders Gidenstam, Max-Planck Institute for Computer Science
Experimental evaluation (i) Anders Gidenstam, Max-Planck Institute for Computer Science
Experimental evaluation (i) Anders Gidenstam, Max-Planck Institute for Computer Science
Experimental evaluation (i) Anders Gidenstam, Max-Planck Institute for Computer Science
Experimental evaluation (i) Anders Gidenstam, Max-Planck Institute for Computer Science
Experimental evaluation (i) Anders Gidenstam, Max-Planck Institute for Computer Science
Experimental evaluation (i) Why? Anders Gidenstam, Max-Planck Institute for Computer Science
Experimental evaluation (i) Why? Remember spinning v.s. blocking? Anders Gidenstam, Max-Planck Institute for Computer Science
Experimental evaluation (ii) Anders Gidenstam, Max-Planck Institute for Computer Science
Experimental evaluation (ii) Anders Gidenstam, Max-Planck Institute for Computer Science
Experimental evaluation (ii) Anders Gidenstam, Max-Planck Institute for Computer Science
Experimental evaluation (ii) Anders Gidenstam, Max-Planck Institute for Computer Science
Experimental evaluation (ii) Anders Gidenstam, Max-Planck Institute for Computer Science
Experimental evaluation (ii) Anders Gidenstam, Max-Planck Institute for Computer Science
Open Problems • Condition variables • Seem to need atomic move thread operation. • Other synchronization objects • Semaphores, barriers, readers-writers locks, … • Signals • Support required by POSIX • Can interrupt blocking • Must be able to remove any thread control block in e.g. a waiting queue. • Thread cancellation • Blocking IO Anders Gidenstam, Max-Planck Institute for Computer Science