
“Locking without blocking” LFthreads: A lock-free thread library

“Locking without blocking” LFthreads: A lock-free thread library. Anders Gidenstam. Joint work with Marina Papatriantafilou. Distributed Computing and Systems group, Department of Computer Science and Engineering, Chalmers University of Technology, Göteborg, Sweden.


Presentation Transcript


  1. “Locking without blocking” LFthreads: A lock-free thread library • Anders Gidenstam • Joint work with: Marina Papatriantafilou • Distributed Computing and Systems group, Department of Computer Science and Engineering, Chalmers University of Technology, Göteborg, Sweden

  2. Outline • Introduction • Lock-free synchronization • The Problem & Background • LFthreads • Overview • Lock-free thread-blocking synchronization • Experiments • Conclusions Anders Gidenstam, Max-Planck Institute for Computer Science

  3. Synchronization on a shared object • Lock-free synchronization • Concurrent operations without enforcing mutual exclusion • Avoids: • Blocking (or busy waiting), convoy effects and priority inversion • Progress Guarantee • At least one operation always makes progress • Synchronization primitives • Built into CPU and memory system • Atomic read-modify-write (i.e. a critical section of one instruction) • Examples: Compare-and-Swap, Load-Linked / Store-Conditional
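The compare-and-swap primitive named on this slide is usually used in a retry loop. The following is a minimal sketch using C11 `<stdatomic.h>`; the function names `bounded_increment` and `cas_demo` are illustrative and do not come from the talk.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Lock-free "increment unless the limit is reached".
   No thread ever waits for another: a failed CAS only means that some
   other operation succeeded in the meantime. */
static bool bounded_increment(atomic_int *counter, int limit)
{
    int seen = atomic_load(counter);
    while (seen < limit) {
        /* On failure, `seen` is refreshed with the current value. */
        if (atomic_compare_exchange_weak(counter, &seen, seen + 1))
            return true;
    }
    return false;
}

/* Single-threaded walk-through: two increments succeed, the third is refused. */
static int cas_demo(void)
{
    atomic_int counter = 0;
    bool first  = bounded_increment(&counter, 2);
    bool second = bounded_increment(&counter, 2);
    bool third  = bounded_increment(&counter, 2);
    return (first && second && !third) ? atomic_load(&counter) : -1;
}
```

Because a CAS can only fail when another operation changed the counter, at least one operation always makes progress, which is the guarantee stated on the slide.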

  4-5. Correctness of a concurrent object • Desired semantics of a shared data object • Linearizability [Herlihy & Wing, 1990] • For each operation invocation there must be a single time instant during its duration at which the operation appears to take effect. • The observed effects should be consistent with a sequential execution of the operations in that order. • [Diagram: overlapping operations O1, O2, O3; the second frame shows them in the order O1, O3, O2]

  6. The Problem • Multithreading on a multiprocessor • Thread multiplexing and scheduling • Thread synchronization objects • Aim: The POSIX pthread API implemented in a lock-free way • Motivation: Why lock-free? • Processors should always be able to do useful work • In spite of others being slow: page faults, interrupts, h/w problems • Improved performance? • [Diagram: processors P1, P2, P3 with running threads T3, T5, T4; ready queue holding the waiting runnable threads T1, T2]

  7. The Problem(s) • Contention on the ready queue • Non-blocking work-stealing; Hood, [Blumofe et al, 1994], [Blumofe et al, 1998] • Lesser Bear, [Oguma et al, 2001] • In LFthreads: a lock-free queue • [Diagram: processors P1-P3 concurrently dequeueing from and enqueueing to the shared ready queue]

  8. The Problem(s) • Blocking synchronization • Often undesirable • Hinders progress • Expensive context switches • Sometimes required • Waiting for some event • Legacy applications • Goal in LFthreads: • Lock-free for processors • Resilience to slow operations/processors • [Diagram: thread T5 blocking on processor P2]

  9-10. The Problem(s) • Implementing blocking synchronization • Keep common case fast • Often no contention • User / kernel level split implementation • E.g. Linux, Solaris • Critical sections are short • Spinning vs. blocking [Zahorjan et al, 1991] • Enlisting the scheduler • [Devi et al, 2006] • [Kontothanassis et al, 1997] • [Diagram: thread T5 on processor P2 tries to acquire a lock whose owner is T2; T5 blocks]
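The "spinning vs. blocking" trade-off can be illustrated with a lock routine that spins a bounded number of times before it blocks. This is a generic sketch on top of the standard POSIX mutex calls, not the scheme used in LFthreads or in the cited papers; `spin_then_block_lock` and the spin limit are assumptions made for the example.

```c
#include <assert.h>
#include <pthread.h>

/* Try the lock up to max_spins times, then fall back to blocking.
   Returns the number of failed attempts before the lock was acquired. */
static int spin_then_block_lock(pthread_mutex_t *m, int max_spins)
{
    for (int i = 0; i < max_spins; i++) {
        if (pthread_mutex_trylock(m) == 0)
            return i;               /* acquired while spinning */
    }
    pthread_mutex_lock(m);          /* critical section is not short: block */
    return max_spins;
}

/* Uncontended case: the very first attempt succeeds. */
static int spin_then_block_demo(void)
{
    pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
    int failed_attempts = spin_then_block_lock(&m, 100);
    pthread_mutex_unlock(&m);
    pthread_mutex_destroy(&m);
    return failed_attempts;
}
```

When critical sections are short, the spin phase usually wins the lock and the cost of a context switch is avoided; the blocking fallback bounds the time wasted when the owner is slow.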

  11. The Problem(s) • Implementing blocking synchronization • There can be concurrent operations, so synchronization is needed • Mutual exclusion? • A slow processor could force others to spin • Slow: e.g. page faults, interrupts, h/w problems • Kernel partial solution: • spin lock + disable IRQ • [Diagram: threads blocking concurrently on processors P1 and P2]

  12. More Related Work • Lock-free threading • Hood. R. Blumofe, D. Papadopoulos, 1998. • Lesser Bear. H. Oguma, Y. Nakayama, 2001. • Lock-free operating systems kernels • Synthesis. H. Massalin, 1992. • Cache Kernel. M. Greenwald, D. Cheriton, 1999. • A. Gavare, P. Tsigas, 2005.

  13. LFthreads: system overview • [Diagram: processors P1-P3 with their running threads; the ready queue of waiting runnable threads; a synchronization object with blocked threads]

  14. Thread synchronization objects • Mutual exclusion object – mutex • Two states: unlocked / locked • Three operations for threads • lock(m) - Locks m. If m is already locked the thread is blocked. • trylock(m) - Tries to lock m. Returns false if m is already locked. • unlock(m) - Unlocks m.
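For reference, the three operations correspond to the standard POSIX calls `pthread_mutex_lock`, `pthread_mutex_trylock` and `pthread_mutex_unlock`. The short sketch below exercises them from a single thread; note that the POSIX `trylock` reports failure with the error code `EBUSY` rather than a boolean `false`. The function name `mutex_ops_demo` is illustrative.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

/* Returns 1 if the mutex behaved as described on the slide. */
static int mutex_ops_demo(void)
{
    pthread_mutex_t m;
    pthread_mutex_init(&m, NULL);           /* default mutex, initially unlocked */

    pthread_mutex_lock(&m);                 /* unlocked -> locked */
    int busy = pthread_mutex_trylock(&m);   /* already locked: fails, no blocking */
    pthread_mutex_unlock(&m);               /* locked -> unlocked */

    int ok = pthread_mutex_trylock(&m);     /* unlocked: succeeds with 0 */
    pthread_mutex_unlock(&m);
    pthread_mutex_destroy(&m);

    return busy == EBUSY && ok == 0;
}
```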

  15. Mutex implementation • “Typical” mutex implementation • type mutex_t is record state : enum (UNLOCKED, LOCKED); waiting : Queue of thread_t; slock : spin_lock_t; • Note: The spin-lock protects the mutex_t record from concurrent updates by different processors • Operations cannot overlap • Processors might be forced to spin waiting for others
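A rough C model of such a spin-lock-protected record is sketched below. It is an illustration under assumptions, not code from the talk: threads are represented by integer ids, "blocking" is modelled by queueing the id and returning `false`, the fixed-size queue has no overflow check, and all names (`spin_mutex_t`, `spin_mutex_lock`, ...) are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define MAX_WAITERS 8
#define NO_THREAD   (-1)

typedef struct {
    bool        locked;                 /* state: UNLOCKED / LOCKED */
    int         waiting[MAX_WAITERS];   /* FIFO of blocked thread ids */
    int         head, tail;
    atomic_flag slock;                  /* protects every field above */
} spin_mutex_t;

static void slock_acquire(atomic_flag *f)
{
    while (atomic_flag_test_and_set_explicit(f, memory_order_acquire))
        ;                               /* spin: a slow owner stalls everyone here */
}

static void slock_release(atomic_flag *f)
{
    atomic_flag_clear_explicit(f, memory_order_release);
}

/* Returns true if tid got the mutex, false if tid was queued (would block). */
static bool spin_mutex_lock(spin_mutex_t *m, int tid)
{
    slock_acquire(&m->slock);
    bool acquired = !m->locked;
    if (acquired)
        m->locked = true;
    else
        m->waiting[m->tail++ % MAX_WAITERS] = tid;
    slock_release(&m->slock);
    return acquired;
}

/* Returns the id of the waiter that now owns the mutex, or NO_THREAD. */
static int spin_mutex_unlock(spin_mutex_t *m)
{
    int next = NO_THREAD;
    slock_acquire(&m->slock);
    if (m->head != m->tail)
        next = m->waiting[m->head++ % MAX_WAITERS];   /* stays locked, new owner */
    else
        m->locked = false;
    slock_release(&m->slock);
    return next;
}

static bool spin_mutex_demo(void)
{
    spin_mutex_t m = { .slock = ATOMIC_FLAG_INIT };
    bool t1 = spin_mutex_lock(&m, 1);       /* acquires */
    bool t2 = spin_mutex_lock(&m, 2);       /* queued */
    int woken = spin_mutex_unlock(&m);      /* hands the mutex to thread 2 */
    int none  = spin_mutex_unlock(&m);      /* no waiters left */
    return t1 && !t2 && woken == 2 && none == NO_THREAD && !m.locked;
}
```

Every operation passes through `slock_acquire`, so a processor that is delayed while holding `slock` forces all others to spin; this is the weakness the lock-free design aims to remove.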

  16. Lock-free mutex implementation: The Hand-off method • type mutex_t is record count : integer; waiting : Lock-free Queue of thread_t; hand-off : integer; • [Animation frame, threads TA, TX and TY shown: TA calls lock(M); count = 0, hand-off = 0, waiting queue empty]

  17. The Hand-off method • [Animation frame: TA increases count; count = 1]

  18. The Hand-off method • [Animation frame: TX calls lock(M) and increases count; count = 2]

  19. The Hand-off method • [Animation frame: TX is enqueued on the waiting queue; count = 2]

  20. The Hand-off method • [Animation frame: TA calls unlock(M); count = 2, TX in the waiting queue]

  21. The Hand-off method • [Animation frame: TA decreases count; count = 1]

  22. The Hand-off method • [Animation frame: TA dequeues TX from the waiting queue and enqueues it on the ready queue]

  23. The Hand-off method • [Animation frame: Done!]
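The common case walked through in slides 16-23 can be replayed with a small sequential model. This is a sketch of one reading of the animation, not the LFthreads source: `count` is assumed to be updated with atomic fetch-and-add, the `fifo_t` queue is a plain array standing in for the lock-free queue (it is not itself lock-free), the hand-off field and the empty-queue case are left out, and all identifiers are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>

#define QCAP      8
#define NO_THREAD 0                 /* thread ids are positive integers */

/* Stand-in FIFO; NOT lock-free, used only to replay the slides sequentially. */
typedef struct { int items[QCAP]; int head, tail; } fifo_t;

static void fifo_put(fifo_t *q, int tid) { q->items[q->tail++ % QCAP] = tid; }
static int  fifo_get(fifo_t *q)
{
    return q->head == q->tail ? NO_THREAD : q->items[q->head++ % QCAP];
}

typedef struct {
    atomic_int count;               /* rises on lock(M), falls on unlock(M) */
    fifo_t     waiting;
} lf_mutex_t;

static fifo_t ready_queue;

/* Returns 1 if the caller got the mutex at once, 0 if it was queued. */
static int lf_lock(lf_mutex_t *m, int self)
{
    if (atomic_fetch_add(&m->count, 1) == 0)
        return 1;                   /* count was 0: mutex was free */
    fifo_put(&m->waiting, self);    /* mutex taken: wait for a hand-off */
    return 0;
}

static void lf_unlock(lf_mutex_t *m)
{
    if (atomic_fetch_sub(&m->count, 1) == 1)
        return;                     /* nobody else wanted the mutex */
    int next = fifo_get(&m->waiting);
    if (next != NO_THREAD)
        fifo_put(&ready_queue, next);   /* hand the mutex to the waiter */
    /* An empty queue here is the "slow lock" case, omitted in this sketch. */
}

static int handoff_demo(void)
{
    enum { TA = 1, TX = 2 };
    lf_mutex_t m = {0};
    ready_queue = (fifo_t){0};

    int a = lf_lock(&m, TA);        /* slides 16-17: count 0 -> 1 */
    int x = lf_lock(&m, TX);        /* slides 18-19: count 1 -> 2, TX queued */
    lf_unlock(&m);                  /* slides 20-23: count 2 -> 1, TX made ready */

    return a == 1 && x == 0 && atomic_load(&m.count) == 1
        && fifo_get(&ready_queue) == TX;
}
```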

  24. Lock-free mutex implementation: Slow lock (i) • type mutex_t is record count : integer; waiting : Lock-free Queue of thread_t; hand-off : integer; • [Animation frame: TX's lock(M) is in progress but TX is not yet in the waiting queue; TA calls unlock(M) and decreases count; count = 1, hand-off = 0]

  25. Slow lock (i) • [Animation frame: TA finds the waiting queue empty: “Uhu?! Empty?”]

  26. Slow lock (i) • [Animation frame: TA sets the hand-off flag; hand-off = TA]

  27. Slow lock (i) • [Animation frame: TA checks the waiting queue again: still empty?]

  28. Slow lock (i) • [Animation frame: the queue is still empty. Done!]

  29. Lock-free mutex implementation: Slow lock (ii) • type mutex_t is record count : integer; waiting : Lock-free Queue of thread_t; hand-off : integer; • [Animation frame: TA has set the hand-off flag (hand-off = TA); when it checks again, TX is in the waiting queue]

  30. Slow lock (ii) • [Animation frame: TA erases the hand-off flag if it is unchanged]

  31. Slow lock (ii) • [Animation frame: hand-off = 0; TA dequeues TX and enqueues it on the ready queue]

  32. Slow lock (ii) • [Animation frame: Done!]

  33. Lock-free mutex implementation: Slow lock (iii) • type mutex_t is record count : integer; waiting : Lock-free Queue of thread_t; hand-off : integer; • [Animation frame: TX is in the waiting queue, but hand-off now holds the value Z instead of TA, so the “erase if unchanged” step finds the flag changed]

  34. Slow lock (iii) • [Animation frame: Done!]
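The three "slow lock" outcomes can be modelled from the unlocking thread's side as two steps: set the hand-off flag, then re-check the queue and erase the flag only if it is unchanged. The sketch below is a sequential model built from the slide captions alone. The split into `unlock_set_flag` and `unlock_recheck`, the use of a compare-and-swap for "erase if unchanged", and the array-based `fifo_t` stand-in (not lock-free) are assumptions, and the steps taken by the locking thread are not modelled.

```c
#include <assert.h>
#include <stdatomic.h>

#define QCAP      8
#define NO_THREAD 0                 /* thread ids are positive integers */

/* Stand-in FIFO for the waiting queue; NOT lock-free, illustration only. */
typedef struct { int items[QCAP]; int head, tail; } fifo_t;

static int  fifo_empty(const fifo_t *q) { return q->head == q->tail; }
static void fifo_put(fifo_t *q, int tid) { q->items[q->tail++ % QCAP] = tid; }
static int  fifo_get(fifo_t *q)
{
    return fifo_empty(q) ? NO_THREAD : q->items[q->head++ % QCAP];
}

typedef struct {
    atomic_int hand_off;            /* 0 when no hand-off is pending */
    fifo_t     waiting;
} lf_mutex_t;

/* Step 1: the unlocker found the waiting queue empty and sets the flag. */
static void unlock_set_flag(lf_mutex_t *m, int self)
{
    atomic_store(&m->hand_off, self);
}

/* Step 2: re-check. Returns the thread to put on the ready queue, if any. */
static int unlock_recheck(lf_mutex_t *m, int self)
{
    if (fifo_empty(&m->waiting))
        return NO_THREAD;           /* case (i): still empty, flag stays set */
    int expected = self;
    if (!atomic_compare_exchange_strong(&m->hand_off, &expected, NO_THREAD))
        return NO_THREAD;           /* case (iii): flag changed, nothing to do */
    return fifo_get(&m->waiting);   /* case (ii): flag erased, wake the waiter */
}

/* Replays the three cases; each passing case sets one bit of the result. */
static int slow_lock_demo(void)
{
    enum { TA = 1, TX = 2, Z = 3 };
    int passed = 0;

    lf_mutex_t a = {0};                         /* case (i) */
    unlock_set_flag(&a, TA);
    if (unlock_recheck(&a, TA) == NO_THREAD && atomic_load(&a.hand_off) == TA)
        passed |= 1;

    lf_mutex_t b = {0};                         /* case (ii) */
    unlock_set_flag(&b, TA);
    fifo_put(&b.waiting, TX);                   /* TX arrives after the flag is set */
    if (unlock_recheck(&b, TA) == TX && atomic_load(&b.hand_off) == NO_THREAD)
        passed |= 2;

    lf_mutex_t c = {0};                         /* case (iii) */
    unlock_set_flag(&c, TA);
    fifo_put(&c.waiting, TX);
    atomic_store(&c.hand_off, Z);               /* flag changed behind TA's back */
    if (unlock_recheck(&c, TA) == NO_THREAD && !fifo_empty(&c.waiting))
        passed |= 4;

    return passed;
}
```

In none of the three cases does the unlocking thread wait for the slow locker, which is the point of the hand-off flag.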

  35. Proof-of-concept implementation • LFthreads user-level thread library • Linux/IA32 platform • “Cloned” processes as virtual CPUs • Share the same address space, file descriptor table etc. • User-level thread contexts based on • SUSv2 setcontext(2) / getcontext(2) • Non-preemptive scheduling (so far) • POSIX compliance a goal • [Diagram: user-space threads and the ready queue multiplexed onto virtual CPUs VP1-VP4, which the kernel schedules on CPUs P1 and P2]
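The context-switching mechanism mentioned here can be tried in isolation. The sketch below uses the `<ucontext.h>` family (`getcontext`, `makecontext`, `swapcontext`) as provided by glibc on Linux to run one user-level thread that yields once, without preemption. It is a generic illustration of the API, not LFthreads code; the names `worker` and `context_demo` and the stack size are choices made for the example.

```c
#include <assert.h>
#include <ucontext.h>

static ucontext_t scheduler_ctx, thread_ctx;
static char thread_stack[64 * 1024];
static int progress;                /* how far the user-level thread has got */

static void worker(void)
{
    progress = 1;
    swapcontext(&thread_ctx, &scheduler_ctx);   /* voluntary yield */
    progress = 2;
    /* Returning resumes uc_link, i.e. the scheduler context. */
}

static int context_demo(void)
{
    progress = 0;
    getcontext(&thread_ctx);                    /* initialise before makecontext */
    thread_ctx.uc_stack.ss_sp   = thread_stack;
    thread_ctx.uc_stack.ss_size = sizeof thread_stack;
    thread_ctx.uc_link          = &scheduler_ctx;
    makecontext(&thread_ctx, worker, 0);

    swapcontext(&scheduler_ctx, &thread_ctx);   /* run the thread until it yields */
    int after_first_switch = progress;

    swapcontext(&scheduler_ctx, &thread_ctx);   /* resume it; it runs to completion */
    return after_first_switch * 10 + progress;
}
```

The thread only gives up the processor at the explicit `swapcontext` call, which matches the non-preemptive scheduling noted on the slide.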

  36. Experimental evaluation • Micro benchmark • Threads competing for a critical section • High contention • Work: 1 - 1 • Low contention • Work: 1 - 1000 • Configurations • LFthreads with 1, 2, 4, 8, 16 virtual CPUs • lock-free mutex • spin-lock-based mutex • standard pthreads • Platform • 2.6.x kernel • 2x dual-core AMD Opteron processors

  37-43. Experimental evaluation (i) • [Result plots not preserved in this transcript] • Why? • Remember spinning vs. blocking?

  44-49. Experimental evaluation (ii) • [Result plots not preserved in this transcript]

  50. Open Problems • Condition variables • Seem to need an atomic “move thread” operation. • Other synchronization objects • Semaphores, barriers, readers-writers locks, … • Signals • Support required by POSIX • Can interrupt blocking • Must be able to remove any thread control block in e.g. a waiting queue. • Thread cancellation • Blocking IO
