

1. Why The Grass May Not Be Greener On The Other Side: A Comparison of Locking vs. Transactional Memory
By McKenney, Michael, Triplett and Walpole

2. Agenda
• Locking Critique
• TM Critique
• Need for a combined approach

3. Locking
A simple approach based on mutual exclusion: allow only a single CPU at a time to manipulate a given set of shared objects.
Lock granularity determines scalability: partition the shared data and protect each partition with a separate lock. This allows greater concurrency but also creates problems.
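To make the partitioning idea concrete, here is a minimal sketch (not taken from the paper; all names are illustrative) of a hash table whose buckets are each protected by their own pthread mutex, so the lock granularity is a single bucket rather than the whole table:

/* Per-bucket locking: updates to different buckets proceed concurrently. */
#include <pthread.h>
#include <stddef.h>

#define NBUCKETS 64

struct node {
    unsigned long key;
    struct node *next;
};

struct hashtable {
    pthread_mutex_t lock[NBUCKETS];   /* one lock per partition */
    struct node *bucket[NBUCKETS];
};

static void ht_init(struct hashtable *ht)
{
    for (int i = 0; i < NBUCKETS; i++) {
        pthread_mutex_init(&ht->lock[i], NULL);
        ht->bucket[i] = NULL;
    }
}

static void ht_insert(struct hashtable *ht, struct node *n)
{
    unsigned long b = n->key % NBUCKETS;

    pthread_mutex_lock(&ht->lock[b]);   /* only this bucket is serialized */
    n->next = ht->bucket[b];
    ht->bucket[b] = n;
    pthread_mutex_unlock(&ht->lock[b]);
}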

4. Locking Strengths
• Can be used on existing commodity hardware.
• Standardized, well-defined locking APIs: e.g. the POSIX pthread API allows lock-based code to run on multiple platforms.
• Contention effects are concentrated within locking primitives, allowing critical sections to run at full speed.
• Locking can protect a wide range of operations, including non-idempotent operations such as I/O.
• Waiting on a lock minimally degrades performance of the rest of the system.
• Interacts naturally with a variety of synchronization mechanisms, including reference counting, atomic operations, non-blocking synchronization, and RCU.
• Interacts in a natural manner with debuggers.
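As a small hedged illustration of the I/O point (names are illustrative, not from the paper): a mutex can protect a non-idempotent operation such as appending to a shared log, because the critical section runs exactly once, whereas a transaction that may be rolled back and re-executed cannot guarantee that.

#include <pthread.h>
#include <stdio.h>

static pthread_mutex_t log_lock = PTHREAD_MUTEX_INITIALIZER;

void log_event(FILE *log, const char *msg)
{
    pthread_mutex_lock(&log_lock);
    fprintf(log, "%s\n", msg);     /* non-idempotent I/O inside the critical section */
    fflush(log);
    pthread_mutex_unlock(&log_lock);
}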

5. Locking Weaknesses
Since lock granularity determines scalability, the shared data is partitioned and each partition protected with a separate lock. While this increases concurrency, it also creates problems:
• Loss of modularity: a caller needs to know which locks other modules use before calling them, in order to avoid self-deadlock.
• Multiple threads may need to acquire the same set of locks. Acquiring them in different orders can cause deadlock.
• Self-deadlock can result if an interrupt is received while a thread holds a lock and the interrupt handler also needs that lock.
• Lack of composability: operations may be thread-safe individually but not when composed, e.g. deleting an item from one hash table and inserting it into another. The intermediate state (the item is in neither hash table) is visible to other threads.
• Some data structures, such as unstructured graphs, are difficult to partition. One may have to settle for coarser locks, leading to high contention and reduced scalability.
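The composability problem can be sketched by reusing the per-bucket table above. Assuming a thread-safe ht_delete() (hypothetical, definition not shown) alongside ht_insert(), calling the two thread-safe operations back to back does not yield an atomic move:

/* Assumed: ht_delete() removes and returns the node for key, taking the
 * relevant bucket lock internally (not shown); ht_insert() is the
 * per-bucket-locked insert sketched after slide 3. */
struct node *ht_delete(struct hashtable *ht, unsigned long key);

void move_item_racy(struct hashtable *src, struct hashtable *dst,
                    unsigned long key)
{
    struct node *n = ht_delete(src, key);  /* src bucket locked, then released */
    /* window: other threads can observe the item in neither table */
    if (n)
        ht_insert(dst, n);                 /* dst bucket locked, then released */
}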

6. Locking Weaknesses
• Priority inversion can cause a high-priority thread to miss its real-time scheduling deadline, which is unacceptable in safety-critical systems.
• Non-deterministic lock-acquisition latency is a problem for real-time workloads.
• Locking uses expensive instructions and creates high synchronization overhead even at low levels of contention; this is worse with fine-grained locking.
• Locking introduces communication-related cache misses into read-mostly workloads that would otherwise run entirely within the CPU cache.
• Indefinite blocking due to termination of the lock holder creates problems for fault-tolerant software.
• Convoying: preemption or blocking (due to I/O, a page fault, etc.) of the lock holder can block other threads.

7. Solutions to Locking Problems
Priority inversion
• Lower-priority threads temporarily inherit the priority of a blocked high-priority thread.
• The lock holder is assigned the priority of the highest-priority task that might acquire that lock.
• Preemption is disabled entirely while locks are held.
Deadlock
• Require a clear locking hierarchy: when multiple locks are acquired, they are acquired in a pre-specified order.
• If a lock is not available, the thread surrenders its conflicting locks and retries.
• Detect deadlock and break the cycle by terminating selected threads based on priority, work done, etc.
• Track lock acquisition, dynamically detect potential deadlock, and prevent it.
Self-deadlock
• Disable interrupts while the lock is held.
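A minimal sketch of the locking-hierarchy remedy (illustrative, reusing the per-bucket table from earlier): when two bucket locks must be held at once, always acquire them in a fixed order, here by address, so that no two threads can wait on each other in a cycle. Holding both locks, the move from the previous sketch no longer exposes the "in neither table" state.

#include <pthread.h>

static void lock_pair_ordered(pthread_mutex_t *a, pthread_mutex_t *b)
{
    if (a == b) {                 /* same bucket: take the lock only once */
        pthread_mutex_lock(a);
    } else if (a < b) {           /* fixed address order prevents deadlock */
        pthread_mutex_lock(a);
        pthread_mutex_lock(b);
    } else {
        pthread_mutex_lock(b);
        pthread_mutex_lock(a);
    }
}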

8. Solutions to Locking Problems
Non-partitionable data structures
• Redesign to use partitionable data structures such as hash tables.
• In read-mostly situations, locked updates may be paired with read-copy-update (RCU) or hazard pointers.
Convoying
• Use scheduler-conscious synchronization.
• But this does not help the case of the lock holder terminating.
Non-deterministic lock-acquisition latency
• Use RCU for read-side critical sections.
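The pairing of locked updates with RCU readers might look like the following Linux-kernel-style sketch (kernel primitives such as rcu_read_lock() and synchronize_rcu() are assumed; the config structure is illustrative, not from the paper). Readers take no locks and incur no lock-acquisition latency, while a conventional spinlock still serializes updaters.

struct config {
    int threshold;
};

static struct config *cur_config;       /* RCU-protected pointer */
static DEFINE_SPINLOCK(config_lock);    /* serializes updaters only */

int read_threshold(void)
{
    int t;

    rcu_read_lock();                    /* read-side critical section, no lock */
    t = rcu_dereference(cur_config)->threshold;
    rcu_read_unlock();
    return t;
}

void update_config(struct config *newc)
{
    struct config *old;

    spin_lock(&config_lock);            /* writers still use a lock */
    old = cur_config;
    rcu_assign_pointer(cur_config, newc);
    spin_unlock(&config_lock);

    synchronize_rcu();                  /* wait for pre-existing readers */
    kfree(old);
}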

9. Transactional Memory
An approach borrowed from DBMSs:
• A programmer delimits the regions of code that access shared data.
• The TM system executes these regions atomically and in isolation.
Mechanism:
• Updates made during the transaction are buffered.
• Validation checks that isolation was not violated by a conflict.
• If validation passes, the updates are committed.
• Otherwise the updates are discarded and the transaction is retried.
TM is a non-blocking synchronization mechanism: at least one thread will succeed. This optimistic approach performs well when critical regions do not interfere with each other.
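To show what "delimiting the regions" looks like in practice, here is a minimal sketch using GCC's experimental transactional-memory language extension (compiled with -fgnu-tm); the TM runtime buffers the updates, validates them, and commits or retries the region transparently. The account type and transfer function are illustrative, not from the paper.

struct account { long balance; };

void transfer(struct account *from, struct account *to, long amount)
{
    __transaction_atomic {
        from->balance -= amount;    /* both updates appear atomic to other threads */
        to->balance   += amount;
    }   /* commit; on conflict the region is rolled back and re-executed */
}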

10. HW Transactional Memory
Hardware TM:
• New instructions (LT, LTX, ST, Abort, Commit, Validate).
• A fully associative transactional cache for buffering updates.
• Straightforward extensions to the multiprocessor cache-coherence protocol to detect transaction conflicts.
Drawbacks:
• Portability: needs special hardware.
• The size of a transaction is limited by the transactional cache; overflow of the transactional cache is addressed by virtualization in newer implementations.
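The slide describes the instructions proposed by Herlihy and Moss; a later commodity analogue is Intel TSX/RTM, and the following sketch (using the immintrin.h intrinsics, compiled with -mrtm) shows the resulting try/commit/abort structure. The counter example is illustrative, and a real deployment also needs a non-transactional fallback path, as in the hybrid sketch near the end of this deck.

#include <immintrin.h>

int htm_increment(volatile long *counter)
{
    unsigned status = _xbegin();        /* start a hardware transaction */
    if (status == _XBEGIN_STARTED) {
        (*counter)++;                   /* buffered/tracked by the cache hardware */
        _xend();                        /* commit */
        return 1;
    }
    return 0;                           /* aborted: caller must fall back */
}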

11. SW Transactional Memory
Software TM:
• Revocable two-phase locking for writes: a transaction locks all objects that it writes and does not release these locks until the transaction terminates. If deadlock occurs, one transaction aborts, releasing its locks and reverting its writes.
• Optimistic concurrency control for reads: whenever a transaction reads from an object, it logs the version it read. When the transaction commits, it verifies that these are still the current versions of the objects.
Drawbacks: poor performance compared to locking, due to
• atomic operations for acquiring shared-object handles,
• the cost of consistency validation,
• the effect of shared-object metadata on the cache,
• dynamic allocation, data copying, and memory reclamation.
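The optimistic-read mechanism can be sketched with a toy read log; this is not any particular STM's API, just an illustration (all names hypothetical) of logging object versions at read time and re-checking them at commit time.

#include <stdbool.h>
#include <stddef.h>

struct stm_object { unsigned long version; /* object data follows */ };

struct read_entry { struct stm_object *obj; unsigned long seen; };

struct read_log { struct read_entry entry[64]; size_t n; };

static void log_read(struct read_log *log, struct stm_object *obj)
{
    log->entry[log->n].obj  = obj;
    log->entry[log->n].seen = obj->version;   /* remember the version we read */
    log->n++;
}

static bool validate_reads(const struct read_log *log)
{
    for (size_t i = 0; i < log->n; i++)
        if (log->entry[i].obj->version != log->entry[i].seen)
            return false;                     /* conflict: abort and retry */
    return true;                              /* all reads still current: commit */
}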

12. TM's Strengths
• Provides performance and scalability by allowing multiple non-interfering threads to execute concurrently in a critical section; attains the benefits of fine-grained locking without the effort and complexity.
• Non-blocking: at least one transaction succeeds.
• Fault tolerance: failure of one transaction will not affect others.
• For multi-word objects, requires fewer memory accesses than locks, since there is no explicit lock variable.
• Can be used with difficult-to-partition data structures such as unstructured graphs.
• Can exploit concurrency where locks cannot: e.g. an enqueue at the head and a dequeue at the tail can typically proceed concurrently, except when the queue is empty, in which case both must update both the head and the tail.
• Modular and composable: transactions may be nested or composed.

13. TM's Weaknesses
• The performance of transactions might suffer from excessive restarts along high-contention access paths to particular data structures.
• When transactions collide, only one can proceed; the others must be rolled back. This can result in:
  • starvation of large transactions by smaller ones,
  • delay of a high-priority thread via rollback of its transactions due to conflicts with those of a lower-priority thread.
• Cannot be used with non-idempotent operations such as I/O, due to the possibility of restarts.
• Earlier slides covered the drawbacks specific to HTM and STM.
• Certain STM optimizations can result in allowing concurrent access to privatized data.

14. TM's Weaknesses (cont.)
Certain STM optimizations can result in allowing concurrent access to privatized data.

15. TM's Weaknesses (cont.)
• Cannot be used with non-idempotent operations such as I/O, due to the possibility of restarts:
  • a client cannot defer sending a message until commit if the transaction depends on the server's reply.

16. Solutions to TM's Problems
I/O
• Buffered I/O might be addressed by including the buffering mechanism within the scope of the transactions doing I/O.
• Inevitable transactions, which always commit, can contain non-idempotent operations; however, there can be at most one of these at a time.
Contention management
• Carefully select the transactions to roll back, based on priority, amount of work done, etc.
• Convert read-only transactions to a non-transactional form, in a manner similar to the pairing of locking with RCU.
Portability and performance
• Use HTM when applicable, but fall back to STM otherwise.
• Reduce STM's overheads of indirection, dynamic allocation, data copying, and memory reclamation by relaxing the non-blocking property.

17. Combined Approach
• Transactions perform well when critical regions do not interfere with each other, while locks usually perform better for highly contended critical sections.
• Use locks for partitionable data that can be assigned to different CPUs.
• Use RCU or hazard pointers for read-heavy workloads.
• Use TM for:
  • update-heavy workloads using large non-partitionable data structures,
  • atomic operations spanning multiple data structures.
• TM should be made easily usable with locking, so that the best approach can be used in each situation.
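One widely used way of making TM and locking coexist is hardware lock elision: try the critical section as a hardware transaction while subscribing to a fallback lock, and take the lock only on abort. The sketch below (Intel RTM intrinsics plus a C11 atomic flag standing in for the lock; all names illustrative) is a minimal illustration of that pattern under those assumptions, not the paper's own proposal.

#include <immintrin.h>
#include <stdatomic.h>

static atomic_int fallback_lock;           /* 0 = free, 1 = held */

void critical_section_hybrid(long *a, long *b)
{
    unsigned status = _xbegin();
    if (status == _XBEGIN_STARTED) {
        if (atomic_load(&fallback_lock))   /* a lock holder is active: abort */
            _xabort(0xff);
        *a += 1;                           /* the critical section itself */
        *b -= 1;
        _xend();                           /* commit */
        return;
    }
    /* Fallback: acquire the lock and run the same code under mutual exclusion.
     * Taking the lock also aborts any in-flight transactions that read it. */
    while (atomic_exchange(&fallback_lock, 1))
        ;                                  /* spin until the lock is free */
    *a += 1;
    *b -= 1;
    atomic_store(&fallback_lock, 0);
}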
