Toward High Performance Nonblocking Software Transactional Memory

Virendra J. Marathe (University of Rochester) and Mark Moir (Sun Microsystems Labs)

Nonblocking Progress & Transactional Memory • Nonblocking Progress – arbitrary delays in some threads do not prevent others from making forward progress • TM research began as support for nonblocking concurrent algorithms [Herlihy & Moss, ISCA ’93] • Early software TMs (STMs) were nonblocking, but slow • Recent shift toward blocking STMs • Significant performance improvements • General argument – nonblocking STMs are fundamentally slow • We were not convinced

Agenda • Why is nonblocking progress important? • Background on STM Implementations • What makes nonblocking STMs slow? • Making nonblocking STMs fast • Experimental Results • Conclusions

The Virtues of Nonblocking Progress • Tolerance of arbitrary delays due to • Preemption • Page faults • Thread faults • External scheduler support mitigates some problems, but is not portable • Ideally contain the problem within the STM • Environments where blocking is unacceptable • TxLinux interrupt handler transactions

STM Implementations • Transactions execute speculatively • Reads and writes use STM metadata • Speculative writes typically acquire ownership of locations (using atomic operations, e.g. CAS) • Reads are typically logged in a private read set for validation at commit time • Post-commit/abort cleanup • Make speculative updates non-speculative, or roll back speculative updates • Release ownership of locations • Note: this cleanup phase is what forces waiting in blocking STMs

STM Implementations • Two types of implementations for speculative writes: • Redo Log – • writes made to private buffer, • and flushed out on commit • ownership acquisition can be done at first write (eager acquire) or commit time (lazy acquire) • Undo Log – • writes are made directly to memory (need eager acquire), • old values are logged in a private buffer, and • old values are restored in case of an abort • Read set validation to ensure isolation • Several schemes (e.g. incremental, commit counter, timestamp, etc.)
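The difference between the two logging styles can be sketched in a few lines of C. This is a minimal single-threaded illustration, not code from the talk: the names `tx_log`, `redo_write`, `redo_read`, `redo_commit`, `undo_write`, and `undo_abort` are hypothetical, and ownership acquisition and validation are left out entirely.

```c
#include <assert.h>
#include <stddef.h>

#define LOG_CAP 16  /* assumed fixed capacity, for illustration only */

/* One logged (address, value) pair. */
typedef struct { long *addr; long val; } log_entry;

typedef struct {
    log_entry entries[LOG_CAP];
    size_t    n;
} tx_log;

/* Redo log: buffer the new value privately; memory is untouched until commit. */
void redo_write(tx_log *log, long *addr, long val) {
    for (size_t i = 0; i < log->n; i++) {
        if (log->entries[i].addr == addr) { log->entries[i].val = val; return; }
    }
    assert(log->n < LOG_CAP);
    log->entries[log->n++] = (log_entry){ addr, val };
}

/* Redo log: a transaction must see its own buffered writes. */
long redo_read(const tx_log *log, const long *addr) {
    for (size_t i = 0; i < log->n; i++)
        if (log->entries[i].addr == addr) return log->entries[i].val;
    return *addr;
}

/* Redo log commit: flush buffered values to memory. An abort just drops the log. */
void redo_commit(tx_log *log) {
    for (size_t i = 0; i < log->n; i++)
        *log->entries[i].addr = log->entries[i].val;
    log->n = 0;
}

/* Undo log: save the old value, then write directly to memory. */
void undo_write(tx_log *log, long *addr, long val) {
    assert(log->n < LOG_CAP);
    log->entries[log->n++] = (log_entry){ addr, *addr };
    *addr = val;
}

/* Undo log abort: restore old values in reverse order. A commit just drops the log. */
void undo_abort(tx_log *log) {
    while (log->n > 0) {
        log->n--;
        *log->entries[log->n].addr = log->entries[log->n].val;
    }
}
```

Note the asymmetry: the redo log pays on reads (a lookup in the private buffer) and on commit (the flush), while the undo log pays on abort (the restore) and needs ownership before the first in-place write.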

What makes nonblocking STMs slow? • In Blocking STMs • Transaction waits for a conflicting transaction in its post-commit/abort cleanup phase • Nonblocking STMs avoid waiting with • Indirection (object-based STMs) • Copying and Cloning • Helping • Stealing (Harris & Fraser; also our approach) • These usually lead to overheads in the (contention-free) common case

What makes blocking STMs fast? • Significantly less overhead in the common case • Simple metadata structure • Streamlined fast path • Performance optimizations • Timestamp based validation • We need to incorporate all these features in a nonblocking STM to make it competitive

Our Contributions • Keep the common case simple • Resort to complicated case only when cleanup is delayed • More streamlined common case execution path • Incorporate recent optimizations (timestamp based validation)

STM Data Structures • Word-based STM • Conflict detection at granularity of contiguous blocks of memory • Appropriate for unmanaged languages – C, C++ • A table of ownership records (orecs) • Each heap location hashes into a single orec • Each orec indicates if currently owned or free, and identifies the owner • Transaction Descriptor • Read set • Write set (redo log) – a 2D list, each row corresponds to an acquired orec • Status – Active/Aborted/Committed
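The structures listed on this slide might be laid out roughly as follows. This is a sketch under stated assumptions: the table size, the 8-byte block granularity, the fixed array capacities, and every identifier (`orec_t`, `tx_desc`, `write_row`, `addr_to_orec`, `orec_is_free`) are hypothetical, and the real orec also carries flags and a version number that are omitted here.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_ORECS   1024u  /* assumed table size (power of two) */
#define BLOCK_SHIFT 3u     /* assumed 8-byte conflict-detection blocks */

typedef enum { TX_ACTIVE, TX_ABORTED, TX_COMMITTED } tx_status;

struct tx_desc;

/* Ownership record: NULL means free, otherwise it identifies the owner. */
typedef struct {
    _Atomic(struct tx_desc *) owner;
} orec_t;

typedef struct { uintptr_t *addr; uintptr_t val; } write_entry;

/* One row of the 2D write set per acquired orec. */
typedef struct {
    orec_t     *orec;
    write_entry entries[8];
    size_t      n;
} write_row;

/* Transaction descriptor: read set, write set (redo log), and status. */
typedef struct tx_desc {
    _Atomic tx_status status;
    orec_t   *read_set[64];
    size_t    n_reads;
    write_row write_set[16];
    size_t    n_rows;
} tx_desc;

orec_t orec_table[NUM_ORECS];

/* Every address in the same block hashes to the same single orec. */
orec_t *addr_to_orec(const void *addr) {
    uintptr_t block = (uintptr_t)addr >> BLOCK_SHIFT;
    return &orec_table[block & (NUM_ORECS - 1u)];
}

int orec_is_free(orec_t *o) {
    return atomic_load(&o->owner) == NULL;
}
```

Because the mapping is a hash, unrelated locations can share an orec and conflict falsely; a larger table trades memory for fewer such collisions.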

Common Case Execution • Algorithm behaves like a blocking STM in the absence of contention • Log reads, writes of transaction • Acquire ownership of write set locations via their orecs • Ensure that reads are still consistent (read set validation) • Flush out updates after commit/abort • Release orecs

Uncommon Case: Stealing • Two flags in the orec for the stealing process • stolen_orec: for orec’s stolen/unstolen state • copier_exists: indicates if there exists an owner in cleanup phase

[Figure: Stealing Example. Diagram of the shared heap, the ownership records (orecs o1–o5, each with ID, flags S and C, and ver#), and three write sets: owner T1 (COMMITTED, locX:11), stealer 1 T2 (ACTIVE, then COMMITTED; locX:11, locX:12), and stealer 2 T3 (ACTIVE, locX:12). Legend: copyback complete, copyback in progress, locX’s logical value.]

Stealing Complexity • Stealing mechanism quite complex • Several corner case race conditions need to be handled (read the paper for further details) • Overhead of accessing stolen locations is quite high, requiring a lookup in the last stealer’s write set • However, we can throttle stealing and make it an uncommon case

Streamlining Common Case • To release acquired orecs, prior nonblocking STMs required • Expensive synchronization instructions (e.g. CAS) • Indirection & garbage collection • Blocking STMs use a plain store instruction • So do we (details in the paper)

Timestamps and Validation • A significant optimization to read set validation (e.g. TL2) • Log time at which orec was modified (done when owner releases orec) • A reader checks if the orec was modified after it began execution, and if so, aborts conservatively
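The reader-side check described here can be sketched as follows. This is an illustrative simplification in the spirit of TL2, not the talk's implementation: `global_clock`, `ts_orec`, `ts_txn`, `ts_begin`, `ts_read_ok`, and `ts_release` are hypothetical names, and the sketch assumes the releasing writer is the one that advances the clock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

atomic_uint_fast64_t global_clock;  /* hypothetical global time */

/* Orec reduced to the one field this sketch needs. */
typedef struct { atomic_uint_fast64_t mod_time; } ts_orec;

typedef struct { uint_fast64_t start_time; } ts_txn;

/* A transaction samples the global time when it begins. */
void ts_begin(ts_txn *tx) {
    tx->start_time = atomic_load(&global_clock);
}

/* Reader check: a location is safe to read only if its orec was not
 * modified after the reader began; otherwise abort conservatively. */
bool ts_read_ok(const ts_txn *tx, ts_orec *o) {
    return atomic_load(&o->mod_time) <= tx->start_time;
}

/* Writer side: on release, advance the clock and log the new time. */
void ts_release(ts_orec *o) {
    uint_fast64_t now = atomic_fetch_add(&global_clock, 1) + 1;
    atomic_store(&o->mod_time, now);
}
```

The check is conservative: a reader aborts even if the later write did not actually change anything it depends on, but in exchange each read needs only a comparison rather than a pass over the whole read set.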

Adding Timestamps • Recall: orec contains a pointer to the owner • Superimpose a timestamp on this pointer • A writer releases orec by storing back the current global time • Timestamps lowered the cost of read set validation significantly
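One plausible way to superimpose a timestamp on the owner pointer is to tag the low bit of a single orec word. The encoding below is an assumption made for illustration (the slides do not give the bit layout), as are the names `orec_word`, `owner_desc`, `make_ts`, `orec_acquire`, and `orec_release`. The sketch also ignores stealing, which is what makes a store-based release delicate in the real algorithm.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Placeholder descriptor; its alignment keeps the pointer's low bit zero. */
typedef struct { int status; } owner_desc;

typedef _Atomic uintptr_t orec_word;

/* Assumed encoding: low bit 1 => (timestamp << 1) | 1 (orec is free),
 *                   low bit 0 => pointer to the owner's descriptor. */
uintptr_t   make_ts(uintptr_t ts)  { return (ts << 1) | 1u; }
bool        is_ts(uintptr_t w)     { return (w & 1u) != 0; }
uintptr_t   ts_of(uintptr_t w)     { return w >> 1; }
owner_desc *owner_of(uintptr_t w)  { return (owner_desc *)w; }

/* Acquire: CAS the timestamp word we observed to our descriptor pointer. */
bool orec_acquire(orec_word *o, uintptr_t seen, owner_desc *me) {
    return is_ts(seen) &&
           atomic_compare_exchange_strong(o, &seen, (uintptr_t)me);
}

/* Release: a plain store of the current global time, no CAS needed. */
void orec_release(orec_word *o, uintptr_t now) {
    atomic_store_explicit(o, make_ts(now), memory_order_release);
}
```

With this layout a reader learns from one load both whether the orec is owned and, if it is free, when it was last modified.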

Undo Log Variant • We have developed the first nonblocking undo log STM through simple modifications to a redo log variant • Stealing of orecs happens in the redo log STM when a committed owner is delayed • In undo log STMs stealing largely happens when an aborted owner is delayed • Logical values of locations are in aborted owner’s undo log

Experimental Platform • All STMs implemented in C • Throughput tests conducted on microbenchmarks • Scalable workloads: hash table, binary search tree • Torture tests (no scaling): counter, array of counters • Tests conducted on a 16-processor Sun Fire machine • We compared the following STMs • TL2 • TL2 with schedctl calls to avoid preemption pathologies • Harris and Fraser’s word-based nonblocking STM • Our Base blocking and nonblocking variants (without store-based release and the other optimizations) • 3 variants of our Optimized STM (eager redo log, lazy redo log, undo log)

[Figure: Binary Search Tree. Plotted series: Our Optimized STMs, TL2, Base NB, HF-STM.]

[Figure: Hash Table. Plotted series: TL2-Sched, TL2, Our Optimized STMs.]

[Figure: Array of Counters. Plotted series: Undo Log, TL2, TL2-Sched, Redo Log.]

[Figure: Array of Counters – Stealing rate. Plotted series: Undo Log, Redo Log.]

Conclusion • We presented several variants of a new STM that • Effectively decouples the common case from nonblocking infrastructure • Enables a more streamlined fast path (comparable to state-of-the-art blocking STMs) • Enables integration of key optimizations such as • Timestamp-based transaction validation • We have shown that common case performance of nonblocking STMs can be made competitive with state-of-the-art blocking STMs

Thank You! Questions?

[Figure: Common Case Example. Diagram of the shared heap, the ownership records (orecs o1–o5, each with ID, flags S and C, and ver#), and the write set of T1 (ACTIVE, then COMMITTED; locX:11), with locX going from 10 to 11 and the orec released by a store. Legend: copyback complete, copyback in progress, locX’s logical value.]

Basic Idea • Transaction steals ownership of the location under conflict • Inspired by Harris & Fraser’s WSTM • Stealing • Requires complex metadata management • Leads to high latency reads and writes • Switch the stolen location back to unstolen state as quickly as possible

Phase-I STM: Switching orec back to Unstolen state • If an orec is stolen, logical values of mapping locations may be in the last stealer’s write set (pointed to by the orec) • Stealer will reuse such a write set row (for a new transaction) only after it is reclaimed • Subsequent stealer that comes across a stolen orec with (copier_exists == false) switches orec to unstolen state • Stealing-releasing is a complex process
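The switch-back rule can be illustrated with the two flags named earlier in the talk, `stolen_orec` and `copier_exists`. The sketch below is a simplification under assumptions: it models the flags as two bits of a standalone atomic word, whereas the real orec also holds an owner ID and version number, and it skips the corner-case races the paper handles. `try_switch_to_unstolen` and `copier_done` are hypothetical helper names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define STOLEN_OREC   0x1u  /* orec is in the stolen state */
#define COPIER_EXISTS 0x2u  /* some owner is still in its cleanup phase */

typedef atomic_uint orec_flags;

/* A later stealer may switch the orec back to the unstolen state only
 * when it is stolen and no copier remains. */
bool try_switch_to_unstolen(orec_flags *flags) {
    unsigned seen = atomic_load(flags);
    if (!(seen & STOLEN_OREC) || (seen & COPIER_EXISTS))
        return false;
    return atomic_compare_exchange_strong(flags, &seen, seen & ~STOLEN_OREC);
}

/* The delayed owner clears copier_exists once its copyback is finished. */
void copier_done(orec_flags *flags) {
    atomic_fetch_and(flags, ~COPIER_EXISTS);
}
```

The CAS matters here: if the flags change between the load and the update (for example, a new copier appears), the switch fails instead of clobbering the newer state.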

[Figure: Phase-I STM: Illustration. Diagram of the shared heap and the ownership records (orecs o1–o5, each with ID, flags S and C, and ver#), showing the first owner T1 (COMMITTED, clearing the C flag), the second owner T2 (stealer 1, ACTIVE), and the third owner T3 (stealer 2, ACTIVE).]

STM API • stm_begin(my_txn): Initializes a transaction • stm_read(my_txn,loc): Speculative read of location loc • stm_write(my_txn,loc,val): Speculative write of val to loc • stm_commit(my_txn): Attempt to commit transaction
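A caller would typically wrap these four calls in a retry loop. The example below shows that usage pattern only. The slide gives the function names and argument order but not the C types, so the signatures here are assumptions, and the bodies are a toy single-threaded stand-in (a small redo log that never conflicts) included just so the example compiles. It is not the STM from the talk.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy single-threaded stand-in; the types and signatures are assumed. */
typedef struct { long *loc[8]; long val[8]; size_t n; } stm_txn;

void stm_begin(stm_txn *t) { t->n = 0; }

long stm_read(stm_txn *t, long *loc) {
    for (size_t i = 0; i < t->n; i++)
        if (t->loc[i] == loc) return t->val[i];  /* read own write */
    return *loc;
}

void stm_write(stm_txn *t, long *loc, long val) {
    for (size_t i = 0; i < t->n; i++)
        if (t->loc[i] == loc) { t->val[i] = val; return; }
    assert(t->n < 8);
    t->loc[t->n] = loc;
    t->val[t->n] = val;
    t->n++;
}

/* Returns true on commit; the toy version never detects a conflict. */
bool stm_commit(stm_txn *t) {
    for (size_t i = 0; i < t->n; i++) *t->loc[i] = t->val[i];
    t->n = 0;
    return true;
}

/* Usage pattern: re-execute the whole transaction until commit succeeds. */
void transfer(long *from, long *to, long amount) {
    stm_txn tx;
    do {
        stm_begin(&tx);
        stm_write(&tx, from, stm_read(&tx, from) - amount);
        stm_write(&tx, to,   stm_read(&tx, to)   + amount);
    } while (!stm_commit(&tx));
}
```

In a real STM the loop body may run several times, so it should have no side effects outside of `stm_read` and `stm_write`.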

[Figure: Phase-I STM: Example. Diagram of the shared heap, the ownership records (orecs o1–o5, each with ID, flags S and C, and ver#), and three write sets: first owner T1 (COMMITTED, locX:11, clearing the C flag), second owner T2 (stealer 1, ACTIVE, locX:11), and third owner T3 (stealer 2, ACTIVE, locX:11). Legend: copyback complete, copyback in progress, locX’s logical value.]

Phase-I STM: Stealing Mechanism • Steal orec when transaction encounters orec acquired by a committed transaction • The committed transaction is copying back its speculative updates • Stealing done in two steps: • Merge speculative updates of victim to the orec’s locations into stealer’s write set • Acquire the orec with an atomic op • This involves setting some special flags that indicate to the system that the orec is stolen
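The two stealing steps (merge, then acquire atomically) can be sketched as below. Several things here are assumptions rather than details from the talk: the orec is modeled as one word holding a pointer to a write set row with the stolen flag packed into the low bit, and the merge copies only locations the stealer has not already written. All identifiers (`row`, `merge_row`, `steal_orec`, `STOLEN_FLAG`) are hypothetical, and the corner cases the paper covers are ignored.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define ROW_CAP     8
#define STOLEN_FLAG ((uintptr_t)1)  /* assumed: flag packed in the low bit */

typedef struct { long *addr; long val; } entry;

/* Write set row: the speculative updates covered by one orec. */
typedef struct { entry e[ROW_CAP]; size_t n; } row;

/* Step 1: merge the victim's updates into the stealer's row, skipping
 * locations the stealer has already written itself. */
void merge_row(row *stealer, const row *victim) {
    for (size_t i = 0; i < victim->n; i++) {
        bool present = false;
        for (size_t j = 0; j < stealer->n; j++)
            if (stealer->e[j].addr == victim->e[i].addr) present = true;
        if (!present) {
            assert(stealer->n < ROW_CAP);
            stealer->e[stealer->n++] = victim->e[i];
        }
    }
}

/* Step 2: one CAS swings the orec from the value we saw to the stealer's
 * row with the stolen flag set. Failure means the orec changed meanwhile. */
bool steal_orec(_Atomic uintptr_t *orec, uintptr_t seen,
                row *stealer, const row *victim) {
    merge_row(stealer, victim);
    uintptr_t desired = (uintptr_t)stealer | STOLEN_FLAG;
    return atomic_compare_exchange_strong(orec, &seen, desired);
}
```

Merging before the CAS means that, from the moment the steal becomes visible, the stealer's row already holds the logical value of every location the victim had written under that orec.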

Phase-I STM: Stolen orec state • Logical values of stolen locations are always in the stealer’s write set • Subsequent accesses to these locations must look up the stealer’s write set • Quite expensive • We use some flags to indicate when it is safe for a new stealer to switch the orec back to the unstolen state
