This presentation gives an overview of POWER HTM (Hardware Transactional Memory): the basic transactional instructions (TBEGIN, TEND, TABORT), the guarantees of basic transactions, the transactional registers used in failure handling, suspend/resume, and rollback-only transactions. It then covers use cases such as transactional lock elision and path length reduction, and closes with performance results. The author acknowledges IBM colleagues in Austin, Yorktown, Tokyo, and Toronto.
Overview of POWER HTM
Maged Michael
IBM T.J. Watson Research Center
WTTM 2014, 15 July 2014
Outline
• POWER HTM features
• Use cases
• Performance results
Acknowledgment of IBM colleagues in Austin, Yorktown, Tokyo, and Toronto.
Any errors in describing POWER HTM features and performance in this presentation are my own.
WTTM 2014 - POWER HTM
POWER HTM Features
Basic Transactional Instructions
• TBEGIN: Begins an outermost transaction (or increments the nesting level)
• TEND: Commits an outermost transaction (or decrements the nesting level)
• TBEGIN sets a condition register field (CR0) to indicate success or failure
• TEND sets a condition register field to indicate whether it was executed inside a transaction (a TEND executed outside a transaction is "extraneous")
• Transaction failure transfers control to the instruction following TBEGIN
• Basic example:
        tbegin.                          # begin transaction
        beq   failure_handler            # branch to failure handler if failure code is set
        ...
        tend.
        bgt   was_not_in_a_transaction   # (optional) check if tend. was extraneous
Features of Basic Transactions
• No hardware progress guarantee. Failure handlers must include an alternative non-HTM software path.
• Strong isolation. Hardware detects conflicts with non-transactional accesses.
• Flat nesting. Transaction failure transfers control to the instruction following the outermost TBEGIN.
• Ordering guarantee for successful transactions among three groups of (cacheable write-back) memory accesses:
    • Before TBEGIN
    • Inside the transaction
    • After TEND
• Example: Initially X == Y == 0. The outcome r1 == r2 == 0 is not allowed.
        Thread 0          Thread 1
        st X = 1          st Y = 1
        tbegin.           tbegin.
        ld r1 = Y         ld r2 = X
        tend.             tend.
Transaction Abort
• TABORT: Causes transaction failure
    • Unconditional variants, with and without an 8-bit code
    • Conditional variants with 32/64-bit register or immediate operands
• Example: transactional lock elision (TLE) entry, first with an unconditional abort, then with a conditional abort
        tbegin.
        beq-  tle_failure_handler
        ld    r=LOCK           # load lock
        cmpi  r==FREE          # compare with free value
        beq+  $+8              # if free, start critical section
        tabort.                # if not free, abort TLE transaction
        <critical section>

        tbegin.
        beq-  tle_failure_handler
        ld    r=LOCK           # load lock
        tabort[wd]ci. r!=FREE  # if not free, abort TLE transaction
        <critical section>
Transactional Registers and Failure Causes
• TFHAR: Address of the failure handler, i.e., the outermost TBEGIN + 4
• TFIAR: Address of the failing instruction, when applicable
• TEXASR: Transaction exception and status register; includes the cause of transaction failure
    • Contains a summary bit that hints whether the cause of failure is likely to be persistent or transient
    • Also contains the 8-bit software code, if one was provided with a TABORT instruction
• Failure causes include conflicts, abort instructions, footprint overflow, I/O, access to non-write-back memory, nesting level overflow, and disallowed instructions (e.g., sleep, cache invalidation).
Suspending/Resuming Transactional State
• TSUSPEND: Suspends the current transaction, i.e., transitions from transactional state to suspended state
• TRESUME: Resumes the suspended transaction
• Loads and stores in suspended state are performed non-speculatively as they occur and do not use hardware transactional resources
• No new transactions can be initiated in suspended state
• Transaction failure is recorded, but failure handling is deferred until the transaction is resumed
• Loads of locations written transactionally return the written values, as long as the transaction has not failed
• Stores in suspended state to locations accessed transactionally cause transaction failure
• TCHECK: Checks for transaction failure and validity of prior memory operations (may also be used in transactional state)
Rollback-Only Transactions (ROT)
• Intended for single-thread speculation
• Not intended for shared data
• No conflict detection
• Keeps track only of transactional stores
• No ordering guarantees
• May be nested with atomic transactions
Use Cases
Transactional Lock Elision
• Transactional lock elision: entry
        pthread_mutex_lock(mutex) {
          if (do_tle(mutex)) {     // Check TLE state and collect stats if needed
            attempts = 0;          // Count TLE attempts for the current acquisition
          TRY_TLE:
            if (__TM_begin()) {    // Inside HW transaction
              if (!is_free(mutex))
                __TM_abort();      // If mutex is busy, abort HW transaction
              return 0;            // return SUCCESS
            }
            // HW transaction failed
            // Failure handler:
            //   Decide whether to retry TLE or fall back on the conventional implementation,
            //   based on the number of failed attempts, cause of failure, and lock recursion.
            //   May update TLE stats for the mutex.
            if (decide_to_try_TLE_again(mutex, ++attempts, __TM_is_failure_persistent())) {
              wait_until_free(mutex);
              backoff(attempts);
              goto TRY_TLE;
            }
          }
          <Fall back on conventional non-TLE lock acquisition implementation>
        }
Transactional Lock Elision
• Transactional lock elision: exit
        pthread_mutex_unlock(mutex) {
          if (is_free(mutex))      // Mutex not held, so this thread elided it
            if (__TM_end())        // End TLE transaction
              return 0;            // return SUCCESS
          <Follow conventional non-TLE path>
        }
Path Length Reduction
• Example: critical path of the CAS-based implementation of java.util.concurrent ConcurrentLinkedQueue.offer()
  No TM (23 instructions):
            l     t=[tail]
            isync
            l     s=[t.next]
            isync
            l     r=[tail]
            isync
            cmp   r,t
            bne   start_over
            cmpi  s,0
            bne   fix_tail
            hwsync
        L1: larx  r=[t.next]
            cmp   r,s
            bne   start_over
            stcx  [t.next]=n
            bne-  L1
            hwsync
        L2: larx  r=[tail]
            cmp   r,t
            bne   skip_stcx
            stcx  [tail]=n
            bne-  L2
            isync
Path Length Reduction
• CLQ with TM
  TM (9 instructions in the common case):
            tbegin
            beq-  failure_handler
            l     t=[tail]
            l     s=[t.next]
            cmpi  s,0
            beq+  L1          # skip next instruction
            mr    t=s         # not common case
        L1: st    [t.next]=n
            st    [tail]=n
            tend
• Fall back on the conventional CAS-based implementation in case of TM failure
• Aggregation of memory barriers
Other Use Case Examples
• Hybrid HW/SW high-level transactions, e.g., HTM commit acceleration and spin-waiting in suspended state
• Thread-level speculation with commit ordering using suspended-mode accesses
• Single-thread speculation using Rollback-Only Transactions: assume an optimization is safe, and roll back if it turns out to be unsafe
Performance
Single Thread
• An empty Pthreads TLE critical section is 6% faster than a conventional Pthreads critical section.
• 71% reduction in execution time (warm caches) of CLQ offer()/poll() pairs, using TM path length reduction and memory barrier aggregation
• The execution time of an empty transaction with suspend/resume is 3.4x that of an empty transaction without suspend/resume
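Expressed as a speedup, the CLQ result works out as follows, assuming the 71% reduction is measured relative to the execution time of the CAS-based version:

```latex
t_{\mathrm{TM}} = (1 - 0.71)\, t_{\mathrm{CAS}} = 0.29\, t_{\mathrm{CAS}}
\quad\Longrightarrow\quad
\frac{t_{\mathrm{CAS}}}{t_{\mathrm{TM}}} = \frac{1}{0.29} \approx 3.4
```

So under that reading, the TM path completes offer()/poll() pairs roughly 3.4 times as fast as the CAS-based path.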
Pthreads TLE - Microbenchmarks
• Pattern 1: high contention, no conflicts, data set fits in TM capacity
• Pattern 2: high contention, data set overflows TM capacity
• Pattern 3: mixed pattern
    • 80%: high contention, no conflicts, fits in TM capacity
    • 20%: medium contention, overflows TM capacity
Pthreads TLE - Memcached
• Memcached server with a varying number of threads
• Client running on the same machine
• 96 hardware threads: 12 cores, SMT8
• Best TLE throughput (on 16 threads) is 26.9% higher than the best locking throughput (on 12 threads)
• On 16 threads, TLE throughput is 37.5% higher than locking throughput
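Combining the two Memcached figures, and reading each percentage as a throughput ratio, shows what happens to locking beyond 12 threads:

```latex
T_{\mathrm{TLE}}(16) = 1.269\, T_{\mathrm{lock}}(12), \qquad
T_{\mathrm{TLE}}(16) = 1.375\, T_{\mathrm{lock}}(16)
\;\Longrightarrow\;
\frac{T_{\mathrm{lock}}(16)}{T_{\mathrm{lock}}(12)} = \frac{1.269}{1.375} \approx 0.923
```

That is, locking throughput on 16 threads is about 7.7% below its own best at 12 threads, while TLE reaches its best throughput at 16 threads.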
Summary
• POWER HTM instruction set
• Suspend/resume
• Rollback-only transactions
• Low HTM overheads
• Caution: don't learn the wrong lessons from specific implementations of specific HTM architectures, e.g., POWER HTM and BG/Q HTM
Thank You