Transactional Memory: Hardware Proposals Overview

Manu Awasthi, Architecture Reading Club, Fall 2006.

Presentation Transcript


  1. Transactional Memory: Hardware Proposals Overview. Manu Awasthi, Architecture Reading Club, Fall 2006

  2. Why do we care?
     • The rise of multicore architectures (CMPs)
       • (Support for) lots of cheap threads
     • Synchronization will be an issue
       • Concurrent updates on shared memory
     • Today’s methodologies (locks)
       • Are not scalable
       • Fail to exploit concurrency to the fullest

  3. Why are locks EVIL?
     • Locks: objects only one thread can hold at a time
     • Organization: one lock for each shared structure
     • Usage: (block) → acquire → access → release
     • Correctness issues
       • Under-locking → data races
       • Acquires in different orders → deadlock
     • Performance issues
       • Conservative serialization
       • Overhead of acquiring
       • Difficult to find the right granularity
       • Blocking
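
The deadlock bullet can be made concrete. A standard remedy, which does not come from the slides, is to acquire multiple locks in one global order. The sketch below assumes POSIX threads. `account_t`, `lock_pair`, `unlock_pair` and `transfer` are hypothetical names chosen for illustration.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

/* Hypothetical shared object that carries its own lock. */
typedef struct {
    pthread_mutex_t lock;
    long balance;
} account_t;

/* Always lock the lower-addressed object first. If every thread obeys
 * this single global order, no cycle of waiting threads can form. */
static void lock_pair(account_t *a, account_t *b) {
    if ((uintptr_t)a < (uintptr_t)b) {
        pthread_mutex_lock(&a->lock);
        pthread_mutex_lock(&b->lock);
    } else {
        pthread_mutex_lock(&b->lock);
        pthread_mutex_lock(&a->lock);
    }
}

static void unlock_pair(account_t *a, account_t *b) {
    pthread_mutex_unlock(&a->lock);
    pthread_mutex_unlock(&b->lock);
}

/* Critical section: (block) -> acquire -> access -> release. */
static void transfer(account_t *from, account_t *to, long amount) {
    /* Locking the same default mutex twice is undefined behavior. */
    assert(from != to);
    lock_pair(from, to);
    from->balance -= amount;
    to->balance += amount;
    unlock_pair(from, to);
}
```

With this ordering, two threads running `transfer(&x, &y, …)` and `transfer(&y, &x, …)` can no longer each hold one lock while waiting for the other. The price is that every caller must know and follow the ordering rule.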

  4. Example of evil locks
     struct Shared_Structure {
         int shared_var1;
         int shared_var2;
         int shared_var3;
         /* ... */
     };

  8. Coarse-grained locking
     • Easily made correct…
     • But not scalable.

  9. Fine-grained locking
     • More scalable
     • High overhead in acquire and release
     • Increased complexity
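
The slide figures that applied these two styles to `Shared_Structure` were not preserved. The sketch below shows roughly what the difference looks like, assuming POSIX threads. The two struct variants and all function names are hypothetical.

```c
#include <assert.h>
#include <pthread.h>

/* Coarse-grained: one lock guards the whole structure.
 * Easy to get right, but every update serializes on big_lock. */
struct Shared_Structure_Coarse {
    pthread_mutex_t big_lock;
    int shared_var1;
    int shared_var2;
    int shared_var3;
};

static void coarse_inc_var1(struct Shared_Structure_Coarse *s) {
    pthread_mutex_lock(&s->big_lock);
    s->shared_var1++;
    pthread_mutex_unlock(&s->big_lock);
}

/* Fine-grained: one lock per field. Threads that touch different
 * fields proceed in parallel. */
struct Shared_Structure_Fine {
    pthread_mutex_t lock1, lock2, lock3;
    int shared_var1;
    int shared_var2;
    int shared_var3;
};

static void fine_inc_var1(struct Shared_Structure_Fine *s) {
    pthread_mutex_lock(&s->lock1);
    s->shared_var1++;
    pthread_mutex_unlock(&s->lock1);
}

/* An update spanning two fields needs two locks, always taken in a
 * fixed order (lock1 before lock2) so that it cannot deadlock. */
static void fine_move_1_to_2(struct Shared_Structure_Fine *s) {
    pthread_mutex_lock(&s->lock1);
    pthread_mutex_lock(&s->lock2);
    s->shared_var1--;
    s->shared_var2++;
    pthread_mutex_unlock(&s->lock2);
    pthread_mutex_unlock(&s->lock1);
}
```

The fine-grained version lets an update of `shared_var1` run alongside an update of `shared_var3`. However, every multi-field update now costs several lock operations and depends on an agreed ordering, which is the overhead and complexity the slide lists.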

  10. Enter transactions…
     • Code segments with three features:
       • Atomicity
       • Serialization only on conflicts
       • Rollback support
     <begin_transaction> {
         statement_1;
         statement_2;
         statement_3;
         ...
     } <end_transaction>
     • Generally, a critical section becomes a transaction that executes as if it were a single atomic instruction

  11. Agenda
     • Transactions: what all the hoopla is about
     • Research proposals
     • Usages
     • Implementations
     Disclaimer 1: covering only hardware support
     Disclaimer 2: purely an overview

  12. Hardware overview
     • Exploit cache coherence protocols
       • They already do almost what we need
       • Invalidation
       • Consistency checking
     • Exploit speculative execution
       • Branch prediction = optimistic synchronization!

  13. Execution strategy
     • Four main components:
       • Logging/buffering (speculative execution)
       • Conflict detection
       • Abort/rollback
       • Commit
     • Each paper takes a different approach to these four components.
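
The four components can be seen together in a toy software model. It rests on simplifying assumptions and is not any of the surveyed hardware designs:

- Writes are buffered privately until commit.
- Conflicts are checked only at commit.
- The committer wins and aborts every active transaction that accessed a word it publishes.

All names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MEM_WORDS 64

static int memory[MEM_WORDS];   /* shared memory, word-addressed */

typedef struct {
    bool read_set[MEM_WORDS];   /* conflict detection: words we read */
    bool write_set[MEM_WORDS];  /* ... and words we wrote */
    int  buffer[MEM_WORDS];     /* logging/buffering: speculative values */
    bool aborted;
} tx_t;

static void tx_begin(tx_t *t) { memset(t, 0, sizeof *t); }

static int tx_read(tx_t *t, int addr) {
    assert(addr >= 0 && addr < MEM_WORDS);
    t->read_set[addr] = true;
    /* A transaction sees its own buffered writes, else committed memory. */
    return t->write_set[addr] ? t->buffer[addr] : memory[addr];
}

static void tx_write(tx_t *t, int addr, int value) {
    assert(addr >= 0 && addr < MEM_WORDS);
    t->write_set[addr] = true;
    t->buffer[addr] = value;    /* memory is untouched until commit */
}

/* Conflict: the committer wrote a word the other transaction read
 * (or, conservatively, also wrote). */
static bool conflicts(const tx_t *committer, const tx_t *other) {
    for (int a = 0; a < MEM_WORDS; a++)
        if (committer->write_set[a] && (other->read_set[a] || other->write_set[a]))
            return true;
    return false;
}

/* Commit: publish the buffer, then abort conflicting transactions.
 * Rollback is cheap here: a loser discards its buffer and re-runs
 * from tx_begin. */
static bool tx_commit(tx_t *t, tx_t *active[], int n) {
    if (t->aborted)
        return false;
    for (int a = 0; a < MEM_WORDS; a++)
        if (t->write_set[a])
            memory[a] = t->buffer[a];
    for (int i = 0; i < n; i++)
        if (active[i] != t && !active[i]->aborted && conflicts(t, active[i]))
            active[i]->aborted = true;
    return true;
}
```

Hardware designs differ mainly in where each piece lives: the buffer may be a cache or a log, the read and write sets may be cache-line bits, and conflict detection may use coherence messages. They also differ in whether conflicts are caught at commit, as here, or on each access.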

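  14. HW Transactional Memory [diagram: a read; one cache line marked T (transactional); transaction state: active; caches, interconnect, memory]

  15. Transactional Memory [diagram: a read; two cache lines marked T; states: active, active; caches, memory]

  16. Transactional Memory [diagram: two cache lines marked T; states: active, committed, active; caches, memory]

  17. Transactional Memory [diagram: a write; cache lines marked D (dirty) and T; states: committed, active; caches, memory]

  18. Rewind [diagram: a write; cache lines marked D, T, T; states: aborted, active, active; caches, memory]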

  19. Transaction commit
     • At the commit point
       • If there are no cache conflicts, we win.
     • Mark transactional entries
       • Read-only: valid
       • Modified: dirty (eventually written back)
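
The commit step amounts to a bulk state change on the cache lines marked T. The abort path below is an assumption that the slide does not spell out. Lines the transaction modified are invalidated so the pre-transaction value is refetched, which presumes that value is still held elsewhere, for example in memory. All names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum { INVALID, VALID, DIRTY } line_state_t;

typedef struct {
    line_state_t state;
    bool transactional;  /* the "T" mark: touched inside the transaction */
    bool tx_modified;    /* written speculatively inside the transaction */
} cache_line_t;

/* Commit: drop the T mark; read-only lines become valid, modified
 * lines become dirty (written back later, like any dirty line). */
static void commit_lines(cache_line_t lines[], int n) {
    for (int i = 0; i < n; i++) {
        if (!lines[i].transactional)
            continue;
        lines[i].state = lines[i].tx_modified ? DIRTY : VALID;
        lines[i].transactional = false;
        lines[i].tx_modified = false;
    }
}

/* Abort (assumed policy): speculatively modified lines are invalidated,
 * read-only lines still hold committed data and stay valid. */
static void abort_lines(cache_line_t lines[], int n) {
    for (int i = 0; i < n; i++) {
        if (!lines[i].transactional)
            continue;
        lines[i].state = lines[i].tx_modified ? INVALID : VALID;
        lines[i].transactional = false;
        lines[i].tx_modified = false;
    }
}
```

Both paths touch only lines that carry the T mark, which is why commit and abort can be fast in a cache-based design.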

  20. But…
     • Limits to
       • Transactional cache size
       • Scheduling quantum
     • A transaction cannot commit if it is
       • Too big
       • Too slow
     • Actual limits are platform-dependent
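
The "too big" case can be modelled as a capacity check on the transaction's footprint. The sketch treats the transactional cache as fully associative, with a hypothetical capacity of four lines. Real caches are set-associative, so a transaction can overflow earlier on a set conflict. In this kind of design, a context switch at the end of a scheduling quantum ("too slow") typically aborts the transaction regardless of its size.

```c
#include <assert.h>
#include <stdbool.h>

#define TX_CACHE_LINES 4  /* hypothetical capacity; real limits vary */

typedef struct {
    unsigned long tags[TX_CACHE_LINES]; /* line numbers held transactionally */
    int used;
    bool overflowed;
} tx_footprint_t;

/* Record that the transaction touched the line containing addr.
 * Returns false once the footprint no longer fits; a purely
 * cache-based transaction must then abort. */
static bool tx_touch(tx_footprint_t *f, unsigned long addr, unsigned long line_size) {
    assert(line_size > 0);
    unsigned long line = addr / line_size;
    for (int i = 0; i < f->used; i++)
        if (f->tags[i] == line)
            return true;            /* already tracked, no new space needed */
    if (f->used == TX_CACHE_LINES) {
        f->overflowed = true;       /* "too big" to commit */
        return false;
    }
    f->tags[f->used++] = line;
    return true;
}
```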

  21. [Rajwar & Goodman, ASPLOS ’02] TLR/SLE
     • Transactional execution of critical sections
       • Locks define the scope of a transaction
       • Doesn’t change the programming model
     • Hardware identifies and speculatively executes critical sections
     • Timestamps provide serializability

  22. SLE
     • Mechanism to identify lock acquires and releases
     • Enabling mechanism for TLR
     • Concept of silent stores
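
The silent-store idea behind SLE works as follows. A lock acquire writes, say, 1 over a free value of 0, and the matching release writes 0 back. The pair therefore leaves memory unchanged, so both stores can be elided and the lock word is only read and monitored during the critical section.

The sketch below models just that check, with hypothetical names. The rest of SLE is not modelled: buffering the critical section's writes and detecting conflicts through coherence.

```c
#include <assert.h>
#include <stdbool.h>

/* State of one hypothetical elision attempt on a spin-lock word. */
typedef struct {
    int  observed;   /* lock value seen at the predicted acquire */
    bool eliding;
} sle_t;

/* Predicted acquire: read the lock instead of writing it, and elide
 * only if it is currently free. On false, the caller falls back to
 * really acquiring the lock. */
static bool sle_try_acquire(sle_t *s, const volatile int *lock) {
    s->observed = *lock;
    s->eliding = (s->observed == 0);
    return s->eliding;
}

/* Release: the acquire/release stores form a silent pair if the
 * release restores exactly the value observed at the acquire. */
static bool sle_release_is_silent(const sle_t *s, int release_value) {
    return s->eliding && release_value == s->observed;
}
```

If another processor writes the lock word while elision is in progress, the coherence protocol would expose the conflict and the speculative critical section would have to restart.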

  23. SLE Algo

  24. SLE Algo

  25. Livelocks

  26. TLR Algo
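
TLR's use of timestamps for conflict resolution can be sketched as follows, as a simplification under our own assumptions. A timestamp is a (logical clock, processor id) pair. When a processor holding a line in its transaction receives a conflicting request, the older timestamp wins. All names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Timestamp = (logical clock, processor id); the id breaks ties so
 * any two timestamps are totally ordered. */
typedef struct {
    unsigned long clock;
    int cpu;
} ts_t;

static bool earlier(ts_t a, ts_t b) {
    return a.clock < b.clock || (a.clock == b.clock && a.cpu < b.cpu);
}

typedef enum { HOLDER_DEFERS_REQUEST, HOLDER_RESTARTS } resolution_t;

/* The older transaction wins: an older holder defers (delays) the
 * request; a younger holder gives the line up and restarts. */
static resolution_t resolve(ts_t holder, ts_t requester) {
    return earlier(holder, requester) ? HOLDER_DEFERS_REQUEST : HOLDER_RESTARTS;
}
```

If a losing transaction keeps its timestamp across restarts instead of drawing a new one, it eventually becomes the oldest and wins. That property is what makes timestamp ordering a remedy for the livelock problem on slide 25.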

  27. [Hammond+, ISCA ’04 & ASPLOS ’04] TCC @ Stanford
     • Again, speculative transaction execution
     • Identify transaction start and end
       • Read set, write set
       • Save architectural state
     • Check for conflicts on memory references
       • Snoop over the system bus to check for violations
     • Fold the commit state into a packet
       • Send it over the system bus, commit
     • Centralized bus arbiter → scalability limits!
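
The violation check that TCC performs when it snoops a commit packet can be sketched in a few lines. Assumptions:

- Addresses are cache-line numbers.
- Sets are small fixed-size arrays.
- All names are hypothetical.

Actual hardware would typically keep per-line read and write bits in the cache rather than address lists.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_SET 32   /* hypothetical fixed set size for the sketch */

typedef struct {
    unsigned long lines[MAX_SET];
    size_t n;
} addr_set_t;

static bool set_contains(const addr_set_t *s, unsigned long line) {
    for (size_t i = 0; i < s->n; i++)
        if (s->lines[i] == line)
            return true;
    return false;
}

static void set_add(addr_set_t *s, unsigned long line) {
    if (!set_contains(s, line)) {
        assert(s->n < MAX_SET);
        s->lines[s->n++] = line;
    }
}

/* A committing processor broadcasts its write set as one packet.
 * Every other processor snoops it: if any written line is in its own
 * read set, it read stale data and must roll back and re-execute. */
static bool snoop_commit_packet(const addr_set_t *packet, const addr_set_t *my_read_set) {
    for (size_t i = 0; i < packet->n; i++)
        if (set_contains(my_read_set, packet->lines[i]))
            return true;   /* violation */
    return false;
}
```

Only the read set is checked. Because commits are serialized through the arbiter, two transactions that merely write the same line do not need to restart; the later commit simply wins.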

  28. TCC – programming model
     • Divide into transactions
       • Here, it’s the programmer’s job
       • However, easier to do than locks. Why?
     • Specify order
       • In case the relative order of transaction commits matters (e.g.?)
       • Assign phase numbers to transactions
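
Phase numbers can be thought of as a commit-ordering rule. The sketch assumes one such rule, as our own simplification: a transaction may commit only when no uncommitted transaction has an older phase, and transactions with equal phases may commit in any order. One plausible case where order matters is parallelizing loop iterations that must appear to run sequentially, with each iteration given its own phase. Names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

typedef struct {
    int phase;        /* lower = older; equal phases are unordered */
    bool committed;
} tcc_tx_t;

/* Hypothetical arbiter rule: commit only if no uncommitted
 * transaction belongs to an older phase. */
static bool may_commit(const tcc_tx_t *t, const tcc_tx_t all[], int n) {
    for (int i = 0; i < n; i++)
        if (!all[i].committed && all[i].phase < t->phase)
            return false;   /* an older phase must commit first */
    return true;
}
```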

  29. TCC Node

  30. Some results
     • Small read state (6–12 kB)
     • Write state (4–8 kB)
       • Both of the above per benchmark, per processor
     • Significant speedup
     • Not-so-modest bandwidth requirements

  31. UTM/LTM @ MIT
     • Most transactions are small
       • 99.9% touch 54 cache lines or fewer
       • But some go up to 8000 lines (!)
     • Thesis: the transaction footprint should be unbounded
     • Added ISA support for this
     • Bookkeeping in an in-memory transaction log
       • Helps survive interrupts and process migration

  32. So, what’s new?
     • ISA support
       • XBEGIN pc
       • XEND
     • Rollback support: snapshot of the rename tables
     • Xstate data structure for memory state
       • Holds log records of all active transactions
       • Log = commit record + log entry vector
       • Log pointer
       • RW bit
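
A sketch of the xstate idea, under stated assumptions: the field names and the fixed-size log are hypothetical, and conflict detection between transactions is omitted. New values go to memory in place while old values go to the log, the same strategy slide 36 attributes to LogTM. Each touched block carries an RW bit and a log pointer, and abort replays the log newest-first.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { PENDING, COMMITTED, ABORTED } commit_status_t;

typedef struct log_entry log_entry_t;

/* Per-block metadata (hypothetical layout): RW bit + log pointer. */
typedef struct {
    int rw;                /* 1 once a transaction has written the block */
    log_entry_t *log_ptr;  /* NULL when no active transaction owns it */
} block_meta_t;

/* One log entry: which block, its old value, and its metadata. */
struct log_entry {
    long *block;
    long old_value;
    block_meta_t *meta;
};

#define MAX_LOG 16  /* fixed only for the sketch; UTM's log lives in memory */

/* Per-transaction log: commit record + vector of log entries. */
typedef struct {
    commit_status_t commit_record;
    log_entry_t entries[MAX_LOG];
    size_t n;
} tx_log_t;

/* Write: save the old value in the log, then update memory in place. */
static void utm_write(tx_log_t *xlog, long *block, block_meta_t *meta, long value) {
    assert(xlog->commit_record == PENDING && xlog->n < MAX_LOG);
    log_entry_t *e = &xlog->entries[xlog->n++];
    e->block = block;
    e->old_value = *block;
    e->meta = meta;
    meta->rw = 1;
    meta->log_ptr = e;
    *block = value;
}

/* Commit: new values are already in memory; just release the blocks. */
static void utm_commit(tx_log_t *xlog) {
    xlog->commit_record = COMMITTED;
    for (size_t i = 0; i < xlog->n; i++) {
        xlog->entries[i].meta->rw = 0;
        xlog->entries[i].meta->log_ptr = NULL;
    }
    xlog->n = 0;
}

/* Abort: restore old values newest-first, releasing each block. */
static void utm_abort(tx_log_t *xlog) {
    xlog->commit_record = ABORTED;
    while (xlog->n > 0) {
        log_entry_t *e = &xlog->entries[--xlog->n];
        *e->block = e->old_value;
        e->meta->rw = 0;
        e->meta->log_ptr = NULL;
    }
}
```

Replaying newest-first matters when a block is written more than once: the last restore applied is the oldest saved value, which is the pre-transaction state.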

  33. Processor Modifications

  34. The Xstate DS

  35. Interesting Results

  36. LogTM @ UW-Madison
     • Motivation: make the common case fast
       • Commits are more frequent than aborts
     • Basic strategy: similar to UTM
       • Store new values in place, old values in the log
     • Log properties
       • Per-thread log
       • Cacheable, in virtual memory (i.e., part of the thread’s address space reserved for logging)
       • Log writes are mostly cache hits (small transactions)
       • Low TLB translation overhead (small transactions)

  37. Conflict detection
     • Directory-based protocol
       • Send the request to the directory
       • The directory forwards the request to processors
       • Each processor checks for conflicts
       • Ack (no conflict), Nack (conflict)
       • Resolve the conflict based on the responses
     • Extended directory states
       • For handling overflow of transactional lines
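
The per-processor check and the requester's decision can be sketched as below, with hypothetical names. The conflict rule is the usual reader/writer one: a write request conflicts with any transactional access, and a read request conflicts only with a transactional write. What the requester does after a Nack is a matter of policy. As we understand LogTM, it stalls and retries rather than aborting immediately.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum { ACK, NACK } response_t;

/* What one processor knows about a line inside its current transaction. */
typedef struct {
    bool in_read_set;
    bool in_write_set;
} tx_line_bits_t;

/* Check a request forwarded by the directory against local
 * transactional bits. */
static response_t check_forwarded(tx_line_bits_t mine, bool request_is_write) {
    bool conflict = mine.in_write_set || (request_is_write && mine.in_read_set);
    return conflict ? NACK : ACK;
}

/* Requester side: the request succeeds only if every processor acked. */
static bool request_granted(const response_t responses[], int n) {
    for (int i = 0; i < n; i++)
        if (responses[i] == NACK)
            return false;   /* conflict: stall and retry, or abort */
    return true;
}
```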

  38. More work @ UW-Madison
     • VTM (Rajwar+)
     • Thread Level TM (Goodman+)
     • Goal: persistent transactions with less overhead
     • Approach: group transactions by process
     • Implementation: buffer in cache + overflow table in virtual memory + various interesting optimizations

  39. Summary
     • Transactions: a promising approach to synchronization
       • Simple interface + efficient implementation
     • Uses: optimistic lock removal, lock-free data structures, general-purpose synchronization, parallelization, ??
     • Challenges
       • Implementation
       • Interface
       • OS involvement
       • I/O + rollback
