
Hardware Support for Efficient Transactional and Supervised Memory Systems

Hardware Support for Efficient Transactional and Supervised Memory Systems. Jayaram Bobba Dissertation Defense 1/14/2010. Overview: 1) Research Area 2) Challenges/ Contributions 3) Big Picture. Dept. of Computer Sciences University of Wisconsin–Madison. Research Area. Device Scaling.


Presentation Transcript


  1. Hardware Support for Efficient Transactional and Supervised Memory Systems
     Jayaram Bobba
     Dissertation Defense, 1/14/2010
     Dept. of Computer Sciences, University of Wisconsin–Madison
     Overview: 1) Research Area 2) Challenges/Contributions 3) Big Picture

  2. Research Area
     Device Scaling
     • Emergence of CMPs
     • Hard to Program
     Abundant Transistors
     Hardware Support to Improve Productivity
     Supervised Systems: Empty/full-bits, Transactional Memory, MemTracker, Deterministic Memory
     Wisconsin Multifacet Project

  3. Challenges
     • Supervised Systems
       • Sequential-consistency only
       • Ad hoc hardware
       • Lack of formalism
     • Transactional Memory
       • “Most transactions are small”
       • Self-fulfilling
       • Limited applicability
     • Contribution 1: Supervised Memory (TSOdata, Safe Supervision)
     • Contribution 2: TokenTM
     • Contribution 3: StealthTest

  4. Big Picture
     [Figure: layered stack, top to bottom]
     • Applications
     • Software Tools: StealthTest
     • Supervised Systems: TokenTM
     • Supervised Memory: TSOdata and Safe Supervision
     • Hardware

  5. Outline
     • Motivation
     • Supervised Memory
     • TokenTM
     • StealthTest
     • Conclusion

  6. On Software Productivity
     • More Productivity: Yannis’s “Law”: programmer productivity doubles every 6 years
     • More Performance: Moore’s Law
     • Moore’s Law will continue. But Yannis’s Law?

  7. What has changed?
     • “A Fundamental Turn towards Concurrency in Software” [Herb Sutter, 2005]
     • Moore’s Law -> Better Computers
       • Sequential Computers (Past)
       • Memory wall, Power wall, etc.
       • Attack of the killer CMPs* (Current)
     • How to program? Expose parallelism to software
       • Parallel programs are hard to write
     * Adapted from “Attack of the killer micros” by Eugene Brooks

  8. Who solves the productivity issue?
     • Why, of course, hardware architects!
       • Long live Moore’s Law
       • Spend some transistors on productivity issues
     • Architectural Support for Enhancing Productivity
       • for language features
       • for bug avoidance
       • for debugging
       • for performance feedback
       • and so on…

  9. Seriously, who should solve it?
     • HW Architects or SW Engineers?
       • ‘software crisis’ in the past too…
     • Why HW architects?
       • More bang for the buck (Economic)
         • Software/IT (1,152 billion) vs Hardware (138 billion) [Wen Mei Hwu, Micro-39 Keynote]
       • SW cannot do it alone (Technical)
         • Decades of automatic parallelization efforts
         • Virtual Memory, Tagged Memory for LISP-like languages
     “We must now reconsider the balance of hardware and software and to provide more specialized function in hardware than we have previously, in order to drastically simplify the programming process”
     Edward A. Feustel, IEEE TOC, July 1973, in support of Tagged Memory

  10. Outline
      • Motivation
      • Supervised Memory
        • Background/Motivation
        • Explore relaxed supervised systems
        • Define Supervised Memory
        • Propose formal models
      • TokenTM
      • StealthTest
      • Conclusion

  11. Why Supervised Systems?
      • Synchronization
        • Hardware TM systems
        • Empty/Full-bits
          • [Berry et al 2006] Graph processing algorithms on 4 processor MTA > 64K BG/L
      • Controlled non-determinism
        • Deterministic/Interleaving Constrained Multiprocessing
      • Debugging
        • Log-based architectures
      • Safety
        • Heap checkers, Bounds checkers
      • Language Features
        • Hardware-assisted Garbage Collection

  12. What are Supervised Systems?
      • out-of-band metadata per data block
      • monitor & control (supervise) memory accesses to data
      • execute handlers on specific metadata states
      • pure software possible, but inefficient
        • shadow memory, e.g., Valgrind: mean slowdown 22X [Nethercote et al., VEE 2007]

  13. State-of-the-Art
      • Expect Sequentially-Consistent (SC) hardware
        • Most hardware is not
      • Ad hoc
        • Whither primitives?
      • Informal treatment of memory consistency
        • Ambiguous/Incorrect

  14. Contributions
      • Expect Sequentially-Consistent (SC) hardware
        • Most hardware is not
        • Contribution: Explore relaxed supervised systems
      • Ad hoc
        • Whither primitives?
        • Contribution: Define Supervised Memory
      • Informal treatment of memory consistency
        • Ambiguous/Incorrect
        • Contribution: Propose formal memory models

  15. Outline
      • Motivation
      • Supervised Memory
        • Background/Motivation
        • Explore relaxed supervised systems
        • Define Supervised Memory
        • Propose formal models
      • TokenTM
      • StealthTest
      • Conclusion

  16. Explore relaxed supervised systems: TSO-lite, a TSO-compliant system
      [Figure: processor (PC, registers r1, r2, r3) with a store buffer between it and memory. Example instruction sequence: ST 1, [A]; LD [B], r1; ST 2, [C]; LD [C], r3]

  17. Explore relaxed supervised systems: Empty/Full-Bits on TSO-lite
      [Figure: the TSO-lite processor, store buffer, and memory from the previous slide, with Full/Empty metabits per location and a table of LD/ST exception behavior. Example instruction sequence: ST 1, [A]; LD [B], r1; ST 2, [C]; LD [C], r3]
      • I1: No load bypass
      • I2: Late exceptions

  18. Explore relaxed supervised systems: Deterministic Shared Memory (DMP) [Devietti et al., ASPLOS 2009]
      “depending upon the consistency model of the underlying hardware, threads must perform a memory fence at the edge of a quantum”
      • Insert a fence after the last operation in the quantum
      • Insert a fence before the first shared operation in the quantum
      • I3: Reordered metabit-reads

  19. Outline
      • Motivation
      • Supervised Memory
        • Background/Motivation
        • Explore relaxed supervised systems
        • Define Supervised Memory
        • Propose formal models
      • TokenTM
      • StealthTest
      • Conclusion

  20. Define Supervised Memory: What is Supervised Memory?
      • Each memory location A has
        • data (A.d)
        • metadata (A.m)
      • New operations
        • Supervised Load (sLD A)
        • Supervised Store (sST A)
      • Jump on reading special metadata
        • (Optionally) Hardware exception

  21. Define Supervised Memory: Supervised Operations
      sLD A =>
      Start:   atomic {
                 curm = Val[R A.m]                       // Read metadata
                 nextm = NEXT(Load, curm)                // Check software-specified FSM
                 if nextm == EXCEPTION then jump to Handler
                 R A.d                                   // Read data
                 if (nextm != curm) then W A.m, nextm    // Update metadata
               }
      Handler: …

  22. Define Supervised Memory: Using Supervised Memory
      • Software assigns semantics to metadata
        • Metastates stored as metadata
          • E.g., Initialized, Uninitialized
        • Metastate transition function (NEXT)
      • Use supervised operations to monitor/control data operations
        • E.g., catch read access to uninitialized data
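
The supervised load on slide 21 and the uninitialized-read example on slide 22 can be sketched together in Python. This is a minimal software model, not the hardware design: the `NEXT` table, the `Location` class, the `SupervisionFault` exception (standing in for the jump to the handler), and the `sST` definition (the slides spell out only `sLD`) are all assumptions made for illustration.

```python
import threading
from dataclasses import dataclass, field

EXCEPTION = "EXCEPTION"

# Hypothetical software-specified FSM for an uninitialized-read checker:
# (operation, current metastate) -> next metastate.
NEXT = {
    ("Load", "Uninitialized"): EXCEPTION,
    ("Load", "Initialized"): "Initialized",
    ("Store", "Uninitialized"): "Initialized",
    ("Store", "Initialized"): "Initialized",
}


class SupervisionFault(Exception):
    """Stands in for the jump to the handler (assumed name)."""


@dataclass
class Location:
    d: int = 0                    # data, A.d
    m: str = "Uninitialized"      # metadata, A.m
    lock: threading.Lock = field(default_factory=threading.Lock)


def sLD(loc: Location) -> int:
    """Supervised load, following the steps listed on slide 21."""
    with loc.lock:                            # models atomic{...}
        curm = loc.m                          # read metadata
        nextm = NEXT[("Load", curm)]          # check the FSM
        if nextm == EXCEPTION:
            raise SupervisionFault(f"load in metastate {curm}")
        value = loc.d                         # read data
        if nextm != curm:
            loc.m = nextm                     # update metadata
        return value


def sST(loc: Location, value: int) -> None:
    """Supervised store, assumed to mirror sLD with a data write."""
    with loc.lock:
        curm = loc.m
        nextm = NEXT[("Store", curm)]
        if nextm == EXCEPTION:
            raise SupervisionFault(f"store in metastate {curm}")
        loc.d = value                         # write data
        if nextm != curm:
            loc.m = nextm
```

With this table, a load from a fresh `Location()` raises `SupervisionFault`, while a load after `sST(loc, 42)` returns 42 and leaves the metastate at `Initialized`.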

  23. Outline
      • Motivation
      • Supervised Memory
        • Background/Motivation
        • Explore relaxed supervised systems
        • Define Supervised Memory
        • Propose formal models
      • TokenTM
      • StealthTest
      • Conclusion

  24. Propose formal models: TSO Axioms [Hangal et al., ISCA 2004]

  25. Propose formal models: TSO Axioms [Hangal et al., ISCA 2004]
      Reordering Axioms (program-order pairs)
      • Rd A, Rd B
      • Rd A, Wr B
      • Wr A, Wr B
      • Wr A, Rd B: allows store buffers
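
The effect of letting a write be followed by a read that completes first can be illustrated with the classic store-buffer litmus test. The sketch below is an illustrative model, not the formal axioms of [Hangal et al., ISCA 2004]: thread 0 runs `ST x=1; LD y` and thread 1 runs `ST y=1; LD x`, and the function name `outcomes` and the event encoding are assumptions. With `store_buffer=True` a store may sit in the buffer and reach memory later (the "drain" event); with `store_buffer=False` every store reaches memory immediately, which models sequential consistency.

```python
from itertools import permutations


def outcomes(store_buffer: bool) -> set:
    """Enumerate all (r0, r1) results of the store-buffer litmus test."""
    events = [(t, kind) for t in (0, 1) for kind in ("store", "load", "drain")]
    var_written = {0: "x", 1: "y"}   # thread 0 stores x, thread 1 stores y
    var_read = {0: "y", 1: "x"}      # each thread loads the other variable
    results = set()
    for order in permutations(events):
        pos = {event: i for i, event in enumerate(order)}
        # Program order: the store issues before the load, and a store
        # cannot drain to memory before it has issued.
        legal = all(
            pos[(t, "store")] < pos[(t, "load")]
            and pos[(t, "store")] < pos[(t, "drain")]
            for t in (0, 1)
        )
        if not store_buffer:
            # No buffering: the store reaches memory right after issuing.
            legal = legal and all(
                pos[(t, "drain")] == pos[(t, "store")] + 1 for t in (0, 1)
            )
        if not legal:
            continue
        memory = {"x": 0, "y": 0}
        regs = {}
        for t, kind in order:
            if kind == "drain":
                memory[var_written[t]] = 1
            elif kind == "load":
                regs[t] = memory[var_read[t]]
        results.add((regs[0], regs[1]))
    return results
```

Comparing `outcomes(True)` with `outcomes(False)` shows that the result `(0, 0)` appears only when store buffering is allowed.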

  26. Propose formal models: TSOall, a Consistency Model for Supervised Memory
      TSO axioms applied to all accesses, data and metadata
      + (Simple) Like TSO
      - (Slow) Prohibits optimizations
      Thread:
        sST A -> [Rd A.m, Wr A.d, Wr A.m]
        sLD B -> [Rd B.m, Rd B.d]
        => Store buffers ineffective
      • Tension
        • Ease of Reasoning vs Performance

  27. Propose formal models: Blast from the Past [Adve and Hill, ISCA 1990]
      • Ease of Reasoning (SC) vs Performance (RC)
      • Observation:
        • Simple programs rely only on certain SC orders
        • Ignore non-essential orders. Still appears as SC
      • Challenge: Simple? Non-essential orders?
      • Solution: Data-race-freedom
        • For data-race-free programs, RC = SC

  28. Propose formal models: Safe Supervision Motivation
      • Ease of Reasoning (TSOall) vs Performance (?)
      • Observation:
        • Simple supervised programs rely only on certain TSOall orders
        • Ignore non-essential orders. Still appears as TSOall
      • Challenge: Simple? Non-essential orders?
      • Solution: Safe Supervision
        • For safely supervised programs, ? = TSOall

  29. Safe Supervision
      • Metadata accesses to location A are not used to order operations to a different location B
      • Most uses of supervision are safely supervised. E.g.,
        • Heap Checker: Initialized/Uninitialized values
        • Transactional Memory: Conflict Detection information
      Example:
        Initially, A.m = Empty, B.d = 0
        Thread 1: B.d = 1
                  A.m = Full
        Thread 2: While (A.m == Empty);
                  Read B.d

  30. Propose formal models: TSOdata, Fast Yet Simple
      Thread: sST B; sLD A
      Reordering Axioms
        -> [Rd A.m, Wr A.d, Wr A.m]
        -> [Rd B.m, Rd B.d]
      • Store buffers can be used
      • For safely supervised programs, TSOdata = TSOall

  31. TSOdata on OpenSPARC T2
      • Goal: Explore low-level issues on a real design
      • Late Exceptions with deferred handlers
        • Dump store buffer entries on exception
        • Enhance store buffer to carry Virtual Address (VA)
        • ~200 cycles to read out 4 entries
      • Disable store buffer bypassing for supervised loads
      • Low space overhead for adding metabits (~4%)

  32. Supervised Memory Summary
      • Expects Sequentially-Consistent (SC) hardware
        • Most hardware is not
        • Contribution: Explore relaxed memory systems
      • Ad hoc
        • Whither primitives?
        • Contribution: Define Supervised Memory
      • Informal treatment of memory consistency
        • Ambiguous/Incorrect
        • Contribution: Propose formal memory models

  33. Outline
      • Motivation
      • Supervised Memory
      • TokenTM [ISCA 2008]
      • StealthTest
      • Conclusion

  34. TokenTM Summary
      • Current Hardware TMs
        • Most Transactions Small & Short Running
        • Penalize large/long transactions
        • Too restrictive for wide-spread TM use?
      • Hypothesis
        • Must Support Efficient Large/Long Transactions As Well
        • Is such an HTM even possible?
      • Yes! TokenTM
        1. LogTM’s Log to buffer unbounded values
        2. Transactional Tokens for unbounded conflict detection
           • Conflict state in memory metabits

  35. Transactional Tokens
      • Challenge: How to efficiently track Read/Write sets?
      • Token Coherence [Martin03]
        • Read/Write sets for cache coherence
      • Solution: Transactional Tokens
        • T tokens per memory block
        • At least one token to read, all T tokens to write (token conflict detection)
      • Token Metadata <c0, c1, …, ci, …>, where 0 ≤ ci ≤ T is the count of tokens held by the thread with TID i
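
The token rule on this slide (at least one token to read, all T tokens to write) can be sketched as a small Python model. This is an illustration under stated assumptions rather than TokenTM's hardware mechanism: the class name `TokenBlock`, its method names, and the choice to release all of a thread's tokens at once (as on commit or abort) are hypothetical.

```python
class TokenBlock:
    """Token metadata for one memory block: held[i] is the count c_i."""

    def __init__(self, total_tokens: int):
        self.total = total_tokens   # T tokens per memory block
        self.held = {}              # TID -> tokens held, 0 < c_i <= T

    def free_tokens(self) -> int:
        return self.total - sum(self.held.values())

    def acquire_read(self, tid: int) -> bool:
        """A reader needs at least one token; False signals a conflict."""
        if self.held.get(tid, 0) >= 1:
            return True             # already a reader or the writer
        if self.free_tokens() >= 1:
            self.held[tid] = 1
            return True
        return False                # no token left, e.g. a writer holds all T

    def acquire_write(self, tid: int) -> bool:
        """A writer needs all T tokens; False signals a conflict."""
        held_by_others = sum(c for t, c in self.held.items() if t != tid)
        if held_by_others > 0:
            return False            # some other thread reads or writes
        self.held[tid] = self.total
        return True

    def release(self, tid: int) -> None:
        """Return all of a thread's tokens (assumed: on commit or abort)."""
        self.held.pop(tid, None)
```

For example, with `TokenBlock(4)`, two threads can both acquire read tokens, after which a write request from either fails until the other releases its token. The model assumes T is at least the number of concurrent readers.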

  36. Tokens and Supervised Memory
      • Challenge: Where to store Token Metadata?
        • Must be unbounded and globally accessible
      • Solution
        • Supervised Memory’s Metadata
        • Piggyback on existing Virtual Memory and Cache Coherence mechanisms

  37. TokenTM: a Large-Transaction TM
      • New Conflict Detection Mechanism
        • Transactional Tokens in Supervised Memory
        • Token Coherence [Martin03] at a different level
      • Version Management
        • Save old/new values for unbounded Write set
        • LogTM [Moore06] undo log

  38. Outline
      • Motivation
      • Supervised Memory
      • TokenTM
      • StealthTest [PACT 2009]
      • Conclusion

  39. StealthTest Summary (1/2): The Problem: fork Overhead
      • Software testing is hard
        • Multithreading makes it harder
      • Online software testing can help
        • Run tests on deployed software
          E.g., Delta Execution for patch testing [Tucek et al., ASPLOS 2009]
      • Non-intrusive mechanisms
        • fork (existing)
      [Table: fork rated against Low Overhead, Functionally Hidden, Good Scaling]

  40. StealthTest Summary (2/2): Solution: TM for testing
      • Leverage Transactional Memory for online testing
      • Non-Intrusive?
        • transaction { test(); abort }
        • Fast TM mechanisms
      • Demonstrate two uses
        • Delta Execution
        • In vivo Testing
      [Table: StealthTest rated against Low Overhead, Functionally Hidden, Good Scaling]
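
The `transaction { test(); abort }` idiom can be mimicked in software with an undo log, in the spirit of the LogTM-style version management mentioned on the TokenTM slides. The sketch below is only an analogy to the hardware mechanism: `UndoLogMemory` and `stealth_test` are hypothetical names, and the model is single-threaded, so it shows isolation of the test's writes but no conflict detection.

```python
class UndoLogMemory:
    """A key/value store whose writes can be rolled back via an undo log."""

    def __init__(self, data):
        self.data = dict(data)
        self.log = None             # None means no transaction is active

    def begin(self):
        self.log = []

    def write(self, key, value):
        if self.log is not None:
            # Record whether the key existed and its old value.
            self.log.append((key, key in self.data, self.data.get(key)))
        self.data[key] = value

    def abort(self):
        # Undo the writes in reverse order to restore the original state.
        for key, existed, old in reversed(self.log):
            if existed:
                self.data[key] = old
            else:
                self.data.pop(key, None)
        self.log = None


def stealth_test(mem, test):
    """Run test(mem) inside a transaction that always aborts.

    Only the test's return value escapes; its writes are discarded.
    """
    mem.begin()
    try:
        return test(mem)
    finally:
        mem.abort()
```

A test that writes `balance = 70` and a scratch key into `UndoLogMemory({"balance": 100})` can still report 70 as its result, while the store reads `{"balance": 100}` again once `stealth_test` returns.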

  41. Outline
      • Motivation
      • Supervised Memory
      • TokenTM
      • StealthTest
        • Online Software Testing
          • E.g., Patch Validation
        • StealthTest: TM for online testing
        • Delta Execution using StealthTest
        • In vivo Testing using StealthTest (Optionally)
      • Conclusion

  42. Online Patch Validation
      • Bug fixes can introduce more bugs
        • Patches must be validated
      • Online Validation [Nagaraja et al., OSDI 2004]
        • Increased resource usage
        • Lockstep execution
      [Figure labels: Input, Production, Testing, Output, Diff]

  43. Delta Execution [Tucek et al., ASPLOS 2009]
      • Online Patch Validation
        • Most patches are small
        • Patched and un-patched executions are similar
      • Delta Execution
        • Run together except when they differ

  44. Delta Execution using fork
      [Figure: timeline of Production and Testing. The merged execution calls fork to split into patched and unpatched executions. Step labels: Isolate Δ data, Compute Δ data, Install Δ data]

  45. Multi-threading and fork
      [Figure: the fork-based Delta Execution timeline from the previous slide, annotated for multithreaded programs]
      • ‘Park’ all other threads
      • Stop all threads to get a consistent memory snapshot

  46. fork
      • Poor Performance
        • ~9.8ms for split / ~106ms for merge [Tucek et al., ASPLOS 2009]
      • Poor Scalability
        • Web-server response rate reduced by 43%
      • Want an alternate mechanism

  47. Outline
      • Motivation
      • Supervised Memory
      • TokenTM
      • StealthTest
        • Online Software Testing
          • E.g., Patch Validation
        • StealthTest: TM for online testing
        • Delta Execution using StealthTest
        • In vivo Testing using StealthTest (Optionally)
      • Conclusion

  48. Delta Execution using StealthTest
      • Delta Execution needs to
        • Isolate patched execution
        • Introspect patched execution
        • Monitor delta data access
      • StealthTest: transaction{…}
        • Transactional Memory
        • Version Management: tracks new/old values
        • Conflict Detection: monitors accesses
      • fork-based approach
        • Execute on child process (fork)
        • Page diffing
        • mprotect

  49. StealthTest Interface
      • Delta Execution needs to
        • Isolate patched execution
        • Introspect patched execution
        • Monitor delta data access
      • StealthTest interface
        • ST_begin_transaction
        • ST_abort_transaction
        • ST_get_old
        • ST_get_new
        • ST_protect_set
        • ST_protect_clear
      • StealthTest: transaction{…}
        • Transactional Memory
        • Version Management: tracks new/old values
        • Conflict Detection: monitors accesses

  50. Requirements from TM
      • Strong Atomicity [Martin et al., CAL 2006]
        • Transactions isolated from non-transactions
        • => Test transactions isolated from application code
      • Flexible Conflict Resolution
        • Can abort transactions if necessary
        • => Abort tests if they block application
      • Communication from within transactions
        • => Expose result of a test
