
Weaving Relations for Cache Performance

Anastassia Ailamaki (Carnegie Mellon); David DeWitt, Mark Hill, and Marios Skounakis (University of Wisconsin-Madison). Memory hierarchies: cache misses are extremely expensive.


Presentation Transcript


  1. Weaving Relations for Cache Performance. Anastassia Ailamaki (Carnegie Mellon); David DeWitt, Mark Hill, and Marios Skounakis (University of Wisconsin-Madison)

  2. Memory Hierarchies • Cache misses are extremely expensive ($$$) [Figure: processor execution pipeline backed by L1 I-cache and L1 D-cache, then L2 cache, then main memory]

  3. Processor/Memory Speed Gap • 1 access to memory ≈ 1000 instruction opportunities

  4. Breakdown of Memory Delays • PII Xeon running NT 4.0, 4 commercial DBMSs: A, B, C, D • Memory-related delays: 40%-80% of execution time • Data accesses on caches: 19%-86% of memory stalls

  5. Data Placement on Disk Pages • Slotted Pages: Used by all commercial DBMSs • Store table records sequentially • Intra-record locality (attributes of record r together) • Doesn’t work well on today’s memory hierarchies • Alternative: Vertical partitioning [Copeland’85] • Store n-attribute table as n single-attribute tables • Inter-record locality, saves unnecessary I/O • Destroys intra-record locality => expensive to reconstruct record • Contribution: Partition Attributes Across (PAX) • … have the cake and eat it, too: inter-record locality + low record reconstruction cost
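
The vertical-partitioning alternative on this slide can be illustrated with a small sketch. It assumes a three-attribute table R(id, name, age) filled with the values from the slides' running example; the function names are hypothetical and the lists stand in for on-disk sub-tables.

```python
# Hypothetical table R(id, name, age), using the slides' example values.
rows = [
    (1237, "Jane", 30),
    (4322, "John", 45),
    (1563, "Jim", 20),
    (7658, "Susan", 52),
]

def vertically_partition(rows, n_attrs):
    """Split an n-attribute table into n single-attribute tables.

    Each sub-table keeps (record_position, value) pairs so that
    records can be rebuilt later.
    """
    return [
        [(pos, row[a]) for pos, row in enumerate(rows)]
        for a in range(n_attrs)
    ]

def reconstruct(partitions, pos):
    """Rebuild one record by probing every sub-table for the same position."""
    return tuple(dict(part)[pos] for part in partitions)

partitions = vertically_partition(rows, 3)
ages = [value for _, value in partitions[2]]  # a scan of one attribute reads one sub-table
record = reconstruct(partitions, 3)           # one record needs all n sub-tables
```

Scanning `ages` touches a single sub-table, which is the inter-record locality the slide mentions, while `reconstruct` has to visit every sub-table, which is the reconstruction cost.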

  6. Outline • The memory/processor speed gap • What’s wrong with slotted pages? • Partition Attributes Across (PAX) • Performance results • Summary

  7. Current Scheme: Slotted Pages • Formal name: NSM (N-ary Storage Model) [Figure: NSM page for relation R: page header, then records (RH1, 1237, Jane, 30), (RH2, 4322, John, 45), (RH3, 1563, Jim, 20), (RH4, 7658, Susan, 52), with slots at the end of the page] • Records are stored sequentially • Offsets to start of each record at end of page
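
A minimal sketch of the slotted layout described on this slide, under stated assumptions: a tiny 128-byte page, 2-byte slot entries, and a fixed 16-byte record encoding. None of these sizes come from the slides, and the page header and overflow checks are left out for brevity.

```python
import struct

PAGE_SIZE = 128   # assumed tiny page so the layout is easy to inspect
SLOT_SIZE = 2     # assumed 2-byte slot entries

def encode(rid, name, age):
    """Assumed record encoding: 4-byte id, 8-byte padded name, 4-byte age."""
    return struct.pack("<I8sI", rid, name.encode(), age)

def build_nsm_page(records):
    """Pack records front to back and their start offsets back to front."""
    page = bytearray(PAGE_SIZE)
    free = 0
    for i, rec in enumerate(records):
        page[free:free + len(rec)] = rec
        slot_pos = PAGE_SIZE - SLOT_SIZE * (i + 1)
        struct.pack_into("<H", page, slot_pos, free)
        free += len(rec)
    return page

def record_offset(page, i):
    """Read slot i (counted from the end of the page) to locate record i."""
    (offset,) = struct.unpack_from("<H", page, PAGE_SIZE - SLOT_SIZE * (i + 1))
    return offset

records = [
    encode(1237, "Jane", 30),
    encode(4322, "John", 45),
    encode(1563, "Jim", 20),
    encode(7658, "Susan", 52),
]
page = build_nsm_page(records)
offsets = [record_offset(page, i) for i in range(len(records))]
third = struct.unpack_from("<I8sI", page, offsets[2])
```

Every field of a record sits next to the other fields of the same record, so reading one attribute across many records strides through the page.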

  8. Predicate Evaluation using NSM • Query: select name from R where age > 50 [Figure: the NSM page in main memory (page header; RH1 1237 Jane 30; RH2 4322 John 45; RH3 1563 Jim 20; RH4 7658 Susan 52; 2534 Leon …) is brought into the cache as blocks 1-4, each holding fragments of whole records such as “Jane 30 RH”, “45 RH3 1563”, “Jim 20 RH4”, “52 2534 Leon”] NSM pushes non-referenced data to the cache
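
To make the effect on this slide concrete, the sketch below counts how many cache blocks a scan of one attribute touches when records are stored back to back. The record size, field offset, block size, and record count are illustrative assumptions, not measurements from the talk.

```python
def blocks_touched(byte_addresses, width, block_size):
    """Return the set of cache-block indexes covered by the given field reads."""
    touched = set()
    for addr in byte_addresses:
        first = addr // block_size
        last = (addr + width - 1) // block_size
        touched.update(range(first, last + 1))
    return touched

# Assumptions: 64-byte records with a 4-byte age field at offset 60,
# 32-byte cache blocks, 100 records stored back to back (NSM).
RECORD_SIZE, AGE_OFFSET, AGE_WIDTH, BLOCK = 64, 60, 4, 32
N = 100

nsm_age_addrs = [i * RECORD_SIZE + AGE_OFFSET for i in range(N)]
nsm_blocks = blocks_touched(nsm_age_addrs, AGE_WIDTH, BLOCK)
useful_fraction = (N * AGE_WIDTH) / (len(nsm_blocks) * BLOCK)
```

Under these assumptions each age value lands in its own block, so only 4 of every 32 bytes fetched are actually needed by the predicate.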

  9. Need New Data Page Layout • Eliminates unnecessary memory accesses • Improves inter-record locality • Keeps a record’s fields together • Does not affect I/O performance and, most importantly, is… low-implementation-cost, high-impact

  10. Partition Attributes Across (PAX) [Figure: NSM page (page header; RH1 1237 Jane 30; RH2 4322 John 45; RH3 1563 Jim 20; RH4 7658 Susan 52) next to the PAX page for the same records (page header; minipage 1237 4322 1563 7658; minipage Jane John Jim Susan; minipage 30 52 45 20)] Partition data within the page for spatial locality
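
The regrouping shown in this figure can be sketched as a transposition inside a single page. The helper names below are hypothetical, and plain Python lists stand in for minipages.

```python
# Records as they would appear in one NSM page (values from the slide).
nsm_records = [
    (1237, "Jane", 30),
    (4322, "John", 45),
    (1563, "Jim", 20),
    (7658, "Susan", 52),
]

def to_pax(records):
    """Group the i-th field of every record into the i-th minipage."""
    return [list(column) for column in zip(*records)]

def get_record(minipages, pos):
    """Rebuild record `pos` without leaving the page: one lookup per minipage."""
    return tuple(minipage[pos] for minipage in minipages)

minipages = to_pax(nsm_records)
second = get_record(minipages, 1)
```

Because all minipages live in the same page, rebuilding a record needs no join across pages, which is what keeps the reconstruction cost low.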

  11. Predicate Evaluation using PAX • Query: select name from R where age > 50 [Figure: PAX page in main memory (page header; 1237 4322 1563 7658; Jane John Jim Susan; 30 52 45 20); only block 1, holding the age values 30 52 45 20, is brought into the cache] Fewer cache misses, low reconstruction cost
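
A small sketch of how the query on this slide could run against a PAX page: the predicate scans only the age minipage, and names are fetched only for matching positions. The dictionary layout and the block-count helper are assumptions for illustration (4-byte values, 32-byte cache blocks, a block-aligned minipage).

```python
import math

# Assumed in-memory stand-in for a PAX page: one list ("minipage") per attribute.
pax_page = {
    "id":   [1237, 4322, 1563, 7658],
    "name": ["Jane", "John", "Jim", "Susan"],
    "age":  [30, 45, 20, 52],
}

def select_name_where_age_gt(page, threshold):
    """Scan only the age minipage, then fetch names at the matching positions."""
    hits = [pos for pos, age in enumerate(page["age"]) if age > threshold]
    return [page["name"][pos] for pos in hits]

def minipage_blocks(n_values, value_size, block_size):
    """Cache blocks spanned by a densely packed, block-aligned minipage."""
    return math.ceil(n_values * value_size / block_size)

names = select_name_where_age_gt(pax_page, 50)
blocks_for_100_ages = minipage_blocks(100, 4, 32)
```

With these assumed sizes, 100 age values fit in 13 cache blocks, since the values are packed next to each other rather than spread across whole records.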

  12. A Real NSM Record [Figure: record layout: header (null bitmap, record length, etc.), fixed-length values, offsets to variable-length fields, variable-length values] NSM: All fields of record stored together + slots

  13. PAX: Detailed Design [Figure: PAX page holding two records. Page header: pid, # records, free space, # attributes, attribute sizes (4, v, 4). F-minipage (1237, 4322) with presence bits; V-minipage (Jane, John) with v-offsets; F-minipage (30, 45) with presence bits] PAX: Groups fields + amortizes record headers
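
The two minipage kinds named on this slide can be sketched as follows. This is an illustration under assumptions, not the authors' implementation: the F-minipage here stores no bytes for NULL values and locates a value by counting presence bits, the V-minipage records the end offset of each value, and NULL handling in the V-minipage is omitted.

```python
class FMinipage:
    """Fixed-length values plus one presence bit per record (0 = NULL)."""

    def __init__(self):
        self.values = []
        self.presence = []

    def append(self, value):
        self.presence.append(0 if value is None else 1)
        if value is not None:
            self.values.append(value)

    def get(self, pos):
        if not self.presence[pos]:
            return None
        # Index among stored values = number of present records before pos.
        return self.values[sum(self.presence[:pos])]


class VMinipage:
    """Variable-length values packed together, with an end offset per value."""

    def __init__(self):
        self.data = bytearray()
        self.v_offsets = []

    def append(self, value: bytes):
        self.data += value
        self.v_offsets.append(len(self.data))

    def get(self, pos):
        start = self.v_offsets[pos - 1] if pos > 0 else 0
        return bytes(self.data[start:self.v_offsets[pos]])


ages = FMinipage()
for age in (30, None, 20):
    ages.append(age)

names = VMinipage()
for name in (b"Jane", b"John", b"Jim"):
    names.append(name)
```

The per-record headers of NSM are replaced by one bit (or one offset) per value, stored once per minipage, which is the amortization the slide refers to.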

  14. Outline • The memory/processor speed gap • What’s wrong with slotted pages? • Partition Attributes Across (PAX) • Performance results • Summary

  15. Sanity Check: Basic Evaluation • Main-memory resident R, numeric fields • Query: select avg(ai) from R where aj >= Lo and aj <= Hi • PII Xeon running Windows NT 4 • 16 KB L1-I, 16 KB L1-D, 512 KB L2, 512 MB RAM • Used processor counters • Implemented schemes on Shore Storage Manager • Similar behavior to commercial database systems
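
The range-selection query used for this evaluation can be sketched over two attribute columns. The function name and the sample values are hypothetical; the slides do not show how Shore executes the query.

```python
def avg_in_range(a_i, a_j, lo, hi):
    """select avg(a_i) from R where a_j >= lo and a_j <= hi, over two columns."""
    total = 0
    count = 0
    for x, y in zip(a_i, a_j):
        if lo <= y <= hi:
            total += x
            count += 1
    return total / count if count else None

# Hypothetical numeric columns of R.
a_i = [10, 20, 30, 40]
a_j = [1, 5, 9, 13]
result = avg_in_range(a_i, a_j, 4, 10)
empty = avg_in_range(a_i, a_j, 100, 200)
```

The predicate reads every value of `a_j` but only the qualifying values of `a_i`, so the bounds Lo and Hi control the selectivity of the query.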

  16. Why Use Shore? • Compare Shore query behavior with commercial DBMS • Execution time & memory delays (range selection) • We can use Shore to evaluate DSS workload behavior

  17. Effect on Accessing Cache Data • PAX saves 70% of NSM’s data cache penalty • PAX reduces cache misses at both L1 and L2 • Selectivity doesn’t matter for PAX data stalls

  18. Time and Sensitivity Analysis • PAX: 75% less memory penalty than NSM (10% of time) • Execution times converge as number of attrs increases

  19. Evaluation Using a DSS Benchmark • 100M, 200M, and 500M TPC-H DBs • Queries: Range Selections w/ variable parameters (RS); TPC-H Q1 and Q6 (sequential scans; lots of aggregates (sum, avg, count); grouping/ordering of results); TPC-H Q12 and Q14 ((Adaptive Hybrid) Hash Join; complex ‘where’ clause, conditional aggregates) • 128MB buffer pool

  20. TPC-H Queries: Speedup • PAX improves performance even with I/O • Speedup differs across DB sizes

  21. Updates • Policy: Update in-place • Variable-length: Shift when needed • PAX only needs to shift minipage data • Update statement: update R set ap = ap + b where aq > Lo and aq < Hi
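
Both update cases on this slide can be sketched on minipage-like structures. The names are hypothetical, and the variable-length layout (packed bytes plus one end offset per value) is an assumption carried over for illustration.

```python
def update_fixed(a_p, a_q, b, lo, hi):
    """update R set a_p = a_p + b where a_q > lo and a_q < hi, in place."""
    for pos, q in enumerate(a_q):
        if lo < q < hi:
            a_p[pos] += b

def update_variable(data, v_offsets, pos, new_value):
    """Replace one variable-length value, shifting only this minipage's tail."""
    start = v_offsets[pos - 1] if pos > 0 else 0
    end = v_offsets[pos]
    delta = len(new_value) - (end - start)
    data[start:end] = new_value            # bytearray slice assignment shifts the tail
    for k in range(pos, len(v_offsets)):   # later values now end `delta` bytes further on
        v_offsets[k] += delta

# Fixed-length case: add b to ages whose id lies strictly between the bounds.
ages = [30, 45, 20, 52]
ids = [1237, 4322, 1563, 7658]
update_fixed(ages, ids, 1, 1500, 5000)

# Variable-length case: grow the second name.
names = bytearray(b"JaneJohnJimSusan")
offsets = [4, 8, 11, 16]
update_variable(names, offsets, 1, b"Jonathan")
```

Only the bytes of the affected minipage move when a value grows or shrinks; the other attributes of the same records stay where they are.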

  22. Updates: Speedup • PAX always speeds queries up (7-17%) • Lower selectivity => reads dominate speedup • High selectivity => speedup dominated by write-backs

  23. Summary • PAX: a low-cost, high-impact DP technique • Performance • Eliminates unnecessary memory references • High utilization of cache space/bandwidth • Faster than NSM (does not affect I/O) • Usability • Orthogonal to other storage decisions • “Easy” to implement in large existing DBMSs
