1 / 22

Performance Tradeoffs in Read-Optimized Databases

Performance Tradeoffs in Read-Optimized Databases. Stavros Harizopoulos MIT CSAIL joint work with: Velen Liang, Daniel Abadi, and Sam Madden. massachusetts institute of technology. 1 Joe 45. 2 Sue 37. … … …. Read-optimized databases. 1. 2. Joe. …. SQL Server DB2 Oracle.

iokina
Télécharger la présentation

Performance Tradeoffs in Read-Optimized Databases

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Performance Tradeoffs in Read-Optimized Databases Stavros Harizopoulos MIT CSAIL joint work with: Velen Liang, Daniel Abadi, and Sam Madden massachusetts institute of technology

  2. 1 Joe 45 2 Sue 37 … … … Read-optimized databases 1 2 Joe … SQL Server DB2 Oracle Sybase IQ MonetDB CStore Sue 45 … 37 … row stores column stores Read optimizations: Materialized views, multiple indices, compression How does column-orientation affect performance? massachusetts institute of technology

  3. Joe 45 Joe 45 reconstruct 45 project Joe 1 2 … Joe Sue 45 37 … 3 files … Rows vs. columns row data column data seek 1 Joe 45 2 Sue 37 single file … … … Study performance tradeoffs solely in data storage massachusetts institute of technology

  4. Performance study • Methodology • Built storage manager from scratch • Sequential scans • Analyze CPU, disk, memory • Findings • Columns are generally more I/O efficient • Competing traffic favors columns • Conditions where columns are CPU-constrained • Conditions where rows are MemBW-constrained massachusetts institute of technology

  5. Talk outline • System architecture • Workload and Experiments • Analysis • Conclusions massachusetts institute of technology

  6. System architecture • Block-iterator operators • Single-threaded, C++, Linux AIO • No buffer pool • Use filesystem, bypass OS cache • Compression • Dense-pack 100% full 60% full massachusetts institute of technology

  7. Compression methods • Dictionary • Bit-pack • Pack several attributes inside a 4-byte word • Use as many bits as max-value • Delta • Base value per page • Arithmetic differences … ‘low’ … … ‘high’ … … ‘low’ … … ‘normal’ … … 00 … … 10 … … 00 … … 01 … massachusetts institute of technology

  8. Storage engine row scanner column scanner SELECT name, age WHERE age > 40 Joe 45 … … Joe 45 … … S apply predicate(s) S #POS 45 #POS … name apply predicate #1 S age massachusetts institute of technology

  9. direct IO read 128 bytes 100ms read L2 cache prefetching 10ms seek Platform CPU L2 RAM DISKS (striped) 3.2 GB/sec 3.2GHz 180 MB/sec 1GB 1MB prefetching: massachusetts institute of technology

  10. Workload • LINEITEM (wide) • 60m rows → 9.5 GB • ORDERS (narrow) • 60m rows → 1.9 GB • Query 150 bytes 50 bytes 32 bytes 12 bytes SELECT a1, a2, a3, … WHERE a1 yields variable selectivity massachusetts institute of technology

  11. 25B 10B 69B text text text int 4B char 1B Wide tuple: 10% selectivity Column • Large prefetch hides disk seeks in columns Row time (sec) Column (CPU only) Row (CPU only) selected bytes per tuple massachusetts institute of technology

  12. time (sec) row store Wide tuple: 10% sel. (CPU) # attributes selected column store • Row-CPU suffers from memory stalls massachusetts institute of technology

  13. time (sec) row store Wide tuple: 10% sel. (CPU) 0.1% # attributes selected column store • Column-CPU efficiency with lower selectivity massachusetts institute of technology

  14. Narrow tuple: 10% selectivity • Memory stalls disappear in narrow tuples • Compression: similar to narrow (not shown) time (sec) row store column store selected bytes per tuple # attributes selected massachusetts institute of technology

  15. Varying prefetch size no competingdisk traffic • No prefetching hurts columns in single scans Column 2 Column 8 time (sec) Column 16 Column 48 (x 128KB) Row (any prefetch size) selected bytes per tuple massachusetts institute of technology

  16. Varying prefetch size no competingdisk traffic with competing disk traffic • No prefetching hurts columns in single scans • Under competing traffic, columns outperform rows for any prefetch size time (sec) selected bytes per tuple massachusetts institute of technology

  17. Analysis • Central parameter in analysis: cycles per disk byte (cpdb) • What can it model: • More / fewer disks • More / fewer CPUs • CPU / disk competing traffic • Trends in cpdb: • 10 → 30 from 1995 to 2006 • Further increase with multicore chips massachusetts institute of technology

  18. Analysis speedup of cols over rows • Rows favored by narrow tuples and low cpdb • Disk-bound workloads have higher cpdb 2 10% selectivity 50% projection 1.6 – 2 1.2 – 1.6 (cpdb) cycles per disk byte 0.8 – 1.2 0.4 – 0.8 tuple width massachusetts institute of technology

  19. See our paper for the rest • CPU time breakdowns, L2 prefetcher • Disk prefetching implementation • Compression results • Non-pipelined column scanner • Analysis massachusetts institute of technology

  20. Conclusions • Given enough space for prefetching, columns outperform rows in most workloads • Competing traffic favors columns • Memory-bandwidth bottleneck in rows • Future work • Column scanners, random I/O, write performance massachusetts institute of technology

  21. Thank you db.csail.mit.edu/projects/cstore massachusetts institute of technology

  22. Analysis parameter what it can model massachusetts institute of technology

More Related