This presentation summarizes a study that uses commercial workloads directly to analyze current memory system trends, motivated by the difficulty of creating commercial benchmarks. It covers the database and web search workloads, the test systems, and both hardware and simulator results, offering insights on OLTP, DSS, and web index search queries. It closes with observations on performance estimation and the architectural issues explored.
Memory System Characterization of Commercial Workloads
L.A. Barroso, K. Gharachorloo and E. Bugnion
Western Research Laboratory, Digital Equipment Corporation
Presented by: Eric Carty-Fickes
Introduction
• commercial workloads now dominate the server market, ahead of engineering/scientific workloads
• yet most architecture studies (as of 1998) still use scientific benchmarks
• commercial benchmarks are difficult to create: large, expensive, proprietary, and constantly changing
• this paper uses commercial workloads directly to study current memory system trends
Database Workloads
• the first two workloads run on an Oracle database server
• OLTP (online transaction processing)
  • small read/write queries touching a small part of the database
  • models banking requests (TPC-B), run in dedicated mode
  • more kernel time; I/O latency is hidden
• DSS (decision support systems)
  • long read-only queries scanning much of the database
  • models a wholesaler's SQL queries
  • fewer context switches
Database Workloads (cont.)
• Web Index Search (AltaVista)
  • does not require a database server
  • multiple threads hide memory miss latency
  • read-only requests; recent searches are cached
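The access-pattern contrast behind these workload descriptions can be illustrated with a small C sketch. This is purely a hypothetical stand-in (an in-memory array rather than the paper's Oracle or AltaVista setups): the OLTP-like loop performs scattered small read-modify-writes with poor locality that also dirty cache lines, while the DSS-like loop streams sequentially through the data with good spatial locality.

```c
/* Hypothetical sketch of the access patterns described above -- an in-memory
 * array standing in for the database, NOT the paper's Oracle or AltaVista
 * setups.  OLTP-like work does scattered small read-modify-writes (poor
 * locality, dirty lines); DSS-like work streams sequentially (good locality). */
#include <stdio.h>
#include <stdlib.h>

#define NRECORDS (1 << 20)            /* ~1M records * 64 B = 64 MB, larger than caches */

typedef struct {
    long balance;
    long pad[7];                      /* pad each record to a 64-byte cache line (LP64) */
} record_t;

/* OLTP-like: many small random read-modify-write "transactions". */
static void oltp_like(record_t *table, int ntxn)
{
    for (int i = 0; i < ntxn; i++) {
        /* combine two rand() calls so the whole table is reachable
         * even where RAND_MAX is small */
        size_t r = ((size_t)rand() * 32768u + (size_t)rand()) % NRECORDS;
        table[r].balance += 1;        /* random record -> likely miss; line becomes dirty */
    }
}

/* DSS-like: one long read-only scan with aggregation. */
static long dss_like(const record_t *table)
{
    long sum = 0;
    for (size_t i = 0; i < NRECORDS; i++)   /* sequential -> good spatial locality */
        sum += table[i].balance;
    return sum;
}

int main(void)
{
    record_t *table = calloc(NRECORDS, sizeof(record_t));
    if (!table) return 1;
    oltp_like(table, 1 << 20);
    printf("scan result: %ld\n", dss_like(table));
    free(table);
    return 0;
}
```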
Test Systems
• 4-processor AlphaServer 4100 and 8-processor AlphaServer 8400 for hardware measurements
• IPROBE tool for hardware event counting
• DCPI for profiling
• ATOM for instrumenting and studying Oracle
• SimOS for evaluating architectural changes
  • models the Alpha 21164
  • simplified, but still retains some detail
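As a rough illustration of what event counting buys you, the sketch below derives CPI and misses-per-1000-instructions from raw counts. The counter values are placeholders and the arithmetic is generic; IPROBE's and DCPI's actual interfaces are not shown here.

```c
/* Hypothetical post-processing of hardware event counts, in the spirit of
 * what IPROBE-style event counting enables.  The raw counts below are
 * placeholder values only; the real tools' interfaces are not shown. */
#include <stdio.h>

int main(void)
{
    /* Placeholder counter values (would come from the event-counting tool). */
    double cycles        = 1.25e9;
    double instructions  = 0.50e9;
    double icache_misses = 2.0e7;
    double dcache_misses = 1.5e7;
    double bcache_misses = 4.0e6;

    printf("CPI                 = %.2f\n", cycles / instructions);
    printf("Icache misses/1000i = %.2f\n", 1000.0 * icache_misses / instructions);
    printf("Dcache misses/1000i = %.2f\n", 1000.0 * dcache_misses / instructions);
    printf("Bcache misses/1000i = %.2f\n", 1000.0 * bcache_misses / instructions);
    return 0;
}
```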
Aspects of Testing
• three scaling issues: memory size, I/O bandwidth, and runtime
• scale down the database
• adjust the block buffer cache sizes
• OLTP and DSS: warm up the SGA before measuring; scale the database so it is memory resident
• Web Index: no scaling; runs on the same system as the original
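A minimal sketch of the warm-up discipline mentioned above, with a plain byte array standing in for the scaled-down database/SGA: touch the working set once before the measured interval so cold misses and startup effects do not distort the numbers. The buffer size and "query" loop are illustrative assumptions.

```c
/* Hypothetical measurement harness for the warm-up step described above.
 * A plain byte array stands in for the scaled-down database / Oracle SGA:
 * touch the working set once before the measured interval so cold misses
 * and startup effects do not distort the results.  Sizes are illustrative. */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define WORKING_SET (64u * 1024 * 1024)   /* stand-in for the scaled-down database */

static long run_queries(const char *buf, size_t n)
{
    long sum = 0;
    for (size_t i = 0; i < n; i += 64)    /* touch one 64-byte cache line per step */
        sum += buf[i];
    return sum;
}

int main(void)
{
    char *buf = calloc(WORKING_SET, 1);
    if (!buf) return 1;

    run_queries(buf, WORKING_SET);        /* warm-up pass: not measured */

    clock_t t0 = clock();                 /* measured interval starts here */
    long sum = run_queries(buf, WORKING_SET);
    clock_t t1 = clock();

    printf("sum=%ld  time=%.3f s\n", sum, (double)(t1 - t0) / CLOCKS_PER_SEC);
    free(buf);
    return 0;
}
```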
Hardware Results
• OLTP: higher CPI, possibly an artifact of TPC-B
  • long secondary cache latency
  • many primary cache misses, especially instruction cache
  • dirty-miss latency is significant; lots of processor-to-processor communication
• DSS: lower CPI shows this configuration works well
  • the only suggestion is larger first-level caches
• AltaVista: runs on the 8400, just like the production system
  • good CPI, well-written code
  • first-level caches are what matter
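One conventional way to read these results (a standard decomposition, not a formula taken from the slides) is to split CPI into a base component plus memory stall components, one per miss type:

\[
\mathrm{CPI} \;\approx\; \mathrm{CPI}_{\text{base}}
  \;+\; \sum_{l \,\in\, \{\text{Icache},\,\text{Dcache},\,\text{Bcache},\,\text{dirty}\}}
    \frac{\text{misses}_l}{\text{instruction}} \times \text{penalty}_l
\]

Under this view, OLTP's high CPI is dominated by the secondary/board cache and dirty-miss terms, while DSS and AltaVista keep the miss-per-instruction terms small enough that first-level cache behavior matters most.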
Simulator Results
• simulator resembles the hardware, but some cache and consistency differences lead to different timing
• cycle counts and miss rates are close to the hardware measurements
• OLTP: vary associativity and Bcache size
  • idle time increases when server processes cannot hide I/O
  • many cache intricacies...
• larger caches = fewer replacement and instruction misses; more important for OLTP than DSS
• larger lines = more true sharing, fewer cold misses
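The cache-size and line-size trends above can be reproduced with a toy trace-driven cache model. The sketch below is a hypothetical, drastically simplified simulator (nothing like SimOS), and it classifies misses crudely as "fill of an empty way" versus "eviction of a resident line" as rough proxies for cold and replacement misses.

```c
/* Hypothetical toy cache simulator illustrating the trends noted above:
 * a larger cache mostly removes eviction (replacement) misses, while larger
 * lines reduce the number of distinct blocks and hence empty-way (cold) fills. */
#include <stdio.h>
#include <stdlib.h>

typedef struct {
    long *tags;          /* tags[set * assoc + way], -1 = invalid */
    long *lru;           /* last-use timestamp per way */
    int sets, assoc, line;
    long empty_fills;    /* misses that filled an invalid way (~cold misses) */
    long evictions;      /* misses that evicted a resident line (~replacements) */
    long tick;
} cache_t;

static cache_t *cache_new(int size_bytes, int assoc, int line)
{
    cache_t *c = calloc(1, sizeof(*c));
    c->sets = size_bytes / (assoc * line);
    c->assoc = assoc;
    c->line = line;
    c->tags = malloc((size_t)c->sets * assoc * sizeof(long));
    c->lru  = calloc((size_t)c->sets * assoc, sizeof(long));
    for (long i = 0; i < (long)c->sets * assoc; i++) c->tags[i] = -1;
    return c;
}

static void cache_access(cache_t *c, unsigned long addr)
{
    long block = (long)(addr / (unsigned long)c->line);
    int set = (int)(block % c->sets);
    long tag = block / c->sets;
    long *tags = &c->tags[(long)set * c->assoc];
    long *lru  = &c->lru[(long)set * c->assoc];
    c->tick++;

    for (int w = 0; w < c->assoc; w++)
        if (tags[w] == tag) { lru[w] = c->tick; return; }    /* hit */

    /* Miss: prefer an invalid way, otherwise evict the LRU resident line. */
    int victim = 0;
    for (int w = 0; w < c->assoc; w++) {
        if (tags[w] == -1) { c->empty_fills++; tags[w] = tag; lru[w] = c->tick; return; }
        if (lru[w] < lru[victim]) victim = w;
    }
    c->evictions++;
    tags[victim] = tag;
    lru[victim] = c->tick;
}

int main(void)
{
    /* Random byte addresses over a 4 MB footprint; two rand() calls are
     * combined so the range is covered even where RAND_MAX is small. */
    enum { FOOTPRINT = 4 << 20, NACCESS = 1 << 20 };
    unsigned long *trace = malloc(sizeof(unsigned long) * NACCESS);
    if (!trace) return 1;
    for (int i = 0; i < NACCESS; i++)
        trace[i] = ((unsigned long)rand() * 32768u + (unsigned long)rand()) % FOOTPRINT;

    struct { int size, assoc, line; } cfg[] = {
        { 1 << 20, 1, 64  },   /* 1 MB direct-mapped, 64 B lines  */
        { 4 << 20, 4, 128 },   /* 4 MB 4-way,        128 B lines */
    };
    for (int k = 0; k < 2; k++) {
        cache_t *c = cache_new(cfg[k].size, cfg[k].assoc, cfg[k].line);
        for (int i = 0; i < NACCESS; i++) cache_access(c, trace[i]);
        printf("%4d KB %d-way %3dB lines: empty fills=%ld evictions=%ld\n",
               cfg[k].size >> 10, cfg[k].assoc, cfg[k].line,
               c->empty_fills, c->evictions);
        free(c->tags); free(c->lru); free(c);
    }
    free(trace);
    return 0;
}
```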
Conclusions
• scaled OLTP and DSS workloads give a decent estimate of real performance
• a fairly narrow range of architectural issues was explored
• more processes per processor = better hiding of I/O latency, fewer dirty misses
• simulators gloss over important details for ease of use (timing, OS behavior, etc.)
Questions
• Can you get enough information by scaling down the database and playing tricks with the block buffer sizes?