This paper presents a simulation methodology for cost-effective modeling of a $2 million commercial server on a $2,000 PC. Key contributions include scaled and tuned benchmarks, workload tweaking, and techniques for coping with workload variability. The case study covers OLTP simulations based on TPC-C running on IBM DB2, demonstrating techniques for scaling the database, increasing concurrency, and reducing memory requirements. The results show how to shorten simulation times while maintaining accuracy, yielding insight into multiprocessor system performance and bottlenecks.
Simulating a $2M Commercial Server on a $2K PC
A.R. Alameldeen, M.M.K. Martin, C.J. Mauer, K.E. Moore, M. Xu, D.J. Sorin, M.D. Hill, D.A. Wood
Presented by: Derek Hower
Contributions • A cost- and time-efficient simulation methodology for multiprocessor systems • Tuned and scaled benchmarks • Techniques for dealing with workload variability • An extended timing simulator
Workload Tweaking • Wisconsin Commercial Workload Suite • OLTP – On-Line Transaction Processing • SPECjbb – Java Middleware • Apache – Static Web Server • Slashcode – Dynamic Web Server • Scaled to reduce memory and disk usage • Tuned on an actual multiprocessor server to discover bottlenecks
Case Study: OLTP • Based on TPC-C v3.0, using IBM DB2 V7.2 EEE • Scaled to 3 sales districts per warehouse, 30 customers per district, and 100 items per warehouse • Compared to the 10, 30,000, and 100,000 required by the TPC-C specification • Set up on a Sun E5000 • Disk images were then moved to the simulator
Case Study: OLTP (cont.) • Initial scaling • Reduced the entire workload to fit in 1 GB of memory (10 warehouses of roughly 100 MB each; see the sketch below) • Kernel/device tuning • Raised limits on semaphore usage, threads, locks, etc. • Separated the database from the kernel and spread it over 5 physical disks • Reducing contention • Increased the number of warehouses while keeping database size constant
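The back-of-the-envelope check below illustrates the scaling described on the last two slides. The district, customer, and item counts and the roughly 100 MB per warehouse come from the slides; the variable names and printout are assumptions for illustration, not the authors' code.

```python
# Scaling factors for the reduced TPC-C workload (values from the slides).
TPC_SPEC = {                      # minimums required by the TPC-C spec
    "districts per warehouse": 10,
    "customers per district": 30_000,
    "items per warehouse": 100_000,
}

SCALED = {                        # values used in the scaled-down workload
    "districts per warehouse": 3,
    "customers per district": 30,
    "items per warehouse": 100,
}

WAREHOUSES = 10                   # 10 warehouses of roughly 100 MB each
MB_PER_WAREHOUSE = 100

for key, spec_value in TPC_SPEC.items():
    factor = spec_value / SCALED[key]
    print(f"{key}: {SCALED[key]} vs. {spec_value} (about {factor:,.0f}x smaller)")

print(f"approximate footprint: {WAREHOUSES * MB_PER_WAREHOUSE} MB, fits in 1 GB")
```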
Case Study: OLTP (cont.) • Additional concurrency • Added more users
Simulation • Shorten simulations as much as possible while still maintaining accuracy • Start from warmed-up workload snapshots • Fix the simulation length by the number of transactions • Account for variability by introducing random memory access delays and by averaging multiple simulation runs (see the sketch below)
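A minimal sketch of the variability-handling idea: perturb memory access timing slightly in each run and average several runs. The performance metric, latency values, and perturbation range here are fabricated for illustration; this is not the authors' simulator.

```python
import random
import statistics

def one_simulation_run(base_latency_cycles: float, seed: int) -> float:
    """Hypothetical stand-in for one fixed-length (N-transaction) timing run.

    The methodology perturbs memory access times by a small random amount
    per run; here the perturbation model and the returned metric
    (cycles per transaction) are fabricated for illustration only.
    """
    rng = random.Random(seed)
    perturbed_latency = base_latency_cycles + rng.uniform(-2.0, 2.0)
    # ... the real run would simulate a fixed number of transactions ...
    return 10_000 + 50.0 * perturbed_latency + rng.gauss(0.0, 100.0)

def averaged_estimate(num_runs: int = 10, base_latency: float = 80.0):
    """Average several perturbed runs to smooth out workload variability."""
    results = [one_simulation_run(base_latency, seed) for seed in range(num_runs)]
    return statistics.mean(results), statistics.stdev(results)

if __name__ == "__main__":
    mean, stdev = averaged_estimate()
    print(f"cycles per transaction: {mean:.0f} +/- {stdev:.0f}")
```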
Timing • Added processor and memory timing models to Simics • Timing-first simulation (see the sketch below) • Memory model covers: • cache coherence • cache latencies and bandwidth • memory • interconnection network
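A rough sketch of the timing-first idea, assuming hypothetical TimingSim and FunctionalSim interfaces (the paper pairs its detailed timing model with Simics as the functional reference): the timing simulator executes each instruction in detail, the functional simulator then executes the same instruction as a golden reference, and the timing simulator resynchronizes on any mismatch.

```python
from dataclasses import dataclass

@dataclass
class StepResult:
    arch_state: dict            # registers, PC, visible memory updates, ...

class FunctionalSim:
    """Stand-in for the golden functional simulator (Simics in the paper)."""
    def execute_next_instruction(self) -> StepResult: ...

class TimingSim:
    """Stand-in for the detailed processor/memory timing simulator."""
    def execute_next_instruction(self) -> StepResult: ...
    def load_arch_state(self, state: dict) -> None: ...

def timing_first_step(timing_sim: TimingSim, functional_sim: FunctionalSim) -> None:
    """Advance both simulators by one instruction and keep them in sync.

    The timing simulator models the pipeline, caches, and interconnect in
    detail and updates its own architectural state; the functional simulator
    then executes the same instruction as a reference.  On a mismatch the
    timing simulator reloads the reference state, so gaps in the timing
    model cost a little accuracy locally instead of derailing the run.
    """
    timing_result = timing_sim.execute_next_instruction()
    golden_result = functional_sim.execute_next_instruction()
    if timing_result.arch_state != golden_result.arch_state:
        timing_sim.load_arch_state(golden_result.arch_state)
```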
Evaluation • Simulated a system using the Bandwidth Adaptive Snooping Hybrid (BASH) coherence protocol
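For context, BASH adapts between broadcast snooping and directory-style indirection based on observed interconnect utilization. The sketch below only illustrates that general policy; the threshold value and the way utilization is estimated are assumptions, not details from the BASH paper.

```python
def choose_miss_action(recent_utilization: float, threshold: float = 0.75) -> str:
    """Illustrative bandwidth-adaptive decision for a cache miss.

    When the interconnect is lightly loaded, broadcasting the miss to all
    nodes (snooping) gives the lowest latency; when utilization climbs,
    sending the request only to the home node conserves bandwidth.
    The threshold and utilization estimate are assumptions here.
    """
    if recent_utilization < threshold:
        return "broadcast"    # snoop: ask every node directly
    return "directory"        # indirect through the home node

# Example: a lightly loaded network favors broadcasting.
print(choose_miss_action(0.20))   # -> broadcast
print(choose_miss_action(0.90))   # -> directory
```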
Thoughts • Validation • Mentioned briefly but skirted the issue • Can we trust the data? • Is there a loss of generality when scaling and tuning workloads?