
SYSTEM-LEVEL PERFORMANCE METRICS FOR MULTIPROGRAM WORKLOADS


Presentation Transcript


  1. SYSTEM-LEVEL PERFORMANCE METRICS FOR MULTIPROGRAM WORKLOADS Presented by Ankit Patel Authors: Stijn Eyerman, Lieven Eeckhout

  2. Summary of this paper • Builds a theoretical foundation, from a mathematical standpoint, for measuring the performance of a given system • From whose perspective should we measure the performance of a given system? • The user • The system • A combination of both

  3. Current performance measurement • Researchers have reached a consensus that the performance metric of choice for assessing a single program’s performance is its execution time • For single-threaded programs, with a fixed instruction count and clock frequency, execution time is proportional to CPI (cycles per instruction) and inversely proportional to IPC (instructions per cycle)
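To make this relation concrete, here is a minimal Python sketch; the helper name, program size, CPI values, and clock frequency are made-up illustrations, not from the paper:

```python
def execution_time(instructions, cpi, freq_hz):
    """Execution time of a single-threaded program: T = N * CPI / f."""
    return instructions * cpi / freq_hz

# Hypothetical program: 200 million instructions on a 2 GHz core.
t_base = execution_time(200e6, 1.0, 2e9)  # 0.1 s at CPI = 1.0
t_fast = execution_time(200e6, 0.5, 2e9)  # 0.05 s: halving CPI halves time
```

Doubling IPC (halving CPI) halves execution time, which is why the two metrics are interchangeable for a single program.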

  4. Performance for multithreaded programs • Aggregate CPI (or IPC) alone is a poor performance metric. • Total execution time should be used when measuring performance.

  5. How should I measure system performance?

  6. System-level performance criteria • The criteria for evaluating multiprogram computer systems are based on the user’s perspective and the system’s perspective. • What is the user’s perspective? • How fast a single program executes • What is the system’s perspective? • Throughput

  7. It’s Time For Some Terminology

  8. Terminology • Turnaround time: Quantifies the time between submitting a job and its completion. • Response time: Measures the time between submitting a job and receiving its first response; this metric is important for interactive applications. • Throughput: Quantifies the number of programs completed per unit of time.

  9. Continued…. • Single-program mode: A single program has exclusive access to the computer system. It has all system resources at its disposal and is never interrupted or preempted during its execution. • Multiprogram mode: Multiple programs coexecute on the computer system.

  10. It’s Time For Some Mathematics and a Few More Terminologies

  11. Turnaround Time • Normalized Turnaround Time (NTT) of program i: NTT_i = T_i^MP / T_i^SP, the ratio of its multiprogram-mode execution time to its single-program-mode execution time (a slowdown, so NTT_i >= 1). • Average NTT: ANTT = (1/n) · Σ_i NTT_i (lower is better). • Max NTT: max_i NTT_i, the worst slowdown in the workload.
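The NTT metrics can be sketched in a few lines of Python; the function names and the example times below are made-up illustrations:

```python
def ntt(t_mp, t_sp):
    """Normalized turnaround time of one program: its slowdown (>= 1)."""
    return t_mp / t_sp

def antt(times_mp, times_sp):
    """Average NTT over the co-executing programs (lower is better)."""
    slowdowns = [mp / sp for mp, sp in zip(times_mp, times_sp)]
    return sum(slowdowns) / len(slowdowns)

def max_ntt(times_mp, times_sp):
    """Worst-case slowdown in the workload."""
    return max(mp / sp for mp, sp in zip(times_mp, times_sp))

# Two programs slowed down 3x and 2x by co-execution:
print(antt([3.0, 2.0], [1.0, 1.0]))     # 2.5
print(max_ntt([3.0, 2.0], [1.0, 1.0]))  # 3.0
```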

  12. System throughput • Normalized Progress of program i: NP_i = T_i^SP / T_i^MP = 1 / NTT_i. • System Throughput: STP = Σ_i NP_i (higher is better).
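Likewise, a minimal Python sketch of normalized progress and STP, with hypothetical per-program times:

```python
def stp(times_mp, times_sp):
    """System throughput: sum of each program's normalized progress
    NP_i = T_i_sp / T_i_mp (higher is better)."""
    return sum(sp / mp for mp, sp in zip(times_mp, times_sp))

# Two programs slowed down 3x and 2x by co-execution:
print(stp([3.0, 2.0], [1.0, 1.0]))  # 1/3 + 1/2 ≈ 0.83
```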

  13. Practical (Why I say practical???) • In practice (e.g., in simulation), each program’s single- and multiprogram-mode runs are compared over the same number of instructions, so the metrics can be computed from per-program cycle counts C_i. • Adjusted ANTT: ANTT = (1/n) · Σ_i C_i^MP / C_i^SP. • Adjusted STP: STP = Σ_i C_i^SP / C_i^MP.

  14. IPC Throughput (…keep this in mind…): • IPC throughput: the sum of per-program IPCs in multiprogram mode, Σ_i IPC_i^MP. • Weighted Speedup: Σ_i IPC_i^MP / IPC_i^SP, which equals STP. • Harmonic Average (Hmean): n / Σ_i (IPC_i^SP / IPC_i^MP), which equals 1 / ANTT.
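The IPC-based metrics are easy to relate to ANTT and STP in code; this is a sketch with made-up IPC values:

```python
def weighted_speedup(ipc_mp, ipc_sp):
    """Sum of per-program relative IPCs; equals STP."""
    return sum(mp / sp for mp, sp in zip(ipc_mp, ipc_sp))

def hmean_speedup(ipc_mp, ipc_sp):
    """Harmonic average of per-program relative IPCs; equals 1 / ANTT."""
    return len(ipc_mp) / sum(sp / mp for mp, sp in zip(ipc_mp, ipc_sp))

ipc_sp = [2.0, 1.0]  # hypothetical single-program IPCs
ipc_mp = [1.0, 0.5]  # both programs halved in multiprogram mode

print(weighted_speedup(ipc_mp, ipc_sp))   # 1.0 (the STP)
print(1 / hmean_speedup(ipc_mp, ipc_sp))  # 2.0 (the ANTT)
```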

  15. Fairness ideal: co-executing programs in multiprogram mode experience equal relative progress with respect to single-program mode • Fairness: the degree to which the co-executing programs’ normalized progress rates NP_i are equal. • Proportional Progress (for different priorities): each program’s normalized progress should be proportional to its priority weight.

  16. So….fairness becomes… fairness = min over pairs (i, j) of NP_i / NP_j = (min_i NP_i) / (max_j NP_j), where 1 means perfectly fair and values near 0 mean some program starves.
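One common formulation of this fairness metric, sketched in Python with made-up times (assuming fairness is the ratio of the smallest to the largest normalized progress):

```python
def fairness(times_mp, times_sp):
    """Ratio of the smallest to the largest normalized progress:
    1.0 = perfectly fair, values near 0 = some program starves."""
    progress = [sp / mp for mp, sp in zip(times_mp, times_sp)]
    return min(progress) / max(progress)

print(fairness([3.0, 2.0], [1.0, 1.0]))  # (1/3) / (1/2) ≈ 0.67
```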

  17. Enough theory……..How can I apply this to real-world performance measurement?

  18. OK……..Then let’s do a case study….

  19. Case study: Evaluating SMT fetch policies • What should be used in performance measurements? • Researchers should use multiple metrics to characterize multiprogram system performance. • The combination of ANTT and STP provides a clear picture of overall system performance as a balance between user-oriented program turnaround time and system-oriented throughput. • The case study involves user-level single-threaded workloads, but this does not affect the general applicability of the multiprogram performance metrics. • The ANTT-STP characterization is also applicable to multithreaded and full-system workloads. • The study used the ANTT and STP metrics to evaluate performance; for multithreaded full-system workloads, it used the cycle-count-based equations.

  20. Oops…I have to introduce a few more terminologies!!!

  21. Six SMT fetch policies • Icount: strives to fetch an equal number of instructions from all co-executing programs. • Stall fetch: stalls fetching from a program that experiences a long-latency load until the data returns from memory. • Predictive stall fetch: extends the stall fetch policy by predicting long-latency loads in the front-end pipeline. • MLP-aware stall fetch: predicts long-latency loads and their associated memory-level parallelism (MLP). • Flush: flushes a program’s instructions on long-latency loads. • MLP-aware flush: extends the MLP-aware stall fetch policy by flushing instructions if more than m instructions have been fetched since the first burst of long-latency loads.

  22. ….And that was the last theory…I promise!!!

  23. Simulation environment • Software used: SimPoint • 36 two-program workloads • 30 four-program workloads • Simulation points are chosen for SPEC CPU2000 benchmarks (200 million instructions each) • Four-wide superscalar, out-of-order SMT processor with an aggressive hardware data prefetcher with eight stream buffers

  24. The MLP-aware flush policy outperforms Icount for both the two- and four-program workloads • That is, it achieves a higher system throughput and a lower average normalized turnaround time, while achieving a comparable fairness level.

  25. The same is true when we compare MLP-aware flush against flush for the two-program workloads; for the four-program workloads, MLP-aware flush achieves a much lower normalized turnaround time than flush at a comparable system throughput. • MLP-aware stall fetch achieves a smaller ANTT, whereas predictive stall fetch achieves a higher STP.

  26. Interesting…….So what are you trying to conclude here???

  27. What does this show? • A delicate balance between user-oriented and system-oriented views of performance. • If user-perceived performance is the primary objective, MLP-aware stall fetch is the better fetch policy. • If system-perceived performance is the primary objective, predictive stall fetch is the policy of choice.

  28. While I was introducing the terminologies for IPC throughput, I said……keep this in mind…….remember?

  29. IPC throughput as a performance measurement is misleading • Using IPC throughput as a performance metric, you would conclude that the MLP-aware flush policy is merely comparable to the flush policy. However, in terms of STP, MLP-aware flush achieves significantly higher system throughput. Thus, IPC throughput is a potentially misleading performance metric.
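A small numeric sketch of how raw IPC throughput can hide a real STP difference; the IPC values are made up, not the paper’s measurements. Two policies produce identical aggregate IPC, but one starves the low-IPC program:

```python
def ipc_throughput(ipc_mp):
    """Raw aggregate IPC: ignores how progress is divided among programs."""
    return sum(ipc_mp)

def stp_from_ipc(ipc_mp, ipc_sp):
    """STP computed from per-program relative IPCs."""
    return sum(mp / sp for mp, sp in zip(ipc_mp, ipc_sp))

ipc_sp = [3.0, 1.0]    # hypothetical single-program IPCs
policy_a = [2.0, 0.2]  # starves the low-IPC program
policy_b = [1.2, 1.0]  # balances relative progress

# Both policies look identical by raw IPC throughput (≈ 2.2)...
print(ipc_throughput(policy_a), ipc_throughput(policy_b))
# ...but STP clearly separates them:
print(stp_from_ipc(policy_a, ipc_sp))  # ≈ 0.87
print(stp_from_ipc(policy_b, ipc_sp))  # ≈ 1.4
```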

  30. Summary • Gives a theoretical foundation for measuring system performance • Don’t judge the performance of multicore systems merely on IPC throughput or CPI • Use a quantitative approach to performance measurement for multicore systems; several such metrics are presented in this paper

  31. Questions, Comments, Concerns ???
