Evaluating Performance with Benchmarks

Benchmarks • Programs specifically chosen to measure performance • Must reflect typical workload of the user • Benchmark types • Real applications • Small benchmarks • Benchmark suites • Synthetic benchmarks

Real Applications • Workload: the set of programs a typical user runs day in and day out. • Using these real applications as metrics is a direct way of comparing the execution time of the workload on two machines. • Using real applications as metrics has certain drawbacks: • They are usually big • They take time to port to different machines • They take considerable time to execute • It is hard to observe the effect of a particular improvement technique

Comparing & Summarizing Performance • A is 100 times faster than B for program 1 • B is 10 times faster than A for program 2 • To summarize total performance, the arithmetic mean of the execution times is used: AM = (Time_1 + Time_2 + ... + Time_n) / n

Arithmetic Mean • If the programs in the workload are not each run an equal number of times, we have to use the weighted arithmetic mean. • Suppose that program 1 runs 10 times as often as program 2. Which machine is faster?
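The effect of weighting can be seen in a short Python sketch. The execution times below are hypothetical: they are chosen only so that A is 100 times faster on program 1 and B is 10 times faster on program 2, and they are not taken from the slides.

```python
# Hypothetical execution times in seconds (assumed, not from the slides).
# Index 0 is program 1, index 1 is program 2.
times = {
    "A": [1.0, 1000.0],
    "B": [100.0, 100.0],
}

def arithmetic_mean(ts):
    """Plain arithmetic mean of a list of execution times."""
    return sum(ts) / len(ts)

def weighted_arithmetic_mean(ts, weights):
    """Mean of execution times weighted by how often each program runs."""
    return sum(w * t for w, t in zip(weights, ts)) / sum(weights)

# Program 1 runs 10 times as often as program 2.
weights = [10, 1]

unweighted = {m: arithmetic_mean(ts) for m, ts in times.items()}
weighted = {m: weighted_arithmetic_mean(ts, weights) for m, ts in times.items()}

for machine in times:
    print(machine, unweighted[machine], round(weighted[machine], 1))
```

With these assumed times, B has the lower unweighted mean (100 s against 500.5 s), but A has the lower weighted mean (about 91.8 s against 100 s), so the answer depends on how often each program runs.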

Small Benchmarks • Small code segments that are common in many applications • For example, loops with a certain instruction mix • for (j = 0; j < 8; j++) S = S + A[j] * B[i-j]; • Good for architects and designers • Since small code segments are easy to compile and simulate, even by hand, designers use this kind of benchmark while working on a novel machine • Can be abused by compiler designers who introduce special-purpose optimizations targeted at a specific benchmark.
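As a rough illustration of a small benchmark, the loop above can be timed in Python with the standard `timeit` module. This is only a sketch: the function name `kernel`, the input arrays, and the repetition count are assumptions made for the example, and the loop is read as summing A[j] × B[i-j] for j = 0..7.

```python
import timeit

def kernel(a, b, i):
    """Sum of a[j] * b[i - j] for j = 0..7 (requires i >= 7)."""
    s = 0
    for j in range(8):
        s = s + a[j] * b[i - j]
    return s

# Hypothetical inputs, chosen only to make the result easy to check.
a = [1] * 8
b = list(range(16))
result = kernel(a, b, 7)

if __name__ == "__main__":
    # Time many repetitions, since a single run is too short to measure.
    seconds = timeit.timeit(lambda: kernel(a, b, 7), number=100_000)
    print(f"result={result}, 100000 runs took {seconds:.3f} s")
```

The measured time says something about this loop only; it does not predict how a full application will behave on the same machine.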

Benchmark Suites • SPEC (Standard Performance Evaluation Corporation) • Non-profit organization that aims to produce “fair, impartial and meaningful benchmarks for computers” • Began in 1989 with SPEC89 (CPU intensive) • Companies agreed on a set of real programs and inputs that they hope best reflect a typical user’s workload. • Valuable indicator of performance • Can still be abused • Updates are required as applications and their workloads change over time

SPEC Benchmark Sets • CPU Performance (SPEC CPU2006) • Graphics (SPECviewperf) • High-performance computing (HPC2002, MPI2007, OMP2001) • Java server applications (jAppServer2004) • a multi-tier benchmark for measuring the performance of Java 2 Enterprise Edition (J2EE) technology-based application servers. • Mail systems (MAIL2001, SPECimap2003) • Network File systems (SFS97_R1 (3.0)) • Web servers (SPEC WEB99, SPEC WEB99 SSL) • More information: http://www.spec.org/

SPECInt

SPECfp

SPEC CPU2006 – Summarizing • SPEC ratio: execution time measurements are normalized by dividing the execution time on a reference machine by the measured execution time • Example: a Sun Microsystems Fire V20z, which has an AMD Opteron 252 CPU running at 2600 MHz • The 164.gzip benchmark executes in 90.4 s. • The reference time for this benchmark is 1400 s. • The SPEC ratio for this benchmark is 1400/90.4 × 100 = 1548 (a unitless value) • The performances of the different programs in the suite are summarized using the “geometric mean” of the SPEC ratios.
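The SPEC ratio and its geometric-mean summary can be sketched in Python as follows. The gzip figures come from the slide; the two other ratios in `suite` are hypothetical and are included only to show how the summary is formed.

```python
import math

def spec_ratio(reference_time, measured_time, scale=100):
    """Reference time divided by measured time, scaled as in the slide's example."""
    return reference_time / measured_time * scale

def geometric_mean(ratios):
    """n-th root of the product of n ratios, computed via logarithms."""
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))

# 164.gzip: reference time 1400 s, measured time 90.4 s (from the slide).
gzip_ratio = spec_ratio(1400, 90.4)

# The other two ratios are made up for illustration only.
suite = [gzip_ratio, 1200.0, 1900.0]
summary = geometric_mean(suite)

print(round(gzip_ratio, 1), round(summary, 1))
```

One property of the geometric mean is that the ratio of two machines' summaries does not depend on which reference machine is used for normalization.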

Pentium III & Pentium 4

Comparing Pentium III and Pentium 4 Implementation efficiency?

SPEC WEB99

Power Consumption Concerns • Performance studied at different levels: • Maximum power • Intermediate level that conserves battery life • Minimum power that maximizes battery life • Intel Mobile Pentium & Pentium M: two available clock rates • Maximum • Reduced clock rate • Pentium M @ 1.6/0.6 GHz • Pentium 4-M @ 2.4/1.2 GHz • Pentium III-M @ 1.2/0.8 GHz

Three Intel Mobile Processors

Energy Efficiency

Synthetic Benchmarks • Artificial programs constructed to try to match the characteristics of a large set of programs. • Goal: Create a single benchmark program in which the execution frequency of instructions simulates the instruction frequency in a large set of benchmarks. • Examples: • Dhrystone, Whetstone • They are not real programs • Compiler and hardware optimizations can inflate the improvement far beyond what the same optimization would achieve with real programs

Amdahl’s Law in Computing • Improving one aspect of a machine by a factor of n does not improve the overall performance by the same factor. • Speedup = (Performance after improvement) / (Performance before improvement) • Speedup = (Execution time before improvement) / (Execution time after improvement) • Execution Time After Improvement = Execution Time Unaffected + (Execution Time Affected / n)

Amdahl’s Law • Example: Suppose a program runs in 100 s on a machine, with multiplication responsible for 80 s of this time. • How much do we have to improve the speed of multiplication if we want the program to run 4 times faster? • Can we improve the performance by a factor 5?
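The two questions above can be worked through with a small Python sketch of the execution-time formula. The function names are chosen for this example only.

```python
def time_after(unaffected, affected, n):
    """Execution time after the affected part is sped up by a factor of n."""
    return unaffected + affected / n

def required_factor(total, affected, target_speedup):
    """Factor n needed on the affected part to reach target_speedup.

    Returns None when the target cannot be reached by any finite n.
    """
    unaffected = total - affected
    target_time = total / target_speedup
    remaining = target_time - unaffected  # time budget left for the affected part
    if remaining <= 0:
        return None
    return affected / remaining

# 100 s program, 80 s of which is multiplication.
n_for_4x = required_factor(100, 80, 4)   # 25 = 20 + 80/n  ->  n = 16
n_for_5x = required_factor(100, 80, 5)   # 20 = 20 + 80/n  ->  no finite n

print(n_for_4x, n_for_5x)
```

Running 4 times faster requires multiplication to be 16 times faster. A factor of 5 is not reachable, because the 20 s that is unaffected already equals the target time of 20 s.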

Amdahl’s Law • The performance enhancement possible from a given improvement is limited by the amount that the improved feature is used. • In the previous example, it makes sense to improve multiplication since it takes 80% of the total execution time. • But after a certain amount of improvement, further effort to optimize multiplication yields only insignificant gains. • Law of Diminishing Returns • A corollary to Amdahl’s Law is to make the common case fast.

Examples • Suppose we enhance a machine making all floating-point instructions run five times faster. If the execution time of some benchmark before the floating-point enhancement is 10 seconds, what will the speedup be if half of the 10 seconds is spent executing floating-point instructions? • We are looking for a benchmark to show off the new floating-point unit described above, and want the overall benchmark to show a speedup of 3. One benchmark we are considering runs for 90 seconds with the old floating-point hardware. How much of the execution time would floating-point instructions have to account for in this program in order to yield our desired speedup on this benchmark?
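Both exercises follow directly from the execution-time formula; a minimal Python sketch, with function names chosen for this example, is:

```python
def speedup(total, affected, n):
    """Overall speedup when `affected` seconds of `total` run n times faster."""
    return total / ((total - affected) + affected / n)

def affected_time_for(total, n, target_speedup):
    """Seconds that must be affected to reach target_speedup.

    Solves total / target_speedup = (total - x) + x / n for x.
    """
    return (total - total / target_speedup) / (1 - 1 / n)

# First example: 10 s total, 5 s of floating point, 5x faster FP.
s1 = speedup(10, 5, 5)                     # 10 / (5 + 1) = about 1.67

# Second example: 90 s total, 5x faster FP, desired overall speedup of 3.
fp_seconds = affected_time_for(90, 5, 3)   # about 75 s
fp_fraction = fp_seconds / 90              # about 0.83

print(round(s1, 2), round(fp_seconds, 1), round(fp_fraction, 3))
```

So the first benchmark speeds up by about 1.67, and the second would need floating-point instructions to account for about 75 of its 90 seconds (roughly 83%).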

Remember • Total execution time is a consistent summary of performance • Execution Time = (IC × CPI) / f • For a given architecture, performance increases come from: • increases in clock rate (without too many adverse CPI effects) • improvements in processor organization that lower CPI • compiler enhancements that lower CPI and/or IC
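The three sources of improvement can be compared with a short Python sketch of the execution-time equation. All instruction counts, CPI values, and clock rates below are hypothetical and serve only to illustrate the formula.

```python
def execution_time(instruction_count, cpi, clock_hz):
    """Execution Time = (IC x CPI) / f, in seconds."""
    return instruction_count * cpi / clock_hz

# Hypothetical baseline: 2 billion instructions, CPI 1.5, 2 GHz clock.
base = execution_time(2e9, 1.5, 2.0e9)

# Each variant changes one factor (values are assumptions for illustration).
faster_clock = execution_time(2e9, 1.5, 2.5e9)         # higher clock rate
lower_cpi = execution_time(2e9, 1.2, 2.0e9)            # better organization
fewer_instructions = execution_time(1.6e9, 1.5, 2.0e9) # better compiler

for name, t in [("base", base), ("faster clock", faster_clock),
                ("lower CPI", lower_cpi), ("fewer instructions", fewer_instructions)]:
    print(f"{name}: {t:.2f} s")
```

In this made-up case each change alone brings the time from 1.5 s to 1.2 s, which shows that the three factors trade off against each other in the same equation.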
