Performance Evaluation of Architectures

Performance Evaluation of Architectures, Vittorio Zaccaria

Performance Evaluation • From the client perspective: • Response time (or latency): time to complete a task. • From the server perspective: • Throughput (or bandwidth): tasks completed per second.
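The two views can differ on the same system. The sketch below assumes a hypothetical, idealized server with a fixed number of identical workers that are always busy and ignores queueing delays. Under those assumptions, adding workers raises throughput, but each client still waits the same time for its own task.

```python
def latency_and_throughput(task_time_s: float, workers: int) -> tuple[float, float]:
    """Idealized server (hypothetical model): every task takes task_time_s seconds
    and `workers` tasks run in parallel; queueing delays are ignored.

    Returns (response time seen by one client in s, server throughput in tasks/s).
    """
    latency = task_time_s                # client view: time to complete one task
    throughput = workers / task_time_s   # server view: tasks completed per second
    return latency, throughput

for n in (1, 4):
    lat, thr = latency_and_throughput(2.0, n)
    print(f"{n} worker(s): latency = {lat} s, throughput = {thr} tasks/s")
```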

Speedup • X is n% faster than Y if: $\text{Speedup}(x,y) = \frac{\text{ExTime}(y)}{\text{ExTime}(x)} = 1 + \frac{n}{100}$

Performance and Speedup • Performance(A)=1/ExTime(A). • Speedup(x,y)= Performance(x)/Performance(y)

Exercise: • A executes a task in 10 s. • B executes the same task in 15 s. • Which is true? • A is 50% faster than B • A is 33% faster than B
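A minimal sketch of the speedup definitions applied to this exercise (the function names are our own). Speedup(A, B) = ExTime(B)/ExTime(A) = 15/10 = 1.5 = 1 + 50/100, so A is 50% faster than B. The 33% figure comes from (15 − 10)/15, the time saved relative to B's time, which is not what "n% faster" means under the definition above.

```python
def speedup(extime_x: float, extime_y: float) -> float:
    """Speedup(x, y) = ExTime(y) / ExTime(x) = Performance(x) / Performance(y)."""
    return extime_y / extime_x

def percent_faster(extime_x: float, extime_y: float) -> float:
    """n such that x is n% faster than y, i.e. Speedup(x, y) = 1 + n/100."""
    return (speedup(extime_x, extime_y) - 1) * 100

# Exercise: A takes 10 s, B takes 15 s for the same task.
print(speedup(10, 15))         # 1.5
print(percent_faster(10, 15))  # 50.0 -> "A is 50% faster than B"
```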

Exercise (15 min) • Linpack and Dhrystone benchmarks on several VAX models:

Exercise: • Calculate: • In the Linpack case: • Total speedup and average per-year speedup of the VAX 8600 over the VAX-11/780 • The same for the VAX 8550 over the VAX 8600 • In the Dhrystone case: • Total speedup and average per-year speedup of the VAX 8600 over the VAX-11/780 • The same for the VAX 8550 over the VAX 8600

Exercise: [Table: total speedup and average per-year speedup for each case]
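The VAX benchmark table is not part of this transcript, so the sketch below uses hypothetical numbers only. It assumes that "average per-year speedup" means the compound (geometric) yearly rate, i.e. the total speedup raised to the power 1/years. Dividing the total speedup by the number of years would instead assume linear growth.

```python
def total_speedup(perf_new: float, perf_old: float) -> float:
    """Total speedup of the newer model over the older one (ratio of performances)."""
    return perf_new / perf_old

def per_year_speedup(total: float, years: float) -> float:
    """Average yearly speedup, assuming the improvement compounds every year."""
    return total ** (1.0 / years)

# Hypothetical numbers, NOT the VAX data: a machine 4x faster than its
# predecessor, introduced 2 years later.
s = total_speedup(4.0, 1.0)
print(s, per_year_speedup(s, 2))
```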

Amdahl's Law

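Amdahl's Law • $\text{ExTime}_{\text{new}} = \text{ExTime}_{\text{old}} \times \left[(1 - \text{Fraction}_{\text{enhanced}}) + \frac{\text{Fraction}_{\text{enhanced}}}{\text{Speedup}_{\text{enhanced}}}\right]$ • $\text{Speedup}_{\text{overall}} = \frac{\text{ExTime}_{\text{old}}}{\text{ExTime}_{\text{new}}} = \frac{1}{(1 - \text{Fraction}_{\text{enhanced}}) + \frac{\text{Fraction}_{\text{enhanced}}}{\text{Speedup}_{\text{enhanced}}}}$ • As $\text{Speedup}_{\text{enhanced}} \to \infty$, $\text{Speedup}_{\text{overall}}$ approaches $\frac{1}{1 - \text{Fraction}_{\text{enhanced}}}$.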
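A minimal Python sketch of Amdahl's law and its limit (function names are our own). `fraction_enhanced` is taken as a fraction of the old execution time, which is what the formula requires. The usage lines reproduce the floating-point exercise (10% of the time made 2X faster) and show that even an enormous local speedup stays below the 1/(1 − Fraction_enhanced) bound.

```python
def amdahl_speedup(fraction_enhanced: float, speedup_enhanced: float) -> float:
    """Overall speedup when a fraction of the old execution time is sped up."""
    if not 0.0 <= fraction_enhanced <= 1.0:
        raise ValueError("fraction_enhanced must be in [0, 1]")
    # ExTime_new / ExTime_old = (1 - F) + F / S
    new_time_ratio = (1.0 - fraction_enhanced) + fraction_enhanced / speedup_enhanced
    return 1.0 / new_time_ratio

def amdahl_limit(fraction_enhanced: float) -> float:
    """Upper bound on the overall speedup as speedup_enhanced grows without bound."""
    return 1.0 / (1.0 - fraction_enhanced)

print(round(amdahl_speedup(0.1, 2.0), 3))           # FP exercise: 1.053
print(amdahl_speedup(0.1, 1e9), amdahl_limit(0.1))  # approaches, never exceeds, 1/0.9
```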

Exercise on Amdahl's Law • Floating-point instructions are improved to run 2X faster, but only 10% of the instructions executed are FP (assume they also account for 10% of the execution time). $\text{Speedup}_{\text{overall}} = ?$

Exercise on Amdahl's Law Solution: • $\text{ExTime}_{\text{new}} = \text{ExTime}_{\text{old}} \times (0.9 + 0.1/2) = 0.95 \times \text{ExTime}_{\text{old}}$ • $\text{Speedup}_{\text{overall}} = \frac{1}{0.95} = 1.053$

2nd Exercise on Amdahl's Law • Suppose we make the CPU 5X faster, at 5X its cost. • Suppose the CPU is busy 50% of the time and the base CPU cost is 1/3 of the cost of the entire system. • Is it worth upgrading the CPU? Compare speedup and cost!

2nd Exercise on Amdahl's Law • Speedup = 1/(0.5 + 0.5/5) = 1.67 • Increased cost = (2/3) + (1/3) × 5 = 2.33 → Cost grows 2.33X while performance improves only 1.67X: it is not worth upgrading the CPU!
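The comparison can be made explicit by checking performance per unit cost, i.e. speedup divided by the cost ratio; using that ratio as the decision criterion is our assumption. The data are those of the exercise.

```python
def amdahl_speedup(f: float, s: float) -> float:
    """Overall speedup when a fraction f of the old time is sped up by s."""
    return 1.0 / ((1.0 - f) + f / s)

cpu_fraction_time = 0.5        # the CPU is busy 50% of the time
cpu_fraction_cost = 1.0 / 3.0  # the CPU is 1/3 of the system cost
factor = 5.0                   # the CPU becomes 5X faster and 5X more expensive

speedup = amdahl_speedup(cpu_fraction_time, factor)
cost_ratio = (1.0 - cpu_fraction_cost) + cpu_fraction_cost * factor
print(f"speedup = {speedup:.2f}, cost ratio = {cost_ratio:.2f}")  # 1.67, 2.33
# Performance per unit cost drops below 1, so the upgrade does not pay off.
print("worth it" if speedup / cost_ratio > 1 else "not worth it")
```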

Performance Indexes • Response time: the latency to complete a task, including disk accesses, memory accesses, I/O activity, and time taken by other tasks running in parallel. • CPU time: excludes I/O wait time; it is the sum of the CPU user time and the CPU system time (OS).
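Python's standard `time` module exposes both indexes: `time.perf_counter()` is a wall-clock timer (response time), while `time.process_time()` returns the sum of the user and system CPU time of the current process. In this sketch, a `time.sleep` call stands in for an I/O wait (an assumption for illustration); it adds to the response time but not to the CPU time.

```python
import time

def measure(task):
    """Return (response_time, cpu_time) in seconds for one call of task()."""
    wall_start = time.perf_counter()  # wall clock: includes waiting
    cpu_start = time.process_time()   # user + system CPU time of this process
    task()
    return time.perf_counter() - wall_start, time.process_time() - cpu_start

def mixed_task():
    total = sum(i * i for i in range(200_000))  # CPU-bound part
    time.sleep(0.2)                             # stands in for an I/O wait
    return total

response, cpu = measure(mixed_task)
print(f"response time = {response:.3f} s, CPU time = {cpu:.3f} s")
```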

CPU time • $\text{CPUtime}(P) = \frac{\text{Clock cycles needed to execute } P}{\text{Clock frequency}}$

Average CPI • The average number of clock cycles per instruction (CPI) can be defined as: $\text{CPI}(P) = \frac{\text{Clock cycles needed to execute } P}{\text{Number of instructions}}$ • $\text{CPUtime} = T_{\text{clock}} \times \text{CPI} \times N_{\text{inst}} = \frac{\text{CPI} \times N_{\text{inst}}}{f}$

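Aspects of CPU performance • $\text{CPU time} = \frac{\text{Seconds}}{\text{Program}} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Cycle}}$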
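A direct transcription of the CPU time equation as a sketch; the program parameters below are hypothetical. Each factor in the product corresponds to one term of the equation.

```python
def cpu_time(instruction_count: int, cpi: float, clock_hz: float) -> float:
    """CPU time = Instructions x (Cycles / Instruction) x (Seconds / Cycle)."""
    clock_period = 1.0 / clock_hz  # seconds per cycle
    return instruction_count * cpi * clock_period

def cpu_time_from_frequency(instruction_count: int, cpi: float, clock_hz: float) -> float:
    """Equivalent form: (CPI x N_inst) / f."""
    return cpi * instruction_count / clock_hz

# Hypothetical program: 2e9 instructions, average CPI 1.5, 3 GHz clock.
print(cpu_time(2_000_000_000, 1.5, 3e9))  # about 1.0 s
```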

Aspects of CPU performance • The CPI can vary among instructions: • $\text{CPI}_i$ is the number of clock cycles needed by instruction type $i$ • $\text{IC}_i$ is the number of times instructions of type $i$ are executed • $\text{CPU time} = \text{CycleTime} \times \sum_{i=1}^{n} \text{CPI}_i \times \text{IC}_i$

Overall CPI • The overall CPI can be expressed as (CPU clock cycles)/(instruction count): $\text{CPI} = \sum_{i=1}^{n} \text{CPI}_i \times \frac{\text{IC}_i}{\text{Instruction count}}$ • Invest resources where time is spent!
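A sketch of the per-class formulas, using hypothetical instruction counts and CPIs. It also prints each class's share of the total cycles, which is the quantity "invest resources where time is spent" refers to: here the memory class dominates even though it is not the most frequent.

```python
def cpu_cycles(mix: dict[str, tuple[int, float]]) -> float:
    """Sum of IC_i * CPI_i over instruction classes; mix maps class -> (IC_i, CPI_i)."""
    return sum(ic * cpi for ic, cpi in mix.values())

def overall_cpi(mix: dict[str, tuple[int, float]]) -> float:
    """CPI = total cycles / total instruction count."""
    total_instructions = sum(ic for ic, _ in mix.values())
    return cpu_cycles(mix) / total_instructions

# Hypothetical instruction counts and per-class CPIs.
mix = {"alu": (600, 1.0), "mem": (300, 3.0), "branch": (100, 2.0)}
cycle_time = 1e-9  # hypothetical 1 GHz clock

print(overall_cpi(mix))
print(cycle_time * cpu_cycles(mix), "s")
for name, (ic, cpi) in mix.items():
    print(name, f"{ic * cpi / cpu_cycles(mix):.0%} of cycles")
```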

Exercise: A RISC processor shows the following statistics:

Base machine (Reg/Reg)
Op       Freq   Cycles
ALU      50%    1
Load     20%    5
Store    10%    3
Branch   20%    2

• Calculate the average CPI and the speedup with respect to: • The same machine with an improved D$ (Load cycles = 2) • The same machine with a branch CPI = 1 • The same machine with 2 ALUs working in parallel.

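Solution • Average CPI: 0.5×1 + 0.2×5 + 0.1×3 + 0.2×2 = 2.2 • Use Amdahl's law (or, equivalently, compare the new CPIs, since instruction count and clock are unchanged) to compute the overall speedup: • Improved cache (Load = 2 cycles): CPI = 1.6, speedup = 2.2/1.6 ≈ 1.38 • Improved branch (CPI = 1): CPI = 2.0, speedup = 2.2/2.0 = 1.10 • Two parallel ALUs (ALU time halved): CPI = 1.95, speedup = 2.2/1.95 ≈ 1.13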
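A sketch that recomputes the three cases from the exercise's instruction mix. The "2 parallel ALUs" case assumes that two ALU instructions can always be issued together, which halves the effective ALU cycles per instruction (an assumption, not stated on the slide).

```python
# Instruction mix from the exercise: class -> (frequency, cycles per instruction).
BASE = {"alu": (0.5, 1.0), "load": (0.2, 5.0), "store": (0.1, 3.0), "branch": (0.2, 2.0)}

def avg_cpi(mix: dict[str, tuple[float, float]]) -> float:
    """Average CPI = sum of frequency_i * CPI_i."""
    return sum(freq * cpi for freq, cpi in mix.values())

def with_cpi(mix: dict[str, tuple[float, float]], op: str, cpi: float) -> dict[str, tuple[float, float]]:
    """Copy of the mix with the CPI of one instruction class replaced."""
    new_mix = dict(mix)
    new_mix[op] = (mix[op][0], cpi)
    return new_mix

cpi_base = avg_cpi(BASE)
variants = {
    "cache": with_cpi(BASE, "load", 2.0),     # improved D$: loads take 2 cycles
    "branch": with_cpi(BASE, "branch", 1.0),  # branch CPI = 1
    "alu": with_cpi(BASE, "alu", 0.5),        # assumption: 2 ALUs halve ALU cycles
}
for name, mix in variants.items():
    print(f"{name}: CPI = {avg_cpi(mix):.2f}, speedup = {cpi_base / avg_cpi(mix):.3f}")
```

Since only the CPI changes, speedup is simply the old CPI divided by the new one; this matches Amdahl's law with Fraction_enhanced set to each class's share of the base cycles (for example, loads take 1.0/2.2 of the cycles).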

Exercise • Procedure calls in architecture A are very expensive. • Suppose we introduce a new architecture B, similar to A, such that: • A's clock is 5% faster than B's. • Loads/stores make up 30% of A's instructions. • B executes 30% fewer loads/stores than A. • Loads/stores, like all other instructions, require 1 clock cycle. • Compare the CPU times of A and B.

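Solution • Number of instructions of B: N_B = [1 − (0.3 × 0.3)] × N_A = 0.91 × N_A • Clock period of B: T_B = T_A × 1.05 • CPUtime_A = 1 × N_A × T_A • CPUtime_B = 0.91 × N_A × T_A × 1.05 × 1 ≈ 0.956 × CPUtime_A, so B is about 4.7% faster than A (1/0.9555 ≈ 1.047).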
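A short numerical check of the solution, with CPI = 1 for every instruction on both machines as in the exercise. Only ratios are computed, so N_A and T_A cancel out.

```python
ls_fraction_a = 0.30   # loads/stores are 30% of A's instructions
ls_reduction_b = 0.30  # B executes 30% fewer loads/stores than A
clock_ratio = 1.05     # A's clock is 5% faster, so T_B = 1.05 * T_A

n_b_over_n_a = 1 - ls_fraction_a * ls_reduction_b  # 1 - 0.09 = 0.91
time_b_over_time_a = n_b_over_n_a * clock_ratio    # CPI = 1 on both machines
print(f"N_B = {n_b_over_n_a:.2f} N_A, CPUtime_B = {time_b_over_time_a:.4f} CPUtime_A")
print(f"Speedup(B, A) = {1 / time_b_over_time_a:.3f}")
```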

MIPS • MIPS = millions of instructions per second: $\text{MIPS} = \frac{\text{Number of instructions}}{\text{Execution time (s)} \times 10^6} = \frac{\text{Clock frequency}}{\text{CPI} \times 10^6}$

MIPS (cont.) • Problem: MIPS depends heavily on the ISA, which makes it difficult to compare different ISAs. • It depends on the program. • It can vary inversely with performance! A complex instruction set can have a lower MIPS rating than a simple instruction set and still execute programs in less time.
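A sketch of how MIPS can invert the performance ranking, using two hypothetical machines with the same 100 MHz clock running the same program. The "complex ISA" executes fewer but slower instructions; all numbers are made up for illustration.

```python
def exec_time(instruction_count: float, cpi: float, clock_hz: float) -> float:
    """CPU time = N_inst * CPI / f."""
    return instruction_count * cpi / clock_hz

def mips(instruction_count: float, exec_time_s: float) -> float:
    """MIPS = instruction count / (execution time in s * 10^6)."""
    return instruction_count / (exec_time_s * 1e6)

clock = 100e6  # hypothetical 100 MHz clock, same on both machines
machines = {
    "complex ISA": (50e6, 4.0),  # (instruction count, CPI): fewer, slower instructions
    "simple ISA": (150e6, 1.5),  # more, faster instructions
}
for name, (n, cpi) in machines.items():
    t = exec_time(n, cpi, clock)
    print(f"{name}: time = {t:.2f} s, MIPS = {mips(n, t):.1f}")
```

The complex-ISA machine finishes first yet has the lower MIPS rating, because MIPS ignores how much work each instruction does.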

Relative MIPS • Relative MIPS of an architecture A, for the same program on both machines: $\text{MIPS}_{\text{rel}}(A) = \frac{T_{\text{CPU,reference}}}{T_{\text{CPU},A}} \times \text{MIPS}_{\text{reference}}$ • In the 1980s the reference architecture was the VAX-11/780.
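A one-function sketch of relative MIPS; the timings and the reference rating below are hypothetical. A machine that runs the program faster than the reference (smaller CPU time) gets a proportionally higher rating.

```python
def relative_mips(tcpu_reference: float, tcpu_a: float, mips_reference: float) -> float:
    """Relative MIPS of A = (T_ref / T_A) * MIPS_ref, same program on both machines."""
    return tcpu_reference / tcpu_a * mips_reference

# Hypothetical: 50 s on the reference machine, 10 s on A,
# reference machine rated at 1 MIPS.
print(relative_mips(50.0, 10.0, 1.0))  # 5.0
```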
