
Performance Evaluation of Architectures

Performance Evaluation of Architectures. Vittorio Zaccaria. Performance Evaluation. From the client perspective: response time (or latency): time to run the task. From the server perspective: Throughput (or bandwidth): tasks executed per second. Speedup. X is n% faster than Y if:


Presentation Transcript


  1. Performance Evaluation of Architectures Vittorio Zaccaria

  2. Performance Evaluation • From the client perspective: • response time (or latency): time to run the task. • From the server perspective: • Throughput (or bandwidth): tasks executed per second.

  3. Speedup • X is n% faster than Y if: $\mathrm{Speedup}(x, y) = \dfrac{\mathrm{ExTime}(y)}{\mathrm{ExTime}(x)} = 1 + \dfrac{n}{100}$

  4. Performance and Speedup • Performance(A)=1/ExTime(A). • Speedup(x,y)= Performance(x)/Performance(y)

  5. Exercise: • A executes a task in 10 s. • B executes the same task in 15 s. • Which statement is true? • A is 50% faster than B • A is 33% faster than B
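
The exercise can be checked by applying the speedup definition from slide 3 directly. The sketch below is illustrative only; the function names `speedup` and `percent_faster` are assumptions introduced here, not part of the slides.

```python
def speedup(ex_time_x: float, ex_time_y: float) -> float:
    """Speedup of X over Y, defined as ExTime(Y) / ExTime(X)."""
    return ex_time_y / ex_time_x


def percent_faster(ex_time_x: float, ex_time_y: float) -> float:
    """The n in 'X is n% faster than Y', from Speedup = 1 + n/100."""
    return (speedup(ex_time_x, ex_time_y) - 1.0) * 100.0


a_time, b_time = 10.0, 15.0             # seconds, from the exercise
print(speedup(a_time, b_time))          # 1.5
print(percent_faster(a_time, b_time))   # 50.0
```

Under this definition A is 50% faster than B. The 33% figure is the reduction in execution time (5 s out of 15 s), which is a different quantity.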

  6. Exercise (15 min) • Linpack and Dhrystone benchmarks on several VAX models (table not reproduced in this transcript)

  7. Exercise: • Calculate: • In the Linpack case: • Total speedup and average per-year speedup from VAX8600 to VAX780 • The same for VAX8550 and VAX8600 • In the Dhrystone case: • Total speedup and average per-year speedup from VAX8600 to VAX780 • The same for VAX8550 and VAX8600

  8. Exercise • Speedup • Average per-year speedup

  9. Amdahl's Law

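  10. Amdahl’s Law • $\mathrm{ExTime}_{new} = \mathrm{ExTime}_{old} \times \left[ (1 - \mathrm{Fraction}_{enhanced}) + \dfrac{\mathrm{Fraction}_{enhanced}}{\mathrm{Speedup}_{enhanced}} \right]$ • $\mathrm{Speedup}_{overall} = \dfrac{\mathrm{ExTime}_{old}}{\mathrm{ExTime}_{new}} = \dfrac{1}{(1 - \mathrm{Fraction}_{enhanced}) + \dfrac{\mathrm{Fraction}_{enhanced}}{\mathrm{Speedup}_{enhanced}}}$ • As $\mathrm{Speedup}_{enhanced}$ goes to infinity, $\mathrm{Speedup}_{overall}$ approaches $1 / (1 - \mathrm{Fraction}_{enhanced})$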

  11. Exercise on Amdahl’s Law • Floating-point instructions are improved to run 2x faster, but only 10% of the instructions actually executed are FP. • $\mathrm{Speedup}_{overall} = ?$

  12. Exercise on Amdahl’s Law • Solution: $\mathrm{ExTime}_{new} = \mathrm{ExTime}_{old} \times (0.9 + 0.1/2) = 0.95 \times \mathrm{ExTime}_{old}$ • $\mathrm{Speedup}_{overall} = \dfrac{1}{0.95} = 1.053$
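
Amdahl's law is a one-line function, so the floating-point exercise is easy to verify numerically. This is a minimal sketch; `amdahl_speedup` is a name chosen here for illustration.

```python
def amdahl_speedup(fraction_enhanced: float, speedup_enhanced: float) -> float:
    """Overall speedup when a fraction of execution time is sped up."""
    if not 0.0 <= fraction_enhanced <= 1.0:
        raise ValueError("fraction_enhanced must be in [0, 1]")
    if speedup_enhanced <= 0.0:
        raise ValueError("speedup_enhanced must be positive")
    return 1.0 / ((1.0 - fraction_enhanced) + fraction_enhanced / speedup_enhanced)


# FP exercise: 10% of the work runs 2x faster.
print(round(amdahl_speedup(0.10, 2.0), 3))   # 1.053

# Upper bound when the enhanced part becomes infinitely fast: 1 / (1 - 0.10).
print(round(1.0 / (1.0 - 0.10), 3))          # 1.111
```

Even an unbounded FP speedup would yield only about 1.11x overall, because 90% of the time is untouched.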

  13. 2nd Exercise on Amdahl’s Law • Suppose we make the CPU 5x faster (at 5x the CPU cost). • Suppose that the CPU is used 50% of the time and that the base CPU cost is 1/3 of the cost of the entire system. • Is it worth upgrading the CPU? Compare speedup and cost!

  14. 2nd Exercise on Amdahl’s Law • Speedup = 1/(0.5 + 0.5/5) = 1.67 • Cost increase = (2/3) + (1/3) × 5 = 2.33 • The system costs 2.33x as much for a 1.67x speedup, so it is not worth upgrading the CPU!
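
The comparison between speedup and cost can be sketched as follows. The helper names `amdahl_speedup` and `relative_cost`, and the use of the speedup-to-cost ratio as the decision criterion, are assumptions made for this illustration.

```python
def amdahl_speedup(fraction_enhanced: float, speedup_enhanced: float) -> float:
    """Overall speedup from Amdahl's law."""
    return 1.0 / ((1.0 - fraction_enhanced) + fraction_enhanced / speedup_enhanced)


def relative_cost(component_share: float, component_cost_factor: float) -> float:
    """New system cost relative to the old one when one component's cost is scaled."""
    return (1.0 - component_share) + component_share * component_cost_factor


speedup = amdahl_speedup(0.5, 5.0)      # CPU busy 50% of the time, made 5x faster
cost = relative_cost(1.0 / 3.0, 5.0)    # CPU is 1/3 of system cost, now 5x dearer

print(round(speedup, 2))         # 1.67
print(round(cost, 2))            # 2.33
print(round(speedup / cost, 2))  # 0.71
```

A ratio below 1 means performance per unit of cost gets worse after the upgrade.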

  15. Performance Indexes • Response time = latency to complete a task, including disk accesses, memory accesses, I/O activity, and other parallel tasks. • CPU time = excludes I/O wait time; it is the sum of the CPU user time and the CPU system (OS) time.

  16. CPU time • $\mathrm{CPUtime}(P) = \dfrac{\text{clock cycles needed to execute } P}{\text{clock frequency}}$

  17. Average CPI • The average Clock Cycles per Instruction (CPI) can be defined as: $\mathrm{CPI}(P) = \dfrac{\text{clock cycles needed to execute } P}{\text{number of instructions}}$ • $\mathrm{CPUtime} = T_{clock} \times \mathrm{CPI} \times N_{inst} = \dfrac{\mathrm{CPI} \times N_{inst}}{f}$

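  18. Aspects of CPU performance • $\text{CPU time} = \dfrac{\text{Seconds}}{\text{Program}} = \dfrac{\text{Instructions}}{\text{Program}} \times \dfrac{\text{Cycles}}{\text{Instruction}} \times \dfrac{\text{Seconds}}{\text{Cycle}}$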

  19. Aspects of CPU performance • The CPI can vary among instructions: • $\mathrm{CPI}_i$ is the number of clock cycles needed by instruction type $i$ • $\mathrm{IC}_i$ is the number of times that instruction type $i$ is executed. • $\text{CPU time} = \text{CycleTime} \times \sum_{i=1}^{n} \mathrm{CPI}_i \times \mathrm{IC}_i$

  20. Overall CPI • The overall CPI can be expressed as (CPU clock cycles)/Instructions: $\mathrm{CPI} = \sum_{i=1}^{n} \mathrm{CPI}_i \times \dfrac{\mathrm{IC}_i}{\text{Instructions}}$ • Invest resources where time is spent!
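
The per-class formulas above translate directly into code. In this sketch the instruction mix and the 1 GHz clock are hypothetical values chosen only to make the arithmetic easy to follow, and the function names are assumptions.

```python
import math

# Each entry is (CPI_i, IC_i): cycles per instruction and execution count.
# The numbers are hypothetical, not taken from the slides.
MIX = [(1, 600), (4, 300), (2, 100)]


def total_cycles(mix):
    """Sum over instruction classes of CPI_i * IC_i."""
    return sum(cpi * count for cpi, count in mix)


def overall_cpi(mix):
    """Total clock cycles divided by total instruction count."""
    return total_cycles(mix) / sum(count for _, count in mix)


def cpu_time(mix, clock_hz):
    """CPU time in seconds: cycle time * sum(CPI_i * IC_i)."""
    return total_cycles(mix) / clock_hz


clock_hz = 1e9  # assumed 1 GHz clock
print(overall_cpi(MIX))  # 2.0

# The two CPU-time expressions agree: cycles / f == CPI * N_inst / f.
n_inst = sum(count for _, count in MIX)
assert math.isclose(cpu_time(MIX, clock_hz), overall_cpi(MIX) * n_inst / clock_hz)
```

Here the class with CPI 4 accounts for 1200 of the 2000 cycles while being only 30% of the instructions, which is where optimization effort would pay off most.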

  21. Exercise • A RISC processor shows the following statistics for the base machine (Reg/Reg):
      Op      Freq   Cycles
      ALU     50%    1
      Load    20%    5
      Store   10%    3
      Branch  20%    2
      • Calculate the average CPI of the base machine and the speedup obtained by: • The same machine with an improved data cache (Load cycles = 2) • The same machine with a branch CPI = 1 • The same machine with 2 ALUs working in parallel.

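  22. Solution • Average CPI: 0.5×1 + 0.2×5 + 0.1×3 + 0.2×2 = 2.2 • Instruction count and clock are unchanged, so each speedup is CPI_old/CPI_new. Amdahl’s law gives the same values when Fraction_enhanced is the share of execution time, not the instruction frequency: • Cache improved: CPI = 1.6, speedup = 2.2/1.6 = 1.375 • Branch improved: CPI = 2.0, speedup = 2.2/2.0 = 1.10 • ALU improved (assuming two ALUs halve the ALU CPI to 0.5): CPI = 1.95, speedup = 2.2/1.95 ≈ 1.13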
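
The sketch below recomputes the average CPI from the instruction mix and derives each speedup as the ratio of old to new CPI, which is valid because instruction count and clock do not change. Plugging the instruction frequency into Amdahl's law as the enhanced fraction would give different numbers, since that fraction refers to execution time. The names are illustrative, and halving the ALU CPI for the two-ALU case is an idealized assumption.

```python
# Instruction mix of the base machine: op -> (frequency, cycles).
BASE = {
    "ALU": (0.5, 1.0),
    "Load": (0.2, 5.0),
    "Store": (0.1, 3.0),
    "Branch": (0.2, 2.0),
}


def average_cpi(mix):
    """Frequency-weighted average of cycles per instruction."""
    return sum(freq * cycles for freq, cycles in mix.values())


def with_cycles(mix, op, cycles):
    """Copy of the mix with the cycle count of one op replaced."""
    changed = dict(mix)
    changed[op] = (mix[op][0], cycles)
    return changed


base_cpi = average_cpi(BASE)

variants = [
    ("improved cache", "Load", 2.0),
    ("improved branch", "Branch", 1.0),
    ("two ALUs (ideal)", "ALU", 0.5),  # assumption: ALU CPI is halved
]
for label, op, cycles in variants:
    new_cpi = average_cpi(with_cycles(BASE, op, cycles))
    print(f"{label}: CPI={new_cpi:.2f} speedup={base_cpi / new_cpi:.3f}")
```

The resulting speedups are approximately 1.375, 1.10, and 1.13. The load improvement wins because loads account for the largest share of cycles (1.0 of 2.2).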

  23. Exercise • Procedure calls in architecture A are very expensive. • Suppose we introduce a new architecture B, similar to A, such that: • A has a clock 5% faster than B. • The fraction of loads/stores in A is 30%. • B executes 30% fewer loads/stores than A. • Loads/stores require 1 clock cycle. • Compare the CPU times of A and B.

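  24. Solution • Number of instructions of B: $N_B = [1 - (0.3 \times 0.3)] \times N_A = 0.91 \times N_A$ • Clock period of B: $T_B = 1.05 \times T_A$ • $\mathrm{CPUtime}_A = 1 \times N_A \times T_A$ • $\mathrm{CPUtime}_B = 1 \times 0.91 \times N_A \times 1.05 \times T_A \approx 0.956 \times \mathrm{CPUtime}_A$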
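
The comparison can be reproduced with N_A and T_A normalized to 1 and a CPI of 1 for both machines, as the solution assumes. This is a sketch with illustrative names; it keeps the instruction-count factor 1 − 0.3 × 0.3 = 0.91 unrounded, which yields a ratio near 0.956.

```python
def cpu_time(n_instr: float, cpi: float, clock_period: float) -> float:
    """CPU time = instruction count * CPI * clock period."""
    return n_instr * cpi * clock_period


n_a, t_a = 1.0, 1.0                # normalized instruction count and period of A
load_store_fraction = 0.30         # share of loads/stores in A
removed = 0.30                     # B removes 30% of those loads/stores

n_b = n_a * (1.0 - load_store_fraction * removed)  # 0.91 * N_A
t_b = t_a * 1.05                                   # B's clock is 5% slower

ratio = cpu_time(n_b, 1.0, t_b) / cpu_time(n_a, 1.0, t_a)
print(round(ratio, 4))  # 0.9555
```

B comes out roughly 4.5% faster in CPU time: removing 9% of the instructions outweighs the 5% longer clock period.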

  25. MIPS • MIPS = millions of instructions per second. • $\mathrm{MIPS} = \dfrac{\text{number of instructions}}{\text{execution time (s)} \times 10^6} = \dfrac{\text{clock frequency}}{\mathrm{CPI} \times 10^6}$

  26. MIPS (cont.) • Problem: it depends heavily on the ISA, so different ISAs are difficult to compare. • It depends on the program. • It can vary inversely with performance! A complex instruction set can have a lower MIPS rating than a simple one and still execute programs in less time.
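
A small numeric example shows how a lower MIPS rating can go with a shorter execution time. The two machines below are hypothetical: the instruction counts, CPIs, and the 100 MHz clock are assumptions chosen for illustration, as are the function names.

```python
def exec_time(instruction_count: float, cpi: float, clock_hz: float) -> float:
    """Execution time in seconds: instructions * CPI / clock frequency."""
    return instruction_count * cpi / clock_hz


def mips(instruction_count: float, exec_time_s: float) -> float:
    """Millions of instructions per second."""
    return instruction_count / (exec_time_s * 1e6)


CLOCK_HZ = 100e6  # same assumed clock for both machines

# Same program compiled for two hypothetical instruction sets.
simple_time = exec_time(10e6, 1.0, CLOCK_HZ)    # many cheap instructions
complex_time = exec_time(4e6, 2.0, CLOCK_HZ)    # fewer, costlier instructions

simple_mips = mips(10e6, simple_time)
complex_mips = mips(4e6, complex_time)

print(f"simple:  {simple_time:.3f} s, {simple_mips:.0f} MIPS")
print(f"complex: {complex_time:.3f} s, {complex_mips:.0f} MIPS")
```

The complex-ISA machine rates about half the MIPS of the simple one yet finishes the program sooner, so MIPS alone ranks the two machines in the wrong order.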

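  27. Relative MIPS • Relative MIPS of an architecture A: $\mathrm{MIPS}_{relative}(A) = \dfrac{T_{CPU,\,reference\_arch}}{T_{CPU,\,A}} \times \mathrm{MIPS}_{reference\_arch}$, so a machine that runs the program in less time than the reference gets a higher rating. • In the 1980s the reference architecture was the VAX-11/780.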
