Performance

oshafer

Presentation Transcript


  1. Performance

  2. What is Performance?

  3. Terminology • Response Time: Time to do a task • Throughput: Work done per second • Latency: Time required to start a process • Throughput vs Latency: https://what-if.xkcd.com/31/

  4. Terminology • Elapsed time: total response time, including all aspects (processing, I/O, OS overhead, idle time); determines system performance • CPU time: time spent processing a given job; discounts I/O time and other jobs’ shares; can be split into “user” time and “system” time

  5. Terminology • A program used to take 20 minutes to run. Now it takes 15. What is the speedup? • Speedup = old time / new time = 20 / 15 = 1.33 times, or 33% faster
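
The speedup arithmetic on this slide can be checked with a few lines of Python. This is only a sketch; the function name `speedup` is made up for illustration and is not part of the slides.

```python
def speedup(old_time: float, new_time: float) -> float:
    """Ratio of old run time to new run time (values above 1 mean faster)."""
    if old_time <= 0 or new_time <= 0:
        raise ValueError("run times must be positive")
    return old_time / new_time


# Run times in minutes, taken from the slide: 20 before, 15 after.
s = speedup(20, 15)
print(f"speedup = {s:.2f}x, or {(s - 1) * 100:.0f}% faster")
```

The units cancel, so any consistent time unit works as input.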

  6. CPU Time • Performance: Response Time - How fast you can run programs • Performance = 1 / CPU Time, where CPU Time = Instruction Count × CPI × Clock Period

  8. CPU Clocking • CPU governed by a clock • [Timing diagram: clock cycles, each one clock period long; data transfer and computation happen during the cycle, then state is updated] • Clock period: duration of a clock cycle, e.g., 250 ps = 0.25 ns = 250×10^-12 s • Clock frequency (rate): cycles per second, e.g., 4.0 GHz = 4000 MHz = 4.0×10^9 Hz
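
Clock period and clock frequency are reciprocals, which is why the slide's 250 ps period corresponds to 4.0 GHz. A minimal sketch of the conversion follows; the helper names are hypothetical.

```python
def period_from_frequency(frequency_hz: float) -> float:
    """Clock period in seconds for a clock rate given in hertz."""
    return 1.0 / frequency_hz


def frequency_from_period(period_s: float) -> float:
    """Clock rate in hertz for a clock period given in seconds."""
    return 1.0 / period_s


# Values from the slide: 4.0 GHz and 250 ps describe the same clock.
period_ps = period_from_frequency(4.0e9) * 1e12
rate_ghz = frequency_from_period(250e-12) / 1e9
print(f"4.0 GHz -> {period_ps:.0f} ps per cycle")
print(f"250 ps  -> {rate_ghz:.1f} GHz")
```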

  9. Clock • Different subsystems, different clocks • [Figures: Pentium II and i7]

  10. GHz Myth • Different processors = different work/clock

  11. Clocks vs Instructions • Different jobs take different amounts of time • Common for different instructions to take differing number of clocks:

  12. CPI • CPI = Clocks Per Instruction • Inverse of instructions per cycle • Clocks = clock ticks = machine cycles • Instructions / cycle – maximize • CPI – minimize

  13. CPI • Programs involve different mixes of instruction types:

  14. CPI • CPI: Weighted average clocks per instruction • Sequence 1: IC = 5 • Clock Cycles= 2×1 + 1×2 + 2×3= 10 • Avg. CPI = 10/5 = 2.0

  15. CPI • CPI: Weighted average clocks per instruction • Sequence 2: IC = 6 • Clock Cycles= 4×1 + 1×2 + 1×3= 9 • Avg. CPI = 9/6 = 1.5

  16. CPI • A program is: 40% 1-cycle data ops, 20% 2-cycle data ops, 25% 3-cycle loads, 15% 1-cycle stores • What is CPI?

  17. CPI • A program is: 40% 1-cycle data ops, 20% 2-cycle data ops, 25% 3-cycle loads, 15% 1-cycle stores • What is CPI? • CPI = .4 × 1 + .2 × 2 + .25 × 3 + .15 × 1 = .4 + .4 + .75 + .15 = 1.7
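
The weighted-average CPI in slides 14 through 17 can be sketched as one small function that accepts either instruction counts or fractions as weights. The name `average_cpi` and the list-of-pairs layout are illustrative choices, and the stores are assumed to take 1 cycle, as the slide's arithmetic implies.

```python
def average_cpi(mix):
    """Weighted average clocks per instruction.

    mix is a list of (weight, cycles) pairs, where weight is either an
    instruction count or a fraction of the program.
    """
    total_weight = sum(weight for weight, _ in mix)
    total_cycles = sum(weight * cycles for weight, cycles in mix)
    return total_cycles / total_weight


sequence_1 = [(2, 1), (1, 2), (2, 3)]  # IC = 5, 10 clock cycles
sequence_2 = [(4, 1), (1, 2), (1, 3)]  # IC = 6, 9 clock cycles
program_mix = [(0.40, 1), (0.20, 2), (0.25, 3), (0.15, 1)]

print(f"Sequence 1 CPI: {average_cpi(sequence_1):.2f}")
print(f"Sequence 2 CPI: {average_cpi(sequence_2):.2f}")
print(f"Program mix CPI: {average_cpi(program_mix):.2f}")
```

Dividing by the total weight means the same function handles raw counts and percentages without a separate normalization step.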

  18. CPI Myths • Compiler A builds program: 2 million 1-cycle + 1 million 2-cycle = 3 million instructions over 4 million cycles • Compiler B builds program: 1.5 million 1-cycle + 1.2 million 2-cycle = 2.7 million instructions over 3.9 million cycles (faster program)

  19. CPI Myths • Compiler A: 3 million instructions & 4 million cycles, 3 / 4 = .75 instructions per cycle • Compiler B: 2.7 million instructions & 3.9 million cycles, 2.7 / 3.9 = .69 instructions per cycle • B is the faster program despite its worse IPC
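
The compiler comparison can be reproduced directly from the instruction counts on these slides. The sketch below assumes both programs run on the same clock, so fewer total cycles means less time; the `summarize` helper and the dictionary layout are hypothetical.

```python
def summarize(counts_by_cycles):
    """Return (instructions, cycles, instructions per cycle).

    counts_by_cycles maps cycles-per-instruction to how many
    instructions of that cost the program executes.
    """
    instructions = sum(counts_by_cycles.values())
    cycles = sum(cost * count for cost, count in counts_by_cycles.items())
    return instructions, cycles, instructions / cycles


compiler_a = {1: 2_000_000, 2: 1_000_000}
compiler_b = {1: 1_500_000, 2: 1_200_000}

for name, program in (("A", compiler_a), ("B", compiler_b)):
    instructions, cycles, ipc = summarize(program)
    print(f"Compiler {name}: {instructions} instructions, "
          f"{cycles} cycles, IPC = {ipc:.2f}")
```

Compiler B finishes in fewer cycles even though its instructions-per-cycle figure is lower, which is the point of the slide.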

  20. Instruction Count • Static Instruction Count : Number of instructions in compiled program • Dynamic Instruction Count : Number of instructions executed while running

  21. Instruction Count • Static Instruction Count : Number of instructions in compiled program • Dynamic Instruction Count : Number of instructions executed while running • Real run time • Loops, skipped instructions, etc…
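
To make the static versus dynamic distinction concrete, here is a toy interpreter for a made-up four-instruction program containing a loop. The instruction names and tuple format are invented for this sketch and do not correspond to any real instruction set.

```python
# Hypothetical toy program: each tuple is one "instruction".
program = [
    ("set", "i", 0),                # index 0: i = 0
    ("add", "i", 1),                # index 1: i += 1 (loop body)
    ("jump_if_less", "i", 10, 1),   # index 2: if i < 10, go to index 1
    ("halt",),                      # index 3: stop
]


def run(program):
    """Execute the toy program and return how many instructions ran."""
    registers = {}
    pc = 0          # index of the next instruction
    executed = 0    # dynamic instruction count
    while True:
        op = program[pc]
        executed += 1
        if op[0] == "set":
            registers[op[1]] = op[2]
            pc += 1
        elif op[0] == "add":
            registers[op[1]] += op[2]
            pc += 1
        elif op[0] == "jump_if_less":
            pc = op[3] if registers[op[1]] < op[2] else pc + 1
        elif op[0] == "halt":
            return executed
        else:
            raise ValueError(f"unknown instruction: {op[0]}")


static_count = len(program)    # instructions in the program text
dynamic_count = run(program)   # instructions actually executed
print(f"static = {static_count}, dynamic = {dynamic_count}")
```

The loop body runs ten times, so the dynamic count (22) is much larger than the static count (4); run time follows the dynamic count.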

  22. Measurement • Reliable performance measurement must measure all three factors • CPU Time = Instruction Count × CPI × Clock Period

  23. Real World • SPEC CPU Benchmark

  24. Example • Example 1: Calculate execution time ???

  27. Example • Example 2: Calculate CPI ???

  28. Example • Processor A runs at 3GHz. Program P runs on it with a dynamic instruction count of 1.2E8 with a CPI of 1.8. • Processor B runs at 3.4GHz. Program P runs on it with a dynamic instruction count of 1.5E8 with a CPI of 1.6. • How much faster is the program on machine A?

  29. Example • Processor A runs at 3GHz. Program P runs on it with a dynamic instruction count of 1.2E8 with a CPI of 1.8. • Processor B runs at 3.4GHz. Program P runs on it with a dynamic instruction count of 1.5E8 with a CPI of 1.6. • How much faster is the program on machine A? • Performance = 1 / CPU Time = Clock Rate / (Instruction Count × CPI)

  30. Example • Processor A runs at 3GHz. Program P runs on it with a dynamic instruction count of 1.2E8 with a CPI of 1.8. • Processor B runs at 3.4GHz. Program P runs on it with a dynamic instruction count of 1.5E8 with a CPI of 1.6. • How much faster is the program on machine A? • Performance A = 3E9 / (1.2E8 × 1.8) = 3E9 / 2.16E8 ≈ 13.89 runs per second

  31. Example • Processor A runs at 3GHz. Program P runs on it with a dynamic instruction count of 1.2E8 with a CPI of 1.8. • Processor B runs at 3.4GHz. Program P runs on it with a dynamic instruction count of 1.5E8 with a CPI of 1.6. • How much faster is the program on machine A? • Performance B = 3.4E9 / (1.5E8 × 1.6) = 3.4E9 / 2.4E8 ≈ 14.17 runs per second

  32. Example • Processor A runs at 3GHz. Program P runs on it with a dynamic instruction count of 1.2E8 with a CPI of 1.8. • Processor B runs at 3.4GHz. Program P runs on it with a dynamic instruction count of 1.5E8 with a CPI of 1.6. • How much faster is the program on machine A? • Speedup = Performance B / Performance A = 14.17 / 13.89 ≈ 1.02 • The program is not faster on machine A: machine B is about 2% faster
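
The processor comparison can be worked through numerically from the figures given in the problem. This sketch assumes CPU time = instruction count × CPI / clock rate; the function name `cpu_time_s` is illustrative. Carrying full precision, the ratio comes out at about 1.02 in favor of machine B.

```python
def cpu_time_s(instruction_count: float, cpi: float, clock_rate_hz: float) -> float:
    """CPU time in seconds: total cycles divided by cycles per second."""
    return instruction_count * cpi / clock_rate_hz


# Figures from the problem statement.
time_a = cpu_time_s(1.2e8, 1.8, 3.0e9)   # processor A
time_b = cpu_time_s(1.5e8, 1.6, 3.4e9)   # processor B

print(f"A: {time_a * 1e3:.1f} ms, B: {time_b * 1e3:.1f} ms")
print(f"time_A / time_B = {time_a / time_b:.3f}")
```

A ratio above 1 means A takes longer, so B is the faster machine here despite executing more instructions.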

  33. Pitfalls

  34. Pitfall 1 • Clock speed, CPI and instruction count all interact

  35. Clock Speedup Issue • Clock speed up not guaranteed to increase performance

  36. Limiting Circuits • Increasing clock may outpace time required by some circuits • This clock is too fast for the memory access:

  37. Limiting Circuits • Increasing clock may outpace time required by some circuits • Memory access would need two cycles:

  38. Clock Speedup Issue • Situation 1: 100 ms per clock • 3 clocks = 300 ms total

  39. Clock Speedup • Situation 2: 60 ms per clock • 3 clocks = 180 ms total

  40. Clock Speedup • Situation 3: 50 ms per clock • Part 3 needs 2 cycles • 4 clocks = 200 ms total
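
The three situations can be modeled by rounding each part's latency up to a whole number of clock cycles. The part latencies below (50, 50 and 60 ms) are assumptions chosen to be consistent with the slides' totals; the slides do not state them, and `total_time_ms` is a hypothetical helper.

```python
import math


def total_time_ms(clock_period_ms: int, part_latencies_ms) -> int:
    """Total time when each part occupies a whole number of clock cycles."""
    cycles = sum(math.ceil(latency / clock_period_ms)
                 for latency in part_latencies_ms)
    return cycles * clock_period_ms


# Assumed latencies: parts 1 and 2 fit in 50 ms, part 3 needs 60 ms.
parts = [50, 50, 60]

for period in (100, 60, 50):
    print(f"{period} ms clock -> {total_time_ms(period, parts)} ms total")
```

Under these assumptions the 60 ms clock is the fastest of the three: at 50 ms the third part spills into a second cycle and the total goes back up.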

  41. MIPS • MIPS = Millions of Instructions Per Second • Blends CPI and clock speed: MIPS = Clock Rate / (CPI × 10^6)

  42. MIPS Issues • Different architectures have instructions of different power: x += y * z; // x = r1, y = r2, z = r3 • Computer A: MLA r1, r2, r3, r1 • Computer B: MUL r4, r2, r3 then ADD r1, r1, r4

  43. MIPS Issues • MIPS can't compare different architectures • CPU Time = Instruction Count × CPI × Clock Period, and MIPS leaves out the instruction count
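
One way to see the problem with MIPS is to compute it for processors A and B from the earlier example (slide 28) and compare it with their actual run times. This is a sketch assuming MIPS = clock rate / (CPI × 10^6); the function names are illustrative.

```python
def mips(clock_rate_hz: float, cpi: float) -> float:
    """Millions of instructions per second from clock rate and CPI."""
    return clock_rate_hz / (cpi * 1e6)


def cpu_time_s(instruction_count: float, cpi: float, clock_rate_hz: float) -> float:
    """CPU time in seconds, which also accounts for instruction count."""
    return instruction_count * cpi / clock_rate_hz


# Processor figures reused from the earlier example in this deck.
mips_a = mips(3.0e9, 1.8)
mips_b = mips(3.4e9, 1.6)
time_a = cpu_time_s(1.2e8, 1.8, 3.0e9)
time_b = cpu_time_s(1.5e8, 1.6, 3.4e9)

print(f"MIPS ratio B/A:  {mips_b / mips_a:.3f}")
print(f"Speed ratio B/A: {time_a / time_b:.3f}")
```

The MIPS ratio suggests a much larger gap than the measured run times show, because B needs more instructions to do the same work.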

  44. Other Components

  45. The Complete System • Performance depends on much besides CPU:

  46. System Issues • Many technologies have followed exponential pattern like Moore’s law:

  47. System Issues • But not all:

  48. Amdahl's Law

  49. Amdahl's Law • Describes overall speedup of a system when we speed up one part of a system • f: fraction of time part is limiting factor • k: speedup of that part • 1 – f: fraction of time doing other stuff • S: overall speedup • Version 1: S = 1 / ((1 – f) + f / k)

  50. Amdahl's Law • Describes overall speedup of a system when we speed up one part of a system Version 2
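
Amdahl's Law in its standard form, S = 1 / ((1 – f) + f / k), can be sketched as below using the f, k and S symbols defined on the slide. The sample values f = 0.5 and k = 2 are hypothetical and chosen only to illustrate the calculation.

```python
def amdahl_speedup(f: float, k: float) -> float:
    """Overall speedup S when a fraction f of the time is sped up k times."""
    if not 0.0 <= f <= 1.0:
        raise ValueError("f must be between 0 and 1")
    if k <= 0:
        raise ValueError("k must be positive")
    return 1.0 / ((1.0 - f) + f / k)


# Hypothetical case: half of the run time is made twice as fast.
print(f"f = 0.5, k = 2:   S = {amdahl_speedup(0.5, 2):.3f}")

# Even an enormous k cannot push S past 1 / (1 - f).
print(f"f = 0.5, k = 1e9: S = {amdahl_speedup(0.5, 1e9):.3f}")
```

The second call shows the law's main consequence: the part that is not sped up caps the overall gain.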
