Performance • What is Performance?
Terminology • Response Time: Time to do a task • Throughput: Work done per second • Latency: Time required to start a process • Throughput vs Latency: https://what-if.xkcd.com/31/
Terminology • Elapsed time • Total response time, including all aspects • Processing, I/O, OS overhead, idle time • Determines system performance • CPU time • Time spent processing a given job • Discounts I/O time, other jobs’ shares • Can be split into “user” time and “system” time
Terminology • A program used to take 20 minutes to run. Now it takes 15. What is the speedup? • Speedup = old time / new time = 20 / 15 = 1.33 times, or 33% faster
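The speedup arithmetic can be sketched in a few lines of Python. The function name `speedup` is a hypothetical helper introduced for illustration, and the sketch assumes speedup is defined as old time divided by new time.

```python
def speedup(old_time: float, new_time: float) -> float:
    """Return how many times faster the new run is than the old run."""
    if old_time <= 0 or new_time <= 0:
        raise ValueError("times must be positive")
    return old_time / new_time


# The slide's numbers: 20 minutes before, 15 minutes after.
s = speedup(20, 15)
print(f"speedup = {s:.2f} times, i.e. {(s - 1) * 100:.0f}% faster")
```

The ratio is unitless, so minutes, seconds, or cycles all work as long as both arguments use the same unit.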
CPU Time • Performance: Response Time - How fast you can run programs • Performance = $\frac{\text{Time}}{\text{Program}} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Cycles}}{\text{Instruction}} \times \frac{\text{Time}}{\text{Cycle}}$
CPU Clocking • CPU governed by clock • Within each clock period: data transfer and computation, then update state • Clock period: duration of a clock cycle, e.g., 250 ps = 0.25 ns = $250 \times 10^{-12}$ s • Clock frequency (rate): cycles per second, e.g., 4.0 GHz = 4000 MHz = $4.0 \times 10^{9}$ Hz
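Clock period and clock rate are reciprocals of each other. A minimal sketch of the conversion, using the slide's 250 ps and 4.0 GHz figures; the helper names are hypothetical.

```python
def clock_rate_hz(period_s: float) -> float:
    """Clock rate in Hz for a given clock period in seconds."""
    return 1.0 / period_s


def clock_period_s(rate_hz: float) -> float:
    """Clock period in seconds for a given clock rate in Hz."""
    return 1.0 / rate_hz


period = 250e-12  # 250 ps
print(f"{period * 1e12:.0f} ps period -> {clock_rate_hz(period) / 1e9:.1f} GHz")

rate = 4.0e9  # 4.0 GHz
print(f"{rate / 1e9:.1f} GHz -> {clock_period_s(rate) * 1e12:.0f} ps period")
```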
Clock • Different subsystems, different clocks (figures: Pentium II, i7)
GHz Myth • Different processors = different work/clock
Clocks vs Instructions • Different jobs take different amounts of time • Common for different instructions to take differing number of clocks:
CPI • CPI = Clocks Per Instruction • Inverse of instructions per cycle • Clocks = clock ticks = machine cycles • Instructions / cycle – maximize • CPI – minimize
CPI • Programs involve different mixes of types:
CPI • CPI: Weighted average clocks per instruction • Sequence 1: IC = 5 • Clock Cycles= 2×1 + 1×2 + 2×3= 10 • Avg. CPI = 10/5 = 2.0
CPI • CPI: Weighted average clocks per instruction • Sequence 2: IC = 6 • Clock Cycles= 4×1 + 1×2 + 1×3= 9 • Avg. CPI = 9/6 = 1.5
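The two sequences can be recomputed with a short sketch. It assumes three instruction classes costing 1, 2, and 3 cycles, which is what the arithmetic on the slides implies; the function name `average_cpi` is hypothetical.

```python
def average_cpi(counts, cycles_per_class):
    """Return (total cycles, average CPI) for an instruction sequence.

    counts[i] is how many instructions of class i are executed,
    cycles_per_class[i] is how many clocks one such instruction takes.
    """
    instruction_count = sum(counts)
    total_cycles = sum(n * c for n, c in zip(counts, cycles_per_class))
    return total_cycles, total_cycles / instruction_count


CYCLES = (1, 2, 3)  # assumed cost of each instruction class

for name, counts in (("Sequence 1", (2, 1, 2)), ("Sequence 2", (4, 1, 1))):
    cycles, cpi = average_cpi(counts, CYCLES)
    print(f"{name}: IC = {sum(counts)}, cycles = {cycles}, CPI = {cpi}")
```

Sequence 2 executes more instructions but finishes in fewer cycles, which is why CPI has to be weighted by the instruction mix.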
CPI • A program is: 40% 1 cycle data ops, 20% 2 cycle data ops, 25% 3 cycle loads, 15% 1 cycle stores • What is CPI? • CPI = 0.4 × 1 + 0.2 × 2 + 0.25 × 3 + 0.15 × 1 = 0.4 + 0.4 + 0.75 + 0.15 = 1.7
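When the mix is given as fractions rather than counts, CPI is the fraction-weighted sum of the per-class cycle counts. A small sketch for this program; it assumes stores take 1 cycle, as the slide's own arithmetic does, and `cpi_from_mix` is a hypothetical name.

```python
def cpi_from_mix(mix):
    """CPI for a list of (fraction of instructions, cycles per instruction)."""
    total_fraction = sum(fraction for fraction, _ in mix)
    if abs(total_fraction - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    return sum(fraction * cycles for fraction, cycles in mix)


program = [
    (0.40, 1),  # 1 cycle data ops
    (0.20, 2),  # 2 cycle data ops
    (0.25, 3),  # 3 cycle loads
    (0.15, 1),  # stores, assumed to take 1 cycle
]
print(f"CPI = {cpi_from_mix(program):.2f}")
```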
CPI Myths • Compiler A builds program: 2 million 1-cycle + 1 million 2-cycle = 3 million instructions over 4 million cycles • Compiler B builds program: 1.5 million 1-cycle + 1.2 million 2-cycle = 2.7 million instructions over 3.9 million cycles (faster program)
CPI Myths • Compiler A: 3 million instructions & 4 million cycles, 3 / 4 = 0.75 instructions per cycle • Compiler B: 2.7 million instructions & 3.9 million cycles, 2.7 / 3.9 = 0.69 instructions per cycle • B is the faster program despite its worse IPC
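The compiler comparison can be reproduced with a short sketch. It assumes both binaries run on the same machine at the same clock rate, so fewer total cycles means a shorter run time; `summarize` is a hypothetical helper.

```python
def summarize(one_cycle: int, two_cycle: int):
    """Return (instructions, cycles, IPC) for a mix of 1- and 2-cycle instructions."""
    instructions = one_cycle + two_cycle
    cycles = one_cycle * 1 + two_cycle * 2
    return instructions, cycles, instructions / cycles


compiler_a = summarize(2_000_000, 1_000_000)
compiler_b = summarize(1_500_000, 1_200_000)

for name, (instructions, cycles, ipc) in (("A", compiler_a), ("B", compiler_b)):
    print(f"Compiler {name}: {instructions} instructions, {cycles} cycles, IPC = {ipc:.2f}")

# Same clock assumed: the program with fewer cycles finishes first.
faster = "A" if compiler_a[1] < compiler_b[1] else "B"
print(f"Faster program: {faster}")
```

B wins on cycles even though A has the better instructions-per-cycle figure, so IPC or CPI alone does not rank programs.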
Instruction Count • Static Instruction Count : Number of instructions in compiled program • Dynamic Instruction Count : Number of instructions executed while running • Real run time • Loops, skipped instructions, etc…
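Measurement • Reliable performance measurement must measure all three factors: instruction count, CPI, and clock period • Performance = $\frac{\text{Time}}{\text{Program}} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Cycles}}{\text{Instruction}} \times \frac{\text{Time}}{\text{Cycle}}$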
Real World • SPEC CPU Benchmark
Example • Example 1: Calculate execution time ???
Example • Example 2: Calculate CPI ???
Example • Processor A runs at 3GHz. Program P runs on it with a dynamic instruction count of 1.2E8 with a CPI of 1.8. • Processor B runs at 3.4GHz. Program P runs on it with a dynamic instruction count of 1.5E8 with a CPI of 1.6. • How much faster is the program on machine A? • Time on A = $\frac{1.2 \times 10^{8} \times 1.8}{3 \times 10^{9}}$ = 0.0720 s • Time on B = $\frac{1.5 \times 10^{8} \times 1.6}{3.4 \times 10^{9}}$ ≈ 0.0706 s • Speedup = $\frac{0.0720}{0.0706}$ ≈ 1.02 or 2%, in favor of machine B: with these numbers the program is slightly slower on A, not faster
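The comparison can be checked with a short script that applies execution time = instruction count × CPI / clock rate to both machines. The function name `cpu_time_s` is hypothetical; the inputs are the figures from the example.

```python
def cpu_time_s(instruction_count: float, cpi: float, clock_hz: float) -> float:
    """Execution time in seconds: instructions x cycles/instruction / cycles/second."""
    return instruction_count * cpi / clock_hz


time_a = cpu_time_s(1.2e8, 1.8, 3.0e9)  # Processor A
time_b = cpu_time_s(1.5e8, 1.6, 3.4e9)  # Processor B

print(f"A: {time_a * 1e3:.2f} ms")
print(f"B: {time_b * 1e3:.2f} ms")

# Ratio of the slower time to the faster time gives the speedup.
ratio = max(time_a, time_b) / min(time_a, time_b)
winner = "A" if time_a < time_b else "B"
print(f"Machine {winner} is {ratio:.3f} times faster")
```

B executes more instructions, but its lower CPI and higher clock rate slightly outweigh that, which is the interaction the next slide warns about.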
Pitfall 1 • Clock speed, CPI and instruction count all interact
Clock Speedup Issue • Clock speed up not guaranteed to increase performance
Limiting Circuits • Increasing clock may outpace time required by some circuits • This clock is too fast for the memory access:
Limiting Circuits • Increasing clock may outpace time required by some circuits • Memory access would need two cycles:
Clock Speedup Issue • Situation 1 : 100 ms per clock • 300 ms total
Clock Speedup • Situation 2 : 60 ms per clock • 180ms total
Clock Speedup • Situation 3: 50 ms per clock • 2 cycles for part 3 • 4 cycles, 200 ms total
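The three situations can be modeled by rounding each part's latency up to a whole number of clock periods. The latencies below are hypothetical values chosen to be consistent with the totals on the slides (the third part fits in 60 ms but not in 50 ms); the slides do not state them.

```python
def cycles_needed(latency_ms: int, period_ms: int) -> int:
    """Whole clock cycles a circuit with the given latency must be allowed."""
    return -(-latency_ms // period_ms)  # integer ceiling division


def total_time_ms(latencies_ms, period_ms: int) -> int:
    """Total time when every part is rounded up to whole clock periods."""
    return sum(cycles_needed(latency, period_ms) for latency in latencies_ms) * period_ms


# Hypothetical latencies for parts 1-3; part 3 stands in for the memory access.
PART_LATENCIES_MS = [50, 50, 60]

for period in (100, 60, 50):
    print(f"{period} ms per clock -> {total_time_ms(PART_LATENCIES_MS, period)} ms total")
```

Going from 60 ms to 50 ms per clock makes the total worse, because the slowest part now spills into a second cycle.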
MIPS • MIPS = Millions of Instructions Per Second • Blends CPI and clock speed: MIPS = clock rate / (CPI × $10^{6}$)
MIPS Issues • Different architectures have instructions of different power: x += y * z; // x = r1, y = r2, z = r3 • Computer A: MLA r1, r2, r3, r1 • Computer B: MUL r4, r2, r3 then ADD r1, r1, r4
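MIPS Issues • MIPS can't compare different architectures: it ignores how many instructions a program needs • Performance = $\frac{\text{Time}}{\text{Program}} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Cycles}}{\text{Instruction}} \times \frac{\text{Time}}{\text{Cycle}}$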
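A toy calculation shows why MIPS misleads across architectures. All numbers here are hypothetical: both computers are assumed to run at 1 GHz, Computer A's single multiply-accumulate is assumed to take 2 cycles, and Computer B's two instructions are assumed to take 1 cycle each.

```python
def mips(instruction_count: int, cycles: int, clock_hz: float) -> float:
    """Millions of instructions per second for a run of the given length."""
    seconds = cycles / clock_hz
    return instruction_count / (seconds * 1e6)


CLOCK_HZ = 1.0e9  # assumed clock rate for both computers

# x += y * z on each machine (instruction counts from the slide, cycles assumed)
a_instructions, a_cycles = 1, 2  # one multiply-accumulate
b_instructions, b_cycles = 2, 2  # multiply, then add

print(f"A: {a_cycles / CLOCK_HZ * 1e9:.0f} ns, {mips(a_instructions, a_cycles, CLOCK_HZ):.0f} MIPS")
print(f"B: {b_cycles / CLOCK_HZ * 1e9:.0f} ns, {mips(b_instructions, b_cycles, CLOCK_HZ):.0f} MIPS")
```

Under these assumptions both machines finish the statement in the same time, yet B reports twice the MIPS simply because it needs more instructions to do the same work.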
The Complete System • Performance depends on much besides CPU:
System Issues • Many technologies have followed an exponential pattern like Moore’s law:
System Issues • But not all:
Amdahl's Law • Describes overall speedup of a system when we speed up one part of a system • f: fraction of time the part is the limiting factor • k: speedup of that part • 1 – f: fraction of time doing other stuff • S: overall speedup • Version 1: $S = \frac{1}{(1 - f) + \frac{f}{k}}$
Amdahl's Law • Describes overall speedup of a system when we speed up one part of a system Version 2
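A minimal sketch of Amdahl's Law in its common form, S = 1 / ((1 - f) + f / k), with f and k as defined above. The function name and the sample values of f and k are hypothetical.

```python
def amdahl_speedup(f: float, k: float) -> float:
    """Overall speedup when a fraction f of the run time is sped up by a factor k."""
    if not 0.0 <= f <= 1.0:
        raise ValueError("f must be between 0 and 1")
    if k <= 0:
        raise ValueError("k must be positive")
    return 1.0 / ((1.0 - f) + f / k)


# Speed up the part that accounts for half of the run time.
for k in (2, 10, 1000):
    print(f"f = 0.5, k = {k}: S = {amdahl_speedup(0.5, k):.3f}")
```

However large k becomes, the overall speedup stays below 1 / (1 - f), so the untouched part of the system sets the ceiling.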