1 / 23

CHAPTER 2

CHAPTER 2. THE ROLE OF PERFORMANCE. Performance. Measure, Report, and Summarize Make intelligent choices

ova
Télécharger la présentation

CHAPTER 2

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. CHAPTER 2 THE ROLE OF PERFORMANCE

  2. Performance • Measure, Report, and Summarize • Make intelligent choices • Why is some hardware better than others for different programs?What factors of system performance are hardware related? (e.g., Do we need a new machine, or a new operating system?)How does the machine's instruction set affect performance?

  3. Objectives: Performance and Benchmarks • What do we mean by the performance of a computer and why are we concerned with it? • What's the best way to compare the performance of two machines? • What are benchmarks? How useful are they? • Performance can be used to: • Guide design decisions • Compare architectures/implementations/compilers • However, performance is in the eye of the beholder! • Response/Execution time - time between start and completion of a task Throughput - total amount of work done in a given time (number of job processes per unit time)

  4. Computer Performance: TIME, TIME, TIME • Response Time (latency)— How long does it take for my job to run?— How long does it take to execute a job?— How long must I wait for the database query? • Throughput— How many jobs can the machine run at once?— What is the average execution rate?— How much work is getting done? • If we upgrade a machine with a new processor what do we increase? • If we add a new machine to the lab what do we increase?

  5. Measuring Performance • Factors that affect performance: • How well the program uses the instructions of the machine • How well the underlying hardware implements the instructions • How well the memory and I/O systems perform • We will compare performance of different machines on the same task • Performance of machine X for a given program is defined as: Performance (X) = 1 / Execution Time(X) • If performance of X is better than Y: Execution Time (Y) > Execution Time (X) Performance (X) > Performance (Y) because: 1 / Execution Time(X) > 1 / Execution Time(Y) • Speedup of architecture X over YPerformance(X) / Performance(Y) = Execution Time(Y) / Execution Time(X) = n meaning: X is n times faster than Y

  6. Examples • Example 1: Machine A does a task in 20s, machine B does the same task in 25s. • What is the performance of each machine? (PA = 1/20,PB = 1/25) • How much faster is A than B? (what is the speedup?) (5/4) • Is "performance" a meaningful metric? (NO: depends on task) • Example 2:Machine A executes a program in 10s. • If machine B is 1.3x faster than A, what is the execution time on machine B? (1.3 = PB/PA = TA/TB: TB = 10/1.3) • If machine C is 1.5x slower than A, what is the execution time on machine C? (1.5 = PA/PC = TC/TA: TC = 15) • But how do we measure time?

  7. Measuring Computer Time • Unix time command output on a program provides: • Real time – time from invocation to termination • User CPU time - time CPU executes within this task • System CPU time - O/S tasks performed on behalf of this task • These measures (especially elapsed time) are what users perceive. Is this response time or throughput? • How do you measure portions of a program? How do you measure time on Windows?

  8. Clock cycles, Clock Rate and Execution Time • Computers are constructed using a clock that runs at a constant rate and determines when events take place in hardware. These discrete time intervals are called: clock cycles/ticks /clock periods/cycles. • The length of a clock period is the time for a complete clock cycle (e.g., 2 nanoseconds, 2 ns). • Clock rate is the number of cycles per second, often expressed in megahertz (MHz). Clock rate is the inverse of clock period: 1/cycle time. • What is the clock rate for a 2 ns cycle? 1/(2×10-9) = 500×106 = 500 MHz • What is the clock period for a machine with a clock rate of 800 MHz? • What is the clock period for a machine with a clock rate of 400 MHz? (Answer: 1/(800×106) = 1.25×10-9 sec; 1/(400×106) = 2.5×10-9 sec) • Relationship: faster clock rate, lower clock period.

  9. time Clock cycles, Clock Rate and Execution Time • Instead of reporting execution time in seconds, we often use cycles • Clock “ticks” indicate when to start activities (one abstraction): • cycle time (clock period) = time between ticks = seconds per cycle • clock rate (frequency) = cycles per second (1 Hz. = 1 cycle/sec) • A 200 MHz clock ticks • A 200 MHz. clock has cycle time:

  10. Clock cycles, Clock Rate and Execution Time • How do we calculate execution time? • Factors: • How many cycles to do all the work? • How long each cycle takes (Clock Period)? • Calculation of Time using Clock Period (cycle period, cycle length) CPU Exec Time =# clock cycles × clock period [Units] seconds = cycle × seconds/cycle • Example: Assume a program requires 200 × 106 cycles on a machine where each cycle takes 2 ns. What is the execution time? (200 × 106 × 2 × 10-9 = 0.4 sec) • Calculation of Time using Clock Rate (cycle frequency, clock frequency) Clock period = 1/Clock Rate Therefore: Execution Time = # clock cycles/clock rate [Units] seconds = cycles / (cycles/second) • Example: Assume a program requires 200 × 106 cycles on a machine with clock rate of 500 MHz. What is the execution time? (200 × 106/(500 × 106) = 0.4 sec)

  11. Examples • Example 1: Machine A runs at 500 MHz. Machine B runs at 650 MHz. Program1 requires 100 x 106 clock cycles on machine A and 1.2 times that many on machine B. Which machine is faster? By how much? Exec(A) = 100 × 106 / (500 × 106) = .2 seconds OR 100 × 106× 2 × 10-9 = 200 × 10-3 = .2 s Exec(B) = 120 × 106 / (650 × 106) = .18 seconds Machine B is .2/.18 = 1.11 times faster than A Compare: 650/500 = 1.3 times clock rate • Example 2: If a program takes 10 seconds on a 500 MHz machine. • How many cycles must it require? Cycles = 10 seconds × 500 × 106 cycles/second = 5000 × 106 cycles • What clock rate would be needed to achieve a 1.2 times speedup? (assuming clock cycles can stay the same) Target Execution: 10/1.2 = 8.3 sec 5000 × 106 / 8.33 = 602 MHz

  12. 1st instruction 2nd instruction 3rd instruction ... 4th 5th 6th How many cycles are required for a program? • Could assume that # of cycles = # of instructions • This assumption is incorrect: Different instructions take different amounts of time on different machines.Why? hint: remember that these are machine instructions, not lines of C code time

  13. Different numbers of cycles for different instructions • Multiplication takes more time than addition • Floating point operations take longer than integer ones • Accessing memory takes more time than accessing registers • Important point: changing the cycle time often changes the number of cycles required for various instructions (more later) time

  14. Cycles per Instruction, (CPI) • The number of Cycles per Instruction, CPI helps software designers avoid Instructions with a high CPI in favor of those with a low CPI. • Program CPI = Average number of clock cycles per instruction. • CPI depends on hardware implementation and instruction mix. We may calculate based on instruction counts OR based on relative instruction frequencies. • Example 1: Assume 3 types of instructions: • Arithmetic (=,+,-,*,/) takes 4 cycles • Conditional (if) takes 3 cycles • I/O takes 5 cycles Consider the following code segment: cin >> num1; cin >> num2; num3 = num1 + num2; if (num3 > 10)  cout << "yes"; else  cout << "no"; a) How many cycles to complete? (5+5+8+3+5=26 cycles)b) What's the average number of cycles per instruction?(26/4 = 5.2 cycles)

  15. Program Cycles per Instruction, (CPI) • CPI Calculation with Instruction Count:Assume CPI = CPU Clock Cycles/Instruction Count then overall program CPU Clock Cycles = Σ(CPIi× Counti)so that CPI = Overall Program Cycles/#Instructions • Example 2: Assume Class A CPI=1, Class B CPI=2, Class C CPI=3 Program requires 5 A, 3 B, 2 C instructions. What is the CPI? # CPU Cycles = 5 × 1 + 3 × 2 + 2 × 3 = 17 # Instructions = 5 + 3 + 2 = 10 ThereforeCPI = 17 cycles/10 instructions = 1.7 cycles/instruction • CPI Calculation with Relative Frequencies:Let fi be the relative frequency of instruction set i with CPIi cycles per instruction. Then Program CPI = Σ(CPIi× fi) • Example 3: Assume Class A CPI=1, Class B CPI=2, Class C CPI=3 and Program uses 50% A, 30% B, 20% C instructions. What is the CPI? CPI = .5 × 1 + .3 × 2 + .2 × 3 = 1.7

  16. Program Cycles per Instruction, (CPI) • Why is: CPI = Σ(CPIi× fi) true? CPI = CPU Clock Cycles/Instr. Count = Σ(CPIi× Counti)/Instr. Count = Σ(CPIi× Counti/Instr. Count) = Σ(CPIi× fi). Execution Time Execution Time = #Cycles × cycle time = (CPI × Instr. Count) × cycle time = Instruction Count × CPI × cycle time = (Instruction Count × CPI)/Clock Rate Example 1: How long would it take to execute a program with 100 × 106 instructions if CPI is 3 and clock rate is 500 MHz? (Answer: Time = 100 × 106× 3/(500 × 106) = 3/5 = 0.6 sec)

  17. Improving Computer Performance • Time = Instruction Count × CPI × cycle time Time = (Instructions / Program)×(# Cycles / Instruction)×(Seconds / Cycle) • For a given instruction set architecture, increases in CPU performance come from three sources: • Increases in clock rate • Improvements in processor organization that lower the CPI • Compiler enhancements that lower instruction count or generate lower average CPI • Which source was used to improve performance by: • Using Intel Pentium III 933 MHz instead of Intel Pentium III 800 MHz. • Using Intel Pentium IV instead of Intel Pentium III. • Using release versions instead of debug versions of programs. • Very important: When comparing two machines, you must consider all three components of execution time. If some factors are identical, then comparison can be based on just non-identical factors.

  18. Improving Computer Performance: RISC vs. CISC • Time = (Instructions / Program)×(# Cycles / Instruction)×(Seconds / Cycle) • Computer Architectures can be categorized s RISC or CISC (Reduced Instruction Set Computer vs. Complex Instruction Set Computer). • The CISC approach attempts to minimize the number of instructions per program, sacrificing the number of cycles per instruction. • Emphasizes improving hardware • Includes multi-clock complex instructions • RISC does the opposite, reducing the cycles per instruction at the cost of the number of instructions per program. • Emphasis on software • Includes single-clock reduced instruction only • Modern architectures emphasizes RISC

  19. Improving Computer Performance • Example 2: Machine 1and Machine 2 both have clock speeds of 500 MHz On Machine 1, program P requires 100 × 106 instructions & has a CPI of 2.5 On Machine 2, program P requires 90 × 106 instructions & has a CPI of 3 Which machine is faster? By how much?(T1 = 0.5 sec, T2 = 0.54 sec, Machine 1 is 1.08 times faster) Evaluating Computer Performance: • A company that uses the same set of programs day in, day out uses the same programs (workload) to compare systems (e.g. old vs. new) • What if a company does not fall in these categories?Use some kind of rating.

  20. Evaluating Computer Performance: Goal: simple metric where higher rating means better performance. Some ratings are: • Native MIPS • Peak MIPS • Relative MIPS • MOPS, MFLOPS For all these measures, there is a tendency to generalize, which is not valid. • Benchmarks: Programs specifically chosen to measure performance. Organization in charge of Benchmarks is: System Performance Evaluation Cooperative (SPEC). The rating is the SPEC ratio with respect to some standard machine. • The higher the SPEC ratio, the better the machine.

  21. SPEC ’89 for IBM Powerstation 550 • Compiler “enhancements” and performance

  22. Summary • Performance of a computer can be measured by: Response/Execution time - time between start and completion of a task and Throughput - total amount of work done in a given time. • Factors determining execution time are: Number of cycles to do all the work and how long each cycle takes (Clock Period). • CPI helps software designers avoid Instructions with a high CPI in favor of those with a low CPI where possible. • Program CPI can be obtained from Instruction Count or from the instruction relative frequencies. • Improving Performance means decreasingTime = Instruction Count × CPI × cycle time = (Instr. / Program)×(# Cycles / Inst.)×(Seconds / Cycle) by • Increases in clock rate • Improvements in processor organization that lower the CPI • Compiler enhancements that lower instruction count or generate lower average CPI • Ratings of Computer Performances are: MIPS, MOPS, MFLPOS and by using Benchmarks.

  23. Performance Formulas

More Related