
CpE 442 Introduction to Computer Architecture The Role of Performance

CpE 442 Introduction to Computer Architecture: The Role of Performance. Instructor: H. H. Ammar. Overview of today's lecture: review from last lecture; definition and measures of performance; summarizing performance and performance pitfalls.



Presentation Transcript


  1. CpE 442 Introduction to Computer Architecture: The Role of Performance. Instructor: H. H. Ammar

  2. Overview of Today’s Lecture: The Role of Performance • Review from Last Lecture • Definition and Measures of Performance • Summarizing Performance and Performance Pitfalls

  3. Review: What is "Computer Architecture"? ° Co-ordination of levels of abstraction: Application; Operating System; Compiler; Instruction Set Architecture; Instr. Set Proc. and I/O system; Digital Design; Circuit Design ° Under a set of rapidly changing forces

  4. Review: Levels of Representation
     High Level Language Program:
         temp = v[k];
         v[k] = v[k+1];
         v[k+1] = temp;
     ↓ Compiler
     Assembly Language Program:
         lw $15, 0($2)
         lw $16, 4($2)
         sw $16, 0($2)
         sw $15, 4($2)
     ↓ Assembler
     Machine Language Program:
         0000 1001 1100 0110 1010 1111 0101 1000
         1010 1111 0101 1000 0000 1001 1100 0110
         1100 0110 1010 1111 0101 1000 0000 1001
         0101 1000 0000 1001 1100 0110 1010 1111
     ↓ Machine Interpretation
     Control Signal Specification

  5. Review: Levels of Organization • Computer: SPARCstation 20 • Processor (SPARC): Control, Datapath • Memory • Devices: Input, Output

  6. Computer Architecture Simulation Tools
     1. The HASE Architecture Simulation Environment
     2. The New Compiler Technology simulation (shown in class)
     3. MIPS Assembly Language Simulators
        a. SPIM, a MIPS32 simulator: http://pages.cs.wisc.edu/~larus/spim.html
        b. MARS (MIPS Assembler and Runtime Simulator): http://courses.missouristate.edu/kenvollmar/mars/

  7. Review: Summary from Last Lecture
     • All computers consist of five components
       • Processor: (1) datapath and (2) control
       • (3) Memory
       • (4) Input devices and (5) Output devices
     • Not all "memory" is created equal
       • Cache: fast (expensive) memory is placed closer to the processor
       • Main memory: less expensive memory, so we can have more
     • Input and output (I/O) devices have the messiest organization
       • Wide range of speed: graphics vs. keyboard
       • Wide range of requirements: speed, standard, cost, etc.
       • Least amount of research (so far)

  8. Metrics of performance
     Levels, top to bottom: Application; Programming Language; Compiler; ISA; Datapath; Control; Function Units; Transistors, Wires, Pins
     Metrics shown alongside, from the application level down:
     • Response time, answers per month, operations per second
     • (Millions of) instructions per second: MIPS
     • (Millions of) floating-point operations per second: MFLOP/s
     • Megabytes per second
     • Cycles per second (clock rate)

  9. Relating Processor Metrics • CPU execution time = CPU clock cycles/pgm × clock cycle time • or CPU execution time = CPU clock cycles/pgm ÷ clock rate • Define CPI = the average clock cycles per instruction. CPI tells us something about the Instruction Set Architecture, the implementation of that architecture, and the program being measured • CPU clock cycles/pgm = Instructions/pgm × CPI • or CPI = CPU clock cycles/pgm ÷ Instructions/pgm
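The relations on this slide can be checked numerically. The following is a minimal Python sketch; the helper names `cpu_time` and `cpi_from_cycles` and the input values are illustrative assumptions, not part of the lecture.

```python
def cpu_time(instruction_count, cpi, clock_rate_hz):
    """CPU execution time in seconds: (instructions x CPI) / clock rate."""
    clock_cycles = instruction_count * cpi
    return clock_cycles / clock_rate_hz


def cpi_from_cycles(clock_cycles, instruction_count):
    """Average clock cycles per instruction for one program run."""
    return clock_cycles / instruction_count


if __name__ == "__main__":
    # Illustrative values only (assumed, not taken from the lecture):
    # 2 million instructions, CPI of 2.0, 500 MHz clock.
    print("CPU time (s):", cpu_time(2_000_000, 2.0, 500e6))
    print("CPI:", cpi_from_cycles(4_000_000, 2_000_000))
```

Dividing by the clock rate and multiplying by the clock cycle time are the same operation, so only the clock-rate form is implemented here.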

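  10. Aspects of CPU Performance
      $$\text{CPU time} = \frac{\text{Seconds}}{\text{Program}} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Cycle}}$$
      Table columns: instr. count, CPI, clock rate. Rows: Program, Compiler, Instr. Set Arch., Organization, Technology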

  11. Aspects of CPU Performance
      $$\text{CPU time} = \frac{\text{Seconds}}{\text{Program}} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Cycle}}$$
                      instr. count   CPI    clock rate
      Program              X         (x)
      Compiler             X         (x)
      Instr. Set           X          X
      Organization                    X         X
      Technology                                X

  12. Organizational Trade-offs
      Levels, top to bottom: Application; Programming Language; Compiler; ISA; Datapath; Control; Function Units; Transistors, Wires, Pins
      Quantities marked on the diagram: Instruction Mix, CPI, Cycle Time
      • Single-cycle processor design: CPI = 1, large cycle time (slow clock)
      • Multi-cycle processor design: CPI > 1, smaller cycle time (faster clock)

  13. CPI: "Average cycles per instruction"
      • CPI = (CPU Time × Clock Rate) / Instruction Count = Clock Cycles / Instruction Count
      $$\text{CPU time} = \text{ClockCycleTime} \times \sum_{i=1}^{n} \text{CPI}_i \times I_i$$
      $$\text{CPI} = \sum_{i=1}^{n} \text{CPI}_i \times F_i, \quad \text{where } F_i = \frac{I_i}{\text{Instruction Count}} \text{ is the "instruction frequency"}$$
      Invest resources where time is spent!

  14. Example: Base Machine (Reg / Reg), typical mix
      Op      Freq(Fi)  CPI(i)  Fi × CPI(i)  % Time
      ALU     50%       1       0.5          33%
      Load    20%       2       0.4          27%
      Store   10%       2       0.2          13%
      Branch  20%       2       0.4          27%
      Total                     1.5
      The CPI = 1.5 cycles per instruction
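The table above can be reproduced with a short Python sketch of the frequency-weighted CPI sum. The frequencies and per-class cycle counts come from the slide; the names `BASE_MIX`, `average_cpi`, and `time_share` are hypothetical, chosen for this illustration.

```python
# Instruction mix from the slide: class -> (frequency F_i, cycles CPI_i).
BASE_MIX = {
    "ALU": (0.50, 1),
    "Load": (0.20, 2),
    "Store": (0.10, 2),
    "Branch": (0.20, 2),
}


def average_cpi(mix):
    """Overall CPI: the sum of F_i * CPI_i over all instruction classes."""
    return sum(freq * cycles for freq, cycles in mix.values())


def time_share(mix):
    """Fraction of execution time spent in each instruction class."""
    total = average_cpi(mix)
    return {op: freq * cycles / total for op, (freq, cycles) in mix.items()}


if __name__ == "__main__":
    print(f"CPI = {average_cpi(BASE_MIX):.2f}")
    for op, share in time_share(BASE_MIX).items():
        print(f"{op:<6} {share:.0%}")
```

The time shares show where the cycles go: ALU operations are half of the instructions but only a third of the time, which is the point of "invest resources where time is spent".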

  15. Assume a program of 1 million instructions. Compare the performance of the Base Machine (B), with the above CPI and a 1 GHz clock, against the Enhanced Machine (E), with a 1.333 GHz clock and a one-cycle increase for load/store and branch instructions.
      Enhanced Machine (Reg / Reg)
      Op      Freq  CPI(i)  Freq × CPI(i)  % Time
      ALU     50%   1       0.5            25%
      Load    20%   3       0.6            30%
      Store   10%   3       0.3            15%
      Branch  20%   3       0.6            30%
      Total                 2.0
      The CPI = 2.0 cycles per instruction

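  16. Comparing the performance of two machines
      • Perf. of machine X = 1 / exec. time of prog. on machine X
      • Perf. of E / Perf. of B = exec. time of B / exec. time of E
      • Clock cycle time is 1 ns for B (1 GHz) and about 0.75 ns for E (1.333 GHz); the instruction count is the same and cancels
      • Perf. of E / Perf. of B = (1.5 × 1) / (2.0 × 0.75) = 1
      • Performance of B equals that of E: no gain in performance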
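The comparison can be worked through in a few lines of Python. This is a sketch using the figures stated on the slides (1 million instructions, CPI 1.5 at 1 GHz for B, CPI 2.0 at 1.333 GHz for E); the function and variable names are assumptions made for the example.

```python
def exec_time(instruction_count, cpi, clock_rate_hz):
    """Execution time in seconds for a program on a given machine."""
    return instruction_count * cpi / clock_rate_hz


INSTRUCTIONS = 1_000_000

# Base machine B: CPI 1.5, 1 GHz clock.
time_b = exec_time(INSTRUCTIONS, 1.5, 1.0e9)
# Enhanced machine E: CPI 2.0, 1.333 GHz clock.
time_e = exec_time(INSTRUCTIONS, 2.0, 1.333e9)

# Perf. of E / Perf. of B = exec. time of B / exec. time of E.
relative_performance = time_b / time_e

if __name__ == "__main__":
    print(f"Time on B: {time_b * 1e3:.3f} ms")
    print(f"Time on E: {time_e * 1e3:.3f} ms")
    print(f"Perf(E) / Perf(B): {relative_performance:.3f}")
```

The ratio comes out very slightly below 1 only because 1.333 GHz is a rounded value; the faster clock is cancelled by the higher CPI.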

  17. Rate Metrics
      • MIPS = Instruction Count / (Time × 10^6) = Clock Rate / (CPI × 10^6)
        • machines with different instruction sets?
        • programs with different instruction mixes? (dynamic frequency of instructions)
        • uncorrelated with performance
      • MFLOP/s = FP Operations / (Time × 10^6)
        • machine dependent
        • often not where time is spent

  18. Example showing why MIPS can fail
      Compare performance with Compilers 1 and 2 for a given program on a given machine.
      Instruction count (in billions) for instruction classes A, B, C:
                                A     B     C
      Compiler 1                5     1     1
      Compiler 2               10     1     1
      Clock cycles per instr.   1     2     3
      Clock cycles using compiler 1 = 5×1 + 1×2 + 1×3 = 10 billion
      Clock cycles using compiler 2 = 10×1 + 1×2 + 1×3 = 15 billion
      Assuming a 1 GHz clock:
      CPU Time 1 = 10 billion cycles / 1 GHz = 10 secs
      CPU Time 2 = 15 billion cycles / 1 GHz = 15 secs
      Yet the MIPS rating, instr. count / (CPU time in sec × 10^6), is higher for the slower compiler 2:
      MIPS 1 = 7/10 × 1000 = 700
      MIPS 2 = 12/15 × 1000 = 800
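The example can be verified with a small Python sketch. The instruction counts, cycles per class, and 1 GHz clock are taken from the slide; the dictionary layout and the function names `cpu_time` and `mips` are assumptions chosen for illustration.

```python
# Clock cycles per instruction for each instruction class (from the slide).
CYCLES_PER_CLASS = {"A": 1, "B": 2, "C": 3}

# Instruction counts per class for each compiler (from the slide).
COUNTS = {
    "Compiler 1": {"A": 5e9, "B": 1e9, "C": 1e9},
    "Compiler 2": {"A": 10e9, "B": 1e9, "C": 1e9},
}

CLOCK_RATE_HZ = 1e9  # 1 GHz clock, as assumed on the slide


def cpu_time(counts):
    """CPU time in seconds: total clock cycles divided by the clock rate."""
    cycles = sum(n * CYCLES_PER_CLASS[cls] for cls, n in counts.items())
    return cycles / CLOCK_RATE_HZ


def mips(counts):
    """MIPS rating: instruction count / (CPU time x 10^6)."""
    return sum(counts.values()) / (cpu_time(counts) * 1e6)


if __name__ == "__main__":
    for name, counts in COUNTS.items():
        print(f"{name}: {cpu_time(counts):.0f} s, {mips(counts):.0f} MIPS")
```

Compiler 2 executes more instructions, most of them cheap class A ones, so its MIPS rating rises even though the program takes longer to run.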

  19. Why Do Benchmarks?
      • How we evaluate differences
        • Different systems
        • Changes to a single system
      • Provide a target
        • Benchmarks should represent a large class of important programs
        • Improving benchmark performance should help many programs
      • For better or worse, benchmarks shape a field
      • Good ones accelerate progress
        • good target for development
      • Bad benchmarks hurt progress
        • help real programs vs. sell machines/papers?
        • Inventions that help real programs don't help the benchmark

  20. Programs to Evaluate Processor Performance
      • (Toy) Benchmarks
        • 10-100 lines
        • e.g., sieve, puzzle, quicksort
      • Synthetic Benchmarks
        • attempt to match average frequencies of real workloads
        • e.g., Whetstone, Dhrystone
      • Kernels
        • Time-critical excerpts
      • Real programs
        • e.g., gcc, spice

  21. Successful Benchmark: SPEC
      http://www.spec.org/benchmarks.html
      http://mrob.com/pub/comp/benchmarks/spec.html#CPU_06
      • EE Times + 5 companies band together to form the Systems Performance Evaluation Committee (SPEC): Sun, MIPS, HP, Apollo, DEC
      • Create standard list of programs, inputs, reporting: some real programs, includes OS calls, some I/O

  22. SPEC second round, SPEC95 • 8 integer benchmarks in C and 10 floating pt benchmarks in Fortran

  23. Amdahl's Law
      Speedup due to enhancement E:
      $$\text{Speedup}(E) = \frac{\text{ExTime w/o } E}{\text{ExTime w/ } E} = \frac{\text{Performance w/ } E}{\text{Performance w/o } E}$$
      Suppose that enhancement E accelerates a fraction F of the task by a factor S, and the remainder of the task is unaffected. Then:
      $$\text{ExTime(with } E) = \left((1-F) + \frac{F}{S}\right) \times \text{ExTime(without } E)$$
      $$\text{Speedup(with } E) = \frac{\text{ExTime(without } E)}{\left((1-F) + \frac{F}{S}\right) \times \text{ExTime(without } E)} = \frac{1}{(1-F) + \frac{F}{S}} \le \frac{1}{1-F}$$
      The speedup is bounded by this factor.
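Amdahl's Law as stated above translates directly into code. The following Python sketch uses hypothetical function names (`amdahl_speedup`, `speedup_upper_bound`) and example values of F and S chosen only for illustration.

```python
def amdahl_speedup(fraction_enhanced, enhancement_factor):
    """Overall speedup when a fraction F of the task is sped up by a factor S."""
    if not 0.0 <= fraction_enhanced <= 1.0:
        raise ValueError("fraction_enhanced (F) must be between 0 and 1")
    if enhancement_factor <= 0:
        raise ValueError("enhancement_factor (S) must be positive")
    return 1.0 / ((1.0 - fraction_enhanced) + fraction_enhanced / enhancement_factor)


def speedup_upper_bound(fraction_enhanced):
    """Bound 1 / (1 - F), approached as the enhancement factor S grows."""
    if fraction_enhanced >= 1.0:
        return float("inf")
    return 1.0 / (1.0 - fraction_enhanced)


if __name__ == "__main__":
    # Illustrative values (assumed): half of the task is enhanced.
    for s in (2, 10, 100, 1_000_000):
        print(f"F = 0.5, S = {s}: speedup = {amdahl_speedup(0.5, s):.4f}")
    print(f"Upper bound for F = 0.5: {speedup_upper_bound(0.5):.1f}")
```

With F = 0.5 the speedup never reaches 2 no matter how large S becomes, because the unimproved half of the task still takes its full time.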

  24. Performance Evaluation Summary
      $$\text{CPU time} = \frac{\text{Seconds}}{\text{Program}} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Cycle}}$$
      • Time is the measure of computer performance!
      • Good products are created when we have:
        • Good benchmarks
        • Good ways to summarize performance
      • Without good benchmarks and summaries, the choice is between improving the product for real programs and improving the product to get more sales => sales almost always wins
      • Remember Amdahl's Law: speedup is limited by the unimproved part of the program
