CSC 237 Comparative Computer Architecture

1. CSC 237 Comparative Computer Architecture: Introduction

2. Some History. Pre-1970: computers (mainframes) were composed of many components (several ICs, several boards); progress depended on both technology and better designs, but was slowed by the necessity of connecting all the components. 1970: the microprocessor, a CPU on a chip; progress became tied closely to technology, and technology improved rapidly, with performance increases of about 35% per year. Notes: start by reviewing the past of computer design.

3. More History. 1980s: compilers advanced, reducing dependence on assembly language programming; operating systems became standardized; RISC microprocessors; cache; instruction-level parallelism. Performance has improved by over 50% per year since the mid-1980s. Notes: more recently.

4. Benchmark Performance Relative to VAX 11/780. Notes: to see how things have changed since the 80s.

5. Computer Markets. Desktop: price range from under $1,000 to $10,000; optimize price-performance. Servers: optimize availability and throughput (e.g., transactions per minute); scalability is important. Embedded computers: computers that are just one component of a larger system (cell phones, printers, network switches, etc.); widest range of processing power and cost; real-time performance requirements; often power and memory must be minimized. Notes: three basic markets that computer architects design for.

6. Embedded Computers. Memory: can be a substantial component of system cost; more memory requires more power; code size is important. Power: battery life; less expensive (plastic) packaging; lack of cooling fans. Embedded computers are often designed using a processor "core" and application-specific hardware on a single chip. Notes: embedded computers have a number of special issues to consider.

7. Designing Computers: Instruction Set Design; Functional Organization; Logic Design; Implementation (IC design, packaging, power, cooling). Notes: in this context, we think about the steps needed to design a computer.

8. Some Terminology. Computer Organization: high-level aspects of design (memory, bus, CPU). Computer Hardware: detailed logic design, packaging technology. Notes: computers with the same instruction set architecture (ISA) can have very different organizations; computers with the same ISA and the same organization can have different hardware implementations.

9. Design: Functional Requirements. Application area (desktop, servers, embedded computers); level of software compatibility (programming level, binary compatible); operating system requirements (address space size, memory management); standards (floating point, I/O, networks, etc.). Notes: at the top of the design process is determining the functional requirements for an architecture.

10. Design: Technology Trends. Integrated circuit technology; semiconductor DRAM; magnetic disk technology; network technology. Notes: at the implementation end of the process, it is important to understand technology trends.

11. IC Technology: constant improvements in density.

12. IC Technology. Reaching certain thresholds is important. 1980s: 25K-50K transistors on a chip enabled single-chip 32-bit processors. Late 1980s: first-level cache on chip.

13. DRAM Technology (Dynamic Random Access Memory). Density increases about 50% per year. Cycle time decreases about 30% in 10 years. DRAM access time is very slow compared to the CPU clock.
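The trend figures on this slide mix per-year and per-decade rates, so a rough conversion between the two can help. The sketch below is only an illustration: the function names are hypothetical, and it assumes the quoted rates compound steadily, which the slide does not state.

```python
def growth_over_years(annual_rate, years):
    """Total multiplicative factor after compounding annual_rate for `years` years."""
    return (1.0 + annual_rate) ** years


def equivalent_annual_rate(total_change, years):
    """Steady annual rate that compounds to total_change (e.g. -0.30) over `years` years."""
    return (1.0 + total_change) ** (1.0 / years) - 1.0


# Slide figures: density +50% per year, cycle time -30% over 10 years.
density_factor = growth_over_years(0.50, 10)
cycle_time_rate = equivalent_annual_rate(-0.30, 10)

print(f"Density after 10 years: about {density_factor:.1f}x")
print(f"Cycle time change per year: about {cycle_time_rate * 100:.1f}%")
```

Under that compounding assumption, density grows by more than an order of magnitude in a decade while cycle time shrinks by only a few percent per year, which is the gap the slide points at.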

14. Magnetic Disk Technology. 2000s: density improving more than 100% per year. 1990s: density improvements of ~30% per year. Access time improving only 30% over 10 years.

15. Network Technology. Performance depends on switches and the transmission system. Performance is mostly measured by bandwidth (bits/sec). 10 Mb/sec to 100 Mb/sec took about 10 years; 100 Mb/sec to 1 Gb/sec took about 5 years.

16. Economic Trends. Price: amount paid for an item. Cost: amount spent to produce an item (including overhead). Major cost factors: learning curve (manufacturing costs decrease as the process for a new item is optimized); yield (percentage of manufactured devices that are good). Notes: yield is closely related to the learning curve.

17. Six generations of DRAMs shown; note the sharp decrease in price shortly after a module is introduced (the learning curve). The low 64-Mbit price is due to serious oversupply.

18. Similar trends in price occur for processors; this shows Intel prices over a 15-month period.

19. Other Cost Factors. Volume: cost decreases with increased volume (quicker learning curve, lower development cost per chip). Commodification: when virtually identical products are sold by multiple vendors in large volumes, competition decreases the gap between cost and price.

20. IC Costs. An important component of overall computer cost. Depends on several factors: cost of die; cost of testing die; cost of packaging and final test; final test yield.
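One way the four factors above can be combined is sketched below. This assumes the usual textbook formulation, in which the die, test, and packaging costs are summed and divided by the final test yield; the transcript does not preserve the slide's own formula. The function name and all dollar figures are hypothetical.

```python
def ic_cost(die_cost, test_cost, packaging_cost, final_test_yield):
    """Cost per good IC, assuming (die + test + packaging) / final test yield.

    This combination is an assumption based on the common textbook form,
    not a formula recovered from the slides.
    """
    if not 0.0 < final_test_yield <= 1.0:
        raise ValueError("final_test_yield must be in (0, 1]")
    return (die_cost + test_cost + packaging_cost) / final_test_yield


# Hypothetical figures, for illustration only.
example_cost = ic_cost(die_cost=10.0, test_cost=2.0, packaging_cost=3.0,
                       final_test_yield=0.75)
print(f"Cost per good IC: ${example_cost:.2f}")
```

Dividing by the yield spreads the cost of the parts that fail final test over the parts that pass, so a lower yield raises the cost of every good chip.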

21. IC Manufacturing Process

22. Cost Formulas

23. Overall Computer Cost Example $1000 PC in 2001

24. How are Prices Determined? Overall cost includes: component costs; direct costs (labor costs, warranty, etc.); gross margin, meaning overhead or indirect costs (research, marketing, sales, taxes, etc.); average discount, the amount added for retail markup.

25. Example: $1000 PC in 2001

26. Computer Performance. How do we measure it? Application run time; throughput (number of jobs per second); response time. The importance of each term depends on the application. Notes: application run time matters to the normal PC user; throughput to server applications; response time to real-time applications and transaction processing.

27. Measuring Performance. Execution time: the actual time between the beginning and end of a program; includes I/O, memory access, everything. Performance: the reciprocal of execution time. Notes: we will focus on execution time.

28. Other Performance Measures. CPU time: only the time when the CPU is processing a task, not including I/O or disk access, etc. User CPU time: only the CPU time spent on the task being measured. System CPU time: CPU time spent in the operating system.

29. Example: Linux "time" command
    > time ls
    real 0m1.335s
    user 0m0.007s
    sys  0m0.023s
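The same three quantities can be gathered from inside a program. The following is a minimal sketch using Python's standard `time.perf_counter` and `os.times`; the helper `measure` and the toy `workload` are hypothetical names introduced for illustration, and the sleep stands in for time spent waiting on I/O.

```python
import os
import time


def measure(func):
    """Run func once and report real (wall-clock), user, and system time in seconds."""
    wall_start = time.perf_counter()
    cpu_start = os.times()
    func()
    cpu_end = os.times()
    wall_end = time.perf_counter()
    return {
        "real": wall_end - wall_start,
        "user": cpu_end.user - cpu_start.user,
        "sys": cpu_end.system - cpu_start.system,
    }


def workload():
    """Toy task: some computation, then a sleep that uses wall time but little CPU."""
    total = sum(i * i for i in range(200_000))
    time.sleep(0.05)
    return total


timings = measure(workload)
performance = 1.0 / timings["real"]  # performance as the reciprocal of execution time

for name, seconds in timings.items():
    print(f"{name:4s} {seconds:.3f}s")
print(f"performance: {performance:.2f} runs per second")
```

As in the `time ls` output, real time is expected to exceed user plus sys time whenever the program waits on something other than the CPU.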

30. What Programs to Measure? Real applications (C compilers, Photoshop, etc.): generally hard to port for comparisons. Modified applications: modified for portability or to emphasize some aspect. Kernels: small, key pieces of real programs. Toy benchmarks: small programs with known results. Synthetic benchmarks: code created solely to measure performance, trying to match the average frequency of operations in real applications.

31. Benchmarks. Some typical benchmarks: Whetstone, Dhrystone. Benchmark suites: collections of benchmarks with different characteristics. SPEC: Standard Performance Evaluation Corporation (www.spec.org); many types (desktop, server, transaction processing, embedded computer).

32. PC Benchmarks. Business Winstone: a script that runs Netscape and office suite programs. CC Winstone: simulates several programs focused on content creation (Photoshop, audio editing, etc.). Winbench: a variety of scripts that test CPU performance, video system performance, disk, etc.

33. Desktop Benchmarks. CPU-intensive: SPEC (SPEC89 -> SPEC92 -> SPEC95 -> SPEC2000 -> SPEC2004 (under development)). Graphics-intensive.

34. SPEC CPU2000. CINT2000: integer benchmark; 12 integer programs (11 in C, one in C++): gzip, vpr, gcc, mcf, crafty, parser, eon, perlbmk, gap, vortex, bzip2, twolf. Modified slightly for portability and to minimize the role of I/O in performance. CFP2000: floating point benchmark.

35. Performance: Total Execution Time. Average execution time (several programs): (1/n) x (Time_1 + Time_2 + ... + Time_n). Weighted execution time, for when some programs occur more frequently in a workload: Weight_1 x Time_1 + Weight_2 x Time_2 + ... + Weight_n x Time_n, with the weights summing to 1. Notes: how do we summarize if we have the total execution time of many different programs?

36. Example

37. Normalized Execution Time. Normalize each execution time to a reference machine. Example (normalized to MachA):
           Execution time    Normalized
           P1      P2        P1     P2
    MachA: 1s      5s        1      1
    MachB: .1s     10s       .1     2
    Now how to summarize (average)?

38. Normalized Execution Time: Arithmetic Mean; Geometric Mean

39. Geometric Mean
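To make the comparison between the two means concrete, the sketch below recomputes the MachA/MachB example from slide 37 with each machine in turn as the reference. The helper names are hypothetical; the execution times are the ones given on the slide.

```python
import math


def arithmetic_mean(values):
    return sum(values) / len(values)


def geometric_mean(values):
    # n-th root of the product, computed through logarithms.
    return math.exp(sum(math.log(v) for v in values) / len(values))


def normalize(times, reference):
    """Divide each machine's per-program time by the reference machine's time."""
    ref = times[reference]
    return {machine: [t / r for t, r in zip(ts, ref)]
            for machine, ts in times.items()}


# Execution times in seconds for programs P1 and P2 (from slide 37).
times = {"MachA": [1.0, 5.0], "MachB": [0.1, 10.0]}

for reference in times:
    normalized = normalize(times, reference)
    print(f"Normalized to {reference}:")
    for machine, values in normalized.items():
        print(f"  {machine}: arithmetic {arithmetic_mean(values):.3f}, "
              f"geometric {geometric_mean(values):.3f}")
```

With these numbers the arithmetic mean of normalized times makes whichever machine is chosen as the reference look faster, while the geometric mean gives the same ratio between the two machines for either reference.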

40. More Detailed Performance Analysis: Simulation Techniques. Profile-based, static modeling: hardware counters, included in newer processors, count instructions executed and clock cycles; instrumentation code can be included in the program to gather time and number of instructions executed; the program can be interpreted at the instruction set level. Trace-driven simulation (useful for modeling memory system performance): a trace of memory references is created as the program runs (by simulation or instrumented execution). Execution-driven simulation: detailed simulations of the memory system and CPU (including pipelining details) are executed concurrently, which allows exact modeling of the interaction between memory and CPU.

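41. Design Guidelines. Make the common case fast when making design tradeoffs. Amdahl's Law: the overall speedup from an enhancement is limited by the fraction of time the enhancement can be used.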

42. Amdahl's Law: Other Forms. To use Amdahl's Law we must know: the fraction of the computation time in the original machine that can be enhanced (Fraction_enhanced); the improvement gained by the enhancement (Speedup_enhanced).
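The slide's own formula did not survive in the transcript. For reference, the standard statement of Amdahl's Law in terms of the two quantities named above is:

```latex
\[
\text{Speedup}_{\text{overall}}
  = \frac{\text{Execution time}_{\text{old}}}{\text{Execution time}_{\text{new}}}
  = \frac{1}{\left(1 - \text{Fraction}_{\text{enhanced}}\right)
      + \dfrac{\text{Fraction}_{\text{enhanced}}}{\text{Speedup}_{\text{enhanced}}}}
\]
```

The unenhanced part, 1 - Fraction_enhanced, is unaffected, so it bounds the overall speedup at 1 / (1 - Fraction_enhanced) no matter how large Speedup_enhanced becomes.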

43. Example. Suppose we can enhance the performance of a CPU by adding a "vectorization mode" that can be used under certain conditions and that computes 10x faster than normal mode. What percent of the run time must be in "vectorization mode" to achieve an overall speedup of 2?
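A possible way to work this example is to solve Amdahl's Law for the enhanced fraction. The sketch below assumes that "percent of the run time" means the fraction of the original (unenhanced) run time, as in the law's usual definition; the function names are hypothetical.

```python
def overall_speedup(fraction_enhanced, speedup_enhanced):
    """Amdahl's Law: 1 / ((1 - F) + F / S)."""
    return 1.0 / ((1.0 - fraction_enhanced) + fraction_enhanced / speedup_enhanced)


def required_fraction(target_speedup, speedup_enhanced):
    """Solve 1 / ((1 - F) + F / S) = T for F, giving F = (1 - 1/T) / (1 - 1/S)."""
    return (1.0 - 1.0 / target_speedup) / (1.0 - 1.0 / speedup_enhanced)


# Slide values: vectorization mode is 10x faster; target overall speedup is 2.
fraction = required_fraction(target_speedup=2.0, speedup_enhanced=10.0)
print(f"Fraction of original run time that must be vectorizable: {fraction:.1%}")
print(f"Check: overall speedup = {overall_speedup(fraction, 10.0):.2f}")
```

Under that reading, a little over half of the original run time (0.5 / 0.9) has to be eligible for vectorization mode to reach an overall speedup of 2.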

44. CPU Performance Equations. For a particular program: CPU time = CPU clock cycles x clock cycle time, where clock cycle time = 1 / clock rate. Considering instruction count: cycles per instruction (CPI) = CPU clock cycles / instruction count.

45. CPU Performance Equation. CPU time = Instruction count x clock cycle time x CPI, or CPU time = Instruction count x CPI / clock rate. Notes: a 2x improvement in any of them is a 2x improvement in the CPU time.

46. MIPS as Performance Measure. MIPS: Millions of Instructions Per Second. Used in conjunction with benchmarks (e.g., Dhrystone MIPS). Can be computed as follows: MIPS = instruction count / (execution time x 10^6) = clock rate / (CPI x 10^6).

47. Example. CPUA: RISC processor with floating point unit (CPI = 10). CPUB: embedded RISC processor with no floating point unit (CPI = 6). Both CPUs run at 500 MHz. The benchmark is compiled separately for CPUA and CPUB. Execution time on CPUA = 1.08s; execution time on CPUB = 13.6s. What is the total number of instructions executed for each CPU? #instructions = Execution time x Clock freq / CPI. What is the MIPS rating for the two CPUs?
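The two questions can be worked directly from the CPU performance equation on slide 45 (CPU time = Instruction count x clock cycle time x CPI). The sketch below assumes the quoted execution times are pure CPU time; the function and variable names are hypothetical.

```python
CLOCK_HZ = 500e6  # both CPUs run at 500 MHz

# Values from the slide.
cpus = {
    "CPUA": {"cpi": 10, "exec_time_s": 1.08},
    "CPUB": {"cpi": 6, "exec_time_s": 13.6},
}


def instruction_count(exec_time_s, clock_hz, cpi):
    """Rearranged CPU time equation: instructions = time x clock rate / CPI."""
    return exec_time_s * clock_hz / cpi


def mips(instructions, exec_time_s):
    """Millions of instructions executed per second of execution time."""
    return instructions / (exec_time_s * 1e6)


results = {}
for name, cpu in cpus.items():
    count = instruction_count(cpu["exec_time_s"], CLOCK_HZ, cpu["cpi"])
    results[name] = {"instructions": count, "mips": mips(count, cpu["exec_time_s"])}
    print(f"{name}: {count / 1e6:.1f} million instructions, "
          f"{results[name]['mips']:.1f} MIPS")
```

With these inputs CPUB gets the higher MIPS rating even though it takes more than twelve times as long to run the benchmark, because it executes far more instructions without a floating point unit.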

48. Locality. A program property that is often exploited by architects to achieve better performance. Programs tend to reuse data and instructions. (For many programs, 90% of execution time is spent running 10% of the code.) Temporal locality: recently accessed items are likely to be accessed in the near future. Spatial locality: items with addresses near one another tend to be referenced close together in time.
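A small trace-driven model can illustrate why locality matters to a memory system. The sketch below is a toy direct-mapped cache, not a model of any real processor: the function name, the 64-line size, the 16-byte block size, and the address traces are all assumptions chosen for illustration.

```python
def hit_rate(trace, num_lines=64, block_size=16):
    """Fraction of accesses that hit in a toy direct-mapped cache.

    trace is a sequence of byte addresses; sizes are hypothetical.
    """
    stored_block = [None] * num_lines  # which memory block each line holds
    hits = 0
    for address in trace:
        block = address // block_size
        index = block % num_lines
        if stored_block[index] == block:
            hits += 1
        else:
            stored_block[index] = block  # miss: fetch the block into this line
    return hits / len(trace)


# Spatial locality: consecutive 4-byte words share a 16-byte block.
sequential = list(range(0, 1024, 4))
# No spatial locality: every access touches a different block.
strided = list(range(0, 256 * 16, 16))
# Temporal locality: the same 1 KB region is walked twice and fits in the cache.
repeated = sequential + sequential

print(f"sequential: {hit_rate(sequential):.3f}")
print(f"strided:    {hit_rate(strided):.3f}")
print(f"repeated:   {hit_rate(repeated):.3f}")
```

In this toy setup the sequential trace hits on three of every four accesses, the strided trace never hits, and repeating the sequential trace raises the hit rate further because the second pass finds everything already cached.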

49. Parallelism. One of the most important methods for improving performance. System level: multiple CPUs, multiple disks. CPU level: pipelining, multiple functional modules. Digital design level: carry-lookahead adders.

50. Price Performance. SPEC CPU benchmarks are used in the text (independent of OS and architecture). Uses a geometric mean normalized to a Sun system, with larger numbers indicating higher performance. It is important that machines are configured with similar memory, disk, graphics boards, and Ethernet connections.

51. Example Benchmark Results for CINT2000 for several desktops. Shows both performance and performance per cost.

52. Results for Transaction Processing Benchmarks for several servers

53. Performance of Embedded Processors for 3 Benchmark Suites

56. Common Mistakes: using only clock rate to compare performance. Even if the processors have the same instruction set and the same memory configuration, performance may not scale with clock speed.

57. P4(1.7GHz) vs P3(1.0GHz) Pentiums for 5 benchmarks

58. Common Mistakes: comparing hand-coded assembly and compiler-generated, high-level language performance. Huge performance gains can be obtained by hand-coding critical loops of benchmark programs. It is important to understand how benchmark code is generated when comparing performance.

59. Peak Performance. Peak performance: the performance that a machine is guaranteed not to exceed. Observed performance does not necessarily track peak performance; the gap may be as large as a factor of 10.

60. Fallacy: Synthetic Benchmarks Predict Real Program Performance. Whetstone and Dhrystone are the most popular synthetic benchmark examples. Compiler and hardware optimizations can artificially inflate performance for these benchmarks. Inflated performance does not necessarily apply to real programs.

61. Fallacy: MIPS is accurate for comparing performance. MIPS is dependent on the instruction set, so machines with different ISAs cannot be compared. MIPS varies between programs on the same computer. MIPS can vary inversely with performance. Example: a program that uses fewer, more efficient instructions can outperform a program that uses several instructions for the same task, yet have a lower MIPS rating.

62. Review. History of processor development; different processor markets; issues in CPU design; economics of CPU design and implementation; computer performance (measures, formulas, results); design guidelines (Amdahl's Law, locality, parallelism).
