1. CSC 237 Comparative Computer Architecture Introduction
2. Some History
Pre-1970
Computers (mainframes) composed of many components (several ICs, several boards)
Progress depended on both technology and better designs, but was slowed by the necessity of connecting all the components
1970 - microprocessor (CPU on a chip)
Progress became tied closely to technology
Technology improved rapidly - about 35% performance increase per year
Start by reviewing the past of computer design.
3. More History
1980s
Compilers advanced, less dependence on assembly language programming
Operating systems became standardized
RISC microprocessors
Cache
Instruction level parallelism
Improvements of over 50% per year in performance since mid-1980s
More recently
4. Benchmark Performance Relative to VAX 11/780
To see how things have changed since the 80s.
5. Computer Markets
Desktop
price range from under $1,000 to $10,000
optimize price-performance
Servers
optimize availability and throughput (e.g. transactions per minute)
scalability is important
Embedded computers
computers that are just one component of a larger system (cell phones, printers, network switches, etc.)
widest range of processing power and cost
real-time performance requirements
often power and memory must be minimized
Three basic markets that computer architects design for
6. Embedded Computers
Memory
can be a substantial component of system cost
more memory requires more power
code size is important
Power
battery life
less expensive (plastic) packaging
lack of cooling fans
Embedded computers are often designed using a processor core and application-specific hardware on a single chip.
Embedded computers have a number of special issues to consider.
7. Designing Computers
Instruction Set Design
Functional Organization
Logic Design
Implementation (IC design, packaging, power, cooling)
So, in this context, we think about the steps needed to design a computer.
8. Some Terminology
Computer Organization
High-level aspects of design (memory, bus, CPU)
Computer Hardware
Detailed logic design, packaging technology
Computers with the same instruction set architecture (ISA) can have very different organizations
Computers with the same ISA and same organization can have different hardware implementations
9. Design Functional Requirements
Application area (desktop, servers, embedded computers)
Level of software compatibility (programming level, binary compatible)
Operating system requirements (address space size, memory management)
Standards (floating point, I/O, networks, etc.)
At the top of the design process is determining the functional requirements for an architecture.
10. Design - Technology Trends
Integrated Circuit Technology
Semiconductor DRAM
Magnetic Disk Technology
Network technology
At the implementation end of the process, it is important to understand technology trends.
11. IC Technology
Constant improvements in density
12. IC Technology
Reaching certain thresholds is important
1980s - 25K-50K transistors on a chip enabled single-chip 32-bit processors
Late 1980s - first-level cache on chip
13. DRAM Technology
Dynamic Random Access Memory
Density increases about 50% per year
Cycle time decreases about 30% in 10 years
DRAM access time is very slow compared to the CPU clock.
14. Magnetic Disk Technology
2000s - density improving more than 100% per year
1990s - density improvements of ~30% per year
Access time improving only 30% over 10 years.
15. Network Technology
Performance depends on switches and transmission system
Performance mostly measured by bandwidth (bits/sec)
10Mb/sec to 100Mb/sec took about 10 years
100Mb/sec to 1Gb/sec took about 5 years
16. Economic trends
Price - amount paid for an item
Cost - amount spent to produce an item (including overhead)
Major cost factors
learning curve - manufacturing costs decrease as a process for a new item is optimized
yield - percentage of manufactured devices that are good
Note that yield is closely related to the learning curve.
17. Six generations of DRAMs shown; note the sharp decrease in price shortly after a module is introduced (the learning curve). The low 64 Mbit price is due to serious oversupply.
18. Similar trends in price occur for processors; this shows Intel prices over a 15-month period.
19. Other Cost Factors
Volume
cost decreases with increased volume (quicker learning curve, lower development cost per chip)
Commodification
when virtually identical products are sold by multiple vendors in large volumes
competition decreases gap between cost and price
20. IC Costs
Important component of overall computer cost.
Depends on several factors:
Cost of die
Cost of testing die
Cost of packaging and final test
Final test yield
21. IC Manufacturing Process
22. Cost Formulas
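One common textbook form of the IC cost relation, built from the factors listed on slide 20, can be sketched as follows. This is a minimal illustration, not necessarily the exact formulas shown on the slide: the function names and all dollar, die-count, and yield figures are hypothetical.

```python
def die_cost(wafer_cost, dies_per_wafer, die_yield):
    """Cost of one good die: wafer cost spread over the dies that work."""
    if dies_per_wafer <= 0 or not 0 < die_yield <= 1:
        raise ValueError("dies_per_wafer must be > 0 and die_yield in (0, 1]")
    return wafer_cost / (dies_per_wafer * die_yield)


def ic_cost(cost_of_die, cost_of_testing, cost_of_packaging, final_test_yield):
    """Cost of one good packaged IC.

    The three per-part costs are summed and divided by the final test
    yield, because parts that fail final test still had to be paid for.
    """
    if not 0 < final_test_yield <= 1:
        raise ValueError("final_test_yield must be in (0, 1]")
    return (cost_of_die + cost_of_testing + cost_of_packaging) / final_test_yield


# Hypothetical numbers, for illustration only.
good_die = die_cost(wafer_cost=5000.0, dies_per_wafer=250, die_yield=0.8)
total = ic_cost(good_die, cost_of_testing=2.0, cost_of_packaging=3.0,
                final_test_yield=0.95)
print(f"die: ${good_die:.2f}, packaged IC: ${total:.2f}")
```

Lower yield at either stage raises the cost per good part, which is why yield and the learning curve matter so much to price.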
23. Overall Computer Cost
Example: $1000 PC in 2001
24. How are Prices Determined?
Overall cost includes
Component costs
Direct costs (labor costs, warranty, etc.)
Gross margin - overhead, or indirect costs (research, marketing, sales, taxes, etc.)
Average discount - amount added for retail markup
25. Example: $1000 PC in 2001
26. Computer Performance
How do we measure it?
Application run time
Throughput - number of jobs per second
Response time
Importance of each term depends on the application.
Application run time - normal PC user
Throughput - server applications
Response time - real-time applications, transaction processing
27. Measuring Performance
Execution time - the actual time between the beginning and end of a program. Includes I/O, memory access, everything.
Performance - reciprocal of execution time
We will focus on execution time.
28. Other Performance Measures
CPU time - only the time when the CPU is processing a task, not including I/O or disk access, etc.
User CPU time - only the CPU time spent on the task being measured.
System CPU time - CPU time spent in the operating system.
29. Example: Linux time command
> time ls
real 0m1.335s
user 0m0.007s
sys 0m0.023s
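The same real/user/sys split can be approximated from inside a program. The sketch below is one way to do it in Python with the standard `time.perf_counter()` and `os.times()` calls; the `measure` helper and the `busy_work` workload are hypothetical names used only for illustration.

```python
import os
import time


def measure(fn):
    """Run fn() once and report elapsed, user CPU, and system CPU seconds."""
    wall_start = time.perf_counter()
    cpu_start = os.times()
    fn()
    cpu_end = os.times()
    wall_end = time.perf_counter()
    return {
        "real": wall_end - wall_start,            # elapsed (execution) time
        "user": cpu_end.user - cpu_start.user,    # CPU time in this process
        "sys": cpu_end.system - cpu_start.system, # CPU time in the OS
    }


def busy_work():
    # Hypothetical CPU-bound workload.
    return sum(i * i for i in range(200_000))


result = measure(busy_work)
for name in ("real", "user", "sys"):
    print(f"{name} {result[name]:.3f}s")
```

As in the `time ls` output, real time can be much larger than user plus sys when the program spends most of its time waiting on I/O.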
30. What Programs to Measure?
Real applications (C compilers, Photoshop, etc.) - generally hard to port for comparisons
Modified applications - modified for portability or to emphasize some aspect
Kernels - small, key pieces of real programs
Toy benchmarks - small programs with known results
Synthetic benchmarks - code created solely to measure performance; tries to match the average frequency of operations in real applications.
31. Benchmarks
Some typical benchmarks:
Whetstone
Dhrystone
Benchmark suites - collections of benchmarks with different characteristics
SPEC - Standard Performance Evaluation Corporation (www.spec.org)
Many types (desktop, server, transaction processing, embedded computer)
32. PC Benchmarks
Business Winstone - script that runs Netscape and office suite programs
CC Winstone - simulates several programs focused on content creation (Photoshop, audio editing, etc.)
Winbench - a variety of scripts that test CPU performance, video system performance, disk, etc.
33. Desktop Benchmarks
CPU-intensive
SPEC (SPEC89 -> SPEC92 -> SPEC95 -> SPEC2000 -> SPEC2004 (under development))
Graphics-intensive
34. SPEC CPU2000
CINT2000 - integer benchmark
12 integer programs (mostly in C)
gzip, vpr, gcc, mcf, crafty, parser, eon, perlbmk, gap, vortex, bzip2, twolf
Modified slightly for portability and to minimize the role of I/O in performance
CFP2000 - floating point benchmark
35. Performance: Total Execution Time
How do we summarize if we have the total execution time of many different programs?
Average execution time (several programs)
$$\frac{1}{n} \sum_{i=1}^{n} \mathrm{Time}_i$$
Weighted execution time - when some programs are found more frequently in a workload
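The two summaries on this slide can be sketched in a few lines. The run times and workload weights below are hypothetical, and the weighted form assumed here is the usual one in which the weights sum to 1 and each multiplies its program's time.

```python
def arithmetic_mean(times):
    """Average execution time: (1/n) * sum of the individual times."""
    return sum(times) / len(times)


def weighted_mean(times, weights):
    """Weighted execution time: sum of weight_i * time_i, weights summing to 1."""
    if len(times) != len(weights):
        raise ValueError("times and weights must have the same length")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(w * t for w, t in zip(weights, times))


# Hypothetical run times (seconds) for two programs, and a hypothetical
# workload in which the first program accounts for 80% of the runs.
times = [1.0, 5.0]
weights = [0.8, 0.2]

print("arithmetic mean:", arithmetic_mean(times))
print("weighted mean:  ", weighted_mean(times, weights))
```

With equal weights the two summaries coincide; the weighted form matters when the workload runs some programs far more often than others.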
36. Example
37. Normalized Execution Time
Normalize each execution time to a reference machine.
Example (normalized to MachA):
        Measured      Normalized
        P1     P2     P1     P2
MachA:  1s     5s     1      1
MachB:  .1s    10s    .1     2
Now how to summarize? (average)
38. Normalized Execution Time
Arithmetic Mean
Geometric Mean
39. Geometric Mean
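To compare the two ways of averaging normalized times, the sketch below uses the MachA/MachB figures from slide 37 and normalizes to each machine in turn. The helper names are hypothetical; the geometric mean is taken as the n-th root of the product of the n ratios.

```python
import math


def arithmetic_mean(values):
    return sum(values) / len(values)


def geometric_mean(values):
    """n-th root of the product of n execution-time ratios."""
    return math.prod(values) ** (1.0 / len(values))


# Measured times in seconds for programs P1 and P2 (from slide 37).
times = {"MachA": [1.0, 5.0], "MachB": [0.1, 10.0]}


def normalized(machine, reference):
    """Execution times of `machine` divided by those of `reference`."""
    return [t / r for t, r in zip(times[machine], times[reference])]


for reference in ("MachA", "MachB"):
    for machine in ("MachA", "MachB"):
        ratios = normalized(machine, reference)
        print(f"ref={reference} {machine}: "
              f"arithmetic={arithmetic_mean(ratios):.3f} "
              f"geometric={geometric_mean(ratios):.3f}")
```

Normalized to MachA, the arithmetic mean for MachB is 1.05, which makes MachB look slower. Normalized to MachB, the arithmetic mean for MachA is 5.25, which makes MachA look slower. The two geometric means are reciprocals of each other, so the comparison does not depend on the choice of reference machine.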
40. More Detailed Performance Analysis
Simulation techniques
Profile-based, static modeling
Hardware counters included in newer processors that count instructions executed and clock cycles
Instrumentation code included in the program to gather time and number of instructions executed
Interpret program at instruction set level
Trace-driven simulation (useful for modeling memory system performance). A trace of memory references is created as the program runs (by simulation or instrumented execution).
Execution-driven simulation. Detailed simulations of the memory system and CPU (including pipelining details) are executed concurrently. Allows exact modeling of the interaction between memory and CPU.
41. Design Guidelines
Make the common case fast when making design tradeoffs
Amdahl's Law:
42. Amdahl's Law - Other Forms
To use Amdahl's Law we must know:
The fraction of the computation time in the original machine that can be enhanced (Fraction_enhanced)
The improvement gained by enhancement (Speedup_enhanced)
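For reference, the standard statement of Amdahl's Law in terms of the two quantities above can be written as follows. The "old" and "new" subscripts are notation assumed here for the execution time without and with the enhancement.

```latex
\[
\mathrm{Speedup}_{\mathrm{overall}}
  = \frac{\mathrm{Execution\ time}_{\mathrm{old}}}{\mathrm{Execution\ time}_{\mathrm{new}}}
  = \frac{1}{\left(1 - \mathrm{Fraction}_{\mathrm{enhanced}}\right)
      + \dfrac{\mathrm{Fraction}_{\mathrm{enhanced}}}{\mathrm{Speedup}_{\mathrm{enhanced}}}}
\]
```

The unenhanced fraction bounds the result: even with an infinitely fast enhancement, the overall speedup cannot exceed 1 / (1 - Fraction_enhanced).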
43. Example
Suppose we can enhance the performance of a CPU by adding a vectorization mode that can be used under certain conditions and that computes 10x faster than normal mode. What percent of the run time must be in vectorization mode to achieve an overall speedup of 2?
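One way to work this example is to solve Amdahl's Law for the enhanced fraction. The sketch below assumes the fraction is measured against the original (unenhanced) run time, as in the definition on slide 42; the function names are hypothetical.

```python
def overall_speedup(fraction_enhanced, speedup_enhanced):
    """Amdahl's Law: overall speedup for a given enhanced fraction."""
    return 1.0 / ((1.0 - fraction_enhanced)
                  + fraction_enhanced / speedup_enhanced)


def required_fraction(target_speedup, speedup_enhanced):
    """Solve 1 / ((1 - F) + F / S) = T for F.

    Rearranging gives F = (1 - 1/T) / (1 - 1/S).
    """
    return (1.0 - 1.0 / target_speedup) / (1.0 - 1.0 / speedup_enhanced)


fraction = required_fraction(target_speedup=2.0, speedup_enhanced=10.0)
print(f"fraction of original run time: {fraction:.1%}")
print(f"check, overall speedup: {overall_speedup(fraction, 10.0):.3f}")
```

Under that assumption the answer is 5/9, or about 55.6% of the original run time.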
44. CPU Performance Equations
For a particular program:
CPU time = CPU clock cycles x clock cycle time
clock cycle time = 1 / clock rate
Considering instruction count:
cycles per instruction (CPI) = CPU clock cycles / instruction count
45. CPU Performance Equation
CPU time = Instruction count x clock cycle time x CPI
or
CPU time = Instruction count x CPI / clock rate
A 2x improvement in any of them is a 2x improvement in the CPU time.
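The claim that a 2x improvement in any one factor gives a 2x improvement in CPU time can be checked with a small sketch. The instruction count, CPI, and clock rate below are hypothetical values chosen only to make the arithmetic easy.

```python
def cpu_time(instruction_count, cpi, clock_rate_hz):
    """CPU time = instruction count x CPI x clock cycle time,
    where clock cycle time = 1 / clock rate."""
    return instruction_count * cpi / clock_rate_hz


# Hypothetical baseline: 1e9 instructions, CPI of 2, 500 MHz clock.
base = cpu_time(1_000_000_000, 2.0, 500e6)

fewer_instructions = cpu_time(500_000_000, 2.0, 500e6)  # half the instructions
lower_cpi = cpu_time(1_000_000_000, 1.0, 500e6)         # half the CPI
faster_clock = cpu_time(1_000_000_000, 2.0, 1000e6)     # twice the clock rate

for label, t in [("baseline", base),
                 ("half instruction count", fewer_instructions),
                 ("half CPI", lower_cpi),
                 ("double clock rate", faster_clock)]:
    print(f"{label}: {t:.2f} s")
```

In practice the three factors are not independent: a change that lowers instruction count may raise CPI, and vice versa.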
46. MIPS as Performance Measure
MIPS - Millions of Instructions Per Second
Used in conjunction with benchmarks
Dhrystone MIPS
Can be computed as follows:
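The usual definition of MIPS, stated with the same quantities as the CPU performance equations on slides 44 and 45, is:

```latex
\[
\mathrm{MIPS}
  = \frac{\mathrm{Instruction\ count}}{\mathrm{Execution\ time} \times 10^{6}}
  = \frac{\mathrm{Clock\ rate}}{\mathrm{CPI} \times 10^{6}}
\]
```

The second form follows by substituting Execution time = Instruction count x CPI / Clock rate.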
47. Example
CPUA: RISC processor with floating point unit (CPI = 10)
CPUB: Embedded RISC processor, no floating point unit (CPI = 6)
CPUs both run at 500 MHz
Benchmark compiled separately for CPUA, CPUB
Execution time CPUA = 1.08s
Execution time CPUB = 13.6s
What is total number of instructions executed for each CPU?
#instructions = Execution time x Clock freq / CPI
What is the MIPS rating for the two CPUs?
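The two questions can be worked numerically from the figures on this slide. The sketch below assumes instruction count = execution time x clock frequency / CPI and MIPS = instruction count / (execution time x 10^6); the function names are hypothetical.

```python
CLOCK_HZ = 500e6  # both CPUs run at 500 MHz


def instruction_count(exec_time_s, clock_hz, cpi):
    """Total clock cycles (time x frequency) divided by cycles per instruction."""
    return exec_time_s * clock_hz / cpi


def mips(instructions, exec_time_s):
    """Millions of instructions executed per second."""
    return instructions / (exec_time_s * 1e6)


cpus = {
    "CPUA": {"cpi": 10.0, "exec_time_s": 1.08},
    "CPUB": {"cpi": 6.0, "exec_time_s": 13.6},
}

for name, cpu in cpus.items():
    n = instruction_count(cpu["exec_time_s"], CLOCK_HZ, cpu["cpi"])
    rating = mips(n, cpu["exec_time_s"])
    print(f"{name}: {n:.3e} instructions, {rating:.1f} MIPS")
```

With these figures CPUA executes about 54 million instructions at 50 MIPS, while CPUB executes about 1.13 billion instructions at roughly 83 MIPS. CPUB has the higher MIPS rating even though it takes more than twelve times as long to run the benchmark.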
48. Locality
Program property that is often exploited by architects to achieve better performance.
Programs tend to reuse data and instructions. (For many programs, 90% of execution time is spent running 10% of the code)
Temporal locality - recently accessed items are likely to be accessed in the near future.
Spatial locality - items with addresses near one another tend to be referenced close together in time.
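The effect of locality on a memory hierarchy can be illustrated with a toy cache model. The sketch below is not a model of any real processor: it assumes a small fully associative cache with LRU replacement, and the cache size, block size, and access patterns are all hypothetical.

```python
from collections import OrderedDict


def hit_rate(addresses, num_blocks=64, block_size=8):
    """Fraction of accesses that hit in a toy fully associative LRU cache."""
    cache = OrderedDict()  # block number -> None, ordered oldest to newest
    hits = 0
    for addr in addresses:
        block = addr // block_size
        if block in cache:
            hits += 1
            cache.move_to_end(block)       # mark as most recently used
        else:
            cache[block] = None            # miss: bring the block in
            if len(cache) > num_blocks:
                cache.popitem(last=False)  # evict the least recently used
    return hits / len(addresses)


sequential = list(range(4096))             # spatial locality: neighbors share a block
hot_loop = [i % 16 for i in range(4096)]   # temporal locality: same items reused
strided = [i * 8 for i in range(4096)]     # one word per block, never reused

print("sequential:", hit_rate(sequential))
print("hot loop:  ", hit_rate(hot_loop))
print("strided:   ", hit_rate(strided))
```

The sequential and hot-loop patterns hit almost every time, while the strided pattern, which has neither kind of locality at this block size, never hits.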
49. Parallelism
One of the most important methods for improving performance.
System level - multiple CPUs, multiple disks
CPU level - pipelining, multiple functional modules
Digital design level - carry-lookahead adders
50. Price Performance
SPEC CPU benchmarks used in text (independent of OS and architecture)
Uses a geometric mean normalized to Sun system, with larger numbers indicating higher performance.
Important that machines are configured with similar memory, disk, graphics boards and Ethernet connections.
51. Example Benchmark Results for CINT2000 for several desktops
Shows both performance and performance per cost.
52. Results for Transaction Processing Benchmarks for several servers
53. Performance of Embedded Processors for 3 Benchmark Suites
56. Common Mistakes
Using only clock rate to compare performance
Even if the processors have the same instruction set and the same memory configuration, performance may not scale with clock speed.
57. Pentium 4 (1.7 GHz) vs Pentium III (1.0 GHz) for 5 benchmarks
58. Common Mistakes
Comparing hand-coded assembly and compiler-generated, high-level language performance.
Huge performance gains can be obtained by hand-coding critical loops of benchmark programs.
Important to understand how benchmark code is generated when comparing performance.
59. Peak Performance
Peak performance - the performance that a machine is guaranteed not to exceed.
Observed performance does not necessarily track peak performance
Gap may be as large as a factor of 10.
60. Fallacy: Synthetic Benchmarks Predict Real Program Performance
Whetstone and Dhrystone are the most popular synthetic benchmark examples
Compiler and hardware optimizations can artificially inflate performance for these benchmarks.
Inflated performance does not necessarily apply to real programs.
61. Fallacy: MIPS is accurate for comparing performance
MIPS is dependent on the instruction set, so machines with different ISAs cannot be compared.
MIPS varies between programs on the same computer.
MIPS can vary inversely with performance
Example: a program that uses fewer, more efficient instructions can outperform a program that uses several instructions for the same task, yet have a lower MIPS rating.
62. Review
History of processor development
Different processor markets
Issues in CPU design
Economics of CPU design and implementation
Computer performance measures, formulas, results
Design guidelines - Amdahl's law, locality, parallelism