Computer Architecture Dr. R. Venkatesan Fall 2005
PREREQUISITES
• Digital Logic: basic building blocks, design
• Computer Programming: object-oriented
• Computer Organization: microprocessors
• Basic Instruction Set: assembly language
• Computer Interfacing: microprocessors
• Computer Design: digital systems design
• HDLs: concurrency, delay (VHDL)
• HLL compilers
R. Venkatesan, Memorial University
What makes a better architecture
• Higher performance: speed, throughput
• Elegance: symmetry, simplicity, orthogonality
• Flexibility: scalability, upwards and downwards
• Power efficiency
• Low cost: mostly depends on the above factors
However, the sleekest architecture need not be the most popular one: marketing skill and market lead are perhaps the two most important factors in achieving popularity or business success. Having said that, even market leaders must keep improving their architectures to retain their popularity and success.
Computers and Processors
• Old classification of computers: micro, mini, mainframe, super
• Classification based on use: general-purpose; servers; embedded systems; special-purpose: DSPs; numerical coprocessors for division, convolution, FFT, etc.; graphics/video processors; audio/speech processors; data processors; communication processors, including network processors, security processors, and codecs for error control; and so on
• Cluster processors; distributed computers; shared-memory multiprocessors; supercomputers; array processors, including systolic arrays
Enabling Technologies
• IC technology: CMOS feature size, integration limits, Moore's Law
• Memory (DRAM) technology: memory size, cost, and speed; memory speed lags behind processor speed, and the gap is getting progressively wider
• Mass/secondary storage technology: portability
• Network technology: LAN, MAN, WAN, Ethernet, ATM, Internet, Bluetooth, wireless technologies, WiFi, WiMax, 2G, 2.5G, 3G, 4G, 5G, UWB, ad hoc and sensor networks
Measuring Performance
• Performance is inversely related to the execution time of the application
• Possible measures: clock-on-the-wall time, response time, CPU time for the user application (with or without the system-call times for the application), processor clock speed, and metrics such as MIPS, GFLOPS, TOPS, MPolygons/s and kTrans/s
• Measuring the actual CPU time to execute the application on the target computer is the best approach, but is it always possible?
• Benchmarks and benchmark suites: compare performance against a "standard" computer/processor
Processor Benchmarks
• Real applications: e.g. GCC, MS-Word, LaTeX
• Modified (scripted) applications: to stress a particular aspect of the processor, such as multiuser access
• Kernels: small, heavily repeated code; e.g. Livermore loops
• Toy benchmarks: Puzzle, Quicksort, Sieve of Eratosthenes
• Synthetic benchmarks: Whetstone (numeric), Dhrystone (integer/string), Dhampstone
• Benchmark suites: combinations of the above for a selected focus: SPECCPU2000, SPECint1997, SPECfp1992, SPECWeb, SPECSFS, TPC-C, etc.
Performance comparison – example 1

Execution times (s):
                 A       B      C
Program P1       1      10     20
Program P2    1000     100     20
Total time    1001     110     40

Weightings:
              W(1)    W(2)   W(3)
Program P1   0.999   0.909   0.50
Program P2   0.001   0.091   0.50

Weighted arithmetic means (s):
                    A       B      C
Arith. Mean W(1)   2.00   10.09  20.00
Arith. Mean W(2)  91.91   18.19  20.00
Arith. Mean W(3) 500.50   55.00  20.00
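The weighted arithmetic means in the table can be reproduced with a short script; the dictionaries below simply encode the example's execution times and weightings:

```python
# Execution times (s) of programs P1 and P2 on machines A, B and C,
# and the three weightings from the example above.
times = {"A": (1, 1000), "B": (10, 100), "C": (20, 20)}
weightings = {"W(1)": (0.999, 0.001),
              "W(2)": (0.909, 0.091),
              "W(3)": (0.50, 0.50)}

for wname, (w1, w2) in weightings.items():
    # Weighted arithmetic mean: w1 * time(P1) + w2 * time(P2)
    row = {m: w1 * t1 + w2 * t2 for m, (t1, t2) in times.items()}
    print(wname, {m: round(v, 2) for m, v in row.items()})
```

Note that the three weightings rank the machines very differently, which is exactly the point of the example.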
Example contd.

                 Normalized to A     Normalized to B     Normalized to C
                  A     B      C      A     B     C       A      B     C
Program P1       1.0  10.0   20.0    0.1   1.0   2.0     0.05   0.5   1.0
Program P2       1.0   0.1   0.02   10.0   1.0   0.2    50.0    5.0   1.0
Arithmetic Mean  1.0   5.05  10.01   5.05  1.0   1.1    25.03   2.75  1.0
Geometric Mean   1.0   1.0    0.63   1.0   1.0   0.63    1.58   1.58  1.0
Total Time       1.0   0.11   0.04   9.1   1.0   0.36   25.03   2.75  1.0

Execution times (reproduced here for ease):
        A      B    C
P1      1     10   20
P2   1000    100   20
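A short script can show why the geometric-mean row ranks the machines the same way regardless of which machine the times are normalized to, while the arithmetic mean changes with the reference; the times are those of the example:

```python
from math import prod

times = {"A": (1, 1000), "B": (10, 100), "C": (20, 20)}  # (P1, P2) in s

def normalized(machine, ref):
    # Per-program ratio of this machine's time to the reference machine's.
    return [t / r for t, r in zip(times[machine], times[ref])]

def geo_mean(vals):
    return prod(vals) ** (1 / len(vals))

# The ratio of two machines' geometric means is reference-independent;
# their arithmetic-mean ratio is not.
for ref in times:
    gms = {m: geo_mean(normalized(m, ref)) for m in times}
    print(ref, round(gms["C"] / gms["A"], 3))  # same value for every ref
```

This reference-independence is the usual argument for using the geometric mean when summarizing normalized ratios.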
Improving performance of a computer
• Use faster materials: silicon, GaAs, InP
• Use faster technology: photolithography
• Employ better architecture within one processor
  • Selection of instruction set: RISC/CISC, VLIW
  • Cache (levels of cache): higher throughput
  • Virtual memory: relocatability, security
  • Pipelining: k stages gives a maximum speedup of k
  • Superpipelining
  • Superscalar (multiple pipelines) with dynamic scheduling
  • Branch prediction
• Use multiple processors: the emphasis of this course
  • Scalability, level of parallelism
  • Shared memory, array processing, multicomputers, MPP
• Employ better software: compilers, etc.
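On the pipelining bullet, the "maximum speedup of k" is an ideal limit reached only for long instruction streams. A minimal sketch of the textbook speedup formula, assuming equal stage delays, no hazards, and n instructions:

```python
def pipeline_speedup(k, n):
    """Ideal speedup of a k-stage pipeline executing n instructions:
    n * k cycles unpipelined versus (k + n - 1) cycles pipelined
    (k - 1 cycles to fill the pipeline, then one result per cycle)."""
    return (n * k) / (k + n - 1)

print(pipeline_speedup(5, 10))      # well below 5 for small n
print(pipeline_speedup(5, 10**6))   # approaches the limit of k = 5
```

In practice hazards, stalls, and unequal stage delays keep real pipelines below this bound.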
Speedup
• Any (architectural) enhancement will hopefully lead to better performance; speedup is a measure of this improvement.
• Performance improvement should be based on the total CPU time taken to execute the application, not on any single component time such as memory access time or clock period.
• If the whole processor is replicated, the fraction enhanced is 100%, as the whole computation is affected.
• If an enhancement affects only part of the computation, we need to determine the fraction of the CPU time affected by the enhancement.
Amdahl's Law
• This simple but important law tells us to always aim at enhancements that affect a large fraction of the computation, if not the whole computation.
• If an enhancement speeds up a fraction F of the computation by a factor S, the overall speedup is:
  Speedup = 1 / ((1 - F) + F / S)
• Even as S grows without bound, the overall speedup is limited to 1 / (1 - F).
CPU (Computation) time
• CPU time is the product of three quantities:
  • Number of instructions executed, or instruction count (IC): remember, this is not the code (program) size
  • Average number of clock cycles per instruction (CPI): if CPI varies across instructions, a weighted average is needed
  • Clock period (τ)
• CPU time = IC * CPI * τ
• An architectural (or compiler-based) enhancement aimed at decreasing one of these three factors might end up increasing one or both of the others. It is the product of the three quantities after applying the enhancement that gives the new CPU time.
CPU Performance Equation
CPU time = IC * CPI_avg * τ
CPU time = CPU clock cycles for a program * τ
CPU time = CPU clock cycles for a program / f
CPI_avg = CPU clock cycles for a program / IC
CPU time = IC * CPI_avg / f
(instructions/program) * (clock cycles/instruction) * (seconds/clock cycle) = seconds/program = CPU time
MIPS = IC / (execution time * 10^6) = f / (CPI * 10^6)
Execution time = IC / (MIPS * 10^6)
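The equations can be exercised directly; the program parameters below (2 × 10^9 instructions, CPI of 1.5, 1 GHz clock) are made up purely for illustration:

```python
def cpu_time(ic, cpi_avg, f):
    """CPU time = IC * CPI_avg / f  (seconds, with clock rate f in Hz)."""
    return ic * cpi_avg / f

def mips(ic, exec_time):
    """MIPS = IC / (execution time * 10^6)."""
    return ic / (exec_time * 1e6)

ic, cpi_avg, f = 2e9, 1.5, 1e9         # hypothetical program and clock
t = cpu_time(ic, cpi_avg, f)
print(t)                                # 3.0 seconds
print(mips(ic, t))                      # equals f / (CPI_avg * 10^6)
```

The second print confirms the identity MIPS = f / (CPI * 10^6) from the slide.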
Speedup example
• Three enhancements for different parts of the computation are contemplated, with speedups of 40, 20 and 5, respectively. E1 improves 20%, E2 improves 30% and E3 improves 70% of the computation. Assuming all three cost the same, which is the best choice?
• Speedup due to E1 = 1 / ((1 - 0.2) + 0.2/40) = 1.242
• Speedup due to E2 = 1 / ((1 - 0.3) + 0.3/20) = 1.399
• Speedup due to E3 = 1 / ((1 - 0.7) + 0.7/5) = 2.273
• So a larger fraction enhanced is more beneficial than a huge speedup on a small fraction.
• Hence the frequency of execution of different instructions becomes important: statistics.
R. Venkatesan, Memorial University
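The three speedups above can be checked with a one-line application of Amdahl's Law:

```python
def amdahl(fraction, local_speedup):
    """Overall speedup when `fraction` of the computation
    is sped up by `local_speedup` (Amdahl's Law)."""
    return 1 / ((1 - fraction) + fraction / local_speedup)

# The three contemplated enhancements from the example.
for name, frac, s in [("E1", 0.2, 40), ("E2", 0.3, 20), ("E3", 0.7, 5)]:
    print(name, round(amdahl(frac, s), 3))
# E3 wins despite having the smallest local speedup.
```

Even a modest local speedup dominates when it covers most of the computation, which is the slide's conclusion.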