Computer Architecture – Introduction Andrew Hutchinson & Gary Marsden (me) (gaz@cs.uct.ac.za) 2005
The Grand Scheme • Abstractions • Role of Performance • Language of the Machine • Arithmetic • [3] Performance Issues • (6) Processor: Datapath & Control • (3) Pipelining • (4) Memory hierarchy • (4) Peripherals CS1 CS2
Review • Chapter 1: Abstractions • HLL -> Assembly -> Binary • Application (system s/w (hardware)) • 5 classic components • Input, output, memory, datapath, control • Processor gets inst. & data from memory • Input for placing items in memory • Output reads from memory • Control co-ordinates • Abstraction / Interfaces
Review II • Chapter 2: The Role of Performance • CPU time = (Instruction Count x CPI) / Clock Rate • Popular measures • MIPS • Depends on inst. set | Varies between progs. | Can vary inversely with performance • GFLOPS • Better, perhaps, as floating-point instructions are similar • But FLOP counts can still differ across machines
Review III • Chapter 3 - Machine Language • R type and I type (distinguished by first field) • Registers • Address modes • Register, base, immediate, PC relative • P1: Simplicity favours regularity • P2: Smaller is faster • P3: Good design demands compromise • P4: Make the common case fast • Chapter 4 - Arithmetic • ALU construction
New Chapter 4 - Performance • What is a fast computer? • Clock speed? • Responsiveness? • Data processing? • It depends…
What is performance? • There are so many elements to a computer system that it is pretty much impossible to determine, in advance, the performance level of a system • Computer vendors have their own idea of ‘performance’ • Macintosh users have the ‘bounce’ test • Performance means different things to different people • 747 or F-15
Two key measures • Users of personal computers are interested in ‘response time’ • Also called execution time • F-15 • Data centre managers are interested in ‘throughput’ • Total work done in a given time • 747 • This implies at least two different performance ‘metrics’
Response time • As we are looking at microprocessor design, response time is of primary interest • Start by considering execution time • Longer is bad; shorter is good • So we can say that • Performance = 1 / Execution time
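The reciprocal relationship above also gives us a way to say "X is n times faster than Y". A minimal sketch (the execution times are invented for illustration):

```python
# Performance defined as the reciprocal of execution time.
# The two machine times below are hypothetical numbers.

def performance(execution_time_s: float) -> float:
    """Performance = 1 / execution time (bigger is better)."""
    return 1.0 / execution_time_s

time_x = 10.0   # seconds on hypothetical machine X
time_y = 15.0   # seconds on hypothetical machine Y

# X is faster than Y by the ratio of their performances,
# which is just the inverse ratio of their execution times:
n = performance(time_x) / performance(time_y)
print(f"X is {n:.1f} times faster than Y")   # X is 1.5 times faster than Y
```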
What is time? • “Time is an illusion; lunch time doubly so” • We kind of cheated when we said “time” • Time can be thought of in two ways • Wall time / Response time / Elapsed time: all mean the number of seconds from when the task starts until it ends • CPU time: in multi-user systems, it makes more sense to count only the time the CPU spends on that user’s task (‘system’ time confuses the issue) • Try the Unix ‘time’ command
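The wall-time vs CPU-time split that Unix `time` reports can be reproduced with Python's standard library; a sleeping process accumulates wall time but almost no CPU time:

```python
# Wall/elapsed time vs CPU time, analogous to the real/user split
# printed by the Unix `time` command.
import time

wall_start = time.perf_counter()   # wall-clock (elapsed) timer
cpu_start = time.process_time()    # CPU time charged to this process

time.sleep(0.5)                    # process is idle: wall time passes,
                                   # but almost no CPU time is consumed

wall = time.perf_counter() - wall_start
cpu = time.process_time() - cpu_start
print(f"wall: {wall:.2f}s  cpu: {cpu:.2f}s")
```

On a multi-user machine the gap between the two can be large even for a busy program, which is why CPU time is the fairer per-task measure.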
CPU performance • In personal computers, most people focus on clock speed - e.g. a 4 GHz processor • 4 GHz is the clock rate; 0.25 ns is the cycle length • CPU execution time = Clock cycles for a program / Clock Rate • So increasing clock speed, or decreasing the number of cycles for a program, will improve performance • We need to think a bit more about ‘cycles for a program’
Program length • The number of cycles required for a program depends on the number of instructions in the program and the number of cycles it takes to execute each of those instructions • We average this to CPI (Cycles Per Instruction) • So… • CPU clock cycles = Number of instructions x CPI • Therefore • CPU execution time = (Number of instructions x CPI) / Clock Rate
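The full equation above can be written down directly; a minimal sketch, with the instruction count and CPI values invented for illustration:

```python
# CPU execution time = (instruction count x CPI) / clock rate,
# a direct transcription of the slide's equation.

def cpu_time(instruction_count: float, cpi: float, clock_rate_hz: float) -> float:
    """Seconds taken to run a program with the given characteristics."""
    return instruction_count * cpi / clock_rate_hz

# Hypothetical program: 2 billion instructions, average CPI of 1.5,
# on the 4 GHz processor from the previous slide.
t = cpu_time(instruction_count=2e9, cpi=1.5, clock_rate_hz=4e9)
print(f"{t:.3f} s")   # 2e9 * 1.5 / 4e9 = 0.750 s
```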
A note on CPI • Different instructions take different numbers of clock cycles to complete • A lot more on this later • By using hardware and software monitoring tools, one can calculate a sufficiently useful CPI value
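An averaged CPI of the kind monitoring tools report is just an instruction-frequency-weighted mean. A sketch, with the instruction classes and frequencies invented for illustration:

```python
# Average CPI as a weighted mean over instruction classes.
# The mix below (fractions and per-class cycle counts) is hypothetical.

def average_cpi(mix):
    """mix: list of (fraction_of_instructions, cycles_for_that_class)."""
    total_fraction = sum(f for f, _ in mix)
    assert abs(total_fraction - 1.0) < 1e-9, "fractions must sum to 1"
    return sum(f * cpi for f, cpi in mix)

mix = [(0.5, 1),   # e.g. ALU ops: 1 cycle each
       (0.3, 2),   # e.g. loads/stores: 2 cycles each
       (0.2, 3)]   # e.g. branches: 3 cycles each

print(average_cpi(mix))   # 0.5 + 0.6 + 0.6, about 1.7
```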
So how can I make my program faster? • Algorithm: instruction count (and possibly CPI) • Algorithm choice affects the number and type of instructions • Language: instruction count and CPI • Some languages require more statements per expression and may require more high-cycle instructions • Compiler: instruction count and CPI • Compilers can have a huge effect on both of these measures; too complex to deal with here • Processor Instruction Set: CPI, clock rate & instruction count • We will get into this in the next chapter
Comparative Performance • So we know the elements, but how do we compare systems? • For situations where there is only one application to run, this is straightforward • Most of us, however, run multiple applications • Could test the applications on different platforms, but only if they are available for every target platform • Alternatively, write a small standard program that models idealised usage • The Benchmark!
Benchmarks • Many benchmarks have been created to indicate performance for different types of task • A good place to look is • www.spec.org • However, some compilers are tuned to optimise for specific benchmarks • Hence the need for reproducibility
You may have heard of MIPS? • MIPS (Millions of Instructions Per Second) was, for a long time, the standard benchmark, but has now been replaced • On the plus side, bigger numbers usually mean faster computers • On the down side • Different instruction sets do different amounts of work per instruction • MIPS ratings vary per program (different MIPS for the same computer) • MIPS can vary inversely with performance
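The last pitfall is worth working through: since MIPS = instruction count / (execution time x 10^6), a compiler that emits more (but simpler) instructions can post a higher MIPS rating while producing the slower program. A sketch with hypothetical compilers on an assumed 1 GHz machine:

```python
# Why MIPS can vary inversely with performance: two hypothetical
# compilers targeting the same 1 GHz machine.

CLOCK_HZ = 1e9   # assumed clock rate

def stats(instruction_count, cpi):
    """Return (execution time in s, MIPS rating) for a program."""
    exec_time = instruction_count * cpi / CLOCK_HZ
    mips = instruction_count / (exec_time * 1e6)   # = CLOCK_HZ / (cpi * 1e6)
    return exec_time, mips

# Compiler A: many simple instructions. Compiler B: fewer, slower ones.
time_a, mips_a = stats(instruction_count=10e9, cpi=1.0)
time_b, mips_b = stats(instruction_count=6e9,  cpi=1.5)

print(f"A: {time_a:.0f} s at {mips_a:.0f} MIPS")   # A: 10 s at 1000 MIPS
print(f"B: {time_b:.0f} s at {mips_b:.0f} MIPS")   # B: 9 s at 667 MIPS
# A wins on MIPS, yet B finishes first: execution time is what matters.
```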
Some performance laws • Moore’s law: “The complexity of an integrated circuit will double every 18 months” • What are the implications of that? • Amdahl’s Law: even dramatic improvements to one part of a task give limited overall speedup, bounded by the fraction of the task that is actually improved (diminishing returns) • Implication: make the common case fast
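Amdahl's Law can be stated as a formula: if a fraction f of the task is sped up by a factor s, the overall speedup is 1 / ((1 - f) + f/s). A sketch with illustrative numbers:

```python
# Amdahl's Law: overall speedup is limited by the fraction of the
# task the improvement actually touches.

def overall_speedup(fraction_enhanced: float, speedup_enhanced: float) -> float:
    """Speed up `fraction_enhanced` of a task by `speedup_enhanced` times."""
    return 1.0 / ((1.0 - fraction_enhanced)
                  + fraction_enhanced / speedup_enhanced)

# Speeding up 40% of a task 10x gives far less than 10x overall:
print(round(overall_speedup(0.4, 10), 2))    # 1.56

# Even an effectively infinite speedup of that 40% caps out at 1/0.6:
print(round(overall_speedup(0.4, 1e12), 2))  # 1.67
```

This is exactly why the slide's implication holds: effort spent on the common case (large f) pays off far more than heroics on a rare one.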