
Lecture 1: Introduction

CprE 581 Computer Systems Architecture, Fall 2005. Zhao Zhang.





Presentation Transcript


  1. Lecture 1: Introduction CprE 581 Computer Systems Architecture, Fall 2005 Zhao Zhang

  2. Traditional “Computer Architecture” The term architecture is used here to describe the attributes of a system as seen by the programmer, i.e., the conceptual structure and functional behavior, as distinct from the organization of the data flow and controls, the logic design, and the physical implementation. • Gene Amdahl, IBM Journal of R&D, April 1964

  3. Contemporary “Computer Architecture” • Instruction set architecture • Microarchitecture: • Pipeline structures • Cache memories • Implementations • Logic design and synthesis

  4. Fundamentals • Technology trends • Performance evaluation methodologies • Instruction Set Architecture

  5. Technology Drivers for High Performance VLSI technology: faster transistors and larger transistor budgets

  6. CPU Performance For a sequential program: CPU time = #Inst × CPI × Clock cycle time. To improve performance: • Faster clock (shorter cycle time) • Reduce #Inst • Reduce CPI or increase IPC
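The CPU time equation above can be sketched numerically. The instruction count, CPI, and clock rate below are made-up values chosen only for illustration:

```python
def cpu_time(n_inst, cpi, clock_hz):
    """CPU time = instruction count x CPI x clock cycle time."""
    cycle_time = 1.0 / clock_hz  # seconds per clock cycle
    return n_inst * cpi * cycle_time

# Hypothetical workload: 1 billion instructions, CPI of 1.5, 2 GHz clock.
t = cpu_time(1e9, 1.5, 2e9)  # 0.75 seconds
```

Halving CPI or doubling the clock rate each cuts the time in half, which is why the slide lists all three terms as improvement targets.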

  7. How to use one billion transistors? • Bit-level parallelism • Move from 32-bit to 64-bit • Instruction-level parallelism • Deep pipelines • Execute multiple instructions per cycle • Program locality • Large caches, more branch prediction resources • Thread-level parallelism

  8. Instruction-Level Parallelism Pipeline + Multi-issue [Diagram: five instructions overlapped in time, each flowing through the pipeline stages IF, ID, EX, MEM, WB]

  9. Instruction-Level Parallelism
  for (i=0; i<N; i++) X[i] = a*X[i];
  // let R3=&X[0], R4=&X[N], and F0=a
  LOOP: L.D    F2, 0(R3)
        MUL.D  F2, F2, F0
        S.D    F2, 0(R3)
        DADDUI R3, R3, #8
        BNE    R3, R4, LOOP
  Which instructions are parallel? How can those instructions be scheduled?

  10. Instruction-Level Parallelism Find independent instructions through dependence analysis • Hardware approaches => Dynamically scheduled superscalar • Most commonly used today: Intel Pentium, AMD, Sun UltraSparc, and MIPS families • Software approaches => (1) Static scheduled superscalar, or (2) VLIW

  11. Modern Superscalar Processors Example: Intel Pentium, IBM Power/PowerPC, Sun UltraSparc, SGI MIPS … • Multi-issue and Deep pipelining • Dynamic scheduling and speculative execution • High bandwidth L1 caches and large L2/L3 caches

  12. Modern Superscalar Processor Challenges: Complexity!!! • Understand how it delivers high performance • Will see some wild designs • Will use Verilog and simulation to aid understanding • Keep the big picture in mind

  13. Modern Superscalar Processor Maintain register data flow • Register renaming • Instruction scheduling Maintain control flow • Branch prediction • Speculative execution and recovery Maintain memory data flow • Load and store queues • Memory dependence speculation

  14. Memory System Performance Memory stall CPI = Misses per inst × Miss penalty = % Mem inst × Miss rate × Miss penalty. Assume 20% memory instructions, a 2% miss rate, and a 400-cycle miss penalty. How much is the memory stall CPI?
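Working the slide's question through its own formula, with the values taken directly from the slide:

```python
# Memory stall CPI = % mem inst x miss rate x miss penalty
mem_inst_frac = 0.20   # 20% of instructions access memory
miss_rate = 0.02       # 2% of those accesses miss in the cache
miss_penalty = 400     # cycles lost per miss

stall_cpi = mem_inst_frac * miss_rate * miss_penalty
print(stall_cpi)  # 1.6 stall cycles per instruction
```

So even a 2% miss rate adds 1.6 cycles per instruction, which can dominate a base CPI near 1. This is why the following slides focus on the memory hierarchy.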

  15. Memory System Performance A typical memory hierarchy today (faster and smaller at the top, bigger and slower at the bottom): Proc/Regs → L1-Cache → L2-Cache → L3-Cache (optional) → Memory → Disk, Tape, etc. Here we focus on the L1/L2/L3 caches, virtual memory, and main memory.

  16. Cache Design Many applications are memory-bound • CPU speed increases quickly; memory speed cannot keep up. Cache hierarchy: exploits program locality • Basic principles of cache design • Hardware cache optimizations • Application cache optimizations • Prefetching techniques. We will also cover virtual memory.
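One of the basic principles listed above, sketched in Python: a direct-mapped cache splits each address into tag, set-index, and byte-offset fields, and a lookup compares the stored tag for that set. The 32-set, 64-byte-line geometry below is an arbitrary choice for illustration, not any particular processor's cache:

```python
# Direct-mapped cache lookup sketch: 32 sets x 64-byte lines (2 KB total).
NUM_SETS = 32
LINE_SIZE = 64

cache = {}  # set index -> stored tag (cached data omitted for brevity)

def split_address(addr):
    """Break a byte address into (tag, set index, byte offset)."""
    offset = addr % LINE_SIZE
    index = (addr // LINE_SIZE) % NUM_SETS
    tag = addr // (LINE_SIZE * NUM_SETS)
    return tag, index, offset

def access(addr):
    """Return True on a hit, False on a miss (filling the line on a miss)."""
    tag, index, _ = split_address(addr)
    if cache.get(index) == tag:
        return True
    cache[index] = tag  # miss: fetch the line, evicting the previous tag
    return False

access(0x1040)        # cold miss
hit = access(0x1044)  # same 64-byte line, so this one hits
```

Spatial locality shows up directly: the second access hits because it falls in the line the first access brought in.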

  17. High Performance Storage Systems What limits the performance of web servers? Storage! • Storage technology trends • RAID: Redundant array of inexpensive disks

  18. Multiprocessor Systems Must exploit thread-level parallelism for further performance improvement. Shared-memory multiprocessors: cooperating programs see the same memory address space. How to build them? • Cache coherence • Memory consistency

  19. Emerging Techniques • Low-power design • Multicore and multithreaded processors • Secure processors • Reliable design

  20. Why Study Computer Architecture? As a hardware designer/researcher – know how to design processors, caches, storage, graphics, interconnects, and so on. As a system designer – know how to build a computer system from the best components available. As a software designer – know how to get the best performance from the hardware.

  21. Class Web Site www.ece.iastate.edu/~zzhang/cpre585/ • Syllabus • Schedule • Homework assignments • Readings WebCT: Grades, Assignments and Discussions
