This lecture introduces the principles of advanced computer architecture, focusing on the conceptual structure and functional behavior as seen by programmers. It covers instruction set architecture, organization, and implementation details such as pipelining, parallel processing, and memory systems. Significant topics include instruction-level parallelism, cache design, and high-performance storage systems. The discussion emphasizes the impact of technology trends and performance evaluation methodologies, providing insights for hardware and software designers on optimizing performance in contemporary computing environments.
Lecture 1: Introduction CprE 585 Advanced Computer Architecture, Fall 2004 Zhao Zhang
Traditional “Computer Architecture” The term architecture is used here to describe the attributes of a system as seen by the programmer, i.e., the conceptual structure and functional behavior, as distinct from the organization of the data flow and controls, the logic design, and the physical implementation. • Gene Amdahl, IBM Journal of R&D, April 1964
Contemporary “Computer Architecture” • Instruction set architecture: the program-visible instruction set • Instruction format, memory addressing modes, architectural registers, endianness, alignment, … • E.g., RISC, CISC, VLIW, EPIC • Organization: high-level aspects of a computer’s design • Pipeline structure, instruction scheduling, caches, memory, disks, buses, etc. • Implementation: the specifics of a machine • Logic design, packaging technology
Fundamentals • ISA design principles and performance evaluation • The impacts of technology trends and market factors • Performance evaluation methodologies
High Performance Computer Architecture Given a huge number of transistors, how do we run programs as rapidly as possible? • Sequential programs • Parallel and multiprogrammed workloads
Instruction Level Parallelism Sequential program performance: Execution Time = #inst × CPI × Cycle Time • Pipelining works well for sequential programs • But the best achievable performance is limited by CPI >= 1.0 • Pipeline hazards hold performance back further
Multi-issue Pipeline Naïve extension to multi-issue: replicate every stage five ways, so up to five instructions move through the pipeline together each cycle:
IF  ID  EX  MEM  WB
IF  ID  EX  MEM  WB
IF  ID  EX  MEM  WB
IF  ID  EX  MEM  WB
IF  ID  EX  MEM  WB
for (i = 0; i < N; i++)
    X[i] = a * X[i];

// let R3 = &X[0], R4 = &X[N], and F0 = a
LOOP:  L.D    F2, 0(R3)
       MUL.D  F2, F2, F0
       S.D    F2, 0(R3)
       DADDUI R3, R3, #8
       BNE    R3, R4, LOOP

How much parallelism exists in the program? What’s the problem with the naïve multi-issue pipeline? • Data hazards • Control hazards • Pipeline efficiency
How to Exploit ILP? Find independent instructions through dependence analysis • Hardware approaches => dynamically scheduled superscalar • Most commonly used today: the Intel Pentium, AMD, Sun UltraSPARC, and MIPS families • Software approaches => (1) statically scheduled superscalar, or (2) VLIW
Dynamically Scheduled Superscalar Important features: • Multi-issue and Deep pipelining • Dynamic scheduling • Speculative execution • Branch prediction • Memory dependence speculation • Non-blocking caches • High bandwidth caches
Dynamically Scheduled Superscalar Challenges: Complexity!!! Key issues: • Understand why it is correct • Know the dependences • Will prove that dynamic execution is “correct” • Understand how it brings high performance • Will see weird designs • Will use Verilog and simulation to aid understanding • Keep the big picture in mind
Memory System Performance • A typical memory hierarchy today, from fastest/smallest to biggest/slowest: Proc/Regs → L1 Cache → L2 Cache → L3 Cache (optional) → Memory → Disk, Tape, etc. • Here we focus on the L1/L2/L3 caches, virtual memory, and main memory
Memory System Performance Memory Stall CPI = misses per instruction × miss penalty = %MemInst × miss rate × miss penalty. Assume 20% memory instructions, a 2% miss rate, and a 400-cycle miss penalty. How much is the memory stall CPI?
Cache Design Many applications are memory-bound • CPU speed increases quickly; memory speed cannot keep up Cache hierarchy: exploits program locality • Basic principles of cache design • Hardware cache optimizations • Application-level cache optimizations • Prefetching techniques We will also discuss virtual memory
High Performance Storage Systems What limits the performance of web servers? Storage! • Storage technology trends • RAID: Redundant array of inexpensive disks
Multiprocessor Systems Must exploit thread-level parallelism for further performance improvement Shared-memory multiprocessors: cooperating programs see the same memory address space How do we build them? • Cache coherence • Memory consistency
Other Topics • VLIW basics and modern VLIW processors • Simultaneous multithreading and chip-level multiprocessing • Low-power processor design • Circuit issues in high-performance processors • Other selected topics
Why Study Computer Architecture? As a hardware designer/researcher – know how to design processors, caches, storage, graphics, interconnects, and so on As a system designer – know how to build a computer system using the best components available As a software designer – know how to get the best performance from the hardware