
Lecture 1: Introduction

CprE 581 Computer Systems Architecture, Fall 2005. Zhao Zhang.





Presentation Transcript


  1. Lecture 1: Introduction CprE 581 Computer Systems Architecture, Fall 2005 Zhao Zhang

  2. Traditional “Computer Architecture” The term architecture is used here to describe the attributes of a system as seen by the programmer, i.e., the conceptual structure and functional behavior, as distinct from the organization of the data flow and controls, the logic design, and the physical implementation. • Gene Amdahl, IBM Journal of R&D, April 1964

  3. Contemporary “Computer Architecture” • Instruction set architecture • Microarchitecture: • Pipeline structures • Cache memories • Implementations • Logic design and synthesis

  4. Fundamentals • Technology trends • Performance evaluation methodologies • Instruction Set Architecture

  5. Technology Drivers for High Performance VLSI technology: faster transistors and larger transistor budgets

  6. CPU Performance For a sequential program: CPU time = #Inst × CPI × Clock cycle time. To improve performance: • Faster clock (shorter cycle time) • Reduce #Inst • Reduce CPI or increase IPC
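The CPU time equation above can be sketched numerically. The instruction count, CPI, and clock rate below are made-up values chosen only for illustration:

```python
def cpu_time(n_inst, cpi, clock_hz):
    """CPU time = instruction count x CPI x clock cycle time."""
    cycle_time = 1.0 / clock_hz  # seconds per clock cycle
    return n_inst * cpi * cycle_time

# Hypothetical workload: 1 billion instructions, CPI of 1.5, 2 GHz clock.
t = cpu_time(1e9, 1.5, 2e9)  # 0.75 seconds
```

Halving CPI or doubling the clock rate each cuts the time in half, which is why the slide lists all three terms as improvement targets.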

  7. How to use one billion transistors? • Bit-level parallelism • Move from 32-bit to 64-bit • Instruction-level parallelism • Deep pipelines • Execute multiple instructions per cycle • Program locality • Large caches, more branch prediction resources • Thread-level parallelism

  8. Instruction-Level Parallelism Pipeline + Multi-issue [Diagram: five instructions overlapped in time, each flowing through the pipeline stages IF, ID, EX, MEM, WB]

  9. Instruction-Level Parallelism
  for (i=0; i<N; i++) X[i] = a*X[i];
  // let R3=&X[0], R4=&X[N], and F0=a
  LOOP: L.D    F2, 0(R3)
        MUL.D  F2, F2, F0
        S.D    F2, 0(R3)
        DADDUI R3, R3, #8
        BNE    R3, R4, LOOP
  Which instructions are parallel? How can those instructions be scheduled?

  10. Instruction-Level Parallelism Find independent instructions through dependence analysis • Hardware approaches => Dynamically scheduled superscalar • Most commonly used today: Intel Pentium, AMD, Sun UltraSparc, and MIPS families • Software approaches => (1) Static scheduled superscalar, or (2) VLIW

  11. Modern Superscalar Processors Example: Intel Pentium, IBM Power/PowerPC, Sun UltraSparc, SGI MIPS … • Multi-issue and Deep pipelining • Dynamic scheduling and speculative execution • High bandwidth L1 caches and large L2/L3 caches

  12. Modern Superscalar Processor Challenges: Complexity!!! • Understand how it delivers high performance • Will see some wild designs • Will use Verilog and simulation to aid understanding • Keep the big picture in mind

  13. Modern Superscalar Processor Maintain register data flow • Register renaming • Instruction scheduling Maintain control flow • Branch prediction • Speculative execution and recovery Maintain memory data flow • Load and store queues • Memory dependence speculation

  14. Memory System Performance Memory stall CPI = Misses per inst × Miss penalty = % Mem inst × Miss rate × Miss penalty. Assume 20% memory instructions, a 2% miss rate, and a 400-cycle miss penalty. How much is the memory stall CPI?
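Working the slide's question through its own formula, with the values taken directly from the slide:

```python
# Memory stall CPI = % mem inst x miss rate x miss penalty
mem_inst_frac = 0.20   # 20% of instructions access memory
miss_rate = 0.02       # 2% of those accesses miss in the cache
miss_penalty = 400     # cycles lost per miss

stall_cpi = mem_inst_frac * miss_rate * miss_penalty
print(stall_cpi)  # 1.6 stall cycles per instruction
```

So even a 2% miss rate adds 1.6 cycles per instruction, which can dominate a base CPI near 1. This is why the following slides focus on the memory hierarchy.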

  15. Memory System Performance A typical memory hierarchy today (faster and smaller at the top, bigger and slower at the bottom): Proc/Regs → L1-Cache → L2-Cache → L3-Cache (optional) → Memory → Disk, Tape, etc. Here we focus on the L1/L2/L3 caches, virtual memory, and main memory.

  16. Cache Design Many applications are memory-bound • CPU speed increases quickly; memory speed cannot keep up. Cache hierarchy: exploits program locality • Basic principles of cache design • Hardware cache optimizations • Application cache optimizations • Prefetching techniques. We will also cover virtual memory.
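One of the basic principles listed above, sketched in Python: a direct-mapped cache splits each address into tag, set-index, and byte-offset fields, and a lookup compares the stored tag for that set. The 32-set, 64-byte-line geometry below is an arbitrary choice for illustration, not any particular processor's cache:

```python
# Direct-mapped cache lookup sketch: 32 sets x 64-byte lines (2 KB total).
NUM_SETS = 32
LINE_SIZE = 64

cache = {}  # set index -> stored tag (cached data omitted for brevity)

def split_address(addr):
    """Break a byte address into (tag, set index, byte offset)."""
    offset = addr % LINE_SIZE
    index = (addr // LINE_SIZE) % NUM_SETS
    tag = addr // (LINE_SIZE * NUM_SETS)
    return tag, index, offset

def access(addr):
    """Return True on a hit, False on a miss (filling the line on a miss)."""
    tag, index, _ = split_address(addr)
    if cache.get(index) == tag:
        return True
    cache[index] = tag  # miss: fetch the line, evicting the previous tag
    return False

access(0x1040)        # cold miss
hit = access(0x1044)  # same 64-byte line, so this one hits
```

Spatial locality shows up directly: the second access hits because it falls in the line the first access brought in.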

  17. High Performance Storage Systems What limits the performance of web servers? Storage! • Storage technology trends • RAID: Redundant array of inexpensive disks

  18. Multiprocessor Systems Must exploit thread-level parallelism for further performance improvement. Shared-memory multiprocessors: cooperating programs see the same memory address space. How to build them? • Cache coherence • Memory consistency

  19. Emerging Techniques • Low-power design • Multicore and multithreaded processors • Secure processors • Reliable design

  20. Why Study Computer Architecture? As a hardware designer/researcher – know how to design processors, caches, storage, graphics, interconnects, and so on. As a system designer – know how to build a computer system from the best components available. As a software designer – know how to get the best performance from the hardware.

  21. Class Web Site www.ece.iastate.edu/~zzhang/cpre585/ • Syllabus • Schedule • Homework assignments • Readings WebCT: Grades, Assignments and Discussions
