
MAMAS – Computer Architecture 234367








  1. Alex Gontmakher MAMAS – Computer Architecture 234367 Some of the slides were taken from: (1) Lihu Rapoport (2) Randy Katz and (3) Patterson

  2. General course information Grade: • 20% Exercise – most likely 5 assignments. • 80% Final exam. • Textbooks: • Computer Architecture: A Quantitative Approach, Hennessy & Patterson – preferably 3rd edition • Computer Organization and Design – The Hardware/Software Interface, Patterson & Hennessy • Other course information: web site of the course.

  3. Notes on the course arrangement • The lectures and tutorials are coordinated with each other; one cannot be understood without the other • Exercise submission • In pairs or individually • Printed work (not handwritten) • If copying is detected (exercises or exam), a final grade of 0 (zero) will be given to all those involved • In weeks when one of the TAs is absent and/or one of the tutorials is cancelled for a holiday (while the other tutorials that week take place as usual), the material will be made up as soon as possible, but students are advised to join another tutorial • The course material is not identical to that taught in previous semesters, therefore: • Material that was not taught (in lecture or tutorial) will not be asked about in the exam • New material is included in the exam material • The slides mix English and Hebrew since they were borrowed from different sources...

  4. Before we start

  5. The paradigm (Patterson) Every Computer Scientist should master the “AAA” • Architecture • Algorithms • Applications

  6. Why Computer Architecture? We, computer architects, were lucky enough to have real impact on computer technology. We need to understand the hardware trends. Most of our work is within the fields of performance evaluation and the algorithms that are implemented by the hardware. The goal of Computer Architecture • To build “cost effective systems” • How do we calculate the cost of a system? • How do we evaluate the effectiveness of the system? • To optimize the system • What are the optimization points? Fact: most computer systems still use the von Neumann principle of operation, even though, internally, they are much different from the computers of that time.

  7. Introduction

  8. Computer System Structure [block diagram: the CPU (with its cache) sits on the CPU bus, connected through the “North” bridge to the memory bus (memory) and to the graphics adapter with its video buffer; the “South” side hangs off the I/O bus, connecting the SCSI/IDE adapter (SCSI bus: hard disk, scanner), the LAN adapter (LAN) and the USB hub (keyboard, mouse)]

  9. Computer systems – how it looks in real systems [photo: a real motherboard, showing the I/O slots and the “North” side – the CPU + memory subsystem]

  10. Class Focus We will not focus on: • Low level hardware details • Parallel and distributed systems (although we mention some of their basic technologies) We will focus on: • Performance • How to achieve it and how to measure it • CPU • CPU design, pipeline, hazards • Out-of-order and speculative execution • Memory Hierarchy • Main memory • Cache • Virtual Memory • PC Architecture • Disk I/O • Advanced topics • Software optimizations

  11. Trends in Computer Technologies

  12. Technology Trends

              Capacity         Speed
  Logic       2x in 3 years    2x in 3 years
  DRAM        4x in 3 years    1.4x in 10 years
  Disk        2x in 3 years    1.4x in 10 years

  CPU Performance Trends • Logic speed: 2x per 3 years • Logic capacity: 2x per 3 years • Potential computing capacity: 4x per 3 years – if we could keep all the transistors busy all the time • Actual: 3.3x per 3 years
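These compound rates are easier to compare when converted to equivalent annual factors; a small sketch (the 3.3x figure is the one quoted on the slide):

```python
# Convert "Nx per P years" growth figures into equivalent annual factors.
def annual_factor(total_growth, period_years):
    return total_growth ** (1.0 / period_years)

logic_speed = annual_factor(2, 3)       # 2x per 3 years  -> ~1.26x per year
dram_capacity = annual_factor(4, 3)     # 4x per 3 years  -> ~1.59x per year
actual_compute = annual_factor(3.3, 3)  # 3.3x per 3 years -> ~1.49x per year

print(f"logic speed: {logic_speed:.2f}x/year")
print(f"DRAM capacity: {dram_capacity:.2f}x/year")
print(f"actual computing capacity: {actual_compute:.2f}x/year")
```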

  13. Technology and Computer Architecture

  14. Power density [plot: power density in Watts/cm², log scale from 1 to 1000, for the i386, i486, Pentium, Pentium Pro, Pentium II and Pentium III processors – climbing past a hot plate toward a nuclear reactor, a rocket nozzle and, eventually, the Sun's surface] Can it last forever – or are new challenges coming?

  15. Considerations in computer design

  16. Architecture & Microarchitecture • Architecture (ISA – Instruction Set Architecture): the collection of features of a processor (or a system) as they are seen by the “user” • User: a binary executable running on the processor, or an assembly level programmer • Microarchitecture (µarch, uarch): the collection of implementation features of a processor (or a system) that do not affect the user

  17. Architecture & Microarchitecture Elements • Architecture: • Register data width (8/16/32) • Instruction set • Addressing modes • Addressing methods (Segmentation, Paging, etc...) • Microarchitecture: • Physical memory size • Cache size and structure • Number of execution units, number of execution pipelines • Branch prediction • TLB • Timing is considered µarch (though it is user visible!) • Processors with the same architecture may have a different µarch

  18. Compatibility • Backward compatibility • New hardware can run existing software • Example: Pentium 4 can run software originally written for Pentium III, Pentium II, Pentium, 486, 386, 286 • Forward compatibility • New software can run on existing hardware • Example: new software written with MMX™ must still run on older Pentium processors which do not support MMX™ • Less important than backward compatibility • New ideas: architecture independent • JIT – just in time compiler: Java and .NET • Binary translation

  19. How do we compare different systems?

  20. Benchmarks – Programs for Evaluating Processor Performance • Toy benchmarks • 10-100 line programs • e.g.: sieve, puzzle, quicksort • Synthetic benchmarks • Attempt to match average frequencies of real workloads • e.g., Winstone, Dhrystone • Real programs • e.g., gcc, spice • SPEC: System Performance Evaluation Cooperative • SPECint (8 integer programs) and SPECfp (10 floating point)
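SPEC summarizes per-program performance ratios with a geometric mean; a minimal sketch (the two ratios below are made up for illustration):

```python
import math

def spec_score(ratios):
    """Geometric mean of per-benchmark speedup ratios vs. a reference machine."""
    return math.prod(ratios) ** (1.0 / len(ratios))

# Hypothetical ratios for a tiny two-program suite:
ratios = [2.0, 8.0]
print(spec_score(ratios))  # geometric mean = 4.0 (the arithmetic mean would be 5.0)
```

The geometric mean is used so that no single benchmark dominates and the ranking of two machines does not depend on which machine is chosen as the reference.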

  21. CPI – to compare systems with the same instruction set architecture (ISA) • The CPU is synchronous – it works according to a clock signal. • Clock cycle is measured in nsec (10⁻⁹ of a second). • Clock rate (= 1/clock cycle) is measured in MHz (10⁶ cycles/second). • CPI – cycles per instruction • Average #cycles per instruction (in a given program): CPI = (#cycles required to execute the program) / (#instructions executed in the program) • IPC (= 1/CPI): instructions per cycle • Clock rate is mainly affected by technology, CPI by the micro-architecture • CPI breakdown: how many cycles (on average) the program spends on different causes; e.g., executing, memory, I/O, etc.

  22. CPI (cont.) • CPIi – #cycles to execute a given type of instruction • e.g.: CPIadd = 1, CPImul = 3 • Independent of the program • Calculating the CPI of a program: • ICi – #times an instruction of type i is executed in the program • IC – #instructions executed in the program: IC = Σi ICi • Fi – relative frequency of instructions of type i: Fi = ICi / IC • Ncyc – #cycles required to execute the program: Ncyc = Σi (CPIi × ICi) • CPI = Ncyc / IC = Σi (CPIi × Fi) • This calculation does not take into account other delays such as memory, I/O
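The per-type calculation above can be sketched directly; the per-type CPIs and instruction counts below are made up for illustration:

```python
# CPI of a program from per-type CPIs (CPI_i) and instruction counts (IC_i).
cpi_of_type = {"add": 1, "load": 2, "mul": 3}     # hypothetical CPI_i values
count_of_type = {"add": 500, "load": 300, "mul": 200}  # hypothetical IC_i values

IC = sum(count_of_type.values())                       # total instructions executed
Ncyc = sum(cpi_of_type[t] * count_of_type[t] for t in cpi_of_type)
CPI = Ncyc / IC                                        # equals sum of CPI_i * F_i

print(IC, Ncyc, CPI)  # 1000 1700 1.7
```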

  23. CPU Time • The time required by the CPU to execute a given program: CPU Time = clock cycle × #cyc = clock cycle × CPI × IC • Our goal: minimize CPU Time • Minimize clock cycle: more MHz (process, circuit, µarch) • Minimize CPI: µarch (e.g.: more execution units) • Minimize IC: architecture (e.g.: MMX™ technology) • Speedup due to enhancement E: see Amdahl's Law (next slide)
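A quick numeric sketch of CPU Time = clock cycle × CPI × IC, with made-up values:

```python
# CPU time for a hypothetical program on a 1 GHz machine.
clock_rate = 1e9              # Hz, so the clock cycle is 1 ns
clock_cycle = 1.0 / clock_rate
CPI = 1.7                     # average cycles per instruction
IC = 2_000_000                # instructions executed

cpu_time = clock_cycle * CPI * IC
print(cpu_time)  # ~0.0034 seconds

# Halving CPI (e.g., a better micro-architecture) halves CPU time:
half_cpi_time = clock_cycle * (CPI / 2) * IC
print(half_cpi_time / cpu_time)  # ~0.5
```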

  24. Amdahl's Law Suppose that enhancement E accelerates a fraction F of the task by a factor S, and the remainder of the task is unaffected. Then:
  ExTime_new = ExTime_old × ((1 − Fraction_enhanced) + Fraction_enhanced / Speedup_enhanced)
  Speedup_overall = ExTime_old / ExTime_new = 1 / ((1 − Fraction_enhanced) + Fraction_enhanced / Speedup_enhanced)

  25. Amdahl's Law: Example • Floating point instructions are improved to run 2x faster, but only 10% of actual instructions are FP:
  ExTime_new = ExTime_old × (0.9 + 0.1/2) = 0.95 × ExTime_old
  Speedup_overall = 1 / 0.95 = 1.053
  Corollary: Make The Common Case Fast
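Amdahl's Law as a small function, reproducing the FP example above and showing the limit the law imposes:

```python
def amdahl_speedup(fraction_enhanced, speedup_enhanced):
    """Overall speedup when a fraction of execution time is accelerated."""
    return 1.0 / ((1.0 - fraction_enhanced) + fraction_enhanced / speedup_enhanced)

# 10% of execution is FP, and FP runs 2x faster:
print(round(amdahl_speedup(0.10, 2.0), 3))   # 1.053

# Even a near-infinite FP speedup is capped by the untouched 90%:
print(round(amdahl_speedup(0.10, 1e12), 3))  # 1.111
```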

  26. Amdahl's law: How speedup starts

  27. Amdahl's law: Where speedup ends

  28. Instruction Set Design The ISA is the interface between software and hardware: • The ISA is what the user and the compiler see • The ISA is what the hardware needs to implement

  29. Why is the ISA important? • Code size • Long instructions may take more time to be fetched • Requires larger memory (important in small devices, e.g., cell phones) • Number of instructions (IC) • Reducing IC reduces execution time (assuming the same CPI and frequency) • Code “simplicity” • Simple HW implementation, which leads to higher frequency and lower power • Code optimization can be better applied to “simple code”

  30. The impact of the ISA: RISC vs CISC

  31. CISC Processors • CISC – Complex Instruction Set Computer • The idea: a high level machine language • Characteristics: • Many instruction types, with many addressing modes • Some of the instructions are complex: • Perform complex tasks • Require many cycles • ALU operations directly on memory • Usually uses a limited number of registers • Variable length instructions • Common instructions get short codes → saves code length • Example: x86

  32. CISC Drawbacks • Compilers do not take advantage of the complex instructions and the complex indexing methods • Implementing complex instructions and complex addressing modes → complicates the processor → slows down the simple, common instructions → contradicts Amdahl's law corollary: Make The Common Case Fast • Variable length instructions are a real pain in the neck: • It is difficult to decode several instructions in parallel • As long as an instruction is not decoded, its length is unknown → it is unknown where the instruction ends → it is unknown where the next instruction starts • An instruction may not fit the “right behavior” of the memory hierarchy (will be discussed in the next lectures) • Examples: VAX, x86 (!?!)
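The parallel-decode problem can be seen in a toy sketch: with fixed-length instructions the boundaries are known up front, while a variable-length stream must be walked sequentially because each instruction must be decoded before the next one can even be located. The encoding below is invented for illustration, not any real ISA:

```python
# Toy byte streams: fixed 4-byte instructions vs. a variable-length encoding
# where the first byte's top 2 bits give the instruction length (invented scheme).

def fixed_boundaries(code, size=4):
    # All start offsets are known immediately -> decoders can work in parallel.
    return list(range(0, len(code), size))

def variable_boundaries(code):
    # Must walk sequentially: the next start depends on decoding this length.
    offsets, i = [], 0
    while i < len(code):
        offsets.append(i)
        length = (code[i] >> 6) + 1   # invented: top 2 bits encode length 1..4
        i += length
    return offsets

fixed = bytes(16)
print(fixed_boundaries(fixed))   # [0, 4, 8, 12]

var = bytes([0x00, 0x40, 0x00, 0xC0, 0, 0, 0])  # lengths 1, 2, 4
print(variable_boundaries(var))  # [0, 1, 3]
```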

  33. RISC Processors • RISC – Reduced Instruction Set Computer • The idea: simple instructions enable fast hardware • Characteristics: • A small instruction set, with only a few instruction formats • Simple instructions • Execute simple tasks • Require a single cycle (with pipeline) • A few indexing methods • ALU operations on registers only • Memory is accessed using Load and Store instructions only • Many orthogonal registers • Three address machine: Add dst, src1, src2 • Fixed length instructions • Examples: MIPS™, Sparc™, Alpha™, PowerPC™

  34. RISC Processors (Cont.) • Simple architecture → simple micro-architecture • Simple, small and fast control logic • Simpler to design and validate • Room for on-die caches: instruction cache + data cache • Parallelize data and instruction access • Shorter time-to-market • Using a smart compiler • Better pipeline usage • Better register allocation • Existing RISC processors are not “pure” RISC • e.g., they support division, which takes many cycles

  35. RISC and Amdahl's Law (Example) • Compared to the CISC architecture: • 10% of the static code, which executes 90% of the dynamic instructions, has the same CPI • 90% of the static code, which is only 10% of the dynamic instructions, has its CPI increased by 60% • The number of instructions being executed increases by 50% • The speed of the processor is doubled • This was true at the time the RISC processors were invented • We get … • And then …
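The slide's concluding equations ("We get … And then …") did not survive the transcript. Under one plausible reading – the new CPI weighted by dynamic frequency, IC grown by 50%, and the clock cycle halved – the arithmetic would look like this (a hedged reconstruction, not the slide's actual numbers):

```python
# Hypothetical reconstruction of the slide's RISC-vs-CISC speedup: the
# original equations are missing, so the weighting below is an assumption.
CPI_cisc, IC_cisc, cycle_cisc = 1.0, 1.0, 1.0   # normalized CISC baseline

# 90% of dynamic instructions keep the same CPI; 10% see CPI grow by 60%.
CPI_risc = 0.9 * CPI_cisc + 0.1 * (1.6 * CPI_cisc)   # = 1.06
IC_risc = 1.5 * IC_cisc                              # 50% more instructions
cycle_risc = cycle_cisc / 2                          # processor speed doubled

speedup = (IC_cisc * CPI_cisc * cycle_cisc) / (IC_risc * CPI_risc * cycle_risc)
print(round(speedup, 2))  # ~1.26: RISC wins despite executing more instructions
```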

  36. So, what is better: RISC or CISC? • Today CISC architectures (x86) run as fast as RISC (or even faster) • The main reasons: • CISC instructions are translated into RISC instructions (ucode) • CISC architectures use a “RISC like engine” • We will discuss these kinds of solutions later in this course.

  37. Virtual machines (JAVA) • Machine independent ISA • Can be run on different architectures • Each architecture has an emulation (virtual machine) that forms a “system within the system” • The code can be compiled to the native code “on the fly” • This process is called JIT: Just-In-Time • .NET allows combining different formats of code: • e.g., different programming languages • Pros • Portability, flexibility • Cons • Efficiency: the JIT can apply only very basic optimizations

  38. backup

  39. Integrated Circuits Costs
  IC cost = (Die cost + Testing cost + Packaging cost) / Final test yield
  Die cost = Wafer cost / (Dies per wafer × Die yield)
  Dies per wafer = π × (Wafer_diam / 2)² / Die_Area − π × Wafer_diam / √(2 × Die_Area) − Test dies
  Die yield = Wafer yield × {1 + (Defects_per_unit_area × Die_Area) / α}^(−α)   (α ≈ 3)
  Die cost goes roughly with (die area)⁴
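The cost model above can be coded directly; the wafer parameters below are illustrative, not taken from the slide, and α = 3 is assumed for the yield model:

```python
import math

def dies_per_wafer(wafer_diam_cm, die_area_cm2, test_dies=0):
    usable = math.pi * (wafer_diam_cm / 2) ** 2 / die_area_cm2
    edge_loss = math.pi * wafer_diam_cm / math.sqrt(2 * die_area_cm2)
    return usable - edge_loss - test_dies

def die_yield(wafer_yield, defects_per_cm2, die_area_cm2, alpha=3.0):
    return wafer_yield * (1 + defects_per_cm2 * die_area_cm2 / alpha) ** (-alpha)

def die_cost(wafer_cost, wafer_diam_cm, die_area_cm2, defects_per_cm2,
             wafer_yield=1.0):
    n = dies_per_wafer(wafer_diam_cm, die_area_cm2)
    y = die_yield(wafer_yield, defects_per_cm2, die_area_cm2)
    return wafer_cost / (n * y)

# Quadrupling the die area on a hypothetical $1500, 20 cm wafer with
# 1 defect/cm^2 raises the per-die cost far more than 4x:
small = die_cost(1500, 20, 0.5, 1.0)
large = die_cost(1500, 20, 2.0, 1.0)
print(large / small > 4)  # True: cost grows superlinearly in die area
```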

  40. Real World Examples

  Chip          Metal   Line    Wafer   Defect   Area    Dies/   Yield   Die
                layers  width   cost    /cm²     (mm²)   wafer           cost
  386DX         2       0.90    $900    1.0      43      360     71%     $4
  486DX2        3       0.80    $1200   1.0      81      181     54%     $12
  PowerPC 601   4       0.80    $1700   1.3      121     115     28%     $53
  HP PA 7100    3       0.80    $1300   1.0      196     66      27%     $73
  DEC Alpha     3       0.70    $1500   1.2      234     53      19%     $149
  SuperSPARC    3       0.70    $1700   1.6      256     48      13%     $272
  Pentium       3       0.80    $1500   1.5      296     40      9%      $417

  • From “Estimating IC Manufacturing Costs,” by Linley Gwennap, Microprocessor Report, August 2, 1993, p. 15
