CSC 2405 Computer Systems II

Presentation Transcript


  1. CSC 2405 Computer Systems II: Advanced Topics

  2. Instruction Set Architecture

  3. Instruction Set Architecture
     [Figure: abstraction layers, top to bottom: Application Program, Compiler, OS, ISA, CPU Design, Circuit Design, Chip Layout]
     • Assembly Language View
       • Processor state
         • Registers, memory, …
       • Instructions
         • addl, movl, leal, …
         • How instructions are encoded as bytes
     • Layer of Abstraction
       • Above: how to program machine
         • Processor executes instructions in a sequence
       • Below: what needs to be built
         • Use variety of tricks to make it run fast
         • E.g., execute multiple instructions simultaneously
     Chapter 4

  4. Instruction Set Architectures: Basic ISA Classes
     The results of the different address classes are easiest to see with the examples here, all of which implement the sequence for C = A + B. Register machines are the class that won out. In general, the more registers on the CPU, the better.

  5. 80x86 Instruction Frequency

  6. Relative Frequency of Control Instructions
     • Design hardware to handle branches quickly, since these are the most frequent control instructions

  7. CISC Instruction Sets
     • Complex Instruction Set Computer
     • Dominant style through the mid-1980s
     • Stack-oriented instruction set
       • Use stack to pass arguments, save program counter
       • Explicit push and pop instructions
     • Arithmetic instructions can access memory
       • addl %eax, 12(%ebx,%ecx,4)
         • Requires memory read and write
         • Complex address calculation
     • Condition codes
       • Set as side effect of arithmetic and logical instructions
     • Philosophy
       • Add instructions to perform “typical” programming tasks
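The “complex address calculation” behind an operand such as 12(%ebx,%ecx,4) can be sketched in a few lines. This is only an illustration: the function name and the register contents below are made up, and the operand form is read as displacement + base + index × scale.

```python
def effective_address(disp, base, index, scale):
    """Address named by an operand written disp(base,index,scale)."""
    if scale not in (1, 2, 4, 8):
        raise ValueError("scale must be 1, 2, 4, or 8")
    # Wrap to 32 bits, as 32-bit address arithmetic does.
    return (disp + base + index * scale) & 0xFFFFFFFF


# Hypothetical register contents, chosen only for illustration.
ebx, ecx = 0x1000, 3
print(hex(effective_address(12, ebx, ecx, 4)))  # 0x1018
```

The addl on the slide would then read the word at that address, add %eax to it, and write the sum back, which is why it needs both a memory read and a memory write.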

  8. RISC Instruction Sets
     • Reduced Instruction Set Computer
       • Internal project at IBM, later popularized by Hennessy (Stanford) and Patterson (Berkeley)
     • Fewer, simpler instructions
       • Might take more to get given task done
       • Can execute them with small and fast hardware
     • Register-oriented instruction set
       • Many more (typically 32) registers
       • Use for arguments, return pointer, temporaries
     • Only load and store instructions can access memory
       • Similar to Y86 mrmovl and rmmovl
     • No condition codes
       • Test instructions return 0/1 in register

  9. Example RISC Instruction Formats
     • Register-Register (R-type): ADD R1, R2, R3
       • Fields: Op (bits 31-26), rs1 (25-21), rs2 (20-16), rd (15-11), func (10-0; the figure also marks a boundary between bits 6 and 5)
       • ALU reg. operations, read/write special registers and moves
     • Register-Immediate (I-type): SUB R1, R2, #3
       • Fields: Op (bits 31-26), rs1 (25-21), rd (20-16), immediate (15-0)
       • ALU imm. operations, loads and stores, conditional branch, jump (and link)
     • Jump / Call (J-type): JUMP end
       • Fields: Op (bits 31-26), offset added to PC (25-0)
       • Jump, jump and link, trap and return from exception
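As a rough sketch of how the R-type fields pack into one 32-bit word, assuming the layout Op 31-26, rs1 25-21, rs2 20-16, rd 15-11, func 10-0. The opcode and func values used for ADD are placeholders, not taken from any real ISA, and the first operand of ADD R1, R2, R3 is assumed to be the destination.

```python
FIELDS = (("op", 26, 6), ("rs1", 21, 5), ("rs2", 16, 5), ("rd", 11, 5), ("func", 0, 11))


def encode_rtype(**values):
    """Pack op, rs1, rs2, rd, func into a 32-bit word."""
    word = 0
    for name, shift, width in FIELDS:
        value = values[name]
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name} does not fit in {width} bits")
        word |= value << shift
    return word


def decode_rtype(word):
    """Split a 32-bit word back into its R-type fields."""
    return {name: (word >> shift) & ((1 << width) - 1) for name, shift, width in FIELDS}


# ADD R1, R2, R3 with placeholder op=0 and func=0x20 (assumed values).
word = encode_rtype(op=0, rs1=2, rs2=3, rd=1, func=0x20)
print(hex(word))  # 0x430820
```

Because every field sits at a fixed position, decoding is a shift and a mask per field, which is part of what makes such formats cheap to decode in hardware.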

  10. CISC vs. RISC
      • Original Debate
        • Strong opinions!
        • CISC proponents: easy for compiler, fewer code bytes
        • RISC proponents: better for optimizing compilers, can make run fast with simple chip design
      • Current Status
        • For desktop processors, choice of ISA not a technical issue
          • With enough hardware, can make anything run fast
          • Code compatibility more important
        • For embedded processors, RISC makes sense
          • Smaller, cheaper, less power

  11. Logic Design

  12. Overview of Logic Design
      • Fundamental Hardware Requirements
        • Communication
          • How to get values from one place to another
        • Computation
        • Storage
      • Bits are Our Friends
        • Everything expressed in terms of values 0 and 1
        • Communication
          • Low or high voltage on wire
        • Computation
          • Compute Boolean functions
        • Storage
          • Store bits of information

  13. Digital Signals
      [Figure: voltage vs. time waveform read as 0, 1, 0]
      • Use voltage thresholds to extract discrete values from continuous signal
      • Simplest version: 1-bit signal
        • Either high range (1) or low range (0)
        • With guard range between them
      • Not strongly affected by noise or low-quality circuit elements
        • Can make circuits simple, small, and fast

  14. Computing with Logic Gates
      [Figure: gate computing a && b, with voltage vs. time traces for a and b and the output, marking the rising delay and falling delay]
      • Outputs are Boolean functions of inputs
      • Respond continuously to changes in inputs
        • With some small delay

  15. Combinational Circuits
      [Figure: acyclic network between primary inputs and primary outputs]
      • Acyclic Network of Logic Gates
        • Continuously responds to changes on primary inputs
        • Primary outputs become (after some delay) Boolean functions of primary inputs

  16. Bit Equality
      [Figure: “Bit equal” block with inputs a, b and output eq]
      • Generate 1 if a and b are equal
      • Hardware Control Language (HCL)
        • Very simple hardware description language
          • Boolean operations have syntax similar to C logical operations
        • We’ll use it to describe control logic for processors
      • HCL Expression
          bool eq = (a&&b)||(!a&&!b)
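The HCL expression translates almost symbol for symbol into ordinary Boolean code. A minimal Python rendering, with a function name chosen here for illustration:

```python
def bit_eq(a, b):
    """Mirror of the HCL expression (a && b) || (!a && !b)."""
    return bool((a and b) or (not a and not b))


# Exhaustive truth table: eq is 1 exactly when the two bits match.
for a in (0, 1):
    for b in (0, 1):
        print(a, b, int(bit_eq(a, b)))
```

The output column is 1 for the rows 0 0 and 1 1 and 0 for the mixed rows, which is the complement of XOR.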

  17. Word Equality
      [Figure: 32 bit-equal blocks comparing a31..a0 with b31..b0 to produce eq31..eq0, combined into Eq; word-level symbol with inputs A, B and output Eq]
      • Word-Level Representation
        • 32-bit word size
        • HCL representation
          • Equality operation
          • Generates Boolean value
      • HCL Representation
          bool Eq = (A == B)
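One way to read the word-equality diagram is as 32 bit-equal circuits whose outputs are ANDed together. The sketch below follows that reading; the function name and the width parameter are illustrative.

```python
def word_eq(a, b, width=32):
    """Eq is 1 only if every bit position compares equal."""
    result = True
    for i in range(width):
        a_i = (a >> i) & 1
        b_i = (b >> i) & 1
        # Same per-bit expression as the HCL bit-equality circuit.
        eq_i = bool((a_i and b_i) or (not a_i and not b_i))
        result = result and eq_i
    return result


print(word_eq(0xDEADBEEF, 0xDEADBEEF))  # True
print(word_eq(0xDEADBEEF, 0xDEADBEEE))  # False
```

At the word level, HCL hides all of this behind the single expression A == B.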

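  18. 1-Bit Latch: D Latch
      [Figure: D latch with inputs D (data) and C (clock), internal R and S signals, and outputs Q+ and Q–, shown in two modes: latching (clock 1, output follows d) and storing (clock 0, output holds q)]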

  19. Registers
      [Figure: eight D latches sharing one clock, with inputs i7..i0 and outputs o7..o0; word-level symbol with input I, output O, and Clock]
      • Structure
        • Stores word of data
          • Different from program registers seen in assembly code
        • Collection of edge-triggered latches
        • Loads input on rising edge of clock
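The difference between a level-sensitive latch and a register that loads only on the rising clock edge can be shown with a small behavioral model. This is a software sketch, not a gate-level description, and the class names are invented for the example.

```python
class DLatch:
    """Level-sensitive: the output follows d while the clock is high."""

    def __init__(self):
        self.q = 0

    def update(self, d, clock):
        if clock:
            self.q = d
        return self.q


class EdgeTriggeredRegister:
    """Loads its input only on a 0 -> 1 clock transition."""

    def __init__(self):
        self.q = 0
        self._prev_clock = 0

    def update(self, d, clock):
        if clock and not self._prev_clock:
            self.q = d
        self._prev_clock = clock
        return self.q


# (data, clock) samples: the input changes while the clock stays high.
samples = [(5, 1), (7, 1), (9, 0), (3, 1)]
latch, reg = DLatch(), EdgeTriggeredRegister()
print([latch.update(d, c) for d, c in samples])  # [5, 7, 7, 3]
print([reg.update(d, c) for d, c in samples])    # [5, 5, 5, 3]
```

The second sample is where the two differ: the latch passes the new value through because the clock is still high, while the register ignores it until the next rising edge.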

  20. Random-Access Memory
      [Figure: register file with read ports A (srcA, valA) and B (srcB, valB), write port W (dstW, valW), and a Clock input]
      • Stores multiple words of memory
        • Address input specifies which word to read or write
      • Register file
        • Holds values of program registers
          • %eax, %esp, etc.
        • Register identifier serves as address
          • ID 8 implies no read or write performed
      • Multiple Ports
        • Can read and/or write multiple words in one cycle
          • Each has separate address and data input/output

  21. Basic Logic Gates
      [Figure: basic gate symbols]
      NOTE: okay to use just a circle for NOT.

  22. More than 2 Inputs?
      • AND/OR can take any number of inputs.
        • AND = 1 if all inputs are 1.
        • OR = 1 if any input is 1.
        • Similar for NAND/NOR.
      • Can implement with multiple two-input gates

  23. Logical Completeness
      • Can implement ANY truth table with AND, OR, NOT.
        1. AND combinations that yield a "1" in the truth table.
        2. OR the results of the AND gates.
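The two-step recipe (one AND term per truth-table row that yields 1, then an OR over those terms) is the sum-of-products construction. A small sketch, with the helper name and the XOR example chosen here for illustration:

```python
def sum_of_products(truth_table):
    """Build a function from a truth table using only AND, OR, NOT.

    truth_table maps tuples of 0/1 inputs to a 0/1 output.
    """
    # Step 1: keep the input rows that yield a 1.
    one_rows = [row for row, out in truth_table.items() if out == 1]

    def circuit(*inputs):
        # Each AND term matches exactly one row; inputs that must be 0 are inverted.
        and_terms = (
            all((x if want else not x) for x, want in zip(inputs, row))
            for row in one_rows
        )
        # Step 2: OR the results of the AND terms.
        return int(any(and_terms))

    return circuit


xor_table = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}
xor = sum_of_products(xor_table)
print([xor(a, b) for a, b in xor_table])  # [0, 1, 1, 0]
```

The construction is not minimal (it uses one AND term per 1-row), but it works for any truth table, which is the point of the completeness claim.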

  24. DeMorgan's Law
      • Converting AND to OR (with some help from NOT)
      • Consider the following gate: [Figure: gate diagram]
      • To convert AND to OR (or vice versa), invert inputs and output.
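The rule “invert inputs and output” can be checked exhaustively, since two inputs give only four cases. The function names below are illustrative.

```python
from itertools import product


def and_from_or(a, b):
    """AND built from an OR gate with inverted inputs and inverted output."""
    return not ((not a) or (not b))


def or_from_and(a, b):
    """OR built from an AND gate with inverted inputs and inverted output."""
    return not ((not a) and (not b))


for a, b in product([False, True], repeat=2):
    assert and_from_or(a, b) == (a and b)
    assert or_from_and(a, b) == (a or b)
print("DeMorgan's law holds for all four input combinations")
```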

  25. Decoder
      • n inputs, 2^n outputs
        • Exactly one output is 1 for each possible input pattern
      [Figure: 2-bit decoder]
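A behavioral sketch of an n-input decoder, assuming the input bits are given most significant first and output i corresponds to the input pattern whose binary value is i (the ordering is an assumption; the slide does not fix one):

```python
def decoder(bits):
    """Return 2**n outputs for n input bits; exactly one output is 1."""
    selected = 0
    for bit in bits:
        selected = (selected << 1) | bit
    return [1 if i == selected else 0 for i in range(2 ** len(bits))]


# 2-bit decoder: four outputs, one per input pattern.
for pattern in ([0, 0], [0, 1], [1, 0], [1, 1]):
    print(pattern, decoder(pattern))
```

For the pattern [1, 0] this yields [0, 0, 1, 0]: only the output for value 2 is asserted.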

  26. Sequential Processors

  27. Sequential HW Structure
      [Figure: sequential hardware structure with stages Fetch (PC, instruction memory, PC increment; icode, ifun, rA, rB, valC, valP), Decode (register file; srcA, srcB, dstA, dstB; valA, valB), Execute (ALU, CC; aluA, aluB; valE, Bch), Memory (data memory; Addr, Data; valM), Write back (valE, valM), and PC (newPC)]
      • State
        • Program counter register (PC)
        • Condition code register (CC)
        • Register File
        • Memories
          • Access same memory space
          • Data: for reading/writing program data
          • Instruction: for reading instructions
      • Instruction Flow
        • Read instruction at address specified by PC
        • Process through stages
        • Update program counter

  28. Sequential Stages
      [Figure: same hardware structure diagram as on the previous slide]
      • Fetch
        • Read instruction from instruction memory
      • Decode
        • Read program registers
      • Execute
        • Compute value or address
      • Memory
        • Read or write data
      • Write Back
        • Write program registers
      • PC
        • Update program counter

  29. Instruction Decoding
      [Figure: instruction bytes “5 0 | rA rB | D” split into icode, ifun, rA, rB, and valC, with the register byte and constant word marked optional]
      • Instruction Format
        • Instruction byte icode:ifun
        • Optional register byte rA:rB
        • Optional constant word valC
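Splitting an instruction into these fields might look like the sketch below. Whether a given icode carries a register byte or a constant word depends on the instruction, so the caller passes that in here rather than the code guessing it; the function name, the 4-byte little-endian constant, and the sample bytes are assumptions for illustration.

```python
def split_instruction(data, has_regs, has_valc):
    """Split raw instruction bytes into icode, ifun, rA, rB, valC."""
    icode, ifun = data[0] >> 4, data[0] & 0xF
    pos = 1
    ra = rb = valc = None
    if has_regs:
        # Register byte: high nibble is rA, low nibble is rB.
        ra, rb = data[pos] >> 4, data[pos] & 0xF
        pos += 1
    if has_valc:
        # Constant word, assumed to be 4 bytes stored little-endian.
        valc = int.from_bytes(data[pos:pos + 4], "little")
        pos += 4
    return {"icode": icode, "ifun": ifun, "rA": ra, "rB": rb, "valC": valc, "length": pos}


# Hypothetical 6-byte instruction: instruction byte, register byte, constant word.
fields = split_instruction(bytes.fromhex("308300010000"), has_regs=True, has_valc=True)
print(fields)
```

For these bytes the result has icode 3, ifun 0, rA 8, rB 3, valC 0x100, and a length of 6. The length is what the fetch stage would add to the PC to find the next instruction.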

  30. Sequential Summary
      • Implementation
        • Express every instruction as series of simple steps
        • Follow same general flow for each instruction type
        • Assemble registers, memories, predesigned combinational blocks
        • Connect with control logic
      • Limitations
        • Too slow to be practical
        • In one cycle, must propagate through instruction memory, register file, ALU, and data memory
        • Would need to run clock very slowly
        • Hardware units only active for fraction of clock cycle

  31. Pipelined Processors

  32. What Is Pipelining
      • Computers execute billions of instructions, so instruction throughput is what matters
      • IDEA: Divide instruction execution up into several pipeline stages, for example IF, ID, EX, MEM, WB
      • Simultaneously have different instructions in different pipeline stages
      • The length of the longest pipeline stage determines the cycle time
      • Desirable pipeline features (e.g., RISC):
        • All instructions same length
        • Registers located in same place in instruction format
        • Memory operands only in loads or stores

  33. What Is Pipelining: Laundry Example
      • Ann, Brian, Cathy, Dave (A, B, C, D) each have one load of clothes to wash, dry, and fold
      • Washer takes 30 minutes
      • Dryer takes 40 minutes
      • “Folder” takes 20 minutes

  34. What Is Pipelining
      [Figure: timeline from 6 PM to midnight; loads A, B, C, D in task order, each taking 30 + 40 + 20 minutes, one after another]
      • Sequential laundry takes 6 hours for 4 loads
      • If they learned pipelining, how long would laundry take?

  35. What Is Pipelining: Start Work ASAP
      [Figure: timeline from 6 PM to midnight; loads A, B, C, D overlapped, with segments of 30, 40, 40, 40, 40, 20 minutes]
      • Pipelined laundry takes 3.5 hours for 4 loads

  36. What Is Pipelining: Pipelining Lessons
      [Figure: pipelined laundry timeline from 6 PM to 9 PM; loads A, B, C, D, with segments of 30, 40, 40, 40, 40, 20 minutes]
      • Pipelining doesn’t help latency of single task, it helps throughput of entire workload
      • Pipeline rate limited by slowest pipeline stage
      • Multiple tasks operating simultaneously
      • Potential speedup = number of pipe stages
      • Unbalanced lengths of pipe stages reduce speedup
      • Time to “fill” pipeline and time to “drain” it reduce speedup
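The laundry numbers on the preceding slides (6 hours sequential, 3.5 hours pipelined) can be reproduced with a small timing model. The model assumes each machine handles one load at a time, loads go through in order, and a finished load can wait between machines; the function names are illustrative.

```python
def sequential_time(stages, loads):
    """Each load finishes every stage before the next load starts."""
    return loads * sum(stages)


def pipelined_time(stages, loads):
    """Each load starts a stage as soon as the load and the stage are both ready."""
    stage_free_at = [0] * len(stages)
    for _ in range(loads):
        t = 0  # time at which this load is ready for its next stage
        for i, duration in enumerate(stages):
            start = max(t, stage_free_at[i])
            t = start + duration
            stage_free_at[i] = t
    return stage_free_at[-1]


stages = [30, 40, 20]  # washer, dryer, folder (minutes)
seq = sequential_time(stages, 4)
pipe = pipelined_time(stages, 4)
print(seq / 60, pipe / 60)   # 6.0 3.5
print(round(seq / pipe, 2))  # 1.71
```

The measured speedup of about 1.71 is well below the potential speedup of 3 (the number of stages), which illustrates two of the lessons above: the 40-minute dryer limits the rate, and with only four loads the fill and drain time is a large share of the total.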

  37. Real-World Pipelines: Car Washes
      [Figure: three car wash arrangements labeled Sequential, Parallel, and Pipelined]
      • Idea
        • Divide process into independent stages
        • Move objects through stages in sequence
        • At any given time, multiple objects being processed

  38. Pipeline Diagrams
      [Figure: timing diagrams for OP1, OP2, OP3 passing through stages A, B, C, unpipelined and 3-way pipelined]
      • Unpipelined
        • Cannot start new operation until previous one completes
      • 3-Way Pipelined
        • Up to 3 operations in process simultaneously

  39. Data Dependencies
      [Figure: combinational logic feeding a clocked register whose output feeds back to the logic; timing of OP1, OP2, OP3]
      • System
        • Each operation depends on result from preceding one

  40. Data Hazards
      [Figure: three-stage pipeline (Comb. logic A, B, C, each followed by a register) with feedback; timing of OP1 to OP4 through stages A, B, C]
      • Result does not feed back around in time for next operation
      • Pipelining has changed behavior of system

  41. One Memory Port/Structural Hazards
      [Figure: pipeline diagram over clock cycles 1 to 7; Load, Instr 1, Instr 2, Instr 3, Instr 4 in instruction order, each passing through Ifetch, Reg, ALU, DMem, Reg]

  42. One Memory Port/Structural Hazards
      [Figure: pipeline diagram over clock cycles 1 to 7; Load, Instr 1, Instr 2, Stall, Instr 3 in instruction order, with the stall shown as a row of bubbles]
      • How do you “bubble” the pipe?

  43. Data Hazard on R1
      [Figure: pipeline diagram over clock cycles with stages IF, ID/RF, EX, MEM, WB for the instruction sequence below]
          add r1,r2,r3
          sub r4,r1,r3
          and r6,r1,r7
          or r8,r1,r9
          xor r10,r1,r11

  44. Three Generic Data Hazards
      • Read After Write (RAW): Instr J tries to read operand before Instr I writes it
          I: add r1,r2,r3
          J: sub r4,r1,r3
      • Caused by a “Dependence” (in compiler nomenclature). This hazard results from an actual need for communication.

  45. Three Generic Data Hazards
      • Write After Read (WAR): Instr J writes operand before Instr I reads it
          I: sub r4,r1,r3
          J: add r1,r2,r3
          K: mul r6,r1,r7
      • Called an “anti-dependence” by compiler writers. This results from reuse of the name “r1”.

  46. Three Generic Data Hazards
      • Write After Write (WAW): Instr J writes operand before Instr I writes it
          I: sub r1,r4,r3
          J: add r1,r2,r3
          K: mul r6,r1,r7
      • Called an “output dependence” by compiler writers. This also results from the reuse of name “r1”.
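The three cases can be told apart mechanically by comparing the destination and source registers of an earlier instruction I and a later instruction J. The sketch below does that for three-operand instructions written as (dest, src1, src2); it reports register dependences only, and whether a dependence becomes an actual hazard depends on the pipeline. The function name is illustrative.

```python
def dependences(instr_i, instr_j):
    """Classify register dependences of later instruction J on earlier instruction I."""
    dest_i, *srcs_i = instr_i
    dest_j, *srcs_j = instr_j
    found = set()
    if dest_i in srcs_j:
        found.add("RAW")  # J reads what I writes
    if dest_j in srcs_i:
        found.add("WAR")  # J writes what I reads
    if dest_i == dest_j:
        found.add("WAW")  # J writes what I writes
    return found


# The I/J pairs from the three slides, as (dest, src1, src2).
print(dependences(("r1", "r2", "r3"), ("r4", "r1", "r3")))  # {'RAW'}
print(dependences(("r4", "r1", "r3"), ("r1", "r2", "r3")))  # {'WAR'}
print(dependences(("r1", "r4", "r3"), ("r1", "r2", "r3")))  # {'WAW'}
```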

  47. Data Forwarding
      • Naïve Pipeline
        • Register isn’t written until completion of write-back stage
        • Source operands read from register file in decode stage
          • Needs to be in register file at start of stage
      • Observation
        • Value generated in execute or memory stage
      • Trick
        • Pass value directly from generating instruction to decode stage
        • Needs to be available at end of decode stage

  48. Forwarding to Avoid Data Hazard
      [Figure: pipeline diagram over clock cycles (Ifetch, Reg, ALU, DMem, Reg) for the instruction sequence below, with r1 forwarded to the later instructions]
          add r1,r2,r3
          sub r4,r1,r3
          and r6,r1,r7
          or r8,r1,r9
          xor r10,r1,r11
