Midterm Review 2
CprE 381 Computer Organization and Assembly Level Programming, Fall 2013. Midterm Review 2. Dr. Zhao Zhang Iowa State University. Announcement. No quiz today No homework this Friday Exam on Monday 9:00-9:50 HW9 deadline extended to next Friday HW8 solutions will be posted today.
CprE 381 Computer Organization and Assembly Level Programming, Fall 2013 Midterm Review 2 Dr. Zhao Zhang Iowa State University
Announcement • No quiz today • No homework this Friday • Exam on Monday 9:00-9:50 • HW9 deadline extended to next Friday • HW8 solutions will be posted today Chapter 1 — Computer Abstractions and Technology — 2
Exam 2 Coverage
• Coverage: Ch. 4, The Processor
  • Datapath and control
  • Simple MIPS pipeline
  • Data hazards and forwarding
  • Load-use hazard and pipeline stall
  • Control hazards
• Arithmetic will NOT be covered
  • It will be covered in the final exam
  • The final exam is comprehensive
Question Styles and Coverage
• Short answer
• True/False or multiple choice
• Design and analysis
  • Signal values in the datapath and control
  • Identify the critical path
  • Support a new MIPS instruction
• Performance analysis and optimization
  • Identify pipeline bubbles in program execution
  • Reorder instructions to improve performance
• And others
Nine-Instruction MIPS
• These nine instructions are enough to illustrate most aspects of CPU design, particularly datapath and control design
• Some questions will use this subset as the baseline design
Memory reference: LW and SW
Arithmetic/logic: ADD, SUB, AND, OR, SLT
Branch and jump: BEQ, J
Datapath With Jumps Added Chapter 4 — The Processor — 6
The Control
• Control signals for the nine-instruction implementation
Note: “R-” means R-format
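The control table on this slide did not survive in the transcript. As a rough stand-in, the sketch below encodes the main control values from the textbook's single-cycle design for the five instruction classes. The names `SIGNALS`, `CONTROL`, and `control` are illustrative, `None` stands for a don't-care (X), and the `Branch` value of 0 for `j` is an assumption chosen as a safe setting; check all values against the course slides.

```python
# Main control truth table for the nine-instruction subset (textbook-style values).
# None marks a don't-care (X) entry; names here are illustrative, not from the slides.
SIGNALS = ("RegDst", "ALUSrc", "MemtoReg", "RegWrite",
           "MemRead", "MemWrite", "Branch", "ALUOp", "Jump")

CONTROL = {
    #       RegDst ALUSrc MemtoReg RegWrite MemRead MemWrite Branch ALUOp Jump
    "R-":  (1,     0,     0,       1,       0,      0,       0,     0b10, 0),
    "lw":  (0,     1,     1,       1,       1,      0,       0,     0b00, 0),
    "sw":  (None,  1,     None,    0,       0,      1,       0,     0b00, 0),
    "beq": (None,  0,     None,    0,       0,      0,       1,     0b01, 0),
    # Branch=0 for j is an assumed safe value; the jump mux overrides the branch mux.
    "j":   (None,  None,  None,    0,       0,      0,       0,     None, 1),
}


def control(instr_class):
    """Return the control signals for one instruction class as a dict."""
    return dict(zip(SIGNALS, CONTROL[instr_class]))
```

Note that every signal that changes machine state (RegWrite, MemWrite) is a definite 0 or 1 in every row; only the mux selects and ALUOp may be don't-cares.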
ALU Control
• Truth table for ALU Control
• Extend it as a secondary control unit in projects B & C, with more control signal outputs
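The truth table itself is not in the transcript. The sketch below assumes the textbook's encoding (2-bit ALUOp from the main control, 4-bit ALU control output) and the standard MIPS funct codes; the constant and function names are illustrative.

```python
# ALU control as a function of ALUOp and the funct field (textbook-style encoding).
ALU_AND, ALU_OR, ALU_ADD, ALU_SUB, ALU_SLT = 0b0000, 0b0001, 0b0010, 0b0110, 0b0111

# Standard MIPS funct codes for the five R-format instructions in the subset.
FUNCT_TO_ALU = {
    0b100000: ALU_ADD,  # add
    0b100010: ALU_SUB,  # sub
    0b100100: ALU_AND,  # and
    0b100101: ALU_OR,   # or
    0b101010: ALU_SLT,  # slt
}


def alu_control(alu_op, funct):
    """Return the 4-bit ALU control value for the given ALUOp and funct."""
    if alu_op == 0b00:          # lw / sw: add base and offset
        return ALU_ADD
    if alu_op == 0b01:          # beq: subtract and test for zero
        return ALU_SUB
    if alu_op == 0b10:          # R-format: decode the funct field
        return FUNCT_TO_ALU[funct & 0x3F]
    raise ValueError("ALUOp value not used in the nine-instruction subset")
```

Supporting a new R-format instruction then means adding one entry to the funct mapping, which mirrors "extend the truth table".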
Extend the Single-Cycle Processor
For each new instruction, do we need
• Any new or revised datapath element(s)?
• Any new control signal(s)?
Then revise, if necessary,
• Datapath: Add new elements or revise existing ones, add new connections
• Control Unit: Add/extend control signals, extend the truth table
• ALU Control: Extend the truth table
Support JAL
jal target
Encoding: bits 31:26 = opcode 000011, bits 25:0 = address
PC_plus_4 = PC + 4
JumpAddr = PC_plus_4[31:28] & Inst[25:0] & “00” (here & denotes bit concatenation)
R[31] = PC_plus_4
PC = JumpAddr
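The register-transfer description of jal can be checked with a few lines of bit arithmetic. This is a minimal sketch; the function name `jal` and the example address and encoding are made up for illustration.

```python
def jal(pc, inst):
    """Return (new_pc, link_value) for a jal instruction fetched at address pc."""
    pc_plus_4 = (pc + 4) & 0xFFFFFFFF
    address26 = inst & 0x03FFFFFF                 # Inst[25:0]
    # PC_plus_4[31:28] concatenated with Inst[25:0] and two zero bits
    jump_addr = (pc_plus_4 & 0xF0000000) | (address26 << 2)
    return jump_addr, pc_plus_4


# Hypothetical example: jal at 0x00400018 with address field 0x0100000
new_pc, ra = jal(0x00400018, (0b000011 << 26) | 0x0100000)
```

Here `new_pc` is the value written to PC and `ra` is the value written to R[31].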
Support JAL
What changes must be made to the datapath?
Support JAL
• Analyze the instruction execution
  • Writes register $ra ($31)
  • Updates PC with the jump target
    • This part is already done for supporting J
• Analyze the datapath
  • Needs another input, fixed at 31, to the “Write register” port of the register file
  • Needs another input, PC+4, to the “Write data” port of the register file
• Revise the control
  • Add a “link” signal
  • The (main) control unit can generate it by decoding the opcode
SCPv1 + JAL
• Revise the two muxes
  • Add another input
  • Extend the select signals
• Alternatively, use an extra mux
Control Signals
• Control signals for the nine-instruction implementation
• Add a new row for jal
• Extend RegDst
• Add a control line, link
Control Signals
• Control signals for the nine-instruction implementation
• Extend control input to RegDst Mux: RegDst & Link
• Extend control input to MemtoReg Mux: MemtoReg & Link
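The two extended muxes can be modeled as follows. This sketch assumes that Link = 1 selects the new input regardless of the other select bit, since RegDst and MemtoReg are don't-cares for jal; the slides do not spell out the exact select encoding, and the function names are illustrative.

```python
def write_register_mux(reg_dst, link, rt, rd):
    """Select the register number fed to the "Write register" port."""
    if link:                 # jal: always write $ra
        return 31
    return rd if reg_dst else rt


def write_data_mux(mem_to_reg, link, alu_result, mem_data, pc_plus_4):
    """Select the value fed to the "Write data" port."""
    if link:                 # jal: write the return address
        return pc_plus_4
    return mem_data if mem_to_reg else alu_result
```

With Link = 0 both functions behave exactly like the original two-input muxes, so the existing nine instructions are unaffected.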
Simple Pipeline
• Add pipeline registers to hold information produced in each cycle
Pipelined Control
Hazards
• Situations that prevent starting the next instruction safely in the next cycle
  • Without handling them, the simple pipeline won’t work correctly
• Structural hazard
  • A required resource is busy
• Data hazard
  • Need to wait for a previous instruction to complete its data read/write
• Control hazard
  • Deciding on the control action depends on a previous instruction
Data Hazards
Program with data dependence
    sub  $2, $1, $3
    and  $12, $2, $5
    or   $13, $6, $2
    add  $14, $2, $2
    sw   $15, 100($2)
Program with control dependence
    beq  $1, $3, +4
    addi $2, $2, 1
    addi $4, $4, 1
Data Forwarding
    sub  $2, $1, $3
    and  $12, $2, $5      # MEM => EX forwarding
    or   $13, $6, $2      # WB => EX forwarding
    add  $14, $2, $2
    sw   $15, 100($2)
Pipeline contents in three consecutive cycles:
    IF    ID    EX    MEM   WB
    or    and   sub   …     …
    add   or    and   sub   …      AND gets the forwarded new $2 value (from sub in MEM)
    sw    add   or    and   sub    OR gets the forwarded new $2 value (from sub in WB)
Data Forwarding Paths
Detecting the Need to Forward
• Input
  • rs and rt from EX
  • rd and RegWrite from MEM
  • rd and RegWrite from WB
• Output
  • FwdA, FwdB
• Caveats
  • Check RegWrite
  • Check whether rd = 0 (never forward a value for register $zero)
  • Forwarding from MEM wins over WB
Review slides and textbook for details
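The inputs, outputs, and caveats above translate almost directly into code. The sketch below assumes the textbook's select encoding (10 = forward from MEM, 01 = forward from WB, 00 = use the register file value); the function names are illustrative.

```python
def forward_select(src, mem_rd, mem_reg_write, wb_rd, wb_reg_write):
    """Forwarding select for one ALU operand whose source register is src."""
    # MEM is checked first, so the most recent result wins over WB.
    if mem_reg_write and mem_rd != 0 and mem_rd == src:
        return 0b10
    if wb_reg_write and wb_rd != 0 and wb_rd == src:
        return 0b01
    return 0b00


def forwarding_unit(ex_rs, ex_rt, mem_rd, mem_reg_write, wb_rd, wb_reg_write):
    """Return (FwdA, FwdB) for the instruction currently in EX."""
    fwd_a = forward_select(ex_rs, mem_rd, mem_reg_write, wb_rd, wb_reg_write)
    fwd_b = forward_select(ex_rt, mem_rd, mem_reg_write, wb_rd, wb_reg_write)
    return fwd_a, fwd_b
```

For example, with `and $12,$2,$5` in EX and `sub $2,$1,$3` in MEM, the call `forwarding_unit(2, 5, 2, True, 0, False)` selects MEM forwarding for operand A only.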
Load-Use Data Hazard
    lw   $s0, 20($t1)
    sub  $t2, $s0, $t3
• Can’t always avoid stalls by forwarding: the loaded value is not available until the end of lw’s MEM stage, which is too late for sub’s EX stage in the same cycle
• Must stall the pipeline by one cycle
Datapath with Hazard Detection
Hazard Detection Unit
• Input
  • rs and rt from ID
  • rt and MemRead from EX
• Output
  • PCWrite, IF/IDWrite (0 to hold the current instructions)
  • Select signal to a MUX to insert a bubble in EX
Read slides/textbook for details
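The load-use check can be sketched as below, following the textbook condition (the instruction in EX is a load, and its destination rt matches a source register of the instruction in ID). The output key `InsertBubble` is a made-up name for the mux select signal, and `IFIDWrite` stands for IF/IDWrite.

```python
def hazard_detection(id_rs, id_rt, ex_rt, ex_mem_read):
    """Return the stall controls for the instruction pair in ID and EX."""
    stall = bool(ex_mem_read) and ex_rt in (id_rs, id_rt)
    return {
        "PCWrite": 0 if stall else 1,       # 0 holds the PC
        "IFIDWrite": 0 if stall else 1,     # 0 holds the IF/ID register
        "InsertBubble": 1 if stall else 0,  # 1 zeroes the control signals entering EX
    }
```

For the pair `lw $s0, 20($t1)` followed by `sub $t2, $s0, $t3`, the registers are $s0 = 16 and $t3 = 11, so `hazard_detection(16, 11, 16, True)` requests a one-cycle stall.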
Pipeline Stall
• The nop has all control signals set to zero
  • It does nothing at EX, MEM and WB
• Prevent update of PC and IF/ID register
  • The using instruction is decoded again (OK)
  • The following instruction is fetched again (OK)
• 1-cycle stall allows MEM to read data for lw
  • Can subsequently forward from WB to EX
Code Scheduling to Avoid Stalls
• Reorder code to avoid use of a load result in the next instruction
• C code for A = B + E; C = B + F;
Original order (13 cycles):
    lw   $t1, 0($t0)
    lw   $t2, 4($t0)
    (stall)
    add  $t3, $t1, $t2
    sw   $t3, 12($t0)
    lw   $t4, 8($t0)
    (stall)
    add  $t5, $t1, $t4
    sw   $t5, 16($t0)
Reordered (11 cycles):
    lw   $t1, 0($t0)
    lw   $t2, 4($t0)
    lw   $t4, 8($t0)
    add  $t3, $t1, $t2
    sw   $t3, 12($t0)
    add  $t5, $t1, $t4
    sw   $t5, 16($t0)
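The 13-cycle and 11-cycle figures can be reproduced with a small counter. This sketch assumes a 5-stage pipeline with full forwarding, where the only stalls are one-cycle load-use stalls between adjacent instructions; the tuple format and the names `count_cycles`, `ORIGINAL`, and `REORDERED` are illustrative.

```python
def count_cycles(program, stages=5):
    """Cycles to run a straight-line program; each entry is (op, dest, sources)."""
    stalls = 0
    for prev, curr in zip(program, program[1:]):
        prev_op, prev_dest, _ = prev
        _, _, curr_sources = curr
        # Load-use hazard: a lw result is needed by the very next instruction.
        if prev_op == "lw" and prev_dest in curr_sources:
            stalls += 1
    # The first instruction takes `stages` cycles; each later one adds a cycle.
    return len(program) + (stages - 1) + stalls


ORIGINAL = [
    ("lw", "$t1", ["$t0"]),
    ("lw", "$t2", ["$t0"]),
    ("add", "$t3", ["$t1", "$t2"]),
    ("sw", None, ["$t3", "$t0"]),
    ("lw", "$t4", ["$t0"]),
    ("add", "$t5", ["$t1", "$t4"]),
    ("sw", None, ["$t5", "$t0"]),
]

REORDERED = [
    ("lw", "$t1", ["$t0"]),
    ("lw", "$t2", ["$t0"]),
    ("lw", "$t4", ["$t0"]),
    ("add", "$t3", ["$t1", "$t2"]),
    ("sw", None, ["$t3", "$t0"]),
    ("add", "$t5", ["$t1", "$t4"]),
    ("sw", None, ["$t5", "$t0"]),
]
```

Seven instructions need 7 + 4 = 11 cycles without stalls; the original order adds two load-use stalls, which gives 13.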
Control Hazards
• A branch determines the flow of control
  • Two branch outcomes: Taken or Not-Taken
• The CPU doesn’t recognize a branch until it reaches the end of the ID stage
• Every cycle, the CPU has to fetch one instruction
Control Hazards
• The MIPS pipeline in the textbook always predicts “not-taken”
  • Pipeline flush on every taken branch
  • OK to flush because mis-fetched instructions don’t write to register/memory
  • But this incurs pipeline bubbles (performance penalty)
• The revised MIPS pipeline moves the branch comparison to the ID stage
  • Doable for BEQ and BNE
  • Reduces pipeline bubbles from 3 to 1 per taken branch
  • Complicates data forwarding and hazard detection
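To see what the reduction from 3 bubbles to 1 is worth, a simple model is CPI = 1 + (fraction of instructions that are branches) × (fraction of branches taken) × (bubbles per taken branch). The sketch below assumes an ideal CPI of 1 and ignores data-hazard stalls; the function name and the example fractions are hypothetical, not measurements from the course.

```python
def effective_cpi(branch_fraction, taken_fraction, bubbles_per_taken_branch):
    """CPI under predict-not-taken, counting only taken-branch bubbles."""
    return 1.0 + branch_fraction * taken_fraction * bubbles_per_taken_branch


# Hypothetical workload: 20% branches, half of them taken.
cpi_branch_in_mem = effective_cpi(0.20, 0.50, 3)   # textbook pipeline
cpi_branch_in_id = effective_cpi(0.20, 0.50, 1)    # revised pipeline
```

Under these assumed fractions the taken-branch overhead shrinks to one third of its former value, because it scales linearly with the bubble count.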
Revised MIPS Pipeline
Revised MIPS Pipeline
Note: Branch does nothing in EX, MEM and WB
Performance Penalty
• Any pipeline bubbles?
Example 1:
    loop: addi $1, $1, -1
          add  $4, $5, $6
          beq  $1, $zero, loop
Example 2:
    lw   $1, addr
    add  $4, $5, $6
    beq  $1, $4, target
Delayed Branch
Delayed branch may remove the one-cycle stall
• The instruction right after the beq is executed whether or not the branch is taken (the sub instruction in the example)
• In other words, the effect of the beq is delayed by one cycle
Original order:
    sub  $10, $4, $8
    beq  $1, $3, 7
    and  $12, $2, $5
With delayed branch (sub moved to right after the beq):
    beq  $1, $3, 7
    sub  $10, $4, $8
    and  $12, $2, $5
Must find an independent instruction; otherwise
• May have to fill in a nop instruction, or
• Need two variants of beq, delayed and not delayed
Other Topics
• Exception handling
• Multi-issue pipeline
Those topics will be covered in the final exam
• Exam 2 will NOT cover them