
Stalling


hamal

Presentation Transcript


  1. Stalling
     The easiest solution is to stall the pipeline. We could delay the AND instruction by introducing a one-cycle delay into the pipeline, sometimes called a bubble. Notice that we’re still using forwarding in cycle 5, to get data from the MEM/WB pipeline register to the ALU.
     [Figure: pipeline diagram over clock cycles 1 to 7 for lw $2, 20($3) followed by and $12, $2, $5, with a one-cycle bubble before the AND]
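The stall counts on slides 1 and 2 follow directly from the stage timing. A minimal sketch, assuming the 5-stage pipeline used throughout these slides (IF, ID, EX, MEM, WB, one cycle each) and a register file that writes in the first half of a cycle and reads in the second half; the function name `load_use_stalls` is hypothetical.

```python
def load_use_stalls(distance: int, forwarding: bool) -> int:
    """Stall cycles needed when an instruction uses the result of a lw.

    distance: how many instructions after the lw the consumer sits
              (1 = immediately following, like the AND on slide 1).
    Assumes the lw is fetched in cycle 1: IF=1, ID=2, EX=3, MEM=4, WB=5.
    """
    lw_mem, lw_wb = 4, 5
    consumer_id = 2 + distance   # cycle in which the consumer is decoded
    consumer_ex = 3 + distance   # cycle in which the consumer uses the ALU
    if forwarding:
        # MEM/WB -> ALU forwarding: the loaded value reaches EX in the
        # cycle after the lw's MEM stage.
        return max(0, (lw_mem + 1) - consumer_ex)
    # No forwarding: the consumer must read the register file in ID no
    # earlier than the lw's WB cycle (write first half, read second half).
    return max(0, lw_wb - consumer_id)


print(load_use_stalls(1, forwarding=True))   # prints 1
print(load_use_stalls(1, forwarding=False))  # prints 2
```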

  2. Stalling and forwarding
     Without forwarding, we’d have to stall for two cycles to wait for the LW instruction’s writeback stage. In general, you can always stall to avoid hazards, but dependencies are very common in real code, and stalling can reduce performance significantly.
     [Figure: pipeline diagram over clock cycles 1 to 8 for lw $2, 20($3) and and $12, $2, $5, with a two-cycle stall]
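The two stall counts on slides 1 and 2 can be checked against each other in one small calculation. A sketch assuming the 5-stage timing from these slides (a lw fetched in cycle 1 reaches MEM in cycle 4 and WB in cycle 5) and a register file that is written in the first half of a cycle and read in the second; `stalls_after_lw` is a hypothetical name.

```python
def stalls_after_lw(forwarding: bool) -> int:
    """Stalls for an instruction that uses a lw result immediately (distance 1)."""
    lw_mem, lw_wb = 4, 5          # lw fetched in cycle 1
    consumer_id, consumer_ex = 3, 4  # normal ID and EX cycles of the next instruction
    if forwarding:
        # The value leaves MEM/WB at the start of cycle lw_mem + 1 and can feed EX.
        return max(0, lw_mem + 1 - consumer_ex)
    # Without forwarding the consumer must decode (read registers) in the WB cycle.
    return max(0, lw_wb - consumer_id)


print(stalls_after_lw(True), stalls_after_lw(False))  # prints 1 2
```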

  3. Load-Use Hazard Detection
     • Check when the using instruction is decoded in the ID stage
     • ALU operand register numbers in the ID stage are given by
       • IF/ID.RegisterRs, IF/ID.RegisterRt
     • Load-use hazard when
       • ID/EX.MemRead and ((ID/EX.RegisterRt = IF/ID.RegisterRs) or (ID/EX.RegisterRt = IF/ID.RegisterRt))
     • If detected, stall and insert a bubble
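The detection condition above is a single Boolean predicate over pipeline-register fields. A minimal sketch with register numbers as plain integers; the function and parameter names are hypothetical stand-ins for the ID/EX and IF/ID fields named on the slide.

```python
def load_use_hazard(id_ex_mem_read: bool, id_ex_rt: int,
                    if_id_rs: int, if_id_rt: int) -> bool:
    """True when the instruction in ID uses the destination of a load in EX."""
    return id_ex_mem_read and (id_ex_rt == if_id_rs or id_ex_rt == if_id_rt)


# lw $2, 20($3) is in ID/EX (rt = 2); and $12, $2, $5 is in IF/ID (rs = 2, rt = 5).
print(load_use_hazard(True, 2, 2, 5))   # prints True: stall
# The same registers but a non-load in ID/EX: forwarding alone suffices.
print(load_use_hazard(False, 2, 2, 5))  # prints False
```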

  4. How to Stall the Pipeline
     • Force control values in the ID/EX register to 0
       • EX, MEM and WB do nop (no-operation)
     • Prevent update of the PC and IF/ID register
       • Using instruction is decoded again
       • Following instruction is fetched again
     • 1-cycle stall allows MEM to read data for lw
       • Can subsequently forward to EX stage
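The three stall actions above can be modeled as one clock edge of the front-end pipeline registers. A simplified sketch, assuming only the PC, IF/ID and ID/EX are modeled, control signals are reduced to three representative bits, and `pc` holds the address of the next fetch; all class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Control:
    # A few representative EX/MEM/WB control bits (simplification).
    reg_write: bool = False
    mem_read: bool = False
    mem_write: bool = False


NOP = Control()  # all zeros: the bubble does nothing in EX, MEM or WB


@dataclass
class PipelineState:
    pc: int              # address of the next instruction to fetch
    if_id_instr: str     # instruction held in IF/ID
    id_ex_instr: str     # instruction held in ID/EX
    id_ex_ctrl: Control  # control values held in ID/EX


def clock_edge(state: PipelineState, fetched: str,
               decoded_ctrl: Control, stall: bool) -> PipelineState:
    """Advance PC, IF/ID and ID/EX by one cycle, or stall them."""
    if stall:
        return PipelineState(pc=state.pc,                    # PC not updated: refetch
                             if_id_instr=state.if_id_instr,  # decode again
                             id_ex_instr="bubble",
                             id_ex_ctrl=NOP)                 # controls forced to 0
    return PipelineState(pc=state.pc + 4,
                         if_id_instr=fetched,
                         id_ex_instr=state.if_id_instr,
                         id_ex_ctrl=decoded_ctrl)


s = PipelineState(pc=8, if_id_instr="and $12, $2, $5",
                  id_ex_instr="lw $2, 20($3)",
                  id_ex_ctrl=Control(reg_write=True, mem_read=True))
s2 = clock_edge(s, "or $13, $12, $2", Control(reg_write=True), stall=True)
s3 = clock_edge(s2, "or $13, $12, $2", Control(reg_write=True), stall=False)
```

After the stalled edge, `s2` still holds the AND in IF/ID and the PC is unchanged; on the next edge the AND finally moves into ID/EX.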

  5. Stalling delays the entire pipeline
     If we delay the second instruction, we’ll have to delay the third one too. This is necessary to make forwarding work between AND and OR. It also prevents problems such as two instructions trying to write to the same register in the same cycle.
     [Figure: pipeline diagram over clock cycles 1 to 8 for lw $2, 20($3), and $12, $2, $5, and or $13, $12, $2]
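The diagram for slides 5 and 6 can be regenerated from a per-instruction stall count. A sketch assuming the stalled instruction is held in ID and its successor in IF (as slide 4 describes), with later instructions simply fetched later; `timeline` and `show` are hypothetical helpers.

```python
def timeline(n_instrs, stalls):
    """Map each instruction to {clock cycle: stage}.

    stalls[i] = bubble cycles inserted while instruction i waits in ID.
    A stall holds instruction i in ID and instruction i + 1 in IF.
    """
    rows = []
    for i in range(n_instrs):
        # Stalls of instructions before i's predecessor delay i's first fetch.
        start = 1 + i + sum(stalls[:max(i - 1, 0)])
        held_in_if = stalls[i - 1] if i > 0 else 0
        stages = (["IF"] * (1 + held_in_if) + ["ID"] * (1 + stalls[i])
                  + ["EX", "MEM", "WB"])
        rows.append({start + k: stage for k, stage in enumerate(stages)})
    return rows


def show(names, stalls):
    rows = timeline(len(names), stalls)
    last = max(max(row) for row in rows)
    print(" " * 18 + "".join(f"{c:<4}" for c in range(1, last + 1)))
    for name, row in zip(names, rows):
        print(f"{name:<18}" + "".join(f"{row.get(c, ''):<4}" for c in range(1, last + 1)))


show(["lw $2, 20($3)", "and $12, $2, $5", "or $13, $12, $2"], [0, 1, 0])
```

In the printed chart the AND is decoded in cycles 3 and 4 and reaches EX in cycle 5, where it takes $2 from MEM/WB; the OR is fetched twice and finishes in cycle 8.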

  6. What about EX, MEM, WB?
     But what about the ALU during cycle 4, the data memory in cycle 5, and the register file write in cycle 6? Those units aren’t used in those cycles because of the stall, so we can set the EX, MEM and WB control signals to all 0s.
     [Figure: the same three-instruction pipeline diagram over clock cycles 1 to 8, with the bubble passing through EX, MEM and WB]

  7. Detecting Stalls, cont.
     When should stalls be detected? When the instruction causing the stall (the lw) is in the EX stage, i.e. held in ID/EX while the dependent instruction is in IF/ID.
     [Figure: lw $2, 20($3) and and $12, $2, $5 in the pipeline, with the IF/ID, ID/EX, EX/MEM and MEM/WB registers marked]
     • What is the stall condition?
       if (ID/EX.MemRead = 1 and (ID/EX.rt = IF/ID.rs or ID/EX.rt = IF/ID.rt)) then stall

  8. Adding hazard detection to the CPU
     [Figure: pipelined datapath with a hazard detection unit added. Its inputs are ID/EX.MemRead, ID/EX.RegisterRt and the Rs and Rt fields of IF/ID; its outputs are PC Write, IF/ID Write, and a mux select that forces the control values entering ID/EX to 0. The forwarding unit (EX/MEM.RegisterRd, MEM/WB.RegisterRd) is also shown.]

  9. Stalls and Performance
     • Stalls reduce performance
       • But are required to get correct results
     • Compiler can arrange code to avoid hazards and stalls
       • Requires knowledge of the pipeline structure

  10. Code Scheduling to Avoid Stalls
     Reorder code to avoid using a load result in the next instruction.
     Example: C code for A = B + E; C = B + F;

     Original order (13 cycles):
         lw  $t1, 0($t0)
         lw  $t2, 4($t0)
         add $t3, $t1, $t2    # stall: needs $t2 from the lw just before
         sw  $t3, 12($t0)
         lw  $t4, 8($t0)
         add $t5, $t1, $t4    # stall: needs $t4 from the lw just before
         sw  $t5, 16($t0)

     Reordered (11 cycles):
         lw  $t1, 0($t0)
         lw  $t2, 4($t0)
         lw  $t4, 8($t0)
         add $t3, $t1, $t2
         sw  $t3, 12($t0)
         add $t5, $t1, $t4
         sw  $t5, 16($t0)
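The 13- and 11-cycle figures can be reproduced by counting load-use pairs. A sketch assuming full forwarding (so only a lw whose result is used by the very next instruction stalls, for one cycle) and counting from the first fetch to the last writeback; the tuple encoding and function names are hypothetical.

```python
def count_load_use_stalls(code):
    """One stall for each lw whose destination the next instruction reads."""
    stalls = 0
    for prev, cur in zip(code, code[1:]):
        op, dest, _ = prev
        if op == "lw" and dest in cur[2]:
            stalls += 1
    return stalls


def total_cycles(code, stages=5):
    # Fill the pipe once (stages - 1 cycles), then one instruction
    # completes per cycle; each stall adds one cycle.
    return len(code) + (stages - 1) + count_load_use_stalls(code)


# (opcode, destination register or None, registers read)
original = [
    ("lw",  "$t1", ("$t0",)),
    ("lw",  "$t2", ("$t0",)),
    ("add", "$t3", ("$t1", "$t2")),
    ("sw",  None,  ("$t3", "$t0")),
    ("lw",  "$t4", ("$t0",)),
    ("add", "$t5", ("$t1", "$t4")),
    ("sw",  None,  ("$t5", "$t0")),
]
reordered = [original[i] for i in (0, 1, 4, 2, 3, 5, 6)]
print(total_cycles(original), total_cycles(reordered))  # prints 13 11
```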

  11. Branches in the original pipelined datapath
     When are they resolved?
     [Figure: original pipelined datapath, including the branch target adder (sign-extended offset shifted left 2, added to PC + 4) and the PCSrc mux that selects the next PC]

  12. Branch Hazards
     • If the branch outcome is determined in MEM, flush the instructions fetched after the branch (set their control values to 0)
     [Figure: pipeline diagram marking the instructions to flush]
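Counting the instructions this flush discards makes the cost concrete. A minimal sketch, assuming one instruction is fetched per cycle and fetching continues sequentially until the branch resolves (so a flush happens only when the branch is taken); `instructions_flushed` is a hypothetical name. Resolving in MEM discards three instructions; resolving in ID, as slide 13 proposes, discards one.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def instructions_flushed(resolve_stage: str) -> int:
    """Instructions fetched after a taken branch before its outcome is known."""
    # The branch itself is fetched in IF; in each later cycle up to and
    # including its resolving stage, one more sequential instruction is fetched.
    return STAGES.index(resolve_stage)


print(instructions_flushed("MEM"))  # prints 3
print(instructions_flushed("ID"))   # prints 1
```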

  13. Reducing Branch Delay
     • Move hardware to determine outcome to ID stage
       • Target address adder
       • Register comparator
     • Example: branch taken
         36: sub $10, $4, $8
         40: beq $1, $3, 7
         44: and $12, $2, $5
         48: or  $13, $2, $6
         52: add $14, $4, $2
         56: slt $15, $6, $7
             ...
         72: lw  $4, 50($7)
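The target adder moved into ID computes the branch target from the PC and the beq's 16-bit offset field. A sketch of that arithmetic, assuming standard MIPS semantics (the offset counts words and is relative to PC + 4); `branch_target` is a hypothetical helper name.

```python
def branch_target(pc: int, imm16: int) -> int:
    """MIPS beq target: (PC + 4) + (sign-extended 16-bit offset << 2)."""
    offset = imm16 - 0x10000 if imm16 & 0x8000 else imm16  # sign-extend
    return ((pc + 4) + (offset << 2)) & 0xFFFFFFFF          # keep 32 bits


print(branch_target(40, 7))  # prints 72: beq at 40 jumps to the lw at 72
```

With offset 7, the target is 44 + 7 × 4 = 72, matching the listing above.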

  14. Example: Branch Taken

  15. Example: Branch Taken

  16. Data Hazards for Branches
     • If a comparison register is a destination of the 2nd or 3rd preceding ALU instruction
         Cycle                 1   2   3   4   5   6   7   8
         add $1, $2, $3        IF  ID  EX  MEM WB
         add $4, $5, $6            IF  ID  EX  MEM WB
         …                             IF  ID  EX  MEM WB
         beq $1, $4, target                IF  ID  EX  MEM WB
     • Can resolve using forwarding

  17. Data Hazards for Branches
     • If a comparison register is a destination of the preceding ALU instruction or the 2nd preceding load instruction
       • Need 1 stall cycle
         Cycle                 1   2   3   4   5   6   7   8
         lw  $1, addr          IF  ID  EX  MEM WB
         add $4, $5, $6            IF  ID  EX  MEM WB
         beq (stalled)                 IF  ID
         beq $1, $4, target                    ID  EX  MEM WB

  18. Data Hazards for Branches
     • If a comparison register is a destination of the immediately preceding load instruction
       • Need 2 stall cycles
         Cycle                 1   2   3   4   5   6   7   8
         lw  $1, addr          IF  ID  EX  MEM WB
         beq (stalled)             IF  ID
         beq (stalled)                     ID
         beq $1, $0, target                    ID  EX  MEM WB
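Slides 16 to 18 reduce to one timing rule once the comparison happens in ID. A sketch assuming a result can be forwarded into ID in the cycle after it is produced (end of EX for ALU instructions, end of MEM for loads); `branch_stalls` and its string arguments are hypothetical.

```python
def branch_stalls(producer: str, distance: int) -> int:
    """Stall cycles for a beq compared in ID.

    producer: "alu" or "load", the instruction writing a comparison register.
    distance: 1 = immediately preceding the beq, 2 = 2nd preceding, ...
    """
    producer_id_cycle = 2                     # producer fetched in cycle 1
    ready = {"alu": 3, "load": 4}[producer]   # cycle whose end yields the value
    branch_id = producer_id_cycle + distance  # cycle the beq would be decoded
    return max(0, ready + 1 - branch_id)


for producer in ("alu", "load"):
    print(producer, [branch_stalls(producer, d) for d in (1, 2, 3)])
```

The ALU row gives 1, 0, 0 stalls and the load row 2, 1, 0, matching the three slides.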

  19. Branch Prediction
     • Longer pipelines can’t readily determine branch outcome early
       • Stall penalty becomes unacceptable
     • Predict (i.e., guess) outcome of branch
       • Only stall if prediction is wrong
     • Simplest prediction strategy
       • Predict branches not taken
       • Works well for loops if the loop tests are done at the start
       • Fetch instruction after branch, with no delay

  20. Dynamic Branch Prediction
     • In deeper and superscalar pipelines, branch penalty is more significant
     • Use dynamic prediction
       • Branch prediction buffer (aka branch history table)
       • Indexed by recent branch instruction addresses
       • Stores outcome (taken/not taken)
       • To execute a branch
         • Check table, expect the same outcome
         • Start fetching from fall-through or target
         • If wrong, flush pipeline and flip prediction
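A branch prediction buffer of this kind is a small table of outcome bits. A sketch assuming a 16-entry table indexed by the low-order bits of the word address, with every entry starting at "not taken"; the class and method names are hypothetical. Because only low-order bits are used, two branches can share an entry.

```python
class BranchHistoryTable:
    """1-bit branch prediction buffer indexed by low-order PC bits."""

    def __init__(self, index_bits: int = 4):
        self.mask = (1 << index_bits) - 1
        self.taken = [False] * (1 << index_bits)  # every entry: predict not taken

    def _index(self, pc: int) -> int:
        return (pc >> 2) & self.mask              # drop the 2-bit byte offset

    def predict(self, pc: int) -> bool:
        return self.taken[self._index(pc)]

    def update(self, pc: int, outcome: bool) -> bool:
        """Record the actual outcome; return True on a misprediction
        (the pipeline would be flushed and the bit flipped)."""
        i = self._index(pc)
        wrong = self.taken[i] != outcome
        if wrong:
            self.taken[i] = outcome               # flip the prediction
        return wrong


bht = BranchHistoryTable()
bht.update(40, True)    # mispredicted: entry flips to "taken"
print(bht.predict(40))  # prints True
```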

  21. 1-Bit Predictor: Shortcoming
     • Inner loop branches mispredicted twice!
         outer: …
                …
         inner: …
                …
                beq …, …, inner
                …
                beq …, …, outer
     • Mispredict as taken on last iteration of inner loop
     • Then mispredict as not taken on first iteration of inner loop next time around

  22. 2-Bit Predictor
     • Only change prediction on two successive mispredictions
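Simulating the inner-loop branch of slide 21 shows the difference. The slide's state diagram did not survive, so the 2-bit predictor is modeled here as a 2-bit saturating counter, a common 2-bit scheme; both predictors start in a "taken" state, and four inner iterations per pass are assumed. The function name and the "1bit"/"2bit" strings are hypothetical.

```python
def mispredictions_per_pass(predictor: str, passes: int = 3, inner_iters: int = 4):
    """Count inner-loop branch mispredictions in each outer-loop pass.

    The inner branch is taken on every iteration except the last.
    """
    # "1bit": state is the last outcome (True = taken).
    # "2bit": saturating counter 0..3; values 2 and 3 predict taken.
    state = True if predictor == "1bit" else 3
    counts = []
    for _ in range(passes):
        misses = 0
        for it in range(inner_iters):
            outcome = it < inner_iters - 1  # not taken on the last iteration
            if predictor == "1bit":
                prediction = state
                state = outcome
            else:
                prediction = state >= 2
                state = min(state + 1, 3) if outcome else max(state - 1, 0)
            misses += prediction != outcome
        counts.append(misses)
    return counts


print(mispredictions_per_pass("1bit"))  # prints [1, 2, 2]
print(mispredictions_per_pass("2bit"))  # prints [1, 1, 1]
```

After the first pass the 1-bit predictor misses twice per pass, while the counter only drops to "weakly taken" on the loop exit and so misses once.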

  23. Calculating the Branch Target
     • Even with a predictor, we still need to calculate the target address
       • 1-cycle penalty for a taken branch
     • Branch target buffer
       • Cache of target addresses
       • Indexed by PC when instruction fetched
         • If hit and instruction is branch predicted taken, can fetch target immediately
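The fetch-time decision a branch target buffer enables can be sketched in a few lines. This assumes a Python dict standing in for the hardware cache (a real BTB has finitely many tagged entries) and takes the direction prediction as a flag from a separate predictor; the class and method names are hypothetical.

```python
class BranchTargetBuffer:
    """Cache of branch target addresses, looked up with the fetch PC."""

    def __init__(self):
        self.targets = {}  # unbounded dict: a simplification of a hardware cache

    def record(self, pc: int, target: int) -> None:
        self.targets[pc] = target

    def next_fetch_pc(self, pc: int, predict_taken: bool) -> int:
        """PC to fetch in the next cycle, decided during IF."""
        target = self.targets.get(pc)
        if target is not None and predict_taken:
            return target  # hit and predicted taken: fetch the target at once
        return pc + 4      # miss or predicted not taken: fall through


btb = BranchTargetBuffer()
print(btb.next_fetch_pc(40, predict_taken=True))  # prints 44: target not cached yet
btb.record(40, 72)
print(btb.next_fetch_pc(40, predict_taken=True))  # prints 72
```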

  24. Concluding Remarks
     • ISA influences design of datapath and control
     • Datapath and control influence design of ISA
     • Pipelining improves instruction throughput using parallelism
       • More instructions completed per second
       • Latency for each instruction not reduced
     • Hazards: structural, data, control
     • Main additions in hardware:
       • forwarding unit
       • hazard detection and stalling
       • branch predictor
       • branch target buffer
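The throughput-versus-latency point can be made with a cycle count. An idealized sketch, assuming no stalls, equal clock periods for both machines, and an unpipelined machine that spends all five stage-cycles on one instruction before starting the next; `cycles` is a hypothetical name.

```python
def cycles(n_instructions: int, stages: int = 5, pipelined: bool = True) -> int:
    """Total clock cycles for n instructions with no stalls."""
    if pipelined:
        return stages + n_instructions - 1  # fill once, then one finishes per cycle
    return stages * n_instructions          # each instruction runs alone


print(cycles(1000, pipelined=False), cycles(1000))  # prints 5000 1004
```

Throughput approaches one instruction per cycle, yet each instruction still spends `stages` cycles in flight, so its latency is unchanged. The same formula gives the 11 cycles of slide 10's reordered code.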
