Pipeline Hazards, Data Forwarding, and Stalls

Presentation Transcript

  1. Pipeline Hazards, Data Forwarding, and Stalls Never trouble trouble till trouble troubles you.

  2. Use and Distribution Notice • Possession of any of these files implies understanding and agreement to this policy. • The slides are provided for the use of students enrolled in Jeff Six's Computer Architecture class (CMSC 411) at the University of Maryland Baltimore County. They are the creation of Mr. Six and he reserves all rights as to the slides. These slides are not to be modified or redistributed in any way. All of these slides may only be used by students for the purpose of reviewing the material covered in lecture. Any other use, including but not limited to, the modification of any slides or the sale of any slides or material, in whole or in part, is expressly prohibited. • Most of the material in these slides, including the examples, is derived from Computer Organization and Design, Second Edition. Credit is hereby given to the authors of this textbook for much of the content. This content is used here for the purpose of presenting this material in CMSC 411, which uses this textbook.

  3. Instructions with Dependencies • So far in our discussion of pipelined datapaths, we have used instruction sequences whose instructions are independent of one another. • Let us now study a sequence with many dependencies…
       sub $2, $1, $3
       and $12, $2, $5
       or $13, $6, $2
       add $14, $2, $2
       sw $15, 100($2)

  4. Dependencies
       sub $2, $1, $3
       and $12, $2, $5
       or $13, $6, $2
       add $14, $2, $2
       sw $15, 100($2)
     • The last four instructions depend on the result, stored in $2, from the first instruction. • If $2 had the value of 10 before these instructions execute and the subtract stored the value –20 in $2, the programmer intends for the last four instructions to use the –20 value, not the 10 value.

  5. The Dependency Problem • This is a problem if we run this instruction sequence in our pipeline… [Figure: pipeline diagram for sub, and, or, add, sw over clock cycles CC1 to CC9. Register $2 holds 10 through CC4, changes from 10 to –20 during CC5, and holds –20 from CC6 on.]

  6. Pipeline Data Hazards • Looking at this picture, the lines drawn from the top datapath to the lower ones show the dependencies. Those that must go backwards in time are known as pipeline data hazards. • This type of hazard occurs when a value is needed by a later instruction before it is stored by a previous instruction. This exists only because we are running these instructions in the pipeline; if we ran them in a nonpipelined datapath, this would not be a problem.

  7. One Dependency Solved • Notice clock cycle five: SUB writes $2 in the same cycle in which ADD reads it, so whether this is a problem depends on how the register file behaves. • If we assume that when a write and a read occur in the same clock cycle, the write happens in the first half of the cycle and the read in the second half, this one dependency is resolved. [Figure: pipeline diagram for the sub, and, or, add, sw sequence over CC1 to CC9; register $2 changes from 10 to –20 during CC5.]

  8. Correct and Incorrect Values • We can see that a value read for $2 is the result of the subtract only if the read occurs in cycle 5 or later. • The AND and OR get the incorrect value of 10. • The ADD and SW get the correct value of –20. [Figure: pipeline diagram for the sub, and, or, add, sw sequence over CC1 to CC9; register $2 changes from 10 to –20 during CC5.]
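
The timing argument on this slide can be checked with a small sketch. It assumes the five-stage timing used in these slides (instruction i is fetched in cycle i + 1, reads its registers in ID in cycle i + 2, and writes back in cycle i + 5) and the write-first-half, read-second-half register file from slide 7. The function names are illustrative only.

```python
# Assumed timing model: 0-based instruction i is in IF during cycle i + 1,
# in ID (register read) during cycle i + 2, and in WB during cycle i + 5.

def id_cycle(index):
    """Clock cycle in which instruction `index` reads the register file."""
    return index + 2

def wb_cycle(index):
    """Clock cycle in which instruction `index` writes the register file."""
    return index + 5

def reads_new_value(producer, consumer):
    """True if the consumer's register read sees the producer's result.

    A write in the first half of a cycle is visible to a read in the
    second half of the same cycle, so equality counts as success.
    """
    return id_cycle(consumer) >= wb_cycle(producer)

program = [
    "sub $2, $1, $3",
    "and $12, $2, $5",
    "or $13, $6, $2",
    "add $14, $2, $2",
    "sw $15, 100($2)",
]

# Instruction 0 (sub) produces $2; every later instruction consumes it.
results = {
    text.split()[0]: reads_new_value(0, i)
    for i, text in enumerate(program)
    if i > 0
}

for name, ok in results.items():
    print(f"{name:>3}: reads {'-20 (correct)' if ok else '10 (stale)'}")
```

Under these assumptions the sketch reports stale reads for AND and OR and correct reads for ADD and SW, which agrees with the slide.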

  9. A Possible Solution: Do Not Allow These Problems • One solution might be to put instructions that do not depend on the subtract instruction between it and those instructions that do. • If we cannot find anything else, the NOP (no operation) instruction can work…
       sub $2, $1, $3
       nop
       nop
       and $12, $2, $5
       or $13, $6, $2
       add $14, $2, $2
       sw $15, 100($2)
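
As a rough illustration of what a compiler or assembler would have to do here, the sketch below pads a program with NOPs so that every consumer sits at least three instructions after its producer (the distance at which the register-file read lands in or after the producer's WB cycle). The parser handles only the small instruction subset used in these slides, and all names are hypothetical.

```python
import re

# With no forwarding, a consumer must be at least this many instructions
# after its producer so that its ID-stage read is not earlier than the
# producer's WB-stage write (write first half, read second half).
MIN_DISTANCE = 3

def parse(instr):
    """Return (dest, sources) register numbers for the subset used here."""
    op = instr.split()[0]
    regs = [int(r) for r in re.findall(r"\$(\d+)", instr)]
    if op == "nop":
        return None, []
    if op == "sw":                 # sw rt, offset(rs): reads both, writes none
        return None, regs
    return regs[0], regs[1:]       # R-type: op rd, rs, rt

def insert_nops(program):
    """Return a copy of `program` padded with NOPs to avoid data hazards."""
    out = []
    for instr in program:
        _, sources = parse(instr)
        needed = 0
        # Only the two most recently emitted instructions can be too close.
        for distance, earlier in enumerate(reversed(out[-2:]), start=1):
            dest, _ = parse(earlier)
            if dest is not None and dest != 0 and dest in sources:
                needed = max(needed, MIN_DISTANCE - distance)
        out.extend(["nop"] * needed)
        out.append(instr)
    return out

program = [
    "sub $2, $1, $3",
    "and $12, $2, $5",
    "or $13, $6, $2",
    "add $14, $2, $2",
    "sw $15, 100($2)",
]

padded = insert_nops(program)
print("\n".join(padded))
```

For the example sequence this reproduces the two NOPs shown on the slide; a real scheduler would first try to move independent instructions into those slots.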

  10. This Does Not Work Well • This code will work correctly in our pipeline; however, the two NOPs are instructions that do no useful work. • These dependencies occur quite frequently, too often in fact to rely on the compiler to save the day (it could work, but the code would run much slower than the hardware allows). • We must design support into the hardware to resolve data hazards.

  11. Considering Dependencies • A dependency exists when an instruction tries to read a register in its EX stage that an earlier instruction intends to write in its WB stage. • This can be either the instruction immediately before or the instruction two before (in our five stage pipeline). • We have a dependency when the Rd field (destination register) of one of the two previous instructions matches the Rs or Rt field (source registers) of the current instruction.

  12. The Four Hazard Conditions • We can make this clearer by naming the fields of the pipeline registers. • For example, ID/EX.RegisterRs will refer to the Rs value that is currently in the pipeline register between the ID and EX stages. • Using this notation, we can outline the four possible hazard conditions… • (1a) EX/MEM.RegisterRd = ID/EX.RegisterRs • (1b) EX/MEM.RegisterRd = ID/EX.RegisterRt • (2a) MEM/WB.RegisterRd = ID/EX.RegisterRs • (2b) MEM/WB.RegisterRd = ID/EX.RegisterRt

  13. Classifying One Dependency • Looking at our original instruction sequence…
       sub $2, $1, $3
       and $12, $2, $5
       or $13, $6, $2
       add $14, $2, $2
       sw $15, 100($2)
     • We can see that the first hazard is between the result of the SUB and the first read operand of the AND. This hazard arises when AND is in the EX stage (SUB is in the MEM stage)…this means we have a type 1a hazard… EX/MEM.RegisterRd = ID/EX.RegisterRs = $2

  14. The Other Dependencies
       sub $2, $1, $3
       and $12, $2, $5
       or $13, $6, $2
       add $14, $2, $2
       sw $15, 100($2)
     • Looking at the rest of the dependencies… • The SUB/OR dependency is a type 2b hazard… MEM/WB.RegisterRd = ID/EX.RegisterRt = $2 • The two dependencies between SUB/ADD are not hazards, as the SUB writes on the same cycle as the ADD reads (and that results in the correct data). • There is no hazard between SUB and SW. SUB has already finished writing $2 when SW reads it.
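
The classification on the last two slides can be reproduced mechanically. The sketch below is a minimal illustration, not part of the original slides: it compares each instruction's Rs and Rt with the Rd of the instruction one before (in EX/MEM, type 1) and two before (in MEM/WB, type 2). The field extraction covers only R-type instructions and sw, and the helper names are made up for this example.

```python
import re

def fields(instr):
    """Return (rd, rs, rt) for the subset used here; rd is None if absent."""
    op = instr.split()[0]
    regs = [int(r) for r in re.findall(r"\$(\d+)", instr)]
    if op == "sw":                 # sw rt, offset(rs): writes no register
        rt, rs = regs
        return None, rs, rt
    rd, rs, rt = regs              # R-type: op rd, rs, rt
    return rd, rs, rt

def classify(program):
    """Return a list of (producer index, consumer index, hazard type)."""
    hazards = []
    for i, instr in enumerate(program):
        _, rs, rt = fields(instr)
        # distance 1: producer is in EX/MEM; distance 2: producer is in MEM/WB
        for distance, number in ((1, "1"), (2, "2")):
            if i - distance < 0:
                continue
            rd, _, _ = fields(program[i - distance])
            if rd is None:
                continue
            if rd == rs:
                hazards.append((i - distance, i, number + "a"))
            if rd == rt:
                hazards.append((i - distance, i, number + "b"))
    return hazards

program = [
    "sub $2, $1, $3",
    "and $12, $2, $5",
    "or $13, $6, $2",
    "add $14, $2, $2",
    "sw $15, 100($2)",
]

hazards = classify(program)
for producer, consumer, kind in hazards:
    print(f"type {kind}: {program[producer]!r} -> {program[consumer]!r}")
```

Only the SUB/AND (1a) and SUB/OR (2b) pairs are reported; ADD and SW are three or more instructions behind SUB, so they fall outside both pipeline registers.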

  15. Instructions Not Writing Registers • While these hazard conditions are helpful, they are not yet complete. • Some instructions do not write to registers – if we blindly followed these conditions, we would forward when it is not necessary (and not correct!). • The simplest solution is to look at the RegWrite signal of the instruction to see if it is, in fact, writing a register and we need to forward its results to later instructions.

  16. The Zero Register • Recall that register $0 is the Zero register in MIPS – a read should always result in the value of zero and a write does nothing. • So, if there is an instruction in the pipeline that writes to $0, we do not want to forward that possibly non-zero value to other instructions which may read from $0. • We can do this by adding conditions to the two hazard conditions… • (1) EX/MEM.RegisterRd != 0 • (2) MEM/WB.RegisterRd != 0

  17. Data Forwarding • Now that we have seen how to detect hazards, we can move to correcting the problem by forwarding the data from an instruction to later instructions. • We will forward the result from the EX/MEM pipeline register or the MEM/WB pipeline register to the beginning of the EX stage of the dependent instruction, directly as input to the ALU. • To see this, we can redraw our pipeline diagram…

  18. The Forwarding Paths • Now we show the forwarding paths necessary to correct the hazards… [Figure: pipeline diagram for sub, and, or, add, sw over CC1 to CC9 with the forwarding paths drawn. Register $2 changes from 10 to –20 during CC5; the EX/MEM pipeline register holds –20 in CC4 and the MEM/WB pipeline register holds –20 in CC5.]

  19. Constructing the Forwarding Paths • To make these forwarding paths work, we need to be able to take ALU inputs from any (well, the EX/MEM or MEM/WB) pipeline register, instead of always taking them from the ID/EX register. • We can just add multiplexors to the ALU inputs. This will let us run the pipeline at full speed even with all of these data dependencies. • The multiplexors will select the correct ALU input, be it from the ID/EX register, a forward from the EX/MEM register, or a forward from the MEM/WB register.

  20. The Original Datapath • Without forwarding… [Figure: datapath around the EX stage showing the ID/EX, EX/MEM, and MEM/WB pipeline registers, the register file, the ALU, the data memory, and a mux.]

  21. The Forwarding Datapath • With the multiplexors to add data forwarding… [Figure: the same datapath with a multiplexor on each ALU input, controlled by ForwardA and ForwardB. The forwarding unit takes Rs, Rt, EX/MEM.RegisterRd, and MEM/WB.RegisterRd as inputs.]

  22. The Multiplexor Control Signals • We can specify how the ForwardA and ForwardB signals are computed.
       (1a) if (EX/MEM.RegWrite and (EX/MEM.RegisterRd != 0) and (EX/MEM.RegisterRd = ID/EX.RegisterRs)) ForwardA = 10
       (1b) if (EX/MEM.RegWrite and (EX/MEM.RegisterRd != 0) and (EX/MEM.RegisterRd = ID/EX.RegisterRt)) ForwardB = 10
       (2a) if (MEM/WB.RegWrite and (MEM/WB.RegisterRd != 0) and (MEM/WB.RegisterRd = ID/EX.RegisterRs)) ForwardA = 01
       (2b) if (MEM/WB.RegWrite and (MEM/WB.RegisterRd != 0) and (MEM/WB.RegisterRd = ID/EX.RegisterRt)) ForwardB = 01

  23. The Multiplexor Control Signals • These control signals are simple to understand…

  24. One More Issue • Consider this instruction sequence…
       add $1, $1, $2
       add $1, $1, $3
       add $1, $1, $4
     • Here we have a problem – instruction three is going to receive a forwarding result from instructions one and two. • Which one is correct? • Instruction two because it is a more recent result!

  25. Solving This Problem • We have both an EX hazard (one instruction ago) and a MEM hazard (two instructions ago). • We can solve this by rewriting the MEM hazard condition so that it applies only when there is no EX hazard on the same operand…
       (2a) if (MEM/WB.RegWrite and (MEM/WB.RegisterRd != 0) and (EX/MEM.RegisterRd != ID/EX.RegisterRs) and (MEM/WB.RegisterRd = ID/EX.RegisterRs)) ForwardA = 01
       (2b) if (MEM/WB.RegWrite and (MEM/WB.RegisterRd != 0) and (EX/MEM.RegisterRd != ID/EX.RegisterRt) and (MEM/WB.RegisterRd = ID/EX.RegisterRt)) ForwardB = 01
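
The forwarding conditions, including the revised MEM-hazard rule, can be written as one small function. This is a sketch under stated assumptions rather than the slides' own implementation: `PipeReg` and `forwarding_unit` are made-up names, and the encoding 00 for "no forwarding, use the ID/EX value" is an assumption consistent with the three mux inputs described on slide 19.

```python
from dataclasses import dataclass

@dataclass
class PipeReg:
    """The two fields of EX/MEM or MEM/WB that the forwarding unit reads."""
    reg_write: bool = False
    rd: int = 0

def forwarding_unit(ex_mem, mem_wb, id_ex_rs, id_ex_rt):
    """Return (ForwardA, ForwardB) as 2-bit values.

    0b10 selects the EX/MEM result, 0b01 selects the MEM/WB result, and
    0b00 (an assumed default encoding) keeps the value read in ID.
    """
    def select(src):
        # EX hazard (conditions 1a/1b): the most recent result wins.
        if ex_mem.reg_write and ex_mem.rd != 0 and ex_mem.rd == src:
            return 0b10
        # MEM hazard (revised 2a/2b): only if EX/MEM does not match as well.
        if (mem_wb.reg_write and mem_wb.rd != 0
                and ex_mem.rd != src and mem_wb.rd == src):
            return 0b01
        return 0b00

    return select(id_ex_rs), select(id_ex_rt)

# Third "add $1, $1, $4" in EX; both earlier adds wrote $1.
triple_add = forwarding_unit(PipeReg(True, 1), PipeReg(True, 1), 1, 4)

# "or $13, $6, $2" in EX; "and" (writes $12) in EX/MEM, "sub" (writes $2) in MEM/WB.
sub_or = forwarding_unit(PipeReg(True, 12), PipeReg(True, 2), 6, 2)

print(f"{triple_add[0]:02b} {triple_add[1]:02b}")
print(f"{sub_or[0]:02b} {sub_or[1]:02b}")
```

In the triple-add case ForwardA comes out as 10, so the operand is taken from instruction two, as the previous slide requires.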

  26. Our Pipelined Datapath • The complete datapath with forwarding support… [Figure: full pipelined datapath (PC, instruction memory, IF/ID, control, registers, ID/EX, ALU, EX/MEM, data memory, MEM/WB) with the forwarding unit. IF/ID.RegisterRs, IF/ID.RegisterRt, and IF/ID.RegisterRd are carried into ID/EX; the forwarding unit also reads EX/MEM.RegisterRd and MEM/WB.RegisterRd.]

  27. Tracing Some Instructions • Now let’s trace an instruction sequence through the pipeline and see how the forwarding unit works and how the data gets forwarded from instruction to instruction. • We’ll start with cycle 3, where the SUB instruction is already in the EX stage.

  28. Clock Cycle 3 • Nothing special yet… [Figure: datapath snapshot at clock 3. IF: or $4, $4, $2; ID: and $4, $2, $5; EX: sub $2, $1, $3; MEM: before<1>; WB: before<2>.]

  29. Clock Cycle 4 • There is an EX hazard; the forwarding unit is forwarding the value… [Figure: datapath snapshot at clock 4. IF: add $9, $4, $2; ID: or $4, $4, $2; EX: and $4, $2, $5; MEM: sub $2, ...; WB: before<1>.]

  30. Clock Cycle 5 • There is an EX hazard and a MEM hazard; the forwarding unit is forwarding both values… [Figure: datapath snapshot at clock 5. IF: after<1>; ID: add $9, $4, $2; EX: or $4, $4, $2; MEM: and $4, ...; WB: sub $2, ....]

  31. Clock Cycle 6 • There is an EX hazard; the forwarding unit is forwarding the value… [Figure: datapath snapshot at clock 6. IF: after<2>; ID: after<1>; EX: add $9, $4, $2; MEM: or $4, ...; WB: and $4, ....]

  32. Load Instruction Hazards • Data forwarding cannot help when an instruction tries to read a register following a load into that register. • The data is still being read from the data memory when the ALU is performing its operation for the next instruction… [Figure: pipeline diagram over CC1 to CC9 for lw $2, 20($1); and $4, $2, $5; or $8, $2, $6; add $9, $4, $2; slt $1, $6, $7.]

  33. The Hazard Detection Unit • We need to introduce a pipeline stall, where the pipeline will wait for the necessary value to become available before allowing the dependent instruction to proceed (between the load and the use of the data). • This can be accomplished by the introduction of a hazard detection unit; its job is to look at instructions and determine if a pipeline stall (or bubble) needs to be introduced to resolve this type of hazard.

  34. Stalling the Pipeline • The bubble is a simple idea… [Figure: pipeline diagram over CC1 to CC10 for lw $2, 20($1); and $4, $2, $5; or $8, $2, $6; add $9, $4, $2; slt $1, $6, $7, with a bubble inserted after the lw.]

  35. The HDU Logic • The condition for introducing a bubble (the entire logic for the hazard detection unit, or HDU) is simple…
       if (ID/EX.MemRead and
           ((ID/EX.RegisterRt = IF/ID.RegisterRs) or
            (ID/EX.RegisterRt = IF/ID.RegisterRt)))
         stall the pipeline
     • This condition checks… • is the instruction a load? • is the load destination the same register as one of the operands of the following instruction? • If so, the HDU will introduce a one-cycle stall. After this one cycle, forwarding can handle the dependency and execution will proceed.
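
The stall condition translates almost directly into code. The sketch below is only an illustration; the function and parameter names are invented and simply mirror the pipeline-register fields named on the slide.

```python
def must_stall(id_ex_mem_read, id_ex_rt, if_id_rs, if_id_rt):
    """Return True when the hazard detection unit should insert a bubble.

    id_ex_mem_read: the instruction now in EX is a load (ID/EX.MemRead).
    id_ex_rt:       that load's destination register (ID/EX.RegisterRt).
    if_id_rs/rt:    source registers of the instruction now in ID.
    """
    return bool(id_ex_mem_read
                and (id_ex_rt == if_id_rs or id_ex_rt == if_id_rt))

# lw $2, 20($1) in EX, and $4, $2, $5 in ID: load-use hazard.
load_use = must_stall(True, 2, 2, 5)

# lw $2, 20($1) in EX, slt $1, $6, $7 in ID: no dependence on $2.
independent = must_stall(True, 2, 6, 7)

# sub $2, $1, $3 in EX (not a load): forwarding handles it, no stall.
not_a_load = must_stall(False, 2, 2, 5)

print(load_use, independent, not_a_load)
```

Note that the check uses Rt of the instruction in EX because a load's destination is held in its Rt field, not Rd.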

  36. The Bubble Itself • The stall must prevent the instructions in the ID stage and the IF stage from proceeding to the next stage. • This is accomplished by simply preventing the PC and the IF/ID pipeline register from changing. • The instruction in IF will continue to be read using the same PC, and the registers in the ID stage will continue to be read using the same instruction fields. • Turning back to the laundry, this is the same as restarting the washer with the same clothes in it and letting the dryer run while empty.

  37. The Bubble and the NOP • This bubble has the same effect as the NOP instruction we have seen. • While the IF and ID stages hold their instruction, the bubble/NOP begins at the EX stage. • We identified the hazard in the ID stage. • This lets us set all of the propagated control signals in the ID/EX pipeline register to zero, causing a bubble to be inserted – no registers or memories are written to if all of the control signals are zero (a NOP). • The bubble then propagates through, delaying everything after it until it exits the pipeline.
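
To see how a bubble delays everything behind it, the following sketch computes the cycle in which each instruction occupies the EX stage, assuming full forwarding and a one-cycle stall whenever an instruction uses the result of the load immediately before it. The parser covers only lw and R-type instructions, and the function names are hypothetical.

```python
import re

def parse(instr):
    """Return (op, dest, sources) for lw and R-type instructions."""
    op = instr.split()[0]
    regs = [int(r) for r in re.findall(r"\$(\d+)", instr)]
    if op == "lw":                      # lw rt, offset(rs)
        return op, regs[0], [regs[1]]
    return op, regs[0], regs[1:]        # R-type: op rd, rs, rt

def ex_cycles(program):
    """Clock cycle in which each instruction is in EX, with load-use stalls."""
    cycles = []
    next_ex = 3                         # first instruction: IF 1, ID 2, EX 3
    prev = None
    for instr in program:
        op, dest, sources = parse(instr)
        # A load immediately ahead whose destination we read costs one bubble.
        if prev is not None and prev[0] == "lw" and prev[1] in sources:
            next_ex += 1
        cycles.append(next_ex)
        next_ex += 1
        prev = (op, dest)
    return cycles

program = [
    "lw $2, 20($1)",
    "and $4, $2, $5",
    "or $8, $2, $6",
    "add $9, $4, $2",
    "slt $1, $6, $7",
]

schedule = ex_cycles(program)
for instr, cycle in zip(program, schedule):
    print(f"CC{cycle}: EX  {instr}")
```

The AND reaches EX in cycle 5 instead of cycle 4, and every later instruction is pushed back by the same single cycle, so the last instruction writes back in cycle 10 rather than cycle 9.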

  38. The Bubble and the Delay • The bubble delays everything after it…but it allows the necessary delay for everything to work correctly and remove the hazard. [Figure: pipeline diagram over CC1 to CC10 for lw $2, 20($1); and $4, $2, $5; or $8, $2, $6; add $9, $4, $2; slt $1, $6, $7, with a bubble inserted after the lw.]

  39. The New Pipelined Datapath • The HDU can introduce a bubble (the new MUX) and stall the IF and ID instructions (the write signals). [Figure: pipelined datapath with the hazard detection unit added. Its inputs are ID/EX.MemRead, ID/EX.RegisterRt, IF/ID.RegisterRs, and IF/ID.RegisterRt; it drives PCWrite, IF/IDWrite, and a mux that feeds zeros into the control fields of ID/EX.]

  40. Tracing Some Instructions • Let’s trace our instruction sequence through our new pipelined datapath and see the HDU and forwarding unit in action. • We’ll start with the load instruction in the ID stage and go from there.

  41. Clock Cycle 2 • Nothing special yet… [Figure: datapath snapshot at clock 2. IF: and $4, $2, $5; ID: lw $2, 20($1); EX: before<1>; MEM: before<2>; WB: before<3>.]

  42. Clock Cycle 3 • The HDU detects a hazard. It feeds control zeros into the ID/EX register, introducing a bubble. [Figure: datapath snapshot at clock 3. IF: or $4, $4, $2; ID: and $4, $2, $5; EX: lw $2, 20($1); MEM: before<1>; WB: before<2>.]

  43. Clock Cycle 4 • The second and third instructions are held in ID and IF. The forwarding unit forwards an invalid/unused value. [Figure: datapath snapshot at clock 4. IF: or $4, $4, $2; ID: and $4, $2, $5; EX: bubble; MEM: lw $2, ...; WB: before<1>.]

  44. Clock Cycle 5 • The forwarding unit forwards the value read from the memory into the EX stage for the second instruction (the AND). [Figure: datapath snapshot at clock 5. IF: add $9, $4, $2; ID: or $4, $4, $2; EX: and $4, $2, $5; MEM: bubble; WB: lw $2, ....]

  45. Clock Cycle 6 • The forwarding unit forwards the value from the MEM stage into the EX stage. [Figure: datapath snapshot at clock 6. IF: after<1>; ID: add $9, $4, $2; EX: or $4, $4, $2; MEM: and $4, ...; WB: bubble.]

  46. Clock Cycle 7 • The forwarding unit forwards the value from the MEM stage into the EX stage. [Figure: datapath snapshot at clock 7. IF: after<2>; ID: after<1>; EX: add $9, $4, $2; MEM: or $4, ...; WB: and $4, ....]

  47. Control Hazards • We have now seen how to deal with data dependencies, those that involve arithmetic and memory transfer instructions. • We also have control or branch hazards. These stem from the fact that an instruction must be fetched at every clock cycle, yet the decision about whether the branch should be followed does not occur until the MEM stage. • While forwarding and stalls can deal with all data dependencies, control hazards are much harder to deal with, as we will soon see.

  48. The Control Hazard • If we do not deal with this problem, the three instructions after the branch will complete and then the target instruction will run. • This is incorrect behavior! [Figure: pipeline diagram over CC1 to CC9 for the instructions at address 40: beq $1, $3, 7; 44: and $12, $2, $5; 48: or $13, $6, $2; 52: add $14, $2, $2; and the branch target at 72: lw $4, 50($7).]

  49. Solving This Problem • One solution would be to stall the pipeline when a branch instruction is encountered, forcing the datapath to wait until the branch hits the MEM stage. • This would introduce a large delay (3 cycles) for each branch. • Another idea is to assume that the branch will not be taken and continue after it. If the branch is taken, the incorrect instructions are flushed from the pipeline (all of the propagated control signals are set to zero for the IF, ID, and EX stage instructions).

  50. How Much Does This Help? • If a branch is taken one half of the time, the “assume branch not taken” behavior will save ½ of the delay that would be present if we stalled for each branch. • This is still a lengthy delay. On average, this design would introduce one and one half clock cycles of delay for each branch (a 3-cycle flush on the half of the branches that are taken). • We can do better than this.
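
The arithmetic behind the one-and-one-half-cycle figure is short enough to spell out. The sketch below assumes, as the slides do, that a taken branch discards the three instructions fetched behind it and that a correctly assumed not-taken branch costs nothing; the function name is illustrative.

```python
def expected_branch_delay(taken_probability, flush_cycles=3):
    """Average delay cycles per branch when assuming branches are not taken.

    Only taken branches pay the flush; not-taken branches pay nothing.
    """
    return taken_probability * flush_cycles

always_stall = 3                              # stall on every branch
assume_not_taken = expected_branch_delay(0.5)

print(f"stall on every branch: {always_stall} cycles per branch")
print(f"assume not taken:      {assume_not_taken} cycles per branch")
```

With a taken probability of one half, the average delay drops from 3 cycles to 1.5 cycles per branch, which is the saving of one half quoted on the slide.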