
Dynamic Branch Prediction (Sec 4.3)



  1. Dynamic Branch Prediction (Sec 4.3) • Control dependences become a limiting factor in exploiting ILP • So far, we’ve discussed only static branch prediction schemes • Here, we use hardware to predict branch outcomes dynamically • The effectiveness of a branch prediction scheme depends on • Its prediction accuracy • Its cost when the prediction is correct and when it is incorrect
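
The dependence on both accuracy and cost can be written as an expected cost per branch. The following Python sketch assumes each outcome has a fixed cost in stall cycles; the function name and the numbers in the usage lines are hypothetical, not taken from the text.

```python
def expected_branch_cost(accuracy: float, cost_correct: float, cost_incorrect: float) -> float:
    """Average cost, in stall cycles, of one branch under a simple model.

    accuracy:       fraction of branches predicted correctly (0.0 to 1.0)
    cost_correct:   stall cycles paid when the prediction is right
    cost_incorrect: stall cycles paid when the prediction is wrong
    """
    if not 0.0 <= accuracy <= 1.0:
        raise ValueError("accuracy must lie between 0 and 1")
    return accuracy * cost_correct + (1.0 - accuracy) * cost_incorrect


# Hypothetical figures: two predictors, both free when right, 2 cycles when wrong.
for acc in (0.80, 0.90):
    print(f"accuracy {acc:.0%}: {expected_branch_cost(acc, 0, 2):.2f} stall cycles per branch")
```

Halving the misprediction rate halves the stall cycles here only because a correct prediction costs nothing; a nonzero cost_correct puts a floor under what better accuracy can buy.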

  2. Branch Prediction Buffer • In its simplest form, the buffer is a small memory in which each entry holds one bit, the prediction bit, recording whether the branch was recently taken • The memory is indexed by the low-order bits of the branch instruction’s address • Fetching begins in the predicted direction • If the prediction is wrong, the prediction bit is inverted • The simple one-bit scheme has performance shortcomings: even a branch that is almost always taken, such as a loop branch, is mispredicted twice per execution of the loop, on the first and last iterations (example on page 263)
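
A minimal Python sketch of the one-bit buffer, assuming a power-of-two table, 4-byte word-aligned instructions (as in DLX) and entries that start as not taken; the class and helper names are hypothetical. The usage lines feed it a loop branch to show the shortcoming.

```python
class OneBitPredictor:
    """Branch prediction buffer with a single prediction bit per entry.

    Assumptions: a power-of-two table size, 4-byte word-aligned
    instructions, and every entry initialized to 'not taken'.
    """

    def __init__(self, entries: int = 1024):
        if entries <= 0 or entries & (entries - 1):
            raise ValueError("entries must be a power of two")
        self.mask = entries - 1
        self.bits = [False] * entries  # False = not taken, True = taken

    def _index(self, pc: int) -> int:
        # Skip the 2 byte-offset bits, then keep the low-order address bits.
        return (pc >> 2) & self.mask

    def predict(self, pc: int) -> bool:
        return self.bits[self._index(pc)]

    def update(self, pc: int, taken: bool) -> None:
        # Storing the actual outcome inverts the bit exactly when the
        # prediction was wrong and leaves it unchanged otherwise.
        self.bits[self._index(pc)] = taken


def count_mispredictions(predictor, pc: int, outcomes) -> int:
    """Feed one branch's outcomes to the predictor and count the misses."""
    misses = 0
    for taken in outcomes:
        if predictor.predict(pc) != taken:
            misses += 1
        predictor.update(pc, taken)
    return misses


# A loop branch taken 9 times and then not taken once, for two runs of the loop.
loop = [True] * 9 + [False]
predictor = OneBitPredictor()
print(count_mispredictions(predictor, 0x400, loop))  # first run
print(count_mispredictions(predictor, 0x400, loop))  # second run
```

Each run of the loop costs two mispredictions, so accuracy is 80% although the branch is taken 90% of the time. Because only low-order address bits form the index, two different branches can also share an entry.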

  3. Branch Prediction Buffer (Cont’d) • In a two-bit prediction scheme, a prediction must be wrong twice in a row before it is changed (Fig. 4.13) • More generally, an n-bit predictor uses an n-bit saturating counter per entry: the branch is predicted taken when the counter is at least half its maximum value of 2^n - 1, and not taken otherwise • The branch prediction buffer is accessed during the IF stage • If the instruction is decoded as a branch, the next fetch follows the prediction • See Figure 4.14 for the prediction accuracy • Prediction accuracy matters more in programs with a high branch frequency • We may improve prediction accuracy by also looking at the recent behavior of other branches
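
The counter scheme can be sketched the same way. This Python version is an illustration under assumed parameters (power-of-two table, 4-byte instructions, counters starting at 0), and the names are hypothetical. Setting n_bits to 1 gives the one-bit scheme, which makes the comparison easy.

```python
class CounterPredictor:
    """Branch prediction buffer of n-bit saturating counters.

    A branch is predicted taken when its counter is at least 2**(n-1),
    i.e. at least half of the maximum value 2**n - 1. With n = 2 this is
    the two-bit scheme; with n = 1 it reduces to the one-bit scheme.
    Assumed: power-of-two table, 4-byte instructions, counters start at 0.
    """

    def __init__(self, n_bits: int = 2, entries: int = 1024):
        if n_bits < 1:
            raise ValueError("n_bits must be at least 1")
        if entries <= 0 or entries & (entries - 1):
            raise ValueError("entries must be a power of two")
        self.max = (1 << n_bits) - 1
        self.threshold = 1 << (n_bits - 1)
        self.mask = entries - 1
        self.counters = [0] * entries

    def _index(self, pc: int) -> int:
        return (pc >> 2) & self.mask

    def predict(self, pc: int) -> bool:
        return self.counters[self._index(pc)] >= self.threshold

    def update(self, pc: int, taken: bool) -> None:
        i = self._index(pc)
        if taken:
            self.counters[i] = min(self.counters[i] + 1, self.max)  # saturate at max
        else:
            self.counters[i] = max(self.counters[i] - 1, 0)         # saturate at 0


def count_mispredictions(predictor, pc: int, outcomes) -> int:
    misses = 0
    for taken in outcomes:
        if predictor.predict(pc) != taken:
            misses += 1
        predictor.update(pc, taken)
    return misses


loop = [True] * 9 + [False]
for n in (1, 2):
    p = CounterPredictor(n_bits=n)
    count_mispredictions(p, 0x400, loop)  # warm-up run of the loop
    print(n, "bit(s):", count_mispredictions(p, 0x400, loop), "misses per later run")
```

Once warmed up, the two-bit counter mispredicts the loop branch only on the exit, because a single not-taken outcome moves the counter from 3 to 2, which still predicts taken.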

  4. Branch Prediction Buffer (Cont’d) • Consider the following code fragment: if (aa == 2) aa = 0; if (bb == 2) bb = 0; if (aa != bb) { • DLX code for the above, with aa in R1 and bb in R2:

          SUBI R3, R1, #2
          BNEZ R3, L1        ; branch b1 (aa != 2)
          ADD  R1, R0, R0    ; aa = 0
      L1: SUBI R3, R2, #2
          BNEZ R3, L2        ; branch b2 (bb != 2)
          ADD  R2, R0, R0    ; bb = 0
      L2: SUB  R3, R1, R2    ; R3 = aa - bb
          BEQZ R3, L3        ; branch b3 (aa == bb)

  • The behavior of b3 is correlated with that of b1 and b2: if neither b1 nor b2 is taken, both aa and bb have been set to 0, so b3 is taken
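
The correlation can be confirmed by brute force. This Python sketch mirrors the DLX code branch by branch; the function name and the tested value range are illustrative choices, not part of the original example.

```python
def branch_outcomes(aa: int, bb: int):
    """Return the taken (True) / not-taken (False) outcomes of b1, b2, b3."""
    b1 = aa != 2       # BNEZ R3, L1: taken when aa != 2
    if not b1:
        aa = 0         # fall-through path: aa = 0
    b2 = bb != 2       # BNEZ R3, L2: taken when bb != 2
    if not b2:
        bb = 0         # fall-through path: bb = 0
    b3 = aa == bb      # BEQZ R3, L3: taken when aa - bb == 0
    return b1, b2, b3


# Exhaustive check over a small, arbitrarily chosen range of inputs.
for aa in range(-3, 4):
    for bb in range(-3, 4):
        b1, b2, b3 = branch_outcomes(aa, bb)
        if not b1 and not b2:
            assert b3, "b1 and b2 not taken should force b3 taken"
print("whenever b1 and b2 are not taken, b3 is taken")
```

The correlation runs one way only: with aa = bb = 5, all three branches are taken, so b3 being taken says nothing certain about b1 and b2.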

  5. Correlating Branch Predictors • Consider the code: if (d == 0) d = 1; if (d == 1) • The generated instruction sequence, with d in R1:

          BNEZ R1, L1        ; branch b1 (d != 0)
          ADDI R1, R0, #1    ; d == 0, so d = 1
      L1: SUBI R3, R1, #1
          BNEZ R3, L2        ; branch b2 (d != 1)
      L2:

  • If b1 is not taken, d was 0 and has just been set to 1, so b2 is not taken either • See Figures 4.16, 4.17, 4.18 and 4.19
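
A small simulation makes the difference concrete. This Python sketch compares one prediction bit per branch with a predictor that keeps two bits per branch and picks one according to whether the last branch executed was taken (a (1,1) predictor in the terms of the next slide). Assumed: all bits start as not taken, the branch before the trace counts as not taken, and d alternates 2, 0, 2, 0, a sequence chosen for illustration. The function names are hypothetical.

```python
def d_fragment(d: int):
    """Run the fragment once; return [(branch, taken), ...] in program order."""
    b1 = d != 0              # BNEZ R1, L1
    if not b1:
        d = 1                # ADDI R1, R0, #1
    b2 = d != 1              # BNEZ R3, L2
    return [("b1", b1), ("b2", b2)]


def run_one_bit(trace) -> int:
    """One prediction bit per branch, initialized to not taken."""
    bits = {"b1": False, "b2": False}
    misses = 0
    for branch, taken in trace:
        if bits[branch] != taken:
            misses += 1
        bits[branch] = taken
    return misses


def run_one_one(trace) -> int:
    """(1,1) predictor: two bits per branch, selected by the last outcome."""
    bits = {"b1": [False, False], "b2": [False, False]}  # [last NT, last T]
    last = False  # assume the branch before the trace was not taken
    misses = 0
    for branch, taken in trace:
        slot = int(last)
        if bits[branch][slot] != taken:
            misses += 1
        bits[branch][slot] = taken
        last = taken
    return misses


trace = [step for d in (2, 0, 2, 0) for step in d_fragment(d)]
print("one bit per branch:", run_one_bit(trace), "of", len(trace), "mispredicted")
print("(1,1) predictor:   ", run_one_one(trace), "of", len(trace), "mispredicted")
```

The per-branch predictor mispredicts every branch in this trace, while the correlating version misses only the first two, before its bits have learned the pattern.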

  6. Correlating Branch Predictors (cont’d.) • (m, n) predictor (Figure 4.20) • Uses the behavior of the last m branches (global history) to choose among 2^m branch predictors • Each of these is an n-bit predictor for a single branch • The global history can be recorded in an m-bit shift register, where each bit records whether a branch was taken • The buffer is indexed by concatenating low-order bits of the branch address with the m-bit global history (see Figure 4.20)
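
The indexing scheme can be sketched as follows. This Python class is an illustrative (m, n) predictor, not a specific machine's design: it assumes 4-byte instructions, a configurable number of low-order address bits (addr_bits, a hypothetical parameter), and counters that start at 0. The global history is an integer used as an m-bit shift register; the branch addresses in the usage lines are hypothetical.

```python
class CorrelatingPredictor:
    """(m, n) correlating predictor sketch.

    An m-bit global history register records the outcomes of the last m
    branches. The table index concatenates low-order branch-address bits
    with that history, so each address slot holds 2**m n-bit counters.
    """

    def __init__(self, m: int, n: int, addr_bits: int = 10):
        if m < 0 or n < 1:
            raise ValueError("need m >= 0 and n >= 1")
        self.m, self.n, self.addr_bits = m, n, addr_bits
        self.history = 0                       # m-bit global history
        self.max = (1 << n) - 1
        self.threshold = 1 << (n - 1)
        self.counters = [0] * (1 << (addr_bits + m))

    def _index(self, pc: int) -> int:
        addr = (pc >> 2) & ((1 << self.addr_bits) - 1)
        return (addr << self.m) | self.history  # address bits, then history bits

    def predict(self, pc: int) -> bool:
        return self.counters[self._index(pc)] >= self.threshold

    def update(self, pc: int, taken: bool) -> None:
        i = self._index(pc)
        if taken:
            self.counters[i] = min(self.counters[i] + 1, self.max)
        else:
            self.counters[i] = max(self.counters[i] - 1, 0)
        # Shift the outcome into the global history, keeping only m bits.
        self.history = ((self.history << 1) | int(taken)) & ((1 << self.m) - 1)

    def storage_bits(self) -> int:
        """Total predictor state: n bits for each of the 2**(addr_bits + m) counters."""
        return self.n * len(self.counters)


# A (1,1) predictor on two correlated branches at hypothetical addresses.
p = CorrelatingPredictor(m=1, n=1)
trace = [(0x100, True), (0x108, True), (0x100, False), (0x108, False)] * 2
misses = 0
for pc, taken in trace:
    if p.predict(pc) != taken:
        misses += 1
    p.update(pc, taken)
print("mispredictions:", misses)
print("(2,2) state with 2**10 address slots:", CorrelatingPredictor(2, 2).storage_bits(), "bits")
```

The storage grows as n × 2^m × (number of address slots), so a (2,2) predictor with 1K address slots needs 8K bits; with m = 0 the class reduces to the plain n-bit buffer.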

  7. Branch Target Buffers • A branch target buffer stores the predicted address of the next instruction after a branch • The intent is to know the branch target address by the end of the IF stage (see Fig. 4.22) • We access the buffer during the IF stage using the PC of the instruction being fetched • If we get a hit, we fetch the next instruction from the predicted PC • If there is no match, fetching proceeds normally • A branch prediction field can also be added to each entry to predict the branch direction • See Figs. 4.23 and 4.24, and work through the example on page 274
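
A minimal Python sketch of the lookup and update, assuming 4-byte instructions, unlimited capacity (a dict rather than a fixed-size buffer), and a policy that keeps only branches last seen taken. The class name and the example addresses are hypothetical.

```python
class BranchTargetBuffer:
    """Branch target buffer: maps the PC of a branch to its predicted target.

    Assumptions for this sketch: 4-byte instructions, unlimited capacity,
    and only branches last seen taken are kept in the buffer.
    """

    def __init__(self):
        self.entries = {}  # branch PC -> predicted target PC

    def next_fetch_pc(self, pc: int) -> int:
        # IF stage: a hit supplies the predicted PC; on a miss, fetch
        # proceeds normally with the sequential PC.
        return self.entries.get(pc, pc + 4)

    def update(self, pc: int, taken: bool, target: int) -> None:
        # After the branch resolves: enter taken branches, drop others.
        if taken:
            self.entries[pc] = target
        else:
            self.entries.pop(pc, None)


# A hypothetical branch at 0x200 whose target is 0x100.
btb = BranchTargetBuffer()
pc, target = 0x200, 0x100
for taken in (True, True, False):
    guess = btb.next_fetch_pc(pc)        # made during IF
    actual = target if taken else pc + 4
    btb.update(pc, taken, target)        # once the outcome is known
    print(hex(guess), "correct" if guess == actual else "mispredicted")
```

The three iterations exercise a miss on a taken branch, a correct hit, and a hit on a branch that falls through; these are the cases whose penalties differ.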

  8. Multiple-Issue Processors • So far, we have tried to achieve the ideal CPI of 1 • How can we improve performance further, to achieve CPI < 1? • Multiple-issue processors issue more than one instruction per clock • Superscalar processors: • Issue varying numbers of instructions per clock • Could be statically scheduled (Sun UltraSPARC II/III) • Or dynamically scheduled (Pentium III/4, MIPS R10K) • VLIW (Very Long Instruction Word) processors: • Issue a fixed number of instructions per clock • Statically scheduled by the compiler (TriMedia, i860, Itanium)

  9. Superscalar Processors • A superscalar processor has dynamic issue capability • The hardware may issue from one to eight instructions in a clock cycle • Usually the instructions issued together must be independent and satisfy certain constraints, such as a limit on the number of memory references per clock • If an instruction in the issue packet has a dependence or a structural hazard, only the instructions preceding it are issued
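
The in-order issue rule can be sketched in Python. The rules here are simplified assumptions: only dependences on earlier instructions in the same packet are checked, at most one memory reference issues per cycle, and the issue width of 4 is an arbitrary choice within the one-to-eight range; all names are hypothetical. Counting issue cycles also shows how a multiple-issue machine can reach a CPI below 1.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Instr:
    text: str                     # assembly text, for display only
    dest: Optional[str]           # register written, or None
    srcs: Tuple[str, ...] = ()    # registers read
    is_mem: bool = False          # True for loads and stores


def issue_count(window: List[Instr], width: int = 4, max_mem: int = 1) -> int:
    """Number of instructions, from the front of `window`, issued this cycle.

    Issue stops at the first instruction that reads a register written by
    an earlier instruction in the same packet (a dependence) or that would
    exceed `max_mem` memory references this cycle (a structural hazard).
    """
    if width < 1 or max_mem < 1:
        raise ValueError("width and max_mem must be at least 1")
    written = set()
    mem_ops = issued = 0
    for instr in window[:width]:
        if any(src in written for src in instr.srcs):
            break
        if instr.is_mem and mem_ops == max_mem:
            break
        if instr.dest is not None:
            written.add(instr.dest)
        if instr.is_mem:
            mem_ops += 1
        issued += 1
    return issued


def issue_cycles(program: List[Instr], width: int = 4, max_mem: int = 1) -> int:
    """Cycles needed to issue the whole program, one packet per cycle."""
    cycles = pos = 0
    while pos < len(program):
        pos += issue_count(program[pos:], width, max_mem)
        cycles += 1
    return cycles


program = [
    Instr("LD   F0, 0(R1)", "F0", ("R1",), is_mem=True),
    Instr("ADDD F4, F0, F2", "F4", ("F0", "F2")),
    Instr("SD   0(R1), F4", None, ("R1", "F4"), is_mem=True),
    Instr("SUBI R1, R1, #8", "R1", ("R1",)),
]
cycles = issue_cycles(program)
print(f"{len(program)} instructions in {cycles} cycles, CPI = {cycles / len(program):.2f}")
```

For this dependent sequence the sketch needs three cycles for four instructions (CPI 0.75); four independent ALU instructions would issue together in one cycle, the ideal CPI of 0.25 for a 4-issue machine.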
