1 / 25

Constructive Computer Architecture Tutorial 5 Epoch & Branch Predictor

Constructive Computer Architecture Tutorial 5 Epoch & Branch Predictor. Sizhuo Zhang 6.175 TA. 1-bit Distributed Epochs. Delay: 0 cycle. eEp =0. feEp =0. Delay: 100 cycles. dEp =0. fdEp =0. ieEp =0. I1. d2e. Execute. f2d. PC. Decode. Fetch. Decode redirects I1 ( ieEp = idEp =0).

fcody
Télécharger la présentation

Constructive Computer Architecture Tutorial 5 Epoch & Branch Predictor

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Constructive Computer ArchitectureTutorial 5Epoch & Branch Predictor Sizhuo Zhang 6.175 TA http://csg.csail.mit.edu/6.175

  2. 1-bit Distributed Epochs Delay: 0 cycle eEp=0 feEp=0 Delay: 100 cycles dEp=0 fdEp=0 ieEp=0 I1 d2e Execute ... f2d PC Decode Fetch http://csg.csail.mit.edu/6.175 • Decode redirects I1 (ieEp=idEp=0)

  3. 1-bit Distributed Epochs Delay: 0 cycle eEp=0 feEp=0 Delay: 100 cycles dEp=1 fdEp=0 I1 redirect, ieEp=0 deEp=0 I1 d2e Execute ... f2d PC Decode Fetch • Decode redirects I1 (ieEp=idEp=0) http://csg.csail.mit.edu/6.175

  4. 1-bit Distributed Epochs Delay: 0 cycle eEp=0 feEp=0 Delay: 100 cycles dEp=1 fdEp=0 I1 redirect, ieEp=0 deEp=0 I1 d2e Execute ... f2d PC Decode Fetch Decode redirects I1 (ieEp=idEp=0) Execute redirects I1 http://csg.csail.mit.edu/6.175

  5. 1-bit Distributed Epochs Delay: 0 cycle eEp=1 feEp=1 Delay: 100 cycles dEp=1 fdEp=0 I1 redirect, ieEp=0 deEp=0 I1 d2e Execute ... f2d PC Decode Fetch • Decode redirects I1 (ieEp=idEp=0) • Execute redirects I1 http://csg.csail.mit.edu/6.175

  6. 1-bit Distributed Epochs Delay: 0 cycle eEp=1 feEp=1 Delay: 100 cycles dEp=1 fdEp=0 I1 redirect, ieEp=0 deEp=0 I2 d2e Execute ... f2d PC Decode Fetch Decode redirects I1 (ieEp=idEp=0) Execute redirects I1 Correct-path I2 (ieEp=1,idEp=0) issues http://csg.csail.mit.edu/6.175

  7. 1-bit Distributed Epochs Delay: 0 cycle eEp=1 feEp=1 Delay: 100 cycles dEp=0 fdEp=0 I1 redirect, ieEp=0 deEp=1 I2 d2e Execute ... f2d PC Decode Fetch Decode redirects I1 (ieEp=idEp=0) Execute redirects I1 Correct-path I2 (ieEp=1,idEp=0) issues http://csg.csail.mit.edu/6.175

  8. 1-bit Distributed Epochs Delay: 0 cycle eEp=1 feEp=1 Delay: 100 cycles dEp=0 fdEp=0 I1 redirect, ieEp=0 deEp=1 I2 d2e Execute ... f2d PC Decode Fetch Decode redirects I1 (ieEp=idEp=0) Execute redirects I1 Correct-path I2 (ieEp=1,idEp=0) issues Execute redirects I2 http://csg.csail.mit.edu/6.175

  9. 1-bit Distributed Epochs Delay: 0 cycle eEp=0 feEp=0 Delay: 100 cycles dEp=0 fdEp=0 I1 redirect, ieEp=0 deEp=1 I2 d2e Execute ... f2d PC Decode Fetch Decode redirects I1 (ieEp=idEp=0) Execute redirects I1 Correct-path I2 (ieEp=1,idEp=0) issues Execute redirects I2 http://csg.csail.mit.edu/6.175

  10. 1-bit Distributed Epochs Delay: 0 cycle eEp=0 feEp=0 Delay: 100 cycles dEp=0 fdEp=0 I1 redirect, ieEp=0 deEp=1 I2 d2e Execute ... f2d PC Decode Fetch • Decode redirects I1 (ieEp=idEp=0) • Execute redirects I1 • Correct-path I2 (ieEp=1,idEp=0) issues • Execute redirects I2 • I1 redirect arrives at Fetch (ieEp == feEp) • change PC to a wrong value http://csg.csail.mit.edu/6.175

  11. Unbounded Global Epochs redirect PC eEp Redirect dEp redirect PC miss pred? miss pred? PC d2e f2d ... Decode Fetch Execute http://csg.csail.mit.edu/6.175 • Both Decode and Execute can redirect the PC • Execute redirect should never be overruled • Global epoch for each redirecting stage • eEpoch: incremented when redirect from Execute takes effect • dEpoch: incremented when redirect from Decode takes effect • Initially set all epochs to 0

  12. Execute stage {pc, newpc} eEp Redirect dEp {pc, newpc} miss pred? miss pred? {…, ieEp, idEp} {pc, ppc, ieEp} ... d2e PC f2d Fetch Execute Decode Is ieEp == eEp? yes no Current instruction is OK; check the ppcprediction by execution, signal a redirect message (by writing EHR) on misprediction Wrong path instruction; poison it http://csg.csail.mit.edu/6.175

  13. Decode Stage {pc, newpc} eEp Redirect dEp {pc, newpc} miss pred? miss pred? {pc, ppc, ieEp, idEp} {..., ieEp} d2e f2d PC Fetch Decode Execute ... Is ieEp == eEp&& idEp == dEp? yes no Current instruction is OK; check the ppc prediction via BHT, signal a redirect message (by writing EHR) on misprediction Wrong path instruction; drop it http://csg.csail.mit.edu/6.175

  14. Fetch stage: Redirect PC, Change Epoch eEp Redirect dEp {pc, newpc} miss pred? miss pred? ... d2e f2d PC Execute Decode Fetch http://csg.csail.mit.edu/6.175 • If there is redirect message from Execute • Set PC to correct value, increment eEp • Else if there is redirect message from Decode • Set PC to correct value, increment dEp • We can always request IMem to fetch new instruction • New instruction will be tagged with current eEpoch and dEpoch • PC redirection overrides next-PC prediction • When PC redirects, new instruction must be killed at Decode stage

  15. Observation eEpoch ... ... ... Execute Fetch Decode dEpoch • At cycle , instructions between Fetch and Execute: • at Fetch, at Execute, at Decode • Invariant: • Proved by induction on • At cycle , both Decode and Execute redirect • , invariants still hold at cycle http://csg.csail.mit.edu/6.175

  16. 1-bit Global Epochs • The max difference of values of one type of epoch is 1 • Only need 1 bit to encode each epoch • Increment epoch  flip epoch http://csg.csail.mit.edu/6.175

  17. 4-stage Pipeline: Code Sketch modulemkProc(Proc); // stage 1: Fetch // stage 2: Decode & read register // stage 3: Execute & access memory // stage 4: Commit (write register file) Ehr#(2, Addr) pc <- mkEhr(?); IMemoryiMem <- mkIMemory; ... Fifo#(2, Fetch2Decode) f2d <- mkCFFifo; Fifo#(2, Decode2Execute) d2e <- mkCFFifo; Fifo#(2, Execute2Commit) e2c <- mkCFFifo; Reg#(Bool) eEpoch<- mkReg(False); Reg#(Bool) dEpoch<- mkReg(False); Ehr#(2, Maybe#(ExeRedirect)) exeRedirect<- mkEhr(Invalid); Ehr#(2, Maybe#(DecRedirect)) decRedirect<- mkEhr(Invalid); BTB#(16) btb <- mkBTB; BHT#(1024) bht <- mkBHT; http://csg.csail.mit.edu/6.175

  18. Execute Stage BHT training will be in lab http://csg.csail.mit.edu/6.175 ruledoExecute; d2e.deq; let x = d2e.first; if(x.ieEp== eEpoch) begin leteInst = exec(...);if(eInst.mispredict) exeRedirect[0] <= Valid ({pc: x.pc, newpc: eInst.addr}); ... end else e2c.enq(Exec2Commit{poisoned: True, ...}); // wrong path instruction, poison itendrule

  19. Decode Stage http://csg.csail.mit.edu/6.175 ruledoDecode; f2d.deq; let x = f2d.first; if(x.ieEp == eEpoch && x.idEp == dEpoch) begin let stall = ...; // check scoreboard for stall if(!stall) begin if(x.iType== Br) begin letbht_ppc= ...; // BHT prediction if(bht_ppc!= x.ppc) decRedirect[0] <= Valid ({pc: x.pc, newpc: bht_ppc}); end ... d2e.enq(Decode2Execute{ieEp: x.ieEp, ...}); end end // else: wrong path instruction, killed endrule

  20. Fetch Stage http://csg.csail.mit.edu/6.175 ruledoFetch; letinst = iMem.req(pc[0]); letppc = btb.predPc(pc[0]); pc[0] <= ppc; f2d.enq(Fetch2Decode{pc: pc[0], ppc: ppc, inst: inst, ieEp: eEpoch, idEp: dEpoch}); endrule rule cononicalizeRedirect; if(exeRedirect[1] matchestagged Valid .r) begin pc[1] <= r.newpc; btb.update(r.pc, r.newpc); eEpoch <= !eEpoch; end else if(decRedirect[1] matchestagged Valid .r) begin pc[1] <= r.newpc; btb.update(r.pc, r.newpc); dEpoch<= !dEpoch; end exeRedirect[1] <= Invalid; decRedirect[1] <= Invalid; endrule

  21. Advanced Branch Predictor • BHT cannot predict very accurately • Need more sophisticated schemes • Global branch history • taken/not token for previous branches • Local branch history • taken/not token for previous occurrences of the same branch • Tournament branch predictor • Use both global and local history • Alpha 21264 http://csg.csail.mit.edu/6.175

  22. Tournament Predictor • “The Alpha 21264 Microprocessor Architecture” • 10-bit PC: index 1024 x 10-bit local history table • 10-bit local history • Index 1024 x 3-bit BHT: prediction 1 • 12-bit global history • Index 4096 x 2-bit BHT: prediction 2 • Index 4096 x 2-bit BHT: select between predictions 1, 2 http://csg.csail.mit.edu/6.175

  23. TAGE Branch Predictors “A case for partially tagged geometric history length branch prediction” TAGE (Tagged Geometric history length) http://csg.csail.mit.edu/6.175

  24. Other Predictors • Perceptron-based predictors • Classification problem in machine learning • Championship Branch Predictor • Papers • Slides • simulation codes http://csg.csail.mit.edu/6.175

  25. Return Address Stack • Use a stack to store return address from function call • Push when function is called • Pop when returning from a function • Instruction to return from function call • JALR: rd = x0, rs1 = x1 (ra) • Instruction to initiate function call • JAL: rd = x1 • JALR: rd = x1 http://csg.csail.mit.edu/6.175

More Related