CSE 420/598 Computer Architecture Lec 10 – Chapter 2 - DynPred-BTB

CSE 420/598 Computer Architecture Lec 10 – Chapter 2 - DynPred-BTB. Sandeep K. S. Gupta School of Computing and Informatics Arizona State University. Based on Slides by David Patterson. Agenda. Dynamic Branch Prediction (Review) BTB. Applying the Prediction.

Presentation Transcript


  1. CSE 420/598 Computer Architecture Lec 10 – Chapter 2 - DynPred-BTB Sandeep K. S. Gupta School of Computing and Informatics Arizona State University Based on Slides by David Patterson

  2. Agenda • Dynamic Branch Prediction (Review) • BTB CSE420/598

  3. Applying the Prediction • The earliest time we can begin using the prediction is when • the prediction bits are available • the branch target is available • The earliest time we can know whether we have predicted correctly is when • the branch condition is resolved • The difference between these times is roughly what is saved by a correct prediction • If the branch target is available late, the window of savings is reduced

  4. Correlating Predictors • The prediction is a function of the last k branch outcomes • The branch history buffer is indexed by • m bits taken from the address of the branch • k bits of branch history • i.e., m + k bits all told • Each entry in the branch history buffer has q bits (i.e., is a q-bit predictor) • The branch history buffer has 2^(m+k) × q bits of storage
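The indexing scheme on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the slide author's implementation: the class name, the choice to drop the two low PC bits (word-aligned instructions assumed), and the initial counter value are all assumptions made for the example.

```python
class CorrelatingPredictor:
    """Table of 2**(m + k) saturating counters, q bits each (hypothetical sketch)."""

    def __init__(self, m, k, q=2):
        self.m, self.k, self.q = m, k, q
        self.max_count = (1 << q) - 1
        self.history = 0  # last k branch outcomes, newest in bit 0
        # Assumption: every counter starts just below the "taken" threshold.
        self.table = [(1 << (q - 1)) - 1] * (1 << (m + k))

    def storage_bits(self):
        # 2^(m+k) entries of q bits each
        return (1 << (self.m + self.k)) * self.q

    def _index(self, pc):
        # Assumption: word-aligned instructions, so the 2 low PC bits are dropped.
        low_pc = (pc >> 2) & ((1 << self.m) - 1)
        return (low_pc << self.k) | self.history

    def predict(self, pc):
        # Predict taken when the counter is in its upper half.
        return self.table[self._index(pc)] >= (1 << (self.q - 1))

    def update(self, pc, taken):
        i = self._index(pc)
        if taken:
            self.table[i] = min(self.table[i] + 1, self.max_count)
        else:
            self.table[i] = max(self.table[i] - 1, 0)
        # Shift the outcome into the k-bit history register.
        self.history = ((self.history << 1) | int(taken)) & ((1 << self.k) - 1)
```

With m = 4, k = 2, q = 2 the table holds 2^6 × 2 = 128 bits. Because the history bits take part in the index, a branch that strictly alternates taken / not taken becomes predictable after a short warm-up, which a plain 2-bit counter per branch cannot achieve.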

  5. Correlating predictor with 2 history bits and 2 state bits (2,2)

  6. Local versus Global

  7. Hashing Correlation For the same amount of table storage, we can get better associativity in the case of fewer branches but highly correlated behavior.

  8. Tournament Predictor • Move “toward” the other predictor when I am wrong and he is right • Stay put when I am right and he is right, or I am wrong and he is wrong.
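The selector rule above can be written as a small saturating-counter update. This is a sketch under assumptions: the function names, the 2-bit counter width, and the convention that high counter values favor predictor 2 are illustrative choices, not taken from the slides.

```python
def update_selector(counter, p1_correct, p2_correct, max_count=3):
    """Move toward whichever predictor was right; stay put on a tie.

    Assumed convention: low values favor predictor 1, high values predictor 2.
    """
    if p1_correct and not p2_correct:
        return max(counter - 1, 0)
    if p2_correct and not p1_correct:
        return min(counter + 1, max_count)
    return counter  # both right or both wrong: no change


def choose(counter, pred1, pred2, max_count=3):
    """Return the prediction of the currently favored predictor."""
    return pred2 if counter > max_count // 2 else pred1
```

For example, a selector at 1 (weakly favoring predictor 1) moves to 2 after a single case where predictor 1 is wrong and predictor 2 is right, and from then on `choose` returns predictor 2's prediction.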

  9. Tournament predictor local vs global

  10. Alpha 21264 Branch Predictor • Tournament predictor (4K x 2) chooses between global and local • Global has 4K 2-bit entries indexed by last 12 branch outcomes XORed with address • Local is also a two-level predictor • 1K x 10 branch history buffer (last 10 outcomes for indexed branch) indexed by address • The selected 10-bit history is XORed with address to index a table of 3-bit entries
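As a rough check on the sizes listed above, the storage can be totaled as follows. One figure is an assumption rather than something the slide states: the table of 3-bit entries is taken to have 1K entries, since it is indexed with 10 bits.

```python
K = 1024

selector_bits = 4 * K * 2        # tournament chooser: 4K x 2 bits
global_bits = 4 * K * 2          # global predictor: 4K 2-bit entries
local_history_bits = 1 * K * 10  # local history buffer: 1K x 10 bits
# Assumption: a 10-bit index selects among 2**10 = 1K three-bit counters.
local_counter_bits = (1 << 10) * 3

total_bits = (selector_bits + global_bits
              + local_history_bits + local_counter_bits)
print(total_bits, total_bits // K)  # 29696 29
```

Under that assumption the predictor needs about 29K bits of state.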

  11. Alpha 21264 Predictor

  12. Branch Target Buffers (BTB) or Caches (BTC) Branch target calculation is costly and stalls instruction fetch. To reduce the branch penalty, we need to know the target address by the end of IF. But the instruction isn’t even decoded yet, so we would have to wait a cycle to find out whether it is a branch (penalty = 1 for MIPS). Instead, use the branch instruction’s address to predict the branch target. If the prediction works, the penalty goes to 0!

  13. BTB - Idea • BTB stores PCs the same way as caches • Only PCs of predicted taken branches are stored (no need to store untaken) • The match tag is the PC (associative memory OK if it’s small) • The data field is the predicted PC • The PC of a (potential) branch is sent to the BTB • When a match is found, the corresponding predicted PC is returned • If the PC is not in the table, it is taken to mean • either not a branch • or not predicted taken • in either case, continue fetching from PC + k (k = 4 for MIPS) • If the branch was predicted taken, instruction fetch continues at the returned predicted PC • BTB gets us the branch target address early
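The lookup described on this slide can be sketched as below. This is a simplified model under stated assumptions: a Python dict stands in for a small fully associative memory with no capacity limit or replacement policy, and the class and method names are hypothetical.

```python
class BranchTargetBuffer:
    """Maps the PC of a predicted-taken branch to its predicted target PC."""

    def __init__(self, instr_bytes=4):
        self.entries = {}               # tag (branch PC) -> predicted target PC
        self.instr_bytes = instr_bytes  # k = 4 for MIPS

    def next_fetch_pc(self, pc):
        # Hit: fetch from the predicted target.
        # Miss: not a branch, or not predicted taken, so fall through to PC + k.
        return self.entries.get(pc, pc + self.instr_bytes)

    def resolve(self, pc, taken, target):
        # Only predicted-taken branches are kept in the table.
        if taken:
            self.entries[pc] = target
        else:
            self.entries.pop(pc, None)


btb = BranchTargetBuffer()
print(hex(btb.next_fetch_pc(0x100)))  # miss: 0x104
btb.resolve(0x100, taken=True, target=0x200)
print(hex(btb.next_fetch_pc(0x100)))  # hit: 0x200
```

Removing an entry as soon as the branch is not taken amounts to a 1-bit prediction policy; it is used here only to keep the sketch short.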

  14. Branch Target Buffers

  15. Changes in MIPS to incorporate BTB

  16. Penalties Using BTB in MIPS • Note • Misprediction penalties in more complex machines are much higher

  17. Questions Concerning BTBs • Can BTB be combined with branch prediction machinery introduced earlier in this lecture? How? • What kind of branches can a BTB accelerate that are out of the reach of ordinary branch predictors?

  18. BTB coupled with BHT

  19. Improvements • Store instructions rather than target addresses • increases entry size but removes Ifetch time • permits BTB to run slower and therefore be larger • permits branch folding - branches effectively disappear • the branch’s job is to change the PC and get the real instruction • if you have the instruction then the branch isn’t there (folded out of the way) • result is 0-cycle jumps and effectively 0-cycle properly predicted branches • however - branches must be checked • in a parallel path the branch must be fetched and checked to see if the prediction is true • Predicting indirect jumps • major source is procedure return • obvious model is to use a stack as the return predictor • note this can be combined with the above to get jump folding
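The stack-based return predictor mentioned above might look like the following sketch. The names, the fixed depth, the overflow policy (drop the oldest entry), and the return address of call PC + 4 (no branch delay slot) are assumptions made for illustration.

```python
class ReturnAddressStack:
    """Predicts procedure-return targets with a small bounded stack."""

    def __init__(self, depth=8):
        self.depth = depth
        self.stack = []

    def on_call(self, call_pc, instr_bytes=4):
        # Assumption: on overflow, the oldest return address is discarded.
        if len(self.stack) == self.depth:
            self.stack.pop(0)
        # Assumption: the return address is the instruction after the call.
        self.stack.append(call_pc + instr_bytes)

    def predict_return(self):
        # An empty stack yields no prediction.
        return self.stack.pop() if self.stack else None
```

Returns are hard for a BTB alone because the same return instruction jumps to a different target for each call site; the stack tracks the call nesting instead of the last target seen.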

  20. Dynamic Branch Prediction Summary • Prediction is becoming an important part of execution • Branch History Table: 2 bits for loop accuracy • Correlation: Recently executed branches correlated with next branch • Either different branches (GA) • Or different executions of same branches (PA) • Tournament predictors take insight to the next level, by using multiple predictors • usually one based on global information and one based on local information, and combining them with a selector • In 2006, tournament predictors using ≈30K bits are in processors like the Power5 and Pentium 4 • Branch Target Buffer: include branch address & prediction • Next Class: Dynamic Scheduling
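The "2 bits for loop accuracy" point can be illustrated with a short simulation. The loop shape (a branch taken 9 times, then not taken once, repeated 100 times) and the initial counter state are made up for the example.

```python
def mispredictions(bits, outcomes):
    """Count mispredictions of one saturating counter of the given width."""
    max_count = (1 << bits) - 1
    threshold = 1 << (bits - 1)
    counter = max_count  # assumption: start in the strongest "taken" state
    wrong = 0
    for taken in outcomes:
        if (counter >= threshold) != taken:
            wrong += 1
        counter = min(counter + 1, max_count) if taken else max(counter - 1, 0)
    return wrong


# Loop branch: taken 9 times, then falls through once; executed 100 times.
loop = ([True] * 9 + [False]) * 100
print(mispredictions(1, loop))  # 199
print(mispredictions(2, loop))  # 100
```

A 1-bit predictor misses twice per loop execution (at the exit and again on re-entry), while the 2-bit counter misses only at the exit.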
