
Dynamic Branch Prediction


Presentation Transcript


  1. Dynamic Branch Prediction Ali Azarpeyvand

  2. Tomasulo Review • Reservation stations: renaming to a larger set of registers + buffering of source operands • Prevents registers from becoming a bottleneck • Avoids the WAR and WAW hazards of the scoreboard • Allows loop unrolling in hardware • Not limited to basic blocks (the integer units get ahead, beyond branches) • Lasting contributions: dynamic scheduling, register renaming, load/store disambiguation • Descendants of the 360/91 include the Pentium II, PowerPC 604, MIPS R10000, HP-PA 8000, and Alpha 21264

  3. Outline • Dynamic Branch Prediction • Branch prediction buffer or branch history table • Correlating branch predictors • Tournament predictors • Branch target buffers • Integrated Instruction fetch unit • Return address predictors

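  4. Dynamic Branch Prediction • Performance = ƒ(accuracy, cost of misprediction) • The Branch History Table (branch-prediction buffer) is the simplest scheme • The low-order bits of the PC index a table of 1-bit values • Each value says whether the branch was taken the last time it executed • There is no address check, so different branches can share an entry • Problem: in a loop, a 1-bit BHT causes two mispredictions per loop execution: once at loop exit, and again on the first iteration the next time the loop runs, because the entry still says "not taken" (example: a branch taken 9 times and then not taken once is predicted correctly only 8 of 10 times, i.e. 80%) • Solution → 2-bit prediction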
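The loop example can be checked with a short simulation. This is a minimal Python sketch, not taken from the slides: it models a single 1-bit BHT entry (assumed to start at "not taken") for a loop branch that is taken 9 times and then falls through, with the whole loop executed 5 times. `one_bit_accuracy` is a hypothetical helper name.

```python
def one_bit_accuracy(outcomes, initial=False):
    """Simulate one 1-bit BHT entry and return the fraction of correct predictions."""
    state = initial              # True = predict taken, False = predict not taken
    correct = 0
    for taken in outcomes:
        if state == taken:
            correct += 1
        state = taken            # a 1-bit entry remembers only the last outcome
    return correct / len(outcomes)

# One loop execution: the branch is taken 9 times, then not taken at exit.
loop_run = [True] * 9 + [False]
outcomes = loop_run * 5          # the loop itself is executed 5 times
print(one_bit_accuracy(outcomes))  # 0.8
```

Each loop execution costs one miss at exit and one on the next entry, so 8 of every 10 predictions are correct.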

  5. Dynamic Branch Prediction • Solution: a 2-bit scheme that changes the prediction only after it mispredicts twice • In the state diagram: dark states mean stop (predict not taken); light states mean go (predict taken)
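One common realization of the 2-bit scheme is a saturating counter (the slide's state diagram is not reproduced here, so this choice is an assumption). A minimal Python sketch, starting the counter in the strongly-taken state and replaying the same 9-taken, 1-not-taken loop branch 5 times; the class and function names are illustrative:

```python
class TwoBitCounter:
    """2-bit saturating counter: states 0, 1 predict not taken; 2, 3 predict taken."""

    def __init__(self, state=0):
        self.state = state

    def predict(self):
        return self.state >= 2

    def update(self, taken):
        # Move one step toward the actual outcome, saturating at 0 and 3.
        if taken:
            self.state = min(self.state + 1, 3)
        else:
            self.state = max(self.state - 1, 0)


def two_bit_accuracy(outcomes, state=0):
    counter = TwoBitCounter(state)
    correct = 0
    for taken in outcomes:
        correct += (counter.predict() == taken)
        counter.update(taken)
    return correct / len(outcomes)


loop_run = [True] * 9 + [False]
print(two_bit_accuracy(loop_run * 5, state=3))  # 0.9
```

A single not-taken outcome at loop exit only moves the counter from strongly to weakly taken, so the first iteration of the next loop execution is still predicted taken and only one misprediction per loop execution remains.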

  6. BHT Accuracy • Mispredictions occur because either: • the guess for that branch is wrong, or • the entry holds the history of a different branch that maps to the same table index • With a 4096-entry table, misprediction rates range from 1% (nasa7, tomcatv) to 18% (eqntott), with spice at 9% and gcc at 12% • 4096 entries are about as good as an infinite table (in Alpha 21164) • Branch penalty and branch frequency also matter
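The "history of the wrong branch" case follows from indexing by low-order PC bits alone, with no address check. A minimal Python sketch, assuming 4-byte instructions (so the two lowest PC bits are skipped) and a 4096-entry table; `bht_index` and the addresses are hypothetical:

```python
TABLE_ENTRIES = 4096                      # must be a power of two


def bht_index(pc, entries=TABLE_ENTRIES):
    """Index a BHT with low-order PC bits, skipping the 2 byte-offset bits
    of 4-byte instructions (an assumption for a MIPS-like ISA)."""
    return (pc >> 2) & (entries - 1)


branch_a = 0x0040_1000
branch_b = branch_a + 4 * TABLE_ENTRIES   # 16 KiB further on in the code
print(bht_index(branch_a), bht_index(branch_b))  # 1024 1024: the two branches alias
```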

  7. BHT Accuracy: 4096 entries, 2-bit prediction

  8. Unlimited Entries

  9. Correlating Branches • Hypothesis: recent branches are correlated; that is, the behavior of recently executed branches affects the prediction of the current branch • Idea: record whether each of the m most recently executed branches was taken or not taken, and use that pattern to select the proper branch history table • In general, an (m,n) predictor records the last m branches to select among 2^m history tables, each with n-bit counters • The old 2-bit BHT is then a (0,2) predictor
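A minimal Python sketch of a general (m,n) predictor as described above. The class name, the PC indexing (word-aligned PC modulo the table size), and the initial counter value of 0 are illustrative assumptions, not details from the slides:

```python
class CorrelatingPredictor:
    """(m, n) predictor sketch: the last m global outcomes select one of 2**m
    tables of n-bit saturating counters; low PC bits select the entry."""

    def __init__(self, m, n, entries):
        self.m, self.n, self.entries = m, n, entries
        self.history = 0                       # last m outcomes, 1 = taken
        self.max_count = (1 << n) - 1
        self.threshold = 1 << (n - 1)          # counter >= threshold predicts taken
        self.tables = [[0] * entries for _ in range(1 << m)]

    def _counter(self, pc):
        return self.tables[self.history], (pc >> 2) % self.entries

    def predict(self, pc):
        table, i = self._counter(pc)
        return table[i] >= self.threshold

    def update(self, pc, taken):
        table, i = self._counter(pc)
        if taken:
            table[i] = min(table[i] + 1, self.max_count)
        else:
            table[i] = max(table[i] - 1, 0)
        # Shift the outcome into the global history, keeping only m bits.
        self.history = ((self.history << 1) | int(taken)) & ((1 << self.m) - 1)


# Usage: a (1,2) predictor on one branch that alternates T, N, T, N, ...
p = CorrelatingPredictor(1, 2, 1024)
pc = 0x400040                                  # hypothetical branch address
results = []
for i in range(20):
    taken = i % 2 == 0
    results.append(p.predict(pc) == taken)
    p.update(pc, taken)
print(results[:4], all(results[4:]))           # [False, True, False, True] True
```

With `m = 0` there is a single table and the history is always 0, so `CorrelatingPredictor(0, 2, 4096)` reduces to the plain 2-bit BHT, the (0,2) case from the slide.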

  10. Example: code from eqntott (SPEC92) • b3 is correlated with b1 and b2

  11. Branch Prediction Result: 1-bit predictor (d is 0 or 2)

  12. Correlating Prediction Performance: 1-bit predictor with 1 bit of correlation
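The contrast between the last two results can be reproduced with a small simulation. The original figures are not reproduced here, so this Python sketch assumes the usual shape of that example: branch b1 is taken when d != 0 (otherwise the code sets d = 1), branch b2 is taken when d != 1, d alternates 2, 0, 2, 0, and every prediction bit (and the initial global history) starts as not taken. `run` is a hypothetical helper.

```python
def run(d_values, correlated):
    """Count mispredictions of b1 and b2 for a plain 1-bit or a (1,1) predictor."""
    # Per branch: slot 0 only (1-bit), or one slot per value of the last global outcome.
    pred = {"b1": [False, False], "b2": [False, False]}
    last = 0                    # last global branch outcome, 0 = not taken
    misses = 0
    for d in d_values:
        b1 = d != 0             # b1 taken when d != 0
        if not b1:
            d = 1               # fall-through path sets d = 1
        b2 = d != 1             # b2 taken when d != 1
        for name, taken in (("b1", b1), ("b2", b2)):
            slot = last if correlated else 0
            if pred[name][slot] != taken:
                misses += 1
            pred[name][slot] = taken   # 1-bit entry: remember the last outcome
            last = int(taken)
    return misses


d_values = [2, 0, 2, 0]
print(run(d_values, correlated=False))  # 8: every prediction is wrong
print(run(d_values, correlated=True))   # 2: only the first d = 2 pass mispredicts
```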

  13. Correlating Branches: (2,2) predictor • The behavior of the two most recent branches selects among, say, four predictions for the next branch, and only the selected prediction is updated • Simple implementation: • the global history can be stored in a shift register • the branch address is concatenated with the global branch history, and the result indexes the table
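The shift-register-plus-concatenation implementation can be sketched in Python as follows. The bit ordering (address bits above history bits), the word-aligned PC, and the class and function names are assumptions for illustration. With 10 address bits and m = 2, a single table of 2^12 = 4K two-bit counters (8K bits) covers the 1K-entry (2,2) configuration.

```python
class GlobalHistory:
    """m-bit shift register holding the most recent branch outcomes (1 = taken)."""

    def __init__(self, m):
        self.m, self.bits = m, 0

    def push(self, taken):
        self.bits = ((self.bits << 1) | int(taken)) & ((1 << self.m) - 1)


def table_index(pc, history, addr_bits):
    """Concatenate the low addr_bits of the word-aligned branch address
    with the m global history bits to form one index into a counter table."""
    addr = (pc >> 2) & ((1 << addr_bits) - 1)
    return (addr << history.m) | history.bits


ghr = GlobalHistory(m=2)
for outcome in (True, False, True):   # the last two outcomes are N, T -> 0b01
    ghr.push(outcome)
print(bin(table_index(0x400010, ghr, addr_bits=10)))  # 0b10001: address 0b100, history 0b01
```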

  14. Number of Stored Bits • For an (m,n) predictor: • 2^m × n × number of prediction entries • Examples: • 2-bit predictor with 4096 entries: 2^0 × 2 × 4K = 8K bits • (2,2) predictor: how many entries fit in 8K bits? 2^2 × 2 × x = 8K ⇒ x = 1K • Comparison on the next slide
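The storage formula can be checked mechanically. A trivial Python sketch (`predictor_bits` is a hypothetical name) reproducing both examples:

```python
def predictor_bits(m, n, entries):
    """Bits of counter state in an (m, n) predictor: 2^m * n * entries."""
    return (2 ** m) * n * entries


K = 1024
print(predictor_bits(0, 2, 4 * K) // K)  # 8: 2-bit BHT with 4K entries -> 8K bits
print(predictor_bits(2, 2, 1 * K) // K)  # 8: (2,2) predictor with 1K entries -> 8K bits

# Solve 2^2 * 2 * x = 8K for x
x = (8 * K) // ((2 ** 2) * 2)
print(x)                                 # 1024
```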

  15. Accuracy of Different Schemes [Figure: frequency of mispredictions (axis from 0% to 18%) for a 4096-entry 2-bit BHT, an unlimited-entry 2-bit BHT, and a 1024-entry (2,2) BHT]

  16. Tournament Branch Predictor [Figure: PC and global history feed a local predictor, a global predictor, and a choice predictor; a mux selects the final NT/T prediction] • Used in the Alpha 21264: tracks both “local” and global history • Intended for mixed types of applications • Global history: T/NT history of the past k branches, e.g. 0 1 0 1 0 1 (NT T NT T NT T)
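A simplified Python sketch of the tournament idea. Unlike the Alpha 21264 (whose local predictor uses per-branch histories), the "local" side here is just a PC-indexed table of 2-bit counters; the global and choice tables are indexed by a k-bit global history, and the chooser is trained only when the two components disagree. Table sizes, initial values, and all names are assumptions.

```python
def bump(counter, up):
    """Update a 2-bit saturating counter (0..3)."""
    return min(counter + 1, 3) if up else max(counter - 1, 0)


class TournamentPredictor:
    """Simplified tournament predictor: a PC-indexed 'local' table and a
    global-history-indexed table, with a choice table picking between them."""

    def __init__(self, k=4, entries=1024):
        self.k, self.entries = k, entries
        self.history = 0                   # T/NT outcomes of the last k branches
        self.local = [1] * entries         # 2-bit counters, start weakly not taken
        self.glob = [1] * (1 << k)
        self.choice = [1] * (1 << k)       # >= 2 means "trust the global predictor"

    def _lookups(self, pc):
        li = (pc >> 2) % self.entries
        gi = self.history
        return li, gi, self.local[li] >= 2, self.glob[gi] >= 2

    def predict(self, pc):
        _, gi, local_pred, global_pred = self._lookups(pc)
        return global_pred if self.choice[gi] >= 2 else local_pred

    def update(self, pc, taken):
        li, gi, local_pred, global_pred = self._lookups(pc)
        # Train the chooser only when the two component predictors disagree.
        if local_pred != global_pred:
            self.choice[gi] = bump(self.choice[gi], global_pred == taken)
        self.local[li] = bump(self.local[li], taken)
        self.glob[gi] = bump(self.glob[gi], taken)
        self.history = ((self.history << 1) | int(taken)) & ((1 << self.k) - 1)


tp = TournamentPredictor()
pc = 0x400100                                   # hypothetical branch address
pattern = [i % 2 == 0 for i in range(40)]       # T, N, T, N, ...
hits = []
for taken in pattern:
    hits.append(tp.predict(pc) == taken)
    tp.update(pc, taken)
print(all(hits[-20:]))                          # True
```

On this alternating branch the PC-indexed 2-bit counter keeps flipping between its weak states and is wrong every time, while the history-indexed global table learns the pattern; the choice counters therefore move toward the global side and the combined prediction becomes correct after a short warm-up.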

  17. Predictor Select

  18. Local Predictor Percentage

  19. Performance Comparison

  20. Tournament Branch Predictor (Alpha 21264) • Local predictor: uses 10-bit local histories and 3-bit counters: the PC indexes a local history table (1K × 10), and the 10-bit history selects one of 1K 3-bit counters (NT/T) • Global and choice predictors: a 12-bit global history (e.g. 0 1 0 1 0 1 0 1 0 1 0 1) indexes 4K × 2-bit global counters (NT/T) and 4K × 2-bit choice counters (local/global)
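Reading the table sizes off the figure, the total predictor state can be tallied in a few lines of Python (plain arithmetic; the variable names are illustrative):

```python
K = 1024
local_history_table = 1 * K * 10   # 1K entries x 10-bit local histories
local_counters      = 1 * K * 3    # 1K x 3-bit counters, selected by a 10-bit history
global_counters     = 4 * K * 2    # 4K x 2-bit counters, indexed by 12-bit global history
choice_counters     = 4 * K * 2    # 4K x 2-bit local/global choice counters

total = local_history_table + local_counters + global_counters + choice_counters
print(total, total / K)            # 29696 29.0
```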

  21. Reducing Branch Stalls • In MIPS, when a branch is predicted taken, we also need its target address → High-Performance Instruction Delivery: • Branch target buffer • Integrated instruction fetch unit • Predicting return addresses

  22. Need Address at Same Time as Prediction • Branch Target Buffer (BTB): the address of the branch indexes the buffer to get both the prediction AND the branch target address (if taken)

  23. Branch Target Buffer flow chart

  24. Example • Prediction accuracy: 90% (for instructions in the buffer) • Hit rate in the buffer: 90% (for branches predicted taken) • Taken branch frequency: 60% • Probability (branch in buffer, but actually not taken) = buffer hit rate × incorrect-prediction rate = 90% × 10% = 0.09 • Probability (branch not in buffer, but actually taken) = 10% × 60% = 0.06 • Branch penalty = (0.09 + 0.06) × 2 (two penalty cycles in either case) = 0.30 clock cycles
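The same arithmetic in Python, assuming (as the × 2 on the slide implies) a two-cycle penalty both for a buffer hit that is mispredicted and for a taken branch that misses in the buffer:

```python
prediction_accuracy = 0.90   # for branches found in the buffer
buffer_hit_rate     = 0.90   # for branches predicted taken
taken_frequency     = 0.60
penalty_cycles      = 2      # assumed cost of either kind of miss

hit_but_wrong  = buffer_hit_rate * (1 - prediction_accuracy)   # about 0.09
miss_but_taken = (1 - buffer_hit_rate) * taken_frequency       # about 0.06
branch_penalty = (hit_but_wrong + miss_but_taken) * penalty_cycles
print(round(branch_penalty, 2))   # 0.3 clock cycles per branch
```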

  25. Branch Folding • Idea: store one or more target instructions instead of, or in addition to, the predicted target address • Advantages: • it allows the branch-target buffer access to take longer than the time between successive instruction fetches • it enables an optimization called branch folding • Branch folding: • the buffered target instruction is substituted for the branch itself, giving zero-cycle unconditional branches and sometimes zero-cycle conditional branches

  26. Branch Target Buffer (summary) [Figure: each entry holds a branch PC, a predicted PC, and extra prediction state bits; the PC of the instruction being fetched is compared (=?) with the stored branch PCs. Yes: the instruction is a branch; use the predicted PC as the next PC. No: branch not predicted; proceed normally (next PC = PC + 4).] • Branch Target Buffer (BTB): the address of the branch indexes the buffer to get both the prediction AND the branch target address (if taken) • Note: the branch address must be checked for a match now, since we can't use the wrong branch's target address • Example: BTB combined with BHT
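A minimal Python sketch of the lookup in the figure. Simplifying assumptions (not from the slides): the buffer is direct-mapped and indexed by the word-aligned PC, it stores only branches predicted taken (so a tag match means "predict taken" and the extra state bits are omitted), and instructions are 4 bytes. Class, method, and address values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class BTBEntry:
    branch_pc: int      # full tag: the PC of the branch itself
    predicted_pc: int   # target to fetch next if predicted taken


class BranchTargetBuffer:
    def __init__(self, entries=512):
        self.entries = entries
        self.table = [None] * entries

    def _slot(self, pc):
        return (pc >> 2) % self.entries

    def next_fetch_pc(self, pc):
        entry = self.table[self._slot(pc)]
        # The tag must match: an entry for a different branch that maps to the
        # same slot must not supply its target address.
        if entry is not None and entry.branch_pc == pc:
            return entry.predicted_pc
        return pc + 4   # not predicted as a taken branch: proceed normally

    def record_taken(self, pc, target):
        self.table[self._slot(pc)] = BTBEntry(pc, target)


btb = BranchTargetBuffer()
btb.record_taken(0x400020, 0x400100)
print(hex(btb.next_fetch_pc(0x400020)))  # 0x400100: hit, use the predicted PC
print(hex(btb.next_fetch_pc(0x400820)))  # 0x400824: same slot, tag mismatch
```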

  27. Return Address Prediction • The target address of a register-indirect branch is hard to predict • Branch-prediction-buffer techniques do not work well in this situation: • Many callers, one callee • A single return instruction jumps to multiple return addresses (no PC-target correlation) • In SPEC89, 85% of such branches are procedure returns • Use stack discipline for procedures: save the return address in a small buffer that acts like a stack; 8 to 16 entries give a small miss rate
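A return address stack can be sketched in Python with a bounded `collections.deque`; when the stack is full, the oldest return address is discarded. The depth, method names, and addresses are illustrative assumptions; a hardware design would push when a call is fetched and pop when a return is fetched.

```python
from collections import deque


class ReturnAddressStack:
    def __init__(self, depth=16):
        # deque(maxlen=...) silently discards the oldest entry when full.
        self.stack = deque(maxlen=depth)

    def on_call(self, return_pc):
        self.stack.append(return_pc)

    def predict_return(self):
        # Empty stack: no prediction available.
        return self.stack.pop() if self.stack else None


ras = ReturnAddressStack(depth=8)
ras.on_call(0x400104)              # main calls f; f will return to 0x400104
ras.on_call(0x400208)              # f calls g
print(hex(ras.predict_return()))   # 0x400208 (g returns into f)
print(hex(ras.predict_return()))   # 0x400104 (f returns into main)
```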

  28. Accuracy of Return Address Predictor

  29. Short Seminar • Section 2.10 on Pentium 4, Branch prediction • Pentium 4 Tomasulo

  30. Dynamic Branch Prediction Summary • Prediction is becoming an important part of scalar execution • Branch History Table: 2 bits for loop accuracy • Correlation: recently executed branches are correlated with the next branch • Either different branches • Or different executions of the same branch • Tournament predictor: devote more resources to competing predictors and pick between them • Branch Target Buffer: includes the branch address and the prediction • Return address stack for predicting indirect jumps
