
Computer Architecture Principles Dr. Mike Frank

CDA 5155, Summer 2003. Module #24: Speculation.


Presentation Transcript


  1. Computer Architecture Principles. Dr. Mike Frank. CDA 5155, Summer 2003. Module #24: Speculation

  2. Speculation

  3. What’s Speculation?
     • Unconditional early execution of an instruction that is expected to be needed (based on a predicted branch outcome), but that may turn out not to be.
     • What makes this difficult?
       • The instruction may raise fatal (non-resumable) exceptions that shouldn’t have been raised.
       • The instruction may have side effects that affect the data flow of later instructions that shouldn’t be affected.
     • The conservative approach (never speculate under these conditions) is overly constraining.

  4. A Simple Speculation Example
     • C source: if (A==0) A=B; else A=A+4;
     • A in 0(R3), B in 0(R2), R14 available
     • Original assembly:
               LD    R1,0(R3)
               BNEZ  R1,L1
               LD    R1,0(R2)
               J     L2
           L1: DADDI R1,R1,#4
           L2: SD    R1,0(R3)
     • With speculative load:
               LD    R1,0(R3)
               LD    R14,0(R2)
               BEQZ  R1,L3
               DADDI R14,R1,#4
           L3: SD    R14,0(R3)
     • Note that the “then” clause is now effectively unconditional. (Equivalent C code: T=B; if (A!=0) T=A+4; A=T;)
     • Note the use of the extra register R14.
     • Note that this simple transformation does not preserve exception behavior!
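A minimal Python sketch of why the transformation on slide 4 changes exception behavior. It is an illustration only: the names `original`, `speculative`, `MemoryFault`, and `faulting_load` are assumptions, not part of the slides, and the load of B is modeled as a callable so that it can fault.

```python
class MemoryFault(Exception):
    """Stands in for a fatal (non-resumable) load exception."""


def original(a, load_b):
    # if (A==0) A=B; else A=A+4;  -- B is loaded only on the "then" path.
    if a == 0:
        return load_b()
    return a + 4


def speculative(a, load_b):
    # T=B; if (A!=0) T=A+4; A=T;  -- B is loaded unconditionally.
    t = load_b()
    if a != 0:
        t = a + 4
    return t


def faulting_load():
    # Hypothetical load of B from an invalid address.
    raise MemoryFault("bad address for B")


if __name__ == "__main__":
    print(original(0, lambda: 7), speculative(0, lambda: 7))
    print(original(5, faulting_load))
    try:
        speculative(5, faulting_load)
    except MemoryFault as exc:
        print("speculative version faulted:", exc)
```

Both versions agree whenever the load of B succeeds. When it faults and A is nonzero, only the speculative version raises, which is the "false positive" the later slides set out to avoid.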

  5. Ambitious Speculation Methods
     Here are some alternatives:
     • Hardware and OS cooperatively ignore (or delay) exceptions for speculative instructions.
     • Poison bits mark register values written by speculative instructions that generated exceptions.
     • Results of speculative instructions are buffered (not committed) until the speculative branch prediction is confirmed. (Sentinel method.)

  6. HW/SW-cooperation Method
     • A way of coping with non-resumable exceptions in speculative instructions.
     • Basic strategy: Simply ignore fatal errors in any speculative instruction.
     • Correct programs never generate such errors anyway, so there is no problem (no “false positives”).
     • But incorrect programs may silently go haywire!
       • Treated as an unavoidable cost of optimization
       • A kind of imprecise exception handling
     • If a program misbehaves in testing, one can always recompile it with strict exception handling (and no speculation) to track down the error.

  7. Example: Speculative Load Inst.
     • Previous example, with a special “Speculative Load” (sLD) instruction:
               LD    R1,0(R3)
               sLD   R14,0(R2)
               BEQZ  R1,L3
               DADDI R14,R1,#4
           L3: SD    R14,0(R3)
       This version does not preserve exception behavior, but at least avoids false positives.
     • Using a separate “speculation check” (SPECCK) instruction to restore correct exception behavior:
               LD    R1,0(R3)
               sLD   R14,0(R2)
               BNEZ  R1,L1
               SPECCK 0(R2)
               J     L2
           L1: DADDI R14,R1,#4
           L2: SD    R14,0(R3)
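The SPECCK version on slide 7 can be mimicked in Python as a rough sketch. Everything here is an assumption for illustration: memory is a plain dict, a missing key stands in for a faulting address, and `s_load`, `spec_check`, and `run` are made-up names for sLD, SPECCK, and the whole code sequence.

```python
class MemoryFault(Exception):
    """Stands in for a fatal (non-resumable) load exception."""


def load(memory, addr):
    # Ordinary LD: faults immediately on a bad address.
    if addr not in memory:
        raise MemoryFault(hex(addr))
    return memory[addr]


def s_load(memory, addr):
    # sLD: never faults; a bad address just yields a junk value (0 here).
    return memory.get(addr, 0)


def spec_check(memory, addr):
    # SPECCK: raises the fault that the earlier sLD suppressed, if any.
    if addr not in memory:
        raise MemoryFault(hex(addr))


def run(memory, a_addr, b_addr):
    r1 = load(memory, a_addr)       # LD  R1,0(R3)
    r14 = s_load(memory, b_addr)    # sLD R14,0(R2)
    if r1 != 0:                     # BNEZ R1,L1
        r14 = r1 + 4                # L1: DADDI R14,R1,#4
    else:
        spec_check(memory, b_addr)  # SPECCK 0(R2)
    memory[a_addr] = r14            # L2: SD R14,0(R3)
    return r14
```

With B unmapped, `run` completes normally when A is nonzero (the speculative load's fault is never reported) and raises `MemoryFault` when A is zero, matching the non-speculative program.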

  8. Poison Bits
     • Speculative instructions are marked as such.
       • Like the “sLD” instruction we saw earlier.
     • Each ISA register has an associated “poison bit.”
     • When a speculative instruction generates a fatal exception, the destination register is marked as “poison” instead of invoking exception handling.
     • Poison is propagated through the data dependencies of subsequent speculative instructions.
     • If a non-speculative instruction ever uses a poisoned register, that instruction generates a fatal exception, which halts the program.
     • All fatal exceptions do eventually occur, but possibly somewhat later than they normally would. (Still pretty easy to debug, though.)

  9. Poison Bit Example
     • C source: if (A==0) A=B+8; else A=A+4;
               LD     R1,0(R3)     ; load A non-speculatively
               sLD    R12,0(R2)    ; exception at 0(R2) may poison R12
               sDADDI R14,R12,#8   ; R14 inherits poison
               BEQZ   R1,L3        ; skip next line if A=0
               DADDI  R14,R1,#4    ; clears R14 poison bit
           L3: SD     R14,0(R3)    ; exception happens here (if R14 is poisoned)
     • Note that if accessing B causes an exception, it still happens (but late), and only if the “then” clause runs.
     [Figure: poison bits for R12 and R14]
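A small Python model of the poison-bit example on slide 9, offered as a sketch rather than a description of any real hardware. The `RegFile` class, its method names, and the dict-based memory (where a missing key stands in for a faulting address) are all assumptions made for illustration.

```python
class FatalException(Exception):
    """Raised when a non-speculative instruction uses a poisoned register."""


class RegFile:
    def __init__(self):
        self.value = {}
        self.poison = set()  # registers whose poison bit is set

    def write(self, reg, val, poisoned=False):
        self.value[reg] = val
        if poisoned:
            self.poison.add(reg)
        else:
            self.poison.discard(reg)  # a clean write clears the poison bit

    def read_speculative(self, reg):
        # Speculative readers see the poison bit and propagate it.
        return self.value[reg], reg in self.poison

    def read(self, reg):
        # Non-speculative readers fault on poison.
        if reg in self.poison:
            raise FatalException(f"use of poisoned register {reg}")
        return self.value[reg]


def run(memory, a_addr, b_addr):
    rf = RegFile()
    rf.write("R1", memory[a_addr])             # LD R1,0(R3)
    if b_addr in memory:                       # sLD R12,0(R2)
        rf.write("R12", memory[b_addr])
    else:
        rf.write("R12", 0, poisoned=True)
    val, poisoned = rf.read_speculative("R12")
    rf.write("R14", val + 8, poisoned)         # sDADDI R14,R12,#8
    if rf.read("R1") != 0:                     # BEQZ R1,L3
        rf.write("R14", rf.read("R1") + 4)     # DADDI R14,R1,#4
    memory[a_addr] = rf.read("R14")            # L3: SD R14,0(R3)
    return memory[a_addr]
```

When B is unmapped, the store faults only if A is zero; otherwise the non-speculative DADDI overwrites R14 and clears its poison bit before the store reads it.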

  10. Speculative Insts. w. renaming
      • Problem: What to do about data flow when a speculative instruction writes a register that’s later used non-speculatively?
      • Ordinary solution: The compiler does register renaming, writing speculative results to different (separately allocated) registers. (See the sLD example.)
      • Problem: Values have to be moved between normal and speculative registers, and the compiler can run out of registers!
      • Alternative solution (“Boosting”): Let the HW do the renaming and buffering of speculative results.
        • Like in Tomasulo’s algorithm.

  11. Sentinel Method
      • A special “sentinel” instruction marks the original location of an instruction moved speculatively.
      • Write-back (and exception handling) of the speculative instruction is delayed until the corresponding sentinel is reached.
      • Note that write-back never occurs if the sentinel is not reached!
      [Figure: code sequences with LD, BEQ, and sentinel]

  12. Hardware-Based Speculation
      • Combines 3 ideas:
        • Dynamic branch prediction chooses which instructions will be pre-executed.
        • Speculation executes conditional instructions early (before branch conditions are resolved).
        • Dynamic scheduling handles scheduling of the different dynamic sequences of basic blocks encountered.
      • Dataflow execution: Execute instructions as soon as their operands are available.
        • Like with Tomasulo’s algorithm.

  13. Advantages of HW-based spec.
      • Dynamic (run-time) disambiguation of memory references lets a load be moved before an earlier store when the locations addressed are different.
      • Speculation works better if more accurate dynamic branch predictions can be used.
      • Precise exception handling, even for speculated instructions.
      • No extra bookkeeping code (speculation bits, register renaming code) in the program.
      • Code is independent of the implementation.

  14. Implementing HW-based spec.
      • Separate the execution of speculative instructions (including dataflow between them) from the committing of results permanently to registers/memory (which happens only if the speculations are correct).
      • A new structure called the reorder buffer holds the results of instructions that have executed speculatively but cannot yet be committed.
      • The reorder buffer represents non-programmer-visible temporary storage, like the reservation stations in Tomasulo’s algorithm.

  15. Fields of Reorder Buffer Entries
      • Instruction type field:
        • “Branch” (no dest.)
        • “Store” (dest. = memory)
        • “Register” (dest. = register)
      • Destination field:
        • Register number (for loads & ALU ops)
        • Memory address (for stores)
      • Value field:
        • Register or memory value to be stored permanently when the instruction commits.
      • Ready field: The instruction has completed execution.

  16. Steps of Execution in HWBS
      • Issue (or dispatch):
        • Get the next fetched instruction.
        • Issue if the reservation station & reorder buffer are not full.
        • Check the ROB & registers for available operands.
      • Execute:
        • Monitor the CDB for operands until ready, then execute.
      • Write result:
        • Write to the CDB, reorder buffer, & reservation stations.
      • Commit:
        • When the instruction is first in the reorder buffer (& wasn’t mispredicted), commit its value to register/memory.
        • Committing a mispredicted branch flushes the reorder buffer.
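The reorder buffer fields from slide 15 and the issue / write-result / commit steps from slide 16 can be sketched in Python as follows. This is a simplified illustration, not the design from the slides or the textbook: reservation stations and the CDB are left out, and the `mispredicted` field, the class names, and the method names are assumptions.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class ROBEntry:
    kind: str                   # "branch", "store", or "register"
    dest: object = None         # register name, memory address, or None
    value: object = None        # result to store permanently at commit
    ready: bool = False         # True once the instruction has completed
    mispredicted: bool = False  # assumed extra field, used for branches only


class ReorderBuffer:
    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = deque()  # program order, oldest at the left
        self.registers = {}     # committed register state
        self.memory = {}        # committed memory state

    def issue(self, kind, dest=None):
        # Issue stalls (returns None) when the buffer is full.
        if len(self.entries) >= self.capacity:
            return None
        entry = ROBEntry(kind, dest)
        self.entries.append(entry)
        return entry

    def write_result(self, entry, value=None, mispredicted=False):
        entry.value = value
        entry.mispredicted = mispredicted
        entry.ready = True

    def commit(self):
        # Retire at most one instruction, and only from the head.
        if not self.entries or not self.entries[0].ready:
            return False
        head = self.entries.popleft()
        if head.kind == "register":
            self.registers[head.dest] = head.value
        elif head.kind == "store":
            self.memory[head.dest] = head.value
        elif head.kind == "branch" and head.mispredicted:
            self.entries.clear()  # flush everything younger than the branch
        return True
```

Because results reach `registers` and `memory` only in `commit`, an instruction that finished early behind a mispredicted branch never becomes architecturally visible.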

  17. HWBS implementation sketch

  18. HWBS execution example (3rd ed., p. 229)
      (I = issue, E = execute, W = write result, C = commit)
          L.D   F6,34(R2)    IEWC
          L.D   F2,45(R3)    IEWC
          MUL.D F0,F2,F4     I EEEEEEEEEEWC
          SUB.D F8,F6,F2     IEW C
          DIV.D F10,F0,F6    I EEEEE…EWC
          ADD.D F6,F8,F2     IEW C
      Also go through figure 3.30 on p. 230… (40 cycles)

  19. HWBS loop example
      (first column = loop iteration)
          1  L.D    F0,0(R1)      IEWC
          1  MUL.D  F4,F0,F2      I EE…EWC
          1  S.D    F4,0(R1)      IE WC
          1  DADDIU R1,R1,#-8
          1  BNE    R1,R2,Loop
          2  L.D    F0,0(R1)
          2  MUL.D  F4,F0,F2
          2  S.D    F4,0(R1)
          2  DADDIU R1,R1,#-8
          2  BNE    R1,R2,Loop

  20. Explicit Register Renaming
      • An alternative to reorder buffers for HWBS:
        • Have more physical registers than architectural (programmer-visible) registers.
        • Dynamically map the destination ISA register to an unused physical register when an instruction is issued.
        • Also track which mapping corresponds to the last committed instruction, to support restarts.
      • Approach used in: PPC 603/604, MIPS R10000/12000, Alpha 21264, Pentium II/III/4.
      [Figure: ISA register map (R1, R2, …, F31) with LastIssued and LastCommitted mappings into the physical registers]
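A rough Python sketch of the renaming scheme on slide 20, with a last-issued map, a last-committed map, and a free list of physical registers. The class and method names are hypothetical, and the policy of freeing the previously committed physical register at commit time is an assumption of this sketch, not something the slide specifies.

```python
class RenameMap:
    def __init__(self, isa_regs, num_physical):
        # There must be more physical registers than ISA registers.
        assert num_physical > len(isa_regs)
        self.last_issued = {r: i for i, r in enumerate(isa_regs)}
        self.last_committed = dict(self.last_issued)
        self.free = list(range(len(isa_regs), num_physical))
        self.in_flight = []  # (isa_reg, physical_reg) in program order

    def issue(self, dest):
        # Map the destination to an unused physical register, or stall.
        if not self.free:
            return None
        phys = self.free.pop(0)
        self.last_issued[dest] = phys
        self.in_flight.append((dest, phys))
        return phys

    def source(self, reg):
        # Source operands read the most recently issued mapping.
        return self.last_issued[reg]

    def commit(self):
        # The oldest in-flight mapping becomes the committed one.
        dest, phys = self.in_flight.pop(0)
        old = self.last_committed[dest]
        self.last_committed[dest] = phys
        self.free.append(old)  # assumed policy: old copy is now dead

    def restart(self):
        # On a misprediction, discard all uncommitted mappings.
        for _, phys in self.in_flight:
            self.free.append(phys)
        self.in_flight.clear()
        self.last_issued = dict(self.last_committed)
```

After `restart`, source operands again resolve to the committed mappings, so no speculative values need to be copied back or undone.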
