Pipeline Hazards

• Pipeline hazards are situations that prevent the next instruction from being processed in the next stage of the pipeline.
• This interrupts the synchronous execution in the pipeline and thus decreases performance.
• Solution: suspend the execution of the instruction (pipeline stall)
  • If an instruction is suspended in a certain stage of the pipeline, all subsequent instructions are stopped as well.
  • The pipeline logic inserts NOP operations into the next pipeline stage.
  • The processing of all earlier instructions continues.

Resource Hazards
• Resource hazards are also called structural hazards.
• They result from two instructions in different pipeline stages that require the same resource.
• Not all components can be replicated to make sure that this never happens.
• Examples
  • Parallel writes to the register file, e.g., if arithmetic operations can write their result directly while a load writes in the memory access phase.
  • Parallel access to memory in IF and MA.
  • Subsequent instructions need the FP division hardware, which is not pipelined.

Data Hazards and Control Hazards
• Data hazards
  • An instruction accesses the same data as an earlier instruction that has not yet finished, e.g., an operand computed by a previous instruction is not yet available.
  • Data hazards result from data dependences between instructions.
• Branch (control) hazards
  • The next instruction cannot be fetched because of a jump in the control flow.

Resolving Pipeline Hazards
• The simple solution is to stall the pipeline
  • Insertion of NOPs or pipeline bubbles.
  • This reduces the pipeline throughput.
• Many hardware and software techniques have been developed to reduce the performance impact of hazards.

Pipeline Hazards and Data Dependences
• Data dependences occur between statements in the program.
• Example
    add R1,R2,R3
    sub R4,R5,R6
    and R6,R1,R8
    xor R9,R1,R11

Data Dependence
• An instruction j is data dependent on instruction i if
  • there is a path from i to j
  • and $(O(i) \cap I(j)) \cup (I(i) \cap O(j)) \cup (O(i) \cap O(j)) \neq \emptyset$
• where
  • I(i) = set of read data
  • O(i) = set of written data

True Dependence
• True or flow dependence: first write, then read
• Example
    LOOP: load F0,0(R1)
          add  F4,F0,F2

Anti Dependence
• Anti dependence: first read, then write
• Instruction i reads an operand from a register or memory location that is overwritten by a later instruction.
    ADD R2,R3,R4
    XOR R3,R5,R6

Output Dependence
• Output dependence: both write
• Instructions i and j write the same register or memory address.
    ADD R2,R3,R4
    XOR R2,R5,R6
• Anti and output dependences are called name dependences.
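The three kinds of dependence can be checked mechanically from the read and write sets I and O. The following is a minimal sketch, not a definitive implementation: the `Instr` type and the `dependences` function are hypothetical names, the first operand is assumed to be the destination register, and the sketch neither checks for a path from i to j nor for intervening writes.

```python
from typing import NamedTuple


class Instr(NamedTuple):
    text: str
    reads: frozenset   # I(i): registers read (hypothetical representation)
    writes: frozenset  # O(i): registers written


def dependences(i: Instr, j: Instr) -> set:
    """Kinds of dependence of the later instruction j on the earlier instruction i."""
    kinds = set()
    if i.writes & j.reads:    # first write, then read
        kinds.add("true")
    if i.reads & j.writes:    # first read, then write
        kinds.add("anti")
    if i.writes & j.writes:   # both write
        kinds.add("output")
    return kinds


# The examples from the slides, assuming the first operand is the destination.
load = Instr("load F0,0(R1)", frozenset({"R1"}), frozenset({"F0"}))
addf = Instr("add F4,F0,F2", frozenset({"F0", "F2"}), frozenset({"F4"}))
add = Instr("ADD R2,R3,R4", frozenset({"R3", "R4"}), frozenset({"R2"}))
xor_anti = Instr("XOR R3,R5,R6", frozenset({"R5", "R6"}), frozenset({"R3"}))
xor_out = Instr("XOR R2,R5,R6", frozenset({"R5", "R6"}), frozenset({"R2"}))

print(dependences(load, addf))     # true dependence on F0
print(dependences(add, xor_anti))  # anti dependence on R3
print(dependences(add, xor_out))   # output dependence on R2
```

An empty result means the two instructions are independent and may be reordered.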

Dependences and Pipeline Hazards
• Data dependences are properties of the program.
• Whether a data dependence leads to a pipeline hazard depends on the pipeline organization and on the timing of instruction execution.
• Data dependences
  • may induce hazards, i.e., they indicate where a hazard is possible.
  • determine the execution order of instructions. Independent instructions can be reordered and even executed in parallel.
  • determine the maximum degree of parallelism.

Data Hazards
• Data hazards can occur if data dependent instructions follow each other in the pipeline with only a short delay.
• Their accesses can then overlap in the pipeline.
• Example: true dependence
    load R1, A
    load R2, B
    add  R2,R1,R2
    mul  R1,R2,R1
[Figure: pipeline diagram of four overlapping instructions in the stages IF, ID, EX, MA, WB; time axis t_i to t_i+4]

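Data Hazards
• Example: true dependence
    add R2,R1,R2
    mul R1,R2,R1
[Figure: pipeline diagram of both instructions in the stages IF, ID, EX, MA, WB; time axis t_i to t_i+4. mul reads the old value of R2 before add has written the new value, so it reads a wrong value.]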

Data Hazards Classification
Instruction i precedes instruction j in program order.
• Read-after-Write (RAW)
  • Happens if instruction j reads a source register before instruction i has written its result.
  • Implied by a true dependence.
• Write-after-Read (WAR)
  • Happens if instruction j writes its target register before instruction i has read that register as an operand.
  • Implied by an anti dependence.
• Write-after-Write (WAW)
  • Happens if instruction j writes its target register before instruction i has written its result to the same register.
  • Implied by an output dependence.
  • Can happen in pipelines where multiple stages can write, or where an instruction can proceed without waiting for a stalled previous instruction.

Handling Hazards
• Software solutions (static solutions)
  • Implemented by the compiler
• Insertion of NOPs
  • Detection of potential data hazards
  • Insertion of NOPs after instructions that might lead to hazards
• Reordering of instructions
  • Instruction scheduling phase of the compiler
  • Reorders instructions so that independent instructions are executed between dependent instructions
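The NOP insertion described above can be sketched as a small compiler pass. This is only an illustration under stated assumptions: the function name `insert_nops`, the tuple format `(text, dest, sources)`, and the parameter `min_distance` are hypothetical. The default `min_distance=3` assumes a five-stage pipeline without forwarding in which a register written in WB can be read in ID during the same cycle; only RAW hazards are handled.

```python
def insert_nops(program, min_distance=3):
    """Insert 'nop' so that a reader follows the writer of a register
    by at least `min_distance` instruction slots (assumed pipeline property)."""
    out = []          # emitted instruction texts
    last_write = {}   # register -> slot index of its most recent writer
    for text, dest, sources in program:
        # Earliest slot at which all source registers can safely be read.
        earliest = max(
            (last_write[r] + min_distance for r in sources if r in last_write),
            default=0,
        )
        while len(out) < earliest:
            out.append("nop")
        if dest is not None:
            last_write[dest] = len(out)  # slot this instruction will occupy
        out.append(text)
    return out


# Example program from the data hazard slide (memory operands A and B are not registers).
program = [
    ("load R1,A", "R1", []),
    ("load R2,B", "R2", []),
    ("add R2,R1,R2", "R2", ["R1", "R2"]),
    ("mul R1,R2,R1", "R1", ["R2", "R1"]),
]
scheduled = insert_nops(program)
print("\n".join(scheduled))
```

A scheduling compiler would try to fill these slots with independent instructions instead of NOPs.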

Handling Hazards
• Hardware solutions (dynamic solutions)
• Detection of conflicts
  • Requires appropriate hardware logic
• Handling
  • Interlocking, stalling
  • Forwarding
  • Forwarding with interlocking

Handling Hazards in the Hardware
• Pipeline interlocking
  • Detection of hazards.
  • Stops instruction j and all subsequent instructions for multiple cycles.
• Example: mul has to wait until add has written R2
    add R2,R1,R2 passes through IF ID EX MA WB
    mul R1,R2,R1 passes through IF ID stall stall EX MA WB
[Figure: pipeline diagram of both instructions; time axis t_i to t_i+4]

Handling Hazards in the Hardware
• Forwarding
  • Forwards ALU results directly to the ALU input.
  • Eliminates stall cycles.
  • Requires additional hardware (forwarding logic).
• Example
    add R2,R1,R2
    mul R1,R2,R1
[Figure: pipeline diagram of both instructions in the stages IF, ID, EX, MA, WB; time axis t_i to t_i+4]

Forwarding and Interlocking
• Not all hazards can be handled by forwarding.
• Example: true dependence with a load operation
    load R2,A
    add  R1,R2,R1
  • The loaded value is available only after MA, which is too late for the EX phase of the directly following instruction.
• Solution: forwarding + interlocking
    load R2,A passes through IF ID EX MA WB
    add  R1,R2,R1 passes through IF ID stall EX MA WB
[Figure: pipeline diagrams without and with the stall cycle; time axis t_i to t_i+4]
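The combination of forwarding and interlocking can be sketched as a check for the load-use case. The function name `stall_cycles` and the tuple format `(opcode, dest, sources)` are hypothetical; the sketch assumes a five-stage pipeline with full forwarding, so that only a load directly followed by a consumer of its result needs one stall cycle.

```python
def stall_cycles(program):
    """Stall cycles inserted before each instruction, assuming full forwarding.

    Only the load-use case stalls: the loaded value leaves MA one cycle
    too late for the EX phase of the directly following instruction."""
    stalls = []
    prev = None
    for op, dest, sources in program:
        load_use = prev is not None and prev[0] == "load" and prev[1] in sources
        stalls.append(1 if load_use else 0)
        prev = (op, dest, sources)
    return stalls


# ALU result forwarded to the next ALU operation: no stall.
alu_pair = [("add", "R2", ["R1", "R2"]), ("mul", "R1", ["R2", "R1"])]
# Load followed directly by a use of the loaded register: one stall.
load_use_pair = [("load", "R2", []), ("add", "R1", ["R2", "R1"])]

print(stall_cycles(alu_pair))       # [0, 0]
print(stall_cycles(load_use_pair))  # [0, 1]
```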

MIPS Pipeline
• Note: see the lecture notes by Wismüller.

Branch Hazards
• The branch target and the condition are computed in the EX phase, and the result replaces the PC in the MA phase.
• The condition typically depends on the EX phase of the previous instruction, which requires forwarding.
• Thus, the correct instruction can be fetched only after three cycles.

Branch Hazards
[Figure: pipeline diagram of a jump and its target instruction in the stages IF, ID, EX, MA, WB over time; the PC is updated after three stall cycles, then the target is fetched]

Branch Hazards
• Condition and target should already be computed in ID
• Structural hazard
  • The ALU cannot be used to compute the target. An additional ALU is thus required in ID.
• Data dependence on a previous arithmetic instruction
  • RAW hazard
• The critical path in the ID phase becomes longer
  • Decoding, computing the branch target, and updating the PC form the critical path.

Resolving Branch Hazards
• Insertion of independent instructions
  • Instruction scheduling by the compiler
  • Fills the stall cycle with an independent instruction (delay slot)
• Example: the add is moved into the delay slot that would otherwise hold a nop
    Before:          After:
    add R1,R2,R3     br addr
    br addr          add R1,R2,R3
    nop              ...
    ...

Branch Prediction
• Prediction of the branch decision when a jump is encountered.
• Speculative execution of the instructions that follow the predicted outcome.
• After the condition has been computed
  • either continue without delay if the prediction was correct
  • or discard the started instructions and fetch the correct ones.
• Two classes
  • Static branch prediction by hardware or compiler
  • Dynamic branch prediction by the hardware

Static Branch Prediction
• Hardware
  • Static prediction in the processor: backward jumps are always predicted to be taken.
• Compiler
  • Specification via a bit in the jump opcode
  • The prediction can be guided by program analysis or profiling (feedback-directed compilation)

Dynamic Branch Prediction
• Properties
  • Based on the dynamic behavior of the application
  • The history of a jump is taken into account.
  • Leads to more precise predictions
  • Expensive in terms of hardware
• Branch Prediction Buffer
  • Cache for information about conditional jumps
  • Requires that the target can be computed fast

Branch Prediction Buffer
• Cache organization
  • Index: (instruction address >> 2) % 1024
  • 1024 entries
  • Each entry holds a 20-bit address tag (or is marked invalid) and a prediction bit.
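The indexing scheme can be sketched as follows. The sketch assumes 32-bit instruction addresses and 4-byte aligned instructions, so that 2 offset bits, 10 index bits, and 20 tag bits add up to the full address; the class and function names are hypothetical, and `None` models an invalid entry.

```python
ENTRIES = 1024


def bpb_index(addr: int) -> int:
    """Index into the buffer: drop the 2 alignment bits, keep 10 bits."""
    return (addr >> 2) % ENTRIES


def bpb_tag(addr: int) -> int:
    """Remaining upper 20 bits of an (assumed) 32-bit address."""
    return addr >> 12


class PredictionBuffer:
    def __init__(self):
        self.entries = [None] * ENTRIES  # None models an invalid entry

    def update(self, addr: int, taken: bool) -> None:
        self.entries[bpb_index(addr)] = (bpb_tag(addr), taken)

    def lookup(self, addr: int):
        """Return the stored prediction bit, or None on a miss."""
        entry = self.entries[bpb_index(addr)]
        if entry is None or entry[0] != bpb_tag(addr):
            return None
        return entry[1]


buf = PredictionBuffer()
buf.update(0x00400010, True)
print(bpb_index(0x00400010), bpb_tag(0x00400010))  # 4 1024
print(buf.lookup(0x00400010))  # True
print(buf.lookup(0x00401010))  # None: same index, different tag
```

The tag comparison prevents a jump from using the prediction of another jump that maps to the same entry.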

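Single Bit Predictor
• Single prediction bit
  • If the bit is set, the branch is predicted to be taken.
  • If the prediction is wrong, the bit is inverted.
[Figure: state diagram with the two states "Predict Taken" and "Predict Not Taken"; T (taken) leads to "Predict Taken", NT (not taken) leads to "Predict Not Taken"]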

Single Bit vs Double Bit Predictors
• The single bit predictor is suboptimal for nested loops.
• Besides the misprediction at the exit of the inner loop, the first iteration of the next execution of the inner loop is also mispredicted.
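The behavior of the single bit predictor on a nested loop can be reproduced with a few lines. This is a sketch with hypothetical function names; it assumes that the loop-closing branch of the inner loop is taken on every iteration except the last one, and that the prediction bit is initially cleared.

```python
def single_bit_mispredictions(outcomes, bit=False):
    """Count mispredictions of a single bit predictor.

    `bit` is the prediction bit (True = predict taken); it is inverted
    after every wrong prediction."""
    misses = 0
    for taken in outcomes:
        if taken != bit:
            misses += 1
            bit = taken
    return misses


def inner_loop_branch(outer_iterations, inner_iterations):
    """Outcomes of the inner loop's closing branch: taken until the last iteration."""
    return ([True] * (inner_iterations - 1) + [False]) * outer_iterations


outcomes = inner_loop_branch(outer_iterations=3, inner_iterations=4)
print(single_bit_mispredictions(outcomes))  # 6: two per execution of the inner loop
```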

Two Bit Predictor
• Two bits allow four states
  • strongly taken
  • weakly taken
  • weakly not taken
  • strongly not taken
• Starting from a strong state, two mispredictions are required to switch the prediction.

Two Bit Predictor
[Figure: state diagram with four states: (11) predict taken, (10) predict taken (weakly taken), (01) predict not taken (weakly not taken), (00) predict not taken; transitions are labeled T (taken) and NT (not taken)]

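Two-Bit Predictor with Saturation Scheme
• Counts the taken jumps
• If the counter value is >= 2, the jump is predicted to be taken
• Extensible to n bits
  • Experiments showed that more bits have no big impact.
[Figure: state diagram of the saturating counter with the states (11) predict taken, (10) predict taken, (01) predict not taken, (00) predict not taken; transitions are labeled T (taken) and NT (not taken)]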

Size of Prediction Buffer
[Figure: mispredictions (%) on SPEC 89 as a function of the size of the prediction buffer]

Correlation Predictors
• The prediction is also based on the history of other jumps.
• A simple two-bit predictor is not sufficient to predict the third branch.
• Taking the preceding jumps into account enables a correct prediction.
    if (aa==2) aa=0;
    if (bb==2) bb=0;
    if (aa!=bb) { … }

(m,n)-Predictors
• An (m,n)-predictor uses the history of the last m jumps to select one of $2^m$ n-bit predictors.
• Branch History Register (BHR)
  • m-bit shift register
  • Stores the global history of the last m jumps. Each bit records whether the jump was taken.
  • After each jump, the outcome is shifted into the BHR.
  • The BHR provides the index into the Pattern History Table (PHT).

(m,n) Predictors
• Example: (2,2) predictor
[Figure: the jump address selects an entry in the Pattern History Tables (PHTs, 2-bit predictors); the Branch History Register (BHR, a 2-bit shift register) selects which 2-bit predictor is used]
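An (m,2)-predictor can be sketched with a global shift register and one row of 2^m saturating counters per table index. The class name, the number of rows, the row indexing by `(addr >> 2) % rows`, and the initial counter value 0 are assumptions for illustration. With m=0 the sketch degenerates to a plain two-bit predictor, which allows a direct comparison on a jump whose outcome alternates.

```python
class CorrelatingPredictor:
    """(m,2)-predictor: the BHR selects one of 2**m two-bit counters per row."""

    def __init__(self, m, rows=1024):
        self.m = m
        self.rows = rows
        self.bhr = 0  # outcomes of the last m jumps, newest in bit 0
        self.pht = [[0] * (1 << m) for _ in range(rows)]

    def _row(self, addr):
        return self.pht[(addr >> 2) % self.rows]

    def predict(self, addr):
        return self._row(addr)[self.bhr] >= 2

    def update(self, addr, taken):
        row = self._row(addr)
        c = row[self.bhr]
        row[self.bhr] = min(c + 1, 3) if taken else max(c - 1, 0)
        # Shift the outcome into the BHR and keep only m bits.
        self.bhr = ((self.bhr << 1) | int(taken)) & ((1 << self.m) - 1)


def mispredictions(predictor, addr, outcomes):
    misses = []
    for taken in outcomes:
        misses.append(predictor.predict(addr) != taken)
        predictor.update(addr, taken)
    return misses


alternating = [k % 2 == 0 for k in range(40)]  # T, NT, T, NT, ...
with_history = mispredictions(CorrelatingPredictor(m=2), 0x400, alternating)
without_history = mispredictions(CorrelatingPredictor(m=0), 0x400, alternating)
print(sum(with_history[20:]), sum(without_history[20:]))  # 0 10
```

After a short warm-up the history bits identify the position in the pattern, so the (2,2) variant predicts the alternating jump correctly, while the plain two-bit counter keeps mispredicting every taken jump.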

Branch Target Buffer
• Also called Branch Target Address Cache
• Required if the target address is computed late in the pipeline.
• Stores the address of the jump instruction and the target address.
• Can be used in the IF phase.
• Can be combined with a predictor.
• Entry layout: address of jump instruction, target address, prediction bits

Branch Target Buffer (BTB)
• Prediction in IF
• Cycle i: send the PC to memory and to the BTB. Is an entry found in the BTB?
  • No: fetch the next instruction (cycle i+1). Is the instruction a taken branch?
    • No: normal instruction execution.
    • Yes: update the BTB, kill the fetched instructions, update the PC.
  • Yes: fetch the instruction at the target (cycle i+1). Is the branch taken?
    • Yes: branch correctly predicted; continue execution with no stalls (cycle i+2).
    • No: mispredicted branch; kill the fetched instructions, update the PC, delete the entry from the BTB.
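The decision flow above can be sketched in software. The class and method names are hypothetical; the sketch assumes 4-byte instructions, a fixed target per branch, and an unbounded dictionary instead of a cache of limited size.

```python
class BranchTargetBuffer:
    def __init__(self):
        self.entries = {}  # address of jump instruction -> target address

    def next_fetch_address(self, pc):
        """IF phase: fetch from the stored target on a hit, else sequentially."""
        return self.entries.get(pc, pc + 4)

    def resolve(self, pc, taken, target=None):
        """Called when the outcome is known (taken is False for non-branches).

        Returns the corrected fetch address after a misprediction,
        or None if execution can continue without killing instructions."""
        if pc in self.entries:
            if taken:
                return None            # correctly predicted, no stalls
            del self.entries[pc]       # mispredicted: delete the entry
            return pc + 4              # update the PC to the fall-through path
        if taken:
            self.entries[pc] = target  # taken branch not yet in the BTB
            return target              # update the PC to the target
        return None                    # normal instruction execution


btb = BranchTargetBuffer()
print(hex(btb.next_fetch_address(0x100)))    # 0x104: miss, sequential fetch
print(hex(btb.resolve(0x100, True, 0x200)))  # 0x200: entry created, refetch
print(hex(btb.next_fetch_address(0x100)))    # 0x200: hit, fetch at target
print(btb.resolve(0x100, True, 0x200))       # None: correctly predicted
```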
