Intro to Pipelining
430 likes | 571 Vues
Intro to Pipelining. Chapter 6 P&H. Pipelining. Multiple instructions are overlapped in execution Key to making modern processors fast Example – laundry wash load of clothes in machine Dry load of washing in dryer Fold dry load of washing Flatmate puts clothes away. 1. 6. P. M. 7. 8.
Intro to Pipelining
E N D
Presentation Transcript
Intro to Pipelining Chapter 6 P&H
Pipelining • Multiple instructions are overlapped in execution • Key to making modern processors fast • Example – laundry • wash load of clothes in machine • Dry load of washing in dryer • Fold dry load of washing • Flatmate puts clothes away
1 6 P M 7 8 9 1 0 1 1 1 2 2 A M T i m e T a s k o r d e r A B C D Example – laundry
1 6 P M 7 8 9 1 0 1 1 1 2 2 A M T i m e T a s k o r d e r A B C D Example – pipelined laundry
Pipelined CPU • Can apply same principles to CPU Design • MIPS instructions classically take 5 steps: • Fetch instructions from memory • Decode instruction and read registers • Execute the operation or calculate address • Access operand in data memory • Write result to a register
P r o g r a m 2 4 6 8 1 0 1 2 1 4 1 6 1 8 e x e c u t i o n T i m e o r d e r ( i n i n s t r u c t i o n s ) I n s t r u c t i o n D a t a l w $ 1 , 1 0 0 ( $ 0 ) R e g A L U R e g f e t c h a c c e s s I n s t r u c t i o n D a t a l w $ 2 , 2 0 0 ( $ 0 ) R e g A L U R e g 8 n s f e t c h a c c e s s I n s t r u c t i o n l w $ 3 , 3 0 0 ( $ 0 ) 8 n s f e t c h . . . 8 n s 1 4 2 4 6 8 1 0 1 2 T i m e I n s t r u c t i o n D a t a l w $ 1 , 1 0 0 ( $ 0 ) R e g A L U R e g f e t c h a c c e s s I n s t r u c t i o n D a t a l w $ 2 , 2 0 0 ( $ 0 ) 2 n s R e g A L U R e g f e t c h a c c e s s l w $ 3 , 3 0 0 ( $ 0 ) Example MIPS CPU P r o g r a m e x e c u t i o n o r d e r ( i n i n s t r u c t i o n s ) I n s t r u c t i o n D a t a 2 n s R e g A L U R e g f e t c h a c c e s s 2 n s 2 n s 2 n s 2 n s 2 n s
Designing Instruction sets for pipelining • All MIPS instructions are the same length • Only a few instruction formats • Source register fields in same place • Memory operands only appear in loads and stores • Can calculate address in execute stage • Operands must be aligned in memory
I F : I n s t r u c t i o n f e t c h I D : I n s t r u c t i o n d e c o d e / E X : E x e c u t e / M E M : M e m o r y a c c e s s W B : W r i t e b a c k r e g i s t e r f i l e r e a d a d d r e s s c a l c u l a t i o n 0 M u x 1 A d d A d d 4 A d d r e s u l t S h i f t l e f t 2 R e a d r e g i s t e r 1 A d d r e s s P C R e a d d a t a 1 R e a d Z e r o r e g i s t e r 2 I n s t r u c t i o n R e g i s t e r s A L U R e a d A L U 0 R e a d W r i t e d a t a 2 r e s u l t A d d r e s s 1 d a t a r e g i s t e r M I n s t r u c t i o n M u D a t a u m e m o r y W r i t e x m e m o r y x d a t a 1 0 W r i t e d a t a 1 6 3 2 S i g n e x t e n d Single Cycle Datapath
Pipeline Reg naming conventions • Named by two stages separated by that register e.g. ID/EX • Work trough stages of executing a store instruction • Instruction Fetch • PC + 4 -> IF/ID • PC + 4 -> PC • Imem[PC] -> IF/ID
LW instruction • Instruction Decode • IF/ID supplies 16 bit Immed which is signed extended -> ID/EX • Two regs are read -> ID/EX • PC + 4 from IF/ID -> ID/EX • Execute or address calculation • Contents reg and signed extended immediate added -> Ex/Mem • Memory Access • Address in EX/Mem used to get data value from data mem -> Mem/WB • Write back • Mem/Wb data val -> reg file
LW instruction • Any information that might be needed in a later stage must be passed via the pipeline registers • Each resource can only be used in a single pipeline stage (otherwise have structural hazard) • Bug in previous data-path • Which instruction provides address for destination register in lw • Dest reg numb needs to be passed through pipeline registers
0 M u x 1 I F / I D I D / E X E X / M E M M E M / W B A d d A d d 4 A d d r e s u l t S h i f t l e f t 2 n R e a d o i t r e g i s t e r 1 A d d r e s s P C c R e a d u r d a t a 1 t R e a d s n Z e r o I r e g i s t e r 2 I n s t r u c t i o n R e g i s t e r s A L U R e a d A L U m e m o r y 0 R e a d W r i t e A d d r e s s d a t a 2 r e s u l t 1 d a t a r e g i s t e r M M D a t a u u W r i t e x m e m o r y x d a t a 1 0 W r i t e d a t a 1 6 3 2 S i g n e x t e n d Updated Datapath
Pipelined Control • Start off by ignoring hazards
P C S r c 0 M u x 1 I F / I D I D / E X E X / M E M M E M / W B A d d A d d A d d 4 r e s u l t B r a n c h S h i f t R e g W r i t e l e f t 2 n R e a d M e m W r i t e o i r e g i s t e r 1 t P C A d d r e s s R e a d c u d a t a 1 r t R e a d A L U S r c s M e m t o R e g n Z e Z r e o r o r e g i s t e r 2 I I n s t r u c t i o n R e g i s t e r s A L U R e a d A L U m e m o r y 0 R e a d W r i t e d a t a 2 A d d r e s s r e s u l t 1 r e g i s t e r M d a t a M u D a t a u W r i t e x m e m o r y x d a t a 1 0 W r i t e d a t a I n s t r u c t i o n 6 1 6 [ 1 5 – 0 ] 3 2 S i g n A L U e x t e n d M e m R e a d c o n t r o l I n s t r u c t i o n [ 2 0 – 1 6 ] 0 M A L U O p I n s t r u c t i o n u [ 1 5 – 1 1 ] x 1 R e g D s t Datapath showing control signals
P C S r c I D / E X 0 M W B u E X / M E M x 1 C o n t r o l M W B M E M / W B E X M W B I F / I D A d d A d d 4 A d d e r e s u l t t i r B r a n c h W S h i f t g e e l e f t 2 t i R r W A L U S r c m g n R e a d e e o i M R t r e g i s t e r 1 A d d r e s s P C c o R e a d t u r m d a t a 1 t s R e a d e n Z e r o M I r e g i s t e r 2 I n s t r u c t i o n R e g i s t e r s A L U R e a d A L U m e m o r y 0 R e a d W r i t e d a t a 2 A d d r e s s r e s u l t 1 d a t a r e g i s t e r M M D a t a u u m e m o r y W r i t e x x d a t a 1 0 W r i t e d a t a I n s t r u c t i o n 1 6 3 2 6 [ 1 5 – 0 ] S i g n A L U M e m R e a d e x t e n d c o n t r o l I n s t r u c t i o n [ 2 0 – 1 6 ] 0 A L U O p M u I n s t r u c t i o n x [ 1 5 – 1 1 ] 1 R e g D s t Updated Datapath
Pipeline Hazards • Situations in pipelining where next instruction cannot execute in following clock cycle
Structural Hazards • Hardware cannot support combination of instructions we want to execute in same cycle • E.g. If used a washer/dyer combination or flatmate doing something else • E.g. MIPS Only a single memory available
Control Hazards • Need to make a decision based on the results of one instruction while it is still executing • Example – The branch instruction
Control Hazards • Can either: • Stall • Bubble(s) inserted into pipeline • Predict • Predict outcome of branch • Stall if prediction wrong • Delay branch • Execute instruction after branch regardless of whether branch is taken or not
Branch Prediction • Assume all branches will fail • Assume backwards branches always succeed while forward branches always fail • Use a dynamic branch predictor • Uses a table to record the outcomes of previous time branch instructions were executed • Gets about 90% accuracy • Longer pipelines amplify problem
P r o g r a m e x e c u t i o n 1 4 2 4 6 8 1 0 1 2 o r d e r T i m e ( i n i n s t r u c t i o n s ) I n s t r u c t i o n D a t a b e q $ 1 , $ 2 , 4 0 R e g A L U R e g f e t c h a c c e s s I n s t r u c t i o n D a t a a d d $ 4 , $ 5 , $ 6 R e g A L U R e g f e t c h a c c e s s 2 n s ( D e l a y e d b r a n c h s l o t ) I n s t r u c t i o n D a t a l w $ 3 , 3 0 0 ( $ 0 ) R e g A L U R e g f e t c h a c c e s s 2 n s 2 n s Delayed Branches • Used on MIPS R3000 processors
Data Hazards • Instruction depends on outcome of a previous instruction still in the pipeline • E.g.
P r o g r a m e x e c u t i o n 2 4 6 8 1 0 o r d e r T i m e ( i n i n s t r u c t i o n s ) a d d $ s 0 , $ t 0 , $ t 1 I F I D E X M E M W B s u b $ t 2 , $ s 0 , $ t 3 M E M I F I D E X M E M W B Forwarding
2 4 6 8 1 0 1 2 1 4 T i m e P r o g r a m e x e c u t i o n o r d e r ( i n i n s t r u c t i o n s ) I F I D l w $ s 0 , 2 0 ( $ t 1 ) E X M E M W B b u b b l e b u b b l e b u b b l e b u b b l e b u b b l e s u b $ t 2 , $ s 0 , $ t 3 I F I D W B E X M E M Forwarding
Data Hazards and Forwarding • Consider execution on following sequence of instructions: Sub $2, $1, $3 And $12, $2, $5 Or $13, $6, $2 Add $14, $2, $2 Sw $15, 100($2)
T i m e ( i n c l o c k c y c l e s ) C C 1 C C 2 C C 3 C C 4 C C 5 C C 6 C C 7 C C 8 C C 9 V a l u e o f r e g i s t e r $ 2 : 1 0 1 0 1 0 1 0 1 0 / – 2 0 – 2 0 – 2 0 – 2 0 – 2 0 P r o g r a m e x e c u t i o n o r d e r ( i n i n s t r u c t i o n s ) R e g s u b $ 2 , $ 1 , $ 3 I M R e g D M a n d $ 1 2 , $ 2 , $ 5 I M D M R e g R e g I M D M R e g o r $ 1 3 , $ 6 , $ 2 R e g a d d $ 1 4 , $ 2 , $ 2 I M D M R e g R e g s w $ 1 5 , 1 0 0 ( $ 2 ) I M D M R e g R e g Pipelined Dependencies
T i m e ( i n c l o c k c y c l e s ) C C 1 C C 2 C C 3 C C 4 C C 5 C C 6 C C 7 C C 8 C C 9 V a l u e o f r e g i s t e r $ 2 : 1 0 1 0 1 0 1 0 1 0 / – 2 0 – 2 0 – 2 0 – 2 0 – 2 0 V a l u e o f E X / M E M : X X X – 2 0 X X X X X V a l u e o f M E M / W B : X X X X – 2 0 X X X X P r o g r a m e x e c u t i o n o r d e r ( i n i n s t r u c t i o n s ) s u b $ 2 , $ 1 , $ 3 I M R e g D M R e g a n d $ 1 2 , $ 2 , $ 5 I M R e g D M R e g o r $ 1 3 , $ 6 , $ 2 I M R e g D M R e g a d d $ 1 4 , $ 2 , $ 2 I M R e g D M R e g s w $ 1 5 , 1 0 0 ( $ 2 ) I M R e g D M R e g
Hazard notation • IN our previous example the two pairs of hazard conditions are • 1a. EX/Mem.RegisterRd = ID/EX.RegisterRs • 1b. EX/Mem.RegisterRd = ID/EX.RegisterRt • 2a. Mem/WB.RegisterRd = ID/EX.RegisterRs • 2b. Mem/WB.RegisterRd = ID/EX.RegisterRs • Hazard on $2 between the sub and and instructions is: • EX/Mem.RegisterRd = ID/EX.RegisterRs = $2
Hazards • Above notation will do forwarding unnecessarily as some instruction do not write registers • Soln: can check WB control to check if register writes back • What about $0 • Do not forward as should always be zero
I D / E X E X / M E M M E M / W B R e g i s t e r s A L U D a t a m e m o r y M u x a . N o f o r w a r d i n g No Forwarding
I D / E X E X / M E M M E M / W B M u x R e g i s t e r s F o r w a r d A A L U M D a t a u m e m o r y M x u x R s F o r w a r d B R t R t M E X / M E M . R e g i s t e r R d u R d x F o r w a r d i n g M E M / W B . R e g i s t e r R d u n i t b . W i t h f o r w a r d i n g Forwarding
EX Hazard • If (EX/Mem.RegWrite) and (EX/Mem.RegRd != 0) and (EM/Mem.RegRd = ID/EX.RegRs) thenForwardA = 10 • If (EX/Mem.RegWrite) and (EX/Mem.RegRd != 0) and (EM/Mem.RegRd = ID/EX.RegRt) thenForwardB = 10
Mem Hazard • If (Mem/WB.RegWrite) and (Mem/WB.RegRd != 0) and (Mem/WB.RegRd = ID/EX.RegRs) thenForwardA = 01If (Mem/WB.RegWrite) and (Mem/WB.RegRd != 0) and (Mem/WB.RegRd = ID/EX.RegRt) thenForwardB = 01
What about: Add $1, $1, $2 Add $1, $1, $3 Add $1, $1, $4 • If (Mem/WB.RegWrite) and (Mem/WB.RegRd != 0) and (EX/Mem.RegRd != ID/EX.RegRs) and (Mem/WB.RegRd = ID/EX.RegRs) thenForwardA = 01If (Mem/WB.RegWrite) and (Mem/WB.RegRd != 0) and (EX/Mem.RegRd != ID/EX.RegRt) and (Mem/WB.RegRd = ID/EX.RegRt) thenForwardB = 01
I D / E X W B E X / M E M M W B C o n t r o l M E M / W B E X M W B I F / I D M n o i u t c x u r t s R e g i s t e r s n I D a t a I n s t r u c t i o n A L U P C m e m o r y M m e m o r y u x M u x I F / I D . R e g i s t e r R s R s I F / I D . R e g i s t e r R t R t I F / I D . R e g i s t e r R t R t M E X / M E M . R e g i s t e r R d u I F / I D . R e g i s t e r R d R d x F o r w a r d i n g M E M / W B . R e g i s t e r R d u n i t Updated datapath to resolve data hazards
Data Hazards and Stalls • Forwarding will not work where an instruction tries to read a register following a load instruction that writes the same register
Stalls • Hazard detection unit required at the ID stage • If (ID/EX.MemRead) and (ID/EX.RegRt = IF/ID.RegRs) or (ID/EX.RegRt = IF/ID.RegRt thenstall the pipeline