
Lecture 9. MIPS Processor Design – Pipelined Processor Design #2

2010 R&E Computer System Education & Research. Lecture 9. MIPS Processor Design – Pipelined Processor Design #2. Prof. Taeweon Suh, Computer Science Education, Korea University.


Presentation Transcript


  1. 2010 R&E Computer System Education & Research Lecture 9. MIPS Processor Design – Pipelined Processor Design #2 Prof. Taeweon Suh Computer Science Education Korea University

  2. Pipelined Datapath

  3. Example for lw instruction: Instruction Fetch (IF)
     [Figure: pipelined datapath (PC, instruction memory, registers, sign extend, ALU, data memory) with IF/ID, ID/EX, EX/MEM, and MEM/WB pipeline registers, labeled "Instruction fetch"]

  4. Example for lw instruction: Instruction Decode (ID)
     [Figure: same pipelined datapath, labeled "Instruction decode"]

  5. Example for lw instruction: Execution (EX)
     [Figure: same pipelined datapath, labeled "Execution"]

  6. Example for lw instruction: Memory (MEM)
     [Figure: same pipelined datapath, labeled "Memory"]

  7. Example for lw instruction: Writeback (WB)
     [Figure: same pipelined datapath, labeled "Writeback"]

  8. Example for sw instruction: Memory (MEM)
     [Figure: same pipelined datapath, labeled "Memory"]

  9. Example for sw instruction: Writeback (WB): do nothing
     [Figure: same pipelined datapath, labeled "Writeback"]

  10. Corrected Datapath (for lw)
     [Figure: corrected pipelined datapath]

  11. Pipelining Example
     add $14, $5, $6
     lw $13, 24($1)
     add $12, $3, $4
     sub $11, $2, $3
     lw $10, 20($1)
     [Figure: the five instructions shown above the pipelined datapath]

  12. Pipeline Control
     Note: in this implementation, a branch instruction decides whether to branch in the MEM stage.

  13. Pipeline Control
     • We have 5 stages: IF, ID, EX, MEM, WB
     • What needs to be controlled in each stage?
       • Instruction fetch and PC increment
       • Instruction decode / operand fetch
       • Execution stage
         • RegDst
         • ALUOp[1:0]
         • ALUSrc
       • Memory stage
         • Branch
         • MemRead
         • MemWrite
       • Writeback
         • MemtoReg
         • RegWrite (note that this signal drives the register file, which is in the ID stage)

  14. Pipeline Control
     • Extend pipeline registers to include control information (created in ID)
     • Pass control signals along just like the data
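The idea of creating the control signals in ID and passing them along can be sketched in a few lines of Python. This is only an illustration, not the lecture's implementation: the `CONTROL` table and the `advance` helper are hypothetical names, the bit values follow the usual textbook table for this datapath, and don't-care entries are written as 0.

```python
# Control words per instruction class, grouped by the stage that consumes them.
# EX = (RegDst, ALUOp, ALUSrc), M = (Branch, MemRead, MemWrite), WB = (RegWrite, MemtoReg)
# Assumption: textbook-style values; don't-cares (RegDst/MemtoReg for sw, beq) are 0 here.
CONTROL = {
    "R-type": {"EX": (1, 0b10, 0), "M": (0, 0, 0), "WB": (1, 0)},
    "lw":     {"EX": (0, 0b00, 1), "M": (0, 1, 0), "WB": (1, 1)},
    "sw":     {"EX": (0, 0b00, 1), "M": (0, 0, 1), "WB": (0, 0)},
    "beq":    {"EX": (0, 0b01, 0), "M": (1, 0, 0), "WB": (0, 0)},
}


def advance(id_ex, ex_mem, mem_wb, decoded_kind):
    """Model one clock edge for the control part of the pipeline registers.

    Each register keeps only the signal groups that later stages still need:
    ID/EX holds EX+M+WB, EX/MEM holds M+WB, MEM/WB holds WB.
    """
    new_mem_wb = {"WB": ex_mem["WB"]} if ex_mem else None
    new_ex_mem = {"M": id_ex["M"], "WB": id_ex["WB"]} if id_ex else None
    new_id_ex = dict(CONTROL[decoded_kind]) if decoded_kind else None
    return new_id_ex, new_ex_mem, new_mem_wb


if __name__ == "__main__":
    regs = (None, None, None)
    for kind in ["lw", "R-type", None]:
        regs = advance(*regs, kind)
        print(regs)
```

Dropping the EX group after the EX stage and the M group after the MEM stage mirrors how the pipeline registers get narrower toward writeback.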

  15. Datapath with Control

  16. Datapath with Control
     IF: lw $10, 9($1)
     [Figure: pipelined datapath with control; signals PCSrc, Branch, RegWrite, ALUSrc, MemWrite, MemRead, MemtoReg, ALUOp, RegDst; control groups EX, M, and WB carried in the ID/EX, EX/MEM, and MEM/WB registers]

  17. Datapath with Control
     IF: sub $11, $2, $3
     ID: lw $10, 9($1)
     Control generated for “lw”: WB = 11, M = 010, EX = 0001

  18. Datapath with Control
     IF: and $12, $4, $5
     ID: sub $11, $2, $3
     EX: lw $10, 9($1)
     Control generated for “sub”: WB = 10, M = 000, EX = 1100

  19. Datapath with Control
     IF: or $13, $6, $7
     ID: and $12, $4, $5
     EX: sub $11, $2, $3
     MEM: lw $10, 9($1)
     Control generated for “and”: WB = 10, M = 000, EX = 1100

  20. Datapath with Control
     IF: add $14, $8, $9
     ID: or $13, $6, $7
     EX: and $12, $4, $5
     MEM: sub $11, ..
     WB: lw $10, 9($1)
     Control generated for “or”: WB = 10, M = 000, EX = 1100

  21. Datapath with Control
     IF: xxxx
     ID: add $14, $8, $9
     EX: or $13, $6, $7
     MEM: and $12…
     WB: sub $11, ..
     Control generated for “add”: WB = 10, M = 000, EX = 1100

  22. Datapath with Control
     IF: xxxx
     ID: xxxx
     EX: add $14, $8, $9
     MEM: or $13, ..
     WB: and $12…

  23. Datapath with Control
     IF: xxxx
     ID: xxxx
     EX: xxxx
     MEM: add $14, ..
     WB: or $13…

  24. Datapath with Control
     IF: xxxx
     ID: xxxx
     EX: xxxx
     MEM: xxxx
     WB: add $14..
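The cycle-by-cycle walkthrough above can be reproduced with a small Python sketch. It assumes an ideal pipeline with no stalls or flushes, and the `occupancy` function and its list-of-strings program format are hypothetical helpers for illustration only.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def occupancy(program):
    """Return one dict per clock cycle mapping each stage to its instruction (or None).

    Assumes one instruction enters IF per cycle and nothing ever stalls.
    """
    n_cycles = len(program) + len(STAGES) - 1
    table = []
    for cycle in range(n_cycles):
        row = {}
        for depth, stage in enumerate(STAGES):
            idx = cycle - depth  # the instruction that entered IF `depth` cycles ago
            row[stage] = program[idx] if 0 <= idx < len(program) else None
        table.append(row)
    return table


program = [
    "lw $10, 9($1)",
    "sub $11, $2, $3",
    "and $12, $4, $5",
    "or $13, $6, $7",
    "add $14, $8, $9",
]

if __name__ == "__main__":
    for cycle, row in enumerate(occupancy(program), start=1):
        cells = ", ".join(f"{s}: {row[s] or 'xxxx'}" for s in STAGES)
        print(f"cycle {cycle}: {cells}")
```

With five instructions and five stages the table has nine rows, matching the nine walkthrough slides from the first fetch of lw to the writeback of add.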

  25. Dependencies
     • Dependencies
       • Problem with starting (or executing) the next instruction before the first is finished
       • Dependencies incur data and control hazards

  26. Data Hazard - Software Solution
     • Data hazards
       • Dependencies that “go backward in time”
     • Have the compiler guarantee no hazards?
       • Insert nop (no operation) instructions (“0x00000000” is nop in MIPS)
       • Code scheduling
     • Where do we insert the “nops”?
         sub $2, $1, $3
         and $12, $2, $5
         or $13, $6, $2
         add $14, $2, $2
         sw $15, 100($2)
     • Problem?
       • This really slows us down!
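As a rough illustration of the software solution, the sketch below inserts nops so that every consumer is far enough behind its producer. It assumes a pipeline without forwarding; the `gap` parameter (how many instructions must separate a producer from a consumer) and the `(text, dest, sources)` tuple format are assumptions made for this example. A gap of 3 models a register file that is written and read on the same clock edge, and a gap of 2 models a register file that is written in the first half of the cycle and read in the second half.

```python
def insert_nops(program, gap):
    """Return the instruction list with nops inserted to satisfy `gap`.

    program: list of (text, dest, sources), registers given as integers.
    gap: minimum number of instructions between a register's writer and a reader.
    """
    out = []
    last_write = {}  # register number -> index in `out` of its most recent writer
    for text, dest, sources in program:
        for reg in sources:
            if reg in last_write:
                while len(out) - last_write[reg] - 1 < gap:
                    out.append("nop")
        if dest is not None and dest != 0:  # $0 is hard-wired to zero
            last_write[dest] = len(out)
        out.append(text)
    return out


program = [
    ("sub $2, $1, $3", 2, [1, 3]),
    ("and $12, $2, $5", 12, [2, 5]),
    ("or $13, $6, $2", 13, [6, 2]),
    ("add $14, $2, $2", 14, [2, 2]),
    ("sw $15, 100($2)", None, [15, 2]),
]

if __name__ == "__main__":
    for line in insert_nops(program, gap=3):
        print(line)
```

For this sequence all the nops land between sub and and, because every later instruction is already far enough from the write of $2.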

  27. Data Hazard - Pipeline Stalls?
     sub $2, $1, $3
     and $12, $2, $5
     or $13, $6, $2
     add $14, $2, $2
     sw $15, 100($2)
     [Figure: pipeline diagram (IM, Reg, DM, Reg per instruction) in which three stall cycles (bubbles) are inserted after sub $2, $1, $3 before and $12, $2, $5 proceeds]

  28. Data Hazard - Forwarding
     • Use temporary results; don’t wait for them to be written
       • Register file forwarding to handle a read and a write to the same register
       • ALU forwarding
     • OK, then do we have to do this forwarding?
       • If you are asked to design the CPU using only the rising edge of the clock, then?
         • Let’s stick to this for our project
       • If the register file write occurs in the first half of the clock cycle and the read occurs in the second half, then?
         • Our textbook follows this

  29. Forwarding (simplified)
     [Figure: ID/EX, EX/MEM, and MEM/WB pipeline registers with the register file, ALU, data memory, and a mux]

  30. Forwarding (from EX/MEM)
     [Figure: same diagram with muxes added for forwarding from EX/MEM]

  31. Forwarding (from MEM/WB)
     [Figure: same diagram with muxes added for forwarding from MEM/WB]

  32. Forwarding (operand selection)
     [Figure: same diagram with a forwarding unit added for the muxes]

  33. Forwarding (operand propagation)
     [Figure: same diagram with Rs, Rt, and Rd carried along; EX/MEM Rd and MEM/WB Rd are inputs to the forwarding unit]

  34. Forwarding
     [Figure: full pipelined datapath with control and forwarding unit; labeled signals: IF/ID.RegisterRs (Rs), IF/ID.RegisterRt (Rt), IF/ID.RegisterRd (Rd), EX/MEM.RegisterRd, MEM/WB.RegisterRd]
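The forwarding slides show only the wiring, so the selection logic below is a sketch under assumptions: it follows the commonly taught conditions (forward from EX/MEM or MEM/WB when that instruction writes a non-zero register matching the operand, with the more recent EX/MEM result taking priority). The function name `forward_select` and its string return values are hypothetical.

```python
def forward_select(src, ex_mem_reg_write, ex_mem_rd, mem_wb_reg_write, mem_wb_rd):
    """Choose the source for one ALU operand of the instruction in EX.

    src: register number (Rs or Rt) read by the instruction in EX.
    Returns "EX/MEM", "MEM/WB", or "ID/EX" (no forwarding, use the value read in ID).
    """
    # The EX/MEM result is the most recent one, so it is checked first.
    if ex_mem_reg_write and ex_mem_rd != 0 and ex_mem_rd == src:
        return "EX/MEM"
    if mem_wb_reg_write and mem_wb_rd != 0 and mem_wb_rd == src:
        return "MEM/WB"
    return "ID/EX"


if __name__ == "__main__":
    # sub $2, $1, $3 is in MEM while and $12, $2, $5 is in EX.
    print(forward_select(2, True, 2, False, 0))   # Rs of "and"
    print(forward_select(5, True, 2, False, 0))   # Rt of "and"
    # One cycle later: "and" is in MEM, "sub" is in WB, or $13, $6, $2 is in EX.
    print(forward_select(2, True, 12, True, 2))   # Rt of "or"
```

The function is called once per ALU operand, which corresponds to the two forwarding muxes in front of the ALU.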

  35. Can't always forward
     • lw (load word) can still cause a hazard
       • An instruction tries to read a register following a load instruction that writes to the same register
     • Thus, we need a hazard detection unit to “stall” the pipeline after the load instruction

  36. Stalling
     • We can stall the pipeline by keeping an instruction in the same stage
     [Figure: pipeline diagram in which the IF and ID stages are repeated for the stalled instructions]

  37. Hazard Detection Unit
     • Stall by letting an instruction that won’t write anything go forward
     • Stall the pipeline if the instruction in ID/EX is a load and its rt equals IF/ID.rs or IF/ID.rt
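The stall condition on the slide translates almost directly into code. The sketch below is a minimal model: the output names `pc_write`, `if_id_write`, and `insert_bubble` are assumptions for this example, standing for "hold the PC", "hold IF/ID", and "send an instruction that writes nothing into ID/EX".

```python
def hazard_detect(id_ex_mem_read, id_ex_rt, if_id_rs, if_id_rt):
    """Detect a load-use hazard between the load in EX and the instruction in ID.

    id_ex_mem_read: True if the instruction in ID/EX is a load.
    id_ex_rt: destination register of that load.
    if_id_rs, if_id_rt: source register fields of the instruction in IF/ID.
    """
    stall = bool(id_ex_mem_read) and id_ex_rt in (if_id_rs, if_id_rt)
    return {
        "pc_write": not stall,      # hold the PC while stalling
        "if_id_write": not stall,   # keep the same instruction in ID
        "insert_bubble": stall,     # zero the control signals entering ID/EX
    }


if __name__ == "__main__":
    # lw $2, 20($1) followed immediately by and $4, $2, $5
    print(hazard_detect(True, 2, 2, 5))
    # Same consumer, but the instruction in EX is not a load
    print(hazard_detect(False, 2, 2, 5))
```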

  38. Control Hazards - Branch
     • When we decide to branch, other instructions are in the pipeline!
     • Assume the branch is not taken
       • When this assumption fails, flush 3 instructions
     • We are predicting “branch not taken”
       • Need to add hardware for flushing instructions if we are wrong

  39. Alleviate Branch Hazards
     • Move the branch compare to the ID stage of the pipeline
     • Add an adder to calculate the branch target in the ID stage
     • Add an IF.flush signal that zeros (squashes) the instruction in the IF/ID pipeline register
     • Reduce the penalty to 1 cycle
     [Figure: pipeline diagram for beq $1,$2,L1, followed by add $1,$2,$3 (turned into a bubble) and the target L1: sub $1,$2,$3; annotations mark where the taken target address is known and where the actual condition is generated]

  40. Flushing Instructions
     [Figure: pipelined datapath with IF.Flush, hazard detection unit, forwarding unit, and a register comparator (=) in the ID stage]

  41. Flushing Instructions (cycle N)
         beq $1, $3, L2
         and $12, $2, $5
         or $13, $12, $1
         …
     L2: lw $4, 40($7)
     In the pipeline: beq $1, $3, L2 and and $12, $2, $5
     [Figure: same datapath as slide 40]

  42. Flushing Instructions (cycle N)
         beq $1, $3, L2
         and $12, $2, $5
         or $13, $12, $1
         …
     L2: lw $4, 40($7)
     In the pipeline: beq $1, $3, L2 and and $12, $2, $5; the PC input is marked L2
     [Figure: same datapath as slide 40]

  43. Flushing Instructions (cycle N+1)
         beq $1, $3, L2
         and $12, $2, $5
         or $13, $12, $1
         …
     L2: lw $4, 40($7)
     In the pipeline: lw $4, 40($7), beq $1, $3, L2, and a nop
     [Figure: same datapath as slide 40]

  44. Improving Performance
     • Try to avoid stalls! E.g., reorder these instructions:
         lw $t0, 0($t1)
         lw $t2, 4($t1)
         sw $t2, 0($t1)
         sw $t0, 4($t1)
     • Add a “branch delay slot”
       • The next instruction after a branch is always executed
       • Rely on the compiler to “fill” the slot with something useful
     • Superscalar
       • Start more than one instruction in the same cycle
     • Almost all processors are now pipelined and superscalar
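For the lw/sw sequence in the first bullet, the effect of reordering can be checked with a small sketch. It counts load-use stalls using the hazard rule from the hazard detection slide (a load followed immediately by an instruction that reads the loaded register). The `(op, dest, sources)` tuples and the `load_use_stalls` helper are hypothetical, and the reordered version shown is one possible schedule: swapping the two stores is safe here because they write different addresses (0 and 4 off the same base).

```python
def load_use_stalls(program):
    """Count instructions that read a register loaded by the immediately preceding lw."""
    stalls = 0
    for prev, cur in zip(program, program[1:]):
        prev_op, prev_dest, _ = prev
        _, _, cur_sources = cur
        if prev_op == "lw" and prev_dest in cur_sources:
            stalls += 1
    return stalls


# (op, destination register, source registers); sw reads both its data and base registers.
original = [
    ("lw", "$t0", ["$t1"]),          # lw $t0, 0($t1)
    ("lw", "$t2", ["$t1"]),          # lw $t2, 4($t1)
    ("sw", None, ["$t2", "$t1"]),    # sw $t2, 0($t1)
    ("sw", None, ["$t0", "$t1"]),    # sw $t0, 4($t1)
]

# One possible reordering: swap the two stores.
reordered = [original[0], original[1], original[3], original[2]]

if __name__ == "__main__":
    print("stalls before:", load_use_stalls(original))
    print("stalls after: ", load_use_stalls(reordered))
```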

  45. Dynamic Scheduling
     • The hardware performs the “scheduling”
       • Hardware tries to find instructions to execute
       • Out-of-order (OOO) execution is possible
       • Speculative execution and dynamic branch prediction
     • All modern processors are very complicated
       • DEC Alpha 21264: 9-stage pipeline, 6-instruction issue
       • PowerPC and Pentium: branch history table
       • Compiler technology is important
     • This class has given you the background you need to learn more

  46. Exceptions & Interrupts
     • The CPU has to prepare for all possible situations it could face
     • “Unexpected” events require a change in the flow of control
     • Exceptions arise within the CPU
       • Undefined opcode
       • Arithmetic overflow in MIPS
         • Some other architectures (such as x86 and ARM) do not generate an exception on arithmetic overflow. Instead, they set bits of the flag register inside the CPU
     • Interrupts come from external I/O devices
       • Keyboard, mouse, network card, etc.
     • Many architectures and authors do not distinguish between interrupts and exceptions
       • They often use the term “interrupt” to refer to both types of events

  47. Pipelined Performance Example
     • Ideally CPI = 1
     • But we need to handle stalling (caused by loads and branches)
     • SPECINT2000 benchmark:
       • 25% loads
       • 10% stores
       • 11% branches
       • 2% jumps
       • 52% R-type
     • Suppose
       • 40% of loads are used by the next instruction
       • 25% of branches are mispredicted
     • What is the average CPI?

  48. Pipelined Performance Example
     • SPECINT2000 benchmark:
       • 25% loads
       • 10% stores
       • 11% branches
       • 2% jumps
       • 52% R-type
     • If there is no stall in the pipelined MIPS, how would you calculate CPI?
       • Average CPI = (0.25)(1) + (0.10)(1) + (0.11)(1) + (0.02)(1) + (0.52)(1) = 1
     • Suppose
       • 40% of loads are used by the next instruction
       • 25% of branches are mispredicted
       • All jumps flush the next instruction
     • What is the average CPI?
       • Load/branch CPI = 1 when not stalling, 2 when stalling. Thus
         • CPI_lw = 1(0.6) + 2(0.4) = 1.4
         • CPI_beq = 1(0.75) + 2(0.25) = 1.25
         • CPI_jump = 2(1) = 2
       • Average CPI = (0.25)(1.4) + (0.10)(1) + (0.11)(1.25) + (0.02)(2) + (0.52)(1) = 1.1475 ≈ 1.15
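The CPI arithmetic above can be checked with a few lines of Python. The dictionary names `mix` and `cpi` are just labels chosen for this sketch; the numbers are the ones given on the slide.

```python
# Instruction mix from the SPECINT2000 example.
mix = {"load": 0.25, "store": 0.10, "branch": 0.11, "jump": 0.02, "rtype": 0.52}

# Per-class CPI: 1 cycle normally, 2 cycles when a stall or flush occurs.
cpi = {
    "load": 1 * 0.60 + 2 * 0.40,    # 40% of loads are used by the next instruction
    "store": 1.0,
    "branch": 1 * 0.75 + 2 * 0.25,  # 25% of branches are mispredicted
    "jump": 2.0,                    # every jump flushes the next instruction
    "rtype": 1.0,
}

# Weighted average over the instruction mix.
average_cpi = sum(mix[k] * cpi[k] for k in mix)

if __name__ == "__main__":
    print(f"average CPI = {average_cpi:.4f}")
```

Changing the stall fractions in `cpi` shows how sensitive the average is to load-use and branch behavior.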

  49. Pipelined Performance
     • Critical path of the pipelined MIPS processor:
         T_c = max {
           t_pcq + t_mem + t_setup,                                   // IF stage
           2(t_RFread + t_mux + t_eq + t_AND + t_mux + t_setup),      // ID stage
           t_pcq + t_mux + t_mux + t_ALU + t_setup,                   // EX stage
           t_pcq + t_memwrite + t_setup,                              // MEM stage
           2(t_pcq + t_mux + t_RFwrite)                               // WB stage
         }
     • Where does this “2” come from?
       • If you are asked to design the CPU using only the rising edge of the clock, then?
         • Let’s stick to this for our project
       • If the register file write occurs in the first half of the clock cycle and the read occurs in the second half, then?
         • Our textbook follows this

  50. Pipelined Performance Example
         T_c = 2(t_RFread + t_mux + t_eq + t_AND + t_mux + t_setup)
             = 2[150 + 25 + 40 + 15 + 25 + 20] ps
             = 550 ps
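The same cycle-time calculation, written as a short Python check. The delay values are the ones in the example; the dictionary keys are names chosen for this sketch, and the clock frequency line is simply the reciprocal of the computed period.

```python
# Element delays in picoseconds, taken from the example.
delays_ps = {
    "t_RFread": 150,
    "t_mux": 25,
    "t_eq": 40,
    "t_AND": 15,
    "t_setup": 20,
}

# ID-stage path: register read, mux, equality compare, AND, mux, setup.
id_path_ps = (
    delays_ps["t_RFread"]
    + delays_ps["t_mux"]
    + delays_ps["t_eq"]
    + delays_ps["t_AND"]
    + delays_ps["t_mux"]
    + delays_ps["t_setup"]
)

# The factor of 2 comes from the critical-path formula for the ID stage.
cycle_time_ps = 2 * id_path_ps

# 1 / (period in ps) gives THz; multiply by 1000 for GHz.
clock_ghz = 1000.0 / cycle_time_ps

if __name__ == "__main__":
    print(f"T_c = {cycle_time_ps} ps, f = {clock_ghz:.2f} GHz")
```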
