1 / 19

Pipelined b Processor

Pipelined b Processor. Recap: Unpipelined b. ILL OP. JT. Instruction Memory. XAdr. A. D. 4 3 2 1 0. PCSEL. ra<20:16>. rb<15:11>. rc<25:21>. 00. PC. WASEL. 0. 1. RA2SEL. XP. 1. +4. RA1. RA2. WD. WA. Register File. 0. WERF. WE. rc<25:21>. RD1.

tracey
Télécharger la présentation

Pipelined b Processor

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Pipelined b Processor

  2. Recap: Unpipelined b ILL OP JT Instruction Memory XAdr A D 4 3 2 1 0 PCSEL ra<20:16> rb<15:11> rc<25:21> 00 PC WASEL 0 1 RA2SEL XP 1 +4 RA1 RA2 WD WA Register File 0 WERF WE rc<25:21> RD1 RD2 <PC> + 4 + C<15:0> <<2 IRQ Z Z JT Wr C<15:0> sign extended Control Logic WD <PC> + 4 + 4C R/W 0 1 1 0 BSEL ASEL PCSEL RA2SEL ASEL BSELWDSELALUFN Wr WERF WASEL Data Memory A A B ALUFN ALU RD 1 0 2 WDSEL What can we do to make it run FASTER?

  3. CPU Performance To increase MIPS: • Decrease CPI • Instruction set simplicity reduces CPI to 1.0 • To reduce CPI below 1.0 need multiple instruction issue machines (stay tuned!) • Increase Frequency • Frequency limited by longest combinational path from register outputs to register inputs • Can therefore increase frequency by pipelining MIPS = Frequency in MHz . Clocks Per Instruction (CPI)

  4. Pipeline Stages Goal: Maintain (nearly) 1.0 CPI, but increase clock speed Approach: Structure processor as 4-stage pipeline • Instruction Fetch: Maintains PC. Fetches one instruction per cycle. • Register File: Reads source operands from register file. • ALU: Performs indicated operation. • Write-Back: Writes result back into register file. IF RF ALU WB

  5. Sketch of 4-Stage Pipeline IF Instruction Fetch instruction Register File RF (read) CL instruction A B ALU ALU CL instruction Y Write Back Same RF as above! CL RF (write) Need to pass through the stages for branch and jump instructions

  6. 4-Stage b Pipeline ILL OP JT XAdr • Omits some detail • No bypass or interlock logic 4 3 2 1 0 PCSEL Instruction Memory A 00 PCIF D IF +4 PCRF IRRF ra<20:16> rb<15:11> rc<25:21> + 0 1 RA2SEL RA1 RA2 C<15:0> <<2 Register File RD1 RD2 Z <PC> + 4 + 4C JT RF C<15:0> BSEL 0 ASEL 1 1 0 PCALU IRALU A B DALU A B ALU ALU ALUFN PCWB IRWB Y DWB WD A XP Data Memory rc<25:21> RD WB R/W 1 0 1 0 2 WASEL WDSEL WA WD Wr WERF Register File

  7. ... SUBC XOR MUL ... ADDC SUBC XOR MUL ... ADDC SUBC XOR MUL ADDC SUBC XOR MUL 4-Pipeline Execution Consider a sequence of instructions … ADDC(r1, 1, r2) SUBC(r1, 1, r3) XOR(r1, r5, r1) MUL(r2, r5, r0) … executed on the 4-stage pipeline: TIME (cycles) i i + 1 i + 2 i + 3 i + 4 i + 5 i + 6 IF ADDC Pipeline RF ALU WB

  8. Branch Execution Consider a different sequence: LOOP: CMPLEC(r3, 100, r0) ADD(r1, r2, r3) SUB(r1, r2, r4) BNE(r0, LOOP) XOR(r31, r31, r3) … i i + 1 i + 2 i + 3 i + 4 i + 5 i + 6 IF CMP ADD SUB BNE ? RF CMP ADD SUB BNE ALU CMP ADD SUB BNE WB CMP ADD SUB BNE What instruction should we fetch after BNE ?

  9. Branch Execution - II i i + 1 i + 2 i + 3 i + 4 i + 5 i + 6 IF CMP ADD SUB BNE ? RF CMP ADD SUB BNE ALU CMP ADD SUB BNE WB CMP ADD SUB BNE 4 3 2 1 0 PCSEL Instruction Memory A 00 PCIF IF D XOR +4 PCRF IRRF + 0 1 RA2SEL <PC> + 4 + 4C RF RA1 RA2 C<15:0> <<2 Register File BNE RD1 RD2 Z PCIF will output correct address in clock cycle .

  10. Branch Delay Slots Problem: One (or more) following instructions have been prefetched by the time a branch is taken. • Possible solutions are: • “Program around it” • Follow each branch instruction with a NOP = ADD(r31, r31, r31) instruction. • Make the compiler clever enough to move useful instructions after branches. These instructions will be executed regardless of whether the branch is taken or not. • Make pipeline “annul” instruction following branch which is taken, e.g., by disabling RF write and Memory write.

  11. Annulling Prefetched Instructions ILL OP JT XAdr 4 3 2 1 0 PCSEL Instruction Memory A 00 PCIF D NOP +4 XORC 0 1 2 AnnulIF PCRF IRRF BNE ra<20:16> rb<15:11> rc<25:21> + 0 1 RA2SEL <PC> + 4 + 4C RA1 RA2 C<15:0> <<2 Register File RD1 RD2 Z JT PCALU IRALU ADD AnnulIF = 1 if branch is taken, i.e., PCSEL = 1 Similarly for JMP instruction

  12. Data Hazards Consider the sequence: ADD(r1, r2, r3) CMPLEC(r3, 5, r0) MULC(r1, r2, r4) SUB(r1, r2, r5) … i i + 1 i + 2 i + 3 i + 4 i + 5 i + 6 IF ADD CMP MUL SUB RF ADD CMP MUL SUB ALU ADD CMP MUL SUB WB ADD CMP MUL SUB ADD writes new value in r3 during cycle i + 3, which is available beginning of cycle i + 4. Value of r3 read by CMP during cycle i + 2. CMP reads old value of r3.

  13. Insert how many NOPs here ? Data Hazards: Solution - I ADD(r1, r2, r3) CMPLEC(r3, 5, r0) • Solution I (software) • “Program around it” • Compiler inserts NOPs • Compiler rewrites instruction sequence Rewrite ADD(r1, r2, r3) ADD(r1, r2, r3) CMPLEC(r3, 5, r0) MUL(r1, r2, r4) MUL(r1, r2, r4) as SUB(r1, r2, r5) SUB(r1, r2, r5) CMPLEC(r3, 5, r0)

  14. r3 read r3 written Data Hazards: Solution - II • Solution II (hardware) • Detect problem and stall the pipeline • Freeze IF, RF stages for 2 cycles and insert NOPs into IRALU for 2 cycles ADD(r1, r2, r3) CMPLEC(r3, 5, r0) i i + 1 i + 2 i + 3 i + 4 i + 5 i + 6 IF ADD CMP MUL MUL MUL SUB RF ADD CMP CMP CMP MUL SUB ALU ADD NOP1 NOP2 CMP MUL WB ADD NOP1 NOP2 CMP

  15. Pipeline Stalls ILL OP JT XAdr 4 3 2 1 0 PCSEL Instruction Memory A 00 PCIF D MUL +4 IF PCRF IRRF ra<20:16> rb<15:11> rc<25:21> + 0 1 RA2SEL CMP RA1 RA2 C<15:0> <<2 Register File RD1 RD2 Z <PC> + 4 + 4C JT RF NOP C<15:0> BSEL 0 1 1 0 STALLALU 0 ASEL 1 PCALU IRALU A B DALU ADD A B ALU ALU ALUFN PCWB IRWB Y DWB WD A XP Data Memory rc<25:21> RD WB R/W 1 0 1 0 2 WASEL WDSEL WA WD Wr WERF Register File

  16. r3 read r3 compared to 5 r1 + r2 computed r3 written Data Hazards: Solution - III Solution III (hardware) • Bypass paths. • Extra data paths and control logic which re-route data in problem cases. ADD(r1, r2, r3) CMPLEC(r3, 5, r0) i i + 1 i + 2 i + 3 i + 4 i + 5 i + 6 IF ADD CMP MUL SUB RF ADD CMP MUL SUB ALU ADD CMP MUL SUB WB ADD CMP MUL SUB

  17. Bypass Paths - I IRRF Select bypass if OPCODERF = OP, OPC, ... AND OPCODEALU = OP, OPC, ... AND raRF = rcALU (Similar path for other ALU input) CMPLEC(r3, 5, r0) 0 1 RA2SEL RA1 RA2 Register File RD1 RD2 BSEL 0 ASEL 1 1 0 IRALU A B ADD(r1, r2, r3) A B ALU IRWB Y 1 0 2 WDSEL WA WD

  18. Bypass Paths - II IRRF Select bypass if OPCODERF = OP AND WERF = 1 AND rbRF = WA (Similar path for other ALU input) SUB(r1, r2, r3) 0 1 RA2SEL RA1 RA2 Register File RD1 RD2 BSEL 0 ASEL 1 1 0 IRALU A B ??? A B ALU IRWB Y XOR(r4, r5, r2) 1 0 2 WDSEL 1 0 WASEL WA WD Register File WERF

  19. Next Time: Pipelining Subtleties Dilbert : S. Adams

More Related