1 / 45

LECTURE 7: Multicycle CPU

EECS 318 CAD Computer Aided Design. LECTURE 7: Multicycle CPU. Instructor: Francis G. Wolff wolff@eecs.cwru.edu Case Western Reserve University This presentation uses powerpoint animation: please viewshow. MIPS instructions. ALU alu $rd,$rs,$rt $rd = $rs <alu> $rt .

elom
Télécharger la présentation

LECTURE 7: Multicycle CPU

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. EECS 318 CADComputer Aided Design LECTURE 7: Multicycle CPU Instructor: Francis G. Wolff wolff@eecs.cwru.edu Case Western Reserve University This presentation uses powerpoint animation: please viewshow CWRU EECS 318

  2. MIPS instructions ALUalu $rd,$rs,$rt$rd = $rs <alu> $rt ALUialui $rd,$rs,value$rd = $rs <alu> value Data lw $rt,offset($rs)$rt = Mem[$rs + offset]Transfer sw $rt,offset($rs)Mem[$rs + offset] = $rt Branch beq $rs,$rt,offset $pc = ($rd == $rs)? (pc+4+offset):(pc+4); Jump j addresspc = address CWRU EECS 318

  3. MIPS fixed sized instruction formats R - Format ALUalu $rd,$rs,$rt func shamt rs op rt rd rs op rt value or offset op absolute address J - Format Jump j address ALUi alui $rt,$rs,value I - Format Data lw $rt,offset($rs)Transfer sw $rt,offset($rs) Branch beq $rs,$rt,offset CWRU EECS 318

  4. Assembling Instructions func shamt rs op rt rd ALUalu $rd,$rs,$rt rs op rt value or offset 0x00400020 addu$23, $0, $31 001001:00000:11111:10111:00000:000000 ALUi alui $rt,$rs,value 0x00400024 addi $17, $0, 5 001000:00000:00101:0000000000000101 Suppose there are 32 registers, addu opcode=001001, addi op=001000 CWRU EECS 318

  5. MIPS instruction formats 1 . I m m e d i a t e a d d r e s s i n g o p r s r t I m m e d i a t e 2 . R e g i s t e r a d d r e s s i n g o p r s r t r d . . . f u n c t R e g i s t e r s R e g i s t e r 3 . B a s e a d d r e s s i n g M e m o r y o p r s r t A d d r e s s + B y t e H a l f w o r d W o r d R e g i s t e r 4 . P C - r e l a t i v e a d d r e s s i n g M e m o r y o p r s r t A d d r e s s + W o r d P C 5 . P s e u d o d i r e c t a d d r e s s i n g M e m o r y o p A d d r e s s W o r d Arithmeticaddi $rt, $rs, value add $rd,$rs,$rt Data Transferlw $rt,offset($rs)sw $rt,offset($rs) Conditional branchbeq $rs,$rt,offset Unconditional jumpj address CWRU EECS 318

  6. MIPS registers and conventions NameNumberConventional usage$0 0 Constant 0$v0-$v1 2-3 Expression evaluation & function return$a0-$a3 4-7 Arguments 1 to 4$t0-$t9 8-15,24,35 Temporary (not preserved across call)$s0-$s7 16-23 Saved Temporary (preserved across call)$k0-$k1 26-27 Reserved for OS kernel$gp 28 Pointer to global area$sp 29 Stack pointer$fp 30 Frame pointer$ra 31 Return address (used by function call) CWRU EECS 318

  7. C function to MIPS Assembly Language Exit condition of a while loop is if ( i >= y ) then goto w2 int power_2(int y) { /* compute x=2^y; */ register int x, i; x=1; i=0; while(i<y) { x=x*2; i=i+1; } return x;} Assember .s Commentsaddi $t0, $0, 1 # x=1; addu $t1, $0, $0 # i=0;w1:bge $t1,$a0,w2 # while(i<y) { /* bge= greater or equal */addu $t0, $t0, $t0 # x = x * 2; /* same as x=x+x; */addi $t1,$t1,1 # i = i + 1;beq $0,$0,w1 # }w2:addu $v0,$0,$t0 # return x;jr $ra# jump on register ( pc = ra; ) CWRU EECS 318

  8. Power_2.s: MIPS storage assignment Byte address, not word address 2 words after pc fetch after bge fetch pc is 0x00400030 plus 2 words is 0x00400038 .text 0x00400020 addi $8, $0, 1 # addi $t0, $0, 1 0x00400024 addu $9, $0, $0 # addu $t1, $0, $0 0x00400028 bge $9, $4, 2 # bge $t1, $a0, w2 0x0040002c addu $8, $8, $8 # addi $t0, $t0, $t0 0x00400030 addi $9, $9, 1 # addi $t1, $t1, 1 0x00400034 beq $0, $0, -3 # beq $0, $0, w1 0x00400038 addu $2, $0, $8 # addu $v0, $0, $t0 0x0040003c jr $31 # jr $ra CWRU EECS 318

  9. Machine Language Single Stepping Values changes after the instruction! $pc $v0 $a0 $t0 $t1 $ra $2 $4 $8 $9 $31 00400020 ? 0 ? ? 700018 addi $t0, $0, 1 0040003c 1 0 1 0 700018 jr $ra 00700018 ? 0 1 0 700018 … Assume power2(0); is called; then $a0=0 and $ra=700018 00400024 ? 0 1 ? 700018 addu $t1, $0, $0 00400028 ? 0 1 0700018 bge $t1,$a0,w2 00400038 ? 0 1 0 700018 add $v0,$0,$t0 CWRU EECS 318

  10. Von Neuman & Harvard CPU Architectures ALU I/O ALU I/O Address bus Data bus instructions and data instructions data Von Neuman architectureArea efficient but requires higher bus bandwidth because instructions and data must compete for memory. Harvard architecture was coined to describe machines with separate memories.Speed efficient: Increased parallelism. CWRU EECS 318

  11. Multi-cycle Processor Datapath CWRU EECS 318

  12. Multi-cycle Datapath: with controller CWRU EECS 318

  13. Multi-cycle using Finite State Machine Finite State Machine( hardwired control ) C o m b i n a t i o n a l c o n t r o l l o g i c D a t a p a t h c o n t r o l o u t p u t s O u t p u t s I n p u t s N e x t s t a t e I n p u t s f r o m i n s t r u c t i o n S t a t e r e g i s t e r r e g i s t e r o p c o d e f i e l d CWRU EECS 318

  14. Finite State Machine: program overview T1 T2 T3 T4 T5 Fetch Decode Rformat1 BEQ1 JUMP1 Mem1 Rformat1+1 LW2 SW2 LW2+1 CWRU EECS 318

  15. The Four Stages of R-Format Cycle 1 Cycle 2 Cycle 3 Cycle 4 Ifetch Reg/Dec Exec Wr R-Format • Fetch: • Fetch the instruction from the Instruction Memory • Decode: • Registers Fetch and Instruction Decode • Exec: ALU • ALU operates on the two register operands • Update PC • Write: Reg • Write the ALU output back to the register file CWRU EECS 318

  16. R-Format State Machine Cycle 1 Cycle 2 Cycle 3 Cycle 4 Decode Clock=1 Ifetch Reg/Dec Exec Wr R-Format Exec Fetch Write Clock=1 Clock=1 Clock=1 CWRU EECS 318

  17. The Five Stages of Load Instruction Ifetch Reg/Dec Exec Mem Wr • Fetch: • Fetch the instruction from the Instruction Memory • Decode: • Registers Fetch and Instruction Decode • Exec: Offset • Calculate the memory offset • Mem: • Read the data from the Data Memory • Wr: • Write the data back to the register file Cycle 1 Cycle 2 Cycle 3 Cycle 4 Cycle 5 Load CWRU EECS 318

  18. R-Format & I-Format State Machine Need to check instruction format Clock=1 AND I-Format=1 Clock=1 Decode ExecOffset Exec ALU Fetch Clock=1 AND opcode=LW Write Reg Mem Read Need to check opcode WriteReg Clock=1 Clock=1 Clock=1 AND R-Format=1 Clock=1 Clock=1 CWRU EECS 318

  19. Multi-Instruction sequence Cycle 1 Cycle 2 Cycle 3 Cycle 4 Cycle 5 Cycle 6 Cycle 7 Cycle 8 Cycle 9 Cycle 10 Clk Load Store R-type Ifetch Reg Exec Mem Wr Ifetch Reg Exec Mem Ifetch CWRU EECS 318

  20. State machine stepping: T1 Fetch (Done in parallel) IRMEMORY[PC] & PC PC + 4 PC IR CWRU EECS 318

  21. T1 Fetch: State machine Cycle 1 Cycle 2 Cycle 3 Cycle 4 Decode Ifetch Reg/Dec Exec Wr R-Format Write Reg Exec Start MemRead=1, MemWrite=0IorD=1 (MemAddrPC)IRWrite=1 (IRMem[PC])ALUSrcA=0 (=PC)ALUSrcB=1 (=4)ALUOP=ADD (PC4+PC)PCWrite=1, PCSource=1 (=ALU)RegWrite=0, MemtoReg=X, RegDst=X Instruction Fetch CWRU EECS 318

  22. T2Decode (read $rs and $rt and offset+pc) & ALUOutPC+signext(IR[15-0]) <<2 PC $rs$rt offset AReg[IR[25-21]] & BReg[IR[20-16]] CWRU EECS 318

  23. T2 Decode State machine Cycle 1 Cycle 2 Cycle 3 Cycle 4 Ifetch Reg/Dec Exec Wr R-Format Write Reg Exec Fetch MemRead=0, MemWrite=0IorD=XIRWrite=0ALUSrcA=0 (=PC)ALUSrcB=3 (=signext(IR<<2))ALUOP=0 (=add)PCWrite=0, PCSource=XRegWrite=0, MemtoReg=X, RegDst=X Start Instr. Decode & Register Fetch CWRU EECS 318

  24. T3 ExecALU (ALU instruction) ALUOut A op(IR[31-26]) B op(IR[31-26]) CWRU EECS 318

  25. T3ExecALU State machine Cycle 1 Cycle 2 Cycle 3 Cycle 4 Decode Ifetch Reg/Dec Exec Wr R-Format Write Reg Fetch Start R-Format Execution MemRead=0, MemWrite=0IorD=XIRWrite=0ALUSrcA=1 (=A =Reg[$rs])ALUSrcB=0 (=B =Reg[$rt])ALUOP=2 (=IR[28-26])PCWrite=0, PCSource=XRegWrite=0, MemtoReg=X, RegDst=X CWRU EECS 318

  26. T4WrReg (ALU instruction) Reg[ IR[15-11] ] ALUOut CWRU EECS 318

  27. T4 WrReg State machine Cycle 1 Cycle 2 Cycle 3 Cycle 4 Decode Ifetch Reg/Dec Exec Wr R-Format Fetch Start Exec R-Format Write Register MemRead=0, MemWrite=0IorD=XIRWrite=0ALUSrcA=XALUSrcB=XALUOP=X PCWrite=0, PCSource=XRegWrite=1, (Reg[$rd] ALUout) MemtoReg=0, (=ALUout)RegDst=1 (=$rd) CWRU EECS 318

  28. Review Moore Machine IR[31-0] Processor Control LinesMemReadMemWriteIorD ... Next State CWRU EECS 318

  29. Moore Output State Tables: O(State) T2 0 0 X 0 0 + 0 =PC 3 =offset 0 X 0 X X T3-R 0 0 X 0 2 =op 1 =A =$rs 0 =B =$rt 0 X 0 X X T4-R 0 0 X 0 X X X 0 X 1 0 =ALUOut 1 =$rd T1 1 0 0 =PC 1 0 + 0 =PC 1 =4 1 0 =ALU 0 X X State MemRead MemWrite MUX IorD IRWrite ALUOP MUX ALUSrcA MUX ALUSrcB PCWrite MUX PCSource RegWrite MUX MemtoReg MUX RegDst CWRU EECS 318

  30. Review: The Five Stages of Load Instruction Ifetch Reg/Dec Exec Mem Wr • Fetch: • Fetch the instruction from the Instruction Memory • Decode: • Registers Fetch and Instruction Decode • Exec: Offset • Calculate the memory offset • Mem: • Read the data from the Data Memory • Wr: • Write the data back to the register file Cycle 1 Cycle 2 Cycle 3 Cycle 4 Cycle 5 Load CWRU EECS 318

  31. Review: R-Format & I-Format State Machine Need to check instruction format Clock=1 AND I-Format=1 Clock=1 Decode ExecOffset Exec ALU Fetch Clock=1 AND opcode=LW Write Reg Mem Read Need to check opcode WriteReg Clock=1 Clock=1 Clock=1 AND R-Format=1 Clock=1 Clock=1 CWRU EECS 318

  32. T3–I Mem1 (common to both load & store) ALUOut  A + sign_extend(IR[15-0]) CWRU EECS 318

  33. T3 Mem1 I-Format State Machine =$rs + offset Clock=1 AND I-Format=1 Decode Exec ALU Fetch Write Reg Mem Read WriteReg Clock=1 AND R-Format=1 MemRead=0, MemWrite=0IorD=XIRWrite=0ALUOP=0 +ALUSrcA=1 =A =Reg[$rs]ALUSrcB=2 =signext(IR[15-0])PCWrite=0, PCSource=XRegWrite=0, MemtoReg=X, RegDst=X Clock=1 AND opcode=LW I-Format ExecutionALUout=$rs+offset CWRU EECS 318

  34. T4 –LW1 : load instruction, read memory MDR  Memory[ALUOut] CWRU EECS 318

  35. T4 LW2 I-Format State Machine =Mem[ALU] Decode ExecOffset Exec ALU Fetch Write Reg WriteReg Clock=1 AND I-Format=1 Clock=1 AND R-Format=1 Clock=1 AND opcode=LW I-Format Memory Read MemRead=1, MemWrite=0IorD=1IRWrite=0ALUOP=XALUSrcA=XALUSrcB=XPCWrite=0, PCSource=XRegWrite=0, MemtoReg=X, RegDst=X Clock=1 AND opcode=LW CWRU EECS 318

  36. T5 –LW2 Load instruction, write to register Reg[IR[20-16]]  MDR CWRU EECS 318

  37. T5 LW2 I-Format State Machine $rt=MDR Decode ExecOffset Exec ALU Fetch Write Reg Mem Read Clock=1 AND I-Format=1 Clock=1 AND R-Format=1 Clock=1 AND opcode=LW I-Format Register Write MemRead=1, MemWrite=0IorD=1IRWrite=0ALUOP=XALUSrcA=XALUSrcB=XPCWrite=0, PCSource=XRegWrite=1, MemtoReg=1, RegDst=1 Clock=1 AND opcode=LW CWRU EECS 318

  38. T4–SW2 Store instruction, write to memory Memory[ ALUOut ]  B CWRU EECS 318

  39. T4 SW2 I-Format State Machine Mem[ALU]= Decode ExecOffset Exec ALU Fetch Write Reg Clock=1 AND I-Format=1 Clock=1 AND R-Format=1 Clock=1 AND opcode=SW I-Format Memory Write MemRead=0, MemWrite=1IorD=1IRWrite=0ALUOP=XALUSrcA=XALUSrcB=XPCWrite=0, PCSource=XRegWrite=0, MemtoReg=X, RegDst=X Store not Load! CWRU EECS 318

  40. T3 BEQ1 (Conditional branch instruction) If (A - B == 0) { PC  ALUOut; } Zero ALUOut = Address computed in T2 ! CWRU EECS 318

  41. T3 BEQ1 I-Format State Machine =$rs + offset Decode Exec ALU Fetch Write Reg Clock=1 AND opcode=branch Clock=1 AND R-Format=1 MemRead=0, MemWrite=0IorD=XIRWrite=0ALUOP=0 =subtractALUSrcA=1 =A =Reg[$rs]ALUSrcB=0 =B =Reg[$rt]PCWrite=0, PCWriteCond=1, PCSource=1 =ALUoutRegWrite=0, MemtoReg=X, RegDst=X B-Format Execution CWRU EECS 318

  42. T3 Jump1 (Jump Address) PC  PC[31-28] || IR[25-0]<<2 CWRU EECS 318

  43. Moore Output State Tables: O(State) T4-SW 0 1 1=ALU 0 X X X 0 X 0 X X T4-R 0 0 X 0 X X X 0 X 1 0=ALU 1=$rd T2 0 0 X 0 0 0 3 0 X 0 X X T3-R 0 0 X 0 2=op 1=A=$rs 0=B=$rt 0 X 0 X X T4-LW 1 0 1=ALU 0 X X X 0 X 0 X X T1 1 0 0=PC 1 0=+ 0=PC 1=4 1 0=AL 0 X X T3-I 0 0 X 0 0=add 1=A=$rs 2=sign 0 X 0 X X T5-LW 0 0 X 0 X X X 0 X 1 1=MDR 1=$rt State MemRead MemWrite MUX IorD IRWrite ALUOP MUX ALUSrcA MUX ALUSrcB PCWrite MUX PCSource RegWrite MUX MemtoReg MUX RegDst CWRU EECS 318

  44. Multi-cycle: 5 execution steps • T1 (a,lw,sw,beq,j) Instruction Fetch • T2 (a,lw,sw,beq,j) Instruction Decode and Register Fetch • T3 (a,lw,sw,beq,j) Execution, Memory Address Calculation, or Branch Completion • T4 (a,lw,sw) Memory Access or R-type instruction completion • T5 (a,lw) Write-back step INSTRUCTIONS TAKE FROM 3 - 5 CYCLES! CWRU EECS 318

  45. Multi-cycle Approach All operations in each clock cycle Ti are done in parallel not sequential! For example, T1, IR = Memory[PC] and PC=PC+4 are done simultaneously! T1 T2 T3 T4 T5 Between Clock T2 and T3 the microcode sequencer will do a dispatch 1 CWRU EECS 318

More Related