1 / 28

Single-cycle Multi-cycle FSM controller Multi-cycle microcontroller

EECS 322: Computer Architecture. Single-cycle Multi-cycle FSM controller Multi-cycle microcontroller. MIPS instruction formats. 1. I. m. m. e. d. i. a. t. e. a. d. d. r. e. s. s. i. n. g. 2. R. e. g. i. s. t. e. r. a. d. d. r. e. s. s. i. n. g. R. e. g.

zeal
Télécharger la présentation

Single-cycle Multi-cycle FSM controller Multi-cycle microcontroller

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. EECS 322: Computer Architecture Single-cycleMulti-cycle FSM controllerMulti-cycle microcontroller CWRU EECS 322 March 6, 2000

  2. MIPS instruction formats 1 . I m m e d i a t e a d d r e s s i n g 2 . R e g i s t e r a d d r e s s i n g R e g i s t e r s R e g i s t e r 3 . B a s e a d d r e s s i n g M e m o r y B y t e H a l f w o r d W o r d R e g i s t e r 4 . P C - r e M e m o r y W o r d M e m o r y W o r d o p r s r t I m m e d i a t e Arithmetic add $rd,$rs,$rt o p r s r t r d . . . f u n c t Data Transfer lw $rd,offset($rs) sw $rd,offset($rs) o p r s r t A d d r e s s + l a t i v e a d d r e s s i n g o p r s r t A d d r e s s Conditional branch beq $rd,$rs,raddr + P C 5 . P s e u d o d i r e c t a d d r e s s i n g Unconditional jump j addr o p A d d r e s s P C CWRU EECS 322 March 6, 2000

  3. Single Cycle Implementation • Calculate instruction cycle time assuming negligible delays except: • memory (2ns), ALU and adders (2ns), register file access (1ns) Adder2: PCPC+signext(IR[15-0]) <<2 Adder3: Arithmetic ALU Adder1: PC  PC + 4 Single Cycle = 2 adders + 1 ALU CWRU EECS 322 March 6, 2000

  4. Single/Multi-Clock Comparison add = 6ns = Fetch(2ns)+RegR(1ns)+ALU(2ns)+RegW(2ns) lw = 8ns = Fetch(2ns)+RegR(1ns)+ALU(2ns)+MemR(2ns)+RegW(2ns) sw = 7ns = Fetch(2ns)+RegR(1ns)+ALU(2ns)+MemW(2ns) beq = 5ns = Fetch(2ns)+RegR(1ns)+ALU(2ns) j = 2ns = Fetch(2ns) Architectural improved performance without speeding up the clock! CWRU EECS 322 March 6, 2000

  5. Some Design Trade-offs High level design techniques Algorithms: change instruction usage minimize  ninstruction * tinstruction Architecture: Datapath, FSM, Microprogramming adders: ripple versus carry lookahead multiplier types, … Lower level design techniques (closer to physical design) clocking: single verus multi clock technology: layout tools: better place and route process technology: 0.5 micron to .18 micron CWRU EECS 322 March 6, 2000

  6. Single-cycle problems • Single Cycle Problems: • what if we had a more complicated instruction like floating point? (fadd = 30ns, fmul=100ns) • wasteful of area (2 adders + 1 ALU) • One Solution: • use a “smaller” cycle time (if the technology can do it) • have different instructions take different numbers of cycles • a “multicycle” datapath (1 ALU) • Multi-cycle approach • We will be reusing functional units: ALU used to increment PC (Adder1) and to compute address (Adder2) • Memory used for instruction and data CWRU EECS 322 March 6, 2000

  7. Reality Check: Intel 8086 clock cycles Arithmetic 3 add reg16, reg16 118-133 mul dx:ax, reg16 very slow!! 128-154 imul dx:ax, reg16 114-162 div dx:ax, reg16 165-184 idiv dx:ax, reg16Data Transfer 14 mov reg16, mem16 15 mov mem16, reg16Conditional Branch 4/16 je displacement8Unconditional Jump 15 jmp segment:offset16 CWRU EECS 322 March 6, 2000

  8. Multi-cycle Datapath Multi-cycle = 1 ALU + Controller CWRU EECS 322 March 6, 2000

  9. Multi-cycle Datapath: with controller CWRU EECS 322 March 6, 2000

  10. Multi-cycle: 5 execution steps • T1 (a,lw,sw,beq,j) Instruction Fetch • T2 (a,lw,sw,beq,j) Instruction Decode and Register Fetch • T3 (a,lw,sw,beq,j) Execution, Memory Address Calculation, or Branch Completion • T4 (a,lw,sw) Memory Access or R-type instruction completion • T5 (a,lw) Write-back step INSTRUCTIONS TAKE FROM 3 - 5 CYCLES! CWRU EECS 322 March 6, 2000

  11. Multi-cycle Approach All operations in each clock cycle Ti are done in parallel not sequential! For example, T1, IR = Memory[PC] and PC=PC+4 are done simultaneously! T1 T2 T3 T4 T5 Between Clock T2 and T3 the microcode sequencer will do a dispatch 1 CWRU EECS 322 March 6, 2000

  12. Multi-cycle using Microprogramming Microcode controller Finite State Machine( hardwired control ) M i c r o c o d e s t o r a g e C o m b i n a t i o n a l c o n t r o l l o g i c D a t a p a t h c o n t r o l o u t p u t s D a t a p a t h c o n t r o l O u t p u t s firmware o u t p u t s O u t p u t s I n p u t 1 I n p u t s S e q u e n c i n g M i c r o p r o g r a m c o u n t e r c o n t r o l A d d e r N e x t s t a t e A d d r e s s s e l e c t l o g i c I n p u t s f r o m i n s t r u c t i o n S t a t e r e g i s t e r r e g i s t e r o p c o d e f i e l d I n p u t s f r o m i n s t r u c t i o n r e g i s t e r o p c o d e f i e l d Requires microcode memory to be faster than main memory CWRU EECS 322 March 6, 2000

  13. Microcode: Trade-offs • Distinction between specification and implementation is sometimes blurred • Specification Advantages: • Easy to design and write (maintenance) • Design architecture and microcode in parallel • Implementation (off-chip ROM) Advantages • Easy to change since values are in memory • Can emulate other architectures • Can make use of internal registers • Implementation Disadvantages, SLOWER now that: • Control is implemented on same chip as processor • ROM is no longer faster than RAM • No need to go back and make changes CWRU EECS 322 March 6, 2000

  14. Microinstruction format CWRU EECS 322 March 6, 2000

  15. Microinstruction format: Maximally vs. Minimally Encoded • No encoding: • 1 bit for each datapath operation • faster, requires more memory (logic) • used for Vax 780 — an astonishing 400K of memory! • Lots of encoding: • send the microinstructions through logic to get control signals • uses less memory, slower • Historical context of CISC: • Too much logic to put on a single chip with everything else • Use a ROM (or even RAM) to hold the microcode • It’s easy to add new instructions CWRU EECS 322 March 6, 2000

  16. Microprogramming: program CWRU EECS 322 March 6, 2000

  17. Microprogramming: program overview T1 T2 T3 T4 T5 Fetch Fetch+1 Dispatch 1 Rformat1 BEQ1 JUMP1 Mem1 Dispatch 2 Rformat1+1 LW2 SW2 LW2+1 CWRU EECS 322 March 6, 2000

  18. Microprogram steping: T1 Fetch (Done in parallel) IRMEMORY[PC] & PC PC + 4 Label ALU SRC1 SRC2 RCntl Memory PCwrite SeqFetch add pc 4 ReadPCALU Seq CWRU EECS 322 March 6, 2000

  19. T2 Fetch + 1 AReg[IR[25-21]] & BReg[IR[20-16]] & ALUOutPC+signext(IR[15-0]) <<2 Label ALU SRC1 SRC2 RCntl Memory PCwrite Seq add pc ExtSh Read D#1 CWRU EECS 322 March 6, 2000

  20. T3 Dispatch 1: Mem1 ALUOut  A + sign_extend(IR[15-0]) Label ALU SRC1 SRC2 RCntl Memory PCwrite SeqMem1 add A ExtSh D#2 CWRU EECS 322 March 6, 2000

  21. T4 Dispatch 2: LW2 MDR  Memory[ALUOut] Label ALU SRC1 SRC2 RCntl Memory PCwrite SeqLW2 ReadALU Seq CWRU EECS 322 March 6, 2000

  22. T5 LW2+1 Reg[ IR[20-16] ]  MDR Label ALU SRC1 SRC2 RCntl Memory PCwrite SeqWMDR Fetch CWRU EECS 322 March 6, 2000

  23. T4 Dispatch 2: SW2 Memory[ ALUOut ]  B Label ALU SRC1 SRC2 RCntl Memory PCwrite SeqSW2 WriteALU Fetch CWRU EECS 322 March 6, 2000

  24. T3 Dispatch 1: Rformat1 ALUOut A op(IR[31-26]) B op(IR[31-26]) Label ALU SRC1 SRC2 RCntl Memory PCwrite SeqRf...1 op A B Seq CWRU EECS 322 March 6, 2000

  25. T4 Dispatch 1: Rformat1+1 Reg[ IR[15-11] ] ALUOut Label ALU SRC1 SRC2 RCntl Memory PCwrite Seq WALU Fetch CWRU EECS 322 March 6, 2000

  26. T3 Dispatch 1: BEQ1 If (A - B == 0) { PC  ALUOut; } ALUOut = Address computed in T2 ! Label ALU SRC1 SRC2 RCntl Memory PCwrite SeqBEQ1 subt A B ALUOut-0 Fetch CWRU EECS 322 March 6, 2000

  27. T3 Dispatch 1: Jump1 PC  PC[31-28] || IR[25-0]<<2 Label ALU SRC1 SRC2 RCntl Memory PCwrite SeqJump1 Jaddr Fetch CWRU EECS 322 March 6, 2000

  28. The Big Picture CWRU EECS 322 March 6, 2000

More Related