CS1104 – Computer Organization

PART 2: Computer Architecture
Lecture 10: Designing the Control for Single- and Multicycle Datapaths

Single-cycle Datapath with Control
[Figure: single-cycle datapath. The Control unit decodes Instruction[31–26] and drives RegDst, Branch, MemRead, MemtoReg, ALUOp, MemWrite, ALUSrc, and RegWrite; the ALU control unit takes ALUOp and Instruction[5–0].]

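ALU Control: Two-level implementation
• Control 2 reads the 6-bit opcode (instruction register bits 31–26) and produces the 2-bit ALUOp:
  • 00: lw, sw
  • 01: beq
  • 10: add, sub, and, or, slt
• Control 1 reads ALUOp and the 6-bit funct field (bits 5–0) and produces the 3-bit ALUcontrol:
  • 000: and
  • 001: or
  • 010: add
  • 110: sub
  • 111: set on less than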

ALU Operations: Control1
• Must describe hardware to compute the 3-bit ALU control input, given
  • the instruction type (the ALU operation class, computed from the instruction type): 00 = lw, sw; 01 = beq; 10 = arithmetic
  • the function code, for arithmetic instructions
• Describe it using a truth table (can turn into gates):

Control1
• Simple combinational logic (truth tables)
• Only four of the six bits from the function field are required
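The Control1 truth table can be sketched as a small Python function. This is only an illustration: the funct encodings (add = 100000, sub = 100010, and = 100100, or = 100101, slt = 101010) are the standard MIPS values and are assumed here, and the name alu_control is hypothetical.

```python
def alu_control(alu_op: int, funct: int) -> int:
    """Return the 3-bit ALU control value for a 2-bit ALUOp and a 6-bit funct."""
    if alu_op == 0b00:          # lw, sw: compute the memory address
        return 0b010            # add
    if alu_op == 0b01:          # beq: compare by subtracting
        return 0b110            # subtract
    # ALUOp = 10 (arithmetic): only funct bits 3-0 are examined.
    # Assumed standard MIPS funct codes, shown in the comments.
    low4_to_control = {
        0b0000: 0b010,  # add (funct 100000)
        0b0010: 0b110,  # sub (funct 100010)
        0b0100: 0b000,  # and (funct 100100)
        0b0101: 0b001,  # or  (funct 100101)
        0b1010: 0b111,  # slt (funct 101010)
    }
    return low4_to_control[funct & 0b1111]
```

Masking funct with 0b1111 mirrors the observation that the two high funct bits never affect the result.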

Deriving the Control2 signals
• Input: the opcode
• Output: 9 control signals
• Determine these control signals directly from the opcodes:
  • R-format: 0
  • lw: 35
  • sw: 43
  • beq: 4
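A minimal sketch of Control2 as a lookup table keyed by opcode. The signal values follow the usual textbook table for this datapath and are an assumption here, since the slide's own table is not reproduced; None stands for a don't care, and the names Control, MAIN_CONTROL, and decode are hypothetical.

```python
from typing import NamedTuple, Optional


class Control(NamedTuple):
    reg_dst: Optional[int]     # None = don't care
    alu_src: int
    mem_to_reg: Optional[int]  # None = don't care
    reg_write: int
    mem_read: int
    mem_write: int
    branch: int
    alu_op: int                # 2 bits, so 9 signal bits in total


# Assumed values, one row per opcode listed on the slide.
MAIN_CONTROL = {
    0:  Control(1, 0, 0, 1, 0, 0, 0, 0b10),        # R-format
    35: Control(0, 1, 1, 1, 1, 0, 0, 0b00),        # lw
    43: Control(None, 1, None, 0, 0, 1, 0, 0b00),  # sw
    4:  Control(None, 0, None, 0, 0, 0, 1, 0b01),  # beq
}


def decode(instruction: int) -> Control:
    """Look up the control signals from instruction bits 31-26."""
    opcode = (instruction >> 26) & 0x3F
    return MAIN_CONTROL[opcode]
```

For example, decode(0x8D280000) selects the lw row, because the top six bits of that word are 100011 (35).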

Control 2 • PLA example implementation

Multicycle Datapath
• Single Cycle Problems:
  • clock cycle time has to be long enough to accommodate the longest instruction
  • no sharing of functional units or resources
• Solution:
  • multicycle datapath
[Figure: high-level multicycle datapath with PC, one memory for instructions or data, instruction register (IR), memory data register (MDR), registers, A and B registers, ALU, and ALUOut.]

Multicycle Approach
• Break up the instructions into steps, each step takes a cycle
  • balance the amount of work to be done
  • restrict each cycle to use only one major functional unit
• At the end of a cycle
  • store values for use in later cycles (easiest thing to do)
  • introduce additional “internal” registers
• Notice: we distinguish
  • processor state: programmer visible registers
  • internal state: programmer invisible registers (like IR, MDR, A, B, and ALUOut)

Multicycle Datapath
[Figure: detailed multicycle datapath, including the multiplexors, sign extend, and shift left 2 units.]

Five Execution Steps
• Instruction Fetch
• Instruction Decode and Register Fetch
• Execution, Memory Address Computation, or Branch Completion
• Memory Access or R-type instruction completion
• Write-back step
INSTRUCTIONS TAKE FROM 3 TO 5 CYCLES!

Step 1: Instruction Fetch
• Use PC to get the instruction and put it in the Instruction Register
• Increment the PC by 4 and put the result back in the PC
    IR = Memory[PC];
    PC = PC + 4;

Step 2: Instruction Decode and Register Fetch
• Read registers rs and rt in case we need them
• Compute the branch address in case the instruction is a branch
• The previous two actions are optimistic: they might not be needed
    A = Reg[IR[25-21]];
    B = Reg[IR[20-16]];
    ALUOut = PC + (sign-extend(IR[15-0]) << 2);
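The branch-address computation in step 2 can be sketched as follows, assuming 32-bit addresses and that PC already holds PC + 4 from step 1. The helper names sign_extend16 and branch_target are hypothetical.

```python
def sign_extend16(imm: int) -> int:
    """Interpret the low 16 bits of imm as a two's-complement value."""
    imm &= 0xFFFF
    return imm - 0x10000 if imm & 0x8000 else imm


def branch_target(pc: int, ir: int) -> int:
    """ALUOut = PC + (sign-extend(IR[15-0]) << 2), wrapped to 32 bits."""
    offset = sign_extend16(ir & 0xFFFF) << 2
    return (pc + offset) & 0xFFFFFFFF
```

With pc = 0x00400004 and an immediate of 0xFFFF (that is, -1), the target is 0x00400000: one word back from the already incremented PC.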

Step 3 (instruction dependent)
• ALU is performing one of four functions, based on instruction type
• Memory Reference:
    ALUOut = A + sign-extend(IR[15-0]);
• R-type:
    ALUOut = A op B;
• Branch:
    if (A==B) PC = ALUOut;
• Jump:
    PC = PC[31-28] || (IR[25-0]<<2);
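The jump case concatenates the upper four PC bits with the shifted 26-bit target field. A small sketch, assuming 32-bit words; the function name jump_target is hypothetical.

```python
def jump_target(pc: int, ir: int) -> int:
    """PC = PC[31-28] || (IR[25-0] << 2)."""
    upper = pc & 0xF0000000              # PC[31-28], kept in place
    lower = (ir & 0x03FFFFFF) << 2       # IR[25-0] shifted left by 2 (28 bits)
    return upper | lower
```

For instance, the word 0x08100000 has a target field of 0x100000, so with pc = 0x00400004 the new PC is 0x00400000.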

Step 4 (R-type or memory-access)
• Loads and stores access memory
    MDR = Memory[ALUOut];
  or
    Memory[ALUOut] = B;
• R-type instructions finish
    Reg[IR[15-11]] = ALUOut;
  The write actually takes place at the end of the cycle, on the clock edge

Write-back step
• Memory read completion step
    Reg[IR[20-16]] = MDR;
• Executed in the case of load instructions. Recall that the register MDR contains the data read out of the memory.

Summary of the five execution steps
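As a rough summary, the register transfers from the preceding step slides can be collected per instruction class, which also gives the cycle counts. The dictionary layout and the names STEPS and cycles are illustrative assumptions, not the slide's own table.

```python
# Register-transfer actions shared by every instruction (steps 1 and 2).
FETCH = "IR = Memory[PC]; PC = PC + 4"
DECODE = ("A = Reg[IR[25-21]]; B = Reg[IR[20-16]]; "
          "ALUOut = PC + (sign-extend(IR[15-0]) << 2)")
ADDRESS = "ALUOut = A + sign-extend(IR[15-0])"

# One list entry per clock cycle, keyed by instruction class.
STEPS = {
    "R-type": [FETCH, DECODE, "ALUOut = A op B", "Reg[IR[15-11]] = ALUOut"],
    "lw": [FETCH, DECODE, ADDRESS, "MDR = Memory[ALUOut]", "Reg[IR[20-16]] = MDR"],
    "sw": [FETCH, DECODE, ADDRESS, "Memory[ALUOut] = B"],
    "beq": [FETCH, DECODE, "if (A == B) PC = ALUOut"],
    "j": [FETCH, DECODE, "PC = PC[31-28] || (IR[25-0] << 2)"],
}


def cycles(instruction_class: str) -> int:
    """Number of clock cycles the instruction class needs."""
    return len(STEPS[instruction_class])
```

Branches and jumps finish in 3 cycles, R-type and sw in 4, and lw in 5, which matches the 3 to 5 cycle range.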

Adding Control Signals to the Datapath
[Figure: multicycle datapath annotated with the control signals IRWrite, RegDst, ALUSrcA, MemRead, RegWrite, IorD, MemWrite, ALUSrcB, MemtoReg, and ALUOp (input to the ALU control).]
All muxes are numbered as: 0, 1

The Detailed Control

The Control Signals:
• IRWrite: When asserted, the output of memory is written to IR
• PCWrite: When asserted, PC is written; the source is controlled by PCSource
• PCWriteCond: When asserted, PC is written if the Zero output from the ALU is also active
• ALUOp: 00 (add), 01 (subtract), 10 (funct field of the instruction determines the operation)
The other signals are apparent from the figure.
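The PC write condition described above amounts to one OR and one AND gate. A sketch follows; the PCSource encoding (00 = ALU result, 01 = ALUOut, 10 = jump address) is assumed from the values used in the state diagram, and both function names are hypothetical.

```python
def pc_write_enable(pc_write: int, pc_write_cond: int, zero: int) -> bool:
    """PC is written when PCWrite is asserted, or when PCWriteCond and Zero both are."""
    return bool(pc_write or (pc_write_cond and zero))


def select_pc_source(pc_source: int, alu_result: int, alu_out: int,
                     jump_address: int) -> int:
    """Pick the value written to PC. The encoding is an assumption (see lead-in)."""
    sources = {0b00: alu_result, 0b01: alu_out, 0b10: jump_address}
    return sources[pc_source]
```

A beq that compares unequal registers leaves Zero at 0, so pc_write_enable(0, 1, 0) is False and the PC keeps the PC + 4 value written during instruction fetch.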

Finite state machines (FSMs)
• Finite state machines:
  • a set of states
  • next state function (determined by current state and the input)
  • output function (determined by current state and possibly input)
[Figure: FSM block diagram. The inputs and the current state feed the next-state function, the next state is latched on the clock, and the output function produces the outputs.]

Finite state machines (FSMs)
• State is an abstraction
• You may consider the state of an FSM to be a variable or a function, or a collection of variables or functions
• If the output depends only on the current state, then it is a Moore machine. If the output depends on the state and the input, then it is a Mealy machine
[Figure: state diagram with two states, s = 0 (output = 0) and s = 1 (output = 1), and transitions labeled with the input values i = 0 and i = 1.]
This machine has two states. How does the output behave when the input = 1?

Moore machine
• The output function depends only on the current state
• The next state function depends on the current state and the input
[Figure: FSM block diagram in which the output function takes only the current state as its input.]
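A minimal Moore machine sketch in Python. The two-state example is hypothetical: it assumes that input 1 toggles the state and input 0 holds it, which may or may not match the transitions drawn in the lecture's diagram.

```python
class MooreMachine:
    """FSM whose output is a function of the current state only."""

    def __init__(self, next_state, output, start):
        self.next_state = next_state  # maps (state, input) -> next state
        self.output = output          # maps state -> output
        self.state = start

    def step(self, inp):
        """Advance one clock cycle and return the output of the new state."""
        self.state = self.next_state[(self.state, inp)]
        return self.output[self.state]


# Hypothetical two-state machine: i = 1 toggles the state, i = 0 holds it.
toggle = MooreMachine(
    next_state={(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0},
    output={0: 0, 1: 1},
    start=0,
)
outputs = [toggle.step(i) for i in (1, 1, 0, 1)]  # visits states 1, 0, 0, 1
```

Under that assumption the output alternates on every cycle in which the input is 1 and stays put when the input is 0.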

Implementing the Control
• Value of control signals is dependent upon:
  • what instruction is being executed
  • which step is being performed
• Use the information we have accumulated to specify a finite state machine (FSM)
  • specify the finite state machine graphically, or
  • use microprogramming
• Implementation can be derived from specification

FSM: high level view
[Figure: from Start/reset, the machine performs instruction fetch, decode and register fetch, then follows one of four paths: memory access instructions, R-type instructions, branch instruction, or jump instruction.]

Specifying the FSM
• State 0, Instruction fetch (Start): MemRead, ALUSrcA = 0, IorD = 0, IRWrite, ALUSrcB = 01, ALUOp = 00, PCWrite, PCSource = 00. Next: state 1
• State 1, Instruction decode/register fetch: ALUSrcA = 0, ALUSrcB = 11, ALUOp = 00. Next: state 2 if (Op = 'LW') or (Op = 'SW'); state 6 if (Op = R-type); state 8 if (Op = 'BEQ'); state 9 if (Op = 'J')
• State 2, Memory address computation: ALUSrcA = 1, ALUSrcB = 10, ALUOp = 00. Next: state 3 if (Op = 'LW'); state 5 if (Op = 'SW')
• State 3, Memory access: MemRead, IorD = 1. Next: state 4
• State 4, Write-back step: RegDst = 0, RegWrite, MemtoReg = 1. Next: state 0
• State 5, Memory access: MemWrite, IorD = 1. Next: state 0
• State 6, Execution: ALUSrcA = 1, ALUSrcB = 00, ALUOp = 10. Next: state 7
• State 7, R-type completion: RegDst = 1, RegWrite, MemtoReg = 0. Next: state 0
• State 8, Branch completion: ALUSrcA = 1, ALUSrcB = 00, ALUOp = 01, PCWriteCond, PCSource = 01. Next: state 0
• State 9, Jump completion: PCWrite, PCSource = 10. Next: state 0
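The ten-state diagram can be written down directly as a table of Moore outputs plus a next-state function. This is a sketch under the assumption that every completion state returns to state 0; the operation labels ("lw", "sw", "R-type", "beq", "j") and the names STATE_OUTPUTS, next_state, and trace are hypothetical.

```python
# Signals asserted (or mux selects set) in each state; unlisted signals are 0.
STATE_OUTPUTS = {
    0: {"MemRead": 1, "ALUSrcA": 0, "IorD": 0, "IRWrite": 1,
        "ALUSrcB": 0b01, "ALUOp": 0b00, "PCWrite": 1, "PCSource": 0b00},
    1: {"ALUSrcA": 0, "ALUSrcB": 0b11, "ALUOp": 0b00},
    2: {"ALUSrcA": 1, "ALUSrcB": 0b10, "ALUOp": 0b00},
    3: {"MemRead": 1, "IorD": 1},
    4: {"RegDst": 0, "RegWrite": 1, "MemtoReg": 1},
    5: {"MemWrite": 1, "IorD": 1},
    6: {"ALUSrcA": 1, "ALUSrcB": 0b00, "ALUOp": 0b10},
    7: {"RegDst": 1, "RegWrite": 1, "MemtoReg": 0},
    8: {"ALUSrcA": 1, "ALUSrcB": 0b00, "ALUOp": 0b01,
        "PCWriteCond": 1, "PCSource": 0b01},
    9: {"PCWrite": 1, "PCSource": 0b10},
}


def next_state(state: int, op: str) -> int:
    """Next-state function: depends on the current state and the opcode."""
    if state == 0:
        return 1
    if state == 1:
        return {"lw": 2, "sw": 2, "R-type": 6, "beq": 8, "j": 9}[op]
    if state == 2:
        return {"lw": 3, "sw": 5}[op]
    if state == 3:
        return 4
    if state == 6:
        return 7
    return 0  # states 4, 5, 7, 8, 9 complete the instruction


def trace(op: str) -> list:
    """States visited by one instruction, starting at fetch (state 0)."""
    states = [0]
    while True:
        nxt = next_state(states[-1], op)
        if nxt == 0:
            return states
        states.append(nxt)
```

For example, trace("lw") visits five states and trace("beq") visits three, matching the 3 to 5 cycle range of the multicycle design.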

Finite State Machine for Control Implementation:

PLA (programmable logic array) Implementation
• Inputs: opcode and current state
• AND plane (computes minterms)
• OR plane (computes sum terms)
• Outputs: datapath control and next state

ROM Implementation
• ROM = "Read Only Memory"
  • values of memory locations are fixed ahead of time
• A ROM can be used to implement a truth table
  • if the address is m bits, we can address 2^m entries in the ROM
  • our outputs are the bits of data that the address points to
    address (m bits)   data (n bits)
    0 0 0              0 0 1 1
    0 0 1              1 1 0 0
    0 1 0              1 1 0 0
    0 1 1              1 0 0 0
    1 0 0              0 0 0 0
    1 0 1              0 0 0 1
    1 1 0              0 1 1 0
    1 1 1              0 1 1 1
• 2^m is the "height", and n is the "width"
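The example ROM on this slide (3 address bits, 4 data bits) can be modeled as a list indexed by address, which is all a truth-table lookup is. The names ROM, M, N, and rom_read are illustrative.

```python
M = 3  # address bits, so the ROM has 2**M entries
N = 4  # data bits per entry

# Contents copied from the slide's example, one entry per address 000..111.
ROM = [0b0011, 0b1100, 0b1100, 0b1000, 0b0000, 0b0001, 0b0110, 0b0111]


def rom_read(address: int) -> int:
    """Return the N-bit word stored at the given M-bit address."""
    if not 0 <= address < 2 ** M:
        raise ValueError("address out of range")
    return ROM[address]
```

Each output column is one Boolean function of the address bits, so a single ROM implements N truth tables at once.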

ROM Implementation
• How many inputs are there? 6 bits for opcode, 4 bits for state = 10 address lines (i.e., 2^10 = 1024 different addresses)
• How many outputs are there? 16 datapath-control outputs, 4 state bits = 20 outputs
• ROM is 2^10 x 20 = 20K bits (very large and a rather unusual size)
• Rather wasteful, since for lots of the entries the outputs are the same; i.e., the opcode is often ignored

ROM Implementation
Cheaper implementation:
• Exploit the fact that the FSM is a Moore machine ==>
  • Control outputs depend only on the current state and not on other incoming control signals!
  • Next state depends on all inputs
• Break up the table into two parts
  • 4 state bits tell you the 16 outputs: 2^4 x 16 bits of ROM
  • 10 bits tell you the 4 next state bits: 2^10 x 4 bits of ROM
  • Total number of bits: 4.3K bits of ROM
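The size comparison is plain arithmetic and can be checked in a few lines. The helper name rom_bits is hypothetical.

```python
def rom_bits(address_bits: int, width: int) -> int:
    """Total storage of a ROM with 2**address_bits entries of `width` bits."""
    return (2 ** address_bits) * width


single_rom = rom_bits(10, 20)      # opcode + state in, all 20 outputs out
output_rom = rom_bits(4, 16)       # state in, 16 datapath-control outputs out
next_state_rom = rom_bits(10, 4)   # opcode + state in, 4 next-state bits out
split_total = output_rom + next_state_rom
```

Here single_rom is 20480 bits (20K), while split_total is 256 + 4096 = 4352 bits, about 4.3K.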

ROM vs PLA
• PLA is much smaller
  • can share product terms (a ROM has an entry (= address) for every product term)
  • only need entries that produce an active output
  • can take into account don't cares
• Size of PLA: (#inputs x #product-terms) + (#outputs x #product-terms)
  • For this example: (10 x 17) + (20 x 17) = 510 PLA cells
• PLA cells are usually slightly bigger than a ROM cell
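The PLA size formula can be evaluated the same way. With 10 inputs, 20 outputs, and 17 product terms it comes to 510 cells. The function name pla_cells is hypothetical.

```python
def pla_cells(inputs: int, outputs: int, product_terms: int) -> int:
    """(#inputs x #product-terms) + (#outputs x #product-terms)."""
    and_plane = inputs * product_terms    # one column per input, per term
    or_plane = outputs * product_terms    # one column per output, per term
    return and_plane + or_plane


cells = pla_cells(inputs=10, outputs=20, product_terms=17)  # 170 + 340
```

Even allowing for PLA cells being somewhat larger than ROM cells, a few hundred cells is far below the roughly 4.3K bits of the split ROM.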

Another Implementation Style
• Real machines have many instructions => complex FSM with many states
• Graphical specification becomes cumbersome
• Specify control as an instruction
  • microinstructions
  • built out of separate fields (for controlling ALU, SRC1, SRC2, etc.)
• Exploit the fact that usually the next state is the next microinstruction (just like in a sequential programming language)
  • default sequencing
  • use micro program counter (indicating next state = next instr.)

Another Implementation Style • Complex instructions: the "next state" is often current state + 1

Micro-programming What are the “microinstructions” ?

Microinstruction format • Each microinstruction contains 7 fields

Microinstruction format

Microprogramming • A specification methodology • appropriate if hundreds of opcodes, modes, cycles, etc. • signals specified symbolically using microinstructions

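Details
[Figure: microprogram sequencer. A PLA or ROM is addressed by the State register; AddrCtl controls a four-input mux (inputs 3, 2, 1, 0) in the address select logic, which chooses among the adder output (state + 1), Dispatch ROM 2, Dispatch ROM 1, and the constant 0; the dispatch ROMs are indexed by the opcode field of the instruction register.]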

Details
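A sketch of the address select logic, under stated assumptions: the AddrCtl encoding (0 = go to state 0, 1 = dispatch ROM 1, 2 = dispatch ROM 2, 3 = state + 1), the per-state AddrCtl values, and the dispatch ROM contents all follow the usual textbook arrangement for this ten-state controller and are not taken from the slide itself. All names are hypothetical.

```python
# Opcodes (instruction bits 31-26); standard MIPS values.
R_FORMAT, JUMP, BEQ, LW, SW = 0b000000, 0b000010, 0b000100, 0b100011, 0b101011

# Assumed dispatch tables: opcode -> next state.
DISPATCH_ROM_1 = {R_FORMAT: 6, JUMP: 9, BEQ: 8, LW: 2, SW: 2}
DISPATCH_ROM_2 = {LW: 3, SW: 5}

# Assumed AddrCtl value stored with each state's control word.
ADDR_CTL = {0: 3, 1: 1, 2: 2, 3: 3, 4: 0, 5: 0, 6: 3, 7: 0, 8: 0, 9: 0}


def next_state(state: int, opcode: int) -> int:
    """Model the 4-input mux in the address select logic."""
    addr_ctl = ADDR_CTL[state]
    if addr_ctl == 0:
        return 0                        # start the next instruction
    if addr_ctl == 1:
        return DISPATCH_ROM_1[opcode]   # branch on opcode after decode
    if addr_ctl == 2:
        return DISPATCH_ROM_2[opcode]   # split lw from sw
    return state + 1                    # default sequencing via the adder


def run(opcode: int) -> list:
    """States visited by one instruction, from fetch until the return to 0."""
    states = [0]
    while True:
        nxt = next_state(states[-1], opcode)
        if nxt == 0:
            return states
        states.append(nxt)
```

Most states need no opcode at all: they either increment or return to 0, which is why the large next-state table collapses into an adder and two small dispatch ROMs.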

Maximally vs. Minimally Encoded
• No encoding (also called horizontal encoding, or 1-hot encoding):
  • 1 bit for each datapath operation
  • faster, requires more memory (logic)
  • used for Vax 780: an astonishing 400K of memory!
• Lots of encoding (also called vertical encoding):
  • send the microinstructions through logic to get control signals
  • uses less memory, slower
• Historical context of CISC:
  • Too much logic to put on a single chip with everything else
  • Use a ROM (or even RAM) to hold the microcode
  • It’s easy to add new instructions

Microcode: Trade-offs
• Distinction between specification and implementation is sometimes blurred
• Specification Advantages:
  • Easy to design and write
  • Design architecture and microcode in parallel
• Implementation (off-chip ROM) Advantages:
  • Easy to change since values are in memory
  • Can emulate other architectures
  • Can make use of internal registers
• Implementation Disadvantages, SLOWER now that:
  • Control is implemented on same chip as processor
  • ROM is no longer faster than RAM
  • No need to go back and make changes

The Big Picture
Initial representation:    Finite state diagram           | Microprogram
Sequencing control:        Explicit next state function   | Microprogram counter + dispatch ROMs
Logic representation:      Logic equations                | Truth tables
Implementation technique:  Programmable logic array       | Read only memory
