Computer Architecture Lecture Notes Spring 2005 Dr. Michael P. Frank. Competency Area 5: Processor: Datapath & Control.
Introduction
• We have discussed:
  - Performance
  - Instruction sets
  - Computer arithmetic
• Now we turn to processor implementation (i.e., the hardware that implements instructions) by studying the datapath and control components of a computer.
Introduction
• A typical MIPS implementation includes the following components:
[Figure: PC, instruction memory, register file, ALU, and data memory, connected by address, data, and register-number paths]
• For every instruction, the first two steps are the same:
  - Instruction fetch: fetch the instruction from memory at the address in the PC.
  - Read registers: use fields of the instruction to select which register(s) to read (loads and immediate ops read only one register; most other instructions, including stores, read two).
• The next step uses the ALU to:
  - Calculate an address (memory-reference instructions)
  - Execute an operation (arithmetic-logical instructions)
  - Compare registers (branches)
Introduction
• If the instruction is arithmetic-logical, the result from the ALU is written to a register.
• If the instruction is a load or store, it uses a path from memory to the registers (load) or from the registers to memory (store).
• Branches use the ALU output to determine the next instruction. We’ll look at more details later.
Clocking Methodologies
• It is important to understand logic implementations and clocking when designing machines.
• We’ll introduce some terminology for the next lecture, most of which involves combinational logic.
Timing Considerations
• Clocking Methodologies
  - It is important to understand logic implementations and clocking when designing machines. A clocking methodology defines when signals can be read and written.
• Some terminology:
  - An “asserted” signal is one that is logically true.
  - To “assert” a signal means to drive it to logically true.
• Any processor consists of two types of elements:
  - Combinational elements: given the same set of inputs, they always produce the same outputs; they have no internal storage (e.g., the ALU).
  - Sequential (state) elements: they have internal storage, which allows values to be saved and synchronized (e.g., the register file and the instruction and data memories).
State Elements
[Figure: clock waveform labeling the rising edge, the falling edge, and the cycle time]
• A state element has at least two inputs (the value to be written and the clock) and one output (the value that was written in an earlier clock cycle).
• In an edge-triggered clocking methodology, values stored in the machine are updated only on a clock edge.
State Elements
[Figure: State Element 1 feeds Combinational Logic, which feeds State Element 2, all within one clock cycle]
• Combinational logic takes its inputs from state elements and writes its outputs to state elements.
• The inputs are values written in a previous clock cycle; the outputs are values that can be used in a following clock cycle.
• A clocked system is also called a synchronous system: signals written into state elements must be valid (i.e., stable and unchanging) when the active clock edge occurs.
State Elements
[Figure: S-R latch with inputs R and S and outputs Q and Q-bar]
• How do we construct state elements? We use latches and flip-flops.
  - Latches: the state changes whenever the inputs change while the clock is asserted.
  - Flip-flops: the state changes only on a clock edge.
• The simplest memory elements are unclocked, meaning they have no clock input.
• Example: the set-reset (S-R) latch has an output that depends on present and past inputs (not on a clock signal).
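The dependence on past inputs can be sketched in a few lines of Python. This is a minimal sketch, assuming the common construction from two cross-coupled NOR gates; the slide's figure is not reproduced here, so that gate-level structure and the names `nor` and `sr_latch` are assumptions for illustration.

```python
def nor(a, b):
    """Two-input NOR gate on 0/1 values."""
    return 0 if (a or b) else 1


def sr_latch(s, r, q, q_bar):
    """Settle an S-R latch (assumed: cross-coupled NOR gates).

    s, r     : current set and reset inputs (0 or 1)
    q, q_bar : outputs held from earlier inputs
    Returns the settled (q, q_bar). The case s = r = 1 is not
    meaningful for this latch and is not handled specially.
    """
    for _ in range(4):  # a few passes are enough for the feedback loop to settle
        new_q = nor(r, q_bar)
        new_q_bar = nor(s, new_q)
        if (new_q, new_q_bar) == (q, q_bar):
            break
        q, q_bar = new_q, new_q_bar
    return q, q_bar


state = (0, 1)                   # latch initially holds 0
state = sr_latch(1, 0, *state)   # set
state = sr_latch(0, 0, *state)   # inputs removed: the stored 1 is remembered
```

With both inputs at 0 the output is whatever was stored last, which is the "past inputs" behavior the slide describes.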
D Latch
[Figure: D latch with inputs C and D and outputs Q and Q-bar]
• In computers we use clocked storage elements, in particular D latches and D flip-flops.
• Consider the D latch:
  - Two inputs: the data value to be stored (D), and the clock signal (C) indicating when to read and store D.
  - Two outputs: the value of the internal state (Q) and its complement.
• We use flip-flops to build registers, which become the basic building blocks of smaller memories.
D Latch
[Figure: D latch with inputs C and D and outputs Q and Q-bar]
• When the clock input C is asserted, the latch is open and the Q output takes the value of the D input.
[Logical equation: not recoverable from the slide text]
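The latch behavior described here can be sketched as a one-line function. The slide's own equation did not survive extraction; the form used below, next Q = (C AND D) OR (NOT C AND Q), is the usual textbook characteristic equation and is an assumption about what the slide showed. The name `d_latch` is illustrative.

```python
def d_latch(c, d, q):
    """Next state of a D latch, all values 0 or 1.

    Assumed characteristic equation: Q_next = (C AND D) OR (NOT C AND Q).
    While C is asserted the latch is open and follows D;
    while C is deasserted it holds the stored value Q.
    """
    return (c & d) | ((1 - c) & q)


q = 0
q = d_latch(1, 1, q)  # clock asserted: Q follows D
q = d_latch(0, 0, q)  # clock deasserted: D is ignored and Q is held
```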
D Flip-flop
[Figure: D flip-flop with inputs D and C and outputs Q and Q-bar]
• A falling-edge-triggered flip-flop: the output changes only on the falling clock edge.
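A behavioral sketch of a falling-edge-triggered flip-flop follows. It models only the edge detection, not the internal latches, and it samples D at the moment the edge is seen; the class and method names are hypothetical.

```python
class FallingEdgeDFlipFlop:
    """Behavioral model: Q changes only when the clock goes from 1 to 0."""

    def __init__(self, q=0):
        self.q = q
        self._prev_clock = 0

    def tick(self, clock, d):
        """Present the current clock level and D input; return Q."""
        if self._prev_clock == 1 and clock == 0:  # falling edge detected
            self.q = d
        self._prev_clock = clock
        return self.q


ff = FallingEdgeDFlipFlop()
ff.tick(1, 1)  # clock high: D is ignored, Q stays 0
ff.tick(0, 1)  # falling edge: Q becomes 1
ff.tick(0, 0)  # clock stays low: no edge, Q stays 1
```

Unlike a latch, changes on D while the clock is high or low have no effect until the next falling edge.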
Register File
[Figure: register file with inputs Read register number 1, Read register number 2, Write register, Write data, and Write, and outputs Read data 1 and Read data 2]
• The register file contains a set of registers that can be read and written by supplying the number of the register to be accessed.
• The register file could be built using D flip-flops.
• In practice, simpler clocked storage elements are used instead (e.g., SRAM cells).
• Since reading a register does not change the state, we need only supply the register number as input, and the output is the data contained in that register.
Register File
• The two read ports can be implemented using a pair of multiplexors:
[Figure: Registers 0 through n feed two multiplexors; Read register number 1 selects Read data 1, and Read register number 2 selects Read data 2]
Register File
[Figure: write port; a decoder on the register number, gated by the Write signal, drives the C input of Register 0 through Register n, and Register data drives every D input]
• Writing to a register is a little more complicated.
• In the write port, we use a decoder to determine which register to write to.
• When the Write signal is asserted, the clock (C) input is asserted only for the selected register.
• An active edge on the C input therefore occurs only for the selected register.
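The read and write ports can be sketched together as a small Python model. This is an illustrative sketch, not the hardware: list indexing stands in for the read multiplexors, a one-hot list stands in for the decoder, and the 32-register, 32-bit sizes and all names are assumptions.

```python
class RegisterFile:
    """Behavioral register file with two read ports and one write port."""

    def __init__(self, num_registers=32):
        self.regs = [0] * num_registers

    def read(self, read_reg1, read_reg2):
        # Each read port behaves like a multiplexor: the register number
        # selects which register's contents appear on the output.
        return self.regs[read_reg1], self.regs[read_reg2]

    def _decode(self, write_reg):
        # Decoder: exactly one output line is 1, the one for write_reg.
        return [1 if i == write_reg else 0 for i in range(len(self.regs))]

    def clock_edge(self, write, write_reg, write_data):
        # Only the register whose decoder line is set, and only when the
        # Write signal is asserted, sees an active edge and stores the data.
        for i, selected in enumerate(self._decode(write_reg)):
            if write and selected:
                self.regs[i] = write_data & 0xFFFFFFFF


rf = RegisterFile()
rf.clock_edge(1, 18, 100)  # Write asserted: register 18 is updated
rf.clock_edge(0, 19, 7)    # Write not asserted: nothing changes
values = rf.read(18, 19)
```

Reading has no side effects, which is why the read ports need no enable signal.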
Building the Datapath
• Our ultimate goal is to understand how to build a datapath (i.e., the processor component that performs arithmetic operations) in MIPS hardware, as illustrated below.
[Figure: complete single-cycle datapath with PC, instruction memory, two adders, shift left 2, register file, sign extend (16 to 32 bits), ALU, and data memory; control signals PCSrc, RegWrite, ALUSrc, ALU operation (3 bits), MemWrite, MemRead, and MemtoReg]
Simple Implementation
• These are some of the functional units that we need for our instructions:
[Figure: instruction memory, program counter (PC), adder, register file, ALU (with 3-bit ALU control and Zero output), data memory unit (MemRead, MemWrite), sign extension unit (16 to 32 bits), and multiplexor (MUX)]
Let’s Design…
• Path for instruction fetch and PC increment…
[Figure: the PC addresses the instruction memory; an adder adds 4 to the 32-bit PC to form the next PC]
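The fetch path can be sketched as a single function. This is a minimal sketch: the instruction memory is modeled as a dictionary keyed by byte address, and the two instruction words in it are hypothetical contents chosen for illustration.

```python
def fetch(instruction_memory, pc):
    """Read the instruction at PC and compute PC + 4 with a 32-bit adder."""
    instruction = instruction_memory[pc]
    next_pc = (pc + 4) & 0xFFFFFFFF  # wrap around like a 32-bit adder
    return instruction, next_pc


# Hypothetical instruction memory: two 32-bit words at byte addresses 0 and 4.
memory = {0: 0x8E53000A, 4: 0x22510090}
instruction, pc = fetch(memory, 0)
```

Each instruction is one 32-bit word, so the adder's second input is the constant 4.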
Let’s Design…
• Datapath for R-type instructions (add, sub, etc.)
[Figure: register file and ALU; Inst[21:25] and Inst[16:20] select the two read registers and Inst[11:15] selects the write register; control signals RegWrite and ALU control (3 bits); Inst[0:31] is the full instruction]
Let’s Design…
• Datapath for load word/store word (lw/sw)
[Figure: register file, ALU, data memory unit, and multiplexor; the instruction fields used are Inst[21:25], Inst[16:20], and Inst[0:15]; control signals RegWrite, ALU control, MemRead, and MemWrite]
Let’s Design…
• Datapath for BEQ instructions…
[Figure: PC, instruction memory, two adders, and a multiplexor; one adder computes PC + 4, the other adds the offset taken from Inst[0:15]; the multiplexor selects the next PC]
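The branch-target arithmetic can be sketched as follows. It assumes the standard MIPS rule, consistent with the sign-extend and shift-left-2 units in the full datapath figure: the 16-bit field is sign-extended, shifted left by 2, and added to PC + 4. The function names are illustrative.

```python
def sign_extend_16(value):
    """Interpret the low 16 bits of value as a signed two's-complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def branch_target(pc, imm16):
    """Target address = (PC + 4) + (sign-extended offset << 2), kept to 32 bits."""
    return (pc + 4 + (sign_extend_16(imm16) << 2)) & 0xFFFFFFFF


forward = branch_target(0x1000, 3)        # three instructions past PC + 4
backward = branch_target(0x1000, 0xFFFF)  # offset -1: back to the branch itself
```

The shift by 2 converts a count of instructions into a count of bytes.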
Let’s Design…
• Datapath for J (jump) instruction…
[Figure: the BEQ fetch path plus a control-driven MUX for the jump destination. The 32-bit JDest is assembled as: JDest[0:1] = 00, JDest[2:27] = Inst[0:25], JDest[28:31] = PC[28:31]]
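The jump-destination wiring labeled in the figure can be sketched with masks and shifts. This is a minimal sketch with an illustrative function name; the slide labels the upper bits PC[28:31], and whether they come from the PC or from PC + 4 is left to the caller.

```python
def jump_target(pc, instruction):
    """Assemble the 32-bit jump destination.

    JDest[0:1]   = 00
    JDest[2:27]  = Inst[0:25]
    JDest[28:31] = PC[28:31]
    """
    target_field = instruction & 0x03FFFFFF  # Inst[0:25], the 26-bit target
    upper_bits = pc & 0xF0000000             # PC[28:31]
    return upper_bits | (target_field << 2)  # the shift leaves the low two bits 00


# Hypothetical jump instruction: opcode 2 in bits 31-26, target field 0x100.
dest = jump_target(0x40000004, (2 << 26) | 0x100)
```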
Simple Implementation (10/28)
• Recall from last time that we designed datapaths for instruction fetch, R-type instructions (add, sub, and, etc.), load and store word instructions, and branch-equal and jump instructions.
• We used the following elements for the individual designs.
• To build a complete datapath, we need to combine the separate datapaths and add some control signals to create a single datapath for all instructions.
Simple Implementation
• Many instructions use the same functional units in their datapaths. We can use this to share datapath hardware among different instructions.
• When we build a single datapath, we can use a mux to select among different source inputs.
• Consider the datapath for R-type instructions:
Simple Implementation
• The datapath for memory-reference instructions:
• We can combine these datapaths by using a mux to select which source data to use for the second ALU input (either the sign-extended immediate or Read data 2).
• We also need a mux to select whether the data written to the register file comes from the ALU result or from data memory.
Simple Implementation • If we include the instruction fetch hardware, the modified datapath for R-type and memory-reference instructions is:
Simple Implementation • The datapath hardware implementation for all 3 instruction classes (R-type, memory-references, and branches/jumps) is given as:
Control
• We have designed a single datapath for all instructions. How do we determine which instruction gets executed?
• We design control units that set the control signals for each instruction.
• Recall that the ALU operation is selected by a 3-bit control input.
• To design the ALU control unit, we use as inputs the function field of the instruction and a 2-bit control field called ALUOp.
Control
• Recall that the instruction formats for the 3 different instruction classes are:

R-type instruction:
  Field:  0      rs     rt     rd     shamt  funct
  Bits:   31-26  25-21  20-16  15-11  10-6   5-0

Load or store instruction:
  Field:  35 or 43  rs     rt     address
  Bits:   31-26     25-21  20-16  15-0

Branch (beq) instruction:
  Field:  4      rs     rt     address
  Bits:   31-26  25-21  20-16  15-0
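The bit ranges in these formats translate directly into shifts and masks. The sketch below extracts every field from a 32-bit instruction word; the function name `decode` and the dictionary layout are illustrative choices.

```python
def decode(instruction):
    """Split a 32-bit MIPS instruction word into its fields."""
    return {
        "opcode":  (instruction >> 26) & 0x3F,  # bits 31-26
        "rs":      (instruction >> 21) & 0x1F,  # bits 25-21
        "rt":      (instruction >> 16) & 0x1F,  # bits 20-16
        "rd":      (instruction >> 11) & 0x1F,  # bits 15-11 (R-type only)
        "shamt":   (instruction >> 6) & 0x1F,   # bits 10-6  (R-type only)
        "funct":   instruction & 0x3F,          # bits 5-0   (R-type only)
        "address": instruction & 0xFFFF,        # bits 15-0  (load/store/beq)
    }


fields = decode(0x8E53000A)  # opcode 35, so this word is a load
```

All fields are extracted unconditionally; the opcode then determines which of them are meaningful, much as the hardware routes every field and lets control decide which are used.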
Control
• The figure illustrates the ALU control unit. Instruction bits 5-0, the function field of R-type instructions, are an input to the ALU control.
Control
• The following table illustrates how to set the ALU control inputs for the desired instructions.
• The ALUOp bits are generated by the main control unit: 00 for loads/stores, 01 for beq, and 10 for R-type instructions, where 10 indicates that the operation is encoded in the function field.
• Only for ALUOp = 10 is the function field used to determine the desired ALU action.
Control
• We must map the 2-bit ALUOp and the 6-bit function code inputs of the ALU control unit to the 3-bit ALU operation.
• We can use a truth table. Since an ALUOp of 11 is never used, we can substitute don’t-care entries:
• From this truth table, we can generate a hardware implementation of the ALU control unit using basic logic gates.
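The mapping can be sketched as a lookup function. The add (010) and subtract (110) codes appear in the worked examples of these notes; the remaining function codes and ALU operation codes (and, or, slt) follow the usual textbook table for a 3-bit ALU control and are assumptions here, since the slide's table is not reproduced. Raising an error for ALUOp = 11 is a software convenience; in hardware that case is a don't-care.

```python
# Function field (bits 5-0) -> 3-bit ALU operation, used only when ALUOp = 10.
FUNCT_TO_ALU_OPERATION = {
    0b100000: 0b010,  # add
    0b100010: 0b110,  # subtract
    0b100100: 0b000,  # and (assumed encoding)
    0b100101: 0b001,  # or  (assumed encoding)
    0b101010: 0b111,  # set on less than (assumed encoding)
}


def alu_control(alu_op, funct):
    """Map the 2-bit ALUOp and the 6-bit function field to the ALU operation."""
    if alu_op == 0b00:   # loads and stores: add to form the address
        return 0b010
    if alu_op == 0b01:   # beq: subtract to compare the registers
        return 0b110
    if alu_op == 0b10:   # R-type: the function field selects the operation
        return FUNCT_TO_ALU_OPERATION[funct]
    raise ValueError("ALUOp = 11 is not used")


operation = alu_control(0b10, 0b100010)  # R-type subtract
```

For ALUOp 00 and 01 the function field is ignored, which is why it appears as xxxxxx in the truth table.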
Control
Let’s consider some examples. Given the following instruction: lw $s3, 10($s2)
i) Identify the machine code for this instruction.
ii) Determine the 2-bit ALUOp, 6-bit function field, and the 3-bit ALU operation for this instruction.
iii) Using the given figure, identify the appropriate datapath for this instruction.
Example 1
There were 9 different examples (a) – (i). We’ll look at 4 of them.
Example 1: lw $s3, 10($s2)
i) Identify the machine code for this instruction.

  Field      opcode   rs ($s2)   rt ($s3)   address
  (decimal)  35       18         19         10
  (binary)   100011   10010      10011      0000 0000 0000 1010

ii) Determine the 2-bit ALUOp, 6-bit function field, and the 3-bit ALU operation for this instruction.

  Opcode   ALUOp   Instruction operation   Function field    Desired ALU action   ALU control
  LW       00      Load word               xxxxxx = 001010   add                  010
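Part i) can be checked mechanically by packing the fields into a 32-bit word. This is a small sketch; the helper name `encode_i_type` is illustrative.

```python
def encode_i_type(opcode, rs, rt, immediate):
    """Pack opcode (6 bits), rs (5), rt (5), and a 16-bit immediate into one word."""
    return (
        ((opcode & 0x3F) << 26)
        | ((rs & 0x1F) << 21)
        | ((rt & 0x1F) << 16)
        | (immediate & 0xFFFF)
    )


# lw $s3, 10($s2): opcode 35, rs = $s2 = 18, rt = $s3 = 19, address field 10
word = encode_i_type(35, 18, 19, 10)
bits = format(word, "032b")  # the 32-bit machine code as a binary string
```

The low 6 bits of the word are 001010, which is the value shown in the function-field column; for a load those bits are part of the address field and the ALU control ignores them.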
Example 1
Example 1: lw $s3, 10($s2)
iii) Using the figure given, identify the correct datapath for this instruction.
Control signals:
  RegDst = 1
  ALUOp = 00
  ALUSrc = 0
  MemtoReg = 1
  PCSrc = 1
Example 2
Example 2: addi $s1, $s2, 144
i) Identify the machine code for this instruction.

  Field      opcode   rs ($s2)   rt ($s1)   immediate
  (decimal)  8        18         17         144
  (binary)   001000   10010      10001      0000 0000 1001 0000

ii) Determine the 2-bit ALUOp, 6-bit function field, and the 3-bit ALU operation for this instruction.

  Opcode   ALUOp   Instruction operation   Function field    Desired ALU action   ALU control
  ADDI     00      Add immediate*          xxxxxx = 010000   add                  010
Example 2
Example 2: addi $s1, $s2, 144
iii) Using the figure given, identify the correct datapath for this instruction.
Control signals:
  RegDst = 1
  ALUOp = 00
  ALUSrc = 0
  MemtoReg = 0
  PCSrc = 1
Example 3
Example 3: sub $s2, $s4, $t1
i) Identify the machine code for this instruction.

  Field      opcode   rs ($s4)   rt ($t1)   rd ($s2)   shamt   funct
  (decimal)  0        20         9          18         0       34
  (binary)   000000   10100      01001      10010      00000   100010

ii) Determine the 2-bit ALUOp, 6-bit function field, and the 3-bit ALU operation for this instruction.

  Opcode   ALUOp   Instruction operation   Function field   Desired ALU action   ALU control
  R-type   10      Subtract                100010           sub                  110
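The R-type encoding can be checked the same way, by packing the six fields into one word. This is a small sketch; the helper name `encode_r_type` is illustrative.

```python
def encode_r_type(rs, rt, rd, shamt, funct):
    """Pack an R-type instruction; the opcode field is 0 for this class."""
    return (
        ((rs & 0x1F) << 21)
        | ((rt & 0x1F) << 16)
        | ((rd & 0x1F) << 11)
        | ((shamt & 0x1F) << 6)
        | (funct & 0x3F)
    )


# sub $s2, $s4, $t1: rs = $s4 = 20, rt = $t1 = 9, rd = $s2 = 18, funct = 34
word = encode_r_type(20, 9, 18, 0, 34)
rs_bits = format(20, "05b")  # the 5-bit binary form of register number 20
```

Note the operand order: the destination $s2 is written first in assembly but is encoded in the rd field, after rs and rt.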
Example 3
Example 3: sub $s2, $s4, $t1
iii) Using the figure given, identify the correct datapath for this instruction.
Control signals:
  RegDst = 0
  ALUOp = 10
  ALUSrc = 1
  MemtoReg = 0
  PCSrc = 1
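Example 4
Example 4: beq $s0, $s1, exit   # assume ‘exit’ is located at 30,000
i) Identify the machine code for this instruction.

  Field      opcode   rs ($s0)   rt ($s1)   address
  (decimal)  4        16         17         30000
  (binary)   000100   10000      10001      0111 0101 0011 0000

Note: this example places 30000 directly in the 16-bit field. In an actual MIPS encoding, the beq field holds the word offset from PC + 4 to the target rather than the target address itself.

ii) Determine the 2-bit ALUOp, 6-bit function field, and the 3-bit ALU operation for this instruction.

  Opcode   ALUOp   Instruction operation   Function field    Desired ALU action   ALU control
  BEQ      01      Branch on equal         xxxxxx = 110000   sub                  110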
Example 4
Example 4: beq $s0, $s1, exit   # assume ‘exit’ is located at 30,000
iii) Using the figure given, identify the correct datapath for this instruction.
Control signals:
  RegDst = 1
  ALUOp = 01
  ALUSrc = 1
  MemtoReg = X (set from previous instruction)
  PCSrc = 1 if the branch is not taken, 0 if the branch is taken
Main Control Unit (11/02) • We are now ready to discuss the main control unit…
Main Control Unit • Consider the control signals for the main control unit. • There are 7 control signals that can be set in the main control unit (9 in all, if we include the 2-bit ALUOp)
Main Control Unit • All but one of the control signals are completely determined by the opcode [bits 31-26]. Do you know which one? • The following truth table gives the control signals for the different instruction classes:
Main Control Unit • Note that instruction bits [31-26] are the input to the main control unit. • Also note that for branches, the PC is updated with the branch target address only if the zero-detect signal is asserted, hence the need for the AND gate.
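The AND gate can be sketched in a couple of lines. This sketch assumes that PCSrc = 1 selects the branch target, as the AND-gate description implies; a datapath figure with the mux inputs labeled the other way round would invert that bit. The names `pc_src` and `next_pc` are illustrative.

```python
def pc_src(branch, zero):
    """PCSrc = Branch AND Zero.

    branch comes from the main control unit (1 for beq);
    zero is the ALU's zero-detect output (1 when the registers are equal).
    """
    return branch & zero


def next_pc(pc_plus_4, branch_target, branch, zero):
    """Mux on PCSrc (assumed polarity: 1 selects the branch target)."""
    return branch_target if pc_src(branch, zero) else pc_plus_4


taken = next_pc(0x1004, 0x2000, 1, 1)      # beq with equal registers
not_taken = next_pc(0x1004, 0x2000, 1, 0)  # beq with unequal registers
```

Because Zero depends on register contents rather than on the opcode, this is the one control signal the opcode alone cannot determine.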
Main Control Unit • Since the opcode completely characterizes the control unit (with the exception of the PCSrc signal), we can create a truth table that maps the opcode into control signals.
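Such a truth table can be sketched as a dictionary keyed by opcode. The values below follow the common textbook table, not the table on the slide (which is not reproduced here): they assume RegDst = 1 selects the rd field, ALUSrc = 1 selects the sign-extended immediate, and MemtoReg = 1 selects data memory. A figure with the mux inputs swapped would invert the corresponding bits. `None` marks a don't-care entry.

```python
X = None  # don't care

# Opcode -> the 7 single-bit control signals plus the 2-bit ALUOp.
# Assumed mux polarities are described in the text above this block.
CONTROL_TABLE = {
    0:  dict(RegDst=1, ALUSrc=0, MemtoReg=0, RegWrite=1,
             MemRead=0, MemWrite=0, Branch=0, ALUOp=0b10),  # R-type
    35: dict(RegDst=0, ALUSrc=1, MemtoReg=1, RegWrite=1,
             MemRead=1, MemWrite=0, Branch=0, ALUOp=0b00),  # lw
    43: dict(RegDst=X, ALUSrc=1, MemtoReg=X, RegWrite=0,
             MemRead=0, MemWrite=1, Branch=0, ALUOp=0b00),  # sw
    4:  dict(RegDst=X, ALUSrc=0, MemtoReg=X, RegWrite=0,
             MemRead=0, MemWrite=0, Branch=1, ALUOp=0b01),  # beq
}


def main_control(opcode):
    """Look up the control signals for instruction bits 31-26."""
    if opcode not in CONTROL_TABLE:
        raise ValueError(f"opcode {opcode} is not covered by this sketch")
    return CONTROL_TABLE[opcode]


signals = main_control(35)  # control signals for lw
```

PCSrc does not appear in the table because it is derived from Branch and the ALU's Zero output rather than from the opcode alone.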
Logic for Control Units • Simple combinational logic (truth tables)
Single Cycle Implementation • Recall the basic single-cycle datapath implementation, as given below.
Single-cycle versus Multicycle
• Remember that single-cycle hardware implementations have many drawbacks, including:
  (1) Functional unit delay increases as instruction complexity increases
  (2) Violation of Design Principle 1: simplicity favors regularity
  (3) Inefficiency in performance, cost, and hardware utilization
      - Multiple redundant memory units, adders, etc.
• There are two main alternatives to single-cycle implementations:
  - Multicycle implementation
  - Pipelining (will cover later, if time)
• Multicycle implementations improve performance by breaking instructions into short “steps”, each of which is executed in a shorter clock cycle.
  - Instructions that require fewer steps can then finish in less time.
• We must now define what the “steps” of the instruction are...
Multicycle Implementation (11/04)
• Here is a basic datapath for a multicycle implementation.
• Control signals are omitted for now, for simplicity.
[Figure: multicycle datapath. Intra-cycle logic: a single memory for instructions or data (Cycles #1,4), the register file (Cycles #2,5), and the ALU (Cycles #1,2,3). Inter-cycle clocked registers: PC, instruction register, memory data register, A, B, and ALUOut]