CPU Design Steps
1. Analyze instruction set operations using implementation-independent RTN (register transfer notation) => datapath requirements.
2. Select a set of datapath components and establish a clocking methodology.
3. Assemble a datapath that meets the requirements.
4. Analyze the implementation of each instruction to determine the setting of the control points that effects the register transfer.
5. Assemble the control logic.
CPU Design & Implementation Process
[Figure: design levels from the Instruction Set Architecture to the processor (datapath and control), its components (Reg. File, Mux, ALU, Reg, Mem, Decoder, Sequencer), and finally cells and gates.]
• Bottom-up design:
  • Assemble components in the target technology to establish critical timing.
• Top-down design:
  • Specify component behavior from high-level requirements.
• Iterative refinement:
  • Establish a partial solution, then expand and improve it.
[Figure: Single Cycle MIPS Datapath (CPI = 1, long clock cycle). Instruction<31:0> from the instruction memory is split into Rs <21:25>, Rt <16:20>, Rd <11:15> and Imm16 <0:15>; these feed the 32 × 32-bit register file, the extender, the ALU, the data memory and the next-PC logic, steered by the control signals nPC_sel, RegDst, RegWr, ExtOp, ALUSrc, ALUctr, MemWr and MemtoReg, with Equal returned to control.]
Drawback of Single Cycle Processor
• Long cycle time.
• All instructions must take as much time as the slowest:
  • The cycle time needed for a load is longer than what all other instructions need.
• Real memory is not as well-behaved as idealized memory:
  • A data access cannot always complete in one (short) cycle.
Abstract View of Single Cycle CPU
[Figure: Main Control (from op) and ALU control (from fun) drive the datapath stages PC/Next PC, Instruction Fetch, Register Fetch, Ext and ALU, Mem Access (Data Mem) and Result Store (Reg. Wrt) through ALUSrc, ExtOp, MemRd, MemWr, RegWr, RegDst, nPC_sel and ALUctr; Equal is returned to control.]
Single Cycle Instruction Timing
• Arithmetic & Logical: PC → Inst Memory → Reg File → mux → ALU → mux → setup
• Load: PC → Inst Memory → Reg File → mux → ALU → Data Mem → mux → setup (critical path)
• Store: PC → Inst Memory → Reg File → mux → ALU → Data Mem
• Branch: PC → Inst Memory → Reg File → cmp → mux
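To make the timing argument concrete, here is a minimal Python sketch that adds up component delays along each path listed above and takes the slowest one as the single-cycle clock period. The values in `DELAY_PS` are hypothetical numbers chosen only for illustration, not measurements of any real implementation.

```python
# Hypothetical component delays in picoseconds (illustrative only).
DELAY_PS = {
    "pc_clk_to_q": 30,
    "inst_mem": 200,
    "reg_file": 100,
    "mux": 20,
    "alu": 150,
    "cmp": 50,
    "data_mem": 200,
    "setup": 20,
}

# Components on each instruction class's path, following the timing list above.
PATHS = {
    "arith_logic": ["pc_clk_to_q", "inst_mem", "reg_file", "mux", "alu", "mux", "setup"],
    "load": ["pc_clk_to_q", "inst_mem", "reg_file", "mux", "alu", "data_mem", "mux", "setup"],
    "store": ["pc_clk_to_q", "inst_mem", "reg_file", "mux", "alu", "data_mem"],
    "branch": ["pc_clk_to_q", "inst_mem", "reg_file", "cmp", "mux"],
}


def path_delay(path):
    """Total delay along one instruction class's path."""
    return sum(DELAY_PS[component] for component in path)


latency = {name: path_delay(path) for name, path in PATHS.items()}

# A single-cycle clock must accommodate the slowest instruction.
cycle_ps = max(latency.values())
critical = max(latency, key=latency.get)

for name, t in latency.items():
    print(f"{name:12s} {t:4d} ps")
print(f"cycle time = {cycle_ps} ps (critical path: {critical})")
```

With these made-up delays, every instruction is clocked at the load's 740 ps even though a branch needs only 400 ps, which is exactly the drawback described above.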
Reducing Cycle Time: Multi-Cycle Design
• Cut the combinational dependency graph by inserting registers/latches.
• The same work is done in two or more fast cycles, rather than one slow cycle.
[Figure: acyclic combinational logic between two storage elements is split into parts (A) and (B) by inserting a storage element between them.]
Clock Cycle Time & Critical Path
• Critical path: the slowest path between any two storage devices.
• Cycle time is a function of the critical path; it must be greater than:
  Clock-to-Q + Longest Path through the Combinational Logic + Setup
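A minimal Python sketch of the cycle-time constraint, applied once to one long chain of logic and once to the same logic cut into steps by registers, as in the multi-cycle idea. The register timing and the per-step delays in `STEPS` are hypothetical values for illustration.

```python
def min_cycle_time(clk_to_q, path_delays, setup):
    """Clock-to-Q + longest combinational path between storage elements + setup."""
    return clk_to_q + max(path_delays) + setup


# Hypothetical register timing and per-step logic delays, in picoseconds.
CLK_TO_Q, SETUP = 30, 20
STEPS = {"fetch": 200, "operand_fetch": 100, "execute": 170, "memory": 200, "write_back": 20}

# One slow cycle: all steps are chained into a single combinational path.
single_cycle = min_cycle_time(CLK_TO_Q, [sum(STEPS.values())], SETUP)

# Several fast cycles: a register after each step makes each step its own path.
multi_cycle = min_cycle_time(CLK_TO_Q, list(STEPS.values()), SETUP)

print(single_cycle, multi_cycle)  # 740 250
```

Note that each of the shorter cycles still pays clock-to-Q plus setup, so cutting the logic more finely helps only while the per-step logic delay dominates that register overhead.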
Instruction Processing Cycles
• Instruction Fetch: obtain the instruction from program storage.
• Next Instruction: update the program counter to the address of the next instruction.
• Instruction Decode: determine the instruction type and obtain operands from registers.
• Execute: compute the result value or status.
• Result Store: store the result in a register or memory if needed (usually called Write Back).
The fetch, next-instruction and decode steps are common to all instructions.
Partitioning the Single Cycle Datapath
• Add registers between the smallest steps.
[Figure: the single-cycle datapath divided into Instruction Fetch (PC, Next PC), Operand Fetch (Reg. File), Exec, Mem Access (Data Mem) and Result Store, with control signals nPC_sel, ExtOp, ALUSrc, ALUctr, MemRd, MemWr, RegDst and RegWr.]
Example Multi-cycle Datapath
[Figure: the partitioned datapath with registers IR, A, B, R and M inserted between Instruction Fetch, Operand Fetch, Exec, Mem Access and Result Store; control signals nPC_sel, ExtOp, ALUSrc, ALUctr, MemRd, MemWr, MemToReg, RegDst and RegWr; status Equal.]
Registers added:
• IR: Instruction Register.
• A, B: two registers to hold the operands read from the register file.
• R (or ALUOut): holds the output of the ALU.
• M (or Memory Data Register, MDR): holds data read from data memory.
Operations In Each Cycle
Instruction | Instruction Fetch | Instruction Decode | Execution | Memory | Write Back
Logic Immediate | IR ← Mem[PC] | A ← R[rs] | R ← A OR ZeroExt[imm16] | | R[rt] ← R; PC ← PC + 4
Store | IR ← Mem[PC] | A ← R[rs]; B ← R[rt] | R ← A + SignExt(imm16) | Mem[R] ← B; PC ← PC + 4 |
Load | IR ← Mem[PC] | A ← R[rs] | R ← A + SignExt(imm16) | M ← Mem[R] | R[rt] ← M; PC ← PC + 4
R-Type | IR ← Mem[PC] | A ← R[rs]; B ← R[rt] | R ← A + B | | R[rd] ← R; PC ← PC + 4
Branch | IR ← Mem[PC] | A ← R[rs]; B ← R[rt] | if Equal = 1 then PC ← PC + 4 + (SignExt(imm16) × 4), else PC ← PC + 4 | |
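The table can be read as a small interpreter: each instruction walks through the same intermediate registers IR, A, B, R and M, one register transfer per cycle. Below is a minimal Python sketch under simplifying assumptions: instructions are stored as already-decoded dictionaries (no binary encoding), `add` stands in for all R-type operations, memory is a dict keyed by byte address, and register $0 is not hardwired to zero.

```python
MASK32 = 0xFFFFFFFF


def sign_ext(imm16):
    """SignExt(imm16): read the low 16 bits as a two's-complement value."""
    imm16 &= 0xFFFF
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16


def zero_ext(imm16):
    """ZeroExt(imm16)."""
    return imm16 & 0xFFFF


def step(st):
    """Run one instruction through the per-cycle register transfers.

    st holds "pc", "reg" (32 ints), "imem" (address -> decoded fields) and
    "dmem" (byte address -> word). Returns the number of cycles used.
    """
    # Instruction Fetch: IR <- Mem[PC]
    ir = st["IR"] = st["imem"][st["pc"]]
    op = ir["op"]
    # Instruction Decode: A <- R[rs]; B <- R[rt]
    st["A"] = st["reg"][ir["rs"]]
    st["B"] = st["reg"][ir["rt"]]
    # Execution
    if op == "beq":
        # Branch finishes here: PC <- PC + 4 (+ SignExt(imm16) x 4 if Equal)
        offset = sign_ext(ir["imm16"]) * 4 if st["A"] == st["B"] else 0
        st["pc"] = st["pc"] + 4 + offset
        return 3
    if op == "add":  # R-type example
        st["R"] = (st["A"] + st["B"]) & MASK32
    elif op == "ori":
        st["R"] = st["A"] | zero_ext(ir["imm16"])
    elif op in ("lw", "sw"):
        st["R"] = (st["A"] + sign_ext(ir["imm16"])) & MASK32
    else:
        raise ValueError(f"unsupported op: {op!r}")
    # Memory
    if op == "sw":
        st["dmem"][st["R"]] = st["B"]  # Mem[R] <- B
        st["pc"] += 4                  # PC <- PC + 4
        return 4
    if op == "lw":
        st["M"] = st["dmem"][st["R"]]  # M <- Mem[R]
    # Write Back
    if op == "add":
        st["reg"][ir["rd"]] = st["R"]  # R[rd] <- R
    elif op == "ori":
        st["reg"][ir["rt"]] = st["R"]  # R[rt] <- R
    else:
        st["reg"][ir["rt"]] = st["M"]  # lw: R[rt] <- M
    st["pc"] += 4
    return 5 if op == "lw" else 4


st = {"pc": 0, "reg": [0] * 32, "dmem": {96: 1234}, "imem": {
    0: {"op": "lw", "rs": 8, "rt": 9, "imm16": 0xFFFC},    # lw  $9, -4($8)
    4: {"op": "ori", "rs": 9, "rt": 10, "imm16": 0x000F},  # ori $10, $9, 0xF
    8: {"op": "beq", "rs": 9, "rt": 9, "imm16": 0x0001},   # beq $9, $9, +1
}}
st["reg"][8] = 100
cycles = [step(st) for _ in range(3)]
print(cycles, st["reg"][9], st["reg"][10], st["pc"])  # [5, 4, 3] 1234 1247 16
```

The cycle counts returned by `step` (load 5, logic immediate 4, branch 3) match the number of non-empty columns in each row of the table.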
Finite State Machine (FSM) Control Model
[Figure: a State register feeds the Next State Logic, which also takes the inputs (conditions), and the Output Logic, which produces the outputs (control points).]
• The state specifies the control points for the register transfer.
• The transfer occurs upon exiting the state (same falling edge).
[Figure: in State X, the control points, which depend on the inputs, set up the register transfer.]
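This control model is a Moore-style machine: a state register, next-state logic driven by the state and the inputs, and output logic driven by the state alone. Here is a minimal Python sketch of that structure; the `MooreController` class and the two-state toy controller with its `IRen`/`PCen` outputs are hypothetical, used only to exercise it.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet


@dataclass
class MooreController:
    """State register plus next-state logic and output logic."""
    state: str
    next_state: Callable[[str, Dict[str, int]], str]
    output: Callable[[str], FrozenSet[str]]

    def clock(self, inputs: Dict[str, int]) -> FrozenSet[str]:
        # The control points asserted during this cycle depend only on the state...
        asserted = self.output(self.state)
        # ...and the state register loads the next state when the state is exited.
        self.state = self.next_state(self.state, inputs)
        return asserted


# Hypothetical two-state toy controller.
def toy_next(state, inputs):
    return "execute" if state == "fetch" else "fetch"


def toy_output(state):
    return frozenset({"IRen"}) if state == "fetch" else frozenset({"PCen"})


ctl = MooreController("fetch", toy_next, toy_output)
trace = [sorted(ctl.clock({})) for _ in range(3)]
print(trace)  # [['IRen'], ['PCen'], ['IRen']]
```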
Control Specification For Multi-cycle CPU: Finite State Machine (FSM)
• “instruction fetch”: IR ← MEM[PC]
• “decode / operand fetch”: A ← R[rs]; B ← R[rt]
• Then, depending on the instruction:
  • BEQ & Equal (Execute): PC ← PC + 4 + (SX || 00)
  • BEQ & ~Equal (Execute): PC ← PC + 4
  • R-type: Execute R ← A fun B; Write-back R[rd] ← R; PC ← PC + 4
  • ORi: Execute R ← A or ZX; Write-back R[rt] ← R; PC ← PC + 4
  • LW: Execute R ← A + SX; Memory M ← MEM[R]; Write-back R[rt] ← M; PC ← PC + 4
  • SW: Execute R ← A + SX; Memory MEM[R] ← B; PC ← PC + 4
• After its last state, each path returns to instruction fetch.
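The state diagram's next-state logic can be written as a plain function, and walking it from fetch until control returns to fetch gives the number of cycles per instruction type. This Python sketch uses symbolic state names that are our own labels for the boxes in the diagram; the instruction names (`rtype`, `ori`, `lw`, `sw`, `beq`) are likewise assumed.

```python
def next_state(state, op=None, equal=False):
    """Next-state logic of the multi-cycle FSM (symbolic state names)."""
    if state == "fetch":
        return "decode"
    if state == "decode":
        if op == "beq":
            return "beq_taken" if equal else "beq_not_taken"
        return {"rtype": "rtype_exec", "ori": "ori_exec",
                "lw": "lw_exec", "sw": "sw_exec"}[op]
    follow = {
        "rtype_exec": "rtype_wb",  # R <- A fun B
        "ori_exec": "ori_wb",      # R <- A or ZX
        "lw_exec": "lw_mem",       # R <- A + SX
        "lw_mem": "lw_wb",         # M <- MEM[R]
        "sw_exec": "sw_mem",       # R <- A + SX
    }
    # Branch, write-back and SW memory states all return to instruction fetch.
    return follow.get(state, "fetch")


def cycles_for(op, equal=False):
    """Count the states visited from fetch until control is back at fetch."""
    state, n = "fetch", 0
    while True:
        state = next_state(state, op, equal)
        n += 1
        if state == "fetch":
            return n


print({op: cycles_for(op) for op in ("rtype", "ori", "lw", "sw", "beq")})
# {'rtype': 4, 'ori': 4, 'lw': 5, 'sw': 4, 'beq': 3}
```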
Traditional FSM Controller
[Figure: a State register feeds a truth (or transition) table with 11 inputs: the 4-bit State, the 6-bit op field from the datapath and the Equal condition. The table produces the next State and the control points sent to the datapath.]
Traditional FSM Controller
datapath + state diagram => control
• Translate RTN statements into control points.
• Assign states.
• Implement the controller.
Mapping RTNs To Control Points: Examples & State Assignments
• 0000 “instruction fetch”: IR ← MEM[PC] [control points: imem_rd, IRen]
• 0001 “decode / operand fetch”: A ← R[rs]; B ← R[rt] [Aen, Ben]
• BEQ & Equal: 0010 (Execute) PC ← PC + 4 + (SX || 00)
• BEQ & ~Equal: 0011 (Execute) PC ← PC + 4
• R-type: 0100 (Execute) R ← A fun B [ALUfun, Sen]; 0101 (Write-back) R[rd] ← R; PC ← PC + 4 [RegDst, RegWr, PCen]
• ORi: 0110 (Execute) R ← A or ZX; 0111 (Write-back) R[rt] ← R; PC ← PC + 4
• LW: 1000 (Execute) R ← A + SX; 1001 (Memory) M ← MEM[R]; 1010 (Write-back) R[rt] ← M; PC ← PC + 4
• SW: 1011 (Execute) R ← A + SX; 1100 (Memory) MEM[R] ← B; PC ← PC + 4
• Every final state returns to instruction fetch state 0000.
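Detailed Control Specification
State | Op field | Eq | Next | IR en | PC en sel | Ops A B | Exec Ex Sr ALU S | Mem R W M | Write-Back M-R Wr Dst | Instr
0000 | ?????? | ? | 0001 | 1 |  |  |  |  |  |
0001 | BEQ | 0 | 0011 |  |  | 1 1 |  |  |  |
0001 | BEQ | 1 | 0010 |  |  | 1 1 |  |  |  |
0001 | R-type | x | 0100 |  |  | 1 1 |  |  |  |
0001 | orI | x | 0110 |  |  | 1 1 |  |  |  |
0001 | LW | x | 1000 |  |  | 1 1 |  |  |  |
0001 | SW | x | 1011 |  |  | 1 1 |  |  |  |
0010 | xxxxxx | x | 0000 |  | 1 1 |  |  |  |  | BEQ
0011 | xxxxxx | x | 0000 |  | 1 0 |  |  |  |  | BEQ
0100 | xxxxxx | x | 0101 |  |  |  | 0 1 fun 1 |  |  | R
0101 | xxxxxx | x | 0000 |  | 1 0 |  |  |  | 0 1 1 | R
0110 | xxxxxx | x | 0111 |  |  |  | 0 0 or 1 |  |  | ORI
0111 | xxxxxx | x | 0000 |  | 1 0 |  |  |  | 0 1 0 | ORI
1000 | xxxxxx | x | 1001 |  |  |  | 1 0 add 1 |  |  | LW
1001 | xxxxxx | x | 1010 |  |  |  |  | 1 0 0 |  | LW
1010 | xxxxxx | x | 0000 |  | 1 0 |  |  |  | 1 1 0 | LW
1011 | xxxxxx | x | 1100 |  |  |  | 1 0 add 1 |  |  | SW
1100 | xxxxxx | x | 0000 |  | 1 0 |  |  | 0 1 |  | SW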
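The next-state columns of this table are what the truth-table controller described earlier stores: its 11 inputs are the 4-bit state, the 6-bit op field and Equal. The Python sketch below expands those columns into a 2^11-entry lookup table (a ROM image). Two assumptions: the opcode values are the standard MIPS encodings (the table itself uses names), and combinations the table does not list default to state 0000, which is a hypothetical choice. The control-point columns are not modeled.

```python
# Standard MIPS opcode field values (the table above uses names instead).
OPCODES = {"rtype": 0b000000, "beq": 0b000100, "ori": 0b001101,
           "lw": 0b100011, "sw": 0b101011}

# Dispatch out of state 0001 on the op field and, for BEQ only, on Eq.
DECODE_DISPATCH = {
    (OPCODES["beq"], 0): 0b0011,
    (OPCODES["beq"], 1): 0b0010,
    (OPCODES["rtype"], None): 0b0100,
    (OPCODES["ori"], None): 0b0110,
    (OPCODES["lw"], None): 0b1000,
    (OPCODES["sw"], None): 0b1011,
}

# States whose successor does not depend on op or Eq ("xxxxxx", "x").
SEQUENTIAL = {
    0b0000: 0b0001,
    0b0010: 0b0000, 0b0011: 0b0000,
    0b0100: 0b0101, 0b0101: 0b0000,
    0b0110: 0b0111, 0b0111: 0b0000,
    0b1000: 0b1001, 0b1001: 0b1010, 0b1010: 0b0000,
    0b1011: 0b1100, 0b1100: 0b0000,
}


def next_state(state, op, eq):
    if state == 0b0001:
        key = (op, eq) if op == OPCODES["beq"] else (op, None)
        return DECODE_DISPATCH[key]
    return SEQUENTIAL[state]


def build_rom():
    """Expand to 2**11 entries addressed by state (4 bits) | op (6 bits) | eq (1 bit)."""
    rom = [0] * (1 << 11)
    for state in range(16):
        for op in range(64):
            for eq in (0, 1):
                addr = (state << 7) | (op << 1) | eq
                try:
                    rom[addr] = next_state(state, op, eq)
                except KeyError:
                    rom[addr] = 0  # unlisted combination: hypothetical default
    return rom


rom = build_rom()


def rom_next(state, op, eq):
    return rom[(state << 7) | (op << 1) | eq]


# Trace a load through the ROM until control returns to state 0000.
path, s = [], 0b0000
while True:
    s = rom_next(s, OPCODES["lw"], 0)
    path.append(format(s, "04b"))
    if s == 0:
        break
print(path)  # ['0001', '1000', '1001', '1010', '0000']
```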
Alternative Multiple Cycle Datapath (In Textbook)
[Figure: a single Ideal Memory (RAdr/WrAdr, Din/Dout) addressed via IorD, an Instruction Reg, the Reg File, one ALU with ALU Control, an ALU Out register, sign Extend and << 2 logic, with control signals PCWr, PCWrCond, PCSrc, BrWr, IorD, MemWr, IRWr, RegDst, RegWr, ALUSelA, ALUSelB, ALUOp, ExtOp and MemtoReg, and the Zero status.]
• Minimizes hardware: 1 memory, 1 adder.
Alternative Multiple Cycle Datapath (In Textbook)
• Shared instruction/data memory unit.
• A single ALU shared among instructions.
• Shared units require additional or widened multiplexors.
• Temporary registers hold data between the clock cycles of an instruction:
  • Additional registers: Instruction Register (IR), Memory Data Register (MDR), A, B, ALUOut.
Operations In Each Cycle
All instructions share the first two cycles:
• Instruction Fetch: IR ← Mem[PC]; PC ← PC + 4
• Instruction Decode: A ← R[rs]; B ← R[rt]; ALUout ← PC + (SignExt(imm16) × 4)
Instruction | Execution | Memory | Write Back
Logic Immediate | ALUout ← A OR ZeroExt[imm16] | | R[rt] ← ALUout
Store | ALUout ← A + SignExt(imm16) | Mem[ALUout] ← B |
Load | ALUout ← A + SignExt(imm16) | M ← Mem[ALUout] | R[rt] ← M
R-Type | ALUout ← A + B | | R[rd] ← ALUout
Branch | if Equal = 1 then PC ← ALUout | |
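In this version PC is already incremented during fetch, and the branch target is computed in the decode cycle for every instruction, before the opcode is known, because the single ALU has nothing else to do in that cycle. A minimal Python sketch of the three BEQ cycles; the function name and the choice to pass register values directly as `a` and `b` are assumptions made for brevity.

```python
def sign_ext(imm16):
    """SignExt(imm16) for a 16-bit two's-complement immediate."""
    imm16 &= 0xFFFF
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16


def beq_textbook(pc, a, b, imm16):
    """BEQ in the alternative (textbook) datapath; returns (new_pc, cycles)."""
    # Cycle 1, fetch: IR <- Mem[PC]; the single ALU computes PC <- PC + 4
    pc = pc + 4
    # Cycle 2, decode: A <- R[rs]; B <- R[rt]; the ALU is otherwise idle,
    # so it computes the branch target speculatively:
    alu_out = pc + sign_ext(imm16) * 4  # ALUout <- PC + (SignExt(imm16) x 4)
    # Cycle 3, execute: if Equal, PC <- ALUout (otherwise PC already holds PC + 4)
    if a == b:
        pc = alu_out
    return pc, 3


print(beq_textbook(8, 5, 5, 1))         # (16, 3): taken
print(beq_textbook(8, 5, 6, 1))         # (12, 3): not taken
print(beq_textbook(100, 1, 1, 0xFFFE))  # (96, 3): backward branch
```

Because PC already holds PC + 4 after fetch, ALUout equals PC + 4 + SignExt(imm16) × 4 measured from the original PC, the same target as in the earlier design.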
High-Level View of Finite State Machine Control
• The first steps are independent of the instruction class.
• Then comes a series of sequences that depend on the instruction opcode.
• Then control returns to fetch a new instruction.
• Each box in the diagram represents one or several states.
[Figure: Branch instruction: single state. Jump instruction: single state.]
Finite State Machine (FSM) Specification
• 0000 “instruction fetch”: IR ← MEM[PC]; PC ← PC + 4
• 0001 “decode”: A ← R[rs]; B ← R[rt]; ALUout ← PC + (SX || 00)
• BEQ: 0010 (Execute) if A = B then PC ← ALUout
• R-type: 0100 (Execute) ALUout ← A fun B; 0101 (Write-back) R[rd] ← ALUout
• ORi: 0110 (Execute) ALUout ← A or ZX; 0111 (Write-back) R[rt] ← ALUout
• LW: 1000 (Execute) ALUout ← A + SX; 1001 (Memory) M ← MEM[ALUout]; 1010 (Write-back) R[rt] ← M
• SW: 1011 (Execute) ALUout ← A + SX; 1100 (Memory) MEM[ALUout] ← B
• Each path then returns to instruction fetch.
MIPS Multi-cycle Datapath Performance Evaluation
• What is the average CPI?
  • The state diagram gives the CPI for each instruction type.
  • The workload below gives the frequency of each type.
Type | CPI_i for type | Frequency | CPI_i × freq_i
Arith/Logic | 4 | 40% | 1.6
Load | 5 | 30% | 1.5
Store | 4 | 10% | 0.4
Branch | 3 | 20% | 0.6
Average CPI: 4.1
This is better than CPI = 5, which is what all instructions would need if they took the same number of clock cycles (5).
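The average CPI is the frequency-weighted sum of the per-type CPIs. A short Python sketch reproducing the table's arithmetic, with the per-type CPI and frequency values taken from the table:

```python
# (CPI_i, frequency) per instruction type, from the workload table.
WORKLOAD = {
    "Arith/Logic": (4, 0.40),
    "Load": (5, 0.30),
    "Store": (4, 0.10),
    "Branch": (3, 0.20),
}


def average_cpi(workload):
    """Sum over instruction types of CPI_i x freq_i."""
    total_freq = sum(freq for _, freq in workload.values())
    if abs(total_freq - 1.0) > 1e-9:
        raise ValueError("frequencies must sum to 1")
    return sum(cpi * freq for cpi, freq in workload.values())


cpi = average_cpi(WORKLOAD)
print(f"average CPI = {cpi:.1f}")                   # average CPI = 4.1
print(f"cycles saved vs. CPI 5: {1 - cpi / 5:.0%}")  # cycles saved vs. CPI 5: 18%
```

At the same clock period, CPI 4.1 instead of 5 means roughly 18% fewer cycles for this workload.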