CS151B Computer Systems Architecture, Winter 2002, TuTh 2-4pm - 2444 BH
Lecture 9: Designing a Multicycle Processor
Instructor: Prof. Jason Cong <cong@cs.ucla.edu>
Recap: Processor Design is a Process
  • Bottom-up
      • assemble components in target technology to establish critical timing
  • Top-down
      • specify component behavior from high-level requirements
  • Iterative refinement
      • establish partial solution, expand and improve
  [Figure: design hierarchy from Instruction Set Architecture through processor (datapath, control) and components (Reg. File, Mux, ALU, Reg, Mem, Decoder, Sequencer) down to cells and gates]
Recap: A Single Cycle Datapath
  [Figure: single-cycle datapath. The Instruction Fetch Unit (nPC_sel, Clk) supplies Instruction<31:0>, split into Rs <21:25>, Rt <16:20>, Rd <11:15>, and Imm16 <0:15>. A file of 32 32-bit registers (Rw, Ra, Rb, busA, busB, busW) feeds the ALU; the Extender widens imm16 from 16 to 32 bits; Data Memory has WrEn, Adr, and Data In ports. Control signals: nPC_sel, RegDst, RegWr, ExtOp, ALUSrc, ALUctr, MemWr, MemtoReg. Status signal: Equal.]
Recap: The “Truth Table” for the Main Control
  [Figure: op (6 bits) enters Main Control, which produces RegDst, ALUSrc, ... and ALUop (3 bits); ALUop and func (6 bits) enter ALU Control (Local), which produces ALUctr (3 bits)]

  op                 00 0000   00 1101   10 0011   10 1011   00 0100   00 0010
                     R-type    ori       lw        sw        beq       jump
  RegDst             1         0         0         x         x         x
  ALUSrc             0         1         1         1         0         x
  MemtoReg           0         0         1         x         x         x
  RegWrite           1         1         1         0         0         0
  MemWrite           0         0         0         1         0         0
  Branch             0         0         0         0         1         0
  Jump               0         0         0         0         0         1
  ExtOp              x         0         1         1         x         x
  ALUop (Symbolic)   “R-type”  Or        Add       Add       Subtract  xxx
  ALUop <2>          1         0         0         0         0         x
  ALUop <1>          0         1         0         0         0         x
  ALUop <0>          0         0         0         0         1         x
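The truth table above can be read as a lookup from opcode to control word. The following is a minimal sketch of that reading, not the lecture's implementation: the names `SIGNALS`, `MAIN_CONTROL`, and `decode` are hypothetical, `None` is used here to stand for a don't-care (x) entry, and the ALUop rows are left out for brevity.

```python
# Main-control truth table as a lookup keyed by the 6-bit opcode.
# None stands for a don't-care (x) entry; all names here are illustrative.
SIGNALS = ("RegDst", "ALUSrc", "MemtoReg", "RegWrite",
           "MemWrite", "Branch", "Jump", "ExtOp")

MAIN_CONTROL = {
    0b000000: ("R-type", (1,    0,    0,    1, 0, 0, 0, None)),
    0b001101: ("ori",    (0,    1,    0,    1, 0, 0, 0, 0)),
    0b100011: ("lw",     (0,    1,    1,    1, 0, 0, 0, 1)),
    0b101011: ("sw",     (None, 1,    None, 0, 1, 0, 0, 1)),
    0b000100: ("beq",    (None, 0,    None, 0, 0, 1, 0, None)),
    0b000010: ("jump",   (None, None, None, 0, 0, 0, 1, None)),
}


def decode(opcode):
    """Return (instruction class, {signal name: value}) for a 6-bit opcode."""
    name, values = MAIN_CONTROL[opcode]
    return name, dict(zip(SIGNALS, values))


if __name__ == "__main__":
    for op in MAIN_CONTROL:
        print(format(op, "06b"), *decode(op))
```

Because each opcode maps to exactly one control word, this is the single-cycle case: one "microinstruction" per instruction, with no state.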
Recap: PLA Implementation of the Main Control
  [Figure: PLA. Inputs op<5> .. op<0> are decoded into one product term per instruction class (R-type, ori, lw, sw, beq, jump); these terms are ORed to form the outputs RegWrite, ALUSrc, RegDst, MemtoReg, MemWrite, Branch, Jump, ExtOp, ALUop<2>, ALUop<1>, ALUop<0>.]
Recap: Systematic Generation of Control
  • In our single-cycle processor, each instruction is realized by exactly one control command or “microinstruction”
      • in general, the controller is a finite state machine
      • microinstruction can also control sequencing (see later)
  [Figure: the instruction opcode and Conditions feed Control Logic / Store (PLA, ROM); the resulting microinstruction is decoded into Control Points that drive the Datapath]
The Big Picture: Where are We Now?
  • The Five Classic Components of a Computer: Processor (Control, Datapath), Memory, Input, Output
  • Today’s Topic: Designing the Multiple Clock Cycle Datapath
  • This lecture and the next one differ slightly from the book
Abstract View of our single cycle processor
  • looks like a FSM with PC as state
  [Figure: Main Control (input op) and ALU control (input fun) drive the stages Next PC, PC, Instruction Fetch, Register Fetch, Ext, ALU, Mem Access, Data Mem, and Reg. Wrt / Result Store. Control signals: nPC_sel, ExtOp, ALUSrc, ALUctr, MemRd, MemWr, RegDst, RegWr. Status signal: Equal.]
What’s wrong with our CPI=1 processor?
  • Long Cycle Time
  • All instructions take as much time as the slowest
  • Real memory is not as nice as our idealized memory
      • cannot always get the job done in one (short) cycle
  Path through the datapath for each instruction class:
  Arithmetic & Logical    PC, Inst Memory, Reg File, mux, ALU, mux, setup
  Load (Critical Path)    PC, Inst Memory, Reg File, mux, ALU, Data Mem, mux, setup
  Store                   PC, Inst Memory, Reg File, mux, ALU, Data Mem
  Branch                  PC, Inst Memory, Reg File, cmp, mux
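To make the "as slow as the slowest instruction" point concrete, here is a small sketch that sums component delays along each instruction's path. The paths follow the slide; every delay value in `DELAY_NS` is a hypothetical number chosen only for illustration, and the names are not from the lecture.

```python
# Hypothetical component delays in nanoseconds (illustrative values only).
DELAY_NS = {
    "PC": 1, "Inst Memory": 10, "Reg File": 5, "mux": 1,
    "ALU": 8, "Data Mem": 10, "cmp": 3, "setup": 1,
}

# Components traversed by each instruction class in the single-cycle datapath.
PATHS = {
    "Arithmetic & Logical": ["PC", "Inst Memory", "Reg File", "mux", "ALU", "mux", "setup"],
    "Load": ["PC", "Inst Memory", "Reg File", "mux", "ALU", "Data Mem", "mux", "setup"],
    "Store": ["PC", "Inst Memory", "Reg File", "mux", "ALU", "Data Mem"],
    "Branch": ["PC", "Inst Memory", "Reg File", "cmp", "mux"],
}

# Delay of each path is the sum of the delays of its components.
path_delay = {name: sum(DELAY_NS[c] for c in path) for name, path in PATHS.items()}

# With CPI = 1, the clock period must cover the slowest path.
slowest = max(path_delay, key=path_delay.get)
cycle_time = path_delay[slowest]

if __name__ == "__main__":
    for name, delay in path_delay.items():
        print(f"{name:22s} {delay:3d} ns (idle for {cycle_time - delay} ns each cycle)")
    print("cycle time set by:", slowest)
```

Under these assumed delays the load path sets the clock, and every other instruction class wastes part of each cycle.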
Memory Access Time
  • Physics => fast memories are small (large memories are slow)
      • question: register file vs. memory
  • => Use a hierarchy of memories
  [Figure: storage array with address decoder, selected word line, storage cell, bit line, and sense amps]
  [Figure: Processor, Cache, L2 Cache, and memory connected by the proc. bus and mem. bus; access times are 1 time-period (Cache), 2-3 time-periods (L2 Cache), and 20 - 50 time-periods (memory)]
Reducing Cycle Time
  • Cut combinational dependency graph and insert register / latch
  • Do same work in two fast cycles, rather than one slow one
  • May be able to short-circuit path and remove some components for some instructions!
  [Figure: before, storage element => Acyclic Combinational Logic => storage element; after, storage element => Acyclic Combinational Logic (A) => storage element => Acyclic Combinational Logic (B) => storage element]
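The effect of cutting the logic can be sketched with a one-line timing model: the clock period is the slowest stage plus a fixed register overhead. The function name `cycle_time` and all delay values below are assumptions made for illustration, not figures from the lecture.

```python
def cycle_time(stage_delays_ns, register_overhead_ns):
    """Clock period: slowest combinational stage plus register overhead."""
    return max(stage_delays_ns) + register_overhead_ns


# Hypothetical delays in nanoseconds.
one_stage = cycle_time([20], 1)       # one block of logic between registers
two_stage = cycle_time([10, 10], 1)   # same logic cut evenly into (A) and (B)
unbalanced = cycle_time([15, 5], 1)   # same logic cut unevenly

if __name__ == "__main__":
    print("single slow cycle:", one_stage, "ns")
    print("two fast cycles:  ", two_stage, "ns each,", 2 * two_stage, "ns total")
    print("uneven cut:       ", unbalanced, "ns each")
```

With these numbers the full path costs slightly more in total (22 ns versus 21 ns) because of the added register, but an instruction that needs only stage (A) can finish in a single 11 ns cycle. The uneven cut shows why the partition should be balanced: the longer stage still sets the clock.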
Basic Limits on Cycle Time
  • Next address logic
      • PC <= branch ? PC + offset : PC + 4
  • Instruction Fetch
      • InstructionReg <= Mem[PC]
  • Register Access
      • A <= R[rs]
  • ALU operation
      • R <= A + B
  [Figure: Control drives nPC_sel, ExtOp, ALUSrc, ALUctr, MemRd, MemWr, RegDst, RegWr across the stages Next PC, PC, Instruction Fetch, Operand Fetch, Exec, Mem Access, Data Mem, and Reg. File / Result Store]
Partitioning the CPI=1 Datapath
  • Add registers between smallest steps
  • Place enables on all registers
  [Figure: the stages Next PC, PC, Instruction Fetch, Operand Fetch, Exec, Mem Access, Data Mem, and Reg. File / Result Store, separated by registers. Control signals: nPC_sel, ExtOp, ALUSrc, ALUctr, MemRd, MemWr, RegDst, RegWr. Status signal: Equal.]
Example Multicycle Datapath
  • Critical Path ?
  [Figure: Next PC, PC, Instruction Fetch, IR, Operand Fetch (Reg File), registers A and B, Ext, ALU, register S, Mem Access (Data Mem), register M, and Result Store (Reg. File); the Equal comparison is latched in register E. Control signals: nPC_sel, ExtOp, ALUSrc, ALUctr, MemRd, MemWr, MemToReg, RegDst, RegWr.]
Recall: Step-by-step Processor Design
  Step 1: ISA => Logical Register Transfers
  Step 2: Components of the Datapath
  Step 3: RTL + Components => Datapath
  Step 4: Datapath + Logical RTs => Physical RTs
  Step 5: Physical RTs => Control
Step 4: R-type (add, sub, . . .)
  • Logical Register Transfer
      inst    Logical Register Transfers
      ADDU    R[rd] <– R[rs] + R[rt]; PC <– PC + 4
  • Physical Register Transfers
      inst    Physical Register Transfers
              IR <– MEM[PC]
      ADDU    A <– R[rs]; B <– R[rt]
              S <– A + B
              R[rd] <– S; PC <– PC + 4
  [Figure: multicycle datapath (Next PC, PC, Inst. Mem, IR, Reg File, A, B, E, Exec, S, Mem Access, Data Mem, M, Reg. File) with a Time axis]
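The physical register transfers for ADDU can be walked through one cycle at a time. This is a minimal sketch under stated assumptions: the function `run_addu` is hypothetical, the instruction fetch is not modeled (the decoded fields are passed in directly), and the registers are a plain Python list.

```python
MASK32 = 0xFFFFFFFF


def run_addu(regs, pc, rs, rt, rd):
    """Apply the four ADDU steps in order and return the new PC.

    Fetch (IR <- MEM[PC]) is assumed to have produced the fields rs, rt, rd.
    """
    # Cycle 2: operand fetch into the A and B registers.
    a, b = regs[rs], regs[rt]
    # Cycle 3: execute; S latches the ALU result (wraps at 32 bits).
    s = (a + b) & MASK32
    # Cycle 4: write back the result and advance the PC.
    regs[rd] = s
    return (pc + 4) & MASK32


if __name__ == "__main__":
    regs = [0] * 32
    regs[1], regs[2] = 5, 7
    new_pc = run_addu(regs, 0x100, rs=1, rt=2, rd=3)
    print(regs[3], hex(new_pc))
```

Counting the fetch, the instruction occupies four cycles, which matches the CPI of 4 used for arithmetic and logic instructions later in the lecture.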
Step 4: Logical immed
  • Logical Register Transfer
      inst    Logical Register Transfers
      ORI     R[rt] <– R[rs] OR ZExt(Im16); PC <– PC + 4
  • Physical Register Transfers
      inst    Physical Register Transfers
              IR <– MEM[PC]
      ORI     A <– R[rs]; B <– R[rt]
              S <– A or ZExt(Im16)
              R[rt] <– S; PC <– PC + 4
  [Figure: multicycle datapath (Next PC, PC, Inst. Mem, IR, Reg File, A, B, E, Exec, S, Mem Access, Data Mem, M, Reg. File) with a Time axis]
Step 4: Load
  • Logical Register Transfer
      inst    Logical Register Transfers
      LW      R[rt] <– MEM[R[rs] + SExt(Im16)]; PC <– PC + 4
  • Physical Register Transfers
      inst    Physical Register Transfers
              IR <– MEM[PC]
      LW      A <– R[rs]; B <– R[rt]
              S <– A + SExt(Im16)
              M <– MEM[S]
              R[rt] <– M; PC <– PC + 4
  [Figure: multicycle datapath (Next PC, PC, Inst. Mem, IR, Reg File, A, B, E, Exec, S, Mem Access, Data Mem, M, Reg. File) with a Time axis]
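The load sequence adds two things over the R-type case: sign extension of the 16-bit immediate and a separate memory-access step. The sketch below assumes a word-addressed dictionary for memory and hypothetical helper names (`sext16`, `run_lw`); it is an illustration of the register transfers, not the lecture's datapath.

```python
MASK32 = 0xFFFFFFFF


def sext16(imm16):
    """Sign-extend a 16-bit field to a Python integer."""
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16


def run_lw(regs, mem, pc, rs, rt, imm16):
    """Apply the LW steps after fetch and return the new PC."""
    # Cycle 2: operand fetch (B is read but not used by a load).
    a = regs[rs]
    # Cycle 3: execute; S holds the effective address.
    s = (a + sext16(imm16)) & MASK32
    # Cycle 4: memory access; M latches the loaded word.
    m = mem[s]
    # Cycle 5: write back to rt (not rd) and advance the PC.
    regs[rt] = m
    return (pc + 4) & MASK32


if __name__ == "__main__":
    regs = [0] * 32
    regs[4] = 0x1008
    mem = {0x1004: 99}
    new_pc = run_lw(regs, mem, 0, rs=4, rt=5, imm16=0xFFFC)  # offset of -4
    print(regs[5], new_pc)
```

The extra memory cycle is why the load is the only instruction here that needs five steps.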
Step 4: Store
  • Logical Register Transfer
      inst    Logical Register Transfers
      SW      MEM[R[rs] + SExt(Im16)] <– R[rt]; PC <– PC + 4
  • Physical Register Transfers
      inst    Physical Register Transfers
              IR <– MEM[PC]
      SW      A <– R[rs]; B <– R[rt]
              S <– A + SExt(Im16)
              MEM[S] <– B; PC <– PC + 4
  [Figure: multicycle datapath (Next PC, PC, Inst. Mem, IR, Reg File, A, B, E, Exec, S, Mem Access, Data Mem, M, Reg. File) with a Time axis]
Step 4: Branch
  • Logical Register Transfer
      inst    Logical Register Transfers
      BEQ     if R[rs] == R[rt] then PC <= PC + 4 + (SExt(Im16) || 00) else PC <= PC + 4
  • Physical Register Transfers
      inst    Physical Register Transfers
              IR <– MEM[PC]
      BEQ     E <– (R[rs] == R[rt])
              if E then PC <– PC + 4 + (SExt(Im16) || 00) else PC <– PC + 4
  [Figure: multicycle datapath (Next PC, PC, Inst. Mem, IR, Reg File, A, B, E, Exec, S, Mem Access, Data Mem, M, Reg. File) with a Time axis]
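The branch transfers reduce to one comparison and one next-PC computation. The sketch below assumes that `SExt(Im16) || 00` means the sign-extended immediate shifted left by two bits; the function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def sext16(imm16):
    """Sign-extend a 16-bit field to a Python integer."""
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16


def beq_next_pc(pc, rs_val, rt_val, imm16):
    """Return the PC after a BEQ located at address pc."""
    # Decode cycle: E latches the result of the equality comparison.
    e = rs_val == rt_val
    # SExt(Im16) || 00: append two zero bits, i.e. a word offset in bytes.
    offset = sext16(imm16) << 2
    # Final cycle: a taken branch adds the offset to PC + 4.
    return (pc + 4 + offset) & MASK32 if e else (pc + 4) & MASK32


if __name__ == "__main__":
    print(hex(beq_next_pc(0x100, 7, 7, 3)))       # taken, forward
    print(hex(beq_next_pc(0x100, 7, 8, 3)))       # not taken
    print(hex(beq_next_pc(0x100, 7, 7, 0xFFFF)))  # taken, offset of -1 word
```

Note that the branch target is taken when E is true; the fall-through path is the plain PC + 4.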
Alternative datapath (book): Multiple Cycle Datapath
  • Minimizes Hardware: 1 memory, 1 adder
  [Figure: book-style multicycle datapath with PC, Ideal Memory (RAdr, WrAdr, Din, Dout), Instruction Reg, Reg File (Ra, Rb, Rw, busA, busB, busW), Extend, << 2, ALU, ALU Control, ALU Out, and Target. Control signals: PCWr, PCWrCond, PCSrc, BrWr, IorD, MemWr, IRWr, RegDst, RegWr, ALUSelA, ALUSelB, ALUOp, ExtOp, MemtoReg. Status signal: Zero.]
Our Control Model
  • State specifies control points for Register Transfer
  • Transfer occurs upon exiting state (same falling edge)
  [Figure: inputs (conditions) feed the Next State Logic; the Control State (State X) feeds the Output Logic, which produces the outputs (control points). State X is annotated with "Register Transfer Control Points"; the transition out of it is annotated with "Depends on Input".]
Step 4 => Control Specification for multicycle proc
  “instruction fetch”:        IR <= MEM[PC]
  “decode / operand fetch”:   A <= R[rs]; B <= R[rt]
  Then, by instruction type:
            Execute                  Memory                       Write-back
  R-type    S <= A fun B                                          R[rd] <= S; PC <= PC + 4
  ORi       S <= A or ZX                                          R[rt] <= S; PC <= PC + 4
  LW        S <= A + SX              M <= MEM[S]                  R[rt] <= M; PC <= PC + 4
  SW        S <= A + SX              MEM[S] <= B; PC <= PC + 4
  BEQ       PC <= Next(PC,Equal)
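Traditional FSM Controller
  [Figure: a Truth Table takes the current state, op, and cond (Equal) as inputs and produces the next state and the control points that drive the datapath; a State register holds the current state. Bus widths shown in the figure: 11, 6, 4.]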
Step 5: (datapath + state diagram => control)
  • Translate RTs into control points
  • Assign states
  • Then go build the controller
Mapping RTs to Control Points
  “instruction fetch”:   IR <= MEM[PC]               control points: imem_rd, IRen
  “decode”:              A <= R[rs]; B <= R[rt]      control points: Aen, Ben, Een
  Then, by instruction type:
            Execute                  Memory                       Write-back
  R-type    S <= A fun B                                          R[rd] <= S; PC <= PC + 4
  ORi       S <= A or ZX                                          R[rt] <= S; PC <= PC + 4
  LW        S <= A + SX              M <= MEM[S]                  R[rt] <= M; PC <= PC + 4
  SW        S <= A + SX              MEM[S] <= B; PC <= PC + 4
  BEQ       PC <= Next(PC,Equal)
  Example control points: the Execute transfer S <= A fun B uses ALUfun, Sen; the Write-back transfer R[rd] <= S; PC <= PC + 4 uses RegDst, RegWr, PCen
Assigning States
  0000  “instruction fetch”:   IR <= MEM[PC]
  0001  “decode”:              A <= R[rs]; B <= R[rt]
  Then, by instruction type:
            Execute                  Memory                             Write-back
  R-type    0100  S <= A fun B                                          0101  R[rd] <= S; PC <= PC + 4
  ORi       0110  S <= A or ZX                                          0111  R[rt] <= S; PC <= PC + 4
  LW        1000  S <= A + SX        1001  M <= MEM[S]                  1010  R[rt] <= M; PC <= PC + 4
  SW        1011  S <= A + SX        1100  MEM[S] <= B; PC <= PC + 4
  BEQ       0011  PC <= Next(PC)
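With states assigned, the controller's next-state logic is a small function of the current state and the instruction class. The sketch below uses the state codes from the diagram; the dictionary and function names are hypothetical, and the instruction class is passed as a string rather than decoded from an opcode.

```python
FETCH, DECODE = 0b0000, 0b0001

# First execute state for each instruction class (dispatch out of decode).
DISPATCH = {"BEQ": 0b0011, "R-type": 0b0100, "ORI": 0b0110,
            "LW": 0b1000, "SW": 0b1011}

# States that advance to a fixed successor; all others return to fetch.
SUCCESSOR = {0b0100: 0b0101, 0b0110: 0b0111,
             0b1000: 0b1001, 0b1001: 0b1010, 0b1011: 0b1100}


def next_state(state, op):
    """Next-state function of the controller (Equal does not affect it)."""
    if state == FETCH:
        return DECODE
    if state == DECODE:
        return DISPATCH[op]
    return SUCCESSOR.get(state, FETCH)


def states_for(op):
    """List the states visited by one instruction, starting at fetch."""
    trace, state = [FETCH], next_state(FETCH, op)
    while state != FETCH:
        trace.append(state)
        state = next_state(state, op)
    return trace


if __name__ == "__main__":
    for op in DISPATCH:
        print(op, [format(s, "04b") for s in states_for(op)])
```

The length of each trace is the number of cycles that instruction class takes, which is the per-type CPI read off the state diagram.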
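(Mostly) Detailed Control Specification (missing = 0)
  ("-" marks an entry left blank on the slide)

         State  Op field  Eq  Next   IR   PC       Ops      Exec            Mem      Write-Back
                                     en   en sel   A B E    Ex Sr ALU S     R W M    M-R Wr Dst
         0000   ??????    ?   0001   1    -  -     - - -    -  -  -   -     - - -    -   -  -
         0001   BEQ       x   0011   -    -  -     1 1 1    -  -  -   -     - - -    -   -  -
         0001   R-type    x   0100   -    -  -     1 1 1    -  -  -   -     - - -    -   -  -
         0001   ORI       x   0110   -    -  -     1 1 1    -  -  -   -     - - -    -   -  -
         0001   LW        x   1000   -    -  -     1 1 1    -  -  -   -     - - -    -   -  -
         0001   SW        x   1011   -    -  -     1 1 1    -  -  -   -     - - -    -   -  -
  BEQ:   0011   xxxxxx    0   0000   -    1  0     - - -    -  -  -   -     - - -    -   -  -
         0011   xxxxxx    1   0000   -    1  1     - - -    -  -  -   -     - - -    -   -  -
  R:     0100   xxxxxx    x   0101   -    -  -     - - -    0  1  fun 1     - - -    -   -  -
         0101   xxxxxx    x   0000   -    1  0     - - -    -  -  -   -     - - -    0   1  1
  ORi:   0110   xxxxxx    x   0111   -    -  -     - - -    0  0  or  1     - - -    -   -  -
         0111   xxxxxx    x   0000   -    1  0     - - -    -  -  -   -     - - -    0   1  0
  LW:    1000   xxxxxx    x   1001   -    -  -     - - -    1  0  add 1     - - -    -   -  -
         1001   xxxxxx    x   1010   -    -  -     - - -    -  -  -   -     1 0 1    -   -  -
         1010   xxxxxx    x   0000   -    1  0     - - -    -  -  -   -     - - -    1   1  0
  SW:    1011   xxxxxx    x   1100   -    -  -     - - -    1  0  add 1     - - -    -   -  -
         1100   xxxxxx    x   0000   -    1  0     - - -    -  -  -   -     0 1 0    -   -  -

  The five 0001 rows have identical outputs ("all same in Moore machine").
  Each of the two 0011 rows also lists the entries x 0 x after the PC columns; their column positions are unclear.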
Performance Evaluation
  • What is the average CPI?
      • state diagram gives CPI for each instruction type
      • workload gives frequency of each type

  Type          CPIi for type   Frequency   CPIi x freqi
  Arith/Logic   4               40%         1.6
  Load          5               30%         1.5
  Store         4               10%         0.4
  branch        3               20%         0.6
  Average CPI:                              4.1
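The average is a frequency-weighted sum of the per-type CPIs. A short sketch of the arithmetic, using the numbers from the table above (the variable names are illustrative):

```python
# (CPI for type, frequency in the workload), taken from the table above.
WORKLOAD = {
    "Arith/Logic": (4, 0.40),
    "Load": (5, 0.30),
    "Store": (4, 0.10),
    "branch": (3, 0.20),
}

# Weighted contribution of each instruction type.
contribution = {name: cpi * freq for name, (cpi, freq) in WORKLOAD.items()}

# Average CPI is the sum of the contributions.
average_cpi = sum(contribution.values())

if __name__ == "__main__":
    for name, value in contribution.items():
        print(f"{name:12s} {value:.1f}")
    print(f"Average CPI: {average_cpi:.1f}")
```

The frequencies must sum to 1 for the result to be a true average; changing the mix (for example, a load-heavy workload) raises the average toward 5.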
Controller Design
  • The state diagrams that define the controller for an instruction set processor are highly structured
  • Use this structure to construct a simple “microsequencer”
  • Control reduces to programming this very simple device => microprogramming
  [Figure: a micro-PC selects a microinstruction with two fields, sequencer control and datapath control; the sequencer control field feeds the sequencer that updates the micro-PC]
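Example: Jump-Counter
  [Figure: a Counter holding i, with control inputs zero, inc, and load. zero sets the next value to 0000, inc sets it to i+1, and load takes it from a Map ROM addressed by the op-code.]
  None of above: Do nothing (for wait states)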
Using a Jump Counter
  0000  “instruction fetch”:   IR <= MEM[PC]               inc
  0001  “decode”:              A <= R[rs]; B <= R[rt]      load
  Then, by instruction type (sequencer action after each state):
            Execute                        Memory                                   Write-back
  R-type    0100  S <= A fun B    inc                                               0101  R[rd] <= S; PC <= PC + 4   zero
  ORi       0110  S <= A or ZX    inc                                               0111  R[rt] <= S; PC <= PC + 4   zero
  LW        1000  S <= A + SX     inc      1001  M <= MEM[S]                 inc    1010  R[rt] <= M; PC <= PC + 4   zero
  SW        1011  S <= A + SX     inc      1100  MEM[S] <= B; PC <= PC + 4   zero
  BEQ       0011  PC <= Next(PC)  zero
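Our Microsequencer
  [Figure: the Micro-PC, with control inputs Z, I, and L, can be loaded from a Map ROM addressed by the op-code; the microinstruction it selects supplies the datapath control; taken is an input from the datapath]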
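Microprogram Control Specification
  ("-" marks an entry left blank on the slide)

         µPC    Taken  Next   IR   PC       Ops    Exec            Mem      Write-Back
                              en   en sel   A B    Ex Sr ALU S     R W M    M-R Wr Dst
         0000   ?      inc    1    -  -     - -    -  -  -   -     - - -    -   -  -
         0001   0      load   -    -  -     1 1    -  -  -   -     - - -    -   -  -
  BEQ:   0011   0      zero   -    1  0     - -    -  -  -   -     - - -    -   -  -
         0011   1      zero   -    1  1     - -    -  -  -   -     - - -    -   -  -
  R:     0100   x      inc    -    -  -     - -    0  1  fun 1     - - -    -   -  -
         0101   x      zero   -    1  0     - -    -  -  -   -     - - -    0   1  1
  ORi:   0110   x      inc    -    -  -     - -    0  0  or  1     - - -    -   -  -
         0111   x      zero   -    1  0     - -    -  -  -   -     - - -    0   1  0
  LW:    1000   x      inc    -    -  -     - -    1  0  add 1     - - -    -   -  -
         1001   x      inc    -    -  -     - -    -  -  -   -     1 0 1    -   -  -
         1010   x      zero   -    1  0     - -    -  -  -   -     - - -    1   1  0
  SW:    1011   x      inc    -    -  -     - -    1  0  add 1     - - -    -   -  -
         1100   x      zero   -    1  0     - -    -  -  -   -     0 1 0    -   -  -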
Mapping ROM
  Instruction   Opcode   Micro-PC loaded
  R-type        000000   0100
  BEQ           000100   0011
  ori           001101   0110
  LW            100011   1000
  SW            101011   1011
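The jump counter and mapping ROM together form a complete sequencer: each micro-PC value names one of three actions (inc, load, zero), and load consults the ROM. The sketch below uses the ROM contents above and the per-state actions from the jump-counter diagram; the names `MAP_ROM`, `SEQ_ACTION`, `next_upc`, and `upc_trace` are hypothetical, and wait states are not modeled.

```python
# Mapping ROM: opcode -> first execute micro-PC.
MAP_ROM = {0b000000: 0b0100,   # R-type
           0b000100: 0b0011,   # BEQ
           0b001101: 0b0110,   # ori
           0b100011: 0b1000,   # LW
           0b101011: 0b1011}   # SW

# Sequencer action stored with each microinstruction.
SEQ_ACTION = {0b0000: "inc", 0b0001: "load", 0b0011: "zero",
              0b0100: "inc", 0b0101: "zero",
              0b0110: "inc", 0b0111: "zero",
              0b1000: "inc", 0b1001: "inc", 0b1010: "zero",
              0b1011: "inc", 0b1100: "zero"}


def next_upc(upc, opcode):
    """Compute the next micro-PC from the current one and the opcode."""
    action = SEQ_ACTION[upc]
    if action == "inc":
        return upc + 1
    if action == "load":
        return MAP_ROM[opcode]
    return 0b0000  # "zero": go back to instruction fetch


def upc_trace(opcode):
    """Micro-PC values visited by one instruction, starting at fetch."""
    trace, upc = [0b0000], next_upc(0b0000, opcode)
    while upc != 0b0000:
        trace.append(upc)
        upc = next_upc(upc, opcode)
    return trace


if __name__ == "__main__":
    for opcode in MAP_ROM:
        print(format(opcode, "06b"), [format(u, "04b") for u in upc_trace(opcode)])
```

The state codes were chosen so that consecutive states of one instruction are adjacent, which is what lets a plain increment replace an explicit next-state field.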
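Example: Controlling Memory
  [Figure: the PC drives the addr input of the Instruction Memory; the controller asserts InstMem_rd and observes IM_wait; the memory's data output feeds the Inst. Reg, which is loaded when IR_en is asserted]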
Controller handles non-ideal memory
  “instruction fetch”:        IR <= MEM[PC]               (repeat while wait; advance on ~wait)
  “decode / operand fetch”:   A <= R[rs]; B <= R[rt]
  Then, by instruction type:
            Execute            Memory                                            Write-back
  R-type    S <= A fun B                                                         R[rd] <= S; PC <= PC + 4
  ORi       S <= A or ZX                                                         R[rt] <= S; PC <= PC + 4
  LW        S <= A + SX        M <= MEM[S]  (repeat while wait; advance on ~wait)   R[rt] <= M; PC <= PC + 4
  SW        S <= A + SX        MEM[S] <= B  (repeat while wait; advance on ~wait)   PC <= PC + 4
  BEQ       PC <= Next(PC)
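The wait loops can be sketched as a state that repeats until the memory drops its wait signal. This is a minimal model under stated assumptions: the function `cycles_in_memory_state` is hypothetical, the wait signal is supplied as one boolean per cycle, and the register is assumed to be loaded on the first cycle in which wait is low.

```python
def cycles_in_memory_state(wait_per_cycle):
    """Count the cycles spent in a memory state such as instruction fetch.

    wait_per_cycle yields the memory's wait signal once per clock cycle.
    The controller stays in the state while wait is asserted and leaves
    on the first cycle where it is not.
    """
    cycles = 0
    for wait in wait_per_cycle:
        cycles += 1
        if not wait:
            return cycles  # data valid: latch the register and advance
    raise RuntimeError("memory never deasserted wait")


if __name__ == "__main__":
    print(cycles_in_memory_state([False]))              # ideal memory
    print(cycles_in_memory_state([True, True, False]))  # two wait cycles
```

With ideal memory the state takes one cycle, as in the earlier state diagram; each asserted wait adds one cycle, so CPI now depends on memory behavior as well as on the instruction type.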
Really Simple Time-State Control
  instruction fetch:   IR <= MEM[PC]               (repeat while wait; advance on ~wait)
  decode:              A <= R[rs]; B <= R[rt]
  Then, by instruction type:
            Execute            Memory                   write-back
  R-type    S <= A fun B                                R[rd] <= S; PC <= PC + 4
  ORi       S <= A or ZX                                R[rt] <= S; PC <= PC + 4
  LW        S <= A + SX        M <= MEM[S]  (wait)      R[rt] <= M; PC <= PC + 4
  SW        S <= A + SX        MEM[S] <= B  (wait)      PC <= PC + 4
  BEQ                                                   PC <= Next(PC)
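Time-state Control Path
  • Local decode and control at each stage
  [Figure: the instruction register is copied along with the instruction through IR, IRex, IRmem, and IRwb, each with a Valid bit; Dcd Ctrl, Ex Ctrl, Mem Ctrl, and WB Ctrl decode locally at each stage. Datapath: Next PC, PC, Inst. Mem, Reg File, A, B, Exec, S, Mem Access (Data Mem), M, Reg. File; status signal Equal.]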
Overview of Control
  • Control may be designed using one of several initial representations. The choice of sequence control, and how logic is represented, can then be determined independently; the control can then be implemented with one of several methods using a structured logic technique.

  Initial Representation     Finite State Diagram            Microprogram
  Sequencing Control         Explicit Next State Function    Microprogram counter + Dispatch ROMs
  Logic Representation       Logic Equations                 Truth Tables
  Implementation Technique   PLA                             ROM
                             “hardwired control”             “microprogrammed control”
Summary
  • Disadvantages of the Single Cycle Processor
      • Long cycle time
      • Cycle time is too long for all instructions except the Load
  • Multiple Cycle Processor:
      • Divide the instructions into smaller steps
      • Execute each step (instead of the entire instruction) in one cycle
  • Partition datapath into equal size chunks to minimize cycle time
      • ~10 levels of logic between latches
  • Follow same 5-step method for designing “real” processor
Summary (cont’d)
  • Control is specified by finite state diagram
  • Specialized state diagrams are easily captured by a microsequencer
      • simple increment & “branch” fields
      • datapath control fields
  • Control design reduces to Microprogramming
  • Control is more complicated with:
      • complex instruction sets
      • restricted datapaths (see the book)
  • Simple Instruction set and powerful datapath => simple control
      • could try to reduce hardware (see the book)
      • rather go for speed => many instructions at once!
Where to get more information?
  • Next two lectures:
      • Multiple Cycle Controller: Appendix C of your text book.
      • Microprogramming: Section 5.5 of your text book.
  • D. Patterson, “Microprogramming,” Scientific American, March 1983.
  • D. Patterson and D. Ditzel, “The Case for the Reduced Instruction Set Computer,” Computer Architecture News 8, 6 (October 15, 1980)
Acknowledgements
  • The majority of slides in this lecture are from UC Berkeley for their CS152 course (David Patterson, John Kubiatowicz, …)
Results from Mini-Questionnaire
  • What do you think about the pace of the class?
      • Moving too fast: 42%
      • Moving too slow: 0%
      • About right: 58%
  • What do you think about the instructor’s explanation?
      • Too much detail: 19%
      • Too little detail: 23%
      • About right: 58%
  • Are office hours convenient to you?
      • Most said yes (77%)
  • Any other comments about the class so far (e.g. about TA and TA sessions)
      • TAs and TA sessions are good!
      • Some don’t like PowerPoint slides
      • Some want more examples
      • Ask more questions
      • …