This chapter delves into the architecture of single-cycle and multicycle CPUs, focusing on performance metrics like CPI and clock cycles. It explains how the longest path impacts clock speed, with loads requiring multiple functional units leading to inefficiencies. The advantages of pipelining and breaking down instruction execution into multiple cycles are explored. Register management, memory access, and control signals in CPU operations are analyzed, highlighting the necessity of finite state machines (FSMs) for control unit design. Finally, it considers instruction extension and CPI calculation.
Chapter 5: Datapath and Control (Part 3) CS 447 Jason Bakos
Single-Cycle CPU • The single-cycle CPU from the last lecture had a CPI of 1 • Clock cycle is determined by the longest possible path in the machine • Loads are the worst: they use 5 functional units in series • Performance, utilization, and efficiency suffer because most instructions don’t need such a long clock cycle • A variable-speed clock could solve this problem, but it hinders parallelism • Pipelining overlaps instruction executions
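The longest-path argument above can be sketched numerically. The latencies below are assumed values chosen only for illustration (they are not from the lecture), and the names `LATENCY_PS`, `PATHS`, `path_delay`, and `single_cycle_period` are hypothetical.

```python
# Assumed functional-unit latencies in picoseconds (illustrative only).
LATENCY_PS = {
    "instr_mem": 200,
    "reg_read": 100,
    "alu": 200,
    "data_mem": 200,
    "reg_write": 100,
}

# Functional units each instruction class passes through in series
# in a single-cycle datapath. A load uses all five.
PATHS = {
    "load":   ["instr_mem", "reg_read", "alu", "data_mem", "reg_write"],
    "store":  ["instr_mem", "reg_read", "alu", "data_mem"],
    "r_type": ["instr_mem", "reg_read", "alu", "reg_write"],
    "branch": ["instr_mem", "reg_read", "alu"],
    "jump":   ["instr_mem"],
}


def path_delay(kind):
    """Total delay of the units one instruction class uses in series."""
    return sum(LATENCY_PS[unit] for unit in PATHS[kind])


def single_cycle_period():
    """The clock must be long enough for the slowest instruction class."""
    return max(path_delay(kind) for kind in PATHS)


if __name__ == "__main__":
    for kind in PATHS:
        print(kind, path_delay(kind), "ps")
    print("clock period:", single_cycle_period(), "ps")
```

Under these assumed numbers every instruction pays the load's delay, even a jump that only needs the instruction memory.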
Multicycle Implementation • Break instructions into steps, where each step requires one clock cycle • We want to reuse functional units within an instruction instead of just across instructions • Reduces hardware • Use single memory for instructions and data • Single ALU instead of one ALU and two adders • Add registers to functional units to hold intermediate results (state data) for future cycles • Use within instruction executions • Register file and memory hold state data to be used across instruction executions • These are programmer-visible • We will need an FSM to control the CPU
Registers • Locations of registers are determined by the following: • What combinational units will fit in one clock cycle • Assume memory access, regfile access (two reads or one write), or ALU operation • Any data needed by these operations must be stored in a temporary register • Instruction Register, Memory Data Register, A, B, and ALUOut registers added to design • All these except IR only need to hold data between two adjacent clock cycles • What data are needed in later cycles implementing the instruction
Multiplexors • Need to add extra multiplexors (or expand existing muxes) to facilitate the reuse of the ALU within instructions • Add mux to first ALU input • Expand mux to second ALU input
Breaking Instruction Execution into Clock Cycles • Goal is to balance the latency of the operations performed during each clock cycle • At most one of the following can occur in series: • One ALU operation • One register file access (or multiple in parallel) • One memory access (this is a joke, but we’ll accept this for now)
Execution Stages • To clearly define the CPU operation for each step, we’ll use RTL (register transfer language) • Architecture research has defined 5 standard phases of instruction execution • Instruction fetch • Decode: fetch register values from the register file • Execute: perform arithmetic/logic operation • Memory: load/store memory • Write back: write register result back to the register file
Execution Stages • Fetch • IR=Memory[PC] • PC=PC+4 • Decode • A=Reg[IR[25..21]] • B=Reg[IR[20..16]] • ALUOut=PC+(sign_extend(IR[15..0]) << 2)
Execution Stages • Execute • Memory access: ALUOut=A+sign_extend(IR[15..0]) • R-type: ALUOut=A op B • Branch (beq): if (A==B) PC=ALUOut • Jump: PC=PC[31..28] || (IR[25..0]<<2)
Execution Stages • Memory Access/Write Back • Load: MDR=Memory[ALUOut] • Store: Memory[ALUOut]=B • R-type: Reg[IR[15..11]]=ALUOut • Memory Read Completion • Load: Reg[IR[20..16]]=MDR
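The RTL steps on the last three slides can be traced with a small behavioral sketch. It assumes the standard MIPS opcode and function-code encodings, models only add, sub, and, and or for R-type, and does not protect register 0. The names `step_instruction`, `new_cpu`, and `ALU_OPS` are hypothetical, and this is a software model rather than the hardware design itself.

```python
MASK = 0xFFFFFFFF
LW, SW, RTYPE, BEQ, J = 0x23, 0x2B, 0x00, 0x04, 0x02  # standard MIPS opcodes

# Only a few R-type function codes are modeled in this sketch.
ALU_OPS = {
    0x20: lambda a, b: a + b,  # add
    0x22: lambda a, b: a - b,  # sub
    0x24: lambda a, b: a & b,  # and
    0x25: lambda a, b: a | b,  # or
}


def sign_extend(value):
    """Sign-extend the low 16 bits of value to a Python int."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def new_cpu(memory):
    """Memory is a dict from byte address to 32-bit word (an assumption)."""
    return {"pc": 0, "regs": [0] * 32, "mem": dict(memory)}


def step_instruction(cpu):
    """Run one instruction through its RTL steps; return cycles used."""
    regs, mem = cpu["regs"], cpu["mem"]

    # Cycle 1, fetch: IR = Memory[PC]; PC = PC + 4
    ir = mem[cpu["pc"]]
    pc = (cpu["pc"] + 4) & MASK

    # Cycle 2, decode: read A and B, precompute branch target in ALUOut
    opcode = ir >> 26
    rs, rt, rd = (ir >> 21) & 31, (ir >> 16) & 31, (ir >> 11) & 31
    imm = sign_extend(ir)
    a, b = regs[rs], regs[rt]
    alu_out = (pc + (imm << 2)) & MASK
    cycles = 2

    if opcode in (LW, SW):
        alu_out = (a + imm) & MASK           # cycle 3: address computation
        if opcode == LW:
            mdr = mem[alu_out]               # cycle 4: MDR = Memory[ALUOut]
            regs[rt] = mdr                   # cycle 5: Reg[IR[20..16]] = MDR
            cycles += 3
        else:
            mem[alu_out] = b                 # cycle 4: Memory[ALUOut] = B
            cycles += 2
    elif opcode == RTYPE:
        alu_out = ALU_OPS[ir & 0x3F](a, b) & MASK  # cycle 3: ALUOut = A op B
        regs[rd] = alu_out                         # cycle 4: write back
        cycles += 2
    elif opcode == BEQ:
        if a == b:                           # cycle 3: compare, maybe branch
            pc = alu_out
        cycles += 1
    elif opcode == J:
        pc = (pc & 0xF0000000) | ((ir & 0x03FFFFFF) << 2)  # cycle 3
        cycles += 1
    else:
        raise ValueError(f"opcode {opcode:#x} not modeled")

    cpu["pc"] = pc
    return cycles
```

The returned cycle counts (5 for a load, 4 for a store or R-type, 3 for a branch or jump) follow directly from how many of the steps each instruction class needs.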
Control Signals • Control Unit signals • Refer to figure 5.34 (pg. 384) in the book • ALU Control signals • Provide an appropriate ALUOp signal based on what the ALU is being used for (if for an R-type, perform lookup based on function code)
Control Signals • All that’s left is for us to build the control unit as an FSM and the ALU control as a lookup table
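The ALU control lookup table might be sketched as follows. This assumes the 2-bit ALUOp and 4-bit ALU control encodings used in the Patterson and Hennessy textbook together with the standard MIPS function codes; the names `alu_control` and `FUNCT_TO_ALU` are hypothetical.

```python
# 4-bit ALU control codes (assumed textbook encoding).
ALU_AND, ALU_OR, ALU_ADD, ALU_SUB, ALU_SLT = 0b0000, 0b0001, 0b0010, 0b0110, 0b0111

# R-type lookup: MIPS function code -> ALU control code.
FUNCT_TO_ALU = {
    0x20: ALU_ADD,  # add
    0x22: ALU_SUB,  # sub
    0x24: ALU_AND,  # and
    0x25: ALU_OR,   # or
    0x2A: ALU_SLT,  # slt
}


def alu_control(alu_op, funct=0):
    """Map the control unit's ALUOp (plus funct for R-type) to an ALU code."""
    if alu_op == 0b00:   # address or PC arithmetic: always add
        return ALU_ADD
    if alu_op == 0b01:   # beq: subtract to compare A and B
        return ALU_SUB
    if alu_op == 0b10:   # R-type: look up the function code
        return FUNCT_TO_ALU[funct & 0x3F]
    raise ValueError(f"unsupported ALUOp {alu_op:#04b}")
```

Keeping the function-code lookup out of the main control unit means the FSM only has to emit a 2-bit ALUOp per state.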
Control Unit • The fetch and decode stages are the same for every instruction...
Control Unit • Here are the states and transitions for the memory-reference instructions
Control Unit • Here are the states and transitions for R-type, branch, and jump instructions
Control Unit • Final control unit FSM...
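Since the state diagrams are not reproduced here, the following is a minimal sketch of the FSM's next-state logic as it follows from the execution steps described earlier. The state names are hypothetical labels rather than the figure's own numbering, the opcodes assume the standard MIPS encoding, and the control outputs of each state are omitted.

```python
LW, SW, RTYPE, BEQ, J = 0x23, 0x2B, 0x00, 0x04, 0x02  # standard MIPS opcodes

# States whose successor is always FETCH (the instruction is finished).
FINAL_STATES = {"LOAD_WRITEBACK", "MEM_WRITE", "R_WRITEBACK", "BRANCH", "JUMP"}

# After decode, the opcode selects the first instruction-specific state.
AFTER_DECODE = {
    LW: "MEM_ADDR",
    SW: "MEM_ADDR",
    RTYPE: "EXECUTE",
    BEQ: "BRANCH",
    J: "JUMP",
}


def next_state(state, opcode):
    """Next-state function of the control FSM (outputs not modeled)."""
    if state == "FETCH":
        return "DECODE"                 # same for every instruction
    if state == "DECODE":
        return AFTER_DECODE[opcode]
    if state == "MEM_ADDR":
        return "MEM_READ" if opcode == LW else "MEM_WRITE"
    if state == "MEM_READ":
        return "LOAD_WRITEBACK"
    if state == "EXECUTE":
        return "R_WRITEBACK"
    if state in FINAL_STATES:
        return "FETCH"
    raise ValueError(f"unknown state {state!r}")


def trace(opcode):
    """List the states one instruction visits, starting at FETCH."""
    states = ["FETCH"]
    while True:
        following = next_state(states[-1], opcode)
        if following == "FETCH":
            return states
        states.append(following)
```

For example, `trace(LW)` visits five states and `trace(BEQ)` visits three, matching the number of clock cycles each instruction takes.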
Problems to Think About • How could we add bne, blt, and bgez instructions to our CPU? • How do you calculate CPI for our CPU if we are given instruction-type distributions?
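For the CPI question, one possible approach is a weighted average of cycles per instruction class. The cycle counts below follow from the execution steps in this chapter (load 5, store 4, R-type 4, branch 3, jump 3); the instruction mix is an assumed example, not measured data, and `average_cpi` is a hypothetical helper name.

```python
# Cycles per instruction class in the multicycle design.
CYCLES = {"load": 5, "store": 4, "r_type": 4, "branch": 3, "jump": 3}

# Assumed instruction-type distribution (fractions must sum to 1).
EXAMPLE_MIX = {
    "load": 0.25,
    "store": 0.10,
    "r_type": 0.50,
    "branch": 0.12,
    "jump": 0.03,
}


def average_cpi(mix, cycles=CYCLES):
    """CPI = sum over classes of (fraction of instructions * cycles)."""
    total = sum(mix.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError(f"mix fractions sum to {total}, expected 1.0")
    return sum(fraction * cycles[kind] for kind, fraction in mix.items())


if __name__ == "__main__":
    print(f"CPI = {average_cpi(EXAMPLE_MIX):.2f}")
```

With this assumed mix the result is 0.25·5 + 0.10·4 + 0.50·4 + 0.12·3 + 0.03·3 = 4.10 cycles per instruction.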