
Single Cycle Datapath




  1. Single Cycle Datapath Lecture notes from MKP, H. H. Lee and S. Yalamanchili

  2. Reading
     • Sections 4.1-4.4
     • Appendices C.7, C.8, C.11, D.2
     • Practice Problems: 1, 4, 6, 9

  3. Introduction
     • We will examine two MIPS implementations
         • A simplified version (this module)
         • A more realistic pipelined version
     • Simple subset, shows most aspects
         • Memory reference: lw, sw
         • Arithmetic/logical: add, sub, and, or, slt
         • Control transfer: beq, j

  4. Instruction Execution
     • PC → instruction memory: fetch instruction
     • Register numbers → register file: read registers
     • Depending on instruction class
         • Use ALU to calculate
             • Arithmetic result
             • Memory address for load/store
             • Branch target address
         • Access data memory for load/store
         • PC ← an address or PC + 4
     [Figure: an encoded program in memory, one word per address: 8d0b0000, 014b5020, 21080004, 2129ffff, 1520fffc, 000a082a, …]
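As a worked example of what "fetch, then use the fields" means, the first word of the encoded program above can be split by hand. This decoding is our own illustration and assumes the standard MIPS I-format layout (op 31:26, rs 25:21, rt 20:16, offset 15:0) and the usual register numbering ($8 = $t0, $11 = $t3).

```latex
\texttt{0x8d0b0000}
= \underbrace{100011}_{\text{op}=35\ (\texttt{lw})}\;
  \underbrace{01000}_{\text{rs}=8\ (\$t0)}\;
  \underbrace{01011}_{\text{rt}=11\ (\$t3)}\;
  \underbrace{0000\,0000\,0000\,0000}_{\text{offset}=0}
\;\Rightarrow\; \texttt{lw \$t3, 0(\$t0)}
```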

  5. Basic Ingredients
     • Include the functional units we need for each instruction: combinational and sequential
     [Figure: a. Registers (5-bit register numbers Read register 1, Read register 2, Write register; Write data; RegWrite; outputs Read data 1, Read data 2). b. ALU (3-bit ALU control; outputs ALU result and Zero)]

  6. Sequential Elements (4.2, C.7, C.11)
     • Register: stores data in a circuit
     • Uses a clock signal to determine when to update the stored value
     • Edge-triggered: update when Clk changes from 0 to 1 (rising edge)
     [Figure: D flip-flop with inputs D and Clk and output Q; timing diagram marking the rising and falling edges of Clk]

  7. Sequential Elements
     • Register with write control
         • Only updates on clock edge when write control input is 1
         • Used when stored value is required later
     [Figure: register with inputs D, Write, and Clk and output Q; timing diagram spanning one cycle time]
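A minimal VHDL sketch of the two sequential elements above, written in the style of the controller later in these notes. The entity name `reg_we` and the 32-bit width are assumptions for illustration: the stored value changes only on a rising clock edge, and only when `Write` is '1'.

```vhdl
LIBRARY IEEE;
USE IEEE.STD_LOGIC_1164.ALL;

-- Hypothetical edge-triggered register with write control (width assumed 32).
ENTITY reg_we IS
   PORT( clk   : IN  STD_LOGIC;
         Write : IN  STD_LOGIC;                        -- write control input
         D     : IN  STD_LOGIC_VECTOR( 31 DOWNTO 0 );
         Q     : OUT STD_LOGIC_VECTOR( 31 DOWNTO 0 ) );
END reg_we;

ARCHITECTURE behavior OF reg_we IS
BEGIN
   PROCESS( clk )
   BEGIN
      -- Edge-triggered: D is sampled only when clk changes from 0 to 1
      IF rising_edge( clk ) THEN
         -- Without write control asserted, Q keeps its old value
         IF Write = '1' THEN
            Q <= D;
         END IF;
      END IF;
   END PROCESS;
END behavior;
```

Removing the inner IF gives the plain edge-triggered register of slide 6, which loads D on every rising edge.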

  8. Clocking Methodology
     • Combinational logic transforms data during clock cycles
         • Between clock edges
         • Input from state elements, output to state element
         • Longest delay determines clock period
     • Synchronous vs. asynchronous operation
     • Recall: critical path delay
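Stated as an inequality (a standard timing constraint; the symbol names are ours), the clock period must cover the slowest path from one state element, through combinational logic, into the next state element:

```latex
T_{\text{clk}} \;\ge\; t_{\text{clk}\to Q} \;+\; \max_{p \,\in\, \text{paths}} t_{\text{comb}}(p) \;+\; t_{\text{setup}}
```

If the flip-flop clock-to-Q and setup times are ignored, as these notes do later, this reduces to $T_{\text{clk}} \ge \max_p t_{\text{comb}}(p)$, i.e. the critical path delay.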

  9. Register File (C.8) • Built using D flip-flops (remember ECE 2030!)

  10. Register File • Note: we still use the real clock to determine when to write
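A minimal VHDL sketch of such a register file, assuming 32 registers of 32 bits, two combinational read ports, and one write port that updates only on the rising clock edge when RegWrite is '1'. The entity name `regfile` is hypothetical, and hard-wiring register 0 to zero follows the MIPS `$zero` convention rather than anything stated on the slide.

```vhdl
LIBRARY IEEE;
USE IEEE.STD_LOGIC_1164.ALL;
USE IEEE.NUMERIC_STD.ALL;

-- Hypothetical 32 x 32-bit register file: two read ports, one clocked write port.
ENTITY regfile IS
   PORT( clk       : IN  STD_LOGIC;
         RegWrite  : IN  STD_LOGIC;
         ReadReg1  : IN  STD_LOGIC_VECTOR( 4 DOWNTO 0 );
         ReadReg2  : IN  STD_LOGIC_VECTOR( 4 DOWNTO 0 );
         WriteReg  : IN  STD_LOGIC_VECTOR( 4 DOWNTO 0 );
         WriteData : IN  STD_LOGIC_VECTOR( 31 DOWNTO 0 );
         ReadData1 : OUT STD_LOGIC_VECTOR( 31 DOWNTO 0 );
         ReadData2 : OUT STD_LOGIC_VECTOR( 31 DOWNTO 0 ) );
END regfile;

ARCHITECTURE behavior OF regfile IS
   TYPE reg_array IS ARRAY( 0 TO 31 ) OF STD_LOGIC_VECTOR( 31 DOWNTO 0 );
   SIGNAL regs : reg_array := ( OTHERS => ( OTHERS => '0' ) );
BEGIN
   -- Reads are combinational: the outputs simply follow the register numbers
   ReadData1 <= regs( to_integer( unsigned( ReadReg1 ) ) );
   ReadData2 <= regs( to_integer( unsigned( ReadReg2 ) ) );

   -- Writes still use the real clock, gated by RegWrite; register 0 is never written
   PROCESS( clk )
   BEGIN
      IF rising_edge( clk ) THEN
         IF RegWrite = '1' AND WriteReg /= "00000" THEN
            regs( to_integer( unsigned( WriteReg ) ) ) <= WriteData;
         END IF;
      END IF;
   END PROCESS;
END behavior;
```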

  11. Building a Datapath (4.3)
     • Datapath
         • Elements that process data and addresses in the CPU
         • Registers, ALUs, muxes, memories, …
     • We will build a MIPS datapath incrementally
         • Refining the overview design

  12. High Level Description
     • Single instruction single data stream model of execution (remember Flynn's taxonomy)
     • Serial execution model
     • Commonly known as the von Neumann execution model
     • Stored program model
         • Instructions and data share memory
     [Figure: a control block that fetches instructions, executes instructions, and performs memory operations; Flynn's taxonomy grid of instruction streams vs. data streams: SISD, SIMD, MISD, MIMD]

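  13. Instruction Fetch
     [Figure: the PC, a 32-bit register, addresses instruction memory; an adder increments the PC by 4 for the next instruction. Timing: instruction fetch starts at one clock edge and completes within the cycle time, before the next clock edge.]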
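A minimal VHDL sketch of the fetch path in the figure, ignoring branches and jumps: the 32-bit PC register is reloaded with PC + 4 on every rising clock edge. The entity name `pc_fetch` and the synchronous reset to address 0 are assumptions; instruction memory itself is left out.

```vhdl
LIBRARY IEEE;
USE IEEE.STD_LOGIC_1164.ALL;
USE IEEE.NUMERIC_STD.ALL;

-- Hypothetical fetch stage (no branches or jumps yet).
ENTITY pc_fetch IS
   PORT( clk, reset : IN  STD_LOGIC;
         PC         : OUT STD_LOGIC_VECTOR( 31 DOWNTO 0 );   -- address to instruction memory
         PCPlus4    : OUT STD_LOGIC_VECTOR( 31 DOWNTO 0 ) );
END pc_fetch;

ARCHITECTURE behavior OF pc_fetch IS
   SIGNAL pc_reg : UNSIGNED( 31 DOWNTO 0 ) := ( OTHERS => '0' );
BEGIN
   PROCESS( clk )
   BEGIN
      IF rising_edge( clk ) THEN
         IF reset = '1' THEN
            pc_reg <= ( OTHERS => '0' );      -- assumed reset address
         ELSE
            pc_reg <= pc_reg + 4;             -- increment by 4 for the next instruction
         END IF;
      END IF;
   END PROCESS;

   PC      <= STD_LOGIC_VECTOR( pc_reg );
   PCPlus4 <= STD_LOGIC_VECTOR( pc_reg + 4 ); -- reused later by the branch adder
END behavior;
```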

  14. R-Format Instructions
     • Read two register operands
     • Perform arithmetic/logical operation
     • Write register result
     [Figure: R-format fields: op, rs, rt, rd, shamt, funct]

  15. Executing R-Format Instructions
     [Figure: register file (Read register 1, Read register 2, Write register, Write data, RegWrite) feeding Read data 1 and Read data 2 into the ALU (3-bit ALU control; outputs ALU result and Zero); R-format fields: op, rs, rt, rd, shamt, funct]
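A minimal VHDL sketch of the ALU in this figure. The entity name `alu` is hypothetical; the 3-bit control encoding (000 and, 001 or, 010 add, 110 subtract, 111 slt) is taken from the ALU control table later in these notes, and `Zero` is the output that beq checks after a subtraction.

```vhdl
LIBRARY IEEE;
USE IEEE.STD_LOGIC_1164.ALL;
USE IEEE.NUMERIC_STD.ALL;

-- Hypothetical 32-bit ALU with a Zero flag.
ENTITY alu IS
   PORT( A, B       : IN  STD_LOGIC_VECTOR( 31 DOWNTO 0 );
         ALUControl : IN  STD_LOGIC_VECTOR( 2 DOWNTO 0 );
         Result     : OUT STD_LOGIC_VECTOR( 31 DOWNTO 0 );
         Zero       : OUT STD_LOGIC );
END alu;

ARCHITECTURE behavior OF alu IS
   SIGNAL res : STD_LOGIC_VECTOR( 31 DOWNTO 0 );
BEGIN
   PROCESS( A, B, ALUControl )
   BEGIN
      CASE ALUControl IS
         WHEN "000"  => res <= A AND B;
         WHEN "001"  => res <= A OR B;
         WHEN "010"  => res <= STD_LOGIC_VECTOR( SIGNED( A ) + SIGNED( B ) );
         WHEN "110"  => res <= STD_LOGIC_VECTOR( SIGNED( A ) - SIGNED( B ) );
         WHEN "111"  =>                        -- set on less than (signed compare)
            IF SIGNED( A ) < SIGNED( B ) THEN
               res <= ( 0 => '1', OTHERS => '0' );
            ELSE
               res <= ( OTHERS => '0' );
            END IF;
         WHEN OTHERS => res <= ( OTHERS => '0' );
      END CASE;
   END PROCESS;

   Result <= res;
   Zero   <= '1' WHEN res = X"00000000" ELSE '0';
END behavior;
```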

  16. Load/Store Instructions
     • Read register operands
     • Calculate address using 16-bit offset
         • Use ALU, but sign-extend offset
     • Load: Read memory and update register
     • Store: Write register value to memory
     [Figure: I-format fields: op, rs, rt, 16-bit constant]

  17. Executing I-Format Instructions
     [Figure: register file and ALU connected to data memory (Address, Write data, Read data, MemWrite, MemRead); a sign-extend unit widens the 16-bit constant to 32 bits; I-format fields: op, rs, rt, 16-bit constant]
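The sign-extend unit used for the load/store offset can be sketched in VHDL as pure wiring. The entity name `sign_extend` is hypothetical; the point is that bits 31..16 are copies of bit 15, so a negative 16-bit offset stays negative as a 32-bit value.

```vhdl
LIBRARY IEEE;
USE IEEE.STD_LOGIC_1164.ALL;

-- Hypothetical 16-to-32-bit sign-extend unit.
ENTITY sign_extend IS
   PORT( Imm16 : IN  STD_LOGIC_VECTOR( 15 DOWNTO 0 );
         Imm32 : OUT STD_LOGIC_VECTOR( 31 DOWNTO 0 ) );
END sign_extend;

ARCHITECTURE behavior OF sign_extend IS
   SIGNAL upper : STD_LOGIC_VECTOR( 15 DOWNTO 0 );
BEGIN
   upper <= ( OTHERS => Imm16( 15 ) );   -- replicate the sign-bit wire 16 times
   Imm32 <= upper & Imm16;               -- no gates, only wiring
END behavior;
```

For example, the offset field 0xFFFC (-4) becomes 0xFFFFFFFC, while 0x0004 becomes 0x00000004.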

  18. Branch Instructions
     • Read register operands
     • Compare operands
         • Use ALU, subtract and check Zero output
     • Calculate target address
         • Sign-extend displacement
         • Shift left 2 places (word displacement)
         • Add to PC + 4
             • Already calculated by instruction fetch
     [Figure: I-format fields: op, rs, rt, 16-bit constant]

  19. Branch Instructions
     [Figure annotations: the shift-left-2 unit just re-routes wires; the sign-bit wire is replicated for sign extension. I-format fields: op, rs, rt, 16-bit constant]

  20. Updating the Program Counter
     [Figure: computation of the branch address. The PC drives the instruction memory read address (Instruction [31–0]); one adder forms PC + 4, a second adds Instruction [15–0], sign-extended and shifted left 2; a mux labeled Branch selects the next PC]
     Example code:
        loop: beq  $t0, $0, exit
              addi $t0, $t0, -1
              lw   $a0, arg1($t1)
              lw   $a1, arg2($t2)
              jal  func
              add  $t3, $t3, $v0
              addi $t1, $t1, 4
              addi $t2, $t2, 4
              j    loop
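The branch steps from slides 18-20 can be combined into one VHDL sketch of the next-PC logic. The entity name `next_pc` is hypothetical, and gating the PC mux with `Branch AND Zero` is our reading of the figure: the target is PC + 4 plus the sign-extended displacement shifted left 2, and it is selected only when the instruction is a branch and the ALU subtraction produced zero.

```vhdl
LIBRARY IEEE;
USE IEEE.STD_LOGIC_1164.ALL;
USE IEEE.NUMERIC_STD.ALL;

-- Hypothetical next-PC logic for beq.
ENTITY next_pc IS
   PORT( PCPlus4 : IN  STD_LOGIC_VECTOR( 31 DOWNTO 0 );
         Imm16   : IN  STD_LOGIC_VECTOR( 15 DOWNTO 0 );   -- Instruction [15-0]
         Branch  : IN  STD_LOGIC;                         -- from the main control unit
         Zero    : IN  STD_LOGIC;                         -- from the ALU
         NextPC  : OUT STD_LOGIC_VECTOR( 31 DOWNTO 0 ) );
END next_pc;

ARCHITECTURE behavior OF next_pc IS
   SIGNAL offset : SIGNED( 31 DOWNTO 0 );
   SIGNAL target : STD_LOGIC_VECTOR( 31 DOWNTO 0 );
BEGIN
   -- Sign-extend, then shift left 2: bits 31..18 copy the sign bit,
   -- bits 17..2 are Imm16, bits 1..0 are zero (word displacement)
   offset <= SHIFT_LEFT( RESIZE( SIGNED( Imm16 ), 32 ), 2 );
   target <= STD_LOGIC_VECTOR( SIGNED( PCPlus4 ) + offset );

   -- PC mux: branch target when taken, otherwise PC + 4
   NextPC <= target WHEN ( Branch AND Zero ) = '1' ELSE PCPlus4;
END behavior;
```

For example, a taken branch at address 16 (so PC + 4 = 20) with displacement 0xFFFC (-4 words) goes to 20 - 16 = 4.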

  21. Composing the Elements
     • First-cut datapath does an instruction in one clock cycle
         • Each datapath element can only do one function at a time
         • Hence, we need separate instruction and data memories
     • Use multiplexers where alternate data sources are used for different instructions
     [Figure: an encoded program in memory (014b5020, 21080004, 2129ffff, 1520fffc, 000a082a, …) with the PC pointing at the current address]
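Every "alternate data source" point in the datapath is a 2-to-1 multiplexer. A minimal VHDL sketch, with the hypothetical entity name `mux2` and a width generic so the same unit can select 32-bit data or 5-bit register numbers:

```vhdl
LIBRARY IEEE;
USE IEEE.STD_LOGIC_1164.ALL;

-- Hypothetical generic 2-to-1 multiplexer.
ENTITY mux2 IS
   GENERIC( WIDTH : POSITIVE := 32 );
   PORT( Sel : IN  STD_LOGIC;
         In0 : IN  STD_LOGIC_VECTOR( WIDTH - 1 DOWNTO 0 );   -- chosen when Sel = '0'
         In1 : IN  STD_LOGIC_VECTOR( WIDTH - 1 DOWNTO 0 );   -- chosen when Sel = '1'
         Y   : OUT STD_LOGIC_VECTOR( WIDTH - 1 DOWNTO 0 ) );
END mux2;

ARCHITECTURE behavior OF mux2 IS
BEGIN
   Y <= In1 WHEN Sel = '1' ELSE In0;
END behavior;
```

For instance, instantiated with `WIDTH => 5` and `Sel` driven by RegDst, it chooses between rt (Instruction [20–16]) and rd (Instruction [15–11]) as the write register.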

  22. Full Single Cycle Datapath
     [Figure note: the destination register is “instruction-specific”]

  23. The Main Control Unit
     • Control signals derived from instruction
     Field layout (bit ranges in parentheses):
        R-type:      0 (31:26)          rs (25:21)   rt (20:16)   rd (15:11)   shamt (10:6)   funct (5:0)
        Load/Store:  35 or 43 (31:26)   rs (25:21)   rt (20:16)   address (15:0)
        Branch:      4 (31:26)          rs (25:21)   rt (20:16)   address (15:0)
     [Figure annotations: opcode (31:26); rs always read; rt read, except for load; destination written for R-type and load; address sign-extended and added]
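Because each field sits at a fixed bit range, splitting an instruction needs no logic at all. A minimal VHDL sketch of that splitter, with the hypothetical entity name `inst_fields`; which outputs are actually used depends on the format the opcode selects.

```vhdl
LIBRARY IEEE;
USE IEEE.STD_LOGIC_1164.ALL;

-- Hypothetical instruction splitter: every field is a fixed slice of the word.
ENTITY inst_fields IS
   PORT( Instr  : IN  STD_LOGIC_VECTOR( 31 DOWNTO 0 );
         Opcode : OUT STD_LOGIC_VECTOR( 5 DOWNTO 0 );     -- 31:26, to the main control unit
         Rs     : OUT STD_LOGIC_VECTOR( 4 DOWNTO 0 );     -- 25:21, always read
         Rt     : OUT STD_LOGIC_VECTOR( 4 DOWNTO 0 );     -- 20:16, read (written for load)
         Rd     : OUT STD_LOGIC_VECTOR( 4 DOWNTO 0 );     -- 15:11, written for R-type
         Shamt  : OUT STD_LOGIC_VECTOR( 4 DOWNTO 0 );     -- 10:6
         Funct  : OUT STD_LOGIC_VECTOR( 5 DOWNTO 0 );     -- 5:0, to the ALU control
         Imm16  : OUT STD_LOGIC_VECTOR( 15 DOWNTO 0 ) );  -- 15:0, load/store/branch address
END inst_fields;

ARCHITECTURE behavior OF inst_fields IS
BEGIN
   Opcode <= Instr( 31 DOWNTO 26 );
   Rs     <= Instr( 25 DOWNTO 21 );
   Rt     <= Instr( 20 DOWNTO 16 );
   Rd     <= Instr( 15 DOWNTO 11 );
   Shamt  <= Instr( 10 DOWNTO 6 );
   Funct  <= Instr( 5 DOWNTO 0 );
   Imm16  <= Instr( 15 DOWNTO 0 );
END behavior;
```

Fed the word 014b5020 from the encoded program, this yields opcode 000000, rs = 10, rt = 11, rd = 10, shamt = 0, funct = 100000, i.e. `add $t2, $t2, $t3`.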

  24. ALU Control (4.4, D.2)
     • ALU used for
         • Load/Store: Function = add
         • Branch: Function = subtract
         • R-type: Function depends on funct field

  25. ALU Control
     • Assume 2-bit ALUOp derived from opcode
         • Combinational logic derives ALU control
     • How do we turn this description into gates?

  26. [Figure: the ALU controller takes ALUOp (generated from decoding inst[31:26]) and funct = inst[5:0] and drives the 3-bit ALU control: add for lw/sw, sub for beq, and add, sub, and, or, or slt for arithmetic instructions]
     ALUOp1  ALUOp0 | F5  F4  F3  F2  F1  F0 | ALU control
     0       0      | X   X   X   X   X   X  | 010
     X       1      | X   X   X   X   X   X  | 110
     1       X      | X   X   0   0   0   0  | 010
     1       X      | X   X   0   0   1   0  | 110
     1       X      | X   X   0   1   0   0  | 000
     1       X      | X   X   0   1   0   1  | 001
     1       X      | X   X   1   0   1   0  | 111
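One way to turn the truth table above into synthesizable VHDL, in the same style as the controller on slides 31-32. The entity name `alu_control` is hypothetical; giving ALUOp(0) priority (so the never-used combination 11 means subtract) and mapping unlisted funct codes to add are our own choices where the table leaves the behavior open.

```vhdl
LIBRARY IEEE;
USE IEEE.STD_LOGIC_1164.ALL;

-- Hypothetical ALU control unit implementing the truth table.
ENTITY alu_control IS
   PORT( ALUOp      : IN  STD_LOGIC_VECTOR( 1 DOWNTO 0 );
         Funct      : IN  STD_LOGIC_VECTOR( 5 DOWNTO 0 );   -- inst[5:0]
         ALUControl : OUT STD_LOGIC_VECTOR( 2 DOWNTO 0 ) );
END alu_control;

ARCHITECTURE behavior OF alu_control IS
BEGIN
   PROCESS( ALUOp, Funct )
      VARIABLE f : STD_LOGIC_VECTOR( 3 DOWNTO 0 );
   BEGIN
      f := Funct( 3 DOWNTO 0 );                  -- F5 and F4 are don't-cares in the table
      IF ALUOp = "00" THEN
         ALUControl <= "010";                    -- lw/sw: add
      ELSIF ALUOp( 0 ) = '1' THEN
         ALUControl <= "110";                    -- beq: subtract
      ELSE                                       -- R-type: decode the funct field
         CASE f IS
            WHEN "0000" => ALUControl <= "010";  -- add
            WHEN "0010" => ALUControl <= "110";  -- sub
            WHEN "0100" => ALUControl <= "000";  -- and
            WHEN "0101" => ALUControl <= "001";  -- or
            WHEN "1010" => ALUControl <= "111";  -- slt
            WHEN OTHERS => ALUControl <= "010";  -- unlisted codes: add (arbitrary choice)
         END CASE;
      END IF;
   END PROCESS;
END behavior;
```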

  27. ALU Control • Simple combinational logic (truth tables)

  28. Datapath With Control (4.5)
     Instruction  RegDst  ALUSrc  MemtoReg  RegWrite  MemRead  MemWrite  Branch  ALUOp1  ALUOp0
     R-format     1       0       0         1         0        0         0       1       0
     lw           0       1       1         1         1        0         0       0       0
     sw           X       1       X         0         0        1         0       0       0
     beq          X       0       X         0         0        0         1       0       1
     [Figure note: for lw, use rt, not rd, as the destination register]

  29. Commodity Processors
     [Figure: ARM 7 single cycle datapath]

  30. Control Unit Signals
     [Figure: the control unit decodes Inst[31:26] into the control signals of the table on slide 28, to harness the datapath]

  31. Controller Implementation
        LIBRARY IEEE;
        USE IEEE.STD_LOGIC_1164.ALL;  -- std_logic types are all this controller needs

        ENTITY control IS
           PORT( SIGNAL Opcode       : IN  STD_LOGIC_VECTOR( 5 DOWNTO 0 );
                 SIGNAL RegDst       : OUT STD_LOGIC;
                 SIGNAL ALUSrc       : OUT STD_LOGIC;
                 SIGNAL MemtoReg     : OUT STD_LOGIC;
                 SIGNAL RegWrite     : OUT STD_LOGIC;
                 SIGNAL MemRead      : OUT STD_LOGIC;
                 SIGNAL MemWrite     : OUT STD_LOGIC;
                 SIGNAL Branch       : OUT STD_LOGIC;
                 SIGNAL ALUop        : OUT STD_LOGIC_VECTOR( 1 DOWNTO 0 );
                 SIGNAL clock, reset : IN  STD_LOGIC );  -- unused: the decoder is purely combinational
        END control;

  32. Controller Implementation (cont.)
     • Implementation of each column of the control table (slide 28)
        ARCHITECTURE behavior OF control IS
           SIGNAL R_format, Lw, Sw, Beq : STD_LOGIC;
        BEGIN
           -- Code to generate control signals using opcode bits
           R_format <= '1' WHEN Opcode = "000000" ELSE '0';
           Lw       <= '1' WHEN Opcode = "100011" ELSE '0';
           Sw       <= '1' WHEN Opcode = "101011" ELSE '0';
           Beq      <= '1' WHEN Opcode = "000100" ELSE '0';
           -- Each output below is one column of the control table
           RegDst    <= R_format;
           ALUSrc    <= Lw OR Sw;
           MemtoReg  <= Lw;
           RegWrite  <= R_format OR Lw;
           MemRead   <= Lw;
           MemWrite  <= Sw;
           Branch    <= Beq;
           ALUOp( 1 ) <= R_format;
           ALUOp( 0 ) <= Beq;
        END behavior;

  33. R-Type Instruction

  34. Load Instruction

  35. Branch-on-Equal Instruction

  36. Implementing Jumps
     Jump format: 2 (31:26) | address (25:0)
     • Jump uses word address
     • Update PC with concatenation of
         • Top 4 bits of old PC (in the datapath, these are taken from PC + 4)
         • 26-bit jump address
         • 00
     • Need an extra control signal decoded from opcode: Jump
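A minimal VHDL sketch of the jump-target concatenation and the extra Jump signal. The entity name `jump_target` is hypothetical; the sketch takes the top 4 bits from PC + 4, as the MIPS datapath does, and decodes Jump from opcode 2 (000010).

```vhdl
LIBRARY IEEE;
USE IEEE.STD_LOGIC_1164.ALL;

-- Hypothetical jump-target logic: 4 + 26 + 2 = 32 bits of pure wiring.
ENTITY jump_target IS
   PORT( PCPlus4  : IN  STD_LOGIC_VECTOR( 31 DOWNTO 0 );
         Instr    : IN  STD_LOGIC_VECTOR( 31 DOWNTO 0 );
         JumpAddr : OUT STD_LOGIC_VECTOR( 31 DOWNTO 0 );
         Jump     : OUT STD_LOGIC );       -- extra control signal decoded from the opcode
END jump_target;

ARCHITECTURE behavior OF jump_target IS
BEGIN
   -- Top 4 bits of PC + 4, then the 26-bit word address, then "00"
   JumpAddr <= PCPlus4( 31 DOWNTO 28 ) & Instr( 25 DOWNTO 0 ) & "00";
   Jump     <= '1' WHEN Instr( 31 DOWNTO 26 ) = "000010" ELSE '0';
END behavior;
```

For example, the word 08100003 (opcode 2, address field 0x0100003) executed with PC + 4 = 0x00400004 jumps to 0x0040000C.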

  37. Datapath With Jumps Added

  38. Energy Behavior
     [Figure: energy arises from combinational activity and from storage read/write access]

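  39. Recall: Hierarchy of Energy Models
     • Switch level: activity (dynamic) and leakage (static) energy costs
     • Aggregate energy expenditure into gate-level estimates
     • Aggregate energy into higher-level modules (e.g., the ALU)
     [Figure: CMOS inverter (PMOS and NMOS between Vdd and Ground, Vin to Vout); a gate-level circuit with inputs a, b, c and outputs x, y; an ALU]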

  40. A Simple Architecture Energy Model
     • To a first order, we can use the per-access energy of each major component
         • Obtain this for a technology generation
         • Use this per-access energy to compute the energy of each instruction
     • Note:
         • This is a high-level approximation; the actual physics is more complicated
         • However, this is useful for several purposes
     • Which components does each instruction exercise?
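Written as a formula (the symbols are ours, and the per-access energies $e_c$ would come from the chosen technology generation), the model sums the per-access energy of every component an instruction exercises. For a load, following its path through the datapath and omitting the small adders and muxes:

```latex
E_{\text{instr}} \approx \sum_{c \,\in\, C(\text{instr})} e_c ,
\qquad
E_{\text{lw}} \approx e_{\text{IM}} + e_{\text{RF,read}} + e_{\text{ALU}} + e_{\text{DM,read}} + e_{\text{RF,write}}
```

Here $C(\text{instr})$ is the set of components the instruction exercises; an R-type instruction drops the data memory term, and a store replaces the register write with $e_{\text{DM,write}}$.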

  41. Example: Updating the PC
     • What is the energy cost of this operation?
     [Figure: full single-cycle datapath with control]

  42. Example: Register Instructions
     • What is the energy cost of this operation?
     [Figure: full single-cycle datapath with control]

  43. Example: I-type Instructions
     • What is the energy cost of this operation?
     [Figure: full single-cycle datapath with control]

  44. Example: I-Type for Branches
     • What is the energy cost of this operation?
     [Figure: full single-cycle datapath with control]

  45. Converting Energy to Power
     • For this datapath, all components except data memory are active and dissipate energy on every cycle
         • Later we will see how datapaths can be made more energy efficient
     • Computing power
         • Compute the total energy consumed over all cycles (instructions)
         • Divide energy by time to get power in watts
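The two steps above can be written as one formula. Since the single-cycle datapath completes one instruction per cycle, $N$ instructions take $N \cdot T_{\text{clk}}$ seconds. The numbers in the worked case are hypothetical: every instruction is assumed to cost 100 pJ, with a 1 GHz clock ($T_{\text{clk}} = 1$ ns).

```latex
P = \frac{E_{\text{total}}}{t_{\text{total}}}
  = \frac{\sum_{i=1}^{N} E_i}{N \, T_{\text{clk}}},
\qquad
\text{e.g.}\quad P = \frac{N \times 100\ \text{pJ}}{N \times 1\ \text{ns}} = 0.1\ \text{W}
```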

  46. ITRS Roadmap for Logic Devices
     From: “ExaScale Computing Study: Technology Challenges in Achieving Exascale Systems,” P. Kogge, et al., 2008

  47. Our Simple Control Structure
     • All of the logic is combinational
     • We wait for everything to settle down and for the right thing to be done
         • ALU might not produce “right answer” right away
         • We use write signals along with clock to determine when to write
     • Cycle time determined by length of the longest path
     • We are ignoring some details like setup and hold times

  48. Performance Issues
     • Longest delay determines clock period
         • Critical path: load instruction
         • Instruction memory → register file → ALU → data memory → register file
     • Not feasible to vary period for different instructions
     • Violates design principle
         • Making the common case fast
     • We will improve performance by pipelining
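A worked cycle-time calculation makes the critical-path argument concrete. The component delays below are assumed values for illustration only (not from these notes), and mux, adder, and setup delays are ignored:

```latex
% Assumed delays: t_IM = t_DM = 200 ps, t_ALU = 100 ps, t_RF,read = t_RF,write = 50 ps
\begin{aligned}
T_{\text{lw}}     &= t_{\text{IM}} + t_{\text{RF,read}} + t_{\text{ALU}} + t_{\text{DM}} + t_{\text{RF,write}}
                   = 200 + 50 + 100 + 200 + 50 = 600\ \text{ps} \\
T_{\text{R-type}} &= t_{\text{IM}} + t_{\text{RF,read}} + t_{\text{ALU}} + t_{\text{RF,write}}
                   = 200 + 50 + 100 + 50 = 400\ \text{ps} \\
T_{\text{clk}}    &= \max(T_{\text{lw}}, T_{\text{R-type}}, \ldots) = 600\ \text{ps}
                   \;\Rightarrow\; f = 1 / T_{\text{clk}} \approx 1.67\ \text{GHz}
\end{aligned}
```

Under these assumptions every R-type instruction still occupies a full 600 ps cycle while needing only 400 ps, which is the waste that pipelining targets.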

  49. Summary
     • Single cycle datapath
         • All instructions execute in one clock cycle
         • Not all instructions take the same amount of time
         • Software sees a simple interface
         • Can memory operations really take one cycle?
     • Improve performance via pipelining, multi-cycle operation, parallelism, or customization
     • We will address these next

  50. Study Guide
     • Given an instruction, be able to specify the values of all control signals required to execute that instruction
     • Add new instructions: modify the datapath and control to effect their execution
         • E.g., jal, jr, shift, etc.
     • Modify the VHDL controller
     • Given delays of various components, determine the cycle time of the datapath
     • Distinguish between those parts of the datapath that are unique to each instruction and those components that are shared across all instructions
