
CPE 626: Advanced VLSI Design L02


Presentation Transcript


  1. CPE 626: Advanced VLSI Design, L02. Department of Electrical and Computer Engineering, University of Alabama in Huntsville

  2. Outline
     • Simple Processor – MU0
       • Datapath Design
       • Control Logic
       • ALU Design
     • Pipeline Processor – DLX
       • ISA
         • Registers
         • Addressing Modes and Data Types
         • Instruction Format
         • Instruction Set
       • Non-pipeline Implementation
       • Pipeline Implementation
     UAH-CPE631

  3. MU0 – A Simple Processor
     • Instruction format
     • Instruction set

  4. MU0 Logic Design
     • Separate the design into two components:
       • Datapath – all the components that carry, store, or process bits, including the accumulator, program counter, ALU, and instruction register
       • Control logic – everything that does not fit comfortably into the datapath
     • Datapath design: there are many ways to do this
     • Assume that memory access is the limiting factor and that each memory access takes exactly one clock cycle

  5. MU0 Datapath Example
     • Components: Program Counter (PC), Accumulator (ACC), Arithmetic-Logic Unit (ALU), Instruction Register (IR), Instruction Decode and Control Logic
     • Principle: memory is the limiting factor in the design, so each instruction takes exactly as many clock cycles as the number of memory accesses it must make
     • Note: there is no dedicated PC incrementer. Why?

  6. MU0 Datapath Design
     • Assume that each instruction starts when it has arrived in the IR
     • Step 1: EX (execute)
         LDA S: ACC <- Mem[S]
         STO S: Mem[S] <- ACC
         ADD S: ACC <- ACC + Mem[S]
         SUB S: ACC <- ACC - Mem[S]
         JMP S: PC <- S
         JGE S: if (ACC >= 0) PC <- S
         JNE S: if (ACC != 0) PC <- S
     • Step 2: IF (fetch the next instruction)
       • Either the PC or the address in the IR is issued to fetch the next instruction
       • The address is incremented in the ALU and the value is saved into the PC
     • Initialization
       • A reset input starts instruction execution from a known address; here it is 000 hex
       • Provide zero at the ALU output and then load it into the PC register
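
The execute and fetch steps above can be sketched as a small simulator. This is only an illustrative sketch, not the course's implementation: it assumes a 16-bit word holding a 4-bit opcode followed by a 12-bit address S, uses the opcode values from the per-instruction control-logic slides (with STP assumed to be 0111), and the names `encode` and `run_mu0` are hypothetical. It counts memory accesses, which under the one-access-per-cycle assumption is also the cycle count.

```python
MASK16 = 0xFFFF
ADDR_MASK = 0xFFF

# Opcode values as listed on the per-instruction control-logic slides.
OPCODES = {"LDA": 0x0, "STO": 0x1, "ADD": 0x2, "SUB": 0x3,
           "JMP": 0x4, "JGE": 0x5, "JNE": 0x6, "STP": 0x7}


def encode(op, s=0):
    """Pack a mnemonic and a 12-bit address S into one 16-bit word (assumed layout)."""
    return (OPCODES[op] << 12) | (s & ADDR_MASK)


def run_mu0(mem, max_instructions=10_000):
    """Run from address 0x000 until STP; return (ACC, number of memory accesses)."""
    acc, pc, accesses = 0, 0, 0          # reset loads zero into the PC

    def fetch(addr):
        # IF step: issue the address, read the word for the IR, save addr + 1 in the PC.
        nonlocal pc, accesses
        accesses += 1
        pc = (addr + 1) & ADDR_MASK
        return mem[addr]

    ir = fetch(pc)
    for _ in range(max_instructions):
        op, s = ir >> 12, ir & ADDR_MASK
        if op == OPCODES["STP"]:
            return acc, accesses
        if op <= OPCODES["SUB"]:
            # EX step: exactly one data access to Mem[S].
            accesses += 1
            if op == OPCODES["LDA"]:
                acc = mem[s]
            elif op == OPCODES["STO"]:
                mem[s] = acc
            elif op == OPCODES["ADD"]:
                acc = (acc + mem[s]) & MASK16
            else:
                acc = (acc - mem[s]) & MASK16
            ir = fetch(pc)
        else:
            # Jumps need no data access: the next fetch uses S directly when taken.
            taken = (op == OPCODES["JMP"]
                     or (op == OPCODES["JGE"] and not acc & 0x8000)
                     or (op == OPCODES["JNE"] and acc != 0))
            ir = fetch(s if taken else pc)
    raise RuntimeError("program did not reach STP")
```

In this model a data instruction costs two accesses (execute plus the following fetch) and a jump costs one, which matches the principle that the instruction time is set by the number of memory accesses.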

  7. MU0 RTL Organization
     • Control logic signals:
       • Asel, Bsel
       • ACCce (ACC change enable)
       • PCce (PC change enable)
       • IRce (IR change enable)
       • ACCoe (ACC output enable)
       • ALUfs (ALU function select)
       • MEMrq (memory request)
       • RnW (read/write)
       • Ex/ft (execute/fetch)

  8. MU0 control logic

  9. LDA S (0000)
     • Ex/ft = 0: ALU function B
     • Ex/ft = 1: ALU function B+1

  10. STO S (0001)
     • Ex/ft = 0: ALU function x (don't care)
     • Ex/ft = 1: ALU function B+1

  11. ADD S (0010)
     • Ex/ft = 0: ALU function A+B
     • Ex/ft = 1: ALU function B+1

  12. SUB S (0011)
     • Ex/ft = 0: ALU function A-B
     • Ex/ft = 1: ALU function B+1

  13. JMP S (0100)
     • Ex/ft = 0: ALU function B+1

  14. JGE S (0101)
     • Ex/ft = 0, ACC15 = 1: ALU function B+1
     • Ex/ft = 0, ACC15 = 0: ALU function B+1

  15. JNE S (0110)
     • Ex/ft = 0, ACCz = 1: ALU function B+1
     • Ex/ft = 0, ACCz = 0: ALU function B+1

  16. STP (0111)
     • Ex/ft = 0: ALU function x (don't care)

  17. Reset
     • Ex/ft = 0: ALU output 0

  18. MU0 ALU Design
     • ALU functions: A+B, A-B, B, B+1, and 0 (used only when reset is active) => 4 functions, not counting the reset-only 0
     • Control inputs: Aen (enable operand A), Binv (invert operand B)
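
One way to see how Aen and Binv can yield these functions is to model the ALU as a single adder with a gated A input and an optionally inverted B input. The sketch below is an assumption-laden illustration, not the slide's circuit: the carry-in `cin` and the `reset` argument are not named on the slide, and `mu0_alu` and `ALU_FS` are hypothetical names.

```python
MASK16 = 0xFFFF


def mu0_alu(a, b, aen, binv, cin, reset=False):
    """16-bit adder with a gated A operand and an optionally inverted B operand."""
    if reset:
        return 0                                   # 0 is produced only during reset
    a_eff = (a & MASK16) if aen else 0             # Aen = 0 forces operand A to zero
    b_eff = (~b if binv else b) & MASK16           # Binv = 1 gives the one's complement of B
    return (a_eff + b_eff + cin) & MASK16


# (Aen, Binv, cin) settings for the four functions; A-B is A + ~B + 1 (two's complement).
ALU_FS = {
    "A+B": (1, 0, 0),
    "A-B": (1, 1, 1),
    "B":   (0, 0, 0),
    "B+1": (0, 0, 1),
}
```

With these settings, B+1 needs no dedicated incrementer: the same adder increments the fetch address by disabling A and asserting the carry-in.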

  19. Another Example: DLX Architecture

  20. DLX Registers
     • General-purpose registers (GPRs) with a load-store architecture
       • GPR: 32 32-bit registers named R0, R1, ..., R31; R0 = 0
     • FPR (floating-point registers):
       • single precision: 32 32-bit registers named F0, F1, ..., F31 (accessed independently)
       • double precision: 16 64-bit registers named F0, F2, ..., F30 (accessed in pairs)
     • Instructions support transfers between GPRs and FPRs
     • Other status registers, e.g., the floating-point status register (holds information about the results of FP operations)

  21. Addressing Modes and Data Types
     • Immediate with a 16-bit value field
     • Displacement with a 16-bit displacement
       • register deferred is derived by setting disp = 0
       • absolute is derived by using R0 as the base register
     • Memory is byte addressable in big-endian mode with a 32-bit address
     • All memory references are loads/stores through a GPR or FPR and must be aligned
     • Data types
       • 8-bit bytes and 16-bit half words (loaded into registers with either zeros or the sign bit replicated to fill 32 bits)
       • 32-bit integers
       • 32-bit single-precision and 64-bit double-precision floating point
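
The displacement mode and its two derived modes can be illustrated with a short sketch. This is an illustration under stated assumptions rather than a DLX reference model: memory is modelled as a Python `bytearray`, the register file as a list with R0 held at zero, and the helper names `sign_extend16`, `effective_address`, and `load_half` are hypothetical.

```python
def sign_extend16(value):
    """Interpret the low 16 bits of value as a signed two's-complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def effective_address(regs, base, disp16):
    """Displacement addressing: 32-bit address = Regs[base] + sign-extended displacement."""
    return (regs[base] + sign_extend16(disp16)) & 0xFFFFFFFF


def load_half(mem, addr, signed):
    """Load an aligned big-endian half word and widen it to 32 bits."""
    if addr % 2:
        raise ValueError("half-word access must be aligned to 2 bytes")
    value = int.from_bytes(mem[addr:addr + 2], "big")
    if signed and value & 0x8000:
        value |= 0xFFFF0000                        # replicate the sign bit
    return value


regs = [0] * 32
regs[2] = 0x1000
deferred = effective_address(regs, 2, 0)           # register deferred: disp = 0
absolute = effective_address(regs, 0, 0x0020)      # absolute: base register is R0
```

Both derived modes fall out of the single displacement computation, which is why the instruction set needs only one memory addressing mode.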

  22. Instruction Formats
     • I-type: load, store, arithmetic, logic, relational, shift, branch
     • R-type: arithmetic, logic, relational
     • J-type: jump, jump and link, trap, return from exception
     • I-type instruction
         Field:  Opcode | rs1 | rd | immediate
         Bits:   6      | 5   | 5  | 16
       Encodes:
       • Loads and stores of bytes, half words, and words
       • All immediates (rd <- rs1 op immediate)
       • Conditional branch instructions (rs1 is the register, rd is unused)
       • Jump register, jump and link register (rd = 0, rs1 = destination, immediate = 0)
     • R-type instruction
         Field:  Opcode | rs1 | rs2 | rd | func
         Bits:   6      | 5   | 5   | 5  | 11
       • Register-register ALU operations (rd <- rs1 func rs2); func = {add, sub, ...}
       • Read/write special registers and moves
     • J-type instruction
         Field:  Opcode | Offset added to PC
         Bits:   6      | 26
       • Jump and jump and link; trap and return from exception
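
Field extraction for the three formats can be sketched as follows. The sketch assumes the DLX convention that bit 0 is the most significant bit of the 32-bit instruction word, and an R-type layout with rd in bits 16..20 followed by an 11-bit func field; the function names are hypothetical, and the opcode value used in the example is an arbitrary number chosen for illustration, not a claimed DLX encoding.

```python
def bits(ir, first, last):
    """Return IR[first..last], where bit 0 is the most significant of 32 bits."""
    width = last - first + 1
    return (ir >> (31 - last)) & ((1 << width) - 1)


def decode_i(ir):
    """I-type: opcode(6) rs1(5) rd(5) immediate(16)."""
    return {"opcode": bits(ir, 0, 5), "rs1": bits(ir, 6, 10),
            "rd": bits(ir, 11, 15), "imm": bits(ir, 16, 31)}


def decode_r(ir):
    """R-type: opcode(6) rs1(5) rs2(5) rd(5) func(11)."""
    return {"opcode": bits(ir, 0, 5), "rs1": bits(ir, 6, 10),
            "rs2": bits(ir, 11, 15), "rd": bits(ir, 16, 20),
            "func": bits(ir, 21, 31)}


def decode_j(ir):
    """J-type: opcode(6) offset(26)."""
    return {"opcode": bits(ir, 0, 5), "offset": bits(ir, 6, 31)}


# Arbitrary illustrative word: opcode 0x23, rs1 = 2, rd = 4, immediate = 8.
word = (0x23 << 26) | (2 << 21) | (4 << 16) | 0x0008
fields = decode_i(word)
```

Because the opcode and register fields sit at fixed positions in every format, the register file can be read while the opcode is still being decoded.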

  23. Instructions for Data Transfers

  24. Arithmetic/logical instructions
     • All ALU instructions are register-register
       • add, sub, and, or, xor, shift
     • Immediate forms are also available
     • LHI loads an immediate value into the most significant 16 bits
     • R0 is used to synthesise other operations
       • Loading a constant is an add immediate with R0 as the source register
       • A register-register move is an add with R0 as one source
     • Compare operations put 1 ("true") in the destination if the condition is met

  25. Arithmetic/logical instructions (cont’d)

  26. Control-flow instructions
     • Jump can use a 26-bit signed offset from the PC or the contents of a register
     • Jump-and-link saves PC in R31
     • Conditional branches test the source register for zero/non-zero and use a 16-bit signed offset

  27. Floating-point instructions in DLX
     • Moves between floating point (32-bit) and double-precision (64-bit) registers
     • Operations: add, subtract, multiply, divide
     • Also, integer multiply/divide on floating-point registers

  28. A Simple Implementation of DLX

  29. Instruction Execution
     • The process of “instruction execution” is usually broken up into stages (“divide and conquer”)
       • smaller stages are easier to design
       • it is easy to optimize (change) one stage without touching the others
     • 5 main stages for DLX; each stage takes one clock cycle
       • Instruction Fetch (IF)
       • Instruction Decode / Register fetch cycle (ID)
       • Execution / Effective address cycle (EX)
       • Memory access / Branch completion cycle (MEM)
       • Write-back cycle (WB)

  30. Instruction Fetch (IF)
     • Send out the PC and fetch the instruction from memory into the instruction register (IR)
       • IR holds the instruction
     • Increment the PC by 4 to address the next sequential instruction
       • NPC holds the next sequential address
     • Register transfers:
         IR <- Mem[PC]
         NPC <- PC + 4

  31. Instruction Decode (ID)
     • Decode the instruction to determine the instruction type (Opcode field: the 6 most significant bits of the instruction)
     • Read in data from all necessary registers
       • temporary registers A and B hold the outputs of the GPR file
       • Imm holds the sign-extended lower 16 bits of the IR
       • decoding is done in parallel with reading registers, since these fields are at fixed locations
       • a register may be read even if we do not use it
     • Register transfers:
         A <- Regs[IR6..10]
         B <- Regs[IR11..15]
         Imm <- (IR16)^16 ## IR16..31

  32. Execution [EX] (1/2)
     • Register-register ALU instruction
       • ALU performs the operation specified by the opcode on the values in registers A and B; the result is placed in the temporary register ALUOutput
         ALUOutput <- A op B
     • Register-immediate ALU instruction
       • ALU performs the operation specified by the opcode on the value in register A and the value in register Imm; the result is placed in the temporary register ALUOutput
         ALUOutput <- A op Imm

  33. Execution [EX] (2/2)
     • Memory reference
       • ALU adds the operands to form the effective address and places the result into the temporary register ALUOutput
         ALUOutput <- A + Imm
     • Branch
       • ALU adds the NPC to the Imm to compute the address of the branch target
         ALUOutput <- NPC + Imm
       • Register A is checked to determine whether the branch is taken (for BEQZ op is “==”; for BNEZ op is “!=”)
         Cond <- (A op 0)
       • Cond is a 1-bit register (1: branch is taken, 0: not taken)

  34. Memory access (MEM)
     • Memory reference
       • load: LMD <- Mem[ALUOutput]
       • store: Mem[ALUOutput] <- B
     • Branch
       • if the instruction branches, the PC is replaced with the branch destination; otherwise, it is replaced with NPC
         if (cond) PC <- ALUOutput else PC <- NPC

  35. Write-back (WB)
     • Register-register ALU: Regs[IR16..20] <- ALUOutput
     • Register-immediate ALU: Regs[IR11..15] <- ALUOutput
     • Load instruction: Regs[IR11..15] <- LMD
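
The five steps (IF, ID, EX, MEM, WB) can be tied together in a short behavioural sketch that executes one instruction per call. To avoid inventing binary encodings, instructions are symbolic records rather than 32-bit words; the `Instr` class, the `step` function, and the convention that a store names its data register in `rs2` (standing in for the IR11..15 field) are all assumptions made for this illustration.

```python
from dataclasses import dataclass

M32 = 0xFFFFFFFF
ALU_OPS = {"add": lambda a, b: a + b, "sub": lambda a, b: a - b,
           "and": lambda a, b: a & b, "or": lambda a, b: a | b,
           "xor": lambda a, b: a ^ b}


@dataclass
class Instr:
    kind: str            # "alu_rr", "alu_ri", "load", "store", or "branch"
    op: str = "add"      # ALU operation, or "eqz"/"nez" for branches
    rs1: int = 0
    rs2: int = 0
    rd: int = 0
    imm: int = 0         # assumed already sign-extended


def step(pc, regs, imem, dmem):
    """Execute one instruction through the five steps; return the next PC."""
    ir = imem[pc]                                   # IF:  IR <- Mem[PC]
    npc = pc + 4                                    #      NPC <- PC + 4
    a, b, imm = regs[ir.rs1], regs[ir.rs2], ir.imm  # ID:  A, B, Imm
    cond = False                                    # EX
    if ir.kind == "alu_rr":
        alu_out = ALU_OPS[ir.op](a, b) & M32
    elif ir.kind == "alu_ri":
        alu_out = ALU_OPS[ir.op](a, imm) & M32
    elif ir.kind in ("load", "store"):
        alu_out = (a + imm) & M32                   # effective address
    else:
        alu_out = (npc + imm) & M32                 # branch target
        cond = (a == 0) if ir.op == "eqz" else (a != 0)
    lmd = None                                      # MEM
    if ir.kind == "load":
        lmd = dmem.get(alu_out, 0)
    elif ir.kind == "store":
        dmem[alu_out] = b
    next_pc = alu_out if cond else npc
    if ir.rd != 0:                                  # WB (R0 stays zero)
        if ir.kind in ("alu_rr", "alu_ri"):
            regs[ir.rd] = alu_out
        elif ir.kind == "load":
            regs[ir.rd] = lmd
    return next_pc
```

Each call corresponds to five clock cycles in the non-pipelined implementation, since every instruction passes through all five steps in turn.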

  36. Datapath
     [Figure: non-pipelined DLX datapath across five stages: Instruction Fetch; Instr. Decode / Reg. Fetch; Execute / Addr. Calc; Memory Access; Write Back. Labeled elements: PC, Instruction Memory, IR, Reg. File (RS1, RS2, RD), Sign Extend, A, B, Imm, NPC, Add (+4), Zero?, ALU, ALUoutput, Data Memory, LMD, multiplexers, Next PC, Next SEQ PC, WB Data.]

  37. Sequential Execution
     [Figure: timing diagram of instructions Ii, Ii+1, Ii+2 against time in clocks; each instruction passes through IF, ID, Ex, Mem, WB before the next one starts.]
     • Sequential execution of these 3 instructions (Ii, Ii+1, Ii+2) takes 15 clock cycles

  38. Pipelined Execution
     [Figure: timing diagram of instructions Ii, Ii+1, Ii+2, Ii+3, Ii+4 against time in clocks; the pipe stages (segments) IF, ID, Ex, Mem, WB of successive instructions overlap.]
     • Analogy with an automobile assembly line
       • many steps, each contributing something to the construction of the car
       • each step operates in parallel with the other steps, though on a different car
     • Pipelined execution of instructions Ii, Ii+1, and Ii+2 takes 7 clock cycles

  39. Pipelining Lessons
     • Pipelining does not help the latency of a single instruction; it helps the throughput of the entire workload
     • Multiple instructions operate simultaneously using different resources
     • Potential speedup = number of pipe stages
     • Time to “fill” the pipeline and time to “drain” it reduce speedup: 15/7 ≈ 2.14X vs. 5X in this example
     • Latency and throughput:
       • Latency: how long it takes to execute an instruction
       • Throughput: how often an instruction exits the pipeline
     [Figure: timing diagram of instructions Ii, Ii+1, Ii+2 against time in clocks.]
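
The fill and drain effect can be checked with a few lines of arithmetic. The sketch below assumes an ideal pipeline with one-cycle stages and no stalls; the function names are hypothetical.

```python
def sequential_cycles(n_instructions, n_stages):
    """Each instruction runs all stages before the next one starts."""
    return n_instructions * n_stages


def pipelined_cycles(n_instructions, n_stages):
    """The first instruction fills the pipe; each later one completes one cycle after."""
    return n_stages + (n_instructions - 1)


def speedup(n_instructions, n_stages):
    return (sequential_cycles(n_instructions, n_stages)
            / pipelined_cycles(n_instructions, n_stages))


three = speedup(3, 5)        # 15 cycles vs. 7 cycles for the 3-instruction example
many = speedup(1000, 5)      # approaches the stage count as the workload grows
```

For three instructions the ratio is 15/7, and it approaches 5 only when the instruction count is large compared with the pipeline depth.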

  40. Pipelining Lessons (cont’d)
     • Pipeline stages are hooked together => all stages must be ready to proceed at the same time
     • Machine cycle – the time required to move an instruction one step down the pipeline (usually one clock cycle)
     • The length of a machine cycle is determined by the time required for the slowest stage
     • Unbalanced lengths of pipe stages also reduce speedup
     [Figure: timing diagram of instructions Ii, Ii+1, Ii+2 against time in clocks.]
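
The cost of unbalanced stages can be made concrete with a small calculation. This sketch assumes the unpipelined machine spends the sum of the stage latencies per instruction, the pipelined clock period equals the slowest stage, and pipeline register overhead is ignored; the stage latencies are made-up numbers and `pipeline_speedup` is a hypothetical helper.

```python
def pipeline_speedup(stage_times, n_instructions):
    """Speedup of a pipeline whose cycle time is set by its slowest stage."""
    unpipelined = n_instructions * sum(stage_times)
    cycle = max(stage_times)                         # machine cycle = slowest stage
    pipelined = (len(stage_times) + n_instructions - 1) * cycle
    return unpipelined / pipelined


balanced = pipeline_speedup([2, 2, 2, 2, 2], 1000)     # same total work per instruction
unbalanced = pipeline_speedup([1, 2, 3, 2, 2], 1000)   # one slow stage stretches every cycle
```

Both machines do 10 time units of work per instruction, but the unbalanced one approaches a speedup of only 10/3 instead of 5.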

  41. Visualizing Pipeline
     [Figure: instructions in program order (vertical axis) against time in clock cycles CC 1 to CC 7 (horizontal axis); each instruction uses IM, Reg, ALU, DM, Reg in successive cycles.]

  42. Pipeline Datapath
     [Figure: pipelined DLX datapath with pipeline registers IF/ID, ID/EX, EX/MEM, MEM/WB between the stages Instruction Fetch; Instr. Decode / Reg. Fetch; Execute / Addr. Calc; Memory Access; Write Back. Labeled elements: PC, Instruction Memory, IR, Reg. File (read addresses IR6..10 and IR11..15; write address MEM/WB.IR11..15 or MEM/WB.IR16..20), Sign Extend, Imm, Add (+4), Zero?, ALU, Data Memory, multiplexers, Next PC, Next SEQ PC, WB Data.]

  43. Instruction Flow through Pipeline Regs
     Instruction held in each pipeline register, by clock cycle:
       Cycle | IF/ID        | ID/EX        | EX/MEM       | MEM/WB
       CC 1  | Add R1,R2,R3 | Nop          | Nop          | Nop
       CC 2  | Lw R4,0(R2)  | Add R1,R2,R3 | Nop          | Nop
       CC 3  | Sub R6,R5,R7 | Lw R4,0(R2)  | Add R1,R2,R3 | Nop
       CC 4  | Xor R9,R8,R1 | Sub R6,R5,R7 | Lw R4,0(R2)  | Add R1,R2,R3
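
The flow on this slide follows a simple rule that can be sketched in code: after each clock cycle, every pipeline register holds the instruction that entered the pipeline one cycle later than the one in the register to its right. The sketch assumes no stalls and that empty registers hold a Nop; `pipe_reg_contents` is a hypothetical helper name.

```python
PIPE_REGS = ["IF/ID", "ID/EX", "EX/MEM", "MEM/WB"]


def pipe_reg_contents(instrs, cycle):
    """Instruction held in each pipeline register after clock cycle `cycle` (1-based)."""
    contents = {}
    for depth, reg in enumerate(PIPE_REGS):
        index = cycle - 1 - depth            # instruction fetched `depth` cycles earlier
        contents[reg] = instrs[index] if 0 <= index < len(instrs) else "Nop"
    return contents


program = ["Add R1,R2,R3", "Lw R4,0(R2)", "Sub R6,R5,R7", "Xor R9,R8,R1"]
for cc in range(1, 5):
    print(f"CC {cc}:", pipe_reg_contents(program, cc))
```

By CC 4 all four registers hold a real instruction, so the pipeline is doing useful work in four stages at once.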

  44. DLX Pipeline Definition: IF, ID
     • Stage IF
       • IF/ID.IR <- Mem[PC];
       • if EX/MEM.cond {IF/ID.NPC, PC <- EX/MEM.ALUOUT} else {IF/ID.NPC, PC <- PC + 4};
     • Stage ID
       • ID/EX.A <- Regs[IF/ID.IR6..10]; ID/EX.B <- Regs[IF/ID.IR11..15];
       • ID/EX.Imm <- (IF/ID.IR16)^16 ## IF/ID.IR16..31;
       • ID/EX.NPC <- IF/ID.NPC; ID/EX.IR <- IF/ID.IR;

  45. DLX Pipeline Definition: EX
     • ALU
       • EX/MEM.IR <- ID/EX.IR;
       • EX/MEM.ALUOUT <- ID/EX.A func ID/EX.B; or EX/MEM.ALUOUT <- ID/EX.A func ID/EX.Imm;
       • EX/MEM.cond <- 0;
     • load/store
       • EX/MEM.IR <- ID/EX.IR; EX/MEM.B <- ID/EX.B;
       • EX/MEM.ALUOUT <- ID/EX.A + ID/EX.Imm;
       • EX/MEM.cond <- 0;
     • branch
       • EX/MEM.ALUOUT <- ID/EX.NPC + ID/EX.Imm;
       • EX/MEM.cond <- (ID/EX.A func 0);

  46. DLX Pipeline Definition: MEM, WB
     • Stage MEM
       • ALU
         • MEM/WB.IR <- EX/MEM.IR;
         • MEM/WB.ALUOUT <- EX/MEM.ALUOUT;
       • load/store
         • MEM/WB.IR <- EX/MEM.IR;
         • MEM/WB.LMD <- Mem[EX/MEM.ALUOUT]; or Mem[EX/MEM.ALUOUT] <- EX/MEM.B;
     • Stage WB
       • ALU
         • Regs[MEM/WB.IR16..20] <- MEM/WB.ALUOUT; or Regs[MEM/WB.IR11..15] <- MEM/WB.ALUOUT;
       • load
         • Regs[MEM/WB.IR11..15] <- MEM/WB.LMD;
