CPE 626: Advanced VLSI Design L02

Department of Electrical and Computer Engineering, University of Alabama in Huntsville

Outline
• Simple Processor – MU0
  • Datapath Design
  • Control Logic
  • ALU Design
• Pipeline Processor – DLX
  • ISA
  • Registers
  • Addressing Modes and Data Types
  • Instruction Format
  • Instruction Set
  • Non-pipeline Implementation
  • Pipeline Implementation
UAH-CPE631

MU0 – A Simple Processor
• Instruction format
• Instruction set

MU0 Logic Design
• Separate the design into two components
  • Datapath: all the components carrying, storing or processing bits, including the accumulator, program counter, ALU, and instruction register
  • Control logic: everything that does not fit comfortably into the datapath
• Datapath design: there are many ways to do this
  • Assume that memory access is the limiting factor, and that each memory access takes exactly one clock cycle

MU0 Datapath Example
[Figure: MU0 datapath. Labeled blocks: Program Counter (PC), Accumulator (ACC), Arithmetic-Logic Unit (ALU), Instruction Register (IR), Instruction Decode and Control Logic]
• Follow the principle that memory is the limiting factor in the design: each instruction takes exactly as many clock cycles as the number of memory accesses it must make.
• Note: We do not have a dedicated PC incrementer! Why?

MU0 Datapath Design
Assume that each instruction starts when it has arrived in the IR.
Step 1: EX (execute)
• LDA S: ACC <- Mem[S]
• STO S: Mem[S] <- ACC
• ADD S: ACC <- ACC + Mem[S]
• SUB S: ACC <- ACC - Mem[S]
• JMP S: PC <- S
• JGE S: if (ACC >= 0) PC <- S
• JNE S: if (ACC != 0) PC <- S
Step 2: IF (fetch the next instruction)
• Either the PC or the address in the IR is issued to fetch the next instruction
• The address is incremented in the ALU and the value is saved into the PC
Initialization
• A Reset input starts instruction execution from a known address; here it is 000 hex
• Provide zero at the ALU output and then load it into the PC register
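The execute/fetch behaviour listed on this slide can be sketched as a small interpreter. This is only an illustration: it assumes the usual MU0 encoding of a 4-bit opcode in the upper bits and a 12-bit address S in the lower bits, it takes STP as opcode 0111 (the next code after JNE), and the names `encode` and `run` are hypothetical helpers, not part of the course material.

```python
MASK16 = 0xFFFF

# Opcodes in the order used on the control-logic slides (assumed 4-bit field).
LDA, STO, ADD, SUB, JMP, JGE, JNE, STP = range(8)

def encode(op, s=0):
    """Pack a 4-bit opcode and a 12-bit address S into one 16-bit word."""
    return ((op & 0xF) << 12) | (s & 0xFFF)

def run(mem, max_steps=1000):
    """Run MU0 code held in `mem` (a list of 16-bit words) from address 000 hex."""
    pc, acc = 0, 0                        # reset: PC <- 0
    for _ in range(max_steps):
        ir = mem[pc]                      # IF: fetch the instruction ...
        pc = (pc + 1) & 0xFFF             # ... and increment the address
        op, s = ir >> 12, ir & 0xFFF
        if op == LDA:
            acc = mem[s]
        elif op == STO:
            mem[s] = acc
        elif op == ADD:
            acc = (acc + mem[s]) & MASK16
        elif op == SUB:
            acc = (acc - mem[s]) & MASK16
        elif op == JMP:
            pc = s
        elif op == JGE:
            if not acc & 0x8000:          # ACC15 == 0 means ACC >= 0
                pc = s
        elif op == JNE:
            if acc != 0:
                pc = s
        elif op == STP:
            break
        else:
            raise ValueError(f"undefined opcode {op:04b}")
    return acc, pc

# Tiny program: Mem[10] <- Mem[8] + Mem[9]
mem = [0] * 16
mem[0] = encode(LDA, 8)
mem[1] = encode(ADD, 9)
mem[2] = encode(STO, 10)
mem[3] = encode(STP)
mem[8], mem[9] = 5, 7
acc, pc = run(mem)
```

The interpreter models only the architectural effect of each instruction; it does not model the shared ALU that the real datapath uses to increment the PC.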

MU0 RTL Organization
Control logic signals:
• Asel, Bsel
• ACCce (ACC change enable)
• PCce (PC change enable)
• IRce (IR change enable)
• ACCoe (ACC output enable)
• ALUfs (ALU function select)
• MEMrq (memory request)
• RnW (read/write)
• Ex/ft (execute/fetch)

MU0 control logic

LDA S (0000): Ex/ft = 0, ALU function B; Ex/ft = 1, ALU function B+1

STO S (0001): Ex/ft = 0, ALU function x (don't care); Ex/ft = 1, ALU function B+1

ADD S (0010): Ex/ft = 0, ALU function A+B; Ex/ft = 1, ALU function B+1

SUB S (0011): Ex/ft = 0, ALU function A-B; Ex/ft = 1, ALU function B+1

JMP S (0100): Ex/ft = 0, ALU function B+1

JGE S (0101): Ex/ft = 0 and ACC15 = 1, ALU function B+1; Ex/ft = 0 and ACC15 = 0, ALU function B+1

JNE S (0110): Ex/ft = 0 and ACCz = 1, ALU function B+1; Ex/ft = 0 and ACCz = 0, ALU function B+1

STP (0111): Ex/ft = 0, ALU function x (don't care)

Reset: Ex/ft = 0, ALU function 0

MU0 ALU Design
• ALU functions: A+B, A-B, B, B+1, and 0 (used only when reset is active), so 4 functions are needed in normal operation
• Control inputs: Aen (enable operand A), Binv (invert operand B)
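One way to obtain the four normal-operation functions from a single adder is sketched below. The slide names only Aen and Binv; the carry-in input `cin` used here is an assumption added to get A-B (two's complement) and B+1, and the function names are hypothetical.

```python
MASK16 = 0xFFFF

def alu(a, b, aen, binv, cin):
    """16-bit adder-based ALU: (A if aen else 0) + (~B if binv else B) + cin."""
    a_in = a if aen else 0                      # Aen gates operand A
    b_in = (~b & MASK16) if binv else b         # Binv inverts operand B
    return (a_in + b_in + cin) & MASK16

# Control settings per function (cin is an assumed extra control input).
FUNCTIONS = {
    "A+B": dict(aen=1, binv=0, cin=0),
    "A-B": dict(aen=1, binv=1, cin=1),          # A + ~B + 1
    "B":   dict(aen=0, binv=0, cin=0),
    "B+1": dict(aen=0, binv=0, cin=1),
}

def alu_op(name, a, b):
    """Apply one of the named ALU functions to 16-bit operands a and b."""
    return alu(a, b, **FUNCTIONS[name])
```

The zero output needed at reset is not modelled here; it could be produced by forcing the result to 0 when the reset input is active.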

Another Example: DLX Architecture

DLX Registers
• GPRs with load-store architecture
  • GPR: 32 32-bit registers named R0, R1, ..., R31; R0 = 0
• FPR (floating-point registers):
  • single precision: 32 32-bit registers named F0, F1, ..., F31 (accessed independently)
  • double precision: 16 64-bit registers named F0, F2, ..., F30 (accessed in pairs)
• Instructions that support transfers between GPRs and FPRs
• Other status registers, e.g., the floating-point status register (holds information about the results of FP operations)

Addressing Modes and Data Types
• Immediate with a 16-bit value field
• Displacement with a 16-bit displacement
  • register deferred is derived when disp = 0
  • absolute is derived from displacement with R0 as the base
• Byte addressable, big-endian, with 32-bit addresses
• All memory references are loads/stores through a GPR or FPR and must be aligned
• Data types
  • 8-bit bytes, 16-bit half words (loaded into registers with either zeros or the sign bit replicated to fill 32 bits)
  • 32-bit integers
  • 32-bit single-precision and 64-bit double-precision for FP

Instruction Formats
• I-type: load, store, arithmetic, logic, relational, shift, branch
• R-type: arithmetic, logic, relational
• J-type: jump, jump and link, trap, return from exception
I-type instruction: Opcode (6 bits) | rs1 (5) | rd (5) | immediate (16)
• Loads and stores of bytes, words, half words
• All immediates (rd <- rs1 op immediate)
• Conditional branch instructions (rs1 is the register tested, rd is unused)
• Jump register, jump and link register (rd = 0, rs1 = destination, immediate = 0)
R-type instruction: Opcode (6 bits) | rs1 (5) | rs2 (5) | rd (5) | func (11)
• Reg-reg ALU operations (rd <- rs1 func rs2); func = {add, sub, ...}
• Read/write special registers and moves
J-type instruction: Opcode (6 bits) | offset added to PC (26)
• Jump and jump and link; trap and return from exception
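The field layout can be illustrated with a small decoder. This is a sketch under the assumption that bits are numbered DLX-style with bit 0 as the most significant bit (so rs1 is IR6..10); `field`, `decode`, and `encode_itype` are hypothetical helper names, and the opcode value in the example is an arbitrary placeholder.

```python
def field(word, first, last):
    """Return bits first..last of a 32-bit word, with bit 0 as the MSB."""
    width = last - first + 1
    return (word >> (31 - last)) & ((1 << width) - 1)

def decode(word):
    """Extract every field of the three formats; the opcode decides which ones apply."""
    return {
        "opcode":   field(word, 0, 5),
        "rs1":      field(word, 6, 10),
        "rd_itype": field(word, 11, 15),   # rs2 in the R-type format
        "rd_rtype": field(word, 16, 20),
        "func":     field(word, 21, 31),
        "imm16":    field(word, 16, 31),
        "offset26": field(word, 6, 31),
    }

def encode_itype(opcode, rs1, rd, imm):
    """Assemble an I-type word: opcode(6) | rs1(5) | rd(5) | immediate(16)."""
    return (opcode << 26) | (rs1 << 21) | (rd << 16) | (imm & 0xFFFF)

word = encode_itype(0x23, 2, 4, 8)   # placeholder opcode, rs1 = R2, rd = R4, imm = 8
fields = decode(word)
```

Because the register fields sit at fixed positions in all formats, the register file can be read before the opcode has been fully decoded.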

Instructions for Data Transfers

Arithmetic/logical instructions
• All ALU instructions are register-register
  • add, sub, and, or, xor, shift
• Immediate forms are also available
  • LHI loads an immediate value into the most significant 16 bits
• R0 is used to synthesise other operations
  • Loading a constant is an add immediate with R0 as the source
  • Register-register move is an add with R0 as one source
• Compare operations put 1 ("true") in the destination if the condition is met

Arithmetic/logical instructions (cont’d)

Control-flow instructions
• Jump can use a 26-bit signed offset from the PC or the contents of a register
• Jump-and-link saves the return address (the address of the next sequential instruction) in R31
• Conditional branches test the source register for zero/non-zero and use a 16-bit signed offset

Floating-point instructions in DLX
• Moves between single-precision (32-bit) and double-precision (64-bit) registers
• Operations: add, subtract, multiply, divide
• Also, integer multiply/divide on floating-point registers

A Simple Implementation of DLX

Instruction Execution
• The process of “instruction execution” is usually broken up into stages (“divide and conquer”)
  • smaller stages are easier to design
  • it is easy to optimize (change) one stage without touching the others
• 5 main stages for DLX; each stage takes one clock cycle
  • Instruction Fetch (IF)
  • Instruction Decode / Register fetch cycle (ID)
  • Execution / Effective address cycle (EX)
  • Memory access / Branch completion cycle (MEM)
  • Write-back cycle (WB)

Instruction Fetch (IF)
• Send out the PC and fetch the instruction from memory into the instruction register (IR)
  • IR is used to hold the instruction
• Increment the PC by 4 to address the next sequential instruction
  • NPC is used to hold the next sequential address
IR <- Mem[PC]
NPC <- PC + 4

Instruction Decode (ID)
• Decode the instruction to determine the instruction type (Opcode field: the 6 most significant bits of the instruction)
• Read in data from all necessary registers
  • temporary registers A and B hold the outputs of the GPR file
  • Imm is used to hold the sign-extended lower 16 bits of the IR
  • decoding is done in parallel with reading registers, since these fields are at fixed locations
  • a register may be read even if we do not use it
A <- Regs[IR6..10]
B <- Regs[IR11..15]
Imm <- (IR16)^16 ## IR16..31
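The Imm transfer replicates bit IR16 (the sign bit of the 16-bit immediate) sixteen times and concatenates it with IR16..31. A minimal Python sketch of that operation follows; `sign_extend16` and `to_signed` are hypothetical names used only for illustration.

```python
def sign_extend16(ir):
    """Return the 32-bit value (IR16)^16 ## IR16..31 for a 32-bit instruction word."""
    imm16 = ir & 0xFFFF                  # IR16..31 are the low 16 bits (bit 0 is the MSB)
    sign = (imm16 >> 15) & 1             # IR16 is the sign bit of the immediate
    upper = 0xFFFF if sign else 0x0000   # sign bit replicated 16 times
    return (upper << 16) | imm16

def to_signed(value32):
    """Interpret a 32-bit pattern as a two's-complement integer."""
    return value32 - (1 << 32) if value32 & 0x80000000 else value32
```

For example, an immediate field of 0xFFFF extends to 0xFFFFFFFF, which `to_signed` reads as -1.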

Execution [EX] (1/2)
• Register-register ALU instruction
  • The ALU performs the operation specified by the opcode on the values in registers A and B; the result is placed in the temporary register ALUOutput
  • ALUOutput <- A op B
• Register-immediate ALU instruction
  • The ALU performs the operation specified by the opcode on the value in register A and the value in register Imm; the result is placed in the temporary register ALUOutput
  • ALUOutput <- A op Imm

Execution [EX] (2/2)
• Memory reference
  • The ALU adds the operands to form the effective address and places the result into the temporary register ALUOutput
  • ALUOutput <- A + Imm
• Branch
  • The ALU adds the NPC to the Imm to compute the address of the branch target
  • Register A is checked to determine whether the branch is taken (for BEQZ op is "=="; for BNEZ op is "!=")
  • Cond is a 1-bit register (1: branch is taken, 0: not taken)
  • ALUOutput <- NPC + Imm
  • Cond <- (A op 0)

Memory access (MEM)
• Memory reference
  • load: LMD <- Mem[ALUOutput]
  • store: Mem[ALUOutput] <- B
• Branch
  • if the instruction branches, the PC is replaced with the branch destination; otherwise, it is replaced with NPC
  • if (cond) PC <- ALUOutput else PC <- NPC

Write-back (WB)
• Register-register ALU: Regs[IR16..20] <- ALUOutput
• Register-immediate ALU: Regs[IR11..15] <- ALUOutput
• Load instruction: Regs[IR11..15] <- LMD
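The five steps can be put together in a short non-pipelined model. This is a simplified sketch, not the course's implementation: instructions are given as already-decoded tuples `(kind, rs1, rs2, rd, imm)` rather than 32-bit words, only five instruction kinds are supported, memory is a Python dict, and the names `execute` and `run` are hypothetical.

```python
MASK32 = 0xFFFFFFFF

def execute(state, instr):
    """Run one instruction through IF, ID, EX, MEM, WB; return the cycles used."""
    kind, rs1, rs2, rd, imm = instr
    regs, mem = state["regs"], state["mem"]
    # IF: NPC <- PC + 4 (the fetched instruction is passed in as `instr`)
    npc = state["pc"] + 4
    # ID: A <- Regs[rs1]; B <- Regs[rs2]
    a, b = regs[rs1], regs[rs2]
    # EX: ALU operation, effective address, or branch target and condition
    cond = False
    if kind == "add":
        alu_output = (a + b) & MASK32
    elif kind in ("addi", "lw", "sw"):
        alu_output = (a + imm) & MASK32
    elif kind == "beqz":
        alu_output = npc + imm
        cond = (a == 0)
    else:
        raise ValueError(f"unsupported instruction: {kind}")
    # MEM: load, store, or branch completion
    lmd = None
    if kind == "lw":
        lmd = mem.get(alu_output, 0)
    elif kind == "sw":
        mem[alu_output] = b
    state["pc"] = alu_output if cond else npc
    # WB: write ALUOutput or LMD back; R0 stays hardwired to zero
    result = alu_output if kind in ("add", "addi") else lmd
    if result is not None and rd != 0:
        regs[rd] = result
    return 5                              # one clock cycle per stage

def run(program):
    """Execute a list of decoded instructions placed at addresses 0, 4, 8, ..."""
    state = {"pc": 0, "regs": [0] * 32, "mem": {}}
    cycles = 0
    while state["pc"] // 4 < len(program):
        cycles += execute(state, program[state["pc"] // 4])
    return state, cycles

program = [
    ("addi", 0, 0, 1, 40),    # R1 <- R0 + 40
    ("addi", 0, 0, 2, 2),     # R2 <- R0 + 2
    ("add",  1, 2, 3, 0),     # R3 <- R1 + R2
    ("sw",   0, 3, 0, 100),   # Mem[100 + R0] <- R3
    ("lw",   0, 0, 4, 100),   # R4 <- Mem[100 + R0]
    ("beqz", 0, 0, 0, 8),     # R0 == 0, so PC <- NPC + 8
]
state, cycles = run(program)
```

Every instruction is charged five cycles here, matching the one-cycle-per-stage assumption on the Instruction Execution slide.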

Datapath
[Figure: non-pipelined DLX datapath, divided into Instruction Fetch; Instr. Decode / Reg. Fetch; Execute / Addr. Calc; Memory Access; Write Back. Labeled elements: PC, Next PC, Next SEQ PC, adder (+4), NPC, Instruction Memory, IR, Reg. File (RS1, RS2, RD), A, B, Sign Extend, Imm, Zero?, ALU, ALUoutput, Data Memory, LMD, multiplexers, WB Data]

Sequential Execution
[Figure: timing diagram, instructions Ii, Ii+1, Ii+2 versus time in clocks; each instruction passes through IF, ID, Ex, Mem, WB before the next one starts]
• Sequential execution for these 3 instructions (Ii, Ii+1, Ii+2) takes 15 clock cycles

Pipelined Execution
[Figure: timing diagram, instructions Ii to Ii+4 versus time in clocks; each instruction enters IF one clock after the previous one and passes through the pipe stages (segments) IF, ID, Ex, Mem, WB]
• Analogy with an automobile assembly line
  • many steps, each contributing something to the construction of the car
  • each step operates in parallel with the other steps, though on a different car
• Pipelined execution for instructions Ii, Ii+1, and Ii+2 takes 7 clock cycles

Pipelining Lessons
[Figure: pipelined timing diagram for instructions Ii, Ii+1, Ii+2 (IF, ID, Ex, Mem, WB) versus time in clocks]
• Pipelining does not help the latency of a single instruction; it helps the throughput of the entire workload
• Multiple instructions operate simultaneously using different resources
• Potential speedup = number of pipe stages
• Time to “fill” the pipeline and time to “drain” it reduce speedup: 2.14X (15 cycles vs. 7 cycles) instead of 5X in this example
Latency & Throughput:
• Latency: how long it takes to execute an instruction
• Throughput: how often an instruction exits the pipeline
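The fill and drain effect can be checked with a short calculation. The sketch below assumes an ideal pipeline with no stalls and one clock per stage; the function names are hypothetical.

```python
def sequential_cycles(n, stages=5):
    """Each instruction finishes all stages before the next one starts."""
    return n * stages

def pipelined_cycles(n, stages=5):
    """The first instruction takes `stages` cycles; each later one adds a single cycle."""
    return stages + (n - 1)

def speedup(n, stages=5):
    """Ratio of sequential to pipelined cycle counts for n instructions."""
    return sequential_cycles(n, stages) / pipelined_cycles(n, stages)
```

For the 3-instruction example this gives 15 cycles against 7, a speedup of a little over 2X, while for a very long instruction stream the ratio approaches the number of stages.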

Pipelining Lessons (cont’d)
[Figure: pipelined timing diagram for instructions Ii, Ii+1, Ii+2 (IF, ID, Ex, Mem, WB) versus time in clocks]
• Pipeline stages are hooked together => all stages must be ready to proceed at the same time
• Machine cycle: the time required to move an instruction one step down the pipeline (usually one clock cycle)
• The length of a machine cycle is determined by the time required for the slowest stage
• Unbalanced lengths of pipe stages also reduce speedup
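The effect of unbalanced stages can be illustrated numerically. The sketch below assumes a long instruction stream (fill and drain time ignored) and no latch overhead; the stage delays are made-up numbers for illustration, and `steady_state_speedup` is a hypothetical name.

```python
def steady_state_speedup(stage_delays):
    """Unpipelined time per instruction divided by the pipelined machine cycle."""
    unpipelined = sum(stage_delays)   # one instruction passes through every stage in turn
    machine_cycle = max(stage_delays) # the slowest stage sets the cycle length
    return unpipelined / machine_cycle

balanced = steady_state_speedup([10, 10, 10, 10, 10])     # 5 equal stages
unbalanced = steady_state_speedup([5, 10, 15, 10, 10])    # same total delay, one slow stage
```

Both pipelines have the same total delay of 50 time units, but the unbalanced one is limited to a 15-unit machine cycle, so its speedup drops from 5 to about 3.3.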

Visualizing Pipeline
[Figure: instructions in program order (vertical axis) versus time in clock cycles CC 1 to CC 7 (horizontal axis); each instruction uses IM, Reg, ALU, DM, Reg in successive cycles]

Pipeline Datapath
[Figure: pipelined DLX datapath with pipeline registers IF/ID, ID/EX, EX/MEM, MEM/WB between the stages Instruction Fetch; Instr. Decode / Reg. Fetch; Execute / Addr. Calc; Memory Access; Write Back. Labeled elements: PC, Next PC, Next SEQ PC, adder (+4), Instruction Memory, IR, Reg. File (read addresses IR6..10 and IR11..15; write address MEM/WB.IR11..15 or MEM/WB.IR16..20), Sign Extend, Imm, Zero?, ALU, Data Memory, multiplexers, WB Data]

Instruction Flow through Pipeline Regs
[Figure: contents of the pipeline stages (IM, Reg, ALU, DM, Reg) over clock cycles CC 1 to CC 4 for the sequence Add R1,R2,R3; Lw R4,0(R2); Sub R6,R5,R7; Xor R9,R8,R1]
• CC 1: Add R1,R2,R3 is fetched; the later stages hold Nop
• CC 2: Lw R4,0(R2) is fetched; Add moves one stage down
• CC 3: Sub R6,R5,R7 is fetched; Lw and Add each move one stage down
• CC 4: Xor R9,R8,R1 is fetched; Sub, Lw, and Add each move one stage down
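The stage occupancy in this figure can be reproduced with a few lines of Python. This is an idealized sketch that assumes one instruction enters the pipeline per clock and that no stalls occur (hazards are ignored); `occupancy` is a hypothetical helper name.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]

def occupancy(instructions, cycles):
    """For each clock cycle, map every stage to the instruction it holds (or Nop)."""
    table = []
    for cc in range(1, cycles + 1):
        row = {}
        for depth, stage in enumerate(STAGES):
            index = cc - 1 - depth     # instruction that entered IF `depth` cycles ago
            in_range = 0 <= index < len(instructions)
            row[stage] = instructions[index] if in_range else "Nop"
        table.append(row)
    return table

program = ["Add R1,R2,R3", "Lw R4,0(R2)", "Sub R6,R5,R7", "Xor R9,R8,R1"]
table = occupancy(program, 4)
for cc, row in enumerate(table, start=1):
    print(f"CC {cc}:", " | ".join(f"{stage}={row[stage]}" for stage in STAGES))
```

Each printed row lists the five stages from IF to WB, so the Add instruction can be seen moving one stage to the right on every clock.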

DLX Pipeline Definition: IF, ID
• Stage IF
  • IF/ID.IR <- Mem[PC];
  • if EX/MEM.cond {IF/ID.NPC, PC <- EX/MEM.ALUOUT} else {IF/ID.NPC, PC <- PC + 4};
• Stage ID
  • ID/EX.A <- Regs[IF/ID.IR6..10]; ID/EX.B <- Regs[IF/ID.IR11..15];
  • ID/EX.Imm <- (IF/ID.IR16)^16 ## IF/ID.IR16..31;
  • ID/EX.NPC <- IF/ID.NPC; ID/EX.IR <- IF/ID.IR;

DLX Pipeline Definition: EX
• ALU
  • EX/MEM.IR <- ID/EX.IR;
  • EX/MEM.ALUOUT <- ID/EX.A func ID/EX.B; or EX/MEM.ALUOUT <- ID/EX.A func ID/EX.Imm;
  • EX/MEM.cond <- 0;
• load/store
  • EX/MEM.IR <- ID/EX.IR; EX/MEM.B <- ID/EX.B;
  • EX/MEM.ALUOUT <- ID/EX.A + ID/EX.Imm;
  • EX/MEM.cond <- 0;
• branch
  • EX/MEM.ALUOUT <- ID/EX.NPC + ID/EX.Imm;
  • EX/MEM.cond <- (ID/EX.A func 0);

DLX Pipeline Definition: MEM, WB
• Stage MEM
  • ALU
    • MEM/WB.IR <- EX/MEM.IR;
    • MEM/WB.ALUOUT <- EX/MEM.ALUOUT;
  • load/store
    • MEM/WB.IR <- EX/MEM.IR;
    • MEM/WB.LMD <- Mem[EX/MEM.ALUOUT]; or Mem[EX/MEM.ALUOUT] <- EX/MEM.B;
• Stage WB
  • ALU
    • Regs[MEM/WB.IR16..20] <- MEM/WB.ALUOUT; or Regs[MEM/WB.IR11..15] <- MEM/WB.ALUOUT;
  • load
    • Regs[MEM/WB.IR11..15] <- MEM/WB.LMD;
