Processor architectures

Processor architectures SYSC5603 (ELG6163) Digital Signal Processing Microprocessors, Software and Applications Miodrag Bolic

Overview
• Introduction
• Basic structure of a processor
• Basic operations
• Pipelining
• Registers
• Example design of an application-specific processor
• General purpose processors
• Example of FIR on a general purpose processor
• Datapath of a MIPS processor

What is “Computer Architecture”?
• Coordination of many levels of abstraction
• Under a rapidly changing set of forces
[Figure: levels of abstraction, from top to bottom: Application; Operating System; Compiler; Firmware; Instruction Set Architecture; Instruction Set Processor and I/O system; Datapath & Control; Digital Design; Circuit Design; Layout]

Levels of abstraction • Delving into the depths reveals more information • An abstraction omits unneeded detail, helps us cope with complexity

Basic Structure of a Computer References[Patterson04]

Basic Structure of a Computer (2)
• Input Unit
  • Keyboards, joysticks, trackballs, microphones and mice
• Output Unit
  • Printers and graphic displays
• Memory Unit
  • Primary (cache, RAM) and secondary (HDD, CD-ROM, tape drives)
• Arithmetic and Logic Unit (ALU)
  • Operations are executed here, and results are held in fast-access registers
• Control Unit (CU)
  • Provides control to all other units, including timing signals
References[Patterson04]

Basic Operation of a Computer • The computer accepts information in the form of programs and data through an input unit and stores it in memory • The information stored in memory is fetched, under program control, and processed in an ALU • The processed information leaves the computer through an output unit • All activities inside the computer are directed by the control unit References[Patterson04]

Detailed Instruction Cycle Copied from References[Patterson04]

Detailed Instruction Cycle (2)
• Instruction address calculation
  • Determines the address of the next instruction to be executed
• Instruction fetch
  • Reads the instruction from its memory location into the processor
• Instruction operation decoding
  • Analyzes the instruction to determine the type of operation to be performed and the operand(s) to be used
• Operand address calculation
  • Determines the address of the operand (if needed)
• Operand fetch
  • Fetches the operand from memory or reads it from I/O
• Data operation
  • Performs the operation indicated in the instruction
• Operand store
  • Writes the result into memory or out to I/O
References[Patterson04]

Fast, Pipelined Instruction Interpretation
[Figure: five overlapped instructions plotted against time. Each passes through the stages Next Instruction (NI), Instruction Fetch (IF), Decode & Operand Fetch (D), Execute (E) and Store Results (W). The values handed from stage to stage are the Instruction Address, Instruction Register, Operand Registers, Result Registers, and Registers or Mem.]
Copied from References[Culler-Slides]

Visualizing Pipelining
[Figure: four instructions in program order (vertical axis) overlapped across clock cycles 1 to 7 (horizontal axis). Each instruction passes through Ifetch, Reg, ALU, DMem, Reg.]
Copied from References[Culler-Slides]

Terminology
• Performance - Time
  • MIPS
  • MFLOPS
  • Cycles per Instruction (CPI)
• Architectures
  • RISC – Reduced Instruction Set Computer
  • CISC – Complex Instruction Set Computer
  • Scalar
  • Superscalar
  • Very-long instruction word (VLIW)
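The performance terms above are tied together by two standard relations: execution time = instruction count × CPI / clock frequency, and MIPS rating = clock frequency / (CPI × 10^6). The sketch below only illustrates that arithmetic; the function names are hypothetical and are not part of the course material.

```c
#include <stdint.h>

/* Execution time in seconds: instruction count * CPI / clock frequency (Hz). */
double exec_time_s(uint64_t instr_count, double cpi, double clock_hz) {
    return (double)instr_count * cpi / clock_hz;
}

/* MIPS rating: millions of instructions executed per second. */
double mips_rating(double cpi, double clock_hz) {
    return clock_hz / (cpi * 1e6);
}
```

For instance, a 100 MHz processor with CPI = 2 executes 50 million instructions per second, so a program of 200 million instructions would take about 4 seconds.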

Comparison: CISC, RISC, VLIW Copied from [Philips]

Sequential application specific processor • A processor tuned only for a particular application • Can be used for low-power implementations • Word lengths can be adjusted to the current problem. • Example: FIR filter

Direct form FIR filter Copied from [Wanhammer99]

Transposed FIR Copied from [Wanhammer99]

Assignment
• Design an N-tap transposed linear-phase FIR filter as a sequential application-specific processor. Use only one multiplier and show how the processing time can be halved.
• Hint: design a transposed FIR filter structure as in the previous slide, but allow for generating the partial sums in reversed order PS_{N-1}, PS_{N-2}, …, PS_1, y(n).
Copied from [Wanhammer99]

General purpose processor architecture • FIR example • We will study RISC architectures • Single-cycle processor • Implementation of add and load instructions • Pipelined implementation • Why do all instructions have the same number of cycles

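Example: Digital Filtering
• The basic FIR filter equation is
  $$y[n] = \sum_{k=0}^{N-1} h[k]\, x[n-k]$$
  where h[k] is an array of constants.
• In C language:
    for (n = 0; n < N; n++) {
        y[n] = 0;                           /* clear the accumulator for every output sample */
        for (k = 0; k < N && k <= n; k++)   /* inner loop; samples before x[0] are taken as zero */
            y[n] = y[n] + h[k] * x[n - k];
    }
• Only Multiply and Accumulate (MAC) is needed!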
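A self-contained version of the FIR loop can make the MAC structure easier to see. This is a minimal sketch under stated assumptions: integer samples, a separate tap count `taps` and signal length `len` (the slide uses N for both), and input samples before x[0] treated as zero. The function names are illustrative only.

```c
#include <stddef.h>

/* One output sample: y[n] = sum over k of h[k] * x[n-k].
 * Assumption: x[m] = 0 for m < 0, so the loop stops at k = n. */
int fir_sample(const int *h, size_t taps, const int *x, size_t n) {
    int acc = 0;
    for (size_t k = 0; k < taps && k <= n; k++) {
        acc += h[k] * x[n - k];   /* one multiply-accumulate (MAC) per tap */
    }
    return acc;
}

/* Filter a block of len input samples into y. */
void fir_filter(const int *h, size_t taps, const int *x, int *y, size_t len) {
    for (size_t n = 0; n < len; n++) {
        y[n] = fir_sample(h, taps, x, n);
    }
}
```

Feeding a unit impulse through the filter returns the coefficients h[k] in order, which is a quick sanity check for any FIR implementation.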

MAC using General Purpose Processor (GPP)
[Figure: MAC datapath with registers R0, R1 and R2 and a multiplier (X)]

The MIPS Instruction Formats
• All MIPS instructions are 32 bits long. The three instruction formats are:
  R-type | op [31:26], 6 bits | rs [25:21], 5 bits | rt [20:16], 5 bits | rd [15:11], 5 bits | shamt [10:6], 5 bits | funct [5:0], 6 bits
  I-type | op [31:26], 6 bits | rs [25:21], 5 bits | rt [20:16], 5 bits | immediate [15:0], 16 bits
  J-type | op [31:26], 6 bits | target address [25:0], 26 bits
• The different fields are:
  • op: operation of the instruction
  • rs, rt, rd: the source and destination registers
  • shamt: shift amount
  • funct: selects the variant of the operation in the “op” field
  • address / immediate: address offset or immediate value
  • target address: target address of the jump instruction
Copied from References[Shulte-Slides]

Translating MIPS Assembly into Machine Language
• Humans see instructions as words (assembly language), but the computer sees them as ones and zeros (machine language).
• An assembler translates from assembly language to machine language.
• For example, the MIPS instruction add $t0, $s1, $s2 is translated as follows:
  Assembly | Comment
  add      | op = 0, shamt = 0, funct = 32
  $t0      | rd = 8
  $s1      | rs = 17
  $s2      | rt = 18
  Machine code:
  op     | rs    | rt    | rd    | shamt | funct
  000000 | 10001 | 10010 | 01000 | 00000 | 100000
Copied from References[Shulte-Slides]
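The field packing an assembler performs for an R-type instruction can be sketched with shifts and masks. The function name `encode_rtype` is hypothetical; the field positions follow the R-type layout (op, rs, rt, rd, shamt, funct with widths 6, 5, 5, 5, 5, 6 bits).

```c
#include <stdint.h>

/* Pack the six R-type fields into one 32-bit instruction word.
 * Each field is masked to its width before being shifted into place. */
uint32_t encode_rtype(uint32_t op, uint32_t rs, uint32_t rt,
                      uint32_t rd, uint32_t shamt, uint32_t funct) {
    return ((op    & 0x3Fu) << 26) |
           ((rs    & 0x1Fu) << 21) |
           ((rt    & 0x1Fu) << 16) |
           ((rd    & 0x1Fu) << 11) |
           ((shamt & 0x1Fu) << 6)  |
            (funct & 0x3Fu);
}

/* Extract the rd field (bits 15..11) from an instruction word. */
uint32_t decode_rd(uint32_t instr) {
    return (instr >> 11) & 0x1Fu;
}
```

With the values from the slide, `encode_rtype(0, 17, 18, 8, 0, 32)` yields 0x02324020, which is the bit pattern 000000 10001 10010 01000 00000 100000.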

MIPS Addressing Modes/Instruction Formats
• All MIPS instructions are 32 bits wide (fixed length)
• Register (direct): add $s1, $s2, $s3
  • Fields: op, rs, rt, rd; the operand is in a register
• Immediate: addi $s1, $s2, 200
  • Fields: op, rs, rt, immed; the operand is the immed field itself
• Base+index: lw $s1, 200($s2)
  • Fields: op, rs, rt, immed; memory address = register + immed
• PC-relative: beq $s1, $s2, 200
  • Fields: op, rs, rt, immed; memory address = PC + immed
Copied from References[Shulte-Slides]
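The two memory-addressing modes above differ only in what the 16-bit immediate is added to. The sketch below assumes the usual MIPS conventions: the immediate is sign-extended, and for branches it is a word offset relative to PC + 4 (so the encoded field is not the literal byte distance written in assembly). Function names are illustrative only.

```c
#include <stdint.h>

/* Sign-extend a 16-bit immediate to 32 bits without relying on
 * implementation-defined narrowing conversions. */
int32_t sign_extend16(uint16_t imm) {
    return (imm & 0x8000u) ? (int32_t)imm - 0x10000 : (int32_t)imm;
}

/* Base addressing (lw/sw): address = R[rs] + sign-extended immediate. */
uint32_t base_address(uint32_t rs_value, uint16_t imm) {
    return rs_value + (uint32_t)sign_extend16(imm);
}

/* PC-relative addressing (beq): target = (PC + 4) + (offset in words * 4). */
uint32_t branch_target(uint32_t pc, uint16_t imm) {
    return pc + 4u + ((uint32_t)sign_extend16(imm) << 2);
}
```

For lw $s1, 200($s2) with $s2 = 0x1000, the effective address is 0x10C8. A branch with an encoded offset of -1 (0xFFFF) targets the branch instruction itself.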

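Architecture of the MIPS core
[Figure: single-cycle datapath. The PC, clocked by Clk, supplies the instruction address to the Instruction Memory. The 32-bit instruction provides the 5-bit Rd, Rt and Rs fields and the 16-bit Imm field. The register file (32 32-bit registers, clocked by Clk) has the ports Rw, Ra and Rb. The Data Memory (clocked by Clk) has a data address input and 32-bit Data in and Data out ports.]
Copied from [Meerbergen-Slides]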

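Example 1 : R-type : add instruction
  op [31:26], 6 bits | rs [25:21], 5 bits | rt [20:16], 5 bits | rd [15:11], 5 bits | shamt [10:6], 5 bits | funct [5:0], 6 bits
• add rd, rs, rt
  • mem[PC] (fetch the instruction from memory)
  • R[rd] = R[rs] + R[rt]
  • PC = PC + 4
[Figure: register file of 32 32-bit registers with 5-bit address ports Rw, Ra, Rb driven by Rd, Rs, Rt, and write enable RegWr. BusA and BusB (32 bits each) feed the ALU, controlled by ALUctr; the 32-bit Result returns on Bus W. The register file is clocked by Clk.]
Copied from [Meerbergen-Slides]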
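The register-transfer steps for add can be mimicked in software. This is a minimal behavioural sketch, not the datapath from the figure: the `Cpu` struct and `exec_add` are hypothetical names, instruction fetch is omitted, and register 0 is kept at zero as in MIPS.

```c
#include <stdint.h>

/* Architectural state touched by an R-type add. */
typedef struct {
    uint32_t pc;
    uint32_t reg[32];
} Cpu;

/* add rd, rs, rt:  R[rd] = R[rs] + R[rt];  PC = PC + 4 */
void exec_add(Cpu *cpu, unsigned rd, unsigned rs, unsigned rt) {
    uint32_t result = cpu->reg[rs & 31u] + cpu->reg[rt & 31u]; /* BusA + BusB in the ALU */
    if ((rd & 31u) != 0) {
        cpu->reg[rd & 31u] = result;   /* write back over Bus W; $zero stays 0 */
    }
    cpu->pc += 4;                      /* next sequential instruction */
}
```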

Critical path R-type operation
[Figure: the MIPS core datapath (PC, Instruction Memory, register file with ports Rw, Ra, Rb, Data Memory), shown for the critical path of an R-type operation]
Copied from [Meerbergen-Slides]

Critical path R-type operation
[Timing diagram: after the clock edge, each signal changes from its old value to its new value once the listed delay has elapsed]
• PC: Clock-to-Q
• Rs, Rt, Rd, op, funct: instruction memory access time
• Bus A, B: RFile access time
• Bus W: ALU delay
• Write into RFile: set-up + skew
Copied from [Meerbergen-Slides]

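Example 2 : I-type : load word
  op [31:26], 6 bits | rs [25:21], 5 bits | rt [20:16], 5 bits | immediate [15:0], 16 bits
• lw rt, imm16(rs)
  • mem[PC] (fetch the instruction from memory)
  • addr = R[rs] + ext[imm16]
  • R[rt] = mem[addr]
  • PC = PC + 4
[Figure: datapath for lw. RegDst selects between Rd and Rt as the write register Rw (Rt for lw). The Extender, controlled by ExtOp, widens Imm16 to 32 bits, and ALUSrc selects it instead of BusB as the second ALU input. The ALU result, under ALUctr, is the address (Adr) of the Data Memory, which also has Data In, WrEn (MemWr) and Clk inputs. MemtoReg selects the memory output onto Bus W, which is written into the register file when RegWr is asserted.]
Copied from [Meerbergen-Slides]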
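The lw steps can also be written as a small behavioural model. This is a sketch under stated assumptions: a tiny word-addressed data memory of `MEM_WORDS` words, a sign-extending ext[], and a return code for misaligned or out-of-range addresses. The `Machine` struct and `exec_lw` are hypothetical names.

```c
#include <stdint.h>

#define MEM_WORDS 64   /* hypothetical data memory size, in 32-bit words */

typedef struct {
    uint32_t pc;
    uint32_t reg[32];
    uint32_t dmem[MEM_WORDS];
} Machine;

/* lw rt, imm16(rs):  addr = R[rs] + ext[imm16];  R[rt] = mem[addr];  PC = PC + 4
 * Returns 0 on success, -1 if addr is misaligned or outside dmem. */
int exec_lw(Machine *m, unsigned rt, unsigned rs, uint16_t imm16) {
    int32_t offset = (imm16 & 0x8000u) ? (int32_t)imm16 - 0x10000
                                       : (int32_t)imm16;      /* sign extension */
    uint32_t addr = m->reg[rs & 31u] + (uint32_t)offset;       /* ALU computes the address */
    if ((addr & 3u) != 0 || addr / 4u >= MEM_WORDS) {
        return -1;
    }
    if ((rt & 31u) != 0) {
        m->reg[rt & 31u] = m->dmem[addr / 4u];                 /* memory output to Bus W */
    }
    m->pc += 4;
    return 0;
}
```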

Critical path load operation
[Timing diagram: after the clock edge, each signal changes from its old value to its new value once the listed delay has elapsed]
• PC: Clock-to-Q
• Rs, Rt, Rd, op, funct: instruction memory access time
• Bus A, B: RFile access time
• Address: ALU delay
• Bus W: Mem access time
• Write into RFile: set-up + skew
Copied from [Meerbergen-Slides]
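The two timing diagrams amount to summing the delays along each path; the load path contains every R-type delay plus the data memory access. The sketch below makes that comparison explicit. The `Delays` struct and any numbers fed into it are hypothetical and are not taken from the slides.

```c
/* Component delays along the datapath, all in the same unit (e.g. ps). */
typedef struct {
    int clk_to_q;     /* PC clock-to-Q */
    int imem;         /* instruction memory access time */
    int rfile;        /* register file access time */
    int alu;          /* ALU delay */
    int dmem;         /* data memory access time */
    int setup_skew;   /* register file set-up time plus clock skew */
} Delays;

/* R-type: PC -> instruction memory -> RFile read -> ALU -> RFile write. */
int rtype_path(const Delays *d) {
    return d->clk_to_q + d->imem + d->rfile + d->alu + d->setup_skew;
}

/* Load: same as R-type, with a data memory access after the ALU. */
int load_path(const Delays *d) {
    return rtype_path(d) + d->dmem;
}

/* A single-cycle clock period must cover the slowest instruction. */
int single_cycle_period(const Delays *d) {
    int r = rtype_path(d);
    int l = load_path(d);
    return (l > r) ? l : r;
}
```

With illustrative delays of 50, 200, 100, 200, 200 and 50 units, the R-type path is 600 units and the load path is 800 units, so the load sets the clock period.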

Architecture of the MIPS core
• Problem: long critical path
  • defined by the slowest instruction (load)
• Solution: pipelining
  • break the instruction into smaller steps
  • all steps have about the same critical path
• E.g. load, 5 stages:
  cycle 1 | cycle 2 | cycle 3 | cycle 4 | cycle 5
  Ifetch  | RF read | ALU     | dmem    | RF write
Copied from [Meerbergen-Slides]

Pipelining lw instructions [Hennessy&Patterson]
  Instr | cycle 1 | cycle 2 | cycle 3 | cycle 4 | cycle 5  | cycle 6  | cycle 7
  lw    | Ifetch  | RF read | ALU     | dmem    | RF write |          |
  lw    |         | Ifetch  | RF read | ALU     | dmem     | RF write |
  lw    |         |         | Ifetch  | RF read | ALU      | dmem     | RF write
• One instruction enters the pipeline every clock cycle
• One instruction leaves the pipeline every clock cycle
• => CPI = 1 (Cycles per Instruction)
Copied from [Meerbergen-Slides]
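The CPI = 1 claim holds in steady state: with k stages and no stalls, n instructions need k + n - 1 cycles, so the fill time of k - 1 cycles becomes negligible as n grows. A minimal sketch of that count, with hypothetical function names and an assumed stall-free pipeline:

```c
/* Cycles needed to run n instructions through a pipeline with the given
 * number of stages, assuming one instruction issues per cycle and no stalls. */
unsigned long pipelined_cycles(unsigned long n, unsigned long stages) {
    return (n == 0) ? 0 : stages + n - 1;
}

/* Cycles needed when each instruction runs all stages before the next starts. */
unsigned long sequential_cycles(unsigned long n, unsigned long stages) {
    return n * stages;
}
```

Three lw instructions in a 5-stage pipeline finish in 7 cycles, matching the diagram; 1000 instructions finish in 1004 cycles, an average of about 1.004 cycles per instruction.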

Pipelining lw instructions
[Figure: five overlapped instructions, each passing through the stages I, R, A, M, W (Ifetch, RF read, ALU, dmem, RF write). Labels: Instructions, Data, Current CPU cycle.]
Copied from [Meerbergen-Slides]

4 stages of R-type instruction
• E.g. ADD:
  cycle 1 | cycle 2 | cycle 3 | cycle 4
  Ifetch  | RF read | ALU     | RF write
Copied from [Meerbergen-Slides]

Pipelining lw and R-type instructions [Hennessy&Patterson]
  Instr | cycle 1 | cycle 2 | cycle 3 | cycle 4 | cycle 5
  lw    | Ifetch  | RF read | ALU     | dmem    | RF write
  add   |         | Ifetch  | RF read | ALU     | RF write
• Resource conflict on the write port of the Rfile
Copied from [Meerbergen-Slides]

Solution: stretch R-type to 5 stages
  Instr | cycle 1 | cycle 2 | cycle 3 | cycle 4 | cycle 5  | cycle 6  | cycle 7
  lw    | Ifetch  | RF read | ALU     | dmem    | RF write |          |
  add   |         | Ifetch  | RF read | ALU     | dmem     | RF write |
  add   |         |         | Ifetch  | RF read | ALU      | dmem     | RF write
• For R-type instructions the dmem stage is a dummy op (noop)
Copied from [Meerbergen-Slides]
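The conflict and its fix come down to which cycle each instruction reaches its RF write stage. The sketch below assumes RF write is always the last stage and that instructions issue in the given cycles without stalls; the function names are hypothetical.

```c
/* Cycle (1-based) in which an instruction issued in issue_cycle performs
 * its register file write, assuming RF write is its last stage. */
unsigned writeback_cycle(unsigned issue_cycle, unsigned stages) {
    return issue_cycle + stages - 1;
}

/* Returns 1 if two instructions need the single RF write port in the same cycle. */
int write_port_conflict(unsigned issue_a, unsigned stages_a,
                        unsigned issue_b, unsigned stages_b) {
    return writeback_cycle(issue_a, stages_a) == writeback_cycle(issue_b, stages_b);
}
```

A 5-stage lw issued in cycle 1 and a 4-stage add issued in cycle 2 both write in cycle 5. Once add is stretched to 5 stages it writes in cycle 6, and any two instructions issued in different cycles write in different cycles.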

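[Figure: pipelined datapath [Hennessy&Patterson] with the stages Ifetch, Reg/dec, exec, mem and wr. Blocks: Next PC (+4, branch, flags), Prog mem, Rfile (Ra, Rb, Rw, Di, BusA, BusB; addressed by Rs, Rt, Rd), ext. (Imm16), Data mem (adr, Din, Dout). Control signals: RegWr, RegDst, ALUSrc, ExtOp, ALUop, MemWr, MemtoReg.]
Copied from [Meerbergen-Slides]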

Data dependencies : R-type instructions [Hennessy&Patterson]
[Figure: five overlapped instructions, each passing through IM, RF, DM and RF. The first instruction writes R1 (R1 = ...); each of the following four instructions reads R1 (… = R1 + ...).]
Copied from [Meerbergen-Slides]

References
[Culler-Slides] D. E. Culler, Computer Architecture, Lecture slides, Computer Science at Berkeley.
[Hamacher01] C. Hamacher, Z. Vranesic, S. Zaky, Computer Organization, McGraw-Hill Science/Engineering/Math; 5th edition, August 2, 2001.
[Patterson04] D. A. Patterson, J. L. Hennessy, Computer Organization and Design: The Hardware/Software Interface, Morgan Kaufmann; 3rd edition, August 2, 2004.
[Shulte-Slides] M. Schulte, Computer Architecture ECE 201, Lecture slides.
The other references can be found at: www.site.uottawa.ca/~mbolic/elg6131/References.htm
