EENG 449bG/CPSC 439bG Computer Systems Lecture 2 Instruction Set Architectures

EENG 449bG/CPSC 439bG Computer Systems Lecture 2 Instruction Set Architectures January 13, 2005 Prof. Andreas Savvides Spring 2005

Announcements • Class syllabus changes • Instruction Set Architectures Review • Pipelining Review • Comparison of microprocessor families • Multicore chips survey • Simulators • Embedded OS and Real-time systems • Battery technologies • Low power considerations • Tools for software synthesis on embedded processors • Processors in networks • Verification topics • Reading material for this lecture, Chapter 2 of Patterson and Hennessy

The Instruction Set: a Critical Interface • The instruction set is what the programmer sees of the architecture • [Figure: the instruction set as the interface layer between software and hardware]

Instruction Set Aspects (Sections 2.1 – 2.12 of text) • Classes of Instruction Set Architectures • Memory Addressing Issues • Addressing Modes • Instruction Operators • Instruction Set Encoding • Role of Compilers

Organization (Logic Designer's View) • [Figure labels: ISA Level; FUs & Interconnect] • Capabilities & performance characteristics of the principal functional units (FUs) • e.g., registers, ALU, shifters, logic units, ... • Ways in which these components are interconnected • Information flows between components • Logic and means by which such information flow is controlled • Choreography of FUs to realize the ISA • Register Transfer Level (RTL) description

Memory-Memory ISA Classes • Stack: Push A; Push B; Add; Pop C • No registers => less hardware, but slow and cannot be pipelined • Accumulator: one operand is implicitly the accumulator, a special-purpose register (e.g. the 6502 microprocessor): Load A; Add B; Store C • No general-purpose registers => less hardware, but high memory traffic makes memory a bottleneck

Register-Based ISA Classes • Register-Memory: Load R1, A; Add R1, B; Store C, R1 • Load-Store: Load R1, A; Load R2, B; Add R3, R1, R2; Store C, R3 • Registers are faster than memory • Registers are more efficient for compilers to use • Compiler writers prefer all registers to be equivalent
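The four ISA classes can be compared side by side on the statement C = A + B. The Python sketch below is only an illustration: the function names, the dict used as memory, and the returned instruction counts are assumptions made for this example, not part of the lecture.

```python
def stack_machine(mem):
    """Stack ISA: operands are implicit on the stack."""
    stack = []
    stack.append(mem["A"])                    # Push A
    stack.append(mem["B"])                    # Push B
    stack.append(stack.pop() + stack.pop())   # Add
    mem["C"] = stack.pop()                    # Pop C
    return 4                                  # instructions executed


def accumulator_machine(mem):
    """Accumulator ISA: one operand is implicitly the accumulator."""
    acc = mem["A"]          # Load A
    acc = acc + mem["B"]    # Add B
    mem["C"] = acc          # Store C
    return 3


def register_memory_machine(mem):
    """Register-memory ISA: an ALU operand may come straight from memory."""
    r1 = mem["A"]           # Load R1, A
    r1 = r1 + mem["B"]      # Add R1, B
    mem["C"] = r1           # Store C, R1
    return 3


def load_store_machine(mem):
    """Load-store ISA: ALU instructions only touch registers."""
    r1 = mem["A"]           # Load R1, A
    r2 = mem["B"]           # Load R2, B
    r3 = r1 + r2            # Add R3, R1, R2
    mem["C"] = r3           # Store C, R3
    return 4


def run_all(a, b):
    """Run C = A + B on every toy machine; return {name: (C, instruction count)}."""
    results = {}
    for machine in (stack_machine, accumulator_machine,
                    register_memory_machine, load_store_machine):
        mem = {"A": a, "B": b, "C": None}
        count = machine(mem)
        results[machine.__name__] = (mem["C"], count)
    return results


for name, (value, count) in run_all(2, 3).items():
    print(f"{name}: C = {value} in {count} instructions")
```

All four sequences compute the same result; they differ in how operands are named and in how many instructions are needed.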

Instruction Sets in Modern Computers • Most modern computers have 16 or more general purpose registers • Specialized processors such as DSP processors still have several special purpose registers • New ISAs are register based, BUT there are still tradeoffs • The ideal number of operands depends on the type of processor and application • Fewer choices among instruction formats make the job easier for compiler writers

Memory Interpretation: Endianness • Byte ordering becomes an issue when exchanging data between computers • Little Endian – the least significant byte is stored at the lowest address; bytes of a double word are numbered 7 6 5 4 3 2 1 0 • Intel 80x86, ARM, some 8-bit microcontrollers (e.g. Atmel’s AVR series) • Big Endian – the most significant byte is stored at the lowest address; bytes of a double word are numbered 0 1 2 3 4 5 6 7 • IBM 370, Motorola 68K, MIPS, Sparc, HP
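To make the byte-ordering issue concrete, here is a small Python sketch using the standard `struct` module. The helper name `byte_layout` and the sample value 0x0A0B0C0D are chosen only for illustration.

```python
import struct


def byte_layout(value, byteorder):
    """Return the bytes of a 32-bit unsigned value, lowest address first."""
    fmt = "<I" if byteorder == "little" else ">I"
    return struct.pack(fmt, value)


little = byte_layout(0x0A0B0C0D, "little")
big = byte_layout(0x0A0B0C0D, "big")
print(little.hex())  # 0d0c0b0a : least significant byte at the lowest address
print(big.hex())     # 0a0b0c0d : most significant byte at the lowest address

# Reading big-endian bytes on a little-endian machine without conversion
# yields a different number: this is the data-exchange problem.
misread = struct.unpack("<I", big)[0]
```

The bytes exchanged are identical in both cases; only the interpretation differs, which is why network protocols and file formats fix a byte order.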

Memory Interpretation: Alignment • Objects larger than 1 byte must be aligned • Misaligned memory accesses take multiple aligned memory references • What do we mean by “aligned”? • An object of size s bytes at byte address A is aligned if A mod s = 0
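The alignment rule A mod s = 0 translates directly into code. The sketch below also counts how many aligned references a given access needs, assuming (purely for illustration) that memory is read in aligned 8-byte blocks; both function names are hypothetical.

```python
def is_aligned(address, size):
    """An object of `size` bytes at byte `address` is aligned if address mod size == 0."""
    return address % size == 0


def aligned_references(address, size, block=8):
    """Count the aligned `block`-byte memory references needed to read the object.

    Assumes memory is accessed in aligned blocks of `block` bytes (an
    assumption of this sketch, not a statement about any specific machine).
    """
    first_block = address // block
    last_block = (address + size - 1) // block
    return last_block - first_block + 1


print(is_aligned(8, 4), aligned_references(8, 4))  # aligned word
print(is_aligned(6, 4), aligned_references(6, 4))  # misaligned word straddling two blocks
```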

Addressing Modes (mode: example; meaning)
1. Register: Add R3, R4; R3 <- R3 + R4
2. Immediate: Add R4, #3; R4 <- R4 + 3
3. Displacement: Add R4, 100(R1); R4 <- R4 + Mem[100 + R1]
4. Register indirect: Add R4, (R1); R4 <- R4 + Mem[R1]
5. Indexed: Add R3, (R1+R2); R3 <- R3 + Mem[R1 + R2]
6. Direct or absolute: Add R1, (1001); R1 <- R1 + Mem[1001]
7. Memory indirect: Add R1, @(R3); R1 <- R1 + Mem[Mem[R3]]
8. Autoincrement: Add R1, (R2)+; R1 <- R1 + Mem[R2]; R2 <- R2 + d
9. Autodecrement: Add R1, -(R2); R2 <- R2 - d; R1 <- R1 + Mem[R2]
10. Scaled: Add R1, 100(R2)[R3]; R1 <- R1 + Mem[100 + R2 + R3*d]
(d is the size of the operand in bytes)
• Modes 1-4 account for 93% of all VAX operands
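The addressing modes above differ only in how the source operand is located. A minimal Python sketch, assuming that every example is an Add that accumulates into the destination register, that registers live in a dict, and that memory is a dict from address to value (the function and parameter names are hypothetical):

```python
def fetch_operand(mode, regs, mem, a=None, b=None, disp=0, d=1):
    """Return the source operand for one addressing mode.

    a, b : register names; disp : constant; d : operand size in bytes.
    Autoincrement and autodecrement update regs[a] as a side effect.
    """
    if mode == "register":
        return regs[a]
    if mode == "immediate":
        return disp
    if mode == "displacement":
        return mem[disp + regs[a]]
    if mode == "register_indirect":
        return mem[regs[a]]
    if mode == "indexed":
        return mem[regs[a] + regs[b]]
    if mode == "direct":
        return mem[disp]
    if mode == "memory_indirect":
        return mem[mem[regs[a]]]
    if mode == "autoincrement":
        value = mem[regs[a]]
        regs[a] += d
        return value
    if mode == "autodecrement":
        regs[a] -= d
        return mem[regs[a]]
    if mode == "scaled":
        return mem[disp + regs[a] + regs[b] * d]
    raise ValueError(f"unknown addressing mode: {mode}")


def add(dest, mode, regs, mem, **fields):
    """Add dest, <operand>: regs[dest] <- regs[dest] + operand."""
    regs[dest] += fetch_operand(mode, regs, mem, **fields)


regs = {"R1": 100, "R4": 10}
mem = {100: 5, 200: 7}
add("R4", "displacement", regs, mem, a="R1", disp=100)  # Add R4, 100(R1)
print(regs["R4"])  # 17
```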

Special Addressing Modes in DSP Processors • Modulo or circular addressing • For reading buffers • Bit reverse addressing • Address 100 becomes 001 • Special feature for DSPs • Although DSP processors used to require customized programming, DSP programming is moving closer to traditional compiler-based development • Automatic code generation is also becoming a reality (e.g. DSP algorithms in MATLAB can be automatically converted to native DSP code)
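Both special modes are simple address computations that DSPs perform in hardware. The Python sketch below shows the arithmetic only; the function names are hypothetical. Bit-reversed addressing is commonly used to reorder data for the FFT.

```python
def circular_next(index, step, length):
    """Modulo (circular) addressing: advance and wrap around at the buffer end."""
    return (index + step) % length


def bit_reverse(address, bits):
    """Bit-reverse addressing: mirror the low `bits` bits of the address."""
    result = 0
    for _ in range(bits):
        result = (result << 1) | (address & 1)  # shift in the current lowest bit
        address >>= 1
    return result


# Walk an 8-entry circular buffer starting near its end.
index = 6
visited = []
for _ in range(4):
    visited.append(index)
    index = circular_next(index, 1, 8)
print(visited)  # [6, 7, 0, 1]

# The slide's example: address 100 (binary) becomes 001.
print(format(bit_reverse(0b100, 3), "03b"))  # 001
```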

Instruction Operations • Arithmetic and logical – add, subtract, and, or, multiply, divide • Data transfer – loads and stores • Control – branch, jump, procedure call, return, traps • System – OS call, virtual memory management instructions • Floating point – add, multiply, divide, compare • Decimal – add, multiply, decimal-to-character conversion • String – move, compare, search • Graphics – pixel and vector operations, compress/decompress operations

Control Flow Instructions 4 basic types • Conditional branches • Jumps • Procedure calls • Procedure returns

Instruction Encoding • How are instructions encoded in a binary representation to be executed by the processor? • Need to tell the processor the type of operation and the addressing modes • Many competing forces • Desire to have as many registers and addressing modes as possible • Register counts and addressing modes impact instruction and program size • Instruction encoding should facilitate pipelining • Instruction sizes should be multiples of bytes • Fixed-length instructions may yield better performance but result in larger average code sizes

Instruction Formats • Three styles: variable, fixed, hybrid • Addressing modes • If each operand requires an address specifier => variable format • If code size matters most => variable-length instructions • If performance matters most => fixed-length instructions • Simple decoding, predictable operations • With a load/store instruction architecture, only one memory address and few addressing modes • => simple format, address mode given by opcode

Encoding the Instruction Set

Role of Compilers • The ISA is a compiler target • Architectural decisions affect the quality of code generated by the compiler • Compiler goals, in priority order • Correctness • Speed of the compiled code • Fast compilation, debugging support & interoperability among languages

Role of Compilers

Register Allocation • Probably the most important optimization in compilers. Why? • Most register allocation algorithms are based on graph coloring • An optimization method • Construct a graph with the possible candidates for allocation • Use a limited number of colors so that no two adjacent nodes have the same color (each color represents a register) • Why is register allocation important? • You tell me… • Why do compiler writers want lots of registers? • Register allocation with methods like graph coloring works better when more registers are available • Refer to Figure 2.25 of text for a list of the major types of compiler optimizations
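The coloring idea can be sketched in a few lines of Python. This is a simplified greedy heuristic, not the allocator of any particular compiler: the interference graph, the variable names, and the choice to mark uncolorable variables as spilled (`None`) are all assumptions made for illustration.

```python
def color_registers(interference, num_registers):
    """Greedy graph coloring of an interference graph.

    interference : dict mapping each variable to the set of variables that
                   are live at the same time (the graph must be symmetric).
    Returns {variable: register index}, or None for a variable that must be
    spilled to memory because no register (color) is free.
    """
    assignment = {}
    # Color the most constrained variables (highest degree) first.
    order = sorted(interference, key=lambda v: len(interference[v]), reverse=True)
    for var in order:
        taken = {assignment[n] for n in interference[var]
                 if assignment.get(n) is not None}
        free = [r for r in range(num_registers) if r not in taken]
        assignment[var] = free[0] if free else None
    return assignment


graph = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}
with_three = color_registers(graph, 3)
with_two = color_registers(graph, 2)
print(with_three)
print(with_two)
```

With three registers every variable gets a register; with two, the triangle a-b-c cannot be colored and one variable is spilled, which illustrates why graph coloring works better when more registers are available.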

DSPs and Media Processors • Both typically used in embedded applications • Main difference from other microprocessors is real-time performance • Worst case performance vs. average performance • Infinite, continuous streams of data vs. fixed data set • A small number of key kernels is critical, often supplied by the manufacturer • Libraries are important and widely used • They include tricks to improve performance for targeted kernels that no compiler will generate

DSP Introduction • Digital Signal Processing: application of mathematical operations to digitally represented signals • Signals represented as sequences of samples • Digital signals obtained from physical signals via transducers (e.g. microphones, accelerometers, seismic sensors) and analog-to-digital converters (ADC) • Digital signals converted back to physical signals via digital-to-analog converters (DAC) • Digital Signal Processor (DSP): electronic system that processes digital signals • Interaction with the physical world is becoming one of the most exciting fields for computer engineering!!!

Common DSP Algorithms and Applications • Applications • Instrumentation and measurement • Communications • Audio and video processing • Graphics, image enhancement, 3-D rendering • Navigation, radar, GPS • Control – robotics, machine vision, guidance • Algorithms • Frequency domain filtering – FIR and IIR • Frequency – time transformations • Correlation

Some Project Ideas… • Embedded Operating Systems • Implement a small monitor/debugger for an embedded processor • Develop a power saving scheme inside an embedded OS to utilize the power saving features of the specific processor • Hardware/Software Interfacing • Experiment with high frequency analog sampling • Design & develop interfaces and/or instructions for different sensors, control external entities for mobile nodes • Algorithms & Protocols Oriented Projects • Develop a protocol for a new sensor network application

The MIPS Architecture Features: • GPRs with a load-store architecture • Displacement, immediate, and register indirect addressing modes • Data sizes: 8-, 16-, 32-, and 64-bit integers and 64-bit floating point numbers • Simple instructions: load, store, add, sub, move register-register, shift • Compare equal, compare not equal, compare less, branch, jump, call, and return • Fixed instruction encoding when performance matters, variable instruction encoding when code size matters • Provide at least 16 general purpose registers

MIPS Architecture Features • Registers: • 32 64-bit GPRs (R0, R1…R31) • Note: R0 is always 0! • 32 64-bit floating point registers (F0, F1…F31) • Data types: • 8-bit bytes, 16-bit half words, 32-bit words, 64-bit double words • 32-bit single precision and 64-bit double precision floating point numbers • Addressing modes: • Immediate, e.g. Add R4, #3: Regs[R4] <- Regs[R4] + 3 • Displacement, e.g. Add R4, 100(R1): Regs[R4] <- Regs[R4] + Mem[100 + Regs[R1]] • Register indirect (place 0 in the displacement field), e.g. Add R4, 0(R1) • Absolute addressing (use R0 as the base register), e.g. Add R4, 1000(R0)

MIPS Instruction Format • op – opcode (basic operation of the instruction) • rs – first register operand • rt – second register operand • rd – register destination operand • shamt – shift amount • funct – function • Example (I-format): lw $t0, 1200($t1) • Fields (op, rs, rt, address): 35, 9, 8, 1200 • Binary: 100011 01001 01000 0000 0100 1011 0000 • Note: The numbers for these examples are from “Computer Organization & Design”, Chapter 3

MIPS Instruction Format • op – opcode (basic operation of the instruction) • rs – first register operand • rt – second register operand • rd – register destination operand • shamt – shift amount • funct – function • Example (R-format): add $t0, $s2, $t0 • Fields (op, rs, rt, rd, shamt, funct): 0, 18, 8, 8, 0, 32 • Binary: 000000 10010 01000 01000 00000 100000 • Note: The numbers for these examples are from “Computer Organization & Design”, Chapter 3
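The lw and add examples can be checked by packing the fields into a 32-bit word. The sketch below assumes the standard MIPS field widths (op 6 bits, rs/rt/rd/shamt 5 bits each, funct 6 bits, immediate 16 bits); the helper names `encode_r` and `encode_i` are made up for this example.

```python
def encode_r(op, rs, rt, rd, shamt, funct):
    """R-format: op(6) rs(5) rt(5) rd(5) shamt(5) funct(6)."""
    return (op << 26) | (rs << 21) | (rt << 16) | (rd << 11) | (shamt << 6) | funct


def encode_i(op, rs, rt, imm):
    """I-format: op(6) rs(5) rt(5) immediate(16)."""
    return (op << 26) | (rs << 21) | (rt << 16) | (imm & 0xFFFF)


# Register numbers used in the examples: $t0 = 8, $t1 = 9, $s2 = 18.
T0, T1, S2 = 8, 9, 18

lw_word = encode_i(35, T1, T0, 1200)       # lw  $t0, 1200($t1)
add_word = encode_r(0, S2, T0, T0, 0, 32)  # add $t0, $s2, $t0

print(f"{lw_word:032b}")
print(f"{add_word:032b}")
```

Splitting the printed strings at bit positions 6, 11, 16 (and 21, 26 for the R-format) recovers the individual fields.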

MIPS Instruction Format • op – opcode (basic operation of the instruction) • rs – first register operand • rt – second register operand • rd – register destination operand • shamt – shift amount • funct – function • Example (J-format): j 10000 • Fields (op, address): 2, 10000 • Binary: ? ? You fill it in!

MIPS Operations Four broad classes supported: • Loads and stores (figure 2.28) • Different data sizes: LD, LW, LH, LB, LBU … • ALU Operations (figure 2.29) • Add, sub, and, or … • They are all register-register operations • Control Flow Instructions (figure 2.30) • Branches (conditional) and Jumps (unconditional) • Floating Point Operations

Levels of Representation
High Level Language Program:
  temp = v[k];
  v[k] = v[k+1];
  v[k+1] = temp;
=> Compiler
Assembly Language Program:
  lw $t15, 0($t2)
  lw $t16, 4($t2)
  sw $t16, 0($t2)
  sw $t15, 4($t2)
=> Assembler
Machine Language Program:
  0000 1001 1100 0110 1010 1111 0101 1000
  1010 1111 0101 1000 0000 1001 1100 0110
  1100 0110 1010 1111 0101 1000 0000 1001
  0101 1000 0000 1001 1100 0110 1010 1111
=> Machine Interpretation
Control Signal Specification

Execution Cycle • Instruction Fetch: obtain instruction from program storage • Instruction Decode: determine required actions and instruction size • Operand Fetch: locate and obtain operand data • Execute: compute result value or status • Result Store: deposit results in storage for later use • Next Instruction: determine successor instruction
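The execution cycle can be sketched as a small interpreter loop. The toy machine below is an assumption made for illustration: a word-addressed list as data memory, eight registers, and a made-up tuple format (op, a, b, c) rather than real MIPS encodings.

```python
def run(program, memory, num_regs=8):
    """Interpret a toy load-store program and return the register file."""
    regs = [0] * num_regs
    pc = 0
    while pc < len(program):
        # 1. Instruction fetch: obtain instruction from program storage.
        instr = program[pc]
        # 2. Instruction decode: determine the required action.
        op, a, b, c = instr
        # 3-5. Operand fetch, execute, result store.
        if op == "load":        # regs[a] <- memory[regs[b] + c]
            regs[a] = memory[regs[b] + c]
        elif op == "store":     # memory[regs[b] + c] <- regs[a]
            memory[regs[b] + c] = regs[a]
        elif op == "add":       # regs[a] <- regs[b] + regs[c]
            regs[a] = regs[b] + regs[c]
        elif op == "halt":
            break
        else:
            raise ValueError(f"unknown opcode: {op}")
        # 6. Next instruction: sequential execution in this sketch.
        pc += 1
    return regs


# Swap two adjacent array elements, in the spirit of temp = v[k]; v[k] = v[k+1]; v[k+1] = temp.
data = [10, 20]
swap = [
    ("load", 1, 0, 0),    # r1 <- v[0]
    ("load", 2, 0, 1),    # r2 <- v[1]
    ("store", 2, 0, 0),   # v[0] <- r2
    ("store", 1, 0, 1),   # v[1] <- r1
    ("halt", 0, 0, 0),
]
final_regs = run(swap, data)
print(data)  # [20, 10]
```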

5 Steps of MIPS Datapath • Instruction Fetch • Instr. Decode / Reg. Fetch • Execute / Addr. Calc • Memory Access • Write Back • [Figure: datapath showing Next PC and Next SEQ PC, adder (+4), instruction memory (Address, Inst), register file (RS1, RS2, RD), sign-extended immediate (Imm), MUXes, ALU with Zero? test, data memory, LMD, and WB Data]
