ECE 15B Computer Organization Spring 2010 Dmitri Strukov

Lecture 4: Arithmetic / Data Transfer Instructions
Partially adapted from Computer Organization and Design, 4th edition, Patterson and Hennessy, and classes taught by Ryan Kastner at UCSB

Agenda
• Review of last lecture
• Load/store operations
• Multiply and divide instructions

Last Lecture

Assembly Language
• Basic job of a CPU: execute lots of instructions
• Instructions are the primitive operations that the CPU may execute
• Different CPUs implement different sets of instructions
• An Instruction Set Architecture (ISA) is the set of instructions a particular CPU implements
• Examples: Intel 80x86 (Pentium 4), IBM/Motorola PowerPC (Macintosh), MIPS, Intel IA64, ARM

Assembly Variables: Registers
• Unlike HLLs such as C or Java, assembly cannot use variables
  • Why not? Keep hardware simple
• Assembly operands are registers
  • Limited number of special locations built directly into the hardware
  • Operations can only be performed on these
• Benefit: since the register file is small, it is very fast

Assembly Variables: Registers
• By convention, each register also has a name to make it easier to code
• For now:
  $16 - $23 → $s0 - $s7 (correspond to C variables)
  $8 - $15 → $t0 - $t7 (correspond to temporary variables)
  Will explain the other 16 register names later
• In general, use names to make your code more readable

MIPS Syntax
• Instruction syntax:
  [Label:] Op-code [oper. 1], [oper. 2], [oper. 3] [#comment]
  (0)      (1)     (2)        (3)        (4)       (5)
• Where
  1) operation name
  2,3,4) operands
  5) comments
  0) label field is optional, will discuss later
• For arithmetic and logic instructions
  2) operand getting result ("destination")
  3) 1st operand for operation ("source 1")
  4) 2nd operand for operation ("source 2")
• Syntax is rigid
  • 1 operator, 3 operands
  • Why? Keep hardware simple via regularity

Addition and Subtraction of Integers
• Addition in assembly
  • Example: add $s0, $s1, $s2 (in MIPS)
  • Equivalent to: a = b + c (in C)
  • Where MIPS registers $s0, $s1, $s2 are associated with C variables a, b, c
• Subtraction in assembly
  • Example: sub $s3, $s4, $s5 (in MIPS)
  • Equivalent to: d = e - f (in C)
  • Where MIPS registers $s3, $s4, $s5 are associated with C variables d, e, f

Addition and Subtraction of Integers
• How do we do this?
  f = (g + h) – (i + j)
• Use intermediate temporary registers
  add $t0, $s1, $s2   # temp = g + h
  add $t1, $s3, $s4   # temp = i + j
  sub $s0, $t0, $t1   # f = (g + h) - (i + j)

Immediates
• Immediates are numerical constants
• They appear often in code, so there are special instructions for them
• Add immediate:
  addi $s0, $s1, 10   # f = g + 10 (in C)
  • Where MIPS registers $s0 and $s1 are associated with C variables f and g
• Syntax is similar to the add instruction, except that the last argument is a number instead of a register

Load and Store Instructions

CPU Overview

… with muxes
• Can’t just join wires together
  • Use multiplexers

… with muxes

Memory Operands
• Main memory used for composite data
  • Arrays, structures, dynamic data
• To apply arithmetic operations
  • Load values from memory into registers
  • Store result from register to memory
• Memory is byte addressed
  • Each address identifies an 8-bit byte
• Words are aligned in memory
  • Address must be a multiple of 4
• MIPS is Big Endian
  • Most-significant byte at the lowest address of a word
  • c.f. Little Endian: least-significant byte at the lowest address
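The byte orders and the alignment rule described above can be sketched in C. This is only an illustration under stated assumptions: the function names are hypothetical, and memory is modeled as a plain array of bytes rather than real MIPS memory.

```c
#include <stdint.h>

/* Big-endian: the most-significant byte goes to the lowest address. */
void store_word_big_endian(uint8_t mem[4], uint32_t word) {
    mem[0] = (uint8_t)(word >> 24);
    mem[1] = (uint8_t)(word >> 16);
    mem[2] = (uint8_t)(word >> 8);
    mem[3] = (uint8_t)word;
}

/* Little-endian: the least-significant byte goes to the lowest address. */
void store_word_little_endian(uint8_t mem[4], uint32_t word) {
    mem[0] = (uint8_t)word;
    mem[1] = (uint8_t)(word >> 8);
    mem[2] = (uint8_t)(word >> 16);
    mem[3] = (uint8_t)(word >> 24);
}

/* A word address is aligned when it is a multiple of 4. */
int is_word_aligned(uint32_t address) {
    return (address % 4u) == 0;
}
```

For the word 0x12345678, the big-endian version places 0x12 at the lowest address, while the little-endian version places 0x78 there.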

Data Transfer: Memory to Register
• MIPS load instruction syntax:
  lw   register#, offset(register#)
  (1)  (2)        (3)    (4)
• Where
  1) operation name
  2) register that will receive value
  3) numerical offset in bytes
  4) register containing pointer to memory
• lw means Load Word: 32 bits (one word) are loaded at a time

Data Transfer: Register to Memory
• MIPS store instruction syntax:
  sw   register#, offset(register#)
  (1)  (2)        (3)    (4)
• Where
  1) operation name
  2) register whose value will be written to memory
  3) numerical offset in bytes
  4) register containing pointer to memory
• sw means Store Word: 32 bits (one word) are stored at a time

Memory Operand Example 1
• C code:
  g = h + A[8];
  • g in $s1, h in $s2, base address of A in $s3
• Compiled MIPS code:
  • Index 8 requires offset of 32
    • 4 bytes per word
  lw  $t0, 32($s3)    # load word (offset 32, base register $s3)
  add $s1, $s2, $t0

Memory Operand Example 2
• C code:
  A[12] = h + A[8];
  • h in $s2, base address of A in $s3
• Compiled MIPS code:
  • Index 8 requires offset of 32, index 12 requires offset of 48
  lw  $t0, 32($s3)    # load word
  add $t0, $s2, $t0
  sw  $t0, 48($s3)    # store word
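To make the offset arithmetic concrete, the second example can be mirrored in C with explicit byte offsets. This is a rough sketch: load_word and store_word are hypothetical helpers that stand in for lw and sw, they use the host machine's byte order, and the array is assumed to hold 32-bit words.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for lw: read the 32-bit word at base + offset bytes. */
int32_t load_word(const uint8_t *base, int32_t offset) {
    int32_t value;
    memcpy(&value, base + offset, sizeof value);
    return value;
}

/* Hypothetical stand-in for sw: write a 32-bit word at base + offset bytes. */
void store_word(uint8_t *base, int32_t offset, int32_t value) {
    memcpy(base + offset, &value, sizeof value);
}

/* A[12] = h + A[8], following the compiled MIPS code step by step. */
void example2(int32_t *A, int32_t h) {
    uint8_t *s3 = (uint8_t *)A;          /* $s3: base address of A   */
    int32_t t0 = load_word(s3, 8 * 4);   /* lw  $t0, 32($s3)         */
    t0 = h + t0;                         /* add $t0, $s2, $t0        */
    store_word(s3, 12 * 4, t0);          /* sw  $t0, 48($s3)         */
}
```

The index is multiplied by 4 because each word occupies 4 bytes, which is where the offsets 32 and 48 come from.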

Registers vs. Memory
• Registers are faster to access than memory
• Operating on memory data requires loads and stores
  • More instructions to be executed
• Compiler must use registers for variables as much as possible
  • Only spill to memory for less frequently used variables
  • Register optimization is important!

Byte/Halfword Operations
• MIPS byte/halfword load/store
  • String processing is a common case
• lb rt, offset(rs)    lh rt, offset(rs)
  • Sign extend to 32 bits in rt
• lbu rt, offset(rs)   lhu rt, offset(rs)
  • Zero extend to 32 bits in rt
• sb rt, offset(rs)    sh rt, offset(rs)
  • Store just rightmost byte/halfword
• Why do we need them? Characters and multimedia data are expressed in fewer than 32 bits; dedicated 8-bit and 16-bit load and store instructions make operating on them faster
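The difference between the sign-extending and zero-extending byte loads can be illustrated in C. The helper names below are hypothetical models of lb, lbu, and sb, not actual MIPS definitions, and memory is again modeled as a byte array.

```c
#include <stdint.h>

/* lb-like: bit 7 of the byte is replicated into bits 8..31 of the result. */
int32_t load_byte_signed(const uint8_t *mem, int32_t offset) {
    int32_t value = mem[offset];               /* 0 .. 255 */
    return (value & 0x80) ? value - 256 : value;
}

/* lbu-like: bits 8..31 of the result are filled with zeros. */
uint32_t load_byte_unsigned(const uint8_t *mem, int32_t offset) {
    return (uint32_t)mem[offset];
}

/* sb-like: only the rightmost 8 bits of the register reach memory. */
void store_byte(uint8_t *mem, int32_t offset, uint32_t reg) {
    mem[offset] = (uint8_t)(reg & 0xFFu);
}
```

The same byte 0xFE therefore loads as -2 with the signed version and as 254 with the unsigned one.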

Two’s Complement Representation
Multiply and Divide

Unsigned Binary Integers
• Given an n-bit number
  • Range: 0 to +2^n – 1
• Example
  • 0000 0000 0000 0000 0000 0000 0000 1011_2
    = 0 + … + 1×2^3 + 0×2^2 + 1×2^1 + 1×2^0
    = 0 + … + 8 + 0 + 2 + 1 = 11_10
• Using 32 bits
  • 0 to +4,294,967,295

2s-Complement Signed Integers
• Given an n-bit number
  • Range: –2^(n–1) to +2^(n–1) – 1
• Example
  • 1111 1111 1111 1111 1111 1111 1111 1100_2
    = –1×2^31 + 1×2^30 + … + 1×2^2 + 0×2^1 + 0×2^0
    = –2,147,483,648 + 2,147,483,644 = –4_10
• Using 32 bits
  • –2,147,483,648 to +2,147,483,647
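The two weighted sums above differ only in the weight of bit 31, which a short C sketch can make explicit. The function names are illustrative assumptions; a 64-bit result type is used so that every intermediate sum fits.

```c
#include <stdint.h>

/* Unsigned interpretation: every bit i contributes +2^i. */
int64_t unsigned_value(uint32_t bits) {
    int64_t value = 0;
    for (int i = 0; i < 32; i++) {
        if ((bits >> i) & 1u) {
            value += (int64_t)1 << i;
        }
    }
    return value;
}

/* 2s-complement interpretation: bit 31 contributes -2^31 instead of +2^31. */
int64_t twos_complement_value(uint32_t bits) {
    int64_t value = 0;
    for (int i = 0; i < 31; i++) {
        if ((bits >> i) & 1u) {
            value += (int64_t)1 << i;
        }
    }
    if ((bits >> 31) & 1u) {
        value -= (int64_t)1 << 31;
    }
    return value;
}
```

With these definitions the pattern ending in 1011 evaluates to 11 as an unsigned number, and the pattern ending in 1100 with all upper bits set evaluates to -4 as a 2s-complement number.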

2s-Complement Signed Integers
• Bit 31 is sign bit
  • 1 for negative numbers
  • 0 for non-negative numbers
• –(–2^(n–1)) can’t be represented
• Non-negative numbers have the same unsigned and 2s-complement representation
• Some specific numbers
  • 0: 0000 0000 … 0000
  • –1: 1111 1111 … 1111
  • Most-negative: 1000 0000 … 0000
  • Most-positive: 0111 1111 … 1111

Signed Negation
• Complement and add 1
  • Complement means 1 → 0, 0 → 1
• Example: negate +2
  • +2 = 0000 0000 … 0010_2
  • –2 = 1111 1111 … 1101_2 + 1 = 1111 1111 … 1110_2
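The complement-and-add-1 rule maps directly onto two C operators when the value is held as a 32-bit pattern. A minimal sketch, with negate as an illustrative name:

```c
#include <stdint.h>

/* Negate a 32-bit 2s-complement bit pattern: flip every bit, then add 1. */
uint32_t negate(uint32_t bits) {
    return ~bits + 1u;
}
```

Applying it to the most-negative pattern 1000 0000 … 0000 returns the same pattern, which is another way to see that the negation of the most-negative number has no 32-bit representation.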

Sign Extension
• Representing a number using more bits
  • Preserve the numeric value
• In MIPS instruction set
  • addi: extend immediate value
  • lb, lh: extend loaded byte/halfword
  • beq, bne: extend the displacement
• Replicate the sign bit to the left
  • c.f. unsigned values: extend with 0s
• Examples: 8-bit to 16-bit
  • +2: 0000 0010 => 0000 0000 0000 0010
  • –2: 1111 1110 => 1111 1111 1111 1110
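The 8-bit to 16-bit examples above can be reproduced at the bit-pattern level in C. This is a small sketch with hypothetical function names; it works on patterns rather than signed values so the replicated sign bits are visible in hexadecimal.

```c
#include <stdint.h>

/* Copy bit 7 into bits 8..15 (sign extension of an 8-bit pattern). */
uint16_t sign_extend_8_to_16(uint8_t bits) {
    if (bits & 0x80u) {
        return (uint16_t)(0xFF00u | bits);
    }
    return (uint16_t)bits;
}

/* Fill bits 8..15 with zeros (extension of an unsigned 8-bit value). */
uint16_t zero_extend_8_to_16(uint8_t bits) {
    return (uint16_t)bits;
}
```

The pattern 1111 1110 (0xFE) becomes 0xFFFE under sign extension, still -2, but 0x00FE under zero extension, which is 254.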

Integer Addition
• Example: 7 + 6

Integer Subtraction
• Add negation of second operand
• Example: 7 – 6 = 7 + (–6)
  +7:  0000 0000 … 0000 0111
  –6:  1111 1111 … 1111 1010
  +1:  0000 0000 … 0000 0001
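The 7 – 6 example can be traced in C by adding the negated operand in a wider type, so that the carry out of bit 31 is visible before it is discarded. The function name and the carry_out parameter are illustrative assumptions.

```c
#include <stdint.h>

/* Compute a - b as a + (-b) on 32-bit patterns.
   The carry out of bit 31 is reported separately and is not part of the result. */
uint32_t subtract_by_adding_negation(uint32_t a, uint32_t b, uint32_t *carry_out) {
    uint32_t negated_b = ~b + 1u;                  /* complement and add 1 */
    uint64_t wide_sum = (uint64_t)a + negated_b;   /* 33-bit sum */
    *carry_out = (uint32_t)(wide_sum >> 32);
    return (uint32_t)wide_sum;                     /* keep the low 32 bits */
}
```

For 7 and 6 the wide sum is 1 followed by thirty-one zeros and a final 1; dropping the carry leaves the expected result of 1.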

Multiplication
• Start with long-multiplication approach

        1000   multiplicand
      × 1001   multiplier
        ----
        1000
       0000
      0000
     1000
     -------
     1001000   product

• Length of product is the sum of operand lengths
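The long-multiplication approach above amounts to a shift-and-add loop. The following C sketch assumes 32-bit unsigned operands and a 64-bit product; the function name is hypothetical, and the loop models the sequential algorithm rather than any particular hardware implementation.

```c
#include <stdint.h>

/* Shift-and-add multiplication of two 32-bit unsigned operands.
   The product is 64 bits wide: the sum of the operand lengths. */
uint64_t shift_add_multiply(uint32_t multiplicand, uint32_t multiplier) {
    uint64_t product = 0;              /* product starts at 0 */
    uint64_t shifted = multiplicand;   /* multiplicand, shifted left each step */
    for (int step = 0; step < 32; step++) {
        if (multiplier & 1u) {
            product += shifted;        /* add partial product for a 1 bit */
        }
        shifted <<= 1;                 /* next partial product is one place left */
        multiplier >>= 1;              /* examine the next multiplier bit */
    }
    return product;
}
```

Calling it with 8 and 9 (1000 and 1001 in binary) returns 72, which is 1001000 in binary, matching the worked example.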

Multiplication Hardware
[Figure: multiplier datapath; the product register is initially 0]

Stopped here… will start next lecture from here

Optimized Multiplier
• Perform steps in parallel: add/shift
• One cycle per partial-product addition
  • That’s ok, if frequency of multiplications is low

Faster Multiplier
• Uses multiple adders
  • Cost/performance tradeoff
• Can be pipelined
  • Several multiplications performed in parallel

MIPS Multiplication
• Two 32-bit registers for product
  • HI: most-significant 32 bits
  • LO: least-significant 32 bits
• Instructions
  • mult rs, rt / multu rs, rt
    • 64-bit product in HI/LO
  • mfhi rd / mflo rd
    • Move from HI/LO to rd
    • Can test HI value to see if product overflows 32 bits
  • mul rd, rs, rt
    • Least-significant 32 bits of product → rd
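How a 64-bit product splits across HI and LO can be modeled in C. This is a sketch under assumptions: MultResult and the function names are hypothetical, and the overflow test shown for the signed case (HI must equal the sign extension of LO) is one reading of the slide's remark about testing HI.

```c
#include <stdint.h>

typedef struct {
    uint32_t hi;   /* most-significant 32 bits of the product */
    uint32_t lo;   /* least-significant 32 bits of the product */
} MultResult;

/* mult-like: signed 32 x 32 -> 64-bit product split into HI and LO. */
MultResult mult(int32_t rs, int32_t rt) {
    uint64_t product = (uint64_t)((int64_t)rs * (int64_t)rt);
    MultResult r = { (uint32_t)(product >> 32), (uint32_t)product };
    return r;
}

/* multu-like: unsigned 32 x 32 -> 64-bit product split into HI and LO. */
MultResult multu(uint32_t rs, uint32_t rt) {
    uint64_t product = (uint64_t)rs * rt;
    MultResult r = { (uint32_t)(product >> 32), (uint32_t)product };
    return r;
}

/* A signed product fits in 32 bits when HI is just the sign extension of LO. */
int signed_product_fits_32(MultResult r) {
    uint32_t sign_fill = (r.lo & 0x80000000u) ? 0xFFFFFFFFu : 0u;
    return r.hi == sign_fill;
}
```

For -2 times 3, HI holds all ones and LO holds the pattern for -6, so the product fits; for 65536 times 65536, HI is 1 and LO is 0, so it does not.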

Division
• Check for 0 divisor
• Long division approach
  • If divisor ≤ dividend bits
    • 1 bit in quotient, subtract
  • Otherwise
    • 0 bit in quotient, bring down next dividend bit
• Restoring division
  • Do the subtract, and if remainder goes < 0, add divisor back
• Signed division
  • Divide using absolute values
  • Adjust sign of quotient and remainder as required

                    1001      quotient
  divisor 1000 | 1001010   dividend
                -1000
                    10
                    101
                    1010
                   -1000
                      10   remainder

• n-bit operands yield n-bit quotient and remainder
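The restoring-division steps above can be written as a bit-by-bit loop. The C sketch below assumes unsigned 32-bit operands and a nonzero divisor that the caller has already checked; DivResult and restoring_divide are hypothetical names.

```c
#include <stdint.h>

typedef struct {
    uint32_t quotient;
    uint32_t remainder;
} DivResult;

/* Restoring division, one quotient bit per step.
   The caller is assumed to have checked that divisor is not 0. */
DivResult restoring_divide(uint32_t dividend, uint32_t divisor) {
    uint32_t quotient = 0;
    int64_t remainder = 0;   /* signed and wide, so a negative result is visible */
    for (int bit = 31; bit >= 0; bit--) {
        /* Bring down the next dividend bit. */
        remainder = (remainder << 1) | ((dividend >> bit) & 1u);
        /* Do the subtract. */
        remainder -= divisor;
        if (remainder < 0) {
            remainder += divisor;              /* restore: quotient bit is 0 */
        } else {
            quotient |= (uint32_t)1 << bit;    /* quotient bit is 1 */
        }
    }
    DivResult r = { quotient, (uint32_t)remainder };
    return r;
}
```

Dividing 74 (1001010) by 8 (1000) yields quotient 9 (1001) and remainder 2 (10), the same values as the worked long division.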

Division Hardware
[Figure: divider datapath; the divisor register initially holds the divisor in its left half, and the remainder register initially holds the dividend]

Optimized Divider
• One cycle per partial-remainder subtraction
• Looks a lot like a multiplier!
  • Same hardware can be used for both

Faster Division
• Can’t use parallel hardware as in multiplier
  • Subtraction is conditional on sign of remainder
• Faster dividers (e.g. SRT division) generate multiple quotient bits per step
  • Still require multiple steps

MIPS Division
• Use HI/LO registers for result
  • HI: 32-bit remainder
  • LO: 32-bit quotient
• Instructions
  • div rs, rt / divu rs, rt
  • No overflow or divide-by-0 checking
    • Software must perform checks if required
  • Use mfhi, mflo to access result
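Because the hardware does no checking, the checks fall to software, and the same is true of C's division operators. The sketch below models a signed divide with the remainder in hi and the quotient in lo; DivHiLo and checked_div are hypothetical names, and treating the most-negative value divided by -1 as the overflow case is an assumption based on the 2s-complement range discussed earlier.

```c
#include <stdint.h>

typedef struct {
    int32_t hi;   /* remainder */
    int32_t lo;   /* quotient */
} DivHiLo;

/* Signed division with the software checks performed first.
   Returns 1 and fills *out on success, 0 if the division must not be attempted. */
int checked_div(int32_t rs, int32_t rt, DivHiLo *out) {
    if (rt == 0) {
        return 0;                          /* divide by 0 */
    }
    if (rs == INT32_MIN && rt == -1) {
        return 0;                          /* quotient would not fit in 32 bits */
    }
    out->lo = rs / rt;                     /* quotient, truncated toward zero */
    out->hi = rs % rt;                     /* remainder, same sign as the dividend */
    return 1;
}
```

A caller would test the return value before reading hi and lo, in the same way MIPS code would branch around the div when the divisor is 0.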

Conclusions
• In MIPS assembly language
  • Registers replace C variables
  • One instruction (simple operation) per line
  • Simpler is faster
• Memory is byte-addressable, but lw and sw access one word at a time
• A pointer (used by lw and sw) is just a memory address, so we can add to it or subtract from it (using offset)

Review
• Instructions so far:
  add, addi, sub, mult, div, mfhi, mflo, lw, sw, lb, lbu, lh, lhu
• Registers so far
  C variables: $s0 - $s7
  Temporary variables: $t0 - $t9
  Zero: $zero
