
Computer arithmetic


  1. Computer arithmetic Second try at third grade

  2. What is stored and intended • Bit patterns of several sizes, nothing more • Almost all modern machines store and manipulate 1-, 2-, 4-, and 8-byte quantities • Older ones had 12-, 36-, 40-, or 60-bit words • or decimal digit strings, or variable-length BCD • The bit patterns can represent: • Numbers – fixed or floating point • Text data – ASCII, extended ASCII, or Unicode • Graphic data – pixels • Bit patterns – I/O register contents – often packed • Specialized data – sound or other signals, genomes

  3. What operations are done • These operations are usually in the ISA: • Arithmetic operations on numbers – fixed or floating point • The standard arithmetic functions (+, -, *, /, %) • Relational operations and comparison • Conversions – float to fixed • Logical operations on bit patterns – I/O register contents – often packed • Bit operations – set, clear, flip • Shifting and other bit-field isolating methods • Packing and unpacking are done with logic and shifting

  4. Data for which few operations are defined • Text data – ASCII, extended ASCII, or Unicode • Graphic data – pixels, bitmaps, vector graphics • Specialized data – sound or other signals, genomes

  5. Number representation • Several common varieties of multidigit number representation exist • Most are directly descended from multidigit integer positional notation • Several examples: • Sign and magnitude (people think this way) • Two’s complement (or ten’s complement) • Gray code (used to convert mechanical motion to glitchless binary)

  6. Positional notation A number is represented by a string of digits in an understood base: 21345 (base 10). The value of the number is the value of the resulting polynomial: 2·10^4 + 1·10^3 + 3·10^2 + 4·10^1 + 5·10^0. In nested (Horner) form this is (((2·10 + 1)·10 + 3)·10 + 4)·10 + 5. The operations are the polynomial operations – addition, subtraction, multiplication, division – with carry propagation so the value is preserved, but the result contains only valid digits again. Example – 21345 + 32767 = 54112: (2·10^4 + 1·10^3 + 3·10^2 + 4·10^1 + 5·10^0) + (3·10^4 + 2·10^3 + 7·10^2 + 6·10^1 + 7·10^0) = 5·10^4 + 3·10^3 + 10·10^2 + 10·10^1 + 12·10^0, or, after carry propagation, 5·10^4 + 4·10^3 + 1·10^2 + 1·10^1 + 2·10^0. Notice this carry propagation is recursive and is done from right to left.
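The Horner evaluation and right-to-left carry propagation above can be sketched in Python (a minimal illustration, not from the slides; the function names are invented):

```python
def poly_value(digits, base=10):
    """Evaluate a digit list (most significant first) in nested
    (Horner) form: (((d0*b + d1)*b + d2)*b + ...)."""
    value = 0
    for d in digits:
        value = value * base + d
    return value

def propagate_carries(digits, base=10):
    """Turn a digit list that may hold out-of-range digits
    (e.g. [5, 3, 10, 10, 12]) into valid digits, right to left."""
    result = list(digits)
    carry = 0
    for i in range(len(result) - 1, -1, -1):
        total = result[i] + carry
        result[i] = total % base   # valid digit stays
        carry = total // base      # excess moves one place left
    if carry:
        result.insert(0, carry)
    return result
```

Running the slide's example, `propagate_carries([5, 3, 10, 10, 12])` yields `[5, 4, 1, 1, 2]`, i.e. 54112, in agreement with 21345 + 32767.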

  7. Operations on numbers • Numbers are represented in positional notation – a string of digits • Numbers are interpreted as a value • Operations in a computer manipulate the bit strings so as to generate a result that has the expected value – A + B, etc. • The algorithms are what you learned early in life – the same in any number base

  8. The primitive addition element [Figure: one-digit adder with digit inputs Ai and Bi, carry-in Cin, carry-out Cout, and sum digit Si] This configuration is valid for any base: 2 (binary), 16 (hex), 10 (decimal), or 2^m. Note that 2^m corresponds to word sizes of 16, 32, or 64 bits; long word sizes are needed for many cryptographic algorithms. This logic can be used in writing multiword arithmetic operations.

  9. The multiply primitive [Figure: one-digit multiplier cell – inputs Ai, Bi, Qi-1, and carry-in; a one-digit multiplier feeds a one-digit adder, producing Qi and carry-out] This element can be used as a cell of an n × n array multiplier. Although the number of cells is O(n^2), for n = 32 this is not too large for VLSI. Note that the carry-propagation properties are similar to the same number of digits of addition. Other fast or multidigit multiply configurations exist.

  10. Multidigit add – any base [Figure: ripple-carry chain of one-digit adders – stage i takes Ai, Bi, and carry Ci, producing Si and Ci+1; many stages between digit n−1 and digit 0] • Cn ≠ Cn−1 (that is, Cn XOR Cn−1) indicates signed overflow in binary • Note the long path for carry propagation
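A ripple-carry chain with the Cn XOR Cn−1 overflow test can be sketched in Python (an illustrative model, not from the slides; names are invented):

```python
def full_add(a, b, cin):
    """One-digit binary adder: sum bit and carry out."""
    s = a ^ b ^ cin
    cout = (a & b) | (a & cin) | (b & cin)
    return s, cout

def ripple_add(a_bits, b_bits):
    """Add two equal-length bit lists (MSB first); return the sum
    bits, the final carry Cn, and the signed-overflow flag
    Cn XOR Cn-1."""
    n = len(a_bits)
    carry = 0
    sums = [0] * n
    c_into_msb = 0          # Cn-1: carry into the sign position
    for i in range(n - 1, -1, -1):
        if i == 0:
            c_into_msb = carry
        sums[i], carry = full_add(a_bits[i], b_bits[i], carry)
    return sums, carry, carry ^ c_into_msb
```

For example, the 4-bit signed add 0101 + 0110 (5 + 6) produces 1011 with the overflow flag set, since the true sum 11 does not fit in 4 signed bits.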

  11. Why sign-and-magnitude notation is avoided Addition works digit by digit:
      2 1 3 4 5
    + 3 2 7 6 7
    = 5 3 10 10 12 → after carry propagation: 5 4 1 1 2
  Subtraction is done the same way, but not exactly. First the two numbers are compared; then the one with the smaller magnitude is subtracted from the other, and the sign is that associated with the larger magnitude:
    21345 − 32767 = −(32767 − 21345) = −11422
  This requires at least one more comparison, so most integer arithmetic is done in complement notation rather than sign-and-magnitude.

  12. Complement notation is easier and faster Note that (for example) 0 = −100000 + 99999 + 1, or in binary 0 = −100000000 + 11111111 + 1. Subtracting a number from all 9’s or all ones is just flipping all the digits (or bits) and is done digit-by-digit, with no carry propagation. Thus −A in binary is 11111111 − A + 1, or NOT(A) + 1, ignoring the carry. Also, A − B is A + NOT(B) + 1, and no comparison step is needed. [Figure: adder with a complementer on the B input, controlled by an Add/Subtract signal, producing S]

  13. What’s wrong with S&M? • Addition in S&M takes multiple steps – • Compare signs • If signs are different, compare magnitudes • Sign is that of the larger quantity • Subtract smaller from larger [Figure: S&M adder – a compare-and-redirect stage in front of the ALU for inputs 1 and 2]

  14. 2’s complement representation Representation – note that 111111 − V is NOT(V), the bit flip of V. To represent A (now positive or negative) we use A itself if A >= 0 and 2^m + A if A < 0. Since 2^m is 1 + 111111 = 1000000, for negative A, 2^m + A is 1 + NOT(−A). Examples in 6 bits: −3 = −000011 = 1 + 111100 = 111101; −31 = −011111 = 1 + 100000 = 100001. Note that negative numbers are represented by bit patterns that represent large positive numbers – for example, in 6 bits we represent negatives by unsigned numbers > 31 and positives by themselves. Addition in 6 bits: 5 + (−3) is 000101 + 111101 = 000010 = 2 (discarding the carry out); −5 + 3 is 111011 + 000011 = 111110 = −2.
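The 2^m + A encoding and carry-discarding addition above can be checked with a short Python sketch (illustrative only; names are invented):

```python
M = 6  # word size in bits, as in the slide's examples

def encode(a, m=M):
    """Two's complement pattern: A itself if A >= 0, 2^m + A if A < 0."""
    assert -(1 << (m - 1)) <= a < (1 << (m - 1))
    return a & ((1 << m) - 1)       # masking implements 2^m + A mod 2^m

def decode(bits, m=M):
    """Interpret an m-bit pattern: patterns >= 2^(m-1) are negative."""
    return bits - (1 << m) if bits >= (1 << (m - 1)) else bits

def add(x, y, m=M):
    """Plain unsigned add, discarding the carry out of bit m."""
    return (x + y) & ((1 << m) - 1)
```

For example, `encode(-3)` is 0b111101 and `decode(add(encode(5), encode(-3)))` is 2, matching the slide's 6-bit addition.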

  15. The multiply primitive – again [Figure: the same one-digit multiplier cell as slide 9] This element can be used as a cell of an n × n array multiplier. Although the number of cells is O(n^2), for n = 32 this is not too large for VLSI. Note that the carry-propagation properties are similar to the same number of digits of addition. Other fast or multidigit multiply configurations exist.

  16. Human multiplication is much like the computer variety, but not exactly Multiplication is actually repeated addition, but we are used to its format in nonbinary bases. In the multiplication primitive (previous slide) a one-digit multiplier is used. The human also has such a multiplier installed by the Departamento de Educación.
        21345
      × 32767
    ---------
       149415   (7 × 21345; digit products 14 7 21 28 35 before carries)
      128070    (6 × 21345; digit products 12 6 18 24 30)
     149415     (7 × 21345)
     42690      (2 × 21345)
    64035       (3 × 21345)
    ---------
    699411615
  Note: The human can add a column of one-digit numbers; the computer can’t add more than two numbers at a time, but they can be multidigit.

  17. How a decimal machine multiplies this example 21345 × 32767, with the accumulator starting at 0:
    + 149415 (7 × 21345) = 149415 → add and shift: acc 14941, digit 5 out
    + 128070 (6 × 21345) = 143011 → add and shift: acc 14301, digits 15 out
    + 149415 (7 × 21345) = 163716 → add and shift: acc 16371, digits 615 out
    +  42690 (2 × 21345) =  59061 → add and shift: acc  5906, digits 1615 out
    +  64035 (3 × 21345) =  69941 → result 69941 1615 = 699411615
  Note that the adder width is only 5 digits plus carry logic. The digits shifted out are not changed in later steps.
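The add-and-shift scheme above – an adder only as wide as the operands, with one product digit retired per step – can be sketched in Python (an illustrative model, not from the slides; the function name is invented):

```python
def decimal_shift_add_multiply(multiplicand, multiplier, width=5):
    """Multiply two `width`-digit decimals using an accumulator only
    `width`+1 digits wide: add one partial product per multiplier
    digit, then shift one digit out of the accumulator."""
    acc = 0
    low_digits = []                  # digits shifted out, LSD first
    q = multiplier
    for _ in range(width):
        digit = q % 10
        q //= 10
        acc += digit * multiplicand  # fits in width+1 digits
        low_digits.append(acc % 10)  # shift the low digit out
        acc //= 10
    result = acc                     # high half of the product
    for d in reversed(low_digits):   # append the retired digits
        result = result * 10 + d
    return result
```

Running it on the slide's operands, `decimal_shift_add_multiply(21345, 32767)` returns 699411615, the full 10-digit product, even though no addition ever exceeded 6 digits.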

  18. A binary unsigned example [Figure: multiplicand register feeding an adder; the result register's low-order bits shift out past the multiplier register] Multiplicand 1101 = 13 (decimal), multiplier 1011 = 11 (decimal):
    Start: result 0000
    Multiplier bit 1: add 1101 → 1101; shift → 0110, low bits 1
    Multiplier bit 1: 0110 + 1101 = 10011; shift → 1001, low bits 11
    Multiplier bit 0: no add; shift → 0100, low bits 111
    Multiplier bit 1: 0100 + 1101 = 10001; shift → 1000, low bits 1111
    Result: 1000 1111 = 143 (decimal)
  All three of these registers shift at each step.
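The binary version of the add-and-shift loop is even simpler, because the one-digit multiply degenerates to "add or don't add". A Python sketch (illustrative; the function name is invented):

```python
def binary_shift_add_multiply(multiplicand, multiplier, n=4):
    """n-bit binary shift-add multiply: each multiplier bit selects
    add / no add, then everything shifts right one bit."""
    acc = 0
    low_bits = []                   # result bits shifted out, LSB first
    q = multiplier
    for _ in range(n):
        if q & 1:                   # one-digit multiply is just an AND
            acc += multiplicand
        low_bits.append(acc & 1)    # shift the low bit into the result
        acc >>= 1
        q >>= 1
    result = acc
    for bit in reversed(low_bits):
        result = (result << 1) | bit
    return result
```

With the slide's operands, `binary_shift_add_multiply(0b1101, 0b1011)` returns 143, reproducing the worked steps above.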

  19. [Figure: the multiplicand/adder/result/multiplier registers of the previous slide redrawn – the result's low-order bits come in as the multiplier bits go out] This can be done with one long shift register.

  20. Division is almost the inverse of multiplication [Figure: divisor register feeding an adder/subtractor; remainder register with low-order bits going out and quotient bits coming in] Subtraction tells if the sign changes; if so, this quotient bit is zero and the result is ignored – if not, the result is retained and the quotient bit is one. Both registers shift left at each step.
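The trial-subtract rule above is restoring division. A Python sketch (an illustrative model, not from the slides; the function name is invented):

```python
def restoring_divide(dividend, divisor, n=8):
    """Restoring division over n quotient bits: shift the remainder
    left, try a subtract; keep it and set the quotient bit to 1, or
    ignore the result (restore) and set the quotient bit to 0."""
    remainder, quotient = 0, 0
    for i in range(n - 1, -1, -1):
        remainder = (remainder << 1) | ((dividend >> i) & 1)
        trial = remainder - divisor        # subtraction tells the sign
        if trial >= 0:
            remainder = trial              # keep result, quotient bit 1
            quotient = (quotient << 1) | 1
        else:
            quotient = quotient << 1       # result ignored, bit 0
    return quotient, remainder
```

As a check, `restoring_divide(143, 11)` inverts the multiplication example two slides back, returning quotient 13 and remainder 0.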

  21. Some comments • Algorithms • The same form in any base – you learned the format many years ago • The inverse of a multistep process is usually the inverse steps in inverse order • Binary means only additions are needed, rather than one-digit multiplications • Representations • Memory contents are just bit strings – meaning is assumed when you do operations – logical or arithmetic

  22. More comments • Addition/subtraction are basic • Multiplication and division are just repeated adds and subtracts • Clock rate of a pipelined machine is limited by the time needed for an addition • Thus, fast carry-propagation techniques are essential • In earlier times several formats were used • Bit-serial, binary, sign-and-magnitude • Now integer arithmetic is always two’s complement binary – floating point uses several representations • http://www.bitsavers.org has data on many historic machines • Don’t waste time doing computer arithmetic by hand except to learn principles • Especially don’t do binary–decimal or similar conversions

  23. Speeding Up Addition With Carry Lookahead • Speed of digital addition depends on carries • A base b = 2^k divides the length of the carry chain by k • Two-level logic for a base-b digit becomes complex quickly as k increases • If we could compute the carries quickly, the full adders would compute the result with 2 more gate delays • Carry lookahead computes carries quickly • It is based on two ideas: — a digit position generates a carry — a position propagates a carry in to the carry out This and the next six slides are taken from the textbook slides

  24. Binary Propagate and Generate Signals • In binary, the generate for digit j is Gj = xj·yj • Propagate for digit j is Pj = xj + yj • Of course xj + yj covers xj·yj, but it still corresponds to a carry out for a carry in • Carries can then be written: • c1 = G0 + P0·c0 • c2 = G1 + P1·G0 + P1·P0·c0 • c3 = G2 + P2·G1 + P2·P1·G0 + P2·P1·P0·c0 • c4 = G3 + P3·G2 + P3·P2·G1 + P3·P2·P1·G0 + P3·P2·P1·P0·c0 • In words, the c2 logic is: c2 is one if digit 1 generates a carry, or if digit 0 generates one and digit 1 propagates it, or if digits 0 and 1 both propagate a carry in

  25. Speed Gains With Carry Lookahead • It takes one gate to produce a G or P, two levels of gates for any carry, and 2 more for the full adders • The number of OR-gate inputs (terms) and AND-gate inputs (literals in a term) grows with the number of carries generated by lookahead • The real power of this technique comes from applying it recursively • For a group of, say, 4 digits an overall (level-1) generate is G¹₀ = G3 + P3·G2 + P3·P2·G1 + P3·P2·P1·G0 • An overall propagate is P¹₀ = P3·P2·P1·P0

  26. Recursive Carry Lookahead Scheme • If level-1 generates G¹j and propagates P¹j are defined for all groups j, then we can also define level-2 signals G²j and P²j over groups of groups • If k things are grouped together at each level, there will be log_k(m) levels, where m is the number of bits in the original addition • Each extra level introduces 2 more gate delays into the worst-case carry calculation • k is chosen to trade off reduced delay against the complexity of the G and P logic • It is typically 4 or more, but the structure is easier to see for k = 2

  27. Fig. 6.4 Carry Lookahead Adder for Group Size k = 2

  28. Fast multiplication methods • Array multipliers • Silicon is cheap – use an array of the primitive one-digit multipliers shown previously • Booth’s algorithm – • two bits at a time means half as many adds • Carry-save multiplication • Avoids carry propagation except at the last step

  29. Fig. 6.5 Digital Multiplication Schema p: product pp: partial product

  30. Signed and unsigned operations • Addition and subtraction are the same in two’s complement integer • Unsigned and signed multiplication are somewhat different – the same basic structure is used, but the overflow-handling and last step change.

  31. Table 6.5 Radix-4 Booth Encoding (Bit-Pair Encoding)

  32. Carry-save multiplication • Basic idea • Three numbers can be added to make two • This is done without any carry-propagation delay • So 32 numbers can be added as follows: 32 > 22 > 16 > 12 > 8 > 6 > 4 > 3 > 2 • But then the last two must be added conventionally • The entire operation takes about as long as two normal additions
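The three-to-two step is a bank of independent full adders: the sum bits and carry bits are formed with no carry chain at all. A Python sketch of the reduction (illustrative only; names are invented):

```python
def carry_save_3_to_2(a, b, c):
    """Reduce three numbers to two with the same total; every bit
    position is computed independently, with no carry propagation."""
    sum_bits = a ^ b ^ c
    carry_bits = ((a & b) | (a & c) | (b & c)) << 1
    return sum_bits, carry_bits

def add_many(numbers):
    """Repeated 3>2 reduction; only the final add propagates carries."""
    nums = list(numbers)
    while len(nums) > 2:
        a, b, c = nums.pop(), nums.pop(), nums.pop()
        s, carry = carry_save_3_to_2(a, b, c)
        nums += [s, carry]
    return sum(nums)   # one conventional carry-propagating addition
```

For example, reducing 1, 2, 3 gives the pair (0, 6), which still totals 6; summing 32 partial products this way needs only one slow addition at the end.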

  33. A refinement to carry-save • The basic block is now 4>3>2 • Groups of 4 numbers are added through two stages to make 2 numbers • In our previous terms: 32 > 24 > 16 > 12 > 8 > 6 > 4 > 3 > 2 • Compare with 32 > 22 > 16 > 12 > 8 > 6 > 4 > 3 > 2 • This version requires far fewer long interconnect lines

  34. Condition codes • A condition code register usually remembers the results of the last significant operation • SRC doesn’t have one; the 80x86 does • "Significant operation" usually means an arithmetic or logical operation (including shifts) but not a move, pop, or push • The usual bits are C (Carry), V (oVerflow), N (Negative), Z (Zero) • The 80x86 instructions are based on useful logical combinations of these: G (Greater), E (Equal), L (Less) relate to the signed representation; A (Above), E (Equal), B (Below) relate to the unsigned representation • Note that compare works correctly even if the result overflows
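How the four usual bits fall out of an n-bit add can be shown in a short Python sketch (illustrative only; the function name is invented):

```python
def flags_after_add(a, b, n=8):
    """C, V, N, Z after an n-bit add of two n-bit patterns."""
    mask = (1 << n) - 1
    result = (a + b) & mask
    carry = (a + b) >> n                      # C: carry out of the top bit
    sign = lambda x: (x >> (n - 1)) & 1
    sa, sb, sr = sign(a), sign(b), sign(result)
    overflow = int(sa == sb and sa != sr)     # V: signed overflow
    return {'C': carry, 'V': overflow,
            'N': sr, 'Z': int(result == 0)}
```

For example, in 8 bits 0x7F + 0x01 sets V and N but not C (signed overflow, no unsigned carry), while 0xFF + 0x01 sets C and Z but not V, which is why signed branches test V and N while unsigned branches test C.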

  35. Floating-point basics • Floating-point numbers and operations are like scientific notation • A number has sign, mantissa, exponent • Example: −1.453 × 10^−4 • Addition and subtraction have similar stages • Align the smaller number (in absolute value) to the larger • Add/subtract • Renormalize if needed (so the mantissa is smaller than the base and >= 1) • Multiplication and division also have similar stages • Add/subtract exponents • Multiply magnitudes • XOR for the result sign • Consequences • A number has sign, exponent, and magnitude fields • S&M is used for magnitudes since the absolute value is needed • Separate ALUs from those used for fixed-point operations are needed ICOM 4206 – Floating point arithmetic

  36. IEEE formats [Figure: format layout – sign bit S, exponent field, fractional part of the mantissa] Comments: the implicit MSB implies the number is normalized; since this bit is always 1, why store it – use the bit for an extra bit of precision instead. The exponent is stored in excess format – excess 127, 1023, or 16383. Extended precision is used internally in some FPUs; it is not standard.
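The three fields of the 32-bit format can be pulled apart with Python's standard `struct` module (a small demonstration; the function name is invented):

```python
import struct

def fields(x):
    """Split a 32-bit float into sign, stored exponent (excess-127),
    and fraction; the mantissa's leading 1 is implicit, not stored."""
    bits = struct.unpack('>I', struct.pack('>f', x))[0]
    sign = bits >> 31
    exponent = (bits >> 23) & 0xFF    # 8-bit excess-127 exponent
    fraction = bits & 0x7FFFFF        # 23 stored mantissa bits
    return sign, exponent, fraction
```

For example, `fields(1.0)` is `(0, 127, 0)`: true exponent 0 stored as 127, and a zero fraction because the leading 1 is implicit.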

  37. Numeric examples All examples are in short (32-bit) format. Notes: This shows the general pattern – don’t try to duplicate it. Note that sign-and-magnitude, not complementation, is used. Sign-and-magnitude comparison, not floating-point comparison, is sufficient to compare. All integers (unless beyond the precision range) are exactly represented. Fractions are repeating binary fractions unless the denominator is a power of 2.

  38. General form of an FPU for addition [Flow: First operand / Second operand → Sign compare → Exponent subtraction → Swap numbers if the left has the smaller exponent → Prenormalize (shift right by the exponent difference) → S&M add mantissas → Postnormalize and adjust exponent → Reassemble and store]

  39. Comments on the unusual blocks The exponent subtraction is just normal complement subtraction. In the prenormalize step (shift right by the exponent difference), extra rounding digits must be kept. The remaining blocks – sign compare, swap, S&M add mantissas, postnormalize and adjust exponent, reassemble and store – are as on the previous slide.

  40. Floating point add/subtract steps explained • Sign comparison • Since mantissas are in S&M, the second argument's sign is flipped for subtraction, then S&M addition rules are followed • Exponent comparison • Normal complement subtraction of exponents cancels out the excess in the representation • The operands are swapped if the second operand has the larger exponent • Prenormalization shift • The second operand is shifted right by the amount of the exponent difference • Three extra bits, called guard, round, and sticky, must be retained • Addition • This is a standard S&M addition, but it must include the extra bits • Postnormalization • This shift can be from one place right to many places left if the difference is small • Rounding decisions are made here and not earlier
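The swap/prenormalize/add/postnormalize pipeline can be sketched on toy operands of the form m · 2^e with integer mantissas (an illustrative model only, not the slides' hardware; it keeps no guard, round, or sticky bits, and its "postnormalize" merely strips trailing zero bits):

```python
def fp_add(m1, e1, m2, e2):
    """Toy float add on (mantissa, exponent) pairs, value = m * 2**e."""
    if e1 < e2:                       # swap so the left exponent is larger
        m1, e1, m2, e2 = m2, e2, m1, e1
    m2 >>= (e1 - e2)                  # prenormalize: align to the larger
    m = m1 + m2                       # the actual addition
    e = e1
    while m != 0 and m % 2 == 0:      # toy postnormalize
        m >>= 1
        e += 1
    return m, e
```

`fp_add(3, 0, 1, 0)` gives `(1, 2)`, i.e. 4. Note that `fp_add(1, 2, 1, 0)` also gives `(1, 2)` instead of 5: the aligned second operand shifted entirely away, which is exactly why a real unit must retain the extra guard, round, and sticky bits.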

  41. Floating point multiply/divide [Flow: Sign comparison and XOR; Exponent add or subtract → Exponent adjust; Mantissa multiply or divide → Postnormalizing shifts → Reassembly. Full precision is kept through the postnormalizing shifts; rounding is done only there]

  42. IEEE Floating-point format • Previous floating-point formats had problems • IBM 360 – used a base-16 exponent – poor precision • All IBM – biased rounding destabilized numerical algorithms • Some HP – same problems; also used floating decimal, which ruined speed and was incompatible • CDC – strange word lengths – 60 bits, etc. • In general, accuracy, control of rounding, and incompatibility were the problems – read the book for a description of incidents • Weather prediction and other big numerical algorithms depend heavily on good floating-point algorithms • IEEE floating-point features • Standardized format • Programmer control of rounding methods • Space in the format for unnormalized and not-a-number (NaN) values • Implementations • The 8087 (Intel) coprocessor was defined before the standard, and has a misdefined stack – programming problems ever since (especially in compiler design)

  43. IEEE format – some details • Mantissa is in S&M format with an understood one in the MSB • The understood 1 requires S&M • The understood 1 gives one more digit of precision • Prenormalization is easier, because incoming numbers are almost always normalized • Exponent field is in excess format, coming before the mantissa • With S&M format this means a fixed-point S&M compare can find which number is larger (if they are normalized) • Exponent renormalization doesn’t require sign logic • All-zero and all-one exponents • These are the extremes, very rare in real numbers • All zeroes is used to code unnormalized numbers • All ones is used to code not-a-numbers (usually resulting from exceptions) • Rounding varieties • Toward 0 • Toward plus infinity • Toward minus infinity • Unbiased (to nearest, ties to even)

  44. Register structure of MIPS

  45. MIPS coprocessor • Operations • Add and subtract (single, double) – add.s, sub.s, add.d, sub.d • Multiply and divide (single, double) – mul.s, div.d, etc. • Load/store (single) – lwc1, swc1 – note, this is coprocessor 1 • Compare (single, double) – c.lt.s, c.lt.d • Branch conditional – bc1t, bc1f • Compares and branches use a special condition register • Moves between coprocessor and CPU registers – mfc1, mfc1.d • Registers • 32 coprocessor registers – separate from the others • Condition register – used only by coprocessor branch and compare • No lo and hi registers in the coprocessor • For double, registers are used in pairs – $f2 means $f2 and $f3 in double • Notes • Data moves don’t really use floating arithmetic, just floating registers • Note the register specifications in Appendix A or on the MIPS instruction sheet • Don’t trust anything except Appendix A – the MIPS instruction sheet doesn’t have all the instructions • Programming is otherwise like expression programming

  46. Floating-point arithmetic Floating-point basics The IEEE standard The MIPS floating-point coprocessor MIPS floating-point instructions and registers

  47. The basics of the MIPS floating point instructions

  48. Notes The floating-point processor is coprocessor 1, not coprocessor 0. The suffixes .d, .s, and .w refer to double, single, and word. The order of convert specifiers is destination, then source. [Table: the opcodes themselves]
