This overview of floating point arithmetic covers core computer architecture concepts: how to represent numbers with fractional parts, as well as very small and very large values. It explains the IEEE 754 standard for single and double precision and the roles of the sign, exponent, and significand fields. Key details include normalization, exponent biasing, and how these representation choices affect arithmetic. Worked examples convert decimal numbers to binary floating point form and walk through floating point addition and multiplication.
CS 232: Computer Architecture II
Prof. Laxmikant (Sanjay) Kale
Floating point arithmetic
Floating Point (a brief look)
• We need a way to represent:
  • numbers with fractions, e.g., 3.1416
  • very small numbers, e.g., .000000001
  • very large numbers, e.g., $3.15576 \times 10^9$
• Representation:
  • sign, exponent, significand: $(-1)^{\text{sign}} \times \text{significand} \times 2^{\text{exponent}}$
  • more bits for the significand give more accuracy
  • more bits for the exponent increase the range
• IEEE 754 floating point standard:
  • single precision: 8-bit exponent, 23-bit significand
  • double precision: 11-bit exponent, 52-bit significand
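The following sketch shows the three fields of an actual single-precision number. It assumes Python 3.9+, and `float32_fields` is a hypothetical helper, not part of any library. The helper uses the standard `struct` module to reinterpret the four bytes of a 32-bit float as an unsigned integer. It then masks out the sign, exponent, and significand bits. Python's own `float` is double precision, so packing with `'>f'` first rounds the value to single precision.

```python
import struct

def float32_fields(x: float) -> tuple[int, int, int]:
    """Split x, rounded to IEEE 754 single precision, into (sign, exponent, fraction)."""
    # '>f' packs x as a big-endian 32-bit float; '>I' reads the same 4 bytes as an unsigned int.
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign = bits >> 31               # 1 bit
    exponent = (bits >> 23) & 0xFF  # 8 bits, stored with a bias
    fraction = bits & 0x7FFFFF      # 23 bits: the significand without its leading 1
    return sign, exponent, fraction

# The three example magnitudes from the slide.
for value in (3.1416, 0.000000001, 3.15576e9):
    s, e, f = float32_fields(value)
    print(f"{value!r:>14}: sign={s} exponent={e:08b} fraction={f:023b}")
```

The widths 1 + 8 + 23 add up to the 32 bits of a single-precision word.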
Floating point representation:
• The idea is to normalize all numbers so that the significand has exactly one nonzero digit to the left of the decimal point:
  • $12345 = 1.2345 \times 10^4$
  • $.0000012345 = 1.2345 \times 10^{-6}$
• Do the same in binary, with one 1 bit to the left of the binary point: $1.01110_2 \times 2^{1011_2}$
• IEEE FP representation:
  • $\pm 1.01010101010101010101010_2 \times 2^{10101010_2}$
  • This is single precision: 1 sign bit, 8 exponent bits, 23 significand bits (32 bits in all)
• Double precision: 64 bits in all.
• Where does one need accuracy of that level?
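Normalizing in binary means writing a nonzero number as $m \times 2^e$ with $1 \le |m| < 2$. The following sketch assumes Python 3.9+, and `normalize_binary` is a hypothetical helper. The sketch relies on the standard `math.frexp`, which returns a mantissa in $[0.5, 1)$. Doubling that mantissa and decrementing the exponent shifts it into the normalized range.

```python
import math

def normalize_binary(x: float) -> tuple[float, int]:
    """Return (m, e) with x == m * 2**e and 1 <= |m| < 2 (x must be nonzero)."""
    # math.frexp gives x == f * 2**k with 0.5 <= |f| < 1 ...
    f, k = math.frexp(x)
    # ... so doubling f and decrementing k leaves exactly one 1 bit left of the binary point.
    return f * 2, k - 1

m, e = normalize_binary(5.75)   # 5.75 = 101.11 in binary = 1.0111 x 2^2
print(m, e)                     # prints: 1.4375 2
```

Both steps scale by a power of two, so the result is exact. No bits of $x$ are lost.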
Floating point numbers
• Representation issues:
  • sign bit, exponent, significand
  • Question: how should each field be represented?
  • Question: in which order should they be laid out in a word?
  • Factor: comparisons (for sorting) should be easy
  • For arithmetic, we will have special hardware anyway
• Choice:
  • Sign + magnitude representation
  • Sign bit, followed by exponent, then significand (why?)
  • exponent: represented with a "bias": add 127 (1023 for double precision)
  • significand: assume an implicit leading 1 (so 00001 means 1.00001)
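The field order and the biased exponent pay off for sorting. For non-negative numbers, comparing the raw bit patterns as unsigned integers gives the same order as comparing the values. The following check assumes Python 3 and uses a hypothetical `float32_bits` helper built on the standard `struct` module.

```python
import struct

def float32_bits(x: float) -> int:
    """Raw 32-bit pattern of x (rounded to single precision) as an unsigned integer."""
    return struct.unpack(">I", struct.pack(">f", x))[0]

# Distinct non-negative values, spanning many exponents.
values = [3.5, 0.001, 250.0, 1.0, 1e-30, 7.25e20, 0.0]
by_value = sorted(values)
by_bits = sorted(values, key=float32_bits)   # integer comparison only
print(by_value == by_bits)                   # prints: True
```

Negative numbers behave differently under sign + magnitude. A larger magnitude gives a larger bit pattern, so among negatives the integer order is reversed relative to the value order, and the sign has to be handled separately.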
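Floating point representation
• So the value of a floating point number is:
  • $(\pm 1) \times (1 + \text{significand}) \times 2^{\text{exponent} - \text{bias}}$
• Example: 0 00001000 01010000000000000000000
  • sign = 0, exponent = 8, significand = $.0101_2 = 0.3125$, so the value is $1.3125 \times 2^{8 - 127} = 1.3125 \times 2^{-119}$
• Example: convert -.41 to single precision form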
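The formula can be evaluated directly from the three bit groups. The following minimal sketch assumes Python 3 and a normalized number, meaning one with an implicit leading 1. `decode_single` is a hypothetical helper that takes the fields as a space-separated bit string, in the same layout as the slide's example.

```python
def decode_single(bit_string: str) -> float:
    """Evaluate (+/-) x (1 + significand) x 2^(exponent - bias) for a normalized single."""
    sign_bits, exp_bits, frac_bits = bit_string.split()
    sign = -1.0 if sign_bits == "1" else 1.0
    exponent = int(exp_bits, 2)                  # stored (biased) exponent
    significand = int(frac_bits, 2) / 2**23      # 23 fraction bits as a value in [0, 1)
    bias = 127
    return sign * (1 + significand) * 2.0 ** (exponent - bias)

x = decode_single("0 00001000 01010000000000000000000")
print(x == 1.3125 * 2.0**-119)   # prints: True
```

Every step here scales by a power of two or adds exact binary fractions, so the comparison with $1.3125 \times 2^{-119}$ is exact.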
IEEE 754 floating-point standard
• Leading "1" bit of the significand is implicit
• Exponent is "biased" to make sorting easier
  • all 0s is the smallest exponent, all 1s the largest
  • bias of 127 for single precision and 1023 for double precision
• Summary: $(-1)^{\text{sign}} \times (1 + \text{significand}) \times 2^{\text{exponent} - \text{bias}}$
• Example:
  • decimal: $-.75 = -3/4 = -3/2^2$
  • binary: $-.11_2 = -1.1_2 \times 2^{-1}$
  • floating point: exponent $= -1 + 127 = 126 = 01111110_2$
  • IEEE single precision: 10111111010000000000000000000000, i.e., sign 1, exponent 01111110, significand 10000000000000000000000
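The same recipe can be followed step by step in code: take the sign, normalize, bias the exponent, and keep 23 significand bits. The following sketch assumes Python 3 and handles only normal, nonzero numbers. `encode_single` and `library_single` are hypothetical helpers. The second helper reads the bits produced by the standard `struct` module, purely to cross-check the first. Rounding uses Python's `round()`, which breaks ties to even, matching IEEE 754's default round-to-nearest-even mode.

```python
import math
import struct

def encode_single(x: float) -> str:
    """Encode a normal, nonzero x by hand as 'sign exponent significand' bit groups."""
    sign = 1 if x < 0 else 0
    f, k = math.frexp(abs(x))               # abs(x) == f * 2**k with 0.5 <= f < 1
    significand, exponent = 2 * f, k - 1    # normalize to 1.xxx * 2**exponent
    biased = exponent + 127                 # single-precision bias
    fraction = round((significand - 1) * 2**23)   # keep 23 bits, ties to even
    if fraction == 2**23:                   # rounding carried into the leading 1
        fraction, biased = 0, biased + 1
    assert 0 < biased < 255, "zero, subnormal, and overflow cases are not handled"
    return f"{sign} {biased:08b} {fraction:023b}"

def library_single(x: float) -> str:
    """The same bit groups, read from the bytes struct produces, for cross-checking."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    return f"{bits >> 31} {(bits >> 23) & 0xFF:08b} {bits & 0x7FFFFF:023b}"

print(encode_single(-0.75))   # prints: 1 01111110 10000000000000000000000
print(encode_single(-0.41))   # prints: 1 01111101 10100011110101110000101
print(encode_single(-0.41) == library_single(-0.41))   # prints: True
```

Unlike $-.75$, the value $-.41$ has no finite binary expansion, so its last significand bit is the result of rounding.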
Floating point addition
• The problem: the exponents of the numbers being added may differ
  • $2.0 \times 10^1 + 3.0 \times 10^{-1}$
  • $2.0 \times 10^1 + 0.03 \times 10^1$: now we can add them
  • $2.03 \times 10^1$
• But we are not necessarily done!
  • E.g., $9.74 \times 10^0 + 3.3 \times 10^{-1} = 9.74 \times 10^0 + 0.33 \times 10^0$
  • $10.07 \times 10^0$ is not in normalized form!
  • Shift again to get the normalized form: $1.007 \times 10^1$
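The align, add, and normalize steps can be sketched on decimal (significand, exponent) pairs like the ones on the slide. The following sketch assumes Python 3.9+ and uses the standard `decimal` module, so no digits are lost. `fp_add` is a hypothetical helper. Unlike real hardware, it keeps every digit instead of rounding to a fixed significand width.

```python
from decimal import Decimal

def fp_add(a: tuple[Decimal, int], b: tuple[Decimal, int]) -> tuple[Decimal, int]:
    """Add two base-10 (significand, exponent) numbers: align, add, normalize."""
    if a[1] < b[1]:
        a, b = b, a                         # make a the operand with the larger exponent
    (sig_a, exp_a), (sig_b, exp_b) = a, b
    # 1. Align: shift the smaller operand's significand right by the exponent difference.
    sig_b = sig_b.scaleb(exp_b - exp_a)
    # 2. Add the significands; the result uses the larger exponent.
    sig, exp = sig_a + sig_b, exp_a
    # 3. Normalize: exactly one nonzero digit to the left of the decimal point.
    while abs(sig) >= 10:
        sig, exp = sig.scaleb(-1), exp + 1
    while sig != 0 and abs(sig) < 1:
        sig, exp = sig.scaleb(1), exp - 1
    return sig, exp

for a, b in [((Decimal("2.0"), 1), (Decimal("3.0"), -1)),
             ((Decimal("9.74"), 0), (Decimal("3.3"), -1))]:
    sig, exp = fp_add(a, b)
    print(f"{sig} x 10^{exp}")   # prints 2.030 x 10^1, then 1.007 x 10^1
```

In binary hardware the same three steps apply with shifts by powers of two. A fixed-width significand means the aligned operand and the normalized sum may both need rounding.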
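You can get different results
• $A + B + C = A + (B + C) = (A + B) + C$
  • Right?
• Can you see a problem?
• When do you lose bits?
  • When exponents are aligned, the smaller operand's significand is shifted right and its low-order bits can fall off the end. Rounding the sum to a fixed number of bits also loses bits, so the two groupings can round differently.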
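The following small demonstration shows that floating point addition is not associative. It assumes Python 3, whose `float` is IEEE 754 double precision on mainstream platforms. The values are chosen so that aligning the small operand with a huge one shifts all of its bits out.

```python
# Python floats are IEEE 754 double precision on mainstream platforms.
a, b, c = 1e20, -1e20, 1.0

left = (a + b) + c    # a + b cancels exactly to 0.0, then 0.0 + 1.0 = 1.0
right = a + (b + c)   # 1.0 is far below the last bit of 1e20, so b + c rounds back to b
print(left, right)    # prints: 1.0 0.0
```

Near $10^{20}$, adjacent doubles are $2^{14} = 16384$ apart, so adding $1.0$ cannot change the value.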
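Floating point multiplication
• Add the exponents, but subtract the bias (each biased exponent already includes it once)
• Then multiply the significands
• Then normalize (and round)
• The sign of the result is the XOR of the operand signs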
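The following toy version of these steps operates on the bit fields of single-precision numbers. It assumes Python 3, and `to_bits`, `from_bits`, and `fmul_single` are hypothetical helpers built on the standard `struct` module. It deliberately simplifies real hardware in three ways. It handles only normal operands and results. It truncates extra product bits instead of rounding. It ignores zero, infinity, and NaN.

```python
import struct

def to_bits(x: float) -> int:
    return struct.unpack(">I", struct.pack(">f", x))[0]

def from_bits(bits: int) -> float:
    return struct.unpack(">f", struct.pack(">I", bits))[0]

def fmul_single(x: float, y: float) -> float:
    """Toy single-precision multiply: normal operands only, truncates instead of rounding."""
    a, b = to_bits(x), to_bits(y)
    sign = (a >> 31) ^ (b >> 31)                       # signs differ -> negative
    # Each stored exponent carries +127, so adding two of them double-counts the bias.
    exp = ((a >> 23) & 0xFF) + ((b >> 23) & 0xFF) - 127
    # Restore the implicit leading 1: 24-bit significands with value sig / 2**23.
    sig_a = (a & 0x7FFFFF) | (1 << 23)
    sig_b = (b & 0x7FFFFF) | (1 << 23)
    prod = sig_a * sig_b                               # value prod / 2**46, in [1, 4)
    # Normalize: a product in [2, 4) needs one right shift and exponent + 1.
    if prod >= 1 << 47:
        prod >>= 1
        exp += 1
    fraction = (prod >> 23) & 0x7FFFFF                 # drop the implicit 1, keep 23 bits
    assert 0 < exp < 255, "overflow/underflow not handled in this sketch"
    return from_bits((sign << 31) | (exp << 23) | fraction)

print(fmul_single(1.5, -2.5))   # prints: -3.75
print(fmul_single(3.0, 0.75))   # prints: 2.25 (significand product 2.25 needs normalizing)
```

Both examples have exact products, so truncation does not affect them. For general inputs, a real multiplier rounds the discarded bits instead of dropping them.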