
# IEEE 754 Floating Point




### Presentation Transcript

#### 1. IEEE 754 Floating Point

Luddy Harrison, CS433G, Spring 2007

#### 2. What is represented

- Real numbers, for example:
  - 5.6745
  - $1.23 \times 10^{19}$
- Remember, however, that the representation is finite, so only a subset of the reals can be represented:
  - No transcendentals (values such as π can only be approximated)
  - Limited range
  - Limited precision (number of digits)
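A quick way to see these limits is to poke at Python's `float`, which is an IEEE 754 double on mainstream platforms (an assumption about the platform, not something the slides state). The sketch below uses `fractions.Fraction` to expose the exact value actually stored for `0.1`, and shows overflow past the largest finite double.

```python
from fractions import Fraction
import math
import sys

# Limited precision: 0.1 has no finite binary expansion, so the stored
# double is only the nearest representable value, not exactly one tenth.
stored = Fraction(0.1)
print(stored == Fraction(1, 10))   # False
print(stored)                      # 3602879701896397/36028797018963968

# Limited range: doubling the largest finite double overflows to infinity.
largest = sys.float_info.max
print(largest * 2)                 # inf

# Transcendentals such as pi are stored only as a nearby binary fraction.
print(Fraction(math.pi))
```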

#### 3. Normalizing Numbers

- In scientific notation, we generally choose one digit to the left of the decimal point:
  - $13.25 \times 10^{10}$ becomes $1.325 \times 10^{11}$
- Normalizing means:
  - Shifting the decimal point until we have the right number of digits to its left (normally one)
  - Adding to or subtracting from the exponent to reflect the shift
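The two normalization steps can be sketched as a loop. This is an illustrative helper (`normalize` is a hypothetical name), using `decimal.Decimal` so that shifting the decimal point is exact and does not itself introduce binary rounding.

```python
from decimal import Decimal

def normalize(mantissa: Decimal, exponent: int):
    """Shift the decimal point until exactly one nonzero digit sits left of it,
    adjusting the exponent to compensate. Zero cannot be normalized."""
    if mantissa == 0:
        raise ValueError("zero cannot be normalized")
    while abs(mantissa) >= 10:
        mantissa /= 10      # move the point one place left...
        exponent += 1       # ...so the exponent goes up by one
    while abs(mantissa) < 1:
        mantissa *= 10      # move the point one place right...
        exponent -= 1       # ...so the exponent goes down by one
    return mantissa, exponent

print(normalize(Decimal("13.25"), 10))   # (Decimal('1.325'), 11)
```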

#### 4. Binary Floating Point

- A binary number in scientific notation is called a floating point number.
- Examples:
  - $1.001 \times 2^{17}$
  - $0.001 \times 2^{-13}$ (not normalized; in normalized form this is $1.0 \times 2^{-16}$)
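To make binary scientific notation concrete, here is a sketch that renders a Python `float` as a normalized $\pm 1.fff\ldots \times 2^e$ string. It relies on `math.frexp`, which returns a mantissa in $[0.5, 1)$, so the result is shifted once to get the $[1, 2)$ form used on these slides. The function name `binary_scientific` is hypothetical.

```python
import math

def binary_scientific(x: float) -> str:
    """Render a nonzero finite float as ±1.fff... × 2^e (normalized binary)."""
    if x == 0 or not math.isfinite(x):
        raise ValueError("need a nonzero finite value")
    sign = "-" if x < 0 else ""
    m, e = math.frexp(abs(x))    # abs(x) == m * 2**e with 0.5 <= m < 1
    m, e = m * 2, e - 1          # renormalize so that 1 <= m < 2
    frac, bits = m - 1, ""
    while frac:                  # peel off fraction bits one at a time
        frac *= 2
        bit = int(frac)
        bits += str(bit)
        frac -= bit
    return f"{sign}1.{bits or '0'} × 2^{e}"

print(binary_scientific(1.125 * 2**17))   # 1.001 × 2^17
print(binary_scientific(0.125 * 2**-13))  # 1.0 × 2^-16
```

The second call shows the slide's unnormalized $0.001 \times 2^{-13}$ (where $0.001_2 = 0.125$) coming back in normalized form.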

#### 5. Parts of a floating point number

- $\pm 1.mmmmmmm \times B^{\,eeee - \text{bias}}$
- A signed fixed-point fraction ($\pm 1.mmmmmmm$) called the mantissa (the standard calls it the significand)
  - For non-zero mantissas, the leading 1 is implicit
    - That is, it is not present in the representation (bit pattern), but it is assumed to be there when interpreting the bit pattern
  - See the previous lecture for the meaning of fixed point fractions
- An implicit base $B$ (2 for the binary formats below)
- An unsigned integer ($eeee$) called the exponent
- An implicit bias: the actual exponent is $eeee - \text{bias}$
- Some bit patterns are reserved for special values:
  - Not a Number (NaN)
  - $\pm\infty$
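The pieces above combine into a single value. The sketch below assumes base $B = 2$ and a normal (non-zero, non-special) number; `field_value` and the 8-bit "minifloat" parameters in the example (3 fraction bits, bias 7) are hypothetical, chosen only to keep the arithmetic small. `Fraction` keeps the result exact.

```python
from fractions import Fraction

def field_value(sign: int, biased_exp: int, fraction_bits: int,
                frac_width: int, bias: int) -> Fraction:
    """Exact value of ±1.mmm... × 2^(eeee - bias) for a normal number.

    fraction_bits holds only the stored mantissa bits; the leading 1 is
    implicit and added here. frac_width is the number of stored bits.
    """
    significand = 1 + Fraction(fraction_bits, 2 ** frac_width)  # implicit 1
    return (-1) ** sign * significand * Fraction(2) ** (biased_exp - bias)

# Hypothetical toy format: 1.100 (binary) × 2^(8 - 7) = 1.5 × 2 = 3
print(field_value(0, 8, 0b100, 3, 7))   # 3
```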

#### 6. About IEEE 754

- This standard defines several floating point types and the meaning of operations (+, ×, etc.) on them:
  - Single
  - Double
  - Extended precision
- It deals at length with the thorny questions of:
  - Erroneous and exceptional results
  - Rounding and conversion

#### 7. 32-bit Single Precision

| S | E | M |
|---|---|---|
| 1 bit | 8 bits | 23 bits |

$$(-1)^S \times 1.M \times 2^{E - 127}$$

- E is an unsigned integer (not two's complement). A bias of 127 is used, so the actual exponent is $E - 127$.
- Exponents 00000000 and 11111111 are reserved for special purposes.
- The sign bit is kept separate from the magnitude bits of the mantissa (sign-magnitude form). The mantissa is therefore an unsigned fixed point fraction with an implicit 1 to the left of the binary point.
- All zero bits (S, E, and M) means zero (0). In this case no leading 1 mantissa bit is implied. With S = 1 and E and M all zero, the pattern is −0.
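One way to see the S/E/M layout is to pack a value as single precision and split the 32 bits. This sketch uses Python's `struct` module, whose `'f'` format packs IEEE 754 binary32; `float32_fields` is a hypothetical helper name. Values too large for single precision make `struct.pack` raise `OverflowError`.

```python
import struct

def float32_fields(x: float):
    """Return (S, E, M) of x rounded to single precision: 1, 8 and 23 bits."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))  # raw 32-bit pattern
    sign = bits >> 31
    exponent = (bits >> 23) & 0xFF    # biased exponent, an unsigned integer
    mantissa = bits & 0x7FFFFF        # 23 stored fraction bits, no leading 1
    return sign, exponent, mantissa

# -6.5 = -110.1 (binary) = -1.101 × 2^2
s, e, m = float32_fields(-6.5)
print(s, e, e - 127, f"{m:023b}")   # 1 129 2 10100000000000000000000
```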

#### 8. Some examples

- `0 00000000 0000…0000` = 0 (note that there is no implicit leading 1 here)
- `1 00000100 1010…0000` = $-1 \times 1.101_2 \times 2^{4-127} = -\tfrac{13}{8} \times 2^{-123}$
- `0 11111110 0000…0000` = $1.0 \times 2^{254-127} = 1 \times 2^{127}$
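These examples can be checked mechanically by assembling the three fields into a bit pattern and reinterpreting it as a single-precision value. A sketch with a hypothetical `from_fields` helper; Python converts the unpacked binary32 to a double exactly, so `==` comparisons are safe here.

```python
import struct

def from_fields(s: int, e: int, m: int) -> float:
    """Assemble S (1 bit), E (8 bits), M (23 bits) into a single-precision value."""
    bits = (s << 31) | (e << 23) | m
    return struct.unpack(">f", struct.pack(">I", bits))[0]

# 0 00000000 0000…0000  ->  0
print(from_fields(0, 0, 0) == 0.0)                                      # True
# 1 00000100 1010…0000  ->  -13/8 × 2^(4-127)
print(from_fields(1, 0b00000100, 0b101 << 20) == -13 / 8 * 2.0 ** -123)  # True
# 0 11111110 0000…0000  ->  2^127
print(from_fields(0, 0b11111110, 0) == 2.0 ** 127)                      # True
```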

#### 9. Denormalized Numbers

`0 00000000 0000…0001` = $0.0000\ldots0001 \times 2^{-126}$

An exponent field of zero is special: it indicates that there is no implicit leading 1 on the mantissa, and the exponent is taken as $1 - 127 = -126$ rather than $0 - 127$. This allows very small numbers to be represented; the pattern above, $2^{-23} \times 2^{-126} = 2^{-149}$, is the smallest positive single-precision value. Note that we cannot normalize this value. (Why?) Zero is effectively a denorm (and it cannot be normalized; why?)

`0 11111110 0000…0001` = $1.0000\ldots0001 \times 2^{254-127} = 1.0000\ldots0001 \times 2^{127}$

Here, the mantissa has an implicit leading 1. If we wanted $0.0000\ldots0001 \times 2^{127}$, we could obtain it by writing $1.0 \times 2^{104}$.
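A decoder that handles both cases makes the denorm rule explicit: when the exponent field is zero, drop the implicit 1 and use exponent $-126$. The sketch below (hypothetical `decode_float32`) returns an exact `Fraction`, which cannot carry the sign of zero, so −0 decodes to plain 0. Infinity and NaN patterns are rejected.

```python
from fractions import Fraction

def decode_float32(bits: int) -> Fraction:
    """Exact value of a finite single-precision bit pattern, including denorms."""
    s = bits >> 31
    e = (bits >> 23) & 0xFF
    m = bits & 0x7FFFFF
    if e == 0xFF:
        raise ValueError("infinity or NaN")
    if e == 0:
        # Denorm (or zero): no implicit leading 1, exponent fixed at 1 - 127.
        significand = Fraction(m, 2 ** 23)
        exponent = -126
    else:
        significand = 1 + Fraction(m, 2 ** 23)   # implicit leading 1
        exponent = e - 127
    return (-1) ** s * significand * Fraction(2) ** exponent

# 0 00000000 0000…0001: the smallest positive denorm, 2^-149
print(decode_float32(0x00000001) == Fraction(1, 2 ** 149))                  # True
# 0 11111110 0000…0001: (1 + 2^-23) × 2^127
print(decode_float32(0x7F000001) == (1 + Fraction(1, 2 ** 23)) * 2 ** 127)  # True
```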

#### 10. 64-bit Double Precision

| S | E | M |
|---|---|---|
| 1 bit | 11 bits | 52 bits |

$$(-1)^S \times 1.M \times 2^{E - 1023}$$

- E is an unsigned integer (not two's complement). A bias of 1023 is used, so the actual exponent is $E - 1023$.
- As before, an exponent of all 0 bits or all 1 bits is reserved for special values.
- As before, the mantissa is an unsigned fixed point fraction with an implicit 1 to the left of the binary point. The sign of the entire number is held separately in S.
- A representation of all zero bits (S, E, and M) means zero (0). In this case no leading 1 mantissa bit is implied.
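The double-precision layout can be inspected the same way, with `struct`'s `'d'` format (IEEE 754 binary64) and a 64-bit unsigned integer. A sketch; `float64_fields` is a hypothetical name, and it assumes Python's `float` is itself a double, as it is on mainstream platforms.

```python
import struct

def float64_fields(x: float):
    """Return (S, E, M) of a double: 1, 11 and 52 bits."""
    (bits,) = struct.unpack(">Q", struct.pack(">d", x))
    return bits >> 63, (bits >> 52) & 0x7FF, bits & ((1 << 52) - 1)

# Same value as the single-precision case: -1.101 × 2^2, now with bias 1023
s, e, m = float64_fields(-6.5)
print(s, e, e - 1023, hex(m))   # 1 1025 2 0xa000000000000
```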

#### 11. Infinity

- `0 11111111 0000…0000` = $+\infty$
- `1 11111111 0000…0000` = $-\infty$
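Reinterpreting these two patterns confirms that they decode to the infinities. A small sketch with a hypothetical `f32` helper; `0x7F800000` and `0xFF800000` are the hexadecimal forms of the two bit patterns above.

```python
import math
import struct

def f32(bits: int) -> float:
    """Reinterpret a 32-bit pattern as a single-precision value."""
    return struct.unpack(">f", struct.pack(">I", bits))[0]

pos_inf = f32(0x7F800000)   # 0 11111111 0000…0000
neg_inf = f32(0xFF800000)   # 1 11111111 0000…0000
print(pos_inf, neg_inf)     # inf -inf
```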

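#### 12. Not a Number (NaN)

- `x 11111111 M` with $M \neq 0$ is a NaN (x: either sign)
  - `x 11111111 1xxx…xxxx`: quiet NaN
  - `x 11111111 0xxx…xxxx` (with the remaining mantissa bits not all zero): signalling NaN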
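The rules for the all-ones exponent can be written as a classifier over raw bit patterns. This sketch (hypothetical `classify_f32_bits`) works on the integer pattern only, because converting a signalling NaN through hardware floating point may quiet it; it follows the leading-mantissa-bit convention stated on the slide.

```python
def classify_f32_bits(bits: int) -> str:
    """Classify a 32-bit single-precision pattern by its exponent and mantissa."""
    exponent = (bits >> 23) & 0xFF
    mantissa = bits & 0x7FFFFF
    if exponent != 0xFF:
        return "finite"
    if mantissa == 0:
        return "infinity"
    # Leading stored mantissa bit: 1 means quiet; 0 (other bits nonzero) means signalling.
    return "quiet NaN" if mantissa >> 22 else "signalling NaN"

print(classify_f32_bits(0x7FC00000))   # quiet NaN
print(classify_f32_bits(0x7F800001))   # signalling NaN
```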
