Understanding Numeric Processing in Computing

COMS 161 Introduction to Computing Title: Numeric Processing Date: November 08, 2004 Lecture Number: 30

Announcements

Review • Real numbers • Representation • Limitations

Outline • Real numbers • Representation • Limitations

IEEE Standard 754 • Provides two floating point types • Single • 24-bits of significand precision • Double • 53-bits of significand precision
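The two precisions above can be observed directly. The following is a minimal sketch, not part of the lecture, assuming Python's standard `struct` module; `round_to_single` is a hypothetical helper name introduced here. It shows the first integer that no longer fits in 24 bits (single) or 53 bits (double) of significand.

```python
import struct

def round_to_single(x):
    """Round a Python float (a double) to the nearest single-precision value."""
    (rounded,) = struct.unpack(">f", struct.pack(">f", x))
    return rounded

# 24 significand bits: every integer up to 2**24 is exact in single precision,
# but 2**24 + 1 needs a 25th bit and is rounded away.
print(round_to_single(2.0**24) == 2.0**24)          # True
print(round_to_single(2.0**24 + 1) == 2.0**24 + 1)  # False

# 53 significand bits: the same thing happens to doubles at 2**53.
print(float(2**53) == 2**53)          # True
print(float(2**53 + 1) == 2**53 + 1)  # False
```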

Single Precision • IEEE standard 754 • Floating point number representation • 32-bit s eeeeeeee fffffff ffffffffffffffff • Bit positions: s = bit 31, exponent = bits 30-23, significand = bits 22-0 • s: (1) sign bit • 0 means positive, 1 means negative

Single Precision s eeeeeeee fffffff ffffffffffffffff • e: (8) exponent bits [-126 … 127] • A bias of 127 is added to the exponent • f: (24) fractional part [23 bits + 1 implied bit] • Normalize the fractional part • 1 will always be on the left side of the binary point
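To make the three fields concrete, here is a small sketch that splits a value into s, e, and f. It assumes Python's standard `struct` module; the function name `decode_single` is hypothetical and only for illustration.

```python
import struct

def decode_single(x):
    """Return (sign, biased exponent, fraction bits) of x as an IEEE 754 single."""
    # Pack as a big-endian 32-bit float, then reread the same bytes as an unsigned int.
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign = bits >> 31                 # bit 31
    biased_exp = (bits >> 23) & 0xFF  # bits 30..23, bias of 127 still included
    fraction = bits & 0x7FFFFF        # bits 22..0, implied leading 1 is not stored
    return sign, biased_exp, fraction

sign, biased_exp, fraction = decode_single(1.5)
# 1.5 = 1.1 (binary) x 2^0, so the true exponent is 0 and the top fraction bit is set.
print(sign, biased_exp - 127, format(fraction, "023b"))
```

Subtracting 127 from the stored exponent recovers the true exponent, and the printed fraction shows only the 23 stored bits.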

Special Single Cases • Two zeros • Signed zero • e = 0, f = 0 (exponent and fractional bits are all 0) • (-1)^s x 0.0 • 0000 0000 0000 0000 0000 0000 0000 0000 • 0x0000 0000 (+0) • 1000 0000 0000 0000 0000 0000 0000 0000 • 0x8000 0000 (-0)

Special Single Cases • Positive infinity • +INF • s = 0, e = 255, f = 0 (all fractional bits are 0) • 0111 1111 1000 0000 0000 0000 0000 0000 • 0x7f80 0000 • Negative infinity • -INF • s = 1, e = 255, f = 0 (all fractional bits are 0) • 1111 1111 1000 0000 0000 0000 0000 0000 • 0xff80 0000

Special Single Cases • Not-A-Number (NaN) • s = 0 | 1, e = 255, f != 0 (at least one fractional bit is NOT 0) • There are many representations for NaN • Here is one example • 0111 1111 1100 0000 0000 0000 0000 0000 • 0x7fc0 0000
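The rules for zero, infinity, and NaN on the preceding slides can be sketched as a classifier over the raw 32 bits. This is an illustrative sketch, assuming Python's `struct` and `math` modules; `bits_to_single` and `classify_single` are hypothetical helper names.

```python
import math
import struct

def bits_to_single(bits):
    """Reinterpret a 32-bit pattern as an IEEE 754 single."""
    (value,) = struct.unpack(">f", struct.pack(">I", bits))
    return value

def classify_single(bits):
    """Apply the special-case rules to the stored exponent and fraction fields."""
    biased_exp = (bits >> 23) & 0xFF
    fraction = bits & 0x7FFFFF
    if biased_exp == 0 and fraction == 0:
        return "zero"                      # sign bit selects +0 or -0
    if biased_exp == 255:
        return "infinity" if fraction == 0 else "NaN"
    return "other"

for pattern in (0x00000000, 0x80000000, 0x7F800000, 0xFF800000, 0x7FC00000):
    print(f"0x{pattern:08x}", classify_single(pattern), bits_to_single(pattern))
```

Note that +0 and -0 compare equal as numbers; `math.copysign` is one way to tell them apart.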

Special Single Cases • Maximum single number • 0111 1111 0111 1111 1111 1111 1111 1111 • 0x7f7f ffff • 3.40282347 x 10^38 • Minimum positive (normalized) single number • 0000 0000 1000 0000 0000 0000 0000 0000 • 0x00800000 • 1.17549435 x 10^-38 • To represent larger numbers
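The two limits can be cross-checked against their bit patterns. The sketch below assumes Python's `struct` module and a hypothetical helper `bits_to_single`; the closed-form expressions follow from the field layout (all fraction bits set with e = 254, and fraction 0 with e = 1).

```python
import struct

def bits_to_single(bits):
    """Reinterpret a 32-bit pattern as an IEEE 754 single."""
    (value,) = struct.unpack(">f", struct.pack(">I", bits))
    return value

largest = bits_to_single(0x7F7FFFFF)
smallest_normal = bits_to_single(0x00800000)

# Largest: significand 1.111...1 (binary) = 2 - 2^-23, true exponent 254 - 127 = 127.
print(largest == (2 - 2**-23) * 2.0**127)   # True
# Smallest normalized: significand 1.0, true exponent 1 - 127 = -126.
print(smallest_normal == 2.0**-126)         # True

print(f"{largest:.8e}")          # 3.40282347e+38
print(f"{smallest_normal:.8e}")  # 1.17549435e-38
```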

Double Precision • IEEE standard 754 • Floating point number representation • 64-bit s eeeeeeeeeee ffffffffffffffffffffffffffffffffffffffffffffffffffff • Bit positions: s = bit 63, exponent = bits 62-52, significand = bits 51-0 (bits 51-32 in the upper word, bits 31-0 in the lower word) • s: (1) sign bit • 0 means positive, 1 means negative

Double Precision s eeeeeeeeeee ffffffffffffffffffffffffffffffffffffffffffffffffffff • e: (11) exponent bits [-1022 … 1023] • A bias of 1023 is added to the exponent • f: (53) fractional part [52 bits + 1 implied bit] • Normalize the fractional part • 1 will always be on the left side of the binary point

Real (Decimal) Number Storage • Double precision floating point numbers • Byte 0: seeeeeee • Byte 1: eeeeffff • Bytes 2-7: ffffffff each • s: (1) sign bit • e: (11) exponent bits [-1022 … 1023] • f: (53) fractional part [52 bits + 1 implied bit]
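The byte layout of a double can be inspected the same way as a single. This sketch assumes Python's `struct` module and big-endian byte order (byte 0 holds the sign bit); `decode_double` is a hypothetical name used only here.

```python
import struct

def decode_double(x):
    """Return (sign, biased exponent, fraction bits) of x as an IEEE 754 double."""
    (bits,) = struct.unpack(">Q", struct.pack(">d", x))
    sign = bits >> 63                   # bit 63
    biased_exp = (bits >> 52) & 0x7FF   # bits 62..52, bias of 1023 still included
    fraction = bits & ((1 << 52) - 1)   # bits 51..0, implied leading 1 is not stored
    return sign, biased_exp, fraction

# Bytes 0..7 of 1.0: sign 0, biased exponent 1023 (0x3ff), fraction all zeros.
print(struct.pack(">d", 1.0).hex())  # 3ff0000000000000

sign, biased_exp, fraction = decode_double(-24.125)
print(sign, biased_exp - 1023, format(fraction, "052b"))
```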

Special Double Cases • Two zeros • Signed zero • e = 0, f = 0 (exponent and fractional bits are all 0) • (-1)^s x 0.0 • 64 bits • 0000 0000 0000 0000 0000 0000 0000 … 0000 • 0x0000 0000 0000 0000 (+0) • 1000 0000 0000 0000 0000 0000 0000 … 0000 • 0x8000 0000 0000 0000 (-0)

Special Double Cases • Positive infinity • +INF • s = 0, e = 2047, f = 0 (all fractional bits are 0) • 0111 1111 1111 0000 0000 0000 0000 … 0000 • 0x7ff0 0000 0000 0000 • Negative infinity • -INF • s = 1, e = 2047, f = 0 (all fractional bits are 0) • 1111 1111 1111 0000 0000 0000 0000 … 0000 • 0xfff0 0000 0000 0000

Special Double Cases • Not-A-Number (NaN) • s = 0 | 1, e = 2047, f != 0 (at least one fractional bit is NOT 0) • There are many representations for NaN • Here is one example • 0111 1111 1111 1000 0000 0000 0000 … 0000 • 0x7ff8 0000 0000 0000

Special Double Cases • Maximum double number • 0111 1111 1110 1111 1111 1111 1111 … 1111 • 0x7fef ffff ffff ffff • 1.7976931348623157 x 10^308 • Minimum positive (normalized) double number • 0000 0000 0001 0000 0000 0000 0000 … 0000 • 0x0010 0000 0000 0000 • 2.2250738585072014 x 10^-308 • Don’t forget about the implied 1 bit!!
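The double-precision special cases can be checked against what a language runtime reports. The sketch below assumes Python, whose `float` is an IEEE 754 double on common platforms, plus the standard `struct`, `math`, and `sys` modules; `bits_to_double` is a hypothetical helper name.

```python
import math
import struct
import sys

def bits_to_double(bits):
    """Reinterpret a 64-bit pattern as an IEEE 754 double."""
    (value,) = struct.unpack(">d", struct.pack(">Q", bits))
    return value

pos_zero = bits_to_double(0x0000000000000000)
neg_zero = bits_to_double(0x8000000000000000)
pos_inf = bits_to_double(0x7FF0000000000000)
neg_inf = bits_to_double(0xFFF0000000000000)
nan = bits_to_double(0x7FF8000000000000)
largest = bits_to_double(0x7FEFFFFFFFFFFFFF)
smallest_normal = bits_to_double(0x0010000000000000)

print(pos_zero == neg_zero, math.copysign(1.0, neg_zero))  # True -1.0
print(pos_inf, neg_inf, math.isnan(nan))                   # inf -inf True
print(largest == sys.float_info.max)                       # True
print(smallest_normal == sys.float_info.min)               # True
```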

Decimal to Float Conversion • Show -24.125_10 in IEEE single precision format • First, save sign (negative so 1) and convert to binary… • 24.125_10 = 11000.001_2 x 2^0 • Normalize… • = 1.1000001_2 x 2^4 • Strip 1 off the mantissa and extend to form significand • = .10000010000000000000000 • Bias the exponent… • Exp + Bias = 4 + 127 = 131 = 10000011_2

Real (Decimal) Number Storage
Bit:   31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 09 08 07 06 05 04 03 02 01 00
Value:  1  1  0  0  0  0  0  1  1  1  0  0  0  0  0  1  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
• Hex value: 0xC1C10000
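The conversion steps for -24.125 can be replayed in code and compared with the library's own encoding. This is a simplified sketch, assuming Python's `struct` module; `single_bits_by_hand` is a hypothetical name, and it only handles nonzero values in the normalized range, truncating rather than rounding any extra fraction bits.

```python
import struct

def single_bits_by_hand(value):
    """Follow the save-sign, normalize, strip-1, bias-exponent steps for a single."""
    sign = 1 if value < 0 else 0          # save the sign
    magnitude = abs(value)
    exponent = 0
    while magnitude >= 2.0:               # normalize to 1.xxx x 2^exponent
        magnitude /= 2.0
        exponent += 1
    while magnitude < 1.0:
        magnitude *= 2.0
        exponent -= 1
    fraction = int((magnitude - 1.0) * (1 << 23))  # strip the leading 1, keep 23 bits
    biased_exp = exponent + 127                    # bias the exponent
    return (sign << 31) | (biased_exp << 23) | fraction

bits = single_bits_by_hand(-24.125)
print(f"0x{bits:08X}")                      # 0xC1C10000
print(struct.pack(">f", -24.125).hex())     # c1c10000
```

Both lines agree with the hex value on the slide: sign 1, biased exponent 131, and fraction 1000001 followed by zeros.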
