Understanding Floating Point Arithmetic in Computer Architecture

CS 232: Computer Architecture II Prof. Laxmikant (Sanjay) Kale Floating point arithmetic

Floating Point (a brief look)
• We need a way to represent:
  • numbers with fractions, e.g., 3.1416
  • very small numbers, e.g., .000000001
  • very large numbers, e.g., 3.15576 × 10^9
• Representation:
  • sign, exponent, significand: (-1)^sign × significand × 2^exponent
  • more bits for significand gives more accuracy
  • more bits for exponent increases range
• IEEE 754 floating point standard:
  • single precision: 8 bit exponent, 23 bit significand
  • double precision: 11 bit exponent, 52 bit significand
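To make "more bits for significand gives more accuracy" concrete, here is a minimal sketch in Python. It assumes Python's `float` is an IEEE 754 double (true on common platforms) and uses the standard `struct` module to round a value to single precision. The helper name `to_single` is made up for this illustration.

```python
import struct

def to_single(x: float) -> float:
    """Round a Python float (a double) to the nearest single-precision value."""
    return struct.unpack(">f", struct.pack(">f", x))[0]

# 2^24 + 1 needs 25 significant bits; single precision keeps only 24
# (23 stored bits plus the implicit leading 1), so the last bit is lost.
single_keeps_it = to_single(16777217.0) == 16777217.0

# 2^53 + 1 needs 54 significant bits; double precision keeps only 53
# (52 stored bits plus the implicit leading 1).
double_keeps_it = float(2**53 + 1) == 2**53 + 1
```

Both flags come out false: each format silently rounds away the bit that does not fit in its significand.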

Floating point representation
• The idea is to normalize all numbers, so the significand has exactly one digit to the left of the decimal point.
  • 12345 = 1.2345 × 10^4
  • .0000012345 = 1.2345 × 10^-6
• Do this in binary: 1.01110 × 2^(1011)
• IEEE FP representation:
  • (+/-) 1.0101010101010101010101 × 2^(10101010)
  • This is single precision
  • Double precision: 64 bits in all.
• Where does one need accuracy of that level?

Floating point numbers
• Representation issues:
  • sign bit, exponent, significand
  • Question: how to represent each field
  • Question: which order to lay them out in a word?
• Factor: should be easy to do comparisons (for sorting)
  • For arithmetic, we will have special hardware anyway
• Choice:
  • Sign + magnitude representation
  • Sign bit, followed by exponent, then significand (why?)
  • exponent: represented with a “bias”: add 127 (1023 for double precision)
  • significand: assume implicit 1. (so 00001 means 1.00001)
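One way to see why the exponent is biased and placed ahead of the significand: for non-negative numbers, comparing the raw bit patterns as unsigned integers gives the same order as comparing the values. The sketch below checks this on a few sample values. It is only an illustration; `float_bits` is a hypothetical helper, and negative numbers need extra handling because of the sign + magnitude layout.

```python
import struct

def float_bits(x: float) -> int:
    """Return the 32-bit single-precision pattern of x as an unsigned integer."""
    return struct.unpack(">I", struct.pack(">f", x))[0]

# Non-negative sample values, listed in increasing order.
values = [0.0, 1.5e-10, 0.41, 0.75, 1.0, 3.1416, 3.15576e9]
patterns = [float_bits(v) for v in values]

# If the layout is "sortable", the integer patterns are already in order too.
same_order = patterns == sorted(patterns)
```

With the exponent stored in two's complement instead of biased form, small exponents would look like large unsigned numbers and this integer comparison would fail.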

Floating point representation
• So: (+/-) (1 + significand) × 2^(exponent - bias) is the value of a floating point number
• Example: 0 00001000 01010000000000000000000
• Example: convert -.41 to single precision form
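The value formula can be applied to the first example by hand or with a short sketch like the one below. It handles only normalized numbers (exponent field neither all 0s nor all 1s), and `decode_single` is a name chosen for this illustration.

```python
def decode_single(bits: str) -> float:
    """Decode a 32-bit pattern written as 'sign exponent significand'.

    Assumes a normalized number: the exponent field is not 0 or 255.
    """
    bits = bits.replace(" ", "")
    sign = int(bits[0], 2)
    exponent = int(bits[1:9], 2)             # biased exponent, 8 bits
    fraction = int(bits[9:], 2) / 2**23      # the 23 stored bits, as 0.f
    return (-1) ** sign * (1 + fraction) * 2.0 ** (exponent - 127)

value = decode_single("0 00001000 01010000000000000000000")
```

For this pattern the exponent field is 8 and the stored bits 0101 contribute 1/4 + 1/16, so the value is 1.3125 × 2^(8 - 127) = 1.3125 × 2^-119.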

IEEE 754 floating-point standard
• Leading “1” bit of significand is implicit
• Exponent is “biased” to make sorting easier
  • all 0s is smallest exponent, all 1s is largest
  • bias of 127 for single precision and 1023 for double precision
  • summary: (-1)^sign × (1 + significand) × 2^(exponent - bias)
• Example:
  • decimal: -.75 = -3/4 = -3/2^2
  • binary: -.11 = -1.1 × 2^-1
  • floating point: exponent = 126 = 01111110
  • IEEE single precision: 10111111010000000000000000000000
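The -.75 example, and the earlier "convert -.41" exercise, can be reproduced with a small encoder that follows the same steps: normalize, bias the exponent, drop the leading 1. This is a sketch for normalized nonzero values only, not a full IEEE 754 encoder. `encode_single` is a hypothetical helper, and the rounding relies on Python's `round` (round half to even).

```python
import math

def encode_single(x: float) -> str:
    """Return the 32-bit single-precision pattern of x as a bit string.

    Sketch only: assumes x is nonzero and lands in the normalized range.
    """
    sign = 1 if x < 0 else 0
    mantissa, exp2 = math.frexp(abs(x))   # abs(x) = mantissa * 2**exp2, 0.5 <= mantissa < 1
    significand = mantissa * 2            # rescale to the 1.f form
    exponent = exp2 - 1
    fraction = round((significand - 1) * 2**23)   # keep 23 bits after the point
    if fraction == 2**23:                 # rounding carried into the leading 1
        fraction = 0
        exponent += 1
    biased = exponent + 127               # add the single-precision bias
    return f"{sign:01b}{biased:08b}{fraction:023b}"

minus_three_quarters = encode_single(-0.75)
minus_point_41 = encode_single(-0.41)
```

For -.41 the normalized form is -1.64 × 2^-2, so the exponent field is 125 = 01111101, and the significand bits are an approximation because .41 has no finite binary expansion.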

Floating point addition
• The problem is: the exponents of numbers being added may be different
  • 2.0 × 10^1 + 3.0 × 10^(-1)
  • 2.0 × 10^1 + .03 × 10^1: Now we can add them
  • 2.03 × 10^1
• But we are not necessarily done!
  • E.g. 9.74 × 10^0 + 3.3 × 10^(-1)
  • 9.74 × 10^0 + .33 × 10^0 = 10.07 × 10^0 is not correct form!
  • Shift again to get the correct form: 1.007 × 10^1
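The align, add, renormalize sequence can be sketched in decimal with Python's `decimal` module. This is an illustration of the steps only: `add_normalized` is a made-up helper, and it keeps every digit, so it does not model the bits real hardware loses when shifting the smaller operand.

```python
from decimal import Decimal

def add_normalized(a_sig, a_exp, b_sig, b_exp):
    """Add a_sig * 10**a_exp and b_sig * 10**b_exp; return (significand, exponent)."""
    # Step 1: make a the operand with the larger exponent.
    if a_exp < b_exp:
        a_sig, a_exp, b_sig, b_exp = b_sig, b_exp, a_sig, a_exp
    # Step 2: shift the smaller operand right until the exponents match.
    b_sig = b_sig.scaleb(-(a_exp - b_exp))
    # Step 3: add the significands.
    total, exp = a_sig + b_sig, a_exp
    # Step 4: renormalize so exactly one nonzero digit is left of the point.
    while abs(total) >= 10:
        total, exp = total.scaleb(-1), exp + 1
    while total != 0 and abs(total) < 1:
        total, exp = total.scaleb(1), exp - 1
    return total, exp

first = add_normalized(Decimal("2.0"), 1, Decimal("3.0"), -1)
second = add_normalized(Decimal("9.74"), 0, Decimal("3.3"), -1)
```

The first call needs no renormalization. The second overflows to 10.07 and is shifted once more, which raises the exponent by one.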

You can get different results
• A + B + C = A + (B + C) = (A + B) + C
• Right?
• Can you see a problem?
• When do you lose bits?
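A short experiment shows the problem. The sketch below uses Python floats (IEEE 754 doubles on common platforms) with values chosen for illustration: when a small number is added to a much larger one, aligning the exponents shifts the small number's bits out of the significand.

```python
a, b, c = 1e20, -1e20, 1.0

left_first = (a + b) + c    # a and b cancel exactly, then c is added
right_first = a + (b + c)   # c is far below the precision of b and is lost

sums_agree = left_first == right_first
```

Here `left_first` is 1.0 and `right_first` is 0.0, so floating point addition is not associative. Bits are lost whenever the exponents of the operands differ by more than the significand can absorb.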

Floating point multiplication
• Add exponents, but subtract bias
• Then multiply significands
• Then normalize
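The three steps can be sketched directly on the single-precision fields. This is a simplified illustration, not how any particular hardware unit is built: it truncates instead of rounding and ignores zero, denormals, infinities, NaN, and exponent overflow or underflow. The helper names are hypothetical.

```python
import struct

BIAS = 127

def to_bits(x: float) -> int:
    return struct.unpack(">I", struct.pack(">f", x))[0]

def from_bits(b: int) -> float:
    return struct.unpack(">f", struct.pack(">I", b))[0]

def multiply_single(a: float, b: float) -> float:
    """Multiply two normalized single-precision values field by field."""
    a_bits, b_bits = to_bits(a), to_bits(b)
    sign = (a_bits >> 31) ^ (b_bits >> 31)
    # Add exponents, but subtract the bias once (both fields already include it).
    exponent = ((a_bits >> 23) & 0xFF) + ((b_bits >> 23) & 0xFF) - BIAS
    # Multiply the 24-bit significands, with the implicit leading 1 restored.
    sig_a = (a_bits & 0x7FFFFF) | 0x800000
    sig_b = (b_bits & 0x7FFFFF) | 0x800000
    product = sig_a * sig_b               # value in [1, 4) with 46 fraction bits
    # Normalize: if the product is 2 or more, shift right and bump the exponent.
    if product >= 1 << 47:
        product >>= 1
        exponent += 1
    fraction = (product >> 23) & 0x7FFFFF  # drop the leading 1, truncate low bits
    return from_bits((sign << 31) | (exponent << 23) | fraction)

result = multiply_single(1.5, -0.75)
```

For 1.5 × -.75 the exponent fields are 127 and 126, giving 126 after subtracting the bias. The significand product 1.1 × 1.1 = 10.01 in binary needs one normalizing shift, so the result is -1.001 × 2^0 = -1.125.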
