1 / 50

ECE 369

ECE 369. Chapter 3. operation. a. ALU. 32. result. 32. b. 32. Lets Build a Processor, Introduction to Instruction Set Architecture. First Step Into Your Project !!! How could we build a 1-bit ALU for add, and, or ? Need to support the set-on-less-than instruction (slt)

jacie
Télécharger la présentation

ECE 369

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. ECE 369 Chapter 3

  2. operation a ALU 32 result 32 b 32 Lets Build a Processor, Introduction to Instruction Set Architecture • First Step Into Your Project !!! • How could we build a 1-bit ALU for add, and, or? • Need to support the set-on-less-than instruction (slt) • slt is an arithmetic instruction • produces a 1 if a < b and 0 otherwise • use subtraction: (a-b) < 0 implies a < b • Need to support test for equality (beq $t5, $t6, Label) • use subtraction: (a-b) = 0 implies a = b • How could we build a 32-bit ALU? Must Read Appendix

  3. One-bit adder • Takes three input bits and generates two output bits • Multiple bits can be cascaded cout = a.b + a.cin + b.cin sum = a <xor> b <xor> cin

  4. Building a 32 bit ALU

  5. 000 = and001 = or010 = add110 = subtract What about subtraction (a – b) ? • Two's complement approach: just negate b and add. • How do we negate? • A very clever solution: 000 = and001 = or010 = add

  6. Supporting Slt • Can we figure out the idea? 000 = and001 = or010 = add110 = subtract111 = slt

  7. Test for equality • Notice control lines000 = and001 = or010 = add110 = subtract111 = slt • Note: Zero is a 1 if result is zero!

  8. How about “a nor b” 000 = and001 = or010 = add110 = subtract111 = slt

  9. Big Picture

  10. Conclusion • We can build an ALU to support an instruction set • key idea: use multiplexor to select the output we want • we can efficiently perform subtraction using two’s complement • we can replicate a 1-bit ALU to produce a 32-bit ALU • Important points about hardware • all of the gates are always working • speed of a gate is affected by the number of inputs to the gate • speed of a circuit is affected by the number of gates in series (on the “critical path” or the “deepest level of logic”) • Our primary focus: comprehension, however, • Clever changes to organization can improve performance (similar to using better algorithms in software) • How about my instruction smt (set if more than)???

  11. ALU Summary • We can build an ALU to support addition • Our focus is on comprehension, not performance • Real processors use more sophisticated techniques for arithmetic • Where performance is not critical, hardware description languages allow designers to completely automate the creation of hardware!

  12. Multiplication • More complicated than addition • Accomplished via shifting and addition • More time and more area

  13. Multiplication: Implementation

  14. Example

  15. Second version

  16. Example

  17. Final version

  18. Example

  19. Division • Even more complicated • Can be accomplished via shifting and addition/subtraction • More time and more area • Negative numbers: Even more difficult • There are better techniques, we won’t look at them

  20. Division: First version

  21. Example

  22. Division: Second version

  23. Improved Division

  24. Division (7÷2)

  25. Floating point (a brief look) • We need a way to represent • Numbers with fractions, e.g., 3.1416 • Very small numbers, e.g., 0.000000001 • Very large numbers, e.g., 3.15576 x 109 • Representation: • Sign, exponent, fraction: (–1)sign x fraction x 2exponent • More bits for fraction gives more accuracy • More bits for exponent increases range • IEEE 754 floating point standard: • single precision: 8 bit exponent, 23 bit fraction • double precision: 11 bit exponent, 52 bit fraction

  26. IEEE 754 floating-point standard • 1.f x 2e • 1.s1s2s3s4…. snx2e • Leading “1” bit of significand is implicit • Exponent is “biased” to make sorting easier • All 0s is smallest exponent, all 1s is largest • Bias of 127 for single precision and 1023 for double precision

  27. Single Precision • summary: (–1)sign x (1+significand) x 2(exponent– bias) • Example: • 11/100 = 11/102= 0.11 = 1.1x10-1 • Decimal: -.75 = -3/4 = -3/22 • Binary: -.11 = -1.1 x 2-1 • IEEE single precision: 1 01111110 10000000000000000000000 • exponent-bias=-1 => exponent = 126 = 01111110

  28. Opposite Way - 129 0x2-1+1x2-2=0.25

  29. Floating point addition 1.610x10-1 + 9.999x101 0.01610x101 + 9.999x101 10.015x101 1.0015x102 1.002x102

  30. Floating point addition

  31. Add 0.510 and -0.437510

  32. Multiplication

  33. Floating point multiply • To multiply two numbers • Add the two exponent (remember access 127 notation) • Produce the result sign as exor of two signs • Multiply significand portions • Results will be 1x.xxxxx… or 01.xxxx…. • In the first case shift result right and adjust exponent • Round off the result • This may require another normalization step

  34. Multiplication 0.510 and -0.437510

  35. Floating point divide • To divide two numbers • Subtract divisor’s exponent from the dividend’s exponent (remember access 127 notation) • Produce the result sign as exor of two signs • Divide dividend’s significand by divisor’s significand portions • Results will be 1.xxxxx… or 0.1xxxx…. • In the second case shift result left and adjust exponent • Round off the result • This may require another normalization step

  36. Floating point complexities • Operations are somewhat more complicated (see text) • In addition to overflow we can have “underflow” • Accuracy can be a big problem • IEEE 754 keeps two extra bits, guard and round • Four rounding modes • Positive divided by zero yields “infinity” • Zero divide by zero yields “not a number” • Other complexities • Implementing the standard can be tricky • Not using the standard can be even worse • See text for description of 80x86 and Pentium bug!

  37. Chapter Three Summary • Computer arithmetic is constrained by limited precision • Read pages 213-215 • Bit patterns have no inherent meaning but standards do exist • two’s complement • IEEE 754 floating point • Operations are somewhat more complicated (see text) • In addition to overflow we can have “underflow” • Implementing the standard can be tricky • Not using the standard can be even worse (3.10) • See text for description of 80x86 and Pentium bug!

  38. Optional Reading

  39. Overflow

  40. Formulation

  41. A Simpler Formula ?

  42. Problem: Ripple carry adder is slow! • Is a 32-bit ALU as fast as a 1-bit ALU? • Is there more than one way to do addition? • Can you see the ripple? How could you get rid of it? c1 = a0b0 + a0c0 + b0c0 c2 = a1b1 + a1c1 + b1c1 c2 = c3 = a2b2 + a2c2 + b2c2 c3 = c4 = a3b3 + a3c3 + b3c3 c4 = • Not feasible! Why?

  43. Carry Bit

  44. Generate/Propagate 0 0 0 1 0 1 1 1

  45. Generate/Propagate (Ctd.)

  46. Carry-look-ahead adder • Motivation: • If we didn't know the value of carry-in, what could we do? • When would we always generate a carry? gi = ai . bi • When would we propagate the carry? pi = ai + bi • Did we get rid of the ripple? c1 = g0 + p0c0 c2 = g1 + p1c1 c2 = g1 + p1g0 + p1p0c0 c3 = g2 + p2c2 c3 = g2 + p2g1 + p2p1g0 + p2p1p0c0 c4 = g3 + p3c3 c4 = g3 + p3g2 + p3p2g1 + p3p2p1g0 + p3p2p1p0c0 • Feasible! Why? c1 = a0b0 + a0c0 + b0c0 c2 = a1b1 + a1c1 + b1c1 c2 = c3 = a2b2 + a2c2 + b2c2 c3 = c4 = a3b3 + a3c3 + b3c3 c4 = a3 a2 a1 a0 b3 b2 b1 b0

  47. A 4-bit carry look-ahead adder • Generate g and p term for each bit • Use g’s, p’s and carry in to generate all C’s • Also use them to generate block G and P • CLA principle can be used recursively

  48. 16 Bit CLA

  49. 1 1+2 1+2+2 Gate Delay for 16 bit Adder

  50. 64-bit carry lookahead adder

More Related