332:578 Deep Submicron VLSI Design Lecture 17 Functional Units, Multiplication, and Division

David Harris and Mike Bushnell, Harvey Mudd College and Rutgers University, Spring 2005

Outline
• Unsigned vs. Signed Numbers
• Boolean Operations
• Error Correcting Codes
• Multi-input Adders
• Multipliers
• Priority Encoders
• Dividers
• Summary
Material from: CMOS VLSI Design, by Weste and Harris, Addison-Wesley, 2005
Deep Submicron VLSI Des. Lec. 17

Signed vs. Unsigned
• For signed numbers, comparison is harder
• C: carry out
• Z: zero (all bits of A−B are 0)
• N: negative (MSB of result)
• V: overflow (inputs had different signs, and the output sign differs from the sign of A)

Signed vs. Unsigned
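The C, Z, N, and V flags are enough to compare two numbers in either interpretation. The sketch below is a minimal Python model, assuming an n-bit subtractor that computes A − B as A + ~B + 1. The function names and the 4-bit default width are illustrative and do not come from the lecture. It relies on the standard relations: unsigned A < B when there is no carry out, and signed A < B when N and V differ.

```python
def flags(a, b, n=4):
    """Compute A - B as A + ~B + 1 on n bits and return (C, Z, N, V)."""
    mask = (1 << n) - 1
    total = (a & mask) + (~b & mask) + 1
    result = total & mask
    c = total >> n                      # carry out of the adder
    z = int(result == 0)                # all result bits are 0
    neg = result >> (n - 1)             # MSB of the result
    sa = (a >> (n - 1)) & 1             # sign bit of A
    sb = (b >> (n - 1)) & 1             # sign bit of B
    v = int(sa != sb and neg != sa)     # overflow of the subtraction
    return c, z, neg, v


def unsigned_less(a, b, n=4):
    """Unsigned A < B: the subtraction produces no carry out."""
    c, _, _, _ = flags(a, b, n)
    return c == 0


def signed_less(a, b, n=4):
    """Signed A < B: N and V differ."""
    _, _, neg, v = flags(a, b, n)
    return neg != v


if __name__ == "__main__":
    # 0b1000 is 8 unsigned but -8 as a 4-bit signed number.
    print(unsigned_less(0b1000, 0b0111))  # False: 8 < 7 does not hold
    print(signed_less(0b1000, 0b0111))    # True: -8 < 7
```

Equality is simply Z = 1 in both interpretations.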

Boolean Logical Operations
• Use a MUX circuit

Circuit Operation
• Assign different P values to get various Boolean operations
• MUX between the adder and the Boolean unit, or merge the Boolean unit into the adder as in the TTL 74181 ALU
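One way to picture the MUX-based Boolean unit: each bit slice is a 4:1 multiplexer whose select lines are the operand bits and whose data inputs are the four P values. The Python sketch below assumes P is packed into a 4-bit integer with bit (2·a + b) holding the output for inputs (a, b). That ordering and the constant names are assumptions for illustration, since the circuit figure is not reproduced in the text.

```python
def boolean_unit(a, b, p):
    """One bit slice: inputs (a, b) select one of the four bits of p."""
    return (p >> (2 * a + b)) & 1


def boolean_word(a, b, p, n=4):
    """Apply the same P setting to every bit of two n-bit words."""
    out = 0
    for i in range(n):
        ai = (a >> i) & 1
        bi = (b >> i) & 1
        out |= boolean_unit(ai, bi, p) << i
    return out


# P settings for a few operations (truth table read from input 11 down to 00).
AND = 0b1000
OR = 0b1110
XOR = 0b0110
NOR = 0b0001

if __name__ == "__main__":
    print(format(boolean_word(0b1100, 0b1010, XOR), "04b"))
```

Any of the 16 two-input Boolean functions can be selected this way, because P is just the truth table of the desired function.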

Coding
• Correct SRAM/DRAM soft errors due to α particles or cosmic rays
• Reduce bit error rates of communication links
• Parity tree example
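A parity tree is a balanced tree of XOR gates that reduces a word to a single parity bit. The following Python sketch models the tree level by level; the function name is illustrative, and the input is assumed to be a non-empty list of 0/1 values.

```python
def parity_tree(bits):
    """XOR the bits pairwise, level by level, as a tree of XOR gates would."""
    level = list(bits)
    while len(level) > 1:
        nxt = [level[i] ^ level[i + 1] for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:          # an odd bit passes through to the next level
            nxt.append(level[-1])
        level = nxt
    return level[0]


if __name__ == "__main__":
    data = [1, 0, 1, 1]
    p = parity_tree(data)           # parity bit that makes the total parity even
    print(p, parity_tree(data + [p]))
```

Appending the computed bit to the word gives even parity, so a later parity check of 1 signals a single-bit error.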

Hamming Error Correcting Codes (ECCs)
• Hamming distance Hd between 2 numbers: the number of bits in which they differ
• Add check bits to data words for ECC
• Increase the Hd between legal code words
• If an illegal code word is detected, the legal code word closest to it is the corrected word
• Parity has Hd of 2: it detects single-bit errors but cannot correct them
• Make Hd = 3: Hamming code of length 2^c − 1 with c check bits and N = 2^c − c − 1 data bits

Code Generation Procedure
• Number bits from 1 to 2^c − 1
• Each bit in a position that is a power of 2 is a check bit
• Choose each check bit value to get even parity over all bits whose position number has a 1 in the same bit position as the check bit's position number
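The generation procedure can be sketched in a few lines of Python. This is a minimal illustration, assuming data bits fill the non-power-of-2 positions in increasing order; the function names and that fill order are assumptions rather than details given in the lecture. The corrector uses the fact that, with this numbering, XORing the positions of all 1 bits yields 0 for a legal code word and the position of the flipped bit after a single-bit error.

```python
def hamming_encode(data_bits, c):
    """Build a code word of length 2**c - 1 from 2**c - c - 1 data bits."""
    n = (1 << c) - 1
    code = [0] * (n + 1)                 # index 0 unused; positions 1..n
    it = iter(data_bits)
    for pos in range(1, n + 1):
        if pos & (pos - 1):              # not a power of 2: data position
            code[pos] = next(it)
    for k in range(c):
        check = 1 << k                   # check bit position
        parity = 0
        for pos in range(1, n + 1):
            if pos & check and pos != check:
                parity ^= code[pos]
        code[check] = parity             # makes the covered group even parity
    return code[1:]


def hamming_correct(word):
    """Return (corrected word, syndrome); syndrome 0 means no error seen."""
    syndrome = 0
    for pos, bit in enumerate(word, start=1):
        if bit:
            syndrome ^= pos
    fixed = list(word)
    if syndrome:
        fixed[syndrome - 1] ^= 1         # flip the bit the syndrome points at
    return fixed, syndrome


if __name__ == "__main__":
    code = hamming_encode([1, 0, 1, 1], c=3)
    damaged = list(code)
    damaged[4] ^= 1                      # flip position 5
    print(code, hamming_correct(damaged))
```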

Gray Codes
• Binary-reflected code
• Start with all 0s and keep flipping the rightmost bit that gives a new string
• Use to save power in finite state machines: successive states follow the Gray code
• Also use to synchronize counters across clock domains
• The receiver gets either the current or the previous value because only 1 bit changes per clock
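The "flip the rightmost bit that gives a new string" rule can be run directly. The Python sketch below follows that rule literally and compares the result with the closed form i XOR (i >> 1), a well-known expression for the binary-reflected Gray code; the function name is illustrative.

```python
def gray_sequence(n_bits):
    """Generate a Gray code by flipping the rightmost bit that gives a new string."""
    seq = [0]
    seen = {0}
    while True:
        cur = seq[-1]
        for k in range(n_bits):          # k = 0 is the rightmost bit
            cand = cur ^ (1 << k)
            if cand not in seen:
                seq.append(cand)
                seen.add(cand)
                break
        else:                            # no flip gives a new string: done
            return seq


if __name__ == "__main__":
    for value in gray_sequence(3):
        print(format(value, "03b"))
    # Same sequence as the closed form for the binary-reflected code.
    print(gray_sequence(3) == [i ^ (i >> 1) for i in range(8)])
```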

Gray Code

Static XOR/XNOR Circuits

Static XOR/XNOR Circuit
• Does not swing rail-to-rail

Static CMOS XOR

CPL XOR/XNOR Circuit

CVSL XOR/XNOR

Dynamic XOR/XNOR
• Both true and complementary inputs needed
• Violates monotonicity rule
• Solutions:
  • Push XOR/XNOR to the end of the chain of domino logic and build it as static logic
  • Use dual-rail domino logic


Multi-input Adders
• Suppose we want to add k N-bit words
• Ex: 0001 + 0111 + 1101 + 0010 = 10111
• Straightforward solution: k−1 N-bit CPAs
• Large and slow

Carry Save Addition
• A full adder sums 3 inputs and produces 2 outputs
• Carry output has twice the weight of the sum output
• N full adders in parallel are called a carry-save adder (CSA)
• Produce N sums and N carry outs

CSA Application
• Use k−2 stages of CSAs
• Keep result in carry-save redundant form
• Final CPA computes actual result
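The CSA scheme can be checked on the four-word example from the multi-input adder slide. The following Python sketch models a carry-save adder with bitwise operations and chains k−2 of them before a single carry-propagate addition. Function names are illustrative, and Python's unbounded integers stand in for a datapath wide enough to hold the result.

```python
def carry_save_add(x, y, z):
    """One CSA row: a full adder per bit, with no carry propagation."""
    s = x ^ y ^ z                                  # sum bits
    c = ((x & y) | (x & z) | (y & z)) << 1         # carries have twice the weight
    return s, c


def multi_input_add(words):
    """Add k words with k-2 CSA stages followed by one CPA."""
    words = list(words)
    stages = 0
    while len(words) > 2:
        s, c = carry_save_add(words[0], words[1], words[2])
        words = [s, c] + words[3:]                 # 3 words reduced to 2
        stages += 1
    return sum(words), stages                      # final CPA


if __name__ == "__main__":
    total, stages = multi_input_add([0b0001, 0b0111, 0b1101, 0b0010])
    print(format(total, "b"), stages)
```

Each stage removes one word from the list, which is why k words need k−2 stages before only the sum and carry words remain.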

Multiplication
• Example:
• M × N-bit multiplication
• Produce N M-bit partial products
• Sum these to produce an (M+N)-bit product

General Form
• Multiplicand: $Y = (y_{M-1}, y_{M-2}, \ldots, y_1, y_0)$
• Multiplier: $X = (x_{N-1}, x_{N-2}, \ldots, x_1, x_0)$
• Product: $P = \sum_{j=0}^{M-1} \sum_{i=0}^{N-1} x_i y_j 2^{i+j}$
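The general form says that each multiplier bit selects either the multiplicand or zero, shifted to that bit's weight. A minimal Python sketch of that view for unsigned operands follows; the function names and the parameter n (the number of multiplier bits N) are illustrative.

```python
def partial_products(y, x, n):
    """One partial product per multiplier bit x_i: (x_i AND Y) shifted left by i."""
    return [(y if (x >> i) & 1 else 0) << i for i in range(n)]


def multiply(y, x, n):
    """Unsigned product as the sum of the n partial products."""
    return sum(partial_products(y, x, n))


if __name__ == "__main__":
    pps = partial_products(0b1100, 0b0101, 4)
    print([format(pp, "08b") for pp in pps])
    print(multiply(0b1100, 0b0101, 4))
```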

16 × 16 Multiplier Dot Diagram
• Each dot represents a bit

Array Multiplier

Rectangular Array
• Squash array to fit rectangular floorplan

Optimizations
• 1st row adds 1st partial product to a pair of 0’s
• Change first CSA row to add the first 3 partial products together
• Reduces row count by 2 and reduces adder propagation delay
• Can also use 1st row of CSAs to add one or two other inputs with no extra delay
• Most common DSP operation: Y = A · B + C
• Speed up by replacing the bottommost row with a faster CPA, such as a lookahead or tree adder
• Asymmetric circuit: some inputs have more logical effort than others

2’s Complement Multiplication
• 2 partial products have negative weight
• Must be subtracted
• Baugh-Wooley algorithm takes the 2’s complement of the terms to be subtracted
• In the example, AND gates are replaced by NAND gates in hatched cells
• Extra ones are added in unused inputs to take the correct 2’s complement
• Use XORs to conditionally invert some of the terms to select between signed and unsigned multiplication

2’s Comp. Multiplier

Simplified Partial Products

Modified Baugh-Wooley
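The NAND substitution and the extra ones can be verified numerically. The Python sketch below follows the commonly described modified Baugh-Wooley arrangement for two n-bit operands: product terms that involve exactly one sign bit are inverted, and constant ones are added at bit positions n and 2n−1. Since the multiplier figures are not reproduced in the text, the placement of these constants is stated here as an assumption about the standard form, not as the exact cell layout of the lecture's example.

```python
def baugh_wooley(x, y, n):
    """Multiply two n-bit two's complement bit patterns.

    Returns the 2n-bit two's complement pattern of the product.
    """
    xb = [(x >> i) & 1 for i in range(n)]
    yb = [(y >> i) & 1 for i in range(n)]
    total = 0
    for i in range(n):
        for j in range(n):
            bit = xb[i] & yb[j]
            # Exactly one operand bit is a sign bit: the term has negative
            # weight, so the AND gate becomes a NAND gate.
            if (i == n - 1) != (j == n - 1):
                bit ^= 1
            total += bit << (i + j)
    # Extra ones that complete the two's complement of the subtracted terms.
    total += (1 << n) + (1 << (2 * n - 1))
    return total & ((1 << (2 * n)) - 1)


if __name__ == "__main__":
    # -3 * 5 with 4-bit operands; the 8-bit result pattern encodes -15.
    print(format(baugh_wooley(-3 & 0xF, 5, 4), "08b"))
```

All partial-product bits now have positive weight, so the array can be summed with ordinary CSA rows.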

Fewer Partial Products
• Array multiplier requires N partial products
• If we looked at groups of r bits, we could form N/r partial products
• Faster and smaller?
• Called radix-2^r encoding
• Ex: r = 2: look at pairs of bits
• Form partial products of 0, Y, 2Y, 3Y
• First three are easy, but 3Y requires an adder

Booth Encoding
• Instead of 3Y, try −Y, then increment next partial product to add 4Y
• Similarly, for 2Y, try −2Y + 4Y in next partial product
[Table not reproduced; column headings: Current, Prev.]

Booth Hardware
• Booth encoder generates control lines for each PP
• Booth selectors choose PP bits
• Xi means add in Y
• 2Xi means add in 2Y
• M means negate partial product
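The recoding idea (replace 3Y by −Y plus 4Y carried into the next partial product) can be sketched as a digit recoder. The Python example below assumes radix-4 Booth encoding of an n-bit two's complement multiplier with n even; each digit is computed from the current bit pair and the top bit of the previous pair. The function names are illustrative, and the mapping to the control lines (Xi for a digit of magnitude 1, 2Xi for magnitude 2, M for a negative digit) is an assumed reading of the slide rather than a gate-level description.

```python
def booth_radix4_digits(x, n):
    """Recode an n-bit two's complement multiplier into digits in {-2, ..., 2}.

    Digits are returned least significant first; n is assumed to be even.
    """
    digits = []
    prev = 0                                   # bit to the right of the first pair
    for i in range(0, n, 2):
        b0 = (x >> i) & 1
        b1 = (x >> (i + 1)) & 1
        digits.append(b0 + prev - 2 * b1)      # e.g. pair 11 with prev 0 gives -1
        prev = b1                              # carries +4Y into the next digit
    return digits


def booth_multiply(y, x, n):
    """Sum the n/2 partial products d_k * Y * 4**k."""
    return sum(d * y * 4 ** k for k, d in enumerate(booth_radix4_digits(x, n)))


if __name__ == "__main__":
    print(booth_radix4_digits(0b0011, 4))      # 3 is recoded as -1 + 4
    print(booth_multiply(5, 0b0011, 4))
```

Only multiples 0, ±Y, and ±2Y appear, so every partial product can be formed with a shift and an optional negation instead of an adder.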
