Shift Operations

Shift Operations Source: David Harris Aug 2007

Shifter Implementation Regular layout, can be compact; uses transmission gates to avoid a threshold drop. Not amenable to synthesis; high capacitive loading for large arrays. Source: David Harris

Shifter Implementation Each level shifts by a power of two (1, 2, 4, ...), selected by one bit of the shift amount. Amenable to synthesis, fast.
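A minimal Python sketch of a multi-level shifter, assuming stage k shifts by 2^k when bit k of the shift amount is set. The function name `log_shift_left` and the default 8-bit width are illustrative choices, not taken from the slides.

```python
def log_shift_left(value, shamt, width=8):
    """Logical left shift built from log2(width) conditional stages."""
    mask = (1 << width) - 1
    stage = 0
    while (1 << stage) < width:
        # Stage k either passes the value through or shifts it by 2**k,
        # depending on bit k of the shift amount (a 2:1 mux per bit in hardware).
        if (shamt >> stage) & 1:
            value = (value << (1 << stage)) & mask
        stage += 1
    return value
```

For an 8-bit operand this uses three stages (shift by 1, 2, 4), so any shift from 0 to 7 passes through the same number of levels.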

Multiplication Source: David Harris

Array Multiplier with CPAs Array of carry-propagate adders (CPAs); it has multiple near-critical paths. Source: Jan Rabaey

Array Multiplier with CSAs Only one critical path. Source: Jan Rabaey

How do CSAs work? CSA: carry-save adder. We want to add these four numbers together (the same problem as adding partial products in a multiplier). Source: David Harris

How do CSAs work? (cont) A full-adder network can add three numbers together if we view the carry-in inputs as a bus that carries the third number. The network produces a sum vector and a carry vector, and these have to be added to produce the final result. Source: David Harris

How do CSAs work? (cont) The carry vector has to be shifted left by 1 before being added to the sum vector, because each COUT bit has twice the weight of the corresponding sum bit. Source: David Harris
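The carry-save step can be sketched in Python by treating each integer as a bit vector, so that one bitwise expression models a whole row of full adders. The names `carry_save_add` and `add_four` are hypothetical, chosen for illustration.

```python
def carry_save_add(a, b, c):
    """One row of full adders: three input vectors in, (sum, carry) out."""
    sum_vec = a ^ b ^ c                               # per-bit sum output
    carry_vec = ((a & b) | (a & c) | (b & c)) << 1    # per-bit COUT, shifted left by 1
    return sum_vec, carry_vec


def add_four(w, x, y, z):
    """Add four numbers with two CSA levels and one final carry-propagate add."""
    s1, c1 = carry_save_add(w, x, y)
    s2, c2 = carry_save_add(s1, c1, z)
    return s2 + c2    # the only place a carry ripples
```

No carry propagates inside `carry_save_add`; the propagation is deferred to the single addition at the end.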

CSA Multiplier The carry is shifted left before being added. The final addition is always N/2 bits wide if the product has N bits. For large multipliers, a fast adder structure is needed for this addition. Source: Jan Rabaey
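As a behavioral sketch (not a gate-level model of the array in the figure), the following Python accumulates the partial products of an unsigned multiply in carry-save form and performs one carry-propagate addition at the end. `csa_multiply` and its parameter `n` are hypothetical names.

```python
def carry_save_add(a, b, c):
    """Row of full adders; the carry vector is already shifted left by 1."""
    return a ^ b ^ c, ((a & b) | (a & c) | (b & c)) << 1


def csa_multiply(x, y, n=8):
    """Unsigned n-bit multiply: one CSA row per partial product, one final CPA."""
    sum_vec, carry_vec = 0, 0
    for i in range(n):
        # Partial product i is Y ANDed with bit i of X, weighted by 2**i.
        pp = (y << i) if (x >> i) & 1 else 0
        sum_vec, carry_vec = carry_save_add(sum_vec, carry_vec, pp)
    return sum_vec + carry_vec    # final carry-propagate addition
```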

Multiplier Layout Layout can be made rectangular. Source: David Harris

2’s Complement Multiply Definition: the MSb has negative weight. 4-bit 2’s complement example: -5 = 0xB = 1011 = -1*2^3 + 0*2^2 + 1*2^1 + 1*2^0 = -8 + 0 + 2 + 1 = -5. Source: David Harris
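The negative-weight definition can be checked with a short Python sketch. The helper name `twos_complement_value` and the use of a bit string as input are illustrative assumptions.

```python
def twos_complement_value(bits):
    """Evaluate a 2's complement bit string, MSb first, using bit weights."""
    n = len(bits)
    value = -int(bits[0]) * 2 ** (n - 1)          # MSb carries negative weight
    for i, bit in enumerate(bits[1:], start=1):
        value += int(bit) * 2 ** (n - 1 - i)      # remaining bits carry positive weight
    return value


print(twos_complement_value("1011"))  # -8 + 0 + 2 + 1 = -5
```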

2’s Complement Multiplication Source: David Harris

Modified Baugh-Wooley Multiplier (2’s complement) Pre-compute the sums of the constant ‘1’ terms; push some terms upwards. Source: David Harris

Multiplier Layout For Two’s Complement Shaded cells are the cells modified for Baugh-Wooley. Source: David Harris

Booth Encoding Previous multipliers use radix 2: one bit of the multiplier is examined at a time. In general, radix-2^r multipliers produce N/r partial products (assuming an NxN multiplier). Fewer partial products lead to smaller/faster CSA arrays. A radix-4 (radix-2^2) multiplier produces N/2 partial products. Each partial product is Y times a two-bit group X1X0 of the multiplier, so it is one of Y*0, Y*1, Y*2, Y*3. Y*0, Y*1, Y*2 are easy/fast (Y*2 is a shift). Y*3 is hard: it has to be computed as Y*3 = Y*(2+1) = 2Y + Y, which involves a carry-propagate addition.

Radix-4 Partial Products Y * (XN-1XN-2 ... X3X2 X1X0) is formed as the rows Y*X1X0 + Y*X3X2 + ... + Y*XN-1XN-2, each row shifted left by two bits relative to the previous one. Number of partial products is reduced. Source: David Harris

Booth Encoding (cont.) Observe that 2Y = 4Y - 2Y and 3Y = 4Y - Y. 4Y is simply Y in the next row of the partial-product array, so just add Y to the next row. In both cases the current row contributes a negative multiple (-2Y or -Y) and an extra Y has to be added to the following partial product. Booth encoding therefore looks at the current 2 bits and the MSB of the previous 2 bits, and modifies the partial product: if the MSB of the previous pair is ‘1’, add Y to the current value.

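Booth Encoding (cont) Partial product (PP) for each value of the current two bits followed by the MSB of the previous pair:
000: PP = 0*Y
001: PP = 0*Y + Y = Y
010: PP = Y + 0 = Y
011: PP = Y + Y = 2Y
100: PP = -2Y + 0 = -2Y
101: PP = -2Y + Y = -Y
110: PP = -Y + 0 = -Y
111: PP = -Y + Y = 0
Negative operations are done at bit level as complements, with +1 added to the PP to complete the 2’s complement. The encoder outputs are a 1Y select, a 2Y select, and a sign-bit select. Source: David Harris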
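The encoding can be sketched in Python as a lookup on the current two bits plus the MSB of the previous pair. This is a behavioral model for an unsigned multiplier with an even bit width `n`; the names `BOOTH_TABLE`, `booth_digits`, and `booth_multiply` are hypothetical. One extra group is examined beyond the top bits so that a pending +Y from the last pair is not lost.

```python
# (current MSB, current LSB, MSB of previous pair) -> multiple of Y
BOOTH_TABLE = {
    (0, 0, 0): 0,  (0, 0, 1): 1,
    (0, 1, 0): 1,  (0, 1, 1): 2,
    (1, 0, 0): -2, (1, 0, 1): -1,
    (1, 1, 0): -1, (1, 1, 1): 0,
}


def booth_digits(x, n):
    """Radix-4 Booth digits of an unsigned n-bit x (n even), least significant first."""
    digits = []
    prev_msb = 0
    for i in range(0, n + 2, 2):          # n/2 groups plus one extra group
        lsb = (x >> i) & 1
        msb = (x >> (i + 1)) & 1
        digits.append(BOOTH_TABLE[(msb, lsb, prev_msb)])
        prev_msb = msb
    return digits


def booth_multiply(x, y, n=16):
    """Sum the selected multiples of y, each row weighted by 4**row."""
    return sum(d * y * 4 ** k for k, d in enumerate(booth_digits(x, n)))
```

For a 16-bit multiplier this produces nine digits, i.e. eight regular partial products plus the extra one.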

Booth Selection Logic Replaces the AND gates in the CSA array. When -Y is chosen, a ‘1’ has to be added to complete the two’s complement. Source: David Harris

Unsigned R-4 Booth Array (16 x 16) Sign extension: either all 1’s or all 0’s for -Y terms. Extra PP: needed in case the last PP requires a ‘Y’ to be added in (the last two X bits were 2 or 3). ‘1’ or ‘0’: needed to complete the 2’s complement. Source: David Harris

Optimized R-4 Booth Array (unsigned) SSSS = 1111 + S', where S' is the complement of S (1111 + 1 wraps to 0000); additional reduction produces this array. Source: David Harris

Signed R-4 Booth Array (16 x 16) ei = Mi xor y15. The last partial product, PP8, is not needed for signed multiply. Source: David Harris

Booth Speedup
• Radix-4 arrays are 20 to 50% smaller than CSA arrays and up to 20% faster.
• Higher-radix multipliers are possible, but not worth it except for larger multipliers (at least 64 bits).

Wallace Trees A CSA array just adds the PPs together one at a time. A (3,2) counter is another name for a full adder. Source: David Harris

Wallace Trees (cont.) A Wallace tree adds the partial products in parallel! Number of levels is log_{3/2}(N/2). Layout is not regular; long wires can cause delay. Source: David Harris
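To see the benefit of adding in parallel, the Python sketch below counts how many levels of full adders are needed to reduce a given number of partial-product rows to the two rows fed to the final adder. It assumes each level groups rows in threes and passes leftover rows through unchanged; `wallace_levels` and `linear_levels` are hypothetical names.

```python
def wallace_levels(rows):
    """Levels of 3:2 counters needed to reduce `rows` partial products to 2."""
    levels = 0
    while rows > 2:
        # Every group of 3 rows becomes 2 rows; leftover rows pass through.
        rows = 2 * (rows // 3) + rows % 3
        levels += 1
    return levels


def linear_levels(rows):
    """Levels when a CSA array absorbs one extra partial product per level."""
    return max(rows - 2, 0)


print(wallace_levels(16), linear_levels(16))  # 6 14
```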

4-2 Compressor Used to reduce the number of levels in a Wallace tree. Number of levels is log_2(N/2). Layout is more regular. Logic is more complex than a full adder. Source: David Harris
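One common way to build a 4-2 compressor is from two cascaded full adders. The Python sketch below models that construction at the bit level; it is an assumption for illustration and not necessarily the circuit shown on the slide. The names `full_adder` and `compressor_4_2` are hypothetical.

```python
def full_adder(a, b, c):
    """Return (sum, carry) for three input bits."""
    return a ^ b ^ c, (a & b) | (a & c) | (b & c)


def compressor_4_2(a, b, c, d, cin):
    """Compress four same-weight bits plus cin into sum, carry and cout.

    sum has weight 1; carry and cout both have weight 2.
    cout depends only on a, b, c, so it does not ripple along the row.
    """
    s1, cout = full_adder(a, b, c)
    s, carry = full_adder(s1, d, cin)
    return s, carry, cout
```

The defining property is a + b + c + d + cin = sum + 2*(carry + cout), which holds for every input combination.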

Multiplier Summary
• CSAs: simple, but many partial products
• Booth Encoding: reduces the number of required PPs, achieves speedup over CSAs
• Wallace Trees: add PPs in parallel
