Advanced Adder Design Techniques for Efficient Arithmetic Operations

CSE 246: Computer Arithmetic Algorithms and Hardware Design Lecture 4: Adders Instructor: Prof. Chung-Kuan Cheng

Topics: • Adders • AND/OR gate vs. Circuit • Logic Design • Graph Design (Prefix Adder)

Chapter 2: ADDERS • Half Adders • A half adder adds two 1-bit binary numbers when there is no carry-in. • If the inputs are x[i] and y[i], the sum and carry-out are given by • s[i] = x[i] ^ y[i] • c[i+1] = x[i] . y[i] • We use the following notation throughout the slides • . means logical AND • + means logical OR • ^ means logical XOR • ’ means complementation

Full Adder • The inputs are x[i], y[i] (operand bits) and c[i] (carry in) • The outputs are s[i] (result bit) and c[i+1] (carry out) • Inputs and outputs are related by these relations • s[i] = x[i] ^ y[i] ^ c[i] • c[i+1] = x[i].y[i] + c[i].(x[i] + y[i]) = x[i].y[i] + c[i].(x[i] ^ y[i])

Full Adder • If the carry-in bit is zero, the full adder reduces to a half adder • If the carry-in bit is one, then • s[i] = (x[i] ^ y[i])’ • c[i+1] = x[i] + y[i] • To add two n-bit numbers, we can chain n full adders to build a ripple carry adder
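The half-adder and full-adder relations above can be checked exhaustively. This is a minimal sketch; the function names `half_adder` and `full_adder` are illustrative and do not come from the slides.

```python
from itertools import product

def half_adder(x, y):
    """Return (sum, carry_out) for two 1-bit inputs with no carry-in."""
    return x ^ y, x & y

def full_adder(x, y, c):
    """Return (sum, carry_out) for operand bits x, y and carry-in c."""
    s = x ^ y ^ c
    c_out = (x & y) | (c & (x | y))
    return s, c_out

for x, y, c in product((0, 1), repeat=3):
    s, c_out = full_adder(x, y, c)
    # The two outputs encode the arithmetic sum of the three input bits.
    assert 2 * c_out + s == x + y + c
    # The alternative carry form with XOR gives the same value.
    assert c_out == (x & y) | (c & (x ^ y))

for x, y in product((0, 1), repeat=2):
    # Carry-in 0: the full adder behaves as a half adder.
    assert full_adder(x, y, 0) == half_adder(x, y)
    # Carry-in 1: sum is the complement of x ^ y, carry is x + y.
    assert full_adder(x, y, 1) == ((x ^ y) ^ 1, x | y)
```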

Ripple Carry Adder • [Figure: chain of n full adders; stage i takes x[i], y[i] and carry-in c[i] and produces s[i] and c[i+1]; cin = c[0] enters stage 0 and cout leaves stage n-1] • Overflow happens when the operands have the same sign and the result has a different sign. • If we use 2’s complement to represent negative numbers, overflow occurs when (cout ^ c[n-1]) is 1
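A minimal sketch of the ripple carry adder and the overflow test, assuming operands are passed as unsigned n-bit integers. The names `ripple_carry_add` and `to_signed` are illustrative.

```python
def ripple_carry_add(x, y, n, cin=0):
    """Add two n-bit words stage by stage.

    Returns (sum_word, cout, overflow), where overflow = cout ^ c[n-1]
    and c[n-1] is the carry into the most significant stage.
    """
    c = cin
    s = 0
    c_into_msb = cin
    for i in range(n):
        xi, yi = (x >> i) & 1, (y >> i) & 1
        if i == n - 1:
            c_into_msb = c                      # this is c[n-1]
        s |= (xi ^ yi ^ c) << i                 # s[i] = x[i] ^ y[i] ^ c[i]
        c = (xi & yi) | (c & (xi | yi))         # c[i+1]
    return s, c, c ^ c_into_msb

def to_signed(w, n):
    """Interpret an n-bit word as a 2's complement integer."""
    return w - (1 << n) if (w >> (n - 1)) & 1 else w

# 7 + 1 does not fit in 4-bit 2's complement, so the overflow flag is set.
print(ripple_carry_add(0b0111, 0b0001, 4))
```

Comparing the flag against the signed sum for every 4-bit operand pair is a quick way to confirm the cout ^ c[n-1] rule.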

Ripple Carry Adder • For the sake of brevity, we use the following notation: • g[i] = x[i].y[i] • p[i] = x[i] + y[i] • In this notation, the carry equations become • c[1] = g[0] + p[0].c[0] • c[2] = g[1] + p[1].c[1] • and so on… • We shall use this notation when discussing the design of other kinds of adders • It has been observed that the expected length of a carry chain is 2, while the expected maximal length of a carry chain is lg n. Hence ripple carry adders are fast on average, even though the worst case still ripples through all n stages.
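The carry recurrence in g/p notation can be sketched as follows. The `use_xor` flag is an illustrative addition: it switches p[i] from x[i] + y[i] to x[i] ^ y[i], the form used in some later slides, and the check shows both give the same carries.

```python
def carries(x, y, n, c0=0, use_xor=False):
    """Return [c[0], ..., c[n]] from c[i+1] = g[i] + p[i].c[i]."""
    c = [c0]
    for i in range(n):
        xi, yi = (x >> i) & 1, (y >> i) & 1
        g = xi & yi
        p = (xi ^ yi) if use_xor else (xi | yi)
        c.append(g | (p & c[i]))
    return c

# Both definitions of p give identical carries, because whenever
# x[i] + y[i] and x[i] ^ y[i] differ, g[i] is already 1.
for x in range(16):
    for y in range(16):
        assert carries(x, y, 4) == carries(x, y, 4, use_xor=True)

print(carries(0b0111, 0b0001, 4))
```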

Ripple Carry Adder • How do we know that an adder has completed the operation? • Worst case scenario: Wait for the longest chain in the carry propagation network • We might inspect c[i+1] and its complement b[i+1] to determine the status of the adder
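One way to read the c[i+1] / b[i+1] idea is as dual-rail completion detection: both signals start at 0 ("not yet known"), and a stage is finished once either one is asserted. The sketch below is an assumption about how the two signals are used, not a circuit taken from the slides; it counts unit stage delays until every carry is resolved.

```python
def completion_steps(x, y, n, cin=0):
    """Count stage delays until every c[i+1] or b[i+1] is asserted.

    c[i] = 1 means "carry into bit i is known to be 1";
    b[i] = 1 means "carry into bit i is known to be 0".
    Returns (steps, c).
    """
    c = [cin] + [0] * n
    b = [1 - cin] + [0] * n
    steps = 0
    while not all(c[i] | b[i] for i in range(1, n + 1)):
        new_c, new_b = c[:], b[:]
        for i in range(n):
            xi, yi = (x >> i) & 1, (y >> i) & 1
            g = xi & yi                  # carry generated locally
            a = (1 - xi) & (1 - yi)      # carry absorbed locally
            t = xi ^ yi                  # carry passed through
            new_c[i + 1] = g | (t & c[i])
            new_b[i + 1] = a | (t & b[i])
        c, b = new_c, new_b
        steps += 1
    return steps, c

print(completion_steps(0b0101, 0b0101, 4))     # every stage decides locally
print(completion_steps(0b1111, 0b0000, 4, 1))  # carry ripples through all stages
```

In this model the completion time equals one plus the longest run of pass-through stages that must wait for a carry, which is why the average case finishes much earlier than the worst case.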

Improvement to Ripple Carry Adder: Manchester Adders • By intelligently using our device properties, we can reduce the complexity of the circuit used to compute carries in a ripple carry adder. • Define: a[i] = (x[i])’.(y[i])’ • c[i+1] is 1 in exactly these scenarios: • g[i] is 1, i.e. both x[i] and y[i] are 1 • c[i] is 1 and it is propagated because p[i] is 1 • Conversely, c[i+1] is ‘pulled down’ to logic 0 irrespective of the value of c[i] when a[i] is 1, i.e. both x[i] and y[i] are 0 • From these conditions, and keeping in mind the general characteristics of transistor devices, we can design simplified circuits for computing carries, as shown in the next slide
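The three conditions above can be modelled at switch level: each carry node is pulled up, pulled down, or connected to the previous node. This is a behavioural sketch only, with illustrative names; it says nothing about the actual transistor sizing or timing.

```python
def manchester_carries(x, y, n, c0=0):
    """Return [c[0], ..., c[n]] using generate / absorb / pass switches."""
    c = [c0]
    for i in range(n):
        xi, yi = (x >> i) & 1, (y >> i) & 1
        g = xi & yi                  # both inputs 1
        a = (1 - xi) & (1 - yi)      # both inputs 0
        if g:
            c.append(1)              # node forced to logic 1
        elif a:
            c.append(0)              # node pulled down to logic 0
        else:
            c.append(c[i])           # pass switch: c[i+1] follows c[i]
    return c

print(manchester_carries(0b0110, 0b0011, 4))
```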

Improvement to Ripple Carry Adder: Manchester Adders

Implementation of Manchester Adder using MOS transistors • This is essentially the same circuit for computing carry, but implemented with MOS devices

Manchester Adder: Alternate design • We divide the computation cycle into two distinct half-cycles: ‘precharge’ and ‘evaluate’. In the precharge half-cycle, g[i] and c[i+1] are assigned a tentative value of logic 1. This is evaluated in the next half-cycle with the actual value of a[i]. • The actual circuit for computing carries is shown in the next slide.

Manchester Adder: Alternate design • [Figure: timing diagram of signal Q over time, alternating between precharge and evaluation half-cycles]

Carry Look-ahead Adder • In a carry look-ahead adder, m full adders are grouped together (m is usually equal to 4). Once the carry-in to the group is known, all the internal carries and the output carry are calculated simultaneously. • We can use some algebraic manipulations to minimize hardware complexity. • Consider the carry out of the group • c[i] = g[i-1] + p[i-1].c[i-1] • Substituting the value of c[i-1], we can rewrite this as c[i] = g[i-1] + p[i-1].g[i-2] + p[i-1].p[i-2].c[i-2] • Proceeding in this manner we get c[i] = g[i-1] + p[i-1].g[i-2] + p[i-1].p[i-2].g[i-3] + p[i-1].p[i-2].p[i-3].g[i-4] + p[i-1].p[i-2].p[i-3].p[i-4].c[i-4] • To further simplify the equation, we note that g[i-1] = g[i-1].p[i-1] (with p[i] = x[i] + y[i]), so p[i-1] can be factored out
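The expanded group-carry expression can be checked against ordinary addition for a 4-bit group. This is a minimal sketch; `group_carry_out` is an illustrative name, and the lists are indexed from the least significant bit of the group.

```python
def group_carry_out(g, p, c_in):
    """Carry out of a 4-bit group from the expanded look-ahead formula.

    g and p are lists [g0, g1, g2, g3]; index 0 is the least significant bit.
    All five product terms depend only on g, p and c_in, so they can be
    evaluated in parallel rather than rippling.
    """
    return (g[3]
            | (p[3] & g[2])
            | (p[3] & p[2] & g[1])
            | (p[3] & p[2] & p[1] & g[0])
            | (p[3] & p[2] & p[1] & p[0] & c_in))

def gp_bits(x, y, n=4):
    """Return (g, p) bit lists with g[i] = x[i].y[i] and p[i] = x[i] + y[i]."""
    g = [((x >> i) & (y >> i)) & 1 for i in range(n)]
    p = [((x >> i) | (y >> i)) & 1 for i in range(n)]
    return g, p

g, p = gp_bits(0b1010, 0b0110)
print(group_carry_out(g, p, 0))
```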

Ling’s Adder • Start from c[i] = g[i-1] + p[i-1].g[i-2] + p[i-1].p[i-2].g[i-3] + p[i-1].p[i-2].p[i-3].g[i-4] + p[i-1].p[i-2].p[i-3].p[i-4].c[i-4] • We replace p[i] = x[i] ^ y[i] with t[i] = x[i] + y[i]. Because g[i] = g[i].t[i], we have c[i] = g[i-1].t[i-1] + t[i-1].g[i-2] + t[i-1].t[i-2].g[i-3] + t[i-1].t[i-2].t[i-3].g[i-4] + t[i-1].t[i-2].t[i-3].t[i-4].c[i-4] • Let h[i] = g[i-1] + g[i-2] + t[i-2].g[i-3] + t[i-2].t[i-3].g[i-4] + t[i-2].t[i-3].t[i-4].t[i-5].h[i-4] • Then c[i] = h[i].t[i-1]

Ling’s Adder • h[0] = c[0] • h[3] = g[2] + g[1] + t[1].g[0] + t[1].t[0].h[0] • s[3] = p[3] ^ c[3] = p[3] ^ (h[3].t[2]) = p[3]’.h[3].t[2] + p[3].(h[3]’ + t[2]’) = h[3]’.p[3] + h[3].(p[3] ^ t[2]) • h[6] = g[5] + g[4] + t[4].g[3] + t[4].t[3].t[2].h[3] • s[6] = h[6]’.p[6] + h[6].(p[6] ^ t[5])
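The Ling recurrences for 3-bit groups can be checked against ordinary addition. In this sketch p[i] is the XOR form and t[i] the OR form, as on the Ling slides; the sum bits use s[i] = h[i]’.p[i] + h[i].(p[i] ^ t[i-1]), and the function name `ling_bits` is illustrative.

```python
def ling_bits(x, y, c0=0):
    """Return (c3, s3, c6, s6) computed through Ling's h signals."""
    bit = lambda w, i: (w >> i) & 1
    g = [bit(x, i) & bit(y, i) for i in range(7)]
    t = [bit(x, i) | bit(y, i) for i in range(7)]
    p = [bit(x, i) ^ bit(y, i) for i in range(7)]

    h0 = c0
    h3 = g[2] | g[1] | (t[1] & g[0]) | (t[1] & t[0] & h0)
    h6 = g[5] | g[4] | (t[4] & g[3]) | (t[4] & t[3] & t[2] & h3)

    c3 = h3 & t[2]                                   # c[i] = h[i].t[i-1]
    c6 = h6 & t[5]
    s3 = ((1 - h3) & p[3]) | (h3 & (p[3] ^ t[2]))    # equals p[3] ^ c[3]
    s6 = ((1 - h6) & p[6]) | (h6 & (p[6] ^ t[5]))    # equals p[6] ^ c[6]
    return c3, s3, c6, s6

print(ling_bits(0b0000111, 0b0000001))
```

The point of the rewrite is that h[i] has fewer literals per term than c[i], while the extra t[i-1] factor is absorbed into the sum logic.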

Generalized Design for Adders: Prefix Adder • Prefix computation • Given n inputs x[1], x[2], x[3], …, x[n] and an associative operator ×, we want to compute y[i] = x[i] × x[i-1] × x[i-2] × … × x[2] × x[1] for all i, 1 ≤ i ≤ n • Each x[i] can be a scalar, vector, or matrix • For the design of adders, we define the operator × in the following manner (here the primes label the two operands rather than complementation, and (g’’, p’’) is the more significant operand) • (g, p) = (g’’, p’’) × (g’, p’) • g = g’’ + p’’.g’ • p = p’’.p’
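A minimal sketch of the (g, p) operator and a serial prefix scan, taking the left operand as the more significant group so that y[i] = x[i] × … × x[1] yields the group generate signal. The names `combine`, `prefix_scan` and `carries_by_prefix` are illustrative, and bit positions are 0-indexed here.

```python
from itertools import product

def combine(hi, lo):
    """(g, p) = hi x lo, where hi is the more significant group."""
    g_hi, p_hi = hi
    g_lo, p_lo = lo
    return (g_hi | (p_hi & g_lo), p_hi & p_lo)

def prefix_scan(pairs):
    """pairs[0] is least significant; returns y[i] = pairs[i] x ... x pairs[0]."""
    out = [pairs[0]]
    for pair in pairs[1:]:
        out.append(combine(pair, out[-1]))
    return out

def carries_by_prefix(x, y, n):
    """Return [c[1], ..., c[n]] for carry-in 0, read off the group generates."""
    pairs = [(((x >> i) & (y >> i)) & 1, ((x >> i) | (y >> i)) & 1)
             for i in range(n)]
    return [g for g, _ in prefix_scan(pairs)]

# Associativity is what allows the scan to be regrouped into a tree.
elems = list(product((0, 1), repeat=2))
for a, b, c in product(elems, repeat=3):
    assert combine(combine(a, b), c) == combine(a, combine(b, c))

print(carries_by_prefix(0b0111, 0b0001, 4))
```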

Alternate modeling of Prefix Computer: Finite State Machine • A finite state machine has a set of states, and it ‘moves’ from one state to another according to its input. Mathematically, • s[k] = f(s[k-1], a[k-1]) • The problem is to determine the final state s[n] in O(lg n) operations, given the initial state s[0] and the sequence of inputs (a[0], a[1], …, a[n-1]) • This problem can be formulated in terms of prefix computation

Alternate modeling of Prefix Computer: Finite State Machine • We assume that the number of states is small and finite. • Write s[k] = f(s[k-1], a[k-1]) as a function of the state alone, indexed by the input a[k-1]; this function can be represented by a matrix M(a[k-1]) • Now we are ready to represent our problem in terms of prefix computation.

Alternate Modeling of Prefix Computer: Finite State Machine • The algorithm • Compute the matrices M(a[k]) in parallel • Compute the prefix products • N[1] = M(a[0]) • N[2] = M(a[1]).M(a[0]) • … • N[n] = M(a[n-1]).M(a[n-2])…M(a[0]) • Compute s[k] = N[k](s[0])

Prefix Computation • FSM example: • Given: • initial state S0 = A • A sequence of inputs: (0 0 1 1 1 0 1 0 1) • Derive the sequence of outputs
State table (PS = present state, NS = next state):
PS | NS for X=0 (M0) | NS for X=1 (M1)
A  | B               | A
B  | B               | C
C  | B               | A
[Figure: state diagram on states A, B, C with input/output edge labels 0/0, 0/0, 1/0, 0/0, 1/0, 1/1]
Input sequence: 0 0 1 1 … • Compute the N’s: • N1 = M0 • N2 = M0.M0 • N3 = M1.M0.M0 • N4 = M1.M1.M0.M0 • …
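The FSM example can be sketched as a prefix computation over transition maps. This assumes the flattened state table reads M0: A→B, B→B, C→B and M1: A→A, B→C, C→A; the output labels of the state diagram cannot be placed from the text, so only the state sequence is derived. The scan uses recursive doubling, one possible O(lg n)-round schedule, and all function names are illustrative.

```python
STATES = "ABC"
M0 = {"A": "B", "B": "B", "C": "B"}   # next state when X = 0 (assumed reading)
M1 = {"A": "A", "B": "C", "C": "A"}   # next state when X = 1 (assumed reading)

def compose(later, earlier):
    """Map for 'apply earlier, then later' (the product later.earlier)."""
    return {s: later[earlier[s]] for s in STATES}

def prefix_maps(inputs):
    """Return [N1, ..., Nn] with Nk = M(a[k-1]) ... M(a[0]).

    Each round doubles the span of inputs every entry covers, so about
    lg n rounds suffice; within a round all entries are independent.
    """
    out = [M1 if a else M0 for a in inputs]
    d = 1
    while d < len(out):
        out = [out[k] if k < d else compose(out[k], out[k - d])
               for k in range(len(out))]
        d *= 2
    return out

def state_sequence(inputs, s0="A"):
    """States reached after each input, as a string."""
    return "".join(N[s0] for N in prefix_maps(inputs))

print(state_sequence([0, 0, 1, 1, 1, 0, 1, 0, 1]))
```

Composition of transition maps is associative, which is the only property the scan relies on.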

Graph Based Approach • Consider the (g, p) chain • Break the long paths • [Figure: carry chain from C1 through (g1, p1), (g2, p2), (g3, p3) to C4]

Graph Based Approach • Generating g32 and p32 • [Figure: on the chain from C1 to C4, (g3, p3) and (g2, p2) are combined into the group signals (g32, p32)]

Graph Based Approach • Generating g10 and p10 • [Figure: (g1, p1) and cin are combined into the group signals (g10, p10)]

Graph Based Approach • Generating g30 and p30 • [Figure: (g32, p32) and (g10, p10) are combined into (g30, p30)]
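Written out, the grouping in these figures amounts to the following expansion. It assumes the general rule g = g’’ + p’’.g’, p = p’’.p’ for combining a more significant group with a less significant one, and writes (g_0, p_0) for the least significant pair, which the figure appears to identify with cin (an assumption).

```latex
\begin{aligned}
(g_{32},\,p_{32}) &= (g_3 + p_3 g_2,\; p_3 p_2)\\
(g_{10},\,p_{10}) &= (g_1 + p_1 g_0,\; p_1 p_0)\\
(g_{30},\,p_{30}) &= (g_{32} + p_{32}\,g_{10},\; p_{32}\,p_{10})\\
g_{30} &= g_3 + p_3 g_2 + p_3 p_2 g_1 + p_3 p_2 p_1 g_0
\end{aligned}
```

The two inner groups do not depend on each other, so the longest path drops from three operators in series to two.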

Boolean Approach • Carry to compute: g4 + p4 ( g3 + p3 ( g2 + p2 ( g1 + p1 ( g0 + p0 cin ) ) ) )
Level 0: (g4, p4) (g3, p3) (g2, p2) (g1, p1) (g0, p0) cin
Level 1: (g4+p4g3, p4p3) (g2+p2g1, p2p1) (g0, p0cin)
Level 2: (g4+p4g3+p4p3(g2+p2g1), p4p3p2p1) (g0, p0cin)
Level 3: (g4+p4g3+p4p3(g2+p2g1)+(p4p3p2p1)g0, (p4p3p2p1)p0cin)
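A short check that the level-by-level grouping gives the same carry as the nested expression. This is a sketch with illustrative names; g and p are treated as free Boolean inputs, so the equality holds as an algebraic identity.

```python
from itertools import product

def combine(hi, lo):
    """(g, p) of a group from a more significant and a less significant part."""
    return (hi[0] | (hi[1] & lo[0]), hi[1] & lo[1])

def carry_nested(g, p, cin):
    """g4 + p4(g3 + p3(g2 + p2(g1 + p1(g0 + p0 cin)))), evaluated serially."""
    c = cin
    for gi, pi in zip(g, p):          # index 0 (least significant) first
        c = gi | (pi & c)
    return c

def carry_grouped(g, p, cin):
    """Same carry using the pairings (4,3) and (2,1), then (4..1), then bit 0."""
    pair = lambda i: (g[i], p[i])
    g43 = combine(pair(4), pair(3))
    g21 = combine(pair(2), pair(1))
    g41 = combine(g43, g21)
    G, P = combine(g41, pair(0))
    return G | (P & cin)              # final carry: G + P.cin

for bits in product((0, 1), repeat=11):
    g, p, cin = bits[:5], bits[5:10], bits[10]
    assert carry_nested(g, p, cin) == carry_grouped(g, p, cin)
```

The nested form needs five operations in series, while the grouped form needs three levels because the first two pairings are independent.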

Prefix Adder • Given: n inputs (gi, pi) and an operation o • Compute: yi = (gi, pi) o … o (g1, p1) for 1 <= i <= n • Associativity: (A o B) o C = A o (B o C) • gi = a if i = 1, and ai.bi otherwise • pi = 1 if i = 1, and ai xor bi otherwise • (g’’, p’’) o (g’, p’) = (g, p) • g = g’’ + p’’g’ • p = p’’p’

Prefix Adder: Graph Representation • Example: Ripple Carry Adder • [Figure: node legend; an input node maps ai, bi to (gi, pi), and an operator node takes x and y and outputs x o y]

Prefix Adders: Conditional Sum Adder • [Figure: prefix graph over columns 8 down to 1]

Prefix Adders: Conditional Sum Adder • For output yi, there is an alphabetical tree covering inputs (xi, xi-1, …, x1) • Alphabetical tree: • Binary tree • Edges do not cross • [Figure: prefix graph over columns 8 down to 1]

Prefix Adders: Conditional Sum Adder • From input x1, there is a tree covering all outputs (yi, yi-1, …, y1) • The nodes in this tree can be reduced to (g, p) o c = g+pc • [Figure: prefix graph over columns 8 down to 1]

Prefix Adders: size and depth • Objective: • Minimize # of nodes, sc(n). • Minimize depth, dc(n) • Ripple Carry Adder: • sc(8) = 7 • dc(8) = 7 • total = 14 • Conditional Sum Adder: • sc(8) = 12 • dc(8) = 3 • total = 15
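The node and depth counts can be reproduced by building the two prefix graphs explicitly. This is a sketch under assumptions: columns are 0-indexed, each operator is a (target, source) pair, and the conditional-sum structure is modelled by the standard divide-and-conquer (Sklansky-style) network, which matches the slide's counts. All names are illustrative.

```python
def ripple(n):
    """Serial chain: column i combines with the finished column i-1."""
    return [(i, i - 1) for i in range(1, n)]

def sklansky(n):
    """Divide-and-conquer network for n a power of two."""
    ops, k = [], 0
    while (1 << k) < n:
        for i in range(n):
            if (i >> k) & 1:
                # Source: last column of the lower half of i's 2^(k+1) block.
                ops.append((i, ((i >> k) << k) - 1))
        k += 1
    return ops

def size_and_depth(n, ops):
    """Return (number of operator nodes, longest path) and validate the graph."""
    lo = list(range(n))          # column i currently covers inputs lo[i]..i
    depth = [0] * n
    for tgt, src in ops:
        assert lo[tgt] == src + 1, "operands must cover adjacent ranges"
        lo[tgt] = lo[src]
        depth[tgt] = max(depth[tgt], depth[src]) + 1
    assert all(v == 0 for v in lo), "every column must end as a full prefix"
    return len(ops), max(depth)

print(size_and_depth(8, ripple(8)))
print(size_and_depth(8, sklansky(8)))
```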

Prefix Adder –Well-known and Well-developed? • Classic prefix networks: Sklansky, Kogge-Stone, Brent-Kung, Ladner-Fischer, Han-Carlson, Knowles etc.

Prefix Adders: Brent-Kung Adder • [Figure: Brent-Kung prefix graph over columns 15 down to 0] • sc(16) = 26 • dc(16) = 6 • total = 32
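The Brent-Kung counts can be reproduced with an explicit construction. This sketch uses the usual up-sweep/down-sweep formulation for n a power of two, with 0-indexed columns and (target, source) operator pairs; it is one common way to draw the network and may differ in detail from the slide's figure.

```python
def brent_kung(n):
    """Return the operator list of a Brent-Kung network (n a power of two)."""
    ops, k = [], 0
    while (2 << k) <= n:                          # up-sweep: binary tree to the MSB
        step = 2 << k
        ops += [(i, i - (1 << k)) for i in range(step - 1, n, step)]
        k += 1
    for k in range(k - 2, -1, -1):                # down-sweep: fill remaining columns
        step = 2 << k
        ops += [(i, i - (1 << k)) for i in range(3 * (1 << k) - 1, n, step)]
    return ops

def size_and_depth(n, ops):
    """Return (number of operator nodes, longest path) and validate the graph."""
    lo = list(range(n))          # column i currently covers inputs lo[i]..i
    depth = [0] * n
    for tgt, src in ops:
        assert lo[tgt] == src + 1, "operands must cover adjacent ranges"
        lo[tgt] = lo[src]
        depth[tgt] = max(depth[tgt], depth[src]) + 1
    assert all(v == 0 for v in lo), "every column must end as a full prefix"
    return len(ops), max(depth)

print(size_and_depth(16, brent_kung(16)))
```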

Prefix Adder: New Aspects, New Method • Realistic design considerations: Timing, Power and Area. • Integer Linear Programming for prefix adder: • Logical effort timing model (gate cap. + wire cap.) • Activity-statistic power model • Non-uniform signal arrival/required times • [Figure: design dimensions shown: logic levels, timing, power, area, max fanouts, max wire tracks]

Prefix Adder: Optimum Prefix Adders • Uniform signal arrival/required times • [Figure panels: Sklansky adder; Kogge-Stone adder; fastest depth-3 optimal prefix adder; fastest depth-4 optimal prefix adder]

Prefix Adder –Optimum Prefix adders • Uniform signal arrival/required times

The Big Picture • What is the minimum depth of zero-deficiency circuits for a given width?

Proof for Snir’s Theorem • Theorem: Given an arbitrary prefix graph of width n, we have depth + size ≥ 2n - 2 • Proof • Consider the alphabetical tree rooted at the MSB output with all the input nodes being its leaves; • The size of this tree is n-1 while its depth is dM; • At most dM prefix outputs can be generated from this tree; • At least one extra node is needed for each column where the prefix result is not ready. • Consequently size ≥ (n-1) + (n-(dM+1)) = 2n - 2 - dM, and since depth ≥ dM, size + depth ≥ 2n - 2
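The bound can be applied to the size and depth figures quoted on the earlier slides. The quantity size + depth - (2n - 2) is presumably what the later slides call deficiency; that reading, and the function name, are assumptions.

```python
def deficiency(n, size, depth):
    """Slack of a width-n prefix graph against the bound size + depth >= 2n - 2."""
    return size + depth - (2 * n - 2)

# (width, size, depth) as quoted on the size-and-depth and Brent-Kung slides.
examples = {
    "ripple carry, n=8": (8, 7, 7),
    "conditional sum, n=8": (8, 12, 3),
    "Brent-Kung, n=16": (16, 26, 6),
}
for name, (n, size, depth) in examples.items():
    print(name, deficiency(n, size, depth))
```

A value of 0 means the graph meets the bound exactly, as the ripple carry adder does; no valid prefix graph can give a negative value.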

Definitions: Backbone, Affiliated Tree, Ridge • For a prefix circuit, define • Backbone • the binary alphabetical tree generating the MSB prefix output; • Affiliated tree • rooted at the LSB input, with all the prefix outputs (except the MSB output) as its tree nodes • Ridge • the path from the LSB input to the MSB output.

How to … ? • Look from the MSB output • Since the circuit has zero deficiency, the ridge has exactly d nodes (excluding the first input node), one node per level. • The idea: try to stretch the ridge as long as possible while maintaining zero deficiency

T-tree • Definition of Tk(k) tree

T-tree example – T3(5)

A-tree • Definition of Ak(t) tree

A-tree example – A3(5)

Compound of A tree and T-tree

Example

Proposed Prefix Circuit

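An Example: Z(d)|d=8, Width = 88 • [Figure: Z(8) assembled from four blocks, listed by column range] • columns 1 to 32: BK(32) • columns 33 to 58: T3(5) + A3(5) • columns 59 to 80: T2(6) + A2(6) • columns 81 to 88: T1(7) + A1(7)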

The width of Z(d) Circuit • The width of the Z(d) circuit is Nz(d) = F(d+3) - 1 (d ≥ 1), where F(i) are the Fibonacci numbers • Numerical Comparison • LYD: design by S. Lakshmivarahan, C.M. Yang & S.K. Dhall, 1987 • LS: design by Lin & Shih, 1999
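The width formula is easy to tabulate. This sketch assumes the indexing F(1) = F(2) = 1, which is the convention that reproduces the width of 88 quoted for d = 8; the function names are illustrative.

```python
def fib(i):
    """Fibonacci number with F(1) = F(2) = 1 (assumed indexing)."""
    a, b = 1, 1
    for _ in range(i - 2):
        a, b = b, a + b
    return b if i >= 2 else a

def z_width(d):
    """Width of the Z(d) circuit: Nz(d) = F(d+3) - 1, for d >= 1."""
    return fib(d + 3) - 1

for d in range(1, 9):
    print(d, z_width(d))
```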
