A Timing-Driven Synthesis Approach of a Fast Four-Stage Hybrid Adder in Sum-of-Products
Sabyasachi Das (University of Colorado, Boulder), Sunil P. Khatri (Texas A&M University)
What is a Sum-of-Product (SOP)
• An arithmetic Sum-of-Product (SOP) block consists of an arbitrary number of product terms and sum terms.
• General form of SOP (figure: inputs a, b, c, d, e, f; products p = a * b and q = c * d; output z = p + q + e + f)
Examples of SOP Blocks
• Multiplier {assign z = a * b}: found in microprocessors
• Multiply-Accumulator {assign z = (a * b) + c}: found in cryptographic applications
• Squarer {assign z = a * a}: found in DSP processors
• Addition Tree {assign z = a + b + c + d}: found in ALUs and wireless applications
• Generalized SOP {assign z = (a * b) + (c * d)}: found in FIR and IIR filters
Synthesis of Sum-of-Products
• Synthesis of a Sum-of-Product block is done in 3 steps (in the order of data flow):
  • Creation of Partial Products
  • Reduction of Partial Products into 2 operands
  • Computation of the Final Sum by adding the 2 operands
• Data flow (figure): Inputs → Creation of Partial Products → Reduction of Partial Products → Computation of Final Sum → Output
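The three steps can be illustrated with a minimal word-level Python sketch. It is not the authors' synthesis flow: real reduction trees work on individual bit columns, while this sketch applies a 3:2 compressor to whole integers. The function names and the `width` parameter are hypothetical.

```python
def partial_products(a, b, width):
    """Step 1: one shifted copy of a for every set bit of b (b < 2**width)."""
    return [(a << i) if (b >> i) & 1 else 0 for i in range(width)]


def carry_save_add(x, y, z):
    """3:2 compressor on whole words: x + y + z == s + c."""
    s = x ^ y ^ z                              # bitwise sum without carries
    c = ((x & y) | (x & z) | (y & z)) << 1     # carries, shifted to their weight
    return s, c


def reduce_to_two(operands):
    """Step 2: compress three operands into two until only two remain."""
    ops = list(operands)
    while len(ops) > 2:
        x, y, z = ops.pop(), ops.pop(), ops.pop()
        ops.extend(carry_save_add(x, y, z))
    while len(ops) < 2:
        ops.append(0)
    return ops[0], ops[1]


def sop(a, b, c, d, e, f, width=8):
    """z = a*b + c*d + e + f for non-negative inputs."""
    rows = partial_products(a, b, width) + partial_products(c, d, width) + [e, f]
    x, y = reduce_to_two(rows)
    return x + y                               # Step 3: the final adder


print(sop(3, 5, 7, 9, 2, 4))                   # same value as 3*5 + 7*9 + 2 + 4
```

The single `x + y` in `sop` is the final adder that the rest of the presentation optimizes.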
Motivation and Problem Statement
• SOP blocks are widely used and computationally intensive
• The final adder accounts for about 30% to 40% of the delay of the SOP block
• This paper focuses on the synthesis of an efficient final adder for an SOP expression
• Stand-alone adder architectures do not work well in SOP
Stand-alone Adder Architectures
• Frequently used adder architectures
  • Ripple-Carry
    • Area-efficient, but slow
    • Timing-efficient if inputs have skewed arrival times
  • Parallel-Prefix (Brent-Kung, Kogge-Stone)
    • Faster architecture
    • Requires more area
  • Carry-Select
    • Large area overhead (often >100%)
    • Better delay if the Cin signal arrives late
• None of these is well suited to Sum-of-Products
• Why?
Special Arrival-time Property
• The 2 operands of the final adder in an SOP exhibit a peculiar arrival-time pattern: bits near the LSB arrive earliest, the middle bits arrive latest, and bits near the MSB arrive earlier than the middle bits
• Traditional monolithic adders are optimized for equal arrival times, so they do not work well in SOP
• Hence it is critical to synthesize an efficient hybrid adder that exploits this arrival-time pattern and is designed specifically for SOP blocks
Proposed 4-Stage Hybrid Adder
• Structure (figure), from LSB to MSB: SubAdder1 (Ripple-Carry, w1 bits), SubAdder2 (Kogge-Stone, w2 bits), SubAdder3 (Carry-Select, w3 bits), SubAdder4 (Carry-Select, w4 bits)
• Ripple-Carry architecture near the LSB
• Fast Kogge-Stone architecture in the middle
• 2 Carry-Select adders (based on Brent-Kung) near the MSB
• GOAL: Find w1, w2, w3 and w4 algorithmically
Notations
• We use the following notations:
  • The bit-width of SubAdder1 (Ripple) is w1 bits
  • The bit-width of SubAdder2 (Kogge-Stone) is w2 bits
  • The bit-width of SubAdder3 (Carry-Select, Brent-Kung) is w3 bits
  • The bit-width of SubAdder4 (Carry-Select, Brent-Kung) is w4 bits
  • w1 + w2 + w3 + w4 = n (total width of the hybrid adder)
  • T(a_i) = time when input signal a_i is available
  • T(S_i) = time when output signal S_i (Sum_i) is available
  • T(C_i) = time when output signal C_i (Carry_i) is available
SubAdder1 (Ripple-Carry)
• Structure (figure): a chain of full adders (FA) with inputs x0/y0 to xk/yk and outputs z0 to zk+1
• Most area-efficient architecture
• Very slow
• Timing-efficient if input arrival times are skewed
• We use it for a few bits near the LSB (which arrive earliest)
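Why skewed arrival times favor a ripple-carry chain can be seen with a small timing sketch. It assumes a hypothetical unit-delay model in which every full adder produces its sum and carry `d_fa` time units after its latest input; the function name and parameters are illustrative only.

```python
def ripple_carry_times(t_a, t_b, t_cin=0.0, d_fa=1.0):
    """Return (sum output times, carry-out time) for a ripple-carry chain.

    t_a, t_b: arrival times of the operand bits, LSB first.
    Assumption: each full adder has the same delay d_fa to both outputs.
    """
    t_c = t_cin
    t_sum = []
    for ta, tb in zip(t_a, t_b):
        t_out = max(ta, tb, t_c) + d_fa    # the FA waits for its latest input
        t_sum.append(t_out)
        t_c = t_out                        # carry ripples to the next bit
    return t_sum, t_c


# Skewed arrival: each bit arrives one unit after the previous one.
_, t_skewed = ripple_carry_times([0, 1, 2, 3], [0, 1, 2, 3])
# Equal arrival: all bits arrive together at time 3.
_, t_equal = ripple_carry_times([3, 3, 3, 3], [3, 3, 3, 3])
print(t_skewed, t_equal)
```

In both cases the last input arrives at time 3. In the skewed case the carry-out follows one full-adder delay later; in the equal-arrival case it follows four delays later.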
Parallel-Prefix Adders (KS, BK)
• In a Parallel-Prefix adder, the carry for each bit is computed by an efficient tree structure (using the Generate and Propagate concept).
• For each bit i of the adder, Generate (G_i) indicates whether a carry is generated from that bit
  • G_i = a_i ∧ b_i
• For each bit i of the adder, Propagate (P_i) indicates whether a carry is propagated through that bit
  • P_i = a_i ⊕ b_i (a_i ∨ b_i yields the same carries)
• The Generate and Propagate concept extends to blocks comprising multiple bits, as we discuss next
Parallel-Prefix Adders (KS, BK)
• If two blocks (comprising one or more bits) have the GP value-pairs (G_left, P_left) and (G_right, P_right), then the combined block has the following GP values:
  • G_{left,right} = G_left ∨ (P_left ∧ G_right)
  • P_{left,right} = P_left ∧ P_right
• The above computation is performed by a carry-operator or "o"-operator
• Once we obtain the carry for each bit, it is trivial to compute the sum output of each bit (XOR and NAND)
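A minimal Python sketch of the "o"-operator follows. It assumes the XOR form of Propagate and applies the operator serially from the LSB; a parallel-prefix adder applies the same operator in a tree instead. All function names are hypothetical.

```python
def gp_bit(a, b):
    """Generate/propagate pair of one bit position (propagate taken as XOR)."""
    return a & b, a ^ b


def combine(left, right):
    """The "o"-operator; left is the more significant block."""
    g_l, p_l = left
    g_r, p_r = right
    return g_l | (p_l & g_r), p_l & p_r


def carries(a_bits, b_bits, cin=0):
    """Carry into every bit plus the carry-out, LSB first (serial prefix)."""
    prefix = (cin, 0)              # carry-in modelled as a generating block
    out = [cin]
    for a, b in zip(a_bits, b_bits):
        prefix = combine(gp_bit(a, b), prefix)
        out.append(prefix[0])
    return out


def add(a, b, n):
    """n-bit addition built from the carries; returns the (n+1)-bit result."""
    a_bits = [(a >> i) & 1 for i in range(n)]
    b_bits = [(b >> i) & 1 for i in range(n)]
    c = carries(a_bits, b_bits)
    s = sum((a_bits[i] ^ b_bits[i] ^ c[i]) << i for i in range(n))
    return s | (c[n] << n)


print(add(11, 6, 4))               # same value as 11 + 6
```

Because the operator is associative, the blocks can be combined in any grouping, which is what allows the tree structures on the following slides.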
SubAdder2 (Kogge-Stone)
• Kogge-Stone parallel-prefix architecture (figure: inputs GP0 to GP7, carry outputs C1 to C8)
• Delay: log2(n) levels of "o"-operators
• Area: (n * log2(n)) - n + 1 "o"-operators
• Reference: Kogge and Stone, "A parallel algorithm for the efficient solution of a general class of recurrence equations", IEEE Transactions on Computers, 1973
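The level and operator counts can be checked with a small Python model of the Kogge-Stone prefix network. This is a behavioural sketch, not a netlist generator; the function names are hypothetical and n is assumed to be a power of two for the formulas to apply.

```python
def combine(left, right):
    """The "o"-operator; left is the more significant block."""
    g_l, p_l = left
    g_r, p_r = right
    return g_l | (p_l & g_r), p_l & p_r


def kogge_stone(gp):
    """gp: (G, P) pairs, LSB first. Returns (prefixes, levels, operators).

    After the last level, prefixes[i][0] is the carry out of bit i
    (for carry-in 0).
    """
    n = len(gp)
    cur = list(gp)
    levels = ops = 0
    d = 1
    while d < n:
        nxt = list(cur)
        for i in range(d, n):          # every position with a partner d below
            nxt[i] = combine(cur[i], cur[i - d])
            ops += 1
        cur = nxt
        levels += 1
        d *= 2                         # the span doubles at every level
    return cur, levels, ops


_, levels, ops = kogge_stone([(0, 1)] * 8)
print(levels, ops)                     # 3 17
```

For n = 8 this gives 3 levels and 17 operators, which matches log2(n) and (n * log2(n)) - n + 1.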
Brent-Kung (BK)
• Brent-Kung parallel-prefix architecture (figure: inputs GP0 to GP7, carry outputs C1 to C8)
• Delay: (2 * log2(n)) - 2 levels of "o"-operators
• Area: (2 * n) - 2 - log2(n) "o"-operators
• Reference: Brent and Kung, "A regular layout for parallel adders", IEEE Transactions on Computers, 1982
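A similar Python model can be written for the Brent-Kung network. The sketch assumes n is a power of two (n ≥ 4) and measures delay as the longest chain of "o"-operators feeding any output, so each operator is scheduled as soon as its inputs are ready. The function names are hypothetical.

```python
def combine(left, right):
    """The "o"-operator; left is the more significant block."""
    g_l, p_l = left
    g_r, p_r = right
    return g_l | (p_l & g_r), p_l & p_r


def brent_kung(gp):
    """gp: (G, P) pairs, LSB first. Returns (prefixes, depth, operators)."""
    n = len(gp)                        # assumed to be a power of two
    val = list(gp)
    depth = [0] * n                    # operator levels behind each position
    ops = 0

    def merge(hi, lo):
        nonlocal ops
        val[hi] = combine(val[hi], val[lo])
        depth[hi] = max(depth[hi], depth[lo]) + 1
        ops += 1

    d = 1
    while d < n:                       # up-sweep: power-of-two blocks
        for i in range(2 * d - 1, n, 2 * d):
            merge(i, i - d)
        d *= 2
    d = n // 4
    while d >= 1:                      # down-sweep: remaining positions
        for i in range(3 * d - 1, n, 2 * d):
            merge(i, i - d)
        d //= 2
    return val, max(depth), ops


_, depth, ops = brent_kung([(0, 1)] * 8)
print(depth, ops)                      # 4 11
```

For n = 8 this gives a depth of 4 and 11 operators, which matches (2 * log2(n)) - 2 and (2 * n) - 2 - log2(n). Compared with Kogge-Stone at the same width, the network is smaller but deeper.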
SubAdder3 & SubAdder4 (Carry-Select)
• Structure (figure): two adders (Adder0, Adder1) compute x + y with constant carry-ins 1'b0 and 1'b1, producing z0 and z1; a Mux controlled by cin selects z
• Large area overhead
• Used as a special case, since Cin arrives late
• Speed depends on the architecture of the two adders
  • These adders need not be KS; we use BK instead
• The inputs of SubAdder3 and SubAdder4 arrive earlier than those of SubAdder2
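The carry-select principle can be sketched behaviourally in Python. The two internal additions stand in for the Brent-Kung adders, and the timing helper assumes a hypothetical mux delay `d_mux`; none of these names come from the presentation.

```python
def carry_select_add(x, y, cin, width):
    """Return (sum, carry_out) of a width-bit carry-select stage."""
    mask = (1 << width) - 1
    z0 = (x & mask) + (y & mask)          # speculative result for cin = 0
    z1 = (x & mask) + (y & mask) + 1      # speculative result for cin = 1
    z = z1 if cin else z0                 # mux: pick once cin is known
    return z & mask, z >> width


def carry_select_time(t_adders, t_cin, d_mux=1.0):
    """Output time: the mux waits for both the adders and the late cin."""
    return max(t_adders, t_cin) + d_mux


print(carry_select_add(0b1111, 0b0001, 0, 4))   # (0, 1)
print(carry_select_add(0b1111, 0b0001, 1, 4))   # (1, 1)
```

When `t_cin` exceeds `t_adders`, the output follows cin by only one mux delay. This is why the structure suits the MSB end, where the operand bits arrive earlier than the carry from SubAdder2.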
Determination of width of SubAdder1
• Width of the Ripple adder (SubAdder1)
• At every bit i, compute T(C_{i+1}) and check that the carry is ready no later than the inputs of the next bit:
  • T(C_{i+1}) ≤ T(a_{i+1})
  • T(C_{i+1}) ≤ T(b_{i+1})
• If the check passes, set i = i+1
• Otherwise, continue checking until 3 consecutive bits fail the check (Hill Climbing)
• Return the value i as the Ripple adder width
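One possible reading of this procedure is sketched below in Python. Several details are assumptions rather than statements from the slides: the unit full-adder delay `d_fa`, the carry-time recurrence, and the choice to return the width up to the last bit that passed the check.

```python
def ripple_width(t_a, t_b, d_fa=1.0, t_cin=0.0, patience=3):
    """Estimate w1 from operand arrival times (LSB first).

    Assumed delay model: T(C_{i+1}) = max(T(a_i), T(b_i), T(C_i)) + d_fa.
    """
    n = len(t_a)
    t_c = t_cin
    width = 0
    fails = 0
    for i in range(n - 1):
        t_c = max(t_a[i], t_b[i], t_c) + d_fa          # T(C_{i+1})
        if t_c <= t_a[i + 1] and t_c <= t_b[i + 1]:
            width = i + 1          # bits 0..i can ripple without adding delay
            fails = 0
        else:
            fails += 1
            if fails == patience:  # 3 consecutive failures: stop climbing
                break
    return width


# Arrival times rise by one unit per bit and then flatten out.
arrivals = [0, 1, 2, 3, 4, 5, 5, 5, 5, 5]
print(ripple_width(arrivals, arrivals))                # 5
```

With equal arrival times on every bit the check never passes and the sketch returns 0, which is consistent with the ripple stage being useful only where inputs are skewed.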
Determination of width of SubAdder2
• Width of the Kogge-Stone adder (SubAdder2)
• The latest-arriving signals are part of this adder
• Hence keep this adder wide, while ensuring that this does not leave very narrow Carry-Select adders for SubAdder3 and SubAdder4
• We determine the width with the following equations:
  • w2 = n - w1, if (n - w1) ≤ 8
  • w2 = 2^p, where p = floor(log2(n - w1)), if (n - w1) > 8
• Example: If n = 32 and w1 = 7, then n - w1 = 25, p = 4 and w2 = 16
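The width rule is small enough to state as a Python function. The sketch assumes that p is rounded down, which is the reading consistent with the example on the slide (n = 32, w1 = 7 gives w2 = 16); the function name is hypothetical.

```python
def kogge_stone_width(n, w1):
    """Width w2 of SubAdder2 for an n-bit hybrid adder with ripple width w1."""
    rest = n - w1
    if rest <= 8:
        return rest                          # everything left goes to Kogge-Stone
    return 1 << (rest.bit_length() - 1)      # 2**floor(log2(rest))


print(kogge_stone_width(32, 7))              # 16
```

Using `int.bit_length` avoids floating-point rounding in the logarithm.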
Delay of the Hybrid Adder
• Structure (figure): SubAdder1 (Ripple-Carry, w1), SubAdder2 (Kogge-Stone, w2), SubAdder3 (Carry-Select, w3) and SubAdder4 (Carry-Select, w4), annotated with the output times T(S2), T(S3), T(S4) and T(C4)
• T_hybrid = max(T(C4), T(S4), T(S3), T(S2))
Determination of widths of SubAdder3 and SubAdder4
• Width of the two Carry-Select adders
• Initial width configuration
  • w3 = (n - w1 - w2) / 2
  • w4 = n - w1 - w2 - w3
• With this initial configuration, estimate the delay of the overall hybrid adder (based on the previous slide)
• Use an iterative approach to explore in the appropriate direction (similar to Binary Search) and converge on the smallest-delay configuration
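The slide does not spell out the iteration, so the following Python sketch substitutes a simple neighbour-step descent for the binary-search-like exploration. The delay estimator `delay_of` is a hypothetical caller-supplied function standing in for the T_hybrid estimate, and integer division is assumed for the initial split.

```python
def split_carry_select(rest, delay_of):
    """Split rest = n - w1 - w2 bits into (w3, w4) with a locally minimal delay.

    delay_of(w3, w4) is a hypothetical estimator of the hybrid adder delay.
    """
    w3 = rest // 2                               # initial configuration
    best = delay_of(w3, rest - w3)
    while True:
        neighbours = [w for w in (w3 - 1, w3 + 1) if 0 <= w <= rest]
        if not neighbours:
            break
        nxt = min(neighbours, key=lambda w: delay_of(w, rest - w))
        d = delay_of(nxt, rest - nxt)
        if d >= best:                            # no neighbour improves: stop
            break
        w3, best = nxt, d                        # move in the better direction
    return w3, rest - w3, best


# Toy estimator with its minimum at w3 = 3, purely for illustration.
toy = lambda w3, w4: 10 + abs(w3 - 3)
print(split_carry_select(9, toy))                # (3, 6, 10)
```

A descent of this kind finds the global minimum only if the delay has a single valley as a function of w3; that property is assumed here, not shown in the source.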
Experimental Setup
• To test our approach, we used:
  • Adders in several different types of SOP blocks (Multipliers, MAC, generalized SOP and Squarer)
  • Two process technologies (0.13 µm and 0.09 µm)
  • Two commercial library vendors
  • Two different arrival-time constraints
• We compared the results of our hybrid adder with the adder produced by a commercial datapath synthesis tool.
Results
• On average, 14.31% faster than the result of the commercial synthesis tool (with a 6.62% area penalty)
Summary
• The hybrid adder consists of 4 SubAdders
  • SubAdder1 has a Ripple-Carry architecture
  • SubAdder2 has a Kogge-Stone architecture
  • SubAdder3 and SubAdder4 have a Carry-Select architecture (based on Brent-Kung)
• Widths of all SubAdders are computed based on a timing-driven analysis
• On average, 14.31% faster (with a 6.62% area penalty)