Jackson Adders

Presentation Transcript


    1. Jackson Adders. Prof. David Money Harris, 9 July 2010

    2. Overview
        Definitions
        Tree Adders
        Ling Adders
        Jackson Adders
        18-bit Jackson Tree
        Evaluation Methodology
        Preliminary Results

    3. Addition
        Carry propagate adder
        Inputs: $A_{N:0}$, $B_{N:1}$, with $A_0 = C_{in}$
        Outputs: $S_{N:1}$
        Discard $C_{out}$

    4. Propagate, Generate, Kill, Oh My!
        Bitwise signals
            Generate: $G_{i:i} = G_i = A_i B_i$
            Propagate: $P_{i:i} = P_i = A_i + B_i$ (also called $\overline{K_i}$)
            $X_i = A_i \oplus B_i$
        Group recursion to form prefixes
            Propagate: $P_{i:j} = P_{i:k} P_{k-1:j}$
            Generate: $G_{i:j} = G_{i:k} + P_{i:k} G_{k-1:j}$
            A group generates if its upper part generates, or if its upper part propagates and its lower part generates
        Bitwise sum: $S_i = X_i \oplus G_{i-1:0}$
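The definitions on this slide can be exercised with a minimal Python sketch. It evaluates the group recursion serially (one bit at a time) rather than as a tree. The function name `pg_add` and the choice to model the carry-in as a generate signal in column 0 (with no propagate) are assumptions made for illustration, not something the slide specifies.

```python
def pg_add(a, b, cin, n):
    """Add two n-bit numbers using bitwise PG signals and a serial prefix recursion.

    Assumption: column 0 is reserved for the carry-in, so operand bits
    occupy columns 1..n and the sum comes out of columns 1..n.
    """
    bits_a = [(a >> i) & 1 for i in range(n)]
    bits_b = [(b >> i) & 1 for i in range(n)]

    # Bitwise signals; column 0 injects Cin as a generate with no propagate.
    g = [cin] + [x & y for x, y in zip(bits_a, bits_b)]
    p = [0] + [x | y for x, y in zip(bits_a, bits_b)]
    x = [0] + [x ^ y for x, y in zip(bits_a, bits_b)]

    # Serial form of the group recursion: G[i:0] = G[i] + P[i] * G[i-1:0].
    prefix = [g[0]]
    for i in range(1, n + 1):
        prefix.append(g[i] | (p[i] & prefix[i - 1]))

    # Bitwise sum S[i] = X[i] xor G[i-1:0]; the carry-out is discarded.
    s = 0
    for i in range(1, n + 1):
        s |= (x[i] ^ prefix[i - 1]) << (i - 1)
    return s
```

Comparing `pg_add(a, b, cin, n)` with `(a + b + cin) % 2**n` over all small operands is a quick way to confirm the recursion.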

    5. Higher Valency Groups
        Valency-2
            Propagate: $P_{i:j} = P_{i:k} P_{k-1:j}$
            Generate: $G_{i:j} = G_{i:k} + P_{i:k} G_{k-1:j}$
        Valency-3
            Propagate: $P_{i:j} = P_{i:k} P_{k-1:l} P_{l-1:j}$
            Generate: $G_{i:j} = G_{i:k} + P_{i:k} (G_{k-1:l} + P_{k-1:l} G_{l-1:j})$
        Valency-4
            Propagate: $P_{i:j} = P_{i:k} P_{k-1:l} P_{l-1:m} P_{m-1:j}$
            Generate: $G_{i:j} = G_{i:k} + P_{i:k} (G_{k-1:l} + P_{k-1:l} (G_{l-1:m} + P_{l-1:m} G_{m-1:j}))$
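As a sanity check on the higher-valency forms, the sketch below compares a flattened valency-3 cell against two chained valency-2 cells for every input combination. Groups are represented as `(G, P)` tuples; the function names are hypothetical.

```python
from itertools import product

def combine2(upper, lower):
    """Valency-2 cell: merge the (G, P) of an upper group with a lower group."""
    gu, pu = upper
    gl, pl = lower
    return (gu | (pu & gl), pu & pl)

def combine3(upper, middle, lower):
    """Valency-3 cell written as one flattened expression."""
    gu, pu = upper
    gm, pm = middle
    gl, pl = lower
    return (gu | (pu & (gm | (pm & gl))), pu & pm & pl)

def valency3_matches_nested():
    """Check every input combination against two chained valency-2 cells."""
    pairs = list(product((0, 1), repeat=2))
    return all(
        combine3(u, m, l) == combine2(u, combine2(m, l))
        for u in pairs for m in pairs for l in pairs
    )
```

The valency-4 form follows the same pattern with one more level of nesting.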

    6. Tree Adders
        How should the recursion be organized?

    7. Black and Gray Cells
        Black cell: group G and P
        Gray cell: group G only
        Inverting vs. noninverting
        Higher valency
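One way to organize the recursion is a valency-2 Sklansky tree, sketched below in Python with explicit black and gray cells. This is an illustrative model, not the circuit from the slides: the indexing scheme, the function names, and the serial `ripple_carries` reference are all assumptions.

```python
def black(upper, lower):
    """Black cell: combine two groups, producing group G and group P."""
    (gu, pu), (gl, pl) = upper, lower
    return (gu | (pu & gl), pu & pl)

def gray(upper, lower):
    """Gray cell: combine two groups, producing group G only."""
    (gu, pu), (gl, _) = upper, lower
    return (gu | (pu & gl), None)

def sklansky_carries(g, p):
    """Return G[i:0] for every column i using a valency-2 Sklansky tree."""
    n = len(g)
    cells = list(zip(g, p))              # cells[i] starts as the 1-bit group i:i
    span = 1
    while span < n:
        nxt = list(cells)
        for i in range(n):
            if i & span:                 # column is in the upper half of its block
                j = (i & ~(span - 1)) - 1    # top column of the lower half
                # A result that reaches column 0 never needs its P again.
                cell = gray if i < 2 * span else black
                nxt[i] = cell(cells[i], cells[j])
        cells = nxt
        span *= 2
    return [G for G, _ in cells]

def ripple_carries(g, p):
    """Serial reference: G[i:0] = g[i] + p[i] * G[i-1:0]."""
    out, c = [], 0
    for gi, pi in zip(g, p):
        c = gi | (pi & c)
        out.append(c)
    return out
```

For any lists `g` and `p` of equal length, `sklansky_carries(g, p)` should equal `ripple_carries(g, p)`; the tree only changes how the same prefixes are grouped.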

    8. Tree Adders

    9. Higher Valency Trees

    10. Sparse Trees
        Sklansky, sparseness 4
        Only compute prefixes for every 4th column
        Precompute 4-bit results for each possible carry in
        Select result based on carry (group generate)
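The selection idea can be sketched in Python as follows. Each group precomputes its sum for a carry-in of 0 and of 1, and the actual carry picks one. For brevity, the carry into each group is passed along serially here; in a sparse tree that carry would come from the prefix network instead. The function name and defaults are assumptions.

```python
def sparse_add(a, b, cin, n=16, sparseness=4):
    """Add with carries resolved only at every `sparseness`-th column."""
    mask = (1 << sparseness) - 1
    result = 0
    carry = cin  # stands in for the group carry a sparse prefix tree would supply
    for lo in range(0, n, sparseness):
        ga = (a >> lo) & mask
        gb = (b >> lo) & mask
        # Precompute the group result for both possible carries in.
        sum0 = ga + gb
        sum1 = ga + gb + 1
        # Select the result based on the carry into this group.
        chosen = sum1 if carry else sum0
        result |= (chosen & mask) << lo
        carry = chosen >> sparseness
    return result & ((1 << n) - 1)
```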

    11. Carry Selection

    12. Ling Adders
        Factor some complexity out of the first term
        Insert it back into sum selection
        Remove 1 transistor from the critical path
        Exploits the fact that $G_i P_i = (A_i B_i)(A_i + B_i) = G_i$

    13. Ling Equations
        Define pseudogenerate: $H_{i:j} = G_i + G_{i-1:j}$
            Simpler than $G_{i:j} = G_i + P_i G_{i-1:j}$
            Recreate $G_{i:j} = P_i H_{i:j} = P_i (G_i + G_{i-1:j}) = G_i + P_i G_{i-1:j}$
        Define pseudopropagate: $I_{i:j} = P_{i-1:j-1}$
            Shifted version of group propagate
        Valency-2 recursion is the same as for PG
            $H_{i:j} = H_{i:k} + I_{i:k} H_{k-1:j}$
            $I_{i:j} = I_{i:k} I_{k-1:j}$
        Sum: $S_i = X_i \oplus G_{i-1:0} = X_i \oplus (P_{i-1} H_{i-1:0})$
        Selection mux: $S_i = H_{i-1:0} \;?\; [X_i \oplus P_{i-1}] : X_i$
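A small Python sketch of these equations, under simplifying assumptions: there is no carry-in, bits are numbered from 0, and the pseudogenerate prefix is built serially with $H_{i:0} = G_i + P_{i-1} H_{i-1:0}$ (the valency-2 recursion with a 1-bit upper group). The function name `ling_add` is hypothetical.

```python
def ling_add(a, b, n):
    """n-bit add (no carry-in) using pseudogenerate H and a sum-selection mux."""
    bits_a = [(a >> i) & 1 for i in range(n)]
    bits_b = [(b >> i) & 1 for i in range(n)]
    g = [x & y for x, y in zip(bits_a, bits_b)]
    p = [x | y for x, y in zip(bits_a, bits_b)]
    x = [x ^ y for x, y in zip(bits_a, bits_b)]

    # Serial pseudogenerate: H[i:0] = g[i] + p[i-1] * H[i-1:0], with H[0:0] = g[0].
    H = [g[0]]
    for i in range(1, n):
        H.append(g[i] | (p[i - 1] & H[i - 1]))

    s = x[0]  # nothing carries into bit 0
    for i in range(1, n):
        # Selection mux: S[i] = H[i-1:0] ? (X[i] xor P[i-1]) : X[i]
        bit = (x[i] ^ p[i - 1]) if H[i - 1] else x[i]
        s |= bit << i
    return s
```

Note that the true carry $G_{i-1:0}$ never appears explicitly; the missing $P_{i-1}$ factor is reinserted by the mux data input.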

    14. Ling Circuits
        Simplifies the first stage
        Compute $H_{i+1:i}$ in one swell foop

    15. Jackson Adders
        Generalized Ling technique
            Simplify logic in the prefix tree as well
            Use sum selection to reinsert missing terms
            Balance logic so that both the data and select inputs to the sum mux are comparable in criticality
        Developed by Jackson and Talwar in 2004
            Used in the Arithmetica synthesis tool
            Parameterized by architecture, valency, sparseness
            Reportedly produced superior energy-delay tradeoffs
        [Burgess09] indicates benefits over standard designs
        No comprehensible, complete published designs

    16. Jackson Logic
        Define new terms
            D: a group generates or propagates a carry
            Special case: B: a group generates a carry in at least one bit
        Rewrite group generate:
            A group generates if its upper part generates or propagates, and either at least one bit of the upper part generates or the lower part generates
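The verbal rewrite on this slide can be checked exhaustively. The sketch below assumes the readings $D = G + P$ for a group and $B = $ the OR of the bitwise generates in the group; these formulas are inferred from the slide's wording rather than copied from it, and the function names are hypothetical. Under those assumptions the rewrite becomes $G_{i:j} = D_{i:k}(B_{i:k} + G_{k-1:j})$.

```python
def bitwise(a, b, n):
    """Bitwise generate and propagate lists, LSB first."""
    g = [(a >> i) & (b >> i) & 1 for i in range(n)]
    p = [((a >> i) | (b >> i)) & 1 for i in range(n)]
    return g, p

def group(g, p):
    """Return (G, P, D, B) for a group whose bit lists are ordered LSB first."""
    G, P, B = 0, 1, 0
    for gi, pi in zip(g, p):
        G = gi | (pi & G)   # conventional group generate
        P &= pi             # every bit propagates
        B |= gi             # at least one bit generates (assumed definition)
    return G, P, G | P, B   # D = G + P (assumed definition)

def rewrite_holds(n_upper, n_lower):
    """Check G = D_upper * (B_upper + G_lower) for all operand pairs."""
    n = n_upper + n_lower
    for a in range(1 << n):
        for b in range(1 << n):
            g, p = bitwise(a, b, n)
            G_all = group(g, p)[0]
            G_low = group(g[:n_lower], p[:n_lower])[0]
            _, _, D_up, B_up = group(g[n_lower:], p[n_lower:])
            if G_all != (D_up & (B_up | G_low)):
                return False
    return True
```

The identity relies on g and p coming from real operands, so that a bit which generates also propagates; it is not expected to hold for arbitrary, unrelated g and p values.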

    17. Reduced Generate
        Again, rename the bracketed term: reduced generate, R
        $R^p$ has the top p propagate signals stripped out
            $R^0_{i:j} = G_{i:j}$
            $R^1_{i:j} = H_{i:j}$
            Jackson considers p = 2
        Group generate can be rewritten in terms of R
        Computing R prefixes can be easier than G

    18. Hyperpropagate
        Another term will be useful for recursion: hyperpropagate
        Define
        Special case for 2-bit groups:

    19. Jackson Recursions
        Valency-2 is no simpler
        Valency-3 simplifies R at the expense of Q

    20. Valency-3 Circuits
        Compound gate implementation
        Simpler gate implementation

    21. Logical Effort of Valency-3

    22. Sum Selection
        Select sum based on $R^p_{i-1:0}$
        Requires a p-bit D signal for the sum-selection data input
        This is the complexity that is factored out of R
        D recursion

    23. Prior Work
        [Jackson04]
            + Introduced R and Q
            + Showed how to compute a single sum output
            - Does not show how to build an entire adder
            - Does not include recursions for D, valency-2 R/Q
        [Burgess09]
            + Comments on critical path
            + Comparisons suggest benefits of Jackson adder
            - Hard-to-decipher diagram of 24-bit adder

    24. Example 18-bit Jackson Adder
        Sklansky tree with sparseness 2
        Valency-2 initial stage (like Ling)
        Valency-3 2nd and 3rd stages
        Only 4 levels of noninverting logic

    25. Initial Stage
        Reduced generate
        Hyperpropagate
        Also will need $g_i$ for even bits, $p_i$ for odd bits, $x_i$ for all bits
            For sum selection logic

    26. Second Stage
        Compute 3- and 6-bit group signals
        Note the potential for sharing common terms

    27. Third Stage
        Reduced generate signals for all groups

    28. D Logic
        Medium-length groups of D are required for sum selection
        Note that $D_{17:9}$ depends on $R^3_{17:12}$
            Hence, it arrives at the same time as $R^9_{17:0}$

    29. Sum Selection
        Sparseness of 2 requires a 1-bit ripple from even to odd
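The 1-bit ripple can be illustrated with conventional carries (not the R and D signals of the Jackson design, whose equations are not in this transcript). In the sketch below, a sparseness-2 tree is assumed to deliver the carry into every even column, and each odd column's carry is filled in with one step of $c_{i+1} = g_i + p_i c_i$. The function name and column numbering are assumptions.

```python
def carries_from_even(g, p, even_carries):
    """Fill in odd-column carries from even-column carries with a 1-bit ripple.

    even_carries[k] is the carry into column 2k, as a sparseness-2 tree
    would supply it. Returns the carry into every column, in order.
    """
    carries = []
    for k, c in enumerate(even_carries):
        i = 2 * k
        carries.append(c)                        # carry into even column i
        if i + 1 < len(g):
            carries.append(g[i] | (p[i] & c))    # ripple one bit into column i+1
    return carries
```

Each sum bit then follows as the XOR of that column's $x_i$ with the carry into it.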

    30. Prefix Network

    31. Observations
        Only 4 levels of noninverting logic
        $D_{17:9}$ is critical
            Too much factored out of $R^9_{17:0}$
            Could eliminate the need by doing a 2-bit ripple into $s_{18}$

    32. Comparison Methodology
        Goal: energy-delay curves for Jackson adders compared to conventional adders
        How can we objectively compare against the best conventional design?
            Technology mapping challenges
            Sizing
                Gatesizer limitations
                SCOT is better, but we only have 130 nm models
            Inadequate design effort on conventional cases
        Plan: synthesize with Design Compiler
            Compare against assign y = a + b;

    33. Preliminary Results
        130 nm Artisan library for IBM CMOS8sf, 1.2 V
        FO4 delay: 55 ps
        Fastest designs are 570 ps (about 10 FO4)
        Jackson takes more energy except at very long delay
        $s_{18}$ optimization helps at the fastest delays

    34. Optimization Ideas
        Compare against Design Compiler architectures
            Starts with NAND/NOR to compute ~gi, ~pi
            Computes xi = pi * ~gi to avoid costly XORs
            Appears to use a valency-2 Sklansky tree with inverting gates
            Final XOR
        Logical effort analysis of critical path
            Look for areas to reduce effort
        Architecture
            Valency: consider direct bitwise PG, followed by a valency-3 Jackson tree
            Sparseness (sparseness 3 in tree above?, sparseness 1)
            Sklansky vs. Kogge-Stone
        Verilog coding
            Does sharing of terms explicitly help or hurt?
            Code tuning experiments
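The XOR-avoidance trick attributed to Design Compiler can be checked on a single bit pair. This is a sketch of the logic only, assuming p is the OR-style propagate used earlier in the deck; the function name is hypothetical and the gate mapping is not taken from an actual netlist.

```python
def dc_style_bitwise(a, b):
    """Form ~g, ~p and x for one bit pair without using an XOR."""
    g_bar = 1 - (a & b)        # NAND gives ~g
    p_bar = 1 - (a | b)        # NOR gives ~p
    x = (1 - p_bar) & g_bar    # x = p * ~g, which equals a xor b
    return g_bar, p_bar, x
```

The identity works because a xor b is true exactly when at least one input is high (p) and not both are (~g).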

    35. Sun Feedback
        Issues raised at Sun review on 9 July 2010
        Should we use SCOT to evaluate the effects of continuous sizing?
            Follow SCOT up with SPICE
            Start without wire loads, add later
        Wire load modeling in Design Compiler

    36. Short-Term Action Items
        Adder modeling (write eqns, code in Verilog, compare to DC)
            32-bit Sklansky valency-2 baseline similar to DC
                NAND/NOR to form Pbar, Gbar
                Gbar * P to form X
                Inverting stages of group logic
                Final XOR
                Does it exactly match DC results?
            27-bit Jackson (1-bit, followed by 3 radix-3 stages)
            54-bit Jackson (2-bit Ling PG, followed by 3 radix-3 stages)
            Explore optimization of 18-bit design
        Logical effort analysis of critical path through 18-bit Jackson
        Tool to automatically generate energy-delay curves with DC
        Tool flow for DC 2010 with placement and expected wire cap
        Subversion repository setup
        Selection of cell library

    37. Cell Library
        IBM 45 nm partially-depleted SOI 12S
        ARM library: sc12_base_v31_rvt_soi12s0_ss_nominal_max_0p90v_125c_mxs.lib
            A12TR library with regular Vt (RVT) transistors
            12-track cell height (1.68 µm)
            Typical operating point: 1.0 V, 25 C
            We use the worst-case slow-slow, 0.9 V, 125 C library
            Use the Maxsol (mxs) version for worst-case history effect
        1X inverter INV_X1B_A12TR
            Width = 0.38 µm
            Cin = 1.6 fF
            Intrinsic delay: 16.6 ps rise / 14.1 ps fall / 15.3 ps average
            Kload: 1.46 ps/fF rise / 1.17 ps/fF fall / 1.3 ps/fF average
            FO4 delay = 15.3 ps + 1.3 ps/fF * 1.6 fF * 4 ≈ 24 ps
            But the .lib entry for 21 ps slew rate, 7.9 fF load suggests tpdf = 17 ps, tpdr = 23 ps, tpd = 20 ps, tf = 13 ps, tr = 23 ps
            Switching energy: 0.00078 µW/MHz ≈ 0.8 fJ, which equals $0.5\,C_{in} V_{DD}^2$
            Leakage power: 0.1 µW (very high!)
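The inverter arithmetic on this slide can be reproduced with a few lines of Python. Two unit readings are assumptions: Kload is taken in ps per fF (the reading under which the slide's roughly 24 ps FO4 figure works out), and the switching energy is taken in µW/MHz (the reading that converts to about 0.8 fJ). The function names are hypothetical.

```python
def fo4_delay_ps(intrinsic_ps, kload_ps_per_ff, cin_ff, fanout=4):
    """Linear delay model: intrinsic delay plus Kload times the load capacitance."""
    return intrinsic_ps + kload_ps_per_ff * cin_ff * fanout

def switching_energy_fj(cin_ff, vdd_v):
    """0.5 * C * V^2, with C in fF and V in volts, giving fJ."""
    return 0.5 * cin_ff * vdd_v ** 2

def uw_per_mhz_to_fj(value):
    """1 uW/MHz = 1e-6 W / 1e6 Hz = 1e-12 J = 1000 fJ."""
    return value * 1000.0

fo4 = fo4_delay_ps(15.3, 1.3, 1.6)        # about 23.6 ps, rounded to 24 ps on the slide
energy = switching_energy_fj(1.6, 1.0)    # about 0.8 fJ at the typical 1.0 V supply
library = uw_per_mhz_to_fj(0.00078)       # about 0.78 fJ from the library figure
```

The $0.5\,C_{in} V_{DD}^2$ estimate matches the library figure at 1.0 V; at the 0.9 V corner the same formula gives a somewhat smaller value.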

    38. Summary
        Jackson adders appear to offer potential benefits
            Logical effort
            Arithmetica results
            Burgess results
        Preliminary synthesis results don't yet demonstrate the advantages
        HMC 2010-11 Clay-Wolkin research goals
            Understand the Jackson design space
            Logical effort analysis of critical path
            Develop Jackson adders superior to conventional Design Compiler results

    39. References
        [Burgess09] N. Burgess, "Implementation of recursive Ling adders in CMOS VLSI," Proc. Asilomar Conf. Signals, Systems and Computers, 2009, pp. 1777-1781.
        [Jackson04] R. Jackson and S. Talwar, "High speed binary addition," Proc. Asilomar Conf. Signals, Systems and Computers, 2004, pp. 1350-1353.
        [Jackson08] R. Jackson, "Data detection algorithms for perpendicular magnetic recording in the presence of strong media noise," Ph.D. thesis, Department of Mathematics, University of Warwick, 2008.
        [Ling81] H. Ling, "High-speed binary adder," IBM J. Research and Development, vol. 25, no. 3, May 1981, pp. 156-166.
        [Patil07] D. Patil, O. Azizi, M. Horowitz, R. Ho, and R. Ananthraman, "Robust energy-efficient adder topologies," Proc. Computer Arithmetic Symp., Jun. 2007, pp. 16-28.
        [Weste10] N. Weste and D. Money Harris, CMOS VLSI Design, 4th Ed., Boston: Addison-Wesley, 2010.
        [Zlatanovici09] R. Zlatanovici, S. Kao, and B. Nikolic, "Energy-delay optimization of 64-bit carry-lookahead adders with a 240 ps 90 nm CMOS design example," IEEE J. Solid-State Circuits, vol. 44, no. 2, Feb. 2009, pp. 569-583.
