Circuit Performance and Adders

Presentation Transcript


  1. Circuit Performance and Adders • Recap from last time • Hardware Design is Complicated Because We Want Circuits to Go Fast • Combinational Logic: Used A Simple Model of Delay • Integer Delay on Each Gate • Reduction of Circuit to Directed Acyclic Graph • Delay of Circuit (= Clock Period) is longest path in graph • Making Circuits Go Fast = Shortening Longest Path • Exploit Asymmetry between path lengths • Shorten Longest Path by • Introducing Redundant Logic • Moving Logic from Long to Short Paths • We will see a different technique today! CS 150 – Spring 2008 – Lec #15: Ckt Performance - 1

  2. Delay Model of a Circuit • Translate circuit into graph • Weights on nodes are delay through gates • Delay through circuit is longest path through graph • Easy, linear-time algorithm [Figure: example graph with nodes A, B, C, D and gate delays 2, 1, 1]
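The longest-path computation on this slide can be sketched as follows. This is a minimal illustration, not the course's code: the names `circuit_delay`, `gate_delay`, and `fanin`, and the gates `g1` to `g4` in the usage note, are hypothetical. Gates are visited in topological order, so each gate and each wire is handled once, which is where the linear running time comes from.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def circuit_delay(gate_delay, fanin):
    """Return the longest node-weighted path through a combinational circuit.

    gate_delay: dict mapping each gate name to its integer delay.
    fanin: dict mapping a gate name to the list of gates driving it
           (gates with no drivers may be omitted).
    """
    arrival = {}  # latest time at which each gate's output settles
    # static_order() yields every gate after all of its drivers.
    for gate in TopologicalSorter(fanin).static_order():
        latest_input = max((arrival[d] for d in fanin.get(gate, ())), default=0)
        arrival[gate] = latest_input + gate_delay[gate]
    return max(arrival.values(), default=0)
```

For a made-up circuit with delays `{"g1": 2, "g2": 1, "g3": 1, "g4": 1}` and fan-in `{"g3": ["g1", "g2"], "g4": ["g2", "g3"]}`, the longest path is g1, g3, g4 with total delay 4.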

  3. Circuit Performance Model • Inputs stabilize at 0 • Logic finishes when last output stabilizes [Figure: Latches, Combinational Logic, Latches]

  4. Circuit Performance Model • Outputs of latches are stable only at clock edge • Inputs to latches must be stable by next clock edge • Time between clock edges must be > delay of combinational logic [Figure: Latches, Combinational Logic, Latches]

  5. Adders • Highly studied circuit, so a useful case study in design • “Ripple-carry” adder: standard adder where carry ripples from one bit to another • Longest path for n-bit adder is O(n) • Number of gates for n-bit adder is O(n) • “Carry Lookahead”: Accelerate carry chain • Collapse carry into all bits • O(log n) delay (optimal!) • O(n^3) gates (terrible!) • Practical compromise is block-accelerated adders • Block-carry lookahead • Carry-select adder
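A ripple-carry adder can be modelled bit by bit to make the O(n) carry chain visible. This is a behavioural sketch under my own naming (`ripple_carry_add`, `to_bits`, and `from_bits` are illustrative helpers, not from the lecture); bits are stored least-significant first.

```python
def to_bits(value, n):
    """Return the n low-order bits of value, least-significant bit first."""
    return [(value >> i) & 1 for i in range(n)]


def from_bits(bits):
    """Inverse of to_bits."""
    return sum(bit << i for i, bit in enumerate(bits))


def ripple_carry_add(a_bits, b_bits, carry_in=0):
    """Add two equal-length bit lists; return (sum_bits, carry_out)."""
    sum_bits = []
    carry = carry_in
    # Each iteration depends on the carry from the previous one:
    # this sequential dependence is the O(n) longest path.
    for a, b in zip(a_bits, b_bits):
        propagate = a ^ b
        sum_bits.append(propagate ^ carry)
        carry = (a & b) | (propagate & carry)
    return sum_bits, carry
```

For example, `ripple_carry_add(to_bits(5, 4), to_bits(3, 4))` yields the bits of 8 with carry-out 0.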

  6. Hierarchical Carry Lookahead • PG, GG used as propagate, generate inputs to hierarchical block [Figure: three m-bit CLA adders feeding PG0/GG0, PG1/GG1, PG2/GG2 into a Carry Lookahead Block]

  7. Synopsis of Hierarchical Carry-Lookahead • n-bit adder, m-bit blocks, n/m blocks • Delay is 2 log n + 2 log m • Size is max(nm^2, (n/m)^3) • Best is m = n^(2/5) • Delay is (14/5) log n, size is O(n^(9/5))

  8. Analysis of the Carry-Lookahead Adder • n-bit adder, m-bit blocks, n/m blocks • Delay through the adder: 2 * delay through the lookahead block + delay through the super-lookahead block • Lookahead block: 2 log m • Super-block: 2 log(n/m) = 2 log n - 2 log m • Total: 2 log n + 2 log m • Logic: scales like the lookahead blocks • Size p block: O(p^3) from before • Two sizes of blocks: n/m blocks of size m, one block of size n/m • Totals: (n/m) * m^3 = nm^2 for the blocks, (n/m)^3 for the super-block • Choose m to minimize max(nm^2, (n/m)^3) • Solution at m = n^(2/5). Total is O(n^(9/5))
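The choice of m can be worked out explicitly. The term nm^2 grows with m while (n/m)^3 shrinks, so the maximum of the two is smallest where they are equal. The derivation below is a sketch that ignores constant factors, as the slides do:

```latex
\begin{align*}
n m^{2} &= \left(\frac{n}{m}\right)^{3}
  \;\Longrightarrow\; m^{5} = n^{2}
  \;\Longrightarrow\; m = n^{2/5} \\
\text{size} &= n m^{2} = n \cdot n^{4/5} = n^{9/5} \\
\text{delay} &= 2\log n + 2\log m
  = 2\log n + \tfrac{4}{5}\log n
  = \tfrac{14}{5}\log n
\end{align*}
```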

  9. Carry Select Adder • “Combinational Speculative Execution” • Basic intuition: • Adders spend time waiting to see what carry-in is • Therefore • Go ahead and guess each way • Pick the right answer when the carry comes by

  10. Carry-Select Adder • Each block is doubled • One block computes with carry-in = 0, the other with carry-in = 1 • Actual carry-in (carry-out from previous block) selects the result • m sum bits • 1 carry-out bit [Figure: m-bit blocks duplicated for carry-in 0 and carry-in 1, with multiplexers choosing between them; Block 0, Block 1]

  11. Analysis of Carry-Select Adder • Delay analysis: Worst-case path is through Block0, then the control of the multiplexer chain • O(m) gates in Block0 • O(p = n/m) gates in multiplexer chain • Choose m to minimize max(n/m, m) • Minimum is to choose m = √n [Figure: Block0 followed by block pairs Block10/Block11, Block20/Block21, …, Blockp0/Blockp1]
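The block-size choice can be checked by brute force for the adder widths of interest. This is only a sketch: `best_block_size` is a hypothetical helper that takes the delay model max(n/m, m) from the slide at face value and ignores constant factors.

```python
def best_block_size(n):
    """Return the block size m in 1..n that minimizes max(n/m, m)."""
    return min(range(1, n + 1), key=lambda m: max(n / m, m))
```

For n = 64 this returns 8 and for n = 16 it returns 4, matching m = √n.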

  12. Twelve-bit Carry-Select Example • Problem: add -3 (0xffd, 111111111101) to 17 (0x011, 000000010001) • Use 4-bit carry-select blocks • Result is 0xe (14) [Figure: three 4-bit blocks, with carry-in = 0 and carry-in = 1 candidate results for the upper blocks]
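The example can be replayed with a small behavioural model of a carry-select adder. This is a sketch under assumed names (`add_block` and `carry_select_add` are not from the lecture), and it uses Python integer addition inside each block instead of modelling gates.

```python
def add_block(a, b, carry_in, m):
    """Add two m-bit values plus carry_in; return (m-bit sum, carry_out)."""
    total = a + b + carry_in
    return total & ((1 << m) - 1), total >> m


def carry_select_add(a, b, n, m):
    """Add two n-bit values using m-bit carry-select blocks (m divides n)."""
    mask = (1 << m) - 1
    result, carry = 0, 0
    for shift in range(0, n, m):
        a_blk, b_blk = (a >> shift) & mask, (b >> shift) & mask
        if shift == 0:
            # Block 0 is not doubled: its carry-in is known to be 0.
            blk_sum, carry = add_block(a_blk, b_blk, 0, m)
        else:
            # Compute both guesses, then let the real carry select one.
            guess0 = add_block(a_blk, b_blk, 0, m)
            guess1 = add_block(a_blk, b_blk, 1, m)
            blk_sum, carry = guess1 if carry else guess0
        result |= blk_sum << shift
    return result, carry
```

With the slide's operands, `carry_select_add(0xFFD, 0x011, 12, 4)` gives sum `0x00E` (14) and a carry-out of 1 that falls outside the 12-bit result.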

  13. Hardware for the Carry Select Adder • √n blocks, each of √n gates • Additional hardware is √n multiplexers + additional adder for each block but the first • n - √n additional adder bits • Therefore √n + 2n - √n = 2n gates • Exactly twice the size of an ordinary adder, but delay is √n instead of n

  14. Carry-Bypass Adder • Like the carry-select adder, has O(√n) delay • But even more efficient (in terms of gates) than the carry-select • Has only n + √n log n gates • However, it broke every timing analyzer… • Instead of shortening the longest path, made it longer! • How can this be? Isn’t the delay of the circuit the length of the longest path?...

  15. What is the delay of the Circuit? • The delay of a circuit is the time that the last output settles • This can be the length of the longest path, but sometimes isn’t • The longest path is an upper bound on the delay of the circuit, but sometimes this isn’t tight

  16. Example • Long paths are from X,Y->out through bottom of circuit • But no signal can travel down these paths!

  17. Example [Figure: example circuit annotated with signal values at t=0, t=1, t=2, t=3, t=4, and t=6]

  18. Timing Analysis • Longest path is 8, but no signal ever travels down it!

  19. What happened? • Long Paths are false • A->B requires z=1 • B->C requires z=0 • Conflict! No signal can propagate down this path • This analysis doesn’t quite work • Analysis has to take into account delays • Complete theory not understood till 1993 • This is good enough for carry-bypass adder

  20. Announcements • Prof. Pister will lecture on wireless protocol Thursday • Need this for your project • Spring Break • Tuesday 4/1 – TBD • Thursday 4/3 – MT review • Tuesday 4/8 – MT 2

  21. False Paths and Adders • Key idea: Don’t shorten the critical paths in the adder • (Shortening them is the idea behind Carry Lookahead and Carry-Select adders) • Instead, make long paths false • Critical path is through the carry chain • Only exercised when the propagate bit through every block is set • (Question: is this likely?) • Therefore: when signal would propagate through carry chain, skip the block! • Recall from block carry-lookahead adder: Group Propagate PG = P0·P1·P2·P3 • When PG=1, have the carry skip the whole block!

  22. Carry-Skip Block [Figure: m-bit ripple-carry adder with Carry-in; a multiplexer with inputs 0 and 1, controlled by PG, drives Carry-in to next block]

  23. Suppose Carry-in Propagates to Carry-Out… [Figure: same carry-skip block]

  24. Then PG=1 [Figure: same carry-skip block]

  25. So Path Goes Through the 1-Port of the MUX • Delay is 1 MUX delay, not 4 propagate delays! [Figure: same carry-skip block]
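The argument on these slides can be checked with a small model of one carry-skip block. This is a sketch, not the lecture's circuit: `carry_skip_block` and `check_all_4bit_cases` are hypothetical names, and the model captures logic values only, not timing. It shows why the bypass is safe: when PG = 1 every bit propagates, so the ripple carry-out would equal the carry-in anyway.

```python
from itertools import product


def carry_skip_block(a_bits, b_bits, carry_in):
    """One m-bit carry-skip block; bit lists are least-significant first.

    Returns (sum_bits, carry_out, group_propagate).
    """
    propagate = [a ^ b for a, b in zip(a_bits, b_bits)]
    group_propagate = int(all(propagate))  # PG = P0 & P1 & ... & P(m-1)
    sum_bits, carry = [], carry_in
    for a, b, p in zip(a_bits, b_bits, propagate):
        sum_bits.append(p ^ carry)
        carry = (a & b) | (p & carry)
    # Multiplexer: port 1 (PG = 1) forwards carry_in, port 0 the ripple carry.
    carry_out = carry_in if group_propagate else carry
    return sum_bits, carry_out, group_propagate


def check_all_4bit_cases():
    """Compare a 4-bit block against integer addition for every input."""
    for a, b, c in product(range(16), range(16), (0, 1)):
        a_bits = [(a >> i) & 1 for i in range(4)]
        b_bits = [(b >> i) & 1 for i in range(4)]
        sum_bits, carry_out, _ = carry_skip_block(a_bits, b_bits, c)
        total = sum(bit << i for i, bit in enumerate(sum_bits))
        if total != (a + b + c) & 15 or carry_out != (a + b + c) >> 4:
            return False
    return True
```

`check_all_4bit_cases()` returns `True`, so the multiplexer never changes the block's logical result; it only changes which path delivers the carry.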

  26. Full Carry-Bypass Adder • As before, n/m array of m-bit blocks [Figure: Block 0, Block 1, …, Block n/m with Carry-in; each block is bypassed by a PG-controlled multiplexer with inputs 0 and 1]

  27. Full Carry-Bypass Adder: Worst-case path • Worst-case path goes through m-1 bits of block 0, n/m - 2 multiplexer gates, and m-1 bits of block n/m - 1 [Figure: Block 0, Block 1, …, Block n/m - 1 with Carry-in and PG-controlled multiplexers]

  28. Timing and Size Analysis • Delay = 2 * (m - 1) + n/m - 2 • Choose m to minimize delay => m = √n • We have Delay = 2 * (√n - 1) + √n - 2 = 3√n - 4 • What’s the additional circuitry? • log m gates to build PG (1 per block) • 1 two-input multiplexer per block • n/m blocks • => n/m (log m + 1) • m = √n => √n ((log n)/2 + 1) • Same delay as carry-select, but much smaller: (n + √n) vs 2n
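The two formulas on this slide can be evaluated directly for a concrete width. This is a sketch with hypothetical helper names (`bypass_delay`, `bypass_extra_gates`); it assumes m divides n and that log means log base 2.

```python
import math


def bypass_delay(n, m):
    """Worst-case delay from the slide: 2 * (m - 1) + n/m - 2."""
    return 2 * (m - 1) + n // m - 2


def bypass_extra_gates(n, m):
    """Extra circuitry from the slide: (n/m) * (log m + 1)."""
    return (n // m) * (math.log2(m) + 1)
```

For n = 64 and m = √n = 8, `bypass_delay(64, 8)` is 20, which equals 3√n - 4, and `bypass_extra_gates(64, 8)` is 32, which equals √n((log n)/2 + 1). Under the same formula `bypass_delay(64, 4)` is also 20, so for widths this small m = √n is one of several block sizes that reach the minimum.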

  29. Full Carry-Bypass Adder: Longest path • Longest path goes through all blocks and all multiplexers: m * n/m + n/m [Figure: Block 0, Block 1, …, Block n/m - 1 with Carry-in and PG-controlled multiplexers]

  30. Longest Path vs Circuit Delay • Longest Path is n + √n • Worst-case path is √n • Worst-case path for ripple-carry is n • Made things better, but a timing analyzer thinks it’s worse! • Stimulated tremendous interest in timing analyzers!

  31. Adder Summary

  32. A comment on n • Asymptotic results tell us what happens at infinity • For our purposes, n=16, 32, 64 • Means: square root n = 4 – 8 • Means: Log n = 4-6 • For the sizes we are interested in, carry-select and carry-bypass are as fast as block CLA
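The ranges quoted on this slide can be reproduced in a few lines. This is a sketch; `scale_table` is a hypothetical helper, and base-2 logarithms are an assumption that matches the quoted range of 4 to 6.

```python
import math


def scale_table(sizes=(16, 32, 64)):
    """Return (n, sqrt(n), log2(n)) for each adder width n."""
    return [(n, math.sqrt(n), math.log2(n)) for n in sizes]
```

At these widths √n and log n differ by at most a small constant factor, which is why the asymptotic ranking of the adders says little about which one is faster in practice.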

  33. Remaining Questions (just for fun) • How often does worst-case delay path occur in Carry-bypass adder? • How do we automatically analyze for false paths?

  34. How often does (near) worst-case delay occur? • Worst case delay: Pi = 1 for all i > j, small j • Pi = Ai ⊕ Bi • How often is Pi = Ai ⊕ Bi = 1? Only two of nine cases, but they happen frequently

  35. How hard is it to analyze false paths? • Hard! • Problem noticed in early timing verifiers in the 1970s • Early researchers (Hitchcock, Jouppi, Ousterhout) used hand-done rules • Often wrong (if it’s hard to analyze automatically, it’s hard to guess right by hand) • Next: “Static sensitization” • Assert “non-controlling” values on side inputs (0 for OR/NOR, 1 for AND/NAND) • Make sure assignments are consistent • Problem: Values are changing!
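The consistency check at the heart of static sensitization can be sketched in a few lines. This is a deliberately simplified model: `statically_sensitizable` is a hypothetical helper that only detects direct conflicts between required side-input values and does not follow implications through the logic, which a real analyzer would have to do.

```python
def statically_sensitizable(side_input_requirements):
    """Return True if no signal is required to hold two different values.

    side_input_requirements: iterable of (signal_name, required_value)
    pairs collected along one path, one pair per side input.
    """
    assigned = {}
    for signal, value in side_input_requirements:
        # setdefault records the first requirement and returns it afterwards.
        if assigned.setdefault(signal, value) != value:
            return False  # conflicting requirements: the path is false
    return True
```

For the earlier false-path example, where A->B requires z=1 and B->C requires z=0, `statically_sensitizable([("z", 1), ("z", 0)])` returns `False`.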

  36. Example • To sensitize a->d->f->g, note: a->d requires b=1 • But b=1 => e=0, and f->g requires b=1 • Similar argument says you can’t sensitize b->d->f->g

  37. But… • Delay of the circuit is 3! • Path a->d->f->g really was true

  38. Key Problem • All inputs are changing… • a->d requires b=1 means b=1 stable at t=0 • But b changes to 0 at t=0 • Therefore, value of b is unknown (X) • Also, delays of gates are unknown • “1” really means [0,1]

  39. Key Idea: Derive Function for each time • d = 1 at 1 • d = 0 at 1 • d = X at 1

  40. Key Idea: Derive Function for each time • (d = 1 at 1) = (a = 1 at 0) and (b = 1 at 0)

  41. Key Idea: Derive Function for each time • (d = 0 at 1) = (a = 0 at 0) or (b = 0 at 0)

  42. Key Idea: Derive Function for each time • (d = X at 1) = (d = 1 at 1) nor (d = 0 at 1)
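The timed functions on these slides describe a unit-delay AND gate over the values 0, 1, and X. The sketch below assumes that reading: `and_gate_next` is a hypothetical name, and treating every case that is neither a definite 1 nor a definite 0 as X is an assumption based on the three outcomes listed on slide 39.

```python
X = "X"  # unknown / still-changing value


def and_gate_next(a, b):
    """Value of d one time step after inputs a and b, each 0, 1, or X."""
    if a == 1 and b == 1:
        return 1  # (d = 1 at 1) = (a = 1 at 0) and (b = 1 at 0)
    if a == 0 or b == 0:
        return 0  # (d = 0 at 1) = (a = 0 at 0) or (b = 0 at 0)
    return X      # neither condition holds, so d is unknown
```

A single known 0 forces the output regardless of the other input, so `and_gate_next(0, X)` is 0, while `and_gate_next(1, X)` stays X.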

  43. Delay of the Circuit • Delay of the circuit is the latest t such that (“output = X at t”) is not identically 0 • Problem is NP-complete • Size of problem is linear in number of time slices × number of gates • Mathematical machinery fairly massive • “Special Theory”: 1989 – handled symmetric gates, zero-lower-bounded delays (all signals were X until they hit their final values) • Other cases were conservatively approximated • “General Theory”: 1993 – handled all gates, general delay models • Gave exact answers for all delay types • Still hasn’t quite reached industrial practice!
