
Bridging the gap between asynchronous design and designers

Peter A. Beerel, Fulcrum Microsystems, Calabasas Hills, CA, USA; Jordi Cortadella, Universitat Politècnica de Catalunya, Barcelona, Spain; Alex Kondratyev, Cadence Berkeley Labs, Berkeley, CA, USA.


Presentation Transcript


  1. Bridging the gap between asynchronous design and designers • Peter A. Beerel, Fulcrum Microsystems, Calabasas Hills, CA, USA • Jordi Cortadella, Universitat Politècnica de Catalunya, Barcelona, Spain • Alex Kondratyev, Cadence Berkeley Labs, Berkeley, CA, USA

  2. Outline • Basic concepts on asynchronous circuit design • Tea Break • Logic synthesis from concurrent specifications • Synchronization of complex systems • Lunch • Design automation for asynchronous circuits • Tea Break • Industrial experiences

  3. Basic concepts on asynchronous circuit design

  4. Outline • What is an asynchronous circuit ? • Asynchronous communication • Asynchronous design styles (Micropipelines) • Asynchronous logic building blocks • Control specification and implementation • Delay models and classes of async circuits • Channel-based design • Why asynchronous circuits ?

  5. Synchronous circuit • Implicit (global) synchronization between blocks • Clock period > Max Delay (CL + R) [Figure: chain of registers (R) and combinational logic blocks (CL) driven by a common CLK]
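
The timing rule on this slide (clock period > max delay of CL + R) can be illustrated with a small calculation. This is only a sketch: the function name and the delay values are hypothetical and do not come from the presentation.

```python
def min_clock_period(logic_delays_ns, register_delay_ns):
    """Lower bound on the clock period: slowest CL block plus register overhead."""
    return max(logic_delays_ns) + register_delay_ns


# Hypothetical delays for three combinational-logic (CL) blocks, in ns.
stage_delays_ns = [2.0, 3.5, 1.0]
register_overhead_ns = 0.5

bound_ns = min_clock_period(stage_delays_ns, register_overhead_ns)
print(f"clock period must exceed {bound_ns} ns")
```

Every stage is clocked at the pace of the slowest one, which is the worst-case synchronization that the later "average-case performance" motivation refers to.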

  6. Asynchronous circuit • Explicit (local) synchronization: Req / Ack handshakes [Figure: chain of registers (R) and combinational logic blocks (CL) linked by Req and Ack signals]

  7. Motivation for asynchronous • Asynchronous design is often unavoidable: • Asynchronous interfaces, arbiters etc. • Modern clocking is multi-phase and distributed, and virtually ‘asynchronous’ (cf. GALS, next slide): • Mesochronous (clock travels together with data) • Local (possibly stretchable) clock generation • Robust asynchronous design flow is coming (e.g. VLSI programming from Philips, Balsa from Univ. of Manchester, NCL from Theseus Logic …)

  8. Globally Async Locally Sync (GALS) Asynchronous World Clocked Domain Req3 Req1 R R CL Ack3 Ack1 Local CLK Req4 Req2 Ack4 Ack2 Async-to-sync Wrapper

  9. Key Design Differences • Synchronous logic design: • Proceeds without taking timing correctness (hazards, signal ack-ing etc.) into account • Combinational logic and memory latches (registers) are built separately • Static timing analysis of CL is sufficient to determine the Max Delay (clock period) • Fixed set-up and hold conditions for latches

  10. Key Design Differences • Asynchronous logic design: • Must ensure hazard-freedom, signal ack-ing, local timing constraints • Combinational logic and memory latches (registers) are often mixed in “complex gates” • Dynamic timing analysis of logic is needed to determine relative delays between paths • To avoid complex issues, circuits may be built as delay-insensitive and/or speed-independent (as discussed later)

  11. Verification and Testing Differences • Synchronous logic verification and testing: • Only functional correctness aspect is verified and tested • Testing can be done with standard ATE and at low speed (but high–speed may be required for DSM) • Asynchronous logic verification and testing: • In addition to functional correctness, temporal aspect is crucial: e.g. causality and order, deadlock–freedom • Testing must cover faults in complex gates (logic+memory) and must proceed at normal operation rate • Delay fault testing may be needed

  12. Synchronous communication • Clock edges determine the time instants where data must be sampled • Data wires may glitch between clock edges (set-up/hold times must be satisfied) • Data are transmitted at a fixed rate (clock frequency) [Figure: clocked waveform carrying the bits 1 1 0 0 1 0]

  13. Dual rail • Two wires with L (low) and H (high) per bit • “LL” = “spacer”, “LH” = “0”, “HL” = “1” • n-bit data communication requires 2n wires • Each bit is self-timed • Other delay-insensitive codes exist (e.g. k-of-n) and event-based signalling (choice criteria: pin and power efficiency) [Figure: dual-rail waveforms carrying the bits 1 1 1 and 0 0 0]
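
The dual-rail code on this slide can be sketched as a small encoder and decoder. The function names are illustrative; only the code table (“LL” = spacer, “LH” = 0, “HL” = 1) comes from the slide.

```python
SPACER = "LL"                      # no data present on this bit
ENCODE = {0: "LH", 1: "HL"}        # code table from the slide
DECODE = {"LH": 0, "HL": 1}


def encode_word(bits):
    """Encode n bits onto n wire pairs (2n wires in total)."""
    return [ENCODE[b] for b in bits]


def is_valid(pairs):
    """A word is complete once every pair carries a codeword (no spacer left)."""
    return all(p in DECODE for p in pairs)


def decode_word(pairs):
    """Decode a complete word; reject spacers and the illegal 'HH' state."""
    if not is_valid(pairs):
        raise ValueError("word is incomplete or contains an illegal pair")
    return [DECODE[p] for p in pairs]


word = encode_word([1, 0, 1])
print(word, decode_word(word))
```

Because validity can be read off the wires themselves, the receiver needs no separate timing reference, which is what "each bit is self-timed" means here.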

  14. Bundled data • Validity signal • Similar to an aperiodic local clock • n-bit data communication requires n+1 wires • Data wires may glitch when not valid • Signaling protocols • Level sensitive (latch) • Transition sensitive (register): 2-phase / 4-phase [Figure: data waveform carrying the bits 1 1 0 0 1 0 with its validity signal]

  15. Example: memory read cycle Valid address • Transition signaling, 4-phase Address A A Valid data Data D D

  16. Example: memory read cycle Valid address • Transition signaling, 2-phase A A Address Valid data Data D D
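
The difference between the two read-cycle examples can be sketched by listing the transitions on A (valid address) and D (valid data). This is a simplified model with a hypothetical function name: in 4-phase signaling both signals return to zero after every read, while in 2-phase signaling every transition is a new event.

```python
def read_cycles(n_reads, phases):
    """List the A/D transitions for n_reads memory reads in 2- or 4-phase signaling."""
    if phases not in (2, 4):
        raise ValueError("phases must be 2 or 4")
    a = d = 0
    trace = []
    for _ in range(n_reads):
        # 4-phase: rise then fall for each read; 2-phase: a single toggle per read.
        for _ in range(phases // 2):
            a ^= 1
            trace.append("A+" if a else "A-")
            d ^= 1
            trace.append("D+" if d else "D-")
    return trace


print(read_cycles(1, 4))   # one read, four transitions
print(read_cycles(2, 2))   # two reads, four transitions
```

The same four transitions carry one read in the 4-phase protocol and two reads in the 2-phase protocol.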

  17. Asynchronous modules • Signaling protocol: reqin+ start+ [computation] done+ reqout+ ackout+ ackin+ reqin- start- [reset] done- reqout- ackout- ackin- (more concurrency is also possible) [Figure: DATA PATH block (Data IN, Data OUT, start, done) above a CONTROL block (req in, ack in, req out, ack out)]

  18. Asynchronous latches: C element • Truth table (A B → Z+): 0 0 → 0; 0 1 → Z; 1 0 → Z; 1 1 → 1 • Static logic implementation [van Berkel 91] [Figure: C-element symbol with inputs A, B and output Z, and its transistor network between Vdd and Gnd]
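
The C-element behavior on this slide (the output follows the inputs when they agree and holds its value otherwise) can be sketched as a small behavioral model. The class name is illustrative and says nothing about the transistor-level implementations shown on the slides.

```python
class CElement:
    """Behavioral model of a 2-input Muller C-element."""

    def __init__(self, initial=0):
        self.z = initial

    def update(self, a, b):
        """Apply inputs A and B and return the next output Z+."""
        if a == b:
            self.z = a      # both inputs agree: output follows them
        return self.z       # inputs differ: output holds its previous value


c = CElement()
for a, b in [(1, 0), (1, 1), (0, 1), (0, 0)]:
    print(a, b, c.update(a, b))
```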

  19. Vdd A B Z B A Gnd C-element: Other implementations Vdd A Weak inverter B Z B A Dynamic Quasi-Static Gnd

  20. A.t C.t B.t A.f C.f B.f Dual-rail logic Dual-rail AND gate Valid behavior for monotonic environment
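
A dual-rail AND gate can be sketched as below. The transcript does not preserve the schematic, so the gate equations used here (C.t = A.t AND B.t, C.f = A.f OR B.f) are an assumption based on one common realization. Each signal is modeled as a pair (t, f), with (0, 0) as the spacer.

```python
SPACER = (0, 0)   # no data
ONE = (1, 0)      # .t rail high
ZERO = (0, 1)     # .f rail high


def dual_rail_and(a, b):
    """Assumed realization: C.t = A.t AND B.t, C.f = A.f OR B.f."""
    a_t, a_f = a
    b_t, b_f = b
    return (a_t & b_t, a_f | b_f)


print(dual_rail_and(ONE, ONE))      # both inputs true
print(dual_rail_and(ZERO, SPACER))  # a single false input already decides the output
```

With these equations the output can become valid as soon as one input is false, so the gate only behaves correctly if the environment changes inputs monotonically between spacer and valid codewords, as the slide title notes.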

  21. done C Completion detection tree Completion detection Dual-rail logic • • • • • •
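
Completion detection for dual-rail logic can be sketched as follows. The model is a simplification: each bit is valid when one of its rails is high, and a single n-input C-element behavior stands in for the tree of C-elements on the slide. The class name is hypothetical.

```python
class CompletionDetector:
    """'done' rises when every bit is valid and falls when every bit is a spacer."""

    def __init__(self):
        self.done = 0

    def update(self, word):
        # word is a list of (t, f) rail pairs; a bit is valid if either rail is high.
        valid = [t | f for (t, f) in word]
        if all(valid):
            self.done = 1
        elif not any(valid):
            self.done = 0
        # Mixed case: hold the previous value, as a C-element would.
        return self.done


cd = CompletionDetector()
for word in ([(0, 0), (0, 0)], [(1, 0), (0, 0)], [(1, 0), (0, 1)], [(0, 0), (0, 0)]):
    print(word, cd.update(word))
```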

  22. Differential cascode voltage switch logic start Z.f Z.t done A.t N-type transistor network C.f B.f A.f B.t C.t start 3–input AND/NAND gate

  23. Examples of dual-rail design • Asynchronous dual-rail ripple-carry adder (A. Martin, 1991) • Critical delay is proportional to log N (N = number of bits) • 32-bit adder delay (1.6 µm MOSIS CMOS): 11 ns versus 40 ns for synchronous • Async cell transistor count = 34 versus synchronous = 28 • More recent success stories (modularity and automatic synthesis) of dual-rail logic from Null-Convention Logic (Theseus Logic)

  24. start done delay Bundled-data logic blocks Single-rail logic • • • • • • Conventional logic + matched delay

  25. Micropipelines (Sutherland 89) • Micropipeline (2-phase) control blocks: Join (C), Merge, Select (in, sel, outt, outf), Toggle (in, out0, out1), Call (r1, a1, r2, a2, r, a), Request-Grant-Done (RGD) Arbiter (r1, g1, d1, r2, g2, d2)

  26. C C C delay delay delay Micropipelines (Sutherland 89) Aout Ain C L logic L logic L logic L Rin Rout
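
The control chain of a micropipeline can be sketched as a row of C-elements, each fed by the request from its left neighbor and the inverted acknowledge from its right neighbor. This is a simplified 2-phase model under stated assumptions: the delay elements and latches on the slide are omitted, and the function names are hypothetical.

```python
def c_element(a, b, z):
    """Output follows the inputs when they agree, otherwise holds z."""
    return a if a == b else z


def settle(stages, r_in, a_out):
    """Propagate events until no C-element output changes (list updated in place)."""
    changed = True
    while changed:
        changed = False
        for i in range(len(stages)):
            left = r_in if i == 0 else stages[i - 1]
            right = a_out if i == len(stages) - 1 else stages[i + 1]
            new = c_element(left, 1 - right, stages[i])
            if new != stages[i]:
                stages[i] = new
                changed = True
    return stages


stages = [0, 0, 0]
print(settle(stages, 1, 0))  # first request travels to the last stage
print(settle(stages, 0, 0))  # second request stops one stage earlier
print(settle(stages, 1, 0))  # third request fills the pipeline
print(settle(stages, 0, 0))  # fourth request is blocked until the receiver acknowledges
```

In this model a stage holds an event while its output differs from that of its right neighbor, so three stages can buffer three requests before the sender is stalled.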

  27. Data-path / Control L logic L logic L logic L Rin Rout CONTROL Ain Aout

  28. Control specification A+ A B+ B A– A input B output B–

  29. Control specification A+ B– B A A– B+

  30. C Control specification A+ B+ A C+ C B A– B– C–

  31. C Control specification A+ B+ A C+ C A– B B– C–

  32. Ro+ Ri+ Ri Ro FIFO cntrl Ao+ Ai+ Ao Ai Ro- Ri- C C Ai- Ao- Ri Ro Ao Ai Control specification

  33. A simple filter: specification IN Ain Rin y := 0; loop x := READ (IN); WRITE (OUT, (x+y)/2); y := x; end loop filter Aout Rout OUT
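
The filter specification on this slide can be transcribed almost directly. In this sketch a Python iterable stands in for the IN channel and a generator for the OUT channel; real (non-integer) division is assumed, which the slide does not specify.

```python
def simple_filter(samples):
    """y := 0; loop x := READ(IN); WRITE(OUT, (x + y) / 2); y := x; end loop"""
    y = 0
    for x in samples:          # x := READ(IN)
        yield (x + y) / 2      # WRITE(OUT, (x + y) / 2)
        y = x                  # remember the previous input


print(list(simple_filter([2, 4, 6])))
```

Each output is the average of the current and the previous input, with 0 assumed before the first sample.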

  34. + OUT x y IN Ry Ay Rx Ax Ra Aa Rin Rout control Ain Aout A simple filter: block diagram • x and y are level-sensitive latches (transparent when R=1) • + is a bundled-data adder (matched delay between Ra and Aa) • Rin indicates the validity of IN • After Ain+ the environment is allowed to change IN • (Rout,Aout) control a level-sensitive latch at the output

  35. + OUT x y IN Ry Ay Rx Ax Ra Aa Rin Rout control Ain Aout Rout+ Ra+ Ry+ Rx+ Rin+ Aout+ Aa+ Ay+ Ax+ Ain+ Rout– Ra– Ry– Rx– Rin– Aout– Aa– Ay– Ax– Ain– A simple filter: control spec.

  36. Rx Ax Aa Ry Ra Ay Aout C Ain Rout Rin Rout+ Ra+ Ry+ Rx+ Rin+ Aout+ Aa+ Ay+ Ax+ Ain+ Rout– Ra– Ry– Rx– Rin– Aout– Aa– Ay– Ax– Ain– A simple filter: control impl.

  37. Taking delays into account • Delay assumptions: • Environment: 3 time units • Gates: 1 time unit • Events (time): x+ (3) → x’– (4) → y+ (5) → z+ (6) → z’– (7) → x– (9) → x’+ (10) → z– (12) → z’+ (13) → y– (14) [Figure: circuit with signals x, x’, y, z, z’ and specification with events x+, y+, z+, x–, z–, y–]

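  38. Taking delays into account • Delay assumptions: unbounded delays (one gate very slow) • Events (time): x+ (3) → x’– (4) → y+ (5) → z+ (6) → x– (9) → x’+ (10) → y– (11): failure! [Figure: circuit with signals x, x’, y, z, z’ and specification with events x+, y+, z+, x–, z–, y–]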

  39. Gate vs wire delay models • Gate delay model: delays in gates, no delays in wires • Wire delay model: delays in gates and wires

  40. Delay models for async. circuits • Bounded delays (BD): realistic for gates and wires. • Technology mapping is easy, verification is difficult • Speed independent (SI): unbounded (pessimistic) delays for gates and “negligible” (optimistic) delays for wires. • Technology mapping is more difficult, verification is easy • Delay insensitive (DI): unbounded (pessimistic) delays for gates and wires. • DI class (built out of basic gates) is almost empty • Quasi-delay insensitive (QDI): delay insensitive except for critical wire forks (isochronic forks). • In practice it is the same as speed independent [Figure: diagram of the circuit classes BD, SI, QDI and DI]

  41. Channel-Based Design Synchronization and communication between blocks implemented with handshaking using asynchronous channels by sending/receiving “data tokens” Asynchronous channel clock Synchronous System Asynchronous System

  42. Channel Design – Single Rail 3 1 Req sender Ack receiver 2 4 • Features • One request wire • One wire per data bit • One acknowledgment wire • Has timing assumptions Req Ack Data Data Data stable 4-phase bundled-data channel
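
A 4-phase bundled-data transfer can be sketched as a sequential trace of the four handshake events, in the order numbered on the slide (Req+, Ack+, Req-, Ack-). The function name is hypothetical, and the timing assumption (data stable before Req rises) is modeled only by the order of the statements.

```python
def four_phase_transfer(values):
    """Send each value over a bundled-data channel; return (received values, event trace)."""
    received, trace = [], []
    for value in values:
        data = value              # sender drives the data wires first (bundling constraint)
        trace.append("Req+")      # 1: sender announces valid data
        received.append(data)     # receiver captures data while Req is high
        trace.append("Ack+")      # 2: receiver acknowledges
        trace.append("Req-")      # 3: sender returns to zero; data may change now
        trace.append("Ack-")      # 4: receiver returns to zero; channel is idle again
    return received, trace


got, events = four_phase_transfer([5, 9])
print(got)
print(events)
```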

  43. 2 4 Ack Ack sender receiver 1 3 Data Data (1-of-N) 4-phase 1-of-N channel Channel Design: Dual Rail & 1-of-N • Dual Rail • Two wires per data bit • One acknowledgment wire • Advantage: • Supports delay-insensitive design • 1-of-N • Generalization of dual-rail
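
A 1-of-N code can be sketched as below: exactly one of N wires is raised per symbol, and all wires low is the spacer. The mapping of a value to a wire index is an assumption made for illustration, as are the function names.

```python
def encode_1_of_n(value, n):
    """Raise exactly one of n wires to send a value in range(n)."""
    if not 0 <= value < n:
        raise ValueError("value out of range for this code")
    return [1 if i == value else 0 for i in range(n)]


def decode_1_of_n(wires):
    """Return the value, or None for the all-low spacer."""
    high = sum(wires)
    if high == 0:
        return None                      # spacer: no data on the channel
    if high > 1:
        raise ValueError("illegal codeword: more than one wire high")
    return wires.index(1)


print(encode_1_of_n(2, 4))
print(decode_1_of_n([0, 0, 1, 0]))
```

With n = 2 this reduces to a two-wire, one-bit code, which is the sense in which 1-of-N generalizes dual rail.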

  44. Reg A Reg B BN-3 BN-2 BN-1 Adder Multiplier leaf cells channels FAN-2 FAN-1 FA0 FAN-3 ASIC Reg C Main FSM Subtract/ Divider Register Bank Memory Adder/ Mult. Anatomy of a Channel-Based Asynchronous Design • Architecture is typically a multi-level hierarchy of communicating blocks Yields a hierarchical netlist of cells, where at each level blocks communicate along channels

  45. Asynchronous Cells F Output Channels Input Channels • Definition • Smallest element that communicates with its neighbors along asynchronous channels • Functionality • Reads a subset of input channels • Computes F and writes to a subset of output channels • Linear Pipelines • Only one input and one output channel F

  46. Cells for Non-Linear Pipelines • Non-Linear Pipelines • Joins and Forks • Conditional Joins: Read only some of the input channels • Conditional Splits: Write only to some of the output channels F F Fork Join F F Conditional Join Conditional Split

  47. C LCD RCD LCD F 2-input 1-output pipeline stage C LCD RCD RCD F 1-input 2-output pipeline stage Template-Based Leaf-Cell Design • Each pipeline style (QDI, timed…) has a different blueprint • Create a library using a blueprint to implement the lowest level communicating blocks C LCD RCD F Blueprint for a QDI N-input M-output pipeline stage

  48. Template-Based Leaf-Cell Design • Pros • Enables fine-grain 2-D pipelining, yielding high performance • Simplifies logic synthesis by enabling simple control circuit generation and re-use of typical datapath synthesis • Leaf cells can be laid out and verified to create a leaf-cell library, localizing timing assumptions • Cons • Unified template may not be optimal in all cases • Particularly, less effective for non-pipelined architectures with more complicated control

  49. Motivation (designer’s view) • Modularity for system-on-chip design • Plug-and-play interconnectivity • Average-case performance • No worst-case delay synchronization • Many interfaces are asynchronous • Buses, networks, ...

  50. Motivation (technology aspects) • Low power • Automatic clock gating • Electromagnetic compatibility • No peak currents around clock edges • Security • No ‘electro-magnetic difference’ between logical ‘0’ and ‘1’ in dual-rail code • Robustness • High immunity to technology and environment variations (temperature, power supply, ...)
