This paper presents a comprehensive approach to enhancing observability and operand isolation in high-throughput asynchronous pipelines. It addresses critical challenges in designing efficient systems, such as power optimization and integration with synchronous CAD tools. By utilizing three-valued logic and conditional communication techniques, the proposed methods aim to improve performance while minimizing area overhead. Empirical results suggest significant power reductions through targeted operand isolation without incurring performance penalties. The findings have implications for various applications, including Ethernet switches and high-speed FPGAs.
Observability Conditions and Automatic Operand-Isolation in High-Throughput Asynchronous Pipelines
Arash Saifhashemi, Peter A. Beerel
University of Southern California, USC Asynchronous CAD/VLSI Group (async.usc.edu)
(Thanks to a grant from Intel and NSF)
Patmos 2012, Sep 2012, Newcastle upon Tyne
Asynchronous Circuit Design - Today
Applications
• 3D networks-on-chip (STMicroelectronics)
• Ethernet switches (Intel SRD)
• Ultra high-speed FPGAs (Achronix)
• Process variation
• Low-power chip design (Encryption – Tiempo, …)
Basic challenges: Automation
Proteus design flow (USC)
• Uses commercial synchronous CAD tools
• Starting at a high-level specification written in SVC (SystemVerilogCSP)
[Figures: STMicroelectronics WIOMING 3D-IC (July 2012); Achronix FPGA, 1.7 M LUTs, 2.1 Gbps IO; Tiempo TAM16, clockless 16-bit microcontroller; Fulcrum Microsystems Ethernet switch chip (up to 72 10G ports, 40G), 1.2 B transistors, 90% asynchronous, 13% Proteus]
The Proteus Flow
Key Features
• Re-uses synchronous EDA tools
• Seamless integration into existing flows
• Up to 2X higher performance
Tool Status
• Started at USC Async CAD/VLSI
• Commercialized by TimeLess (2008)
• Acquired by Fulcrum (2010)
• Intel acquired Fulcrum (2011)
• Used in Intel Ethernet Alta FM6000 chip
The Problem
• Limited and manual power optimization
[Figure: Proteus flow diagram, with blocks labeled SystemVerilog, Design Goals, SVC2RTL, RTL Synthesis, Proteus/Sync Library, Image Netlist, Clock Gating, ClockFree, Async Netlist, Constraints, Clock Tree Synthesis, Physical Design, Final Layout]
Conditional Communication in Proteus
[Figure: channel diagrams with values 0 and 1; labels: Dummy value, Not sent, Not received]
Example: ALU SVC Description No conditionality in high-level description
Reconverging fanouts + Unnecessary calculation
Adding Isolation Cells
• All inputs/outputs are unconditional
• Operand Isolation
• AND-based isolation cells
• Generated by synchronous RTL synthesizer
• Does not prevent switching in asynchronous circuits
Isolation cells are not effective in asynchronous circuits
Three-valued logic
• Formal justification of conditioning
• Three-valued logic image model
• Each iteration is modeled by a clock cycle
• Each variable can be 0, 1, or N (no token)
[Figure: status of each channel during one iteration]
3VL Unconditional Functions
Unconditional functions
• Can be represented using only the AND, OR, and NOT operators
• Example: functions represented by combinational gates in a typical cell library: NAND, NOR, AOI, XOR, …
• Lemma 1: the output is N iff at least one of the inputs is N.
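Lemma 1 can be checked mechanically on small gates. The sketch below is illustrative only: the marker `N`, the helper `lift`, and the checker `satisfies_lemma1` are hypothetical names, and the lifting rule (any missing token suppresses the output token) is an assumption taken directly from the lemma's wording.

```python
from itertools import product

N = "N"  # hypothetical marker for "no token" on a channel


def lift(boolean_fn):
    """Lift a Boolean gate to three-valued logic as an unconditional function."""
    def lifted(*inputs):
        # An unconditional node needs a token on every input before it fires.
        if any(v == N for v in inputs):
            return N
        return boolean_fn(*inputs)
    return lifted


nand3 = lift(lambda a, b: 1 - (a & b))
xor3 = lift(lambda a, b: a ^ b)


def satisfies_lemma1(fn, arity):
    """True if the output is N exactly when at least one input is N."""
    return all(
        (fn(*vals) == N) == (N in vals)
        for vals in product((0, 1, N), repeat=arity)
    )
```

Calling `satisfies_lemma1(nand3, 2)` enumerates all nine input combinations of the lifted NAND gate.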
SEND/RECEIVE Operators
• Conditional Communication
• RECEIVE and SEND are modeled as Ⓡ and Ⓢ operators
• Behave like buffers when E=1
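A minimal sketch of the two operators in the three-valued model follows. Only the buffer behaviour for an enable of 1 comes from the slide. The behaviour for an enable of 0 is an assumption read off the "Not sent", "Not received", and "Dummy value" labels of the conditional-communication slide, and the names `send`, `receive`, `N`, and `DUMMY` are hypothetical.

```python
N = "N"    # hypothetical marker for "no token"
DUMMY = 0  # assumed dummy value produced by a disabled RECEIVE


def send(x, e):
    """Conditional SEND: forwards x when e == 1, emits no token otherwise."""
    if e == N or x == N:
        return N
    return x if e == 1 else N


def receive(x, e):
    """Conditional RECEIVE: forwards x when e == 1; when e == 0 it does not
    wait for an input token and produces the assumed dummy value instead."""
    if e == N:
        return N
    return x if e == 1 else DUMMY
```

Under these assumptions, `send(1, 0)` yields `N` (the token is not sent) and `receive(N, 0)` yields the dummy value even though no token arrived.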
SEND Reconditioning
Assuming y=f(x) is unconditional and e ∉ TFO(y) (the transitive fanout of y)
• Lemma 2: [equation not recovered]
• Application: SEND cells can be moved through logic
• Similar to retiming in synchronous circuits
[Figure labels: Fewer SENDs; Less switching when e=0]
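The idea that a SEND can be moved through unconditional logic can be illustrated by exhaustive enumeration. This is a sketch, not the statement of Lemma 2: it assumes a SEND emits no token when its enable is 0, and the names `send`, `lift`, and `send_commutes` are hypothetical.

```python
from itertools import product

N = "N"  # hypothetical marker for "no token"


def send(x, e):
    """Assumed SEND semantics: forward x when e == 1, otherwise no token."""
    if e == N or x == N:
        return N
    return x if e == 1 else N


def lift(boolean_fn):
    """Unconditional 3VL function: the output is N iff some input is N."""
    def lifted(*inputs):
        if any(v == N for v in inputs):
            return N
        return boolean_fn(*inputs)
    return lifted


def send_commutes(fn, arity):
    """Compare a SEND on the output of fn with SENDs on each of its inputs."""
    for *xs, e in product((0, 1, N), repeat=arity + 1):
        at_output = send(fn(*xs), e)
        at_inputs = fn(*(send(x, e) for x in xs))
        if at_output != at_inputs:
            return False
    return True
```

In this model the two placements are functionally equivalent for any lifted gate. A single SEND at the output uses fewer cells, while SENDs at the inputs keep the gate itself from receiving tokens when e=0.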
Observability in 3V Networks
Local Observability Partial Care (LOPC)
• OPC(f,C,xj) of input xj of a node representing a function f is the condition under which f’s output is not affected as xj changes in C ⊆ {0,1,N}
Global Observability Partial Care (GOPC)
• GOPC(C,x) of a variable x is the condition under which the value of no primary output is affected as the value of x changes in C ⊆ {0,1,N}
• Example: s=1; i1 changes in {0,1} are not observable when… i2=0 or i2=1
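For a single node, the local condition can be computed by brute force. The sketch below is an illustration under stated assumptions: the node is an unconditional function lifted to three-valued logic, and the names `lift` and `unobservable_conditions` are hypothetical. It returns the assignments of the other inputs under which the output stays the same as input j ranges over C.

```python
from itertools import product

N = "N"  # hypothetical marker for "no token"


def lift(boolean_fn):
    """Unconditional 3VL function: the output is N iff some input is N."""
    def lifted(*inputs):
        if any(v == N for v in inputs):
            return N
        return boolean_fn(*inputs)
    return lifted


def unobservable_conditions(fn, arity, j, C, domain=(0, 1, N)):
    """Assignments of the other inputs for which input j, varying within C,
    never changes the output of fn."""
    result = []
    for others in product(domain, repeat=arity - 1):
        outputs = set()
        for v in C:
            args = list(others)
            args.insert(j, v)  # place the varying input at position j
            outputs.add(fn(*args))
        if len(outputs) == 1:
            result.append(others)
    return result
```

For a lifted two-input AND, changes of the first input within {0,1} are unobservable when the other input is 0 or N. Once N is included in C, only the case where the other input is N remains, because an unconditional gate with a missing token produces no output token at all.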
GOPC Conditioning
When xj is not observable…
• Add a SEND followed by a RECEIVE
• Move the SENDs using SEND reconditioning
SEND Reconditioning
• Lemma 3: [Figure: value table with entries N, 0, 1, and "0 or 1"]
Conditioning
[Figure: conditioned example circuit; labels: &, +, +, 0, 0, No Activity]
Inserting Isolating Nodes and Recognizing Enable Domains
Synchronous synthesis tools can insert isolating nodes
• Constrained to insert isolating nodes only on non-critical paths
Node u is in e’s Enable Domain OIED(e) if
• All paths starting from a primary input and ending at u include an isolating node controlled by e
• Detected using a depth-first search (DFS)
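The enable-domain definition can be sketched as a memoized depth-first search over fanins. This is one possible reading, not the tool's implementation: the netlist is assumed to be an acyclic dict mapping each node to its fanin list (empty for primary inputs), isolating nodes are assumed to be given as a dict from node to controlling enable, and the names `enable_domain`, `fanins`, and `isolating` are hypothetical.

```python
def enable_domain(fanins, isolating, e):
    """Return the nodes u such that every path from a primary input to u
    passes through an isolating node controlled by e."""
    memo = {}

    def covered(u):
        if u in memo:
            return memo[u]
        if isolating.get(u) == e:
            result = True   # every path through u is cut by this node
        elif not fanins.get(u):
            result = False  # reached a primary input without isolation
        else:
            result = all(covered(v) for v in fanins[u])
        memo[u] = result
        return result

    return {u for u in fanins if covered(u)}


# Hypothetical ALU fragment: only the multiplier operands are isolated.
fanins = {
    "a": [], "b": [], "op": [],
    "iso_a": ["a"], "iso_b": ["b"],
    "mul": ["iso_a", "iso_b"],
    "add": ["a", "b"],
    "mux": ["mul", "add", "op"],
}
isolating = {"iso_a": "e_mul", "iso_b": "e_mul"}
domain = enable_domain(fanins, isolating, "e_mul")
```

In this example the multiplier and its two isolating nodes form the domain. The output mux is excluded because the adder reaches it without crossing an isolating node.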
Pre-layout Analysis
• Wu: power of receiving data on all inputs and sending the output (unconditional nodes)
• K: power of conditional nodes
• rf: activity factor
[Equations not recovered: power of each domain; total power; domain power after isolation (n inputs); benefit of isolating each domain]
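The slide's equations are not available in this text, so the following is only one plausible first-order model built from the three listed quantities, not the authors' formulas. It assumes an un-isolated domain spends the sum of its Wu values every iteration, an isolated domain spends that sum only in the fraction rf of iterations in which it is enabled, and each of the n isolated inputs adds a conditional node costing K per iteration. All function names and numbers are hypothetical.

```python
def domain_power(unit_powers):
    """Assumed power of an un-isolated domain: every node fires each iteration."""
    return sum(unit_powers)


def isolated_domain_power(unit_powers, k, n_inputs, r):
    """Assumed power after isolation: nodes fire with activity factor r,
    plus one conditional node of power k on each of the n isolated inputs."""
    return r * sum(unit_powers) + n_inputs * k


def isolation_benefit(unit_powers, k, n_inputs, r):
    """Power saved by isolating the domain; negative means isolation hurts."""
    return domain_power(unit_powers) - isolated_domain_power(
        unit_powers, k, n_inputs, r
    )


def break_even_activity(unit_powers, k, n_inputs):
    """Activity factor above which isolation stops paying off in this model."""
    return 1 - n_inputs * k / sum(unit_powers)


# Made-up numbers in arbitrary units, for illustration only.
benefit = isolation_benefit([10.0, 20.0, 30.0], k=5.0, n_inputs=2, r=0.25)
```

A model of this shape has a break-even activity factor: small domains with frequently used results lose more to the conditional nodes than they save, while large, rarely used domains benefit.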
Post-layout Experimental Results
• Case study: 32-bit ALU placed and routed
• Back-annotated switching activity using a VCD file
• Results:
• Isolating ADD and SUB is detrimental for rADD and rSUB > 0.2
• 53% power reduction when only isolating MUL (rf=0.25)
• Area cost of isolating MUL is about 4%, with no performance penalty
Conclusions and Future Work
Conditional communication in async. circuits is not free
• Creates area and performance overheads
• Requires manual or automatic optimization
Asynchronous circuits can/should leverage sync. tools
• This paper is the first to use three-valued logic and observability don’t cares for power optimization of asynchronous circuits
Our future work
• Evaluate the proposed method on bigger designs
• Adopt other sync. power optimization techniques such as clock gating
• Optimize the location of SEND/RECEIVE nodes (Reconditioning)