
Global Delay Optimization using Structural Choices



Presentation Transcript


  1. Global Delay Optimization using Structural Choices
     Alan Mishchenko, Robert Brayton (UC Berkeley)
     Stephen Jang (Xilinx Inc.)

  2. Overview
     • Motivation
     • Timing criticality
     • Restructuring for delay
     • Algorithm
     • Experimental results
     • Conclusions
     • Future work

  3. Motivation
     • AIG is an And-Inverter Graph
     • AIG-based combinational logic synthesis is fast and effective
     • AIG-based synthesis is area-oriented (except balancing)
     • Needed: Delay optimization in AIG-based synthesis
     • AIGs allow for accumulation of structural choices [Lehman et al, TCAD’97; Chatterjee et al, ICCAD’05]
     • Can leverage efficient technology mapper with choices
     • Can lead to fast delay optimization (~10% of mapping time)

  4. Distinctive Features
     • Traditional approach
        • For all timing-critical areas
           • Perform timing analysis
           • Generate alternative structures
           • Evaluate the improvement and decide whether the transformation is accepted
     • Proposed approach
        • Perform timing analysis only once
        • For all timing-critical areas
           • Generate and store structural choices
        • Use technology mapper to pick and choose good structures
     • Characteristics of the proposed approach
        • Fast – because there is no repeated timing analysis
        • Simple – because it leverages AIG package and LUT mapper
        • Effective – because it makes decisions in the global space

  5. Timing Criticality
     • Critical nodes
        • Used by many traditional algorithms
     • Critical edges
        • Used by our algorithm
     • We pre-compute critical edges of critical nodes
        • Reduces computation
     • An edge between critical nodes may not be critical
        • See illustration: edge 13
     [Figure: example network between primary inputs (bottom) and primary outputs (top), with labels 1 to 4]
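The distinction between critical nodes and critical edges can be sketched in a few lines. This is only an illustration under a unit-delay model, not the ABC implementation: the function name `critical_edges`, the dictionary-based netlist format, and the node names `a`, `b`, `n1`, `n2`, `n3` are all hypothetical. A node is treated as critical when its slack is within the timing window, and an edge is treated as critical when the path through that particular edge also stays within the window.

```python
def critical_edges(fanins, outputs, window=0):
    """Return (critical_nodes, critical_edges) under a unit-delay model.

    fanins maps each node to the list of its fanin nodes (empty for
    primary inputs). Keys must be inserted in topological order.
    """
    # Forward pass: arrival time is the longest path from any input.
    arrival = {}
    for n, fi in fanins.items():
        arrival[n] = 0 if not fi else 1 + max(arrival[i] for i in fi)
    t_max = max(arrival[o] for o in outputs)

    # Backward pass: required time is the tightest constraint from fanouts.
    required = {n: t_max for n in fanins}
    for n in reversed(list(fanins)):
        for i in fanins[n]:
            required[i] = min(required[i], required[n] - 1)

    slack = {n: required[n] - arrival[n] for n in fanins}
    nodes = {n for n in fanins if slack[n] <= window}
    # An edge (i, n) is critical only if the path through it is tight.
    edges = {(i, n) for n in nodes for i in fanins[n]
             if i in nodes and required[n] - 1 - arrival[i] <= window}
    return nodes, edges


# Hypothetical example: n1 feeds both n2 and n3, and n2 feeds n3.
fanins = {"a": [], "b": [], "n1": ["a", "b"],
          "n2": ["n1", "b"], "n3": ["n2", "n1"]}
nodes, edges = critical_edges(fanins, outputs=["n3"])
print(sorted(nodes))
print(sorted(edges))
```

In this small example both `n1` and `n3` are critical, yet the direct edge from `n1` to `n3` is not, because the longer route through `n2` determines the arrival time at `n3`.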

  6. Delay-Oriented Restructuring
     • Using traditional MUX-restructuring
        • AKA generalized select transform
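For reference, MUX-restructuring rests on the Shannon expansion. The notation below is an assumption made for illustration (the slides do not give a formula): $f$ is the function of a logic cone, $v$ is a late-arriving (critical) input, and $x$ stands for the remaining inputs.

```latex
f(x, v) \;=\; \bar{v}\, f(x, 0) \;+\; v\, f(x, 1)
```

Both cofactors $f(x,0)$ and $f(x,1)$ are independent of $v$, so they can be computed before $v$ arrives; $v$ then passes only through the final 2:1 multiplexer. With several critical inputs, the expansion is applied once per input, which yields a multiplexer tree over the cofactors.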

  7. Overall Algorithm

     mapped netlist performSpeedup (
         subject graph S,    // S is an And-Inverter Graph
         mapped netlist M,   // M was previously derived by tech-mapping of S
         timing window w,    // w is used to detect the critical paths
         logic depth l,      // l is used to detect a logic cone rooted at a node
         edge count p )      // p limits the number of critical edges of the cone
     {
         perform timing analysis of M with unit-delay or LUT-library model;
         pre-compute critical section of M as nodes n such that 0 ≤ slack(n) ≤ w;
         pre-compute timing-critical edges connecting these nodes;
         for each timing-critical node n {
             find cone C of M that extends l levels down from n;
             pick the set of timing-critical edges V feeding into C;
             if the number of edges in V exceeds p, continue;
             find logic cone C’ in S corresponding to C in M;
             find variables V’ in S corresponding to V in M;
             derive cofactors of the function of C’ w.r.t. variables in V’;
             build multiplexer tree C’’ of the cofactors using variables in V’;
             add structural choice C’ = C’’ to the subject graph S;
         }
         return mapped netlist M’ derived by mapping subject graph S with added choices;
     }
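The cofactoring and multiplexer-tree steps of the loop can be illustrated on small Boolean functions. This is a minimal sketch and not the ABC code: functions are modeled as Python callables over a dictionary of variable values rather than as AIG cones, and the names `cofactors`, `mux_tree`, and the example function `f` are hypothetical. Equivalence is checked by exhaustive enumeration, standing in for a real equivalence checker.

```python
from itertools import product


def cofactors(f, crit):
    """Map each 0/1 assignment of the critical variables to a cofactor of f."""
    table = {}
    for bits in product((0, 1), repeat=len(crit)):
        fixed = dict(zip(crit, bits))
        # The cofactor ignores the critical variables in env and uses `fixed`.
        table[bits] = lambda env, fixed=fixed: f({**env, **fixed})
    return table


def mux_tree(cofs, crit):
    """Select among the cofactors with a tree of 2:1 multiplexers.

    crit[0] controls the multiplexer closest to the output, so it is the
    natural position for the latest-arriving variable.
    """
    def build(prefix, depth):
        if depth == len(crit):
            return cofs[prefix]
        lo = build(prefix + (0,), depth + 1)
        hi = build(prefix + (1,), depth + 1)
        var = crit[depth]
        return lambda env: hi(env) if env[var] else lo(env)
    return build((), 0)


# Hypothetical cone function with inputs a, b, c, d; c and d are critical.
def f(env):
    return ((env["a"] & env["b"]) | env["c"]) & env["d"]


names = ["a", "b", "c", "d"]
crit = ["d", "c"]
g = mux_tree(cofactors(f, crit), crit)

# Exhaustive equivalence check of the original and restructured functions.
equivalent = all(
    f(dict(zip(names, bits))) == g(dict(zip(names, bits)))
    for bits in product((0, 1), repeat=len(names))
)
print(equivalent)
```

In the actual flow the restructured cone is not substituted for the original; it is recorded as a structural choice, and the mapper decides later which version to use.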

  8. Experimental Setup
     • Implemented in ABC as command speedup
     • Used the FPGA technology mapper (command if)
     • Verified the results using the CEC engine (command cec)
     • Experiments targeting 6-LUTs were run on an Intel Xeon 2-CPU 4-core computer with 8 GB RAM
     • Experimentally compared the following scripts (a superscript is a repetition count)
        • Without delay-optimization:
           • (st; dchoice; if -C 16 -F 2)^8
        • With delay-optimization:
           • (st; dchoice; if -C 16 -F 2)^4
           • (speedup; if -C 16 -F 2)^3
           • (st; dchoice; if -C 16 -F 2)^4
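The two scripts can be written out in full with a small helper. This sketch assumes that the number following each parenthesized group is a repetition count; the command strings are copied from the slide, and the helper name `repeat` is hypothetical. How the resulting string is passed to ABC is not shown here.

```python
def repeat(script, times):
    """Concatenate `times` copies of a semicolon-separated command script."""
    return "; ".join([script] * times)


BASE = "st; dchoice; if -C 16 -F 2"
SPEEDUP = "speedup; if -C 16 -F 2"

# Without delay optimization: eight rounds of choice computation and mapping.
baseline = repeat(BASE, 8)

# With delay optimization: three speedup rounds inserted between two
# groups of four baseline rounds.
optimized = "; ".join([repeat(BASE, 4), repeat(SPEEDUP, 3), repeat(BASE, 4)])

print(baseline)
print(optimized)
```

Under this reading, both scripts run the same eight baseline rounds, so the comparison isolates the effect of the three added speedup rounds.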

  9. Examples of LUT Libraries

     Columns: LUT size, LUT area, LUT pin delays

     The unit-delay LUT library
         1   1.0   1.0
         2   1.0   1.0 1.0
         3   1.0   1.0 1.0 1.0
         4   1.0   1.0 1.0 1.0 1.0
         5   1.0   1.0 1.0 1.0 1.0 1.0
         6   1.0   1.0 1.0 1.0 1.0 1.0 1.0

     A variable-pin-delay LUT library
         1   1.0   0.2
         2   1.0   0.2 0.3
         3   1.0   0.2 0.3 0.4
         4   1.0   0.2 0.3 0.4 0.45
         5   1.0   0.2 0.3 0.4 0.45 0.55
         6   1.0   0.2 0.3 0.4 0.45 0.55 0.65

     A variable-pin-delay LUT library with wire-delays
         1   1.0   0.4
         2   1.0   0.4 0.5
         3   1.0   0.4 0.5 0.6
         4   1.0   0.4 0.5 0.6 0.65
         5   1.0   0.4 0.5 0.6 0.65 0.75
         6   1.0   0.4 0.5 0.6 0.65 0.75 0.85
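A library in this three-column layout is easy to parse and to use for a single-LUT delay calculation. The sketch below is illustrative only: the function names `parse_lut_library` and `lut_output_arrival` are hypothetical, and the pin-assignment policy (the latest-arriving input goes to the fastest pin) is an assumption, not necessarily what the mapper does.

```python
def parse_lut_library(text):
    """Parse lines of the form: <size> <area> <pin delay 1> ... <pin delay size>."""
    lib = {}
    for line in text.strip().splitlines():
        fields = line.split()
        if not fields:
            continue
        size, area = int(fields[0]), float(fields[1])
        delays = [float(x) for x in fields[2:]]
        if len(delays) != size:
            raise ValueError(f"LUT size {size} needs {size} pin delays")
        lib[size] = (area, delays)
    return lib


def lut_output_arrival(lib, input_arrivals):
    """Arrival time at the output of a LUT with the given input arrival times.

    Assumption: the latest-arriving input is connected to the fastest pin.
    """
    _area, delays = lib[len(input_arrivals)]
    late_first = sorted(input_arrivals, reverse=True)
    fast_first = sorted(delays)
    return max(a + d for a, d in zip(late_first, fast_first))


# First three rows of the variable-pin-delay library from the slide.
VARIABLE_PIN = """
1 1.0 0.2
2 1.0 0.2 0.3
3 1.0 0.2 0.3 0.4
"""

lib = parse_lut_library(VARIABLE_PIN)
arrival = lut_output_arrival(lib, [2.0, 0.0, 1.0])
print(arrival)
```

With the unit-delay library, the same function reduces to the maximum input arrival time plus 1.0, which is how LUT levels are counted.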

  10. Experimental Results
     • LUT – number of LUTs
     • Lev – number of LUT levels
     • Delay – delay using LUT library
     • Total – total runtime of Baseline
     • Time1 – the runtime of AIG restructuring only
     • Time2 – the total runtime of Speedup
     • Geomean – geometric averages of columns
     • Ratios – ratios of geometric averages
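The Geomean and Ratios rows can be computed as follows. This is a generic sketch: the function names are hypothetical, and the numbers in the usage lines are arbitrary placeholders chosen for easy checking, not values from the experiments.

```python
import math


def geomean(values):
    """Geometric mean of a column of positive numbers."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


def ratio_of_geomeans(optimized, baseline):
    """Ratio of geometric averages; a value below 1.0 indicates improvement."""
    return geomean(optimized) / geomean(baseline)


# Placeholder columns (not experimental data).
baseline_delay = [2.0, 8.0]
optimized_delay = [1.0, 4.0]

print(geomean(baseline_delay))
print(ratio_of_geomeans(optimized_delay, baseline_delay))
```

The geometric mean is the usual choice for such tables because it weights each benchmark's relative change equally, regardless of the benchmark's absolute size.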

  11. Conclusions and Future Work
     • Developed a method that is
        • Fast – because there is no repeated timing analysis
        • Simple – because it leverages AIG package and LUT mapper
        • Effective – because it makes decisions in the global space
     • Future work may include
        • measuring improvements after place-and-route
        • extending the algorithm to work for sequential circuits
        • applying similar optimization for cost functions other than delay (e.g. switching activity minimization)
