Power Optimization Toolbox for Logic Synthesis and Mapping. Stephen Jang Kevin Chung Xilinx Inc. Alan Mishchenko Robert Brayton UC Berkeley. Outline. Introduction Background Contributions SimSwitch : Switching activity estimation PowerMap : Mapping for power reduction
Power Optimization Toolbox for Logic Synthesis and Mapping • Stephen Jang, Kevin Chung (Xilinx Inc.) • Alan Mishchenko, Robert Brayton (UC Berkeley)
Outline • Introduction • Background • Contributions: SimSwitch (switching activity estimation); PowerMap (mapping for power reduction); PowerDC (re-synthesis for power reduction) • Experiments • Conclusions
Introduction • High power dissipation is a rising concern • It was shown that, in FPGAs, 2/3 of dissipation is due to dynamic power [J. Anderson, F. N. Najm, FPGA’02] • Dynamic power is minimized by reducing the total switching activity of the nodes: $P_{dyn} = \frac{1}{2} f V^2 \sum_i C_i S_i$, where $f$ is the clock frequency, $V$ is the supply voltage, $C_i$ is the capacitance switched by signal $i$, and $S_i$ is the probability of signal $i$ making a transition (switching) • This work: uses sequential simulation to estimate switching; controls switching during synthesis and mapping
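A minimal sketch of the dynamic power model, assuming the usual form P = 1/2 · f · V² · Σ Cᵢ·Sᵢ (the formula itself did not survive on the slide, so the 1/2 factor is an assumption). The function name and all numbers are hypothetical and only illustrate that lowering the switching probability of a signal lowers the total.

```python
def dynamic_power(freq_hz, vdd, caps_farads, switch_probs):
    """Return 0.5 * f * V^2 * sum_i(C_i * S_i) in watts (assumed model)."""
    if len(caps_farads) != len(switch_probs):
        raise ValueError("need one switching probability per capacitance")
    # Effective switched capacitance: each C_i weighted by its switching probability S_i.
    switched_cap = sum(c * s for c, s in zip(caps_farads, switch_probs))
    return 0.5 * freq_hz * vdd ** 2 * switched_cap


# Hypothetical values: two signals at 100 MHz and 1.0 V.
caps = [1e-12, 2e-12]
before = dynamic_power(100e6, 1.0, caps, [0.5, 0.25])
# Halving the switching probability of the second signal reduces the total.
after = dynamic_power(100e6, 1.0, caps, [0.5, 0.125])
```

Since f, V, and the capacitances are largely fixed by the design and the device, the switching probabilities are the term that synthesis and mapping can influence.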
Background • Boolean network • And-Inverter Graphs • Technology mapping • LUTs and standard cells • SAT-based re-synthesis • Resubstitution with don’t-cares
AIGs: Unifying Representation • An underlying data structure for various computations: rewriting, resubstitution, simulation, SAT sweeping, induction, etc. are based on the same AIG manager • A unifying representation for the whole synthesis/mapping/resynthesis/verification flow: synthesis, mapping, and verification use the same data structure; multiple structures can be stored and used for mapping • The main functional representation in ABC • A foundation of “contemporary logic synthesis”
AIG Definition and Examples • An AIG is a Boolean network composed of two-input ANDs and inverters • F(a,b,c,d) = ab + d(ac’+bc): 6 nodes, 4 levels • F(a,b,c,d) = ac’(b’d’)’ + bc(a’d’)’ = ac’(b+d) + bc(a+d): 7 nodes, 3 levels • [Figure: the two AIG structures for F]
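The two structures can be checked against each other with a small sketch that uses only two-input ANDs and inverters. It builds ab + d(ac' + bc) and ac'(b+d) + bc(a+d), counts the AND nodes in each, and compares their truth tables. All helper names are hypothetical.

```python
from itertools import product


def make_ops():
    """Return AND and NOT on single bits, plus a counter of AND evaluations."""
    count = [0]

    def and2(x, y):
        count[0] += 1
        return x & y

    def inv(x):
        return x ^ 1

    return and2, inv, count


def f_deep(a, b, c, d, and2, inv):
    # F = ab + d(ac' + bc); each OR is written as NOT(AND(NOT x, NOT y)).
    ab = and2(a, b)
    ac = and2(a, inv(c))
    bc = and2(b, c)
    inner = inv(and2(inv(ac), inv(bc)))        # ac' + bc
    gated = and2(d, inner)                     # d(ac' + bc)
    return inv(and2(inv(ab), inv(gated)))


def f_shallow(a, b, c, d, and2, inv):
    # F = ac'(b'd')' + bc(a'd')'
    ac = and2(a, inv(c))
    left = and2(ac, inv(and2(inv(b), inv(d))))   # ac'(b + d)
    bc = and2(b, c)
    right = and2(bc, inv(and2(inv(a), inv(d))))  # bc(a + d)
    return inv(and2(inv(left), inv(right)))


def count_ands(f):
    and2, inv, count = make_ops()
    f(0, 0, 0, 0, and2, inv)
    return count[0]


def equivalent(f, g):
    and2, inv, _ = make_ops()
    return all(f(*v, and2, inv) == g(*v, and2, inv)
               for v in product((0, 1), repeat=4))
```

Here `count_ands` counts one AND per call, which matches the node count only because neither structure reuses a sub-expression.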
Three Tricks That Make AIGs Tick • Structural hashing: makes sure the AIG is always stored in a compact form; is applied during AIG construction; propagates constants; ensures each node is structurally unique • Complemented edges: represent inverters as attributes on the edges; lead to fast, uniform manipulation; do not use memory for inverters; lead to efficient structural hashing • Memory allocation: uses a fixed amount of memory for each node; can be done by a simple custom memory manager (even dynamic fanout manipulation is supported); allocates memory for nodes in a topological order, optimized for traversal in the same topological order; small static memory footprint for many applications • [Figure: the same AIG without hashing and with hashing]
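A minimal sketch of structural hashing with complemented edges, not ABC's actual implementation. It assumes the common literal encoding in which literal = 2 · node id + complement bit, with literals 0 and 1 standing for constant false and true. The class and method names are hypothetical.

```python
class Aig:
    """Tiny AIG manager: literal = 2 * node_id + complement_bit."""

    def __init__(self):
        self.nodes = [None]   # node 0 is the constant node
        self.table = {}       # (lit0, lit1) -> literal of an existing AND node

    def new_input(self):
        self.nodes.append(None)
        return 2 * (len(self.nodes) - 1)

    @staticmethod
    def neg(lit):
        # An inverter is only a flipped bit on the edge; no node is created.
        return lit ^ 1

    def and_(self, x, y):
        if x > y:
            x, y = y, x       # canonical fanin order, so AND(a,b) == AND(b,a)
        # Constant propagation and trivial cases.
        if x == 0:
            return 0          # false AND y
        if x == 1:
            return y          # true AND y
        if x == y:
            return x          # a AND a
        if x == (y ^ 1):
            return 0          # a AND NOT a
        # Structural hashing: reuse an existing node with the same fanins.
        lit = self.table.get((x, y))
        if lit is None:
            self.nodes.append((x, y))
            lit = 2 * (len(self.nodes) - 1)
            self.table[(x, y)] = lit
        return lit

    def num_ands(self):
        return len(self.table)
```

Because inverters live on the edges, a node and its complement share one hash-table entry, which is part of why the hashing stays cheap.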
SimSwitch • Fast sequential logic simulator • Useful for switching activity estimation • Improvements in simulation: compact logic representation (only 12 bytes per AIG node); recycling of simulation memory (simulation memory is allocated only for nodes on the frontier); bit-parallel simulation of two time frames (when comparing simulation info in two consecutive time frames, this avoids storing the simulation info from the previous frame)
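The idea of estimating switching by bit-parallel sequential simulation can be sketched as follows. This is not the SimSwitch code: for clarity it keeps the whole previous frame in memory, which is exactly what the slide's two-frame trick avoids. The netlist encoding (literal = 2 · node id + complement bit, latches initialized to 0) and all function names are assumptions.

```python
import random

WORD = 64                      # 64 independent simulation patterns per word
MASK = (1 << WORD) - 1


def lit_value(values, lit):
    """Value of a literal: node word, complemented if the low bit is set."""
    word = values[lit >> 1]
    return (~word & MASK) if lit & 1 else word


def random_flips(rng, rate):
    """Word in which each bit is 1 with probability `rate`."""
    return sum(1 << b for b in range(WORD) if rng.random() < rate)


def estimate_switching(num_pis, latches, ands, frames, toggle_rate=0.5, seed=0):
    """Estimate per-node switching probability over `frames` transitions.

    Node ids: 0 = constant 0, then primary inputs, then latch outputs,
    then AND nodes in topological order. `latches` holds next-state
    literals; `ands` holds (literal, literal) fanin pairs.
    """
    rng = random.Random(seed)
    pi_words = [rng.getrandbits(WORD) for _ in range(num_pis)]
    state = [0] * len(latches)
    num_nodes = 1 + num_pis + len(latches) + len(ands)
    toggles = [0] * num_nodes
    prev = None
    for _ in range(frames + 1):
        values = [0] + pi_words + state
        for lit0, lit1 in ands:
            values.append(lit_value(values, lit0) & lit_value(values, lit1))
        if prev is not None:
            # A bit toggles when it differs between consecutive frames.
            for i in range(num_nodes):
                toggles[i] += bin(values[i] ^ prev[i]).count("1")
        prev = values
        state = [lit_value(values, nxt) for nxt in latches]
        pi_words = [w ^ random_flips(rng, toggle_rate) for w in pi_words]
    return [t / (WORD * frames) for t in toggles]
```

For example, a single latch whose next state is its own complement (`latches=[3]`, no inputs) toggles in every frame, so its estimated switching probability is 1.0.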
Simulation Runtime Evaluation • Run on an Intel Xeon 2-CPU 4-core computer with 8 GB RAM • Less than 100 MB of memory was used in these experiments
Review of Cut-Based Mapping • Input: And-Inverter Graph • Compute K-feasible cuts for each node • Compute the best arrival time at each node, in topological order (from PI to PO): compute the depth of all cuts and choose the best one • Iterate area recovery, using area flow and exact local area • Choose the best cover, in reverse topological order (from PO to PI) • Output: mapped netlist • Reference: S. Chatterjee et al., “Reducing structural bias in technology mapping”, Proc. ICCAD’05.
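The first two steps of the flow can be sketched as below. This is a simplified illustration, not the mapper's code: it enumerates all K-feasible cuts exhaustively (a practical mapper keeps only a bounded set of priority cuts), ignores complemented edges, and uses unit delay per LUT. Function names and the netlist encoding are hypothetical.

```python
def enumerate_cuts(num_pis, ands, k):
    """K-feasible cuts per node.

    Primary inputs are node ids 0..num_pis-1; `ands` lists (fanin0, fanin1)
    node ids in topological order. Each cut is a frozenset of leaf ids.
    """
    cuts = {i: {frozenset([i])} for i in range(num_pis)}
    for offset, (f0, f1) in enumerate(ands):
        node = num_pis + offset
        # Merge every pair of fanin cuts and keep those with at most k leaves.
        merged = {c0 | c1 for c0 in cuts[f0] for c1 in cuts[f1]
                  if len(c0 | c1) <= k}
        merged.add(frozenset([node]))          # the trivial cut
        cuts[node] = merged
    return cuts


def best_arrival(num_pis, ands, cuts):
    """Pick, in topological order, the cut with the smallest depth per node."""
    arrival = {i: 0 for i in range(num_pis)}
    best = {}
    for offset in range(len(ands)):
        node = num_pis + offset
        candidates = [c for c in cuts[node] if c != frozenset([node])]
        best[node] = min(candidates,
                         key=lambda c: (max(arrival[l] for l in c), len(c)))
        arrival[node] = 1 + max(arrival[l] for l in best[node])
    return arrival, best


# Hypothetical 4-input AND tree: node 4 = (0,1), node 5 = (2,3), node 6 = (4,5).
tree = [(0, 1), (2, 3), (4, 5)]
cuts4 = enumerate_cuts(4, tree, k=4)
arrival4, _ = best_arrival(4, tree, cuts4)
cuts2 = enumerate_cuts(4, tree, k=2)
arrival2, _ = best_arrival(4, tree, cuts2)
```

With K = 4 the root can be covered by one LUT over the four inputs (depth 1), while with K = 2 it needs two levels. Area recovery and cover selection would then revisit the chosen cuts without increasing the depth.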
Cost Functions • Area flow (J. Cong, FPGA’99; S. Chatterjee, ICCAD’05) • Wire flow (S. Jang, FPGA’08) • Switching flow (this work)
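The formulas for these cost functions are not preserved on the slide. As a hedged sketch, the code below uses the commonly cited definition of area flow, where a node's flow is its local cost plus the flows of its cut leaves, divided by its fanout count. Reusing the same recurrence with switching activity as the local cost is an assumption made here by analogy, not the definition from this work. All names and numbers are hypothetical.

```python
def compute_flow(order, best_cut, local_cost, num_fanouts):
    """Flow per node: (local cost + sum of leaf flows) / number of fanouts.

    `order` is a topological order; nodes missing from `best_cut` are
    treated as primary inputs with zero flow (an assumption of this sketch).
    """
    flow = {}
    for node in order:
        leaves = best_cut.get(node)
        if leaves is None:
            flow[node] = 0.0
            continue
        total = local_cost(node) + sum(flow[leaf] for leaf in leaves)
        # Dividing by the fanout count shares the cost among all users.
        flow[node] = total / max(1, num_fanouts[node])
    return flow


# Hypothetical mapped fragment: x = LUT(a, b) with two fanouts, y = LUT(x, c).
order = ["a", "b", "c", "x", "y"]
best_cut = {"x": ["a", "b"], "y": ["x", "c"]}
num_fanouts = {"x": 2, "y": 1}

# Area flow: every LUT costs one unit of area.
area_flow = compute_flow(order, best_cut, lambda n: 1.0, num_fanouts)

# Assumed switching-flow analogue: local cost is the node's switching activity.
switching = {"x": 0.2, "y": 0.1}
switch_flow = compute_flow(order, best_cut, lambda n: switching[n], num_fanouts)
```

A mapper that prefers cuts with a smaller switching flow would tend to hide frequently toggling signals inside LUTs rather than expose them as wires.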
SAT-based Re-synthesis Framework • SAT-based re-synthesis (FPGA’09) has these features: substantial optimization power, due to the use of internal don’t-cares; scalable local computation, due to the use of windowing; practical computation speed, due to the use of Boolean satisfiability for functional manipulation; ability to use various optimization objectives, due to the flexible conceptual framework
Experimental Setup • Considered 20 industrial designs (12K to 165K 6-LUTs) • Used an Intel Xeon 2-CPU 4-core computer with 8 GB RAM • Verified the results using the command “cec” in ABC • Experimental runs performed (a superscript 2 means the script is applied twice): Baseline, comb synthesis with choices: (dch; if -e)² (WireMap [FPGA’08] is disabled); FullOpt, complete flow including high-effort seq and synthesis: (scl; lcorr; scorr) + (dch; if)² (WireMap is enabled); PowerMap, power-aware LUT mapping: FullOpt + (dch; if -p)²; PowerDC, power-aware resynthesis: PowerMap + (mfs -p)²
Power Reduction due to Power-Aware Optimization • Table 1: input toggle rate is 0.25 • Table 2: input toggle rate is 0.50 • The results are geometric averages over 20 industrial designs
Changes in Wire Ratios due to Power-Aware Optimization • Wire group codes: T5: “hot wires” (p > 0.4) … T1: “cold wires” (p < 0.1), where p is the probability of switching (note that p can be more than 0.5)
Power Dissipation per Wire Group With / Without Power-Aware Optimization • Wire (Wire2) are wires before (after) synthesis • Pwr (Pwr2) are power dissipations before (after) synthesis
Conclusions • Presented several contributions: SimSwitch, estimation of switching activity; PowerMap, an extension of the priority-cut LUT mapper [ICCAD’07] that prioritizes cuts based on the switching activity of the nodes; PowerDC, an extension of SAT-based resynthesis [FPGA’09] that removes signals with high switching • Demonstrated reductions in switching activity (without degradation of area and delay): 27% reduction due to seq synthesis [ICCAD’08] and WireMap [FPGA’08] against a plain-vanilla flow; an additional 19% reduction due to PowerMap and PowerDC described in this paper
Future Work • Speeding up switching activity estimation • Current implementation can be made faster • More accurate power estimation • Estimating glitching in addition to switching • Making other transforms power-aware • Computing power-aware choices • Specialized logic structuring (power gating) • Sequential techniques for power reduction • Clock-gating that uses induction to compute signals that are valid clock gates on the reachable states