Huan Ren and Shantanu Dutt Dept. of Electrical and Computer Engineering

Algorithms for Simultaneous Consideration of Multiple Physical Synthesis Transforms for Timing Closure
Huan Ren and Shantanu Dutt
Dept. of Electrical and Computer Engineering
University of Illinois at Chicago

Outline
• Problem formulation & prior work
• Network flow model
• Methodology Flow
• Discretization Requirements
• Structures for Accurate Objective Function Cost
• Simultaneous Detailed Placement: A Holistic Approach!
• Experimental Results
• Conclusions

Problem Statement
• Simultaneously apply a given set T of synthesis and replacement transforms to cells and nets on the critical paths of an initially placed circuit, so as to improve circuit delay near-optimally while satisfying area constraints.
• For the current experiments, T = {cell resizing, replication, replacement, type-1 & type-2 buffer insertion}.
• Critical paths (CP) = paths with delay > (1-α) × circuit delay. We choose α = 0.1.
• Timing objective function [Dutt et al., ICCAD'06], with the following quantities:
• CS(ni): critical sinks of net ni in CP
• D(uj, ni): delay of ni at sink uj
• Sa(ni): allocated slack of ni, i.e., the path slack of the most critical path through the net divided by the number of nets on that path
• b: an exponent that magnifies the timing function of critical nets, so that minimizing the objective approximates minimizing the maximum net timing function, i.e., minimizing the delays in CP
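The CP selection rule and the allocated slack Sa(ni) above can be sketched as follows. This is a minimal illustration, not the authors' code: the `Path` record, the path data, and in particular `required_time` (the slides do not say how path slack is computed) are hypothetical assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Path:
    name: str
    delay: float                                # path delay
    nets: list = field(default_factory=list)    # nets on the path


def critical_paths(paths, alpha=0.1):
    """CP = paths with delay > (1 - alpha) x circuit delay."""
    circuit_delay = max(p.delay for p in paths)
    return [p for p in paths if p.delay > (1 - alpha) * circuit_delay]


def allocated_slack(paths, required_time):
    """Sa(n) = slack of the most critical path through net n divided by
    the number of nets on that path. `required_time` is an ASSUMED timing
    target used to turn path delay into path slack."""
    sa = {}
    # Visit paths from most to least critical, so the first path that
    # reaches a net is the most critical path through it.
    for p in sorted(paths, key=lambda p: p.delay, reverse=True):
        share = (required_time - p.delay) / len(p.nets)
        for n in p.nets:
            sa.setdefault(n, share)
    return sa


# HYPOTHETICAL paths; with alpha = 0.1 only A and B exceed 0.9 * 10.0.
paths = [Path("A", 10.0, ["n1", "n2"]),
         Path("B", 9.5, ["n2", "n3", "n4"]),
         Path("C", 7.0, ["n5"])]
cp = critical_paths(paths)
sa = allocated_slack(cp, required_time=12.0)
```

Net n2 lies on both critical paths, so it receives the share from the more critical path A.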

Post-placement Incremental Physical Synthesis
• Why necessary? Wire-load estimation is very inaccurate prior to placement, which leaves large room for improvement after placement.
• Various transforms:
• Cell sizing: effective for improving timing. Continuous sizing [Fishburn et al., ICCAD'85] and discrete sizing [Hu et al., DAC'07], [Ren et al., IWLS'08]. Options: the different cell sizes available in the library (s options for s sizes).
• Incremental global placement: re-place a subset of cells, targeting the metric of interest for design closure [Dutt et al., ICCAD'06], [Wonjoon et al., ICCAD'03]. Transform options: remain at the position in the initial placement, or move to the new position determined by an incremental global placement process.

Various Transforms (continued)
• Buffer insertion: usually associated with routing tree generation, but can be estimated after placement using two different types of buffers [Jiang et al., TVLSI'98].
• Transform options for each buffer type: do not insert any buffer, or insert a buffer of one of the sizes available in the library (s options for s sizes).
[Figure: driving buffer (type 1) and isolating buffer (type 2) inserted between driver D and sinks S; the isolating buffer drives the non-critical sinks, separating them from the critical sink]

Various Transforms (continued)
• Cell replication: can both improve drive capability and isolate sinks. The sinks must be partitioned between the two drivers [Srivastava et al., TVLSI'04], [Lillis et al., ISCAS'96].
• Transform options: do not replicate a cell, or replicate a driver cell using one of several possible partitions of its sink cells between the two replicas (k options for k partitions).
[Figure: driver D with sinks S; after replication, D and its replica D' each drive a subset of the sinks]
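The replication options can be illustrated by enumerating the two-way sink partitions between D and D'. This is a sketch only: the slide obtains its k partitions from the cited algorithms, so the "first k partitions" rule in `replication_options` is a hypothetical placeholder for that selection.

```python
from itertools import combinations


def sink_bipartitions(sinks):
    """All ways to split a driver's sinks between the original driver D
    and its replica D', with each replica driving at least one sink.
    Mirror images ({A | B} vs. {B | A}) are counted once."""
    sinks = list(sinks)
    first, rest = sinks[0], sinks[1:]
    parts = []
    # Pin the first sink to D; giving D only r < len(rest) of the
    # remaining sinks keeps at least one sink for D'.
    for r in range(len(rest)):
        for extra in combinations(rest, r):
            d = (first, *extra)
            d_prime = tuple(s for s in rest if s not in extra)
            parts.append((d, d_prime))
    return parts


def replication_options(sinks, k):
    """Transform options: 'no replication' (None) plus k sink partitions.
    Taking the first k partitions is a HYPOTHETICAL placeholder for a
    real partitioning algorithm."""
    return [None] + sink_bipartitions(sinks)[:k]
```

For n sinks there are 2^(n-1) - 1 such partitions (7 for four sinks), so keeping only k of them, plus the no-replication option, bounds the number of option nodes per driver.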

Combining Multiple Synthesis Transforms: Past Work
• Usually timing-driven
• Most methods simply apply the transforms sequentially; the transforms are not unified.
• [Donath et al., DATE'00]: incorporates different synthesis transforms at different partition levels of a partition-based placement.
[Figure: cell resizing, TD placement adjustment, and replication and buffering applied at coarse and detailed partition levels]
• [Jiang et al., TVLSI'98]: considers both cell resizing and buffer insertion. Dynamic sequencing, but greedy: each time, it chooses the transform with the largest ratio of delay improvement to area increase for a net/cell.
• Can be trapped in local optima.
• Hard to extend to other transforms (e.g., incremental placement, which causes no area increase).
• Our method: simultaneous and unified transforms
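A minimal sketch of greedy ratio-based sequencing, written in the spirit of (not as a reproduction of) [Jiang et al., TVLSI'98], shows both weaknesses listed above. The transforms A, B, C and their (ΔT, ΔA) values are hypothetical, and their improvements are assumed additive and independent, which real transforms are not.

```python
from itertools import combinations


def greedy_by_ratio(cands, area_budget):
    """Repeatedly apply the transform with the largest delay-improvement /
    area-increase ratio that still fits the area budget.
    cands: name -> (delta_T, delta_A), improvements ASSUMED additive."""
    chosen, used, left = [], 0.0, dict(cands)
    while True:
        fits = {n: v for n, v in left.items() if used + v[1] <= area_budget}
        if not fits:
            return chosen

        # A zero-area transform (e.g., re-placement) has an undefined
        # ratio; treating it as infinite is one ad hoc workaround.
        def ratio(n):
            dt, da = fits[n]
            return dt / da if da > 0 else float("inf")

        best = max(fits, key=ratio)
        chosen.append(best)
        used += fits[best][1]
        del left[best]


def exhaustive(cands, area_budget):
    """Best subset by total delay improvement (reference only)."""
    best = ((), 0.0)
    for r in range(len(cands) + 1):
        for sub in combinations(cands, r):
            area = sum(cands[n][1] for n in sub)
            gain = sum(cands[n][0] for n in sub)
            if area <= area_budget and gain > best[1]:
                best = (sub, gain)
    return best


cands = {"A": (7.0, 6.0), "B": (5.0, 5.0), "C": (5.0, 5.0)}
```

With an area budget of 10, the greedy rule takes A (ratio 7/6) and then nothing else fits, for a total ΔT of 7, whereas applying B and C together yields 10: a local optimum of the kind a simultaneous formulation avoids.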

Network Flow Model
• An example: a simple transform selection graph (TSG) for one net ni (driver u, sinks v and w)
• Nodes: transform options for each net (and its cells)
• Arcs: those in complete bipartite graphs between the transform option sets of a net, so that all option combinations are available as flow paths
• Flow has a binary meaning: flow through a node means the option for that node is selected
• Flow also has a quantitative meaning: in constraint satisfaction problems, flow amount = constraint metric value = (in our case) the sizes of the selected options
• Flow cost equals the timing objective function value under the selected options, so the timing-optimal transform options = the min-cost flow
[Figure: TSG for net ni: option nodes 1 and 2 of Ores(u) and of Ob1 joined by a complete bipartite graph; the cost of the path through options 1 (res) and 2 (b1) is the TD function value for that choice of options]
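For one net whose transform option sets are chained by complete bipartite graphs, a unit of flow is a source-to-sink path, so the min-cost flow reduces to a cheapest path. The dynamic program below is a sketch under that single-net assumption; the TD-function values in `TD` are hypothetical numbers, not results from the slides.

```python
from itertools import product


def min_cost_tsg_path(layers, arc_cost):
    """Unit min-cost flow through a layered TSG = cheapest path.
    layers: list of option lists, one per transform, in TSG order.
    arc_cost(i, a, b): cost of the arc from option a (layer i) to option b
    (layer i + 1) in their complete bipartite graph."""
    # best[o] = (cost of the cheapest partial path ending at o, its options)
    best = {o: (0.0, (o,)) for o in layers[0]}
    for i in range(len(layers) - 1):
        best = {
            b: min(((c + arc_cost(i, a, b), path + (b,))
                    for a, (c, path) in best.items()), key=lambda t: t[0])
            for b in layers[i + 1]
        }
    return min(best.values(), key=lambda t: t[0])


# HYPOTHETICAL TD-function values for net ni, keyed by
# (resizing option of u, type-1 buffer option; 0 = no buffer).
TD = {(1, 0): 5.0, (1, 1): 4.0, (1, 2): 4.5,
      (2, 0): 4.2, (2, 1): 3.9, (2, 2): 4.4}
layers = [[("res", 1), ("res", 2)], [("b1", 0), ("b1", 1), ("b1", 2)]]
cost, options = min_cost_tsg_path(layers, lambda i, a, b: TD[(a[1], b[1])])
# Brute force over all option combinations, for comparison.
brute = min(TD[(r[1], b[1])] for r, b in product(*layers))
```

Each option combination corresponds to exactly one path, which is why the cheapest path and the brute-force minimum agree.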

Overall Model
• A mini-TSG is constructed for each net in CP (net structures).
• If two nets have common cells, their net structures are connected by a spanning structure.
• Flows indicating the selected cell sizes and positions are sent to the detailed placement graph (DPG) to perform detailed placement.
• The detailed placement "cost" is thus also considered when selecting options, to reach an overall near-optimal solution.
[Figure: nets n1 to n4 and their net structures N1 to N4 between source S and sink T, connected by spanning structures and coupled to the DPG]
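Deciding which net structures need a spanning structure amounts to finding net pairs that share cells. A small sketch, assuming hypothetical connectivity for nets n1 to n4:

```python
from itertools import combinations


def spanning_structures(nets):
    """nets: net name -> set of cells on the net (driver and sinks).
    Returns (net_a, net_b, shared cells) for every pair of nets whose
    net structures share cells and hence need a spanning structure."""
    return [(a, b, sorted(nets[a] & nets[b]))
            for a, b in combinations(sorted(nets), 2)
            if nets[a] & nets[b]]


# HYPOTHETICAL connectivity.
nets = {"n1": {"u", "v", "w"}, "n2": {"v", "x"},
        "n3": {"x", "y"}, "n4": {"z"}}
pairs = spanning_structures(nets)
```

Here n4 shares no cell with any other net, so its net structure stays independent, while n1-n2 (via v) and n2-n3 (via x) are linked.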

Methodology Flow
• Determine the set CP of near-critical paths = {paths with delays >= (1-α) × (critical path delay)}.
• Determine transform options from the transform set T for every net in CP (from the library or using known algorithms, e.g., for replication).
• Construct the transform selection graph (TSG) and couple it with the detailed placement graph (DPG) [Dutt et al., ICCAD'06].
• Determine F- (objective) and C- (discretization) costs for the arcs in the TSG.
• Determine the min-cost flow through TSG + DPG using the "concave-cost" min-cost flow method of [Kim & Pardalos, OR Letters'99].
• Determine the transforms across all cells and nets in CP, and their legalized detailed placement, from the above flow.

Discretization Requirements in the Network Flow Model
• Mutually exclusive arcs (MEAs) for the output-arc and/or input-arc sets of some nodes: at most one arc in an MEA set can have flow through it.
• Hyper-arc flow: hyper-arcs may be needed in some problems to model k-way dependencies (k > 2). For example, they are needed in our physical synthesis problem to accurately reflect the change in the objective metric value caused by flow through the nodes they connect.
• The star graph model of a hyperarc (e.g., a 4-ary hyperarc) has only 2 states: the no-flow state and the all-flow state.
[Figure: valid and invalid flows through MEA sets between the option nodes of Ores(u) and Ob1 (source S, sink T); a 4-ary hyperarc and its star graph model]
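Both discretization requirements are easy to state as checks on a given flow. The sketch below only tests a flow against them; it does not enforce them. Arc names (tail, head) and the hub node of the star are hypothetical.

```python
def mea_satisfied(flow, mea_sets):
    """At most one arc in each MEA set may carry flow.
    flow: arc -> flow amount; mea_sets: iterable of sets of arcs."""
    return all(sum(1 for a in s if flow.get(a, 0) > 0) <= 1
               for s in mea_sets)


def star_consistent(flow, star_arcs):
    """The star graph model of a hyperarc has only two states:
    no flow on any of its arcs, or flow on all of them."""
    used = [flow.get(a, 0) > 0 for a in star_arcs]
    return all(used) or not any(used)


# HYPOTHETICAL arcs: an MEA set on the output arcs of source S, and the
# four arcs of a star graph modeling a 4-ary hyperarc.
mea = [{("S", "res1"), ("S", "res2")}]
star = [("hub", "res1"), ("hub", "b1_2"), ("hub", "v"), ("hub", "w")]
```

Splitting flow over both arcs of the MEA set, or using only some arcs of the star, violates the corresponding requirement even though it would be a legal flow in an ordinary network.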

Net Structure and F-cost
• Each flow path is a transform combination: set {paths} = set {transform combos}.
• First attempt: a linear structure.
• Product-term-based arc cost: the order of a product term in the timing objective function is the number of transforms the term is a function of. E.g., for the objective function under a linear delay model, $d(u,v) + d(u,w) = 2cR_dL(n_i) + 2R_dC_v + 2R_dC_w$, the term $R_d(O_{res}(u), O_{b1}) \cdot C_v(O_{res}(v), O_{rep}(v), O_{rep}(u))$ has order 5.
[Figure: linear net structure for net ni (driver u, sinks v and w, replicas u' and v'): option nodes 1 and 2 of Ores(u), Ob1, Orep(u), Ores(v), Orep(v), and Ores(w) between a distribution node and a gathering node, with path costs d(u, v) and d(u, w)]
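The order of a product term is just the size of the union of the transform sets its factors depend on. A minimal sketch using the dependencies named on this slide (the Cw entry is assumed by symmetry with Cv):

```python
# Transforms each delay-model parameter of net ni depends on
# (u = driver, v and w = sinks); Cw is ASSUMED symmetric to Cv.
DEPENDS = {
    "Rd": {"res(u)", "b1"},
    "Cv": {"res(v)", "rep(v)", "rep(u)"},
    "Cw": {"res(w)", "rep(w)", "rep(u)"},
}


def term_order(*params):
    """Order of a product term = number of distinct transforms it depends on."""
    return len(set().union(*(DEPENDS[p] for p in params)))


def linear_structure_can_cost(*params):
    """One bipartite arc set of a linear structure joins the option sets of
    only two transforms, so it can carry a term only if its order is <= 2."""
    return term_order(*params) <= 2
```

A factor such as Rd alone (order 2) fits a single bipartite graph, but the product Rd·Cv (order 5) does not, which motivates the hyperarc structures that follow.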

Linear Structure: Issues in Objective Function Cost
• Drawbacks of the linear structure:
• Cannot handle terms with order > 2
• Cannot handle terms that depend on two "non-adjacent" transforms
[Figure: linear structure from a supply node through Ox, Oy, and Oz (options 1, 2) to a gathering node; T(Ox, Oy) maps onto the Ox-Oy bipartite graph, but T(Ox, Oz) has no bipartite graph, and T(Ox, Oy, Oz) cannot be carried by any single arc]

Hyperarcs: Accurate Objective Function Cost
• Product-term-based arc cost
• Order of a product term in the timing objective function: the number of transforms the term is a function of. Ex.: simple linear delay model, $d(u,v) + d(u,w) = 2cR_dL(n_i) + 2R_dC_v + 2R_dC_w$; the term $R_d(O_{res}(u), O_{b1}) \cdot C_v(O_{res}(v), O_{rep}(v), O_{rep}(u))$ has order 5.
• "Combination" hyperarcs: the flow needs to select exactly one combination hyperarc.
• Assuming 2 options per transform and order n, there are 2^n hyperarcs; with m options per transform, there are m^n hyperarcs.
[Figure: meta-hyperarc H for the above order-5 term of net ni (driver u, sinks v and w), spanning the two option nodes of each of Ores(u), Orep(u), Ob1, Ores(v), and Orep(v), with costs d(u, v) and d(u, w)]
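The m^n count follows from one combination hyperarc per choice of one option from each of the n transforms a term depends on. A short sketch; the `term_value` callback is a hypothetical stand-in for evaluating the term on a combination:

```python
from itertools import product


def combination_hyperarcs(option_sets):
    """One 'combination' hyperarc per choice of one option from each
    transform the term depends on: m**n hyperarcs for n transforms with
    m options each."""
    return list(product(*option_sets))


def cheapest_hyperarc(option_sets, term_value):
    """The flow selects exactly one combination hyperarc; taken alone, the
    cheapest one minimizes this term (term_value is supplied by the caller)."""
    return min(combination_hyperarcs(option_sets), key=term_value)
```

For the order-5 term above with 2 options per transform this gives 2**5 = 32 hyperarcs, and the count grows exponentially with the order, which is why a compact star graph encoding is preferable to listing hyperarcs explicitly.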

Hyperarcs: Star Graph Structure
• Arcs in a network flow graph can only be between two nodes.
• Parallel arcs between the central transform and the parallel transform: each parallel arc, together with the arcs to the regular transform option nodes it represents, corresponds to one hyperarc.
[Figure: meta star graph representing an order-3 cost term T(Ox, Oy, Oz): m parallel arcs T(Oxi, Oyj, Oz1), …, T(Oxi, Oyj, Ozm), one per option of Oz, plus arcs to the regular transform option nodes i of Ox and j of Oy; flow f through consistent option nodes is valid, flow f' through inconsistent ones is invalid]

MEA Satisfaction via Arc C-costs
• Besides the objective-function-based cost (F-cost), an objective-function-independent C-cost is added.
• Total arc cost = F-cost + C-cost (the cost is a step function, incurred once for any flow amount).
• Heuristically or randomly select a valid flow and determine its cost C1.
• Obtain a standard min-cost flow of cost C2 without the discretization constraints.
• Let CΔ = C1 - C2, and set the C-cost of each MEA arc to CΔ + 1.
• Theorem: a min-cost flow with C-costs on the MEA arcs ensures MEA satisfaction.
• Reasoning: compared with the valid flow, the min-cost invalid flow has a C-cost at least CΔ + 1 higher and an F-cost at most CΔ lower (F-cost diff >= -CΔ), so its total cost is at least 1 higher.
[Figure: MEA sets with C-cost CΔ + 1 on each arc; F- and C-costs of the min-cost invalid flow compared with those of a valid flow]
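The C-cost construction can be checked on a toy instance. The sketch below is a brute-force stand-in for the min-cost flow: the candidate flows, their F-costs, and the MEA sets are hypothetical, and every candidate uses at least one arc of each MEA set.

```python
def mea_arcs_used(flow_arcs, mea_sets):
    """Number of arcs of the flow that belong to some MEA set."""
    return sum(1 for a in flow_arcs if any(a in s for s in mea_sets))


def is_valid(flow_arcs, mea_sets):
    """MEA satisfaction: at most one used arc per MEA set."""
    return all(len(set(flow_arcs) & s) <= 1 for s in mea_sets)


def min_cost_with_c_costs(flows, mea_sets, heuristic_valid):
    """flows: candidate flow (tuple of arcs used) -> F-cost.
    C1 = F-cost of a heuristically chosen valid flow, C2 = unconstrained
    minimum, and every MEA arc gets a one-time C-cost of C_delta + 1."""
    c1 = flows[heuristic_valid]
    c2 = min(flows.values())
    c_mea = (c1 - c2) + 1
    total = {f: fc + c_mea * mea_arcs_used(f, mea_sets)
             for f, fc in flows.items()}
    return min(total, key=total.get)


# HYPOTHETICAL example with two MEA sets.
mea_sets = [{"a1", "a2"}, {"b1", "b2"}]
flows = {("a1", "b1"): 10, ("a1", "b2"): 8, ("a2", "b2"): 9,
         ("a1", "a2", "b2"): 5,   # invalid: two arcs of {a1, a2}
         ("a1", "b1", "b2"): 6}   # invalid: two arcs of {b1, b2}
best = min_cost_with_c_costs(flows, mea_sets, heuristic_valid=("a1", "b1"))
```

Here C1 = 10 and C2 = 5, so each MEA arc costs 6. The invalid flows are cheaper in F-cost, but their extra MEA arc outweighs that, and the overall minimum is the valid flow ("a1", "b2").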

Hyperarc-Consistent Flows via Arc C-costs
• Consistent hyperarc flow: the flow through a parallel arc continues only to the regular option nodes consistent with that arc.
• Idea: only a parallel arc together with the arcs to its consistent regular option nodes can have a total capacity equal to the incoming flow amount f; inconsistent sets have total capacity < f or > f.
• How: use prime numbers. For k total regular option nodes (across all regular transforms), select k prime numbers p1 < p2 < … < pk such that 1/p1 + … + 1/pk > (pk - 1)/pk.
• Capacity of a non-parallel arc to regular option node j: f(1/pj)
• Capacity of a parallel arc: f - (capacity of its consistent non-parallel arcs)
• C-cost is proportional to arc capacity: Cunit × cap(e), where Cunit = (CΔ + 1)/Δcapmin and Δcapmin = min{cap of invalid arc sets - f}.
• Theorem: a min-cost flow with C-costs on the star graph arcs ensures hyperarc-consistent flows in star graphs.
[Figure: example with primes 3 and 5: regular arcs of capacity f(1/3) and f(1/5), parallel arcs of capacity f(1 - 1/3) and f(1 - 1/5); the consistent set has total capacity f, inconsistent sets have total capacity < f or > f]
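The prime-based capacity bookkeeping can be verified with exact rational arithmetic. This is a sketch on a hypothetical star graph (regular transforms Ox and Oy with two options each, primes 3, 5, 7, 11). It does not check the slide's selection condition on the primes; it only shows that the total capacity equals f exactly for the consistent choice.

```python
from fractions import Fraction
from itertools import product


def regular_caps(nodes, primes, f):
    """Arc to regular option node j gets capacity f/p_j (distinct primes)."""
    return {n: Fraction(f) / p for n, p in zip(nodes, primes)}


def parallel_cap(consistent, caps, f):
    """Parallel arc capacity = f - caps of its consistent regular arcs."""
    return f - sum(caps[n] for n in consistent)


def total_cap(parallel_of, chosen, caps, f):
    """Total capacity of the parallel arc for option combination
    `parallel_of` plus the regular arcs to the `chosen` option nodes."""
    return parallel_cap(parallel_of, caps, f) + sum(caps[n] for n in chosen)


# HYPOTHETICAL star graph: Ox has options x1, x2; Oy has options y1, y2.
f = 1
caps = regular_caps(["x1", "x2", "y1", "y2"], [3, 5, 7, 11], f)
combos = list(product(["x1", "x2"], ["y1", "y2"]))
```

Because sums of reciprocals of distinct primes differ for different option sets here, pairing a parallel arc with any inconsistent regular option nodes leaves the total capacity strictly above or below f.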

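Discrete Arc Cost
• Total arc cost = F-cost + C-cost, incurred once for any amount of flow, so the arc cost is discrete.
• Standard linear flow cost: Cost(e) grows linearly with the flow f, with slope = cost(e)/cap(e).
• Step-function cost (concave): the full cost c is incurred for any positive flow up to Cap(e).
• Min-cost flow with such costs is a well-studied NP-hard problem [Kim et al., ORL'99]; we use their min-cost algorithm.
[Figure: Cost(e) vs. flow f for the standard linear cost and for the step-function cost]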
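The difference between the two cost models shows up when flow is split across parallel arcs. A minimal sketch with hypothetical arc values (cost 10, capacity 8):

```python
def linear_cost(flow, cost, cap):
    """Standard min-cost-flow arc cost: proportional to flow, slope cost/cap."""
    return flow * cost / cap


def step_cost(flow, cost, cap):
    """Discrete (F-cost + C-cost) arc cost: paid once for any positive flow."""
    if not 0 <= flow <= cap:
        raise ValueError("flow outside [0, cap]")
    return cost if flow > 0 else 0.0


# Sending 4 units over one arc vs. splitting them over two identical
# parallel arcs.
one_arc_lin = linear_cost(4, 10, 8)
split_lin = 2 * linear_cost(2, 10, 8)
one_arc_step = step_cost(4, 10, 8)
split_step = 2 * step_cost(2, 10, 8)
```

Under the linear cost, splitting is free (5.0 either way); under the step cost, it doubles the cost (10 vs. 20). This pull toward concentrating flow on single arcs is what makes concave costs suitable for discrete option selection, and also what makes the problem hard.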

Multiple Cost Terms: Intersecting Hyperarcs & Overlapping Star Graphs
• There is one star graph structure for each term in the objective function.
• Option nodes for transforms common to different terms are combined.
• Example: consider three transforms: gate sizing (res), replication (rep), and isolating buffer (b2).
• Affected parameters for ni: driver resistance $R_d(O_{res}(u))$; wire length $L_i(O_{rep}(u), O_{b2})$; sink capacitances $C_v(O_{res}(v), O_{rep}(v), O_{rep}(u))$ and $C_w(O_{res}(w), O_{rep}(w), O_{rep}(u))$.
• Order > 2 terms: $2R_dC_v$ (order 4), $2c \cdot R_dL_i$ (order 3), $2R_dC_w$ (order 4).
• An MEA constraint ensures consistent option selection for common transforms in different star graphs.
[Figure: sub-TSG for net ni (driver u, sinks v and w): overlapping star graphs for the terms $2c \cdot R_dL_i$, $2R_dC_v$, and $2R_dC_w$ sharing the option nodes of Ores(u) and Orep(u), with meta arcs, between a distribution node and a gathering node]
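The orders and the shared transforms in this example can be computed directly from the dependency sets listed above (reading "Li" as the wire length of ni is an assumption):

```python
from itertools import combinations

# Transforms each parameter of net ni depends on; "Li" is ASSUMED to be
# the wire length of ni.
DEPS = {
    "Rd": {"res(u)"},
    "Li": {"rep(u)", "b2"},
    "Cv": {"res(v)", "rep(v)", "rep(u)"},
    "Cw": {"res(w)", "rep(w)", "rep(u)"},
}
# One star graph per objective-function term.
TERMS = {"2c*Rd*Li": ("Rd", "Li"),
         "2*Rd*Cv": ("Rd", "Cv"),
         "2*Rd*Cw": ("Rd", "Cw")}


def transforms_of(term):
    """Union of the transforms the factors of a term depend on."""
    return set().union(*(DEPS[p] for p in TERMS[term]))


orders = {t: len(transforms_of(t)) for t in TERMS}
# Transforms shared by two star graphs: their option nodes are combined,
# and an MEA constraint keeps the selected option consistent.
shared = {(a, b): transforms_of(a) & transforms_of(b)
          for a, b in combinations(TERMS, 2)}
```

Every pair of star graphs here overlaps on res(u) and rep(u), so those two option sets are the ones whose selection must stay consistent across all three terms.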

Background: Incremental Detailed Placement [Dutt et al., ICCAD'06]
• Flow amount corresponds to cell movement.
• Arcs correspond to possible movement directions.
• Arc cost corresponds to the deterioration in the objective metric caused by the corresponding movement.
• Cells to be legalized are connected to the source.
• White spaces are connected to the sink.
• Flows from the source to the sink perform cell legalization via white spaces.
[Figure: rows Row1 to Row3 with cells C11 to C33, cell A1 to be legalized, and white spaces W1, W2, W21, and W3; A1 is connected to the source and the white spaces to the sink]
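The source-to-sink legalization idea can be tried on a toy instance with a generic min-cost flow solver. The sketch uses successive shortest paths (Bellman-Ford on the residual graph), a textbook technique rather than the method of [Dutt et al., ICCAD'06]; the cell overlap, arc costs, and white-space widths are hypothetical.

```python
import math


def min_cost_flow(num_nodes, arcs, s, t, demand):
    """Successive shortest paths with Bellman-Ford on the residual graph.
    arcs: (tail, head, capacity, cost). Returns (flow sent, total cost)."""
    g = [[] for _ in range(num_nodes)]  # [head, residual cap, cost, reverse idx]
    for u, v, cap, cost in arcs:
        g[u].append([v, cap, cost, len(g[v])])
        g[v].append([u, 0, -cost, len(g[u]) - 1])
    flow = total = 0
    while flow < demand:
        dist = [math.inf] * num_nodes
        prev = [None] * num_nodes       # (node, arc index) on shortest path
        dist[s] = 0
        for _ in range(num_nodes - 1):
            changed = False
            for u in range(num_nodes):
                if dist[u] == math.inf:
                    continue
                for i, (v, cap, cost, _) in enumerate(g[u]):
                    if cap > 0 and dist[u] + cost < dist[v]:
                        dist[v], prev[v], changed = dist[u] + cost, (u, i), True
            if not changed:
                break
        if dist[t] == math.inf:
            break                       # no more white space reachable
        push, v = demand - flow, t      # bottleneck capacity along the path
        while v != s:
            u, i = prev[v]
            push = min(push, g[u][i][1])
            v = u
        v = t                           # augment along the path
        while v != s:
            u, i = prev[v]
            g[u][i][1] -= push
            g[v][g[u][i][3]][1] += push
            v = u
        flow += push
        total += push * dist[t]
    return flow, total


# HYPOTHETICAL instance: cell A1 overlaps its row by 2 units.
S, A1, C12, W1, W2, T = range(6)
arcs = [(S, A1, 2, 0),                  # source -> cell to be legalized
        (A1, W1, 2, 3),                 # move A1 directly toward W1
        (A1, C12, 2, 1),                # push neighbor C12 ...
        (C12, W2, 2, 1),                # ... into white space W2
        (W1, T, 1, 0), (W2, T, 1, 0)]   # white-space widths as capacities
flow, cost = min_cost_flow(6, arcs, S, T, demand=2)
```

The first unit takes the cheaper route through C12 into W2 (cost 2); once W2 is full, the second unit goes to W1 (cost 3), for a total cost of 5.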

Simultaneous Detailed Placement & Area Constraint Satisfaction
• Branch flows are sent directly to the detailed placement network flow graph (DPG) to perform simultaneous detailed placement.
• Flow is sent from the replacement option node of a cell to the corresponding position in the DPG.
• The flow amount represents the selected size of the cell. Coupling between the flow and the size option nodes is needed: a shunting structure.
[Figure: shunting structure for cell u: flow of amount Amax enters the size options of Ores(u); for size option j, flow Aj(u) continues through Opl(u) (positions i and j of u) to the DPG, while a shunting arc carries the remaining Amax - Aj(u) to the sink; arcs labeled (Amax, 0) and (Aj(u), 0)]
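Our reading of the shunting structure is simple arithmetic: a fixed flow Amax enters the cell's size options, the selected option j forwards Aj(u) toward the DPG, and the shunting arc absorbs the rest. A sketch under that interpretation, with hypothetical library areas:

```python
def shunt_split(a_max, size_areas, j):
    """Flow of amount a_max enters the size option nodes of cell u.
    Selecting size option j forwards A_j(u) toward the DPG and diverts
    a_max - A_j(u) over the shunting arc, so the flow reaching the DPG
    equals the selected cell size."""
    a_j = size_areas[j]
    if not 0 < a_j <= a_max:
        raise ValueError("size option area must lie in (0, A_max]")
    return a_j, a_max - a_j      # (to DPG, over shunting arc)


# HYPOTHETICAL library sizes (areas) of cell u; A_max = largest size.
areas = {1: 2.0, 2: 3.0, 3: 4.0}
a_max = max(areas.values())
to_dpg, shunted = shunt_split(a_max, areas, 2)
```

Because the DPG receives exactly the selected area, the white-space capacities in the DPG can enforce the area constraint for whichever size the flow selects.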

Experimental Results: Benchmarks
• Three benchmark sets: TD-Dragon [Yang et al., ICCAD'02], ISCAS'85, and TD-IBM
• Available options:
• Cell sizing and type-1 and type-2 buffers: 4 options for TD-Dragon and ISCAS'85, and 5 options for TD-IBM
• Replication: 4 options (3 replication options with different partitions of the sink cells, plus a no-replication option)
• Replacement: 2 options. A timing-driven position for each cell is calculated using the method in [Dutt et al., ICCAD'06]; a cell can either stay at its original position or be moved to its timing-driven position.
• 3% extra white space is added to the initial circuits in TD-IBM, and 10% extra white space to the circuits in ISCAS'85 and TD-Dragon.

Sequential Application of Transforms
• We compare our results to the sequential application of transforms.
• The order of transform application matters in sequential application. We tested three different orders:
• 1) Decreasing order of ΔT/ΔA (ΔT = 25.92%): replacement → isolating buffer → cell resizing → drive buffer → replication
• 2) Decreasing order of ΔT (ΔT = 18.11%): replacement → cell resizing → isolating buffer → replication → drive buffer
• 3) Increasing order of ΔA (ΔT = 22.64%): replacement → isolating buffer → drive buffer → cell resizing → replication

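Experimental Results
• TD-IBM benchmarks: ΔT = 34.8% (our method) vs. 25.9% (sequential), a difference of 8.9 percentage points; 34.4% relatively better
• ISCAS'85: ΔT = 20.4% (our method) vs. 12.5% (sequential), a difference of 7.9 percentage points; 63.2% relatively better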

Experimental Results
• TD-Dragon: ΔT = 15.1% (our method) vs. 8.8% (sequential), a difference of 6.3 percentage points; 71.6% relatively better
• Our run time is about 1.5 times that of the sequential approach.
• Run time increases linearly with the number of cells on CP.
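The reported numbers are mutually consistent if each result slide's three values are read as our ΔT, the gap to the sequential ΔT, and the sequential ΔT, with "relatively better" meaning ours/sequential - 1. That reading is an inference (for example, 34.8 - 25.9 = 8.9, and 25.9 matches the 25.92% of the best sequential order). A quick arithmetic check:

```python
# Timing improvements in %: (our method, sequential), as read from the
# result slides; this interpretation of the columns is an inference.
RESULTS = {"TD-IBM": (34.8, 25.9), "ISCAS'85": (20.4, 12.5),
           "TD-Dragon": (15.1, 8.8)}


def compare(ours, seq):
    """Absolute gap in percentage points and relative improvement in %."""
    return round(ours - seq, 1), round(100 * (ours / seq - 1), 1)


summary = {name: compare(*vals) for name, vals in RESULTS.items()}
```

All three gaps and relative improvements reproduce the values on the slides, which supports the column reading used here.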

Conclusions
• A general discretized network-flow-based approach to TD post-placement multiple physical synthesis; it can handle most transforms in a unified manner
• Considers transform applications simultaneously
• Obtains high-quality solutions and is not trapped in local optima
• Performs simultaneous detailed placement (DP), so that the DP cost is considered when selecting transform options
• Reasonable run time and good scalability
• Demonstrates the power of continuous optimization combined with well-structured discretizations
• Applicable to other constrained optimization problems (e.g., power optimization with area and timing constraints)
• Future work: (a) application to mixed-cell designs; (b) global re-routing as a transform for signal integrity

Thank you
