ECE260B – CSE241A Winter 2005 Partitioning & Floorplanning

Website: http://vlsicad.ucsd.edu/courses/ece260b-w05

Key Design Stages • Synthesis • Partitioning • Floorplanning • Power/ground Generation • Clock Generation • Placement • Routing

Floorplanning

Floorplanning Input • Design netlist (required) • Area requirements (required) • Power requirements (required) • Timing constraints (required) • Physical partitioning information (required) • Die size vs. performance vs. schedule trade-off (required) • I/O placement (optional) • Macro placement information (optional)

Floorplanning Output • Die/block area • I/Os placed • Macros placed • Power grid designed • Power pre-routing • Standard cell placement areas → Design ready for standard cell placement


Floorplan • Blocks inside a pad frame • Routing inside, between blocks • Different-sized blocks more difficult than standard cells to place and route • Block types: hard, soft, semi-soft • Block shapes: rectangular, L-shaped, T-shaped, rectilinear • Blocks can be rotated, mirrored, … [Figure: floorplan with RAM, standard cell, and data path blocks, routing channels, and I/O pads] Courtesy K. Yang, UCLA

Design Styles • Full custom • Analog / RF • CPU design • ASIC (Application-Specific IC) • Gate array / sea of gates / standard cells • Via programmable • Structured ASICs • Programmable logic • PLA • FPGA • Software implementation • Microcode Courtesy K. Yang, UCLA

Size Estimation [Figure: physical design trade-off among schedule, performance, and die size] • Why we care: • If area is too small: P&R will not finish or meet timing, or will run too long • Schedule and die size inversely related • Performance and die size have complex relationship • Rule of thumb by number of metal layers (must correct for power, clock, etc.): • 3LM: Cell utilization 65 percent // what is utilization? • 4LM: Cell utilization 70 percent • 5LM: Cell utilization 75 percent • 6LM: Cell utilization 80 percent • Floorplan metrics • Low interconnect density → cell utilization (standard cell area / standard cell row area) • High interconnect density → “net utilization” (number of nets / standard cell area)
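As a rough illustration of the rule of thumb above, the sketch below converts a total standard cell area into the standard cell row area needed to hit the utilization target for a given number of metal layers. The function names and the 1.3 mm^2 input are hypothetical, and the corrections for power, clock, etc. that the slide mentions are not modeled.

```python
# Rule-of-thumb utilization targets from the slide, keyed by number of metal layers.
UTILIZATION_BY_METAL_LAYERS = {3: 0.65, 4: 0.70, 5: 0.75, 6: 0.80}


def cell_utilization(std_cell_area, row_area):
    """Cell utilization = standard cell area / standard cell row area."""
    return std_cell_area / row_area


def required_row_area(std_cell_area, metal_layers):
    """Row area needed so that the utilization equals the rule-of-thumb target."""
    return std_cell_area / UTILIZATION_BY_METAL_LAYERS[metal_layers]


# Hypothetical design with 1.3 mm^2 of standard cells.
row_area_3lm = required_row_area(1.3, 3)
row_area_6lm = required_row_area(1.3, 6)
print(f"3LM row area: {row_area_3lm:.3f} mm^2")
print(f"6LM row area: {row_area_6lm:.3f} mm^2")
```

More metal layers allow a higher utilization target, so the same cell area fits in a smaller row area.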

Channels • Channels end at block boundaries • Alternate channel definitions possible, depending on position of blocks [Figure: blocks A, B, C shown with alternative channel definitions] Courtesy K. Yang, UCLA

Channel Intersection Graph • Nodes are channels, edges correspond to pairs of channels that touch • Channel graph shows paths between channels • Channel graph can be used to guide global routing [Figure: floorplan with blocks A to E and its channel intersection graph] Courtesy K. Yang, UCLA

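Channel Ordering • A wire leaving the end of one channel creates a pin on the side of the next channel, so the first channel must be routed before the second • “Wheel” = circular constraints that create an unroutable configuration of channels [Figure: blocks A to D with channels A and B and the ordering constraint between them] Courtesy K. Yang, UCLA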

Slicing Floorplan Represented by Binary Tree • A slicing floorplan can be recursively cut in two without cutting any blocks • A slicing floorplan is guaranteed to have no “wheels”, therefore guaranteed to have a feasible order of routing for the channels • A slicing floorplan can be represented as a binary tree, with internal nodes representing slices in the floorplan and leaves representing blocks. [Figure: slicing floorplan with blocks A to E and cuts 1 to 4, and the corresponding binary tree] Courtesy K. Yang, UCLA
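The binary-tree representation can be sketched in a few lines. The example below is a minimal illustration, not the tree from the slide's figure: the block sizes, the tree shape, and the names `Block`, `Cut`, and `bounding_box` are all hypothetical. Leaves are blocks, internal nodes are cuts, and the bounding box of the floorplan is computed bottom-up.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Block:
    name: str
    width: int
    height: int


@dataclass
class Cut:
    direction: str  # "V": children placed side by side; "H": children stacked
    left: "Node"
    right: "Node"


Node = Union[Block, Cut]


def bounding_box(node):
    """Return (width, height) of the slicing floorplan rooted at node."""
    if isinstance(node, Block):
        return node.width, node.height
    w1, h1 = bounding_box(node.left)
    w2, h2 = bounding_box(node.right)
    if node.direction == "V":
        return w1 + w2, max(h1, h2)
    if node.direction == "H":
        return max(w1, w2), h1 + h2
    raise ValueError(f"unknown cut direction: {node.direction}")


# Hypothetical block sizes and tree shape (five blocks, four cuts).
A, B, C = Block("A", 2, 3), Block("B", 2, 1), Block("C", 3, 2)
D, E = Block("D", 1, 2), Block("E", 2, 2)
tree = Cut("V", Cut("H", A, B), Cut("H", C, Cut("V", D, E)))

print(bounding_box(tree))
```

Because every internal node is a single straight cut, the recursion never has to resolve a circular dependency between sub-floorplans, which mirrors the "no wheels" property stated above.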

O-Tree • Partial ordering based on projection overlapping (with given physical locations) • Transforming into binary trees by pivoting, etc. • Coded in a node sequence given a tree traversal algorithm • E.g., OACBDEF for DFS • Condensed solution space [Figure: blocks A to F with root O] Courtesy K. Yang, UCLA

Sequence Pair • Based on layout partitions by non-overlapping ascending/descending staircases • Coded in two node sequences • E.g., CEDFAB for descending staircases and ABCDEF for ascending staircases • Larger solution space, finer representation [Figure: blocks A to F] Courtesy K. Yang, UCLA
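To make the "two node sequences" encoding concrete, the sketch below packs blocks from a sequence pair. It assumes the usual convention, since the slide's staircase figure is not available: if a precedes b in both sequences, a is to the left of b; if a follows b in the first sequence but precedes it in the second, a is below b. The block names, sizes, and the function name `pack` are hypothetical, and the simple quadratic-time evaluation is chosen for readability.

```python
def pack(gamma_plus, gamma_minus, sizes):
    """Place blocks at the lowest, leftmost positions allowed by the sequence pair.

    sizes maps block name -> (width, height).
    Returns (x, y, total_width, total_height).
    """
    pos_plus = {b: i for i, b in enumerate(gamma_plus)}
    x, y = {}, {}
    # Every block that is left of or below b precedes b in gamma_minus,
    # so processing in gamma_minus order sees all predecessors first.
    for i, b in enumerate(gamma_minus):
        x[b], y[b] = 0, 0
        for a in gamma_minus[:i]:
            if pos_plus[a] < pos_plus[b]:
                # a before b in both sequences: a is left of b
                x[b] = max(x[b], x[a] + sizes[a][0])
            else:
                # a after b in gamma_plus, before b in gamma_minus: a is below b
                y[b] = max(y[b], y[a] + sizes[a][1])
    width = max(x[b] + sizes[b][0] for b in sizes)
    height = max(y[b] + sizes[b][1] for b in sizes)
    return x, y, width, height


# Hypothetical three-block example.
sizes = {"a": (2, 2), "b": (3, 1), "c": (1, 3)}
x, y, width, height = pack(["c", "a", "b"], ["a", "b", "c"], sizes)
print(x, y, width, height)
```

Here a ends up left of b, and both a and b end up below c, so every pair of blocks is separated either horizontally or vertically and no overlap can occur.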

Partitioning

Outline • Introduction • Kernighan-Lin Algorithm • Fiduccia-Mattheyses Algorithm • Partitioning by Network Flow • Clustering • End-case Partitioning (and Placement)

Partitioning • Decomposition of a complex system into smaller subsystems • Done hierarchically • Partitioning done until each subsystem has manageable size • Each subsystem can be designed independently • Interconnections between partitions minimized • Less hassle interfacing the subsystems • Communication between subsystems usually costly • Time-budgeting

Example: Partitioning of a Circuit [Figure: circuit split into three parts by two cuts] • Input size: 48 • Cut 1 = 4, Cut 2 = 4 • Size 1 = 15, Size 2 = 16, Size 3 = 17

Hierarchical Partitioning • Levels of partitioning: • System-level partitioning: Each sub-system can be designed as a single PCB • Board-level partitioning: Circuit assigned to a PCB is partitioned into sub-circuits, each fabricated as a VLSI chip • Chip-level partitioning: Circuit assigned to the chip is divided into manageable sub-circuits. NOTE: physically not necessary

Delay at Different Levels of Partitions [Figure: blocks A, B, C, D placed on PCB1 and PCB2, with relative delays labeled x, 10x, and 20x]

Partitioning: Formal Definition • Input: • Graph or hypergraph • Usually with vertex weights • Usually weighted edges • Constraints • Number of partitions (K-way partitioning) • Maximum capacity of each partition OR maximum allowable difference between partitions • Objective • Assign nodes to partitions, subject to the constraints, s.t. the cutsize is minimized • Tractability • Is NP-complete

Hypergraphs in VLSI CAD • Circuit netlist represented by hypergraph Slides Courtesy Kia Bazargan, U. Minn

Hypergraph Partitioning in VLSI • Variants • directed/undirected hypergraphs • weighted/unweighted vertices, edges • constraints, objectives, … • Human-designed instances • Benchmarks • up to 4,000,000 vertices • sparse (vertex degree ≈ 4, hyperedge size ≈ 4) • small number of very large hyperedges • Efficiency, flexibility: KL-FM style preferred

Some Notations • A net n is cut by a cluster C if at least one, but not all, of the pins of n are in C. • We use E(C) to denote the set of nets cut by a cluster C. • We use E(P) to denote the set of nets cut by at least one cluster of a partition P. • We use w(C) to denote the number of cells assigned to a cluster C.

Some Bipartitioning Formulations • Min-Cut Bipartitioning: • Objective: Minimize F(P2) = |E(C1)| = |E(C2)| • Min-Cut Bisection: • Objective: Minimize F(P2) = |E(C1)| = |E(C2)| • Constraint: |w(C1) - w(C2)| ≤ ε • Size-Constrained Min-Cut Bipartitioning: • Objective: Minimize F(P2) = |E(C1)| = |E(C2)| • Constraint: L ≤ w(C1), w(C2) ≤ U • Minimum Ratio Cut Bipartitioning: • Objective: Minimize F(P2) = |E(C1)| / (w(C1) w(C2))
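The objectives above can be evaluated directly from a netlist. The sketch below is a small illustration on a hypothetical six-cell netlist (the net and cell names are made up); it computes E(C), the size imbalance used by the bisection constraint, and the ratio-cut objective, with w(C) taken as the number of cells as in the notation slide.

```python
def cut_nets(nets, cluster):
    """E(C): nets with at least one, but not all, pins in the cluster."""
    cluster = set(cluster)
    return {name for name, pins in nets.items()
            if 0 < len(pins & cluster) < len(pins)}


def ratio_cut(nets, c1, c2):
    """Minimum ratio cut objective |E(C1)| / (w(C1) * w(C2)), unit cell sizes."""
    return len(cut_nets(nets, c1)) / (len(c1) * len(c2))


# Hypothetical netlist: net name -> set of cells (pins).
nets = {
    "n1": {"a", "b", "c"},
    "n2": {"c", "d"},
    "n3": {"d", "e", "f"},
    "n4": {"a", "b"},
}
c1, c2 = {"a", "b", "c"}, {"d", "e", "f"}

print(cut_nets(nets, c1))      # nets cut by C1
print(cut_nets(nets, c2))      # same set: for a bipartition E(C1) = E(C2)
print(abs(len(c1) - len(c2)))  # imbalance |w(C1) - w(C2)|
print(ratio_cut(nets, c1, c2))
```

The ratio-cut denominator is largest when the two sides are equal in size, so that objective rewards balance without a hard size constraint.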

Some Multi-Way Partitioning Formulations • Size-Constrained Min-Cut k-Way Partitioning: • Objective: Minimize F(Pk) • Constraint: L ≤ w(Ci) ≤ U ∀ Ci ∈ Pk • Many other complicated formulations • k-way partitioning: Formulation • Given a netlist of n cells V = {v1, v2, …, vn}, assign the cells to k clusters Pk = {C1, C2, …, Ck} satisfying some given constraints such that an objective function F(Pk) is optimized. • Partitioning: k is small, O(1) • Clustering: k is large, O(n) • Technology Mapping: Constraints on the clusters

Kernighan-Lin (KL) Algorithm • On non-weighted graphs • An iterative improvement technique • A two-way (bisection) partitioning algorithm • The partitions must be balanced (of equal size) • Iterate as long as the cutsize improves: • Find a pair of vertices that result in the largest decrease in cutsize if exchanged • Exchange the two vertices (potential move) • “Lock” the vertices • If no improvement is possible, and some vertices are still unlocked, then exchange the vertices that result in the smallest increase in cutsize B. W. Kernighan and S. Lin, Bell System Technical Journal, 1970.

Kernighan-Lin (KL) Algorithm • Initialize • Bipartition G into V1 and V2, s.t. |V1| = |V2| ± 1 • n = |V| • Repeat • for i = 1 to n/2 • Find a pair of unlocked vertices vai ∈ V1 and vbi ∈ V2 whose exchange makes the largest decrease or smallest increase in cut-cost • Mark vai and vbi as locked • Store the gain gi • Find k s.t. Gain_k = Σ i=1..k gi is maximized • If Gain_k > 0 then move va1,...,vak from V1 to V2 and vb1,...,vbk from V2 to V1 • Until Gain_k ≤ 0

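An Example • Edge weights (symmetric matrix; rows and columns a, b, c, d):
      a  b  c  d
   a  0  1  2  3
   b  1  0  1  4
   c  2  1  0  3
   d  3  4  3  0
[Figure: the corresponding weighted graph on vertices a, b, c, d] Slides courtesy F. Y. Young, U. Hong Kong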

An Example - Pass One [Figure: partition before and after each exchange, starting from {a, b} | {c, d}] • g(a,c) = -1+3-3+1 = 0 • g(a,d) = -1+2-3+4 = 2 • g(b,c) = -1+4-3+2 = 2 • g(b,d) = -1+1-3+3 = 0 • g1 = 2 (exchange a and d, lock both) • g(b,c) = -4+1-2+3 = -2 • g2 = -2 → G = g1 = 2 (k = 1)

An Example - Pass Two [Figure: partition before and after each exchange, starting from {a, c} | {b, d}] • g(a,b) = -2+3-4+1 = -2 • g(a,d) = -2+1-4+3 = -2 • g(c,b) = -2+3-4+1 = -2 • g(c,d) = -2+1-4+3 = -2 • g1 = -2 (exchange c and d, lock both) • g(a,b) = -3+2-1+4 = 2 • g2 = 2 → G = g1 + g2 = 0 (k = 2) STOP!
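The two passes above can be reproduced with a short sketch of the KL loop. This is a minimal illustration rather than the original implementation: it uses the edge weights of the four-vertex example, recomputes every gain from scratch instead of maintaining D values, and breaks ties by taking the first best pair in sorted order (an assumption; the slides do not specify tie-breaking).

```python
def cut_cost(w, part1, part2):
    """Total weight of edges crossing the bipartition."""
    return sum(w[u][v] for u in part1 for v in part2)


def kl_pass(w, part1, part2):
    """One KL pass: tentatively swap n/2 pairs, then keep the best prefix."""
    left, right = sorted(part1), sorted(part2)
    locked, swaps, gains = set(), [], []
    for _ in range(min(len(left), len(right))):
        best = None
        for u in left:
            for v in right:
                if u in locked or v in locked:
                    continue
                # D value = external cost minus internal cost of a vertex
                d_u = sum(w[u][x] for x in right) - sum(w[u][x] for x in left)
                d_v = sum(w[v][x] for x in left) - sum(w[v][x] for x in right)
                g = d_u + d_v - 2 * w[u][v]
                if best is None or g > best[0]:
                    best = (g, u, v)
        g, u, v = best
        left[left.index(u)], right[right.index(v)] = v, u  # tentative exchange
        locked.update((u, v))
        swaps.append((u, v))
        gains.append(g)
    # Find k maximizing the partial sum g1 + ... + gk.
    best_k, best_gain, running = 0, 0, 0
    for k, g in enumerate(gains, start=1):
        running += g
        if running > best_gain:
            best_k, best_gain = k, running
    p1, p2 = set(part1), set(part2)
    for u, v in swaps[:best_k]:
        p1.remove(u); p1.add(v)
        p2.remove(v); p2.add(u)
    return p1, p2, best_gain, gains


def kernighan_lin(w, part1, part2):
    """Repeat passes until a pass yields no positive gain."""
    p1, p2 = set(part1), set(part2)
    while True:
        p1, p2, gain, _ = kl_pass(w, p1, p2)
        if gain <= 0:
            return p1, p2


# Edge weights from the example.
w = {
    "a": {"a": 0, "b": 1, "c": 2, "d": 3},
    "b": {"a": 1, "b": 0, "c": 1, "d": 4},
    "c": {"a": 2, "b": 1, "c": 0, "d": 3},
    "d": {"a": 3, "b": 4, "c": 3, "d": 0},
}
p1, p2 = kernighan_lin(w, {"a", "b"}, {"c", "d"})
print(sorted(p1), sorted(p2), cut_cost(w, p1, p2))
```

Starting from {a, b} | {c, d} with cut cost 10, the first pass records gains 2 and -2 and keeps only the first exchange; the second pass records -2 and 2, so its best partial sum is 0 and the loop stops.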

Cut During One Pass (Bipartitioning) [Figure: cut size plotted against the number of moves during one pass]

Kernighan-Lin (KL): Analysis • Time complexity? • Inner (for) loop • Iterates n/2 times • Iteration 1: (n/2) × (n/2) candidate pairs • Iteration i: (n/2 - i + 1) × (n/2 - i + 1) candidate pairs • Passes? Usually independent of n • O(n^3) • Drawbacks? • Local optimum • Balanced partitions only • No weight for the vertices • High time complexity • Only on edges, not hyper-edges
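The cubic bound follows from summing the candidate pairs examined in one pass. The derivation below assumes each candidate pair is evaluated in constant time (for example, with D values kept up to date) and that the number of passes does not grow with n.

```latex
\[
  \sum_{i=1}^{n/2} \left(\frac{n}{2} - i + 1\right)^{2}
  \;=\; \sum_{j=1}^{n/2} j^{2}
  \;=\; \frac{1}{6}\cdot\frac{n}{2}\left(\frac{n}{2} + 1\right)(n + 1)
  \;=\; O(n^{3})
\]
```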

Fiduccia-Mattheyses Algorithm: Basic Ideas • Differences from KL: • Move only one cell each time. • Cells can have different sizes. • Nets can be multi-terminal. • Maintain a balanced partition after every move.

FM Algorithm • Start with a balanced partition P = {X,Y}. • Repeat • For i = 1 to n: • Choose a free cell b ∈ X ∪ Y s.t. moving b to the other side gives the highest gain, gain(b), and moving b preserves balance in P. • Move and lock b. • Let gi = gain(b). • Find k s.t. G = g1 + g2 + … + gk is maximized and shuffle the cells up to this kth step. • Until G = 0.
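The loop above can be sketched as follows. This is a simplified illustration under stated assumptions: all cells have unit size, balance is enforced as a minimum number of cells per side (`min_side`, a hypothetical parameter), gains are recomputed from scratch rather than kept in the bucket structure that gives FM its efficiency, and the netlist is made up.

```python
def cut_size(nets, side):
    """Number of nets with pins on both sides."""
    return sum(1 for pins in nets.values() if len({side[c] for c in pins}) == 2)


def gain(nets, side, cell):
    """Reduction in cut size if cell moves to the other side."""
    g = 0
    for pins in nets.values():
        if cell not in pins or len(pins) < 2:
            continue
        same = sum(1 for c in pins if side[c] == side[cell])
        if same == 1:            # cell is alone on its side: net becomes uncut
            g += 1
        elif same == len(pins):  # net is entirely on cell's side: net becomes cut
            g -= 1
    return g


def fm_pass(nets, side, min_side):
    """One FM pass: move each free cell at most once, keep the best prefix."""
    trial, free = dict(side), set(side)
    moves, gains = [], []
    while free:
        count = [0, 0]
        for s in trial.values():
            count[s] += 1
        # Only moves that leave at least min_side cells behind preserve balance.
        candidates = [c for c in sorted(free) if count[trial[c]] - 1 >= min_side]
        if not candidates:
            break
        best = max(candidates, key=lambda c: gain(nets, trial, c))
        gains.append(gain(nets, trial, best))
        trial[best] = 1 - trial[best]  # move and lock
        free.remove(best)
        moves.append(best)
    best_k, best_total, running = 0, 0, 0
    for k, g in enumerate(gains, start=1):
        running += g
        if running > best_total:
            best_k, best_total = k, running
    result = dict(side)
    for c in moves[:best_k]:
        result[c] = 1 - result[c]
    return result, best_total


def fm(nets, side, min_side):
    """Repeat passes until the best partial sum G is 0."""
    while True:
        side, total = fm_pass(nets, side, min_side)
        if total <= 0:
            return side


# Hypothetical netlist and starting partition (side 0 or 1 per cell).
nets = {"n1": {"a", "b"}, "n2": {"b", "c"}, "n3": {"a", "c"},
        "n4": {"d", "e", "f"}, "n5": {"c", "d"}}
start = {"a": 0, "b": 0, "d": 0, "c": 1, "e": 1, "f": 1}
final = fm(nets, start, min_side=2)
print(cut_size(nets, start), cut_size(nets, final), final)
```

Unlike KL, each step moves a single cell, and nets with more than two pins are handled directly by the gain function.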

An Example [Figure: successive FM moves on cells a to f, with gains g1 to g4; each moved cell is locked]

An Example [Figure: remaining moves, with gains g5 and g6] • If G = g1 + g2 + g3 + g4 is the largest partial sum, the partition after this pass is the one reached after the fourth move. [Figure: resulting partition of cells a to f]

Balanced Partition • A partition P = (X,Y) is balanced iff: [equation not recovered from the slide] for some constant r ≤ 1, where w(X) is the total size of the cells in X. • To preserve balance, a cell b is moved in a pass only if: [equation not recovered from the slide] after moving b, where W = w(X ∪ Y) and Smax is the maximum cell size
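The equations on this slide did not survive extraction. As an assumption, the block below states the move criterion in the form commonly given for the Fiduccia-Mattheyses algorithm, with r read as the target fraction of the total cell size assigned to X; it may differ in detail from what the slide showed.

```latex
\[
  rW - S_{\max} \;\le\; w(X) \;\le\; rW + S_{\max},
  \qquad W = w(X \cup Y)
\]
```

For example, with r = 1/2, W = 6, and unit cell sizes (S_max = 1), a move is allowed only if it leaves 2 ≤ w(X) ≤ 4. The slack of one maximum cell size ensures that some move is always possible.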

KL and FM Extensions: Tie-Breaking Strategy • When picking the highest gain move, break ties by looking ahead a certain number of steps. • If ties still occur, some researchers observe that LIFO order improves solution quality.

Ratio of #edges to #vertices • Solution quality of KL and FM depends on the ratio of #edges to #vertices: good if ratio > 5 and bad if ratio < 3. VLSI circuits typically have a ratio of 1.8-2.5. • Goldberg and Burstein suggested contracting edges to increase the ratio. [Figure: nodes A and B joined by an edge are contracted into a single node AB]

Network Flow Technique [Figure: flow network with source s, sink t, and internal nodes a, b, c, d, drawn once with edge capacities and once with a maximum flow labeled flow/capacity; min-cut = max-flow] • The network flow technique can find the min-cut bipartition optimally, but not necessarily balanced. • Apply the algorithm repeatedly to obtain a balanced bipartition.

Network Flow Technique • The network flow technique is very useful in many different research areas. • Many sophisticated improvements have been made to the original algorithm. • Ford & Fulkerson: O(|E||f|) where |f| is the size of the total flow. Note that for unit capacity, |f| ≤ |E|, so O(|E|^2) time.
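A compact way to see min-cut = max-flow is to run an augmenting-path algorithm and read the cut off the final residual graph. The sketch below uses breadth-first augmenting paths (the Edmonds-Karp variant of Ford-Fulkerson, swapped in here for simplicity) on a small hypothetical network; the capacities and node names are made up and are not those of the lecture's figure.

```python
from collections import deque


def max_flow_min_cut(capacity, s, t):
    """Return (max flow value, nodes on the source side, cut edges)."""
    # Residual capacities, with reverse edges initialised to 0.
    residual = {u: dict(nbrs) for u, nbrs in capacity.items()}
    for u, nbrs in capacity.items():
        for v in nbrs:
            residual.setdefault(v, {}).setdefault(u, 0)
    flow = 0
    while True:
        # Breadth-first search for a shortest augmenting path.
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v, cap in residual[u].items():
                if cap > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            break  # no augmenting path left: the flow is maximum
        # Bottleneck capacity along the path found.
        bottleneck, v = float("inf"), t
        while parent[v] is not None:
            bottleneck = min(bottleneck, residual[parent[v]][v])
            v = parent[v]
        # Push the bottleneck along the path.
        v = t
        while parent[v] is not None:
            u = parent[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
            v = u
        flow += bottleneck
    # Nodes still reachable from s in the residual graph form the source side.
    source_side = set(parent)
    cut_edges = [(u, v) for u in capacity for v in capacity[u]
                 if u in source_side and v not in source_side]
    return flow, source_side, cut_edges


# Hypothetical network.
capacity = {"s": {"a": 3, "b": 2}, "a": {"b": 1, "t": 2}, "b": {"t": 3}}
flow, source_side, cut_edges = max_flow_min_cut(capacity, "s", "t")
print(flow, source_side, cut_edges)
```

In this example the source side of the min-cut contains only s, which illustrates the point made in the slides: the optimal min-cut can be very unbalanced.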

Circuit Partitioning • We can apply the network flow algorithm in partitioning circuits. • The biggest problem is that the two partitions may not be balanced. • The problem of obtaining two balanced partitions with minimum cut is NP-complete. • However we can apply some heuristics to balance the two partitions.

Flow-Balanced-Bipartition (FBB) • Step 1: Find a min-cut C = (X,Y) in the network N • Step 2: If (1-ε)W/2 ≤ w(X) ≤ (1+ε)W/2, stop and return C • Step 3: If w(X) < (1-ε)W/2 then • Collapse all nodes in X to s • Collapse to s a node v ∈ Y incident on a net in C • Go to step 1 • Step 4: If w(X) > (1+ε)W/2 … (similarly) ... Why do we need this step?

Circuit Representation • Another problem in applying the network flow technique in circuit partitioning is how to represent a circuit correctly by a graph. [Figure: netlist connecting cells A, B, C, D] How to represent this netlist by a simple graph?

Hypergraph • In a hypergraph, an edge is a set of vertices. • H(V,E) where V = {A, B, C, D}, E = {n1, n2, n3}, n1 = {A, B, C, D}, n2 = {A, B}, n3 = {C, D} [Figure: hypergraph on vertices A, B, C, D] • Circuits can be represented by hypergraphs, but the network flow method can only be applied to simple graphs.
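To see why the graph representation matters, the sketch below stores the hypergraph H from this slide and compares its true net cut with the cut of a clique expansion, in which every net is replaced by unit-weight edges between all pairs of its pins. The clique model is used here only as one common approximation (an assumption for illustration); it is not claimed to be the representation the lecture goes on to use.

```python
from itertools import combinations

# Hypergraph H(V, E) from the slide: net name -> set of vertices.
nets = {"n1": {"A", "B", "C", "D"}, "n2": {"A", "B"}, "n3": {"C", "D"}}


def hyperedge_cut(nets, x):
    """Number of nets with pins both inside and outside the vertex set x."""
    return sum(1 for pins in nets.values() if pins & x and pins - x)


def clique_expansion(nets):
    """Replace each net by a clique on its pins; parallel edges add up."""
    weight = {}
    for pins in nets.values():
        for u, v in combinations(sorted(pins), 2):
            weight[(u, v)] = weight.get((u, v), 0) + 1
    return weight


def graph_cut(weight, x):
    """Total weight of graph edges with exactly one endpoint in x."""
    return sum(w for (u, v), w in weight.items() if (u in x) != (v in x))


x = {"A", "B"}
graph = clique_expansion(nets)
print(hyperedge_cut(nets, x))  # nets actually cut
print(graph_cut(graph, x))     # cut weight seen by the clique model
```

For the bipartition {A, B} | {C, D} only net n1 is cut, yet the clique model reports a cut weight of 4, so a min-cut on the expanded graph does not in general minimize the number of cut nets.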
