Constraint-Driven Large Scale Circuit Placement Algorithms

Advisor: Prof. Jason Cong
Student: Min Xie
September, 2006

Outline
• Chapter 1. Introduction
• Chapter 2. Optimality and scalability study of existing placement algorithms
• Chapter 3. Routability driven multilevel global placement and white space allocation
• Chapter 4. A robust legalization scheme for mixed-size placement
• Chapter 5. Applications of mixed-size placement legalization
• Chapter 6. “Global” localized preprocessing for detailed placement
• Chapter 7. Heterogeneous placement for FPGAs
• Chapter 8. Conclusions and future works
UCLA VLSICAD LAB

Publication List
• Cong J., Xie M., and Zhang Y., “An Enhanced Multilevel Routing System,” Proceedings of ICCAD, pp. 51-58, 2002.
• Chang C., Cong J., and Xie M., “Optimality and Scalability of Existing Placement Algorithms,” Proceedings of ASPDAC, pp. 621-627, 2003.
• Cong J., Romesis M., and Xie M., “Optimality, Scalability and Stability Study of Existing Partitioning and Placement Algorithms,” Proceedings of ISPD, pp. 88-94, 2003.
• Cong J., Romesis M., and Xie M., “Optimality and Stability Study of Timing-driven Placement Algorithms,” Proceedings of ICCAD, pp. 472-478, 2003.
• Cong J., Kong T., Shinnerl J., Xie M., and Yuan X., “Large-Scale Circuit Placement: Gap and Promise,” Proceedings of ICCAD, pp. 883-890, 2003.
• Chang C., Cong J., Romesis M., and Xie M., “Optimality and Scalability of Existing Placement Algorithms,” IEEE TCAD, vol. 23, no. 4, pp. 537-549, 2004.

Publication List
• Li C., Xie M., Koh C.K., Cong J., and Madden P., “Routability-driven Placement and White Space Allocation,” Proceedings of ICCAD, pp. 883-890, 2004.
• Cong J., Fang J., Xie M., and Zhang Y., “MARS - A Multilevel Full-Chip Gridless Routing System,” IEEE TCAD, vol. 24, no. 3, pp. 382-394, March 2005.
• Cong J., Kong T., Shinnerl J., Xie M., and Yuan X., “Large Scale Circuit Placement,” ACM TODAES, vol. 10, no. 2, pp. 389-430, April 2005.
• Li C., Xie M., Koh C.K., Cong J., and Madden P., “Routability-driven Placement and White Space Allocation,” IEEE TCAD, to appear.
• Chan T., Cong J., Romesis M., Shinnerl J., Sze K., and Xie M., “mPL6: A Robust Multilevel Mixed-size Placement Engine,” Proceedings of ISPD, pp. 227-229, April 2005.
• Cong J. and Xie M., “A Robust Detailed Placement Algorithm for Mixed-size IC Designs,” Proceedings of ASPDAC, pp. 188-194, 2006.
• Cong J., Chan T., Shinnerl J., Sze K., and Xie M., “mPL6: Enhanced Multilevel Mixed-size Placement,” Proceedings of ISPD, pp. 212-214, April 2006.

A Brief History of mPL
[Figure: timeline of relative wirelength versus year, 2000 to 2004, with regions labeled UNIFORM CELL SIZE and NON-UNIFORM CELL SIZE]
• mPL 1.0 [ICCAD00]
  • Recursive ESC clustering
  • NLP at coarsest level
  • Goto discrete relaxation
  • Slot Assignment legalization
  • Domino detailed placement
• mPL 1.1
  • FC-Clustering
  • added partitioning to legalization
• mPL 2.0
  • RDFL relaxation
  • primal-dual netlist pruning
• mPL 3.0 [ICCAD 03]
  • QRS relaxation
  • AMG interpolation
  • multiple V-cycles
  • cell-area fragmentation
• mPL 4.0
  • improved DP
  • better coarsening
  • backtracking V-cycle
• mPL5, mPL6
  • Multilevel Force-Directed

Multiscale Optimization Framework
[Figure labels: Given problem; Problem size decreases; Coarsening (Clustering); Interpolation & Relaxation (optimization)]
• Explores different scales of the solution space at different levels
• Supports VERY FAST and SCALABLE methods
• Supports inclusion of complicated objectives and constraints
• Successful across MANY DIVERSE applications

mPL6 – Generalized Force Directed Refinement
[Formulation labels: logsum wirelength (objective); average bin density; equality constraint: average bin density = utilization ratio]
[Figure: cells v1 to v7 on a bin grid; a13(v7) = fractional area of cell v7 in bin B13]
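The slide's exact wirelength formula did not survive in the transcript. As a rough illustration of what a "logsum" wirelength usually means, the sketch below uses the standard log-sum-exp smoothing of half-perimeter wirelength in one dimension. The function names and the smoothing parameter `gamma` are assumptions for illustration, not mPL6 identifiers.

```python
import math


def hpwl_1d(coords):
    """Exact half-perimeter wirelength of one net along one axis."""
    return max(coords) - min(coords)


def logsum_wl_1d(coords, gamma):
    """Log-sum-exp approximation of hpwl_1d (assumed standard form).

    gamma * (log sum exp(x/gamma) + log sum exp(-x/gamma)) is smooth in the
    pin coordinates and approaches max(x) - min(x) as gamma goes to 0.
    """
    def lse(values):
        # Shift by the maximum so exp() cannot overflow.
        m = max(values)
        return m + math.log(sum(math.exp(v - m) for v in values))

    return gamma * (lse([c / gamma for c in coords]) +
                    lse([-c / gamma for c in coords]))


net_x = [1.0, 4.0, 2.5]          # x-coordinates of the pins of one net
exact = hpwl_1d(net_x)
smooth = logsum_wl_1d(net_x, gamma=0.1)
print(exact, smooth)
```

The smooth value never falls below the exact HPWL and exceeds it by at most 2·gamma·ln(n) for an n-pin net, which is why a small `gamma` gives a differentiable stand-in for HPWL.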

mPL6 – Iterative Flow
• Bestchoice clustering [Alpert et al, ISPD05]
• AMG declustering [Chen et al, DAC03, Chan et al ICCAD03]
• Multiple V cycle with distance based reclustering [Chan et al, ICCAD03]
[Figure: V-cycles across Level 1, Level 2, and Level 3, with steps labeled C, I, and C+I]

Outline
• Chapter 1. Introduction
• Chapter 2. Optimality and scalability study of existing placement algorithms
• Chapter 3. Routability driven multilevel global placement and white space allocation
  • Motivation and previous work
  • Routability-driven multilevel placement
  • Experiment results
  • Conclusions and future work
• Chapter 4. A robust legalization scheme for mixed-size placement
• Chapter 5. Applications of mixed-size placement legalization
• Chapter 6. “Global” localized preprocessing for detailed placement
• Chapter 7. Heterogeneous placement for FPGAs
• Chapter 8. Conclusions and future works

Motivation
• mPL does not consider routing congestion
• Aggressive HPWL minimization != routability
• Routability-driven placement
  • Routability modeling
  • Routability optimization

Previous Work -- Routability Modeling
• Topology-free methods
  • Dragon [Yang et al., TCAD03]
  • Sparse [Hu et al., ICCAD02]
  • BonnPlace [Brenner & Rohe, ISPD02]
• Topology-based methods
  • [Mayrhofer & Lauther, ICCAD90]
  • mPG [Chang et al., ISPD02]

Previous Work -- Routability Optimization
• Cell weighting
  • Cell inflation based on congestion
  • Constructive and iterative methods
  • Dragon [Yang et al, TCAD03]
  • BonnPlace [Brenner & Rohe, ISPD02]
• Net weighting
  • Translate into bin weights and optimize weighted wirelength
  • Iterative methods
  • Sparse [Hu & Sadowska, ICCAD02]
  • mPG [Chang et al, ISPD02]

Routability-Driven Multilevel Placement
• Global placement
  • Congestion estimation by a fast LZ router
  • Congestion-driven cell re-placement based on weighted wirelength
• Hierarchical top-down white space allocation
  • Geometric-based slicing tree
  • Congestion estimation on tree
  • Cutline adjustment

mPL-R Congestion Estimation with LZ Router
• Use LZ-Router [Chang et al., ISPD02] for fast congestion analysis on each level
• Binary search on V-stem (or H-stem)
  • Initialize left region and right region to cover bounding box
  • Repeat
    • Query wire usage on both regions
    • Select region with less congestion
[Figure: VHV and HVH route shapes; left region and right region, labeled less congested and more congested]
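The binary search on the V-stem can be sketched as follows. This is a simplified one-dimensional illustration, not the LZ-Router implementation: it assumes the wire usage inside the net's bounding box has already been reduced to one number per column, and it compares the two halves by average usage. The name `pick_stem_column` and the list-based usage query are hypothetical.

```python
def pick_stem_column(column_usage, lo, hi):
    """Greedy binary search for a V-stem column in the inclusive range [lo, hi].

    column_usage[c] is an assumed per-column wire-usage estimate. At each step
    the current range is split into a left and a right region, and the search
    continues in the region with the lower average usage.
    """
    while lo < hi:
        mid = (lo + hi) // 2
        left = column_usage[lo:mid + 1]
        right = column_usage[mid + 1:hi + 1]
        left_avg = sum(left) / len(left)
        right_avg = sum(right) / len(right)
        if left_avg <= right_avg:
            hi = mid          # keep the left region
        else:
            lo = mid + 1      # keep the right region
    return lo


usage = [5, 4, 9, 1, 2, 8]    # hypothetical usage across a 6-column bounding box
print(pick_stem_column(usage, 0, len(usage) - 1))
```

The search looks at a logarithmic number of region pairs rather than every candidate column, which is what makes it attractive for repeated congestion analysis on each level. It is greedy, so it is not guaranteed to find the globally least-used column.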

mPL-R Congestion-Driven Re-Placement
• Pick cells whose incident nets cross congested regions to move
• Start from the optimal location for HPWL
• Search adjacent bins within certain window
• Choose the bin based on weighted WL
[Figure: example with values 2.0, 0.5, 1.2 and two candidate locations, WLc = 15.5 and WLc = 9.2]

White Space Allocation -- Slicing Tree Construction
• Recursively bipartition chip region from top to bottom.
• Group cells into children nodes according to location relative to cutline.
• Estimate congestion on leaf nodes. Congestion on other nodes can be computed from bottom to top.
[Figure: chip regions A to H and the corresponding slicing tree with root and leaves A to H; each node is annotated with cut direction, cut location, node area, and congestion]

White Space Allocation – Cutline Adjustment
• Adjust cut location from top to bottom such that white spaces for children nodes are proportional to their congestion.
[Figure: chip regions A to H and slicing tree; nodes labeled cell area/congestion: root 240/88, left child 116/28, right child 124/60]
Assuming chip area of root = 300:
Total WS area = 300 - 240 = 60
WS area for left child = 60*28/(28+60) = 19.1
WS area for right child = 60 - 19.1 = 40.9
Chip area for left child = 116 + 19.1 = 135.1
Chip area for right child = 124 + 40.9 = 164.9
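The arithmetic in this example can be reproduced with a short sketch of a single cutline adjustment. The function name `split_white_space` and the tuple layout are assumptions for illustration. The even split used when both children report zero congestion is also an assumption, since the slide does not cover that case.

```python
def split_white_space(chip_area, left, right):
    """Divide a node's chip area between its two children.

    left and right are (cell_area, congestion) pairs. White space is shared
    in proportion to congestion, and each child receives its own cell area
    plus its share of the white space.
    """
    (area_l, cong_l), (area_r, cong_r) = left, right
    white = chip_area - (area_l + area_r)
    if white < 0:
        raise ValueError("cell area exceeds chip area")
    total_cong = cong_l + cong_r
    # Assumption: split evenly when neither child is congested.
    share_l = 0.5 if total_cong == 0 else cong_l / total_cong
    ws_l = white * share_l
    ws_r = white - ws_l
    return area_l + ws_l, area_r + ws_r


left_area, right_area = split_white_space(300, (116, 28), (124, 60))
print(round(left_area, 1), round(right_area, 1))  # 135.1 164.9, as on the slide
```

Applying the same function to each child with its newly assigned chip area gives the top-down pass described in the bullet.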

White Space Allocation – Cutline Adjustment
• Adjust cut location from top to bottom such that white spaces for children nodes are proportional to their congestion.
[Figure: chip regions A to H and slicing tree; nodes labeled cell area/congestion: root 240/88; children 116/28 and 124/60; next level 62/19 and 54/9 (summing to 116/28), 58/34 and 66/26 (summing to 124/60)]

White Space Allocation – Cutline Adjustment
• Adjust cut location from top to bottom such that white spaces for children nodes are proportional to their congestion.
[Figure: chip regions A to H and slicing tree; nodes labeled cell area/congestion: root 240/88; children 116/28 and 124/60]

Experiment Setup
• 16 IBM version 2 examples
  • 5% to 15% white space
• Three state-of-the-art routability-driven placers
  • Dragon-fd 3.01 [Yang et al, TCAD03]
    • Simulated annealing with bin swapping
    • Two-step white space allocation
  • Capo 10.0 [Roy et al, ISPD06]
    • Fast Steiner tree approximation
    • Congestion based cutline shifting
  • Fengshui 5.1 [Agnihotri et al, ISPD05]
    • Recursive bi-section
    • Similar white space allocation method incorporated
• Magma router for evaluation

Routability-Driven Placement Tools Comparison
• mPL-R+WSA is the only flow whose placements are all routed successfully
• mPL-R+WSA produces the shortest wirelength

Routability Optimization Techniques Comparison
• mPL
  • Latest pure WL-driven version
  • No consideration of routing congestion
• mPL-R
• mPL-I
  • Cell inflation + dummy density assignment
  • Highest quality in ISPD06 contest [Nam ISPD06]
  • Density target set as utilization
• mPL+WSA
• mPL-R+WSA

Routability Optimization Techniques Comparison
• mPL-I with heuristic penalty term does not perform very well
• Both mPL-R and WSA improve routability significantly
• Combined workflow gives the highest completion rate

Outline
• Chapter 1. Introduction
• Chapter 2. Optimality and scalability study of existing placement algorithms
• Chapter 3. Routability driven multilevel global placement and white space allocation
• Chapter 4. A robust legalization scheme for mixed-size placement
• Chapter 5. Applications of mixed-size placement legalization
  • Enhancement for macro legalization algorithm
  • Additional experiment results
• Chapter 6. “Global” localized preprocessing for detailed placement
• Chapter 7. Heterogeneous placement for FPGAs
• Chapter 8. Conclusions and future works

Enhancement for Macro Legalization
• Constraint graph reduction
  • Original constraint graph
    • One edge for each pair of macros
    • O(n^2) edges in total
  • Reduced constraint graph
    • Edge inserted only when the constraint is not already implied by transitivity
    • Significant reduction of memory consumption
[Figure: example with macros A, C, and B]
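One way to picture the reduction is as a transitive reduction of the constraint graph: an edge u -> v is dropped when v is still reachable from u through other edges. The sketch below is a generic version of that idea under the assumption that the constraint graph is acyclic. The function name and the edge-list representation are hypothetical, and the thesis may build the reduced graph directly rather than pruning a full one.

```python
def reduce_constraint_graph(edges):
    """Return the edges of a DAG that are not implied by a longer path.

    edges is an iterable of (u, v) pairs meaning "u must precede v"
    (for example, macro u lies to the left of macro v).
    """
    edges = set(edges)
    succ = {}
    for u, v in edges:
        succ.setdefault(u, set()).add(v)

    def implied(u, v):
        # Depth-first search from u that ignores the direct edge u -> v.
        stack = [w for w in succ[u] if w != v]
        seen = set(stack)
        while stack:
            w = stack.pop()
            for x in succ.get(w, ()):
                if x == v:
                    return True
                if x not in seen:
                    seen.add(x)
                    stack.append(x)
        return False

    return {(u, v) for u, v in edges if not implied(u, v)}


# Three macros ordered A, B, C from left to right: the pairwise graph has
# three edges, but A -> C follows from A -> B and B -> C.
full = [("A", "B"), ("B", "C"), ("A", "C")]
print(sorted(reduce_constraint_graph(full)))  # [('A', 'B'), ('B', 'C')]
```

For n macros in a single left-to-right chain, the pairwise graph has n(n-1)/2 edges while the reduced graph keeps n-1, which illustrates where the memory saving comes from.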

Experiment Result with ICCAD04-MS
• 84% reduction of constraint edges
• No degradation of solution quality

Enhancement for Macro Legalization
[Figure labels: fij, x, Hij]
• Used in ISPD 2006 placement contest

ISPD05 Examples
• Bigger problem size
• Suitable to test scalability

Scalability Comparison on ISPD05 -- Global Placements by APlace
• XDP produces 1% longer WL, but is 10X faster

Scalability Comparison on ISPD05 -- Global Placements by mPL
• XDP can be 10X faster with comparable quality

Impact of Gradual Macro Legalization – ISPD05
• 12% WL reduction possible when macros are movable

Outline
• Chapter 1. Introduction
• Chapter 2. Optimality and scalability study of existing placement algorithms
• Chapter 3. Routability driven multilevel global placement and white space allocation
• Chapter 4. A robust legalization scheme for mixed-size placement
• Chapter 5. Applications of mixed-size placement legalization
• Chapter 6. “Global” localized preprocessing for detailed placement
• Chapter 7. Heterogeneous placement for FPGAs
  • Motivation and previous works
  • Multilevel heterogeneous placement – mPL-H
  • Experiment results
  • Conclusions and future work
• Chapter 8. Conclusions and future works

Motivation
• Popularity of FPGAs
  • Ease of use
  • Low cost for small to medium production
• Modern FPGA placement imposes heterogeneous constraints
  • Memory blocks of different capacities, DSP blocks
  • Each block should only be placed on sites of the same type

Example FPGA Chip
[Figure taken from Altera Stratix Handbook]

Previous Works -- Academia
• Simulated annealing
  • VPR [Betz & Rose, FPL97, Marquardt et al, FPGA00]
  • PATH [Kong, ICCAD02]
  • SPCD [Chen & Cong, FPL04, FPGA05]
• Partitioning
  • PPFF [Maidee et al, DAC03]
• Graph embedding
  • CAPRI [Gopalakrishnan et al, DAC06]
• Multilevel
  • Ultrafast-VPR [Sankar & Rose, FPGA99]
  • mPG-ms [Cong & Yuan, ASPDAC03]
• None of them handles heterogeneous constraints

Previous Works -- Industry
• Quartus II by Altera Corporation
  • Stratix, Stratix II, etc.
• ISE by Xilinx Corporation
  • Virtex II, Virtex II Pro, etc.
• Do have heterogeneous capability
• Only for proprietary chip architecture
• Algorithms and techniques not publicly documented

Multilevel Heterogeneous Placement – mPL-H
• Based on multilevel generalized force directed placement
• Multi-layered placement to handle heterogeneous placement
• Filler cells to enhance quality and stability
• Gradual carry chain legalization

Limitations of mPL for Heterogeneous Placement
• Does not consider heterogeneous constraints
  • Any block can be placed anywhere
• Requires density to be uniform everywhere
  • Penalizes wirelength at low utilization

mPL-H -- Global Placement (I)
• Multiple layers, one layer per resource type
  • DSP layer
  • M-RAM layer
  • LAB layer
  • M4K layer
  • M512 layer
• Forbidden regions blocked by obstacles
• Uniform wirelength computation
[Figure: DSP, M-RAM, and LAB layers]

mPL-H -- Global Placement (II)
• Filler cell
  • Occupy the residual capacity
  • Transform inequality into equality
• Density computed independently on each layer
  • Granularity may not be fine enough

mPL-H -- Legalization (I)
• DSP and memory blocks
  • Domains do not overlap
  • Legalized independently
  • Uniform size for the same type
• Linear assignment, O(n^3)
  • Cost as distance
[Figure: cells and sites]
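As an illustration of legalizing same-type blocks by linear assignment with distance as cost, the sketch below uses a textbook Hungarian-style solver in pure Python. Whether mPL-H uses this particular solver is not stated on the slide. The function name `linear_assignment`, the Manhattan-distance cost, and the sample coordinates are assumptions for illustration.

```python
def linear_assignment(cost):
    """Minimum-cost assignment of n cells to m sites (n <= m).

    cost[i][j] is the cost of putting cell i on site j. Returns
    (total_cost, site_of_cell). Shortest-augmenting-path Hungarian method.
    """
    n, m = len(cost), len(cost[0])
    inf = float("inf")
    u = [0.0] * (n + 1)          # row potentials (1-based)
    v = [0.0] * (m + 1)          # column potentials (1-based)
    match = [0] * (m + 1)        # match[j] = row assigned to column j, 0 if free
    way = [0] * (m + 1)          # predecessor column on the augmenting path
    for i in range(1, n + 1):
        match[0] = i
        j0 = 0
        minv = [inf] * (m + 1)
        used = [False] * (m + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = match[j0], inf, 0
            for j in range(1, m + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(m + 1):
                if used[j]:
                    u[match[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if match[j0] == 0:
                break
        while j0:                # flip the matching along the augmenting path
            j1 = way[j0]
            match[j0] = match[j1]
            j0 = j1
    site_of_cell = [0] * n
    for j in range(1, m + 1):
        if match[j]:
            site_of_cell[match[j] - 1] = j - 1
    return sum(cost[i][site_of_cell[i]] for i in range(n)), site_of_cell


# Hypothetical global-placement locations of two DSP blocks and three DSP sites.
cells = [(0, 0), (5, 5)]
sites = [(5, 4), (1, 0), (9, 9)]
cost = [[abs(cx - sx) + abs(cy - sy) for sx, sy in sites] for cx, cy in cells]
print(linear_assignment(cost))   # (2, [1, 0])
```

Because blocks of one type have uniform size and their site domains do not overlap with other types, each resource type can be solved as a separate assignment problem of this form.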

mPL-H -- Legalization (II)
• Carry chains
  • Vary in length
  • Legalized in descending order of length
  • Partition each column into same size
  • Assign chains of same length using linear assignment

mPL-H -- Legalization (III)
• Column-wise rearrangement of carry chains
  • P(n,m) is the minimum perturbation of assigning (v1, ..., vn) to sites (s1, s2, ..., sm)
  • P(1,j) = d(1,j), where d(1,j) is the perturbation of assigning v1 to site sj
  • P(i,j) = min{P(i-1, j-h_i), P(i, j-1)}
• Can be solved more efficiently for some special cases
  • Quadratic distance
  • No site constraint
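The recurrence can be sketched as a dynamic program over chains and sites within one column. The version below rests on assumptions that the slide text does not spell out: chains keep their given order, chain i occupies h_i consecutive sites, and the perturbation d of placing a chain is added when that chain is placed (the recurrence as transcribed shows no explicit cost term). The function name `min_perturbation` and the `cost` callback are hypothetical.

```python
def min_perturbation(heights, num_sites, cost):
    """Minimum total perturbation of placing ordered chains into one column.

    heights[i] is the number of consecutive sites chain i occupies.
    cost(i, s) is the assumed perturbation of placing chain i with its first
    site at index s. Returns float("inf") if the chains do not fit.
    """
    n, m = len(heights), num_sites
    inf = float("inf")
    # best[i][j]: minimum cost of placing the first i chains in the first j sites.
    best = [[0.0] * (m + 1)] + [[inf] * (m + 1) for _ in range(n)]
    for i in range(1, n + 1):
        h = heights[i - 1]
        for j in range(1, m + 1):
            value = best[i][j - 1]                 # chain i ends before site j
            if j >= h and best[i - 1][j - h] < inf:
                # chain i occupies sites j-h .. j-1
                value = min(value, best[i - 1][j - h] + cost(i - 1, j - h))
            best[i][j] = value
    return best[n][m]


# Two chains of heights 2 and 1 in a 4-site column; both want to start near
# site 1 or 2, so one of them has to shift by one site.
targets = [1, 2]
print(min_perturbation([2, 1], 4, lambda i, s: abs(s - targets[i])))  # 1.0
```

With a quadratic distance for `cost`, or without site constraints, the slide notes that more efficient special-case solutions exist. This sketch does not attempt them.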

Experiment Setting
[Flow diagram labels: Verilog netlist; Quartus_map; clustered .vqm netlist; architecture description XML; chip type; Quartus_fitter and mPL-H, each producing a .qsf placement; Quartus_router]

QUIP Suite

Wirelength Comparison
• mPL-H is 3% better in HPWL and 2% better in routed WL than Quartus II v5.0

Runtime Comparison
• mPL-H can be 2X faster than Quartus II v5.0 when the circuit becomes sufficiently large

Optimality Study of mPL-H
• PEKO-H construction
  • Populate all sites with corresponding resource type
  • Generate each net with optimal wirelength
  • Extract the netlist in the end

Experiment Results with PEKO-H
• mPL-H produces HPWL 34% longer than the optimum

Displacement of PEKO-H13
