
APLACE: A General and Extensible Large-Scale Placer








  1. APLACE: A General and Extensible Large-Scale Placer
  Andrew B. Kahng*, Sherief Reda, Qinke Wang
  VLSI CAD Lab, UCSD CSE and ECE Departments
  http://vlsicad.ucsd.edu
  *Currently on leave of absence at Blaze DFM, Inc.

  2. Goals and Plan
  Goals:
  • Build a new placer to win the competition
  • Scalable, robust, high-quality implementation
  • Leave no stone unturned / no QOR on the table
  Plan and schedule:
  • Work within the most promising framework: APlace
  • 30 days for coding + 30 days for tuning

  3. Philosophy
  Respect the competition:
  • Well-funded groups with decades of experience
  • ABKGroup's Capo, MLPart, APlace = all unfunded side projects
  • No placement-related industry interactions
  • QOR target: 24-26% better than Capo v9r6 on all known benchmarks
  • Nearly pulled out 10 days before the competition
  Work smart:
  • Solve scalability and speed basics first
  • Slimmed-down data structures, -msse compiler options, etc.
  • Ordered list of ~15 QOR ideas to implement
  • Daily regressions on all known benchmarks
  • Synthetic testcases to predict bb3, bb4, etc.

  4. Implementation Framework
  APlace weaknesses:
  • Weak clustering
  • Poor legalization / detailed placement
  New APlace:
  • New clustering
  • Adaptive parameter setting for scalability
  • New legalization + iterative detailed placement
  New APlace flow:
  • Global phase: clustering → adaptive APlace engine → unclustering
  • Detailed phase: legalization → whitespace arrangement → cell order polishing → global moving

  5. Clustering/Unclustering
  • A multi-level paradigm with clustering ratio ≈ 10
  • Top-level clusters ≈ 2000
  • Similar in spirit to [HuM04] and [AlpertKNRV05]
  Algorithm sketch — for each clustering level:
  • Calculate the clustering score of each node to its neighbors, based on the number of connections
  • Sort all scores and process nodes in order, as long as cluster size upper bounds are not violated
  • If a node's score needs updating, update the score and re-insert it in order
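The score-ordered merging in the sketch above can be illustrated with a short Python sketch. This is a hypothetical simplification, not APlace code: it uses static scores processed once in sorted order (the slide's lazy re-scoring of stale nodes is omitted), union-find to track merged clusters, and `max_size` standing in for the cluster size upper bound.

```python
def cluster_level(sizes, edges, max_size):
    """One bottom-up clustering level: merge node pairs in order of a
    connection-count score, skipping merges that would exceed max_size.

    sizes: {node: size}; edges: {(u, v): num_connections}.
    Returns {node: cluster_representative}.
    """
    # Score each candidate pair by its connection count; process best first.
    candidates = sorted(edges.items(), key=lambda kv: -kv[1])
    parent = {n: n for n in sizes}

    def find(n):  # union-find with path compression
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for (u, v), score in candidates:
        ru, rv = find(u), find(v)
        if ru == rv:
            continue
        if sizes[ru] + sizes[rv] > max_size:  # cluster size upper bound
            continue
        parent[rv] = ru                       # merge v's cluster into u's
        sizes[ru] += sizes[rv]
    return {n: find(n) for n in sizes}
```

With a clustering ratio around 10, repeated calls to such a pass shrink the netlist level by level until roughly 2000 top-level clusters remain.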

  6. Adaptive Tuning / Legalization
  Adaptive parameterization:
  • Automatically decide the initial weight for the wirelength objective according to the gradients
  • Decrease the wirelength weight as the placement progresses
  Legalization:
  (1) Sort all cells from left to right; move each cell (or group of cells) in order to the closest legal position(s)
  (2) Sort all cells from right to left; move each cell (or group of cells) in order to the closest legal position(s)
  • Pick the better of (1) and (2)
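The two-sweep, pick-best legalization can be sketched for a single row as follows. This is an illustrative reduction of the scheme, not the actual legalizer: it handles one row of `(desired_x, width)` cells, assumes the total width fits in the row, and picks the sweep with smaller total displacement (one plausible stand-in for "closest legal positions").

```python
def legalize_row(cells, row_left, row_right):
    """Pick-best of a left-to-right and a right-to-left legalization sweep
    for one row. cells: list of (desired_x, width); returns legal x positions.
    """
    n = len(cells)

    def sweep(order, leftward):
        xs = [0] * n
        frontier = row_right if leftward else row_left
        for i in order:
            x, w = cells[i]
            if leftward:   # pack toward the right edge, moving left
                xs[i] = max(min(x, frontier - w), row_left)
                frontier = xs[i]
            else:          # pack toward the left edge, moving right
                xs[i] = min(max(x, frontier), row_right - w)
                frontier = xs[i] + w
        return xs

    lr = sweep(sorted(range(n), key=lambda i: cells[i][0]), leftward=False)
    rl = sweep(sorted(range(n), key=lambda i: -cells[i][0]), leftward=True)

    def displacement(xs):
        return sum(abs(xs[i] - cells[i][0]) for i in range(n))

    return lr if displacement(lr) <= displacement(rl) else rl
```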

  7. Detailed Placement
  Whitespace compaction:
  • For each layout row: optimally arrange whitespace to minimize wirelength while maintaining relative cell order [KahngTZ99], [KahngRM04]
  Cell order polishing:
  • For a window of neighboring cells: optimally arrange cell order and whitespace to minimize wirelength
  Global moving:
  • Optimally move a cell to a better available position to minimize wirelength
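All three passes above optimize half-perimeter wirelength (HPWL), the metric quoted with the results later in the talk. A minimal sketch of the metric and of a greedy "global moving" step — the names `hpwl` and `best_position` and the single-pin cell model are illustrative assumptions, not APlace's implementation:

```python
def hpwl(nets):
    """Half-perimeter wirelength over a list of nets, each net being a
    list of (x, y) pin locations: sum of bounding-box half-perimeters."""
    total = 0.0
    for pins in nets:
        xs = [x for x, _ in pins]
        ys = [y for _, y in pins]
        total += (max(xs) - min(xs)) + (max(ys) - min(ys))
    return total

def best_position(incident_nets, candidates):
    """Greedy 'global moving' step: place one movable cell (modeled as a
    single pin) at whichever free candidate position minimizes the HPWL
    of its incident nets (given the other pins' fixed locations)."""
    def cost(pos):
        return hpwl([pins + [pos] for pins in incident_nets])
    return min(candidates, key=cost)
```

Whitespace compaction and cell order polishing would evaluate the same `hpwl` objective while rearranging whitespace or permuting a small window of cells.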

  8. Parameterization and Parallelizing
  Tuning knobs:
  • Clustering ratio, # top-level clusters, cluster area constraints
  • Initial wirelength weight, wirelength weight reduction ratio
  • Max # CG iterations for each wirelength weight
  • Target placement discrepancy
  • Detailed placement parameters, etc.
  Resources:
  • SDSC ROCKS cluster: 8 Xeon CPUs at 2.8 GHz
  • Michigan, Prof. Sylvester's group: 8 assorted CPUs
  • UCSD FWGrid: 60 Opteron CPUs at 1.6 GHz
  • UCSD VLSICAD group: 8 Xeon CPUs at 2.4 GHz
  Wirelength improvement after tuning: 2-3%

  9. Artificial Benchmark Synthesis
  • Synthetic benchmarks to test code scalability and performance
  • Rapid response to the broadcast of s00-nam.pdf
  • Created "synthetic" versions of bigblue3 and bigblue4 within 48 hours
  • Mimicked the fixed-block layout diagrams in the artificial benchmark creation
  • This process was useful: we identified (and solved) a problem with clustering in the presence of many small fixed blocks

  10. Results

  11. Bigblue4 Placement HPWL = 833.21

  12. Conclusions
  • ISPD05 = an exercise in process and philosophy
  • At the end, we were still 4% short of where we wanted to be
  • Not happy with how we handled the 5-day time frame
  • Auto-tuning → first results ≈ best results
  • During the competition, we wrote but then left out "annealing" DP improvements that gained another 0.5%
  • The students and IBM ARL did a really, really great job
  • Currently restoring capabilities (congestion, timing-driven, etc.) and cleaning up (antecedents in the Naylor patent)
