
Packing and Placement

Dr. Philip Brisk, Department of Computer Science and Engineering, University of California, Riverside. CS 223.


Presentation Transcript


  1. Packing and Placement. Dr. Philip Brisk, Department of Computer Science and Engineering, University of California, Riverside. CS 223.

  2. Packing Example (Homogeneous)

  3. Packing Example (Heterogeneous) [Figure panels: Netlist; Packing Solution; Architecture]

  4. Architecture Description and Packing for Logic Blocks with Hierarchy, Modes, and Complex Interconnect. Jason Luu, Jason Anderson, and Jonathan Rose. International Symposium on FPGAs, 2011.

  5. AA-Pack 6.0 Algorithm • Seed: pick the unpacked mapped LUT with the largest number of attached nets • p: a netlist block; B: a partially filled logic cluster • nets(p, B): number of nets shared between p and B • ext(p, B): number of pins on p’s nets residing on netlist blocks NOT packed into B • packed(p): number of pins on p’s nets residing on netlist blocks packed into logic clusters OTHER than B • num_pins(p): number of used pins on p (normalizes affinities across netlist blocks with varying numbers of used pins)
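The quantities defined on this slide can be computed directly from a netlist. The following is a minimal sketch on a toy netlist, not the AA-Pack implementation: the dictionary layout, the function names, and the one-pin-per-(net, block) convention are all assumptions made for illustration, and the slide's formula that combines the terms into an affinity is not reproduced because it did not survive in the transcript.

```python
from typing import Dict, Set

# Toy netlist (hypothetical layout): net name -> set of blocks with a pin on it.
# Assumption: each block contributes exactly one pin to each net it touches.
Netlist = Dict[str, Set[str]]


def nets_of(netlist: Netlist, p: str) -> Set[str]:
    """Names of the nets attached to block p."""
    return {n for n, blocks in netlist.items() if p in blocks}


def pick_seed(netlist: Netlist, unpacked: Set[str]) -> str:
    """Seed choice: the unpacked block with the most attached nets."""
    return max(sorted(unpacked), key=lambda b: len(nets_of(netlist, b)))


def affinity_terms(netlist: Netlist, p: str, cluster_b: Set[str],
                   other_packed: Set[str]) -> Dict[str, int]:
    """Compute nets(p, B), ext(p, B), packed(p), and num_pins(p).

    cluster_b: blocks already in the partially filled cluster B.
    other_packed: blocks packed into clusters other than B.
    Pins belonging to p itself are excluded from ext (an assumption).
    """
    p_nets = nets_of(netlist, p)
    shared = sum(1 for n in p_nets if netlist[n] & cluster_b)
    ext = sum(len(netlist[n] - cluster_b - {p}) for n in p_nets)
    packed = sum(len(netlist[n] & other_packed) for n in p_nets)
    return {"nets": shared, "ext": ext, "packed": packed,
            "num_pins": len(p_nets)}
```

For example, with nets n1 = {a, b, c} and n2 = {a, d}, cluster B = {b}, and d packed elsewhere, block a shares one net with B, sees two pins outside B, and has one pin in another cluster.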

  6. Legality Challenges • Handle complex logic clusters with hierarchy • Fracturable LUTs • Carry chains • Hard logic circuits • Routability • Sparse crossbar intra-cluster routing

  7. Hierarchical Cluster Example • Strategy: Pack each netlist block into the smallest primitive that can accommodate it • Algorithm: Search the tree bottom-up, from right to left

  8. Ensuring Routability • Basic Check: Does packing the netlist block into the cluster exceed I/O pin availability? • Routability: Build routing graph and run a routing algorithm to determine legality • Routing algorithm details will be discussed next week

  9. Limitations • Focus is area optimization, not timing • Architectural limitations • (Fracturable) LUT-based logic blocks • Fracturable arithmetic blocks (e.g., multipliers) • Memories with reconfigurable aspect ratios • (not discussed) • Mapping assumptions • Different block types cannot accommodate the same netlist block • In reality, could pack a flip-flop into either a LUT- or multiplier-based block

  10. Toward Interconnect-Adaptive Packing for FPGAs. Jason Luu, Jason Anderson, and Jonathan Rose. International Symposium on FPGAs, 2014.

  11. AA-Pack 7.0 • Calling the router repeatedly during packing is computationally expensive • Speculative Packing: avoid unnecessary calls to the router • Interconnect-Aware Pin Counting: Quickly find unroutable instances based on pin demand • Pre-packing: Support inflexible routing structures • E.g., carry chains • Other bells and whistles • Accurate timing model • Best-fit placement • Better support for high-fanout nets

  12. Speculative Packing • FPGA 2011 Implementation • Call the router to check legality each time a new block is packed into the cluster • FPGA 2014 Implementation • Fill the logic block to capacity, then call the router • If a legal route is found, we’re done • Otherwise, re-pack the block using the FPGA 2011 approach • Works because the common case is that a legal route is found
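The two strategies on this slide can be contrasted in a few lines. This is a sketch under stated assumptions: `is_routable` stands in for the expensive router call, a cluster is modeled as a plain list with a fixed capacity, and the function names are hypothetical.

```python
from typing import Callable, List, Sequence

Router = Callable[[List[str]], bool]  # hypothetical stand-in for the router


def pack_incremental(candidates: Sequence[str], capacity: int,
                     is_routable: Router) -> List[str]:
    """FPGA 2011 style: check legality every time a block is added."""
    cluster: List[str] = []
    for block in candidates:
        if len(cluster) == capacity:
            break
        if is_routable(cluster + [block]):
            cluster.append(block)
    return cluster


def pack_speculative(candidates: Sequence[str], capacity: int,
                     is_routable: Router) -> List[str]:
    """FPGA 2014 style: fill to capacity, then call the router once.

    Falls back to the incremental approach only if the full cluster
    turns out to be unroutable.
    """
    filled = list(candidates[:capacity])
    if is_routable(filled):
        return filled
    return pack_incremental(candidates, capacity, is_routable)
```

In the common case the speculative version makes one router call per cluster instead of one per packed block; the fallback costs one extra call when speculation fails.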

  13. Interconnect-Aware Pin Counting • Partition I/O pins into classes based on interconnect structure • When each netlist block is packed, check the demand for each pin class • Reject the block if demand exceeds supply for any pin class
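The check described above amounts to a per-class comparison of demand against supply. The sketch below assumes pin classes are identified by string labels and that demand is tracked as simple counts; both are illustrative choices, not the AA-Pack data model.

```python
from collections import Counter
from typing import Mapping


def pin_demand_ok(supply: Mapping[str, int],
                  demand_so_far: Mapping[str, int],
                  block_demand: Mapping[str, int]) -> bool:
    """Return False if adding the block exceeds supply in any pin class.

    supply: available pins per class for the cluster.
    demand_so_far: pins already demanded by blocks in the cluster.
    block_demand: additional pins the candidate block would demand.
    A class missing from supply is treated as having zero pins.
    """
    total = Counter(demand_so_far) + Counter(block_demand)
    return all(total[c] <= supply.get(c, 0) for c in total)
```

A `False` result means the candidate can be rejected without invoking the router; a `True` result only means the candidate has not been ruled out.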

  14. Example

  15. Properties and Limitations • An optimistic filter • Cases that fail are not routable • Cases that pass may or may not be routable • Sparse interconnect is approximated as fully connected • Does not account for situations where a net routes through a sub-cluster without connecting to any primitives in that subcluster • Internal feedback/feedforward connections within a logic cluster are discovered before packing and accounted for during pin counting • Gives a pass/fail answer • Does not help to guide future candidate selection

  16. Pre-packing • Inflexible routing structures • Incorrect grouping or placement of netlist blocks may fail routing • The architect enumerates “pack patterns” to describe each structure • Before packing, identify netlist sub-graphs that match “pack patterns” • Group them together and match them to logic cluster primitives that match the “pack pattern” • Pack Patterns • Multiply-add • Registered multiply • Registered add • Registered multiply-add

  17. Experiments

  18. Results

  19. Timing-Driven Placement for FPGAs. Alexander (Sandy) Marquardt, Vaughn Betz, and Jonathan Rose. International Symposium on FPGAs, 2000.

  20. Placement

  21. Simulated Annealing

  22. VPlace (Pre-dates this paper) • Strategy: Minimize interconnect overhead

  23. Timing Analysis • For a placed and routed net • How much delay can we add to a net before it becomes critical?
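The question on this slide is the definition of slack. A minimal sketch of a slack computation on a small timing graph follows; it assumes a DAG given as a dictionary of edge delays, zero arrival time at every source, and a required time at every sink equal to the critical-path delay. The function name and data layout are hypothetical.

```python
from graphlib import TopologicalSorter  # Python 3.9+
from typing import Dict, Tuple

Edge = Tuple[str, str]


def connection_slacks(edges: Dict[Edge, float]):
    """Return (critical path delay, slack per connection).

    slack(u, v) = required(v) - arrival(u) - delay(u, v): the delay that
    can be added to the connection before it becomes critical.
    """
    preds, succs, nodes = {}, {}, set()
    for (u, v), d in edges.items():
        preds.setdefault(v, []).append((u, d))
        succs.setdefault(u, []).append((v, d))
        nodes.update((u, v))
    graph = {n: [u for u, _ in preds.get(n, [])] for n in nodes}
    order = list(TopologicalSorter(graph).static_order())

    # Forward pass: latest arrival time at each node.
    arrival = {}
    for n in order:
        arrival[n] = max((arrival[u] + d for u, d in preds.get(n, [])),
                         default=0.0)
    d_max = max(arrival.values())

    # Backward pass: earliest required time at each node.
    required = {}
    for n in reversed(order):
        required[n] = min((required[v] - d for v, d in succs.get(n, [])),
                          default=d_max)

    slack = {(u, v): required[v] - arrival[u] - d
             for (u, v), d in edges.items()}
    return d_max, slack
```

Connections on the critical path come out with zero slack; every other connection reports how much delay it can absorb.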

  24. T-VPlace (This Paper) • Optimize Timing + Wiring Complexity • Delay approximation • FPGAs are uniform • Store delays indexed by (Δx, Δy) in a ROM • Model a two-terminal net with source at (x_source, y_source) and target at (x_source + Δx, y_source + Δy) • Reduce the allowable move distance over time (α is the fraction of attempted moves that were accepted at the previous temperature)
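Both ideas on this slide are small enough to sketch. The slide's own equation for the move-distance update did not survive in the transcript; the rule below, which scales the limit by (1 − 0.44 + α) and clamps it, is the one commonly reported for VPR-style annealers and should be read as an assumption here. The table layout and function names are also hypothetical, and indexing by absolute offsets assumes delays are symmetric in direction.

```python
from typing import List, Tuple


def lookup_delay(delay_table: List[List[float]],
                 src: Tuple[int, int], dst: Tuple[int, int]) -> float:
    """Estimated delay of a two-terminal connection from a (dx, dy) table.

    Because the FPGA is uniform, only the offset between source and
    target matters, not their absolute positions.
    """
    dx = abs(dst[0] - src[0])
    dy = abs(dst[1] - src[1])
    return delay_table[dx][dy]


def update_range_limit(r_old: float, alpha: float, max_dim: int,
                       target: float = 0.44) -> float:
    """Shrink or grow the allowable move distance (assumed VPR-style rule).

    alpha is the fraction of attempted moves accepted at the previous
    temperature; the result is clamped to [1, max_dim].
    """
    r_new = r_old * (1.0 - target + alpha)
    return min(max(r_new, 1.0), float(max_dim))
```

When α falls below the target the window shrinks, so late in the anneal only short, local moves are proposed.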

  25. Timing Cost and Objective • Sum the timing costs of all source-sink pairs • Heavily weight critical nets • Maximum delay of all nets in the circuit
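The equations for this slide are missing from the transcript. The sketch below uses a formulation commonly attributed to T-VPlace, and it is an assumption rather than a transcription: criticality is 1 − slack / D_max, each connection's timing cost is its delay times criticality raised to an exponent, and the change in total cost blends normalized timing and wiring changes with a weight λ. The exponent and λ defaults are illustrative.

```python
from typing import Iterable, Tuple


def criticality(slack: float, d_max: float) -> float:
    """1.0 for a connection on the critical path, smaller otherwise."""
    return 1.0 - slack / d_max


def timing_cost(connections: Iterable[Tuple[float, float]], d_max: float,
                crit_exp: float = 8.0) -> float:
    """Sum of per-connection timing costs over all source-sink pairs.

    connections: (delay, slack) for each source-sink pair.
    A large exponent weights critical connections heavily.
    """
    return sum(delay * criticality(slack, d_max) ** crit_exp
               for delay, slack in connections)


def delta_cost(d_timing: float, d_wiring: float, prev_timing: float,
               prev_wiring: float, lam: float = 0.5) -> float:
    """Blend of normalized timing and wiring cost changes for one move."""
    return (lam * d_timing / prev_timing
            + (1.0 - lam) * d_wiring / prev_wiring)
```

Normalizing each term by its previous value keeps the two costs comparable even though they have different units.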

  26. Annealing Schedule • Number of moves to perform at each temperature • Vary the temperature as the algorithm progresses • Termination criteria • Default value is 10 • α is the fraction of attempted moves that were accepted at the old temperature T_old
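The three parts of the schedule can be sketched as three small functions. The slide's equations are not in the transcript, so the constants below follow the adaptive schedule usually described for VPR and are assumptions: moves per temperature proportional to N^(4/3), a cooling factor chosen from the acceptance rate α, and termination when the temperature is small relative to the cost per net. Reading the slide's "default value is 10" as the move-count multiplier is also an assumption.

```python
def moves_per_temperature(num_blocks: int, inner_num: float = 10.0) -> int:
    """Moves attempted at each temperature (assumed inner_num * N^(4/3))."""
    return round(inner_num * num_blocks ** (4.0 / 3.0))


def next_temperature(t_old: float, alpha: float) -> float:
    """Cool quickly when almost everything or almost nothing is accepted,
    and slowly in the productive middle range (assumed thresholds)."""
    if alpha > 0.96:
        gamma = 0.5
    elif alpha > 0.8:
        gamma = 0.9
    elif alpha > 0.15:
        gamma = 0.95
    else:
        gamma = 0.8
    return gamma * t_old


def should_stop(temperature: float, cost: float, num_nets: int,
                eps: float = 0.005) -> bool:
    """Stop once the temperature is a small fraction of the cost per net."""
    return temperature < eps * cost / num_nets
```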

  27. VPlace vs. T-VPlace

  28. Improving Simulated Annealing-Based FPGA Placement with Directed Moves. Kristofer Vorwerk, Andrew Kennings, and Jonathan W. Greene. IEEE Transactions on CAD 28(2): 179-192 (2009).

  29. Motivation: an annealer may spend significant time revisiting previously explored states before it finds the lowest cost state • Coax the annealer into exploring neighbor states that are more likely to yield an improvement

  30. Simple “Moves” (T-VPlace) • Randomly select a cell • Move a cell to an unoccupied target location • Swap the location of two cells • Location selection • Random shrinking window (α is the fraction of attempted moves that were accepted at the previous temperature)

  31. Heuristics to Determine Source Cells • Random • VPR • Graph coloring • Color the netlist before placement • Choose up to 15 non-adjacent (same color) cells at a time • Priority list • Randomly choose among the 25% worst placed cells • Position (details to follow) • Timing cost of paths

  32. Heuristics to Determine Target Locations • Random • VPR • Linear assignment • Details omitted • Median placement and variants • Details on the next slide • Priority list

  33. Median Placement • Compute bounding boxes for all nets omitting source pins • Take x and y minimums and maximums • Put points into vectors and sort • Define a rectangle by the median and median+1 entries in each vector • Randomly select a new target location within the rectangle
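The steps on this slide translate almost line for line into code. The sketch below assumes each net is a list of (cell, (x, y)) pins on an integer grid and that the "median and median+1 entries" are the two middle elements of each sorted vector (the vectors always have even length, two entries per net). Names and data layout are hypothetical.

```python
import random
from typing import List, Tuple

Pin = Tuple[str, Tuple[int, int]]


def median_region(nets: List[List[Pin]], source: str):
    """Return ((x_lo, x_hi), (y_lo, y_hi)) for the source cell's nets."""
    xs: List[int] = []
    ys: List[int] = []
    for pins in nets:
        # Bounding box of the net with the source cell's pin omitted.
        others = [pos for cell, pos in pins if cell != source]
        if not others:
            continue
        xs += [min(x for x, _ in others), max(x for x, _ in others)]
        ys += [min(y for _, y in others), max(y for _, y in others)]
    if not xs:
        raise ValueError("source cell has no connected pins")
    xs.sort()
    ys.sort()
    m = len(xs) // 2
    return (xs[m - 1], xs[m]), (ys[m - 1], ys[m])


def pick_target(nets: List[List[Pin]], source: str,
                rng: random.Random) -> Tuple[int, int]:
    """Randomly select a target location inside the median rectangle."""
    (x_lo, x_hi), (y_lo, y_hi) = median_region(nets, source)
    return rng.randint(x_lo, x_hi), rng.randint(y_lo, y_hi)
```

Any point in this rectangle minimizes the sum of distances from the source cell to its nets' bounding boxes, which is why the target is drawn from it rather than from the whole chip.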

  34. Cell Rippling • Rippling directions are chosen randomly • [Figure label: nearest empty location to B]

  35. Quality Factor of a Move • p_i is the probability that the move is accepted • Use previous annealing iteration to determine the probabilities empirically • P_prev(i) is P(i) from the previous iteration

  36. Results [Figure panels: 4 BLEs per cluster; 8 BLEs per cluster]

  37. Improving FPGA Placement with Dynamically Adaptive Stochastic Tunneling. Mingjie Lin and John Wawrzynek. IEEE Transactions on CAD 29(12): 1858-1869 (2010).

  38. Simulated Annealing (Conceptual) [Figure label: Stochastic Tunneling]

  39. Simulated Annealing Weaknesses • Sensitivity to parameters • Quite a few • Interactions between them not understood • Freezing problem • Unable to escape local minima • Prevalent at low temperatures where bad moves are accepted with a very low probability

  40. Acceptance Criteria for Bad Moves • Simulated Annealing • Stochastic Tunneling • Equation annotations: “energy” of the current solution being evaluated; “energy” of the best solution found so far (continually adjusted as better solutions are found); tunneling parameter
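The slide's equations are not in the transcript, so the sketch below uses the classic stochastic tunneling transformation, f_STUN(x) = 1 − exp(−γ (f(x) − f_0)), as an assumption about what was shown; the paper's dynamically adaptive variant may differ. Here f(x) is the energy of the current solution, f_0 the energy of the best solution found so far, and γ the tunneling parameter. Function names are hypothetical.

```python
import math
import random


def stun(energy: float, best_energy: float, gamma: float) -> float:
    """Transformed energy: 0 at the best-known solution, at most 1 above it."""
    return 1.0 - math.exp(-gamma * (energy - best_energy))


def accept_sa(delta: float, temperature: float, rng: random.Random) -> bool:
    """Simulated annealing: accept a bad move with probability exp(-delta/T)."""
    return delta <= 0 or rng.random() < math.exp(-delta / temperature)


def accept_stun(e_cur: float, e_new: float, e_best: float, gamma: float,
                beta: float, rng: random.Random) -> bool:
    """Stochastic tunneling: apply the same test to transformed energies.

    e_best must be updated by the caller whenever a better solution is found.
    """
    delta = stun(e_new, e_best, gamma) - stun(e_cur, e_best, gamma)
    return delta <= 0 or rng.random() < math.exp(-beta * delta)
```

The transformation flattens everything far above the best-known energy toward 1, so barriers between distant minima look small and the search is less prone to freezing at low temperature.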

  41. Stochastic Tunneling (Conceptual)

  42. Stochastic Tunneling Pseudocode

  43. Results [Table; only the averages row survives, without column headers: 89.44, 92.06, 422.5, 363.7, 87.72, 488.5, 10.17, 8.86, 9.54]
