
Before Placement: Clustering



Presentation Transcript


  1. ECE 506 Reconfigurable Computing • http://www.ece.arizona.edu/~ece506 • Lecture 6: Clustering • Ali Akoglu

  2. Before Placement: Clustering • Intra-cluster connections: fast • Inter-cluster connections: slow • Need to pack BLEs • Goals: • Reduce stress on routing • Take advantage of local fast interconnect • Reduce inter-cluster wiring • Minimize the critical path (timing-driven) • How do we do this? • Take advantage of the cluster architecture • Consider its tradeoffs

  3. Basic Clustering (Betz) • How many distinct inputs should be provided to a cluster of N 4-LUTs? • How many 4-LUTs should be included in a cluster to create the most area-efficient logic block?

  4. VPACK

  5. Basic Clustering (Betz) • Flow (a sketch follows below): • Iterate until all BLEs are consumed • Start a new cluster: seed it with the currently unclustered BLE that has the most used inputs • Repeatedly add the BLE that shares the most inputs with the current cluster, to minimize the number of inputs that must be routed to the cluster • Keep adding until the cluster is full or its input pins are used up • Hill climbing: if some cluster BLE slots remain unused, add another BLE even if the cluster input count temporarily overflows • If the input count is not eventually reduced, revert to the best choice found before hill climbing
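A minimal Python sketch of this greedy flow, assuming a hypothetical BLE record whose `inputs` field holds the set of nets driving it; `cluster_size` and `max_inputs` correspond to N and the cluster input count, and the hill-climbing step is omitted:

```python
from dataclasses import dataclass

@dataclass(frozen=True, eq=False)
class BLE:
    """Hypothetical BLE record: a name plus the set of nets driving its inputs."""
    name: str
    inputs: frozenset

def vpack_cluster(bles, cluster_size, max_inputs):
    """Greedy VPack-style packing (hill climbing omitted for brevity)."""
    unclustered = set(bles)
    clusters = []
    while unclustered:
        # Seed a new cluster with the unclustered BLE that has the most used inputs.
        seed = max(unclustered, key=lambda b: len(b.inputs))
        unclustered.remove(seed)
        cluster, cluster_inputs = [seed], set(seed.inputs)
        while len(cluster) < cluster_size:
            # Only consider BLEs that keep the cluster within its input budget.
            candidates = [b for b in unclustered
                          if len(cluster_inputs | b.inputs) <= max_inputs]
            if not candidates:
                break
            # Greedy choice: the BLE sharing the most inputs with the cluster.
            best = max(candidates, key=lambda b: len(cluster_inputs & b.inputs))
            unclustered.remove(best)
            cluster.append(best)
            cluster_inputs |= best.inputs
        clusters.append(cluster)
    return clusters
```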

  6. Logic Utilization

  7. Number of Inputs per Cluster • Lots of opportunity for input sharing in large clusters (Betz, CICC'99) • Reducing inputs reduces device size and makes it faster • Most FPGA devices (Xilinx, Lucent) have 4 BLEs per cluster, with more inputs than actually needed

  8. T-VPACK

  9. Architecture Modeling • Tri-state buffer and pass transistor distribution • Cluster size vs. routing resources (tile size) • Transistor and buffer scaling based on segment length • Flexibility of switches (is Fc = W a waste for large cluster sizes?)

  10. Logic Cluster Structure

  11. Timing-Driven Clustering – T-VPACK • Optimization goals of VPack • Pack each cluster to its capacity • Minimize number of clusters • Minimize number of inputs per cluster • Reduce the number of external connections

  12. Timing-Driven Clustering – T-VPACK • Optimization goal of T-VPack • Minimize the number of external connections on the critical path • Why? • External connections have higher delay than internal connections • Reducing the number of external nets on the critical path will reduce delay

  13. Timing-Driven Clustering – T-VPACK • First stage • Identify connections that are on the critical path • Second Stage • Pack BLEs sequentially along the critical path • Recompute criticality of remaining BLEs
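A simplified sketch of how these two stages could fit together; `compute_criticality`, `attraction`, and `fits` are hypothetical helpers (the first two are sketched after slides 14-28 and slide 29), and this is not the exact T-VPack implementation:

```python
def tvpack_cluster(bles, netlist, cluster_size, max_inputs):
    """Timing-driven packing sketch: seed with critical BLEs, pack along the path."""
    # Stage 1: timing analysis, initially assuming every connection is inter-cluster.
    crit = compute_criticality(netlist)

    unclustered = set(bles)
    clusters = []
    while unclustered:
        # Seed with the most timing-critical unclustered BLE.
        seed = max(unclustered, key=lambda b: crit[b])
        unclustered.remove(seed)
        cluster = [seed]
        while len(cluster) < cluster_size:
            # Stage 2: greedily add the BLE with the highest combined
            # criticality/connectivity attraction that still fits the input budget.
            candidates = [b for b in unclustered if fits(b, cluster, max_inputs)]
            if not candidates:
                break
            best = max(candidates, key=lambda b: attraction(b, cluster, crit))
            unclustered.remove(best)
            cluster.append(best)
        clusters.append(cluster)
        # Connections absorbed into clusters become fast, so the criticality of
        # the remaining BLEs is recomputed before seeding the next cluster.
        crit = compute_criticality(netlist, clusters)
    return clusters
```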

  14-28. Slack and Criticality Calculation • Worked example on a small circuit (primary inputs PI1-PI3, primary outputs PO1-PO3; figure not reproduced here) • Arrival times are propagated forward from the primary inputs, which start at 0; the largest output arrival time in the example is 22 • Required times are propagated backward from the primary outputs, all of which are required at time 22 • Slack = required time - arrival time; node slacks in the example range from 0 to 8 • The zero-slack nodes and connections form the critical path
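A minimal sketch of the slack computation these slides walk through, assuming the circuit is a DAG given as a list of nodes in topological order plus a map from each connection to its delay (all names here are illustrative):

```python
def compute_slacks(topo_nodes, edge_delay):
    """topo_nodes: list of nodes in topological order; edge_delay: {(u, v): delay}."""
    fanin = {v: [] for v in topo_nodes}
    fanout = {u: [] for u in topo_nodes}
    for (u, v), d in edge_delay.items():
        fanin[v].append((u, d))
        fanout[u].append((v, d))

    # Forward pass: arrival times, starting at 0 on the primary inputs.
    arrival = {}
    for v in topo_nodes:
        arrival[v] = max((arrival[u] + d for u, d in fanin[v]), default=0)

    # Backward pass: required times, starting from the critical delay
    # (22 in the slide example) at the primary outputs.
    t_crit = max(arrival.values())
    required = {}
    for v in reversed(topo_nodes):
        required[v] = min((required[w] - d for w, d in fanout[v]), default=t_crit)

    # Slack = required time - arrival time; zero-slack nodes lie on the critical path.
    slack = {v: required[v] - arrival[v] for v in topo_nodes}
    return arrival, required, slack
```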

  29. Timing-Driven Clustering – T-VPACK • The cost metric now considers both connectivity and timing criticality • Perform a timing analysis at the start, treating every connection as inter-cluster • Determine each BLE's "base" criticality (a sketch of such a cost metric follows below)
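One way to sketch such a cost metric, loosely following the published T-VPack attraction function; the weight `alpha`, the `nets` field on each BLE, and the normalization are assumptions, not values taken from the slides:

```python
def base_criticality(slack, max_slack):
    # Less slack means more critical; 1.0 marks a connection on the critical path.
    return 1.0 - slack / max_slack if max_slack > 0 else 1.0

def attraction(ble, cluster, crit, alpha=0.75):
    """Blend timing criticality with shared connectivity (alpha is an assumed weight)."""
    cluster_nets = set().union(*(b.nets for b in cluster))
    shared = len(ble.nets & cluster_nets)
    return alpha * crit[ble] + (1 - alpha) * shared / max(len(ble.nets), 1)
```

The tie-breaking mentioned on slide 31 could be folded into `crit[ble]` as a small secondary term, so that BLEs lying on more near-critical paths win ties between candidates with equal slack.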

  30. Base Criticality

  31. How to break ties? • Initially, many paths may have the same number of BLEs • Include “tie-breaking” in performance cost function

  32. Results for T-VPACK versus VPACK Why does the gap between VPack and T-VPack increase as N increases?

  33. Results for T-VPACK versus VPACK • T-VPack prefers to cluster a BLE with BLEs that are in its fan-in or fan-out • VPack favors input sharing • T-VPack completely absorbs many low-fanout nets • Fewer nets to route!

  34. Results for T-VPACK versus VPACK Why does area-delay product show an increasing trend beyond cluster size of 10?

  35. Results for T-VPACK versus VPACK • Increased number of nets that are completely absorbed by T-VPack • Area-delay product: cluster sizes 7-10 are the best choice (36-34% better than N=1) • N=7 vs. N=1: 30% less delay, 8% less area

  36. Results for T-VPACK, DELAY !!! Why do we see a circuit speedup?

  37. Results for T-VPACK, DELAY !!! (chart; annotated values 18% and 40%) • Intra-cluster: fast; inter-cluster: slow! • As N increases: • the number of internal connections on the critical path increases • the number of external connections on the critical path decreases

  38. Why are inter-cluster connections becoming faster? • Reduction in the number of external connections (internal connections are faster) • Reduction in routing requirements • As a result, the external connections remaining on the critical path become faster

  39. Drawback of VPack and T-VPack
