# FPGA Routing

##### Presentation Transcript

1. FPGA Routing Dr. Philip Brisk Department of Computer Science and Engineering University of California, Riverside CS 223

2. Routing Resource Graph (RRG) [Figure: a 2-LUT and the corresponding RRG; nodes labeled source, sink, in1, in2, out, and wire1 through wire4.] Image source: www.eecg.toronto.edu/~aling/ece1718/project/fang/route_rr_graph.png

3. FPGA Routing • Disjoint-path problem (on the RRG); NP-complete • Input: • Graph G(V, E) • Set of sources S = {s1, s2, …, sm} • Set of sink sets T = {T1, T2, …, Tm}, one per source, with Ti = {ti1, ti2, …, tik} • Solution • Find paths from each source si to all sinks in Ti • Paths emanating from different sources must be disjoint (they cannot share any vertices or edges) • Objective(s) • Minimize delay, wirelength, etc.
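
The legality condition above can be checked mechanically. The following is a minimal sketch, assuming each routed net is stored as the set of RRG vertices it uses; the function name and data layout are illustrative and do not come from the slides. Checking vertices is enough here, because two routes that share an edge also share its endpoints.

```python
from collections import Counter

def shared_vertices(routes):
    """Return the RRG vertices used by more than one net.

    routes: dict mapping a net name to the set of vertices in its routing
    tree (hypothetical layout). An empty result means the routing is disjoint.
    """
    usage = Counter(v for vertices in routes.values() for v in set(vertices))
    return {v for v, count in usage.items() if count > 1}

# Two nets on separate wires, then two nets forced onto the same wire w1.
legal = {"net1": {"s1", "w1", "t11"}, "net2": {"s2", "w2", "t21"}}
illegal = {"net1": {"s1", "w1", "t11"}, "net2": {"s2", "w1", "t21"}}

print(shared_vertices(legal))    # set()
print(shared_vertices(illegal))  # {'w1'}
```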

4. Disjoint and Non-disjoint Paths [Figure: RRG with sources s1 and s2 and sinks t11 and t21.]

5. Disjoint and Non-disjoint Paths • Two nets are routed on the same FPGA segment (remember, RRG vertices represent wires and CLB I/Os) • This route is illegal! [Figure: routes from s1 to t11 and from s2 to t21 that share a vertex.]

6. Disjoint and Non-disjoint Paths • This route is legal! [Figure: routes from s1 to t11 and from s2 to t21 with no shared vertices.]

7. LUT Input Equivalence (1/3) • LUT inputs are interchangeable [Figure: two 2-LUTs, each with signals s1 and s2 connected to inputs in1 and in2.]

8. LUT Input Equivalence (2/3) • Overly restrictive disjoint-path problem formulation [Figure: the two 2-LUTs with their RRGs; one assigns t11 = in1 and t21 = in2, the other assigns t21 = in1 and t11 = in2.]

9. LUT Input Equivalence (3/3) • Represent all inputs of a LUT as one RRG sink, t [Figure: the two 2-LUTs and an RRG in which s1 and s2 both route to the single sink t.]

10. PathFinder: A Negotiation-Based Performance-Driven Router for FPGAs Larry McMurchie and Carl Ebeling International Symposium on FPGAs, 1995

11. Architecture and CAD for Deep Submicron FPGAs (Section 2.2.3, pp. 25-30) Vaughn Betz, Jonathan Rose, Sandy Marquardt Springer, 1999

12. Basic Plan • Route nets one by one • Each net is routed by a priority-driven breadth-first search • Multiple nets may use the same nodes and edges • Node/edge costs depend on current usage and past usage history • Repeat for a fixed number of iterations (say 200) • Success: A disjoint routing solution for all nets • Failure: No disjoint solution found after the fixed number of iterations

13. Routability Cost Function (per Node) • C(n) = p(n) × (b(n) + h(n)) • b(n): base cost of routing through node n • Typically the intrinsic delay of the circuit element corresponding to the node • h(n): history cost of routing through node n, based on previous router iterations • p(n): depends on the number of signals presently routed through node n in the current iteration
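
As a quick numeric illustration of the formula above, the sketch below evaluates C(n) for an idle node and for a node that has been shared repeatedly. The function name and the numeric values are hypothetical.

```python
def node_cost(b, h, p):
    """PathFinder routability cost C(n) = p(n) * (b(n) + h(n))."""
    return p * (b + h)

# An uncongested node (no history, no present sharing) costs its base cost.
print(node_cost(b=1.0, h=0.0, p=1.0))  # 1.0

# The same node after repeated sharing: both h(n) and p(n) have grown.
print(node_cost(b=1.0, h=2.0, p=3.0))  # 9.0
```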

14. Timing-Routability Cost Function

15. First-Order Congestion [Figure: sources s1, s2, s3 and sinks t1, t2, t3 connected through intermediate nodes A, B, and C, with edge costs between 1 and 4.]

16. First-Order Congestion • All paths route through B • Increase the penalty cost of B per iteration due to sharing • In a future iteration, it will be cheaper to route signal 1 through A, rather than B • In a future iteration, it will be cheaper to route signal 2 through C, rather than B • Requires gradual increase in cost of sharing nodes [Figure: same graph as slide 15.]

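17. Second-order Congestion • Routing Order: 1, 2, 3 [Figure: sources s1, s2, s3 and sinks t1, t2, t3 connected through intermediate nodes A, B, and C; costs shown are 2 at A, 1 and 2 at B, and 1 at C.]

18. Second-order Congestion • Routing Order: 1, 2, 3 [Figure: same graph and costs as slide 17; animation step.]

19. Second-order Congestion • Routing Order: 1, 2, 3 [Figure: same graph and costs as slide 17; animation step.]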

20. Second-order Congestion • Routing Order: 1, 2, 3 • Two paths route through C • Increasing the history cost through C will push signal 2 onto an alternative path through B [Figure: same graph and costs as slide 17.]

21. Second-order Congestion • Routing Order: 1, 2, 3 • Two paths route through C • Increasing the history cost through C will push signal 2 onto an alternative path through B [Figure: same graph; the costs shown at C increase from 1 to 3.]

22. Second-order Congestion • Routing Order: 1, 2, 3 • Two paths route through B • Increasing the history cost through B will push signal 1 onto an alternative path through A • Increasing the history cost through B will push signal 2 back onto the path through C [Figure: same graph and costs as slide 21.]

23. Second-order Congestion • Routing Order: 1, 2, 3 • Two paths route through B • Increasing the history cost through B will push signal 1 onto an alternative path through A • Increasing the history cost through B will push signal 2 back onto the path through C [Figure: same graph; the costs shown at B increase from 1 and 2 to 3 and 4.]

24. Second-order Congestion • Routing Order: 1, 2, 3 • Two paths route through C • Increasing the history cost through C will push signal 2 back through B [Figure: same graph and costs as slide 23.]

25. Second-order Congestion • Routing Order: 1, 2, 3 • Two paths route through C • Increasing the history cost through C will push signal 2 back through B [Figure: same graph; the costs shown at C increase from 3 to 5.]

26. Pseudocode • Priority-driven BFS • Computed from the source to every sink of net(i) • Connect each newly found path to the current sink into the partly computed routing tree RT(i) for net i
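
The negotiation loop described in the bullets above can be sketched in Python as follows. This is an illustrative reconstruction, not the pseudocode from the slide: the function names, the example graph, the per-node capacity of 1, and the particular forms used for p(n) and the history update are all assumptions.

```python
import heapq

def route_net(graph, cost, source, sinks):
    """Grow routing tree RT for one net: one priority-driven search per sink,
    started from every node already in the tree (each at cost 0)."""
    tree = {source}
    for sink in sinks:
        dist = {n: 0.0 for n in tree}
        prev = {}
        heap = [(0.0, n) for n in tree]
        heapq.heapify(heap)
        while heap:
            d, n = heapq.heappop(heap)
            if n == sink:
                break
            if d > dist[n]:
                continue  # stale heap entry
            for m in graph[n]:
                nd = d + cost(m)
                if nd < dist.get(m, float("inf")):
                    dist[m], prev[m] = nd, n
                    heapq.heappush(heap, (nd, m))
        else:
            raise ValueError(f"sink {sink!r} is unreachable")
        n = sink
        while n not in tree:  # back-trace and attach the new path to the tree
            tree.add(n)
            n = prev[n]
    return tree

def pathfinder(graph, nets, base, hfac=0.5, pfac=0.5, max_iters=200):
    """nets: name -> (source, [sinks]); base: node -> b(n), default 1.0.
    Returns name -> set of nodes, or None if no disjoint routing is found."""
    history = {n: 0.0 for n in graph}
    occupancy = {n: 0 for n in graph}
    trees = {}

    def cost(n):
        present = 1.0 + pfac * occupancy[n]  # assumed form of p(n)
        return present * (base.get(n, 1.0) + history[n])  # p * (b + h)

    for _ in range(max_iters):
        for name, (source, sinks) in nets.items():
            for n in trees.pop(name, ()):  # rip up this net's previous route
                occupancy[n] -= 1
            trees[name] = route_net(graph, cost, source, sinks)
            for n in trees[name]:
                occupancy[n] += 1
        shared = [n for n in graph if occupancy[n] > 1]
        if not shared:
            return trees
        for n in shared:  # assumed history update: grows with overuse
            history[n] += hfac * (occupancy[n] - 1)
        pfac *= 1.5  # sharing becomes gradually more expensive
    return None

# Hypothetical graph: B is the cheapest intermediate node for every net.
graph = {
    "s1": ["A", "B"], "s2": ["B"], "s3": ["B", "C"],
    "A": ["t1"], "B": ["t1", "t2", "t3"], "C": ["t3"],
    "t1": [], "t2": [], "t3": [],
}
nets = {"net1": ("s1", ["t1"]), "net2": ("s2", ["t2"]), "net3": ("s3", ["t3"])}
routes = pathfinder(graph, nets, base={"A": 2.0, "C": 2.0})
for name in nets:
    print(name, sorted(routes[name]))
```

In this example all three nets pick B in the first iteration. Once the cost of B has risen, net1 moves to A and net3 moves to C, leaving B to net2, which has no alternative.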

27. Timing-Routability Cost Function [Equation not captured in the transcript; one of its terms is labeled "Delay of critical path".] • The criticality term represents the estimated criticality of the path from source i to sink j on net(i) • net(i) may fan out to multiple sinks with varying criticality, depending on placement
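
For reference, the timing-driven cost in the PathFinder paper is usually written in the following form. The notation here is an assumption, not a transcription of the slide: $D_{ij}$ is the delay of the longest path through the connection from source $i$ to sink $j$, $D_{\max}$ is the delay of the critical path, and $d_n$ is the delay of node $n$.

$$
A_{ij} = \frac{D_{ij}}{D_{\max}}, \qquad
C_n = A_{ij}\, d_n + (1 - A_{ij})\,(b_n + h_n)\, p_n
$$

A connection on the critical path has $A_{ij}$ close to 1 and is routed almost purely for delay, while a connection with ample slack has $A_{ij}$ close to 0 and is routed mainly to avoid congestion.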

28. Negotiated A* Routing for FPGAs Russell Tessier Fifth Canadian Workshop on Field Programmable Devices, 1998

29. Basic Idea

30. A* Cost Function • Cost at a node $n_i$: $f_i = g_i + d_i$, where $g_i = f_{i-1} + c_i$ • $g_i$: cost of routing from the source to $n_i$ • $d_i$: estimated cost of routing from $n_i$ to the sink • $f_{i-1}$: total cost of previous path • $c_i$: cost of the next candidate node

31. A* vs. BFS • A*: $f_i = g_i + d_i = (f_{i-1} + c_i) + d_i$ • BFS: $f_i = g_i$ • Hybrid: $f_i = (1 - \alpha) g_i + \alpha d_i = (1 - \alpha)(f_{i-1} + c_i) + \alpha d_i$
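
The effect of α on which node is expanded next can be seen with a small sketch. The candidate names and their (g, d) values are hypothetical. Note that for a single comparison, α = 0.5 ranks candidates the same way as A*, since the hybrid value is then (g + d) / 2.

```python
def priority(g, d, alpha):
    """Hybrid priority f = (1 - alpha) * g + alpha * d.

    g: cost of the path from the source to the candidate node
    d: estimated cost from the candidate node to the sink
    alpha = 0 orders candidates by g alone (BFS-like behavior).
    """
    return (1 - alpha) * g + alpha * d

# name -> (g, d): one candidate is cheap so far, the other is near the sink.
candidates = {"near_sink": (4.0, 1.0), "near_source": (2.0, 6.0)}

choices = {}
for alpha in (0.0, 0.5, 0.9):
    choices[alpha] = min(
        candidates, key=lambda name: priority(*candidates[name], alpha)
    )
print(choices)
```

With α = 0 the search expands the candidate with the cheapest path so far; with larger α it prefers the candidate closer to the sink.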

32. Good Ideas, But… • Algorithm and implementation assumed a disjoint switch block • i.e., disjoint global routing domains • Key issue: • Which routing domains (reachable from the source) are explored, and in what order?

33. A Fast Routability-driven Router for FPGAs Jordan S. Swartz, Vaughn Betz, Jonathan Rose International Symposium on FPGAs, 1998

34. Architecture and CAD for Deep Submicron FPGAs (Section 4.3, pp. 76-80) Vaughn Betz, Jonathan Rose, Sandy Marquardt Springer, 1999

35. Speed Enhancements • Directed depth-first search • Similar to A*, but arguably more aggressive • Key Issues: • Cost function • Sink (target) ordering for multi-terminal nets • Route classification • Low-stress • Difficult • Impossible

36. Cost Function • Cost(n) = C(n) + Costprev + α(ΔD) • Costprev: cost of the track segments used to reach the current segment • C(n): (base) cost of using the track segment • Increases more rapidly than in the original PathFinder algorithm • ΔD: estimated cost of routing from the current track segment to the destination • Based on Manhattan (XY) distance • α: direction factor • BFS: α = 0 • Directed search: α > 0
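
A minimal sketch of this cost, assuming track segments are identified by (x, y) grid coordinates and ΔD is the plain Manhattan distance to the sink; the function names and coordinates are illustrative.

```python
def manhattan(a, b):
    """Manhattan (XY) distance between two grid positions."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def expansion_cost(node_cost, cost_prev, pos, sink_pos, alpha):
    """Cost(n) = C(n) + Cost_prev + alpha * deltaD."""
    return node_cost + cost_prev + alpha * manhattan(pos, sink_pos)

sink = (3, 4)
# Two candidate segments with identical C(n) and Cost_prev.
toward = expansion_cost(1.0, 2.0, pos=(2, 4), sink_pos=sink, alpha=0.5)
away = expansion_cost(1.0, 2.0, pos=(0, 0), sink_pos=sink, alpha=0.5)
print(toward, away)  # 3.5 6.5
```

With α = 0 both candidates would cost 3.0 and the search would not distinguish them; any α > 0 steers the expansion toward the sink.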

37. Updated Node (Base) Cost Function • Original PathFinder: C(n) = p(n) × (b(n) + h(n)) • b(n): base cost • h(n): history cost • p(n): usage cost • Updated: C(n) = p(n) × b(n) × h(n) + Bendcost(n, m) • Multiplying b(n) and h(n) eliminates the need for normalization • Bendcost penalizes global routes that take many turns • These routes are less likely to use long wires

38. Penalty Cost Function • Not specified in the original PathFinder paper • p(n) = 1 + max(0, [occupancy(n) − capacity(n)] × pfac) • occupancy(n): number of nets using resource n • capacity(n): maximum number of nets that can use resource n (typically 1) • pfac: 0.5 for the first iteration; increased by 1.5x to 2x each subsequent iteration
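
The sketch below evaluates p(n) exactly as written on the slide and shows how the penalty for an overused node escalates as pfac grows. The growth factor of 2 is one choice within the 1.5x to 2x range, and the occupancy value is hypothetical.

```python
def present_penalty(occupancy, capacity, pfac):
    """p(n) = 1 + max(0, (occupancy(n) - capacity(n)) * pfac)."""
    return 1 + max(0, (occupancy - capacity) * pfac)

pfac = 0.5  # first-iteration value from the slide
schedule = []
for iteration in range(1, 5):
    # A node with capacity 1 that three nets are currently using.
    schedule.append((iteration, pfac, present_penalty(3, 1, pfac)))
    pfac *= 2  # assumed growth factor, within the 1.5x to 2x range

for row in schedule:
    print(row)
```

A node used within its capacity always has p(n) = 1, so the penalty only affects congested resources.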

39. History Cost Function • Not specified in the original PathFinder paper • occupancy(n): number of nets using resource n • capacity(n): maximum number of nets that can use resource n (typically 1) • hfac: constant; any value between 0.2 and 1 works well

40. Target Selection • Closest Sink First • Uses fewer track segments • Furthest Sink First • Uses more track segments
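
The two orderings can be sketched as a sort on distance. This assumes that "closest" is measured as Manhattan distance from the net's source, which the slide does not state; the function name and coordinates are illustrative.

```python
def order_sinks(source, sinks, closest_first=True):
    """Order sink positions by Manhattan distance from the source."""
    def dist(sink):
        return abs(sink[0] - source[0]) + abs(sink[1] - source[1])
    return sorted(sinks, key=dist, reverse=not closest_first)

sinks = [(5, 5), (1, 0), (2, 2)]
print(order_sinks((0, 0), sinks))                       # [(1, 0), (2, 2), (5, 5)]
print(order_sinks((0, 0), sinks, closest_first=False))  # [(5, 5), (2, 2), (1, 0)]
```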

41. Net Order • Route nets in order of decreasing fanout • High-fanout nets tend to span the whole FPGA • Easier to route when there is less congestion due to other nets routed earlier • Low-fanout nets tend to be more localized • Relatively easy to route, even in the presence of some congestion
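
A minimal sketch of this ordering, assuming each net is stored as a list of its sinks so that fanout is the list length; the net names are hypothetical.

```python
# name -> list of sinks (hypothetical netlist)
nets = {"clk": ["a", "b", "c", "d"], "n1": ["a"], "n2": ["b", "c"]}

# Highest fanout first, so wide nets are routed before congestion builds up.
order = sorted(nets, key=lambda name: len(nets[name]), reverse=True)
print(order)  # ['clk', 'n2', 'n1']
```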

42. Binning • Only the portions of the net close to the target sink should be expanded • E.g., only expand within Bin 4 in this example • Shown to be effective for nets with more than 50 sinks

43. Bounding Box • Define a bounding box around source and sinks • Restrict the route for each net to no more than 3 channels outside of the bounding box
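
The bounding-box restriction can be sketched as below, assuming terminals and routing resources are identified by (x, y) channel coordinates. The function names are illustrative, and the sketch does not clamp the box to the edges of the device.

```python
def bounding_box(terminals, margin=3):
    """Box around a net's source and sinks, grown by `margin` channels."""
    xs = [x for x, _ in terminals]
    ys = [y for _, y in terminals]
    return (min(xs) - margin, min(ys) - margin,
            max(xs) + margin, max(ys) + margin)

def inside(box, pos):
    """True if the router may expand into position `pos`."""
    x0, y0, x1, y1 = box
    return x0 <= pos[0] <= x1 and y0 <= pos[1] <= y1

box = bounding_box([(2, 2), (5, 4)])  # source at (2, 2), one sink at (5, 4)
print(box)                   # (-1, -1, 8, 7)
print(inside(box, (8, 7)))   # True
print(inside(box, (9, 4)))   # False
```

During the search, any neighbor for which `inside` returns False would simply be skipped.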

44. Routing High-fanout Nets

45. Difficulty Prediction • Wmin cannot be known without routing first, so use an estimate based on a wirelength prediction model

46. Routability Results

47. Routing Times as W increases

48. Routing Time vs. W (clma Benchmark)

49. BFS vs. Directed Search (w/Binning)

50. Difficulty Prediction • W_estimate took less than 1 second to compute • Some prediction mistakes will happen • Prediction accuracy was 84%