
Architecture of Datapath-oriented Coarse-grain Logic and Routing for FPGAs

Andy Ye, Jonathan Rose, David Lewis. Department of Electrical and Computer Engineering, University of Toronto. {yeandy, jayar, lewis}@eecg.utoronto.ca


Presentation Transcript


  1. Architecture of Datapath-oriented Coarse-grain Logic and Routing for FPGAs Andy Ye, Jonathan Rose, David Lewis Department of Electrical and Computer Engineering University of Toronto {yeandy, jayar, lewis}@eecg.utoronto.ca

  2. Outline • Motivation • Datapath regularity • A datapath-oriented FPGA • Architecture • CAD flow • Experimental results • Area efficiency • Conclusion

  3. Modern FPGAs • Very large logic capacities • Over 10 million equivalent logic gates • Increasingly used to implement large and complex applications • Central processing units • Graphics accelerators • Digital signal processors • Packet switching networks

  4. Datapath Circuits • Large applications • Contain a greater proportion of datapath circuitry • Datapath circuits • Consist of multiple identical logic structures called bit-slices • Regularity • Predictability

  5. An Example • [Figure: four full adders chained from Carry In to Carry Out, one per bit, with signals A0 to A3, B0 to B3, and C0 to C3]
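The four-adder example can be sketched in a few lines of Python to show what "identical bit-slices" means in practice. This is a minimal illustration, not code from the presentation: the function names are hypothetical, and treating A and B as the operands of each full adder is an assumed reading of the figure labels.

```python
def full_adder(a, b, carry_in):
    """One bit-slice: return (sum_bit, carry_out) for single-bit inputs."""
    sum_bit = a ^ b ^ carry_in
    carry_out = (a & b) | (carry_in & (a ^ b))
    return sum_bit, carry_out


def ripple_carry_add(a_bits, b_bits, carry_in=0):
    """Chain identical full-adder slices, least significant bit first."""
    sum_bits = []
    carry = carry_in
    for a, b in zip(a_bits, b_bits):
        s, carry = full_adder(a, b, carry)
        sum_bits.append(s)
    return sum_bits, carry


# 4-bit example: 5 + 6, bits listed least significant first.
a_bits = [1, 0, 1, 0]  # 5
b_bits = [0, 1, 1, 0]  # 6
sum_bits, carry_out = ripple_carry_add(a_bits, b_bits)
```

Every slice runs the same `full_adder` logic; only the carry links one slice to the next. That repetition is the regularity the architecture tries to exploit.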

  6. An Example

  7. Research Goal • Design a new FPGA architecture • Utilize datapath regularity • Reduce the implementation area of datapath circuits on FPGAs • Implement a full set of CAD tools for the new architecture • Synthesis • Packing • Placement • Routing

  8. Key Architectural Features • A bus-oriented logic block architecture • A mixture of coarse-grain tracks and fine-grain routing tracks

  9. Datapath FPGA Overview • [Figure: an array of logic blocks (L) and switch blocks (S) connected by routing channels that contain both coarse-grain and fine-grain routing tracks]

  10. Logic Block: Super-cluster • [Figure: a basic logic element (BLE) built from a LUT, a DFF, and a MUX; a cluster of BLEs joined by a local routing network; a super-cluster made of Cluster 1 to Cluster 4]

  11. Datapath FPGA Overview • [Figure: the same array with each logic block (L) shown as a super-cluster, connected through switch blocks (S) and routing channels that contain coarse-grain and fine-grain routing tracks]

  12. Super-cluster • [Figure: a super-cluster of four clusters connected through blocks labeled M to fine-grain routing and coarse-grain routing, next to a switch block on the coarse-grain routing tracks]

  13. CAD Flow • CAD flow for the datapath-oriented FPGA consists of • Synthesis • Packing • Placement • Routing • Conventional CAD flow • Minimize area and delay metrics • Destroy datapath regularity

  14. Datapath-oriented CAD Flow • Preserve datapath regularity (bit-sliced structures) • Map the preserved regularity onto the datapath-oriented FPGA architecture • Maximize the utilization of coarse-grain routing tracks • Minimize the implementation area of datapath structures

  15. Datapath Representation • Datapath circuits are represented by netlists of datapath components (VHDL or Verilog) • Datapath component library • Multiplexers • Adders/subtracters • Shifters • Comparators • Registers • Each component consists of identical bit-slices

  16. Synthesis • Enhanced module compaction algorithm • Based on the Synopsys FPGA compiler • Augmented with several datapath-oriented features • Preserve datapath regularity by preserving bit-slice boundaries • Achieve area results as good as those of conventional synthesis tools

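  17. An Example Datapath Circuit • [Figure: a 4-bit datapath; four multiplexers controlled by sel take a0 to a3 and b0 to b3 and produce d0 to d3; four adders (+) take c0 to c3 and d0 to d3, with a carry chain from cin to cout, and produce s0 to s3]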

  18. Synthesis • [Figure: bit-slice 0 (a mux with inputs a0, b0, sel feeding d0, and an adder with inputs c0, d0, cin and output s0) mapped onto three 4-LUTs]

  19. Synthesis • [Figure: the same mapping repeated for all four bit-slices, giving twelve 4-LUTs with inputs a0 to a3, b0 to b3, c0 to c3, sel, intermediate signals d0 to d3, outputs s0 to s3, and the carry running from cin to cout]

  20. Packing • Based on the T-VPACK packing algorithm • Pack adjacent bit-slices into super-clusters • Utilize carry connections in super-clusters to minimize the delay of carry chains

  21. An Example • Four clusters per super-cluster • Two BLEs per cluster • Six inputs per cluster • [Figure: one super-cluster drawn as eight BLEs]

  22. Packing Into Clusters • [Figure: the three 4-LUTs of bit-slice 0 (inputs a0, b0, c0, sel, cin; intermediate signal d0; output s0) packed into the BLEs of clusters]

  23. Packing Into Super-clusters • [Figure: all four bit-slices packed into sixteen BLEs; inputs a0 to a3, b0 to b3, c0 to c3, and sel; intermediate signals d0 to d3; outputs s0 to s3; carry from cin to cout]
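The packing example can be approximated with a small greedy sketch: cluster i of a super-cluster receives the LUTs of bit-slice i, so adjacent bit-slices end up side by side. This is not the T-VPACK-based algorithm from the slides. The function name, the LUT names, and the greedy fill rule are hypothetical, and the six-input limit per cluster and carry-chain optimization are ignored.

```python
CLUSTERS_PER_SUPER_CLUSTER = 4  # from the example slide
BLES_PER_CLUSTER = 2            # from the example slide


def pack_bit_slices(luts):
    """Pack LUTs so that cluster i of a super-cluster holds bit-slice i.

    `luts` is a list of (name, bit_index) pairs. Assumed greedy rule: use the
    first super-cluster whose cluster for that bit still has a free BLE, and
    open a new super-cluster only when none has room.
    """
    super_clusters = []  # each entry: list of clusters, each a list of names
    for name, bit in luts:
        if not 0 <= bit < CLUSTERS_PER_SUPER_CLUSTER:
            raise ValueError("sketch only handles datapaths up to 4 bits wide")
        for sc in super_clusters:
            if len(sc[bit]) < BLES_PER_CLUSTER:
                sc[bit].append(name)
                break
        else:
            # No existing super-cluster had room for this bit-slice.
            sc = [[] for _ in range(CLUSTERS_PER_SUPER_CLUSTER)]
            sc[bit].append(name)
            super_clusters.append(sc)
    return super_clusters


# Three LUTs per bit-slice, as in the synthesis example: mux, carry, sum.
luts = [(f"{kind}{i}", i) for kind in ("mux", "carry", "sum") for i in range(4)]
packed = pack_bit_slices(luts)
```

With two BLEs per cluster, the twelve LUTs of the 4-bit example need two super-clusters under this rule, and bit-slice i always sits in cluster position i.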

  24. Placement • Based on the VPR placer • Uses the simulated annealing algorithm • For super-clusters containing datapath circuits • Move super-clusters only • For super-clusters containing non-datapath circuits • Move individual clusters
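The move-granularity rule on this slide can be sketched as follows. The dictionary keys and function names are hypothetical, and `accept_move` is the textbook simulated-annealing acceptance test rather than anything taken from the slides, which do not give the cost function or temperature schedule.

```python
import math
import random


def movable_units(super_clusters):
    """List the blocks an annealer may move.

    Each super-cluster is a dict with hypothetical keys "name",
    "is_datapath" (bool), and "clusters" (list of cluster names).
    """
    units = []
    for sc in super_clusters:
        if sc["is_datapath"]:
            # Keep bit-slices together: the whole super-cluster moves as one.
            units.append(("super-cluster", sc["name"]))
        else:
            # Irregular logic: each cluster may move on its own.
            units.extend(("cluster", c) for c in sc["clusters"])
    return units


def accept_move(delta_cost, temperature, rng=random):
    """Textbook acceptance test; temperature must be greater than zero."""
    if delta_cost <= 0:
        return True
    return rng.random() < math.exp(-delta_cost / temperature)


blocks = [
    {"name": "sc0", "is_datapath": True, "clusters": ["c0", "c1", "c2", "c3"]},
    {"name": "sc1", "is_datapath": False, "clusters": ["c4", "c5"]},
]
units = movable_units(blocks)
```

Moving datapath super-clusters as a unit keeps the bit-slices adjacent, so the buses between them stay regular.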

  25. Routing • Based on the VPR router • Uses the PathFinder algorithm • As much as possible • Route buses through coarse-grain routing tracks • Route individual signals through fine-grain routing tracks • When necessary • Use coarse-grain routing tracks for individual signals • Use fine-grain routing tracks for buses
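The track preference described here can be written as a simple fallback rule. This sketch is a deliberate simplification: a negotiated-congestion router such as PathFinder expresses preferences through routing costs, not a hard availability check, and the function names and the `free_tracks` bookkeeping are hypothetical.

```python
def track_preference(is_bus):
    """Order in which track types are tried for a net."""
    return ["coarse", "fine"] if is_bus else ["fine", "coarse"]


def choose_track_type(is_bus, free_tracks):
    """Pick the first preferred track type that still has free capacity.

    `free_tracks` maps "coarse" and "fine" to the number of unused tracks.
    Returns None when neither type has capacity left.
    """
    for kind in track_preference(is_bus):
        if free_tracks.get(kind, 0) > 0:
            return kind
    return None


bus_choice = choose_track_type(True, {"coarse": 2, "fine": 5})
bus_fallback = choose_track_type(True, {"coarse": 0, "fine": 5})
signal_fallback = choose_track_type(False, {"coarse": 1, "fine": 0})
```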

  26. Area Efficiency • Benchmarks • 15 datapath circuits from the picoJava processor • Architectural assumptions • Four BLEs per cluster • Four clusters per super-cluster • Four coarse-grain tracks sharing configuration memory • Logic track length of two • Disjoint switch block topology • Architectural variables • Number of coarse-grain tracks
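One way to see why sharing configuration memory can save area is to count configuration bits. The count below is a simplified illustration, not the area model behind the reported results: it assumes one configuration bit per programmable switch on a fine-grain track and one bit shared by the matching switches of each group of four coarse-grain tracks. The function and parameter names are hypothetical.

```python
def routing_config_bits(fine_tracks, coarse_tracks, switches_per_track,
                        tracks_per_group=4):
    """Count configuration bits under the simplified sharing assumption."""
    if coarse_tracks % tracks_per_group != 0:
        raise ValueError("coarse-grain tracks must form complete groups")
    fine_bits = fine_tracks * switches_per_track
    # Each group of coarse-grain tracks is configured by one shared bit set.
    coarse_bits = (coarse_tracks // tracks_per_group) * switches_per_track
    return fine_bits + coarse_bits


all_fine = routing_config_bits(40, 0, 10)
half_coarse = routing_config_bits(20, 20, 10)
```

Under these assumptions, turning half of 40 tracks into coarse-grain tracks lowers the count from 400 to 250 bits. The area actually saved also depends on how well the router fills the coarse-grain tracks with buses.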

  27. Area Efficiency • [Plot: circuit area versus % of coarse-grain tracks (0%, 0%-10%, 10%-20%, 20%-30%, 30%-40%, 40%-50%, 50%-60%, 60%-70%); axes: normalized circuit area, 90.0% to 100.0%, and circuit area in minimum transistor areas (x10^6), 1.40 to 1.60]

  28. Logic Track Length Vs. Area • Architectural assumptions • Four clusters per super-cluster • Four coarse-grain tracks share configuration memory • 50% of tracks are coarse-grain tracks • Disjoint switch block topology • Architectural variables • Number of BLEs per cluster • Logic track length

  29. Logic Track Length Vs. Area • [Plot: circuit area in minimum transistor areas (x10^6), 1.60 to 2.20, versus track length (1, 2, 4, 8, 16), with one curve each for N = 2, N = 4, N = 8, and N = 10]

  30. Conclusion • Proposed a datapath-oriented FPGA architecture and its CAD tools • Best area is achieved when • 40% - 50% of tracks are coarse-grain routing tracks • Four BLEs per cluster • Logic track length of two • Best area is 9.6% smaller than that of a conventional FPGA
