Design, Synthesis and Evaluation of Heterogeneous FPGA with Mixed LUTs and Macro-Gates

Yu Hu (1), Satyaki Das (2), Steve Trimberger (2), and Lei He (1). (1) Electrical Engineering Dept., UCLA; (2) Research Labs, Xilinx Inc. Presented by Yu Hu. Address comments to lhe@ee.ucla.edu.


Presentation Transcript


  1. Design, Synthesis and Evaluation of Heterogeneous FPGA with Mixed LUTs and Macro-Gates • Yu Hu (1), Satyaki Das (2), Steve Trimberger (2), and Lei He (1) • (1) Electrical Engineering Dept., UCLA • (2) Research Labs, Xilinx Inc. • Presented by Yu Hu • Address comments to lhe@ee.ucla.edu

  2. Outline • Introduction • Design of the Macro-gates • Synthesis for the Proposed FPGA Architecture • Comparison of Heterogeneous FPGA Architectures • Conclusions and Future Work

  3. Heterogeneity in FPGA Architectures • Heterogeneity among SLICEs • Programmable logic and routing • Tiles are not identical • soft logic fabric [Kaviani, FPGA’96] • hard structures [Jamieson, FPL’05] • Dedicated hard structures • e.g. DSP • e.g. memory block • Heterogeneity within a SLICE • Programmable logic and routing • Tiles (SLICEs) are identical • Different logic elements exist within a SLICE • e.g. LUTs with different sizes [Cong, FPGA’99] • e.g. mixed PLAs and LUTs [Cong, TODAES’05] • e.g. mixed macro-gates and LUTs [Figure source: Jamieson, FPL’05]

  4. Heterogeneous FPGA with Macro-Gates • There is a trade-off between programmability and cost when choosing between LUTs and macro-gates • Xilinx V4 benefits from small gates (MUX2, XOR2) built into SLICEs • The benefit of wider macro-gates • The effectiveness of incorporating wider logic functions (macro-gates) is not clear • Our contributions • Design a new FPGA architecture with mixed LUTs and macro-gates • Propose a new automatic synthesis flow for mapping a circuit to the proposed FPGA architecture • Evaluate the architecture and show that the proposed architecture reduces delay and area by 16.5% and 30%, respectively, compared to the LUT-only architecture

  5. Outline • Introduction • Design of the Macro-gates • Synthesis for the Proposed FPGA Architecture • Comparison of Heterogeneous FPGA Architectures • Conclusions and Future Work

  6. Overview of Macro-Gate Design • Key problem • Select the logic functions for the macro-gate • Problem formulation: • Input: a set of training circuits that have been mapped to K-input LUTs • Output: N K-input Boolean functions: f1, …, fN • Objective: maximize the number of logic functions (in the training circuit set) that can be implemented by f1, …, fN • The proposed solution • Rank the logic functions found in a set of training circuits
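
The selection problem above can be sketched as a frequency ranking. This is a minimal illustration, not the authors' tool: the function name `rank_functions` and the string labels standing in for extracted LUT functions are hypothetical, and the sketch counts only exact matches, so it ignores the cofactor-based implementation capability that the later slides add.

```python
from collections import Counter

def rank_functions(lut_functions, n_select):
    """Pick the n_select most frequent functions and report their coverage.

    lut_functions: one (hashable) label per LUT found in the training circuits.
    Returns (selected functions, fraction of LUTs they cover exactly).
    """
    counts = Counter(lut_functions)
    top = counts.most_common(n_select)          # sorted by descending frequency
    covered = sum(count for _, count in top)
    return [fn for fn, _ in top], covered / len(lut_functions)

# Hypothetical training data: each entry is the function of one mapped LUT.
training = ["and3", "xor2", "and3", "mux2", "and3", "xor2"]
selected, coverage = rank_functions(training, n_select=2)
print(selected, round(coverage, 3))
```

In practice the labels would be canonical truth tables, so that functions differing only by input order or polarity are counted together.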

  7. NPN-Class Diagram: Organization of Logics • Canonical and efficient representation of all NPN classes • NPN-equivalent: functional equivalence under input negation, input permutation, or output negation • E.g., f(a,b,c)=a+bc, g(a,b,c)=b’a+b’c • The NPN-cofactor relationship is indicated • DAG: easy to manipulate • It becomes impractical to compute for functions with more than 6 inputs! • Solution: Utilization NPN-Class Diagram [Figure: NPN-class diagram organized by level: Level 3 (3-input), Level 2 (2-input), Level 1 (1-input), Level 0 (constant); axis label: wider inputs]
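
NPN equivalence can be checked by brute force for small input counts. The sketch below is an assumption-laden illustration (the helper names `truth_table` and `npn_canonical` are hypothetical): it enumerates every input permutation, input negation, and output negation, and keeps the lexicographically smallest truth table as the class representative. The cost grows as n!·2^(n+1), which is consistent with the slide's remark that exhaustive treatment becomes impractical for wide functions.

```python
from itertools import permutations, product

def truth_table(fn, n):
    """Truth table as a tuple; bit k of the index is the value of input k."""
    return tuple(int(bool(fn(*[(i >> k) & 1 for k in range(n)])))
                 for i in range(2 ** n))

def npn_canonical(tt, n):
    """Smallest truth table in the NPN orbit of tt (exhaustive search)."""
    best = None
    for perm in permutations(range(n)):
        for neg in product((0, 1), repeat=n):
            cand = []
            for i in range(2 ** n):
                j = 0
                for k in range(n):
                    # input k of the original function is driven by
                    # variable perm[k], optionally inverted
                    j |= (((i >> perm[k]) & 1) ^ neg[k]) << k
                cand.append(tt[j])
            for out_neg in (0, 1):
                t = tuple(b ^ out_neg for b in cand)
                if best is None or t < best:
                    best = t
    return best

# The slide's example pair: f = a + bc and g = b'a + b'c.
f = truth_table(lambda a, b, c: a | (b & c), 3)
g = truth_table(lambda a, b, c: ((1 - b) & a) | ((1 - b) & c), 3)
and3 = truth_table(lambda a, b, c: a & b & c, 3)
print(npn_canonical(f, 3) == npn_canonical(g, 3))
```

Here f and g fall into the same class because negating the output of f and then negating its inputs yields a function of the form x(y+z), which is g up to input naming.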

  8. UND: Utilization NPN-Class Diagram • The UND is a DAG and a sub-graph of the NCD • Helps score and rank functions [Figure: example UND with nodes ab’c’+a’bc’, abc, ab’+a’b, ab, a, and constant 0; each node carries three fields, e.g. ab’c’+a’bc’ / 1 / xx%; legend: functionality, implementation capability, appearance frequency]

  9. UND: Utilization NPN-Class Diagram [Figure: the same example UND with updated node annotations: ab’c’+a’bc’ / 1 / xx%, abc / 1 / xx%, ab’+a’b / 1 / xx% (previously 0), ab / 0 / xx%, a / 1 / xx% (previously 0), -0- / 0 / xx%]

  10. UND: Utilization NPN-Class Diagram • Calculate implementation capability • The topological property (DAG) of the UND enables us to efficiently explore different metrics for functionality ranking, e.g., utilization rate [Figure: fanout cone of ab’c’+a’bc’ in the example UND, with node annotations ab’c’+a’bc’ / 1 / 75%, abc / 1 / 50%, ab’+a’b / 1 / 50%, ab / 0 / 25%, a / 1 / 25%, -0- / 0 / xx%]
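
One plausible reading of implementation capability is that a gate can also realize every function reachable from it by tying inputs to constants, that is, its cofactors. The sketch below follows that reading only; the helper names, the `capability` score, and the frequency values are hypothetical, and the sketch compares exact truth tables rather than NPN classes.

```python
def cofactor(tt, n, var, val):
    """Fix input `var` of an n-input truth table to `val`; result has n-1 inputs."""
    out = []
    for i in range(2 ** (n - 1)):
        low = i & ((1 << var) - 1)      # bits below the fixed variable
        high = i >> var                 # bits at or above it
        out.append(tt[low | (val << var) | (high << (var + 1))])
    return tuple(out)

def cofactor_closure(tt, n):
    """All (input count, truth table) pairs reachable by repeated cofactoring."""
    seen = {(n, tt)}
    stack = [(n, tt)]
    while stack:
        k, t = stack.pop()
        for var in range(k):
            for val in (0, 1):
                c = (k - 1, cofactor(t, k, var, val))
                if c not in seen:
                    seen.add(c)
                    stack.append(c)
    return seen

def capability(tt, n, freq):
    """Sum of appearance frequencies over everything the gate can realize."""
    return sum(freq.get(c, 0.0) for c in cofactor_closure(tt, n))

# f = ab'c' + a'bc' with a = input 0, b = input 1, c = input 2.
F = (0, 1, 1, 0, 0, 0, 0, 0)
XOR2 = (0, 1, 1, 0)
# Hypothetical appearance frequencies, for illustration only.
freq = {(3, F): 0.25, (2, XOR2): 0.25, (1, (0, 1)): 0.25}
print(capability(F, 3, freq))
```

Setting c = 0 in f leaves ab’+a’b, and setting b = 0 in that leaves a, which mirrors the chain of nodes drawn in the example diagram.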

  11. Recap: Overall Flow for Macro-Gate Design • Map with LUT-N • Extract logic functions • Generate the Utilization NPN Diagram • Calculate scores for the logic functions • Rank the logic functions • Best function: ab’c’+a’bc’ [Figure: example netlist (nodes a to h, output F) mapped to LUTs with library gates and2(3), nand2(2), inv(1); extracted truth tables 0000001000000000, 0000010000000000, ……; the annotated UND; score calculations shown: 1+1*2/3+1*1/3=2, 1+1*1/3=1.33, 1*1/2=0.5, 1+1*1/2=1.5]

  12. Proposed Macro-Gates and FPGA Architecture • For the IWLS’05 benchmarks, the following four 6-input functions have the highest ranks • GI1 = a b c d e f (AND-6) • GI2 = a’ b’ c’ + b c f’ + b c’ d’ + b’ c e (MUX-4) • GI3 = a b’ c d’ e + b c e f + d e f • GI4 = a b’ + a’ c d’ + b’ c’ + e’ + f’ • Together they can implement over 50% of the logic functions in the IWLS’05 benchmarks • [Figure: architecture of the proposed macro-gate and FPGA SLICE; not reproduced in the transcript]
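
The four functions can be written down directly and tabulated. The sketch below only transcribes the expressions from the slide (the dictionary name `MACRO_GATES` and the helper `truth_table` are hypothetical); it can be used, for instance, to confirm that GI2 behaves as a 4-to-1 multiplexer with select inputs b and c and data inputs a’, e, d’, f’.

```python
from itertools import product

# Inputs are 0/1 integers; (1 - x) is the complement of x.
MACRO_GATES = {
    "GI1": lambda a, b, c, d, e, f: a & b & c & d & e & f,
    "GI2": lambda a, b, c, d, e, f: ((1 - a) & (1 - b) & (1 - c))
                                    | (b & c & (1 - f))
                                    | (b & (1 - c) & (1 - d))
                                    | ((1 - b) & c & e),
    "GI3": lambda a, b, c, d, e, f: (a & (1 - b) & c & (1 - d) & e)
                                    | (b & c & e & f)
                                    | (d & e & f),
    "GI4": lambda a, b, c, d, e, f: (a & (1 - b))
                                    | ((1 - a) & c & (1 - d))
                                    | ((1 - b) & (1 - c))
                                    | (1 - e)
                                    | (1 - f),
}

def truth_table(fn):
    """64-entry truth table, inputs enumerated in (a, b, c, d, e, f) order."""
    return [fn(*bits) for bits in product((0, 1), repeat=6)]

for name, fn in MACRO_GATES.items():
    print(name, sum(truth_table(fn)), "of 64 input patterns evaluate to 1")
```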

  13. Outline • Design of the Embedded Macro-gates • Synthesis for the Proposed FPGA Architecture • Technology Mapping for Heterogeneous FPGAs • SAT-based Packing • Place and Routing • Comparison of Heterogeneous FPGA Architectures • Conclusions and Future Work

  14. Functional & Structural Cut Enumeration • Phase 1: Enumerate and label cuts from PIs to POs • Check the feasibility of a cut w.r.t. the macro-gate • Phase 2: Select the best choice from POs to PIs • A general yet efficient solution is SAT-based Boolean matching • Exploiting Symmetry in SAT-Based Boolean Matching for Heterogeneous FPGA Technology Mapping, Session 5C.1, ICCAD 07 [Figure: example cut with inputs w, x, y, z and internal nodes a=(x+y)’, b=y+wz, d=ab=(x+y)’(y+wz)=x’y’wz; the mapper asks “Is x’y’wz in the library?” against the 4-input macro-gate library of truth tables (0000001000000000, 0000010000000000, ……); answer: Yes]
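
The feasibility check in the example can be reproduced with an exhaustive matcher. This is a stand-in for illustration, not the SAT-based Boolean matching the slide refers to: it simply tries every pin permutation and polarity. The library contents (`and4`, `xor4`) and all helper names are assumptions.

```python
from itertools import permutations, product

def tt(fn, n):
    """Truth table of an n-input 0/1 function as a tuple."""
    return tuple(fn(*bits) for bits in product((0, 1), repeat=n))

def match(target, gate, n):
    """Find (perm, in_neg, out_neg) that makes `gate` compute `target`, or None.

    Gate pin k is driven by target variable perm[k], inverted if in_neg[k] is 1;
    the gate output is inverted if out_neg is 1.
    """
    wanted = tt(target, n)
    for perm in permutations(range(n)):
        for in_neg in product((0, 1), repeat=n):
            for out_neg in (0, 1):
                def wired(*x):
                    pins = [x[perm[k]] ^ in_neg[k] for k in range(n)]
                    return gate(*pins) ^ out_neg
                if tt(wired, n) == wanted:
                    return perm, in_neg, out_neg
    return None

def cut_function(w, x, y, z):
    """The cut from the slide: a = (x+y)', b = y + wz, d = ab."""
    a = 1 - (x | y)
    b = y | (w & z)
    return a & b

and4 = lambda p, q, r, s: p & q & r & s   # hypothetical library gate
xor4 = lambda p, q, r, s: p ^ q ^ r ^ s   # hypothetical gate that cannot match

print(match(cut_function, and4, 4))
print(match(cut_function, xor4, 4))
```

The exhaustive search visits n!·2^(n+1) configurations per gate, which is fine for 4 inputs but motivates a SAT formulation for wider macro-gates.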

  15. Key in Technology Mapping: Balance Resource Utilization • An asymmetric architecture causes problems for resource utilization • Exclusive use of one logic resource leaves much of the fabric unused • Simple yet effective solution: • Change the LUT-MG ratio by adjusting their area weights • Precise calibration is hard to reach with this approach [Figure: plot annotations: LUT-MG ratio = LUT#/MG#; objective architecture LUT6:MacroGate6 = 1:1; best LUT-MG ratio = 1:1; “Total# too large!”; “Hard to obtain precise calibration”]

  16. Post-Mapping Area Recovery (motivation example) • Given: • Target architecture = LUT6 + MG6 • LUT-MG ratio in target architecture = 1:1 • LUT# < MG# in the mapped design • Intrinsic delay (LUT6 : MG6) = 5:4 • Objective: balance the numbers of LUTs and MGs without increasing delay [Figure: mapped network from PI to PO with one LUT6 and six MG6; node timing labels 5 / 5, 9 / 13, 4 / 5, 8 / 9, 9 / 9, 13 / 13, 17 / 17]

  17. Post-Mapping Area Recovery (motivation example, continued) • Same givens and objective as slide 16 [Figure: one MG6 replaced by a LUT6, giving two LUT6 and five MG6; node timing labels 5 / 5, 10 / 13, 4 / 5, 8 / 9, 9 / 9, 13 / 13, 17 / 17]

  18. Post-Mapping Area Recovery (motivation example, continued) • Same givens and objective as slide 16 • Timing slack budgeting is necessary! [Figure: two more MG6 replaced by LUT6, giving four LUT6 and three MG6; node timing labels 5 / 5, 10 / 13, 5 / 5, 10 / 9, 9 / 9, 14 / 13, 18 / 17; annotation: “Timing target violation!”]

  19. Post-Mapping Area Recovery by Timing Budgeting • Formulated as an Integer Linear Programming (ILP) problem • Objective (minimize the gap between target and actual LUT-MG ratios): min |m2+…+m7-7/2| • Arrival time constraints: ai+dj+bj<=aj • Clock period target: ai<=17 • LUT assignment with given timing slack: (5-4)*mj<=bj, mj ∈ {0,1} • Easily generalized to handle architectures • with multiple macro-gates • with different input pin numbers [Figure: mapped network from PI to PO with one LUT6 and six MG6, nodes labeled with arrival times a1 to a7]
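
For a network this small, the formulation can be explored by exhaustive search instead of an ILP solver. The sketch below is an illustration under stated assumptions: the four-gate netlist `FANINS`, the clock target of 13, and the function names are hypothetical and are not the network from the slide. Each gate starts as an MG6 (delay 4); setting m_j = 1 turns it into a LUT6 (delay 5), which consumes one unit of timing budget, as in the constraint (5-4)*mj<=bj.

```python
from itertools import product

MG_DELAY, LUT_DELAY = 4, 5
# Hypothetical netlist in topological order: gate -> fan-in gates
# (gates with an empty list are driven by primary inputs only).
FANINS = {"g1": [], "g2": [], "g3": ["g1"], "g4": ["g3", "g2"]}
CLOCK_TARGET = 13

def arrival_times(m):
    """Longest-path arrival time at each gate output for assignment m."""
    arrival = {}
    for gate, fanins in FANINS.items():
        delay = LUT_DELAY if m[gate] else MG_DELAY
        arrival[gate] = max((arrival[f] for f in fanins), default=0) + delay
    return arrival

def rebalance(target_luts):
    """Try every 0/1 assignment; keep the timing-feasible one closest to target."""
    gates = list(FANINS)
    best = None
    for bits in product((0, 1), repeat=len(gates)):
        m = dict(zip(gates, bits))
        if max(arrival_times(m).values()) > CLOCK_TARGET:
            continue                      # violates the clock period target
        gap = abs(sum(bits) - target_luts)
        if best is None or gap < best[0]:
            best = (gap, m)
    return best

gap, assignment = rebalance(target_luts=len(FANINS) / 2)
print(gap, assignment, arrival_times(assignment))
```

In this toy network the critical path g1, g3, g4 has only one unit of slack, so at most one of its gates can become a LUT6, while the off-critical gate g2 can always be converted. That is the role the budget variables play in the ILP.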

  20. Outline • Design of the Embedded Macro-gates • Synthesis for the Proposed FPGA Architecture • Technology Mapping for Heterogeneous FPGAs • SAT-based Packing • Comparison of Heterogeneous FPGA Architectures • Conclusions and Future Work

  21. SAT-Based Packing • Motivation • Traditional packing tools, e.g., T-VPack, hard-code the architecture specification of a SLICE • They must be re-implemented from scratch when the architecture changes • Propose a unified implementation of the packers for different architectures: easy to perform architecture exploration! • The architecture-dependent sub-problem in packing • Structural feasibility checking of a sub-circuit against the SLICE • Solution • Treat the validation of a SLICE packing as a local place-and-route problem • A SAT solver is used to carry out the validation check

  22. Example of SAT-Based SLICE Packing • Examples of constraints (one for each class of constraint…) • Placement and routing choice variables: X@A, X@B, U5@N10 • Exclusivity constraint: (¬X@A) ∨ (¬X@B) • Presence constraint: (X@A) ∨ (¬X@B) • Input/output constraint: X@A → U5@N10 • Routing constraint: (G0→out ∧ U5@N10) → U5@N12
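
The constraint classes above translate directly into CNF clauses. The sketch below checks them by enumerating all assignments, which stands in for a real SAT solver and is only viable for a handful of variables. Two assumptions are made and should be read as such: the presence constraint is encoded in the at-least-one form X@A ∨ X@B (block X must sit somewhere), which may differ from the clause as printed on the slide, and G0→out is treated as a single Boolean variable.

```python
from itertools import product

VARS = ["X@A", "X@B", "U5@N10", "U5@N12", "G0->out"]

# Each clause is a list of (variable, required value) literals; a clause holds
# when at least one literal matches the assignment.
CLAUSES = [
    [("X@A", False), ("X@B", False)],                          # exclusivity
    [("X@A", True), ("X@B", True)],                            # presence (assumed form)
    [("X@A", False), ("U5@N10", True)],                        # X@A -> U5@N10
    [("G0->out", False), ("U5@N10", False), ("U5@N12", True)], # routing implication
]

def models(clauses, variables):
    """Yield every assignment that satisfies all clauses (exhaustive search)."""
    for bits in product((False, True), repeat=len(variables)):
        assignment = dict(zip(variables, bits))
        if all(any(assignment[v] == value for v, value in clause)
               for clause in clauses):
            yield assignment

solutions = list(models(CLAUSES, VARS))
print(len(solutions), "satisfying assignments; packing is feasible:", bool(solutions))
```

An implication p → q is written as the clause (¬p ∨ q), which is why the input/output and routing constraints each become a single clause.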

  23. Recap: Overall Synthesis Flow [Figure: flowchart with steps Area weight setting, Cut-based mapping, decision “Area-balance trade-off?” (Y/N), Post-mapping area recovery, Packing; illustrated by an example netlist (nodes a to h, output F) mapped to LUTs and then to a mix of LUT6 and MG6]

  24. Outline • Motivation and Objectives • Methodology for Logic Function Exploration • Technology Mapping for Heterogeneous FPGAs • Evaluation of Heterogeneous FPGA Architectures • Conclusions and Future Work

  25. Experimental Setting • Design library parameters [Cong, TODAES’05] • Benchmark set: IWLS 2005 • Four architectures are compared: • LUT4, LUT4 + macro-gate, LUT6, and LUT6 + macro-gate • The proposed macro-gate is synthesized with SIS 1.2 • Delay and area model • Interconnect delay is ignored

  26. Delay Comparisons • Compared to LUT4, LUT4+MG reduces both logic depth and delay by 9.2%. • Compared to LUT6, LUT6+MG reduces delay by 30% while increasing logic depth by 36.5%. • A LUT6 can implement more logic functions than a macro-gate

  27. Logic Area Comparisons • Compared to LUT4, LUT4+MG reduces logic area by 12.5%. • Compared to LUT6, LUT6+MG reduces logic area by 16.9%.

  28. Outline • Motivation and Objectives • Methodology for Logic Function Exploration • Technology Mapping for Heterogeneous FPGAs • Comparison of Heterogeneous FPGA Architectures • Conclusions and Future Work

  29. Conclusions • Conclusions • A novel FPGA architecture with mixed LUTs and macro-gates is proposed • A synthesis flow for the proposed architecture is implemented • Preliminary experimental results show that the proposed architecture is effective in reducing area and delay • Future Work • Perform physical design for the synthesized circuits, compare routing costs, and evaluate the architecture with interconnect delay taken into account • Study how effectively the proposed architecture reduces power • Examine macro-gates with wider inputs
