Overview September 2004

Implementing High Performance DSP Systems on Heterogeneous Programmable Platforms Roger Woods and John McAllister Programmable Systems Laboratory, Institute of Electronics, Communications and Information Technology (ECIT), Queen’s University Belfast. Overview September 2004. PSL@QUB.


Presentation Transcript


  1. Implementing High Performance DSP Systems on Heterogeneous Programmable Platforms Roger Woods and John McAllister Programmable Systems Laboratory, Institute of Electronics, Communications and Information Technology (ECIT), Queen’s University Belfast Overview September 2004

  2. PSL@QUB • Sonic Arts Research Centre • Involves Music, Computer Science, Electrical Engineering • FPGAs for synthesis of musical instruments • Electronic Communications and Information Technology • RF, SoC, software engineering, speech recognition, image processing • Strong application focus • Programmable Systems Lab • Programmable IC Platforms for Programmable IP Networks (PIPPIN) – network solutions for streaming video • System level design for heterogeneous FPGA-centric embedded DSP systems – Abhainn

  3. Contents • Motivation • Drivers • Current design approaches • Platform based design • Function/architecture co-design • Modelling languages • Abhainn – System level FPGA-centric Embedded System Design • Multi-dimensional array dataflow graph • Relationship to hardware cores • Normalised Lattice Filter • Conclusions

  4. Drivers • Heterogeneous platforms • GPPs, DSPs, FPGAs • Programmable resource for complex DSP • Clear need for a holistic, system level design flow • Increased abstraction – gives a unified HW/SW view • Optimisation from high level – bigger impact • Huge body of pre-defined IP • Wide range of existing functions – optimised for performance • Element of bottom-up design, e.g. pre-defined timing

  5. Design approaches • Platform based design • Defined system platform, but the design space is too large • Function/architecture codesign • Concurrent architecture derivation and algorithm refinement • Formalised approaches, e.g. dataflow graphs (DFG), are mature for multiprocessors, e.g. GRAPE-II, Ptolemy • Independent model of computation (MoC) based specification • Rapid system implementation from algorithm specification • Automated inter-processor communication (IPC) realisation • Issues: • Optimisation whilst protecting core implementation • Need for increased core utilisation • Balance synthesis of dedicated/programmable FPGA resource

  6. Abhainn Ethos

  7. Abhainn Rapid Implementation • Generic Input: • Algorithm modelling tool • Target technologies • Technology Specific Mapping: mapped to specific processing and IPC technologies • Target Specific Mapping: mapped to specific target devices

  8. Gedae Multiprocessor Synthesis • Gedae provides: • Rapid implementation for multiprocessors in a platform portable manner • Designer control of the implementation via standardised transformations

  9. Dataflow Specification • Arc: stream of tokens • Actor input port: consumes T tokens per firing • Actor output port: produces T tokens per firing • Actors fire granularity (G) times per iteration • Designer control • T at each port • G of each actor • Dimensions (X) of token traversing arcs
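The firing rule above can be sketched in a few lines. This is a minimal illustration, not Gedae's implementation: the `Actor` class, its fields, and the doubling operation are all hypothetical, chosen only to show T tokens consumed/produced per firing and G firings per iteration.

```python
# Hypothetical sketch of a synchronous dataflow actor firing.
from collections import deque

class Actor:
    def __init__(self, name, t_in, t_out, granularity):
        self.name = name
        self.t_in = t_in                # T: tokens consumed per firing
        self.t_out = t_out              # T: tokens produced per firing
        self.granularity = granularity  # G: firings per iteration

    def fire_iteration(self, in_arc, out_arc):
        """Run one iteration: fire G times, moving T tokens per firing."""
        for _ in range(self.granularity):
            if len(in_arc) < self.t_in:
                raise RuntimeError(f"{self.name}: not enough tokens to fire")
            tokens = [in_arc.popleft() for _ in range(self.t_in)]
            # Placeholder computation: double each token.
            for tok in tokens[: self.t_out]:
                out_arc.append(2 * tok)

arc_in = deque(range(8))   # arc carrying a stream of tokens
arc_out = deque()
Actor("scale", t_in=2, t_out=2, granularity=4).fire_iteration(arc_in, arc_out)
print(list(arc_out))  # [0, 2, 4, 6, 8, 10, 12, 14]
```

With T = 2 and G = 4, one iteration drains all eight input tokens, matching the balance condition an SDF scheduler would compute.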

  10. Threshold Optimisation • Run-time overheads: • Inter-processor communication • Dynamic scheduling • Actor firing overheads • Solution: threshold multiplication • Threshold Multiplication • Enhanced run-time performance • Higher memory requirements
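The threshold-multiplication trade-off can be illustrated with a toy cost model (all numbers and the helper function here are hypothetical, not measurements from the talk): scaling the firing threshold T by a factor k divides the number of firings, and hence the per-firing overheads (IPC, dynamic scheduling), by k, while the arc buffer must hold k times as many tokens.

```python
# Hypothetical cost model for threshold multiplication.
def firing_cost(total_tokens, threshold, per_firing_overhead):
    """Return (firings, total overhead, buffer size in tokens)."""
    firings = total_tokens // threshold
    return firings, firings * per_firing_overhead, threshold

# Baseline: T = 8; threshold-multiplied: T = 8 * 4.
base = firing_cost(1024, 8, per_firing_overhead=50)
mult = firing_cost(1024, 8 * 4, per_firing_overhead=50)
print(base)  # (128, 6400, 8):  128 firings, full overhead, small buffer
print(mult)  # (32, 1600, 32):  4x fewer firings, 4x larger buffer
```

The overhead falls by the multiplication factor while the memory requirement grows by the same factor, which is exactly the enhanced run-time performance vs. higher memory trade noted above.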

  11. Granularity Optimisation • Sub-scheduling • Break N firings in one execution into one firing in each of N executions • Granularity factorisation • Granularity Scaling • Execute in smaller memory • Higher run-time overheads
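The granularity trade can be sketched the same way (again with a hypothetical cost model and made-up numbers): factoring G firings out of one schedule execution into G separate executions shrinks the working memory by a factor of G but pays the execution overhead G times.

```python
# Hypothetical cost model for granularity factorisation / scaling.
def schedule_costs(firings, granularity, tokens_per_firing, exec_overhead):
    """Return (working memory in tokens, total execution overhead)."""
    executions = firings // granularity
    memory = granularity * tokens_per_firing  # tokens buffered per execution
    return memory, executions * exec_overhead

coarse = schedule_costs(64, granularity=64, tokens_per_firing=4, exec_overhead=100)
fine = schedule_costs(64, granularity=1, tokens_per_firing=4, exec_overhead=100)
print(coarse)  # (256, 100):  large memory, overhead paid once
print(fine)    # (4, 6400):   small memory, overhead paid 64 times
```

This is the "execute in smaller memory" vs. "higher run-time overheads" balance the slide describes, the mirror image of threshold multiplication.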

  12. Hardware Design Ethos • Parallelise on the p dimension of m2. • No. processors v. input matrix dimensions • Regular and parameterisable trade-off • Represents powerful trade-off when enabled in DFG • Enabled from GEDAE

  13. Multidimensional Array SDF • Complements MSDF with variable-size actor families – defined by the processing graph method and used in Gedae • Varying the y parameter trades off the number of m_mult operations against the token dimensions of each • MASDF actor family → family of pipelined hardware components with variable token dimensions
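The family trade-off can be made concrete with a small sketch (the helper name and partitioning scheme are illustrative assumptions, not the tool's actual algorithm): a y-member family of m_mult actors splits the work, so increasing y means more, smaller operations with smaller token dimensions per member.

```python
# Hypothetical sketch of how a y-member MASDF actor family partitions work.
def family_slices(matrix_cols, y):
    """Partition p matrix columns across a family of y actor instances."""
    base, extra = divmod(matrix_cols, y)
    return [base + (1 if i < extra else 0) for i in range(y)]

print(family_slices(8, 1))  # [8]:           one actor, 8-column tokens
print(family_slices(8, 4))  # [2, 2, 2, 2]:  four actors, 2-column tokens
```

One graph parameter (y) thus steers both the number of operations and the token dimensions each family member sees.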

  14. Multi-iteration Token Processing • Ta composed of a family of base tokens Tb • Each family child consumed in an invocation of the actor • Different behaviour of the actor over multiple firings • Cyclic dataflow

  15. MASDF Actor Sharing • m_mult is now a cyclo-static operator • On the ith firing • T tokens consumed from the ith input child port • T tokens produced on the ith output child port
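This cyclo-static firing pattern is easy to sketch (the function, port layout, and increment operation are hypothetical placeholders): one shared actor cycles through its child ports, handling a different family member on each firing.

```python
# Hypothetical sketch of a cyclo-static shared actor: the i-th firing
# consumes T tokens from the i-th child input port and produces T tokens
# on the i-th child output port.
from collections import deque

def run_cyclostatic(in_ports, t, firings, op=lambda x: x + 1):
    n = len(in_ports)
    out_ports = [deque() for _ in range(n)]
    for firing in range(firings):
        i = firing % n  # i-th firing -> i-th child port, cycling
        tokens = [in_ports[i].popleft() for _ in range(t)]
        out_ports[i].extend(op(tok) for tok in tokens)
    return out_ports

ins = [deque([10, 20]), deque([30, 40])]  # two child input ports
outs = run_cyclostatic(ins, t=1, firings=4)
print([list(q) for q in outs])  # [[11, 21], [31, 41]]
```

The one physical operator thus services the whole actor family over successive firings, which is what lets a single hardware core be time-shared.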

  16. SFO Structure • Control and Communications Wrapper: implements the cyclic schedule, switching data into and out of the central WBC unit • Parameter Bank: local storage for core parameters, e.g. tap weights • White Box Component: flexible pipelined core configurable for various token sizes

  17. NLF Design Example • 8-stage NLF, 8 element vector tokens • Base token scalar • Only manipulating the y graph parameter • SFG architectural synthesis only capable of retiming • Smallest supporting Virtex-II Pro family member

  18. NLF SFG • Primitive (lowest-level) components are single-stage pipelined: • Adders: programmable CLBs • Multipliers: embedded Mult18x18s

  19. WBC Inefficiency • New input sample every 4 clock cycles

  20. NLF Core Design • Results • Virtex-II Pro target device • Factor 3.9 increase in SFO throughput for no extra hardware • Order of magnitude reduction in required device size • All enabled by altering one parameter (y) on the DFG

  21. Conclusions • FPGA is viewed as a hardware resource • Using existing functionality (IP cores) is a key aspect of the design process • Key is to represent this at the system level • Restrictions • Streaming based • Fixed hardware target platform • Reconfiguration • Specifically more suitable “reconfigurable” hardware is needed • Clear need to emphasise reconfiguration in design flow • Reconfiguration mux (Imperial College)

  22. And finally….. • Acknowledgements Ying Yi J-P Heron Richard Turner Gaye Lightbody David Trainor Scott Fischaber Eoin Malins Tim Courtney Lok Kee Ting Sakir Sezer • Thanks for the invitation – great fun!
