
R. Arce-Nazario, M. Jimenez, and D. Rodriguez Electrical and Computer Engineering

Partitioning of Discrete Signal Transforms for Distributed Hardware Architectures. R. Arce-Nazario, M. Jimenez, and D. Rodriguez, Electrical and Computer Engineering, University of Puerto Rico – Mayagüez. Motivation and Objective. Discrete Signal Transforms (DSTs)


Presentation Transcript


  1. Partitioning of Discrete Signal Transforms for Distributed Hardware Architectures
     R. Arce-Nazario, M. Jimenez, and D. Rodriguez
     Electrical and Computer Engineering, University of Puerto Rico – Mayagüez

  2. Motivation and Objective
     • Discrete Signal Transforms (DSTs)
     • DFT, DCT; many applications
     • Hardware-accelerated, but at a high area cost
     • Distributed (dedicated) hardware architectures (DHAs)
     • Cost-effective
     • Partitioning plays a key role
     [Figure: diagram linking DST, partitioning, and DHA]
     • Objective: use inherent properties of DSTs to improve their hardware partitioning onto distributed hardware architectures.

  3. Previous Work
     • Automated partitioning of DSTs to DHAs
     • DSTs treated as any other algorithm/benchmark [Srinivasan01][Bringmann00]
     • Converted to a high-level or structural DFG and treated as such
     • Manual partitioning and automated code generation
     • DST-specific properties exploited [Kumhom01]
     • New formulations developed to exploit architectural features [VanLoan92]
     • SPIRAL and FFTW: code generation platforms that explore the space of equivalent algorithms [Pueschel05][Frigo05]
     • [Arce05]: automated partitioning methodology that incorporates DST features and formulation exploration

  4. Partitioning Methodology
     [Figure: methodology flow diagram. Labels: Architectural Description, KPA DST Formulation, Hypergraph Representation, KPA Formulation, Formulation Manipulator, Formulation to DFG, DFG, Rule Selection, Cost and Indicators, Heuristic Control, Partition/Placement, Estimators, High-level partition solution]

  5. DSTs – General Concepts
     • General formula for a d-dimensional DST [equation missing from transcript]
     • The α's determine the type of transform, e.g. the DFT [equation missing from transcript]
     • Essentially a vector-matrix multiplication
     • Fast versions exist, using divide-and-conquer techniques
     • Highly regular
     • Highly connected
     • Rules can be applied at the formulation level: permutation, index-set, ...
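The "essentially a vector-matrix multiplication" point can be illustrated with a minimal sketch for the DFT. The slide's own formula did not survive in the transcript, so the kernel below uses the common convention w = exp(-2πi/n); that sign convention and the helper names `dft_matrix`, `mat_vec`, and `dft` are assumptions made for illustration.

```python
import cmath


def dft_matrix(n):
    """Return the n x n DFT matrix with entries w**(j*k), w = exp(-2*pi*i/n).

    The sign convention is an assumption; the slide's formula is not available.
    """
    w = cmath.exp(-2j * cmath.pi / n)
    return [[w ** (j * k) for k in range(n)] for j in range(n)]


def mat_vec(matrix, vector):
    """Plain dense matrix-vector product: O(n^2) multiply-adds."""
    return [sum(a * x for a, x in zip(row, vector)) for row in matrix]


def dft(vector):
    """Compute the DFT of a sequence directly as a matrix-vector product."""
    return mat_vec(dft_matrix(len(vector)), vector)


# A unit impulse transforms to a constant spectrum (every entry equals 1,
# up to floating-point rounding).
spectrum = dft([1, 0, 0, 0])
```

The direct product costs O(n²) operations, which is what the fast, divide-and-conquer versions mentioned on the slide avoid.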

  6. Kronecker Algebra
     • Compact framework for the formulation of DSTs
     • Multidimensional, e.g. [equation missing from transcript]
     • Fast versions of DSTs
     • Governed by well-known rules and properties
     • Formulation 'implies' structure
     [Figure: diagrams with blocks labeled F2, F4, and W]
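As one concrete instance of a fast DST expressed in Kronecker form, the sketch below checks the standard Cooley-Tukey rule F_{rs} = (F_r ⊗ I_s) · T · (I_r ⊗ F_s) · L numerically, where T is a diagonal twiddle matrix and L a stride permutation. The transcript does not show which formula the slide displayed, so this is the textbook rule rather than necessarily the authors' notation; all function names are illustrative.

```python
import cmath


def dft_matrix(n):
    """n x n DFT matrix, w = exp(-2*pi*i/n) (assumed sign convention)."""
    w = cmath.exp(-2j * cmath.pi / n)
    return [[w ** (j * k) for k in range(n)] for j in range(n)]


def identity(n):
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]


def kron(a, b):
    """Kronecker product of two dense matrices given as lists of rows."""
    return [[x * y for x in row_a for y in row_b] for row_a in a for row_b in b]


def matmul(a, b):
    cols_b = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b] for row in a]


def twiddle(r, s):
    """Diagonal matrix whose entry at position j*s + k is w_n**(j*k), n = r*s."""
    n = r * s
    w = cmath.exp(-2j * cmath.pi / n)
    diag = [w ** (j * k) for j in range(r) for k in range(s)]
    return [[diag[i] if i == j else 0 for j in range(n)] for i in range(n)]


def stride_permutation(r, s):
    """Permutation that gathers the input at stride r: output j*s + i = input i*r + j."""
    n = r * s
    order = [i * r + j for j in range(r) for i in range(s)]
    return [[1 if col == src else 0 for col in range(n)] for src in order]


def cooley_tukey(r, s):
    """Build F_{rs} from F_r and F_s using the Kronecker-form Cooley-Tukey rule."""
    factors = [
        kron(dft_matrix(r), identity(s)),
        twiddle(r, s),
        kron(identity(r), dft_matrix(s)),
        stride_permutation(r, s),
    ]
    product = factors[0]
    for factor in factors[1:]:
        product = matmul(product, factor)
    return product


def max_abs_diff(a, b):
    return max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))
```

Each factor corresponds to one stage of a dataflow graph, which is one reading of the slide's remark that the formulation implies structure: `kron(identity(r), dft_matrix(s))` is r independent size-s transforms, and the permutation is pure wiring.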

  7. Target topology
     [Figure: blocks M0, M1, ..., Mk-1 and D0, D1, ..., Dk-1 connected to a crossbar]
     • Similar to existing platforms in the market and academia
     • Annapolis Micro Systems (Wildforce)
     • Gidel (PROC20KE)
     • Berkeley Emulation Engine (BEE): being proposed as a cost-effective alternative to traditional high-performance computing systems

  8. Partitioning Methodology
     [Figure: the methodology flow diagram from slide 4, repeated]

  9. DST properties in our methodology
     • Incorporated graph considerations into the partitioning/placement process
     • Exploration of equivalent formulations
     [Figure: Partition/Placement block of the methodology diagram]

  10. Graph partitioning considerations
      [Figure: blocks M0, M1, ..., Mk-1 and D0, D1, ..., Dk-1 connected to a crossbar. Labels: Kernighan-Lin bipartitioning; heterogeneous-channel k-way partitioning]
      • Focus on horizontal partitioning schemes (SIMD-like implementation)
      • Initial solution = balanced horizontal linear partitioning
      • Scheduling consideration: swap nodes from the same computational stage
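The last two bullets can be sketched under a simplifying assumption: the DFG is modeled as a grid of nodes `(stage, row)`, as in a butterfly diagram, and a horizontal partition assigns contiguous bands of rows to devices. The grid model and the names `initial_partition`, `legal_swap`, and `swap` are hypothetical and not taken from the authors' tool.

```python
def initial_partition(num_stages, num_rows, num_devices):
    """Balanced horizontal linear partition.

    Node (stage, row) goes to the device that owns the contiguous band of
    rows containing `row`, so every stage is split the same way.
    """
    return {
        (stage, row): row * num_devices // num_rows
        for stage in range(num_stages)
        for row in range(num_rows)
    }


def legal_swap(node_a, node_b):
    """Scheduling consideration: only nodes of the same stage may be swapped."""
    return node_a[0] == node_b[0]


def swap(assignment, node_a, node_b):
    """Return a new assignment with the devices of two same-stage nodes exchanged."""
    if not legal_swap(node_a, node_b):
        raise ValueError("nodes belong to different computational stages")
    updated = dict(assignment)
    updated[node_a], updated[node_b] = assignment[node_b], assignment[node_a]
    return updated


# Example: 3 stages of 8 nodes on 4 devices; each device starts with 6 nodes.
partition = initial_partition(3, 8, 4)
moved = swap(partition, (1, 0), (1, 7))
```

Because a swap exchanges two nodes of one stage, the per-device node count of every stage is preserved, so the balance of the initial solution survives any sequence of legal moves.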

  11. Formulation exploration
      • The number of possible reformulations grows exponentially with DST size
      • Before designing a heuristic control method, first answer two questions:
      • Do reformulations have an effect on solution quality?
      • How can we effectively explore the space of equivalent formulations to find more apt formulations?
      • Experiments: gain an understanding of algorithmic-level effects on solution quality and convergence
      • Formulation Manipulator: applies permutation and factorization to the Kronecker formulation of DSTs to obtain equivalent formulations
      [Figure: excerpt of the methodology diagram. Labels: KPA Formulation, Rule, Formulation Manipulator, Formulation to DFG, DFG, Rule Selection, Cost and Indicators, Heuristic Control, Partition/Placement]

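  12. Measuring quality of solution
      • Cost measure [equation missing from transcript], where the symbols denote the 'weight' of channel i and the required communications through channel i
      • Example: W01 = W12 = W23 = 1, WXBAR = 2
      [Figure: devices D0, D1, D2, D3, shown twice]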
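Since the cost equation itself is missing, the sketch below assumes the simplest reading of the legend: cost is the sum over channels of channel weight times the communication routed through that channel. The weights are the ones from the slide's example; the routing rule (neighboring devices use their direct link, everything else uses the crossbar) and all function names are assumptions for illustration.

```python
# Channel weights from the slide's example: three direct links and the crossbar.
WEIGHTS = {"01": 1, "12": 1, "23": 1, "XBAR": 2}


def channel_for(src, dst):
    """Assumed routing: neighbors use their direct link, all others the crossbar."""
    lo, hi = sorted((src, dst))
    return f"{lo}{hi}" if hi - lo == 1 else "XBAR"


def channel_traffic(transfers):
    """Accumulate the communication routed through each channel.

    `transfers` is an iterable of (source device, destination device, amount).
    Transfers that stay on one device use no channel.
    """
    traffic = {name: 0 for name in WEIGHTS}
    for src, dst, amount in transfers:
        if src != dst:
            traffic[channel_for(src, dst)] += amount
    return traffic


def partition_cost(transfers, weights=WEIGHTS):
    """Assumed cost: sum over channels of weight * required communication."""
    traffic = channel_traffic(transfers)
    return sum(weights[name] * traffic[name] for name in weights)


# 4 units over link 01, 4 over link 12, 2 over the crossbar:
# 1*4 + 1*4 + 2*2 = 12
example_cost = partition_cost([(0, 1, 4), (1, 2, 4), (0, 3, 2)])
```

Under this reading, a partition that keeps communicating nodes on neighboring devices is cheaper than one that sends the same traffic through the crossbar.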

  13. Experiment #1 – Inter-stage permutations
      • Since Cooley-Tukey's FFT, several common formulations have become available. Pease formulation shown here [equation missing from transcript]
      • Experiment: several sizes of 5 common formulations were partitioned
      • Inter-stage permutations (ISPs) have an effect on solution quality, yet there is no clear winning formulation
      [Figure: results for the Stockham, Tr. Stockham, Cooley-Tukey, Gentleman-Sande, and Pease formulations]

  14. Experiment #2 – Granularity
      [Figure: the weight of the nodes for the various computational stages of the transform, ranging from finer to coarser granularity]

  15. Experiment #2 – Granularity
      • Decomposition rules: large DST = combinations of smaller DSTs, analogous to node clustering
      • Effect of topology, ring vs. linear: 57% cost reduction
      • Finest granularity is not necessarily best
      * Multiple formulations achieved the best cost. The coarsest granularity is shown.

  16. Experiment #3 – Breakdown strategy
      • Breakdown strategy: the order and divisors with which a transform is decomposed
      • Split trees: a common graphical representation of a breakdown strategy
      • Example: two split trees for a DFT of size 64
      [Figure: split trees (a) and (b)]
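A split tree can be sketched as a nested tuple, and all split trees of a given size enumerated recursively. The sketch assumes binary splits n = r · s and treats only prime sizes as leaves; real breakdown strategies may stop at larger base cases, so the counts below hold for this model only. The names `factor_pairs` and `split_trees` are illustrative.

```python
from functools import lru_cache


def factor_pairs(n):
    """Ordered pairs (r, s) with r * s == n and both factors greater than 1."""
    return [(r, n // r) for r in range(2, n) if n % r == 0]


@lru_cache(maxsize=None)
def split_trees(n):
    """Return all split trees of size n.

    A leaf is the integer n itself; an inner node is the tuple
    (n, left_subtree, right_subtree). Assumption: a node is a leaf only
    when n is prime, so every composite size is always split.
    """
    pairs = factor_pairs(n)
    if not pairs:
        return (n,)
    return tuple(
        (n, left, right)
        for r, s in pairs
        for left in split_trees(r)
        for right in split_trees(s)
    )


# Under this model a DFT of size 64 has 42 split trees, for example
# (64, (8, 2, (4, 2, 2)), (8, (4, 2, 2), 2)).
trees_64 = split_trees(64)
```

For sizes 16, 64, and 256 this model yields 5, 42, and 429 trees, which gives a sense of how quickly the space of breakdown strategies grows with transform size.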

  17. Experiment #3 – Results
      • Procedure
          • Exhaustive generation of split trees for DFT sizes n = 16 to 256
          • Formulations partitioned for various topologies
          • Observation of split-tree decisions that lead to 'partition-friendly' formulations
          • Generation of n > 256 formulations using rules

  18. Conclusions and Future Work
      • Methodology for partitioning of DSTs to DHAs:
          • DST graph considerations
          • Formulation exploration
      • Graph considerations
          • Generating a linear initial partition provides better results than a random one
          • Limiting node moves gives faster convergence
      • Exploration at the algorithmic level: experiments
          • Isolated features such as permutations and granularity
          • An effect was observed, but it is hard to establish a relation to solution quality
          • Coarse granularity = better convergence, good solution quality
          • Breakdown strategy: 'partition-friendly' formulations generated
      • Current work:
          • Experimentation with DCTs
          • Experimentation with other properties, to define an overall exploration strategy

  19. Acknowledgements
      • Puerto Rico Experimental Program to Stimulate Competitive Research (PR-EPSCoR)
      • WALSAIP - Wide-Area Large Scale Automated Information Project
      • Puerto Rico NASA Space Grant
      QUESTIONS?
