A Fast Fourier Transform Compiler Silvio D Carnevali
Contents • FFTW and genfft: an introduction • genfft: how it works 1.) DAG creation 2.) Simplifier 3.) Scheduler 4.) Unparsing • Conclusion: similar applications
genfft • Special-purpose compiler • Written in Objective Caml • Produces DFT subroutines • Outputs C code • Parameterized according to: - Input length - Data type
FFTW • Collection of “codelets” • Codelets: fragments of C code • Generated by genfft • Plan: optimal composition of codelets - Depends on input size and hardware - Selected automatically by FFTW (FJ98)
Performance of FFTW [Figure: benchmark plots in two panels, "Powers of 2" and "Any powers of 2, 3, 5, 7"]
genfft: creation of the codelet’s DAG • Nodes are values of a data type - Encode arithmetic expressions - Use real numbers for C compatibility • Generic node = operator • Children = operands • The algorithm used to build the DAG depends on the input size
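The node representation described above can be sketched in a few lines. This is a minimal illustration in Python, not genfft's own data type; the names `Node`, `load`, `complex_mul`, and `evaluate` are assumptions made for the example. It shows how a complex multiplication is expressed with real-valued nodes only, as the slide's "use real numbers" bullet suggests.

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass(frozen=True)
class Node:
    op: str                          # "num", "load", "add", "sub", "mul"
    args: Tuple["Node", ...] = ()    # children = operands
    value: float = 0.0               # only used by "num"
    name: str = ""                   # only used by "load"

def num(v): return Node("num", value=float(v))
def load(name): return Node("load", name=name)
def add(a, b): return Node("add", (a, b))
def sub(a, b): return Node("sub", (a, b))
def mul(a, b): return Node("mul", (a, b))

def complex_mul(ar, ai, br, bi):
    """(ar + i*ai) * (br + i*bi), expressed with real-valued nodes only."""
    return sub(mul(ar, br), mul(ai, bi)), add(mul(ar, bi), mul(ai, br))

def evaluate(node, env):
    """Evaluate an expression given values for the loaded variables."""
    if node.op == "num":
        return node.value
    if node.op == "load":
        return env[node.name]
    a, b = (evaluate(c, env) for c in node.args)
    return {"add": a + b, "sub": a - b, "mul": a * b}[node.op]

re, im = complex_mul(load("ar"), load("ai"), load("br"), load("bi"))
env = {"ar": 1.0, "ai": 2.0, "br": 3.0, "bi": 4.0}
print(evaluate(re, env), evaluate(im, env))   # real and imaginary part of (1+2i)(3+4i)
```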
FT Equation • X = input vector • Y = FT of X • w_n = primitive nth root of unity
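With these symbols, the DFT is conventionally written Y[i] = sum over j = 0..n-1 of X[j] * w_n^(-i*j), with w_n = exp(2*pi*sqrt(-1)/n). The sign of the exponent is a convention; the negative sign assumed here matches FFTW's forward transform. A direct O(n^2) evaluation of that definition, useful as a reference when checking generated code, might look like this (`naive_dft` is a name chosen for the example):

```python
import cmath

def naive_dft(x):
    """Evaluate the DFT definition directly, in O(n^2) operations."""
    n = len(x)
    w = cmath.exp(2j * cmath.pi / n)   # primitive n-th root of unity
    return [sum(x[j] * w ** (-i * j) for j in range(n)) for i in range(n)]

print(naive_dft([1, 0, 0, 0]))   # a unit impulse transforms to a constant vector
```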
genfft: DAG Simplifier • Bottom-up traversal of the DAG • Local improvements: - Algebraic transformations (constant folding, simplification of + and *) - Common-subexpression elimination (CSE): eliminates existing common subexpressions and creates new ones - DFT-specific improvements
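One common way to get common-subexpression elimination during a bottom-up pass is to memoize node construction, so that building the same expression twice returns the same node. The sketch below illustrates that idea only; it is not taken from genfft, and `DagBuilder` is a hypothetical name.

```python
class DagBuilder:
    """Builds a DAG in which structurally equal expressions share one node."""

    def __init__(self):
        self._table = {}   # (op, operands...) -> node id
        self.nodes = []    # node id -> (op, operands...)

    def node(self, op, *args):
        if op in ("add", "mul"):          # commutative: canonical operand order
            args = tuple(sorted(args))
        key = (op,) + tuple(args)
        if key not in self._table:        # first time we see this expression
            self._table[key] = len(self.nodes)
            self.nodes.append(key)
        return self._table[key]

b = DagBuilder()
x, y = b.node("load", "x"), b.node("load", "y")
s1 = b.node("add", x, y)
s2 = b.node("add", y, x)      # same expression, so the same node id comes back
p = b.node("mul", s1, s2)
print(s1 == s2, len(b.nodes))
```

Because sharing happens at construction time, any rewrite that produces an already-known expression is merged with it automatically.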
Algebraic transformations • Simplify multiplication by 1, 0, or -1 • Simplify addition of 0 • Distributivity: kx + ky = k(x + y)
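The rules above can be sketched as a small bottom-up rewriter. Expressions are plain tuples here, such as `("num", 3)`, `("var", "x")`, `("add", a, b)`, and `("mul", a, b)`; this encoding and the function name `simplify` are assumptions made for the example, not genfft's representation.

```python
def simplify(e):
    """Bottom-up: simplify the children first, then apply local rules."""
    if e[0] in ("num", "var"):
        return e
    if e[0] == "neg":
        return ("neg", simplify(e[1]))
    op, a, b = e[0], simplify(e[1]), simplify(e[2])
    if a[0] == "num" and b[0] == "num":              # constant folding
        return ("num", a[1] + b[1] if op == "add" else a[1] * b[1])
    if op == "mul":
        if b[0] == "num":                            # keep the constant on the left
            a, b = b, a
        if a == ("num", 0):
            return ("num", 0)
        if a == ("num", 1):
            return b
        if a == ("num", -1):
            return ("neg", b)
        return ("mul", a, b)
    # op == "add"
    if a == ("num", 0):
        return b
    if b == ("num", 0):
        return a
    if a[0] == "mul" and b[0] == "mul" and a[1] == b[1] and a[1][0] == "num":
        return simplify(("mul", a[1], ("add", a[2], b[2])))   # kx + ky -> k(x + y)
    return ("add", a, b)

x, y = ("var", "x"), ("var", "y")
print(simplify(("add", ("mul", ("num", 3), x), ("mul", ("num", 3), y))))
```

The distributivity rule trades two multiplications for one, which is why it appears among the simplifications.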
DFT-Specific improvements • Numeric constants made positive (local) - Constants generally occur in pairs k and -k - Reduces the number of loads • DAG transposition (for linear functions) - Simplify the DAG, transpose and simplify, transpose and simplify again - Reduces the number of multiplications only
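The "constants made positive" rule can be illustrated as follows. When both k and -k occur, rewriting (-k)*x as -(k*x) and folding the negation into a subtraction leaves a single constant to load. The tuple encoding and the names `make_positive` and `constants` are assumptions for this sketch, which also assumes constants sit in the left operand of a multiplication.

```python
def make_positive(e):
    """Rewrite (-k)*x as -(k*x), then turn a + (-b) into a - b."""
    if e[0] in ("num", "var"):
        return e
    args = [make_positive(a) for a in e[1:]]
    if e[0] == "mul" and args[0][0] == "num" and args[0][1] < 0:
        return ("neg", ("mul", ("num", -args[0][1]), args[1]))
    if e[0] == "add" and args[1][0] == "neg":
        return ("sub", args[0], args[1][1])
    return (e[0], *args)

def constants(e):
    """Set of distinct numeric constants an expression has to load."""
    if e[0] == "num":
        return {e[1]}
    if e[0] == "var":
        return set()
    return set().union(*(constants(a) for a in e[1:]))

expr = ("add", ("mul", ("num", 0.5), ("var", "x")),
               ("mul", ("num", -0.5), ("var", "y")))
print(constants(expr), constants(make_positive(expr)))
```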
DFT-Specific improvements [Figure: example DAG with nodes X, Y, A, B and edge labels 2, 3, 4, 5, shown in three panels; flow: DAG D, Simplify, DAG E, Transpose, DAG E^T, Simplify, DAG F^T, Transpose, DAG F, Simplify]
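Transposition of a linear DAG amounts to reversing every edge and swapping the roles of inputs and outputs; the transposed network then computes the transposed matrix. The sketch below checks this on a small weighted network. The wiring (an intermediate node T between X, Y and A, B) is a made-up example, not the DAG from the slide's figure, and the function names are hypothetical.

```python
def transfer_matrix(edges, inputs, outputs):
    """Entry [o][i] is the sum over all paths i -> o of the product of edge weights."""
    incoming = {}
    for src, dst, w in edges:
        incoming.setdefault(dst, []).append((src, w))

    def gain(src, dst):
        if src == dst:
            return 1
        return sum(w * gain(src, mid) for mid, w in incoming.get(dst, []))

    return [[gain(i, o) for i in inputs] for o in outputs]

def transpose(edges):
    """Reverse every edge, keeping its weight."""
    return [(dst, src, w) for src, dst, w in edges]

# Hypothetical network: T = 2*X + 3*Y, A = 4*T, B = 5*T
edges = [("X", "T", 2), ("Y", "T", 3), ("T", "A", 4), ("T", "B", 5)]
m = transfer_matrix(edges, ["X", "Y"], ["A", "B"])
mt = transfer_matrix(transpose(edges), ["A", "B"], ["X", "Y"])
print(m, mt)
```

Since simplification rules are applied to whatever structure the DAG has, transposing can expose opportunities that were not visible in the original orientation.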
genfft: DAG Scheduler • Goal: minimize register usage • No instruction scheduling • Recursively partitions the DAG in two - Register mapping - Optimal for n = 2^k - Partitioning heuristics • Optimality? Not for n != 2^k
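To see why the order of a schedule matters for register usage, the toy measurement below counts how many values are alive at once under two topological orders of the same DAG. It does not implement genfft's partitioning; it only illustrates the objective, and `max_live` and the example DAG are assumptions. The second order finishes one half of the DAG before starting the other, which is the effect a recursive bisection aims for.

```python
def max_live(schedule, deps):
    """Peak number of values that are computed but still awaited by a later node."""
    remaining = {n: 0 for n in deps}          # pending uses of each value
    for ops in deps.values():
        for o in ops:
            remaining[o] += 1
    live, peak = set(), 0
    for n in schedule:
        assert all(o in live for o in deps[n]), "not a topological order"
        if remaining[n] > 0:
            live.add(n)
        peak = max(peak, len(live))           # operands and result alive together
        for o in deps[n]:
            remaining[o] -= 1
            if remaining[o] == 0:             # last use: the register is free again
                live.discard(o)
    return peak

# y = (x0 + x1) + (x2 + x3); nodes with no operands are loads
deps = {"x0": [], "x1": [], "x2": [], "x3": [],
        "p": ["x0", "x1"], "q": ["x2", "x3"], "y": ["p", "q"]}
level_by_level = ["x0", "x1", "x2", "x3", "p", "q", "y"]
half_by_half = ["x0", "x1", "p", "x2", "x3", "q", "y"]
print(max_live(level_by_level, deps), max_live(half_by_half, deps))
```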
genfft: Unparsing • The schedule is unparsed to C • Pipeline usage is managed by the C compiler • genfft + C compiler: performance problems - egcs “optimizer”
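Unparsing is essentially a walk over the schedule that prints one C statement per node. The sketch below assumes a hypothetical schedule format of `(dest, op, lhs, rhs)` tuples and a made-up function signature; real FFTW codelets have a different interface.

```python
def unparse(schedule, name="codelet"):
    """Emit straight-line C code for a schedule of (dest, op, lhs, rhs) entries."""
    lines = [f"void {name}(const double *in, double *out)", "{"]
    for dest, op, lhs, rhs in schedule:
        # temporaries are declared where they are defined; outputs are stored
        decl = "" if dest.startswith("out[") else "const double "
        lines.append(f"    {decl}{dest} = {lhs} {op} {rhs};")
    lines.append("}")
    return "\n".join(lines)

# Outputs 0 and 2 of a size-4 DFT on real input
schedule = [
    ("t0", "+", "in[0]", "in[2]"),
    ("t1", "+", "in[1]", "in[3]"),
    ("out[0]", "+", "t0", "t1"),
    ("out[2]", "-", "t0", "t1"),
]
code = unparse(schedule, "dft4_even_outputs")
print(code)
```

Everything after this point (register allocation, instruction scheduling) is left to the C compiler, which is why the quality of that compiler shows up in the measured performance.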
Conclusion & future work • FFTW: The best of the best of the best… • Over 100 downloads every week! • genfft: specialized for linear functions - Crystallographic FT - FIR & IIR filters - Image processing (JPEG discrete cosine transform)