Reconfigurable Computing (EN2911X, Fall07) Lecture 16: Application-Driven

Reconfigurable Computing (EN2911X, Fall07) Lecture 16: Application-Driven Hardware Acceleration (1/4) Prof. Sherief Reda Division of Engineering, Brown University http://ic.engin.brown.edu

Fast Fourier transform
• One of the most important subroutines in scientific computing
• Used in many applications, including signal and image processing, solution of differential equations, multiplication of polynomial functions, and data compression
• One of the most widely implemented hardware accelerators

Discrete Fourier transform (DFT)
Maps a set of input points to another set of output points. The operation is reversible.

Roots of unity
• What are the Nth roots of unity?
• If N = 8 then we have
[Figure: unit circle in the complex plane with real and imaginary axes, marking the points (1, 0), (0, j), (-1, 0), and (0, -j)]
Define
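
The eight roots for N = 8 can be listed numerically. The following is a minimal sketch; the slide's own definition did not survive extraction, so the formula e^(2πjk/N) and the counterclockwise ordering used here are assumptions, and `roots_of_unity` is a hypothetical helper name.

```python
import cmath

def roots_of_unity(n):
    """Return the n complex nth roots of unity, e^(2*pi*j*k/n) for k = 0..n-1.

    Assumption: counterclockwise ordering starting at (1, 0).
    """
    return [cmath.exp(2j * cmath.pi * k / n) for k in range(n)]

roots = roots_of_unity(8)
for k, w in enumerate(roots):
    # Each root lies on the unit circle; print (real, imaginary) parts.
    print(k, round(w.real, 3), round(w.imag, 3))
```

Every returned value w satisfies w^8 = 1 up to floating-point error, and the entries for k = 0, 2, 4, 6 are the four axis points marked in the figure.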

Calculating the DFT
How many arithmetic (+ and *) operations do we need to calculate the DFT?
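
A direct evaluation makes the operation count visible. The sketch below assumes the common signal-processing convention X_k = Σ x_t e^(-2πjkt/N) (the slide's formula was lost, so the sign of the exponent is an assumption); `naive_dft` is a hypothetical name.

```python
import cmath

def naive_dft(x):
    """Direct DFT: every output X_k is a sum over all N inputs."""
    n = len(x)
    out = []
    for k in range(n):          # N outputs ...
        acc = 0j
        for t in range(n):      # ... each needing N multiply-accumulate steps
            acc += x[t] * cmath.exp(-2j * cmath.pi * k * t / n)
        out.append(acc)
    return out

print(naive_dft([1, 1, 1, 1]))
```

The two nested loops perform on the order of N^2 complex multiplications and additions.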

Computing the DFT using the FFT
• How can we do better? Fast Fourier Transform (FFT)
• The sum of the N-point DFT has been broken into two N/2-point DFTs: the DFT of the even indices and the DFT of the odd indices
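
The even/odd split can be written out as a short derivation. This is a sketch under the assumed convention W_N = e^{-2πj/N}; the symbols E_k and O_k are introduced here for illustration and are not from the slide.

```latex
\begin{aligned}
X_k &= \sum_{n=0}^{N-1} x_n W_N^{kn}, \qquad W_N = e^{-2\pi j/N} \\
    &= \sum_{m=0}^{N/2-1} x_{2m} W_N^{2km} + \sum_{m=0}^{N/2-1} x_{2m+1} W_N^{k(2m+1)} \\
    &= \underbrace{\sum_{m=0}^{N/2-1} x_{2m} W_{N/2}^{km}}_{E_k}
       + W_N^{k} \underbrace{\sum_{m=0}^{N/2-1} x_{2m+1} W_{N/2}^{km}}_{O_k}, \\
X_{k+N/2} &= E_k - W_N^{k} O_k, \qquad k = 0, 1, \dots, N/2 - 1.
\end{aligned}
```

The third line uses W_N^{2km} = W_{N/2}^{km}; the last line uses W_N^{k+N/2} = -W_N^{k} and the fact that E_k and O_k repeat with period N/2.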

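Example when N=8
Objective: Compute X0, X1, …, X7 given x0, x1, …, x7
[Figure: two "magic box" blocks; the first takes the even-indexed inputs x0, x2, x4, x6 and the second takes the odd-indexed inputs x1, x3, x5, x7; the outputs are X0 through X7]
Note that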

Now let’s apply the idea recursively
[Figure: the same decomposition applied to each half; inputs in the order x0, x4, x2, x6, x1, x5, x3, x7; outputs X0 through X7]
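
The recursive idea can be sketched as a radix-2 decimation-in-time FFT. This is an illustrative sketch, not the lecture's own code: it assumes the input length is a power of two and the exponent sign e^(-2πjk/N); `fft` is a hypothetical name.

```python
import cmath

def fft(x):
    """Recursive radix-2 FFT. Assumes len(x) is a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2])   # N/2-point DFT of the even-indexed inputs
    odd = fft(x[1::2])    # N/2-point DFT of the odd-indexed inputs
    out = [0j] * n
    for k in range(n // 2):
        # Twiddle factor times the odd-half result, shared by two outputs.
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

print(fft([1, 0, 0, 0, 0, 0, 0, 0]))
```

Each level of recursion does O(N) work in the combining loop, and there are log2(N) levels.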

One more time
[Figure: fully decomposed 8-point FFT; inputs in the order x0, x4, x2, x6, x1, x5, x3, x7; outputs X0 through X7]
• How many operations do we need now?
• What is the execution time on a general-purpose CPU?
• What is the execution time on an FPGA? How many resources do you need?
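
One way to compare the two approaches is to count complex multiplications. The sketch below assumes a radix-2 FFT with log2(N) stages of N/2 butterflies and one twiddle multiplication per butterfly (trivial multiplications by 1 are counted too); both function names are hypothetical.

```python
def dft_multiplications(n):
    """Direct DFT: one multiplication per (output, input) pair."""
    return n * n

def fft_multiplications(n):
    """Radix-2 FFT, n a power of two: (n/2) butterflies in each of log2(n) stages."""
    stages = n.bit_length() - 1   # log2(n) for a power of two
    return (n // 2) * stages

for n in (8, 64, 1024):
    print(n, dft_multiplications(n), fft_multiplications(n))
```

On a CPU the butterflies run largely one after another, while on an FPGA the N/2 butterflies of a stage are independent and could, resources permitting, operate in parallel.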

Another way to visualize FFT computations
How can we determine the order of the first inputs?
[Figure: 8-point FFT drawn as a network of 12 butterfly units, with inputs in the order x0, x4, x2, x6, x1, x5, x3, x7 and outputs X0 through X7]
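
The input order in the figure matches the bit-reversal permutation: write each index in binary and reverse its bits. A minimal sketch, assuming N is a power of two; `bit_reverse` and `input_order` are hypothetical names.

```python
def bit_reverse(i, bits):
    """Reverse the lowest `bits` bits of the integer i."""
    r = 0
    for _ in range(bits):
        r = (r << 1) | (i & 1)   # shift the low bit of i into r
        i >>= 1
    return r

def input_order(n):
    """Index order of the first-stage inputs for an n-point radix-2 FFT."""
    bits = n.bit_length() - 1    # log2(n) for a power of two
    return [bit_reverse(i, bits) for i in range(n)]

print(input_order(8))  # [0, 4, 2, 6, 1, 5, 3, 7]
```

For example, index 3 is 011 in binary; reversed it is 110, which is 6, so x6 sits in position 3.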

Application of FFT: faster multiplication of two polynomials
• Suppose we want to evaluate A(x) at x0. How many operations do we need? Use Horner’s rule.
• Suppose you have two polynomials represented by their coefficient vectors.
• How many operations does it take to add these two polynomials?
• How many operations does it take to multiply these two polynomials?
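
Horner's rule evaluates a polynomial with one multiplication and one addition per coefficient. A minimal sketch, assuming coefficients are stored lowest degree first; `horner` is a hypothetical name.

```python
def horner(coeffs, x0):
    """Evaluate A(x0) where coeffs[i] is the coefficient of x**i.

    Computes a0 + x0*(a1 + x0*(a2 + ...)) from the inside out.
    """
    result = 0
    for a in reversed(coeffs):
        result = result * x0 + a
    return result

print(horner([1, 2, 3], 2))  # 1 + 2*2 + 3*4 = 17
```

Evaluating a polynomial with N coefficients this way takes O(N) operations.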

Point-value representation
A point-value representation of a polynomial A(x) of degree-bound N is a set of N point-value pairs {(x0, y0), (x1, y1), …, (xN-1, yN-1)} such that all of the xk are distinct and yk = A(xk) for k = 0, 1, …, N-1.
• How many operations do we need to compute the point representation of a polynomial?
• How can we do better?

Interpolation of polynomials from point-value representations
Given the point representation of a polynomial, how can we invert the evaluation, i.e., determine the coefficient form of the polynomial from its point representation? How can we find the a’s?
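
The slide's formula did not survive extraction. One standard way to state the problem, given here as an assumption about what was shown, is as a linear system with a Vandermonde matrix:

```latex
\begin{pmatrix}
1 & x_0 & x_0^2 & \cdots & x_0^{N-1} \\
1 & x_1 & x_1^2 & \cdots & x_1^{N-1} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
1 & x_{N-1} & x_{N-1}^2 & \cdots & x_{N-1}^{N-1}
\end{pmatrix}
\begin{pmatrix} a_0 \\ a_1 \\ \vdots \\ a_{N-1} \end{pmatrix}
=
\begin{pmatrix} y_0 \\ y_1 \\ \vdots \\ y_{N-1} \end{pmatrix}
```

Because the x_k are distinct, the matrix is invertible, so the coefficients a_0, …, a_{N-1} are uniquely determined by the N point-value pairs.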

Adding and multiplying polynomials in point representation
• If polynomial C(x) = A(x) + B(x), then we can get the point representation of C easily from the point representations of polynomial A and polynomial B. How many operations do we need?
• How about C(x) = A(x)*B(x)?

How can we convert a polynomial quickly from coefficient form to point-value and back?
• Ordinary multiplication: O(N^2)
• Evaluate: O(N^2)
• Point-wise multiplication: O(N)
• Interpolate: O(N^2)
The point-value route does not pay off yet. How can we evaluate and interpolate faster than O(N^2)? Can we choose the evaluation points smartly?

Choosing the evaluation points smartly

Finally, multiplying polynomials in O(N log N)
• Ordinary multiplication: O(N^2)
• FFT (evaluate): O(N log N)
• Point-wise multiplication: O(N)
• Inverse FFT (interpolate): O(N log N)
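
The three steps (FFT, point-wise multiplication, inverse FFT) can be sketched end to end. This is an illustrative sketch under stated assumptions: a recursive radix-2 FFT, zero-padding to a power of two, and integer coefficients so the result can be rounded; `fft` and `multiply` are hypothetical names.

```python
import cmath

def fft(a, invert=False):
    """Recursive radix-2 FFT; invert=True flips the exponent sign (unscaled)."""
    n = len(a)
    if n == 1:
        return list(a)
    sign = 1 if invert else -1
    even = fft(a[0::2], invert)
    odd = fft(a[1::2], invert)
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def multiply(a, b):
    """Multiply two integer-coefficient polynomials (lowest degree first)."""
    result_len = len(a) + len(b) - 1
    size = 1
    while size < result_len:      # pad so the product's coefficients all fit
        size *= 2
    fa = fft(list(a) + [0] * (size - len(a)))   # evaluate A at roots of unity
    fb = fft(list(b) + [0] * (size - len(b)))   # evaluate B at roots of unity
    fc = [p * q for p, q in zip(fa, fb)]        # point-wise multiplication
    c = fft(fc, invert=True)                    # interpolate
    return [round((v / size).real) for v in c[:result_len]]

print(multiply([1, 2], [3, 4]))  # (1 + 2x)(3 + 4x) = 3 + 10x + 8x^2
```

The rounding step is only valid for integer coefficients; for real-valued coefficients the `.real` part would be kept without rounding.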

Back to signal processing
Input signal (a0, a1, …, aN-1) applied to a linear system with impulse response (b0, b1, …, bN-1):
T=0: a0b0
T=1: a0b1+a1b0
T=2: a0b2+a1b1+a2b0
…
The response of the system to the input signal at different times is equal to the coefficients of the polynomial produced by multiplying the input signal polynomial with the impulse response polynomial. This is commonly known as the convolution of the input and the system’s impulse response. How can we find the output response faster than O(N^2)?
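
The time-step sums above are exactly the coefficients of a polynomial product, which a direct O(N^2) convolution makes explicit. A minimal sketch; `convolve` is a hypothetical name.

```python
def convolve(a, b):
    """Direct convolution: out[T] is the sum of a[i] * b[j] over all i + j = T."""
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

# out[0] = a0*b0, out[1] = a0*b1 + a1*b0, out[2] = a0*b2 + a1*b1 + a2*b0, ...
print(convolve([1, 2, 3], [4, 5, 6]))  # [4, 13, 28, 27, 18]
```

The result equals the coefficient vector of (1 + 2x + 3x^2)(4 + 5x + 6x^2), so any faster polynomial multiplication method also gives a faster convolution.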

Summary
• The lecture covered one of the most important hardware accelerators: the FFT
• We have seen how it can be parallelized and sped up
• We examined some of its applications
