VLSI Implementation for Fast Fourier Transform: A Review of Methods
Explore various FFT implementation methods discussing speed, area, control, and design considerations. Compare radix types and decomposition strategies. Analyze architectures for efficient VLSI processing in FFT applications.
FFT VLSI Implementation • VLSI Signal Processing, Department of Electrical Engineering, National Taiwan University, An-Yeu Wu (吳安宇) • Shousheng He and Mats Torkelson, "A new approach to pipeline FFT processor," IEEE Proc. of IPPS, pp. 766-770, 1996. • E. Bidet, D. Castelain, C. Joanblanq, and P. Senn, "A fast single-chip implementation of 8192 complex point FFT," IEEE J. Solid-State Circuits, pp. 300-305, March 1995.
Implementation: Two Extreme Methods • Speed: from slow to fast • Area: from small to large • Control: from complicated to simple
Design Considerations • System requirements, e.g., speed, area, power • To trade off between these two extreme cases, we need: • More processing elements (PEs) • Better PE utilization rate • Better control scheme
Some Current Schemes • Radix-2 Multi-path Delay Commutator (N = 16) • Radix-2 Single-path Delay Feedback (N = 16)
Some Current Schemes (cont.) • Radix-4 Single-path Delay Commutator (N = 256) • Radix-4 Multi-path Delay Commutator (N = 256) • Radix-4 Single-path Delay Feedback (N = 256)
Distinctive Merits of the Above • Delay-feedback architectures use memory more efficiently than delay-commutator architectures • Radix-4 has higher multiplier utilization; radix-2, however, has simpler butterflies (BF), which are better utilized
Comparison • Radix and speed: from low to high • Processing ability per unit: from low to high • Control scheme: from simple to complex • To combine the advantages: further decompose the high-radix PE
Decomposition Method (1) • Simply "reuse" the repeated micro unit • Figure: a radix-4 PE
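Decomposition Method (2) • From the algorithm level • Apply a 3-dimensional index map: $n = \langle \frac{N}{2} n_1 + \frac{N}{4} n_2 + n_3 \rangle_N$, $k = \langle k_1 + 2k_2 + 4k_3 \rangle_N$, where $n_1, n_2 \in \{0, 1\}$ and $n_3 \in \{0, \dots, N/4 - 1\}$ • First, sum over $n_1$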
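Decomposition Method (2) cont. • Next, sum over $n_2$ • The factor that appears in this step is a power of $-j$, so it needs only real-imaginary swapping and sign inversion, not a multiplier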
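The two summations can be checked numerically. The sketch below is not taken from the slides: it assumes the standard DFT definition with $W_N = e^{-j2\pi/N}$, and the function names `dft` and `radix22_dft` are illustrative. After summing over $n_1$ and $n_2$, what remains for each pair $(k_1, k_2)$ is an N/4-point DFT over $n_3$ whose input is scaled by the twiddle factor $W_N^{n_3(k_1 + 2k_2)}$.

```python
import cmath

def dft(x):
    """Direct O(N^2) DFT; used as the N/4-point sub-transform and as a reference."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * n * k / N) for n in range(N))
            for k in range(N)]

def radix22_dft(x):
    """DFT via the index map n = N/2*n1 + N/4*n2 + n3, k = k1 + 2*k2 + 4*k3."""
    N = len(x)
    if N % 4:
        raise ValueError("N must be a multiple of 4")
    q = N // 4
    X = [0j] * N
    for k1 in (0, 1):
        sign = 1 if k1 == 0 else -1
        # Summation over n1: butterflies between samples N/2 apart.
        b0 = [x[n] + sign * x[n + 2 * q] for n in range(q)]
        b1 = [x[n + q] + sign * x[n + 3 * q] for n in range(q)]
        for k2 in (0, 1):
            # Summation over n2: the only factor is a power of -j (trivial).
            rot = (-1j) ** (k1 + 2 * k2)
            # Non-trivial twiddle W_N^(n3*(k1+2*k2)), applied before the sub-DFT.
            h = [(b0[n] + rot * b1[n])
                 * cmath.exp(-2j * cmath.pi * n * (k1 + 2 * k2) / N)
                 for n in range(q)]
            # The N/4-point DFT over n3 yields the outputs k = k1 + 2*k2 + 4*k3.
            X[k1 + 2 * k2::4] = dft(h)
    return X
```

Applying the same split recursively to the N/4-point sub-transform would give the full pipeline; the direct `dft` is used here only to keep the example short.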
Graphical Explanation (N = 16) • Figure label: trivial multiplication
Graphical Explanation (cont.) • The equations are equivalent to the operations shown in the figure
Circuit of BF2I • Inputs: Xr(n), Xi(n), Xr(n+N/2), Xi(n+N/2) • Outputs: Zr(n+N/2), Zi(n+N/2), Zr(n), Zi(n) • Operation differs between the first N/2 cycles and the second N/2 cycles
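A behavioral sketch of how a single-path delay-feedback butterfly such as BF2I could operate over the two halves of a block. This is an assumed model, not the circuit on the slide: the function name `bf2i_stream`, the zero-initialized feedback register, and the convention Z(n) = X(n) + X(n+N/2), Z(n+N/2) = X(n) - X(n+N/2) are illustrative choices. Here `depth` plays the role of N/2.

```python
from collections import deque

def bf2i_stream(samples, depth):
    """Cycle-by-cycle model of one delay-feedback butterfly stage.

    depth is the feedback register length (N/2 for the first stage).
    """
    fifo = deque([0j] * depth)  # feedback shift register, assumed to start at zero
    out = []
    for t, x in enumerate(samples):
        stored = fifo.popleft()
        if (t // depth) % 2 == 0:
            # First N/2 cycles: store the input, emit what the register held
            # (the differences left over from the previous block).
            fifo.append(x)
            out.append(stored)
        else:
            # Second N/2 cycles: butterfly between stored X(n) and incoming X(n+N/2).
            out.append(stored + x)   # Z(n) goes straight out
            fifo.append(stored - x)  # Z(n+N/2) is fed back and emitted later
    return out
```

Feeding one block of 2*depth samples followed by depth zeros flushes the stage: after a latency of depth cycles the output carries the sums, then the differences.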
Circuit of BF2II • Inputs: Xr(n), Xi(n), Xr(n+N/2), Xi(n+N/2) • Outputs: Zr(n+N/2), Zi(n+N/2), Zr(n), Zi(n) • Adds real-imaginary swapping and sign inversion
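Swapping the real and imaginary parts and inverting one sign is the multiplier-free form of multiplying by -j, since (a + jb)(-j) = b - ja. A minimal sketch follows; the names `mul_neg_j` and `bf2ii` and the `rotate` flag are hypothetical, and which cycles assert the flag is left to the control logic, which is not modeled here.

```python
def mul_neg_j(z):
    """Multiply by -j using only a swap and a sign inversion."""
    return complex(z.imag, -z.real)

def bf2ii(a, b, rotate):
    """Radix-2 butterfly with an optional trivial -j rotation on the second input."""
    if rotate:
        b = mul_neg_j(b)
    return a + b, a - b
```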
Radix-2^2 Single-path Delay Feedback • FFT architecture using the above technique, for N = 256 • Compare with the original architecture, for N = 256
Structural Advantages • Radix-2^2 has the same multiplicative complexity as radix-4, but still retains the radix-2 BF structure • Only every second stage has non-trivial multiplication • Control is simple: one counter serves as both the synchronization controller and the address counter for the twiddle factors W_N
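To put these trade-offs in numbers, the sketch below evaluates hardware-cost expressions for the pipeline architectures named in this deck. The formulas are reproduced from memory of the comparison table in the He and Torkelson paper cited on the first slide and should be checked against that paper before use; the function name `pipeline_costs` and the dictionary keys are illustrative. N is assumed to be a power of 4.

```python
def pipeline_costs(N):
    """Complex multipliers, complex adders and memory words for an N-point pipeline FFT."""
    s, m = 0, N
    while m > 1 and m % 4 == 0:  # s = log4(N)
        m //= 4
        s += 1
    if m != 1 or s == 0:
        raise ValueError("N must be a power of 4")
    # Assumed expressions, as recalled from the cited paper's comparison table.
    return {
        "R2MDC":   {"multipliers": 2 * (s - 1), "adders": 4 * s, "memory": 3 * N // 2 - 2},
        "R2SDF":   {"multipliers": 2 * (s - 1), "adders": 4 * s, "memory": N - 1},
        "R4SDF":   {"multipliers": s - 1,       "adders": 8 * s, "memory": N - 1},
        "R4MDC":   {"multipliers": 3 * (s - 1), "adders": 8 * s, "memory": 5 * N // 2 - 4},
        "R4SDC":   {"multipliers": s - 1,       "adders": 3 * s, "memory": 2 * N - 2},
        "R2^2SDF": {"multipliers": s - 1,       "adders": 4 * s, "memory": N - 1},
    }

for name, cost in pipeline_costs(256).items():
    print(name, cost)
```

Under these expressions, the delay-feedback variants need N - 1 memory words while the delay-commutator variants need more, and radix-2^2 SDF pairs the radix-4 multiplier count with the radix-2 adder count.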
Conclusions • FFT applications: radar signal processing, fast convolution, spectrum estimation, OFDM-based modulation/demodulation • Efficient VLSI architectures (parallel processing) are required for real-time processing. • However, most systems still employ DSP processors (e.g., TI C3x/C5x) for the computations, using fast algorithms such as the DIT and DIF FFT. • VLIW (Very Long Instruction Word)-based processors (TI C6x) need new programming skills to utilize the two parallel MAC units.