Accelerating Fast Fourier Transform for Wideband Channelization

Presentation Transcript

  1. Accelerating Fast Fourier Transform for Wideband Channelization
    Carlo del Mundo*, Vignesh Adhinarayanan§, Wu-chun Feng*§
    * Department of Electrical and Computer Engineering, § Department of Computer Science, Virginia Tech

  3. Forecast
    • Goal: Accelerate the Fast Fourier Transform (FFT) using graphics processing units (GPUs)
    • Replace fixed-function hardware (ASICs) with programmable GPUs
    Image sources: http://upload.wikimedia.org/wikipedia/commons/7/79/SSDTR-ASIC_technology.jpga http://images.bit-tech.net/content_images/2011/12/amd-radeon-hd-7970-3gb-review/amd-radeon-hd7970-e.jpg

  9. Motivation
    • FFT is a critical building block across many disciplines
    Image sources: http://www.wireless.vt.edu/symposium/2012/tutorials/sessionA2.html http://www.elektrodaily.com/wp-content/uploads/2013/02/shazam-app.png http://www.ajnr.org/content/27/6/1230/F1.large.jpg

  14. Introduction
    • Wideband Channelization
      • Purpose: To isolate individual channels within a wideband signal
    Figure: Stages in a PFB Channelizer
    Image source: http://www.wireless.vt.edu/symposium/2012/tutorials/sessionA2.html

  16. Introduction (Channelization)
    • Algorithm: Polyphase filter bank (PFB) channelizer
    • Problem: Of all the channelizer's stages, the FFT stage's cost grows fastest as the number of channels increases
    Figure: Stages in a PFB Channelizer
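
The two channelizer stages on this slide can be sketched in host-side C. This is a minimal sketch of one common PFB formulation (weight a window of m*taps samples by the prototype filter, sum it into m polyphase branches, then take an m-point DFT across the branches); commutator ordering and phase conventions vary between texts, and this is not the authors' GPU code. The names `pfb_block`, `dft_with_table`, and `PFB_MAX_CHANNELS` are hypothetical, and the caller-supplied twiddle table `tw[i] = exp(-2*pi*j*i/m)` is an assumption made to keep the sketch short. The DFT is written naively; a real channelizer would use an FFT at that point, which is the stage this work accelerates.

```c
#include <assert.h>
#include <complex.h>

#define PFB_MAX_CHANNELS 64  /* hypothetical fixed scratch size for this sketch */

/* Naive m-point DFT: out[k] = sum_p in[p] * tw[(p*k) % m], where the caller
   supplies tw[i] = exp(-2*pi*j*i/m). A real channelizer uses an FFT here. */
void dft_with_table(const double complex *in, const double complex *tw,
                    int m, double complex *out) {
    for (int k = 0; k < m; k++) {
        double complex acc = 0;
        for (int p = 0; p < m; p++)
            acc += in[p] * tw[(p * k) % m];
        out[k] = acc;
    }
}

/* One output block of an m-channel polyphase filter bank channelizer.
   x:   the latest m*taps input samples
   h:   prototype low-pass filter with m*taps coefficients
   tw:  DFT twiddle table of length m (see above)
   out: one output sample for each of the m channels */
void pfb_block(const double complex *x, const double *h, int m, int taps,
               const double complex *tw, double complex *out) {
    assert(m > 0 && m <= PFB_MAX_CHANNELS && taps > 0);
    double complex branch[PFB_MAX_CHANNELS];

    /* Stage 1: polyphase filtering. Branch p sees every m-th sample and
       every m-th filter coefficient, starting at offset p. */
    for (int p = 0; p < m; p++) {
        branch[p] = 0;
        for (int t = 0; t < taps; t++)
            branch[p] += h[t * m + p] * x[t * m + p];
    }

    /* Stage 2: m-point DFT across the branches separates the channels. */
    dft_with_table(branch, tw, m, out);
}
```

With a fixed number of taps per branch, the filter stage costs about m*taps multiply-adds per block, while a radix-2 FFT over the branches costs on the order of (m/2)*log2(m) butterflies, so the FFT's share of the work rises with the channel count. That is one way to read the slide's point that the FFT stage grows fastest.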

  19. Choosing the Right Processor
    • Criteria: Programmability & Performance
    • Candidates (pictured): GPU, CPU, FPGA, ASIC
    Image sources: http://images.bit-tech.net/content_images/2011/12/amd-radeon-hd-7970-3gb-review/amd-radeon-hd7970-e.jpg http://www.maximumpc.com/files/u154082/intel_cpu_socket3.jpg http://fr.academic.ru/pictures/frwiki/70/Fpga_xilinx_spartan.jpg http://upload.wikimedia.org/wikipedia/commons/7/79/SSDTR-ASIC_technology.jpga
    Carlo del Mundo, cdel@vt.edu, carlodelmundo.com

  25. Outline
    • Motivation
    • Introduction
    • Background
    • Approach
      • System-level optimizations
      • Algorithm-level optimizations
    • Results
      • Optimizations in isolation
      • Optimizations in concert
    • Conclusion

  33. Background (GPUs)
    • GPU Memory Hierarchy
      • Global Memory
      • Image Memory
      • Constant Memory
      • Local Memory
      • Registers
    Table: Memory Read Bandwidth for Radeon HD 6970

  38. Approach
    • Act as the “human compiler”
    • Derive a candidate set of optimizations for FFT on GPUs
    • Apply optimizations in isolation
    • Apply optimizations in concert
    Figure: Candidate Optimizations; Optimizations in Isolation; Optimizations in Concert

  39. Approach
    • System-level Optimizations (applicable to any application)
      • Register Preloading
      • Vector Access/{Vector,Scalar} Arithmetic
      • Constant Memory Usage
      • Dynamic Instruction Reduction
      • Memory Coalescing
      • Image Memory
    • Algorithm-level Optimizations (LM = local memory)
      • Transpose via LM
      • Compute/Transpose via LM
      • Compute/No Transpose via LM
    C. del Mundo et al., “Accelerating Fast Fourier Transform for Wideband Channelization,” IEEE ICC, Budapest, Hungary, June 2013.

  42. Approach
    • System-level Optimizations (applicable to any application)
      • Register Preloading
      • Vector Access/{Vector,Scalar} Arithmetic
      • Constant Memory Usage
      • Dynamic Instruction Reduction
      • Memory Coalescing
      • Image Memory
    • Algorithm-level Optimizations
      • Naïve Transpose (LM-CM)
      • Compute/Transpose via LM (LM-CC)
      • Compute/No Transpose via LM (LM-CT)
    C. del Mundo et al., “Accelerating Fast Fourier Transform for Wideband Channelization,” IEEE ICC, Budapest, Hungary, June 2013.

  43. System-level Optimizations

  46. System-level Optimizations
    • Register Preloading (RP)
      • Load data into registers first
    Without Register Preloading
```c
__kernel void unoptimized(__global float2 *buffer)
{
    int index = …;  // index computation elided on the slide
    buffer += index;

    // The butterfly reads and writes global memory through pointers
    FFT4_in_order_output(&buffer[0], &buffer[4], &buffer[8], &buffer[12]);
}
```
    With Register Preloading
```c
__kernel void optimized(__global float2 *buffer)
{
    int index = …;  // index computation elided on the slide
    buffer += index;

    __private float2 r0, r1, r2, r3;  // register declaration

    // Explicit loads of the same four elements the unoptimized kernel uses
    r0 = buffer[0]; r1 = buffer[4]; r2 = buffer[8]; r3 = buffer[12];

    FFT4_in_order_output(&r0, &r1, &r2, &r3);

    // Store the results back so both kernels have the same effect
    buffer[0] = r0; buffer[4] = r1; buffer[8] = r2; buffer[12] = r3;
}
```
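
A host-side C sketch of the same transformation may help, under stated assumptions; it is not the authors' OpenCL code. `fft4_in_order` is a hypothetical stand-in for `FFT4_in_order_output`, whose body the slides do not show, implemented here as an in-place 4-point DFT with natural-order outputs. `fft4_direct` and `fft4_preloaded` are hypothetical names mirroring the two kernels. On a CPU the compiler usually keeps such locals in registers either way, so the sketch shows only the shape of the change (explicit loads, compute on locals, explicit stores), not any GPU speedup.

```c
#include <assert.h>
#include <complex.h>

/* Hypothetical stand-in for FFT4_in_order_output: in-place 4-point DFT,
   outputs in natural order. For N = 4 the twiddles are only +-1 and +-j. */
void fft4_in_order(float complex *a, float complex *b,
                   float complex *c, float complex *d) {
    float complex t0 = *a + *c, t1 = *a - *c;
    float complex t2 = *b + *d, t3 = (*b - *d) * -I; /* times -j */
    *a = t0 + t2; /* X[0] */
    *b = t1 + t3; /* X[1] */
    *c = t0 - t2; /* X[2] */
    *d = t1 - t3; /* X[3] */
}

/* Without register preloading: the butterfly works on buffer memory. */
void fft4_direct(float complex *buffer) {
    assert(buffer);
    fft4_in_order(&buffer[0], &buffer[4], &buffer[8], &buffer[12]);
}

/* With register preloading: explicit loads into locals, compute on the
   locals, then explicit stores back to the buffer. */
void fft4_preloaded(float complex *buffer) {
    assert(buffer);
    float complex r0 = buffer[0], r1 = buffer[4], r2 = buffer[8], r3 = buffer[12];
    fft4_in_order(&r0, &r1, &r2, &r3);
    buffer[0] = r0; buffer[4] = r1; buffer[8] = r2; buffer[12] = r3;
}
```

Both versions give identical results. The butterfly reads each input twice, so in the preloaded form those repeated reads hit private variables (registers on a GPU) rather than, potentially, global memory.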

  50. System-level Optimizations
    • Vector Access (float{2, 4, 8, 16})
    Figure: a[0] a[1] a[2] a[3]
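
A host-side sketch of vector access, assuming the GCC/Clang vector extension `__attribute__((vector_size(16)))` as a stand-in for OpenCL's `float4` (the extension is compiler-specific, not OpenCL). The hypothetical `scale_vector` moves four consecutive floats, like the a[0] to a[3] on the slide, with one 16-byte copy and scales them with one vector multiply (the "vector arithmetic" variant), while the hypothetical `scale_scalar` touches one element at a time. It illustrates the access pattern only, not the speedups measured in the talk.

```c
#include <assert.h>
#include <string.h>

/* Four floats handled as one 16-byte value (GCC/Clang vector extension). */
typedef float vfloat4 __attribute__((vector_size(16)));

/* Scalar access: one load, one multiply and one store per element. */
void scale_scalar(float *a, int n, float k) {
    for (int i = 0; i < n; i++)
        a[i] = a[i] * k;
}

/* Vector access: a[i], a[i+1], a[i+2], a[i+3] are read with one 16-byte
   copy, scaled with one vector multiply and written back with one copy. */
void scale_vector(float *a, int n, float k) {
    assert(n % 4 == 0); /* this sketch has no scalar tail loop */
    const vfloat4 kv = {k, k, k, k};
    for (int i = 0; i < n; i += 4) {
        vfloat4 v;
        memcpy(&v, &a[i], sizeof v);
        v = v * kv;
        memcpy(&a[i], &v, sizeof v);
    }
}
```

`memcpy` is used so the 16-byte load and store stay valid even when `&a[i]` is not 16-byte aligned; compilers typically lower each call to a single wide move.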