An FFT/IFFT Accelerator for OCT Application

An FFT/IFFT Accelerator for OCT Application • Zhenhong Liu

What is OCT? • OCT = Optical Coherence Tomography • An optical analog of ultrasound tomography • Provides micrometer resolution • The light source is not harmful (unlike X-rays) • Figure: Optical coherence tomogram of a fingertip (http://en.wikipedia.org/wiki/File:HautFingerspitzeOCT.gif)

Data Processing • The algorithm uses 3 FFT/IFFT operations • The number of data points is large: 1024/2048

Sample Data • 16-bit integers for image data, converted to floating-point during processing • Single-precision floating-point for background and calibration data • Output to a grayscale BMP file, 1024x1024 pixels

Using Floating-point

Using Fixed-point • Fixed-point number format (WL, FL), i.e. word length and fraction length: • Keep twiddle factors at (32, 30) • Vary the fraction length for input/output data • Prevent overflow during FFT/IFFT: arithmetic right-shift the output by 1 bit after every butterfly operation in the FFT or IFFT
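The scaling rule on this slide can be sketched in software. The following is a minimal model, not the accelerator's actual datapath: it assumes a radix-2 decimation-in-frequency FFT, uses Python's unbounded integers (so 32-bit wraparound is not modeled), and the names `fixed_dif_fft`, `to_fixed` and `data_fl` are hypothetical. Twiddle factors use the (32, 30) format from the slide, and every butterfly output is shifted right by 1 bit, so the result is the DFT scaled by 1/N.

```python
import cmath
import math

TW_FL = 30  # twiddle fraction length, from the (32, 30) format on the slide


def to_fixed(x, fl):
    """Quantize a real number to an integer with `fl` fractional bits."""
    return int(round(x * (1 << fl)))


def bit_reverse(i, bits):
    """Reverse the lowest `bits` bits of i."""
    r = 0
    for _ in range(bits):
        r = (r << 1) | (i & 1)
        i >>= 1
    return r


def fixed_dif_fft(samples, data_fl):
    """Radix-2 DIF FFT on fixed-point data; returns the DFT divided by N.

    Assumption: len(samples) is a power of two.
    """
    n = len(samples)
    stages = n.bit_length() - 1
    re = [to_fixed(complex(s).real, data_fl) for s in samples]
    im = [to_fixed(complex(s).imag, data_fl) for s in samples]
    half = n // 2
    while half >= 1:
        for start in range(0, n, 2 * half):
            for k in range(half):
                i, j = start + k, start + k + half
                w = cmath.exp(-2j * math.pi * k / (2 * half))
                wr, wi = to_fixed(w.real, TW_FL), to_fixed(w.imag, TW_FL)
                sum_re, sum_im = re[i] + re[j], im[i] + im[j]
                dif_re, dif_im = re[i] - re[j], im[i] - im[j]
                # Upper output: sum, then arithmetic right shift by 1.
                re[i], im[i] = sum_re >> 1, sum_im >> 1
                # Lower output: difference times twiddle, then the same shift.
                prod_re = (dif_re * wr - dif_im * wi) >> TW_FL
                prod_im = (dif_re * wi + dif_im * wr) >> TW_FL
                re[j], im[j] = prod_re >> 1, prod_im >> 1
        half //= 2
    scale = float(1 << data_fl)
    # DIF leaves the results in bit-reversed order; undo that here.
    return [complex(re[bit_reverse(k, stages)] / scale,
                    im[bit_reverse(k, stages)] / scale) for k in range(n)]
```

Calling `fixed_dif_fft(x, 6)` models input/output data in a (32, 6) format. Python's `>>` on negative integers is an arithmetic (sign-preserving) shift, which matches the rule on the slide.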

Using Fixed-point • Figure panels: (32, 2), (32, 4)

Using Fixed-point • Figure panels: (32, 4), (32, 6)

Using Fixed-point • Figure panels: floating-point (f-p), (32, 6)

Fixed-point + Approx. Twiddle Factor • The output is very sensitive to twiddle-factor accuracy • Simply reducing the fraction length is not effective: • OK for twiddle factors with magnitude well above 0 • Large errors for twiddle factors close to 0
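A small numeric check illustrates why a shorter fraction length hurts small twiddle factors most. This is only a sketch: the helper names `quantize` and `relative_error`, the choice of N = 1024, and the fraction lengths tried are assumptions made for illustration.

```python
import math


def quantize(x, fl):
    """Round x to the nearest multiple of 2**-fl."""
    return round(x * (1 << fl)) / (1 << fl)


def relative_error(x, fl):
    """Relative error introduced by quantizing x to `fl` fractional bits."""
    return abs(quantize(x, fl) - x) / abs(x)


N = 1024
near_one = math.cos(2 * math.pi / N)   # twiddle component close to 1
near_zero = math.sin(2 * math.pi / N)  # twiddle component close to 0

for fl in (30, 16, 8):
    print(fl, relative_error(near_one, fl), relative_error(near_zero, fl))
```

The absolute rounding error is bounded by half an LSB in both cases, but relative to a value near 0 that same error is far larger than relative to a value near 1.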

Approx. Twiddle Factor • A suitable approximate multiplier • Finishes a multiplication in n iterations • Round A to a number with at most n 1’s in its binary representation • Store the positions of the 1’s in SRAM • Requires that A does not change often • The larger n is, the more accurate the product
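The shift-and-add idea behind this multiplier can be sketched as follows. Two simplifications are assumed here and are not taken from the slides: A is a non-negative integer, and A is truncated to its n most significant 1 bits instead of being rounded to the nearest such number. The function names `top_bit_positions` and `approx_multiply` are hypothetical.

```python
def top_bit_positions(a, n):
    """Positions of the n most significant 1 bits of a non-negative integer.

    This list plays the role of the SRAM entry stored for A.
    """
    positions = []
    while a and len(positions) < n:
        p = a.bit_length() - 1
        positions.append(p)
        a -= 1 << p
    return positions


def approx_multiply(positions, b):
    """Approximate A * B with one shift-and-add per stored position."""
    acc = 0
    for p in positions:  # at most n iterations
        acc += b << p
    return acc


a, b = 0b1011011, 3  # a = 91, exact product is 273
for n in (1, 2, 3, 7):
    entry = top_bit_positions(a, n)
    print(n, entry, approx_multiply(entry, b))
```

Because the positions depend only on A, they can be computed once and reused for every B, which is why the scheme suits operands such as twiddle factors that rarely change.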

Fixed-point + Approx. Twiddle Factor • Figure panels: n=1, n=2

Fixed-point + Approx. Twiddle Factor • Figure panels: n=3, n=4

Fixed-point + Approx. Twiddle Factor • Figure panels: floating-point (fp), n=4

Hardware Implementation • The original design only supports positive A • Need an extra sign bit in SRAM for each entry • XOR B with the sign bit • Support for complex multiplication • Two units share one SRAM • No add/sub operation after multiplying • The design cannot be pipelined; use multiple units in the butterfly unit to increase throughput • n iterations -> n units in a butterfly unit • For IFFT, only a 1-bit control signal is needed
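A behavioral sketch of the signed, complex version of the multiplier is given below. It is one possible reading of the bullets above, not the actual circuit. The assumptions are: each SRAM entry is a (sign, positions) pair; negation is done as XOR with the sign bit plus a carry-in of 1 (two's complement); each of the two units accumulates both of its partial products in one accumulator, so no separate add/sub follows; and the 1-bit `inverse` control conjugates the twiddle factor for the IFFT. All names are hypothetical.

```python
FL = 30  # twiddle fraction length, as in the (32, 30) format


def encode(w, n):
    """Build a (sign, positions) entry for a real twiddle component w."""
    sign = 1 if w < 0 else 0
    mag = int(round(abs(w) * (1 << FL)))
    positions = []
    while mag and len(positions) < n:
        p = mag.bit_length() - 1
        positions.append(p)
        mag -= 1 << p
    return sign, positions


def negate_if(b, s):
    """Return -b when s == 1, else b: XOR with the sign bit plus carry-in."""
    return (b ^ -s) + s


def accumulate(acc, entry, b, flip=0):
    """Add the approximate product entry * b (optionally negated) to acc."""
    sign, positions = entry
    signed_b = negate_if(b, sign ^ flip)
    for p in positions:  # n iterations
        acc += signed_b << p
    return acc


def complex_multiply(br, bi, wr_entry, wi_entry, inverse=0):
    """(br + j*bi) * (wr + j*wi), or times the conjugate if inverse == 1."""
    # Real unit: br*wr - bi*wi, accumulated over 2n iterations.
    real = accumulate(0, wr_entry, br)
    real = accumulate(real, wi_entry, bi, flip=1 ^ inverse)
    # Imaginary unit: br*wi + bi*wr, reading the same two entries.
    imag = accumulate(0, wi_entry, br, flip=inverse)
    imag = accumulate(imag, wr_entry, bi)
    return real >> FL, imag >> FL
```

In this model each unit runs 2n shift-and-add iterations per complex product, and both units read the same pair of table entries.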

Hardware Implementation schematic: One complex multiplication in 2n cycles

Hardware Implementation schematic: Butterfly unit, DIF FFT/IFFT

Hardware Implementation • For an N-point FFT/IFFT, each stage takes N/2 cycles • Hardware cost is even smaller than with an accurate fixed-point multiplier • Should be more power-efficient • No visible changes to the output images
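As a rough worked example of the per-stage figure, the total cycle count can be estimated under two assumptions that the slides do not state explicitly: the transform is radix-2, so an N-point FFT has log2(N) stages, and the stages run one after another without overlap. The function name `fft_cycles` is hypothetical.

```python
def fft_cycles(n_points):
    """Estimated cycles for one transform: log2(N) stages of N/2 cycles each.

    Assumption: n_points is a power of two.
    """
    stages = n_points.bit_length() - 1
    return stages * (n_points // 2)


for n_points in (1024, 2048):
    print(n_points, fft_cycles(n_points))
```

For the two data sizes mentioned earlier, this gives 10 × 512 = 5120 cycles for 1024 points and 11 × 1024 = 11264 cycles for 2048 points.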

Thank you!
