Video on DSP and FPGA
Video on DSP and FPGA John Johansson April 12, 2004
Agenda • Overview of video processing • A typical video encoder and the DCT • Requirements of DCT • Comparison of DSP and FPGA chips • Analysis and conclusions • Questions
Overview of Video Processing Video processing generally involves • Compression / Decompression • Special Effects • TV Broadcasting • Focus on Compression
Video Encoding Typical Video Encoder • Focus on DCT algorithm
The Discrete Cosine Transform (DCT) • The DCT maps spatial data (pixels) into the frequency domain, much as the FFT does • Rearranges data into a more compressible format • Typically done on 64 (8x8) pixels at a time • Big nasty equation … • … But no sharp teeth (optimizes extremely well)
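The slide does not reproduce the "big nasty equation". For reference, the 8x8 transform is commonly written as the two-dimensional DCT-II below. The symbols f (input pixels), F (output coefficients), and C (scale factor), and the JPEG-style 1/4 normalization, are assumptions, since the original slide may have used a different convention.

```latex
F(u,v) = \frac{1}{4}\, C(u)\, C(v) \sum_{x=0}^{7} \sum_{y=0}^{7}
         f(x,y)\,
         \cos\!\left[\frac{(2x+1)\,u\,\pi}{16}\right]
         \cos\!\left[\frac{(2y+1)\,v\,\pi}{16}\right],
\qquad
C(k) =
\begin{cases}
  \dfrac{1}{\sqrt{2}} & k = 0 \\[4pt]
  1                   & k = 1, \dots, 7
\end{cases}
```

The double sum factors into a product of two one-dimensional cosine sums, one over x and one over y, which is the main reason the transform optimizes so well.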
Requirements for DCT Basic Idea • Read in data (64 values, 8-24 bits each, signed or unsigned) • Do transformation • Write out data • Profit!!! • Easy, right?
Requirements for DCT Memory Limitations • Load an entire frame? • One uncompressed frame can vary from 50 KB to 50 MB in size • External memory is more plentiful but much slower • Do the DCT in chunks (one 8x8 block at a time)
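The memory argument can be sketched with a little arithmetic. The frame dimensions and bit depths below are illustrative assumptions chosen to land near the slide's 50 KB and 50 MB endpoints. The helper names `frame_bytes` and `iter_blocks` are hypothetical.

```python
def frame_bytes(width, height, bits_per_pixel):
    """Size of one uncompressed frame in bytes."""
    return width * height * bits_per_pixel // 8


def iter_blocks(frame, n=8):
    """Yield (row, col, block) for every n x n tile of a 2-D list.

    Assumes the frame dimensions are exact multiples of n.
    """
    height = len(frame)
    width = len(frame[0])
    for r in range(0, height, n):
        for c in range(0, width, n):
            yield r, c, [row[c:c + n] for row in frame[r:r + n]]


# Assumed small frame: 176x144 at 16 bits per pixel, about 50 KB.
small = frame_bytes(176, 144, 16)
# Assumed large frame: 4096x3072 at 32 bits per pixel, about 50 MB.
large = frame_bytes(4096, 3072, 32)
# One 8x8 block at 8 bits per value is only 64 bytes.
block = frame_bytes(8, 8, 8)

print(small, large, block)
```

Only one block needs to sit in fast internal memory at a time, so the working set is tens or hundreds of bytes while the full frame stays in external memory.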
Requirements for DCT Degree of Parallelism • DCT can be done serially, or broken up and done in parallel • Parallelism depends largely on available memory • Price / Performance tradeoffs
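One common way to break the DCT up is the row-column decomposition: eight independent 1-D transforms on the rows, then eight on the columns. The sketch below is a plain, unoptimized reference version, not the implementation benchmarked in these slides. The function names and the orthonormal scaling are assumptions.

```python
import math

N = 8  # block edge length used throughout the slides


def dct_1d(v):
    """Orthonormal 1-D DCT-II of an N-point sequence."""
    out = []
    for k in range(N):
        scale = math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)
        total = sum(
            v[n] * math.cos((2 * n + 1) * k * math.pi / (2 * N))
            for n in range(N)
        )
        out.append(scale * total)
    return out


def dct_2d(block):
    """2-D DCT of an N x N block via row-column decomposition."""
    # Stage 1: the N row transforms are independent of each other.
    rows = [dct_1d(r) for r in block]
    # Stage 2: the N column transforms are also independent of each other.
    cols = [dct_1d(list(c)) for c in zip(*rows)]
    # Transpose back so the result is indexed as [vertical][horizontal].
    return [list(r) for r in zip(*cols)]


flat = [[128] * N for _ in range(N)]
coeffs = dct_2d(flat)
print(round(coeffs[0][0], 6))  # DC term of a constant block
```

Within each stage the eight 1-D transforms share no data, so hardware can run one, two, four, or all eight side by side, which is the price/performance dial described above.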
The Challengers Xilinx Spartan-3 FPGA • 50K – 5M gates • 326 MHz • 100 KB – 2.3 MB internal memory • 4 - 104 dedicated multipliers • Oodles of I/O pins (up to 784) Look at XC3S1000 • 1M gates, 560 KB memory, 24 multipliers, 376 I/O pins
The Challengers ADSP-BF5xx Blackfin Processor • 200 – 750 MHz • Single or dual core • DMA memory controller • 52 KB – 326 KB internal memory • Other processor goodies Look at ADSP-BF533 • 500 MHz, single core, 148 KB memory
Performance How do we correctly benchmark an algorithm between two completely different processors? • I don’t really know • Look at some rough performance indicators and try to draw a conclusion
Performance FPGA • Varies from 1-25 cycle(s) / pixel for DCT • Reading and writing of data takes additional time • Clock speed limited by degree of parallelism DSP • Roughly 5 cycles / pixel for DCT • DMA controller allows parallel reading and writing with some setup overhead
(Ideal) Performance Spartan-3 • 64 read + 64 compute + 64 write = 192 cycles / block • 326 MHz / 192 cycles = 1.70 Mblocks / second Blackfin • 319 compute + 10 DMA transfer = 329 cycles / block • 500 MHz / 329 cycles = 1.52 Mblocks / second
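The ideal throughput figures are simply clock rate divided by cycles per block. A minimal check of that arithmetic, using the cycle counts and clock rates from the slide, follows. The function name `blocks_per_second` is hypothetical, and the three 64-cycle FPGA stages sum to 192 cycles.

```python
def blocks_per_second(clock_hz, cycles_per_block):
    """Ideal throughput, ignoring stalls and setup overhead."""
    return clock_hz / cycles_per_block


# Spartan-3: read, compute, and write stages of 64 cycles each.
spartan = blocks_per_second(326e6, 64 + 64 + 64)
# Blackfin BF533: 319 compute cycles plus 10 cycles of DMA transfer.
blackfin = blocks_per_second(500e6, 319 + 10)

print(f"Spartan-3: {spartan / 1e6:.2f} Mblocks/s")  # 1.70
print(f"Blackfin:  {blackfin / 1e6:.2f} Mblocks/s")  # 1.52
```

These are best-case numbers for both chips, so they are useful only as a rough indicator that the two parts land in the same range.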
Advantages FPGA • Potential for very high parallelism • Existing video designs available for purchase • Good middleman functionality DSP • Higher potential clock speed • Much more flexible design • DMA memory controller
Disadvantages FPGA • Low flexibility • Hard to optimize • Limited logic blocks DSP • Difficult to achieve full utilization • Higher power consumption
Conclusions FPGA • Best for well-defined roles, like DCT • Faster in situations where throughput matters • Can be very expensive DSP • Better suited to more flexible roles, like a full encoder • Better in situations where large amounts of (additional) memory are needed
References Xilinx Spartan-3: http://www.xilinx.com/xlnx/xil_prodcat_landingpage.jsp?title=Spartan-3 Analog Devices Blackfin: http://www.analog.com/processors/processors/blackfin/index.html
References Other articles http://www.xilinx.com/publications/products/services/xc_pdf/xc_videoapps44.pdf http://www.xilinx.com/publications/products/sp2e/xc_dspvid43.htm http://www.reed-ectronics.com/ednmag/article/CA336860?stt=000&pubdate=11%2F27%25