Platform-Based Design for MPEG-4 Video Encoder: Enhancements in Compression and Performance
This paper presents a platform-based design for an MPEG-4 video encoder, outlining key components that enhance performance in video encoding. It details a hybrid motion estimation architecture aimed at high compression rates while maintaining quality, suitable for applications like video streaming, surveillance, and more. The discussion includes optimized software models, hardware accelerators, interleaved DCT/IDCT processing, and cost-effective design principles. The proposed system effectively navigates the challenges of limited bandwidth and error-prone environments, showcasing significant performance improvements in real-time applications.
Presentation Transcript
Platform-based Design for MPEG-4 Video Encoder • Presenter: Yu-Han Chen
Video Coding Standards • [Figure: video coding standards (H.261, H.263, MPEG1, MPEG2, MPEG4) plotted by resolution/quality (QCIF, CIF, SDTV, HDTV) against data rate (10K to 10M bps), labeled with years from 1990 to 1999 and application areas (telecomm, storage, broadcasting, multimedia)]
Introduction • Multimedia applications are emerging • Video-phone, camcorder, surveillance, and video streaming • MPEG-4 provides a total solution for these applications • High compression ratio for limited bandwidth • Error robustness for error-prone environments • Content interactivity for more functionality besides ‘seeing’
Proposed MPEG-4 Encoder • MPEG-4 video encoding • Platform-based system architecture • Motion encoding module • Texture encoding module
Complexity Analysis of Optimized Software Model • SPL3 foreman sequence at 30 fps • ME – full search with half-stop algorithm • DCT/IDCT – row-column decomposition
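The row-column decomposition named above can be sketched as follows. This is a minimal illustration, not the presenter's software model: it assumes an orthonormal 8-point DCT-II matrix, and the function names (`dct_matrix`, `dct_2d_row_column`, `dct_2d_direct`) are hypothetical. The 2-D transform is computed as 1-D transforms over the rows followed by 1-D transforms over the columns, and compared against the direct double sum.

```python
import math

N = 8  # block size used by MPEG-4 texture coding


def dct_matrix(n=N):
    """Orthonormal DCT-II coefficient matrix A (assumed scaling), so Y = A X."""
    a = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        a.append([scale * math.cos((2 * i + 1) * k * math.pi / (2 * n))
                  for i in range(n)])
    return a


def dct_1d(x, a):
    """Plain 1-D DCT: one multiply-accumulate per input sample per output."""
    return [sum(a[k][i] * x[i] for i in range(len(x))) for k in range(len(a))]


def transpose(m):
    return [list(row) for row in zip(*m)]


def dct_2d_row_column(block, a):
    """2-D DCT as 1-D DCTs on all rows, then 1-D DCTs on all columns."""
    rows = [dct_1d(r, a) for r in block]
    cols = [dct_1d(c, a) for c in transpose(rows)]
    return transpose(cols)


def dct_2d_direct(block, a):
    """Reference 2-D DCT from the separable definition (much more arithmetic)."""
    n = len(block)
    return [[sum(a[u][i] * a[v][j] * block[i][j]
                 for i in range(n) for j in range(n))
             for v in range(n)] for u in range(n)]
```

For an 8x8 block the direct form needs 64 multiply-accumulates per coefficient, while the row-column form needs 16 (8 in each pass), which is why the decomposition is the usual starting point for both software and hardware implementations.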
Implementation Demands • Required computational power is up to 12 GIPS • ME is the most important key component • DCT/IDCT is the second one • Dedicated hardware accelerators are employed • Implementation for various features of algorithms • Software for irregular and sequential ones • Hardware for high-processing-rate ones • HW/SW co-design is the most promising solution to achieve a cost-effective system
Platform for MPEG-4 Video Coding • Platform-based system includes • HYRISC, RBUS and DBUS, DMA, MEMIF • Hardware accelerators include • ME, MC, BE (DCT/IDCT, Q, IQ, ACDCP), Bitstream Unit, Shared Memory (CG, CB)
Summary of ME • Low cost and high performance hybrid motion estimation is proposed • Dynamic modes for various applications • Applications of real-time and low power • PDS (Predictive Diamond Search) mode • Applications of high compression quality • FFS (Fast Full Search) mode • Spiral full search with PDE (Partial Distortion Elimination)
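The FFS idea of a spiral full search with partial distortion elimination can be sketched in software as below. This is an illustrative model only, not the proposed hardware architecture: the frame layout (lists of rows), the row-level early-exit granularity, the ring-by-ring candidate ordering, and all function names are assumptions made for the example.

```python
def sad_with_pde(cur, ref, bx, by, dx, dy, size, best):
    """SAD between the current block at (bx, by) and the reference block
    displaced by (dx, dy). Returns None as soon as the partial sum reaches
    the best SAD found so far (partial distortion elimination)."""
    total = 0
    for y in range(size):
        for x in range(size):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
        if total >= best:  # checked once per row (assumed granularity)
            return None
    return total


def spiral_offsets(search_range):
    """All displacements in the window, ordered ring by ring from the center
    (an approximation of a spiral scan)."""
    offsets = [(dx, dy)
               for dy in range(-search_range, search_range + 1)
               for dx in range(-search_range, search_range + 1)]
    offsets.sort(key=lambda d: (max(abs(d[0]), abs(d[1])),
                                abs(d[0]) + abs(d[1])))
    return offsets


def full_search_pde(cur, ref, bx, by, size, search_range):
    """Exhaustive search over the window; PDE only skips work, so the result
    is a minimum-SAD vector, the same quality as a plain full search."""
    height, width = len(ref), len(ref[0])
    best, best_mv = float("inf"), (0, 0)
    for dx, dy in spiral_offsets(search_range):
        inside = (0 <= bx + dx and bx + dx + size <= width and
                  0 <= by + dy and by + dy + size <= height)
        if not inside:
            continue
        sad = sad_with_pde(cur, ref, bx, by, dx, dy, size, best)
        if sad is not None:  # strictly better than the previous best
            best, best_mv = sad, (dx, dy)
    return best_mv, best
```

Visiting candidates near the center first tends to find a small SAD early when motion is modest, so later candidates are rejected after only a few rows.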
Texture Encoding Module • Interleaving DCT/IDCT schedule • DCT and IDCT are performed interleaved for the same block • Sub-structure sharing technique • Applied on AC/DC prediction datapath and Q/IQ by extracting the same formula term
Sub-structure Sharing of Q/IQ and ACDC Prediction • Scaling operation: (QAC × QPA) / QPX • Share the partial result (QAC × QPA = M) from the IQ module • Share the datapath of Q for M / QPX
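The sharing idea can be modeled in a few lines. This is a behavioral sketch only: the two helper functions stand in for the multiplier assumed to sit in the IQ datapath and the divider assumed to sit in the Q datapath, the round-to-nearest rule (halves away from zero) is an assumption, and the full MPEG-4 quantization formulas are deliberately not reproduced.

```python
def multiply_unit(level, qp):
    """Stand-in for the multiplier in the IQ datapath: forms M = level * qp."""
    return level * qp


def divide_unit(m, qp):
    """Stand-in for the divider in the Q datapath. Rounds to the nearest
    integer, halves away from zero (assumed rounding rule)."""
    sign = -1 if m < 0 else 1
    return sign * ((abs(m) + qp // 2) // qp)


def rescale_ac_prediction(qac, qp_a, qp_x):
    """AC prediction scaling (QAC * QPA) / QPX, built entirely from the two
    shared units instead of a dedicated multiplier and divider."""
    m = multiply_unit(qac, qp_a)   # partial result M shared with IQ
    return divide_unit(m, qp_x)    # division reuses the Q datapath
```

For example, `rescale_ac_prediction(10, 8, 16)` forms M = 80 and returns 5.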
Subjective View • FFS (QP = 16, PSNR_Y = 32.4012, Bits = 9537) • PDS (QP = 16, PSNR_Y = 32.0256, Bits = 9465) • Worst case of PSNR drop (0.3962 dB) at the 69th frame
Conclusion • A cost-effective MPEG-4 video encoder is proposed • Hardware accelerators • A novel hybrid motion estimation architecture • A cost-effective texture block engine architecture • Platform-based system backbone • Balances flexibility and high performance • HW/SW co-design flow and tools
DCT/IDCT Coefficient Matrix • N = 8 • [Figure: 8×8 coefficient matrix with its even-symmetric and odd-symmetric rows marked]
1-D DCT and IDCT • 1-D DCT (Y = AX): preprocessing, then data reordering • 1-D IDCT (Y = A^T X): data reordering, then postprocessing • 8 MAC operations down to 4!
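The reduction from 8 to 4 multiply-accumulates per output follows from the row symmetry of the coefficient matrix: row k satisfies A[k][7-i] = (-1)^k A[k][i]. A minimal sketch of the resulting even/odd decomposition is given below, assuming an orthonormal DCT-II matrix; the function names are hypothetical and the code models the arithmetic, not the hardware schedule.

```python
import math


def dct_matrix(n=8):
    """Orthonormal DCT-II coefficient matrix A (assumed scaling)."""
    a = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        a.append([scale * math.cos((2 * i + 1) * k * math.pi / (2 * n))
                  for i in range(n)])
    return a


A = dct_matrix()


def dct_1d_fast(x):
    """Y = A X with 4 MACs per output. Preprocessing forms sums and
    differences of mirrored inputs; even rows use the sums, odd rows the
    differences."""
    s = [x[i] + x[7 - i] for i in range(4)]  # preprocessing (butterfly adds)
    d = [x[i] - x[7 - i] for i in range(4)]  # preprocessing (butterfly subs)
    y = [0.0] * 8
    for k in range(8):
        src = s if k % 2 == 0 else d
        y[k] = sum(A[k][i] * src[i] for i in range(4))
    return y


def idct_1d_fast(y):
    """X = A^T Y with 4 MACs per partial sum. Postprocessing combines the
    even-row and odd-row partial sums into mirrored output pairs."""
    x = [0.0] * 8
    for i in range(4):
        even = sum(A[k][i] * y[k] for k in (0, 2, 4, 6))
        odd = sum(A[k][i] * y[k] for k in (1, 3, 5, 7))
        x[i] = even + odd        # postprocessing (butterfly add)
        x[7 - i] = even - odd    # postprocessing (butterfly sub)
    return x
```

Because the forward and inverse paths use the same half-size coefficient set and differ only in where the butterfly stage sits, one multiplexed datapath can plausibly serve both, which matches the shared DCT/IDCT unit described in the slides.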
DCT/IDCT Architecture • DRU (Data Reordering Unit) • Two parallel MACs • Two 1-D operations multiplexed • Preprocessing (DCT) and postprocessing (IDCT)
Multiplication of Constant Coefficients • Only 7 constant coefficients used • Signed-digit representation • Minimum number of nonzero terms (1, -1) • Shift and add • Avoids a dedicated multiplier
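Shift-and-add multiplication with signed digits can be illustrated as follows. The sketch recodes a fixed-point constant into the non-adjacent form (one common minimal-weight signed-digit representation; whether the presented design uses exactly this recoding is not stated) and then multiplies using only shifts, additions, and subtractions. The 12-bit coefficient scaling in the usage note is an assumption, and the function names are hypothetical.

```python
def to_signed_digits(c):
    """Non-adjacent form of a non-negative integer, least significant digit
    first. Each digit is -1, 0, or +1 and no two neighbors are both nonzero.
    Example: 7 -> [-1, 0, 0, 1], i.e. 8 - 1."""
    digits = []
    while c != 0:
        if c & 1:
            d = 2 - (c & 3)  # +1 when c % 4 == 1, -1 when c % 4 == 3
            c -= d
        else:
            d = 0
        digits.append(d)
        c >>= 1
    return digits


def multiply_shift_add(x, digits):
    """Multiply x by the constant encoded in digits using one shifted add or
    subtract per nonzero digit, with no general-purpose multiplier."""
    acc = 0
    for shift, d in enumerate(digits):
        if d == 1:
            acc += x << shift
        elif d == -1:
            acc -= x << shift
    return acc


def nonzero_terms(digits):
    """Number of adders/subtractors the constant costs."""
    return sum(1 for d in digits if d != 0)
```

As a usage example, cos(π/4) scaled by 2^12 rounds to 2896; `multiply_shift_add(x, to_signed_digits(2896))` then equals `2896 * x`, and a right shift by 12 would restore the scale. Fewer nonzero digits means fewer adders, which is why a minimal signed-digit form is attractive for fixed coefficients.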
Power Consumption Estimation • Case 1 – 0.18μm • Case 2 – 0.18μm, 1/8 computational power • Case 3 – 0.18μm, 1/8 computational power, gated clock