
Analysis, Fast Algorithm, and VLSI Architecture Design for H.264/AVC Intra Frame Coder

Yu-Wen Huang, Bing-Yu Hsieh, Tung-Chien Chen, and Liang-Gee Chen. IEEE Transactions on Circuits and Systems for Video Technology, 2005. Outline: Introduction, H.264/AVC Intra Coding, Computation Reduction


Presentation Transcript


  1. Analysis, Fast Algorithm, and VLSI Architecture Design for H.264/AVC Intra Frame Coder Yu-Wen Huang, Bing-Yu Hsieh, Tung-Chien Chen, and Liang-Gee Chen IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY 2005

  2. Outline • Introduction • H.264/AVC Intra Coding • Computation Reduction • Hardware Architecture

  3. Introduction • [Figure: H.264/AVC encoder block diagram. Input video signal split into 16×16-pixel macroblocks; coder control (control data); transform, scaling, and quantization (quantized transform coefficients); embedded decoder with scaling and inverse transform and de-blocking filter; intra-frame prediction; motion compensation with intra/inter selection; motion estimation (motion data) with multiple reference frames and variable block sizes; entropy coding; output video signal]

  4. Introduction • Coding pipeline: Source, Prediction, Transform, Quantization, Entropy Coding, Compressed Data (bits per pixel) • Prediction: 4×4/16×16 luma, 8×8 chroma • Transform: 4×4 DCT • Quantization: scalar, nonuniform (lossy) • Entropy coding: CAVLC, CABAC (lossless)

  5. Introduction • H.264/AVC I-Frame Coder (CAVLC) vs. JPEG 2000 (DWT 5/3) • Computational Complexity • Block-based coding (hardware-friendly) vs. frame-based coding (memory-wasting)

  6. Introduction • Comparison between different image coding standards • [Figure: results at 0.225 bpp for JPEG 2000 (DWT 5/3), H.264 I-Frame (CAVLC), and JPEG]

  7. Introduction • Two solutions for platform-based design of H.264/AVC intra frame coder • Fast algorithm for software implementation • Reduce 45% complexity • PSNR drop 0.3 dB • Hardware accelerator • Max. clock rate 55 MHz • 31 fps for 4:2:0 SDTV (All intra frames)
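
The hardware figures above imply a rough cycle budget per macroblock. The sketch below works through that arithmetic, assuming that SDTV means 720 × 480 (the resolution quoted on the run-time slide) and that 31 fps is reached at the 55 MHz maximum clock. The function name is hypothetical.

```python
def cycles_per_macroblock(clock_hz, width, height, fps):
    """Average clock cycles available for each 16x16 macroblock."""
    macroblocks_per_frame = (width // 16) * (height // 16)
    return clock_hz / (macroblocks_per_frame * fps)


# Assumed operating point: 55 MHz, 720x480, 31 all-intra frames per second.
budget = cycles_per_macroblock(55_000_000, 720, 480, 31)
print(f"{budget:.0f} cycles per macroblock")
```

Under these assumptions a 720 × 480 frame holds 1350 macroblocks, so the budget is a little over 1300 cycles per macroblock.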

  8. H.264/AVC Intra Coding • Intra Prediction • I4MB (4×4): directional modes 0, 1, 3, 4, 5, 6, 7, 8 plus DC • I16MB (16×16): modes 0 and 1 plus DC and Plane
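
As an illustration of how a 4×4 predictor is built from neighboring pixels, here is a minimal sketch of the three simplest I4MB modes, using the H.264/AVC numbering (0 vertical, 1 horizontal, 2 DC). It assumes that both the top and left neighbors are available; the function name and the string mode labels are hypothetical, and the remaining directional modes are omitted.

```python
def predict_4x4(mode, top, left):
    """Return a 4x4 predictor (list of rows) from 4 top and 4 left neighbors."""
    if mode == "vertical":      # mode 0: each column repeats the pixel above it
        return [list(top) for _ in range(4)]
    if mode == "horizontal":    # mode 1: each row repeats the pixel to its left
        return [[left[y]] * 4 for y in range(4)]
    if mode == "dc":            # mode 2: rounded mean of the 8 neighbors
        dc = (sum(top) + sum(left) + 4) >> 3
        return [[dc] * 4 for _ in range(4)]
    raise ValueError(f"unsupported mode: {mode}")


top = [10, 20, 30, 40]
left = [5, 5, 5, 5]
for name in ("vertical", "horizontal", "dc"):
    print(name, predict_4x4(name, top, left))
```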

  9. H.264/AVC Intra Coding • Mode Decision • Low complexity mode • Distortion: SATD (Original pels – Predictors) • Rate (bits of mode information) • High complexity mode • Distortion: MSE (Original pels – Reconstructed pels) • Rate (bits of mode information + residual)
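
The low complexity cost can be sketched as SATD plus a weighted mode rate. The code below is one possible reading, not the authors' implementation: it computes SATD as the sum of absolute 4×4 Hadamard coefficients of the prediction error, halved (a normalization used by some reference encoders and assumed here), and adds `lam * mode_bits`, where `lam` and `mode_bits` are hypothetical parameters.

```python
H4 = [[1, 1, 1, 1], [1, 1, -1, -1], [1, -1, -1, 1], [1, -1, 1, -1]]


def hadamard4(block):
    """2-D 4x4 Hadamard transform H * block * H^T of a list-of-rows block."""
    tmp = [[sum(H4[i][k] * block[k][j] for k in range(4)) for j in range(4)]
           for i in range(4)]
    return [[sum(tmp[i][k] * H4[j][k] for k in range(4)) for j in range(4)]
            for i in range(4)]


def satd4(orig, pred):
    """Sum of absolute Hadamard-transformed differences, halved (assumed scaling)."""
    diff = [[orig[y][x] - pred[y][x] for x in range(4)] for y in range(4)]
    return sum(abs(c) for row in hadamard4(diff) for c in row) // 2


def low_complexity_cost(orig, pred, mode_bits, lam):
    """Distortion (SATD) plus a rate term for the mode information only."""
    return satd4(orig, pred) + lam * mode_bits
```

The high complexity mode would instead reconstruct each candidate and count residual bits as well, which is why it costs far more computation per mode.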

  10. H.264/AVC Intra Coding • Transform and Quantization • 4 × 4 integer transforms: DCT-based integer transform and Hadamard transform
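
For reference, the DCT-based integer transform of H.264/AVC can be written as Y = Cf · X · Cf^T with a small integer matrix Cf. The sketch below shows only this core transform; the scaling that the standard folds into quantization is left out, and the helper names are hypothetical.

```python
# Forward core transform matrix of the H.264/AVC 4x4 integer transform.
CF = [[1, 1, 1, 1],
      [2, 1, -1, -2],
      [1, -1, -1, 1],
      [1, -2, 2, -1]]


def matmul(a, b):
    """Multiply two 4x4 matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def transpose(m):
    return [list(col) for col in zip(*m)]


def forward_core_transform(x):
    """Return CF * x * CF^T for a 4x4 residual block x."""
    return matmul(matmul(CF, x), transpose(CF))


print(forward_core_transform([[10] * 4 for _ in range(4)]))
```

A constant block produces a single nonzero (DC) coefficient, which is the behavior expected of a DCT-like transform.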

  11. H.264/AVC Intra Coding • Entropy Coding • Context-Based Adaptive Binary Arithmetic Coding (CABAC) • Context-Based Adaptive Variable Length Coding (CAVLC)

  12. H.264/AVC Intra Coding • Run-time percentage • 720 × 480, 4:2:0, 30 fps • 10829 MIPS

  13. Computation Reduction • Intra Prediction • Table look-up • Cost generation • Sub-sampling

  14. Computation Reduction • Fast Intra Prediction • The smaller the mode number, the more likely the mode is to occur. • Global statistics cannot reflect the correlation of local modes. • Local statistics of neighboring blocks are applied instead.

  15. Computation Reduction • Fast Intra Prediction • Skip unlikely candidates

  16. Computation Reduction • [Figure: rate-distortion curves under different numbers of locally searched I4MB modes, without insertion of full-search blocks. Legend labels: All, 6, 4, 2, 1, DC]

  17. Computation Reduction • Fast Intra Prediction • Prevention of error propagation • Periodic insertion of full-search 4×4 blocks (F) among partial-search blocks (P) • Adaptive threshold on the distortion for a MB • If min SATD of P > THMinSATD, then search all modes. • THMinSATD = 2.0 × (min SATD of F) • Block pattern: F P F P / P P P P / F P F P / P P P P
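
The fallback rule above can be sketched as follows. This is an illustrative reading of the slide, not the authors' code: `satd_of` is a hypothetical callback returning the SATD of one mode for the current block, `candidates` stands for whatever locally chosen subset the partial search uses, and the slide does not say which F block supplies the reference SATD.

```python
SCALE = 2.0  # threshold factor stated on the slide


def threshold_from_full(min_satd_full):
    """THMinSATD derived from the minimum SATD of a full-search (F) block."""
    return SCALE * min_satd_full


def search_block(satd_of, all_modes, candidates, full_search, th_min_satd):
    """Return (best_mode, best_cost, modes_evaluated) for one 4x4 block."""
    modes = list(all_modes) if full_search else list(candidates)
    costs = {m: satd_of(m) for m in modes}
    best = min(costs, key=costs.get)
    # Partial-search (P) block whose best cost exceeds the threshold:
    # evaluate the modes skipped so far, i.e. search all modes.
    if not full_search and costs[best] > th_min_satd:
        for m in all_modes:
            if m not in costs:
                costs[m] = satd_of(m)
        best = min(costs, key=costs.get)
    return best, costs[best], len(costs)
```

With this structure a partial-search block costs only as many SATD evaluations as it has candidates, unless its best cost looks suspiciously high compared with a fully searched block.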

  18. Computation Reduction • Subsampling Patterns

  19. Computation Reduction • Saved Computation and PSNR Drop • PSNR drop < 0.3 dB • Global: subsampling + partial search using global statistics • Local: subsampling + partial search • Proposed: subsampling + partial search + periodic insertion of full search + adaptive SATD threshold

  20. Hardware Architecture • Assumptions • A RISC executes one instruction per cycle, except multiplication, which requires two. • A processing element (PE) generates the predictor of one pixel per cycle.

  21. Hardware Architecture • Solutions • [Table: luma and chroma at 30 fps; number of modes; average cycles per predictor; producing all modes per cycle vs. producing one mode per cycle]

  22. Hardware Architecture • Comparisons in different degrees of parallelism

  23. Hardware Architecture • [Figure: neighboring pixels A to M; DRAM and register storage]

  24. Hardware Architecture • Four-Parallel Reconfigurable Intra Prediction Generator • [Figure labels: 8-bit adder, 9-bit adder]

  25. Hardware Architecture • Intra Prediction Generator • [Figure: neighboring pixels A to M]

  26. Hardware Architecture • I16MB DC Prediction Mode (T0 to T15: top neighbors, L0 to L15: left neighbors) • PE0: Cycle 1: T0+T4+T8+T12, Cycle 2: +L0+L4+L8, Cycle 3: +L12 • PE1: Cycle 1: T1+T5+T9+T13, Cycle 2: +L1+L5+L9, Cycle 3: +L13 • PE2: Cycle 1: T2+T6+T10+T14, Cycle 2: +L2+L6+L10, Cycle 3: +L14 • PE3: Cycle 1: T3+T7+T11+T15, Cycle 2: +L3+L7+L11, Cycle 3: +L15 • Cycle 4: add the four PE partial sums
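
The four-cycle schedule can be checked with a small software model. The sketch below assumes that PE k accumulates the neighbors whose index is congruent to k modulo 4 and that the final DC value uses the standard rounding (sum + 16) >> 5 for the case where both top and left neighbors are available; the rounding step and the function name are not shown on the slide.

```python
def dc16_four_pe(top, left):
    """Model the 4-PE accumulation of the 32 neighbors for I16MB DC prediction."""
    acc = [0, 0, 0, 0]
    for pe in range(4):   # cycle 1: four top neighbors per PE
        acc[pe] = top[pe] + top[pe + 4] + top[pe + 8] + top[pe + 12]
    for pe in range(4):   # cycle 2: accumulator plus three left neighbors
        acc[pe] += left[pe] + left[pe + 4] + left[pe + 8]
    for pe in range(4):   # cycle 3: accumulator plus the last left neighbor
        acc[pe] += left[pe + 12]
    total = sum(acc)      # cycle 4: combine the four partial sums
    return (total + 16) >> 5  # assumed rounding, as in the H.264/AVC standard


top = list(range(16))
left = list(range(100, 116))
print(dc16_four_pe(top, left))
```

Each PE never adds more than four operands in one cycle, which is consistent with a generator built from a small adder tree per PE.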

  27. Hardware Architecture • I16MB Plane Prediction Mode • Figure labels: A0, A1, A2, A3; Pred[0,0], Pred[0,8], Pred[0,4], Pred[0,12]

      $\mathrm{Pred}[y, x] = \mathrm{Clip1}\big((a + b\,(x - 7) + c\,(y - 7) + 16) \gg 5\big)$

      $a = 16\,(p[-1, 15] + p[15, -1])$

      $b = (5H + 32) \gg 6, \qquad c = (5V + 32) \gg 6$

      $H = \sum_{x'=0}^{7} (x' + 1)\,(p[-1, 8 + x'] - p[-1, 6 - x'])$

      $V = \sum_{y'=0}^{7} (y' + 1)\,(p[8 + y', -1] - p[6 - y', -1])$
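
A direct software evaluation of the plane prediction formulas can serve as a reference when checking a hardware design. The sketch below follows the [row, column] indexing of the slide, includes the + 16 rounding term of the H.264/AVC standard, and assumes 8-bit samples for Clip1. The argument names `top`, `left`, and `corner` (for p[-1, 0..15], p[0..15, -1], and p[-1, -1]) are hypothetical.

```python
def clip1(v):
    """Clip to the 8-bit sample range (assumed bit depth)."""
    return max(0, min(255, v))


def plane16(top, left, corner):
    """Return the 16x16 plane predictor as a list of rows, Pred[y][x]."""
    def t(j):  # p[-1, j], with j = -1 mapping to the corner pixel
        return corner if j < 0 else top[j]

    def l(i):  # p[i, -1], with i = -1 mapping to the corner pixel
        return corner if i < 0 else left[i]

    h = sum((k + 1) * (t(8 + k) - t(6 - k)) for k in range(8))
    v = sum((k + 1) * (l(8 + k) - l(6 - k)) for k in range(8))
    a = 16 * (top[15] + left[15])
    b = (5 * h + 32) >> 6
    c = (5 * v + 32) >> 6
    return [[clip1((a + b * (x - 7) + c * (y - 7) + 16) >> 5)
             for x in range(16)] for y in range(16)]


# Horizontal ramp across the top row, constant left column.
ramp_top = [50 + 4 * j for j in range(16)]
pred = plane16(ramp_top, [46] * 16, 46)
print(pred[0])
```

Because b and c are constant over the macroblock, neighboring predictors differ by a fixed increment, so a hardware generator can step through a row with additions only once a few seed values are known.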

  28. Hardware Architecture A0 A1 A2 A3

  29. Hardware Architecture

  30. Hardware Architecture • Transform (Implemented by shifters and adders) DCT iDCT Hadamard
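
To illustrate why the transform needs only shifters and adders, here is a minimal sketch of the forward 4×4 integer transform in butterfly form. It is a software model of the well-known H.264/AVC butterfly, not the authors' circuit, and the function names are hypothetical. The only multiplications by 2 are expressed as left shifts.

```python
def butterfly_1d(x0, x1, x2, x3):
    """1-D forward integer transform using additions and 1-bit shifts only."""
    s0, s1 = x0 + x3, x1 + x2      # sums of the outer and inner pairs
    d0, d1 = x0 - x3, x1 - x2      # differences of the outer and inner pairs
    return [s0 + s1,               # row [1,  1,  1,  1]
            (d0 << 1) + d1,        # row [2,  1, -1, -2]
            s0 - s1,               # row [1, -1, -1,  1]
            d0 - (d1 << 1)]        # row [1, -2,  2, -1]


def forward_transform_2d(block):
    """Apply the 1-D butterfly to every row, then to every column."""
    rows = [butterfly_1d(*row) for row in block]
    cols = [butterfly_1d(*col) for col in zip(*rows)]
    return [list(r) for r in zip(*cols)]   # transpose back to row-major order


print(butterfly_1d(1, 2, 3, 4))
print(forward_transform_2d([[10] * 4 for _ in range(4)]))
```

Each 1-D pass costs eight additions and two shifts, which is the kind of structure that maps directly onto adders and wiring in hardware.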

  31. Hardware Architecture
