Overview of the Scalable Video Coding Extension of the H.264/AVC Standard

Kai-Chao Yang, NTHU, Taiwan

Outline
• Introduction
• Problems
• Definition
• Functionality
• Goal
• Competition
• Applications
• Targets
• History of SVC
• Structure of SVC
• Temporal Scalability
• Spatial Scalability
• Quality Scalability
• Combined Scalability
• Profiles of SVC
• Conclusions

Introduction - problem
• Non-scalable video streaming
  • Multiple video streams are needed for heterogeneous clients
[Figure: clients served at 8 Mb/s, 6 Mb/s, 4 Mb/s, 1 Mb/s, and 512 kb/s]

Introduction - definition
• Scalable video stream
• Scalability
  • Removal of parts of the video bit-stream to adapt to the various needs of end users and to varying terminal capabilities or network conditions
[Figure: a scalable stream made of sub-streams 1 to n; a subset of sub-streams k1, k2, ..., ki is used for reconstruction, ranging from low quality to high quality]

Introduction - functionality
• Functionality of SVC
  • Graceful degradation when the “right” parts of the bit-stream are lost
  • Bit-rate adaptation to match the channel throughput
  • Format adaptation for backward-compatible extension
  • Power adaptation to trade off runtime against quality

Introduction - mode
• Scalability modes
  • Fidelity reduction (SNR scalability)
  • Picture size reduction (spatial scalability)
  • Frame rate reduction (temporal scalability)
  • Sharpness reduction (frequency scalability)
  • Selection of content (ROI or object-based scalability)
• Example
[Figure: bit-plane example of a residual, showing the most significant bit, a base layer, and enhancement layers 1 to 5]

Structure of SVC
[Figure: encoder block diagram. Each spatial layer, the second obtained by spatial decimation, has temporal scalable coding, prediction, base layer coding, and SNR scalable coding; the outputs are multiplexed]

Temporal Scalability
• Hierarchical prediction structures
  • Hierarchical B pictures (coding order, pictures listed in display order, two GOPs): 0 4 3 5 2 7 6 8 1 12 11 13 10 15 14 16 9
  • Non-dyadic hierarchical prediction (coding order): 0 3 4 2 6 7 5 8 9 1 12 13 11 15 16 14 17 18 10
  • Hierarchical prediction with zero delay (coding order): 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
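The coding-order numbers listed for the hierarchical B pictures can be reproduced with a short recursive sketch. It assumes a dyadic GOP (size a power of two) in which the key picture closing each GOP is coded first and each remaining interval is then split at its midpoint; the function name and the temporal-level numbering (0 for key pictures) are illustrative choices, not taken from the slides.

```python
def hierarchical_b_coding_order(num_gops, gop_size):
    """Return (coding_order, temporal_level), both indexed by display order.

    Assumes a dyadic GOP: the key picture ending each GOP is coded first,
    then every interval is split at its midpoint, left half before right half.
    """
    n = num_gops * gop_size + 1
    coding_order = [None] * n
    temporal_level = [None] * n
    counter = 0

    def assign(pos, level):
        nonlocal counter
        coding_order[pos] = counter
        temporal_level[pos] = level
        counter += 1

    def split(lo, hi, level):
        # Code the middle picture of (lo, hi), then recurse into both halves.
        if hi - lo < 2:
            return
        mid = (lo + hi) // 2
        assign(mid, level)
        split(lo, mid, level + 1)
        split(mid, hi, level + 1)

    assign(0, 0)  # first picture of the sequence
    for g in range(num_gops):
        lo, hi = g * gop_size, (g + 1) * gop_size
        assign(hi, 0)      # key picture that closes the GOP
        split(lo, hi, 1)   # hierarchical B pictures inside the GOP
    return coding_order, temporal_level


order, levels = hierarchical_b_coding_order(num_gops=2, gop_size=8)
# Dropping every picture above a chosen temporal level lowers the frame rate.
quarter_rate = [pos for pos, lvl in enumerate(levels) if lvl <= 1]
```

With two GOPs of size 8, `order` matches the hierarchical B picture numbering on the slide, and `quarter_rate` keeps every fourth picture.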

Temporal Scalability
• Video coding experiment with H.264/MPEG4-AVC
  • Foreman, CIF 30 Hz @ 1320 kbps
  • Performance as a function of N
  • Cascaded QP assignment: QP(P) = QP(B0) - 3 = QP(B1) - 4 = QP(B2) - 5
[Figure: picture-type sequences for N = 1 (I P P P P P P P P), N = 2, N = 4, and N = 8, built from I, P, B0, B1, and B2 pictures; temporal scalability is available for N > 1]
This slide is copied from JVT-W132-Talk
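The cascaded QP assignment can be written as a one-line rule. This is a minimal sketch that reads the relation on the slide as QP(P) = QP(B0) - 3 = QP(B1) - 4 = QP(B2) - 5, so a B picture of hierarchy level k gets the key-picture QP plus 3 + k; the function name and the use of `None` for key pictures are assumptions made for illustration.

```python
def cascaded_qp(qp_key, b_level=None):
    """QP for one picture under the cascaded assignment.

    b_level=None marks a key (I/P) picture; b_level=k marks a B picture
    of hierarchy level k (B0, B1, B2, ...). Assumed rule: QP(Bk) = QP(P) + 3 + k.
    """
    if b_level is None:
        return qp_key
    return qp_key + 3 + b_level


# Example: key pictures coded at QP 30.
qps = {name: cascaded_qp(30, lvl)
       for name, lvl in [("P", None), ("B0", 0), ("B1", 1), ("B2", 2)]}
```

Pictures higher in the hierarchy are referenced by fewer other pictures, so this rule quantizes them more coarsely.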

Spatial Scalability
[Figure: SVC encoder with several spatial layers. Each layer is obtained by spatial decimation and coded with hierarchical MCP and intra-prediction (texture and motion) followed by base layer coding. The lowest layer uses an H.264/AVC compatible coder and produces an H.264/AVC compatible base layer bit-stream. All layers are multiplexed into the scalable bit-stream.]
• Inter-layer prediction between adjacent layers
  • Intra
  • Motion
  • Residual

Spatial Scalability
• Similar to MPEG-2, H.263, and MPEG-4
• Arbitrary resolution ratio
• The same coding order in all spatial layers
• Combination with temporal scalability
• Inter-layer prediction
[Figure: spatial layers 0 and 1 with temporal levels 0, 1, and 2; intra pictures are marked]

Spatial Scalability
• The prediction signals are formed by
  • MCP inside the enhancement layer (temporal); suited to small motion and high spatial detail
  • Up-sampling from the lower layer (spatial)
  • Average of the above two predictions (temporal + spatial)
• Inter-layer prediction
  • Three kinds of inter-layer prediction
    • Inter-layer motion prediction
    • Inter-layer residual prediction
    • Inter-layer intra prediction
  • Base mode MB
    • Only the residual is transmitted, with no additional side information

Spatial Scalability
• Inter-layer motion prediction
  • base_mode_flag = 1
    • The reference layer is inter-coded
    • Data are derived from the reference layer
      • MB partitioning
      • Reference indices
      • MVs
  • motion_pred_flag
    • 1: MV predictors are obtained from the reference layer
    • 0: MV predictors are obtained by conventional spatial predictors
[Figure: an 8x8 block in the reference layer with (x1, y1) and (x2, y2) maps to a 16x16 macroblock with (2x1, 2y1) and (2x2, 2y2)]
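For the dyadic case shown in the figure, where an 8x8 reference-layer block corresponds to a 16x16 macroblock and coordinates are doubled, deriving the partitioning, reference indices, and MVs can be sketched as below. The `Partition` type and the function name are hypothetical, and the sketch covers only a resolution ratio of 2 without cropping.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Partition:
    x: int        # top-left position in luma samples
    y: int
    width: int
    height: int
    ref_idx: int  # reference picture index
    mv: tuple     # motion vector (mvx, mvy)


def derive_from_reference_layer(p, scale=2):
    """Derive enhancement-layer motion data from a reference-layer partition.

    Geometry and motion vector are scaled by the resolution ratio;
    the reference index is copied unchanged.
    """
    return Partition(
        x=p.x * scale,
        y=p.y * scale,
        width=p.width * scale,
        height=p.height * scale,
        ref_idx=p.ref_idx,
        mv=(p.mv[0] * scale, p.mv[1] * scale),
    )


base = Partition(x=8, y=0, width=8, height=8, ref_idx=1, mv=(3, -2))
enh = derive_from_reference_layer(base)
```

Here the 8x8 partition at (8, 0) becomes a 16x16 partition at (16, 0) with motion vector (6, -4) and the same reference index.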

Spatial Scalability
• Inter-layer residual prediction
  • residual_pred_flag = 1
  • Predictor
    • Block-wise up-sampling by a bi-linear filter from the corresponding 8x8 sub-MB in the reference layer
    • Performed on a transform block basis
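Block-wise bi-linear up-sampling can be illustrated with a small pure-Python sketch. It is not the normative SVC filter: the sample phase, the floating-point arithmetic, and the function name are assumptions. What it does show is the block-wise property, since sample positions are clamped to the block so that no values from neighbouring blocks are used.

```python
def upsample_block_2x(block):
    """Bi-linearly up-sample one residual block by 2 in each dimension.

    Source positions are clamped to the block, so interpolation never
    reads across the block boundary.
    """
    h, w = len(block), len(block[0])
    out = [[0.0] * (2 * w) for _ in range(2 * h)]
    for i in range(2 * h):
        # Centre-aligned source row, clamped to the block.
        sy = min(max((i + 0.5) / 2 - 0.5, 0.0), h - 1)
        y0 = int(sy)
        y1 = min(y0 + 1, h - 1)
        fy = sy - y0
        for j in range(2 * w):
            sx = min(max((j + 0.5) / 2 - 0.5, 0.0), w - 1)
            x0 = int(sx)
            x1 = min(x0 + 1, w - 1)
            fx = sx - x0
            top = (1 - fx) * block[y0][x0] + fx * block[y0][x1]
            bottom = (1 - fx) * block[y1][x0] + fx * block[y1][x1]
            out[i][j] = (1 - fy) * top + fy * bottom
    return out


ramp = upsample_block_2x([[0, 4], [0, 4]])
flat = upsample_block_2x([[5] * 4 for _ in range(4)])
```

A constant block stays constant after up-sampling, and a horizontal ramp is interpolated only inside the block.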

Spatial Scalability
• Inter-layer intra prediction
  • base_mode_flag = 1
  • The reference layer is intra-coded
  • Up-sampling from the reference layer
    • Luma: one-dimensional 4-tap FIR filter
    • Chroma: bi-linear filter

Spatial Scalability
• Past spatial scalable video coding
  • Inter-layer intra prediction requires complete decoding of the base layer
  • Multiple motion compensation and deblocking operations are needed
  • Full decoding + inter-layer prediction: complexity > simulcast
• Single-loop decoding
  • Inter-layer intra prediction is restricted to MBs for which the co-located base layer block is intra-coded

Spatial Scalability
• Single-loop vs. multi-loop decoding
[Figure: I, B, and P pictures with inter-coded regions marked]
This slide is copied from http://iphome.hhi.de/wiegand/assets/pdfs/H264AVC_SVC.pdf

Spatial Scalability
• Generalized spatial scalability in SVC
  • Arbitrary ratio
    • Neither the horizontal nor the vertical resolution can decrease from one layer to the next
  • Cropping
    • Containing new regions
    • Higher quality of interesting regions

Spatial Scalability
• Encoder control (JSVM)
  • Base layer
    • p0’ is optimized for the base layer
  • Enhancement layer
    • p1’ is optimized for the enhancement layer
    • Decisions of p1 depend on p0
  • Efficient base layer coding but inefficient enhancement layer coding

Spatial Scalability
• Encoder control (optimization)
  • Base layer
    • Considers enhancement layer coding
    • Eliminates choices of p0 that disadvantage enhancement layer coding
  • Enhancement layer
    • No change
  • Parameter w
    • w = 0: JSVM encoder control
    • w = 1: Single-loop encoder control (base layer is not controlled)

Quality Scalability
• Coarse-grain quality scalability (CGS)
  • A special case of spatial scalability
  • Identical sizes for base and enhancement layers
  • Smaller quantization step sizes for the residual in higher enhancement layers
  • Designed for only a few selected bit-rate points
  • Supported bit-rate points = number of layers
  • Switching can occur only at IDR access units

Quality Scalability
• Medium-grain quality scalability (MGS)
  • More enhancement layers are supported
    • Refinement quality layers of the residual
  • Key pictures
    • Drift control
  • Switching can occur at any access unit
  • CGS + key pictures + refinement quality layers

Quality Scalability
• Drift control
  • Drift: the effect caused by unsynchronized MCP at the encoder and decoder sides
  • Trade-off of MCP in quality SVC
    • Coding efficiency vs. drift

Quality Scalability
• MPEG-4 quality scalability with FGS
  • The base layer is stored and used for MCP of following pictures
  • Drift: drift-free
  • Complexity: low
  • Efficiency: efficient base layer but inefficient enhancement layer
    • Refinement data are not used for MCP
[Figure: base layer pictures with refinement data (possibly lost or truncated)]

Quality Scalability
• MPEG-2 quality scalability (without FGS)
  • Only one reference picture is stored and used for MCP of following pictures
  • Drift: both base layer and enhancement layer
    • Frequent intra updates are necessary
  • Complexity: low
  • Efficiency: efficient enhancement layer but inefficient base layer
[Figure: base layer pictures with refinement data (possibly lost or truncated)]

Quality Scalability
• 2-loop prediction
  • Several closed encoder loops run at different bit-rate points in a layered structure
  • Drift: enhancement layer
  • Complexity: high
  • Efficiency: efficient base layer and moderately efficient enhancement layer
[Figure: base layer pictures with refinement data (possibly lost or truncated)]

Quality Scalability
• SVC concepts
  • Key picture
    • Trade-off between coding efficiency and drift
    • MPEG-4 FGS: all key pictures
    • MPEG-2 quality scalability: no key pictures
[Figure: base layer pictures with refinement data (possibly lost or truncated)]

Quality Scalability
• Drift control with hierarchical prediction
  • Key pictures
    • The base layer is stored and used for the MCP of following pictures
  • Other pictures
    • The enhancement layer is stored and used for the MCP of following pictures
  • The GOP size adjusts the trade-off between enhancement layer coding efficiency and drift
[Figure: picture sequence P B2 B1 B2 P B2 B1 B2 P, showing the base layer and refinement data (possibly lost or truncated)]

Combined Scalability
• SVC encoder structure
[Figure: encoder block diagram with dependency layers and temporal decomposition; two paths are labeled "The same motion/prediction information"]

Combined Scalability
• Dependency and quality refinement layers
[Figure: scalable bit-stream organized into dependency layers D = 0, 1, 2, each containing quality refinement layers Q = 0, 1, 2]

Combined Scalability
[Figure: dependency layers D0 and D1, each with quality layers Q0 and Q1, over pictures at temporal levels T0, T2, T1, T2, T0]

Combined Scalability
• Bit-stream format
  • NAL unit header | NAL unit header extension | NAL unit payload
  • Field widths in bits, as shown on the slide: 2 6 3 3 2 1 1 1 1 1 3 (fields P, T, D, and Q are labeled)
  • P (priority_id): indicates the importance of a NAL unit
  • T (temporal_id): indicates the temporal level
  • D (dependency_id): indicates the spatial/CGS layer
  • Q (quality_id): indicates the MGS/FGS layer
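Because every NAL unit carries its D, Q, and T identifiers, a sub-stream can be obtained by discarding NAL units above a chosen operating point. The sketch below is a simplified illustration only: the `NalUnit` type and `extract_substream` are hypothetical, headers are assumed to be already parsed, and a real extractor must also keep parameter sets and any lower-layer units that the target layer needs for inter-layer prediction.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NalUnit:
    dependency_id: int  # D: spatial/CGS layer
    quality_id: int     # Q: MGS/FGS layer
    temporal_id: int    # T: temporal level
    payload: bytes = b""


def extract_substream(nal_units, max_d, max_q, max_t):
    """Keep the NAL units at or below the requested (D, Q, T) operating point."""
    return [
        n for n in nal_units
        if n.dependency_id <= max_d
        and n.quality_id <= max_q
        and n.temporal_id <= max_t
    ]


# Toy stream: 2 dependency layers x 2 quality layers x 3 temporal levels.
stream = [NalUnit(d, q, t) for t in range(3) for d in range(2) for q in range(2)]
low = extract_substream(stream, max_d=0, max_q=0, max_t=1)
```

In this toy stream, `low` keeps only the base-quality units of dependency layer 0 at temporal levels 0 and 1.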

Combined Scalability
• Bit-stream switching
  • Inside a dependency layer
    • Switching everywhere
  • Outside a dependency layer
    • Switching up only at IDR access units
    • Switching down everywhere if using multiple-loop decoding

Profiles of SVC
• Scalable Baseline
  • For conversational and surveillance applications requiring low decoding complexity
  • Spatial scalability: fixed ratio (1, 1.5, or 2) and MB-aligned cropping
  • Temporal and quality scalability: arbitrary
  • No interlaced coding tools
  • B-slices, weighted prediction, CABAC, and 8x8 luma transform
  • The base layer conforms to the Baseline profile of H.264/AVC

Profiles of SVC
• Scalable High
  • For broadcast, streaming, and storage
  • Spatial, temporal, and quality scalability: arbitrary
  • The base layer conforms to the High profile of H.264/AVC
• Scalable High Intra
  • Scalable High + all IDR pictures

References
• H. Schwarz, D. Marpe, and T. Wiegand, “Overview of the Scalable Video Coding Extension of the H.264/AVC Standard,” IEEE Trans. CSVT, 2007.
• T. Wiegand, “Scalable Video Coding,” Joint Video Team, doc. JVT-W132, San Jose, USA, April 2007.
• T. Wiegand, “Scalable Video Coding,” Digital Image Communication, course at Technical University of Berlin, 2006. (Available at http://iphome.hhi.de/wiegand/dic.htm)
• H. Schwarz, D. Marpe, and T. Wiegand, “Constrained Inter-Layer Prediction for Single-Loop Decoding in Spatial Scalability,” Proc. of ICIP’05.
