Video Coding For Compression . . . and Beyond

Bernd Girod
Information Systems Laboratory, Department of Electrical Engineering, Stanford University

Bit Consumption of US Households
[Figure: bit equivalent, assuming state-of-the-art compression, year 2000]

Desirable Compression Ratios
Source: ITU-R 601, 166 Mbps
• SDTV broadcasting, ~2 Mbps: ~100 : 1
• DSL, ~200 kbps: ~1,000 : 1
• Dial-up modem, wireless link, ~20 kbps: ~10,000 : 1
[Figure labels: CIF, QCIF]
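The ratios on this slide are order-of-magnitude figures. A minimal sketch of the underlying arithmetic, using only the rates quoted on the slide (the function and variable names are illustrative, not from the talk):

```python
# Raw ITU-R 601 source rate quoted on the slide, in bits per second.
SOURCE_BPS = 166e6

# Approximate channel rates quoted on the slide, in bits per second.
CHANNELS = {
    "SDTV broadcasting": 2e6,
    "DSL": 200e3,
    "Dial-up modem, wireless link": 20e3,
}


def compression_ratio(source_bps, channel_bps):
    """Return how many source bits must be squeezed into one channel bit."""
    return source_bps / channel_bps


ratios = {name: compression_ratio(SOURCE_BPS, bps) for name, bps in CHANNELS.items()}

for name, ratio in ratios.items():
    print(f"{name}: about {ratio:,.0f} : 1")
```

The exact quotients come out somewhat below the rounded figures on the slide, which is consistent with the "~" in front of each number.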

Outline
• Video compression – state-of-the-art
• Beyond compression
• Rate-scalable video
• Wavelet video coding
• Error-resilient video transmission
• Unequal error protection
• Optimal scheduling for packet networks
• Distributed video coding

“It has been customary in the past to transmit successive complete images of the transmitted picture.” [...] “In accordance with this invention, this difficulty is avoided by transmitting only the difference between successive images of the object.”

Motion-Compensated Hybrid Coding
[Block diagram labels: video in; coder control (control data); transform/quantizer (quantized transform coefficients); decoder with dequantizer/inverse transform; motion-compensated predictor with intra/inter switch; motion estimator (motion data); entropy coding]
Standards: H.261, MPEG-1, MPEG-2, H.263, MPEG-4, H.264/AVC
Features highlighted on successive builds of the diagram:
• ¼-pixel accuracy
• Adaptive block sizes
• Multiple past reference frames
• Generalized B-frames

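Rate-Distortion Optimized Coder Control
• Minimize the Lagrangian cost function $J = D + \lambda R$, where $D = \sum_i D_i$ is the total distortion and $R = \sum_i R_i$ is the total bit-rate
• $D_i$: distortion for block $i$; $R_i$: rate for block $i$; $J_i = D_i + \lambda R_i$: Lagrangian cost for block $i$
• Strategy: minimize $J_i$ for each block $i$ separately, using a common Lagrange multiplier $\lambda$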
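The per-block strategy on this slide can be sketched as follows. This is a minimal illustration, assuming each block comes with a small list of candidate coding modes whose distortion and rate are already known; the candidate numbers and the function name are made up for the example and do not come from any codec.

```python
def choose_modes(blocks, lam):
    """Pick, for each block independently, the mode minimizing D_i + lam * R_i.

    blocks: list of blocks; each block is a list of (distortion, rate) candidates.
    lam:    common Lagrange multiplier shared by all blocks.
    Returns (chosen mode index per block, total distortion, total rate).
    """
    choices = []
    total_distortion = 0.0
    total_rate = 0.0
    for candidates in blocks:
        # Lagrangian cost of each candidate mode for this block.
        costs = [d + lam * r for d, r in candidates]
        best = costs.index(min(costs))
        choices.append(best)
        total_distortion += candidates[best][0]
        total_rate += candidates[best][1]
    return choices, total_distortion, total_rate


# Hypothetical (distortion, rate) candidates for two blocks.
example_blocks = [
    [(100.0, 10.0), (40.0, 50.0), (10.0, 120.0)],
    [(80.0, 5.0), (30.0, 40.0)],
]

for lam in (0.0, 1.0, 10.0):
    print(lam, choose_modes(example_blocks, lam))
```

Sweeping the multiplier traces out operating points: a small value favors low distortion at high rate, a large value favors low rate at high distortion.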

Multiple Reference Frames in H.264/AVC
[Plot: PSNR Y [dB] (26 to 38) versus R [Mbit/s] (0 to 4), Mobile & Calendar (CIF, 30 fps). Curves: PPP... with 1 previous reference; PPP... with 5 previous references; PBB... with classic B pictures; PBB... with generalized B pictures. Annotations on successive builds: ~15%, >25%, ~40%]

Surprising Success of ITU-T Rec. H.263
• What H.263 was developed for: analog videophone
• . . . and what it was used for: Internet video streaming

Internet Video Streaming
[Diagram labels: media server; Internet; streaming client; DSL; dial-up modem; wireless]
• How to accommodate heterogeneous bit-rates?
• How to react to network congestion?
• How to mitigate late or lost packets?

Fine Granular Scalability (FGS)
[Diagram labels: base layer, 20 kbps; enhancement layer, variable bit-rate]
[Plot: H.264 with/without FGS option, Foreman sequence (5 fps); efficiency gap of ~2 dB]

Wavelet Video Coder
[Diagram labels: original video frames 0 to 7; temporal wavelet transform with bands H, LH, LLH, LLL; spatial wavelet transform; embedded quantization & entropy coding]
• [Taubman & Zakhor, 1994] [Ohm, 1994] [Choi & Woods, 1999] [Hsiang & Woods, VCIP ’99] . . . and others

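Lifting
[Diagram: Analysis: even frames and odd frames pass through predict (P) and update (U) steps with motion compensation, giving the low band and the high band. Synthesis: the same P and U steps recover the even frames and odd frames from the low band and the high band]
[Secker & Taubman, 2001] [Popescu & Bottreau, 2001]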
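The predict/update structure of lifting can be sketched in a few lines. This is a simplified illustration that assumes a Haar-like temporal transform with no motion compensation at all: the predict step P simply copies the even frame and the update step U halves the high band. Frames are flat lists of pixel values, and all names are illustrative. A motion-compensated version would replace `predict` and `update` with warping operators.

```python
def predict(even_frame):
    """P step: predict the odd frame from the even frame (here: plain copy)."""
    return list(even_frame)


def update(high_frame):
    """U step: feed half of the high band back into the even frame."""
    return [0.5 * h for h in high_frame]


def analysis(frames):
    """Split an even-length frame list into temporal low and high bands."""
    low_band, high_band = [], []
    for even, odd in zip(frames[0::2], frames[1::2]):
        high = [o - p for o, p in zip(odd, predict(even))]
        low = [e + u for e, u in zip(even, update(high))]
        low_band.append(low)
        high_band.append(high)
    return low_band, high_band


def synthesis(low_band, high_band):
    """Invert analysis by undoing the U step, then the P step."""
    frames = []
    for low, high in zip(low_band, high_band):
        even = [l - u for l, u in zip(low, update(high))]
        odd = [h + p for h, p in zip(high, predict(even))]
        frames.extend([even, odd])
    return frames


# Four tiny two-pixel "frames" with made-up values.
video = [[10.0, 20.0], [12.0, 26.0], [11.0, 21.0], [11.0, 25.0]]
low, high = analysis(video)
reconstructed = synthesis(low, high)
```

Because synthesis subtracts exactly what analysis added, the transform stays invertible whatever P and U compute, which is what makes motion-compensated lifting steps possible.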

MC Wavelet Coding vs. H.264/AVC
[Plot: luminance PSNR (dB), 20 to 38, versus bit-rate (Mbps), 0.2 to 2.0. Curves: non-scalable H.264/AVC; scalable MC 5/3 wavelet]
• Sequence: Mobile CIF
• H.264/AVC: high complexity RD control; CABAC; PBBPBBP . . . ; 5 prev/3 future reference frames; data courtesy of M. Flierl
[Taubman & Secker, VCIP 2003], courtesy D. Taubman

Wavelet Synthesis with Lossy Motion Vector
[Diagram labels: video in; motion estimator; MC wavelet transform; two embedded encoding/decoder paths, each marked "Minimize $J = D + \lambda R$"; inverse wavelet transform; video out]
[Taubman & Secker, ICIP03]

R-D Performance with Lossy Motion Vector
[Plot: video PSNR (dB), 24 to 40, versus bit rate (kbps), 0 to 1200, CIF Foreman. Curves: non-embedded single-rate; embedded wavelet coefficients, lossless motion; embedded wavelet coefficients, lossy motion]
[Taubman & Secker, VCIP 2003], courtesy D. Taubman

Priority Encoding Transmission (PET)
[Diagram labels: base layer; enhancement layer; block of packets; packet network; Reed-Solomon codeword with K information symbols and N-K redundancy symbols]
[Albanese, Blömer, Edmonds, Luby, Sudan, 1996] [Davis & Danskin, 1996] [Horn, Stuhlmuller, Link, Girod, 1999] [Puri, Ramchandran, 1999] [Mohr, Riskin, Ladner, 2000] [Stankovic, Hamzaoui, Xiong, 2002] [Chou, Wang, Padmanabhan, 2003] . . . and many more . . .
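The unequal protection idea behind PET can be illustrated with a short calculation. The sketch below assumes that each layer is spread over a block of N packets with its own Reed-Solomon (N, K) code, so a layer is recoverable whenever at least K of the N packets arrive, and that packet losses are independent. Both the independence assumption and the example parameters are illustrative, not taken from the slide.

```python
from math import comb


def prob_layer_decodable(n, k, loss_prob):
    """Probability that at least k of n packets arrive (independent losses)."""
    arrive = 1.0 - loss_prob
    return sum(
        comb(n, r) * arrive**r * loss_prob ** (n - r)
        for r in range(k, n + 1)
    )


def decodable_layers(layer_ks, packets_received):
    """For each layer's K, report whether enough packets arrived to decode it."""
    return [k <= packets_received for k in layer_ks]


# Hypothetical block of N = 10 packets: base layer with K = 6 (more
# redundancy), enhancement layer with K = 9 (less redundancy).
N = 10
LAYER_KS = [6, 9]

for k in LAYER_KS:
    print(f"K = {k}: decodable with probability {prob_layer_decodable(N, k, 0.1):.4f}")
print(decodable_layers(LAYER_KS, packets_received=8))
```

A smaller K for the base layer means it survives more packet losses than the enhancement layer, which yields graceful degradation as the loss rate grows.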

Packet Delay Jitter and Loss
[Figure: pdf of packet delay, with loss probability ε, probability 1-ε of arrival, and delay offset κ; plots of loss probability versus lead-time]

Smart Prefetching
Idea: Send more important packets earlier to allow for more retransmissions
[Diagram labels: server; client; Internet; request stream; rate-distortion preamble; video packets; packet schedule; updated packet schedule (repeated)]
[Podolsky, McCanne, Vetterli 2000] [Miao, Ortega 2000] [Chou, Miao 2001]

Rate-Distortion Preamble
• Each media packet $n$ is labeled by
  • $B_n$: size [in bits] of data unit $n$
  • $\Delta d_n$: distortion reduction if $n$ is decoded
  • $t_n$: decoding deadline for $n$
• For video: $\Delta d_n$ must be made “state-dependent” to accurately capture concealment
[Diagram: frame sequence I B P B P B P I B P B P B P …]
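The three labels per packet map naturally onto a small record type. The sketch below is illustrative only: the field names and sample values are assumptions, and ordering packets by distortion reduction per bit is a simple stand-in heuristic, not the scheduling method of the cited work. It also ignores the state-dependence of the distortion values noted on the slide.

```python
from dataclasses import dataclass


@dataclass
class PreambleEntry:
    """One rate-distortion preamble record for media packet n."""
    n: int              # packet index
    size_bits: int      # B_n: size of the data unit in bits
    delta_d: float      # Delta d_n: distortion reduction if n is decoded
    deadline_s: float   # t_n: decoding deadline in seconds


def priority_order(entries):
    """Sort packets by distortion reduction per bit, most valuable first."""
    return sorted(entries, key=lambda e: e.delta_d / e.size_bits, reverse=True)


# Hypothetical preamble for a B, an I, and a P frame.
preamble = [
    PreambleEntry(n=2, size_bits=1000, delta_d=20.0, deadline_s=0.2),
    PreambleEntry(n=0, size_bits=8000, delta_d=800.0, deadline_s=0.0),
    PreambleEntry(n=1, size_bits=2000, delta_d=150.0, deadline_s=0.1),
]

ordered = [entry.n for entry in priority_order(preamble)]
print(ordered)
```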

Markov Decision Tree for One Packet
[Diagram: decision tree over times $t_{current}$, $t_{current}+\Delta t$, $t_{current}+2\Delta t$, ...; at each step the action is send: 1 or 0 and the observation is ack: 1 or 0]
• N transmission opportunities before deadline
• “Policy” minimizing $J = D + \lambda R$
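For a single packet, the search for a policy minimizing the Lagrangian cost can be illustrated by brute force over a heavily simplified model. The assumptions here are not from the slide: forward losses are independent with a fixed probability, acknowledgments are never lost, an acknowledgment for a transmission in slot j is seen at slot j + `ack_delay`, and the sender stops once it has seen one. A policy is a tuple of send decisions that applies as long as no acknowledgment has been observed.

```python
from itertools import product


def policy_cost(policy, loss_prob, ack_delay, delta_d, size_bits, lam):
    """Expected Lagrangian cost J = D + lam * R of one send/no-send policy."""
    expected_sends = 0.0
    for slot, send in enumerate(policy):
        if send:
            # Earlier transmissions whose ack would already have been seen.
            reported = sum(policy[: max(0, slot - ack_delay + 1)])
            # We only transmit if every reported transmission was lost.
            expected_sends += loss_prob**reported
    # The packet is missing at the deadline only if every planned send fails.
    p_missing = loss_prob ** sum(policy)
    return p_missing * delta_d + lam * expected_sends * size_bits


def best_policy(n_opportunities, loss_prob, ack_delay, delta_d, size_bits, lam):
    """Exhaustively search all 2^N policies for the minimum expected cost."""
    return min(
        product((0, 1), repeat=n_opportunities),
        key=lambda p: policy_cost(p, loss_prob, ack_delay, delta_d, size_bits, lam),
    )


# Hypothetical packet: 1000 bits, distortion reduction 100, 20 % loss.
chosen = best_policy(3, loss_prob=0.2, ack_delay=2, delta_d=100.0,
                     size_bits=1000, lam=0.05)
print(chosen)
```

With a two-slot acknowledgment delay, the cheapest policy in this toy setting sends once, waits for the possible acknowledgment, and retransmits only at the last opportunity.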

R-D Optimized Streaming Performance
[Plot: PSNR [dB] versus bit-rate [kbps]; annotation: ~50 %]
• Foreman
• 120 frames
• 10 fps, I-P-P-…
• H.263+ 2 Layer SNR scalable
• 20 frame GOP
• Copy Concealment
• 20 % loss forward and back
• Γ-distributed delay: κ = 10 ms, μ = 50 ms, σ = 23 ms
• Pre-roll 400 ms

Naive Coding Questions
• To achieve graceful degradation in case of channel error for a digitally encoded signal, is an embedded signal representation (aka layers, aka data partitioning) always needed?
• Can one, in general, send refinement information for an analog (i.e. uncoded) signal transmission over a noisy channel?

Digitally Enhanced Analog Transmission
[Diagram labels: analog channel (uncoded); Wyner-Ziv encoder; digital channel; Wyner-Ziv decoder; side info]
• Forward error protection of the signal waveform
• Information-theoretic bounds [Shamai, Verdu, Zamir, 1998]
• “Systematic lossy source-channel coding”

Forward Error Protection of Compressed Video
[Diagram labels: S; any old video encoder; error-prone channel; video decoder with error concealment; S’; Wyner-Ziv encoder A and Wyner-Ziv decoder A; S*; Wyner-Ziv encoder B and Wyner-Ziv decoder B; S**; analog channel (uncoded)]
• Graceful degradation without a layered signal representation
[Aaron, Rane, Girod, ICIP 2003]

Wyner-Ziv MPEG Codec
[Diagram labels: main MPEG encoder; coarse MPEG encoder; reconstructed frame at encoder; R-S encoder (Slepian-Wolf encoder); coarse Wyner-Ziv encoder; channel; R-S decoder; side information; Wyner-Ziv decoder; ED, Q^-1, T^-1, MC; signals S, S’, S*]
[Rane, Aaron, Girod, VCIP 2004]

Graceful Degradation with Forward Error Protection
• With FEC: main stream @ 1.092 Mbps, FEC (n,k) = (40,36), FEC bitrate = 120 kbps, total = 1.2 Mbps
• With FEP: WZ stream @ 270 kbps, FEP (n,k) = (52,36), WZ bitrate = 120 kbps, total = 1.2 Mbps
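The bit-rates on this slide are consistent with a simple reading, sketched below as an assumption rather than as the authors' stated procedure: in both schemes only the parity symbols of a systematic (n, k) code are sent in addition to the main stream, so the extra rate is the protected stream's rate times (n - k) / k. The function name is illustrative.

```python
def parity_rate_kbps(protected_kbps, n, k):
    """Extra rate needed for the n - k parity symbols of a systematic (n, k) code."""
    return protected_kbps * (n - k) / k


MAIN_STREAM_KBPS = 1092.0

# FEC: parity computed over the main stream itself.
fec_kbps = parity_rate_kbps(MAIN_STREAM_KBPS, n=40, k=36)

# FEP: parity computed over the coarser 270 kbps Wyner-Ziv description.
fep_kbps = parity_rate_kbps(270.0, n=52, k=36)

print(f"FEC parity: {fec_kbps:.1f} kbps, total {MAIN_STREAM_KBPS + fec_kbps:.0f} kbps")
print(f"FEP parity: {fep_kbps:.1f} kbps, total {MAIN_STREAM_KBPS + fep_kbps:.0f} kbps")
```

Under this reading both schemes spend about 120 kbps on protection, but the (52,36) code carries 16 parity symbols per codeword against 4 for the (40,36) code, because it protects a much coarser description.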

Visual Comparison of Degradation at Same PSNR
Foreman, 50 CIF frames @ symbol error rate = 4 x 10^-4
• With FEC: 1 Mbps + 120 kbps (38.32 dB)
• With FEP: 1 Mbps + 120 kbps (38.78 dB)

Superior Robustness of FEP
Foreman, 50 CIF frames @ symbol error rate = 10^-3
• With FEC: 1 Mbps + 120 kbps (33.03 dB)
• With FEP: 1 Mbps + 120 kbps (38.40 dB)

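Lossy Compression with Side Information
[Diagram: two systems, each with source, encoder, and decoder; side information is available at the decoder only in one system and at both encoder and decoder in the other]
[Wyner, Ziv, 1976] For MSE distortion and Gaussian statistics, the rate-distortion functions of the two systems are the same.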
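As a worked instance of the Wyner-Ziv statement, consider a standard textbook model that is assumed here and not taken from the slide: the source is $X = Y + N$, where $Y$ is the side information and $N$ is zero-mean Gaussian noise with variance $\sigma_N^2$, independent of $Y$. Under MSE distortion $D$, the rate needed with side information at the decoder only equals the rate needed when the encoder sees it too:

```latex
% Assumed model (not from the slide): X = Y + N, N ~ N(0, \sigma_N^2), N independent of Y
R_{\mathrm{WZ}}(D) \;=\; R_{X|Y}(D) \;=\; \max\left\{ 0,\ \tfrac{1}{2}\log_2 \frac{\sigma_N^2}{D} \right\}
```

Only the variance of the unpredictable part $N$ enters the rate, not the variance of $X$ itself.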

Ultra-Low-Complexity Video Coding
[Diagram labels: intraframe encoder; interframe decoder; WZ frames X; scalar quantizer; turbo encoder; buffer; request bits; turbo decoder; reconstruction X’; Slepian-Wolf codec; key frames K; conventional intraframe coding; conventional intraframe decoding K’; interpolation/extrapolation; Y]
[Aaron, Zhang, Girod, Asilomar 2002] [Aaron, Rane, Zhang, Girod, DCC 2003]

R-D Performance Ultra-Low-Complexity Video Coder
[Plot annotations: 3 dB, 8 dB]
• Sequence: Foreman
• WZ frames: even frames
• Key frames: odd frames
• Side information: motion compensated interpolation of key frames

Ultra-Low-Complexity Video Coder
• Wyner-Ziv Codec: 274 kbps, 39.0 dB
• H.263+ Intraframe Coding: 330 kbps, 32.9 dB

Ultra-Low-Complexity Video Coder
• Wyner-Ziv Codec: 274 kbps, 39.0 dB
• H.263+ I-B-I-B: 276 kbps, 41.8 dB

Stanford Camera Array Courtesy Marc Levoy, Stanford Computer Graphics Lab

Light Field Compression
[Images, labels in slide order: Wyner-Ziv, pixel-domain; JPEG-2000. Captions in slide order: rate 0.11 bpp, PSNR 39.9 dB; rate 0.11 bpp, PSNR 37.4 dB]

Conclusions
• Video compression is very important . . . but there is more to video coding than compression
• Rate-scalable video representations: MC lifting breakthrough
• Robust video transmission
• Virtual priority mechanisms by packet scheduling
• RD gains easily larger than from super-clever compression
• Distributed video coding: radically different approach
• Graceful degradation w/o layers
• Ultra-low-complexity coders
• Ubiquitous $J = D + \lambda R$

Acknowledgments
Anne M. Aaron, Jacob Chakareski, Philip A. Chou, $J = D + \lambda R$, Markus Flierl, Sang-eun Han, Mark Kalman, Marc Levoy, Yi Liang, Shantanu Rane, David Rebollo-Monedero, Andrew Secker, David Taubman, Thomas Wiegand, Xiaoqing Zhu, Rui Zhang

Progress is a wonderful thing, if only it would stop . . .
Robert Musil
