Motion Compensated Prediction and the Role of the DCT in Video Coding

Michael Horowitz, Applied Video Compression, michael@avcompression.com

Outline • Overview block-based hybrid motion compensated predictive video coding • ITU-T standards: H.261, H.263, H.264 • ISO/IEC standards: MPEG-1, MPEG-2 & MPEG-4 • Survey motion estimation & compensation • Discrete cosine transform (DCT) • Coding efficiency • Computational complexity • Perceptual implications

Block-Based Hybrid Motion Compensated Predictive Coding • Video picture partitioned into macroblocks • Macroblock (MB) has three components • One luma • “Y”, represents “lightness” • 16x16 luma samples • Two chroma • “Cb” & “Cr”, represent color • 16x16, 8x16, or 8x8 chroma samples

Block-Based Hybrid Motion Compensated Predictive Coding (continued) • Human Visual System more sensitive to luma • Chroma frequently sub-sampled • Sub-sampling examples: 4:4:4, 4:2:2, 4:2:0 • (Figure: Y, Cb and Cr sample grids for 4:4:4, 4:2:2 and 4:2:0) • Two coding modes for macroblocks
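The three sub-sampling formats can be illustrated with a short sketch. It assumes a full-resolution 16x16 chroma plane and uses plain block averaging as the downsampling filter; the averaging filter and the function name `subsample_chroma` are assumptions made for illustration, not something the slides or the standards prescribe.

```python
import numpy as np

def subsample_chroma(plane, fmt):
    """Downsample a full-resolution chroma plane by block averaging.

    Averaging is an illustrative choice of filter, not a mandated one.
    """
    # (vertical factor, horizontal factor) for each chroma format
    factors = {"4:4:4": (1, 1), "4:2:2": (1, 2), "4:2:0": (2, 2)}
    fy, fx = factors[fmt]
    h, w = plane.shape
    # Group samples into fy-by-fx blocks and average each block
    blocks = plane.reshape(h // fy, fy, w // fx, fx)
    return blocks.mean(axis=(1, 3))

cb = np.arange(256, dtype=np.float64).reshape(16, 16)
for fmt in ("4:4:4", "4:2:2", "4:2:0"):
    # NumPy shapes are (rows, columns), i.e. (height, width)
    print(fmt, subsample_chroma(cb, fmt).shape)
```

The resulting shapes correspond to the 16x16, 8x16 and 8x8 chroma sample counts per macroblock (width x height).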

Inter-Picture Macroblock Coding • Estimate motion of blocks from picture-to-picture • Search previously coded (reference) pictures • Encode • Location of motion estimate (motion vector) • Difference between input MB and motion estimate • (Figure: reference picture showing the search region, the location of the input MB, and the motion estimate)

Intra-Picture Macroblock Coding • Input MB coded using intra-picture prediction • Prediction derived from spatially adjacent MBs • Earlier algorithms offer no intra-picture prediction • Significantly lower coding efficiency than inter-coded MBs at low data rates • Useful when motion estimate is poor • Can be used to stop error propagation

Block-Based Hybrid Motion Compensated Predictive Coding

Survey Motion Estimation and Motion Compensation • Motion models • Translational (focus of talk) • Location of kth motion compensated block: $(X_k + MV_{x,k},\ Y_k + MV_{y,k})$ • $(X_k, Y_k)$ is location of kth input block • $(MV_{x,k}, MV_{y,k})$ is motion vector (MV) for kth block • Affine motion models • Rotation • Scaling • Video standards do not use affine models

Motion Estimation • Estimate inter-picture block translation • Luma samples (and sometimes chroma) • Example • Distortion: Sum of Absolute Differences (SAD), $\mathrm{SAD}_k(MV_x, MV_y) = \sum_{(i,j)\,\in\,\text{block } k} |s_k(i,j) - r(i + MV_x,\ j + MV_y)|$ • Low complexity • Commonly used in real-time production encoders • Find $(MV_{x,k}, MV_{y,k})$ that minimizes SAD between • Input block $s_k(i,j)$ • Motion compensated prediction in reference picture $r(i,j)$ • Subject to search range
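The SAD search described above can be sketched as an exhaustive (full) search over integer motion vectors. This is a minimal illustration, not a production encoder: the function names, the 16x16 block size, the ±7 search range, and the synthetic test pictures are all assumptions.

```python
import numpy as np

def sad(a, b):
    """Sum of absolute differences between two equally sized blocks."""
    return int(np.abs(a.astype(np.int64) - b.astype(np.int64)).sum())

def full_search(cur, ref, x0, y0, block=16, search=7):
    """Return (mvx, mvy, best_sad) for the block whose top-left is (x0, y0)."""
    target = cur[y0:y0 + block, x0:x0 + block]
    h, w = ref.shape
    # Start from the zero motion vector
    best = (0, 0, sad(target, ref[y0:y0 + block, x0:x0 + block]))
    for mvy in range(-search, search + 1):
        for mvx in range(-search, search + 1):
            x, y = x0 + mvx, y0 + mvy
            # Skip candidates that fall outside the reference picture
            if x < 0 or y < 0 or x + block > w or y + block > h:
                continue
            cost = sad(target, ref[y:y + block, x:x + block])
            if cost < best[2]:
                best = (mvx, mvy, cost)
    return best

rng = np.random.default_rng(0)
ref = rng.integers(0, 256, size=(64, 64), dtype=np.uint8)
# Synthetic "current" picture: reference content moved down 2 and left 3
cur = np.roll(ref, shift=(2, -3), axis=(0, 1))
mvx, mvy, cost = full_search(cur, ref, x0=24, y0=24)
print(mvx, mvy, cost)
```

Because the current picture is an exact shifted copy of the reference, the search recovers the displacement with zero SAD. Real content rarely matches exactly, which is why the residual is also coded.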

Motion Estimation (continued) • Fast motion estimation algorithms • (Figure: sample locations within the search range around $(X_k, Y_k)$ in reference picture $r(i,j)$)

Fractional Sample Motion Estimation • Estimate content between samples • Example: bilinear interpolation • $x_1 \le x^* < x_2$ and $y_1 \le y^* < y_2$ • $f_x = (x^* - x_1)/(x_2 - x_1)$, $f_y = (y^* - y_1)/(y_2 - y_1)$ • $z(x^*,y^*) = (1-f_x)(1-f_y)\,z(x_1,y_1) + f_x(1-f_y)\,z(x_2,y_1) + f_x f_y\,z(x_2,y_2) + (1-f_x)f_y\,z(x_1,y_2)$ • (Figure: $z(x^*,y^*)$ inside the square with corners $z(x_1,y_1)$, $z(x_2,y_1)$, $z(x_1,y_2)$, $z(x_2,y_2)$)
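The bilinear formula translates directly into code. The sketch below assumes samples on a unit-spaced grid stored as `z[y][x]`, so that $x_2 - x_1 = y_2 - y_1 = 1$; the function name `bilinear` and the tiny 2x2 grid are illustrative assumptions.

```python
import math

def bilinear(z, x, y):
    """Interpolate z (rows indexed as z[y][x]) at fractional position (x, y).

    Assumes unit sample spacing and that (x, y) lies strictly inside the
    grid, so the four neighbouring samples exist.
    """
    x1, y1 = math.floor(x), math.floor(y)
    x2, y2 = x1 + 1, y1 + 1
    fx, fy = x - x1, y - y1  # fractional offsets; the grid spacing is 1
    return ((1 - fx) * (1 - fy) * z[y1][x1]
            + fx * (1 - fy) * z[y1][x2]
            + fx * fy * z[y2][x2]
            + (1 - fx) * fy * z[y2][x1])

grid = [[10, 20],
        [30, 40]]
print(bilinear(grid, 0.5, 0.5))   # centre of the four samples: 25.0
print(bilinear(grid, 0.25, 0.0))  # a quarter of the way from 10 to 20: 12.5
```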

Fractional Sample Motion Estimation (continued) • H.261 • No fractional sample motion estimation • MPEG-1, MPEG-2 and H.263 • 1/2-sample, bilinear interpolation • H.264 | MPEG-4 AVC & SVC • Luma • 1/2-sample, 6-tap interpolation • 1/4-sample, simple average • Chroma (1/8-sample, bilinear)
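As an illustration of the H.264 luma scheme, the sketch below applies the 6-tap filter with taps (1, -5, 20, 20, -5, 1)/32 along one row, then forms a quarter-sample value by averaging with rounding. It is a 1-D simplification for 8-bit samples: picture-edge padding and the standard's full 2-D derivation of the remaining fractional positions are not reproduced, and the function names are assumptions.

```python
def half_sample(row, i):
    """Half-sample value between row[i] and row[i + 1] (8-bit samples).

    Requires two samples to the left of i and three to the right.
    """
    taps = (1, -5, 20, 20, -5, 1)
    acc = sum(t * row[i - 2 + k] for k, t in enumerate(taps))
    # Divide by 32 with rounding, then clip to the 8-bit range
    return min(255, max(0, (acc + 16) >> 5))

def quarter_sample(a, b):
    """Quarter-sample value: rounded average of two neighbouring values."""
    return (a + b + 1) >> 1

row = [100, 100, 100, 200, 200, 200]  # a step edge
h = half_sample(row, 2)               # halfway between row[2] and row[3]
q = quarter_sample(row[2], h)         # between row[2] and the half sample
print(h, q)                           # 150 125
```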

Fractional Sample Motion Estimation (continued) • Coding efficiency gain for H.263 [from Wang 2002]

Multiple Motion Vectors per MB • One motion vector for each sub-block • H.264 results [Bjontegaard 2001]

Multiple Reference Pictures [Wiegand, Zhang, & Girod 1997] • Coding gains • Uncovered areas • More integer motion vector estimates • (Figure: pictures t-3, t-2, t-1 and t0 along the direction of motion, with integer sample locations marked)

Multiple Reference Pictures (continued) • H.263 Annex U [Horowitz 2000]

Multi-Hypothesis Motion Compensated Prediction [Flierl, Wiegand & Girod 1998] • Linear combination of multiple predictions • One motion vector for each prediction • Bi-predicted pictures are special case (2 MVs) • Predictions may be forward & backward in time
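A linear combination of predictions can be sketched as a weighted sum of blocks. The example below is only a toy: the two hypotheses are modelled as the true block plus independent Gaussian noise (an assumption made purely to show why combining helps), and equal weights are used as in simple bi-prediction averaging. The function and variable names are hypothetical.

```python
import numpy as np

def multi_hypothesis(predictions, weights=None):
    """Weighted linear combination of equally sized prediction blocks."""
    preds = np.stack([np.asarray(p, dtype=np.float64) for p in predictions])
    if weights is None:
        # Default: equal weights that sum to one
        weights = np.full(len(predictions), 1.0 / len(predictions))
    weights = np.asarray(weights, dtype=np.float64)
    return np.tensordot(weights, preds, axes=1)

def mse(a, b):
    return float(np.mean((a - b) ** 2))

rng = np.random.default_rng(1)
block = rng.uniform(0, 255, size=(16, 16))
# Toy hypotheses: true block plus independent noise
h1 = block + rng.normal(0, 4, size=block.shape)
h2 = block + rng.normal(0, 4, size=block.shape)
combined = multi_hypothesis([h1, h2])
print(mse(block, h1), mse(block, h2), mse(block, combined))
```

With independent errors of equal variance, averaging two hypotheses roughly halves the prediction error energy; real gains depend on how correlated the hypotheses' errors are.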

Multi-Hypothesis for H.263 • Sequences Mobile & Calendar and Foreman • Results [Flierl 1998]

Overlapped Block Motion Compensation [Orchard & Sullivan 1994] • Special case of multi-hypothesis coding • H.263 advanced prediction mode (Annex F) • Overlapped block motion compensation • 1 coded + 2 “derived” motion vectors • Non-uniform spatial weighting of samples • 4 motion vectors per macroblock

Rate-Distortion Optimization • MV resulting in lowest distortion often not optimal • Goal: Find best tradeoff between distortion and rate • Strategy [Everett III 1963], [Shoham & Gersho 1988] • Minimize $J = D + \lambda R$, where $D = \sum_k D_k$ is the total distortion and $R = \sum_k R_k$ is the total bit-rate • Minimize $J_k = D_k + \lambda R_k$ for each block k separately, using common $\lambda$ • $D_k$ is the distortion for block k • $R_k$ is the rate for block k
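The per-block Lagrangian choice can be sketched in a few lines. The candidate list below (labels, distortions, and bit counts) is entirely hypothetical and only shows how the selected motion vector changes as $\lambda$ grows; `best_candidate` is an illustrative name.

```python
def best_candidate(candidates, lam):
    """Pick the candidate minimizing J = D + lam * R.

    candidates: iterable of (label, distortion, rate_bits) tuples.
    """
    return min(candidates, key=lambda c: c[1] + lam * c[2])

# Hypothetical candidates for one block: (label, SAD distortion, MV bits)
candidates = [
    ("mv=(5,-3)", 900, 14),   # lowest distortion, most expensive to code
    ("mv=(4,-3)", 940, 9),
    ("mv=(0,0)", 1300, 1),    # cheapest to code, highest distortion
]

for lam in (0, 10, 100):
    print(lam, best_candidate(candidates, lam)[0])
```

With `lam = 0` the lowest-distortion vector wins; as `lam` increases, cheaper vectors are preferred, which is the tradeoff the slide describes.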

Perceptual Tuning • Prevent transparent foreground macroblocks • Blurring of fast moving objects • Deblocking filter • Artifacts in the motion wake

Coding Summary • Macroblock-based coding • Two basic macroblock coding modes • Inter-coded MB → motion compensated prediction • Intra-coded MB

1-D Discrete Cosine Transform • Type II forward DCT [Ahmed et al. 1974] • Type II inverse DCT
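For reference, one common way to write the type II DCT pair is given below. The orthonormal scaling $c(k)$ is an assumption; other scalings differ only by constant factors, and the slide's own normalization may differ.

```latex
% Forward type II DCT of x(0), ..., x(N-1), orthonormal scaling assumed
X(k) = c(k) \sum_{n=0}^{N-1} x(n) \cos\!\left[\frac{(2n+1)k\pi}{2N}\right],
\qquad k = 0, \dots, N-1

% Inverse
x(n) = \sum_{k=0}^{N-1} c(k)\, X(k) \cos\!\left[\frac{(2n+1)k\pi}{2N}\right],
\qquad n = 0, \dots, N-1

% Scaling factors
c(0) = \sqrt{1/N}, \qquad c(k) = \sqrt{2/N} \quad \text{for } k \ge 1
```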

2-Dimensional DCT • Forward • Inverse
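Because the 2-D DCT is separable, it can be computed by applying the 1-D transform to the rows and then to the columns. A minimal sketch in matrix form follows, assuming square blocks and orthonormal scaling (both assumptions; the function names are illustrative).

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal type II DCT matrix C, so that C @ x is the 1-D DCT of x."""
    k = np.arange(n).reshape(-1, 1)  # frequency index (rows)
    m = np.arange(n).reshape(1, -1)  # sample index (columns)
    c = np.sqrt(2.0 / n) * np.cos((2 * m + 1) * k * np.pi / (2 * n))
    c[0, :] = np.sqrt(1.0 / n)       # DC row uses the smaller scale factor
    return c

def dct2(block):
    """Forward 2-D DCT of a square block: transform columns, then rows."""
    c = dct_matrix(block.shape[0])
    return c @ block @ c.T

def idct2(coeffs):
    """Inverse 2-D DCT; C is orthogonal, so its inverse is its transpose."""
    c = dct_matrix(coeffs.shape[0])
    return c.T @ coeffs @ c

block = np.full((8, 8), 128.0)       # flat block: only the DC term is nonzero
coeffs = dct2(block)
print(round(float(coeffs[0, 0]), 6))
print(np.allclose(idct2(coeffs), block))
```

For a flat 8x8 block of value 128 the DC coefficient is 8 x 128 = 1024 under this scaling and every other coefficient is zero up to rounding.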

Basis Functions for 8x8 DCT

Why Choose the DCT? • Coding efficiency • Computational complexity • Perceptual implications

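Coding Efficiency • Source $X = [X_1, X_2]$ • $X_i$ is a Gaussian random variable • Mean = 0, Variance = $\sigma_i^2$ • Rate of quantizer $Q_i$ is $R_i$ (bits / index) • Total rate $R = R_1 + R_2$ • (Figure: $X$ is split into $X_1$ and $X_2$; quantizer $Q_i$ maps $X_i$ to $\hat{X}_i$; $\hat{X}_1$ and $\hat{X}_2$ form $\hat{X}$)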

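Coding Efficiency (continued) • Distortion • Square error: $D_i = E[(X_i - \hat{X}_i)^2]$ • High-rate assumption: $D_i \approx h\,\sigma_i^2\,2^{-2R_i}$, where the constant $h$ depends on the source density and the quantizer type • High-rate implies R ≥ 3 bits / sample • Often works well for lower rates • Asymptotic Quantization Theory [Gray & Neuhoff 1998] • Total distortion: $D = D_1 + D_2 = h\,(\sigma_1^2\,2^{-2R_1} + \sigma_2^2\,2^{-2R_2})$

Rate Allocation Problem • What is smallest D = D* subject to $R_1 + R_2 = R$? • Find optimal value for $R_1$ • Substituting $R_2 = R - R_1$ and minimizing D with respect to $R_1$ yields $R_1 = \frac{R}{2} + \frac{1}{2}\log_2\frac{\sigma_1}{\sigma_2}$

Rate Allocation Problem (continued) • It follows that $R_2 = \frac{R}{2} + \frac{1}{2}\log_2\frac{\sigma_2}{\sigma_1}$ and $D_1 = D_2 = h\,\sigma_1\sigma_2\,2^{-R}$, which implies $D^* = 2h\,\sigma_1\sigma_2\,2^{-R}$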

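Generalize for k Quantizers • Rate: $R = R_{12} + R_3$, with $R_{12} = R_1 + R_2$ • Distortion: $D = D_{12} + D_3$, with $D_3 = h\,\sigma_3^2\,2^{-2R_3}$ under the high-rate assumption ($h$ is a constant that depends on the source density and the quantizer type) • Recall $D_{12}^* = 2h\,\sigma_1\sigma_2\,2^{-R_{12}}$ • (Figure: $X$ is split into $X_{12} = [X_1, X_2]$ and $X_3$; quantizers $Q_1$, $Q_2$, $Q_3$ produce $\hat{X}_1$, $\hat{X}_2$, $\hat{X}_3$)

Generalization (continued) • Treat as 2 quantizers, $Q_{12}$ and $Q_3$, with $D = 2h\,\sigma_1\sigma_2\,2^{-R_{12}} + h\,\sigma_3^2\,2^{-2R_3}$ subject to $R_{12} + R_3 = R$ • Minimize with respect to $R_3$

Generalization (continued) • It follows that $R_3 = \frac{R}{3} + \frac{1}{2}\log_2\frac{\sigma_3^2}{(\sigma_1^2\sigma_2^2\sigma_3^2)^{1/3}}$ and $D^* = 3h\,(\sigma_1^2\sigma_2^2\sigma_3^2)^{1/3}\,2^{-2R/3}$ • Generalize to k quantizers by induction

Optimal Rate and Distortion [Huang & Schultheiss 1963] • Rate: $R_i = \frac{R}{k} + \frac{1}{2}\log_2\frac{\sigma_i^2}{\left(\prod_{j=1}^{k}\sigma_j^2\right)^{1/k}}$ • Distortion: $D^* = k\,h\left(\prod_{j=1}^{k}\sigma_j^2\right)^{1/k} 2^{-2R/k}$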

Observations and Comments • #1 Optimal rate for Qi proportional to • #2 Optimal distortion • #3 In practice, systems use positive [Segall 1976] integer [Farber & Zeger 2005] Ri
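The standard high-rate allocation can be checked numerically. The sketch below assumes the model $D_i = h\,\sigma_i^2\,2^{-2R_i}$ with $h = 1$ and real-valued rates; the variances, total rate, and function names are hypothetical values chosen so the arithmetic is easy to follow.

```python
import numpy as np

def optimal_rates(variances, total_rate):
    """High-rate optimal rates: R/k + 0.5 * log2(var_i / geometric mean)."""
    v = np.asarray(variances, dtype=np.float64)
    geo_mean = np.exp(np.mean(np.log(v)))
    return total_rate / len(v) + 0.5 * np.log2(v / geo_mean)

def distortions(variances, rates, h=1.0):
    """Per-quantizer distortion under the assumed model h * var * 2^(-2R)."""
    v = np.asarray(variances, dtype=np.float64)
    r = np.asarray(rates, dtype=np.float64)
    return h * v * 2.0 ** (-2.0 * r)

var = [16.0, 4.0, 1.0]                        # hypothetical variances
rates = optimal_rates(var, total_rate=12.0)
print(rates)                                  # rates 5, 4 and 3 bits
print(distortions(var, rates))                # every entry equals 1/64
print(distortions(var, [4.0, 4.0, 4.0]).sum(),
      distortions(var, rates).sum())          # uniform vs optimal total
```

Components with larger variance receive more bits, and every quantizer ends up with the same distortion. The formula can return fractional or even negative rates when variances differ widely, which is why practical systems restrict the $R_i$ to non-negative integers.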

Question • Given Gaussian source X & fixed encoder structure (i.e., k scalar quantizers) how can we minimize D subject to $\sum_{i=1}^{k} R_i = R$?

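Transform Coding [Kramer & Mathews 1956] • (Figure: $X = [X_1, \dots, X_k]$ passes through transform $T$ to give $Y = [Y_1, \dots, Y_k]$; quantizer $Q_i$ maps $Y_i$ to $\hat{Y}_i$; $\hat{Y}$ passes through $T^{-1}$ to give $\hat{X}$) • For orthogonal T, square error is preserved: $\|X - \hat{X}\|^2 = \|Y - \hat{Y}\|^2$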

Fact 1 • Karhunen-Loeve Transform (KLT) produces smallest $D^*$ under the following conditions [Huang et al. 1963] • a) Gaussian input random variables • b) High-rate quantizers • c) Rate of each quantizer is arbitrary real value • d) Square error distortion measure

Fact 2 • The autocorrelation matrix of the KLT transform vector is diagonal. • KLT coefficients are uncorrelated • There is no general theorem stating uncorrelated quantities can be more efficiently quantized than correlated ones

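Fact 3 • If KLT produces coefficient variances $\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_k$ and another orthogonal transform produces coefficient variances $s_1^2 \ge s_2^2 \ge \dots \ge s_k^2$, then $\sum_{i=1}^{m} \lambda_i \ge \sum_{i=1}^{m} s_i^2$ for $m = 1, \dots, k$ → Energy compaction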

Practical Considerations • KLT impractical for many systems • Computational complexity • Transform is signal dependent • Compute and apply transform for each input • Consider Fourier based transforms • Fast algorithms exist • Examine loss of coding efficiency resulting from loss of energy compaction
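The loss from replacing the KLT with a fixed transform can be estimated numerically. The sketch below assumes a first-order autoregressive, AR(1), source with correlation 0.95 as a stand-in for image data (a common modelling assumption, not something taken from the slides) and compares transforms by the ratio of the arithmetic to the geometric mean of their coefficient variances, which under the high-rate model is the factor by which distortion drops relative to quantizing the samples directly. All names are illustrative.

```python
import numpy as np

def ar1_cov(n, rho):
    """Covariance of a unit-variance AR(1) source: rho^|i - j|."""
    idx = np.arange(n)
    return rho ** np.abs(idx[:, None] - idx[None, :])

def dct_matrix(n):
    """Orthonormal type II DCT matrix."""
    k = np.arange(n).reshape(-1, 1)
    m = np.arange(n).reshape(1, -1)
    c = np.sqrt(2.0 / n) * np.cos((2 * m + 1) * k * np.pi / (2 * n))
    c[0, :] = np.sqrt(1.0 / n)
    return c

def coding_gain(transform, cov):
    """Arithmetic mean over geometric mean of transform coefficient variances."""
    var = np.diag(transform @ cov @ transform.T)
    return float(np.mean(var) / np.exp(np.mean(np.log(var))))

n, rho = 8, 0.95                  # assumed block length and correlation
cov = ar1_cov(n, rho)
_, eigvecs = np.linalg.eigh(cov)  # KLT basis = eigenvectors of the covariance
gains = {
    "identity": coding_gain(np.eye(n), cov),
    "DCT": coding_gain(dct_matrix(n), cov),
    "KLT": coding_gain(eigvecs.T, cov),
}
for name, g in gains.items():
    print(name, round(g, 3))
```

For this highly correlated source the DCT's gain is close to the KLT's, even though the DCT is signal independent.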

Energy Compaction of Some Discrete Transforms • 1x32 block in natural images [Lohscheller]

2-D Energy Compaction [from Hedberg & Nilsson 2004] • KLT • DCT • DFT

Computational Complexity • Recall DCT may be derived from DFT • First N coefficients of 2N-point DFT • Requires appropriate input sequence symmetry • Requires scaling [Tseng & Miller 1978], where $f_m$ is the mth DFT coefficient • Leverage FFT to compute DCT
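The DFT relationship can be verified with a short sketch. It uses one standard construction, which may differ in detail from the cited formulation: extend the N input samples symmetrically to length 2N, take the 2N-point FFT, keep the first N coefficients $f_m$, and scale by $\tfrac{1}{2}e^{-j\pi m/(2N)}$. The DCT here is unnormalized, and the function names are assumptions.

```python
import numpy as np

def dct_direct(x):
    """Unnormalized type II DCT by direct summation (O(N^2))."""
    n = len(x)
    idx = np.arange(n)
    return np.array([np.sum(x * np.cos(np.pi * (2 * idx + 1) * m / (2 * n)))
                     for m in range(n)])

def dct_via_fft(x):
    """Same DCT from the first N coefficients of a 2N-point DFT."""
    n = len(x)
    y = np.concatenate([x, x[::-1]])   # symmetric extension to length 2N
    f = np.fft.fft(y)[:n]              # first N DFT coefficients f_m
    m = np.arange(n)
    # Phase rotation and factor 1/2 undo the half-sample offset and doubling
    return 0.5 * np.real(np.exp(-1j * np.pi * m / (2 * n)) * f)

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
print(np.allclose(dct_direct(x), dct_via_fft(x)))
```

The rotated coefficients are real up to rounding because of the input symmetry, so taking the real part only discards numerical noise.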

Computational Complexity (continued) • 1-D 8-point DCT from 16-point DFT • 13 mults, 29 adds [Arai et al. 1988] • 8 final scaling multiplies rolled into quantization • Net 5 mults, 29 adds → best known • Fast 2-D DCT (8x8) • Separable [from Pennebaker & Mitchell 1992] • 80 mults, 464 adds → best known • Non-separable [Feig 1992] • 54 mults, 416 adds, 6 shifts

Perceptual Implications • Contrast sensitivity of HVS • See last page of handout [Barlow & Mollon 1982] • Perceptually tuned quantization tables [Watson] • Filter coefficients prior to quantization • Shape frequency content of source • Exploit HVS contrast sensitivity
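A quantization table applies a different step size to each DCT coefficient. The sketch below uses a made-up table whose step grows with horizontal plus vertical frequency, so that high-frequency coefficients, to which the HVS is less sensitive, are quantized more coarsely. The table, the coefficient values, and the function names are hypothetical and are not taken from any standard or from the cited work.

```python
import numpy as np

def quantize(coeffs, steps):
    """Uniform quantization: divide by the step size and round to an integer."""
    return np.rint(coeffs / steps).astype(np.int64)

def dequantize(levels, steps):
    """Reconstruct coefficient values from quantizer indices."""
    return levels * steps

# Hypothetical table: step size grows with frequency index u + v
u, v = np.meshgrid(np.arange(8), np.arange(8), indexing="ij")
steps = 8.0 * (1 + u + v)

coeffs = np.full((8, 8), 50.0)         # toy block of DCT coefficients
levels = quantize(coeffs, steps)
recon = dequantize(levels, steps)
print(levels[0, 0], levels[7, 7])      # low frequency kept, high frequency zeroed
print(float(np.max(np.abs(coeffs - recon) / steps)))  # error is at most half a step
```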

Concluding Summary • Motion estimation & compensation • Translation-based motion models • Fractional sample motion estimation • Multiple motion vectors per macroblock • Multiple reference pictures • Multi-hypothesis motion compensated prediction • Overlapped block motion compensation

Concluding Summary • DCT • Near optimal R-D performance for wide range of sources (Gaussian, high-rate assumptions) • Simple relationship to DFT → fast • Perceptual relevance

References • N. Ahmed, T. Natarajan, and K. R. Rao, “Discrete cosine transform,” IEEE Trans. Comput., vol. C-23, pp. 90–93, Jan. 1974. • Y. Arai, T. Agui, and M. Nakajima, “A Fast DCT-SQ Scheme for Images,” Trans. of the IEICE, vol. E71, no. 11, p. 1095, Nov. 1988. • H. B. Barlow and J. D. Mollon, The Senses. Cambridge: Cambridge University Press, 1982. • G. Bjontegaard, “Objective simulation results,” Document VCEG-M34, Video Coding Experts Group (VCEG), Thirteenth Meeting: Austin, Texas, USA, 2-4 April 2001. • H. Everett III, “Generalized Lagrangian Multiplier Method for Solving Problems of Optimum Allocation of Resources,” Operations Research, vol. 11, pp. 399-417, 1963. • B. Farber and K. Zeger, “Quantization of Multiple Sources Using Integer Bit Allocation,” Data Compression Conference (DCC), Salt Lake City, Utah, March 2005 (to appear). • E. Feig and S. Winograd, “Fast Algorithms for the Discrete Cosine Transform,” IEEE Trans. Signal Proc., vol. 40, pp. 2174-2193, 1992.

References (continued) • M. Flierl, T. Wiegand, and B. Girod, “Locally Optimal Design Algorithm for Block-Based Multi-Hypothesis Motion-Compensated Prediction,” Proc. of the IEEE Data Compression Conference (DCC'98), pp. 239-248, Snowbird, USA, Apr. 1998. • A. Gersho and R. M. Gray, Vector Quantization and Signal Compression, Kluwer Academic Publishers, Boston, 1992. • B. Girod, Lecture for EE368b, Video and Image Compression, Stanford University. • R. M. Gray and D. L. Neuhoff, “Quantization,” IEEE Transactions on Information Theory, vol. 44, pp. 2325-2384, Oct. 1998. • R. M. Haralick, “A Storage Efficient Way to Implement the Discrete Cosine Transform,” IEEE Transactions on Computers, vol. 25, no. 6, pp. 764–765, 1976. • H. Hedberg and P. Nilsson, “A Survey of Various Discrete Transforms used in Digital Image Compression Algorithms,” Proceedings of the Swedish System-On-Chip Conference 2004, Bastad, Sweden, April 13-14, 2004.
