
Motion Compensated Prediction and the Role of the DCT in Video Coding



Presentation Transcript


  1. Motion Compensated Prediction and the Role of the DCT in Video Coding Michael Horowitz Applied Video Compression michael@avcompression.com

  2. Outline • Overview of block-based hybrid motion compensated predictive video coding • ITU-T standards: H.261, H.263, H.264 • ISO/IEC standards: MPEG-1, MPEG-2 & MPEG-4 • Survey of motion estimation & compensation • Discrete cosine transform (DCT) • Coding efficiency • Computational complexity • Perceptual implications

  3. Block-Based Hybrid Motion Compensated Predictive Coding • Video picture partitioned into macroblocks • Macroblock (MB) has three components • One luma • “Y”, represents “lightness” • 16x16 luma samples • Two chroma • “Cb” & “Cr”, represent color • 16x16, 8x16, or 8x8 chroma samples

  4. Block-Based Hybrid Motion Compensated Predictive Coding (continued) [Figure: Y, Cb and Cr sampling grids for the 4:4:4, 4:2:2 and 4:2:0 formats] • Human Visual System more sensitive to luma • Chroma frequently sub-sampled • Sub-sampling examples • Two coding modes for macroblocks

  5. Inter-Picture Macroblock Coding [Figure: motion estimate and search region in a reference picture, relative to the location of the input MB] • Estimate motion of blocks from picture-to-picture • Search previously coded (reference) pictures • Encode • Location of motion estimate (motion vector) • Difference between input MB and motion estimate

  6. Intra-Picture Macroblock Coding • Input MB coded using intra-picture prediction • Prediction derived from spatially adjacent MBs • Earlier algorithms offer no intra-picture prediction • Significantly lower coding efficiency than inter-coded MBs at low data rates • Useful when motion estimate is poor • Can be used to stop error propagation

  7. Block-Based Hybrid Motion Compensated Predictive Coding

  8. Survey of Motion Estimation and Motion Compensation • Motion models • Translational (focus of talk) • Location of kth motion compensated block: (Xk + MVx,k, Yk + MVy,k) • (Xk,Yk) is location of kth input block • (MVx,k, MVy,k) is motion vector (MV) for kth block • Affine motion models • Rotation • Scaling • Video standards do not use affine models

  9. Motion Estimation • Estimate inter-picture block translation • Luma samples (and sometimes chroma) • Example distortion measure: Sum of Absolute Differences (SAD) SAD = Σi,j | sk(i,j) − r(i + MVx,k, j + MVy,k) | • Low complexity • Commonly used in real-time production encoders • Find (MVx,k, MVy,k) that minimizes SAD between • Input block sk(i,j) • Motion compensated prediction in reference picture r(i,j) • Subject to search range
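As a sketch of the SAD criterion above, a minimal exhaustive (full-search) integer-sample motion search might look like the following; the function name and interface are illustrative, not from the talk:

```python
import numpy as np

def full_search_sad(block, ref, x, y, search_range):
    """Exhaustive integer-sample motion search minimizing SAD.

    block: input block s_k (e.g. 16x16 luma samples)
    ref:   reference picture r
    (x, y): top-left location (X_k, Y_k) of the input block
    search_range: maximum |MV| component, in integer samples
    """
    n, m = block.shape
    best_sad, best_mv = np.inf, (0, 0)
    for mvy in range(-search_range, search_range + 1):
        for mvx in range(-search_range, search_range + 1):
            yy, xx = y + mvy, x + mvx
            # Skip candidates that fall outside the reference picture
            if yy < 0 or xx < 0 or yy + n > ref.shape[0] or xx + m > ref.shape[1]:
                continue
            cand = ref[yy:yy + n, xx:xx + m]
            sad = np.abs(block.astype(np.int32) - cand.astype(np.int32)).sum()
            if sad < best_sad:
                best_sad, best_mv = sad, (mvx, mvy)
    return best_mv, best_sad
```

Real-time encoders rarely search exhaustively; the fast algorithms mentioned on the next slide prune this candidate set.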

  10. Motion Estimation (continued) • Fast motion estimation algorithms [Figure: search range of sample locations around (Xk,Yk) in reference picture r(i,j)]

  11. Fractional Sample Motion Estimation • Estimate content between samples • Example: bilinear interpolation, for x1 ≤ x* < x2 and y1 ≤ y* < y2 with fx = (x* − x1)/(x2 − x1), fy = (y* − y1)/(y2 − y1): z(x*,y*) = (1 − fx)(1 − fy) z(x1,y1) + fx(1 − fy) z(x2,y1) + fx fy z(x2,y2) + (1 − fx) fy z(x1,y2) [Figure: the four surrounding samples z(x1,y1), z(x2,y1), z(x1,y2), z(x2,y2) and the interpolated location z(x*,y*)]
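The bilinear formula translates directly into code; this sketch assumes unit sample spacing (x2 − x1 = y2 − y1 = 1), as is the case for sub-sample interpolation on a picture grid:

```python
import numpy as np

def bilinear(z, xs, ys):
    """Bilinear interpolation of picture z at fractional location (xs, ys).

    Follows the slide's formula with unit sample spacing, so
    fx and fy are simply the fractional parts of the coordinates.
    z is indexed as z[row, col] = z[y, x].
    """
    x1, y1 = int(np.floor(xs)), int(np.floor(ys))
    fx, fy = xs - x1, ys - y1
    return ((1 - fx) * (1 - fy) * z[y1, x1]
            + fx * (1 - fy) * z[y1, x1 + 1]
            + fx * fy * z[y1 + 1, x1 + 1]
            + (1 - fx) * fy * z[y1 + 1, x1])
```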

  12. Fractional Sample Motion Estimation(continued) • H.261 • No fractional sample motion estimation • MPEG-1, MPEG-2 and H.263 • 1/2-sample, bilinear interpolation • H.264 | MPEG-4 AVC & SVC • Luma • 1/2-sample, 6-tap interpolation • 1/4-sample, simple average • Chroma (1/8-sample, bilinear)

  13. Fractional Sample Motion Estimation (continued) • Coding efficiency gain for H.263 [from Wang 2002]

  14. Multiple Motion Vectors per MB • One motion vector for each sub-block • H.264 results [Bjontegaard 2001]

  15. Multiple Reference Pictures [Wiegand, Zhang, & Girod 1997] • Coding gains • Uncovered areas • More integer motion vector estimates [Figure: integer sample locations in reference pictures t−3, t−2, t−1 and current picture t0, with direction of motion]

  16. Multiple Reference Pictures(Continued) • H.263 Annex U [Horowitz 2000]

  17. Multi-Hypothesis Motion Compensated Prediction[Flierl, Wiegand & Girod 1998] • Linear combination of multiple predictions • One motion vector for each prediction • Bi-predicted pictures are special case (2 MVs) • Predictions may be forward & backward in time
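The linear combination described above can be sketched as follows, assuming equal weights by default (which, for two hypotheses, reduces to ordinary bi-prediction):

```python
import numpy as np

def multi_hypothesis_prediction(predictions, weights=None):
    """Linear combination of motion compensated predictions.

    predictions: list of same-shaped prediction blocks, each obtained
    with its own motion vector (possibly from different reference
    pictures, forward or backward in time).
    weights: optional list of combination weights; defaults to a
    simple average, the two-hypothesis bi-prediction special case.
    """
    predictions = [p.astype(np.float64) for p in predictions]
    if weights is None:
        weights = [1.0 / len(predictions)] * len(predictions)
    return sum(w * p for w, p in zip(weights, predictions))
```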

  18. Multi-Hypothesis for H.263 • Sequences Mobile & Calendar and Foreman • Results [Flierl 1998]

  19. Overlapped Block Motion Compensation[Orchard & Sullivan 1994] • Special case of multi-hypothesis coding • H.263 advanced prediction mode (Annex F) • Overlapped block motion compensation • 1 coded + 2 “derived” motion vectors • Non-uniform spatial weighting of samples • 4 motion vectors per macroblock

  20. Rate-Distortion Optimization • MV resulting in lowest distortion often not optimal • Goal: Find best tradeoff between distortion and rate • Strategy [Everett III 1963], [Shoham & Gersho 1988] • Minimize Jk = Dk + λRk for each block k separately, using a common λ • Dk and Rk are the distortion and rate for block k • Total distortion D = Σk Dk, total bit-rate R = Σk Rk
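The per-block Lagrangian minimization can be sketched as a simple selection over coding candidates; the candidate labels and numbers here are illustrative only:

```python
def rd_best_candidate(candidates, lam):
    """Pick the candidate minimizing J_k = D_k + lambda * R_k.

    candidates: list of (label, distortion, rate) tuples for block k,
    e.g. different motion vectors or coding modes.
    The same lambda is shared by all blocks, so minimizing each J_k
    independently optimizes the overall rate-distortion tradeoff.
    """
    return min(candidates, key=lambda c: c[1] + lam * c[2])
```

Sweeping λ moves along the rate-distortion curve: a small λ favors low distortion at high rate, a large λ favors low rate.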

  21. Perceptual Tuning • Prevent transparent foreground macroblocks • Blurring of fast moving objects • Deblocking filter • Artifacts in the motion wake

  22. Coding Summary • Macroblock-based coding • Two basic macroblock coding modes • Inter-coded MB → motion compensated prediction • Intra-coded MB

  23. 1-D Discrete Cosine Transform • Type II forward DCT [Ahmed et al. 1974] X(k) = c(k) Σn=0..N−1 x(n) cos[ (2n + 1)kπ / 2N ], with c(0) = √(1/N) and c(k) = √(2/N) for k > 0 • Type II inverse DCT x(n) = Σk=0..N−1 c(k) X(k) cos[ (2n + 1)kπ / 2N ]
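The forward and inverse type-II DCT can be written directly from the definitions; this sketch uses the orthonormal scaling c(0) = √(1/N), c(k) = √(2/N):

```python
import numpy as np

def dct_ii(x):
    """Type-II forward DCT with orthonormal scaling."""
    N = len(x)
    n = np.arange(N)
    c = np.full(N, np.sqrt(2.0 / N)); c[0] = np.sqrt(1.0 / N)
    return np.array([c[k] * np.sum(x * np.cos((2*n + 1) * k * np.pi / (2*N)))
                     for k in range(N)])

def idct_ii(X):
    """Inverse of the type-II DCT (a type-III DCT with the same scaling)."""
    N = len(X)
    k = np.arange(N)
    c = np.full(N, np.sqrt(2.0 / N)); c[0] = np.sqrt(1.0 / N)
    return np.array([np.sum(c * X * np.cos((2*n + 1) * k * np.pi / (2*N)))
                     for n in range(N)])
```

With this scaling the transform is orthonormal, so the inverse is exact and signal energy is preserved (Parseval).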

  24. 2-Dimensional DCT • Forward X(k,l) = c(k)c(l) Σm Σn x(m,n) cos[ (2m + 1)kπ / 2N ] cos[ (2n + 1)lπ / 2N ] • Inverse x(m,n) = Σk Σl c(k)c(l) X(k,l) cos[ (2m + 1)kπ / 2N ] cos[ (2n + 1)lπ / 2N ] • Separable: 1-D DCT of the rows, then of the columns
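Because the 2-D DCT is separable, it can be computed as a 1-D transform of the rows followed by a 1-D transform of the columns; a matrix-form sketch (orthonormal scaling, square blocks assumed):

```python
import numpy as np

def dct_matrix(N):
    """Orthonormal type-II DCT matrix: row k holds the kth basis vector."""
    n = np.arange(N)
    C = np.cos((2*n[None, :] + 1) * n[:, None] * np.pi / (2*N)) * np.sqrt(2.0 / N)
    C[0, :] = np.sqrt(1.0 / N)   # DC row uses c(0) = sqrt(1/N)
    return C

def dct2(block):
    """Separable 2-D DCT: C transforms the columns, C.T the rows."""
    C = dct_matrix(block.shape[0])
    return C @ block @ C.T

def idct2(coeffs):
    """Inverse 2-D DCT; C is orthonormal so its inverse is its transpose."""
    C = dct_matrix(coeffs.shape[0])
    return C.T @ coeffs @ C
```

For a constant 8x8 block all energy lands in the single DC coefficient, which is the energy compaction property discussed later in the talk.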

  25. Basis Functions for 8x8 DCT

  26. Why Choose the DCT? • Coding efficiency • Computational complexity • Perceptual implications

  27. Coding Efficiency [Figure: X1 and X2 quantized by Q1 and Q2 to form the reconstructions X̂1, X̂2 and X̂] • Source X = [X1, X2] • Xi is a Gaussian random variable • Mean = 0, Variance = σi² • Rate of quantizer Qi is Ri (bits / index) • Total rate R = R1 + R2

  28. Coding Efficiency (continued) • Distortion • Square error • High-rate assumption • High-rate implies R ≥ 3 bits / sample • Often works well for lower rates • Asymptotic Quantization Theory [Gray & Neuhoff 1998]: Di = h σi² 2^(−2Ri), with h = √3π/2 for a Gaussian source • Total distortion D = D1 + D2

  29. Rate Allocation Problem • What is smallest D = D* subject to R1 + R2 = R? • Find optimal value for R1 (then R2 = R − R1) • Minimizing D with respect to R1 yields R1 = R/2 + (1/2) log2(σ1/σ2)

  30. Rate Allocation Problem (continued) It follows that D1 = D2 = h σ1σ2 2^(−R), which implies D* = 2h (σ1²σ2²)^(1/2) 2^(−2(R/2))

  31. Generalize for k Quantizers [Figure: X1, X2, X3 quantized by Q1, Q2, Q3, with X1 and X2 grouped as X12] • Rate R = R1 + R2 + R3 = R12 + R3 • Distortion D = D1 + D2 + D3 = D12 + D3 • Recall the two-quantizer result: D12 = 2h (σ1²σ2²)^(1/2) 2^(−2(R12/2))

  32. Generalization (continued) • Treat (X1, X2) as a single source coded by 2 quantizers with D12 = 2h (σ1²σ2²)^(1/2) 2^(−2(R12/2)), subject to R12 + R3 = R • Minimize D12 + D3 with respect to R3

  33. Generalization (continued) • It follows that R3 = R/3 + (1/2) log2( σ3² / (σ1²σ2²σ3²)^(1/3) ) • Generalize to k quantizers by induction

  34. Optimal Rate and Distortion [Huang & Schultheiss 1963] • Rate Ri = R/k + (1/2) log2( σi² / (σ1²σ2²···σk²)^(1/k) ) • Distortion D* = k h (σ1²σ2²···σk²)^(1/k) 2^(−2R/k)
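The Huang & Schultheiss allocation can be checked numerically; this sketch assumes the high-rate Gaussian constant h = √3π/2 and allows non-integer (and possibly negative) rates, as the theory does:

```python
import numpy as np

def optimal_rates(variances, R):
    """Huang & Schultheiss high-rate bit allocation for k quantizers:
    R_i = R/k + (1/2) log2(sigma_i^2 / geometric_mean(variances)).
    Rates may be negative or non-integer; practical systems round
    to nonnegative integers (slide 35, observation #3)."""
    v = np.asarray(variances, dtype=float)
    k = len(v)
    gm = np.exp(np.mean(np.log(v)))   # geometric mean of the variances
    return R / k + 0.5 * np.log2(v / gm)

def total_distortion(variances, rates, h=np.sqrt(3) * np.pi / 2):
    """High-rate total distortion D = sum_i h * sigma_i^2 * 2^(-2 R_i)."""
    v = np.asarray(variances, dtype=float)
    return float(np.sum(h * v * 2.0 ** (-2.0 * np.asarray(rates, dtype=float))))
```

For variances (16, 4, 1) and R = 12 bits the allocation is (5, 4, 3): each quantizer contributes the same distortion, and the total beats an equal 4/4/4 split.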

  35. Observations and Comments • #1 Optimal rate for Qi proportional to log2 σi² • #2 Optimal distortion is the same for every quantizer • #3 In practice, systems use positive [Segall 1976] integer [Farber & Zeger 2005] Ri

  36. Question • Given Gaussian source X & fixed encoder structure (i.e., k scalar quantizers), how can we minimize D subject to R1 + R2 + … + Rk = R?

  37. Transform Coding [Kramer & Mathews 1956] [Figure: X transformed by T into coefficients Y1…Yk, each quantized by Qi, then inverse transformed by T⁻¹ to give X̂] • For orthogonal T, square error distortion is preserved: ||X − X̂||² = ||Y − Ŷ||²

  38. Fact 1 • Karhunen-Loeve Transform (KLT) produces the smallest D* [Huang & Schultheiss 1963], assuming • a) Gaussian input random variables • b) High-rate quantizers • c) Rate of each quantizer is arbitrary real value • d) Square error distortion measure

  39. Fact 2 • The autocorrelation matrix of the KLT transform vector is diagonal. • KLT coefficients are uncorrelated • There is no general theorem stating uncorrelated quantities can be more efficiently quantized than correlated ones

  40. Fact 3 • If the KLT produces D*KLT and another orthogonal transform produces D* ≥ D*KLT, then since D* depends only on the geometric mean of the coefficient variances, the KLT minimizes (σ1²σ2²···σk²)^(1/k) → Energy compaction

  41. Practical Considerations • KLT impractical for many systems • Computational complexity • Transform is signal dependent • Compute and apply transform for each input • Consider Fourier based transforms • Fast algorithms exist • Examine loss of coding efficiency resulting from loss of energy compaction

  42. Energy Compaction of Some Discrete Transforms • 1x32 block in natural images [Lohscheller]

  43. 2-D Energy Compaction [from Hedberg & Nilsson 2004] [Figure: 2-D energy compaction of the KLT, DCT, and DFT]

  44. Computational Complexity • Recall DCT may be derived from DFT • First N coefficients of 2N-point DFT • Requires appropriate input sequence symmetry • Requires scaling [Tseng & Miller 1978], where fm is the mth DFT coefficient • Leverage FFT to compute DCT
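The DFT route can be sketched as follows: symmetrically extend the 8 input samples to a 16-point sequence, take a 16-point FFT, and apply a complex "twiddle" scaling to the first 8 coefficients. The normalization constants here assume the orthonormal DCT convention:

```python
import numpy as np

def dct8_via_fft(x):
    """8-point type-II DCT computed from a 16-point DFT.

    The symmetric extension y = [x, reversed(x)] makes the DFT of y
    reduce to a cosine sum; multiplying by exp(-j*pi*k/16) and taking
    the real part recovers the (unnormalized) DCT coefficients.
    """
    N = 8
    y = np.concatenate([x, x[::-1]])              # symmetric 16-point sequence
    Y = np.fft.fft(y)                             # 16-point DFT
    k = np.arange(N)
    raw = 0.5 * np.real(np.exp(-1j * np.pi * k / (2 * N)) * Y[:N])
    c = np.full(N, np.sqrt(2.0 / N)); c[0] = np.sqrt(1.0 / N)
    return c * raw
```

In a production encoder the final scaling multiplies would be folded into the quantization step, as noted on the next slide.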

  45. Computational Complexity (continued) • 1-D 8-point DCT from 16-point DFT • 13 mults, 29 adds [Arai et al. 1988] • 8 final scaling multiplies rolled into quantization • Net 5 mults, 29 adds → best known • Fast 2-D DCT (8x8) • Separable [from Pennebaker & Mitchell 1992] • 80 mults, 464 adds → best known • Non-separable [Feig 1992] • 54 mults, 416 adds, 6 shifts

  46. Perceptual Implications • Contrast sensitivity of HVS • See last page of handout [Barlow & Mollon 1982] • Perceptually tuned quantization tables [Watson] • Filter coefficients prior to quantization • Shape frequency content of source • Exploit HVS contrast sensitivity

  47. Concluding Summary • Motion estimation & compensation • Translation-based motion models • Fractional sample motion estimation • Multiple motion vectors per macroblock • Multiple reference pictures • Multi-hypothesis motion compensated prediction • Overlapped block motion compensation

  48. Concluding Summary • DCT • Near optimal R-D performance for wide range of sources (Gaussian, high-rate assumptions) • Simple relationship to DFT → fast algorithms • Perceptual relevance

  49. References • N. Ahmed, T. Natarajan, and K. R. Rao, "Discrete cosine transform," IEEE Trans. Comput., vol. C-23, pp. 90-93, Jan. 1974. • Y. Arai, T. Agui, and M. Nakajima, "A Fast DCT-SQ Scheme for Images," Trans. of the IEICE, vol. E-71, no. 11, p. 1095, Nov. 1988. • E. Feig and S. Winograd, "Fast Algorithms for the Discrete Cosine Transform," IEEE Trans. Signal Processing, vol. 40, pp. 2174-2193, 1992. • H. B. Barlow and J. D. Mollon, The Senses. Cambridge: Cambridge University Press, 1982. • G. Bjontegaard, "Objective simulation results," Document VCEG-M34, Video Coding Experts Group (VCEG), Thirteenth Meeting: Austin, Texas, USA, 2-4 April 2001. • H. Everett III, "Generalized Lagrangian Multiplier Method for Solving Problems of Optimum Allocation of Resources," Operations Research, vol. 11, pp. 399-417, 1963. • B. Farber and K. Zeger, "Quantization of Multiple Sources Using Integer Bit Allocation," Data Compression Conference (DCC), Salt Lake City, Utah, March 2005 (to appear).

  50. References (continued) • M. Flierl, T. Wiegand, and B. Girod, "Locally Optimal Design Algorithm for Block-Based Multi-Hypothesis Motion-Compensated Prediction," Proc. IEEE Data Compression Conference (DCC'98), pp. 239-248, Snowbird, USA, Apr. 1998. • A. Gersho and R. M. Gray, Vector Quantization and Signal Compression, Kluwer Academic Publishers, Boston, 1992. • B. Girod, Lecture for EE368b, Video and Image Compression, Stanford University. • R. M. Gray and D. L. Neuhoff, "Quantization," IEEE Transactions on Information Theory, vol. 44, pp. 2325-2384, Oct. 1998. • R. M. Haralick, "A Storage Efficient Way to Implement the Discrete Cosine Transform," IEEE Transactions on Computers, vol. 25, no. 6, pp. 764-765, 1976. • H. Hedberg and P. Nilsson, "A Survey of Various Discrete Transforms used in Digital Image Compression Algorithms," Proceedings of the Swedish System-On-Chip Conference 2004, Bastad, Sweden, April 13-14, 2004.
