Motion Compensated Prediction and the Role of the DCT in Video Coding Michael Horowitz Applied Video Compression michael@avcompression.com
Outline • Overview of block-based hybrid motion compensated predictive video coding • ITU-T standards: H.261, H.263, H.264 • ISO/IEC standards: MPEG-1, MPEG-2 & MPEG-4 • Survey of motion estimation & compensation • Discrete cosine transform (DCT) • Coding efficiency • Computational complexity • Perceptual implications
Block-Based Hybrid Motion Compensated Predictive Coding • Video picture partitioned into macroblocks • Macroblock (MB) has three components • One luma • “Y”, represents “lightness” • 16x16 luma samples • Two chroma • “Cb” & “Cr”, represent color • 16x16, 8x16, or 8x8 chroma samples
Block-Based Hybrid Motion Compensated Predictive Coding (continued) • Human Visual System more sensitive to luma • Chroma frequently sub-sampled • Sub-sampling examples: 4:4:4, 4:2:2, 4:2:0 [Figure: Y, Cb, and Cr sample grids for each format] • Two coding modes for macroblocks
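As a rough illustration of the three sampling formats, the sketch below computes the chroma block size and total sample count for one 16x16 macroblock. The names `SUBSAMPLING`, `chroma_block_size`, and `samples_per_macroblock` are hypothetical, and sizes are given as (width, height).

```python
# Horizontal and vertical chroma subsampling factors for each format.
SUBSAMPLING = {
    "4:4:4": (1, 1),  # chroma kept at full luma resolution
    "4:2:2": (2, 1),  # chroma halved horizontally only
    "4:2:0": (2, 2),  # chroma halved horizontally and vertically
}

def chroma_block_size(fmt, luma_w=16, luma_h=16):
    """Return (width, height) of each chroma block (Cb or Cr) in a macroblock."""
    sx, sy = SUBSAMPLING[fmt]
    return luma_w // sx, luma_h // sy

def samples_per_macroblock(fmt):
    """Total samples (Y + Cb + Cr) carried by one 16x16 macroblock."""
    cw, ch = chroma_block_size(fmt)
    return 16 * 16 + 2 * cw * ch
```

Under these assumptions a 4:2:0 macroblock carries half as many samples as a 4:4:4 macroblock, which is why chroma sub-sampling is a cheap first step in compression.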
Inter-Picture Macroblock Coding • Estimate motion of blocks from picture-to-picture • Search previously coded (reference) pictures • Encode • Location of motion estimate (motion vector) • Difference between input MB and motion estimate [Figure: reference picture showing the location of the input MB, the search region, and the motion estimate]
Intra-Picture Macroblock Coding • Input MB coded using intra-picture prediction • Prediction derived from spatially adjacent MBs • Earlier algorithms offer no intra-picture prediction • Significantly lower coding efficiency than inter-coded MBs at low data rates • Useful when motion estimate is poor • Can be used to stop error propagation
Survey Motion Estimation and Motion Compensation • Motion models • Translational (focus of talk) • Location of kth motion compensated block: $(X_k + MV_{x,k},\ Y_k + MV_{y,k})$ • $(X_k, Y_k)$ is location of kth input block • $(MV_{x,k}, MV_{y,k})$ is motion vector (MV) for kth block • Affine motion models • Rotation • Scaling • Video standards do not use affine models
Motion Estimation • Estimate inter-picture block translation • Luma samples (and sometimes chroma) • Example • Distortion: Sum of Absolute Differences (SAD) • Low complexity • Commonly used in real-time production encoders • Find $(MV_{x,k}, MV_{y,k})$ that minimizes SAD between • Input block $s_k(i,j)$ • Motion compensated prediction in reference picture $r(i,j)$ • Subject to search range
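The SAD search described on this slide can be sketched as an exhaustive (full) search over integer displacements. This is a minimal illustration, not a production encoder: the function names `sad` and `full_search`, the `[row][column]` picture layout, and the choice to skip candidates that fall outside the reference picture are all assumptions made for the example.

```python
def sad(block, ref, x0, y0):
    """Sum of absolute differences between block and the same-sized
    region of ref whose top-left corner is at column x0, row y0."""
    return sum(
        abs(block[j][i] - ref[y0 + j][x0 + i])
        for j in range(len(block))
        for i in range(len(block[0]))
    )

def full_search(block, ref, xk, yk, search_range):
    """Return (mvx, mvy, best_sad) minimizing SAD within +/- search_range
    of the input block location (xk, yk)."""
    bh, bw = len(block), len(block[0])
    rh, rw = len(ref), len(ref[0])
    best = None
    for mvy in range(-search_range, search_range + 1):
        for mvx in range(-search_range, search_range + 1):
            x0, y0 = xk + mvx, yk + mvy
            # Skip candidates that fall outside the reference picture.
            if x0 < 0 or y0 < 0 or x0 + bw > rw or y0 + bh > rh:
                continue
            cost = sad(block, ref, x0, y0)
            if best is None or cost < best[2]:
                best = (mvx, mvy, cost)
    return best
```

Full search costs one SAD per candidate, so the work grows with the square of the search range; that cost is what fast motion estimation algorithms try to reduce.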
Motion Estimation (continued) • Fast motion estimation algorithms [Figure: search range around $(X_k, Y_k)$ in reference picture $r(i,j)$, with the sample locations visited by the search marked]
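Fractional Sample Motion Estimation • Estimate content between samples • Example: bilinear interpolation • $x_1 \le x^* < x_2$ and $y_1 \le y^* < y_2$ • $f_x = (x^* - x_1)/(x_2 - x_1)$, $f_y = (y^* - y_1)/(y_2 - y_1)$ • $z(x^*, y^*) = (1-f_x)(1-f_y)\,z(x_1,y_1) + f_x(1-f_y)\,z(x_2,y_1) + f_x f_y\,z(x_2,y_2) + (1-f_x) f_y\,z(x_1,y_2)$ [Figure: interpolated sample $z(x^*,y^*)$ inside the square with corners $z(x_1,y_1)$, $z(x_2,y_1)$, $z(x_1,y_2)$, $z(x_2,y_2)$]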
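The bilinear formula on this slide translates directly into code. The sketch below assumes unit sample spacing (so $x_2 = x_1 + 1$ and $y_2 = y_1 + 1$), a picture stored as `z[row][column]`, and clamping at the right and bottom edges; the function name `bilinear` is hypothetical.

```python
import math

def bilinear(z, x, y):
    """Interpolate picture z at the fractional location (x, y)."""
    x1, y1 = math.floor(x), math.floor(y)
    # Clamp the far neighbors so locations on the last row/column stay valid.
    x2 = min(x1 + 1, len(z[0]) - 1)
    y2 = min(y1 + 1, len(z) - 1)
    fx, fy = x - x1, y - y1
    return ((1 - fx) * (1 - fy) * z[y1][x1]
            + fx * (1 - fy) * z[y1][x2]
            + fx * fy * z[y2][x2]
            + (1 - fx) * fy * z[y2][x1])
```

At the half-sample position in the middle of four samples the result is simply their average, which is the case used by the half-sample bilinear schemes.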
Fractional Sample Motion Estimation (continued) • H.261 • No fractional sample motion estimation • MPEG-1, MPEG-2 and H.263 • 1/2-sample, bilinear interpolation • H.264 | MPEG-4 AVC & SVC • Luma • 1/2-sample, 6-tap interpolation • 1/4-sample, simple average • Chroma (1/8-sample, bilinear)
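For the H.264 luma case, a one-dimensional sketch of the two steps is shown below. It uses the H.264 half-sample taps (1, -5, 20, 20, -5, 1) with rounding and a shift by 5, followed by a rounded average for the quarter-sample position. The helper names, the restriction to a single row of 8-bit samples, and the edge handling by sample replication are assumptions for this example; the two-dimensional center position, which the standard computes from intermediate values, is not covered.

```python
TAPS = (1, -5, 20, 20, -5, 1)  # 6-tap half-sample filter, gain 32

def clip8(v):
    """Clip to the 8-bit sample range."""
    return max(0, min(255, v))

def half_sample(row, i):
    """Half-sample value between row[i] and row[i + 1]."""
    n = len(row)
    acc = 0
    for t, c in enumerate(TAPS):
        idx = min(max(i - 2 + t, 0), n - 1)  # replicate edge samples
        acc += c * row[idx]
    return clip8((acc + 16) >> 5)  # round, then divide by 32

def quarter_sample(row, i):
    """Quarter-sample value between row[i] and the half-sample to its right,
    formed as a rounded average of the two."""
    return (row[i] + half_sample(row, i) + 1) >> 1
```

On a step edge such as `[0, 0, 0, 255, 255, 255]` the half-sample at the step comes out at 128, and the negative taps are what let the filter preserve more high-frequency detail than a two-tap bilinear average.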
Fractional Sample Motion Estimation (continued) • Coding efficiency gain for H.263 [figure from Wang 2002]
Multiple Motion Vectors per MB • One motion vector for each sub-block • H.264 results [Bjontegaard 2001]
Multiple Reference Pictures [Wiegand, Zhang, & Girod 1997] • Coding gains • Uncovered areas • More integer motion vector estimates [Figure: direction of motion across pictures at times t-3, t-2, t-1, t0, with integer sample locations marked]
Multiple Reference Pictures (continued) • H.263 Annex U [Horowitz 2000]
Multi-Hypothesis Motion Compensated Prediction [Flierl, Wiegand & Girod 1998] • Linear combination of multiple predictions • One motion vector for each prediction • Bi-predicted pictures are special case (2 MVs) • Predictions may be forward & backward in time
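A linear combination of predictions can be sketched as a per-sample weighted sum. The example below is illustrative only: `combine_hypotheses` and `bi_predict` are hypothetical names, the weights are assumed to sum to 1, samples are assumed non-negative, and the equal-weight bi-prediction is just the simplest two-hypothesis case.

```python
def combine_hypotheses(predictions, weights):
    """Weighted sum of equally sized prediction blocks, rounded to integers.
    Each prediction is a block fetched with its own motion vector."""
    rows, cols = len(predictions[0]), len(predictions[0][0])
    return [
        [
            # Adding 0.5 before truncation rounds non-negative values.
            int(sum(w * p[j][i] for p, w in zip(predictions, weights)) + 0.5)
            for i in range(cols)
        ]
        for j in range(rows)
    ]

def bi_predict(forward, backward):
    """Two-hypothesis special case with equal weights."""
    return combine_hypotheses([forward, backward], [0.5, 0.5])
```

Averaging several predictions tends to suppress the noise that is uncorrelated between them, which is one intuition for the coding gain.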
Multi-Hypothesis for H.263 • Sequences Mobile & Calendar and Foreman • Results [Flierl 1998]
Overlapped Block Motion Compensation [Orchard & Sullivan 1994] • Special case of multi-hypothesis coding • H.263 advanced prediction mode (Annex F) • Overlapped block motion compensation • 1 coded + 2 “derived” motion vectors • Non-uniform spatial weighting of samples • 4 motion vectors per macroblock
Rate-Distortion Optimization • MV resulting in lowest distortion often not optimal • Goal: Find best tradeoff between distortion and rate • Strategy [Everett III 1963], [Shoham & Gersho 1988] • Minimize $J_k = D_k + \lambda R_k$ for each block k separately, using common $\lambda$ • $D_k$: distortion for block k • $R_k$: rate for block k • Total distortion $D = \sum_k D_k$ • Total bit-rate $R = \sum_k R_k$
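The Lagrangian strategy can be sketched as follows: each block independently picks the candidate that minimizes distortion plus a multiplier times rate, and the same multiplier is shared by all blocks. The tuple layout `(label, distortion, rate_bits)` and the names `rd_select` and `encode_blocks` are assumptions made for this illustration.

```python
def rd_select(candidates, lam):
    """Pick the candidate minimizing J = D + lam * R.
    Each candidate is a tuple (label, distortion, rate_bits)."""
    return min(candidates, key=lambda c: c[1] + lam * c[2])

def encode_blocks(blocks, lam):
    """Apply the same multiplier lam to every block independently and
    report the chosen candidates with total distortion and total rate."""
    choices = [rd_select(cands, lam) for cands in blocks]
    total_d = sum(c[1] for c in choices)
    total_r = sum(c[2] for c in choices)
    return choices, total_d, total_r
```

With `lam = 0` the selection reduces to minimum distortion; raising `lam` trades distortion for fewer bits, so an encoder can sweep it to meet a bit-rate target.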
Perceptual Tuning • Prevent transparent foreground macroblocks • Blurring of fast moving objects • Deblocking filter • Artifacts in the motion wake
Coding Summary • Macroblock-based coding • Two basic macroblock coding modes • Inter-coded MB: motion compensated prediction • Intra-coded MB
1-D Discrete Cosine Transform • Type II forward DCT [Ahmed et al. 1974] • Type II inverse DCT
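A direct (non-fast) sketch of the Type II DCT pair is given below. It assumes the orthonormal scaling, $X[m] = \alpha(m) \sum_{n=0}^{N-1} x[n] \cos\frac{\pi(2n+1)m}{2N}$ with $\alpha(0) = \sqrt{1/N}$ and $\alpha(m) = \sqrt{2/N}$ otherwise; the normalization used in the original slides may differ. The function names are hypothetical.

```python
import math

def _alpha(m, n):
    """Orthonormal scale factor for coefficient m of an n-point DCT."""
    return math.sqrt((1.0 if m == 0 else 2.0) / n)

def dct(x):
    """Forward Type II DCT of the sequence x (O(N^2) direct evaluation)."""
    n = len(x)
    return [
        _alpha(m, n) * sum(
            x[i] * math.cos(math.pi * (2 * i + 1) * m / (2 * n))
            for i in range(n)
        )
        for m in range(n)
    ]

def idct(coeffs):
    """Inverse of dct() under the same orthonormal scaling."""
    n = len(coeffs)
    return [
        sum(
            _alpha(m, n) * coeffs[m]
            * math.cos(math.pi * (2 * i + 1) * m / (2 * n))
            for m in range(n)
        )
        for i in range(n)
    ]
```

With this scaling the transform is orthogonal, so a constant input puts all of its energy into the first (DC) coefficient and `idct(dct(x))` recovers `x` up to floating-point error.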
2-Dimensional DCT • Forward • Inverse
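Because the 2-D DCT is separable, it can be written as a 1-D transform applied along one dimension and then the other, that is $Y = C X C^T$ with $C$ the 1-D DCT matrix. The sketch below assumes square blocks and the orthonormal scaling; the helper names are hypothetical and the plain-Python matrix product is for clarity, not speed.

```python
import math

def dct_matrix(n):
    """n x n orthonormal Type II DCT matrix C, with rows indexed by frequency."""
    return [
        [
            math.sqrt((1.0 if m == 0 else 2.0) / n)
            * math.cos(math.pi * (2 * i + 1) * m / (2 * n))
            for i in range(n)
        ]
        for m in range(n)
    ]

def matmul(a, b):
    """Plain matrix product of two lists of lists."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]

def transpose(a):
    return [list(col) for col in zip(*a)]

def dct_2d(block):
    """Forward 2-D DCT of a square block: C * X * C^T."""
    c = dct_matrix(len(block))
    return matmul(matmul(c, block), transpose(c))

def idct_2d(coeffs):
    """Inverse 2-D DCT: C^T * Y * C."""
    c = dct_matrix(len(coeffs))
    return matmul(matmul(transpose(c), coeffs), c)
```

For a constant 8x8 block of value 10, only the DC coefficient is non-zero, and under this scaling it equals 80.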
Why Choose the DCT? • Coding efficiency • Computational complexity • Perceptual implications
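Coding Efficiency • Source $X = [X_1, X_2]$ • $X_i$ is a Gaussian random variable • Mean = 0, Variance = $\sigma_i^2$ • Rate of quantizer $Q_i$ is $R_i$ (bits / index) • Total rate $R = R_1 + R_2$ [Figure: $X$ split into $X_1$ and $X_2$; quantizers $Q_1$ and $Q_2$ produce $\hat{X}_1$ and $\hat{X}_2$, which form $\hat{X}$]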
Coding Efficiency (continued) • Distortion • Square error • High-rate assumption • High-rate implies R ≥ 3 bits / sample • Often works well for lower rates • Asymptotic Quantization Theory [Gray & Neuhoff 1998] • Total distortion
Rate Allocation Problem • What is smallest D = D* subject to ? • Find optimal value for • Minimizing D with respect to R1 yields
Rate Allocation Problem (continued) It follows that and which implies
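The equations on these two slides did not survive extraction. The block below is a sketch of the standard two-quantizer derivation, not a recovery of the original slides. It assumes the high-rate model $D_i = h\,\sigma_i^2\,2^{-2R_i}$ with the same constant $h$ for both quantizers and the constraint $R_1 + R_2 = R$; the symbol $h$ and the starred notation are assumptions.

```latex
\begin{align*}
D(R_1) &= h\left(\sigma_1^2\,2^{-2R_1} + \sigma_2^2\,2^{-2(R - R_1)}\right) \\
\frac{\partial D}{\partial R_1} = 0
  &\;\Rightarrow\; \sigma_1^2\,2^{-2R_1} = \sigma_2^2\,2^{-2R_2}
  \quad\text{(equal distortion in both quantizers)} \\
R_1^* &= \frac{R}{2} + \frac{1}{4}\log_2\frac{\sigma_1^2}{\sigma_2^2},
\qquad
R_2^* = \frac{R}{2} + \frac{1}{4}\log_2\frac{\sigma_2^2}{\sigma_1^2} \\
D^* &= 2h\,\sigma_1\sigma_2\,2^{-R}
\end{align*}
```

Under this model the component with the larger variance receives more bits, and the minimum distortion depends on the variances only through their product.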
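Generalize for k Quantizers • Rate • Distortion • Recall [Figure: source $X$ split into $X_{12} = [X_1, X_2]$ and $X_3$; quantizers $Q_1$, $Q_2$, $Q_3$ produce $\hat{X}_1$, $\hat{X}_2$, $\hat{X}_3$, which form $\hat{X}_{12}$ and $\hat{X}$]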
Generalization (continued) • 2 quantizers with subject to • Minimize with respect to R3
Generalization (continued) • It follows that • Generalize to k quantizers by induction
Optimal Rate and Distortion [Huang & Schultheiss 1963] • Rate • Distortion
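The rate and distortion formulas on this slide were lost in extraction. The sketch below uses the standard high-rate result for k scalar quantizers, stated here as an assumption: $R_i = R/k + \tfrac{1}{2}\log_2\!\big(\sigma_i^2 / (\prod_j \sigma_j^2)^{1/k}\big)$ and $D = k\,h\,(\prod_j \sigma_j^2)^{1/k}\,2^{-2R/k}$, where $R$ is the total rate and $h$ is a quantizer-dependent constant. The function names and the default `h=1.0` are hypothetical.

```python
import math

def optimal_rates(variances, total_rate):
    """Per-quantizer rates R_i = R/k + 0.5 * log2(var_i / geometric_mean)."""
    k = len(variances)
    log_gm = sum(math.log2(v) for v in variances) / k
    return [total_rate / k + 0.5 * (math.log2(v) - log_gm) for v in variances]

def optimal_distortion(variances, total_rate, h=1.0):
    """Total distortion D = k * h * geometric_mean * 2^(-2R/k)."""
    k = len(variances)
    gm = 2 ** (sum(math.log2(v) for v in variances) / k)
    return k * h * gm * 2 ** (-2 * total_rate / k)
```

For variances 16 and 1 with 4 bits in total, this gives 3 bits and 1 bit. Note that the formula can return negative or fractional rates when one variance is much smaller than the others, which is why practical systems need positive, integer allocations.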
Observations and Comments • #1 Optimal rate for $Q_i$ proportional to • #2 Optimal distortion • #3 In practice, systems use positive [Segall 1976], integer [Farber & Zeger 2005] $R_i$
Question • Given Gaussian source X & fixed encoder structure (i.e., k scalar quantizers) how can we minimize D subject to ?
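Transform Coding [Kramer & Mathews 1956] • For orthogonal T [Figure: $X = [X_1, \ldots, X_k]$ passes through transform $T$ to give $Y_1, \ldots, Y_k$; quantizers $Q_1, \ldots, Q_k$ produce $\hat{Y}_1, \ldots, \hat{Y}_k$; inverse transform $T^{-1}$ gives $\hat{X} = [\hat{X}_1, \ldots, \hat{X}_k]$]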
Fact 1 • Karhunen-Loeve Transform (KLT) produces smallest . [Huang et al. 1963] • a) Gaussian input random variables • b) High-rate quantizers • c) Rate of each quantizer is arbitrary real value • d) Square error distortion measure
Fact 2 • The autocorrelation matrix of the KLT transform vector is diagonal. • KLT coefficients are uncorrelated • There is no general theorem stating uncorrelated quantities can be more efficiently quantized than correlated ones
Fact 3 • If KLT produces , orthogonal produces ≥ then for & Energy compaction
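A small numeric sketch can make energy compaction concrete. It takes a two-component source with correlation `rho`, applies the 45-degree rotation that is the KLT for this covariance, and compares the geometric mean of the coefficient variances before and after. The covariance, the value `rho = 0.9`, and the helper names are assumptions for this example. Under the high-rate model, a smaller geometric mean means lower distortion at the same total rate.

```python
import math

def transform_variances(cov, t):
    """Diagonal of T * cov * T^T: variances of the transform coefficients."""
    k = len(cov)
    return [
        sum(t[m][i] * cov[i][j] * t[m][j] for i in range(k) for j in range(k))
        for m in range(k)
    ]

def geometric_mean(values):
    return math.exp(sum(math.log(v) for v in values) / len(values))

rho = 0.9                                  # assumed correlation of X1 and X2
cov = [[1.0, rho], [rho, 1.0]]             # unit-variance, correlated source
s = 1.0 / math.sqrt(2.0)
klt = [[s, s], [s, -s]]                    # KLT (eigenvectors) for this cov
identity = [[1.0, 0.0], [0.0, 1.0]]        # no transform

before = transform_variances(cov, identity)    # variances 1 and 1
after = transform_variances(cov, klt)          # variances 1 + rho and 1 - rho
gain = geometric_mean(before) / geometric_mean(after)
```

The total variance is unchanged (2 in both cases), but after the KLT it is concentrated in one coefficient; `gain` is the resulting distortion ratio in favor of the transform under the high-rate model.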
Practical Considerations • KLT impractical for many systems • Computational complexity • Transform is signal dependent • Compute and apply transform for each input • Consider Fourier based transforms • Fast algorithms exist • Examine loss of coding efficiency resulting from loss of energy compaction
Energy Compaction of Some Discrete Transforms • 1x32 block in natural images [Lohscheller]
2-D Energy Compaction [from Hedberg & Nilsson 2004] • KLT • DCT • DFT
Computational Complexity • Recall DCT may be derived from DFT • First N coefficients of 2N-point DFT • Requires appropriate input sequence symmetry • Requires scaling [Tseng & Miller 1978], where $f_m$ is the mth DFT coefficient • Leverage FFT to compute DCT
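The DFT relationship can be checked numerically. The sketch below extends the N-point input to an even-symmetric 2N-point sequence, takes its FFT, and scales the first N coefficients by $\tfrac{1}{2} e^{-j\pi m/(2N)}$ to obtain the unnormalized Type II DCT $\sum_n x[n] \cos\frac{\pi(2n+1)m}{2N}$. This scale factor is derived for that unnormalized definition and is not necessarily the scaling shown on the original slide. The function names and the simple recursive radix-2 FFT are assumptions for illustration.

```python
import cmath
import math

def fft(a):
    """Recursive radix-2 FFT; len(a) must be a power of two."""
    n = len(a)
    if n == 1:
        return [complex(a[0])]
    even, odd = fft(a[0::2]), fft(a[1::2])
    out = [0j] * n
    for m in range(n // 2):
        tw = cmath.exp(-2j * cmath.pi * m / n) * odd[m]
        out[m] = even[m] + tw
        out[m + n // 2] = even[m] - tw
    return out

def dct_via_fft(x):
    """Unnormalized Type II DCT from the first N bins of a 2N-point FFT."""
    n = len(x)
    y = list(x) + list(reversed(x))  # even-symmetric 2N-point extension
    f = fft(y)
    return [
        0.5 * (cmath.exp(-1j * cmath.pi * m / (2 * n)) * f[m]).real
        for m in range(n)
    ]

def dct_direct(x):
    """Reference O(N^2) evaluation of the same unnormalized DCT."""
    n = len(x)
    return [
        sum(x[i] * math.cos(math.pi * (2 * i + 1) * m / (2 * n)) for i in range(n))
        for m in range(n)
    ]
```

For an 8-point input this uses a 16-point FFT. Dedicated fast DCT algorithms go further by exploiting the symmetry inside the FFT instead of computing all 2N bins.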
Computational Complexity (continued) • 1-D 8-point DCT from 16-point DFT • 13 mults, 29 adds [Arai et al. 1988] • 8 final scaling multiplies rolled into quantization • Net 5 mults, 29 adds best known • Fast 2-D DCT (8x8) • Separable [from Pennebaker & Mitchell 1992] • 80 mults, 464 adds best known • Non-separable [Feig 1992] • 54 mults, 416 adds, 6 shifts
Perceptual Implications • Contrast sensitivity of HVS • See last page of handout [Barlow & Mollon 1982] • Perceptually tuned quantization tables [Watson] • Filter coefficients prior to quantization • Shape frequency content of source • Exploit HVS contrast sensitivity
Concluding Summary • Motion estimation & compensation • Translation-based motion models • Fractional sample motion estimation • Multiple motion vectors per macroblock • Multiple reference pictures • Multi-hypothesis motion compensated prediction • Overlapped block motion compensation
Concluding Summary • DCT • Near optimal R-D performance for wide range of sources (Gaussian, high-rate assumptions) • Simple relationship to DFT enables fast algorithms • Perceptual relevance
References • N. Ahmed, T. Natarajan, and K. R. Rao, “Discrete cosine transform,” IEEE Trans. Comput., vol. C-23, pp. 90–93, Jan. 1974. • Y. Arai, T. Agui, and M. Nakajima, “A Fast DCT-SQ Scheme for Images,” Trans. of the IEICE, vol. E71, no. 11, p. 1095, Nov. 1988. • H. B. Barlow and J. D. Mollon, The Senses. Cambridge: Cambridge University Press, 1982. • G. Bjontegaard, “Objective simulation results,” Document VCEG-M34, Video Coding Experts Group (VCEG), Thirteenth Meeting, Austin, Texas, USA, 2-4 April 2001. • H. Everett III, “Generalized Lagrangian Multiplier Method for Solving Problems of Optimum Allocation of Resources,” Operations Research, vol. 11, pp. 399–417, 1963. • B. Farber and K. Zeger, “Quantization of Multiple Sources Using Integer Bit Allocation,” Data Compression Conference (DCC), Salt Lake City, Utah, March 2005 (to appear). • E. Feig and S. T. Winograd, “Fast Algorithms for Discrete Cosine Transform,” IEEE Trans. Signal Proc., vol. 40, pp. 2174–2193, 1992.
References (continued) • M. Flierl, T. Wiegand, B. Girod, “Locally Optimal Design Algorithm for Block-Based Multi-Hypothesis Motion-Compensated Prediction,” Proc. of the IEEE Data Compression Conference (DCC'98), pp. 239-248, Snowbird, USA, Apr. 1998. • A. Gersho and R. M. Gray, Vector Quantization and Signal Compression, Kluwer Academic Publishers, Boston, 1992. • B. Girod, Lecture for EE368b, Video and Image Compression Stanford University. • R. M. Gray and D. L. Neuhoff, "Quantization," IEEE Transactions on Information Theory, vol. 44, pp. 2325-2384, Oct. 1998. • R. M. Haralick, “A Storage Efficient Way to Implement the Discrete Cosine Transform”, IEEE Transactions on Computers, 25 (6) (1976) 764–765. • H. Hedberg, and P. Nilsson, “A Survey of Various Discrete Transforms used in Digital Image Compression Algorithms,” Proceedings of the Swedish System-On-Chip Conference 2004, Bastad, Sweden, April 13-14, 2004.