
CSE 489-02 & CSE 589-02 Multimedia Processing Lecture 11 Video Coding


Presentation Transcript


  1. CSE 489-02 & CSE 589-02 Multimedia Processing, Lecture 11: Video Coding. Spring 2009, New Mexico Tech

  2. History H.264/AVC

  3. MC-DCT Coding Framework • Motion estimation/compensation based on previously decoded frames • Block-translation motion model • Inter-coding: DCT-based coding of prediction error (residue) • Intra-coding: If motion estimation fails or synchronization is desired, macro-block is encoded in intra-mode • Most international video coding standards are based on this coding framework • Video teleconferencing: H.261, H.263, H.263++, H.264 • Video archive & play-back: MPEG-1, MPEG-2 (in DVDs), MPEG-4

  4. Hybrid MC-DCT Encoder (block diagram) • Input macro-block minus motion compensated prediction → transform, quantization, entropy coding → encoded residual (to channel) • Embedded decoder: entropy decoding, inverse Q, inverse transform → decoded input macro-block (to display) • Frame buffer (delay) feeds the motion comp. predictor • Motion estimation → motion vector and block mode data (side-info, to channel)

  5. Inter and Intra Coding • Intra • MB is encoded as is without motion compensation • DCT followed by Q, zig-zag, run-length, Huffman • Inter • Block-matching motion estimation • Predictive motion residue from best-match block is DCT encoded (similarly to intra-mode) • Motion vector is differentially encoded
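
The run-length step named in the intra path above can be sketched as follows. This is only an illustration of the idea: the function name `run_level_pairs` and the `"EOB"` end-of-block marker are assumptions made for this sketch, not the syntax of any particular standard. The input is the scanned coefficient sequence that appears on the I-slice slide.

```python
def run_level_pairs(coeffs):
    """Turn a zig-zag scanned coefficient list into (zero_run, level) pairs.

    Each nonzero coefficient is emitted together with the number of zeros
    that preceded it; trailing zeros are replaced by one end-of-block marker.
    """
    pairs = []
    run = 0
    for c in coeffs:
        if c == 0:
            run += 1          # count zeros until the next nonzero level
        else:
            pairs.append((run, c))
            run = 0
    pairs.append("EOB")       # hypothetical end-of-block symbol
    return pairs


scanned = [15, 0, -2, -1, -1, -1, 0, 0]
print(run_level_pairs(scanned))
```

In a real coder each pair would then be mapped to a Huffman-style variable-length code.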

  6. Intra-Coding Mode (diagram) • Encoder: input MB → to bit-stream, and to motion compensated frame • Decoder: bit-stream → to display frame

  7. Inter-Coding Mode (diagram) • Encoder: input MB and reference frame → to bit-stream

  8. Video Sequence and Picture (figure: picture 0 is intra, pictures 1 to 5 are inter) • Intra Picture (I-Picture) • Encoded without referencing others • All MBs are intra coded • Inter Picture (P-Picture, B-Picture) • Encoded by referencing other pictures • Some MBs are intra coded, and some are inter coded

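  9. Group of Pictures (GOP) • Video stream: GOP, GOP, GOP, … • Picture types: I B B P B B P … B B I B B P … • Frame (display) order: 0 1 2 3 4 5 6 • Encoding order: 0 2 3 1 5 6 4 (the position of each frame in the coded stream, so the frames are sent as 0, 3, 1, 2, 6, 4, 5)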
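
The reordering on this slide can be reproduced with a short sketch. It assumes the simple rule that every B-picture is coded right after the next I- or P-picture (its forward anchor); the function names are chosen for this example only.

```python
def coding_order(picture_types):
    """Return display indices in the order they would be encoded.

    B-pictures are held back until the following I- or P-picture
    (which they reference) has been coded.
    """
    order = []
    waiting_b = []
    for index, ptype in enumerate(picture_types):
        if ptype == "B":
            waiting_b.append(index)
        else:                       # I or P: code it, then the held B's
            order.append(index)
            order.extend(waiting_b)
            waiting_b = []
    order.extend(waiting_b)         # trailing B's (would wait for next GOP)
    return order


def coding_position(picture_types):
    """For each displayed frame, its position in the coded stream."""
    order = coding_order(picture_types)
    return [order.index(i) for i in range(len(picture_types))]


print(coding_order("IBBPBBP"))     # frames in transmission order
print(coding_position("IBBPBBP"))  # the "Encoding order" row of the slide
```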

  10. Coding of I-Slice (figure) • Original block → DCT → transformed block → quantization (quantization matrix) → zig-zag scan → entropy coding → bit-stream • Example scanned sequence: 15 0 -2 -1 -1 -1 0 …
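
The zig-zag scan in this pipeline can be sketched as below. The helper names are assumptions for illustration; the ordering is the conventional one that walks the anti-diagonals of the block, alternating direction, so that low-frequency coefficients come first.

```python
def zigzag_order(n=8):
    """List the (row, col) positions of an n x n block in zig-zag order."""
    cells = [(r, c) for r in range(n) for c in range(n)]

    def key(rc):
        r, c = rc
        diagonal = r + c
        # Odd diagonals run top-right to bottom-left (row increasing),
        # even diagonals run bottom-left to top-right (column increasing).
        return (diagonal, r if diagonal % 2 else c)

    return sorted(cells, key=key)


def zigzag_scan(block):
    """Flatten a square block of quantized coefficients in zig-zag order."""
    return [block[r][c] for r, c in zigzag_order(len(block))]


demo = [[r * 4 + c for c in range(4)] for r in range(4)]
print(zigzag_scan(demo))
```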

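  11. Coding of P-Slice (figure) • Motion estimation between the original current frame and the reconstructed reference frame (from the frame buffer) gives the motion vectors • Original current frame − motion compensated prediction = residual • Output = residual + motion vectors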

  12. Motion Estimation in H.261 • Macro-block • Luminance: 16x16, four 8x8 blocks (Y) • Chrominance: two 8x8 blocks (Cr, Cb) • Motion estimation only performed for luminance component • Motion vector range • [-15, 15] • Figure: the search area in the reference frame extends 15 pixels beyond the MB on each side

  13. Coding of Motion Vectors • MV has range [-15, 15] • Integer pixel ME search only • Motion vectors are differentially & separably encoded • 11-bit VLC for MVD • Example • MV = 2, 2, 3, 5, 3, 1, -1, … • MVD = 0, 1, 2, -2, -2, -2, … • Binary: 1 010 0010 0011 0011 0011 …
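
The differential coding in this example can be checked with a small sketch. The code table below contains only the four codewords that appear on the slide; it is not the full H.261 table, and the function names are assumptions for illustration.

```python
# Only the codewords visible in the slide's example (partial table).
SLIDE_VLC = {0: "1", 1: "010", 2: "0010", -2: "0011"}


def mv_differences(mvs):
    """Difference between each motion vector component and its predecessor."""
    return [cur - prev for prev, cur in zip(mvs, mvs[1:])]


def reconstruct(first_mv, mvds):
    """Decoder side: rebuild the motion vectors by accumulating the MVDs."""
    mvs = [first_mv]
    for d in mvds:
        mvs.append(mvs[-1] + d)
    return mvs


mv = [2, 2, 3, 5, 3, 1, -1]
mvd = mv_differences(mv)
print(mvd)
print(" ".join(SLIDE_VLC[d] for d in mvd))
print(reconstruct(mv[0], mvd) == mv)
```

Because neighbouring blocks tend to move together, most differences are small and receive the shortest codewords.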

  14. Inter/Intra Switching • Based on energy of prediction error • High energy: scene change, occlusions, uncovered areas… → use intra mode • Low energy: stationary background, translational motion… → use inter mode • Figure: INTER and INTRA decision regions plotted as VAR against MSE, with the value 64 marked on both axes
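
A decision of this kind might be sketched as follows. The exact boundary between the INTER and INTRA regions is not recoverable from the transcript, so the rule used here (inter when the prediction MSE is at most 64 or no larger than the block's own variance, intra otherwise) is an assumption, as are all function names.

```python
def block_variance(block):
    """Variance of the original block (energy left if coded intra)."""
    pixels = [p for row in block for p in row]
    mean = sum(pixels) / len(pixels)
    return sum((p - mean) ** 2 for p in pixels) / len(pixels)


def prediction_mse(block, prediction):
    """Mean squared error between the block and its motion-compensated match."""
    diffs = [(a - b) ** 2
             for row_a, row_b in zip(block, prediction)
             for a, b in zip(row_a, row_b)]
    return sum(diffs) / len(diffs)


def choose_mode(block, prediction, threshold=64):
    """Assumed rule: a small or 'cheaper than intra' residual selects inter."""
    mse = prediction_mse(block, prediction)
    var = block_variance(block)
    if mse <= threshold or mse <= var:
        return "inter"
    return "intra"


textured = [[0, 100], [100, 0]]
flat = [[50, 50], [50, 50]]
print(choose_mode(textured, textured))          # perfect match
print(choose_mode(flat, [[0, 0], [0, 0]]))      # poor match, flat block
```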

  15. Loop Filter • Optional • Can be turned on or off for each block, usually goes together with MC • Advantage • Decreases prediction error by smoothing the prediction frame • Reduces high-frequency artifacts like mosquito effects • Disadvantage • Increases complexity & overhead

  16. Quantization • Uniform mid-rise quantizer for intra DC coefficients • Uniform mid-tread quantizer with double dead zone for inter DC and all AC coefficients • Figure: two input/output staircase plots (input X with ticks at -2Q, -Q, 0, Q, 2Q; output levels Y from -2 to 2), one for intra DC and one for inter DC and all AC
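
The difference between the two quantizer shapes can be sketched generically. This is not the exact H.261 arithmetic: the step size `q`, the mid-bin reconstruction points, and the function names are assumptions chosen to show the mid-rise versus dead-zone behaviour.

```python
import math


def midrise_index(x, q):
    """Mid-rise: a decision boundary sits at zero, so no output level is zero."""
    return math.floor(x / q)


def midrise_recon(i, q):
    return (i + 0.5) * q


def deadzone_index(x, q):
    """Mid-tread with a dead zone: truncation toward zero maps (-q, q) to 0,
    so the zero bin is twice as wide as the others."""
    return int(x / q)


def deadzone_recon(i, q):
    if i == 0:
        return 0.0
    return math.copysign((abs(i) + 0.5) * q, i)


q = 8
for x in (-17, -7, 3, 7, 17):
    print(x,
          midrise_recon(midrise_index(x, q), q),
          deadzone_recon(deadzone_index(x, q), q))
```

The wide zero bin discards small residual coefficients, which is why it suits inter DC and AC values that are mostly near zero.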

  17. H.263 • Standardization effort started Nov 1993 • Aim • low bit-rate video communications, less than 64 kbps • target PSTN and mobile network: 10-32 kbps • Near-term • H.263 and H.263+: established late 1997 • Long-term • H.26L, H.264: still under investigation • Main properties • H.261 with many MPEG features optimized for low bit rates • Performance: 3-4 dB improvements over H.261 at less than 64 kbps; 30% bit rate saving over MPEG-1

  18. MPEG • Coding and communications of moving pictures and associated audio for digital storage and archival • MPEG: Moving Picture Experts Group • MPEG family • MPEG-1, Nov 1992 • MPEG-2, Nov 1994 • MPEG-4, Oct 1998 • MPEG-7, ongoing work • Main features of the MPEG video family • Bi-directional MEMC • I-frame, P-frame, B-frame • Structure: Group of Pictures (GOP), picture, slice, macro-block • Coding decisions

  19. MPEG Goals and Applications • MPEG-1 • Optimized for applications that support a continuous transfer bit rate of about 1.5 Mbps (example, CD-ROM) • Target 1.2 Mbps for video and 250-300 kbps for audio, around analog VHS quality • Does not support interlaced sources • Main target source: SIF YCrCb 4:2:0 360 x 240 x 30 fps • VCD • MPEG-2 • The most commercially successful international coding standard • Wide range of bit rates: 4 – 80 Mbps; optimized for 4 Mbps • Target high-resolution, high-quality video broadcast & playback • DVD, Digital TV: DirecTV, HDTV…

  20. Requirements • Coding of generic video at around 1.5 Mbps at reasonable quality (VHS) • Random access capability, frequent access point • Fast forward and fast rewind capability • Audio-video synchronization during play and access • Simple decoder • Flexibility of data format • Certain degree of robustness to communication errors • Real-time encoder possibility

  21. From H.261 to MPEG-1 • There are a few new features in MPEG-1 compared with the pioneering H.261 codec • Flexible data sizes and frame rates • More flexible slice structure to replace the fixed GOB structure • Data structure: introducing Group of Pictures (GOP) allowing frequent access points • Bi-directional motion compensation, B-frames • Half-pixel motion compensation • More finely tuned VLCs for different purposes • Quantization table (like JPEG) replaces single Q step size

  22. Bidirectional MC Properties • Advantage • Higher coding efficiency, frame rate can be increased significantly with few bits • More accurate motion estimation & compensation • No error propagation • Disadvantage • More memory buffer for frame storage (minimum of 3) • More end-to-end delay

  23. H.264/AVC History • In the early 1990’s, the first video compression standards were introduced: • H.261 (1990) and H.263 (1995) from ITU • MPEG-1 (1993) and MPEG-2 (1996) from ISO • Since then, the technology has advanced rapidly • H.263 was followed by H.263+, H.263++, H.26L • MPEG-1/2 followed by MPEG-4 visual • But industry and research coders are still way ahead • H.264/AVC is a joint project of ITU and ISO, to create an up-to-date standard.

  24. Scope and Context • Aimed at providing high-quality compression for various services: • IP streaming media (50-1500 kbps) • SDTV and HDTV Broadcast and video-on-demand (1 - 8+ Mbps) • DVD • Conversational services (<1 Mbps, low latency) • Standard defines: • Decoder functionality (but not encoder) • File and stream structure • Final results: 2-fold improvement in compression (same fidelity, half the size) compared to H.263 and MPEG-2

  25. Video Compression • Motion compensation / prediction • Describe the current frame based on the previous frame • Output description + residual image • Predicted frames are called “inter-frames” • Some frames (intra-frames) are encoded without prediction, as natural images • Image transform • Concentrate image energy in relatively few numeric coefficients • Lossy coding • Compress coefficient values in a lossy manner • Try to keep the most important information

  26. The H.263 Standard Coder (block diagram: original video → Motion Compensation → Image Transform → Lossy Coding → compressed video)

  27. The H.263 Standard Coder • H.263 Motion Compensation • Image is divided into 16x16 macroblocks • Each macroblock is matched against nearby blocks in the previous frame (called the reference frame) • “Nearby” = within 15-pixel horizontal/vertical range • Half-pixel accuracy (with bilinear pixel interpolation) • Best match is used to predict the macroblock • The relative displacement, or motion vector, is encoded and transmitted to the decoder • Prediction errors for all blocks constitute the residual
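
The matching step described on this slide can be sketched as an exhaustive integer-pixel search. Two things are assumptions here: the slide does not name a matching cost, so the sum of absolute differences (SAD) is used, and half-pixel refinement is left out. Frames are plain lists of rows, and all names are chosen for this example.

```python
def sad(cur, ref, cy, cx, ry, rx, n):
    """Sum of absolute differences between an n x n block of `cur` at
    (cy, cx) and an n x n block of `ref` at (ry, rx)."""
    return sum(abs(cur[cy + i][cx + j] - ref[ry + i][rx + j])
               for i in range(n) for j in range(n))


def full_search(cur, ref, cy, cx, n=16, search=15):
    """Try every displacement within +/- `search` pixels and return the
    (dy, dx) motion vector with the lowest SAD, plus that SAD."""
    height, width = len(ref), len(ref[0])
    best_cost, best_mv = None, (0, 0)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            ry, rx = cy + dy, cx + dx
            if ry < 0 or rx < 0 or ry + n > height or rx + n > width:
                continue                      # candidate leaves the frame
            cost = sad(cur, ref, cy, cx, ry, rx, n)
            if best_cost is None or cost < best_cost:
                best_cost, best_mv = cost, (dy, dx)
    return best_mv, best_cost


# Toy 12x12 frames: the current frame shows the reference content
# displaced by 1 row and 2 columns.
ref = [[y * 12 + x for x in range(12)] for y in range(12)]
cur = [[min(y + 1, 11) * 12 + min(x + 2, 11) for x in range(12)]
       for y in range(12)]
print(full_search(cur, ref, 4, 4, n=4, search=3))
```

The residual for the block is then the pixel-wise difference between the current block and the best-matching reference block.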

  28. Motion Compensation Example T=1 (reference) T=2 (current)

  29. The H.263 Standard Coder • H.263 Image Transform • Residual is divided into 8x8 blocks • 8x8 2-D Discrete Cosine Transform (DCT) is applied to each block independently • DCT coefficients describe spatial frequencies in the block: • High frequencies correspond to small features and texture • Low frequencies correspond to larger features • Lowest frequency coefficient, called DC, corresponds to the average intensity of the block
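
A direct (slow but readable) 2-D DCT illustrates the last point: for a flat block, all the energy lands in the DC coefficient. The sketch uses the orthonormal DCT-II definition, under which the DC term equals the block mean times the block width; real codecs use fast factorizations, and the function name is an assumption.

```python
import math


def dct_2d(block):
    """Orthonormal 2-D DCT-II of a square block (direct O(n^4) evaluation)."""
    n = len(block)

    def scale(k):
        return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)

    out = [[0.0] * n for _ in range(n)]
    for u in range(n):              # vertical frequency
        for v in range(n):          # horizontal frequency
            total = 0.0
            for y in range(n):
                for x in range(n):
                    total += (block[y][x]
                              * math.cos((2 * y + 1) * u * math.pi / (2 * n))
                              * math.cos((2 * x + 1) * v * math.pi / (2 * n)))
            out[u][v] = scale(u) * scale(v) * total
    return out


flat = [[10] * 8 for _ in range(8)]
coeffs = dct_2d(flat)
print(round(coeffs[0][0], 6))       # DC term: 8 * mean for an 8x8 block
```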

  30. 8x8 DCT Example

  31. 8x8 DCT Example

  32. 8x8 DCT Example

  33. The H.263 Standard Coder • H.263 Lossy Coding • Transform coefficients are quantized: • Some less-significant bits are dropped • Only the remaining bits are encoded • For inter-frames, all coefficients get the same number of bits, except for the DC, which gets more • For intra-frames, lower-frequency coefficients get more bits • To preserve larger features better • The actual number of bits used depends on a quantization parameter (QP), whose value depends on the bit-allocation policy • Finally, bits are encoded using an entropy (lossless) code • Traditionally a Huffman-style code

  34. Changes in Motion Compensation • Quarter-pixel accuracy • A gain of 1.5-2dB across the board over ½-pixel • Variable block-size: • Every 16x16 macroblock can be subdivided • Each sub-block gets predicted separately • Multiple and arbitrary reference frames • Vs. only previous (H.263) or previous and next (MPEG). • Anti-aliasing sub-pixel interpolation • Removes some common artifacts in residual

  35. Variable Block-Size MC • Motivation: size of moving/stationary objects is variable • Many small blocks may take too many bits to encode • Few large blocks give lousy prediction • In H.264, each 16x16 macroblock may be: • Kept whole, • Divided horizontally (vertically) into two sub-blocks of size 16x8 (8x16) • Divided into 4 sub-blocks • In the last case, the 4 sub-blocks may be divided once more into 2 or 4 smaller blocks.

  36. H.264 Variable Block Sizes

  37. Motion Scale Example T=1 T=2

  38. Motion Scale Example T=1 T=2

  39. Motion Scale Example T=1 T=2

  40. H.264 VBS Example T=1 T=2

  41. Arbitrary Reference Frames • In H.263, the reference frame for prediction is always the previous frame • In MPEG and H.26L, some frames are predicted from both the previous and the next frames (bi-prediction) • In H.264, any one frame may be used as reference: • Encoder and decoder maintain synchronized buffers of available frames (previously decoded) • Reference frame is specified as index into this buffer • In bi-predictive mode, each macroblock may be: • Predicted from one of the two references • Predicted from both, using weighted mean of predictors
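
The buffer-index and weighted-mean ideas above can be sketched together. The buffer layout, the default equal weights, and the function name are assumptions for illustration; the standard's actual weighted-prediction syntax is not shown on the slide and is not reproduced here.

```python
def bi_predict(ref_blocks, idx0, idx1, w0=0.5, w1=0.5):
    """Blend two predictor blocks chosen by index from a reference buffer.

    Samples are assumed non-negative, so adding 0.5 before truncation
    rounds to the nearest integer.
    """
    pred0, pred1 = ref_blocks[idx0], ref_blocks[idx1]
    return [[int(w0 * a + w1 * b + 0.5) for a, b in zip(row0, row1)]
            for row0, row1 in zip(pred0, pred1)]


# A toy buffer of three previously decoded 1x2 "blocks".
buffer = [[[10, 20]], [[0, 0]], [[20, 21]]]
print(bi_predict(buffer, 0, 2))                 # equal weights
print(bi_predict(buffer, 0, 2, 0.75, 0.25))     # favour the first reference
```

Since the decoder keeps an identical buffer, only the two indices (and any weights) need to be signalled.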

  42. Intra Prediction • Motivation: intra-frames are natural images, so they exhibit strong spatial correlation • Implemented to some extent in H.263++ and MPEG-4, but in transform domain • Macroblocks in intra-coded frames are predicted based on previously-coded ones • Above and/or to the left of the current block • The macroblock may be divided into 16 4x4 sub-blocks which are predicted in cascading fashion • An encoded parameter specifies which neighbors should be used to predict, and how

  43. Intra-Prediction Example

  44. Intra-Prediction Example: Vertical

  45. Intra-Prediction Example: Horizontal

  46. Intra-Prediction Example: Main Diagonal
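
The vertical and horizontal modes shown in these examples can be sketched for a 4x4 sub-block. The mode selection by SAD is an assumption (the slides do not say how the encoder chooses), the diagonal mode is omitted, and all names are illustrative.

```python
def predict_vertical(above, n=4):
    """Every row repeats the n reconstructed pixels directly above the block."""
    return [list(above[:n]) for _ in range(n)]


def predict_horizontal(left, n=4):
    """Every column repeats the n reconstructed pixels to the left of the block."""
    return [[left[r]] * n for r in range(n)]


def sad(block, pred):
    return sum(abs(a - b)
               for row_a, row_b in zip(block, pred)
               for a, b in zip(row_a, row_b))


def best_mode(block, above, left):
    """Pick the candidate mode whose prediction is closest to the block."""
    n = len(block)
    candidates = {
        "vertical": predict_vertical(above, n),
        "horizontal": predict_horizontal(left, n),
    }
    return min(candidates, key=lambda mode: sad(block, candidates[mode]))


stripes = [[1, 2, 3, 4] for _ in range(4)]      # vertical stripes
print(best_mode(stripes, above=[1, 2, 3, 4], left=[9, 9, 9, 9]))
```

Only the chosen mode and the (usually small) residual then need to be coded.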

  47. H.264 Image Transform • Motivation: • DCT requires real-number operations, which may cause inaccuracies in inversion • H.264 uses a very simple integer 4x4 transform • A (pretty crude) approximation to 4x4 DCT • Transform matrix contains only +/-1 and +/-2 • Can be computed with only additions, subtractions, and shifts • Results show negligible loss in quality (~0.02dB)
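
The integer transform can be sketched with the commonly cited 4x4 core matrix, whose entries are only ±1 and ±2. The sketch applies it as two plain matrix products for readability rather than as an add-and-shift butterfly, and it omits the scaling that is normally folded into quantization; the function names are assumptions.

```python
# Forward core transform matrix: entries are only +/-1 and +/-2.
CF = [
    [1,  1,  1,  1],
    [2,  1, -1, -2],
    [1, -1, -1,  1],
    [1, -2,  2, -1],
]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def transpose(m):
    return [list(row) for row in zip(*m)]


def forward_transform(block):
    """Y = CF * X * CF^T, computed entirely in integer arithmetic."""
    return matmul(matmul(CF, block), transpose(CF))


flat = [[3] * 4 for _ in range(4)]
print(forward_transform(flat))      # all energy in the top-left coefficient
```

Because every intermediate value is an integer, encoder and decoder can invert the transform without any mismatch.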

  48. Deblocking Filter (figure: non-deblocked image vs. deblocked image; images courtesy of http://compression.ru/video/deblocking/)

  49. Entropy Coding • Motivation: traditional coders use fixed, variable-length codes • Essentially Huffman-style codes • Non-adaptive • Can’t encode symbols with probability > 0.5 efficiently, since at least one bit required • H.263 Annex E defines an arithmetic coder • Still non-adaptive • Uses multiple non-binary alphabets, which results in high computational complexity

  50. Entropy Coding: CABAC • Context-adaptive binary arithmetic coding (CABAC) framework designed specifically for H.264 • Binarization: all syntax symbols are translated to bit-strings • 399 predefined context models, used in groups • E.g. models 14-20 used to code macroblock type for inter-frames • The model to use next is selected based on previously coded information (the context)
