
Topic for lecture 2




  1. Topic for lecture 2 • Topic: video compression • The ultimate compression task? • Color image (300 x 300 x 24bit): • 2.16Mbit/image x 30 image/s = 64.8Mbps • Motion picture: 90min = 64.8Mbps x 60 x 90 = 349.92Gbit • 56.6K modem => Raw download time (excl. sound and overhead) ~ 1717 hours or ~ 72 days!!!
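
The arithmetic on this slide can be checked with a few lines of Python. This is only a sketch that reuses the slide's own figures; the constant names are illustrative and not part of the lecture.

```python
# Raw (uncompressed) video size, using the figures from the slide.
WIDTH, HEIGHT = 300, 300      # pixels
BITS_PER_PIXEL = 24           # 8 bits each for R, G and B
FRAME_RATE = 30               # images per second
DURATION_S = 90 * 60          # 90-minute motion picture, in seconds
MODEM_BPS = 56_600            # modem rate as stated on the slide

bits_per_image = WIDTH * HEIGHT * BITS_PER_PIXEL
bit_rate = bits_per_image * FRAME_RATE
total_bits = bit_rate * DURATION_S
download_hours = total_bits / MODEM_BPS / 3600

print(f"{bits_per_image / 1e6:.2f} Mbit per image")
print(f"{bit_rate / 1e6:.1f} Mbps")
print(f"{total_bits / 1e9:.2f} Gbit for 90 minutes")
print(f"{download_hours:.0f} hours, about {download_hours / 24:.0f} days")
```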

  2. Agenda for lecture 2 • What makes video compression possible? • Implementations of motion compensation • Block matching • The YCbCr color representation • MPEG

  3. Video compression • A sequence of images that needs to be compressed for storage and/or transmission • Ignore audio, as images >> audio • Straightforward methods • Motion JPEG • 3D DCT

  4. Temporal redundancy • Less than 10% of the pixels change by more than 1% between frames • Temporal redundancy or interframe correlation • Temporal redundancy > spatial redundancy • Origin: slow camera and object movements
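
One way to measure the claim on this slide for a pair of frames is sketched below. It assumes frames are lists of rows of 8-bit gray values and reads "1%" as 1% of the full scale (255); both are assumptions, and `changed_fraction` is a hypothetical helper name.

```python
def changed_fraction(prev, cur, rel_threshold=0.01, full_scale=255):
    """Fraction of pixels whose value changed by more than rel_threshold of full scale."""
    limit = rel_threshold * full_scale   # assumption: 1% of the 8-bit range
    total = 0
    changed = 0
    for row_prev, row_cur in zip(prev, cur):
        for a, b in zip(row_prev, row_cur):
            total += 1
            if abs(b - a) > limit:
                changed += 1
    return changed / total
```

A result below 0.1 for typical consecutive frames would match the "less than 10%" statement.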

  5. Motion compensated coding • Second generation of temporal compression methods • More efficient (especially with rapid changes) but also more complex: • OK, since the cost of computing power is decreasing faster than the cost of bandwidth • Basic idea: the only differences between two images are the moving objects (draw) • Estimate the motion and simply code this information • From the prediction and the initial frame we can encode/decode all other frames

  6. Practical issues • Due to noise, camera movements, light changes etc. => the object and background change => • Calculate the prediction error (difference) and code this • It is very hard to track and describe a general object (contour and texture), so a block of pixels is used as the 'object' instead • The estimated motion is represented as pure translation: no rotation and scaling • This is justified since we have high frame rates and 'slow' changes • Denoted the displacement vector or motion vector

  7. Procedure for motion compensated coding • Image sequence => image => blocks of pixels • Step 1: Motion analysis: • Estimate the motion vector of the current block, i.e. the position of the block in the previous image(s) • Step 2: Prediction and differencing • Predict how the block found in the previous image(s) will look in the current image • Subtract the predicted block from the current block => difference • Step 3: Entropy encoding of the difference and motion vector • Encoded difference and motion vector << raw image => video compression • Step 3 we already know
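
Step 2 and its inverse at the decoder can be sketched as follows, assuming the motion vector from step 1 is already known. Frames are taken to be lists of rows of integers, and the motion vector is written as an offset `(dy, dx)` from the current block position to the matching position in the previous frame. The function names and this convention are illustrative assumptions, and entropy coding (step 3) is left out.

```python
def get_block(frame, top, left, size):
    """Cut a size x size block out of a frame."""
    return [row[left:left + size] for row in frame[top:top + size]]

def encode_block(cur, prev, top, left, size, mv):
    """Return the motion vector and the difference between current and predicted block."""
    dy, dx = mv
    predicted = get_block(prev, top + dy, left + dx, size)   # 0th order prediction
    current = get_block(cur, top, left, size)
    diff = [[c - p for c, p in zip(row_c, row_p)]
            for row_c, row_p in zip(current, predicted)]
    return mv, diff

def decode_block(prev, top, left, size, mv, diff):
    """Rebuild the current block from the previous frame, motion vector and difference."""
    dy, dx = mv
    predicted = get_block(prev, top + dy, left + dx, size)
    return [[p + d for p, d in zip(row_p, row_d)]
            for row_p, row_d in zip(predicted, diff)]
```

When the prediction is good, `diff` contains mostly zeros and small values, which is what makes the entropy coding in step 3 effective.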

  8. Motion analysis and prediction • In general we seek the trajectory of a block so we can predict its current position, e.g. using weights • In practice this is too complicated and instead a 0th order predictor is applied: • Predicted block(x,y,t) = block(a,b,t-1) • MPEG uses two 0th order predictors • The only unknown issue: step 1: how do we find the block in the previous frame that best matches the block in the current frame? • Three methods: • Block matching (by far the most applied method) • Pel-recursion (block = 1 pixel) • Optical flow (block = 1 pixel)

  9. Block matching (1) • Principle • All pixels in a block are assumed to have the same motion vector • Search window • Maximum displacement determined from frame rate and context • Usually a square region • Block of p x q pixels; usually p=q => square block • The smaller the block size => the better the prediction, but more overhead (motion vectors) • Usually block size = 16 x 16

  10. Block matching (2) • Overlapping blocks improve reconstructed image quality but increase the bit rate • Usually non-overlapping blocks are applied • Block matching via a similarity measure, accumulated over all pixel pairs (u,v) of the two blocks: • Sum of squared differences (SSD): S(u,v) = (u-v)^2 • Mean absolute differences (MAD): S(u,v) = |u-v|
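
The two similarity measures can be written out for whole blocks as below. This is a minimal sketch in which blocks are lists of rows of numbers; the function names are illustrative.

```python
def ssd(block_a, block_b):
    """Sum of squared differences over all pixel pairs (u, v)."""
    return sum((u - v) ** 2
               for row_a, row_b in zip(block_a, block_b)
               for u, v in zip(row_a, row_b))

def mad(block_a, block_b):
    """Mean of absolute differences over all pixel pairs (u, v)."""
    count = sum(len(row) for row in block_a)
    total = sum(abs(u - v)
                for row_a, row_b in zip(block_a, block_b)
                for u, v in zip(row_a, row_b))
    return total / count
```

In both cases a lower value means a better match, and 0 means the blocks are identical.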

  11. Searching strategies • Full search: • Finds the global minimum but requires heavy processing! • If there is only one minimum in the search region => a less computationally demanding search strategy can be used • Accept a local minimum => • Larger difference but less processing • Searching strategies that assume one (local) minimum: • Coarse-fine three-step search • 2D logarithmic search • Conjugate direction search • Etc.
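
A full search can be sketched as follows. It assumes frames are lists of rows of integers, uses the sum of absolute differences as the matching cost, and expresses the motion vector as an offset `(dy, dx)` into the previous frame; `max_disp` (the maximum displacement) and the function names are illustrative assumptions.

```python
def block_cost(cur, prev, top, left, dy, dx, size):
    """Sum of absolute differences between the current block and a displaced block."""
    return sum(abs(cur[top + y][left + x] - prev[top + dy + y][left + dx + x])
               for y in range(size) for x in range(size))

def full_search(cur, prev, top, left, size, max_disp):
    """Test every displacement in the search window and keep the cheapest one."""
    height, width = len(prev), len(prev[0])
    best_mv, best_cost = None, None
    for dy in range(-max_disp, max_disp + 1):
        for dx in range(-max_disp, max_disp + 1):
            inside = (0 <= top + dy and top + dy + size <= height and
                      0 <= left + dx and left + dx + size <= width)
            if not inside:
                continue   # candidate block would fall outside the previous frame
            cost = block_cost(cur, prev, top, left, dy, dx, size)
            if best_cost is None or cost < best_cost:
                best_mv, best_cost = (dy, dx), cost
    return best_mv, best_cost

# Toy example: the current frame is the previous frame shifted 1 down and 2 right.
prev = [[y * 8 + x for x in range(8)] for y in range(8)]
cur = [[prev[y - 1][x - 2] if y >= 1 and x >= 2 else 0 for x in range(8)]
       for y in range(8)]
print(full_search(cur, prev, top=3, left=3, size=2, max_disp=2))   # ((-1, -2), 0)
```

The cost is (2 * max_disp + 1)^2 block comparisons per block, which is the heavy processing the slide refers to.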

  12. Coarse-fine three-step search • Step 1) Test 9 points within a fixed pattern • Step 2+3) Centre the pattern around the best match and change the distance within the pattern
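
A possible reading of the three steps is sketched below: a 3 x 3 pattern of 9 points is tested, the pattern is re-centred on the best match, and the spacing is halved (4, 2, 1). The initial spacing of 4, the sum-of-absolute-differences cost, the `(dy, dx)` offset convention and the function names are assumptions, not taken from the slide.

```python
def sad(cur, prev, top, left, dy, dx, size):
    """Sum of absolute differences; infinite if the candidate leaves the frame."""
    height, width = len(prev), len(prev[0])
    if not (0 <= top + dy and top + dy + size <= height and
            0 <= left + dx and left + dx + size <= width):
        return float("inf")
    return sum(abs(cur[top + y][left + x] - prev[top + dy + y][left + dx + x])
               for y in range(size) for x in range(size))

def three_step_search(cur, prev, top, left, size, first_step=4):
    """Coarse-to-fine search: 9 candidates per step, spacing halved after each step."""
    center = (0, 0)
    step = first_step
    while step >= 1:
        candidates = [(center[0] + i * step, center[1] + j * step)
                      for i in (-1, 0, 1) for j in (-1, 0, 1)]
        center = min(candidates,
                     key=lambda mv: sad(cur, prev, top, left, mv[0], mv[1], size))
        step //= 2
    return center
```

With spacings 4, 2 and 1 the search covers displacements up to 7 in each direction using at most 27 cost evaluations, compared with 225 for a full search over the same window. It only finds the true best match when the cost has a single minimum in the window.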

  13. YCbCr color representation

  14. YCbCr color representation • A camera captures color in RGB format (show) • We would like a representation where the intensity and color are separated: • So we can transmit and decode both a color and a gray-scale signal • [R,G,B]: [50,50,50] is the same color as [100,100,100] • HSI (hue-saturation-intensity) • HSI is complex to calculate, so we seek a simpler representation • YUV-representation is a simple approximation: • Y = Luminance (intensity) = 0.299 R + 0.587 G + 0.114 B • The non-uniform weighting comes from the HVS (human visual system) • U = B - intensity = "pure" blue color = 0.492 (B - Y) • V = R - intensity = "pure" red color = 0.877 (R - Y) • Rough approximation but very simple to compute
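
The YUV formulas on this slide translate directly into code. The sketch below uses the slide's coefficients; the function name is illustrative.

```python
def rgb_to_yuv(r, g, b):
    """Convert one RGB pixel to YUV with the coefficients from the slide."""
    y = 0.299 * r + 0.587 * g + 0.114 * b   # luminance (intensity)
    u = 0.492 * (b - y)                     # blue minus intensity
    v = 0.877 * (r - y)                     # red minus intensity
    return y, u, v

for gray in (50, 100):
    y, u, v = rgb_to_yuv(gray, gray, gray)
    print(f"RGB=({gray},{gray},{gray}) -> Y={y:.1f}, U={u:.3f}, V={v:.3f}")
```

Both gray pixels give U and V of (numerically) zero and differ only in Y, which illustrates the separation of intensity and color that the slide asks for.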

  15. YCbCr color representation (3) • The HVS is more sensitive to intensity (Y) than to color (Cb and Cr), so more bits can be used to represent the intensity • Formats: 4:4:4 (24 bits), 4:2:2 (16 bits), 4:2:0 (12 bits) • [Figure: sampling grids marking Y samples and Cb/Cr samples]
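
The bit counts for the three formats follow from the usual J:a:b reading of the notation: in a region J pixels wide and 2 rows high, there are `a` chroma samples in the first row and `b` in the second, for each of Cb and Cr. The sketch below assumes 8 bits per sample; the function name is illustrative.

```python
def bits_per_pixel(j, a, b, bits_per_sample=8):
    """Average bits per pixel for a J:a:b chroma subsampling format."""
    luma_samples = 2 * j            # one Y sample per pixel in the J x 2 region
    chroma_samples = 2 * (a + b)    # Cb and Cr each contribute a + b samples
    return (luma_samples + chroma_samples) * bits_per_sample / (2 * j)

for fmt in ((4, 4, 4), (4, 2, 2), (4, 2, 0)):
    print(fmt, bits_per_pixel(*fmt))
```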

  16. MPEG • MPEG = Moving Picture Experts Group • International standard for compression of video (image, sound, and system info.), driven by growth in the digital media market (e.g. CD-ROM, DVD). Covers both transmission and storage • MPEG-1: 1991 • MPEG-2: 1994 • MPEG-2 is MPEG-1 compatible, hence only MPEG-2 used today • MPEG is NOT an algorithm but rather a framework with several algorithms and MANY user-settings. • Fixed protocol, hence fixed decoders (encoder not specified!) • Asymmetrical codec ~ 100:1 (JPEG ~1:1) • MPEG compression is lossy

  17. MPEG-1 • MPEG-2 is an "add-on" to MPEG-1 • Typical bit rate for MPEG-1 = 1.5Mbps • Meaning that an MPEG-1 decoder can decode and show real-time video that has been compressed to 1.5Mbps. MPEG: Trade-off between video quality and bandwidth • Allows resolutions up to 4095 x 4095 at 60Hz • Most used is the CPB (constrained parameters bitstream) • Fixed resolutions and frame rates => HW implementations • Max. resolution = 768 x 576 at 30Hz • Max. bit rate = 1.856Mbps

  18. MPEG-1 compression rate • BT.601 (digital TV-signal): • 704 x 576 x 24bit x 25Hz = 243Mbps • Compression factor: 243Mbps / 1.5Mbps = 162 • JPEG = 10-20 • YCbCr 4:2:0 format: 12 bit per pixel • Basic operation: down-scale to SIF (source input format) • Fixed resolution => HW solutions • 352 x 288 (ignore lines and/or interpolate) • 352 x 288 x 12 x 25Hz = 30.4Mbps => comp. factor = 20 • But can be higher or lower • In general: Fewer input data => better image quality (for fixed bit rate)
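
The compression factors can be recomputed as below. The sketch assumes a SIF resolution of 352 x 288, which is the size that reproduces the 30.4Mbps figure, and a target rate of 1.5Mbps; the names are illustrative.

```python
def raw_bit_rate(width, height, bits_per_pixel, frame_rate):
    """Uncompressed bit rate in bits per second."""
    return width * height * bits_per_pixel * frame_rate

TARGET_BPS = 1.5e6                          # typical MPEG-1 bit rate

bt601 = raw_bit_rate(704, 576, 24, 25)      # full-resolution digital TV signal
sif = raw_bit_rate(352, 288, 12, 25)        # assumed SIF size, YCbCr 4:2:0

print(f"BT.601: {bt601 / 1e6:.1f} Mbps, factor {bt601 / TARGET_BPS:.0f}")
print(f"SIF:    {sif / 1e6:.1f} Mbps, factor {sif / TARGET_BPS:.0f}")
```

Of the total factor of about 162, a factor of 8 comes from down-scaling and 4:2:0 subsampling alone, leaving about 20 for the actual coding.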

  19. MPEG-1 principle (1) • Full-motion-compensated DCT and difference coding • Frames: 1,2,3,4,5,6,7,8,9, … • 1: (DCT-JPEG) • 2,3,4,5,6,7,8,9, …: difference coding • The difference is DCT coded and quantized => lossy compression • Problems? • Error propagation • No random access

  20. MPEG-1 principle (2) • I-picture: intra-coded • Similar to JPEG • P-picture: predictive coded via forward prediction • B-picture: predictive coded via: • forward-, backward-, or bi-directional prediction • Errors in I and P are limited to max one GOP (group of pictures) • Errors in B are limited to one picture • N = number of pictures in a GOP, M = distance between consecutive I/P pictures • High N and M => good coding but error propagation. • Usually: 13<N<16 and 0<M<4 • Recommended: I each ½ sec. and whenever the scene changes • Coding order vs. visualisation order
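
The difference between coding order and visualisation order comes from B-pictures needing both of their reference pictures before they can be decoded. A minimal sketch of the reordering, assuming each B-picture depends on the nearest I/P picture before and after it (the function name is illustrative):

```python
def coding_order(display_types):
    """Return picture indices in coding order for a string of display-order types."""
    order = []
    pending_b = []
    for index, picture_type in enumerate(display_types):
        if picture_type == "B":
            pending_b.append(index)       # must wait for the next I or P picture
        else:
            order.append(index)           # I or P picture is coded first
            order.extend(pending_b)       # then the B pictures that precede it
            pending_b = []
    order.extend(pending_b)               # trailing B pictures, if any
    return order

display = "IBBPBBPBBI"
order = coding_order(display)
print(order)                              # [0, 3, 1, 2, 6, 4, 5, 9, 7, 8]
print("".join(display[i] for i in order)) # IPBBPBBIBB
```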

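  21. [Figure: coding hierarchy, from the entire sequence down to blocks] • Picture type: I, P, B • MB = macroblock • 4:2:0 format: one 16 x 16 Y region (four 8 x 8 blocks) plus one 8 x 8 Cb block and one 8 x 8 Cr block => 6 blocks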

  22. Coding one Block (8x8) • Similar to JPEG except for adaptive quantization • DCT, quantization, zig-zag scan, entropy coding • Adaptive quantization controls the quality/amount of data • Intra vs. Inter coding: • I-blocks: Intra • P,B-blocks: Depending on DIFF: 0, motion vectors, Inter, Intra.
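
The first three stages (DCT, quantization, zig-zag scan) can be sketched in plain Python as below. A single uniform quantization step is used here as a simplifying assumption; the slide's adaptive quantization and the entropy coder are not modelled, and the function names are illustrative.

```python
import math

N = 8   # block size

def dct_2d(block):
    """Orthonormal 2D DCT-II of an N x N block (direct, unoptimised form)."""
    def scale(k):
        return math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)
    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            total = 0.0
            for y in range(N):
                for x in range(N):
                    total += (block[y][x]
                              * math.cos((2 * y + 1) * u * math.pi / (2 * N))
                              * math.cos((2 * x + 1) * v * math.pi / (2 * N)))
            out[u][v] = scale(u) * scale(v) * total
    return out

def quantize(coeffs, step):
    """Uniform quantization; a larger step gives less data and lower quality."""
    return [[round(c / step) for c in row] for row in coeffs]

def zigzag(block):
    """Read the block along anti-diagonals, alternating direction."""
    positions = sorted(((y, x) for y in range(N) for x in range(N)),
                       key=lambda p: (p[0] + p[1],
                                      p[0] if (p[0] + p[1]) % 2 else p[1]))
    return [block[y][x] for y, x in positions]
```

For smooth blocks, the zig-zag scan places the few non-zero low-frequency coefficients first and leaves a long run of zeros at the end, which the entropy coder can represent compactly.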

  23. Coding one Block (8x8) • Encoding • Decoding

  24. What to remember • Video compression is done by removing the temporal redundancy • Principle: (at block level) • Step 1: Motion analysis => motion vector • Step 2: Calculate the error/difference (subtraction) • Step 3: Entropy encoding of motion vector and difference • Motion analysis: • Pel-recursion • Optical flow • Block matching (the currently applied method) • Block matching • Block of pixels (16 x 16) • Similarity measure • Search region • Different search strategies to avoid the full search

  25. What to remember • Video compression is done by removing the temporal redundancy • Principle: (at (macro)block level) • Step 1: Motion analysis (block matching) => motion vector • Step 2: Calculate the error/difference (subtraction) • Step 3: ’JPEG’-coding (DCT, quantization and entropy encoding) • MPEG-1: • Bit rate ~1.5Mbps • Asymmetrical codec ~ 100:1 ( JPEG ~1:1 ) • Compression rate < 400 (down scaling + YCbCr 4:2:0 => ~20) • Coding-style: I B B P B B P B B I • Questions? • Presentations: email me tbm@cvmt.dk • The end

  26. Xtras

  27. Pel-recursion (1) • The block consists of only one pixel (= pel) • Problem formulation: • Displaced frame difference function: • DFD(x,y,dx,dy) = i(x,y,t) - i(x-dx,y-dy,t-1) • Find (dx,dy) which minimises DFD^2 => most similar pixel => best displacement vector • Solution: • Setting the partial derivatives = 0 • Non-linear programming problem: • Iterative algorithm • Steepest descent method • Newton-Raphson's method • others

  28. Pel-recursion (2) • Algorithm: • Find the motion vector (dx,dy) for the first pixel • The motion vectors are correlated => • Use ’old’ (dx,dy) as initial guess for the iterative algorithm => recursion

  29. Optical flow • The block consists of only one pixel • Similar to Pel-recursive but calculated in a different manner

  30. Comparing the 3 types of motion analysis • The three: pel-recursion, optical flow and block matching • Optical flow and pel-recursion calculate one motion vector for each pixel => • More precise => predicted block and current block are more similar => smaller difference => more compact coding of the difference. • More overhead as more motion vectors are to be coded • More complex to calculate • Pixel methods avoid the block artefacts of block matching • Block matching is (at present) more suitable • Used in all coding standards

  31. Temporal methods • Two methods which exploit both the spatial and temporal redundancies • Frame replenishment • Motion compensation • Both utilise prediction => short summary

  32. Frame replenishment (1) • Exploit the temporal redundancy • First generation of temporal compression method • If: value changed significantly: | i(x,y,t) – i(x,y,t-1) | > TH • Then: code value and position: i(x,y,t) x,y • Else: code nothing => re-use i(x,y,t-1) • Enhancements: • Send differences instead of values • Remove noise from the images prior to processing
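
The if/then/else rule on this slide can be sketched as an encoder and decoder pair. Frames are assumed to be lists of rows of integers, and the function names are illustrative.

```python
def replenish_encode(prev, cur, threshold):
    """Code (x, y, value) only for pixels that changed by more than the threshold."""
    updates = []
    for y, (row_prev, row_cur) in enumerate(zip(prev, cur)):
        for x, (old, new) in enumerate(zip(row_prev, row_cur)):
            if abs(new - old) > threshold:
                updates.append((x, y, new))
    return updates

def replenish_decode(prev, updates):
    """Re-use the previous frame and overwrite only the coded pixels."""
    frame = [row[:] for row in prev]
    for x, y, value in updates:
        frame[y][x] = value
    return frame
```

Raising `threshold` shortens the update list and lowers the bit rate, at the price of small changes never being transmitted.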

  33. Frame replenishment (2) • A fixed bit rate of 1Mbps means that the decoder can only decode and play-back real-time video compressed to 1Mbps • Many changes between two images => many pixels to be coded. • To achieve the same bit rate => TH is higher => only large changes are coded => poorer reconstruction aka. the dirty window effect

  34. 2D logarithmic search • Test 5 points within a fixed pattern • Centre the pattern around the best match • When best match is in the centre or on the border: reduce distance in pattern

  35. Conjugate direction search • Step 1: Test 3 horizontal points next to each other • Step 2: Move to minimum point • Continue step 1 and 2 until a minimum is found. Then repeat the process in the vertical direction

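  36. YCbCr color representation (2) • YUV-representation can have negative values, so YUV-representation is scaled and shifted to avoid this => YCbCr-representation • Cb and Cr are denoted the chrominances • YCbCr is the representation utilised in image/video compression
\[ \begin{bmatrix} Y \\ U \\ V \end{bmatrix} = \begin{bmatrix} 0.299 & 0.587 & 0.114 \\ -0.147 & -0.289 & 0.436 \\ 0.615 & -0.515 & -0.100 \end{bmatrix} \begin{bmatrix} R \\ G \\ B \end{bmatrix} \]
\[ \begin{bmatrix} Y \\ C_b \\ C_r \end{bmatrix} = \begin{bmatrix} 16 \\ 128 \\ 128 \end{bmatrix} + \begin{bmatrix} 0.257 & 0.504 & 0.098 \\ -0.148 & -0.291 & 0.439 \\ 0.439 & -0.368 & -0.071 \end{bmatrix} \begin{bmatrix} R \\ G \\ B \end{bmatrix} \]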
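
The scaled and shifted conversion can be sketched with the coefficients from this slide, assuming 8-bit R, G and B inputs in the range 0 to 255. The function name is illustrative.

```python
def rgb_to_ycbcr(r, g, b):
    """Convert one 8-bit RGB pixel to YCbCr using the slide's matrix and offsets."""
    y = 16 + 0.257 * r + 0.504 * g + 0.098 * b
    cb = 128 - 0.148 * r - 0.291 * g + 0.439 * b
    cr = 128 + 0.439 * r - 0.368 * g - 0.071 * b
    return y, cb, cr

for name, rgb in (("black", (0, 0, 0)), ("white", (255, 255, 255))):
    y, cb, cr = rgb_to_ycbcr(*rgb)
    print(f"{name}: Y={y:.1f}, Cb={cb:.1f}, Cr={cr:.1f}")
```

Black maps to Y = 16 and white to Y of about 235, with Cb = Cr = 128 for both, so all three components stay non-negative.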

  37. Audio in MPEG-1 • 16 bit sampled at: 16, 22.05, 24, 32, 44.1 and 48kHz • Stereo at 44.1kHz = 1.4Mbps • Compression based on psycho-acoustic redundancy: • Three methods: • Layer 1: Target rate = 384Kbps • Layer 2: Target rate = 256Kbps • Layer 3: Target rate = 128Kbps • Layer 3 is the most advanced and often applied • It has a nickname, which? • [Figure: two plots with axes dB and Hz]
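
The 1.4Mbps figure and the compression each layer must achieve can be recomputed as below, reading the sampling rate as 44.1 kHz and using the target rates listed on the slide. The names are illustrative.

```python
BITS_PER_SAMPLE = 16
SAMPLE_RATE_HZ = 44_100     # 44.1 kHz sampling rate
CHANNELS = 2                # stereo

raw_audio_bps = BITS_PER_SAMPLE * SAMPLE_RATE_HZ * CHANNELS
print(f"Raw stereo audio: {raw_audio_bps / 1e6:.2f} Mbps")

LAYER_TARGETS_BPS = {"Layer 1": 384_000, "Layer 2": 256_000, "Layer 3": 128_000}
for layer, target in LAYER_TARGETS_BPS.items():
    print(f"{layer}: compression factor {raw_audio_bps / target:.1f}")
```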

  38. MPEG-2 • Defined in 1994 • Developed for DTV but has lots of other applications • Based on MPEG-1 (backward compatible) • Bit rates: 1.5Mbps – 60Mbps. Target: 2-15Mbps (best: 4) • Lots of new features including: • Support for fields, support for 4:4:4 and 4:2:2 • Alternative zig-zag scan, better motion vectors • Scalability to allow any subset of a stream to be decoded and visualised, etc. • MPEG-3: Purpose: HDTV • Merged with MPEG-2 => no MPEG-3 standard

  39. MPEG-4 • Both for real video and synthetic video • Very low bit rates < 64Kbps => efficient coding • Content based coding: code the objects • Shape, texture and sprite (background objects) • Interactivity • Popular coding standards:
