
Motion compensated inter-frame prediction


Presentation Transcript


  1. Motion compensated inter-frame prediction • The previous chapter concentrated on the removal of spatial redundancy, that is, the redundancy within a single frame or field. • The next topic is the removal of temporal redundancy between frames. • The technique relies on the fact that within a short sequence of the same general image, most objects remain in the same location, while others move only a short distance.

  2. Difference coding is a very simple interframe compression process in which each frame of a sequence is compared with its predecessor and only pixels that have changed are updated. • If the number of pixels to be updated is large, this overhead can adversely affect compression. • How can it be made better? • Firstly, the intensity of many pixels changes only slightly, and when coding is allowed to be lossy, only pixels that change significantly need be updated. • Thus, not every changed pixel will be updated.

  3. Secondly, difference coding need not operate at the pixel level; it can operate at the block level. • If the frames are divided into non-overlapping blocks and each block is compared with its counterpart in the previous frame, then only blocks that change significantly need be updated. • Updating whole blocks of pixels at once reduces the overhead required to specify where updates take place.
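
A minimal sketch of this block-level difference coding, assuming 8-bit grayscale frames held as 2-D NumPy arrays; the function name, block size, and threshold are illustrative, not taken from the slides:

```python
import numpy as np

def block_difference_update(current, past, n=16, threshold=2.0):
    """Compare each n-by-n block of the current frame with its co-located
    block in the past frame; keep only blocks that changed significantly."""
    updates = []                      # (x, y, block) triples to transmit
    height, width = current.shape     # assumes dimensions are multiples of n
    for y in range(0, height, n):
        for x in range(0, width, n):
            cur = current[y:y+n, x:x+n].astype(np.int32)
            ref = past[y:y+n, x:x+n].astype(np.int32)
            if np.mean(np.abs(cur - ref)) >= threshold:
                updates.append((x, y, current[y:y+n, x:x+n]))
    return updates  # the decoder copies every other block from its own past frame
```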

  4. If pixels are updated in blocks, some pixels will be updated unnecessarily, especially if large blocks are used. • Also, in parts of the image where updated blocks border parts of the image that have not been updated, discontinuities might be visible, and this problem is worse when larger blocks are used. • Block based difference coding can be further improved by compensating for the motion between frames. • Difference coding, no matter how sophisticated, is almost useless where there is a lot of motion.

  5. Only objects that remain stationary within the image can be effectively coded. • If there is a lot of motion or indeed if the camera itself is moving, then very few pixels will remain unchanged. • Even a very slow pan of a still scene will have too many changes to allow difference coding to be effective, even though much of the image content remains from frame to frame. • To solve this problem it is necessary to compensate in some way for object motion.

  6. The basic coding unit for removal of spatial redundancy was defined to be an 8×8 block. • With MPEG 2, however, motion compensation is usually based on a 16×16 block, termed a macroblock. • This size is a trade-off between the requirement to have a large macroblock in order to minimize the bit rate needed to transmit the motion representation, or motion vectors, and the requirement to have a small macroblock in order to adapt the prediction process to the picture content and motion.

  7. There are many methods available to generate a motion compensated prediction. • These include forward prediction, where a macroblock is predicted from a past frame; backward prediction, where a macroblock is predicted from a future frame; and intra coding, where no prediction is made for the macroblock at all. • These prediction modes are applied to MPEG 2 pictures depending on the picture type.

  8. The motion is described as a two-dimensional motion vector that specifies where to retrieve a macroblock from a previously decoded frame to predict the sample values of the current macroblock. • After a macroblock has been compressed using motion compensation, it contains both the spatial difference (motion vectors) and the content difference (error terms) between the reference macroblock and the macroblock being coded.
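
What gets transmitted for a motion-compensated macroblock can be sketched as this pair of motion vector and residual. A hypothetical illustration, assuming the motion vector mv = (dx, dy) has already been estimated and points inside the reference frame; the names are not from the slides:

```python
import numpy as np

def motion_compensated_residual(current, reference, bx, by, mv, n=16):
    """Form the error terms between the reference block that the motion
    vector points at and the macroblock being coded."""
    dx, dy = mv
    predicted = reference[by+dy:by+dy+n, bx+dx:bx+dx+n].astype(np.int32)
    target = current[by:by+n, bx:bx+n].astype(np.int32)
    residual = target - predicted   # error terms (transform-coded in MPEG 2)
    return mv, residual             # conceptually, what gets transmitted
```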

  9. Note that there are cases where information in a scene cannot be predicted from the previous scene, such as when a door opens. • The previous scene doesn’t contain the details of the area behind the door. • Motion-compensated interframe prediction is based on techniques similar to the well-known differential pulse-code modulation (DPCM) principle, Figure 1.

  10. [Figure 1. Basic DPCM coder. The diagram shows the input s(n) entering a subtractor together with a prediction based on the previous locally-decoded output, producing the prediction error s(n) − s(n−1); the quantized prediction error goes to the channel and is also added back to the prediction to form the locally-decoded output, which feeds the predictor.]

  11. Using DPCM means that what is quantized and transmitted is only the difference between the input and a prediction based on the previous locally-decoded output. • Note that the prediction cannot be based on previous source pictures, because the prediction has to be repeatable in the decoder (where the source pictures are not available). • Consequently, the coder contains a local decoder which reconstructs pictures exactly as they would be in the actual decoder.

  12. The locally-decoded output then forms the input to the predictor. • In interframe prediction, samples from previously decoded ”reference” frames are used in the prediction of samples in the current frame.
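
A minimal scalar sketch of this loop, assuming a uniform quantizer with an illustrative step size; the essential point is that the predictor runs on the locally-decoded output, exactly as the receiver's predictor will:

```python
def dpcm_encode(samples, step=8):
    """DPCM coder with a local decoder in the loop: the prediction is the
    previous *locally decoded* sample, never the previous source sample."""
    reconstructed = 0.0      # locally-decoded output, mirrors the decoder state
    channel = []             # quantized prediction errors sent to the channel
    for s in samples:
        prediction = reconstructed            # trivial predictor
        error = s - prediction                # prediction error
        q = step * round(error / step)        # uniform quantizer (assumed)
        channel.append(q)
        reconstructed = prediction + q        # local decoder reconstruction
    return channel
```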

  13. Many ”moving” images or image sequences consist of a static background with one or more moving foreground objects. • In this simplified case it is easy to see how some coding advantage can be gained. • Figure 2 shows part of two temporally adjacent images from a sequence. • Most of the image is unchanged from one time instant to the next, but a foreground object moves. • Suppose that the first image has been encoded using DCT and quantization.

  14. [Figure 2. Two adjacent frames, stationary background, moving object. The object visible in the previous frame appears in the current frame (time t) shifted by a displacement vector. The slide gives the prediction for the luminance signal S(x,y,t) within the moving object; the formula did not survive the transcript, but with displacement vector (dx, dy) the standard motion-compensated prediction is S(x − dx, y − dy, t − 1).]

  15. That image has been transmitted and reconstructed at the decoder. • Now let’s look at the second image, but keep the first image in store at both the encoder and decoder. • This is referred to as the reference image.

  16. We now treat the second image one block of pixels at a time, but before performing the DCT, we compare the block with the same block in the reference image. • If the block is part of the static background, it will be identical to the corresponding block in the reference image. • Instead of encoding this block, we can just tell the decoder to use the block from its copy of the reference image.

  17. All we need is one special code, and we can avoid sending any data for the background blocks. • Where the pixel block in either image includes part of the moving foreground object, we will probably not find a match, so we can encode and transmit that block using DCT and quantization just as we did with the first image. • The benefit gained from this approach obviously depends on the picture content, but it is certainly possible that in an image with a static background half or perhaps three-quarters or more of the blocks will need no coding other than the ”same as previous image” code.

  18. The previous example is clearly a very special case. • Let’s consider an example where the camera pans slightly to one side between the two images, Figures 3 and 4. • Now if we try the test of comparing a block to the corresponding block in the previous image, we will not get any matches. • However, we know that for most of the blocks the data exists in the previous image. • It is just not in quite the same place!

  19. [Figure 3. Camera panning, n:th frame.]

  20. [Figure 4. Camera panning, (n+1):th frame.]

  21. We must send to the decoder the instruction to use data from the previous image, plus a motion vector to inform the decoder exactly where in the previous image to get the data. • It would be sensible to make this a relative measure, so that the motion vector would be zero for a static background. • The motion vector is a two-dimensional value, normally represented by a horizontal (x) component and a vertical (y) component.

  22. The process of obtaining the motion vector is known as motion estimation. • Using the motion vector to eliminate or reduce the effects of motion is known as motion compensation. • Static backgrounds and moving backgrounds provide a simple visualization of how motion vectors may be used to identify correlation between images in a sequence.

  23. Block based motion compensation uses blocks from a past frame to construct a replica of the current frame. • The past frame is a frame that has already been transmitted to the receiver. • For each block in the current frame a matching block is found in the past frame and if suitable, its motion vector is substituted for the block during transmission. • Depending on the search threshold some blocks will be transmitted in their entirety rather than substituted by motion vectors.

  24. Block based motion compensated video compression takes place in a number of distinct stages. • Figure 5 illustrates how the output from earlier processes forms the input to later processes. • Consequently, choices made at early stages can have an impact on the effectiveness of later stages.

  25. [Figure 5a. Flow of information through the motion compensation process: the past/future frame and the current frame feed frame segmentation, followed by the search threshold, block matching, and motion vector correction stages.]

  26. [Figure 5b. Flow of information through the motion compensation process, continued: prediction error coding, vector coding, and block coding, followed by transmission.]

  27. Frame segmentation • The current frame of video to be compressed is divided into equal-sized non-overlapping rectangular blocks. • Ideally the frame dimensions are multiples of the block size, and square blocks are most common. • Block size affects the performance of compression techniques. • The larger the block size, the fewer the blocks, and hence the fewer motion vectors that need to be transmitted.

  28. However, borders of moving objects do not normally coincide with the borders of blocks and so larger blocks require more correction data to be transmitted. • Small blocks result in a greater number of motion vectors, but each matching block is more likely to closely match its target and so less correction data is required. • If the block size is too small then the compression system will be very sensitive to noise.

  29. Thus block size represents a trade-off between minimising the number of motion vectors and maximising the quality of the matching blocks. • For architectural reasons, block sizes that are integer powers of 2 are preferred. • Both the MPEG 2 and H.261 video compression standards use blocks of 16×16 pixels.
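
A small sketch of the segmentation step under those conventions (16×16 blocks, frame dimensions assumed to be multiples of the block size); the names are illustrative:

```python
def segment_frame(frame, n=16):
    """Yield the top-left (x, y) coordinates of the non-overlapping
    n-by-n blocks of a frame (a 2-D array)."""
    height, width = frame.shape[:2]
    for y in range(0, height, n):
        for x in range(0, width, n):
            yield x, y
```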

  30. Search Threshold • If the difference between the target block and the candidate block at the same position in the past frame is below some threshold, then it is assumed that no motion has taken place and a zero vector is returned. • Most video codecs employ a threshold in order to determine whether the computational effort of a search is warranted.
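
The test might look like the following sketch, using the mean absolute difference introduced later in the text as the distortion measure; the threshold value is purely illustrative:

```python
import numpy as np

def needs_search(current, past, bx, by, n=16, threshold=2.0):
    """Return False when the co-located past block already matches the
    target block to within the threshold, so a zero vector can be
    returned without any search."""
    target = current[by:by+n, bx:bx+n].astype(np.int32)
    colocated = past[by:by+n, bx:bx+n].astype(np.int32)
    return np.mean(np.abs(target - colocated)) >= threshold
```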

  31. Block Matching • Block matching is the most time-consuming part of the encoding process. • During block matching each target block of the current frame is compared with a past frame in order to find a matching block. • When the current frame is reconstructed by the receiver this matching block is used as a substitute for the block from the current frame.

  32. Block matching takes place only on the luminance component of frames. • The colour components of the blocks are included when coding the frame but they are not usually used when evaluating the appropriateness of potential substitutes or candidate blocks. • The search can be carried out on all of the past frame, but is usually restricted to a smaller search area centered around the position of the target block in the current frame, Figure 6.

  33. [Figure 6. Corresponding blocks from a current and a past frame: the target block in the current frame, and the search area centered around the same position in the past frame.]

  34. This practice places an upper limit, known as the maximum displacement, on how far objects can move between frames if they are to be coded effectively. • The maximum displacement is specified as the maximum number of pixels in the horizontal and vertical directions that a candidate block can be from the position of the target block in the original frame. • The quality of the match can often be improved by interpolating pixels in the search area, effectively increasing the resolution within the search area by allowing hypothetical candidate blocks with fractional displacements.
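
One common way to realise such fractional displacements is bilinear interpolation to half-pel resolution, sketched below; this is only an illustration of the idea, not the interpolation filter of any particular standard:

```python
import numpy as np

def half_pel_interpolate(frame):
    """Double the effective resolution of a search area by bilinear
    interpolation, so candidate blocks can sit at half-pel positions."""
    f = frame.astype(np.float64)
    h, w = f.shape
    up = np.zeros((2*h - 1, 2*w - 1))
    up[::2, ::2] = f                               # integer-pel samples
    up[1::2, ::2] = (f[:-1, :] + f[1:, :]) / 2     # vertical half-pels
    up[::2, 1::2] = (f[:, :-1] + f[:, 1:]) / 2     # horizontal half-pels
    up[1::2, 1::2] = (f[:-1, :-1] + f[1:, :-1]
                      + f[:-1, 1:] + f[1:, 1:]) / 4  # diagonal half-pels
    return up
```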

  35. The search area need not be square. • Because motion is more likely in the horizontal direction than in the vertical, rectangular search areas are popular.

  36. The problem is the lack of adequate temporal sampling. • If the temporal sampling obeyed the Nyquist criterion, we would have a very easy task in tracking an object from one sample to the next. • The temporal sampling of any common imaging system is far below the Nyquist rate. • This fact leads to a simple conclusion: given the position of an object in one image of the sequence, we have no idea where it will be in the next!

  37. Suppose a sharp edge must not move by more than one spatial sample between two temporal samples, and that there are 720 columns per line and 50 samples per second. • The fastest permissible motion for a sharp edge is then one that travels from one side of the screen to the other in 720/50 = 14.4 seconds (in order to obey the Nyquist criterion). • At first this seems a terrible result. • It implies that all current motion imaging systems suffer from gross temporal aliasing.

  38. If we aimed at tracking an object that traverses the screen in about half a second, we would need to accommodate a displacement of about 50 pixels/image (720/(0.5*25) = 57.6) in standard-definition television. • If we wish to predict for, say, three frames, that means a search range of about 150 pixels in each direction. • In real-world scenes there is usually more, or faster, motion horizontally than vertically, and research has shown that for a given search area it is optimal for the width to be about twice the height. • This means a total search area of 300 pixels × 150 pixels.

  39. Full-search block matching tests every possible block within a defined search range against the block it is desired to match. • The technique is accurate and exhaustive – if there is a match within the search range, this method will find it. • It is, however, computationally demanding.
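
A sketch of the exhaustive search over a small range, with MAD (introduced on the next slides) as the distortion function; the block size, displacement range, and names are illustrative:

```python
import numpy as np

def mad(a, b):
    # Mean absolute difference between two equal-sized blocks.
    return np.mean(np.abs(a.astype(np.int32) - b.astype(np.int32)))

def full_search(current, past, bx, by, n=16, max_disp=7):
    """Evaluate every candidate block displaced at most max_disp pixels
    from the target position and keep the one with the lowest MAD."""
    h, w = past.shape
    target = current[by:by+n, bx:bx+n]
    best_cost, best_vec = float('inf'), (0, 0)
    for dy in range(-max_disp, max_disp + 1):
        for dx in range(-max_disp, max_disp + 1):
            x, y = bx + dx, by + dy
            if 0 <= x <= w - n and 0 <= y <= h - n:   # stay inside the frame
                cost = mad(target, past[y:y+n, x:x+n])
                if cost < best_cost:
                    best_cost, best_vec = cost, (dx, dy)
    return best_vec, best_cost
```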

  40. Matching Criteria • In order for the compressed frame to look like the original, the substitute block must be as similar as possible to the one it replaces. • Thus a matching criterion, or distortion function, is used to quantify the similarity between the target block and candidate blocks. • If, due to a large search area, many candidate blocks are considered, then the matching criterion will be evaluated many times.

  41. If the matching criterion is slow, then the block matching will be slow. • If the matching criterion results in bad matches, then the quality of the compression will be adversely affected. • The mean absolute difference (MAD) is the most popular block matching criterion. • Corresponding pixels from each block are compared and their absolute differences are summed and averaged. • Blocks A and B are of size n×m. • A[p,q] is the value of the pixel in the p:th row and q:th column of block A.
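
The formula itself appears to have been an image on the original slide and is missing from the transcript; with the definitions above, the standard statement of MAD is:

\[ \mathrm{MAD}(A,B) \;=\; \frac{1}{nm}\sum_{p=1}^{n}\sum_{q=1}^{m}\bigl\lvert A[p,q]-B[p,q]\bigr\rvert \]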

  42. The lower the MAD the better the match and so the candidate block with the minimum MAD should be chosen. • The function is alternatively called Mean Absolute Error (MAE).

  43. The mean square difference function (MSD) is similar to the mean absolute difference function, except that the difference between pixels is squared before summation (see the reconstruction below). • The mean square difference is more commonly called the Mean Square Error (MSE), and the lower this value the better the match.
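
As with MAD, the formula is missing from the transcript; under the same conventions the standard definition is:

\[ \mathrm{MSD}(A,B) \;=\; \frac{1}{nm}\sum_{p=1}^{n}\sum_{q=1}^{m}\bigl(A[p,q]-B[p,q]\bigr)^{2} \]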

  44. The Pel Difference Classification (PDC) distortion function compares each pixel of the target block with its counterpart in the candidate block and classifies each pixel pair as either matching or not matching. • Pixels match if the difference between their values is less than some threshold, and the greater the number of matching pixels, the better the match. • ord(e) evaluates to 1 if e is true and 0 if false.
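
The slide's formula did not survive either; with the ord(·) notation just defined and a matching threshold t, PDC is conventionally written as:

\[ \mathrm{PDC}(A,B) \;=\; \sum_{p=1}^{n}\sum_{q=1}^{m}\mathrm{ord}\bigl(\lvert A[p,q]-B[p,q]\rvert \le t\bigr) \]

Note that, unlike MAD and MSD, a higher PDC value indicates a better match.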

  45. Integral projections (IP) are calculated by summing the values of pixels from each column and each row of a block. • The most attractive feature of this criterion is that values calculated for a particular candidate block can be reused in calculating the integrals for overlapping candidate blocks. • This feature is of particular value during an exhaustive search, but less useful in the case of sub-optimal searches.
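
No formula survives in the transcript here either; one common formulation compares the row and column sums of the two blocks:

\[ \mathrm{IP}(A,B) \;=\; \sum_{q=1}^{m}\Bigl\lvert\sum_{p=1}^{n}A[p,q]-\sum_{p=1}^{n}B[p,q]\Bigr\rvert \;+\; \sum_{p=1}^{n}\Bigl\lvert\sum_{q=1}^{m}A[p,q]-\sum_{q=1}^{m}B[p,q]\Bigr\rvert \]

The reuse mentioned on the slide arises because two horizontally adjacent candidate blocks share all but one of their columns, so almost all of the column sums carry over from one candidate to the next.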

  46. Sub-Optimal Block Matching Algorithms • The exhaustive search is computationally very intensive and requires the distortion function (matching criterion) to be evaluated many times for each target block to be matched. • Considerable research has gone into developing block matching algorithms that find suitable matches for target blocks but require fewer evaluations. • Such algorithms test only some of the candidate blocks from the search area and choose a match from this subset of blocks.

  47. Hence they are known as sub-optimal algorithms. • Because they do not examine all of the candidate blocks, the choice of matching block might not be as good as that chosen by an exhaustive search. • However, the quality-cost trade-off is usually worthwhile.
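
As a concrete illustration, here is a sketch of the classic three-step search, one well-known sub-optimal algorithm (not one of those described on the following slides): nine candidates on a coarse grid are evaluated, the search re-centres on the best of them, and the step size is halved until it reaches one pixel:

```python
import numpy as np

def mad(a, b):
    # Mean absolute difference between two equal-sized blocks.
    return np.mean(np.abs(a.astype(np.int32) - b.astype(np.int32)))

def three_step_search(current, past, bx, by, n=16, step=4):
    """Sub-optimal search: at most 9 evaluations per stage instead of the
    full (2*max_disp + 1)**2 of an exhaustive search."""
    h, w = past.shape
    target = current[by:by+n, bx:bx+n]
    cx, cy = bx, by                          # current search centre
    while step >= 1:
        best = (float('inf'), cx, cy)
        for dy in (-step, 0, step):
            for dx in (-step, 0, step):
                x, y = cx + dx, cy + dy
                if 0 <= x <= w - n and 0 <= y <= h - n:
                    cost = mad(target, past[y:y+n, x:x+n])
                    if cost < best[0]:
                        best = (cost, x, y)
        cx, cy = best[1], best[2]
        step //= 2
    return cx - bx, cy - by                  # motion vector (dx, dy)
```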

  48. Signature based algorithms successfully reduce the complexity of block matching while preserving many of the advantages of the exhaustive search. • Signature based algorithms reduce the number of operations required to find a matching block by performing the search in a number of stages, using several matching criteria. • During the first stage every candidate block in the search area is evaluated using a computationally simple matching criterion (e.g. pel difference classification).

  49. Only the most promising candidate blocks are examined during the second stage, when they are evaluated by a more selective matching criterion. • Signature based algorithms may have several stages and many different matching criteria.

  50. Coarse quantization of vectors • While signature based algorithms reduce complexity by minimising the complexity of the criterion applied to each block in the search space, it is also possible to reduce complexity by reducing the number of blocks to which the criterion is applied. • These algorithms consider only a subset of the search space. • The decision on which candidate blocks to examine and which to ignore is never arbitrary. • Research indicates that humans cannot perceive fast moving objects with full resolution, which results in fast moving objects appearing blurred.
