
Image and Video Quality Assessment Objective Image Quality Metrics


Presentation Transcript


  1. Image and Video Quality Assessment Objective Image Quality Metrics Dr. David Corrigan

  2. Outline • Motivation • Subjective Image and Video Quality Assessment • Test Methodologies • Benchmarking Objective Metrics • Objective IQA Metrics (covered in this presentation) • Metrics based on models of the HVS • Metrics based on structural distortion • Objective VQA Metrics • Metrics based on the HVS • “Feature-based” metrics • “Motion-based” metrics

  3. Image Quality Metrics Metrics based on the HVS

  4. Key References • The Essential Guide to Image Processing. A. Bovik, Academic Press, 2009. ISBN: 978-0-12-374457-9 • A. B. Watson, “DCTune: A technique for visual optimization of DCT quantization matrices for individual images,” Soc. Inf. Display Dig. Tech. Papers, vol. XXIV, pp. 946–949, 1993. • Z. Wang, A. C. Bovik, H. R. Sheikh and E. P. Simoncelli, "Image quality assessment: From error visibility to structural similarity," IEEE Transactions on Image Processing, vol. 13, no. 4, pp. 600-612, Apr. 2004. • H.R. Sheikh, M.F. Sabir and A.C. Bovik, "A statistical evaluation of recent full reference image quality assessment algorithms", IEEE Transactions on Image Processing, vol. 15, no. 11, pp. 3440-3451, Nov. 2006. • Z. Wang, E. P. Simoncelli and A. C. Bovik, “Multi-scale structural similarity for image quality assessment,” Invited Paper, IEEE Asilomar Conference on Signals, Systems and Computers, Nov. 2003.

  5. Metrics Based on the HVS • These metrics apply mathematical models of certain stages of the HVS to determine the thresholds of visibility of distortions. • The thresholds are used to normalise the error (e.g. MSE) between the reference and distorted/test images. • Aspects of the HVS • Spatial Frequency Sensitivity • Luminance Masking • Contrast Masking • Error Pooling

  6. Spatial Frequency Sensitivity (Horizontal) • Sensitivity to contrast is a function of spatial frequency. This relationship varies with orientation but maintains similar characteristics. • In the chart, the grating increases in frequency from left to right and its intensity decreases vertically; sensitivity is given by the perceived height of the columns.

  7. Luminance Masking • Distortions are more visible at low luminance levels. • Given a background intensity I, the user is asked to increase the intensity of the circle until it is barely visible. The increase required is the just noticeable difference (JND) in intensity, ΔI. • This experiment demonstrates a phenomenon known as Weber’s Law: the JND grows in proportion to the background intensity, i.e. ΔI / I ≈ constant.
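The experiment above can be sketched numerically. The Weber fraction used below is an assumed illustrative value, not a figure from the lecture; measured fractions depend on the viewing conditions.

```python
# Illustrative sketch of Weber's Law: the just-noticeable difference (JND)
# in intensity grows in proportion to the background intensity I.
# The Weber fraction k = 0.02 is an assumed illustrative value.

def weber_jnd(background_intensity, weber_fraction=0.02):
    """Return the just-noticeable intensity difference for a background level."""
    return weber_fraction * background_intensity

# The same absolute distortion is visible on a dark background
# but masked on a bright one.
jnd_dark = weber_jnd(50.0)
jnd_bright = weber_jnd(200.0)
```

This is why the visibility threshold in a metric should scale with local brightness, as DCTune does below.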

  8. Contrast Masking • Distortions are less visible if they are located in areas of an image having similar frequency composition A 100 x 100 block of noise has been added to each image at two locations. Due to contrast masking the noise is less noticeable in the right image.

  9. Error Pooling • Concerns how local distortions can affect overall perception of quality • Can refer to • Spatial Pooling • Colour Pooling • Temporal Pooling (for Video) • Spatial Pooling is affected by eye fixation. • Usually distortions in luminance are more noticeable than colour • Defects in video are less noticeable if they only last for one frame.

  10. Anatomy of a HVS-based Metric • 1. Preprocessing • Registration • The reference and distorted images must be spatially aligned. Small displacements can result in lower objective scores while the subjective score remains unaffected. • Calibration • Calibration of viewing distance and physical resolution (dpi). • Some metrics measure frequency in physical units (i.e. cycles/degree). • Calibration of monitors • The HVS can only see what the screen reproduces. • Some metrics require physical luminance values as input.

  11. 2. Spatial Frequency Sensitivity • Most metrics use a space-frequency decomposition to break the signal into a number of bands (or subbands) according to different partitions of spatial frequency. [Figure: partitions of the horizontal/vertical frequency plane for the DWT, the DCT and the Cortex Transform]

  12. Exploiting Subbands in the DCT • The Q matrix tells us how sensitive the HVS is to each band. Each element Q(i, j) is the relative visibility of coefficient differences for the corresponding subband. • A perceptual error, e(i, j, k), can be defined as e(i, j, k) = (c_T(i, j, k) − c_R(i, j, k)) / Q(i, j), where b_R(k) is the 8×8 block k of the reference image, b_T(k) is the corresponding 8×8 block of the test image, c_R(k) is the DCT of the reference block and c_T(k) is the DCT of the test block. • e(i, j, k) is a measure of the normalised distortion in the 8×8 block of pixels.
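A minimal sketch of this normalised error for one 8×8 block. The JPEG luminance quantisation table is used here as a stand-in for the visibility matrix Q, and the helper names (`perceptual_error`, `ref_block`, `test_block`) are illustrative, not from the lecture:

```python
import numpy as np

def dct_matrix(n=8):
    """Orthonormal DCT-II basis matrix, so D @ block @ D.T is the 2-D DCT."""
    m = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            alpha = np.sqrt(1.0 / n) if i == 0 else np.sqrt(2.0 / n)
            m[i, j] = alpha * np.cos((2 * j + 1) * i * np.pi / (2 * n))
    return m

# JPEG luminance quantisation table, standing in for the visibility matrix Q.
Q = np.array([
    [16, 11, 10, 16, 24, 40, 51, 61],
    [12, 12, 14, 19, 26, 58, 60, 55],
    [14, 13, 16, 24, 40, 57, 69, 56],
    [14, 17, 22, 29, 51, 87, 80, 62],
    [18, 22, 37, 56, 68, 109, 103, 77],
    [24, 35, 55, 64, 81, 104, 113, 92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103, 99]], dtype=float)

def perceptual_error(ref_block, test_block):
    """Per-coefficient DCT error normalised by the visibility thresholds Q."""
    D = dct_matrix()
    c_ref = D @ ref_block @ D.T
    c_test = D @ test_block @ D.T
    return (c_test - c_ref) / Q

rng = np.random.default_rng(0)
ref = rng.uniform(0, 255, (8, 8))
err = perceptual_error(ref, ref)  # identical blocks give zero error
```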

  13. A Simple HVS Metric – DCTune (Watson) • A metric for greyscale images (extensions to colour images are possible). • Includes models for luminance and contrast masking as well as spatial frequency sensitivity. • Notation • i – horizontal frequency index • j – vertical frequency index • k – block index (i.e. indexing the 8×8 pixel blocks) • c_R(i, j, k) – subband i, j of block k of the reference image • c_T(i, j, k) – subband i, j of block k of the test image • t(i, j) – the visibility threshold of band (i, j), taken from the Q matrix • t′(i, j, k) – the tuned visibility threshold of band (i, j) for block k

  14. Luminance Masking (DCTune) • The visibility threshold for each block is scaled according to the average brightness of that block in the reference image: t′(i, j, k) = t(i, j) · x^0.649, where x = c_R(0, 0, k) / c̄(0, 0), c̄(0, 0) is the average DC coefficient over all blocks, and the exponent 0.649 is the value Watson suggests. • For bright blocks (x > 1) • Scale factor > 1. • Threshold increases. • Sensitivity to distortion decreases. • For dark blocks (x < 1) • Scale factor < 1. • Threshold decreases. • Sensitivity to distortion increases.

  15. Contrast Masking (DCTune) • Increases the visibility threshold of a band when the corresponding DCT coefficient of the reference image is large: m(i, j, k) = max( t′(i, j, k), |c_R(i, j, k)|^w · t′(i, j, k)^(1−w) ), with w = 0.7 suggested. • A large |c_R(i, j, k)| implies contrast masking: the threshold increases and sensitivity decreases. • If |c_R(i, j, k)| is small, the threshold is unchanged.

  16. Error Pooling (DCTune) • We can calculate a perceptual error for each DCT coefficient in each block: e(i, j, k) = (c_T(i, j, k) − c_R(i, j, k)) / m(i, j, k). • Many HVS-based metrics use the Minkowski distance to pool errors across the image. • Spatial pooling: d(i, j) = ( Σ_k |e(i, j, k)|^β )^(1/β) • Frequency pooling: d = ( Σ_{i,j} d(i, j)^β )^(1/β), or the maximum over bands (i.e. β → ∞).
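Minkowski pooling is a one-liner; a small sketch showing how the exponent β trades off between average-like and worst-case pooling (the variable names are illustrative):

```python
import numpy as np

def minkowski_pool(errors, beta=4.0):
    """Pool a set of perceptual errors with the Minkowski distance.

    beta = 2 gives an RMS-like (not N-normalised) pooling;
    beta -> infinity approaches the maximum (worst-case) error.
    """
    e = np.abs(np.asarray(errors, dtype=float))
    return (e ** beta).sum() ** (1.0 / beta)

e = np.array([0.1, 0.2, 0.9])
d2 = minkowski_pool(e, beta=2.0)     # dominated by the sum of all errors
dmax = minkowski_pool(e, beta=64.0)  # dominated by the single worst error
```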

  17. Other HVS-Based Metrics • Sarnoff JND • Cortex-like transform based on a Laplacian pyramid (similar to the DWT) • Visual Differences Predictor (VDP) • Cortex Transform • Visual SNR (VSNR) • Does not use a contrast sensitivity function to weight the error • Discrete Wavelet Transform • Teo and Heeger method • Steerable Pyramid • Perceptual Image Coder • GQMF Subband Transform • See “The Essential Guide to Image Processing” for more details. References are at the end of the handout.

  18. Performance of Tests on the LIVE IQA Data • DCTune fares worse than even PSNR! • Results are taken from both “The Essential Guide to Image Processing” and H.R. Sheikh, M.F. Sabir and A.C. Bovik, "A statistical evaluation of recent full reference image quality assessment algorithms", IEEE Transactions on Image Processing, vol. 15, no. 11, pp. 3440-3451, Nov. 2006.

  19. Problems with HVS-based Metrics • Quality Definition Problem • Just because a distortion is visible doesn’t mean it is objectionable (e.g. linear contrast scaling). • Natural Image Complexity Problem • Visibility thresholds are based on the visibility of simple patterns (stimuli) rather than real images, which are much more complex. • Suprathreshold Distortion Problem • Distortions much greater than the limit of visibility. • The assumption is made that visibility thresholds can be used to normalise distortions much greater than those in the near-visible range. • “the HVS is a complex and highly nonlinear system, but most models of early vision are based on linear or quasilinear operators that have been characterized using restricted and simplistic stimuli”

  20. Image Quality Metrics Metrics based on Structural Similarity

  21. [Figure: the original image alongside versions with the mean luminance increased, the contrast increased, and the structure distorted]

  22. Structural Similarity Index Measure (SSIM) • Philosophy • Image quality is associated with structural similarity. • Luminance and contrast distortions are not as important. • HVS-based methods measure perceived differences between images, not perceived changes in structure. • A “Top-Down” Approach • SSIM attempts to capture the overall behaviour of the HVS. • HVS-based approaches, by contrast, model individual aspects of the HVS in the hope that they will replicate its overall behaviour (a “bottom-up” approach).

  23. The SSIM Index • Structural similarity at a pixel location is calculated in a small patch or window centred at that pixel. • A sliding-window approach is used to measure the SSIM index for each pixel in the image. • No SSIM index is calculated for pixels near the boundary whose window falls partially off the edge of the image. • The choice of window size is important: it depends on the resolution and on the scale of the distortion.

  24. Notation • x_i – the intensity of the reference image at position i of the window • y_i – the intensity of the test image at position i of the window • W – the window of N pixels on which the SSIM index is being calculated • μ_x and μ_y – the luminances of the reference and test windows, where μ_x = (1/N) Σ_i x_i • σ_x and σ_y – the contrasts of the reference and test windows, where σ_x = ( (1/(N−1)) Σ_i (x_i − μ_x)² )^(1/2)

  25. The SSIM Index • Similarity Measure Criteria • Symmetry – S(x, y) = S(y, x). • Boundedness – S(x, y) ≤ 1. • Unique Maximum – S(x, y) = 1 iff x = y.

  26. The SSIM Index • We define the index as S(x, y) = f( l(x, y), c(x, y), s(x, y) ), where • l(x, y), c(x, y) and s(x, y) are measures based on luminance, contrast and structure, and • f is a function combining the 3 scores. • Luminance: l(x, y) = (2 μ_x μ_y + C1) / (μ_x² + μ_y² + C1), where C1 is a constant designed to prevent numerical instability.

  27. The SSIM Index • Contrast: c(x, y) = (2 σ_x σ_y + C2) / (σ_x² + σ_y² + C2) • Structure – based on the correlation coefficient between x and y: s(x, y) = (σ_xy + C3) / (σ_x σ_y + C3), where σ_xy = (1/(N−1)) Σ_i (x_i − μ_x)(y_i − μ_y).

  28. The SSIM Index • We choose the following combination to satisfy the 3 criteria: S(x, y) = l(x, y) · c(x, y) · s(x, y) • Notes • Both l(x, y) and c(x, y) are bounded between 0 and 1. • s(x, y) is bounded between -1 and 1. • If s(x, y) < 0 then S(x, y) < 0.

  29. The SSIM Index • Notes • The authors define C1 and C2 as C1 = (K1 L)² and C2 = (K2 L)². • L is the dynamic range of the pixel values (255 for 8-bit images). • K1 and K2 are small constants (suggested values 0.01 and 0.03). • If C3 is chosen as C2 / 2, then S(x, y) simplifies to S(x, y) = ( (2 μ_x μ_y + C1)(2 σ_xy + C2) ) / ( (μ_x² + μ_y² + C1)(σ_x² + σ_y² + C2) ).
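The formulas above can be sketched directly for a single pair of windows. This is a sketch, not the reference implementation; it uses the simplified two-term form with C3 = C2 / 2 and the suggested constants K1 = 0.01, K2 = 0.03 (the function name is illustrative):

```python
import numpy as np

def ssim_window(x, y, L=255.0, K1=0.01, K2=0.03):
    """SSIM index for one pair of windows x (reference) and y (test)."""
    C1, C2 = (K1 * L) ** 2, (K2 * L) ** 2
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov_xy = ((x - mu_x) * (y - mu_y)).mean()
    # l * c * s collapses to this two-term form when C3 = C2 / 2.
    return (((2 * mu_x * mu_y + C1) * (2 * cov_xy + C2)) /
            ((mu_x ** 2 + mu_y ** 2 + C1) * (var_x + var_y + C2)))

rng = np.random.default_rng(1)
x = rng.uniform(0, 255, (11, 11))
s_same = ssim_window(x, x)        # identical windows score 1
s_inv = ssim_window(x, 255.0 - x) # the negative image gives a negative score
```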

  30. The SSIM Index [Figure: reference image, the image with salt-and-pepper noise, and the resulting SSIM index map computed with an 11 × 11 window; in the index map, 1 = white, 0 = black and values < 0 are also shown as black]

  31. The SSIM Index • Weighted windows can be used to avoid blocking artefacts in the SSIM index map. • Define a weight field w_i such that Σ_i w_i = 1. • The paper suggests an 11×11 circularly symmetric Gaussian window with a standard deviation of 1.5. • The means, variances and covariance then become μ_x = Σ_i w_i x_i, σ_x² = Σ_i w_i (x_i − μ_x)², σ_xy = Σ_i w_i (x_i − μ_x)(y_i − μ_y).
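A minimal sketch of the suggested weight field and the weighted statistics it induces (the helper names are illustrative):

```python
import numpy as np

def gaussian_window(size=11, sigma=1.5):
    """11x11 circularly symmetric Gaussian weights, normalised to sum to 1."""
    half = size // 2
    coords = np.arange(-half, half + 1)
    g = np.exp(-(coords[:, None] ** 2 + coords[None, :] ** 2) / (2 * sigma ** 2))
    return g / g.sum()

def weighted_stats(x, w):
    """Weighted mean and variance replacing the plain window statistics."""
    mu = (w * x).sum()
    var = (w * (x - mu) ** 2).sum()
    return mu, var

w = gaussian_window()
mu, var = weighted_stats(np.full((11, 11), 7.0), w)  # a flat patch
```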

  32. The SSIM Index [Figure: the reference and salt-and-pepper noise example again, comparing the SSIM index maps obtained with an 11 × 11 rectangular window and an 11 × 11 Gaussian window (std. dev. 1.5)]

  33. Examples [Figure: further SSIM index maps; 1 = white, 0 = black and values < 0 are also shown as black]

  34. SSIM Index v. Absolute Difference [Figure: the SSIM index map (1 = white, 0 = black, values < 0 black) compared with the absolute-difference image (large absolute difference = black, small absolute difference = white)]

  35. Using SSIM to Measure Quality • Spatial pooling using the mean of the SSIM index map: MSSIM(X, Y) = (1/M) Σ_j S(x_j, y_j), where x_j and y_j are the windows in the reference and distorted images and M is the number of windows. • Notes • The MSSIM can range between -1 and 1, although it is usually > 0. • 1 indicates that the reference and test images are the same. • 0 indicates very bad quality. • Negative values occur due to negative values of the structure correlation (e.g. comparing an image with its negative).
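The mean pooling above can be sketched end-to-end. For brevity this sketch uses a plain rectangular window rather than the paper's Gaussian weighting, slides it over every position whose window lies fully inside the image, and averages the map; the names are illustrative:

```python
import numpy as np

def ssim_window(x, y, L=255.0, K1=0.01, K2=0.03):
    """SSIM index for one pair of windows, two-term form with C3 = C2 / 2."""
    C1, C2 = (K1 * L) ** 2, (K2 * L) ** 2
    mu_x, mu_y = x.mean(), y.mean()
    cov = ((x - mu_x) * (y - mu_y)).mean()
    return (((2 * mu_x * mu_y + C1) * (2 * cov + C2)) /
            ((mu_x ** 2 + mu_y ** 2 + C1) * (x.var() + y.var() + C2)))

def mssim(ref, test, win=11):
    """Mean SSIM over all fully interior window positions."""
    h, w = ref.shape
    scores = [ssim_window(ref[r:r + win, c:c + win], test[r:r + win, c:c + win])
              for r in range(h - win + 1) for c in range(w - win + 1)]
    return float(np.mean(scores))

rng = np.random.default_rng(2)
img = rng.uniform(0, 255, (16, 16))
noisy = np.clip(img + rng.normal(0, 25, img.shape), 0, 255)
score_same = mssim(img, img)    # identical images score 1
score_noisy = mssim(img, noisy)
```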

  36. Benchmarking MSSIM [Figure: the reference image (MSSIM = 1) and five distorted versions; each distorted image has roughly the same MSE when compared with the reference, yet the MSSIM values differ widely: 0.988, 0.913, 0.840, 0.694 and 0.662]

  37. Benchmarking MSSIM • Taken from Wang ‘03 paper on SSIM • Comparison on JPEG and JPEG2000 compressed images

  38. Benchmarking MSSIM • Taken from Wang ‘03 paper on SSIM • Comparison on JPEG and JPEG2000 compressed images

  39. Limitations of SSIM • Dependence on Scale • The quality value is measured at a fixed scale, which is a function of the image resolution (in pixels) and the window size. • The physical scale is a function of the viewing distance, display size and resolution.

  40. Limitations of SSIM • Sensitivity to Displacements of the Image • Small offsets in the image dramatically affect MSSIM values. [Figure: reference image, a small clockwise rotation (SSIM = 0.551) and a small translation to the right (SSIM = 0.404)]

  41. Extensions of SSIM • Multiscale SSIM • Designed to calculate SSIM over multiple resolutions and thus avoid scale issues • Z. Wang, E. P. Simoncelli and A. C. Bovik, “Multi-scale structural similarity for image quality assessment,” Invited Paper, IEEE Asilomar Conference on Signals, Systems and Computers, Nov. 2003. • Complex Wavelet SSIM • Calculates SSIM value based on complex wavelet coefficients rather than intensities. • Z. Wang and E. P. Simoncelli, “Translation insensitive image similarity in complex wavelet domain,” IEEE International Conference on Acoustics, Speech and Signal Processing, vol. II, pp. 573-576, Philadelphia, PA, Mar. 2005.

  42. Assessment of IQMs on the LIVE Database

  43. Assessment of IQMs on the LIVE Database • Statistical Significance Testing • 0 – the IQM for the row is worse than the IQM for the column • 1 – the IQM for the row is better than the IQM for the column • x – no statistically significant difference
