Perceptual Video Distortion Metrics and Coding

Dr H. R. Wu, Associate Professor, Audiovisual Information Processing and Digital Communications (AVIPAC), Monash University, VIC 3800, Australia. TEL: +61 3 9905 3255 or 9905 3414, FAX: +61 3 9905 5146, EMAIL: hrw,mdt@csse.monash.edu.au, http://www.csse.monash.edu.au/~hrw H.R. Wu@ISEC.Stanford2K1

Opening Remark “Cast a brick into the ring to attract jade.” or “Offer a few commonplace remarks by way of introduction so that others may come up with more valuable opinions and contributions.” -Chinese proverb.

OUTLINE 1. Introduction 1.1. Half a century’s endeavour 1.2. Fundamental issues 1.3. In pursuit of ultimate goals in digital video compression and visual communications 1.4. A personal perspective 2. Perceptual Video Quality/Impairment Metrics 2.1. HVS based quality/impairment metrics and measurements 2.2. A vision model based quality metric (VQR) 2.3. A vision model based perceptual blocking impairment metric (PBIM) 3. Perceptual image/video coding 4. Concluding Remarks

1. Introduction

1.1. Half A Century’s Endeavour • The beginning of digital video coding research is commonly acknowledged to be the 1950s [R.J. Clarke] • C.C. Cutler’s U.S. patent on DPCM, 1952. • C.W. Harrison’s experiments with linear prediction in television, 1952. • D.A. Huffman’s paper on a method for the construction of minimum redundancy codes, 1952. • Earlier pioneering work • C.E. Shannon’s monumental work on the mathematical theory of communication, 1948. • D. Gabor’s paper on theory of communication, 1946. • R.D. Kell’s British patent on the principle of frame difference signal transmission, 1920 [A. Seyler and Z. Budrikis].

1.2. Fundamental Issues 1.2.1. Rate-distortion optimization • Given a bit-rate budget or bandwidth (constant bit-rate), • minimize spatio-temporal (and cross scales) statistical redundancy and • psychovisual redundancy (or irrelevancy) • to achieve the best possible perceptual picture quality. • Given a desired picture quality (constant quality), • to obtain the lowest possible bit-rate or amount of data. 1.2.2. Theoretical and practical lower bounds • Theoretical lower bound for lossless image/video coding, Shannon’s entropy. • Theoretical lower bound for lossy image data coding, quantitative definition of “psychovisual redundancy”?
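The two formulations in 1.2.1 and the lossless bound in 1.2.2 can be written compactly. The following is only a sketch in common textbook notation, none of which appears on the slide: R is the bit-rate, D a distortion measure (ideally a perceptual one), R_b the bit-rate budget, D_t the target distortion, \lambda a Lagrange multiplier, and p_i the symbol probabilities of a source assumed to be memoryless.

```latex
\begin{align}
\text{Constant bit-rate:}\quad & \min D \quad \text{subject to} \quad R \le R_b \\
\text{Constant quality:}\quad  & \min R \quad \text{subject to} \quad D \le D_t \\
\text{Lagrangian form:}\quad   & \min J = D + \lambda R, \qquad \lambda \ge 0 \\
\text{Lossless bound:}\quad    & H = -\sum_i p_i \log_2 p_i \quad \text{bits per symbol}
\end{align}
```

The open question on the slide is what plays the role of H in the lossy case, i.e. how to choose D so that the bound reflects "psychovisual redundancy" rather than MSE.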

1.3.1. En route to “superman” in image/video coding: in pursuit of ultimate goals • Model-based coding • Shape, context, region/object based coding techniques • Other coding methods: matching pursuit, fractal coding, POCS • coding of image data or transform coefficients, versus • coding of transforms or projections. • Picture restoration as an integral part of compression strategy • deblocking • deringing • deblurring • reduction of temporal granular noise, mosquito effects • HVS factors and perceptual coding • The works • Better transforms, in terms of • decorrelation and energy packing efficiencies, minimum MSE using truncated number of transform coefficients for reconstruction • spatial, spatial-frequency, spatio-temporal-frequency localisation • Better prediction models, balancing • coding the model, and • coding of the prediction error or residuals. • Smarter/adaptive quantisation algorithms • Better motion prediction techniques • Fast algorithms and implementations • Rate-distortion optimisation • Entropy/variable length coding

1.3.2. Achievements in image/video coding • Statistical redundancy and theoretical lower bound (Shannon’s entropy) for lossless image data coding; • Entropy/practical variable length coding (Huffman code and arithmetic coding); • Modelling of natural images (first-order Markov process); • Optimal or sub-optimal transform coders • Karhunen-Loeve transform • Cosine transform • Vector quantisation • Subband and wavelet transform coding • Motion prediction and practical motion compensation algorithms; • Rate-distortion optimisation with MSE; • Standards: JPEG, ITU-T H.261, H.263, MPEG-1 & -2 & -4, JPEG2000.

1.3.3. Frustrations in image/video coding • Traditional methods • constant quantisation step-size does not lead to constant perceptual picture quality, nor does constant PSNR/MSE • “Whose coder provides better visual performance?” • more elegant HVS based adaptive quantisation/rate control algorithms (avoiding bit stuffing or forced coarse quantisation) • spatial & temporal masking • different rate-distortion slopes for different coefficients • reduction of certain coding artifacts and manifesting other types • New methods • Model-, object-, region- or segmentation-based coding • effective and efficient segmentation algorithms • balance of bit allocations between model description and coding of residual image • Recursive or projection theory based coding (matching pursuit, fractal transforms and projection on to convex sets) • inferior rate-distortion performance with practical systems/applications • Perceptual image/video coders, if only we knew how • Constant quality coding, and above all...

1.3.3. Frustrations in image/video coding • Quality/impairment assessment • subjective assessment (ITU-R BT.500-9) • subjectivity: “What do you regard as slightly annoying?” • large deviation, recency and contextual effects • time consuming and expensive • lack of constructive information for coder design • quantitative quality/impairment assessment • “What’s wrong with mean-squared error?” [B. Girod] • something better than PSNR or MSE • various coding artifacts • different coding artifacts dominate at different coding rates and resolutions • “VQEG movement” • Video Quality Experts Group with delegations from ITU-T Study Groups 9 and 12 and ITU-R Study Group 11; • Validating objective measures of video quality; • Leading to one or more ITU Recommendations; • Results reported in a final report, Mar 2000…
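To make the "something better than PSNR or MSE" point concrete, here is a minimal sketch of both measures, assuming 8-bit pictures (peak value 255) held as NumPy arrays. The function names and the toy pictures are illustrative only and do not come from the talk. Two very different error patterns, a uniform offset and an error confined to one corner, receive exactly the same score.

```python
import numpy as np

def mse(reference, processed):
    """Mean-squared error between two pictures of identical shape."""
    ref = np.asarray(reference, dtype=np.float64)
    proc = np.asarray(processed, dtype=np.float64)
    if ref.shape != proc.shape:
        raise ValueError("pictures must have the same shape")
    return float(np.mean((ref - proc) ** 2))

def psnr(reference, processed, peak=255.0):
    """Peak signal-to-noise ratio in dB; infinite for identical pictures."""
    err = mse(reference, processed)
    if err == 0.0:
        return float("inf")
    return float(10.0 * np.log10(peak ** 2 / err))

# Toy 16x16 mid-grey reference (hypothetical data).
ref = np.full((16, 16), 128.0)

# Distortion A: every pixel is 2 levels too bright.
offset = ref + 2.0

# Distortion B: one 8x8 corner is 4 levels too bright, the rest is untouched.
patch = ref.copy()
patch[:8, :8] += 4.0

# Both distortions have MSE = 4, hence identical PSNR, although they would
# not necessarily look equally annoying to a viewer.
print(mse(ref, offset), mse(ref, patch))
print(psnr(ref, offset), psnr(ref, patch))
```

The measure is blind to where the error sits and how it is structured, which is the gap that HVS based metrics try to close.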

Problems of existing subjective test methods • The task: • categorical response, and • absolute judgement; • Human subjects are not accustomed to this task in their daily life; • The subjective data is noisy and vulnerable to biases; • People are good at comparative judgements; • For example,...

Is this slightly annoying? Bit rates: 1.2, 1.4, 1.6, 1.8, 2.5 Mbps

Contextual effects in the DSIS II [1] • Large contextual effects in the DSIS II. [1] P. Corriveau et al., “All subjective scales are not created equal: The effects of context on different scales”, Signal Processing, Vol. 77 (1999) 1-9.

1.3.3. Frustrations in image/video coding • Various coding artifacts: Blocking (figure)

1.3.3. Frustrations in image/video coding • Various coding artifacts: DCT basis image & Mosaic (figure)

1.3.3. Frustrations in image/video coding • Various coding artifacts: Ringing (figure)

1.3.3. Frustrations in image/video coding • Various coding artifacts: Stationary area temporal fluctuations (figure: Sons&Daughter frames 40 and 41, MPEG-1 coded)

1.3.3. Frustrations in image/video coding • Various coding artifacts: Stationary area temporal fluctuations (figure: difference between Sons&Daughter frames 40 and 41, MPEG-1 coded)

1.3.3. Frustrations in image/video coding VQEG work phase 1 • 1997 - 1999 • Subjective test • 8 independent labs; • 20 test sequences (50 Hz and 60 Hz) and 16 Hypothetical Reference Circuits (HRCs); • Method: ITU-R BT.500 DSCQS. • Objective test • 10 proponents.

1.3.3. Frustrations in image/video coding VQEG proponents • [P0] Peak Signal to Noise Ratio (PSNR) • [P1] CPqD (Brazil) • [P2] Tektronix / Sarnoff (USA) • [P3] NHK (Japan) / Mitsubishi Electric Corp. (Japan) • [P4] KDD (Japan) • [P5] Swiss Federal Institute of Technology (EPFL) (Switzerland) • [P6] TAPESTRIES (European Union) • [P7] NASA (USA) • [P8] KPN Research (The Netherlands) / Swisscom CIT (Switzerland) • [P9] NTIA/ITS (USA)

1.3.3. Frustrations in image/video coding VQEG result (both 50 & 60 Hz) • Pearson correlations • P0: PSNR • P2: Sarnoff • P5: EPFL • P7: Watson • P8: KPN • VQEG Statement: 8 or 9 statistically indistinguishable models

1.3.3. Frustrations in image/video coding VQEG result (60 Hz) • Pearson correlations • P0: PSNR • P2: Sarnoff • P5: EPFL • P7: Watson • P8: KPN • P5 (EPFL) had the highest correlation in the 60 Hz test
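The Pearson correlation used to compare the proponents measures how linearly a model's objective scores track the subjective scores. A minimal sketch follows. The numbers are invented toy values for illustration only and are not VQEG data; the variable names are likewise hypothetical.

```python
import numpy as np

def pearson(x, y):
    """Pearson linear correlation coefficient between two score vectors."""
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    if x.shape != y.shape:
        raise ValueError("score vectors must have the same length")
    xc = x - x.mean()
    yc = y - y.mean()
    return float(np.sum(xc * yc) / np.sqrt(np.sum(xc ** 2) * np.sum(yc ** 2)))

# Hypothetical subjective impairment scores for five processed sequences
# (larger = more impaired). NOT real VQEG data.
subjective = [10.0, 20.0, 30.0, 40.0, 50.0]

# A hypothetical model whose output is a perfect linear function of the above.
linear_model = [1.0, 2.0, 3.0, 4.0, 5.0]

# Hypothetical PSNR values: they fall as impairment rises, so the correlation
# is negative; its magnitude is what indicates agreement.
psnr_scores = [40.0, 36.0, 33.0, 31.0, 28.0]

print(pearson(subjective, linear_model))
print(pearson(subjective, psnr_scores))
```

A high correlation magnitude shows only that a model ranks and spaces the sequences like the viewers did on this particular data set, which is one reason several quite different models can end up statistically indistinguishable.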

1.3.4. “Is digital video compression dead?” • Who asked the question? Prof Edward J. Delp of Purdue University, in a keynote at VCIP 2000, Perth, Australia. • The answer? No, of course. • A historical lesson in speech and audio coding • prior to 1988, focusing on low bit rate and MSE/SNR • source model and MSE-based coding; • weighted MSE. • PAC (perceptual audio coder), Bell Labs [1] • research in the area is unabated. • Not all methods extend well from 1-D to 2-D • weighted MSE has not worked as well as we would hope. [1] J.D. Johnston, “Transform coding of audio signals using perceptual noise criteria”, IEEE Journal on Selected Areas in Communications, 6(2), pp. 314-323, 1988.

1.4. A personal perspective • 1994 discussion with Prof Martin Vetterli • What’s the next move, VQ, Wavelet Transform or…? • Mixed channel and source coding; • HVS based quantitative quality metrics. • 1996 visit to Prof Tsuhan Chen at AT&T Research • talking about BIM; • question raised if BIM would work on video; • blockiness under cell-loss conditions? • Dr Christian van den Branden Lambrecht’s (EPFL, HP Labs, EMC) workshop on his MPQM and NVFM at Monash University in 1996 • multichannel vision models! • software implementations; • threshold vision models for supra-threshold vision applications.

1.4. A personal perspective • 1997 discussion with Prof Bernd Girod on GBIM and evaluation of s-hat, MPQM and NVFM in Erlangen and PCS’97 in Berlin • inability to reliably assess digital video quality or degree of impairment; • inability to quantitatively define “psychovisual redundancy”. • 1999 discussion with Prof Bernd Girod at Monash • vision model plus • parameterization and maybe more… • 1999 discussions with Profs Brian Wandell and David Heeger • vision models so far based on very “sparse data”(!!!) • Discussions with Dr Jeff Lubin and Albert Pica at Sarnoff • various models used in quality metric design.

1.4. A personal perspective • (1999) Generous support from Stephen Wolf and his colleagues at NTIA/ITS and VQEG co-chairs Philip Corriveau and Arthur Webster • VQEG subjective test data; • VQEG test sequences; • Performance criteria. • 2000 VQEG final report • “8 or 9 statistically indistinguishable models” • including PSNR! • Forced to modify our approach • multichannel vision model; • parameterization using application specific data, i.e., VQEG subjective test data. • Bernd was right in 1999 after all.

2. Perceptual Video Quality/Impairment Metrics “When you can measure what you are speaking about and express it in numbers, you know something about it.” -Lord William Thomson Kelvin, (1824-1907) Physicist 2.1. HVS based quality/impairment metrics and measurements 2.2. A vision model based quality metric (VQR) 2.3. A vision model based perceptual blocking impairment metric (PBIM)

2.1. HVS based quality/impairment metrics and measurements for digital video 2.1.1. Introduction to HVS based metrics • Why are they required? • What are they? Quality metrics vs. impairment metrics • HVS modelling. • How to measure the metrics’ performance? 2.1.2. Previous work in the area 2.1.3. Latest development

2.1.1. Introduction to HVS based metrics Applications and standards • Digital video compression techniques are widely used in • Digital TV; • Video conferencing; • Video phone; • Internet video; • VCD, DVD, etc. • International video coding standards • H.261/H.263/MPEG-1/2/4.

2.1.1. Introduction to HVS based metrics • Distortions introduced by digital video coding algorithms • fundamentally differ from analog video distortions • Structured distortions. • Various types of distortions • Blocking; • Blurring; • Ringing; • Mosquito; • Jerkiness; • etc.

2.1.1. Introduction to HVS based metrics Quantitative quality/impairment assessment and metrics are required to • measure/monitor video coding/transmission system performance; • provide a better understanding of the distortions introduced by the video coding system and to improve coding algorithms, such as • reliable HVS-based adaptive quantisation, and • bit rate control algorithms; • design perceptual digital video codecs providing constant video quality for visual communication services; and • quantitatively define “psychovisual redundancy” and the corresponding lower bound, if possible.

2.1.1. Introduction to HVS based metrics Assessment methods • Subjective • The quality is evaluated subjectively by a group of assessors; • Very expensive and time-consuming; • Defined in ITU-R BT.500. • Objective • Given a processed video sequence, with or without a reference sequence, a computer program or system evaluates the quality or the impairment of the processed sequence and returns an objective score.

2.1.1. Introduction to HVS based metrics Vision modelling • Vision research is an experimental science. • Relies on experiments to reveal mechanisms. • Categorised by test methods: • Detection vs discrimination; • Threshold vs suprathreshold.

2.1.1. Introduction to HVS based metrics Mechanisms of vision • Colour encoding; • Pattern sensitivity • Spatial contrast sensitivity; • Temporal sensitivity. • Multiresolution image representations; • Masking • Spatial • intra-band; • inter-band; • inter-orientation; • temporal; and • colour masking.

2.1.1. Introduction to HVS based metrics Colour encoding • Opponent-colours: B-W, R-G and B-Y • Other colour spaces: YCbCr, CIE L*u*v*, CIE L*a*b*, and CIE XYZ
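Of the colour spaces listed, YCbCr is the one used directly by the coding standards discussed in this talk: one luma signal plus two colour-difference signals. The sketch below assumes the ITU-R BT.601 luma weights and components normalised to [0, 1] (Cb and Cr in [-0.5, 0.5]). The slide does not say which YCbCr variant is meant, so the coefficients and the function name are assumptions.

```python
def rgb_to_ycbcr(r, g, b):
    """Convert normalised R'G'B' in [0, 1] to (Y, Cb, Cr).

    Assumes BT.601 luma weights; Cb and Cr are scaled to [-0.5, 0.5].
    """
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = (b - y) / 1.772  # blue-difference signal
    cr = (r - y) / 1.402  # red-difference signal
    return y, cb, cr

# A neutral grey carries no colour-difference energy.
print(rgb_to_ycbcr(0.5, 0.5, 0.5))

# Saturated red drives Cr to its maximum of 0.5.
print(rgb_to_ycbcr(1.0, 0.0, 0.0))
```

Separating luma from the colour-difference signals is what allows a metric or a coder to treat achromatic and chromatic errors with different sensitivities.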

2.1.1. Introduction to HVS based metrics Contrast sensitivity - definition • Contrast threshold is the contrast necessary to elicit a response; • Contrast sensitivity is defined as the inverse of contrast threshold.
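In symbols, the definition reads as follows. The slide does not specify how contrast itself is measured; the Michelson contrast of a grating with maximum and minimum luminance L_max and L_min is assumed here as one common choice, and the symbols C, C_T and S are introduced only for this sketch.

```latex
\begin{align}
C &= \frac{L_{\max} - L_{\min}}{L_{\max} + L_{\min}}
  && \text{(contrast of the stimulus, Michelson form assumed)} \\
S &= \frac{1}{C_T}
  && \text{(contrast sensitivity, with } C_T \text{ the contrast threshold)}
\end{align}
```

For example, a pattern that only becomes visible at a contrast of C_T = 0.01 corresponds to a sensitivity of S = 100.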

2.1.1. Introduction to HVS based metrics Spatio-temporal contrast sensitivity function

2.1.1. Introduction to HVS based metrics Space-time separability • The sensitivity scaling hypothesis

2.1.1. Introduction to HVS based metrics Temporal channels • One sustained (low-pass) and one transient (band-pass) temporal channel

2.1.1. Introduction to HVS based metrics Spatial contrast sensitivity

2.1.1. Introduction to HVS based metrics Multiresolution image representation • 5 frequency levels and 4 orientations in this example
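The idea of splitting a picture into frequency levels can be illustrated with a much simpler structure than the oriented decomposition shown on the slide. The sketch below builds a Laplacian-style pyramid with 2x2 block averaging and pixel replication. It has no orientation selectivity, so it is not the steerable pyramid and not the decomposition used in the talk; all function names are hypothetical.

```python
import numpy as np

def downsample(img):
    """Halve each dimension by averaging non-overlapping 2x2 blocks."""
    h, w = img.shape
    return img.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def upsample(img):
    """Double each dimension by pixel replication."""
    return np.repeat(np.repeat(img, 2, axis=0), 2, axis=1)

def build_pyramid(img, levels):
    """Return levels-1 band-pass images followed by one low-pass residual.

    Both picture dimensions must be divisible by 2**(levels - 1).
    """
    img = np.asarray(img, dtype=np.float64)
    bands = []
    for _ in range(levels - 1):
        low = downsample(img)
        bands.append(img - upsample(low))  # detail lost by going one level down
        img = low
    bands.append(img)
    return bands

def reconstruct(bands):
    """Invert build_pyramid by adding the detail back, coarsest level first."""
    img = bands[-1]
    for band in reversed(bands[:-1]):
        img = upsample(img) + band
    return img

# Five frequency levels, matching the number quoted on the slide.
picture = np.random.default_rng(0).random((32, 32))
pyramid = build_pyramid(picture, levels=5)
print([band.shape for band in pyramid])
```

Each level isolates detail at one scale, so a vision model can apply a different sensitivity or masking rule per level; an oriented decomposition additionally splits every level by direction.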

2.1.1. Introduction to HVS based metrics Implementation by the steerable pyramid • 6 orientations in this example

2.1.1. Introduction to HVS based metrics Steerable pyramid decomposition • 4 orientations and 5/6 spatial frequency levels
