## RATE DISTORTION OPTIMIZATION USING SSIM IN H.264

**RATE DISTORTION OPTIMIZATION USING SSIM IN H.264**
MULTIMEDIA PROCESSING
Babu Hemanth Kumar Aswathappa
babuhemanthkumar.aswathappa@mavs.uta.edu
Guidance: Dr. K. R. Rao

**Introduction**
• In the rate-distortion optimization of the H.264 I-frame encoder, the distortion (D) is measured as the sum of squared differences (SSD) between the reconstructed and the original blocks, which is the MSE up to a constant factor (the number of pixels in the block).
• Although PSNR and MSE are currently the most widely used objective metrics because of their low complexity and clear physical meaning, they have long been criticized for correlating poorly with the Human Visual System (HVS) [2].
• Previous studies show that the structural similarity metric assesses image quality better than pixel-error-based metrics (mean squared error and peak signal-to-noise ratio).

**Mean Squared Error: Love It or Leave It?**
• So what is the secret of the MSE: why is it still so popular?
• What is wrong with the MSE when it does not work well?
• Just how wrong is the MSE in these cases?
• If not the MSE, what else can be used?

**What is MSE?**
• MSE is a signal fidelity measure.
• The goal of a signal fidelity measure is to compare two signals by providing a quantitative score that describes the degree of similarity/fidelity or, conversely, the level of error/distortion between them.
• Suppose that $x = \{x_i \mid i = 1, 2, \ldots, N\}$ and $y = \{y_i \mid i = 1, 2, \ldots, N\}$ are two finite-length, discrete signals, where $N$ is the number of signal samples and $x_i$ and $y_i$ are the values of the $i$-th samples in $x$ and $y$, respectively. The MSE between the signals is

$$\mathrm{MSE}(x, y) = \frac{1}{N} \sum_{i=1}^{N} (x_i - y_i)^2$$

**Why do we love MSE?**
The MSE has many attractive features:
• It is simple. It is parameter free and inexpensive to compute, with a complexity of only one multiply and two additions per sample. It is also memoryless: the squared error can be evaluated at each sample, independent of other samples.
• It has a clear physical meaning: it is the natural way to define the energy of the error signal.
• The MSE is an excellent metric in the context of optimization.
• MSE is widely used simply because it is a convention.

**What is wrong with MSE?**
[FIG1] Comparison of image fidelity measures for the “Einstein” image altered with different types of distortions. (a) Reference image. (b) Mean contrast stretch. (c) Luminance shift. (d) Gaussian noise contamination. (e) Impulsive noise contamination. (f) JPEG compression. (g) Blurring. (h) Spatial scaling (zooming out). (i) Spatial shift (to the right). (j) Spatial shift (to the left). (k) Rotation (counter-clockwise). (l) Rotation (clockwise). [2]

**Implicit Assumptions when using MSE**
• Signal fidelity is independent of temporal or spatial relationships between the samples of the original signal. If the original and distorted signals are randomly re-ordered in the same way, then the MSE between them will be unchanged.
• Signal fidelity is independent of any relationship between the original signal and the error signal. For a given error signal, the MSE remains unchanged, regardless of which original signal it is added to.
• Signal fidelity is independent of the signs of the error signal samples.
• All signal samples are equally important to signal fidelity.

**Failures of MSE Metric**
[FIG2] Failures of MSE Metric [2]

**Alternative Approach**
[FIG3] Examples of structural versus nonstructural distortions. [2]
If we view the HVS as an ideal information extractor that seeks to identify and recognize objects in the visual scene, then it must be highly sensitive to structural distortions and automatically compensate for nonstructural distortions. Consequently, an effective objective signal fidelity measure should simulate this functionality.

**SSIM**
• Recently proposed approach for image quality assessment.
• Method for measuring the similarity between two images.
• Full-reference metric.
• The SSIM is designed to improve on traditional metrics like PSNR and MSE, which have proved to be inconsistent with human visual perception.

**Properties of SSIM**
• Value typically lies in [0, 1] for natural images; in general the index can become negative when the two signals are negatively correlated.
• Symmetry: S(x, y) = S(y, x)
• Boundedness: S(x, y) <= 1
• Unique maximum: S(x, y) = 1 if and only if x = y (in discrete representations, $x_i = y_i$ for all $i = 1, 2, \ldots, N$).

**SSIM Measurement System**
[FIG4] Block diagram of the structural similarity measurement system [4]

**H.264**
[FIG 5] Block diagram of the H.264 encoder

**Intra-prediction**
[FIG 6] Intra 4 x 4 prediction mode directions (vertical: 0, horizontal: 1, DC: 2, diagonal down left: 3, diagonal down right: 4, vertical right: 5, horizontal down: 6, vertical left: 7, horizontal up: 8) [5]
• H.264 gains much of its efficiency by reducing redundant data not only across a series of frames but also within a single frame, a technique called intraframe prediction [FIG 6].
• The H.264 encoder uses intraframe prediction with more ways to reference neighboring pixels, so it compresses details and gradients better than previous codecs.

**H.264 I-Frame Encoder**
• The best prediction mode(s) are chosen using R-D optimization, which is described as:

$$J(s, c, \mathrm{MODE} \mid QP) = D(s, c, \mathrm{MODE} \mid QP) + \lambda_{\mathrm{MODE}} \, R(s, c, \mathrm{MODE} \mid QP)$$

• The distortion $D(s, c, \mathrm{MODE} \mid QP)$ is measured as the SSD between the original block $s$ and the reconstructed block $c$, $QP$ is the quantization parameter, $\mathrm{MODE}$ is the prediction mode, and $\lambda_{\mathrm{MODE}}$ is the Lagrange multiplier [6]. $R(s, c, \mathrm{MODE} \mid QP)$ is the number of bits needed to code the block.
• The mode(s) with the minimum $J(s, c, \mathrm{MODE} \mid QP)$ are chosen as the prediction mode(s) of the macroblock.

**Proposal**
• The main idea of this project is to employ SSIM in the rate-distortion optimization of the H.264 I-frame encoder to choose the best prediction mode(s).
• The required modifications will be made to the JVT reference software JM92.
• Results, in terms of the total number of bits of the compressed image and the SSIM of the whole reconstructed image, will be compared between the H.264-JM92 software and the new method.

**Proposed Method**
• The quality of the reconstructed picture is higher when its SSIM index is greater, whereas the SSD behaves the opposite way (a smaller SSD means higher quality). Therefore the distortion in this method is measured as:

$$D(s, c, \mathrm{MODE} \mid QP) = 1 - \mathrm{SSIM}(s, c)$$

where $s$ and $c$ are the original and reconstructed image blocks, respectively. With the same Lagrange multiplier $\lambda_{\mathrm{MODE}}$ as in the SSD-based cost, the new rate-distortion cost can now be written as:

$$J(s, c, \mathrm{MODE} \mid QP) = 1 - \mathrm{SSIM}(s, c) + \lambda_{\mathrm{MODE}} \, R(s, c, \mathrm{MODE} \mid QP)$$

• The algorithm uses the SSIM index instead of SSD as the distortion measure in RDCost_for_4x4IntraBlock, RDCost_for_8x8IntraBlock and RDCost_for_macroblocks of the H.264-JM92 software.

**Test Sequences**
• Coastguard
• Akiyo
• Bridge-close
• Car phone
• Claire
• Container
• Grandma
• Miss-America

**Simulation Results**
[TABLE 1] Results of comparison between H.264 JM92 and the H.264 JM92-SSIM method for QP=30

**Simulation Results**
[TABLE 2] Results of comparison between H.264 JM92 and the H.264 JM92-SSIM method for QP=20

**Simulation Results**
[TABLE 3] Results of comparison between H.264 JM92 and the H.264 JM92-SSIM method for QP=10

**Simulation Results**
Panels: Coastguard (original); encoded by H.264 encoder with QP=30; encoded by H.264 SSIM encoder with QP=30.
MSSIM = 0.90197, MSSIM = 0.89390
Fig 7. The reconstructed images produced by the two methods for Coastguard.

**Simulation Results**
Panels: Akiyo (original); encoded by H.264 encoder with QP=30; encoded by H.264 SSIM encoder with QP=30.
MSSIM = 0.96416, MSSIM = 0.96067
Fig 8. The reconstructed images produced by the two methods for Akiyo.

**Simulation Results**
Panels: Suzie (original); encoded by H.264 encoder with QP=30; encoded by H.264 SSIM encoder with QP=30.
MSSIM = 0.93649, MSSIM = 0.93370
Fig 9.
The reconstructed images produced by the two methods for Suzie.

**Conclusions**
Simulations show that, for QP=30, the proposed method reduces the bit rate by approximately 2 to 9% while maintaining almost the same perceptual quality and almost the same encoding time. The bit rate reduction is approximately 4 to 20% for QP=20 and 18 to 35% for QP=10.

**References**
• [1] Z.-Y. Mai, C.-L. Yang, L.-M. Po, and S.-L. Xie, “A New Rate-Distortion Optimization using Structural Information in H.264 I-Frame Encoder,” ACIVS 2005, LNCS 3708, pp. 435–441, 2005.
• [2] Z. Wang and A. C. Bovik, “Mean squared error: love it or leave it? - A new look at signal fidelity measures,” IEEE Signal Processing Magazine, vol. 26, no. 1, pp. 98–117, Jan. 2009.
• [3] JM Software website: http://iphome.hhi.de/suehring/tml/
• [4] Z. Wang, et al., “Image quality assessment: From error visibility to structural similarity,” IEEE Trans. Image Processing, vol. 13, no. 4, pp. 600–612, Apr. 2004. [Online] Available: www.cns.nyu.edu/~lcv/ssim/
• [5] S. K. Kwon, A. Tamhankar, and K. R. Rao, “Overview of H.264 / MPEG-4 Part 10,” J. VCIR, vol. 17, pp. 186–216, April 2006, Special Issue on “Emerging H.264/AVC Video Coding Standard.”
• [6] T. Wiegand and B. Girod, “Lagrange multiplier selection in hybrid video coder control,” in IEEE Int. Conf. on Image Processing, vol. 3, pp. 542–545, 2001.
• [7] T. Wiegand, G. J. Sullivan, G. Bjontegaard, and A. Luthra, “Overview of the H.264/AVC video coding standard,” IEEE Trans. on CAS for Video Technology, vol. 13, no. 7, pp. 560–576, July 2003.
• [8] Z. Wang, A. C. Bovik, and L. Lu, “Why is image quality assessment so difficult?” IEEE International Conference on Acoustics, Speech, & Signal Processing, May 2002.
• [9] Z. Wang, L. Lu, and A. C. Bovik, “Video quality assessment using structural distortion measurement,” IEEE Transactions on Image Processing, vol. 13, no. 4, April 2004.
• [10] G. J. Sullivan and T. Wiegand, “Rate-distortion optimization for video compression,” IEEE Signal Processing Magazine, vol. 15, no. 6, pp. 74–90, Nov. 1998.
• [11] The SSIM Index for Image Quality Assessment: http://www.ece.uwaterloo.ca/~z70wang/research/ssim/
• [12] Z. Wang and A. C. Bovik, “A universal image quality index,” IEEE Signal Processing Letters, vol. 9, no. 3, pp. 81–84, March 2002.
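
As a supplement to the MSE definition and the SSIM-based rate-distortion cost on the earlier slides, the following is a minimal Python sketch, not the JM92 implementation. The function names `mse`, `ssim`, and `select_mode`, the sample 4x4 block, and the bit counts are all hypothetical. The SSIM here is computed over a single unweighted window using the default constants K1 = 0.01, K2 = 0.03 and L = 255 from [4]; the reference implementation in [4] uses a sliding Gaussian window instead.

```python
def mse(x, y):
    """Mean squared error between two equal-length sample sequences."""
    if len(x) != len(y) or not x:
        raise ValueError("x and y must be non-empty and of equal length")
    return sum((a - b) ** 2 for a, b in zip(x, y)) / len(x)


def ssim(x, y, dynamic_range=255.0, k1=0.01, k2=0.03):
    """Single-window SSIM index of two flattened pixel blocks.

    Assumption: unweighted statistics over the whole block, with the
    default stabilizing constants from [4].
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("blocks must have equal length of at least 2")
    c1 = (k1 * dynamic_range) ** 2
    c2 = (k2 * dynamic_range) ** 2
    mu_x = sum(x) / n
    mu_y = sum(y) / n
    var_x = sum((a - mu_x) * (a - mu_x) for a in x) / (n - 1)
    var_y = sum((b - mu_y) * (b - mu_y) for b in y) / (n - 1)
    cov_xy = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / (n - 1)
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    denominator = (mu_x * mu_x + mu_y * mu_y + c1) * (var_x + var_y + c2)
    return numerator / denominator


def select_mode(original, candidates, lam):
    """Return (best_mode, costs) minimizing J = (1 - SSIM) + lam * bits.

    candidates maps a mode name to (reconstructed_block, bits).
    """
    costs = {
        mode: (1.0 - ssim(original, recon)) + lam * bits
        for mode, (recon, bits) in candidates.items()
    }
    best_mode = min(costs, key=costs.get)
    return best_mode, costs


# Hypothetical flattened 4x4 block and two made-up candidate reconstructions.
original = [52, 55, 61, 66, 70, 61, 64, 73, 63, 59, 55, 90, 67, 61, 68, 104]
brighter = [p + 20 for p in original]  # pure luminance shift
candidates = {
    "exact": (list(original), 100),  # perfect reconstruction, many bits
    "shifted": (brighter, 1),        # slightly distorted, very few bits
}
best_quality, _ = select_mode(original, candidates, lam=0.0)
best_tradeoff, _ = select_mode(original, candidates, lam=1.0)
```

With `lam=0.0` only the distortion term matters, so the perfectly reconstructed candidate wins; with a large multiplier the rate term dominates and the cheaper candidate is preferred. In a real encoder the multiplier would have to be rescaled for the SSIM-based distortion, since 1 − SSIM and SSD live on very different numeric scales.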