Structural similarity

The structural similarity index (SSIM) is a method for predicting the perceived quality of digital television and cinematic pictures, as well as other kinds of digital images.

SSIM is used to measure the similarity between two images. The SSIM index is a full-reference metric; in other words, the measurement or prediction of image quality is based on an uncompressed or distortion-free original image as a reference. SSIM was designed to improve on conventional methods such as peak signal-to-noise ratio (PSNR) and mean squared error (MSE), which have proved to agree poorly with human visual perception. Considerably more powerful methods have since become available (for example PSNR-HVS-M and VQM_VFD).

History

The first version of SSIM, called the Universal Quality Index (UQI) or Wang-Bovik index, was developed in 2001 by Zhou Wang and Alan Bovik at the Laboratory for Image and Video Engineering (LIVE) at The University of Texas at Austin. It was then modified, in collaboration with Hamid Sheikh and with Eero Simoncelli of New York University, into the current version of SSIM (many variants exist today) and published in the article "Image quality assessment: From error visibility to structural similarity", which appeared in April 2004 in the IEEE Transactions on Image Processing.

According to Google Scholar, the 2004 SSIM paper has been cited more than 10,000 times, making it one of the most highly cited works in image processing and video engineering. It received the IEEE Signal Processing Society Best Paper Award for 2009. In 2015, the inventors of SSIM each received a Primetime Engineering Emmy Award.

After its first publication in 2002, SSIM and its variants represented the state of the art in the automated prediction of human quality perception for some time. Since 2007, the PSNR-HVS-M metric has been available; it is based on PSNR, extended with contrast perception and masking criteria, and performs significantly better in comparisons with the judgments of human observers.

Structural similarity

The difference from the older techniques mentioned above, such as MSE or PSNR, is that those approaches estimate absolute errors, whereas SSIM is a perception-based model that treats image degradation as a perceived change in structural information, while also incorporating important perceptual phenomena, including brightness masking and contrast masking terms. Structural information is the idea that pixels have strong interdependencies, especially when they are spatially close. These dependencies carry important information about the structure of the objects in the visual scene. Brightness masking is a phenomenon whereby image distortions (in this context) tend to be less visible in bright regions of the image, while contrast masking is a phenomenon whereby distortions become less visible in image regions with significant activity or texture.
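
To make the contrast with absolute-error measures concrete, the following Python sketch (standard library only) builds two distortions of the same 8 × 8 reference window that have exactly the same MSE: a uniform brightening, which leaves the structure intact, and alternating ±20 noise, which disturbs it. The helper names `mse` and `ssim_window` are illustrative, and the SSIM computation is a single-window version using k1 = 0.01, k2 = 0.03 and L = 255; it is a sketch under those assumptions, not a reference implementation.

```python
from statistics import fmean


def mse(x, y):
    """Mean squared error: the average squared difference per pixel."""
    return fmean((a - b) ** 2 for a, b in zip(x, y))


def ssim_window(x, y, dynamic_range=255, k1=0.01, k2=0.03):
    """SSIM of two equally sized windows given as flat lists of pixel values."""
    c1 = (k1 * dynamic_range) ** 2
    c2 = (k2 * dynamic_range) ** 2
    mu_x, mu_y = fmean(x), fmean(y)
    var_x = fmean((a - mu_x) ** 2 for a in x)
    var_y = fmean((b - mu_y) ** 2 for b in y)
    cov_xy = fmean((a - mu_x) * (b - mu_y) for a, b in zip(x, y))
    return ((2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)) / (
        (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    )


# Reference: an 8 x 8 window containing a smooth ramp 50, 52, ..., 176 (flattened).
reference = [50 + 2 * i for i in range(64)]
# Distortion 1: every pixel brightened by 20, so the structure is unchanged.
brightened = [p + 20 for p in reference]
# Distortion 2: alternating +20 / -20 noise, which disturbs the structure.
noisy = [p + (20 if i % 2 == 0 else -20) for i, p in enumerate(reference)]

print("MSE: ", mse(reference, brightened), mse(reference, noisy))
print("SSIM:", round(ssim_window(reference, brightened), 3),
      round(ssim_window(reference, noisy), 3))
```

Both distortions have an MSE of 400, yet the brightened window scores about 0.987 (only the luminance term drops slightly), while the noisy window scores about 0.873 because its covariance with the reference is reduced.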

Algorithm

The SSIM index is calculated on various windows of an image. The measure between two windows $x$ and $y$ of common size N × N is:

$$\mathrm{SSIM}(x, y) = \frac{(2\mu_x\mu_y + c_1)(2\sigma_{xy} + c_2)}{(\mu_x^2 + \mu_y^2 + c_1)(\sigma_x^2 + \sigma_y^2 + c_2)}$$

with

$\mu_x$ the mean of $x$;
$\mu_y$ the mean of $y$;
$\sigma_x^2$ the variance of $x$;
$\sigma_y^2$ the variance of $y$;
$\sigma_{xy}$ the covariance of $x$ and $y$;
$c_1 = (k_1 L)^2$ and $c_2 = (k_2 L)^2$, two variables that stabilize the division when the denominator is small;
$L$ the dynamic range of the pixel values (typically $2^{\text{bits per pixel}} - 1$);
$k_1 = 0.01$ and $k_2 = 0.03$.
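
The formula and its definitions translate directly into code. The following Python sketch (standard library only) computes SSIM for one pair of windows passed as flat sequences of equal length. The function name `ssim` and the `bits_per_pixel` parameter are illustrative, and normalizing the variance and covariance by the number of pixels (rather than by that number minus one) is an assumption, since the text does not specify it.

```python
from statistics import fmean


def ssim(x, y, bits_per_pixel=8, k1=0.01, k2=0.03):
    """SSIM of two windows x and y, given as flat sequences of equal length."""
    if len(x) != len(y) or len(x) == 0:
        raise ValueError("windows must be non-empty and of equal size")
    L = 2 ** bits_per_pixel - 1      # dynamic range of the pixel values
    c1 = (k1 * L) ** 2               # stabilizes the division when mu_x^2 + mu_y^2 is small
    c2 = (k2 * L) ** 2               # stabilizes the division when sigma_x^2 + sigma_y^2 is small
    mu_x = fmean(x)                  # mean of x
    mu_y = fmean(y)                  # mean of y
    var_x = fmean((a - mu_x) ** 2 for a in x)                      # variance of x
    var_y = fmean((b - mu_y) ** 2 for b in y)                      # variance of y
    cov_xy = fmean((a - mu_x) * (b - mu_y) for a, b in zip(x, y))  # covariance of x and y
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator


# A 4 x 4 window, flattened row by row, and two distorted versions of it.
window = [10, 20, 30, 40] * 4
brighter = [v + 5 for v in window]
inverted = [255 - v for v in window]

print(ssim(window, window))    # identical windows
print(ssim(window, brighter))  # slightly below 1
print(ssim(window, inverted))  # negative: the structure is reversed
```

Identical windows score 1, a uniform brightening lowers only the luminance factor, and an inverted window yields a negative covariance and therefore a negative SSIM, consistent with the range from −1 to 1.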

To assess image quality, this formula is usually applied only to the brightness (luma) component, although it can also be applied to color values (for example RGB) or chrominance values (for example YCbCr). The resulting SSIM index is a decimal value between −1 and 1, and the value 1 is reached only when the two data sets are identical. It is typically calculated on windows of 8 × 8 pixels. The window can be moved pixel by pixel across the image, but the authors recommend using only a subset of the possible windows in order to reduce the computational complexity.
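
The windowing described above can be sketched as follows. In this Python sketch (standard library only), the image-level score is taken as the mean of the per-window SSIM values, and a `stride` larger than 1 evaluates only a subset of the possible 8 × 8 windows. The averaging, the default stride of 4, the helper names, and the ITU-R BT.601 luma weights used to obtain the brightness component from RGB are all assumptions of this sketch rather than details given above.

```python
from statistics import fmean


def ssim_window(x, y, L=255, k1=0.01, k2=0.03):
    """SSIM of two windows given as flat lists of pixel values."""
    c1, c2 = (k1 * L) ** 2, (k2 * L) ** 2
    mx, my = fmean(x), fmean(y)
    vx = fmean((a - mx) ** 2 for a in x)
    vy = fmean((b - my) ** 2 for b in y)
    cxy = fmean((a - mx) * (b - my) for a, b in zip(x, y))
    return ((2 * mx * my + c1) * (2 * cxy + c2)) / ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2))


def rgb_to_luma(image_rgb):
    """Brightness component using ITU-R BT.601 weights (an assumption of this sketch)."""
    return [[0.299 * r + 0.587 * g + 0.114 * b for r, g, b in row] for row in image_rgb]


def mean_ssim(ref, test, window=8, stride=4, L=255):
    """Mean SSIM over window x window blocks whose top-left corners lie on a stride grid."""
    height, width = len(ref), len(ref[0])
    if len(test) != height or len(test[0]) != width:
        raise ValueError("images must have the same size")
    if height < window or width < window:
        raise ValueError("image is smaller than one window")
    scores = []
    for top in range(0, height - window + 1, stride):
        for left in range(0, width - window + 1, stride):
            rows = range(top, top + window)
            cols = range(left, left + window)
            x = [ref[r][c] for r in rows for c in cols]
            y = [test[r][c] for r in rows for c in cols]
            scores.append(ssim_window(x, y, L))
    return fmean(scores)


# 16 x 16 grayscale test image (a diagonal ramp) and a copy with a +/-5 checkerboard.
ref = [[10 + 8 * (r + c) for c in range(16)] for r in range(16)]
test = [[v + (5 if (r + c) % 2 else -5) for c, v in enumerate(row)] for r, row in enumerate(ref)]

print(mean_ssim(ref, ref))             # identical images
print(mean_ssim(ref, test))            # below 1
print(mean_ssim(ref, test, stride=1))  # every window position
```

With `stride=1` every window position is evaluated; larger strides trade accuracy for speed, which is the complexity reduction mentioned above. For a color image, `rgb_to_luma` would be applied to both images first.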

Variants

Multi-scale SSIM

A more advanced form of SSIM, multi-scale SSIM, is computed over multiple scales through a process of several stages of subsampling, reminiscent of multi-scale processing in the early visual system. At the time, both SSIM and multi-scale SSIM agreed very well with human judgments, as measured on widely used public image quality databases, including the LIVE Image Quality Database and the TID database.
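
A compact sketch of the multi-scale idea (MS-SSIM) follows. At each of five scales it averages a contrast-structure term over the windows and then low-pass filters and halves the image; only at the coarsest scale is the full SSIM, including the luminance term, used. The per-scale exponents are the values commonly taken from the MS-SSIM paper by Wang, Simoncelli and Bovik. The 2 × 2 box filter, the non-overlapping 8 × 8 windows, and the clamping of negative terms to zero (a choice some implementations make so that fractional powers stay real) are simplifying assumptions of this sketch, not part of the published method.

```python
from statistics import fmean

# Per-scale exponents (finest to coarsest) from the MS-SSIM paper by Wang, Simoncelli and Bovik.
SCALE_WEIGHTS = (0.0448, 0.2856, 0.3001, 0.2363, 0.1333)


def window_terms(x, y, c1, c2):
    """Return (SSIM, contrast-structure term) for one pair of windows."""
    mx, my = fmean(x), fmean(y)
    vx = fmean((a - mx) ** 2 for a in x)
    vy = fmean((b - my) ** 2 for b in y)
    cxy = fmean((a - mx) * (b - my) for a, b in zip(x, y))
    luminance = (2 * mx * my + c1) / (mx ** 2 + my ** 2 + c1)
    contrast_structure = (2 * cxy + c2) / (vx + vy + c2)
    return luminance * contrast_structure, contrast_structure


def scale_means(ref, test, window, c1, c2):
    """Average SSIM and contrast-structure terms over non-overlapping windows."""
    ssims, css = [], []
    for top in range(0, len(ref) - window + 1, window):
        for left in range(0, len(ref[0]) - window + 1, window):
            rows, cols = range(top, top + window), range(left, left + window)
            x = [ref[r][c] for r in rows for c in cols]
            y = [test[r][c] for r in rows for c in cols]
            s, cs = window_terms(x, y, c1, c2)
            ssims.append(s)
            css.append(cs)
    return fmean(ssims), fmean(css)


def downsample(image):
    """Low-pass with a 2 x 2 box average, then keep every second row and column."""
    return [
        [(image[r][c] + image[r][c + 1] + image[r + 1][c] + image[r + 1][c + 1]) / 4
         for c in range(0, len(image[0]) - 1, 2)]
        for r in range(0, len(image) - 1, 2)
    ]


def ms_ssim(ref, test, window=8, L=255, k1=0.01, k2=0.03, weights=SCALE_WEIGHTS):
    c1, c2 = (k1 * L) ** 2, (k2 * L) ** 2
    score = 1.0
    for scale, weight in enumerate(weights):
        if len(ref) < window or len(ref[0]) < window:
            raise ValueError("image too small for the requested number of scales")
        mean_ssim, mean_cs = scale_means(ref, test, window, c1, c2)
        if scale == len(weights) - 1:
            # Coarsest scale: the full SSIM, luminance term included.
            score *= max(mean_ssim, 0.0) ** weight
        else:
            # Finer scales: contrast and structure only, then move one scale down.
            score *= max(mean_cs, 0.0) ** weight
            ref, test = downsample(ref), downsample(test)
    return score


# 128 x 128 smooth ramp and a copy with +/-10 checkerboard noise.
ref = [[20 + (r + c) * 4 // 5 for c in range(128)] for r in range(128)]
noisy = [[v + (10 if (r + c) % 2 else -10) for c, v in enumerate(row)] for r, row in enumerate(ref)]
print(ms_ssim(ref, ref))    # identical images
print(ms_ssim(ref, noisy))  # below 1
```

With five scales and 8 × 8 windows the input must be at least 128 pixels on each side. In this example the 2 × 2 averaging cancels the checkerboard exactly, so only the finest scale, which carries the smallest exponent, penalizes the noise.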

Structural dissimilarity

Structural dissimilarity (DSSIM) is a distance measure derived from SSIM, although it does not necessarily satisfy the triangle inequality.

$$\mathrm{DSSIM}(x, y) = \frac{1 - \mathrm{SSIM}(x, y)}{2}$$
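
DSSIM is a one-line transformation of SSIM. The following Python sketch (standard library only; helper names are illustrative) also checks the remark about the triangle inequality on three flat 2 × 2 patches with gray levels 0, 1 and 2. For windows without variance, SSIM reduces to its luminance term, and the direct DSSIM between the darkest and brightest patch (about 0.19) exceeds the sum of the two intermediate distances (about 0.11).

```python
from statistics import fmean


def ssim(x, y, L=255, k1=0.01, k2=0.03):
    """SSIM of two windows given as flat lists of pixel values."""
    c1, c2 = (k1 * L) ** 2, (k2 * L) ** 2
    mx, my = fmean(x), fmean(y)
    vx = fmean((a - mx) ** 2 for a in x)
    vy = fmean((b - my) ** 2 for b in y)
    cxy = fmean((a - mx) * (b - my) for a, b in zip(x, y))
    return ((2 * mx * my + c1) * (2 * cxy + c2)) / ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2))


def dssim(x, y, **kwargs):
    """Structural dissimilarity: maps SSIM from [-1, 1] to [0, 1], 0 meaning identical."""
    return (1 - ssim(x, y, **kwargs)) / 2


# Three flat 2 x 2 patches with gray levels 0, 1 and 2.
a, b, c = [0] * 4, [1] * 4, [2] * 4
direct = dssim(a, c)
detour = dssim(a, b) + dssim(b, c)
print(direct, detour)
print("triangle inequality holds:", direct <= detour)
```

This is why DSSIM is described as a distance measure rather than a metric in the strict mathematical sense.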

Video quality metrics

The original version of SSIM was designed for assessing the quality of still images. It does not contain any parameters that are directly related to temporal aspects of human perception and judgment. However, some variants of SSIM have been developed that take temporal phenomena into account.

A simple way to apply SSIM to video quality assessment is to calculate the average SSIM value over all individual frames of the video sequence.
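
Frame averaging can be sketched in a few lines of Python (standard library only). To keep the example short, each frame's SSIM is computed here with a single window spanning the whole frame, which is a simplification; a windowed per-frame score could be substituted. Frames are assumed to be flat lists of luma values, and the function names are illustrative.

```python
from statistics import fmean


def frame_ssim(x, y, L=255, k1=0.01, k2=0.03):
    """Simplification: SSIM with a single window spanning the whole frame."""
    c1, c2 = (k1 * L) ** 2, (k2 * L) ** 2
    mx, my = fmean(x), fmean(y)
    vx = fmean((a - mx) ** 2 for a in x)
    vy = fmean((b - my) ** 2 for b in y)
    cxy = fmean((a - mx) * (b - my) for a, b in zip(x, y))
    return ((2 * mx * my + c1) * (2 * cxy + c2)) / ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2))


def video_ssim(reference_frames, distorted_frames):
    """Average of per-frame SSIM values; temporal effects are ignored entirely."""
    if len(reference_frames) != len(distorted_frames) or not reference_frames:
        raise ValueError("need the same, non-zero number of frames")
    per_frame = [frame_ssim(r, d) for r, d in zip(reference_frames, distorted_frames)]
    return fmean(per_frame), per_frame


# Three tiny 4 x 4 luma frames (flattened); only the last one is distorted.
reference = [[16 * i + t for i in range(16)] for t in range(3)]
distorted = reference[:2] + [[v + (-12 if i % 2 else 12) for i, v in enumerate(reference[2])]]
score, per_frame = video_ssim(reference, distorted)
print(per_frame)
print(score)
```

Because every frame is scored independently, flicker or other changes between frames go unnoticed, which is exactly the limitation that temporally aware SSIM variants address.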

Discussion of performance

A paper by Dosselmann and Yang suggests that SSIM is not as accurate as claimed. They argue that SSIM scores agree with human judgments no better than MSE values do.

They contest the perceptual basis of SSIM, arguing that its formula does not contain any elaborate model of visual perception and that SSIM may rely on non-perceptual computations. For example, the human visual system does not compute the product of the means of the two images.

However, as shown in the original 2004 paper, the SSIM model and algorithm incorporate models of key elements of the perception of image distortions, including the mechanisms of brightness masking and contrast masking.


References

[1] Nikolay Ponomarenko, Flavia Silvestri, Karen Egiazarian, Marco Carli, Jaakko Astola, Vladimir Lukin: On between-coefficient contrast masking of DCT basis functions. In: CD-ROM Proceedings of the Third International Workshop on Video Processing and Quality Metrics for Consumer Electronics (VPQM-07), 25–26 January 2007, Scottsdale, AZ, 2007 (PDF, ponomarenko.info).
[2] Stephen Wolf, Margaret H. Pinson: Video Quality Model for Variable Frame Delay (VQM_VFD). US Department of Commerce, National Telecommunications and Information Administration, Boulder, Colorado, USA, Technology Memo TM-11-482, September 2011.
[3] Laboratory for Image and Video Engineering.
[4] Zhou Wang, A. C. Bovik, H. R. Sheikh, E. P. Simoncelli: Image quality assessment: from error visibility to structural similarity. In: IEEE Transactions on Image Processing, vol. 13, no. 4, April 2004, ISSN 1057-7149, pp. 600–612, doi:10.1109/TIP.2003.819861.
[5] Best Paper Award. Signal Processing Society.
[6] IEEE Signal Processing Society, Best Paper Award.
[7] Z. Wang, E. P. Simoncelli, A. C. Bovik: Multiscale structural similarity for image quality assessment. In: Conference Record of the Thirty-Seventh Asilomar Conference on Signals, Systems and Computers, 2004, vol. 2, November 2003, pp. 1398–1402, doi:10.1109/ACSSC.2003.1292216.
[8] LIVE Image Quality Database.
[9] Richard Dosselmann, Xue Dong Yang: A comprehensive assessment of the structural similarity index. In: Signal, Image and Video Processing, vol. 5, no. 1, 6 November 2009, ISSN 1863-1703, pp. 81–91, doi:10.1007/s11760-009-0144-1.
