Video compression

from Wikipedia, the free encyclopedia

Video compression reduces the data rate of a digitized video signal so that it can be stored or transmitted more easily. Achievable compression ratios are typically between 1:5 and 1:500.

Video compression has its origins in still image compression. Simpler methods such as MJPEG compress the individual frames of a video independently of one another and achieve compression ratios of around 1:10. More advanced methods also exploit similarities between successive frames. With these, compression ratios above 1:100 can be achieved today with hardly any reduction in quality.

The standardization of video coding methods has since become a process spanning international organizations, involving the Moving Picture Experts Group (MPEG) and the International Telecommunication Union (ITU). As a result, many identical methods carry different names: H.264, MPEG-4 AVC, MPEG-4 Part 10 and ISO/IEC 14496-10 all denote the same format.



The compression algorithms are based on

  • Redundancies (self-similarities) of the video signal (redundancy reduction) and
  • Limitations and physiological characteristics of human vision (irrelevance reduction).

The terms redundancy reduction and irrelevance reduction come from information theory and describe two different approaches to reducing the amount of data (data compression) when transmitting information. They rest on a model in which information is transmitted from a source to a sink. In the specific case of video coding, the source is the sequence of video frames as captured by the camera, and the sink is the viewer's eye.

Redundancy reduction

Redundancy reduction uses the properties of the source data to reduce the amount of data to be transmitted. In video coding, statistical properties of the image signal and similarities (correlations) between temporally and spatially adjacent pixels are exploited to achieve the most compact representation possible. Compression ratios of 1:2 to 1:5 can be achieved. Since no information is lost, this is called lossless coding.

Temporal correlations are exploited by inter-coding methods, which extrapolate estimates from frames that have already been transmitted so that only the estimation errors need to be transmitted (see differential coding). Spatial correlations are exploited by intra-coding methods, which code pixels as the difference from estimates derived from spatially neighboring pixels, or which recognize patterns spanning several pixels and describe them more compactly. Statistical redundancy is exploited by entropy coding.

Irrelevance reduction

The aim of irrelevance reduction is to omit from the transmission any information that is not relevant to the sink. It takes into account the physiological characteristics of human visual perception and deliberately discards information in such a way that the resulting distortions are as imperceptible as possible to human viewers. This enables a further compression of typically 1:2 to 1:50, depending on the method and the required quality. Since information is discarded, this is called lossy coding.

In concrete terms, for video coding this means that quantization causes only part of the image data to be transmitted.

Because of the anatomy of the eye, the spatial resolution of color perception is poorer than the resolution of differences in brightness. The resolution of the color information can therefore be reduced without the difference being strongly perceived. This is called chroma subsampling (color subsampling). Usually the image is converted into a suitable color model before coding, and the data rate is typically reduced by 50%.
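The 50% figure can be checked with a small sketch. It assumes a luma/chroma color model with one brightness plane and two color planes, and a 4:2:0-style scheme in which each 2x2 group of chroma samples is replaced by its average. The function names and the averaging filter are illustrative choices, not taken from any particular standard, and even frame dimensions are assumed.

```python
def subsample_420(plane):
    """Average each 2x2 group of a chroma plane (list of equal-length rows, even dimensions)."""
    height, width = len(plane), len(plane[0])
    return [
        [
            (plane[y][x] + plane[y][x + 1] + plane[y + 1][x] + plane[y + 1][x + 1]) // 4
            for x in range(0, width, 2)
        ]
        for y in range(0, height, 2)
    ]


def sample_count(width, height, subsampled):
    """Samples per frame for one luma plane plus two chroma planes."""
    luma = width * height
    chroma = (width // 2) * (height // 2) if subsampled else width * height
    return luma + 2 * chroma


cb = [[100, 102, 50, 52],
      [104, 106, 54, 56]]
print(subsample_420(cb))                      # [[103, 53]]

full = sample_count(1920, 1080, subsampled=False)
reduced = sample_count(1920, 1080, subsampled=True)
print(reduced / full)                         # 0.5
```

The luma plane is left untouched; only the two chroma planes shrink to a quarter of their size each, which halves the total number of samples.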

Another property of the visual system that can be exploited is its frequency dependence. Like sounds, images can be represented as a superposition of two-dimensional oscillations. Low spatial frequencies are responsible for coarse image structures, high frequencies for fine details. Distortions in the various frequency ranges are perceived to differing degrees, which can be illustrated by a simple test image.

All video compression methods of the MPEG family exploit this frequency dependence: after a suitable transformation, it enters the quantization as a psychovisual factor.
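A minimal sketch of frequency-dependent quantization follows. It assumes a block of transform coefficients is already available, with low frequencies at the top left. The step-size rule (`base_step + slope * (u + v)`) is a hypothetical stand-in for the psychovisually tuned quantization tables that real codecs use.

```python
def step_size(u, v, base_step=4, slope=4):
    """Hypothetical rule: coarser quantization for higher horizontal (u) and vertical (v) frequency."""
    return base_step + slope * (u + v)


def quantize(coeffs):
    """Divide each coefficient by its step size and round to an integer level."""
    return [
        [round(c / step_size(u, v)) for u, c in enumerate(row)]
        for v, row in enumerate(coeffs)
    ]


def dequantize(levels):
    """Reconstruct approximate coefficients from the integer levels."""
    return [
        [q * step_size(u, v) for u, q in enumerate(row)]
        for v, row in enumerate(levels)
    ]


coeffs = [[240, 33, -10, 3],
          [-30, 12, 5, -2],
          [9, -6, 2, 1],
          [4, 2, -1, 0]]
levels = quantize(coeffs)
print(levels)
# [[60, 4, -1, 0], [-4, 1, 0, 0], [1, 0, 0, 0], [0, 0, 0, 0]]
print(dequantize(levels))
# [[240, 32, -12, 0], [-32, 12, 0, 0], [12, 0, 0, 0], [0, 0, 0, 0]]
```

Most high-frequency levels become zero and cost almost nothing to transmit, while the reconstruction differs from the input only slightly. This rounding is the step at which information is irreversibly discarded.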

Basic techniques

Video compression methods consist of several sub-processes, so-called compression tools, which exploit different kinds of redundancy. Intra-frame prediction (pixel extrapolation and differential coding) and transform coding rely on correlations between spatially adjacent pixels. Inter-frame coding, for example motion compensation and differential coding (DPCM), exploits temporal dependencies. Finally, statistical redundancy is reduced by entropy coding.

Frequency transformation

In block-based transform coding (for example with the discrete cosine transform, DCT), individual images (frames) are divided into square blocks, and these are assessed according to their complexity. This step is necessary so that the codec "knows" which (complex) image blocks need many bits and for which (simple) blocks fewer bits suffice. This is the prerequisite for irrelevance reduction.
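The following sketch illustrates the idea with an orthonormal DCT-II applied separably to an 8x8 block, written in plain Python for clarity rather than speed. Counting coefficients above a threshold is only one hypothetical way to express block "complexity"; real encoders use their own measures.

```python
import math


def dct_1d(values):
    """Orthonormal one-dimensional DCT-II."""
    n = len(values)
    out = []
    for k in range(n):
        scale = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
        out.append(scale * sum(
            v * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
            for i, v in enumerate(values)
        ))
    return out


def dct_2d(block):
    """Two-dimensional DCT: transform every row, then every column."""
    rows = [dct_1d(row) for row in block]
    cols = [dct_1d(col) for col in zip(*rows)]   # one list per column
    return [list(row) for row in zip(*cols)]     # transpose back to [v][u]


def significant_coefficients(block, threshold=1.0):
    """Hypothetical complexity measure: number of coefficients above a threshold."""
    return sum(abs(c) > threshold for row in dct_2d(block) for c in row)


flat = [[128] * 8 for _ in range(8)]
checker = [[255 if (x + y) % 2 == 0 else 0 for x in range(8)] for y in range(8)]

print(significant_coefficients(flat))      # 1 (only the DC coefficient)
print(significant_coefficients(checker))   # several coefficients are needed
```

A uniform block collapses into a single coefficient, whereas a finely patterned block spreads its energy over many coefficients and therefore needs more bits.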

Differential coding

Differential pulse-code modulation (DPCM) is usually used to exploit similarities between neighboring pixels or between individual frames: only the differences from frames or pixels that have already been transmitted are stored. In inter-coding, the method is supplemented by motion compensation.
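As a minimal sketch, the lossless core of DPCM can be shown on one row of pixel values. The predictor assumed here is simply the previous sample (starting from 0); practical codecs use more elaborate predictors and usually quantize the differences.

```python
def dpcm_encode(samples):
    """Replace each sample by its difference from the previous sample."""
    previous = 0
    residuals = []
    for s in samples:
        residuals.append(s - previous)
        previous = s
    return residuals


def dpcm_decode(residuals):
    """Rebuild the samples by accumulating the differences."""
    previous = 0
    samples = []
    for r in residuals:
        previous += r
        samples.append(previous)
    return samples


row = [100, 101, 103, 103, 102, 104]
residuals = dpcm_encode(row)
print(residuals)                 # [100, 1, 2, 0, -1, 2]
print(dpcm_decode(residuals))    # [100, 101, 103, 103, 102, 104]
```

After the first value, the differences are small numbers clustered around zero, which a subsequent entropy coder can represent with short code words.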

Motion compensation

Difference-coded image with motion vectors drawn in

A further way to reduce the amount of data is motion compensation. The encoder searches for matching parts of the image that have moved relative to the previous frame. For these, a motion vector is stored; the parts that have not moved are simply carried over from the previous frame.
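The search for matching image parts can be sketched as exhaustive block matching. The block size, the search range and the sum of absolute differences (SAD) as the matching cost are assumptions chosen for illustration; production encoders use faster search strategies and sub-pixel precision.

```python
def sad(cur, ref, bx, by, dx, dy, size):
    """Sum of absolute differences between a block in cur and a shifted block in ref."""
    return sum(
        abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
        for y in range(size)
        for x in range(size)
    )


def find_motion_vector(cur, ref, bx, by, size=2, search=2):
    """Try every shift within the search range and return (dx, dy, cost) of the best match."""
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            # Skip candidates that fall outside the reference frame.
            if not (0 <= bx + dx <= width - size and 0 <= by + dy <= height - size):
                continue
            cost = sad(cur, ref, bx, by, dx, dy, size)
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    return best[1], best[2], best[0]


def make_frame(obj_x, obj_y, width=6, height=6):
    """Black frame containing a small 2x2 object at the given position."""
    frame = [[0] * width for _ in range(height)]
    frame[obj_y][obj_x], frame[obj_y][obj_x + 1] = 10, 20
    frame[obj_y + 1][obj_x], frame[obj_y + 1][obj_x + 1] = 30, 40
    return frame


reference = make_frame(1, 1)   # previous frame
current = make_frame(3, 2)     # object has moved 2 right, 1 down
dx, dy, cost = find_motion_vector(current, reference, bx=3, by=2)
print(dx, dy, cost)            # -2 -1 0
```

The vector points from the block in the current frame back to its position in the previous frame. A cost of 0 means that nothing besides the vector has to be transmitted for this block.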

Entropy coding

With a variable-length code (VLC), statistical redundancies in sequences of values can be removed. Instead of coding all symbols with code words of constant length, symbols that occur more frequently or are more likely are coded with shorter code words than less common symbols. Arithmetic coding methods are the most widespread here. In some cases, however, the older Huffman coding or variants of the less complex run-length coding (e.g. CAVLC) are still in use.
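The principle of variable-length coding can be sketched with a small Huffman code builder. The symbol sequence below is invented for illustration (it resembles typical prediction residuals clustered around zero), and the construction is the textbook algorithm rather than the entropy coder of any specific video standard.

```python
import heapq
from collections import Counter


def huffman_code(symbols):
    """Build a prefix code in which frequent symbols receive shorter code words."""
    freq = Counter(symbols)
    # Heap entries: (weight, tie_breaker, {symbol: code_so_far}).
    # The unique tie_breaker ensures the dictionaries are never compared.
    heap = [(w, i, {s: ""}) for i, (s, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    if len(heap) == 1:
        return {s: "0" for s in heap[0][2]}
    counter = len(heap)
    while len(heap) > 1:
        w1, _, c1 = heapq.heappop(heap)
        w2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in c1.items()}
        merged.update({s: "1" + c for s, c in c2.items()})
        heapq.heappush(heap, (w1 + w2, counter, merged))
        counter += 1
    return heap[0][2]


data = [0] * 8 + [1] * 4 + [-1] * 2 + [2] + [-2]
code = huffman_code(data)
variable_bits = sum(len(code[s]) for s in data)
fixed_bits = 3 * len(data)          # 5 distinct symbols need 3 bits each
print(variable_bits, fixed_bits)    # 30 48
```

The most frequent symbol receives a one-bit code word, so the 16 symbols take 30 bits instead of the 48 bits a fixed-length code would need.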


Standardization began with the H.120 standard, which never found practical use. Common video coding formats generally follow the basic design established with its successor H.261 (1988). The most important features are block-based frequency transformation, (motion-compensated) differential pulse-code modulation (DPCM) and entropy coding. The main techniques for this had been developed by 1979. This basic design has since been continuously refined and auxiliary technologies developed, which later resulted in hundreds of patents. Many older techniques found widespread use only many years later, when advances in microprocessor performance made them practical. A notable exception to this basic design is the wavelet-based VC-2 standard (a Dirac variant).

As of 2016, the H.26x video format series from ITU-T and the MPEG video formats have been the dominant video coding standards. Up to and including H.264, they regularly marked the state of the art when published, and several were widely used, including MPEG-1 (1991), MPEG-2 (1994) and most recently H.264/MPEG-4 AVC (2003). In addition to special formats for niche applications, there were various cheaper and partly proprietary main competitors such as Microsoft's Windows Media Video 9 (VC-1), several formats from On2's VPx series and, most recently, their successors VP8 and VP9 from Google, which had acquired On2. Starting with Theora, there were efforts to create freely licensed formats, which initially received little attention and were technically inferior. With Google's release of VP8 (2008) and VP9 (2012), significant technical advances were made and the performance of the free formats largely caught up with the state of the art. With the Alliance for Open Media, founded in 2015, the industry established broad support for royalty-free video formats.

In 1950, Bell Laboratories filed the patent on DPCM, which was soon widely applied to video coding. Entropy coding began in the 1940s with Shannon-Fano coding, which formed the basis for the commonly used Huffman coding, published in 1952; the more modern context-adaptive arithmetic coding (CABAC) was published in the early 1990s. Transform coding (using the Hadamard transform) was introduced in 1969, and the popular discrete cosine transform (DCT) appeared in the scientific literature in 1974.

See also


  • Lajos L. Hanzo, Peter J. Cherriman, Jürgen Streit (University of Southampton): Video Compression and Communications: From Basics to H.261, H.263, H.264, MPEG2, MPEG4 for DVB and HSDPA-Style Adaptive Turbo-Transceivers. 2nd edition. IEEE Press, 2007, ISBN 978-0-470-51849-6.


References

  1. Test image to show the frequency dependence of the resolution perception of the human eye (archived October 30, 2007 at the Internet Archive).
  2. Patent US2605361: Differential Quantization of Communication Signals. Filed June 29, 1950, published July 29, 1952, inventor: C. Chapin Cutler.
  3. Claude Elwood Shannon: A Mathematical Theory of Communication. In: Alcatel-Lucent (Ed.): Bell System Technical Journal. Vol. 27, no. 3-4, 1948.
  4. David Albert Huffman: A method for the construction of minimum-redundancy codes. In: Proceedings of the IRE. Vol. 40, no. 9, September 1952, pp. 1098–1101, doi:10.1109/JRPROC.1952.273898.
  5. CCITT Study Group VIII and the Joint Photographic Experts Group (JPEG) of ISO/IEC Joint Technical Committee 1 / Subcommittee 29 / Working Group 10: Recommendation T.81. Digital Compression and Coding of Continuous-tone Still Images - Requirements and guidelines. Ed.: ITU-T. 1993, Annex D - Arithmetic coding, p. 54 ff. (accessed November 7, 2009).
  6. William K. Pratt, Julius Kane, Harry C. Andrews: Hadamard transform image coding. In: Proceedings of the IEEE. Vol. 57, no. 1, 1969, pp. 58–68.
  7. Nasir Ahmed, T. Natarajan, Kamisetty Ramamohan Rao: Discrete Cosine Transform. In: IEEE Transactions on Computers. Vol. C-23, no. 1, January 1974, pp. 90–93, doi:10.1109/TC.1974.223784.
  8. Cliff Reader: Patent landscape for royalty-free video coding. In: Society of Photo-Optical Instrumentation Engineers (Ed.): Applications of Digital Image Processing XXXIX. San Diego, California, August 31, 2016 (recording of the lecture, from 3:05:10).