Modified discrete cosine transform

The modified discrete cosine transform (MDCT) is a real-valued, discrete, linear, orthogonal transform that belongs to the family of transforms related to the discrete Fourier transform (DFT). It is a modification of the discrete cosine transform (DCT), from which it takes its name.

The MDCT was developed in 1986 and 1987 by John P. Princen, A. W. Johnson, and Alan B. Bradley.

The MDCT is the central transform of the audio data compression methods Advanced Audio Coding (AAC), Dolby Digital (AC-3), and Ogg Vorbis; MPEG Audio Layer 3 (MP3), Opus, ATRAC, and others also use the MDCT as a spectral transform. There is also the similarly structured modified discrete sine transform (MDST), which is based on the discrete sine transform but has no major significance in digital signal processing.

Motivation

The MDCT is based on type IV of the discrete cosine transform, also called DCT-IV. At the beginning of the input sequence to be transformed (for example, a finite number of samples of an audio signal) it implies an even extension, and at the end of the sequence an odd extension. The input signal is divided into successive blocks, and each block is transformed separately. In the MDCT, the blocks partially overlap one another so that the effects of the even and odd extensions introduced by the block formation cancel out. In the literature this is called time-domain aliasing cancellation (TDAC). Similar methods are used with the DFT in the overlap-add and overlap-save methods, which convert the periodic extension implied by the DFT into an aperiodic convolution.

The MDCT avoids the block artifacts known from the DCT of JPEG compression: jumps between the sample values of neighboring transform blocks. Human hearing reacts to this kind of disturbance even more sensitively than the eye, so a method was needed in which the transition between neighboring blocks is gradual rather than abrupt. This is achieved by extending the range of samples that enter each transform and weighting them with a window function. Normally this would increase the amount of data, since samples are used in several transforms and are therefore stored redundantly. The MDCT avoids this problem: $2N$ samples enter the transform as input values, but only $N$ spectral values are produced. Such a transform would normally be very lossy, but under certain conditions the errors cancel when neighboring inverse-transformed blocks are added after the inverse transform.

In this way a windowed spectral transform can be carried out without increasing the number of values. The window function improves the spectral resolution of the MDCT and reduces artifacts in the IMDCT.

Definition

Transformation

Because of the overlap, and in contrast to symmetric frequency transforms, the MDCT takes twice as many input samples from the time domain as it produces spectral output values. Formally, $2N$ real numbers $x_0, \dots, x_{2N-1}$ are mapped to $N$ real numbers $X_0, \dots, X_{N-1}$ by the following equation:

$$X_k = \sum_{n=0}^{2N-1} x_n \cos\left[\frac{\pi}{N}\left(n + \frac{1}{2} + \frac{N}{2}\right)\left(k + \frac{1}{2}\right)\right]$$

with

$$k = 0, \dots, N-1$$
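As a minimal sketch of this definition, the sum can be evaluated directly in Python. The function name `mdct` and the sample input are illustrative assumptions and are not taken from any particular codec; no normalization factor is applied, matching the formula as written.

```python
import math

def mdct(x):
    """Direct MDCT: 2N real samples in, N real coefficients out (no scaling)."""
    if len(x) % 2 != 0:
        raise ValueError("input length must be even (2N samples)")
    N = len(x) // 2
    return [
        sum(x[n] * math.cos(math.pi / N * (n + 0.5 + N / 2) * (k + 0.5))
            for n in range(2 * N))
        for k in range(N)
    ]

samples = [1.0, 2.0, 3.0, 4.0]   # 2N = 4 input samples (arbitrary values)
coeffs = mdct(samples)           # N = 2 spectral values
print(len(samples), "samples ->", len(coeffs), "coefficients")
```

The nested sum makes the halving of the data volume visible: the outer loop runs over only $N$ values of $k$, while every coefficient depends on all $2N$ samples.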

In the literature, additional constant factors are sometimes introduced into this relationship for normalization, in forms that vary from author to author, but they do not fundamentally change the transform.

Inverse transformation

The inverse MDCT, abbreviated IMDCT, reverses the above transform. Since the input and output sequences have different lengths, the successive blocks must be added in the time domain over their overlapping regions; this addition is part of the time-domain aliasing cancellation (TDAC).

Formally, the IMDCT maps $N$ real numbers $X_0, \dots, X_{N-1}$ to $2N$ real numbers $y_0, \dots, y_{2N-1}$:

$$y_n = \frac{1}{N} \sum_{k=0}^{N-1} X_k \cos\left[\frac{\pi}{N}\left(n + \frac{1}{2} + \frac{N}{2}\right)\left(k + \frac{1}{2}\right)\right]$$

with

$$n = 0, \dots, 2N-1$$

As with the DCT-IV, which is an orthogonal transform, the inverse transform is identical to the forward transform except for a constant factor.
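The following sketch, assuming the unscaled forward transform and the $1/N$ factor of the inverse given above, illustrates why a single block cannot be recovered on its own: the IMDCT of the MDCT returns each half of the block mixed with its own mirror image. The helper names and the test signal are hypothetical.

```python
import math

def _angle(n, k, N):
    # Cosine argument shared by the MDCT and IMDCT formulas.
    return math.pi / N * (n + 0.5 + N / 2) * (k + 0.5)

def mdct(x):
    N = len(x) // 2
    return [sum(x[n] * math.cos(_angle(n, k, N)) for n in range(2 * N))
            for k in range(N)]

def imdct(X):
    N = len(X)
    return [sum(X[k] * math.cos(_angle(n, k, N)) for k in range(N)) / N
            for n in range(2 * N)]

N = 4
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]   # one block of 2N samples
y = imdct(mdct(x))

# Time-domain aliasing: the first half appears minus its mirror image,
# the second half plus its mirror image, each scaled by 1/2.
aliased = ([(x[n] - x[N - 1 - n]) / 2 for n in range(N)] +
           [(x[N + n] + x[2 * N - 1 - n]) / 2 for n in range(N)])

max_dev = max(abs(u - v) for u, v in zip(y, aliased))
recovered = all(abs(u - v) < 1e-9 for u, v in zip(y, x))
print("single block recovered exactly:", recovered)
print("deviation from the aliased form:", max_dev)
```

The mirrored terms have opposite signs in the two halves, which is what allows them to cancel once neighboring blocks are overlapped and added.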

Use

The MDCT is the basic operation of modern audio compression methods. For this purpose, the input signal is divided into half-overlapping blocks $b = 0, 1, 2, \dots$ of length $2N$, each of which extends over the samples $x_{bN-N}, \dots, x_{bN+N-1}$.

The transform is carried out block by block, with each block $b$ weighted by a window function $w_n$ (which must have certain properties):

$$X_{b,k} = \sum_{n=0}^{2N-1} w_n \, x_{bN-N+n} \cos\left[\frac{\pi}{N}\left(n + \frac{1}{2} + \frac{N}{2}\right)\left(k + \frac{1}{2}\right)\right]$$

with

$$k = 0, \dots, N-1, \quad b \in \mathbb{N}$$

The inverse transform yields the sample $y_{bN+n}$, with $n = 0, \dots, N-1$ and $b \in \mathbb{N}$, as

$$y_{bN+n} = \frac{2}{N} \left( w_n \sum_{k=0}^{N-1} X_{b+1,k} \cos\left[\frac{\pi}{N}\left(n + \frac{1}{2} + \frac{N}{2}\right)\left(k + \frac{1}{2}\right)\right] - w_{n+N} \sum_{k=0}^{N-1} X_{b,k} \cos\left[\frac{\pi}{N}\left(n + \frac{1}{2} - \frac{N}{2}\right)\left(k + \frac{1}{2}\right)\right] \right)$$

The second term is the contribution of block $b$ at its local index $n + N$: shifting the index by $2N$ changes the sign of every cosine, so $\cos\left[\frac{\pi}{N}\left(n + N + \frac{1}{2} + \frac{N}{2}\right)\left(k + \frac{1}{2}\right)\right] = -\cos\left[\frac{\pi}{N}\left(n + \frac{1}{2} - \frac{N}{2}\right)\left(k + \frac{1}{2}\right)\right]$.
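The block-wise analysis and the overlap-add synthesis can be sketched as follows. This is an illustration under stated assumptions, not a codec implementation: the window $w_n = \sin\left[\frac{\pi}{2N}\left(n + \frac{1}{2}\right)\right]$ is assumed because it is symmetric and satisfies $w_n^2 + w_{n+N}^2 = 1$, samples outside the signal are taken as zero, and all function names are hypothetical. Each output sample combines the first half of block $b+1$ (local index $n$) with the second half of block $b$ (local index $n+N$).

```python
import math

def cos_sum(coeffs, m):
    """Unscaled sum over k of X_k * cos[pi/N (m + 1/2 + N/2)(k + 1/2)]."""
    N = len(coeffs)
    return sum(coeffs[k] * math.cos(math.pi / N * (m + 0.5 + N / 2) * (k + 0.5))
               for k in range(N))

def sine_window(N):
    # Assumed window: symmetric, with w[n]^2 + w[n+N]^2 = 1.
    return [math.sin(math.pi * (n + 0.5) / (2 * N)) for n in range(2 * N)]

def analyse(signal, N, w):
    """Windowed MDCT of half-overlapping blocks (zero outside the signal)."""
    padded = [0.0] * N + list(signal) + [0.0] * N
    spectra = []
    for b in range(len(signal) // N + 1):
        # Block b covers x[bN-N .. bN+N-1], i.e. padded[bN .. bN+2N-1].
        frame = [w[n] * padded[b * N + n] for n in range(2 * N)]
        spectra.append([
            sum(frame[n] * math.cos(math.pi / N * (n + 0.5 + N / 2) * (k + 0.5))
                for n in range(2 * N))
            for k in range(N)
        ])
    return spectra

def synthesise(spectra, N, w):
    """Windowed overlap-add of neighboring inverse-transformed blocks."""
    out = []
    for b in range(len(spectra) - 1):
        for n in range(N):
            from_next = w[n] * cos_sum(spectra[b + 1], n)       # first half of block b+1
            from_this = w[n + N] * cos_sum(spectra[b], n + N)   # second half of block b
            out.append(2.0 / N * (from_next + from_this))
    return out

N = 4
w = sine_window(N)
signal = [0.3, -1.2, 2.5, 0.7, -0.4, 1.1, 0.0, -2.0]   # length is a multiple of N
restored = synthesise(analyse(signal, N, w), N, w)
max_error = max(abs(u - v) for u, v in zip(signal, restored))
print("max reconstruction error:", max_error)
```

In this sketch the aliasing terms of the two blocks cancel in the sum, so the reconstruction error is expected to stay at the level of floating-point rounding.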

[Figure: MDCT window functions. Blue: cosine; red: sine-cosine; green: modified Kaiser-Bessel.]

[Figure: Leakage of the MDCT window functions. Blue: cosine; red: sine-cosine; green: modified Kaiser-Bessel.]

The window function $w_n$ must have the following properties:

The same window function must be used for the analysis and the synthesis of a block $b$; otherwise the TDAC does not work.
The window function is applied twice to each sample, once in the analysis and once in the synthesis. Because neighboring blocks overlap, two window values meet at every sample: $w_{\mathrm{prev}}$ from the earlier block and $w_{\mathrm{next}}$ from the later block. Without loss of generality, these two values must satisfy $w_{\mathrm{prev}}^2 + w_{\mathrm{next}}^2 = 1$. This is called the Princen-Bradley condition.
$w_n$ should be as smooth as possible in order to keep the leakage effect low. Leakage would reduce the concentration of dominant signal components in the analysis and would generate interference signals away from dominant signal components in the synthesis (DC components, for example, would produce a rattle).

The second condition sets these window functions apart considerably from ordinary window functions. Essentially, the following three window functions are used:

Cosine window
Modified Kaiser-Bessel window
Sine-cosine window
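The Princen-Bradley condition can be checked numerically. The sketch below assumes two window formulas that are widely used with the MDCT, the sine window $\sin\left[\frac{\pi}{2N}\left(n + \frac{1}{2}\right)\right]$ and the power-complementary window of the Vorbis codec; the text above does not define its windows, so which listed name each formula corresponds to is an assumption. A Hann-type window is included as an example of an ordinary window that fails the condition.

```python
import math

def sine_window(N):
    return [math.sin(math.pi * (n + 0.5) / (2 * N)) for n in range(2 * N)]

def vorbis_window(N):
    return [math.sin(math.pi / 2 * math.sin(math.pi * (n + 0.5) / (2 * N)) ** 2)
            for n in range(2 * N)]

def hann_window(N):
    # Ordinary analysis window, not designed for the MDCT.
    return [math.sin(math.pi * (n + 0.5) / (2 * N)) ** 2 for n in range(2 * N)]

def princen_bradley_error(w):
    """Largest deviation of w[n]^2 + w[n+N]^2 from 1 over one half block."""
    N = len(w) // 2
    return max(abs(w[n] ** 2 + w[n + N] ** 2 - 1.0) for n in range(N))

N = 8
errors = {
    "sine": princen_bradley_error(sine_window(N)),
    "vorbis": princen_bradley_error(vorbis_window(N)),
    "hann": princen_bradley_error(hann_window(N)),
}
for name, err in errors.items():
    print(name, err)
```

For the first two windows the deviation is expected to be at rounding level, while the Hann-type window deviates substantially, which illustrates how the condition restricts the choice of window.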

Calculation effort

Computing the MDCT directly from the above formula requires O(N²) operations. As with the fast Fourier transform (FFT), which is an efficient way of computing the DFT, there are algorithms for the MDCT, structured similarly to the radix-2 algorithm, that reduce the number of arithmetic operations to O(N log N).

In addition, the MDCT can be calculated using pre- and post-processing and an FFT.
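One simple way to do this, shown here as a sketch rather than as an optimized algorithm, is to multiply the input by a complex pre-twiddle factor, apply a $2N$-point complex FFT, and take the real part after a post-twiddle multiplication. The recursive `fft` helper and the function names are illustrative assumptions; the block length must be a power of two, and practical codecs generally use more economical factorizations with shorter FFTs.

```python
import cmath
import math

def fft(a):
    """Recursive radix-2 FFT; len(a) must be a power of two."""
    n = len(a)
    if n == 1:
        return list(a)
    even, odd = fft(a[0::2]), fft(a[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def mdct_fft(x):
    """MDCT via pre-twiddle, 2N-point FFT, and post-twiddle."""
    two_n = len(x)
    N = two_n // 2
    n0 = 0.5 + N / 2
    # Pre-twiddle accounts for the half-sample shift in k + 1/2.
    pre = [x[n] * cmath.exp(-1j * cmath.pi * n / two_n) for n in range(two_n)]
    spectrum = fft(pre)
    # Post-twiddle accounts for the time offset n0; keep the real part.
    return [(cmath.exp(-1j * cmath.pi * n0 * (k + 0.5) / N) * spectrum[k]).real
            for k in range(N)]

def mdct_direct(x):
    N = len(x) // 2
    return [sum(x[n] * math.cos(math.pi / N * (n + 0.5 + N / 2) * (k + 0.5))
                for n in range(2 * N))
            for k in range(N)]

x = [math.sin(0.7 * n) + 0.1 * n for n in range(16)]   # 2N = 16 test samples
max_diff = max(abs(u - v) for u, v in zip(mdct_fft(x), mdct_direct(x)))
print("max difference between FFT-based and direct MDCT:", max_diff)
```

The comparison against the direct sum serves only as a consistency check of the twiddle factors.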

Literature

Henrique S. Malvar: Signal Processing with Lapped Transforms. Artech House, 1992, ISBN 0-89006-467-9.

References

[1] John P. Princen, Alan B. Bradley: Analysis/Synthesis filter bank design based on time domain aliasing cancellation. In: IEEE Transactions on Acoustics, Speech and Signal Processing. Vol. 34, no. 5, October 1986, pp. 1153–1161, doi:10.1109/TASSP.1986.1164954.
[2] J. Princen, A. Johnson, A. Bradley: Subband/Transform coding using filter bank designs based on time domain aliasing cancellation. In: Acoustics, Speech, and Signal Processing, IEEE International Conference on ICASSP '87. Vol. 12, 1987, pp. 2161–2164, doi:10.1109/ICASSP.1987.1169405 (first mention of the term MDCT).
