Constrained-Energy Lapped Transform

File extension: none
MIME type: audio/celt
Developed by: Xiph.Org Foundation, IETF Codec Working Group
Initial release: December 2007
Current version: 0.11.1 (as of February 15, 2011)
Type: Audio
Contained in: Ogg
Extended to: Opus
Standard(s): current IETF Internet Draft
Website: celt-codec.org

Constrained-Energy Lapped Transform (CELT) is a patent-free data format and method for lossy audio data compression with particularly low codec latency. It is intended for real-time use, where audio is typically encoded immediately before transmission and should be delayed as little as possible. The method is openly documented and can be used without patent restrictions. It was created by the Xiph.Org Foundation (as part of the Ogg codec family) and the Codec Working Group of the Internet Engineering Task Force (IETF), and has since been merged into Opus. CELT has thus been abandoned as an independent format and is developed further only in its hybridized form, integrated with SILK as a layer of Opus. This article deals with the historical, independent format; for the integrated form and the developments since the integration into Opus, see the article on Opus.

Properties

CELT is designed for real-time applications, and its main design goal is low codec latency. Typical latencies are 3 to 9 milliseconds, configurable down to less than 2 milliseconds; lower latencies require higher bit rates to achieve comparable quality. CELT thus clearly undercuts the latencies possible with other standard codecs.
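As a rough illustration of where such figures come from, the algorithmic delay of a block-based transform codec can be estimated as one frame plus the look-ahead needed for the window overlap. The following Python sketch assumes this simple model; the function name, the frame sizes and the 48 kHz sampling rate are illustrative assumptions, not values taken from this article.

```python
def algorithmic_delay_ms(frame_size, lookahead, sample_rate):
    """Estimate codec delay in milliseconds as (frame + look-ahead) / sample rate.

    Assumed model: the encoder buffers one full frame plus the samples it
    needs from the following frame for the window overlap.
    """
    return 1000.0 * (frame_size + lookahead) / sample_rate


# Hypothetical configurations (frame size, look-ahead) in samples at 48 kHz.
for frame_size, lookahead in [(64, 32), (128, 64), (256, 128)]:
    delay = algorithmic_delay_ms(frame_size, lookahead, 48000)
    print(f"frame={frame_size:3d} lookahead={lookahead:3d} -> {delay:.2f} ms")
```

Under these assumed settings the estimates are 2, 4 and 8 milliseconds, which shows how halving the frame size halves the delay while leaving fewer samples per block to code.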

Like its sister project Vorbis, it is a full-bandwidth (the entire range of human hearing can be represented), general-purpose method, that is, it is not specialized for particular types of signals, which sets it apart from its other sister project Speex. It processes audio signals with sampling rates between 32 and 96 kHz and up to two channels (stereo). In principle, the format thus allows transparent results, but also bit rates down to 24 kbit/s. Its compression performance is said to be significantly superior to MP3 overall. CELT also performs very well at low bit rates, a property useful for some real-time applications such as telephony. With the help of band folding, the sound quality is also said to be clearly superior to Vorbis and even similar to that of HE-AAC v1. In comparative double-blind listening tests at about 64 kbit/s it proved clearly superior to HE-AAC v1.

It has comparatively low complexity; the computational effort is similar to that of the low-delay variant of AAC (AAC-LD) and well below that of Vorbis.

It supports constant and variable bit rates. If the signal disappears into the noise floor at the encoder, for instance during pauses in speech, transmission can be reduced to signaling the gap to the decoder, which then fills it with generated comfort noise. Most settings of the streaming format can be changed while the data stream is running.

The format is robust against transmission errors. Both the loss of entire packets and bit errors can be concealed, with audible degradation increasing only gradually (packet loss concealment, PLC).

Technology

Block diagram of the codec

CELT is a transform codec based on the modified discrete cosine transform (MDCT) and on ideas from CELP (a codebook for the excitation, but in the frequency domain).

For the MDCT, the original PCM signal is split into comparatively short, overlapping blocks, which are weighted with a window function and transformed into frequency coefficients. Choosing a particularly short block length enables low latency, but also results in poor frequency resolution, which has to be compensated for. To reduce codec latency further at the cost of minor quality loss, the MDCT's usual 50 percent overlap of the time windows is roughly halved by setting the window to zero for one eighth of its length at both the beginning and the end. To make better use of correlations across blocks despite the very short block lengths, among other reasons, the method is stateful and bases the coding of a CELT block on data from previous blocks.
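The windowing described above can be sketched in Python as follows. This is a minimal illustration, not the reference implementation: the helper names are hypothetical, the ramps are assumed to be sine-shaped (the window shape actually used by CELT may differ), and the transform is computed directly rather than with a fast algorithm. The point it shows is that a window that is zero for one eighth of its length at each end still reconstructs the signal after overlap-add.

```python
import math


def low_overlap_window(n):
    """Window of length 2*n with n/4 zeros at each end (n divisible by 4).

    Assumption: sine-shaped ramps; the actual CELT window may differ.
    """
    quarter, ramp = n // 4, n // 2
    rise = [math.sin(0.5 * math.pi * (m + 0.5) / ramp) for m in range(ramp)]
    first_half = [0.0] * quarter + rise + [1.0] * quarter
    return first_half + first_half[::-1]


def mdct(block, window):
    """Forward MDCT: 2*n windowed time samples -> n frequency coefficients."""
    n = len(block) // 2
    x = [s * w for s, w in zip(block, window)]
    return [
        sum(x[t] * math.cos(math.pi / n * (t + 0.5 + n / 2) * (k + 0.5))
            for t in range(2 * n))
        for k in range(n)
    ]


def imdct(coeffs, window):
    """Inverse MDCT: n coefficients -> 2*n windowed, time-aliased samples."""
    n = len(coeffs)
    y = [
        sum(coeffs[k] * math.cos(math.pi / n * (t + 0.5 + n / 2) * (k + 0.5))
            for k in range(n))
        for t in range(2 * n)
    ]
    return [2.0 / n * v * w for v, w in zip(y, window)]


def analysis_synthesis(signal, n, window):
    """Transform overlapping blocks (hop size n) and overlap-add the results."""
    padded = [0.0] * n + list(signal) + [0.0] * n
    padded += [0.0] * (-len(padded) % n)  # pad to a whole number of hops
    out = [0.0] * len(padded)
    for start in range(0, len(padded) - n, n):
        rebuilt = imdct(mdct(padded[start:start + 2 * n], window), window)
        for i, v in enumerate(rebuilt):
            out[start + i] += v  # aliasing cancels between neighbouring blocks
    return out[n:n + len(signal)]


n = 8
signal = [math.sin(0.3 * i) + 0.05 * i for i in range(40)]
restored = analysis_synthesis(signal, n, low_overlap_window(n))
max_error = max(abs(a - b) for a, b in zip(signal, restored))
print(f"max reconstruction error: {max_error:.2e}")
```

Because the outer eighths of the window are zero, the encoder needs fewer samples from the following block before it can transform the current one, which is where the latency saving comes from.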

The coefficients are grouped into frequency bands that largely correspond to those of human perception. The total energy of each band is determined, and these energy values are quantized (data reduction) and compressed using prediction: only the corrections to the predicted values have to be transmitted (delta coding).
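A minimal Python sketch of this energy coding step is given below. It assumes a logarithmic (dB) energy scale, a uniform quantizer and a prediction taken only from the previous block's decoded energies; the step size, the prediction coefficient and all function names are illustrative assumptions and are not taken from the CELT specification.

```python
import math


def band_energies_db(coeffs, band_edges):
    """Total energy of each band in dB; band_edges are coefficient indices."""
    energies = []
    for lo, hi in zip(band_edges[:-1], band_edges[1:]):
        energy = sum(c * c for c in coeffs[lo:hi])
        energies.append(10.0 * math.log10(energy + 1e-12))  # avoid log(0)
    return energies


def encode_energies(energies_db, prev_decoded_db, step_db=6.0, alpha=0.8):
    """Quantize the difference between each energy and its prediction.

    Returns the integer corrections to transmit and the energies the
    decoder will reconstruct (needed as the state for the next block).
    """
    corrections, decoded = [], []
    for energy, prev in zip(energies_db, prev_decoded_db):
        prediction = alpha * prev
        q = round((energy - prediction) / step_db)
        corrections.append(q)
        decoded.append(prediction + q * step_db)
    return corrections, decoded


def decode_energies(corrections, prev_decoded_db, step_db=6.0, alpha=0.8):
    """Rebuild the energies from the corrections and the shared state."""
    return [alpha * prev + q * step_db
            for q, prev in zip(corrections, prev_decoded_db)]


previous = [60.0, 50.0, 40.0]   # decoder state from the preceding block
current = [58.3, 47.1, 44.9]    # measured energies of the current block
corrections, decoded = encode_energies(current, previous)
print("transmitted corrections:", corrections)
print("decoder reconstructs:", decode_energies(corrections, previous))
```

Encoder and decoder both predict from the decoded values of the previous block, not from the original ones, so their states stay identical and quantization errors do not accumulate.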

The (unquantized) energy values are divided out of the MDCT coefficients (normalization). The coefficients of the residual signal obtained in this way (the "band shape") are encoded with pyramid vector quantization (PVQ, a spherical vector quantization). This coding yields code words of fixed (predictable) length, which allows tolerance to bit errors and also makes entropy coding unnecessary. At the end of the process, all output data of the encoder are packed into a single bit stream using range coding.

In connection with PVQ, CELT uses a technique known as band folding, which reuses lower-frequency coefficients for higher frequency bands. It is intended to achieve something similar to spectral band replication (SBR) while having significantly less impact on codec latency and complexity (computational effort). The resulting fuller spectrum in the affected frequency ranges prevents the annoying chirping artifacts ("birdie artifacts", "musical noise artifacts") that otherwise usually occur.
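The pyramid codebook can be illustrated with a short Python sketch. A PVQ code word for a band of N coefficients is an integer vector whose absolute values sum to K "pulses"; since the number of such vectors is known in advance, every code word fits in a fixed number of bits. The codebook size below uses the standard recurrence for this count, while the greedy pulse search and all function names are assumptions made for illustration and are not necessarily what the reference encoder does.

```python
import math
from functools import lru_cache


@lru_cache(maxsize=None)
def pvq_codebook_size(n, k):
    """Number of integer vectors of dimension n whose absolute values sum to k."""
    if k == 0:
        return 1
    if n == 0:
        return 0
    return (pvq_codebook_size(n - 1, k)
            + pvq_codebook_size(n, k - 1)
            + pvq_codebook_size(n - 1, k - 1))


def pvq_quantize(shape, k):
    """Greedily place k pulses so the result points along `shape`.

    Each step adds the pulse that maximizes the squared correlation
    divided by the energy of the pulse vector. Signs are copied from
    the input at the end.
    """
    mags = [abs(v) for v in shape]
    pulses = [0] * len(shape)
    corr, energy = 0.0, 0.0
    for _ in range(k):
        best_j, best_score = 0, -1.0
        for j, m in enumerate(mags):
            score = (corr + m) ** 2 / (energy + 2 * pulses[j] + 1)
            if score > best_score:
                best_j, best_score = j, score
        corr += mags[best_j]
        energy += 2 * pulses[best_j] + 1
        pulses[best_j] += 1
    return [-p if v < 0 else p for p, v in zip(pulses, shape)]


size = pvq_codebook_size(8, 4)
print("code words for N=8, K=4:", size)
print("bits per code word:", math.ceil(math.log2(size)))
print("quantized shape:", pvq_quantize([0.6, -0.8], 7))
```

The decoder only needs N and K to know how many bits to read, which is why the code word length is predictable without any signal-dependent statistics.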

The decoder unpacks the bit stream into its components, multiplies the separately coded energy values back onto the MDCT coefficients of the residual signal, and converts the result back into PCM data with the inverse MDCT. The individual blocks are reassembled by weighted overlap-add (WOLA). Many parameters are not transmitted explicitly but derived in the decoder using the same functions as in the encoder.
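The split into energy and residual, and the decoder's recombination of the two, can be sketched in Python as below. The function names are hypothetical and the energy is kept unquantized for simplicity. The sketch shows that the decoded band has the transmitted energy even when the shape is reproduced only coarsely, here by a hand-picked integer pulse vector standing in for a PVQ code word.

```python
import math


def split_gain_shape(band):
    """Encoder side: separate a band into its gain (root energy) and unit-norm shape."""
    gain = math.sqrt(sum(c * c for c in band))
    if gain == 0.0:
        return 0.0, [0.0] * len(band)
    return gain, [c / gain for c in band]


def merge_gain_shape(gain, shape):
    """Decoder side: rescale a (possibly coarse) shape to the transmitted gain."""
    norm = math.sqrt(sum(s * s for s in shape))
    if norm == 0.0:
        return [0.0] * len(shape)
    return [gain * s / norm for s in shape]


band = [3.0, -4.0, 0.0, 12.0]
gain, shape = split_gain_shape(band)

# Hypothetical coarse approximation of the shape, e.g. a decoded pulse vector.
coarse_shape = [1, -1, 0, 3]
decoded = merge_gain_shape(gain, coarse_shape)

print("original energy:", sum(c * c for c in band))
print("decoded energy: ", round(sum(c * c for c in decoded), 6))
```

Errors in the shape change how the energy is distributed inside the band, but not how much energy the band contains.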

For channel coupling, CELT offers M/S stereo and intensity stereo. Blocks can also be coded independently (intra-coded key blocks), for example to let a decoder join a running data stream. In transform codecs, sharp, high-energy sound events (transients) can cause audible quantization errors across the entire MDCT block. Because these errors are masked far less before the transient than after it, they can be perceived as pre-echo artifacts. In CELT, blocks can be subdivided further to counteract such artifacts.
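The idea behind M/S (mid/side) coupling can be shown with a small Python sketch. It uses the textbook sum and difference definition with a factor of one half; how CELT scales and applies the conversion internally is not described here, so the function names and the scaling are assumptions.

```python
def to_mid_side(left, right):
    """Convert left/right samples to mid (average) and side (half difference)."""
    mid = [(l + r) / 2.0 for l, r in zip(left, right)]
    side = [(l - r) / 2.0 for l, r in zip(left, right)]
    return mid, side


def to_left_right(mid, side):
    """Invert the conversion: left = mid + side, right = mid - side."""
    left = [m + s for m, s in zip(mid, side)]
    right = [m - s for m, s in zip(mid, side)]
    return left, right


left = [0.50, 0.25, -0.75, 1.00]
right = [0.50, 0.20, -0.70, 0.90]
mid, side = to_mid_side(left, right)
print("mid: ", mid)
print("side:", side)
```

When both channels are similar, as in this example, the side signal is small and can be coded with fewer bits than a second full channel.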

History

In 2005, Xiph began working on plans and drafts for a Vorbis successor as part of the Ghost project (initially discussed as Vorbis II). Besides the codec plans of Vorbis creator Christopher Montgomery, which were put on hold in favor of further development of Theora, this also led to Jean-Marc Valin's concept for a particularly low-latency method. Valin has been developing CELT since 2007 and committed the first code to the project's repository on November 29 of that year. In December 2007 the first development version, 0.0.1, was published, initially under the name Code-Excited Lapped Transform. CELT has been before the IETF since July 2009 as a proposal for a free codec standard for telecommunication over the Internet, and the IETF Codec Working Group now also takes part in its development.

As of version 0.9, the previously used pitch prediction in the frequency domain was replaced by a less complex solution using a pre-filter and a post-filter in the time domain, contributed by Raymond Chen of Broadcom.

With CELT 0.11 of February 4, 2011, the bit stream format was provisionally frozen, subject to final changes should they unexpectedly prove necessary.

Although the format has not yet been finalized, the method has been used since January 2009 in the IP telephony applications Ekiga and FreeSWITCH, and by now also in Mumble, TeamSpeak and other software. Shortly after the appearance of the hybrid codec Opus (formerly known as "Harmony"), CELT was incorporated as a basis of Opus and is now developed further only within that successor project. Opus is a superset of CELT and the SILK method, in which the CELT algorithms handle the upper part of the frequency range and SILK the lower part. The corresponding draft has been before the IETF since September 2010.

In April 2011, support for CELT was added to FFmpeg.

Software

The reference implementation is a library called libcelt, written in the C programming language and published as free software under Xiph's own three-clause BSD-like license.
