Codec2


Codec2 is a free, patent-free lossy audio codec that specializes in the intelligible transmission of human speech at extremely low bit rates. The codec was created for transmitting voice signals over narrowband radio channels in amateur radio. The reference implementation is licensed under version 2.1 of the GNU Lesser General Public License (LGPL).

The open specification of the method enables digital voice communication on amateur radio frequencies without resorting to the previously available proprietary codecs such as AMBE or MELP. Using those codecs means transmitting digital content in an undisclosed format, which radio amateurs are forbidden to do.

Codec2 has been officially integrated into FreeSWITCH, and a patch is available for integration into Asterisk.

Features

Codec2 offers fixed bit rate modes of 3,200, 2,400, 1,600, 1,400, 1,300, 1,200, 700 or 450 bit/s. It accepts and delivers PCM data with a sampling frequency of 8 kHz. The individual (parameter) data packets each cover 10 to 20 milliseconds (at 2.4 kbit/s) or 40 milliseconds (at 1.4 kbit/s). The author puts the algorithmic latency at around 100 milliseconds. Voice quality is slightly below that of ordinary 2G mobile phones and is said to be comparable to that of AMBE at a considerably lower bit rate.
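The relationship between bit rate, packet duration and packet size can be worked out with simple arithmetic. The following is a minimal sketch using only the figures given above (2.4 kbit/s with 20 ms packets, 1.4 kbit/s with 40 ms packets, 8 kHz sampling); the function names are hypothetical and are not part of the Codec2 library.

```python
SAMPLE_RATE_HZ = 8000  # PCM sampling frequency stated in the text


def bits_per_frame(bit_rate_bps, frame_ms):
    """Payload bits carried by one parameter packet (hypothetical helper)."""
    return bit_rate_bps * frame_ms // 1000


def samples_per_frame(frame_ms):
    """Number of 8 kHz PCM samples covered by one packet (hypothetical helper)."""
    return SAMPLE_RATE_HZ * frame_ms // 1000


# The two mode/duration pairs mentioned in the text.
for rate_bps, frame_ms in [(2400, 20), (1400, 40)]:
    print(
        f"{rate_bps} bit/s, {frame_ms} ms: "
        f"{bits_per_frame(rate_bps, frame_ms)} bits per packet, "
        f"{samples_per_frame(frame_ms)} PCM samples"
    )
```

Under these assumptions a 20 ms packet at 2.4 kbit/s holds 48 bits and represents 160 PCM samples, while a 40 ms packet at 1.4 kbit/s holds 56 bits and represents 320 samples.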

The reference implementation is written in C and so far still relies on floating-point arithmetic, although the method itself does not require it. The reference software package also contains an FDMDV software modem and a graphical user interface based on FLTK. The software is developed on Linux; in addition to the Linux version, a Windows port built with Cygwin is also offered.

As a matter of principle, lead developer Rowe avoided algorithms covered by valid patents by basing his method on techniques that have been known for decades. However, no comprehensive patent search had been carried out by the time the codec was presented at linux.conf.au in January 2012.

Technology

The method uses parametric audio coding based on a model of the human voice. At its core is a sinusoidal model that goes back to work by Robert J. McAulay and Thomas F. Quatieri (MIT Lincoln Laboratory) in the mid-1980s and is closely related to that of the multi-band excitation codecs. From the input signal, parameters describing the line spectrum pairs (a representation of the LPC coefficients), the pitch (fundamental frequency), the energy and the voicing of the signal are determined and quantized. On the receiver side, a PCM signal is synthesized from these parameters. The sinusoidal model exploits regularities (periodicity) in the pattern of overtone frequencies and layers harmonically related sinusoids over an estimated fundamental frequency. The amplitudes of the harmonics are modeled with linear predictive coding (LPC).
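The idea of layering harmonics over a fundamental frequency, with amplitudes taken from an LPC envelope, can be illustrated with a short sketch. This is not the Codec2 implementation: it ignores quantization, voicing and phase modeling, uses zero phases throughout, and all function names and parameter values are assumptions chosen for illustration.

```python
import cmath
import math

SAMPLE_RATE = 8000  # Hz, the PCM sampling frequency stated in the text


def lpc_envelope(lpc_coeffs, omega, gain=1.0):
    """Magnitude of the all-pole filter gain / A(e^{j*omega}).

    A(z) = 1 + a1*z^-1 + ... + ap*z^-p, with lpc_coeffs = [a1, ..., ap].
    """
    a_of_omega = 1.0 + sum(
        a_k * cmath.exp(-1j * omega * k)
        for k, a_k in enumerate(lpc_coeffs, start=1)
    )
    return gain / abs(a_of_omega)


def harmonic_amplitudes(f0_hz, lpc_coeffs, gain=1.0):
    """Sample the LPC envelope at each harmonic of f0 up to the Nyquist frequency."""
    w0 = 2.0 * math.pi * f0_hz / SAMPLE_RATE
    n_harmonics = int((SAMPLE_RATE / 2) // f0_hz)
    return [
        lpc_envelope(lpc_coeffs, m * w0, gain)
        for m in range(1, n_harmonics + 1)
    ]


def synthesize_frame(f0_hz, amplitudes, n_samples):
    """Sum harmonically related cosines (zero phase, for illustration only)."""
    w0 = 2.0 * math.pi * f0_hz / SAMPLE_RATE
    return [
        sum(a * math.cos(m * w0 * n) for m, a in enumerate(amplitudes, start=1))
        for n in range(n_samples)
    ]


# Illustrative values only: a 150 Hz pitch and a first-order LPC filter.
amps = harmonic_amplitudes(150.0, [-0.9])
frame = synthesize_frame(150.0, amps, 160)  # 160 samples = 20 ms at 8 kHz
```

In this sketch a frame is fully described by the fundamental frequency, the LPC coefficients and a gain, which is the reason a parametric codec can get by with so few bits per packet.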

History

The prominent free software advocate and radio amateur Bruce Perens saw the need for a free speech codec for bit rates below 5 kbit/s. In 2008 he approached Jean-Marc Valin (Speex, Opus), who introduced him to David Grant Rowe, later the main developer, who had worked with Valin on Speex on various occasions. Rowe is a radio amateur himself (callsign VK5DGR) and has experience in creating and using codecs and other signal processing algorithms for speech signals. Among other things, he obtained a doctorate in speech coding in the 1990s and was involved in setting up one of the first satellite telephony systems (Mobilesat).

Rowe was persuaded to take on the task and on August 21, 2009 announced his decision to work on such a codec. He built on the research and findings of his doctoral thesis. In August 2010 he released version 0.1 alpha.

Version 0.2 was released towards the end of 2011, introducing a 1,400 bit/s mode and bringing significant improvements in quantization.

In January 2012 at linux.conf.au, Jean-Marc Valin helped to improve the quantization of the line spectrum pairs, an area with which Rowe was less familiar. After several changes to the available bit rate modes during the winter and spring of 2011/2012, modes with 2,400, 1,400 and 1,200 bit/s have been available since May 2012.

In July 2018, a 450 bit/s mode was published, which had been developed as part of a master's thesis at the University of Erlangen-Nuremberg. Building on the principle of the 700C mode, careful training of the vector quantization made it possible to reduce the data rate further.
