Speex

File extension:	.spx
MIME type:	audio/speex
Developed by:	Xiph.Org Foundation
Type:	Audio format
Contained in:	Ogg
Standard(s):	specification

Speex is a free and patent-free lossy audio codec by Jean-Marc Valin, designed specifically for the space-efficient storage of audio data containing human speech. Like all speech codecs, it is generally unsuitable for other types of signal. The name is a phonetic spelling of the English word "speaks" (third-person singular present of "to speak").

On the official website, Speex is described as obsoleted by Opus, which is said to be superior in all respects.

The codec was developed under the umbrella of the Xiph.Org Foundation and is published under Xiph's BSD-style license. It was intended to complement Vorbis, the lossy general-purpose codec.

By default, the data is stored in the Ogg container format. Speex files nevertheless usually carry the extension .spx, which makes them easier to distinguish from Ogg Vorbis files. Speex can also be carried in other containers or without a container, as is common in IP telephony, where transmission usually takes place directly over UDP/RTP. Compared with general-purpose compression methods such as MP3 or Vorbis, music and other non-speech signals cannot be reduced in size without a clearly audible loss of quality; for spoken text, however, significantly better compression ratios are achieved.

Without a container, the MIME type of Speex is audio/speex, while audio/ogg is used for .spx files.
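The container distinction above can be checked programmatically. The following is a minimal sketch, assuming the standard Ogg page layout (a 27-byte header starting with "OggS", followed by a segment table) and assuming that the first packet of an Ogg Speex stream starts with the 8-byte string "Speex" padded with three spaces. The function names are hypothetical and not part of any Speex library.

```python
from typing import Optional

OGG_CAPTURE = b"OggS"       # capture pattern at the start of every Ogg page
SPEEX_MAGIC = b"Speex   "   # "Speex" padded with three spaces to 8 bytes
OGG_HEADER_LEN = 27         # fixed part of the Ogg page header


def first_packet_of_ogg_page(data: bytes) -> Optional[bytes]:
    """Return the first packet (or fragment) of the Ogg page at the start
    of data, or None if data does not begin with a complete page header."""
    if len(data) < OGG_HEADER_LEN or data[:4] != OGG_CAPTURE:
        return None
    n_segments = data[26]
    table = data[OGG_HEADER_LEN:OGG_HEADER_LEN + n_segments]
    if len(table) < n_segments:
        return None
    # A packet ends at the first lacing value below 255.
    length = 0
    for lacing in table:
        length += lacing
        if lacing < 255:
            break
    start = OGG_HEADER_LEN + n_segments
    return data[start:start + length]


def looks_like_ogg_speex(data: bytes) -> bool:
    """True if data starts with an Ogg page whose first packet carries
    the Speex identification string."""
    packet = first_packet_of_ogg_page(data)
    return packet is not None and packet.startswith(SPEEX_MAGIC)
```

A caller could use this to tell a .spx file from an Ogg Vorbis file even when the file extension is missing or wrong. Raw Speex frames without a container carry no such marker, so the check only applies to Ogg-wrapped data.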

Description

In contrast to many other speech codecs, Speex's bit rate range and its error tolerance and correction mechanisms are aimed not at mobile phone applications but at the conditions typical of IP telephony and file storage. The goal of the design was a codec that achieves both very good voice quality and low data rates, which resulted in a multi-bit-rate codec. Because the target is IP rather than mobile telephony, the expected transmission errors are lost packets rather than corrupted ones: UDP, the transport typically used, delivers a data packet either completely or not at all. These considerations led to the decision to use Code Excited Linear Prediction (CELP) as the fundamental technology behind Speex. One main reason is that CELP had already proven suitable for low bit rates (for example DoD CELP at 4.8 kbit/s) as well as for higher ones (such as G.728 at 16 kbit/s).
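The "lost rather than corrupted" error model can be illustrated from the receiver's side. The sketch below assumes that each packet carries a sequence number (as RTP packets do) and marks missing packets as None, so that a decoder could run its loss concealment for those frames. The function names are hypothetical, and sequence number wrap-around is ignored for brevity.

```python
from typing import Dict, Iterator, Optional, Tuple


def frames_in_order(
    received: Dict[int, bytes], first_seq: int, last_seq: int
) -> Iterator[Tuple[int, Optional[bytes]]]:
    """Yield (sequence number, payload) for every expected packet.

    The payload is None when the packet never arrived; a packet that did
    arrive is assumed to be intact, as in the UDP model described above.
    """
    for seq in range(first_seq, last_seq + 1):
        yield seq, received.get(seq)


def loss_rate(received: Dict[int, bytes], first_seq: int, last_seq: int) -> float:
    """Fraction of expected packets that are missing."""
    expected = last_seq - first_seq + 1
    lost = sum(
        1 for _, payload in frames_in_order(received, first_seq, last_seq)
        if payload is None
    )
    return lost / expected
```

In a real receiver, each None entry would trigger the codec's concealment path instead of a normal decode, so playback continues without a gap.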

Features

The most important features can be summarized as follows:

Free software / open source, free of patents and royalties
Wide data rate range (from 2 to 44 kbit/s)
Different levels of complexity
Comparatively high sampling rates (up to 48 kHz)
Encoding at different bandwidths within one and the same data stream
Dynamic bit rate switching and variable bit rate (VBR)
Optional intensity stereo coding
Packet loss concealment
Echo cancellation
Voice activity detection (VAD; integrated in the variable bit rate mode)

Sampling rates

To allow very good quality, Speex supports sampling rates higher than the 8 kHz usual for telephone quality. It supports sampling rates up to 48 kHz but is designed primarily for 8, 16 and 32 kHz, known as narrowband, wideband and ultra-wideband.
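The relationship between sampling rate and band can be sketched as a small lookup. The example assumes 20 ms frames and uses cut-over points between the bands that were chosen for this sketch only; neither the thresholds nor the function names come from the text above.

```python
FRAME_MS = 20  # assumption: one frame covers 20 ms of audio

# Illustrative cut-over points between the three design rates (8, 16, 32 kHz).
NARROWBAND_MIN_HZ = 6000
WIDEBAND_MIN_HZ = 12500
ULTRA_WIDEBAND_MIN_HZ = 25000
MAX_RATE_HZ = 48000


def band_for_rate(sample_rate_hz: int) -> str:
    """Map a sampling rate to the band whose design rate fits it best."""
    if not NARROWBAND_MIN_HZ <= sample_rate_hz <= MAX_RATE_HZ:
        raise ValueError(f"unsupported sampling rate: {sample_rate_hz} Hz")
    if sample_rate_hz > ULTRA_WIDEBAND_MIN_HZ:
        return "ultra-wideband"
    if sample_rate_hz > WIDEBAND_MIN_HZ:
        return "wideband"
    return "narrowband"


def samples_per_frame(sample_rate_hz: int) -> int:
    """Number of samples in one frame at the given sampling rate."""
    return sample_rate_hz * FRAME_MS // 1000
```

Under these assumptions, the three design rates give frames of 160, 320 and 640 samples respectively.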

Quality

Speex encoding is controlled mainly by a single parameter that sets the quality level, with values from 0 to 10. For constant bit rate (CBR) operation the value is an integer; for variable bit rate operation it is a floating-point number.

Complexity

Speex allows the encoder to be set to different levels of complexity. The search depth is set by an integer between 1 and 10. Level 10 typically lowers the noise level by about one to two decibels compared with level 1 but increases the computational cost by a factor of about 5. Levels 2 to 4 are recommended as a good compromise, although higher settings often help with signals that contain something other than human speech.
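The value ranges of the quality and complexity parameters can be captured in a small settings object. This is a minimal sketch: the class name, field names and default values are assumptions made for illustration and do not correspond to an actual Speex API.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class EncoderSettings:
    """Hypothetical container for the two encoder parameters."""

    quality: Union[int, float] = 8   # arbitrary default for this sketch
    complexity: int = 3              # inside the recommended range 2 to 4
    vbr: bool = False

    def __post_init__(self) -> None:
        if not 0 <= self.quality <= 10:
            raise ValueError("quality must be between 0 and 10")
        # CBR takes integer quality levels; VBR also accepts fractions.
        if not self.vbr and not isinstance(self.quality, int):
            raise ValueError("CBR quality must be an integer")
        if not isinstance(self.complexity, int) or not 1 <= self.complexity <= 10:
            raise ValueError("complexity must be an integer between 1 and 10")
```

Validating at construction time keeps invalid combinations, such as a fractional quality in CBR mode, from reaching the encoder at all.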

Variable bit rate (VBR)

VBR allows the codec to adapt the bit rate dynamically to the complexity of the signal. In the case of Speex, this means, for example, that vowels and strong transients need more data than fricatives for an adequate representation. With a variable bit rate, higher quality is therefore possible for the same amount of data, or less data is produced for comparable quality. This mode is less suited to streaming applications, because the capacity of the transmission channel imposes a fixed upper limit that may be exceeded if a target quality level is specified and the input signal contains an overly complex passage. Furthermore, the average bit rate cannot be predicted in this mode.

Average bit rate (ABR)

The quality is adjusted dynamically in real time (open loop) to reach a specific target bit rate, which makes the average bit rate predictable. Overall, the quality is somewhat lower than if the encoder had been run in true VBR mode with exactly the quality setting that yields the desired average bit rate.
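The idea of steering the quality level toward a target bit rate can be sketched with a toy controller. This is not Speex's actual rate control: the frame length of 20 ms, the step size, and the caller-supplied cost model bits_for are all assumptions made for illustration.

```python
from typing import Callable, Iterable, List

FRAME_MS = 20  # assumption: one frame covers 20 ms of audio


def abr_qualities(
    frame_costs: Iterable[float],
    target_bps: float,
    bits_for: Callable[[float, float], float],
    start_quality: float = 5.0,
    step: float = 0.5,
) -> List[float]:
    """Return the quality level chosen for each frame.

    bits_for(quality, cost) is a hypothetical model of how many bits a
    frame of the given signal complexity needs at the given quality.
    """
    target_bits = target_bps * FRAME_MS / 1000.0
    quality = start_quality
    spent = 0.0
    chosen: List[float] = []
    for count, cost in enumerate(frame_costs, start=1):
        chosen.append(quality)
        spent += bits_for(quality, cost)
        average = spent / count
        # Nudge the quality down when over budget, up when under budget.
        if average > target_bits:
            quality = max(0.0, quality - step)
        elif average < target_bits:
            quality = min(10.0, quality + step)
    return chosen
```

Because the controller only reacts to bits already spent, it cannot pick the ideal quality in advance, which is one way to see why ABR tends to fall slightly short of a well-tuned VBR setting.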

Voice activity detection (VAD)

Speex detects silence and background noise and stores only descriptive parameters for those passages. These parameters allow the decoder to produce background noise that sounds similar to the human ear, so-called comfort noise (comfort noise generation, CNG). This technique is included in the variable bit rate mode.
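As a rough illustration of what detecting speech pauses involves, the sketch below classifies frames by their RMS energy. This is a generic energy-threshold technique, not the detector Speex uses; the frame length of 160 samples and the threshold of 500 (for 16-bit PCM) are arbitrary assumptions.

```python
import math
from typing import List, Sequence


def rms(frame: Sequence[float]) -> float:
    """Root mean square amplitude of one frame."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def classify_frames(
    samples: Sequence[float], frame_len: int = 160, threshold: float = 500.0
) -> List[bool]:
    """Return one flag per complete frame: True for active, False for pause."""
    flags: List[bool] = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        flags.append(rms(samples[start:start + frame_len]) > threshold)
    return flags
```

Frames flagged False are the ones for which an encoder would store only noise parameters instead of a full speech frame.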

Discontinuous transmission (DTX)

This technique is an addition to variable bit rate and voice activity detection: data transmission can be stopped entirely while the background noise remains constant. In file-based operation, placeholder frames of five bits each are generated instead, which corresponds to a bit rate of 250 bit/s.
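The figure of 250 bits per second follows from five bits per frame if one frame covers 20 ms, that is, 50 frames per second. The frame length is an assumption of this sketch, and the function names are hypothetical.

```python
def bitrate_bps(bits_per_frame: int, frame_ms: float = 20.0) -> float:
    """Bit rate that results from a fixed number of bits per frame."""
    return bits_per_frame * 1000.0 / frame_ms


def placeholder_bytes(seconds: float, bits_per_frame: int = 5) -> float:
    """Storage taken up by placeholder frames over the given duration."""
    return bitrate_bps(bits_per_frame) * seconds / 8.0
```

By this calculation, an hour of constant background noise in a file costs about 112,500 bytes of placeholder frames.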

Perceptual enhancement

This refers to techniques that hide the deviations from the original signal introduced by the encoding and decoding process from human perception. In favor of a subjectively better sound, this usually moves the signal further away from the original in objective terms.

Algorithmic delay

With Speex, this corresponds to the length of one frame plus a certain amount of look-ahead that is required before processing of a frame can begin. Narrowband mode (8 kHz) has a delay of 30 ms; wideband mode (16 kHz) has a delay of 34 ms.
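The two delay figures can be decomposed into frame length plus look-ahead. The sketch assumes a frame length of 20 ms; the look-ahead values of 10 ms and 14 ms are not stated in the text but inferred by subtracting that frame length from the quoted totals.

```python
FRAME_MS = 20  # assumption: one frame covers 20 ms of audio

# Inferred as (stated delay) - (assumed frame length); not given in the text.
LOOKAHEAD_MS = {"narrowband": 10, "wideband": 14}
SAMPLE_RATE_HZ = {"narrowband": 8000, "wideband": 16000}


def algorithmic_delay_ms(band: str) -> int:
    """Frame length plus look-ahead, in milliseconds."""
    return FRAME_MS + LOOKAHEAD_MS[band]


def algorithmic_delay_samples(band: str) -> int:
    """The same delay expressed in samples at the band's sampling rate."""
    return algorithmic_delay_ms(band) * SAMPLE_RATE_HZ[band] // 1000
```

This delay is a property of the algorithm alone; network and buffering delays in a real call come on top of it.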

Application

Speex is used mainly for telecommunication over the Internet, for example for IP telephony and for voice chat in online games (for example with TeamSpeak, Mumble and Counter-Strike). Other areas of application are streaming audio, audiobooks and spoken podcasts. Accordingly, Speex is supported by a large number of programs from many areas, including audio players (Winamp, XMMS, foobar2000), audio editors, IP telephony programs (Ekiga, Jitsi, Jabbin, Linphone, KPhone, Twinkle) and video games. A list of programs and plug-ins is available on the Speex website. A DirectShow filter and an ACM codec exist, on which the Speex support of many programs is built. The US Army uses Speex in the EPLRS radio system, designed by Raytheon, of its Land Warrior system. Microsoft also uses Speex for the Xbox Live headset, as Ralph Giles, the maintainer of the Theora codec, reported on LugRadio. Speex can be played on iPods and other portable media players running the open-source firmware Rockbox.

Since Flash Player 10, Speex can be used in Adobe Flash (alongside ADPCM, HE-AAC, MP3 and Nellymoser) as a replacement for the outdated Nellymoser codec.

As the Chaos Computer Club reported in its publication on the Federal Trojan, Speex is also used there to compress voice recordings.

References

wiki.xiph.org
CCC report on the analysis of the state Trojan (PDF; 191 kB).
