Convolution reverb

from Wikipedia, the free encyclopedia

Convolution reverb is an acoustic effect in which reverberation simulates a real or virtual space. To do this, the original audio signal is passed through a digital filter that mimics the acoustic properties of the room. Producing a convolution reverb is usually preceded by the acoustic measurement of a real room. The standard method for this is called Multi Impulse Response (MIR).

General

In contrast to synthetic reverberation, which simulates certain types of room through artificially generated reflections, convolution reverb is based on a sample of an acoustic room. By playing a test signal (e.g. a sine sweep or white noise), the individual reverberation of any room can be captured as an impulse response using a stereo microphone. The result is a characteristic signal curve, also referred to as the "fingerprint" of the room's individual sound. Any audio signal that does not itself contain reflections can then be given this individual room sound. After processing, the audio signal sounds as if it had been recorded at the location where the impulse response was captured, including real reflections. The listening perspective always corresponds to the microphone position used when recording the impulse responses and also depends on the selected stereophony method. The perceived location of the sound source corresponds to that of the original sound source of the recorded sound event.
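The measurement described above can be sketched in a few lines. The following is a minimal illustration, not a production measurement chain: the sampling rate is reduced to keep it fast, and the "room" is a synthetic decaying-noise impulse response standing in for a real recording. Dividing the spectrum of the recorded sweep response by the spectrum of the sweep recovers the impulse response.

```python
import numpy as np

fs = 8_000                         # small sampling rate to keep the example fast
t = np.arange(fs) / fs             # one second of sample times

# Exponential sine sweep from f0 to f1 as the test signal.
f0, f1 = 20.0, 3_500.0
L = np.log(f1 / f0)
sweep = np.sin(2 * np.pi * f0 / L * (np.exp(t * L) - 1))

# A toy "room": exponentially decaying noise stands in for a real response.
rng = np.random.default_rng(0)
ir_true = rng.standard_normal(fs // 4) * np.exp(-6 * np.arange(fs // 4) / fs)

recorded = np.convolve(sweep, ir_true)   # what the microphone would capture

# Deconvolution: dividing the spectra recovers the impulse response
# (a tiny constant guards against division by near-zero bins).
n = len(recorded)
H = np.fft.rfft(recorded) / (np.fft.rfft(sweep, n) + 1e-12)
ir_est = np.fft.irfft(H, n)[: len(ir_true)]
```

In practice the recorded response also contains noise and distortion, which is why exponential sweeps (rather than white noise) are often preferred: they concentrate energy at low frequencies and push harmonic distortion away from the recovered impulse response.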

The advantages are a realistic sound and the free availability of numerous impulse responses on the Internet. In addition, the technology is cheaper than the alternatives of high-end effects units or recording on site. Rooms can also be simulated that do not actually exist at all (e.g. for films).

A disadvantage is that VST-based plug-ins use a lot of CPU power. In addition, impulse responses are rigid and cannot be edited afterwards (e.g. the position in the room). Slight latencies (varying with the degree of data reduction or approximation and the available computing power) are another drawback.

Basic principle

[Figure: the conversion between time domain and frequency domain via FFT and IFFT]

Every sound, i.e. every audio signal, can be seen as a mixture of one or more sine tones (individual frequencies). The audio signal is the sum of these oscillations.

The sound of an audio signal, for example that of an instrument, results from the momentary presence of all its frequencies at any given point in time. Each of these frequencies has a specific sound pressure amplitude and a specific phase.

With the fast Fourier transform (FFT), a section of an audio signal (time domain) can be represented in the frequency domain. Conversely, any constellation in the frequency domain can be turned back into a section of an audio signal by the inverse FFT (IFFT). Every change in the frequency domain (e.g. a change in the amplitude of a frequency) results in a characteristic change of the sound once the IFFT has transferred it back to the time domain.
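A minimal sketch of this round trip, using NumPy's real-valued FFT: the transform is lossless, and zeroing a single frequency bin removes exactly that partial from the signal. The example signal (two sine tones at 440 Hz and 880 Hz) is an assumption for illustration.

```python
import numpy as np

fs = 8_000                               # sampling rate in Hz (one-second signal)
t = np.arange(fs) / fs
signal = np.sin(2 * np.pi * 440 * t) + 0.5 * np.sin(2 * np.pi * 880 * t)

spectrum = np.fft.rfft(signal)           # time domain -> frequency domain
roundtrip = np.fft.irfft(spectrum, len(signal))   # and back via the inverse FFT

# The FFT/IFFT round trip is lossless up to floating-point error.
print(np.allclose(signal, roundtrip))    # True

# A change in the frequency domain changes the sound in the time domain:
spectrum[880] = 0                        # bin 880 = 880 Hz at 1 Hz resolution
filtered = np.fft.irfft(spectrum, len(signal))    # only the 440 Hz tone remains
```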

Convolution

In theory, it would not be necessary to multiply spectra in the frequency domain at all. Instead, each point in time of the signal to be reverberated could be combined directly with each point in time of the impulse response. The calculation rule for this is called convolution:

(f ∗ g)(t) = ∫_{−∞}^{+∞} f(τ) · g(t − τ) dτ .

Since the calculation is carried out digitally, both signals (the signal to be reverberated and the impulse response) are discrete signals: they consist of a finite number of values, so-called audio samples, taken at regular intervals. This also limits the number of calculation steps. At a sampling frequency (sampling rate) of 44.1 kHz, each audio channel has 44,100 samples per second. Convolution at the discrete level is defined by

(f ∗ g)[n] = Σ_k f[k] · g[n − k] ,

where the sum runs over all k for which both indices are valid.
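The discrete convolution sum can be written out directly as a double loop. This sketch (the toy signals are assumptions for illustration) follows the formula literally and is checked against NumPy's built-in np.convolve:

```python
import numpy as np

def convolve_direct(f, g):
    """Discrete convolution (f * g)[n] = sum over k of f[k] * g[n - k]."""
    n_out = len(f) + len(g) - 1          # the result is longer than either input
    out = np.zeros(n_out)
    for n in range(n_out):
        for k in range(len(f)):
            if 0 <= n - k < len(g):      # only valid indices contribute
                out[n] += f[k] * g[n - k]
    return out

dry = np.array([1.0, 0.5, 0.25])         # toy signal to be reverberated
ir = np.array([1.0, 0.0, 0.3])           # toy impulse response

print(convolve_direct(dry, ir))          # matches np.convolve(dry, ir)
```

The double loop makes the cost visible: the number of multiplications grows with the product of the two signal lengths, which is exactly why the frequency-domain route described next is used in practice.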

However, this convolution is computationally expensive. The signals are therefore not processed in the time domain as shown here, but by multiplications in the frequency domain.

Frequency spectra are generated by FFT for both the overall course (time domain) of the signal to be reverberated and that of the impulse response.

With convolution reverb, the signal to be reverberated is transferred into the frequency domain, and the so-called impulse response (see introduction) is likewise available in the frequency domain. There, the two spectra are multiplied frequency bin by frequency bin, and the result is transferred back into the time domain (at the correct position in time) via IFFT. The result is again a time-domain waveform: the reverberated signal.

On a digital level, this means that every sample of the original audio signal is scaled with every sample of the impulse response.

As with the two input signals, the new signal has an individual value at each point in time. A sound or an overall audio signal is not described by a single periodic function; it behaves differently at each point in time. This is why convolution requires the relatively high computational effort in which every point in time (or every sample) of one signal has to be combined with every sample of the other signal.

Each point in time of one signal is combined with each point in time of the other signal, and the result is transferred back into the time domain via IFFT.
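In real-time processing the signal is not available as a whole, so implementations typically process it block by block and add the reverb tail of each block into the following ones (the overlap-add method). A minimal sketch of this idea, with an assumed block size of 256 samples and random test signals:

```python
import numpy as np

def overlap_add(signal, ir, block=256):
    """Block-wise FFT convolution (overlap-add): each input block is convolved
    with the impulse response, and its tail overlaps into later blocks."""
    n_fft = block + len(ir) - 1
    H = np.fft.rfft(ir, n_fft)                   # IR spectrum, computed once
    out = np.zeros(len(signal) + len(ir) - 1)
    for start in range(0, len(signal), block):
        chunk = signal[start:start + block]
        spec = np.fft.rfft(chunk, n_fft) * H     # multiply in frequency domain
        seg = np.fft.irfft(spec, n_fft)
        end = min(start + n_fft, len(out))       # clamp the last, shorter block
        out[start:end] += seg[: end - start]     # add the tail into later output
    return out

rng = np.random.default_rng(2)
dry, ir = rng.standard_normal(2048), rng.standard_normal(100)
print(np.allclose(overlap_add(dry, ir), np.convolve(dry, ir)))   # True
```

Real convolution reverbs refine this further (e.g. partitioning the impulse response itself so that the first partition can be computed with low latency), but the overlap-add principle is the core.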

Rendering and data reduction

The convolution reverb can be computed in real time (as a real-time effect) or by rendering. Real-time calculation means that the reverberated signal is computed while it is being played back; because of the large computing load, this always involves a certain delay (latency). Rendering means that the reverberated signal, or its audio file, is computed offline; playback is then possible without latency.

For example, if the impulse response, i.e. the reverberation time, is five seconds long and the signal to be reverberated (e.g. an instrument) lasts one minute, then at a sampling rate of 44.1 kHz the number of convolution operations for a stereo signal is:

60 × 44,100 × 5 × 44,100 × 2 = 1,166,886,000,000

That is over a trillion elementary operations for one minute of reverberated stereo signal.
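The figure above can be reproduced directly from the stated quantities:

```python
seconds = 60          # length of the signal to be reverberated
fs = 44_100           # sampling rate in Hz
reverb = 5            # length of the impulse response in seconds
channels = 2          # stereo

# Every sample of the dry signal is combined with every sample of the IR,
# for each channel.
ops = seconds * fs * reverb * fs * channels
print(f"{ops:,}")     # 1,166,886,000,000
```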

With today's computing power, convolution reverb can only be approximated in real time; otherwise the latency would be unacceptably long. Even offline rendering is approximate today because of the computing capacity required. The MIR application of the Vienna Symphonic Library currently offers the most accurate simulation: the instruments of an orchestra are reverberated individually, and the individual radiation characteristics of each instrument are taken into account.
