Localization (acoustics)
Localization refers to the detection of the direction and the distance of a sound source, i.e. directional hearing and distance hearing: directional localization and distance localization. Localization is a passive process, in contrast to ranging, which describes an active process in which the behavior of a transmitted signal is used to locate an object (e.g. sonar, or echolocation in animals).
The localization of sound sources is the result of both two-eared (binaural) hearing - in the horizontal plane - and one-eared (monaural) hearing - in the median plane. This article describes the localization of sound sources in humans. In animals, further effects also play a role (for example, the influence of ear movements).
Principle of localization in space
The figure shows the planes that can be used to localize a sound source in space. However, only the following information is required for an unambiguous localization:
- an angle of incidence in a half plane
- an angle of incidence in a full plane
- a distance
The first two pieces of information span the entire solid angle (by rotating the half plane through the angle in the full plane). This also corresponds to the interaction of the mechanisms that the hearing uses to localize sound sources.
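The claim above can be sketched numerically: one half-plane angle, one full-plane angle and a distance fix every point in space. The coordinate convention below (lateral angle in a half plane around the interaural axis, median angle in a full plane) is an assumption made for this sketch:

```python
import math

def to_cartesian(lateral_deg, median_deg, distance_m):
    """Map (half-plane angle, full-plane angle, distance) to a point in space.

    lateral_deg: -90..+90 (left-right, the half-plane angle)
    median_deg:  0..360 (front/up/back/down, the full-plane angle)
    """
    lat = math.radians(lateral_deg)
    med = math.radians(median_deg)
    x = distance_m * math.cos(lat) * math.cos(med)  # front
    y = distance_m * math.sin(lat)                  # right
    z = distance_m * math.cos(lat) * math.sin(med)  # up
    return x, y, z

print(to_cartesian(0, 0, 2.0))  # (2.0, 0.0, 0.0): 2 m straight ahead
```

Rotating the half plane (the lateral angle) through the full-plane median angle sweeps out all directions, which is exactly the interaction of the two mechanisms described in the text.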
Depending on the mechanisms that the hearing uses for localization, the following categories must be distinguished (half-plane, full-plane and distance):
- Determination of the lateral direction of incidence of the sound.
For this purpose, the hearing evaluates time differences and level differences between the two ears. This distinguishes between the directions left, straight ahead and right. These mechanisms cannot distinguish between front and back (straight ahead does not mean front here), so an angle of incidence for the entire horizontal plane cannot be determined by the hearing with these mechanisms alone.
- Determination of the direction of incidence in the median plane.
For this purpose, the hearing evaluates the resonances of the outer ear. A distinction is made between front, top, back and bottom - but not between right and left.
- Distance of the sound source.
For this purpose, the hearing evaluates reflection patterns and timbres from memory.
The first two mechanisms can be used to determine the solid angle at which the sound is incident, and the last mechanism to determine the distance.
The hearing has no direct mechanisms for evaluating a direction of incidence in the frontal plane. Sound sources in the frontal plane are localized through the combination of mechanisms for the horizontal angle of incidence and the median plane.
Determination of the lateral direction of incidence: left, straight ahead, right
To determine the lateral direction of incidence (sound from the left, straight ahead or the right), the hearing evaluates the following information in the ear signals:
- Time differences between the two ears (interaural time difference, ITD)
Sound from the right reaches the right ear earlier than the left ear; the maximum ITD is about 0.63 ms. A distinction is made between the evaluation of phase delays at low frequencies and the evaluation of group delays at high frequencies.
- Frequency-dependent level differences between the two ears (interaural level difference, ILD)
Sound from the right has a higher level at the right ear than at the left ear, because the head shadows the signal at the left ear. These level differences are strongly frequency-dependent and increase with increasing frequency.
At low frequencies below approx. 800 Hz, mainly transit-time differences (phase delays) are evaluated; at high frequencies above approx. 1600 Hz, mainly level differences are evaluated. In between lies an overlap region in which both mechanisms play a role; the quality of the direction determination is not impaired by this.
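The ITD figure cited above can be reproduced with a simple path-length model (a far-field approximation for illustration, not the hearing's actual processing):

```python
import math

SPEED_OF_SOUND = 343.0  # m/s in air at ~20 °C
EAR_DISTANCE = 0.215    # m, ear-to-ear distance used in this article

def itd(azimuth_deg):
    """Interaural time difference in seconds for a source at the given azimuth.

    Simple path-length model: delta_t = (d / c) * sin(theta).
    """
    return EAR_DISTANCE / SPEED_OF_SOUND * math.sin(math.radians(azimuth_deg))

def dominant_cue(frequency_hz):
    """Which interaural cue dominates at this frequency, per the text above."""
    if frequency_hz < 800:
        return "phase delay (ITD)"
    if frequency_hz > 1600:
        return "level difference (ILD) and group delay"
    return "overlap region: both cues"

# Maximum ITD, for sound arriving fully from the side (90°):
print(round(itd(90.0) * 1e6))  # 627 µs, close to the ~0.63 ms cited
print(dominant_cue(500))       # phase delay (ITD)
```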
Evaluation at low frequencies
At low frequencies below 800 Hz, the dimensions of the head (ear-to-ear distance d = 21.5 cm, corresponding to a maximum transit-time difference of 632 µs) are smaller than half the wavelength of the sound, so phase delays between the two ears can be evaluated without ambiguity. The level differences at these frequencies are so small that they do not allow an exact evaluation. Below approx. 80 Hz, the direction of a sound can no longer be determined.
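A quick numerical check of the half-wavelength argument (assuming a speed of sound of 343 m/s):

```python
SPEED_OF_SOUND = 343.0  # m/s, assumed value for air

def half_wavelength_m(frequency_hz):
    """Half the acoustic wavelength at the given frequency."""
    return SPEED_OF_SOUND / frequency_hz / 2.0

print(round(half_wavelength_m(800), 3))   # 0.214 m: about the ear-to-ear distance
print(round(half_wavelength_m(1600), 3))  # 0.107 m: smaller than the head
```

At 800 Hz the half wavelength matches the 21.5 cm head dimension almost exactly, which is why phase evaluation becomes ambiguous above this frequency.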
Evaluation at high frequencies
At high frequencies above 1600 Hz, the dimensions of the head are larger than the wavelength of the sound, so the hearing can no longer unambiguously determine the direction from phase delays. The level differences, on the other hand, become larger and are then also evaluated by the hearing.
In addition, group delays between the two ears are evaluated, even at higher frequencies: if a sound starts anew, the direction can be determined from the delay of the sound onset between the two ears. This mechanism is especially important in reverberant environments. At the onset of a sound there is a short period in which the direct sound has already reached the listener, but no reflected sound has. The hearing uses this period, the initial time delay gap (ITDG), to determine the direction, and maintains the detected direction as long as reflections prevent a clear new determination.
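The evaluation of onset delays between the two ears has a common technical analogue: estimating the ITD by cross-correlating the two ear signals within the physiologically possible range. The sample rate and the synthetic test signal below are made up for the demonstration:

```python
import numpy as np

FS = 48_000  # sample rate in Hz (assumed for this sketch)

def estimate_itd(left, right, fs=FS, max_itd=0.00063):
    """Delay of `right` relative to `left` in seconds (positive: the right ear
    lags, i.e. the source lies to the left). Searched only within ±0.63 ms."""
    max_lag = int(max_itd * fs)
    lags = list(range(-max_lag, max_lag + 1))
    corr = [float(np.dot(left[max_lag:-max_lag],
                         np.roll(right, -lag)[max_lag:-max_lag])) for lag in lags]
    return lags[int(np.argmax(corr))] / fs

# Synthetic onset: a click that reaches the left ear 10 samples before the right
rng = np.random.default_rng(0)
click = np.zeros(2000)
click[500:520] = 1.0
left = click + rng.normal(0, 0.01, 2000)
right = np.roll(click, 10) + rng.normal(0, 0.01, 2000)

print(round(estimate_itd(left, right) * 1000, 3))  # 0.208 ms, i.e. ~10 samples
```

Restricting the search to the physically possible ±0.63 ms mirrors the fact that larger interaural delays cannot occur for a single source.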
These mechanisms cannot differentiate between front and back. Accordingly, the entire horizontal plane cannot be spanned by these mechanisms.
Determination of the direction of incidence in the median plane: front / back and top / bottom
The human outer ear, i.e. the auricle and the beginning of the auditory canal, acts as a directionally selective filter. Depending on the direction of sound incidence in the median plane, different resonances in the structure of the auricle are excited, so each of these directions (front, up, back, down) has a different resonance pattern. This gives the frequency response of the ear signals direction-specific patterns that are evaluated by the hearing-brain system (direction-determining bands).
These patterns in the frequency response are individual, depending on the shape and size of the listener's own auricle. If sound that was picked up by another head with differently shaped auricles is presented through headphones, the direction in the median plane can no longer be recognized reliably. Examples are the predominant rear localization of dummy-head recordings and "inside-the-head localization" (ICL).
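The pattern evaluation described here can be illustrated with a toy nearest-template classifier; the band levels below are invented and far coarser than real auricle resonances:

```python
# Stored, direction-specific resonance patterns (relative band levels in dB).
# The numbers are made up purely for illustration.
TEMPLATES = {
    "front": [0, 3, -2, 1],
    "up":    [0, -1, 4, -3],
    "back":  [0, -4, 1, 2],
}

def classify_direction(band_levels_db):
    """Pick the stored pattern closest (least squares) to the observed one."""
    def dist(direction):
        return sum((a - b) ** 2
                   for a, b in zip(band_levels_db, TEMPLATES[direction]))
    return min(TEMPLATES, key=dist)

print(classify_direction([0, 2.5, -1.5, 1.0]))  # front
```

A listener using another head's templates corresponds to classifying with the wrong `TEMPLATES` table, which is why median-plane directions are then confused.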
Determination of the distance from the sound source
Determining the distance of a sound source is only possible to a limited extent in humans. In the close range, for example, extreme level differences (such as when someone whispers directly into one ear) and special resonance patterns of the auricle serve as indicators for the distance.
The following information can be used for distance perception:
- Frequency spectrum
- In air, high frequencies are attenuated more strongly than low frequencies. Therefore, the further away a sound source is, the more muffled it is - the high frequency components are missing. For sound with a known spectrum (e.g. speech), this can be used to estimate the distance.
- Volume
- More distant sound sources have a lower volume than closer ones. This aspect is particularly important with familiar sound sources such as speaking people.
- Movement
- Similar to the visual system, there is also the phenomenon of motion parallax in hearing: if the listener moves, closer stationary sound sources pass by him faster than more distant ones.
- Sound reflections
- In rooms, two types of sound reach our ears: the primary sound arrives directly from the sound source, while the secondary or reflected sound arises from reflections of the source's sound off the walls. The ratio of primary to reflected sound allows the distance to the sound source to be estimated.
- Initial time gap
- The initial time gap is the time between the arrival of the direct sound and the arrival of the first strong reflection. Sound sources make a close impression in the room if the initial time gap is long and the room sound has a low level in relation to the direct signal. They make a distant impression if the initial time gap is short or absent and the room sound level is high in relation to the direct signal.
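Two of the cues listed above, the volume drop with distance and the air absorption of high frequencies, can be sketched numerically. The inverse-square law (-6 dB per doubling of distance) is standard for a point source in the free field; the absorption figure used here is a rough assumed value, not a measured one:

```python
import math

def level_drop_db(distance_m, reference_m=1.0):
    """Free-field level change of a point source relative to the reference
    distance (inverse-square law: -6 dB per doubling of distance)."""
    return -20.0 * math.log10(distance_m / reference_m)

def air_absorption_db(distance_m, db_per_100m=5.0):
    """Extra attenuation of high frequencies over distance. The default of
    ~5 dB per 100 m is an assumed round value for a few kHz in air."""
    return -db_per_100m * distance_m / 100.0

print(round(level_drop_db(2), 1))        # -6.0 dB: doubling the distance
print(round(level_drop_db(4), 1))        # -12.0 dB: doubling it again
print(round(air_absorption_db(100), 1))  # -5.0 dB extra loss in the highs at 100 m
```

The extra high-frequency loss is what makes distant sources sound muffled, as described above.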
Signal processing
The human hearing localizes sound sources within so-called frequency groups (critical bands). The hearing range is divided into about 24 such bands, each with a width of 1 Bark or 100 Mel. To determine a direction, the signal components within a frequency group are evaluated together.
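The Bark scale mentioned above can be computed with the well-known Zwicker/Terhardt approximation, which maps the audible range up to roughly 15.5 kHz onto about 24 Bark:

```python
import math

def hz_to_bark(f_hz):
    """Critical-band rate in Bark (Zwicker & Terhardt approximation)."""
    return 13.0 * math.atan(0.00076 * f_hz) + 3.5 * math.atan((f_hz / 7500.0) ** 2)

# The audible range up to ~15.5 kHz spans about 24 Bark (24 frequency groups):
print(round(hz_to_bark(15500)))  # 24
```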
The hearing can extract the sound signals of a localized sound source from ambient noise. For example, hearing can concentrate on one speaker when other speakers are speaking at the same time.
This ability, known as the cocktail party effect, means that noises from other directions that could interfere with the perception of the desired sound source are perceived as greatly attenuated. The signal processing of the hearing achieves improvements in the signal-to-noise ratio of around 9 to 15 dB; interfering noises from other directions are perceived only half to a third as loud as they really are.
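The loudness claim can be checked with the common rule of thumb that perceived loudness roughly halves for every 10 dB of attenuation (sone scale):

```python
def loudness_ratio(attenuation_db):
    """Perceived loudness relative to the unattenuated sound,
    assuming loudness halves per 10 dB (sone-scale rule of thumb)."""
    return 2.0 ** (-attenuation_db / 10.0)

print(round(loudness_ratio(9), 2))   # 0.54: about half as loud
print(round(loudness_ratio(15), 2))  # 0.35: about a third as loud
```

The 9 to 15 dB range thus corresponds exactly to "half to a third as loud".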
Localization in closed rooms
In closed rooms, not only the sound arriving directly from the sound source reaches the hearing, but also sound reflected from the walls. For the direction determination, however, only the direct sound that arrives first is evaluated, not the later-arriving reflected sound (law of the first wavefront). This enables the direction of the sound source to be determined correctly despite the reflections.
For this purpose, the hearing evaluates strong changes in volume over time in the different frequency groups. If there is a sharp rise in volume in one or more frequency groups, it is very likely direct sound from a sound source that is just starting or whose signal is changing its properties. This short period is used by the hearing for direction determination (and also for loudness determination).
Reflections arriving later no longer increase the volume in the frequency groups concerned so much, so that no new direction is determined here.
Once the direction has been recognized, it is used as the direction of the sound source until a new direction determination is possible again due to a stronger volume increase (see Franssen effect ).
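The onset rule described in this section can be sketched as a small state machine; the threshold and the frame values below are invented for illustration:

```python
def track_direction(frames, onset_threshold_db=6.0):
    """frames: list of (level_db, direction_deg) per time frame for one
    frequency group. Returns the perceived direction after each frame:
    a new direction is accepted only on a sharp level rise (direct sound);
    weaker increases (reflections) leave the stored direction unchanged."""
    perceived = None
    prev_level = None
    out = []
    for level_db, direction in frames:
        if prev_level is None or level_db - prev_level >= onset_threshold_db:
            perceived = direction  # sharp rise: treat as direct sound onset
        prev_level = level_db
        out.append(perceived)
    return out

# Direct sound from 30°, then a reflection from -60° only 3 dB louder,
# then a new loud onset from 0°:
print(track_direction([(40, 30), (43, -60), (44, -60), (55, 0)]))
# [30, 30, 30, 0]
```

The reflection never triggers a new direction; only the later strong onset does, which is the Franssen-effect behavior mentioned above.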
Technical applications
Stereophony
Stereophony makes use of the principles of localization: sound is reproduced from several loudspeakers in such a way that the listener has the impression that the sound source is located at a different place (phantom source).
With loudspeaker stereophony in a stereo triangle, the phantom sources are localized on the loudspeaker base; the direction of the auditory event is indicated as a deflection in percent from the center. Frequency-neutral interchannel level differences and interchannel time differences lead to shiftable phantom sources through summing localization. With loudspeaker stereophony, spectral differences - i.e. frequency-dependent level differences - should be avoided, because they lead to tonal coloration when the sound is incident from the side.
With stereo loudspeaker reproduction, a level difference of approximately ΔL = 18 dB (16 to 20 dB) is required for an auditory event direction of 100 %, corresponding to a 30° deflection into the direction of one loudspeaker. Level-difference stereophony produces the greatest localization sharpness.
With stereo loudspeaker reproduction, a time difference of approximately Δt = 1.5 ms (1 ms for high frequencies, 2 ms for low frequencies) is required for an auditory event direction of 100 %, corresponding to a 30° deflection into the direction of one loudspeaker.
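Assuming a simple linear mapping (a simplification: real localization curves are not exactly linear), the figures above translate to deflection percentages as follows:

```python
def deflection_percent(value, full_scale):
    """Linear mapping of an interchannel difference to a phantom-source
    deflection (0 % = center, 100 % = fully at one loudspeaker, i.e. 30°).
    Clamped, since larger differences cannot move the source further."""
    return max(-100.0, min(100.0, value / full_scale * 100.0))

print(deflection_percent(9.0, 18.0))   # 50.0: half deflection from 9 dB
print(deflection_percent(0.75, 1.5))   # 50.0: half deflection from 0.75 ms
print(deflection_percent(18.0, 18.0))  # 100.0: phantom source at one speaker
```

The full-scale values of 18 dB and 1.5 ms are the ones given in the text; in practice level and time differences are also combined, which shifts the phantom source further for the same individual values.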
When listening with headphones , binaural sound recordings are used to generate phantom sound sources.
Other uses
The principles of localization are used militarily in acoustic direction finders.
Literature
- Dieter Stotz: Computer-aided audio and video technology . 2nd Edition. Springer, 2011, ISBN 978-3-642-23252-7 .
See also
- Binaural sound recording
- Tragus (animals)