Microprosody

from Wikipedia, the free encyclopedia

Microprosody is a branch of prosody that deals with the analysis of microscopic variations in the amplitude and frequency of a speech signal. It mainly examines the effects known from transmission technology as jitter and shimmer. The results of such analyses are relevant, for example, in the early detection of laryngeal diseases and in speaker recognition.

Jitter and shimmer in microprosody

Shimmer describes the superposition of noise on the fundamental frequency of a speech signal, which produces irregularities in amplitude. A related effect that often occurs together with shimmer is jitter, an irregularity in the fundamental frequency or period of a speech signal.

Shimmer

Shimmer is defined as the average difference (in dB) between successive amplitudes of the signal, where amplitude itself is understood as the mean distance between two frequency maxima. According to Haji et al. (1986), shimmer in the voices of healthy people averages between 0.05 and 0.22 dB. Combined with the electroglottograph, shimmer is well suited to detecting abnormal vocal fold vibration (especially in hoarse voices).
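The dB definition above can be sketched as the mean absolute value of 20·log10(A[i+1]/A[i]) over successive cycle amplitudes. This is a minimal sketch that assumes one amplitude value per vibration cycle (for example, the peak amplitude) has already been measured; the function name `shimmer_db` is hypothetical.

```python
import math

def shimmer_db(amplitudes):
    """Mean absolute dB difference between successive cycle amplitudes.

    amplitudes: one positive amplitude value per vibration cycle (assumed
    to be already extracted from the signal).
    """
    if len(amplitudes) < 2:
        raise ValueError("need at least two cycle amplitudes")
    # 20 * log10 converts an amplitude ratio into decibels
    steps = [abs(20 * math.log10(b / a)) for a, b in zip(amplitudes, amplitudes[1:])]
    return sum(steps) / len(steps)

print(shimmer_db([1.00, 1.01, 0.99, 1.00]))
```

Taking the absolute value of each step means that rises and falls in amplitude both count as perturbation, rather than cancelling each other out.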

Jitter

Jitter is defined as the micro-variation of the fundamental frequency (F0) of a voice. In pathological voices the extent of this variation increases, especially in diseases that affect the symmetry (i.e. the tension or mass) of the vocal folds. Jitter is particularly high at the beginning and end of a sustained tone.

Introduction

When a vowel is sustained for a long time, the oscillogram shows that the vowel signal is not strictly periodic but is overlaid with small disturbances and irregularities (micro-variations): the period is not always the same length (jitter), and the amplitude of the signal fluctuates slightly (shimmer). The effect occurs in everyone, not only in people with a voice disorder. On average, the signal deviates by 2% from its mean period or amplitude; larger deviations indicate a pathological disorder of the larynx.

Changes in these micro-variations below the 2% mark are difficult for the human ear to detect.
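The 2% figure can be turned into a rough check by measuring the mean absolute deviation of the periods and amplitudes from their means, expressed as a percentage. This is only a sketch: the function names are hypothetical, and the 2% threshold taken from the text above is a rough guide, not a diagnostic criterion.

```python
from statistics import mean

def mean_relative_deviation_pct(values):
    """Mean absolute deviation from the mean, as a percentage of the mean."""
    m = mean(values)
    return 100 * mean(abs(v - m) for v in values) / m

def flag_microvariation(periods, amplitudes, threshold_pct=2.0):
    """Compare period and amplitude micro-variation with a rough threshold.

    threshold_pct defaults to the approximate 2% figure; it is an
    illustrative cut-off, not a clinical one.
    """
    period_pct = mean_relative_deviation_pct(periods)
    amplitude_pct = mean_relative_deviation_pct(amplitudes)
    return {
        "period_pct": period_pct,
        "amplitude_pct": amplitude_pct,
        "period_exceeds": period_pct > threshold_pct,
        "amplitude_exceeds": amplitude_pct > threshold_pct,
    }

print(flag_microvariation([9.9, 10.1, 9.9, 10.1], [0.95, 1.05, 0.95, 1.05]))
```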

Microprosody in other areas

Automatic analyses of human prosody should be preceded by an examination of microprosody so that prosody recognition is not distorted. In addition, microprosody plays an important role in speech recognition and speech synthesis, since it contributes to a natural-sounding voice and makes speech easier to recognize.

See also A, B, and C prosody

Causes of the micro-variations

The influence of the pulse rate

The pulse is a periodic change in blood supply. It causes a periodic change in the volume of the vocal folds and thus a periodic movement superimposed on the vocal fold vibration. Investigations by Orlikoff and Baken (1989) show that the fluctuations in the fundamental frequency do indeed repeat periodically, with a period roughly corresponding to the interval between heartbeats. In their study, the heartbeat accounted for 0.5–20.0% of total jitter: on average 6.9% for men and 2.4% for women, or 4.6% overall. Expressed as a duration, this contribution averaged 3.7 µs for men and 0.9 µs for women, i.e. 2.3 µs overall. The musculus thyroarytaenoideus (vocalis), which runs between the thyroid and arytenoid cartilages, is particularly affected. Because sustained phonation is itself a breathing exercise that changes the heart rate, each fundamental frequency value was expressed relative to the mean fundamental frequency of its heartbeat cycle.
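The per-heartbeat normalization described above can be sketched as follows: F0 samples are grouped by the heartbeat interval they fall into, and each value is divided by the mean F0 of its group. This is an illustration under assumptions, not Orlikoff and Baken's exact procedure; it assumes that F0 sample times and heartbeat times (for example, from a pulse sensor) are available, and the function name is hypothetical.

```python
from bisect import bisect_right
from statistics import mean

def normalize_per_heartbeat(times, f0, beats):
    """Express each F0 value relative to the mean F0 of its heartbeat interval.

    times: sample times in seconds (ascending); f0: F0 values in Hz;
    beats: heartbeat times in seconds (ascending). Samples before the first
    beat or after the last beat are dropped because their interval is
    incomplete. Returns a list of (time, relative_f0) tuples.
    """
    groups = {}
    for t, f in zip(times, f0):
        k = bisect_right(beats, t) - 1  # index of the beat that precedes t
        if 0 <= k < len(beats) - 1:
            groups.setdefault(k, []).append((t, f))
    result = []
    for k in sorted(groups):
        samples = groups[k]
        beat_mean = mean(f for _, f in samples)
        result.extend((t, f / beat_mean) for t, f in samples)
    return result

print(normalize_per_heartbeat([0.1, 0.2, 0.3, 1.1, 1.2], [100, 110, 120, 200, 220], [0.0, 1.0, 2.0]))
```

Dividing by the per-beat mean removes slow drifts in F0 between heartbeats, so that the remaining fluctuation reflects variation within each cycle.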

Nerve impulses

Nerve impulses cause rhythmic contractions of the vocal folds. The impulses in the motor units make the musculus thyroarytaenoideus twitch (this has been investigated most closely for this muscle, but according to Titze a similar mechanism can be assumed for the other laryngeal muscles).

The resulting jitter depends on

  • the number of motor units (many motor units can to a certain extent "compensate" for the twitching of a single unit)
  • the impulse rate (jitter is lower above 50 impulses per second, because the muscle no longer has enough time to relax between impulses and the twitches fuse into a sustained contraction, known as tetanus)
  • the variation in length of the motor units (the more the muscle fiber lengths differ, the greater the jitter; the relationship is exponential)
  • the variation in the impulses (with the same relationship as for length variation)
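The first point, that many motor units can partly compensate for the twitching of a single unit, can be illustrated with a toy statistical model. It assumes, hypothetically, that each motor unit contributes a tension with mean 1.0 and an independent random fluctuation in every vibration cycle. This is not Titze's (1991) model; it only shows why summing many independent contributions smooths the total. The function name and parameters are illustrative.

```python
import random
from statistics import mean, pstdev

def total_tension_cv(n_units, n_cycles=1000, unit_cv=0.2, seed=1):
    """Coefficient of variation of the summed tension across vibration cycles.

    Toy assumption: each motor unit contributes a tension with mean 1.0 and
    an independent Gaussian fluctuation (standard deviation unit_cv) in each
    cycle. Real motor units are neither identical nor independent.
    """
    rng = random.Random(seed)  # fixed seed for reproducible output
    totals = [
        sum(rng.gauss(1.0, unit_cv) for _ in range(n_units))
        for _ in range(n_cycles)
    ]
    return pstdev(totals) / mean(totals)

for n in (1, 10, 100):
    print(n, round(total_tension_cv(n), 4))
```

Under the independence assumption, the relative fluctuation shrinks roughly with the square root of the number of units. Correlated firing between units would weaken this effect.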

Structure of the vocal folds

Another explanation for jitter and shimmer is the structure of the vocal folds, or a so-called internal vibration. The smaller and more rigid the vocal folds, the lower the micro-variation. This is consistent with the observation that jitter decreases as the fundamental frequency rises, since the vocal folds become increasingly tense. Different jitter values have also been observed for different vowels (see Age below).

Influences and dependencies

Tongue movement

The laryngeal region is a highly complex system of ligaments, cartilage and muscles that can be influenced even by distant muscle groups (body posture, for example, affects phonation). That jitter values differ significantly between vowels is due, among other things, to differences in tongue position and movement.

Gender

The average jitter values for men and women differ, but this is most likely due to the generally higher fundamental frequency of the female subjects rather than to gender itself.

Health

Laryngeal diseases result in increased jitter and shimmer values. Even a cold can affect the speech signal, because the relatively large amount of mucus on the vocal folds moves during phonation.

Age

Younger people show fewer micro-variations than older people. However, a study by Linville (1988) shows that the individual vowels must be distinguished. Older women, for example, show higher jitter for /a/ than for /i/ and /u/, whereas in younger women the pattern is reversed.

Experiments and measurement methods for determining jitter and shimmer

One way of determining jitter and shimmer in test subjects is the sustained-vowel test, in which participants hold a vowel at a specified loudness for as long as possible. Possible comparison groups are smokers vs. non-smokers, singers vs. people without vocal training, or people with laryngeal disease vs. healthy people. Participants can receive visual feedback via a voltmeter.

This laboratory setting has the advantage that coarticulation and prosodic phenomena, such as those that occur in running speech, can be excluded.

The utterances are recorded with a microphone and then digitized. Sometimes an electroglottograph (EGG) is also used, which is very well suited to displaying irregularities in vocal fold vibration, in particular in amplitude. The EGG display facilitates digital analysis. It also shows other aspects whose significance has not yet been fully clarified (e.g. the manner in which the vocal folds make contact).
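Before jitter or shimmer can be computed, the digitized signal has to be divided into individual vibration cycles. A minimal sketch finds upward zero crossings, refines them by linear interpolation, and reports each cycle's period and peak-to-peak amplitude. It assumes a clean, roughly periodic signal with exactly one upward zero crossing per cycle. Real analysis software uses more robust period marking, and the function name is hypothetical.

```python
import math

def cycle_periods_and_amplitudes(samples, fs):
    """Split a digitized sustained vowel into cycles at upward zero crossings.

    samples: signal values; fs: sampling rate in Hz.
    Returns (periods_in_seconds, peak_to_peak_amplitudes), one entry per
    complete cycle between two consecutive upward crossings.
    """
    crossings = []  # fractional sample positions of upward zero crossings
    for i in range(1, len(samples)):
        a, b = samples[i - 1], samples[i]
        if a < 0 <= b:
            # linear interpolation between samples i-1 and i (b - a > 0 here)
            crossings.append(i - 1 + (-a) / (b - a))
    periods, amplitudes = [], []
    for start, end in zip(crossings, crossings[1:]):
        periods.append((end - start) / fs)
        cycle = samples[math.ceil(start):math.floor(end) + 1]
        amplitudes.append(max(cycle) - min(cycle))
    return periods, amplitudes

fs, f = 8000, 100  # synthetic 100 Hz test tone sampled at 8 kHz
tone = [math.sin(2 * math.pi * f * n / fs) for n in range(800)]
periods, amps = cycle_periods_and_amplitudes(tone, fs)
print(periods[:3], amps[:3])
```

The resulting period and amplitude lists are the inputs for the period-based jitter measures and the amplitude-based shimmer measure.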

Advantages and disadvantages of microprosody determination as a diagnostic tool

Advantages

As a diagnostic tool, microprosody measurement has two advantages. It is comfortable, external and non-invasive (nothing is inserted into the pharynx), and it is relatively inexpensive in terms of both the equipment and its use.

Disadvantages

Research does not always determine jitter and shimmer in a fully uniform way. Different measuring devices and different analysis software can yield different results. A study by Karnell et al. (1991) shows this clearly, using the example of the voice laboratories in Chicago, Denver and Pine Brook.

Formulas for jitter

  • The percentage jitter factor (JF) (Hollien et al., 1973): 100 × the mean deviation of the period durations, divided by the mean period of the signal
  • The Pitch Perturbation Quotient (PPQ) (Davis, 1976): the ratio of the mean difference between each period and a moving average of the surrounding periods to the mean period duration
  • The Directional Perturbation Factor (DPF) (Hecker and Kreul, 1971): the number of sign changes in the differences between consecutive periods, divided by the number of possible sign changes; because it counts only the direction of change, it is independent of the individual fundamental frequency
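The three definitions can be sketched in Python on a list of period durations. Several details are assumptions, because implementations differ. JF is read as the mean absolute difference between consecutive periods. PPQ uses a centered moving average whose window size (five periods by default) is a free parameter. For DPF, a sign change is counted only when two consecutive differences are non-zero and have opposite signs. The function names are hypothetical.

```python
from statistics import mean

def jitter_factor(periods):
    """JF in percent: 100 * mean |P[i+1] - P[i]| / mean period."""
    if len(periods) < 2:
        raise ValueError("need at least two periods")
    diffs = [abs(b - a) for a, b in zip(periods, periods[1:])]
    return 100 * mean(diffs) / mean(periods)

def pitch_perturbation_quotient(periods, window=5):
    """PPQ: mean |P[i] - centered moving average| / mean period."""
    if window % 2 == 0 or len(periods) < window:
        raise ValueError("window must be odd and no longer than the data")
    half = window // 2
    devs = []
    for i in range(half, len(periods) - half):
        local_mean = mean(periods[i - half:i + half + 1])
        devs.append(abs(periods[i] - local_mean))
    return mean(devs) / mean(periods)

def directional_perturbation_factor(periods):
    """DPF: observed sign changes of period differences / possible changes."""
    if len(periods) < 3:
        raise ValueError("need at least three periods")
    diffs = [b - a for a, b in zip(periods, periods[1:])]
    # a product below zero means the two differences have opposite signs
    changes = sum(1 for d1, d2 in zip(diffs, diffs[1:]) if d1 * d2 < 0)
    return changes / (len(diffs) - 1)

periods_ms = [10.0, 10.2, 9.8, 10.2, 9.8, 10.0]
print(jitter_factor(periods_ms), pitch_perturbation_quotient(periods_ms),
      directional_perturbation_factor(periods_ms))
```

JF and PPQ scale with the size of the period fluctuations. DPF ignores their size and registers only how often the direction of change flips, which is why it does not depend on the speaker's fundamental frequency.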


Literature

  • Haji, T. et al. (1986) Frequency and amplitude perturbation analysis of electroglottograph during sustained phonation, JASA, 80: 1, pp. 58-62
  • Higgins, M. B.; Saxman, J. H. (1989) A comparison of intrasubject variation across sessions of three vocal fundamental perturbation indices, JASA, 86: 3, pp. 911-916
  • Karnell, M. P. et al. (1991) Comparison of Acoustic Voice Perturbation Measures Among Three Independent Voice Laboratories, JSHR, 34, pp. 781-789
  • Linville, S. E. (1988) Intraspeaker variability in fundamental frequency stability: An age-related phenomenon?, JASA, 83: 2, pp. 741-745
  • Orlikoff, R. F.; Baken, R. J. (1989) The Effect of the Heartbeat on Vocal Fundamental Frequency Perturbation, JSHR, 32: 3, pp. 576-582
  • Schoentgen, J. (1990) Acoustic features of dysphonic voices, Rapport d'Activités de l'Institut de Phonétique, 26, pp. 87-112
  • Titze, I. (1991) A Model for Neurologic Sources of Aperiodicity in Vocal Fold Vibration, JSHR, 34: 3, pp. 460-472