Multi-Stimulus Test with Hidden Reference and Anchor

from Wikipedia, the free encyclopedia

The Multi-Stimulus Test with Hidden Reference and Anchor (MUSHRA) is a listening test for the comparative assessment of the audio quality of different audio samples. It is used to assess the transmission quality of lossy audio transmission systems. Compared with the older Mean Opinion Score (MOS) method, it is intended to deliver statistically significant results with a smaller number of listeners. The Radiocommunication Sector of the International Telecommunication Union (ITU-R) officially recommends the test in Recommendation BS.1534-3 for transmission or coding techniques of medium audio quality. Recommendation BS.1116-3, which describes a double-blind listening test method, applies instead to transmission systems with transparent coding, i.e. systems whose artifacts are imperceptible or barely perceptible. For signals of telephone quality (lower than the quality tested in MUSHRA), ITU-T Recommendation P.800 applies.

In the MUSHRA test, the uncoded original is presented together with several encoded versions of the same signal. The listener rates the coded signals on a scale from 0 to 100 MUSHRA points. To do this, the listener can switch back and forth between all signals, or concentrate on a shorter section of the signal and listen to it repeatedly. The rating should reflect the difference between each coded signal and the original. The signals to be rated also include a further copy of the uncoded original (the hidden reference) and several anchor signals. The anchors are mostly band-limited signals with bandwidths of 3.5 kHz and 7 kHz. They help ensure that the scale is used similarly in repeated tests or in tests at different laboratories, and that biases (distortions of the results) are avoided.
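The set of stimuli described above can be sketched in a few lines of Python. This is only an illustration, not part of any ITU recommendation: the function names, the stimulus labels, and the random presentation order are assumptions made for the example.

```python
import random


def build_trial(coded_versions, anchors=("anchor_3.5kHz", "anchor_7kHz"), seed=None):
    """Return a mapping from neutral labels to the stimuli of one trial.

    The hidden reference and the anchors are mixed in with the coded
    versions so that the listener cannot tell them apart by label.
    Shuffling the order is an assumption of this sketch.
    """
    rng = random.Random(seed)
    stimuli = ["hidden_reference", *anchors, *coded_versions]
    rng.shuffle(stimuli)
    return {f"stimulus_{i + 1}": name for i, name in enumerate(stimuli)}


def validate_scores(scores):
    """Check that every rating lies on the 0 to 100 MUSHRA scale."""
    for label, score in scores.items():
        if not 0 <= score <= 100:
            raise ValueError(f"{label}: score {score} is outside 0..100")
    return True


trial = build_trial(["codec_a", "codec_b"], seed=1)
print(trial)
```

The open reference itself is not part of the returned mapping, since the listener always knows which signal it is and does not rate it.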

In contrast to the tests recommended in ITU-T P.800, both MUSHRA tests and BS.1116 tests rely on trained expert listeners for the quality assessment. Expert listeners are generally more critical than untrained listeners and are better able to reproduce their results. Their ratings generally have a lower standard deviation, which is why fewer listeners are required than in tests with untrained listeners. In addition, expert listeners compare the individual signals with one another more often and focus more often on shorter sections of the signal.
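The link between standard deviation and panel size can be illustrated with a rough calculation. The sketch below assumes independent ratings and uses the normal approximation for a 95% confidence interval of the mean (half-width ≈ 1.96 · s / √n); the function name and the example numbers are hypothetical and are not taken from the recommendations.

```python
import math


def listeners_needed(std_dev, half_width, z=1.96):
    """Estimate how many listeners are needed so that the confidence
    interval of the mean score has at most the given half-width.

    Assumes independent ratings and a normal approximation (z = 1.96
    corresponds to roughly 95% confidence).
    """
    return math.ceil((z * std_dev / half_width) ** 2)


# Hypothetical standard deviations in MUSHRA points, target half-width 5 points.
print(listeners_needed(std_dev=10, half_width=5))  # 16
print(listeners_needed(std_dev=20, half_width=5))  # 62
```

Under these assumptions, halving the standard deviation reduces the required number of listeners to about a quarter.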

It is believed that the preferences of expert listeners and untrained listeners are similar. However, expert listeners weight spatial artifacts somewhat more heavily than untrained listeners.

Possible criteria for deciding whether someone is an expert listener are how well the listener can reproduce his or her own results and whether he or she hears differences between the different signals. If a listener repeatedly rates the hidden reference (i.e. the uncoded original) below 90 MUSHRA points, this also indicates an unreliable listener.
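The hidden-reference criterion lends itself to a simple screening step after the test. The following Python sketch is one possible reading of it: the data layout, the function name, and the `max_fraction` parameter (how often a low rating counts as "repeatedly") are assumptions of this example, with 15% chosen as a plausible default.

```python
from collections import defaultdict


def flag_unreliable_listeners(ratings, min_score=90, max_fraction=0.15):
    """Return listeners who rate the hidden reference too low too often.

    ratings: iterable of (listener, item, condition, score) tuples.
    A listener is flagged when more than max_fraction of his or her
    hidden-reference ratings fall below min_score.
    """
    totals = defaultdict(int)
    low = defaultdict(int)
    for listener, _item, condition, score in ratings:
        if condition != "hidden_reference":
            continue
        totals[listener] += 1
        if score < min_score:
            low[listener] += 1
    return sorted(name for name in totals if low[name] / totals[name] > max_fraction)


ratings = [
    ("A", "item1", "hidden_reference", 100),
    ("A", "item2", "hidden_reference", 95),
    ("A", "item1", "codec_a", 60),
    ("B", "item1", "hidden_reference", 80),
    ("B", "item2", "hidden_reference", 100),
    ("B", "item1", "codec_a", 85),
]
print(flag_unreliable_listeners(ratings))  # ['B']
```

Ratings of other conditions are ignored here; a fuller screening would also look at how consistently a listener rates repeated items.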

Speech material presented in P.800 tests must be in the listener's native language, because listeners with weaker language skills rate the audio quality as worse than native speakers or listeners who speak the language fluently. In MUSHRA tests, by contrast, speech signals in a foreign language can also be assessed: the listeners can hear the signals several times and can compensate for any difficulty in perceiving the artifacts by listening to the signals for longer and by comparing the individual coded versions and the original more often.

References

  1. http://www.itu.int/rec/R-REC-BS.1534
  2. https://www.itu.int/rec/R-REC-BS.1116
  3. ITU-T: P.800: Methods for subjective determination of transmission quality. Retrieved July 2, 2017.
  4. Zielinski, Slawomir, Rumsey, Francis, Bech, Søren: On Some Biases Encountered in Modern Audio Quality Listening Tests-A Review. In: Journal of the Audio Engineering Society. vol. 56, no. 6, June 15, 2008 (aes.org [accessed July 2, 2017]).
  5. Zielinski, Slawomir: On Some Biases Encountered in Modern Audio Quality Listening Tests (Part 2): Selected Graphical Examples and Discussion. In: Journal of the Audio Engineering Society. vol. 64, no. 1/2, February 5, 2016 (aes.org [accessed July 2, 2017]).
  6. Schinkel-Bielefeld, Nadja, Lotze, Netaya, Nagel, Frederik: Audio quality evaluation by experienced and inexperienced listeners. In: Proceedings of Meetings on Acoustics. vol. 19, no. 1, May 14, 2013, p. 060016, doi:10.1121/1.4799190 (scitation.org [accessed July 2, 2017]).
  7. Francis Rumsey, Slawomir Zielinski, Rafael Kassier, Søren Bech: Relationships between experienced listener ratings of multichannel audio quality and naïve listener preferences. In: The Journal of the Acoustical Society of America. vol. 117, no. 6, May 31, 2005, ISSN 0001-4966, pp. 3832–3840, doi:10.1121/1.1904305 (scitation.org [accessed July 2, 2017]).
  8. Lorho, Gaëtan, Le Ray, Guillaume, Zacharov, Nick: eGauge — A Measure of Assessor Expertise in Audio Quality Evaluations. In: 38th Conference of the Audio Engineering Society. June 13, 2010 (aes.org [accessed July 2, 2017]).
  9. Blašková, Ľubica, Holub, Jan: How Do Non-native Listeners Perceive Quality of Transmitted Voice? In: Communications. vol. 10, no. 4, 2008, ISSN 1335-4205, pp. 11-15 (researchgate.net [accessed July 1, 2017]).
  10. Schinkel-Bielefeld, Nadja, Jiandong, Zhang, Yili, Qin, Leschanowsky, Anna Katharina, Shanshan, Fu: Is it Harder to Perceive Coding Artifact in Foreign Language Items? - A Study with Mandarin Chinese and German Speaking Listeners. In: 142nd Convention of the Audio Engineering Society, Berlin. Paper #9739, May 11, 2017 (aes.org [accessed July 2, 2017]).