Activation word

An activation word (also hotword or wake word), sometimes called a wake-up word, wake-up command, or trigger word, is a word that the user speaks to activate a voice assistant and then interact with it, usually by voice in (approximately) natural language. It is a special voice command within voice control that activates the assistant for the purpose of further conversation. Instead of a single word, a phrase (often in the form of a salutation) can serve as the activation word.

Depending on the voice assistance system, the activation word is either

  • fixed,
  • selectable from a list of predefined activation words, or
  • freely definable by the user.

Some systems can be activated by several different activation words rather than a single one. These may be devices that integrate several different voice assistants.
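A device that hosts several assistants has to decide which one a given activation word belongs to. The following minimal sketch illustrates the idea on already-transcribed text; real detectors work on audio, and the mapping `WAKE_WORDS` and the functions `normalize` and `route` are hypothetical names chosen for illustration.

```python
from typing import Optional

# Hypothetical mapping from activation phrase to the assistant it wakes.
WAKE_WORDS = {
    "alexa": "Alexa",
    "hey google": "Google Assistant",
    "okay google": "Google Assistant",
}


def normalize(phrase: str) -> str:
    """Lowercase and drop punctuation so 'Okay, Google' matches 'okay google'."""
    kept = "".join(ch for ch in phrase.lower() if ch.isalnum() or ch.isspace())
    return " ".join(kept.split())


def route(transcribed_phrase: str) -> Optional[str]:
    """Return the assistant woken by the phrase, or None if no phrase matches."""
    return WAKE_WORDS.get(normalize(transcribed_phrase))
```

Several phrases may map to the same assistant, which covers both cases described above: one assistant with several activation words, and one device with several assistants.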

Functional sequence

General

The user speaks the activation word so that it is picked up by the microphones of the voice assistance system's end device. If the system recognizes the activation word, it usually gives the user feedback that it is now active (i.e. ready for interaction). The user can then begin the actual conversation with the assistant, for example by asking it a question or giving it a command.

The computing power required to recognize the activation word is provided offline, i.e. locally by the end device. With many voice assistants, the actual interaction takes place online, i.e. over an Internet connection to the servers of the respective assistant provider. These servers then perform the speech recognition that interprets the user's request and the speech synthesis for the system's spoken response. There are also systems that work purely offline and therefore do not require an Internet connection.

Depending on the voice assistance system and its configuration, the end device's microphones either listen permanently for the activation word or are only switched on after manual activation, e.g. by pressing a button. In the latter case, no activation word is needed because manual activation takes its place.
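The sequence described in this section can be sketched as a small state machine. This is an illustrative model only, not the implementation of any particular assistant; the state names, the event strings, and the `always_listening` flag are assumptions made for the example.

```python
from enum import Enum, auto


class State(Enum):
    MUTED = auto()      # microphones off until a button is pressed
    LISTENING = auto()  # local detector waits for the activation word
    ACTIVE = auto()     # assistant accepts the user's request


class Assistant:
    def __init__(self, always_listening: bool):
        # The idle state depends on the configuration of the device.
        self.idle = State.LISTENING if always_listening else State.MUTED
        self.state = self.idle

    def handle(self, event: str) -> State:
        """Apply one event (hypothetical names) and return the new state."""
        if self.state is State.ACTIVE:
            if event == "request_handled":
                self.state = self.idle
        elif event == "button_pressed":
            # Manual activation replaces the activation word.
            self.state = State.ACTIVE
        elif event == "wake_word_detected" and self.state is State.LISTENING:
            self.state = State.ACTIVE
        return self.state
```

In the muted configuration a detected activation word has no effect, because the microphones are not listening; only the button press activates the assistant.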

Theory

The functional unit of the voice assistance system that decides, on the basis of a received audio signal that might be the activation word, whether or not to activate the system works as a binary classifier. In practice this classification is not perfect and errors occur (which is also due to the use of artificial intelligence). The following four cases can occur:

  1. True positive: The activation word was spoken and was correctly recognized as such; the voice assistance system was correctly activated.
  2. False negative: The activation word was spoken but was not recognized as such; the voice assistance system wrongly remained inactive.
  3. False positive: The activation word was not spoken, but the audio was wrongly classified as the activation word; the voice assistance system was wrongly activated.
  4. True negative: The activation word was not spoken and the audio was correctly not classified as the activation word; the voice assistance system correctly remained inactive.
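The four cases form the usual confusion matrix of a binary classifier. As a minimal sketch, the following code labels each observation and counts the outcomes; the function names `outcome` and `count_outcomes` are illustrative.

```python
from collections import Counter
from typing import Iterable, Tuple


def outcome(spoken: bool, detected: bool) -> str:
    """Name the case for one observation (cases 1 to 4 in the list above)."""
    if spoken and detected:
        return "true positive"    # case 1
    if spoken and not detected:
        return "false negative"   # case 2
    if not spoken and detected:
        return "false positive"   # case 3
    return "true negative"        # case 4


def count_outcomes(events: Iterable[Tuple[bool, bool]]) -> Counter:
    """Count outcomes over (spoken, detected) pairs."""
    return Counter(outcome(spoken, detected) for spoken, detected in events)
```

Counting these outcomes over a test recording is one simple way to compare two detector settings.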

To avoid the two misclassifications (cases 2 and 3) as far as possible, the classifier must be suitably parameterized. With some voice assistance systems, users can adjust certain settings themselves. For example, Google Assistant is reported to allow the sensitivity with which it responds to its activation word to be adapted to the user's preferences.
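One common way to parameterize such a classifier is a decision threshold on a confidence score: lowering it makes the detector more sensitive, raising it makes it stricter. The sketch below assumes, purely for illustration, that the detector outputs a score between 0 and 1 per audio snippet; the function name `error_rates` and the sample data are hypothetical.

```python
from typing import Sequence, Tuple


def error_rates(
    scores: Sequence[float], labels: Sequence[bool], threshold: float
) -> Tuple[float, float]:
    """Return (false positive rate, false negative rate) at a threshold.

    scores: detector confidence per snippet (assumed to lie in [0, 1]).
    labels: True where the activation word was actually spoken.
    """
    pairs = list(zip(scores, labels))
    false_neg = sum(1 for s, spoken in pairs if spoken and s < threshold)
    false_pos = sum(1 for s, spoken in pairs if not spoken and s >= threshold)
    positives = sum(1 for _, spoken in pairs if spoken)
    negatives = len(pairs) - positives
    fpr = false_pos / negatives if negatives else 0.0
    fnr = false_neg / positives if positives else 0.0
    return fpr, fnr


# Invented example data: three real activations, three other sounds.
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.1]
labels = [True, True, True, False, False, False]
sensitive = error_rates(scores, labels, threshold=0.2)
strict = error_rates(scores, labels, threshold=0.85)
```

With this data the sensitive setting misses no activation but reacts to two of the three other sounds, while the strict setting never reacts wrongly but misses two of the three activations. A sensitivity setting moves the system along this trade-off.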

Challenges and problem areas

Ideal voice assistance system

An ideal voice assistance system always recognizes its activation word and never activates without it (apart from manual activation).

Non-activation: non-recognition of the activation word

In practical use - depending on the pronunciation of the user or the background noise around the microphones - it can happen that the voice assistance system does not recognize the activation word as such.

The user is then required to try again. Pronouncing the activation word more clearly or louder, speaking more purposefully in the direction of the microphones, and creating a quieter environment with less noise increase the chances of correct detection.

Alternatively, if the system supports it, the user can switch to manual activation.

Incorrect activation: Activation despite the activation word not being spoken

The reverse case can also occur in practice: the voice assistance system activates not only (correctly) in response to the activation word but also, incorrectly, in response to similar-sounding words or other noises. This can be triggered, for example, by music or by people talking to each other. With online systems, such a false activation can cause problems, for example with regard to privacy or data protection, because spoken words or ambient noises then end up on the provider's servers unexpectedly and unintentionally.

To avoid false activations, it is advisable, as far as the voice assistance system allows, to choose the activation word systematically so that neither it nor similar-sounding words occur in the user's everyday language.
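A rough way to screen a candidate activation word against a list of everyday words is an edit-distance check. The sketch below compares spellings with the Levenshtein distance, which is only a crude stand-in for how similar two words sound; a serious check would compare phonetic transcriptions. The function names and the threshold `max_distance` are assumptions for illustration.

```python
from typing import Iterable, List


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def confusable_words(
    candidate: str, vocabulary: Iterable[str], max_distance: int = 2
) -> List[str]:
    """Return vocabulary words whose spelling is close to the candidate."""
    c = candidate.lower()
    return [w for w in vocabulary if edit_distance(c, w.lower()) <= max_distance]
```

A household in which someone is called Alex, for example, would get a warning for the candidate "Alexa" and might prefer one of the other selectable activation words.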

Differentiation of different users

User independence

Depending on the application, it may be desired that the voice assistance system can be activated equally by different users. It then has to recognize the activation word despite different dialects, voices, speaking speeds, or word stresses.

Differentiation of user roles

In contrast, other scenarios may require that the voice assistance system can be activated only by certain users. For example, it can be useful to allow activation only by the driver of a vehicle but not by passengers. Speaker authentication, i.e. identifying the individual, may not be necessary for this; it may be sufficient for the system to determine the speaker's role, for example from the seating position, and thereby recognize the person authorized to activate it (here, the driver). A voice assistance system that cannot identify speakers but can distinguish their roles can offer users different services tied to those roles. For example, the assistant can distinguish driver and front passenger, without identifying either, by evaluating from which side the activation word or the sound primarily comes, and can then allow the passenger only a limited range of functions.
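The idea of telling roles apart by the side the sound comes from can be sketched with two microphone channels. This is a strongly simplified illustration, assuming one microphone nearer each front seat and comparing signal energy only; the function names, the `margin` factor, and the permission table are hypothetical.

```python
import math
from typing import Sequence


def rms(samples: Sequence[float]) -> float:
    """Root mean square level of one channel."""
    if not samples:
        return 0.0
    return math.sqrt(sum(x * x for x in samples) / len(samples))


def speaker_role(
    left: Sequence[float],
    right: Sequence[float],
    driver_side: str = "left",
    margin: float = 1.5,
) -> str:
    """Guess the role from the louder side; 'unknown' if neither dominates."""
    l, r = rms(left), rms(right)
    if l > margin * r:
        louder = "left"
    elif r > margin * l:
        louder = "right"
    else:
        return "unknown"
    return "driver" if louder == driver_side else "passenger"


# Hypothetical role-based permissions: the passenger gets a reduced set.
PERMISSIONS = {
    "driver": {"navigation", "media", "vehicle_settings"},
    "passenger": {"media"},
    "unknown": set(),
}


def is_allowed(role: str, function: str) -> bool:
    return function in PERMISSIONS[role]
```

The sketch never identifies a person; it only assigns a role, which matches the distinction drawn above between role differentiation and speaker authentication.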

Differentiation of user-individuals

As with a classic multi-user system, it may be desirable for the voice assistance system to distinguish between different users as individuals in order to offer them personalized services. It must then be able to identify the user, e.g. by voice, by the way the activation word is spoken, or by other parameters. For example, if user A instructs the assistant to record an appointment, the appointment should be entered in user A's personal calendar and not end up in the calendar of another user B.
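The calendar example can be sketched as follows, under the assumption that some upstream component turns a voice sample into a numeric feature vector and that each enrolled user has a stored reference vector. The functions `cosine`, `identify`, and `add_appointment`, and the threshold `min_similarity`, are hypothetical; how real assistants identify users is not specified here.

```python
import math
from typing import Dict, List, Optional, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equally long vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def identify(
    voice: Sequence[float],
    profiles: Dict[str, Sequence[float]],
    min_similarity: float = 0.8,
) -> Optional[str]:
    """Return the best-matching enrolled user, or None if no match is close."""
    best_user, best_score = None, min_similarity
    for user, reference in profiles.items():
        score = cosine(voice, reference)
        if score >= best_score:
            best_user, best_score = user, score
    return best_user


def add_appointment(
    calendars: Dict[str, List[str]],
    voice: Sequence[float],
    profiles: Dict[str, Sequence[float]],
    entry: str,
) -> Optional[str]:
    """Store the entry in the identified speaker's calendar only."""
    user = identify(voice, profiles)
    if user is not None:
        calendars.setdefault(user, []).append(entry)
    return user
```

If no enrolled user matches well enough, nothing is stored, so an appointment cannot silently land in the wrong calendar.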

Attack scenarios

Audio injection

Voice assistance systems can be attacked and misused through audio injection.

Laser-based audio injection

By irradiating the microphone directly with a laser beam, attackers can succeed in commanding voice assistance systems, in other words giving them "light commands". For this purpose, the light intensity of the laser beam is modulated so that it induces electrical signals in the microphone that closely resemble those produced by real voice commands. The activation word and the commands to be executed can thus be transmitted to the voice assistance system inaudibly and even from a distance of tens of meters. Systems with MEMS microphones are particularly susceptible to this type of attack, because they convert not only sound but also light aimed directly at them into electrical signals.
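The principle of modulating light intensity with an audio signal can be illustrated with a toy amplitude-modulation model. This is a purely numerical sketch under simplifying assumptions (a linear light source and a microphone that responds linearly to intensity); the function names and the parameters `bias` and `depth` are hypothetical and do not describe the researchers' actual setup.

```python
import math
from typing import List, Sequence


def modulate_intensity(
    audio: Sequence[float], bias: float = 0.5, depth: float = 0.4
) -> List[float]:
    """Map audio samples in [-1, 1] to a non-negative light intensity.

    Light intensity cannot be negative, so the audio rides on a constant
    bias; depth must not exceed bias.
    """
    if depth > bias:
        raise ValueError("depth must not exceed bias")
    return [bias + depth * max(-1.0, min(1.0, s)) for s in audio]


def recover_audio(intensity: Sequence[float]) -> List[float]:
    """Model the receiving side: removing the constant offset leaves the
    audio waveform, scaled by the modulation depth."""
    mean = sum(intensity) / len(intensity)
    return [value - mean for value in intensity]


# One period of a sine tone, sampled at eight points.
tone = [math.sin(2 * math.pi * k / 8) for k in range(8)]
light = modulate_intensity(tone)
recovered = recover_audio(light)
```

In this model the recovered signal has the same shape as the original tone, which is why the microphone's electronics cannot distinguish it from a signal caused by sound.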

Examples

Examples of voice assistants and their activation words, as well as devices that can use these assistants, are:

Voice assistants and activation words

Voice assistant                 Provider                      Activation words                                                    Example devices
Alexa                           Amazon                        "Alexa", "Amazon", "Computer", "Echo"
Google Assistant                Google                        "Hey Google", "Okay, Google"
Siri                            Apple                         "Hey Siri"
Bixby                           Samsung                       "Hi, Bixby"
Cortana                         Microsoft                     "Hey, Cortana"
Hello Magenta                   Deutsche Telekom              "Hello Magenta", "Hey Magenta", "Hi Magenta", ("Alexa" ... Amazon)  Smart Speaker, Smart Speaker Mini
Voice Mate                      LG Electronics                (only by pressing a button)                                         certain LG televisions with webOS
-                               Volkswagen (vw.os)            "Hallo, ID" (German) or "Hello, ID" (English)
Intelligent Personal Assistant  BMW                           "Hey, BMW" (further customizable)
MBUX                            Mercedes-Benz                 "Hey, Mercedes"
Jasper                          Open source (GitHub project)  "Jasper"
Snips                           Snips                         "Hey, Snips" (further customizable)
-                               Fortebit (EasyVR 3 Plus)      "Robot" (replaceable; further customizable)
