Web Speech API

The Web Speech API is a specification from the W3C's Speech API Community Group that makes speech synthesis and speech recognition available to JavaScript in web browsers.

General

Although speech synthesis and speech recognition are described in a common specification, they are independent of each other. A browser can therefore implement either one without the other, and a website can use either one without the other.
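
Because the two halves are independent, a page would normally check for each one separately before using it. The following is a minimal sketch of such a check, not a definitive implementation. The function name detectSpeechSupport and its root parameter are illustrative assumptions; root stands for window in a browser and is passed in so that the function can also be tried with a plain object.

```javascript
// Report which halves of the Web Speech API a given global object exposes.
// `root` is a hypothetical parameter: pass `window` in a browser.
function detectSpeechSupport(root) {
  const hasSynthesis =
    'speechSynthesis' in root &&
    typeof root.SpeechSynthesisUtterance === 'function';
  // Some browsers expose recognition only under the webkit vendor prefix.
  const Recognition = root.SpeechRecognition || root.webkitSpeechRecognition;
  return {
    synthesis: hasSynthesis,
    recognition: typeof Recognition === 'function',
  };
}

// In a browser: detectSpeechSupport(window)
```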

The specification does not prescribe how browsers provide these functions. For example, Google Chrome also relies on online services provided by Google, while Firefox uses local services.
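
For speech synthesis, each voice returned by speechSynthesis.getVoices() carries a localService flag that indicates whether it is synthesized locally or by a remote service. The sketch below shows one way a page might prefer local voices. The helper name pickVoice and its parameters are assumptions made for illustration.

```javascript
// Pick a voice for a language code, optionally preferring voices that the
// browser marks as locally synthesized (the `localService` property).
// `pickVoice` is a hypothetical helper, not part of the API.
function pickVoice(voices, lang, preferLocal) {
  const prefix = lang.toLowerCase();
  const matching = voices.filter((v) => v.lang.toLowerCase().startsWith(prefix));
  if (preferLocal) {
    const local = matching.find((v) => v.localService);
    if (local) return local;
  }
  return matching.length > 0 ? matching[0] : null;
}

// In a browser: pickVoice(speechSynthesis.getVoices(), 'de', true)
```

In some browsers the voice list is still empty right after page load and is only filled once the voiceschanged event has fired, so the lookup may need to be repeated then.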

Opening the microphone for speech recognition could in principle be used to spy on a user. Recognition therefore starts only with the user's consent. While audio is being recorded, the browser displays a notice with an option to switch recording off again.
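
If the user withholds consent, the recognition object reports this through an error event whose error property holds a code such as not-allowed. The following sketch maps a few of the codes listed in the specification to messages. The function name describeRecognitionError and the message texts are illustrative assumptions.

```javascript
// Map the `error` code of a recognition error event to a readable message.
// The codes come from the specification; the messages are only examples.
function describeRecognitionError(code) {
  switch (code) {
    case 'not-allowed':
    case 'service-not-allowed':
      return 'Microphone access was not permitted.';
    case 'audio-capture':
      return 'No microphone could be used.';
    case 'no-speech':
      return 'No speech was detected.';
    default:
      return 'Speech recognition failed (' + code + ').';
  }
}

// In a browser, with a hypothetical `recognition` instance:
// recognition.onerror = (event) => console.log(describeRecognitionError(event.error));
```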

Speech synthesis

The speech synthesis functions are available via the singleton speechSynthesis and the class SpeechSynthesisUtterance. To use them, create a new SpeechSynthesisUtterance object with the text that is to be spoken. Further parameters can then be set on it, in particular the language code, but also the speaking rate, pitch, and so on. The object is passed to the function speechSynthesis.speak, which places it in a queue; it is spoken when its turn comes. Other functions handle the queue and allow speech output to be paused or canceled entirely. The current status can be tracked via events. In addition to plain text, the use of SSML is also intended.

Example

The following code speaks the text "Hallo Welt" (German for "Hello World").

const utterance = new SpeechSynthesisUtterance('Hallo Welt');
utterance.lang = 'de'; // set the language to German
speechSynthesis.speak(utterance);
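
The further parameters, events, and queue functions mentioned above could be used roughly as follows. This is a sketch under stated assumptions: the wrapper speakText and its options object are hypothetical, and the synthesizer and the utterance constructor are passed in as arguments (speechSynthesis and SpeechSynthesisUtterance in a browser) so that the function can also be tried with stand-ins.

```javascript
// Build an utterance with optional parameters and add it to the queue.
// `synth` stands in for speechSynthesis, `Utterance` for the
// SpeechSynthesisUtterance constructor; `speakText` is a hypothetical wrapper.
function speakText(synth, Utterance, text, options = {}) {
  const utterance = new Utterance(text);
  if (options.lang !== undefined) utterance.lang = options.lang;
  if (options.rate !== undefined) utterance.rate = options.rate; // 1 is normal speed
  if (options.pitch !== undefined) utterance.pitch = options.pitch; // 1 is the default pitch
  if (options.onend) utterance.onend = options.onend; // called when this utterance has finished
  synth.speak(utterance); // queued; spoken when its turn comes
  return utterance;
}

// In a browser:
// speakText(speechSynthesis, SpeechSynthesisUtterance, 'Hallo Welt', { lang: 'de', rate: 0.9 });
// speechSynthesis.pause();  // pause the current output
// speechSynthesis.resume(); // continue after a pause
// speechSynthesis.cancel(); // remove all utterances from the queue
```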

Speech recognition

The speech recognition functions are available through the class SpeechRecognition. First, create a new SpeechRecognition object. It can be configured by specifying, for example, the language or a grammar that the recognition should follow. Recognition is started with the start method. As soon as a result is available, an event is fired that provides the recognized text, possible alternatives, and a confidence value for each.

Example

The following code outputs the spoken text in a message window.

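// Chrome and Opera expose the class only with a vendor prefix
const SpeechRecognitionClass = window.SpeechRecognition || window.webkitSpeechRecognition;
const recognition = new SpeechRecognitionClass();
recognition.lang = 'de'; // set the language to German
recognition.onresult = function (event) {
  if (event.results.length > 0) {
    alert(event.results[0][0].transcript); // output the first result
  }
};
recognition.start();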
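
The alternatives and confidence values mentioned above are reached through the same nested, array-like structure: event.results holds one entry per result, and each result holds its alternatives. The following sketch flattens that structure into a simple list. The helper name listAlternatives is an assumption made for illustration.

```javascript
// Flatten a result event into a list of { transcript, confidence, isFinal }.
// `listAlternatives` is a hypothetical helper, not part of the API.
function listAlternatives(event) {
  const alternatives = [];
  for (let i = 0; i < event.results.length; i++) {
    const result = event.results[i];
    for (let j = 0; j < result.length; j++) {
      alternatives.push({
        transcript: result[j].transcript,
        confidence: result[j].confidence,
        isFinal: Boolean(result.isFinal),
      });
    }
  }
  return alternatives;
}

// In a browser, with a hypothetical `recognition` instance:
// recognition.maxAlternatives = 3; // ask for up to three alternatives per result
// recognition.onresult = (event) => console.log(listAlternatives(event));
```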

Browser support

Speech synthesis is available in Google Chrome from version 33, in Apple Safari from version 7, in Opera from version 27, and in Microsoft Edge from version 14. Mozilla Firefox has in principle supported speech synthesis since version 31, but the feature was disabled by default. Support from the operating system is also required. Initially this existed only on Firefox OS (version 2.0 or higher), where the feature was also enabled. Version 42 added support for Windows and version 44 added support for Mac OS X and Linux, though the feature remained disabled. From version 47, Firefox can use the feature internally, if it is enabled, to read pages aloud in the reader view. The feature was enabled by default in version 49.

Speech recognition works in Chrome and Opera, but still with a vendor prefix and without grammar support. In principle there is also a partial implementation in Firefox, but the user interface for granting permission is missing, so it can only be used in Firefox OS (from version 2.5).

For other browsers and older versions there are polyfills and alternative implementations with a comparable range of functions. These either rely on online services or are generated from existing programs using Emscripten.

References

Chris Mills: Firefox and the Web Speech API. In: Mozilla Hacks. January 21, 2016, accessed September 8, 2016.
Can I use: Speech Synthesis API. Retrieved September 8, 2016.
Firefox 42 for developers. In: Mozilla Developer Network. Retrieved September 8, 2016.
Firefox 44 for developers. In: Mozilla Developer Network. Retrieved September 8, 2016.
Sören Hentzschel: Firefox 47 can read articles. March 8, 2016, accessed September 8, 2016.
Firefox 49 for developers. Others. In: Mozilla Developer Network. Retrieved September 21, 2016.
janantala: speech-synthesis. In: GitHub. Retrieved September 8, 2016.
Norbert Landsteiner: meSpeak.js. syl22-00: Pocketsphinx.js. Retrieved September 8, 2016.

External links

Specification
Web Speech API in the Mozilla Developer Network
