Speech Recognition Grammar Specification

The Speech Recognition Grammar Specification (SRGS) is a W3C standard that describes how speech recognition grammars can be specified. A speech recognition grammar is a set of word patterns that tells a speech recognition system what to expect a person to say. For example, when a caller reaches an automated attendant system, the system asks for the name of the person the caller wants to speak to. It then starts a speech recognizer with a speech recognition grammar that contains the names of everyone in the directory and the various sentence patterns callers typically use to respond.

SRGS specifies two different but logically equivalent syntaxes: one is XML-based, the other uses augmented BNF (ABNF). In practice, the XML syntax is used more often.

If the speech recognizer returned only a string of the spoken words, the voice application would have to do the tedious work of extracting the semantic meaning from those words. For this reason, SRGS grammars can be decorated with tag elements which, when executed, build up the semantic result. SRGS does not specify the content of these tag elements; that is left to a companion W3C standard, Semantic Interpretation for Speech Recognition (SISR). SISR is based on ECMAScript: ECMAScript statements inside the SRGS tags produce an ECMAScript semantic result object that the voice application can process easily.
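
As an illustration of how tag elements build a semantic result, here is a minimal sketch of an SRGS grammar in XML form that uses SISR script tags (tag-format "semantics/1.0"). The rule names, the extension numbers, and the property names action, callee, name and extension are assumptions made up for this sketch; they do not come from either specification.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<grammar xmlns="http://www.w3.org/2001/06/grammar"
         xml:lang="en-US" version="1.0"
         root="request" tag-format="semantics/1.0">

  <!-- Each alternative fills in the rule's result object "out".
       Names and extension numbers are hypothetical. -->
  <rule id="person">
    <one-of>
      <item>Michel Tremblay
        <tag>out.name = "Michel Tremblay"; out.extension = "1001";</tag>
      </item>
      <item>André Roy
        <tag>out.name = "André Roy"; out.extension = "1002";</tag>
      </item>
    </one-of>
  </rule>

  <!-- The root rule copies the result of the referenced rule
       (available as rules.person) into its own result object. -->
  <rule id="request" scope="public">
    may I speak to
    <ruleref uri="#person"/>
    <tag>out.action = "transfer"; out.callee = rules.person;</tag>
  </rule>
</grammar>
```

With a grammar along these lines, the utterance "may I speak to André Roy" would be expected to yield a result object whose action property is "transfer" and whose callee property holds the name and extension set in the person rule, so the application can read the extension directly instead of parsing the recognized words.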

Both SRGS and SISR are W3C Recommendations, the final stage of the W3C standards track. The W3C VoiceXML standard, which defines how voice dialogues are specified, depends heavily on SRGS and SISR.

Examples

Here is an example of the augmented BNF form of SRGS as it might appear in an automated attendant directory application:

#ABNF 1.0 ISO-8859-1;

// Default grammar language is US English
language en-US;

// Single language attachment to tokens
// Note that "fr-CA" (Canadian French) is applied to only
//  the word "oui" because of precedence rules
$yes = yes | oui!fr-CA;

// Single language attachment to an expansion
$people1 = (Michel Tremblay | André Roy)!fr-CA;

// Handling language-specific pronunciations of the same word
// A capable speech recognizer will listen for Mexican Spanish and
//   US English pronunciations.
$people2 = Jose!en-US | Jose!es-MX;

/**
 * Multi-lingual input possible
 * @example may I speak to André Roy
 * @example may I speak to Jose
 */
public $request = may I speak to ($people1 | $people2);

Here is the same SRGS example in XML form:

<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE grammar PUBLIC "-//W3C//DTD GRAMMAR 1.0//EN"
                  "http://www.w3.org/TR/speech-grammar/grammar.dtd">
  
<!-- the default grammar language is US English -->
<grammar xmlns="http://www.w3.org/2001/06/grammar"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
         xsi:schemaLocation="http://www.w3.org/2001/06/grammar 
                             http://www.w3.org/TR/speech-grammar/grammar.xsd"
         xml:lang="en-US" version="1.0">
  
  <!--
     single language attachment to tokens
     "yes" inherits US English language
     "oui" is Canadian French language
  -->
  <rule id="yes">
    <one-of>
      <item>yes</item>
      <item xml:lang="fr-CA">oui</item>
    </one-of>
  </rule> 
  
  <!-- Single language attachment to an expansion -->
  <rule id="people1">
    <one-of xml:lang="fr-CA">
      <item>Michel Tremblay</item>
      <item>André Roy</item>
    </one-of>
  </rule>
  
  <!--
     Handling language-specific pronunciations of the same word
     A capable speech recognizer will listen for Mexican Spanish 
     and US English pronunciations.
  -->
  <rule id="people2">
    <one-of>
      <item xml:lang="en-US">Jose</item>
      <item xml:lang="es-MX">Jose</item>
    </one-of>
  </rule>
  
  <!-- Multi-lingual input is possible -->
  <rule id="request" scope="public">
    <example> may I speak to André Roy </example>
    <example> may I speak to Jose </example>
  
    may I speak to
    <one-of>
      <item> <ruleref uri="#people1"/> </item>
      <item> <ruleref uri="#people2"/> </item>
    </one-of>
  </rule>
</grammar>

