Dragon NaturallySpeaking

Nuance Dragon

Basic data

Developer: Nuance Communications
Current version: 15 (Professional Individual and Group; April 2018 and May 2017)
Operating system: Windows
Category: Speech recognition
License: Proprietary
German-language version: Yes
Website: https://www.nuance.com/dragon.html

Dragon NaturallySpeaking is speech recognition software that converts spoken words into on-screen text or into control commands for the computer. It is available in various editions for private and professional users; independent vendors offer additional specialist vocabularies. Since version 14 (a version number that is only used internally), the manufacturer no longer uses the designation "NaturallySpeaking". Instead, the product is marketed under the name "Dragon", followed by the name of the edition, for example "Dragon Professional Individual". The similar variant written for the macOS operating system was last called "Dragon Professional Individual for Mac", but has not been sold since October 2018. The name Dragon NaturallySpeaking is retained here for the time being to keep the distinction clear.

Beginnings

The program and its original manufacturer go back to a prototype of speech recognition software that James and Janet Baker developed in the late 1970s and early 1980s, while they worked first at Carnegie Mellon University and later at an IBM research center. The Bakers founded Dragon Systems in May 1982. The predecessor of Dragon NaturallySpeaking was DragonDictate, which was written for DOS and did not yet support continuous speech recognition. Dragon NaturallySpeaking 1.0 was released in 1997. In 2000, the company was taken over by Lernout & Hauspie. From that company's bankruptcy estate, the American company ScanSoft acquired the rights to the software; since 2005, ScanSoft has been called Nuance Communications.

Functionality

Dragon NaturallySpeaking is software for speech recognition on the PC. It converts utterances spoken into a microphone connected to the computer into text or control commands. It is a speaker-dependent front-end system (one that requires adaptation to the user), i.e. speech is converted into text on the user's own computer and is visible immediately after the utterance has been dictated ("what you say is what you see"). Compared with the speech recognition function of smartphones, in which the acoustic information is sent over the Internet, converted on central servers and the text is then transmitted back, this offers clear advantages in the speed and accuracy of the conversion and in the ability to adapt to the vocabulary and needs of the user. Depending on the edition, Dragon NaturallySpeaking also supports the conversion of previously recorded dictations (made with a dictation machine or a recording program).

Put simply, the acoustic signals are digitally sampled for conversion and classified by features within the framework of an "acoustic model", which allows an approximate assignment to sounds. The selection is made statistically using different variants of hidden Markov models. According to the manufacturer, version 15 introduced a new speech recognition engine based on "deep learning". The acoustic model is adapted to the voice of the respective speaker during initial training (which is no longer necessary in current versions) and continuously during use, in particular through the correction of recognition errors. For the "recognized" sounds, statistical hypotheses are then formed about the most likely spoken words. Where sounds or words sound similar or identical, the software decides on the basis of multi-word sequences within the speaker's utterance which result appears as text on the screen. The basis for this is a language model (linguistic model) that describes these probabilities. Details are explained in the article on speech recognition. On current hardware, the recognition process usually runs so quickly in the background that the spoken text appears on the screen almost immediately after the utterance is finished.
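The statistical selection of sounds with a hidden Markov model can be illustrated with the Viterbi algorithm, the standard method for finding the most likely hidden state sequence. The following is a minimal sketch and not Dragon's actual engine: the two sound states ("m", "n"), the two acoustic feature classes ("low", "high") and all probabilities are invented for illustration.

```python
import math

# Hypothetical toy model: two sounds and two acoustic feature classes.
STATES = ("m", "n")
START_P = {"m": 0.6, "n": 0.4}
TRANS_P = {"m": {"m": 0.7, "n": 0.3}, "n": {"m": 0.4, "n": 0.6}}
EMIT_P = {"m": {"low": 0.9, "high": 0.1}, "n": {"low": 0.2, "high": 0.8}}


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most likely hidden state sequence for the observations."""
    # Each column maps a state to (log-probability of the best path, predecessor).
    first = observations[0]
    columns = [{
        s: (math.log(start_p[s]) + math.log(emit_p[s][first]), None)
        for s in states
    }]
    for obs in observations[1:]:
        column = {}
        for s in states:
            # Pick the predecessor that gives the best path into state s.
            prev, score = max(
                ((p, columns[-1][p][0] + math.log(trans_p[p][s])) for p in states),
                key=lambda item: item[1],
            )
            column[s] = (score + math.log(emit_p[s][obs]), prev)
        columns.append(column)

    # Trace back from the best final state.
    state = max(states, key=lambda s: columns[-1][s][0])
    path = [state]
    for column in reversed(columns[1:]):
        state = column[state][1]
        path.append(state)
    return list(reversed(path))


print(viterbi(["low", "low", "high"], STATES, START_P, TRANS_P, EMIT_P))
```

Log-probabilities are used so that long utterances do not underflow to zero. A real recognizer works with far more states and with continuous acoustic features rather than two discrete classes.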

As delivered, the software contains standard language models for the respective input language, which the manufacturer derived by analyzing the probability of word sequences within a very large text corpus. When the software is set up on the user's PC (that is, when a user profile is created), this standard language model can be adapted to the user's writing style by analyzing existing texts. This adaptation also continues during use (so-called model optimization). For this continuous improvement of the linguistic model (and also of the acoustic model), it is particularly important to correct incorrectly recognized words and word combinations consistently using the corresponding program functions. In the language model "BestMatch IV", which was the default in the older version 11, Dragon NaturallySpeaking uses sequences of up to four words, so-called quadgrams. From version 12 onwards, Dragon set up user profiles with the language model "BestMatch V" on sufficiently powerful PCs (multi-core processors and more than 2 GB of RAM); this model is said to analyze five-word sequences. With version 15, the language model was again referred to as "BestMatch IV", which is presumably related to the changed recognition technology.
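How word sequences can decide between identical-sounding candidates, and how analyzing a user's own texts shifts that decision, can be sketched with a small n-gram counter. This is an illustrative assumption, not the BestMatch implementation: the class name `NGramModel`, the `weight` parameter for user texts and the back-off to shorter contexts are all hypothetical choices.

```python
from collections import Counter, defaultdict


class NGramModel:
    """Toy word-sequence model; n=4 mirrors the quadgrams mentioned above."""

    def __init__(self, n=4):
        self.n = n
        # Maps a context (tuple of preceding words) to counts of the next word.
        self.counts = defaultdict(Counter)

    def train(self, sentence, weight=1):
        """Count every word after each of its contexts of length 0..n-1."""
        words = sentence.lower().split()
        for i, word in enumerate(words):
            for k in range(min(i, self.n - 1) + 1):
                self.counts[tuple(words[i - k:i])][word] += weight

    def best_candidate(self, context_words, candidates):
        """Pick the candidate seen most often after the longest known context."""
        context = tuple(w.lower() for w in context_words)
        context = context[len(context) - min(len(context), self.n - 1):]
        while True:
            seen = self.counts.get(context)
            if seen:
                scored = [(seen[c], c) for c in candidates if seen[c] > 0]
                if scored:
                    return max(scored)[1]
            if not context:
                return candidates[0]  # nothing known: fall back to the first candidate
            context = context[1:]  # back off to a shorter context


model = NGramModel(n=4)
model.train("i would like to write a letter")
model.train("turn right at the corner")
print(model.best_candidate(["like", "to"], ["right", "write"]))

# Adapting to the user's own texts: their word sequences get extra weight.
model.train("the sight was beautiful")
model.train("the site was inspected and the site was cleared", weight=2)
print(model.best_candidate(["the"], ["site", "sight"]))
```

A production language model stores smoothed probabilities rather than raw counts, but the principle is the same: the surrounding words, not grammatical rules, select the result.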

The language model works exclusively with statistical methods, not with grammatical rules. As a result, recognition accuracy is best when connected utterances are spoken, preferably whole, long sentences. The software is accordingly geared towards recognizing well-structured language, as is typical when dictating letters, reports and other factual texts. It is not geared towards transcribing recorded everyday speech with many broken-off sentences, omissions and filler words, and certainly not towards directly transcribing conversations between several speakers.

The language model of Dragon NaturallySpeaking is based on a supplied vocabulary (word lexicon) which, as delivered, contains approx. 150,000 word forms in the active foreground vocabulary. Since the software does not apply any grammatical rules, the vocabulary stores not only word stems but every individual word form. The user can extend this vocabulary by approx. 150,000 further word forms, either by having their own texts analyzed for unknown words and word forms or by correcting recognition errors. To keep the conversion speed within an acceptable range, the vocabulary is divided into different "slots", i.e. a foreground vocabulary and a background vocabulary (the size of which is estimated at around 250,000 to 300,000 entries). Only the foreground vocabulary is kept in working memory for active access; words from the background vocabulary are added to it after they have been used once (that is, recognized incorrectly and then corrected).
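The split into foreground and background vocabulary described above can be pictured as two sets, with a correction moving a word into the active set. This is a simplified sketch under that assumption; the class `Vocabulary` and its method names are hypothetical and say nothing about how Dragon stores its lexicon internally.

```python
class Vocabulary:
    """Toy two-tier vocabulary: only the foreground is used for recognition."""

    def __init__(self, foreground, background):
        self.foreground = set(foreground)
        # A word lives in exactly one tier.
        self.background = set(background) - self.foreground

    def is_active(self, word):
        return word in self.foreground

    def correct(self, word):
        """Called when the user corrects a misrecognition to `word`."""
        if word in self.foreground:
            return "already active"
        if word in self.background:
            # Known word form: promote it into the active foreground tier.
            self.background.remove(word)
            self.foreground.add(word)
            return "promoted from background"
        # Completely unknown word form: add it as a user-specific entry.
        self.foreground.add(word)
        return "added as new word"


vocab = Vocabulary(foreground={"house", "houses"}, background={"housings"})
print(vocab.is_active("housings"))
print(vocab.correct("housings"))
print(vocab.is_active("housings"))
```

Keeping the active set small is what bounds the search effort per utterance; the price is that a background word is misrecognized once before it becomes available.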

The software's language model is geared towards one specific language, so it is not possible to dictate texts in different input languages with the same user profile. To dictate in another language, a separate user profile must be created and loaded. The German version of Dragon NaturallySpeaking allows user profiles to be created in German and English. The software is also available for Spanish, French, Italian, Dutch and Japanese, not as individual language modules but as separate versions. Common foreign words are included in the supplied vocabulary. The user can add further foreign words whose pronunciation does not follow the usual German sound patterns, and have them recognized reliably, by storing them in the lexicon with a phonetically spelled "spoken form" (example entries: written form "breakage", spoken form roughly "Brehkitsch"; or written form "CIA", spoken form "Ssie ei äi").
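The relationship between a written form and its phonetically spelled spoken form amounts to a lookup from what was heard to what should be typed. The sketch below uses the two example entries from the text; the functions `add_word` and `written_form` are hypothetical and are not part of any Dragon interface.

```python
# Maps a spoken form (as heard) to the written form (as typed).
lexicon = {}


def add_word(written, spoken=None):
    """Store a word; without a spoken form, the written form doubles as one."""
    lexicon[(spoken or written).lower()] = written


def written_form(heard):
    """Return the written form for a recognized spoken form, if one is stored."""
    return lexicon.get(heard.lower(), heard)


add_word("breakage", spoken="Brehkitsch")
add_word("CIA", spoken="Ssie ei äi")

print(written_form("brehkitsch"))
print(written_form("Ssie ei äi"))
```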

The name "NaturallySpeaking", used up to version 13, is derived from the property of continuous speech recognition. In contrast to speech recognition systems that were in use until the mid-1990s, and in contrast to the predecessor DragonDictate, the speaker does not have to make unnatural pauses between individual words (discrete speech), but can speak continuously. The software determines the (probable) word boundaries from the sound sequences using the methods described. Nonetheless, a structured, clear (but not exaggeratedly articulated) and fluent way of speaking is the best guarantee of success; the manufacturer recommends using the speaking style of news anchors as a guide.

System requirements and features

Dragon NaturallySpeaking runs under the Windows operating system from Windows XP onwards, and under 64-bit Windows from version 10.1 onwards. For macOS, Nuance sold software based on the same speech recognition core until September 2018. It was called Dragon Dictate up to version 4 (not to be confused with the above-mentioned predecessor of Dragon NaturallySpeaking) and "Dragon Professional Individual for Mac" in the last available version 6. The macOS version lagged behind the Windows versions of Dragon in its correction functions and its options for controlling the computer. Nuance stopped selling and supporting the macOS version in October 2018.

Dragon NaturallySpeaking does not run on operating systems with a Linux kernel (e.g. Ubuntu, Red Hat, openSUSE).

From version 11 onwards, NaturallySpeaking uses a multipass technique on multi-core processors: to increase reliability, the same utterance is analyzed in parallel on two processor cores using different hidden Markov models, and the most likely result is selected. Modern processors are recommended so that enough computing capacity remains for other tasks, especially for the target applications into which text is dictated. The processor, the size of the main memory and a sufficiently large level 2 or level 3 cache also have a considerable influence on the conversion speed. On a powerful current PC, the text usually appears immediately after an utterance has been spoken.
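The multipass idea, running two differently configured recognition passes on the same utterance and keeping the more likely result, can be sketched as follows. The two pass functions, their fixed return values and the log-likelihood scores are placeholders invented for illustration. Threads are used here only to keep the sketch short; genuinely occupying two processor cores with CPU-bound Python code would call for separate processes.

```python
from concurrent.futures import ThreadPoolExecutor


def pass_a(utterance):
    """Hypothetical recognition pass with one model; returns (text, log-likelihood)."""
    return ("recognize speech", -4.2)


def pass_b(utterance):
    """Hypothetical recognition pass with a different model."""
    return ("wreck a nice beach", -6.9)


def multipass(utterance, passes):
    """Run all passes concurrently and return the highest-scoring hypothesis."""
    with ThreadPoolExecutor(max_workers=len(passes)) as pool:
        futures = [pool.submit(recognize, utterance) for recognize in passes]
        hypotheses = [future.result() for future in futures]
    return max(hypotheses, key=lambda hypothesis: hypothesis[1])


text, score = multipass(b"raw audio bytes", [pass_a, pass_b])
print(text, score)
```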

Although the program places relatively high demands on main memory size and processor capacity, its user interface is an inconspicuous "DragonBar", which can also be hidden completely. The concept is that the user dictates directly into target applications such as word processors, in which the spoken text then appears without keyboard input. Compatible application programs can also be controlled by spoken commands (e.g. saving or printing documents, formatting); these features are appreciated not least by users with restricted mobility. To communicate with application programs, Dragon NaturallySpeaking uses the MSAA (Microsoft Active Accessibility) interface and the Microsoft Speech API SAPI 4 (not the successor version 5). The full set of commands for controlling applications is therefore only available in compatible application programs such as Microsoft Word (version 2013 is only compatible from NaturallySpeaking 12.5 onwards, version 2016 from Dragon Professional Individual or 14 onwards) or Internet Explorer; the software refers to these as a "standard window" or "Full Text Control window" (called Select-and-Say in earlier versions). Other software such as OpenOffice Writer, Mozilla Firefox or Mozilla Thunderbird is partially supported. Browser-based cloud applications such as Outlook.com are only partially supported; others, e.g. the Microsoft Office Web Apps, are not supported.

Dragon NaturallySpeaking also includes its own simple word processor, "DragonPad", which is functionally similar to Microsoft WordPad, as well as a dictation window that can be used to transfer dictated text to incompatible target applications. In addition to compatible application programs, Dragon NaturallySpeaking can be used to control the Windows interface with voice commands (only to a limited extent on the Windows 8 start screen).

Recognition accuracy

The software requires an initial speaker training of approximately five minutes, which can be skipped from version 9 onwards, and, if possible, an analysis of the speaker's own texts. With a well-trained profile, the recognition rate is currently more than 98 percent, depending on the quality of the hardware and the clarity of the speech. Recognition accuracy can also be improved by using a better microphone than the one supplied by the manufacturer.
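A recognition rate such as 98 percent is commonly understood as word accuracy, i.e. one minus the share of word errors (substitutions, insertions and deletions) relative to the number of words actually spoken. The source does not say how the manufacturer measures its figure, so the following is only a sketch of that common definition, using a word-level edit distance.

```python
def word_errors(reference, hypothesis):
    """Minimum number of word substitutions, insertions and deletions."""
    ref, hyp = reference.split(), hypothesis.split()
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, 1):
        current = [i]
        for j, hyp_word in enumerate(hyp, 1):
            current.append(min(
                previous[j] + 1,                            # deletion
                current[j - 1] + 1,                         # insertion
                previous[j - 1] + (ref_word != hyp_word),   # match or substitution
            ))
        previous = current
    return previous[-1]


def word_accuracy(reference, hypothesis):
    """Share of correctly recognized words, relative to the spoken text."""
    return 1 - word_errors(reference, hypothesis) / len(reference.split())


spoken = " ".join(f"word{i}" for i in range(50))
recognized = spoken.replace("word7 ", "wart7 ")
print(round(word_accuracy(spoken, recognized), 2))
```

At 98 percent, roughly one word in fifty still needs correction, which is about one error every few sentences of running text.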

Traditionally, the recognition rate has been better the more the speaker keeps to a limited technical vocabulary (as doctors or lawyers do, for example). Because the performance of both the program and the hardware has increased, current versions practically no longer require separate vocabularies for particular subject areas. It remains true, however, that words not already present in the vocabulary cannot be recognized correctly.

An exception (in the German version) is the function for automatically forming compounds. Typical components of compound words carry additional attributes in the vocabulary, according to which they are joined with other words to form compounds (with a linking "s" where applicable) if these are dictated immediately before or after them. This function is also statistically controlled and therefore sometimes produces incorrect compounds, e.g. with "compound words".

Such cases are among the few in which a spell check in the target application catches recognition errors, in contrast to incorrectly recognized but correctly spelled words, as in the (fictitious) example "The trainee went into the void". Proofreading texts dictated via speech recognition is therefore recommended, as the manufacturer expressly points out in the license agreement.
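The automatic formation of German compounds described above can be illustrated with a rule-based toy version. It is a deliberate simplification: the real function is statistically controlled, whereas this sketch joins words purely from two hypothetical tables (`FIRST_PARTS` with the linking element per word, and `HEADS` for possible second parts).

```python
# Hypothetical vocabulary attributes: first part -> linking element.
FIRST_PARTS = {"Arbeit": "s", "Haus": ""}
# Hypothetical set of words that may follow as the second part of a compound.
HEADS = {"Vertrag", "Tür", "Zeit"}


def join_compounds(words):
    """Join a flagged first part with a directly following head word."""
    result = []
    i = 0
    while i < len(words):
        word = words[i]
        if word in FIRST_PARTS and i + 1 < len(words) and words[i + 1] in HEADS:
            result.append(word + FIRST_PARTS[word] + words[i + 1].lower())
            i += 2
        else:
            result.append(word)
            i += 1
    return result


print(join_compounds(["der", "Arbeit", "Vertrag"]))
# The same mechanism also joins words that were meant to stay separate:
print(join_compounds(["nach", "der", "Arbeit", "Zeit", "haben"]))
```

The second call shows why proofreading remains necessary: "nach der Arbeit Zeit haben" (to have time after work) is turned into the unintended compound "Arbeitszeit".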

Versions

Version | Released | Status | Editions
1.0 | June 1997 | No longer supported | Personal
2.0 | November 1997 | No longer supported | Standard, Preferred, Deluxe
3.0 | October 1998 | No longer supported | Point & Speak, Standard, Preferred, Professional (optional legal or medical add-ons)
4.0 | August 4, 1999 | No longer supported | Essentials, Standard, Preferred, Professional, Legal, Medical, Mobile
5.0 | August 2000 | No longer supported | Essentials, Standard, Preferred, Professional, Legal, Medical
6.0 | November 15, 2001 | No longer supported | Essentials, Standard, Preferred, Professional, Legal, Medical
7.0 | March 2003 | No longer supported | Essentials, Standard, Preferred, Professional, Legal, Medical
8.0 | November 2004 | No longer supported | Essentials, Standard, Preferred, Professional, Legal, Medical
9.0 | July 2006 | No longer supported | Standard, Preferred, Professional, Legal, Medical, SDK client, SDK server
9.5 | January 2007 | No longer supported | Standard, Preferred, Professional, Legal, Medical, SDK client, SDK server
10.0 | August 7, 2008 | No longer supported | Essentials, Standard, Preferred, Professional, Legal, Medical
10.1 | March 2009 | No longer supported | Standard, Preferred, Professional, Legal, Medical
11.0 | August 24, 2010 | No longer supported | Home, Premium, Professional, Legal
11.0 | March 2011 | No longer supported | Medical
11.5 | June 2011 | No longer supported | Premium
11.5 | July 2011 | No longer supported | Home, Professional, Legal
12.0 | August 2012 | No longer supported | Home, Premium
12.0 | September 2012 | No longer supported | Professional, Legal
12.0 | December 2012 | No longer supported | Medical Practice
12.5 | February 2013 | No longer supported | Home, Premium, Professional, Legal
13.0 | August 2014 | No longer supported | Home, Premium
13.0 | October 2014 | No longer supported | Professional, Legal
14.0 | September 2015 | No longer supported | Professional Individual, Group
14.0 | April 2016 | No longer supported | Professional Group, Legal Group, Legal Individual
15.0 | October 2016 | No longer supported | Professional Individual
15.0 | May 2017 | Current | Professional Group
15.1 | November 2017 | Current | Professional Group, Legal Group (volume licensing (VLA) only)
15.3 | February 2018 | Current | Professional Group, Legal Group
15.4 | April 2019 | Current | Professional Group, Legal Group (volume licensing (VLA) only)
15.5 | November 2019 | Current | Professional Group, Legal Group
