NETtalk

NETtalk is an artificial neural network created by Terrence J. Sejnowski and Charles Rosenberg in the mid-1980s. It converts written English text into a coding of its pronunciation, i.e. it maps graphemes to phonemes, which can then be rendered as speech by a speech synthesizer.

Structure

NETtalk is a multilayer perceptron with three layers: seven groups of 29 neurons each in the input layer (203 input neurons in total), 80 neurons in the hidden layer, and 26 neurons in the output layer. Each input group encodes one letter of the input text: the 29 neurons correspond to the 26 letters of the alphabet plus one neuron each for spaces, end of sentence, and other punctuation. The fourth group represents the letter whose phoneme the network is to determine. The remaining six groups represent the three preceding and three following letters, context that is essential for determining the correct pronunciation.
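The input encoding described above can be sketched in a few lines of Python. This is a minimal illustration, not the original implementation: the ordering of the 29 units within a group, the choice of `.`, `!` and `?` as end-of-sentence marks, the padding of text boundaries with spaces, and the names `unit_index`, `encode_window` and `windows` are all assumptions made for the example.

```python
import string

# Assumed unit ordering (not specified in the text): 26 letters,
# then one unit each for space, end of sentence, and other punctuation.
LETTERS = string.ascii_lowercase
SPACE, END_OF_SENTENCE, OTHER_PUNCT = 26, 27, 28
UNITS_PER_GROUP = 29
WINDOW = 7  # 3 preceding letters + target letter + 3 following letters


def unit_index(ch):
    """Map one character to the index of its unit within a group."""
    ch = ch.lower()
    if ch in LETTERS:
        return LETTERS.index(ch)
    if ch == " ":
        return SPACE
    if ch in ".!?":  # assumed set of end-of-sentence marks
        return END_OF_SENTENCE
    return OTHER_PUNCT


def encode_window(window):
    """One-hot encode a 7-character window into 7 * 29 = 203 input values."""
    if len(window) != WINDOW:
        raise ValueError("window must contain exactly 7 characters")
    vector = [0] * (WINDOW * UNITS_PER_GROUP)
    for group, ch in enumerate(window):
        vector[group * UNITS_PER_GROUP + unit_index(ch)] = 1
    return vector


def windows(text):
    """Yield (target_letter, input_vector) for every position in text.

    Both ends are padded with spaces (an assumption) so that every
    letter has three characters of context on each side.
    """
    padded = " " * 3 + text + " " * 3
    for i in range(len(text)):
        window = padded[i:i + WINDOW]
        yield window[3], encode_window(window)
```

Each vector contains exactly seven active units, one per group, and the target letter always sits in the fourth group (indices 87 to 115).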

The network was trained on correct grapheme-phoneme pairs, so it is an instance of supervised learning.
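As a rough sketch of what supervised training means for a network of this shape, the following Python code performs one weight update on a single input/target pair. The text above does not specify the update rule, so everything here is an assumption for illustration: sigmoid units, a squared-error loss, gradient descent via backpropagation, the learning rate, the weight initialization range, and the helper names `init_layer`, `forward` and `train_step`. The meaning of the 26 target values is likewise left open.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def init_layer(n_in, n_out, rng):
    """Small random weights; each row holds n_in weights plus a trailing bias."""
    return [[rng.uniform(-0.3, 0.3) for _ in range(n_in + 1)]
            for _ in range(n_out)]


def forward(layer, inputs):
    """Sigmoid activations of one fully connected layer."""
    return [sigmoid(sum(w * x for w, x in zip(row, inputs)) + row[-1])
            for row in layer]


def train_step(hidden, output, x, target, lr=0.5):
    """One supervised update on a single (input, target) pair.

    Returns the squared error measured before the update.
    """
    h = forward(hidden, x)
    y = forward(output, h)
    # Error terms of the output units (squared error, sigmoid units).
    d_out = [(t - o) * o * (1 - o) for t, o in zip(target, y)]
    # Propagate the error back to the hidden units (pre-update weights).
    d_hid = [hj * (1 - hj) * sum(d_out[k] * output[k][j]
                                 for k in range(len(output)))
             for j, hj in enumerate(h)]
    for k, row in enumerate(output):
        for j, hj in enumerate(h):
            row[j] += lr * d_out[k] * hj
        row[-1] += lr * d_out[k]
    for j, row in enumerate(hidden):
        for i, xi in enumerate(x):
            row[i] += lr * d_hid[j] * xi
        row[-1] += lr * d_hid[j]
    return sum((t - o) ** 2 for t, o in zip(target, y))


# Layer sizes as described in the text: 203 inputs, 80 hidden, 26 outputs.
rng = random.Random(0)
hidden = init_layer(203, 80, rng)
output = init_layer(80, 26, rng)
```

Calling `train_step` repeatedly over all grapheme-phoneme pairs in the training set would amount to one pass through the data; the per-example error it returns can be used to monitor progress.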

Performance

After 50 training passes over a data set of 1024 words, the network achieved an accuracy of 95% on the training data and 78% on the test data.

Influence

In the 1980s, NETtalk was one of the high-profile applications that led many researchers back to connectionism. Critics, however, doubt that this was due to the quality of the architecture, since similar results could also be achieved with "conventional" programs. They point instead to the way the network's learning process was presented: the phonemes produced by the network were played back as speech, so the program began with an incomprehensible sequence of sounds and gradually improved to intelligible language. In addition, a high-pitched voice was used for these demonstrations, giving the audience the impression of a child learning to speak.

Sound sample

nettalk.mp3, accessed October 1, 2018.

Literature

Sejnowski, T., Rosenberg, C. (1986). NETtalk: A Parallel Network That Learns to Read Aloud. Technical Report JHU/EEC-86/01. Baltimore, MD: Johns Hopkins University.
Sejnowski, T., Rosenberg, C. (1987). Parallel networks that learn to pronounce English text. Complex Systems, 1, 145-168.