Elman network


An Elman network, also called a simple recurrent network (SRN), is a simple artificial neural network that can implicitly process temporal dependencies in its inputs through feedback edges between the artificial neurons. The architecture is named after Jeffrey L. Elman, who proposed it in 1990.

An artificial neural network is a model from neuroinformatics motivated by biological neural networks. Artificial neural networks can learn tasks and are often used where explicit modeling of a problem is difficult or impossible; examples are face recognition and speech recognition.

Limitations of simple models

Many artificial neural network models either have no way of processing temporal dependencies in the input data at all, or they require a history of inputs presented through an input window. A time delay neural network is such a network: it represents the temporal component of a data stream explicitly through windowed inputs.

Presenting inputs from different points in time in parallel has several drawbacks. A window of constant length, for example, is unsuitable for signals whose duration varies. Among other areas, this is a hindrance in speech recognition, since words consist of syllables of varying length. From the viewpoint of biological abstraction, there is also no biological motivation for presenting input data from different time steps in parallel.

Structure

Figure: an artificial neural network with a directly fed-back layer

To avoid these problems, Elman proposed a simple structure that obtains a temporal memory by means of feedback. The network has two layers of neurons, a hidden layer and an output layer. The outputs of the hidden layer are stored in so-called "context cells": for each neuron of the hidden layer there is a cell that stores that neuron's previous output. These feedback edges carry a constant weight of 1.

The neurons of the hidden layer thus receive the current input data and, via the context cells, their own previous outputs as input. In his paper, Elman shows that this network structure is implicitly able to process longer input streams in a time-invariant manner. The output layer merely maps the internal representation of the hidden neurons to the output.

Figure: schematic of an Elman network
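
Written out, the context mechanism amounts to the hidden-state update h_t = f(W_xh · x_t + W_ch · h_(t-1) + b_h) with output y_t = g(W_hy · h_t + b_y), where h_(t-1) is the content of the context cells. The following minimal Python/NumPy sketch illustrates this structure; the class name ElmanCell, the tanh and sigmoid activations, and the weight initialization are illustrative assumptions, not prescriptions from Elman's paper.

    import numpy as np

    class ElmanCell:
        """Minimal Elman (SRN) layer sketch: the context cells hold the
        hidden state of the previous time step and copy it back with
        weight 1."""

        def __init__(self, n_in, n_hidden, n_out, seed=0):
            rng = np.random.default_rng(seed)
            self.W_xh = rng.normal(0.0, 0.1, (n_hidden, n_in))      # input -> hidden
            self.W_ch = rng.normal(0.0, 0.1, (n_hidden, n_hidden))  # context -> hidden
            self.W_hy = rng.normal(0.0, 0.1, (n_out, n_hidden))     # hidden -> output
            self.b_h = np.zeros(n_hidden)
            self.b_y = np.zeros(n_out)
            self.context = np.zeros(n_hidden)  # context cells, initially empty

        def step(self, x):
            # Hidden neurons see the current input and, via the context
            # cells, their own outputs from the previous time step.
            h = np.tanh(self.W_xh @ x + self.W_ch @ self.context + self.b_h)
            y = 1.0 / (1.0 + np.exp(-(self.W_hy @ h + self.b_y)))  # sigmoid output
            self.context = h.copy()  # store h_t for the next step (weight-1 copy)
            return h, y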

Elman networks are trained, for example, by means of backpropagation. The backward-directed edges (the feedback to the context cells) are not adjusted.
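
One plausible reading of this setup is to treat the context cells as constant extra inputs during each backpropagation step, so that no error is propagated back through time and the copy edges keep their fixed weight of 1. The following sketch continues the ElmanCell example above under exactly these assumptions; the squared-error loss and plain gradient descent are illustrative choices.

    def train_step(net, x, target, lr=0.1):
        """One backpropagation step for the ElmanCell sketch. The context
        is read before the forward pass and treated as a constant input,
        so no gradient flows through time or through the copy edges."""
        context = net.context          # h_(t-1), held fixed for this step
        h, y = net.step(x)             # forward pass (also updates the context)

        # Output layer: derivative of 0.5 * sum((y - target)^2) with sigmoid.
        err = y - target
        delta_y = err * y * (1.0 - y)

        # Hidden layer: backpropagate through tanh only, not through time.
        delta_h = (net.W_hy.T @ delta_y) * (1.0 - h ** 2)

        # Gradient descent on the forward weights; the copy edges stay at 1.
        net.W_hy -= lr * np.outer(delta_y, h)
        net.W_xh -= lr * np.outer(delta_h, x)
        net.W_ch -= lr * np.outer(delta_h, context)
        net.b_y -= lr * delta_y
        net.b_h -= lr * delta_h
        return float(0.5 * np.sum(err ** 2))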

Example application

The following simple example is taken from Elman's paper. The input consists of blocks of three binary values each.

In the first two time steps of each block, a randomly chosen binary value is presented to the network as input. The third input is then the result of the XOR operation applied to the two previous inputs.

The task of the network is to predict the next expected input. It can succeed only at the respective second random input, because the input of the third time step can be computed from it together with the input of the first time step. To achieve this behavior, the input sequence is shifted forward by one time step and used as the target value during training (see backpropagation).

Input: 1 0 1 0 0 0 0 1 1 1 0 1 1 1 0 ...
Target value: 0 1 0 0 0 0 1 1 1 0 1 1 1 0 ? ...

It can be seen that the prediction error drops at the respective second value of each block: there the next input is the XOR of the two preceding bits and is therefore predictable. The network thus learns the XOR operation and computes this value wherever possible; with the help of the previous inputs, it can then predict the next input.
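
Such a training stream can be generated and fed to the network from the sketches above as follows; the function name xor_sequence, the hidden-layer size, and the number of blocks are arbitrary illustrative choices.

    def xor_sequence(n_blocks, seed=0):
        """Build Elman's temporal XOR stream: two random bits followed by
        their XOR, repeated; the target is the input shifted by one step."""
        rng = np.random.default_rng(seed)
        bits = []
        for _ in range(n_blocks):
            a, b = rng.integers(0, 2, size=2)
            bits += [a, b, a ^ b]
        inputs = np.array(bits[:-1], dtype=float)
        targets = np.array(bits[1:], dtype=float)
        return inputs, targets

    inputs, targets = xor_sequence(n_blocks=1000)
    net = ElmanCell(n_in=1, n_hidden=4, n_out=1)
    for x, t in zip(inputs, targets):
        train_step(net, np.array([x]), np.array([t]))

If the error returned by train_step is averaged per position within a block, it should fall mainly for the prediction made after the second bit, whose successor is determined by the XOR operation, mirroring the behavior described above.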

Hierarchical Elman network

As a generalization of the two-layer Elman network, there are "hierarchical Elman networks". These can have more than two layers and contain additional feedback in the individual context cells.
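
The details of this additional feedback vary between descriptions; one common variant gives each context cell a self-recurrent connection with a fixed decay weight, so that it accumulates a fading trace of earlier activations instead of only the last output. The following sketch assumes exactly this variant, and all names and sizes in it are illustrative.

    import numpy as np

    class HierarchicalElman:
        """Sketch of a hierarchical Elman network: every layer has its own
        context cells, and each context cell additionally feeds back onto
        itself with a fixed decay weight lam (an assumption of this sketch,
        one possible reading of the 'additional feedback')."""

        def __init__(self, sizes, lam=0.5, seed=0):
            rng = np.random.default_rng(seed)
            # Forward weights between consecutive layers, e.g. sizes = [1, 8, 4, 1].
            self.W = [rng.normal(0.0, 0.1, (m, n))
                      for n, m in zip(sizes[:-1], sizes[1:])]
            # Weights from each layer's context cells back into the layer.
            self.W_c = [rng.normal(0.0, 0.1, (m, m)) for m in sizes[1:]]
            self.context = [np.zeros(m) for m in sizes[1:]]
            self.lam = lam

        def step(self, x):
            for i, (W, W_c) in enumerate(zip(self.W, self.W_c)):
                x = np.tanh(W @ x + W_c @ self.context[i])
                # The context keeps a decaying trace of earlier activations
                # rather than only the previous output (self-feedback).
                self.context[i] = x + self.lam * self.context[i]
            return x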
