Cascade Correlation


Cascade correlation is a constructive mathematical method for building and training artificial neural networks. Cascade correlation was published in 1990 by Scott E. Fahlman and Christian Lebiere in the paper "The Cascade-Correlation Learning Architecture".

The idea behind cascade correlation is not only to train a neural network but also to construct its topology (a constructive procedure). The big advantage is that the size of the neural network adapts to the problem (the training data). The method is used in particular for multilayer feedforward networks that are trained with backpropagation.

Procedure

The algorithm starts with a minimal neural network for the task, consisting only of input cells and output cells. These neurons are fully connected to one another via weights. The weights are trained with a learning procedure (Quickprop). If no significant improvement of the network error is achieved over several training epochs, the training is stopped. The weights found are retained, and a new (hidden) neuron is added to the network. This hidden neuron (candidate unit) is added to an existing hidden layer or forms a new layer. It receives as inputs connections from all input cells and from all previously added hidden cells. The output of the added neuron initially has no connection to the rest of the network. The weights of the new neuron are now trained; the aim of this training is to maximize S, the sum over all output cells of the magnitude of the covariance between the output of the candidate unit and the residual error of output cell k.
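
The constructive loop just described can be outlined as follows. This is only a schematic sketch: the helper functions train_outputs, train_candidate and network_error are hypothetical placeholders for the steps explained in this section, and the stopping criteria (target_error, max_hidden) are illustrative assumptions rather than part of the original algorithm.

```python
def cascade_correlation(train_outputs, train_candidate, network_error,
                        target_error=0.05, max_hidden=20):
    """Schematic cascade-correlation training loop.

    train_outputs()   -- retrain only the weights feeding the output cells
    train_candidate() -- train one candidate unit, freeze its weights, and wire
                         its output to every output cell
    network_error()   -- current error of the network on the training data
    """
    train_outputs()                       # minimal net: inputs fully connected to outputs
    hidden_units = 0
    while network_error() > target_error and hidden_units < max_hidden:
        train_candidate()                 # new hidden cell, correlated with the residual error
        hidden_units += 1
        train_outputs()                   # output weights are retrained after every insertion
    return hidden_units
```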

The quantity to be maximized is

S = \sum_{k} \left| \sum_{p} \left( y_{p,j} - \overline{y_j} \right) \left( E_{p,k} - \overline{E_k} \right) \right|

Here j is the index of the candidate unit, k the index over all output neurons, p the index over all patterns, y_{p,j} the output of the candidate unit for pattern p, \overline{y_j} the mean output of neuron j over all patterns p, and \overline{E_k} the mean error of output cell k over all patterns p. The error E_{p,k} is the difference between the desired output and the actual output. The goal is to maximize S; therefore the partial derivative of S with respect to each weight w_i of the cell is formed:

\frac{\partial S}{\partial w_i} = \sum_{p,k} \sigma_k \, (E_{p,k} - \overline{E_k}) \, f'_p \, I_{i,p}

Here \sigma_k denotes the sign of the correlation between the candidate unit's output and output cell k, f'_p the derivative of the candidate unit's activation function with respect to the sum of its inputs for pattern p, and I_{i,p} the input the candidate unit receives from unit i for pattern p.

Once the partial derivative with respect to each weight of the candidate unit has been determined, a gradient ascent can be performed. The training continues until there is no further significant change in S. The output of the added neuron then delivers its largest signal where the error of the remaining network is greatest. The weights of the candidate unit are now permanently frozen, and the output of the cell is fully connected to every output neuron. The weights of all output neurons are then trained anew. If the desired output error has not been reached after this training, another hidden neuron is added to the network and trained as described above.
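
The candidate training step can be sketched as follows. This is a simplified illustration rather than Fahlman and Lebiere's reference implementation: it uses plain gradient ascent with a fixed learning rate instead of Quickprop, assumes a tanh candidate unit, and the names inputs, residuals and train_candidate_unit are hypothetical.

```python
import numpy as np

def train_candidate_unit(inputs, residuals, steps=200, lr=0.1, tol=1e-6):
    """Maximize S, the summed magnitude of the covariance between the
    candidate's output and the residual errors of the output cells.

    inputs    -- array (P, I): values the candidate sees for each pattern p
                 (input cells plus all previously frozen hidden cells)
    residuals -- array (P, K): residual error E_{p,k} of each output cell
    """
    P, I = inputs.shape
    rng = np.random.default_rng(0)
    w = rng.normal(scale=0.1, size=I)               # candidate weights, frozen afterwards
    E_centered = residuals - residuals.mean(axis=0) # E_{p,k} minus its mean over patterns

    prev_S = -np.inf
    for _ in range(steps):
        net = inputs @ w                            # summed input per pattern
        y = np.tanh(net)                            # candidate output y_{p,j}
        cov = (y - y.mean()) @ E_centered           # covariance with each output cell, shape (K,)
        S = np.abs(cov).sum()

        sigma = np.sign(cov)                        # sign of the correlation per output cell
        f_prime = 1.0 - y ** 2                      # tanh'(net_p)
        # dS/dw_i = sum_{p,k} sigma_k (E_{p,k} - mean E_k) f'_p I_{i,p}
        grad = inputs.T @ ((E_centered @ sigma) * f_prime)
        w += lr * grad                              # gradient ascent on S

        if abs(S - prev_S) < tol:                   # stop when S no longer improves
            break
        prev_S = S
    return w, S
```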

The figure below shows a cascade correlation network with three hidden neurons. The weights are drawn as squares at the connection points between the outputs and inputs of the neurons. The bias ("on") neuron is used to set the activation threshold of all neurons in the network; its output always has the value 1.

Cascade Correlation Network

With the cascade correlation algorithm, hidden cells are trained only once and their weights are never changed afterwards, while the weights of the output neurons are readjusted in each step.
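
A minimal sketch of this output-weight retraining is given below, assuming linear output cells and a simple delta-rule update; Fahlman and Lebiere trained the output weights with Quickprop, and the names features, targets and W_out are hypothetical.

```python
import numpy as np

def retrain_output_weights(features, targets, W_out, steps=500, lr=0.01):
    """Retrain only the output weights; frozen hidden cells merely supply features.

    features -- array (P, F): input cells plus the outputs of all frozen hidden cells
    targets  -- array (P, K): desired outputs for every pattern
    W_out    -- array (F, K): output weights, the only parameters updated here
    """
    for _ in range(steps):
        error = targets - features @ W_out                  # residual error E_{p,k}
        W_out += lr * features.T @ error / len(features)    # delta-rule step on output weights only
    return W_out, targets - features @ W_out                # final residuals drive the next candidate
```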

The method converges relatively well, since only a single layer of weights is trained at a time. This avoids a problem of training complex networks as a whole, in which all neurons adapt to a given input pattern simultaneously even though each receives only local information about the error propagated back through its direct neighbors.

Implementations

Implementations of cascade correlation can be found in various neural network software packages.

References

  1. Scott E. Fahlman, Christian Lebiere: The Cascade-Correlation Learning Architecture (PDF). August 29, 1991; accessed December 24, 2015.