Artificial neuron
An artificial neuron forms the basis of the model of artificial neural networks, a model from neuroinformatics that is motivated by biological neural networks. As a connectionist model, artificial neurons connected in a network form an artificial neural network and can thus approximate functions of arbitrary complexity, learn tasks, and solve problems for which explicit modeling is difficult or impossible. Examples are face and speech recognition.
As a model based on the biological nerve cell, an artificial neuron can process several inputs and react accordingly via its activation. For this purpose, the inputs are weighted and passed to an output function that computes the activation of the neuron. Its behavior is generally not programmed explicitly but acquired through training with a learning procedure.
History
The beginnings of artificial neurons go back to Warren McCulloch and Walter Pitts in 1943. Using a simplified model of a neural network, the McCulloch-Pitts cell, they showed that it can compute logical and arithmetic functions.
The Hebbian learning rule was described by Donald Hebb in 1949. Building on the medical research of Santiago Ramón y Cajal, who proved the existence of synapses as early as 1911, the rule states that connections between nerve cells that are repeatedly active are strengthened. A generalization of this rule is still used in today's learning procedures.
An important work appeared in 1958 with the convergence theorem for the perceptron. In it, Frank Rosenblatt showed that the specified learning procedure can learn every solution that can be represented with this model.
However, the critics Marvin Minsky and Seymour Papert showed in 1969 that a single-layer perceptron cannot represent the XOR operation because the XOR function is not linearly separable; only later models could remedy this problem. This demonstrated limitation of the model initially led to declining interest in research on artificial neural networks and to the withdrawal of research funding.
Interest in artificial neural networks only returned when John Hopfield made Hopfield networks known in 1985 and showed that they were able to solve optimization problems such as the traveling salesman problem. Work on the backpropagation method by David E. Rumelhart, Geoffrey E. Hinton, and Ronald J. Williams from 1986 onwards also led to a revival of research into these networks.
Today such networks are used in many areas of research.
Biological motivation
Artificial neurons are motivated by the nerve cells of mammals, which are specialized in receiving and processing signals. Signals are passed on electrically or chemically via synapses to other nerve cells or to effector cells (e.g. for muscle contraction).
A nerve cell consists of the cell body, axon, and dendrites. Dendrites are short, highly branched cell processes that receive signals from other nerve cells or sensory cells. The axon functions as the signal output of the cell and can reach a length of 1 m. The transmission of signals takes place at the synapses, which can have an excitatory or inhibitory effect.
The dendrites of the nerve cell transmit the incoming electrical excitations to the cell body. If the excitation reaches and exceeds a certain threshold value, the voltage discharges and propagates along the axon (all-or-nothing law).
The interconnection of these nerve cells forms the basis for the intellectual performance of the brain. According to estimates, the human central nervous system consists of $10^{10}$ to $10^{12}$ nerve cells with an average of 10,000 connections each; the human brain can therefore have more than $10^{14}$ connections. The action potential in the axon can propagate at a speed of up to 100 m/s.
The efficiency of neurons can also be seen in comparison to logic gates. While gates switch in the nanosecond range ($10^{-9}$ s) with an energy consumption of $10^{-6}$ joules (data from 1991), nerve cells react in the millisecond range ($10^{-3}$ s) and use only $10^{-16}$ joules of energy. Despite these apparently poorer values for processing by nerve cells, computer-aided systems cannot match the capabilities of biological systems.
The performance of neural networks is also demonstrated by the 100-step rule: visual recognition in humans takes place in at most 100 sequential processing steps, a performance that mostly sequentially operating computers cannot match.
The advantages and properties of nerve cells motivate the model of artificial neurons. Nevertheless, many models and algorithms for artificial neural networks lack a directly plausible biological motivation; in these cases it is found only in the basic idea of abstractly modeling the nerve cell.
Modeling
Using biology as a model, suitable modeling now yields a solution that can be used in information technology. A coarse generalization simplifies the system while maintaining its essential properties.
The synapses of the nerve cell are represented by adding weighted inputs, and the activation of the cell nucleus by an activation function with a threshold value. The use of an adder and a threshold value can already be found in the McCulloch-Pitts cell of 1943.
Components
An artificial neuron with the index $j$ and $n$ inputs, indexed with $i$, can be described by four basic elements:
- Weighting: Each input is given a weight. The weights $w_{ij}$ (input $i$ at neuron $j$) determine the degree of influence that the inputs have in the calculation of the later activation. Depending on the sign of a weight, an input can have an inhibitory or an excitatory effect. A weight of 0 marks a nonexistent connection between two nodes.
- Transfer function: The transfer function calculates the network input $\mathrm{net}_j$ of the neuron from the inputs $x_i$ and their weights.
- Activation function: The output of the neuron is ultimately determined by the activation function $\varphi$. The activation is influenced by the network input $\mathrm{net}_j$ from the transfer function and a threshold value $\theta_j$.
- Threshold: Adding a threshold value $\theta_j$ to the network input shifts the weighted inputs. The name derives from the use of a threshold function as the activation function, in which the neuron is activated when the threshold value is exceeded. The biological motivation is the threshold potential of nerve cells. From a mathematical point of view, the separating hyperplane that partitions the feature space is shifted by a translation through the threshold value.
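As a brief sketch of this geometric view (using only the symbols introduced above), a neuron with a threshold activation separates its input space along the hyperplane

$$\sum_{i=1}^{n} w_{ij}\, x_i = \theta_j ,$$

so a nonzero threshold $\theta_j$ translates the separating plane $\sum_{i} w_{ij}\, x_i = 0$ away from the origin.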
The following elements are defined by a connection graph:
- Inputs: The inputs $x_i$ can result from the observed process, whose values are transferred to the neuron, or come from the outputs of other neurons.
- Activation or output: The result of the activation function is referred to as the activation $o_j$ (o for "output") of the artificial neuron with index $j$, analogous to the nerve cell.
Mathematical definition
The artificial neuron as a model is usually introduced in the literature in the following way:
First, the network input (also referred to as "net") of the artificial neuron $j$ is defined as

$$\mathrm{net}_j = \sum_{i=1}^{n} x_i\, w_{ij} .$$

The output then results from applying the activation function to the network input and the threshold value, $o_j = \varphi(\mathrm{net}_j - \theta_j)$. As a mathematical simplification, a synthetic input $x_0 = 1$ with the weight $w_{0j} = -\theta_j$ is usually introduced, so that the threshold no longer has to be treated separately, and one writes

$$o_j = \varphi\!\left(\sum_{i=0}^{n} x_i\, w_{ij}\right).$$

Here
- $n$ is the number of inputs,
- $x_i$ is the input with the index $i$, which may be discrete as well as continuous,
- $w_{ij}$ is the weight of the input with the index $i$ at neuron $j$,
- $\varphi$ is the activation function and
- $o_j$ is the output.
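As an illustration of these definitions, the following Python sketch computes the output of a single artificial neuron using the bias trick described above; the function and variable names are chosen here for illustration only and are not part of the original formulation.

```python
import math

def neuron_output(inputs, weights, activation):
    """Output o_j = phi(sum_i x_i * w_ij) of a single artificial neuron.

    inputs and weights already contain the synthetic bias entry
    x_0 = 1 with weight w_0 = -theta, so no separate threshold is needed.
    """
    net = sum(x * w for x, w in zip(inputs, weights))  # transfer function: weighted sum
    return activation(net)                             # activation function phi

def sigmoid(x):
    """Logistic activation with values in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

# Example: a neuron with two inputs plus the bias input x_0 = 1.
x = [1.0, 0.5, -0.2]   # x_0 (bias), x_1, x_2
w = [-0.1, 0.8, 0.4]   # w_0 = -theta, w_1, w_2
print(neuron_output(x, w, sigmoid))
```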
Activation functions
Different types of functions can be used as activation functions, depending on the network topology used. Such a function can be non-linear, for example sigmoid, piecewise linear, or a step function. In general, activation functions are monotonically increasing.
Linear activation functions are very limited, since a composition of linear functions can be rewritten by arithmetic transformations into a single linear function. They are therefore not suitable for multilayer networks and are only used in simple models.
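As a one-line illustration of this limitation (with generic coefficients $a_k$, $b_k$ that are not taken from the article), composing two linear functions $f_1(x) = a_1 x + b_1$ and $f_2(x) = a_2 x + b_2$ gives

$$f_2(f_1(x)) = a_2(a_1 x + b_1) + b_2 = (a_2 a_1)\,x + (a_2 b_1 + b_2),$$

which is again a linear function, so stacked layers with linear activations collapse into a single layer.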
Examples of basic activation functions are:
Threshold function
The threshold function (engl. hard limit), as defined below, takes only the values 0 or 1: the value 1 for an input $\mathrm{net} \geq 0$, otherwise 0.

$$f_{\mathrm{hlim}}(x) = \begin{cases} 1 & \text{if } x \geq 0 \\ 0 & \text{otherwise} \end{cases}$$

If a threshold value is used subtractively, the function is only activated if the additional input exceeds the threshold. A neuron with such an activation function is also called a McCulloch-Pitts cell. It reflects the all-or-nothing character of the model.
Piecewise linear function
The piecewise linear function used here (engl. piecewise linear) maps a limited interval linearly; the outer intervals are mapped to a constant value:

$$f_{\mathrm{pwl}}(x) = \begin{cases} 0 & \text{if } x \le -\tfrac{1}{2} \\ x + \tfrac{1}{2} & \text{if } -\tfrac{1}{2} < x < \tfrac{1}{2} \\ 1 & \text{if } x \ge \tfrac{1}{2} \end{cases}$$
Sigmoid function
Sigmoid functions are very frequently used as activation functions. As defined here, they have a slope parameter $c$ that influences the curvature of the function graph. A special property is their differentiability, which is required for some methods such as the backpropagation algorithm:

$$f_{\mathrm{sig}}(x) = \frac{1}{1 + e^{-c\,x}}$$
The values of the above functions lie in the interval $[0, 1]$. These functions can be defined analogously for the interval $[-1, 1]$.
Rectifier (ReLU)
The rectifier as an activation function is used particularly successfully in deep learning models. It is defined as the positive part of its argument:

$$f(x) = \max(0, x)$$
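The four basic activation functions above can be written compactly in Python; the following sketch mirrors the definitions given here (the breakpoints of the piecewise linear variant are the illustrative choice from above):

```python
import math

def hard_limit(x):
    """Threshold function: 1 for x >= 0, otherwise 0."""
    return 1.0 if x >= 0 else 0.0

def piecewise_linear(x):
    """Linear on (-1/2, 1/2), constant 0 below and 1 above."""
    if x <= -0.5:
        return 0.0
    if x >= 0.5:
        return 1.0
    return x + 0.5

def sigmoid(x, c=1.0):
    """Logistic sigmoid with slope parameter c; values in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-c * x))

def relu(x):
    """Rectifier: positive part of the argument."""
    return max(0.0, x)
```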
Examples
Representation of Boolean functions
Boolean functions can be represented with artificial neurons. The three functions conjunction (and), disjunction (or), and negation (not) can be represented using a threshold function as follows:
(Figures: one neuron each representing the conjunction, the disjunction, and the negation.)
For the conjunction, for example, only the Boolean inputs $x_1 = 1$ and $x_2 = 1$ yield the activation $o = 1$; otherwise $o = 0$ results.
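A minimal sketch of such threshold neurons in Python is shown below; the concrete weights and thresholds (weights 1, 1 with threshold 1.5 for AND, threshold 0.5 for OR, weight −1 with threshold −0.5 for NOT) are one common illustrative choice and are not taken from the figures of the article.

```python
def threshold_neuron(inputs, weights, theta):
    """McCulloch-Pitts style neuron: output 1 if the weighted sum reaches theta."""
    net = sum(x * w for x, w in zip(inputs, weights))
    return 1 if net >= theta else 0

def AND(x1, x2):
    return threshold_neuron([x1, x2], [1, 1], theta=1.5)   # fires only for (1, 1)

def OR(x1, x2):
    return threshold_neuron([x1, x2], [1, 1], theta=0.5)   # fires if any input is 1

def NOT(x1):
    return threshold_neuron([x1], [-1], theta=-0.5)        # inverts the single input

for a in (0, 1):
    for b in (0, 1):
        print(a, b, AND(a, b), OR(a, b))
print(NOT(0), NOT(1))
```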
Learning a neuron
Unlike in the previous example, in which suitable weights were specified, neurons can learn the function to be represented. The weights and the threshold are initially randomized and then adjusted using a "trial and error" learning algorithm.
$x_1$ | $x_2$ | $x_1 \wedge x_2$ |
---|---|---|
0 | 0 | 0 |
0 | 1 | 0 |
1 | 0 | 0 |
1 | 1 | 1 |
In order to learn the logical conjunction, the perceptron criterion function can be used. It adds the values of incorrectly classified inputs to the weights in order to improve the classification until as many inputs as possible are classified correctly. The activation function here is the threshold function, analogous to the previous example.
For the learning procedure, the learning rate, which determines the speed of learning, is additionally chosen; here it is set to 1, so it does not need to be mentioned explicitly.
Instead of specifying the threshold value as such, an on-neuron (bias), i.e. a constant input $x_0 = 1$, is added. The threshold value is then represented by its weight $w_0$.
In order to train the neuron on the two possible outputs 0 and 1 of the conjunction, the inputs whose associated output is 0 are multiplied by −1. Through this step, the output is 0 only if the relevant input was classified incorrectly. This procedure simplifies the analysis during training and the subsequent adjustment of the weights. The learning table then looks like this:
$x_0$ | $x_1$ | $x_2$ |
---|---|---|
−1 | 0 | 0 |
−1 | 0 | −1 |
−1 | −1 | 0 |
1 | 1 | 1 |
Among the inputs, the bias input $x_0$ keeps the value 1 only in the row for which the neuron should output 1 at the end; the other rows were multiplied by −1.
For the initial situation, the weights are chosen randomly:
Weight | Initial value | Meaning |
---|---|---|
$w_0$ | 0.1 | Representation of the threshold value |
$w_1$ | 0.6 | Weight of the first input |
$w_2$ | −0.3 | Weight of the second input |
To test the weights, they are inserted into a neuron with three inputs and the threshold value 0. For the chosen weights, the output looks like this:
$x_0$ | $x_1$ | $x_2$ | Output |
---|---|---|---|
−1 | 0 | 0 | 0 |
−1 | 0 | −1 | 1 |
−1 | −1 | 0 | 0 |
1 | 1 | 1 | 1 |
The first and third inputs are classified incorrectly, and the neuron outputs 0 for them. Now the perceptron criterion function is applied:
By adding the incorrectly classified inputs, the associated weights are corrected according to

$$w_i^{\text{new}} = w_i^{\text{old}} + \eta \,(t_j - o_j)\, x_i^{(j)} .$$

Here
- $j$ is the number of the training entry,
- $t_j$ is the desired output,
- $o_j$ is the actual output,
- $x_i^{(j)}$ is the input $i$ of entry $j$ and
- $\eta$ is the learning rate, which determines the speed of learning.
$w_0 = 0.1 + (-1) + (-1) = -1.9$
$w_1 = 0.6 + 0 + (-1) = -0.4$
$w_2 = -0.3 + 0 + 0 = -0.3$
The check after the weight change shows that instead of the first and third input, the fourth input is now classified incorrectly. Carrying out a further step of the learning procedure improves the neuron's classification:
$w_0 = -1.9 + 1 = -0.9$
$w_1 = -0.4 + 1 = 0.6$
$w_2 = -0.3 + 1 = 0.7$
You can now see that the neuron has learned the given function and has correctly calculated all four inputs.
Using the input $x_1 = 1$ and $x_2 = 1$ and the threshold function as activation, it follows that

$$o = f(-0.9 \cdot 1 + 0.6 \cdot 1 + 0.7 \cdot 1) = f(0.4) = 1 .$$
For the other three inputs, which were multiplied by −1 for training, the value $o = 0$ now results for the original input values. For example, from the input $x_1 = 1$ and $x_2 = 0$ the activation

$$o = f(-0.9 \cdot 1 + 0.6 \cdot 1 + 0.7 \cdot 0) = f(-0.3) = 0$$

follows.
Without any particular weights being specified, the neuron has thus learned to represent the conjunction from the given examples, just as in the first example.
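The steps above can be reproduced with a short Python sketch. It uses the same initial weights (0.1, 0.6, −0.3), the sign-flip trick, a learning rate of 1, and batch updates over all misclassified entries per pass; the function names are illustrative only.

```python
def fires(weights, x):
    """Threshold neuron with threshold 0: 1 if the weighted sum is >= 0."""
    return 1 if sum(w * xi for w, xi in zip(weights, x)) >= 0 else 0

# Learning table for the conjunction: bias x0 plus the two inputs.
# Rows whose desired output is 0 have been multiplied by -1, so after
# training every row should produce output 1.
table = [(-1, 0, 0), (-1, 0, -1), (-1, -1, 0), (1, 1, 1)]
weights = [0.1, 0.6, -0.3]   # w0 (threshold), w1, w2

while True:
    wrong = [x for x in table if fires(weights, x) == 0]   # misclassified entries
    if not wrong:
        break
    # Perceptron criterion with learning rate 1: add the misclassified inputs.
    for x in wrong:
        weights = [w + xi for w, xi in zip(weights, x)]
    print(weights)

# Check the learned conjunction on the original (unflipped) inputs.
for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, fires(weights, (1, x1, x2)))
```

Run as written, the weight updates follow the two steps of the worked example, ending at (−0.9, 0.6, 0.7).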
Capabilities of a single neuron
Even without an entire network, a single artificial neuron is capable of machine learning. The corresponding statistical terms are linear regression and classification: linear functions can be learned and linearly separable classes can be distinguished. With the help of the so-called kernel trick, non-linear models can also be learned. A single neuron can thus produce results similar to those of SVMs, although not optimal ones.
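As a hedged illustration of the regression case (toy data and hyperparameters are chosen here for demonstration and are not taken from the article), a single neuron with a linear activation can be fitted by gradient descent:

```python
# Toy data roughly following y = 2*x + 1 (illustrative values).
data = [(0.0, 1.0), (1.0, 3.1), (2.0, 4.9), (3.0, 7.2)]

w, b = 0.0, 0.0           # weight and bias of the single neuron
lr = 0.05                 # learning rate (illustrative choice)

for _ in range(2000):
    for x, t in data:
        o = w * x + b             # linear activation: output of the neuron
        error = t - o             # deviation from the target
        w += lr * error * x       # gradient descent step for the weight
        b += lr * error           # gradient descent step for the bias

print(w, b)               # approximately 2 and 1
```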
Literature
- Simon Haykin : Neural Networks, A Comprehensive Foundation . Macmillan College Publishing Company, New York 1994, ISBN 0-02-352761-7 .
- Andreas Zell: Simulation of neural networks . R. Oldenbourg Verlag, Munich 1997, ISBN 3-486-24350-0 .
- Jürgen Cleve, Uwe Lämmel: Data Mining . De Gruyter Oldenbourg Verlag, Munich 2014, ISBN 978-3-486-71391-6 .
- Jürgen Cleve, Uwe Lämmel: Artificial Intelligence . Hanser Verlag, Munich 2012, ISBN 978-3-446-42758-7 .
Sources
- J. J. Hopfield, D. Tank: Neural Computation of Decisions in Optimization Problems. Biological Cybernetics, No. 52, pp. 141-152, 1985.
- Patricia S. Churchland, Terrence J. Sejnowski: Basics of Neuroinformatics and Neurobiology. Friedr. Vieweg & Sohn Verlagsgesellschaft, Braunschweig/Wiesbaden 1997, ISBN 3-528-05428-X.
- Werner Kinnebrock: Neural Networks: Basics, Applications, Examples. R. Oldenbourg Verlag, Munich 1994, ISBN 3-486-22947-8.