Multi-dimensional chain rule

The multidimensional chain rule or generalized chain rule is a generalization of the chain rule from functions of one variable to functions and mappings of several variables in multidimensional analysis. It states that the composition of (totally) differentiable maps or functions is differentiable and specifies how the derivative of this composition is calculated.

Multidimensional derivatives

If $f \colon \mathbb{R}^n \to \mathbb{R}^m$ is a differentiable mapping, then the derivative of $f$ at the point $p \in \mathbb{R}^n$, written $f'(p)$, $Df(p)$ or $Df_p$, is a linear mapping that maps vectors at the point $p \in \mathbb{R}^n$ to vectors at the image point $f(p) \in \mathbb{R}^m$. It can be represented by the Jacobian matrix, which is denoted by $J_f(p)$, $\frac{\partial f}{\partial x}(p)$, or also by $Df(p)$, and whose entries are the partial derivatives:

$$J_f(p) = \left(\frac{\partial f_i}{\partial x_j}(p)\right)_{ij} = \begin{pmatrix} \frac{\partial f_1}{\partial x_1}(p) & \ldots & \frac{\partial f_1}{\partial x_n}(p) \\ \vdots & & \vdots \\ \frac{\partial f_m}{\partial x_1}(p) & \ldots & \frac{\partial f_m}{\partial x_n}(p) \end{pmatrix}$$
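As a rough illustration of this definition, the following sketch approximates the Jacobian matrix column by column with central differences. The helper name `jacobian`, the step size `eps` and the sample map `f` are assumptions made for this example and do not come from the text above.

```python
import math

def jacobian(func, p, eps=1e-6):
    """Approximate the m x n Jacobian matrix of func at p by central differences."""
    n = len(p)
    m = len(func(p))
    J = [[0.0] * n for _ in range(m)]
    for j in range(n):
        forward = list(p)
        backward = list(p)
        forward[j] += eps
        backward[j] -= eps
        f_plus, f_minus = func(forward), func(backward)
        for i in range(m):
            # Entry (i, j) approximates the partial derivative of f_i with respect to x_j.
            J[i][j] = (f_plus[i] - f_minus[i]) / (2 * eps)
    return J

# Hypothetical sample map f: R^2 -> R^3.
def f(x):
    return [x[0] * x[1], math.sin(x[0]), x[1] ** 2]

J = jacobian(f, [1.0, 2.0])
# Exact Jacobian at (1, 2) for comparison: [[2, 1], [cos(1), 0], [0, 4]]
for row in J:
    print(row)
```

The result is only a finite-difference approximation; its rows correspond to the component functions and its columns to the coordinates, matching the index convention of the matrix above.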

The chain rule now states that the derivative of the composition of two mappings is precisely the composition of the derivatives, or equivalently that the Jacobian matrix of the composition is the matrix product of the Jacobian matrix of the outer function with the Jacobian matrix of the inner function.

Theorem

If $f \colon \mathbb{R}^n \to \mathbb{R}^l$ and $g \colon \mathbb{R}^l \to \mathbb{R}^m$ are differentiable mappings, then the composition $h = g \circ f \colon \mathbb{R}^n \to \mathbb{R}^m$ is also differentiable. Its derivative at the point $p \in \mathbb{R}^n$ is the derivative of $f$ at the point $p$ followed by the derivative of $g$ at the point $f(p)$:

$$D(g \circ f)_p = Dg_{f(p)} \circ Df_p$$

or

$$(g \circ f)'(p) = g'(f(p)) \circ f'(p).$$

For the Jacobian matrices the following holds accordingly:

$$J_{g \circ f}(p) = J_g(f(p)) \cdot J_f(p),$$

or

$$\frac{\partial (g \circ f)}{\partial x}(p) = \frac{\partial g}{\partial y}(f(p)) \cdot \frac{\partial f}{\partial x}(p),$$

where the dot denotes matrix multiplication. Here the coordinates in $\mathbb{R}^n$, the domain of $f$, are denoted by $x = (x_1, \dots, x_n)$, and the coordinates in $\mathbb{R}^l$, the image space of $f$ and thus the domain of $g$, by $y = (y_1, \dots, y_l)$. Written out with the components of the mappings and the partial derivatives:

$$\frac{\partial h_i}{\partial x_j}(p) = \sum_{k=1}^{l} \frac{\partial g_i}{\partial y_k}(f(p)) \cdot \frac{\partial f_k}{\partial x_j}(p)$$
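The matrix form of the chain rule can be checked numerically. The sketch below compares a finite-difference Jacobian of the composition with the product of the finite-difference Jacobians of the outer and inner maps. The maps `f` and `g`, the point `p` and the helpers `jacobian` and `matmul` are hypothetical choices for this illustration; the agreement is only up to discretization and rounding error.

```python
import math

def jacobian(func, p, eps=1e-6):
    """Approximate the Jacobian matrix of func at p by central differences."""
    n, m = len(p), len(func(p))
    J = [[0.0] * n for _ in range(m)]
    for j in range(n):
        forward, backward = list(p), list(p)
        forward[j] += eps
        backward[j] -= eps
        f_plus, f_minus = func(forward), func(backward)
        for i in range(m):
            J[i][j] = (f_plus[i] - f_minus[i]) / (2 * eps)
    return J

def matmul(A, B):
    """Plain matrix product: sum over the shared index k, as in the component formula."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def f(x):  # inner map, R^2 -> R^3 (hypothetical)
    return [x[0] * x[1], math.sin(x[0]), x[1] ** 2]

def g(y):  # outer map, R^3 -> R^2 (hypothetical)
    return [y[0] + y[1] * y[2], math.exp(y[0]) * y[2]]

def h(x):  # composition h = g o f, R^2 -> R^2
    return g(f(x))

p = [0.5, -1.0]
lhs = jacobian(h, p)                              # Jacobian of the composition at p
rhs = matmul(jacobian(g, f(p)), jacobian(f, p))   # J_g(f(p)) times J_f(p)
max_diff = max(abs(lhs[i][j] - rhs[i][j]) for i in range(2) for j in range(2))
print(max_diff)
```

Note that the Jacobian of the outer map is evaluated at `f(p)`, not at `p`; evaluating it at the wrong point is a common source of errors when applying the rule by hand.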

Higher differentiability

If, for some $k \in \mathbb{N}$, the mappings $f$ and $g$ are of class $C^k$, that is, $k$ times continuously differentiable, then $g \circ f$ is also of class $C^k$. This follows from repeatedly applying the chain rule and the product rule to the partial derivatives of the component functions.
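To make the repeated application concrete, here is a sketch of the case $k = 2$, written with the notation $h = g \circ f$ and an additional index $r$ introduced only for this illustration. Differentiating the first-order component formula once more with respect to $x_m$, using the product rule on each summand and the chain rule on the factor that depends on $f(p)$, should give:

```latex
\frac{\partial^2 h_i}{\partial x_m \, \partial x_j}(p)
  = \sum_{k=1}^{l} \sum_{r=1}^{l}
      \frac{\partial^2 g_i}{\partial y_r \, \partial y_k}(f(p))
      \cdot \frac{\partial f_r}{\partial x_m}(p)
      \cdot \frac{\partial f_k}{\partial x_j}(p)
  + \sum_{k=1}^{l}
      \frac{\partial g_i}{\partial y_k}(f(p))
      \cdot \frac{\partial^2 f_k}{\partial x_m \, \partial x_j}(p)
```

Every term on the right is continuous when $f$ and $g$ are of class $C^2$, so $h$ is of class $C^2$ as well; higher orders follow the same pattern by induction.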

Special case n = m = 1

Often one would like to determine the derivative of an ordinary real function $h \colon \mathbb{R} \to \mathbb{R}$ which is, however, defined via a multidimensional "detour":

$$h = g \circ f$$

with $f \colon \mathbb{R} \to \mathbb{R}^l$ and $g \colon \mathbb{R}^l \to \mathbb{R}$.

In this case the chain rule can be written as follows:

$$h'(x) = \frac{\partial g}{\partial y_1}(f(x)) \cdot f_1'(x) + \dots + \frac{\partial g}{\partial y_l}(f(x)) \cdot f_l'(x) = \operatorname{grad} g(f(x)) \cdot f'(x)$$

The last dot denotes the scalar product of two vectors: the gradient

$$\operatorname{grad} g = \nabla g = \left(\frac{\partial g}{\partial y_1}, \ldots, \frac{\partial g}{\partial y_l}\right)$$

of the function $g$, evaluated at the point $f(x)$, and the vector-valued derivative

$$f'(x) = \left(f_1'(x), \ldots, f_l'(x)\right)$$

of the mapping $f$.
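A small sketch of this special case, under the assumption of a hypothetical inner map $f(x) = (x, x^2, x^3)$ and outer function $g(y) = y_1 y_2 y_3$, so that $h(x) = x^6$ and the result can be compared with the directly computed derivative $6x^5$. All function names are chosen for this example only.

```python
def f(x):
    """Inner map f: R -> R^3 (hypothetical example)."""
    return (x, x ** 2, x ** 3)

def f_prime(x):
    """Vector-valued derivative f'(x) = (f_1'(x), f_2'(x), f_3'(x))."""
    return (1.0, 2 * x, 3 * x ** 2)

def grad_g(y):
    """Gradient of the hypothetical outer function g(y) = y1 * y2 * y3."""
    y1, y2, y3 = y
    return (y2 * y3, y1 * y3, y1 * y2)

def h_prime(x):
    """Chain rule: scalar product of grad g at f(x) with f'(x)."""
    return sum(gk * fk for gk, fk in zip(grad_g(f(x)), f_prime(x)))

x = 2.0
chain_rule_value = h_prime(x)   # 32*1 + 16*4 + 8*12
direct_value = 6 * x ** 5       # derivative of h(x) = x**6
print(chain_rule_value, direct_value)
```

The three summands $x^5$, $2x^5$ and $3x^5$ show how each component of the "detour" contributes its own share to the total derivative.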

Chain rule and directional derivative

For the special case $f \colon \mathbb{R} \to \mathbb{R}^l$, $f(t) = a + tv$ with $a, v \in \mathbb{R}^l$,

$$(g \circ f)'(0) = \left.\frac{d}{dt}\right|_{t=0} g(a + tv) = D_v g(a)$$

is the directional derivative of $g$ at the point $a$ in the direction of the vector $v$. It then follows from the chain rule that

$$(g \circ f)'(0) = \operatorname{grad} g(f(0)) \cdot f'(0) = \operatorname{grad} g(a) \cdot v.$$

The result is the usual formula for calculating the directional derivative:

$$D_v g(a) = \operatorname{grad} g(a) \cdot v$$
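The formula can be compared with a direct difference quotient along the line $t \mapsto a + tv$. The following sketch assumes a hypothetical function $g(y) = y_1^2 + 3 y_1 y_2$ together with arbitrarily chosen vectors `a` and `v`; the helper name `directional_derivative` and the step size `t` are likewise assumptions for this example.

```python
def g(y):
    """Hypothetical scalar function g: R^2 -> R."""
    y1, y2 = y
    return y1 ** 2 + 3 * y1 * y2

def grad_g(y):
    """Gradient of g, computed by hand."""
    y1, y2 = y
    return (2 * y1 + 3 * y2, 3 * y1)

def directional_derivative(func, a, v, t=1e-5):
    """Central difference quotient of s -> func(a + s*v) at s = 0."""
    plus = [ai + t * vi for ai, vi in zip(a, v)]
    minus = [ai - t * vi for ai, vi in zip(a, v)]
    return (func(plus) - func(minus)) / (2 * t)

a = (1.0, 2.0)
v = (3.0, -1.0)
numeric = directional_derivative(g, a, v)
exact = sum(gi * vi for gi, vi in zip(grad_g(a), v))  # 8*3 + 3*(-1)
print(numeric, exact)
```

As in the formula above, `v` is not normalized here; the derivative is taken along $a + tv$ exactly as written.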

Example

$$h(x) = g(\cos x, \sin x)$$

In this example $g$ is the outer function, depending on $y = (y_1, y_2)$. Thus

$$g'(y) = \begin{pmatrix} \frac{\partial g}{\partial y_1} & \frac{\partial g}{\partial y_2} \end{pmatrix}$$

As the inner function we set $f(x) = (f_1(x), f_2(x)) = (\cos x, \sin x)$, depending on the real variable $x$. Differentiating gives

$$f'(x) = \begin{pmatrix} f_1'(x) \\ f_2'(x) \end{pmatrix} = \begin{pmatrix} -\sin x \\ \cos x \end{pmatrix}$$

According to the general chain rule, the following holds:

$$\begin{aligned} h'(x) &= g'(f(x)) \cdot f'(x) = \left.\begin{pmatrix} \frac{\partial g}{\partial y_1} & \frac{\partial g}{\partial y_2} \end{pmatrix}\right|_{y = f(x)} \cdot \begin{pmatrix} -\sin x \\ \cos x \end{pmatrix} \\ &= -\sin x \cdot \frac{\partial g}{\partial y_1}(\cos x, \sin x) + \cos x \cdot \frac{\partial g}{\partial y_2}(\cos x, \sin x) \end{aligned}$$
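To see the formula at work, one can substitute a concrete outer function. The choice $g(y_1, y_2) = y_1 y_2$ below is an assumption made purely for illustration; it gives $h(x) = \cos x \sin x = \tfrac{1}{2} \sin 2x$, so the result can be checked against the one-dimensional derivative $\cos 2x$.

```latex
\begin{aligned}
g(y_1, y_2) &= y_1 y_2, \qquad
\frac{\partial g}{\partial y_1} = y_2, \qquad
\frac{\partial g}{\partial y_2} = y_1, \\
h'(x) &= -\sin x \cdot \frac{\partial g}{\partial y_1}(\cos x, \sin x)
        + \cos x \cdot \frac{\partial g}{\partial y_2}(\cos x, \sin x) \\
      &= -\sin x \cdot \sin x + \cos x \cdot \cos x
       = \cos^2 x - \sin^2 x = \cos 2x .
\end{aligned}
```

This agrees with differentiating $\tfrac{1}{2} \sin 2x$ directly.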

An additive example using substitution

For example, to find the derivative of $f(x) = x^x$, one can write the function as $x^x = e^{x \ln x}$ and then apply the chain and product rules, which leads to the derivative

$$f'(x) = e^{x \ln x} \left(x \cdot \frac{1}{x} + 1 \cdot \ln x\right) = x^x + x^x \ln x.$$

An alternative approach, however, is to use the multidimensional chain rule:

Let the function be $g(u, v) = u^v$; its two first partial derivatives are $\frac{\partial g}{\partial u} = v \, u^{v-1}$ and, as is easy to see from the transformation $u^v = e^{v \ln u}$, $\frac{\partial g}{\partial v} = u^v \ln u$. If one now replaces $u$ and $v$ by the two auxiliary functions $h_1(x) = x$ and $h_2(x) = x$, one obtains, with $f(x) = g(h_1(x), h_2(x))$ and the multidimensional chain rule above:

$$f'(x) = \frac{\partial g}{\partial u}(x, x) \, h_1'(x) + \frac{\partial g}{\partial v}(x, x) \, h_2'(x) = x \, x^{x-1} \cdot 1 + x^x \ln x \cdot 1 = x^x + x^x \ln x$$

This procedure can be described as follows:

One differentiates $x^x$ with respect to the $x$ in the base, treating the $x$ in the exponent as a constant,
one differentiates $x^x$ with respect to the $x$ in the exponent, treating the $x$ in the base as a constant,
the results are added up.

The “trick” here is to distinguish between the $x$ in the base and the $x$ in the exponent, although they have the same name.
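The substitution can be sketched in a few lines: the two partial derivatives of $g(u, v) = u^v$ are evaluated at $(x, x)$ and added, and the sum is compared with the closed form $x^x + x^x \ln x$ and with a difference quotient. The function names, the evaluation point and the step size `eps` are assumptions made for this illustration.

```python
import math

def g(u, v):
    """Two-variable helper function g(u, v) = u**v, for u > 0."""
    return u ** v

def dg_du(u, v):
    """Partial derivative with respect to the base, exponent held constant."""
    return v * u ** (v - 1)

def dg_dv(u, v):
    """Partial derivative with respect to the exponent, base held constant."""
    return u ** v * math.log(u)

def f_prime_chain(x):
    # h1(x) = h2(x) = x, so both inner derivatives equal 1.
    return dg_du(x, x) * 1.0 + dg_dv(x, x) * 1.0

def f_prime_closed(x):
    """Closed form obtained from x**x = exp(x * ln x)."""
    return x ** x + x ** x * math.log(x)

def f_prime_numeric(x, eps=1e-6):
    """Central difference quotient of f(x) = x**x, as a sanity check."""
    return (g(x + eps, x + eps) - g(x - eps, x - eps)) / (2 * eps)

x = 2.0
print(f_prime_chain(x), f_prime_closed(x), f_prime_numeric(x))
```

The first two values should agree up to rounding, while the third is only a numerical approximation.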

This method of differentiation is generally applicable; for example, it readily yields the Leibniz rule for parameter integrals.
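A sketch of how the Leibniz rule arises in this way, under the assumptions that $F$ and $\partial F / \partial w$ are continuous and that the limits $a$ and $b$ are differentiable. The auxiliary function $G$ and the names $u$, $v$, $w$ are introduced only for this illustration: each occurrence of $x$ (lower limit, upper limit, integrand) is given its own variable, and the three partial derivatives are added.

```latex
\begin{aligned}
G(u, v, w) &= \int_u^v F(w, t)\, dt, \qquad I(x) = G\bigl(a(x), b(x), x\bigr), \\
\frac{\partial G}{\partial u} &= -F(w, u), \qquad
\frac{\partial G}{\partial v} = F(w, v), \qquad
\frac{\partial G}{\partial w} = \int_u^v \frac{\partial F}{\partial w}(w, t)\, dt, \\
I'(x) &= -F\bigl(x, a(x)\bigr)\, a'(x) + F\bigl(x, b(x)\bigr)\, b'(x)
        + \int_{a(x)}^{b(x)} \frac{\partial F}{\partial x}(x, t)\, dt .
\end{aligned}
```

The first two partial derivatives come from the fundamental theorem of calculus, the third from differentiating under the integral sign.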

Generalization to differentiable manifolds

If $M$ and $N$ are differentiable manifolds and $f \colon M \to N$ is a differentiable mapping, then the derivative $f'(p)$ or $Df_p$ of $f$ at the point $p \in M$ is a linear mapping from the tangent space of $M$ at the point $p$ into the tangent space of $N$ at the image point $f(p)$:

$$Df_p \colon T_p M \to T_{f(p)} N$$

Other names for it are: differential (then often written $df_p$), pushforward ($f_{\ast p}$) and tangent map ($T_p f$).

The chain rule then states: if $M$, $N$ and $P$ are differentiable manifolds and $h = g \circ f \colon M \to P$ is the composition of the differentiable mappings $f \colon M \to N$ and $g \colon N \to P$, then $h$ is also differentiable, and its derivative at the point $p \in M$ satisfies:

$$Dh_p = Dg_{f(p)} \circ Df_p$$

Chain rule for Fréchet derivatives

The chain rule applies correspondingly to Fréchet derivatives.

Let $X$, $Y$ and $Z$ be Banach spaces, $U \subset X$ and $V \subset Y$ open subsets, and $B \colon U \to Y$ and $A \colon V \to Z$ mappings.

If $B$ is differentiable at the point $\varphi \in U$ and $A$ is differentiable at the point $B(\varphi) \in V$, then the composition $A \circ B \colon U \to Z$ is also differentiable at the point $\varphi$, and

$$(A \circ B)'(\varphi) = A'(B(\varphi)) \circ B'(\varphi).$$

References and comments

Physicists write the vectors $f'(x)$ or $v$ with vector arrows ($\vec{f}'(x)$, $\vec{v}$) or in boldface ($\mathbf{f'}(x)$ or $\mathbf{v}$). Among other things, this has the advantage that one can see immediately that $x$, in contrast to $\mathbf{v}$, is a one-dimensional variable.
