# Chain rule

The chain rule is one of the basic rules of differential calculus. It describes the derivative of a function that can be written as the composition of two differentiable functions. Its central statement is that such a function is itself differentiable, and that its derivative is obtained by differentiating the two composed functions separately and multiplying the results, each evaluated at the appropriate point.

The chain rule can be generalized to functions that can be written as a composition of more than two differentiable functions. Such a function is also differentiable, and its derivative is obtained by multiplying the derivatives of all the nested functions, each evaluated at the appropriate point.

The chain rule is the one-dimensional special case of the multidimensional chain rule.

Its counterpart in integral calculus is integration by substitution.

## Mathematical formulation

Let $U, V$ be open intervals, and let $v \colon V \rightarrow \mathbb{R}$ and $u \colon U \rightarrow \mathbb{R}$ be functions with $v(V) \subset U$.

Suppose that $v$ is differentiable at the point $x_{0} \in V$ and that $u$ is differentiable at the point $z_{0} := v(x_{0}) \in U$.

Then the composite function (composition)

${\displaystyle f = u \circ v \colon V \rightarrow \mathbb{R}}$

is differentiable at the point $x_{0}$, and the following holds:

${\displaystyle (u \circ v)'(x_{0}) = u'\big(v(x_{0})\big) \cdot v'(x_{0}).}$

In the context of the chain rule, $u$ is called the outer function and $v$ the inner function of $f$.

Rule of thumb: the derivative of a function formed by composition at the point $x_{0}$ is the "outer derivative" $u'$, evaluated at the point $v(x_{0})$, times the derivative of the inner function $v'$, evaluated at the point $x_{0}$. In short: "outer derivative times inner derivative".
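The rule of thumb can be checked numerically. The following is a minimal sketch, assuming the illustrative functions $u(z) = \sin z$ and $v(x) = x^2$ (they are not taken from the text) and a symmetric difference quotient as a stand-in for the exact derivative.

```python
import math

def central_difference(func, x, h=1e-6):
    """Approximate func'(x) with a symmetric difference quotient."""
    return (func(x + h) - func(x - h)) / (2 * h)

# Illustrative choice (an assumption, not from the text): u(z) = sin(z), v(x) = x**2.
def u(z):
    return math.sin(z)

def v(x):
    return x * x

def f(x):
    return u(v(x))  # the composition f = u o v

x0 = 0.7
numeric = central_difference(f, x0)

# "Outer derivative times inner derivative": u'(v(x0)) * v'(x0)
by_chain_rule = math.cos(v(x0)) * (2 * x0)

print(abs(numeric - by_chain_rule))  # small, limited only by the finite step h
```

The two values agree up to the discretization error of the difference quotient, which illustrates the rule but does not replace a proof.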

## Example

Consider the function $f$ defined by $f(x) = \left(x^{3} + 1\right)^{2}$.

It can be written as the composition of the function

${\displaystyle u(v) = v^{2}}$

with the function

${\displaystyle v(x) = x^{3} + 1,}$

because $f(x) = u(v(x))$. Here $u$ is the outer function and $v$ the inner function.

To apply the chain rule, we need the derivatives $u'$ (outer derivative) and $v'$ (inner derivative):

${\displaystyle u'(v) = 2v}$

and

${\displaystyle v'(x) = 3x^{2}.}$

Since both $u$ and $v$ are differentiable, the chain rule shows that $f = u \circ v$ is also differentiable, and its derivative is

${\displaystyle f'(x) = u'(v(x)) \, v'(x).}$

Since $u'(v(x)) = 2(x^{3} + 1)$, we obtain in total:

${\displaystyle f'(x) = 2(x^{3} + 1) \cdot 3x^{2}.}$

Color makes the rule of thumb stated above visible in the formula: the inner function and its derivative are shown in blue.

${\displaystyle {\begin{aligned} f(x) &= (\color{Blue} x^{3} + 1 \color{Black})^{2} \\ f'(x) &= 2(\color{Blue} x^{3} + 1 \color{Black}) \cdot \color{Blue} 3x^{2} \color{Black} \end{aligned}}}$

Note that the representation of a function as a composition of an outer and an inner function need not be unique. The example function can also be viewed as the composition of the functions $u(v) = (v + 1)^{2}$ and $v(x) = x^{3}$, because these two functions also satisfy

${\displaystyle u(v(x)) = (x^{3} + 1)^{2} = f(x).}$

Applying the chain rule is computationally more involved in this case, since at least the term $(v + 1)^{2}$ has to be multiplied out.

In the spirit of constructivist didactics, the chain rule can be discovered from this example. Multiplying out gives:

${\displaystyle f(x) = x^{6} + 2x^{3} + 1.}$

After differentiating, the inner function $v(x) = x^{3} + 1$ is extracted by factoring out:

${\displaystyle f'(x) = 6x^{5} + 6x^{2} = 6x^{2}(x^{3} + 1) = 2(x^{3} + 1) \cdot 3x^{2}.}$

From this the chain rule can be conjectured; its general validity then still has to be proven.
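The agreement between the two ways of differentiating $f(x) = (x^{3} + 1)^{2}$ can also be spot-checked in code. This is a small sketch using exact integer arithmetic; the function names and the sample points are chosen here purely for illustration.

```python
def f_prime_chain(x):
    """Chain rule: outer derivative 2*v at v = x**3 + 1, times inner derivative 3*x**2."""
    return 2 * (x**3 + 1) * 3 * x**2

def f_prime_expanded(x):
    """Term-by-term derivative of the expanded form x**6 + 2*x**3 + 1."""
    return 6 * x**5 + 6 * x**2

# Integer sample points keep the comparison exact (no rounding involved).
sample_points = range(-5, 6)
agree = all(f_prime_chain(x) == f_prime_expanded(x) for x in sample_points)
print(agree)  # True
```

Agreement at finitely many points is of course only a plausibility check, in the same spirit as the conjecture above.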

## Heuristic derivation

To calculate the derivative of $u \circ v$, the difference quotient $\frac{\Delta u}{\Delta x}$ has to be calculated. Expanding this fraction by $\Delta v$ gives:

${\displaystyle {\frac{\Delta u}{\Delta x}} = {\frac{\Delta u}{\Delta v}} \cdot {\frac{\Delta v}{\Delta x}}.}$

Passing to the limit $\Delta x \rightarrow 0$ turns the difference quotients into differential quotients. If $\Delta x$ goes to zero, then so does $\Delta v$. One then obtains for the derivative of the composite function:

${\displaystyle {\begin{aligned} f'(x) &= \lim_{\Delta x \rightarrow 0} {\frac{\Delta u}{\Delta x}} = \lim_{\Delta x \rightarrow 0} \left({\frac{\Delta u}{\Delta v}} \cdot {\frac{\Delta v}{\Delta x}}\right) \\ &= \lim_{\Delta v \rightarrow 0} \left({\frac{\Delta u}{\Delta v}}\right) \cdot \lim_{\Delta x \rightarrow 0} \left({\frac{\Delta v}{\Delta x}}\right) = {\frac{\mathrm{d}u}{\mathrm{d}v}} \cdot {\frac{\mathrm{d}v}{\mathrm{d}x}} = u'\big(v(x)\big) \cdot v'(x). \end{aligned}}}$

This argument is only heuristic: $\Delta v$ can be zero even when $\Delta x \neq 0$, in which case the expanded fraction is undefined. This is why the proof in the next section proceeds differently.
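The expansion of the difference quotient can be observed numerically. The sketch below reuses the example functions $u(z) = z^{2}$ and $v(x) = x^{3} + 1$ at the point $x = 1$ (the point and the step sizes are arbitrary choices made here), where the chain rule gives $f'(1) = 2 \cdot 2 \cdot 3 = 12$.

```python
def v(x):
    return x**3 + 1   # inner function from the example

def u(z):
    return z**2       # outer function from the example

x = 1.0
for exponent in range(1, 6):
    dx = 10.0 ** -exponent
    dv = v(x + dx) - v(x)          # change of the inner function
    du = u(v(x + dx)) - u(v(x))    # change of the composite function
    # du/dx written as the product (du/dv) * (dv/dx)
    product = (du / dv) * (dv / dx)
    print(f"dx={dx:.0e}  du/dv={du / dv:.6f}  dv/dx={dv / dx:.6f}  product={product:.6f}")
```

As `dx` shrinks, `du/dv` approaches $u'(v(1)) = 4$, `dv/dx` approaches $v'(1) = 3$, and the product approaches 12. In this example `dv` is never zero, so the division is harmless here.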

## Proof

Define

${\displaystyle D(z, z_{0}) := {\begin{cases} {\frac{u(z) - u(z_{0})}{z - z_{0}}}, & {\text{if }} z \neq z_{0}, \\ u'(z_{0}), & {\text{if }} z = z_{0}. \end{cases}}}$

Because $u$ is differentiable at $z_{0}$,

${\displaystyle \lim_{z \to z_{0}} D(z, z_{0}) = u'(z_{0}),}$

that is, the function $z \mapsto D(z, z_{0})$ is continuous at the point $z_{0}$. Moreover, for all $z \in U$:

${\displaystyle u(z) - u(z_{0}) = D(z, z_{0}) \cdot (z - z_{0}).}$

Since $v$ is differentiable and therefore continuous at $x_{0}$, we have $\lim_{x \to x_{0}} v(x) = v(x_{0})$, and it follows that:

${\displaystyle {\begin{aligned} (u \circ v)'(x_{0}) &= \lim_{x \to x_{0}} {\frac{u\big(v(x)\big) - u\big(v(x_{0})\big)}{x - x_{0}}} = \lim_{x \to x_{0}} {\frac{D\big(v(x), v(x_{0})\big) \cdot \big(v(x) - v(x_{0})\big)}{x - x_{0}}} \\ &= \lim_{x \to x_{0}} D\big(v(x), v(x_{0})\big) \cdot \lim_{x \to x_{0}} {\frac{v(x) - v(x_{0})}{x - x_{0}}} \\ &= u'\big(v(x_{0})\big) \cdot v'(x_{0}). \end{aligned}}}$

## Complex functions

Let $U, V \subset \mathbb{C}$ be open subsets, e.g. domains, and let $v \colon V \rightarrow \mathbb{C}$ and $u \colon U \rightarrow \mathbb{C}$ be functions with $v(V) \subseteq U$.

Suppose that $v$ is differentiable at the point $x_{0} \in V$ and that $u$ is differentiable at the point $v(x_{0}) \in U$.

Then the composite function

${\displaystyle f := u \circ v \colon V \rightarrow \mathbb{C}, \; x \mapsto u(v(x))}$

is differentiable at the point $x_{0}$, and the following holds:

${\displaystyle (u \circ v)'(x_{0}) = u'\big(v(x_{0})\big) \cdot v'(x_{0}).}$

Conclusion: the complex chain rule (including its proof) is completely analogous to the real one.
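A quick numerical illustration of the complex case, as a sketch under stated assumptions: the functions $u(z) = e^{z}$ and $v(z) = z^{2}$, the point `z0`, and the step size are all chosen here for illustration. Because the complex derivative must not depend on the direction of approach, the difference quotient is evaluated along three different directions.

```python
import cmath

# Illustrative choice (an assumption, not from the text): u(z) = exp(z), v(z) = z**2.
def u(z):
    return cmath.exp(z)

def v(z):
    return z * z

def f(z):
    return u(v(z))

z0 = 0.3 + 0.4j

# Chain rule: u'(v(z0)) * v'(z0) = exp(z0**2) * 2*z0
by_chain_rule = cmath.exp(v(z0)) * (2 * z0)

# Symmetric difference quotients along the real, imaginary and a diagonal direction.
step = 1e-6
directions = [1, 1j, cmath.exp(0.25j * cmath.pi)]
quotients = [(f(z0 + step * d) - f(z0 - step * d)) / (2 * step * d) for d in directions]

for q in quotients:
    print(abs(q - by_chain_rule))  # small in every direction
```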

## Generalization to multiple concatenations

Differentiation becomes somewhat more complicated when more than two functions are composed. In this case the chain rule is applied recursively. For example, composing three functions $u$, $v$ and $w$ gives

${\displaystyle f(x) = u(v(w(x)))}$

with the derivative

${\displaystyle f'(x) = u'(v(w(x))) \cdot (v(w(x)))' = u'(v(w(x))) \cdot v'(w(x)) \cdot w'(x).}$

In general, the function

${\displaystyle f = u_{1} \circ \cdots \circ u_{n}}$

has the derivative

${\displaystyle f'(x) = u_{1}'(u_{2}(\cdots(u_{n}(x)))) \cdot u_{2}'(u_{3}(\cdots(u_{n}(x)))) \cdots u_{n}'(x),}$

as can be proven by induction. When calculating the derivative in practice, one multiplies factors that are obtained as follows:

The first factor is obtained by writing the outermost function in terms of an independent variable and differentiating it. The expression for the remaining (inner) functions is then substituted for this independent variable. The second factor is calculated correspondingly as the derivative of the second-outermost function, again substituting the expression for the associated inner functions. This process is continued up to the last factor, the innermost derivative.

The function $f(x) = (x^{3} + 1)^{2}$ can again serve as an example. It can be written as a composition of the three functions

${\displaystyle {\begin{array}{ccl} u(v) &=& v^{2} \\ v(w) &=& w + 1 \\ w(x) &=& x^{3}, \end{array}}}$

because

${\displaystyle u(v(w(x))) = u(w(x) + 1) = u(x^{3} + 1) = (x^{3} + 1)^{2} = f(x).}$

With the derivatives

${\displaystyle {\begin{array}{ccl} u'(v) &=& 2v \\ v'(w) &=& 1 \\ w'(x) &=& 3x^{2}, \end{array}}}$

the chain rule generalized to multiple compositions therefore also yields the derivative

${\displaystyle f'(x) = u'(v(w(x))) \, v'(w(x)) \, w'(x) = 2v(w(x)) \cdot 1 \cdot w'(x) = 2(x^{3} + 1) \cdot 1 \cdot 3x^{2}.}$
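The product formula for a chain of $n$ functions translates directly into a loop. The following sketch is one possible way to organize the computation; the helper name `chain_derivative` and its list-based interface are assumptions made for this illustration. It works from the innermost function outwards, so that each derivative is evaluated at exactly the argument its function receives.

```python
def chain_derivative(functions, derivatives, x):
    """Derivative of u_1(u_2(...u_n(x)...)) at x.

    functions   = [u_1, ..., u_n]   (outermost first)
    derivatives = [u_1', ..., u_n'] (same order)
    """
    value = x      # argument passed to the current layer
    product = 1    # running product of the derivative factors
    for func, deriv in zip(reversed(functions), reversed(derivatives)):
        product *= deriv(value)   # derivative of this layer at its input
        value = func(value)       # output becomes the input of the next outer layer
    return product

# The three-function example: u(v) = v**2, v(w) = w + 1, w(x) = x**3
functions = [lambda v: v**2, lambda w: w + 1, lambda x: x**3]
derivatives = [lambda v: 2 * v, lambda w: 1, lambda x: 3 * x**2]

print(chain_derivative(functions, derivatives, 2))  # 2*(2**3 + 1) * 1 * 3*2**2 = 216
```

Evaluating from the inside out avoids recomputing the nested arguments $u_{k+1}(\cdots(u_{n}(x)))$ for every factor.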

## Generalization for higher derivatives

A generalization of the chain rule to higher derivatives is Faà di Bruno's formula. It is considerably more complicated and harder to prove.

If $u$ and $v$ are two $n$-times differentiable functions whose composition $f(x) = u(v(x))$ is defined, then

${\displaystyle f^{(n)}(x) = \sum_{(k_{1}, \dotsc, k_{n}) \in T_{n}} {\frac{n!}{k_{1}! \cdot \dotsb \cdot k_{n}!}} \, u^{(k_{1} + \dotsb + k_{n})}(v(x)) \, \prod_{m = 1 \atop k_{m} \geq 1}^{n} \left({\frac{1}{m!}} v^{(m)}(x)\right)^{k_{m}}.}$

Here $f^{(n)}(x)$ denotes the $n$-th derivative of $f$ at the point $x$. The set $T_{n}$ over which the sum runs contains all $n$-tuples $(k_{1}, \ldots, k_{n})$ of non-negative integers with $1k_{1} + 2k_{2} + \cdots + nk_{n} = n$.
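To make the index set $T_{n}$ concrete, the following sketch enumerates its tuples and the integer coefficient that multiplies each term once the factors $1/m!$ are absorbed. The function name `faa_di_bruno_terms` and the way the coefficient is combined are choices made for this illustration, not part of the formula's standard presentation.

```python
from math import factorial, prod

def faa_di_bruno_terms(n):
    """Yield (k, coefficient) for every tuple k = (k_1, ..., k_n) in T_n.

    The coefficient is n!/(k_1! ... k_n!) multiplied by the factors (1/m!)**k_m,
    i.e. the integer in front of u^(k_1+...+k_n)(v(x)) * prod(v^(m)(x)**k_m).
    """
    def tuples(m, remaining):
        # Choose k_m, ..., k_n with m*k_m + ... + n*k_n == remaining.
        if m > n:
            if remaining == 0:
                yield ()
            return
        for k in range(remaining // m + 1):
            for rest in tuples(m + 1, remaining - m * k):
                yield (k,) + rest

    for k in tuples(1, n):
        denominator = prod(factorial(k_m) * factorial(m) ** k_m
                           for m, k_m in enumerate(k, start=1))
        yield k, factorial(n) // denominator

for k, coefficient in faa_di_bruno_terms(3):
    print(k, coefficient)
```

For $n = 2$ the set $T_{2}$ consists of $(2, 0)$ and $(0, 1)$, both with coefficient 1, which corresponds to $f''(x) = u''(v(x)) \, v'(x)^{2} + u'(v(x)) \, v''(x)$. For $n = 3$ the tuples $(3, 0, 0)$, $(1, 1, 0)$ and $(0, 0, 1)$ carry the coefficients 1, 3 and 1.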

## Generalization to functions and mappings of several variables

Here one considers differentiable functions (maps) $f \colon \mathbb{R}^{n} \to \mathbb{R}^{m}$. The derivative of such a map at the point $x_{0} \in \mathbb{R}^{n}$ is then a linear map $Df_{x_{0}} \colon \mathbb{R}^{n} \to \mathbb{R}^{m}$ that can be represented by an $(m \times n)$ matrix, the Jacobian matrix $J_{f}(x_{0})$.

The chain rule states that the composition of two differentiable maps is again differentiable. Its derivative is obtained by composing the individual derivatives. The associated Jacobian matrix is the matrix product of the individual Jacobian matrices.

In detail: if the map $v \colon \mathbb{R}^{n} \to \mathbb{R}^{l}$ is differentiable at the point $x_{0} \in \mathbb{R}^{n}$ and the map $u \colon \mathbb{R}^{l} \to \mathbb{R}^{m}$ is differentiable at the point $v(x_{0}) \in \mathbb{R}^{l}$, then the composition $u \circ v \colon \mathbb{R}^{n} \to \mathbb{R}^{m}$ is also differentiable at the point $x_{0}$, and

${\displaystyle D(u \circ v)_{x_{0}} = Du_{v(x_{0})} \circ Dv_{x_{0}}}$

and

${\displaystyle J_{u \circ v}(x_{0}) = J_{u}(v(x_{0})) \cdot J_{v}(x_{0}).}$

A chain rule for Fréchet derivatives of maps between Banach spaces and for the derivatives (differentials, tangent maps) of maps between differentiable manifolds can be formulated in a similar way.
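The Jacobian product formula $J_{u \circ v}(x_{0}) = J_{u}(v(x_{0})) \cdot J_{v}(x_{0})$ can be illustrated with a small hand-computed example. The maps $v(x, y) = (xy, \, x + y)$ and $u(a, b) = a^{2} + b$ below are assumptions chosen for illustration, and the matrix product is written out with plain nested lists to keep the sketch self-contained.

```python
def jacobian_v(x, y):
    """Jacobian of the illustrative map v(x, y) = (x*y, x + y), a 2 x 2 matrix."""
    return [[y, x],
            [1, 1]]

def jacobian_u(a, b):
    """Jacobian of the illustrative map u(a, b) = a**2 + b, a 1 x 2 matrix."""
    return [[2 * a, 1]]

def matmul(A, B):
    """Plain matrix product of two nested-list matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

x0, y0 = 2, 3
a0, b0 = x0 * y0, x0 + y0   # the point v(x0, y0) at which J_u is evaluated

# Chain rule: J_{u o v}(x0) = J_u(v(x0)) * J_v(x0)
jacobian_f = matmul(jacobian_u(a0, b0), jacobian_v(x0, y0))

# Direct computation from f(x, y) = (x*y)**2 + x + y
direct = [[2 * x0 * y0**2 + 1, 2 * x0**2 * y0 + 1]]

print(jacobian_f, direct)  # [[37, 25]] [[37, 25]]
```

Note that $J_{u}$ is evaluated at $v(x_{0})$, not at $x_{0}$, exactly as the outer derivative is in the one-dimensional rule.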

## Different notations in physics and other sciences

The chain rule is widely used in many natural sciences, such as physics, and in engineering. However, a special notation has developed there that differs considerably from the mathematical notation of the chain rule.

### Presentation of the notation

In the physics literature, the derivative of a function $h$ with respect to the variable $x$ is usually written in the notation

${\displaystyle h'(x) =: {\frac{\mathrm{d}h}{\mathrm{d}x}}(x).}$

If $h$ is the composition of two functions, $h = f \circ g$ with $y \mapsto f(y), \, x \mapsto g(x)$, the chain rule reads in this notation:

${\displaystyle {\frac{\mathrm{d}h}{\mathrm{d}x}}(x) = {\frac{\mathrm{d}f}{\mathrm{d}y}}(g(x)) \, {\frac{\mathrm{d}g}{\mathrm{d}x}}(x)}$

It is also a common convention to identify the independent variable of the function $f$ with the function symbol of the inner function $g$, and to omit all argument brackets:

${\displaystyle {\frac{\mathrm{d}h}{\mathrm{d}x}} = {\frac{\mathrm{d}f}{\mathrm{d}g}} \, {\frac{\mathrm{d}g}{\mathrm{d}x}}}$

Finally, no new symbol is introduced for the composition $f \circ g$; instead, the entire composition is identified with the outer function $f$: $f = f \circ g$.

The chain rule then takes on the following appearance:

${\displaystyle {\frac{\mathrm{d}f}{\mathrm{d}x}} = {\frac{\mathrm{d}f}{\mathrm{d}g}} \, {\frac{\mathrm{d}g}{\mathrm{d}x}}}$

Formally, the chain rule appears here as an expansion of the "fraction" $\mathrm{d}f / \mathrm{d}x$ by $\mathrm{d}g$. As a result, it is common in the physics literature (and also in other natural sciences and engineering) not to mention the chain rule by name when it is used. Instead, substitute formulations are often found, such as "expanding $\mathrm{d}f / \mathrm{d}x$ by $\mathrm{d}g$", and in some cases no justification is given at all. Even if this is not always immediately recognizable to the untrained eye, the chain rule of differential calculus is behind all these formulations without exception.

Although the notation presented breaks with some mathematical conventions, it is popular and widely used, since it allows one to calculate with derivatives (at least informally) as with "ordinary fractions". It also makes many calculations clearer because brackets are omitted and very few symbols are needed. In many cases, the quantity described by a composition represents a particular physical quantity (e.g. energy or electrical voltage) for which a particular letter is "reserved" (e.g. $E$ for energy and $U$ for voltage). The above notation allows this letter to be used consistently throughout the calculation.

### Example

The kinetic energy of a body depends on its velocity $v$: $E = f(v)$. If the velocity in turn depends on time, $v = g(t)$, then the kinetic energy of the body is also a function of time, described by the composition

${\displaystyle E(t) = f(g(t)).}$

If we want to calculate the change in kinetic energy over time, the chain rule gives

${\displaystyle E'(t) = f'(g(t)) \, g'(t).}$

In the physics literature, the last equation would be found in the following (or a similar) form:

${\displaystyle {\frac{\mathrm{d}E}{\mathrm{d}t}} = {\frac{\mathrm{d}E}{\mathrm{d}v}} \, {\frac{\mathrm{d}v}{\mathrm{d}t}}.}$

A clear advantage is the consistent use of function symbols whose letters correspond to those of the underlying physical quantity ($E$ for energy, $v$ for velocity).
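A concrete instance may make the notation more tangible. The sketch below assumes the classical kinetic energy $E = \tfrac{1}{2} m v^{2}$ and, purely for illustration, motion with constant acceleration $a$ from rest, so that $v = a t$; the numerical values of `m`, `a` and `t0` are arbitrary. In the physics notation, $\frac{\mathrm{d}E}{\mathrm{d}t} = \frac{\mathrm{d}E}{\mathrm{d}v} \frac{\mathrm{d}v}{\mathrm{d}t} = m v \cdot a$.

```python
m = 2.0   # mass (illustrative value)
a = 3.0   # constant acceleration (illustrative assumption)

def g(t):
    """Velocity as a function of time: v = g(t) = a*t."""
    return a * t

def f(v):
    """Kinetic energy as a function of velocity: E = f(v) = m*v**2/2."""
    return 0.5 * m * v**2

def E(t):
    """Kinetic energy as a function of time: E(t) = f(g(t))."""
    return f(g(t))

t0 = 4.0
dE_dv = m * g(t0)        # dE/dv, evaluated at v = g(t0)
dv_dt = a                # dv/dt
chain = dE_dv * dv_dt    # dE/dt by the chain rule: 24 * 3 = 72

# Cross-check with a symmetric difference quotient of E(t).
h = 1e-6
numeric = (E(t0 + h) - E(t0 - h)) / (2 * h)

print(chain, numeric)
```

Here `dE_dv` plays the role of $f'(g(t))$ and `dv_dt` that of $g'(t)$, so the code mirrors both forms of the equation.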