Automatic differentiation

Automatic differentiation (also called algorithmic differentiation) is a technique of computer science and applied mathematics. For a function of several variables that is given as a procedure in a programming language or as a computational graph, it generates an extended procedure that evaluates both the function and one or any number of directional derivatives, up to the full Jacobian matrix. If the original program contains loops, the number of loop iterations must not depend on the independent variables.

Such derivatives are required, for example, for solving nonlinear systems of equations with Newton's method and for methods of nonlinear optimization.

The most important tools here are the chain rule and the fact that the derivatives of the elementary functions available on the computer, such as sin, cos, exp, and log, are known and can be evaluated just as accurately. The effort for computing the derivatives is therefore proportional (with a small factor) to the effort for evaluating the original function.
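For example, differentiating the composition exp(sin(x^2)) requires nothing beyond the chain rule and the elementary derivatives of exp, sin, and the square:

  d/dx exp(sin(x^2)) = exp(sin(x^2)) · cos(x^2) · 2x.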

Calculation of derivatives

Task: Given a function

  f : ℝ^n → ℝ^m, x ↦ f(x).

We are looking for code (a procedure) for directional derivatives or for the full Jacobian matrix

  J_f(x) = f′(x) ∈ ℝ^{m×n}.

Different approaches to this are:

  1. Try to find a closed, analytical form for f and determine f′ "on paper" through differentiation. Then implement the code for f′ by hand.
    Problem: too difficult, time-consuming, error-prone
    Advantages: very efficient, high accuracy
  2. Enter the calculation rule for f into a computer algebra system and use the facilities available there for symbolic differentiation. Then export the code for f′ into the target environment.
    Problem: time-consuming, does not scale, too complicated for larger programs/functions
  3. Find a numerical approximation of the derivative. For small h,

      ∂f/∂x_i (x) ≈ (f(x + h·e_i) − f(x)) / h.

    Problem: choice of the optimal step size h, imprecise, possible instability (see the sketch after this list)
    Advantage: easy to compute
  4. Represent the calculation rule as a computational graph, i.e. as an arithmetic network, and expand it, using the chain rule, into a computational graph for both the function value and the derivative.
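The step-size problem of approach 3 can be seen in a few lines of Python; the test function sin and the step sizes below are illustrative choices, not part of the method description:

  import math

  def diff_forward(f, x, h):
      # One-sided difference quotient (f(x + h) - f(x)) / h
      return (f(x + h) - f(x)) / h

  # Illustrative test: the exact derivative of sin at x0 is cos(x0).
  x0 = 1.0
  exact = math.cos(x0)
  for h in (1e-1, 1e-5, 1e-13):
      error = abs(diff_forward(math.sin, x0, h) - exact)
      # Large h: truncation error dominates; tiny h: cancellation
      # in f(x + h) - f(x) dominates. Neither extreme is safe.
      print(f"h = {h:.0e}   error = {error:.2e}")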

The idea of automatic differentiation (AD)

Every program that evaluates a function f can be described as a sequence of intermediate steps in which intermediate results are combined in an elementary way. One can think of this as a (potentially very long) sequence of intermediate values v_k and elementary functions φ_k that each depend on only one or two variables. The function is evaluated by setting (v_{1−n}, …, v_0) = x at the beginning and then determining, one after the other,

  v_k = φ_k(v_i, v_j)  with  i, j < k,  k = 1, …, l.

This can be arranged so that the function values of f end up in the most recently evaluated intermediate results, i.e. at the end y = (v_{l−m+1}, …, v_l) is assigned.
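As a concrete illustration (the function f(x_1, x_2) = x_1·x_2 + sin(x_1) is a made-up example), such a decomposition into elementary intermediate values might look as follows in Python:

  import math

  def f(x1, x2):
      # Each intermediate value v_k depends on at most two earlier values.
      v1 = x1 * x2         # elementary product
      v2 = math.sin(x1)    # elementary function with a known derivative
      v3 = v1 + v2         # elementary sum
      y = v3               # the output is the last intermediate result
      return y

  print(f(2.0, 3.0))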

AD describes a set of techniques whose aim is to generate a new program that evaluates the Jacobian matrix of f. The input variables x are called independent variables, the output variable(s) y are called dependent variables. In AD there are at least two different modes:

  1. Forward mode
  2. Reverse mode

Forward mode

In the forward mode the matrix product

  J_f(x) · S

is calculated, i.e. the product of the Jacobian matrix with an arbitrary matrix S ∈ ℝ^{n×p} (the seed matrix), without first determining the components of the Jacobian matrix itself.

Example 1

For the seed matrix S = I_n (the identity matrix), AD calculates J_f(x) · I_n = J_f(x), i.e. the full Jacobian matrix. For a seed matrix consisting of a single column vector s, AD calculates the directional derivative J_f(x) · s.

In the forward mode, directional derivatives are transported along the control flow of the computation of f. For each scalar variable v, a vector Dv is generated in the AD-generated code, whose i-th component contains the directional derivative along the i-th independent variable.
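A minimal operator-overloading sketch of this idea in Python (the class name Dual and the helper sin are illustrative choices, not a fixed API): every value carries its vector Dv, which each elementary operation updates via the chain rule.

  import math

  class Dual:
      """A value v together with its derivative vector Dv."""
      def __init__(self, v, dv):
          self.v, self.dv = v, list(dv)

      def __mul__(self, other):
          # Product rule: D(uv) = v*Du + u*Dv
          return Dual(self.v * other.v,
                      [other.v * a + self.v * b
                       for a, b in zip(self.dv, other.dv)])

      def __add__(self, other):
          # Sum rule: D(u + v) = Du + Dv
          return Dual(self.v + other.v,
                      [a + b for a, b in zip(self.dv, other.dv)])

  def sin(u):
      # Chain rule with the known derivative of sin
      return Dual(math.sin(u.v), [math.cos(u.v) * a for a in u.dv])

  # Seed with the identity: Dx1 = (1, 0), Dx2 = (0, 1).
  x1, x2 = Dual(2.0, [1.0, 0.0]), Dual(3.0, [0.0, 1.0])
  y = x1 * x2 + sin(x1)   # f(x1, x2) = x1*x2 + sin(x1)
  print(y.v, y.dv)        # value and gradient (x2 + cos(x1), x1)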

Example 2

Suppose a function f is computed as a short sequence of elementary assignments. Automatic differentiation in the forward mode turns this into a program in which every assignment is accompanied by the corresponding assignment for its derivative vector, each obtained by applying the chain rule to that single elementary operation; a concrete sketch follows below.
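A hedged sketch of such a source transformation, reusing the made-up example f(x_1, x_2) = x_1·x_2 + sin(x_1) from above:

  import math

  def f(x1, x2):
      # Original program
      a = x1 * x2
      b = math.sin(x1)
      return a + b

  def f_forward(x1, x2, Dx1, Dx2):
      # AD-generated program: each assignment is accompanied by the
      # assignment for its derivative vector (here p = 2 directions).
      Da = [x2 * d1 + x1 * d2 for d1, d2 in zip(Dx1, Dx2)]
      a = x1 * x2
      Db = [math.cos(x1) * d1 for d1 in Dx1]
      b = math.sin(x1)
      Dy = [da + db for da, db in zip(Da, Db)]
      y = a + b
      return y, Dy

  # Seeding with the identity yields the full gradient.
  print(f_forward(2.0, 3.0, [1.0, 0.0], [0.0, 1.0]))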

Reverse mode

The reverse mode consists of two phases.

  1. The original program is executed, and certain intermediate data are saved.
  2. The original program is traversed in reverse order. Derivatives are transported backwards, using the data saved in phase 1.

In phase 2, an adjoint vector v̄ is introduced for each scalar variable v; its i-th component contains the derivative of the i-th (seeded) output component with respect to v. The seed matrix S is placed in the adjoint ȳ of the output. In the reverse mode, the result is the product

  S · J_f(x)

with an arbitrary seed matrix S ∈ ℝ^{p×m}.

Example 1

For the seed matrix S = I_m, AD calculates I_m · J_f(x) = J_f(x), i.e. the full Jacobian matrix. For a seed matrix consisting of a single row vector s^T, AD calculates s^T · J_f(x), the gradient of the weighted output s^T f.

Example 2

For each line s = φ(u, v) of the calculation rule, the question in phase 2 is how the adjoint s̄ contributes to the derivatives of u and v. The adjoints of u and v are updated using s̄ in the following way:

  ū ← ū + (∂φ/∂u) · s̄,   v̄ ← v̄ + (∂φ/∂v) · s̄.

We look for the x_1- and x_2-derivatives of y; these are denoted x̄_1 and x̄_2, respectively. The value ȳ is initialized with 1, all other adjoint values are initialized with 0.
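A minimal tape-based sketch of the two phases in Python, again for the made-up example f(x_1, x_2) = x_1·x_2 + sin(x_1); the tape layout is an illustrative choice:

  import math

  # Phase 1: run the original program and record, for each assignment,
  # which inputs it used and the local partial derivatives.
  x1, x2 = 2.0, 3.0
  a = x1 * x2
  b = math.sin(x1)
  y = a + b
  tape = [
      ("a", [("x1", x2), ("x2", x1)]),      # da/dx1 = x2, da/dx2 = x1
      ("b", [("x1", math.cos(x1))]),        # db/dx1 = cos(x1)
      ("y", [("a", 1.0), ("b", 1.0)]),      # dy/da = dy/db = 1
  ]

  # Phase 2: traverse the tape in reverse, accumulating adjoints.
  adj = {"x1": 0.0, "x2": 0.0, "a": 0.0, "b": 0.0, "y": 1.0}  # seed: ȳ = 1
  for var, deps in reversed(tape):
      for dep, partial in deps:
          adj[dep] += partial * adj[var]    # ū += (∂s/∂u) · s̄

  print(adj["x1"], adj["x2"])   # gradient: (x2 + cos(x1), x1)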

Efficiency considerations

The efficiency of AD algorithms depends on the mode and on the parameter p, the number of columns (forward mode) or rows (reverse mode) of the seed matrix S. The choice of mode and of p depends on what the Jacobian matrix is needed for. Denote by

  T(f)       the time to evaluate f,
  M(f)       the memory requirement of this calculation,
  T(f, JS)   the time to evaluate f and J·S,
  M(f, JS)   the memory requirement of this calculation,
  T(f, SJ)   the time to evaluate f and S·J,
  M(f, SJ)   the memory requirement of this calculation.

The following holds for the two modes presented, with a small constant c:

  1. Forward mode: T(f, JS) ≤ c·(1 + p)·T(f) and M(f, JS) ≤ c·(1 + p)·M(f); time and memory grow proportionally to p.
  2. Reverse mode: T(f, SJ) ≤ c·(1 + p)·T(f), but M(f, SJ) additionally grows with the number of operations of f, because the intermediate results of phase 1 must be stored for phase 2.
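One standard consequence is worth spelling out. For a scalar-valued function f : ℝ^n → ℝ, the full gradient requires p = n in the forward mode but only p = 1 in the reverse mode:

  T(f, ∇f) ≤ c·(1 + n)·T(f)   (forward mode, p = n),
  T(f, ∇f) ≤ c′·T(f)          (reverse mode, p = 1, c′ independent of n).

This is why the reverse mode is preferred for gradients of functions with many inputs and a single output, as in nonlinear optimization.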

The calculation as a chain of calculations

A composition

  f = f_3 ∘ f_2 ∘ f_1,   f(x) = f_3(f_2(f_1(x))),

is interpreted as a sequence of programs. In the example "Optimizing a wing", the calculation comprises the following steps:

  • Superposition of the wing with so-called "mode functions"
  • Calculation of a grid that is placed around the wing
  • Solving the Navier-Stokes equations on the grid and computing integrals of the solution

Overall, this yields the function

  f = f_3 ∘ f_2 ∘ f_1  with  f_1 : ℝ^8 → ℝ^{N_1},  f_2 : ℝ^{N_1} → ℝ^{N_2},  f_3 : ℝ^{N_2} → ℝ^m,

where the intermediate dimensions N_1 and N_2 (grid and flow field) are very large, while the input dimension is only 8.
With a naive approach one would compute the three Jacobian matrices J_{f_1}, J_{f_2}, J_{f_3} separately and then perform two matrix multiplications. The disadvantage in the forward mode, however, is that computing J_{f_2} alone would require propagating p = N_1 directions,

  T(f_2, J_{f_2}) ≈ c·(1 + N_1)·T(f_2),

and in the reverse mode, analogously,

  T(f_2, J_{f_2}) ≈ c·(1 + N_2)·T(f_2)

would hold; both are prohibitive because N_1 and N_2 are very large. A better approach is to use the result of one calculation as the seed matrix of the following one:

  1. Choose the identity I_8 as the seed matrix of the first calculation.
  2. Use the result of the first calculation as the seed matrix of the second calculation.
  3. Use the result of the second calculation as the seed matrix of the third calculation.

so that

  J_f(x) = J_{f_3} · (J_{f_2} · (J_{f_1} · I_8)).

Since each matrix propagated in this way has only 8 columns (p = 8), the time and memory requirements increase by at most a small multiple of 8 compared with the regular evaluation of f.
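A small NumPy sketch of this chaining (the dimensions and the random stand-in Jacobians are illustrative; in the real computation these products would be produced by AD rather than by explicit matrices):

  import numpy as np

  rng = np.random.default_rng(0)
  n0, n1, n2, m = 8, 1000, 500, 3   # illustrative sizes: N_1, N_2 large, m small

  # Stand-ins for the Jacobians of f1, f2, f3 at the current point.
  J1 = rng.standard_normal((n1, n0))
  J2 = rng.standard_normal((n2, n1))
  J3 = rng.standard_normal((m, n2))

  # Chained forward mode: seed with the 8x8 identity and propagate.
  S = np.eye(n0)          # seed matrix of the first calculation
  S = J1 @ S              # result: N_1 x 8, seed of the second calculation
  S = J2 @ S              # result: N_2 x 8, seed of the third calculation
  J = J3 @ S              # final m x 8 Jacobian of the composition

  # Every propagated matrix had only p = 8 columns; the huge N_2 x N_1
  # Jacobian of f2 was never formed.
  print(J.shape)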
