Gauss-Newton method


The Gauss-Newton method (after Carl Friedrich Gauß and Isaac Newton) is a numerical method for solving nonlinear least-squares minimization problems. The method is related to Newton's method for solving nonlinear optimization problems, but has the advantage that the computation of the second derivative required by Newton's method is not needed. Computing the second derivative is often a limiting factor, especially for large problems with tens of thousands of parameters.

The optimization problem

The Gauss-Newton method solves problems in which the minimum of a sum of squares of continuously differentiable functions is sought, i.e.

$$\min_{x} \; \sum_{k=1}^{m} r_k(x)^2$$

with $r_k\colon \mathbb{R}^n \to \mathbb{R}$. Using the Euclidean norm, this can also be written as

$$\min_{x} \; \|r(x)\|_2^2$$

with $r\colon \mathbb{R}^n \to \mathbb{R}^m$. Problems of this form occur frequently in practice; in particular, the nonlinear equation $r(x) = 0$ is equivalent to minimizing $\|r(x)\|_2^2$, provided that $r$ has a zero. If $r$ is a linear map, one obtains the standard case of the method of least squares with a linear model function.
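
As an illustration of this problem class, the following minimal sketch (a hypothetical exponential curve-fitting problem, not taken from the article; the data values and the name `residual` are assumptions) defines a residual vector $r(x)$ whose squared Euclidean norm is the objective to be minimized:

    import numpy as np

    # Hypothetical measurements for a model y ≈ x1 * exp(x2 * t);
    # the numbers are made up purely for illustration.
    t = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
    y = np.array([2.0, 2.7, 3.6, 4.9, 6.6])

    def residual(x):
        """Residual vector r(x) in R^m; the problem is min_x ||r(x)||_2^2."""
        x1, x2 = x
        return x1 * np.exp(x2 * t) - y

    x = np.array([1.0, 0.1])           # some parameter vector
    print(np.sum(residual(x) ** 2))    # objective value ||r(x)||^2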

The method

The basic idea of the Gauss-Newton method is to linearize the objective function and to optimize this linearization in the least-squares sense. The linearization, i.e. the first-order Taylor expansion, of $r$ at the point $x^k$ reads

$$r(x) \approx r(x^k) + J_r(x^k)\,(x - x^k).$$

The matrix $J_r(x^k) \in \mathbb{R}^{m \times n}$ is the Jacobian matrix and is often abbreviated as $J$. One obtains the linear least-squares problem

$$\min_{x} \; \tfrac{1}{2}\,\bigl\| J\,(x - x^k) + r(x^k) \bigr\|_2^2$$

with gradient $J^\top \bigl( J\,(x - x^k) + r(x^k) \bigr)$.

Setting the gradient to zero yields the so-called normal equations

$$J^\top J\,(x - x^k) = -J^\top r(x^k)$$

with the explicit solution

$$x = x^k - \bigl(J^\top J\bigr)^{-1} J^\top r(x^k).$$

The Gauss-Newton iteration step follows directly from this:

$$x^{k+1} = x^k - \alpha^k \bigl(J_r(x^k)^\top J_r(x^k)\bigr)^{-1} J_r(x^k)^\top\, r(x^k),$$

which makes explicit that the Jacobian matrix $J_r$ is evaluated at the point $x^k$ and that $\alpha^k$ is a step size.
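
A minimal sketch of one such iteration step in Python/NumPy, assuming user-supplied functions `r` (residual vector) and `jac` (Jacobian of `r`); the function name `gauss_newton_step` and the default step size are illustrative choices, not part of the article:

    import numpy as np

    def gauss_newton_step(x, r, jac, alpha=1.0):
        """One Gauss-Newton step: x_new = x - alpha * (J^T J)^{-1} J^T r(x)."""
        J = jac(x)    # Jacobian of r evaluated at the current iterate x
        rx = r(x)     # residual vector at x
        # Solve the normal equations J^T J s = -J^T r(x) for the update direction s.
        s = np.linalg.solve(J.T @ J, -J.T @ rx)
        return x + alpha * s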

To solve the linear system of equations in the Gauss-Newton iteration step, several options are available, depending on the size and structure of the problem:

  • Small problems (small number of parameters $n$) are best solved with a QR decomposition of $J$.
  • The Cholesky decomposition is well suited for large problems, since the matrix $J^\top J$ is symmetric by construction; specially adapted Cholesky variants exist for sparse matrices.
  • The CG (conjugate gradient) method can be used as a general-purpose option, although preconditioning is usually necessary. A short sketch of all three variants follows this list.
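
The following sketch illustrates these three variants, assuming a given Jacobian `J` and residual vector `r` (NumPy/SciPy; the simple Jacobi preconditioner used for CG is an assumption, not a prescription):

    import numpy as np
    from scipy.linalg import cho_factor, cho_solve
    from scipy.sparse.linalg import cg, LinearOperator

    def step_qr(J, r):
        # QR variant: solve the linear least-squares problem min ||J s + r|| directly.
        return np.linalg.lstsq(J, -r, rcond=None)[0]

    def step_cholesky(J, r):
        # Cholesky variant: factorize the symmetric matrix J^T J of the normal equations.
        return cho_solve(cho_factor(J.T @ J), -J.T @ r)

    def step_cg(J, r):
        # CG variant with a diagonal (Jacobi) preconditioner.
        A = J.T @ J
        d = np.diag(A)
        M = LinearOperator(A.shape, matvec=lambda v: v / d)
        s, info = cg(A, -J.T @ r, M=M)
        return s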

Convergence

The update vector in the Gauss-Newton iteration step has the form $s^k = -M\,\nabla f(x^k)$, where $M = \bigl(J^\top J\bigr)^{-1}$ and $f(x) = \tfrac{1}{2}\,\|r(x)\|_2^2$. If $J$ has full rank, then $J^\top J$ is positive definite, and therefore so is $M$. Moreover, $\nabla f(x^k) = J^\top r(x^k)$ is the gradient of the quadratic problem, so $s^k$ is a descent direction, i.e. $\nabla f(x^k)^\top s^k < 0$. From this it follows (with a suitable choice of the step size $\alpha^k$) that the Gauss-Newton method converges to a stationary point. This representation also shows that the Gauss-Newton method is essentially a scaled gradient method with the positive definite scaling matrix $M = \bigl(J^\top J\bigr)^{-1}$.

In general, no statement can be made about the speed of convergence. If the starting point is very far from the optimum or the matrix $J^\top J$ is ill-conditioned, the Gauss-Newton method may converge arbitrarily slowly. If, on the other hand, the starting point is sufficiently close to the optimum, one can show that the Gauss-Newton method converges quadratically.

Extension

To improve the behavior in the case of an ill-conditioned or singular matrix $J^\top J$, the Gauss-Newton iteration step can be modified as follows:

$$x^{k+1} = x^k - \alpha^k \bigl(J^\top J + \lambda D\bigr)^{-1} J^\top r(x^k),$$

where the diagonal matrix $D$ is chosen so that $J^\top J + \lambda D$ is positive definite. With the choice $D = I$, i.e. adding a scalar multiple of the identity matrix, one obtains the Levenberg-Marquardt algorithm.
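
A minimal sketch of this damped step with the choice $D = I$ (a Levenberg-Marquardt-style update), assuming the same hypothetical `r` and `jac` functions as above; the fixed damping parameter `lam` is an illustrative assumption (in practice it is usually adapted during the iteration):

    import numpy as np

    def damped_gauss_newton_step(x, r, jac, lam=1e-3, alpha=1.0):
        """Damped step: x_new = x - alpha * (J^T J + lam*I)^{-1} J^T r(x)."""
        J = jac(x)
        rx = r(x)
        # The damping term lam * I keeps the system matrix positive definite
        # even when J^T J is singular or ill-conditioned.
        s = np.linalg.solve(J.T @ J + lam * np.eye(x.size), -J.T @ rx)
        return x + alpha * s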

Example

[Figure: the Rosenbrock function with $a = 1$, $b = 100$.]

The Rosenbrock function

$$f(x_1, x_2) = (a - x_1)^2 + b\,\bigl(x_2 - x_1^2\bigr)^2$$

is often used as a test for optimization methods because it is challenging due to its narrow, shallow valley, in which iterative methods can take only small steps. The constants are usually chosen as $a = 1$ and $b = 100$; the global optimum in this case lies at $(1, 1)$ with function value $0$.

In order to apply the Gauss-Newton method, the Rosenbrock function must first be brought into the form "sum of squares of functions". Since the Rosenbrock function already consists of a sum of two squared terms, one chooses

$$r_1(x_1, x_2) = a - x_1$$

and

$$r_2(x_1, x_2) = \sqrt{b}\,\bigl(x_2 - x_1^2\bigr).$$

The Gauss-Newton problem for the Rosenbrock function is thus

$$\min_{x} \; \|r(x)\|_2^2, \qquad \text{where } r(x) = \begin{pmatrix} r_1(x_1, x_2) \\ r_2(x_1, x_2) \end{pmatrix}.$$

The Jacobian matrix is

$$J_r(x_1, x_2) = \begin{pmatrix} -1 & 0 \\ -2\sqrt{b}\,x_1 & \sqrt{b} \end{pmatrix}$$

and thus

$$J_r^\top J_r = \begin{pmatrix} 1 + 4b\,x_1^2 & -2b\,x_1 \\ -2b\,x_1 & b \end{pmatrix}.$$

Since $J_r$ has full rank, $J_r^\top J_r$ is positive definite and its inverse exists. The following simple line search is used to determine the step size $\alpha$:

  1. Start with $\alpha = 1$.
  2. Compute the trial point $x^{\text{new}} = x^k + \alpha\, s^k$ as well as $f(x^{\text{new}})$.
  3. If $f(x^{\text{new}}) < f(x^k)$, set $x^{k+1} = x^{\text{new}}$ and go to the next iteration.
  4. Otherwise halve $\alpha$ and go to 2.

The line search forces the new function value to be smaller than the previous one; it is guaranteed to terminate (possibly with a very small $\alpha$), since $s^k$ is a descent direction.
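
A compact sketch of the complete procedure for the Rosenbrock example in Python/NumPy (residuals, Jacobian, and halving line search as described above); the variable names, the iteration limit, and the stopping tolerance are illustrative choices:

    import numpy as np

    a, b = 1.0, 100.0

    def r(x):
        # Residual vector: the Rosenbrock function equals ||r(x)||^2.
        return np.array([a - x[0], np.sqrt(b) * (x[1] - x[0] ** 2)])

    def jac(x):
        # Jacobian of r at x.
        return np.array([[-1.0, 0.0],
                         [-2.0 * np.sqrt(b) * x[0], np.sqrt(b)]])

    def f(x):
        return np.sum(r(x) ** 2)

    x = np.array([0.0, -0.1])  # starting point used in the example
    for k in range(50):
        J = jac(x)
        s = np.linalg.solve(J.T @ J, -J.T @ r(x))  # Gauss-Newton direction
        alpha = 1.0
        while f(x + alpha * s) >= f(x) and alpha > 1e-12:  # halving line search
            alpha *= 0.5
        x = x + alpha * s
        if f(x) < 1e-20:  # stop once the optimum is essentially reached
            break

    print(k + 1, x, f(x))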

The starting point $x^0 = (0, -0.1)$ is chosen. The Gauss-Newton method converges to the global optimum within a few iterations:

Optimization of the Rosenbrock function with the Gauss-Newton method

  k    x^k                  f(x^k)
  0    (0, -0.1)            2
  1    (0.1250, -0.0875)    1.8291
  2    (0.2344, -0.0473)    1.6306
  3    (0.4258, 0.0680)     1.6131
  4    (0.5693, 0.2186)     1.3000
  5    (0.7847, 0.5166)     1.0300
  6    (1.0, 0.9536)        0.2150
  7    (1.0, 1.0)           1.1212e-27

For comparison, the gradient method (with the same line search) produces the following result; it does not find the optimum even after 500 iterations:

Optimization of the Rosenbrock function with the gradient method

  k    x^k                  f(x^k)
  0    (0, -0.1)            2
  1    (0.0156, 0.0562)     1.2827
  2    (0.0337, -0.0313)    1.0386
  3    (0.0454, 0.0194)     0.9411
  4    (0.0628, -0.0077)    0.8918
  5    (0.0875, 0.0286)     0.8765
  ...
  500  (0.8513, 0.7233)     0.0223

Literature

  • Dimitri P. Bertsekas: Nonlinear Programming. Second edition, Athena Scientific, 1995, ISBN 9781886529144.
  • Yurii Nesterov: Introductory Lectures on Convex Optimization: A Basic Course. Springer Science & Business Media, 2003, ISBN 978-1-4419-8853-9.
  • Jorge Nocedal, Stephen Wright: Numerical Optimization. Springer Science & Business Media, 2000, ISBN 9780387987934.
  • Amir Beck: Introduction to Nonlinear Optimization. SIAM, 2014, ISBN 978-1611973648.
