Gauss-Newton method
The Gauss-Newton method (after Carl Friedrich Gauss and Isaac Newton) is a numerical method for solving nonlinear minimization problems in the least-squares sense. The method is related to Newton's method for solving nonlinear optimization problems, but has the advantage that the second derivatives required by Newton's method need not be computed. Computing the second derivatives is often a limiting factor, especially for large problems with tens of thousands of parameters.
The optimization problem
The Gauss-Newton method solves problems in which the minimum of a sum of squares of continuously differentiable functions is sought, i.e.

$\min_{x \in \mathbb{R}^n} f(x), \qquad f(x) = \sum_{k=1}^{m} r_k(x)^2,$

with $r_k \colon \mathbb{R}^n \to \mathbb{R}$. With the Euclidean norm, this can also be written as

$\min_{x \in \mathbb{R}^n} \lVert r(x) \rVert_2^2$

with $r = (r_1, \dots, r_m)^{\mathsf T}$. Problems of this form occur frequently in practice; in particular, the nonlinear problem $r(x) = 0$ is equivalent to minimizing $\lVert r(x) \rVert_2^2$, provided that $r$ has a zero. If $r$ is a linear map, one obtains the standard case of the method of least squares with a linear model function.
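As an illustration of this problem class, the following sketch (Python with NumPy; the data and the exponential model are hypothetical, chosen only for the example) sets up a residual vector $r(x)$ for a small curve-fitting problem and evaluates $f(x) = \lVert r(x) \rVert_2^2$:

```python
import numpy as np

# Hypothetical curve-fitting example: fit y ~ c1 * exp(c2 * t) to data.
# The residual vector r(x) collects the pointwise misfits, and the
# objective is f(x) = ||r(x)||_2^2, matching the problem class above.
t = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([2.0, 1.2, 0.7, 0.4])

def residual(x):
    c1, c2 = x
    return y - c1 * np.exp(c2 * t)   # r maps R^2 to R^4

def objective(x):
    r = residual(x)
    return r @ r                     # f(x) = ||r(x)||_2^2

# A nearly fitting parameter vector gives a small objective value.
val = objective(np.array([2.0, -0.5]))
```

Each parameter vector $x = (c_1, c_2)$ yields one residual per data point, so $m = 4$ and $n = 2$ here.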
The method
The basic idea of the Gauss-Newton method is to linearize the residual function and to optimize the linearization in the least-squares sense. The linearization, i.e. the first-order Taylor expansion, of $r$ at the point $x^k$ reads

$r(x) \approx r(x^k) + J_r(x^k)\,(x - x^k).$
The matrix $J_r(x^k) \in \mathbb{R}^{m \times n}$ is the Jacobian matrix of $r$ and is often abbreviated as $J$. One obtains the linear least-squares problem

$\min_x \tilde f(x), \qquad \tilde f(x) = \lVert r(x^k) + J\,(x - x^k) \rVert_2^2,$

with gradient $\nabla \tilde f(x) = 2 J^{\mathsf T} \bigl( r(x^k) + J\,(x - x^k) \bigr)$.
Setting the gradient to zero yields the so-called normal equations

$J^{\mathsf T} J\,(x - x^k) = -J^{\mathsf T} r(x^k)$

with the explicit solution

$x = x^k - \bigl(J^{\mathsf T} J\bigr)^{-1} J^{\mathsf T} r(x^k).$
From this, the Gauss-Newton iteration step follows directly:

$x^{k+1} = x^k - \alpha_k \bigl(J_k^{\mathsf T} J_k\bigr)^{-1} J_k^{\mathsf T} r(x^k),$

where the notation $J_k = J_r(x^k)$ makes clear that the Jacobian matrix is evaluated at the current iterate $x^k$, and $\alpha_k \in (0, 1]$ is a step size.
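A single iteration step can be sketched as follows (a minimal illustration in Python with NumPy; the residual and Jacobian callables are placeholders for a concrete problem). Rather than forming the inverse explicitly, the sketch solves the normal equations directly:

```python
import numpy as np

def gauss_newton_step(r, J, x, alpha=1.0):
    """One Gauss-Newton update x - alpha * (J^T J)^{-1} J^T r(x).

    r: callable returning the residual vector r(x)
    J: callable returning the Jacobian matrix J_r(x)
    """
    Jk = J(x)
    rk = r(x)
    # Solve the normal equations J^T J s = -J^T r instead of inverting.
    s = np.linalg.solve(Jk.T @ Jk, -Jk.T @ rk)
    return x + alpha * s

# Toy illustration: for a linear residual r(x) = A x - b, a single
# full step (alpha = 1) already reaches the least-squares solution.
A = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
b = np.array([1.0, 2.0, 2.0])
r = lambda x: A @ x - b
J = lambda x: A
x1 = gauss_newton_step(r, J, np.zeros(2))
```

For nonlinear residuals the linearization changes at every iterate, so the step must be repeated until convergence.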
To solve the linear system of equations in the Gauss-Newton iteration step, there are several options depending on the size and structure of the problem:

- Small problems (few parameters $n$) are best solved with a QR decomposition of $J_k$.
- The Cholesky decomposition is well suited for large problems, since the matrix $J_k^{\mathsf T} J_k$ is symmetric by construction. There are specially adapted Cholesky variants for sparse matrices.
- The CG method can be used as a general option, although preconditioning is usually necessary here.
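The three options can be sketched as follows (Python with NumPy; `J` and `r` stand in for the Jacobian and residual at the current iterate, and the CG loop is a plain textbook version without preconditioning). Note that QR factorizes $J$ directly, while Cholesky and CG work on the normal-equations matrix $J^{\mathsf T} J$:

```python
import numpy as np

rng = np.random.default_rng(0)
J = rng.standard_normal((50, 3))   # stand-in Jacobian at the current iterate
r = rng.standard_normal(50)        # stand-in residual vector

A = J.T @ J                        # symmetric positive definite (full-rank J)
b = -J.T @ r

# 1) QR decomposition of J: works on J directly, numerically robust.
Q, R = np.linalg.qr(J)
s_qr = np.linalg.solve(R, -Q.T @ r)

# 2) Cholesky decomposition A = L L^T, then two triangular solves.
L = np.linalg.cholesky(A)
s_chol = np.linalg.solve(L.T, np.linalg.solve(L, b))

# 3) Plain conjugate gradient on the normal equations.
def conjugate_gradient(A, b, tol=1e-12, max_iter=100):
    x = np.zeros_like(b)
    res = b - A @ x
    p = res.copy()
    for _ in range(max_iter):
        Ap = A @ p
        alpha = (res @ res) / (p @ Ap)
        x = x + alpha * p
        res_new = res - alpha * Ap
        if np.linalg.norm(res_new) < tol:
            break
        p = res_new + ((res_new @ res_new) / (res @ res)) * p
        res = res_new
    return x

s_cg = conjugate_gradient(A, b)
```

All three compute the same Gauss-Newton direction; they differ in cost, numerical robustness, and how well they exploit sparsity.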
Convergence
The update vector in the Gauss-Newton iteration step has the form $s^k = -A_k \nabla f(x^k)$ with $A_k = \tfrac{1}{2} \bigl(J_k^{\mathsf T} J_k\bigr)^{-1}$. If $J_k$ has full rank, then $J_k^{\mathsf T} J_k$ is positive definite, and therefore so is $A_k$. Moreover, $\nabla f(x^k) = 2 J_k^{\mathsf T} r(x^k)$ coincides with the gradient of the quadratic subproblem at $x^k$, so $s^k$ is a descent direction, that is, $\nabla f(x^k)^{\mathsf T} s^k < 0$. From this (with a suitable choice of the step size $\alpha_k$) the convergence of the Gauss-Newton method to a stationary point follows. This representation also shows that the Gauss-Newton method is essentially a scaled gradient method with the positive definite scaling matrix $A_k$.
In general, no statement can be made about the speed of convergence. If the starting point is very far from the optimum, or if the matrix $J_k^{\mathsf T} J_k$ is poorly conditioned, the Gauss-Newton method may converge arbitrarily slowly. On the other hand, if the starting point is sufficiently close to the optimum, one can show that the Gauss-Newton method converges quadratically (for problems whose residual vanishes at the optimum).
Extension
To improve the behavior in the case of a badly conditioned or singular matrix $J_k^{\mathsf T} J_k$, the Gauss-Newton iteration step can be modified as follows:

$x^{k+1} = x^k - \alpha_k \bigl(J_k^{\mathsf T} J_k + D_k\bigr)^{-1} J_k^{\mathsf T} r(x^k),$

where the diagonal matrix $D_k$ is chosen so that $J_k^{\mathsf T} J_k + D_k$ is positive definite. With the choice $D_k = \lambda I$, i.e. a scalar multiple of the identity matrix, one obtains the Levenberg-Marquardt algorithm.
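The damped step can be sketched as follows (Python with NumPy; the damping parameter $\lambda$ is a fixed input here, whereas practical Levenberg-Marquardt implementations adapt it between iterations based on how well the linearization predicts the actual decrease):

```python
import numpy as np

def levenberg_marquardt_step(r, J, x, lam):
    """Damped step x - (J^T J + lam*I)^{-1} J^T r(x), cf. the modified iteration."""
    Jk = J(x)
    rk = r(x)
    s = np.linalg.solve(Jk.T @ Jk + lam * np.eye(x.size), -Jk.T @ rk)
    return x + s

# Sanity check on a linear residual r(x) = A x - b: with lam = 0 the
# damped step reduces to the plain Gauss-Newton step, while a very
# large lam shrinks the step toward zero (a short gradient step).
A = np.array([[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]])
b = np.array([1.0, 0.0, 1.0])
r = lambda x: A @ x - b
J = lambda x: A
x_gn = levenberg_marquardt_step(r, J, np.zeros(2), lam=0.0)
x_damped = levenberg_marquardt_step(r, J, np.zeros(2), lam=1e6)
```

Large $\lambda$ thus trades progress per step for robustness, which is exactly the intended effect near ill-conditioned iterates.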
Example
The Rosenbrock function

$f(x_1, x_2) = (a - x_1)^2 + b \bigl(x_2 - x_1^2\bigr)^2$

is often used as a test for optimization methods because it is challenging due to the narrow and shallow valley in which iterative methods can only take small steps. The constants are usually chosen as $a = 1$ and $b = 100$; the global optimum in this case is at $(x_1, x_2) = (1, 1)$ with function value $f(1, 1) = 0$.
In order to use the Gauss-Newton method, the Rosenbrock function must first be brought into the form "sum of squares of functions". Since the Rosenbrock function already consists of a sum of two terms, the approach

$r_1(x_1, x_2) = \sqrt{b}\,\bigl(x_2 - x_1^2\bigr) \quad \text{and} \quad r_2(x_1, x_2) = a - x_1$

is chosen.
The Gauss-Newton problem for the Rosenbrock function is thus

$\min_x \lVert r(x) \rVert_2^2, \qquad r(x) = \begin{pmatrix} \sqrt{b}\,(x_2 - x_1^2) \\ a - x_1 \end{pmatrix}.$
The Jacobian matrix results as

$J_r(x) = \begin{pmatrix} -2\sqrt{b}\,x_1 & \sqrt{b} \\ -1 & 0 \end{pmatrix},$

and thus

$J_r^{\mathsf T} J_r = \begin{pmatrix} 4 b x_1^2 + 1 & -2 b x_1 \\ -2 b x_1 & b \end{pmatrix}.$

Since $J_r$ has full rank for every $x$ (its determinant is $\sqrt{b} \ne 0$), $J_r^{\mathsf T} J_r$ is positive definite and the inverse exists. The following simple line search is used to determine the step size $\alpha_k$:
1. Start with $\alpha = 1$.
2. Calculate the candidate point $x^{k+1} = x^k + \alpha s^k$ as well as $f(x^{k+1})$.
3. If $f(x^{k+1}) < f(x^k)$, set $\alpha_k = \alpha$ and go to the next iteration.
4. Otherwise, halve $\alpha$ and go to 2.
The line search forces the new function value to be smaller than the previous one; it is guaranteed to terminate (possibly with a very small $\alpha$), since $s^k$ is a descent direction.
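The halving line search described above can be sketched as follows (Python; a minimal version that, unlike practical implementations, does not cap the number of halvings or use an Armijo-type sufficient-decrease condition):

```python
import numpy as np

def halving_line_search(f, x, s):
    """Halve alpha, starting from 1, until f(x + alpha*s) < f(x).

    Terminates for any descent direction s (possibly with a very
    small alpha).
    """
    alpha = 1.0
    fx = f(x)
    while f(x + alpha * s) >= fx:
        alpha /= 2.0
    return alpha

# Small check on f(x) = ||x||^2: from x = (1, 0) along s = (-4, 0),
# alpha = 1 and alpha = 0.5 are rejected, alpha = 0.25 is accepted.
f = lambda x: float(x @ x)
alpha = halving_line_search(f, np.array([1.0, 0.0]), np.array([-4.0, 0.0]))
```

The accepted step size is the $\alpha_k$ used in the Gauss-Newton update.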
The starting point $x^0 = (0, -0.1)$ is chosen. The Gauss-Newton method converges to the global optimum $(1, 1)$ in a few iterations:
| $k$ | $x^k$ | $f(x^k)$ |
|---|---|---|
| 0 | (0, -0.1) | 2 |
| 1 | (0.1250, -0.0875) | 1.8291 |
| 2 | (0.2344, -0.0473) | 1.6306 |
| 3 | (0.4258, 0.0680) | 1.6131 |
| 4 | (0.5693, 0.2186) | 1.3000 |
| 5 | (0.7847, 0.5166) | 1.0300 |
| 6 | (1.0, 0.9536) | 0.2150 |
| 7 | (1.0, 1.0) | 1.1212e-27 |
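The run in the table can be reproduced with a short script (Python with NumPy; a direct transcription of the iteration step and the halving line search described above, with a small stopping tolerance added so the loop does not keep halving once the optimum is reached):

```python
import numpy as np

a, b = 1.0, 100.0

def residual(x):
    return np.array([np.sqrt(b) * (x[1] - x[0]**2), a - x[0]])

def jacobian(x):
    return np.array([[-2.0 * np.sqrt(b) * x[0], np.sqrt(b)],
                     [-1.0, 0.0]])

def f(x):
    r = residual(x)
    return r @ r

x = np.array([0.0, -0.1])          # starting point from the table
for k in range(20):
    if f(x) < 1e-20:               # stop once (numerically) at the optimum
        break
    J = jacobian(x)
    r = residual(x)
    s = np.linalg.solve(J.T @ J, -J.T @ r)   # Gauss-Newton direction
    alpha = 1.0
    while f(x + alpha * s) >= f(x):          # halving line search
        alpha /= 2.0
    x = x + alpha * s
```

After the loop, `x` lies at the global optimum $(1, 1)$ up to floating-point accuracy, matching the last row of the table.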
For comparison, the gradient method (with the same line search) gives the following result; it does not find the optimum even after 500 iterations:
| $k$ | $x^k$ | $f(x^k)$ |
|---|---|---|
| 0 | (0, -0.1) | 2 |
| 1 | (0.0156, 0.0562) | 1.2827 |
| 2 | (0.0337, -0.0313) | 1.0386 |
| 3 | (0.0454, 0.0194) | 0.9411 |
| 4 | (0.0628, -0.0077) | 0.8918 |
| 5 | (0.0875, 0.0286) | 0.8765 |
| ⋮ | ⋮ | ⋮ |
| 500 | (0.8513, 0.7233) | 0.0223 |
Literature
- Dimitri P. Bertsekas: *Nonlinear Programming.* 2nd edition. Athena Scientific, 1995, ISBN 9781886529144.
- Yurii Nesterov: *Introductory Lectures on Convex Optimization: A Basic Course.* Springer Science & Business Media, 2003, ISBN 978-1-4419-8853-9.
- Jorge Nocedal, Stephen Wright: *Numerical Optimization.* Springer Science & Business Media, 2000, ISBN 9780387987934.
- Amir Beck: *Introduction to Nonlinear Optimization.* SIAM, 2014, ISBN 978-1611973648.
References
- Ceres Solver documentation. Retrieved May 10, 2019.