# Gauss-Newton method

The Gauss-Newton method (after Carl Friedrich Gauss and Isaac Newton) is a numerical method for solving nonlinear minimization problems of least-squares type. It is related to Newton's method for nonlinear optimization, but has the advantage that the second derivatives required by Newton's method need not be computed. Computing second derivatives is often a limiting factor, especially for large problems with tens of thousands of parameters.

## The optimization problem

The Gauss-Newton method solves problems in which the minimum of a sum of squares of continuously differentiable functions $f_i \colon \mathbb{R}^n \to \mathbb{R}$ is sought:

$$\min_{x \in \mathbb{R}^n} \left\{ \frac{1}{2} \sum_{i=1}^m \left( f_i(x) \right)^2 \right\}$$

with $m \geq n$. Using the Euclidean norm $\|\cdot\|$, this can also be written as

$$\min_{x \in \mathbb{R}^n} \left\{ \frac{1}{2} \|f(x)\|^2 \right\}$$

with $f = (f_1, \dotsc, f_m) \colon \mathbb{R}^n \to \mathbb{R}^m$. Problems of this form occur frequently in practice; in particular, solving the nonlinear system $f(x) = 0$ is equivalent to minimizing $\tfrac{1}{2}\|f(x)\|^2$, provided $f$ has a zero. If $f$ is a linear map, the problem reduces to the standard case of least squares with a linear model function.

## The method

The basic idea of the Gauss-Newton method is to linearize the objective function $f$ and to optimize this linearization in the least-squares sense. The linearization, i.e. the first-order Taylor expansion, of $f$ at the point $x^0 \in \mathbb{R}^n$ reads

$$\tilde{f}(x) = f(x^0) + \nabla f(x^0)^T (x - x^0).$$

The matrix $\nabla f(x^0)^T$ is the Jacobian of $f$ at $x^0$ and is often denoted by $J$. One obtains the linear least-squares problem

$$\min_{x \in \mathbb{R}^n} \left\{ \frac{1}{2} \|\tilde{f}(x)\|^2 = \frac{1}{2} \|J(x - x^0) + f(x^0)\|^2 \right\},$$

with gradient

$$\nabla \frac{1}{2} \left\Vert \tilde{f}(x) \right\Vert^2 = J^T \left( J(x - x^0) + f(x^0) \right).$$

Setting the gradient to zero provides the so-called normal equations

$$J^T J (x - x^0) = -J^T f(x^0)$$

with the explicit solution

$$x = x^0 - \left( J^T J \right)^{-1} J^T f(x^0).$$

From this, the Gauss-Newton iteration step follows directly:

$$x^{k+1} = x^k - \alpha^k \left( (J|_{x^k})^T J|_{x^k} \right)^{-1} (J|_{x^k})^T f(x^k),$$

where $J|_{x^k}$ denotes the Jacobian evaluated at the point $x^k$ and $\alpha^k \geq 0$ is a step size.
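This iteration step can be sketched in a few lines of NumPy. The names `gauss_newton_step`, `f`, `jac`, and `alpha` are illustrative choices, not part of the method's standard notation, and the normal equations are solved directly only for clarity; this is a minimal sketch rather than a production implementation:

```python
import numpy as np

def gauss_newton_step(f, jac, x, alpha=1.0):
    """One damped Gauss-Newton step.

    f(x) returns the residual vector, jac(x) the Jacobian J at x.
    Solves the normal equations J^T J d = J^T f(x) and returns x - alpha * d.
    """
    J = jac(x)
    d = np.linalg.solve(J.T @ J, J.T @ f(x))
    return x - alpha * d
```

As a quick sanity check: for a linear residual $f(x) = Ax - b$, a single full step ($\alpha = 1$) already lands on the least-squares solution.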

To solve the linear system of equations in each Gauss-Newton step, several options are available depending on the size and structure of the problem:

• Small problems ($n < 1000$, $m < 10000$) are best solved with a QR decomposition of $J$.
• For large problems, the Cholesky decomposition of $J^T J$ is well suited, since $J^T J$ is symmetric by construction; specially adapted Cholesky variants exist for sparse $J^T J$.
• The CG method can be used as a general option, although preconditioning is usually necessary.
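As an illustration of the first two options, the linearized subproblem $\min_d \tfrac{1}{2}\|Jd + f(x^k)\|^2$ can be solved either with a QR-based least-squares routine or via a Cholesky factorization of the normal equations. The Jacobian `J` and residual `r` below are random placeholder data; this is a sketch assuming NumPy, not a recommendation of specific sizes:

```python
import numpy as np

# Hypothetical Jacobian J and residual vector r at the current iterate
rng = np.random.default_rng(0)
J = rng.standard_normal((100, 5))
r = rng.standard_normal(100)

# Option 1: QR-based least-squares solve of min ||J d + r|| (robust for small problems)
d_qr = np.linalg.lstsq(J, -r, rcond=None)[0]

# Option 2: Cholesky factorization of the normal equations J^T J d = -J^T r
# (J^T J is symmetric positive definite when J has full column rank)
L = np.linalg.cholesky(J.T @ J)
y = np.linalg.solve(L, -J.T @ r)
d_chol = np.linalg.solve(L.T, y)

# Both routes yield the same Gauss-Newton direction (up to rounding)
assert np.allclose(d_qr, d_chol)
```

The QR route avoids forming $J^T J$ explicitly, which roughly squares the condition number of $J$; this is why it is preferred for smaller or ill-conditioned problems.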

## Convergence

The update vector in the Gauss-Newton step has the form $d = -D J^T f(x)$ with $D = \left(J^T J\right)^{-1}$. If $J$ has full rank, then $J^T J$, and therefore also $D$, is positive definite. On the other hand, $J^T f(x)$ is the gradient of the problem $\min_{x \in \mathbb{R}^n} \frac{1}{2}\|f(x)\|^2$, so $d$ is a descent direction, i.e. $\left(J^T f(x)\right)^T d < 0$. From this (with a suitable choice of the step size $\alpha^k$) the convergence of the Gauss-Newton method to a stationary point follows. This representation also shows that the Gauss-Newton method is essentially a scaled gradient method with the positive definite scaling matrix $D$.

In general, no statement can be made about the speed of convergence. If the starting point $x^0$ is far from the optimum, or if the matrix $J^T J$ is ill-conditioned, the Gauss-Newton method may converge arbitrarily slowly. If, on the other hand, the starting point $x^0$ is sufficiently close to the optimum, one can show that the method converges quadratically.

### Extension

To improve the behavior in the case of an ill-conditioned or singular $J^T J$, the Gauss-Newton step can be modified as follows:

$$x^{k+1} = x^k - \alpha^k \left( (J|_{x^k})^T J|_{x^k} + \Delta^k \right)^{-1} (J|_{x^k})^T f(x^k),$$

where the diagonal matrix $\Delta^k$ is chosen so that $(J|_{x^k})^T J|_{x^k} + \Delta^k$ is positive definite. With the choice $\Delta^k = \beta^k I$, $\beta^k \geq 0$, i.e. a scalar multiple of the identity matrix, one obtains the Levenberg-Marquardt algorithm.
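The damped step can be sketched as follows. The helper name `lm_step` is illustrative, and the example deliberately uses a rank-deficient Jacobian, for which the undamped Gauss-Newton system would be singular (a sketch assuming NumPy):

```python
import numpy as np

def lm_step(J, r, beta):
    """Levenberg-Marquardt direction: solve (J^T J + beta*I) d = -J^T r."""
    n = J.shape[1]
    return np.linalg.solve(J.T @ J + beta * np.eye(n), -J.T @ r)

# Rank-deficient Jacobian: J^T J is singular, the plain GN step is undefined
J = np.array([[1.0, 1.0],
              [2.0, 2.0]])
r = np.array([1.0, -1.0])

d = lm_step(J, r, beta=0.5)  # damping makes the system solvable
```

For large $\beta$ the step approaches a small multiple of the negative gradient $-J^T r$, while $\beta \to 0$ recovers the pure Gauss-Newton direction, which matches the interpretation of the method as a scaled gradient method.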

## Example

The Rosenbrock function

$$g \colon \mathbb{R}^2 \to \mathbb{R}, \quad x \mapsto \left(a - x_1\right)^2 + b \left(x_2 - x_1^2\right)^2$$

is often used as a test for optimization methods, because it is challenging due to its narrow, shallow valley, in which iterative methods can take only small steps. The constants are usually chosen as $a = 1$, $b = 100$; in this case the global optimum is $x^* = (1, 1)$ with function value $g(x^*) = 0$.

To apply the Gauss-Newton method, the Rosenbrock function must first be brought into the form "sum of squares of functions". Since the Rosenbrock function already consists of a sum of two terms, one chooses the ansatz

$$\frac{1}{2} \left( f_1(x) \right)^2 = \left( a - x_1 \right)^2 \quad\Longleftrightarrow\quad f_1(x) = \sqrt{2}\,(a - x_1)$$

and

$$\frac{1}{2} \left( f_2(x) \right)^2 = b \left( x_2 - x_1^2 \right)^2 \quad\Longleftrightarrow\quad f_2(x) = \sqrt{2b}\,(x_2 - x_1^2).$$

The Gauss-Newton problem for the Rosenbrock function is thus

$$\min_{x \in \mathbb{R}^2} \frac{1}{2} \|f(x)\|^2, \quad\text{where}\quad f \colon \mathbb{R}^2 \to \mathbb{R}^2, \ x \mapsto \begin{pmatrix} f_1(x) \\ f_2(x) \end{pmatrix} = \begin{pmatrix} \sqrt{2}\,(a - x_1) \\ \sqrt{2b}\,(x_2 - x_1^2) \end{pmatrix}.$$
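This decomposition can be checked numerically: $\tfrac{1}{2}\|f(x)\|^2$ should reproduce the Rosenbrock value at any point. The test points below are arbitrary illustrative choices (a small sketch assuming NumPy):

```python
import numpy as np

a, b = 1.0, 100.0

def rosenbrock(x):
    # Original Rosenbrock function g
    return (a - x[0]) ** 2 + b * (x[1] - x[0] ** 2) ** 2

def f(x):
    # Residual vector of the sum-of-squares decomposition
    return np.array([np.sqrt(2.0) * (a - x[0]),
                     np.sqrt(2.0 * b) * (x[1] - x[0] ** 2)])

# (1/2)||f(x)||^2 reproduces g at a few arbitrary test points
for x in [np.array([0.0, -0.1]), np.array([0.5, 0.25]), np.array([-1.2, 1.0])]:
    assert abs(0.5 * np.dot(f(x), f(x)) - rosenbrock(x)) < 1e-9
```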

The Jacobian is

$$J = \begin{bmatrix} \tfrac{\partial f_1}{\partial x_1} & \tfrac{\partial f_1}{\partial x_2} \\ \tfrac{\partial f_2}{\partial x_1} & \tfrac{\partial f_2}{\partial x_2} \end{bmatrix} = \begin{bmatrix} -\sqrt{2} & 0 \\ -2x_1 \sqrt{2b} & \sqrt{2b} \end{bmatrix},$$

and thus

$$D = J^T J = \begin{bmatrix} 8bx_1^2 + 2 & -4bx_1 \\ -4bx_1 & 2b \end{bmatrix}.$$

Since $J$ has full rank, $D$ is positive definite and the inverse $D^{-1}$ exists. The following simple line search is used to determine the step size $\alpha^k$:

1. Start with $\alpha^k = 1$.
2. Calculate the new point $\tilde{x} = x^k + \alpha^k d$ with $d = -\left( (J|_{x^k})^T J|_{x^k} \right)^{-1} (J|_{x^k})^T f(x^k)$.
3. If $\|f(\tilde{x})\| < \|f(x^k)\|$, set $x^{k+1} = \tilde{x}$ and go to the next iteration.
4. Otherwise halve $\alpha^k$ and go to step 2.

The line search forces the new function value to be smaller than the previous one; it is guaranteed to terminate (possibly with a very small $\alpha^k$), since $d$ is a descent direction.
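Putting the pieces together, the example can be reproduced with a short NumPy sketch of the Gauss-Newton iteration with this halving line search. The gradient tolerance, step-size floor, and iteration cap are illustrative safeguards, not part of the scheme described above:

```python
import numpy as np

a, b = 1.0, 100.0

def f(x):
    # Residual vector; (1/2)||f(x)||^2 equals the Rosenbrock function g
    return np.array([np.sqrt(2.0) * (a - x[0]),
                     np.sqrt(2.0 * b) * (x[1] - x[0] ** 2)])

def jac(x):
    # Analytic Jacobian of the residual vector
    return np.array([[-np.sqrt(2.0), 0.0],
                     [-2.0 * x[0] * np.sqrt(2.0 * b), np.sqrt(2.0 * b)]])

def g(x):
    # Rosenbrock function value
    return 0.5 * float(np.dot(f(x), f(x)))

x = np.array([0.0, -0.1])          # starting point from the text
for k in range(20):
    J = jac(x)
    grad = J.T @ f(x)
    if np.linalg.norm(grad) < 1e-10:   # stationary point reached
        break
    d = -np.linalg.solve(J.T @ J, grad)  # Gauss-Newton direction
    alpha = 1.0
    # halve the step size until the function value decreases
    while g(x + alpha * d) >= g(x) and alpha > 1e-12:
        alpha /= 2.0
    x = x + alpha * d
```

Run as written, the iterates follow the table below and reach the global optimum $(1, 1)$ after a handful of steps.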

The starting point $x^0 = (0, -0.1)$ is chosen. The Gauss-Newton method converges to the global optimum in a few iterations:

Optimization of the Rosenbrock function with the Gauss-Newton method:

| $k$ | $x^k$ | $g(x^k)$ |
| --- | --- | --- |
| 0 | (0, -0.1) | 2 |
| 1 | (0.1250, -0.0875) | 1.8291 |
| 2 | (0.2344, -0.0473) | 1.6306 |
| 3 | (0.4258, 0.0680) | 1.6131 |
| 4 | (0.5693, 0.2186) | 1.3000 |
| 5 | (0.7847, 0.5166) | 1.0300 |
| 6 | (1.0, 0.9536) | 0.2150 |
| 7 | (1.0, 1.0) | 1.1212e-27 |

For comparison, the gradient method (with the same line search) gives the following result; it does not reach the optimum even after 500 iterations:

Optimization of the Rosenbrock function with the gradient method:

| $k$ | $x^k$ | $g(x^k)$ |
| --- | --- | --- |
| 0 | (0, -0.1) | 2 |
| 1 | (0.0156, 0.0562) | 1.2827 |
| 2 | (0.0337, -0.0313) | 1.0386 |
| 3 | (0.0454, 0.0194) | 0.9411 |
| 4 | (0.0628, -0.0077) | 0.8918 |
| 5 | (0.0875, 0.0286) | 0.8765 |
| ⋮ | ⋮ | ⋮ |
| 500 | (0.8513, 0.7233) | 0.0223 |

## Literature

• Dimitri P. Bertsekas: *Nonlinear Programming.* Second edition, Athena Scientific, 1995, ISBN 9781886529144.
• Yurii Nesterov: *Introductory Lectures on Convex Optimization: A Basic Course.* Springer Science & Business Media, 2003, ISBN 978-1-4419-8853-9.
• Jorge Nocedal, Stephen Wright: *Numerical Optimization.* Springer Science & Business Media, 2000, ISBN 9780387987934.
• Amir Beck: *Introduction to Nonlinear Optimization.* SIAM, 2014, ISBN 978-1611973648.
