Quasi-Newton method

Quasi-Newton methods are a class of numerical methods for solving nonlinear minimization problems. They are based on Newton's method, but instead of computing the Hessian matrix or its inverse exactly, they only build an approximation to it in order to reduce the computational effort per iteration.

The first algorithm was developed by William Davidon, a physicist at Argonne National Laboratory, in the mid-1950s. The best-known algorithms are Broyden-Fletcher-Goldfarb-Shanno (BFGS), named after Charles George Broyden, Roger Fletcher, Donald Goldfarb and David F. Shanno, and Davidon-Fletcher-Powell (DFP), named after Davidon, Fletcher and Michael J. D. Powell.

Basic algorithm

A twice-differentiable function f: R^n → R is approximated to second order by a Taylor expansion around the current iterate x_k:

    f(x_k + p) ≈ f(x_k) + ∇f(x_k)^T p + (1/2) p^T ∇²f(x_k) p,

where ∇f(x_k) is the gradient and ∇²f(x_k) the Hessian matrix of f at x_k.

At a minimum, the derivative of this quadratic model must be zero. Differentiating with respect to p and setting the result to zero gives

    ∇f(x_k) + ∇²f(x_k) p = 0,  so  p = -(∇²f(x_k))^{-1} ∇f(x_k).

If the Hessian matrix is positive definite, this zero of the derivative is indeed a minimum of the model, and the minimum of f can be approximated iteratively with Newton's method:

    x_{k+1} = x_k - (∇²f(x_k))^{-1} ∇f(x_k).

The problem here is that the inverse of the Hessian matrix has to be computed in every iteration and that the Hessian has to be positive definite. In quasi-Newton methods, (∇²f(x_k))^{-1} is therefore replaced by a scalar step size α_k > 0 and a matrix H_k that approximates the inverse Hessian:

    x_{k+1} = x_k - α_k H_k ∇f(x_k).

Written for the two iterates x_k and x_{k+1}, the linearized gradient equation above becomes

    ∇f(x_{k+1}) ≈ ∇f(x_k) + ∇²f(x_k) (x_{k+1} - x_k).

From this we can deduce:

    ∇f(x_{k+1}) - ∇f(x_k) ≈ ∇²f(x_k) (x_{k+1} - x_k).

One now assumes that the Hessians ∇²f(x_k) and ∇²f(x_{k+1}) are approximately equal and demands that the new approximation B_{k+1} ≈ ∇²f(x_{k+1}) (whose inverse can serve as H_{k+1}) satisfies this relation exactly. With the abbreviations s_k = x_{k+1} - x_k and y_k = ∇f(x_{k+1}) - ∇f(x_k), this yields the quasi-Newton condition (secant condition):

    B_{k+1} s_k = y_k.

For B_{k+1} one chooses a correction term of rank one:

    B_{k+1} = B_k + c_k v_k v_k^T.

Inserted into the quasi-Newton condition, the equation can be rearranged to

    c_k v_k (v_k^T s_k) = y_k - B_k s_k.

Choosing v_k = y_k - B_k s_k and c_k = 1/(v_k^T s_k), it follows that

    B_{k+1} = B_k + ( (y_k - B_k s_k)(y_k - B_k s_k)^T ) / ( (y_k - B_k s_k)^T s_k ).

In this way the matrix is uniquely determined (the symmetric rank-one or SR1 update), but with just one correction term it is not always positive definite.
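A compact way to see how these pieces fit together is a small numerical sketch. The following Python fragment is only a minimal illustration (the fixed step size alpha, the tolerance and the quadratic test function are choices made for this example, not part of the method's definition); here B approximates the Hessian itself, so the search direction is obtained by solving B p = -∇f(x) instead of multiplying by H_k:

    import numpy as np

    def sr1_quasi_newton(grad, x0, alpha=1.0, tol=1e-8, max_iter=100):
        """Quasi-Newton iteration with the symmetric rank-one update.

        B approximates the Hessian; the search direction solves B p = -grad(x).
        A fixed step size alpha is used instead of a line search.
        """
        x = np.asarray(x0, dtype=float)
        B = np.eye(len(x))                        # initial Hessian approximation
        g = grad(x)
        for _ in range(max_iter):
            if np.linalg.norm(g) < tol:           # stop when the gradient is small
                break
            p = np.linalg.solve(B, -g)            # quasi-Newton direction
            x_new = x + alpha * p
            g_new = grad(x_new)
            s, y = x_new - x, g_new - g           # s_k and y_k from the text
            r = y - B @ s                         # residual of the secant condition
            if abs(r @ s) > 1e-12:                # skip the update if ill-defined
                B = B + np.outer(r, r) / (r @ s)  # rank-one correction
            x, g = x_new, g_new
        return x

    # Example: minimize the convex quadratic f(x) = 1/2 x^T A x - b^T x
    A = np.array([[4.0, 1.0], [1.0, 3.0]])
    b = np.array([1.0, 2.0])
    grad = lambda x: A @ x - b
    print(sr1_quasi_newton(grad, x0=[0.0, 0.0]))  # approaches A^{-1} b = [1/11, 7/11]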

Davidon-Fletcher-Powell (DFP)

The inverse Hessian is approximated by a matrix H_k that is updated with two correction terms:

    H_{k+1} = H_k + (s_k s_k^T) / (y_k^T s_k) - (H_k y_k y_k^T H_k) / (y_k^T H_k y_k).

Unlike the single correction term above, this update keeps H_k symmetric and positive definite as long as y_k^T s_k > 0.
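Written as code, the update is a one-to-one transcription of this formula. The sketch below is illustrative only (the function name dfp_update and the variable names simply mirror the symbols s_k, y_k and H_k used above):

    import numpy as np

    def dfp_update(H, s, y):
        """One DFP update of the inverse-Hessian approximation H.

        s = x_{k+1} - x_k,  y = grad f(x_{k+1}) - grad f(x_k).
        """
        Hy = H @ y
        return (H
                + np.outer(s, s) / (y @ s)        # first correction term
                - np.outer(Hy, Hy) / (y @ Hy))    # second correction term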

Properties

If f is a quadratic function, the algorithm delivers the exact solution after a finite number of iterations when exact arithmetic is used. For all other (sufficiently smooth) functions, the method converges superlinearly towards a local minimum x*:

    lim_{k→∞} ||x_{k+1} - x*|| / ||x_k - x*|| = 0.

In the case of a quadratic function with n parameters, the solution is ideally reached after n steps. In practice somewhat more iterations are needed, for example when the line search for the step size is not carried out precisely enough or the gradients are not computed accurately enough. Usually the optimization is stopped when, for example, the norm of the gradient becomes very small or a prescribed number of iterations has been reached.
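This finite-termination behaviour can be checked numerically. The following self-contained Python sketch (the matrix A, the vector b and the starting point are arbitrary example data) runs the DFP update with an exact line search on a two-dimensional quadratic; the gradient vanishes, up to rounding, after two steps:

    import numpy as np

    A = np.array([[4.0, 1.0], [1.0, 3.0]])    # positive definite
    b = np.array([1.0, 2.0])
    grad = lambda x: A @ x - b                # gradient of 1/2 x^T A x - b^T x

    x = np.zeros(2)
    H = np.eye(2)                             # initial inverse-Hessian approximation
    for k in range(4):
        g = grad(x)
        print(k, np.linalg.norm(g))           # gradient norm per iteration
        if np.linalg.norm(g) < 1e-12:
            break
        p = -H @ g                            # quasi-Newton direction
        alpha = -(g @ p) / (p @ A @ p)        # exact line search for a quadratic
        s = alpha * p
        y = grad(x + s) - g
        Hy = H @ y
        H = H + np.outer(s, s) / (y @ s) - np.outer(Hy, Hy) / (y @ Hy)  # DFP update
        x = x + s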

Literature

  • William C. Davidon: Variable Metric Method for Minimization. SIAM Journal on Optimization, Volume 1, Issue 1, 1991, pp. 1-17 (first issued as an Argonne National Laboratory report, 1959).
  • Jorge Nocedal, Stephen J. Wright: Numerical Optimization. Springer-Verlag, 1999, ISBN 0-387-98793-2.
  • Edwin K. P. Chong, Stanislaw H. Żak: An Introduction to Optimization. 2nd edition, John Wiley & Sons, 2001.
  • P. E. Gill, W. Murray, M. H. Wright: Practical Optimization. Academic Press, 1981.