Trust region method


Trust region methods are a class of robust and efficient globalization strategies for the numerical computation of a local minimum of a possibly non-convex, once continuously differentiable function. They are closely related to the Levenberg-Marquardt algorithm, but differ significantly in the quadratic subproblems to be solved and are deterministic in the choice of the step size restriction. Under certain circumstances it can also be shown that line search methods are special variants of trust region methods.

Overview

Trust region methods are used to solve nonlinear programming problems (NLP) of the type

\min_{x \in \mathbb{R}^n} f(x).

Here it is assumed that f is once continuously differentiable on \mathbb{R}^n; it need not, however, be a convex function. Since the following non-smooth nonlinear programming problem can also be solved with only slight changes to the algorithm,

\min_{x \in [a,b]} f(x),

we consider the trust region method for this class of problems. Here [a,b] := \{x \in X : a \le x \le b\} with a, b \in X, and X a Hilbert space.

A trust region method is an iterative procedure consisting of individual, so-called trust region steps. Such a trust region step has two logical units: first, a correction is calculated. The way in which such a correction is calculated can in principle be chosen freely, as long as the correction fulfills a so-called sufficient decrease condition. For performance reasons, however, the correction is often calculated by solving a quadratic minimization problem under constraints. The corrections calculated with this sequential quadratic programming (SQP) variant are, under certain circumstances, exactly the corrections calculated in a (quasi-)Newton step. Since such a correction can be arbitrarily poor with respect to the original minimization problem, the usefulness of the correction is then measured, and the constraints of the quadratic problem are adjusted on that basis.

A trust region step in detail

Calculate the correction

We assume that a feasible iterate x_k \in [a,b] is given. A correction s_k is now calculated by solving the following quadratic programming problem (QP):

\min_{s} \, q_k(s) \quad \text{subject to} \quad \|s\| \le \Delta_k, \; x_k + s \in [a,b],

with the quadratic function

q_k(s) := f(x_k) + \nabla f(x_k)^T s + \tfrac{1}{2}\, s^T H_k s.

Here \Delta_k > 0 refers to the trust region radius and H_k to a symmetric approximation of the (possibly existing) Hessian matrix of f. If H_k = \nabla^2 f(x_k), the function q_k is a second-order Taylor approximation of f at x_k.

Assume for the moment that the matrix H_k is symmetric positive (semi-)definite and that no constraints are active in the above QP problem. Then the necessary and, in this case, also sufficient condition for a minimum is simply

H_k s_k = -\nabla f(x_k),

which is a quasi-Newton step.
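Numerically, this amounts to solving one linear system per step. A minimal sketch in Python (using NumPy; the data below is illustrative):

```python
import numpy as np

def quasi_newton_step(g, H):
    """Solve H s = -g for the unconstrained quasi-Newton step.

    Assumes H is symmetric positive definite, so the solution is the
    unique minimizer of the quadratic model q(s) = g^T s + 0.5 s^T H s.
    """
    return np.linalg.solve(H, -g)

# Example: H = diag(2, 4), g = (2, -4) gives the step s = (-1, 1).
H = np.diag([2.0, 4.0])
g = np.array([2.0, -4.0])
s = quasi_newton_step(g, H)
```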

Comment on the solution of the quadratic problem

This minimization problem can in general only be solved approximately. It suffices that the computed correction be at least as good a solution to the QP problem as the so-called Cauchy point. More precisely, this means that an inequality of the following form must be satisfied,

q_k(0) - q_k(s_k) \ge \kappa \, \|D_k \nabla f(x_k)\| \, \min\!\left(\Delta_k, \frac{\|D_k \nabla f(x_k)\|}{\|H_k\|}\right),

where \kappa > 0 is a constant and D_k is a Coleman-Li scaling matrix that stores the distances to the bounds on its main diagonal.
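For illustration, the Cauchy point can be computed in closed form in the unconstrained case. The sketch below assumes the Euclidean norm and takes the scaling matrix D_k to be the identity; the data is illustrative:

```python
import numpy as np

def cauchy_point(g, H, delta):
    """Minimizer of the model q(s) = g^T s + 0.5 s^T H s along the
    steepest descent direction -g, restricted to ||s|| <= delta.
    The Coleman-Li scaling D_k is taken as the identity here."""
    gnorm = np.linalg.norm(g)
    if gnorm == 0.0:
        return np.zeros_like(g)
    gHg = g @ H @ g
    if gHg <= 0.0:
        tau = 1.0  # negative curvature along -g: step to the boundary
    else:
        tau = min(1.0, gnorm**3 / (delta * gHg))
    return -tau * (delta / gnorm) * g

# The Cauchy point never increases the model value: q(s_C) <= q(0) = 0.
H = np.array([[4.0, 1.0], [1.0, 3.0]])
g = np.array([1.0, -2.0])
sC = cauchy_point(g, H, delta=0.5)
q = lambda s: g @ s + 0.5 * s @ H @ s
```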

Without further assumptions, one must use a formulation with a penalty function for the solution of the quadratic minimization problem in order to remain feasible. For a finite-dimensional space X = \mathbb{R}^n, however, the Euclidean norm can be replaced by the maximum norm \|\cdot\|_\infty, and a truncated conjugate gradient method (truncated CG) or a nonlinear Gauss-Seidel method can be used for the solution.
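A minimal sketch of such a truncated CG method (Steihaug's variant for the Euclidean-norm trust region, without bound constraints; function names are illustrative):

```python
import numpy as np

def _to_boundary(s, d, delta):
    # Positive tau with ||s + tau d|| = delta (root of a quadratic).
    a, b, c = d @ d, 2 * (s @ d), s @ s - delta**2
    return (-b + np.sqrt(b * b - 4 * a * c)) / (2 * a)

def steihaug_cg(g, H, delta, tol=1e-10, max_iter=None):
    """Truncated CG for min q(s) = g^T s + 0.5 s^T H s, ||s|| <= delta.
    Stops at the trust region boundary or on negative curvature."""
    n = g.size
    max_iter = max_iter or 2 * n
    s = np.zeros(n)
    r = -g.astype(float)      # residual = -grad q(s)
    d = r.copy()
    if np.linalg.norm(r) < tol:
        return s
    for _ in range(max_iter):
        Hd = H @ d
        dHd = d @ Hd
        if dHd <= 0:          # negative curvature: go to the boundary
            return s + _to_boundary(s, d, delta) * d
        alpha = (r @ r) / dHd
        if np.linalg.norm(s + alpha * d) >= delta:
            return s + _to_boundary(s, d, delta) * d
        s = s + alpha * d
        r_new = r - alpha * Hd
        if np.linalg.norm(r_new) < tol:
            return s
        d = r_new + ((r_new @ r_new) / (r @ r)) * d
        r = r_new
    return s

# If the Newton step lies inside the trust region, CG recovers it.
H = np.diag([2.0, 4.0])
g = np.array([2.0, -4.0])
s = steihaug_cg(g, H, delta=10.0)
```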

As mentioned, a correction can be arbitrarily poor, so a scalar r_k, the so-called contraction rate, is calculated to determine the quality of a correction:

r_k := \frac{f(x_k) - f(x_k + s_k)}{q_k(0) - q_k(s_k)}.

The value r_k measures the deviation of the reduction predicted by the quadratic function q_k (predicted reduction) from the actual reduction of f (actual reduction). In the literature one therefore often finds

r_k = \frac{\mathrm{ared}_k}{\mathrm{pred}_k}.

Comment on r_k: in fact, the contraction rate measures how close the gradient of f is to being linear. If f is a quadratic function, the contraction rate is always 1, provided H_k = \nabla^2 f(x_k). In practice, one should therefore also test whether this holds.
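This property is easy to verify numerically: for a quadratic f with H_k equal to the exact Hessian, predicted and actual reduction coincide for any trial step. A small check, with illustrative data:

```python
import numpy as np

# Quadratic f(x) = 0.5 x^T A x - b^T x, so the Hessian is A everywhere.
A = np.array([[3.0, 1.0], [1.0, 2.0]])   # symmetric positive definite
b = np.array([1.0, -1.0])
f = lambda x: 0.5 * x @ A @ x - b @ x

x = np.array([2.0, 2.0])
g = A @ x - b                            # gradient of f at x
s = np.array([-0.3, 0.1])                # arbitrary trial step

actual = f(x) - f(x + s)                 # actual reduction
pred = -(g @ s + 0.5 * s @ A @ s)        # predicted reduction q(0) - q(s)
r = actual / pred                        # contraction rate, exactly 1 here
```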

Update step

Finally, it is determined on the basis of r_k whether the correction is accepted and how the next trust region radius \Delta_{k+1} is chosen.

If r_k \ge \eta, the correction is accepted, and x_{k+1} := x_k + s_k and \Delta_{k+1} := \gamma_2 \Delta_k are chosen. Otherwise x_{k+1} := x_k and \Delta_{k+1} := \gamma_1 \Delta_k.

Here \eta \in (0,1) as well as \gamma_1 \in (0,1) and \gamma_2 \ge 1 are to be chosen a priori. In the literature, further constants are often used to fine-tune the choice of the new trust region radius, but the method already works with these constants.
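Putting the pieces together, a basic unconstrained trust region loop might look as follows. The Cauchy point serves as the approximate QP solution, and the constants (eta = 0.1, halving and doubling of delta) are illustrative choices, not prescribed by the method:

```python
import numpy as np

def trust_region_minimize(f, grad, hess, x, delta=1.0, eta=0.1,
                          tol=1e-8, max_iter=200):
    """Basic unconstrained trust region loop. Uses the Cauchy point as
    the approximate QP solution; eta and the radius update factors are
    illustrative constants."""
    for _ in range(max_iter):
        g, H = grad(x), hess(x)
        gnorm = np.linalg.norm(g)
        if gnorm < tol:
            break
        gHg = g @ H @ g
        tau = 1.0 if gHg <= 0 else min(1.0, gnorm**3 / (delta * gHg))
        s = -tau * (delta / gnorm) * g        # Cauchy point
        pred = -(g @ s + 0.5 * s @ H @ s)     # q(0) - q(s) > 0
        r = (f(x) - f(x + s)) / pred          # contraction rate
        if r >= eta:
            x = x + s                         # successful step: accept
            delta = 2.0 * delta               # and enlarge the radius
        else:
            delta = 0.5 * delta               # reject: shrink the radius
    return x

# Minimize f(x) = (x0 - 1)^2 + 10 (x1 + 2)^2, whose minimum is (1, -2).
f = lambda x: (x[0] - 1) ** 2 + 10 * (x[1] + 2) ** 2
grad = lambda x: np.array([2 * (x[0] - 1), 20 * (x[1] + 2)])
hess = lambda x: np.diag([2.0, 20.0])
x_star = trust_region_minimize(f, grad, hess, np.array([5.0, 5.0]))
```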

Convergence properties

Under certain, but very weak, assumptions one can show that the iterates calculated in this way converge to solutions of the first-order necessary conditions.

Basically, one proceeds as follows:

  • The solution of the QP problem always satisfies the sufficient decrease condition (if not, one chooses the Cauchy point instead). This means: if a correction is successful, i.e. r_k \ge \eta, the following inequality holds,

f(x_k) - f(x_{k+1}) \ge \eta \, \kappa \, \|\nabla f(x_k)\| \, \min\!\left(\Delta_k, \frac{\|\nabla f(x_k)\|}{\|H_k\|}\right),

so the actual descent in f can be bounded from below by the norm of the gradient and the trust region radius.

  • If one now assumes that the norm of the gradient is bounded away from zero, one obtains the following: assuming there are infinitely many successful steps, f converges to -\infty, or \nabla f(x_k) converges to 0 and one solves the problem (at least its first-order necessary conditions). Otherwise, the trust region radius must converge to 0 because of the above inequality. In that case, however, the quadratic function would provide an increasingly better approximation, and the contraction rate would approach 1, so that further steps would be accepted. But this contradicts the assumption that there are only finitely many successful steps.

Differences to other procedures

  • The Levenberg-Marquardt method performs a nondeterministic update of the step size restriction.
  • The trust region method is Newton-like, i.e. the corrections are calculated using a second-order Taylor expansion; the Levenberg-Marquardt method solves a quadratic auxiliary problem based on a first-order Taylor expansion.
  • In contrast to line search methods, in the trust region method the step size restriction is selected before a correction is calculated. Because of the additional constraints in the minimization problem, a different and better correction is computed than the same step size restriction would yield in a line search algorithm.

Enhancements to trust region procedures

  • To solve nonlinear programming problems with even more general constraints of the kind

\min f(x) \quad \text{subject to} \quad c(x) = 0,

where c : \mathbb{R}^n \to \mathbb{R}^m, one can use so-called filter trust region methods.
  • There are extensions of trust region methods that also guarantee convergence to critical points of the second order, i.e. points x^* for which

\nabla f(x^*) = 0 \quad \text{and} \quad \nabla^2 f(x^*) \text{ is positive semidefinite.}


Literature

  • Thomas F. Coleman, Yuying Li: On the convergence of interior-reflective Newton methods for nonlinear minimization subject to bounds. In: Math. Programming. 67 (2, Ser. A), 1994, ISSN 0025-5610, pp. 189-224.
  • A. R. Conn, N. I. M. Gould, Ph. L. Toint: Trust-region methods. Society for Industrial and Applied Mathematics, Philadelphia, PA, USA, 2000.

References

  1. Thomas F. Coleman, Yuying Li: On the convergence of interior-reflective Newton methods for nonlinear minimization subject to bounds.
  2. A. R. Conn, N. I. M. Gould, Ph. L. Toint: Trust-region methods.