Nonlinear Optimization

Non-linear program with permissible range and optimum

In mathematics, nonlinear optimization (also called nonlinear program , NLP ) is the project to optimize a scalar objective function of one or more real variables in a restricted range, whereby the objective function or the range limits are not linear ( affine ). It is a sub-area of mathematical optimization and an upper area of convex optimization . In contrast to the terms mentioned, the application to differentiable non-linear objective functions is described here without restriction to the convexity of the objective function or the search area. In the section Terms: objective function, constraints, admissible set, local and global optimization, there are essential explanations.

Fields of application

Nonlinear programs can be found in a variety of ways in science and engineering.

Economics can be about minimizing the costs of a process that has limitations in the availability of resources and capacities. The cost function can be non-linear therein. In theoretical mechanics there is an extremal principle in Hamilton's principle , the solution of which represents a nonlinear program in the case of nonlinear boundary conditions.

Modern engineering applications often involve optimization tasks in a complicated way. It can be a question of minimizing the weight of a component, which at the same time has to meet certain requirements (e.g. restrictions on the installation space, upper limits for deformations with given loads).

When using a mathematical model , it can be a matter of adapting the parameters of the model to measured values. Non-linear influences of the parameters and restrictions on the parameters (e.g. that only positive values are permitted) lead to a non-linear program.

In these questions it is often not known a priori whether the problem posed is convex or not. Sometimes one observes a dependency of the found optimal solution on the starting point of the search. Then local optima have been found and the problem is definitely not convex.

Problem definition

There are many possible formulations of a non-linear program. At this point the most general form should be chosen. The input parameter is taken from the , that is, the problem depends on influencing parameters that are embedded in the vector . The objective function is once continuously differentiable. Furthermore, the constraints (NB) are given in inequality form with and in equality form with and are once continuously differentiable. Then the problem is mathematically: ${\ displaystyle {\ vec {x}}}$ ${\ displaystyle \ mathbb {R} ^ {n}}$ ${\ displaystyle n}$ ${\ displaystyle {\ vec {x}}}$ ${\ displaystyle f \ colon D \ to \ mathbb {R}}$ ${\ displaystyle g_ {i} \ colon D \ to \ mathbb {R}}$ ${\ displaystyle 1 \ leq i \ leq m}$ ${\ displaystyle h_ {k} \ colon D \ to \ mathbb {R}}$ ${\ displaystyle 1 \ leq k \ leq l}$ ${\ displaystyle P}$

{\ displaystyle {\ begin {array} {llll} P: & f ({\ vec {x}}) && \ to \ mathrm {extremum} \\ f: & {\ vec {x}} \ in S \ subset & D \ subseteq \ mathbb {R} ^ {n} & \ to \ mathbb {R} \\ S: & \ {{\ vec {x}} \ in D \; | & g_ {i} ({\ vec {x} }) \ leq 0 & {\ text {for}} \; 1 \ leq i \ leq m, \\ && h_ {k} ({\ vec {x}}) = 0 & {\ text {for}} \; 1 \ leq k \ leq l \} \\\ end {array}}}

.

The permissible range is restricted by the secondary conditions (NB): The NB should be fulfilled for all values of the parameters from the permissible range ( ). Allowed the problem if the allowable range is not empty. ${\ displaystyle S}$ ${\ displaystyle {\ vec {x}} \ in S}$ ${\ displaystyle P}$ ${\ displaystyle S}$

The theoretical treatment of non-linear optimization is mostly restricted to minimization problems. Indeed, the maximization problem of a function can be reformulated into a minimization problem of or , if it is certain. ${\ displaystyle f}$ ${\ displaystyle -f}$ ${\ displaystyle 1 / f}$ ${\ displaystyle f> 0}$

Action

The problem is reduced to the optimization of an auxiliary function without NB using the methods described below. In order to be able to make use of the gradient-based methods, the area to be searched is divided into areas where the objective function can be differentiated. If possible, the sub-areas should be convex and so should the objective function in them. Then you can calculate the global extrema in the sub-areas with the methods listed in Mathematical Optimization and Convex Optimization and select the optimal one.

The construction of the auxiliary function should be explained using an example: Two balls in a trough try to occupy the lowest possible point, but must not penetrate each other. The objective function is therefore the positional energy of the spheres and assumes a minimum in equilibrium. The secondary condition here would be the penetration of the balls and , with negative penetration meaning a positive distance. ${\ displaystyle g (x, y) \ leq 0}$ ${\ displaystyle x}$ ${\ displaystyle y}$

Lagrange multipliers : The NB are multiplied by real factors, the Lagrange multipliers, and incorporated into the objective function, so that if the Lagrange multipliers are positive, violation of the NB is punished. The auxiliary function obtained in this way is called the Lagrange function . The Lagrange multipliers are introduced into the problem as unknowns and must also be determined. In the case of balls, the Lagrange multipliers are precisely the contact forces that the balls exert on one another when they touch, so that they do not penetrate one another.

Barrier functions: The NB are represented with barrier functions that take on positive values when approaching the limit of the definition range and grow to infinity on the limit. The barrier functions are multiplied by barrier parameters and incorporated into the objective function, so that approaching the limit is punished and thus the violation of the NB is prevented. In the spherical image, the spheres would have a more or less thick jacket that becomes more and more rigid the more it is compressed when touched. A violation of the NL is thus prevented at the price that even approaching the range limit is penalized. The method is used with interior point methods . ${\ displaystyle r}$

Penalty functions: The penalty functions are used like the barrier functions. The NB are displayed with penalty functions that disappear in the permissible range and are positive if the NB is violated. The penalty functions are multiplied by penalty parameters and built into the objective function, so that the violation of the NB is punished, hence the name. Active NBs may be violated here and the admissibility of the solution must be checked. In the sphere image, the penalty function corresponds to the “real” penetration (which disappears if the distance between the balls is positive) and the penalty parameter corresponds to a spring stiffness. The spring tries to pull penetrating points back to the surface. The stiffer the spring, the less the penetration will be.

{\ displaystyle r}

Advanced Lagrangian method ( English augmented Lagrange method ): This is a combination of the Lagrange multipliers and penalty method. The Lagrange multiplier is determined iteratively based on the violation of the NB.

Trivial (and therefore often not dealt with in the sources), but it should be mentioned and in practical use, is that active NB can be used to eliminate variables. The variables are set to values so that a violation of the NB is now impossible. In the sphere image, the points of contact of the spheres would be coupled to each other (equate their coordinates) so that penetration (there) can no longer take place.

The advantages and disadvantages of the methods described are summarized in the table:

method	advantages	disadvantage
Lagrange multipliers	Meets the NB exactly The sign of the multiplier indicates whether the NB is active or not.	Introduces additional unknowns. Introduces vanishing diagonal terms into the system of equations (problematic with Cholesky decomposition ).
Barrier functions	The NB are observed. For , the solution found changes into the one found with Lagrange multipliers. ${\ displaystyle r \ to 0}$	The contour lines of the auxiliary function do not match those of the objective function. In the extreme case there is no optimum at all for some values . ${\ displaystyle r}$ When approaching the limit of the permissible range and with , the Hessian matrix is poorly conditioned. ${\ displaystyle r \ to 0}$
Criminal proceedings	For , the solution found changes into the one found with Lagrange multipliers. ${\ displaystyle r \ to \ infty}$	The secondary condition is only approximately fulfilled. With which is Hessian matrix ill-conditioned. ${\ displaystyle r \ to \ infty}$ The activity of the NL can only be checked by evaluating the functions and . ${\ displaystyle g_ {i}}$ ${\ displaystyle h_ {k}}$
Extended Lagrange method	As the number of iterations increases, the solution found changes into the solution found with Lagrange multipliers. Fulfills the NB as precisely as required. The sign of the multiplier indicates whether the secondary condition is active or not.	Requires multiple converged solutions to the global problem.
Elimination of the degrees of freedom	Reduces the number of unknowns. Keeps equality constraints exactly.	Only applicable if the NL's activity is known.

Theory of Optimization

Isolated points

In a non-linear program, NB can restrict the permissible range in some points in such a way that no point in its vicinity is in the permissible range. In mathematical terms, this means that there is an environment such that ${\ displaystyle {\ vec {x}}}$ ${\ displaystyle {\ vec {x}}}$ ${\ displaystyle U _ {\ varepsilon} ({\ vec {x}})}$ ${\ displaystyle U _ {\ varepsilon} ({\ vec {x}})}$

{\ displaystyle U _ {\ varepsilon} ({\ vec {x}}) \ cap S = \ lbrace {\ vec {x}} \ rbrace}

applies. Isolated points must all be checked individually, each for itself, for optimality.

Regularity conditions, tangent and linearizing cones

Example of a non-linear program

For the formulation of optimality conditions, the NL must meet certain requirements. constraint qualifications (CQ). Among other things, it is a matter of excluding optimal points from consideration that are isolated or in which there are redundant NB. There are several formulations of different strengths that ensure compliance with this CQ. Points in which the requirements are met are called regular . Irregular points in which none of these requirements apply must be excluded or considered separately.

The terms tangent cone and linearizing cone are central to the formulation of the requirements for the NB and the optimality conditions . In order to make this clear to yourself, you stand at a point in the permitted area and walk to the destination point, observing the NB (the NB can be imagined as impenetrable walls) . The tangent cone is then the set of all possible directions from which one can arrive at the target point . In the case of a linearizing cone, the NB are first linearized, i.e. H. replaced by their tangents at the target point . The linearizing cone is then the set of all possible directions from which one can arrive at the target point taking into account the linearized NB . The tangent cone and linearizing cone differ where two walls run parallel at the location and the target point is, so to speak, in a corridor (width 0). You can then arrive in the linearizing cone from both directions of the corridor, it linearized the walls. If the initially parallel walls immediately lose their parallelism in one direction and close the aisle, so that no small step in this direction is possible, you can only arrive in the tangent cone from the open direction . That is the difference, see the first pathological case below. In the graphic, tangent cone and linearizing cone match at the optimal point and are indicated in red. ${\ displaystyle T (S, {\ vec {x}})}$ ${\ displaystyle {\ vec {x}} _ {0} \ neq {\ vec {x}}}$ ${\ displaystyle {\ vec {x}}}$ ${\ displaystyle {\ vec {x}}}$ ${\ displaystyle {\ vec {x}}}$ ${\ displaystyle {\ vec {x}}}$ ${\ displaystyle {\ vec {x}}}$ ${\ displaystyle {\ vec {x}}}$

The requirements for the NB ensure that the tangent cone and the linearizing cone coincide at the optimal point and that the optimal point is not isolated. The correspondence between the linearizing cone and the tangential cone is sometimes listed as a regularity condition of its own and is called Abadie Constraint Qualification . Examples of regularity conditions are:

Slater condition (only for convex problems): There is a pointsuch thatfor alland all equation constraints in arefulfilled. At this point it should be mentioned that Slater's Constraint Qualification is generally considered to be the most important. ${\ displaystyle {\ tilde {x}} \ in S}$ ${\ displaystyle g_ {i} ({\ tilde {x}}) <0}$ ${\ displaystyle 1 \ leq i \ leq m}$ ${\ displaystyle {\ tilde {x}}}$
Linear independence - linear independence constraint qualification (LICQ): The gradients of the active inequality conditions and the gradients of the equation conditions are linearly independent in the point. ${\ displaystyle {\ hat {x}}}$
Mangasarian-Fromovitz - Mangasarian-Fromovitz constraint qualification (MFCQ): The gradients of the active inequality conditions and the gradients of the equation conditions are positive-linearly independent in the point. ${\ displaystyle {\ hat {x}}}$
Constant rank constraint qualification (CRCQ): For each subset of the gradients of the inequality conditions that are active and the gradient of the equation conditions, the rank is near constant. ${\ displaystyle {\ hat {x}}}$
Constant positive-linear dependency - constant positive-linear dependence constraint qualification (CPLD): For each subset of the gradients, the inequality conditions that are active, and the gradients of the equation conditions, and if there is a positive-linear dependence in the point , then there is a positive linear dependence close to . ${\ displaystyle {\ hat {x}}}$ ${\ displaystyle {\ hat {x}}}$

It can be shown that the following two strands of inference hold

{\ displaystyle {\ mbox {LICQ}} \ Rightarrow {\ mbox {MFCQ}} \ Rightarrow {\ mbox {CPLD}}}

and ,

{\ displaystyle {\ mbox {LICQ}} \ Rightarrow {\ mbox {CRCQ}} \ Rightarrow {\ mbox {CPLD}}}

although MFCQ is not equivalent to CRCQ. In practice, weaker constraint qualifications are preferred, since they deliver stronger optimality conditions.

Pathological cases

The CQ are there to exclude conditions like those in the origin in the following examples from consideration:

Minimize under the NB and . ${\ displaystyle f (x, y): = x}$ ${\ displaystyle g_ {1} (x, y) = - y \ leq 0}$ ${\ displaystyle g_ {2} (x, y) = yx ^ {3} \ leq 0}$
Minimize under the NB . ${\ displaystyle f (x): = x \ in \ mathbb {R}}$ ${\ displaystyle g (x) = x ^ {2} \ leq 0}$
Minimize under the NB . ${\ displaystyle f (x): = \ left \ {{\ begin {array} {lll} - {\ sqrt {x}} & \ mathrm {if} & x \ geq 0 \\\ infty & \ mathrm {otherwise} & \ end {array}} \ right.}$ ${\ displaystyle g (x) = x \ leq 0}$

Optimality conditions

Necessary condition

Karush-Kuhn-Tucker conditions

The Karush-Kuhn-Tucker conditions are a necessary first-order optimality criterion and a generalization of the necessary condition for optimization problems without constraints and the Lagrange multipliers for optimization problems under equation constraints . In words, the Karush-Kuhn-Tucker theorem roughly means that if there is a feasible, regular and optimal point, the gradient of the objective function can be represented as a positive linear combination of the gradients of the active NB, see also the picture above. ${\ displaystyle \ nabla f ({\ vec {x}}) = {\ vec {0}}}$ ${\ displaystyle {\ bar {x}}}$ ${\ displaystyle - \ nabla f ({\ bar {x}})}$

Be the objective function and the functions with and the functions with are secondary function functions. All functions that occur are once continuously differentiable . Let it be a regular point, that is, one of the regularity requirements (CQ) above is fulfilled. If there is a local optimum, then there exist constants and so that ${\ displaystyle f: D \ rightarrow \ mathbb {R}}$ ${\ displaystyle g_ {i}: D \ rightarrow \ mathbb {R}}$ ${\ displaystyle 1 \ leq i \ leq m}$ ${\ displaystyle h_ {k}: D \ rightarrow \ mathbb {R}}$ ${\ displaystyle 1 \ leq k \ leq l}$ ${\ displaystyle f, g_ {i}, h_ {k}}$ ${\ displaystyle {\ bar {x}} \ in S}$ ${\ displaystyle {\ bar {x}}}$ ${\ displaystyle \ mu _ {i}}$ ${\ displaystyle \ nu _ {k}}$

{\ displaystyle \ nabla f ({\ bar {x}}) \ pm \ sum _ {i = 1} ^ {m} \ mu _ {i} \ nabla g_ {i} ({\ bar {x}}) + \ sum _ {k = 1} ^ {l} \ nu _ {k} \ nabla h_ {k} ({\ bar {x}}) = 0}

("+" when minimizing, "-" when maximizing),

{\ displaystyle g_ {i} ({\ bar {x}}) \ leq 0 \ ,, \; \ mu _ {i} \ geq 0 \ ,, \; \ mu _ {i} g_ {i} ({ \ bar {x}}) = 0}

for all ,

{\ displaystyle 1 \ leq i \ leq m}

{\ displaystyle h_ {k} ({\ bar {x}}) = 0}

for everyone .

{\ displaystyle 1 \ leq k \ leq l}

Every point at which these conditions are met is called the Karush-Kuhn-Tucker point (short: KKT point).

If there is a point of the admissible area in which no NB are active, in particular no secondary equality conditions are present, then all and the above conditions are reduced to the known necessary condition of unrestricted problems . ${\ displaystyle {\ bar {x}}}$ ${\ displaystyle h_ {k} (x) = 0}$ ${\ displaystyle g_ {i} ({\ bar {x}}) <0}$ ${\ displaystyle \ mu _ {i} = 0}$ ${\ displaystyle \ nabla f ({\ bar {x}}) = {\ vec {0}}}$

Fritz John Conditions

The Fritz-John conditions (or FJ conditions for short) are just like the KKT conditions a first-order criterion for optimality. In contrast to the KKT conditions, they manage without regularity conditions, but provide a weaker statement. Under certain circumstances they agree with the KKT conditions.

If there is a feasible point that is locally optimal, then there exist such that ${\ displaystyle {\ bar {x}}}$ ${\ displaystyle \ mu _ {i}, \ nu _ {k}, {\ bar {z}}}$

{\ displaystyle {\ bar {z}} \ nabla f ({\ bar {x}}) \ pm \ sum _ {i = 1} ^ {m} \ mu _ {i} \ nabla g_ {i} ({ \ bar {x}}) + \ sum _ {k = 1} ^ {l} \ nu _ {k} \ nabla h_ {k} ({\ bar {x}}) = 0}

("+" when minimizing, "-" when maximizing),

{\ displaystyle g_ {i} ({\ bar {x}}) \ leq 0 \ ,, \; \ mu _ {i} \ geq 0 \ ,, \; \ mu _ {i} g_ {i} ({ \ bar {x}}) = 0}

for all ,

{\ displaystyle 1 \ leq i \ leq m}

{\ displaystyle h_ {k} ({\ bar {x}}) = 0}

for everyone .

{\ displaystyle 1 \ leq k \ leq l}

and is not equal to the zero vector. ${\ displaystyle (\ mu, \ nu, {\ bar {z}})}$

Every point at which these conditions are met is called the Fritz-John point or FJ point for short. The only difference between the FJ conditions is the introduction of the scalar in front of the gradient of the objective function. ${\ displaystyle {\ bar {z}}}$

Sufficient conditions

If a KKT point and the direction of the steepest ascent or descent includes an angle smaller than 90 ° with the flanks of the tangent cone, then there is a minimum or maximum point. Mathematically: applies ${\ displaystyle {\ bar {x}}}$ ${\ displaystyle {\ bar {x}}}$

{\ displaystyle \ nabla f ({\ bar {x}}) \ cdot {\ vec {d}}> 0 \; (\ mathrm {or} <0)}

for all ,

{\ displaystyle {\ vec {d}} \ in T (S, {\ bar {x}}) \ setminus \ lbrace {\ vec {0}} \ rbrace}

then there is a local minimum (or maximum). This is a sufficient first-order optimality criterion . A sufficient second order optimality criterion for a KKT point says that if a stationary point and the Hessian matrix of the objective function is positive (negative) definite for all vectors from the tangent cone, then there is a local minimum (or maximum). Mathematically: ${\ displaystyle {\ bar {x}}}$ ${\ displaystyle {\ bar {x}}}$ ${\ displaystyle {\ bar {x}}}$ ${\ displaystyle {\ bar {x}}}$

{\ displaystyle \ nabla f ({\ bar {x}}) = {\ vec {0}}}

and for everyone .

{\ displaystyle {\ vec {d}} \ cdot \ nabla ^ {2} f ({\ bar {x}}) {\ vec {d}}> 0 \; (\ mathrm {or} <0)}

{\ displaystyle {\ vec {d}} \ in T (S, {\ bar {x}}) \ setminus \ lbrace {\ vec {0}} \ rbrace}

This contains the tangent cone, see # regularity conditions, tangent and linearizing cone . ${\ displaystyle T (S, {\ vec {x}})}$

Theorems on the approximation methods

In the limit of the barrier parameters approaching zero, the solution found with the barrier functions changes over to the solution found with the Lagrange multipliers.
In the limit of the penalty parameters approaching infinity, the solution found with the penalty functions changes into the solution found with the Lagrange multipliers.
In the limit of infinitely many iterations, the solution found with the extended Lagrange method also strives against the solution found with the Lagrange multipliers.

example

Area f (a, b) = ab

The five methods of solving a problem mentioned above are explained using a simple example. The problem is to maximize the product of two positive real numbers whose sum is at most sixteen. In mathematical terms, this means: we are looking for

{\ displaystyle f \ left (a, b \ right) = a \ cdot b \ rightarrow \ mathrm {max}}

with the NL

{\ displaystyle g \ left (a, b \ right) = a + b-16 \ leq 0}

.

It is vividly clear that the NB is active in the optimum, otherwise a better solution could easily be found. The only stationary point with this in and linear function is in, which is why the search sometimes goes in this direction. Then you have to "put the NB in the way" so that the algorithm "notices" it. ${\ displaystyle \ nabla f = {\ vec {0}}}$ ${\ displaystyle a}$ ${\ displaystyle b}$ ${\ displaystyle (0,0)}$

Elimination of the degrees of freedom

The NB recognized as active is used to determine

{\ displaystyle a + b-16 = 0 \ rightarrow b = 16-a}

and the auxiliary function only depends on , so that the solution can be calculated using a curve discussion: ${\ displaystyle a}$

{\ displaystyle {\ begin {array} {l} \ Pi (a): = f (a, 16-a) = a \ cdot (16-a) = 16a-a ^ {2} \ rightarrow \ mathrm {max } \\\ rightarrow \ Pi '(a) = 16-2a = 0 \ rightarrow a = 8 \ rightarrow b = 16-8 = 8 \\\ rightarrow \ Pi' '(a = 8) = - 2 <0 \ end {array}}}

One sees:

The auxiliary function only has one variable.
The solution is correct because it is a maximum.
The procedure is mainly used when it is known that the NB is active, e.g. B. in contact with firmly bonded components.

Lagrange multiplier

Here the -fold NB is subtracted from the objective function, where the factor is the Lagrange multiplier and is treated like an additional unknown. The subtraction is chosen so that a violation of the NB is punished at . The auxiliary or Lagrange function is here: ${\ displaystyle \ lambda}$ ${\ displaystyle \ lambda}$ ${\ displaystyle \ lambda> 0}$

{\ displaystyle \ Pi (a, b, \ lambda) = a \ cdot b- \ lambda (a + b-16) \ rightarrow \ mathrm {max}}

.

In the minimum, all derivatives vanish after all variables:

{\ displaystyle {\ begin {aligned} {\ frac {\ partial \ Pi} {\ partial a}} = & b- \ lambda = 0 & \ rightarrow \ lambda & = b \\ {\ frac {\ partial \ Pi} { \ partial b}} = & a- \ lambda = 0 & \ rightarrow a & = \ lambda = b \\ {\ frac {\ partial \ Pi} {\ partial \ lambda}} = & 16-ab = 16-2a = 0 & \ rightarrow a & = b = \ lambda = 8 \ end {aligned}}}

and the solution is found. Because of and , the Karush-Kuhn-Tucker condition is met. The above system of equations can be written as a matrix equation: ${\ displaystyle a = b = 8}$ ${\ displaystyle g (a = 8, b = 8) = 0}$ ${\ displaystyle \ lambda = 8 \ geq 0}$

{\ displaystyle {\ begin {pmatrix} 1 & 0 & -1 \\ 0 & 1 & -1 \\ - 1 & -1 & 0 \ end {pmatrix}} {\ begin {pmatrix} b \\ a \\\ lambda \ end {pmatrix}} = {\ begin {pmatrix} 0 \\ 0 \\ - 16 \ end {pmatrix}}}

The method of the Lagrangian multipliers

fulfills the NB exactly,
introduces additional unknowns ( ), ${\ displaystyle \ lambda}$
introduces vanishing diagonal elements into the system of equations, which are problematic when using the Cholesky decomposition .
can use the sign of to assess whether the secondary condition is active or not (positive for activity). ${\ displaystyle \ lambda}$

Barrier function

With barrier functions, secondary conditions can be met with certainty at the price that the NB is not exhausted in the optimum. When looking for a solution, the -fold of a barrier function is added to the objective function , e.g. B .: ${\ displaystyle f}$ ${\ displaystyle r}$

{\ displaystyle \ Pi \ left (a, b \ right) = a {\ cdot} b + 2r \ ln (16-ab) \ rightarrow \ mathrm {max}}

.

There is a logarithmic barrier function and the barrier parameter. In the extreme all derivations disappear again: ${\ displaystyle \ ln (16-ab)}$ ${\ displaystyle 2r}$

{\ displaystyle {\ begin {aligned} {\ frac {{\ partial} \ Pi} {\ partial a}} = b - {\ frac {2r} {16-ab}} = 0 \\ {\ frac {{ \ partial} \ Pi} {\ partial b}} = a - {\ frac {2r} {16-ab}} = 0 \ end {aligned}}}

,

and therefore as well as what the solution ${\ displaystyle a = b}$ ${\ displaystyle a (16-2a) -2r = -2 (a ^ {2} -8a + r) = 0}$

{\ displaystyle a = b = 4 \ pm {\ sqrt {16-r}} \ leq 8}

who accepts for the solutions . The iterative search with the Newton-Raphson method gives you the rule ${\ displaystyle r \ rightarrow 0}$ ${\ displaystyle (a, b) \ in \ {(0.0), (8.8) \}}$

{\ displaystyle {\ nabla} \ Pi + \ nabla ^ {2} \ Pi \ Delta {\ vec {x}} = {\ begin {pmatrix} b - {\ frac {2r} {16-ab}} \\ a - {\ frac {2r} {16-ab}} \ end {pmatrix}} + {\ begin {pmatrix} - {\ frac {2r} {(16-ab) ^ {2}}} & 1 - {\ frac {2r} {(16-ab) ^ {2}}} \\ 1 - {\ frac {2r} {(16-ab) ^ {2}}} & - {\ frac {2r} {(16- ab) ^ {2}}} \ end {pmatrix}} {\ begin {pmatrix} \ Delta a \\\ Delta b \ end {pmatrix}} = {\ begin {pmatrix} 0 \\ 0 \ end {pmatrix} }}

for calculating the increment and . The determinant of the Hessian matrix is: ${\ displaystyle \ Delta a}$ ${\ displaystyle \ Delta b}$ ${\ displaystyle {\ nabla} ^ {2} \ Pi}$

{\ displaystyle {\ frac {4r ^ {2}} {(16-ab) ^ {4}}} - \ left (1 - {\ frac {2r} {(16-ab) ^ {2}}} \ right) ^ {2} = {\ frac {4r} {(16-ab) ^ {2}}} - 1}

.

One sees:

The secondary condition is met.
The exact solution is obtained in the limit value . ${\ displaystyle r \ rightarrow 0}$
There is no optimal point for. In general, the contour lines of the auxiliary function do not coincide with those of the objective function. ${\ displaystyle r> 16}$
When approaching the solution and with , the Hessian matrix can be poorly conditioned. ${\ displaystyle r \ rightarrow 0}$
With an incremental search, it must be ensured that the increment in the unknown is not so large that you accidentally end up on the wrong side of the barrier, where the barrier function is not defined in this example.

Criminal proceedings

With criminal proceedings, secondary conditions can be met approximately. When looking for a solution, the target function is subtracted from the -fold of a penalty function (should punish the violation): ${\ displaystyle f}$ ${\ displaystyle r}$

{\ displaystyle \ Pi (a, b) = a \ cdot b - {\ frac {r} {n}} \ cdot \ mathrm {max} (0, a + b-16) ^ {n} \ rightarrow \ mathrm {Max} }

.

This contains the penalty parameter and the penalty function. With one calls the penalty function exact , but it is not differentiable. Here should be used. In the extreme all derivations disappear again: ${\ displaystyle r}$ ${\ displaystyle \ mathrm {max} (0, a + b-16) ^ {n}}$ ${\ displaystyle n = 1}$ ${\ displaystyle n = 2}$

{\ displaystyle {\ begin {aligned} {\ frac {\ partial \ Pi} {\ partial a}} = br \ cdot \ mathrm {max} (0, a + b-16) = 0 \\ {\ frac { \ partial \ Pi} {\ partial b}} = ar \ cdot \ mathrm {max} (0, a + b-16) = 0 \ end {aligned}}}

.

With you get why you have to start here in the "forbidden" area . Then it follows from ${\ displaystyle a + b-16 \ leq 0}$ ${\ displaystyle a = b = 0}$ ${\ displaystyle a + b-16> 0}$

{\ displaystyle {\ begin {aligned} {\ frac {\ partial \ Pi} {\ partial a}} = & b-r (a + b-16) = 0 & \ rightarrow \ quad a + \ left (1 - {\ frac {1} {r}} \ right) b = & 16 \\ {\ frac {\ partial \ Pi} {\ partial b}} = & a-r (a + b-16) = 0 & \ rightarrow \ quad \ left ( 1 - {\ frac {1} {r}} \ right) a + b = & 16 \ end {aligned}}}

the system of equations

{\ displaystyle {\ begin {pmatrix} 1 - {\ frac {1} {r}} & 1 \\ 1 & 1 - {\ frac {1} {r}} \ end {pmatrix}} {\ begin {pmatrix} a \ \ b \ end {pmatrix}} = {\ begin {pmatrix} 16 \\ 16 \ end {pmatrix}}}

with the solution

{\ displaystyle a = b = {\ frac {16r} {2r-1}} = 8 + {\ frac {8} {2r-1}}> 8}

,

which goes into the solution for . ${\ displaystyle r \ rightarrow \ infty}$ ${\ displaystyle a = b = 8}$

One sees:

The secondary condition is only approximately fulfilled but better and better as the penalty parameter increases, but only because it can be calculated exactly here. With a numerical solution of the system of equations, rounding errors would lead to errors with increasing penalty parameters.
The reason for this is that as the penalty parameter increases, the value of the determinant of the system of equations approaches zero. The problem is increasingly ill-posed. ${\ displaystyle \ left (1 - {\ frac {1} {r}} \ right) ^ {2} -1 = {\ frac {1} {{r} ^ {2}}} - {\ frac {2 } {r}}}$
A compromise has to be found with regard to the conditioning of the system of equations and the accuracy of fulfilling the NB.
By inserting and in the NB it can be checked how badly it is injured. ${\ displaystyle a}$ ${\ displaystyle b}$
No additional variables or vanishing diagonal elements are introduced, there is a solution for all penalty parameters and the method is considered numerically robust. ${\ displaystyle r \ neq 1/2}$

Extended or generalized Lagrange method

The extended or generalized Lagrange method ( English augmented-lagrange-method ) uses the penalty function to calculate the Lagrange multipliers approximately. When searching for a solution of the objective function which times the NB and a penalty function -fold withdrawn (penalty idea): ${\ displaystyle f}$ ${\ displaystyle \ lambda _ {i}}$ ${\ displaystyle r}$

{\ displaystyle \ Pi (a, b) = a \ cdot b - {\ frac {1} {2r}} \ left (\ mathrm {max} (0, \ lambda _ {i} + r (a + b- 16)) ^ {2} - \ lambda _ {i} ^ {2} \ right) \ rightarrow \ mathrm {max}}

.

In the extreme, all derivatives disappear:

{\ displaystyle {\ begin {aligned} {\ frac {\ partial \ Pi (a, b)} {\ partial a}} = b- \ mathrm {max} (0, \ lambda _ {i} + r (a + b-16)) = 0 \\ {\ frac {\ partial \ Pi (a, b)} {\ partial b}} = a- \ mathrm {max} (0, \ lambda _ {i} + r ( a + b-16)) = 0 \ end {aligned}}}

.

With you get . Otherwise arises from ${\ displaystyle \ lambda _ {i} + r (a + b-16) \ leq 0}$ ${\ displaystyle a = b = 0}$

{\ displaystyle {\ begin {aligned} b- \ lambda _ {i} -r (a + b-16) = & 0 & \ rightarrow \ quad a + \ left (1 - {\ frac {1} {r}} \ right ) b = 16 - {\ frac {\ lambda _ {i}} {r}} \\ a- \ lambda _ {i} -r (a + b-16) = & 0 & \ rightarrow \ quad \ left (1- {\ frac {1} {r}} \ right) a + b = 16 - {\ frac {\ lambda _ {i}} {r}} \ end {aligned}}}

the system of equations

{\ displaystyle {\ begin {pmatrix} 1 - {\ frac {1} {r}} & 1 \\ 1 & 1 - {\ frac {1} {r}} \ end {pmatrix}} {\ begin {pmatrix} a \ \ b \ end {pmatrix}} = {\ begin {pmatrix} 16 - {\ frac {\ lambda _ {i}} {r}} \\ 16 - {\ frac {\ lambda _ {i}} {r} } \ end {pmatrix}}}

,

that's the solution

{\ displaystyle a_ {i + 1} = b_ {i + 1} = {\ frac {16r- \ lambda _ {i}} {2r-1}} = 8 + {\ frac {8- \ lambda _ {i }} {2r-1}}}

Has.

The numerical search of the extremum with the extended Lagrange method

usually starts with and the initial values , ${\ displaystyle i = 0}$ ${\ displaystyle a_ {0} = b_ {0} = \ lambda _ {0} = 0}$
calculated and (in the non-linear case iteration until convergence), ${\ displaystyle a_ {i + 1}}$ ${\ displaystyle b_ {i + 1}}$
sets and ${\ displaystyle \ lambda _ {i + 1} = \ mathrm {max} (0, \ lambda _ {i} + r \ cdot g (a_ {i + 1}, b_ {i + 1})) = \ mathrm {max} (0, \ lambda _ {i} + r \ cdot (a_ {i + 1} + b_ {i + 1} -16))}$
terminates if a suitable convergence criterion is met or increments and returns to step 2. ${\ displaystyle i}$

Here, however, a start value must be specified so that the point can be found. With and results in the following iteration progression up to an error : ${\ displaystyle a + b-16> 0}$ ${\ displaystyle (a, b) = (8.8)}$ ${\ displaystyle r = 100 \ ,, \; a_ {0} = 100}$ ${\ displaystyle b_ {0} = \ lambda _ {0} = 0}$ ${\ displaystyle | a_ {i} -a_ {i-1} | <10 ^ {- 10}}$

${\ displaystyle i}$	${\ displaystyle a_ {i}}$	${\ displaystyle a_ {i} -a_ {i-1}}$	${\ displaystyle \ lambda _ {i}}$	${\ displaystyle a_ {i} b_ {i} -a_ {i-1} b_ {i-1}}$	${\ displaystyle a_ {i} + b_ {i} -16}$
1	8.04020100503	−91.959798995	8.04020100502	64.6448322012	0.0804020100502
2	7.9997979849	−0.0404030201258	7.9997979849	−0.648064402007	−0.000404030201256
3	8.00000101515	0.000203030251889	8.00000101515	0.00324844322116	2.03030252166e-06
4th	7.9999999949	−1.02025252513e-06	7.9999999949	−1.63240414395e-05	−1.02025285997e-08
5	8.00000000003	5.12689890542e-09	8.00000000003	8.20303824867e-08	5.12692110988e-11
6th	8.0	−2.57633914202e-11	8.0	−4.12214262724e-10	−2.57571741713e-13

With a larger penalty parameter, the convergence would be faster, but the example would be less illustrative.

The extended Lagrange method

fulfills the NB as precisely as required,
does not introduce new unknowns nor does it affect the conditioning of the system of equations,
requires several converged solutions of the global problem (in the second step) and
can measure the activity of the minor condition . ${\ displaystyle \ lambda _ {i}}$

literature

Avriel, Mordecai: Nonlinear Programming: Analysis and Methods . Dover Publishing, 2003, ISBN 0-486-43227-0 ( limited preview in Google Book Search).
R. Andreani, JM Martínez, ML Schuverdt: On the relation between constant positive linear dependence condition and quasinormality constraint qualification . In: Journal of optimization theory and applications . vol. 125, no2, 2005, p. 473-485 ( researchgate.net [accessed December 5, 2017]).
Florian Jarre, Josef Stoer : Optimization . Springer, Berlin 2003, ISBN 978-3-540-43575-4 , doi : 10.1007 / 978-3-642-18785-8 .
R. Reinhardt, A. Hoffmann, T. Gerlach: Nonlinear Optimization . Springer, 2013, ISBN 978-3-8274-2948-3 ( limited preview in Google book search).

Web links

Book Convex Optimization by Stephen Boyd and Lieven Vandenberghe (PDF; English)