Nonlinear Optimization
In mathematics, nonlinear optimization (also called nonlinear program , NLP ) is the project to optimize a scalar objective function of one or more real variables in a restricted range, whereby the objective function or the range limits are not linear ( affine ). It is a subarea of mathematical optimization and an upper area of convex optimization . In contrast to the terms mentioned, the application to differentiable nonlinear objective functions is described here without restriction to the convexity of the objective function or the search area. In the section Terms: objective function, constraints, admissible set, local and global optimization, there are essential explanations.
Fields of application
Nonlinear programs can be found in a variety of ways in science and engineering.
Economics can be about minimizing the costs of a process that has limitations in the availability of resources and capacities. The cost function can be nonlinear therein. In theoretical mechanics there is an extremal principle in Hamilton's principle , the solution of which represents a nonlinear program in the case of nonlinear boundary conditions.
Modern engineering applications often involve optimization tasks in a complicated way. It can be a question of minimizing the weight of a component, which at the same time has to meet certain requirements (e.g. restrictions on the installation space, upper limits for deformations with given loads).
When using a mathematical model , it can be a matter of adapting the parameters of the model to measured values. Nonlinear influences of the parameters and restrictions on the parameters (e.g. that only positive values are permitted) lead to a nonlinear program.
In these questions it is often not known a priori whether the problem posed is convex or not. Sometimes one observes a dependency of the found optimal solution on the starting point of the search. Then local optima have been found and the problem is definitely not convex.
Problem definition
There are many possible formulations of a nonlinear program. At this point the most general form should be chosen. The input parameter is taken from the , that is, the problem depends on influencing parameters that are embedded in the vector . The objective function is once continuously differentiable. Furthermore, the constraints (NB) are given in inequality form with and in equality form with and are once continuously differentiable. Then the problem is mathematically:
 .
The permissible range is restricted by the secondary conditions (NB): The NB should be fulfilled for all values of the parameters from the permissible range ( ). Allowed the problem if the allowable range is not empty.
The theoretical treatment of nonlinear optimization is mostly restricted to minimization problems. Indeed, the maximization problem of a function can be reformulated into a minimization problem of or , if it is certain.
Action
The problem is reduced to the optimization of an auxiliary function without NB using the methods described below. In order to be able to make use of the gradientbased methods, the area to be searched is divided into areas where the objective function can be differentiated. If possible, the subareas should be convex and so should the objective function in them. Then you can calculate the global extrema in the subareas with the methods listed in Mathematical Optimization and Convex Optimization and select the optimal one.
The construction of the auxiliary function should be explained using an example: Two balls in a trough try to occupy the lowest possible point, but must not penetrate each other. The objective function is therefore the positional energy of the spheres and assumes a minimum in equilibrium. The secondary condition here would be the penetration of the balls and , with negative penetration meaning a positive distance.
 Lagrange multipliers : The NB are multiplied by real factors, the Lagrange multipliers, and incorporated into the objective function, so that if the Lagrange multipliers are positive, violation of the NB is punished. The auxiliary function obtained in this way is called the Lagrange function . The Lagrange multipliers are introduced into the problem as unknowns and must also be determined. In the case of balls, the Lagrange multipliers are precisely the contact forces that the balls exert on one another when they touch, so that they do not penetrate one another.
 Barrier functions: The NB are represented with barrier functions that take on positive values when approaching the limit of the definition range and grow to infinity on the limit. The barrier functions are multiplied by barrier parameters and incorporated into the objective function, so that approaching the limit is punished and thus the violation of the NB is prevented. In the spherical image, the spheres would have a more or less thick jacket that becomes more and more rigid the more it is compressed when touched. A violation of the NL is thus prevented at the price that even approaching the range limit is penalized. The method is used with interior point methods .
 Penalty functions: The penalty functions are used like the barrier functions. The NB are displayed with penalty functions that disappear in the permissible range and are positive if the NB is violated. The penalty functions are multiplied by penalty parameters and built into the objective function, so that the violation of the NB is punished, hence the name. Active NBs may be violated here and the admissibility of the solution must be checked. In the sphere image, the penalty function corresponds to the “real” penetration (which disappears if the distance between the balls is positive) and the penalty parameter corresponds to a spring stiffness. The spring tries to pull penetrating points back to the surface. The stiffer the spring, the less the penetration will be.
 Advanced Lagrangian method ( English augmented Lagrange method ): This is a combination of the Lagrange multipliers and penalty method. The Lagrange multiplier is determined iteratively based on the violation of the NB.
 Trivial (and therefore often not dealt with in the sources), but it should be mentioned and in practical use, is that active NB can be used to eliminate variables. The variables are set to values so that a violation of the NB is now impossible. In the sphere image, the points of contact of the spheres would be coupled to each other (equate their coordinates) so that penetration (there) can no longer take place.
The advantages and disadvantages of the methods described are summarized in the table:
method  advantages  disadvantage 

Lagrange multipliers 


Barrier functions 


Criminal proceedings  For , the solution found changes into the one found with Lagrange multipliers. 

Extended Lagrange method 

Requires multiple converged solutions to the global problem. 
Elimination of the degrees of freedom 

Only applicable if the NL's activity is known. 
Theory of Optimization
Isolated points
In a nonlinear program, NB can restrict the permissible range in some points in such a way that no point in its vicinity is in the permissible range. In mathematical terms, this means that there is an environment such that
applies. Isolated points must all be checked individually, each for itself, for optimality.
Regularity conditions, tangent and linearizing cones
For the formulation of optimality conditions, the NL must meet certain requirements. constraint qualifications (CQ). Among other things, it is a matter of excluding optimal points from consideration that are isolated or in which there are redundant NB. There are several formulations of different strengths that ensure compliance with this CQ. Points in which the requirements are met are called regular . Irregular points in which none of these requirements apply must be excluded or considered separately.
The terms tangent cone and linearizing cone are central to the formulation of the requirements for the NB and the optimality conditions . In order to make this clear to yourself, you stand at a point in the permitted area and walk to the destination point, observing the NB (the NB can be imagined as impenetrable walls) . The tangent cone is then the set of all possible directions from which one can arrive at the target point . In the case of a linearizing cone, the NB are first linearized, i.e. H. replaced by their tangents at the target point . The linearizing cone is then the set of all possible directions from which one can arrive at the target point taking into account the linearized NB . The tangent cone and linearizing cone differ where two walls run parallel at the location and the target point is, so to speak, in a corridor (width 0). You can then arrive in the linearizing cone from both directions of the corridor, it linearized the walls. If the initially parallel walls immediately lose their parallelism in one direction and close the aisle, so that no small step in this direction is possible, you can only arrive in the tangent cone from the open direction . That is the difference, see the first pathological case below. In the graphic, tangent cone and linearizing cone match at the optimal point and are indicated in red.
The requirements for the NB ensure that the tangent cone and the linearizing cone coincide at the optimal point and that the optimal point is not isolated. The correspondence between the linearizing cone and the tangential cone is sometimes listed as a regularity condition of its own and is called Abadie Constraint Qualification . Examples of regularity conditions are:
 Slater condition (only for convex problems): There is a pointsuch thatfor alland all equation constraints in arefulfilled. At this point it should be mentioned that Slater's Constraint Qualification is generally considered to be the most important.
 Linear independence  linear independence constraint qualification (LICQ): The gradients of the active inequality conditions and the gradients of the equation conditions are linearly independent in the point.
 MangasarianFromovitz  MangasarianFromovitz constraint qualification (MFCQ): The gradients of the active inequality conditions and the gradients of the equation conditions are positivelinearly independent in the point.
 Constant rank constraint qualification (CRCQ): For each subset of the gradients of the inequality conditions that are active and the gradient of the equation conditions, the rank is near constant.
 Constant positivelinear dependency  constant positivelinear dependence constraint qualification (CPLD): For each subset of the gradients, the inequality conditions that are active, and the gradients of the equation conditions, and if there is a positivelinear dependence in the point , then there is a positive linear dependence close to .
It can be shown that the following two strands of inference hold
 and ,
although MFCQ is not equivalent to CRCQ. In practice, weaker constraint qualifications are preferred, since they deliver stronger optimality conditions.
Pathological cases
The CQ are there to exclude conditions like those in the origin in the following examples from consideration:
 Minimize under the NB and .
 Minimize under the NB .
 Minimize under the NB .
Optimality conditions
Necessary condition
KarushKuhnTucker conditions
The KarushKuhnTucker conditions are a necessary firstorder optimality criterion and a generalization of the necessary condition for optimization problems without constraints and the Lagrange multipliers for optimization problems under equation constraints . In words, the KarushKuhnTucker theorem roughly means that if there is a feasible, regular and optimal point, the gradient of the objective function can be represented as a positive linear combination of the gradients of the active NB, see also the picture above.
Be the objective function and the functions with and the functions with are secondary function functions. All functions that occur are once continuously differentiable . Let it be a regular point, that is, one of the regularity requirements (CQ) above is fulfilled. If there is a local optimum, then there exist constants and so that
 ("+" when minimizing, "" when maximizing),
 for all ,
 for everyone .
Every point at which these conditions are met is called the KarushKuhnTucker point (short: KKT point).
If there is a point of the admissible area in which no NB are active, in particular no secondary equality conditions are present, then all and the above conditions are reduced to the known necessary condition of unrestricted problems .
Fritz John Conditions
The FritzJohn conditions (or FJ conditions for short) are just like the KKT conditions a firstorder criterion for optimality. In contrast to the KKT conditions, they manage without regularity conditions, but provide a weaker statement. Under certain circumstances they agree with the KKT conditions.
If there is a feasible point that is locally optimal, then there exist such that
 ("+" when minimizing, "" when maximizing),
 for all ,
 for everyone .
and is not equal to the zero vector.
Every point at which these conditions are met is called the FritzJohn point or FJ point for short. The only difference between the FJ conditions is the introduction of the scalar in front of the gradient of the objective function.
Sufficient conditions
If a KKT point and the direction of the steepest ascent or descent includes an angle smaller than 90 ° with the flanks of the tangent cone, then there is a minimum or maximum point. Mathematically: applies
 for all ,
then there is a local minimum (or maximum). This is a sufficient firstorder optimality criterion . A sufficient second order optimality criterion for a KKT point says that if a stationary point and the Hessian matrix of the objective function is positive (negative) definite for all vectors from the tangent cone, then there is a local minimum (or maximum). Mathematically:
 and for everyone .
This contains the tangent cone, see # regularity conditions, tangent and linearizing cone .
Theorems on the approximation methods
 In the limit of the barrier parameters approaching zero, the solution found with the barrier functions changes over to the solution found with the Lagrange multipliers.
 In the limit of the penalty parameters approaching infinity, the solution found with the penalty functions changes into the solution found with the Lagrange multipliers.
 In the limit of infinitely many iterations, the solution found with the extended Lagrange method also strives against the solution found with the Lagrange multipliers.
example
The five methods of solving a problem mentioned above are explained using a simple example. The problem is to maximize the product of two positive real numbers whose sum is at most sixteen. In mathematical terms, this means: we are looking for
with the NL
 .
It is vividly clear that the NB is active in the optimum, otherwise a better solution could easily be found. The only stationary point with this in and linear function is in, which is why the search sometimes goes in this direction. Then you have to "put the NB in the way" so that the algorithm "notices" it.
Elimination of the degrees of freedom
The NB recognized as active is used to determine
and the auxiliary function only depends on , so that the solution can be calculated using a curve discussion:
One sees:
 The auxiliary function only has one variable.
 The solution is correct because it is a maximum.
 The procedure is mainly used when it is known that the NB is active, e.g. B. in contact with firmly bonded components.
Lagrange multiplier
Here the fold NB is subtracted from the objective function, where the factor is the Lagrange multiplier and is treated like an additional unknown. The subtraction is chosen so that a violation of the NB is punished at . The auxiliary or Lagrange function is here:
 .
In the minimum, all derivatives vanish after all variables:
and the solution is found. Because of and , the KarushKuhnTucker condition is met. The above system of equations can be written as a matrix equation:
The method of the Lagrangian multipliers
 fulfills the NB exactly,
 introduces additional unknowns ( ),
 introduces vanishing diagonal elements into the system of equations, which are problematic when using the Cholesky decomposition .
 can use the sign of to assess whether the secondary condition is active or not (positive for activity).
Barrier function
With barrier functions, secondary conditions can be met with certainty at the price that the NB is not exhausted in the optimum. When looking for a solution, the fold of a barrier function is added to the objective function , e.g. B .:
 .
There is a logarithmic barrier function and the barrier parameter. In the extreme all derivations disappear again:
 ,
and therefore as well as what the solution
who accepts for the solutions . The iterative search with the NewtonRaphson method gives you the rule
for calculating the increment and . The determinant of the Hessian matrix is:
 .
One sees:
 The secondary condition is met.
 The exact solution is obtained in the limit value .
 There is no optimal point for. In general, the contour lines of the auxiliary function do not coincide with those of the objective function.
 When approaching the solution and with , the Hessian matrix can be poorly conditioned.
 With an incremental search, it must be ensured that the increment in the unknown is not so large that you accidentally end up on the wrong side of the barrier, where the barrier function is not defined in this example.
Criminal proceedings
With criminal proceedings, secondary conditions can be met approximately. When looking for a solution, the target function is subtracted from the fold of a penalty function (should punish the violation):
 .
This contains the penalty parameter and the penalty function. With one calls the penalty function exact , but it is not differentiable. Here should be used. In the extreme all derivations disappear again:
 .
With you get why you have to start here in the "forbidden" area . Then it follows from
the system of equations
with the solution
 ,
which goes into the solution for .
One sees:
 The secondary condition is only approximately fulfilled but better and better as the penalty parameter increases, but only because it can be calculated exactly here. With a numerical solution of the system of equations, rounding errors would lead to errors with increasing penalty parameters.
 The reason for this is that as the penalty parameter increases, the value of the determinant of the system of equations approaches zero. The problem is increasingly illposed.
 A compromise has to be found with regard to the conditioning of the system of equations and the accuracy of fulfilling the NB.
 By inserting and in the NB it can be checked how badly it is injured.
 No additional variables or vanishing diagonal elements are introduced, there is a solution for all penalty parameters and the method is considered numerically robust.
Extended or generalized Lagrange method
The extended or generalized Lagrange method ( English augmentedlagrangemethod ) uses the penalty function to calculate the Lagrange multipliers approximately. When searching for a solution of the objective function which times the NB and a penalty function fold withdrawn (penalty idea):
 .
In the extreme, all derivatives disappear:
 .
With you get . Otherwise arises from
the system of equations
 ,
that's the solution
Has.
The numerical search of the extremum with the extended Lagrange method
 usually starts with and the initial values ,
 calculated and (in the nonlinear case iteration until convergence),
 sets and
 terminates if a suitable convergence criterion is met or increments and returns to step 2.
Here, however, a start value must be specified so that the point can be found. With and results in the following iteration progression up to an error :
1  8.04020100503  −91.959798995  8.04020100502  64.6448322012  0.0804020100502 
2  7.9997979849  −0.0404030201258  7.9997979849  −0.648064402007  −0.000404030201256 
3  8.00000101515  0.000203030251889  8.00000101515  0.00324844322116  2.03030252166e06 
4th  7.9999999949  −1.02025252513e06  7.9999999949  −1.63240414395e05  −1.02025285997e08 
5  8.00000000003  5.12689890542e09  8.00000000003  8.20303824867e08  5.12692110988e11 
6th  8.0  −2.57633914202e11  8.0  −4.12214262724e10  −2.57571741713e13 
With a larger penalty parameter, the convergence would be faster, but the example would be less illustrative.
The extended Lagrange method
 fulfills the NB as precisely as required,
 does not introduce new unknowns nor does it affect the conditioning of the system of equations,
 requires several converged solutions of the global problem (in the second step) and
 can measure the activity of the minor condition .
literature
 Avriel, Mordecai: Nonlinear Programming: Analysis and Methods . Dover Publishing, 2003, ISBN 0486432270 ( limited preview in Google Book Search).
 R. Andreani, JM Martínez, ML Schuverdt: On the relation between constant positive linear dependence condition and quasinormality constraint qualification . In: Journal of optimization theory and applications . vol. 125, no2, 2005, p. 473485 ( researchgate.net [accessed December 5, 2017]).
 Florian Jarre, Josef Stoer : Optimization . Springer, Berlin 2003, ISBN 9783540435754 , doi : 10.1007 / 9783642187858 .
 R. Reinhardt, A. Hoffmann, T. Gerlach: Nonlinear Optimization . Springer, 2013, ISBN 9783827429483 ( limited preview in Google book search).