# Optimization (mathematics)

The field of optimization in applied mathematics is concerned with finding optimal parameters of a system, which is usually complex. "Optimal" means that an objective function is minimized or maximized. Optimization problems arise in business mathematics, statistics, operations research, and generally in all scientific disciplines in which unknown parameters occur, such as physics, chemistry, and meteorology. Often an analytical solution of optimization problems is not possible, and numerical methods have to be used.

## Example of a simple optimization problem

The simplest optimization problem is finding a minimum or maximum of a differentiable one-dimensional function $f(x)$, which is usually achieved by finding the zeros of the first derivative.
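As a minimal sketch of this idea (the helper name and the example function are illustrative, not from the source), a stationary point can be located by bisection on the first derivative, assuming the derivative changes sign exactly once in the bracketing interval:

```python
def stationary_point(df, lo, hi, tol=1e-10):
    """Locate a zero of f' by bisection: a sign change of f' between
    lo and hi brackets a stationary point of f."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if df(lo) * df(mid) <= 0:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2

# f(x) = (x - 2)**2 + 1 has f'(x) = 2*(x - 2), with a zero at x = 2
x_star = stationary_point(lambda x: 2 * (x - 2), 0.0, 5.0)
```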

## Examples of optimization problems

**Economics, operations research**
The objective function here mostly represents the profit or the revenue of a company; the parameters are raw materials, deployment of personnel, machines, prices, etc. The objective function is to be maximized. Basically, this is a simplified formalization of a fundamental management problem. Its systematic foundation lies in operations research.
**Statistics, data mining**
Statistical models contain open parameters that are estimated. A parameter set is optimal if the associated model instance represents the data relationships optimally, i.e., if the deviations of the modeled data (in the sense of a suitable quality function) from the empirical data are as small as possible. The objective function can be chosen differently here, for example as a sum of squares or as a likelihood function.
**Climate research**
Climate models represent simplified numerical systems of the actual hydrodynamic processes in the atmosphere. Within the grid cells, the interactions must be approximated by equations. The equations can either be derived from fundamental hydrodynamic laws, or empirical equations are used, i.e., essentially statistical models whose parameters have to be optimized so that the climate models represent the actual processes as well as possible.
**Game theory**
Does a player population reach a population optimum in a supergame? Or at least a Pareto optimum? Is this equilibrium stable?
**Physics**
Evaluation of measurement curves by varying the parameters of a theory function ("optimization in parameter space", see also curve fitting) so that the theory function matches the measurement curve as closely as possible. Physical systems always strive for an energy minimum; many problems consist precisely in finding this energy minimum.

## Distinction

The field of approximation in numerical analysis is related to optimization. Conversely, one can say: an approximation problem is the problem of minimizing the distance (in some metric) between two functions.

## Terms: objective function, constraints, admissible set, local and global optimization

Let us assume a minimization task in the following. What is to be optimized, for example a distance, is called the objective function. What is varied are the parameters or variables of the objective function. In a two-dimensional optimization task (i.e., two independent parameters), the objective function can be imagined in three dimensions, with the parameters spanning the length and depth axes. The height is then the objective function value. Intuitively, a "mountain range" with mountains and valleys arises (at least for continuous functions).

If the optimization problem is actually an approximation problem, then the "mountain range" is sometimes also referred to as the fit topology. In this case the sum of squares is mostly used as the objective function; see details in the article on the method of least squares.

Since the objective function represents a "mountain range", the optimization problem is equivalent to finding the deepest valley (minimization) or the highest peak (maximization) in this mountain range. The effort of solving the task depends crucially on the shape of the "mountain range". An extreme example of a minimization task would be an absolutely flat plane from which individual needle-shaped points protrude at random positions. In this case no search algorithm helps; one can only search randomly (Monte Carlo method) or systematically scan the entire surface. One of the simplest cases of a two-dimensional optimization task is when the mountain range has the shape of a paraboloid that is symmetric about the height axis and whose apex is to be found.
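The random (Monte Carlo) search mentioned above can be sketched as follows; the objective function, bounds, and helper name are illustrative, not from the source:

```python
import random

def random_search(f, bounds, n_samples=10000, seed=0):
    """Monte Carlo search: sample points uniformly in the box
    and keep the one with the smallest objective value."""
    rng = random.Random(seed)
    best_x, best_val = None, float("inf")
    for _ in range(n_samples):
        x = [rng.uniform(lo, hi) for lo, hi in bounds]
        val = f(x)
        if val < best_val:
            best_x, best_val = x, val
    return best_x, best_val

# A paraboloid with its minimum value 0 at (1, -1)
f = lambda x: (x[0] - 1) ** 2 + (x[1] + 1) ** 2
x_best, v_best = random_search(f, [(-5, 5), (-5, 5)])
```

With 10000 samples in a 10x10 box, the best sample lands very close to the true minimum here, but the method makes no use of the objective's structure and scales poorly with dimension.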

If the optimization task consists in finding the nearest relative (local) minimum or maximum in the neighborhood of a given point of the mountain range, then one speaks of local optimization. If the task is to find the absolute minimum or maximum in the entire mountain range, then one speaks of global optimization. The two tasks have very different degrees of difficulty: for local optimization there are numerous methods, all of which lead to the goal more or less quickly and, in all not too difficult cases, with great certainty. With global optimization, the solvability of the task within a given or affordable computing budget depends very much on the topology of the objective function.

Often one is only interested in parameter values that satisfy additional constraints. (If these constraints only apply at the boundary of the function's domain, they are also called boundary conditions.) They can be given in the form of equations or inequalities, or they can explicitly describe a set (for example, only integer solutions). The set of all parameter values that satisfy all constraints is called the admissible set. In the picture of the mountain range, the constraints would narrow down the area in which the search is carried out. The optimization problem under consideration is called admissible if the admissible set, i.e. the area to be searched, is not empty. A distinction is made between active and inactive constraints: a constraint of the form $g(x) \leq 0$ is active when the optimum lies on the boundary of the admissible region, i.e. when $g(x) = 0$ there. If the constraint is inactive, it does not restrict the optimum, so the optimum lies in the interior of the region and not on its boundary. A constraint of the form $h(x) = 0$ is always active.
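A small illustration of admissibility and active constraints (the helper names and the example constraints are hypothetical):

```python
def is_feasible(x, inequality_constraints, equality_constraints=(), tol=1e-9):
    """x is admissible if every g(x) <= 0 and every h(x) == 0 (within tol)."""
    return (all(g(x) <= tol for g in inequality_constraints)
            and all(abs(h(x)) <= tol for h in equality_constraints))

def active_constraints(x, inequality_constraints, tol=1e-9):
    """An inequality constraint g(x) <= 0 is active at x when g(x) == 0."""
    return [i for i, g in enumerate(inequality_constraints)
            if abs(g(x)) <= tol]

g1 = lambda x: x[0] + x[1] - 1    # encodes x0 + x1 <= 1
g2 = lambda x: -x[0]              # encodes x0 >= 0
feasible = is_feasible((0.0, 0.5), [g1, g2])  # g2 is active at this point
```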

## Classification

### Scalar optimization problems

A scalar optimization problem can be mathematically expressed as

"Minimize/maximize $f(x)$ subject to the constraint $x \in X$",

where $f \colon V \to \mathbb{R}$ is a real-valued function and $X \subseteq V$. Often the admissible set $X$ is given indirectly by a function, usually standardized in the form

$X = \{x \in V : g(x) \leq 0\}$

with a vector-valued function $g \colon V \to \mathbb{R}^{m}$ (the comparison means: no component of $g(x)$ may be positive).

Depending on the form of $V, X, f, g$, scalar optimization problems can be classified as follows:

• Variational problem: $V$ is infinite-dimensional, for example a function space.
• Optimal control problem: a classical variational problem with differential-equation constraints.
• Linear program (LP): $V = \mathbb{R}^{n}$ and $g = Ax - b$ with $A \in \mathbb{R}^{m \times n}$, so $g$ is affine and $f = c^{T}x$ is linear.
• Quadratic program (QP): as above, except that $f$ is a quadratic function.
• Nonlinear program: $f, g$ are arbitrary functions (usually assumed to be continuously differentiable; in the close vicinity of an optimum a quadratic approximation can often be used, which leads to some of the practical methods).
• Integer program (also discrete program): in addition, the admissible values of $x$ are restricted to integer or, more generally, discrete values.
• Stochastic program: some parameters in the description of $f, g$ are unknown (but their random distribution is known).
• Convex program: $X$ is convex and $f$ is convex (concave) if minimized (maximized). Convex programs contain as special cases:
  • Conic programs: generalized inequalities are used; otherwise all functions are affine. Conic programs in turn have three sub-areas:
    • Semidefinite programs use the cone of positive semidefinite matrices and thus have a matrix as variable.
    • SOCPs (second-order cone programs) use the second-order cone, which is also called the Lorentz cone.
    • Linear optimization problems can also be formulated as conic programs.
  • Under certain conditions, quadratic programs and quadratic programs with quadratic constraints also fall under convex optimization.
  • Geometric programs are not convex per se, but can be transformed into a convex problem.

Each of these sub-areas of optimization has solution procedures specially tailored to the structure of the problem.

Note on terminology: "program" is to be understood as a synonym for "optimization problem" (and not as "computer program"). The use of the term "program" is historically rooted: the first applications of optimization were military problems for which a program of actions was to be found.

### Vector optimization problems

An optimization problem from vector optimization (also called Pareto optimization), on the other hand, is a problem in which the values of several objective functions have to be optimized at the same time. This can be formalized by optimizing a vector-valued objective function $f \colon V \to \mathbb{R}^{n}$. Solutions that bring all components of the objective function to an optimum at the same time are usually not available; rather, there is in general a larger set of solutions, from which a single optimal point can be selected, for example by scalarization (weighting the individual components of the objective function).

## Linear and integer optimization

An important special case is linear optimization. Here the objective function is linear, and the constraints can be represented by a system of linear equations and inequalities. Every local optimum is automatically also a global optimum, since the admissible region is convex. There are pivoting methods that, in principle, compute the global optimum exactly, the best known of which are the simplex methods (not to be confused with the downhill simplex method mentioned below). Since the 1990s, however, there have also been efficient interior-point methods that can compete with the simplex method for certain classes of optimization problems.
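For intuition only, a tiny two-variable linear program can be solved by brute force: the optimum of an LP is attained at a vertex of the admissible polygon, and each vertex is the intersection of two constraint boundaries. This sketch (hypothetical helper name) is not a simplex or interior-point method and is only practical for toy problems:

```python
from itertools import combinations

def solve_lp_2d(c, A, b):
    """Maximize c.x subject to A x <= b by enumerating the vertices
    of the admissible polygon (intersections of constraint boundaries)."""
    best_x, best_val = None, float("-inf")
    for (a1, b1), (a2, b2) in combinations(zip(A, b), 2):
        det = a1[0] * a2[1] - a1[1] * a2[0]
        if abs(det) < 1e-12:
            continue  # parallel boundaries, no intersection point
        x = ((b1 * a2[1] - b2 * a1[1]) / det,
             (a1[0] * b2 - a2[0] * b1) / det)
        if all(ai[0] * x[0] + ai[1] * x[1] <= bi + 1e-9
               for ai, bi in zip(A, b)):  # keep only admissible vertices
            val = c[0] * x[0] + c[1] * x[1]
            if val > best_val:
                best_x, best_val = x, val
    return best_x, best_val

# maximize 3x + 2y subject to x + y <= 4, x <= 3, y <= 3, x >= 0, y >= 0
A = [(1, 1), (1, 0), (0, 1), (-1, 0), (0, -1)]
b = [4, 3, 3, 0, 0]
x_opt, v_opt = solve_lp_2d((3, 2), A, b)
```

The optimum here lies at the vertex (3, 1) with value 11; real solvers reach it without enumerating all intersections.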

A restriction to integer variables makes the problem much harder, but at the same time expands the possible applications. This so-called integer linear optimization is used, for example, in production planning, in scheduling, in route planning, and in the planning of telecommunication or transport networks.

## Nonlinear optimization

### Methods of local nonlinear optimization with constraints

The case of nonlinear optimization, in which the objective function, the constraints, or both are nonlinear, is harder than linear optimization. The solution is achieved by reducing the problem to the optimization of an auxiliary function without constraints. The methods of unconstrained nonlinear optimization below can then be applied to this new problem. The procedure can be illustrated with a contact problem: two balls in a trough try to occupy the lowest possible point but must not penetrate each other. The objective function is therefore the positional energy of the balls and attains a minimum in equilibrium. The constraint $g(x,y) \leq 0$ here describes the penetration of the balls $x$ and $y$, with negative penetration meaning a positive distance. There are five methods for constructing the auxiliary function:

1. Lagrange multipliers: the constraints are multiplied by real factors, the Lagrange multipliers, and added to the objective function. The factors are introduced into the problem as unknowns and must also be determined (in compliance with the Karush-Kuhn-Tucker conditions). In the ball example, the Lagrange multipliers are precisely the contact forces that the balls exert on one another when they touch, so that they do not penetrate each other.
2. Penalty functions: the constraints are represented by penalty functions that vanish in the admissible region and are negative when the constraint is violated. The penalty functions are multiplied by penalty parameters and added to the objective function (when maximizing, otherwise subtracted), so that violating the constraints is penalized, hence the name. Active constraints may be violated here, and the admissibility of the solution must be checked. In the ball picture, the penalty function corresponds to the actual penetration $\min(0, g(x,y))$, and the penalty parameter corresponds to a spring stiffness. The spring tries to pull penetrating points back to the surface. The stiffer the spring, the smaller the penetration.
3. Barrier functions: barrier functions are used like penalty functions. However, they already take nonzero values as the boundary of the admissible region is approached and grow to infinity on the boundary. In the ball picture, the balls would have a more or less thick coating that becomes more and more rigid the more it is compressed on contact. A violation of the constraints is prevented, at the price that even approaching the boundary is penalized.
4. Augmented Lagrangian method: this is a combination of the two previous methods. The Lagrange multiplier is determined iteratively based on the violation of the constraints.
5. Trivial, but worth mentioning: active constraints can be used to eliminate parameters of the objective function. The parameters are set to values such that a violation of the constraint is impossible. In the ball picture, the contact points of the balls would be coupled to each other (their coordinates equated) so that penetration can no longer take place there.
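A minimal sketch of the penalty-function approach (method 2 above), using a quadratic penalty and plain gradient descent with numerical gradients; the example problem, step size, and iteration count are illustrative choices, not from the source. Note that, as described above, the penalty solution slightly violates the constraint for any finite penalty parameter:

```python
def minimize_penalized(f, g, x0, penalty=10.0, lr=0.01, iters=20000):
    """Quadratic penalty method: minimize f(x) + penalty * max(0, g(x))**2
    (constraint g(x) <= 0) by gradient descent with central differences."""
    h = 1e-6
    def phi(x):
        return f(x) + penalty * max(0.0, g(x)) ** 2
    x = list(x0)
    for _ in range(iters):
        grad = []
        for i in range(len(x)):
            xp = list(x); xp[i] += h
            xm = list(x); xm[i] -= h
            grad.append((phi(xp) - phi(xm)) / (2 * h))
        x = [xi - lr * gi for xi, gi in zip(x, grad)]
    return x

# minimize x^2 + y^2 subject to x + y >= 1, i.e. g(x) = 1 - x - y <= 0;
# the exact constrained minimum is (0.5, 0.5)
f = lambda x: x[0] ** 2 + x[1] ** 2
g = lambda x: 1 - x[0] - x[1]
x_opt = minimize_penalized(f, g, [0.0, 0.0])
```

With penalty parameter 10 the computed point sits slightly inside the forbidden region (near (0.476, 0.476) rather than (0.5, 0.5)); increasing the penalty parameter pushes it toward the exact solution.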

### Methods of local nonlinear optimization without constraints

In the case of local optimization, the choice of method depends on the exact problem: Can the objective function be evaluated arbitrarily precisely? (This is often not the case with stochastic objective functions.) Is the objective function strictly monotonic in the neighborhood, only monotonic, or could there even be small relative extrema "along the way"? What is the cost of computing a gradient?

Examples of methods:

**Derivative-free methods**

These methods cost many iterations but are (in part) relatively robust against problems in the objective function, for example small relative extrema, and they do not require the computation of a gradient. The latter can be very costly if only a relatively imprecise result is sought.

**Methods requiring the first derivative**

These methods are faster than the derivative-free methods, as long as a gradient can be computed quickly, and they are similarly robust.

**Methods requiring the second derivative**

The Newton method is commonly known as a method for finding a zero and requires the first derivative. Accordingly, it can also be applied to the derivative of the objective function, since the optimization task amounts to finding the zeros of the first derivative. The Newton method is very fast but not very robust. If one is not sure about the "benignity" of the optimization problem, globalization strategies such as line search or trust-region methods must also be used.
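A bare Newton iteration on $f'(x) = 0$ might look like this (the example function is illustrative; there is no line search or trust region, so it is only reliable close to the optimum):

```python
def newton_minimize(df, ddf, x0, tol=1e-12, max_iter=50):
    """Newton's method applied to f'(x) = 0: very fast near a minimum,
    but not robust far away (hence the need for safeguards)."""
    x = x0
    for _ in range(max_iter):
        step = df(x) / ddf(x)
        x = x - step
        if abs(step) < tol:
            break
    return x

# f(x) = x**4 - 3*x**2 + x, so f'(x) = 4x^3 - 6x + 1, f''(x) = 12x^2 - 6
x_loc = newton_minimize(lambda x: 4 * x**3 - 6 * x + 1,
                        lambda x: 12 * x**2 - 6,
                        x0=1.0)
```

Starting from x0 = 1.0 the iteration converges to the nearby local minimum; a different starting point can land on a different stationary point, which is exactly the robustness issue the safeguards address.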

For the problems frequently occurring in practice, in which the objective function to be minimized has the special form of the squared norm of a vector-valued function (method of least squares), the Gauss-Newton method is available. It exploits the fact that, for functions of this form and under certain additional assumptions, the expensive second derivative (the Hessian matrix) can be approximated very well, without explicit computation, as a function of the Jacobian matrix. In this way a superlinear convergence speed similar to Newton's method is achieved near the optimum. Since this method inherits the stability problems of the Newton method, globalization and stabilization strategies are also required here in order to guarantee convergence at least to the nearest local minimum. A popular variant is the Levenberg-Marquardt algorithm.
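A one-parameter Gauss-Newton sketch for an exponential fit (the data, model, and helper names are hypothetical); the point is that the sum of squared Jacobian entries replaces the full Hessian:

```python
import math

def gauss_newton_1d(residual, jacobian, beta0, data, iters=20):
    """Gauss-Newton for a one-parameter least-squares fit: the Hessian of
    0.5 * sum(r_i**2) is approximated by sum(J_i**2), so no second
    derivatives are computed."""
    beta = beta0
    for _ in range(iters):
        r = [residual(beta, t, y) for t, y in data]
        J = [jacobian(beta, t, y) for t, y in data]
        beta -= sum(j * ri for j, ri in zip(J, r)) / sum(j * j for j in J)
    return beta

# Fit y = exp(beta * t) to noiseless data generated with beta = 0.5
data = [(t, math.exp(0.5 * t)) for t in (0.0, 0.5, 1.0, 1.5, 2.0)]
residual = lambda b, t, y: math.exp(b * t) - y   # model minus data
jacobian = lambda b, t, y: t * math.exp(b * t)   # d(residual)/d(beta)
beta_fit = gauss_newton_1d(residual, jacobian, beta0=0.0, data=data)
```

Because the residuals vanish at the solution, the approximation is exact there and convergence is fast, as the text describes.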

### Methods of global nonlinear optimization

In contrast to local optimization, global optimization is a largely unsolved problem in mathematics: there are practically no methods whose application in most cases yields a point that certainly, or even with high probability, represents the absolute extremum.

All methods for global optimization have in common that they repeatedly look for local minima / maxima according to a specific system.

Evolutionary algorithms are most often used here. They provide good results especially when the arrangement of the relative minima and maxima shows a certain regularity, knowledge of which can be inherited. A very good approach can also be to choose the starting points for the search for local minima/maxima randomly and then examine the search results for regularities using statistical methods.

In practice, however, the actual search criterion is often not sufficiently reflected upon. It is often much more important not to find the global optimum, but rather a parameter region within which as many relative minima as possible lie. Methods of cluster analysis or neural networks are suitable here.


The performance of optimization algorithms is often analyzed on test problems with a complex structure of minima or maxima for which the exact solution is known. An example of such a test function is the Rastrigin function.
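As a sketch, here is the Rastrigin function combined with a crude random-restart local search; the local search is naive coordinate descent, chosen purely for illustration (real global optimizers are far more sophisticated):

```python
import math
import random

def rastrigin(x):
    """Classic multimodal test function; global minimum 0 at the origin."""
    return 10 * len(x) + sum(xi**2 - 10 * math.cos(2 * math.pi * xi)
                             for xi in x)

def multistart(f, bounds, n_starts=50, seed=1):
    """Restart a crude local descent from random points, keep the best."""
    rng = random.Random(seed)
    best_x, best_val = None, float("inf")
    for _ in range(n_starts):
        x = [rng.uniform(lo, hi) for lo, hi in bounds]
        step = 0.5
        while step > 1e-6:  # coordinate descent with a shrinking step
            improved = False
            for i in range(len(x)):
                for d in (step, -step):
                    trial = list(x); trial[i] += d
                    if f(trial) < f(x):
                        x = trial; improved = True
            if not improved:
                step /= 2
        if f(x) < best_val:
            best_x, best_val = x, f(x)
    return best_x, best_val

x_best, v_best = multistart(rastrigin, [(-5.12, 5.12)] * 2)
```

Each restart ends in some local minimum; only restarts that happen to land in the central basins come close to the global minimum, which illustrates why the topology of the objective function dominates the difficulty.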

## Theoretical statements

When optimizing a differentiable function $f(x)$ without constraints, it is known that minima/maxima can only occur at points where $\nabla f(x) = 0$. This condition is exploited in the construction of many solution methods. In the case of optimization with constraints, there are analogous theoretical statements: duality and Lagrange multipliers.

### Convex problems

In the case of convex problems, the region to be searched and the objective function are convex. For a convex region, all points on the line segment connecting any two points of the region also lie entirely within the region. Mathematically:

$\lambda \in [0,1] \;\Rightarrow\; x + \lambda (y - x) \in X \quad \forall x, y \in X$

Examples of convex regions are circles, ellipses, triangles, and squares.

The objective function is convex if all objective function values at points on the line segment connecting two points of the region lie on or below the chord between the corresponding function values. Mathematically:

$\lambda \in [0,1] \;\Rightarrow\; f(x + \lambda (y - x)) \leq f(x) + \lambda (f(y) - f(x)) \quad \forall x, y \in X$.

The parabolic optimization problem shown at the beginning of this article has a convex objective function.

In the case of convex problems, every local minimum is also a global minimum. If points $x_i$, $i = 1, \dotsc, n$, are optimal, then all points "between" these points are optimal. Mathematically:

$\alpha_1, \alpha_2, \dotsc, \alpha_n \in [0,1] \;\wedge\; \sum_{i=1}^{n} \alpha_i = 1 \quad \Rightarrow \quad f\!\left(\sum_{i=1}^{n} \alpha_i x_i\right) = f_{\text{opt}}$.

### Duality

The (Lagrange) dual problem associated with the optimization problem $\min\, f(x)$ subject to $g(x) = 0$ is

$\max_{\lambda} \min_{x} \mathcal{L}(x, \lambda)$

where $\mathcal{L}(x, \lambda) = f(x) + \lambda^{T} g(x)$ is the associated Lagrange function. The $\lambda$ are called Lagrange multipliers, dual variables, or shadow prices. Intuitively, a violation of the constraints is permitted but penalized in the objective function by additional costs per violated unit. A solution $\lambda$ for which it is not worth violating the constraints solves the dual problem. For convex (in particular linear) problems, the value of the dual problem equals the value of the original problem. For linear and convex quadratic problems, the inner minimization can be solved in closed form, and the dual problem is again a linear or convex quadratic problem.
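As a worked illustration (a standard textbook-style example, not from the source): for $\min\, x^2$ subject to $x = 1$, the Lagrange function is $\mathcal{L}(x, \lambda) = x^2 + \lambda (x - 1)$. The inner minimization gives $x = -\lambda/2$, so

$\min_{x} \mathcal{L}(x, \lambda) = -\tfrac{\lambda^{2}}{4} - \lambda,$

which is maximized at $\lambda = -2$ with value $1$, matching the primal optimum $f(1) = 1$: strong duality holds for this convex problem.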

### Lagrange multipliers

The Lagrange multiplier theorem states that solutions of the constrained optimization problem $\min_x\, f(x)$ subject to $g(x) = 0$ can only be found at points $x$ for which there exist Lagrange multipliers $\lambda_i$ that satisfy the condition

$\nabla_x \mathcal{L}(x, \lambda) = \nabla f(x) + \sum_i \lambda_i \nabla g_i(x) = 0$

This condition is the direct generalization of the derivative condition above. Like that one, the Lagrange multiplier theorem provides a necessary condition for a minimum or maximum. A sufficient condition can be obtained by considering the second derivatives.
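A worked example of the condition (standard textbook material, not from the source): minimize $f(x, y) = x^2 + y^2$ subject to $g(x, y) = x + y - 1 = 0$. The condition $\nabla f + \lambda \nabla g = 0$ gives

$2x + \lambda = 0, \qquad 2y + \lambda = 0,$

so $x = y$; together with the constraint this yields $x = y = \tfrac{1}{2}$ and $\lambda = -1$, which is indeed the constrained minimum.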

The Lagrange multiplier theorem only applies if the constraints are given by equations. The generalization to inequality constraints, $\min_x \{f(x) : g(x) \leq 0\}$, is given by the Karush-Kuhn-Tucker theorem.

### Penalty functions

In the limit of the penalty parameters tending to infinity, the solution found with the penalty functions converges to the solution found with the Lagrange multipliers.

### Augmented Lagrangian method

In the limit of infinitely many iterations, the solution found with the augmented Lagrangian method also converges to the solution found with the Lagrange multipliers.


## Literature

• Walter Alt: Nonlinear Optimization - An Introduction to Theory, Methods and Applications. Vieweg, 2002, ISBN 3-528-03193-X.
• Yonathan Bard: Nonlinear Parameter Estimation. Academic Press, New York 1974, ISBN 0-12-078250-2.
• Hans Benker: Mathematical Optimization with Computer Algebra Systems. Springer-Verlag, Berlin/Heidelberg/New York 2003.
• S. Boyd, L. Vandenberghe: Convex Optimization. Cambridge University Press, 2004.
• W. Domschke, A. Drexl: Introduction to Operations Research. 7th edition. Springer, Berlin 2007, ISBN 978-3-540-70948-0.
• R. Fletcher: Practical Methods of Optimization. Wiley, 1987, ISBN 0-471-91547-5.
• U. Hoffmann, H. Hofmann: Introduction to Optimization: with Application Examples from Chemical Engineering. Verlag Chemie, Weinheim 1971, ISBN 3-527-25340-8.
• R. Horst, P. M. Pardalos (eds.): Handbook of Global Optimization. Kluwer, Dordrecht 1995, ISBN 0-7923-3120-6.
• F. Jarre, J. Stoer: Optimization. Springer, 2004, ISBN 3-540-43575-1.
• J. Nocedal, S. J. Wright: Numerical Optimization. Springer, Berlin 1999, ISBN 0-387-98793-2.
• C. Geiger, C. Kanzow: Theory and Numerics of Constrained Optimization Problems. Springer, 2002, ISBN 3-540-42790-2.
• C. Grossmann, J. Terno: Numerics of Optimization. Teubner, 1997, ISBN 3-519-12090-9.