# Convex optimization

Convex optimization is a branch of mathematical optimization.

A certain quantity, the so-called objective function, which depends on a parameter $x$, has to be minimized. In addition, certain constraints must be observed: the values of $x$ that can be selected are subject to restrictions, usually given in the form of equations and inequalities. If all constraints are met for a value $x$, it is called feasible. One speaks of a convex optimization problem or a convex program if both the objective function and the set of feasible points are convex. Many problems in practice are convex in nature. Often, for example, optimization is carried out over cuboids, which are always convex, and quadratic forms are often used as objective functions, as in quadratic optimization; these are also convex under certain conditions. Another important special case is linear optimization, in which a linear objective function is optimized over a convex polyhedron.

An important property of convex optimization, in contrast to non-convex optimization, is that every local optimum is also a global optimum. Intuitively, this means that a solution which is at least as good as all other solutions in its neighborhood is also at least as good as all feasible solutions. It therefore suffices to search for local optima.

## Problem

There are many possible formulations of a convex program. One of the most widely used, and mathematically one of the easiest to work with, is

${\displaystyle {\begin{aligned}{\text{Minimize }}&f(x)&\\{\text{under the constraints }}&g_{i}(x)\leq 0&i=1,\dots ,k\\&h_{j}(x)=0&j=1,\dots ,l\end{aligned}}}$

The input parameter $x$ comes from $\mathbb {R} ^{n}$, that is, the problem depends on $n$ influencing parameters. The objective function $f\colon D_{0}\rightarrow \mathbb {R}$ is convex on its domain $D_{0}$, as are the inequality restrictions $g_{i}\colon D_{i}\rightarrow \mathbb {R}$. The equality restrictions $h_{j}$ are affine functions defined on all of $\mathbb {R} ^{n}$, of the form $h_{j}(x)=a_{j}^{T}x-b_{j}$. The set

${\displaystyle {\mathcal {D}}:=\bigcap _{m=0}^{k}D_{m}}$

is then called the definition set of the convex program. It is the largest set on which all functions are defined and convex, and, as an intersection of convex sets, it is itself convex. The functions $g_{i}$ are the so-called inequality constraints and the functions $h_{j}$ the so-called equality constraints. The function $f$ is called the objective function, and the set

${\displaystyle {\mathcal {R}}:=\{x\in {\mathcal {D}}\subset \mathbb {R} ^{n}\,|\,g_{i}(x)\leq 0\,,h_{j}(x)=0\,{\text{ for all }}i,j\}}$

the restriction set of the problem. It is a convex set, since sub-level sets of convex functions are again convex.
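The standard form above can be written down directly in code. The following is a minimal sketch, assuming a problem given by callables; the names `ConvexProgram` and `is_feasible` are illustrative and not part of any standard library. It encodes the objective $f$, the inequality restrictions $g_{i}\leq 0$, and the equality restrictions $h_{j}=0$, together with a membership test for the restriction set ${\mathcal {R}}$.

```python
# Minimal sketch of a convex program in standard form:
#   minimize f(x)  subject to  g_i(x) <= 0,  h_j(x) = 0.
# The class and method names are illustrative.

class ConvexProgram:
    def __init__(self, f, ineqs=(), eqs=()):
        self.f = f          # objective f (assumed convex)
        self.ineqs = ineqs  # inequality restrictions g_i, required: g_i(x) <= 0
        self.eqs = eqs      # affine equality restrictions h_j, required: h_j(x) = 0

    def is_feasible(self, x, tol=1e-9):
        """Test whether x lies in the restriction set R."""
        return (all(g(x) <= tol for g in self.ineqs)
                and all(abs(h(x)) <= tol for h in self.eqs))

# One-dimensional instance: minimize (x - 2)^2 subject to x^2 - 1 <= 0.
prog = ConvexProgram(f=lambda x: (x[0] - 2.0) ** 2,
                     ineqs=[lambda x: x[0] ** 2 - 1.0])
print(prog.is_feasible([0.5]))  # True: inside [-1, 1]
print(prog.is_feasible([1.5]))  # False: violates x^2 - 1 <= 0
```

The representation carries no solver; it only fixes the data $(f, g_{i}, h_{j})$ that every formulation in this article refers to.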

### Variants

#### Concave objective function

In most cases, problems of the form “maximize $f(x)$ under convex constraints” are also called convex problems if $f$ is a concave function. Such a problem is equivalent (though not identical) to the above problem, which results from “minimize $-f(x)$ under convex constraints”, in the sense that every optimal point of the concave problem is also an optimal point of the convex problem. In general, however, the optimal values do not coincide.

#### Abstract convex optimization problem

Sometimes, problems of the form

${\displaystyle {\begin{aligned}{\text{Minimize }}&f(x)&\\{\text{under the constraints }}&x\in K,&K{\text{ a convex set}}\end{aligned}}}$

are referred to as abstract convex optimization problems. They have the same solvability properties as the problem above, but are more difficult to handle mathematically, since a criterion such as $x\in K$ is hard to grasp algorithmically. Usually one then looks for functions that describe the abstract constraint $x\in K$ by means of inequalities, in order to reduce the abstract problem to the problem above. Conversely, every convex problem can also be formulated as an abstract convex problem of the form “minimize $f(x)$ under the constraint $x\in {\mathcal {R}}$”, where ${\mathcal {R}}$ is the restriction set.

#### Mixed forms

The most general form of a convex problem is a mixed form that uses inequality restrictions, equality restrictions, and abstract constraints. For the reasons described above, however, this form is impractical to use.

## Solvability from a theoretical point of view

Abstract convex problems have some powerful properties that make it easier to find global minima:

• Every local minimum of the problem is always also a global minimum of the problem.
• The set of optimal points is convex. Indeed, if $x,y$ are optimal for the problem, then by convexity $f(\lambda x+(1-\lambda )y)\leq \lambda f(x)+(1-\lambda )f(y)=f(x)$, since $f(x)=f(y)$. On the other hand, $f(x)=f(y)\leq f(z)$ for all points $z$ of the restriction set ${\mathcal {R}}$, because $x$ and $y$ are global minima. Thus $f(\lambda x+(1-\lambda )y)=f(x)$ for all $\lambda \in [0,1]$, so every point on the segment between $x$ and $y$ is optimal.
• If the objective function is strictly convex, the optimal point is unique.

Since any formulation of a convex problem can be transformed into an abstract problem, all of these properties carry over to every formulation of the problem.
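These properties can be illustrated numerically. The following sketch (hypothetical; gradient descent, the step size, and the iteration count are illustrative choices, not part of this article) minimizes the strictly convex function $f(x)=(x-2)^{2}$: since every local minimum is global, plain gradient descent reaches the same minimizer from any starting point.

```python
# Gradient descent on the strictly convex f(x) = (x - 2)^2.
# Every local minimum of a convex problem is global, so every
# starting point leads to the same minimizer x = 2.

def gradient_descent(grad, x0, step=0.1, iters=200):
    x = x0
    for _ in range(iters):
        x -= step * grad(x)
    return x

grad_f = lambda x: 2.0 * (x - 2.0)  # derivative of (x - 2)^2

results = [gradient_descent(grad_f, start) for start in (-10.0, 0.0, 7.5)]
print([round(x, 6) for x in results])  # -> [2.0, 2.0, 2.0]
```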

## History


The discipline of convex optimization emerged from convex analysis, among other fields. The first optimization technique, the gradient method, goes back to Gauss. In 1947 the simplex method was introduced by George Dantzig. Interior point methods were first presented by Fiacco and McCormick in 1968. In 1976 and 1977 the ellipsoid method for solving convex optimization problems was developed by David Yudin and Arkadi Nemirovski and, independently, by Naum Shor. In 1984 Narendra Karmarkar described the first potentially practical polynomial-time algorithm for linear problems. In 1994 Arkadi Nemirovski and Yurii Nesterov developed interior point methods that could solve large classes of convex optimization problems in polynomial time.

The necessary optimality conditions for inequality-constrained problems, now known as the Karush-Kuhn-Tucker conditions, were first stated in 1939 in the unpublished master's thesis of William Karush. They only became widely known, however, after a 1951 conference paper by Harold W. Kuhn and Albert W. Tucker.

Prior to 1990, convex optimization was used mainly in operations research and less in engineering. Since 1990, however, more and more engineering applications have emerged, among them control and signal processing, communication, and circuit design. The approach has also proved particularly efficient in structural mechanics. In addition, new problem classes such as semidefinite optimization, second-order cone optimization, and robust optimization arose.

## Example

As an example, consider a one-dimensional problem without equation constraints and with only one inequality constraint:

Minimize

${\displaystyle f(x)=(x-2)^{2}}$ with ${\displaystyle x\in K=[0,\infty )}$

under the constraint:

${\displaystyle g(x)=x^{2}-1\leq 0}$

The feasible region is given by the convex set

${\displaystyle \{x\in K:g(x)\leq 0\}=[0,1]}$,

because for values $x>1$ the condition $g(x)\leq 0$ is not met. Since $f(x)=(x-2)^{2}$ is decreasing on $[0,1]$, the objective attains its optimum at $x=1$, with optimal value $1$.
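The optimum of this example can also be found numerically. The following sketch uses projected gradient descent, a method not discussed in this article (step size and iteration count are illustrative choices): each gradient step on $f$ is followed by clipping back onto the feasible interval $[0,1]$.

```python
# Projected gradient descent for
#   minimize (x - 2)^2  subject to  x in [0, 1].
# The unconstrained minimizer x = 2 is infeasible, so the
# iteration converges to the boundary point x = 1.

def project(x, lo=0.0, hi=1.0):
    """Projection onto the feasible interval [lo, hi]."""
    return max(lo, min(hi, x))

x = 0.0  # feasible starting point
for _ in range(200):
    x = project(x - 0.1 * 2.0 * (x - 2.0))  # gradient step, then projection

print(x)  # -> 1.0
```

Because the feasible interval is convex, the projection is well defined and the method cannot get stuck at a non-global point.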

## Classification and generalizations

Convex optimization contains further classes of optimization problems, all of which are characterized by a special structure, for example linear optimization and, under certain conditions, quadratic optimization.

A distinction is also made as to whether the functions used are continuously differentiable or not.

Under certain conditions, further problem classes also fall under convex optimization, for example semidefinite optimization and second-order cone optimization.

Generalizations that still preserve some properties of convex functions make certain extended concepts of convex optimization possible:

• Pseudoconvex functions are a class of differentiable functions that have a global minimum when the derivative vanishes.
• For quasi-convex functions , the sub-level sets are all convex. If one allows quasi-convex functions as inequality restrictions, the restriction set as the intersection of the sub-level sets is still a convex set. An abstract convex optimization problem is thus obtained.
• K-convex functions use generalized inequalities to generalize convexity to partial orders on $\mathbb {R} ^{k_{i}}$. To this end, $l$ proper cones $K_{i}$ are defined and, accordingly, functions $g_{i}\colon \mathbb {R} ^{n}\to \mathbb {R} ^{k_{i}}$ that are K-convex with respect to the cone $K_{i}$ and the generalized inequality $\preccurlyeq _{K_{i}}$. The problem then is
${\displaystyle {\begin{aligned}{\text{Minimize }}&f(x)&\\{\text{under the constraints }}&g_{i}(x)\preccurlyeq _{K_{i}}0&i=1,\dots ,l\\&Hx-b=0&\end{aligned}}}$
for a convex function $f$ and a suitably dimensioned matrix $H$ and vector $b$. Since the sub-level sets of a K-convex function are also convex, this again yields an abstract convex optimization problem.
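The quasi-convex case from the list above can be illustrated with a small check (a hypothetical sketch; the function $f(x)={\sqrt {|x|}}$, the level $c=0.8$, and the sampling grid are illustrative choices): $f$ violates the convexity inequality, yet a sampled sub-level set forms a contiguous interval and is therefore convex.

```python
import math

# f(x) = sqrt(|x|) is quasi-convex but not convex: its sub-level
# sets {x : f(x) <= c} = [-c^2, c^2] are intervals, hence convex,
# even though the convexity inequality fails.

f = lambda x: math.sqrt(abs(x))

# Convexity fails for x = 0, y = 1, lambda = 1/2:
lhs = f(0.5 * 0.0 + 0.5 * 1.0)     # f(0.5), about 0.707
rhs = 0.5 * f(0.0) + 0.5 * f(1.0)  # 0.5
print(lhs > rhs)                   # -> True: not convex

# Sampled sub-level set for c = 0.8 on a grid of width 0.01:
level_set = [i / 100.0 for i in range(-200, 201) if f(i / 100.0) <= 0.8]
step_ok = all(abs((b - a) - 0.01) < 1e-9
              for a, b in zip(level_set, level_set[1:]))
print(step_ok)                     # -> True: the sampled set is one interval
```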

## Optimality conditions

There are some important optimality criteria for convex problems. First the necessary criteria are described, that is: if an optimum is attained at a point, then these criteria hold there. Then come the sufficient criteria: if these criteria hold at a point, then that point is optimal.

### Necessary criteria

#### Karush-Kuhn-Tucker conditions

The Karush-Kuhn-Tucker conditions (also known as the KKT conditions) are a generalization of the Lagrange multipliers to optimization problems with inequality constraints and are also used in advanced neoclassical theory. A feasible point $x^{*}$ of the convex problem satisfies the KKT conditions with multipliers $\mu ^{*},\lambda ^{*}$ if the following holds:

${\displaystyle \nabla f(x^{*})+\sum _{i=1}^{m}\mu _{i}^{*}\nabla g_{i}(x^{*})+\sum _{j=1}^{l}\lambda _{j}^{*}\nabla h_{j}(x^{*})=0}$
${\displaystyle g_{i}(x^{*})\leq 0,{\text{ for }}i=1,\ldots ,m}$
${\displaystyle h_{j}(x^{*})=0,{\text{ for }}j=1,\ldots ,l}$
${\displaystyle \mu _{i}^{*}\geq 0,{\text{ for }}i=1,\ldots ,m}$
${\displaystyle \mu _{i}^{*}g_{i}(x^{*})=0,{\text{ for }}i=1,\ldots ,m.}$

These conditions are called the Karush-Kuhn-Tucker conditions or KKT conditions for short .

A point $(x^{*},\mu ^{*},\lambda ^{*})\in \mathbb {R} ^{n+m+l}$ is then called a Karush-Kuhn-Tucker point, or KKT point for short, of the above optimization problem if it fulfills the above conditions.

The actual necessary optimality criterion is now: if a point $x^{*}$ is a local (and, due to convexity, also global) minimum of the convex problem, and if it fulfills certain regularity requirements, then there exist $\mu ^{*},\lambda ^{*}$ such that $(x^{*},\mu ^{*},\lambda ^{*})$ is a KKT point. Common regularity conditions (also called constraint qualifications) are the LICQ, the MFCQ, the Abadie CQ or, especially for convex problems, the Slater condition. Further regularity requirements can be found in the main article on the KKT conditions.
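As an illustration, the KKT conditions can be checked for the one-dimensional problem “minimize $f(x)=(x-2)^{2}$ subject to $g(x)=x^{2}-1\leq 0$”. The candidate point $x^{*}=1$ with multiplier $\mu ^{*}=1$ is an assumption that the code then verifies; there are no equality constraints in this instance.

```python
# KKT check for: minimize f(x) = (x - 2)^2  s.t.  g(x) = x^2 - 1 <= 0.
# Candidate point x* = 1 with multiplier mu* = 1; no equality constraints.

x_star, mu_star = 1.0, 1.0

grad_f = lambda x: 2.0 * (x - 2.0)  # f'(x)
grad_g = lambda x: 2.0 * x          # g'(x)
g = lambda x: x * x - 1.0

stationarity = grad_f(x_star) + mu_star * grad_g(x_star)  # must equal 0
primal_feasible = g(x_star) <= 0.0                        # g(x*) <= 0
dual_feasible = mu_star >= 0.0                            # mu* >= 0
complementary = mu_star * g(x_star)                       # must equal 0

print(stationarity, primal_feasible, dual_feasible, complementary)
# -> 0.0 True True 0.0
```

All four conditions hold, so $(x^{*},\mu ^{*})$ is a KKT point of this problem.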

#### Fritz John Conditions

The Fritz-John conditions, or FJ conditions, are a generalization of the KKT conditions and, in contrast to these, do not require any regularity conditions. Under certain circumstances the two are equivalent. A point $(z^{*},x^{*},\mu ^{*},\lambda ^{*})\in \mathbb {R} ^{1+n+m+l}$ is called a Fritz-John point, or FJ point for short, of the convex problem if it satisfies the following conditions:

${\displaystyle z^{*}\nabla f(x^{*})+\sum _{i=1}^{m}\mu _{i}^{*}\nabla g_{i}(x^{*})+\sum _{j=1}^{l}\lambda _{j}^{*}\nabla h_{j}(x^{*})=0}$
${\displaystyle g_{i}(x^{*})\leq 0,{\text{ for }}i=1,\ldots ,m}$
${\displaystyle h_{j}(x^{*})=0,{\text{ for }}j=1,\ldots ,l}$
${\displaystyle \mu _{i}^{*}\geq 0,{\text{ for }}i=1,\ldots ,m}$
${\displaystyle \mu _{i}^{*}g_{i}(x^{*})=0,{\text{ for }}i=1,\ldots ,m}$
${\displaystyle z^{*}\geq 0}$

These conditions are called the Fritz-John conditions, or FJ conditions for short.

If the point $x^{*}$ is a local (and, due to convexity, also global) minimum of the optimization problem, then there exist $z^{*},\mu ^{*},\lambda ^{*}$ such that $(z^{*},x^{*},\mu ^{*},\lambda ^{*})$ is an FJ point and $(z^{*},\mu ^{*},\lambda ^{*})$ is not the zero vector.

### Sufficient criteria

If $(x^{*},\mu ^{*},\lambda ^{*})$ is a KKT point, then $x^{*}$ is a global minimum of the convex problem. Thus in the convex case the KKT conditions are already sufficient for optimality; in particular, no further regularity requirements are needed. Since it can be shown that a KKT point can be constructed from every FJ point with $z^{*}>0$, the FJ conditions are likewise sufficient for optimality whenever $z^{*}>0$ holds.

### Criteria for non-differentiable functions

If some of the functions of the convex optimization problem are not differentiable, one can still fall back on the saddle-point characterization of optimal points. Using the Lagrange function, it can be shown that every saddle point of the Lagrange function yields an optimal solution. Conversely, if $x^{*}$ is an optimal solution and the Slater condition is fulfilled, then $x^{*}$ can be extended to a saddle point of the Lagrange function.

## Concrete procedure

### Lagrange function

First the following abbreviated notation is introduced:

${\displaystyle L(x,\lambda )=f(x)+\sum _{i=1}^{m}\mu _{i}g_{i}(x)+\sum _{j=1}^{l}\nu _{j}h_{j}(x)}$,

where $\lambda =(\mu ,\nu )$ is the vector of all multipliers.

### Lagrangian multiplier rule for the convex problem

Compare with the Lagrangian multiplier rule. The concrete procedure is:

• Check whether all functions that occur are continuously partially differentiable. If not, this rule does not apply.
• Is there a feasible point ${\hat {x}}$ with $\nabla f({\hat {x}})=0$? If so, then ${\hat {x}}$ is optimal. Otherwise go to the next step.
• Compute the gradient $\nabla _{x}L(x,\lambda )$ of the Lagrange function.
• Solve the system $\nabla _{x}L({\hat {x}},\lambda )(x-{\hat {x}})\geq 0$ for all $x\in K$, where no multiplier may be negative. If a restriction is not active, the associated multiplier must even equal $0$. If a solution ${\hat {x}}$ is found, it is optimal.
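The steps above can be sketched for the problem “minimize $(x-2)^{2}$ subject to $x^{2}-1\leq 0$” (an illustrative walk-through; solving the stationarity equation in closed form is specific to this one-dimensional example):

```python
# Lagrangian multiplier rule applied to
#   minimize (x - 2)^2  subject to  x^2 - 1 <= 0.

# Step 1: f(x) = (x - 2)^2 and g(x) = x^2 - 1 are continuously differentiable.
# Step 2: the unconstrained stationary point of f is x = 2, but
#         g(2) = 3 > 0, so it is infeasible -> continue.
# Step 3: grad_x L(x, mu) = 2 * (x - 2) + 2 * mu * x.
# Step 4: the restriction must be active at the optimum (g(x) = 0, x = 1);
#         solve 2 * (x - 2) + 2 * mu * x = 0 for the multiplier mu.

g = lambda x: x ** 2 - 1.0
assert g(2.0) > 0.0          # step 2: unconstrained minimizer is infeasible

x_hat = 1.0                  # active restriction: g(x_hat) = 0
mu = (2.0 - x_hat) / x_hat   # from 2*(x - 2) + 2*mu*x = 0
assert mu >= 0.0             # the multiplier may not be negative

print(x_hat, mu)             # -> 1.0 1.0
```

The nonnegative multiplier confirms that $x=1$ is the optimal point of this example.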
