Lagrange duality

The Lagrange duality is an important duality in mathematical optimization , which provides both optimality criteria using the Karush-Kuhn-Tucker conditions or the Lagrange multipliers and makes equivalent reformulations of optimization problems possible.

Lagrange function, dual problem

The following optimization problem is given:

{\ displaystyle {\ begin {aligned} {\ text {Minimize}} & f (x) & \\ {\ text {under the constraints}} & g_ {i} (x) \ leq 0 & i = 1, \ dotsc, p \ \ & h_ {j} (x) = 0 & j = 1, \ dotsc, q \\ & x \ in X & \ end {aligned}}}

The abstract restriction can include requirements such as (integer) or for a cone . The functions that occur do not have to be convex or differentiable . ${\ displaystyle x \ in X}$ ${\ displaystyle x \ in \ mathbb {Z} ^ {n}}$ ${\ displaystyle x \ in {\ mathcal {K}}}$ ${\ displaystyle {\ mathcal {K}}}$

The function

{\ displaystyle L (x, \ lambda, \ mu): = f (x) + \ sum _ {i = 1} ^ {p} \ lambda _ {i} g_ {i} (x) + \ sum _ { j = 1} ^ {q} \ mu _ {j} h_ {j} (x)}

is then called the Lagrange function belonging to the above optimization problem . Occasionally the functions and the scalars are combined into vectors and . The Lagrange function is then simplified to ${\ displaystyle g_ {i}, h_ {j}}$ ${\ displaystyle \ lambda _ {i}, \ mu _ {j}}$ ${\ displaystyle g (x) = (g_ {1} (x), \ dots, g_ {p} (x)) ^ {T}, h (x) = (h_ {1} (x), \ dots, h_ {q} (x)) ^ {T}}$ ${\ displaystyle \ lambda = (\ lambda _ {1}, \ dots, \ lambda _ {p}) ^ {T}, \ mu = (\ mu _ {1}, \ dots, \ mu _ {q}) ^ {T}}$

{\ displaystyle L (x, \ lambda, \ mu): = f (x) + \ lambda ^ {T} g (x) + \ mu ^ {T} h (x).}

${\ displaystyle (\ lambda, \ mu)}$ are called dual variables or Lagrange multipliers . The function

{\ displaystyle q (\ lambda, \ mu) = \ inf _ {x \ in X} L (x, \ lambda, \ mu)}

is then called the dual function for the above optimization problem. The optimization problem

{\ displaystyle {\ begin {aligned} {\ text {Maximize}} & q (\ lambda, \ mu) \\ {\ text {under the constraints}} & \ lambda \ geq 0 \ end {aligned}}}

is called the dual problem of the above optimization problem. This is to be understood as components. The original problem is then also referred to as the primary problem . In general, the dual function does not have to be real-valued, it can also take on the value . Then you define ${\ displaystyle \ lambda \ geq 0}$ ${\ displaystyle - \ infty}$

{\ displaystyle \ operatorname {dom} _ {q}: = \ {(\ lambda, \ mu) \ in \ mathbb {R} ^ {p} \ times \ mathbb {R} ^ {q} \, | \, \ lambda \ geq 0, q (\ lambda, \ mu)> - \ infty \}}

as the essential domain of definition of the dual function. The dual variables are said to be dual admissible if they are admissible with respect to the dual problem, that is, if holds. ${\ displaystyle (\ lambda, \ mu)}$ ${\ displaystyle \ lambda \ geq 0}$

example

Consider, as an example, a linear optimization problem in inequality form:

{\ displaystyle {\ begin {aligned} {\ text {Minimize}} & c ^ {T} x \\ {\ text {under the constraints}} & Ax-b \ leq 0 \ end {aligned}}}

There is and . For the sake of completeness, it is set , which in this case does not mean a restriction. The Lagrange function is then ${\ displaystyle x, c \ in \ mathbb {R} ^ {n} \ ,, b \ in \ mathbb {R} ^ {m}}$ ${\ displaystyle A \ in \ mathbb {R} ^ {m \ times n}}$ ${\ displaystyle X = \ mathbb {R} ^ {n}}$

{\ displaystyle L (x, \ lambda) = c ^ {T} x + \ lambda ^ {T} (Ax-b) = (c ^ {T} + \ lambda ^ {T} A) x- \ lambda ^ { T} b}

.

Is , so is independent of and thus is . If, however , it is unlimited in a downward direction and therefore applies . So the dual function is: ${\ displaystyle c ^ {T} = - \ lambda ^ {T} A}$ ${\ displaystyle L (x, \ lambda)}$ ${\ displaystyle x}$ ${\ displaystyle \ inf _ {x \ in X} L (x, \ lambda) = - \ lambda ^ {T} b}$ ${\ displaystyle c ^ {T} \ neq - \ lambda ^ {T} A}$ ${\ displaystyle L (x, \ lambda)}$ ${\ displaystyle \ inf _ {x \ in X} L (x, \ lambda) = - \ infty}$

{\ displaystyle q (\ lambda) = {\ begin {cases} - \ lambda ^ {T} b & {\ text {falls}} c ^ {T} + \ lambda ^ {T} A = 0 \\ - \ infty & {\ text {otherwise}} \ end {cases}}}

If one writes the case distinction from the dual function as a secondary condition in the dual problem, one obtains:

{\ displaystyle {\ begin {aligned} {\ text {Maximize}} & - \ lambda ^ {T} b \\ {\ text {under the constraints}} & A ^ {T} \ lambda + c = 0 \\ & \ lambda \ geq 0 \ end {aligned}}}

This is exactly a linear optimization problem in standard form.

Properties of the dual problem

The definition set of the dual function is convex . ${\ displaystyle \ operatorname {dom} _ {q}}$

The dual function is concave . For the fixed is an affine function and thus is concave as a point-wise infimum of affine functions. Thus the dual problem is always a convex optimization problem , regardless of the structure of the primal problem. ${\ displaystyle {\ tilde {x}}}$ ${\ displaystyle L ({\ tilde {x}}, \ lambda, \ mu)}$ ${\ displaystyle q (\ lambda, \ mu)}$

Hence convex optimization problems are a class of problems whose dual problem is again in the same problem class. Further such classes are the linear optimization problems (see example), the conical programs as well as the semidefinite programs and the SOCPs .

Weak duality

Let be the restriction set of the primal problem and the definition set of the dual problem. Then applies to all and ${\ displaystyle R_ {P}}$ ${\ displaystyle \ operatorname {dom} _ {q}}$ ${\ displaystyle x \ in R_ {P}}$ ${\ displaystyle (\ lambda, \ mu) \ in \ operatorname {dom} _ {q}}$

{\ displaystyle q (\ lambda, \ mu) \ leq f (x).}

The value is then called the duality gap (of the permissible point ). The dual function is therefore always a lower bound for the objective function of the primal problem. In this way, the dual problem can also be motivated: It asks the question of the largest lower bound for the primal problem that is still admissible. ${\ displaystyle f (x) -q (\ lambda, \ mu)}$ ${\ displaystyle (x, \ lambda, \ mu)}$

If the set of values is the primal problem and that of the dual problem, then applies ${\ displaystyle P = f (R_ {P})}$ ${\ displaystyle D = q (\ operatorname {dom} _ {q})}$

{\ displaystyle \ sup (D) \ leq \ inf (P)}

.

The value of the dual optimal solution is therefore always smaller than the value of the primal optimal solution. This statement is also called weak duality . The value is then called the optimal duality gap . ${\ displaystyle \ inf (P) - \ sup (D)}$

These statements follow directly from

{\ displaystyle {\ begin {aligned} q (\ lambda, \ mu) & = \ inf _ {x \ in X} L (x, \ lambda, \ mu) \ leq L ({\ tilde {x}}, \ lambda, \ mu) \\ & = f ({\ tilde {x}}) + \ sum _ {i = 1} ^ {p} \ lambda _ {i} g_ {i} ({\ tilde {x} }) + \ sum _ {j = 1} ^ {q} \ mu _ {j} h_ {j} ({\ tilde {x}}) \ leq f ({\ tilde {x}}) \ quad {\ text {for all}} (\ lambda, \ mu) \ in \ operatorname {dom} _ {q} {\ text {and}} {\ tilde {x}} \ in R_ {P}. \ end {aligned} }}

The last inequality follows, da and (admissibility of ) and (admissibility of ) and thus and . Since the inequality holds for any and , the two statements above follow. ${\ displaystyle g_ {i} ({\ tilde {x}}) \ leq 0}$ ${\ displaystyle h_ {j} ({\ tilde {x}}) = 0}$ ${\ displaystyle {\ tilde {x}}}$ ${\ displaystyle \ lambda _ {i} \ geq 0}$ ${\ displaystyle \ lambda}$ ${\ displaystyle \ lambda _ {i} g_ {i} ({\ tilde {x}}) \ leq 0}$ ${\ displaystyle \ mu _ {i} h_ {i} ({\ tilde {x}}) = 0}$ ${\ displaystyle (\ lambda, \ mu) \ in \ operatorname {dom_ {q}}}$ ${\ displaystyle {\ tilde {x}} \ in R_ {P}}$

Strong duality

Under certain circumstances the optimal value of the primal problem and that of the dual problem coincide, so the optimal duality gap is zero. It then applies

{\ displaystyle \ sup (D) = \ inf (P)}

.

This case is then called strong duality . If strong duality applies and the optimal value is finite and is assumed in or , then the following applies: ${\ displaystyle x ^ {*}}$ ${\ displaystyle (\ lambda ^ {*}, \ mu ^ {*})}$

{\ displaystyle \ sup (D) = \ inf (P) = f (x ^ {*}) = q (\ lambda ^ {*}, \ mu ^ {*}) = L (x ^ {*}, \ lambda ^ {*}, \ mu ^ {*})}

In general, strong duality does not apply, but further regularity requirements ( constraint qualifications ) for the problem must be fulfilled. One of the most important prerequisites for convex optimization problems and almost convex functions , under which strong duality applies, is, for example, the Slater condition .

Complementary slip

Does the strong duality are and primal and dual optimal and is the optimum value is finite, then applies the complementary slackness , even complementary slip called: ${\ displaystyle x ^ {*}}$ ${\ displaystyle (\ lambda ^ {*}, \ mu ^ {*})}$

{\ displaystyle \ lambda _ {i} g_ {i} (x ^ {*}) = 0 {\ text {for all}} i = 1, \ dotsc, p}

If the -th Lagrange multiplier (the -th inequality restriction) is not equal to zero, the -th inequality restriction (the -th Lagrange multiplier) must be equal to zero: ${\ displaystyle i}$ ${\ displaystyle i}$ ${\ displaystyle i}$ ${\ displaystyle i}$

{\ displaystyle {\ begin {aligned} \ lambda _ {i}> 0 \ implies g_ {i} (x ^ {*}) = 0 \\ g_ {i} (x ^ {*}) <0 \ implies \ lambda _ {i} = 0 \ end {aligned}}}

This follows from and . So at least one of the two factors must always be zero. ${\ displaystyle \ lambda _ {i} \ geq 0}$ ${\ displaystyle g_ {i} (x ^ {*}) \ leq 0}$

Saddle points

A point is called a saddle point of the Lagrange function, if for all with ${\ displaystyle (x ^ {*}, \ lambda ^ {*}, \ mu ^ {*})}$ ${\ displaystyle (x, \ lambda, \ mu)}$ ${\ displaystyle \ lambda \ geq 0}$

{\ displaystyle L (x ^ {*}, \ lambda, \ mu) \ leq L (x ^ {*}, \ lambda ^ {*}, \ mu ^ {*}) \ leq L (x, \ lambda ^ {*}, \ mu ^ {*})}

applies. Is equivalent to

{\ displaystyle \ inf _ {x \ in X} \ sup _ {(\ lambda, \ mu) \ in dom_ {q}} L (x, \ lambda, \ mu) = L (x ^ {*}, \ lambda ^ {*}, \ mu ^ {*}) = \ sup _ {(\ lambda, \ mu) \ in dom_ {q}} \ inf _ {x \ in X} L (x, \ lambda, \ mu ).}

This means that there is a maximum of the Lagrange function for fixed and a minimum of the Lagrange function for fixed . ${\ displaystyle (\ lambda ^ {*}, \ mu ^ {*})}$ ${\ displaystyle x ^ {*}}$ ${\ displaystyle x ^ {*}}$ ${\ displaystyle (\ lambda ^ {*}, \ mu ^ {*})}$

It can be shown that a saddle point of the Lagrange function is precisely when primal is optimal, dual is optimal and strong duality applies. ${\ displaystyle (x ^ {*}, \ lambda ^ {*}, \ mu ^ {*})}$ ${\ displaystyle x ^ {*}}$ ${\ displaystyle (\ lambda ^ {*}, \ mu ^ {*})}$

Saddle points play an important role in determining optimal points and suggest a connection to the Karush-Kuhn-Tucker conditions and the Lagrange multipliers . If, for example, all the functions involved are continuously differentiable, the saddle point criterion can be used to derive that at an optimum point ${\ displaystyle x ^ {*}}$

{\ displaystyle \ nabla _ {x} L (x ^ {*}, \ lambda ^ {*}, \ mu ^ {*}) = 0}

must apply, since minimized after the saddle point characterization . This requirement can be found, for example, in the Karush-Kuhn-Tucker conditions for determining an optimal point and as a necessary optimality criterion . ${\ displaystyle x ^ {*}}$ ${\ displaystyle L (x, \ lambda ^ {*}, \ mu ^ {*})}$

More general formulations

Formulation for Hilbert dreams

If one considers an optimization problem with generalized inequalities between real Hilbert spaces (i.e. real complete vector spaces that are provided with a scalar product ), this is usually of the following form:

{\ displaystyle {\ begin {aligned} {\ text {Minimize}} & f (x) & \\ {\ text {under the constraints}} & g_ {i} (x) \ preccurlyeq _ {K_ {i}} 0 & i = 1, \ dotsc, p \\ & h_ {j} (x) = 0 & j = 1, \ dotsc, q \\ & x \ in X & \ end {aligned}}}

Here are the objective function, the inequality restriction functions and the equality restriction functions . They are equipped with the real cone that induces the generalized inequality . The scalar product belonging to the vector space is denoted by. Then you bet . Where and . Then the Lagrange function is of form ${\ displaystyle f: V_ {0} \ mapsto \ mathbb {R}}$ ${\ displaystyle g_ {i}: V_ {0} \ mapsto V_ {i}}$ ${\ displaystyle h_ {j}: V_ {0} \ mapsto V_ {j}}$ ${\ displaystyle V_ {i}}$ ${\ displaystyle K_ {i}}$ ${\ displaystyle \ preccurlyeq _ {K_ {i}}}$ ${\ displaystyle V_ {s}}$ ${\ displaystyle \ langle \ cdot; \ cdot \ rangle _ {s}}$ ${\ displaystyle \ lambda: = (\ lambda _ {1}, \ dots, \ lambda _ {p}) ^ {T}, \ mu: = (\ mu _ {1}, \ dots, \ mu _ {q }) ^ {T}}$ ${\ displaystyle \ lambda _ {i} \ in V_ {i}}$ ${\ displaystyle \ mu _ {j} \ in V_ {j}}$

{\ displaystyle L (x, \ lambda, \ mu): = f (x) + \ sum _ {i = 1} ^ {p} \ langle \ lambda _ {i}; g_ {i} (x) \ rangle _ {i} + \ sum _ {j = 1} ^ {q} \ langle \ mu _ {j}; h_ {j} (x) \ rangle _ {j}}

and the dual function is

{\ displaystyle q (\ lambda, \ mu) = \ inf _ {x \ in X} L (x, \ lambda, \ mu).}

The dual problem is then:

{\ displaystyle {\ begin {aligned} {\ text {Maximize}} & q (\ lambda, \ mu) & \\ {\ text {under the constraints}} & \ lambda _ {i} \ succcurlyeq _ {K ^ { D}} & i = 1, \ dotsc, p \ end {aligned}}}

,

Where is the dual cone of . ${\ displaystyle K ^ {D}}$ ${\ displaystyle K}$

Alternative formulations summarize all inequality restrictions and equality restrictions:

{\ displaystyle g (x) = (g_ {1} (x), \ dots, g_ {p} (x)) ^ {T} \ ,, \, h (x) = (h_ {1} (x) , \ dots, h_ {j} (x)) ^ {T}, K: = K_ {1} \ times \ dots \ times K_ {p}}

This leads to a more compact notation that does without summation symbols and indices and is therefore preferred for theoretical considerations. Alternatively, the requirement for a real cone is weakened; instead, the inequality is defined only by an order cone , which must be at least convex. Sometimes we completely dispense with constraints of equality, instead they are modeled as an order cone with an empty interior. Instead of demanding, you define a cone . Then applies if and only if . For all three variants, the Lagrange function and the dual problem are given in the table below. The dual function is always of the form or, if only one dual variable is used, of the form . ${\ displaystyle g (x) \ in -K \ ,, \, h (x) = 0}$ ${\ displaystyle K '= K \ times \ {0 \}}$ ${\ displaystyle (g (x), h (x)) \ in -K '}$ ${\ displaystyle g (x) \ in -K \ ,, \, h (x) = 0}$ ${\ displaystyle q (\ lambda, \ mu) = \ inf _ {x \ in X} L (x, \ lambda, \ mu)}$ ${\ displaystyle q (\ lambda) = \ inf _ {x \ in X} L (x, \ lambda)}$

Primal problem	Lagrange function	Dual problem
${\ displaystyle {\ begin {aligned} {\ text {Minimize}} & f (x) & \\ {\ text {under the constraints}} & g (x) \ preccurlyeq _ {K} 0 \\ & h (x) = 0 \\ & x \ in X & \ end {aligned}}}$	${\ displaystyle L (x, \ lambda, \ mu) = f (x) + \ langle \ lambda; g (x) \ rangle _ {1} + \ langle \ mu; h (x) \ rangle _ {2} }$	${\ displaystyle {\ begin {aligned} {\ text {Maximize}} & q (\ lambda, \ mu) & \\ {\ text {under the constraints}} & \ lambda \ succcurlyeq _ {K ^ {D}} 0 \ end {aligned}}}$
${\ displaystyle {\ begin {aligned} {\ text {Minimize}} & f (x) & \\ {\ text {under the constraints}} & g (x) \ in -K \\ & h (x) = 0 \\ & x \ in X & \ end {aligned}}}$	${\ displaystyle L (x, \ lambda, \ mu) = f (x) + \ langle \ lambda; g (x) \ rangle _ {1} + \ langle \ mu; h (x) \ rangle _ {2} }$	${\ displaystyle {\ begin {aligned} {\ text {Maximize}} & q (\ lambda, \ mu) & \\ {\ text {under the constraints}} & \ lambda \ in K ^ {D} \ end {aligned }}}$
${\ displaystyle {\ begin {aligned} {\ text {Minimize}} & f (x) & \\ {\ text {under the constraints}} & g (x) \ in -K \\ & x \ in X & \ end {aligned }}}$	${\ displaystyle L (x, \ lambda) = f (x) + \ langle \ lambda; g (x) \ rangle _ {1}}$	: ${\ displaystyle {\ begin {aligned} {\ text {Maximize}} & q (\ lambda) & \\ {\ text {under the constraints}} & \ lambda \ in K ^ {D} \ end {aligned}}}$

It is the objective function, wherein on a cone is excellent, which is a true cone in the first case, the second and third case is a convex cone. It is and . ${\ displaystyle f: V_ {0} \ mapsto \ mathbb {R}}$ ${\ displaystyle g: V_ {0} \ mapsto V_ {1}}$ ${\ displaystyle V_ {1}}$ ${\ displaystyle K}$ ${\ displaystyle h: V_ {0} \ mapsto V_ {2}}$ ${\ displaystyle \ lambda \ in V_ {1}, \, \ mu \ in V_ {2}}$

Formulation for standardized spaces

For problems with mappings between real normalized vector spaces , it should be noted that no scalar product is defined. Instead, one chooses the dual pairing . Accordingly, the dual variables are from the dual space of the vector space. Is a problem of form

{\ displaystyle {\ begin {aligned} {\ text {Minimize}} & f (x) & \\ {\ text {under the constraints}} & g (x) \ in -K \\ & h (x) = 0 \\ & x \ in X & \ end {aligned}}}

where the objective function, the inequality restriction functions and the equation restriction function are. be a cone of order and there are Banach spaces. The Lagrange function is then ${\ displaystyle f: V_ {0} \ mapsto \ mathbb {R}}$ ${\ displaystyle g_ {i}: V_ {0} \ mapsto V_ {1}}$ ${\ displaystyle h_ {j}: V_ {0} \ mapsto V_ {2}}$ ${\ displaystyle K}$ ${\ displaystyle V_ {1}}$ ${\ displaystyle V_ {0}, V_ {1}, V_ {2}}$

{\ displaystyle L (x, u, v) = f (x) + u (g (x)) + v (h (x)).}

Thereby is from the dual space and from the dual space The dual function is then again ${\ displaystyle u}$ ${\ displaystyle V_ {1} ^ {*}}$ ${\ displaystyle v}$ ${\ displaystyle V_ {2} ^ {*}.}$

{\ displaystyle q (u, v) = \ inf _ {x \ in X} L (x, u, v)}

and so the dual problem is:

{\ displaystyle {\ begin {aligned} {\ text {Maximize}} & q (u, v) & \\ {\ text {under the constraints}} & u \ in K ^ {*} \ end {aligned}}}

Here is the dual cone , which in this case is a subset of . Formulations for alternative problems run analogously to the problems for mappings between Hilbert spaces. The dual pairing replaces the scalar product, the dual variables are in the dual space. ${\ displaystyle K ^ {*}}$ ${\ displaystyle V_ {1} ^ {*}}$

Weak and strong duality

The weak duality also applies to the two more general formulations. For the proof in the Hilbert space formulation one uses that is, if and holds (for cones of order one obtains ). In normalized spaces, analogous statements apply with the difference that there is an element of the dual space: If it applies , then it follows . ${\ displaystyle \ langle a; b \ rangle \ leq 0}$ ${\ displaystyle a \ succcurlyeq _ {K ^ {D}} 0}$ ${\ displaystyle b \ preccurlyeq _ {K}}$ ${\ displaystyle a \ in K ^ {D}, \, b \ in -K \ implies \ langle a; b \ rangle \ leq 0}$ ${\ displaystyle a}$ ${\ displaystyle a \ in K ^ {*}, b \ in -K}$ ${\ displaystyle a (b) \ leq 0}$

The strong duality is defined identically to the ordinary case in the more general cases. Even in the generalized case, there are regularity requirements such as the Slater condition , which guarantee strong duality.

literature

Florian Jarre, Josef Stoer: Optimization. Springer, Berlin 2004, ISBN 3-540-43575-1 .
Stephan Boyd, Lieven Vandenberghe: Convex Optimization. Cambridge University Press, 2004, ISBN 978-0-521-83378-3 ( online ).
C. Geiger, C. Kanzow: Theory and numerics of restricted optimization tasks. Springer, 2002, ISBN 3-540-42790-2 .
Johannes Jahn: Introduction to the Theory of Nonlinear Optimization. 3. Edition. Springer, Berlin 2007, ISBN 978-3-540-49378-5 .