# Karush-Kuhn-Tucker conditions

The Karush-Kuhn-Tucker conditions are a necessary first-order optimality criterion in nonlinear optimization. They generalize both the necessary condition $\nabla f(x) = \vec{0}$ for optimization problems without constraints and the method of Lagrange multipliers for optimization problems with equality constraints. They were first stated in 1939 in the unpublished master's thesis of William Karush. However, they only became widely known in 1951 through a conference paper by Harold W. Kuhn and Albert W. Tucker.

## Framework

The KKT conditions make statements about optimization problems of the form

$$\min_{x \in D} f(x)$$

subject to the constraints

$$g_i(x) \leq 0, \quad 1 \leq i \leq m, \qquad h_j(x) = 0, \quad 1 \leq j \leq l.$$

All functions $f, g_i, h_j \colon D \to \mathbb{R}$ are assumed to be continuously differentiable, and $D$ is a non-empty subset of $\mathbb{R}^n$.

## Statement

### Karush-Kuhn-Tucker conditions

A point $(x^*, \mu^*, \lambda^*) \in \mathbb{R}^{n+m+l}$ is called a Karush-Kuhn-Tucker point, or KKT point for short, of the above optimization problem if it satisfies the following conditions:

$$
\begin{alignedat}{3}
& \nabla f(x^*) + \sum_{i=1}^{m} \mu_i^* \nabla g_i(x^*) + \sum_{j=1}^{l} \lambda_j^* \nabla h_j(x^*) = 0, && \\
& h_j(x^*) = 0, && \text{for } j = 1, \ldots, l, \\
& g_i(x^*) \leq 0, && \text{for } i = 1, \ldots, m, \\
& \mu_i^* \geq 0, && \text{for } i = 1, \ldots, m, \\
& \mu_i^* g_i(x^*) = 0, && \text{for } i = 1, \ldots, m.
\end{alignedat}
$$

These conditions are called the Karush-Kuhn-Tucker conditions, or KKT conditions for short. Alternatively, using the Lagrange function

$$L(x, \mu, \lambda) := f(x) + \sum_{i=1}^{m} \mu_i g_i(x) + \sum_{j=1}^{l} \lambda_j h_j(x),$$

the first condition can be written compactly as $\nabla_x L(x^*, \mu^*, \lambda^*) = 0$. The second and third conditions require $x^*$ to be feasible for the (primal) problem, the fourth requires the dual variable $\mu^*$ to be feasible for the dual problem, and the last one is the complementarity condition.

If the domain is $D = \mathbb{R}_{\geq 0}^n$, the sign constraints $x \geq 0$ need not be written as additional inequality constraints $g_i(x) \leq 0$ with associated Lagrange multipliers. Instead, the KKT conditions then read:

$$
\begin{alignedat}{3}
& \nabla f(x^*) + \sum_{i=1}^{m} \mu_i^* \nabla g_i(x^*) + \sum_{j=1}^{l} \lambda_j^* \nabla h_j(x^*) \geq 0, && \\
& g_i(x^*) \leq 0, && \text{for } i = 1, \ldots, m, \\
& h_j(x^*) = 0, && \text{for } j = 1, \ldots, l, \\
& \mu_i^* \geq 0, && \text{for } i = 1, \ldots, m, \\
& \mu_i^* g_i(x^*) = 0, && \text{for } i = 1, \ldots, m, \\
& x_p^* \cdot \Big( \nabla f(x^*) + \sum_{i=1}^{m} \mu_i^* \nabla g_i(x^*) + \sum_{j=1}^{l} \lambda_j^* \nabla h_j(x^*) \Big)_p = 0, && \text{for } p = 1, \ldots, n, \\
& x_p^* \geq 0, && \text{for } p = 1, \ldots, n,
\end{alignedat}
$$

where the inequality in the first line holds componentwise and $(\cdot)_p$ denotes the $p$-th component.

### Optimality criterion

If $x^*$ is a local minimum of the optimization problem and satisfies certain regularity requirements (see below), then there exist $\mu^*, \lambda^*$ such that $(x^*, \mu^*, \lambda^*)$ is a KKT point. Thus, the KKT conditions are a necessary criterion for optimality. In general, the multipliers $\mu^*, \lambda^*$ are not uniquely determined.

## Regularity requirements

There are many different regularity conditions that guarantee that the KKT conditions hold at a local minimum. They differ mainly in their generality and in how easily they can be applied and verified. Following English usage, they are also called constraint qualifications (CQ).

Examples of constraint qualifications are:

• Abadie CQ: The tangent cone and the linearized tangent cone at $\hat{x}$ coincide.

• Linear independence constraint qualification (LICQ): The gradients of the active inequality constraints (those with $g_i(\hat{x}) = 0$) and the gradients of the equality constraints are linearly independent at $\hat{x}$. This CQ also guarantees that the multipliers $\mu^*, \lambda^*$ are unique.

• Mangasarian-Fromovitz constraint qualification (MFCQ): The gradients of the active inequality constraints and the gradients of the equality constraints are positive-linearly independent at $\hat{x}$.

• Constant rank constraint qualification (CRCQ): For each subset of the gradients of the active inequality constraints and the gradients of the equality constraints, the rank is constant in a neighborhood of $\hat{x}$.

• Constant positive linear dependence constraint qualification (CPLD): For each subset of the gradients of the active inequality constraints and the gradients of the equality constraints, the following holds: if the subset is positive-linearly dependent at $\hat{x}$, then it remains linearly dependent in a neighborhood of $\hat{x}$.

In particular, for convex optimization problems and almost convex functions there is the following condition:

• Slater condition: There is a feasible point that is strictly feasible with respect to the inequality constraints, i.e. $g_i(x) < 0$ for all $i$. Unlike the conditions above, it guarantees the regularity of all points of the problem, not just of the point under investigation.

It can be shown that the following two chains of implications hold:

$$\text{LICQ} \Rightarrow \text{MFCQ} \Rightarrow \text{CPLD} \qquad \text{and} \qquad \text{LICQ} \Rightarrow \text{CRCQ} \Rightarrow \text{CPLD},$$

although MFCQ and CRCQ are not equivalent. In practice, weaker constraint qualifications are preferred, since they apply to more problems and therefore yield stronger optimality statements. In particular, constraint qualifications can also be used to ensure that the KKT conditions coincide with the Fritz John conditions, i.e. that the multiplier of the objective function in the Fritz John conditions can be chosen nonzero.
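Of the conditions listed, LICQ is the most direct to check by computer: it reduces to a rank test on the stacked gradients of the active constraints. The Python sketch below assumes the active set has already been determined and the gradients at $\hat{x}$ already evaluated; the names `rank` and `licq_holds` are illustrative helpers, not part of any library. Exact rational arithmetic via `fractions.Fraction` avoids having to pick a floating-point tolerance.

```python
from fractions import Fraction


def rank(vectors):
    """Rank of a list of equal-length vectors (exact Gaussian elimination)."""
    rows = [[Fraction(v) for v in vec] for vec in vectors]
    if not rows:
        return 0
    r = 0  # number of pivot rows found so far
    for col in range(len(rows[0])):
        # Look for a nonzero entry in this column at or below row r.
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        # Eliminate this column from all rows below the pivot row.
        for i in range(r + 1, len(rows)):
            factor = rows[i][col] / rows[r][col]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
        if r == len(rows):
            break
    return r


def licq_holds(active_ineq_grads, eq_grads):
    """LICQ at a point: the gradients of the active inequality constraints
    and of the equality constraints (all evaluated at that point) are
    linearly independent, i.e. the stacked matrix has full row rank."""
    grads = list(active_ineq_grads) + list(eq_grads)
    return rank(grads) == len(grads)
```

For the two active gradients $(0, 1)^T$ and $(4, -1)^T$ from the example section, `licq_holds([[0, 1], [4, -1]], [])` returns `True`. For the constraints $x_1 \leq 0$ and $-x_1 \leq 0$ at the origin, the gradients $(1, 0)^T$ and $(-1, 0)^T$ are dependent and `licq_holds([[1, 0], [-1, 0]], [])` returns `False`.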

## Special cases

### Convex optimization

If the optimization problem is a convex optimization problem, i.e. if the objective function, the inequality constraint functions and the domain are convex and the equality constraints are affine, stronger statements can be made.

On the one hand, the Slater condition can then be used as the regularity condition, and it guarantees the regularity of all points of the problem. On the other hand, for convex problems the KKT conditions are also a sufficient criterion for optimality: every KKT point is a local, and by convexity even global, minimum. In particular, no regularity requirement is needed for this direction.
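The sufficiency follows from a short chain of inequalities. This sketch additionally assumes that the functions are differentiable on an open set containing the convex set $D$, so that the gradient inequality for convex functions applies. For $\mu^* \geq 0$, the map $x \mapsto L(x, \mu^*, \lambda^*)$ is convex (a nonnegative combination of convex functions plus affine terms), so $\nabla_x L(x^*, \mu^*, \lambda^*) = 0$ implies $L(x, \mu^*, \lambda^*) \geq L(x^*, \mu^*, \lambda^*)$ for all $x \in D$. For every feasible $x$ this gives

$$
\begin{aligned}
f(x) &\geq f(x) + \sum_{i=1}^{m} \mu_i^* g_i(x) + \sum_{j=1}^{l} \lambda_j^* h_j(x) = L(x, \mu^*, \lambda^*) \\
&\geq L(x^*, \mu^*, \lambda^*) = f(x^*) + \sum_{i=1}^{m} \mu_i^* g_i(x^*) + \sum_{j=1}^{l} \lambda_j^* h_j(x^*) = f(x^*).
\end{aligned}
$$

The first inequality uses $\mu_i^* \geq 0$, $g_i(x) \leq 0$ and $h_j(x) = 0$; the last equality uses complementarity $\mu_i^* g_i(x^*) = 0$ and $h_j(x^*) = 0$.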

### Convex objective function with linear restrictions

If the objective function $f(x)$ and the domain $D$ are convex and all constraints are affine, that is, $g_i(x) = a_i^T x - b_i$ and $h_j(x) = c_j^T x - d_j$, then a point $x^*$ is a global minimum if and only if it belongs to a KKT point $(x^*, \mu^*, \lambda^*)$, without any further regularity requirements.

### General objective function with linear restrictions

If the objective function and the domain are arbitrary within the framework of the above prerequisites and all constraints are affine, then the Abadie CQ is automatically satisfied, since linearizing an affine function yields the function itself. In this case, every local minimum yields a KKT point without any further regularity requirements.
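The two preceding subsections combine nicely for an equality-constrained convex quadratic program $\min \tfrac{1}{2} x^T Q x + c^T x$ subject to $Ax = b$, i.e. $h(x) = Ax - b$: the constraints are affine, so no CQ is needed, and for positive semidefinite $Q$ the objective is convex, so every KKT point is a global minimum. The KKT conditions $Qx + c + A^T \lambda = 0$ and $Ax = b$ then form a single linear system. The Python sketch below solves it exactly; `solve_linear` and `solve_eq_qp` are hypothetical helper names, and the sketch assumes the KKT matrix is nonsingular (which holds, for example, if $Q$ is positive definite and $A$ has full row rank).

```python
from fractions import Fraction


def solve_linear(M, rhs):
    """Solve M z = rhs exactly by Gauss-Jordan elimination with Fractions.
    Raises ValueError if M is singular."""
    n = len(M)
    aug = [[Fraction(v) for v in row] + [Fraction(r)] for row, r in zip(M, rhs)]
    for col in range(n):
        pivot = next((i for i in range(col, n) if aug[i][col] != 0), None)
        if pivot is None:
            raise ValueError("singular KKT matrix")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]  # normalize pivot row
        for i in range(n):
            if i != col and aug[i][col] != 0:
                factor = aug[i][col]
                aug[i] = [a - factor * b for a, b in zip(aug[i], aug[col])]
    return [row[n] for row in aug]


def solve_eq_qp(Q, c, A, b):
    """KKT system of  min 1/2 x^T Q x + c^T x  s.t.  A x = b,  h(x) = A x - b:
        Q x + c + A^T lambda = 0   (stationarity of the Lagrange function)
        A x = b                    (primal feasibility)
    Returns (x, lambda)."""
    n, l = len(Q), len(A)
    # Block matrix [[Q, A^T], [A, 0]] and right-hand side [-c, b].
    top = [list(Q[i]) + [A[j][i] for j in range(l)] for i in range(n)]
    bottom = [list(A[j]) + [0] * l for j in range(l)]
    rhs = [-ci for ci in c] + list(b)
    z = solve_linear(top + bottom, rhs)
    return z[:n], z[n:]
```

For $\min x_1^2 + x_2^2$ subject to $x_1 + x_2 = 1$ (so $Q = 2I$, $c = 0$, $A = (1 \; 1)$, $b = 1$), `solve_eq_qp([[2, 0], [0, 2]], [0, 0], [[1, 1]], [1])` returns $x = (\tfrac{1}{2}, \tfrac{1}{2})$ with multiplier $\lambda = -1$.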

## Example

As an example, consider the nonlinear optimization problem

$$\min_{x \in X} \left[ -(x_1 + 1)^2 - (x_2 + 2)^2 \right]$$ with the feasible set

$$X = \left\{ x \in \mathbb{R}^2 \,\middle|\, g_1(x) = x_2 \leq 0,\; g_2(x) = x_1^2 - x_2 - 4 \leq 0,\; g_3(x) = -x_1^2 + x_2 + 1 \leq 0 \right\}.$$
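The KKT conditions for this problem can also be checked mechanically. The Python sketch below encodes $f$, $g_1, g_2, g_3$ and their gradients (there are no equality constraints, so $\lambda$ does not appear), reports how much each KKT condition is violated at a candidate $(x, \mu)$, and solves the stationarity equation for the multipliers of two active constraints by Cramer's rule. All helper names are illustrative, constraint indices are zero-based, and exact fractions are used so that a KKT point yields residuals of exactly zero.

```python
from fractions import Fraction

# Example problem (no equality constraints, so no lambda):
#   f(x)  = -(x1 + 1)^2 - (x2 + 2)^2
#   g1(x) = x2               <= 0
#   g2(x) = x1^2 - x2 - 4    <= 0
#   g3(x) = -x1^2 + x2 + 1   <= 0


def grad_f(x):
    return [-2 * (x[0] + 1), -2 * (x[1] + 2)]


def g(x):
    return [x[1], x[0] ** 2 - x[1] - 4, -x[0] ** 2 + x[1] + 1]


def grad_g(x):
    # Row i is the gradient of g_{i+1} (zero-based indexing).
    return [[0, 1], [2 * x[0], -1], [-2 * x[0], 1]]


def kkt_violations(x, mu):
    """Violation of each KKT condition at (x, mu); all zero iff KKT point."""
    gx, dg, df = g(x), grad_g(x), grad_f(x)
    # Stationarity: grad f(x) + sum_i mu_i * grad g_i(x) = 0
    stat = [df[k] + sum(mu[i] * dg[i][k] for i in range(3)) for k in range(2)]
    return {
        "stationarity": max(abs(s) for s in stat),
        "primal_feasibility": max(max(gi, 0) for gi in gx),  # g_i(x) <= 0
        "dual_feasibility": max(max(-m, 0) for m in mu),     # mu_i >= 0
        "complementarity": max(abs(m * gi) for m, gi in zip(mu, gx)),
    }


def is_kkt_point(x, mu):
    return all(v == 0 for v in kkt_violations(x, mu).values())


def multipliers_for_active(x, active):
    """Solve stationarity for the multipliers of exactly two active
    constraints (zero-based indices), setting all others to zero.
    Assumes the two active gradients are linearly independent (LICQ)."""
    i, j = active
    dg, df = grad_g(x), grad_f(x)
    # 2x2 system [[a, b], [c, d]] @ (mu_i, mu_j) = -grad f(x)
    a, b = dg[i][0], dg[j][0]
    c, d = dg[i][1], dg[j][1]
    r0, r1 = -df[0], -df[1]
    det = a * d - b * c
    mu = [Fraction(0)] * 3
    mu[i] = Fraction(r0 * d - b * r1, det)  # Cramer's rule
    mu[j] = Fraction(a * r1 - r0 * c, det)
    return mu
```

With the active set $\{g_1, g_2\}$ at $x = (2, 0)$, `multipliers_for_active((2, 0), (0, 1))` gives $\mu = (\tfrac{11}{2}, \tfrac{3}{2}, 0)$, and `is_kkt_point` returns `True` both for $(2, 0)$ with these multipliers and for $(-1, -2)$ with $\mu = 0$.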

The point $x^* = (2, 0)$ is a local minimum. First, one checks one of the regularity conditions, in this case LICQ: at the local optimum the inequality constraints $g_1, g_2$ are active, and their gradients $\nabla g_1(x^*) = (0, 1)^T$ and $\nabla g_2(x^*) = (4, -1)^T$ are linearly independent. Thus LICQ holds, so a KKT point exists. To compute it, first note that $g_3(x^*) = -3 < 0$, so $\mu_3^* = 0$ is forced by the complementarity condition $\mu_3^* g_3(x^*) = 0$. With $\nabla f(x^*) = (-6, -4)^T$, the remaining multipliers follow from the system of equations for the gradients at $x^*$,

$$\begin{pmatrix} -6 \\ -4 \end{pmatrix} + \mu_1^* \begin{pmatrix} 0 \\ 1 \end{pmatrix} + \mu_2^* \begin{pmatrix} 4 \\ -1 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix},$$

which gives $\mu_1^* = \frac{11}{2}$ and $\mu_2^* = \frac{3}{2}$. Thus $(x^*, \mu^*) = (2, 0, \frac{11}{2}, \frac{3}{2}, 0)$ is a KKT point.

However, since the problem is not convex, the converse does not hold: $(-1, -2, 0, 0, 0)$ is a KKT point of the problem, but $x = (-1, -2)$ is not a minimum. It is feasible with all constraints inactive and $\nabla f(-1, -2) = 0$; in fact it maximizes the objective, since $f \leq 0$ everywhere and $f(-1, -2) = 0$.

## Generalizations

The Fritz John conditions are a generalization of the KKT conditions. They do not require any regularity assumptions, but provide a weaker statement. For convex optimization problems whose functions are not continuously differentiable, there are also the saddle point criteria of the Lagrange function.