Lagrange multiplier

Visualization of the method of Lagrange multipliers. The red line represents the amount on which is fulfilled. The blue lines are contour lines for different values of . At the point at which is maximum under the constraint , it runs tangent to the contour line . There the gradients of the functions and , represented by blue and red arrows, are collinear .

{\ displaystyle g (x, y) = c}

{\ displaystyle f (x, y) = d}

{\ displaystyle d}

{\ displaystyle f}

{\ displaystyle g = c}

{\ displaystyle g = c}

{\ displaystyle f (x, y) = d_ {1}}

{\ displaystyle f}

{\ displaystyle g}

The same problem as above, with the function values of being plotted on the height axis.

{\ displaystyle f}

The method of Lagrange multipliers (after Joseph-Louis Lagrange ) is a method for solving optimization problems with constraints in mathematical optimization . An optimization problem with constraints is the task of finding a local extremum of a function in several variables with one or more constraints, the constraints being defined by setting functions to given values. This method introduces a new unknown scalar variable for each constraint, a Lagrange multiplier, and defines a linear combination that incorporates the multipliers as coefficients . The solutions to the original optimization task can then be determined as critical points of this so-called Lagrange function under certain conditions .

description

First we consider the two-dimensional case with a constraint. Let us assume that we want to maximize a function while keeping the constraint ( constant). The constraint filters out certain points on the - - plane, which together form curves. For our consideration we assume that the secondary condition is such that it can be represented by a single curve (see adjacent picture, red curve). When we move on this curve , we touch or cut contour lines from . We now see that we only ever achieve a maximum of the function if our movement on the curve is tangential to the contour line: Otherwise, we can increase the function value of even further by moving forwards or backwards on the curve without violating the secondary condition. ${\ displaystyle f (x, y)}$ ${\ displaystyle g (x, y) = c}$ ${\ displaystyle c}$ ${\ displaystyle x}$ ${\ displaystyle y}$ ${\ displaystyle f (x, y)}$ ${\ displaystyle d}$ ${\ displaystyle f (x, y)}$ ${\ displaystyle g (x, y) = c}$ ${\ displaystyle f (x, y) = d}$ ${\ displaystyle g (x, y) = c}$ ${\ displaystyle f (x, y)}$

A well-known example can be found in the weather maps with their contour lines for temperatures and pressure. The extremes under the secondary condition occur where lines touch when the cards are overlaid. We translate the tangent condition geometrically by saying that the gradients of and at the maximum are parallel vectors , whereby the gradient of must not vanish. ${\ displaystyle f}$ ${\ displaystyle g}$ ${\ displaystyle g}$

So we are looking for points to where and ${\ displaystyle (x, y)}$ ${\ displaystyle g (x, y) = c}$ ${\ displaystyle \ nabla _ {x, y} g \ neq 0}$

{\ displaystyle \ nabla _ {x, y} f = - \ lambda \ nabla _ {x, y} g}

.

The following abbreviations and definitions were used for the associated gradients:

{\ displaystyle \ nabla _ {x, y} f: = \ left ({\ frac {\ partial f} {\ partial x}}, {\ frac {\ partial f} {\ partial y}} \ right)}

and

{\ displaystyle \ nabla _ {x, y} g: = \ left ({\ frac {\ partial g} {\ partial x}}, {\ frac {\ partial g} {\ partial y}} \ right)}

The constant Lagrange multiplier is required because the two gradients should be parallel, but as vectors they can be of different lengths. In order to summarize all the mentioned conditions into one equation, it is useful to use the following Lagrange function: ${\ displaystyle \ lambda}$

{\ displaystyle \ Lambda (x, y, \ lambda): = f (x, y) + \ lambda \ cdot {\ Big (} g (x, y) -c {\ Big)}}

The solution of the above-described optimization problem with a secondary condition now corresponds to a local extremum of the Lagrange function. This extreme can be calculated using the gradient of the Lagrange function:

{\ displaystyle \ nabla _ {x, y, \ lambda} \ Lambda (x, y, \ lambda) = 0.}

The - and the - components of this equation correspond to the requirement for parallelism of the two original gradients, the third component is identical to . ${\ displaystyle x}$ ${\ displaystyle y}$ ${\ displaystyle \ nabla _ {\ lambda} \ Lambda (x, y, \ lambda) = 0}$ ${\ displaystyle g (x, y) = c}$

Points at which the gradient of the Lagrange function or the constraint vanishes are also called critical points of the Lagrange function. The latter are used because the Lagrange multipliers method cannot make any statements about them and they are therefore considered as candidates for extreme points. Since in general not every critical point of the Lagrange function solves the original optimization problem, this method only supplies one necessary condition for the solution of the optimization problem. ${\ displaystyle g}$

Examples

Representation of an optimization problem with a secondary condition

Example with secondary conditions without vanishing gradients

In this example, the function is to be optimized under the secondary condition . The secondary condition therefore corresponds to the unit circle. With the help of the graphic, the maximum at can be determined very easily. The minimum of the optimization problem is at . ${\ displaystyle f (x, y) = x + y}$ ${\ displaystyle x ^ {2} + y ^ {2} = 1}$ ${\ displaystyle ({\ sqrt {2}} / 2, {\ sqrt {2}} / 2)}$ ${\ displaystyle (- {\ sqrt {2}} / 2, - {\ sqrt {2}} / 2)}$

First we check at which points on the unit circle the gradient of the constraint function vanishes. So we calculate ${\ displaystyle g (x, y) = x ^ {2} + y ^ {2}}$

{\ displaystyle \ nabla _ {x, y} g = (2x, 2y)}

and see that this is only the same in the origin . However, this point is not on the unit circle, so it does not meet the secondary condition and is therefore not included in the list of critical points. ${\ displaystyle (0,0)}$

To be able to apply the Lagrange multiplier method, be and ${\ displaystyle g (x, y) -c = x ^ {2} + y ^ {2} -1}$

{\ displaystyle \ Lambda (x, y, \ lambda) = f (x, y) + \ lambda (g (x, y) -c) = x + y + \ lambda (x ^ {2} + y ^ {2 } -1) = x + y + \ lambda x ^ {2} + \ lambda y ^ {2} - \ lambda}

.

The condition gives the following three equations: ${\ displaystyle d \ Lambda = 0}$

{\ displaystyle {\ begin {aligned} {\ frac {\ partial \ Lambda} {\ partial x}} & = 1 + 2 \ lambda x && = 0, \ qquad {\ text {(i)}} \\ {\ frac {\ partial \ Lambda} {\ partial y}} & = 1 + 2 \ lambda y && = 0, \ qquad {\ text {(ii)}} \\ {\ frac {\ partial \ Lambda} {\ partial \ lambda}} & = x ^ {2} + y ^ {2} -1 && = 0. \ qquad {\ text {(iii)}} \ end {aligned}}}

As always, the third equation (iii) corresponds to the required secondary condition. With (i) can be resolved to. The same is done for equation (ii) and . One thus obtains . If this is inserted in (iii), one obtains , thus . The critical points are thus calculated as and . The function to be optimized has the values , and at these two points . ${\ displaystyle \ lambda \ neq 0}$ ${\ displaystyle x}$ ${\ displaystyle y}$ ${\ displaystyle x = -1 / (2 \ lambda) = y}$ ${\ displaystyle 2 \ lambda ^ {2} = 1}$ ${\ displaystyle \ lambda = \ pm 1 / {\ sqrt {2}} = \ pm {\ sqrt {2}} / 2}$ ${\ displaystyle ({\ sqrt {2}} / 2, {\ sqrt {2}} / 2)}$ ${\ displaystyle (- {\ sqrt {2}} / 2, - {\ sqrt {2}} / 2)}$ ${\ displaystyle f}$ ${\ displaystyle {\ sqrt {2}}}$ ${\ displaystyle - {\ sqrt {2}}}$

Application-related example

A flock of 15 (punctiform) sheep is enclosed by a fence in such a way that an elliptical plot of land with minimal area is created. The major and minor axes of the ellipse should be parallel to the - and - axes, and the center of gravity should be at the origin.

{\ displaystyle x}

{\ displaystyle y}

A shepherd wants to enclose his flock of sheep with a fence. The demarcated property should have the shape of an ellipse with the center of gravity at the origin and the main and minor axes parallel to the - and - axis. In addition, the plot should match the area ${\ displaystyle 15}$ ${\ displaystyle x}$ ${\ displaystyle y}$

{\ displaystyle A = \ pi from}

be as small as possible. The constraint is in the form of the equation

{\ displaystyle {\ frac {x ^ {2}} {a ^ {2}}} + {\ frac {y ^ {2}} {b ^ {2}}} = 1}

that describes an ellipse with a center at the origin, major axis of length and minor axis of length . Major axis and minor axis are parallel to the - and - axis, which is why the ellipse runs through points and . ${\ displaystyle 2a}$ ${\ displaystyle 2b}$ ${\ displaystyle x}$ ${\ displaystyle y}$ ${\ displaystyle (a, 0)}$ ${\ displaystyle (0, b)}$

The general Lagrange function with any values for and is ${\ displaystyle x}$ ${\ displaystyle y}$

{\ displaystyle \ Lambda (a, b, \ lambda) = \ pi ab + \ lambda \ left ({\ frac {x ^ {2}} {a ^ {2}}} + {\ frac {y ^ {2} } {b ^ {2}}} - 1 \ right) = \ pi ab + x ^ {2} {\ frac {\ lambda} {a ^ {2}}} + y ^ {2} {\ frac {\ lambda} {b ^ {2}}} - \ lambda}

with . The gradient of this is set to zero in order to determine the critical points. Here and is assumed. Because for or you would get an empty ellipse. ${\ displaystyle \ lambda \ neq 0}$ ${\ displaystyle a> x> 0}$ ${\ displaystyle b> y> 0}$ ${\ displaystyle a = 0}$ ${\ displaystyle b = 0}$

{\ displaystyle {\ frac {\ partial} {\ partial a}} \ Lambda (a, b, \ lambda) = \ pi b-2x ^ {2} {\ frac {\ lambda} {a ^ {3}} } = 0 \ Leftrightarrow \ pi a ^ {3} b-2x ^ {2} \ lambda = 0}

{\ displaystyle {\ frac {\ partial} {\ partial b}} \ Lambda (a, b, \ lambda) = \ pi a-2y ^ {2} {\ frac {\ lambda} {b ^ {3}} } = 0 \ Leftrightarrow \ pi from ^ {3} -2y ^ {2} \ lambda = 0}

{\ displaystyle {\ frac {\ partial} {\ partial \ lambda}} \ Lambda (a, b, \ lambda) = {\ frac {x ^ {2}} {a ^ {2}}} + {\ frac {y ^ {2}} {b ^ {2}}} - 1 = 0 \ Leftrightarrow x ^ {2} b ^ {2} + y ^ {2} a ^ {2} = a ^ {2} b ^ {2}}

The first equation is transformed into and inserted into the second equation. results . If you insert this expression into the third equation, you get the solutions by inserting it back ${\ displaystyle \ lambda = {\ frac {\ pi a ^ {3} b} {2x ^ {2}}}}$ ${\ displaystyle \ pi ab ^ {3} - {\ frac {\ pi y ^ {2} a ^ {3} b} {x ^ {2}}} = 0}$ ${\ displaystyle x ^ {2} b ^ {2} = y ^ {2} a ^ {2}}$

{\ displaystyle a = {\ sqrt {2}} x, \ b = {\ sqrt {2}} y \ {\ text {and}} \ \ lambda = {\ pi ab} = {2 \ pi xy} \ neq 0}

.

The gradient of the constraint function

{\ displaystyle \ nabla g (a, b) = \ left (- {\ frac {2x ^ {2}} {a ^ {3}}}, - {\ frac {2y ^ {2}} {b ^ { 3}}} \ right)}

disappears for . Since the point does not lie on the ellipse, it is the minimum we are looking for. This has to be checked graphically in each individual case, since the Lagrange multipliers only provide one necessary criterion. ${\ displaystyle x = y = 0}$ ${\ displaystyle (0,0)}$ ${\ displaystyle (a, b) = ({\ sqrt {2}} x, {\ sqrt {2}} y)}$

Example with secondary condition with a vanishing gradient

We consider the function with . If one examines the function now for extremes, one can determine all extremes in the interior of the definition area with the help of the sufficient criterion for local extreme points. However, the edge extrema are found using the Lagrange multiplier. The edge of the definition area forms the secondary condition. Here it is the two positive coordinate axes and the origin. So we find the secondary condition with . ${\ displaystyle f \ colon \ mathbb {R} _ {0} ^ {+} \ times \ mathbb {R} _ {0} ^ {+} \ to \ mathbb {R}}$ ${\ displaystyle f (x, y) = e ^ {- (x + y)}}$ ${\ displaystyle g \ colon \ mathbb {R} _ {0} ^ {+} \ times \ mathbb {R} _ {0} ^ {+} \ to \ mathbb {R}}$ ${\ displaystyle g (x, y) = xy = c = 0}$

We first set up the Lagrange function:

{\ displaystyle \ Lambda (x, y, \ lambda) = f (x, y) + \ lambda (g (x, y) -c) = e ^ {- (x + y)} + \ lambda xy}

the equation

{\ displaystyle \ nabla _ {x, y, \ lambda} \ Lambda = 0}

leads us to the system of equations

{\ displaystyle {\ begin {aligned} {\ frac {\ partial \ Lambda} {\ partial x}} & = - e ^ {- (x + y)} + \ lambda y && = 0, \ qquad {\ text { (i)}} \\ {\ frac {\ partial \ Lambda} {\ partial y}} & = - e ^ {- (x + y)} + \ lambda x && = 0, \ qquad {\ text {(ii )}} \\ {\ frac {\ partial \ Lambda} {\ partial \ lambda}} & = xy && = 0. \ qquad {\ text {(iii)}} \ end {aligned}}}

The third equation says that or . Assuming it were , then this - inserted into the second equation - leads to a contradiction, because the equation ${\ displaystyle x = 0}$ ${\ displaystyle y = 0}$ ${\ displaystyle x = 0}$

{\ displaystyle -e ^ {- y} = 0}

has no solution because the function has no zeros. Similarly, the case with the first equation leads to a contradiction. The Lagrange multiplier does not provide any critical points. ${\ displaystyle e}$ ${\ displaystyle y = 0}$

However, we have not checked at which points the gradient of the constraint disappears. It applies

{\ displaystyle \ nabla _ {x, y} g = (y, x) = (0,0) \ Leftrightarrow (x, y) = (0,0)}

The gradient of the secondary condition disappears in the origin, and this also lies on the edge of the definition area of (it fulfills the secondary condition). As described above, these points must also be considered as candidates for extrema. And in fact it is and for everyone . The origin is therefore the global maximum of the function. ${\ displaystyle f}$ ${\ displaystyle f (0,0) = e ^ {0} = 1}$ ${\ displaystyle f (x, y) = e ^ {- (x + y)} <1}$ ${\ displaystyle (x, y) \ neq (0,0)}$

However, the presence of critical points does not say anything about the presence of extremes. If one were to replace the domains of definition of and by in this example , one would get the same single critical point, but the origin would not be a global (and also not a local) maximum of (e.g. the function diverges in the 3rd quadrant). Indeed, this would have no local maxima or minima. ${\ displaystyle f}$ ${\ displaystyle g}$ ${\ displaystyle \ mathbb {R} \ times \ mathbb {R}}$ ${\ displaystyle f}$ ${\ displaystyle f}$

Several constraints

Let it be a function defined in an open subset . We define independent constraints , . I.e. the gradients of the constraints are linearly independent for each point , with for all . In particular, this means that none of the gradients disappears. If the gradients are linearly dependent at one point, this point is added to the list of critical points. Now let's set ${\ displaystyle f}$ ${\ displaystyle U \ subseteq \ mathbb {R} ^ {n}}$ ${\ displaystyle s}$ ${\ displaystyle g_ {k} (x) = 0}$ ${\ displaystyle k = 1, \ ldots, s}$ ${\ displaystyle x \ in U}$ ${\ displaystyle g_ {k} (x) = 0}$ ${\ displaystyle k = 1, \ ldots, s}$

{\ displaystyle \ Lambda (x, \ lambda) = f (x) + \ sum _ {k = 1} ^ {s} \ lambda _ {k} g_ {k} (x),}

where and is. ${\ displaystyle x = (x_ {1}, \ dots, x_ {n})}$ ${\ displaystyle \ lambda = (\ lambda _ {1}, \ dots, \ lambda _ {s})}$

Let us now look at the critical point of at ${\ displaystyle \ Lambda}$

{\ displaystyle {\ frac {\ partial \ Lambda} {\ partial x_ {i}}} = 0,}

which is equivalent to

{\ displaystyle {\ frac {\ partial f} {\ partial x_ {i}}} = - \ sum _ {k = 1} ^ {s} \ lambda _ {k} {\ frac {\ partial g_ {k} } {\ partial x_ {i}}}.}

We determine the unknown multipliers with the help of our constraint equations and have thus found a critical point (ie ) of . This is a necessary condition for having an extremum on the set of points that satisfy the constraints. I.e. Here, too, the extremes must be filtered out of the list of critical points by other means. ${\ displaystyle \ lambda _ {k}}$ ${\ displaystyle {\ frac {\ partial \ Lambda} {\ partial x_ {i}}} = 0}$ ${\ displaystyle \ Lambda}$ ${\ displaystyle f}$

Note that it is therefore particularly wrong to speak of "maximizing the Lagrange function". The Lagrange function is unlimited and therefore has no global extrema and can therefore not be maximized. Only the critical points of the Lagrange function indicate points at which the objective function possibly assumes a maximum with regard to the secondary conditions.

Sufficient conditions

This procedure only provides one necessary condition for extreme points. There are various criteria for detecting the extreme points and determining their type. In general, the modified Hessian matrix is formed and its determinant or certain sub-determinants are calculated. However, this approach does not always lead to a statement. Alternatively, a visualization or geometric considerations can be used to determine the type of extreme point.

Importance of the Lagrange multipliers in physics

The importance of the Lagrange multipliers in physics becomes visible when used in classical mechanics . For this purpose, they were introduced by Lagrange around 1777. The equations of motion of classical mechanics can be obtained in the Lagrange formalism with the help of the Euler-Lagrange equation from the condition that the effect assumes an extreme when the coordinates and their time derivatives vary independently of one another . A physical constraint that restricts movement appears as a secondary condition of the extremum. The Lagrange multiplier, with which the constraint is inserted into the Lagrange function, is closely related to the physical constraint force with which the object described by the equation of motion is made to comply with the constraint. The following example of a free point mass moving in two dimensions on a path with a constant radius makes this clear: ${\ displaystyle m}$ ${\ displaystyle R}$

Lagrange function ( kinetic energy in polar coordinates ):

{\ displaystyle L = {\ frac {1} {2}} m ({\ dot {r}} ^ {2} + r ^ {2} {\ dot {\ varphi}} ^ {2}),}

Constraint:

{\ displaystyle g = rR = 0,}

new Lagrange function:

{\ displaystyle L ^ {\ prime} = L + \ lambda g,}

Euler-Lagrange equation (formulated here only for the radial coordinate, since the constraint depends on this; the angular coordinate gives the conservation of angular momentum for this movement):

{\ displaystyle {\ frac {d} {dt}} {\ frac {\ partial L ^ {\ prime}} {\ partial {\ dot {r}}}} - {\ frac {\ partial L ^ {\ prime }} {\ partial r}} = 0}

{\ displaystyle m {\ ddot {r}} - mr {\ dot {\ varphi}} ^ {2} - \ lambda = 0;}

with and as well as ( angular velocity ) follows ${\ displaystyle {\ ddot {r}} = 0}$ ${\ displaystyle r = R}$ ${\ displaystyle {\ dot {\ varphi}} = \ omega}$

{\ displaystyle \ lambda = -mR \ omega ^ {2}.}

This corresponds to the centripetal force formulated in polar coordinates , which forces the point mass to move on a circular path.

Generalizations

The Karush-Kuhn-Tucker conditions and the Fritz-John conditions are a generalization of the Lagrange multipliers for constraints, which are also described by inequalities. Both play an important role in nonlinear optimization . For convex optimization problems in which the functions are not continuously differentiable, there are also the saddle point criteria of the Lagrange function .

literature

Otto Forster: Analysis 2. Vieweg + Teubner, Wiesbaden 2008, ISBN 978-3-8348-0575-1 . P. 110ff
Michael Sauer: Operations Research compact 1st edition, Oldenbourg, Munich 2009, ISBN 978-3-486-59082-1 .
Heinrich Rommelfanger: Mathematics for Economists II Volume 2 (2nd edition, 1992), BI Wissenschaftsverlag, ISBN 9783860259818 . P. 238ff

Web links

Conceptual introduction (English)
Student thesis Lagrange optimization (including sample calculation and references; PDF; 474 kB)