
Lagrange multipliers, using tangency to solve constrained optimization


6m read · Nov 11, 2024

In the last video, I introduced a constrained optimization problem where we were trying to maximize this function f of x y equals x squared times y, but subject to a constraint that your values of x and y have to satisfy x squared plus y squared equals one. The way we were visualizing this was to look at the x y plane, where this circle here represents our constraint— all of the points that make up this set x squared plus y squared equals 1.

Then, this curvy line here is one of the contours of f, meaning we're setting f of x y equal to some constant, and then varying that constant c. For high values of c, the contour would look something like this; that's where the value of x squared times y is big. For small values of c, the contours would look like this; all the points on such a line would satisfy f of x y equals something small, like 0.01 in this case.
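If you want to play with these contours yourself, here is a minimal sketch for drawing them along with the constraint circle (this assumes numpy and matplotlib are available, and the particular contour levels are just illustrative picks, not values from the video):

```python
import numpy as np
import matplotlib.pyplot as plt

# Grid over a patch of the x-y plane around the unit circle
x = np.linspace(-1.5, 1.5, 400)
y = np.linspace(-1.5, 1.5, 400)
X, Y = np.meshgrid(x, y)

F = X**2 * Y  # the function f(x, y) = x^2 * y

fig, ax = plt.subplots(figsize=(6, 6))
# A few contour lines f(x, y) = c for various constants c (illustrative levels)
ax.contour(X, Y, F, levels=[-0.4, -0.2, -0.05, 0.05, 0.2, 0.4], cmap="coolwarm")
# The constraint circle x^2 + y^2 = 1
theta = np.linspace(0, 2 * np.pi, 200)
ax.plot(np.cos(theta), np.sin(theta), color="black", linewidth=2)
ax.set_aspect("equal")
ax.set_title("Contours of f(x, y) = x^2 y and the constraint x^2 + y^2 = 1")
plt.show()
```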

The way to think about maximizing this function is to try to increase that value of c as much as you can without the contour falling off the circle. The key observation is that this happens when the contour and the constraint curve are tangent. You might draw this out in a little sketch and say there's some curve representing your constraint, which in this case is our circle, and then the curve representing the contour just kisses that curve, just barely touching it.

Now that's pretty, but in terms of solving the problem, we still have some work to do. The main tool we're going to use here is the gradient. So, let me go ahead and draw a lot more contour lines for x squared times y than there already are, and on top of them I'll draw the gradient field of f.

I've made a video about the relationship between the gradient and contour lines, and the upshot of it is that these gradient vectors, wherever they pass through a contour line, are perpendicular to it. The basic reason is that if you walk along a contour line, the function doesn't change value, so if you want it to change as rapidly as possible, it makes sense to walk in the perpendicular direction, so that no component of your step is wasted along a direction where the function stays constant.
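Here's a small numerical sanity check of that perpendicularity claim, a sketch assuming numpy is available; the sample point is just an arbitrary choice for illustration:

```python
import numpy as np

def f(x, y):
    return x**2 * y

def grad_f(x, y):
    # Partial derivatives of f(x, y) = x^2 * y
    return np.array([2 * x * y, x**2])

p = np.array([0.8, 0.5])                  # an arbitrary sample point
g = grad_f(*p)
g_hat = g / np.linalg.norm(g)             # unit vector along the gradient
t_hat = np.array([-g_hat[1], g_hat[0]])   # unit vector along the contour (perpendicular to the gradient)

eps = 1e-4
base = f(*p)
print("change along the gradient:", f(*(p + eps * g_hat)) - base)  # roughly eps * |grad f|
print("change along the contour: ", f(*(p + eps * t_hat)) - base)  # roughly zero (second order in eps)
```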

But again, there's a whole video on that that's worth checking out if this feels unfamiliar. For our purposes, what it means is that when we're considering this point of tangency, the gradient of f at that point is going to be some vector perpendicular to both of the curves at that point. That little vector represents the gradient of f at this point on the plane.

We can do something very similar to understand the other curve. Right now, I've just written it as a constraint, x squared plus y squared equals 1, but to give that function a name, let's say that we've defined g of x y to be x squared plus y squared. In that case, this constraint is just one of the contour lines of the function g, the one where g of x y equals 1. We can take a look at that if we go over here and look at all of the other contour lines for this function g.

It should make sense that they're circles, because this function is x squared plus y squared. If we take a look at the gradient of g, it has that same property: every gradient vector that passes through a contour line is perpendicular to it. So over on our drawing here, the gradient vector of g would also be perpendicular to both of these curves at the point of tangency.

You know, maybe in this case it's not as long as the gradient of f, or maybe it's longer; there's no reason it would be the same length. But the important fact is that the two gradients point along the same line, so one is proportional to the other. The way that we're going to write this in formulas is to take the gradient of f evaluated at whatever the maximizing values of x and y are. We should give that point a name, say x sub m and y sub m: the specific values of x and y at the point that maximizes the function subject to our constraint.

That gradient is going to be related to the gradient of g evaluated at that same point, x m, y m, and like I said, they're not equal; they're proportional.

So we need to have some kind of proportionality constant in there, and you almost always use the variable lambda. This constant has a fancy name: it's called a Lagrange multiplier. Lagrange was one of those famous French mathematicians; I always get him confused with some of the other French mathematicians of the time, like Legendre or Laplace.

So, Lagrange multiplier: there are a number of things in multivariable calculus named after Lagrange, and this is one of the big ones. It's a technique that he developed, or at the very least popularized, and the core idea is to set one gradient proportional to the other, because that proportionality is exactly what it means for the contour line of one function to be tangent to the contour line of the other.
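Written out symbolically, with x sub m and y sub m naming the maximizing point as above, that tangency condition reads:

```latex
\nabla f(x_m, y_m) = \lambda \, \nabla g(x_m, y_m)
```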

So, this is something that we can actually work with, and let's start working it out, right? Let's see what this translates to in formulas. I already have g written here, so let's go ahead and just evaluate what the gradient of g should be. That's the gradient of x squared plus y squared.

The way that we take our gradient: it's going to be a vector whose components are all the partial derivatives. The first component is the partial derivative with respect to x, so we treat x as the variable and y as a constant, and the derivative is 2x. The second component is the partial derivative with respect to y; now we treat y as the variable and x as the constant, so the derivative is 2y. Okay, so that's the gradient of g.

Then, the gradient of f is the gradient of x squared times y, and we do the same thing. First component, the partial derivative with respect to x: the x squared differentiates to 2x while the y just rides along as a constant, giving 2xy. Second component, the partial derivative with respect to y: now that y looks like a variable and the x squared is just a constant sitting in front of it, so we get x squared.
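If you want to double-check those partial derivatives, here's a quick sketch using sympy (assuming it's installed; this is my addition, not part of the lesson):

```python
import sympy as sp

x, y = sp.symbols("x y", real=True)

f = x**2 * y     # the objective function
g = x**2 + y**2  # the constraint function

grad_f = [sp.diff(f, x), sp.diff(f, y)]  # expect [2*x*y, x**2]
grad_g = [sp.diff(g, x), sp.diff(g, y)]  # expect [2*x, 2*y]

print("grad f =", grad_f)
print("grad g =", grad_g)
```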

That's what we get. Now, if we work out this Lagrange multiplier expression using these two vectors, what we have written is that the vector with components 2xy and x squared is proportional, with proportionality constant lambda, to the gradient vector of g, whose components are 2x and 2y.

If you want, you can think about this as two separate equations. Right now it's one equation between vectors, but really what it's saying is that you've got two separate equations. The first is that 2xy is equal to lambda times 2x.

The second equation is that x squared is equal to lambda times 2y. This might seem like a problem, because we now have three unknowns: x, y, and this new lambda that we introduced; we kind of shot ourselves in the foot by giving ourselves a new variable to deal with.

But so far we only have two equations, so in order to solve this, we're going to need a third. The third equation is something we've known the whole time; it's been part of the original problem. It's the constraint itself: x squared plus y squared equals 1.
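Collected in one place, the system we need to solve is:

```latex
\begin{aligned}
2xy &= 2\lambda x \\
x^2 &= 2\lambda y \\
x^2 + y^2 &= 1
\end{aligned}
```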

These are the three equations that characterize our constrained optimization problem. The bottom one just tells you that we have to be on this unit circle, and the top two tell us what's necessary in order for our contour lines, the contour of f and the contour of g, to be perfectly tangent to each other.
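If you're curious, you could also hand those three equations straight to a computer algebra system; here's a rough sketch using sympy (my assumption that it's available), just as a check on the algebra to come:

```python
import sympy as sp

x, y, lam = sp.symbols("x y lambda", real=True)

equations = [
    sp.Eq(2 * x * y, lam * 2 * x),  # tangency condition, x-component
    sp.Eq(x**2, lam * 2 * y),       # tangency condition, y-component
    sp.Eq(x**2 + y**2, 1),          # the constraint itself
]

solutions = sp.solve(equations, [x, y, lam], dict=True)

# Evaluate f = x^2 * y at every candidate point and keep the largest
f = x**2 * y
best = max(solutions, key=lambda s: f.subs(s))
print("candidate points:", solutions)
print("maximizer:", best, "with f =", f.subs(best))
```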

In the next video, I'll go ahead and solve this. At this point, it's pretty much just algebra to deal with, but it's worth going through. Then, in the next couple of videos, I'll talk about a way you can encapsulate all three of these equations into a single expression, and also a little bit about the interpretation of this lambda we introduced, because it's not actually just a dummy variable. It has a pretty interesting meaning in a physical context once you're actually dealing with a constrained optimization problem in practice. So, I'll see you next video.
