Proof for the meaning of Lagrange multipliers | Multivariable Calculus | Khan Academy
All right, so last video I showed you guys this really crazy fact. We have our usual setup here for this constrained optimization situation. We have a function we want to maximize, which I'm thinking of as revenues for some company; a constraint, which I'm thinking of as some kind of budget for that company.
As you know, if you've gotten to this video, one way to solve this constrained optimization problem is to define this function here, the Lagrangian, which involves taking this function that you're trying to maximize—in this case, the revenue—and subtracting a new variable, lambda, what's called the Lagrange multiplier, times this quantity, which is the budget function. You know, however much you spend as a function of your input parameters minus the budget itself, which you might think of as, you know, ten thousand dollars in our example.
So that's all the usual setup, and the crazy fact which I just declared is that when you set this gradient equal to zero and you find some solution—and there will be three variables in this solution: h star, s star, and lambda star—that this lambda star is not meaningless. It's not just a proportionality constant between these gradient vectors, but it will actually tell you how much the maximum possible revenue changes as a function of your budget.
The way to start writing all of that in formulas would be to make explicit the fact that if you consider this value, you know, the ten thousand dollars that is your budget, which I'm calling b a variable and not a constant, then you have to acknowledge that h star and s star are dependent on b, right? It's a very implicit relationship, something that's kind of hard to think about at first because as you change b, it changes what the Lagrangian is, which is going to change where its gradient equals zero, which changes what h star, s star, and lambda star are.
But in principle, they are some function of that budget of b, and the maximum possible revenue is whatever you get when you just plug in that solution to your function r. The claim I made that I just pulled out of the hat is that lambda star, the lambda value that comes packaged in with these two when you set the gradient of the Lagrangian equal to zero, equals the derivative of this maximum value thought of as a function of b.
Maybe I should emphasize that, you know, we're thinking of this maximum value as a function of b with respect to b. So that's kind of a mouthful; it takes a lot just to even phrase what's going on. But in the context of an economic example, it has a very clear precise meaning, which is if you increase your budget by a dollar, right? If you increase it from ten thousand dollars to ten thousand one dollars, you're wondering for that tiny change in budget, that tiny db, what is the ratio of the resulting change in revenue?
So in a sense, this lambda star tells you for every dollar that you increase the budget, how much can your revenue increase if you're always maximizing it. So why on earth is this true, right? This just seems like it comes out of nowhere. Well, there are a couple clever observations that go into proving this.
The first is to notice what happens if we evaluate this Lagrangian function itself at this critical point when you input h star, s star, and lambda star. Remember, the way that these guys are defined is that you look at all of the values where the gradient of the Lagrangian equals the zero vector. And then if you get multiple options—sometimes when you set the gradient equal to zero, you get multiple solutions—whichever one maximizes r that is h star, s star, lambda star.
So now I'm just asking if you plug that not into the gradient of the Lagrangian but to the Lagrangian itself, what do you get? Well, you're going to get, we just look at its definition up here, r evaluated at h star and s star, right? And we subtract off lambda star times b of h star, s star minus the constant that is your budget. You know, something you might think of as ten thousand dollars, whatever you set it'll be equal to.
Okay, Grant, you might say, why does this tell us anything? You're just plugging in stars instead of the usual variables, but the key is that if you plug in h star and s star, this value has to equal zero because h star and s star have to satisfy the constraint. Remember, one of the cool parts about this Lagrangian function as a whole is that when you take its partial derivative with respect to lambda, all that's left is this constraint function minus the constraint portion.
When you set the gradient of the Lagrangian equal to the zero vector, one component of that is to set the partial derivative with respect to lambda equal to zero. If you remember from the Lagrangian video, all that really boils down to is the fact that the constraint holds, right? Which would be your budget achieves ten thousand dollars when you plug in the appropriate h star and s star to this value. You're hitting this constrained amount of money that you can spend.
So by virtue of how h star and s star are defined, the fact that they are solutions to the constrained optimization problem means this whole portion goes to zero. So we can just kind of cancel all that out, and what's left? What's left here is the maximum possible revenue, right?
So evidently, when you evaluate the Lagrangian at this critical point at h star, s star, and lambda star, it equals m star; it equals the maximum possible value for the function you're trying to maximize. So ultimately, what we want is to understand how that maximum value changes when you consider it a function of the budget.
So evidently, what we can look for is to just ask how the Lagrangian changes as you consider it a function of the budget. Now, this is an interesting thing to observe because if we just look up at the definition of the Lagrangian—if you just look at this formula—if I told you to take the derivative of this with respect to little b, right? You know how much does this change with respect to little b—you would notice that this goes to zero; it doesn't have a little b. This would also go to zero and all you'd be left with would be negative lambda times negative b, and the derivative of that with respect to b would be lambda.
So you might say, oh yeah, of course, of course, this the derivative of that Lagrangian with respect to b, once we work it all out, the only term that was left there was the lambda, and that's compelling. But ultimately, it's not entirely right; that overlooks the fact that L is not actually defined as a function of b. When we defined the Lagrangian, we were considering b to be a constant.
So if you really want to consider this to be a function that involves b, the way we should write it—and I'll go ahead and erase this guy—the way we should write this Lagrangian is to say you're a function of h star which itself is dependent on b and s star which is also a function of b, right? As soon as we start considering b a variable and not a constant, we have to acknowledge that this critical point h star, s star, and lambda star depends on the value of b.
So likewise, that lambda star is also going to be a function of b, and then we can consider, as a fourth variable, right? So we're adding on yet another variable to this function—the value of b itself. The value of b itself here, so now when we want to know what is the value of the Lagrangian at the critical point h star, s star, lambda star as a function of b—alright?
So that can be kind of confusing. What you basically have is this function that only really depends on one value, right? It only depends on b, but it kind of goes through a four-variable function. Just to make it explicit, this would equal the value of r as a function of h star and s star, and each one of those is a function of little b, right?
So this term is saying, what's your revenue evaluated on the maximizing h and s for the given budget, and then you subtract off lambda star; oh here I should probably, I'm not going to have room here, am I? So what you subtract off, minus lambda star at b of h star and s star, but each of these guys is also a function of little b, little b minus little b, right?
So you have this large, kind of complicated multivariable function; it's defined in terms of h stars and s stars, which are themselves very implicit, right? We just say, by definition, these are whatever values make the gradient of L equal zero. So it’s very hard to think about what that means concretely, but all of this is really just dependent on the single value little b.
From here, if we want to evaluate the derivative of L, we want to evaluate the derivative of this Lagrangian with respect to little b, which is really the only thing it depends on. It's just via all of these other variables we use the multivariable chain rule. At this point, if you don't know the multivariable chain rule, I have a video on that. Definitely pause, go take a look, make sure that it all makes sense.
Right here, I'm just going to be assuming that you know what the multivariable chain rule is. So what it is, is we take the partial—we're going to look at the partial derivatives with respect to all four of these inputs. So we'll start with the partial derivative of L with respect to h star, with respect to h star, and we're going to multiply that by the derivative of h star with respect to b.
This might seem like a very hard thing to think about: like, how do we know how h star changes as b changes? But don’t worry about it; you'll see something magic happen in just a moment. And then we add in the partial derivative of L with respect to that second variable, s star, with respect to whatever the second variable is multiplied by the derivative of s star with respect to b.
You can see how you really need to know what the multivariable chain rule is, right? This would all seem kind of out of the blue. So what we now add in is the partial derivative of L with respect to that lambda star, with respect to lambda star, multiplied by the derivative of lambda star with respect to little b. And then finally, finally we take the partial derivative of this Lagrangian with respect to that little b, which we're now considering a variable in there, right?
We're no longer considering that b a constant, multiplied by, well, something kind of silly, the derivative of b with respect to itself. So now, if you're thinking that this is going to be horrifying to compute, I can understand where you're coming from. You have to know the derivative of lambda star with respect to b. You have to somehow intimately be familiar with how this lambda star changes as you change b.
And like I said, that's such an implicit relationship, right? We just said that lambda stars, by definition, whatever the solution to this gradient equation is. So somehow you're supposed to know how that changes when you slightly alter b over here. Well, you don't really have to worry about things because by definition, h star, s star, and lambda star are whatever values make the gradient of L equal to zero.
But if you think about that, what does it mean for the gradient of L to equal the zero vector? Well, what it means is that when you take its derivative with respect to that first variable, h star, it equals zero. When you take its derivative with respect to the second variable, that equals zero as well. And with respect to this third variable, that's going to equal zero. By definition, h star, s star, and lambda star are whatever values make it the case that when you plug them in, the partial derivative of the Lagrangian with respect to any one of those variables equals zero.
So we don't even have to worry about most of this equation. The only part that matters here is the partial derivative of L with respect to b that we’re now considering a variable multiplied by, well, what's db? What is the rate of change of a variable with respect to itself? It's 1. It is 1.
So all of this stuff, this entire multivariable chain rule boils down to a single, innocent-looking factor, which is the partial derivative of L with respect to little b. And now there's something very subtle here, right? Because this might seem obvious; I'm saying the derivative of L with respect to b equals the derivative of L with respect to b. But maybe I should give a different notation here, right?
Because here, when I'm taking the derivative, really I'm considering L as a single variable function, right? I'm considering not what happens as you can freely change all four of these variables—three of them are locked into place by b. So maybe I should really give that a different name. I should call that L star. L star is a single variable function, whereas this L is a multivariable function.
This is the function where you can freely change the values of h, s, and lambda, and b as you put them in. So if we kind of scroll up to look at its definition, which I've written all over, I guess here, let me actually rewrite its definition, right? I think that'll be useful. I'm going to rewrite that L. If I consider it as a four-variable function of h, s, lambda, and b, that what that equals is r evaluated h and s minus lambda multiplied by this constraint function b evaluated at h and s minus little b.
And this is now when I'm considering little b to be a variable. So this is the Lagrangian when you consider all four of these to be freely changing as you want, whereas the thing up here that I'm considering a single variable function has three of its inputs locked into place. So effectively, it's just a single variable function with respect to b.
So it's actually quite miraculous that the single-variable derivative of that L here, I should L star with respect to b, ends up being the same as the partial derivative of L. This L, where you're free to change all the variables—that these should be the same. Usually, in any usual circumstance, all of these other terms would have come into play somehow, but what's special here is that by the definition of this L star, the specific way in which these h star, s star, and lambda stars are locked into place happens to be one in which all of these partial derivatives go to zero.
So that's pretty subtle, and I think it's quite clever. And what it leaves us with is that we just have to evaluate this partial derivative, which is quite simple because we look down here and you say, what's the partial derivative of L with respect to b? Well, this r has no b's in it, so don't need to care about that. This term over here, its partial derivative is negative one, right? Just because there's a b here and that's multiplied by the constant lambda, so that all just equals lambda.
But if we're in the situation where lambda is locked into place as a function of little b, then we'd write lambda star as a function of little b, right? So if that feels a little notationally confusing, I'm right there with you. But the important part here, the important thing to remember, is that we just started considering b as a variable, right?
And we were looking at the h star, s star, and lambda star as they depended on that variable. We made the observation that the Lagrangian evaluated at that critical point equals the revenue evaluated at that critical point. The rest of the stuff just cancels out. So if you want to know the derivative of m star, the maximizing revenue, with respect to the budget, you know how much does your maximum revenue change for tiny changes in your budget?
That's the same as looking at the derivative of the Lagrangian with respect to the budget, so long as you're considering it only on the values h star star, lambda star that are critical points of the Lagrangian. And all of that really nicely boils down to just taking a simple partial derivative that gives us the relation we want.