Second partial derivative test intuition
Hey everyone! So, in the last video, I introduced this thing called the second partial derivative test. It applies when you have some kind of multivariable function, or really just a two-variable function, something like f of X, Y that outputs a number, and you're looking for places where it has a local maximum or a local minimum.
The first step, as I talked about a few videos ago, is to find where the gradient equals 0. Sometimes, you'll hear these called critical points or stable points, but they're just inputs where the gradient equals zero, and that's really just a way of compactly writing the fact that all the partial derivatives are equal to zero.
Now, when you find a point like this, in order to test whether it's a local maximum or a local minimum or a saddle point—without actually looking at the graph, because you don't always have the ability to do that at your disposal—the first step is to compute this long value. This is the thing I want to give intuition behind, where you take all three second partial derivatives: the second partial derivative with respect to X, the second partial derivative with respect to Y, and the mixed partial derivative, where first you do it with respect to X, then you do it with respect to Y.
You compute this value by evaluating each one of those at your critical point. You multiply the two pure second partial derivatives and then subtract off the square of the mixed partial derivative. Again, I'll give intuition for that in a moment, but for right now we'll just kind of take it as given.
Alright, so you compute this number, which I'll call H, and if that value H is greater than zero, what it tells you is that you definitely have either a maximum or a minimum. To determine which one, you just have to look at the concavity in one direction.
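Written out, the value described above looks like this. The label (x_0, y_0) for the critical point is just notation introduced here, and H is the name the video uses for this value:

```latex
H = f_{xx}(x_0, y_0)\, f_{yy}(x_0, y_0) - \bigl(f_{xy}(x_0, y_0)\bigr)^2
```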
So, you'll look at the second partial derivative with respect to X, for example, and if that was positive, that would tell you when you look in the X direction there's a positive concavity. If it was negative, it would mean a negative concavity.
A positive value for that second partial derivative would mean a local minimum, and a negative value would mean a local maximum. So, that's what it means if this value H turns out to be greater than zero.
If this value H turns out to be less than zero, strictly less than zero, then you definitely have a saddle point. A saddle point is neither a maximum nor a minimum; it's kind of like there's disagreement in different directions over whether it should be a maximum or minimum.
If H equals zero, the test isn't good enough; you would have to do something else to figure it out.
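As a minimal sketch, the three cases of the test can be written as a small Python function. The function name and the returned strings are made up for illustration; the inputs are assumed to be the three second partial derivatives already evaluated at a critical point.

```python
def classify_critical_point(fxx, fyy, fxy):
    """Apply the second partial derivative test to values taken at a critical point."""
    h = fxx * fyy - fxy ** 2
    if h > 0:
        # Definitely a max or a min; concavity in the x direction decides which.
        return "local minimum" if fxx > 0 else "local maximum"
    if h < 0:
        return "saddle point"
    # h == 0: the test says nothing either way.
    return "inconclusive"


print(classify_critical_point(2, 2, 0))    # e.g. x^2 + y^2 at the origin
print(classify_critical_point(-2, -2, 0))  # e.g. -x^2 - y^2 at the origin
print(classify_critical_point(2, -2, 0))   # e.g. x^2 - y^2 at the origin
print(classify_critical_point(0, 0, 0))    # all second partials vanish
```

Note that when H is positive, the two pure second partials must have the same sign, so checking either one gives the same answer.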
So why does this work? Why does this seemingly random conglomeration of second partial derivatives give you a test that lets you determine what type of stable point you're looking at? Well, let's just understand each term individually.
The second partial derivative with respect to X—since you're taking both partial derivatives with respect to X—you're basically treating the entire multi-variable function as if X is the only variable and Y was just some constant. So, it's like you're only looking at movement in the X direction.
In terms of a graph, let's say we've got like this graph here; you can imagine slicing this with a plane that represents movement purely in the X direction. So, that'll be a constant Y value slice, and you take a look at the curve where this slice intersects your graph.
The one that I have pictured here looks like it has a positive concavity. So, this term right here kind of tells you X concavity. It's kind of like, "What is the concavity as far as the variable X is concerned?"
Then, symmetrically, this over here—when you take the partial derivative with respect to Y two times in a row—it's like you're ignoring the fact that X is even a variable, and you're looking purely at what movement in the Y direction looks like.
On the graph that I have pictured here, it also happens to give you kind of this positive concavity parabolic look, but the point is that the curve on the graph that results from looking at movement purely in the Y direction can be analyzed just looking at this partial derivative with respect to Y twice in a row.
So, that term kind of tells you Y concavity. Now, first of all, notice what happens if these disagree. Say X thought there should be positive concavity and Y thought there should be negative concavity. Here, I'll write down what that means. If X thinks there's positive concavity, we have here some kind of positive number, which I'll just write as a plus sign in parentheses.
Then this Y concavity here would be some kind of negative number, so I'll just put a negative sign in parentheses. That would mean this very first term would be a positive times a negative, and that first term would be negative.
Now, the thing that we're subtracting off (I'll get to the intuition behind this mixed partial derivative term in a moment) is something squared, so it's never negative. So, you're never subtracting off a negative number, which means if this initial term is negative, the entire value H is definitely going to be negative.
So, it's going to put you over into this saddle point territory. This makes sense, because if the X direction and the Y direction disagree on concavity, that should be a saddle point.
The quintessential example here is when you have the function f of X, Y is equal to x squared minus y squared. The graph of that, by the way, would look like this, where let's see—so orienting myself here, moving in the X direction, you have kind of positive concavity, which corresponds to the positive coefficient in front of x squared.
In the Y direction, it looks like negative concavity, corresponding to that negative coefficient in front of the Y squared. So, when there's disagreement among these, the test ensures that we're going to have a saddle point.
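For this example the numbers can be worked out by hand. The computation below is a short check of the claim, evaluated at the critical point (0, 0):

```latex
f(x, y) = x^2 - y^2
\quad\Longrightarrow\quad
f_{xx} = 2, \qquad f_{yy} = -2, \qquad f_{xy} = 0,
\qquad
H = (2)(-2) - 0^2 = -4 < 0 .
```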
Now, what about if they agree? Right? What if either it's the case that X thinks there should be positive concavity and Y thinks there should be positive concavity or they both agree that there should be, you know, negative concavity?
In either one of these cases, when you multiply them together, the product is positive. So, it's kind of like saying if you look purely in the X direction or purely in the Y direction, they agree that there should be, you know, definitely positive concavity or definitely negative concavity.
So, that entire first term is going to be positive. It's kind of like a clever way of capturing whether or not the X direction and Y direction agree. However, the reason that it's not enough is that, in either case, we're still subtracting off something that's never negative.
So, when you have this agreement between the X direction and Y direction, it then turns into a battle between this X-Y agreement and whatever's going on with this mixed partial derivative term.
The stronger that mixed partial derivative term, the bigger this negative number. So, the more it's pulling the entire value H towards being negative. So, let me see if I can give a little bit of reasoning behind why this mixed partial derivative term is trying to pull things towards being a saddle point.
Let's take a look at the very simple function f of X, Y is equal to x times y. So, what that looks like graphically is this; it looks like a saddle point.
So, let's go ahead and look at its partial derivatives. The first partial derivatives—partial with respect to X and partial with respect to Y—when you do it with respect to X, X looks like a variable, and Y looks like a constant; it's just that constant Y.
When you do it with respect to Y, it goes the other way around; Y looks like the variable, and X looks like the constant, so the derivative is that constant X. Now, when you take the second partial derivatives, if you do it with respect to X twice in a row, you're differentiating this with respect to X, and that looks like a constant—so you get zero.
Similarly, if you do it with respect to Y twice in a row, you're doing this, and the derivative of X with respect to Y—X looks like a constant—goes to zero. But the important term, the one that we're getting an intuition about here, this mixed partial derivative, first with respect to X, then with respect to Y—well, we can view it in two ways.
Either you take the derivative of this expression with respect to Y, in which case it's 1, or you think of taking the derivative of this expression with respect to X, in which case it's also 1. So, it's kind of like this function is a very pure way to take a look at what this mixed partial derivative term looks like.
The higher the coefficient here, the bigger that term: if I had put a coefficient of, you know, 3 here, that would mean that the mixed partial derivative would ultimately end up being 3. So, notice the reason that this looks like a saddle isn't because the X and Y directions disagree.
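One way to check these second partials without doing the calculus is a rough numerical estimate. The sketch below uses standard central-difference formulas in plain Python; the helper name `second_partials` and the step size are arbitrary choices made here, not anything from the video.

```python
def second_partials(f, x, y, h=1e-3):
    """Central-difference estimates of f_xx, f_yy and f_xy at (x, y)."""
    fxx = (f(x + h, y) - 2 * f(x, y) + f(x - h, y)) / h**2
    fyy = (f(x, y + h) - 2 * f(x, y) + f(x, y - h)) / h**2
    fxy = (f(x + h, y + h) - f(x + h, y - h)
           - f(x - h, y + h) + f(x - h, y - h)) / (4 * h**2)
    return fxx, fyy, fxy


# f(x, y) = coeff * x * y, evaluated at its critical point, the origin.
for coeff in (1, 3):
    fxx, fyy, fxy = second_partials(lambda x, y: coeff * x * y, 0.0, 0.0)
    H = fxx * fyy - fxy**2
    print(coeff, round(fxx, 6), round(fyy, 6), round(fxy, 6), round(H, 6))
```

The two pure second partials should come out as zero and the mixed partial should come out close to the coefficient, so H is negative in both cases.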
In fact, if you take a look at pure movement in the X direction, it just looks like a constant. The height of the graph along this plane, along this line here, is just a constant, which corresponds to the fact that the second partial derivative with respect to X is equal to 0.
Then likewise, if you cut it with a plane representing a constant X value, meaning movement purely in the Y direction, the height of the graph doesn't really change along there; it's constantly 0, which corresponds to the fact that the second partial derivative with respect to Y is 0.
The reason that the whole thing looks like a saddle is because, when you cut it with a diagonal plane here—a diagonal plane—it looks like it has negative concavity, but if you were to chop it, you know, in another direction, it would look like it has positive concavity.
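To make the diagonal slices concrete, here is the function restricted to the two diagonal lines through the origin. The parameter t is just a label introduced here for position along each line:

```latex
\text{along } y = x:\quad f(t, t) = t \cdot t = t^2 \quad (\text{positive concavity}),
\qquad
\text{along } y = -x:\quad f(t, -t) = t \cdot (-t) = -t^2 \quad (\text{negative concavity}).
```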
So, in fact, this XY term is kind of like a way of capturing whether there's disagreement in the diagonal directions. One thing that might be surprising at first is that you only need one of these second partial derivatives in order to determine all of the information about the diagonal directions.
Because you could imagine, you know, maybe there's disagreement between movement along one certain vector and movement along another, and you would have to account for infinitely many directions and look at all of them. Yet, evidently, it's the case that you only really need to take a look at this mixed partial derivative term, you know, along with the original pure second partial derivatives with respect to X twice and with respect to Y twice.
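A short way to see why three numbers can be enough is the standard formula for the second derivative along a direction. This is a sketch rather than the full argument; it assumes the second partial derivatives are continuous, and the unit vector (a, b) and the point (x_0, y_0) are notation introduced here:

```latex
\frac{d^2}{dt^2}\, f(x_0 + t a,\; y_0 + t b)\,\Big|_{t=0}
  = a^2 f_{xx} + 2ab\, f_{xy} + b^2 f_{yy},
\qquad a^2 + b^2 = 1 .
```

So the concavity in every direction is built from the same three second partial derivatives, with the mixed term only showing up when both a and b are nonzero, that is, in the diagonal directions.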
But still, looking at only three different terms to take into account possible disagreement in infinitely many directions actually feels like quite the surprise. If you want the full rigorous justification for why this is the case, why this second partial derivative test works in kind of an airtight argument, I've put that in an article that goes into the dirty details for those who are interested.
But if you just want the intuition, I think it's fine to think about the fact that this mixed partial derivative is telling you how much your function looks like the graph of f of X, Y equals x times y, which is the graph that captures all of the diagonal disagreement.
When you let that mixed partial derivative term kind of compete with the agreement between the X and Y directions, you know, if they agree very strongly, you have to subtract off a very strong amount in order to pull it back to being negative.
So, this battle back and forth—if it's pulled to be very negative, that will give you a saddle point. If it doesn't pull hard enough, then the agreement between the X and Y directions wins out, and it's either a local maximum or a local minimum.
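To watch this battle play out, here is a small sketch using a made-up family of functions, f(x, y) = x^2 + y^2 + c*x*y, which is not from the video. The X and Y directions always agree on positive concavity, and the coefficient c controls how strong the mixed partial derivative is.

```python
def verdict_for(c):
    """Test f(x, y) = x^2 + y^2 + c*x*y at its critical point (0, 0)."""
    fxx, fyy, fxy = 2, 2, c          # exact second partials of this f
    H = fxx * fyy - fxy**2           # works out to 4 - c^2
    if H > 0:
        return H, "local minimum"    # fxx > 0, so the bowl opens upward
    if H < 0:
        return H, "saddle point"
    return H, "inconclusive"


for c in (0, 1, 2, 3):
    print(c, *verdict_for(c))
# 0 4 local minimum
# 1 3 local minimum
# 2 0 inconclusive
# 3 -5 saddle point
```

With a weak mixed term the agreement wins and the origin stays a local minimum; once c is large enough, the squared mixed term drags H below zero and the origin becomes a saddle point.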
So, hopefully, that sheds a little bit of light on why this term makes sense and why it's a reasonable way to combine the three different second partial derivatives available to you. Again, if you want the full details, I've written that up in an article form. I'll see you next video!