Second partial derivative test
In the last video, we took a look at this function ( f(x, y) = x^4 - 4x^2 + y^2 ), which has the graph that you're looking at on the left. We looked for all of the points where the gradient is equal to zero, which basically means both partial derivatives are equal to zero. We solved that and found three different points: the origin ( (0, 0) ), and then ( (\sqrt{2}, 0) ) and ( (-\sqrt{2}, 0) ). These correspond to the origin here, which is a saddle point, and then these two local minima.
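As a quick side check (my own addition, not from the video), a short sympy computation can confirm those three critical points; this is just a sketch assuming sympy is available.

```python
# Hypothetical sketch (not from the video): find where both partial
# derivatives of f(x, y) = x^4 - 4x^2 + y^2 vanish, using sympy.
import sympy as sp

x, y = sp.symbols('x y', real=True)
f = x**4 - 4*x**2 + y**2

# Solve df/dx = 0 and df/dy = 0 simultaneously.
critical_points = sp.solve([sp.diff(f, x), sp.diff(f, y)], [x, y])
print(critical_points)  # the origin plus (±sqrt(2), 0)
```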
It seemed like we had a reasonable explanation for why this is a saddle point and why both of those are local minima: we took the second partial derivative with respect to ( x ) (I was kind of all over the board here) and found that when you evaluate it at ( x = 0 ), you get a negative number, indicating negative concavity, so it should look like a maximum in the ( x ) direction.
When you do the same with the second partial derivative with respect to ( y ) (man, I always do this, I always leave out the squared on that lower term there), you get 2 as a constant, a positive, which indicates that it looks like a minimum in the ( y ) direction. So that's why this origin point looks like a saddle point: the ( x ) direction and the ( y ) direction disagree.
When you do this with the other points, they kind of both agree that it should look like a minimum. But I said that's not enough; I said that you need to take into account the mixed partial derivative term. To see why that's true, let me go ahead and pull up another example for you here.
So the graph of the function that you're looking at right now clearly has a saddle point at the origin, which we can see visually. But when we get the equation for this function, the equation is ( f(x, y) = x^2 + y^2 - 4xy ). Now let's go ahead and analyze the partial derivative information of this function; we'll just take its partial derivatives.
So the partial with respect to ( x ): when we differentiate the ( x^2 ) term, we get ( 2x ); the ( y^2 ) looks like a constant, so it contributes nothing; and then the last term gives negative ( 4y ), because ( y ) looks like a constant. So, minus ( 4y ).
When we do the partial derivative with respect to ( y ), very similarly, we're going to get ( 2y ) when we differentiate the ( y^2 ) term, and then minus ( 4x ) from that last term, because now ( x ) looks like the constant and ( y ) looks like the variable. So, minus ( 4x ).
Now, when we plug in ( x ) and ( y ) equal to 0— you know, we plug in the origin point to both of these functions—we see that they're equal to zero because ( x ) is zero, ( y ) is zero; this guy goes to zero. Similarly, over here, that goes to zero, so we will indeed get a flat tangent plane at the origin.
But now let's take a look at the second partial derivatives, purely in terms of ( x ) and purely in terms of ( y ). If we take the pure second partial derivative of ( f ) with respect to ( x ) twice, we look at this expression, differentiate it with respect to ( x ), and get a constant positive 2, because that ( -4y ) term does nothing for us.
Similarly, when we take the second partial derivative with respect to ( y ) (I always forget that squared on the bottom), we also get a constant positive 2, because the ( -4x ) term does nothing for us when we differentiate with respect to ( y ).
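To double-check the derivatives worked out above, here is a small sympy sketch (again my own addition, not part of the video) for ( f(x, y) = x^2 + y^2 - 4xy ):

```python
# Hypothetical check of the partial derivatives worked out above.
import sympy as sp

x, y = sp.symbols('x y', real=True)
f = x**2 + y**2 - 4*x*y

fx = sp.diff(f, x)   # 2x - 4y: zero at the origin
fy = sp.diff(f, y)   # 2y - 4x: also zero at the origin
print(fx, fy)

# Both pure second partials are the constant 2, hinting (misleadingly,
# as discussed next) at a local minimum.
print(sp.diff(f, x, 2), sp.diff(f, y, 2))
```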
This would suggest that there's positive concavity in the ( x ) direction and positive concavity in the ( y ) direction, so it would suggest that the graph looks like an upward smiley face from all directions and should be a local minimum.
But when we look at the graph, this isn't true: it's not a local minimum, it's a saddle point. So what this tells us is that these two second partial derivatives aren't enough; we need more information. What it comes down to is this last term here, ( -4xy ). Oh, actually, I think I made a mistake; I think I meant to make this ( +4xy ).
So let's see: ( +4xy ), which would change the signs in those partial derivatives; it actually won't make a difference, though, since it still gives a saddle point. But anyway, we've got this ( +4xy ) term that evidently makes a difference, that evidently influences whether this point is a local minimum or a saddle point.
Just to give a loose intuition for what's going on here: instead of writing 4 here, I'm going to write the variable ( p ). ( p ) is just going to be some number, and I'm going to move that variable around, basically letting it range from 0 up to 4.
So right now, as you're looking at it, it's sitting at 4. I'm going to pull it back toward zero just to see how this influences the graph. We see that once we pull it back to zero, we get something that reflects what you would expect: from the ( x ) direction, it's a positive smiley face; from the ( y ) direction, it's also a positive smiley face; and everything's good. It looks like a local minimum, and it is.
Even as you let ( p ) range more and more (I'm guessing right now it's around 1.5), you get something that's still a local minimum; there's positive concavity in all directions. But there's a crossover here: as you're moving ( p ), at some point it passes over and turns the graph into a saddle point.
Again, this is entirely about the coefficient in front of the ( xy ) term; it has nothing to do with the ( x^2 ) or the ( y^2 ). So at some point it passes over, and from that point on, everything is going to be a saddle point. In a moment, it'll become clear that the crossover happens when ( p ) is equal to 2.
So right here, it's going to be when ( p ) equals 2; it kind of passes from making things a local minimum to a saddle point. Let me show you the test which will tell us why this is true. The full reasoning behind this test is something that I'll get to in later videos, but right now, I just want to kind of have it on the table and teach you what it is and how to use it.
So this is called the second partial derivative test (I'll just write "deriv test" since I'm a slow writer). Basically, what it says is this: if you've found a point ( (x_0, y_0) ) where the gradient of your function is equal to zero, then calculate the following value.
You take the second partial derivative with respect to ( x ) twice. Here I'm using subscript notation, which means exactly the same thing as the second partial derivative with respect to ( x ) twice; it's just a different choice of notation. Then you evaluate it at this point ( (x_0, y_0) ).
Then you multiply that by the second partial derivative with respect to ( y ) twice, evaluated at that same point ( (x_0, y_0) ). Then you subtract off the square of the mixed partial derivative, the one where first you differentiate with respect to ( x ) and then with respect to ( y ), or in the other order; it doesn't really matter, as long as you take it with respect to both variables.
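Written out in the subscript notation, the quantity being described is ( f_{xx}(x_0, y_0)\, f_{yy}(x_0, y_0) - \big( f_{xy}(x_0, y_0) \big)^2 ).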
So you compute this entire value, and it's going to give you some number. Let's give it a name; let's name it ( h ). If ( h ) is greater than zero, then you have either a max or a min; you're not sure which one yet. You can tell whether it's a maximum or a minimum by looking at one of the pure second partial derivatives (with respect to ( x ) twice, or with respect to ( y ) twice) and getting a feel for the concavity there.
Right? If this one is positive, it indicates a smiley-face concavity, and it would be a local minimum. The fact that this entire value ( h ) is greater than zero is what tells you that you can just do that: you can look at the concavity with respect to one of those variables, and that will tell you what you need to know about the entire graph.
But if ( h ) is less than zero, then you definitely have a saddle point. If ( h ) is exactly equal to zero (that unlucky case), then you don't know; the second partial derivative test isn't enough to determine the answer. But in almost all cases, you'll find that it's either strictly greater than zero or strictly less than zero.
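As a concrete sketch of the whole procedure, here is how the test might look in code. This is my own illustration, assuming sympy; the function name `classify_critical_point` is hypothetical, not something from the video.

```python
# A minimal, hypothetical implementation of the second partial derivative
# test described above, using sympy.
import sympy as sp

x, y = sp.symbols('x y', real=True)

def classify_critical_point(f, x0, y0):
    """Classify a point (x0, y0) where the gradient of f(x, y) is zero."""
    fxx = sp.diff(f, x, 2).subs({x: x0, y: y0})
    fyy = sp.diff(f, y, 2).subs({x: x0, y: y0})
    fxy = sp.diff(f, x, y).subs({x: x0, y: y0})
    h = fxx * fyy - fxy**2          # the value h from the test
    if h > 0:
        # Max or min; one pure second partial settles which.
        return 'local minimum' if fxx > 0 else 'local maximum'
    if h < 0:
        return 'saddle point'
    return 'inconclusive'           # h == 0: the test gives no answer

# The first function from the video:
f = x**4 - 4*x**2 + y**2
print(classify_critical_point(f, 0, 0))           # saddle point
print(classify_critical_point(f, sp.sqrt(2), 0))  # local minimum
```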
So as an example, let's see what that looks like in the case of the specific function we started with, where ( p ) is some constant that I was letting kind of range from 0 to 4 when I was animating it here; but you should just think of ( p ) as being some number.
Well, in that case, let's compute this value ( h ) at the origin; we're analyzing at the origin. We've already calculated the partial derivative with respect to ( x ) twice in a row and with respect to ( y ) twice in a row, and both of those, when we computed them, were just constants equal to 2. They were equal to 2 everywhere, and in particular, they're equal to 2 at the origin.
So we can go ahead and plug those in: this gives ( 2 \times 2 ), and now we need to subtract off the mixed partial derivative squared. If we compute that, taking the derivative with respect to ( x ) and then ( y ), or ( y ) and then ( x ) (let's say we start with the partial derivative with respect to ( x )), then when we differentiate with respect to ( y ), we're left with the constant term sitting in front of the ( y ), which is really whatever this constant ( p ) happened to equal.
You might be able to see, just looking at this function, that when you take the mixed partial derivative, you get the coefficient in front of the ( xy ) term: first you differentiate with respect to ( x ), and the ( x ) goes away; then with respect to ( y ), and the ( y ) goes away; and you're just left with that constant.
So what you end up with in the second partial derivative test is that value ( p ) (which might equal 4 or 0 or whatever we happen to have it as), and you square it; that ( p^2 ) is the term we subtract off.
So in the case where ( p ) was equal to zero, if we go over here and slide things so that ( p ) is completely equal to zero, then our entire value ( h ) would equal 4. Because ( h ) is positive, it's definitely a maximum or a minimum; and then by analyzing one of those pure second partial derivatives and seeing that there's positive concavity, we would see that it's definitely a local minimum, because positive concavity gives a local minimum.
But in the other case, let's say we let ( p ) range all the way up to 4. In formulas, what that means for us when we let ( p ) equal 4 is that we're taking ( 2 \times 2 - 4^2 ); that is, ( 2 \times 2 - 16 ).
What that implies (sorry about getting kind of scrunched on the board here) is that ( h ) is equal to, let's see, ( 4 - 16 ): negative 12.
So I'm just going to erase this to clear up some room. When ( p = 4 ), this value is negative 12. In fact, this explains the crossover point where the graph goes from having a local minimum to a saddle point: it's the point where this entire expression is equal to zero, and you can see that happens when ( p = 2 ).
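To confirm that crossover claim symbolically (again my own sketch, assuming sympy), we can leave ( p ) as a symbol and solve for where ( h ) hits zero:

```python
# Hypothetical check: for f(x, y) = x^2 + y^2 + p*x*y, the test value at
# the origin is 2*2 - p^2 = 4 - p^2, which changes sign at p = 2.
import sympy as sp

x, y, p = sp.symbols('x y p', real=True)
f = x**2 + y**2 + p*x*y

# h = f_xx * f_yy - (f_xy)^2, with everything constant in this example.
h = sp.diff(f, x, 2) * sp.diff(f, y, 2) - sp.diff(f, x, y)**2
print(sp.expand(h))              # the test value as a function of p
print(sp.solve(sp.Eq(h, 0), p))  # the values of p where h crosses zero
```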
So over here, the crossover point, where it goes from being a local minimum to a saddle point, is at ( p = 2 ); and when ( p ) perfectly equals 2 (let's see, so about here), the second partial derivative test isn't going to be enough to tell us anything. It can't tell us it's definitely a max or a min, and it can't tell us it's definitely a saddle point.
In this particular case, that corresponds to the fact that the graph is perfectly flat in one direction and a minimum in another direction. In other cases, it might mean something different, and I'll probably make a video just about that special case when this whole value is equal to zero.
But for now, all that I want to emphasize is what this test is: you take all three second partial derivatives, multiply together the two pure second partial derivatives (the one where you do ( x ) and then ( x ), and the one where you do ( y ) and then ( y )), and then subtract off the mixed partial derivative squared.
In the next video, I'll try to give a little more intuition for where this whole formula comes from, why it's not completely random, and why checking whether this value is greater than zero or less than zero is a reasonable way to determine whether the point you're looking at is a local minimum, a local maximum, or a saddle point. See you then.