The Hessian matrix | Multivariable calculus | Khan Academy
Hey guys, so before talking about the vector form for the quadratic approximation of multivariable functions, I've got to introduce this thing called the Hessian matrix. The Hessian matrix, and essentially what this is, it's just a way to package all the information of the second derivatives of a function.
So let's say you have some kind of multivariable function, like the example we had in the last video: e to the x halves multiplied by sine of y. So some kind of multivariable function. What the Hessian matrix is, and it's often denoted with an H or kind of a bold-faced H, is a matrix, appropriately enough, that contains all the second partial derivatives of f.
So the first component is going to be the partial derivative of f with respect to x, kind of twice in a row. Everything in this first column, it's kind of like you first do it with respect to x, because the next part is the second derivative, where first you do it with respect to x and then you do it with respect to y. So that's kind of the first column of the matrix.
And then up here, it's the partial derivative where first you do it with respect to y and then you do it with respect to x. And then over here, it's where you do it with respect to y both times in a row.
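So written out in symbols, where a notation like the one in the bottom left means you differentiate with respect to x first and then y, that layout looks like this:

```latex
H = \begin{pmatrix}
\frac{\partial^2 f}{\partial x^2} & \frac{\partial^2 f}{\partial x \, \partial y} \\[1ex]
\frac{\partial^2 f}{\partial y \, \partial x} & \frac{\partial^2 f}{\partial y^2}
\end{pmatrix}
```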
So let's go ahead and actually compute this and think about what this would look like in the case of our specific function here. In order to get all the second partial derivatives, we first should just kind of keep a record of the first partial derivatives.
So the partial derivative of f with respect to x: the only place x shows up is in this e to the x halves, so we bring down that 1/2, giving 1/2 e to the x halves, and sine of y just looks like a constant as far as x is concerned, so it stays as sine of y. Then the partial derivative of f with respect to y: now e to the x halves looks like a constant, and it's being multiplied by something that has a y in it, so we get e to the x halves times the derivative of sine of y, which, since we're doing it with respect to y, is cosine of y. These terms won't be included in the Hessian itself, but we're keeping a record of them, because we'll differentiate them again when we fill in the matrix.
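So written out, the two first partial derivatives we just computed are:

```latex
\frac{\partial f}{\partial x} = \frac{1}{2}\, e^{x/2} \sin(y),
\qquad
\frac{\partial f}{\partial y} = e^{x/2} \cos(y)
```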
Now for the matrix itself. The upper left component is the second partial derivative where we do it with respect to x, then x again: up here, where we did it with respect to x, differentiating with respect to x again brings down another half, so that becomes 1/4 e to the x halves, and that sine of y still just looks like a constant, so sine of y. Then this mixed partial derivative, where we do it with respect to x and then y: we already did it with respect to x here.
When we differentiate that with respect to y, the 1/2 e to the x halves just looks like a constant, and then the derivative of sine of y gives cosine of y, so we get 1/2 e to the x halves cosine of y. And then up here, it's going to be the same thing, but let's see what happens when you do it in the other direction, first with respect to y and then x.
So over here, we did it first with respect to y. If we take that derivative with respect to x, the half comes down, so that would be 1/2 e to the x halves multiplied by cosine of y, because cosine of y just looks like a constant when we're doing it with respect to x the second time.
And it shouldn't feel like a surprise that both of these mixed partial terms turn out to be the same; with most functions that's the case. Technically it's not all functions: you can come up with some crazy examples where the mixed partials differ and the matrix isn't symmetric off the diagonal. But whenever the second partial derivatives are continuous, which covers most functions you'll run into, you can expect them to be equal.
And then this last term here, where we do it with respect to y twice: we now take the derivative of this whole term with respect to y. That e to the x halves looks like a constant, and the derivative of cosine of y is negative sine of y, so we get negative e to the x halves sine of y. So this whole thing, a matrix each of whose components is a multivariable function, is the Hessian.
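Collecting those four computations into the matrix, the Hessian of our example function is:

```latex
H(f) = \begin{pmatrix}
\frac{1}{4}\, e^{x/2} \sin(y) & \frac{1}{2}\, e^{x/2} \cos(y) \\[1ex]
\frac{1}{2}\, e^{x/2} \cos(y) & -\, e^{x/2} \sin(y)
\end{pmatrix}
```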
This is the Hessian of f, and sometimes people will write it as H of f, kind of specifying what function it's of. You could think of it as a matrix-valued function, which feels kind of weird, but you know, you plug in two different values x and y and you'll get a matrix.
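As a quick side note, not something from the video: if you want to check a computation like this by machine, here's a minimal sketch using Python's SymPy library with our running example function.

```python
from sympy import symbols, exp, sin, hessian

x, y = symbols("x y")
f = exp(x / 2) * sin(y)  # our running example: e^(x/2) * sin(y)

# hessian() builds the matrix of all second partial derivatives.
H = hessian(f, (x, y))
print(H)
# Matrix([[exp(x/2)*sin(y)/4, exp(x/2)*cos(y)/2],
#         [exp(x/2)*cos(y)/2, -exp(x/2)*sin(y)]])

# It is a matrix-valued function: substitute numbers for x and y
# and you get back an ordinary matrix of numbers.
print(H.subs({x: 0, y: 0}))  # Matrix([[0, 1/2], [1/2, 0]])
```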
So it's this matrix-valued function, and the nice thing about writing it like this is that you can extend it beyond functions of just two variables. Let me clear this up. Let's say you had a function with three variables, or four variables, or kind of any number.
So let's say it was, you know, a function of x, y, and z. Then you can follow this pattern, and following down the first column here, the next term that you would get would be the second partial derivative of f where first you do it with respect to x and then you do it with respect to z.
And then over here, it would be the second partial derivative of f where first you do it with respect to y and then with respect to z. I'll clear up even more room here, because you'd have a third column, where you'd have the second partial derivative where first you do it with respect to z and then with respect to x, and then over here the second partial derivative where first you do it with respect to z and then with respect to y.
And then as the very last component, you'd have the second partial derivative where you do it with respect to z twice. So this whole thing, this 3x3 matrix, would be the Hessian of a three-variable function.
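So written out, the full 3x3 layout for a three-variable function looks like this:

```latex
H(f) = \begin{pmatrix}
\frac{\partial^2 f}{\partial x^2} & \frac{\partial^2 f}{\partial x \, \partial y} & \frac{\partial^2 f}{\partial x \, \partial z} \\[1ex]
\frac{\partial^2 f}{\partial y \, \partial x} & \frac{\partial^2 f}{\partial y^2} & \frac{\partial^2 f}{\partial y \, \partial z} \\[1ex]
\frac{\partial^2 f}{\partial z \, \partial x} & \frac{\partial^2 f}{\partial z \, \partial y} & \frac{\partial^2 f}{\partial z^2}
\end{pmatrix}
```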
And you can see how you could extend this pattern: if it was a four-variable function, you'd get a 4x4 matrix of all the possible second partial derivatives, and if it was a 100-variable function, you would have a 100 by 100 matrix. The nice thing about having this is that we can then talk about all of that information just by referencing the symbol. We'll see in the next video how this makes it very nice to express, for example, the quadratic approximation of any kind of multivariable function, not just a two-variable function, and the symbols don't get way out of hand, because you don't have to reference each one of these individual components.
You can just reference the matrix as a whole and start doing matrix operations. And I will see you in that next video.