The Hessian matrix | Multivariable calculus | Khan Academy


Nov 11, 2024

Hey guys, so before talking about the vector form for the quadratic approximation of multivariable functions, I've got to introduce this thing called the Hessian matrix. Essentially, it's just a way to package all the information of the second derivatives of a function.

So let's say you have some kind of multivariable function, like, I don't know, like the example we had in the last video, e to the x halves multiplied by sine of y. So some kind of multivariable function. What the Hessian matrix is, and it's often denoted with an H or kind of a bold-faced H, is a matrix, incidentally enough, that contains all the second partial derivatives of f.

So the first component is going to be the partial derivative of f with respect to x, kind of twice in a row. Everything in this first column, it's kind of like you first do it with respect to x, because the next part is the second derivative, where first you do it with respect to x and then you do it with respect to y. So that's kind of the first column of the matrix.

And then up here, it's the partial derivative where first you do it with respect to y and then you do it with respect to x. And then over here, it's where you do it with respect to y both times in a row—so partial with respect to y both times in a row.
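Written out, the layout described above can be sketched as follows. The Leibniz notation is an assumption of this sketch, not something stated in the transcript: the variable written rightmost in the denominator is the one differentiated first, so the bottom-left entry means "first x, then y".

```latex
\mathbf{H}_f(x, y) =
\begin{bmatrix}
  \dfrac{\partial^2 f}{\partial x^2} & \dfrac{\partial^2 f}{\partial x\,\partial y} \\[1.5ex]
  \dfrac{\partial^2 f}{\partial y\,\partial x} & \dfrac{\partial^2 f}{\partial y^2}
\end{bmatrix}
```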

So let's go ahead and actually compute this and think about what this would look like in the case of our specific function here. In order to get all the second partial derivatives, we first should just kind of keep a record of the first partial derivatives.

So the partial derivative of f with respect to x: the only place x shows up is in this e to the x halves, so differentiating brings down that 1/2, giving 1/2 e to the x halves, and sine of y just looks like a constant as far as x is concerned, so it stays as sine of y. Then the partial derivative of f with respect to y: now e to the x halves looks like a constant, and it's being multiplied by something that has a y in it.

So that's e to the x halves times the derivative of sine of y, which, since we're doing it with respect to y, is cosine of y. These terms won't be included in the Hessian itself, but we're kind of just keeping a record of them because now, when we go in to fill in the matrix, this upper left component, we're taking the second partial derivative where we do it with respect to x, then x again.

So up here, when we did it with respect to x, if we did it with respect to x again, we kind of bring down another half, so that becomes 1/4 e to the x halves, and that sine of y still just looks like a constant. Then this mixed partial derivative, where we do it with respect to x then y: we did it with respect to x here.

When we differentiate this with respect to y, the 1/2 e to the x halves just looks like a constant, but then the derivative of sine of y ends up as cosine of y, so we get 1/2 e to the x halves cosine of y. And then up here, it's going to be the same thing, but let's kind of see how that works when you do it in the other direction, when you do it first with respect to y, then x.

So over here, we did it first with respect to y; if we took this derivative with respect to x, the half would come down, so that would be 1/2 e to the x halves multiplied by cosine of y, because cosine of y just looks like a constant since we're doing it with respect to x the second time.

And it shouldn't feel like a surprise that both of these terms turn out to be the same; with most functions that's the case. Technically not all functions: you can come up with some crazy things where this won't be symmetric, where the two off-diagonal terms are different, but as long as the second partial derivatives are continuous, you can expect those to be the same.

And then this last term here, where we do it with respect to y twice, we now think of taking the derivative of this whole term with respect to y. That e to the x halves looks like a constant, and the derivative of cosine of y is negative sine of y, so this entry is negative e to the x halves sine of y. So this whole thing, a matrix each of whose components is a multivariable function, is the Hessian.
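Collecting the pieces computed so far, the worked example can be summarized in symbols. This is just a transcription of the spoken steps into LaTeX; the symbol names f_x and f_y for the first partial derivatives are a notational assumption.

```latex
f(x, y) = e^{x/2} \sin y, \qquad
f_x = \tfrac{1}{2} e^{x/2} \sin y, \qquad
f_y = e^{x/2} \cos y

\mathbf{H}_f(x, y) =
\begin{bmatrix}
  \tfrac{1}{4} e^{x/2} \sin y & \tfrac{1}{2} e^{x/2} \cos y \\[1ex]
  \tfrac{1}{2} e^{x/2} \cos y & -e^{x/2} \sin y
\end{bmatrix}
```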

This is the Hessian of f, and sometimes people will write it as Hessian of f, kind of specifying what function it's of. You could think of this, I mean, you could think of it as a matrix-valued function, which feels kind of weird, but you know, you plug in two different values x and y and you'll get a matrix.
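To make the "matrix-valued function" idea concrete, here is a minimal Python sketch. The function name `hessian_f` and the sample point are illustrative assumptions; the entries are the ones computed in the walkthrough for f(x, y) = e^(x/2) sin(y).

```python
import math

def hessian_f(x, y):
    """Return the 2x2 Hessian of f(x, y) = exp(x/2) * sin(y) as nested lists.

    Hypothetical helper for illustration; the entries are the second
    partial derivatives worked out in the transcript.
    """
    e = math.exp(x / 2)
    f_xx = 0.25 * e * math.sin(y)   # x, then x again
    f_xy = 0.5 * e * math.cos(y)    # mixed partial (same in either order here)
    f_yy = -e * math.sin(y)         # y, then y again
    return [[f_xx, f_xy],
            [f_xy, f_yy]]

# Plug in two numbers, get a matrix of numbers back.
H = hessian_f(0.0, math.pi / 2)
print(H)
```

At the point (0, π/2) the result is approximately [[0.25, 0], [0, -1]], up to floating-point rounding in the off-diagonal entries.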

So it's this matrix-valued function, and the nice thing about writing it like this is that you could actually extend it so that rather than just for functions that have two variables—let's say you had a function, I'll kind of like clear this up. Let's say you had a function that had three variables or four variables, or kind of any number.

So let's say it was, you know, a function of x, y, and z. Then you can follow this pattern, and following down the first column here, the next term that you would get would be the second partial derivative of f where first you do it with respect to x and then you do it with respect to z.

And then over here, it would be the second partial derivative of f where first you did it with respect to y and then you do it with respect to z. I'll clear up even more room here because you'd have another column where you'd have the second partial derivative where this time everything—you know, first you do it with respect to z and then with respect to x, and then over here you'd have the second partial derivative where first you do it with respect to z and then with respect to y.

And then as the very last component, you'd have the second partial derivative where you do it with respect to z twice. So this whole thing, this 3x3 matrix, would be the Hessian of a three-variable function, and you can see how you could extend this pattern, where if it was a four-variable function, you'd get a 4x4 matrix of all of the possible second partial derivatives.
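For a function of x, y, and z, the pattern described above can be sketched like this. As an assumption of this sketch, the Leibniz notation puts the variable differentiated first on the right of the denominator, so the first column holds the entries where you start with x.

```latex
\mathbf{H}_f(x, y, z) =
\begin{bmatrix}
  \dfrac{\partial^2 f}{\partial x^2} &
  \dfrac{\partial^2 f}{\partial x\,\partial y} &
  \dfrac{\partial^2 f}{\partial x\,\partial z} \\[1.5ex]
  \dfrac{\partial^2 f}{\partial y\,\partial x} &
  \dfrac{\partial^2 f}{\partial y^2} &
  \dfrac{\partial^2 f}{\partial y\,\partial z} \\[1.5ex]
  \dfrac{\partial^2 f}{\partial z\,\partial x} &
  \dfrac{\partial^2 f}{\partial z\,\partial y} &
  \dfrac{\partial^2 f}{\partial z^2}
\end{bmatrix}
```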

And if it was a 100-variable function, you would have a 100 by 100 matrix. So the nice thing about having this is then we can talk about that by just referencing the symbol. And we'll see in the next video how this makes it very nice to express, for example, the quadratic approximation of any kind of multivariable function—not just a two-variable function—and the symbols don't get way out of hand because you don't have to reference each one of these individual components.
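The same pattern for any number of variables can be illustrated numerically. The sketch below is not from the video: it approximates each second partial derivative with a central finite difference, and the names `numerical_hessian`, `g`, and `q`, the step size `h`, and the sample points are all assumptions chosen for illustration.

```python
import math

def numerical_hessian(f, point, h=1e-4):
    """Approximate the n x n Hessian of f at `point` by central differences.

    `f` takes a list of n floats. Entry [i][j] estimates the second
    partial derivative with respect to variables i and j.
    """
    n = len(point)

    def shifted(i, di, j, dj):
        # Evaluate f with variable i moved by di and variable j moved by dj.
        p = list(point)
        p[i] += di
        p[j] += dj
        return f(p)

    H = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            H[i][j] = (
                shifted(i, h, j, h) - shifted(i, h, j, -h)
                - shifted(i, -h, j, h) + shifted(i, -h, j, -h)
            ) / (4 * h * h)
    return H

# The two-variable example from the transcript: f(x, y) = exp(x/2) * sin(y).
def g(p):
    return math.exp(p[0] / 2) * math.sin(p[1])

H2 = numerical_hessian(g, [0.0, math.pi / 2])

# A made-up three-variable function, q(x, y, z) = x^2 * y + z^3.
def q(p):
    return p[0] ** 2 * p[1] + p[2] ** 3

H3 = numerical_hessian(q, [1.0, 2.0, 3.0])
```

Here `H2` should land close to the analytic Hessian of the transcript's example at that point, and `H3` is a 3x3 matrix, matching the idea that an n-variable function gives an n by n Hessian. Finite differences are only an approximation, so the entries carry a small numerical error.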

You can just reference the matrix as a whole and start doing matrix operations. And I will see you in that next video.
