yego.me

Vector form of multivariable quadratic approximation


·Nov 11, 2024

Okay, so we are finally ready to express the quadratic approximation of a multivariable function in vector form. I have the whole thing written out here, where ( f ) is the function that we are trying to approximate, ( (X_0, Y_0) ) is the constant point about which we are approximating, and this entire expression is the quadratic approximation, which I've talked about in past videos.

If it seems very complicated or absurd, or you're unfamiliar with it, let's just dissect it real quick. This over here is the constant term; it just evaluates to a constant. Everything over here is the linear term, because each piece is a variable multiplied by a constant. Then in the remainder, every one of the components has two variables multiplied into it. So ( X^2 ) comes up, and ( X \cdot Y ) and ( Y^2 ) come up, so that's the quadratic term.
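
For reference, the expression being dissected is not reproduced in this transcript. A reconstruction from the description above, using ( Q_f ) as an assumed name for the approximation and subscripts on ( f ) for partial derivatives, would read:

```latex
\[
\begin{aligned}
Q_f(X, Y) ={}& f(X_0, Y_0) \\
&+ f_X(X_0, Y_0)\,(X - X_0) + f_Y(X_0, Y_0)\,(Y - Y_0) \\
&+ \tfrac{1}{2} f_{XX}(X_0, Y_0)\,(X - X_0)^2
 + f_{XY}(X_0, Y_0)\,(X - X_0)(Y - Y_0)
 + \tfrac{1}{2} f_{YY}(X_0, Y_0)\,(Y - Y_0)^2
\end{aligned}
\]
```

The first line is the constant term, the second is the linear term, and the third is the quadratic term.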

Now, to vectorize things, first of all, let's write down the input variable ( (X, Y) ) as a vector. Typically, we'll do that with a boldfaced ( \mathbf{X} ) to indicate that it's a vector, and its components are just going to be the single variables ( X ) and ( Y ), the non-boldfaced. So, this is the vector representing the variable input. Correspondingly, a boldfaced ( \mathbf{X} ) with a little subscript ( 0 ) (i.e., ( \mathbf{X}_0 ), with components ( X_0 ) and ( Y_0 )) is going to be the constant input, the single point in space near which we are approximating.

When we write things like that, the constant term, simply enough, is going to look like evaluating your function at that boldfaced ( \mathbf{X}_0 ). So, that's probably the easiest one to handle. Now, the linear term looks like a dot product, and if we expand it out as a dot product, the first vector holds the partial derivative of ( f ) with respect to ( X ) and the partial derivative with respect to ( Y ), both evaluated at that boldfaced ( \mathbf{X}_0 ) input.

Now, each one of those partial derivatives is multiplied by the variable minus the constant number. So, this looks like taking the dot product. Here, I'm going to erase the word "linear." We're taking it with the vector whose components are ( X - X_0 ) and ( Y - Y_0 ). This is just expressing the same linear term as a dot product, but the convenience here is that the first vector is exactly the gradient of ( f ).

That's the vector that contains all the partial derivatives, evaluated at the special input ( \mathbf{X}_0 ), and then we're taking the dot product between that and the variable vector, boldfaced ( \mathbf{X} - \mathbf{X}_0 ). When you compute boldfaced ( \mathbf{X} ) minus ( \mathbf{X}_0 ) component-wise, it'll be ( X ), the variable, minus ( X_0 ), the constant, and ( Y ), the variable, minus ( Y_0 ), the constant, which is what we have up there.
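
Written out, the linear term as a dot product looks like the following (subscripts on ( f ) denote partial derivatives, a notational assumption rather than something shown in the transcript):

```latex
\[
\nabla f(\mathbf{X}_0) \cdot (\mathbf{X} - \mathbf{X}_0)
=
\begin{bmatrix} f_X(\mathbf{X}_0) \\ f_Y(\mathbf{X}_0) \end{bmatrix}
\cdot
\begin{bmatrix} X - X_0 \\ Y - Y_0 \end{bmatrix}
= f_X(\mathbf{X}_0)\,(X - X_0) + f_Y(\mathbf{X}_0)\,(Y - Y_0)
\]
```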

So, this expression kind of vectorizes the whole linear term. And now, the beef here, the hard part, how are we going to vectorize this quadratic term? Now, that's what I was leading to in the last couple of videos where I talked about how you express a quadratic form like this with a matrix.

The way that you do it, I'll just kind of scroll down to give us some room. The way that you do it is we'll have a matrix whose components are all of these constants. In the upper left, it'll be this ( \frac{1}{2} ) times the second partial derivative evaluated there, and for convenience's sake, I'm going to just write ( \frac{1}{2} ) times the second partial derivative with respect to ( X ) and leave it as understood that we're evaluating it at this point.

In the other diagonal entry, you have ( \frac{1}{2} ) times the other kind of second partial derivative, with respect to ( Y ) two times in a row. Then there's this constant here, on the mixed term, but it gets broken apart into two different components. If you'll remember in the quadratic form video, it was always ( A ), then ( 2B ), and ( C ) as your constants for the quadratic form.

So, if we're interpreting this as two times something, then it gets broken down, and in one corner it shows up as ( \frac{1}{2} f_{XY} ), and in the other one, also ( \frac{1}{2} f_{XY} ). Both of these together constitute the entire mixed partial derivative term. The way that we express the quadratic form is we multiply this matrix by a vector. Well, the first component is whatever the thing is that's squared here, so it's going to be that ( X - X_0 ), and the second component is whatever the other squared thing is, which in this case is ( Y - Y_0 ).

Of course, we take that same vector, but we put it in on the other side too. So, let me make a little bit of room; this is going to be wide. We're going to take that same vector and kind of put it on its side, so it'll be ( X - X_0 ) as the first component and then ( Y - Y_0 ) as the second component, but it's written horizontally.
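
Putting the pieces just described together, the product is a row vector times the matrix times a column vector. In this sketch every second partial derivative is understood to be evaluated at ( (X_0, Y_0) ), and the two mixed partials are assumed equal, which holds when the second partial derivatives are continuous:

```latex
\[
\begin{bmatrix} X - X_0 & Y - Y_0 \end{bmatrix}
\begin{bmatrix}
\tfrac{1}{2} f_{XX} & \tfrac{1}{2} f_{XY} \\
\tfrac{1}{2} f_{XY} & \tfrac{1}{2} f_{YY}
\end{bmatrix}
\begin{bmatrix} X - X_0 \\ Y - Y_0 \end{bmatrix}
= \tfrac{1}{2} f_{XX}\,(X - X_0)^2
+ f_{XY}\,(X - X_0)(Y - Y_0)
+ \tfrac{1}{2} f_{YY}\,(Y - Y_0)^2
\]
```

The two off-diagonal entries each contribute ( \tfrac{1}{2} f_{XY}\,(X - X_0)(Y - Y_0) ), which is why they add up to the full mixed term.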

If you multiply out the entire matrix, it's going to give us the same expression that you have up here. If that seems unfamiliar, if that seems, you know, "How do you go from there to there?" check out the video on quadratic forms or you can check out the article where I'm talking about the quadratic approximation as a whole. I kind of go through the computation there.
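
One way to convince yourself that the matrix product really does reproduce the expanded quadratic term is to check it numerically. The following is a minimal sketch in plain Python; the values of `f_xx`, `f_xy`, and `f_yy` are made-up numbers standing in for the second partial derivatives at the point, not values from the video.

```python
# Hypothetical second partial derivatives at (x0, y0); the numbers are made up.
f_xx, f_xy, f_yy = 2.0, -1.0, 4.0


def quadratic_term_expanded(dx, dy):
    """Component-by-component quadratic term, with dx = x - x0 and dy = y - y0."""
    return 0.5 * f_xx * dx**2 + f_xy * dx * dy + 0.5 * f_yy * dy**2


def quadratic_term_matrix(dx, dy):
    """The same term written as (row vector) * (matrix) * (column vector)."""
    m = [[0.5 * f_xx, 0.5 * f_xy],
         [0.5 * f_xy, 0.5 * f_yy]]
    v = [dx, dy]
    # Matrix times the column vector.
    mv = [m[0][0] * v[0] + m[0][1] * v[1],
          m[1][0] * v[0] + m[1][1] * v[1]]
    # Row vector times that result.
    return v[0] * mv[0] + v[1] * mv[1]


# The two ways of computing the quadratic term should agree up to rounding.
print(quadratic_term_expanded(0.3, -0.2), quadratic_term_matrix(0.3, -0.2))
```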

Now this matrix right here is almost the Hessian matrix; this is why I made a video about the Hessian matrix. It's not quite because everything has a ( \frac{1}{2} ) multiplied into it, so I'm just going to kind of take that out, and we'll remember we have to multiply a ( \frac{1}{2} ) in at some point. But otherwise, it is the Hessian matrix which we denote with a kind of boldfaced ( \mathbf{H} ), and I emphasize that it's the Hessian of ( f ).
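
In symbols, for the two-variable case, factoring out the ( \frac{1}{2} ) looks like this (entries understood to be evaluated at ( \mathbf{X}_0 ), with partial-derivative subscripts as a notational assumption):

```latex
\[
\begin{bmatrix}
\tfrac{1}{2} f_{XX} & \tfrac{1}{2} f_{XY} \\
\tfrac{1}{2} f_{XY} & \tfrac{1}{2} f_{YY}
\end{bmatrix}
= \tfrac{1}{2}
\begin{bmatrix}
f_{XX} & f_{XY} \\
f_{XY} & f_{YY}
\end{bmatrix}
= \tfrac{1}{2}\,\mathbf{H}_f(\mathbf{X}_0)
\]
```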

The Hessian is something you take of a function, and like I said, remember each of these terms we should be thinking of as evaluated at the special input point, that boldfaced ( \mathbf{X}_0 ) input point. I was just kind of too lazy to write it in each time, the ( X_0, Y_0 ), all of that. What we have then is we're multiplying it on the right by this whole vector, the variable vector boldfaced ( \mathbf{X} - \mathbf{X}_0 ).

That's what that entire vector is, and then we have the same thing on the left, you know, the boldfaced vector ( \mathbf{X} - \mathbf{X}_0 ), except that we transpose it. We kind of put it on its side, and the way you denote that is with a little ( T ) there for transpose. So this term captures all of the quadratic information that we need for the approximation.

So just to put it all together, if we go back up, when we put the constant term that we have, the linear term, and this quadratic form that we just found all together, what we get is that the quadratic approximation of ( f ), which is a function, we'll think of it as a vector input ( \mathbf{X} ), equals the function itself evaluated at, you know, whatever point we're approximating near plus the gradient of ( f ), which is kind of its vector analog of a derivative evaluated at that point.

So this is a constant vector, dot product with the variable vector ( \mathbf{X} - \mathbf{X}_0 ), that whole thing, plus ( \frac{1}{2} ) times the quadratic term, which we'll just copy down from up there: the variable minus the constant, transposed, multiplied by the Hessian, which is kind of like an extension of the second derivative to multivariable functions.

We're evaluating that, let's see, we're evaluating it at the constant ( \mathbf{X}_0 ), and then on the right side, we're multiplying it by the variable ( \mathbf{X} - \mathbf{X}_0 ). This is the quadratic approximation in vector form, and the important part is that now it doesn't have to be a function of just a two-variable input.
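
Collected into one line, the vector form described here reads as follows, with ( Q_f ) as an assumed name for the approximation:

```latex
\[
Q_f(\mathbf{X})
= f(\mathbf{X}_0)
+ \nabla f(\mathbf{X}_0) \cdot (\mathbf{X} - \mathbf{X}_0)
+ \tfrac{1}{2}\,(\mathbf{X} - \mathbf{X}_0)^{T}\,
  \mathbf{H}_f(\mathbf{X}_0)\,(\mathbf{X} - \mathbf{X}_0)
\]
```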

You could imagine plugging in a three-variable input or a four-variable input, and all of these terms make sense. You know, you take the gradient of a four-variable function, you'll get a vector with four components. You take the Hessian of a four-variable function, you would get a ( 4 \times 4 ) matrix, and all of these terms make sense.
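
To make the point about more variables concrete, here is a small sketch in plain Python that evaluates the vector form for an input of any length. The function name `quadratic_approximation` and the three-variable example `f` are illustrative assumptions, not something from the video; the gradient and Hessian of the example were worked out by hand.

```python
def quadratic_approximation(f0, grad, hess, x0, x):
    """Evaluate f0 + grad . (x - x0) + 1/2 (x - x0)^T hess (x - x0).

    f0   : value of f at x0
    grad : gradient of f at x0 (list of n numbers)
    hess : Hessian of f at x0 (n lists of n numbers)
    """
    d = [xi - x0i for xi, x0i in zip(x, x0)]
    linear = sum(g * di for g, di in zip(grad, d))
    # Hessian times the column vector d.
    hd = [sum(h * dj for h, dj in zip(row, d)) for row in hess]
    quadratic = 0.5 * sum(di * hdi for di, hdi in zip(d, hd))
    return f0 + linear + quadratic


# Hypothetical three-variable example: f(x, y, z) = x^2 + 2xy + 3z^2 + y.
def f(p):
    x, y, z = p
    return x**2 + 2 * x * y + 3 * z**2 + y


x0 = [1.0, 2.0, -1.0]
# Gradient (2x + 2y, 2x + 1, 6z) evaluated at x0.
grad_at_x0 = [6.0, 3.0, -6.0]
# Hessian of f, constant because f is itself quadratic: a 3 x 3 matrix.
hess_at_x0 = [[2.0, 2.0, 0.0],
              [2.0, 0.0, 0.0],
              [0.0, 0.0, 6.0]]

x = [0.5, 1.0, 2.0]
print(quadratic_approximation(f(x0), grad_at_x0, hess_at_x0, x0, x), f(x))
```

Because this particular `f` is already a quadratic polynomial, its quadratic approximation reproduces it, so the two printed values should match. For a general function they would only agree closely near `x0`.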

I think it's also prettier to write it this way because it looks a lot more like a Taylor expansion in the single-variable world. You have, you know, a constant term, plus the value of a derivative times ( X ) minus a constant, plus ( \frac{1}{2} ) times what's kind of like the second derivative term, times what's kind of like an ( X^2 ). But this is how it looks in the vector world. So, in that way, it's actually maybe a little bit more familiar than writing it out in the full, you know, component-by-component form, where it's easy to kind of get lost in the weeds.
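
For comparison, the single-variable second-order Taylor expansion being alluded to is the standard one, written here with the same uppercase variable:

```latex
\[
f(X) \approx f(X_0) + f'(X_0)\,(X - X_0) + \tfrac{1}{2}\,f''(X_0)\,(X - X_0)^2
\]
```

The gradient plays the role of ( f' ), the Hessian plays the role of ( f'' ), and the transposed vector on the left together with the vector on the right plays the role of ( (X - X_0)^2 ).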

So, the full vectorized form of the quadratic approximation of a scalar-valued multivariable function. Boy, is that a lot to say!
