Vector form of multivariable quadratic approximation
Okay, so we are finally ready to express the quadratic approximation of a multivariable function in vector form. I have the whole thing written out here, where ( f ) is the function we're trying to approximate, ( (X_0, Y_0) ) is the constant point about which we are approximating, and this entire expression is the quadratic approximation, which I've talked about in past videos.
If it seems complicated or unfamiliar, let's just dissect it real quick. This over here is the constant term; it's just going to evaluate to a constant. Everything over here is the linear term, because it just involves taking a variable multiplied by a constant. Then in the remainder, every one of the components has two variables multiplied into it: ( X^2 ) comes up, and ( X \cdot Y ), and ( Y^2 ), so that's the quadratic term.
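For reference, the expression being dissected, which is only described verbally here, is the standard two-variable quadratic approximation:

```latex
Q_f(X, Y) = f(X_0, Y_0)
  + f_X(X_0, Y_0)\,(X - X_0) + f_Y(X_0, Y_0)\,(Y - Y_0)
  + \tfrac{1}{2} f_{XX}(X_0, Y_0)\,(X - X_0)^2
  + f_{XY}(X_0, Y_0)\,(X - X_0)(Y - Y_0)
  + \tfrac{1}{2} f_{YY}(X_0, Y_0)\,(Y - Y_0)^2
```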
Now, to vectorize things, first let's write down the input variable ( (X, Y) ) as a vector. Typically we'll do that with a boldfaced ( \mathbf{X} ) to indicate that it's a vector, and its components are just going to be the single variables ( X ) and ( Y ), non-boldfaced. So this is the vector representing the variable input. Correspondingly, a boldfaced ( \mathbf{X}_0 ) is going to be the constant input, the single point in space near which we are approximating.
When we write things like that, the constant term, simply enough, is going to look like evaluating the function at that boldfaced ( \mathbf{X}_0 ). That's probably the easiest one to handle. Now, the linear term looks like a dot product, and if we expand it out as a dot product, it looks like we're taking the partial derivative of ( f ) with respect to ( X ) and the partial derivative with respect to ( Y ), and evaluating both of those at that boldfaced ( \mathbf{X}_0 ) input.
Now, each one of those partial derivatives is multiplied by the variable minus the constant. So this looks like taking a dot product (here, I'm going to erase the word "linear") with ( X - X_0 ) and ( Y - Y_0 ). This just expresses the same linear term as a dot product, but the convenience here is that this is exactly the same thing as the gradient of ( f ).
That's the vector that contains all the partial derivatives, evaluated at the special input ( \mathbf{X}_0 ), and then we're taking the dot product between that and the variable vector ( \mathbf{X} - \mathbf{X}_0 ). When you work out ( \mathbf{X} - \mathbf{X}_0 ) component-wise, it'll be ( X ), the variable, minus ( X_0 ), the constant, and ( Y ), the variable, minus ( Y_0 ), the constant, which is what we have up there.
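Written out, the dot-product form of the linear term is:

```latex
\nabla f(\mathbf{X}_0) \cdot (\mathbf{X} - \mathbf{X}_0)
  = \begin{bmatrix} f_X(\mathbf{X}_0) \\ f_Y(\mathbf{X}_0) \end{bmatrix}
    \cdot
    \begin{bmatrix} X - X_0 \\ Y - Y_0 \end{bmatrix}
  = f_X(\mathbf{X}_0)\,(X - X_0) + f_Y(\mathbf{X}_0)\,(Y - Y_0)
```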
So, this expression kind of vectorizes the whole linear term. And now, the beef here, the hard part, how are we going to vectorize this quadratic term? Now, that's what I was leading to in the last couple of videos where I talked about how you express a quadratic form like this with a matrix.
The way that you do it, and I'll just scroll down to give us some room, is with a matrix whose components are all of these constants. The top-left entry will be ( \frac{1}{2} ) times the second partial derivative with respect to ( X ) twice, evaluated there; for convenience's sake, I'm just going to write ( \frac{1}{2} f_{XX} ) and leave it understood that we're evaluating it at this point.
On the other diagonal entry, you have ( \frac{1}{2} ) times the second partial derivative with respect to ( Y ) two times in a row, ( \frac{1}{2} f_{YY} ). Then we also need this mixed constant here, but that term gets broken apart into two different components. If you'll remember from the quadratic form video, the constants for a quadratic form always came as ( A ), then ( 2B ), then ( C ).
So, if we're interpreting this as two times something, then it gets broken down, and on one corner it shows up as ( \frac{1}{2} f_{XY} ), and on the other corner also as ( \frac{1}{2} f_{XY} ); both of these together constitute the entire mixed partial derivative. The way that we express the quadratic form is that we multiply this matrix by a vector whose first component is whatever the thing is that's squared here, so that's ( X - X_0 ), and whose second component is the other thing that's squared, which in this case is ( Y - Y_0 ).
Of course, we take that same vector but we put it in on the other side too. So let me make a little bit of room, since this is going to be wide. We're going to take that same vector and kind of put it on its side, so it'll be ( X - X_0 ) as the first component and then ( Y - Y_0 ) as the second component, but written horizontally.
If you multiply out the entire matrix, it's going to give us the same expression that you have up here. If that seems unfamiliar, if that seems, you know, "How do you go from there to there?" check out the video on quadratic forms or you can check out the article where I'm talking about the quadratic approximation as a whole. I kind of go through the computation there.
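Carrying out that multiplication, with evaluation at ( \mathbf{X}_0 ) left implicit, recovers exactly the quadratic term from before:

```latex
\begin{bmatrix} X - X_0 & Y - Y_0 \end{bmatrix}
\begin{bmatrix} \tfrac{1}{2} f_{XX} & \tfrac{1}{2} f_{XY} \\[2pt]
                \tfrac{1}{2} f_{XY} & \tfrac{1}{2} f_{YY} \end{bmatrix}
\begin{bmatrix} X - X_0 \\ Y - Y_0 \end{bmatrix}
= \tfrac{1}{2} f_{XX}\,(X - X_0)^2 + f_{XY}\,(X - X_0)(Y - Y_0) + \tfrac{1}{2} f_{YY}\,(Y - Y_0)^2
```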
Now this matrix right here is almost the Hessian matrix; this is why I made a video about the Hessian matrix. It's not quite because everything has a ( \frac{1}{2} ) multiplied into it, so I'm just going to kind of take that out, and we'll remember we have to multiply a ( \frac{1}{2} ) in at some point. But otherwise, it is the Hessian matrix which we denote with a kind of boldfaced ( \mathbf{H} ), and I emphasize that it's the Hessian of ( f ).
The Hessian is something you take of a function, and like I said, remember that each of these terms should be thought of as evaluated at the special input point, that boldfaced ( \mathbf{X}_0 ). I was just too lazy to write in the ( (X_0, Y_0) ) each time. What we have, then, is the Hessian multiplied on the right by this whole vector, the variable vector ( \mathbf{X} - \mathbf{X}_0 ).
That's what that entire vector is, and then we have the same thing on the left, the boldfaced vector ( \mathbf{X} - \mathbf{X}_0 ), except that we transpose it: we put it on its side, and the way you denote that is with a little ( T ) there for transpose. So this term captures all of the quadratic information that we need for the approximation.
So just to put it all together, if we go back up, when we put the constant term, the linear term, and this quadratic form that we just found all together, what we get is that the quadratic approximation of ( f ), thought of as a function of a vector input ( \mathbf{X} ), equals the function itself evaluated at whatever point we're approximating near, plus the gradient of ( f ), which is kind of its vector analog of a derivative, evaluated at that point.
So this is a constant vector, dot product with the variable vector ( \mathbf{X} - \mathbf{X}_0 ), that whole thing, plus ( \frac{1}{2} ) times this whole quadratic term from above: the variable minus the constant, transposed, multiplied by the Hessian, which is kind of like an extension of the second derivative to multivariable functions.
We're evaluating that Hessian at the constant ( \mathbf{X}_0 ), and then on the right side we're multiplying it by the variable ( \mathbf{X} - \mathbf{X}_0 ). This is the quadratic approximation in vector form, and the important part is that now it doesn't just have to be of a two-variable input.
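Collected into a single formula, the vector form reads:

```latex
Q_f(\mathbf{X}) = f(\mathbf{X}_0)
  + \nabla f(\mathbf{X}_0) \cdot (\mathbf{X} - \mathbf{X}_0)
  + \tfrac{1}{2}\,(\mathbf{X} - \mathbf{X}_0)^{T}\,\mathbf{H}_f(\mathbf{X}_0)\,(\mathbf{X} - \mathbf{X}_0)
```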
You could imagine plugging in a three-variable input or a four-variable input, and all of these terms make sense. You know, you take the gradient of a four-variable function, you'll get a vector with four components. You take the Hessian of a four-variable function, you would get a ( 4 \times 4 ) matrix, and all of these terms make sense.
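As a small sketch of that dimension-independence, here's a Python/NumPy example (my own illustration, not from the video) that builds the quadratic approximation from finite-difference estimates of the gradient and Hessian. The function `f` and the base point `x0` are arbitrary choices; the same code works unchanged for three, four, or more variables.

```python
import numpy as np

# Example function (an arbitrary choice for illustration): f(x, y) = e^x * sin(y).
def f(v):
    x, y = v
    return np.exp(x) * np.sin(y)

def gradient(f, x0, h=1e-5):
    """Central-difference estimate of the gradient of f at x0."""
    n = len(x0)
    g = np.zeros(n)
    for i in range(n):
        e = np.zeros(n)
        e[i] = h
        g[i] = (f(x0 + e) - f(x0 - e)) / (2 * h)
    return g

def hessian(f, x0, h=1e-4):
    """Central-difference estimate of the Hessian matrix of f at x0."""
    n = len(x0)
    H = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            ei = np.zeros(n); ei[i] = h
            ej = np.zeros(n); ej[j] = h
            H[i, j] = (f(x0 + ei + ej) - f(x0 + ei - ej)
                       - f(x0 - ei + ej) + f(x0 - ei - ej)) / (4 * h * h)
    return H

def quadratic_approx(f, x0):
    """Q(x) = f(x0) + grad f(x0) . (x - x0) + 1/2 (x - x0)^T H (x - x0)."""
    f0, g, H = f(x0), gradient(f, x0), hessian(f, x0)
    def Q(x):
        d = x - x0
        return f0 + g @ d + 0.5 * d @ H @ d
    return Q

x0 = np.array([0.0, 1.0])
Q = quadratic_approx(f, x0)
x = x0 + np.array([0.1, -0.05])   # a point near x0
print(f(x), Q(x))                  # the two values agree closely near x0
```

Since the approximation matches the function's value, gradient, and Hessian at ( \mathbf{X}_0 ), the error shrinks like the cube of the distance from ( \mathbf{X}_0 ).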
I think it's also prettier to write it this way, because it looks a lot more like a Taylor expansion in the single-variable world. You have a constant term, plus the value of a derivative times ( X - X_0 ), plus ( \frac{1}{2} ) times what's kind of like a second derivative times something like ( (X - X_0)^2 ). This is how it looks in the vector world. So in that way, it's actually maybe a little more familiar than writing it out in the full component-by-component form, where it's easy to get lost in the weeds.
So, um, full vectorized form of the quadratic approximation of a scalar-valued multivariable function — boy, is that a lot to say!