Expressing a quadratic form with a matrix
Hey guys, there's one more thing I need to talk about before I can describe the vectorized form for the quadratic approximation of multivariable functions, which is a mouthful to say. So let's say you have some kind of expression that looks like ( ax^2 + bxy + cy^2 ), where I'm thinking of ( x ) and ( y ) as variables, and ( a ), ( b ), and ( c ) as constants.
Now, this kind of expression has a fancy name: it's called a quadratic form. That name always threw me off. I always kind of was like, "What? What does form mean?" You know, I know what a quadratic expression is, and quadratic typically means something is squared or you have two variables multiplied together, but why do they call it a form?
Basically, it just means that the only terms in here are quadratic. You know, it's not the case that you have, say, an ( x ) term sitting on its own, or a constant like two added on at the end. Instead, you have purely quadratic terms. But of course, mathematicians don't want to call it just a purely quadratic expression; instead, they have to give things a fancy name so that it seems more intimidating than it needs to be.
But anyway, so we have a quadratic form, and the question is: how can we express this in a vectorized sense? For an analogy, let's think about linear terms. Say you have ( ax + by ), and I'll throw in another constant times another variable, ( cz ). If you see something like this, where every variable is just multiplied by a constant and the terms are added together, we can express it nicely with vectors.
You pile all of the constants into their own vector, a vector containing ( a ), ( b ), and ( c ), and you imagine the dot product between that and a vector containing all of the variables, ( x ), ( y ), and ( z ). The convenience is that you can then let a single symbol like ( \mathbf{V} ) represent the whole constant vector, and write down the dot product between that and another symbol, maybe a bold-faced ( \mathbf{X} ), representing the vector of variables:

( \mathbf{V} \cdot \mathbf{X} = \begin{pmatrix} a \\ b \\ c \end{pmatrix} \cdot \begin{pmatrix} x \\ y \\ z \end{pmatrix} = ax + by + cz )
This way, your notation just kind of looks like a constant times a variable, just like in the single variable world when you have a constant number times a variable number. It’s kind of like taking a constant vector times a variable vector. The importance of writing things down like this is that ( \mathbf{V} ) could be a vector that contains not just three numbers but like 100 numbers, and then ( \mathbf{X} ) would have 100 corresponding variables. The notation doesn’t become any more complicated; it’s generalizable to higher dimensions.
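(As a quick numerical illustration, not from the original lesson: here's what that dot product looks like in numpy, with made-up values standing in for the constants and variables.)

```python
import numpy as np

# Constant vector and variable vector; the specific numbers are placeholders.
v = np.array([2.0, -1.0, 3.0])   # plays the role of (a, b, c)
x = np.array([1.0, 4.0, 0.5])    # plays the role of (x, y, z)

# The linear expression ax + by + cz collapses into a single dot product.
print(np.dot(v, x))  # 2*1 + (-1)*4 + 3*0.5 = -0.5
```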
So the question is: can we do something similar with our quadratic form? Because imagine we started introducing the variable ( z ). Then you would need another constant times the ( xz ) quadratic term, another constant times the ( z^2 ) term, and another one for the ( yz ) term. It gets out of hand, and as soon as you introduce something like 100 variables it gets seriously out of hand, because there are a lot of distinct quadratic terms: with ( n ) variables there are ( n(n+1)/2 ) of them, so 100 variables would give 5,050. So we want a nice way to express this, and I'm just going to show you how we do it, and then we'll work it through to see why it makes sense.
So usually, instead of thinking of the cross term as ( b \cdot xy ), we actually think of it as ( 2b \cdot xy ), two times some constant times ( xy ). This, of course, doesn't make a difference; you would just change what ( b ) represents. But you'll see why it's more convenient to write it this way in just a moment.
So the vectorized way to describe a quadratic form like this is to take a ( 2 \times 2 ) matrix, since this is two dimensions, where ( a ) and ( c ) sit on the diagonal and ( b ) fills both off-diagonal spots:

( M = \begin{pmatrix} a & b \\ b & c \end{pmatrix} )

We always think of these as symmetric matrices: if you imagine reflecting the whole matrix about its main diagonal, you get the same numbers.
So it's important that we have that kind of symmetry. Now, what you do is multiply this matrix on the right by the variable vector ( \begin{pmatrix} x \\ y \end{pmatrix} ), and then you multiply by that same vector again, but turned on its side: instead of a vertical vector, you transpose it into the horizontal vector ( \begin{pmatrix} x & y \end{pmatrix} ) on the other side. This is a little bit analogous to having two copies of the variable multiplied together, except here it's two vectors, one multiplied in on either side.
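(A minimal numerical sketch, not part of the video: building the symmetric matrix and multiplying the variable vector in on both sides. The constants and variable values here are arbitrary.)

```python
import numpy as np

# Symmetric matrix encoding a*x^2 + 2b*x*y + c*y^2; values are placeholders.
a, b, c = 1.0, 2.0, 3.0
M = np.array([[a, b],
              [b, c]])

xy = np.array([1.0, 2.0])  # the variable vector (x, y)

# Multiply by the vector on the right, then by its transpose on the left.
q = xy @ M @ xy
x, y = xy
print(q, a*x**2 + 2*b*x*y + c*y**2)  # 21.0 21.0 -- they agree
```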
And this is a good point, by the way, to pause the video if you're uncomfortable with matrix multiplication: go find the videos about matrix multiplication and refresh, or learn about it.
Moving forward, I'm just going to assume that's something you're familiar with. So, going about computing this: first, let's tackle the right multiplication, the matrix times the vector. The first component comes from multiplying the top row by the corresponding terms of the vector, giving ( ax + by ). The bottom component comes from the bottom row, giving ( bx + cy ):

( \begin{pmatrix} a & b \\ b & c \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} ax + by \\ bx + cy \end{pmatrix} )

That's what the right multiplication looks like. And of course, we've still got our transposed vector waiting over there on the left-hand side.
Now we have a ( 2 \times 1 ) vector, and the transposed vector is ( 1 \times 2 ); you could think of it as a horizontal vector, or as a ( 1 \times 2 ) matrix. When we multiply these, you line up the corresponding terms: ( x ) multiplied by that entire top expression, ( ax + by ), plus ( y ) multiplied by the bottom expression, ( bx + cy ).
All of these are numbers, so we can simplify. Distributing, the first term is ( x \cdot ax ), which is ( ax^2 ). The next term is ( x \cdot by ), which is ( bxy ). Over here, we have ( y \cdot bx ), which is the same thing, ( bxy ) again, and that's why it's convenient to write the two in ( 2bxy ): that factor of two naturally comes out of the expansion. The last term is ( y \cdot cy ), which is ( cy^2 ).
So we get back the original quadratic form we were shooting for: ( ax^2 + 2bxy + cy^2 ). That's how this entire expression expands; as you work it through, you end up with the same quadratic expression.
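(If you want to check the expansion symbolically rather than by hand, here's a short sketch using sympy; this isn't part of the video.)

```python
import sympy as sp

x, y, a, b, c = sp.symbols('x y a b c')

M = sp.Matrix([[a, b],
               [b, c]])
vec = sp.Matrix([x, y])

# (x y) M (x, y)^T is a 1x1 matrix; take its single entry and expand it.
print(sp.expand((vec.T * M * vec)[0]))
# a*x**2 + 2*b*x*y + c*y**2
```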
Now, the convenience of writing a quadratic form with a matrix like this is that we can express it more abstractly. Instead of writing out the whole matrix, you just let a letter like ( M ) represent it; then you take the bold-faced ( \mathbf{X} ) representing the vector of variables, multiply it in on the right, and multiply its transpose in on the left.
So typically, you denote the transpose by putting a little ( T ) as a superscript, and the whole expression is ( \mathbf{X}^T M \mathbf{X} ); this is what a quadratic form looks like in vectorized form. The convenience is the same as in the linear case: just as ( \mathbf{V} ) could contain not just three but 100 numbers, with ( \mathbf{X} ) holding the 100 corresponding variables, you can write this same expression even if the matrix ( M ) is super huge.
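(To make that "same notation in any dimension" point concrete, here's a sketch: one line of numpy evaluates the quadratic form whether there are 2 variables or 100. The random matrix and the helper name quadratic_form are my own illustration, not anything from the video.)

```python
import numpy as np

def quadratic_form(M: np.ndarray, x: np.ndarray) -> float:
    """Evaluate x^T M x; M is assumed symmetric."""
    return float(x @ M @ x)

# Works identically for 2 variables or 100; here, 100.
rng = np.random.default_rng(0)
A = rng.standard_normal((100, 100))
M = (A + A.T) / 2              # symmetrize so M[i, j] == M[j, i]
x = rng.standard_normal(100)
print(quadratic_form(M, x))
```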
Let's just see what this would look like in a three-dimensional circumstance. Actually, I'll need more room, so I'll go down even further. We have ( \mathbf{X}^T M \mathbf{X} ), where the transposed vector now holds ( x ), ( y ), and ( z ). Say the free entries of the matrix are ( a, b, c, d, e, f ); because it needs to be symmetric, whatever term sits in a spot above the diagonal has to match the term in the mirror-image spot below it:

( \mathbf{X}^T M \mathbf{X} = \begin{pmatrix} x & y & z \end{pmatrix} \begin{pmatrix} a & b & c \\ b & d & e \\ c & e & f \end{pmatrix} \begin{pmatrix} x \\ y \\ z \end{pmatrix} )

So the ( b ) from the first row shows up again below the diagonal, and similarly ( c ) and ( e ) each appear twice. There are really only six free terms, but they fill up the entire matrix, and on the right side we multiply by the vector of ( x, y, z ).
Now, I won't work it out in this video, but you can imagine actually multiplying this matrix by the vector, and then multiplying the result by the transposed vector, and you'll get some kind of quadratic form with three variables. The point is that the expression you get is fairly complicated, but it's very simple to express in this form.
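(Here's that three-variable expansion done symbolically, as a sketch with sympy, since it isn't worked out in the video.)

```python
import sympy as sp

x, y, z = sp.symbols('x y z')
a, b, c, d, e, f = sp.symbols('a b c d e f')

M = sp.Matrix([[a, b, c],
               [b, d, e],
               [c, e, f]])
vec = sp.Matrix([x, y, z])

# Every off-diagonal entry appears twice, so each cross term picks up a 2.
print(sp.expand((vec.T * M * vec)[0]))
# a*x**2 + 2*b*x*y + 2*c*x*z + d*y**2 + 2*e*y*z + f*z**2
```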
So with that tool in hand, in the next video, I will talk about how we can use this notation to express the quadratic approximations for multivariable functions. See you then!