The Jacobian matrix

5m read

·Nov 11, 2024

In the last video, we were looking at this particular function. It's a very non-linear function, and we were picturing it as a transformation that takes every point (x, y) in space to the point (x + sin(y), y + sin(x)).

Moreover, we zoomed in on a specific point, and let me actually write down what point we zoomed in on. It was (-2, 1). That's something we're going to want to record here: (-2, 1). I added a couple extra grid lines around it just so we can see in detail what the transformation does to points that are in a neighborhood of that point.

Over here, this square shows the zoomed-in version of that neighborhood. What we saw was that even though the function as a whole, as a transformation, looks rather complicated, around that one point it looks like a linear function. It's locally linear.

So what I’ll show you here is what matrix is going to tell you—the linear function that this looks like. This is going to be some kind of 2 by 2 matrix. I'll make a lot of room for ourselves here; it'll be a 2x2 matrix. The way to think about it is to first go back to our original setup before the transformation and think of just a tiny step to the right, what I'm going to think of as a little partial x, a tiny step in the x direction.

What that turns into after the transformation is going to be some tiny step in the output space. Here, let me actually kind of draw on what that tiny step turned into. It's no longer purely in the x direction. It has some rightward component but now also some downward component.

To be able to represent this in a nice way, what I’m going to do is instead of writing the entire function as something with a vector-valued output, I'm going to go ahead and represent this as two separate scalar-valued functions. I'm going to write the scalar value functions f1(x, y) and f2(x, y). So I'm just giving a name to x + sin(y) and f2(x, y), again, all I'm doing is giving a name to the functions we already have written down.

When I look at this vector, the consequence of taking a tiny dx step in the input space corresponds to some 2D movement in the output space. The x component of that movement, right—if I was going to draw this out and say, "Hey, what's the x component of that movement?"—that's something we think of as a little partial change in f1, the x component of our output.

If we divide this—if we take, you know, partial f1 divided by the size of that initial tiny change—it basically scales it up to be a normal-sized vector, not a tiny nudge, but something that's more constant that doesn't shrink as we zoom in further and further. Then, similarly, the change in the y direction, right? The vertical component of that step that was still caused by the dx, right, it's still caused by that initial step to the right—that is going to be the tiny partial change in f2, the y component of the output.

Because here, we're all just looking in the output space that was caused by a partial change in the x direction. Again, I kind of like to think about this: we're dividing by a tiny amount. This partial f2 is really a tiny, tiny nudge, but by dividing by the size of the initial tiny nudge that caused it, we're getting something that's basically a number, something that doesn't shrink when we consider more and more zoomed in versions.

So, that's all what happens when we take a tiny step in the x direction. But another thing you could do, another thing you can consider, is a tiny step in the y direction. We want to know, "Hey, if you take a single step, some tiny unit upward, what does that turn into after the transformation?"

What that looks like is this vector that still has some upward component, but it also has a rightward component. Now, I'm going to write its components as the second column of the matrix because, as we know, when you're representing a linear transformation with a matrix, the first column tells you where the first basis vector goes, and the second column shows where the second basis vector goes.

If that feels unfamiliar, either check out the refresher video or maybe go and look at some of the linear algebra content. To figure out the coordinates of this guy, we do basically the same thing. We say, first of all, the change in the x direction here, the x component of this nudge vector, that's going to be given as a partial change to f1, right, to the x component of the output.

Here, we're looking in the output space, so we're dealing with f1 and f2. We're asking what that change was that was caused by a tiny change in the y direction. So the change in f1 caused by some tiny step in the y direction, divided by the size of that tiny step, and then the y component of our output here, the y component of the step in the output space that was caused by the initial tiny step upward in the input space—well, that is the change of f2, the second component of our output, as caused by dy, as caused by that little partial y.

And of course, all of this is very specific to the point that we started at, right? We started at the point (-2, 1). So each of these partial derivatives is something that really we're saying, "Don't take the function, evaluate it at the point (-2, 1)." When you evaluate each one of these at the point (-2, 1), you'll get some number, and that'll give you a very concrete 2 by 2 matrix that's gonna represent the linear transformation that this guy looks like once you've zoomed in.

So this matrix here that's full of all of the different partial derivatives has a very special name. It's called, as you may have guessed, the Jacobian, or more fully, you'd call it the Jacobian matrix. One way to think about it is that it carries all of the partial differential information, right? It's taking into account both of these components of the output and both possible inputs, and giving you kind of a grid of what all the partial derivatives are.

But as I hope you see, it's much more than just a way of recording what all the partial derivatives are. There's a reason for organizing it like this, in particular, and it really does come down to this idea of local linearity. If you understand that the Jacobian matrix is fundamentally supposed to represent what a transformation looks like when you zoom in near a specific point, almost everything else about it will start to fall in place.

In the next video, I'll go ahead and actually compute this just to show you what the process looks like and how the result we get kind of matches with the picture we're looking at. See you then.

The Jacobian matrix

More Articles