
Multivariable chain rule intuition


Nov 11, 2024

So, in the last video, I introduced this multivariable chain rule, and here, I want to explain a loose intuition for why it's true, why you would expect something like this to happen.

The way you think about an expression like this is that you have this multivariable function ( f(x, y) ), and you're plugging things in. But that function by itself, you'd think of as taking a two-dimensional space, you know, here's our ( xy ) plane, and mapping it to just a real number line. And I'll think of this as ( f ), the output.

So our whole function takes things from this two-dimensional space and maps them onto this output. And ( t ), you're thinking of as just another number line up here. Then you've got two separate functions, ( x(t) ) and ( y(t) ), each of which takes that same value of ( t ) as its input. It's not that they're acting on different inputs, ( x ) of one input and ( y ) of some other input. It's the same one.

And then they move that ( t ) somewhere into the ( xy ) plane, and that point itself gets moved over to the output. In this way, you're thinking of it as just a single-variable function that goes from ( t ) and ultimately outputs ( f ). It's just that there's multi-dimensional stuff happening in between.
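To make that picture concrete, here is a minimal sketch of the composition. The specific choices ( x(t) = \cos t ), ( y(t) = \sin t ), and ( f(x, y) = x^2 y ) are assumptions picked for illustration; this transcript does not fix particular functions.

```python
import math

# Hypothetical example functions, chosen only for illustration.
def x(t):
    return math.cos(t)

def y(t):
    return math.sin(t)

def f(x_val, y_val):
    return x_val**2 * y_val

def composed(t):
    """Single-variable function: t -> (x(t), y(t)) -> f(x(t), y(t))."""
    return f(x(t), y(t))

# Both x and y receive the same t; the xy plane is only an intermediate stop.
print(composed(0.0))  # x = 1, y = 0, so the output is 0.0
```

From the outside, `composed` takes one number and returns one number, even though a point in two dimensions appears in between.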

Now, if we start thinking about the derivative of it, what does that mean? What does that mean for the picture that we have going on here? Well, that bottom part, that ( dt ), you're thinking of as a tiny change to ( t ). Right? So, you think of it as kind of a nudge. I'll draw it as a sizeable line here, like moving from some original input over, but in principle you should think of it as a very, very tiny nudge in ( t ).

And over here, you'd say, well, that's going to move your intermediary output in the ( xy ) plane by some amount. Again, imagine this is a very small nudge. I'm going to give it some size here just so I can write on it. Then that nudge in the ( xy ) plane, whatever it is, and it's a nudge in some direction, is going to correspond to some change in ( f ).

That change depends on the differential properties of the multivariable function itself. And if we think about this change, you might break it into components and say this shift here has some kind of ( dx ), some kind of shift in the ( x ) direction, and some kind of ( dy ), some shift in the ( y ) direction.

But you can actually reason about what these should be, because it's not just an arbitrary change in ( x ) or an arbitrary change in ( y ). It's the one that was caused by ( dt ). So if I go over here, I might say that ( dx ) is caused by that ( dt ). And the whole meaning of the derivative, the whole meaning of the single-variable derivative, would be that when we take ( \frac{dx}{dt} ), this is the factor that tells us, for a tiny nudge in ( t ), how much that changes the ( x ) component.

And if you want, you could think of this as kind of canceling out the ( dt )s, and you're just left with ( dx ). But really, you're saying there's a tiny nudge in ( t ), and that results in a change in ( x ). This derivative is what tells you the ratio between those sizes. Similarly, that change in ( y ) here is going to be somehow proportional to the change in ( t ), and that proportion is given by the derivative of ( y ) with respect to ( t ).

That's the whole point of the derivative of ( y ) with respect to ( t ). And again, you can kind of think of it as if you're canceling out the ( dt )s. And this is why the fractional writing, this Leibniz notation, is actually pretty helpful. You know, people will say, "Oh, mathematicians would shake their heads at the idea of treating these like fractions."

But not only is it a useful thing to do because it is a helpful mnemonic, it's reflective of what you're going to do when you make a very formal argument. I think I'll do that in one of the following videos. I'll describe this in a much more formal way that's a little bit more airtight than this kind of hand-waving nudging around.

But the intuition you get from just writing this as a fraction is basically the scaffolding for that formal argument. So it's a fine thing to do. I don't think mathematicians are shaking their heads every time that a student or a teacher does this. But anyway, this is kind of what gives you what that ( dx ) is, what that ( dy ) is.
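As a rough numerical check of that idea, the sketch below nudges ( t ) by a small ( dt ) and compares the actual shifts in ( x ) and ( y ) with the shifts predicted by ( \frac{dx}{dt} \, dt ) and ( \frac{dy}{dt} \, dt ). The functions ( x(t) = \cos t ) and ( y(t) = \sin t ), the point `t0`, and the nudge size are all assumptions made for illustration.

```python
import math

# Hypothetical example functions and their ordinary derivatives.
def x(t):
    return math.cos(t)

def y(t):
    return math.sin(t)

def dx_dt(t):
    return -math.sin(t)

def dy_dt(t):
    return math.cos(t)

t0 = 1.0    # arbitrary starting input
dt = 1e-6   # the tiny nudge in t

# What actually happens to x and y when t is nudged.
actual_dx = x(t0 + dt) - x(t0)
actual_dy = y(t0 + dt) - y(t0)

# What the derivatives say should happen: derivative times the nudge.
predicted_dx = dx_dt(t0) * dt
predicted_dy = dy_dt(t0) * dt

print(actual_dx, predicted_dx)
print(actual_dy, predicted_dy)
```

The two numbers in each pair agree up to an error that shrinks faster than `dt` itself, which is what treating the derivative as a ratio of tiny nudges amounts to.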

And then over here, if you're asking how much that changes the ultimate output of ( f ), you could say, well, you have a nudge of size ( dx ) over here, and you're wondering how much that changes the output of ( f ). That's the meaning of the partial derivative. Right? If we say we have the partial derivative with respect to ( x ), what that means is that if you take a tiny nudge of size ( dx ), this is giving you the ratio between that and the resulting change to the output.

You could kind of think of it like this: the ( \partial x ) is canceling out with that ( dx ), if you wanted. Or you could just say, you know, this is a tiny nudge in ( x ). This is going to result in some change in ( f ). I'm not sure what, but the meaning of the derivative is the ratio between those two, and that's what lets you figure it out.

So you might call this the change in ( f ) caused by ( x ), or I should say, due to ( dx ). But that's not the only thing changing the value of ( f ), right? That's not the only change happening in the input space. You also have another change in ( f ), and this one I might say is due to ( dy ), due to that tiny shift in ( y ).

We know that change is going to be proportional to that tiny shift in ( y ), and the proportionality constant is the partial derivative with respect to ( y ). When you nudge ( y ) in some way, it results in some kind of nudge in ( f ), and the ratio between those two is what the partial derivative gives.
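The two separate contributions just described can be checked numerically. This is only a sketch: the function ( f(x, y) = x^2 y ), the point `(x0, y0)`, and the nudge sizes are assumptions chosen for illustration.

```python
# Hypothetical example function; the transcript does not fix a specific f.
def f(x, y):
    return x**2 * y

def df_dx(x, y):
    return 2 * x * y   # partial derivative of f with respect to x

def df_dy(x, y):
    return x**2        # partial derivative of f with respect to y

x0, y0 = 0.5, 0.8       # arbitrary point in the xy plane
dx, dy = 1e-6, -2e-6    # arbitrary tiny nudges in each direction

# Change in f when only x is nudged, and when only y is nudged.
change_due_to_dx = f(x0 + dx, y0) - f(x0, y0)
change_due_to_dy = f(x0, y0 + dy) - f(x0, y0)

# Each partial derivative times its nudge should predict the matching change.
print(change_due_to_dx, df_dx(x0, y0) * dx)
print(change_due_to_dy, df_dy(x0, y0) * dy)
```

Each partial derivative only accounts for its own direction, which is why neither one alone describes the full change in ( f ).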

So ultimately, if you put this all together, what you'd say is there's two different things causing an ultimate change to ( f ). If you want to know what the total change in ( f ) is, I might go over here and say the total change in ( f ) has two parts: one of them is ( \frac{\partial f}{\partial x} ) multiplied by ( dx ).

But really, we know that ( dx ), the change there, was in turn caused by ( dt ), so I can write it as ( \frac{dx}{dt} ), the change in the ( x ) component due to ( t ), times the nudge itself, which was of course of size ( dt ). For similar reasons, the other way that ( f ) changes, through the ( y ) direction, is ( \frac{\partial f}{\partial y} ) times ( dy ).

But what caused that initial shift in ( y )? You'd say, well, that was a shift in ( y ) that was due to ( t ), and you could think of its size as ( dy = \frac{dy}{dt} \, dt ). So a slight nudge in ( t ) causes the change in ( y ), that change in ( y ) causes the change in ( f ), and when you add those two together, that's everything that's going on. That's everything that influences the ultimate change in ( f ).

Then, if you take this whole expression and you divide everything out by ( dt ), so, you know, I kind of erase it from this side and put it over here under the ( df ), this is your multivariable chain rule: ( \frac{df}{dt} = \frac{\partial f}{\partial x} \frac{dx}{dt} + \frac{\partial f}{\partial y} \frac{dy}{dt} ). Of course, I've just written the same thing again, but hopefully, this gives a little bit of an intuition for how you're composing different nudges and why you want to think about it that way.
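One way to sanity-check the rule is to compare it against a direct finite-difference estimate of the derivative of the composed function. This is a sketch under assumed example functions, ( f(x, y) = x^2 y ), ( x(t) = \cos t ), ( y(t) = \sin t ), none of which are fixed by this transcript.

```python
import math

# Hypothetical example functions, chosen only for illustration.
def x(t):
    return math.cos(t)

def y(t):
    return math.sin(t)

def f(x_val, y_val):
    return x_val**2 * y_val

def chain_rule_derivative(t):
    """df/dt assembled from the two nudge paths: through x and through y."""
    x_val, y_val = x(t), y(t)
    df_dx = 2 * x_val * y_val   # partial of f with respect to x
    df_dy = x_val**2            # partial of f with respect to y
    dx_dt = -math.sin(t)        # ordinary derivative of x(t)
    dy_dt = math.cos(t)         # ordinary derivative of y(t)
    return df_dx * dx_dt + df_dy * dy_dt

def finite_difference_derivative(t, dt=1e-6):
    """Central-difference estimate of d/dt f(x(t), y(t)), ignoring the chain rule."""
    forward = f(x(t + dt), y(t + dt))
    backward = f(x(t - dt), y(t - dt))
    return (forward - backward) / (2 * dt)

t0 = 1.0  # arbitrary input
print(chain_rule_derivative(t0), finite_difference_derivative(t0))
```

The two printed values should agree to many decimal places; the small remaining gap comes from the finite size of `dt` and floating-point rounding, not from the rule itself.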

Of course, you can look at this and see that the ( \partial x ) kind of cancels out with that ( dx ), and this ( \partial y ) kind of cancels out with that ( dy ), and you're left with the two different things that constitute a change in ( f ). You know, this one is only partially the change in ( f ). This is also only partially the change in ( f ), but together they give the ultimate change in ( f ).

I think that gives a very strong reason, if you break it down like that, why this should be true.
