
Multivariable chain rule


6m read
·Nov 11, 2024

So I've written here three different functions. The first one is a multivariable function; it has a two-variable input, (x, y), and a single-variable output, (x^2 \cdot y). That's just a number. The other two functions, (x(t) = \cos(t)) and (y(t) = \sin(t)), are each just regular old single-variable functions.

What I want to do is start thinking about the composition of them. So, I'm going to take as the first component the value of the function (x(t)). So you pump (t) through that, and then you make that the first component of (f), and the second component will be the value of the function (y(t)). The image that you might have in your head for something like this is you can think of (t) as just living on a number line of some kind. Then you have (x) and (y), which together make up a plane, so that'll be, you know, your (x) coordinate and your (y) coordinate in two-dimensional space.

Then you have your output, which is just whatever the value of (f) is. For this whole function, for this whole composition of functions, you're thinking of (x(t), y(t)) as taking a single point in (t) and kind of moving it over to two-dimensional space somewhere. And then from there, our multivariable function takes that back down. So this is just a single variable function, nothing, you know, nothing too fancy going on in terms of where you start and where you end up. It's just what's happening in the middle.
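As a rough sketch of that picture, the composition can be written out in a few lines of Python. The names `x_of_t`, `y_of_t`, and `g` are labels made up for this illustration and are not from the video; the formulas themselves are the three functions described above.

```python
import math


def f(x, y):
    """The multivariable function: two inputs, one output."""
    return x**2 * y


def x_of_t(t):
    """First intermediary function, x(t) = cos(t)."""
    return math.cos(t)


def y_of_t(t):
    """Second intermediary function, y(t) = sin(t)."""
    return math.sin(t)


def g(t):
    """The composition: number line -> plane -> number line."""
    return f(x_of_t(t), y_of_t(t))


# One number goes in, one number comes out.
for t in (0.0, math.pi / 4, 1.0):
    print(t, (x_of_t(t), y_of_t(t)), g(t))
```

So `g` is an ordinary single-variable function, even though it passes through a point in the plane along the way.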

What I want to know is, what's the derivative of this function? If I take this, and it's just an ordinary derivative, not a partial derivative, because this is a single variable function: one variable input, one variable output. How do you take its derivative? There's a special rule for this; it's called the Chain Rule, the multivariable chain rule. But you don't actually need it. So let's actually walk through this, showing that you don't need it. It's not that you'll never need it; it's just for computations like this, you could go without it.

It's a very useful theoretical tool, a very useful model to have in mind for what function composition looks like and implies for derivatives in the multivariable world. So let's just start plugging things in here. If I have (f(x(t), y(t))), the first thing I might do is write (f) and, instead of (x(t)), just write in (\cos(t)), since that's the function that I have for (x(t)), and then for (y(t)) we replace that with (\sin(t)). Of course, I'm hoping to take the derivative of this, and then from there, we can go to the definition of (f):

[f(x,y) = x^2 \cdot y]

which means we take that first component, (\cos(t)), and square it, and then we'll multiply it by the second component, (\sin(t)). Again, we're just taking this derivative, and you might be wondering, "Okay, why am I doing this? You're just showing me how to take a first derivative, an ordinary derivative." But the pattern that we'll see is going to lead us to the multivariable chain rule, and it's actually kind of surprising when you see it in this context, 'cause it pops out in a way that you might not expect things to pop out.

So, continuing or chugging along, when you take the derivative of this, you do the product rule: left d-right plus right d-left. So in this case, the left is (\cos^2(t)); we just leave that as it is, (\cos^2(t)), and multiply it by the derivative of the right. The right is (\sin(t)), so that's going to be (\cos^2(t)) multiplied by (\cos(t)). Then we add to that the right, (\sin(t)), which is, you know, keep that right side unchanged, multiplied by the derivative of the left.

For that, we use the chain rule, the single variable chain rule, where you think of taking the derivative of the outside. So you plop that two down like you're taking the derivative of (x^2) to get (2x), but you're just writing in (\cos(t)) instead of (x), so that's (2\cos(t)), and then you multiply that by the derivative of the inside. That's a tongue twister. The derivative of the inside is negative (\sin(t)).

I'm afraid I'm going to run off the edge here, certainly with the many, many parentheses that I need. I'm going to rewrite this anyway, because there's a certain pattern that I hope to make clear. So let me just copy this side down here. You might be wondering why, but it'll become clear in just a moment why I want to do this.

In this case, I'm going to write this as (2 \cdot \cos(t) \cdot \sin(t)), and then all of that multiplied by negative (\sin(t)). So this is the derivative of the composition of functions that ultimately was a single variable function, but it kind of went through two different variables. I just want to make an observation in terms of the partial derivatives of (f).
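Written out in one place, the direct computation looks like this. It is the same product rule and single variable chain rule steps as above, with nothing new added; the last line is just the regrouped form of the second term.

```latex
\begin{aligned}
\frac{d}{dt}\, f(\cos t, \sin t)
  &= \frac{d}{dt}\left[\cos^2(t)\,\sin(t)\right] \\
  &= \cos^2(t)\cdot\cos(t) \;+\; \sin(t)\cdot 2\cos(t)\cdot\bigl(-\sin(t)\bigr) \\
  &= \cos^2(t)\cdot\cos(t) \;+\; 2\cos(t)\sin(t)\cdot\bigl(-\sin(t)\bigr)
\end{aligned}
```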

So let me just make a copy of this guy, give ourselves a little bit of room down here, and paste that over here. Let's look at the partial derivatives of (f) for a second. If I take the partial derivative with respect to (x), (\partial f/\partial x), that means (y) is treated as a constant. So I take the derivative of (x^2) to get (2x) and then multiply it by that constant, which is just (y), giving (2xy). If I also do it with respect to (y), (\partial f/\partial y), now (y) looks like a variable and (x) looks like a constant.

So (x^2) also looks like a constant, and for a constant times a variable, the derivative is just that constant, (x^2). The pattern of these two comes up in the ultimate result that we got. This is the whole reason that I rewrote it: if you look at this (2xy), you can see it over here, where (\cos(t)) corresponds to (x) and (\sin(t)) corresponds to (y), based on our original functions. Then (x^2) here corresponds with squaring the (x) that we put in there, which is (\cos^2(t)).

If we take the derivatives of our two intermediary functions, the ordinary derivative of (x) with respect to (t) is the derivative of (\cos(t)), which is negative (\sin(t)). Similarly, the derivative of (y) with respect to (t), just the ordinary derivative, no partial going on here, is equal to (\cos(t)), since the derivative of (\sin) is (\cos).
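For reference, here are the four pieces side by side, followed by what the two partials become once (\cos(t)) and (\sin(t)) are plugged in for (x) and (y). This only collects what was computed above.

```latex
\frac{\partial f}{\partial x} = 2xy, \qquad
\frac{\partial f}{\partial y} = x^2, \qquad
\frac{dx}{dt} = -\sin(t), \qquad
\frac{dy}{dt} = \cos(t)

\left.\frac{\partial f}{\partial x}\right|_{(\cos t,\, \sin t)} = 2\cos(t)\sin(t), \qquad
\left.\frac{\partial f}{\partial y}\right|_{(\cos t,\, \sin t)} = \cos^2(t)
```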

These guys show up, right? You see (-\sin(t)) over here and you see (\cos(t)) show up over here. We could generalize this; we could write it down and say, at least for this specific example, it looks like the derivative of the composition is this part, which is the partial of (f) with respect to (y), right? That's kind of what it looks like here.

That is, once we've plugged in the intermediary functions, that partial is multiplied by this guy, which was the ordinary derivative of (y) with respect to (t). Very similarly, this guy was the partial of (f) with respect to (x), (\partial f/\partial x), and we're multiplying it by the ordinary derivative of (x) with respect to (t).

Of course, when I write this (\partial f/\partial y), what I really mean is you plug in for (x) and (y) the two coordinate functions (x(t)) and (y(t)). So if I say (\partial f/\partial y) over here, what I really mean is you take that (x^2) and then you plug in (x(t)) to get (\cos^2(t)), and same deal over here; you're always plugging things in, so you ultimately have a function of (t).

But this right here has a name: this is the multivariable chain rule, and it's important enough that I'll just write it out all on its own here. If we take the ordinary derivative with respect to (t) of a composition of a multivariable function, in this case with just two variables, where we're plugging in two intermediary functions (x(t)) and (y(t)), each of which is just single variable, the result is that we take the partial derivative of (f) with respect to (x) and we multiply it by the derivative of (x) with respect to (t), and then we add to that the partial derivative of (f) with respect to (y) multiplied by the derivative of (y) with respect to (t).
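In symbols, that statement reads as follows, with both partial derivatives evaluated at the point ((x(t), y(t))). The second line is this video's example plugged into it, and it matches the derivative found by direct computation.

```latex
\frac{d}{dt}\, f\bigl(x(t),\, y(t)\bigr)
  = \frac{\partial f}{\partial x}\cdot\frac{dx}{dt}
  + \frac{\partial f}{\partial y}\cdot\frac{dy}{dt}

\frac{d}{dt}\, f(\cos t, \sin t)
  = 2\cos(t)\sin(t)\cdot\bigl(-\sin(t)\bigr) + \cos^2(t)\cdot\cos(t)
```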

So this entire expression here is what you might call the simple version of the multivariable chain rule. There's a more general version, and we'll kind of build up to it, but this is the simplest example you can think of, where you start with one dimension, then you move over to two dimensions somehow, and then you move from those two dimensions down to one.
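One way to sanity-check the simple version is to compare it numerically against a finite-difference estimate of the derivative. This is only a sketch: the function names and the step size `h` are choices made for this illustration, and a central difference is an approximation, so the two columns should agree closely but not exactly.

```python
import math


def f(x, y):
    """f(x, y) = x^2 * y, the function from the example."""
    return x**2 * y


def df_dx(x, y):
    """Partial derivative of f with respect to x (y held constant)."""
    return 2 * x * y


def df_dy(x, y):
    """Partial derivative of f with respect to y (x held constant)."""
    return x**2


def chain_rule_derivative(t):
    """(df/dx)(dx/dt) + (df/dy)(dy/dt), evaluated at (cos t, sin t)."""
    x, y = math.cos(t), math.sin(t)
    dx_dt, dy_dt = -math.sin(t), math.cos(t)
    return df_dx(x, y) * dx_dt + df_dy(x, y) * dy_dt


def finite_difference_derivative(t, h=1e-6):
    """Central-difference estimate of d/dt f(cos t, sin t)."""
    def g(s):
        return f(math.cos(s), math.sin(s))
    return (g(t + h) - g(t - h)) / (2 * h)


for t in (0.0, 0.5, 1.0, 2.0):
    print(t, chain_rule_derivative(t), finite_difference_derivative(t))
```

The check never uses the product rule, so agreement between the two columns is independent evidence that the pattern holds for this example.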

So this is that, and in the next video I'm going to talk about the intuition for why this is true. You know, here I just went through an example and showed, oh, it just happens to be true, it fits this pattern. But there's a very nice line of reasoning for where this comes about.

I'll also talk about a more generalized form, where you'll see that once we start using vector notation, it makes things look very clean, and I might even get around to a more formal argument for why this is true. So see you next video.
