Multivariable chain rule

6m read

·Nov 11, 2024

So I've written here three different functions. The first one is a multivariable function; it has a two variable input, (XY), and a single variable output, that's (x^2 \cdot y). That's just a number. And then the other two functions are each just regular old single variable functions.

What I want to do is start thinking about the composition of them. So, I'm going to take as the first component the value of the function (x(t)). So you pump (t) through that, and then you um make that the first component of (f), and the second component will be the value of the function (y(t)). The image that you might have in your head for something like this is you can think of (t) as just living on a number line of some kind. Then you have (x) and (y), which is just a plane, so that'll be, you know, your (x) coordinate, your (y) coordinate in two-dimensional space.

Then you have your output, which is just whatever the value of (f) is. For this whole function, for this whole composition of functions, you're thinking of (x(t), y(t)) as taking a single point in (t) and kind of moving it over to two-dimensional space somewhere. And then from there, our multivariable function takes that back down. So this is just a single variable function, nothing, you know, nothing too fancy going on in terms of where you start and where you end up. It's just what's happening in the middle.

What I want to know is, what's the derivative of this function? If I take this, and it's just an ordinary derivative, not a partial derivative, because this is a single variable function: one variable input, one variable output. How do you take its derivative? There's a special rule for this; it's called the Chain Rule, the multivariable chain rule. But you don't actually need it. So let's actually walk through this, showing that you don't need it. It's not that you'll never need it; it's just for computations like this, you could go without it.

It's a very useful theoretical tool, a very useful model to have in mind for what function composition looks like and implies for derivatives in the multivariable world. So let's just start plugging things in here. If I have (f) of (x(t)) of (y(t)), the first thing I might do is write (F) and instead of (x(t)), just write in (\cos(t)) since that's the function that I have for (x(t)), and then (y) we replace that with (s(t)). Of course, I'm hoping to take the derivative of this, and then from there, we can go to the definition of (f):

[f(x,y) = x^2 \cdot y]

which means we take that first component squared, so we'll take that first component (\cos(t)) and then square it, square that guy, and then we'll multiply it by the second component (s(t)). Again, we're just taking this derivative and you might be wondering, "Okay, why am I doing this?" You're just showing me how to take a first derivative, an ordinary derivative. But the pattern that we'll see is going to lead us to the multivariable chain rule, and it's actually kind of surprising when you see it in this context, 'cause it pops out in a way that you might not expect things to pop out.

So, continuing or chugging along, when you take the derivative of this, you do the product rule: left (d) right * plus right (d) left. So in this case, the left is (\cos^2(t)); we just leave that as it is, (\cos^2(t)), and multiply it by the derivative of the right (d) right. So that's going to be (s(t)) multiplied by (\cos(t)). Then we add to that, right, which is, you know, keep that right side unchanged multiplied by the derivative of the left.

For that, we use the chain rule, the single variable chain rule, where you think of taking the derivative of the outside. So you PP that, plop that two down like you're taking the derivative of (2x), but you're just writing in (\cos(t)) instead of (x); (\cos(t)) and then you multiply that by the derivative of the inside. That's a tongue twister, um, which is negative (s(t)).

I'm afraid I'm going to run off the edge here, certainly with the many, many parentheses that I need. I'll go ahead and rewrite this though. I'm going to rewrite it anyway because there's a certain pattern that I hope to make clear. So let me just rewrite this side, um, just copy that down here. I just want to rewrite this guy; you might be wondering why, but it'll become clear in just a moment why I want to do this.

In this case, I'm going to write this as (2 \cdot \cos(t) \cdot s(t)), and then all of that multiplied by negative (s(t)). So this is the derivative, this is the derivative of the composition of functions that ultimately was a single variable function, but it kind of went through two different variables. I just want to make an observation in terms of the partial derivatives of (f).

So let me just make a copy of this guy, give ourselves a little bit of room down here, just paste that over here. So let's look at the partial derivatives of (f) for a second here. If I took the partial derivative with respect to (X) (\partial X), which means (Y) is treated as a constant. So I take the derivative of (x^2) to get (2x) and then multiply it by that constant, which is just (y) if I also do it with respect to (Y), get all of them in there. So now (Y) looks like a variable, (X) looks like a constant.

So (X^2) also looks like a constant, constant times a variable; the derivative is just that constant. These two, their pattern comes up in the ultimate result that we got. This is the whole reason that I rewrote it: if you look at this (2xy), you can see that over here where (\cos(t)) corresponds to (x), (s) corresponds to (y) based on our original functions. Then (x^2) here corresponds with squaring the (x) that we put in there.

If we take the derivative of our two intermediary functions, the ordinary derivative of (x) with respect to (t), that's the derivative of (\cos(t)), which is negative (s(t)), and then similarly the derivative of (y) just the ordinary derivative, no partial going on here with respect to (t); that's equal to (\cos), derivative of (s) is (\cos).

These guys show up right; you see (-s) over here and you see (\cos) show up over here. We could generalize this; we could write it down and say at least for this specific example, it looks like the derivative of the composition is this part, which is the partial of (f) with respect to (y), right? That's kind of what it looks like here.

Once we've plugged in the intermediary functions, multiplied by this guy, which was the ordinary derivative of (y) with respect to (t). So that was the ordinary derivative of (y) with respect to (t). Very similarly, this guy was the partial of (f) with respect to (x), (\partial X), and we're multiplying it by the ordinary derivative of (x(t)) with respect to (t).

Of course, when I write this (\partial F/\partial Y), what I really mean is you plug in for (X) and (Y) the two coordinate functions (x(t), y(t)). Um, so if I say (\partial F/\partial y) over here, what I really mean is you take that (x^2) and then you plug in (X(t)^2) to get (\cos^2(t)), and same deal over here; you're always plugging things in, so you ultimately have a function of (t).

But this right here has a name: this is the multivariable chain rule, and it's important enough I'll just kind of I'll just write it out all on its own here. If we take the ordinary derivative with respect to (t) of a composition of a multivariable function, in this case just two variables (x(t), y(t)), where we're plugging in two intermediary functions (x(t), y(t)), each of which is just single variable, the result is that we take the partial derivative with respect to (X) and we multiply it by the derivative of (x) with respect to (t), and then we add to that the partial derivative with respect to (Y) multiplied by the derivative of (y) with respect to (t).

So this entire expression here is what you might call the simple version of the multivariable chain rule. Um, and you get there's a more general version and we'll kind of build up to it, but this is the simplest example you can think of where you start with one dimension and then you move over to two dimensions somehow, and then you move from those two dimensions down to one.

So this is that, and in the next video I'm going to talk about the intuition for why this is true. You know, here I just went through an example and showed, oh it just happens to be true, it fills this pattern. But there's a very nice line of reasoning for where this comes about, and I'll also talk about a more generalized form where you'll see it.

We start using vector notation; it makes things look very clean, and I might even get around to a more formal argument for why this is true. So see you next video.

Multivariable chain rule

More Articles