yego.me

Multivariable chain rule


Nov 11, 2024

So I've written here three different functions. The first one is a multivariable function; it has a two-variable input, (x, y), and a single-variable output, (x^2 \cdot y), which is just a number. The other two functions, (x(t) = \cos(t)) and (y(t) = \sin(t)), are each just regular old single-variable functions.

What I want to do is start thinking about the composition of them. So, I'm going to take as the first component the value of the function (x(t)). You pump (t) through that, and then you make that the first component of (f), and the second component will be the value of the function (y(t)). The image you might have in your head for something like this is that (t) lives on a number line of some kind. Then you have (x) and (y), which together form a plane: your (x) coordinate and your (y) coordinate in two-dimensional space.

Then you have your output, which is just whatever the value of (f) is. For this whole composition of functions, you're thinking of (x(t), y(t)) as taking a single point in (t) and moving it over to two-dimensional space somewhere. From there, our multivariable function takes that back down to a single number. So the composition is just a single-variable function; there's nothing too fancy going on in terms of where you start and where you end up. It's just what's happening in the middle.
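The composition described here can be sketched in a few lines of Python. This is only an illustration, assuming (x(t) = \cos(t)) and (y(t) = \sin(t)) as used in the walkthrough; the names `x_of_t`, `y_of_t`, and `g` are made up for this sketch and are not from the video.

```python
import math

def f(x, y):
    """Multivariable function: two inputs, one output."""
    return x**2 * y

def x_of_t(t):
    """First intermediary function (assumed: cosine)."""
    return math.cos(t)

def y_of_t(t):
    """Second intermediary function (assumed: sine)."""
    return math.sin(t)

def g(t):
    """Composition: t -> (x(t), y(t)) -> f(x(t), y(t))."""
    return f(x_of_t(t), y_of_t(t))
```

Calling `g(t)` takes one number in and gives one number out, even though the value passes through the two-dimensional point `(x_of_t(t), y_of_t(t))` in the middle.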

What I want to know is, what's the derivative of this function? It's just an ordinary derivative, not a partial derivative, because this is a single-variable function: one-variable input, one-variable output. How do you take its derivative? There's a special rule for this; it's called the chain rule, the multivariable chain rule. But you don't actually need it here. So let's walk through this, showing that you don't need it. It's not that you'll never need it; it's just that for computations like this, you could go without it.

It's a very useful theoretical tool, a very useful model to have in mind for what function composition looks like and implies for derivatives in the multivariable world. So let's just start plugging things in here. If I have (f(x(t), y(t))), the first thing I might do is write (f) and, instead of (x(t)), write in (\cos(t)), since that's the function I have for (x(t)), and then replace (y(t)) with (\sin(t)). Of course, I'm hoping to take the derivative of this, and from there we can go to the definition of (f):

[f(x,y) = x^2 \cdot y]

which means we take that first component, (\cos(t)), and square it, and then multiply it by the second component, (\sin(t)). Again, we're just taking this derivative, and you might be wondering, "Okay, why are you doing this? You're just showing me how to take an ordinary derivative." But the pattern that we'll see is going to lead us to the multivariable chain rule, and it's actually kind of surprising when you see it in this context, 'cause it pops out in a way that you might not expect things to pop out.

So, chugging along, when you take the derivative of this, you use the product rule: left d-right plus right d-left. In this case, the left is (\cos^2(t)); we leave that as it is and multiply it by the derivative of the right, (\sin(t)), which is (\cos(t)). Then we add to that the right, (\sin(t)), kept unchanged, multiplied by the derivative of the left.

For that, we use the chain rule, the single-variable chain rule, where you take the derivative of the outside. So you plop that two down like you're taking the derivative of (x^2) to get (2x), but you write in (\cos(t)) instead of (x), giving (2\cos(t)), and then you multiply that by the derivative of the inside, which is (-\sin(t)). That's a tongue twister.

I'm afraid I'm going to run off the edge here, certainly with the many, many parentheses that I need. I'm going to rewrite this anyway, because there's a certain pattern that I hope to make clear. So let me just copy this side down here. You might be wondering why, but it'll become clear in just a moment.

In this case, I'm going to write the second term as (2 \cdot \cos(t) \cdot \sin(t)), with all of that multiplied by (-\sin(t)). So this is the derivative of the composition of functions, which ultimately was a single-variable function but went through two different variables along the way. I just want to make an observation in terms of the partial derivatives of (f).
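Written out in one place, the computation so far looks like this. It is a reconstruction from the spoken steps, assuming (x(t) = \cos(t)) and (y(t) = \sin(t)); the second line only regroups the second term.

```latex
\begin{aligned}
\frac{d}{dt}\Bigl[\cos^2(t)\,\sin(t)\Bigr]
  &= \cos^2(t)\cdot\cos(t) \;+\; \sin(t)\cdot 2\cos(t)\cdot\bigl(-\sin(t)\bigr) \\
  &= \cos^2(t)\cdot\cos(t) \;+\; 2\cos(t)\sin(t)\cdot\bigl(-\sin(t)\bigr)
\end{aligned}
```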

So let me make a copy of this guy and give ourselves a little bit of room down here. Let's look at the partial derivatives of (f) for a second. If I take the partial derivative with respect to (x), (\partial f/\partial x), (y) is treated as a constant. So I take the derivative of (x^2) to get (2x) and multiply it by that constant, (y), which gives (2xy). If I also do it with respect to (y), (\partial f/\partial y), now (y) looks like a variable and (x) looks like a constant.

So (x^2) also looks like a constant. It's a constant times a variable, so the derivative is just that constant, (x^2). These two patterns come up in the ultimate result that we got. This is the whole reason I rewrote it: if you look at this (2xy), you can see it over here as (2\cos(t)\sin(t)), where (\cos(t)) corresponds to (x) and (\sin(t)) corresponds to (y), based on our original functions. Then (x^2) here corresponds to (\cos^2(t)), squaring the (x) that we put in there.

If we take the derivatives of our two intermediary functions, the ordinary derivative of (x) with respect to (t) is the derivative of (\cos(t)), which is (-\sin(t)). Similarly, the derivative of (y) with respect to (t), just the ordinary derivative, no partial going on here, is equal to (\cos(t)), since the derivative of sine is cosine.
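For reference, here are the four ingredients side by side, for (f(x,y) = x^2 \cdot y) with (x(t) = \cos(t)) and (y(t) = \sin(t)). This is just a summary of the derivatives computed in the walkthrough.

```latex
\frac{\partial f}{\partial x} = 2xy, \qquad
\frac{\partial f}{\partial y} = x^2, \qquad
\frac{dx}{dt} = -\sin(t), \qquad
\frac{dy}{dt} = \cos(t)
```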

These guys show up too, right? You see (-\sin(t)) over here and (\cos(t)) over here. We could generalize this and say that, at least for this specific example, it looks like the derivative of the composition is this part, which is the partial of (f) with respect to (y) once we've plugged in the intermediary functions, multiplied by this guy, which was the ordinary derivative of (y) with respect to (t).

Very similarly, this guy was the partial of (f) with respect to (x), (\partial f/\partial x), and we're multiplying it by the ordinary derivative of (x) with respect to (t).

Of course, when I write (\partial f/\partial y), what I really mean is that you plug in for (x) and (y) the two coordinate functions (x(t)) and (y(t)). So if I say (\partial f/\partial y) over here, what I really mean is you take that (x^2) and plug in (x(t)) to get (\cos^2(t)), and same deal over here; you're always plugging things in, so you ultimately have a function of (t).

But this right here has a name: this is the multivariable chain rule, and it's important enough that I'll write it out all on its own here. If we take the ordinary derivative with respect to (t) of a composition of a multivariable function, in this case just two variables, (f(x(t), y(t))), where we're plugging in two intermediary functions (x(t)) and (y(t)), each of which is just single-variable, the result is that we take the partial derivative of (f) with respect to (x) and multiply it by the derivative of (x) with respect to (t), and then we add to that the partial derivative of (f) with respect to (y) multiplied by the derivative of (y) with respect to (t).
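In symbols, the rule just stated reads as follows, with both partial derivatives evaluated at ((x(t), y(t))). The second line plugs in this example's functions, assuming (x(t) = \cos(t)) and (y(t) = \sin(t)).

```latex
\frac{d}{dt}\, f\bigl(x(t),\, y(t)\bigr)
  = \frac{\partial f}{\partial x}\cdot\frac{dx}{dt}
  + \frac{\partial f}{\partial y}\cdot\frac{dy}{dt}

% For f(x, y) = x^2 y, x(t) = cos(t), y(t) = sin(t):
\frac{d}{dt}\Bigl[\cos^2(t)\,\sin(t)\Bigr]
  = 2\cos(t)\sin(t)\cdot\bigl(-\sin(t)\bigr)
  + \cos^2(t)\cdot\cos(t)
```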

So this entire expression here is what you might call the simple version of the multivariable chain rule. There's a more general version, and we'll build up to it, but this is the simplest example you can think of: you start with one dimension, move over to two dimensions somehow, and then move from those two dimensions down to one.
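One way to convince yourself of the pattern is to compare it against a numerical derivative. The sketch below does that for this example, assuming (f(x,y) = x^2 \cdot y), (x(t) = \cos(t)), and (y(t) = \sin(t)); the function names and the step size `h` are choices made for this illustration only.

```python
import math

def f(x, y):
    return x**2 * y

def df_dx(x, y):
    # Partial derivative of f with respect to x (y held constant).
    return 2 * x * y

def df_dy(x, y):
    # Partial derivative of f with respect to y (x held constant).
    return x**2

def chain_rule_derivative(t):
    # Plug the intermediary functions into the partials, then combine
    # with the ordinary derivatives dx/dt and dy/dt.
    x, y = math.cos(t), math.sin(t)
    dx_dt, dy_dt = -math.sin(t), math.cos(t)
    return df_dx(x, y) * dx_dt + df_dy(x, y) * dy_dt

def numerical_derivative(t, h=1e-6):
    # Central difference approximation of d/dt f(x(t), y(t)).
    def g(s):
        return f(math.cos(s), math.sin(s))
    return (g(t + h) - g(t - h)) / (2 * h)

for t in (0.0, 0.5, 1.0, 2.0):
    print(t, chain_rule_derivative(t), numerical_derivative(t))
```

The two printed columns should agree to several decimal places at each (t). That is a check on this one example, not a proof of the rule.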

So this is that, and in the next video I'm going to talk about the intuition for why this is true. Here I just went through an example and showed that it happens to be true; it fits this pattern. But there's a very nice line of reasoning for where this comes from, and I'll also talk about a more generalized form, where you'll see that we start using vector notation, which makes things look very clean.

I might even get around to a more formal argument for why this is true. So see you next video.
