Multivariable chain rule and directional derivatives


Nov 11, 2024

So in the last video, I introduced the vector form of the multivariable chain rule. Just to remind ourselves, I'm saying you have some kind of function f, and in this case, I said it comes from a 100-dimensional space.

You might say, well, I can't picture a 100-dimensional space, but in principle you're just thinking of some region that has 100 dimensions. It could be two dimensions if you want to think more concretely. And it's a scalar-valued function, so it just outputs to a number line, which I'll label f for its output.

What we're going to do is compose it with a vector-valued function, so some function v that takes in a single number t and then outputs into that super high-dimensional space. You're thinking you go from the single variable t to some very high-dimensional space that we think of as full of vectors. Then f takes you from that space over to a single number.

You know, the way you'd write that out is f composed with v, so f of v of t. What we're interested in doing is taking its derivative. The derivative of that composition is (and I told you this, and we kind of walked through where it comes from) the gradient of f evaluated at v of t, that is, at the output of the inner function, dot product with the derivative of v, the vectorized derivative.

What that means, you know, for v, is that you're just taking the derivative of every component. So when you take the derivative of v with respect to t, all that means is that you take the derivative of each component: dx1/dt, dx2/dt, on and on until dx100/dt, the 100th component.
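Written out in symbols, the rule just described looks like the following. This is only a transcription of the spoken formula; the names x_1 through x_100 for the components of v follow the dx1/dt, dx2/dt naming used above.

```latex
\frac{d}{dt}\, f\bigl(\vec{v}(t)\bigr)
  = \nabla f\bigl(\vec{v}(t)\bigr) \cdot \vec{v}\,'(t),
\qquad
\vec{v}\,'(t) =
\begin{bmatrix}
  dx_1/dt \\ dx_2/dt \\ \vdots \\ dx_{100}/dt
\end{bmatrix}
```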

So this was the vectorized form of the multivariable chain rule. What I want to do here is show how this looks a lot like a directional derivative. If you haven't watched the video on the directional derivative, maybe go back, take a look, kind of remind yourself.

But in principle, you say if you're in the input space of f, and you nudge yourself along some kind of vector v (and maybe just because I'm using v there, I'll instead say some kind of vector w, so not a function, just a vector), you're wondering, "Hey, how much does that result in a change to the output of f?" That's answered by the directional derivative.

You'd write the directional derivative of f in the direction of w, and I should say it's taken at some input point p. That input point is a vector in this case, like a 100-dimensional vector.

The way you evaluate it is you take the gradient of f (this is why we use the nabla notation in the first place, it's indicative of how we compute it), evaluated at that same input point, the same input vector p. So here, just to be clear, p is whatever vector points to your input, and the nudge away from that input point is w.

You take the dot product between that gradient and w, the vector that represents your nudge direction. But that looks a lot like the multivariable chain rule from before, except that instead of w, you have the vector-valued derivative of v.
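In symbols, the directional derivative described here can be written as below. The subscript on the nabla is the notation being referred to, and w is used as is, without being scaled to unit length.

```latex
\nabla_{\vec{w}} f(\vec{p}) = \nabla f(\vec{p}) \cdot \vec{w}
```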

So this whole thing, you could say, is the directional derivative of f in the direction of the derivative of v. That's kind of confusing: a directional derivative in the direction of a derivative. And at what point are you taking this directional derivative?

Well, it's wherever the output of v is, the point v of t. So this is very compact; it's saying quite a bit here. A way that you could be thinking about this is to picture v of t zooming all about. As you shift t, it kind of moves you through this space in some way, and each one of these output points represents the vector v of t at some value of t.
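Put compactly, and taking the input point p to be v(t) and the nudge vector w to be v'(t), the reading of the chain rule being described here is:

```latex
\frac{d}{dt}\, f\bigl(\vec{v}(t)\bigr)
  = \nabla f\bigl(\vec{v}(t)\bigr) \cdot \vec{v}\,'(t)
  = \nabla_{\vec{v}\,'(t)} f\bigl(\vec{v}(t)\bigr)
```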

The derivative of that—what does this derivative represent? That's the tangent vector to that motion. You know, so you're zipping about through that space. The tangent vector to your motion—that's how we interpret v prime of t, the derivative of v with respect to t.

And why should that make sense? Why should the directional derivative in the direction of v prime of t—this change to the intermediary function v—have anything to do with the multivariable chain rule? Well, remember what we're asking when we say d/dt of this composition—what we're saying is we take a tiny nudge to t.

So we make that tiny change in the value of t, and we're wondering what change it results in after the composition. Well, at a given point, that tiny nudge in t causes a change in the direction of v prime of t. That's kind of the whole meaning of this vector-valued derivative.
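To see the tiny-nudge idea with actual numbers, here is a minimal sketch in two dimensions rather than 100. The particular f and v below are made up for illustration and are not from the video; the sketch just compares the gradient-dot-derivative formula against literally nudging t by a small amount h.

```python
import math

# Hypothetical example functions, chosen only for illustration.
def f(x, y):
    return x * x * y

def grad_f(x, y):
    # Partial derivatives of f with respect to x and y.
    return (2 * x * y, x * x)

def v(t):
    return (math.cos(t), math.sin(t))

def v_prime(t):
    # Component-by-component derivative of v, the tangent vector.
    return (-math.sin(t), math.cos(t))

def chain_rule_derivative(t):
    # Gradient of f at v(t), dotted with v'(t).
    gx, gy = grad_f(*v(t))
    dx, dy = v_prime(t)
    return gx * dx + gy * dy

def nudge_derivative(t, h=1e-6):
    # Nudge t a little in both directions and take the ratio of changes.
    return (f(*v(t + h)) - f(*v(t - h))) / (2 * h)

t0 = 0.7
print(chain_rule_derivative(t0), nudge_derivative(t0))
```

The two printed numbers should agree to many decimal places, which is what the chain rule claims.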

You change t by a little bit, and v prime of t tells you how you move in the output space of v. But then you say, "Okay, so I've moved a little bit in this intermediary 100-dimensional space. How does that influence the output of f based on the behavior of just the multivariable function f?"

Well, that's what the directional derivative is asking. It says you take a nudge in the direction of some vector. In this case, I wrote v prime of t over here. More generally, you could say any vector w. You take a nudge in that direction, and more importantly, you know, the size of v prime of t matters here. If you're moving really quickly, you would expect that change to be larger.

So the fact that v prime of t would be larger is helpful, and the directional derivative is telling you the size of the change in f as a ratio to the proportion of that directional vector that you went along. Right? You know, another notation for the directional derivative is to write partial f over partial of whatever that vector is.

That's basically saying you measure the size of the nudge along that vector as a proportion of the vector itself, then you consider the change to the output, and you take the ratio.
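Here is a small sketch of that ratio idea, again in two dimensions with a made-up function f and made-up vectors p and w (none of these come from the video). It steps a small proportion h of w away from p, divides the change in f by h, and also shows that doubling w doubles the directional derivative.

```python
# Hypothetical example function and vectors, chosen only for illustration.
def f(x, y):
    return x * x * y

def grad_f(x, y):
    return (2 * x * y, x * x)

def directional_derivative(p, w):
    # Gradient of f at p, dotted with the nudge vector w (not normalized).
    gx, gy = grad_f(*p)
    return gx * w[0] + gy * w[1]

def nudge_ratio(p, w, h=1e-6):
    # Move a proportion h of w in each direction from p, then divide
    # the change in f by the total proportion moved.
    ahead = (p[0] + h * w[0], p[1] + h * w[1])
    behind = (p[0] - h * w[0], p[1] - h * w[1])
    return (f(*ahead) - f(*behind)) / (2 * h)

p = (1.0, 2.0)
w = (3.0, -1.0)
w_fast = (2 * w[0], 2 * w[1])  # same direction, twice the speed

slow = directional_derivative(p, w)
fast = directional_derivative(p, w_fast)
print(slow, fast, nudge_ratio(p, w))
```

Because w is not scaled to unit length, moving twice as fast in the same direction gives twice the rate of change in f.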

I think this is a very beautiful way of understanding the multivariable chain rule because it gives this image of, you know, you're thinking of v of t, and you're thinking of zipping along in some way. The direction and magnitude of your velocity as you zip along is what determines the change in the output of the function f.

So hopefully that helps give a better understanding both of the directional derivative and of the multivariable chain rule. It’s one of those nice little interpretations.
