Directional derivatives and slope
Hello everyone! So what I want to talk about here is how to interpret the directional derivative in terms of graphs. I have here the graph of a function, a multivariable function: it's ( F(x, y) = x^2 \cdot y ).
In the last couple of videos, I talked about what the directional derivative is, how you can formally define it, and how you compute it using the gradient. Generally, the setup you might have is that you'll have some kind of vector, and this is a vector in the input space. So in this case, it's going to be in the ( xy ) plane, and I'll just take the vector ( (1, 1) ).
Okay. The directional derivative, which we denote by taking the gradient symbol and sticking the name of that vector down in the lower part, so ( \nabla_v F ), still takes the same inputs as your function. It's a measure of how the function changes when the input moves in that direction.
So I'll show you what I mean. You can imagine slicing this graph by some kind of plane, but that plane doesn't necessarily have to be parallel to the ( x ) or ( y ) axis. That's what we did for the partial derivative; we took a plane that represented a constant ( x ) value or a constant ( y ) value. But this is going to be a plane that tells you what movement in the direction of your vector looks like.
And like I have a number of other times, I'm going to go ahead and slice the graph along that plane. Just to make it clear, I'm going to color in where the graph intersects that slice. This vector here, this little ( v ), you'd be thinking of as living on the ( xy ) plane, and it's determining the direction of this plane that we're slicing things with. On the ( xy ) plane, you've got this vector ( (1, 1) ); it kind of points in that diagonal direction.
You take the whole plane and you slice your graph. If we want to interpret the directional derivative here, I'm going to go ahead and fill in an actual value. Let's say we wanted to do it at a point like ( (1, 1) ) or ( (-1, -1) ). Because I chose a plane that passes through the origin, I've got to make sure that the point I'm evaluating at actually lies on this plane.
But you could imagine one that points in the same direction, but you kind of slide it back and forth. If we're doing this, we can interpret this as a slope. But you have to be very careful. If you're going to interpret this as a slope, it has to be the case that you're dealing with a unit vector; that the magnitude of your vector is equal to one.
I mean, it doesn't have to be, since you can account for it later, but it makes it easier to think about if we're just thinking of a unit vector. So when I go over here, instead of saying that it's ( (1, 1) ), I'm going to say it's whatever vector points in that same direction but has unit length. In this case, that happens to be ( \left( \frac{\sqrt{2}}{2}, \frac{\sqrt{2}}{2} \right) ), so ( \frac{\sqrt{2}}{2} ) for each of the components. You can think about why that would be true with the Pythagorean theorem: the length of ( (1, 1) ) is ( \sqrt{2} ), and dividing each component by that gives ( \frac{1}{\sqrt{2}} = \frac{\sqrt{2}}{2} ).
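Here is a minimal sketch of that normalization step in Python. The helper name `normalize` is just an illustrative choice, not something from the video; it divides each component by the length that the Pythagorean theorem gives you.

```python
import math

def normalize(v):
    """Return the unit vector pointing in the same direction as v."""
    length = math.hypot(*v)  # Pythagorean length of the vector
    if length == 0:
        raise ValueError("the zero vector has no direction")
    return tuple(component / length for component in v)

unit_v = normalize((1.0, 1.0))
print(unit_v)               # both components are sqrt(2)/2, up to rounding
print(math.hypot(*unit_v))  # the length is 1, up to rounding
```

Any nonzero multiple of ( (1, 1) ) in the same direction would normalize to the same unit vector.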
But this is a vector with unit length; its magnitude is 1, and it points in that direction. If we're evaluating this at a point like ( (1, 1) ), we can draw that on the graph and see where it actually is. When I add a point, it'll be this point here, and if you look from above, you can see that it sits over ( (1, 1) ) in the ( xy ) plane.
If we want the slope at that point, you're thinking of the tangent line here, the tangent line to that curve, and we're wondering what its slope is. To see why the directional derivative is going to give us this slope, another notation might be helpful: some people will write ( \frac{\partial F}{\partial v} ).
You could think about that as taking a slight nudge in the direction of ( v ). So ( \partial v ) would be a little nudge in the direction of ( v ), and then you're asking what change in the value of the function, ( \partial F ), that results in. The height of the graph tells you the value of the function, and as this initial change approaches zero, the resulting change approaches zero as well, and the ratio of ( \partial F ) to ( \partial v ) is going to give you the slope of this tangent line.
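To make the nudge idea concrete, here is a rough numerical sketch in Python. It assumes the function ( F(x, y) = x^2 \cdot y ), the point ( (1, 1) ), and the unit vector in the ( (1, 1) ) direction; the name `nudge_ratio` is made up for this illustration. It takes smaller and smaller nudges of size `h` along ( v ) and prints the ratio of the change in ( F ) to the size of the nudge.

```python
import math

def f(x, y):
    return x**2 * y

def nudge_ratio(f, point, v, h):
    """Change in f caused by a nudge of h*v, divided by the nudge size h."""
    x, y = point
    vx, vy = v
    return (f(x + h * vx, y + h * vy) - f(x, y)) / h

unit_v = (math.sqrt(2) / 2, math.sqrt(2) / 2)
ratios = [nudge_ratio(f, (1.0, 1.0), unit_v, h) for h in (1e-1, 1e-3, 1e-5)]
for r in ratios:
    print(r)  # the ratios settle toward one number as h shrinks
```

The number these ratios settle toward is the slope of the tangent line in the slice.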
So conceptually that's kind of a nicer notation. But the reason we use this other notation, this ( \nabla_v F ), is that it's very indicative of how you compute things. Once you need to compute it, you take the gradient of ( F ), which is a vector-valued function, and take the dot product with your vector: ( \nabla_v F = \nabla F \cdot v ).
So let's actually do that, just to see what this would look like. I'll go ahead and write it over here; I'll use a different color. First of all, the gradient of ( F ) is a vector full of partial derivatives. So it'll be the partial derivative of ( F ) with respect to ( x ) and the partial derivative of ( F ) with respect to ( y ).
When we actually evaluate this, we take a look: for the partial derivative of ( F ) with respect to ( x ), the variable ( y ) looks like just a constant, so the partial derivative is ( 2 \cdot x \cdot y ).
But when we take the partial with respect to ( y ), ( y ) now looks like a variable while ( x ) looks like a constant. The derivative of a constant times the variable is just that constant, which here is ( x^2 ). So the gradient is ( (2 \cdot x \cdot y, x^2) ).
If we were to evaluate this at the point ( (1, 1) ), you could plug that in: ( 2 \cdot 1 \cdot 1 ) would be ( 2 ), and then ( 1^2 ) would be ( 1 ). So that would be our gradient at that point, ( (2, 1) ). That means if we want to evaluate the directional derivative ( \nabla_v F ), we take ( (2, 1) ), the gradient evaluated at the point we care about, and then take the dot product with ( v ) itself.
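If you want to double-check the gradient, here is a small Python sketch. It writes down the two partial derivatives from above and compares them against a central-difference estimate at ( (1, 1) ); the names `grad_f` and `numeric_grad` and the step size `h` are assumptions made for this illustration.

```python
def f(x, y):
    return x**2 * y

def grad_f(x, y):
    """Gradient from the partial derivatives: (2xy, x^2)."""
    return (2 * x * y, x**2)

def numeric_grad(f, x, y, h=1e-6):
    """Central-difference estimate of the two partial derivatives."""
    df_dx = (f(x + h, y) - f(x - h, y)) / (2 * h)  # hold y constant
    df_dy = (f(x, y + h) - f(x, y - h)) / (2 * h)  # hold x constant
    return (df_dx, df_dy)

exact = grad_f(1.0, 1.0)
estimate = numeric_grad(f, 1.0, 1.0)
print(exact)     # (2.0, 1.0)
print(estimate)  # close to the exact gradient
```

The same gradient comes out at ( (-1, -1) ), since ( 2 \cdot (-1) \cdot (-1) = 2 ) and ( (-1)^2 = 1 ).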
In this case, ( v ) is ( \left( \frac{\sqrt{2}}{2}, \frac{\sqrt{2}}{2} \right) ). For the answer, we multiply the first components together: ( 2 \cdot \frac{\sqrt{2}}{2} = \sqrt{2} ). Then we multiply the second components together, and that's ( 1 \cdot \frac{\sqrt{2}}{2} = \frac{\sqrt{2}}{2} ). Adding those gives ( \sqrt{2} + \frac{\sqrt{2}}{2} = \frac{3\sqrt{2}}{2} ).
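The same dot product can be sketched in a few lines of Python. The gradient value ( (2, 1) ) and the unit vector are the ones from the worked example; the helper name `dot` is just for illustration. Adding the two products by hand gives ( \sqrt{2} + \frac{\sqrt{2}}{2} = \frac{3\sqrt{2}}{2} ), and the code compares against that.

```python
import math

def dot(a, b):
    """Dot product: multiply matching components and add them up."""
    return sum(p * q for p, q in zip(a, b))

gradient_at_point = (2.0, 1.0)                  # gradient of x^2 * y at (1, 1)
unit_v = (math.sqrt(2) / 2, math.sqrt(2) / 2)   # unit vector along (1, 1)

directional_derivative = dot(gradient_at_point, unit_v)
print(directional_derivative)
print(3 * math.sqrt(2) / 2)  # the hand-computed value, for comparison
```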
And that would be our answer; that would be our slope. But this only works if your vector is a unit vector. I showed this in the last video, where we talked about the formal definition of the directional derivative: think about what happens if you scale ( v ) by two, so instead of ( v ) you're talking about ( 2v ).
If you're taking the directional derivative along ( 2v ) of ( F ), the way we compute that, you're still taking the gradient of ( F ) and taking the dot product with ( 2 ) times your vector, and with dot products you can pull out that two: ( \nabla F \cdot (2v) = 2 (\nabla F \cdot v) ).
So this is just going to double the value of the entire thing; the derivative will become twice the value. But you don't necessarily want that, because think about the plane that you sliced the graph with: if instead of doing it in the direction of ( v ), the unit vector, you did it in the direction of ( 2 \cdot v ), it's the same plane; it's the same slice you're taking.
You'd want that same slope, so that's going to mess everything up. So this is super important. If you're thinking about things in the context of slope, one thing you could say is that your formula for the slope of a graph in the direction of ( v ) is: take your directional derivative, that dot product between ( \nabla F ) and ( v ), and always make sure to divide it by the magnitude of ( v ), that is, ( \frac{\nabla F \cdot v}{\|v\|} ). Dividing by that magnitude will always take care of what you want.
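Here is a short Python sketch of that slope formula, assuming the same function ( F(x, y) = x^2 \cdot y ) and the point ( (1, 1) ). The function names are made up for this illustration. It shows the raw dot product doubling when the vector is doubled, while the version divided by the magnitude stays the same for every vector along the same direction.

```python
import math

def grad_f(x, y):
    """Gradient of x^2 * y."""
    return (2 * x * y, x**2)

def directional_derivative(point, v):
    """Gradient at the point, dotted with v (no normalization)."""
    gx, gy = grad_f(*point)
    return gx * v[0] + gy * v[1]

def slope_in_direction(point, v):
    """Directional derivative divided by the magnitude of v."""
    length = math.hypot(*v)
    if length == 0:
        raise ValueError("the zero vector has no direction")
    return directional_derivative(point, v) / length

point = (1.0, 1.0)
unit_v = (math.sqrt(2) / 2, math.sqrt(2) / 2)
double_v = (math.sqrt(2), math.sqrt(2))  # 2 times the unit vector

print(directional_derivative(point, unit_v))    # value for the unit vector
print(directional_derivative(point, double_v))  # twice that value
print(slope_in_direction(point, unit_v))        # these two slopes agree
print(slope_in_direction(point, double_v))
```

Passing the original ( (1, 1) ) to `slope_in_direction` gives the same slope again, so with this formula you never have to normalize by hand.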
That's basically a way of making sure that really you're taking the directional derivative in the direction of a certain unit vector. Some people even go so far as to define the directional derivative to be this, to be something where you normalize out the length of that vector.
I don't really like that, but I think they do it because they're thinking of the slope context. They're thinking of rates of change as being the slope of a graph. And one thing I'd like to emphasize, as always: graphical intuition is good, and visual intuition is always great; you should always be trying to find a way to think about things visually.
But with multivariable functions, the graph isn't the only way. You can kind of more generally think about just a nudge in the ( v ) direction, and in the context where ( v ) doesn't have a length of one, you know, the nudge doesn't represent an actual size, but it's a certain scaling constant times that vector.
You can look at the video on the formal definition for the directional derivative if you want more details on that. But I do think this is actually a good way to get a feel for what the directional derivative is all about.