Example: Correlation coefficient intuition | Mathematics I | High School Math | Khan Academy


Nov 11, 2024

So I took some screen captures from the Khan Academy exercise on correlation coefficient intuition. They've given us some correlation coefficients, and we need to match them to the various scatter plots on that exercise. There's a little interface where we can drag these around in a table to match them to the different scatter plots.

The point isn't to figure out how exactly to calculate these; we'll do that in the future, but really to get an intuition of what we're trying to measure. The main idea is that correlation coefficients are trying to measure how well a linear model can describe the relationship between two variables.

For example, let me draw some coordinate axes here. Let's say that's one variable, my Y variable, and let's say that is my X variable. And so, let's say when X is low, Y is low. When X is a little higher, Y is a little higher. When X is a little bit higher still, Y is higher. When X is really high, Y is even higher. A linear model would describe this one very, very well.

We can... it's quite easy to draw a line that effectively goes through those points. Something like this would have an R of one. R is equal to one; a linear model perfectly describes it. It's a positive correlation. When one increases, when one variable gets larger, then the other variable is larger. When one variable is smaller, then the other variable is smaller, and vice versa.
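
To make the R equals 1 case concrete, here is a minimal sketch in Python. The video defers the actual calculation to a later lesson, so the helper `pearson_r` (the standard Pearson formula) and the sample points on the line y = 2x + 1 are illustrative assumptions, not values taken from the exercise.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sums of squared deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical points that sit exactly on the rising line y = 2x + 1.
xs = [1, 2, 3, 4, 5]
ys = [2 * x + 1 for x in xs]

r_up = pearson_r(xs, ys)
print(round(r_up, 6))  # 1.0
```

Because every point lies on one rising line, the coefficient comes out at the top of its range.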

Now, what would an R of negative 1 look like? Well, that would once again be a situation where a linear model works really well, but when one variable moves up, the other one moves down and vice versa. So let me draw my coordinate axes again. I'm going to try to draw a dataset where the R would be negative 1.

So maybe when Y is high, X is very low. When Y becomes lower, X becomes higher. When Y becomes a good bit lower, X becomes a good bit higher. So once again, when Y decreases, X increases or as X increases, Y decreases. So they're moving in opposite directions. But you can fit a line very easily to this. So the line would look something like this.
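
The same idea can be sketched for R equals -1. The two falling lines below are made-up examples, and `pearson_r` is a hypothetical helper using the standard Pearson formula. As a side note, the steepness of the line does not change the result here; only the direction and how well the points sit on a line matter.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

xs = [1, 2, 3, 4, 5]
# Hypothetical points on two falling lines: one steep, one shallow.
steep = [20 - 4 * x for x in xs]
shallow = [20 - 0.5 * x for x in xs]

r_steep = pearson_r(xs, steep)
r_shallow = pearson_r(xs, shallow)
print(round(r_steep, 6), round(r_shallow, 6))  # -1.0 -1.0
```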

So this would have an R of negative 1.

An R of zero, R is equal to zero, would be a dataset where a line doesn't really fit very well at all. I'll do that one really small since I don't have much space here.

So an R of zero might look something like this. Oh, maybe I have a data point here. Maybe I have a data point here. Maybe I have a data point here. Maybe I have one there, there, there, there, and it wouldn’t necessarily be this well organized. But this gives you a sense of things.

How would you actually try to fit a line here? You could equally justify a line that looks like that or a line that looks like that or a line that looks like that. So there really isn't a linear model that describes the relationship between the two variables that well right over here.
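
A small sketch of the R equals 0 case, assuming a tidy 3-by-3 grid of points similar to the well-organized drawing described above. The grid values and the helper `pearson_r` (standard Pearson formula) are assumptions for illustration.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical 3-by-3 grid: every X value is paired with every Y value,
# so knowing X tells you nothing about Y.
xs = [1, 1, 1, 2, 2, 2, 3, 3, 3]
ys = [1, 2, 3, 1, 2, 3, 1, 2, 3]

r_grid = pearson_r(xs, ys)
print(round(r_grid, 6))  # 0.0
```

Real data with no relationship would rarely land on exactly zero, but it would sit close to it.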

So with that as a primer, let's see if we can tackle these scatter plots. The way I'm going to do it is I'm just going to try to eyeball what a linear model might look like. There are different methods of trying to fit a linear model to a dataset, an imperfect dataset. I drew very perfect ones at least for R equals 1 and R equals -1, but these are what the real world actually looks like.

Very few times will things perfectly sit on a line. So for scatter plot A, if I were to try to fit a line, it would look something like that. If I were to try to minimize distances from these points to the line, I do see a general trend that when Y is... you know, if we look at these data points over here, when Y is high, X is low, and when X is high, when X is larger, Y is smaller.
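
The video eyeballs the line and does not name a fitting method. One common choice, assumed here, is ordinary least squares, which minimizes the squared vertical distances from the points to the line. The function `fit_line` and the data below are hypothetical and are not read off scatter plot A.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    # The least-squares line always passes through the point of means.
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Made-up imperfect data with a downward trend.
xs = [1, 2, 3, 4, 5, 6]
ys = [9, 8, 8, 5, 4, 2]

slope, intercept = fit_line(xs, ys)
print(round(slope, 3), round(intercept, 3))  # -1.429 11.0
```

A negative slope like this one goes with a negative R, which matches the trend described for scatter plot A.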

So it looks like R is going to be less than zero, and a reasonable bit less than zero. It's going to approach R equals -1. And if we look at our choices, it wouldn’t be R equal to 0.65 or R equal to 0.84. These are positive, so I wouldn’t use that one or that one. And this one is almost no correlation, R equal to 0.02. This is pretty close to zero.

So I feel good with R equal to -0.72. R equal to -0.72. Now I want to be clear: if I didn't have these choices here, I wouldn’t just be able to say, just looking at these data points without being able to do a calculation, that R is equal to -0.72. I'm just basing it on the intuition that it is a negative correlation.

It seems pretty strong; you know, the pattern kind of jumps out at you that when Y is large, X is small. When X is large, Y is small. So I like something that's approaching R equals -1. So I've used this one up already.

Now, scatter plot B. If I were to just try to eyeball it again, this is going to be imperfect. But the trend, if I were to try to fit a line, it looks something like that. So, it looks like a line fits it reasonably well. There are some points that would still be hard to fit; they're still pretty far from the line.

And it looks like it's a positive correlation. When X is relatively small, Y is relatively small, and vice versa. As X grows, Y grows, and when Y grows, X grows. So this one's going to be positive, and it looks like it would be reasonably positive.

I have two choices here, so I don’t know which of these it’s going to be. It’s either going to be R equal to 0.65 or R equal to 0.84.

Let’s look at scatter plot C. Now, this one's all over the place. It kind of looks like what we did over here.

You know, I could... you know, well, what does a line look like? You can almost imagine anything. Does it look like that? Does it look like that? Does a line look like that? These things really don't seem to... there's not a direction that you could say, well, as X increases, maybe Y increases or decreases; there's no rhyme or reason here.

So this looks very non-correlated. This one is pretty close to zero, so I feel pretty good that this is R equal to 0.02. In fact, you know, if we tried, probably the best line that could be fit would be one with a slight negative slope. So it might look something like this.

And notice even when we try to fit a line, there are all sorts of points that are way off the line. So the linear model did not fit it that well. So R equal to 0.02. So we use that one.

Now we have scatter plot D. So that's going to use one of the other positive correlations. It does look like, you know, there is a positive correlation. When Y is low, X is low, and when X is high, Y is high, and vice versa.

We could try to fit something that looks something like that, but it's still not as good as that one. You can see the points that we're trying to fit; there are several points that are still pretty far away from our model.

So the model is not fitting it that well. I would say scatter plot B is a better fit. A linear model works better for scatter plot B than it works for scatter plot D. So I would give the higher R to scatter plot B and the lower R, R equal to 0.65 to scatter plot D. R is equal to 0.65.

Once again, that's because with the linear model, it looks like there's a trend, but there are several data points that are really way off the line in scatter plot D compared to scatter plot B. There are a few that are still way off the line in B, but these are even more off of the line in D.
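
The comparison between B and D can be sketched numerically: two datasets follow the same rising trend, but one has points that stray further from the line, and it ends up with the lower R. The data and the helper `pearson_r` (standard Pearson formula) are assumptions for illustration, so the resulting values are not the 0.84 and 0.65 from the exercise.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

xs = [1, 2, 3, 4, 5, 6, 7, 8]
# Alternating offsets push points above and below the line y = x.
wobble = [1, -1, 1, -1, 1, -1, 1, -1]

# Same trend, different amounts of scatter around it.
ys_tight = [x + 1 * w for x, w in zip(xs, wobble)]
ys_loose = [x + 3 * w for x, w in zip(xs, wobble)]

r_tight = pearson_r(xs, ys_tight)
r_loose = pearson_r(xs, ys_loose)
print(round(r_tight, 2), round(r_loose, 2))  # 0.9 0.49
```

Both values are positive because both datasets trend upward, but the one with points further from the line gets the smaller coefficient.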
