yego.me

Example: Correlation coefficient intuition | Mathematics I | High School Math | Khan Academy


5m read
·Nov 11, 2024

So I took some screen captures from the Khan Academy exercise on correlation coefficient intuition. They've given us some correlation coefficients, and we need to match them to the various scatter plots on that exercise. There's a little interface where we can drag these around in a table to match them to the different scatter plots.

The point isn't to figure out how exactly to calculate these; we'll do that in the future, but really to get an intuition of what we're trying to measure. The main idea is that correlation coefficients are trying to measure how well a linear model can describe the relationship between two variables.

For example, let me draw some coordinate axes here. Let's say that's one variable, my Y variable, and that is my X variable. And let's say when X is low, Y is low. When X is a little higher, Y is a little higher. When X is higher still, Y is higher. When X is really high, Y is even higher. A linear model would describe this one very, very well.

It's quite easy to draw a line that effectively goes through those points. Something like this would have an R of one: a linear model perfectly describes it. It's a positive correlation. When one variable gets larger, the other variable is larger. When one variable is smaller, the other variable is smaller.
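To make the R = 1 case concrete, here is a minimal sketch that computes the correlation coefficient for points lying exactly on an upward-sloping line. The transcript defers the actual calculation to a later video, so the `pearson_r` helper (the standard Pearson formula) and the sample points are assumptions added for illustration.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations from the means (unnormalized covariance).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (unnormalized variances).
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical points that sit exactly on the line y = 2x + 1.
xs = [1, 2, 3, 4]
ys = [2 * x + 1 for x in xs]

r_perfect_positive = pearson_r(xs, ys)
print(r_perfect_positive)
```

Because every point is on the line and the line slopes upward, the result is 1 (up to floating-point rounding).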

Now, what would an R of negative 1 look like? Well, that would once again be a situation where a linear model works really well, but when one variable moves up, the other one moves down and vice versa. So let me draw my coordinate axes again. I'm going to try to draw a dataset where the R would be negative 1.

So maybe when Y is high, X is very low. When Y becomes lower, X becomes higher. When Y becomes a good bit lower, X becomes a good bit higher. So once again, when Y decreases, X increases or as X increases, Y decreases. So they're moving in opposite directions. But you can fit a line very easily to this. So the line would look something like this.
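A dataset like the one just described can be sketched in code. The points below are hypothetical and the `pearson_r` helper is an assumption (the standard Pearson formula, which the transcript does not spell out). Two downward-sloping lines with different steepness are used to show that the steepness does not change the coefficient, only how closely the points follow the line.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

xs = [1, 2, 3, 4]

# Hypothetical points exactly on a steep downward line: y = 10 - 3x.
ys_steep = [10 - 3 * x for x in xs]
# Hypothetical points exactly on a shallow downward line: y = 10 - 0.5x.
ys_shallow = [10 - 0.5 * x for x in xs]

r_steep = pearson_r(xs, ys_steep)
r_shallow = pearson_r(xs, ys_shallow)
print(r_steep, r_shallow)
```

Both values come out at -1 (up to rounding): as X increases, Y decreases, and a line fits perfectly in each case.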

So this would have an R of negative 1. An R of zero, R equal to zero, would be a dataset where a line doesn't really fit very well at all. I'll do that one really small since I don't have much space here.

So an R of zero might look something like this. Oh, maybe I have a data point here. Maybe I have a data point here. Maybe I have a data point here. Maybe I have one there, there, there, there, and it wouldn’t necessarily be this well organized. But this gives you a sense of things.

How would you actually try to fit a line here? You could equally justify a line that looks like that or a line that looks like that or a line that looks like that. So there really isn't a linear model that describes the relationship between the two variables that well right over here.
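As a rough illustration of the R = 0 case, the sketch below uses a deliberately symmetric set of hypothetical points (the four corners of a square plus its center). Real uncorrelated data would not be this tidy, as the transcript notes. The `pearson_r` helper is an assumption, the standard Pearson formula.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical points: corners of a square and its center.
# For every point above the mean of Y there is a matching point below it
# at the same X, so no upward or downward trend exists.
xs = [1, 1, 3, 3, 2]
ys = [1, 3, 1, 3, 2]

r_none = pearson_r(xs, ys)
print(r_none)
```

The positive and negative products of deviations cancel, so the coefficient is zero: a rising line and a falling line describe these points equally badly.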

So with that as a primer, let's see if we can tackle these scatter plots. The way I'm going to do it is I'm just going to try to eyeball what a linear model might look like. There are different methods of trying to fit a linear model to a dataset, an imperfect dataset. I drew very perfect ones at least for R equals 1 and R equals -1, but these are what the real world actually looks like.

Very few times will things perfectly sit on a line. So for scatter plot A, if I were to try to fit a line, it would look something like that. If I were to try to minimize distances from these points to the line, I do see a general trend: if we look at these data points over here, when Y is high, X is low, and when X is larger, Y is smaller.
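The transcript fits the line by eye and notes that there are different fitting methods. One common choice, assumed here and not stated in the video, is ordinary least squares, which minimizes the squared vertical distances from the points to the line. The data below is hypothetical and only mimics the downward trend described for scatter plot A.

```python
def least_squares_line(xs, ys):
    """Return (slope, intercept) of the ordinary least-squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    # The least-squares line always passes through (mean_x, mean_y).
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical imperfect data: Y tends to fall as X rises.
xs = [1, 2, 3, 4, 5]
ys = [9, 7, 8, 4, 2]

slope, intercept = least_squares_line(xs, ys)
print(f"y = {slope:.2f} * x + {intercept:.2f}")
```

A negative slope here corresponds to a negative correlation coefficient, which is the cue used in the transcript to rule out the positive choices.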

So it looks like R is going to be less than zero, and a reasonable bit less than zero. It's going to approach this thing here, R equals -1. And if we look at our choices, it wouldn't be R equal to 0.65 or R equal to 0.84; these are positive, so I wouldn't use either of those. And this one is almost no correlation: R equal to -0.02 is pretty close to zero.

So I feel good with R equal to -0.72. Now I want to be clear: if I didn't have these choices here, I wouldn't be able to say, just looking at these data points without doing a calculation, that R is equal to -0.72. I'm just basing it on the intuition that it is a negative correlation.

It seems pretty strong; you know, the pattern kind of jumps out at you that when Y is large, X is small. When X is large, Y is small. So I like something that's approaching R equals -1. So I've used this one up already.

Now, scatter plot B. If I were to just try to eyeball it again, this is going to be imperfect. But the trend, if I were to try to fit a line, it looks something like that. So, it looks like a line fits it reasonably well. There are some points that would still be hard to fit; they're still pretty far from the line.

And it looks like it's a positive correlation. When X is small, Y is relatively small, and vice versa. As X grows, Y grows, and when Y grows, X grows. So this one's going to be positive, and it looks like it would be reasonably positive.

I have two choices here, so I don’t know which of these it’s going to be. It’s either going to be R equal to 0.65 or R equal to 0.84. Let’s look at scatter plot C. Now, this one's all over the place. It kind of looks like what we did over here.

Well, what does a line look like here? You can almost imagine anything. Does it look like that? Does it look like that? Does a line look like that? There's not a direction where you could say, well, as X increases, maybe Y increases or decreases; there's no rhyme or reason here.

So this looks very non-correlated. This one is pretty close to zero, so I feel pretty good that this is R equal to -0.02. In fact, if we tried, probably the best line that could be fit would be one with a slight negative slope. So it might look something like this.

And notice even when we try to fit a line, there are all sorts of points that are way off the line. So the linear model did not fit it that well. So R equal to -0.02. So we use that one.
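The link between a near-zero correlation and a best-fit line with only a slight slope can be checked numerically. This is a sketch on hypothetical scattered data, not the points from scatter plot C. The helper names are assumptions, and the line is fit with ordinary least squares, which the video does not specify.

```python
import math

def deviation_sums(xs, ys):
    """Return (sxy, sxx, syy): sums of products and squares of deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy, sxx, syy

# Hypothetical data with no obvious direction.
xs = [1, 2, 3, 4, 5, 6]
ys = [3, 6, 1, 5, 2, 4]

sxy, sxx, syy = deviation_sums(xs, ys)
r_scattered = sxy / math.sqrt(sxx * syy)   # Pearson correlation coefficient
slope_scattered = sxy / sxx                # least-squares slope

print(round(r_scattered, 3), round(slope_scattered, 3))
```

Both quantities share the numerator `sxy`, so the correlation coefficient always has the same sign as the least-squares slope. Here both are slightly below zero, matching the picture of a nearly flat line that most points miss by a wide margin.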

Now we have scatter plot D. So that's going to use one of the other positive correlations. It does look like, you know, there is a positive correlation. When Y is low, X is low, and when X is high, Y is high, and vice versa.

We could try to fit something that looks something like that, but it's still not as good as that one. You can see the points that we're trying to fit; there are several points that are still pretty far away from our model.

So the model is not fitting it that well. I would say scatter plot B is a better fit. A linear model works better for scatter plot B than it works for scatter plot D. So I would give the higher R, R equal to 0.84, to scatter plot B and the lower R, R equal to 0.65, to scatter plot D.

Once again, that's because with the linear model, it looks like there's a trend, but there are several data points that are really way off the line in scatter plot D compared to scatter plot B. There are a few that are still way off the line in B, but these are even more off of the line in D.
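The comparison between plots B and D comes down to how far the points stray from the trend line. The sketch below uses two hypothetical datasets with the same upward trend, one tight and one loose; neither reproduces the actual exercise data, and the `pearson_r` helper is an assumption (the standard Pearson formula).

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

xs = [1, 2, 3, 4, 5, 6]

# Hypothetical: points hug an upward line closely.
ys_tight = [1.2, 1.9, 3.1, 3.8, 5.2, 5.9]
# Hypothetical: same upward trend, but several points sit far from the line.
ys_loose = [2.5, 0.5, 4.5, 2.0, 6.0, 4.0]

r_tight = pearson_r(xs, ys_tight)
r_loose = pearson_r(xs, ys_loose)
print(round(r_tight, 2), round(r_loose, 2))
```

Both coefficients are positive, but the tighter dataset gets the value closer to 1, which is the same reasoning used to give the higher R to the plot whose points sit nearer the line.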
