
Calculating the equation of a regression line | AP Statistics | Khan Academy


5m read
·Nov 11, 2024

In previous videos, we took this bivariate data and calculated the correlation coefficient. Just as a bit of a review, we have the formula here, and it looks a bit intimidating. But in that video, we saw that all it is, is an average of the product of the z-scores for each of those pairs. And as we said, if R is equal to one, you have a perfect positive correlation. If R is equal to negative one, you have a perfect negative correlation, and if R is equal to zero, you don't have a correlation.
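For reference, the formula being described can be written out as follows. The notation is an assumption, since the transcript does not reproduce what was on screen: (n) is the number of pairs, (\bar{x}) and (\bar{y}) are the sample means, and (s_x) and (s_y) are the sample standard deviations. The "average" here divides by (n - 1) rather than (n), which is the usual convention when sample standard deviations are used.

```latex
r = \frac{1}{n-1} \sum_{i=1}^{n}
    \left( \frac{x_i - \bar{x}}{s_x} \right)
    \left( \frac{y_i - \bar{y}}{s_y} \right)
```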

But for this particular bivariate data set, we got an R of 0.946, which means we have a fairly strong positive correlation. What we're going to do in this video is build on this notion and actually come up with the equation for the least squares line that tries to fit these points. So before I do that, let's just visualize some of the statistics that we have here for these data points. We clearly have the four data points plotted, but let's plot the statistics for X.

So the sample mean and the sample standard deviation for X are here in red. Actually, let me box these off in red so that you know that's what is going on here. The sample mean for X is easy to calculate: ((1 + 2 + 2 + 3) / 4) is (8 / 4), which is (2). So we have (X_{\text{mean}} = 2) right over here.

Then this is one standard deviation above the mean, and this is one sample standard deviation below the mean. We could do the same thing for the Y variables. So the mean is three, and this is one sample standard deviation for Y above the mean, and this is one sample standard deviation for Y below the mean. Visualizing these means, especially their intersection and also their standard deviations, will help us build an intuition for the equation of the least squares line.
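The X statistics can be checked with a few lines of Python. This is a minimal sketch: the X values are the ones that appear in the mean calculation above, and the variable names are only illustrative. The Y values are not listed in the transcript, so they are left out.

```python
import statistics

# X values as they appear in the mean calculation in the transcript.
x = [1, 2, 2, 3]

x_mean = statistics.mean(x)   # sample mean
x_sd = statistics.stdev(x)    # sample standard deviation (divides by n - 1)

print(f"mean of X = {x_mean}, sample sd of X = {x_sd:.3f}")
```

The standard deviation rounds to the 0.816 used later in the slope calculation.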

So generally speaking, the equation for any line is going to be (Y = mX + b), where (m) is the slope and (b) is the Y-intercept. For the regression line, we'll put a little hat over the Y, so you would literally say (\hat{Y}). This tells you that this is a regression line that we're trying to fit to these points.

First, what is going to be the slope? Well, the slope is going to be (R) times the ratio between the sample standard deviation in the Y direction over the sample standard deviation in the X direction. This might not seem intuitive at first, but we'll talk about it in a few seconds, and hopefully, it'll make a lot more sense.

But the next thing we need to know is, alright, if we can calculate our slope, how do we calculate our Y-intercept? Well, like you first learned in Algebra 1, you can calculate the Y-intercept if you already know the slope by saying, "What point is definitely going to be on my line?" For a least squares regression line, you're definitely going to have the point ((\text{sample mean of } X, \text{sample mean of } Y)).
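Putting the slope rule and the "line passes through the means" fact together gives the following relations. The symbols (\bar{x}), (\bar{y}), (s_x) and (s_y) are shorthand assumed here for the sample means and sample standard deviations; the intercept expression is simply the line equation solved for (b) at the point of means.

```latex
\hat{y} = m x + b, \qquad
m = r \, \frac{s_y}{s_x}, \qquad
\bar{y} = m \, \bar{x} + b \;\Longrightarrow\; b = \bar{y} - m \, \bar{x}
```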

So you're definitely going to go through that point. Before I even calculate for this particular example, where in previous videos we calculated the (R) to be (0.946) or roughly equal to that, let's just think about what's going on. Our least squares line is definitely going to go through that point.

Now, if (R) were one, if we had a perfect positive correlation, then our slope would be the standard deviation of Y over the standard deviation of X. So if you were to start at this point, and if you were to run your standard deviation of X and rise your standard deviation of Y, well, with a perfect positive correlation, your line would look like this.

That makes a lot of sense because you're looking at your spread of Y over your spread of X. If (R) were equal to one, this would be your slope: standard deviation of Y over standard deviation of X. That has parallels to when you first learn about slope, change in Y over change in X. Here, you're seeing the average spread in Y over the average spread in X, and this would be the case when (R) is one.

Let me write that down; this would be the case if (R) is equal to one. What if (R) were equal to negative one? It would look like this; that would be our line if we had a perfect negative correlation. Now, what if (R) were zero? Then your slope would be zero, and your line would just be this line (Y = \text{mean of } Y), so you'd just go through that right over there.

But now let's think about this scenario. In this scenario, our (R) is (0.946), so we have a fairly strong correlation. This is pretty close to one. So if you were to take (0.946) and multiply it by this ratio, if you were to move forward in X by the standard deviation in X, for this case, how much would you move up in Y? Well, you would move up (R) times the standard deviation of Y.

As we said, if (R) was one, you would get all the way up to this perfect correlation line, but here it's (0.946), so you would get up about 95% of the way to that. Our line, without even looking at the equation, is going to look something like this, which we can see is a pretty good fit for those points.
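The effect of (R) on the slope can be sketched numerically. This is an illustration under stated assumptions: the function name `slope` is hypothetical, (s_x = 0.816) is the value quoted in the transcript, and (s_y) is taken as 2.160, the value consistent with the slope of 2.50 computed in the transcript.

```python
# Summary statistics (s_y = 2.160 is an assumption, see the note above).
s_x = 0.816
s_y = 2.160


def slope(r, s_x, s_y):
    """Slope of the least squares line: r times (s_y / s_x)."""
    return r * s_y / s_x


# Compare the perfect-correlation cases with r = 0 and r = 0.946.
slopes = {r: slope(r, s_x, s_y) for r in (1.0, 0.946, 0.0, -1.0)}

for r, m in slopes.items():
    print(f"r = {r:+.3f} -> slope = {m:+.3f}")

# Fraction of the r = 1 slope reached when r = 0.946.
fraction = slopes[0.946] / slopes[1.0]
print(f"fraction of the perfect-correlation slope: {fraction:.3f}")
```

Because the slope scales linearly with (R), the fraction printed at the end is just (R) itself, which is the "about 95% of the way" described above.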

I'm not proving it here in this video, but now that we have an intuition for these things, hopefully you appreciate that this isn't just coming out of nowhere. It's not some strange formula; it actually makes intuitive sense. Let's calculate it for this particular set of data.

(m) is going to be equal to (R), which is (0.946), times the sample standard deviation of Y, (2.160), over the sample standard deviation of X, (0.816). That is, (m = 0.946 \times \frac{2.160}{0.816}). We can get our calculator out to calculate that.

So we have (0.946 \times 2.160) divided by (0.816), and it gets us to roughly (2.50). Let's just round to the nearest hundredth for simplicity here, so this is approximately equal to (2.50).

How do we figure out the Y-intercept? Well, remember we go through this point. So we're going to have (2.50) times our X mean, which is (2), plus (b), and that is going to be equal to our Y mean. Our Y mean, we see right over here, is (3). So (2.50 \times 2 + b = 3).

So what do we get? We get (3 = 5 + b). So what is (b)? Well, if you subtract five from both sides, you get (b = -2).

And so there you have it, the equation for our regression line. We deserve a little bit of a drum roll here! We would say (\hat{Y} = 2.50X - 2), and we are done.
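The whole calculation can be reproduced from the summary statistics alone. This is a minimal sketch, not the video's own method: the function name `regression_line` is hypothetical, (s_y) is taken as 2.160 (the value consistent with the 2.50 slope above), and the slope is rounded to the nearest hundredth before the intercept is computed, mirroring the rounding done in the transcript.

```python
def regression_line(r, x_mean, y_mean, s_x, s_y):
    """Return (slope, intercept) of the least squares line from summary stats."""
    # Slope: r times the ratio of the sample standard deviations,
    # rounded to the nearest hundredth as in the transcript.
    m = round(r * s_y / s_x, 2)
    # Intercept: the line must pass through (x_mean, y_mean).
    b = y_mean - m * x_mean
    return m, b


m, b = regression_line(r=0.946, x_mean=2.0, y_mean=3.0, s_x=0.816, s_y=2.160)
print(f"y_hat = {m:.2f} * x + ({b:.2f})")

# Sanity check: plugging in the X mean should give back the Y mean.
print(f"prediction at x_mean: {m * 2.0 + b:.2f}")
```

Without the intermediate rounding, the slope comes out slightly above 2.50 and the intercept slightly below -2, which is why the transcript's rounded figures are used here.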
