Introduction to inference about slope in linear regression | AP Statistics | Khan Academy


Nov 11, 2024

In this video, we're going to talk about regression lines. But it's not going to be the first time we're talking about regression lines. And so, if the idea of a regression is foreign to you, I encourage you to watch the introductory videos on it. Here, we're going to think about how we can make inferences from a regression line.

If the idea of statistical inference or hypothesis testing is new to you, once again, watch those videos as well. But let's say we think there's a positive association between shoe size and height. What we might do is put shoe size here on the horizontal axis. Our sizes could go size 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, and it could keep going up from there.

Then, on this axis, our y-axis, we'd have height: one foot, two feet, three feet, four feet, five feet, six feet, seven feet. To see if there's an association, you might take a sample. Let's say you take a random sample of 20 people from the population. In future videos, we'll talk about the conditions necessary for making appropriate inferences.

Let's say those 20 people are these 20 data points. So, there's a young child, then maybe there's a grown adult with bigger feet and who's taller, and then 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20. And so, you have these 20 data points.

Then, what you're likely to do is input them into a computer. You could do it by hand, but we usually have computers to do that for us now. The computer could try to fit a regression line, and there are many techniques for doing it. But one typical technique is to minimize the sum of the squared vertical distances between these points and that line.

This regression line will have an equation, as any line would have. We tend to write it as (\hat{y} = a + bx), where the hat tells us that this is a regression line, (a) is the y-intercept, and (b) is the slope that multiplies our (x) variable. So, this right over here, where the line crosses the y-axis, would be (a).
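As a minimal sketch of the least-squares fit described above, the following Python computes the intercept (a) and slope (b) for a small data set. The function name `fit_line` and the five data points are made up for illustration; they are not the values plotted in the video.

```python
def fit_line(xs, ys):
    """Least-squares intercept a and slope b for the line y_hat = a + b * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Slope: how x and y vary together, divided by how much x varies.
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    # The least-squares line always passes through the point (x_bar, y_bar).
    a = y_bar - b * x_bar
    return a, b


# Hypothetical sample: shoe sizes and heights in feet (invented for illustration).
shoe_sizes = [2, 5, 7, 9, 12]
heights = [3.4, 4.6, 5.3, 5.8, 6.3]

a, b = fit_line(shoe_sizes, heights)
print(f"y_hat = {a:.3f} + {b:.3f} * x")
```

These closed-form formulas are one common way to get the line that minimizes the sum of squared vertical distances; statistical software reports the same two numbers.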

Now, to be clear, if you took another sample, you might get different results here. In fact, let's call this one (\hat{y}_1 = a_1 + b_1 \cdot x) for our first sample, and this is (a_1). If you were to take another sample of 20 folks, so let's do that. Maybe you get 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20. Then you try to fit a line to that. That line might look something like this. It might have a slightly different y-intercept and a slightly different slope.

So, we could call that for the second sample (\hat{y}_2 = a_2 + b_2 \cdot x). Every time you take a sample, you are likely to get different results for these values, which are essentially statistics. Remember, statistics are things that we can get from samples, and we're trying to estimate true population parameters.

Well, what would be the true population parameters we're trying to estimate? Imagine a world where there is some true linear relationship between shoe size and height. You could find it if, theoretically, you could measure every human being on the planet.

Depending on what you define as a population, it could be all living people or all people who will ever live. This isn't practical, but let's just say that you actually could, and you would have billions of data points here for the true population. Then, if you were to fit a regression line to that, you could view this as the true population regression line.

So, that would be (\hat{y} = \alpha + \beta x). To make it clear that here the y-intercept and the slope are the true population parameters, instead of saying (a), we say (\alpha), and instead of saying (b), we say (\beta). But it's very hard to come up with exactly what (\alpha) and (\beta) are, and so that's why we estimate them with (a)s and (b)s based on a sample.
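To see why the (a)s and (b)s change from sample to sample, here is a small simulation sketch. It assumes a made-up population line with (\alpha = 3.0) and (\beta = 0.25) and random scatter around it; those numbers, the noise level, and the helper names `fit_line` and `draw_sample` are all hypothetical choices for this illustration.

```python
import random


def fit_line(xs, ys):
    """Least-squares intercept a and slope b for the line y_hat = a + b * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    return y_bar - b * x_bar, b


# Hypothetical "true" population parameters, chosen only for this illustration.
ALPHA = 3.0   # population y-intercept, in feet
BETA = 0.25   # population slope, in feet per shoe size


def draw_sample(rng, n=20):
    """Draw n (shoe size, height) pairs scattered around the population line."""
    xs = [rng.randint(1, 12) for _ in range(n)]
    ys = [ALPHA + BETA * x + rng.gauss(0, 0.3) for x in xs]
    return xs, ys


rng = random.Random(0)  # fixed seed so the run is repeatable
a1, b1 = fit_line(*draw_sample(rng))
a2, b2 = fit_line(*draw_sample(rng))

print(f"sample 1: y_hat = {a1:.3f} + {b1:.3f} * x")
print(f"sample 2: y_hat = {a2:.3f} + {b2:.3f} * x")
print(f"population: y = {ALPHA} + {BETA} * x")
```

Each sample of 20 gives its own intercept and slope, close to but not exactly equal to the population values. In practice we never get to see (\alpha) and (\beta); the simulation only works because we chose them ourselves.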

Now, what's interesting with this in mind is we can start to make inferences based on our sample. We know that, for example, (b_2) is unlikely to be exactly (\beta). But how confident can we be that there is at least a positive linear relationship, or a non-zero linear relationship? Or can we create a confidence interval around this statistic in order to have a good sense of where the true parameter might actually be?

The simple answer is yes, and to do so, we'll use the same exact ideas that we did when we made inferences based on proportions or based on means. The way that you can make an inference, for example, for your true population slope of your regression line, is you say, "Okay, I took a sample, I got this slope right over here." So, I'll just call that (b_2), and then I could create a confidence interval around that.

That confidence interval is going to be based on some critical value times, ideally, the standard deviation of the sampling distribution of your sample statistic. In this case, the statistic is the slope of the sample regression line. But because we can't figure out that standard deviation precisely from a sample, we are going to estimate it with what's known as the standard error of the statistic, and we'll go into more depth on this in future videos.

Since we're estimating here, we're going to use a critical (t) value, which we have studied before. It is based on the confidence level you want to have, let's say 95 percent, and on the degrees of freedom, which we'll see come out of how many data points we have.
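The interval described here, slope plus or minus a critical (t) value times the standard error, can be sketched in Python as follows. This goes a little beyond what the video has covered so far: it assumes the usual simple-regression formulas, namely a standard error of the slope equal to the residual standard deviation divided by the square root of the sum of squared deviations of (x), and (n - 2) degrees of freedom. The 20 data points are invented, and the critical value 2.101 is the standard t-table entry for 95 percent confidence with 18 degrees of freedom.

```python
import math


def slope_interval(xs, ys, t_star):
    """Return (b, se_b, lower, upper): a confidence interval for the slope."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = y_bar - b * x_bar
    # Residual standard deviation, using n - 2 degrees of freedom (assumed formula).
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    s = math.sqrt(sse / (n - 2))
    # Standard error of the slope: our estimate of how much b varies across samples.
    se_b = s / math.sqrt(sxx)
    margin = t_star * se_b
    return b, se_b, b - margin, b + margin


# Hypothetical sample of 20 people (shoe size, height in feet), invented for illustration.
shoe_sizes = [1, 3, 4, 5, 6, 6, 7, 7, 8, 8, 8, 9, 9, 9, 10, 10, 11, 11, 12, 12]
heights = [3.2, 3.9, 4.3, 4.6, 4.9, 5.1, 5.2, 5.4, 5.3, 5.5,
           5.6, 5.6, 5.8, 5.7, 5.9, 6.0, 6.0, 6.2, 6.1, 6.4]

T_STAR = 2.101  # t-table value: 95% confidence, 20 - 2 = 18 degrees of freedom

b, se_b, lower, upper = slope_interval(shoe_sizes, heights, T_STAR)
print(f"b = {b:.3f}, SE = {se_b:.3f}, 95% CI = ({lower:.3f}, {upper:.3f})")
```

If the whole interval sits above zero, the sample is consistent with a positive population slope at that confidence level.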

We can figure this out, and from our sample, we can create a confidence interval. We'll also see that you could do hypothesis testing here. You could say, "Hey, let's set up a null hypothesis." The null hypothesis is going to be that there's no linear relationship, that is, that the slope of the population regression line is equal to zero.

The alternative hypothesis could be either that the true slope is greater than zero, meaning a positive linear relationship, or that it's just non-zero. What you could do, assuming the null hypothesis is true, is find the probability of getting a statistic at least as extreme as the one from your sample. If that's below some threshold, you might reject the null hypothesis, which would suggest the alternative.
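The test just described can be sketched in Python like this. As assumptions beyond what the video states, the sketch uses the usual test statistic (t = b / SE_b) with (n - 2) degrees of freedom, and it approximates the tail probability by numerically integrating the t density with Simpson's rule so that no third-party library is needed. The data and the names `slope_and_se` and `t_upper_tail` are invented for illustration.

```python
import math


def slope_and_se(xs, ys):
    """Least-squares slope b and its standard error (assumed n - 2 formula)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = y_bar - b * x_bar
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    return b, math.sqrt(sse / (n - 2)) / math.sqrt(sxx)


def t_upper_tail(t, df, steps=2000):
    """Approximate P(T >= t) for a t distribution, via Simpson's rule (steps even)."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))

    def pdf(u):
        return coef * (1 + u * u / df) ** (-(df + 1) / 2)

    # Integrate the density from 0 to |t|, then subtract from one half.
    h = abs(t) / steps
    total = pdf(0.0) + pdf(abs(t))
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    upper = max(0.0, 0.5 - total * h / 3)
    return upper if t >= 0 else 1.0 - upper


# Hypothetical sample of 20 people (shoe size, height in feet), invented for illustration.
shoe_sizes = [1, 3, 4, 5, 6, 6, 7, 7, 8, 8, 8, 9, 9, 9, 10, 10, 11, 11, 12, 12]
heights = [3.2, 3.9, 4.3, 4.6, 4.9, 5.1, 5.2, 5.4, 5.3, 5.5,
           5.6, 5.6, 5.8, 5.7, 5.9, 6.0, 6.0, 6.2, 6.1, 6.4]

b, se_b = slope_and_se(shoe_sizes, heights)
df = len(shoe_sizes) - 2
t_stat = (b - 0) / se_b                  # null hypothesis: population slope is 0
p_one_sided = t_upper_tail(t_stat, df)   # alternative: slope > 0
p_two_sided = 2 * t_upper_tail(abs(t_stat), df)  # alternative: slope != 0

ALPHA_LEVEL = 0.05  # significance threshold chosen for this example
print(f"t = {t_stat:.2f}, reject null (one-sided): {p_one_sided < ALPHA_LEVEL}")
```

For very large t statistics the numerical tail probability is only accurate to roughly the integration error, which is why the sketch reports the decision rather than the p-value itself.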

So, both of these are things that we have done before, where you're creating a confidence interval around a statistic or you're doing hypothesis testing, making assumptions about a true parameter. The only difference here is that the parameters we're trying to estimate are the parameters of a theoretical population regression line, and we're going to do that using sample statistics from a sample regression line.