
Confidence interval for the slope of a regression line | AP Statistics | Khan Academy


Nov 11, 2024

Musa is interested in the relationship between hours spent studying and caffeine consumption among students at his school. He randomly selects 20 students at his school and records their caffeine intake in milligrams and the amount of time studying in a given week. Here is a computer output from a least squares regression analysis on his sample. Assume that all conditions for inference have been met. What is the 95 percent confidence interval for the slope of the least squares regression line?

So if you feel inspired, pause the video and see if you can have a go at it. Otherwise, we'll do this together.

Okay, so let's first remind ourselves what's even going on by visualizing the regression. Our horizontal axis, or x-axis, would be caffeine intake in milligrams, and our vertical axis, or y-axis, would be time studying, which I would assume is in hours. Musa randomly selects 20 students, and for each of those students, he sees how much caffeine they consumed and how much time they spent studying and plots them here.

There will be 20 data points: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20. He inputs these data points into a computer in order to fit a least squares regression line, and let's say the least squares regression line looks something like this. A least squares regression line comes from trying to minimize the sum of the squared vertical distances between the line and all of these points.
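To make the idea of fitting a least squares line concrete, here is a minimal sketch in plain Python. The five data points are made up for illustration (they are not Musa's sample, which the video does not list), and the function name `fit_least_squares` is just a label chosen here.

```python
def fit_least_squares(xs, ys):
    """Return (intercept, slope) of the line minimizing the sum of squared residuals."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sum of cross-deviations and sum of squared x-deviations
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Made-up illustration data, NOT the sample from the video
caffeine = [1, 2, 3, 4, 5]
hours = [2, 4, 5, 4, 5]

intercept, slope = fit_least_squares(caffeine, hours)
print(f"predicted hours = {intercept:.3f} + {slope:.3f} * caffeine")
```

The two numbers returned here play the same role as the "Constant" and "Caffeine" coefficients in the computer output.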

This is giving us information on that least squares regression line. The most valuable things here, if we really want to help visualize or understand the line, are what we get in this column. The constant coefficient tells us essentially what is the y-intercept here: 2.544. And then the coefficient on the caffeine, this is one way of thinking about, well, for every incremental increase in caffeine, how much does this time studying increase? You might recognize this as the slope of the least squares regression line.

So this is the slope, and this would be equal to 0.164. Now, this information right over here tells us how well our least squares regression line fits the data. R squared, you might already be familiar with it, says how much of the variance in the y variable is explainable by the x variable. If it was 1, or 100 percent, that means all of it could be explained and it's a very good fit. If it was 0, that means none of it can be explained, and it would be a very bad fit.

Capital S, this is the standard deviation of the residuals, and it's another measure of how much these data points vary from this regression line. Now, this column right over here is going to prove to be useful for answering the question at hand. This gives us the standard error of the coefficient, and the coefficient that we really care about, the statistic that we really care about, is the slope of the regression line.

This gives us the standard error for the slope of the regression line. You could view this as the estimate of the standard deviation of the sampling distribution of the slope of the regression line. Remember, we took a sample of 20 folks here, and we calculated a statistic, which is the slope of the regression line. Every time you do a different sample, you will likely get a different slope, and this slope is an estimate of some true parameter in the population. This would sometimes also be called the standard error of the slope of the least squares regression line.
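As a rough sketch of where a number like this comes from, the usual formula for the standard error of the slope is the standard deviation of the residuals (computed with n minus 2 in the denominator) divided by the square root of the sum of squared deviations of x. The data below are made up for illustration, and `slope_standard_error` is a name chosen here, not something from the computer output.

```python
import math


def slope_standard_error(xs, ys):
    """Estimate the standard error of the least squares slope."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    # Sum of squared residuals around the fitted line
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    # Standard deviation of the residuals (the "S" in the output), using n - 2
    s = math.sqrt(sse / (n - 2))
    return s / math.sqrt(s_xx)


# Made-up illustration data, NOT the sample from the video
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
se_slope = slope_standard_error(xs, ys)
print(f"standard error of the slope: {se_slope:.4f}")
```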

Now, these last two columns you don't have to worry about in the context of this video. This is useful if you were saying, "Well, assuming that there is no relationship between caffeine intake and time studying, what is the associated t statistic for the statistics that I actually calculated? And what would be the probability of getting something that extreme or more extreme, assuming that there is no association, assuming that, for example, the actual slope of the regression line is zero?" And this says, well, the probability, if we would assume that, is actually quite low. It's about a one percent chance that you would have gotten these results if there truly was not a relationship between caffeine intake and time studying.
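If you want to see how those last two columns relate to the ones we care about, here is a hedged sketch. It assumes the t statistic is the slope divided by its standard error and that the P-value is two-sided with 18 degrees of freedom. The t-distribution tail area is approximated with Simpson's rule so that only the standard library is needed; the helper names are illustrative.

```python
import math


def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + t * t / df) ** (-(df + 1) / 2)


def two_sided_p_value(t_stat, df, steps=10_000):
    """Approximate P(|T| >= |t_stat|) using Simpson's rule on [0, |t_stat|]."""
    upper = abs(t_stat)
    h = upper / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area_zero_to_t = total * h / 3
    return 2 * (0.5 - area_zero_to_t)


slope = 0.164      # coefficient on caffeine from the output
se_slope = 0.057   # its standard error from the output
n = 20             # sample size

t_stat = slope / se_slope
p_value = two_sided_p_value(t_stat, n - 2)
print(f"t = {t_stat:.3f}, two-sided p = {p_value:.3f}")
```

Under these assumptions the result lands close to the one percent figure mentioned above.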

But with all of that out of the way, let's actually answer the question. Well, to construct a confidence interval around a statistic, you would take the value of the statistic that you calculated from your sample, so 0.164, and then it would be plus or minus a critical t value. This would be driven by the fact that you care about a 95 percent confidence interval and by the degrees of freedom, and I'll talk about that in a second.

Then you would multiply that by the standard error of the statistic, and in this case, the statistic that we care about is the slope. So this is the critical t value times 0.057. The reason why we're using a critical t value instead of a critical z value is that our standard error of the statistic is an estimate. We don't actually know the standard deviation of the sampling distribution.

So the last thing we have to do is figure out what is this critical t value. You can figure it out using either a calculator or using a table. I'll do it using a table, and to do that we know what the degrees of freedom are. When you're doing this with a regression slope like we're doing right now, your degrees of freedom are going to be the number of data points you have minus 2.

So our degrees of freedom are going to be 20 minus 2, which is equal to 18. I'm not going to go into a bunch of depth right now as to why you subtract 2 here; that is beyond the scope of this video for sure. But just so that we can look it up on a table, this is our degrees of freedom. We care about a 95 percent confidence level, which is equivalent to having a two and a half percent tail on either side.

Our degrees of freedom is 18, so our critical t value is 2.101. Thus, our 95 percent confidence interval is going to be 0.164 plus or minus our critical t value, 2.101, times the standard error of the statistic—which I’ll just put in parentheses, 0.057.
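If you'd rather not reach for a calculator, the arithmetic can be sketched in a few lines of Python. The three inputs are the values read from the computer output and the t table above; the variable names are just labels chosen here.

```python
slope = 0.164       # sample slope from the computer output
se_slope = 0.057    # standard error of the slope from the output
t_critical = 2.101  # table value for 95 percent confidence, 18 degrees of freedom

margin = t_critical * se_slope
lower = slope - margin
upper = slope + margin
print(f"95% CI for the slope: ({lower:.3f}, {upper:.3f})")
```

That works out to an interval from roughly 0.044 to 0.284.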

You could type this into a calculator if you wanted to figure out the exact values here, but the way to interpret a 95 percent confidence interval is that 95 percent of the time that you calculate a 95 percent confidence interval, it is going to overlap with the true value of the parameter that we are estimating.
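One way to convince yourself of that interpretation is a small simulation. This is only a sketch under loud assumptions: it pretends the true population line has intercept 2.544 and slope 0.164 (in reality those are sample estimates), and it assumes normally distributed errors with a standard deviation of 1 and caffeine values spread uniformly from 0 to 20. It repeatedly draws samples of 20, builds the interval with the table value 2.101, and counts how often the interval captures the assumed true slope.

```python
import math
import random


def slope_and_se(xs, ys):
    """Return the least squares slope and its estimated standard error."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    se = math.sqrt(sse / (n - 2)) / math.sqrt(s_xx)
    return slope, se


def estimate_coverage(trials=2000, n=20, true_intercept=2.544, true_slope=0.164,
                      sigma=1.0, t_critical=2.101, seed=0):
    """Fraction of simulated 95 percent intervals that contain the assumed true slope."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # Hypothetical population: uniform x values, normal errors around the line
        xs = [rng.uniform(0, 20) for _ in range(n)]
        ys = [true_intercept + true_slope * x + rng.gauss(0, sigma) for x in xs]
        slope, se = slope_and_se(xs, ys)
        if abs(slope - true_slope) <= t_critical * se:
            hits += 1
    return hits / trials


rate = estimate_coverage()
print(f"share of intervals containing the true slope: {rate:.3f}")
```

With enough trials, the share should settle near 0.95, which is what the 95 percent in "95 percent confidence interval" refers to.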
