yego.me

Introduction to residuals and least-squares regression | AP Statistics | Khan Academy


Nov 11, 2024

Let's say we're trying to understand the relationship between people's height and their weight. So what we do is we go to 10 different people and we measure each of their heights and each of their weights.

Each dot in this scatter plot represents one person. For example, this dot over here represents a person whose height was 60 inches, or 5 feet, and whose weight, which we plot on the y-axis, was 125 pounds. So that's the point (60, 125).

And so when you look at this scatter plot, your eyes naturally see some type of trend. It seems like, generally speaking, as height increases, weight increases as well. I say "generally speaking" because there are definitely cases where a taller person weighs less than a shorter one.

But an interesting question is: can we fit a line to this data? Trying to fit a line as closely as possible to as many of the points as possible is known as linear regression. The most common technique is to fit the line that minimizes the sum of the squared vertical distances between the line and each of the points.
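The least-squares idea can be sketched in a few lines of Python. This is a minimal sketch assuming the standard closed-form solution for fitting a straight line (slope = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)², intercept = ȳ − slope · x̄); the function name `least_squares_line` and the sample points are hypothetical, made up for illustration, and are not the video's height and weight data.

```python
from fractions import Fraction


def least_squares_line(points):
    """Return (intercept, slope) of the line that minimizes the sum of
    squared vertical distances (residuals) to the given (x, y) points."""
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points")
    # Means of x and y; Fraction keeps integer inputs exact.
    mean_x = Fraction(sum(x for x, _ in points)) / n
    mean_y = Fraction(sum(y for _, y in points)) / n
    # Closed-form least-squares slope: covariance term over variance term.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    if sxx == 0:
        raise ValueError("all x values are equal, so the slope is undefined")
    slope = sxy / sxx
    # The least-squares line always passes through (mean_x, mean_y).
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical points lying exactly on y = 2x + 1 (not the video's data).
pts = [(0, 1), (1, 3), (2, 5), (3, 7)]
intercept, slope = least_squares_line(pts)
print(intercept, slope)  # 1 2
```

Because these made-up points sit exactly on a line, the fit recovers that line; with scattered data like the height and weight plot, the same formulas give the line with the smallest total squared residual.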

We're going to talk more about that in future videos, but for now, we want to get an intuitive feel for it. If you were to just eyeball a line like this one, you wouldn't think it was a particularly good fit. It looks like most of the data sits above the line.

Similarly, something like this also doesn't look that great. Here, most of our data points are sitting below the line. But something like this actually looks very good. It looks like it's getting as close as possible to as many of the points as possible.

It seems to describe the general trend, and in fact this is the actual regression line. To write its equation, we use y with a little hat over it, ŷ ("y hat"), which means that we're estimating y for a given x.

It's not always going to equal the actual y for a given x because, as we see, some of the points don't sit on the line. For this particular regression line, the y-intercept is negative 140 and the slope is 14/3, so we write ŷ = −140 + (14/3)x.
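The regression line from the video can be written as a small Python function that turns a height into an estimated weight. The names `y_hat`, `INTERCEPT`, and `SLOPE` are illustrative; `Fraction` is used so the slope 14/3 stays exact instead of being rounded as a float.

```python
from fractions import Fraction

# Intercept and slope of the regression line in the video: y-hat = -140 + (14/3) x
INTERCEPT = Fraction(-140)
SLOPE = Fraction(14, 3)


def y_hat(x):
    """Estimated weight (pounds) for a height x (inches) from the regression line."""
    return INTERCEPT + SLOPE * x


print(y_hat(60))  # 140
print(y_hat(72))  # 196
```

For example, the line estimates 196 pounds for a height of 72 inches, even though a real person of that height may weigh more or less.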

Now, as we can see, for most of these points, the estimate that our regression line gives for the point's x value is different from the actual y value. That difference between the actual value and the estimate from the regression line is known as the residual.

So let me write that down. The residual at a point is equal to the actual y value minus the estimated y value from the regression line for that same x: residual = y − ŷ.

Another way to think about it: for the point where x is equal to 60, the residual at just that point is the actual y value minus our estimate of the y value from this regression line for that x value.

So pause this video and see if you can calculate this residual. You can visually picture it as this vertical distance right over here. To actually calculate the residual, you take the actual value, which is 125 for that x value.

Remember, we're calculating the residual for a point, so it's the actual y there minus what would be the estimated y there for that x value. Well, we could just go to this equation and say what would y hat be when x is equal to 60?

Let's see. We have negative 140 plus 14/3 times 60. Since 60 divided by 3 is 20, and 20 times 14 is 280, this is −140 + 280, which is 140.

And so our residual for this point is 125 minus 140, which is negative 15. Residuals can indeed be negative. A negative residual means that, for that x value, the actual y value of your data point is below the estimate.
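Here is a minimal sketch of the residual calculation just done by hand, assuming the line ŷ = −140 + (14/3)x from the video. The helper names `y_hat`, `residual`, and `position` are hypothetical; `position` simply reads the sign of the residual to say whether the data point sits above, below, or on the line.

```python
from fractions import Fraction


def y_hat(x):
    # Regression line from the video: y-hat = -140 + (14/3) x
    return Fraction(-140) + Fraction(14, 3) * x


def residual(x, y):
    """Actual y minus the estimate from the regression line at the same x."""
    return y - y_hat(x)


def position(x, y):
    """Describe where the data point sits relative to the regression line."""
    r = residual(x, y)
    if r > 0:
        return "above"
    if r < 0:
        return "below"
    return "on"


print(residual(60, 125))  # -15
print(position(60, 125))  # below
```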

If we were to calculate the residual here, or here, the actual value for that x is above our estimate, so we would get positive residuals. And as you'll see later in your statistics career, the way we calculate these regression lines is all about minimizing the sum of the squares of these residuals.
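To make "minimizing the sum of the squared residuals" concrete, the sketch below scores three candidate lines on a small, made-up data set (not the video's data): the least-squares line for these points, worked out by hand as ŷ = 11/10 + (11/10)x, and the same line shifted up or down by 1, echoing the eyeballed lines that sat above or below most of the data. The function name `sum_of_squared_residuals` is illustrative.

```python
from fractions import Fraction


def sum_of_squared_residuals(points, intercept, slope):
    """Sum over all points of (actual y - estimated y) squared."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in points)


# Hypothetical data (not from the video).
points = [(0, 1), (1, 2), (2, 4), (3, 4)]

# Least-squares line for these points, worked out by hand: y-hat = 11/10 + (11/10) x
best = (Fraction(11, 10), Fraction(11, 10))
too_high = (Fraction(21, 10), Fraction(11, 10))  # same slope, shifted up by 1
too_low = (Fraction(1, 10), Fraction(11, 10))    # same slope, shifted down by 1

for name, (b, m) in [("best", best), ("too high", too_high), ("too low", too_low)]:
    print(name, sum_of_squared_residuals(points, b, m))
# best 7/10
# too high 47/10
# too low 47/10
```

The least-squares residuals here sum to zero, so shifting the line by 1 in either direction adds exactly 1 per point, giving 7/10 + 4 = 47/10 for both shifted lines.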
