
Impact of removing outliers on regression lines | AP Statistics | Khan Academy


·Nov 11, 2024

The scatter plot below displays a set of bivariate data along with its least squares regression line. Consider removing the outlier at (95, 1), that point right over there, and calculating a new least squares regression line. What effects would removing the outlier have? Choose all answers that apply.

Like always, pause this video and see if you can figure it out. Well, let's see. Even with this outlier here, we have an upward-sloping regression line, so R is already greater than zero, and of course it's less than one. We know it's not equal to one, because then the line would go perfectly through all of the dots. It's clear that this point right over here is indeed an outlier.

The residual between this point and the line is quite large in magnitude. We have a pretty big distance right over here, and because the point sits below the line, it's a negative residual. So this point is definitely bringing down R, and it's definitely bringing down the slope of the regression line. If we were to remove this point, we'd more likely have a line that looks something like this, which looks like a much better fit. The only reason the line isn't doing that already is that it's trying to get close to this point right over here.

So if we remove this outlier, R would increase, and the slope of our line would increase as well. We'd have a better fit to this positively correlated data, and we would no longer have this point dragging the slope down.
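To make this concrete, here is a minimal sketch in Python. The video does not give the actual coordinates of the plotted points, so the data below is made up: an upward trend plus the outlier at (95, 1). The helper name `fit_line` is also just an illustrative choice.

```python
import math

def fit_line(points):
    """Return (slope, intercept, r) for the least squares line through points."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    syy = sum((y - mean_y) ** 2 for _, y in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)
    return slope, intercept, r

# Hypothetical data (not from the video): an upward trend...
trend = [(10, 1.2), (20, 1.8), (30, 3.1), (40, 4.0),
         (50, 4.9), (60, 6.2), (70, 6.8), (80, 8.1)]
# ...plus the outlier named in the problem.
outlier = (95, 1)

slope_with, _, r_with = fit_line(trend + [outlier])
slope_without, _, r_without = fit_line(trend)

print(f"with outlier:    slope={slope_with:.3f}  r={r_with:.3f}  r^2={r_with ** 2:.3f}")
print(f"without outlier: slope={slope_without:.3f}  r={r_without:.3f}  r^2={r_without ** 2:.3f}")
```

With these made-up values, R stays between zero and one in both cases, and both R and the slope go up once the outlier is removed, which is the behavior described above.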

So let's see which choices apply. The coefficient of determination R squared would increase? Well, R is positive here, so if R increases, its square increases as well, so I will circle that. The correlation coefficient R would get closer to zero? No, in fact, it would get closer to one because we would have a better fit here, so I will rule that out.

The slope of the least squares regression line would increase? Yes, indeed, this outlier is pulling it down. If you take it out, it'll allow the slope to increase, so I will circle that as well.

Let's do another example. The scatter plot below displays a set of bivariate data along with its least squares regression line. Same idea: consider removing the outlier at (10, 8), that point right there, and calculating a new least squares regression line. So what would happen this time?

As is, without removing this outlier, we have a negative slope for the regression line, so we already know that R is between negative one and zero: -1 < R < 0. Even without removing the outlier, we know it's not going to be negative one. If R were exactly negative one, we would have a downward-sloping line that went exactly through all of the points.

But if we remove this point, what's going to happen? Well, this least squares regression line is being pulled down here by this outlier. So if you were to remove this point, the line could move up on the left-hand side, and you'd probably have a line that looks more like that. I'm just hand-drawing it, but even what I hand drew looks like a better fit for the leftover points.

So clearly, the new line that I drew after removing the outlier has a more negative slope. So removing the outlier would decrease R. R would get closer to negative one, closer to being a perfect negative correlation, and it would also decrease the slope.

Which choices match that? The coefficient of determination R squared would decrease? Let's be very careful. R was already negative; if we decrease it, it's going to become more negative. If you square something that is more negative, the result is not going to become smaller; it becomes larger.

Let's say before you remove the data point R was, and I'm just going to make up a value, -0.4, and then after removing the outlier, R becomes more negative and is equal to -0.5. Well, if you square the first, you get positive 0.16, while the second gives positive 0.25. So if R is already negative and you make it more negative, R squared does not decrease; it actually increases, so I will rule this one out.
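The arithmetic is quick to check. This short Python sketch takes the made-up values to be -0.4 before and -0.5 after removing the outlier; they are illustrative only, not measured from the plot.

```python
r_before = -0.4  # made-up R with the outlier included
r_after = -0.5   # made-up, more negative R after removing it

r2_before = r_before ** 2
r2_after = r_after ** 2

print(f"before: R = {r_before}, R^2 = {r2_before:.2f}")
print(f"after:  R = {r_after}, R^2 = {r2_after:.2f}")
print("R decreased:", r_after < r_before)
print("R^2 increased:", r2_after > r2_before)
```

Squaring discards the sign, so R squared depends only on how far R is from zero. Moving R from -0.4 to -0.5 lowers R but raises R squared.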

The slope of the least squares regression line would increase? Nope, it's going to decrease. It's going to be a stronger negative correlation; rule that one out. The Y intercept of the least squares regression line would increase? Yes. By getting rid of this outlier, you could think of it as the left side of this line going up.

Another way to think about it is that the slope of this line is going to decrease; it's going to become more negative. We know that the least squares regression line always goes through the point made up of the means of both variables, so the line is going to pivot roughly around that point, which means the Y intercept will go higher. So I will fill that in.
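Here is a minimal Python sketch of this second example. As before, the video does not list the plotted coordinates, so the downward-trending points are made up; only the outlier at (10, 8) comes from the problem. The helper name `fit_line` is an illustrative choice.

```python
import math

def fit_line(points):
    """Return (slope, intercept, r, mean_x, mean_y) for the least squares line."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    syy = sum((y - mean_y) ** 2 for _, y in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    # intercept = mean_y - slope * mean_x is exactly the statement that
    # the fitted line passes through (mean_x, mean_y).
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)
    return slope, intercept, r, mean_x, mean_y

# Hypothetical data (not from the video): a downward trend...
trend = [(20, 30), (30, 27), (40, 25), (50, 21),
         (60, 18), (70, 16), (80, 12)]
# ...plus the outlier named in the problem, low on the left-hand side.
outlier = (10, 8)

slope_with, b_with, r_with, mx_with, my_with = fit_line(trend + [outlier])
slope_without, b_without, r_without, mx_without, my_without = fit_line(trend)

print(f"with outlier:    slope={slope_with:.3f}  intercept={b_with:.2f}  r={r_with:.3f}")
print(f"without outlier: slope={slope_without:.3f}  intercept={b_without:.2f}  r={r_without:.3f}")

# Each fitted line evaluated at its own mean of x gives its own mean of y.
print(b_with + slope_with * mx_with, my_with)
print(b_without + slope_without * mx_without, my_without)
```

With these made-up values, removing the outlier makes the slope and R more negative, raises R squared, and raises the Y intercept. Note that the means themselves shift a little when a point is dropped, so the "pivot" is an approximation rather than a rotation around one fixed point.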
