Standard deviation of residuals or root-mean-square error (RMSE, also called RMSD)


Nov 11, 2024

What we're going to do in this video is calculate a typical measure of how well the actual data points agree with a model—in this case, a linear model. There are several names for it; we could consider this to be the standard deviation of the residuals, and that's essentially what we're going to calculate. You could also call it the root mean square error, and you'll see why it's called this because this really describes how we calculate it.

So what we're going to do is look at the residuals for each of these points, and then we're going to find the standard deviation of them. Just as a bit of review, the ith residual is going to be equal to the actual y-value for a given x minus the predicted y-value for that x. Now, when I say y-hat right over here, this just says what the linear regression would predict for a given x, and this is the actual y for a given x.

So for example—and we've done this in other videos—this is all review. The residual here, when x is equal to 1, we have y equal to 1, but what was predicted by the model is 2.5 times 1 minus 2, which is 0.5. So, 1 minus 0.5; this residual here is equal to 1 minus 0.5, which is equal to 0.5. It's a positive 0.5, and if the actual point is above the model, you're going to have a positive residual.

Now, the residual over here, you also have the actual point being higher than the model, so this is also going to be a positive residual. Once again, when x is equal to 3, the actual y is 6. The predicted y is 2.5 times 3, which is 7.5, minus 2, which is 5.5. So you have 6 minus 5.5; here, I'll write that the residual is equal to 6 minus 5.5, which is equal to 0.5. So once again, you have a positive residual.

Now for this point that sits right on the model, the actual is the predicted. When x is 2, the actual is 3, and what was predicted by the model is 3. So the residual here is equal to the actual, which is 3, minus the predicted, which is 3, so it's equal to zero. Last but not least, you have this data point where the residual is going to be the actual, which is 2 when x is equal to 2, minus the predicted.

Well, when x is equal to 2, you have 2.5 times 2, which is equal to 5, minus 2, which is equal to 3. So 2 minus 3 is equal to negative 1. When your actual is below your regression line, you're going to have a negative residual, so this is going to be negative 1 right over there.
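The four residuals worked out above can be reproduced with a short sketch. The data points and the line 2.5x minus 2 come from the transcript; the names `points`, `predict`, and `residuals` are just illustrative choices.

```python
# Data points described in the transcript, as (x, actual y) pairs.
points = [(1, 1), (2, 2), (2, 3), (3, 6)]


def predict(x):
    """Linear model used in the transcript: y-hat = 2.5 * x - 2."""
    return 2.5 * x - 2


# Residual = actual y minus predicted y, one per data point.
residuals = [y - predict(x) for x, y in points]

for (x, y), r in zip(points, residuals):
    print(f"x={x}, actual={y}, predicted={predict(x)}, residual={r}")

print(residuals)  # [0.5, -1.0, 0.0, 0.5]
```

Points above the line give positive residuals, the point on the line gives zero, and the point below the line gives a negative residual.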

Now we can calculate the standard deviation of the residuals. We're going to take this first residual, which is 0.5, and we're going to square it. We're going to add it to the square of the second residual right over here, which is zero. Then we have this third residual, which is negative 1, so plus negative 1 squared. Finally, we have that fourth residual, which is 0.5, so plus 0.5 squared.

So once again, we took each of the residuals—which you could view as the distance between the points and what the model would predict—we are squaring them. When you take a typical standard deviation, you're taking the distance between a point and the mean. Here, we're taking the distance between a point and what the model would have predicted, but we're squaring each of those residuals and adding them all up together.

Just like we do with the sample standard deviation, we are now going to divide by one less than the number of residuals we just squared and added. We have four residuals; we're gonna divide by four minus one, which is equal to, of course, three. You could view this part as a mean of the squared errors, and now we're going to take the square root of it.
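Written out as a formula, the calculation described so far looks roughly like the following. The symbols $e_i$, $s_{\text{res}}$, and $n$ are notation assumed here for illustration, since the transcript only describes the steps in words; the $n - 1$ denominator follows the transcript's choice.

```latex
e_i = y_i - \hat{y}_i, \qquad
s_{\text{res}} = \sqrt{\frac{\sum_{i=1}^{n} e_i^{2}}{n - 1}}
```

In this example $n = 4$, so the denominator is 3.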

So let's see, this is going to be equal to the square root of 0.25, plus 0, plus positive 1, plus 0.25 from this last 0.5 squared, all of that over 3. Now, this numerator is going to be 1.5, so we have 1.5 over 3. And 1.5 is exactly half of three, so we could say this is equal to the square root of one half.

This is one over the square root of two, which gets us, if we round to the nearest thousandth, roughly 0.707. If you wanted to visualize that, one standard deviation of the residuals below the line would look like this, and one standard deviation of the residuals above the line, for any given x value, would look something like that. This is obviously just a hand-drawn approximation.
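As a quick check of that arithmetic, here is a minimal sketch that squares the four residuals, divides by one less than their count as the transcript does, and takes the square root. The function name `residual_std` and its `ddof` parameter are hypothetical, introduced only for this example.

```python
import math

# The four residuals computed in the transcript.
residuals = [0.5, 0.0, -1.0, 0.5]


def residual_std(residuals, ddof=1):
    """Square root of the sum of squared residuals over (n - ddof).

    ddof=1 matches the transcript, which divides by n - 1.
    """
    n = len(residuals)
    sum_of_squares = sum(r ** 2 for r in residuals)
    return math.sqrt(sum_of_squares / (n - ddof))


rmsd = residual_std(residuals)
print(round(rmsd, 3))  # 0.707
```

The sum of squares is 1.5, dividing by 3 gives 0.5, and the square root of 0.5 is the 0.707 quoted above.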

But you do see that this does seem to be roughly indicative of the typical residual. Now, it's worth noting that sometimes people will say it's the average residual, and it depends on how you think about the word average, because we are squaring the residuals. So outliers, things that are really far from the line, are going to have a disproportionate impact once you square them.

If you didn't want to have that behavior, we could have done something like find the mean of the absolute residuals. That actually, in some ways, would have been a simpler one, but this is a standard way of people trying to figure out how much a model disagrees with the actual data. You can imagine the lower this number is, the better the fit of the model.
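To see how squaring lets an outlier dominate, the sketch below compares a root-mean-square of the residuals with the mean of the absolute residuals. Two assumptions to flag: the extra residual of 5.0 is made up for illustration and is not in the transcript's data, and both measures divide by n here (rather than the transcript's n minus 1) so that they are compared on the same footing.

```python
import math


def mean_abs(residuals):
    """Mean of the absolute residuals."""
    return sum(abs(r) for r in residuals) / len(residuals)


def root_mean_square(residuals):
    """Square root of the mean squared residual (divides by n, an assumption)."""
    return math.sqrt(sum(r * r for r in residuals) / len(residuals))


residuals = [0.5, 0.0, -1.0, 0.5]   # from the transcript
with_outlier = residuals + [5.0]    # hypothetical outlier, not in the source

for name, data in [("original", residuals), ("with outlier", with_outlier)]:
    print(name, round(mean_abs(data), 3), round(root_mean_square(data), 3))
```

Adding the one far-off point raises both measures, but the squared version rises by a larger factor, which is the disproportionate impact described above.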
