
Standard deviation of residuals or root mean square deviation (RMSD) | AP Statistics | Khan Academy


Nov 11, 2024

So we are interested in studying the relationship between the amount that folks study for a test and their score on a test, where the score is between zero and six.

What we're going to do is look at the people who took the test. We're going to plot, for each person, the amount that they studied and their score. For example, this data point is someone who studied an hour, and they got a one on the test. Then we're going to fit a regression line, and this blue regression line is the actual regression line for these four data points. Here is the equation for that regression line.
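As a quick check on that equation, here is a minimal sketch of an ordinary least-squares fit in plain Python. It assumes the four data points are the ones read off the plot later in this transcript, (1, 1), (2, 2), (2, 3) and (3, 6); the function name `fit_line` is just an illustrative choice, not something from the video.

```python
# Four (hours studied, test score) points, taken from the worked example
# later in this transcript.
points = [(1, 1), (2, 2), (2, 3), (3, 6)]

def fit_line(points):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    # Sum of cross-deviations and sum of squared x-deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

slope, intercept = fit_line(points)
print(slope, intercept)  # 2.5 -2.0
```

The fitted slope and intercept agree with the line used in the rest of the example, y hat equals 2.5x minus 2.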

Now, there are a couple of things to keep in mind. Normally, when you're doing this type of analysis, you do it with far more than four data points. The reason why I kept this to four is because we're actually going to calculate how good a fit this regression line is by hand, and typically you would not do it by hand; we have computers for that.

The way that we're going to measure how good a fit this regression line is to the data has several names. One name is the standard deviation of the residuals; another name is the root mean square deviation, sometimes abbreviated as RMSD. Sometimes it's called root mean square error.

So what we're going to do is, for every point, we're going to calculate the residual. Then we're going to square each one and add up those squared residuals. We're going to take that sum of the squared residuals and divide it by the number of data points we have minus two. We can talk in future videos or a more advanced statistics class about why you divide by n minus 2, but it's related to the idea that what we're calculating here is a statistic, and we're trying to estimate a true parameter as best as possible.

N minus 2 actually does the trick for us. To calculate the root mean square deviation, we would then take the square root of this. Some of you might recognize strong parallels between this and how we calculated sample standard deviation early in our statistics career, and I encourage you to think about it.
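Written out as a formula, the recipe just described looks roughly like this. The symbols are assumptions chosen for illustration: y_i is an observed score, y-hat_i is the regression line's estimate for that point, y-bar is the mean score, and n is the number of data points. The sample standard deviation is shown next to it to make the parallel visible.

```latex
\[
\text{RMSD} = \sqrt{\frac{\sum_{i=1}^{n} \left(y_i - \hat{y}_i\right)^2}{n - 2}}
\qquad \text{compared with} \qquad
s_y = \sqrt{\frac{\sum_{i=1}^{n} \left(y_i - \bar{y}\right)^2}{n - 1}}
\]
```

The two have the same shape: square the deviations, add them up, divide by a count that is slightly less than n, and take the square root. The difference is what each deviation is measured from (the regression line versus the mean) and the divisor (n minus 2 versus n minus 1).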

But let's actually calculate it by hand, as I mentioned earlier in this video, to see how things actually play out. To do that, I'm going to set up a little table here. Let's say that is our x value in that column. Let's make this our y value. Let's make this y hat, which is going to be equal to 2.5x minus 2.

Then let's make this the residual squared, which is going to be our y value minus our y hat value (our actual minus our estimate for that given x), all squared. Then we're going to sum them all up, divide by n minus 2, and take the square root.

So first, let's do this data point: that's the point (1, 1). Now, what is the estimate from our regression line? For that x value, when x is equal to 1, it's going to be 2.5 times 1 minus 2, which is equal to 0.5.

Our residual squared is going to be 1 minus 0.5, squared, which is 0.5 squared, which is going to be 0.25. All right, let's do the next data point. We have this one right over here; it is (2, 2). Now our estimate from the regression line when x equals 2 is going to be equal to 2.5 times our x value (which is 2) minus 2, which is going to be equal to 3.

So our residual squared is going to be 2 minus 3, then squared. This is negative 1 squared, which is going to be equal to 1. Then we can go to this point; that's the point (2, 3). Now, our estimate from our regression line is going to be 2.5 times our x value (which is 2) minus 2, which is going to be equal to 3.

So our residual here is going to be zero, and you can see that that point sits on the regression line. It's going to be 3 minus 3, squared, which is equal to 0. Then, last but not least, we have this point right over here: this person studied 3 hours, and they got a 6 on the test. So x is equal to 3 and y is equal to 6.

Our estimate from the regression line is going to be 2.5 times our x value (which is 3) minus 2, which is equal to 5.5. Our residual squared is going to be 6 minus 5.5, squared, which is 0.5 squared, which is 0.25.
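The table filled in above can be reproduced with a short Python sketch. The helper name `predict` and the `rows` layout are illustrative assumptions; the data points and the line 2.5x minus 2 come from the transcript.

```python
# The four (x, y) points from the worked example.
points = [(1, 1), (2, 2), (2, 3), (3, 6)]

def predict(x):
    """Estimate from the regression line y_hat = 2.5x - 2."""
    return 2.5 * x - 2

# One row per data point: x, y, y_hat, and the squared residual.
rows = []
for x, y in points:
    y_hat = predict(x)
    residual = y - y_hat  # actual minus estimate
    rows.append((x, y, y_hat, residual ** 2))

for x, y, y_hat, squared_residual in rows:
    print(x, y, y_hat, squared_residual)
```

Each printed row corresponds to one line of the hand-built table: the squared residuals come out as 0.25, 1.0, 0.0 and 0.25.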

Now, the next step: let me take the sum of all of these squared residuals. If I just sum all of this up (0.25 plus 1 plus 0 plus 0.25), the sum of the residuals squared is equal to 1.5.

Then I divide that by n minus 2. I have four data points, so I'm going to divide by 4 minus 2, which is 2, and then I'm going to want to take the square root of that.

This is going to get us 1.5 over 2, which is the same thing as 3/4. So it's the square root of three-fourths, which is the same as the square root of 3, divided by 2. You could use a calculator to figure out what that is as a decimal; it comes out to roughly 0.87.
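For the calculator step, here is a minimal Python sketch of the whole computation, assuming the same four points and the line 2.5x minus 2. The function name `rmsd` and its parameters are hypothetical names chosen for this illustration.

```python
import math

# The four (x, y) points from the worked example.
points = [(1, 1), (2, 2), (2, 3), (3, 6)]

def rmsd(points, slope, intercept):
    """Standard deviation of the residuals, dividing by n - 2."""
    n = len(points)
    if n <= 2:
        raise ValueError("need more than two points to divide by n - 2")
    # Sum of squared residuals: (actual - estimate)^2 for every point.
    sum_squared = sum((y - (slope * x + intercept)) ** 2 for x, y in points)
    return math.sqrt(sum_squared / (n - 2))

value = rmsd(points, slope=2.5, intercept=-2)
print(round(value, 3))  # 0.866
```

The result matches the square root of 3 divided by 2 from the hand calculation.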

But this gives us a sense of how good a fit this regression line is. The closer this is to zero, the better the fit of the regression line; the further away from zero, the worse the fit. What would be the units for the root mean square deviation?

Well, it would be in terms of whatever your units are for your y-axis. In this case, it would be the score on the test, and that's one of the other benefits of this calculation of taking the square root of the sum of the squares of the residuals divided by n minus 2.

So, big picture: this square root of 3 over 2 can be viewed as the approximate size of a typical or average prediction error between these points and what the regression line would have predicted. Or you could view it as the approximate size of a typical or average residual.
