How parameters change as data is shifted and scaled | AP Statistics | Khan Academy
So I have some data here in a spreadsheet. You could use Microsoft Excel or you could use Google spreadsheets, and we're going to use the spreadsheet to quickly calculate some parameters. Let's say we're looking at a population of students, and this is their ages, and we want to calculate some parameters on that.
So first, I'm going to calculate it using the spreadsheet, and then we're going to think about how those parameters change as we do things to the data. If we were to shift the data up or down, or if we were to multiply all the points by some value, what does that do to the actual parameters? The first parameter I'm going to calculate is the mean. Then I'm going to calculate the standard deviation. Then I want to calculate the median, and then I want to calculate, let's say, the interquartile range—I'll call it IQR.
So let's do this. Let's first look at the measures of central tendency. So the mean: the function on most spreadsheets is the average function. Then I could use my mouse and select all of these, or I could hold shift and use my arrow keys to select all of those. Okay, that's the mean of that data. Now let's think about what happens if I take all of that data and add a fixed amount to it.
So if I took all the data and if I were to add five to it, an easy way to do that in a spreadsheet is you select that, you add five, and then I can scroll down and notice for every data point I had before, I now have five more than that. So this is my new data set, which I'm calling data plus 5, and let's see what the mean of that is.
So the mean of that, notice, is exactly five more, and the same would have been true if I added or subtracted any number. The mean would change by the amount that I add or subtract, and that shouldn't surprise you because when you're calculating the mean, you're adding all the numbers up and you're dividing by how many numbers you have. So if all the numbers are five more, you're going to add five for each one of them. How many numbers are there? 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12. You're going to add 12 more fives, and then you're going to divide by 12.
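The arithmetic just described can be written out as a short derivation. The symbols here are assumed notation, not from the video: x_1 through x_12 stand for the 12 ages and \mu for their mean.

```latex
\frac{(x_1 + 5) + (x_2 + 5) + \cdots + (x_{12} + 5)}{12}
  = \frac{x_1 + x_2 + \cdots + x_{12}}{12} + \frac{12 \cdot 5}{12}
  = \mu + 5
```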
So it makes sense that your mean goes up by five. Let's think about how the mean changes if you multiply. If you take your data and multiply it by five, what happens? So this equals this times 5. So now all the data points are five times as large. Now what happens to my mean? Notice my mean is now five times as much.
So for this measure of central tendency, if I add or subtract a fixed amount, the mean goes up or down by that amount. If I scale the data up by five, or if I scaled it down by five, my mean would scale up or down by that same factor. If you looked numerically at how you calculate a mean, it would make sense that this is happening mathematically.
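If you'd rather check this outside a spreadsheet, here is a minimal Python sketch. The 12 ages are hypothetical, since the transcript doesn't list the actual data. The names `ages`, `shifted`, and `scaled` are just illustrative.

```python
from statistics import mean

# Hypothetical ages for a population of 12 students (not the video's data).
ages = [2, 3, 4, 4, 5, 5, 6, 7, 8, 9, 9, 10]

shifted = [x + 5 for x in ages]  # "data plus 5"
scaled = [x * 5 for x in ages]   # "data times 5"

# Shifting every point by 5 shifts the mean by 5.
print(mean(shifted) - mean(ages))  # 5

# Scaling every point by 5 scales the mean by 5.
print(mean(scaled) / mean(ages))   # 5.0
```

Any other shift or scale factor could be swapped in for the 5 and the same relationship should hold.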
Now, let's look at the other measure, the other typical measure of central tendency, and that is the median, to see if that has the same properties. So let's calculate the median here. So once again, you want to order these numbers and just find the middle number, which isn't too hard, but a computer can do it awfully fast. So that's the median for that data set.
What do you think the median is going to be if you take all of the data plus five? Well, if you ordered all of these numbers and made them all five more, they would stay in the same order, but now the one in the middle is going to be five more. So this should be 10.5, and yes, it is indeed 10.5.
What would happen if you multiply everything by five? Well, once again, you still have the same ordering, and so the median should just be multiplied by five. Yep, the middle number is now five times larger. So for both of these measures of central tendency, if you shift all the data points or if you scale them up, you're going to similarly shift or scale up the measure.
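The same check for the median can be sketched in Python. The data below is hypothetical. It was chosen only so that its median is 5.5, which is consistent with the 10.5 reported for the shifted data, and the individual values are not from the video.

```python
from statistics import median

# Hypothetical ages for 12 students; only the median (5.5) is meant to
# line up with the transcript.
ages = [2, 3, 4, 4, 5, 5, 6, 7, 8, 9, 9, 10]

shifted = [x + 5 for x in ages]  # ordering is unchanged, each value is 5 more
scaled = [x * 5 for x in ages]   # ordering is unchanged, each value is 5 times as large

print(median(ages))     # 5.5
print(median(shifted))  # 10.5
print(median(scaled))   # 27.5
```

With 12 values there is no single middle number, so `median` averages the two middle ones. That average shifts and scales the same way.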
Now let's think about the measures of spread and see if the same is true for them. So standard deviation, stdev: I'm going to take the population standard deviation, since I'm assuming that this is my entire population. The standard deviation of all of this is going to be 2.99.
Let's see what happens when I shift everything by five. Actually, pause the video! What do you think is going to happen? This is a measure of spread, so I'll tell you what I think. If I shift everything by the same amount, the mean shifts, but the distance of everything from the mean should not change. So the standard deviation should not change, I don't think, in this example, and indeed, it does not change.
So if we shift the data set, whether up by five as we did here or, say, down by one, the standard deviation, our measure of spread, does not change. But if we scale it, I think it should change, because you can imagine a very simple data set where things that were a certain distance from the mean are now five times further from the mean.
So I think the standard deviation should be multiplied by five here, and it does look like that is the case. So scaling the data set will scale the standard deviation in a similar way. What about the interquartile range, where essentially we're taking the third quartile and subtracting the first quartile from it to figure out the range of the middle 50%?
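Here is a small Python sketch of the standard deviation behavior, again on hypothetical data, so the value will not be the 2.99 from the video. It uses `statistics.pstdev`, the population standard deviation, to match the assumption that the data is the whole population.

```python
import math
from statistics import pstdev

# Hypothetical ages for 12 students (not the video's data).
ages = [2, 3, 4, 4, 5, 5, 6, 7, 8, 9, 9, 10]

shifted = [x + 5 for x in ages]
scaled = [x * 5 for x in ages]

sd = pstdev(ages)

# Shifting leaves every distance from the mean unchanged.
print(math.isclose(pstdev(shifted), sd))     # True

# Scaling by 5 makes every distance from the mean 5 times as large.
print(math.isclose(pstdev(scaled), 5 * sd))  # True
```

The comparisons use `math.isclose` rather than `==` because the results are floating-point numbers.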
Let's do that. We can use the quartile function: equals quartile, and then we want to look at our data, and we want the third quartile. So that's going to calculate the third quartile. Then minus quartile of the same data set, so we select it again, but this is now going to be the first quartile. So this is going to give us our interquartile range. This calculates the third quartile on that data set, and this calculates the first quartile on that data set, and we get 2.75.
Now let's think about whether the interquartile range should change. I don't think it will because remember, everything shifts. Even though the first quartile is going to be five more, the third quartile is going to be five more as well, so the difference shouldn't change. And indeed, look, the difference does not change.
But similarly, if we scale everything up, if we were to scale up the first quartile and the third quartile by five, well then their difference should scale up by five, and we see that right over there. So the big takeaway here, and I just used the example of shifting up by five and scaling up by five, but you could subtract any number, and you could divide by a number as well.
Your typical measures of central tendency—mean and median—they both shift and scale as you shift and scale the data. But your typical measures of spread—standard deviation and interquartile range—they don't change if you shift the data, but they do change, and they scale as you scale the data.
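The interquartile range part of that takeaway can also be sketched in Python. The data is hypothetical, and `iqr` is a helper defined here for illustration. It uses the `inclusive` method of `statistics.quantiles`, which is one common quartile convention. A spreadsheet may interpolate differently, so exact values can differ, but the shift and scale behavior should be the same.

```python
import math
from statistics import quantiles

def iqr(data):
    """Third quartile minus first quartile (illustrative helper)."""
    q1, _, q3 = quantiles(data, n=4, method="inclusive")
    return q3 - q1

# Hypothetical ages for 12 students (not the video's data).
ages = [2, 3, 4, 4, 5, 5, 6, 7, 8, 9, 9, 10]

shifted = [x + 5 for x in ages]
scaled = [x * 5 for x in ages]

print(iqr(ages))     # 4.25
print(iqr(shifted))  # 4.25  (both quartiles move up by 5, so the difference is unchanged)
print(iqr(scaled))   # 21.25 (both quartiles are 5 times as large, so is the difference)
```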