Example: Describing a distribution | AP Statistics | Khan Academy
Sometimes in life, say on an exam, especially on something like an AP exam, you're asked to describe or compare a distribution. What we're going to do in this video is exactly that. In fact, in this one we're going to describe a distribution, and then in a future video we're going to compare distributions.
Now, before we even read about this distribution or look at it, if you're asked to describe a distribution, there are four things that you should be thinking about. First, you should be thinking about the shape of the distribution. When we're talking about shape, there could be left skew, there could be right skew, or the distribution could be symmetric. We'll see examples of these, and we've talked about them in detail in other videos. These are the shapes that we typically see, although there might be other types.
Second, you have the center of the distribution, and there are multiple ways of thinking about center. We've talked about this before: you have your mean and you have your median, and these are the two most typical ones. Third, you have a notion of spread. For spread, you could use the range, you could use the interquartile range, you could use something like the mean absolute deviation, or you could use the standard deviation. These are all measures of spread. Fourth, you probably should at least comment on outliers. Even if you don't see any, it's a good idea to say so, just to make sure that you are being relatively comprehensive.
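To make those measures of center and spread concrete, here is a minimal Python sketch that computes each of them with the standard library. The function name `describe` and the `sample` values are made up for illustration and have nothing to do with the DMV data. Two choices in it are assumptions: it uses the population standard deviation (`statistics.pstdev`), and it takes quartiles from `statistics.quantiles` with its default method, which may differ slightly from the quartile convention your course uses.

```python
import statistics


def describe(values):
    """Return common measures of center and spread for a list of numbers."""
    mean = statistics.mean(values)
    # Quartile cut points (Q1, Q2, Q3); the convention is the library default.
    q1, _, q3 = statistics.quantiles(values, n=4)
    return {
        "mean": mean,
        "median": statistics.median(values),
        "range": max(values) - min(values),
        "iqr": q3 - q1,
        # Mean absolute deviation: average distance from the mean.
        "mad": sum(abs(x - mean) for x in values) / len(values),
        # Population standard deviation (an assumption; use stdev for a sample).
        "stdev": statistics.pstdev(values),
    }


sample = [2, 4, 4, 5, 7, 9, 11]  # made-up values, for illustration only
summary = describe(sample)
for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```

Which measure you report is a judgment call: the median and interquartile range are less affected by a long tail or a few extreme values than the mean and standard deviation are.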
So now, given that, let's describe the distribution right over here. It says, "In the state of Connecticut, the Department of Motor Vehicles (the DMV) requires 16- and 17-year-olds to take a 25-question knowledge test in order to obtain a learner's permit. To pass, prospective drivers must correctly answer at least 20 questions. On one Monday, 22 teenagers took the test. The dot plot below shows their scores."
So why don't you pause this video and see if you can take a shot at describing the shape, the center, the spread, and the outliers? For some of these you might be able to come up with actual numbers by calculating them, but really the goal is just to get a sense of the distribution, so take a shot at it.
All right, now let's do this together. So first, the shape: what we see is that most of the distribution is in this part between 20 and 25, but then we have this fairly long tail to the left. This tells us that we have a left skew, or that it is a left-skewed distribution, because the tail goes to the left. So we have done the shape.
Now, what about the center of this distribution? There are a few ways to measure center, such as the mean or the median. Just for the sake of simplicity, I'll think about the median here. You could also calculate the mean, but it would take a little bit more time. I can eyeball the median to some degree: without even calculating it, I would guess that it's someplace in this range right over there.
But let me actually calculate it. There are 22 data points, so the median is the middle of the ordered data: it has 11 data points on one side of it and 11 on the other, half of 22 each, which puts it halfway between the 11th and 12th values. So let's see. We have 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11. So the median here is going to be, let's see, this is 23. We have a bunch of 23s: 1, 2, 3, 4, 5, 6. Six 23s.
If we were to just order all of the data points, 11 of the data points would be 23 or less and then 11 would be 23 or more. So I could say our center is 23 if we use the median. Actually, let me write that down. So our median is 23. That's the measure of center that I decided to use.
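The counting argument above can be sketched in Python. With an even number of data points, the median is the average of the two middle values in sorted order. The `scores` list below is hypothetical: the actual dot plot isn't reproduced in this transcript, so these 22 values were made up to be consistent with what is stated here (lowest score 4, highest 25, six 23s, median 23) and should not be read as the real data.

```python
def median(values):
    """Median of a list: the middle value, or the mean of the two middle values."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    # Even count: average the two values on either side of the middle.
    return (ordered[mid - 1] + ordered[mid]) / 2


# Hypothetical scores, NOT read from the dot plot in the video.
scores = [4, 8, 11, 13, 19, 20, 21, 21, 22, 22,
          23, 23, 23, 23, 23, 23,
          24, 24, 24, 25, 25, 25]

print(len(scores))     # 22 data points
print(median(scores))  # 23.0, since the 11th and 12th ordered values are both 23
```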
Now, what about spread? Well, the simplest measure of spread is just the range, which is the highest value minus the lowest value. So our range here would be 25 minus 4, and 25 minus 4 is equal to 21. So that is one measure of spread. You could use others, but this one is very easy to calculate.
Then if we think about outliers, well, there are a few points I would consider outliers, and that's very subjective. People can debate whether, say, a dot right over there is an outlier or not. But I would consider these four right over here to be outliers. So I would say approximately four outliers. But once again, this is subjective.
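If you want something less subjective than eyeballing, one common convention (not the approach used in this video) is the 1.5 × IQR rule: flag any value more than 1.5 interquartile ranges below the first quartile or above the third quartile. Here is a minimal sketch of that rule. It assumes quartiles are taken as the medians of the lower and upper halves of the sorted data, and it reuses a hypothetical `scores` list that was made up to match the numbers stated in this transcript, not read from the actual dot plot.

```python
def median(ordered):
    """Median of an already-sorted list."""
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def iqr_outliers(values):
    """Return values outside the 1.5 * IQR fences (one convention among several)."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    upper = ordered[half + (n % 2):]  # skip the middle value when n is odd
    q1, q3 = median(lower), median(upper)
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    return [x for x in ordered if x < low_fence or x > high_fence]


# Hypothetical scores, NOT read from the dot plot in the video.
scores = [4, 8, 11, 13, 19, 20, 21, 21, 22, 22,
          23, 23, 23, 23, 23, 23,
          24, 24, 24, 25, 25, 25]

flagged = iqr_outliers(scores)
print(flagged)  # [4, 8, 11, 13] for this made-up list
```

With real data the rule might flag more or fewer points than you would pick by eye, which is fine. The point is to state which rule you used.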
The main point of this exercise is just to get in the habit of thinking about these things. Statistics is all about creating, or engineering, one could say, different measurements for center and for spread, and different ways to describe the shape. But the point is to think about each of these dimensions whenever you describe a distribution.