Significance test for a proportion free response example | AP Statistics | Khan Academy
We're told that some boxes of a certain brand of breakfast cereal include a voucher for a free video rental inside the box. The company that makes the cereal claims that a voucher can be found in 20% of boxes. However, based on their experiences eating the cereal at home, a group of students believes that the proportion of boxes with vouchers is less than 20%. This group of students purchased 65 boxes of the cereal to investigate the company's claim. The students found a total of 11 vouchers for free video rentals in the 65 boxes.
Suppose it is reasonable to assume that the 65 boxes purchased by the students are a random sample of all boxes of this cereal. Based on this sample, is there support for the students' belief that the proportion of boxes with vouchers is less than 20 percent? Provide statistical evidence to support your answer. And, like always, pause this video and see if you can answer it by yourself. This actually is a question from an AP Statistics exam.
All right, now let's work through this together, and I'm going to try to model some of what you might want to do if you're actually trying to answer this on an exam. So, the first thing you might want to say is, well, what's our null and our alternative hypothesis? Well, our null hypothesis would be, well, the reality is what the breakfast brand claims, that 20% of the boxes contain a voucher. So that would be our null hypothesis.
And our alternative hypothesis would be what we suspect: that the true proportion of boxes that contain a voucher is actually less than 20%. Now, if you're going to do a significance test, it's good practice to set up your significance level that you're going to eventually compare your p-value to ahead of time. So let's say we would want to assume a significance level. So let me write this: significance level alpha. Let's just go with 0.05.
Then, we'll want to think about the sample, and we're going to figure out, if we assume that the null hypothesis is true, what's the probability that we get a sample proportion at least as low as the one that we do? If that is below this significance level, then we would reject the null hypothesis. What we know about the sample: we know that we took 65 boxes of cereal, so n is equal to 65. They tell us that right over there.
From that, we can calculate what the sample proportion is. It's going to be 11 out of 65. We can get our calculator out; calculators are allowed on this part of the exam. What is 11 divided by 65? It gives us, and I'll just round to the nearest thousandth, 0.169. I'll say approximately, because I rounded it there.
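If you'd rather check that arithmetic in code than on a calculator, here is a minimal Python sketch. The variable names are just ones I picked for illustration; the numbers come straight from the problem.

```python
# Values from the problem statement
vouchers_found = 11   # vouchers the students found
n = 65                # boxes purchased (the sample size)

# Sample proportion: vouchers found divided by boxes sampled
p_hat = vouchers_found / n

print(round(p_hat, 3))  # 0.169
```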
Now, the next thing we want to do before we make an inference is to make sure we're meeting the conditions for inference. So I'll write this down over here: Conditions for inference. This is to feel good that we are properly sampling the population and that our sampling distribution is going to be roughly normal.
The first one is the random condition, that this is truly a random sample, and here they tell us it is reasonable to assume that the 65 boxes purchased by the students are a random sample. I will just point to that right over there, so that checks that off.
The next one is the normal condition, that the shape is roughly normal and isn't skewed dramatically one way or the other. In order to meet that condition, we look at the sample size times the true assumed proportion. We're going to assume that the null hypothesis is true, and so we could say that this is the proportion assumed in the null hypothesis. That's what that notation would imply.
If you're doing this on the actual test, you should explain your use of notation a little bit more than I might do for the sake of time. But this needs to be greater than or equal to 10, and n times 1 minus the assumed proportion needs to be greater than or equal to 10. Well, let's see: n is 65 and the assumed proportion is 0.2, so 65 times 0.2 is going to be equal to 13.
Thirteen is indeed greater than or equal to 10, so that checks off. Then, we would take n (65) times 1 minus the assumed proportion (0.8), and that is going to be equal to, let's see, that would just be 65 minus 13, which is going to be equal to 52, and that indeed is also greater than or equal to 10. So we've met that condition right over there.
Then the last one is the independence condition. We aren't sampling these boxes with replacement, so we need to feel good that they are less than 10% of the population of boxes. They don't tell us that explicitly, but it would be good practice to say that we are going to assume there are more than, let's see, 10 times 65, or 650, boxes in the population, which would imply that n is less than 10% of the population. This would allow us to check off the independence condition.
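The two conditions we can check numerically, the normal condition and the 10% condition, can be sketched in Python like this. The variable names are my own, and the random condition isn't in here because it comes from the problem statement, not from a calculation.

```python
n = 65      # sample size
p0 = 0.20   # proportion assumed in the null hypothesis

# Normal condition: expected successes and failures are both at least 10
expected_successes = n * p0          # about 13
expected_failures = n * (1 - p0)     # about 52
normal_condition_ok = expected_successes >= 10 and expected_failures >= 10

# 10% condition: the population needs more than 10 * n boxes.
# The problem never states the population size, so this is the assumption we write down.
min_population_size = 10 * n

print(normal_condition_ok)    # True
print(min_population_size)    # 650
```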
So, given that we've met our conditions for inference, now let's think about the sampling distribution. The sampling distribution of the sample proportions, because that's what we're going to use to calculate our p-value. So we know a few things about the sampling distribution of the sample proportions. We know that the mean of the sampling distribution of the sample proportions is just going to be the assumed true proportion, so that's the proportion from the null hypothesis.
We know that the standard deviation of the sampling distribution of the sample proportions is going to be equal to, and we've seen this in multiple videos already, the square root of the assumed proportion times 1 minus the assumed proportion from our null hypothesis, divided by n. In this case, this is going to be equal to the square root of 0.2 times 0.8, all of that over 65.
Once again, let's get our calculator out. We're going to have the square root of (0.2 times 0.8) divided by 65, and then close my parentheses. I get approximately 0.0496.
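Here is the same standard deviation calculation as a short Python sketch, in case you want to verify the calculator result. The names are illustrative; the formula is the one described above, with the square root taken over the whole fraction.

```python
import math

n = 65      # sample size
p0 = 0.20   # proportion assumed in the null hypothesis

# Standard deviation of the sampling distribution of the sample proportion,
# computed under the assumption that the null hypothesis is true
sd_p_hat = math.sqrt(p0 * (1 - p0) / n)

print(round(sd_p_hat, 4))  # 0.0496
```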
Now, the next step is to figure out the p-value, which we can then compare to our significance level to decide whether or not to reject the null hypothesis. In order to calculate the p-value, let's figure out our z-statistic, which is how many standard deviations above or below the mean of the sampling distribution is the sample statistic that we happen to get for this sample of 65. We have seen this in previous videos.
This would be equal to our sample proportion minus the assumed proportion for the population in the null hypothesis, so the difference between those, and then divided by the standard deviation of the sampling distribution of the sample proportions. This would tell us how many standard deviations are we above or below the mean of the sampling distribution.
So, in this particular situation, this is going to be 0.169 minus 0.2, all of that over this value right over here, which is approximately 0.0496. I can get the calculator out again, and so we have 0.169 minus 0.2. That's how far our sample proportion is below the mean of the sampling distribution, which is the assumed proportion from the null hypothesis.
We divide that by the standard deviation of the sampling distribution of the sample proportions, so divide that by 0.0496. We get a z value of approximately—because remember, this is using a bunch of approximations—about negative 0.625. So, z is approximately negative 0.625.
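To see how much the rounding matters, here is a small Python sketch that computes the z-statistic twice: once with the rounded values used above, and once carrying full precision. The variable names are my own choice, not anything from the exam.

```python
import math

n = 65
p0 = 0.20

# With the rounded intermediate values from the walkthrough
z_rounded = (0.169 - 0.2) / 0.0496

# With unrounded values carried all the way through
p_hat = 11 / n
sd_p_hat = math.sqrt(p0 * (1 - p0) / n)
z_exact = (p_hat - p0) / sd_p_hat

print(round(z_rounded, 3))  # -0.625
print(round(z_exact, 3))    # -0.62
```

Either way we are a bit more than 0.6 standard deviations below the mean, so the rounding doesn't change the story here.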
Now we can think about the actual p-value. Our p-value is equal to the probability of getting a sample proportion that is at least as low as the one that we got, so a sample proportion that is less than or equal to the one that we got (0.169), assuming the null hypothesis is true. Thus, we could say: assuming the null hypothesis is true, this is equal to the probability of getting a z statistic that is less than or equal to this value right over here, negative 0.625.
Now, we can use our calculator to actually calculate this. What we can do is go to second distribution. We want to do normal cdf, so go to normalcdf. Our lower bound is actually going to be, we could say, negative infinity. Our upper bound is going to be negative 0.625. The distribution here is, you could say, a normalized normal distribution, with a mean of 0 and a standard deviation of 1.
So we'll just go with that mean and standard deviation, because we're just thinking about the z statistic right over here. Click enter and then click enter. We get approximately 0.266.
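If you don't have a graphing calculator handy, the same lower-tail area can be sketched in Python. This uses `NormalDist` from the standard library's `statistics` module (Python 3.8 and later) as a stand-in for normalcdf; with no arguments it is the standard normal, mean 0 and standard deviation 1.

```python
from statistics import NormalDist

z = -0.625  # z-statistic from the walkthrough (computed from rounded values)

# Area under the standard normal curve to the left of z,
# the counterpart of normalcdf(negative infinity, -0.625)
p_value = NormalDist().cdf(z)

print(round(p_value, 3))  # 0.266
```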
So, this is approximately 0.266. Let's just make sure what we just did. If this right over here is the assumed sampling distribution of the sample proportions where we are assuming that our null hypothesis is true, the mean of our sampling distribution is going to be our assumed proportion. What we're saying is, look, we got a result over here; this is where our p-hat happened to be.
Right over here, what's the probability of getting a result that far below the true proportion or further? So this is what we calculated just now. Now, when you look at this, this is almost a 27% probability. When you compare our p-value, we're going to compare our p-value to our significance level.
We see that our p-value is clearly greater than our significance level. 0.266 is clearly greater than our significance level of 0.05. What we were saying is, if there was less than a 5% chance of getting a sample proportion at least as low as the one that we got, then we would reject the null hypothesis, which would suggest the alternative.
But here, the probability of getting a sample proportion at least as low as the one that we got, assuming that the null hypothesis is true, is almost 27%, and so that's well above our significance level. Therefore, we fail to reject our null hypothesis. From that, we can say there is not enough evidence to suggest our alternative hypothesis.
If you have time, you might want to say there's not enough evidence to suggest that less than 20 percent of the boxes have the free video rental voucher that they talk about in the original problem description.
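To pull the whole procedure together, here is a rough end-to-end sketch in Python. The function name and its parameters are hypothetical ones I made up for illustration, it assumes the conditions for inference have already been checked, and it carries unrounded values through, so the p-value comes out slightly different from the 0.266 we got with rounded inputs.

```python
import math
from statistics import NormalDist


def lower_tail_proportion_z_test(successes, n, p0, alpha=0.05):
    """One-sided (less-than) z-test for a proportion.

    Illustrative helper, not a library function. Returns the z-statistic,
    the p-value, and whether the null hypothesis is rejected at level alpha.
    """
    p_hat = successes / n
    sd_p_hat = math.sqrt(p0 * (1 - p0) / n)   # uses the null proportion
    z = (p_hat - p0) / sd_p_hat
    p_value = NormalDist().cdf(z)             # lower-tail area
    return z, p_value, p_value < alpha


z, p_value, reject_null = lower_tail_proportion_z_test(11, 65, 0.20)
print(round(z, 2), round(p_value, 2), reject_null)
```

With the numbers from this problem, the p-value is roughly 0.27, which is above 0.05, so `reject_null` is False and the conclusion is the same: we fail to reject the null hypothesis.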