Examples identifying conditions for inference on two proportions | AP Statistics | Khan Academy
A sociologist suspects that men are more likely to have received a ticket for speeding than women are. The sociologist wants to sample people and create a two-sample z interval (we introduce that idea in other videos) to estimate the difference between the proportion of men who have received a speeding ticket and the proportion of women who have received a speeding ticket.
Which of the following are conditions for this type of interval? Choose all answers that apply. So, like always, pause this video and see if you can answer it on your own.
All righty, let's review our conditions for inference. So, you have your random condition. These are the same ones that we have talked about when we were dealing with one sample, but now we just have to make sure that they apply to both samples. We want to feel good that both samples are randomly selected.
The second one is the normal condition, and this is to feel good that the sampling distribution of the sample proportion for each of the samples is roughly normal. What you have to do is take the sample size of the first sample times the sample proportion of the first sample, and that needs to be greater than or equal to 10. You take the sample size of the first sample times one minus the sample proportion of the first sample; that should also be greater than or equal to 10.
Another way to think about it is that your best sense of the expected number of successes and of failures should each be greater than or equal to 10. Then, you do this with the second sample. So, the sample size of the second sample (the two sample sizes don't have to be the same) times the sample proportion of the second sample should be greater than or equal to 10 as well. The sample size of the second sample times one minus the sample proportion of the second sample also needs to be greater than or equal to 10.
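In symbols, the four checks just described can be written as follows. The notation is an assumption made here for illustration, since the video only states these aloud: $n_1, n_2$ are the two sample sizes and $\hat{p}_1, \hat{p}_2$ are the two sample proportions.

```latex
\begin{aligned}
n_1 \hat{p}_1 &\ge 10, & n_1 (1 - \hat{p}_1) &\ge 10, \\
n_2 \hat{p}_2 &\ge 10, & n_2 (1 - \hat{p}_2) &\ge 10.
\end{aligned}
```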
All four of these inequalities need to be true. The final one is the independence condition. We meet it if the individuals in each sample are selected with replacement, or, if they are selected without replacement, if each sample is no more than 10% of its population.
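The independence check described above can be sketched in Python. This is a minimal illustration and not part of the video: the function name `independence_condition_met` and the sample and population figures in the usage lines are hypothetical.

```python
def independence_condition_met(sample_size, population_size, with_replacement=False):
    """Return True if one sample meets the independence condition.

    Sampling with replacement always meets it; otherwise the sample
    must be no more than 10% of the population.
    """
    if with_replacement:
        return True
    # Integer comparison avoids floating-point rounding at the 10% boundary.
    return sample_size * 10 <= population_size


# Hypothetical numbers, for illustration only.
print(independence_condition_met(200, 5000))  # True: 200 is 4% of 5000
print(independence_condition_met(200, 1500))  # False: 200 is about 13% of 1500
print(independence_condition_met(200, 1500, with_replacement=True))  # True
```

For a two-sample procedure, the check would be run once per sample, each against its own population.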
Once again, you've seen this before; we're now just doing it with two samples. So, let's see which of the following are conditions for this type of interval.
So, the first choice: the samples both include at least 10 people who have received a speeding ticket and at least 10 people who haven't. Yeah, that's right! You could view the first count as the expected number of people who have received a speeding ticket and the second as the expected number of people who haven't received a speeding ticket (or rather our estimate of the expected number, because we're using the sample proportion instead of the true proportion). So, these need to be greater than or equal to 10 in both samples. This is absolutely true.
The people in each sample can be considered independent. Yeah, we have that independence condition! Either they're sampled with replacement, or we are sampling no more than 10 percent of the population, so this is important.
And then, last but not least, they take separate random samples of men and women. Yeah, that's the random condition right over here. So, they have all three of them right over here: we have our normal condition, our independence condition, and our random condition.
Let's do another example. A biologist is studying a certain disease affecting oak trees in a forest. They are curious if there is a difference in the proportion of trees that are infected in the north and south sections of the forest. They want to take a sample of trees from each section and do a two-sample z-test to test their hypotheses. Which of the following are conditions for this type of test? So, pause the video again and see if you can answer this.
Okay, so we've already reviewed our conditions for inference. So, let's see which of these are the actual conditions for inference.
So, both samples include at least 30 trees. This might have been tempting because this 30 number shows up when we're thinking about conditions for inference when we're dealing with means, but this does not come up when we're dealing with proportions. Both samples do not need to include at least 30 trees. So, this would not be one of our choices.
They sample an equal number of trees from each region of the forest. It is a very common misconception that, when you're doing a two-sample z-test or a two-sample z interval, both samples have to have the same sample size, but that is actually not the case. So, we can rule that one out.
They observe at least 10 trees with the disease and at least 10 trees without the disease in each sample. Yes, this is the normal condition that we just looked at, so this would be our only choice, and we're done.
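To see why the first two choices are not required, here is a small Python sketch of the normal condition applied to made-up tree counts. The function name `normal_condition_met` and all of the counts are hypothetical; the video gives no data for this example.

```python
def normal_condition_met(infected, sample_size, minimum=10):
    """Check one sample: at least `minimum` successes and `minimum` failures.

    With p_hat = infected / sample_size, the products n * p_hat and
    n * (1 - p_hat) are just the observed counts of infected and
    healthy trees, so the counts are compared directly.
    """
    healthy = sample_size - infected
    return infected >= minimum and healthy >= minimum


# Hypothetical samples: unequal sizes, and the north sample has fewer than 30 trees.
north_ok = normal_condition_met(infected=12, sample_size=25)  # 12 infected, 13 healthy
south_ok = normal_condition_met(infected=15, sample_size=60)  # 15 infected, 45 healthy
print(north_ok and south_ok)  # True

# A larger sample can still fail if one count is too small.
print(normal_condition_met(infected=6, sample_size=100))  # False: only 6 infected
```

In this sketch the condition passes with samples of 25 and 60 trees, so neither a minimum of 30 nor equal sample sizes is needed; what matters is the count of infected and healthy trees in each sample.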