Confidence interval for hypothesis test for difference in proportions | AP Statistics | Khan Academy
A university offers a certain course that students can take in person or in an online setting. Teachers of the course were curious if there was a difference in the passing rate between the two settings. Data from a recent semester showed that 80 percent of students passed the in-person setting and 75 percent of students passed the online setting. They were willing to treat these as representative samples of all students who may take each setting of the course.
The teachers used those results to make a 95% confidence interval to estimate the difference between the proportion of students who pass in each setting of the course. So, this is a 95% confidence interval for the difference between the proportion who passed the in-person course and the online course. The resulting interval was approximately from negative 0.04 to 0.14.
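To see where an interval like this could come from, here is a minimal sketch of the usual two-sample z interval for a difference in proportions. The problem gives only the passing rates, so the sample sizes of 160 students per setting are purely hypothetical, chosen because they happen to produce an interval close to the one reported; the function name `diff_proportion_ci` is also just for illustration.

```python
from math import sqrt
from statistics import NormalDist


def diff_proportion_ci(p1_hat, n1, p2_hat, n2, conf_level=0.95):
    """Two-sample z interval for p1 - p2, using the unpooled standard error."""
    # Critical value z* for the requested confidence level (about 1.96 for 95%).
    z_star = NormalDist().inv_cdf(1 - (1 - conf_level) / 2)
    se = sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
    diff = p1_hat - p2_hat
    return diff - z_star * se, diff + z_star * se


# ASSUMPTION: 160 students per setting. The source does not state sample sizes.
lower, upper = diff_proportion_ci(0.80, 160, 0.75, 160)
print(round(lower, 2), round(upper, 2))  # -0.04 0.14
```

With different sample sizes the interval would be wider or narrower, so this only shows that the reported interval is plausible, not how the teachers actually computed it.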
Just to make sure we understand what this is saying: because we're talking about a 95% confidence interval, 95% of the time that you take these samples and construct a confidence interval for the difference in proportions, the interval will actually contain the true difference in proportions. They want to use this interval to test their null hypothesis that the true proportions are the same versus their alternative hypothesis that the true proportions are different.
Assume that all conditions for inference have been met. Based on the interval, what do we know about the corresponding p-value and conclusion to their test? So pause this video and try to figure out on your own.
All right, so what's interesting here is we're going to use a confidence interval to think about a hypothesis test. Remember, in a hypothesis test, we assume that our null hypothesis is true. Here the null hypothesis is that the true proportions are the same, and another way we could write it is that the difference between the in-person and the online true proportions is equal to zero. These are equivalent statements.
In a hypothesis test, we will assume that this is true, and then in a traditional hypothesis test, we set some significance level. So let's say we set that significance level at five percent, and that is a very typical significance level. If the probability of getting a difference in sample proportions at least as extreme as the one we do get is less than five percent, we say, “Hey, that's pretty unlikely.” We're going to reject the null hypothesis, which will suggest the alternative.
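As a rough illustration of that probability, the sketch below computes a two-sided p-value with a two-proportion z test. The counts (128 of 160 passing in person, 120 of 160 passing online) are hypothetical, picked only to match the 80 percent and 75 percent passing rates, and the function name is made up for this example. It uses the pooled standard error, which is the common textbook choice when the null hypothesis says the two proportions are equal.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_p_value(x1, n1, x2, n2):
    """Two-sided p-value for H0: p1 = p2, using the pooled standard error."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    # Under H0 both samples share one proportion, so pool the successes.
    p_pool = (x1 + x2) / (n1 + n2)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    z = (p1_hat - p2_hat) / se
    # Two-sided: count differences this extreme in either direction.
    return 2 * (1 - NormalDist().cdf(abs(z)))


alpha = 0.05
# ASSUMPTION: hypothetical counts; the source reports only the percentages.
p_value = two_proportion_p_value(128, 160, 120, 160)
print(p_value < alpha)  # False, so we would fail to reject H0
```

Because the test pools the proportions while the interval does not, the two standard errors differ slightly, so the match between the interval and the test is approximate rather than exact.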
But here we have something interesting: we have a confidence interval. Suppose the sum of your confidence level and your significance level is equal to 100 percent, and you're doing a two-sided hypothesis test. That is our case: the confidence level is 95% and the significance level is 5%, and our alternative hypothesis isn't just that the in-person proportion is greater than the online one or that it's less than the online one; it's that they are different, so we have a two-sided hypothesis test.
In these situations, you can actually make some inferences about your p-value from your confidence interval. Think about it this way: we are assuming our null hypothesis is true when we do this hypothesis test. So when we construct a 95% confidence interval, we would expect that 95% of confidence intervals would overlap with zero. Where did I get zero from? Remember this is a confidence interval for the difference in proportions, and our null hypothesis is that the true difference in proportions is zero.
So, 95% of the time that we do this, if we assume that the null hypothesis is true, we will overlap with zero. Another way you could think about it is that five percent of confidence intervals would not overlap with zero. So if you are in a situation where you go through this process, you try to construct a 95% confidence interval, and you don't overlap with your assumed difference of the true proportions from your null hypothesis, well, in this situation, your p-value is going to be less than your five percent significance level.
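One way to check this claim is with a small simulation, sketched below. It assumes the null hypothesis is true with a common passing rate of 0.775 and 160 students per setting; both numbers are hypothetical, as is the helper name `interval_contains_zero`. It draws many pairs of samples, builds a 95% interval for each pair, and counts how often the interval overlaps zero.

```python
import random
from math import sqrt


def interval_contains_zero(n, p, z_star=1.96):
    """Draw two samples with the same true proportion p and report whether
    the 95% interval for the difference in sample proportions contains zero."""
    x1 = sum(random.random() < p for _ in range(n))
    x2 = sum(random.random() < p for _ in range(n))
    p1_hat, p2_hat = x1 / n, x2 / n
    se = sqrt(p1_hat * (1 - p1_hat) / n + p2_hat * (1 - p2_hat) / n)
    diff = p1_hat - p2_hat
    return diff - z_star * se <= 0 <= diff + z_star * se


random.seed(0)  # fixed seed so the run is repeatable
trials = 2000
# ASSUMPTION: n = 160 per setting and a shared true passing rate of 0.775.
coverage = sum(interval_contains_zero(160, 0.775) for _ in range(trials)) / trials
print(coverage)  # should land near 0.95
```

The remaining share of intervals, roughly five percent, misses zero even though the null hypothesis is true, which is exactly the five percent significance level.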
So in this situation, you would reject your null hypothesis. In the first situation, where the interval does overlap with zero, your p-value is going to be greater than or equal to your alpha level, and you would fail to reject. So what's the situation here? Well, our interval actually does include the assumed difference in true proportions from the null hypothesis. So that means, assuming the null hypothesis, we are in this first scenario.
This is one of the 95% of confidence intervals where we actually did overlap with the true parameter that we are trying to estimate. In that situation, our p-value is going to be greater than or equal to our alpha, which in this case is five percent, and so we fail to reject the null hypothesis. There isn't evidence to suggest that there is a true difference in passing rates between the in-person and the online settings of the course.
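The decision rule we just walked through can be written as a tiny function. This is only a sketch of the reasoning, with a made-up function name, and it assumes a two-sided test whose significance level and the interval's confidence level add up to 100 percent.

```python
def conclusion_from_interval(lower, upper, null_value=0.0):
    """Read a two-sided test conclusion off a confidence interval.

    Assumes confidence level + significance level = 100 percent.
    """
    if lower <= null_value <= upper:
        # The interval overlaps the value assumed under H0.
        return "p-value >= alpha: fail to reject H0"
    return "p-value < alpha: reject H0"


# The interval from this problem, with the null difference of zero.
print(conclusion_from_interval(-0.04, 0.14))  # p-value >= alpha: fail to reject H0
```

Notice the function never computes the p-value itself; the interval only tells us which side of alpha the p-value falls on.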