Confidence intervals for the difference between two proportions | AP Statistics | Khan Academy

Nov 11, 2024

Let's review calculating confidence intervals for proportions.

So, let's say I have a population and I care about some proportion p. Let's say I care about the proportion of folks that are left-handed. I don't know what that is, and so I take a sample of size n. Then, from that sample, I can calculate a sample proportion, p hat. That's why I put that little hat on top of it; it's a sample proportion that's estimating our true proportion.

Now I want to construct a confidence interval, but before I go down that path, I need to check my conditions for inference and make sure that I meet them. We've done this many times.

So, the first condition for inference is the random condition. I need to feel good that this is truly a random sample from the population. The second one is often known as the normal condition, and that's the condition that, hey, in order to feel like the sampling distribution for the sample proportions is roughly normal, n times our sample proportion should be greater than or equal to 10, and n times 1 minus our sample proportion should be greater than or equal to 10. We've seen that multiple times before.

Then, the third one is the independence condition. There are two ways to meet this: either the individual observations in our sample are drawn with replacement, or, if they are not drawn with replacement, we can still feel pretty confident about independence as long as our sample size is no more than 10 percent of the size of the entire population.
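The two numeric conditions above can be checked mechanically. Here is a minimal sketch, assuming we know the count of successes in the sample; the function name `conditions_met` and its parameters are hypothetical, made up for illustration. Note that n times p hat is just the number of successes, and n times 1 minus p hat is the number of failures. The random condition depends on how the data were collected, so it cannot be checked from the numbers.

```python
def conditions_met(successes, n, population_size=None, with_replacement=False):
    """Hypothetical helper: check the normal and independence conditions.

    The random condition is about how the sample was drawn, so it is
    not checked here.
    """
    failures = n - successes
    # Normal condition: n * p_hat >= 10 and n * (1 - p_hat) >= 10,
    # which is the same as at least 10 successes and 10 failures.
    normal_ok = successes >= 10 and failures >= 10
    # Independence: sampling with replacement, or the 10 percent rule.
    independence_ok = with_replacement or (
        population_size is not None and n <= 0.10 * population_size
    )
    return normal_ok, independence_ok


# Made-up example: 12 left-handers in a sample of 100 from a population of 2000.
print(conditions_met(12, 100, population_size=2000))  # (True, True)
```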

But, let's say that we meet these conditions for inference. What do we do? Well, we set a confidence level for the confidence interval that we're about to construct. Let's say we said it was a 95 percent confidence level. That would mean that 95 percent of the time that we went through this exercise, the confidence interval that we get would actually contain the true population proportion. A 95 percent confidence level is actually a fairly typical one.

From that confidence level, you can calculate a critical value. The way that you do that is you just look it up in a z table. Once again, all of this is review. You would say, "Hey, how many standard deviations above and below the mean of a normal distribution would you need to go in order to capture, say, 95 percent of the distribution?" That 95 percent is your confidence level.
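Instead of a printed z table, the critical value can be looked up with the inverse of the standard normal CDF. This is a small sketch using Python's standard library; the function name `critical_z` is an assumption for illustration.

```python
from statistics import NormalDist


def critical_z(confidence_level):
    """Hypothetical helper: z* that captures the middle `confidence_level`
    of a standard normal distribution."""
    # The area left over is split evenly between the two tails.
    tail = (1 - confidence_level) / 2
    return NormalDist().inv_cdf(1 - tail)


print(round(critical_z(0.95), 2))  # 1.96
```

For a 95 percent confidence level, this gives the familiar value of roughly 1.96 standard deviations.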

Now we're ready to calculate the confidence interval. The confidence interval is going to be equal to our sample proportion plus or minus our critical value times the standard deviation of the sampling distribution of the sample proportion.

Now, there is a way to calculate this exactly if we knew what p is. If we knew what p is, this would be the square root of p times 1 minus p over n. But if we knew what p is, we wouldn't even have to do this business of constructing confidence intervals.

So instead, we estimate this. An estimate of the standard deviation of the sampling distribution, often known as the standard error, uses the sample proportion instead of the true population parameter. So, it is the square root of p hat times 1 minus p hat, all of that over n.

Now, the whole reason why I did this (it is covered in much more detail and much more slowly in other videos) is to see the parallels between this and the situation where we're constructing a two-sample confidence interval, or z interval, for the difference between proportions. What am I talking about?
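Putting the pieces of the one-sample review together, here is a rough sketch of the interval: p hat plus or minus the critical value times the standard error. The function name `one_proportion_interval` and the example counts are made up for illustration, and the sketch assumes the conditions for inference have already been checked.

```python
from math import sqrt
from statistics import NormalDist


def one_proportion_interval(successes, n, confidence_level=0.95):
    """Hypothetical helper: one-sample z interval for a proportion."""
    p_hat = successes / n
    # Critical value from the confidence level.
    z_star = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)
    # Standard error: sqrt(p_hat * (1 - p_hat) / n).
    standard_error = sqrt(p_hat * (1 - p_hat) / n)
    margin = z_star * standard_error
    return p_hat - margin, p_hat + margin


# Made-up data: 12 left-handers in a random sample of 100.
low, high = one_proportion_interval(12, 100)
print(f"({low:.3f}, {high:.3f})")
```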

Well, let's say that you have two different populations. So, this is the first population, and it has some true proportion of folks that, let's say, are left-handed. Let's call that p1. Then, there's another population, with its own true proportion. Let's call that p2. You know, maybe the first population is the freshmen in your high school or college, and maybe the second is the sophomores.

So, there are two different populations, and you want to see if there's a difference between the proportions that are left-handed. What you could do, just like we've done here, is take a sample from each of these populations. Here, we'll call the sample size n1. Then, from that sample, you calculate a sample proportion. Let's call that p hat 1.

From this second population, we do the same thing. This sample size is n2. Notice n1 and n2 do not have to be the same sample size. Thinking that they must match is a common misconception when doing these things. These could be different sample sizes, and then from that second sample, you calculate the sample proportion, p hat 2.

Now, after you do that, you would want to check your conditions for inference. It turns out that the conditions for inference would be exactly the same. Do both of these samples meet the random condition? Do both of these samples meet the normal condition? And do both of these samples meet the independence condition?

If both samples meet these conditions for inference, then we calculate our critical value, and you would do it the exact same way. I'll just write it down again. So first, you need to check all of these. Then you would take your confidence level and, from that, get a critical z.

Then you're ready to say what your confidence interval is going to be. So, your confidence interval for p1 minus p2 (so it's a confidence interval for the difference between these true population proportions) that is going to be equal to the difference between your sample proportions, so p hat 1 minus p hat 2, plus or minus your critical value times the standard deviation of the sampling distribution of the difference between the sample proportions.

We already know how to calculate the first part, p hat 1 minus p hat 2. But how do we calculate that standard deviation? Well, I will just give you the formula first, but then we just have to appreciate that this just comes out of the properties of standard deviations and variances that we have studied in the past.

So, the standard deviation of the sampling distribution of the difference between the sample proportions (it is a mouthful!) is going to be approximately equal to the square root of the sum of two terms: p hat 1 times 1 minus p hat 1, over n1, plus p hat 2 times 1 minus p hat 2, over n2.
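The two-sample interval described above can be sketched in the same way as the one-sample case. The function name `two_proportion_interval` and the sample counts below are assumptions for illustration, and the sketch takes for granted that both samples have passed the conditions for inference.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_interval(x1, n1, x2, n2, confidence_level=0.95):
    """Hypothetical helper: two-sample z interval for p1 - p2.

    x1 and x2 are the counts of successes in samples of size n1 and n2.
    """
    p_hat1 = x1 / n1
    p_hat2 = x2 / n2
    z_star = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)
    # Standard error of the difference: add the two estimated variances,
    # then take the square root.
    standard_error = sqrt(
        p_hat1 * (1 - p_hat1) / n1 + p_hat2 * (1 - p_hat2) / n2
    )
    diff = p_hat1 - p_hat2
    margin = z_star * standard_error
    return diff - margin, diff + margin


# Made-up data: 15 of 120 freshmen and 18 of 150 sophomores are left-handed.
low, high = two_proportion_interval(15, 120, 18, 150, confidence_level=0.90)
print(f"({low:.3f}, {high:.3f})")
```

Note that the sample sizes differ (120 and 150), which is fine. With these made-up counts the interval includes zero, so this data would not rule out the two proportions being equal.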

Then you put that there, and you have constructed your confidence interval. And once again, how would you interpret that? Well, let's say your confidence level is ninety percent, and from that, you're able to construct this confidence interval. That would mean that ninety percent of the time that you go through this exercise, your confidence interval would contain the true difference between these population proportions.

Now, where did this thing come from? Well, you might notice some similarities here. The first term under the square root, p hat 1 times 1 minus p hat 1 over n1, is an estimate of the variance of the sampling distribution of the sample proportion for our first population.

Then the second term, p hat 2 times 1 minus p hat 2 over n2, is once again approximately equal to the variance of the sampling distribution of the sample proportion for the second population, the one with p2. How did I know that? Well, look, the one-sample standard error is approximately the standard deviation; you square that, and you approximately get the variance.
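One way to get a feel for that interpretation is a small simulation: pick true proportions, draw many pairs of samples, build an interval each time, and count how often the interval captures the true difference. This is only a sketch under made-up assumptions (the true proportions, sample sizes, trial count, and the function name `coverage_rate` are all hypothetical), and in practice the observed rate will land near the confidence level rather than exactly on it.

```python
import random
from math import sqrt
from statistics import NormalDist


def coverage_rate(p1, n1, p2, n2, confidence_level=0.90, trials=2000, seed=0):
    """Hypothetical helper: fraction of simulated intervals that contain p1 - p2."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    z_star = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)
    true_diff = p1 - p2
    hits = 0
    for _ in range(trials):
        # Draw each sample independently; each person is left-handed
        # with probability p1 (or p2).
        p_hat1 = sum(rng.random() < p1 for _ in range(n1)) / n1
        p_hat2 = sum(rng.random() < p2 for _ in range(n2)) / n2
        standard_error = sqrt(
            p_hat1 * (1 - p_hat1) / n1 + p_hat2 * (1 - p_hat2) / n2
        )
        diff = p_hat1 - p_hat2
        margin = z_star * standard_error
        if diff - margin <= true_diff <= diff + margin:
            hits += 1
    return hits / trials


# Made-up truth: 12 percent of freshmen and 10 percent of sophomores.
print(coverage_rate(0.12, 200, 0.10, 250))
```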

So, the big takeaway is that the variance of the sampling distribution of the difference is just the sum of the variances of each of those sampling distributions, as long as the two samples are independent of each other. That's a big mouthful! I know it can get confusing, but hopefully, that makes sense.
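Written out in symbols, the reasoning goes roughly as follows. This sketch assumes the two samples are drawn independently of each other, which is what lets the variances add, and it replaces the unknown p1 and p2 with the sample proportions in the last step.

```latex
\begin{align*}
\operatorname{Var}(\hat{p}_1 - \hat{p}_2)
  &= \operatorname{Var}(\hat{p}_1) + \operatorname{Var}(\hat{p}_2)
  && \text{(independent samples)} \\
  &= \frac{p_1(1 - p_1)}{n_1} + \frac{p_2(1 - p_2)}{n_2} \\
\sigma_{\hat{p}_1 - \hat{p}_2}
  &= \sqrt{\frac{p_1(1 - p_1)}{n_1} + \frac{p_2(1 - p_2)}{n_2}}
  \approx \sqrt{\frac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_2(1 - \hat{p}_2)}{n_2}}
\end{align*}
```

The variances add even though we are subtracting the proportions, because subtracting an independent quantity still adds its variability.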

That's where this formula comes from. And so, it's really not that much more to remember. In the next few videos, we're going to do many more examples, both looking at these conditions and calculating confidence intervals and critical values.
