yego.me

Confidence intervals for the difference between two proportions | AP Statistics | Khan Academy


6m read
·Nov 11, 2024

Let's review calculating confidence intervals for proportions.

So, let's say I have a population and I care about some proportion. Let's say I care about the proportion of folks that are left-handed. I don't know what that is, and so I take a sample of size n. Then, from that sample, I can calculate a sample proportion. That's why I put that little hat on top of it; it's a sample proportion that's estimating our true proportion.

Now I want to construct a confidence interval, but before I go down the path, I need to actually set up my conditions for inference. Make sure that I meet them. We've done this many times.

So, the first condition for inference is the random condition. I need to feel good that this is truly a random sample from the population. The second one is often known as the normal condition, and that's the condition that, hey, in order to feel like the sampling distribution for the sample proportions is roughly normal, n times our sample proportion should be greater than or equal to 10, and n times 1 minus our sample proportion should be greater than or equal to 10. We've seen that multiple times before.

Then, the third one is the independence condition. There are two ways to meet this: either the individuals in our sample are selected with replacement, or, if we sample without replacement, we can feel pretty confident about independence as long as our sample size is no more than 10 percent of the size of the entire population.
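The two numeric conditions above can be checked mechanically. Here is a minimal sketch, assuming we know the count of successes in the sample and the population size; the function names and the numbers are hypothetical, chosen only for illustration. It works with counts rather than n times p hat, which is the same check, since n times p hat is just the number of successes.

```python
def normal_condition_met(successes: int, n: int, threshold: int = 10) -> bool:
    """Normal condition: n * p_hat >= 10 and n * (1 - p_hat) >= 10.

    n * p_hat is the number of successes and n * (1 - p_hat) is the number
    of failures, so we compare the counts directly.
    """
    failures = n - successes
    return successes >= threshold and failures >= threshold


def ten_percent_condition_met(n: int, population_size: int) -> bool:
    """Independence check when sampling without replacement: n <= 10% of the population."""
    return 10 * n <= population_size


# Hypothetical numbers: 120 students sampled from a school of 2,000; 15 are left-handed.
print(normal_condition_met(successes=15, n=120))               # True
print(ten_percent_condition_met(n=120, population_size=2000))  # True
```

The random condition is about how the data were collected, so it cannot be verified from the numbers alone.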

But let's say that we meet these conditions for inference. What do we do? Well, we set a confidence level for the confidence interval that we're about to construct. Let's say it's a 95 percent confidence level. That would mean that 95 percent of the time that we went through this exercise, the confidence interval that we get would actually contain the true population proportion. A 95 percent confidence level is actually a fairly typical one.

From that confidence level, you can calculate a critical value. The way that you do that is you just look it up in a z table. Once again, all of this is review. You would say, "Hey, how many standard deviations above and below the mean of a normal distribution would you need to go in order to capture, say, the middle 95 percent of the distribution?" For a 95 percent confidence level, that is about 1.96 standard deviations.
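Instead of a z table, the same lookup can be done in code. This is a small sketch using Python's standard-library `statistics.NormalDist`; the helper name `critical_z` is my own, not something from the video.

```python
from statistics import NormalDist


def critical_z(confidence_level: float) -> float:
    """Number of standard deviations that captures the middle `confidence_level` of a normal curve."""
    # The area left out is split evenly between the two tails.
    tail_area = (1 - confidence_level) / 2
    return NormalDist().inv_cdf(1 - tail_area)


print(round(critical_z(0.95), 3))  # 1.96
print(round(critical_z(0.90), 3))  # 1.645
```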

Now we're ready to calculate the confidence interval. The confidence interval is going to be equal to our sample proportion plus or minus our critical value times the standard deviation of the sampling distribution of the sample proportion.

Now, there is a way to calculate this exactly if we knew what p is. If we knew what p is, this would be the square root of p times 1 minus p over n. But if we knew what p is, we wouldn't even have to do this business of constructing confidence intervals.

So instead, we estimate it. An estimate of the standard deviation of the sampling distribution is often known as the standard error. To get it, instead of the true population parameter, we use the sample proportion: the square root of p hat times 1 minus p hat, all of that over n.
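Putting the pieces so far together, a one-sample z interval might be sketched like this. The function name and the sample numbers (15 left-handed people in a sample of 120) are hypothetical, and the sketch assumes the conditions for inference have already been checked.

```python
from math import sqrt
from statistics import NormalDist


def one_proportion_z_interval(successes: int, n: int, confidence_level: float = 0.95):
    """Return (lower, upper) for p: p_hat +/- z* * sqrt(p_hat * (1 - p_hat) / n)."""
    p_hat = successes / n
    z_star = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)
    # Standard error: the estimated standard deviation of the sampling distribution.
    standard_error = sqrt(p_hat * (1 - p_hat) / n)
    margin_of_error = z_star * standard_error
    return p_hat - margin_of_error, p_hat + margin_of_error


lower, upper = one_proportion_z_interval(successes=15, n=120, confidence_level=0.95)
print(f"95% interval for p: ({lower:.3f}, {upper:.3f})")
```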

Now, the whole reason why I did this, this is covered in much more detail and much slower in other videos, is to see the parallels between this and a situation when we're constructing a two-sample confidence interval, or z interval for the difference between proportions. What am I talking about?

Well, let's say that you have two different populations. So, this is the first population, and it has some true proportion of folks that, let's say, are left-handed. Let's call that p1. Then, there's another population, with its own true proportion. Let's call that p2. You know, maybe the first population is the freshmen in your high school or college, and maybe the second is the sophomores.

So, two different populations, and you want to see if there's a difference between the proportions that are left-handed. What you could do, just like we've done here, is take a sample from each of these populations. For the first population, we'll call the sample size n1. Then, from that sample, you calculate a sample proportion. Let's call that p hat 1.

From this second population, we do the same thing. This is n2. Notice n1 and n2 do not have to be the same sample size. That's a common misconception when doing these things. These could be different sample sizes, and then from that sample, you calculate the sample proportion, p hat 2.

Now, after you do that, you would want to check your conditions for inference. It turns out that the conditions for inference would be exactly the same. Do both of these samples meet the random condition? Do both of these samples meet the normal condition? And do both of these samples meet the independence condition?

If both samples meet these conditions for inference, then we would calculate our critical value, and you would do it the exact same way. I'll just write it down again. So first, you need to check all of these conditions. Then you would take your confidence level and, from that, get a critical z.

Then you're ready to say what your confidence interval is going to be. So, your confidence interval for p1 minus p2 (so it's a confidence interval for the difference between these true population proportions) that is going to be equal to the difference between your sample proportions, so p hat 1 minus p hat 2, plus or minus your critical value times the standard deviation of the sampling distribution of the difference between the sample proportions.

So that's the standard deviation of p hat 1 minus p hat 2. We already know how to calculate the other pieces, but how do we calculate that? Well, I will just give you the formula first, but then we just have to appreciate that this just comes out of the properties of standard deviations and variances that we have studied in the past.

So, the standard deviation of the sampling distribution of the difference between the sample proportions (it is a mouthful!) is going to be approximately equal to the square root of the sum of two terms: p hat 1 times 1 minus p hat 1, over n1, plus p hat 2 times 1 minus p hat 2, over n2.
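As a rough illustration of the formula just described, here is a sketch of the two-sample z interval for p1 minus p2. The function name and the counts (18 of 150 freshmen and 12 of 120 sophomores left-handed) are made up for the example, and it assumes both samples have already passed the conditions for inference.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_interval(x1: int, n1: int, x2: int, n2: int,
                              confidence_level: float = 0.95):
    """Return (lower, upper) for p1 - p2 using the unpooled standard error."""
    p_hat_1 = x1 / n1
    p_hat_2 = x2 / n2
    z_star = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)
    # Estimated standard deviation of the sampling distribution of p_hat_1 - p_hat_2.
    standard_error = sqrt(p_hat_1 * (1 - p_hat_1) / n1 + p_hat_2 * (1 - p_hat_2) / n2)
    margin_of_error = z_star * standard_error
    difference = p_hat_1 - p_hat_2
    return difference - margin_of_error, difference + margin_of_error


# Hypothetical samples; note that n1 and n2 are different sizes.
lower, upper = two_proportion_z_interval(x1=18, n1=150, x2=12, n2=120, confidence_level=0.90)
print(f"90% interval for p1 - p2: ({lower:.3f}, {upper:.3f})")
```

With these made-up numbers the interval includes zero, so the samples would not give convincing evidence of a difference between the two proportions.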

Then you put that there, and you have constructed your confidence interval. And once again, how would you interpret that? Well, let's say your confidence level is ninety percent, and from that, you're able to construct this confidence interval. That would mean that ninety percent of the time that you go through this exercise, your confidence interval would contain the true difference between these population proportions.
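One way to get a feel for that interpretation is to simulate "going through this exercise" many times. The sketch below assumes hypothetical true proportions (0.12 and 0.10) and sample sizes (200 each), draws repeated samples, builds a 90 percent interval each time, and counts how often the interval contains the true difference. The share should land near 0.90, though not exactly on it, because the interval is approximate and the simulation is random.

```python
import random
from math import sqrt
from statistics import NormalDist


def simulate_coverage(p1, n1, p2, n2, confidence_level=0.90, trials=2000, seed=0):
    """Estimate how often the two-sample z interval contains the true p1 - p2."""
    rng = random.Random(seed)
    z_star = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)
    true_difference = p1 - p2
    hits = 0
    for _ in range(trials):
        # Draw one random sample from each population and compute the sample proportions.
        p_hat_1 = sum(rng.random() < p1 for _ in range(n1)) / n1
        p_hat_2 = sum(rng.random() < p2 for _ in range(n2)) / n2
        standard_error = sqrt(p_hat_1 * (1 - p_hat_1) / n1 + p_hat_2 * (1 - p_hat_2) / n2)
        margin_of_error = z_star * standard_error
        difference = p_hat_1 - p_hat_2
        if difference - margin_of_error <= true_difference <= difference + margin_of_error:
            hits += 1
    return hits / trials


coverage = simulate_coverage(p1=0.12, n1=200, p2=0.10, n2=200)
print(f"Share of intervals containing the true difference: {coverage:.3f}")
```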

Now, where did this thing come from? Well, you might notice some similarities here. The first term under the radical, p hat 1 times 1 minus p hat 1 over n1, is an estimate of, or approximately equal to, the variance of the sampling distribution of the sample proportion for our first population.

Then the second term, p hat 2 times 1 minus p hat 2 over n2, is once again approximately equal to the variance of the sampling distribution of the sample proportion for the second population. How did I know that? Well, look, the square root of p hat times 1 minus p hat over n is approximately the standard deviation; you square that, and you approximately get the variance.

So, the big takeaway is that the variance of the sampling distribution of the difference is just the sum of the variances of each of those sampling distributions, which holds when the two samples are independent of each other. That's a big mouthful! I know it can get confusing, but hopefully, that makes sense.
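Written out in symbols, the reasoning above looks roughly like this. It is a sketch that assumes the two samples are independent of each other, which is what lets the variances add.

```latex
\begin{align*}
\sigma^2_{\hat{p}_1 - \hat{p}_2}
  &= \sigma^2_{\hat{p}_1} + \sigma^2_{\hat{p}_2}
   = \frac{p_1(1 - p_1)}{n_1} + \frac{p_2(1 - p_2)}{n_2} \\[4pt]
\sigma_{\hat{p}_1 - \hat{p}_2}
  &= \sqrt{\frac{p_1(1 - p_1)}{n_1} + \frac{p_2(1 - p_2)}{n_2}}
   \approx \sqrt{\frac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_2(1 - \hat{p}_2)}{n_2}}
\end{align*}
```

The last step swaps the unknown p1 and p2 for the sample proportions, just as in the one-sample case.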

That's where this formula comes from. And so, it's really not that much more to remember. In the next few videos, we're going to do many more examples, both looking at these conditions and calculating confidence intervals and critical values.
