Confidence intervals for the difference between two proportions | AP Statistics | Khan Academy


Nov 11, 2024

Let's review calculating confidence intervals for proportions.

So, let's say I have a population and I care about some proportion. Let's say I care about the proportion of folks that are left-handed. I don't know what that is, and so I take a sample of size n. Then, from that sample, I can calculate a sample proportion. That's why I put that little hat on top of it; it's a sample proportion that's estimating our true proportion.

Now I want to construct a confidence interval, but before I go down that path, I need to set up my conditions for inference and make sure that I meet them. We've done this many times.

So, the first condition for inference is the random condition. I need to feel good that this is truly a random sample from the population. The second one is often known as the normal condition, and that's the condition that, hey, in order to feel like the sampling distribution for the sample proportions is roughly normal, n times our sample proportion should be greater than or equal to 10, and n times 1 minus our sample proportion should be greater than or equal to 10. We've seen that multiple times before.

Then, the third one is the independence condition. There are two ways to meet this: either the individual observations in our sample are drawn with replacement, or, if they are not drawn with replacement, we can feel pretty confident about this if our sample size is no more than 10 percent of the size of the entire population.
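As a rough sketch of how the normal and independence checks above might look in code (the random condition is a matter of study design and can't be checked numerically), here is a small Python example. The function names and the sample numbers are hypothetical, chosen only for illustration.

```python
def normal_condition(n, p_hat, threshold=10):
    """Return True if n*p_hat and n*(1 - p_hat) are both at least `threshold`."""
    return n * p_hat >= threshold and n * (1 - p_hat) >= threshold


def ten_percent_condition(n, population_size):
    """Return True if the sample is no more than 10 percent of the population."""
    return n <= 0.10 * population_size


# Hypothetical sample: 200 people, 12 percent of them left-handed,
# drawn without replacement from a population of 5000.
n = 200
p_hat = 0.12
print(normal_condition(n, p_hat))        # 24 successes and 176 failures
print(ten_percent_condition(n, 5000))    # 200 is at most 500
```

For a two-sample interval, the same checks would simply be run once per sample.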

But, let's say that we meet these conditions for inference. What do we do? Well, we set up a confidence level for the confidence interval that we're about to construct. Let's say it's a 95 percent confidence level. That would mean that 95 percent of the time that we went through this exercise, the confidence interval that we get would actually contain the true population proportion. A 95 percent confidence level is a fairly typical one.

From that confidence level, you can calculate a critical value. The way that you do that is you just look it up in a z table. Once again, all of this is review. You would say, "Hey, how many standard deviations above and below the mean of a normal distribution would you need to go in order to capture, say, 95 percent of the distribution?"
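Instead of a z table, the critical value can also be computed directly. The following is a minimal sketch using Python's standard library; the function name `critical_z` is an assumption made for this example.

```python
from statistics import NormalDist


def critical_z(confidence_level):
    """Critical z for a two-sided interval at the given confidence level."""
    # Split the leftover probability evenly between the two tails.
    tail = (1 - confidence_level) / 2
    # Invert the standard normal CDF at the upper cutoff.
    return NormalDist().inv_cdf(1 - tail)


print(round(critical_z(0.95), 3))  # 1.96
print(round(critical_z(0.90), 3))  # 1.645
```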

Now we're ready to calculate the confidence interval. The confidence interval is going to be equal to our sample proportion plus or minus our critical value times the standard deviation of the sampling distribution of the sample proportion.

Now, there is a way to calculate this exactly if we knew what p is. If we knew what p is, this would be the square root of p times 1 minus p over n. But if we knew what p is, we wouldn't even have to do this business of constructing confidence intervals.

So instead, we estimate this. An estimate of the standard deviation of the sampling distribution is often known as the standard error. To get it, instead of the true population parameter, we use the sample proportion: the square root of p hat times 1 minus p hat, all of that over n.
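Putting the pieces so far together, a one-sample z interval could be sketched as follows. The function name and the counts (24 left-handed people in a sample of 200) are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def one_proportion_interval(successes, n, confidence_level=0.95):
    """One-sample z interval for a proportion, using the standard error."""
    p_hat = successes / n
    z_star = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)
    # Standard error: p_hat stands in for the unknown p.
    standard_error = sqrt(p_hat * (1 - p_hat) / n)
    margin = z_star * standard_error
    return p_hat - margin, p_hat + margin


# Hypothetical data: 24 of 200 sampled people are left-handed.
low, high = one_proportion_interval(24, 200)
print(low, high)
```

With these made-up numbers the interval is centered on 0.12 with a margin of error of roughly 0.045.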

Now, the whole reason why I did this, this is covered in much more detail and much slower in other videos, is to see the parallels between this and a situation when we're constructing a two-sample confidence interval, or z interval for the difference between proportions. What am I talking about?

Well, let's say that you have two different populations. So, this is the first population, and it has some true proportion of folks that, let's say, are left-handed. Let's call that p1. Then, there's another population with its own true proportion. Let's call that p2. You know, maybe the first population is the freshmen in your high school or college, and maybe the second is the sophomores.

So, you have two different populations, and you want to see if there's a difference between the proportions that are left-handed. What you could do, just like we've done here, is take a sample from each of these populations. For the first one, we'll call the sample size n1. Then, from that sample, you calculate a sample proportion. Let's call that p hat 1.

From the second population, we do the same thing. The sample size here is n2. Notice that n1 and n2 do not have to be the same sample size. Thinking they must match is a common misconception. These could be different sample sizes, and then from that second sample, you calculate the sample proportion p hat 2.

Now, after you do that, you would want to check your conditions for inference. It turns out that the conditions for inference would be exactly the same. Do both of these samples meet the random condition? Do both of these samples meet the normal condition? And do both of these samples meet the independence condition?

If both samples meet these conditions for inference, then we would have to calculate our critical value, and you would do it the exact same way. I'll just write it down again. So first, you need to check all of these conditions. Then you would take your confidence level and, from that, get a critical z.

Then you're ready to say what your confidence interval is going to be. So, your confidence interval for p1 minus p2 (so it's a confidence interval for the difference between these true population proportions) that is going to be equal to the difference between your sample proportions, so p hat 1 minus p hat 2, plus or minus your critical value times the standard deviation of the sampling distribution of the difference between the sample proportions.

So that is the standard deviation of p hat 1 minus p hat 2. We already know how to calculate the difference between the sample proportions, but how do we calculate that standard deviation? Well, I will just give you the formula first, but then we just have to appreciate that it comes out of the properties of standard deviations and variances that we have studied in the past.

So, the standard deviation of the sampling distribution of the difference between the sample proportions (it is a mouthful!) is going to be approximately equal to the square root of p hat 1 times 1 minus p hat 1 over n1, plus p hat 2 times 1 minus p hat 2 over n2.
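The two-sample interval described above can be sketched in Python along the same lines as the one-sample case. The function name and the counts (24 of 200 freshmen, 27 of 300 sophomores) are hypothetical, and the sketch assumes the conditions for inference have already been checked for both samples.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_interval(x1, n1, x2, n2, confidence_level=0.95):
    """Two-sample z interval for p1 - p2 using the unpooled standard error."""
    p1_hat = x1 / n1
    p2_hat = x2 / n2
    z_star = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)
    # Estimated variances of the two sample proportions add under the root.
    standard_error = sqrt(
        p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2
    )
    difference = p1_hat - p2_hat
    margin = z_star * standard_error
    return difference - margin, difference + margin


# Hypothetical data: 24 of 200 freshmen and 27 of 300 sophomores are
# left-handed; note that the sample sizes differ.
low, high = two_proportion_interval(24, 200, 27, 300, confidence_level=0.90)
print(low, high)
```

With these made-up counts the interval is centered on 0.03 and includes zero, so this particular data would not rule out equal proportions at the 90 percent level.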

Then you put that there, and you have constructed your confidence interval. And once again, how would you interpret that? Well, let's say your confidence level is ninety percent, and from that, you're able to construct this confidence interval. That would mean that ninety percent of the time that you go through this exercise, your confidence interval would contain the true difference between these population proportions.
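One way to see what "ninety percent of the time" means is to simulate the whole exercise many times with known population proportions and count how often the interval captures the true difference. This is only an illustrative sketch: the true proportions (0.12 and 0.09), the sample sizes, the number of trials, and the function names are all assumptions made for the example.

```python
import random
from math import sqrt
from statistics import NormalDist


def two_proportion_interval(p1_hat, n1, p2_hat, n2, confidence_level):
    """Two-sample z interval for p1 - p2 from the two sample proportions."""
    z_star = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)
    se = sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
    diff = p1_hat - p2_hat
    return diff - z_star * se, diff + z_star * se


def simulate_coverage(p1, n1, p2, n2, confidence_level, trials, seed=0):
    """Fraction of simulated intervals that contain the true p1 - p2."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # Draw one independent random sample from each population.
        p1_hat = sum(rng.random() < p1 for _ in range(n1)) / n1
        p2_hat = sum(rng.random() < p2 for _ in range(n2)) / n2
        low, high = two_proportion_interval(
            p1_hat, n1, p2_hat, n2, confidence_level
        )
        if low <= p1 - p2 <= high:
            hits += 1
    return hits / trials


coverage = simulate_coverage(0.12, 200, 0.09, 300, 0.90, trials=2000)
print(coverage)
```

The printed fraction should land near 0.90, though not exactly on it, since the interval relies on a normal approximation and an estimated standard error.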

Now, where did this formula come from? Well, you might notice some similarities with the one-sample case. The first term under the square root, p hat 1 times 1 minus p hat 1 over n1, is an estimate of the variance of the sampling distribution of the sample proportion for our first population.

Then the second term, p hat 2 times 1 minus p hat 2 over n2, is once again approximately equal to the variance of the sampling distribution of the sample proportion for the second population. How did I know that? Well, look, if the one-sample standard error is approximately the standard deviation, then when you square it, you approximately get the variance.

So, the big takeaway is that, when the two samples are independent of each other, the variance of the sampling distribution of the difference is just the sum of the variances of each of those sampling distributions. That's a big mouthful! I know it can get confusing, but hopefully, that makes sense.
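Written out in symbols, the reasoning above can be sketched as the following short derivation. It assumes the two samples are independent of each other, which is what lets the variances add even though the proportions are being subtracted.

```latex
\begin{align*}
\sigma^2_{\hat{p}_1 - \hat{p}_2}
  &= \sigma^2_{\hat{p}_1} + \sigma^2_{\hat{p}_2}
   = \frac{p_1(1 - p_1)}{n_1} + \frac{p_2(1 - p_2)}{n_2} \\
\sigma_{\hat{p}_1 - \hat{p}_2}
  &= \sqrt{\frac{p_1(1 - p_1)}{n_1} + \frac{p_2(1 - p_2)}{n_2}}
   \approx \sqrt{\frac{\hat{p}_1(1 - \hat{p}_1)}{n_1}
               + \frac{\hat{p}_2(1 - \hat{p}_2)}{n_2}}
\end{align*}
```

The last step replaces the unknown p1 and p2 with the sample proportions, exactly as in the one-sample standard error.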

That's where this formula comes from. And so, it's really not that much more to remember. In the next few videos, we're going to do many more examples, both looking at these conditions and calculating confidence intervals and critical values.
