Hypothesis test for difference in proportions | AP Statistics | Khan Academy


Nov 11, 2024

We're now going to explore hypothesis testing when we're thinking about the difference between proportions of two different populations.

So here it says, here are the results from a recent poll that involved sampling voters from each of two neighboring districts: District A and District B. Folks were asked whether they support a new law or not. From each district, we took a sample of 100 voters, and then we were able to calculate the proportion from that sample that supported the law. Here we have the combined data, including the combined proportion, and we're asked, does this suggest a significant difference between the two districts?

This is asking for a hypothesis test, and the way we would do that is we would set up our null hypothesis. Remember, our null hypothesis is the one where we assume that there is no difference. So we would assume that the true proportion of folks in District A that support the new law is equal to the true proportion in District B that support the law. Another way to think about it is that the difference between the two proportions would be equal to zero. Our alternative hypothesis is that the difference between the proportions is not equal to zero.
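Written out in symbols, the two hypotheses might look like the following. The names p_A and p_B for the true proportions in District A and District B are notation assumed here for illustration, not something shown in the transcript.

```latex
H_0:\; p_A - p_B = 0
\qquad \text{versus} \qquad
H_a:\; p_A - p_B \neq 0
```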

If we were doing an all-out hypothesis test, we would set a significance level, which we usually denote with an alpha. Oftentimes it might be a ten percent significance level or a five percent significance level. Let's say we set it at a five percent significance level. What we would do is say, all right, let's assume that the null hypothesis is true. Assuming the null hypothesis is true, what is the probability of getting a difference between our sample proportions this extreme or more? If that probability is less than our significance level, then we reject the null hypothesis, which would suggest the alternative.

Now, before we go deeper into our inference, we want to test our conditions for inference. We've seen these many times before. You have the random condition, where you would need to feel good that both of these samples are truly random. You would have your normal condition, which is that you would have at least 10 successes and failures in each of these samples. We see that we do indeed have at least 10 successes and at least 10 failures in each of those samples.

Then you have your independence condition. In the independence condition, you are either sampling with replacement, or you need to feel good that each of these sample sizes is no more than 10% of the entire population. So, I guess we will assume that there are at least a thousand folks in District A and at least a thousand folks in District B. That would allow us to meet the independence condition.
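The two numeric conditions above can be checked with a rough sketch like the one below. The counts of 58 and 52 supporters are derived from the sample proportions of 0.58 and 0.52 out of 100, and the population size of 1,000 per district is the assumption made in the video; the function names are hypothetical. The random condition is about how the data were collected, so it can't be checked in code.

```python
# Sample sizes and supporter counts (counts derived from 0.58 and 0.52 of 100).
n_a, n_b = 100, 100
yes_a, yes_b = 58, 52

# Assumed lower bound on each district's population, as in the video.
assumed_population = 1000


def normal_condition_ok(successes, n, minimum=10):
    """At least `minimum` successes and at least `minimum` failures."""
    failures = n - successes
    return successes >= minimum and failures >= minimum


def ten_percent_condition_ok(n, population_size):
    """Sample is no more than 10% of the population (integer arithmetic)."""
    return 10 * n <= population_size


normal_ok = normal_condition_ok(yes_a, n_a) and normal_condition_ok(yes_b, n_b)
independence_ok = (ten_percent_condition_ok(n_a, assumed_population)
                   and ten_percent_condition_ok(n_b, assumed_population))

print("normal condition met:", normal_ok)
print("independence condition met:", independence_ok)
```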

With that out of the way, let's assume the null hypothesis and start thinking about the sampling distribution of the difference between the sample proportions, assuming that null hypothesis. The first thing I want to think about is what is going to be the standard deviation of the sampling distribution of the difference between the sample proportions?

Well, we have seen in a previous video when we talked about differences of proportions that we could think about the variance. The variance of the sampling distribution of the difference (there's a lot of notation here) is going to be equal to the variance of the sampling distribution of the sample proportion from District A plus the variance of the sampling distribution of the sample proportion from District B.

Now, in general, you can figure out the variance of the sampling distribution of a sample proportion with the following formula. We've seen this before. The variance of the sampling distribution of the sample proportion is going to be equal to our true proportion times 1 minus our true proportion, all of that over your sample size.
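Putting the last two statements together in symbols gives something like the following. The subscripted names (p_A, p_B for the true proportions, n_A, n_B for the sample sizes) are notation assumed here for illustration.

```latex
\sigma^2_{\hat{p}_A - \hat{p}_B}
  = \sigma^2_{\hat{p}_A} + \sigma^2_{\hat{p}_B}
  = \frac{p_A(1 - p_A)}{n_A} + \frac{p_B(1 - p_B)}{n_B}
```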

In either situation, we don't know the true proportions for District A or District B. That's why we're doing this hypothesis test to begin with. But we can try to estimate them. Remember, we're assuming that the true proportions are equal, even though we might not know what they are. What is going to be our best estimate of that true proportion? If we assume that District A and District B have no difference in terms of the proportion of people who support the new law, the best estimate would actually be the combined sample proportion right over here.

To estimate these values, we use this combined sample proportion in the place of p over here. So we could say that this is going to be our combined sample proportion times 1 minus our combined sample proportion, all of that over our sample size. Since we're assuming that there's no difference between District A and District B, this would also apply to that, right over there.
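As a small sketch of where the combined proportion comes from: it is the total number of supporters divided by the total number of people sampled. The counts of 58 and 52 are derived from the sample proportions of 0.58 and 0.52 out of 100 voters each, and the variable names are made up for illustration.

```python
# Supporter counts derived from the sample proportions (0.58 and 0.52 of 100).
yes_a, n_a = 58, 100
yes_b, n_b = 52, 100

# Combined (pooled) sample proportion: all supporters over all voters sampled.
p_combined = (yes_a + yes_b) / (n_a + n_b)

print(p_combined)
```

Because both samples have the same size here, this is also just the average of 0.58 and 0.52.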

So let me rewrite this again. The standard deviation of the sampling distribution of the difference of the sample proportions from District A and District B is going to be roughly equal to the following. Remember, we weren't able to calculate it exactly, so we're using this combined proportion as our best estimate. Let me do a big square root right over here, a big radical.

So underneath that, we're going to have our estimate of this, which is 0.55 times 1 minus 0.55, so 0.45 over 100, plus our estimate of this, which is 0.55—it's the same thing again—times 0.45. Remember that's because we're assuming the null hypothesis is true, all of that over this sample size, all of that over 100.

Now we can get our calculator out to actually calculate it. So underneath the radical we have 0.55 times 0.45 divided by 100. Now, I could add that whole thing again, or I could just multiply it by 2, still underneath the radical. Taking the square root of all of that, we get approximately 0.07. So this is going to be approximately equal to 0.07.
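The calculator step can be reproduced with a short sketch like this one, assuming the combined proportion of 0.55 and samples of 100 from each district. The variable names are hypothetical.

```python
import math

p_combined = 0.55      # combined sample proportion, used for both districts
n_a, n_b = 100, 100    # sample sizes

# Estimated variance of each sample proportion under the null hypothesis.
var_a = p_combined * (1 - p_combined) / n_a
var_b = p_combined * (1 - p_combined) / n_b

# Variances add; the standard deviation is the square root of the sum.
std_diff = math.sqrt(var_a + var_b)

print(round(std_diff, 4))
```

The factor of 2 in the video sits inside the square root, which is what adding the two equal variances does here.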

Using this, we can calculate a z-score and then we can think about what's the probability of getting a z-score that extreme or more. Our z-score or our z-value would be equal to the difference that we got (p-hat sub A minus p-hat sub B), all of that over our estimate of the standard deviation of the sampling distribution of this difference between the sample proportions, so all of that over 0.07.

Now, this up here in yellow, it's 0.58 minus 0.52. This is going to be equal to 0.06 over 0.07. We can get our calculator out for this again. So we have 0.06 divided by 0.07 is going to be approximately 0.86.
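Here is a minimal sketch of the same z-score calculation without rounding the standard deviation first. The variable names are made up for illustration. Carrying the unrounded value gives a z-score of about 0.85 rather than 0.86; the small gap comes only from rounding the denominator to 0.07, and it doesn't change the conclusion.

```python
import math

p_hat_a, p_hat_b = 0.58, 0.52   # sample proportions from each district
p_combined = 0.55               # combined sample proportion
n_a, n_b = 100, 100             # sample sizes

# Estimated standard deviation of the difference under the null hypothesis.
std_diff = math.sqrt(p_combined * (1 - p_combined) * (1 / n_a + 1 / n_b))

# z-score: observed difference minus the hypothesized difference (0),
# divided by the estimated standard deviation.
z = (p_hat_a - p_hat_b - 0) / std_diff

print(round(z, 2))
```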

So this is approximately 0.86. Now, what's the probability of getting something this extreme or more extreme? Let me just make sure we can visualize it properly. So if this is our sampling distribution of the difference between our sample proportions and we're assuming the null hypothesis, the mean of our sampling distribution is going to be 0. It’s going to be 0 right there.

We just got a result that is less than a standard deviation above the mean. So we just got a result that puts us right there. If we ask ourselves what's the probability of getting a result at least that extreme, it would be all of this area right over here, above our result, plus the matching area on the other side of the mean.

We know that this is over 30 percent, because even if you only count results more than one full standard deviation above or below the mean, so this area plus this area, you're looking at roughly 31 or 32 percent, and our result is less than one standard deviation from the mean.

So the probability of getting something at least this extreme is going to be over 30 percent. It's definitely going to be higher than our significance level. It's actually completely reasonable to get a difference this extreme if we assume the null hypothesis is true.
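For a more precise number than "over 30 percent", the two-tailed area can be sketched with the standard normal distribution, using only Python's standard library. The helper name `two_sided_p_value` is hypothetical, and the 5 percent significance level is the one chosen earlier.

```python
import math


def two_sided_p_value(z):
    """P(|Z| >= |z|) for a standard normal Z, via the complementary error function."""
    return math.erfc(abs(z) / math.sqrt(2))


# z-score for the observed difference, using the unrounded standard deviation.
std_diff = math.sqrt(2 * 0.55 * 0.45 / 100)
z = (0.58 - 0.52) / std_diff

p_value = two_sided_p_value(z)
alpha = 0.05                      # significance level chosen earlier
reject_null = p_value < alpha

print(round(p_value, 2), reject_null)
```

The p-value comes out a little under 40 percent, well above the significance level, so the null hypothesis is not rejected.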

In future videos, we can go even deeper, where we can actually just look this up on a z-table to calculate these areas more precisely, and we can compare them to the significance level. But here, it's not even close. We're nowhere close to being able to reject the null hypothesis.

So to answer the question, does this suggest a significant difference between the two districts? No, no, it doesn't.
