yego.me

Hypothesis test for difference in proportions example | AP Statistics | Khan Academy


Nov 11, 2024

We are told that researchers suspect that myopia, or nearsightedness, is becoming more common over time. A study from the year 2000 showed 132 cases of myopia in 400 randomly selected people. A separate study from 2015 showed 228 cases in 600 randomly selected people.

In this video, we're going to conduct a hypothesis test to see whether we have evidence to support the researchers' suspicion that myopia is becoming more common over time. If at any point you are inspired, I encourage you to pause the video and try to work through things on your own. But here I go; I'm going to do it with you.

Let's start by stating our null and alternative hypotheses. Remember, the null hypothesis is the "no news here" hypothesis: contrary to the researchers' suspicions, myopia is not becoming more common over time. We measure "more common over time" by comparing the proportion of people who have myopia in 2015 to the proportion in 2000. So our null hypothesis is that there's no difference: the true proportion of people with myopia in 2015 is equal to the true proportion of people with myopia in 2000.

For our alternative hypothesis, remember that the researchers suspect myopia is becoming more common over time. That corresponds to the true proportion in 2015 being greater than the true proportion in 2000. Since 2015 comes after 2000, a higher proportion in 2015 would mean myopia is becoming more common over time.
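Written in symbols, with p_2015 and p_2000 standing for the true proportions of people with myopia in each year (notation introduced here for convenience, not used in the video), the two hypotheses are:

```latex
\begin{aligned}
H_0 &\colon\ p_{2015} = p_{2000} \quad (\text{equivalently } p_{2015} - p_{2000} = 0)\\
H_a &\colon\ p_{2015} > p_{2000} \quad (\text{equivalently } p_{2015} - p_{2000} > 0)
\end{aligned}
```

The difference form lines up with the numerator of the z-statistic computed later in the test.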

Before we test our null hypothesis (seeing whether we can reject it, which would suggest our alternative), we have to check our conditions for inference, as we've done many times before. First is the random condition. We meet it because both samples are random: 400 randomly selected people in 2000 and 600 randomly selected people in 2015. So that looks good.

Next is the normal condition. To meet it, the number of successes and the number of failures in each sample must be at least 10. That is the case here. In 2000 we have 132 successes, so to speak (it's not a success for someone to have myopia, but that's how this problem is set up), and 400 minus 132, or 268, failures. In 2015 we have 228 successes and 600 minus 228, or 372, failures. All four counts are well above 10.
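The counting in this check is simple enough to script. The sketch below hard-codes the counts from the two studies; the helper name `large_counts_ok` and its default threshold of 10 are illustrative choices, not part of any library:

```python
# Normal (large counts) condition: each sample needs at least 10
# successes (myopia cases) and at least 10 failures (no myopia).

def large_counts_ok(successes: int, n: int, threshold: int = 10) -> bool:
    """Return True if both successes and failures reach the threshold."""
    failures = n - successes
    return successes >= threshold and failures >= threshold


# Counts from the two studies: (cases of myopia, sample size)
samples = {"2000": (132, 400), "2015": (228, 600)}

for year, (cases, n) in samples.items():
    print(year, "successes:", cases, "failures:", n - cases,
          "condition met:", large_counts_ok(cases, n))
```

Both years print `condition met: True`, with 268 failures in 2000 and 372 in 2015.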

So we meet both of those conditions. The last condition we always talk about is the independence condition. There are two ways to satisfy it: either we sample with replacement, or we're confident that each sample is no more than 10% of its population. Even for the larger sample of 600, the population only needs at least 6,000 people, and it's safe to say there are more than 6,000 people out there. So it's reasonable to say we meet the independence condition, even though the problem doesn't state this explicitly. It's always good to think about it, though.
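The problem never gives the population sizes, so the sketch below only computes the minimum population each sample would need for the 10% condition to hold. `min_population` is a hypothetical helper name:

```python
# 10% condition: a sample of size n should be at most 10% of its population,
# so the population must contain at least 10 * n people.

def min_population(n: int) -> int:
    """Smallest population for which a sample of size n is at most 10% of it."""
    return 10 * n


for year, n in {"2000": 400, "2015": 600}.items():
    print(year, "sample of", n, "needs a population of at least", min_population(n))
```

The larger sample gives the 6,000 figure mentioned above. The 2000 sample only requires a population of 4,000.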

The next step in a hypothesis test is to set the significance level, alpha. I'll set mine to 0.05. We then assume the null hypothesis is true and ask, "What is the probability of getting a difference between the 2015 and 2000 sample proportions at least as large as the one we got?" If that probability is less than our significance level, we reject the null hypothesis, which suggests the alternative. If it is greater than our significance level, we fail to reject the null hypothesis, and we don't have evidence for the researchers' suspicion.
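The decision rule can be written as a tiny function. This is only a sketch, and `decide` is a hypothetical name. Treating a p-value exactly equal to alpha as "fail to reject" is an assumption, because the video doesn't address that tie. Some textbooks reject when p ≤ α.

```python
# The significance level is fixed before the data are analyzed.
ALPHA = 0.05


def decide(p_value: float, alpha: float = ALPHA) -> str:
    """Apply the decision rule: reject H0 only when the p-value is below alpha."""
    if p_value < alpha:
        return "reject H0"
    return "fail to reject H0"
```

For example, `decide(0.01)` returns `"reject H0"`, and `decide(0.0537)` returns `"fail to reject H0"`.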

So let's move ahead and compute a z-statistic, or z-score. Our z equals the sample proportion in 2015 minus the sample proportion in 2000, all divided by the standard deviation of the sampling distribution of the difference between the 2015 and 2000 sample proportions.

This will be approximately equal, because we can calculate the numerator exactly but have to estimate the denominator. For the numerator: in 2015 we have 228 cases out of 600, so that's 228/600, and in 2000 we have 132 cases out of 400, so we subtract 132/400.

All of that is divided by a square root. Under the radical we use the combined proportion, which we can write as p-hat sub c. As we've discussed in previous videos, we use the combined proportion because a hypothesis test assumes the null hypothesis is true. If the null hypothesis is true, there's no difference between the true proportions in 2015 and 2000.

So to get a better estimate of that common proportion, we combine our samples. The combined sample size is 600 plus 400, or 1,000. The number of cases of myopia is 228 plus 132, or 360. That gives 360/1000 = 0.36, and we use it when estimating the standard deviation of the sampling distribution.
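The pooled proportion is total cases divided by total sample size. Using `fractions.Fraction` from Python's standard library keeps the arithmetic exact:

```python
from fractions import Fraction

# Counts from the two studies
cases_2015, n_2015 = 228, 600
cases_2000, n_2000 = 132, 400

# Under H0 both years share one true proportion, so pool the counts.
p_c = Fraction(cases_2015 + cases_2000, n_2015 + n_2000)
print(p_c, float(p_c))  # 9/25 0.36
```

The exact fraction 360/1000 reduces to 9/25, which is 0.36.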

So under the radical we have 0.36 times (1 minus 0.36), which is 0.36 times 0.64, divided by the 2015 sample size of 600, plus 0.36 times 0.64 divided by the 2000 sample size of 400.
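It may help to collect these pieces into one formula. In this sketch, x and n denote the number of myopia cases and the sample size for each year. That notation is introduced here and is not used in the video:

```latex
z = \frac{(\hat{p}_{2015} - \hat{p}_{2000}) - 0}
         {\sqrt{\dfrac{\hat{p}_c(1 - \hat{p}_c)}{n_{2015}} + \dfrac{\hat{p}_c(1 - \hat{p}_c)}{n_{2000}}}},
\qquad
\hat{p}_c = \frac{x_{2015} + x_{2000}}{n_{2015} + n_{2000}}
```

Substituting x_2015 = 228, n_2015 = 600, x_2000 = 132, and n_2000 = 400 gives p̂_c = 0.36 and reproduces the expression described above. The "- 0" in the numerator makes explicit that the null hypothesis claims a difference of zero.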

Before I get my calculator out, I can simplify the numerator. 228 divided by 6 is 38, so 228/600 is 0.38. And 132 divided by 4 is 33, so 132/400 is 0.33. So the numerator is 0.38 minus 0.33, which is 0.05.
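The same simplification can be checked exactly with `fractions.Fraction` from Python's standard library:

```python
from fractions import Fraction

p_hat_2015 = Fraction(228, 600)  # reduces to 19/50, i.e. 0.38
p_hat_2000 = Fraction(132, 400)  # reduces to 33/100, i.e. 0.33

# Numerator of the z-statistic: observed difference in sample proportions.
difference = p_hat_2015 - p_hat_2000
print(difference, float(difference))  # 1/20 0.05
```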

Now I can put this into my calculator. The quantity under the radical is 0.36 times 0.64 divided by 600, plus 0.36 times 0.64 divided by 400. That is 0.000384 + 0.000576 = 0.00096, and its square root is about 0.031. Dividing 0.05 by that gives approximately 1.61.
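The calculator step can be reproduced with Python's standard `math` module. This sketch hard-codes the counts from the two studies:

```python
import math

n_2015, n_2000 = 600, 400
p_c = 0.36                      # pooled proportion, 360 / 1000
diff = 228 / 600 - 132 / 400    # observed difference, about 0.05

# Estimated standard deviation of p_hat_2015 - p_hat_2000, assuming H0
# (a common proportion p_c in both years).
se = math.sqrt(p_c * (1 - p_c) / n_2015 + p_c * (1 - p_c) / n_2000)
z = diff / se
print(f"se = {se:.4f}, z = {z:.2f}")
```

Unrounded, z comes out to roughly 1.614, which the video rounds to 1.61.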

So z is approximately 1.61. Here is one way to interpret this. Our sample proportions in 2015 and 2000 differ by 0.05, and that difference is about 1.61 standard deviations above the mean of the sampling distribution. That holds if we assume the null hypothesis is true, in which case the mean is 0.

From this, we can calculate our p-value. Remember, the p-value is the probability of getting a z-score at least this large, that is, greater than or equal to 1.61, assuming the null hypothesis is true. Because we've standardized to z, we can look at the standard normal distribution. We want the area under the curve to the right of z = 1.61, which is 1.61 standard deviations above the mean. That area is our p-value.

To find that area, we can use a z table. This z table gives the cumulative area to the left of a given z-score, so we have to subtract the table value from 1. Looking up 1.61, we get 0.9463.
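Instead of using a printed z table, we can compute the right-tail area with the complementary error function. For a standard normal Z, P(Z ≥ z) = ½·erfc(z/√2). This sketch uses only `math.erfc` from Python's standard library; `upper_tail_probability` is a name chosen here:

```python
import math


def upper_tail_probability(z: float) -> float:
    """P(Z >= z) for a standard normal Z, computed as 1 - Phi(z) via erfc."""
    return 0.5 * math.erfc(z / math.sqrt(2))


p_value = upper_tail_probability(1.61)
print(round(p_value, 4))
```

With z = 1.61, this agrees with the table calculation of 1 − 0.9463 = 0.0537. Using the unrounded z of about 1.614 gives a slightly smaller area, roughly 0.053. That is still above the 0.05 significance level.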

So the p-value is 1 minus 0.9463, which is 0.0537. Notice that this p-value is ever so slightly higher than our significance level. This is exactly why we set the significance level ahead of time. We don't want to be tempted to say, "Oh, I'm so close; let me just raise my significance level a little so that I can reject my null hypothesis and have something to tell my friends about."

No, that would not be good science, and it would not be good statistics. We have to be disciplined. Because our p-value is greater than our significance level, even if only by a small amount, we fail to reject our null hypothesis.

In the context of the question, this means there is not enough evidence to suggest that myopia is becoming more common over time. And we're done.
