yego.me

Hypothesis test for difference in proportions example | AP Statistics | Khan Academy


Nov 11, 2024

We are told that researchers suspect that myopia, or nearsightedness, is becoming more common over time. A study from the year 2000 showed 132 cases of myopia in 400 randomly selected people. A separate study from 2015 showed 228 cases in 600 randomly selected people.

So what we're going to do in this video is do a hypothesis test to see if we have evidence to support the researchers' suspicion that myopia is becoming more common over time. If at any point you are inspired, I encourage you to pause the video and try to work through things on your own. But here I go; I'm going to do it with you.

Let's just start off by setting our null and alternative hypotheses. So remember our null hypothesis: this would be the "no news here" hypothesis. So that would be that, contrary to their suspicions, myopia is not becoming more common over time. The way that we're measuring "more common over time" is we could look at the proportion of folks who have myopia in 2015 and compare that to the proportion in 2000. So our null hypothesis is that there's no difference, that the true proportion of folks who have myopia in 2015 is equal to the true proportion of folks who have myopia in 2000.

And then our alternative hypothesis—remember they suspect it's becoming more common over time. So that would be a situation where our true proportion in 2015 is greater than the true proportion in 2000. In this scenario, myopia would be becoming more common over time because 2015 happens after 2000.
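Written out in symbols, the two hypotheses just described look like this. The subscripted names p_2015 and p_2000 are notation assumed here for the true population proportions; the video only states them in words.

```latex
\begin{aligned}
H_0 &: p_{2015} = p_{2000} \\
H_a &: p_{2015} > p_{2000}
\end{aligned}
```

Because the alternative only points in one direction, this is a one-sided test, which is why only the upper tail is used for the p-value later on.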

So before we even go about testing our null hypothesis (seeing if we can reject it or not, which would suggest our alternative), we have to look at our conditions for inference, and we've done this many times before. You have your random condition, and it looks like we meet that because both samples, the 400 people from 2000 and the 600 people from 2015, were randomly selected. So that looks good.

Then you have your normal condition. To meet your normal condition, your number of successes and failures in each of the samples has to be at least 10. And we see that that is the case for the 2000 sample; we have 132 successes, so to speak (not that it's a success for someone to have myopia, but the way this has been constructed that would be a success), and then 400 minus 132, or 268, failures. Both of those numbers are greater than 10, and the same is true for the sample from 2015, with 228 successes and 600 minus 228, or 372, failures.

So we're meeting both of those. And then the last condition that we always talk about is the independence condition. Two ways to get there: either you are sampling with replacement or you feel good that your sample size is no more than 10% of the population. I think it is safe to say that, even with this larger sample of 600, there are more than 6,000 people out there. I think it's reasonable to say that we're meeting that independence condition, even though they're not making it explicit here. But it's good to always think about this.
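The normal and independence checks above can be sketched in a few lines of Python. This is only an illustration: the function names and the `samples` dictionary are made up for this sketch, and the random condition cannot be checked in code because it depends on how the studies were run.

```python
# Sample data from the two studies described above.
samples = {
    2000: {"cases": 132, "n": 400},
    2015: {"cases": 228, "n": 600},
}


def normal_condition_met(cases, n, minimum=10):
    """True when both successes and failures are at least `minimum`."""
    failures = n - cases
    return cases >= minimum and failures >= minimum


def min_population_for_ten_percent_rule(n):
    """Smallest population for which a sample of size n is at most 10% of it."""
    return 10 * n


for year, s in samples.items():
    ok = normal_condition_met(s["cases"], s["n"])
    needed = min_population_for_ten_percent_rule(s["n"])
    print(f"{year}: normal condition met = {ok}, population must be at least {needed}")
```

For the larger sample of 600, the 10% rule needs a population of at least 6,000, which matches the figure mentioned above.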

Now the next thing you want to do in a hypothesis test is set your significance level, your alpha. I'll set my significance level to 0.05. So we're now going to assume the null hypothesis and say, "Well, what is the probability of getting a difference between 2015 and 2000 that is at least as large as the one that we got?" If that probability is less than our significance level, then we would reject our null hypothesis, and that would suggest the alternative. If that probability is greater than our significance level, then we fail to reject the null hypothesis, and we fail to have evidence for the researchers' suspicion.

So let's move ahead with that. What we want to do is let's come up with a z-value or a z-score. So our z is going to be equal to the sample proportion in 2015 minus our sample proportion in 2000, all of that over the standard deviation of the sampling distribution of the difference between the sample proportions in 2015 and 2000.

Now this is going to be—and I will say approximately equal to—we can calculate this numerator exactly, but this denominator we are going to estimate. So this numerator is going to be—let's see—in 2015, I'll use some different colors: 2015 we have 228 cases out of 600, so it's 228 out of 600. And then in 2000 we have 132 cases out of 400. So minus 132 over 400.

All of that over the square root—and what we use in the denominator here under the radical sign is we use the combined proportion, and we could write that as p hat sub c. The reason why we use the combined proportion— we've talked about this in previous videos—is remember when we do a hypothesis test, we assume that our null hypothesis is true. If our null hypothesis is true, there's no difference between the proportions in 2015 and 2000.

So to get a better estimate of the true proportion, we should just add up our samples. Our sample size would be 600 plus 400. The number of cases of myopia would be 228 plus 132, which would get us to—what is this?—360 over 1000, which is equal to 0.36. We can use that inside the expression when we're trying to estimate our standard deviation of this sampling distribution.

So this is going to be 0.36 times 1 minus 0.36, which would be 0.64, over the sample size in 2015, which is 600, plus 0.36 times 0.64 over the sample size in 2000, which is equal to 400.
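As a rough check on the arithmetic so far, here is a small Python sketch of the combined proportion and the estimated standard deviation of the sampling distribution. The variable names are chosen for this sketch and are not from the video.

```python
import math

# Counts from the two studies.
cases_2015, n_2015 = 228, 600
cases_2000, n_2000 = 132, 400

# Combined (pooled) proportion: all cases over all people sampled.
p_pooled = (cases_2015 + cases_2000) / (n_2015 + n_2000)

# Estimated standard deviation of the difference in sample proportions,
# using the pooled proportion for both samples because we assume the null.
variance = p_pooled * (1 - p_pooled) / n_2015 + p_pooled * (1 - p_pooled) / n_2000
standard_error = math.sqrt(variance)

print(p_pooled)                  # 0.36
print(round(standard_error, 4))  # about 0.031
```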

Let's see—before I even get my calculator out, I think I can simplify this a little bit. 228 over 600: 228 divided by 6 is going to be equal to 38, so this would be 0.38. Let's see, 132 divided by 4 would be 33, so this would be 0.33. And so our entire numerator is going to be 0.05.

So now I could put this into my calculator, and I will get 0.05 divided by the square root of—let's see. I'm going to have 0.36 times 0.64 divided by 600 plus 0.36 times 0.64 divided by 400, which is going to get me approximately 1.61.
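The calculator step can also be reproduced in Python. This is a minimal sketch of the z-statistic with a pooled proportion; the function name `two_proportion_z` is hypothetical and exists only for this example.

```python
import math


def two_proportion_z(cases_a, n_a, cases_b, n_b):
    """z-statistic for p_a - p_b, with the standard error from the pooled proportion."""
    p_a = cases_a / n_a
    p_b = cases_b / n_b
    p_pooled = (cases_a + cases_b) / (n_a + n_b)
    standard_error = math.sqrt(p_pooled * (1 - p_pooled) * (1 / n_a + 1 / n_b))
    return (p_a - p_b) / standard_error


# 2015 sample first, 2000 sample second, to match the alternative hypothesis.
z = two_proportion_z(228, 600, 132, 400)
print(round(z, 2))  # 1.61
```

Factoring the pooled term out of both fractions gives the same value as adding the two fractions separately; it is just a slightly shorter way to write it.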

So this is going to be approximately 1.61. One way to think about it is, the difference that we got between our sample proportions between 2015 and 2000 of 0.05—that is 1.61 standard deviations above our mean of our sampling distribution if we assume that the null hypothesis is true.

From this, we can calculate our p-value. Remember our p-value is equal to the probability that our z-score is at least that big, greater than or equal to 1.61. One way to think about it: if we look at the sampling distribution—or really we could just look at any normal distribution—now since we have normalized for z, we're looking at 1.61 standard deviations above the mean. So z is equal to 1.61, and we're thinking about this area right over here—that would be our p-value.

To help us with that, we can get out a z table, and we see this z table gives us the cumulative area up to some z-score. We would just have to—whatever this gives us—we would just have to do 1 minus that. If we go to 1.61, we get 0.9463.

So it'll be 1 minus 0.9463, which is equal to 0.0537. Notice this p-value is ever so slightly higher than our significance level. But this is why we want to set our significance level ahead of time. We don't want to get tempted to say, "Oh, I'm so close; let me just raise my significance level a little bit more so that I can reject my null hypothesis, and then I can have something that I can tell my friends about."
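If you don't have a z table handy, the same upper-tail area can be approximated with Python's standard library. This sketch uses the complementary error function, which is a different route from the table lookup in the video; the function name `upper_tail_p` is made up here.

```python
import math


def upper_tail_p(z):
    """P(Z >= z) for a standard normal variable, via the complementary error function."""
    return 0.5 * math.erfc(z / math.sqrt(2))


# Using the rounded z-score of 1.61, as in the table lookup.
p_value = upper_tail_p(1.61)
print(round(p_value, 4))  # 0.0537
```

Feeding in the unrounded z-score instead of 1.61 gives a slightly smaller p-value, but it is still above 0.05, so the conclusion does not change.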

No, that would not be good science. That would not be good statistics. We have to be disciplined. So here, because our p-value is greater than our significance level, even though it exceeds it by only a very small amount, we fail to reject our null hypothesis.
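The decision rule itself is simple enough to write down as code. This is just an illustrative sketch, with a hypothetical `decide` function and the significance level fixed at 0.05 before looking at the data, as discussed above.

```python
def decide(p_value, alpha=0.05):
    """Reject the null only when the p-value is strictly below alpha."""
    return "reject H0" if p_value < alpha else "fail to reject H0"


print(decide(0.0537))  # fail to reject H0
```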

Another way to think about it, in terms of the context of the question: we can say that there is not enough evidence to suggest that myopia is becoming more common over time. And we're done.
