
Introduction to the chi-square test for homogeneity | AP Statistics | Khan Academy


Nov 11, 2024

We've already been introduced to the chi-squared statistic in other videos. Now, we're going to use it for a test for homogeneity. In everyday language, homogeneity means how similar things are, and that's what we're essentially going to test here.

We're going to look at two different groups and see whether the distributions of a certain variable are similar across those groups or not. So, the question we're going to think about together in this video is this: let's say we were thinking about left-handed versus right-handed people. We're wondering, do they have the same preferences for subject domains? Are they equally inclined toward STEM (science, technology, engineering, math), toward the humanities, or do they have no preference?

We can set up our null and alternative hypotheses. Our null hypothesis is that there is no difference in the distribution of subject preference between left-handed and right-handed people. And the alternative hypothesis? That there is a difference.

How would we go about testing this? Well, we've done hypothesis testing many times in many videos already, but here we're going to sample from two different groups. Let's say that this is the population of right-handed folks and this is the population of left-handed folks. Let's say, from that population of right-handed folks, I take a sample of 60, and then I do the same thing for the left-handed folks. These don't even have to be the same sample sizes, so for the left-handed folks, let's say I sample 40.

Here is the data that I actually collect. For those 60 right-handed folks, 30 of them prefer the STEM subjects: science, technology, engineering, math. 15 preferred humanities, and 15 were indifferent; they liked them equally. Then, for the 40 left-handed folks, I got 10 preferring STEM, 25 preferring humanities, and five viewed them equally.

Then you see the total number of right-handed folks, the total number of left-handed folks, and then you have the total number from both groups that preferred STEM, total number from both groups that preferred humanities, total number from both groups that had no preference.

So, let's just start thinking about what the expected data would be if we're assuming that the null hypothesis is true, that there's no difference in preference between right and left-handed folks. This is the right-handed column; this is the left-handed column.

Assuming that the null hypothesis is true, that there's no difference between right and left-handed people in terms of their preference, our best estimate of what the distribution of preference would be in the population generally would come from this total column. Since we're assuming no difference, we would assume that in either group, 40 out of every 100 would prefer STEM, or 40 percent. 40 percent would prefer humanities, and 20 percent would have no preference.

Our expected count would be that 40 percent of the 60 right-handed folks prefer STEM. So, what's 40 percent of 60? 0.4 times 60 is 24. Similarly, we would expect 24 to prefer humanities; 0.4 times 60 is 24 again. And then we would expect 20 percent of the right-handed group to have no preference, and 20 percent of 60 is 12. These once again add up to 60.

For the left-handed folks, we would go through the same process. We would expect that 40 percent of them prefer STEM; 40 percent of 40 is 16. For humanities, again 40 percent of 40 is 16, and 20 percent of 40 is 8. All of these add up to 40.
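The expected counts above can be reproduced with a few lines of Python. This is only a sketch: the dictionary layout and the variable names (`observed`, `expected`, and so on) are my own choices, not something from the video. It uses the equivalent form "row total times group total divided by grand total" for each cell.

```python
# Observed counts from the two samples described in the text.
observed = {
    "STEM":          {"right": 30, "left": 10},
    "humanities":    {"right": 15, "left": 25},
    "no preference": {"right": 15, "left": 5},
}

# Sample size of each group (the column totals) and the grand total.
groups = ["right", "left"]
group_totals = {g: sum(row[g] for row in observed.values()) for g in groups}
grand_total = sum(group_totals.values())

# Under the null hypothesis, each group follows the pooled distribution,
# so expected count = (row total) * (group total) / (grand total).
expected = {}
for subject, row in observed.items():
    row_total = sum(row.values())
    expected[subject] = {g: row_total * group_totals[g] / grand_total for g in groups}

for subject, row in expected.items():
    print(subject, row)
```

The printed values should match the 24, 24, 12 and 16, 16, 8 worked out by hand.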

Once you calculate these expected values, it's a good time to make sure you're meeting your conditions for conducting a chi-squared test. The first is the random condition; these need to be truly random samples. So hopefully we met that condition. The second is that the expected value for each of these data points has to be at least five, and we have met that condition; the smallest expected value here is eight.

The last condition is the independence condition: that we are either sampling with replacement, or if we're not sampling with replacement, we have to feel good that our samples are no more than 10 percent of the population. So, let's assume that that is the case as well, and now we're ready to calculate our chi-squared statistic.
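Two of these conditions are easy to check mechanically. The sketch below is illustrative only (the variable names are mine); the random condition depends on how the data were collected and cannot be verified from the numbers alone.

```python
# Expected counts computed earlier and the two sample sizes from the text.
expected_counts = [24, 24, 12, 16, 16, 8]
sample_sizes = {"right": 60, "left": 40}

# Large counts condition: every expected count must be at least 5.
large_counts_ok = all(count >= 5 for count in expected_counts)

# 10 percent condition (when sampling without replacement): each sample
# should be no more than 10 percent of its population, so the population
# must be at least 10 times the sample size.
min_population = {group: 10 * n for group, n in sample_sizes.items()}

print("large counts condition met:", large_counts_ok)
print("minimum population sizes:", min_population)
```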

Our chi-squared statistic is going to be the sum, over all of the data points, of the difference between what we got and the expected, squared, divided by the expected. So, the first term is (30 minus 24) squared divided by the expected, 24. We'll do it for all six of these data points.

Then I will go to the next one. So, this is going to be plus (10 minus 16) squared over the expected, 16. Then I would look at the next data point and its expected value, and I would get plus (15 minus 24) squared over the expected, 24.

I'm running out of colors! Then we would look at the next two numbers, and we would say plus (25 minus 16) squared divided by the expected, 16. Then, looking at the next two, plus (15 minus 12) squared over the expected, 12.

Last but not least, let me find a color I haven't used. We would look at the last pair, and we would say plus (5 minus 8) squared over the expected, 8.
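The video leaves the arithmetic to a calculator. Here is a minimal sketch that adds up the six terms; the list ordering and the names `observed`, `expected`, and `chi_squared` are my own.

```python
# Observed and expected counts, paired cell by cell:
# right-handed (STEM, humanities, no preference), then left-handed.
observed = [30, 15, 15, 10, 25, 5]
expected = [24, 24, 12, 16, 16, 8]

# Each term is (observed - expected)^2 / expected.
terms = [(o - e) ** 2 / e for o, e in zip(observed, expected)]
chi_squared = sum(terms)

print(terms)        # [1.5, 3.375, 0.75, 2.25, 5.0625, 1.125]
print(chi_squared)  # 14.0625
```

The two humanities cells contribute the most, which is where the observed counts sit farthest from what the null hypothesis predicts.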

Now, once you get that value for the chi-square statistic, the next question is what are the degrees of freedom? A simple rule of thumb is to just look at your data and think about the number of rows and the number of columns. We have three rows and two columns, and so your degrees of freedom are going to be the number of rows minus one (three minus one) times the number of columns minus one (two minus one).

So, this is going to be equal to 2 times 1, which is equal to 2. Here is why that makes intuitive sense: if you knew two of these data points and all of the totals, you could figure out the other data points. If you knew two data points in a column and you knew the column total, you could figure out the third. And once you have that whole column, the row totals give you every data point in the other column.
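To make that intuition concrete, here is a small sketch that starts from only two cells plus the totals and recovers the other four. Which two cells are treated as "known" is my choice for illustration; any two cells in the same column would work the same way.

```python
# Totals from the data table.
row_totals = {"STEM": 40, "humanities": 40, "no preference": 20}
col_totals = {"right": 60, "left": 40}

# Suppose we only know two cells in the right-handed column.
known_right = {"STEM": 30, "humanities": 15}

# Degrees of freedom: (rows - 1) * (columns - 1).
df = (len(row_totals) - 1) * (len(col_totals) - 1)

# The third right-handed cell follows from the column total...
right = dict(known_right)
right["no preference"] = col_totals["right"] - sum(known_right.values())

# ...and every left-handed cell follows from its row total.
left = {subject: row_totals[subject] - right[subject] for subject in row_totals}

print("degrees of freedom:", df)
print("right-handed:", right)
print("left-handed:", left)
```

Only two numbers were free to vary; the rest were forced by the totals, which is what two degrees of freedom means here.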

That's why this rule of thumb works. The number of rows minus one times the number of columns minus one gives you your degrees of freedom. Now, given this chi-squared statistic, which I haven't calculated here but which you could type into a calculator and figure out, and these degrees of freedom, we could then figure out the p-value.

We could figure out the probability of getting a chi-squared statistic this extreme or more extreme. If this p-value is less than our significance level, which we should have set ahead of time, then we would reject the null hypothesis, which would suggest the alternative. If it is not less than our significance level, then we fail to reject the null hypothesis.
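As a sketch of that last step: the six terms listed earlier sum to 14.0625 (my calculation, not a number from the video). With exactly 2 degrees of freedom, the upper-tail probability of a chi-squared distribution has the closed form exp(-x / 2), so the standard library is enough. The significance level of 0.05 is an assumption for illustration; the video does not state one.

```python
import math

chi_squared = 14.0625  # sum of the six (observed - expected)^2 / expected terms
df = 2                 # (3 - 1) * (2 - 1)
alpha = 0.05           # assumed significance level, not given in the video

# For df = 2 only, P(X >= x) = exp(-x / 2). Other degrees of freedom need
# a chi-squared table or a statistics library.
p_value = math.exp(-chi_squared / 2)

reject_null = p_value < alpha
print(f"p-value = {p_value:.6f}")
print("reject the null hypothesis:", reject_null)
```

The p-value comes out to roughly 0.0009, well below 0.05, so under that assumed significance level these samples would lead us to reject the null hypothesis. For tables with other degrees of freedom, a function such as `scipy.stats.chi2.sf` could be used in place of the closed form.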
