Two-sample t test for difference of means | AP Statistics | Khan Academy
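Kaito grows tomatoes in two separate fields. When the tomatoes are ready to be picked, he is curious as to whether the sizes of his tomato plants differ between the two fields. He takes a random sample of plants from each field and measures the heights of the plants. Here is a summary of the results: the sample from Field A has a mean of 1.3, a sample standard deviation of 0.5, and a sample size of 22; the sample from Field B has a mean of 1.6, a sample standard deviation of 0.3, and a sample size of 24.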
So, what I want you to do is pause this video and conduct a two-sample t-test here. Let's assume that all of the conditions for inference are met: the random condition, the normal condition, and the independence condition. Also, let's assume that we are working with a significance level of 0.05. So pause the video and conduct the two-sample t-test to see whether there's evidence that the sizes of tomato plants differ between the fields.
All right, now let's work through this together. So, like always, let's first construct our null hypothesis. That's going to be the situation where there is no difference between the mean sizes. So, that would be that the mean size in Field A is equal to the mean size in Field B.
Now, what about our alternative hypothesis? Well, he wants to see whether the sizes of his tomato plants differ between the two fields. He's not saying whether A is bigger than B or whether B is bigger than A. So, his alternative hypothesis reflects his suspicion that the mean of A is not equal to the mean of B; that is, that they differ.
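Written out, the two hypotheses just described look like this. The symbols are notation chosen here for illustration: \mu_A and \mu_B stand for the true mean plant heights in Field A and Field B.

```latex
H_0 : \mu_A = \mu_B
\qquad\text{versus}\qquad
H_a : \mu_A \neq \mu_B
```

Because the alternative uses "not equal," this is a two-sided test, so both tails of the t-distribution count toward the p-value.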
To do this two-sample t-test, we assume the null hypothesis. Remember, we're assuming that all of our conditions for inference are met. Then, we want to calculate a t-statistic based on this sample data that we have.
Our t-statistic is going to be equal to the difference between the sample means, all of that over our estimate of the standard deviation of the sampling distribution of the difference of the sample means. That estimate will be the square root of the sample standard deviation from sample A squared over the sample size from A, plus the sample standard deviation from sample B squared over the sample size from B.
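As a formula, the t-statistic described in words above can be written as follows. The symbols are notation assumed here: \bar{x}_A and \bar{x}_B are the sample means, s_A and s_B the sample standard deviations, and n_A and n_B the sample sizes.

```latex
t = \frac{\bar{x}_A - \bar{x}_B}{\sqrt{\dfrac{s_A^2}{n_A} + \dfrac{s_B^2}{n_B}}}
```

The numerator has no extra term because, under the null hypothesis, the difference between the true means is assumed to be zero.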
Let's see, we have all the numbers here to calculate it. The numerator is going to be equal to 1.3 minus 1.6, all of that over the square root of... let's see, the sample standard deviation from sample A is 0.5. If you square that, you're going to get 0.25, and that's going to be over the sample size from A, so over 22, plus 0.3 squared, which is 0.09, over the sample size from B, so over 24.
The numerator is just going to be negative 0.3. So, negative 0.3 divided by the square root of 0.25 divided by 22 plus 0.09 divided by 24 gets us approximately negative 2.44.
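The same arithmetic can be checked with a few lines of Python. This is a minimal sketch using only the summary numbers from the walkthrough; the variable names are chosen here for illustration.

```python
import math

# Summary statistics quoted in the walkthrough
mean_a, sd_a, n_a = 1.3, 0.5, 22   # Field A
mean_b, sd_b, n_b = 1.6, 0.3, 24   # Field B

# Estimated standard deviation of the difference of the sample means
standard_error = math.sqrt(sd_a**2 / n_a + sd_b**2 / n_b)

# t-statistic under the null hypothesis of no difference in means
t_stat = (mean_a - mean_b) / standard_error

print(round(t_stat, 2))  # -2.44
```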
If you think about a t-distribution, we'll use our calculator to figure out this probability. So, this is a t-distribution right over here. This would be the assumed mean of our t-distribution. We got a result that is negative; we get a t-statistic of negative 2.44.
So, we're right over here. This is negative 2.44. We want to find the probability, under this t-distribution, of getting something at least this extreme. That would be this area in the lower tail, and it would also be this area in the upper tail, for results 2.44 or more above the mean.
What I could do is use my calculator to figure out this probability right over here and then multiply that by 2 to get this one as well. The probability of getting a t-value, I guess I could say, where its absolute value is greater than or equal to 2.44 is going to be approximately equal to... I'm going to go to 2nd, Distribution, and I'm going to the cumulative distribution function for our t-distribution, and select that.
I'll first find this lower-tail probability here, and then I'm just going to multiply it by 2. The lower bound is a very, very, very negative number; you could view that as functionally negative infinity. The upper bound is negative 2.44.
Now, what are our degrees of freedom? Well, if we take the conservative approach, it'll be the smaller of the two sample sizes minus one. The smaller of the two sample sizes is 22, so 22 minus 1 is 21.
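For comparison, here is a small sketch that computes the conservative degrees of freedom next to the Welch-Satterthwaite approximation. The Welch-Satterthwaite formula is not part of this walkthrough; it is included only as an assumption about what statistical software commonly uses in place of the conservative rule, and the variable names are illustrative.

```python
# Summary statistics quoted in the walkthrough
sd_a, n_a = 0.5, 22   # Field A
sd_b, n_b = 0.3, 24   # Field B

# Conservative approach: smaller sample size minus one
df_conservative = min(n_a, n_b) - 1

# Welch-Satterthwaite approximation (not used in the walkthrough)
var_a = sd_a**2 / n_a
var_b = sd_b**2 / n_b
df_welch = (var_a + var_b) ** 2 / (
    var_a**2 / (n_a - 1) + var_b**2 / (n_b - 1)
)

print(df_conservative)      # 21
print(round(df_welch, 1))
```

The conservative choice gives fewer degrees of freedom, which means heavier tails and a somewhat larger p-value, so it is the more cautious of the two.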
So, I put 21 in there, and now I can paste, and I get that number right over there. This just gives me the probability of getting something lower than negative 2.44, and I also want to think about the probability of getting something 2.44 or more above the mean of our t-distribution. So, I multiply that by 2, and that is going to be equal to approximately 0.024.
So, approximately 0.024. What I want to do then is compare this to my significance level. This right over here is our p-value, and you can see very clearly that it is less than our significance level of 0.05.
Because of that, we say, "Hey, assuming the null hypothesis is true, we got a result whose probability is pretty low, below our threshold." So, we are going to reject our null hypothesis, which suggests the alternative hypothesis: that there is indeed a difference between the mean sizes of the tomato plants in the two fields.
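The calculator step can be approximated without a calculator. The sketch below is not the calculator's method: it numerically integrates the t-distribution's density with Simpson's rule, using only the Python standard library. The function names `t_pdf` and `two_sided_p` are hypothetical helpers written for this example.

```python
import math

def t_pdf(x, df):
    """Density of the t-distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (
        math.sqrt(df * math.pi) * math.gamma(df / 2)
    )
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def two_sided_p(t, df, steps=10_000):
    """Two-sided tail probability P(|T| >= |t|); steps must be even."""
    upper = abs(t)
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central = total * h / 3          # P(0 <= T <= |t|) by Simpson's rule
    one_tail = 0.5 - central         # area beyond |t| in one tail
    return 2 * one_tail              # both tails, by symmetry

p_value = two_sided_p(-2.44, 21)
alpha = 0.05
print(p_value)            # should land near the 0.024 found in the walkthrough
print(p_value < alpha)    # True means we reject the null hypothesis
```

Doubling the one-tail area works because the t-distribution is symmetric about zero, which is the same reasoning used for multiplying the calculator result by 2.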