Sampling distribution of the difference in sample means | AP Statistics | Khan Academy
What we're going to do in this video is explore the sampling distribution for a difference in sample means, and we'll use this example right over here. So it tells us a large bakery makes thousands of cupcakes daily in two shifts: shift A and shift B.
Suppose that on average, cupcakes from shift A weigh 130 grams, with a standard deviation of 4 grams. For shift B, the mean and standard deviation are 125 grams and 3 grams, respectively. Assume independence between shifts. Every day, the bakery takes a simple random sample of 40 cupcakes from each shift. They calculate the mean weight for each sample, then look at the difference A minus B between the sample means.
Find the probability that the mean weights from the samples are more than six grams apart from each other. So I'm actually not going to tell you immediately to pause this video and try to work through this on your own. First, I'm going to think about how we could break this down, and then I'll ask you to pause and try to tackle each of those parts by itself.
So, in order to tackle this eventual question, we're going to have to think about the mean of the sampling distribution for the difference in sample means: sample mean from group A minus sample mean for group B. We're going to have to think about the standard deviation of the sampling distribution for the difference in sample means, and we're going to think about if this distribution is normal.
If we're able to figure out these three things, then we just have to figure out how many standard deviations away from the mean a six-gram difference is. We could then use a standard z-table to figure out the probability. So now I encourage you to pause this video and try to tackle this first part: what is the mean of the sampling distribution for the difference in sample means?
All right, now let's work through this together. The mean of the sampling distribution for the difference in sample means, and we have seen this before, is going to be equal to the difference between the means of the sampling distributions for each of the sample means. So that mean minus this mean.
We also know that the mean of the sampling distribution for each of these sample means, that's just going to be the mean of the population that we are sampling from. So this mean right over here is just going to be the population mean for shift A, which is going to be 130 grams. I'll just write that there.
Then the mean of the sampling distribution for the sample means from shift B, we can see that that's just going to be the population mean for shift B, which is right over here, so minus 125 grams. And of course, this is just going to be equal to 5 grams. So we have answered the first part: we know the mean of the sampling distribution of the difference in sample means.
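Written out as a formula, the step above looks like the following. The symbols are an assumption for illustration, since the video only uses words: the subscripted mu on the left is the mean of the sampling distribution of the difference, and mu_A and mu_B are the population means for the two shifts.

```latex
\mu_{\bar{x}_A - \bar{x}_B}
  = \mu_{\bar{x}_A} - \mu_{\bar{x}_B}
  = \mu_A - \mu_B
  = 130\ \text{g} - 125\ \text{g}
  = 5\ \text{g}
```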
Now, what about the standard deviation? For that, let's actually think about variances, because the math is a little bit easier with variances, and then from that we can derive standard deviations. We know that the variance of the sampling distribution for the difference in sample means, assuming that your two samples are independent and that you're sampling with replacement, is going to be the sum of the variances of the sampling distributions for each of the sample means.
So it's going to be that plus this right over here. Now, you might be saying, wait, we're not sampling with replacement. Well, we also know that if each of the sample sizes is less than 10 percent of its population, then the difference is negligible, and so we could still use this formula.
You could see that the simple random sample here is 40 from each shift, and they say that a large bakery makes thousands of cupcakes daily in two shifts. So even if it was a thousand, ten percent of that would be a hundred. This is less than ten percent, so we meet that condition. We can use the same formula that you would use if you were sampling with replacement.
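As a quick check of the 10 percent condition, here is a minimal sketch. It assumes the most conservative reading of "thousands", namely 1,000 cupcakes per shift; that figure and the variable names are illustrative and not given in the problem.

```python
# Sample size taken from each shift, as stated in the problem.
sample_size = 40

# Assumption: "thousands" is read conservatively as 1,000 cupcakes per shift.
assumed_population_per_shift = 1000

# 10 percent condition: the sample should be less than 10% of the population.
threshold = assumed_population_per_shift / 10
meets_condition = sample_size < threshold

print(threshold, meets_condition)
```

Any larger population only makes the condition easier to meet, so the conclusion does not depend on the exact count.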
So this first variance right over here of the sampling distribution for the sample means from shift A, this is going to be equal to the variance of shift A, the population variance of shift A, divided by your sample size. Then this over here is going to be the same thing for shift B: it's going to be the variance of shift B divided by your sample size.
So this is going to be equal to what? Well, the variance from shift A is going to be the square of the standard deviation from shift A. The standard deviation's right over there, and so that's going to be 16. We could write grams squared if we want to keep the units there, and then we're going to divide by the sample size.
We know that the sample size in each case is 40 cupcakes at a time for each sample. And then for shift B, we know that the standard deviation, the population standard deviation for shift B is 3 grams. You square that, and you get 9 grams squared. A gram squared is kind of an interesting idea, but that's what the units are working out to be right now, and our sample size is still equal to 40.
And so this is going to be equal to, let's see, 16 plus 9 is 25, over a common denominator of 40. So it's 25 over 40, which is the same thing as 5/8, 5/8 of a gram squared, which is a little bit strange for units. But this now tells us what the standard deviation is going to be because it's just going to be the square root of all of this business.
So the standard deviation of the sampling distribution for the difference in sample means over here is going to be the square root of 5/8, and now of course the units are back to grams, which makes sense. This is approximately going to be equal to... get my calculator out. 5 divided by 8 equals... and then we take the square root of that, and it's going to be approximately 0.79.
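The arithmetic above can be reproduced in a few lines. This is just a sketch of the same calculation; the variable names are made up for illustration.

```python
import math

# Population standard deviations (grams) and sample sizes from the problem.
sigma_a, sigma_b = 4.0, 3.0
n_a, n_b = 40, 40

# Variance of each sampling distribution of the sample mean: sigma^2 / n.
var_mean_a = sigma_a**2 / n_a  # 16 / 40
var_mean_b = sigma_b**2 / n_b  # 9 / 40

# For independent samples, the variances add for the difference of the means.
var_diff = var_mean_a + var_mean_b  # 25 / 40 = 5/8, in grams squared

# The standard deviation is the square root, back in grams.
sd_diff = math.sqrt(var_diff)

print(round(var_diff, 3), round(sd_diff, 2))
```

Note that the variances are added even though the means are subtracted; subtracting them would understate the spread of the difference.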
So the next question, before we try to figure out the probability, is: are we dealing with a normal distribution here when we think about the sampling distribution for the difference in sample means? So I encourage you to pause the video again and think about that.
So there are two ways that we can assume that the sampling distribution for the difference in sample means is normal. If the original populations that each of the sample means are being calculated from are normal, then that means that the sampling distribution for each of the sample means is going to be normal, and that means that the sampling distribution of the difference is going to be normal.
Now, we don't know for a fact that the weights of the cupcakes from each shift are normally distributed, but we also know that the sampling distribution of the difference in sample means can be modeled as being approximately normal if the two sample sizes are greater than or equal to 30. We know that each of these sample sizes is definitely greater than or equal to 30; they are 40.
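To see the sample-size argument in action, here is a small simulation sketch. It assumes, purely for illustration, that the cupcake weights in each shift follow a uniform (clearly non-normal) distribution with the stated means and standard deviations; the problem does not say what the real population shapes are, and the names and the number of simulated days are made up.

```python
import math
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable


def uniform_population(mean, sd):
    """Return a sampler for a uniform population with the given mean and SD."""
    half_width = sd * math.sqrt(3)  # a uniform on [m - h, m + h] has SD h / sqrt(3)
    return lambda: random.uniform(mean - half_width, mean + half_width)


# Assumed (hypothetical) population shapes for the two shifts.
draw_a = uniform_population(130, 4)
draw_b = uniform_population(125, 3)

n = 40       # cupcakes sampled from each shift per day
days = 5000  # number of simulated days

diffs = []
for _ in range(days):
    mean_a = statistics.fmean(draw_a() for _ in range(n))
    mean_b = statistics.fmean(draw_b() for _ in range(n))
    diffs.append(mean_a - mean_b)

sim_mean = statistics.fmean(diffs)
sim_sd = statistics.stdev(diffs)

# For a normal distribution, roughly 68% of values fall within one SD of the mean.
within_one_sd = sum(abs(d - sim_mean) <= sim_sd for d in diffs) / days

print(sim_mean, sim_sd, within_one_sd)
```

Under these assumptions the simulated mean and standard deviation should land close to 5 grams and 0.79 grams, and the share of differences within one standard deviation should be close to the 68 percent you would expect from a normal curve.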
So that tells us that the sampling distribution of the difference in sample means is also approximately normal. We've established the things that we need to then calculate the probability. So I encourage you to pause the video and see if you can use that information to calculate that probability, and we will then do that in the next video.