Conditions for valid t intervals | Confidence intervals | AP Statistics | Khan Academy


Nov 11, 2024

Flavio wanted to estimate the mean age of the faculty members at her large university. She took an SRS, or simple random sample, of 20 of the approximately 700 faculty members, and each faculty member in the sample provided Flavio with their age. The data were skewed to the right, with a sample mean of 38.75. She's considering using her data to make a confidence interval to estimate the mean age of faculty members at her university.

Which conditions for constructing a t-interval have been met? So pause this video and see if you can answer this on your own.

Okay, now let's try to answer this together. There are approximately 700 faculty members, and she's trying to estimate the population mean, the mean age. She can't talk to all 700, so she takes a simple random sample of 20. So n is equal to 20 here. From these 20, she calculates a sample mean of 38.75.

Now, ideally, she wants to construct a t-interval, a confidence interval using the t-statistic. That interval would be the sample mean plus or minus the critical value t* times the sample standard deviation divided by the square root of n: x̄ ± t* · s/√n. We use a t-statistic, a t-table, and a t-distribution when we are trying to create confidence intervals for means where we don't know the population standard deviation σ, and therefore don't know the standard deviation of the sampling distribution, σ/√n. Instead, we compute the sample standard deviation s and use it as an estimate.
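A minimal Python sketch of the interval formula x̄ ± t* · s/√n, assuming the critical value t* has already been looked up in a t-table for df = n − 1 and is passed in. The function name `t_interval` and the four ages in the usage line are hypothetical, chosen only to illustrate the arithmetic; they are not Flavio's data, which the problem describes only through its mean and shape.

```python
import math
import statistics

def t_interval(sample, t_star):
    """Return (lower, upper) for a one-sample t-interval: x̄ ± t* · s / √n.

    t_star is the critical value from a t-table for df = n - 1 at the chosen
    confidence level; it is an input here, not computed.
    """
    n = len(sample)
    if n < 2:
        raise ValueError("need at least two observations to compute s")
    x_bar = statistics.mean(sample)
    s = statistics.stdev(sample)  # sample standard deviation (divides by n - 1)
    margin = t_star * s / math.sqrt(n)
    return x_bar - margin, x_bar + margin

# Hypothetical sample of 4 ages; 3.182 is the 95% critical value for df = 3.
print(t_interval([30, 32, 34, 36], t_star=3.182))
```

Passing t* in keeps the sketch dependent only on the standard library; a statistics package could compute it instead.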

Now, for this interval to be valid, there are three conditions, just like what we saw when we thought about z-intervals. The first is that our sample is random. They tell us that she took a simple random sample of 20, so we know we are meeting that condition. And that's choice A: the data is a random sample from the population of interest.

So we can circle that in.

The next condition is the normal condition. For a t-interval, this condition is a little more involved, because we need to be able to assume that the sampling distribution of the sample means is roughly normal. There are a couple of ways to get there. The first is a sample size greater than or equal to 30: the central limit theorem tells us that, regardless of the shape of the population distribution, the sampling distribution of the sample mean will then be approximately normal.

She didn't meet that condition here: her sample size is only 20. So far, this isn't looking good.
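To see why sample size matters when the population is skewed, here is a small simulation sketch. It assumes an exponential population purely because that distribution is strongly skewed to the right (it is not a model of faculty ages), and it measures how skewed the sampling distribution of the sample mean is for several sample sizes. A normal distribution has a skewness of 0.

```python
import random
import statistics

def skewness(values):
    """Sample skewness: average cubed standardized deviation (0 for a symmetric shape)."""
    m = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - m) / sd) ** 3 for v in values) / len(values)

def simulate_sample_means(n, reps=10_000, seed=0):
    """Return `reps` sample means, each computed from n draws of a right-skewed population.

    Assumption: the population is exponential with mean 1, chosen only because it
    is strongly skewed to the right.
    """
    rng = random.Random(seed)
    return [statistics.fmean([rng.expovariate(1.0) for _ in range(n)]) for _ in range(reps)]

# Skewness of the simulated sampling distribution of x̄ for several sample sizes.
for n in (1, 5, 20, 30):
    print(n, round(skewness(simulate_sample_means(n)), 2))
```

For means of n exponential draws, theory puts this skewness at 2/√n, so it shrinks as n grows but is still clearly positive at n = 20. The cutoff of 30 is a rule of thumb, not a sharp boundary.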

Now, that's not the only way to meet the normal condition. With a sample size smaller than 30, we can still meet it if the original distribution of ages is normal, or even if it's approximately symmetric around the mean. But they tell us that the data were skewed to the right, with a sample mean of 38.75. That tells us the data in our sample are not symmetric, so the original distribution is unlikely to be normal.
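The decision rule for the normal condition, as described here, can be written as a small function. This is only a sketch: `normal_condition_met` is a hypothetical name, and whether the population looks normal or roughly symmetric is a judgment you would make from a plot of the sample, so it is passed in as a boolean rather than computed.

```python
def normal_condition_met(n, looks_normal_or_symmetric):
    """Rule of thumb for treating the sampling distribution of x̄ as roughly normal.

    - n >= 30: rely on the central limit theorem, whatever the population's shape.
    - n < 30: require that the population be normal or approximately symmetric
      (a judgment call, supplied by the caller).
    """
    if n >= 30:
        return True
    return looks_normal_or_symmetric

# Flavio's case: n = 20 and the data are skewed to the right.
print(normal_condition_met(20, looks_normal_or_symmetric=False))  # False
```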

Think about why. You could plausibly have faculty members who are 30 years older than 38.75, around 68.75, but you're very unlikely to have faculty members who are 30 years younger, around 8.75. That long tail toward older ages is what causes the skew to the right. So this one does not meet the normal condition. We can't feel confident that our sampling distribution of the sample means is going to be normal, so I'm not going to fill that one in.

Choice C: individual observations can be considered independent. There are two ways to meet this condition. One is to sample with replacement: after asking each faculty member their age, we say, "Hey, go back into the pool," and we might pick them again, until we get our sample of 20. It doesn't look like she sampled with replacement.

Even if you're sampling without replacement, the 10% rule says that as long as the sample size is less than or equal to 10% of the population, we're good. Here, 10% of the population is 70, since 70 is 10% of 700, and a sample of 20 is well under 70. So the individual observations can be considered independent.
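The independence check, including the 10% rule, fits in a few lines. `independence_condition_met` is a hypothetical helper written for this sketch; it compares 10 · n with the population size N using integers, so no floating-point rounding is involved.

```python
def independence_condition_met(n, population_size, with_replacement=False):
    """Independence check for a sample of size n from a population of size N.

    Sampling with replacement keeps draws independent. Without replacement,
    the 10% rule asks that n <= 0.10 * N, written here as 10 * n <= N.
    """
    if with_replacement:
        return True
    return 10 * n <= population_size

# Flavio's case: 20 faculty members out of about 700, sampled without replacement.
print(independence_condition_met(20, 700))  # True, since 200 <= 700
```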

Thus, we can actually meet that constraint as well.

So the main issue, where our t-interval might not be so good, is that we can't feel confident that the sampling distribution of the sample means is going to be normal.
