yego.me

Conditions for valid confidence intervals | Confidence intervals | AP Statistics | Khan Academy


5m read
·Nov 11, 2024

What we're going to do in this video is dig a little bit deeper into confidence intervals. In other videos, we compute them and even interpret them. Here we're going to make sure that we are making the right assumptions, so that we can have confidence in our confidence intervals and know that we are calculating them in the right way and in the right contexts.

So, as a bit of review, with confidence intervals we are trying to estimate some population parameter. Let's say it's a proportion, maybe the proportion of people who will vote for a certain candidate. We can't survey everyone, so we take a sample, and from that sample we calculate a sample proportion.

Using this sample proportion, we then calculate a confidence interval on either side of it. If we do this many, many times, we are very likely to get a different sample proportion each time: sample proportion one, sample proportion two, and so on. Not only will the center of our interval change from sample to sample, but the margin of error might change too, because we use the sample proportion to calculate it.
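The interval described above is the usual one-sample z-interval for a proportion, p_hat ± z* · sqrt(p_hat(1 − p_hat)/n). Here is a minimal Python sketch assuming that standard formula; the function name one_proportion_interval and the 540-out-of-1,000 poll are hypothetical. Notice that p_hat appears inside the square root, which is why the margin of error shifts along with the center.

```python
import math
from statistics import NormalDist


def one_proportion_interval(successes: int, n: int, confidence: float = 0.95) -> tuple[float, float]:
    """Return (lower, upper) for p_hat +/- z* * sqrt(p_hat * (1 - p_hat) / n)."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    p_hat = successes / n  # the sample proportion is the center of the interval
    # z* leaves (1 - confidence) / 2 of the standard normal in each tail (about 1.96 for 95%).
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # The standard error is estimated from p_hat, so the margin of error varies between samples.
    margin = z_star * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


# Hypothetical poll: 540 of 1,000 sampled voters favor the candidate.
lower, upper = one_proportion_interval(540, 1000)
print(f"95% interval: ({lower:.3f}, {upper:.3f})")  # prints 95% interval: (0.509, 0.571)
```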

The first condition that has to be true, before we can make any claims about this confidence interval with confidence, is that the sample is random. If you're trying to estimate the proportion of people who are going to vote for a certain candidate, but you only survey people at a senior community, or only people on a college campus, that would not be a truly random sample.

So, as with everything in statistics, you really want to make sure that you're dealing with a random sample, and take great care to get one. The second thing we have to assume is known as the normal condition. Remember, the whole basis behind confidence intervals is the assumption that the sampling distribution of the sample proportions has a roughly normal shape.

To justify assuming that it's roughly normal, we use the normal condition. The rule of thumb is that you would expect at least 10 successes and at least 10 failures per sample. For example, if your sample size were only 10 and the true proportion were 0.5, you wouldn't meet the normal condition, because you would expect only five successes and five failures in each sample.

Usually, though, when we're constructing confidence intervals, we don't know the true population parameter. So what we actually do is look at our sample and count how many successes and how many failures it contains. If either count is less than 10, we have a problem.

So you want at least 10 successes and at least 10 failures. You don't even have to say "expect," because once you have your sample, you can simply count them. If either count falls short, the normal condition is not met, and the statements you make about your confidence interval may not be valid.
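Here is a minimal sketch of the normal (large counts) check using the at-least-10 threshold; the function names are hypothetical. expected_counts handles the case where the true proportion is known, as in the n = 10, p = 0.5 example, and normal_condition_met applies the same threshold to the counts from an actual sample.

```python
def expected_counts(n: int, p: float) -> tuple[float, float]:
    """Expected successes and failures per sample when the true proportion p is known."""
    return n * p, n * (1 - p)


def normal_condition_met(successes: float, failures: float, threshold: int = 10) -> bool:
    """Large counts rule of thumb: at least `threshold` successes AND failures."""
    return successes >= threshold and failures >= threshold


# The example from the text: n = 10 with p = 0.5 gives 5 expected successes and 5 failures.
print(normal_condition_met(*expected_counts(10, 0.5)))  # False

# With a real sample, the true p is unknown, so just count what you observed.
print(normal_condition_met(540, 460))  # True
print(normal_condition_met(4, 11))     # False: only 4 successes
```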

The last thing we want to make sure of is known as the independence condition, or the 10% rule. It matters when we are sampling without replacement, which is often unavoidable: if you're surveying people as they exit a store, for example, you can't ask them to go back into the store, or at least it would be very awkward to.

The independence condition says that your sample size, n, should be less than 10 percent of the population size. Let's say your population were 100,000 people. If you surveyed 1,000 people, that would be only 1 percent of the population, so you'd feel pretty good that the independence condition is met.
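The 10% rule reduces to a single comparison. Here is a minimal sketch with a hypothetical function name; it compares 10 · n with the population size in integer arithmetic, which is equivalent to n < 0.10 · N without floating-point rounding, and it treats sampling with replacement as automatically independent.

```python
def independence_condition_met(n: int, population_size: int, with_replacement: bool = False) -> bool:
    """10% rule: without replacement, the sample should be under 10% of the population."""
    if with_replacement:
        # Each draw is put back, so draws are independent regardless of sample size.
        return True
    # Compare 10 * n with N in integers: same as n < 0.10 * N, but with no rounding issues.
    return 10 * n < population_size


print(independence_condition_met(1_000, 100_000))  # True: 1,000 is 1% of 100,000
print(independence_condition_met(200, 1_000))      # False: 200 is 20% of 1,000
```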

Once again, this matters when you are sampling without replacement. Now, to appreciate how our confidence intervals don't do what we think they will when these conditions are broken, I'll focus on the latter two, the normal and independence conditions; the random sample condition, frankly, is super important in all of statistics.

Let's first look at a situation where the independence condition breaks down. Here we are using our gumball simulation. In that simulation there is a true population proportion, which someone taking these samples might not know, and we're trying to construct confidence intervals at a 95 percent confidence level.

Up here, we've specified that we aren't replacing: we look at each member of our sample without putting it back. We take samples of size 200, and I've set up the population so that a sample of 200 is far more than 10 percent of it. I drew almost 1,500 samples of size 200.

Here you can see the samples whose confidence interval contained the true population parameter and, in red, the ones whose interval did not. As you can see, we get a hit, so to speak, with the confidence interval we calculate containing the true population parameter, only about 93 percent of the time, and that's over a pretty large number of samples.
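A simulation like this one can be sketched in a few lines. The sketch below assumes a population of 1s (successes) and 0s, draws each sample without replacement using random.Random.sample, builds the usual interval with no finite-population adjustment, and reports the fraction of hits. The function name, population sizes, and p = 0.5 are hypothetical; the gumball simulation's actual settings aren't given, so this is not meant to reproduce its 93 percent figure.

```python
import math
import random
from statistics import NormalDist


def simulated_coverage(population_size: int, true_p: float, n: int,
                       trials: int = 2000, confidence: float = 0.95, seed: int = 0) -> float:
    """Estimate how often p_hat +/- z* * sqrt(p_hat * (1 - p_hat) / n) contains the
    true proportion when each sample is drawn WITHOUT replacement."""
    rng = random.Random(seed)  # seeded so runs are reproducible
    successes = round(population_size * true_p)
    # Population of 1s (successes, e.g. one gumball color) and 0s (everything else).
    population = [1] * successes + [0] * (population_size - successes)
    p = successes / population_size  # the actual population proportion
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    hits = 0
    for _ in range(trials):
        sample = rng.sample(population, n)  # no member can be drawn twice
        p_hat = sum(sample) / n
        margin = z_star * math.sqrt(p_hat * (1 - p_hat) / n)
        if p_hat - margin <= p <= p_hat + margin:
            hits += 1
    return hits / trials


# Hypothetical settings: samples of 200 from a large population vs. one of only 400.
print(simulated_coverage(100_000, 0.5, 200))  # n is 0.2% of the population
print(simulated_coverage(400, 0.5, 200))      # n is 50% of the population
```

Without replacement, p_hat varies less than the sqrt(p_hat(1 − p_hat)/n) formula assumes, so when the sample is a large share of the population the computed margin of error no longer matches the stated confidence level. With these particular settings the second call tends to land above 95 percent rather than below, but either way the hit rate is not the 95 percent the interval claims.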

If this were truly a 95 percent confidence level, that should happen about 95 percent of the time. Similarly, we can look at a situation where the normal condition breaks down. Here our sample size is only 15, and if I scroll down a little, you can see that the simulation even warns me that there are fewer than 10 expected successes.

Once again, I drew a lot of samples, over 2,000 of them. Even though I'm trying to construct intervals that, over time, contain the true proportion about 95 percent of the time, here the hit rate is only 94 percent.
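When draws are independent, the number of successes in a sample follows a binomial distribution, so the long-run hit rate for small n can be computed exactly instead of simulated: add up the probability of every success count k whose interval contains p. This is a minimal sketch assuming sampling with replacement, so that only the normal condition is in play; the function names and the p values are hypothetical, since the simulation's true proportion isn't stated.

```python
import math
from statistics import NormalDist


def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(exactly k successes in n independent draws), computed in log space to avoid overflow."""
    log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
               + k * math.log(p) + (n - k) * math.log1p(-p))
    return math.exp(log_pmf)


def exact_coverage(n: int, p: float, confidence: float = 0.95) -> float:
    """Long-run fraction of intervals p_hat +/- z* * sqrt(p_hat * (1 - p_hat) / n) that
    contain p, assuming independent draws (sampling with replacement) and 0 < p < 1."""
    if not 0 < p < 1:
        raise ValueError("p must be strictly between 0 and 1")
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    coverage = 0.0
    for k in range(n + 1):  # every possible number of successes in one sample
        p_hat = k / n
        margin = z_star * math.sqrt(p_hat * (1 - p_hat) / n)
        if p_hat - margin <= p <= p_hat + margin:
            coverage += binomial_pmf(k, n, p)
    return coverage


# Hypothetical true proportions for a sample size of 15.
for p in (0.1, 0.3, 0.5):
    print(f"n = 15, p = {p}: expected successes {15 * p:.1f}, coverage {exact_coverage(15, p):.3f}")
```

For p = 0.1 only 1.5 successes are expected, and coverage falls far short of 95 percent. Much of the shortfall comes from samples with zero successes: their p_hat is 0, so the margin of error is 0 and the interval collapses to the single point 0, which can never contain the true proportion.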

I've drawn a lot of samples here. The big takeaway is that a non-random sample will really skew things. But even with a random sample, if you don't feel good about how normal the sampling distribution of the sample proportions is, or if your sample is a fairly large chunk of the population and you're sampling without replacement, violating the independence condition, then the confidence level you think you're getting from your confidence intervals might not be valid.
