
Introduction to sampling distributions


Nov 11, 2024

So let's say I have a bag of colored balls here, and we know that 40 percent of the balls are orange. Now imagine defining a random variable X, and X is based on a trial where we stick our hand in this bag, we don't look around, and we randomly pick a ball, look at its color, we record it, and then we're going to put it back. So we're going to assume that we have replacement here, and we're going to say that our random variable X is going to be equal to 1 if we pick an orange ball, and it's going to be equal to 0 otherwise.

You might already recognize this as a Bernoulli random variable, and we can construct a probability distribution for X. In fact, let's do this. So X is going to be discrete; it can only take on two different values. So X can take on zero, or it can take on one. If forty percent of the balls are orange, then the one has a forty percent chance of happening. So let me do that. So there's going to be a forty percent chance of getting a 1. So that's 0.4 right over there, and there would be a 60 percent chance, or a 0.6 probability, of getting a 0.
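The distribution just described can be written down in a few lines. This is a minimal sketch in Python, not part of the original lesson; the names `P_ORANGE`, `bernoulli_pmf`, and `draw_x` are made up for illustration.

```python
import random

P_ORANGE = 0.4  # proportion of orange balls in the bag, from the example


def bernoulli_pmf(x, p=P_ORANGE):
    """Probability that X equals x, where X is 1 for orange and 0 otherwise."""
    if x == 1:
        return p
    if x == 0:
        return 1 - p
    return 0.0  # X cannot take any other value


def draw_x(rng, p=P_ORANGE):
    """One trial with replacement: 1 (orange) with probability p, else 0."""
    return 1 if rng.random() < p else 0


rng = random.Random(0)  # fixed seed only so the sketch is repeatable
pmf = {x: bernoulli_pmf(x) for x in (0, 1)}
one_trial = draw_x(rng)
print(pmf, one_trial)
```

Because the ball goes back in the bag after every pick, each call to `draw_x` uses the same probability, which is what makes every trial a Bernoulli trial with the same parameter.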

So this right over here, just trying to hand-draw it, so this would be 0.6 probability of getting a zero. So we could call this the probability distribution. Probability distribution for X, this is all review so far. But the reason why I did this is because we're now going to introduce ourselves to the notion of a sampling distribution, and it can be a little bit confusing because in our brains, we tend to think in terms of probability distributions and not as much in terms of sampling distributions.

So what you do in a sampling distribution is you still start with a population here, but then you take a sample of that population. So let me label things. So this is our population. This is our population here. We take a sample; we take a sample from that population, and it could have a certain sample size, sample size n. Then we'll calculate some statistic for that sample. So we will calculate a statistic, and then we're going to think about the distributions of these statistics that we can get from these samples.

One way to think about this is to keep doing this. So this is our first sample of sample size n; we calculated the statistic. Then we take another sample of sample size n, and then we calculate the statistic again. Then we take another sample, and we just keep doing this. We take another sample of sample size n, and we calculate the statistic again. And let's say we were to do this an infinite number of times, and we were to plot the distribution of the statistic that we're calculating; well then we have our sampling distribution.
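The procedure just described (take a sample of size n, calculate a statistic, repeat) can be sketched as a short Python loop. This is only an approximation of the idea: a computer cannot repeat it an infinite number of times, so `num_samples` is a finite stand-in. The function name, the 100-ball population, and the choice of n = 10 are assumptions made for illustration.

```python
import random


def sampling_distribution(population, n, statistic, num_samples, seed=0):
    """Collect one statistic value per sample of size n drawn with replacement."""
    rng = random.Random(seed)  # fixed seed only so the sketch is repeatable
    values = []
    for _ in range(num_samples):
        sample = rng.choices(population, k=n)  # sampling with replacement
        values.append(statistic(sample))
    return values


def proportion_orange(sample):
    """Statistic: the fraction of the sample that is orange (coded as 1)."""
    return sum(sample) / len(sample)


# Hypothetical bag of 100 balls, 40 orange (1) and 60 not orange (0).
population = [1] * 40 + [0] * 60
stats = sampling_distribution(population, n=10,
                              statistic=proportion_orange, num_samples=1000)
print(stats[:5])
```

Any other statistic, such as a sample mean or a sample maximum, could be passed in place of `proportion_orange`; the list of values that comes back is what gets plotted.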

Let's try to make this a little bit more tangible by going back to our colored balls example and think about a sampling distribution for that. So let's say we have our population here, and we know the parameter for this population; we know the proportion of balls that are orange: forty percent are orange. We don't always know the parameters; oftentimes we're estimating the parameters by looking at samples. But let's say we then take samples of size ten, so sample size 10.

Every time, we calculate the statistic for our sample: what percentage are orange. So let's say the first time we take a sample, this time over here we get three oranges. Let's say the next time we get two oranges. Actually, let me do these as a proportion. So if my sample size is 10 and I get three oranges, that is thirty percent, and then if I do it again and I get two oranges, that is twenty percent.

And I just keep doing this, and eventually, I can plot a distribution of these sample proportions. You would end up with some type of a discrete distribution. The way to read this discrete distribution is, let's say this right over here ends up, and I'm just going to make up a number. This isn't going to be the actual number, but let's say that this is 0.15. The way to read that is you have a 15 percent chance of getting a sample where 50 percent of your balls are orange. Or if this right over here is 0.07, that would mean that you have a 7 percent chance of getting a sample where 20 percent of your balls are orange.
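The heights in that plot do not have to be made up. Assuming each of the 10 picks is an independent draw with replacement and a 0.4 chance of orange, the number of orange balls in a sample follows a binomial distribution, so the probabilities can be computed directly. This is a sketch under that assumption, not something the lesson itself works out; `sample_proportion_pmf` is an illustrative name.

```python
from math import comb


def sample_proportion_pmf(n, p):
    """Map each possible orange count k to its probability in a sample of n."""
    return {k: comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)}


pmf = sample_proportion_pmf(n=10, p=0.4)
for k, prob in pmf.items():
    # k orange balls out of 10 corresponds to a sample proportion of k / 10
    print(f"{k / 10:.0%} orange: {prob:.4f}")
```

Under these assumptions the chance of a sample with 50 percent orange comes out to about 0.20, and the chance of a sample with 20 percent orange to about 0.12, rather than the placeholder values 0.15 and 0.07 used above.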

Now, to make this even a little bit more tangible, let's run a simulation that actually does this. This right over here is a simulation created on Khan Academy on our computer programming scratch pads by Charlotte Owen. It's a simulation to construct a sampling distribution. So, let's say here she's using candies instead of just colored balls, but these candies are essentially colored balls. And so here we can set the population proportion. So let's say that the actual proportion, as we saw in our example, of let's say it's green as opposed to orange here is 40 percent.

And so let's say in each sample, just as we said, our sample size is ten, so we're gonna take a sample size of ten. And let's just do one sample first. So let's just draw a sample. And so what we did is we took 10 of these gumballs out, and we are counting how many of them are green. So in this first sample of 10, we see that 1, 2, 3, 4, 5, 6 of them are green. So out of the possible outcomes, we're now going to tally one of our outcomes as, hey, we got six of our 10 to be green.

And if we want to show the proportion instead of just the count, we can just pick percentage here. And so here we've had one scenario already where 60 percent were green. But we don't want to just do one sample; we want to keep drawing samples. Let's draw another sample. So in this last sample, we have fifty percent are green. So now that we have one that was fifty percent green and one that was sixty percent green, let's try another sample.

Now we have another sample where we got sixty percent green. So there are two situations where we had sixty percent green. And so I can keep doing this over and over and over. And so what we're creating right over here is a sampling distribution. If we were to do this an infinite number of times, we would get the true sampling distribution of the sample proportion given the actual population proportion that is green.

And so this is after 77 samples. Notice this is saying that out of 77 of our samples, 22 of those samples resulted in 40 percent of our gumballs being green. Only one of our samples had 80 percent of our gumballs being green. And if we just want to do a ton more samples, I'll go all the way to drawing 50 samples at a time. So let me just keep increasing this. Notice we have 17 samples now where we had zero percent that are green.

We have 91 of the 2200 samples where 10 percent were green, where one out of the 10 in our sample was green. And we could convert any of these numbers, 17, 91, 256, into percentages by just dividing by the number of samples. But this is fun; we could just keep going and making this larger and larger and larger. I encourage you to play with this; I'll provide a link for it in the description of this video and on Khan Academy.
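The tallying and the divide-by-the-number-of-samples step can be sketched in Python as well. This is not the Khan Academy simulation itself, just a rough imitation of what it appears to do; the function names and the seed are assumptions, and a different seed will give different tallies from the ones read off the screen above.

```python
import random
from collections import Counter


def tally_green_counts(num_samples, n=10, p_green=0.4, seed=1):
    """Draw num_samples samples of n gumballs; tally how many greens each has."""
    rng = random.Random(seed)  # fixed seed only so the sketch is repeatable
    tally = Counter()
    for _ in range(num_samples):
        greens = sum(1 for _pick in range(n) if rng.random() < p_green)
        tally[greens] += 1
    return tally


def to_relative_frequencies(tally):
    """Convert raw tallies into fractions of the total number of samples."""
    total = sum(tally.values())
    return {greens: tally[greens] / total for greens in sorted(tally)}


tally = tally_green_counts(2200)
freqs = to_relative_frequencies(tally)
for greens, freq in freqs.items():
    print(f"{greens * 10}% green: {tally[greens]} samples, {freq:.3f}")
```

For the numbers in the video, 91 out of 2200 samples works out to about 0.041, so roughly 4 percent of the samples had exactly one green gumball.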

But the main idea is to get an intuition for how a sampling distribution is different from just a traditional probability distribution; that in a sampling distribution, you're taking samples from a population, calculating some statistic for that sample, and what you're plotting in the sampling distribution are the various probabilities, the various likelihoods of the outcomes for those statistics in those samples.
