
Introduction to sampling distributions


Nov 11, 2024

So let's say I have a bag of colored balls here, and we know that 40 percent of the balls are orange. Now imagine defining a random variable X based on a trial where we stick our hand in this bag without looking, randomly pick a ball, record its color, and then put it back. So we're going to assume that we have replacement here, and we're going to say that our random variable X is equal to 1 if we pick an orange ball and equal to 0 otherwise.

You might already recognize this as a Bernoulli random variable, and we can construct a probability distribution for X. In fact, let's do this. X is discrete; it can only take on two different values: zero or one. If forty percent of the balls are orange, then the one has a forty percent chance of happening. So there's a 0.4 probability of getting a 1, and there's a 60 percent chance, or a 0.6 probability, of getting a 0.
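As a rough illustration of the Bernoulli variable described above, here is a minimal Python sketch. The names `P_ORANGE` and `draw_x` are hypothetical, chosen only for this example, and the fixed seed is an assumption made so the run is repeatable.

```python
import random

P_ORANGE = 0.4  # population proportion of orange balls, from the example

def draw_x(rng):
    """One trial with replacement: return 1 for an orange ball, 0 otherwise."""
    return 1 if rng.random() < P_ORANGE else 0

# Probability distribution for X: only two possible values.
pmf = {0: 1 - P_ORANGE, 1: P_ORANGE}

# Simulate many trials; the share of 1s should land near 0.4.
rng = random.Random(0)  # seed is an arbitrary choice for repeatability
trials = [draw_x(rng) for _ in range(100_000)]
observed = sum(trials) / len(trials)

print(pmf)
print(round(observed, 3))
```

The observed share of 1s will not be exactly 0.4, but with this many trials it should sit close to it.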

So this right over here, in my hand-drawn plot, is the 0.6 probability of getting a zero. We could call this the probability distribution for X. This is all review so far. But the reason why I did this is because we're now going to introduce ourselves to the notion of a sampling distribution, and it can be a little bit confusing because in our brains, we tend to think in terms of probability distributions and not as much in terms of sampling distributions.

So what you do in a sampling distribution is you still start with a population here, but then you take a sample of that population. So let me label things. This is our population. We take a sample from that population, and it has a certain sample size, n. Then we calculate some statistic for that sample, and then we're going to think about the distribution of the statistics that we can get from these samples.

One way to think about this is to keep doing this. So this is our first sample of sample size n, and we calculate the statistic. Then we take another sample of sample size n, and we calculate the statistic again. Then we take another sample, and we just keep doing this. And let's say we were to do this an infinite number of times, and we were to plot the distribution of the statistic that we're calculating; well, then we have our sampling distribution.
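The repeat-and-recalculate procedure above can be sketched in Python. This is only an approximation of the idea, since a program can draw a large but finite number of samples, not an infinite number. The function names `sampling_distribution` and `proportion`, the 100-ball population, and the seed are all assumptions made for illustration.

```python
import random

def proportion(sample):
    """Statistic for this example: the share of 1s in the sample."""
    return sum(sample) / len(sample)

def sampling_distribution(population, n, statistic, num_samples, seed=0):
    """Draw num_samples samples of size n (with replacement) and
    return the statistic calculated on each one."""
    rng = random.Random(seed)
    stats = []
    for _ in range(num_samples):
        sample = rng.choices(population, k=n)  # sampling with replacement
        stats.append(statistic(sample))
    return stats

# Hypothetical population: 1 marks an orange ball, 0 any other color.
population = [1] * 40 + [0] * 60
stats = sampling_distribution(population, n=10, statistic=proportion,
                              num_samples=5000)
print(stats[:5])
print(sum(stats) / len(stats))  # average of the sample proportions
```

Plotting a histogram of `stats` would give an approximate picture of the sampling distribution; any other statistic could be passed in place of `proportion`.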

Let's try to make this a little bit more tangible by going back to our colored balls example and thinking about a sampling distribution for that. So let's say we have our population here, and we know the parameter for this population: the proportion of balls that are orange is forty percent. We don't always know the parameters; oftentimes we're estimating the parameters by looking at samples. But let's say we then take samples of size ten.

Every time, we calculate the statistic for our sample: what percentage are orange. So let's say the first time we take a sample, we get three oranges. Let's say the next time we get two oranges. Actually, let me do these as proportions. So if my sample size is 10 and I get three oranges, that is thirty percent, and then if I do it again and get two oranges, that is twenty percent.

And I just keep doing this, and eventually, I can plot a distribution of these sample proportions. You would end up with some type of a discrete distribution. The way to read this discrete distribution is, let's say this right over here ends up, and I'm just going to make up a number. This isn't going to be the actual number, but let's say that this is 0.15. The way to read that is you have a 15 percent chance of getting a sample where 50 percent of your balls are orange. Or if this right over here is 0.07, that would mean that you have a 7 percent chance of getting a sample where 20 percent of your balls are orange.
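The 0.15 and 0.07 above are made-up numbers. Under the stated setup (draws with replacement, 40 percent orange, sample size 10), the number of orange balls in a sample follows a binomial distribution, so the actual heights can be computed directly. A minimal sketch, where the function name `sample_count_pmf` is a hypothetical choice for this example:

```python
from math import comb

def sample_count_pmf(n, p):
    """Binomial probabilities: chance of exactly k orange balls
    in a sample of size n, for k = 0..n."""
    return {k: comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)}

pmf = sample_count_pmf(n=10, p=0.4)

# Report each outcome as a sample proportion alongside its probability.
for k, prob in pmf.items():
    print(f"{k / 10:.0%} orange: {prob:.4f}")
```

With these assumptions, a sample that is 50 percent orange has a probability of about 0.20, and a sample that is 20 percent orange has a probability of about 0.12.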

Now, to make this even a little bit more tangible, let's run a simulation that actually does this. This right over here is a simulation created by Charlotte Owen on the Khan Academy computer programming scratchpads. It's a simulation to construct a sampling distribution. Here she's using candies instead of colored balls, but these candies are essentially colored balls. We can set the population proportion, so let's say that the actual proportion, as in our example, is 40 percent, but for green here as opposed to orange.

And in each sample, just as we said, our sample size is ten. Let's just do one sample first. So let's draw a sample. What we did is we took 10 of these gumballs out, and we are counting how many of them are green. In this first sample of 10, we see that 1, 2, 3, 4, 5, 6 of them are green. So out of the possible outcomes, we're now going to tally one outcome where six of our 10 were green.

And if we want to show the proportion instead of just the count, we can just pick percentage here. And so here we've had one scenario already where 60 percent were green. But we don't want to just do one sample; we want to keep drawing samples. Let's draw another sample. In this latest sample, fifty percent are green. So now that we have one sample that was fifty percent green and one that was sixty percent green, let's try another sample.

Now we have another sample where we got sixty percent green. So there are two situations where we had sixty percent green. And so I can keep doing this over and over and over. And so what we're creating right over here is a sampling distribution. If we were to do this an infinite number of times, we would get the true sampling distribution of the sample proportion given the actual population proportion that is green.

And so this is after 77 samples. Notice this is saying that out of our 77 samples, 22 resulted in 40 percent of our gumballs being green. Only one of our samples had 80 percent of our gumballs being green. And if we just want to do a ton more samples, I'll go all the way to drawing 50 samples at a time. So let me just keep increasing this. Notice we have 17 samples now where we had zero percent that are green.

We have 91 of the 2200 samples where 10 percent were green, that is, where one out of the 10 in our sample was green. And we could convert any of these numbers, 17, 91, 256, into percentages by dividing by the number of samples. But this is fun; we could just keep going and making this larger and larger and larger. I encourage you to play with this; I'll provide a link for it in the description of this video and on Khan Academy.
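The tally-then-divide step can be sketched in Python as follows. This is not the simulation shown in the video; it is a small stand-in that assumes the same settings (40 percent green, sample size 10, 2200 samples). The constant names and the seed are hypothetical, and the counts it prints will differ from the ones read out above.

```python
import random
from collections import Counter

NUM_SAMPLES = 2200  # number of samples drawn, as in the run described above
N = 10              # sample size
P_GREEN = 0.4       # population proportion of green candies

rng = random.Random(1)  # seed is an arbitrary choice for repeatability

# Tally how many samples produced each count of green candies.
tally = Counter()
for _ in range(NUM_SAMPLES):
    greens = sum(rng.random() < P_GREEN for _ in range(N))
    tally[greens] += 1

# Convert each tally to a share of all samples by dividing by NUM_SAMPLES.
for greens in range(N + 1):
    count = tally[greens]
    print(f"{greens / N:.0%} green: {count} samples "
          f"({count / NUM_SAMPLES:.1%} of all samples)")
```

Dividing each count by the number of samples turns the tallies into relative frequencies, which approximate the probabilities in the sampling distribution.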

But the main idea is to get an intuition for how a sampling distribution is different from just a traditional probability distribution; that in a sampling distribution, you're taking samples from a population, calculating some statistic for that sample, and what you're plotting in the sampling distribution are the various probabilities, the various likelihoods of the outcomes for those statistics in those samples.
