Systematic random sampling | AP Statistics | Khan Academy
In this video, we're going to talk about random sampling, which we've already talked about in other videos. We're going to compare what we already know about simple random sampling to a new type of random sampling that we're going to introduce in this video, and that is systematic random sampling.
So let's look at an example. Let's say that there is a concert that is happening and we expect approximately 10,000 people to attend the concert. We want to randomly sample people at the concert. Maybe we want to do a study on how do people get to the concert? Do they drive and park? Do they ride with a friend? Do they take an Uber or a cab of some kind? We want to find a random sample, ideally without bias, to survey people.
There are a couple of ways you could do it. You could try to do a simple random sample. That might work if you could somehow get the names of all 10,000 people and put them into a big bowl like this. Then, let's say you want to sample 100 people. You could write the names on little pieces of paper, ten thousand of them, mix them all up, and then pull out a random sample of a hundred of them. That would be a simple random sample.
But you can already imagine there might be some logistical difficulties in doing this. How are you going to get the 10,000 names? Are you going to write each one on a piece of paper? You'd have to mix them really well so that it's truly random who you're picking out. So, are there other ways of doing a random sample? As you can imagine, yes, there are, and that's where systematic random sampling is useful.
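If you did have all 10,000 names, the bowl could be replaced by a few lines of code. Here is a minimal Python sketch of that idea. The attendee labels and the function name `simple_random_sample` are made up for illustration; only `random.Random` and its `sample` method come from Python's standard library.

```python
import random

def simple_random_sample(population, sample_size, seed=None):
    """Draw sample_size distinct members, each subset equally likely."""
    rng = random.Random(seed)  # a seed is optional; it just makes the draw repeatable
    return rng.sample(population, sample_size)

# Hypothetical list standing in for the 10,000 names in the bowl.
attendees = [f"attendee_{i}" for i in range(1, 10_001)]

sample = simple_random_sample(attendees, 100, seed=1)
print(len(sample))       # 100 people drawn
print(len(set(sample)))  # 100, so nobody was drawn twice
```

Notice that this only works if the full list of names exists up front, which is the practical difficulty with this approach.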
One way to think about systematic random sampling is that you’re going to randomly sample a subset of the people who are maybe walking into the concert. So, let's say people get to the concert and they start forming a line to get into the concert. What you want to do in systematic random sampling is randomly pick your first person. There are a bunch of ways that you could do that.
Let’s say you have a random number generator that’ll generate a number from 1 to 100, and that’s going to be the first person you survey. If that random number generator generates a 37, then you're going to start with the 37th person in line. So you pick that first person randomly; you survey them. Remember, our goal is to sample about a hundred people out of ten thousand, so we want to roughly sample one out of every hundred people.
What you do there is, once you have that first person that you're sampling, you then sample every 100th person after that. That gap of 100 is sometimes called the sampling interval. The reason it's 100 is that 10,000 divided by 100 is 100: if you sample every 100th person after that first one, you're going to get roughly 100 people in your sample out of a total of 10,000.
So after 100 people, you're going to sample someone else, and then after another 100, you're going to sample someone else. Now, the reason why this is useful is that you could say, okay, that first person was random, and for every person after that, there doesn't seem to be any reason why being the hundredth person after the previous one would introduce bias.
You don't want to just survey the first hundred people, because those might be the early birds, the people who maybe disproportionately parked or planned early, or who differ from everyone else in some other way. So you do want to make sure that you're getting the beginning, the middle, and the end of the line, which this method helps you do.
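The procedure described above, a random start followed by a fixed interval, can be sketched in Python like this. It is only an illustration under the video's numbers (about 10,000 people, an interval of 100); the function name `systematic_sample_positions` is an assumption, not something from the video.

```python
import random

def systematic_sample_positions(population_size, interval, seed=None):
    """Return the 1-indexed positions in line to survey.

    The first position is chosen at random from 1..interval,
    then every interval-th person after that is included.
    """
    rng = random.Random(seed)
    start = rng.randint(1, interval)  # randint includes both endpoints
    return list(range(start, population_size + 1, interval))

positions = systematic_sample_positions(10_000, 100, seed=0)
print(len(positions))                # 100 people out of 10,000
print(positions[1] - positions[0])   # 100, the sampling interval
```

Unlike the bowl of names, this needs no list of attendees in advance. You only have to count people as they walk in.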
Now, we have to be careful; even systematic random sampling is not foolproof. There are situations where even this system inadvertently has bias. Let's say that this is the arena; this is a top view of the arena right over here, and this is the line of people coming in. This is where you are standing, and you are counting every hundredth person.
But maybe, let’s say there’s a tree right over here, and maybe there’s a road; I'm making this quite elaborate, so maybe there is a road right over here. A lot of people, maybe all of the people who are walking or taking a cab, are coming from this direction, and maybe all the people from the parking lot are coming from this direction.
And maybe you have a police officer right over here who is doing crowd control, who lets 50 of the people from one direction in, followed by 50 of the people from the other direction. Well, in that situation, if you sample every 100th person, you might end up sampling only people from one side or the other. So you have to make sure that there isn't some pattern in the line that introduces bias and distorts your sample.
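To see why the officer's pattern is a problem, here is a small Python sketch of that line. It assumes the line alternates strictly in blocks of 50, starting with the walking-or-cab group, which is a simplification of the scenario; the names `group_at` and `sampled_groups` are hypothetical.

```python
def group_at(position, block_size=50):
    """Group of the person at a 1-indexed position in the alternating line."""
    block_index = (position - 1) // block_size
    return "walk_or_cab" if block_index % 2 == 0 else "parking_lot"

def sampled_groups(start, interval=100, population_size=10_000):
    """Set of groups that end up in a systematic sample with this start."""
    positions = range(start, population_size + 1, interval)
    return {group_at(p) for p in positions}

print(sampled_groups(37))  # {'walk_or_cab'}
print(sampled_groups(88))  # {'parking_lot'}
```

Because the officer's cycle of 50 plus 50 repeats every 100 people, the same length as the sampling interval, every person sampled falls at the same point in the cycle. Whatever the random start is, the sample contains only one of the two groups.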