
Techniques for random sampling and avoiding bias | Study design | AP Statistics | Khan Academy


Nov 11, 2024

Let's say that we run a school, and in that school, there is a population of students right over here. That is our population, and we want to get a sense of how these students feel about the quality of math instruction at this school. So we construct a survey, and we just need to decide who are we going to get to actually answer this survey.

One option is to just go to every member of the population, but let's just say it's a really large school. Let's say we're a college and there's 10,000 people in the college. We say, "Well, we can't just talk to everyone." So instead we say, "Let's sample this population to get an indication of how the entire school feels." We are going to sample it; we're going to sample that population.

Now, in order to avoid having bias in our response, in order for it to have the best chance of being indicative of the entire population, we want our sample to be random. So our sample could either be random or not random. It might seem at first pretty straightforward to do a random sample, but when you actually get down to it, it's not always as straightforward as you would think.

One type of random sample is a simple random sample. This is saying, "All right, let me assign a number to every person in the school. Maybe they already have a student ID number, and I'm just going to get a computer, a random number generator, to pick 100 students." So let's say there's a sample of 100 students that I'm going to give the survey to. That would be a simple random sample. We are just going into this whole population and randomly picking people out, and we know it's random because a random number generator, or a string of random numbers or something like that, is what picks these students.
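As a rough illustration of the idea above, here is a minimal Python sketch of a simple random sample. The roster of IDs 1 through 10,000 and the function name `simple_random_sample` are assumptions made up for this example; the standard library's `random.sample` draws without replacement, so no student is picked twice.

```python
import random

def simple_random_sample(student_ids, n, seed=None):
    """Pick n distinct IDs so that every group of n is equally likely."""
    rng = random.Random(seed)  # the seed is only there to make a run repeatable
    return rng.sample(student_ids, n)

# Hypothetical roster: the numbers 1..10,000 stand in for real student IDs.
student_ids = list(range(1, 10_001))
sample = simple_random_sample(student_ids, 100, seed=42)
print(len(sample), "students selected")
```

In practice you would leave the seed out (or record it) and hand the selected IDs to whoever is running the survey.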

Now, that's pretty good. It's unlikely that you're going to have bias from this sample, but there is some probability that, just by chance, your random number generator happened to select a disproportionate number of boys over girls, or a disproportionate number of freshmen, or a disproportionate number of engineering majors versus English majors. That's a possibility. So even though you're taking a simple random sample and it is truly random, once again, there's some probability that it's not indicative of the entire population.

And so to mitigate that, there are other techniques at our disposal. One technique is a stratified sample. This is the idea of taking our entire population and essentially stratifying it. So we take that same population, and I'll draw it as a square here just for convenience. We're going to stratify it because, let's say, we're concerned that we get an appropriate sample of freshmen, sophomores, juniors, and seniors.

So we'll stratify it by freshmen, sophomores, juniors, and seniors. Instead of just sampling 100 out of the entire pool, we sample 25 from each of these groups. That makes sure that you are getting indicative responses from all of the different levels within your university.
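The 25-per-group idea can be sketched in Python as follows. This is only an illustration: the student records, the `year` field, and the function name `stratified_sample` are hypothetical, and the sketch assumes every group has at least 25 students.

```python
import random
from collections import defaultdict

def stratified_sample(students, key, per_stratum, seed=None):
    """Split students into strata by key(student), then draw a simple
    random sample of per_stratum students from each stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for student in students:
        strata[key(student)].append(student)
    chosen = []
    for label in sorted(strata):  # fixed order keeps seeded runs repeatable
        chosen.extend(rng.sample(strata[label], per_stratum))
    return chosen

# Hypothetical population: 10,000 students spread evenly over four class years.
YEARS = ["freshman", "sophomore", "junior", "senior"]
students = [{"id": i, "year": YEARS[i % 4]} for i in range(10_000)]

sample = stratified_sample(students, key=lambda s: s["year"], per_stratum=25, seed=1)
print(len(sample), "students selected")
```

Within each class year the pick is still random; the stratification only guarantees how many come from each year.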

Now, there might be another issue where you say, "Well, I'm actually more concerned that we have accurate representation of males and females in the school." If I pick 100 random people, it's very likely that the split is close to 50/50, but there's some chance, just due to randomness, that it's disproportionately male or disproportionately female. That's even possible in the stratified case, because stratifying by class year does nothing to balance the sample by sex.
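To put a rough number on "some chance," here is a small Python calculation. It rests on assumptions that are not in the lecture: the school is exactly half male and half female, and it is large enough that picking 100 students behaves like 100 independent coin flips (a binomial model). The 60-or-more cutoff for "disproportionate" is also just an illustrative choice.

```python
from math import comb

def prob_lopsided(n, threshold, p=0.5):
    """Probability that one group has at least `threshold` of the n picks
    or at most n - threshold of them, for X ~ Binomial(n, p).
    Assumes threshold > n / 2 so the two tails do not overlap."""
    def pmf(k):
        return comb(n, k) * p**k * (1 - p) ** (n - k)
    upper = sum(pmf(k) for k in range(threshold, n + 1))
    lower = sum(pmf(k) for k in range(0, n - threshold + 1))
    return upper + lower

# Chance that a sample of 100 is split 60/40 or worse in either direction.
p_60_40 = prob_lopsided(100, 60)
print(f"P(60/40 split or worse) = {p_60_40:.4f}")
```

Under these assumptions the chance is small but not negligible, which is the point being made: a truly random sample can still come out lopsided.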

So what you might say is, "Well, you know what I'm going to do?" There's a technique called a clustered sample. What we do is we sample groups, and each of those groups we feel confident has a good balance of males and females.

For example, instead of sampling individuals from the entire population, let's say we can split our population into groups. Maybe these are classrooms, and each of these classrooms has an even distribution of males and females, or pretty close to even. Even there, as you can tell, this is not a trivial thing to do.

So what we do is we sample the actual classrooms. That's why it's called a cluster technique, or a clustered random sample. We're going to randomly sample our classrooms, each of which has a close or maybe an exact balance of males and females. So we know that we're going to get good representation.

But we are still sampling; we're sampling the clusters. Then we're going to survey every single person in each of the clusters we picked, every single person in each of those classrooms. So once again, these are all forms of random samples. You have the simple random sample, you can stratify, or you can cluster, where you randomly pick the clusters and then survey everyone in each chosen cluster.
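A minimal Python sketch of the cluster approach, under made-up assumptions: 400 classrooms of 25 students each, with room names and student numbers invented for the example. The random step picks whole classrooms, and everyone in a picked classroom is surveyed.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly choose n_clusters cluster names, then return those names
    together with every member of the chosen clusters."""
    rng = random.Random(seed)
    picked = rng.sample(sorted(clusters), n_clusters)  # sample the clusters
    surveyed = [student for name in picked for student in clusters[name]]
    return picked, surveyed

# Hypothetical school: 400 classrooms, 25 students in each.
clusters = {
    f"room-{r:03d}": [r * 25 + i for i in range(25)]
    for r in range(400)
}

picked, surveyed = cluster_sample(clusters, n_clusters=4, seed=7)
print(picked, len(surveyed), "students surveyed")
```

Note the contrast with the stratified approach: there you sample individuals inside every group, while here you sample a few groups and take everyone inside them.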

Now, if these are all random samples, what are the non-random things? Well, one case of non-random you could have is a voluntary survey or voluntary sample. This might just be you tell every student at the school, "Hey, here's a web address! If you're interested, come and fill out this survey." That's likely to introduce bias because you might have maybe the students who really like the math instruction at their school more likely to fill it out.

Maybe the students who really don't like it are more likely to fill it out. Maybe it's just the kids who have more time are more likely to fill it out. So this has a good chance of introducing bias. The students who fill out the survey might just be skewed one way or the other because, you know, they volunteered for it.

Another non-random sample would be a convenience sample; "convenience" is a term that's often used for bias introduced this way. This might say, "Well, let's just sample the first 100 students who show up at school." That's just convenient for me because I didn't have to do these random numbers or do the stratification or do any of this clustering. But you can understand how this also would introduce bias. The first 100 students who show up at school might be the most diligent students.

Maybe they all take an early math class that has a very good instructor, or they're all happy about it. Or it might go the other way; the instructor there isn't the best one, and so it might introduce bias the other way. So if you let people volunteer or you just say, "Oh, let me go to the first N students," or you say, "Hey, let me just talk to all the students who happen to be in front of me right now," they might be in front of you out of convenience, but they might not be a true random sample.
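To make the convenience-sample problem concrete, here is a toy Python simulation. All of the data is invented for illustration and is not a result of any kind: it assumes 10,000 students listed in arrival order, where the first 500 take an early class with a very good instructor and rate the math instruction 5 out of 5, while everyone else's rating cycles through 1 to 5.

```python
import random
from statistics import mean

# Made-up ratings, indexed by the order in which students arrive at school.
# Assumption: early arrivals (rank < 500) all rate instruction 5 out of 5.
ratings = [5 if rank < 500 else 1 + rank % 5 for rank in range(10_000)]

population_mean = mean(ratings)

# Convenience sample: just the first 100 students through the door.
convenience_mean = mean(ratings[:100])

# Simple random sample of the same size, for comparison.
rng = random.Random(0)
srs_mean = mean(rng.sample(ratings, 100))

print(f"population:  {population_mean:.2f}")
print(f"convenience: {convenience_mean:.2f}")
print(f"random:      {srs_mean:.2f}")
```

In this contrived setup the convenience sample sees only the happiest students, so its average sits far above the school-wide average, while the random sample lands much closer to it.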

Now there are other reasons why you might introduce bias, and it might not be because of the sampling. You might introduce bias because of the wording of your survey. You could imagine a survey that says, "Do you consider yourself lucky to get a math education that very few other people in the world have access to?" Well, that might bias you to say, "Well, yeah, I guess I feel lucky."

While if the wording was, "Do you like the fact that a disproportionate number of students at your school tend to fail algebra more than students at our surrounding schools?" Well, that might bias you negatively. So the wording really, really, really matters in surveys, and there's a lot that would go into this.

The other one is called response bias. And once again, this isn't about the sampling. This is just people not wanting to tell the truth, or maybe not wanting to respond at all. Maybe they're afraid that somehow their response is going to show up in front of their math teacher or the administrators, or that if they're too negative it might be taken out on them in some way. Because of that, they might not be truthful, and so they might be overly positive or not fill it out at all.

So anyway, this is a very high-level overview of how you could think about sampling. You want to go random because it lowers the probability of introducing some bias into it. And then these are some techniques. Also, think about whether you're falling into some of these pitfalls that have a good chance of introducing bias.
