
Types of statistical studies | Study design | AP Statistics | Khan Academy


Nov 11, 2024

Let's talk about the main types of statistical studies. You can have a sample study, which we've already talked about in several videos, but we'll go over it again in this one. You can have an observational study, or you can have an experiment. Let's go through each of these, and as always, pause this video and see if you can think about what these words likely mean, or you might already know.

Well, we have already looked at sample studies; this is really where you're trying to estimate the value of a parameter for a population. So what's an example of that? Let's say we take the population of people in a city, which could be hundreds of thousands of people. The parameter you care about is how much time, on average, they spend on a computer. So the parameter would be for the entire population. If it were possible, and let's say there are a million people in the city, you would talk to all million of those people, ask each of them how much time they spend on a computer, and take the average, and that would be the parameter.

So the population parameter would be the average daily time on a computer. Now, you determine that it's impractical to go talk to everyone, so you're not going to be able to figure out the exact population parameter. Instead, you do a sample study. You randomly sample, and a lot of thought goes into whether your sample is truly random; there are also different techniques for random sampling.

So you randomly sample people from your population, take the average daily time on a computer for your sample, and that is your estimate of the population parameter. That's your classic sample study.
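To make the idea concrete, here is a minimal simulation, assuming a made-up city of 1,000,000 people whose daily computer times are drawn from a normal distribution (the numbers are invented, not real data). It computes the "true" population mean, which you normally could not know, and compares it with the mean of a simple random sample of 500 people drawn with `random.sample`.

```python
import random
import statistics

# Hypothetical population: daily computer time (hours) for 1,000,000 residents.
# These values are simulated for illustration, not real data.
random.seed(0)
population = [max(0.0, random.gauss(3.0, 1.5)) for _ in range(1_000_000)]

# The population parameter: in a real study this is unknown.
true_mean = statistics.fmean(population)

# A simple random sample of 500 people, drawn without replacement.
sample = random.sample(population, 500)
estimate = statistics.fmean(sample)

print(f"population mean: {true_mean:.2f} h")
print(f"sample estimate: {estimate:.2f} h")
```

The sample mean will not match the population mean exactly, but with a truly random sample it should land close to it, and it tends to get closer as the sample size grows.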

Now, in an observational study, you're not trying to estimate a parameter; you're trying to understand how two variables in a population might move together or not. Let's say you have a population of 1,000 people, and you're curious about whether daily time on a computer relates to people's blood pressure. So you're looking at computer time versus blood pressure.

So you survey all 1,000 people, asking how much time they spend on a computer and what their blood pressure is, or maybe you measure it directly. Then you plot it all, look at the data, and see if those two variables move together. So what does that mean? Well, let me draw it.

Say the horizontal axis is computer time and the vertical axis is blood pressure. There's one person who doesn't spend a lot of time on a computer and has relatively low blood pressure. There's another person who spends a lot of time on a computer and has high blood pressure. There could be someone who doesn't spend much time on a computer but has reasonably high blood pressure. You keep doing this and get data points for all 1,000 people. I'm not going to sit here and draw 1,000 points, but you see a cloud of points that trends upward from left to right.

So you see, hey, look: there are definitely some outliers, but it looks like these two variables move together. In general, the more computer time, the higher the blood pressure, or equivalently, the higher the blood pressure, the more computer time.

So you can conclude that these two variables are positively correlated. A reasonable conclusion, if you did the study appropriately, would be that more computer time correlates with higher blood pressure, or that higher blood pressure correlates with more computer time.
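One common way to put a number on "these two variables move together" is the Pearson correlation coefficient r, which runs from -1 to 1 and is positive when the variables rise together. The sketch below computes it by hand on simulated survey data for 1,000 people; the assumed relationship (blood pressure rising about 3 mmHg per hour of computer time, plus noise) is invented purely for illustration.

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

random.seed(1)
# Simulated survey of 1,000 people (made-up numbers for illustration).
computer_hours = [random.uniform(0, 8) for _ in range(1000)]
# Assumed relationship: blood pressure rises ~3 mmHg per hour, plus random noise.
blood_pressure = [110 + 3 * h + random.gauss(0, 8) for h in computer_hours]

r = pearson_r(computer_hours, blood_pressure)
print(f"r = {r:.2f}")  # positive: the two variables move together
```

Note that `pearson_r(xs, ys)` equals `pearson_r(ys, xs)`: the coefficient is symmetric, which mirrors the point that a correlation says nothing about which variable, if either, is the cause.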

Now, when you do an observational study, or when you interpret someone else's, it's very important not to say, "Oh, this shows me that computer time causes higher blood pressure," because this is not showing causality. You also can't say that somehow people's high blood pressure causes them to spend more time in front of a computer. That seems even a little sillier, but the two claims are on the same footing, because all you've shown is a correlation: these two variables move together.

You can't conclude that computer time causes high blood pressure or that high blood pressure causes more computer time. Why not? Well, there could be what's called a confounding variable, sometimes called a lurking variable. Here we have computer time and blood pressure.

It looks like these two things move together; we saw that in our data. But there could be a root variable that drives both of them, a confounding variable, and that could simply be the amount of physical activity someone gets. A lack of physical activity could be driving both: people who are less active spend more time in front of a computer, and people who are less active have higher blood pressure.

If you were to control for this by taking a bunch of people with a similar level of activity, you might see that computer time does not correlate with blood pressure, because both are just driven by the same thing. What you're really seeing here is that people who aren't active tend to be high on both of these variables.
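Here is a small simulation of that scenario, under loudly stated assumptions: a hypothetical activity level (say, hours of exercise per week) lowers computer time and lowers blood pressure, while computer time has no direct effect on blood pressure at all. Across everyone, computer time and blood pressure still come out strongly correlated; restricting to people with similar activity levels (a crude way of "controlling" for the confounder) makes that correlation mostly disappear. All coefficients are made up.

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / math.sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))

random.seed(2)
people = []  # (activity, computer_hours, blood_pressure) for each simulated person
for _ in range(1000):
    activity = random.uniform(0, 10)  # hypothetical hours of exercise per week
    # Less activity -> more computer time AND higher blood pressure.
    # Computer time has NO direct effect on blood pressure in this simulation.
    computer = 8 - 0.6 * activity + random.gauss(0, 1)
    blood_pressure = 140 - 3 * activity + random.gauss(0, 5)
    people.append((activity, computer, blood_pressure))

overall_r = pearson_r([p[1] for p in people], [p[2] for p in people])

# "Control" for activity by looking only at people with similar activity levels.
similar = [p for p in people if 4 <= p[0] < 5]
within_r = pearson_r([p[1] for p in similar], [p[2] for p in similar])

print(f"all 1,000 people:        r = {overall_r:.2f}")
print(f"similar activity (4-5h): r = {within_r:.2f}, n = {len(similar)}")
```

Comparing within narrow slices like this is only one simple way to adjust for a confounder, and it only works for confounders you know about and measured, which is exactly the limitation that motivates experiments.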

So, once again, an observational study done well can show correlations, and those might give you decent hypotheses about causality, but it does not show causality, because there could be confounding variables.

Now, experiments. Experiments are the basis of the scientific method, and they are all about trying to establish causality. If you wanted to do an experiment here, you probably wouldn't be able to do it with a thousand people; in some ways, experiments are the hardest of these studies to do. Maybe you take a hundred people, and to keep confounding variables like activity level from introducing error into your experiment, you randomly assign these hundred people into two groups.

It's very important that they're randomly assigned. You might not know all of the confounding variables, but random assignment makes it likely that the two groups have similar activity levels on average. It gives you a better chance that one group doesn't have a significantly different activity level than the other.
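Random assignment itself is easy to sketch: shuffle the list of participants and split it in half. In this minimal example the 100 participants and their activity levels are simulated, and the activity level plays the role of a confounder the researcher might never measure; the point is that shuffling tends to leave the two groups with similar averages without anyone having to look at it.

```python
import random
import statistics

random.seed(3)
# 100 hypothetical participants; "activity" stands in for an unmeasured confounder.
participants = [{"id": i, "activity": random.uniform(0, 10)} for i in range(100)]

# Random assignment: shuffle a copy of the list, then split it in half.
shuffled = participants[:]
random.shuffle(shuffled)
control, treatment = shuffled[:50], shuffled[50:]

control_activity = statistics.fmean(p["activity"] for p in control)
treatment_activity = statistics.fmean(p["activity"] for p in treatment)
print(f"control mean activity:   {control_activity:.2f}")
print(f"treatment mean activity: {treatment_activity:.2f}")
```

The two averages will usually be close but not identical; as the transcript notes, there is always some chance the split comes out unbalanced, and that chance shrinks as the groups get larger.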

Then you have a control group and a treatment group, which, once again, you've randomly assigned. You might say, "For some amount of time, all of you in the control group can spend a maximum of 30 minutes in front of a computer." Or, if you really wanted to be strict, you might say they have to spend exactly 30 minutes on a computer, though that's maybe a little unrealistic.

To the treatment group, you say, "You have to spend exactly two hours in front of a computer." I'm making these numbers up. It would also be nice to know everyone's blood pressure before the experiment, so you can say, "Okay, the averages are similar going into the experiment." Then you let some amount of time pass and measure blood pressure.

Say you see that, wow, the treatment group definitely has higher blood pressure. Once again, some of this might have just happened randomly; it might have been the particular people you happened to put in that group. But if this was a large enough experiment and you conducted it well, it suggests causality: making these people spend more time in front of a computer actually raised their blood pressure.
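One way to check whether a difference "might have just happened randomly" is a permutation test, a technique the video itself does not cover. The idea: if computer time had no effect, the group labels would be arbitrary, so shuffle the labels many times and see how often a difference at least as large as the observed one shows up. The post-experiment blood pressures below are simulated, and the 8 mmHg gap between groups is an assumption chosen for illustration.

```python
import random
import statistics

def permutation_p_value(a, b, n_resamples=5_000, seed=0):
    """Two-sided permutation test for a difference in group means."""
    rng = random.Random(seed)
    observed = abs(statistics.fmean(b) - statistics.fmean(a))
    pooled = list(a) + list(b)
    extreme = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)  # reassign group labels at random
        diff = abs(statistics.fmean(pooled[len(a):]) - statistics.fmean(pooled[:len(a)]))
        if diff >= observed:
            extreme += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return (extreme + 1) / (n_resamples + 1)

random.seed(4)
# Simulated post-experiment blood pressure (mmHg); the +8 treatment effect is made up.
control_bp = [random.gauss(120, 8) for _ in range(50)]
treatment_bp = [random.gauss(128, 8) for _ in range(50)]

diff = statistics.fmean(treatment_bp) - statistics.fmean(control_bp)
p = permutation_p_value(control_bp, treatment_bp)
print(f"difference in means: {diff:.1f} mmHg, p ~ {p:.4f}")
```

A small p-value says the observed gap would rarely arise from the random split alone; because the groups were randomly assigned, that gap can then be attributed to the treatment rather than to a confounder.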

So, once again: in a sample study, you're trying to estimate a population parameter. In an observational study, you're looking for a correlation between two variables, and you have to be careful not to say, "Hey, one is causing the other," because there could be confounding variables. In an experiment, you're trying to establish causality, and you do that by randomly assigning your participants to a control group or a treatment group, which should distribute the confounding variables evenly, or hopefully evenly, since there's always some chance it doesn't.

Then you change how much of one variable each group gets and see whether it drives the other variable. In the next few videos, we'll do some examples of identifying these types of statistical studies and see what we can conclude from them.
