yego.me

Idea behind hypothesis testing


6m read
·Nov 10, 2024

What we're going to do in this video is talk about hypothesis testing, which is the heart of all of inferential statistics. Statistics that allow us to make inferences about the world. So, to give us the gist of this, let's start with a tangible example.

Let's say hypothetically you run a website that has the mission of giving everyone on the planet a free education. You want to think about how you might change the amount of time people spend on the site. Ideally, you want to increase the amount of time people spend on the site so there's more learning on the planet.

Well, currently the website has a white background, and the mean amount of time people spend on the site with a white background is 20 minutes. You, or someone on your team, read some type of study that says people like to spend more time on sites with yellow backgrounds. I don't actually think that's true, but let's just go with that for the sake of this video.

And so you have a hypothesis that if you actually have a yellow background, if you change your background to yellow, the mean amount of time that people spend on a yellow background is going to be different; it is not going to be equal to the mean amount of time people spend on a white background.

So the question is, how do you test this? How do you feel good about the inferences that you make from your test? That is the heart of hypothesis testing, and medical research, actually almost all research, involves some form of hypothesis testing.

So how would you do this? Well, the standard way to do this is to set up a couple of hypotheses. The first one is known as your null hypothesis, and I often think about this as the skeptic's hypothesis. Skeptics, or cynics, feel like it's hard to make a difference in the world, and so they always have this null hypothesis that's saying, "Hey, you think you're making a difference, but you aren't."

So the null hypothesis is that the mean amount of time people spend on the yellow site is going to be equal to the mean amount of time that people spend on the current site, the white site. Meanwhile, the people who are thinking, "Hey, how do we make change? How do I make improvements in the world?" have some type of hypothesis of their own, and we call that the alternative hypothesis.

The alternative hypothesis, A for alternative, is that the mean time on the yellow site is actually different; it's actually different. It's not equal to the mean amount of time on the white site.
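Written out in symbols, the two hypotheses might look like the following. The notation is an assumption added here for illustration, not something from the video: \mu_{\text{white}} and \mu_{\text{yellow}} stand for the true mean time on the white and yellow sites, H_0 is the null hypothesis, and H_a is the alternative.

```latex
\begin{aligned}
H_0 &: \mu_{\text{yellow}} = \mu_{\text{white}} = 20 \text{ minutes} \\
H_a &: \mu_{\text{yellow}} \neq \mu_{\text{white}}
\end{aligned}
```

Because the alternative only says "different," not "greater," this is a two-sided test, which is why both tails matter later on.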

So how do we think about this now that we've set up these hypotheses? Well, what we're going to do is assume the null hypothesis. Then we build this yellow site, and then we take a sample of the people using the yellow site and calculate the sample mean, which is an estimate of the true mean, the parameter we actually care about.

What is the probability of getting that sample mean if we assume the null hypothesis? If the probability of getting that sample mean on the yellow site, assuming the null hypothesis, is really low, then we reject the null hypothesis, which suggests the alternative. On the other hand, if we get a sample mean that seems pretty reasonable to get if you assume the null hypothesis, then you fail to reject the null hypothesis, and that would not suggest the alternative.

Now, to make this a little bit more tangible—and we'll go over this in a lot of videos—if you assume the null hypothesis, then there are a few things you can think about. You can think about just the general distribution of the amount of time people spend on the site. It would look something like this.

We will, for the sake of this example, assume that it's a normal distribution. Normal distributions, or distributions that are close to normal, are very important for hypothesis testing. So let's say that the amount of time people spend on the site is normally distributed. There is some mean, and we know that mean: the mean time that people spend on the white site is equal to 20 minutes.

And remember, we're assuming the null hypothesis. So we're assuming that this is also the amount of time that people spend on the yellow site. We've assumed the null hypothesis, and you could view this as time or distribution of time spent.

Now, one of the things we're going to talk about in future videos is, if you have this distribution, you can actually come up with another distribution of the means of samples you might get. So there's something else called the sampling distribution.

I know it's very confusing at first—the sampling distribution of the sample mean—and it'll be for a given sample size. For sample size, let's say this is sample size 1000. I'm just making things up; I could have said n, but I'm just going to make this a little bit more tangible.

Well, we're going to learn statistical methods for deriving this distribution from the first one, the distribution of individual times. It turns out this distribution is going to look like the first one, but it's going to be narrower around that mean. And actually, the larger your sample sizes are, the narrower it's going to get.
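The "narrower" claim can be made precise with a standard result, stated here as a sketch under assumptions the video has not yet spelled out: the observations are independent, and the individual times have some standard deviation \sigma (a symbol introduced here, with no value given in the video). The spread of the sample mean \bar{x} for sample size n is then

```latex
\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}},
\qquad \text{so for } n = 1000: \quad
\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{1000}} \approx \frac{\sigma}{31.6}
```

So with a sample size of 1000, the distribution of sample means would be roughly 32 times narrower than the distribution of individual times.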

Now, remember, this isn't just the distribution of the amount of time people spend on the site; this is the distribution that if I were to take a sample of the amount of time people spend on the site and calculate the means, this is the distribution of those sample means I might get.
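One way to get a feel for this is to simulate it. The sketch below assumes the null hypothesis (mean of 20 minutes) and a normal distribution of individual times. The standard deviation of 10 minutes and the function name `simulate_sample_means` are made up for illustration; the video gives no value for the spread.

```python
import random
import statistics


def simulate_sample_means(pop_mean, pop_sd, sample_size, num_samples, seed=0):
    """Draw num_samples samples of sample_size from a normal population
    and return the mean of each sample."""
    rng = random.Random(seed)
    means = []
    for _ in range(num_samples):
        sample = [rng.gauss(pop_mean, pop_sd) for _ in range(sample_size)]
        means.append(statistics.fmean(sample))
    return means


NULL_MEAN = 20.0   # minutes, from the white site
ASSUMED_SD = 10.0  # minutes, hypothetical value chosen for illustration

for n in (10, 100, 1000):
    means = simulate_sample_means(NULL_MEAN, ASSUMED_SD, n, num_samples=500)
    # The center stays near 20, while the spread shrinks as n grows.
    print(n, round(statistics.fmean(means), 2), round(statistics.stdev(means), 3))
```

Each printed row shows the sample size, the average of the simulated sample means, and their standard deviation. The last column should shrink as the sample size grows, which is the narrowing described above.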

Now, the center of this distribution is still our mean for white, which is equal to the mean for yellow, because, remember, we're assuming the null hypothesis. But the points mean different things in the two distributions. In the first one, a point far out in the tail is an amount of time that an individual person might spend, and you can see that there's a low probability for it.

This over here, this would be a sample mean you might get for a time that you sampled a thousand people and you calculated the mean, and you see that there's a low probability for it. So then what you would do is, if you were able to statistically generate these things assuming the null hypothesis—and don't worry too much; we'll find out the techniques for doing this and the assumptions we need to make for doing this—what we do is then take a sample of a thousand.

So you take your sample of a thousand—sample size 1000—and then from that you are able to calculate a sample mean. You are able to calculate that, and let's say you get a sample mean of 30 minutes. And let's say actually that that is right over here; this is 30 minutes right over here. The center was 20 minutes.

The next thing you do is you say, "What's the probability of getting a result at least that extreme, assuming the null hypothesis?" On this curve, that probability would be this right tail here, plus a left tail that is equally far out on the left side. So it would be like that.

What you do is look at this probability, which would be these yellow areas there, and then we think about the probability of getting a result at least as extreme as 30 minutes. So the probability of getting a sample mean at least as extreme as the sample mean equaling 30 minutes, assuming your null hypothesis.
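The two yellow tail areas can be computed directly if we are willing to make some assumptions. This is a minimal sketch, not the method the later videos will necessarily use: it assumes the sampling distribution of the sample mean is normal and that the standard deviation of individual times is known. The value of 10 minutes and the function name `two_sided_p_value` are hypothetical.

```python
import math
from statistics import NormalDist


def two_sided_p_value(sample_mean, null_mean, pop_sd, sample_size):
    """Probability of a sample mean at least as far from null_mean
    as the observed one, in either direction, assuming the null."""
    standard_error = pop_sd / math.sqrt(sample_size)
    z = (sample_mean - null_mean) / standard_error
    # Left-tail area below -|z|, doubled to include the matching right tail.
    return 2.0 * NormalDist().cdf(-abs(z))


p = two_sided_p_value(
    sample_mean=30.0,  # from the example
    null_mean=20.0,    # from the example
    pop_sd=10.0,       # hypothetical, not given in the video
    sample_size=1000,  # from the example
)
print(p)
```

With a sample of 1000, a 10-minute gap is many standard errors away from the center under this assumed spread, so the probability comes out vanishingly small.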

And that's exactly what those yellow areas are all about, and you compare that to some pre-specified threshold. That threshold is oftentimes five percent; sometimes it's one percent. The threshold is oftentimes denoted by the Greek letter alpha. If this probability is less than or equal to your threshold, we say, "Hey, that was a very low probability of getting a result at least this extreme if we assume the null hypothesis."

So that will allow us to reject the null hypothesis, which would suggest the alternative.

Notice we haven't proven the alternative; we also haven't proven that the null hypothesis is for sure false. We've just said if we assume the null hypothesis, there's a very low probability of getting a result at least as extreme as what we just got, so we will reject the null.

Now if it's the other way around—if the probability of getting a sample mean at least as extreme as this is still reasonable—if it's greater than your pre-specified threshold, then you fail to reject the null. You fail to reject your null hypothesis.
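The decision rule itself is small enough to write down as code. This is just a sketch of the comparison described above; the function name `decide` is made up, and the default threshold of 0.05 is the "oftentimes five percent" value mentioned earlier.

```python
def decide(p_value, alpha=0.05):
    """Compare a p-value with the pre-specified threshold alpha."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p_value must be a probability between 0 and 1")
    if p_value <= alpha:
        return "reject the null hypothesis"
    return "fail to reject the null hypothesis"


print(decide(0.003))             # below the 5% threshold
print(decide(0.20))              # above the 5% threshold
print(decide(0.03, alpha=0.01))  # a stricter 1% threshold changes the call
```

Note that neither outcome is called "accept": rejecting the null only suggests the alternative, and failing to reject does not prove the null is true.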

So I'll leave you there. In future videos, we'll go into much more depth into all of this. But this is to give you a sense of how hypothesis testing allows science—or all of us in the world—to start making inferences that we can feel good about.
