yego.me

The method that can "prove" almost anything - James A. Smith


Nov 8, 2024

In 2011, a group of researchers conducted a scientific study to find an impossible result: that listening to certain songs can make you younger. Their study involved real people, truthfully reported data, and commonplace statistical analyses. So how did they do it?

The answer lies in a statistical method scientists often use to try to figure out whether their results mean something or if they’re random noise. In fact, the whole point of the music study was to point out ways this method can be misused. A famous thought experiment explains the method: there are eight cups of tea, four with the milk added first, and four with the tea added first.

A participant must determine which are which according to taste. There are 70 different ways the cups can be sorted into two groups of four, and only one is correct. So, can she taste the difference? That’s our research question. To analyze her choices, we define what’s called a null hypothesis: that she can’t distinguish the teas.

If she can’t distinguish the teas, she’ll still get the right answer 1 in 70 times by chance. 1 in 70 is roughly .014. That single number is called a p-value. In many fields, a p-value of .05 or below is considered statistically significant, meaning there’s enough evidence to reject the null hypothesis. So if she sorts all eight cups correctly, the p-value of .014 would lead us to reject the null hypothesis that she can’t distinguish the teas.
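
The arithmetic behind the tea example can be checked in a few lines. This is a minimal sketch in Python using only the standard library; the variable names are illustrative and do not come from the source.

```python
from math import comb

# Number of ways to choose which 4 of the 8 cups had the milk added first.
n_sortings = comb(8, 4)

# Under the null hypothesis (she is guessing), every sorting is equally
# likely, so the chance of picking the single correct one is 1 / n_sortings.
p_value = 1 / n_sortings

print(n_sortings)         # 70
print(round(p_value, 3))  # 0.014
```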

Though p-values are commonly used by both researchers and journals to evaluate scientific results, they’re really confusing, even for many scientists. That’s partly because all a p-value actually tells us is the probability of getting a certain result, assuming the null hypothesis is true. So if she correctly sorts the teas, the p-value is the probability of her doing so assuming she can’t tell the difference.

But the reverse isn’t true: the p-value doesn’t tell us the probability that she can taste the difference, which is what we’re trying to find out. So if a p-value doesn’t answer the research question, why does the scientific community use it? Well, because even though a p-value doesn’t directly state the probability that the results are due to random chance, it usually gives a pretty reliable indication.

At least, it does when used correctly. And that’s where many researchers, and even whole fields, have run into trouble. Most real studies are more complex than the tea experiment. Scientists can test their research question in multiple ways, and some of these tests might produce a statistically significant result, while others don’t.

It might seem like a good idea to test every possibility. But it’s not, because with each additional test, the chance of a false positive increases. Searching for a low p-value, and then presenting only that analysis, is often called p-hacking. It’s like throwing darts until you hit a bullseye and then saying you only threw the dart that hit the bullseye.
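
To see how quickly the false-positive risk grows, here is a rough sketch. It assumes every test uses the .05 threshold and that the tests are independent of one another. Independence is a simplifying assumption made here, not something the source states; analyses run on the same data are usually correlated, so real numbers will differ. The function name is hypothetical.

```python
ALPHA = 0.05  # significance threshold used in the text


def chance_of_false_positive(n_tests: int, alpha: float = ALPHA) -> float:
    """Probability that at least one of n independent tests comes out
    significant when the null hypothesis is true for all of them."""
    return 1 - (1 - alpha) ** n_tests


for n in (1, 5, 20):
    print(n, round(chance_of_false_positive(n), 2))
```

Under these assumptions, one test gives a 5% chance of a false positive, five tests about 23%, and twenty tests about 64%.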

This is exactly what the music researchers did. They played three groups of participants each a different song and collected lots of information about them. The analysis they published included only two out of the three groups. Of all the information they collected, their analysis only used participants’ fathers’ age—to “control for variation in baseline age across participants.”

They also paused their experiment after every ten participants and continued if the p-value was above .05, but stopped when it dipped below .05. They found that participants who heard one song were 1.5 years younger than those who heard the other song, with a p-value of .04.
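
The effect of checking the p-value repeatedly and stopping at the first significant result can be illustrated with a small simulation. This is a sketch under stated assumptions, not a reconstruction of the music study's analysis: both groups are drawn from the same distribution (so the null hypothesis is true), ten participants are added to each group between peeks, the sample is capped at a hypothetical 100 per group, and a simple z-test with a known standard deviation of 1 stands in for whatever test the researchers ran.

```python
import random
from statistics import NormalDist, fmean

ALPHA = 0.05
BATCH = 10            # participants added to each group before every peek (assumed)
MAX_PER_GROUP = 100   # hypothetical cap on the sample size
N_SIMULATIONS = 2000


def p_value(a, b):
    """Two-sided z-test for a difference in means, assuming both groups
    have a known standard deviation of 1 and equal size."""
    n = len(a)
    z = (fmean(a) - fmean(b)) / (2 / n) ** 0.5
    return 2 * (1 - NormalDist().cdf(abs(z)))


def run_once(rng):
    """Simulate one experiment in which there is no real effect.

    Returns two flags: whether any peek was significant (what a researcher
    who stops at the first p < .05 would report), and whether the single
    test at the full sample size was significant.
    """
    a, b = [], []
    any_peek_significant = False
    while len(a) < MAX_PER_GROUP:
        a.extend(rng.gauss(0, 1) for _ in range(BATCH))
        b.extend(rng.gauss(0, 1) for _ in range(BATCH))
        if p_value(a, b) < ALPHA:
            any_peek_significant = True
    return any_peek_significant, p_value(a, b) < ALPHA


rng = random.Random(0)  # fixed seed so the run is repeatable
results = [run_once(rng) for _ in range(N_SIMULATIONS)]
peeking_rate = sum(peek for peek, _ in results) / N_SIMULATIONS
fixed_rate = sum(final for _, final in results) / N_SIMULATIONS

print(f"false positives with peeking:     {peeking_rate:.3f}")
print(f"false positives with one test:    {fixed_rate:.3f}")
```

With a single planned test, the false-positive rate stays close to the nominal .05. With peeking, it comes out well above that, even though every individual test looks legitimate.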

Usually, it’s much tougher to spot p-hacking, because we don’t know the results are impossible: the whole point of doing experiments is to learn something new. Fortunately, there’s a simple way to make p-values more reliable: pre-registering a detailed plan for the experiment and analysis beforehand that others can check, so researchers can’t keep trying different analyses until they find a significant result.

And, in the true spirit of scientific inquiry, there’s even a new field that’s basically science doing science on itself: studying scientific practices in order to improve them.
