Let's pool our medical data - John Wilbanks


·Nov 8, 2024

So I have bad news, I have good news, and I have a task.

So the bad news is that we all get sick. I get sick; you get sick, right? Every one of us gets sick. And the question really is: how sick do we get? Is it something that kills us? Is it something that we survive? Is it something that we can treat? We've gotten sick as long as we've been people, and so we've always looked for reasons to explain why we get sick.

For a long time, it was the gods, right? The gods are angry with me, or the gods are testing me, right? Or God, singular, more recently, is punishing me or judging me. And as long as we've looked for explanations, we've wound up with something that gets closer and closer to science, which is hypotheses as to why we get sick. And as long as we've had hypotheses about why we get sick, we've tried to treat it as well.

This is Avicenna. He wrote a book over a thousand years ago called "The Canon of Medicine," and the rules he laid out for testing medicines are actually really similar to the rules we have today: that the disease and the medicine must be the same strength, the medicine needs to be pure, and in the end, we need to test it in people.

So if you put together these themes of a narrative or a hypothesis and human testing, right, you get some beautiful results, even when we didn't have very good technologies. This is a guy named Carlos Finlay. He had a hypothesis that was way outside the box for his time, in the late 1800s. He thought yellow fever was not transmitted by dirty clothing; he thought it was transmitted by mosquitoes. And they laughed at him for 20 years. They called this guy the mosquito man. But he ran an experiment in people, right? He had his hypothesis and he tested it in people.

So he got volunteers to move to Cuba and live in tents and be voluntarily infected with yellow fever. Some of the people in some of the tents had dirty clothes, and some of the people were in tents that were full of mosquitoes that had been exposed to yellow fever. It definitively proved that it wasn't this magic dust called fomites in your clothes that caused yellow fever.

It wasn't until we tested it in people that we actually found this out. And this is what those people signed up for. This is what it looked like to have yellow fever in Cuba at that time. You suffered in a tent in the heat, alone, and you probably died. But people volunteered for this, and it's not just a cool example of a scientific design of an experiment.

They also did this beautiful thing: they signed this document, and it's called an informed consent document. An informed consent is an idea that we should be very proud of as a society, right? It's something that separates us from the Nazis at Nuremberg and forced medical experimentation. It's the idea that agreement to join a study without understanding isn't agreement. It's something that protects us from harm, from hucksters, from people that would try to hoodwink us into a clinical study that we don't understand or that we don't agree to.

So you put together the thread of narrative, hypothesis, experimentation in humans, and informed consent, and you get what we call a clinical study. It's how we do the vast majority of medical work. It doesn't really matter if it's the North, the South, the East, or the West; clinical studies form the basis of how we investigate.

So if we're going to look at a new drug, all right, we test it in people; we draw blood; we do experiments, and we gain consent for that study to make sure that we're not screwing people over as part of it. But the world is changing around the clinical study, which has been fairly well established for tens of years, if not 50 to 100 years.

Now we are able to gather data about our genomes, but as we saw earlier, our genomes aren't dispositive. We're able to gather information about our environment, and more importantly, we're able to gather information about our choices. Because it turns out that what we think of as our health is more like the interaction of our bodies, our genomes, our choices, and our environment.

And the clinical methods that we've got aren't very good at studying that because they are based on the idea of person-to-person interaction. You interact with your doctor, and you get enrolled in the study.

So this is my grandfather. I actually never met him, but he's holding my mom, and his genes are in me, right? His choices ran through to me. He was a smoker, like most people. This is my son, so my grandfather's genes go all the way through to him, and my choices are going to affect his health.

The technology between these two pictures cannot be more different, but the methodology for clinical studies has not radically changed over that time period. We just have better statistics. The way we gain informed consent was formed, in large part, after World War II, around the time that picture was taken. That was 70 years ago, and the way we gain informed consent, this tool that was created to protect us from harm, now creates silos.

So the data that we collect for prostate cancer or for Alzheimer's trials goes into silos where it can only be used for prostate cancer or for Alzheimer's research, right? It can't be networked; it can't be integrated; it cannot be used by people who aren't credentialed. So a physicist can't get access to it without filing paperwork; a computer scientist can't get access to it without filing paperwork. Computer scientists aren't patient; they don't file paperwork.

This is an accident. These are tools that we created to protect us from harm. What they're doing is protecting us from innovation now, and that wasn't the goal; it wasn't the point, right? It's a side effect, if you will, of a power we created to take us for good.

If you think about it, the depressing thing is that Facebook would never make a change to something as important as an advertising algorithm with a sample size as small as a phase 3 clinical trial. We cannot take the information from past trials and put them together to form statistically significant samples, and that sucks, right?

So 45 percent of men develop cancer; 38 percent of women develop cancer. One in four men dies of cancer; one in five women dies of cancer, at least in the United States. And three out of the four drugs we give you if you get cancer fail. And this is personal to me. My sister is a cancer survivor; my mother-in-law is a cancer survivor. Cancer sucks, and when you have it, you don't have a lot of privacy in the hospital. You're naked the vast majority of the time. People you don't know come in and look at you and poke you and prod you.

When I tell cancer survivors that this tool we created to protect them is preventing their data from being used, especially when only three to four percent of people who have cancer ever even sign up for a clinical study, their reaction is not, "Thank you, God, for protecting my privacy." It's outrage. We have this information that we can't use, and it's an accident.

So the cost in blood and treasure of this is enormous. $226 billion a year is spent on cancer in the United States; 1,500 people a day die in the United States, and it's getting worse.

So the good news is that some things have changed, and the most important thing that's changed is that we can now measure ourselves in ways that used to be the domain of the health system. A lot of people talk about it as digital exhaust. I like to think of it as the dust that runs along behind my kid. We can reach back and grab that dust, and we can learn a lot about health from it.

So if our choices are part of our health, what we eat is a really important aspect of our health. So you can do something very simple and basic and take a picture of your food, and enough people do that we can learn a lot about how our food affects our health.

One interesting thing that came out of this (this is an app for iPhones called "The Eatery") is that we think our pizza is significantly healthier than other people's pizza. It seems like a trivial result, but this is the sort of research that used to take the health system years and hundreds of thousands of dollars to accomplish. It was done in five months by a startup company of a couple of people. I don't have any financial interest in it.

But more non-trivially, we can get our genotypes done. And although our genotypes aren't dispositive, they give us clues. So I could show you mine. It's just a few simple letters: A's, T's, C's, and G's. This is the interpretation of it. As you can see, I carry a 32 percent risk of prostate cancer, a 22 percent risk of psoriasis, and a 14 percent risk of Alzheimer's disease.

So that means if you're a geneticist, you're freaking out, going, "Oh my God! You told everyone you carry the ApoE4 allele! What's wrong with you?" All right? When I got these results, I started talking to doctors, and they told me not to tell anyone. My reaction was, "Is that going to help anyone cure me when I get the disease?" And no one could tell me yes.

And I live in a web world where when you share things, beautiful stuff happens, not bad stuff. So I started putting this in my slide decks, and I got even more obnoxious. I went to my doctor and said, "I like tech; please give me back my data."

So this is my most recent blood work. As you can see, I have high cholesterol; I have particularly high bad cholesterol, and I have some bad liver numbers, but those are because we had a dinner party with a lot of good wine the night before we ran the test, all right? But look at how non-computable this information is. This is like the photograph of my granddad holding my mom from a data perspective, and I had to go into the system and get it out.

So the thing that I'm proposing we do here is that we reach behind us and we grab the dust; we reach into our bodies and we grab the genome; and we reach into the medical system and somehow extract our medical record, and we use it to build something together, which is a commons.

There's been a lot of talk about commons, right? A commons is nothing more than a public good that we build out of private goods. We do it voluntarily; we do it through standardized legal tools; we do it through standardized technologies. That's all the commons is.

It's something that we build together because we think it's important. And a commons of data is something that's really unique because we make it from our own data. Although a lot of people like privacy as their methodology of control around data, at least some of us really like to share as a form of control.

What's remarkable about digital commons is that you don't need a big percentage; if your sample size is big enough, you can generate something massive and beautiful. Not that many programmers write free software, but we have the Apache web server. Not that many people who read Wikipedia edit, but it works. So as long as some people like to share as their form of control, we can build a commons, as long as we can get the information out.

And in biology, the numbers are even better. So Vanderbilt ran a study asking people, "We'd like to take your bio samples, your blood, and share them in a biobank," and only 5 percent of the people opted out. I'm from Tennessee; it's not the most science-positive state in the United States of America, but only 5 percent of people opted out. So people like to share if you give them the opportunity and the choice.

The reason that I got obsessed with this, besides the obvious family aspects, is that I spend a lot of time around mathematicians. Mathematicians are drawn to places where there's a lot of data because they can use it to tease signals out of noise. Those correlations they can tease out aren't necessarily causal agents, but math in this day and age is like a giant set of power tools that we're leaving on the floor, not plugged in, in health, while we use hand saws.

If we have a lot of shared genotypes and a lot of shared outcomes and a lot of shared lifestyle choices and a lot of shared environmental information, we can start to tease out the correlations between subtle variations in people, the choices they make, and the health they create as a result of those choices.

There's open-source infrastructure to do all of this. Sage Bionetworks is a non-profit that's built a giant math system that's waiting for data, but there isn't any. So that's what I do. I've actually started what we think is the world's first fully digital, fully self-contributed, unlimited in scope, global in participation, ethically approved clinical research study where you contribute the data.

So if you reach behind yourself and you grab the dust, if you reach into your body and grab your genome, if you reach into the medical system and somehow extract your medical record, you can actually go through an online informed consent process. Because the donation to the commons must be voluntary and it must be informed, you can actually upload your information and have it syndicated to the mathematicians who will do this sort of big data research.

The goal is to get a hundred thousand in the first year and a million in the first five years so that we have a statistically significant cohort that you can use to take smaller sample sizes from traditional research and map it against. So you can use it to tease out those subtle correlations between the variations that make us unique and the kinds of health that we need to move forward as a society.

I spent a lot of time around other commons; I've been around the early web; I've been around the early Creative Commons world, and there's four things that all of these share, which is they're all really simple.

So if you were to go to the website and enroll in this study, you're not going to see something complicated, but it's not simplistic. These things are weak intentionally, right? Because you can always add power and control to a system, but it's very difficult to remove those things if you put them in at the beginning.

Being simple doesn't mean being simplistic, and being weak doesn't mean weakness, right? Those are strengths in the system, and open doesn't mean that there's no money. Closed systems—corporations—make a lot of money on the open web, and one of the reasons why the open web exists is that corporations have a vested interest in the openness of the system.

So all of these things are part of this clinical study that we've created, and you can actually come in. All you have to be is 14 years old, willing to sign a contract that says, "I'm not going to be a jerk," basically, and you're in. You can start analyzing the data; you do have to solve a CAPTCHA as well.

Right? And if you'd like to build corporate structures on top of it, that's okay too; that's all in the consent. So if you don't like those terms, you don't come in. And it's very much the design principles of a commons that we're trying to bring to health data.

The other thing about these systems is that it only takes a small number of really unreasonable people working together to create it. It didn't take that many people to make Wikipedia, Wikipedia, or to keep it Wikipedia. And we're not supposed to be unreasonable in health.

So I hate this word "patient." I don't like being patient when systems are broken, and healthcare is broken. I'm not talking about the politics of health care; I'm talking about the way we scientifically approach health care. So I don't want to be patient, and the task I'm giving to you is to not be patient.

So I'd like you to actually try, when you go home, to get your data. You'll be shocked and offended, and I would bet outraged at how hard it is to get it. But it's a challenge that I hope you'll take. And maybe you'll share it; maybe you won't. If you don't have anyone in your family who's sick, maybe you wouldn't be unreasonable. But if you do, if you've been sick, then maybe you would.

We're going to be able to do an experiment in the next several months that lets us know exactly how many unreasonable people are out there. This is the Athena Breast Health Network. It's a study of 150,000 women in California, and they're going to return all the data to the participants of the study in a computable form with one-click ability to load it into the study that I've put together.

So we'll know exactly how many people are willing to be unreasonable. So what I'd end with is: the most beautiful thing I've learned since I quit my job almost a year ago to do this is that it really doesn't take very many of us to achieve spectacular results. You just have to be willing to be unreasonable.

And the risk we're running is not the risk those 14 men who got yellow fever ran, right? It's to be naked digitally in public. So you know more about me and my health than I know about you. It's asymmetric now, and being naked and alone can be terrifying. But to be naked in a group voluntarily, it can be quite beautiful.

So it doesn't take all of us; it just takes all of some of us. Thank you.
