Estimating actual COVID-19 cases (novel coronavirus infections) in an area based on deaths
The goal of this video is to help us all estimate the actual new COVID-19 cases per day in your area, and it's based on analysis by Thomas Pueyo. He wrote an incredible blog post on Medium; this is the link, and I'll also include it in the description below. This is the data that he uses to do some of his analysis.
Now, some of you might be thinking, "Hey, I know the number of COVID cases in my area; they're reporting it on the news every day!" But that's the reported number of cases, and that's based on the people that happen to get the test. There are a lot of people who might not have symptoms yet, or their symptoms are not severe enough to get the test yet. So, the actual cases are likely far larger than the number of confirmed cases, and we can see that in graphical form.
Once again, this is a diagram put together by Thomas Pueyo; it's a screenshot from his blog post, which once again can be found here. This is all his analysis or based on his analysis. But this shows you what was happening in Hubei province, which is the province where Wuhan is. There are several interesting things here. The vertical axis is the number of cases, and the horizontal axis is the date, with one set of bars per day.
So, for example, we could pick January 23rd. The yellow bar tells us the number of confirmed new cases that day, so these are people who would have been tested, and then they tested positive. It looks like that number is about 300. But then we have this gray bar. This gray bar is the actual number of new cases that day, which is close to 2,500, so roughly eight times as high.
Now you might be saying, "How did they know the actual number of cases if they didn't test everyone?" Well, the way they did that is when someone tested positive, they asked them, "When did you first get the symptoms?" And if they said, "Hey, I first got the symptoms 10 days ago," they would be included as a true new case—an actual new case 10 days before that on January 13.
So, the Chinese officials were able to actually make these gray bars in hindsight based on when people said they first got the symptoms, and there's a lot of really interesting information here. First of all, we can see that Wuhan was shut down on January 23rd. So, let's draw a line between the pre-shutdown and post-shutdown, and you can see just as the city officials were starting to see confirmed cases, the actual cases were far higher.
But then they shut down the city, essentially significantly slowing down the spread rate. A few days later, the actual cases, which they were able to calculate in hindsight, start to flatten out and then go down. But even though they were going down, the confirmed new cases continued to go up because there is a delay. You can even see the delay right over here, and that is roughly the amount of time between when people show symptoms and they are actually tested.
Now you might be saying, "Hey, all right, this isn't too bad! It looks like things eventually became okay for Wuhan." But this is because they did a very serious shutdown. If they did not do this shutdown and slow the spread of the virus, you would have seen this exponential growth continue.
And it's also worth remembering what I just drew this curve on. This isn't the total number of cases; this is the number of new cases per day. If you want the total number of cases at a given point in time, you would have to sum up the gray or the yellow bars, depending on whether you want to look at actual or confirmed cases.
So, if you total up all of these gray bars over here, as of January 22nd, you get approximately 12,000 cases. While if you add up all of the yellow bars, that is roughly only 444 confirmed cases. So, before the city even went into shutdown, and this is with the Chinese doing reasonably good testing, you had a far higher number of cases than the confirmed cases would make you believe.
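To put a single number on that gap, here is a minimal Python sketch. It uses only the two approximate totals read off the chart above; the variable names are illustrative and not taken from Thomas's post.

```python
# Approximate cumulative totals for Hubei as of January 22nd,
# as read off the chart (both figures are rough).
actual_total = 12_000    # sum of the gray bars (cases dated by symptom onset)
confirmed_total = 444    # sum of the yellow bars (positive tests)

# Number of actual cases that existed for each confirmed case.
undercount_ratio = actual_total / confirmed_total
print(f"About {undercount_ratio:.0f} actual cases per confirmed case")
```

With these rounded totals, the cumulative gap is considerably larger than the roughly eightfold gap seen on January 23rd alone.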
And as large as the ratio is on a given day before the city shutdown between the number of actual new cases per day and the number of confirmed new cases per day, it's probably higher in a lot of the geographies where we live because we're not testing as well as the Chinese did.
For example, this is data once again compiled by Thomas Pueyo on his blog post. This is just a screen capture of it, and I'm really just giving his analysis. This shows the total tests performed and the tests performed per million citizens as of March 3rd, and you can see, for example, where I live, the United States is not doing so well.
And so the number of reported cases in places like the United States, where we are really just starting to ramp up testing, is far understating the number of actual cases out there. So, how do we go about estimating the actual number of cases in our area? Well, once again, I'm going to use Thomas's analysis.
We're going to be looking at the number of deaths and estimations of mortality rate, time from infection to death, and how fast the virus actually spreads. So, in other videos, I'll talk more about some of Thomas's analysis, but for mortality rate, it'll make the math simple, and this actually does seem to be a pretty good estimate.
We can assume that there's a one percent mortality rate. The reports are as low as 0.6 percent in South Korea and then as high as roughly five percent in places like Iran. But it looks like the higher numbers are where the hospital system is being overwhelmed, and then the lower numbers at the 0.6 percent might not be fully accounting for all of the mortality that will happen due to the cases that are actually out there.
So, we'll assume a mortality rate of one percent. The other thing we need to think about is the time from infection to death in those one percent of cases where someone does die. To figure that out, I will look at this data right over here. This top chart comes from this link, which Thomas cites, and I'll give the link in the description below.
This is the incubation period. This is an estimate of the time from when someone gets infected to when they start to show symptoms, and this estimate is roughly five days. Then, once you see symptoms, how long does it take to death in those one percent of cases, or whatever the percentage is?
Well, there are varying estimates, but to make the numbers easy, we can estimate roughly 15 days. So, one way to think about it is five days from infection to showing the symptoms and then another 15 days from showing the symptoms to death, for a total of 20 days from infection to death in what we're assuming is the one percent of cases.
So, I'll write 20 days. Now, the other thing we're going to estimate is the days to doubling. Days to double: this is how long it takes for the infection to double in the population, and this is going to be heavily dependent on what the population is doing, how dense they are, how much they're interacting.
But we'll look at some of these estimates, and they're in very different contexts. The lower the doubling time, the faster the virus is spreading, while if you have a population that's doing all the right things, taking all the precautions, the doubling time will be higher.
So, we can look at a conservative estimate and take a longer doubling time than all of these estimates. It'll make our math a little bit easier. Let's just assume a doubling time of five days. I'm using slightly different numbers than Thomas used, but it will be indicative, and you can do the same analysis with whatever estimates you choose.
So, let's assume five days to double, which might be conservative, especially for places like the United States, where we have not taken anywhere near the action of a place like China, South Korea, or Japan.
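One way to get a feel for what a doubling time means is to convert it into a day-over-day growth factor. The sketch below assumes steady exponential growth, which is a simplification. The function name is hypothetical, and the 2-day and 10-day values are only there for comparison; they are not estimates from the sources above.

```python
def daily_growth_factor(doubling_time_days: float) -> float:
    """Factor by which total cases multiply each day.

    Assumes steady exponential growth: applying this factor once per day
    for `doubling_time_days` days multiplies the total by exactly 2.
    """
    if doubling_time_days <= 0:
        raise ValueError("doubling time must be positive")
    return 2 ** (1 / doubling_time_days)


# 5 days is the assumption used here; 2 and 10 are illustrative only.
for days in (2, 5, 10):
    factor = daily_growth_factor(days)
    print(f"doubling every {days} days -> about {factor - 1:.0%} more cases per day")
```

A shorter doubling time gives a larger daily factor, which is why assuming a longer one is the conservative choice.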
Now, let's use these numbers to figure out what might actually be happening in our areas based on the data that we are presented with. So, let's say that we unfortunately hear on some day that there is one death in our region or in our city.
Now, based on our estimates, we're saying that the average time from infection to death is about 20 days. That means that person would likely have contracted the virus roughly 20 days ago. So I'm going to make a timeline. This is 20 days ago; this would be 15 days ago; this would be 10 days ago; and then this would be five days ago.
Now, it's possible that they were the only person who contracted the virus on that day, and then they happened to, unfortunately, get very sick and then pass away 20 days later. But if we assume that the mortality rate is roughly correct, it's quite possible that a hundred people were infected that day.
The person that we know about is the one in a hundred who gets sick enough to pass away. So we assume that 20 days ago, not one person but a hundred people were infected: the actual number of people infected that day is 100.
Once again, because it's a one percent mortality rate, if we assumed a 0.5 percent mortality rate, then we would say, "Alright, there might have been 200 people infected that day," 0.5 percent of whom get all the way to death 20 days later. If you assume a 5 percent mortality rate, which would be a very unfortunate situation, but that is a mortality rate that we are seeing in different parts of the world, then you would say, "Well, maybe there were 20 people infected that day."
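The arithmetic behind those three numbers is just one death divided by the assumed mortality rate. Here is a minimal Python sketch; the function name is hypothetical, and the rates are the ones mentioned above.

```python
def infections_per_death(mortality_rate: float) -> float:
    """Estimated number of people infected on the same day as the person
    who later died, given an assumed mortality rate (a fraction, e.g. 0.01)."""
    if not 0 < mortality_rate <= 1:
        raise ValueError("mortality_rate must be a fraction in (0, 1]")
    return 1 / mortality_rate


# The three mortality-rate assumptions discussed above.
for rate in (0.005, 0.01, 0.05):
    print(f"{rate:.1%} mortality -> about {infections_per_death(rate):.0f} infected that day")
```

The lower the mortality rate you assume, the more infections a single death implies.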
When you only have one, two, or three deaths in a region, that will make the estimates more difficult. But as we are, unfortunately, likely to see a larger number of deaths in various regions, these backward estimates will become more and more reasonable.
Now, if the number of infections in the population doubles every five days, what is going to happen after five days? You're going to have 200 cases in your region. Now, these wouldn't just be new cases; this would be the cumulative total number of cases due to those hundred.
Now, this is actually quite conservative because this is assuming that those 100 that were infected 20 days ago are the only infected cases in your region. There might be other infected cases that were infected before that date, but I'm just assuming that the hundred that were infected that day are the only cases to be conservative.
And so they double after five days, and then they'll double again after five more days, and so you will get to 400 cases. And then, after five more days, they will have doubled again (and I can't even fit it on the screen anymore), so you're going to have 800 cases.
And that means today, just by evidence of that one death, you probably have on the order of—and I can't even draw the whole bar—approximately 1600 cases.
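The whole back-of-the-envelope estimate can be written as a short Python sketch. The function name and parameter names are hypothetical; the defaults are the assumptions used in this walkthrough (one percent mortality, 20 days from infection to death, five days to double), and you can swap in your own.

```python
def estimate_current_cases(
    deaths: float,
    mortality_rate: float = 0.01,          # assumed: 1% of infections end in death
    days_infection_to_death: float = 20,   # assumed: 5 days incubation + 15 days to death
    doubling_time_days: float = 5,         # assumed: cases double every 5 days
) -> float:
    """Rough estimate of cumulative cases today implied by deaths seen today."""
    # Infections on the day the people who died were themselves infected.
    infected_back_then = deaths / mortality_rate
    # Number of doublings between that day and today.
    doublings = days_infection_to_death / doubling_time_days
    return infected_back_then * 2 ** doublings


# Reproduce the timeline: 20, 15, 10, 5 and 0 days ago, starting from 100 infections.
for days_elapsed in range(0, 21, 5):
    cases = 100 * 2 ** (days_elapsed / 5)
    print(f"{20 - days_elapsed:2d} days ago: about {cases:.0f} cumulative cases")

print(round(estimate_current_cases(deaths=1)))  # 1600
```

Like the timeline, this counts only the cases that descend from the people infected on that one day, so it is a conservative figure.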
And so this is just to be a little bit sobering about how serious this is and how much the data that we actually get is actually lagging the circumstances on the ground, particularly in places like the United States, where we are barely even getting started testing.
For example, in my county, which is Santa Clara County in California, we just had our second death unfortunately reported yesterday, and there was another death five days before that. Now, there are fewer than a hundred reported cases in my county, but based on this analysis, the actual number of infected persons in my county is likely to be at least a factor of ten more than that, and it could be as high as one thousand, two thousand, three thousand people.
We won't know for sure until we can do the type of hindsight analysis that the Chinese did, but this is to just remind us how serious the situation actually is. So, the big takeaway here is to take all of this very, very seriously, especially because the mortality rate itself can change depending on how well equipped the hospital system is to handle the situation.
If we all socially isolate and take the proper precautions, the spread rate will lower, and we won't overwhelm the hospital system and will hopefully be able to keep the mortality rate as low as possible. But if we don't take the precautions, and if we're just complacent because we see this lagging data that's being reported to us because of the lack of testing in places like the United States, then it's very possible that we eventually overwhelm the hospital system in the next few weeks, which would cause the mortality rate to go higher.