Conditional probability tree diagram example | Probability | AP Statistics | Khan Academy
A company screens job applicants for illegal drug use at a certain stage in their hiring process. The specific test they use has a false positive rate of 2% and a false negative rate of 1%. Suppose that 5% of all their applicants are actually using illegal drugs.
We randomly select an applicant; given the applicant tests positive, what is the probability that they are actually on drugs? So let's work through this together.
First, let's make sure we understand what they're telling us. The job applicants take a drug test, and the test has a false positive rate of 2%. What does that mean? It means that in 2% of the cases where the test should have read negative, because the person didn't take drugs, it actually read positive. That's a false positive: it should have read negative, but it read positive.
Another way to think about it: if someone who does not take drugs takes this test, there's a 2% chance that it says they did take illegal drugs. They also say that the test has a false negative rate of 1%. What does that mean? It means that 1% of the time, if someone actually did take illegal drugs, the test will say that they didn't. It falsely gives a negative result when it should have given a positive one.
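Written as conditional probabilities, the two error rates and their complements look like this. The notation is ours, not the video's: "+" and "-" stand for the test result, and "drugs" and "no drugs" for the applicant's actual status.

$$
P(+ \mid \text{no drugs}) = 0.02, \qquad P(- \mid \text{no drugs}) = 1 - 0.02 = 0.98,
$$
$$
P(- \mid \text{drugs}) = 0.01, \qquad P(+ \mid \text{drugs}) = 1 - 0.01 = 0.99.
$$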
Then they say that 5% of all their applicants are actually using illegal drugs. There are several ways to think about this problem, but one of the easiest is to make up a large number of applicants. I'll pick a number that keeps the arithmetic simple, so let's say that we start off with 10,000 applicants.
I'll talk in terms of both absolute numbers and percentages. I just made this number up; it could have been 1,000, or it could have been 100,000, but I like 10,000 because it makes the math easy. These 10,000 applicants are 100% of the applicants.
Now, they give us some crucial information here. They tell us that 5% of all their applicants are actually using illegal drugs. So we can immediately break this group of 10,000 into the ones who are doing drugs and the ones who are not: 5% are actually on drugs, and 95% are not on drugs.
What is 5% of 10,000? So that would be 500. So 500 are on drugs, and once again this is 5% of our original population. How many are not on drugs? Well, 9,500 are not on drugs, and once again this is 95 percent of our group of applicants.
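The first split of the tree can be sketched in a few lines of Python. The variable names are ours, chosen for illustration; the population of 10,000 is the same made-up number used above.

```python
# Made-up population size, as in the example; any large number would work.
applicants = 10_000
prevalence = 0.05  # 5% of applicants actually use illegal drugs

# First branch of the tree: split the applicants by actual drug use.
# round() guards against tiny floating-point error in 10_000 * 0.05.
on_drugs = round(applicants * prevalence)
not_on_drugs = applicants - on_drugs

print(on_drugs, not_on_drugs)  # 500 9500
```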
Now let's administer the test. What is going to happen when we administer the test to the people who are on drugs? Well, the test ideally would give a positive result; it would say positive for all of them. But we know that it's not a perfect test; it's going to give negative for some of them. It will falsely give a negative result for some of them, and we know that because it has a false negative rate of 1%.
So of these 500, 99% are going to get the correct result: they're going to test positive. What is 99% of 500? That's 495, so 495 are going to test positive. The remaining 1% of the 500, which is 5, are going to test negative. They falsely test negative; this is the false negative rate at work.
What percent of our original applicant pool is on drugs and tests positive? That's 495 out of 10,000, or 4.95%. And what percent of the original applicant pool is on drugs but tests negative, so that the test says they are not taking drugs? That's 5 out of 10,000, or 0.05%.
Another way that you could get these percentages is if you take five percent and multiply it by one percent, you're going to get 0.05 percent; five hundredths of a percent. If you take five percent and multiply it by 99%, you're going to get four point nine five percent.
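Here is a minimal sketch of the "on drugs" branch: it computes the counts, then checks that dividing by the whole pool and multiplying along the tree path give the same share. The names are illustrative, not from the video.

```python
applicants = 10_000
on_drugs = 500               # 5% of the 10,000 applicants
false_negative_rate = 0.01   # 1% of drug users wrongly test negative

# Second level of the tree, "on drugs" branch.
false_negatives = round(on_drugs * false_negative_rate)  # test negative
true_positives = on_drugs - false_negatives              # test positive

# Share of the whole pool, two ways: count / total, or product along the path.
share_by_count = true_positives / applicants
share_by_path = 0.05 * (1 - false_negative_rate)

print(true_positives, false_negatives)  # 495 5
print(f"{share_by_count:.2%} vs {share_by_path:.2%}")
```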
Now let's keep going, to the folks who aren't taking drugs. This is where the false positive rate comes into effect. We have a false positive rate of 2%, so 2% of them are going to test positive. What's 2% of 9,500? It's 190, so 190 would test positive even though they're not on drugs. This is the false positive rate at work.
They test positive, and the other 98% will correctly come out negative. That's 9,500 minus 190, or 9,310, who will correctly test negative.
Now what percent of the original applicant pool is this? 190 is 1.9%. We could calculate it as 190 over 10,000, or we could just say that 2% of 95% is 1.9%. Once again, we multiply along the path of the tree.
What percent is 9,310? That is 93.1%. You could compute it as 9,310 over 10,000, or you can multiply along the path of our probability tree: 95% times 98% gets us to 93.1%.
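The "not on drugs" branch can be sketched the same way. This is an illustrative snippet with our own variable names; it reproduces the 190 and 9,310 counts and their shares of the pool.

```python
applicants = 10_000
not_on_drugs = 9_500          # 95% of the 10,000 applicants
false_positive_rate = 0.02    # 2% of non-users test positive anyway

# Second level of the tree, "not on drugs" branch.
false_positives = round(not_on_drugs * false_positive_rate)
true_negatives = not_on_drugs - false_positives

print(false_positives, true_negatives)  # 190 9310

# Shares of the whole pool; these match 0.95 * 0.02 and 0.95 * 0.98.
print(f"{false_positives / applicants:.1%}")  # 1.9%
print(f"{true_negatives / applicants:.1%}")   # 93.1%
```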
But now I think we are ready to answer the question: given that the applicant tests positive, what is the probability that they are actually on drugs?
So let's look at the first part: given the applicant tests positive. Which applicants actually tested positive? You have these 495 who correctly tested positive, and then you have these 190 who incorrectly tested positive. They weren't on drugs, but they did test positive.
So how many tested positive? Well, we have 495 plus 190 tested positive. That's the total number that tested positive. Which of them were actually on drugs? Well, of the ones that tested positive, 495 were actually on drugs.
So the probability is 495 divided by (495 plus 190), which is 495/685, or approximately 0.7226. So we could say approximately 72%.
Now this is really interesting. Given that the applicant tests positive, what is the probability that they are actually on drugs? When you look at the false positive and false negative rates, they seem quite low. But when you actually do the calculation, the probability that someone who tests positive is actually on drugs is fairly high, yet not nearly as high as those low error rates might suggest. It's not as if you can say, when someone tests positive, "Oh, they are definitely taking the drugs."
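Conditioning on a positive test means keeping only the two "tested positive" branches of the tree and asking what fraction of them is the "on drugs" branch. A minimal sketch with the counts from above (variable names are ours):

```python
true_positives = 495   # on drugs and tested positive
false_positives = 190  # not on drugs but tested positive

# Condition on "tested positive": keep only the positive branches of the tree.
tested_positive = true_positives + false_positives
p_drugs_given_positive = true_positives / tested_positive

print(tested_positive)                  # 685
print(f"{p_drugs_given_positive:.4f}")  # 0.7226
```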
You could also get to this result just by using the percentages. For example, you could ask what percentage of the original applicants end up testing positive. That's 4.95% plus 1.9%, or 6.85%.
And of them, what percentage were actually on drugs? That was the 4.95%. Dividing 4.95% by the total, 4.95% plus 1.9%, gives exactly the same result, about 72%.
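The percentage version of the calculation is an instance of Bayes' theorem. In notation of our own choosing, with "+" for a positive test and "drugs" / "no drugs" for the applicant's actual status, it reads:

$$
P(\text{drugs} \mid +)
= \frac{P(\text{drugs})\,P(+ \mid \text{drugs})}
       {P(\text{drugs})\,P(+ \mid \text{drugs}) + P(\text{no drugs})\,P(+ \mid \text{no drugs})}
= \frac{0.05 \times 0.99}{0.05 \times 0.99 + 0.95 \times 0.02}
= \frac{0.0495}{0.0685} \approx 0.7226
$$

The numerator is the 4.95% path through the tree, and the denominator is the 6.85% of all applicants who test positive.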
Now there's an interesting takeaway here. This says that of the people who test positive, 72% are actually on drugs. You could think about it the other way around: of the 495 plus 190 people who test positive, what percentage aren't on drugs? That was 190 of them, and 190 out of 685 comes out to approximately 28%, which is 100% minus 72%.
So suppose we were in a court of law, and let's say I tested positive for drugs, and the prosecuting attorney says, "Look, this test is very good; it only has a false positive rate of 2%. Sal tested positive, so he is probably taking drugs."
A jury that doesn't really understand this, or doesn't go through the trouble that we just did, might say, "Oh yeah, Sal probably took the drugs." But when we look at this, even if I test positive using this test, there's a 28% chance that I'm not taking drugs and was just in the false positive group.
The reason this 28% is so much larger than the 2% false positive rate is that, when we looked at the original division between those who take drugs and those who don't, most applicants don't take illegal drugs. So 2% of that much larger group of non-users, 190 people, is a fairly large number relative to the 495 who do take drugs and test positive.
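Among the applicants who tested positive there are only two possibilities, so the two probabilities must add up to 1. A short sketch checking that with the counts from the tree (variable names are illustrative):

```python
true_positives = 495   # on drugs, tested positive
false_positives = 190  # not on drugs, tested positive
tested_positive = true_positives + false_positives

p_drugs = true_positives / tested_positive      # about 72%
p_no_drugs = false_positives / tested_positive  # about 28%

# Complementary outcomes among the positives: p_drugs + p_no_drugs == 1.
print(f"{p_drugs:.0%} on drugs, {p_no_drugs:.0%} not on drugs")  # 72% on drugs, 28% not on drugs
```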
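To see how much the base rate matters, here is a minimal sketch that keeps the test's 2% false positive and 1% false negative rates fixed and varies the share of applicants who use drugs. Only the 5% prevalence comes from the example; the other prevalence values are hypothetical, chosen for comparison, and the function name is ours.

```python
def share_of_positives_not_on_drugs(prevalence,
                                    false_positive_rate=0.02,
                                    false_negative_rate=0.01):
    """Fraction of positive tests that come from applicants NOT on drugs."""
    true_pos = prevalence * (1 - false_negative_rate)   # on drugs, test positive
    false_pos = (1 - prevalence) * false_positive_rate  # not on drugs, test positive
    return false_pos / (true_pos + false_pos)


# 5% is the example's prevalence; the other values are hypothetical comparisons.
for prevalence in (0.01, 0.05, 0.20, 0.50):
    share = share_of_positives_not_on_drugs(prevalence)
    print(f"prevalence {prevalence:.0%}: {share:.1%} of positive tests are false positives")
```

With these fixed error rates, the rarer drug use is among applicants, the larger the share of positive tests that are false positives; at a hypothetical 1% prevalence, about two thirds of positives would come from non-users.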
So I will leave you there. This is fascinating not just for this particular case, but you will see analysis like this all the time. When we're looking at whether a certain medication is effective or a certain procedure is effective, it's important to be able to do this analysis.