Conditional probability and independence | Probability | AP Statistics | Khan Academy
James is interested in how weather conditions relate to whether the downtown train he sometimes takes runs on time. For a year, James records the weather each day (sunny, cloudy, rainy, or snowy) and whether the train arrives on time or is delayed. His results are displayed in the table below.
"Alright, this is interesting. The table has three columns: on time, delayed, and total. For example, there were 170 sunny days that year; the train was on time on 167 of them and delayed on 3. We can read off the same breakdown for each weather condition. Then the question asks: for these days, are the events 'delayed' and 'snowy' independent?"
"So think about this, and remember that we can only calculate experimental probabilities here. You should always view experimental probabilities with some suspicion: the more experiments you run, the more likely the experimental probability is to approximate the true theoretical probability, but there is always some chance that the two differ, even by quite a lot."
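To see why more trials help, here is a minimal Python sketch that simulates days with a hypothetical "true" delay probability of 0.1 (an assumed value chosen for illustration, not taken from James's table) and reports how far the experimental probability lands from it as the number of simulated days grows. The function name `experimental_probability` is illustrative only.

```python
import random


def experimental_probability(p_true: float, n_trials: int, seed: int = 0) -> float:
    """Simulate n_trials independent days, each delayed with probability p_true,
    and return the observed fraction of delayed days."""
    rng = random.Random(seed)  # seeded so the run is reproducible
    delayed = sum(1 for _ in range(n_trials) if rng.random() < p_true)
    return delayed / n_trials


if __name__ == "__main__":
    P_TRUE = 0.1  # hypothetical true delay probability, NOT from James's data
    for n in (10, 100, 1_000, 100_000):
        estimate = experimental_probability(P_TRUE, n)
        print(f"n={n:>7}: estimate={estimate:.4f}, error={abs(estimate - P_TRUE):.4f}")
```

Small runs can miss the true value badly, while large runs typically land close to it; even so, any single run can still be off, which is the point made above.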
"Let's use this data to calculate experimental probabilities. The key question is: what is the probability that the train is delayed? Then we want to compare that with the probability that the train is delayed, given that it is snowy."
"If we knew the theoretical probabilities, and the probability of being delayed were exactly the same as the probability of being delayed given snowy, then the events 'delayed' and 'snowy' would be independent. But if the theoretical probability of being delayed given snowy were different from the probability of being delayed, then we would not say that these events are independent."
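In symbols, writing D for "delayed" and S for "snowy" (labels chosen here for convenience), the criterion described above can be stated as follows. The second form is the standard product rule, which follows from the definition of conditional probability and assumes P(S) > 0.

```latex
\begin{aligned}
P(D \mid S) &= \frac{P(D \cap S)}{P(S)}, \qquad P(S) > 0 \\
D,\ S \text{ independent} &\iff P(D \mid S) = P(D) \iff P(D \cap S) = P(D)\,P(S)
\end{aligned}
```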
"Now, we don't know the theoretical probabilities; we can only calculate the experimental ones, although we do have a good number of experiments here. So if the two experimental probabilities are quite different, I would feel confident saying that the events are dependent. If they are pretty close, it would be hard to claim that they are dependent, and you would probably lean towards independence. But let's calculate them."
"What is the probability that the train is delayed, regardless of the weather? Pause this video and try to figure that out."
"Well, let's see. In general, we have a total of 365 trials, or 365 experiments, and the train was delayed on 35 of them. So the experimental probability that the train is delayed is 35/365. Now, what's the probability that the train is delayed given that it is snowy? Pause the video and try to figure that out."
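The overall experimental probability of a delay is a single ratio of counts. This small Python sketch computes it with exact fractions, using only the two totals stated in the video (365 recorded days, 35 delays); the constant names are illustrative.

```python
from fractions import Fraction

TOTAL_DAYS = 365   # days James recorded
DELAYED_DAYS = 35  # days the train was delayed, all weather conditions combined

# Experimental P(delayed) = delayed days / all recorded days.
p_delayed = Fraction(DELAYED_DAYS, TOTAL_DAYS)  # reduces to 7/73

print(p_delayed, f"{float(p_delayed):.4f}")
```

Keeping the value as a `Fraction` avoids rounding until the very end, which makes the later comparison exact.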
"Well, let's see. We have a total of 20 snowy days, and the train was delayed on 12 of them. So the probability is 12/20, and if we multiply both the numerator and the denominator by 5, that is the same as 60/100. This is a 60% probability, or a 0.6 probability, of being delayed when it is snowy."
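The conditional probability can be read straight off the snowy row, or computed through the definition P(D | S) = P(D and S) / P(S). This Python sketch, using only the snowy counts from the video (20 snowy days, 12 of them delayed), checks that both routes agree; the constant names are illustrative.

```python
from fractions import Fraction

TOTAL_DAYS = 365
SNOWY_DAYS = 20         # snowy days in the year
SNOWY_AND_DELAYED = 12  # snowy days on which the train was delayed

# Directly: restrict attention to the snowy days only.
p_delayed_given_snowy = Fraction(SNOWY_AND_DELAYED, SNOWY_DAYS)

# Equivalently, via the definition P(D | S) = P(D and S) / P(S).
p_joint = Fraction(SNOWY_AND_DELAYED, TOTAL_DAYS)
p_snowy = Fraction(SNOWY_DAYS, TOTAL_DAYS)
assert p_joint / p_snowy == p_delayed_given_snowy

print(p_delayed_given_snowy, float(p_delayed_given_snowy))  # 3/5 0.6
```

The 365 cancels in the ratio, which is why restricting to the snowy row gives the same answer.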
"This is, of course, an experimental probability, and it is much higher than the probability of being delayed in general, which is less than 10%, or less than 0.1. With a calculator, 35/365 comes out to about 9.6%, or roughly 0.096. So, at least from the experimental data, a much higher proportion of snowy days are delayed than days in general."
"And so, based on this data, because the experimental probability of being delayed given snowy is so much higher than the experimental probability of being delayed in general, I would say that these events are not independent."
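The same conclusion can be cross-checked with the product rule: if the events were independent, we would expect about 365 × P(D) × P(S) delayed snowy days. This Python sketch uses only the counts given in the video and puts both comparisons side by side. It is a descriptive check, not a formal significance test; deciding how big a gap counts as "quite different" is still the judgment call described above.

```python
from fractions import Fraction

TOTAL_DAYS = 365
DELAYED_DAYS = 35
SNOWY_DAYS = 20
SNOWY_AND_DELAYED = 12

p_delayed = Fraction(DELAYED_DAYS, TOTAL_DAYS)
p_snowy = Fraction(SNOWY_DAYS, TOTAL_DAYS)
p_delayed_given_snowy = Fraction(SNOWY_AND_DELAYED, SNOWY_DAYS)

# Comparison used in the video: conditional vs. unconditional probability.
ratio = p_delayed_given_snowy / p_delayed

# Same question via the product rule: how many delayed snowy days would we
# expect if P(D and S) = P(D) * P(S) held exactly?
expected_if_independent = TOTAL_DAYS * p_delayed * p_snowy

print(f"P(D) = {float(p_delayed):.3f}, P(D | S) = {float(p_delayed_given_snowy):.3f}")
print(f"P(D | S) is about {float(ratio):.1f} times P(D)")
print(f"expected delayed snowy days if independent: "
      f"{float(expected_if_independent):.2f}, observed: {SNOWY_AND_DELAYED}")
```

Roughly 2 delayed snowy days would be expected under independence, against 12 observed, which matches the video's reading of the data.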
"So for these days, are the events delayed and snowy independent? No."