Geometric distribution mean and standard deviation | AP Statistics | Khan Academy
So let's say we're going to play a game where on each person's turn they're going to keep rolling this fair six-sided die until they get a one, and we just want to see how many rolls it takes. So let's say we define some random variable, let's call it X, and let's define it as the number of rolls until we get a one.
So what's the probability that X is equal to 1? Pause this video and think about it.
All right, the probability that X is equal to 1 means that it only takes us one roll to get a 1. Well, that's going to be a one-sixth probability.
Well, what's the probability that X is equal to two? Well, that means that on the first roll we get something other than a one, so that is going to be 5/6, and then on the second roll, we get a 1, so that has a 1/6 probability. So it's (5/6) times (1/6).
And we could keep going. What's the probability that X is equal to 3? Pause the video and think about that.
Well, that means we miss on the first two, so we have a 5/6 chance of getting something other than a 1 on the first two rolls, so we could say that's (5/6) times (5/6), or we could write (5/6) squared, and then on the third roll we have the 1/6 chance of getting the 1. So, times 1/6.
And I think you see a pattern here, and you might recognize what type of random variable this is. This is a geometric variable.
Now, how do we know that? Well, each trial or each roll is either a success or a failure. Every time we roll, we either get a one or we don't. We have the same probability of success of rolling a one each trial. These are independent trials, and there's no set number of trials. It could take us an arbitrary number of trials to get the first success, so that's what tells us that we're dealing with the geometric random variable.
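The pattern in those three probabilities can be written as one rule: k - 1 misses followed by one success. Here is a minimal sketch of that rule in Python; the function name `geometric_pmf` is just a label chosen for this illustration, and exact fractions are used so the results match the hand calculations.

```python
from fractions import Fraction

def geometric_pmf(k, p):
    """P(X = k): k - 1 failures in a row, then one success."""
    if k < 1:
        raise ValueError("k must be a positive integer")
    return (1 - p) ** (k - 1) * p

p = Fraction(1, 6)  # chance of rolling a 1 on a fair six-sided die
for k in (1, 2, 3):
    # Prints 1/6, 5/36, and 25/216 for k = 1, 2, 3
    print(k, geometric_pmf(k, p))
```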
Now, one question is: what is going to be the mean of this geometric random variable? Well, we proved it in another video, where we talk about the expected value of a geometric random variable. When we talk about the expected value, we're really talking about the mean, and the result is a little bit intuitive. If you were to just guess what is the mean of a geometric random variable where the chance of success on each roll is 1/6, you might say, "Well, maybe on average it takes you about six tries," and you would be correct.
The mean of a geometric random variable is one over the probability of success on each trial. So in this situation, the mean is going to be 1 over the probability of success in each trial, which is 1/6, so it's equal to 6.
So, one way to think about it is on average, you would have six trials until you get a one.
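If you want to sanity-check that result numerically, one option is to add up k times P(X = k) over a lot of values of k and see that the total settles near 1/p. This is only a rough check, not the proof; the function name `partial_mean` and the cutoff of 500 terms are arbitrary choices for this sketch.

```python
def partial_mean(p, terms=500):
    """Sum k * P(X = k) for k = 1..terms; approaches 1/p as terms grows."""
    return sum(k * (1 - p) ** (k - 1) * p for k in range(1, terms + 1))

p = 1 / 6
approx_mean = partial_mean(p)
print(approx_mean)  # very close to 1 / p, which is 6
```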
Now, another question is: what's a measure of the spread of a geometric random variable? I don't prove this one in another video; maybe I'll do it eventually.
The standard deviation of a geometric random variable is the mean times the square root of (1 - p), or you could just write this as the square root of (1 - p), all of that over p. Now, in this situation, what would this be? Well, the standard deviation of this random variable, this geometric random variable, it's going to be the square root of (1 - 1/6), all of that over (1/6).
So this is going to be equal to the square root of (5/6) over (1/6), which is equal to 6 times the square root of (5/6). Let's get an approximate value for that. 5 divided by 6 is about 0.833.
Now, we take the square root of that, which is about 0.913, and then multiplying that times 6 gets us to about 5.5, so the standard deviation is approximately equal to 5.5.
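The same calculator steps can be sketched in Python. The function name `geometric_std` is made up for this example; it just applies the formula stated above, the square root of (1 - p) divided by p.

```python
import math

def geometric_std(p):
    """Standard deviation of a geometric random variable: sqrt(1 - p) / p."""
    return math.sqrt(1 - p) / p

sd_six = geometric_std(1 / 6)  # fair six-sided die
print(round(sd_six, 3))  # 5.477, which rounds to about 5.5
```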
And what's interesting about a geometric random variable: obviously the lowest value here in this case is 1, and then it could be 2, 3, and it can go higher and higher. It can get arbitrarily high. You could get really unlucky, and it might take you a thousand rolls in order to get that one. It could take you a million rolls (very low probability, but it could happen) in order to get that one.
And so another thing to realize about a geometric random variable's distribution, it tends to look something like this where the mean might be over here, and so you have a very long tail to the right of your mean.
And this is classic right skew, and so all geometric random variable distributions are right skewed. They have a long tail of values, an infinitely long tail of values they can take to the right.
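To put some numbers on that long tail: X is greater than k exactly when the first k rolls all miss, so P(X > k) is (1 - p) to the k-th power. The sketch below evaluates that for the six-sided die at a few cutoffs; the function name `tail_prob` and the particular cutoffs are just choices for this illustration.

```python
def tail_prob(k, p):
    """P(X > k): the first k trials are all failures."""
    return (1 - p) ** k

p = 1 / 6  # fair six-sided die
for k in (6, 12, 24, 100):
    # The probability shrinks quickly but never reaches zero
    print(k, tail_prob(k, p))
```

Even at 100 rolls the probability is still positive, which is the infinitely long tail to the right.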
Now, one last question. Instead of dealing with a six-sided die, what would be the situation if we were dealing with a 12-sided die? What would then be the mean of our random variable, and what would be the standard deviation of our random variable? Pause this video and think about that.
Well, the mean would be 1 over (1/12) because you have a probability of 1/12 every time of getting a 1. We're assuming we're playing the same game now with the 12-sided die, so 1 over (1/12) would be 12.
So, on average, it would take 12 rolls to get that first one, and then our standard deviation is going to be essentially the mean times the square root of (1 - 1/12).
Okay, let me write it this way: it's the square root of (1 - 1/12), all of that over (1/12), which is the same thing as 12 times the square root of (11/12).
11 divided by 12 is about 0.917; take the square root and then multiply that times 12, and you get about 11.5.
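Here is a small sketch that runs both dice through the same two formulas side by side, assuming the game is still "roll until you get a 1". The helper name `geometric_mean_and_std` is hypothetical. It also prints the standard deviation divided by the mean, which works out to the square root of (1 - p).

```python
import math

def geometric_mean_and_std(p):
    """Return (mean, standard deviation) for success probability p."""
    mean = 1 / p
    std = math.sqrt(1 - p) / p
    return mean, std

for sides in (6, 12):
    mean, std = geometric_mean_and_std(1 / sides)
    # Columns: sides, mean, standard deviation, std / mean
    print(sides, round(mean, 1), round(std, 1), round(std / mean, 3))
```

Because the square root of (1 - p) is always less than 1, the standard deviation comes out a bit below the mean for any die.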
And so you can see with a 12-sided die, it has the same pattern where you have your mean of your random variable and then you have a standard deviation that goes a reasonable bit on either side of the mean. It's almost equal to the mean, and actually, in both situations, it's a little bit lower than the mean, but then there are many, many, many values that go far to the right of your mean.
And so you have this classical right skew for a geometric random variable.