Marginal distribution and conditional distribution | AP Statistics | Khan Academy


Nov 11, 2024

Let's say we're a professor teaching a statistics class at a university, and we administer an exam. We are curious about the relationship between the amount of time that students study and the percent that they get correct on the test.

So, what we do is we grade all of the exams. We set up these buckets for the amount of time studied: 0 to 20 minutes, 21 to 40 minutes, 41 to 60 minutes, or greater than 60 minutes. Then, we also create buckets for the percent correct: 0 to 19% correct, 20 to 39% correct, 40 to 59% correct, 60 to 79% correct, or 80 to 100% correct.

Then, we figure out what percentage of our entire student population falls into each of these categories. For example, 2% of our students studied 21 to 40 minutes and got between 80 and 100% on the exam. Additionally, 16% studied for more than an hour, over 60 minutes, and got between 40 and 59% on the exam.

What I have right over here is a two-way table. It describes a joint distribution of two variables: the time studied and the percent correct.

Now, what we're going to introduce ourselves to in this video are two new ideas outside of just the joint distribution. One is the idea of the marginal distribution. I will write this in green: marginal distribution.

This is the idea of: okay, I can see I can break down my class based on both of these variables, but what if I only care about one? If I care about the distribution of just the percent correct, I don't want to break it down by time studied.

Well, if I want to figure out the distribution of percent correct, I could just total up each of these rows and I would end up with this distribution right over here. Just to make it clear, I would see that 20% of my students got 80 to 100% correct. I would see that 30% of my students got between 60 and 79% correct. I see that 35% of my students got between 40 and 59% correct. I think you see where this is going: 10% of my students got between 20 and 39% correct, and then finally, 5% of my students got between 0 and 19% correct.

All we did was total up each of these rows. Notice these now add up to 100%. This describes the distribution of the scores in my class. If someone were to just give you this column, you would say, “Okay, 20% of the students got 80 to 100%,” but you wouldn't know the breakdown by how much they actually studied. You'd say 5% got between 0 and 19% on the test, but you wouldn't know what the breakdown of that 5% was based on how much they studied.

So, this type of distribution is called a marginal distribution. Why marginal? Because you could view it as being written in the margin right over here. We total these rows and we write the totals in the margin.

Now, there's another marginal distribution we could figure out: the distribution of the amount of time people in my class studied. If we cared about that, we would total up each of these columns and look at this row right over here.

We'd say, “Okay, 7% of my class studied between 0 and 20 minutes, 15% of my class studied between 21 and 40 minutes, 43% of my class studied between 41 and 60 minutes, and 35% of my class studied more than 60 minutes, more than an hour.”

Once again, it's called a marginal distribution because I'm writing it in the margin, in this case below our table. If I looked only at this marginal distribution of the time studied, I would not know how that 35% who studied for more than an hour breaks down by percent correct.
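The row and column totals described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the transcript gives the "> 60 min" column, the 2% cell, and both sets of totals, but not the rest of the table, so the remaining cells below are placeholders chosen to agree with those totals.

```python
score_buckets = ["80-100%", "60-79%", "40-59%", "20-39%", "0-19%"]
time_buckets = ["0-20 min", "21-40 min", "41-60 min", "> 60 min"]

# Joint distribution, in percent of the whole class.
# Rows follow score_buckets, columns follow time_buckets.
# ASSUMPTION: only the "> 60 min" column and the 2% cell
# (80-100%, 21-40 min) come from the transcript. Every other cell is a
# placeholder picked so the row and column totals match the stated ones.
joint = [
    [0, 2, 8, 10],
    [1, 4, 20, 5],
    [2, 5, 12, 16],
    [3, 4, 3, 0],
    [1, 0, 0, 4],
]

# Marginal distribution of percent correct: total up each row.
score_marginal = {s: sum(row) for s, row in zip(score_buckets, joint)}

# Marginal distribution of time studied: total up each column.
time_marginal = {t: sum(col) for t, col in zip(time_buckets, zip(*joint))}

print(score_marginal)
print(time_marginal)
```

Each marginal adds up to 100, because every student lands in exactly one bucket of each variable.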

Now, there's another type of distribution that's related to these joint distributions, or you could say these two-way tables. That's thinking about the distribution of one variable given what bucket you fall in for the other variable.

So, let me write this down. If I want to say the distribution of scores... Let me write it this way: distribution of scores among those who studied more than 60 minutes.

So, where would I get that? Well, I'd go to the column of the people who studied more than 60 minutes and find the distribution of scores there, and I see it right over here. Among the folks who studied more than 60 minutes, I have this distribution of scores.

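In that column, 10% of the class got 80 to 100%, 5% got 60 to 79%, 16% got 40 to 59%, none of them (0%) got 20 to 39%, and 4% got 0 to 19%. These are percentages of the whole class, and they add up to the 35% who studied more than 60 minutes; dividing each by 35% gives the shares within that group, roughly 29%, 14%, 46%, 0%, and 11%. So, this distribution of one variable given the bucket that you fall into for another variable is called a conditional distribution, because you're getting a distribution conditioned on a value of another variable.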

So, this right over here is a conditional distribution.
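The figures read out for the more-than-60-minutes column are shares of the whole class, so they total 35 rather than 100. Here is a minimal sketch of rescaling that column so it sums to 100% within the group. The function name `conditional_distribution` and the dictionary layout are assumptions made for illustration.

```python
# Percent of the WHOLE class in the "> 60 minutes" column, as read out above.
joint_over_60 = {
    "80-100%": 10,
    "60-79%": 5,
    "40-59%": 16,
    "20-39%": 0,
    "0-19%": 4,
}


def conditional_distribution(column):
    """Rescale one column of a joint table so its entries sum to 100%."""
    total = sum(column.values())
    if total == 0:
        raise ValueError("cannot condition on an empty group")
    return {label: 100 * share / total for label, share in column.items()}


# The column total is the marginal for "> 60 minutes".
column_total = sum(joint_over_60.values())
conditional = conditional_distribution(joint_over_60)

for label, pct in conditional.items():
    print(f"{label}: {pct:.1f}% of those who studied more than 60 minutes")
```

Dividing by the column total is what turns "percent of the class" into "percent of this group". The ranking of the buckets stays the same, but the numbers become comparable across study-time groups of different sizes.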

The big idea here is that we have this two-way table, and we're trying to study how two variables relate to each other. If we care about just the distribution of one of the variables, for example, the time studied, we can sum up the columns here and get this marginal distribution.

If we cared about the distribution of percent correct, we could sum up the rows and get that marginal distribution. And if we wanted, as in the case I just talked about, the distribution of one variable (here, the percent correct) given a certain value of the other variable, that's going to be a conditional distribution.
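In symbols, the three ideas can be written compactly. The notation is an assumption added here, not something used in the video: X stands for the time-studied bucket, Y for the percent-correct bucket, and p(x, y) for the joint table entry.

```latex
% X = time-studied bucket, Y = percent-correct bucket (symbols assumed here)
p_Y(y) = \sum_{x} p(x, y), \qquad
p_X(x) = \sum_{y} p(x, y), \qquad
p(y \mid x) = \frac{p(x, y)}{p_X(x)}
```

For instance, with the 16% joint entry and the 35% column total from the table, the share scoring 40 to 59% among those who studied more than 60 minutes is 16/35, or about 46%.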
