yego.me

Marginal distribution and conditional distribution | AP Statistics | Khan Academy


Nov 11, 2024

Let's say we're a professor teaching a statistics class at a university, and we administer an exam. We are curious about the relationship between the amount of time that students study and the percent that they get correct on the test.

So, what we do is we grade all of the exams. We set up these buckets: the time studied - 0 to 20 minutes, 21 to 40 minutes, 41 to 60 minutes, or greater than 60 minutes. So, those are our buckets for the amount of time studying. Then, we also create buckets for the percent correct: 0 to 19% correct, 20 to 39% correct, 40 to 59% correct, 60 to 79% correct, or 80 to 100% correct.

Then, we figure out what percentage of our entire student population falls into each of these categories. For example, 2% of our students studied 21 to 40 minutes and got between 80 and 100% on the exam. Additionally, 16% studied for more than an hour, over 60 minutes, and got between 40 and 59% on the exam.

What I have right over here is a two-way table. It describes a joint distribution between what you can view as two variables: the time studied and the percent correct.

Now, what we're going to introduce ourselves to in this video are two new ideas outside of just the joint distribution. One is the idea of the marginal distribution. I will write this in green: marginal distribution.

This is the idea of: okay, I can see I can break down my class based on both of these variables, but what if I only care about one? If I care about the distribution of just the percent correct, I don't want to break it down by time studied.

Well, if I want to figure out the distribution of percent correct, I could just total up each of these rows and I would end up with this distribution right over here. Just to make it clear, I would see that 20% of my students got 80 to 100% correct. I would see that 30% of my students got between 60 and 79% correct. I see that 35% of my students got between 40 and 59% correct. I think you see where this is going: 10% of my students got between 20 and 39% correct, and then finally, 5% of my students got between 0 and 19% correct.

All we did was total up each of these rows. Notice these now add up to 100%. This describes the distribution of the scores in my class. If someone were to just give you this column, you would say, “Okay, 20% of my students got 80 to 100%,” but you wouldn't know the breakdown by how much they actually studied. You'd say 5% got between 0 and 19% on my test, but you wouldn’t know what the breakdown of that 5% was based on how much they studied.
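Totaling the rows can be sketched in a few lines of Python. This is only an illustration, not the table from the video: the transcript gives the more-than-60-minutes column, the 2% cell, and the totals, so the remaining interior cells below are assumed values chosen to agree with those totals.

```python
# Joint distribution, as percent of the whole class.
# Rows are score buckets; columns are time studied
# (0-20, 21-40, 41-60, >60 minutes), in that order.
# ASSUMPTION: only the >60 column, the 2% cell (21-40 min, 80-100%),
# and the row/column totals come from the transcript. The other
# interior cells are illustrative values picked to match those totals.
joint = {
    "80-100%": [0, 2, 8, 10],
    "60-79%":  [0, 5, 20, 5],
    "40-59%":  [2, 5, 12, 16],
    "20-39%":  [4, 3, 3, 0],
    "0-19%":   [1, 0, 0, 4],
}

# Marginal distribution of percent correct: total up each row.
score_marginal = {bucket: sum(row) for bucket, row in joint.items()}

for bucket, pct in score_marginal.items():
    print(f"{bucket}: {pct}% of the class")
print("total:", sum(score_marginal.values()))
```

The row totals come out to 20, 30, 35, 10, and 5, which add up to 100, matching the numbers read out above.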

So, this type of distribution is called a marginal distribution. Why? Because you could view it as being written in the margin right over here. We total these rows and we write the totals in the margin.

Now, there's another marginal distribution we could figure out: the distribution of the amount of time people study in my class. If we cared about that, we would total up each of these columns and look at this right over here.

We'd say, “Okay, 7% of my class studied between zero and 20 minutes, 15% of my class studied between 21 and 40 minutes, 43% of my class studied between 41 and 60 minutes, and 35% of my class studied more than 60 minutes, more than an hour.” If I just look at this marginal distribution, this marginal distribution of the time studied, I'm not able to get the breakout of that 35% that studied for more than an hour.

Once again, it's called that because I'm writing it in the margin, in this case below our table. If I just looked at that marginal distribution, I would not know the breakdown by the actual percent correct.
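Totaling the columns works the same way. Here is a minimal sketch, again using an assumed table: only the more-than-60-minutes column, the 2% cell, and the totals are from the transcript, and the other cells are made up to be consistent with them.

```python
time_buckets = ["0-20 min", "21-40 min", "41-60 min", ">60 min"]

# Each inner list is one score bucket (80-100% at the top, 0-19% at
# the bottom); entries are percent of the whole class.
# ASSUMPTION: interior cells outside the >60 column and the 2% cell
# are illustrative, chosen so the totals match the transcript.
rows = [
    [0, 2, 8, 10],   # 80-100% correct
    [0, 5, 20, 5],   # 60-79% correct
    [2, 5, 12, 16],  # 40-59% correct
    [4, 3, 3, 0],    # 20-39% correct
    [1, 0, 0, 4],    # 0-19% correct
]

# zip(*rows) regroups the table by column, so each column can be summed.
time_marginal = {
    label: sum(column) for label, column in zip(time_buckets, zip(*rows))
}

for label, pct in time_marginal.items():
    print(f"{label}: {pct}% of the class")
```

The column totals are 7, 15, 43, and 35. Like the row totals, they add up to 100, but they carry no information about how each group scored.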

Now, there's another type of distribution that's related to these joint distributions, or you could say these two-way tables. That's thinking about the distribution of one variable given what bucket you fall in for the other variable.

So, let me write this down. If I want to say the distribution of scores... Let me write it this way: distribution of scores among those who studied more than 60 minutes.

So, where would I get that? Well, I'd go to the column of the people who studied more than 60 minutes and find this distribution of scores; I see it right over here. Among the folks who studied more than 60 minutes, I have this distribution of scores.

Reading down that column: 10% of the class got 80 to 100%, 5% got 60 to 79%, 16% got 40 to 59%, none of them (0%) got 20 to 39%, and 4% got 0 to 19%. These are percentages of the whole class, so they add up to the 35% who studied more than 60 minutes; dividing each by 35% gives the share within that group. So, this distribution of one variable given a bucket that you're falling into for another variable is called a conditional distribution, because you're getting a distribution conditioned on a value of another variable.

So, this right over here is a conditional distribution.
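The column for students who studied more than 60 minutes is given as percentages of the whole class, so it totals 35% rather than 100%. One way to express it as shares within that group, sketched here using only the numbers quoted in the transcript, is to divide each entry by the column total. The variable names are just for illustration.

```python
from fractions import Fraction

score_buckets = ["80-100%", "60-79%", "40-59%", "20-39%", "0-19%"]

# Percent of the WHOLE class that studied more than 60 minutes and
# landed in each score bucket (values quoted in the transcript).
over_60_joint = [10, 5, 16, 0, 4]

# Share of the class that studied more than 60 minutes (the column total).
group_share = sum(over_60_joint)

# Conditional distribution of scores, given more than 60 minutes studied:
# each joint percentage divided by the column total.
conditional = {
    bucket: Fraction(pct, group_share)
    for bucket, pct in zip(score_buckets, over_60_joint)
}

for bucket, frac in conditional.items():
    print(f"{bucket}: {float(frac):.1%} of those who studied more than 60 minutes")
```

Within the group, the shares add up to exactly 1. For example, 10 out of 35, or about 28.6% of those who studied more than an hour, scored 80 to 100%.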

The big idea here is that you have this two-way table, and we're trying to study how two variables relate to each other. If we care about just the distribution of one of the variables, for example, the time studied, we can sum up the columns here and get this marginal distribution.

If we cared about the distribution of percent correct, we could sum up the rows and get that distribution. And if we wanted the distribution of one variable, in this case the distribution of percent correct, given a certain value of another variable, well, that's going to be a conditional distribution.
