Identifying individuals, variables and categorical variables in a data set | Khan Academy
We're told that millions of Americans rely on caffeine to get them up in the morning, which is true. Although, if I drink caffeine in the morning, I'm very sensitive; I wouldn't be able to sleep at night.
Here's nutritional data on some popular drinks at Ben's Beans Coffee Shop. All right, so here we have the different names of the drinks, and then here we have the type of the drink, and it looks like they're either hot or cold. Here we have the calories for each of those drinks, here we have the sugar content in grams for each of those drinks, and here we have the caffeine in milligrams for each of those drinks.
Then we are asked, "The individuals in this data set are," and we have three choices: Ben's Beans customers, Ben's Beans drinks, or the caffeine contents. Now, we have to be careful; when someone says the individuals in a data set, they don't necessarily mean people; they could be things. The individuals in this data set, each of these rows, are the particular drinks sold at Ben's Beans Coffee Shop.
So, the different drinks that Ben's Beans offers, those are the individuals in this data set. So, they're Ben's Beans drinks. Next, they ask us to complete the sentence "The data set contains," and we have to say how many variables there are and how many of those variables are categorical.
So, if we look up here, let's look at the variables. This first column, which is essentially giving us the name of the drink, wouldn't be a variable; it would be more of an identifier. But all of these other columns are representing variables.
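To make the row-and-column picture concrete, here is a minimal Python sketch. The drink names and numbers are made up for illustration (they are not the values from the table in the video), and the column names `name`, `type`, `calories`, `sugars_g`, and `caffeine_mg` are assumed labels. Each row is one individual, the `name` column only identifies the row, and every other column is a variable.

```python
# Hypothetical rows: names and numbers are invented for illustration,
# not taken from the Ben's Beans table in the video.
drinks = [
    {"name": "Drink A", "type": "hot", "calories": 120, "sugars_g": 14, "caffeine_mg": 95},
    {"name": "Drink B", "type": "cold", "calories": 210, "sugars_g": 30, "caffeine_mg": 150},
    {"name": "Drink C", "type": "hot", "calories": 5, "sugars_g": 0, "caffeine_mg": 75},
]

IDENTIFIER = "name"  # labels each individual; not counted as a variable

# One individual per row, identified by its name.
individuals = [row[IDENTIFIER] for row in drinks]

# Every column other than the identifier is a variable.
variables = [column for column in drinks[0] if column != IDENTIFIER]

print("individuals:", individuals)
print("variables:", variables)
```

With these made-up rows there are three individuals (the drinks) and four variables, which mirrors the structure of the table being described.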
So, for example, type is a variable; it can either be hot or cold. Because it can only take on one of a limited number of buckets, each drink is going to fit in one category or another. And you don't have to have just two categories; you could have more than two, but it isn't a numerical variable that can take on a whole range of different values.
So, this right over here is a categorical variable. Calories is not a categorical variable; you could have something with 4.1 calories; you could have something with 178. Things aren't fitting into nice buckets.
Same thing for sugars and for the caffeine; those are quantitative variables that don't just fit into a category. And so, here I would say that we have four variables: one, two, three, four, one of which is categorical. So, that would be choice A over here.
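The counting above can be sketched in a few lines of Python. This is only a rough illustration: the values are made up, the column names are assumed, and the rule used here (text labels mean categorical, numbers mean quantitative) is a simplification that happens to work for this table. In other data sets, numbers such as zip codes can still be categorical.

```python
def classify(values):
    """Rough rule of thumb: a column whose values are all numbers is
    quantitative; anything else (such as text labels) is categorical."""
    if all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in values):
        return "quantitative"
    return "categorical"

# Hypothetical columns: values are invented for illustration only.
columns = {
    "type": ["hot", "cold", "hot"],
    "calories": [120, 210, 4.1],
    "sugars_g": [14, 30, 0],
    "caffeine_mg": [95, 150, 75],
}

kinds = {name: classify(values) for name, values in columns.items()}
categorical = [name for name, kind in kinds.items() if kind == "categorical"]

print(len(kinds), "variables,", len(categorical), "categorical:", categorical)
```

Under these assumptions the sketch finds four variables, one of which (`type`) is categorical, matching the answer reached above.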