yego.me

Moderating content with logical operators | Intro to CS - Python | Khan Academy


Nov 10, 2024

Let's design a program with compound Boolean expressions.

We're working on an automated content moderation system for our site. We want our system to automatically flag posts that seem questionable so our team can investigate further and decide which ones to take down. We also want to automatically promote any posts that we think are particularly useful so that they appear at the top of the page.

Let's think about what our algorithm for flagging posts might look like. Our moderators have told us that they're especially wary of new accounts: it's rare for users who have been on our site for a while to suddenly start posting a ton of spam. There are, of course, legitimate new users, and we don't want to discourage them from using our site by flagging their posts all the time. So we think the best approach is to check for a combination of the conditions we have available: how old the account is, and the overall sentiment of the post, whether it's positive, neutral, or negative.

For sentiment, we're most worried about negative posts. We definitely don't want users bullying each other or promoting violence on our site. So we'll flag posts that are negative and come from new accounts.

All right, now what about an algorithm for promoting useful posts? We want to make sure not to promote negative posts, but we also want to be fair and not just promote posts that talk about how great our site is. So we'll consider both positive and neutral posts.

We also want to value consistent users: people who have been on our site for a while and represent trusted voices. So we'll promote neutral or positive posts from trusted accounts. We recognize this isn't a perfect algorithm, but we think it's good enough for what we're trying to do here.

Next step: let's translate our content moderation algorithms into code. We start with our two pieces of data: the sentiment of the post and the user's account age in days. Our algorithm for flagging was: flag the post if the sentiment is negative and the account is new. Let's say "new" means less than a week old. That means our condition is: the sentiment equals (==) "negative", and the account age in days is less than 7. It's an AND because we only want to flag the post if both conditions are true.

We put our compound condition in an if statement, and inside the if statement we print our content moderation decision. That lets our moderators know to take a closer look at this post.

Okay, now let's test this on a few different posts. A negative post created by a new user should get this flag. A negative post created by a longtime user shouldn't, and a positive post created by a new user shouldn't either.
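A minimal sketch of the flagging rule run against those three test posts, assuming sentiment is stored as one of the strings "positive", "neutral", or "negative". The helper name `should_flag`, the printed messages, and the specific account ages are hypothetical; the original program may simply put the condition in a top-level if statement.

```python
def should_flag(sentiment: str, account_age_days: int) -> bool:
    """Return True if a post should be sent to moderators for review."""
    # "New" means less than a week old, so the cutoff is 7 days.
    # AND: flag only when the post is negative *and* the account is new.
    return sentiment == "negative" and account_age_days < 7


# The three test posts (account ages are made-up example values).
test_posts = [
    ("negative", 2),    # negative post, new user: should be flagged
    ("negative", 365),  # negative post, longtime user: not flagged
    ("positive", 2),    # positive post, new user: not flagged
]

for sentiment, account_age_days in test_posts:
    if should_flag(sentiment, account_age_days):
        print(f"{sentiment} post, {account_age_days}-day-old account: flagged for review")
    else:
        print(f"{sentiment} post, {account_age_days}-day-old account: no action")
```

Only the first post satisfies both sides of the AND, so it is the only one flagged.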

Let's work on our post promotion algorithm next. This algorithm was for neutral or positive posts from trusted accounts. Because there are only three possible sentiments (positive, neutral, and negative), "neutral or positive" is equivalent to saying the sentiment is not equal to negative. For trusted users, let's go with an account age that's greater than or equal to 30 days. That isn't a perfect measure of trust, but we think it'll provide a good approximation.
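Because sentiment can only take three values, the claim that "neutral or positive" means the same thing as "not negative" can be checked exhaustively. A short sketch, again assuming the sentiments are stored as lowercase strings:

```python
SENTIMENTS = ["positive", "neutral", "negative"]

for sentiment in SENTIMENTS:
    neutral_or_positive = sentiment == "neutral" or sentiment == "positive"
    not_negative = sentiment != "negative"
    # The two expressions must agree for every possible sentiment.
    assert neutral_or_positive == not_negative
    print(f"{sentiment:>8}: neutral_or_positive={neutral_or_positive}, not_negative={not_negative}")

print("Equivalent for all three sentiments.")
```

The shortcut only holds while these are the sole possible values; if a fourth sentiment were ever added, "not negative" would start matching it too.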

So we add our if statement, and inside it we just want to signal that the post has been featured, so we indent a print function call under the if statement. Then let's run through a few test cases to make sure this is working as intended: a negative post by a trusted user, a positive post by a new user, and a neutral post by a trusted user.
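A sketch of the promotion rule and its three test cases, under the same string-sentiment assumption. The helper name `should_feature`, the messages, and the account ages are hypothetical:

```python
def should_feature(sentiment: str, account_age_days: int) -> bool:
    """Return True if a post should be promoted to the top of the page."""
    # "Neutral or positive" is written as "not negative",
    # which works because there are only three possible sentiments.
    # Trusted users are approximated as accounts at least 30 days old.
    return sentiment != "negative" and account_age_days >= 30


test_posts = [
    ("negative", 365),  # negative post, trusted user: not featured
    ("positive", 2),    # positive post, new user: not featured
    ("neutral", 365),   # neutral post, trusted user: featured
]

for sentiment, account_age_days in test_posts:
    if should_feature(sentiment, account_age_days):
        print(f"{sentiment} post, {account_age_days}-day-old account: featured!")
    else:
        print(f"{sentiment} post, {account_age_days}-day-old account: not featured")
```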

We have it working, but we want to refine our algorithm a bit. We're noticing that some posts that aren't very useful are getting featured, like that post we had at the beginning that just said "hi." We think it's a fair generalization that posts that are very short or very long are probably not the most useful. So let's add another condition to our feature case.

To do this, we need a new piece of information about each post: how many words it has. Our team says this data should be easy to get, so we'll add a new variable for the word count. This is about to make our condition very long, so I'm going to break it up into multiple pieces to make it easier to read. The post lengths we don't like are 3 words or fewer, or more than 200 words.

We use an OR here because a length is suspicious if either condition is true. We store that intermediate result in a variable and then add it to our feature condition. We use an AND there because all three conditions must be true in order to feature the post. However, we don't want to feature a post that is a suspicious length; we want to feature it only if it's not a suspicious length. So we apply the NOT operator to that variable.
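A sketch of the refined feature rule with the word-count check. The cutoffs (3 or fewer words, more than 200 words, at least 30 days) come from the text; the function and variable names and the example posts are assumptions:

```python
def should_feature(sentiment: str, account_age_days: int, word_count: int) -> bool:
    """Return True if a post should be featured (hypothetical helper)."""
    # OR: a length is suspicious if EITHER condition is true.
    is_suspicious_length = word_count <= 3 or word_count > 200

    # AND: all three conditions must hold. NOT flips the length check,
    # because we want posts that are *not* a suspicious length.
    return (sentiment != "negative"
            and account_age_days >= 30
            and not is_suspicious_length)


# Example posts as (sentiment, account_age_days, word_count); values are made up.
for post in [("positive", 400, 50), ("neutral", 400, 250), ("negative", 400, 50)]:
    print(post, "->", "featured" if should_feature(*post) else "not featured")
```

Writing `not (a or b)` is equivalent to `not a and not b` (De Morgan's law), so the length check could also be phrased as "more than 3 words and at most 200 words"; storing the suspicious case in its own variable keeps the intent readable.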

Now let's check our condition with that post that said "hi." It had a word count of one, a neutral sentiment, and a pretty old account. Great! That post is no longer being featured. However, I'm now getting a lint error because my line is too long. To fix this, I'm going to break my condition up into multiple variables.
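A sketch of the "hi" post check with the whole feature condition on a single line. The 400-day account age is a made-up stand-in for "a pretty old account," and the variable names and messages are assumptions. The if line is 83 characters long, so a style checker enforcing PEP 8's 79-character limit would report it:

```python
# The "hi" post: one word, neutral sentiment, long-standing account.
sentiment = "neutral"
account_age_days = 400  # hypothetical value for "a pretty old account"
word_count = 1

# OR: too short or too long counts as a suspicious length.
is_suspicious_length = word_count <= 3 or word_count > 200

# Correct, but this line exceeds a 79-character line-length limit.
if sentiment != "negative" and account_age_days >= 30 and not is_suspicious_length:
    print("Featured!")
else:
    print("Not featured")
```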

Let's say a useful post is one that is not negative and not a suspicious length, and store that in a variable called is useful post. Then let's store the result of the account age check in a variable called is trusted user. Our condition now just becomes "if is useful post and is trusted user," which is a lot easier to understand at a glance. In fact, it's so readable that it's self-documenting: we don't really need this comment anymore, because it just says the same thing as the variable names.
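A sketch of the refactored version, again using the "hi" post with a hypothetical 400-day account age. The snake_case spellings `is_useful_post` and `is_trusted_user` are assumed from the names in the text:

```python
sentiment = "neutral"
account_age_days = 400  # hypothetical value for "a pretty old account"
word_count = 1

is_suspicious_length = word_count <= 3 or word_count > 200
is_useful_post = sentiment != "negative" and not is_suspicious_length
is_trusted_user = account_age_days >= 30

# The condition now reads like the rule itself.
if is_useful_post and is_trusted_user:
    print("Featured!")
else:
    print("Not featured")
```

Each line now fits comfortably within the length limit, and each intermediate variable can be printed or inspected on its own while debugging.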

We'll go on to test a few more cases to make sure everything works, and then we'll monitor how our algorithm performs on our site so we can keep making adjustments as needed.
