yego.me

Moderating content with logical operators | Intro to CS - Python | Khan Academy


4m read
·Nov 10, 2024

Let's design a program with compound Boolean expressions.

We're working on an automated content moderation system for our site. We want our system to automatically flag posts that seem questionable so our team can investigate further and decide which ones to take down. We also want to automatically promote any posts that we think are particularly useful so that they appear at the top of the page.

Let's think about what our algorithm for flagging posts might look like. Our moderators have told us that they're especially wary of new accounts. It's not often that users who have been on our site for a while all of a sudden start posting a ton of spam. There are, of course, legitimate new users, and we don't want to discourage them from using our site by flagging their posts all the time. So we think the best thing to do might be to check for a combination of conditions we have available.

Besides the age of the account, the other piece of data we have is the overall sentiment of the post: whether it's positive, neutral, or negative. Here, we're most worried about negative posts. We definitely don't want users bullying each other or promoting violence on our site.

All right, now what about an algorithm for promoting useful posts? We want to make sure not to promote posts that are negative, but we want to be fair and not just promote posts that talk about how great our site is. So we'll consider both positive and neutral posts.

Then we decided we also want to value consistent users: people who have been on our site for a while and represent trusted voices. We recognize this isn't a perfect algorithm, but we think it might be good enough for what we're trying to do here.

Next step: let's translate our content moderation algorithms into code. We start with our two pieces of data: the sentiment of the post and the user's account age in days. Our algorithm for flagging was: if the sentiment is negative and the account is new. Let's say new is less than a week old. That means our condition is: the sentiment equals "negative" (written with the == operator), and the account age in days is less than seven. We join the two with AND because we only want to flag if both conditions are true.

We surround our compound condition with an if statement, and then inside the if statement, we want to print our content moderation decision. That lets our moderators know to take a closer look at this post.
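Here is a minimal sketch of that flagging check. The variable names, the sample values, and the message text are assumptions for illustration; the transcript does not show the exact code.

```python
# Hypothetical sample post; names and values are assumptions.
sentiment = "negative"
account_age_days = 3

flagged = False

# `and` means both conditions must be true before we flag the post.
if sentiment == "negative" and account_age_days < 7:
    flagged = True
    print("Flagged for review")
```

With these sample values both conditions hold, so the message is printed.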

Okay, now let's test this on a couple of different posts. A negative post created by a new user should get this flag. A negative post created by a longtime user shouldn't, and a positive post created by a new user shouldn't either.
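To make those three test cases easy to rerun, one option is to wrap the condition in a function. The function name `should_flag` and the specific account ages are assumptions, not something taken from the video.

```python
def should_flag(sentiment, account_age_days):
    """Return True when a post is negative AND the account is under a week old."""
    return sentiment == "negative" and account_age_days < 7


# Negative post, new user: flagged.
print(should_flag("negative", 2))    # True
# Negative post, longtime user: not flagged.
print(should_flag("negative", 900))  # False
# Positive post, new user: not flagged.
print(should_flag("positive", 2))    # False
```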

Let's work on our post promotion algorithm next. This algorithm was for neutral or positive posts from trusted accounts. Because there are only three possible sentiments (positive, neutral, and negative), this is equivalent to saying the sentiment is not equal to negative. For trusted users, let's go with an account age that's greater than or equal to 30 days. That's not a perfect equivalent, but we think it'll provide a good approximation.
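The equivalence between "positive or neutral" and "not negative" can be checked directly. This is a small sketch with hypothetical helper names, and it only holds under the assumption that the sentiment is always one of the three listed values.

```python
# Assumption: these are the only values the sentiment can take.
SENTIMENTS = ["positive", "neutral", "negative"]


def is_positive_or_neutral(sentiment):
    return sentiment == "positive" or sentiment == "neutral"


def is_not_negative(sentiment):
    return sentiment != "negative"


# Both checks agree for every allowed sentiment.
for s in SENTIMENTS:
    print(s, is_positive_or_neutral(s), is_not_negative(s))
```

If a fourth sentiment value were ever introduced, the two checks would disagree on it, so the shorter form depends on that assumption staying true.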

So we add our if statement, and then inside the if statement, we just want to signal that the post has been featured. So we indent a print function call inside the statement. Then let's run through a few test cases to make sure this is working as intended: a negative post by a trusted user, a positive post by a new user, and a neutral post by a trusted user.
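A sketch of this first version of the promotion check, with the three test cases from above. The function name `should_feature` and the sample account ages are assumptions.

```python
def should_feature(sentiment, account_age_days):
    """Return True for a non-negative post from an account at least 30 days old."""
    return sentiment != "negative" and account_age_days >= 30


# Negative post by a trusted user: not featured.
print(should_feature("negative", 365))  # False
# Positive post by a new user: not featured.
print(should_feature("positive", 2))    # False
# Neutral post by a trusted user: featured.
print(should_feature("neutral", 365))   # True
```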

We have it working, but we want to refine our algorithm a bit. We are noticing that some not super useful posts are getting featured, like that post we had at the beginning that just said "hi." We think we can make a good generalization here that posts that are super short or super long are probably not the most useful. So let's add another condition to our feature case.

To do this, we need a new piece of information about each post: we need to know how many words it has. Our team says it should be easy to get this data, so we'll add a new variable: word count. This is about to make our conditions super long, so I'm going to break it up into multiple pieces to make it easier to read. Post lengths that we don't like are less than or equal to 3 words or greater than 200 words.

We use an OR here because it's suspicious if either condition is true. We store that intermediate result in a variable, and then we add it on to our feature condition. We use an AND here because we want all three of these conditions to be true in order to feature the post. However, we don't want to feature it if it's a suspicious length; we want to feature it if it's not a suspicious length. So we use the NOT operator here.
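Putting the OR, AND, and NOT together might look like the sketch below. The names `word_count`, `is_suspicious_length`, and `should_feature` are assumptions based on the description, and the parentheses are only there to wrap the condition across lines.

```python
def should_feature(sentiment, account_age_days, word_count):
    # `or`: the length is suspicious if EITHER extreme applies.
    is_suspicious_length = word_count <= 3 or word_count > 200

    # `and`: all three conditions must hold.
    # `not`: we want posts whose length is NOT suspicious.
    return (
        sentiment != "negative"
        and account_age_days >= 30
        and not is_suspicious_length
    )


print(should_feature("neutral", 400, 1))     # False: too short
print(should_feature("neutral", 400, 50))    # True
print(should_feature("positive", 400, 201))  # False: too long
```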

Now let's check our condition with that post that said "hi." It had a word count of one, a neutral sentiment, and a pretty old account. Great! Now that post is no longer being featured. However, I see I'm getting a lint error now where my line is too long. To fix this, I'm going to break my condition up into multiple variables.

Let's say a useful post is not negative and not a suspicious length, and then let's store the result of the account age check in a variable called is_trusted_user. Then our condition just becomes: if is_useful_post and is_trusted_user, which is a lot easier to understand at a glance. In fact, it's readable enough to be self-documenting: we don't really need this comment anymore, because it just says the same thing as the variable names.
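One way the refactored version could look, assuming snake_case variable names and a hypothetical `should_feature` wrapper so the cases are easy to rerun:

```python
def should_feature(sentiment, account_age_days, word_count):
    # Each intermediate result gets a descriptive name.
    is_suspicious_length = word_count <= 3 or word_count > 200
    is_useful_post = sentiment != "negative" and not is_suspicious_length
    is_trusted_user = account_age_days >= 30

    if is_useful_post and is_trusted_user:
        print("Featured")
        return True
    return False


# The one-word "hi" post: neutral, old account, but too short to feature.
print(should_feature("neutral", 400, 1))    # False
# A mid-length positive post from a trusted account is featured.
print(should_feature("positive", 400, 50))  # prints "Featured", then True
```

Each line now fits comfortably within a typical line-length limit, and the final condition reads close to the plain-language rule.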

We'll go on and test with a few more cases to make sure everything works, and then we'll make sure to monitor how our algorithm performs on our site so we can keep making adjustments as needed.
