Manipulating the YouTube Algorithm - (Part 1/3) Smarter Every Day 213
- A couple of months ago I made a Twitter thread about some weird activity I saw online, and after I posted that thread, tons of engineers from many different tech companies reached out to me privately to tell me their stories. My interest in all this started one day when I was scrolling on YouTube, and the algorithm served up a pretty weird video for me to watch. You know how the algorithm works, right? It looks at your past activity and tries to figure out what you could watch in the future that would keep you on the platform the longest. It optimizes watch time.
The algorithm suggested I watch this video. Now, I'm not a super political guy, but I know those are important topics, and it had 138,000 views and it was only one day old, so to me this looked like it was a real news story. So when I clicked on the video it got weird fast. Strange music starts playing, and a robot voice comes on, and clearly starts reading me a script.
- [Voiceover] After Trump sends note to Ginsburg, he breaks silence on plan for Supreme Court. Democrats didn't think Donald would dare. Get ready, America. Ruth Bader Ginsburg hasn't been seen in public for 57 days.
There were red flags all over the place. The robot voice was reading typos. The name of the channel itself was some generic fake news site. There's no way this is legitimate news. But the problem is it had engagement levels that were off the charts: a 94% like-to-dislike ratio. Look at all these comments. How can this be happening on a video of such low quality?
I started diggin' a little bit deeper, so I searched YouTube for the exact same title to see what came up, and whoa, look at this, all of these videos have exactly the same title, exactly the same script, but they're all just a little bit different. If you play the videos you get different graphics scrolling across the scene. You might get a different robot voice.
[Voiceover] Breaking News Mencos, after Trump sends note to Ginsburg he breaks silence on plan for Supreme Court-- (orchestral alert)
[Voiceover] After Trump sends message to RBG he breaks silence on big plans for Supreme Court.
[Voiceover] After Trump sends note to Ginsburg he breaks silence on plan for Supreme Court. (loud alert)
[Voiceover] After Trump sends note to Ginsburg he breaks silence on plan for Supreme Court.
The content was essentially the same; it was just arranged in a different way. Different photos, different B-roll, different title screens. I'm a YouTuber and I spend some time thinking about content ID systems and things like that, and it's clear to me that these manipulations of the video and even the audio are attempts to get around YouTube's automated recognition systems.
Let me explain how this works. YouTube engineers look at the individual pixels of a video, and then they use the values of those pixels to perform some type of mathematical function, which gives you a number called a hash. You then compare that hash against the hashes of other videos to figure out whether the same content has been uploaded by someone else in the past.
The challenging task here is to make a system that's fast enough to find the exact copies of specific frames of video across the entire YouTube library, while at the same time making it smart enough to not be tricked. How would you do that with math? Instead of sampling every pixel, what if they sampled specific spots on every video and measured the color values at those specific locations? They could then compare those spots with every other video uploaded to YouTube.
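To make that concrete, here's a toy version of that sampled-pixel idea in Python. To be clear, this is my own illustration, not YouTube's actual method (which, as I said, is a company secret). The 8x8 sampling grid, the brighter-than-average trick, and the Hamming-distance comparison are all assumptions I made up for the sketch.

```python
# Toy perceptual hash: sample a fixed grid of pixel locations, compare
# each sample to the frame's average brightness, and pack the results
# into a 64-bit number. Illustrative only -- not YouTube's real system.
from PIL import Image

GRID = 8  # an 8x8 grid of sample points -> 64 bits

def frame_hash(path):
    img = Image.open(path).convert("L")  # grayscale frame
    w, h = img.size
    # Measure the color value at evenly spaced locations
    samples = [img.getpixel((int((x + 0.5) * w / GRID),
                             int((y + 0.5) * h / GRID)))
               for y in range(GRID) for x in range(GRID)]
    mean = sum(samples) / len(samples)
    # One bit per location: brighter or darker than average?
    return sum(1 << i for i, s in enumerate(samples) if s > mean)

def hamming(a, b):
    # Count differing bits; a small distance suggests the same content
    return bin(a ^ b).count("1")
```

Compare a new upload's hash against the library, and if the Hamming distance to some existing video falls under a threshold, you've probably found a re-upload.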
But think about what happens if a sneaky person resizes those images. The colors would change at those locations. The same thing would happen if you rotate the image or flip it or even apply a filter of some kind. Now I don't know how YouTube samples these pixels or the audio from the video or what mathematical functions they use, but I know that's like a company secret because that's how they defend the platform.
If the bad guys were to figure out how these detection algorithms work, then they could get around them and they could beat the defenses. If you look at these crazy videos like a software engineer, you can start to see some really interesting details. For example, why would there be a globe spinning in this image? Well if you think about it, that's going to change the hash. What if the YouTube engineers figured the globe trick out and then they shut that technique down?
Well, in this video the globe is mirrored. That's going to change the hash in a different way. This specific instance is a counter-counter-countermeasure, which is fascinating. This one is probably my favorite. Why on earth would they put virtual snow falling over the top of a video of the royal family? If you think about it, this changes the math in a kind of random way, and therefore degrades YouTube's ability to detect it.
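Here's a rough demonstration of why that snow works, using the same toy hash from above (again, my stand-in, not YouTube's detector). We dust bright "snow" pixels over a synthetic frame and count how many hash bits flip:

```python
import numpy as np

rng = np.random.default_rng(0)

def grid_hash(frame, grid=8):
    # Same toy idea as before: threshold an evenly sampled grid of
    # pixels against their average brightness.
    h, w = frame.shape
    ys = ((np.arange(grid) + 0.5) * h / grid).astype(int)
    xs = ((np.arange(grid) + 0.5) * w / grid).astype(int)
    samples = frame[np.ix_(ys, xs)].ravel()
    return samples > samples.mean()  # 64 boolean "bits"

frame = rng.integers(0, 256, size=(720, 1280))  # stand-in video frame

# "Snow" overlay: set roughly 10% of pixels to bright white, at random spots
snowy = frame.copy()
flakes = rng.integers(0, frame.size, frame.size // 10)
snowy.ravel()[flakes] = 255

flipped = int((grid_hash(frame) != grid_hash(snowy)).sum())
print(f"{flipped} of 64 hash bits changed by the overlay")
```

Even a few flipped bits are enough to break an exact-match lookup, and because the snow keeps moving from frame to frame, the bits that flip keep changing too.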
This is like a new form of camouflage, one that uses math instead of colors, on a new battlefield where opposing forces of software engineers fight daily skirmishes. But instead of fighting over hills or pieces of land, the winner of these individual skirmishes gets a few moments of your time, which in today's attention economy is super valuable.
Because of the original video I found, I assumed most of this activity was on the right. But I found these videos just two clicks away from a mainstream channel on the left.
[Voiceover] Speaker of the House, Nancy Pelosi just keeps racking up wins over Donald Trump. (intense music)
[Voiceover] Speaker of the House, Nancy Pelosi, just keeps racking up wins over Donald Trump.
It's the exact same stuff just trying to manipulate things in the other direction. If you look at this channel specifically, it was started over 10 years ago by what appears to be an actual human. It uploaded a bunch of gaming content. And then at this moment right here it started uploading videos about politics.
At this point it's clear to me this is not just low-quality content. This is a coordinated attack against the YouTube algorithm, complete with countermeasures. This is a serious, well-funded activity done by people who mean to do harm.
If you're a teen or 20-something you probably think these old people are getting duped into voting for someone that doesn't make sense. And if you've got a few more miles on the tires, you might be looking at the younger generation and thinking, man, how are all these manipulative people able to whip them up into a frenzy so easily?
So is the internet getting worse because we're getting worse and this is just a reflection of us, or is there actually someone playing with the dials and pitting us against each other? Today's social discourse takes place on the public forum of the internet. And front and center in that forum are three primary platforms: YouTube, Twitter, and Facebook.
This video is the first in a three-part series on what exactly these external forces are doing to manipulate our social media platforms and how they're doing it. Now, a key to a good lie is to convince you that there is no deception. When I started trying to research this stuff, I found all kinds of information on the internet, but it's very difficult to cut through all the falsehood.
So my approach is pretty simple. I'm literally going to get on airplanes and fly to the engineers who are trying to beat this stuff and have a straight-up conversation with 'em. We're gonna get to Twitter and Facebook later, but for the purposes of this video, let's look at specific active attempts to manipulate the YouTube algorithm.
Okay, it's time to move past the speculation stage to the actual engineering data stage. So I'm here at this building in California. The person I'm gonna talk to does not wanna be on camera, so we're gonna respect that, but I'm gonna go find out exactly what went down with these specific videos and report back.
I was just gonna stand in front of the sign and tell you what happened, but I have to think about how I'm gonna say this. This is complicated. I don't want to attribute any words to YouTube here, so let's just assume these are my words, but there seem to be two types of attacks against the YouTube algorithm.
Number one, there seems to be a financial motivation. People are trying to create videos to extract ad revenue from YouTube. And so this is legitimate content. There's nothing outside of the terms of service here, except maybe the fake engagement policies that YouTube has. But for the most part, it's legitimate content uploaded and meant to extract money from YouTube.
Number two are the ideological attacks. These are attacks meant to sway public opinion and make people think certain things and perhaps even make people fight with each other. To understand this better, I rented an office in San Francisco to interview Renee DiResta. She's an expert in malicious propaganda online.
Okay, this is Renee DiResta. She is super smart on, I guess you'd say coordinated inauthentic behavior, incentivized content on social media and how to beat it, right?
Yep, I look at how different types of actors manipulate the social ecosystem across platforms.
Awesome, I loved your stuff on the Rogan and Sam Harris podcasts.
Thank you.
You're just great. If you're paying attention, you've seen what she does. I wanna talk about what's going on on YouTube.
Okay.
Recently I found this really weird video. It's clearly manufactured content. And from what I can tell there are two reasons that there's manufactured content. Number one is it's financially motivated. And the second thing is ideological.
Yes.
Right? Is that correct? Is there a third component I'm not seeing?
No, that's correct. And there's actually a lot of, there's actually some overlap there. Because if you're producing really partisan content, particularly sensational stuff, you're able to capture engagement and get people paying attention. Because particularly right now in a highly partisan, polarized country, people are looking for that stuff. And they're not necessarily paying a lot of attention to who the source is.
So if you make something that looks interesting you'll be able to theoretically attract views, keep people there, and then you can both monetize and do something divisive. You're gonna use fake accounts to seed it in groups and then you're gonna try to get a critical mass of real people to come and amplify it.
- In order to get these videos in front of human eyeballs, you have to first trick a robot algorithm type thing, and the way you do that is with artificial engagement. Artificial engagement is done with fake logins or compromised accounts. They sell them like wine on the black market. A new one's gonna set you back about a quarter. A 2014 is gonna set you back about seven bucks.
Renee showed me some footage of what she calls a click farm. They use these devices to try to artificially inflate engagement online. You can easily find these places online that will sell these services to you straight up.
- You know, a lot of us, we use the number of views, number of likes, number of followers as like a heuristic for quality, and so there are hundreds and hundreds and hundreds of these businesses that just offer you things like views. So these are people who are just selling, selling likes. Funny enough, based on the number of likes on the ad for YouTube likes, I would bet that they're gaming their Instagram (mumbles), too. (laughing)
The internet is fake.
- I've been doing this wrong. (laughing) I've been trying to make quality content this whole time. The strategy seems to be pretty simple. You make a bunch of videos on one particular topic. You put 'em online and then the metadata points to each other, right? At this point the fake and compromised accounts are used to give them artificial lift, and at some point, one of these videos will creep up above the noise in the algorithm and it will start to get shown to actual humans.
It's really easy to get mad at YouTube at this point, right? Look at all this stuff that's happening on the platform. But let's step back and think about it. If you were a software engineer, how would you use math and algorithms to detect this activity? I would argue that this is very, very clever, and it's very hard to detect this in an automated way.
If you look at the engagement on these videos, the majority of these comments are actual humans discussing the videos. These are real people engaging with this content. From an engineering perspective, this is extremely difficult to detect, especially at any kind of meaningful scale.
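To see what they're up against, here's the most naive detector I can imagine: flag batches of uploads whose titles are near-duplicates, like the ones I found earlier. This is purely my own toy sketch, not something YouTube showed me, and the 0.9 similarity threshold is an arbitrary assumption.

```python
# Toy heuristic: group uploads whose titles are near-duplicates.
# My illustration only -- not YouTube's detection system.
from difflib import SequenceMatcher

def similar(a, b, threshold=0.9):
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold

def suspicious_clusters(titles):
    clusters = []
    for title in titles:
        for cluster in clusters:
            if similar(title, cluster[0]):
                cluster.append(title)
                break
        else:
            clusters.append([title])
    # Many near-identical titles in a short window looks coordinated
    return [c for c in clusters if len(c) > 1]

uploads = [
    "After Trump sends note to Ginsburg he breaks silence on plan for Supreme Court",
    "After Trump Sends Note To Ginsburg, He Breaks Silence On Plan For Supreme Court!",
    "Nancy Pelosi just keeps racking up wins over Donald Trump",
]
print(suspicious_clusters(uploads))
```

And notice how the countermeasure writes itself: swap "note" for "message" and "plan" for "big plans," exactly like the videos above did, and the similarity score slides under whatever threshold you picked, while the humans in the comments still read it as the same story.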
So we understand the fake engagement piece, right? But think about the content creation itself. A video that I'm proud of, for example, takes me dozens of hours to make, right? In this one particular case we're talking about, we saw dozens of videos uploaded in the same day. So clearly computers are involved. But how are they doing it?
Believe it or not, there's already an entire industry built around this technology. It's a great way to get these small stories out. Whatever website you go to to get your news, you've probably seen these things: automatically generated videos. Several companies offer these types of services. One is called Wochit. If you go to the Wochit website, they boast that over 1.5 million videos were created on behalf of their customers last month.
These are videos created for businesses you recognize. On their website they show how the system works. You type in the topic you wanna make a video about. They ingest millions of pieces of licensed content from different sources. You slap in a script and you have a video within minutes. This is a very expensive business-to-business service.
But for these businesses trying to make it in the attention economy, it's totally worth it. Now let's think about YouTube. The Wochit News YouTube channel has uploaded over 3,000 videos in the last two months. Most of these use actual voice actors reading a script. This is an incredible amount of content.
Stop and think about what that could mean for the future of YouTube. It works like this. You have all of this content, like B-roll, photos, audio, things of that nature. And it goes into this machine and out pop these videos. Which is cool if you're a newsroom and you're using a service like Wochit to try to create content for legit users online. But the problem is, this is just technology.
Think about what would happen if this was developed by people of ill intent. If you're clever, you can change the content that you're putting into the machine, and the machine can start creating videos, each with its own special flair so it can get around the countermeasures built into the YouTube system. You simply upload all of these to different YouTube channels that you've created with fake email accounts, and there you go: you're suddenly flooding YouTube with automatically created content that has the incentive of making you money or changing the way people think.
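Just to show how little machinery that takes, here's a sketch of that "machine" boiled down to a recipe generator. Everything in it (the voice names, the overlay list, the clip names) is made up by me for illustration, but the shape of the thing is exactly what the videos above suggest:

```python
import random

# Illustrative only: one script in, many slightly different video
# "recipes" out. Each variation shifts the hash, which is the point.
ASSETS = {
    "voices":   ["robot_voice_a", "robot_voice_b", "robot_voice_c"],
    "overlays": ["spinning_globe", "mirrored_globe", "falling_snow", None],
    "b_roll":   ["clip_01", "clip_02", "clip_03", "clip_04", "clip_05"],
}

def make_video_spec(script):
    return {
        "script":  script,                                # same story
        "voice":   random.choice(ASSETS["voices"]),       # different reader
        "overlay": random.choice(ASSETS["overlays"]),     # different camouflage
        "b_roll":  random.sample(ASSETS["b_roll"], k=3),  # reshuffled footage
    }

# Dozens of "different" videos from one script, in a fraction of a second
specs = [make_video_spec("After Trump sends note to Ginsburg...") for _ in range(30)]
```

Hand each spec to a renderer, upload each result to a different throwaway channel, and you've reproduced everything we saw at the top of this video.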
It's probably mostly financially motivated.
- I think it's mostly financially motivated. I think Facebook has said the same thing about propaganda. A lot of the stuff on their site, coordinated inauthentic activity, is mostly economically motivated. Even during 2016, now the notion of fake news is so tied to Russia, but fake news wasn't actually about Russia. If you remember, back in 2016 during the campaign it was about people just creating these hyper-partisan sites that were literally fake news, demonstrably fake, "Pope endorses Donald Trump" and this kind of stuff.
And it was just pushing people to the sites to try to make money on the ads. And so that's what I think a lot of the challenge here is. The really strong actors, the nation-states that are trying to do this kind of stuff, will spend the years to build up the audiences over time and then you have these more fly-by-night operations where a blog spins up overnight, they game their way into distribution, and then they just make money on the ads.
Someone that's doing this subversive activity, if they're doing it, we're not gonna know.
Right, for a long time, for a long time. Unless they make a mistake.
So it's happening right now.
Yeah.
It's happening right now, probably in the sidebar of this video that people are watching right now. We just may not know it's inauthentic behavior.
I think it's really hard to find this stuff. It gets better and better. That's the other thing. I think people assume it's obviously fake or obviously, obviously incorrect English or obviously sensationalist memes or something like that. No, they actually just started repurposing their content from our own real, authentic, hyper-partisan pages.
Can we just stop and be disturbed that this is the kind of content that's getting real eyeballs? And I know what you might be thinking, well it's clearly different. I could tell the difference in that. But think about who I am, right? I'm an engineer who understands countermeasures. I think through strategy. I spend hundreds of hours a month thinking about the YouTube algorithm. I tailor my thumbnails, I understand how titles work. All of this to say I still got tricked.
And so it's a cat-and-mouse game. Well, basically you have offensive content, then you have a countermeasure, and then you have a counter-countermeasure, and then the YouTube engineers have to develop a counter-counter-countermeasure, and so this just continuously ratchets up. I don't see a way to win. You're not gonna win this.
No you're never gonna win. There's no winning. It's managing.
All you have to do is increase the cost for the adversary to influence society.
Regardless of whether this material is made in some far-off land to make a quick buck, or if it's from a malicious nation trying to influence a foreign election, it's all taking advantage of this flaw in your heart, the desire to fight with your neighbor. These people literally make us hate each other and then we turn around and give them our money.
If your first inclination is to be mad at YouTube right now, in some kind of outrage, then you don't get it. Like, you don't see what's happening here. I know these engineers. They're using all the math at their disposal to try to fix this as desperately as they can, but until our hearts change towards political grace, these people are gonna keep taking advantage of us.
It doesn't matter what kind of laws we make to try to get around this. They're going to make us fight, and we're gonna sit there and do it, and then close our eyes and give them our money. We've got to be smarter than this.
I think a lot of the countering has to be done in the real world, right? It's the, I think you had said this in a Twitter thread of yours I'd read where you talked about the need to actively practice--
Love thy neighbor kind of stuff.
Right, right. I think that that's, unfortunately, it is a, they're preying on human biases. This is the thing, we have a brand new information ecosystem, right? We have democratized creation of content. We have no more gatekeepers. Anyone can say what they want, do what they want, maximized expression, algorithms to help you find people, but ultimately human nature, like the people have not changed.
So it's this fascinating new information environment ecosystem but with a very old set of biases and ways of being that are just kind of part of the human experience, and I don't think we're necessarily adept at recognizing what social media has done to us as individuals and as members of society.
And that I think is one of the key challenges, where no amount of regulating of algorithms or catching of bad guys changes that kind of fundamental truth. Trying to use real community to kind of return people to that human connection is the thing that we're missing right now and that's because it's much harder to do that, to create the kind of active unity that you're talking about 'cause like who's in charge of doing that?
Normally that would have been like your churches, your neighborhood, your community. I think, I don't know what that looks like ported online where everybody's spending their time.
- I'm not trying to scare you by showing you all this stuff. Obviously there's a lot of hardcore engineering thought that goes into everything I just showed you, but it's happening. There are bad actors trying to manipulate people online for financial gain, and I don't think it's YouTube's fault.
When somebody wants to do bad things to you, they're gonna do whatever they can. Next up, Twitter and Facebook. They were awesome. They let me set up cameras in the building. They also let me talk to some of the engineers in charge of building these countermeasures. It is a fascinating discussion.
I hope you'll join me as we continue this algorithm manipulation series. I think this is super important stuff. If there's someone that you think could benefit from knowing that this is how the internet actually works, please pass this video along to 'em.
Also, consider subscribing if you feel like this earned it. This is a ton of work and I hope it brings value to your life. That's it. I'm Destin, you're getting smarter every day. Have a good one.