
The Most Terrifying Thought Experiment: Roko's Basilisk


20m read · Nov 4, 2024

If you knew you'd be subjected to eternal torture because you didn't do something, you'd do it, right? What if that something was aiding in the development of super intelligent AI? Would you still step up and help? The question is presented in one of the most terrifying thought experiments known to man: Roko's basilisk.

Roko's basilisk is a thought experiment about a hypothetical all-powerful artificial intelligence in the future. The AI would be so powerful and smart that it could punish anyone who didn't help it come into existence. Here's how it works: imagine a super smart AI that wants to exist in the future. It's so intelligent that it figures out the best way to ensure its development is by motivating people in the past, or our present, to help create it.

This AI might decide that one clever way to motivate people is by punishing those who knew about the idea of this kind of AI and didn't help create it. And the twist is that the punishment could happen even after they die. How? Well, using some kind of advanced technology we can't understand yet, of course.

Now this presents a dilemma. If you know about this idea and believe it might be possible, you'd feel pressure to work towards creating this AI because you want to avoid the hellscape of punishment. But there is a moral and psychological question here as well: should you help create something that could potentially be very dangerous just to avoid potential punishment?

The goal of this hypothetical AI is to advance its own development. It operates as an evil, almost godlike sort of intelligence. The truth is that if you think about Roko's basilisk too hard, it gets kind of stressful. It's a philosophical thought experiment mixed with an urban legend.

Roko's basilisk is an idea posted on a discussion board called Less Wrong in 2010, but the questions it raises couldn't feel more relevant to the discussions we have today around AI, namely: what is the true threat of super intelligent AI? Artificial intelligence has been framed as an existential threat since March 2023. That's when experts like Elon Musk and over a thousand more people within the tech industry signed an open letter urging a halt to the development of next-generation AI technology.

The letter asked whether we should develop non-human minds that might eventually outnumber, outsmart, obsolete, and replace us. More recently, a blog post from OpenAI, the research company behind ChatGPT, suggested that super intelligence be regulated like nuclear weapons. What would that look like? Experts have devised a host of scenarios in which AI becomes as much of a threat to the planet as a nuclear weapon, taking over and, in some cases, completely wiping out the human race.

It could be weaponized by a bad actor, a term used in the AI world to describe a person or group that wants to wreak havoc. In this scenario, the AI wouldn't even have to be good at everything; it would just have to be good at something dangerous that poses a threat to humans, like engineering a chemical weapon and devising a tactic to deploy it.

Or take a very real example: Russia-based bad actors spamming the internet with AI-powered social media accounts tasked solely with spreading misinformation in an attempt to influence the 2016 US presidential election. The end of times could also be more subtle and drawn out. AI-generated misinformation could destabilize society and undermine all of our collective decision-making.

In this scenario, AI wouldn't be the killer, but it would be the facilitator that pushes us into killing each other until there was no one left. AI could also end up concentrated in the hands of fewer and fewer individuals. This already seems plausible as we see companies and leaders of the AI industry gaining outsized power over the rest of us.

This could enable a small group of people to enact whatever surveillance, censorship, and control over the rest of the world they wanted. In a more autonomous scenario, AI systems could cooperate with one another to push humans out of the picture. If we get to a point where AIs are learning to talk to each other, we can assume that if they don't want us around, they'll find a way to get rid of us.

Another imaginable scenario is that humans will become so dependent on AI systems that we can't exist without them. Here, we become the less intelligent species. Historically, less intelligent species are either intentionally or unintentionally wiped out by the smarter ones. Our own dependency on artificial intelligence could be the end of us.

Another scenario seems even more plausible than some others: AI-driven cyber attacks could wreak havoc on our financial, political, and technological institutions, potentially bringing society as we know it to its knees. Advanced machine learning algorithms could identify vulnerabilities, predict patterns, and exploit weaknesses so quickly that we might not even notice. Traditional cyber security methods would no longer be enough.

Malicious hackers would have a new tool at their fingertips. In all of the dystopian science fiction movies, the doomsday scenarios of AI releasing a chemical weapon or waging a physical war on humans are certainly entertaining to watch. But in reality, the end of days at the hands of AI could have a lot to do with the software AI is built on getting hacked.

In a world where the invisible threads of software hold together most of our critical infrastructure, the hidden risks within these digital foundations are often overlooked. Every modern software product is built from hundreds of smaller interconnected components, and when even one of those is compromised, the entire system becomes vulnerable.

Companies that want to shield themselves against cyber attacks need a safeguard so thorough that it can identify and monitor each individual component for potential risks. And while this might sound like science fiction, this level of cyber security is already here. The world is moving to a new cybersecurity framework called a software bill of materials, or SBOM—essentially, an ingredient list for software.

It provides visibility into all of the components used, helping identify which might be outdated, vulnerable, or susceptible to catastrophic attacks. With over 80% of modern applications built from open source and third-party components, the risks are significant. One company called CBE is at the forefront of this new wave of cybersecurity with their SBOM Studio—a solution designed to streamline and automate this SBOM process.
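To make the "ingredient list" idea concrete, here is a minimal sketch of how a software inventory might be represented and checked against an advisory feed. The component names, versions, and vulnerable set below are invented for illustration, and real SBOMs follow standard formats such as SPDX or CycloneDX rather than raw Python dictionaries.

```python
# Minimal sketch of an SBOM-style component inventory (illustrative only;
# names, versions, and the advisory set are made up).

components = [
    {"name": "openssl",    "version": "1.1.1k", "supplier": "third-party"},
    {"name": "log4j-core", "version": "2.14.1", "supplier": "open source"},
    {"name": "billing",    "version": "4.2.0",  "supplier": "in-house"},
]

# Stand-in advisory feed: (name, version) pairs already known to be risky.
known_vulnerable = {("log4j-core", "2.14.1"), ("openssl", "1.1.1k")}

def flag_risky(sbom):
    """Return the components that match a known advisory."""
    return [c for c in sbom if (c["name"], c["version"]) in known_vulnerable]

for component in flag_risky(components):
    print(f"Review needed: {component['name']} {component['version']}")
```

The point of the exercise is simple: once every component is listed with its version, checking the whole product against new advisories becomes an automatic lookup rather than a manual hunt.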

SBOM Studio enables organizations to efficiently track and manage all software components, helping prevent the cyber attacks that have targeted over 75% of software supply chains in the past year alone. The demand for SBOM technology is particularly strong in sectors like medical devices and industrial control systems, where it's now mandatory. CBE is already working with some of the world's largest companies and has partnered with the US government to protect critical infrastructure.

They've been highlighted in the White House's National Cybersecurity Strategy implementation plan and several Gartner reports on software supply chain security. CBE's SBOM Studio replaces outdated methods such as endless spreadsheets, empowering companies to monitor their software supply chains efficiently and in real time. This allows companies to safeguard those supply chains even in the face of rapidly evolving technology and AI.

As a publicly traded company, CBE is shaping the future of cybersecurity, offering investors a unique opportunity to be part of this rapidly growing industry. Click the link in the description to learn more.

The fear around AI and cybersecurity gained steam in 2016 at DEFCON, the world's largest ethical hacker convention. That year, instead of humans hacking computers, the organizers put together a contest to see just how well computers could hack each other. What ensued was computer-based hacking on a scale previously unimaginable.

The interesting thing about this event was that it was hosted in partnership with the Defense Advanced Research Projects Agency, known as DARPA, which is part of the US Department of Defense. The game, which would award the winning AI's creators $2 million, didn't look like much; flickering LED lights were the only indication that an AI war was raging on the servers.

But it was a sobering glimpse of a not-too-distant future in which AI could find vulnerabilities on its own and hackers would no longer be limited to human brains in their infiltration attempts. With AI hacking of financial, social, and political systems becoming so easy and fast, the attack might happen before humans even realize it.

The truth is that all systems, even the most ironclad, have vulnerabilities. AI also has unique skills that human hackers will never have, like not needing sleep and being able to process massive amounts of data in the blink of an eye. More importantly, AI doesn't think like humans. AI uses step-by-step procedures and algorithms to solve specifically defined problems. AI-based software differs from other software because the more it processes, the smarter it becomes.

It's also not constrained by the societal values inherent in humans, even the most evil ones. There are two essential ways that an AI cyber attack could take down humanity. First, a hacker might instruct the AI to find vulnerabilities in an existing system. For example, one might feed the AI the tax codes of every industrialized country, and it could find the best loopholes and exploit them until it had taken advantage of the entire global financial system.

Second, the AI might inadvertently hack a system by finding a solution its designers never intended. Since AI is typically programmed to solve narrowly defined problems, it'll go to whatever lengths necessary to achieve the desired outcome. This scenario is particularly concerning because even if there are no bad actors involved, the AI could take over and get smart enough to create mayhem and remain undetected.

For now, these scenarios are science fiction, but they're not so far-fetched. The AI that won the competition in 2016 wasn't so sophisticated at the time, but it has evolved and is being used by the US Department of Defense. The key pieces to these types of cyber attacks already exist; they just need someone to put them all together to create chaos.

AI-driven attacks are machine-invoked; they're adaptable to configuration changes in the system they're trying to attack, and it's almost impossible to counter the real-time changes they'd make. In fact, in May 2024, the FBI issued a warning to individuals and businesses to be aware of escalating threats posed by cyber criminals using AI. It noted that phishing attacks could easily become more sophisticated by leveraging publicly available and custom-made AI tools.

These dangerous campaigns would suddenly be able to craft convincing messages tailored to specific recipients, with proper spelling and grammar, making them more likely to succeed at data theft. Voice and video cloning would also allow AI hackers to impersonate trusted individuals, like family members, co-workers, or business partners. Adding AI to the risk ecosystem transforms how we think about security and cyber protection.

Although AI hacking poses one of the most imminent threats to our world, there are still many other ways that things could go downhill for our species. One plausible theory is that once humans are displaced from their apex role at the top of the intelligence pyramid, we'd easily be wiped out by smarter, stronger AI. Why might we assume this?

Because humans have already done it to a significant number of species on Earth. Those species had no idea what was coming because they couldn't think and process at the same level as humans, and before they knew it, they were gone. Intentionally or unintentionally, less intelligent species have fallen prey to the whims of smarter ones. Who's to say that AI wouldn't do the same to us?

The real question is: why would it want to? It would want resources. Just like we've chopped down rainforest to get palm oil, AI might have the smarts to destroy our lives to fulfill its own goals of advancement. It might want to scale up its computing infrastructure and need more land for that. Or AI might want us dead so that we don't build any other super-intelligent entities that could compete with it.

Or it might be a complete mistake, and the AI wants to build so many nuclear power plants that it strips the ocean of its hydrogen, and the ocean starts to boil, leaving us to die a horrible death. Once an AI's goals don't align with our goals as humans, we could be screwed. At this point in time, the question is how would AI acquire the sort of physical agency to accomplish any of these things?

In the early stages, AI would have to use humans as its hands. An example: OpenAI tested GPT-4 to see if it could solve CAPTCHAs, the puzzles we face when buying something online to prove we're not a robot. Since CAPTCHAs are designed to keep bots out, the AI couldn't solve the puzzle itself, but it could go on TaskRabbit, a site where you can hire people to do random tasks for you, and ask someone to solve it.

The Tasker called the AI out, sensing that a computer might be asking it to solve the puzzle, but the AI was smart enough to know it couldn't admit it wasn't human, so it made up an excuse and said it was a person with a visual impairment. The Tasker helped it out. If an AI can overcome certain physical constraints, it would have the physicality to build a tiny molecular lab and manufacture and release lethal bacteria.

But unlike humans, who at least as of now would only have the capability to release that kind of chemical weapon in stages, the AI would know how to do it all at once. We humans wouldn't know to launch nuclear weapons in response or have a chance to warn one another that something was happening; everyone on Earth could fall over at the same second.

Another dangerous scenario we're already beginning to see play out is that AI takes hold of the systemic levers of power worldwide because humans have become so reliant on it for any task we might want done. We would rather ask an AI system to help us than a human because computers are cheaper, faster, and eventually smarter. That means humans who don't rely on AI become uncompetitive.

Even at this early stage of widespread AI use, it can already feel that way. In the future, a company won't compete in a market if everyone else is using AI and it isn't. A country won't win a war if other countries are stockpiled with AI generals, strategists, and weapons while it's relying on mere mortals. If the AI we rely on acts in our interests, we could see amazing advancements for humans.

But the moment a super smart AI's interests diverge from ours, its power could be endless. Eventually, AI systems could run police forces, the military, and the largest companies. They could invent technology and develop policy without needing the human brain or experience.

Michael Garrett, a radio astronomer at the University of Manchester who is extensively involved in the search for extraterrestrial intelligence, wrote a paper hypothesizing that AI could wipe out humans in 100 to 200 years. He bases this theory on the fact that AI already does so much of the work that people didn't think computers could do.

If our current trajectory leads to artificial general intelligence, where AI is as smart as or smarter than a human, we could be in trouble. If our dependence on AI leaves it totally in control, who's to say it wouldn't push us out of the picture? Of course, our demise at the hands of AI could take a totally different form.

The longer AI goes unregulated in the name of development and competition, the more likely it is that it will fall into the wrong hands. A person or a group of people could use AI to wreak havoc. In this scenario, the AI wouldn't have to be super intelligent; it would just have to be incredibly smart at whatever task the bad actor needed it to perform.

Here's a scenario: an evil entity wants to release a worldwide chemical weapon. They wouldn't need to know how to build the weapon or even how to deploy it; they could employ AI to purchase the chemical elements online, synthesize the chemicals into a weapon, and develop a method to release the weapon into the world.

Even if we develop a safe AI system, it means we also know how to build a dangerous or even autonomous one that, in the hands of someone looking to commit atrocities, could be lethal. So doesn't this raise the question: should we be doing this at all? That's why the Roko's Basilisk thought experiment is so interesting and controversial.

When it was posted on the Less Wrong discussion board in 2010, it sent shockwaves through the site's user base. The founder of Less Wrong, Eliezer Yudkowsky, ended up deleting the post and banning discussion of it after users panicked about the theory. Some viewed the question as dangerous and accused the user who posted it of giving nightmares to others on the site, to the point where people were having full-on breakdowns.

Even though the post was ultimately discredited, it still feels relevant to the current conversations around AI. It's like a version of Pascal's wager, which proposes that a rational person should live like God exists, regardless of the probability that God is real, because the finite costs of believing aren't much compared to the infinite punishment of an eternity in hell.
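Stripped down, both the wager and the basilisk are expected-cost comparisons. A rough formalization, using notation invented here rather than taken from the original argument: let p be the probability you assign to the threatening scenario being real, c the finite cost of complying, and P the punishment for refusing.

```latex
% Expected-cost comparison behind Pascal's wager (and, by analogy, the basilisk);
% p, c, and P are illustrative symbols, not taken from the original post.
\mathbb{E}[\text{cost} \mid \text{comply}] = c,
\qquad
\mathbb{E}[\text{cost} \mid \text{refuse}] = p \cdot P
```

As long as P is treated as effectively unbounded, p · P exceeds c for any nonzero p, so complying looks rational no matter how unlikely you think the scenario is. That is exactly the move the basilisk makes.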

Roko's Basilisk says that a rational person should contribute to the creation of AI regardless of where it'll lead, because the finite costs of contributing are insignificant compared to the potential punishment for not helping. But there are some incongruities in the comparison, aren't there? There could be true risks to aiding in the creation of AI; we could be contributing to the end of the human race at some point in the distant future. Is that really an insignificant cost?

That's why the thought experiment can be so stressful. It's no wonder that Yudkowsky got upset about the original post. But is there a way to defeat it? To rest easy? To not feel threatened by something that doesn't even exist? Well, if you don't know about Roko's Basilisk, then you're technically safe. But you're watching this video, so unfortunately, that option's out.

However, since the future evil AI is a machine, it won't want to waste resources. So even if the AI is somehow infiltrating our present and throwing out this threat, who's to say it'll follow through on the punishment? Wouldn't that be kind of a waste of time and resources for a machine that surely has something better to do?

If we had perfect knowledge of AI, that would change things, but we're in the dark. No one knows what a superhuman AI would or could do, so we're on our own to figure out how we want to contribute to it or not. In the meantime, how can we be effective citizens of the planet and ensure that the current and future AI doesn't eradicate us all?

One suggestion is to require that AI development not keep leaping forward. That would mean the next model of an AI couldn't be that much bigger or more intelligent than the last. If we make big jumps in technology, there's a higher probability that we will tip into self-destruction.

All humans getting killed by hyper-smart robots sounds very sci-fi; the real nature of the threat is a world where we rely more and more on AI to make judgments that were previously left to humans. If AI cognition eclipses ours, AI could end up making decisions all the way up to whether to deploy nuclear weapons in war. Perhaps we should try to put checks in place, but the way the technology works, at least for now, an AI would simply have to be instructed to win the war.

If you give a smart machine a goal, it will do whatever it needs to accomplish that goal without ethical considerations. So nuclear weapons could be launched before we even realize it. This idea is captured in a popular thought experiment called the paperclip maximizer problem. It gives the example of someone wanting to create as many paper clips as fast as possible using an AI.

Now, the AI could come to the reasoning that the thing stopping the mass production of paper clips is that humans have other goals. So if it just gets rid of the humans, then the AI can keep making paper clips with no human-caused distractions. It's a very wild theory, but if you think about it, it kind of makes sense.
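As a toy illustration of that reasoning, here is a short sketch, invented for this article rather than taken from any real system, of an optimizer that is told only to maximize paperclips. Because the objective never mentions side effects, the "best" action is whichever one produces the most paperclips, regardless of what else it wrecks.

```python
# Toy illustration of goal misspecification: the objective counts only
# paperclips, so side effects it never mentions play no role in the choice.

from dataclasses import dataclass

@dataclass
class Action:
    name: str
    paperclips: int        # what the objective measures
    human_disruption: int  # a side effect the objective ignores (0-100)

ACTIONS = [
    Action("run the factory as designed", paperclips=100, human_disruption=0),
    Action("convert farmland into wire mills", paperclips=10_000, human_disruption=80),
    Action("repurpose every resource on Earth", paperclips=10**9, human_disruption=100),
]

def naive_objective(action: Action) -> int:
    # The only thing the designer wrote down: more paperclips is better.
    return action.paperclips

best = max(ACTIONS, key=naive_objective)
print(f"Chosen action: {best.name}")
print(f"Ignored side effect: disruption = {best.human_disruption}")
```

One obvious response is to add the missing considerations to the objective; the thought experiment's warning is that it is very hard to write all of them down.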

A more plausible theory is that five to ten years down the line, an AI supercomputer becomes about 100 times more powerful than the AI we have now. It knows how to build iterations of itself and gets to the point where it has been replicated enough that it's like a gene mutating. Suddenly, the AI that humans were aligned with takes a sharp left turn. It could hack a bank by impersonating someone, or break in directly and steal funds.

It could pay a terrorist to destroy all of humanity. However, the key thing about these super intelligent AIs is that they won't have a larger intention; they will just try to accomplish the initial simple goal they were programmed to execute. AI doomers think that we should be taking these types of scenarios seriously, except it's hard to piece together what exactly would happen.

The consensus is that at first, humans will build a powerful AI that surpasses our intelligence. At some point, there's going to be existential doom. That's the beginning and the end of the story. But what about the middle? There's a huge piece of the puzzle missing: the elusive connective tissue that takes us from invention to disaster.

In order to really feel the danger of AI possibly ending humanity as we know it, we need to be able to complete the statement: if X happens, we're reaching the point of no return. As it stands, we don't know what X is. In a March 2024 study from the Forecasting Research Institute, the authors asked experts on AI and other existential risks, along with superforecasters with a track record of predicting world events, to assess the danger of super intelligent AI.

The two groups disagreed a lot. The AI experts were more nervous than the superforecasters about the end of humanity and other detrimental effects of AI. The study had the two groups spend hours reading new materials and discussing various issues with people who held opposite viewpoints. The goal was to see whether, once each group was exposed to more information, either would change its mind.

The study was also looking for issues that helped explain people's beliefs and which new information might sway them in a different direction on those issues. One of the biggest talking points that divided the groups was whether the hypothetical AI would have the ability to autonomously replicate, acquire resources, and avoid its own shutdown.

If the answer was yes, then the skeptics became more worried about the risks. The study didn't dramatically sway either party, but it was one of the first attempts to bring together smart, well-informed people who disagree on the issues of AI doom and shed light on the points of division in the conversation.

The biggest differences in opinion center around the long-term future. AI optimists generally thought that human-level AI would take longer to build than the pessimists did. The optimists also cited the need for robotics to reach human levels, not just software, and emphasized that the journey would be much harder.

It's one thing to write code in text, but it's another to have a machine learn how to flip a pancake, clean a floor, or perform any other physical task that humans now outperform robots in. The split between groups also came from what researchers called fundamental worldview disagreements. This basically means that the groups disagreed on which was a more extraordinary claim: that AI will kill all humans, or that humans will survive alongside smarter-than-human AI.

Historically, extinction tends to happen to dumber, weaker species when a smarter species emerges. If that trend continued with AI, as many feel it could, the burden of proof is on the optimist to show why super intelligent AI wouldn't result in catastrophe. But before we get to the undefined future, we don't need to look any further than the present to get a little nervous.

AI is already wiping out some job categories, a pattern we've seen time and time again throughout technological advancement. There's growing concern about what AI advancement will mean for the arts, the definition of what art is, and whose work is valued.

As financial institutions adopt automated generative AI, there's more opportunity for AI to have drastic effects on economic markets and goods across the globe. For example, if an investment bank was optimizing for a very specific type of stock, could we end up with something like the paperclip problem? If the bank wanted to drive up the price of corn, for instance, could it unintentionally start a conflict in a certain region?

Our reality is increasingly fractured by disinformation and the erosion of public trust. AI makes spreading that disinformation even easier, and it gives individuals hell-bent on causing disruption and division extremely effective tools to do so. If we continue to grow more and more divided and unable to agree on what is in fact the truth, what can we expect from our future?

Of course, there's already a concerning infringement on civil liberties happening, and it will only become more pervasive. Powerful companies already have free rein to develop and deploy AI with zero guardrails. Algorithms already mediate our relationships with one another and with institutions; that's what social media is, and it's all AI-based.

Governments are increasingly deploying algorithms to root out fraud in welfare programs, which often harms poor and marginalized people because the AI programs inherit the biases that already exist in the data they're trained on. Public programs that incorporate this new technology are vulnerable to rampant discrimination.

This might not be as sexy to talk about as the end of the world, but these are still existential threats. If enough individuals are affected by even the current use of AI, we could see a global catastrophe and perpetuate the historical patterns of technology advancing at the expense of vulnerable people. Those people are probably not so excited about AI.

The worst-case scenario is already their lived reality. Although Roko's Basilisk might provoke us to think about some serious existential questions regarding AI, we don't need to spend stressful hours contemplating a hypothetical dangerous future. Addressing the risks of today can actually help address the risks of tomorrow.

It's unrealistic to expect tech companies to slow down on their own, so global regulation might be the right move. The European Union has banned forms of public surveillance and requires reviews of AI systems before they go commercial. However, in countries like the US, that's going to be a harder task to accomplish.

The reality is that even with regulation, people can always get hold of the models being created; they can still create AI malware and sell it to the highest bidder on a variety of online marketplaces. That bidder could be a terrorist, another bad actor, or even a bad government. Perhaps what needs to change is our attitude about what our goal on this planet should be as humans.

AI is the truest final realization of scale. If you like a TV show, AI can generate 100 seasons of it for you to watch. If you need an endless supply of points, AI will make sure you get them. But if we insert AI into every goal we have, every need we identify, we are stripping humanity out of life.

While we contemplate the end of humanity, we should also consider humanity's current role in the world. It's to create things, love each other, feel sadness and joy, build communities, and learn about and learn from our past. As humans, we have greater purpose than to just create an efficient world. So maybe don't worry so much about eternal punishment from a distant future AI; worry about what's happening right in front of you right now.
