How to get better at video games, according to babies - Brian Christian


3m read
·Nov 8, 2024

In 2013, a group of researchers at DeepMind in London had set their sights on a grand challenge. They wanted to create an AI system that could beat, not just a single Atari game, but every Atari game. They developed a system they called Deep Q Networks, or DQN, and less than two years later, it was superhuman. DQN was getting scores 13 times better than professional human games testers at “Breakout,” 17 times better at “Boxing,” and 25 times better at “Video Pinball.”

But there was one notable, and glaring, exception. When playing “Montezuma’s Revenge,” DQN couldn’t score a single point, even after playing for weeks. What was it that made this particular game so vexingly difficult for AI? And what would it take to solve it? Spoiler alert: babies. We’ll come back to that in a minute.

Playing Atari games with AI involves what’s called reinforcement learning, where the system is designed to maximize some kind of numerical reward. In this case, those rewards were simply the game's points. This underlying goal drives the system to learn which buttons to press, and when to press them, to get the most points. Some systems use model-based approaches, where they have a model of the environment that they can use to predict what will happen next once they take a certain action. DQN, however, is model-free. Instead of explicitly modeling its environment, it just learns to predict, based on the images on screen, how many future points it can expect to earn by pressing different buttons.
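To make the idea of predicting future points concrete, here is a minimal sketch of a Q-learning update. It is a simplification, not DQN itself: DQN uses a deep neural network that reads screen images, while this sketch uses a plain lookup table. The state names, the button list, and the values of alpha and gamma are all hypothetical, chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical button set; a real Atari game exposes more actions.
ACTIONS = ["left", "right", "stay"]


def q_update(q, state, action, reward, next_state, alpha=0.1, gamma=0.99):
    """Move the estimate for (state, action) toward
    reward + gamma * (best estimated value from next_state)."""
    best_next = max(q[(next_state, a)] for a in ACTIONS)
    target = reward + gamma * best_next
    q[(state, action)] += alpha * (target - q[(state, action)])
    return q[(state, action)]


# Table of expected future points, starting at 0.0 for everything.
q = defaultdict(float)

# One observed step: pressing "left" in state "ball_left" earned 1 point
# and led to state "ball_center" (both state names are made up).
new_estimate = q_update(q, "ball_left", "left", 1.0, "ball_center")
```

Nothing here models how the game works. The table only records how many points tended to follow each button press, which is the sense in which the approach is model-free.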

For instance, “if the ball is here and I move left, more points, but if I move right, no more points.” But learning these connections requires a lot of trial and error. The DQN system would start by mashing buttons randomly, and then slowly piece together which buttons to mash, and when, to maximize its score. But in “Montezuma’s Revenge,” this approach of random button-mashing fell flat on its face. A player has to perform a long, precise sequence of moves just to score their first points at the very end. A mistake? Game over. So how could DQN even know it was on the right track?
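A rough back-of-the-envelope sketch shows why random play struggles when points only arrive at the end of a long sequence. It assumes, as a deliberate oversimplification, that exactly one sequence of button presses works and that each press is chosen uniformly at random. The sequence length of 20 is a made-up figure, not a measurement from the game.

```python
def chance_of_random_success(num_buttons: int, sequence_length: int) -> float:
    """Probability that uniformly random presses reproduce one exact sequence."""
    return (1.0 / num_buttons) ** sequence_length


# 18 is the size of the full Atari joystick action set;
# 20 required moves is a hypothetical, illustrative number.
p = chance_of_random_success(num_buttons=18, sequence_length=20)
expected_attempts = 1.0 / p

print(f"chance per attempt: {p:.3g}")
print(f"expected attempts before first points: {expected_attempts:.3g}")
```

Under these assumptions, the system would almost never see a single point, so it would have nothing to learn from.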

This is where babies come in. In studies, infants consistently look longer at pictures they haven’t seen before than ones they have. There just seems to be something intrinsically rewarding about novelty. This behavior has been essential in understanding the infant mind. It also turned out to be the secret to beating “Montezuma’s Revenge.” The DeepMind researchers worked out an ingenious way to plug this preference for novelty into reinforcement learning.

They made it so that unusual or new images appearing on the screen were every bit as rewarding as real in-game points. Suddenly, DQN was behaving totally differently from before. It wanted to explore the room it was in, to grab the key and escape through the locked door, not because it was worth 100 points, but for the same reason we would: to see what was on the other side.
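One simple way to sketch a novelty reward is to count how often each screen has been seen and pay a bonus that shrinks with the count. This is an assumption for illustration only: the text does not say how the researchers measured novelty, and exact counts over hashable screen labels are a stand-in for whatever they used. The class name, the beta parameter, and the screen labels are hypothetical.

```python
import math
from collections import Counter


class NoveltyBonus:
    """Adds a bonus of beta / sqrt(times this screen has been seen)."""

    def __init__(self, beta: float = 1.0):
        self.beta = beta
        self.visits = Counter()

    def reward(self, screen, game_points: float) -> float:
        # Count this visit, then pay a bonus that is largest on the first visit.
        self.visits[screen] += 1
        bonus = self.beta / math.sqrt(self.visits[screen])
        return game_points + bonus


novelty = NoveltyBonus(beta=1.0)

# A never-before-seen screen is rewarding even with zero game points.
first_visit = novelty.reward("room_1", game_points=0.0)

# Returning to the same screen pays less each time.
for _ in range(2):
    novelty.reward("room_1", game_points=0.0)
fourth_visit = novelty.reward("room_1", game_points=0.0)
```

With the bonus added to the game's own points, reaching a new room pays off immediately, long before the first real points appear.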

With this new drive, DQN not only managed to grab that first key, it explored all the way through 15 of the temple’s 24 chambers. But emphasizing novelty-based rewards can sometimes create more problems than it solves. A novelty-seeking system that’s played a game too long will eventually lose motivation. If it’s seen it all before, why go anywhere?
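Both this fading motivation and its opposite, a source of images that never repeat, can be sketched with a simple count-based bonus. The bonus formula and the screen labels are assumptions for illustration, not the researchers' actual method. Each frame of the never-repeating source is modeled as a unique label.

```python
import math
from collections import Counter


def novelty_bonus(visits: Counter, screen) -> float:
    """Bonus of 1 / sqrt(times this screen has been seen), counting this visit."""
    visits[screen] += 1
    return 1.0 / math.sqrt(visits[screen])


visits = Counter()

# A familiar room: the bonus shrinks toward zero as visits pile up.
familiar = [novelty_bonus(visits, "room_1") for _ in range(10_000)]

# A stream of frames that never repeat: every frame earns the full bonus.
never_repeating = [novelty_bonus(visits, ("frame", i)) for i in range(10_000)]

print(f"familiar room, first visit: {familiar[0]:.2f}, last visit: {familiar[-1]:.2f}")
print(f"never-repeating frames, lowest bonus: {min(never_repeating):.2f}")
```

In the first case there is little left to motivate the system. In the second, staying put and watching is always the most rewarding option.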

Alternatively, if it encounters, say, a television, it will freeze. The constant novel images are essentially paralyzing.

The ideas and inspiration here go in both directions. AI researchers stuck on a practical problem, like how to get DQN to beat a difficult game, are turning increasingly to experts in human intelligence for ideas.

At the same time, AI is giving us new insights into the ways we get stuck and unstuck: into boredom, depression, and addiction, along with curiosity, creativity, and play.
