
Building Dota Bots That Beat Pros - OpenAI's Greg Brockman, Szymon Sidor, and Sam Altman



Now, if you look forward to what's going to happen over the upcoming years, the hardware for these applications, for running these models, is really, really quickly going to get faster than people expect. I think that what that's gonna unlock is that you're going to be able to scale up these models, and you're going to see qualitatively different behaviors from what you've seen so far.

At OpenAI, we see this sometimes. For example, we had a paper on unsupervised learning where you train a language model to predict the next character in Amazon reviews. Just by learning to predict the next character, somehow it learned a state-of-the-art sentiment analysis classifier. It's kind of crazy if you think about it, right? You were just told, "Hey, predict the next character." You know, if you were told to do this, well, the first thing you'd do is learn the spelling of words, and you'd learn punctuation. The next thing, you'd start to learn semantics, right, if you have actual capacity there. This effect goes away if you use a slightly smaller model. And what happens if you have a slightly larger model? We don't know, because we can't run those models yet, but in upcoming years, we'll be able to.
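
To make the training objective concrete, here is a minimal, hedged sketch of "predict the next character": a toy bigram counter over review text, standing in for the large recurrent model the actual result used. The sample reviews and everything else here are illustrative.

```python
# Toy sketch (not OpenAI's model): a character-level next-character predictor
# trained on review text. The real result used a large recurrent model trained
# on Amazon reviews; this version just counts character bigrams to illustrate
# the objective of predicting the next character.
from collections import Counter, defaultdict

reviews = [
    "great product, works as advertised.",
    "terrible quality, broke after a day.",
]

counts = defaultdict(Counter)
for text in reviews:
    for prev, nxt in zip(text, text[1:]):
        counts[prev][nxt] += 1   # how often `nxt` follows `prev`

def predict_next(prev_char: str) -> str:
    """Return the most frequently seen character after `prev_char`."""
    if not counts[prev_char]:
        return " "
    return counts[prev_char].most_common(1)[0][0]

print(predict_next("t"))  # learned purely from next-character statistics
```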

What do you guys think are the most promising under-explored areas in AI? If we're trying to make it come faster, what should people be working on that they're not?

Yeah, so there are many areas of AI that we've already developed quite a bit. There is basic research in just classification, deep learning, reinforcement learning, and what people do is they kind of try to invent problems, such as solving complicated games with hierarchical structure, and they try to add extra features to their models to combat those problems. But I think there is very little research happening on actually understanding the existing methods and their limits.

For example, it was a long-held belief in deep learning that to parallelize your computation, you need to cram batches as small as possible onto every device. In fact, Baidu did this impressive engineering feat where they took recurrent neural networks and implemented them in what is essentially GPU assembly code to make sure that you could fit something like batch size one per device. Despite all the smart people working on this problem, it's only very recently that Facebook took a careful look at just a very basic problem of classification, in their great effort called ImageNet in One Hour. They showed that if you actually take code that does image classification and fix all the bugs, you can get away with a much, much bigger batch size and therefore finish training much faster.

You know, it's not the kind of sexy research that people want to see, but it's actually this kind of research that, I think, at this point will advance the field the most.

So, Greg, you mentioned hardware in your initial answer. In the near term, what are the actual innovations that you foresee happening?

So the big change is that the kinds of computers we've been trying to make really fast are general-purpose computers built on the von Neumann architecture: basically you have a processor, you have a big memory, and you have a bottleneck between the two. With the applications that we're starting to do now, suddenly you can start making use of massively parallel compute. The architectures that these models run fastest on are going to look kind of like the brain, where you have a bunch of neurons that all have their own memory right next to them, and they all talk to their neighbors, maybe with some kind of longer-range skip connections. No one's really had an incentive to develop hardware like this.

What we've seen is that you move your neural networks from running on a CPU to a GPU, and now suddenly you have a thousand CUDA cores running in parallel, and you get a massive performance boost there. Now, if you move to specialized hardware that is much more brain-like, that runs in parallel with a bunch of tiny little cores, you're going to be able to run these models insanely faster.

Okay, so I think one of the most common questions or threads of questions that were asked on Twitter and Facebook were generally how to get into AI. Could you guys give us just a primer of where someone should start if they're, you know, just a CS major in college?

Yeah, absolutely! It really depends on the nature of the project that you would like to do. I can tell you a bit about our project, which is essentially developing large-scale reinforcement learning for Dota 2, and the majority of the work is actually engineering. You know, like essentially taking the algorithms that we have already implemented and trying to scale them up. It's usually the fastest way to get improvement in our experiments.

To echo this, because I hear this come up all the time, it's like my dream to work at OpenAI, but I gotta go get an AI PhD, so I’ll see you in like five or seven years. If people are just really solid engineers but have no experience at all with AI, how long does it take someone like that to become productive for the kind of work at OpenAI that we’re looking for?

Someone like that can actually become productive from day one. There's a spectrum of where they end up specializing. There are some people who focus on building out infrastructure, and that infrastructure can range from, well, we have a big Kubernetes deployment that we run on top of a cloud platform, to building tooling and monitoring and managing this underlying layer. It actually looks quite a bit like running a startup, and a lot of the people who are most successful at that have quite a bit of experience running things at large scale in a startup or production environment.

There’s kind of a next level of getting closer to the actual machine learning where if you think of how machine learning systems look, they tend to be this magical black box of machine learning. You actually try to make that core be as small as possible because machine learning is really hard, eats a lot of compute, and it's really hard to tell what's going on there. You want it to be as simple as possible, but then you surround it by as much engineering as you possibly can.

So, what percent of the work on the Dota 2 project would you guys say was what people would really think of as like machine learning science versus engineering?

Essentially, as far as day-to-day work goes, this kind of work was almost non-existent. There were like a few person-weeks spent on that compared to like person-months spent on engineering. I think maybe placing some good bets on the machine learning side is about what not to do rather than what to do.

At the very beginning of the project, we knew we wanted to solve a hard game; we didn't know exactly which one, because these are great test beds for pushing the limits of our algorithms. To be clear, you guys are two of the key people; the entire team was about ten people. You know, these things are good test beds for algorithms, to see what the limits are, to really push the limit of what's possible.

You know, you know for sure when you've done it that you've done it; it's very binary, very testable. So actually, the way that we selected the game was we went on Twitch and just looked down the list of the most popular games in the world. Number one is League of Legends. The thing about League of Legends is it doesn't run on Linux, and it doesn't have a game API. Little things like that actually are the biggest barriers to making AI progress, in odd ways.

So looking down the list, Dota actually was the first one that kind of had all the right properties. It runs on Linux, it has a big community around replay parsing, and there's a built-in Lua API. This API was meant for building mods rather than for building bots, and we were like, "But we could probably use it to build bots." One of the great things about Valve as a company is that they're very into you having these open, hackable games where people can go and do a bunch of custom things; philosophically, it was very much the right kind of company to be working with.

We actually did this initial selection back in November, and we were working on some other projects at the time, so it didn't really get started until late in December. One of the funny things is that, by total coincidence in mid-December, Valve released a new bot-focused API. They were saying, "Hey, our bots are famously bad; maybe the community can solve this problem, so we'll actually build an API specific for it for people to do this." That was just one of those coincidences of the universe that worked out extremely well.

So we weren't yet in close contact with the developer of this API, though we ended up being all throughout the project. At the very start, what do you do, right? The first thing was we had to become very familiar with this game API, to make sure we understood all the little semantics, all of the different corner cases, and to ensure that we could run this thing at large scale and turn it into a pleasant development environment. At the time, it was just two of us. One person was working with the bot API, building a scripted bot: basically, learn all the game rules and think really hard about how the game works.

This particular person who wrote it, Rafal, has played about three or four games of Dota in his life, but he's watched over a thousand hours of Dota gameplay, and he has now written the best scripted Dota bot in the world. That meant a lot of just writing this thing in Lua and getting very intimately familiar with all those details in the meanwhile.

But I was working on trying to figure out how to turn this thing into a Docker container. So they had this whole build process; it turns out that Steam can only be in offline mode for two weeks at a time, and they push new patches all the time. You needed to go from this, like, you know, sort of manually downloading the game and whatever to actually having an automated, repeatable process. It turns out that the full game files are about 17 gigabytes, and that our Docker registry can only support 5-gigabyte layers.

So, I had to write a thing to chunk the files up into 5-gigabyte tarballs, put those in S3, and pull them back down. A bunch of the work there was really just about figuring out what the right workflow is, what the right abstractions are. Then the next step was: we know we want to be writing our bots in TensorFlow and Python. How do you get that?
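
As a rough illustration of that chunking step, here is a hedged sketch, not OpenAI's actual tooling: split a big archive into pieces under the 5-gigabyte limit, upload them to S3, and concatenate them back together on the other side. The bucket name, key layout, and use of boto3 are all assumptions for the example.

```python
import os

import boto3  # assumes AWS credentials are configured in the environment

CHUNK_SIZE = 5 * 1024**3   # roughly the 5 GB layer limit mentioned above
BUCKET = "my-game-files"   # hypothetical bucket name

def upload_in_chunks(tarball_path: str) -> int:
    """Split one big tarball into <=5 GB pieces, upload each, return the count."""
    s3 = boto3.client("s3")
    base = os.path.basename(tarball_path)
    index = 0
    with open(tarball_path, "rb") as f:
        while True:
            chunk = f.read(CHUNK_SIZE)
            if not chunk:
                break
            s3.put_object(Bucket=BUCKET, Key=f"dota/{base}.part{index:03d}", Body=chunk)
            index += 1
    return index

def download_and_reassemble(base: str, num_parts: int, out_path: str) -> None:
    """Pull the parts back down and concatenate them into the original file."""
    s3 = boto3.client("s3")
    with open(out_path, "wb") as out:
        for index in range(num_parts):
            obj = s3.get_object(Bucket=BUCKET, Key=f"dota/{base}.part{index:03d}")
            out.write(obj["Body"].read())
```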

Why was that? Well, because in machine learning, it's actually quite interesting: a lot of the highest-order bit on progress is just having the game API or the right APIs at all, but it's also whether you can use tools that are familiar and easy to iterate with. Before the world of modern machine learning frameworks, people wrote their code in MATLAB, and if you had a new idea, it would take you two months to implement it. Good luck making progress.

Yeah, so it was really all about iteration speed. If you can get into the Python world, well, we have these large code bases of high-quality algorithms that we've built up, and there's just so much tooling built around it that that's the optimal experience. The next step was to port the scripted bot into Python, and the way I did that was I literally just renamed all of the Lua files to .py, commented out the code, and then started uncommenting function by function.

Then you run the function, you get an exception, and you go and uncomment whatever code it depends on. I tried to be as mechanical as possible, like a human transpiler. Lua is one-indexed; Python is zero-indexed, so you have to deal with that. Lua also doesn't distinguish between an array type and a dictionary type, so you have to disambiguate those two. But for the most part, it is something that could have been done almost totally mechanically.

It was great because I didn't have to understand any of the game logic; I didn't have to understand anything that was going on under the hood. I could basically just port it over, and it kind of came together. But then you end up with a small set of functions that you do not have implementations of, which are the actual API calls. I ended up with a file with a bunch of dummy calls, so I knew exactly which calls I needed, and then I implemented them on top of gRPC, a protobuf-based protocol, where on every tick the game would dump the full game state, send it over the wire, and we'd reassemble that into an in-memory state object in Python; then all of these API methods would be implemented in Python.
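
Here is a minimal, hedged sketch of that bridge, with the wire layer stubbed out: one tick's serialized game state arrives as a message, gets rebuilt into an in-memory object, and the bot-facing API calls are answered from that snapshot. All class names, fields, and values are illustrative, not the real protobuf schema.

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class Unit:
    handle: int
    team: int
    position: List[float]
    health: int

class GameState:
    """In-memory snapshot rebuilt from one tick's serialized dump."""

    def __init__(self, tick_message: Dict):
        self.tick = tick_message["tick"]
        self.units = [Unit(**u) for u in tick_message["units"]]

    # The API methods the scripted/learned bot calls are implemented in
    # Python against this snapshot instead of against the Lua game API.
    def get_units_on_team(self, team: int) -> List[Unit]:
        return [u for u in self.units if u.team == team]

    def get_hero_health(self, handle: int) -> int:
        return next(u.health for u in self.units if u.handle == handle)

# Example of one tick arriving from the game process:
message = {
    "tick": 1200,
    "units": [
        {"handle": 1, "team": 2, "position": [0.0, 0.0], "health": 560},
        {"handle": 2, "team": 3, "position": [400.0, -50.0], "health": 600},
    ],
}
state = GameState(message)
print(len(state.get_units_on_team(2)))  # -> 1
```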

At the end of this, you know, it sounds like a bit of a Frankenstein process, but it actually worked really well. In the end, we had something that looked just like a typical OpenAI gym environment. So all you have to do is say, "Gym, make this Dota environment ID,” and suddenly you're playing Dota, and your Python code just has to call into some, you know, object that implements the glue API. Suddenly these characters are running around the screen doing what you want.
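
Assuming that wrapper is registered as a Gym environment, using it could look roughly like the snippet below; the environment ID is hypothetical, since the real one was internal to OpenAI.

```python
import gym  # classic pre-0.26 Gym API, where step() returns a 4-tuple

env = gym.make("DotaLaning-v0")         # hypothetical registered env ID
obs = env.reset()
done = False
while not done:
    action = env.action_space.sample()  # stand-in for the bot's policy
    obs, reward, done, info = env.step(action)
env.close()
```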

This was a lot of the kind of thing I was working on, the pure engineering side. You know, as Szymon and Jakub and Jie and others came onto the project, most people were building on top of this API and really didn't have to dig into any of the underlying implementation details.

So personally, my one machine learning contribution to the project: I'll tell you about it because my background is primarily startup engineering, building large infrastructure, not machine learning. Definitely not a machine learning PhD; I didn't even finish college. I reached a point where I had gotten the infrastructure to a pretty stable place, where I felt like, alright, I don't have to be constantly fighting fires there; I have some time to actually focus on digging into some machine learning.

One particular piece that we were interested in doing was behavioral cloning, so we had one of the systems that we had built to go and download all of the replays that are published each day. The way this game works is that there are about 1.5 million replays that are available for public download; Valve clears them after two weeks, so you have to have some discovery process, you have to stick them in S3 somewhere.

Originally, we were downloading all of them every day and realized that was about two terabytes worth of data a day. That adds up quite quickly, so we ended up filtering down to the most expert players. We wanted to take this data, parse it, and use it to clone the behavior for a bot. I spent a lot of time building basically this whole pipeline to download the replays, parse them, iterate on that, and then train a model to try to predict what the behavior would be.
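
A hedged sketch of what such a behavioral-cloning pipeline can look like in miniature is below; the replay parser and dataset are stand-ins (real replay parsing is far more involved), and the classifier is just an illustrative choice, not OpenAI's actual model.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def parse_replay(replay_bytes: bytes):
    """Stand-in parser: yield (observation, action) pairs from one replay."""
    rng = np.random.default_rng(len(replay_bytes))
    for _ in range(100):
        yield rng.normal(size=16), int(rng.integers(0, 8))

def build_dataset(replays):
    """Flatten all replays into an observation matrix and an action vector."""
    obs, actions = [], []
    for replay in replays:
        for o, a in parse_replay(replay):
            obs.append(o)
            actions.append(a)
    return np.stack(obs), np.array(actions)

# Train a simple classifier to predict the expert's action from the observation.
X, y = build_dataset([b"replay-1", b"replay-2"])
clone = LogisticRegression(max_iter=1000).fit(X, y)
print("training accuracy:", clone.score(X, y))
```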

You know, one thing I find very interesting is the different workflow that you end up with when doing machine learning. There are a bunch of things that are very surprising when software engineers join OpenAI. For example, if you look at a typical research workflow, you'll see a lot of files named like experiment1, experiment2, experiment3, experiment4, and you look at them, and they're just slight forks of the same thing. You're like, isn't this what version control is for?

After doing this, this cloning project, I learned exactly why, because the thing is, if you have a new idea for, okay, well, I’ve kind of got this thing working and now I’m gonna try something slightly different. As you're doing the new thing, well, machine learning is something very binary: at the start it just like doesn’t work at all; you don’t know why; or it kind of works but has some weird performance, and you’re not sure exactly is it a bug or is it just how this dataset works?

You just don’t know, and so if you’ve gotten it working at all, and then you make a change, you’re always gonna want to go back and kind of compare to the previous thing you’ve had running. You want the new thing running side by side with the old thing, and if you're constantly stashing and un-stashing and checking out whatever, then you're just going to be sad. There are a lot of workflow issues like that; you just got to bang your head against the wall and then you see like, oh, I've been enlightened.

So, before we progress further on the story, can you just explain the basics of training a bot in a game? How are you actually giving it the feedback?

So, that's no rocket science, even though reinforcement learning sounds fancy. Essentially, what's happening is we have a bot which observes some state in the environment and performs some actions based on that state. Based on the actions that it executes, it continues playing, and eventually it does either well or poorly, and that's something we can quantify in a number.

That's more of an engineering problem than a research problem. How to quantify how good the bot is doing? You need to come up with a metric, and then, you know, the bot gets feedback on whether it is doing good or not and then tries to select the actions that yield the positive feedback, the high reward.
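
As a minimal sketch of that feedback loop, under the assumption of a toy stand-in environment rather than Dota: the bot observes a state, picks an action, and receives a scalar reward it tries to maximize.

```python
import random

def step(state, action):
    """Toy environment: reward the bot for moving toward position 10."""
    new_state = state + (1 if action == "right" else -1)
    reward = 1.0 if new_state > state and new_state <= 10 else -1.0
    done = new_state >= 10
    return new_state, reward, done

state, total_reward = 0, 0.0
for t in range(1000):                          # cap the episode length
    action = random.choice(["left", "right"])  # stand-in for a learned policy
    state, reward, done = step(state, action)
    total_reward += reward                     # the feedback the learner maximizes
    if done:
        break
print("episode return:", total_reward)
```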

To give us a sense for how well that works, the bot plays against itself to get better. Once you had everything working, how good would a bot from day N do against a bot from day N-1?

So, I guess we have a story that illustrates what to expect from those techniques. When you set up this project, our goal wasn't really to do research. I mean, at some high level it was, but we were very goal-oriented. All we wanted to do was solve the game, right?

The way it started was that every day Rafal was iterating on the scripted bot; he would just write the logic by hand: "I think this is what the bot should do; here's a creep, it needs to go for it." He spent like three months of his time on it, and Rafal is actually a really good engineer, so we had a really good script.

What happened then is we got to the point where we couldn't improve it much more, so we said, okay, let's try something else; let's try reinforcement learning. I was actually on vacation at the time, and another engineer tried it out during my vacation, which I found super surprising. I come back, and there is this reinforcement learning bot, and it's actually beating our scripted bot after like a week's worth of engineering effort, possibly two weeks, which is something very miniature compared to the development effort behind the scripted bot.

So, actually, our bot, which didn't have any assumptions baked in about the game, figured out the underlying game structure well enough to beat anything that we could come up with, which was pretty amazing to see!

At what point do you decide to compete in the tournament?

Oh well, maybe I should finish up my story; sorry, I was running a bit long, but I'll get to the good part shortly. Just to finish up my machine learning contribution: I basically spent about a month really learning the workflow, and I got something showing some signs of life, where it would run to the middle of the lane and look like it kind of knew what it was doing.

When you're just doing cloning, these algorithms learn to imitate what they see rather than the actual intent, and so it gets kind of confused. It would try to do some sort of creep blocking, but the creeps wouldn't be around, so it would just weave back and forth. Anyway, I got this to the point where it was actually creep blocking pretty reliably, pretty well. At that point, I turned it over to Jay, who was also working on the project, and he used reinforcement learning to fine-tune it.

Suddenly it went from only understanding the actions rather than the intent to suddenly it really knew what it was doing, and it kind of had the best creep block that anyone has seen. That was my one machine learning contribution in the project.

So time went on, and one of the most important parts of the project was having a scoreboard. We had a metric on the wall, which was the TrueSkill of our best bot. TrueSkill is basically an Elo-like rating that measures the win rate of your bot versus others. You put that on the wall, and each week people just try all their ideas, and some of them work, some of them improve the performance.
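
The actual metric was TrueSkill, but an Elo-style update is a reasonable hedged sketch of the mechanics: head-to-head results between bot versions get folded into a single strength number you can put on a wall. The ratings and game results below are made up.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A beats B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))

def update(rating_a: float, rating_b: float, a_won: bool, k: float = 32.0):
    """Shift both ratings based on one game's outcome."""
    ea = expected_score(rating_a, rating_b)
    score = 1.0 if a_won else 0.0
    delta = k * (score - ea)
    return rating_a + delta, rating_b - delta

# A new bot version starts at the old version's rating and plays it many times.
old_bot, new_bot = 1500.0, 1500.0
for new_bot_won in [True, True, False, True, True]:
    new_bot, old_bot = update(new_bot, old_bot, new_bot_won)
print(round(new_bot), round(old_bot))
```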

We ended up with this very smooth, almost linear curve, which we posted in a blog post, and that really means something like an exponential increase in the strength of the bot over time. Part of that is that sometimes these data points were just training the same experiment for longer (typically our experiments last up to maybe two weeks), but a lot of those points were also when we had a new idea.

We tried something else, you know: we made this tweak, we added this feature, we removed this other component that wasn't necessary. We had chosen the goal of 1v1 (I don't recall exactly when, but it must have been in the spring or maybe even early summer), but we really didn't know: were we actually going to be able to make it?

Normally, when you're building an engineering system, you think really hard about all the components; you decompose it into this subsystem, that subsystem, that subsystem, and you can measure your progress as the percentage of components that are built. Here you really just have ideas that you need to try out, and it's sort of unpredictable in some sense.

Actually, one of the most important changes to the project and to our making progress was in how management was happening. Initially, each week we'd write down our milestones: let's beat this person by this date, let's beat this other person by this date; outcome-based milestones on a weekly or bi-weekly basis.

Those dates would come and go, and you wouldn't have hit them, and then what are you supposed to do? It feels completely unnatural, right? It's not like there was anything else you could have done; you just have more ideas you need to try. Shifting it to "what are all the things we're going to try by next week" was a good insight. If you didn't actually do everything you said you were gonna do, then you should feel bad and do more of it, and if you did all of those and it didn't work, then, you know, fair enough; you achieved what you wanted to achieve.

Going into the International, two weeks before the International was kind of our cutoff for, at this point, there’s not much more we can do. We’re going to do our biggest experiment ever, put all of our compute into one basket, and see where it goes. Two weeks out, how good was the bot?

Well, it varied. Sometimes it would beat the semi-professional we had testing, but not always. To be specific, let me pull this back up: July 8th was when we had our first win against our semi-pro tester. We'd beaten him, but not consistently, so that was not very reliable data.

This was the week before the International, and so we didn’t really know how good we were getting. We knew that the score was going up. When was the last time that like an OpenAI employee beat the bots? How far out was that?

I think like a month or two out, it could beat all the OpenAI people. Two weeks out, it could at one time beat a semi-pro, I’d say. So four weeks was the first time that it beat the semi-pro, okay?

You know, two weeks out, we don’t know. I mean, I guess we could rerun that, but yeah, you know, we really didn’t know how good it was at that time. We just knew, hey, we’re able to beat our semi-pro occasionally, and we’re going into the International figuring that hey, there’s a 50/50 shot.

I think we were telling Sam the whole way: you know, with these things, you never really trust the probabilities; you just trust the trend of the probabilities. And even that was swinging wildly. You guys would text me every night: "No chance, we're not gonna win!" and then, "Oh, we're definitely gonna win every game!"

Yep! It was very clear that our own estimates of what was gonna happen were miscalibrated. Throughout the week of TI, we still didn't know. So what was happening? You guys all went to Seattle for that week?

Most of the team went.

Okay, yes. So you're like holed up in a hotel or a conference center or something?

Actually, the reality of it was that we were holed up near the stadium where the event was happening. Let me describe how we were holed up. We were given a locker room in the basement of Key Arena, so we all had production badges, and you felt very special as you walked in, since you got to skip the line and go to the backstage area, but it was literally a locker room; they converted it into a filming area, and we all had our laptops in there.

They would also bring in pro players every so often. We had a whole filming setup, and we played against the pros, and we had a partition that we set up, basically just a black cloth, with the whole team sitting on the other side going, are we gonna be able to beat this pro? Maybe!

We were trying to keep as quiet as possible around these pros who were playing. On Monday, they brought by two or three pros and one very high-ranked analyst, and we had our first game. We really didn't know what was gonna happen, and we beat this person 3-0.

You know, this was actually a very exciting thing for everyone at OpenAI. At the time, I was kind of live-Slacking the updates as the games went: "This person said this," "Now it's this many last hits!"

Were you winning by a large margin?

So yeah, do you remember the details of that one?

I don't remember exactly what the margin was (I mean, we have all the data), but then Valve brought in a second pro, a player named Pajkatt, and he played the bot, and we beat him once, we beat him twice, and then he beat us.

Oh, okay.

We knew exactly what had happened. Essentially, he accumulated a bunch of wand charges, right? This item accumulates charges, and he accumulated more charges than our bot had ever seen. It turns out that there was, I think it's safe to say, kind of a bug in our setup.

Okay, so basically some threshold that your bot was not ready for.

I'd say, very specifically, the root cause was that he had gone for an early wand item build. Our bot had just never done an early wand build; it had just never seen this particular item build before.

It never had a chance to really explore what that build means, so it had never learned to save up stick charges and use them. What it was very good at was calculating who's gonna win a fight, and he was trying to probe that. He's like, "I wonder what happens if I push on this axis?" Sure enough, it was an axis the bot hadn't seen.

Then we played a third match against another pro and went 3-0 on that. It was actually very interesting getting the pros' reactions, because we also didn't really know how it would land: would they find it cool, or would they hate it?

We got a mix of reactions. Some of the pros were like, "This is the coolest thing ever! I want to learn more about it." One of the pros was like, "This thing's stupid! I would never use it." But apparently after the pros left that night, they spent four hours just like talking about the bots and kind of what it meant.

The players were highly emotional in their reactions to the bot. They'd never been beaten by a computer, so it was kind of unbelievable. For example, one of the players who actually managed to eventually beat the bot was like, "Okay, this bot, I think, is useless."

Like, "I have never seen anything like this before." Then he kind of calmed down, and after like five or ten minutes, he was like, "Okay, this is actually great! This is gonna improve my practice a lot."

So after your bot lost that first time, did they start talking about counterintuitive strategies to beat it?

Well, I don't think the pro players are that interested in that. The pro players are mostly interested in the aspect where it lets them get better at the game. But there was a point after the event where we set up this big LAN party, where we had 50 computers running, and we kind of unleashed this swarm of humans onto our bot, and they found all the exploits.

We kind of expected them to be there, because the bot can only learn as well as the environment in which it plays allows it to, right? There are some things it has just never seen, and of course those will be exploitable. And we were kind of excited about our next step, which is 5v5, because 5v5 is one giant exploit!

Like, essentially it's about exploiting the other team: being where they don't expect you to be, doing other deceptive things.

One thing I think was pretty interesting about the training process is that a lot of our job while we were doing this was seeing what the exploits were and then making a small tweak that fixes them. The way that I now think about machine learning systems is that they're really a way to make the leverage of human programmers go way up, right?

Normally, when you're building a system, you build component one, component two, component three, and your marginal return on building component four is similar to your marginal return on component one, whereas here, with a lot of the early stuff that we did, your thing just goes from being crappy to slightly less crappy.

But once we were at the International and we had this loss to Pajkatt, we knew, okay, well, the root cause here is just that it had never seen this item build before.

All we had to do was make a tweak to add that to our list of item builds, and then it played out this scenario for however long it trained. Can you walk me through how that tweak actually works on the technical side? Because my impression is kind of what you guys have been saying: it's just been in a million games, so it has kind of learned all this stuff, and some people talk about these networks as just a big gray box, where they don't actually know how to manipulate what's inside.

How are you guys getting in there and changing things?

Yes, so in some sense, at a high level, you can compare this process to teaching a human. You know, you see a kid doing math and confusing addition with subtraction, and you say, "Look here at this symbol; this is what you're not seeing clearly," right? It's the same with those tweaks.

Clearly our bot had never seen this one build that Greg correctly mentioned, and you know, all we had to do was we had to say that like when the bot plays games and chooses what items to purchase, we just need to add some probability of sampling that specific build that it has never seen.

When it plays a couple of games against opponents that use that build, and when it uses the build a couple of times itself, then it becomes more comfortable with the idea and with the in-game consequences of that build.
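
A hedged sketch of that kind of tweak, with made-up build names and weights: the training setup samples an item build per self-play game, and the previously unseen build is simply given some probability so the bot plays both with it and against it.

```python
import random

ITEM_BUILDS = {
    "standard_lane_build": 0.90,
    "early_wand_build":    0.10,  # the previously unseen build, now sampled
}

def sample_build() -> str:
    """Draw one item build according to the configured probabilities."""
    builds, weights = zip(*ITEM_BUILDS.items())
    return random.choices(builds, weights=weights, k=1)[0]

# Each self-play game draws builds for both sides, so the bot experiences
# the new build from both perspectives.
for _ in range(3):
    print(sample_build(), "vs", sample_build())
```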

I have a couple of different levels of answer that I think are pretty interesting. One is at a very object level: the way that these models work is you basically have a black box which takes in a list of numbers and outputs a list of numbers. It's very smart in how it does that mapping, but that's what you get.

Then you think of this as: this is my primitive; now what do I build on top of that so that as little work as possible has to be done inside of the learning? A lot of your job is exactly that. One thing that we noticed on Monday (well, it wasn't that we'd forgotten it; we just hadn't gotten around to it) was passing in the data that corresponds to the visibility of a teleport.

As a human, you can see when someone's teleporting out. Our bot just did not have that feature; the list of numbers passed in did not have that feature. So one of the things you need to do is just add it: your feature vector, however long it was, now has one more feature on it. The bot wasn't recognizing the teleport as an on-screen thing because it doesn't see the screen; it's passed data from the bot API.
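
As a hedged illustration of that point, here is a toy observation builder where the teleport-visibility bit is just one more entry in the vector; the field names and normalizations are assumptions for the example, not the real observation space.

```python
import numpy as np

def build_observation(hero: dict, enemy: dict, teleport_visible: bool) -> np.ndarray:
    """Pack the hand-chosen features into the flat vector the model consumes."""
    features = [
        hero["health"] / hero["max_health"],
        enemy["health"] / enemy["max_health"],
        hero["x"], hero["y"],
        enemy["x"], enemy["y"],
        float(teleport_visible),   # the newly added feature
    ]
    return np.array(features, dtype=np.float32)

hero = {"health": 450, "max_health": 600, "x": 0.1, "y": -0.2}
enemy = {"health": 520, "max_health": 640, "x": 0.4, "y": 0.3}
obs = build_observation(hero, enemy, teleport_visible=True)
print(obs.shape)  # one more dimension than before the feature was added
```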

It really is given whatever data we give it, okay? It’s kind of on us to do some of this feature engineering, and you want to do as much as you can to make it as easy as possible so that it has to do as little work inside as possible, to spend, you know, you think of it as you get some fixed capacity. Do you want to spend that on learning the strategy? Do you want to spend it on learning how to like, you know, map, you know, choose which creep you want to hit?

You want to spend that on trying to parse pixels? You know, at the end of the day, I think a lot of our job as the system designers here is to push as much of that model capacity as much of the learning towards the interesting parts of the problem that you can’t script, that you can’t possibly do any processing for.

That's kind of one level; a lot of the work ends up being identifying which features aren't there and engineering the observation and action spaces in an appropriate way. Another level of answer is the way this actually happened: we're there on Monday, people got dinner, and then Szymon and Jakub and Rafal and I, and maybe one or two others, stayed up all night to do surgery on our running experiment.

It was very much like having a production outage, with everyone there, all hands on deck, trying to go and make the improvements. So, specifically, to zoom in and give you a bit of a feel for what this was like:

You know, this was a very tiring week. The days were spent meeting with the pros and getting excited, and the nights were spent coding up the next version of the experiment, because from day to day, the current iteration of the experiment was not good enough to beat the next day's professional.

So each morning we would download the new parameters of the network, and it would be good enough to beat that day's player, when the day before it wasn't. How were you discerning that?

That was again something of almost a coincidence. I mean, there might be something a little deeper there, but the full story of the week was: we did the Monday plays okay, and then we lost to Pajkatt. So just to clarify, are you guys in the competition or not in the competition?

So the thing that we did was we did a special event to play against Dendi, who’s one of the best players of all time. While we were there, we were also like, well, let’s test this out against all these other pros that are physically here right now.

Let's see how we do. Got it? Alright, so Monday happens, you start training it.

Yep. So actually, this experiment we had kicked off sometime the prior week, or a couple of weeks before, something like that.

Our infrastructure is really meant for: you run an experiment from scratch, you start from complete randomness, and then you run it. Then, two weeks later, you go and see how it does.

We didn’t have two weeks anymore, and so we had to do this surgery and this very careful like, you know, read every single character of your commit. Make sure that you’re not gonna have any bugs because if you mess it up, we’re out of time. There’s nothing you can do; it’s not one of those things like if you’re just a little bit more clever that you can, you know, go and do a hot patch and have everything be good.

It’s just literally the case that you gotta let this thing sit here; it’s got to bake. So Monday came and went, and we were running this experiment that we performed surgery on, and the next day we got a little bit of reprieve where we just played against some kind of lower-ranked players who were commentators and popular in the community but who weren’t pushing the limit of our bots.

On Wednesday at 1:00 p.m., our contact from Valve came by and said, "Hey, I'm gonna get you Arteezy and Sumail," who are basically the top players in the world. I was like, "Could we push them off to Thursday maybe?" He was like, "Their schedule's booked. You're gonna get them when you get them," and we were gonna get them at 4 p.m.

So we looked at our bot to see how it was doing, and we’d kind of been along the way gauging it. We tested it against our semi-pro player, and he said, "This bot is completely broken."

Oh no! You know, kind of pictures of maybe we had a bug during the surgery went through our head, and he showed us the issue. He said, “Look, first wave, the bot takes a bunch of damage it doesn't have to take. There’s no advantage to that. I’m gonna run it; I’m gonna go kill it. I’ll show you how easy it is.”

He ran it to kill it, and he lost.

Okay, and don't jump ahead! Explain what happened!

So he played it five times and he lost each time until he finally did figure out how to exploit it. We realized that this bot had learned a strategy of baiting you, pretending to be a really dumb bot where you don’t know what you're doing and then when the person comes in to kill you, you just turn around and go super bot.

It was legitimately a bad strategy if you're really, really good, but I guess it was good against the whole population of bots that it was playing against. We had never seen it until that day.

So that was one of the major examples of things we didn't have an explicit incentive for, and yet the bot actually learned them.

Yeah, essentially. And it's kind of funny, because of course when the bot played against other versions of itself, it was just a good fighting strategy. But it had a very interesting psychological effect on humans, because the right counter-strategy was not to fall for it and to wait it out a little bit, since the baiting bot is at a disadvantage. But he's like, "Okay, look, I'm gonna go for the kill."

So it had a very interesting psychological effect on humans, which I thought was fascinating: it's almost as if it knows it's a bot and knows how it's going to be attacked.

You know, it's funny to see a bot which kind of seems like it's playing with the emotions of the player. Of course, that's not what actually happened, but it seemed that way.

So now we were faced with the dilemma: it's 1:00 p.m. on Wednesday; these best players are going to show up at 4 p.m. We have a broken bot. What are we gonna do? We know that our Monday bot is not gonna be good enough.

So the first thing we say is, "Well, the Monday bot is pretty good at the first wave; this new bot is a super bot thereafter. Okay, so can we stitch the two together?"

We already had some code for doing something similar, so we kind of revived that. In those three hours, Jay spent his time doing a very careful stitch where you run the first bot and then you cut over at the right time to the second bot.

This is literally just bot one playing the first segment of time and then cutting over, and he finished it 20 minutes before the pros arrived, just in time for it to be sanity-checked by our semi-pro tester.
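
A hedged sketch of that stitch, with the cutover condition (a simple game-time threshold) and the bot classes made up for illustration:

```python
class StitchedBot:
    """Play the early bot for the opening segment, then cut over to the newer bot."""

    def __init__(self, early_bot, late_bot, cutover_time_s: float = 90.0):
        self.early_bot = early_bot
        self.late_bot = late_bot
        self.cutover_time_s = cutover_time_s

    def act(self, observation, game_time_s: float):
        # Before the cutover, the early bot (good at the first wave) plays;
        # afterwards, the newer "super bot" takes over.
        if game_time_s < self.cutover_time_s:
            return self.early_bot.act(observation)
        return self.late_bot.act(observation)

class ConstantBot:
    """Stand-in policy that always returns the same action."""
    def __init__(self, action):
        self.action = action
    def act(self, observation):
        return self.action

bot = StitchedBot(ConstantBot("lane_passively"), ConstantBot("fight"))
print(bot.act(None, 30.0))   # -> lane_passively
print(bot.act(None, 300.0))  # -> fight
```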

This is great, so at least we got that done in the nick of time. But the other question was: how do we actually fix the bot? Let me actually just finish one aspect first, because we were also kind of uncertain about what happens when you switch over from one bot to the other.

I was actually standing by the pro who was playing it, watching at the moment when it switched over. I was like, "Yes!"

Of course, it was probably completely unnecessary, but we weren’t sure what would happen there.

So I didn't know about that part of the story. Anyway, the question was how do you actually fix it.

There was a little bit of debate: maybe we should abandon ship on this, switch back to our old experiment, and run that one for longer. I forget who suggested it, but someone said, "I think we just have to let it run for longer, because if it learns a strategy of baiting, well, the counter-strategy for that is just don't fall for the bait."

So we let that run for the additional three hours, and we first played Arteezy, who had shown up, against our switched bot, kind of the Franken-bot, and that beat him three times.

We were like, "Alright, let’s try out this other bot and just see what happens with the additional three hours of training.” Because, you know, our semi-pro tester at least validated that like it looks like it's fixed.

So in that three hours of training, how many games is it actually playing simultaneously?

That’s a good question—quite a bit!

Okay, and so we played this new bot against Arteezy. We didn't know how it was gonna do, and sure enough, it beats him! And he loved it! He was having a lot of fun; he ended up playing a lot of games that day.

I don't remember, maybe it was ten, but he was just like, "Oh, this is so cool." We were supposed to have Sumail that day as well, but due to a scheduling snafu he had to be at some panel, and the timing didn't work out.

But Arteezy and his coach, who also coaches Sumail, both said, "Sumail is gonna beat this bot; it's gonna happen. You know, maybe he'll have a little bit of trouble figuring it out in the first game, but after that, you're in trouble."

So, alright, we've got one more day to figure out what to do.

That night we had a nice dinner, we kind of rested, hung out on Slack with some people back home, and then in the morning, we downloaded the new parameters of the network and just let it play.

We hung out and just let it go, just let it play. It's the exact opposite of how I’m used to engineering deadlines happening.

Normally, you’re working right up until the minute.

So you guys were getting full nights of sleep, nice and relaxed?

No, no, absolutely not.

Let me be clear about the night before, or rather two nights before the day where we got the rest and relaxation. That night looked something like the following: we'd had a full day of dealing with problems and emotional highs and lows.

I was absolutely knackered! Come midnight, we start working, okay? We need to make all those changes we'd talked about. Around midnight we start with four people, and we are all so tired that we look through all the constants we are going to add to the experiment.

There were actually two people looking at them, because we didn't trust a single person given how tired we were. So they're looking at those constants at 6:00 a.m. while I was doing the model update, which involves a lot of nasty off-by-one indexing things.

Even though it's a small change, it took me like six hours.
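
A hedged sketch of what that kind of model surgery can involve: when a new input feature is added mid-experiment, the first layer's weight matrix needs an extra column, and every index into the old layout has to be remapped carefully. The shapes here are made up; zero-initializing the new column is one common choice so the running policy's behavior is unchanged until training adjusts it.

```python
import numpy as np

def add_input_feature(W: np.ndarray, b: np.ndarray, insert_at: int):
    """Insert a zero-initialized column for the new feature at `insert_at`."""
    new_col = np.zeros((W.shape[0], 1), dtype=W.dtype)
    W_new = np.concatenate([W[:, :insert_at], new_col, W[:, insert_at:]], axis=1)
    return W_new, b  # biases are unchanged when an input is added

old_W = np.random.randn(128, 6).astype(np.float32)  # 6 old input features
old_b = np.zeros(128, dtype=np.float32)
new_W, new_b = add_input_feature(old_W, old_b, insert_at=6)
print(old_W.shape, "->", new_W.shape)  # (128, 6) -> (128, 7)
```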

Somewhere around 3 a.m., we had a phone call because it turned out a certain number of machines had started exceeding some limits. We tried to get the limits raised, and around 6 a.m. we're like, okay, we are ready to deploy this.

Then there was the deploying, which is just a one-man job.

Jakub was just clicking deploy and fixing all the issues that came up, and I was staying around exclusively to make sure that Jakub didn't fall asleep.

Eventually, at 11 a.m., the experiment was running, and we went to sleep and woke up at 4 p.m. or something. It had over 24 hours to train.

I think it ended up being like one-and-a-half days until the game.

Sorry, yeah, just to repeat the timeline: this was Monday, when we played the first set of games and had the loss. We did the surgery that night and played it, I guess, starting at 11 on Tuesday. Then Wednesday at 4 p.m. is when we played Arteezy.

That night, we trained for longer. I don't think we made any changes after that. Maybe we made some small ones.

Alright, but then on Thursday is when we played Sumail.

Okay, and so I think Tuesday to Wednesday was the night we made the last changes.

Yeah, and there was quite a bit of different work going on that all kind of came together at once. One thing I think was really important was that one of our team members (his handle is Psyho; he's a very well-known programming-competition competitor) was spending a lot of time just watching the bot play and seeing, why does it do this weird thing in this case?

What are all the weird tweaks we could make? He was really getting intuitions for: oh, it's because we're representing this feature in this way; if we change it to this other representation, it's going to behave differently.

I think it's almost a very human-like process of watching this expert play the game and trying to figure out all the little micro-decisions that are going into a macro choice.

It's kind of interesting to start having this very different relationship with the system you build, because normally your goal is to have everything be very observable.

So, yeah, you want to put metrics on everything and like you know that if something’s not understandable, add more logging. Like you know, that’s how you design the systems, whereas here you do have that for the surrounding bits; but for this machine learning core, there you really do have to understand it at more of a sort of behavioral level.

Was it ever stumping you, where it's being creative in a way that you didn't expect, and maybe it's even working, but you don't know why or how it decided to make that choice?

A few small ones. There were some early days of the project where we'd have our professional playing the next iteration of the bot, and he'd say, "Yeah, the bot is really good at creeping," and we'd be surprised. There is also one part of the story that I think is interesting, and then we can probably wrap up this part.

Our semi-pro tester had played hundreds of games against this bot over the past couple of months, and we wanted to see how he would benchmark against Arteezy.

We had him play against RTZ, and you know, Arteezy was up the whole game. He was just like beating him to the last hit by 500 milliseconds every single time.

So our semi-pro was like, "Alright, I've got one last-ditch effort: go try this strategy that the bot always does to me." It's some strategy where you do something complicated: you triple-wave your opponent, pull him out from under the tower, you have regen, you don't even need to go in for the kill. And he did it, and it worked!

Whoa! So the bot had taught him a strategy that he could use against humans.

I think that was very interesting and a good example of the kinds of things you can get out of these systems: they can discover very non-obvious strategies that can actually be taught to humans.

And how did it go with Sumail?

So with Sumail, we went undefeated, and I think it was 5-0 that day. One thing that’s actually interesting, so we’ll probably blog about this in upcoming weeks—we’ve actually been playing against a bunch of pros since then.

So we, our bot has been in very high demand, and some of these pros have been live streaming it, and so we’ve gotten a better sense of kind of watching as humans go from, you know, just being completely unable to beat it—if you play against it for long enough, you can actually get pretty good.

There's actually a very interesting set of stats there that we'll be pulling and analyzing in a bit.

Are there humans that consistently beat the bots today?

Yeah, so I think there's one who has like a 20% win rate or something, and that player has played a lot of games and finds strategies to exploit.

No, actually, he has become essentially as good as the bot at the things the bot does, which we find extremely surprising.

But it turns out that he has played hundreds of games against it. And this is a top player; he beats most humans; these are all professionals.

It's not just some random kid who's good at beating the bot. That's right. The way to think about this is that becoming a professional video game player is a pretty high bar.

Everyone wants to be a professional video game player, but for these games the number of pros is very small. And if you're playing hundreds of games against the bot, you're gonna get very, very good at the things that it does.

I was talking to Arteezy. I was asking him, has it changed your play style at all? He said he thinks it has.

The thing it's done for him is it's helped him focus more, because while you're just there in lane last hitting, now suddenly that's just rote, right?

Because you’ve just been doing it so much, you’ve gotten so good at it, and I think that one really interesting thing to see is gonna be how can you improve—can you improve human play style? Can you change human play style? I think that we’re starting to see some positive answers in that direction.

So, I know we're almost out of time. I could do a little lightning round to quickly go through a few of these.

Yes, actually! To the question of what kind of skills you need to work at OpenAI: that was gonna be the first lightning round question.

So, a specific list of things that we found very useful, at least on the Dota team: some knowledge of distributed systems, because we build a lot of those, and those are easy to not do properly.

Another thing that we found very important is actually writing bug-free code. Essentially, I know it's kind of taken for granted in the computer science community that like everybody makes bugs and so on, but here it’s even more important in our projects that you minimize it because they’re very hard to debug.

Specifically, many bugs manifest as slightly lower training performance, where getting that number takes a day, and in a spree of hundreds of commits it's really easy to miss.

The primary way of debugging this is actually reading the code, so every bug has a very high cost associated with it. Writing correct, bug-free code is quite important to us, and we sometimes actually sacrifice good engineering practice, like code modularity, to make our code shorter and simpler, essentially fewer lines where you can make bugs.

I guess lastly, the primary skill is really just good engineering. But if somebody really feels like, "Gosh, I really need to brush up my math; I really need to go in there and feel comfortable, not have someone ask me a math question someday and get caught out," what then?

I think mostly getting good basics in linear algebra and basic statistics. Statistics especially when doing experiments, where it's easy to make elementary statistics mistakes; and linear algebra, plus some basic optimization, is most of what you need to know to follow what's happening in those models.

This is, compared to being a good engineer, quite easy to pick up, at least in projects like the one we're doing.

Yeah, I wanted to talk about some non-technical skills that I think are really important. So one is I think that there’s like a real humility that’s required if you’re coming from an engineering background like I am, working in these projects where you're no longer the technical expert in the way that you're used to, right?

Say you want to build a product for doctors: you can talk to ten doctors, and honestly, whatever you build is probably going to be a valuable addition to their workflow, because doctors can't really build their own software tools.

You know, some can, but you know, in a general world, no—whereas with machine learning research, you know, everyone that you’re working with is very technical and can build their own tools, but if you inject engineering discipline in the right place, if you build the right tool at the right time, if you kind of look at the workflow and think, "Oh, we could do it in this other way," that’s where you can really add a bunch of value.

It's about knowing when to inject the engineering discipline but also knowing when not to, and, to Szymon's point, sometimes we really just want the really short code because we're really terrified of bugs.

And so that can yield different choices than you might expect for something that’s just a pure production system.

So, who writes the least bugs per line of code at all of OpenAI?

I’m definitely not gonna say me!

Yeah, there are a few candidates. It's more okay to have bugs that are gonna cause exceptions, right?

My bugs usually cause exceptions, so that's fine. What you don't want are the things that cause correctness issues, where training just gets 10% worse.

Yeah, so there was another question related to skills, but this is for non-technical people.

Yeah, Tim Beco asks how can non-technical people be helpful to AI startups?

I was gonna say I think I think one important thing is that for AI generally right now, I think there’s a lot of noise and I think it could be hard to distinguish what is real from what’s not.

I think just simply educating yourself is a pretty important thing. It's very clear that AI is gonna have a pretty big impact; just look at what's already been created and extrapolate that without any new technology development, any new research, and it's pretty clear it's gonna be baked into lots of different systems.

There are a lot of ethical issues to work through, and I think that being kind of a voice in those conversations and educating yourself is a really important thing.

Then you look to, well, what are we gonna be able to develop next? I think that’s where the really transformative stuff is gonna come.

Okay, I once saw a post of Greg's RescueTime report and was pretty shocked. Do you have any advice for working such long hours?

I think it's not a good goal; I would not have a goal of trying to maximize the number of hours you sit at your computer. For me, I do it because I love it, and the activity that I love most in the world is being in the zone writing code, producing something that's meaningful and worthwhile.

I think that as a second-order effect, it can be good, but I wouldn’t say that like that is the way to have an impact. I will also say more specifically; the only way I’ve ever seen people be super productive is if they’re doing something they love. There is nothing else that will sustain you over a long enough period of time.

Okay, is the term AI overused by many startups just to look good in the press?

Yes!

Okay, what is the last job that will remain as AI starts to do everything else? The last human job?

What's gonna be the hardest thing for AI to do in general? I think it's actually not AI researcher. It's actually very interesting: when you ask people this question, everyone tends to say whatever their own job is as the hardest one.

But I actually think that AI researcher has got to be one of the jobs you're gonna want to make these systems very good at.

Totally.

The last question maybe this is obvious. Can you just connect the dots between how playing video games is relevant to building AGI?

It’s actually maybe one of the most surprising things to me, the degree to which games end up being used for AI research.

The real thing that you want, right, is you really want to have algorithms that are operating in complex environments where they can learn skills, and you want to increase the complexity of those skills that they learn.

You either push the complexity of the environment, or you push the complexity of the algorithms, or you scale these things up; that's really the path you want to take to building really powerful systems. Games are great because they are a pre-packaged environment that other humans have spent time making: putting in a lot of complexity, making sure that there are actual intellectual things to solve there, or not even just intellectual, but interesting mechanical challenges.

You kind of can get human-level baselines on them, so you know exactly how hard they are. They’re very nice, unlike, you know, something like robotics, where you can just run them entirely virtually, and that means you can scale them up, and you can run many copies of them, and so they’re a very convenient testbed.

What you’re gonna see is that there’s a lot of work that's gonna be done in games, but the goal is to, of course, bring it out of the game and actually use it to solve problems in the real world and to actually, you know, be able to interact with humans and do useful things there.

I think they're a very good starting place. One thing that I really like about this Dota project and bringing it to all these pros is that we're all going to be interacting with super-advanced AI systems in the future.

Right now, I think we don’t really have good intuitions as to how they operate, where they fail, and what it’s like to interact with them, and that this is a very low-stakes way of having your first interaction with very advanced AI technology.

Cool, if someone wants to get involved with OpenAI, what should they do?

Well, we have job postings on our website. I guess the tips that we've given about how to get a job at OpenAI are very geared towards a specific role, which is most of what we have open: large-scale reinforcement learning engineer.

Yeah, and in general, we look for people who are very good at whatever technical axis they specialize in, and we can use lots of different specialties right now.

Great! Alright, thanks, guys. Just to echo that: everyone thinks they have to have an AI PhD. Not true! Neither of these guys does. Alright. Thanks a lot.

Thanks!

Yeah, thank you!
