
YC Tech Talks: Designing Game Characters with Deep Learning, from Cory Li at Spellbrush (W18)


7m read · Nov 5, 2024

My name is Cory, and I'm the CEO at Spellbrush. I'm here to talk to you today about designing characters with deep learning. We're Spellbrush, a YC company as well, and we're building deep learning tools for art and artists.

What exactly does this mean? Art is hard. My co-founder is a professional artist, but we're a fairly small team, so a lot of the question is how we scale up her ability to create more content without becoming a massive studio. Drawing things takes time; illustrating things takes time. And on a lot of AAA titles, in the studio pipeline format, art is often 50, 60, 70 percent of the total production budget. So art is becoming increasingly expensive and increasingly difficult to scale.

So what if we could use AI in the art pipeline? This was the question we asked ourselves when we originally started our company, and we've been building tools to help in this direction ever since. A quick quiz for the chat: we're building AI tools, so which one of the following images was not actually drawn by a human? I'll give you ten seconds or so.

These are three illustrations of character portraits in the anime style, and it turns out the one on the right was drawn entirely by our AI. The left two are from popular Twitter artists. The key thing here is that we can create images on par with what an illustrator would be able to draw. The left two images would probably have taken a professional illustrator anywhere from two to fifteen hours to draw, and our tool can draw a character in under two seconds.

And that's not all. Not only can we draw one character, we can draw hundreds of characters in the same amount of time it would take to generate one. So the possibilities are kind of endless in that respect. Here's an example: we actually have one of our older models online. It's called Waifu Labs, at waifulabs.com.

What you see here is that you can interact with one of our earlier models: you can pick any character and then customize that character through various steps in the online flow. And yes, I am reading the chat; we were the ones behind it.

So how does this work? I'll give a brief introduction to the technology. The way we're able to create characters from scratch is with a technique called GANs, or Generative Adversarial Networks. We have one neural network called the generator and a second neural network called the discriminator.

The generator's job is to learn how to draw art, and the discriminator's job is to learn how to tell real art from fake art. We can take these two agents and construct a network in the following way: we take the generator, and we take a corpus of real art that we want the generator to learn to draw like. In this example, we have a bunch of classical paintings, so we want the generator to learn how to draw classical paintings.

We feed images from both sources at random to the discriminator, whose job is now to decide whether each image came from the real corpus or from the generator's learned drawings. We then evaluate whether the discriminator's assessment was correct, and we backpropagate and update the weights for both the generator and the discriminator, so that both of them learn from whether they made a mistake or not.
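
To make that loop concrete, here's a minimal sketch of a single GAN training step in TensorFlow (the framework we use for training, as described later). The tiny dense networks, toy image size, and optimizer settings are placeholder assumptions for illustration, not our production models.

```python
import tensorflow as tf

latent_dim = 128  # size of the noise ("latent") vector fed to the generator

# Placeholder networks: a real image GAN would use convolutional architectures.
generator = tf.keras.Sequential([
    tf.keras.layers.Dense(256, activation="relu"),
    tf.keras.layers.Dense(28 * 28, activation="tanh"),   # toy 28x28 "image"
])
discriminator = tf.keras.Sequential([
    tf.keras.layers.Dense(256, activation="relu"),
    tf.keras.layers.Dense(1),                             # real-vs-fake logit
])

bce = tf.keras.losses.BinaryCrossentropy(from_logits=True)
g_opt = tf.keras.optimizers.Adam(1e-4)
d_opt = tf.keras.optimizers.Adam(1e-4)

@tf.function
def train_step(real_images):                 # real_images: [batch, 784]
    noise = tf.random.normal([tf.shape(real_images)[0], latent_dim])
    with tf.GradientTape() as g_tape, tf.GradientTape() as d_tape:
        fake_images = generator(noise, training=True)
        real_logits = discriminator(real_images, training=True)
        fake_logits = discriminator(fake_images, training=True)
        # Discriminator learns to label real images 1 and generated images 0.
        d_loss = bce(tf.ones_like(real_logits), real_logits) + \
                 bce(tf.zeros_like(fake_logits), fake_logits)
        # Generator learns to make the discriminator call its fakes real.
        g_loss = bce(tf.ones_like(fake_logits), fake_logits)
    d_opt.apply_gradients(zip(d_tape.gradient(d_loss, discriminator.trainable_variables),
                              discriminator.trainable_variables))
    g_opt.apply_gradients(zip(g_tape.gradient(g_loss, generator.trainable_variables),
                              generator.trainable_variables))
    return g_loss, d_loss
```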

This is how the cycle learns, and we run it millions upon millions of times to train both the generator and the discriminator. One small but quite important note: the generator and the discriminator are deterministic programs, so given the same input they will always give back the same result.

So we actually have to feed some noise into the generator, drawn from what we call the latent space, and this noise is what allows the generator to create different images. If we're able to train both the generator and the discriminator properly, we can show that at some point the generator produces images that are indistinguishable from the real data set.

The generator can now create high-quality images that basically look like they come from the real data distribution. And because we were feeding in random noise to make the generator produce new images, this comes in handy: we can now control the output of the generator through the latent noise we feed into it.

As an example, we can take a trained generator and generate the same character with multiple different expressions, generate it in multiple different colors, and even transfer or completely change the illustration style for the same character. These are the sorts of tasks that would take a trained artist hours at minimum.
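
Here's a rough sketch of what that kind of latent-space control looks like in code, reusing the `generator` and `latent_dim` from the training sketch above. Splitting the latent vector into "identity" and "style" dimensions is purely illustrative; the point is simply that varying part of the noise varies part of the output.

```python
import tensorflow as tf

# Illustrative only: treat the first dimensions as "identity" and the rest
# as "style"; real models are not necessarily factored this way.
identity_dims = 96

base = tf.random.normal([1, latent_dim])          # one character's latent code

# Same character, different looks: re-sample only the trailing "style" dims.
variations = []
for _ in range(8):
    style = tf.random.normal([1, latent_dim - identity_dims])
    z = tf.concat([base[:, :identity_dims], style], axis=1)
    variations.append(generator(z, training=False))

# Interpolating between two codes morphs smoothly from one design to another.
other = tf.random.normal([1, latent_dim])
morphs = [generator(base + t * (other - base), training=False)
          for t in tf.linspace(0.0, 1.0, 5)]
```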

But when you're a small team, an art director can produce many different variations very quickly using the power of the AI. Now, a little about the data set. We train our network by crawling publicly available images off the internet, and we're starting with the anime aesthetic mostly because it has the most data available.

There are about 10 million images available to train on. One interesting thing to note about our data set is that its distribution doesn't follow traditional ML data sets. In particular, the skew is very female-oriented: girls outnumber boys in anime illustrations by roughly six to one, and darker skin tones tend to appear in less than three percent of the total illustrations available on the internet.

So we spend a lot of effort correcting this, because these percentages obviously don't represent the real world, and representation is important, especially for illustration. We've improved the generation of darker skin tones, so our AI draws them at a higher frequency than you would see in the crawled data set, which doesn't reflect actual population statistics, and we've also improved the generation of male characters.
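
One simple way to correct that kind of skew, sketched below, is to oversample the underrepresented groups when assembling the training stream. The directory layout, tag names, and target weights here are illustrative assumptions rather than our actual data pipeline.

```python
import tensorflow as tf

def load_group(tag):
    # Hypothetical helper: stream images carrying a given tag from disk.
    return tf.data.Dataset.list_files(f"data/{tag}/*.png").map(
        lambda path: tf.image.resize(
            tf.image.decode_png(tf.io.read_file(path), channels=3), [256, 256]))

groups = [load_group("female"), load_group("male"), load_group("darker_skin")]

# Raw frequencies might look like ~[0.83, 0.14, 0.03]; sample much closer to
# even so the generator sees the underrepresented groups far more often.
balanced = tf.data.Dataset.sample_from_datasets(
    [g.repeat() for g in groups], weights=[0.5, 0.3, 0.2])

batches = balanced.shuffle(1024).batch(32).prefetch(tf.data.AUTOTUNE)
```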

Fun fact: illustrators, especially in Japan, often don't even like drawing male characters, because female characters get them more likes and more retweets. So it's actually hard to find high-quality male characters, but with the power of AI you can draw things that normal illustrators wouldn't even want to draw.

We have a number of other active areas of research, for instance automated animation. On the right side, I have a character that's fully generated and also fully animated using our workflow. We're also building a number of Live2D and Spine workflow-assistance tools, and some super-resolution-based techniques for animation processes.

Training this system is quite complex, so I'll go briefly into how we do it. We've built our own small mini-supercomputer, because the cloud is actually quite expensive. Here you see us loading a 42U rack into our office. It's basically a DIY supercomputer, with a 100-gigabit top-of-rack Ethernet switch, 200-plus cores, 20-plus GPUs, and a boatload of storage.

A big question people often ask me is: what about the cloud? The closest comparable machine on the cloud would be an AWS p3.16xlarge, which on demand is about $24 an hour. Even with spot instances, you might cut that roughly in half, to about $10 an hour. The key thing is that training these models is quite expensive, because each one takes about seven to ten days to train, which means every individual model costs us somewhere between three and four thousand dollars.

That obviously makes us very sad, which is why, in a scrappy startup way, we've built an entire cluster in our office. And yes, it can definitely run Crysis, because these are all Titan RTX GPUs. Our total running cost is about 60 cents an hour once everything is accounted for.
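
The arithmetic behind those numbers is straightforward; here is a quick back-of-the-envelope comparison at the hourly rates quoted above (the spot price is approximate, so treat the totals as rough).

```python
# Back-of-the-envelope training cost at the hourly rates quoted in the talk.
hours_per_run = (7 * 24, 10 * 24)            # a model takes 7-10 days to train

for name, rate in [("AWS p3.16xlarge on-demand", 24.00),
                   ("AWS spot (rough)",          10.00),
                   ("in-office cluster",          0.60)]:
    low, high = (h * rate for h in hours_per_run)
    print(f"{name:27s} ${low:7,.0f} - ${high:7,.0f} per model")
```

At cloud rates that works out to thousands of dollars per model, versus on the order of $100 to $150 per model on our own hardware.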

A brief overview of our architecture: we have a custom internal language called Netgen, which lets us describe GAN architectures very quickly. It compiles into low-level TensorFlow ops, those ops get packaged into Singularity containers, and we schedule the jobs onto our cluster with Slurm. You can see the Crysis-capable GPUs down there; the eight Titan RTXs are one of the nodes we run our workloads on.
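
As a purely illustrative sketch (not our actual Netgen tooling), dispatching one of those containerized training jobs might look something like the following; the resource numbers, container image, and script names are all hypothetical.

```python
import subprocess

# Hypothetical sketch: wrap the compiled TensorFlow training job in a
# Singularity container and submit it to the Slurm scheduler with sbatch.
job_script = """#!/bin/bash
#SBATCH --job-name=gan-train
#SBATCH --gres=gpu:8
#SBATCH --cpus-per-task=32
#SBATCH --time=10-00:00:00
singularity exec --nv train_env.sif python train_gan.py --config netgen_out.json
"""

with open("train_job.sbatch", "w") as f:
    f.write(job_script)

# Hand the job to Slurm; it lands on a node like the eight-GPU Titan RTX box.
subprocess.run(["sbatch", "train_job.sbatch"], check=True)
```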

Metrics then get piped out to Prometheus and Grafana, the standard stack, along with TensorBoard for tracking loss functions. We're taking all of this technology and building, internally, the world's first AI-illustrated game. Right now we're a very small team of five people, but we're looking to hire our sixth. So if the picture on the right resonates with any of you, you probably want to give us a shout.

That's Aqua dancing on top of about 40 terabytes worth of flash. We're hiring artists: we're looking for a 2D animator and motion designer, a real-time VFX artist, and an AI research intern for the coming winter.

If this sounds good to you, ping us at jobs@spellbrush.com. I'll probably be in the breakout room later to answer any questions you might have. Otherwise, thank you so much.
