Neural Networks (Part 1)
Hey guys, this is Maads 101, and today I'm going to be talking about artificial neural networks. So in this video, I'm just going to be talking about the different kinds of neural networks there are and some of the things you can do with neural networks. Then in the next video, I'm going to talk about how we actually train a neural network and how you might actually go about implementing a neural network in code.
So this is going to be a more high-level video to get you introduced to the subject, although I'll go into some math as well. I want to start off by talking about the wide range of things you can actually do with neural networks.
So the most basic, fundamental thing you can do is classification. For instance, handwriting recognition: I show you a digit that someone's written, and you tell me, is it a zero, a one, a two, etc. Neural networks are pretty good at things like that. They're certainly not the only approach, but for more complex object recognition (say I show you a picture, and I want you to tell me there's a bunny in the picture and a cage in the picture and stuff like that), neural networks are actually state-of-the-art. They beat SVMs; they beat nearest neighbors. I don't know what other approaches people are trying, but right now they're state-of-the-art at that kind of thing.
Another thing you can do with a neural network is something called unsupervised learning. To give you an example, I took a database of about 4,000 faces with no labels. I didn't say whether they're a man or a woman; I didn't give any information about them, just raw pictures of faces. I fed this to a kind of neural architecture called a generative adversarial network, and that network learned to construct new random images that look like faces.
So it learned what it is about a face that makes something look like a face: a nose, a mouth, eyes, things like that. It learned this just from the raw images. I gave it no information besides the images, and it learned to generate things that actually look enough like faces that it's kind of eerie and creepy.
And that was the unsupervised learning part. Now that my network understands a lot about faces, I use that understanding to train it to figure out where to put a fake mustache when it draws one on a person's face. Because it already understood faces so well, it learned to do this in a matter of seconds, and it didn't need that much data to figure out what to do. Since it already knew about faces, it just had to overlay this new objective onto the knowledge it already had.
Something probably even cooler than the things I've already talked about is reinforcement learning. Some researchers built a system using a deep artificial neural network where the network just gets the pixels of a video game, and it gets rewards and punishments based on the score of the game. Just from that, it figures out how to play.
So it doesn't have any pre-existing knowledge of the goals of the game; it's just getting the raw pixels. It doesn't know what a paddle is if it's playing something like Brick Breaker; it doesn't know what a brick is. It's just getting these raw pixels and these raw rewards and punishments, and most of the time it manages to figure out the objective of the game and how to best achieve that objective. It often reaches a superhuman level in a matter of hours. So that kind of thing is really exciting, and deep learning and artificial neural networks are basically at the heart of advances like that.
So I think that's really cool. The last thing I want to touch on, which I think is particularly interesting, is sequence modeling with a recurrent neural network. Basically, the key idea is that you can use a neural network to take in a sequence and then output a sequence.
So a sequence could be anything like a string of letters, you know, an English sentence; it could be a string of audio samples for, you know, someone talking. The output sequence could be something like that as well. So, for instance, if I wanted to make a neural network translate between English and French, it could take an English sequence of words or letters, and it would output a French sequence.
That would actually get pretty good results; neural networks are pretty good at translating between languages. They're good at speech recognition too, and it's all through this sequence-modeling type of approach. Stuff like this is really exciting. I work with sequence-modeling recurrent neural networks a lot, and they're cool. You can use them for a lot of creative things, like generating fake text, or making a bot that has a fake conversation with you: it reads your messages and then writes a new message, and it actually learns to spell pretty well and might learn a little bit of grammar.
It doesn't pass the Turing test yet, but I think it's really exciting. So sequence modeling is another cool thing you can do with neural networks.
So now I want to actually start describing more of what a neural network actually is and how it works. I'm going to go over the structures of a couple different kinds of neural networks just to give you an idea of what's actually going on. A natural place to start is to describe the behavior of a single neuron, and this will allow us to eventually build up larger, deeper networks and talk about more complicated kinds of neural networks.
So in a simple neuron, which is this yellow thing here, we have a bunch of inputs and one output, and everything is basically a number. So this input might be, at a given time, 3; this might be -7; this might be 2; the output might be 0, it might be 3, it might be -1, something like that. So everything is a number, and the neuron just acts as a mapping from a bunch of inputs to one output.
So it's just a way to combine a bunch of inputs into a single number; that's basically the core of what a neuron does. There are a bunch of different ways you might think of to combine a bunch of inputs into an output. For instance, you might add them all up to get the output; you might multiply them; you might take all their square roots and add those, something along those lines. There are a whole bunch of possible things you could do with a neuron to get a given output.
But the most standard way to combine inputs to get an output is something called a linear combination. I'm going to show a formula for this, and I promise I'll give a more intuitive explanation in a second and actually a cool example of how this works.
So what a linear combination does is we have a bunch of weights, which are basically constant for the neuron. So the neuron's output we will find by multiplying the first weight by the first input, and the second weight by the second input, and the third weight by the third input, and so on and so forth, and then we'll add all these together. The purpose of the weights is basically to assign different priorities to the different inputs.
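Written out as a formula (presumably the one shown on screen at this point), the output of a neuron with inputs $n_1, \dots, n_k$ and weights $w_1, \dots, w_k$ is:

$$\text{output} = w_1 n_1 + w_2 n_2 + \cdots + w_k n_k$$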
So let's say W1 was zero; well, then the output of the neuron would not depend on this input at all, right? Because we're multiplying N1 by W1. So if the first weight is zero, then the first input is always multiplied by zero, so it has no influence on the output. But if W1 were, say, ten, then changing N1 by even a little bit would change the output drastically, because it's being multiplied by this large weight.
So the weights decide how much each of the inputs influences the output of the neuron. The weights are constant; they define the behavior of the neuron. The set of weights defines how the neuron's output responds to the different inputs. The inputs might change: maybe I'm trying to get the neuron to classify a picture, and I show it a picture of a car versus a picture of a truck. The inputs change, but the weights won't change.
So the weights define the behavior of the neuron, whereas the inputs are, you know, the variables that the neuron is acting on. So now I want to give you a more concrete understanding of how this might work. I'm going to do that by giving you an actual example.
So let's say we want to construct a neuron, just a single neuron, that will predict whether someone's laptop will break in the next year or not. The neuron will take four inputs. It'll take the age of the laptop, the number of times it was dropped, the number of times it was repaired, and the number of times some kind of liquid was spilled on it. It's going to take these four inputs, and using some kind of linear combination, we want it to output some higher than normal number if the laptop is very likely to break and some low number if the laptop is likely to survive.
And high and low are relative in this case; really, we just mean the more likely the laptop is to break, the higher the output of the neuron should be. So this isn't going to be a perfect model. I don't think a single neuron is capable of modeling this perfectly, but we can capture some kind of actual meaning just by deciding for ourselves some weights that a neuron might have to do this.
So we've already decided the inputs of the neuron, and we're going to use a linear combination. This is what our neuron is basically going to look like. It's going to take these four variables, and it's going to compute an output by doing this weighted sum kind of thing. So we'll do weight one times the age plus weight two times the number of drops plus weight three times the number of repairs plus weight four times the number of spills, and we want this mathematical expression to come out to be a higher than normal number if the laptop is likely to break.
So how could we go about deciding these weights? Well, in the next video, I'm going to talk about how we can actually train a network to get it to learn the weights that are the best. But in this video, I just want to think about this intuitively.
So as the laptop gets older, it's probably more likely to break. So we want the output of the neuron to increase as the age increases. Weight one, which multiplies the age, I'm just going to make that 1. So for every year the laptop ages, the output of the neuron goes up by one, right? So every year, the death likelihood is higher.
Now, the number of drops—if you drop your laptop, that’s really bad. It's probably worse than your laptop aging a year. So weight two, let's make it four. So if you drop your laptop once, the output of the neuron will go up by four. That's a lot, and maybe that's a little over the top, but at least it shows that this is worse than aging.
Now, if you repair your laptop, that's actually probably a good thing. It depends on where you had it repaired, but it's probably a good thing. So maybe weight three should be negative because the more repairs you get, the lower the output of the neuron should be. So maybe that'll be -1.
Now if you spill something on your laptop, well, I don't know; if it's a small spill, maybe you can clean it up. It's still bad, but probably not as bad as a drop. So maybe that's a three: less bad than a drop, but worse than aging. So this is just a crude idea of what our neuron might look like.
You know, if you give me these variables, I'll figure out an output like this, and we could actually come up with a table of some examples of hypothetical laptops, you know, their age, things like that, and what this neuron would actually output if we had a laptop that was, you know, in that scenario.
So here, I've got a really young laptop with no problems; its output is really low. Here I've got a relatively young laptop, but it's been dropped; it's been repaired twice; and some guy really didn't care about their laptop—spilled something on it, dropped it. So the output is high for this one because a lot of bad things have gone on, and this last one—you can see the output is really low because even though it's old, it's gotten a lot of repairs.
So it's actually beating this slightly younger laptop because the older one has gotten more repairs. So this is the kind of thing our neuron might output. Now, one little thing to note: how do we decide what's a good output and what's a bad output? Like, if you go to the Apple Store and ask them, "What are the odds my laptop will break in the next year?" and they say the answer is 7, what do you get out of that?
What you really want to know is: is that above normal? Is that below normal? Is that really bad? Is that okay? So one thing we might want to do is center our neuron around zero. We might want it so that, say, a 50% likelihood of breaking lands at zero; if your laptop is in really good shape and isn't going to break, the output would be negative, and if it's in bad shape, it would be positive, something like that.
So all we have to do for that is just add some kind of constant to the output. Maybe we'll subtract two from the output of the neuron, so that the bad laptop is positive, the other ones are negative, and this one's neutral. That's just another idea, called biasing: we have this neuron, it computes an output, and then we add some constant to it to get it centered around zero in a nice way.
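To make this concrete, here's a minimal sketch of the laptop neuron in Python, with the weights we just picked (1, 4, -1, 3) and a bias of -2. The example laptops are hypothetical, not the exact rows from the table in the video:

```python
# A single linear neuron that scores how likely a laptop is to break.
WEIGHTS = [1.0, 4.0, -1.0, 3.0]  # age, drops, repairs, spills
BIAS = -2.0                      # centers the output around zero

def laptop_neuron(age, drops, repairs, spills):
    inputs = [age, drops, repairs, spills]
    # Linear combination: w1*n1 + w2*n2 + w3*n3 + w4*n4, plus the bias.
    return sum(w * n for w, n in zip(WEIGHTS, inputs)) + BIAS

# A brand-new laptop with no accidents: negative, so it should survive.
print(laptop_neuron(age=0, drops=0, repairs=0, spills=0))  # -2.0
# A two-year-old laptop that's been dropped once: positive, so at risk.
print(laptop_neuron(age=2, drops=1, repairs=0, spills=0))  # 4.0
# An old, dropped laptop with lots of repairs: the repairs pull it down.
print(laptop_neuron(age=5, drops=1, repairs=4, spills=0))  # 3.0
```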
So the kind of neuron I just showed you is called a linear classifier, and linear classifiers are great for a lot of things. You can use them for text categorization, maybe spam filtering, stuff like that. And they will actually work really well, but there are some serious limitations to linear classifiers.
So there are some very simple sets of data for which you can prove it's impossible to make a linear classifier. The classic example is XOR: two inputs, and the output should be 1 exactly when the inputs differ. I can give you a pretty simple table like that, and it's impossible for you to construct a neuron, using the model I showed you, that generates the outputs I specify in the table. And what's worse is that no matter how you combine linear classifiers in a network (if you put multiple of these linear neurons together, nest them, and feed the output of one into the input of the next), the result will always be equivalent to a single linear classifier, because a linear function of linear functions is still linear.
So, no matter how you arrange these complex neurons, it'll always only be as powerful as one neuron. And the problem with that, of course, is that any limitation for one neuron is a limitation for a network. So we need a way to improve the neurons in a slight, kind of subtle way that will allow us to do anything we want with our networks without limitations.
The way we solve the problems of linear classifiers when we're dealing with neural networks is actually loosely inspired by biology. A naive theory of the way neurons work in the brain is that they take weighted inputs, and up to a certain point, the neuron will not fire. Then, once it has enough weighted input, all of a sudden the neuron will fire.
So basically, a neuron has an output of either zero or one in this naive theory of the brain. And if you'll remember, our linear neurons can output any value, unfortunately, so they don't match up with this naive theory. Our laptop neuron might output a -1, a 7, a zero; it doesn't have just two possible outputs.
The idea to fix the problems of linear classifiers is to turn the linear neuron into something nonlinear—something with a kind of a spike, kind of a jump. That prevents it from just being a line that can go anywhere, and it actually makes it so combining neurons in a complex network is more powerful than just having one neuron.
So more concretely, here is what we're going to do. We're going to have our good old linear neuron right here, and it will just do the same thing it did before, but its output is now fed to this other thing called an activation function. It's the output from that activation function which we will finally use as the actual output of the overall neuron.
So basically, up until this point, everything is linear, and this will introduce a nonlinearity—something like that spike that I talked about in the biological neuron. So what does this thing, this activation function, actually do? Well, you can think of it as just a regular mathematical function that you probably learned if you took a high school algebra class or something like that.
Basically, it takes an input and it applies some mathematical function to it to get an output, and we can actually graph what this might look like. So here actually is a really good example of something that looks kind of like a spike. If the output of the linear part of the neuron is really negative, you know, it’s maybe -7 or something, the activation function will output -1 or something very close to -1.
And if the linear part of the neuron outputs something very large, even if it outputs like three, it's very close to one. In the middle here, it actually does look pretty linear; you know, if the linear part of the neuron outputs zero, then the activation function will output zero, and you know, it's almost a line. So as the linear part continues to increase, the activation will continue to increase, but then it slows down.
So this is a way to squash the output of the linear part of our neuron in such a way that the linear part still has some continuous way of sliding up and down, but the result is now constrained, in this case, between -1 and 1 (other common activation functions squash between 0 and 1). It's squashed in a way that's similar to biology.
So the idea is just that we introduce something at the end of a neuron which will squash the output of the neuron. We can draw that in a complete picture here. So now our new augmented neuron, our fancy neuron, has a linear part and an activation part. In the end, we first take the linear combination of the inputs, and then we apply some nonlinearity to it, which I called sigma.
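As a concrete sketch, here's that two-stage neuron in Python. The video just calls the activation sigma, so using tanh here is an assumption on my part, based on the -1-to-1 curve described above:

```python
import math

def fancy_neuron(inputs, weights, bias):
    # Stage 1: the same linear combination as before.
    linear = sum(w * n for w, n in zip(weights, inputs)) + bias
    # Stage 2: squash through an activation function. tanh sends very
    # negative values near -1 and very positive values near 1, and is
    # roughly linear around 0, matching the curve described above.
    return math.tanh(linear)

# Linear part is 3*0.5 + (-7)*0.1 + 2*1.0 = 2.8; tanh squashes it.
print(fancy_neuron([3, -7, 2], [0.5, 0.1, 1.0], 0.0))  # ~0.993
```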
So now I want to start talking about how we could arrange neurons in a network. The kind of network I'm going to start by describing is called a feed-forward fully connected network. So here I have a little small example of what one might look like. The basic idea is you arrange the neurons in a series of layers.
So each layer is a column of neurons, and basically, every layer is connected to the layer before it. So, if I'm a neuron in layer three, I get my inputs from layer two, from the outputs of the neurons in layer two. And I know I said a neuron can only have one output, and that is true, but that doesn't mean you can't send that output to multiple places.
So a neuron in layer one will send its output to every single neuron in the layer after it, so that's what you can see from the diagram. So basically, the way this network works is you have three inputs at the beginning— in this particular case in the diagram—and each of those inputs is fed to all of the neurons in the first layer.
Then all of those neurons' outputs are fed to all of the neurons in the second layer, and then all of those neurons' outputs are fed to the neuron in the third layer, which then makes the final decision of the network. So it has the same behavior as a single neuron. It takes a bunch of inputs and it generates one output, but it's a lot more powerful than one neuron because basically by having a bunch of layers, we can do more computation.
In fact, if we have enough neurons in one of the middle layers, you can actually prove mathematically that you can approximate any function you want using this kind of setup. So you might ask: why is it important to have multiple layers in this kind of setup, or why might we suspect that it would be nice to have a deeper network? That's what deep learning is: having a network with more layers.
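Before getting to that intuition, here's a minimal hand-wired feed-forward network in Python showing why layers plus a nonlinearity buy you power. The weights are hand-picked, and I'm using a hard 0/1 threshold activation (my own simplification, to keep the arithmetic obvious); with these values, the two-layer network computes XOR, exactly the table a single linear neuron provably can't produce:

```python
def step(x):
    # Hard threshold activation: the neuron "fires" (1) only above zero.
    return 1 if x > 0 else 0

def neuron(inputs, weights, bias):
    return step(sum(w * n for w, n in zip(weights, inputs)) + bias)

def xor_network(x1, x2):
    # Layer 1: two neurons, each seeing both inputs.
    h1 = neuron([x1, x2], [1, 1], -0.5)  # fires if at least one input is on
    h2 = neuron([x1, x2], [1, 1], -1.5)  # fires only if both inputs are on
    # Layer 2: one neuron combining layer 1's outputs:
    # "at least one input on, but not both".
    return neuron([h1, h2], [1, -1], -0.5)

for a in (0, 1):
    for b in (0, 1):
        print(a, b, "->", xor_network(a, b))
# 0 0 -> 0, 0 1 -> 1, 1 0 -> 1, 1 1 -> 0
```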
The reason it makes intuitive sense that you might want a deep network is the following: Let's say I want to identify what kind of handwritten digit I'm looking at in a picture. Well, the first thing I see is a bunch of pixels. But then from that set of pixels, I can infer something a little stronger. I can say there's a bunch of little line segments. I can find all the little maybe three-pixel-long line segments, and then from those line segments, I can say, well, where are all the pen strokes?
Like, maybe a longer line, something like that. And then, from there, maybe I go to something even more abstract, like where are all the circles or circly-like things? Where are all the loops? Where are all the crossing edges, things like that? And then finally, I use those more high-level principles to just finally decide what digit it is.
And I'm not making that decision based on pixels anymore. I'm making it based on these abstract ideas like loops or circles or crosses, things like that. And that's basically the idea behind latent variables. The opposite of latent would be observed or apparent: the apparent variables are the pixels in the image, but then there are various levels of latent variables that might exist in the data, like pen strokes or abstract shapes and figures.
So the idea behind making a neural network deep is the input layer represents the apparent variables—the pixels—and then you would want the network to learn the next layer, maybe to figure out some more abstract ideas about the image, like where are the line segments. And then in the next layer after that, it uses the more abstract things to figure out even more abstract things, like the squiggles or whatever.
Then finally, at the end, it figures out the most abstract principle, which is like the number eight, which is very hard to infer from just a bunch of pixels, but if you've gone through this step of more and more latent hidden variables, you might actually get the right answer most of the time. So the idea behind a deep network is basically to just figure out more and more abstract or meaningful pieces of knowledge as you get deeper and deeper into the network.
So now I want to talk about two modifications you could make to neural networks to make them even more powerful and interesting. The first idea is to give the network some memory; that is, some state, some information that it can use over time. Right now, with just our regular feed-forward network, the network gets some inputs and produces an output, then it gets more inputs and produces another output, and it has no way of remembering what the last thing it saw was.
It just keeps producing outputs for new inputs, and the obvious way to implement memory, to make the network remember the last thing that happened, is to have some of its outputs connected back to its inputs. So the network at the end of every run, at the end of every feed-forward iteration, decides some stuff that it will see next time.
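Here's a minimal sketch of that feedback wiring in Python; the tanh activation, the single-number memory, and the specific weights are all assumptions on my part, just to show the loop:

```python
import math

def rnn_step(inputs, state, in_weights, state_weights, bias):
    # One feed-forward pass whose output doubles as next step's memory:
    # the weighted inputs and the weighted previous state are combined.
    total = (sum(w * x for w, x in zip(in_weights, inputs))
             + sum(w * s for w, s in zip(state_weights, state))
             + bias)
    return math.tanh(total)

# Run the same neuron over a sequence, feeding its output back in.
state = [0.0]  # short-term memory starts out empty
for x in [1.0, 0.0, 0.5]:
    out = rnn_step([x], state, in_weights=[0.8], state_weights=[0.5], bias=0.0)
    state = [out]  # the output becomes part of the next step's input
    print(out)
```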
So it decides, you know, the contents of its short-term memory, if you will. So this is an idea called a recurrent neural network. And now I want to switch gears to talking about something else called a convolutional neural network.
So in our feed-forward neural network, everything in one layer is connected to everything in the layer before it, and this, you know, in and of itself seems to make some amount of sense, especially if we have no idea what the relationship between different parts of the data are. You know, if we have an arbitrary vector of data about a patient—maybe their medical info—and we're not doctors, we don't know what should be connected where, so let's just connect everything and let the neural network figure it out.
But for some things, we do kind of know what should and shouldn't be connected. So say you want to do something like visual object recognition, like recognize if a picture contains a monkey or doesn't contain a monkey. When you're looking at a picture, it tends to be that close pixels—pixels that are spatially close to each other—are more related than pixels that are far away from each other. It's just the way, I guess, vision works, the way sight works.
So, the idea with a convolutional neural network is basically to set it up so that in the middle layers, it's not the case that everything is connected to everything else. Rather, you apply little localized neural networks to different regions of the image. Then gradually, as you go deeper through more and more layers, it starts to relate parts of the image that are further and further apart, until you have a final classification.
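Here's a tiny from-scratch sketch of the core operation, a convolution: the same small filter slides over the image, so each output value depends only on one local patch of pixels. The particular filter and image are made up for illustration:

```python
def convolve2d(image, kernel):
    # Each output value is a weighted sum of one local patch, so each
    # "neuron" only sees nearby pixels instead of the whole image.
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for i in range(len(image) - kh + 1):
        row = []
        for j in range(len(image[0]) - kw + 1):
            row.append(sum(
                kernel[a][b] * image[i + a][j + b]
                for a in range(kh) for b in range(kw)
            ))
        out.append(row)
    return out

# A 3x3 vertical-edge filter applied to a tiny 4x4 "image" that is
# dark on the left and bright on the right.
image = [
    [0, 0, 9, 9],
    [0, 0, 9, 9],
    [0, 0, 9, 9],
    [0, 0, 9, 9],
]
kernel = [
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
]
print(convolve2d(image, kernel))  # [[27, 27], [27, 27]]: the edge is detected
```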
But the general idea is just that a convolutional network utilizes our pre-existing knowledge about which things are more likely to be related than others. Anyway, I hope you learned a lot about artificial neural networks. Thanks for stopping by, subscribe for more videos like this, and goodbye.