GNU Assembly Tutorial 2


·Nov 3, 2024

Hey guys, this is Matt Kidzone1 with our second assembly tutorial. In the last tutorial, I taught you about your processor, your memory, and some of the assembly language. Today, I'm going to describe in more detail the code we wrote last tutorial, and I'm going to introduce some new code as well. So let's get started by looking at the code we wrote last time to refresh our memory.

So this is a program that just returns the number zero. It has about seven lines of code, or eight if you want to count this blank line. This is the line of code we focused on last tutorial. Today, I'm going to be focusing on these two lines of code and what they do.

Now, before you can understand what these two lines of code do, you have to understand something about every process that runs. So let's think back: every assembly program can access any part of memory. The problem with that is that two assembly programs might accidentally bump into each other in memory and start overwriting each other's data. Programmers don't see this as good, so they came up with a solution to this problem. It's pretty simple to grasp, and it's something called the stack.

A stack is a place in memory, tracked by a pointer, that every application has and that's unique to it. This place in memory is only used by that application. It has a limited size, but it's for that application alone. We normally use the stack to store our application's data. So that's what stacks are for.

Before we go on any farther: the reason we have to put this at the beginning of every function is one you'll understand in a second, and I'll explain it to you. It's the same reason we have to put this at the end of every function before the ret statement. So right here is a move instruction; right here's a push instruction. You guys already know what movq does: it moves 64 bits from the first value into the second value.

So that means that rbp will equal rsp. So what exactly is the importance of this? What are rsp and rbp? Rbp is the base pointer. It points to the first byte of the stack, which is what I'll call the top of the stack. Rsp is the stack pointer. It points to the last byte of the stack, the most recently added one, which is what I'll call the bottom. That way everything from rbp to rsp, inclusive, is your application's data.

You can also subtract from rsp to give yourself more room. The stack starts somewhere, and you subtract from that somewhere to get more space on the stack. So here's how the stack starts: ebp and esp (the 32-bit names for rbp and rsp) are the same; they both point to the first byte of where you're allowed to write on your stack.

Let's say we added two values here. Ebp would still point to the first value, and then there would be two more values that, let's say, are both eight bytes, since the diagram I made assumes they're eight bytes. So esp will point to the last byte, and ebp will point to the first byte. Every time you add an object, you subtract its size from esp and then put it at esp.
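If it helps to see that rule as code, here's a minimal sketch in C that models it. This is not the code from the video: `ToyStack`, `STACK_SIZE`, and the function names are made up for illustration, and the "registers" are just offsets into a byte array rather than real addresses.

```c
#include <stdint.h>
#include <string.h>

#define STACK_SIZE 64  /* hypothetical size, only for this model */

/* A toy stack: a byte array plus two offsets standing in for rbp and rsp. */
typedef struct {
    uint8_t memory[STACK_SIZE];
    size_t rbp;  /* where our data starts */
    size_t rsp;  /* the most recently added item */
} ToyStack;

/* An empty stack: rbp and rsp point to the same place. */
void toy_stack_init(ToyStack *s) {
    memset(s->memory, 0, sizeof s->memory);
    s->rbp = STACK_SIZE;
    s->rsp = STACK_SIZE;
}

/* Add an 8-byte value: subtract from rsp first, then store at rsp.
   Returns 1 on success, 0 if the stack has no room left. */
int toy_stack_add(ToyStack *s, uint64_t value) {
    if (s->rsp < sizeof value) {
        return 0;  /* the stack has a limited size */
    }
    s->rsp -= sizeof value;
    memcpy(&s->memory[s->rsp], &value, sizeof value);
    return 1;
}

/* Read the 8-byte value that rsp currently points to. */
uint64_t toy_stack_peek(const ToyStack *s) {
    uint64_t value;
    memcpy(&value, &s->memory[s->rsp], sizeof value);
    return value;
}
```

After adding two values, rbp hasn't moved and rsp sits 16 bytes below it, which matches the two eight-byte values described above.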

So that leads us to this pushq instruction. Pushq adds eight bytes to the stack: it subtracts eight from rsp and stores rbp there. Okay, so why do we put this at the beginning of every function? The reason is the assumption that every function is being called by another function. This isn't always true, but it's a good assumption to work under.

So say you're in a function that has a stack that looks like this, and then you want to call a function. Do you really want the function to see ebp up here and everything you put in your stack? Well, the answer to that is no. So what this does right here is set the top of the stack to the bottom of the stack, that is, it sets ebp to esp. If this is your function's stack, here's esp. When you call another function, that function turns ebp into esp, so its own stack starts out empty. This essentially gives the function fresh room on the stack and keeps it from treating what the calling function put there as its own.

But why do we do this? Why do we push the top of the stack onto the stack? The answer is that when we change rbp, we're changing the stack around, and we're getting rid of what the calling function had set up. The thing is that they might want that back; they might want their stack to be back to normal.

So first we push the top of the stack (rbp) onto the stack, then we set the top of the stack (rbp) to the bottom of the stack (rsp). The instruction at the end of the function then gets the saved value back from the stack and puts it back into rbp, so when we return to the previous function, its stack will be the same as it was when we left. So that's why you do this at the beginning, and that's why you do this at the end.
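To make the save-and-restore idea concrete, here's a rough C model of it, not the code from the video. `ToyCpu` and every function name are hypothetical. The assembly in the comments is AT&T-style and is my assumption about what the on-screen lines look like. In particular, the restore step is modeled as "reset rsp, then pop rbp", which is one common way to do it and may not be the exact instruction used in the video.

```c
#include <stdint.h>
#include <string.h>

#define FRAME_DEMO_STACK_SIZE 128  /* hypothetical size for the model */

/* Toy machine: a small stack plus offsets standing in for rbp and rsp. */
typedef struct {
    uint8_t stack[FRAME_DEMO_STACK_SIZE];
    uint64_t rbp;
    uint64_t rsp;
} ToyCpu;

/* pushq: subtract 8 from rsp, then store the value at rsp. */
void toy_pushq(ToyCpu *cpu, uint64_t value) {
    cpu->rsp -= 8;
    memcpy(&cpu->stack[cpu->rsp], &value, 8);
}

/* popq: load the value at rsp, then add 8 to rsp. */
uint64_t toy_popq(ToyCpu *cpu) {
    uint64_t value;
    memcpy(&value, &cpu->stack[cpu->rsp], 8);
    cpu->rsp += 8;
    return value;
}

/* Start of a function: save the caller's rbp, then start a fresh frame. */
void toy_enter_function(ToyCpu *cpu) {
    toy_pushq(cpu, cpu->rbp);  /* assumed: pushq %rbp      */
    cpu->rbp = cpu->rsp;       /* assumed: movq %rsp, %rbp */
}

/* End of a function: drop our own data, then give the caller its rbp back. */
void toy_leave_function(ToyCpu *cpu) {
    cpu->rsp = cpu->rbp;       /* discard anything this function reserved */
    cpu->rbp = toy_popq(cpu);  /* restore the caller's saved rbp          */
}
```

If the caller's rbp and rsp are 120 and 104 before `toy_enter_function`, they are 120 and 104 again after `toy_leave_function`, no matter how much the called function subtracted from rsp in between.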

That's just a brief explanation, and you can ask me for more details if you don't quite understand. So I'm going to move on. The next thing I'm going to teach you is something called calling. Right now we have a main function. There are lots of other useful functions that come with the C library on GNU/Linux and other Unix systems. A function, if you've done this in math, has things you pass into it.

So say you have a function that takes two numbers and adds them. A function has a return value, which is put into eax, and as many parameters as it wants to have. Today we're going to be learning about a function called puts. To call this function, we do call _puts, because the name of the function is _puts. In C, it was puts, but when you compile C on some platforms (macOS, for example), the compiler adds an underscore in front of the name so you know that you're calling a C function. On Linux the name stays puts.
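For comparison, here's what "a function that takes two numbers and adds them" looks like on the C side. This is only an illustration with made-up names. On the usual x86-64 calling conventions for Linux and macOS, an `int` return value like this one comes back in eax.

```c
/* Takes two parameters and returns their sum. */
int add_two_numbers(int first, int second) {
    return first + second;
}

/* Calling a function: pass the parameters in, use the return value. */
int add_three_numbers(int first, int second, int third) {
    int partial = add_two_numbers(first, second);
    return add_two_numbers(partial, third);
}
```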

And this code right here will call the function. The problem is, how does puts know what to print out? That's where parameters come in. So up here, before the global name, I'm going to do a few things, and I'll explain what I'm doing here in the next tutorial. Let's get rid of that.

All right, so puts requires one parameter, and that parameter is expected to be put into the rdi register. To do that, we do the following: leaq of lc0 plus rip, with rdi as the destination, and then we move zero into eax, just because that's the convention you'd normally follow. Voila! You'd think this would work, because (and I'll explain this later) this loads the address of "Hello, World!" into rdi, basically, and then it calls puts.

But the problem is that we still haven't done one thing that's mandatory for calling functions, and that is making sure your stack has an extra 16 bytes on it (the calling convention wants rsp to be a multiple of 16 at the call, so we reserve space in 16-byte steps). To do that, we subtract 16 from the stack pointer just like so: subq stands for subtract quad, so we're subtracting 16 from the quad rsp. So there we go. Now if we compile this and run it, it says "Hello, World!"
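As a point of reference, here's roughly the same call written in C. This is a sketch, not the program from the video: `say_hello` is a made-up name, and the assembly in the comment is my AT&T-style guess at the steps described above, with the string label left out because it isn't shown in this write-up.

```c
#include <stdio.h>

/* Prints the greeting with puts and returns puts' result:
   non-negative on success, EOF on failure. */
int say_hello(void) {
    /* Assumed assembly equivalent of the call below:
         subq $16, %rsp               reserve 16 bytes on the stack
         leaq <label>(%rip), %rdi     address of the string goes in rdi
         call puts                    (_puts on macOS)               */
    return puts("Hello, World!");
}
```

Calling `say_hello()` from main prints the greeting followed by a newline, since puts adds the newline itself.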

This is our Hello World application. In our next tutorial, I'll explain this to you in more depth, as well as this and this. So that's it for this tutorial. If you want to know how everything works right now, you can Google it or send me a personal message, and I'll try to help you with that.

Anyway, thanks for watching, Mackinson1. Subscribe, and goodbye!
