A Buffer Overflow is a bug class in a program typically written in a memory unsafe language like C or C++. Buffer Overflow bugs from user-input can often allow someone to overwrite some data in memory they weren’t supposed to. Before we dive into how to exploit Buffer Overflow bugs, we will do a quick introduction to Assembly.
Assembly is a language that describes raw machine code. Assembly is the lowest level programming language, and the processor executes exactly what you write in Assembly. This means that we can also directly translate machine code bits and bytes back to Assembly without losing any information. Languages like C, C++, Rust, etc. on the other hand translate what you write to machine code, which we can translate back to Assembly, but we can’t translate it back to exactly what the code was in C, C++ or Rust, only approximations based on the Assembly. For all intents and purposes, Assembly is the exact code that the processor executes.
Let’s jump into a little interactive primer on Assembly. First of all, in Assembly we don’t really have variables in the
sense that we have in JavaScript, C, Rust, Go, etc. Instead, we have a set amount of registers that can store one value at a time.
On a 64-bit system, these registers can store up to 64 bits of values (e.g. number from 0 to 18 446 744 073 709 551 615),
on a 32-bit system registers can store up to 32-bit of values (0 to 4 294 967 295). Some of the registers in Intel
architectures are named RAX
, RDI
, RBX
, these are general purpose registers that you can pretty much use for whatever
you want. Then there are others like RIP
and RSP
which control the address of the next instruction we should execute is
in memory and address to the stack (more on that later), respectively.
Underneath you will see what happens when we execute mov {register}, {value}
instructions. mov
is a “command” that
tells the processor to store (or “move”) values into a register or memory address. When you step through the program you
can see that for each mov
instruction the register updates with the value specified in the mov
instruction. Most values
in these examples are represented as HEX values.

eax
and rax
are two different names for the same register. You can
read about that here
if you're interested in learning more about it.
Most processor architectures have a concept of a “stack”. The stack is a quick place to store temporary values in memory
(RAM). 64-bit systems store 64 bits (8 bytes) at a time in a growing stack, and when you go to fetch the values you fetch
them 64 bits at a time, or 8 bytes. You store a value on the stack by using instruction push {register name}
and you
fetch a value into a register by using instruction pop {register name}
. There is a special register RSP
that has the
value (pointer) of the memory address of the last value on the stack. This means RSP
should always point to the last
value you pushed to the stack. If you pop (fetch) a value from the stack, RSP
decreases by 8, because we always push
(store) and pop (fetch) values in 8-byte increments or decrements.

On the right side underneath the registers you can see the stack, each box is 1 byte, each row is 1 8-byte value. On the top of each box you can see the memory address of the byte, on the bottom is the byte’s value and in the middle you can see the ASCII representation of the value.
Here’s how it looks if you push and pop multiple values.

There’s another special register called RIP
that always points to the memory address of the next instruction the processor
is going to execute. This register cannot be changed with mov
instructions, but we can use jmp
instructions instead.
This allows us to jump to different places in the program when we need to.

The emulator has conveniently labeled addresses for us, so instead of seeing jmp 0xFA5
, it’s changed to the more readable
jmp add1
. The labels in the assembly view are for convenience, but when the processor executes the code, it doesn’t care
nor need the labels. This means that the label add1
is for address 0xFA5
, main
is for address 0xFAB
and add2
is
for address 0xFB2
.
Another way to jump to another place in the code is to use a call {address}
instruction. A call
instruction basically
pushes the address of the next instruction after the call
instruction to stack then jumps to the specified address. The
counterpart to call
is ret
which pops the last value on the stack and jumps (returns) to that address. This lets
pieces of code act as functions that you can call and then return from when it’s done and continue normal execution.
Functions we often want to call is to print something on screen, transform text, etc., just like you use functions in
other programming languages.

Calls can also be nested, and you should see the stack grow and shrink as calls and returns are being executed.

We can also call functions with arguments, but we cannot do that inline in a call
instruction, we need to either use
the stack or the registers. For functions with few parameters we use registers for the
first argument, the second, and the third. If we want to print something, we can use the printf
function with the
memory address of the string we want to print in the RDI
register. To load the memory address into RDI
we need to use
the lea
instruction, for our intents and purposes you can think of lea
as a fancy mov
. When we run the lea
instruction
in the emulator you can see that RDI
is updated to 0x1000
which is the address of the “Hello, world!” string.

Which registers to use for function arguments, and how many before the stack is used instead, varies a lot depending on programming language, compiler, operating system, etc. Assembly does not specify which registers must be used for arguments.
Some functions also return some value, this is often in the RAX
register. After we’ve called a function we can get
the return value from the RAX
register.
Underneath is a little program that parrots what you input. First it prints the string “Say something: “ to the screen,
then it allocates 8 bytes on the stack for your input by subtracting 8 from the value of RSP
. The read
function takes
two arguments, the first (in RDI
) is the address of where to put the user’s message, the second (in RSI
) is the
maximum number of bytes from user input to read.

read
instruction, all further execution is blocked until you press enter in the console.
When running the example, you should be able to see your input (up to 8 bytes) in the stack visualization, each character
represented in their own “boxes”. printf
in this case takes the first format argument in RAX
, so we copy the value of
RSP
over to RAX
because the user input string is currently the last thing on the stack, and we want printf
to print
that string.
A bug arises when user input is allowed to exceed the number of bytes we reserved for it. In the case of the last example,
if RSI
had been a larger number than what we subtracted from RSP
before calling read
, the user could have written
over some previous values on the stack.
In the get_name
function of the example below we are allocating 0x10 (16) bytes from RSP
(with sub rsp, 0x10
),
but when calling read
we are instructing it in the RSI
register to accept up to 0x18 (24) bytes into the memory
location of RDI
(which points at the stack). The value put on the stack before the 16 bytes allocated for user input is
a return address from the call
made to get_name
in main
.

In this example there is a function called secret_function
that is not called anywhere in normal execution. However,
because of the buffer overflow it is possible to “return” to that function by first writing 16 bytes of whatever you want
followed by 8 bytes that represent the address you want to return to. For example, if I wanted to go directly to call exit
at address 0xF75, I would input AAAAAAAAAAAAAAAA\x00\x00\x00\x00\x00\x00\x0f\x75
into the console when prompted for input.
By prepending \x
to a hexadecimal, you input that byte directly, so \0x41
is the ASCII representation for A
, \x00
is just a null byte. The secret_function
starts at address 0xf52, you should be able to edit my example input to jump
to that function.
The last example is more complex than the previous examples. This program generates a random password each time it starts
and requires you to input the right one to “log in”. You’ll find that in the enter_password
function read
is being given
more space than was allocated on the stack. Try to bypass the login function and get to say_success
.
Some of the concepts in this blog post are grossly oversimplified, and we are not talking about any mitigations to these types of attacks at all. These types of attacks have worked well in the past, but due to mitigations such as ASLR, DEP, Stack Canaries, and more recently Pointer Authentication, modern binary exploitation requires a lot more effort from the attacker.
I have not touched endianness because I only see it as adding unnecessary complexity that isn’t really relevant to the exercises. In fact, the emulator actually handles endianness wrong, according to the Intel specifications.
Normally, in a situation where you can control the instruction pointer and DEP is not present (e.g. you can execute code on the stack or other writable memory) you would just write your own custom code there and jump to it. That gives you a lot of freedom in terms of how to attack, but with DEP enabled you instead have to create a ROP chain. We have another interactive tutorial on ROP that you might want to look at.
I’d love some feedback over on Twitter @bordplate. Or you can check out the code for the emulator on GitHub.