Interactive Guide to Buffer Overflow exploitation -- Vetle's HackShack

A Buffer Overflow is a class of bug typically found in programs written in memory-unsafe languages like C or C++. When such a bug can be triggered by user input, it often lets someone overwrite data in memory that they weren’t supposed to touch. Before we dive into how to exploit Buffer Overflow bugs, we will do a quick introduction to Assembly.

Assembly is a human-readable form of raw machine code. It is the lowest level programming language, and the processor executes exactly what you write in Assembly. This means that we can also translate machine code bits and bytes directly back to Assembly without losing any information. Languages like C, C++ and Rust, on the other hand, are translated (compiled) to machine code. We can translate that machine code back to Assembly, but not back to the exact C, C++ or Rust code it came from, only to approximations based on the Assembly. For all intents and purposes, Assembly is the exact code that the processor executes.

Let’s jump into a little interactive primer on Assembly. First of all, in Assembly we don’t really have variables in the sense that we have in JavaScript, C, Rust, Go, etc. Instead, we have a fixed number of registers that can each store one value at a time. On a 64-bit system, a register holds 64 bits (e.g. a number from 0 to 18 446 744 073 709 551 615); on a 32-bit system, a register holds 32 bits (0 to 4 294 967 295). Some of the registers in Intel architectures are named RAX, RDI and RBX; these are general purpose registers that you can pretty much use for whatever you want. Then there are others like RIP and RSP, which hold the memory address of the next instruction to execute and the address of the top of the stack (more on that later), respectively.
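If you want to check those ranges yourself, here is a minimal Python sketch. The helper name max_unsigned is mine and not part of the emulator; it just computes the largest unsigned number that fits in a given number of bits.

```python
def max_unsigned(bits: int) -> int:
    """Largest unsigned value that fits in a register with this many bits."""
    return (1 << bits) - 1


for bits in (32, 64):
    value = max_unsigned(bits)
    # Group digits with spaces, the same way the numbers are written in the text.
    grouped = f"{value:,}".replace(",", " ")
    print(f"{bits}-bit register: 0 to {grouped} ({value:#x})")
# 32-bit register: 0 to 4 294 967 295 (0xffffffff)
# 64-bit register: 0 to 18 446 744 073 709 551 615 (0xffffffffffffffff)
```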

Underneath you will see what happens when we execute mov {register}, {value} instructions. mov is a “command” that tells the processor to store (or “move”) values into a register or memory address. When you step through the program you can see that for each mov instruction the register updates with the value specified in the mov instruction. Most values in these examples are represented as HEX values.

Note that eax and rax name the same register at different widths: eax is the lower 32 bits of the 64-bit rax. You can read about that here if you're interested in learning more about it.
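To make the eax/rax relationship a bit more concrete, here is a rough Python sketch of how it works on real x86-64 hardware (the emulator may simplify this). The starting value and the write_eax helper are made up for illustration.

```python
MASK_32 = 0xFFFFFFFF

rax = 0x1122334455667788  # arbitrary 64-bit value, chosen for illustration

# Reading eax gives you the lower 32 bits of rax.
eax = rax & MASK_32
print(hex(eax))  # 0x55667788


def write_eax(value: int) -> int:
    """Model `mov eax, value`: on x86-64 this also clears the upper 32 bits of rax."""
    return value & MASK_32


rax = write_eax(0x13371337)
print(hex(rax))  # 0x13371337
```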

Most processor architectures have a concept of a “stack”. The stack is a quick place to store temporary values in memory (RAM). On a 64-bit system, values are stored on the stack 64 bits (8 bytes) at a time, and fetched back 8 bytes at a time. You store a value on the stack with the instruction push {register name}, and you fetch a value into a register with the instruction pop {register name}. There is a special register, RSP, that holds the memory address of (a pointer to) the last value on the stack. This means RSP should always point to the last value you pushed to the stack. On Intel architectures the stack grows toward lower memory addresses: if you push (store) a value, RSP decreases by 8, and if you pop (fetch) a value, RSP increases by 8, because we always push and pop values in 8-byte steps.

Here we set RAX to the HEX value 0x13371337, push that value to the stack, and then pop it into RDI.

On the right side, underneath the registers, you can see the stack. Each box is 1 byte, and each row is one 8-byte value. At the top of each box you can see the memory address of the byte, at the bottom is the byte’s value, and in the middle you can see the ASCII representation of the value.

Here’s how it looks if you push and pop multiple values.
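As a rough stand-in for the emulator, here is a toy Python model of push and pop. It assumes the usual x86-64 behaviour where the stack grows toward lower addresses. The Machine class and the starting address 0x2000 are made up for this sketch.

```python
class Machine:
    """Toy model: a few registers and a stack that grows toward lower addresses."""

    def __init__(self, stack_top: int = 0x2000):  # hypothetical start address
        self.regs = {"rax": 0, "rdi": 0, "rsp": stack_top}
        self.memory = {}  # address -> 64-bit value

    def push(self, reg: str) -> None:
        self.regs["rsp"] -= 8                       # make room for 8 bytes
        self.memory[self.regs["rsp"]] = self.regs[reg]

    def pop(self, reg: str) -> None:
        self.regs[reg] = self.memory[self.regs["rsp"]]
        self.regs["rsp"] += 8                       # give the 8 bytes back


m = Machine()
m.regs["rax"] = 0x13371337      # mov rax, 0x13371337
m.push("rax")                   # push rax
print(hex(m.regs["rsp"]))       # 0x1ff8
m.pop("rdi")                    # pop rdi
print(hex(m.regs["rdi"]), hex(m.regs["rsp"]))  # 0x13371337 0x2000

# Multiple values come back in reverse order (last in, first out).
m.regs["rax"] = 1
m.push("rax")
m.regs["rax"] = 2
m.push("rax")
m.pop("rdi")                    # rdi = 2
m.pop("rax")                    # rax = 1
print(m.regs["rdi"], m.regs["rax"])  # 2 1
```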

There’s another special register called RIP that always points to the memory address of the next instruction the processor is going to execute. This register cannot be changed with mov instructions, but we can use jmp instructions instead. This allows us to jump to different places in the program when we need to.

When we jump to add2 we add 2 to the current value of RAX; when we jump to add1 we add 1.

The emulator has conveniently labeled addresses for us, so instead of seeing jmp 0xFA5, it’s changed to the more readable jmp add1. The labels in the assembly view are for convenience; when the processor executes the code, it neither cares about nor needs the labels. In this example, the label add1 stands for address 0xFA5, main for address 0xFAB and add2 for address 0xFB2.
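Here is a small Python sketch of what that labeling amounts to: a lookup table from names to addresses. The addresses are the ones listed above, while the resolve helper is hypothetical and not how the emulator is actually implemented.

```python
# Label table, using the addresses from the example above.
labels = {"add1": 0xFA5, "main": 0xFAB, "add2": 0xFB2}


def resolve(instruction: str) -> str:
    """Replace a label operand of jmp with the numeric address the processor sees."""
    op, _, operand = instruction.partition(" ")
    if op == "jmp" and operand in labels:
        return f"{op} {labels[operand]:#x}"
    return instruction  # everything else is left untouched


print(resolve("jmp add1"))  # jmp 0xfa5
print(resolve("jmp add2"))  # jmp 0xfb2
```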

Another way to jump to another place in the code is to use a call {address} instruction. A call instruction pushes the address of the instruction that follows the call onto the stack, then jumps to the specified address. The counterpart to call is ret, which pops the last value off the stack and jumps (returns) to that address. This lets pieces of code act as functions that you can call, and that return to the caller when they’re done so normal execution can continue. We often call functions to print something on screen, transform text, etc., just like you use functions in other programming languages.

Calls can also be nested, and you should see the stack grow and shrink as calls and returns are being executed.
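To see why nested calls unwind in the right order, here is a toy Python model of call and ret. All addresses and the Cpu class are hypothetical. The only things taken from the text are the rules "call pushes the return address and jumps" and "ret pops an address and jumps to it".

```python
class Cpu:
    """Toy model of RIP, RSP and the stack, just enough for call/ret."""

    def __init__(self, stack_top: int = 0x2000):  # hypothetical start address
        self.rip = 0
        self.rsp = stack_top
        self.memory = {}  # address -> 64-bit value

    def call(self, target: int, next_instruction: int) -> None:
        # call = push the address of the next instruction, then jump
        self.rsp -= 8
        self.memory[self.rsp] = next_instruction
        self.rip = target

    def ret(self) -> None:
        # ret = pop the last value on the stack and jump to it
        self.rip = self.memory[self.rsp]
        self.rsp += 8


cpu = Cpu()
cpu.call(target=0x200, next_instruction=0x105)  # main calls outer
cpu.call(target=0x300, next_instruction=0x205)  # outer calls inner
deepest_rsp = cpu.rsp                           # two return addresses on the stack
cpu.ret()                                       # back in outer
after_first_ret = cpu.rip
cpu.ret()                                       # back in main
print(hex(deepest_rsp), hex(after_first_ret), hex(cpu.rip), hex(cpu.rsp))
# 0x1ff0 0x205 0x105 0x2000
```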

We can also call functions with arguments, but we cannot pass them inline in a call instruction; we need to use either the stack or the registers. For functions with few parameters we use registers: one each for the first, second, and third argument. If we want to print something, we can use the printf function with the memory address of the string we want to print in the RDI register. To load the memory address into RDI we need to use the lea instruction; for our purposes you can think of lea as a fancy mov. When we run the lea instruction in the emulator you can see that RDI is updated to 0x1000, which is the address of the “Hello, world!” string.

Which registers to use for function arguments, and how many before the stack is used instead, varies a lot depending on programming language, compiler, operating system, etc. Assembly does not specify which registers must be used for arguments.

Some functions also return a value, which is often placed in the RAX register. After we’ve called a function we can get the return value from the RAX register.

Underneath is a little program that parrots what you input. First it prints the string “Say something: ” to the screen, then it allocates 8 bytes on the stack for your input by subtracting 8 from the value of RSP. The read function takes two arguments: the first (in RDI) is the address where the user’s message should be stored, and the second (in RSI) is the maximum number of bytes of user input to read.

When you reach the call to read, all further execution is blocked until you press enter in the console.

When running the example, you should be able to see your input (up to 8 bytes) in the stack visualization, with each character in its own “box”. printf in this case takes the first format argument in RAX, so we copy the value of RSP over to RAX: the user input string is currently the last thing on the stack, and we want printf to print that string.

A bug arises when user input is allowed to exceed the number of bytes we reserved for it. In the case of the last example, if RSI had been a larger number than what we subtracted from RSP before calling read, the user could have written over some previous values on the stack.

In the get_name function of the example below we reserve 0x10 (16) bytes on the stack (with sub rsp, 0x10), but when calling read we tell it, through the RSI register, to accept up to 0x18 (24) bytes into the memory location in RDI (which points at the stack). The value that was put on the stack just before the 16 bytes reserved for user input is the return address from the call made to get_name in main, so any input beyond 16 bytes overwrites that return address.
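The following Python sketch models that stack layout so you can see the overwrite happen. It is a simplification under several assumptions of mine: the return address 0xF90 is made up, addresses are stored big-endian (real x86-64 is little-endian), offsets into a bytearray stand in for memory addresses, and get_name is assumed to release its buffer with add rsp, 0x10 before ret.

```python
BYTE_ORDER = "big"       # assumption for this sketch; real x86-64 is little-endian
RETURN_TO_MAIN = 0xF90   # hypothetical return address inside main
BUFFER_SIZE = 0x10       # sub rsp, 0x10


def run_get_name(user_input: bytes, read_limit: int) -> int:
    """Return the address that get_name's ret would jump to."""
    memory = bytearray(0x40)   # toy stack; offsets stand in for addresses
    rsp = len(memory)

    # call get_name: push the return address
    rsp -= 8
    memory[rsp:rsp + 8] = RETURN_TO_MAIN.to_bytes(8, BYTE_ORDER)

    # sub rsp, 0x10: reserve the input buffer just below the return address
    rsp -= BUFFER_SIZE

    # read(rdi=rsp, rsi=read_limit): copy at most read_limit bytes, going upward
    data = user_input[:read_limit]
    memory[rsp:rsp + len(data)] = data

    # add rsp, 0x10 and ret: pop whatever now sits in the return address slot
    rsp += BUFFER_SIZE
    return int.from_bytes(memory[rsp:rsp + 8], BYTE_ORDER)


payload = b"A" * 16 + (0xF75).to_bytes(8, BYTE_ORDER)
print(hex(run_get_name(payload, read_limit=0x10)))  # 0xf90 (limit matches the buffer)
print(hex(run_get_name(payload, read_limit=0x18)))  # 0xf75 (return address overwritten)
```

With a limit of 0x10 the extra 8 bytes are never copied, so ret still goes back to main. With 0x18 they land exactly on top of the saved return address.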

In this example there is a function called secret_function that is not called anywhere in normal execution. However, because of the buffer overflow it is possible to “return” to that function by first writing 16 bytes of whatever you want, followed by 8 bytes that represent the address you want to return to. For example, if I wanted to go directly to call exit at address 0xF75, I would input AAAAAAAAAAAAAAAA\x00\x00\x00\x00\x00\x00\x0f\x75 into the console when prompted for input. By prepending \x to a two-digit hexadecimal value, you input that byte directly, so \x41 is the byte for the ASCII character A, and \x00 is just a null byte. The secret_function starts at address 0xf52; you should be able to edit my example input to jump to that function.
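If you would rather generate the input than type it by hand, here is a minimal Python sketch. The console_input helper is hypothetical. It assumes 16 bytes of padding and writes the address bytes in the same order as my example input above.

```python
def console_input(target: int, padding_len: int = 16) -> str:
    """Build the text to type in the console: filler first, then 8 escaped address bytes."""
    address = target.to_bytes(8, "big")  # same byte order as the example input
    escaped = "".join(f"\\x{byte:02x}" for byte in address)
    return "A" * padding_len + escaped


print(console_input(0xF75))
# AAAAAAAAAAAAAAAA\x00\x00\x00\x00\x00\x00\x0f\x75

# \x41 and the letter A are the same byte.
print(b"\x41" == b"A")  # True
```

Calling console_input with a different address gives you the input for a different return target.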

The last example is more complex than the previous ones. This program generates a random password each time it starts and requires you to input the right one to “log in”. You’ll find that in the enter_password function, read is allowed to write more bytes than were allocated on the stack. Try to bypass the login function and get to say_success.

Some of the concepts in this blog post are grossly oversimplified, and we are not talking about any mitigations to these types of attacks at all. These types of attacks have worked well in the past, but due to mitigations such as ASLR, DEP, Stack Canaries, and more recently Pointer Authentication, modern binary exploitation requires a lot more effort from the attacker.

I have not touched on endianness because I only see it as adding unnecessary complexity that isn’t really relevant to the exercises. In fact, the emulator actually handles endianness wrong, according to the Intel specifications.
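If you move on to real x86-64 targets, the difference looks roughly like this in Python. This is only a sketch to show the two byte orders side by side, using the address 0xF75 from the exercise.

```python
import struct

address = 0xF75

# Real x86-64 stores the least significant byte first (little-endian).
little = struct.pack("<Q", address)
# The inputs in this post list the most significant byte first.
big = struct.pack(">Q", address)

print(little.hex(" "))  # 75 0f 00 00 00 00 00 00
print(big.hex(" "))     # 00 00 00 00 00 00 0f 75
```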

Normally, in a situation where you can control the instruction pointer and DEP is not present (i.e. you can execute code on the stack or in other writable memory), you would just write your own custom code there and jump to it. That gives you a lot of freedom in terms of how to attack, but with DEP enabled you instead have to create a ROP chain. We have another interactive tutorial on ROP that you might want to look at.

I’d love some feedback over on Twitter @bordplate. Or you can check out the code for the emulator on GitHub.