The Importance of Reverse Engineering in Network Analysis

Comprehensive research is required to create the best detection rule for a new vulnerability or threat. But what does ‘best’ mean? Well, the interpretation of ‘best’ depends on what we know about the vulnerability, but sometimes key information may not be available. Therefore, to develop accurate detection rules that can track malicious activity, you must search for this information in non-traditional areas, like the binary code of malicious tools.

In this blog, we detail the process of creating accurate network signatures by closely analyzing the source code of a backdoor exploit. Reverse engineering in network analysis is essential for building rules that can effectively detect malicious network packets, reduce false positives, and ultimately help defend against malicious threats to OT/IoT environments.

Threat Detection 101

Let’s imagine that the only information available for a certain vulnerability is a basic, non-technical description of a router that executes commands, along with the exploits created by the same researcher. Even with this limited information, it’s still possible to create a first rule to detect that exploitation. Figure 1 shows an example of intelligence and network traces harvested by Nozomi Networks Labs IoT honeypots. This example shows a network packet exploiting CVE-2022-27255, but the exploitation is not immediately clear. More context is needed to prevent false positives.

Figure 1. Network packet exploiting CVE-2022-27255.

To detect this exploitation, we need to examine the protocol in use to understand what data should and should not be present at specific offsets. SANS suggests a detection strategy based on specific strings and on packet sizes derived from the parameters of a legitimate packet.

While SANS provided a great threat detection strategy, our goal is to detect the different ways attackers are exploiting certain vulnerabilities. It’s a tough decision between creating a rule that is flexible enough to detect multiple variants of that exploit, risking the chance of false positives, or making a rule narrow and focused on detecting just that one variant.

Creating Detection Rules Based on Exploits

Packets, like the one in Figure 1 which exploits CVE-2022-27255, are not custom-made each time the vulnerability is exploited. So, when this type of exploit becomes publicly available, it allows us to gain a better understanding of how these network packets are created and aids us in creating better and more efficient detection rules.

In most cases, new exploits will be created based on the first version released by the researcher who discovered the vulnerability. We have observed IoT devices being targeted by malware that mimics the source code of public exploits. We noticed similar loops, strings, and execution flow logic across different attackers exploiting the same vulnerability; the only differences were slight variations in how they check for errors or initialize data structures.

Why is this? Well, duplicating existing exploits is a convenient way for malware authors to save time and energy spent on research that has already been done. This means that new variants are usually similar to the earlier versions, making it easier to create network signatures.

Even so, additional context is needed to avoid unwanted detections that can distract from detecting the actual threat. To achieve this, rules must rely only on the parts of the code that are strictly necessary for exploitation.

Take a look at an exploit for some TP-Link Archer devices in Figure 2.

In this example, a rule could be created to detect the string host=127.0.0.1;, but there’s a risk of this string also appearing in some legitimate traffic. To prevent this, more context can be added through additional conditions, like finding the string X_TP_ConnName=ewan_ipoe_s or IPPING_DIAG. The proposed detection will only fail in two cases: 1. if the IPPING_DIAG string is not necessary to successfully exploit the vulnerability, and 2. if the attackers change the IP address. On the other hand, if attackers start to change all the possible variables, for example by changing X_TP_ConnName=ewan_ipoe_s to X_TP_ConnName=none_none or even removing that parameter, they risk rendering the exploit useless. This is why malware authors would rather copy the source code of other exploits and keep their modifications to a minimum.
Figure 2. TP-Link Archer exploit.
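
The idea of combining a primary string with additional context can be sketched in a few lines of Python. This is only an illustration under our own assumptions: the function name matches_archer_rule and the way the conditions are combined are hypothetical, and a real rule would be written in the syntax of the detection engine in use.

```python
# Strings taken from the exploit discussed above. How they are combined
# here is an illustrative assumption, not a production signature.
PRIMARY_MARKER = b"host=127.0.0.1;"
CONTEXT_MARKERS = (b"X_TP_ConnName=ewan_ipoe_s", b"IPPING_DIAG")


def matches_archer_rule(payload: bytes) -> bool:
    """Return True when the primary marker and at least one context marker appear."""
    if PRIMARY_MARKER not in payload:
        return False
    return any(marker in payload for marker in CONTEXT_MARKERS)


if __name__ == "__main__":
    # Hypothetical payloads, invented for this example only.
    exploit_like = b"IPPING_DIAG ... host=127.0.0.1;id"
    benign_like = b"GET /status?host=127.0.0.1; HTTP/1.1"
    print(matches_archer_rule(exploit_like))
    print(matches_archer_rule(benign_like))
```

Requiring a context marker next to the primary string trades a little coverage for fewer false positives, which is the balance described above.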

Investigating a Backdoor Exploit

Our team creates network signatures by reverse engineering the binary of Remote Access Tools (RATs), botnets, backdoor exploits, and other malicious tools on a daily basis.

For example, let’s take a backdoor exploit that was detected a few years ago and is still being exploited today. Some Netis devices (sold as Netcore in China) had a backdoor in their systems that allowed any user to execute arbitrary commands and upload or download files from the device. To start sending commands, a specific string had to be sent to enable the backdoor. Once enabled, the backdoor would start accepting commands, executing them, and returning their results. While this isn’t an exploit per se, the research steps required are similar in nature.

When we started our research, we saw three different exploits to access this backdoor:

  1. GitHub: DDOS-RootSec
  2. GitHub: MSF-Testing-Scripts
  3. Ideaone

In all these exploits, we saw a common string to enable the backdoor, as seen in Figure 3.

Figure 3. Exploit code that enables the Netis backdoor.
But before we can create a good network signature, we must first identify patterns in the network traffic. The exploit code that enables the Netis backdoor, as seen in Figure 3, seems to be a good starting point for a pattern to detect this backdoor activity. After some research, we discovered more variants that implemented the Netis backdoor protocol in a different way. In this case, exploits published on GitHub and Exploit-DB sent the first UDP packet with a different ‘header’: \x00\x00\x00\x00\x00\x00\x00\x00netcore. This tells us that if we had created a rule detecting only the string AAAAAAAAnetcore, we would be missing a significant share of attack attempts.
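
To make the gap concrete, here is a minimal Python sketch comparing a rule built from a single literal string with one that covers both login variants mentioned above. The function names are hypothetical and chosen for this example; the byte strings are the two login payloads described in the text.

```python
# The two login payloads observed in public exploits, as described above.
LITERAL_LOGIN = b"AAAAAAAAnetcore"
KNOWN_LOGINS = (LITERAL_LOGIN, b"\x00" * 8 + b"netcore")


def literal_rule(payload: bytes) -> bool:
    """Match only the AAAAAAAAnetcore variant."""
    return payload.startswith(LITERAL_LOGIN)


def known_variants_rule(payload: bytes) -> bool:
    """Match either of the two known login variants."""
    return payload.startswith(KNOWN_LOGINS)


if __name__ == "__main__":
    zero_header_login = b"\x00" * 8 + b"netcore"
    print(literal_rule(zero_header_login))
    print(known_variants_rule(zero_header_login))
```

Listing known variants still only covers what has already been seen, so any header we have not observed would pass unnoticed.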

Threat Detection Complications

Now that we have reviewed the basic methods of threat detection, let’s go a step further. If we only focus on detecting the login command, our detection system will only be able to identify new connections. When the backdoor executable receives the first login packet, the backdoor is enabled and is then ready to process subsequent packets that ask it to execute commands. These command packets are different from the login command and can’t be detected using the same detection logic.

In a malicious connection, there will be a single initial login command but multiple packets requesting command execution. If our detection system focuses only on the login command, it won’t be able to detect malicious connections in which the backdoor was enabled earlier. What would happen if our detection engine started listening on a network where the backdoor is already enabled? This is an unusual situation, but we can’t leave that window open. In this scenario, the detection system would not detect someone interacting with the backdoor, so a different approach is required to detect connections to it.

A Different Approach to Find Patterns

Let’s set aside the fact that this protocol runs on the rarely used 53413 UDP port and focus on detection pattern research. There’s a fair amount of consistency on how to accomplish the login command, but the difficulty increases when trying to develop a single pattern to detect all the different commands.

Some of them send the pattern preceded by eight zeroes, as seen on Exploit-DB, while others mix the letter ‘A’ with zeroes (AA\x00\x00AAAA), as seen on GitHub. There is also another Exploit-DB example where the command has zeroes and ones in the header! These examples raise a few questions. Do all of them serve the same purpose? Are those headers strictly necessary, or could an attacker completely randomize them to avoid detection? Although several different exploit sources exist, we have seen only these three ways to send commands to the backdoor, and we can’t be sure that they are the only ones. In the worst case, we will only be able to detect these three attacks, but ideally, we would prefer to detect all possible interactions with the backdoor.

As usual, there is no official documentation available for this backdoor… but our research let us find the backdoored executable. Some Metasploit collaborators ran the backdoored system to check if the code they were developing was able to properly interact with it. Without official documentation or the source code of the backdoor, having the backdoor binary is very good news.

Reverse Engineering the Binary to Find Patterns

This is where reverse engineering allows us to fully understand how the backdoor works, its internal workflow, and the checks it performs on inbound connections before allowing them to execute commands on the compromised devices. We are not going to fully reverse engineer the sample, as this has already been done.

To understand how to create detection rules for this backdoor, we will broadly follow the logic of the code and focus on the parts of the binary that parse the received UDP packets. By doing this, we will gain knowledge about the code and be able to create rules that can more accurately detect interaction with the backdoor.

When we open the backdoor sample igdmptd with our tools, it starts its initialization by calling the bind function to listen on port 53413 using the UDP protocol. This allows us to begin filtering connections. After the initialization, the main loop will start reading packets from this port and sending them to the call_mptlogin function.
Figure 4. call_mptlogin function.
The call_mptlogin function, as seen in Figure 4, is responsible for verifying whether the content of these packets meets the necessary conditions for accepting commands. This function receives the content of the received packets after the first 8 bytes are discarded. The only condition for a successful return from this function (returning 0) is that the first bytes must be netcore. As we can see, there are no restrictions on the content of the first 8 bytes. This means that if we created a rule looking for the AAAAAAAAnetcore string, we would miss many successful backdoor login commands.
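
The check described above can be mirrored in a few lines of Python. This is a sketch of the logic as we understand it from the disassembly, not the backdoor’s actual code; the function name looks_like_backdoor_login and the constant names are our own.

```python
import os

HEADER_LEN = 8          # bytes discarded before the comparison
MAGIC = b"netcore"      # string the backdoor compares against


def looks_like_backdoor_login(udp_payload: bytes) -> bool:
    """Ignore the first 8 bytes and check that the rest starts with 'netcore'."""
    return udp_payload[HEADER_LEN:].startswith(MAGIC)


if __name__ == "__main__":
    # The header content is irrelevant, so even random bytes are accepted.
    for header in (b"A" * 8, b"\x00" * 8, os.urandom(8)):
        print(header, looks_like_backdoor_login(header + MAGIC))
```

A signature written this way anchors on the offset of netcore rather than on the header bytes, so it keeps matching regardless of which header an attacker chooses.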
After the first netcore string comparison succeeds, the backdoor redirects the execution flow to another loop, which from then on interprets the received UDP packets.
The content of the incoming packets is stored in the recv_buff variable, as seen in Figure 5; the third and fourth bytes are stored in the cmdopt variable. The cmdopt variable is checked to determine which action the packet is requesting. Depending on whether the value is \x00, \x01 or \x02, different functions of the code are reached to interact with the compromised device. The most common option encountered in the wild, \x00, allows the direct execution of bash commands starting from the 9th byte onwards.
Figure 5. operate_loop function analysis
After these 2 bytes are compared, no additional significant checks are performed on the rest of the header. This means that the other bytes in this header can be changed at random and the packet will still be perfectly valid. Despite this, we are still detecting exploitation attempts that use the original exploit’s AA\x00\x00AAAA header, eight years after this vulnerability was discovered.
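
As a rough illustration of how a command packet is interpreted, the following Python sketch extracts the option from the third and fourth bytes and the command from the 9th byte onwards. The names BackdoorCommand and parse_command_packet are hypothetical, and reading the two option bytes as a big-endian 16-bit integer is our assumption; the analysis above only establishes which bytes are checked.

```python
import struct
from typing import NamedTuple, Optional

HEADER_LEN = 8
KNOWN_CMDOPTS = {0, 1, 2}  # option values described in the analysis above


class BackdoorCommand(NamedTuple):
    cmdopt: int
    body: bytes


def parse_command_packet(udp_payload: bytes) -> Optional[BackdoorCommand]:
    """Return the option and body of a command packet, or None if it does not fit."""
    if len(udp_payload) < HEADER_LEN:
        return None
    # Third and fourth bytes (offsets 2 and 3). Big-endian is an assumption.
    (cmdopt,) = struct.unpack_from(">H", udp_payload, 2)
    if cmdopt not in KNOWN_CMDOPTS:
        return None
    # Everything from the 9th byte onwards is the command body.
    return BackdoorCommand(cmdopt, udp_payload[HEADER_LEN:])


if __name__ == "__main__":
    print(parse_command_packet(b"AA\x00\x00AAAA" + b"ls"))
    # Only the option bytes matter; the rest of the header can be anything.
    print(parse_command_packet(b"ZZ\x00\x00QQQQ" + b"ls"))
```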
On one hand, our honeypot research suggests that creating a network signature for the first public exploits, which use only 2 headers (AA\x00\x00AAAA and \x00\x00\x00\x00\x00\x00\x00\x00), would have simplified our network signature creation process and still caught most attacks, even after eight years. On the other hand, the reverse engineering analysis of the backdoor reveals that the smallest modification to the first byte of this header would have evaded a detection logic designed to detect most of these attacks.
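
The difference between the two approaches can be shown with a short Python sketch. Both rule functions are hypothetical names of our own, and the structural rule only covers the \x00 option, the one most commonly seen in the wild.

```python
HEADER_LEN = 8
KNOWN_HEADERS = (b"AA\x00\x00AAAA", b"\x00" * 8)


def literal_header_rule(payload: bytes) -> bool:
    """Match only the two headers used by the first public exploits."""
    return payload.startswith(KNOWN_HEADERS)


def structural_rule(payload: bytes) -> bool:
    """Match on the bytes the backdoor actually checks (option \\x00 only)."""
    return len(payload) >= HEADER_LEN and payload[2:4] == b"\x00\x00"


if __name__ == "__main__":
    original = b"AA\x00\x00AAAA" + b"ls"
    modified = b"BA\x00\x00AAAA" + b"ls"  # only the first byte differs
    for name, packet in (("original", original), ("modified", modified)):
        print(name, literal_header_rule(packet), structural_rule(packet))
```

On its own, a two-byte check like this would be far too broad, so in practice it would need to be scoped to traffic on UDP port 53413, which the backdoor listens on.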

Why Reverse Engineering in Network Analysis Is Important

Detecting vulnerability exploits requires a deep understanding of all protocols involved. Sometimes achieving the best possible detection coverage means accepting the risk of some false positives, which are preferable to false negatives: missed detections that can lead to unpredictable consequences.

Botnet developers tend to prioritize covering a wide range of vulnerabilities over taking time to understand each one in depth. This is because the extra effort put into perfecting an attack may leave them vulnerable to detection, and even slight modifications can cause the exploit to fail.

Therefore, reverse engineering the binary of malicious tools can be a good technique for researchers to use in creating accurate network signatures.