Automatic Restoration of Corrupted UPX-packed Samples

Nozomi Networks Labs scans the web on a daily basis and monitors new techniques that Internet of Things (IoT) malware developers introduce to deceive automated code analysis systems. In most cases, these threats are relatively simple and can be easily bypassed when the sample is manually analyzed in the debugger. However, it can be a challenge to set up and maintain automatic systems that can handle these threats efficiently.

In our previous blog post on how IoT botnets evade detection, we discussed how malware authors commonly use the open-source Ultimate Packer for Executables (UPX) tool to protect malicious code. They keep looking for ways to make automatic unpacking more difficult by modifying the packed samples after packing. After determining how the malicious samples have been amended, the next step is to teach our systems how to deal with these tricks and develop solutions to counter them.

In this blog, we share a tool (available on GitHub) that can automatically fix various types of tampered UPX-packed files so that they become easily unpackable using standard UPX functionality. This first version of the tool focuses on handling Executable and Linkable Format (ELF) files compiled for popular architectures commonly used by IoT devices, namely x86, x86-64, PowerPC, ARM, and MIPS. We then evaluate its efficacy on a set of samples collected by our global chain of IoT honeypots.

Identifying UPX-packed Files

UPX is one of the most popular tools used to pack both legitimate products and malware, as it is open-source and supports multiple architectures and platforms. The main problem for malware creators is that the same tool can also be used to unpack their malicious executables.

Over time, attackers have realized that some modifications of certain values in the UPX-packed executables make it impossible for the UPX tool to decompress them. At the same time, these samples are still able to run flawlessly because the amended information is not used by the unpacking code. The two most common techniques that we’re currently observing are:

  1. corrupting the internal UPX structures l_info and p_info; or
  2. using not-yet-released versions of UPX.

Both of these techniques aim to make the corresponding samples unsupported by a standard UPX unpacking tool (upx -d). You can find additional information in our previous blog.

First, we need a way to reliably determine whether a given sample is actually packed with UPX. Sometimes this is obvious from the “UPX” substrings in the file: by default, files packed with UPX contain the magic value UPX! as well as a version string showing the version of the UPX tool used to compress the file (Figure 1). However, this data can be stripped or modified without affecting the unpacking functionality, so we can’t rely on that type of indicator.

Figure 1 – UPX compressed file with its version string.

Since string-based indicators are unreliable, we decided on a different approach: signature-based detection. For each architecture and compression algorithm, UPX adds the same code that is in charge of decompressing the original file into memory. Changing it would require the attackers to understand what this code does in order to create new variants, which makes such changes much more expensive to implement and therefore uncommon.
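The idea can be illustrated with a minimal sketch: look for known decompression-stub byte sequences rather than for strings. The table below is purely hypothetical; its byte patterns are placeholders and not real UPX stub code, and the function name is ours, not the tool's. Real signatures would have to be extracted from binaries packed with a reference UPX build.

```python
# Hypothetical placeholder patterns keyed by (architecture, algorithm).
# These are NOT real UPX stub bytes; they only illustrate the lookup.
SIGNATURES = {
    ("mips", "nrv2b"): bytes.fromhex("deadbeef00112233"),
    ("arm", "lzma"): bytes.fromhex("cafebabe44556677"),
}


def detect_upx_stub(data: bytes, signatures=SIGNATURES):
    """Return the key of the first signature found in data, or None."""
    for key, pattern in signatures.items():
        if pattern in data:
            return key
    return None
```

Because the match is made against the stub code itself, patching the UPX! magic or the version string does not affect the result.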

Locating the UPX Modifications

After our signature-based script verifies that the file was indeed compressed with UPX, it needs to parse the l_info and p_info structures to evaluate if they have been modified.

Finding these structures in non-modified UPX-packed files is simple. With the naked eye, you will easily spot the offset of the l_magic member of the l_info structure, which stores the UPX! magic value. However, if it has been tampered with (Figure 2), the task becomes more complicated, especially if you want a generic fixing tool that supports any possible modification rather than hardcoding the magic values used by existing malware families.

Figure 2 – MIPS ELF file with tampered UPX magic bytes.

To find the 12-byte l_info structure and the adjacent p_info structure in the ELF file, our script needs to find where the program header table ends, because the structures are positioned just after it. The best way to find this is by parsing the ELF header and doing some simple math with the e_phoff, e_phnum, and e_phentsize values. See the source code of the tool we have provided for more details. Once the structures have been located, the next step is to check whether their fields are corrupted and, if so, restore them to their original values.
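As a rough sketch of that arithmetic (not the tool's actual code), the end of the program header table is e_phoff + e_phnum * e_phentsize. The helper name below is our own; the field offsets follow the standard ELF32 and ELF64 header layouts.

```python
import struct


def program_headers_end(data: bytes) -> int:
    """Return the file offset just past the ELF program header table."""
    if data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    is_64 = data[4] == 2                    # EI_CLASS: 1 = 32-bit, 2 = 64-bit
    endian = "<" if data[5] == 1 else ">"   # EI_DATA: 1 = little, 2 = big
    if is_64:
        # ELF64: e_phoff at 0x20 (8 bytes), e_phentsize/e_phnum at 0x36/0x38
        (e_phoff,) = struct.unpack_from(endian + "Q", data, 0x20)
        e_phentsize, e_phnum = struct.unpack_from(endian + "HH", data, 0x36)
    else:
        # ELF32: e_phoff at 0x1C (4 bytes), e_phentsize/e_phnum at 0x2A/0x2C
        (e_phoff,) = struct.unpack_from(endian + "I", data, 0x1C)
        e_phentsize, e_phnum = struct.unpack_from(endian + "HH", data, 0x2A)
    return e_phoff + e_phnum * e_phentsize
```

In a UPX-packed ELF file, the returned offset is where l_info is expected to start, with p_info following 12 bytes later.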

The l_info Structure

One of the easiest ways to make the standard UPX decompression tool refuse to process a UPX-compressed sample is to patch its UPX! magic value. This is one of the first checks the tool makes to ensure it supports the provided file. To make this kind of modification, the attackers can just look for the UPX! magic in the compressed file and replace it with bytes of their choice. They don’t even need to understand the internal structures of the packed sample.

To counter this, the signature-based script looks at where the l_magic value is located and compares it against the standard UPX! magic value. If it is different, the recovery tool restores the original value.
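A minimal sketch of this check might look as follows. It assumes, based on our reading of the l_info definition in the UPX source, that l_magic is the second field and follows a 4-byte l_checksum; the function and constant names are illustrative and do not come from the tool.

```python
UPX_MAGIC = b"UPX!"
L_MAGIC_OFFSET = 4  # assumption: a 4-byte l_checksum precedes l_magic


def restore_l_magic(data: bytearray, l_info_offset: int) -> bool:
    """Rewrite l_magic to UPX! if it was altered; return True if patched."""
    pos = l_info_offset + L_MAGIC_OFFSET
    if data[pos:pos + 4] == UPX_MAGIC:
        return False  # magic is intact, nothing to do
    data[pos:pos + 4] = UPX_MAGIC
    return True
```

Here l_info_offset is the end of the program header table, so the fix does not depend on knowing which replacement bytes a given malware family used.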

The p_info Structure

Another group of values that is frequently patched can be found in the p_info structure. This structure contains two important values required to decompress the content: the p_filesize and p_blocksize fields, both of which must contain the original file size. If these values have been tampered with, the sample’s functionality won’t be affected, but the UPX tool will no longer be able to unpack the sample, as it needs these values to complete the process. Luckily, the original file size can also be found at the end of the compressed file, which holds the information associated with the PackHeader class. The last 36 bytes (32 bytes with the information and 4 for the UPX! magic value) contain the original file size at offset 24, as we can see in the UPX source code.

Once the original file size is identified, the p_filesize and p_blocksize values can also be restored. Now, nothing prevents the standard UPX tool from successfully handling the fixed sample, making the automatic and manual unpacking extremely easy.
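The recovery step described above can be sketched roughly as below. This is an illustration under stated assumptions, not the tool's implementation: it takes the original size from offset 24 of the last 36 bytes, as described in the text, and assumes that p_info consists of three 4-byte fields (p_progid, p_filesize, p_blocksize) stored in the byte order passed by the caller. The function name and the endian parameter are our own.

```python
import struct

TRAILER_SIZE = 36        # per the text: the last 36 bytes of the packed file
ORIG_SIZE_OFFSET = 24    # per the text: original file size within the trailer
P_FILESIZE_OFFSET = 4    # assumption: a 4-byte p_progid precedes p_filesize


def restore_p_info(data: bytearray, p_info_offset: int, endian: str = "<") -> int:
    """Copy the original file size from the trailer into p_info."""
    trailer_start = len(data) - TRAILER_SIZE
    (orig_size,) = struct.unpack_from(
        endian + "I", data, trailer_start + ORIG_SIZE_OFFSET
    )
    # p_filesize and p_blocksize are adjacent and must hold the same value.
    struct.pack_into(
        endian + "II", data, p_info_offset + P_FILESIZE_OFFSET, orig_size, orig_size
    )
    return orig_size
```

For big-endian targets such as many MIPS and PowerPC samples, the caller would pass endian=">"; whether the trailer always uses the same byte order as the ELF header is an assumption to verify against the UPX source.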

Efficacy Evaluation

To evaluate the efficacy of this solution, we decided to run a full test against a snapshot of the IoT botnet landscape over a predefined period (April 1 – June 1). We took all the ELF samples that our chain of IoT honeypots intercepted during these two months (Figure 3) and checked how many of them could be handled with the help of our tool.

Figure 3 – Building a List of IoT malware samples for testing.

Over the course of these two months, our IoT honeypots collected 2,622 malware samples, of which 2,089 were ELF files. Of the 2,089 ELF files, 696 (33%) were packed with different versions of UPX (Figure 4). Interestingly, we saw a very similar distribution of packed vs. not-packed samples at the beginning of the year during our previous UPX-related research (discussed in our previous blog), where UPX was the only packer used.

Figure 4 – Orange – 1,393 not-packed ELF files (66%); Blue – 696 UPX-packed ELF files (33%).

Here is a further breakdown of the 696 packed ELF files:

  • 618 were not modified and therefore can be easily unpacked using standard upx -d functionality.
  • 78 samples (~4% of all ELF files) were tampered with and impossible to unpack by standard means: 3 were packed with a more complex custom solution, and 75 were modified by amending the UPX structures.
  • Of the 75, 66 had their version information string removed. For those whose UPX version was not removed, only 2% of them were packed using UPX 4, while the rest were packed with the current UPX 3.

After applying the introduced tool, all 75 of the modified samples, with the amended UPX structures, were successfully unpacked using upx -d functionality!

Conclusion

Here are three main characteristics of the tool we have presented, which make it more advantageous than other existing tools:

  1. It is agnostic to the changes made by particular malware families and focuses on the fundamental idea behind them instead, thereby supporting many more existing and future variations.
  2. It supports more architectures.
  3. It is easily extendable.

These characteristics allow for it to be easily integrated with other automatic sample processing systems, improve the unpacking rate, and facilitate accurate detection and attribution.

At Nozomi Networks, we believe that information sharing across the cybersecurity community is a key to winning, and we are happy to contribute to our joint success by making this tool available to the public!