Entropy and its Significance in Malware Development

The concept of entropy originates from physics, more specifically from classical thermodynamics. The concept itself is highly complex. However, since we do not need a thorough understanding of the physical concept, you can think of entropy as the disorder, randomness, or unpredictability of a set of information.

In computer science, entropy describes how unpredictable a piece of data is. In our case, that data is a binary executable.

This is usually measured using Shannon entropy, a formula that quantifies the randomness and unpredictability of the data in a file. It is calculated from the probability distribution of the byte values (0-255). Measured per byte, it ranges from 0 (every byte is identical) to 8 bits (all 256 values are equally likely).
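For reference, the byte-level Shannon entropy can be written as follows. The symbols are notation chosen here for illustration, not taken from a specific tool: $H$ is the entropy and $p_i$ is the fraction of bytes in the file that have the value $i$.

```latex
H = -\sum_{i=0}^{255} p_i \log_2 p_i, \qquad 0 \le H \le 8 \text{ bits per byte}
```

Byte values that never occur ($p_i = 0$) contribute nothing to the sum.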

If a binary contains many strings, i.e. English words encoded in ASCII, its entropy will be low, because such text is more predictable: the alphabet contains only 26 base letters, and some of them appear more frequently than others. However, if parts of a binary are encrypted, for example because it is packed or its payload has been encrypted, that data is ‘unpredictable’ and increases the entropy.

This niche topic is rarely relevant to everyday software developers. In malware development, however, it matters.

In MalDev, the payload delivered to a target system often takes the form of shellcode. The issue with shellcode is that, if it is stored in plain form within a binary, analysis of that executable will lead to the shellcode being discovered and analysed. Once it is recognized, reusing the same payload becomes impractical. This is known as ‘static detection’, a method in which antivirus (AV) and EDR vendors statically analyze binaries, searching for known signatures or strings.

A common and easy way to circumvent these mechanisms is encryption. By encrypting the payload and decrypting it at runtime, malware can prevent its shellcode from being analyzed, because the encrypted payload is unknown to an AV or EDR system and not recognizable as instructions. However, this increases the binary's entropy significantly. High entropy is in turn sometimes treated as an Indicator of Compromise (IOC) and can result in the binary being detected, even though the shellcode is encrypted.

For our experiment, we will use Metasploit to generate a stageless Meterpreter payload and then encrypt it using the RC4 algorithm. Note that any encryption algorithm should suffice. We then include the encrypted shellcode in a shellcode loader.

unsigned char encrypted[] = {
  0xFD, 0x0A, 0x2A, 0x96, 0x52, 0x23, 0x6C, 0xB5, 0xBB, 0x66, 0x15, 0x57,...
};

Using the tool DetectItEasy, we can analyze the entropy of our executable and each of its sections.

Figure 1. Entropy of a shellcode loader containing an encrypted payload

As shown in the screenshot, the entropy of our binary is unsurprisingly high, because the whole 256 kilobytes of its payload are encrypted. The analysis tool even marks the binary as potentially packed, due to the high amount of randomness within the executable.

In the next test we will employ a different method. Assuming we implement an algorithm that encodes every byte as ASCII text, the resulting binary should be larger but have a significantly lower entropy. To test this, the payload was first encrypted with the same algorithm and then encoded into short, legitimate English words, resulting in:

const char* encrypted_encoded = "an as hay fat bun eye den amp...";
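A rough bound shows why the encoded form should score lower. Assuming the encoded string contains only the 26 lowercase letters and the space character (an assumption based on the sample above), at most 27 distinct byte values occur, so the entropy of that data cannot exceed:

```latex
H_{\max} = \log_2 27 \approx 4.75 \text{ bits per byte}
```

Encrypted data, by comparison, can reach up to $\log_2 256 = 8$ bits per byte. Real English-like text lands below the bound, because letters are not equally frequent. The price is size: every payload byte becomes a whole word plus a separator.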

Again, we include the encrypted+encoded shellcode within our loader and compile the program.

Figure 2. Entropy of a shellcode loader containing an encrypted and encoded payload

Analyzing the binary, as seen in the second screenshot, we notice that the binary's entropy has almost been cut in half, which supports our assumption. With this technique, the binary is much less likely to be flagged for high entropy caused by its shellcode. As with my last blog post, this is not new knowledge. However, coverage of this technique is limited, which is why I wrote this post. I will try to make my PoC public in the future.

Until then, Happy Hacking!