Code Mutation (Polymorphism)
Before we start it’s probably best to explain some things:Signature – A pattern of bytes used by an antivirus to identify malicious executables, this could be a string, parts of a function, or a hash.Crypting – This is the most common way of evading antivirus detections, it works by encrypting the malicious executable so the antivirus cannot match the malicious code to existing signatures.Payload – The malicious executable which is encrypted to evade detections, this is attached to the stub in some way (stored as a resource, added after then end of file, appended to a new or existing section).Stub – A simple program responsible for decrypting the payload and executing it in memory. Due to the payload being encrypted, antiviruses will attempt to generate signatures to match the stub’s code, but because the stub is small and simple it can be easily modified to evade existing signatures. ##
Polymorphism is a solution to a problem mainly found with worms/botnet: When an AV adds a new signature that detects the malicious executable, the infected file will be quarantined, leaving the malware running in memory until reboot. If a botmaster is running a botnet with thousands of bots, each time the stub is detected he’s likely to lose a few hundred bots, his only choice: To keep updating the bots with a new stub before the previous one is detected (which for large botnets can be every few hours), leaving the hacker with very little free time. A solution to this would be to write malware capable of programmatically generating a unique stub and replacing the old one on execution, resulting in each computer having a different stub; this is know as polymorphism.
there’s a few ways to programmatically create unique code that performs the same function as the previous.
Block Mutation A lot of assembly instructions can be freely movable, whilst some cannot. An instruction using a relative address (such as a jump or call), when moved will point to a different location, breaking the code; freely movable instructions such as those using absolute addresses or only registers can be moved anywhere.
Block based polymoprphism works by breaking the code down into small blocks, which are then numbered; the number specifies the order in which they execute and the block is either marked as movable or immovable based on its containing instruction. The mutation engine can then reorder, relocate, or separate the movable block; using jumps or similar instructions to link them together so that they execute in the correct order. Junk code (random instructions which are never actually executed) can also be added between blocks to add more entropy and change the executable size.
Register Swapping It’s possible to write the code in such a way that registers can easily be switched out, for instance all occurrences of edx within a function could be replaced with ecx, changing a lot of bytes within the application. The only problem with this approach is there’s only a few usable registers, making it easy to exhaust all possible combinations, and it’s still possible to generate signatures based on the layout of the instructions.
Internal Assembler + Intermediate Language A very effective approach is to embed an assembler within the payload, as well as create an intermediate language (IL) which the polymorphic engine uses to create ASM on the fly. A simple example would be the following IL code.
pmov Reg1, 5 add eax, Reg1
In this example instructions prefixed with p will be mutated at an instruction level, whilst those without a prefix will just be assigned a register and compiled as ASM. The IL engine would then use a seed to randomly generate the p-prefixed instructions by picking an instruction, or group of instructions, to perform the operation, as well as assign a register to Reg1 and Reg2.
The array of instructions to handle pmov would look something like this:
- push val
- mov reg, val
- xor reg, reg
- add reg, val
Once the engine has picked which instruction it wishes to use, it would then fill in the register and value, then compile it to ASM. Here are some examples of final outputs.
- push 5
add eax, edx
- mov ecx, 5
add eax, ecx
- xor ebx, ebx
add ebx, 5
add eax, ebx
By using an IL, we avoid having to first disassemble the stub code before mutating it.
Today advanced metamorphic malware which can efficiently evade signature detection is nearly impossible, but back in the days of DOS / 95 / 98 viruses, it has been achieved multiple times. The idea of metamorphism is to take polymorphism a step further and instead of encrypting the malicious executable and mutating the stub, the entire malicious executable is mutated, including the code required to perform the mutation.
Malware that is required to create a new, unique copy of itself on every propagation is also required to disassemble previously mutated code and regulate size (because instructions can be mutated into multiple instructions, it’s important to be able to do the opposite or the executable grows almost exponentially with every mutation). Due to the amount of consideration and effort that would have to go into creating modern metamorphic malware, most programmers opt to use polymorphism instead, as this allows them to generate output from a temporary representation. A simple mistake during disassembling could result in the executable ceasing to work, and it’s a lot harder to debug and test metamorphism in large applications.