Usermode System Call hooking – Betabot Style
This is literally the most requested article ever: I’ve had loads of people messaging me about it since the Betabot malware made it famous. I had initially decided not to write about it, because it was fairly undocumented and an article might have led to more people using it. However, yesterday someone linked me to a few blogs posting their implementations of the hook code (without explanation), so I’ve finally decided to go over it, seeing as the code is already available.
Win32/64 System Calls
System call is a term used to describe functions that don’t do their work in user mode. Instead, they transfer execution to the kernel, where the actual work is done. A good example of these is the native API (e.g. NtCreateFile/ZwCreateFile). None of the ntdll functions beginning with Nt or Zw actually do their work in user mode. They simply call into the kernel and let the kernel-mode function with the same name do the work (ntdll!NtCreateFile calls ntoskrnl!NtCreateFile).
Before entering the kernel, all native functions execute some common code. On 32-bit Windows this is KiFastSystemCall. Under WOW64 (a 32-bit process on 64-bit Windows) it is X86SwitchTo64BitMode, which each stub reaches through the WOW32Reserved field of the thread environment block.
Figure: Native function call path in user mode under 32-bit Windows
Figure: Native function call path in user mode under WOW64
As both examples show, Nt* functions make a call through a 32-bit pointer to KiFastSystemCall (x86) or X86SwitchTo64BitMode (WOW64). In theory we could just replace the pointers at SharedUserData!SystemCallStub and WOW32Reserved with pointers to our own code. In practice, however, this doesn’t work.
SharedUserData is a shared page that the kernel maps into every process. It is read-only from user mode, so it can only be written from kernel mode. WOW32Reserved, on the other hand, is writable from user mode, but it lives inside the thread environment block (TEB). To hook it, we’d have to modify the TEB of every running thread, and of every thread created afterwards.
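To make the 32-bit call path concrete, here is a small C sketch that checks a stub for the layout seen in XP-era 32-bit ntdll: `mov eax, ordinal`, `mov edx, 7FFE0300h`, `call dword ptr [edx]`. It returns the address the stub calls through, which is 0x7FFE0000 (SharedUserData) plus the SystemCallStub offset 0x300. Treat the layout as an assumption, since later versions of Windows changed the stubs. The function name is hypothetical, and the test bytes are illustrative rather than dumped from a specific build.

```c
#include <stddef.h>
#include <stdint.h>

/* Read a little-endian 32-bit value from an (unaligned) byte buffer. */
static uint32_t read_u32le(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/*
 * Assumed XP-era 32-bit (non-WOW64) Nt* stub layout:
 *   B8 imm32   mov  eax, ordinal
 *   BA imm32   mov  edx, pointer_address   ; 7FFE0300h = SharedUserData!SystemCallStub
 *   FF 12      call dword ptr [edx]        ; i.e. call KiFastSystemCall
 * Returns 1 and stores the address of the pointer the stub calls through,
 * or returns 0 if the bytes don't match this layout.
 */
int stub_call_pointer(const uint8_t *stub, size_t len, uint32_t *pointer_address)
{
    if (len < 12 || stub[0] != 0xB8 || stub[5] != 0xBA ||
        stub[10] != 0xFF || stub[11] != 0x12)
        return 0;
    *pointer_address = read_u32le(stub + 6);
    return 1;
}
```

Because the pointer lives at a fixed address in a page that is read-only from user mode, the stub itself tells us where we can’t write.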
Because SharedUserData isn’t writable, the only other place we can target is KiFastSystemCall itself. At first glance it is 5 bytes long, which is just enough for a 32-bit relative jump (E9 rel32). Sadly that turned out not to be the case. The last byte, 0xC3 (retn), belongs to KiFastSystemCallRet, where execution resumes when the kernel returns from sysenter. That byte can’t be modified, which leaves only 4 writable bytes.
The sysenter instruction is supported by all modern CPUs and is much faster than the old interrupt-based method of entering the kernel. On ancient CPUs (before sysenter was introduced) an interrupt (int 0x2E) was used instead. For compatibility, that path was kept in all subsequent versions of Windows.
Figure: The now obsolete KiIntSystemCall
As you can see, KiIntSystemCall has a glorious 7 writable bytes, enough for a 32-bit jump and then some. It’s also within short-jump range (a signed 8-bit displacement) of KiFastSystemCall. As you’ve probably guessed by now, we can do a 2-byte short jump (EB rel8) from KiFastSystemCall to KiIntSystemCall, and then a 32-bit jump from within KiIntSystemCall to our hook procedure.
Now, what if something calls KiIntSystemCall directly? It’s unlikely, but we can handle that too. On Windows the direction flag (DF) must be clear on entry to and on return from every function, so any legitimate caller will reach KiIntSystemCall with DF clear. We can use the first byte of KiIntSystemCall for STD (set direction flag). We then use the first byte of KiFastSystemCall for CLD (clear direction flag), followed by a short jump to KiIntSystemCall+1, which skips the STD. That way our hook procedure can check the direction flag to tell which function each call came from. It must clear DF again before the call continues.
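Here is a sketch of the byte patches this scheme needs, under the byte counts given above: 4 writable bytes in KiFastSystemCall, 7 in KiIntSystemCall. The helper name and addresses are hypothetical. The KiFastSystemCall patch uses 3 bytes and the KiIntSystemCall patch uses 6, so both fit.

```c
#include <stdint.h>

/*
 * Build both patches for the direction-flag trick (hypothetical helper).
 *   KiIntSystemCall  (6 of its 7 bytes):  FD        std
 *                                          E9 rel32  jmp hook
 *   KiFastSystemCall (3 of its 4 bytes):  FC        cld
 *                                          EB rel8   jmp KiIntSystemCall+1 (skips the std)
 * Returns 0 without writing anything if KiIntSystemCall+1 is out of
 * short-jump range of KiFastSystemCall, 1 on success.
 */
int build_df_patches(uint32_t fast_addr, uint32_t int_addr, uint32_t hook_addr,
                     uint8_t fast_patch[3], uint8_t int_patch[6])
{
    /* The short jump sits at fast_addr + 1 and ends at fast_addr + 3. */
    int64_t rel8 = ((int64_t)int_addr + 1) - ((int64_t)fast_addr + 3);
    if (rel8 < -128 || rel8 > 127)
        return 0;

    /* The near jump sits at int_addr + 1 and ends at int_addr + 6. */
    uint32_t rel32 = hook_addr - (int_addr + 6);

    int_patch[0] = 0xFD;                       /* std */
    int_patch[1] = 0xE9;                       /* jmp rel32 */
    for (int i = 0; i < 4; i++)
        int_patch[2 + i] = (uint8_t)(rel32 >> (8 * i));

    fast_patch[0] = 0xFC;                      /* cld */
    fast_patch[1] = 0xEB;                      /* jmp rel8 */
    fast_patch[2] = (uint8_t)rel8;
    return 1;
}
```

With hypothetical addresses of 0x7C90E510 (KiFastSystemCall) and 0x7C90E520 (KiIntSystemCall), the KiFastSystemCall patch is FC EB 0E. DF is bit 10 (0x400) of EFLAGS, so inside the hook it can be read with pushfd and a test against 0x400, followed by a cld.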
Under WOW64 this is a lot simpler. One option is to keep track of every thread and hook WOW32Reserved in each thread’s environment block (I think this is what Betabot does). The other is to simply overwrite X86SwitchTo64BitMode. It is 7 bytes long and writable from user mode (after a VirtualProtect call, like any other code page). It is also pointed to by the WOW32Reserved field of every thread’s environment block.
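The sketch below assumes the Windows 7-era layout, where X86SwitchTo64BitMode is a single 7-byte far jump into the 64-bit code segment: EA, a 32-bit offset, then the 16-bit selector 0x33. It parses that far jump and builds a 7-byte replacement: a near jump to the hook, padded with two NOPs. Both function names are hypothetical, and the test bytes are illustrative.

```c
#include <stdint.h>

/* Parse an assumed "jmp far ptr16:32" (EA + offset32 + selector16, 7 bytes).
 * Returns 1 and fills offset/selector on a match, 0 otherwise. */
int parse_far_jmp(const uint8_t code[7], uint32_t *offset, uint16_t *selector)
{
    if (code[0] != 0xEA)
        return 0;
    *offset = (uint32_t)code[1] | ((uint32_t)code[2] << 8) |
              ((uint32_t)code[3] << 16) | ((uint32_t)code[4] << 24);
    *selector = (uint16_t)(code[5] | (code[6] << 8));
    return 1;
}

/* Build a 7-byte replacement for the stub at stub_addr: jmp rel32 to the
 * hook, then two NOPs so the patch covers exactly the original 7 bytes. */
void build_wow64_patch(uint8_t out[7], uint32_t stub_addr, uint32_t hook_addr)
{
    uint32_t rel = hook_addr - (stub_addr + 5);   /* relative to end of jmp */
    out[0] = 0xE9;
    for (int i = 0; i < 4; i++)
        out[1 + i] = (uint8_t)(rel >> (8 * i));
    out[5] = 0x90;   /* nop */
    out[6] = 0x90;   /* nop */
}
```

A far jump encodes an absolute target, so the saved 7 original bytes can be copied verbatim into a trampoline and will still reach the 64-bit code.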
Most people who write hooks are used to redirecting one function to another. However, both of these hooks are placed on common code, so every single native function will pass through the hook procedure. Obviously we’re going to need a way to tell NtCreateFile calls apart from NtCreateProcess calls and so on, or the process is just going to crash and burn.
If we disassemble the first 5 bytes of any native function (on the 32-bit versions discussed here), they will always be “mov eax, XX” (opcode 0xB8 followed by a 32-bit immediate). This value is the function’s ordinal within the System Service Dispatch Table (SSDT). Once the call enters the kernel, the system service dispatcher uses this number to pick which SSDT entry to call, so each native function has a unique number. When our hook is called, the SSDT ordinal will still be in the eax register. All we need to do is gather the ordinals of the functions we want to intercept (by disassembling their first 5 bytes) and compare eax against them. If it matches, we process the call; if not, we just pass it on to the original code.
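Reading the ordinal out of a stub can be sketched as follows; the function name is hypothetical. On Windows the stub pointer would typically come from `GetProcAddress(GetModuleHandleA("ntdll.dll"), "NtCreateFile")`. Here the stub is just a byte buffer, so the logic can be checked anywhere. If the first byte isn’t 0xB8, the function may already be hooked by something else, so the sketch reports failure rather than guessing.

```c
#include <stddef.h>
#include <stdint.h>

/* Return 1 and store the SSDT ordinal if the stub starts with
 * "mov eax, imm32" (opcode B8 followed by a little-endian imm32),
 * otherwise return 0. */
int get_ssdt_ordinal(const uint8_t *stub, size_t len, uint32_t *ordinal)
{
    if (len < 5 || stub[0] != 0xB8)
        return 0;   /* unexpected layout, or already patched */
    *ordinal = (uint32_t)stub[1] | ((uint32_t)stub[2] << 8) |
               ((uint32_t)stub[3] << 16) | ((uint32_t)stub[4] << 24);
    return 1;
}
```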
Comparing eax against the ordinal of every function we want to hook gets messy, especially when hooking multiple functions:
cmp eax, [ntcreatefile_ordinal]
je ntcreatefile_hook
cmp eax, [ntcreateprocess_ordinal]
je ntcreateprocess_hook
[…]
jmp original_code
This code gets longer and slower with every function we hook. Because every system call the process makes passes through it, the whole process could slow down. There’s a better way, though.
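Instead, we can build an array of DWORDs in memory, indexed by ordinal. Each entry holds the address of the hook procedure for that ordinal, or NULL if it isn’t hooked. Say we just want to hook NtCreateFile and NtCreateProcess, with an NtCreateFile ordinal of 0x02 and an NtCreateProcess ordinal of 0x04. The array would then look like this: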
my_array+0x00 = (DWORD)NULL
my_array+0x04 = (DWORD)NULL
my_array+0x08 = (DWORD)ntcreatefile_hook_address
my_array+0x0C = (DWORD)NULL
my_array+0x10 = (DWORD)ntcreateprocess_hook_address
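Then we could do something as simple as the code below. It indexes my_array directly, so no general-purpose registers are clobbered before the original code runs:
cmp eax, MAX_ORDINAL                 ;MAX_ORDINAL (hypothetical) = number of entries in my_array
jae original_code                    ;ordinal outside the table: not hooked
cmp dword ptr [my_array+4*eax], 0    ;is there a hook for this ordinal?
je original_code                     ;no: pass the call through untouched
jmp dword ptr [my_array+4*eax]       ;yes: jump to the hook procedure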
This is pretty much what the kernel’s system service dispatcher does when it calls an SSDT function by its ordinal.
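The same lookup can be modelled in C to check the logic. This is a sketch with a hypothetical function name, and the `uintptr_t` entries stand in for the DWORDs of my_array. An ordinal outside the table, or a NULL entry, falls through to the original code.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * C model of the ordinal-indexed dispatch. `table` plays the role of
 * my_array: entry i holds the hook address for SSDT ordinal i, or 0 if
 * that call isn't hooked. Returns the address execution should jump to.
 */
uintptr_t resolve_target(const uintptr_t *table, size_t count,
                         uint32_t ordinal, uintptr_t original)
{
    if (ordinal >= count || table[ordinal] == 0)
        return original;          /* out of range or not hooked: pass through */
    return table[ordinal];        /* hooked: go to the hook procedure */
}
```

The cost is one bounds check and one indexed load, no matter how many functions are hooked. The linear chain of compares grows with every hook.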
Calling Original Code
As with regular hooking, we just need to save the original code before we overwrite it. The only difference is that, as well as pushing the parameters and calling the original code, we need to move the function’s ordinal into the eax register first.
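One way to do this on the 32-bit (non-WOW64) path is a small thunk that mimics an Nt* stub: `mov eax, ordinal`, `call original_code`, `retn arg_bytes`. This is an assumption, not necessarily Betabot’s method. The hook pushes the parameters and calls the thunk like any stdcall function. The extra `call` recreates the stack layout the original code expects: a return into the stub, then a return into the caller, then the parameters. `original_code` is the hypothetical address of the saved copy of the overwritten bytes, and `arg_bytes` is 4 × the parameter count.

```c
#include <stdint.h>

/* Write `value` as 4 little-endian bytes. */
static void put_u32le(uint8_t *p, uint32_t value)
{
    for (int i = 0; i < 4; i++)
        p[i] = (uint8_t)(value >> (8 * i));
}

/*
 * Build a 13-byte thunk at address `thunk_addr` that mimics a 32-bit Nt* stub:
 *   B8 imm32   mov  eax, ordinal
 *   E8 rel32   call original_code   (saved copy of the overwritten bytes)
 *   C2 imm16   retn arg_bytes        (stdcall: pop the caller's parameters)
 */
void build_call_original_thunk(uint8_t out[13], uint32_t thunk_addr,
                               uint32_t ordinal, uint32_t original_code,
                               uint16_t arg_bytes)
{
    out[0] = 0xB8;
    put_u32le(out + 1, ordinal);
    out[5] = 0xE8;
    /* rel32 is measured from the end of the call, i.e. thunk_addr + 10 */
    put_u32le(out + 6, original_code - (thunk_addr + 10));
    out[10] = 0xC2;
    out[11] = (uint8_t)arg_bytes;
    out[12] = (uint8_t)(arg_bytes >> 8);
}
```

The untouched 0xC3 of KiFastSystemCallRet matters here. After sysenter, execution returns there, and its retn lands back in the thunk.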