September 2024 – Pavel Yosifovich

C++ Programming Masterclass Live Training

Today I’m happy to announce a new 5-day live training to take place in December, “C++ Programming Masterclass”. This is remote training presented live, where the sessions are recorded for later viewing if you miss any.

The course is scheduled for the beginning of December with 10 half-days spanning two and a half weeks.

The full syllabus, who should attend, with the exact dates and times can be found here: C++ Programming Masterclass.

Here is a quick summary of the course modules:

Module 1: Introduction to Modern C++
Module 2: C++ Fundamentals
Module 3: Standard Library Basics
Module 4: Object Oriented Design and Programming
Module 5: Modern Language Features and Libraries (Part 1)
Module 6: Templates
Module 7: Modern Language Features and Libraries (Part 2)
Module 8: The C++ Standard Library
Module 9: Concurrency in Modern C++

The course is currently heavily discounted, with the a price increase after November 1st. Register here: C++ Programming Masterclass

If you have any questions, feel free to get in touch via email (zodiacon@live.com), Linkedin, or X (@zodiacon)

Improving Kernel Object Type Implementation (Part 4)

In part 3 we implemented the bulk of what makes a DataStack – push, pop and clear operations. We noted a few remaining deficiencies that need to be taken care of. Let’s begin.

Object Destruction

A DataStack object is deallocated when the last reference to it removed (typically all handles are closed). Any other cleanup must be done explicitly. The DeleteProcedure member of the OBJECT_TYPE_INITIALIZER is an optional callback we can set to be called just before the structure is freed:

init.DeleteProcedure = OnDataStackDelete;

The callback is simple – it’s called with the object about to be destroyed. We can use the cleanup support from part 3 to free the dynamic state of the stacked items:

void OnDataStackDelete(_In_ PVOID Object) {
	auto ds = (DataStack*)Object;
	DsClearDataStack(ds);
}

Querying Information

The native API provides many functions starting with NtQueryInformation* with an object type like process, thread, file, etc. We’ll add a similar function for querying information about DataStack objects. A few declarations are in order, mimicking similar declarations used by other query APIs:

typedef struct _DATA_STACK_CONFIGURATION {
	ULONG MaxItemSize;
	ULONG MaxItemCount;
	ULONG_PTR MaxSize;
} DATA_STACK_CONFIGURATION;

typedef enum _DataStackInformationClass {
	DataStackItemCount,
	DataStackTotalSize,
	DataStackConfiguration,
} DataStackInformationClass;

The query API itself mimics all the other Query APIs in the native API:

NTSTATUS NTAPI NtQueryInformationDataStack(
	_In_ HANDLE DataStackHandle,
	_In_ DataStackInformationClass InformationClass,
	_Out_ PVOID Buffer,
	_In_ ULONG BufferSize,
	_Out_opt_ PULONG ReturnLength);

The implementation (in kernel mode) is not complicated, just verbose. As with other APIs, we’ll start by getting the object itself from the handle, asking for DATA_STACK_QUERY access mask:

NTSTATUS NTAPI NtQueryInformationDataStack(_In_ HANDLE DataStackHandle, 
    _In_ DataStackInformationClass InformationClass, 
    _Out_ PVOID Buffer, _In_ ULONG BufferSize, 
    _Out_opt_ PULONG ReturnLength) {
	DataStack* ds;
	auto status = ObReferenceObjectByHandleWithTag(DataStackHandle,
        DATA_STACK_QUERY, g_DataStackType,
		ExGetPreviousMode(), DataStackTag, (PVOID*)&ds, nullptr);
	if (!NT_SUCCESS(status))
		return status;

Next, we check parameters:

// if no buffer provided then ReturnLength must be 
// non-NULL and buffer size must be zero
//
if (!ARGUMENT_PRESENT(Buffer) && (!ARGUMENT_PRESENT(ReturnLength) || BufferSize != 0))
	return STATUS_INVALID_PARAMETER;
//
// if buffer provided, then size must be non-zero
//
if (ARGUMENT_PRESENT(Buffer) && BufferSize == 0)
	return STATUS_INVALID_PARAMETER;

The rest is pretty standard. Let’s look at one information class:

ULONG len = 0;
switch (InformationClass) {
	case DataStackItemCount: 
        len = sizeof(ULONG); break;
	case DataStackTotalSize: 
        len = sizeof(ULONG_PTR); break;
	case DataStackConfiguration: 
        len = sizeof(DATA_STACK_CONFIGURATION); break;
	default: 
        return STATUS_INVALID_INFO_CLASS;
}

if (BufferSize < len) {
	status = STATUS_BUFFER_TOO_SMALL;
}
else {
	if (ExGetPreviousMode() != KernelMode) {
		__try {
			if (ARGUMENT_PRESENT(Buffer))
				ProbeForWrite(Buffer, BufferSize, 1);
			if (ARGUMENT_PRESENT(ReturnLength))
				ProbeForWrite(ReturnLength, sizeof(ULONG), 1);
		}
		__except (EXCEPTION_EXECUTE_HANDLER) {
			return GetExceptionCode();
		}
	}

	switch (InformationClass) {
		case DataStackItemCount:
		{
			ExAcquireFastMutex(&ds->Lock);
			auto count = ds->Count;
			ExReleaseFastMutex(&ds->Lock);

			if (ExGetPreviousMode() != KernelMode) {
				__try {
					*(ULONG*)Buffer = count;
				}
				__except (EXCEPTION_EXECUTE_HANDLER) {
					return GetExceptionCode();
				}
			}
			else {
				*(ULONG*)Buffer = count;
			}
			break;
		}
//...
//
// set returned bytes if requested
//
if (ARGUMENT_PRESENT(ReturnLength)) {
	if (ExGetPreviousMode() != KernelMode) {
		__try {
			*ReturnLength = len;
		}
		__except (EXCEPTION_EXECUTE_HANDLER) {
			return GetExceptionCode();
		}
	}
	else {
		*ReturnLength = len;
	}
}

ObDereferenceObjectWithTag(ds, DataStackTag);
return status;

You can find the other information classes implemented in the source code in a similar fashion.

To round it up, we’ll add Win32-like APIs that call the native APIs. The Native APIs call the driver in a similar way as the other native API user-mode implementations.

BOOL WINAPI GetDataStackSize(HANDLE hDataStack, ULONG_PTR* pSize) {
	auto status = NtQueryInformationDataStack(hDataStack, 
        DataStackTotalSize, pSize, sizeof(ULONG_PTR), nullptr);
	if (!NT_SUCCESS(status))
		SetLastError(RtlNtStatusToDosError(status));
	return NT_SUCCESS(status);
}

BOOL WINAPI GetDataStackItemCount(HANDLE hDataStack, ULONG* pCount) {
	auto status = NtQueryInformationDataStack(hDataStack,
        DataStackItemCount, pCount, sizeof(ULONG), nullptr);
	if (!NT_SUCCESS(status))
		SetLastError(RtlNtStatusToDosError(status));
	return NT_SUCCESS(status);
}

BOOL WINAPI GetDataStackConfig(HANDLE hDataStack, DATA_STACK_CONFIG* pConfig) {
	auto status = NtQueryInformationDataStack(hDataStack,
        DataStackConfiguration, pConfig, 
        sizeof(DATA_STACK_CONFIG), nullptr);
	if (!NT_SUCCESS(status))
		SetLastError(RtlNtStatusToDosError(status));
	return NT_SUCCESS(status);
}

Waitable Objects

Waitable objects, also called Dispatcher objects, maintain a state called Signaled or Non-Signaled, where the meaning of “signaled” depends on the object type. For example, process objects are signaled when terminated. Same for thread objects. Job objects are signaled when all processes in the job terminate. And so on.

Waitable objects can be waited on with WaitForSingleObject / WaitForMultipleObjects and friends in the Windows API, which call native APIs like NtWaitForSingleObject / NtWaitForMultipleObjects, which eventually get to the kernel and call ObWaitForSingleObject / ObWaitForMultipleObjects which finally invoke KeWaitForSingleObject / KeWaitForMultipleObjects (both documented in the WDK).

It would be nice if DataStack objects would be dispatcher objects, where “signaled” would mean the data stack is not empty, and vice-versa. The first thing to do is make sure that the SYNCHRONIZE access mask is valid for the object type. This is the default, so nothing special to do here. GENERIC_READ also adds SYNCHRONIZE for convenience.

In order to be a dispatcher object, the structure managing the object must start with a DISPATCHER_HEADER structure (which is provided by the WDK headers). For example, KPROCESS and KTHREAD start with DISPATCHER_HEADER. Same for all other dispatcher objects – well, almost. If we look at an EJOB (using symbols), we’ll see the following:

kd> dt nt!_EJOB
   +0x000 Event            : _KEVENT
   +0x018 JobLinks         : _LIST_ENTRY
   +0x028 ProcessListHead  : _LIST_ENTRY
   +0x038 JobLock          : _ERESOURCE
...

The DISPATCHER_HEADER is in the KEVENT. In fact, a KEVENT is just a glorified DISPATCHER_HEADER:

typedef struct _KEVENT {
    DISPATCHER_HEADER Header;
} KEVENT, *PKEVENT, *PRKEVENT;

The advantage of using a KEVENT is that the event API is available – this is taken advantage of by the Job implementation. For processes and threads, the work of signaling is done internally by the Process and Thread APIs.

For the DataStack implementation, we’ll take the Job approach, as the scheduler APIs are internal and undocumented. The DataStack now looks like this:

struct DataStack {
	KEVENT Event;
	LIST_ENTRY Head;
	FAST_MUTEX Lock;
	ULONG Count;
	ULONG MaxItemCount;
	ULONG_PTR Size;
	ULONG MaxItemSize;
	ULONG_PTR MaxSize;
};

In addition, we have to initialize the event as well as the other members:

void DsInitializeDataStack(DataStack* DataStack, ...) {
//...
	KeInitializeEvent(&DataStack->Event, NotificationEvent, FALSE);
//...
}

The event is initialized as a Notification Event (Manual Reset in user mode terminology). Why? This is just a choice. We could extend the DataStack creation API to allow choosing Notification (manual reset) vs. Synchronization (auto reset) – I’ll leave that for interested coder.

Next, we need to set or reset the event when appropriate. It starts in the non-signaled state (the FALSE in KeInitializeEvent), since the data stack starts empty. In the implementation of DsPushDataStack we signal the event if the count is incremented from zero to 1:

NTSTATUS DsPushDataStack(DataStack* ds, PVOID Item, ULONG ItemSize) {
//...
	if (NT_SUCCESS(status)) {
		InsertTailList(&ds->Head, &buffer->Link);
		ds->Count++;
		ds->Size += ItemSize;
		if(ds->Count == 1)
			KeSetEvent(&ds->Event, EVENT_INCREMENT, FALSE);
	}
//...

In the pop implementation, we clear (reset) the event if the item count drops to zero:

NTSTATUS DsPopDataStack(DataStack* ds, PVOID buffer, ULONG inputSize, ULONG* itemSize) {
//...
memcpy(buffer, item->Data, item->Size);
ds->Count--;
ds->Size -= item->Size;
ExFreePool(item);
if (ds->Count == 0)
	KeClearEvent(&ds->Event);
return STATUS_SUCCESS;
//...

These operations are performed under the protection of the fast mutex, of course.

Testing

Here is one way to amend the test application to use WaitForSingleObject:

// wait 5 seconds at most for data to appear
while (WaitForSingleObject(h, 5000) == WAIT_OBJECT_0) {
	DWORD size = sizeof(buffer);
	if (!PopDataStack(h, buffer, &size) && GetLastError() != ERROR_NO_DATA) {
		printf("Error in PopDataStack (%u)\n", GetLastError());
		break;
	}
//...
	DWORD count;
	DWORD_PTR total;
	if (GetDataStackItemCount(h, &count) && GetDataStackSize(h, &total))
		printf("Data stack Item count: %u Size: %zu\n", count, total);
}

Refer to the project source code for the full sample.

Summary

This four-part series demonstrated creating a new kernel object type and using only exported functions to implement it. I hope this sheds more light on certain mechanisms used by the Windows kernel.

Let’s Get Stacking! (Part 3)

In the first part we looked at creating a new kernel object type. In the second part we implemented creation of new DataStack objects and opening existing objects by name. In this part, we’ll implement the main functionality of a DataStack, that makes a DataStack what it is.

Before we get on with that, there is one correction I must make. A good comment from Luke (@lukethezoid) on X said that although the code returns a 32-bit HANDLE to 32-bit callers, it’s not nearly enough because all pointers passed using NtDeviceIoControlFile/DeviceIoControl would be wrong – 32-bit pointers would be passed to the 64-bit kernel and that would cause a crash unless we do some work. For example, a UNICODE_STRING provided by a 32-bit client has a different binary layout than a 64-bit one. Wow64 processes (32-bit x86 running on x64) have two versions of NtDll.Dll in their address space. One is the “native” 64-bit, and the other is a special 32-bit variant. I say “special” because it’s not the same NtDll.Dll you would find on a true 32-bit system. This is because this special DLL knows it’s on a 64-bit system, and provides conversions for parameters (pointer and structures) before invoking the “real” 64-bit DLL API. Here is a snapshot from Process Explorer showing a 32-bit process with two NtDll.Dll files – the native 64-bit loaded into a high address, while the other loaded into a low address (within the first 4GB of address space):

The changes required to support Wow64 processes are not difficult to make, but not too interesting, either, so I won’t be implementing them. Instead, 32-bit clients will be blocked from using DataStacks. We should block on the kernel side for sure, and also in user mode to fail faster. In the kernel, we could use something like this in IRP_MJ_CREATE or IRP_MJ_DEVICE_CONTROL handles:

if (IoIs32bitProcess(Irp)) {
	status = STATUS_NOT_IMPLEMENTED;
    // complete the request...
}

In user mode, we can prevent DataStack.Dll from loading in the first place in Wow64 processes:

BOOL APIENTRY DllMain(HMODULE hModule, DWORD reason, LPVOID) {
	switch (reason) {
		case DLL_PROCESS_ATTACH:
            // C++ 17
			if (BOOL wow; IsWow64Process(GetCurrentProcess(), &wow) && wow)
				return FALSE;
//...

Note, however, that on true 32-bit systems everything should work just fine, as user mode and kernel mode are bitness-aligned.

Implementing the Actual Data Stack

Now we’re ready to focus on implementing the stack functionality – push, pop, and clear.

We’ll start from user mode, and gradually move to kernel mode. First, we want nice APIs for clients to use that have “Win32” conventions like so:

BOOL WINAPI PushDataStack(_In_ HANDLE hDataStack, _In_ const PVOID buffer, _In_ DWORD size);
BOOL WINAPI PopDataStack(_In_ HANDLE hDataStack, _Out_ PVOID buffer, _Inout_ DWORD* size);
BOOL WINAPI ClearDataStack(_In_ HANDLE hDataStack);

The APIs return BOOL to indicate success/failure, and GetLastError could be used to get additional information in case of an error. Let’s start with push:

BOOL WINAPI PushDataStack(HANDLE hDataStack, const PVOID buffer, DWORD size) {
	auto status = NtPushDataStack(hDataStack, buffer, size);
	if (!NT_SUCCESS(status))
		SetLastError(RtlNtStatusToDosError(status));

	return NT_SUCCESS(status);
}

Nothing to it. Call NtPushDataStack and update the last error is needed. NtPushDataStack just packs the arguments in a helper structure and sends to the driver, just like we did with CreateDataStack and OpenDataStack:

NTSTATUS NTAPI NtPushDataStack(_In_ HANDLE DataStackHandle, 
    _In_ const PVOID Item, _In_ ULONG ItemSize) {
	DataStackPush data;
	data.DataStackHandle = DataStackHandle;
	data.Buffer = Item;
	data.Size = ItemSize;

	IO_STATUS_BLOCK ioStatus;
	return NtDeviceIoControlFile(g_hDevice, nullptr, nullptr, nullptr, &ioStatus,
		IOCTL_DATASTACK_PUSH, &data, sizeof(data), nullptr, 0);
}

Now we switch to the kernel side. The DeviceIoControl handler just forwards the arguments to the “real” NtPushDataStack:

case IOCTL_DATASTACK_PUSH:
{
	auto data = (DataStackPush*)Irp->AssociatedIrp.SystemBuffer;
	if (dic.InputBufferLength < sizeof(*data)) {
		status = STATUS_BUFFER_TOO_SMALL;
		break;
	}
	status = NtPushDataStack(data->DataStackHandle, data->Buffer, data->Size);
	break;
}

Now we’re getting to the interesting parts. First, we need to check if the arguments make sense:

NTSTATUS NTAPI NtPushDataStack(HANDLE DataStackHandle, 
    const PVOID Item, ULONG ItemSize) {
	if (ItemSize == 0)
		return STATUS_INVALID_PARAMETER_3;

	if (!ARGUMENT_PRESENT(Item))
		return STATUS_INVALID_PARAMETER_2;

If the pushed data size is zero or the pointer to the data is NULL, then it’s an error. The ARGUMENT_PRESENT macro returns true if the given pointer is not NULL. Notice that we can return specific “invalid parameter” error based on the parameter index. Unfortunately, user mode callers always get a generic ERROR_INVALID_PARAMETER regardless. But at least kernel callers can benefit from the extra detail.

Next, we need to get to the DataStack object corresponding to the given handle; in fact, the handle could be bad, or not point to a DataStack object at all. This is a job for ObReferenceObjectByHandle or one of its variants:

DataStack* ds;
auto status = ObReferenceObjectByHandleWithTag(DataStackHandle,
    DATA_STACK_PUSH, g_DataStackType, ExGetPreviousMode(), 
    DataStackTag, (PVOID*)&ds, nullptr);
if (!NT_SUCCESS(status))
	return status;

ObReferenceObjectByHandleWithTag attempts to retrieve the object pointer given a handle, a type object, and access. The added tag provides a simple way to “track” the caller taking the reference. I’ve defined DataStackTag to be used for dynamic memory allocations as well (we’ll do very soon):

const ULONG DataStackTag = 'ktsD';

It’s the same kind of tag typically provided to allocation functions. You can read the tag from right to left (that’s how it would be displayed in tools as we’ll see later) – “Dstk”, kind of short for “Data Stack”. The function also checks if the handle has the required access mask (DATA_STACK_PUSH in this case), and will fail if the handle is not powerful enough. ExGetPreviousMode is provided to the function to indicate who is the original caller – for kernel callers, any access will be granted (the access mask is not really used).

With the object in hand, we can do the work, delegated to a separate function, and then not to forget to dereference the object or it will leak:

status = DsPushDataStack(ds, Item, ItemSize);
ObDereferenceObjectWithTag(ds, DataStackTag);

return status;

Technically, we don’t have to use a separate function, but it mimics how the kernel works for complex objects: there is another layer of implementation that works with the object directly (no more handles involved).

The DataStack structure mentioned in the previous parts holds the data for the implementation, and will be treated as a classic C structure – no member functions just to mimic how the Windows kernel is implemented – almost everything in C, not C++:

struct DataStack {
	LIST_ENTRY Head;
	FAST_MUTEX Lock;
	ULONG Count;
	ULONG MaxItemCount;
	ULONG_PTR Size;
	ULONG MaxItemSize;
	ULONG_PTR MaxSize;
};

struct DataBlock {
	LIST_ENTRY Link;
	ULONG Size;
	UCHAR Data[1];
};

The “stack” will be managed by a linked list (the classic LIST_ENTRY and friends) implementation provided by the kernel. We’ll push by adding to the tail and pop by removing from the tail. (Clearly, it would be just as easy to implement a queue, rather than, or in addition to, a stack.) We need a lock to prevent corruption of the list, and a fast mutex will do the job. Count stores the number of items currently on the data stack, and Size has the total size in bytes. MaxSize and MaxItemCount are initialized when the DataStack is created and provide some control over the limits of the data stack. Values of zero indicate no special limit.

The second structure, DataBlock is the one that holds the actual data, along with its size, and of course a link to the list. Data[1] is just a placeholder, where the data is going to be copied to, assuming we allocate the correct size.

We’ll start the push implementation in an optimistic manner, and allocate the DataBlock structure with the required size based on the data provided:

NTSTATUS DsPushDataStack(DataStack* ds, PVOID Item, ULONG ItemSize) {
	auto buffer = (DataBlock*)ExAllocatePool2(POOL_FLAG_PAGED | POOL_FLAG_UNINITIALIZED, 
		ItemSize + sizeof(DataBlock), DataStackTag);
	if (buffer == nullptr)
		return STATUS_INSUFFICIENT_RESOURCES;

We use the (relatively) new ExAllocatePool2 API to allocate the memory block (ExAllocatePoolWithTag is deprecated from Windows version 2004, but you can use it with an old-enough WDK or if you turn off the deprecation warning). We allocate the buffer uninitialized, as we’ll copy the data to it very soon, so no need for zeroed buffer. Technically, we allocate one extra byte beyond what we need, but that’s not a big deal. Now we can copy the data from the client’s provided buffer, being careful to probe user-mode buffers under exception protection:

auto status = STATUS_SUCCESS;
if (ExGetPreviousMode() != KernelMode) {
	__try {
		ProbeForRead(Item, ItemSize, 1);
		memcpy(buffer->Data, Item, ItemSize);
	}
	__except (EXCEPTION_EXECUTE_HANDLER) {
		ExFreePool(buffer);
		return GetExceptionCode();
	}
}
else {
	memcpy(buffer->Data, Item, ItemSize);
}
buffer->Size = ItemSize;

If an exception occurs because of a bad user-mode buffer, we can return the exception code and that’s it. Note that ProbeForRead does not work for kernel addresses, and cannot prevent crashes – this is intentional. Only user-mode buffers are out of control from the kernel’s side.

Now we can add the item to the stack if the limits are not violated, while updating the DataStack’s stats:

ExAcquireFastMutex(&ds->Lock);
do {
	if (ds->MaxItemCount == ds->Count) {
		status = STATUS_NO_MORE_ENTRIES;
		break;
	}

	if (ds->MaxItemSize && ItemSize > ds->MaxItemSize) {
		status = STATUS_NOT_CAPABLE;
		break;
	}

	if (ds->MaxSize && ds->Size + ItemSize > ds->MaxSize) {
		status = STATUS_NOT_CAPABLE;
		break;
	}
} while (false);

if (NT_SUCCESS(status)) {
	InsertTailList(&ds->Head, &buffer->Link);
	ds->Count++;
	ds->Size += ItemSize;
}
ExReleaseFastMutex(&ds->Lock);

if (!NT_SUCCESS(status))
	ExFreePool(buffer);

return status;

First, we acquire the fast mutex to prevent data races. I opted not to use any C++ RAII type here to make things as clear as possible – we have to be careful not to return before releasing the fast mutex. Next, the do/while non-loop is used to check if any setting is violated, in which case the status is set to some failure. The status values I chose may not look perfect – the right thing to do is create new NTSTATUS values that would be specific for DataStacks, but I was too lazy. The interested reader/coder is welcome to do it right.

Inserting the item involves calling InsertTailList, and then just updating the item count and total byte size. If anything fails, we are careful to free the buffer to prevent a memory leak. This is for push.

Popping Items

The pop operation works along similar lines. In this case, the client asks to pop an item but needs to provide a large-enough buffer to store the data. We’ll use an additional size pointer argument, that on input indicates the buffer’s size, and on output indicates the actual item size. First, the “Win32” API:

BOOL WINAPI PopDataStack(HANDLE hDataStack, PVOID buffer, DWORD* size) {
	auto status = NtPopDataStack(hDataStack, buffer, size);
	if (!NT_SUCCESS(status))
		SetLastError(RtlNtStatusToDosError(status));

	return NT_SUCCESS(status);
}

Just delegating the work to the native API, which forwards to the kernel with a helper structure:

NTSTATUS NTAPI NtPopDataStack(_In_ HANDLE DataStackHandle, _In_ PVOID Buffer, _Inout_ PULONG ItemSize) {
	DataStackPop data;
	data.DataStackHandle = DataStackHandle;
	data.Buffer = Buffer;
	data.Size = ItemSize;

	IO_STATUS_BLOCK ioStatus;
	return NtDeviceIoControlFile(g_hDevice, nullptr, nullptr, nullptr, &ioStatus,
		IOCTL_DATASTACK_POP, &data, sizeof(data), nullptr, 0);
}

This should be expected by now. On the kernel side, things are more interesting. First, get the object based on the handle, then send it to the lower-layer function if successful:

NTSTATUS NTAPI NtPopDataStack(HANDLE DataStackHandle, 
    PVOID Buffer, PULONG BufferSize) {
	if (!ARGUMENT_PRESENT(BufferSize))
		return STATUS_INVALID_PARAMETER_3;

	ULONG size;
	if (ExGetPreviousMode() != KernelMode) {
		__try {
			ProbeForRead(BufferSize, sizeof(ULONG), 1);
			size = *BufferSize;
		}
		__except (EXCEPTION_EXECUTE_HANDLER) {
			return GetExceptionCode();
		}
	}
	else {
		size = *BufferSize;
	}

	if (!ARGUMENT_PRESENT(Buffer) && size != 0)
		return STATUS_INVALID_PARAMETER_2;

	DataStack* ds;
	auto status = ObReferenceObjectByHandleWithTag(DataStackHandle,
        DATA_STACK_POP, g_DataStackType, ExGetPreviousMode(), 
        DataStackTag, (PVOID*)&ds, nullptr);
	if (!NT_SUCCESS(status))
		return status;

	status = DsPopDataStack(ds, Buffer, size, BufferSize);
	ObDereferenceObjectWithTag(ds, DataStackTag);
	return status;
}

The input buffer size is extracted, being careful to probe the user mode pointer. The real work is done in DsPopDataStack. First, take the lock. Second, see if the data stack is empty – if so, no pop operation possible. If the input size is zero, return the size of the top element:

NTSTATUS DsPopDataStack(DataStack* ds, PVOID buffer, 
    ULONG inputSize, ULONG* itemSize) {
	ExAcquireFastMutex(&ds->Lock);
	__try {
		if (inputSize == 0) {
			//
			// return size of next item
			//			
			__try {
				if (ds->Count == 0) {
					//
					// stack empty
					//
					*itemSize = 0;
				}
				else {
					auto top = CONTAINING_RECORD(ds->Head.Blink, DataBlock, Link);
					*itemSize = top->Size;
				}
				return STATUS_SUCCESS;
			}
			__except (EXCEPTION_EXECUTE_HANDLER) {
				return GetExceptionCode();
			}
		}

The locking here works differently than the push implementation by using a __finally block, which is the one releasing the fast mutex. This ensures that no matter how we leave the __try block, the lock will be released for sure.

The CONTAINING_RECORD macro is used correctly to get to the item from the link (LIST_ENTRY). Technically, in this case we could just make a simple cast, as the LIST_ENTRY member is the first in a DataBlock. Notice how we get to the top item: Head.Blink, which points to the tail (last) item.

If the data stack is empty, we place zero in the item size pointer and return an error (abusing yet another existing error):

if (ds->Count == 0) {
	__try {
		*itemSize = 0;
	}
	__except (EXCEPTION_EXECUTE_HANDLER) {
		return GetExceptionCode();
	}
	return STATUS_PIPE_EMPTY;
}

If manage to get beyond this point, then there is an item, and we need to remove it, copy the data to the client’s buffer (if it’s big enough), and free the kernel’s copy of the buffer:

	auto link = RemoveTailList(&ds->Head);
	NT_ASSERT(link != &ds->Head);
	
	auto item = CONTAINING_RECORD(link, DataBlock, Link);
	__try {
		*itemSize = item->Size;
		if (inputSize < item->Size) {
			//
			// buffer too small
			// reinsert item
			//
			InsertTailList(&ds->Head, link);
			return STATUS_BUFFER_TOO_SMALL;
		}
		else {
			memcpy(buffer, item->Data, item->Size);
			ds->Count--;
			ds->Size -= item->Size;
			ExFreePool(item);
			return STATUS_SUCCESS;
		}
	}
	__except (EXCEPTION_EXECUTE_HANDLER) {
		return GetExceptionCode();
	}
}
__finally {
	ExReleaseFastMutex(&ds->Lock);
}

The call to RemoveTailList removes the top item from the list. The next assert verifies the list wasn’t empty before the removal (it can’t be as we dealt with that case in the previous code section). Remember, that if a list is empty calling RemoveTailList or RemoveHeadList returns the head’s pointer.

If the client’s buffer is too small, we reinsert the item back and bail. Otherwise, we copy the data, update the data stack’s stats and free our copy of the item.

Cleanup

The stack clear operation is relatively straightforward of them all. Here is the kernel part that matters:

NTSTATUS DsClearDataStack(DataStack* ds) {
	ExAcquireFastMutex(&ds->Lock);
	LIST_ENTRY* link;

	while ((link = RemoveHeadList(&ds->Head)) != &ds->Head) {
		auto item = CONTAINING_RECORD(link, DataBlock, Link);
		ExFreePool(item);
	}
	ds->Count = 0;
	ds->Size = 0;
	ExReleaseFastMutex(&ds->Lock);

	return STATUS_SUCCESS;
}

We take the lock, and then go over the list, removing and freeing each item. Finally, we update the stats to zero items and zero bytes.

Testing

Here is one way to test – having an executable run twice, the first instance pushes some items, and the second one popping items. main creates a data stack with a name. If it’s a new object, it assumes the role of “pusher”. Otherwise, it assumes the role of “popper”:

int main() {
	HANDLE hDataStack = CreateDataStack(nullptr, 0, 100, 
        10 << 20, L"MyDataStack");
	if (!hDataStack) {
		printf("Failed to create data stack (%u)\n", GetLastError());
		return 1;
	}

	printf("Handle created: 0x%p\n", hDataStack);

	if (GetLastError() == ERROR_ALREADY_EXISTS) {
		printf("Opened an existing object... will pop elements\n");
		PopItems(hDataStack);
	}
	else {
		Sleep(5000);

		PushString(hDataStack, "Hello, data stack!");
		PushString(hDataStack, "Pushing another string...");
		for (int i = 1; i <= 10; i++) {
			Sleep(100);
			PushDataStack(hDataStack, &i, sizeof(i));
		}
	}

	CloseHandle(hDataStack);
	return 0;
}

When creating a named object, if GetLastError returns ERROR_ALREADY_EXISTS, it means a handle is returned to an existing object. In our current implementation, this actually won’t work. We have to fix the CreateDataStack implementation like so:

HANDLE hDataStack;
auto status = NtCreateDataStack(&hDataStack, &attr, maxItemSize, maxItemCount, maxSize);
if (NT_SUCCESS(status)) {
	const NTSTATUS STATUS_OBJECT_NAME_EXISTS = 0x40000000;

	if (status == STATUS_OBJECT_NAME_EXISTS) {
		SetLastError(ERROR_ALREADY_EXISTS);
	}
	else {
		SetLastError(0);
	}
	return hDataStack;
}

After calling NtCreateDataStack we fix the returned “error” if the kernel returns STATUS_OBJECT_NAME_EXISTS. Now the previous will work correctly.

PushString is a little helper to push strings:

bool PushString(HANDLE h, std::string const& text) {
	auto ok = PushDataStack(h, (PVOID)text.c_str(), (ULONG)text.length() + 1);
	if (!ok)
		printf("Error in PushString: %u\n", GetLastError());
	return ok;
}

Finally, PopItems does some popping:

void PopItems(HANDLE h) {
	BYTE buffer[256];

	auto tick = GetTickCount64();
	while (GetTickCount64() - tick < 10000) {
		DWORD size = sizeof(buffer);
		if (!PopDataStack(h, buffer, &size) && GetLastError() != ERROR_NO_DATA) {
			printf("Error in PopDataStack (%u)\n", GetLastError());
			break;
		}
		if (size) {
			printf("Popped %u bytes: ", size);
			if (size > sizeof(int))
				printf("%s\n", (PCSTR)buffer);
			else
				printf("%d\n", *(int*)buffer);
		}
		Sleep(300);
	}
}

Not very exciting, but is good enough for this simple test. Here is some output, first from the “pusher” and then the “popper”:

E:\Test>DSTest.exe
Handle created: 0x00000000000000F8

E:\Test>DSTest.exe
Handle created: 0x0000000000000104
Opened an existing object... will popup elements
Popped 4 bytes: 2
Popped 4 bytes: 5
Popped 4 bytes: 8
Popped 4 bytes: 10
Popped 4 bytes: 9
Popped 4 bytes: 7
Popped 4 bytes: 6
Popped 4 bytes: 4
Popped 4 bytes: 3
Popped 4 bytes: 1
Popped 26 bytes: Pushing another string...
Popped 19 bytes: Hello, data stack!

What’s Next?

Are we done? Not quite. Astute readers may have noticed a little problem. What happens if a DataStack object is destroyed (e.g., the last handle to it is closed), but the stack is not empty? That memory will leak, as we have no “desctructor”. Running the “pusher” a few times without a second process that pops items results in a leak. Here is my PoolMonX tool showing the leak:

Notice the “Dstk” tag and the number of allocations being higher that deallocations.

Another feature we are missing is the ability to wait on a DataStack until data is available, if the stack is empty, maybe by calling the WaitForSingleObject API. It would be nice to have that.

Yet another missing element is the ability to query DataStack objects – how much memory is being used, how many items, etc.

We’ll deal with these aspects in the next part.