Zombie Processes

The term “Zombie Process” in Windows is not an official one, as far as I know. Regardless, I’ll define zombie process to be a process that has exited (for whatever reason), but at least one reference remains to the kernel process object (EPROCESS), so that the process object cannot be destroyed.

How can we recognize zombie processes? Is this even important? Let’s find out.

All kernel objects are reference counted. The reference count includes the handle count (the number of open handles to the object), and a “pointer count”, the number of kernel clients to the object that have incremented its reference count explicitly so the object is not destroyed prematurely if all handles to it are closed.

Process objects are managed within the kernel by the EPROCESS (undocumented) structure, that contains or points to everything about the process – its handle table, image name, access token, job (if any), threads, address space, etc. When a process is done executing, some aspects of the process get destroyed immediately. For example, all handles in its handle table are closed; its address space is destroyed. General properties of the process remain, however, some of which only have true meaning once a process dies, such as its exit code.

Process enumeration tools such as Task Manager or Process Explorer don’t show zombie processes, simply because the process enumeration APIs (EnumProcesses, Process32First/Process32Next, the native NtQuerySystemInformation, and WTSEnumerateProcesses) don’t return these – they only return processes that can still run code. The kernel debugger, on the other hand, shows all processes, zombie or not when you type something like !process 0 0. Identifying zombie processes is easy – their handle table and handle count is shown as zero. Here is one example:

kd> !process ffffc986a505a080 0
PROCESS ffffc986a505a080
    SessionId: 1  Cid: 1010    Peb: 37648ff000  ParentCid: 0588
    DirBase: 16484cd000  ObjectTable: 00000000  HandleCount:   0.
    Image: smartscreen.exe

Any kernel object referenced by the process object remains alive as well – such as a job (if the process is part of a job), and the process primary token (access token object). We can get more details about the process by passing the detail level “1” in the !process command:

lkd> !process ffffc986a505a080 1
PROCESS ffffc986a505a080
    SessionId: 1  Cid: 1010    Peb: 37648ff000  ParentCid: 0588
    DirBase: 16495cd000  ObjectTable: 00000000  HandleCount:   0.
    Image: smartscreen.exe
    VadRoot 0000000000000000 Vads 0 Clone 0 Private 16. Modified 7. Locked 0.
    DeviceMap ffffa2013f24aea0
    Token                             ffffa20147ded060
    ElapsedTime                       1 Day 15:11:50.174
    UserTime                          00:00:00.000
    KernelTime                        00:00:00.015
    QuotaPoolUsage[PagedPool]         0
    QuotaPoolUsage[NonPagedPool]      0
    Working Set Sizes (now,min,max)  (17, 50, 345) (68KB, 200KB, 1380KB)
    PeakWorkingSetSize                2325
    VirtualSize                       0 Mb
    PeakVirtualSize                   2101341 Mb
    PageFaultCount                    2500
    MemoryPriority                    BACKGROUND
    BasePriority                      8
    CommitCharge                      20
    Job                               ffffc98672eea060

Notice the address space does not exist anymore (VadRoot is zero). The VAD (Virtual Address Descriptors) is a data structure managed as a balanced binary search tree that describes the address space of a process – which parts are committed, which parts are reserved, etc. No address space exists anymore. Other details of the process are still there as they are direct members of the EPROCESS structure, such as the kernel and user time the process has used, its start and exit times (not shown in the debugger’s output above).

We can ask the debugger to show the reference count of any kernel object by using the generic !object command, to be followed by !trueref if there are handles open to the object:

lkd> !object ffffc986a505a080
Object: ffffc986a505a080  Type: (ffffc986478ce380) Process
    ObjectHeader: ffffc986a505a050 (new version)
    HandleCount: 1  PointerCount: 32768
lkd> !trueref ffffc986a505a080
ffffc986a505a080: HandleCount: 1 PointerCount: 32768 RealPointerCount: 1

Clearly, there is a single handle open to the process and that’s the only thing keeping it alive.

One other thing that remains is the unique process ID (shown as Cid in the above output). Process and thread IDs are generated by using a private handle table just for this purpose. This explains why process and thread IDs are always multiples of four, just like handles. In fact, the kernel treats PIDs and TIDs with the HANDLE type, rather with something like ULONG. Since there is a limit to the number of handles in a process (16711680, the reason is not described here), that’s also the limit for the number of process and threads that could exist on a system. This is a rather large number, so probably not an issue from a practical perspective, but zombie processes still keep their PIDs “taken”, so it cannot be reused. This means that in theory, some code can create millions of processes, terminate them all, but not close the handles it receives back, and eventually new processes could not be created anymore because PIDs (and TIDs) run out. I don’t know what would happen then 🙂

Here is a simple loop to do something like that by creating and destroying Notepad processes but keeping handles open:

WCHAR name[] = L"notepad";
STARTUPINFO si{ sizeof(si) };
PROCESS_INFORMATION pi;
int i = 0;
for (; i < 1000000; i++) {	// use 1 million as an example
	auto created = ::CreateProcess(nullptr, name, nullptr, nullptr,
        FALSE, 0, nullptr, nullptr, &si, &pi);
	if (!created)
		break;
	::TerminateProcess(pi.hProcess, 100);
	printf("Index: %6d PID: %u\n", i + 1, pi.dwProcessId);
	::CloseHandle(pi.hThread);
}
printf("Total: %d\n", i);

The code closes the handle to the first thread in the process, as keeping it alive would create “Zombie Threads”, much like zombie processes – threads that can no longer run any code, but still exist because at least one handle is keeping them alive.

How can we get a list of zombie processes on a system given that the “normal” tools for process enumeration don’t show them? One way of doing this is to enumerate all the process handles in the system, and check if the process pointed by that handle is truly alive by calling WaitForSingleObject on the handle (of course the handle must first be duplicated into our process so it’s valid to use) with a timeout of zero – we don’t want to wait really. If the result is WAIT_OBJECT_0, this means the process object is signaled, meaning it exited – it’s no longer capable of running any code. I have incorporated that into my Object Explorer (ObjExp.exe) tool. Here is the basic code to get details for zombie processes (the code for enumerating handles is not shown but is available in the source code):

m_Items.clear();
m_Items.reserve(128);
std::unordered_map<DWORD, size_t> processes;
for (auto const& h : ObjectManager::EnumHandles2(L"Process")) {
	auto hDup = ObjectManager::DupHandle(
        (HANDLE)(ULONG_PTR)h->HandleValue , h->ProcessId, 
        SYNCHRONIZE | PROCESS_QUERY_LIMITED_INFORMATION);
	if (hDup && WAIT_OBJECT_0 == ::WaitForSingleObject(hDup, 0)) {
		//
		// zombie process
		//
		auto pid = ::GetProcessId(hDup);
		if (pid) {
			auto it = processes.find(pid);
			ZombieProcess zp;
			auto& z = it == processes.end() ? zp : m_Items[it->second];
			z.Pid = pid;
			z.Handles.push_back({ h->HandleValue, h->ProcessId });
			WCHAR name[MAX_PATH];
			if (::GetProcessImageFileName(hDup, 
                name, _countof(name))) {
				z.FullPath = 
                    ProcessHelper::GetDosNameFromNtName(name);
				z.Name = wcsrchr(name, L'\\') + 1;
			}
			::GetProcessTimes(hDup, 
                (PFILETIME)&z.CreateTime, (PFILETIME)&z.ExitTime, 
                (PFILETIME)&z.KernelTime, (PFILETIME)&z.UserTime);
			::GetExitCodeProcess(hDup, &z.ExitCode);
			if (it == processes.end()) {
				m_Items.push_back(std::move(z));
				processes.insert({ pid, m_Items.size() - 1 });
			}
		}
	}
	if (hDup)
		::CloseHandle(hDup);
}

The data structure built for each process and stored in the m_Items vector is the following:

struct HandleEntry {
	ULONG Handle;
	DWORD Pid;
};
struct ZombieProcess {
	DWORD Pid;
	DWORD ExitCode{ 0 };
	std::wstring Name, FullPath;
	std::vector<HandleEntry> Handles;
	DWORD64 CreateTime, ExitTime, KernelTime, UserTime;
};

The ObjectManager::DupHandle function is not shown, but it basically calls DuplicateHandle for the process handle identified in some process. if that works, and the returned PID is non-zero, we can go do the work. Getting the process image name is done with GetProcessImageFileName – seems simple enough, but this function gets the NT name format of the executable (something like \Device\harddiskVolume3\Windows\System32\Notepad.exe), which is good enough if only the “short” final image name component is desired. if the full image path is needed in Win32 format (e.g. “c:\Windows\System32\notepad.exe”), it must be converted (ProcessHelper::GetDosNameFromNtName). You might be thinking that it would be far simpler to call QueryFullProcessImageName and get the Win32 name directly – but this does not work, and the function fails. Internally, the NtQueryInformationProcess native API is called with ProcessImageFileNameWin32 in the latter case, which fails if the process is a zombie one.

Running Object Explorer and selecting Zombie Processes from the System menu shows a list of all zombie processes (you should run it elevated for best results):

Object Explorer showing zombie processes

The above screenshot shows that many of the zombie processes are kept alive by GameManagerService.exe. This executable is from Razer running on my system. It definitely has a bug that keeps process handle alive way longer than needed. I’m not sure it would ever close these handles. Terminating this process will resolve the issue as the kernel closes all handles in a process handle table once the process terminates. This will allow all those processes that are held by that single handle to be freed from memory.

I plan to add Zombie Threads to Object Explorer – I wonder how many threads are being kept “alive” without good reason.

Mysteries of the Registry

The Windows Registry is one of the most recognized aspects of Windows. It’s a hierarchical database, storing information on a machine-wide basis and on a per-user basis… mostly. In this post, I’d like to examine the major parts of the Registry, including the “real” Registry.

Looking at the Registry is typically done by launching the built-in RegEdit.exe tool, which shows the five “hives” that seem to comprise the Registry:

RegEdit showing the main hives

These so-called “hives” provide some abstracted view of the information in the Registry. I’m saying “abstracted”, because not all of these are true hives. A true hive is stored in a file. The full hive list can be found in the Registry itself – at HKLM\SYSTEM\CurrentControlSet\Control\hivelist (I’ll abbreviate HKEY_LOCAL_MACHINE as HKLM), mapping an internal key name to the file where it’s stored (more on these “internal” key names will be discussed soon):

The hive list

Let’s examine the so-called “hives” as seen in the root RegEdit’s view.

  • HKEY_LOCAL_MACHINE is the simplest to understand. It contains machine-wide information, most of it stored in files (persistent). Some details related to hardware is built when the system initializes and is only kept in memory while the system is running. Such keys are volatile, since their contents disappear when the system is shut down.
    There are many interesting keys within HKLM, but my goal is not to go over every key (that would take a full book), but highlight a few useful pieces. HKLM\System\CurrentControlSet\Services is the key where all services and device drivers are installed. Note that “CurrentControlSet” is not a true key, but in fact is a link key, connecting it to something like HKLM\System\ControlSet001. The reason for this indirection is beyond the scope of this post. Regedit does not show this fact directly – there is no way to tell whether a key is a true key or just points to a different key. This is one reason I created Total Registry (formerly called Registry Explorer), that shows these kind of nuances:
TotalRegistry showing HKLM\System\CurrentControlSet

The liked key seems to have a weird name starting with \REGISTRY\MACHINE\. We’ll get to that shortly.

Other subkeys of note under HKLM include SOFTWARE, where installed applications store their system-level information; SAM and SECURITY, where local security policy and local accounts information are managed. These two subkeys contents is not not visible – even administrators don’t get access – only the SYSTEM account is granted access. One way to see what’s in these keys is to use psexec from Sysinternals to launch RegEdit or TotalRegistry under the SYSTEM account. Here is a command you can run in an elevated command window that will launch RegEdit under the SYSTEM account (if you’re using RegEdit, close it first):

psexec -s -i -d RegEdit

The -s switch indicates the SYSTEM account. -i is critical as to run the process in the interactive session (the default would run it in session 0, where no interactive user will ever see it). The -d switch is optional, and simply returns control to the console while the process is running, rather than waiting for the process to terminate.

The other way to gain access to the SAM and SECURITY subkeys is to use the “Take Ownership” privilege (easy to do when the Permissions dialog is open), and transfer the ownership to an admin user – the owner can specify who can do what with an object, and allow itself full access. Obviously, this is not a good idea in general, as it weakens security.

The BCD00000000 subkey contains the Boot Configuration Data (BCD), normally accessed using the bcdedit.exe tool.

  • HKEY_USERS – this is the other hive that truly stores data. Its subkeys contain user profiles for all users that ever logged in locally to this machine. Each subkey’s name is a Security ID (SID), in its string representation:
HKEY_USERS

There are 3 well-known SIDs, representing the SYSTEM (S-1-5-18), LocalService (S-1-5-19), and NetworkService (S-1-5-20) accounts. These are the typical accounts used for running Windows Services. “Normal” users get ugly SIDs, such as the one shown – that’s my user’s local SID. You may be wondering what is that “_Classes” suffix in the second key. We’ll get to that as well.

  • HKEY_CURRENT_USER is a link key, pointing to the user’s subkey under HKEY_USERS running the current process. Obviously, the meaning of “current user” changes based on the process access token looking at the Registry.
  • HKEY_CLASSES_ROOT is the most curious of the keys. It’s not a “real” key in the sense that it’s not a hive – not stored in a file. It’s not a link key, either. This key is a “combination” of two keys: HKLM\Software\Classes and HKCU\Software\Classes. In other words, the information in HKEY_CLASSES_ROOT is coming from the machine hive first, but can be overridden by the current user’s hive.
    What information is there anyway? The first thing is shell-related information, such as file extensions and associations, and all other information normally used by Explorer.exe. The second thing is information related to the Component Object Model (COM). For example, the CLSID subkey holds COM class registration (GUIDs you can pass to CoCreateInstance to (potentially) create a COM object of that class). Looking at the CLSID subkey under HKLM\Software\Classes shows there are 8160 subkeys, or roughly 8160 COM classes registered on my system from HKLM:
HKLM\Software\Classes

Looking at the same key under HKEY_CURRENT_USER tells a different story:

HKCU\Software\Classes

Only 46 COM classes provide extra or overridden registrations. HKEY_CLASSES_ROOT combines both, and uses HKCU in case of a conflict (same key name). This explains the extra “_Classes” subkey within the HKEY_USERS key – it stores the per user stuff (in the file UsrClasses.dat in something like c:\Users\<username>\AppData\Local\Microsoft\Windows).

  • HKEY_CURRENT_CONFIG is a link to HKLM\SYSTEM\CurrentControlSet\Hardware\Profiles\Current

    The list of “standard” hives (the hives accessible by official Windows APIs such as RegOpenKeyEx contains some more that are not shown by Regedit. They can be viewed by TotalReg if the option “Extra Hives” is selected in the View menu. At this time, however, the tool needs to be restarted for this change to take effect (I just didn’t get around to implementing the change dynamically, as it was low on my priority list). Here are all the hives accessible with the official Windows API:
All hives

I’ll let the interested reader to dig further into these “extra” hives. On of these hives deserves special mentioning – HKEY_PERFORMANCE_DATA – it was used in the pre Windows 2000 days as a way to access Performance Counters. Registry APIs had to be used at the time. Fortunately, starting from Windows 2000, a new dedicated API is provided to access Performance Counters (functions starting with Pdh* in <pdh.h>).

Is this it? Is this the entire Registry? Not quite. As you can see in TotalReg, there is a node called “Registry”, that tells yet another story. Internally, all Registry keys are rooted in a single key called REGISTRY. This is the only named Registry key. You can see it in the root of the Object Manager’s namespace with WinObj from Sysinternals:

WinObj from Sysinternals showing the Registry key object

Here is the object details in a Local Kernel debugger:

lkd> !object \registry
Object: ffffe00c8564c860  Type: (ffff898a519922a0) Key
    ObjectHeader: ffffe00c8564c830 (new version)
    HandleCount: 1  PointerCount: 32770
    Directory Object: 00000000  Name: \REGISTRY
lkd> !trueref ffffe00c8564c860
ffffe00c8564c860: HandleCount: 1 PointerCount: 32770 RealPointerCount: 3

All other Registry keys are based off of that root key, the Configuration Manager (the kernel component in charge of the Registry) parses the remaining path as expected. This is the real Registry. The official Windows APIs cannot use this path format, but native APIs can. For example, using NtOpenKey (documented as ZwOpenKey in the Windows Driver Kit, as this is a system call) allows such access. This is how TotalReg is able to look at the real Registry.

Clearly, the normal user-mode APIs somehow map the “standard” hive path to the real Registry path. The simplest is the mapping of HKEY_LOCAL_MACHINE to \REGISTRY\MACHINE. Another simple one is HKEY_USERS mapped to \REGISTRY\USER. HKEY_CURRENT_USER is a bit more complex, and needs to be mapped to the per-user hive under \REGISTRY\USER. The most complex is our friend HKEY_CLASSES_ROOT – there is no simple mapping – the APIs have to check if there is per-user override or not, etc.

Lastly, it seems there are keys in the real Registry that cannot be reached from the standard Registry at all:

The real Registry

There is a key named “A” which seems inaccessible. This key is used for private keys in processes, very common in Universal Windows Application (UWP) processes, but can be used in other processes as well. They are not accessible generally, not even with kernel code – the Configuration Manager prevents it. You can verify their existence by searching for \Registry\A in tools like Process Explorer or TotalReg itself (by choosing Scan Key Handles from the Tools menu). Here is TotalReg, followed by Process Explorer:

TotalReg key handles
Process Explorer key handles

Finally, the WC key is used for Windows Container, internally called Silos. A container (like the ones created by Docker) is an isolated instance of a user-mode OS, kind of like a lightweight virtual machine, but the kernel is not separate (as would be with a true VM), but is provided by the host. Silos are very interesting, but outside the scope of this post.

Briefly, there are two main Silo types: An Application Silo, which is not a true container, and mostly used with application based on the Desktop Bridge technology. A classic example is WinDbg Preview. The second type is Server Silo, which is a true container. A true container must have its file system, Registry, and Object Manager namespace virtualized. This is exactly the role of the WC subkeys – provide the private Registry keys for containers. The Configuration Manager (as well as other parts of the kernel) are Silo-aware, and will redirect Registry calls to the correct subkey, having no effect on the Host Registry or the private Registry of other Silos.

You can examine some aspects of silos with the kernel debugger !silo command. Here is an example from a server 2022 running a Server Silo and the Registry keys under WC:

lkd> !silo
		Address          Type       ProcessCount Identifier
		ffff800f2986c2e0 ServerSilo 15           {1d29488c-bccd-11ec-a503-d127529101e4} (0n732)
1 active Silo(s)
lkd> !silo ffff800f2986c2e0

Silo ffff800f2986c2e0:
		Job               : ffff800f2986c2e0
		Type              : ServerSilo
		Identifier        : {1d29488c-bccd-11ec-a503-d127529101e4} (0n732)
		Processes         : 15

Server silo globals ffff800f27e65a40:
		Default Error Port: ffff800f234ee080
		ServiceSessionId  : 217
		Root Directory    : 00007ffcad26b3e1 '\Silos\732'
		State             : Running
A Server Silo’s keys

There you have it. The relatively simple-looking Registry shown in RegEdit is viewed differently by the kernel. Device driver writers find this out relatively early – they cannot use the “abstractions” provided by user mode even if these are sometimes convenient.


Threads, Threads, and More Threads

Looking at a typical Windows system shows thousands of threads, with process numbers in the hundreds, even though the total CPU consumption is low, meaning most of these threads are doing nothing most of the time. I typically rant about it in my Windows Internals classes. Why so many threads?

Here is a snapshot of my Task Manager showing the total number of threads and processes:

Showing processes details and sorting by thread count looks something like this:

The System process clearly has many threads. These are kernel threads created by the kernel itself and by device drivers. These threads are always running in kernel mode. For this post, I’ll disregard the System process and focus on “normal” user-mode processes.

There are other kernel processes that we should ignore, such as Registry and Memory Compression. Registry has few threads, but Memory Compression has many. It’s not shown in Task Manager (by design), but is shown in other tools, such as Process Explorer. While I’m writing this post, it has 78 threads. We should probably skip that process as well as being “out of our control”.

Notice the large number of threads in processes running the images Explorer.exe, SearchIndexer.exe, Nvidia Web helper.exe, Outlook.exe, Powerpnt.exe and MsMpEng.exe. Let’s write some code to calculate the average number of threads in a process and the standard deviation:

float ComputeStdDev(std::vector<int> const& values, float& average) {
	float total = 0;
	std::for_each(values.begin(), values.end(), 
		[&](int n) { total += n; });
	average = total / values.size();
	total = 0;
	std::for_each(values.begin(), values.end(), 
		[&](int n) { total += (n - average) * (n - average); });
	return std::sqrt(total / values.size());
}

int main() {
	auto hSnapshot = ::CreateToolhelp32Snapshot(TH32CS_SNAPPROCESS, 0);
	
	PROCESSENTRY32 pe;
	pe.dwSize = sizeof(pe);

	// skip the idle process
	::Process32First(hSnapshot, &pe);

	int processes = 0, threads = 0;
	std::vector<int> threads_per_process;
	threads_per_process.reserve(500);
	while (::Process32Next(hSnapshot, &pe)) {
		processes++;
		threads += pe.cntThreads;
		threads_per_process.push_back(pe.cntThreads);
	}
	::CloseHandle(hSnapshot);

	assert(processes == threads_per_process.size());

	printf("Process: %d Threads: %d\n", processes, threads);
	float average;
	auto sd = ComputeStdDev(threads_per_process, average);
	printf("Average threads/process: %.2f\n", average);
	printf("Std. Dev.: %.2f\n", sd);

	return 0;
}

The ComputeStdDev function computes the standard deviation and average of a vector of integers. The main function uses the ToolHelp API to enumerate processes in the system, which fortunately also provides the number of threads in each processes (stored in the threads_per_process vector. If I run this (no processes removed just yet), this is what I get:

Process: 525 Threads: 7810
Average threads/process: 14.88
Std. Dev.: 23.38

Almost 15 threads per process, with little CPU consumption in my Task Manager. The standard deviation is more telling – it’s big compared to the average, which suggests that many processes are far from the average in their thread consumption. And since a negative thread count is not possible (even zero is almost impossible), the the divergence is with higher thread numbers.

To be fair, let’s remove the System and Memory Compression processes from our calculations. Here are the changes to the while loop:

while (::Process32Next(hSnapshot, &pe)) {
	if (pe.th32ProcessID == 4 || _wcsicmp(pe.szExeFile, L"memory compression") == 0)
		continue;
//...

Here are the results:

Process: 521 Threads: 7412
Average threads/process: 14.23
Std. Dev.: 14.14

The standard deviation is definitely smaller, but still pretty big (close to the average), which does not invalidate the previous point. Some processes use lots of threads.

In an ideal world, the number of threads in a system would be the same as the number of logical processors – any more and threads might fight over processors, any less and you’re not using the full power of the machine. Obviously, each “normal” process must have at least one thread running whatever main function is available in the executable, so on my system 521 threads would be the minimum number of threads. Still – we have over 7000.

What are these threads doing, anyway? Let’s examine some processes. First, an Explorer.exe process. Here is the Threads tab shown in Process Explorer:

Thread list in Explorer.exe instance

93 threads. I’ve sorted the list by Start Address to get a sense of the common functions used. Let’s dig into some of them. One of the most common (in other processes as well) is ntdll!TppWorkerThread – this is a thread pool thread, likely waiting for work. Clicking the Stack button (or double clicking the entry in the list) shows the following call stack:

ntoskrnl.exe!KiSwapContext+0x76
ntoskrnl.exe!KiSwapThread+0x500
ntoskrnl.exe!KiCommitThreadWait+0x14f
ntoskrnl.exe!KeWaitForSingleObject+0x233
ntoskrnl.exe!KiSchedulerApc+0x3bd
ntoskrnl.exe!KiDeliverApc+0x2e9
ntoskrnl.exe!KiSwapThread+0x827
ntoskrnl.exe!KiCommitThreadWait+0x14f
ntoskrnl.exe!KeRemoveQueueEx+0x263
ntoskrnl.exe!IoRemoveIoCompletion+0x98
ntoskrnl.exe!NtWaitForWorkViaWorkerFactory+0x38e
ntoskrnl.exe!KiSystemServiceCopyEnd+0x25
ntdll.dll!ZwWaitForWorkViaWorkerFactory+0x14
ntdll.dll!TppWorkerThread+0x2f7
KERNEL32.DLL!BaseThreadInitThunk+0x14
ntdll.dll!RtlUserThreadStart+0x21

The system call NtWaitForWorkViaWorkerFactory is the one waiting for work (the name Worker Factory is the internal name of the thread pool type in the kernel, officially called TpWorkerFactory). The number of such threads is typically dynamic, growing and shrinking based on the amount of work provided to the thread pool(s). The minimum and maximum threads can be tweaked by APIs, but most processes are unlikely to do so.

Another function that appears a lot in the list is shcore.dll!_WrapperThreadProc. It looks like some generic function used by Explorer for its own threads. We can examine some call stacks to get a sense of what’s going on. Here is one:

ntoskrnl.exe!KiSwapContext+0x76
ntoskrnl.exe!KiSwapThread+0x500
ntoskrnl.exe!KiCommitThreadWait+0x14f
ntoskrnl.exe!KeWaitForSingleObject+0x233
ntoskrnl.exe!KiSchedulerApc+0x3bd
ntoskrnl.exe!KiDeliverApc+0x2e9
ntoskrnl.exe!KiSwapThread+0x827
ntoskrnl.exe!KiCommitThreadWait+0x14f
ntoskrnl.exe!KeWaitForSingleObject+0x233
ntoskrnl.exe!KeWaitForMultipleObjects+0x45b
win32kfull.sys!xxxRealSleepThread+0x362
win32kfull.sys!xxxSleepThread2+0xb5
win32kfull.sys!xxxRealInternalGetMessage+0xcfd
win32kfull.sys!NtUserGetMessage+0x92
win32k.sys!NtUserGetMessage+0x16
ntoskrnl.exe!KiSystemServiceCopyEnd+0x25
win32u.dll!NtUserGetMessage+0x14
USER32.dll!GetMessageW+0x2e
SHELL32.dll!_LocalServerThread+0x66
shcore.dll!_WrapperThreadProc+0xe9
KERNEL32.DLL!BaseThreadInitThunk+0x14
ntdll.dll!RtlUserThreadStart+0x21

This one seems to be waiting for UI messages, probably managing some user interface (GetMessage). We can verify with other tools. Here is my own WinSpy:

Apparently, I was wrong. This thread has the hidden window type used to receive messages targeting COM objects that leave in this Single Threaded Apartment (STA).

We can inspect WinSpy some more to see the threads and windows created by Explorer. I’ll leave that to the interested reader.

Other generic call stacks start with ucrtbase.dll!thread_start+0x42. Many of them have the following call stack (kernel part trimmed for brevity):

ntdll.dll!ZwWaitForMultipleObjects+0x14
KERNELBASE.dll!WaitForMultipleObjectsEx+0xf0
KERNELBASE.dll!WaitForMultipleObjects+0xe
cdp.dll!shared::CallbackNotifierListener::ListenerInternal::StartInternal+0x9f
cdp.dll!std::thread::_Invoke<std::tuple<<lambda_10793e1829a048bb2f8cc95974633b56> >,0>+0x2f
ucrtbase.dll!thread_start<unsigned int (__cdecl*)(void *),1>+0x42
KERNEL32.DLL!BaseThreadInitThunk+0x14
ntdll.dll!RtlUserThreadStart+0x21

A function in CDP.dll is waiting for something (WaitForMultipleObjects). I count at least 12 threads doing just that. Perhaps all these waits could be consolidated to a smaller number of threads?

Let’s tackle a different process. Here is an instance of Teams.exe. My teams is minimized to the tray and I have not interacted with it for a while:

Teams threads

62 threads. Many have the same CRT wrapper for a thread created by Teams. Here are several call stacks I observed:

ntdll.dll!ZwRemoveIoCompletion+0x14
KERNELBASE.dll!GetQueuedCompletionStatus+0x4f
skypert.dll!rtnet::internal::SingleThreadIOCP::iocpLoop+0x116
skypert.dll!SplOpaqueUpperLayerThread::run+0x84
skypert.dll!auf::priv::MRMWTransport::process1+0x6c
skypert.dll!auf::ThreadPoolExecutorImp::workLoop+0x160
skypert.dll!auf::tpImpThreadTrampoline+0x47
skypert.dll!spl::threadWinDispatch+0x19
skypert.dll!spl::threadWinEntry+0x17b
ucrtbase.dll!thread_start<unsigned int (__cdecl*)(void *),1>+0x42
KERNEL32.DLL!BaseThreadInitThunk+0x14
ntdll.dll!RtlUserThreadStart+0x21
ntdll.dll!ZwWaitForAlertByThreadId+0x14
ntdll.dll!RtlSleepConditionVariableCS+0x105
KERNELBASE.dll!SleepConditionVariableCS+0x29
Teams.exe!uv_cond_wait+0x10
Teams.exe!worker+0x8d
Teams.exe!uv__thread_start+0xa2
Teams.exe!thread_start<unsigned int (__cdecl*)(void *),1>+0x50
KERNEL32.DLL!BaseThreadInitThunk+0x14
ntdll.dll!RtlUserThreadStart+0x21

You can check more threads, but you get the idea. Most threads are waiting for something – this is not the ideal activity for a thread. A thread should run (useful) code.

Last example, Word:

57 threads. Word has been minimized for more than an hour now. The clearly common call stack looks like this:

ntdll.dll!ZwWaitForAlertByThreadId+0x14
ntdll.dll!RtlSleepConditionVariableSRW+0x131
KERNELBASE.dll!SleepConditionVariableSRW+0x29
v8jsi.dll!CrashForExceptionInNonABICompliantCodeRange+0x4092f6
v8jsi.dll!CrashForExceptionInNonABICompliantCodeRange+0x11ff2
v8jsi.dll!v8_inspector::V8StackTrace::topScriptIdAsInteger+0x43ad0
ucrtbase.dll!thread_start<unsigned int (__cdecl*)(void *),1>+0x42
KERNEL32.DLL!BaseThreadInitThunk+0x14
ntdll.dll!RtlUserThreadStart+0x21

v8jsi.dll is the React Native v8 engine – it’s creating many threads, most of which are doing nothing. I found it in Outlook and PowerPoint as well.

Many applications today depend on various libraries and frameworks, some of which don’t seem to care too much about using threads economically – examples include Node.js, the Electron framework, even Java and .NET. Threads are not free – there is the ETHREAD and related data structures in the kernel, stack in kernel space, and stack in user space. Context switches and code run by the kernel scheduler when threads change states from Running to Waiting, and from Waiting to Ready are not free, either.

Many desktop/laptop systems today are very powerful and it might seem everything is fine. I don’t think so. Developers use so many layers of abstraction these days, that we sometimes forget there are actual processors that execute the code, and need to use memory and other resources. None of that is free.

Registration is open for the Windows Internals training

My schedule has been a mess in recent months, and continues to be so for the next few months. However, I am opening registration today for the Windows Internals training with some date changes from my initial plan.

Here are the dates and times (all based on London time) – 5 days total:

  • July 6: 4pm to 12am (full day)
  • July 7: 4pm to 8pm
  • July 11: 4pm to 12am (full day)
  • July 12, 13, 14, 18, 19: 4pm to 8pm

Training cost is 800 USD, if paid by an individual, or 1500 USD if paid by a company. Participants from Ukraine (please provide some proof) are welcome with a 90% discount (paying 80 USD, individual payments only).

If you’d like to register, please send me an email to zodiacon@live.com with “Windows Internals training” in the title, provide your full name, company (if any), preferred contact email, and your time zone. The basic syllabus can be found here. if you’ve sent me an email before when I posted about my upcoming classes, you don’t have to do that again – I will send full details soon.

The sessions will be recorded, so can watch any part you may be missing, or that may be somewhat overwhelming in “real time”.

As usual, if you have any questions, feel free to send me an email, or DM me on twitter (@zodiacon) or Linkedin (https://www.linkedin.com/in/pavely/).

Planned Upcoming Classes

Some people asked me if I had a schedule of trainings I plan to do in the coming months. Well, here it is. At this point I would like to gauge interest, and plan the course hours (time zone) to accommodate the majority of participants.

I am moving to classes that are partly full days and partly half days. The full day sessions are going to be recorded for the participants (so that if anyone misses something because of urgent work, inconvenient time zone, etc., the recording should help). The half days are not recorded, and should be easier to handle, since they are only about 4 hours long.

Here is the planned course list with dates (f=full day, all others are half-days). The cost is in USD (paid by individual / paid by a company):

  • COM Programming with C++ (3 days): April 25, 26, 27, 28, May 2, 3 (Cost: 700/1300)
  • Windows System Programming (5 days): May 16 (f), 17, 18, 19, 23 (f), 24, 25, 26 (Cost: 800/1500)
  • Windows Kernel Programming (4 days): June 6 (f), 8, 9, 13 (f), 14 (Cost: 800/1500)
  • Windows Internals (5 days): July 11 (f), 12, 13, 14, 18 (f), 19, 20, 21 (Cost: 800/1500)
  • Advanced Kernel Programming (New!) (4 days): September 12 (f), 13, 14, 15, 19, 20, 21 (Cost: 800/1500)

“Advanced Kernel Programming” is a new class I’m planning, suitable for those who participated in “Windows Kernel Programming” (or have equivalent knowledge). This course will cover file system mini-filters, NDIS filters, and the Windows Filtering Platform (WFP), along with other advanced programming techniques.

I may add more classes after September, but it’s too far from now to make such a commitment.

If you are interested in one or more of these classes, please write an email to zodiacon@live.com, and provide your name, preferred contact email, and your time zone. It’s not a commitment on your part, you may change your mind later on, but it should be genuine, where the dates and topics work for you.

Also, if you have other classes you would be happy if I deliver, you are welcome to suggest them. No promises, of course, but if there is enough interest, I will consider creating and delivering them.

If you’d like a private class for your team, get in touch. Syllabi can be customized as needed.

Have a great year ahead!

Icon Handler with ATL

One of the exercises I gave at the recent COM Programming class was to build an Icon Handler that integrates with the Windows Shell, where DLLs should have an icon based on their “bitness” – whether they’re 64-bit or 32-bit Portable Executable (PE).

The Shell provides many opportunities for extensibility. An Icon Handler is one of the simplest, but still requires writing a full-fledged COM component that implements certain interfaces that the shell expects. Here is the result of using the Icon Handler DLL, showing the folders c:\Windows\System32 and c:\Windows\SysWow64 (large icons for easier visibility).

C:\Windows\System32
C:\Windows\SysWow64

Let’s see how to build such an icon handler. The full code is at zodiacon/DllIconHandler.

The first step is to create a new ATL project in Visual Studio. I’ll be using Visual Studio 2022, but any recent version would work essentially the same way (e.g. VS 2019, or 2017). Locate the ATL project type by searching in the terrible new project dialog introduced in VS 2019 and still horrible in VS 2022.

ATL (Active Template Library) is certainly not the only way to build COM components. “Pure” C++ would work as well, but ATL provides all the COM boilerplate such as the required exported functions, class factories, IUnknown implementations, etc. Since ATL is fairly “old”, it lacks the elegance of other libraries such as WRL and WinRT, as it doesn’t take advantage of C++11 and later features. Still, ATL has withstood the test of time, is robust, and full featured when it comes to COM, something I can’t say for these other alternatives.

If you can’t locate the ATL project, you may not have ATL installed propertly. Make sure the C++ Desktop development workload is installed using the Visual Studio Installer.

Click Next and select a project name and location:

Click Create to launch the ATL project wizard. Leave all defaults (Dynamic Link Library) and click OK. Shell extensions of all kinds must be DLLs, as these are loaded by Explorer.exe. It’s not ideal in terms of Explorer’s stability, as aun unhandled exception can bring down the entire process, but this is necessary to get good performance, as no inter-process calls are made.

Two projects are created, named DllIconHandler and DllIconHandlerPS. The latter is a proxy/stub DLL that maybe useful if cross-apartment COM calls are made. This is not needed for shell extensions, so the PS project should simply be removed from the solution.


A detailed discussion of COM is way beyond the scope of this post.


The remaining project contains the COM DLL required code, such as the mandatory exported function, DllGetClassObject, and the other optional but recommended exports (DllRegisterServer, DllUnregisterServer, DllCanUnloadNow and DllInstall). This is one of the nice benefits of working with ATL project for COM component development: all the COM boilerplate is implemented by ATL.

The next step is to add a COM class that will implement our icon handler. Again, we’ll turn to a wizard provided by Visual Studio that provides the fundamentals. Right-click the project and select Add Item… (don’t select Add Class as it’s not good enough). Select the ATL node on the left and ATL Simple Object on the right. Set the name to something like IconHandler:

Click Add. The ATL New Object wizard opens up. The name typed in the Add New Item dialog is used as a basis for generating names for source code elements (like the C++ class) and COM elements (that would be written into the IDL file and the resulting type library). Since we’re not going to define a new interface (we need to implement explorer-defined interfaces), there is no real need to tweak anything. You can click Finish to generate the class.

Three files are added with this last step: IconHandler.h, IconHandler.cpp and IconHandler.rgs. The C++ source files role is obvious – implementing the Icon Handler. The rgs file contains a script in an ATL-provided “language” indicating what information to write to the Registry when this DLL is registered (and what to remove if it’s unregistered).

The IDL (Interface Definition Language) file has also been modified, adding the definitions of the wizard generated interface (which we don’t need) and the coclass. We’ll leave the IDL alone, as we do need it to generate the type library of our component because the ATL registration code uses it internally.

If you look in IconHandler.h, you’ll see that the class implements the IIconHandler empty interface generated by the wizard that we don’t need. It even derives from IDispatch:

class ATL_NO_VTABLE CIconHandler :
	public CComObjectRootEx<CComSingleThreadModel>,
	public CComCoClass<CIconHandler, &CLSID_IconHandler>,
	public IDispatchImpl<IIconHandler, &IID_IIconHandler, &LIBID_DLLIconHandlerLib, /*wMajor =*/ 1, /*wMinor =*/ 0> {

We can leave the IDispatchImpl-inheritance, since it’s harmless. But it’s useless as well, so let’s delete it and also delete the interfaces IIconHandler and IDispatch from the interface map located further down:

class ATL_NO_VTABLE CIconHandler :
	public CComObjectRootEx<CComSingleThreadModel>,
	public CComCoClass<CIconHandler, &CLSID_IconHandler> {
public:
	BEGIN_COM_MAP(CIconHandler)
	END_COM_MAP()

(I have rearranged the code a bit). Now we need to add the interfaces we truly have to implement for an icon handler: IPersistFile and IExtractIcon. To get their definitions, we’ll add an #include for <shlobj_core.h> (this is documented in MSDN). We add the interfaces to the inheritance hierarchy, the COM interface map, and use the Visual Studio feature to add the interface members for us by right-clicking the class name (CIconHandler), pressing Ctrl+. (dot) and selecting Implement all pure virtuals of CIconHandler. The resulting class header looks something like this (some parts omitted for clarity) (I have removed the virtual keyword as it’s inherited and doesn’t have to be specified in derived types):

class ATL_NO_VTABLE CIconHandler :
	public CComObjectRootEx<CComSingleThreadModel>,
	public CComCoClass<CIconHandler, &CLSID_IconHandler>,
	public IPersistFile,
	public IExtractIcon {
public:
	BEGIN_COM_MAP(CIconHandler)
		COM_INTERFACE_ENTRY(IPersistFile)
		COM_INTERFACE_ENTRY(IExtractIcon)
	END_COM_MAP()

//...
	// Inherited via IPersistFile
	HRESULT __stdcall GetClassID(CLSID* pClassID) override;
	HRESULT __stdcall IsDirty(void) override;
	HRESULT __stdcall Load(LPCOLESTR pszFileName, DWORD dwMode) override;
	HRESULT __stdcall Save(LPCOLESTR pszFileName, BOOL fRemember) override;
	HRESULT __stdcall SaveCompleted(LPCOLESTR pszFileName) override;
	HRESULT __stdcall GetCurFile(LPOLESTR* ppszFileName) override;

	// Inherited via IExtractIconW
	HRESULT __stdcall GetIconLocation(UINT uFlags, PWSTR pszIconFile, UINT cchMax, int* piIndex, UINT* pwFlags) override;
	HRESULT __stdcall Extract(PCWSTR pszFile, UINT nIconIndex, HICON* phiconLarge, HICON* phiconSmall, UINT nIconSize) override;
};

Now for the implementation. The IPersistFile interface seems non-trivial, but fortunately we just need to implement the Load method for an icon handler. This is where we get the file name we need to inspect. To check whether a DLL is 64 or 32 bit, we’ll add a simple enumeration and a helper function to the CIconHandler class:

	enum class ModuleBitness {
		Unknown,
		Bit32,
		Bit64
	};
	static ModuleBitness GetModuleBitness(PCWSTR path);

The implementation of IPersistFile::Load looks something like this:

HRESULT __stdcall CIconHandler::Load(LPCOLESTR pszFileName, DWORD dwMode) {
    ATLTRACE(L"CIconHandler::Load %s\n", pszFileName);

    m_Bitness = GetModuleBitness(pszFileName);
    return S_OK;
}

The method receives the full path of the DLL we need to examine. How do we know that only DLL files will be delivered? This has to do with the registration we’ll make for the icon handler. We’ll register it for DLL file extensions only, so that other file types will not be provided. Calling GetModuleBitness (shown later) performs the real work of determining the DLL’s bitness and stores the result in m_Bitness (a data member of type ModuleBitness).

All that’s left to do is tell explorer which icon to use. This is the role of IExtractIcon. The Extract method can be used to provide an icon handle directly, which is useful if the icon is “dynamic” – perhaps generated by different means in each case. In this example, we just need to return one of two icons which have been added as resources to the project (you can find those in the project source code. This is also an opportunity to provide your own icons).

For our case, it’s enough to return S_FALSE from Extract that causes explorer to use the information returned from GetIconLocation. Here is its implementation:

HRESULT __stdcall CIconHandler::GetIconLocation(UINT uFlags, PWSTR pszIconFile, UINT cchMax, int* piIndex, UINT* pwFlags) {
    if (s_ModulePath[0] == 0) {
        ::GetModuleFileName(_AtlBaseModule.GetModuleInstance(), 
            s_ModulePath, _countof(s_ModulePath));
        ATLTRACE(L"Module path: %s\n", s_ModulePath);
    }
    if (s_ModulePath[0] == 0)
        return S_FALSE;

    if (m_Bitness == ModuleBitness::Unknown)
        return S_FALSE;

    wcscpy_s(pszIconFile, wcslen(s_ModulePath) + 1, s_ModulePath);
    ATLTRACE(L"CIconHandler::GetIconLocation: %s bitness: %d\n", 
        pszIconFile, m_Bitness);
    *piIndex = m_Bitness == ModuleBitness::Bit32 ? 0 : 1;
    *pwFlags = GIL_PERINSTANCE;

    return S_OK;
}

The method’s purpose is to return the current (our icon handler DLL) module’s path and the icon index to use. This information is enough for explorer to load the icon itself from the resources. First, we get the module path to where our DLL has been installed. Since this doesn’t change, it’s only retrieved once (with GetModuleFileName) and stored in a static variable (s_ModulePath).

If this fails (unlikely) or the bitness could not be determined (maybe the file was not a PE at all, but just had such an extension), then we return S_FALSE. This tells explorer to use the default icon for the file type (DLL). Otherwise, we store 0 or 1 in piIndex, based on the IDs of the icons (0 corresponds to the lower of the IDs).

Finally, we need to set a flag inside pwFlags to indicate to explorer that this icon extraction is required for every file (GIL_PERINSTANCE). Otherwise, explorer calls IExtractIcon just once for any DLL file, which is the opposite of what we want.

The final piece of the puzzle (in terms of code) is how to determine whether a PE is 64 or 32 bit. This is not the point of this post, as any custom algorithm can be used to provide different icons for different files of the same type. For completeness, here is the code with comments:

CIconHandler::ModuleBitness CIconHandler::GetModuleBitness(PCWSTR path) {
    auto bitness = ModuleBitness::Unknown;
    //
    // open the DLL as a data file
    //
    auto hFile = ::CreateFile(path, GENERIC_READ, FILE_SHARE_READ, 
        nullptr, OPEN_EXISTING, 0, nullptr);
    if (hFile == INVALID_HANDLE_VALUE)
        return bitness;

    //
    // create a memory mapped file to read the PE header
    //
    auto hMemMap = ::CreateFileMapping(hFile, nullptr, PAGE_READONLY, 0, 0, nullptr);
    ::CloseHandle(hFile);
    if (!hMemMap)
        return bitness;

    //
    // map the first page (where the header is located)
    //
    auto p = ::MapViewOfFile(hMemMap, FILE_MAP_READ, 0, 0, 1 << 12);
    if (p) {
        auto header = ::ImageNtHeader(p);
        if (header) {
            auto machine = header->FileHeader.Machine;
            bitness = header->Signature == IMAGE_NT_OPTIONAL_HDR64_MAGIC ||
                machine == IMAGE_FILE_MACHINE_AMD64 || machine == IMAGE_FILE_MACHINE_ARM64 ?
                ModuleBitness::Bit64 : ModuleBitness::Bit32;
        }
        ::UnmapViewOfFile(p);
    }
    ::CloseHandle(hMemMap);
    
    return bitness;
}

To make all this work, there is still one more concern: registration. Normal COM registration is necessary (so that the call to CoCreateInstance issued by explorer has a chance to succeed), but not enough. Another registration is needed to let explorer know that this icon handler exists, and is to be used for files with the extension “DLL”.

Fortunately, ATL provides a convenient mechanism to add Registry settings using a simple script-like configuration, which does not require any code. The added keys/values have been placed in DllIconHandler.rgs like so:

HKCR
{
	NoRemove DllFile
	{
		NoRemove ShellEx
		{
			IconHandler = s '{d913f592-08f1-418a-9428-cc33db97ed60}'
		}
	}
}

This sets an icon handler in HKEY_CLASSES_ROOT\DllFile\ShellEx, where the IconHandler value specifies the CLSID of our component. You can find the CLSID in the IDL file where the coclass element is defined:

[
	uuid(d913f592-08f1-418a-9428-cc33db97ed60)
]
coclass IconHandler {

Replace your own CLSID if you’re building this project from scratch. Registration itself is done with the RegSvr32 built-in tool. With an ATL project, a successful build also causes RegSvr32 to be invoked on the resulting DLL, thus performing registration. The default behavior is to register in HKEY_CLASSES_ROOT which uses HKEY_LOCAL_MACHINE behind the covers. This requires running Visual Studio elevated (or an elevated command window if called from outside VS). It will register the icon handler for all users on the machine. If you prefer to register for the current user only (which uses HKEY_CURRENT_USER and does not require running elevated), you can set the per-user registration in VS by going to project properties, clinking on the Linker element and setting per-user redirection:

If you’re registering from outside VS, the per-user registration is achieved with:

regsvr32 /n /i:user <dllpath>

This is it! The full source code is available here.

Processes, Threads, and Windows

The relationship between processes and threads is fairly well known – a process contains one or more threads, running within the process, using process wide resources, like handles and address space.

Where do windows (those usually-rectangular elements, not the OS) come into the picture?

The relationship between a process, its threads, and windows, along with desktop and Window Stations, can be summarized in the following figure:

A process, threads, windows, desktops and a Window Station

A window is created by a thread, and that thread is designated as the owner of the window. The creating thread becomes a UI thread (if it’s the first User/GDI call it makes), getting a message queue (provided internally by Win32k.sys in the kernel). Every occurrence in the created window causes a message to be placed in the message queue of the owner thread. This means that if a thread creates 100 windows, it’s responsible for all these windows.

“Responsible” here means that the thread must perform something known as “message pumping” – pulling messages from its message queue and processing them. Failure to do that processing would lead to the infamous “Not Responding” scenario, where all the windows created by that thread become non-responsive. Normally, a thread can read its own message queue only – other threads can’t do it for that thread – this is a single threaded UI scheme used by the Windows OS by default. The simplest message pump would look something like this:

MSG msg;
while(::GetMessage(&msg, nullptr, 0, 0)) {
    ::TranslateMessage(&msg);
    ::DispatchMessage(&msg);
}

The call to GetMessage does not return until there is a message in the queue (filling up the MSG structure), or the WM_QUIT message has been retrieved, causing GetMessage to return FALSE and the loop exited. WM_QUIT is the message typically used to signal an application to close. Here is the MSG structure definition:

typedef struct tagMSG {
  HWND   hwnd;
  UINT   message;
  WPARAM wParam;
  LPARAM lParam;
  DWORD  time;
  POINT  pt;
} MSG, *PMSG;

Once a message has been retrieved, the important call is to DispatchMessage, that looks at the window handle in the message (hwnd) and looks up the window class that this window instance is a member of, and there, finally, the window procedure – the callback to invoke for messages destined to these kinds of windows – is invoked, handling the message as appropriate or passing it along to default processing by calling DefWindowProc. (I may expand on that if there is interest in a separate post).

You can enumerate the windows owned by a thread by calling EnumThreadWindows. Let’s see an example that uses the Toolhelp API to enumerate all processes and threads in the system, and lists the windows created (owned) by each thread (if any).

We’ll start by creating a snapshot of all processes and threads (error handling mostly omitted for clarity):

#include <Windows.h>
#include <TlHelp32.h>
#include <unordered_map>
#include <string>

int main() {
	auto hSnapshot = ::CreateToolhelp32Snapshot(
        TH32CS_SNAPPROCESS | TH32CS_SNAPTHREAD, 0);

We’ll build a map of process IDs to names for easy lookup later on when we show details of a process:

PROCESSENTRY32 pe;
pe.dwSize = sizeof(pe);

// we can skip the idle process
::Process32First(hSnapshot, &pe);

std::unordered_map<DWORD, std::wstring> processes;
processes.reserve(500);

while (::Process32Next(hSnapshot, &pe)) {
	processes.insert({ pe.th32ProcessID, pe.szExeFile });
}

Now we loop through all the threads, calling EnumThreadWindows and showing the results:

THREADENTRY32 te;
te.dwSize = sizeof(te);
::Thread32First(hSnapshot, &te);

static int count = 0;

struct Data {
	DWORD Tid;
	DWORD Pid;
	PCWSTR Name;
};
do {
	if (te.th32OwnerProcessID <= 4) {
		// idle and system processes have no windows
		continue;
	}
	Data d{ 
        te.th32ThreadID, te.th32OwnerProcessID, 
        processes.at(te.th32OwnerProcessID).c_str() 
    };
	::EnumThreadWindows(te.th32ThreadID, [](auto hWnd, auto param) {
		count++;
		WCHAR text[128], className[32];
		auto data = reinterpret_cast<Data*>(param);
		int textLen = ::GetWindowText(hWnd, text, _countof(text));
		int classLen = ::GetClassName(hWnd, className, _countof(className));
		printf("TID: %6u PID: %6u (%ws) HWND: 0x%p [%ws] %ws\n",
			data->Tid, data->Pid, data->Name, hWnd,
			classLen == 0 ? L"" : className,
			textLen == 0 ? L"" : text);
		return TRUE;	// bring in the next window
		}, reinterpret_cast<LPARAM>(&d));
} while (::Thread32Next(hSnapshot, &te));
printf("Total windows: %d\n", count);

EnumThreadWindows requires the thread ID to look at, and a callback that receives a window handle and whatever was passed in as the last argument. Returning TRUE causes the callback to be invoked with the next window until the window list for that thread is complete. It’s possible to return FALSE to terminate the enumeration early.

I’m using a lambda function here, but only a non-capturing lambda can be converted to a C function pointer, so the param value is used to provide context for the lambda, with a Data instance having the thread ID, its process ID, and the process name.

The call to GetClassName returns the name of the window class, or type of window, if you will. This is similar to the relationship between objects and classes in object orientation. For example, all buttons have the class name “button”. Of course, the windows enumerated are only top level windows (have no parent) – child windows are not included. It is possible to enumerate child windows by calling EnumChildWindows.

Going the other way around – from a window handle to its owner thread and *its* owner process is possible with GetWindowThreadProcessId.

Desktops

A window is placed on a desktop, and is bound to it. More precisely, a thread is bound to a desktop once it creates its first window (or has any hook installed with SetWindowsHookEx). Before that happens, a thread can attach itself to a desktop by calling SetThreadDesktop.

Normally, a thread will use the “default” desktop, which is where a logged in user sees his or her windows appearing. It is possible to create additional desktops using the CreateDesktop API. A classic example is the desktops Sysinternals tool. See my post on desktops and windows stations for more information.

It’s possible to redirect a new process to use a different desktop (and optionally a window station) as part of the CreateProcess call. The STARTUPINFO structure has a member named lpDesktop that can be set to a string with the format “WindowStationName\DesktopName” or simply “DesktopName” to keep the parent’s window station. This could look something like this:

STARTUPINFO si = { sizeof(si) };
// WinSta0 is the interactive window station
WCHAR desktop[] = L"Winsta0\\MyOtherDesktop";
si.lpDesktop = desktop;
PROCESS_INFORMATION pi;
::CreateProcess(..., &si, &pi);

“WinSta0” is always the name of the interactive Window Station (one of its desktops can be visible to the user).

The following example creates 3 desktops and launches notepad in the default desktop and the other desktops:

void LaunchInDesktops() {
	HDESK hDesktop[4] = { ::GetThreadDesktop(::GetCurrentThreadId()) };
	for (int i = 1; i <= 3; i++) {
		hDesktop[i] = ::CreateDesktop(
			(L"Desktop" + std::to_wstring(i)).c_str(),
			nullptr, nullptr, 0, GENERIC_ALL, nullptr);
	}
	STARTUPINFO si = { sizeof(si) };
	WCHAR name[32];
	si.lpDesktop = name;
	DWORD len;
	PROCESS_INFORMATION pi;
	for (auto h : hDesktop) {
		::GetUserObjectInformation(h, UOI_NAME, name, sizeof(name), &len);
		WCHAR pname[] = L"notepad";
		if (::CreateProcess(nullptr, pname, nullptr, nullptr, FALSE,
			0, nullptr, nullptr, &si, &pi)) {
			printf("Process created: %u\n", pi.dwProcessId);
			::CloseHandle(pi.hProcess);
			::CloseHandle(pi.hThread);
		}
	}
}

If yoy run the above function, you’ll see one Notepad window on your desktop. If you dig deeper, you’ll find 4 notepad.exe processes in Task Manager. The other three have created their windows in different desktops. Task Manager gives no indication of that, nor does the process list in Process Explorer. However, we see a hint in the list of handles of these “other” notepad processes. Here is one:

Handle named \Desktop1

Curiously enough, desktop objects are not stored as part of the kernel’s Object Manager namespace. If you double-click the handle, copy its address, and look it up in the kernel debugger, this is the result:

lkd> !object 0xFFFFAC8C12035530
Object: ffffac8c12035530  Type: (ffffac8c0f2fdbc0) Desktop
    ObjectHeader: ffffac8c12035500 (new version)
    HandleCount: 1  PointerCount: 32701
    Directory Object: 00000000  Name: Desktop1

Notice the directory object address is zero – it’s not part of any directory. It has meaning in the context of its containing Window Station.

You might think that we can use the EnumThreadWindows as demonstrated at the beginning of this post to locate the other notepad windows. Unfortunately, access is restricted and those notepad windows are not listed (more on that possibly in a future post).

We can try a different approach by using yet another enumration function – EnumDesktopWindows. Given a powerful enough desktop handle, we can enumerate all top level windows in that desktop:

void ListDesktopWindows(HDESK hDesktop) {
	::EnumDesktopWindows(hDesktop, [](auto hWnd, auto) {
		WCHAR text[128], className[32];
		int textLen = ::GetWindowText(hWnd, text, _countof(text));
		int classLen = ::GetClassName(hWnd, className, _countof(className));
		DWORD pid;
		auto tid = ::GetWindowThreadProcessId(hWnd, &pid);
		printf("TID: %6u PID: %6u HWND: 0x%p [%ws] %ws\n",
			tid, pid, hWnd, 
			classLen == 0 ? L"" : className,
			textLen == 0 ? L"" : text);
		return TRUE;
		}, 0);
}

The lambda body is very similar to the one we’ve seen earlier. The difference is that we start with a window handle with no thread ID. But it’s fairly easy to get the owner thread and process with GetWindowThreadProcessId. Here is one way to invoke this function:

auto hDesktop = ::OpenDesktop(L"Desktop1", 0, FALSE, DESKTOP_ENUMERATE);
ListDesktopWindows(hDesktop);

Here is the resulting output:

TID:   3716 PID:  23048 HWND: 0x0000000000192B26 [Touch Tooltip Window] Tooltip
TID:   3716 PID:  23048 HWND: 0x0000000001971B1A [UAC_InputIndicatorOverlayWnd]
TID:   3716 PID:  23048 HWND: 0x00000000002827F2 [UAC Input Indicator]
TID:   3716 PID:  23048 HWND: 0x0000000000142B82 [CiceroUIWndFrame] CiceroUIWndFrame
TID:   3716 PID:  23048 HWND: 0x0000000000032C7A [CiceroUIWndFrame] TF_FloatingLangBar_WndTitle
TID:  51360 PID:  23048 HWND: 0x0000000000032CD4 [Notepad] Untitled - Notepad
TID:   3716 PID:  23048 HWND: 0x0000000000032C8E [CicLoaderWndClass]
TID:  51360 PID:  23048 HWND: 0x0000000000202C62 [IME] Default IME
TID:  51360 PID:  23048 HWND: 0x0000000000032C96 [MSCTFIME UI] MSCTFIME UI

A Notepad window is clearly there. We can repeat the same exercise with the other two desktops, which I have named “Desktop2” and “Desktop3”.

Can we view and interact with these “other” notepads? The SwitchDesktop API exists for that purpose. Given a desktop handle with the DESKTOP_SWITCHDESKTOP access, SwitchDesktop changes the input desktop to the requested desktop, so that input devices redirect to that desktop. This is exactly how the Sysinternals Desktops tool works. I’ll leave the interested reader to try this out.

Window Stations

The last piece of the puzzle is a Window Station. I assume you’ve read my earlier post and understand the basics of Window Stations.

A Thread can be associated with a desktop. A process is associated with a window station, with the constraint being that any desktop used by a thread must be within the process’ window station. In other words, a process cannot use one window station and any one of its threads using a desktop that belongs to a different window station.

A process is associated with a window station upon creation – the one specified in the STARTUPINFO structure or the creator’s window station if not specified. Still, a process can associate itself with a different window station by calling SetProcessWindowStation. The constraint is that the provided window station must be part of the current session. There is one window station per session named “WinSta0”, called the interactive window station. One of its desktop can be the input desktop and have user interaction. All other window stations are non-interactive by definition, which is perfectly fine for processes that don’t require any user interface. In return, they get better isolation, because a window station contains its own set of desktops, its own clipboard and its own atom table. Windows handles, by the way, have Window Station scope.

Just like desktops, window stations can be created by calling CreateWindowStation. Window stations can be enumerated as well (in the current session) with EnumerateWindowStations.

Summary

This discussion of threads, processes, desktops, and window stations is not complete, but hopefully gives you a good idea of how things work. I may elaborate on some advanced aspects of Windows UI system in future posts.

Dynamic Symbolic Links

While teaching a Windows Internals class recently, I came across a situation which looked like a bug to me, but turned out to be something I didn’t know about – dynamic symbolic links.

Symbolic links are Windows kernel objects that point to another object. The weird situation in question was when running WinObj from Sysinternals and navigating to the KenrelObjects object manager directory.

WinObj from Sysinternals

You’ll notice some symbolic link objects that look weird: MemoryErrors, PhysicalMemoryChange, HighMemoryCondition, LowMemoryCondition and a few others. The weird thing that is fairly obvious is that these symbolic link objects have empty targets. Double-clicking any one of them confirms no target, and also shows a curious zero handles, as well as quota change of zero:

Symbolic link properties

To add to the confusion, searching for any of them with Process Explorer yields something like this:

It seems these objects are events, and not symbolic links!

My first instinct was that there is a bug in WinObj (I rewrote it recently for Sysinternals, so was certain I introduced a bug). I ran an old WinObj version, but the result was the same. I tried other tools with similar functionality, and still got the same results. Maybe a bug in Process Explorer? Let’s see in the kernel debugger:

lkd> !object 0xFFFF988110EC0C20
Object: ffff988110ec0c20  Type: (ffff988110efb400) Event
    ObjectHeader: ffff988110ec0bf0 (new version)
    HandleCount: 4  PointerCount: 117418
    Directory Object: ffff828b10689530  Name: HighCommitCondition

Definitely an event and not a symbolic link. What’s going on? I debugged it in WinObj, and indeed the reported object type is a symbolic link. Maybe it’s a bug in the NtQueryDirectoryObject used to query a directory object for an object.

I asked Mark Russinovich, could there be a bug in Windows? Mark remembered that this is not a bug, but a feature of symbolic links, where objects can be created/resolved dynamically when accessing the symbolic link. Let’s see if we can see something in the debugger:

lkd> !object \kernelobjects\highmemorycondition
Object: ffff828b10659510  Type: (ffff988110e9ba60) SymbolicLink
    ObjectHeader: ffff828b106594e0 (new version)
    HandleCount: 0  PointerCount: 1
    Directory Object: ffff828b10656ce0  Name: HighMemoryCondition
    Flags: 0x000010 ( Local )
    Target String is '*** target string unavailable ***'

Clearly, there is target, but notice the flags value 0x10. This is the flag indicating the symbolic link is a dynamic one. To get further information, we need to look at the object with a “symbolic link lenses” by using the data structure the kernel uses to represent symbolic links:

lkd> dt nt!_OBJECT_SYMBOLIC_LINK ffff828b10659510

   +0x000 CreationTime     : _LARGE_INTEGER 0x01d73d87`21bd21e5
   +0x008 LinkTarget       : _UNICODE_STRING "--- memory read error at address 0x00000000`00000005 ---"
   +0x008 Callback         : 0xfffff802`08512250     long  nt!MiResolveMemoryEvent+0

   +0x010 CallbackContext  : 0x00000000`00000005 Void
   +0x018 DosDeviceDriveIndex : 0
   +0x01c Flags            : 0x10
   +0x020 AccessMask       : 0x24

The Callback member shows the function that is being called (MiResolveMemoryEvent) that “resolves” the symbolic link to the relevant event. There are currently 11 such events, their names visible with the following:

lkd> dx (nt!_UNICODE_STRING*)&nt!MiMemoryEventNames,11
(nt!_UNICODE_STRING*)&nt!MiMemoryEventNames,11                 : 0xfffff80207e02e90 [Type: _UNICODE_STRING *]
    [0]              : "\KernelObjects\LowPagedPoolCondition" [Type: _UNICODE_STRING]
    [1]              : "\KernelObjects\HighPagedPoolCondition" [Type: _UNICODE_STRING]
    [2]              : "\KernelObjects\LowNonPagedPoolCondition" [Type: _UNICODE_STRING]
    [3]              : "\KernelObjects\HighNonPagedPoolCondition" [Type: _UNICODE_STRING]
    [4]              : "\KernelObjects\LowMemoryCondition" [Type: _UNICODE_STRING]
    [5]              : "\KernelObjects\HighMemoryCondition" [Type: _UNICODE_STRING]
    [6]              : "\KernelObjects\LowCommitCondition" [Type: _UNICODE_STRING]
    [7]              : "\KernelObjects\HighCommitCondition" [Type: _UNICODE_STRING]
    [8]              : "\KernelObjects\MaximumCommitCondition" [Type: _UNICODE_STRING]
    [9]              : "\KernelObjects\MemoryErrors" [Type: _UNICODE_STRING]
    [10]             : "\KernelObjects\PhysicalMemoryChange" [Type: _UNICODE_STRING]

Creating dynamic symbolic links is only possible from kernel mode, of course, and is undocumented anyway.

At least the conundrum is solved.

Creating Registry Links

The standard Windows Registry contains some keys that are not real keys, but instead are symbolic links (or simply, links) to other keys. For example, the key HKEY_LOCAL_MACHINE\System\CurrentControlSet is a symbolic link to HKEY_LOCAL_MACHINE\System\ControlSet001 (in most cases). When working with the standard Registry editor, RegEdit.exe, symbolic links look like normal keys, in the sense that they behave as the link’s target. The following figure shows the above mentioned keys. They look exactly the same (and they are).

There are several other existing links in the Registry. As another example, the hive HKEY_CURRENT_CONFIG is a link to (HKLM is HKEY_LOCAL_MACHINE) HKLM\SYSTEM\CurrentControlSet\Hardware Profiles\Current.

But how to do you create such links yourself? The official Microsoft documentation has partial details on how to do it, and it misses two critical pieces of information to make it work.

Let’s see if we can create a symbolic link. One rule of Registry links, is that the link must point to somewhere within the same hive where the link is created; we can live with that. For demonstration purposes, we’ll create a link in HKEY_CURRENT_USER named DesktopColors that links to HKEY_CURRENT_USER\Control Panel\Desktop\Colors.

The first step is to create the key and specify it to be a link rather than a normal key (error handling omitted):

HKEY hKey;
RegCreateKeyEx(HKEY_CURRENT_USER, L"DesktopColors", 0, nullptr,
	REG_OPTION_CREATE_LINK, KEY_WRITE, nullptr, &hKey, nullptr);

The important part is that REG_OPTION_CREATE_LINK flag that indicates this is supposed to be a link rather than a standard key. The KEY_WRITE access mask is required as well, as we are about to set the link’s target.

Now comes the first tricky part. The documentation states that the link’s target should be written to a value named “SymbolicLinkValue” and it must be an absolute registry path. Sounds easy enough, right? Wrong. The issue here is the “absolute path” – you might think that it should be something like “HKEY_CURRENT_USER\Control Panel\Desktop\Colors” just like we want, but hey – maybe it’s supposed to be “HKCU” instead of “HKEY_CURRENT_USER” – it’s just a string after all.

It turns out both these variants are wrong. The “absolute path” required here is a native Registry path that is not visible in RegEdit.exe, but it is visible in my own Registry editing tool, RegEditX.exe, downloadable from https://github.com/zodiacon/AllTools. Here is a screenshot, showing the “real” Registry vs. the view we get with RegEdit.

This top view is the “real” Registry is seen by the Windows kernel. Notice there is no HKEY_CURRENT_USER, there is a USER key where subkeys exist that represent users on this machine based on their SIDs. These are mostly visible in the standard Registry under the HKEY_USERS hive.

The “absolute path” needed is based on the real view of the Registry. Here is the code that writes the correct path based on my (current user’s) SID:

WCHAR path[] = L"\\REGISTRY\\USER\\S-1-5-21-2575492975-396570422-1775383339-1001\\Control Panel\\Desktop\\Colors";
RegSetValueEx(hKey, L"SymbolicLinkValue", 0, REG_LINK, (const BYTE*)path,
    wcslen(path) * sizeof(WCHAR));

The above code shows the second (undocumented, as far as I can tell) piece of crucial information – the length of the link path (in bytes) must NOT include the NULL terminator. Good luck guessing that 🙂

And that’s it. We can safely close the key and we’re done.

Well, almost. If you try to delete your newly created key using RegEdit.exe – the target is deleted, rather than the link key itself! So, how do you delete the key link? (My RegEditX does not support this yet).

The standard RegDeleteKey and RegDeleteKeyEx APIs are unable to delete a link. Even if they’re given a key handle opened with REG_OPTION_OPEN_LINK – they ignore it and go for the target. The only API that works is the native NtDeleteKey function (from NtDll.Dll).

First, we add the function’s declaration and the NtDll import:

extern "C" int NTAPI NtDeleteKey(HKEY);

#pragma comment(lib, "ntdll")

Now we can delete a link key like so:

HKEY hKey;
RegOpenKeyEx(HKEY_CURRENT_USER, L"DesktopColors", REG_OPTION_OPEN_LINK, 
    DELETE, &hKey);
NtDeleteKey(hKey);

As a final note, RegCreateKeyEx cannot open an existing link key – it can only create one. This in contrast to standard keys that can be created OR opened with RegCreateKeyEx. This means that if you want to change an existing link’s target, you have to call RegOpenKeyEx first (with REG_OPTION_OPEN_LINK) and then make the change (or delete the link key and re-create it).

Isn’t Registry fun?

How can I close a handle in another process?

Many of you are probably familiar with Process Explorer‘s ability to close a handle in any process. How can this be accomplished programmatically?

The standard CloseHandle function can close a handle in the current process only, and most of the time that’s a good thing. But what if you need, for whatever reason, to close a handle in another process?

There are two routes than can be taken here. The first one is using a kernel driver. If this is a viable option, then nothing can prevent you from doing the deed. Process Explorer uses that option, since it has a kernel driver (if launched with admin priveleges at least once). In this post, I will focus on user mode options, some of which are applicable to kernel mode as well.

The first issue to consider is how to locate the handle in question, since its value is unknown in advance. There must be some criteria for which you know how to identify the handle once you stumble upon it. The easiest (and probably most common) case is a handle to a named object.

Let take a concrete example, which I believe is now a classic, Windows Media Player. Regardless of what opnions you may have regarding WMP, it still works. One of it quirks, is that it only allows a single instance of itself to run. This is accomplished by the classic technique of creating a named mutex when WMP comes up, and if it turns out the named mutex already exists (presumabley created by an already existing instance of itself), send a message to its other instance and then exits.

The following screenshot shows the handle in question in a running WMP instance.

wmp1

This provides an opportunity to close that mutex’ handle “behind WMP’s back” and then being able to launch another instance. You can try this by manually closing the handle with Process Explorer and then launch another WMP instance successfully.

If we want to achieve this programmatically, we have to locate the handle first. Unfortunately, the documented Windows API does not provide a way to enumerate handles, not even in the current process. We have to go with the (officially undocumented) Native API if we want to enumerate handles. There two routes we can use:

  1. Enumerate all handles in the system with NtQuerySystemInformation, search for the handle in the PID of WMP.
  2. Enumerate all handles in the WMP process only, searching for the handle yet again.
  3. Inject code into the WMP process to query handles one by one, until found.

Option 3 requires code injection, which can be done by using the CreateRemoteThreadEx function, but requires a DLL that we inject. This technique is very well-known, so I won’t repeat it here. It has the advantage of not requring some of the native APIs we’ll be using shortly.

Options 1 and 2 look very similar, and for our purposes, they are. Option 1 retrieves too much information, so it’s probably better to go with option 2.

Let’s start at the beginning: we need to locate the WMP process. Here is a function to do that, using the Toolhelp API for process enumeration:

#include <windows.h>
#include <TlHelp32.h>
#include <stdio.h>

DWORD FindMediaPlayer() {
	HANDLE hSnapshot = ::CreateToolhelp32Snapshot(TH32CS_SNAPPROCESS, 0);
	if (hSnapshot == INVALID_HANDLE_VALUE)
		return 0;

	PROCESSENTRY32 pe;
	pe.dwSize = sizeof(pe);

	// skip the idle process
	::Process32First(hSnapshot, &pe);
	
	DWORD pid = 0;
	while (::Process32Next(hSnapshot, &pe)) {
		if (::_wcsicmp(pe.szExeFile, L"wmplayer.exe") == 0) {
			// found it!
			pid = pe.th32ProcessID;
			break;
		}
	}
	::CloseHandle(hSnapshot);
	return pid;
}


int main() {
	DWORD pid = FindMediaPlayer();
	if (pid == 0) {
		printf("Failed to locate media player\n");
		return 1;
	}
	printf("Located media player: PID=%u\n", pid);
	return 0;
}

Now that we have located WMP, let’s get all handles in that process. The first step is opening a handle to the process with PROCESS_QUERY_INFORMATION and PROCESS_DUP_HANDLE (we’ll see why that’s needed in a little bit):

HANDLE hProcess = ::OpenProcess(PROCESS_QUERY_INFORMATION | PROCESS_DUP_HANDLE,
	FALSE, pid);
if (!hProcess) {
	printf("Failed to open WMP process handle (error=%u)\n",
		::GetLastError());
	return 1;
}

If we can’t open a proper handle, then something is terribly wrong. Maybe WMP closed in the meantime?

Now we need to work with the native API to query the handles in the WMP process. We’ll have to bring in some definitions, which you can find in the excellent phnt project on Github (I added extern "C" declaration because we use a C++ file).

#include <memory>

#pragma comment(lib, "ntdll")

#define NT_SUCCESS(status) (status >= 0)

#define STATUS_INFO_LENGTH_MISMATCH      ((NTSTATUS)0xC0000004L)

enum PROCESSINFOCLASS {
	ProcessHandleInformation = 51
};

typedef struct _PROCESS_HANDLE_TABLE_ENTRY_INFO {
	HANDLE HandleValue;
	ULONG_PTR HandleCount;
	ULONG_PTR PointerCount;
	ULONG GrantedAccess;
	ULONG ObjectTypeIndex;
	ULONG HandleAttributes;
	ULONG Reserved;
} PROCESS_HANDLE_TABLE_ENTRY_INFO, * PPROCESS_HANDLE_TABLE_ENTRY_INFO;

// private
typedef struct _PROCESS_HANDLE_SNAPSHOT_INFORMATION {
	ULONG_PTR NumberOfHandles;
	ULONG_PTR Reserved;
	PROCESS_HANDLE_TABLE_ENTRY_INFO Handles[1];
} PROCESS_HANDLE_SNAPSHOT_INFORMATION, * PPROCESS_HANDLE_SNAPSHOT_INFORMATION;

extern "C" NTSTATUS NTAPI NtQueryInformationProcess(
	_In_ HANDLE ProcessHandle,
	_In_ PROCESSINFOCLASS ProcessInformationClass,
	_Out_writes_bytes_(ProcessInformationLength) PVOID ProcessInformation,
	_In_ ULONG ProcessInformationLength,
	_Out_opt_ PULONG ReturnLength);

The #include <memory> is for using unique_ptr<> as we’ll do soon enough. The #parma links the NTDLL import library so that we don’t get an “unresolved external” when calling NtQueryInformationProcess. Some people prefer getting the functions address with GetProcAddress so that linking with the import library is not necessary. I think using GetProcAddress is important when using a function that may not exist on the system it’s running on, otherwise the process will crash at startup, when the loader (code inside NTDLL.dll) tries to locate a function. It does not care if we check dynamically whether to use the function or not – it will crash. Using GetProcAddress will just fail and the code can handle it. In our case, NtQueryInformationProcess existed since the first Windows NT version, so I chose to go with the simplest route.

Our next step is to enumerate the handles with the process information class I plucked from the full list in the phnt project (ntpsapi.h file):

ULONG size = 1 << 10;
std::unique_ptr<BYTE[]> buffer;
for (;;) {
	buffer = std::make_unique<BYTE[]>(size);
	auto status = ::NtQueryInformationProcess(hProcess, ProcessHandleInformation, 
		buffer.get(), size, &size);
	if (NT_SUCCESS(status))
		break;
	if (status == STATUS_INFO_LENGTH_MISMATCH) {
		size += 1 << 10;
		continue;
	}
	printf("Error enumerating handles\n");
	return 1;
}

The Query* style functions in the native API request a buffer and return STATUS_INFO_LENGTH_MISMATCH if it’s not large enough or not of the correct size. The code allocates a buffer with make_unique<BYTE[]> and tries its luck. If the buffer is not large enough, it receives back the required size and then reallocates the buffer before making another call.

Now we need to step through the handles, looking for our mutex. The information returned from each handle does not include the object’s name, which means we have to make yet another native API call, this time to NtQyeryObject along with some extra required definitions:

typedef enum _OBJECT_INFORMATION_CLASS {
	ObjectNameInformation = 1
} OBJECT_INFORMATION_CLASS;

typedef struct _UNICODE_STRING {
	USHORT Length;
	USHORT MaximumLength;
	PWSTR  Buffer;
} UNICODE_STRING;

typedef struct _OBJECT_NAME_INFORMATION {
	UNICODE_STRING Name;
} OBJECT_NAME_INFORMATION, * POBJECT_NAME_INFORMATION;

extern "C" NTSTATUS NTAPI NtQueryObject(
	_In_opt_ HANDLE Handle,
	_In_ OBJECT_INFORMATION_CLASS ObjectInformationClass,
	_Out_writes_bytes_opt_(ObjectInformationLength) PVOID ObjectInformation,
	_In_ ULONG ObjectInformationLength,
	_Out_opt_ PULONG ReturnLength);

NtQueryObject has several information classes, but we only need the name. But what handle do we provide NtQueryObject? If we were going with option 3 above and inject code into WMP’s process, we could loop with handle values starting from 4 (the first legal handle) and incrementing the loop handle by four.

Here we are in an external process, so handing out the handles provided by NtQueryInformationProcess does not make sense. What we have to do is duplicate each handle into our own process, and then make the call. First, we set up a loop for all handles and duplicate each one:

auto info = reinterpret_cast<PROCESS_HANDLE_SNAPSHOT_INFORMATION*>(buffer.get());
for (ULONG i = 0; i < info->NumberOfHandles; i++) {
	HANDLE h = info->Handles[i].HandleValue;
	HANDLE hTarget;
	if (!::DuplicateHandle(hProcess, h, ::GetCurrentProcess(), &hTarget, 
		0, FALSE, DUPLICATE_SAME_ACCESS))
		continue;	// move to next handle
	}

We duplicate the handle from WMP’s process (hProcess) to our own process. This function requires the handle to the process opened with PROCESS_DUP_HANDLE.

Now for the name: we need to call NtQueryObject with our duplicated handle and buffer that should be filled with UNICODE_STRING and whatever characters make up the name.

BYTE nameBuffer[1 << 10];
auto status = ::NtQueryObject(hTarget, ObjectNameInformation, 
	nameBuffer, sizeof(nameBuffer), nullptr);
::CloseHandle(hTarget);
if (!NT_SUCCESS(status))
	continue;

Once we query for the name, the handle is not needed and can be closed, so we don’t leak handles in our own process. Next, we need to locate the name and compare it with our target name. But what is the target name? We see in Process Explorer how the name looks. It contains the prefix used by any process (except UWP processes): “\Sessions\<session>\BasedNameObjects\<thename>”. We need the session ID and the “real” name to build our target name:

WCHAR targetName[256];
DWORD sessionId;
::ProcessIdToSessionId(pid, &sessionId);
::swprintf_s(targetName,
	L"\\Sessions\\%u\\BaseNamedObjects\\Microsoft_WMP_70_CheckForOtherInstanceMutex", 
	sessionId);
auto len = ::wcslen(targetName);

This code should come before the loop begins, as we only need to build it once.

Not for the real comparison of names:

auto name = reinterpret_cast<UNICODE_STRING*>(nameBuffer);
if (name->Buffer && 
	::_wcsnicmp(name->Buffer, targetName, len) == 0) {
	// found it!
}

The name buffer is cast to a UNICODE_STRING, which is the standard string type in the native API (and the kernel). It has a Length member which is in bytes (not characters) and does not have to be NULL-terminated. This is why the function used is _wcsnicmp, which can be limited in its search for a match.

Assuming we find our handle, what do we do with it? Fortunately, there is a trick we can use that allows closing a handle in another process: call DuplicateHandle again, but add the DUPLICATE_CLOSE_SOURCE to close the source handle. Then close our own copy, and that’s it! The mutex is gone. Let’s do it:

// found it!
::DuplicateHandle(hProcess, h, ::GetCurrentProcess(), &hTarget,
	0, FALSE, DUPLICATE_CLOSE_SOURCE);
::CloseHandle(hTarget);
printf("Found it! and closed it!\n");
return 0;

This is it. If we get out of the loop, it means we failed to locate the handle with that name. The general technique of duplicating a handle and closing the source is applicable to kernel mode as well. It does require a process handle with PROCESS_DUP_HANDLE to make it work, which is not always possible to get from user mode. For example, protected and PPL (protected processes light) processes cannot be opened with this access mask, even by administrators. In kernel mode, on the other hand, any process can be opened with full access.