誰在死鎖Mutex——用Windbg查詢Mutex死鎖所有者執行緒

gufeng99發表於2015-07-01
05
Aug 2006

By Ingo Rammer

I'm currently toying with the idea of creating a small frontend to SOS.DLL (CDB/WinDbg) and KD. This frontend should simplify some production debugging tasks based on crash dumps. (In my prototype, I'm already happily clicking myself through managed code threads, callstacks, parameters, locals and browsing objects' contents just like in Visual Studio, but solely based on crash dumps.)

A lot of debugging work is in general based on identifying a set of rather common issues. But even if you are tracking down a very common issue, it's still a tough call to work through 25 managed code callstacks to find out - for example - why a certain thread might be blocking a number of other threads. This was the rationale behind some of the features I'm thinking about implementing in the debugging frontend tool: to simplify and automate some manual tasks. But this tool is not (yet) ready for writing about, so lets look at the manual process for finding out whos blocking your thread.

For the analysis of thread blocks based on WaitHandles, I've found two kinds of possible solutions: one which is not 100% deterministic but works on crash dumps (but only for named synchronization objects) and another one which is deterministic but only works on the live system with a kernel debugger. I'll show you both approaches in this post.

First Approach - User Mode only, but works on Dumps

Note: This approach is not deterministic and not accurate but based on the automated equivalent of trial and error in the form of!DumpStackObjects. It's essentially based on looking for WaitHandle references on a callstack and similar means to get the identifier of a named synchronization object. The only good news is that it works on dumps generated by ADPlus (and of course on live systems as well).

Ok. Let's get startet. Suppose you have a process in which several threads end up blocking others for periods longer than you would expect and you basically want to know which threads block on resources held by which other threads. You've created a -hang dump with ADPlus and are ready to examine the dump in WinDbg or CDB.

First, you will need to find out which threads are actually waiting for something. To do this, you first load SOS.DLL and then look at the states of your managed threads:

0:013> .loadby sos mscorwks     // load the extension sos.dll in directory of mscorwks
0:013> !threads                   // list all managed threads
ThreadCount: 12
UnstartedThread: 0
BackgroundThread: 1
PendingThread: 0
DeadThread: 0
Hosted Runtime: no
                                      PreEmptive   GC Alloc           Lock
       ID OSID ThreadOBJ   State     GC       Context       Domain   Count APT Exception
   0    1  b70 00180fd0   200a020 Enabled  00000000:00000000 0014c100     0 MTA
   2    2 1b54 00152480      b220 Enabled  00000000:00000000 0014c100     0 MTA (Finalizer)
   3    3 1ba0 0019a1c0   200b020 Enabled  00000000:00000000 0014c100     0 MTA
   4    4 1f9c 0019c138   200b020 Enabled  00000000:00000000 0014c100     0 MTA
   5    5  210 0019cd18   200b020 Enabled  00000000:00000000 0014c100     0 MTA
   6    6 13e0 0019d8d0   200b020 Enabled  00000000:00000000 0014c100     0 MTA
   7    7 17e0 0019e488   200b020 Enabled  00000000:00000000 0014c100     0 MTA
   8    8 1204 0019f040   200b020 Enabled  00000000:00000000 0014c100     0 MTA
   9    9 1ec0 0019fbf8      b020 Disabled 00000000:00000000 0014c100     0 MTA
  10    a 1a48 001a07b0      b020 Disabled 00000000:00000000 0014c100     0 MTA
  11    b 1558 001a1368   200b020 Enabled  00000000:00000000 0014c100     0 MTA
  12    c 1d88 001a1f20   200b020 Enabled  00000000:00000000 0014c100     0 MTA

The threads with state 200a020 are currently waiting. As this does not necessarily mean that they are waiting for a WaitHandle (they could also just have called Thread.Sleep()), you now also need to find out which of these blocked threads are actually blocking on a WaitHandle.

To do this, you will have to look at the top managed code stackframe in each of the blocked threads. (Please note: You have to specify the native thread ID for the next command. This ID is contained in the first column of !threads, and is not the CLR thread id contained in the second column!)

0:007> ~0e !clrstack            // list the managed call stack of thread 0 (CLR thread #1)
OS Thread Id: 0xb70 (0)
ESP       EIP
0012f420 7c90eb94 [HelperMethodFrame: 0012f420] System.Threading.Thread.SleepInternal(Int32)
0012f474 793d80f5 System.Threading.Thread.Sleep(Int32)
0012f478 00ca00fb ResourceLock.Program.Main(System.String[])
0012f69c 79e88f63 [GCFrame: 0012f69c]

As you can see in this first example, the specified thread does not block on a WaitHandle, so let's continue our search. (Note: instead of manually using ~<THREAD_NUMBER>e !clrstack for each thread, you could also call ~*e !clrstack, do dump the complete managed code callstacks for all threads.)

0:007> ~7e !clrstack           // list the managed call stack of thread 7 (also CLR thread #7)
OS Thread Id: 0x17e0 (7)
ESP       EIP
011cf588 7c90eb94 [HelperMethodFrame_1OBJ: 011cf588] System.Threading.WaitHandle.WaitOneNative(Microsoft.Win32.SafeHandles.SafeWaitHandle, U
Int32, Boolean, Boolean)
011cf634 793d424e System.Threading.WaitHandle.WaitOne(Int64, Boolean)
011cf64c 793d4193 System.Threading.WaitHandle.WaitOne(Int32, Boolean)
011cf65c 793d420e System.Threading.WaitHandle.WaitOne()
011cf660 00ca0250 ResourceLock.Program.ThreadProc()
011cf6b4 793d7a7b System.Threading.ThreadHelper.ThreadStart_Context(System.Object)
011cf6bc 793683dd System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object)
011cf6d4 793d7b5c System.Threading.ThreadHelper.ThreadStart()
011cf8f8 79e88f63 [GCFrame: 011cf8f8]

Here, the top level call stack elements contain calls to WaitHandle.WaitOne() or WaitOneNative(). This thread is currently waiting for a synchronization object.

To retrieve details of the underlying sync object, you first have to look at the unmanaged callstack for the same thread. You will usually send a command like "kb 5" to the debugger after selecting the current thread with ~7 s.

0:007> ~7 s                   // select thread 7 if its not the current thread
eax=00000000 ebx=011cf350 ecx=79690098 edx=79690048 esi=00000000 edi=7ffde000
eip=7c90eb94 esp=011cf328 ebp=011cf3c4 iopl=0         nv up ei pl nz na po nc
cs=001b  ss=0023 ds=0023  es=0023  fs=003b  gs=0000             efl=00000202
ntdll!KiFastSystemCallRet:
7c90eb94 c3              ret
0:007> kb 5                   // print five stack frames
ChildEBP RetAddr  Args to Child
011cf324 7c90e9ab 7c8094f2 00000001 011cf350 ntdll!KiFastSystemCallRet
011cf328 7c8094f2 00000001 011cf350 00000000 ntdll!ZwWaitForMultipleObjects+0xc
011cf3c4 79f8ead4 00000001 011cf614 00000001 KERNEL32!WaitForMultipleObjectsEx+0x12c
011cf42c 79f17522 00000001 011cf614 00000001 mscorwks!WaitForMultipleObjectsEx_SO_TOLERANT+0x6f
011cf44c 79f17493 00000001 011cf614 00000001 mscorwks!Thread::DoAppropriateAptStateWait+0x3c 

Please note: This output is what youll receive if you have debug symbols configured correctly. While its definitely not as nice as the one with symbols, you can still get the same information (minus the method names) even if you do not have debug symbols installed:

0:007> kb 5                   // print five stack frames
ChildEBP RetAddr  Args to Child
WARNING: Stack unwind information not available. Following frames may be wrong.
011cf3c4 79f8ead4 00000001 011cf614 00000001 ntdll!KiFastSystemCallRet
011cf42c 79f17522 00000001 011cf614 00000001 mscorwks!StrongNameFreeBuffer+0x57fc
011cf44c 79f17493 00000001 011cf614 00000001 mscorwks!CreateAssemblyCache+0x9474
011cf4d0 79f1732f 00000001 011cf614 00000001 mscorwks!CreateAssemblyCache+0x93e5
011cf520 7a07b49c 00000001 011cf614 00000001 mscorwks!CreateAssemblyCache+0x9281

Here, I'll ignore the first three lines in any case, no matter if we have symbols or not. The next line is usually a call to WaitForMultipleObjectEx or a comparable function. This function is defined like this:

DWORD WaitForMultipleObjectsEx(
  DWORD nCount,
  const HANDLE* lpHandles,
  BOOL bWaitAll,
  DWORD dwMilliseconds,
  BOOL bAlertable
);

In the third, fourth and fifth field of the output of kb 5, we can get the first three arguments to the method. We therefore know that nCount = 1 and lpHandles = 011cf614. This means that the .NET Framework has passed an array of handles which contains 1 (nCount) entry and starts at the location 011cf614 (lpHandles) to WaitForMultipleObjectsEx.

You can then call"dd <address> L<number_of_handles>"(for example"dd 011cf614 L00000001") to the debugger to generate a memory dump to retrieve the list of handles.

0:007> dd 011cf614 L1     // dump 1 dword at location 011cf614
011cf614  00000684

In this command, the first field in each line contains the memory address and the subsequent fields (only one in our case) contain the handle to the object you are waiting for.

For each handle, you can then then call"!handle <handle> f", for example"handle 00000684 f"to retrieve handle information:

0:007> !handle 00000684 f            // dump detailed information (flags = f) for handle 684
Handle 684
  Type          Mutant
  Attributes    0
  GrantedAccess 0x1f0001:
         Delete,ReadControl,WriteDac,WriteOwner,Synch
         QueryState
  HandleCount   4
  PointerCount  8
  Name         \BaseNamedObjects\{1CA3A6E4-68E7-4847-9602-5669BC40D01E}
  Object Specific Information
    Mutex is Owned

In this case, you now know that the thread is waiting for a Mutex (Mutant is the kernel-level name for a Mutex) which has been named "{1CA3A6E4-68E7-4847-9602-5669BC40D01E}".

The next step is to find out who currently owns this mutex. This would be relatively straightforward in a live debug as we could use a kernel debugger to identify the owning thread (which I'll explain later). In a crash dump however, we are somewhat limited as we have to hunt for this information in a trial-and-error way. We are also limited to owner threads in the same process if the named mutex is owned by a thread in another process, we won't be able to get there with our single-process memory dump.

This trial-and-error information retrieval comes in the form of !DumpStackObjects (which can be abbreviated to !dso), an extension command which walks the stack of the selected thread and checks if each value it finds on the stack could be a managed object.

0:007> ~0e !dso                     // dump stack objects for thread #0
OS Thread Id: 0xb70 (0)
ESP/REG  Object   Name
0012f408 01273450 System.IO.StreamWriter
0012f428 01274768 System.Char[]
0012f478 01271b38 System.Object[]    (System.String[])
0012f534 01271b38 System.Object[]    (System.String[])
0012f6e0 01271b38 System.Object[]    (System.String[])
0012f708 01271b38 System.Object[]    (System.String[])

Theres not a lot going on in this thread. Certainly no Mutex here, let's go on:

0:007> ~4e !dso                     // dump stack objects for thread #4
OS Thread Id: 0x1f9c (4)
ESP/REG  Object   Name
00ecf748 01273b30 System.String    Thread 4 sleeping
00ecf770 01273450 System.IO.StreamWriter
00ecf790 01273b64 System.Char[]
00ecf804 01273a40 System.Threading.ThreadStart
00ecf808 01273a40 System.Threading.ThreadStart
00ecf80c 01273a94 System.Threading.ThreadHelper
00ecf814 01271db8 System.String    Thread
00ecf818 01273a94 System.Threading.ThreadHelper
00ecf81c 01271d98 System.Threading.ContextCallback
00ecf820 01273a94 System.Threading.ThreadHelper
00ecf82c 01273a94 System.Threading.ThreadHelper
00ecf834 01273ac8 System.Threading.ExecutionContext
00ecf844 01273ac8 System.Threading.ExecutionContext
00ecf848 01273a94 System.Threading.ThreadHelper
00ecf850 01273a94 System.Threading.ThreadHelper
00ecf914 01273aa8 System.Threading.ThreadStart
00ecfac0 01273aa8 System.Threading.ThreadStart
00ecfad0 01273aa8 System.Threading.ThreadStart

Thread #4 looks more interesting but contains no reference to any synchronization object, let's continue again:

0:007> ~3e !dso                     // dump stack objects for thread #3
OS Thread Id: 0x1ba0 (3)
ESP/REG  Object   Name
00daf7c8 012739d8 System.String    Thread 3 sleeping
00daf7f0 01273450 System.IO.StreamWriter
00daf810 01273a0c System.Char[]
00daf880 0127383c System.Threading.Mutex
00daf884 01271db8 System.String    Thread
00daf888 01271db8 System.String    Thread
00daf88c 01271db8 System.String    Thread
00daf898 01271cb8 System.Threading.ThreadHelper
00daf89c 01271d98 System.Threading.ContextCallback
00daf8a0 01271cb8 System.Threading.ThreadHelper
00daf8ac 01271cb8 System.Threading.ThreadHelper
00daf8b4 01271d2c System.Threading.ExecutionContext
00daf8c4 01271d2c System.Threading.ExecutionContext
00daf8c8 01271cb8 System.Threading.ThreadHelper
00daf8d0 01271cb8 System.Threading.ThreadHelper
00daf994 01271ccc System.Threading.ThreadStart
00dafb40 01271ccc System.Threading.ThreadStart
00dafb50 01271ccc System.Threading.ThreadStart

At first glance, thread #3 looks innocent too. Only when examining it more closely, you'll find a Mutex as the fourth stack object. (Of course, this is now total guesswork as we don't know if the thread ever really acquired the mutex or if it is just looking at it. The thread might also have already released the mutex as well but I don't know if we can find an answer to these questions without looking at kernel memory.)

First, lets look if the method acquires a mutex at all. To do this, we first need to find out the method which owns this section of the stack. To do this, we take the value of ESP for our Mutex (which is the address on the stack which references the mutex) which is 00daf880 in our case and compare it to the managed code stack dump of the same thread:

0:007> ~3 s                   // select thread #3
eax=7a38cf40 ebx=00000000 ecx=79ed75c0 edx=fbffffff esi=00000000 edi=00daf700
eip=7c90eb94 esp=00daf6d0 ebp=00daf728 iopl=0         nv up ei pl nz na pe nc
cs=001b  ss=0023 ds=0023  es=0023  fs=003b  gs=0000             efl=00000206
ntdll!KiFastSystemCallRet:
7c90eb94 c3              ret
0:003> !clrstack               // dump the managed code stack for this method
OS Thread Id: 0x1ba0 (3)
ESP       EIP
00daf808 7c90eb94 [HelperMethodFrame: 00daf808] System.Threading.Thread.SleepInternal(Int32)
00daf85c 793d80f5 System.Threading.Thread.Sleep(Int32)
00daf860 00ca02c3 ResourceLock.Program.ThreadProc()
00daf8b4 793d7a7b System.Threading.ThreadHelper.ThreadStart_Context(System.Object)
00daf8bc 793683dd System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object)
00daf8d4 793d7b5c System.Threading.ThreadHelper.ThreadStart()
00dafaf8 79e88f63 [GCFrame: 00dafaf8]

In this stack dump, you can see that the stack address of the reference to the mutex (00daf880) lies between the third and the fourth frame's ESP. You can deduce that the object is created in the third frame, in the method ResourceLock.Program.ThreadProc().

To verify this, you can have a look at the IL code of this method. First you need to convert the Instruction Pointer (IP) to a method description. The IP is 00ca02c3 according to the output of !clrstack (second field in the stack frame for ThreadProc()).

0:003> !IP2MD 00ca02c3        // convert an instruction pointer (IP) to a method description (MethodDesc)
MethodDesc: 00912ff8
Method Name: ResourceLock.Program.ThreadProc()
Class: 009112c4
MethodTable: 00913010
mdToken: 06000003
Module: 00912c14
IsJitted: yes
m_CodeOrIL: 00ca0188

The resulting MethodDesc is 00912ff8. You can use this a parameter to !DumpIL to retrieve the IL code of this method:

0:003> !dumpil 00912ff8        // get the IL code for a method
ilAddr = 004020b4
IL_0000: nop
IL_0001: call System.Threading.Thread::get_CurrentThread
  ...
IL_004a: newobj System.Threading.Mutex::.ctor
IL_004f: stloc.2
IL_0050: ldloc.2
IL_0051: callvirt System.Threading.WaitHandle::WaitOne

IL_0044 to IL_51 roughly translate to the following C# fragment:

Mutex mtx = new Mutex (...);
mtx.WaitOne();

So, yes, the method definitely acquires at least one mutex at one point. But to find out if the the Mutex we've found on the stack really references the same named synchronization object the other thread is blocking on, we need to examine it more closely. First, let's have a look at the contents of the mutex by using !DumpObj (which we abbreviate to !do).

0:003> !do 0127383c     // dump the mutex. Address is from ~3e !dso
Name: System.Threading.Mutex
MethodTable: 7910d018
EEClass: 7910cfa0
Size: 24(0x18) bytes
(C:\WINDOWS\assembly\GAC_32\mscorlib\2.0.0.0__b77a5c561934e089\mscorlib.dll)
Fields:
      MT    Field   Offset                 Type VT     Attr    Value Name
790f9c18  4000184       4        System.Object  0 instance 00000000 __identity
790fe160  40005a6        c        System.IntPtr  0 instance     1752 waitHandle
7910cf3c  40005a7        8 ...es.SafeWaitHandle  0 instance 012738d8 safeWaitHandle
79104f64  40005a8       10       System.Boolean  0 instance        1 hasThreadAffinity
790fe160  40005a9     984        System.IntPtr  0   shared   static InvalidHandle
    >> Domain:Value 0014c100:-1 <<
79104f64  40005ee      990       System.Boolean  0   shared   static dummyBool
    >> Domain:Value 0014c100:NotInit  <<
7910d018  40005ef      14c ...m.Threading.Mutex  0   shared   static s_ReservedMutex
    >> Domain:Value 0014c100:NotInit  <<

The second field is called waitHandle and contains the value 1752. This is the decimal handle to the underlying mutex. Lets convert this to hex (6d8) and have a look at the handle:

0:003> !handle 6d8 f          // look at handle 6d8 (decimal 1752)
Handle 6d8
  Type          Mutant
  Attributes    0
  GrantedAccess 0x1f0001:
         Delete,ReadControl,WriteDac,WriteOwner,Synch
         QueryState
  HandleCount   4
  PointerCount  8
  Name         \BaseNamedObjects\{1CA3A6E4-68E7-4847-9602-5669BC40D01E}
  Object Specific Information
    Mutex is Owned

As you can see, thread #3 acquired the mutex with the name {1CA3A6E4-68E7-4847-9602-5669BC40D01E} using wait handle 6d8, and this is the exact same named mutex which thread #7 is waiting for on wait handle 684. It seems that we got lucky this time and could find the mutex in our crash dump. Whats left as an exercise is to dump the !clrstack of thread #3 and examine the top stackframes using !IP2MD and !DumpIL to find out why the mutex has not been released in time.

The Deterministic Second Approach - or: Now give me my Kernel Debugger!

If someone would have told me just two years ago that I'd ever start a kernel debugger, I wouldn't have believed him: after all, I'm not writing device drivers and that's all you need KD.EXE for and it's quite difficult to setup kernel debugging in the first place, right? Well, turns out that neither is the case.

If you only want to examine kernel structures on your local machine, you can start a kernel debugger as easily as KD.EXE /KL (and it's part of the same Debugging Tools for Windows package which contains WinDbg and CDB). Theres no need to setup a second machine and connect the debugger to it or anything like this. The only really important part is that contrary to WinDbg and CDB you absolutely have to have debug symbols configured correctly for what we are doing now. Usually its enough to just create a directory like c:\symbols, set _NT_SYMBOL_PATH to the matching entry SRV*c:\symbols*http://msdl.microsoft.com/download/symbols and start KD. The necessary symbols will be downloaded automatically behind the scenes whenever they are needed for the first time. (Note that this also means that KD will block from time to time while it's downloading symbols on demand.)

C:\debug>set _NT_SYMBOL_PATH=SRV*C:\symbols*http://msdl.microsoft.com/download/symbols 
C:\debug>kd /KL

Microsoft (R) Windows Debugger Version 6.6.0007.5
Copyright (c) Microsoft Corporation. All rights reserved.

Connected to Windows XP 2600 x86 compatible target, ptr64 FALSE
Symbol search path is: SRV*C:\symbols*http://msdl.microsoft.com/download/symbols
Executable search path is:
*******************************************************************************
WARNING: Local kernel debugging requires booting with /debug to work optimally.
*******************************************************************************
Windows XP Kernel Version 2600 (Service Pack 2) UP Free x86 compatible
Product: WinNt, suite: TerminalServer SingleUserTS
Built by: 2600.xpsp_sp2_gdr.050301-1519
Kernel base = 0x804d7000 PsLoadedModuleList = 0x8055a420
Debug session time: Sat Aug  5 11:19:11.543 2006 (GMT+2)
System Uptime: 6 days 14:14:43.119
lkd>

Well, allright then. Let's start to hunt for the mutex. And don't forget: you can run WinDbg/CDB *and* KD at the same time to debug the user-mode and the kernel-mode parts side by side!

First, we need to get information about the process which we are debugging. The EXE in my example is called ResourceLock.exe, but you can use the !process command in KD to retrieve the necessary entry point for any process on your machine: (I'll use red color coding for KD commands and blue color coding for CDB commands)

lkd> !process 0 0 ResourceLock.exe         // retrieve information for all processes running ResourceLock.exe
PROCESS 8900e020  SessionId: 0 Cid: 1cd0    Peb: 7ffde000  ParentCid: 12a0
    DirBase: 0cb83000  ObjectTable: e72125c8  HandleCount: 122.
    Image: ResourceLock.exe

The !process command can also be used to retrieve additional information. It accepts two numeric parameters: the process address and a list of flags. If you specify the process address we've retrieved above (8900e020) and the value 2 for flags, you'll get detailed information about the individual threads and their synchronization objects:

Important: If you are running WinDbg/CDB and KD side-by-side for the same process, you must switch WinDbg/CDB in to run mode (F5 or g) to retrieve this information.

lkd> !process 8900e020 2            // retrieve thread info for process
PROCESS 8900e020  SessionId: 0 Cid: 1cd0    Peb: 7ffde000  ParentCid: 12a0
    DirBase: 0cb83000 ObjectTable: e72125c8  HandleCount: 122.
    Image: ResourceLock.exe
 
        THREAD 89285020  Cid 1cd0.0b70  Teb: 7ffdd000 Win32Thread: e52fe420 WAIT: (DelayExecution) UserMode Alertable
            89285110 NotificationTimer
 
        THREAD 8a02f280  Cid 1cd0.0268  Teb: 7ffdc000 Win32Thread: 00000000 WAIT: (UserRequest) UserMode Non-Alertable
            891da5f8 SynchronizationEvent
            8a13d6f8 SynchronizationEvent
            89d50020 SynchronizationEvent
 
        THREAD 88f9fd58  Cid 1cd0.1b54  Teb: 7ffdb000 Win32Thread: 00000000 WAIT: (UserRequest) UserMode Non-Alertable
            8a2fe310 NotificationEvent
            8974b6f0 SynchronizationEvent
 
        THREAD 89d86da8  Cid 1cd0.1ba0  Teb: 7ffda000 Win32Thread: 00000000 WAIT: (DelayExecution) UserMode Alertable
            89d86e98 NotificationTimer
 
        THREAD 8902e020  Cid 1cd0.1f9c  Teb: 7ffd9000 Win32Thread: 00000000 WAIT: (DelayExecution) UserMode Alertable
            8902e110 NotificationTimer
 
        THREAD 87b86020  Cid 1cd0.0210  Teb: 7ffd8000 Win32Thread: 00000000 WAIT: (DelayExecution) UserMode Alertable
            87b86110 NotificationTimer
 
        THREAD 8971b828  Cid 1cd0.13e0  Teb: 7ffd7000 Win32Thread: 00000000 WAIT: (DelayExecution) UserMode Alertable
            8971b918  NotificationTimer
 
        THREAD 87ba5020  Cid 1cd0.17e0  Teb: 7ffd6000 Win32Thread: 00000000 WAIT: (UserRequest) UserMode Alertable
            89d48fc0  Mutant - owning thread 89d86da8
 
        THREAD 88724da8  Cid 1cd0.1204  Teb: 7ffd5000 Win32Thread: 00000000 WAIT: (DelayExecution) UserMode Alertable
            88724e98 NotificationTimer
 
        THREAD 8869e548  Cid 1cd0.1ec0  Teb: 7ffd4000 Win32Thread: 00000000 READY
        THREAD 88e28020  Cid 1cd0.1a48  Teb: 7ffaf000 Win32Thread: 00000000 READY
        THREAD 88ab5020  Cid 1cd0.1558  Teb: 7ffae000 Win32Thread: 00000000 WAIT: (DelayExecution) UserMode Alertable
            88ab5110 NotificationTimer
 
        THREAD 88827360  Cid 1cd0.1d88  Teb: 7ffad000 Win32Thread: 00000000 WAIT: (UserRequest) UserMode Alertable
            89d48fc0  Mutant - owning thread 89d86da8

Fantastic, isn't it? Everything we've been looking for with trial-and-error analysis with WinDbg/CDB is contained in this single output.

But let's just look at the three important threads for this list (the two which have a Mutant in their list of kernel objects, and the thread which is specified as the owner for this mutant):

        THREAD 89d86da8  Cid 1cd0.1ba0  Teb: 7ffda000 Win32Thread: 00000000 WAIT: (DelayExecution) UserMode Alertable
            89d86e98 NotificationTimer 
        THREAD 87ba5020  Cid 1cd0.17e0  Teb: 7ffd6000 Win32Thread: 00000000 WAIT: (UserRequest) UserMode Alertable
            89d48fc0  Mutant - owning thread 89d86da8 
        THREAD 88827360  Cid 1cd0.1d88  Teb: 7ffad000 Win32Thread: 00000000 WAIT: (UserRequest) UserMode Alertable
            89d48fc0  Mutant - owning thread 89d86da8

WAIT: (DelayExecution) in the first line means that the thread is sleeping. The next line indicates the kernel object which the thread is waiting for. It is a NotificationTimer and its a safe bet to assume that this is the timer which has caused the DelayExecution in the first place. By the way: If you've forgotten to switch your user mode debugger into run mode - just like it happens to me all the time - your threads will be marked as WAIT: (Suspended) KernelMode Non-Alertable. In this case, just hit F5 or g in your user mode debugger and re-try the !process command.

The second and third thread in this list are both in WAIT: (UserRequest). The synchronization object is in both cases the Mutant which is currently owned by thread 89d86da8. This owner thread is the first thread in our list. To match this information back to our user-mode information, we need to look at the thread's Cid, which is 1cd0.1ba0 in our case. Of this Cid, the first part (1cd0) is the Process ID and the second part (1ba0) is the Thread ID.

If you then switch back to the user mode debugger and break into the application (CTRL+BREAK), you can find the corresponding CLR thread by using !threads:

0:013> !threads     // list all CLR threads
ThreadCount: 12
UnstartedThread: 0
BackgroundThread: 1
PendingThread: 0
DeadThread: 0
Hosted Runtime: no
                                      PreEmptive   GC Alloc           Lock
       ID OSID ThreadOBJ   State     GC       Context       Domain   Count APT Exception
   0    1  b70 00180fd0   200a020 Enabled  00000000:00000000 0014c100     0 MTA
   2    2 1b54 00152480      b220 Enabled  00000000:00000000 0014c100     0 MTA (Finalizer)
   3    3 1ba0 0019a1c0   200b020 Enabled  00000000:00000000 0014c100     0 MTA
   4    4 1f9c 0019c138   200b020 Enabled  00000000:00000000 0014c100     0 MTA
   5    5  210 0019cd18   200b020 Enabled  00000000:00000000 0014c100     0 MTA
   6    6 13e0 0019d8d0   200b020 Enabled  00000000:00000000 0014c100     0 MTA
   7    7 17e0 0019e488   200b020 Enabled  00000000:00000000 0014c100     0 MTA
   8    8 1204 0019f040   200b020 Enabled  00000000:00000000 0014c100     0 MTA
   9    9 1ec0 0019fbf8     b020  Disabled 00000000:00000000 0014c100     0 MTA
  10    a 1a48 001a07b0     b020  Disabled 00000000:00000000 0014c100     0 MTA
  11    b 1558 001a1368   200b020 Enabled  00000000:00000000 0014c100     0 MTA
  12    c 1d88 001a1f20   200b020 Enabled  00000000:00000000 0014c100     0 MTA

The third column in the thread list is called OSID and contains the operating system thread id. If you look for the thread ID 1ba0 which we retrieved from the kernel debugger's Cid, you will find that thread #3 is the one who blocks the Mutex. This is the same information which we got from the first approach, but now its a 100% deterministic and a lot simpler as well. I just needed to let go of my fear of kernel debugging before venturing in this direction ...

轉自:http://weblogs.thinktecture.com/ingo/2006/08/who-is-blocking-that-mutex---fun-with-windbg-cdb-and-kd.html


相關文章