如何在 NET 程式萬種死法中有效的生成 Dump (上)

一線碼農發表於2021-04-15

一:背景

相信很多人都知道通過 工作管理員 抓取dump,雖然簡單粗暴,但無法滿足程式的無數種死法,比如:

  • 記憶體膨脹,程式爆炸

  • CPU爆高,程式累死

  • 應用無響應,使用者氣死

  • 意外退出,和人生一樣

既然手工太弱雞,那有什麼好的工具呢? 除了 adplus,本文推薦一款神器 procdump, 下載地址:https://docs.microsoft.com/zh-cn/sysinternals/downloads/procdump ,還能支援 linux ???,具體怎麼安裝就不細說了。

二:記憶體膨脹,程式爆炸

記憶體膨脹 這種情況我相信很有朋友都遇到過,我見過最多的案例就是用了小快取 static,然後有意無意的忘記釋放,導致無限堆積終爆炸,那這種怎麼用 procdump 去抓呢?

為了方便演示,我先寫一個無限分配記憶體的例子。


        static void Main(string[] args)
        {
            List<string> list = new List<string>();

            for (int i = 0; i < int.MaxValue; i++)
            {
                list.Add(string.Join(",", Enumerable.Range(0, 10000)));
            }

            Console.ReadLine();
        }

將程式跑起來後,設定 procdump 在記憶體超過 1G 的時候自動抓取全記憶體 dump,使用如下命令.


C:\Windows\system32>procdump  ConsoleApp2 -m 1024 -ma E:\net5\ConsoleApp1\ConsoleApp2\bin\Debug

ProcDump v10.0 - Sysinternals process dump utility
Copyright (C) 2009-2020 Mark Russinovich and Andrew Richards
Sysinternals - www.sysinternals.com

Process:               ConsoleApp2.exe (24112)
Process image:         E:\net5\ConsoleApp1\ConsoleApp2\bin\Debug\ConsoleApp2.exe
CPU threshold:         n/a
Performance counter:   n/a
Commit threshold:      >= 1024 MB
Threshold seconds:     10
Hung window check:     Disabled
Log debug strings:     Disabled
Exception monitor:     Disabled
Exception filter:      [Includes]
                       *
                       [Excludes]
Terminate monitor:     Disabled
Cloning type:          Disabled
Concurrent limit:      n/a
Avoid outage:          n/a
Number of dumps:       1
Dump folder:           E:\net5\ConsoleApp1\ConsoleApp2\bin\Debug\
Dump filename/mask:    PROCESSNAME_YYMMDD_HHMMSS
Queue to WER:          Disabled
Kill after dump:       Disabled


Press Ctrl-C to end monitoring without terminating the process.

[21:23:43] Commit:    1087Mb
[21:23:43] Dump 1 initiated: E:\net5\ConsoleApp1\ConsoleApp2\bin\Debug\ConsoleApp2.exe_210323_212343.dmp
[21:23:43] Dump 1 writing: Estimated dump file size is 1179 MB.
[21:23:44] Dump 1 complete: 1179 MB written in 1.3 seconds
[21:23:44] Dump count reached.

從最後五行可以看出,當記憶體達到 1087M 的時候自動生成了 dump 檔案,接下來用 windbg 看一看。

  • 檢視當前 process 的記憶體佔用量,使用 !address -summary 即可

0:000> !address -summary

                                     
Mapping file section regions...
Mapping module regions...
Mapping PEB regions...
Mapping TEB and stack regions...
Mapping heap regions...
Mapping page heap regions...
Mapping other regions...
Mapping stack trace database regions...
Mapping activation context regions...

--- Usage Summary ---------------- RgnCount ----------- Total Size -------- %ofBusy %ofTotal
Free                                     63          b30b4000 (   2.798 GB)           69.94%
<unknown>                               228          48547000 (   1.130 GB)  93.99%   28.25%
Image                                   210           4115000 (  65.082 MB)   5.29%    1.59%
Stack                                    21            700000 (   7.000 MB)   0.57%    0.17%
Heap                                     12            170000 (   1.438 MB)   0.12%    0.04%
Other                                     7             5a000 ( 360.000 kB)   0.03%    0.01%
TEB                                       7             13000 (  76.000 kB)   0.01%    0.00%
PEB                                       1              3000 (  12.000 kB)   0.00%    0.00%

--- Type Summary (for busy) ------ RgnCount ----------- Total Size -------- %ofBusy %ofTotal
MEM_PRIVATE                             250          47121000 (   1.110 GB)  92.36%   27.76%
MEM_IMAGE                               217           411e000 (  65.117 MB)   5.29%    1.59%
MEM_MAPPED                               19           1cfd000 (  28.988 MB)   2.35%    0.71%

--- State Summary ---------------- RgnCount ----------- Total Size -------- %ofBusy %ofTotal
MEM_FREE                                 63          b30b4000 (   2.798 GB)           69.94%
MEM_COMMIT                              357          47f12000 (   1.124 GB)  93.49%   28.10%
MEM_RESERVE                             129           502a000 (  80.164 MB)   6.51%    1.96%

--- Protect Summary (for commit) - RgnCount ----------- Total Size -------- %ofBusy %ofTotal
PAGE_READWRITE                          177          437d5000 (   1.055 GB)  87.70%   26.36%
PAGE_EXECUTE_READ                        35           33c7000 (  51.777 MB)   4.21%    1.26%
PAGE_READONLY                            90            c41000 (  12.254 MB)   1.00%    0.30%
PAGE_WRITECOPY                           34            70b000 (   7.043 MB)   0.57%    0.17%
PAGE_READWRITE|PAGE_GUARD                14             23000 ( 140.000 kB)   0.01%    0.00%
PAGE_EXECUTE_READWRITE                    7              7000 (  28.000 kB)   0.00%    0.00%

--- Largest Region by Usage ----------- Base Address -------- Region Size ----------
Free                                        80010000          7f130000 (   1.986 GB)
<unknown>                                   438e1000           200f000 (  32.059 MB)
Image                                       660e0000            f55000 (  15.332 MB)
Stack                                         e00000             fd000 (1012.000 kB)
Heap                                          c97000             98000 ( 608.000 kB)
Other                                       ff2c0000             33000 ( 204.000 kB)
TEB                                           990000              3000 (  12.000 kB)
PEB                                           98d000              3000 (  12.000 kB)

看到上面 PAGE_READWRITE 行的 (1.055 GB) 嗎? 和剛才 Console 中的 1087M 遙相呼應,沒毛病。

  • 尋找大物件,在託管堆中使用 !dumpheap -stat -min 1024 即可

||0:0:000> !dumpheap -stat -min 1024
Statistics:
      MT    Count    TotalSize Class Name
65d42788        2        13044 System.Object[]
65d42d74        2        98328 System.String[]
65d42c60       73      1082988 System.Char[]
65d424e4    11452   1119913984 System.String

從輸出的最後一行可以看出,System.String 有1w多個,接下來可以增加 -type 屬性篩選出 >10k 的字串。


0:000> !dumpheap -type System.String -min 10240
 Address       MT     Size
03c75568 65d424e4    97792     
03c8d378 65d424e4    97792    
4a855060 65d424e4    97792     

Statistics:
      MT    Count    TotalSize Class Name
65d424e4    11452   1119913984 System.String
Total 11452 objects

0:000> !gcroot 4a855060
Thread 36e4:
*** WARNING: Unable to verify checksum for ConsoleApp2.exe
    00b3f358 012108d1 ConsoleApp2.Program.Main(System.String[]) [E:\net5\ConsoleApp1\ConsoleApp2\Program.cs @ 18]
        ebp+18: 00b3f370
            ->  02c71fd8 System.Collections.Generic.List`1[[System.String, mscorlib]]
            ->  02cce2ec System.String[]
            ->  4a855060 System.String

Found 1 unique roots (run '!GCRoot -all' to see all roots).

從最後的 !gcroot 看,確實是被 Program.cs:18 行的 List 所持有,到此水落石出。

三:CPU爆高,程式累死

說起CPU爆高的案例,我發現更多的是在 非託管堆 上,比如GC回收,爭搶鎖等,很少有人能傻到在 託管層 上把cpu搞起來。

對了,分析CPU 爆高有一個小技巧,那就是連續抓 dump 快照,看兩個 dump 中的執行緒執行情況,這時候就非常適合 procdump,先來看測試程式碼。


    class Program
    {
        static void Main(string[] args)
        {
            Parallel.For(0, int.MaxValue, (i) =>
            {
                while (true)
                {

                }
            });

            Console.ReadLine();
        }
    }

現在我設定 連續 5s 內 CPU 超過 70% 抓取 dump,直到 2 個為止


C:\Windows\system32>procdump  ConsoleApp2 -s 5 -n 2 -c 70 E:\net5\ConsoleApp1\ConsoleApp2\bin\Debug

ProcDump v10.0 - Sysinternals process dump utility
Copyright (C) 2009-2020 Mark Russinovich and Andrew Richards
Sysinternals - www.sysinternals.com

Process:               ConsoleApp2.exe (22152)
Process image:         E:\net5\ConsoleApp1\ConsoleApp2\bin\Debug\ConsoleApp2.exe
CPU threshold:         >= 70% of system
Performance counter:   n/a
Commit threshold:      n/a
Threshold seconds:     5
Hung window check:     Disabled
Log debug strings:     Disabled
Exception monitor:     Disabled
Exception filter:      [Includes]
                       *
                       [Excludes]
Terminate monitor:     Disabled
Cloning type:          Disabled
Concurrent limit:      n/a
Avoid outage:          n/a
Number of dumps:       2
Dump folder:           E:\net5\ConsoleApp1\ConsoleApp2\bin\Debug\
Dump filename/mask:    PROCESSNAME_YYMMDD_HHMMSS
Queue to WER:          Disabled
Kill after dump:       Disabled


Press Ctrl-C to end monitoring without terminating the process.

[22:25:47] CPU: 95% 1s
[22:25:48] CPU: 100% 2s
[22:25:50] CPU: 96% 3s
[22:25:51] CPU: 98% 4s
[22:25:52] CPU: 99% 5s (Trigger)
[22:25:53] Dump 1 initiated: E:\net5\ConsoleApp1\ConsoleApp2\bin\Debug\ConsoleApp2.exe_210323_222553.dmp
[22:25:54] Dump 1 complete: 5 MB written in 0.3 seconds
[22:25:56] CPU: 88% 1s
[22:25:58] CPU: 93% 2s
[22:26:00] CPU: 89% 3s
[22:26:02] CPU: 89% 4s
[22:26:04] CPU: 95% 5s (Trigger)
[22:26:05] Dump 2 initiated: E:\net5\ConsoleApp1\ConsoleApp2\bin\Debug\ConsoleApp2.exe_210323_222605.dmp
[22:26:06] Dump 2 complete: 5 MB written in 0.4 seconds
[22:26:07] Dump count reached.

從最後輸出中可以看到,連續 5s CPU 超過了 70% 抓取了 dump,總共來了2個。

現在 dump 有了,接下來用兩個 windbg 例項開啟,驗證下 dump 的生成時間,如下圖所示:

從圖中可以看到,兩個 dump 生成時間相隔 12s,而且通過 !runaway 發現下面的執行緒:

  • 14:2cb8
  • 19:3f8c
  • ...

都執行了長達 10s ,這說明什麼?說明這二個執行緒應該在某個地方死迴圈了。。。對吧。。。

切到 14 號執行緒通過 !clrstack 看呼叫堆疊即可,都是死在 ConsoleApp2.Program+c.b__0_0(Int32) 這裡出不來。。。

四:總結

感覺篇幅有點長了,就先說到這裡吧,有興趣的話,可以把 procdump 拉下來玩一玩 ?。

更多高質量乾貨:參見我的 GitHub: dotnetfly

圖片名稱

相關文章