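I. Background
A recent project ran into trouble in production. The program started fine, but after watching it for a while I noticed that total memory usage was slowly creeping upward.
It dipped occasionally, but the overall trend was steadily up: after 6 hours of running, memory had grown from 33 MB to 80 MB.
There was no doubt the program was leaking memory, and I roughly knew where to look: a newly added data-acquisition protocol (the BACnet protocol, MSTP_DLL).
Pinning down the exact leak site was the hard part, because the protocol is wrapped in a dynamic library written in C. Single-stepping through it did not look feasible,
and I did not have the source code either.
I could have stopped there and handed the problem to colleagues, but joint debugging is slow, and they all wear several hats and were unlikely to have time for it,
so the project would have stalled. There was nothing for it but to dig in myself. After some reading online, the recommended approach for this kind of leak is to
capture memory snapshots and then analyze the heap commit records, the allocation call stacks, and so on. For those reasons I chose the WinDbg debugger.
Let's analyze first and see; we might find the problem.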
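II. WinDbg notes
1. Install the right version: the WinDbg bitness must match your program (32-bit or 64-bit), otherwise you will get errors.
2. On 64-bit Windows the default Task Manager is 64-bit, so its right-click "Create dump file" should not be used for a 32-bit process. Run the 32-bit Task Manager instead (C:\Windows\SysWOW64\taskmgr.exe).
3. Alternatively, save the dump from WinDbg itself: attach to the process, then run (.dump /ma c:\xxx.dmp), which writes the snapshot to drive C.
4. Most important, make sure your machine can reach the internet. WinDbg downloads symbol files online, and in my case the symbol server address happened to be blocked by the national firewall.
III. Required WinDbg setup
1. First I captured two memory snapshots some time apart: 33.dmp and 80.dmp.
2. Open WinDbg and set the symbol download path.
Drag 33.dmp into the workspace, then open the menu File -> Symbol File Path
and enter: SRV*c:\symbols*http://msdl.microsoft.com/download/symbols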
IV. Analyzing the dumps
1. Open each of the two .dmp files and run !dumpheap -stat to see how memory is allocated per managed type.
33.dmp
>.load C:\Windows\Microsoft.NET\Framework\v4.0.30319\SOS.dll
>!dumpheap -stat
.....
61f87928 2292 34012 System.RuntimeType[]
5d2dbe74 267 34176 System.Data.DataColumn
61fd75e0 668 37408 System.Reflection.RuntimePropertyInfo
61f8426c 702 48976 System.Int32[]
5d2dcc24 70 72520 System.Data.RBTree`1+Node[[System.Data.DataRow, System.Data]][]
61f883e4 1242 84456 System.Reflection.RuntimeParameterInfo
61f8839c 2045 89980 System.Signature
0a7566bc 596 92976 HG.MacamUnit.Entity.TSubSysNodes
61f82788 723 117736 System.Object[]
61f89850 8 131696 System.Int64[]
61fd8938 2792 167520 System.Reflection.RuntimeMethodInfo
007988d0 220 434392 Free
61f824e4 12187 738904 System.String
61f85c40 2138 743067 System.Byte[]
61f82c60 294 6629796 System.Char[]
Total 55014 objects
80.dmp
>.load C:\Windows\Microsoft.NET\Framework\v4.0.30319\SOS.dll
>!dumpheap -stat
.....
61f83698 876 24528 System.RuntimeType
61f84ec0 159 26472 System.Collections.Hashtable+bucket[]
61fc9020 631 27764 System.Reflection.RtFieldInfo
61f95be8 46 28392 System.Reflection.Emit.__FixupData[]
61f87928 2292 34012 System.RuntimeType[]
61fd75e0 668 37408 System.Reflection.RuntimePropertyInfo
5d2dcc24 42 43512 System.Data.RBTree`1+Node[[System.Data.DataRow, System.Data]][]
61f8426c 595 45868 System.Int32[]
61f883e4 1242 84456 System.Reflection.RuntimeParameterInfo
61f8839c 2045 89980 System.Signature
61f82788 622 113684 System.Object[]
61f89850 8 131696 System.Int64[]
61fd8938 2769 166140 System.Reflection.RuntimeMethodInfo
61f824e4 9800 676596 System.String
61f85c40 2064 705655 System.Byte[]
61f82c60 195 2369402 System.Char[]
007988d0 114 3338792 Free
Total 47306 objects
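Comparing the memory allocation in the two files, there seems to be little difference. Nothing here explains where the extra 80 - 33 = 47 MB (nearly 50 MB) went.
On reflection that makes sense: !dumpheap -stat only reports the managed (CLR) heap, and System.* types belong to the C# side. Since the C# code is known not to leak, it is normal that nothing shows up here.
So how do we find the problem inside the C library?
2. Next try !heap -s, which shows how much memory each native heap has committed.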
33.dmp
0:047> !heap -s
LFH Key : 0x343fce0b
Termination on corruption : ENABLED
Heap Flags Reserv Commit Virt Free List UCR Virt Lock Fast
(k) (k) (k) (k) length blocks cont. heap
-----------------------------------------------------------------------------
00780000 00000002 8192 4636 8192 209 2484 4 0 e LFH
002e0000 00001002 256 4 256 2 1 1 0 0
00280000 00001002 1088 72 1088 5 2 2 0 0
00c70000 00041002 256 4 256 2 1 1 0 0
002d0000 00001002 1088 132 1088 8 23 2 0 0
00450000 00001002 256 4 256 0 1 1 0 0
07230000 00041002 256 4 256 2 1 1 0 0
00c10000 00001002 256 216 256 3 39 1 0 0 LFH
09b50000 00001002 256 80 256 39 28 1 0 0
09d00000 00001002 64 4 64 2 1 1 0 0
09ef0000 00001002 1088 72 1088 6 2 2 0 0
004c0000 00001002 1088 192 1088 15 140 2 0 0
09760000 00041002 256 28 256 4 4 1 0 0
09ed0000 00001002 64 12 64 1 1 1 0 0
0b210000 00001002 3136 1456 3136 52 84 3 0 0 LFH
0a700000 00001002 256 212 256 2 1 1 0 0
0e1e0000 00011002 256 4 256 0 1 1 0 0
0d030000 00001002 256 16 256 3 1 1 0 0
11b30000 00001002 1088 388 1088 0 1 2 0 0
-----------------------------------------------------------------------------
80.dmp
0:051> !heap -s
LFH Key : 0x343fce0b
Termination on corruption : ENABLED
Heap Flags Reserv Commit Virt Free List UCR Virt Lock Fast
(k) (k) (k) (k) length blocks cont. heap
-----------------------------------------------------------------------------
00780000 00000002 8192 4808 8192 225 2505 4 0 f1 LFH
002e0000 00001002 256 4 256 2 1 1 0 0
00280000 00001002 1088 132 1088 4 6 2 0 0
00c70000 00041002 256 4 256 2 1 1 0 0
002d0000 00001002 1088 168 1088 12 26 2 0 0
00450000 00001002 256 4 256 0 1 1 0 0
07230000 00041002 256 4 256 2 1 1 0 0
00c10000 00001002 256 228 256 26 69 1 0 0 LFH
09b50000 00001002 256 80 256 39 25 1 0 0
09d00000 00001002 64 4 64 2 1 1 0 0
09ef0000 00001002 1088 132 1088 6 5 2 0 0
004c0000 00001002 1088 220 1088 26 173 2 0 0
09760000 00041002 256 28 256 4 8 1 0 0
09ed0000 00001002 64 12 64 1 1 1 0 0
0b210000 00001002 3136 1456 3136 74 71 3 0 0 LFH
0a700000 00001002 256 212 256 2 1 1 0 0
0e1e0000 00011002 256 4 256 0 1 1 0 0
0d030000 00001002 256 16 256 1 1 1 0 0
11b30000 00001002 47808 46068 47808 396 6836 7 0 0
-----------------------------------------------------------------------------
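This time something stands out: the row for heap 11b30000 changed dramatically. Reserved memory grew by 47808 - 1088 = 46720 KB, and committed memory grew by 46068 - 388 = 45680 KB.
The problem is certainly inside this heap.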
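Comparing the two tables by eye is error-prone, so here is a minimal sketch of how the comparison could be automated. It assumes the `!heap -s` output has been saved as text; the function names `parse_heap_summary` and `commit_growth` are my own, and only a few rows from the two snapshots are included.

```python
import re

# One data row of `!heap -s`: heap address, flags, then Reserv and Commit in KB.
ROW = re.compile(r"^([0-9a-f]{8})\s+[0-9a-f]{8}\s+(\d+)\s+(\d+)\s")


def parse_heap_summary(text):
    """Return {heap_address: (reserved_kb, committed_kb)} from `!heap -s` text."""
    heaps = {}
    for line in text.splitlines():
        match = ROW.match(line.strip())
        if match:
            heaps[match.group(1)] = (int(match.group(2)), int(match.group(3)))
    return heaps


def commit_growth(before, after):
    """Return [(heap, growth_kb)] sorted by committed growth, largest first."""
    growth = [(heap, after[heap][1] - before.get(heap, (0, 0))[1]) for heap in after]
    return sorted(growth, key=lambda item: item[1], reverse=True)


# A few rows copied from the 33.dmp and 80.dmp listings.
SNAP_33 = """
00780000 00000002 8192 4636 8192 209 2484 4 0 e LFH
0b210000 00001002 3136 1456 3136 52 84 3 0 0 LFH
11b30000 00001002 1088 388 1088 0 1 2 0 0
"""
SNAP_80 = """
00780000 00000002 8192 4808 8192 225 2505 4 0 f1 LFH
0b210000 00001002 3136 1456 3136 74 71 3 0 0 LFH
11b30000 00001002 47808 46068 47808 396 6836 7 0 0
"""

if __name__ == "__main__":
    before = parse_heap_summary(SNAP_33)
    after = parse_heap_summary(SNAP_80)
    for heap, kb in commit_growth(before, after):
        print(f"{heap}: {kb:+d} KB committed")
```

Sorting by growth puts the suspicious heap at the top, which is convenient when a process has dozens of heaps.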
3. Look inside 11b30000 with: !heap -stat -h 11b30000
80.dmp
0:051> !heap -stat -h 11b30000
heap @ 11b30000
group-by: TOTSIZE max-display: 20
size #blocks total ( %) (percent of total busy bytes)
1f0 102d9 - 1f58470 (92.48)
18 102b0 - 184080 (4.47)
10 102ae - 102ae0 (2.98)
214 13 - 277c (0.03)
1000 2 - 2000 (0.02)
800 2 - 1000 (0.01)
220 1 - 220 (0.00)
1d7 1 - 1d7 (0.00)
80 3 - 180 (0.00)
a4 1 - a4 (0.00)
24 4 - 90 (0.00)
14 4 - 50 (0.00)
4a 1 - 4a (0.00)
25 2 - 4a (0.00)
48 1 - 48 (0.00)
46 1 - 46 (0.00)
41 1 - 41 (0.00)
3e 1 - 3e (0.00)
3c 1 - 3c (0.00)
37 1 - 37 (0.00)
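The first three rows account for more than 99% of the busy bytes, and blocks of size 1f0 use the most memory by far.
So far we know something useful: three kinds of blocks, of sizes 1f0, 18, and 10 (hexadecimal, that is 496, 24, and 16 bytes), keep being allocated.
That is not enough, though. A block size alone cannot tell us exactly what is wrong. Is it a struct? A string? An array?
We don't know, so we need to look at where these blocks are being used.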
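All three numeric columns in the `!heap -stat -h` output are hexadecimal, which is easy to overlook. The following sketch decodes the top three rows and checks that size times block count equals the total column; `decode_stat_row` is a helper name I made up for illustration.

```python
def decode_stat_row(row):
    """Parse one `!heap -stat -h` row such as '1f0 102d9 - 1f58470 (92.48)'.

    The size, #blocks and total columns are hexadecimal.
    Returns (size, blocks, total) as decimal integers.
    """
    size_hex, blocks_hex, _dash, total_hex = row.split()[:4]
    return int(size_hex, 16), int(blocks_hex, 16), int(total_hex, 16)


# The three largest rows for heap 11b30000 in 80.dmp.
ROWS = [
    "1f0 102d9 - 1f58470 (92.48)",
    "18 102b0 - 184080 (4.47)",
    "10 102ae - 102ae0 (2.98)",
]

if __name__ == "__main__":
    for row in ROWS:
        size, blocks, total = decode_stat_row(row)
        # The total column should equal block size times block count.
        assert size * blocks == total
        print(f"{size:>4} bytes x {blocks} blocks = {total / 2**20:.2f} MiB")
```

In decimal, the 1f0 row is 496-byte blocks allocated 66265 times, roughly 31 MiB of user data on its own.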
4. List the blocks of size 1f0 with: !heap -flt s [size]
80.dmp
0:051> !heap -flt s 1f0
_DPH_HEAP_ROOT @ 5a1000
Freed and decommitted blocks
DPH_HEAP_BLOCK : VirtAddr VirtSize
Busy allocations
DPH_HEAP_BLOCK : UserAddr UserSize - VirtAddr VirtSize
_HEAP @ 780000
HEAP_ENTRY Size Prev Flags UserPtr UserSize - state
0078e5b8 0045 0000 [00] 0078e5e0 001f0 - (busy)
_DPH_HEAP_ROOT @ 9d11000
Freed and decommitted blocks
DPH_HEAP_BLOCK : VirtAddr VirtSize
Busy allocations
DPH_HEAP_BLOCK : UserAddr UserSize - VirtAddr VirtSize
_HEAP @ 4c0000
_DPH_HEAP_ROOT @ af41000
Freed and decommitted blocks
DPH_HEAP_BLOCK : VirtAddr VirtSize
Busy allocations
DPH_HEAP_BLOCK : UserAddr UserSize - VirtAddr VirtSize
_HEAP @ b210000
0cf61680 0045 0045 [00] 0cf616a8 001f0 - (busy)
_DPH_HEAP_ROOT @ d871000
Freed and decommitted blocks
DPH_HEAP_BLOCK : VirtAddr VirtSize
Busy allocations
DPH_HEAP_BLOCK : UserAddr UserSize - VirtAddr VirtSize
_HEAP @ d030000
_DPH_HEAP_ROOT @ 11631000
Freed and decommitted blocks
DPH_HEAP_BLOCK : VirtAddr VirtSize
Busy allocations
DPH_HEAP_BLOCK : UserAddr UserSize - VirtAddr VirtSize
_HEAP @ 11b30000
11b312e8 0045 0045 [00] 11b31310 001f0 - (busy)
11b315a8 0045 0045 [00] 11b315d0 001f0 - (busy)
11b356f8 0045 0045 [00] 11b35720 001f0 - (busy)
11b35920 0045 0045 [00] 11b35948 001f0 - (busy)
11b36f30 0045 0045 [00] 11b36f58 001f0 - (busy)
11b37b58 0045 0045 [00] 11b37b80 001f0 - (busy)
11b37e18 0045 0045 [00] 11b37e40 001f0 - (busy)
11b3e4f0 0045 0045 [00] 11b3e518 001f0 - (busy)
11b3f570 0045 0045 [00] 11b3f598 001f0 - (busy)
11b3f830 0045 0045 [00] 11b3f858 001f0 - (busy)
11b3faf0 0045 0045 [00] 11b3fb18 001f0 - (busy)
11b3fdb0 0046 0045 [00] 11b3fdd8 001f0 - (busy)
12890578 0045 0046 [00] 128905a0 001f0 - (busy)
......
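Several heaps contain blocks of size 1f0, but only the last one, _DPH_HEAP_ROOT @ 11631000 (_HEAP @ 11b30000),
has a large number of them. They fill the screen, so the listing is truncated here to a sample.
5. View the allocation call stack with: !heap -p -a [address]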
80.dmp
0:051> !heap -p -a 11b3fdd8
address 11b3fdd8 found in
_HEAP @ 11b30000
HEAP_ENTRY Size Prev Flags UserPtr UserSize - state
11b3fdb0 0046 0000 [00] 11b3fdd8 001f0 - (busy)
Trace: 083a
7405a6a7 verifier!AVrfpDphNormalHeapAllocate+0x000000d7
74058f6e verifier!AVrfDebugPageHeapAllocate+0x0000030e
77d10fe6 ntdll!RtlDebugAllocateHeap+0x00000030
77ccab8e ntdll!RtlpAllocateHeap+0x000000c4
77c73461 ntdll!RtlAllocateHeap+0x0000023a
664668e5 msvcr90!_calloc_impl+0x00000125
66463c5a msvcr90!calloc+0x0000001a
0:051> !heap -p -a 11b3fb18
address 11b3fb18 found in
_HEAP @ 11b30000
HEAP_ENTRY Size Prev Flags UserPtr UserSize - state
11b3faf0 0045 0000 [00] 11b3fb18 001f0 - (busy)
Trace: 083a
7405a6a7 verifier!AVrfpDphNormalHeapAllocate+0x000000d7
74058f6e verifier!AVrfDebugPageHeapAllocate+0x0000030e
77d10fe6 ntdll!RtlDebugAllocateHeap+0x00000030
77ccab8e ntdll!RtlpAllocateHeap+0x000000c4
77c73461 ntdll!RtlAllocateHeap+0x0000023a
664668e5 msvcr90!_calloc_impl+0x00000125
66463c5a msvcr90!calloc+0x0000001a
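I picked a few at random, but the stacks show nothing distinctive: verifier, ntdll, and msvcr90 are Application Verifier, the Windows native layer, and the C runtime, not our own code.
They do not reveal where the 1f0 blocks are used, which makes things difficult. Is this a dead end? Without a useful stack, locating
the memory leak by single-stepping would be very painful...
No rush. Let's look at what these blocks contain; maybe there is something distinctive.
Use: db [UserPtr]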
80.dmp
0:051> db 11b3fb18
11b3fb18 00 00 04 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
11b3fb28 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
11b3fb38 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
11b3fb48 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
11b3fb58 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
11b3fb68 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
11b3fb78 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
11b3fb88 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
0:051> db 11b3fdd8
11b3fdd8 00 00 04 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
11b3fde8 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
11b3fdf8 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
11b3fe08 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
11b3fe18 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
11b3fe28 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
11b3fe38 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
11b3fe48 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
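The result is disappointing.
The blocks are almost entirely zeroed memory with no useful information in them.
Stuck in a dead end. Is this where it stops?
Not giving up yet: let's check whether these addresses have any reference roots. If they do, we can print stack information for them.
Use: !gcroot [UserPtr]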
80.dmp
0:051> !gcroot 11b3fb18
Found 0 unique roots (run '!GCRoot -all' to see all roots).
0:051> !gcroot 11b3fdd8
Found 0 unique roots (run '!GCRoot -all' to see all roots).
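Wishful thinking. The block of size 1f0 has been allocated 0x102d9 times, and !gcroot reports that these all appear to be orphaned data with no references.
Now let's look at the creation stack of the heap _DPH_HEAP_ROOT @ 11631000.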
80.dmp
0:051> dt ntdll!_DPH_HEAP_ROOT CreateStackTrace 11631000
+0x0b8 CreateStackTrace : 0x04d54f8c _RTL_TRACE_BLOCK
0:051> dds 0x04d54f8c
04d54f8c 04d1b714
04d54f90 0000f801
04d54f94 000f0000
04d54f98 74058969 verifier!AVrfDebugPageHeapCreate+0x439
04d54f9c 77cbcea2 ntdll!RtlCreateHeap+0x41
04d54fa0 757356bc KERNELBASE!HeapCreate+0x50
04d54fa4 66463a4a msvcr90!_heap_init+0x1b
04d54fa8 66422bb4 msvcr90!__p__tzname+0x2a
04d54fac 66422d5e msvcr90!_CRTDLL_INIT+0x1e
04d54fb0 77c79264 ntdll!LdrpCallInitRoutine+0x14
04d54fb4 77c7fe97 ntdll!LdrpRunInitializeRoutines+0x26f
04d54fb8 77c7ea4e ntdll!LdrpLoadDll+0x472
04d54fbc 77cbd3df ntdll!LdrLoadDll+0xc7
04d54fc0 75732e6a KERNELBASE!LoadLibraryExW+0x233
04d54fc4 7562483c kernel32!LoadLibraryW+0x11
04d54fc8 6d3d18de
*** WARNING: Unable to verify checksum for Win32Project1.dll
*** ERROR: Symbol file could not be found. Defaulted to export symbols for Win32Project1.dll -
Win32Project1+0x18de
04d54fcc 6d3d28fc Win32Project1!BACNet::Init+0x5c
04d54fd0 6d3d5925 Win32Project1!Init+0x25
04d54fd4 66639972
*** WARNING: Unable to verify checksum for SMDB.dll
*** ERROR: Symbol file could not be found. Defaulted to export symbols for SMDB.dll -
SMDB!LogPop+0x12
04d54fd8 66639452 SMDB!CreateSharedMemory+0x12
04d54fdc 6d8e47bd clrjit!Compiler::impImportBlockCode+0x2aac [f:\dd\ndp\clr\src\jit32\importer.cpp @ 10258]
04d54fe0 6d8c2e6b clrjit!Compiler::impImportBlock+0x5f [f:\dd\ndp\clr\src\jit32\importer.cpp @ 13246]
04d54fe4 6d8c306a clrjit!Compiler::impImport+0x235 [f:\dd\ndp\clr\src\jit32\importer.cpp @ 14195]
04d54fe8 6d8c364f clrjit!Compiler::compCompile+0x62 [f:\dd\ndp\clr\src\jit32\compiler.cpp @ 2491]
04d54fec 6d8c4276 clrjit!Compiler::compCompileHelper+0x32f [f:\dd\ndp\clr\src\jit32\compiler.cpp @ 3615]
04d54ff0 6d8c43fc clrjit!Compiler::compCompile+0x2ab [f:\dd\ndp\clr\src\jit32\compiler.cpp @ 3086]
04d54ff4 6d8c45c8 clrjit!jitNativeCode+0x1f6 [f:\dd\ndp\clr\src\jit32\compiler.cpp @ 4057]
04d54ff8 6d8c377d clrjit!CILJit::compileMethod+0x7d [f:\dd\ndp\clr\src\jit32\ee_il_dll.cpp @ 180]
04d54ffc 633b39b3 clr!invokeCompileMethodHelper+0x10b
04d55000 633b3a8b clr!invokeCompileMethod+0x3d
04d55004 633b3ae8 clr!CallCompileMethodWithSEHWrapper+0x39
04d55008 633b3d97 clr!UnsafeJitFunction+0x431
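The dynamic library Win32Project1.dll is a second-level wrapper around MSTP_DLL, and we can be sure it does not leak.
The trace shows that this heap was created by msvcr90!_heap_init while a DLL was being loaded via LoadLibraryW from Win32Project1!BACNet::Init, that is, during initialization of communication with the hardware device, with CLR frames further down the stack.
Knowing this does not help much, though, because we already knew the BACnet communication library was at fault.
It only tells us that the leak happens after initialization.
But why were these rootless pointers not garbage collected?
On reflection this is also normal: the memory was clearly allocated inside the library written in C, so it is unmanaged memory.
The C# garbage collector can only reclaim managed memory, so unless this data is freed explicitly it stays there forever.
For now we seem to be stuck with no leads, so let's set this aside and look at how the other two block sizes are allocated.
6. List the blocks of size 18 with: !heap -flt s 18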
80.dmp
> !heap -flt s 18
...
16f45098 000a 000a [00] 16f450c0 00018 - (busy)
16f45358 000a 000a [00] 16f45380 00018 - (busy)
16f45618 000a 000a [00] 16f45640 00018 - (busy)
16f458d8 000a 000a [00] 16f45900 00018 - (busy)
16f45b98 000a 000a [00] 16f45bc0 00018 - (busy)
16f46080 000a 000a [00] 16f460a8 00018 - (busy)
16f46118 000a 000a [00] 16f46140 00018 - (busy)
16f461b0 000a 000a [00] 16f461d8 00018 - (busy)
16f46248 000a 000a [00] 16f46270 00018 - (busy)
16f462e0 000a 000a [00] 16f46308 00018 - (busy)
16f46378 000a 000a [00] 16f463a0 00018 - (busy)
16f46410 000a 000a [00] 16f46438 00018 - (busy)
16f464a8 000b 000a [00] 16f464d0 00018 - (busy)
16f46548 000a 000b [00] 16f46570 00018 - (busy)
16f46808 000a 000a [00] 16f46830 00018 - (busy)
16f46ac8 000a 000a [00] 16f46af0 00018 - (busy)
16f46d88 000a 000a [00] 16f46db0 00018 - (busy)
16f47048 000a 000a [00] 16f47070 00018 - (busy)
16f47308 000a 000a [00] 16f47330 00018 - (busy)
...
7. Pick a few at random and run: !heap -p -a [UserPtr]
80.dmp
0:051> !heap -p -a
invalid address passed to `-p -a'
0:051> !heap -p -a 16f460a8
address 16f460a8 found in
_HEAP @ 11b30000
HEAP_ENTRY Size Prev Flags UserPtr UserSize - state
16f46080 000a 0000 [00] 16f460a8 00018 - (busy)
Trace: 074b
7405a6a7 verifier!AVrfpDphNormalHeapAllocate+0x000000d7
74058f6e verifier!AVrfDebugPageHeapAllocate+0x0000030e
77d10fe6 ntdll!RtlDebugAllocateHeap+0x00000030
77ccab8e ntdll!RtlpAllocateHeap+0x000000c4
77c73461 ntdll!RtlAllocateHeap+0x0000023a
664668e5 msvcr90!_calloc_impl+0x00000125
66463c5a msvcr90!calloc+0x0000001a
*** ERROR: Symbol file could not be found. Defaulted to export symbols for MSTP_DLL.dll -
669baea1 MSTP_DLL!MSTP_Get_RPM_ACK_Data+0x00000091
0:051> !heap -p -a 16f46570
address 16f46570 found in
_HEAP @ 11b30000
HEAP_ENTRY Size Prev Flags UserPtr UserSize - state
16f46548 000a 0000 [00] 16f46570 00018 - (busy)
7405a6a7 verifier!AVrfpDphNormalHeapAllocate+0x000000d7
74058f6e verifier!AVrfDebugPageHeapAllocate+0x0000030e
77d10fe6 ntdll!RtlDebugAllocateHeap+0x00000030
77ccab8e ntdll!RtlpAllocateHeap+0x000000c4
77c73461 ntdll!RtlAllocateHeap+0x0000023a
664668e5 msvcr90!_calloc_impl+0x00000125
66463c5a msvcr90!calloc+0x0000001a
669baea1 MSTP_DLL!MSTP_Get_RPM_ACK_Data+0x00000091
0:051> !heap -p -a 16f46308
address 16f46308 found in
_HEAP @ 11b30000
HEAP_ENTRY Size Prev Flags UserPtr UserSize - state
16f462e0 000a 0000 [00] 16f46308 00018 - (busy)
Trace: 074b
7405a6a7 verifier!AVrfpDphNormalHeapAllocate+0x000000d7
74058f6e verifier!AVrfDebugPageHeapAllocate+0x0000030e
77d10fe6 ntdll!RtlDebugAllocateHeap+0x00000030
77ccab8e ntdll!RtlpAllocateHeap+0x000000c4
77c73461 ntdll!RtlAllocateHeap+0x0000023a
664668e5 msvcr90!_calloc_impl+0x00000125
66463c5a msvcr90!calloc+0x0000001a
669baea1 MSTP_DLL!MSTP_Get_RPM_ACK_Data+0x00000091
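This time it went smoothly: the memory is allocated inside MSTP_Get_RPM_ACK_Data in MSTP_DLL. This is exactly the leak-site information we were looking for.
Now repeat the same steps for blocks of size 10.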
80.dmp
> !heap -flt s 10
...
15359fa0 0009 0009 [00] 15359fc8 00010 - (busy)
1535a2a0 0009 0009 [00] 1535a2c8 00010 - (busy)
1535a560 0009 0009 [00] 1535a588 00010 - (busy)
1535aee8 0009 0009 [00] 1535af10 00010 - (busy)
1535af80 0009 0009 [00] 1535afa8 00010 - (busy)
1535b018 0009 0009 [00] 1535b040 00010 - (busy)
1535b360 0009 0009 [00] 1535b388 00010 - (busy)
1535b620 0009 0009 [00] 1535b648 00010 - (busy)
1535c420 0009 0009 [00] 1535c448 00010 - (busy)
1535d220 0009 0009 [00] 1535d248 00010 - (busy)
1535d4e0 0009 0009 [00] 1535d508 00010 - (busy)
1535d7a0 0009 0009 [00] 1535d7c8 00010 - (busy)
1535da60 0009 0009 [00] 1535da88 00010 - (busy)
1535dd20 0009 0009 [00] 1535dd48 00010 - (busy)
1535dfe0 0009 0009 [00] 1535e008 00010 - (busy)
1535e2a0 0009 0009 [00] 1535e2c8 00010 - (busy)
1535e560 0009 0009 [00] 1535e588 00010 - (busy)
1535e820 0009 0009 [00] 1535e848 00010 - (busy)
1535eae0 0009 0009 [00] 1535eb08 00010 - (busy)
1535eda0 0009 0009 [00] 1535edc8 00010 - (busy)
1535f060 0009 0009 [00] 1535f088 00010 - (busy)
1535f320 0009 0009 [00] 1535f348 00010 - (busy)
1535f5e0 0009 0009 [00] 1535f608 00010 - (busy)
...
80.dmp
0:051> !heap -p -a 1535eb08
address 1535eb08 found in
_HEAP @ 11b30000
HEAP_ENTRY Size Prev Flags UserPtr UserSize - state
1535eae0 0009 0000 [00] 1535eb08 00010 - (busy)
Trace: 0817
7405a6a7 verifier!AVrfpDphNormalHeapAllocate+0x000000d7
74058f6e verifier!AVrfDebugPageHeapAllocate+0x0000030e
77d10fe6 ntdll!RtlDebugAllocateHeap+0x00000030
77ccab8e ntdll!RtlpAllocateHeap+0x000000c4
77c73461 ntdll!RtlAllocateHeap+0x0000023a
664668e5 msvcr90!_calloc_impl+0x00000125
66463c5a msvcr90!calloc+0x0000001a
669bb07b MSTP_DLL!MSTP_Get_RP_ACK_Data+0x0000003b
0:051> !heap -p -a 1535f088
address 1535f088 found in
_HEAP @ 11b30000
HEAP_ENTRY Size Prev Flags UserPtr UserSize - state
1535f060 0009 0000 [00] 1535f088 00010 - (busy)
Trace: 0817
7405a6a7 verifier!AVrfpDphNormalHeapAllocate+0x000000d7
74058f6e verifier!AVrfDebugPageHeapAllocate+0x0000030e
77d10fe6 ntdll!RtlDebugAllocateHeap+0x00000030
77ccab8e ntdll!RtlpAllocateHeap+0x000000c4
77c73461 ntdll!RtlAllocateHeap+0x0000023a
664668e5 msvcr90!_calloc_impl+0x00000125
66463c5a msvcr90!calloc+0x0000001a
669bb07b MSTP_DLL!MSTP_Get_RP_ACK_Data+0x0000003b
0:051> !heap -p -a 1535f348
address 1535f348 found in
_HEAP @ 11b30000
HEAP_ENTRY Size Prev Flags UserPtr UserSize - state
1535f320 0009 0000 [00] 1535f348 00010 - (busy)
Trace: 0817
7405a6a7 verifier!AVrfpDphNormalHeapAllocate+0x000000d7
74058f6e verifier!AVrfDebugPageHeapAllocate+0x0000030e
77d10fe6 ntdll!RtlDebugAllocateHeap+0x00000030
77ccab8e ntdll!RtlpAllocateHeap+0x000000c4
77c73461 ntdll!RtlAllocateHeap+0x0000023a
664668e5 msvcr90!_calloc_impl+0x00000125
66463c5a msvcr90!calloc+0x0000001a
669bb07b MSTP_DLL!MSTP_Get_RP_ACK_Data+0x0000003b
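This also gave us the second leak site without trouble, inside MSTP_Get_RP_ACK_Data in MSTP_DLL.
MSTP_Get_RP_ACK_Data
MSTP_Get_RPM_ACK_Data
Both functions return a data pointer when reading a module's point values or collecting module information.
It is now fairly clear that the pointers these two functions return are suspect, and very likely to be where memory is leaking.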
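When there are many traces to check, the interesting frame is the first one that does not belong to the allocator machinery. Below is a minimal sketch of pulling that frame out of saved `!heap -p -a` text. The module list in `ALLOCATOR_MODULES` and the helper name `first_caller_frame` are my own assumptions, based only on the traces in this post.

```python
import re

# Modules that are part of the allocation path rather than the caller.
# Assumption: this list covers the traces shown in this post, nothing more.
ALLOCATOR_MODULES = {"verifier", "ntdll", "msvcr90"}

# A frame looks like: '669baea1 MSTP_DLL!MSTP_Get_RPM_ACK_Data+0x00000091'
FRAME = re.compile(r"^[0-9a-f]{8}\s+(\w+)!([\w:]+)\+0x[0-9a-f]+$")


def first_caller_frame(trace_text):
    """Return 'module!function' of the first non-allocator frame, or None."""
    for line in trace_text.splitlines():
        match = FRAME.match(line.strip())
        if match and match.group(1).lower() not in ALLOCATOR_MODULES:
            return f"{match.group(1)}!{match.group(2)}"
    return None


# Trace of a size-18 block: it has a frame from MSTP_DLL after calloc.
TRACE_18 = """
Trace: 074b
7405a6a7 verifier!AVrfpDphNormalHeapAllocate+0x000000d7
77c73461 ntdll!RtlAllocateHeap+0x0000023a
66463c5a msvcr90!calloc+0x0000001a
*** ERROR: Symbol file could not be found. Defaulted to export symbols for MSTP_DLL.dll -
669baea1 MSTP_DLL!MSTP_Get_RPM_ACK_Data+0x00000091
"""

# Trace of a size-1f0 block: it stops at calloc, so no caller is visible.
TRACE_1F0 = """
Trace: 083a
7405a6a7 verifier!AVrfpDphNormalHeapAllocate+0x000000d7
77c73461 ntdll!RtlAllocateHeap+0x0000023a
66463c5a msvcr90!calloc+0x0000001a
"""

if __name__ == "__main__":
    print(first_caller_frame(TRACE_18))
    print(first_caller_frame(TRACE_1F0))
```

A `None` result marks exactly the kind of trace that stalled the 1f0 investigation: the recorded stack ends inside the C runtime.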
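8. Verification
Together with a colleague I tracked down the original MSTP_DLL source code and found the two functions above.
The colleague who designed these functions made two clear mistakes:
1) The returned pointer has memory allocated for it, but that memory is never freed.
2) The data pointer should not be handed out through the return value. It should be a parameter: the caller declares the buffer, passes it in to be filled, uses it, and then frees it.
3) Both functions have the same problems.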
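The difference between the two designs can be shown with a toy model. This is only a sketch of the ownership idea: the real MSTP_DLL source is not reproduced here, `ToyHeap` is a made-up stand-in for calloc/free that merely counts live blocks, and all function names are hypothetical.

```python
class ToyHeap:
    """Hypothetical stand-in for calloc/free that only counts live blocks."""

    def __init__(self):
        self.live = 0

    def calloc(self, size):
        self.live += 1
        return bytearray(size)  # zero-filled, like calloc

    def free(self, block):
        self.live -= 1


def get_data_leaky(heap):
    """Callee allocates and returns the block; nobody is told to free it."""
    block = heap.calloc(0x18)
    block[0] = 1  # pretend to store a reading
    return block


def get_data_into(buffer):
    """Callee only fills a buffer that the caller owns."""
    buffer[0] = 1  # pretend to store a reading


def poll_leaky(heap, rounds):
    for _ in range(rounds):
        get_data_leaky(heap)  # result is used, then dropped without free


def poll_fixed(heap, rounds):
    buffer = heap.calloc(0x18)  # caller allocates once
    try:
        for _ in range(rounds):
            get_data_into(buffer)
    finally:
        heap.free(buffer)  # caller releases what it allocated


if __name__ == "__main__":
    leaky, fixed = ToyHeap(), ToyHeap()
    poll_leaky(leaky, 1000)
    poll_fixed(fixed, 1000)
    print("live blocks, callee-allocated design:", leaky.live)
    print("live blocks, caller-owned design:", fixed.live)
```

With the caller owning the buffer, allocation and release sit side by side in one function, so a missing free is much harder to overlook.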
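V. Summary
1) We know there are three leaking allocations, of sizes 1f0, 18, and 10.
2) Together they account for 99% of the memory that is allocated and never freed.
3) We have found two of the leak sites; one remains.
4) 1f0 is the most important: it accounts for 92% of the memory consumed. Without fixing it, the problem is essentially unsolved.
5) We could not find a useful call stack for 1f0, its contents have no distinctive features, and it has no reference roots.
6) Hmm? (rising tone)
We seem to have missed a piece of information.
Remember this output from earlier?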
80.dmp
0:051> !heap -stat -h 11b30000
heap @ 11b30000
group-by: TOTSIZE max-display: 20
size #blocks total ( %) (percent of total busy bytes)
1f0 102d9 - 1f58470 (92.48)
18 102b0 - 184080 (4.47)
10 102ae - 102ae0 (2.98)
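The #blocks values are very close to each other. These are allocation counts, which means the three block sizes have been allocated almost the same number of times.
With that in mind, let's go back and check what these counts were at 33 MB.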
33.dmp
0:047> !heap -stat -h 11b30000
heap @ 11b30000
group-by: TOTSIZE max-display: 20
size #blocks total ( %) (percent of total busy bytes)
1f0 1f2 - 3c4e0 (86.13)
18 1c9 - 2ad8 (3.82)
1000 2 - 2000 (2.86)
10 1c7 - 1c70 (2.54)
214 c - 18f0 (2.23)
800 2 - 1000 (1.43)
220 1 - 220 (0.19)
1d7 1 - 1d7 (0.16)
80 3 - 180 (0.13)
a4 1 - a4 (0.06)
24 4 - 90 (0.05)
14 4 - 50 (0.03)
4a 1 - 4a (0.03)
25 2 - 4a (0.03)
48 1 - 48 (0.03)
46 1 - 46 (0.02)
41 1 - 41 (0.02)
3e 1 - 3e (0.02)
3c 1 - 3c (0.02)
37 1 - 37 (0.02)
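They are 1f2, 1c9, and 1c7 respectively. Subtracting them from the 80.dmp counts (the counts are hexadecimal, the differences are shown in decimal):
1f0: 102d9 - 1f2 = 65767
18: 102b0 - 1c9 = 65767
10: 102ae - 1c7 = 65767
The number of new allocations is exactly the same in all three cases!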
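The subtraction mixes hexadecimal inputs with a decimal result, so it is worth checking mechanically. The sketch below recomputes the three differences and, assuming that one block of each size is allocated together per polling round (an assumption the equal counts suggest but do not prove), estimates the user bytes leaked between the two dumps. Heap and verifier overhead is not included.

```python
# Block counts (hexadecimal strings) from `!heap -stat -h 11b30000`:
# size -> (count in 33.dmp, count in 80.dmp)
COUNTS = {
    "1f0": ("1f2", "102d9"),
    "18": ("1c9", "102b0"),
    "10": ("1c7", "102ae"),
}


def new_blocks(before_hex, after_hex):
    """Number of blocks allocated between the two snapshots."""
    return int(after_hex, 16) - int(before_hex, 16)


def leaked_user_bytes(counts):
    """Sum of block size times new block count over all sizes."""
    return sum(int(size, 16) * new_blocks(b, a) for size, (b, a) in counts.items())


if __name__ == "__main__":
    for size, (before, after) in COUNTS.items():
        print(f"size {size}: {new_blocks(before, after)} new blocks")
    total = leaked_user_bytes(COUNTS)
    print(f"user bytes leaked: {total} ({total / 2**20:.1f} MiB)")
```

Each round accounts for 0x1f0 + 0x18 + 0x10 = 536 bytes of user data, so 65767 rounds come to about 35 million bytes, the same order of magnitude as the growth seen in heap 11b30000.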
That settles it! The 1f0 allocation is clearly tied to the other two. The first suspects are
MSTP_Get_RP_ACK_Data
MSTP_Get_RPM_ACK_Data
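1) Does any helper function called from these two functions allocate memory?
2) Is the allocated size exactly 1f0?
With that hypothesis in mind, I read the two functions again.
The analysis shows that the size of the BACNET_APPLICATION_DATA_VALUE struct is exactly 1f0.
Done.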