記一次使用windbg排查記憶體洩漏的過程

漂亮的貓發表於2020-05-31

一、背景

  近期有一個專案在執行當中出現一些問題,程式順利啟動,但是觀察一陣子後發現記憶體使用總量在很緩慢地升高,

雖然偶爾還會往下降一些,但是總體還是不斷上升;記憶體執行6個小時候從33M上升到80M;

  程式存在記憶體洩漏是確定無疑的了,大概出問題的方向也知道,就是程式新加入一個採集協議(BACnet協議,MSTP_DLL),

但是怎麼把具體洩漏位置找出來卻非常麻煩,因為這個協議是封裝在一個C語言寫的動態庫中,想要單步除錯好像不太可能,

況且原始碼也不再我這裡;

  如果到此為止,推脫給其他同事找問題,那聯合除錯費時不說。其他同事也身兼數職,不大可能有時間除錯,

那專案推進肯定停滯;那沒辦法了,只能硬著頭皮上;網上了解一番,對於這種記憶體洩漏問題,比較好的處理方式就是

抓取記憶體快照,然後分析資料提交記錄,使用檢視使用堆疊等資訊;所以基於以上原因,選擇了windbg核心除錯工具;

先分析一下看看,說不定可以發現問題;

 

二、windbg注意事項

1、首先要安裝對版本,即你的程式是32位還是64位,對於的windbg版本也要一致,否則會報錯;詳情瞭解:點選這裡

2、需要用64位的工作管理員抓32位的dump檔案,那不能直接在工作管理員右鍵“建立轉儲檔案“,需要執行(C:\Windows\SysWOW64\taskmgr.exe)

3、或者直接在windbg上使用命令儲存,先附加到程式,然後使用命令:(.dump /ma c:\xxx.dmp),這樣就將快照儲存在C盤了;

4、最重要的,要確保你的機器能連線外網;由於windbg的使用需要線上更新符號檔案,但是這個地址剛好被國家防火牆遮蔽;

 

三、windbg必要設定

1、首先我先抓取2個記憶體快照檔案(中間相隔一段時間),如下

 

 

2、開啟windbg,設定符號下載路徑

將33.dmp直接拖進工作區即可,然後開啟選單File -> Symbol File Path

輸入地址:SRV*c:\symbols*http://msdl.microsoft.com/download/symbols

 

四、分析檔案

1、分別開啟兩個dmp檔案,輸入命令!dumpheap -stat檢視各種型別的記憶體分配情況

33.dmp
>.load C:\Windows\Microsoft.NET\Framework\v4.0.30319\SOS.dll >!dumpheap -stat ..... 61f87928 2292 34012 System.RuntimeType[] 5d2dbe74 267 34176 System.Data.DataColumn 61fd75e0 668 37408 System.Reflection.RuntimePropertyInfo 61f8426c 702 48976 System.Int32[] 5d2dcc24 70 72520 System.Data.RBTree`1+Node[[System.Data.DataRow, System.Data]][] 61f883e4 1242 84456 System.Reflection.RuntimeParameterInfo 61f8839c 2045 89980 System.Signature 0a7566bc 596 92976 HG.MacamUnit.Entity.TSubSysNodes 61f82788 723 117736 System.Object[] 61f89850 8 131696 System.Int64[] 61fd8938 2792 167520 System.Reflection.RuntimeMethodInfo 007988d0 220 434392 Free 61f824e4 12187 738904 System.String 61f85c40 2138 743067 System.Byte[] 61f82c60 294 6629796 System.Char[] Total 55014 objects
80.dmp
>.load C:\Windows\Microsoft.NET\Framework\v4.0.30319\SOS.dll
>!dumpheap -stat
.....
61f83698      876        24528 System.RuntimeType
61f84ec0      159        26472 System.Collections.Hashtable+bucket[]
61fc9020      631        27764 System.Reflection.RtFieldInfo
61f95be8       46        28392 System.Reflection.Emit.__FixupData[]
61f87928     2292        34012 System.RuntimeType[]
61fd75e0      668        37408 System.Reflection.RuntimePropertyInfo
5d2dcc24       42        43512 System.Data.RBTree`1+Node[[System.Data.DataRow, System.Data]][]
61f8426c      595        45868 System.Int32[]
61f883e4     1242        84456 System.Reflection.RuntimeParameterInfo
61f8839c     2045        89980 System.Signature
61f82788      622       113684 System.Object[]
61f89850        8       131696 System.Int64[]
61fd8938     2769       166140 System.Reflection.RuntimeMethodInfo
61f824e4     9800       676596 System.String
61f85c40     2064       705655 System.Byte[]
61f82c60      195      2369402 System.Char[]
007988d0      114      3338792      Free
Total 47306 objects  

著重分析(紅色部分)這兩個檔案的記憶體分配情況,似乎差別不大,完全看不出來80-33=近50M的記憶體消耗在哪裡;

但認真思考一下,這樣好像也沒有問題,因為System.***這種型別是C#環境獨有的,已知C#沒有記憶體洩漏,所以這裡沒有體現應該是正常的;

那C語言介面檔案裡邊的問題該如何找出來呢?

 

2、再來試試!heap -s,檢視各種堆的記憶體提交資料量

33.dmp

0:047> !heap -s
LFH Key                   : 0x343fce0b
Termination on corruption : ENABLED
  Heap     Flags   Reserv  Commit  Virt   Free  List   UCR  Virt  Lock  Fast 
                    (k)     (k)    (k)     (k) length      blocks cont. heap 
-----------------------------------------------------------------------------
00780000 00000002    8192   4636   8192    209  2484     4    0      e   LFH
002e0000 00001002     256      4    256      2     1     1    0      0      
00280000 00001002    1088     72   1088      5     2     2    0      0      
00c70000 00041002     256      4    256      2     1     1    0      0      
002d0000 00001002    1088    132   1088      8    23     2    0      0      
00450000 00001002     256      4    256      0     1     1    0      0      
07230000 00041002     256      4    256      2     1     1    0      0      
00c10000 00001002     256    216    256      3    39     1    0      0   LFH
09b50000 00001002     256     80    256     39    28     1    0      0      
09d00000 00001002      64      4     64      2     1     1    0      0      
09ef0000 00001002    1088     72   1088      6     2     2    0      0      
004c0000 00001002    1088    192   1088     15   140     2    0      0      
09760000 00041002     256     28    256      4     4     1    0      0      
09ed0000 00001002      64     12     64      1     1     1    0      0      
0b210000 00001002    3136   1456   3136     52    84     3    0      0   LFH
0a700000 00001002     256    212    256      2     1     1    0      0      
0e1e0000 00011002     256      4    256      0     1     1    0      0      
0d030000 00001002     256     16    256      3     1     1    0      0      
11b30000 00001002    1088    388   1088      0     1     2    0      0      
-----------------------------------------------------------------------------

 

80.dmp

0:051> !heap -s LFH Key : 0x343fce0b Termination on corruption : ENABLED Heap Flags Reserv Commit Virt Free List UCR Virt Lock Fast (k) (k) (k) (k) length blocks cont. heap ----------------------------------------------------------------------------- 00780000 00000002 8192 4808 8192 225 2505 4 0 f1 LFH 002e0000 00001002 256 4 256 2 1 1 0 0 00280000 00001002 1088 132 1088 4 6 2 0 0 00c70000 00041002 256 4 256 2 1 1 0 0 002d0000 00001002 1088 168 1088 12 26 2 0 0 00450000 00001002 256 4 256 0 1 1 0 0 07230000 00041002 256 4 256 2 1 1 0 0 00c10000 00001002 256 228 256 26 69 1 0 0 LFH 09b50000 00001002 256 80 256 39 25 1 0 0 09d00000 00001002 64 4 64 2 1 1 0 0 09ef0000 00001002 1088 132 1088 6 5 2 0 0 004c0000 00001002 1088 220 1088 26 173 2 0 0 09760000 00041002 256 28 256 4 8 1 0 0 09ed0000 00001002 64 12 64 1 1 1 0 0 0b210000 00001002 3136 1456 3136 74 71 3 0 0 LFH 0a700000 00001002 256 212 256 2 1 1 0 0 0e1e0000 00011002 256 4 256 0 1 1 0 0 0d030000 00001002 256 16 256 1 1 1 0 0 11b30000 00001002 47808 46068 47808 396 6836 7 0 0 -----------------------------------------------------------------------------

這次有異常了,可以看到11b30000這一行記憶體提交變化很大 47808 - 1088 = 46720;

這次可以肯定問題就在這個堆裡邊;

3、進去看看11b30000,使用命令:!heap -stat -h 11b30000

80.dmp


0:051> !heap -stat -h 11b30000 
 heap @ 11b30000
group-by: TOTSIZE max-display: 20
    size     #blocks     total     ( %) (percent of total busy bytes)
    1f0 102d9 - 1f58470  (92.48)
    18 102b0 - 184080  (4.47)
    10 102ae - 102ae0  (2.98)
    214 13 - 277c  (0.03)
    1000 2 - 2000  (0.02)
    800 2 - 1000  (0.01)
    220 1 - 220  (0.00)
    1d7 1 - 1d7  (0.00)
    80 3 - 180  (0.00)
    a4 1 - a4  (0.00)
    24 4 - 90  (0.00)
    14 4 - 50  (0.00)
    4a 1 - 4a  (0.00)
    25 2 - 4a  (0.00)
    48 1 - 48  (0.00)
    46 1 - 46  (0.00)
    41 1 - 41  (0.00)
    3e 1 - 3e  (0.00)
    3c 1 - 3c  (0.00)
    37 1 - 37  (0.00)

 

可以看到前面3項幾乎佔據99%的記憶體提交記錄;尤其以記憶體塊大小為1f0的資料塊使用最多記憶體;

到目前為止,我們知道了幾項有效資訊,有大小分別為1f0、18、10的三種資料塊,不斷申請出新空間;

但是這樣還不夠,根據一個記憶體塊的大小並不能準確定位是哪裡出了問題,這是一個結構體?還是字串?還是陣列?

都不知道,所以有必要進去看看,有哪些地方使用到了這些資料塊

 

4、檢視使用了1f0資料塊大小的位置列表,使用命令:!heap -flt s [size]

80.dmp

0:051> !heap -flt s 1f0
    _DPH_HEAP_ROOT @ 5a1000
    Freed and decommitted blocks
      DPH_HEAP_BLOCK : VirtAddr VirtSize
    Busy allocations
      DPH_HEAP_BLOCK : UserAddr  UserSize - VirtAddr VirtSize
    _HEAP @ 780000
      HEAP_ENTRY Size Prev Flags    UserPtr UserSize - state
        0078e5b8 0045 0000  [00]   0078e5e0    001f0 - (busy)
    _DPH_HEAP_ROOT @ 9d11000
    Freed and decommitted blocks
      DPH_HEAP_BLOCK : VirtAddr VirtSize
    Busy allocations
      DPH_HEAP_BLOCK : UserAddr  UserSize - VirtAddr VirtSize
    _HEAP @ 4c0000
    _DPH_HEAP_ROOT @ af41000
    Freed and decommitted blocks
      DPH_HEAP_BLOCK : VirtAddr VirtSize
    Busy allocations
      DPH_HEAP_BLOCK : UserAddr  UserSize - VirtAddr VirtSize
    _HEAP @ b210000
        0cf61680 0045 0045  [00]   0cf616a8    001f0 - (busy)
    _DPH_HEAP_ROOT @ d871000
    Freed and decommitted blocks
      DPH_HEAP_BLOCK : VirtAddr VirtSize
    Busy allocations
      DPH_HEAP_BLOCK : UserAddr  UserSize - VirtAddr VirtSize
    _HEAP @ d030000
    _DPH_HEAP_ROOT @ 11631000
    Freed and decommitted blocks
      DPH_HEAP_BLOCK : VirtAddr VirtSize
    Busy allocations
      DPH_HEAP_BLOCK : UserAddr  UserSize - VirtAddr VirtSize
    _HEAP @ 11b30000
        11b312e8 0045 0045  [00]   11b31310    001f0 - (busy)
        11b315a8 0045 0045  [00]   11b315d0    001f0 - (busy)
        11b356f8 0045 0045  [00]   11b35720    001f0 - (busy)
        11b35920 0045 0045  [00]   11b35948    001f0 - (busy)
        11b36f30 0045 0045  [00]   11b36f58    001f0 - (busy)
        11b37b58 0045 0045  [00]   11b37b80    001f0 - (busy)
        11b37e18 0045 0045  [00]   11b37e40    001f0 - (busy)
        11b3e4f0 0045 0045  [00]   11b3e518    001f0 - (busy)
        11b3f570 0045 0045  [00]   11b3f598    001f0 - (busy)
        11b3f830 0045 0045  [00]   11b3f858    001f0 - (busy)
        11b3faf0 0045 0045  [00]   11b3fb18    001f0 - (busy)
        11b3fdb0 0046 0045  [00]   11b3fdd8    001f0 - (busy)
        12890578 0045 0046  [00]   128905a0    001f0 - (busy)
        ......  

可以看到有很多堆都有使用到1f0大小的記憶體塊,但是隻有最後一個堆 _DPH_HEAP_ROOT @ 11631000

是記錄最多的,滿屏都是,這裡只能截斷,選取一部分看看  

5、檢視呼叫堆疊,使用命令:!heap -p -a [address]

80.dmp

0:051> !heap -p -a 11b3fdd8
    address 11b3fdd8 found in
    _HEAP @ 11b30000
      HEAP_ENTRY Size Prev Flags    UserPtr UserSize - state
        11b3fdb0 0046 0000  [00]   11b3fdd8    001f0 - (busy)
        Trace: 083a
        7405a6a7 verifier!AVrfpDphNormalHeapAllocate+0x000000d7
        74058f6e verifier!AVrfDebugPageHeapAllocate+0x0000030e
        77d10fe6 ntdll!RtlDebugAllocateHeap+0x00000030
        77ccab8e ntdll!RtlpAllocateHeap+0x000000c4
        77c73461 ntdll!RtlAllocateHeap+0x0000023a
        664668e5 msvcr90!_calloc_impl+0x00000125
        66463c5a msvcr90!calloc+0x0000001a

 
0:051> !heap -p -a 11b3fdd8
    address 11b3fdd8 found in
    _HEAP @ 11b30000
      HEAP_ENTRY Size Prev Flags    UserPtr UserSize - state
        11b3fdb0 0046 0000  [00]   11b3fdd8    001f0 - (busy)
        Trace: 083a
        7405a6a7 verifier!AVrfpDphNormalHeapAllocate+0x000000d7
        74058f6e verifier!AVrfDebugPageHeapAllocate+0x0000030e
        77d10fe6 ntdll!RtlDebugAllocateHeap+0x00000030
        77ccab8e ntdll!RtlpAllocateHeap+0x000000c4
        77c73461 ntdll!RtlAllocateHeap+0x0000023a
        664668e5 msvcr90!_calloc_impl+0x00000125
        66463c5a msvcr90!calloc+0x0000001a

 
0:051> !heap -p -a 11b3fb18
    address 11b3fb18 found in
    _HEAP @ 11b30000
      HEAP_ENTRY Size Prev Flags    UserPtr UserSize - state
        11b3faf0 0045 0000  [00]   11b3fb18    001f0 - (busy)
        Trace: 083a
        7405a6a7 verifier!AVrfpDphNormalHeapAllocate+0x000000d7
        74058f6e verifier!AVrfDebugPageHeapAllocate+0x0000030e
        77d10fe6 ntdll!RtlDebugAllocateHeap+0x00000030
        77ccab8e ntdll!RtlpAllocateHeap+0x000000c4
        77c73461 ntdll!RtlAllocateHeap+0x0000023a
        664668e5 msvcr90!_calloc_impl+0x00000125
        66463c5a msvcr90!calloc+0x0000001a  

  隨意挑選幾個檢視呼叫堆疊,似乎沒有有用的特徵資訊,verifier、ntdll、msvcr90這些都是作業系統核心級別的函式;

並不能暴露出使用1f0大小的資料塊大概位置,這就有點難辦了,難道此路不通?如果不找到有效堆疊資訊,想定位

內心洩漏點,靠單步除錯會相當麻煩。。。

  不急,先看看,這些地方記憶體塊內容是什麼,說不定能找到一些有效特徵資訊;

使用命令:db [UserPtr]

80.dmp

0:051> db 11b3fb18
11b3fb18  00 00 04 00 00 00 00 00-00 00 00 00 00 00 00 00  ................
11b3fb28  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00  ................
11b3fb38  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00  ................
11b3fb48  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00  ................
11b3fb58  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00  ................
11b3fb68  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00  ................
11b3fb78  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00  ................
11b3fb88  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00  ................
0:051> db 11b3fdd8
11b3fdd8  00 00 04 00 00 00 00 00-00 00 00 00 00 00 00 00  ................
11b3fde8  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00  ................
11b3fdf8  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00  ................
11b3fe08  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00  ................
11b3fe18  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00  ................
11b3fe28  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00  ................
11b3fe38  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00  ................
11b3fe48  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00  ................
0:051> db 11b3fdd8
11b3fdd8  00 00 04 00 00 00 00 00-00 00 00 00 00 00 00 00  ................
11b3fde8  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00  ................
11b3fdf8  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00  ................
11b3fe08  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00  ................
11b3fe18  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00  ................
11b3fe28  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00  ................
11b3fe38  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00  ................
11b3fe48  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00  ................  

結果是令人失望的;

顯示這些基本都是空白記憶體,裡邊已經沒有任何有效資訊,,

陷入死衚衕裡了,難道到此為止?

還不死心,我們再看看這些地址有沒有引用跟,如果有引用跟,也可以列印堆疊資訊

使用命令:!gcroot [UserPtr]

80.dmp

0:051> !gcroot 11b3fb18
Found 0 unique roots (run '!GCRoot -all' to see all roots).
0:051> !gcroot 11b3fdd8
Found 0 unique roots (run '!GCRoot -all' to see all roots).
0:051> !gcroot 11b3fdd8
Found 0 unique roots (run '!GCRoot -all' to see all roots).

 願望是美好的,這個大小位1f0的資料塊被申請了0x102d9次,使用!gcroot命令檢視得到貌似都是無引用的野資料

我們再來看看,這個 _DPH_HEAP_ROOT @ 11631000堆的建立堆疊

80.dmp

0:051> dt ntdll!_DPH_HEAP_ROOT CreateStackTrace 11631000
   +0x0b8 CreateStackTrace : 0x04d54f8c _RTL_TRACE_BLOCK
0:051> dds 0x04d54f8c 
04d54f8c  04d1b714
04d54f90  0000f801
04d54f94  000f0000
04d54f98  74058969 verifier!AVrfDebugPageHeapCreate+0x439
04d54f9c  77cbcea2 ntdll!RtlCreateHeap+0x41
04d54fa0  757356bc KERNELBASE!HeapCreate+0x50
04d54fa4  66463a4a msvcr90!_heap_init+0x1b
04d54fa8  66422bb4 msvcr90!__p__tzname+0x2a
04d54fac  66422d5e msvcr90!_CRTDLL_INIT+0x1e
04d54fb0  77c79264 ntdll!LdrpCallInitRoutine+0x14
04d54fb4  77c7fe97 ntdll!LdrpRunInitializeRoutines+0x26f
04d54fb8  77c7ea4e ntdll!LdrpLoadDll+0x472
04d54fbc  77cbd3df ntdll!LdrLoadDll+0xc7
04d54fc0  75732e6a KERNELBASE!LoadLibraryExW+0x233
04d54fc4  7562483c kernel32!LoadLibraryW+0x11
04d54fc8  6d3d18de*** WARNING: Unable to verify checksum for Win32Project1.dll
*** ERROR: Symbol file could not be found.  Defaulted to export symbols for Win32Project1.dll - 
 Win32Project1+0x18de
04d54fcc  6d3d28fc Win32Project1!BACNet::Init+0x5c
04d54fd0  6d3d5925 Win32Project1!Init+0x25
04d54fd4  66639972*** WARNING: Unable to verify checksum for SMDB.dll
*** ERROR: Symbol file could not be found.  Defaulted to export symbols for SMDB.dll - 
 SMDB!LogPop+0x12
04d54fd8  66639452 SMDB!CreateSharedMemory+0x12
04d54fdc  6d8e47bd clrjit!Compiler::impImportBlockCode+0x2aac [f:\dd\ndp\clr\src\jit32\importer.cpp @ 10258]
04d54fe0  6d8c2e6b clrjit!Compiler::impImportBlock+0x5f [f:\dd\ndp\clr\src\jit32\importer.cpp @ 13246]
04d54fe4  6d8c306a clrjit!Compiler::impImport+0x235 [f:\dd\ndp\clr\src\jit32\importer.cpp @ 14195]
04d54fe8  6d8c364f clrjit!Compiler::compCompile+0x62 [f:\dd\ndp\clr\src\jit32\compiler.cpp @ 2491]
04d54fec  6d8c4276 clrjit!Compiler::compCompileHelper+0x32f [f:\dd\ndp\clr\src\jit32\compiler.cpp @ 3615]
04d54ff0  6d8c43fc clrjit!Compiler::compCompile+0x2ab [f:\dd\ndp\clr\src\jit32\compiler.cpp @ 3086]
04d54ff4  6d8c45c8 clrjit!jitNativeCode+0x1f6 [f:\dd\ndp\clr\src\jit32\compiler.cpp @ 4057]
04d54ff8  6d8c377d clrjit!CILJit::compileMethod+0x7d [f:\dd\ndp\clr\src\jit32\ee_il_dll.cpp @ 180]
04d54ffc  633b39b3 clr!invokeCompileMethodHelper+0x10b
04d55000  633b3a8b clr!invokeCompileMethod+0x3d
04d55004  633b3ae8 clr!CallCompileMethodWithSEHWrapper+0x39
04d55008  633b3d97 clr!UnsafeJitFunction+0x431

動態庫Win32Project1.dll是對MSTP_DLL動態庫的再次封裝可以確定不存在記憶體洩漏問題;

看到這個堆是在於硬體裝置通訊的時候,初始化時CLR建立的執行緒;

不過知道這個好像也沒有什麼用,因為我們本來就知道是BACnet協議通訊的動態庫有問題;

只能說明是初始化之後產生的記憶體洩漏;

 

但是為什麼這些無跟指標沒有被垃圾回收? 

但是仔細一想,好像也是正常,因為這些是可以明確的在C語言編寫的動態庫裡申請的記憶體,屬於不受託管的記憶體;

C#垃圾回收也只能回收託管記憶體,所以這部分資料不主動釋放,那就會永遠在那裡;

但是現在,好像陷入死衚衕了,找不到思路,既然如此就先放放,先看看其他兩個資料塊的呼叫情況;

6、!heap -flt s 18

80.dmp

> !heap -flt s 18
...
        16f45098 000a 000a  [00]   16f450c0    00018 - (busy)
        16f45358 000a 000a  [00]   16f45380    00018 - (busy)
        16f45618 000a 000a  [00]   16f45640    00018 - (busy)
        16f458d8 000a 000a  [00]   16f45900    00018 - (busy)
        16f45b98 000a 000a  [00]   16f45bc0    00018 - (busy)
        16f46080 000a 000a  [00]   16f460a8    00018 - (busy)
        16f46118 000a 000a  [00]   16f46140    00018 - (busy)
        16f461b0 000a 000a  [00]   16f461d8    00018 - (busy)
        16f46248 000a 000a  [00]   16f46270    00018 - (busy)
        16f462e0 000a 000a  [00]   16f46308    00018 - (busy)
        16f46378 000a 000a  [00]   16f463a0    00018 - (busy)
        16f46410 000a 000a  [00]   16f46438    00018 - (busy)
        16f464a8 000b 000a  [00]   16f464d0    00018 - (busy)
        16f46548 000a 000b  [00]   16f46570    00018 - (busy)
        16f46808 000a 000a  [00]   16f46830    00018 - (busy)
        16f46ac8 000a 000a  [00]   16f46af0    00018 - (busy)
        16f46d88 000a 000a  [00]   16f46db0    00018 - (busy)
        16f47048 000a 000a  [00]   16f47070    00018 - (busy)
        16f47308 000a 000a  [00]   16f47330    00018 - (busy)
...

7、隨意挑幾個看看,命令:!heap -p -a [UserPtr]  

80.dmp

0:051> !heap -p -a 
invalid address  passed to `-p -a'0:051> !heap -p -a 16f460a8    
    address 16f460a8 found in
    _HEAP @ 11b30000
      HEAP_ENTRY Size Prev Flags    UserPtr UserSize - state
        16f46080 000a 0000  [00]   16f460a8    00018 - (busy)
        Trace: 074b
        7405a6a7 verifier!AVrfpDphNormalHeapAllocate+0x000000d7
        74058f6e verifier!AVrfDebugPageHeapAllocate+0x0000030e
        77d10fe6 ntdll!RtlDebugAllocateHeap+0x00000030
        77ccab8e ntdll!RtlpAllocateHeap+0x000000c4
        77c73461 ntdll!RtlAllocateHeap+0x0000023a
        664668e5 msvcr90!_calloc_impl+0x00000125
        66463c5a msvcr90!calloc+0x0000001a
*** ERROR: Symbol file could not be found.  Defaulted to export symbols for MSTP_DLL.dll - 
        669baea1 MSTP_DLL!MSTP_Get_RPM_ACK_Data+0x00000091

 
0:051> !heap -p -a 16f46570    
    address 16f46570 found in
    _HEAP @ 11b30000
      HEAP_ENTRY Size Prev Flags    UserPtr UserSize - state
        16f46548 000a 0000  [00]   16f46570    00018 - (busy)
        7405a6a7 verifier!AVrfpDphNormalHeapAllocate+0x000000d7
        74058f6e verifier!AVrfDebugPageHeapAllocate+0x0000030e
        77d10fe6 ntdll!RtlDebugAllocateHeap+0x00000030
        77ccab8e ntdll!RtlpAllocateHeap+0x000000c4
        77c73461 ntdll!RtlAllocateHeap+0x0000023a
        664668e5 msvcr90!_calloc_impl+0x00000125
        66463c5a msvcr90!calloc+0x0000001a
        669baea1 MSTP_DLL!MSTP_Get_RPM_ACK_Data+0x00000091

 
0:051> !heap -p -a 16f46308
    address 16f46308 found in
    _HEAP @ 11b30000
      HEAP_ENTRY Size Prev Flags    UserPtr UserSize - state
        16f462e0 000a 0000  [00]   16f46308    00018 - (busy)
        Trace: 074b
        7405a6a7 verifier!AVrfpDphNormalHeapAllocate+0x000000d7
        74058f6e verifier!AVrfDebugPageHeapAllocate+0x0000030e
        77d10fe6 ntdll!RtlDebugAllocateHeap+0x00000030
        77ccab8e ntdll!RtlpAllocateHeap+0x000000c4
        77c73461 ntdll!RtlAllocateHeap+0x0000023a
        664668e5 msvcr90!_calloc_impl+0x00000125
        66463c5a msvcr90!calloc+0x0000001a
        669baea1 MSTP_DLL!MSTP_Get_RPM_ACK_Data+0x00000091  

這次很順利,這個記憶體使用的地方實在MSTP_DLL的 MSTP_Get_RPM_ACK_Data裡邊;這個就是我們要找的最終的記憶體洩漏點資訊;

同樣操作堆10大小的資料塊操作一遍

80.dmp

> !heap -flt s 10
...
        15359fa0 0009 0009  [00]   15359fc8    00010 - (busy)
        1535a2a0 0009 0009  [00]   1535a2c8    00010 - (busy)
        1535a560 0009 0009  [00]   1535a588    00010 - (busy)
        1535aee8 0009 0009  [00]   1535af10    00010 - (busy)
        1535af80 0009 0009  [00]   1535afa8    00010 - (busy)
        1535b018 0009 0009  [00]   1535b040    00010 - (busy)
        1535b360 0009 0009  [00]   1535b388    00010 - (busy)
        1535b620 0009 0009  [00]   1535b648    00010 - (busy)
        1535c420 0009 0009  [00]   1535c448    00010 - (busy)
        1535d220 0009 0009  [00]   1535d248    00010 - (busy)
        1535d4e0 0009 0009  [00]   1535d508    00010 - (busy)
        1535d7a0 0009 0009  [00]   1535d7c8    00010 - (busy)
        1535da60 0009 0009  [00]   1535da88    00010 - (busy)
        1535dd20 0009 0009  [00]   1535dd48    00010 - (busy)
        1535dfe0 0009 0009  [00]   1535e008    00010 - (busy)
        1535e2a0 0009 0009  [00]   1535e2c8    00010 - (busy)
        1535e560 0009 0009  [00]   1535e588    00010 - (busy)
        1535e820 0009 0009  [00]   1535e848    00010 - (busy)
        1535eae0 0009 0009  [00]   1535eb08    00010 - (busy)
        1535eda0 0009 0009  [00]   1535edc8    00010 - (busy)
        1535f060 0009 0009  [00]   1535f088    00010 - (busy)
        1535f320 0009 0009  [00]   1535f348    00010 - (busy)
        1535f5e0 0009 0009  [00]   1535f608    00010 - (busy)
...
80.dmp

0:051> !heap -p -a 1535eb08    
    address 1535eb08 found in
    _HEAP @ 11b30000
      HEAP_ENTRY Size Prev Flags    UserPtr UserSize - state
        1535eae0 0009 0000  [00]   1535eb08    00010 - (busy)
        Trace: 0817
        7405a6a7 verifier!AVrfpDphNormalHeapAllocate+0x000000d7
        74058f6e verifier!AVrfDebugPageHeapAllocate+0x0000030e
        77d10fe6 ntdll!RtlDebugAllocateHeap+0x00000030
        77ccab8e ntdll!RtlpAllocateHeap+0x000000c4
        77c73461 ntdll!RtlAllocateHeap+0x0000023a
        664668e5 msvcr90!_calloc_impl+0x00000125
        66463c5a msvcr90!calloc+0x0000001a
        669bb07b MSTP_DLL!MSTP_Get_RP_ACK_Data+0x0000003b

 
0:051> !heap -p -a 1535f088    
    address 1535f088 found in
    _HEAP @ 11b30000
      HEAP_ENTRY Size Prev Flags    UserPtr UserSize - state
        1535f060 0009 0000  [00]   1535f088    00010 - (busy)
        Trace: 0817
        7405a6a7 verifier!AVrfpDphNormalHeapAllocate+0x000000d7
        74058f6e verifier!AVrfDebugPageHeapAllocate+0x0000030e
        77d10fe6 ntdll!RtlDebugAllocateHeap+0x00000030
        77ccab8e ntdll!RtlpAllocateHeap+0x000000c4
        77c73461 ntdll!RtlAllocateHeap+0x0000023a
        664668e5 msvcr90!_calloc_impl+0x00000125
        66463c5a msvcr90!calloc+0x0000001a
        669bb07b MSTP_DLL!MSTP_Get_RP_ACK_Data+0x0000003b

 
0:051> !heap -p -a 1535f348    
    address 1535f348 found in
    _HEAP @ 11b30000
      HEAP_ENTRY Size Prev Flags    UserPtr UserSize - state
        1535f320 0009 0000  [00]   1535f348    00010 - (busy)
        Trace: 0817
        7405a6a7 verifier!AVrfpDphNormalHeapAllocate+0x000000d7
        74058f6e verifier!AVrfDebugPageHeapAllocate+0x0000030e
        77d10fe6 ntdll!RtlDebugAllocateHeap+0x00000030
        77ccab8e ntdll!RtlpAllocateHeap+0x000000c4
        77c73461 ntdll!RtlAllocateHeap+0x0000023a
        664668e5 msvcr90!_calloc_impl+0x00000125
        66463c5a msvcr90!calloc+0x0000001a
        669bb07b MSTP_DLL!MSTP_Get_RP_ACK_Data+0x0000003b  

這次也順利拿到另一個記憶體洩漏的位置資訊在MSTP_DLL的 MSTP_Get_RP_ACK_Data裡邊;  

MSTP_Get_RP_ACK_Data

MSTP_Get_RPM_ACK_Data

這兩個方法其實是讀取模組點數值或者收集模組資訊的時候返回的一個資料指標;

現在很明顯這兩個方法返回的指標可能是有問題的,裡邊非常大的可能存在記憶體洩漏;

7、驗證

跟同事找到原來的MSTP_DLL的原始碼,找到以上兩個方法體

 

 可以看到當初那位同事設計這個方法的時候,很明顯有2個錯誤;

1)返回的指標只見宣告記憶體空間,不見釋放;

2)返回資料的指標不應該在方法體中的返回值中傳出來,應該寫在方法引數中,外部宣告,傳進去賦值,然後外部使用,再外部釋放

3)兩個方法體都一樣的問題

 

五、整理

1)我們知道有三處記憶體洩漏,分別大小是1f0、18、10 

2)三者佔據99%的新增不釋放的記憶體消耗

3)我們已經找到其中兩個洩漏位置,還剩下一個

4)1f0是重中之重,佔據記憶體消耗92%,不解決這個BUG,問題基本就相當於沒解決

5)無法找到1f0的呼叫堆疊資訊,無明顯特徵資訊,無引用跟;

5)emmmmm? (第二聲)

 

好像被我們錯過了一個資訊, 

是否還記得最開始那一段?

80.dmp

0:051> !heap -stat -h 11b30000 
 heap @ 11b30000
group-by: TOTSIZE max-display: 20
    size     #blocks     total     ( %) (percent of total busy bytes)
    1f0 102d9 - 1f58470  (92.48)
    18 102b0 - 184080  (4.47)
    10 102ae - 102ae0  (2.98)  

 這幾個資料很接近,都是申請次數大小,也就是說著三個資料塊被申請的次數差不多。。

鑑於此,我們再去看看33M記憶體的時候這幾個次數的值是多少

33.dmp

0:047> !heap -s
LFH Key                   : 0x343fce0b
Termination on corruption : ENABLED
  Heap     Flags   Reserv  Commit  Virt   Free  List   UCR  Virt  Lock  Fast 
                    (k)     (k)    (k)     (k) length      blocks cont. heap 
-----------------------------------------------------------------------------
00780000 00000002    8192   4636   8192    209  2484     4    0      e   LFH
002e0000 00001002     256      4    256      2     1     1    0      0      
00280000 00001002    1088     72   1088      5     2     2    0      0      
00c70000 00041002     256      4    256      2     1     1    0      0      
002d0000 00001002    1088    132   1088      8    23     2    0      0      
00450000 00001002     256      4    256      0     1     1    0      0      
07230000 00041002     256      4    256      2     1     1    0      0      
00c10000 00001002     256    216    256      3    39     1    0      0   LFH
09b50000 00001002     256     80    256     39    28     1    0      0      
09d00000 00001002      64      4     64      2     1     1    0      0      
09ef0000 00001002    1088     72   1088      6     2     2    0      0      
004c0000 00001002    1088    192   1088     15   140     2    0      0      
09760000 00041002     256     28    256      4     4     1    0      0      
09ed0000 00001002      64     12     64      1     1     1    0      0      
0b210000 00001002    3136   1456   3136     52    84     3    0      0   LFH
0a700000 00001002     256    212    256      2     1     1    0      0      
0e1e0000 00011002     256      4    256      0     1     1    0      0      
0d030000 00001002     256     16    256      3     1     1    0      0      
11b30000 00001002    1088    388   1088      0     1     2    0      0      
-----------------------------------------------------------------------------
0:047> !heap -stat -h 11b30000 
 heap @ 11b30000
group-by: TOTSIZE max-display: 20
    size     #blocks     total     ( %) (percent of total busy bytes)
    1f0 1f2 - 3c4e0  (86.13)
    18 1c9 - 2ad8  (3.82)
    1000 2 - 2000  (2.86)
    10 1c7 - 1c70  (2.54)
    214 c - 18f0  (2.23)
    800 2 - 1000  (1.43)
    220 1 - 220  (0.19)
    1d7 1 - 1d7  (0.16)
    80 3 - 180  (0.13)
    a4 1 - a4  (0.06)
    24 4 - 90  (0.05)
    14 4 - 50  (0.03)
    4a 1 - 4a  (0.03)
    25 2 - 4a  (0.03)
    48 1 - 48  (0.03)
    46 1 - 46  (0.02)
    41 1 - 41  (0.02)
    3e 1 - 3e  (0.02)
    3c 1 - 3c  (0.02)
    37 1 - 37  (0.02)  

 分別是1f2、1c9、1c7;

1f0:102d9 - 1f2 = 65767

18:102b0 - 1c9 = 65767

10:102ae - 1c7 = 65767

居然申請的次數一模一樣!

穩了!這個1f0可以斷定與其他兩個緊密相關;首先懷疑的就是

MSTP_Get_RP_ACK_Data

MSTP_Get_RPM_ACK_Data

1)這兩個方法體中使用到的所有子方法體有沒有申請空間的語句;

2)申請的空間大小是不是就是1f0;

 

依據上面的推測,再次閱讀那2個方法體;

 

 

 

 

經過分析BACNET_APPLICATION_DATA_VALUE結構體大小剛好就是1f0 

好了,搞定

 

如果對你有幫助,請點贊、評論;  

 

 

相關文章