A WinDbg Analysis of an Out-of-Memory Problem in a .NET WPF Application at a Maternity Hospital

Posted by 一線碼農 on 2021-12-10

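I: Background

1. The story

Last month a friend contacted me through a private message on cnblogs. He said his program was running out of memory and asked how to fix it.

To get to the bottom of it, we have to analyze a dump with WinDbg.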

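II: WinDbg analysis

1. Why does the program run out of memory?

An out-of-memory condition in .NET surfaces as an OutOfMemoryException. The exception can be thrown explicitly by managed code, or it can be raised by the CLR itself, so there are two places to look.

  • Is an exception attached to any managed thread?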

0:000> !t
ThreadCount:      23
UnstartedThread:  0
BackgroundThread: 5
PendingThread:    0
DeadThread:       17
Hosted Runtime:   no
                                                                         Lock  
       ID OSID ThreadOBJ    State GC Mode     GC Alloc Context  Domain   Count Apt Exception
   0    1 362c 00fac868     26020 Preemptive  7ED701A0:00000000 00fa6b60 0     STA 
   5    2 2d70 00fbeba0     2b220 Preemptive  7EBA7AC0:00000000 00fa6b60 0     MTA (Finalizer) 
   7    3 3264 061c8890   102a220 Preemptive  00000000:00000000 00fa6b60 0     MTA (Threadpool Worker) 
  17   15 3f98 19682b90   202b220 Preemptive  7EBB0830:00000000 00fa6b60 0     MTA 
XXXX   16    0 2845fb00     35820 Preemptive  00000000:00000000 00fa6b60 0     Ukn 
  18   14  a7c 2842b1c8   202b220 Preemptive  00000000:00000000 00fa6b60 0     MTA 
XXXX    6    0 2c9b3778   1039820 Preemptive  00000000:00000000 00fa6b60 0     Ukn (Threadpool Worker) 
XXXX   18    0 288a1318   1039820 Preemptive  00000000:00000000 00fa6b60 0     Ukn (Threadpool Worker) 
XXXX   23    0 288a22f0   1039820 Preemptive  00000000:00000000 00fa6b60 0     Ukn (Threadpool Worker) 
XXXX   10    0 2ccf3550   1039820 Preemptive  00000000:00000000 00fa6b60 0     Ukn (Threadpool Worker) 
XXXX   21    0 288a1860   1039820 Preemptive  00000000:00000000 00fa6b60 0     Ukn (Threadpool Worker) 
XXXX   12    0 288a1da8   1039820 Preemptive  00000000:00000000 00fa6b60 0     Ukn (Threadpool Worker) 
XXXX   11    0 2c993640   1039820 Preemptive  00000000:00000000 00fa6b60 0     Ukn (Threadpool Worker) 
XXXX    8    0 2ccf3a98     35820 Preemptive  00000000:00000000 00fa6b60 0     Ukn 
XXXX    9    0 2ccf2030   1039820 Preemptive  00000000:00000000 00fa6b60 0     Ukn (Threadpool Worker) 
XXXX    7    0 2c9aed88   1039820 Preemptive  00000000:00000000 00fa6b60 0     Ukn (Threadpool Worker) 
XXXX   26    0 28898308   1039820 Preemptive  00000000:00000000 00fa6b60 0     Ukn (Threadpool Worker) 
XXXX   25    0 2c492c68   1039820 Preemptive  00000000:00000000 00fa6b60 0     Ukn (Threadpool Worker) 
XXXX    4    0 2c993b88   1039820 Preemptive  00000000:00000000 00fa6b60 0     Ukn (Threadpool Worker) 
XXXX   20    0 2c9af2d0   1039820 Preemptive  00000000:00000000 00fa6b60 0     Ukn (Threadpool Worker) 
XXXX   17    0 2c9afd60   1039820 Preemptive  00000000:00000000 00fa6b60 0     Ukn (Threadpool Worker) 
XXXX   24    0 2c9b1280   1039820 Preemptive  00000000:00000000 00fa6b60 0     Ukn (Threadpool Worker) 
  23   22 2658 2c9b02a8   1029220 Preemptive  7ED5BFF8:00000000 00fa6b60 0     MTA (Threadpool Worker) 

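Judging from the output, none of these threads carries a managed exception: the Exception column is empty for every thread. Hmm...

  • Was it thrown at the CLR level?

This check looks at whether an allocation on the managed heap or a garbage collection ran out of memory. The !ao command (short for !AnalyzeOOM) reports it.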


0:000> !ao
There was no managed OOM due to allocations on the GC heap

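This output shows nothing abnormal either, which is awkward. So what is the actual cause?

2. Looking for the cause of the overflow

At this point I can only suspect that the dump was not captured at the right moment, or that I have reached the edge of my own knowledge. Still, there is always a way out: if the answer is not exactly here, it must be somewhere nearby. Next, use !address -summary to see how memory usage breaks down by category.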


0:000> !address -summary

--- Usage Summary ---------------- RgnCount ----------- Total Size -------- %ofBusy %ofTotal
<unknown>                              1520          4c185000 (   1.189 GB)  65.57%   59.45%
Image                                  4306          1f140000 ( 497.250 MB)  26.78%   24.28%
Free                                   1133           bf17000 ( 191.090 MB)            9.33%
Heap                                    617           7626000 ( 118.148 MB)   6.36%    5.77%
Stack                                    72           1740000 (  23.250 MB)   1.25%    1.14%
Other                                    34             7b000 ( 492.000 kB)   0.03%    0.02%
TEB                                      24             30000 ( 192.000 kB)   0.01%    0.01%
PEB                                       1              3000 (  12.000 kB)   0.00%    0.00%

--- Type Summary (for busy) ------ RgnCount ----------- Total Size -------- %ofBusy %ofTotal
MEM_MAPPED                              549          34b60000 ( 843.375 MB)  45.42%   41.18%
MEM_PRIVATE                            1718          20424000 ( 516.141 MB)  27.80%   25.20%
MEM_IMAGE                              4307          1f155000 ( 497.332 MB)  26.78%   24.28%

--- State Summary ---------------- RgnCount ----------- Total Size -------- %ofBusy %ofTotal
MEM_COMMIT                             4904          66ddd000 (   1.607 GB)  88.64%   80.37%
MEM_RESERVE                            1670           d2fc000 ( 210.984 MB)  11.36%   10.30%
MEM_FREE                               1133           bf17000 ( 191.090 MB)            9.33%

--- Protect Summary (for commit) - RgnCount ----------- Total Size -------- %ofBusy %ofTotal
PAGE_READONLY                          2272          382cf000 ( 898.809 MB)  48.41%   43.89%
PAGE_READWRITE                         1572          1eead000 ( 494.676 MB)  26.64%   24.15%
PAGE_EXECUTE_READ                       218           dd59000 ( 221.348 MB)  11.92%   10.81%
PAGE_WRITECOPY                          449           133e000 (  19.242 MB)   1.04%    0.94%
PAGE_EXECUTE_READWRITE                  188            ab4000 (  10.703 MB)   0.58%    0.52%
PAGE_NOACCESS                           156             9c000 ( 624.000 kB)   0.03%    0.03%
PAGE_READWRITE | PAGE_GUARD              48             78000 ( 480.000 kB)   0.03%    0.02%
PAGE_READWRITE | PAGE_WRITECOMBINE        1              2000 (   8.000 kB)   0.00%    0.00%

--- Largest Region by Usage ----------- Base Address -------- Region Size ----------
<unknown>                                   1d200000           a001000 ( 160.004 MB)
Image                                        fed1000           36e4000 (  54.891 MB)
Free                                        33dfe000           1082000 (  16.508 MB)
Heap                                        3da84000            a1b000 (  10.105 MB)
Stack                                        1a10000             fd000 (1012.000 kB)
Other                                       7fa40000             33000 ( 204.000 kB)
TEB                                           a4c000              3000 (  12.000 kB)
PEB                                           a3d000              3000 (  12.000 kB)

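The line MEM_COMMIT = 1.607 GB, 80.37% shows that the process has committed about 1.6 GB, which is 80.37% of its total address space, so the process is capped at 2 GB of user-mode virtual address space. The 8-digit addresses in the !t output also show that this is a 32-bit process. In addition, the largest free region is only 16.508 MB, so any larger contiguous allocation can no longer succeed. In other words, this is the classic problem of a 32-bit program running on a 64-bit system and hitting the 2 GB limit.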
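The 2 GB figure can be cross-checked with a little arithmetic on the State Summary. The sketch below is only an illustration: the class and member names are made up for this post, and the three sizes are copied from the MEM_COMMIT, MEM_RESERVE, and MEM_FREE rows above. Their sum is 0x7fff0000, which is 64 KB short of 2 GB, and the commit share comes out at about 80.37%, matching the %ofTotal column.

```csharp
using System;

// Hypothetical helper for illustration; not part of the original analysis.
public static class AddressSpaceCheck
{
    // Region sizes in bytes, copied from the State Summary of !address -summary.
    public const long Commit  = 0x66ddd000;
    public const long Reserve = 0x0d2fc000;
    public const long Free    = 0x0bf17000;

    // Commit + reserve + free covers the whole user-mode address space.
    public const long Total = Commit + Reserve + Free;

    // Share of the address space that is committed, in percent.
    public static double CommitPercent => 100.0 * Commit / Total;

    public static void Main()
    {
        const long twoGiB = 2L * 1024 * 1024 * 1024;
        Console.WriteLine($"total  = 0x{Total:x} bytes, {twoGiB - Total} bytes short of 2 GiB");
        Console.WriteLine($"commit = {CommitPercent:F2}% of total");
    }
}
```

If the process were large-address-aware on 64-bit Windows, the same sum would come out close to 4 GB instead.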

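3. How to break the 2 GB limit

For the answer, go to the most authoritative source, MSDN: https://docs.microsoft.com/en-us/windows/win32/memory/memory-limits-for-windows-releases?redirectedfrom=MSDN

The way out is to set the IMAGE_FILE_LARGE_ADDRESS_AWARE flag on the program. With this flag set, a 32-bit process on 64-bit Windows gets 4 GB of user-mode address space instead of 2 GB.

As for how to set it, I found three methods.

  • Use the LargeAddressAware package

See GitHub: https://github.com/KirillOsenkov/LargeAddressAware

  • Use editbin

In the Visual Studio post-build event, enter editbin /largeaddressaware "$(TargetPath)"

  • Use code

This approach adds the LargeAddressAware flag directly to an exe that has already been built. Besides setting the flag, it can also check whether the flag is present.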


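using System;
using System.IO;

namespace PEFile
{
    public class LargeAddressAware
    {
        // IMAGE_FILE_LARGE_ADDRESS_AWARE bit in the Characteristics field of the COFF header.
        private const short LargeAddressAwareFlag = 0x20;

        // Returns true if the executable already has the flag set.
        public static bool IsLargeAddressAware(string filePath)
        {
            bool isLargeAddressAware = false;
            // Read-only access is enough for checking, so read-only files work too.
            PrepareStream(filePath, FileAccess.Read, (stream, binaryReader) => isLargeAddressAware = (binaryReader.ReadInt16() & LargeAddressAwareFlag) != 0);
            return isLargeAddressAware;
        }

        // Sets the flag in place if it is not set yet.
        public static void SetLargeAddressAware(string filePath)
        {
            PrepareStream(filePath, FileAccess.ReadWrite, (stream, binaryReader) =>
            {
                var value = binaryReader.ReadInt16();
                if ((value & LargeAddressAwareFlag) == 0)
                {
                    value = (short)(value | LargeAddressAwareFlag);
                    stream.Position -= 2; // step back over the Characteristics field just read
                    var binaryWriter = new BinaryWriter(stream);
                    binaryWriter.Write(value);
                    binaryWriter.Flush();
                }
            });
        }

        // Validates the MZ and PE signatures, moves the stream to the
        // Characteristics field, then runs the action. Returns silently
        // if the file is not a valid PE image.
        private static void PrepareStream(string filePath, FileAccess access, Action<Stream, BinaryReader> action)
        {
            using (var stream = new FileStream(filePath, FileMode.Open, access, FileShare.Read))
            {
                // e_lfanew occupies bytes 0x3C..0x3F, so at least 0x40 bytes are required.
                if (stream.Length < 0x40)
                {
                    return;
                }

                var binaryReader = new BinaryReader(stream);

                // MZ header
                if (binaryReader.ReadInt16() != 0x5A4D)
                {
                    return;
                }

                // e_lfanew: file offset of the PE signature
                stream.Position = 0x3C;
                var peHeaderLocation = binaryReader.ReadInt32();

                // The PE signature (4 bytes) plus the COFF header (20 bytes) must fit in the file.
                if (peHeaderLocation < 0 || peHeaderLocation > stream.Length - 0x18)
                {
                    return;
                }

                stream.Position = peHeaderLocation;

                // PE header
                if (binaryReader.ReadInt32() != 0x4550)
                {
                    return;
                }

                // Skip the first 0x12 bytes of the COFF header to reach Characteristics.
                stream.Position += 0x12;

                action(stream, binaryReader);
            }
        }
    }
}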
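After patching, it is worth checking the flag through a second, independent code path. The sketch below is one possible way, not part of the original fix: it reads the same Characteristics field through PEReader from System.Reflection.PortableExecutable (built into .NET Core and later; on .NET Framework it requires the System.Reflection.Metadata NuGet package). The class name LaaCheck and the default path MyApp.exe are hypothetical.

```csharp
using System;
using System.IO;
using System.Reflection.PortableExecutable;

// Hypothetical verification helper; names are illustrative only.
public static class LaaCheck
{
    // Returns true when the COFF header carries IMAGE_FILE_LARGE_ADDRESS_AWARE (0x0020).
    public static bool HasLargeAddressAwareFlag(string exePath)
    {
        using (var stream = File.OpenRead(exePath))
        using (var peReader = new PEReader(stream))
        {
            var characteristics = peReader.PEHeaders.CoffHeader.Characteristics;
            return (characteristics & Characteristics.LargeAddressAware) != 0;
        }
    }

    public static void Main(string[] args)
    {
        // "MyApp.exe" is a placeholder; pass the real path as the first argument.
        string path = args.Length > 0 ? args[0] : "MyApp.exe";
        Console.WriteLine($"{path}: LargeAddressAware = {HasLargeAddressAwareFlag(path)}");
    }
}
```

Running it before and after the patch should show the value change from False to True for a 32-bit exe that did not have the flag.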

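For more approaches, see: https://stackoverflow.com/questions/639540/how-much-memory-can-a-32-bit-process-access-on-a-64-bit-operating-system

III: Summary

In short, the 2 GB limit is something every 32-bit program has to face, and once you know about it, it is easy to deal with. One last question deserves an explanation: why is committed memory as high as 1.6 GB? Medical software is mostly built on the classic heavyweight combination of FastReport + DevExpress, and together with a large number of image resources these consume a lot of native memory.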
