教程:官方文件https://docs.microsoft.com/zh-cn/dotnet/core/diagnostics/debug-memory-leak
環境:Linux、Docker、.NET Core 3.1 SDK及更高版本
示例程式碼:https://github.com/dotnet/samples/tree/master/core/diagnostics/DiagnosticScenarios
一 執行官方示例
示例中包含了洩漏記憶體、死鎖執行緒、CPU佔用過高等介面,方便學習。
1,Clone程式碼並編譯
[root@localhost ~]# cd Diagnostic_scenarios_sample_debug_target
[root@localhost Diagnostic_scenarios_sample_debug_target]# dotnet build -c Release
Microsoft (R) Build Engine version 16.7.0+7fb82e5b2 for .NET
...
Build succeeded.
0 Warning(s)
0 Error(s)
Time Elapsed 00:00:05.62
2,建立Dockerfile構建映象
FROM mcr.microsoft.com/dotnet/core/aspnet:3.1-buster-slim AS base
WORKDIR /app
EXPOSE 80
COPY bin/Release/netcoreapp3.1 .
ENTRYPOINT ["dotnet", "DiagnosticScenarios.dll"]
[root@localhost Diagnostic_scenarios_sample_debug_target]# vim Dockerfile
[root@localhost Diagnostic_scenarios_sample_debug_target]# docker build -t dumptest .
Sending build context to Docker daemon 1.47MB
Step 1/5 : FROM mcr.microsoft.com/dotnet/core/aspnet:3.1-buster-slim AS base
...
Successfully built 928e512d9be4
Successfully tagged dumptest:latest
[root@localhost Diagnostic_scenarios_sample_debug_target]# docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
dumptest latest 928e512d9be4 5 seconds ago 267MB
3,啟動容器
--privileged=true
引數的作用是給容器賦予root許可權,後面createdump需要該支援。
[root@localhost Diagnostic_scenarios_sample_debug_target]# docker run --name diagnostic --privileged=true -p 888:80 -d dumptest
a68efbfbfb13117142bac9b0c6fb9f4c5124e4490eaf4850dd17fdc612e2cfda
[root@localhost Diagnostic_scenarios_sample_debug_target]# docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
a68efbfbfb13 dumptest "dotnet DiagnosticSc…" 24 seconds ago Up 23 seconds 0.0.0.0:888->80/tcp diagnostic
看看能否訪問成功 http://192.168.0.161:888/api/values
二 生成dump轉儲檔案
很多時候問題出現在生產環境,而測試環境又很難重現,所以問題出現時優先生成一個dump檔案,然後去恢復服務(重啟解決大部分問題啊?),最後再來慢慢分析問題是一個比較合理的方案。
1,製造問題
通過示例中提供的介面生成一些死鎖執行緒和記憶體洩漏。
http://192.168.0.161:888/api/diagscenario/deadlock
http://192.168.0.161:888/api/diagscenario/memleak/101024
2,建立dump檔案
進入示例容器,通過find
找到netcore自帶的createdump工具;
執行createdump路徑 PID
命令建立dump檔案(如果容器內只有一個應用,一般PID預設為1,也可以使用top命令來檢視PID)
容器佔用越大createdump越慢,建立完之後退出容器,將coredump.1檔案拷貝到宿主機慢慢分析。
[root@localhost dumpfile]# docker exec -it diagnostic bash
root@6b6f0a7ebe10:/app# find / -name createdump
/usr/share/dotnet/shared/Microsoft.NETCore.App/3.1.10/createdump
root@6b6f0a7ebe10:/app# /usr/share/dotnet/shared/Microsoft.NETCore.App/3.1.10/createdump 1
Writing minidump with heap to file /tmp/coredump.1
Written 2920263680 bytes (712955 pages) to core file
root@6b6f0a7ebe10:/app# exit
[root@localhost dumpfile]# docker cp 6b6f0a7ebe10:/tmp/coredump.1 /mnt/dumpfile/coredump.1
[root@localhost dumpfile]# ls
coredump.1
三 分析dump檔案
dotnet-dump
工具依賴dotnet的sdk,如果宿主機中安裝了sdk可以直接在宿主機中分析;
如果不想汙染宿主機環境可以拉取一個sdk映象mcr.microsoft.com/dotnet/core/sdk:3.1
,建立一個臨時環境用於分析。
1,建立一個用於分析的臨時容器
需要把用於存放coredump.1檔案的目錄掛載到容器,或者自己cp進去。
建立好容器之後需要安裝dotnet-dump
工具
[root@iZwz9883hajros0tZ ~]# docker run --rm -it -v /mnt/dumpfile:/tmp/coredump mcr.microsoft.com/dotnet/core/sdk:3.1
root@6e560e9f5d55:/# cd /tmp/coredump
root@6e560e9f5d55:/tmp/coredump# ls
coredump.1
root@6e560e9f5d55:/tmp/coredump# dotnet tool install -g dotnet-dump
Tools directory '/root/.dotnet/tools' is not currently on the PATH environment variable.
...
export PATH="$PATH:/root/.dotnet/tools"
You can invoke the tool using the following command: dotnet-dump
Tool 'dotnet-dump' (version '5.0.152202') was successfully installed.
root@6e560e9f5d55:/tmp/coredump# export PATH="$PATH:/root/.dotnet/tools"
root@6e560e9f5d55:/tmp/coredump#
2,分析死鎖
利用clrstack
命令輸出呼叫堆疊。
root@6e560e9f5d55:/tmp/coredump# dotnet-dump analyze coredump.1
Loading core dump: coredump.1 ...
Ready to process analysis commands. Type 'help' to list available commands or 'help [command]' to get detailed help on a command.
Type 'quit' or 'exit' to exit the session.
> clrstack -all
OS Thread Id: 0x311
Child SP IP Call Site
00007FCDE06DF530 00007fd1521e600c [GCFrame: 00007fcde06df530]
00007FCDE06DF620 00007fd1521e600c [GCFrame: 00007fcde06df620]
00007FCDE06DF680 00007fd1521e600c [HelperMethodFrame_1OBJ: 00007fcde06df680] System.Threading.Monitor.ReliableEnter(System.Object, Boolean ByRef)
00007FCDE06DF7D0 00007FD0DB4F765A testwebapi.Controllers.DiagScenarioController.<deadlock>b__3_1()
00007FCDE06DF800 00007FD0D7A25862 System.Threading.ThreadHelper.ThreadStart_Context(System.Object) [/_/src/System.Private.CoreLib/src/System/Threading/Thread.CoreCLR.cs @ 44]
00007FCDE06DF820 00007FD0DB03686D System.Threading.ExecutionContext.RunInternal(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object) [/_/src/System.Private.CoreLib/shared/System/Threading/ExecutionContext.cs @ 201]
00007FCDE06DF870 00007FD0D7A2597E System.Threading.ThreadHelper.ThreadStart() [/_/src/System.Private.CoreLib/src/System/Threading/Thread.CoreCLR.cs @ 93]
00007FCDE06DFBD0 00007fd1512f849f [GCFrame: 00007fcde06dfbd0]
00007FCDE06DFCA0 00007fd1512f849f [DebuggerU2MCatchHandlerFrame: 00007fcde06dfca0]
OS Thread Id: 0x312
Child SP IP Call Site
00007FCDDFEDE530 00007fd1521e600c [GCFrame: 00007fcddfede530]
00007FCDDFEDE620 00007fd1521e600c [GCFrame: 00007fcddfede620]
00007FCDDFEDE680 00007fd1521e600c [HelperMethodFrame_1OBJ: 00007fcddfede680] System.Threading.Monitor.ReliableEnter(System.Object, Boolean ByRef)
00007FCDDFEDE7D0 00007FD0DB4F765A testwebapi.Controllers.DiagScenarioController.<deadlock>b__3_1()
00007FCDDFEDE800 00007FD0D7A25862 System.Threading.ThreadHelper.ThreadStart_Context(System.Object) [/_/src/System.Private.CoreLib/src/System/Threading/Thread.CoreCLR.cs @ 44]
00007FCDDFEDE820 00007FD0DB03686D System.Threading.ExecutionContext.RunInternal(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object) [/_/src/System.Private.CoreLib/shared/System/Threading/ExecutionContext.cs @ 201]
00007FCDDFEDE870 00007FD0D7A2597E System.Threading.ThreadHelper.ThreadStart() [/_/src/System.Private.CoreLib/src/System/Threading/Thread.CoreCLR.cs @ 93]
00007FCDDFEDEBD0 00007fd1512f849f [GCFrame: 00007fcddfedebd0]
00007FCDDFEDECA0 00007fd1512f849f [DebuggerU2MCatchHandlerFrame: 00007fcddfedeca0]
...
300多個執行緒的呼叫堆疊大多數執行緒共享一個公共呼叫堆疊,
該呼叫堆疊似乎顯示請求傳入了死鎖方法,而死鎖方法繼而又呼叫了 Monitor.ReliableEnter,此方法表示這些執行緒正試圖進入鎖定,然而這個obj可能已被其他執行緒獲取了排它鎖了。
光看這個呼叫堆疊,太難看出問題... 但是結合原始碼就很容易看出DeadlockFunc這個方法早就已經將o1,o2兩個object造成了交叉死鎖,然後後面啟動的300個執行緒都試圖獲取獲取鎖定,結果就是無限等待。
00007FCDDEEDC530 00007fd1521e600c [GCFrame: 00007fcddeedc530]
00007FCDDEEDC620 00007fd1521e600c [GCFrame: 00007fcddeedc620]
00007FCDDEEDC680 00007fd1521e600c [HelperMethodFrame_1OBJ: 00007fcddeedc680] System.Threading.Monitor.ReliableEnter(System.Object, Boolean ByRef)
00007FCDDEEDC7D0 00007FD0DB4F765A testwebapi.Controllers.DiagScenarioController.<deadlock>b__3_1()
00007FCDDEEDC800 00007FD0D7A25862 System.Threading.ThreadHelper.ThreadStart_Context(System.Object) [/_/src/System.Private.CoreLib/src/System/Threading/Thread.CoreCLR.cs @ 44]
00007FCDDEEDC820 00007FD0DB03686D System.Threading.ExecutionContext.RunInternal(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object) [/_/src/System.Private.CoreLib/shared/System/Threading/ExecutionContext.cs @ 201]
00007FCDDEEDC870 00007FD0D7A2597E System.Threading.ThreadHelper.ThreadStart() [/_/src/System.Private.CoreLib/src/System/Threading/Thread.CoreCLR.cs @ 93]
00007FCDDEEDCBD0 00007fd1512f849f [GCFrame: 00007fcddeedcbd0]
00007FCDDEEDCCA0 00007fd1512f849f [DebuggerU2MCatchHandlerFrame: 00007fcddeedcca0]
利用syncblk
命令找出實際持有排它鎖的執行緒。
> syncblk
Index SyncBlock MonitorHeld Recursion Owning Thread Info SyncBlock Owner
25 0000000000E59BB8 603 1 00007FCE8C00FB30 1e8 14 00007fcea81ffb98 System.Object
26 0000000000E59C00 3 1 00007FD0C8001FA0 1e9 15 00007fcea81ffbb0 System.Object
-----------------------------
Total 326
Free 307
Owning Thread Info
列下面有3個子列,第一列是地址,第二列是作業系統執行緒 ID,第三列是執行緒索引,也可以通過threads
命令拿到執行緒索引。
通過setthread
命令將執行緒切換到0x1e8上,然後用clrstack
檢視它的呼叫堆疊。
> setthread 14
> clrstack
OS Thread Id: 0x1e8 (14)
Child SP IP Call Site
00007FCE77FFE500 00007fd1521e600c [GCFrame: 00007fce77ffe500]
00007FCE77FFE5F0 00007fd1521e600c [GCFrame: 00007fce77ffe5f0]
00007FCE77FFE650 00007fd1521e600c [HelperMethodFrame_1OBJ: 00007fce77ffe650] System.Threading.Monitor.Enter(System.Object)
00007FCE77FFE7A0 00007FD0DB4F5D0C testwebapi.Controllers.DiagScenarioController.DeadlockFunc()
00007FCE77FFE7E0 00007FD0DB4F5B27 testwebapi.Controllers.DiagScenarioController.<deadlock>b__3_0()
00007FCE77FFE800 00007FD0D7A25862 System.Threading.ThreadHelper.ThreadStart_Context(System.Object) [/_/src/System.Private.CoreLib/src/System/Threading/Thread.CoreCLR.cs @ 44]
00007FCE77FFE820 00007FD0DB03686D System.Threading.ExecutionContext.RunInternal(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object) [/_/src/System.Private.CoreLib/shared/System/Threading/ExecutionContext.cs @ 201]
00007FCE77FFE870 00007FD0D7A2597E System.Threading.ThreadHelper.ThreadStart() [/_/src/System.Private.CoreLib/src/System/Threading/Thread.CoreCLR.cs @ 93]
00007FCE77FFEBD0 00007fd1512f849f [GCFrame: 00007fce77ffebd0]
00007FCE77FFECA0 00007fd1512f849f [DebuggerU2MCatchHandlerFrame: 00007fce77ffeca0]
這個執行緒的呼叫堆疊跟上面的300多個比起來多了一些資訊,可以看出它通過DiagScenarioController.DeadlockFunc中的Monitor.Enter方法獲得了某個object得排它鎖。
3,分析記憶體洩漏
SOS命令:https://docs.microsoft.com/zh-cn/dotnet/core/diagnostics/dotnet-dump#dotnet-dump-analyze
檢查當前所有託管型別的統計資訊 -min [byte]
可選引數可以限制統計範圍
root@6e560e9f5d55:/tmp/coredump# dotnet-dump analyze coredump.1
Loading core dump: coredump.1 ...
Ready to process analysis commands. Type 'help' to list available commands or 'help [command]' to get detailed help on a command.
Type 'quit' or 'exit' to exit the session.
> dumpheap -stat
Statistics:
MT Count TotalSize Class Name
00007fd0dc2b3ab0 1 24 System.Collections.Generic.ObjectEqualityComparer`1[[System.Threading.ThreadPoolWorkQueue+WorkStealingQueue, System.Private.CoreLib]]
00007fd0dc29ca28 1 24 System.Collections.Generic.ObjectEqualityComparer`1[[Microsoft.AspNetCore.Mvc.ModelBinding.Metadata.DefaultModelMetadata, Microsoft.AspNetCore.Mvc.Core]]
...
00007fd0d7f85510 482 160864 System.Object[]
0000000000da9d80 19191 7512096 Free
00007fd0dc2b3718 2 8388656 testwebapi.Controllers.Customer[]
00007fd0dadbeb58 1010240 24245760 testwebapi.Controllers.Customer
00007fd0d7f90f90 1012406 95190854 System.String
可在此處看到大多數是String或Customer物件。String物件佔用了90MB左右的空間
使用方法表mt
分析佔用最多的System.String型別
>dumpheap -mt 00007f61876a0f90
00007fcfabc376f8 00007fd0d7f90f90 94
00007fcfabc37770 00007fd0d7f90f90 94
00007fcfabc377e8 00007fd0d7f90f90 94
00007fcfabc37860 00007fd0d7f90f90 94
00007fcfabc378d8 00007fd0d7f90f90 94
...
Statistics:
MT Count TotalSize Class Name
00007fd0d7f90f90 1012406 95190854 System.String
Total 1012406 objects
一百萬個物件,大小都是94byte,使用gcroot
命令隨便檢視幾個物件的根。
> gcroot -all 00007fcfabc378d8
HandleTable:
00007FD1508815F8 (pinned handle)
-> 00007FD0A7FFF038 System.Object[]
-> 00007FCEA830A230 testwebapi.Controllers.Processor
-> 00007FCEA830A248 testwebapi.Controllers.CustomerCache
-> 00007FCEA830A260 System.Collections.Generic.List`1[[testwebapi.Controllers.Customer, DiagnosticScenarios]]
-> 00007FD0B821F0A8 testwebapi.Controllers.Customer[]
-> 00007FCFABC378C0 testwebapi.Controllers.Customer
-> 00007FCFABC378D8 System.String
Found 1 roots.
>
基本上就是Customer,CustomerCache,Processor這幾個物件造成的記憶體佔用過高。
下一篇利用perf分析容器中CPU佔用的問題。
參考:
除錯記憶體洩漏教程 | Microsoft Docs https://docs.microsoft.com/zh-cn/dotnet/core/diagnostics/debug-memory-leak
dotnet core除錯docker下生成的dump檔案 https://www.cnblogs.com/iamsach/p/10118628.html