Integrating Arthas with DolphinScheduler to Monitor API Calls and Improve Scheduled-Task Reliability

Published by 海豚调度 on 2024-11-06

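This article describes how to embed Arthas in Apache DolphinScheduler in order to monitor API calls. Arthas is a Java diagnostic tool that lets developers inspect a running application in real time: its runtime state, its performance bottlenecks, and its method calls. Integrating Arthas with DolphinScheduler makes it easy to capture key call information during task scheduling, so that performance problems can be found and fixed early and the system stays stable. The sections below show how to start Arthas in a DolphinScheduler environment, monitor calls to a specific API, and analyze the collected performance data, with the goal of making task scheduling more reliable and maintainable.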

Manual installation

Download the Arthas package:
https://arthas.aliyun.com/download/latest_version?mirror=aliyun
This walkthrough uses arthas-packaging-3.7.2-bin.zip.

mkdir -p /opt/arthas
cp arthas-packaging-3.7.2-bin.zip /opt/arthas/
cd /opt/arthas
unzip arthas-packaging-3.7.2-bin.zip

java -jar arthas-boot.jar

Then choose the target process by entering its number from the list that arthas-boot prints.
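When the target is known in advance, the interactive selection can be skipped by passing the PID to arthas-boot directly. The sketch below picks the PID out of jps -l output. The helper name pid_for_class is hypothetical, and the main class org.apache.dolphinscheduler.api.ApiApplicationServer is an assumption about how the api-server is launched, so check it against your own jps -l listing.

```shell
# Read `jps -l` output on stdin and print the PID of the first JVM whose
# main class equals the first argument. Prints nothing if none matches.
pid_for_class() {
    awk -v cls="$1" '$2 == cls { print $1; exit }'
}

# Example usage (assumed class name; needs a JDK on PATH and a running api-server):
#   pid=$(jps -l | pid_for_class org.apache.dolphinscheduler.api.ApiApplicationServer)
#   java -jar /opt/arthas/arthas-boot.jar "$pid"
```

Matching on the exact class name rather than a substring avoids picking up other DolphinScheduler JVMs that may run on the same host.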

Troubleshooting

Error 1

[ERROR] Start arthas failed, exception stack trace: 
com.sun.tools.attach.AttachNotSupportedException: Unable to open socket file: target process not responding or HotSpot VM not loaded
        at sun.tools.attach.LinuxVirtualMachine.<init>(LinuxVirtualMachine.java:106)
        at sun.tools.attach.LinuxAttachProvider.attachVirtualMachine(LinuxAttachProvider.java:78)
        at com.sun.tools.attach.VirtualMachine.attach(VirtualMachine.java:250)
        at com.taobao.arthas.core.Arthas.attachAgent(Arthas.java:102)
        at com.taobao.arthas.core.Arthas.<init>(Arthas.java:27)
        at com.taobao.arthas.core.Arthas.main(Arthas.java:161)

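Solution:

Go to ${DOLPHINSCHEDULER_HOME}/api-server/bin and add the following line to jvm_args_env.sh (JVM options are read at startup, so restart the api-server afterwards):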
-XX:+StartAttachListener
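The edit above can also be scripted so that running it twice does not duplicate the option. This is a minimal sketch, assuming jvm_args_env.sh lists one JVM option per line, as the instruction above suggests; the helper name add_jvm_flag is hypothetical.

```shell
# Append a JVM option to an options file unless an identical line already exists.
add_jvm_flag() {
    file=$1
    flag=$2
    # -x: whole-line match, -F: fixed string, --: the flag itself starts with '-'
    if grep -qxF -- "$flag" "$file"; then
        return 0
    fi
    # If the file does not end with a newline, add one so the flag gets its own line.
    if [ -n "$(tail -c 1 "$file")" ]; then
        printf '\n' >> "$file"
    fi
    printf '%s\n' "$flag" >> "$file"
}

# Example usage:
#   add_jvm_flag "${DOLPHINSCHEDULER_HOME}/api-server/bin/jvm_args_env.sh" '-XX:+StartAttachListener'
```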

Error 2

Picked up JAVA_TOOL_OPTIONS: 
java.io.IOException: well-known file /tmp/.java_pid731688 is not secure: file should be owned by the current user (which is 0) but is owned by 989
        at sun.tools.attach.LinuxVirtualMachine.checkPermissions(Native Method)
        at sun.tools.attach.LinuxVirtualMachine.<init>(LinuxVirtualMachine.java:117)
        at sun.tools.attach.LinuxAttachProvider.attachVirtualMachine(LinuxAttachProvider.java:78)
        at com.sun.tools.attach.VirtualMachine.attach(VirtualMachine.java:250)
        at com.taobao.arthas.core.Arthas.attachAgent(Arthas.java:102)
        at com.taobao.arthas.core.Arthas.<init>(Arthas.java:27)
        at com.taobao.arthas.core.Arthas.main(Arthas.java:161)
[ERROR] Start arthas failed, exception stack trace: 
[ERROR] attach fail, targetPid: 731688

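Solution:

Arthas must be started by the same OS user that runs the DolphinScheduler service; otherwise the attach fails with the error above. In this trace, Arthas was started by uid 0 (root), while the target JVM is owned by uid 989.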
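One way to satisfy this is to look up the owner of the target process and start Arthas as that user. The sketch below is an illustration only: the helper names pid_uid and attach_command are hypothetical, the jar path assumes the /opt/arthas install location used earlier, and it assumes sudo is available and that java is on the PATH of the target user.

```shell
# Location of arthas-boot.jar; override by exporting ARTHAS_BOOT.
ARTHAS_BOOT=${ARTHAS_BOOT:-/opt/arthas/arthas-boot.jar}

# Print the numeric uid that owns the given process.
pid_uid() {
    ps -o uid= -p "$1" | tr -d ' '
}

# Print a command line that starts Arthas as the given uid against the given PID.
# sudo accepts a numeric uid when it is written as '#uid'.
attach_command() {
    printf "sudo -u '#%s' java -jar %s %s\n" "$1" "$ARTHAS_BOOT" "$2"
}

# Example usage with the PID from the error message:
#   attach_command "$(pid_uid 731688)" 731688
```

The function prints the command instead of running it, so it can be reviewed before anything is executed with elevated privileges.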

Watch

Watch is used to observe the execution details of a method, such as its arguments and return value.

watch org.apache.dolphinscheduler.api.controller.UsersController queryUserList returnObj
[arthas@731688]$ watch org.apache.dolphinscheduler.api.controller.UsersController queryUserList returnObj
Press Q or Ctrl+C to abort.
Affect(class count: 1 , method count: 1) cost in 126 ms, listenerId: 2
method=org.apache.dolphinscheduler.api.controller.UsersController.queryUserList location=AtExit
ts=2024-08-27 02:04:01; [cost=4.918943ms] result=@Result[
    code=@Integer[0],
    msg=@String[成功],
    data=@PageInfo[PageInfo(totalList=[User(id=1, userName=admin, userPassword=null, email=825193156@qq.com, phone=, userType=ADMIN_USER, tenantId=1, state=1, tenantCode=hdfs, queueName=default, alertGroup=null, queue=default, timeZone=null, createTime=Fri Jul 19 04:19:31 GMT-05:00 2024, updateTime=Mon Aug 12 22:15:58 GMT-05:00 2024)], total=1, totalPage=1, pageSize=10, currentPage=1, pageNo=0)],
]
method=org.apache.dolphinscheduler.api.controller.UsersController.queryUserList location=AtExit
ts=2024-08-27 02:04:18; [cost=6.905345ms] result=@Result[
    code=@Integer[0],
    msg=@String[成功],
    data=@PageInfo[PageInfo(totalList=[User(id=1, userName=admin, userPassword=null, email=825193156@qq.com, phone=, userType=ADMIN_USER, tenantId=1, state=1, tenantCode=hdfs, queueName=default, alertGroup=null, queue=default, timeZone=null, createTime=Fri Jul 19 04:19:31 GMT-05:00 2024, updateTime=Mon Aug 12 22:15:58 GMT-05:00 2024)], total=1, totalPage=1, pageSize=10, currentPage=1, pageNo=0)],
]
method=org.apache.dolphinscheduler.api.controller.UsersController.queryUserList location=AtExit
ts=2024-08-27 02:04:27; [cost=5.803269ms] result=@Result[
    code=@Integer[0],
    msg=@String[成功],
    data=@PageInfo[PageInfo(totalList=[User(id=1, userName=admin, userPassword=null, email=825193156@qq.com, phone=, userType=ADMIN_USER, tenantId=1, state=1, tenantCode=hdfs, queueName=default, alertGroup=null, queue=default, timeZone=null, createTime=Fri Jul 19 04:19:31 GMT-05:00 2024, updateTime=Mon Aug 12 22:15:58 GMT-05:00 2024)], total=1, totalPage=1, pageSize=10, currentPage=1, pageNo=0)],
]
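Each invocation in the watch output carries a [cost=...ms] field, so a saved session can be summarized with standard text tools. The sketch below assumes the output has been captured to a file or pipe; the helper name watch_cost_stats is hypothetical.

```shell
# Read Arthas watch output on stdin and print the number of observed calls
# together with the average and maximum cost in milliseconds.
watch_cost_stats() {
    awk '
        match($0, /\[cost=[0-9.]+ms\]/) {
            # "[cost=" is 6 characters and "ms]" is 3, so the number sits between them.
            v = substr($0, RSTART + 6, RLENGTH - 9) + 0
            n++
            sum += v
            if (v > max) max = v
        }
        END {
            if (n > 0) printf "calls=%d avg=%.3fms max=%.3fms\n", n, sum / n, max
        }
    '
}

# Example usage:
#   watch_cost_stats < watch-session.log
```

For the three calls shown above, this prints calls=3 avg=5.876ms max=6.905ms.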

Trace

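Trace shows the call tree beneath a method: which methods it invokes and how long each of them takes. In the sample output below, almost all of the time (96.47%) is spent in UsersService:queryUserList().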

[arthas@973263]$ trace org.apache.dolphinscheduler.api.controller.UsersController queryUserList 
Press Q or Ctrl+C to abort.
Affect(class count: 1 , method count: 1) cost in 319 ms, listenerId: 1
`---ts=2024-08-27 10:33:08;thread_name=qtp1836984213-26;id=26;is_daemon=false;priority=5;TCCL=sun.misc.Launcher$AppClassLoader@439f5b3d
    `---[13.962731ms] org.apache.dolphinscheduler.api.controller.UsersController:queryUserList()
        +---[0.18% 0.025123ms ] org.apache.dolphinscheduler.api.controller.UsersController:checkPageParams() #130
        +---[0.09% 0.012549ms ] org.apache.dolphinscheduler.plugin.task.api.utils.ParameterUtils:handleEscapes() #131
        `---[96.47% 13.469876ms ] org.apache.dolphinscheduler.api.service.UsersService:queryUserList() #132
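On a busy API it often helps to trace only slow calls and to stop after a few hits. Arthas accepts a condition expression such as '#cost > 10' and a -n limit on trace, and arthas-boot can run a command non-interactively with -c. The sketch below combines the two; the helper names trace_command and run_arthas are hypothetical, and the jar path assumes the /opt/arthas install location used earlier.

```shell
# Build an Arthas trace command that reports only calls slower than MIN_MS
# and stops after LIMIT matching calls.
trace_command() {
    class=$1
    method=$2
    min_ms=$3
    limit=$4
    printf "trace %s %s '#cost > %s' -n %s\n" "$class" "$method" "$min_ms" "$limit"
}

# Run one Arthas command non-interactively against a PID (needs a running JVM).
run_arthas() {
    pid=$1
    cmd=$2
    java -jar "${ARTHAS_BOOT:-/opt/arthas/arthas-boot.jar}" -c "$cmd" "$pid"
}

# Example usage:
#   run_arthas 973263 "$(trace_command org.apache.dolphinscheduler.api.controller.UsersController queryUserList 10 5)"
```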

Dump

Run heapdump arthas-output/dump.hprof to generate a heap dump file:

[arthas@973263]$ heapdump arthas-output/dump.hprof
Dumping heap to arthas-output/dump.hprof ...
Heap dump file created

Then analyze the dump with MAT (Eclipse Memory Analyzer Tool) to look for memory leaks.
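Before copying a large dump to another machine for analysis, a quick sanity check can confirm that the file was actually written. HPROF files begin with the ASCII text JAVA PROFILE, which the sketch below tests for; the helper name is_hprof is hypothetical.

```shell
# Succeed if the file is non-empty and starts with the HPROF header text.
is_hprof() {
    [ -s "$1" ] && [ "$(head -c 12 "$1")" = "JAVA PROFILE" ]
}

# Example usage:
#   is_hprof arthas-output/dump.hprof && echo "dump looks valid"
```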

Viewing JVM memory changes

The memory command shows JVM memory usage:

[arthas@973263]$ memory 
Memory                                                         used                 total                max                  usage                
heap                                                           485M                 900M                 900M                 53.91%               
ps_eden_space                                                  277M                 327M                 358M                 77.61%               
ps_survivor_space                                              61M                  61M                  61M                  99.98%               
ps_old_gen                                                     146M                 512M                 512M                 28.54%               
nonheap                                                        162M                 188M                 -1                   85.96%               
code_cache                                                     11M                  32M                  240M                 4.89%                
metaspace                                                      135M                 140M                 -1                   96.67%               
compressed_class_space                                         14M                  15M                  1024M                1.43%                
direct                                                         949K                 949K                 -                    100.00%              
mapped                                                         0K                   0K                   -                    0.00% 
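To spot regions that are close to full, the table can be filtered by its usage column. The sketch below assumes the memory output has been saved or piped as plain text; the helper name high_usage_regions is hypothetical.

```shell
# Read Arthas `memory` output on stdin and print every region whose usage
# percentage is at or above the given threshold.
high_usage_regions() {
    awk -v limit="$1" '
        # Only rows whose last column is a percentage; this skips the header row.
        $NF ~ /^[0-9.]+%$/ {
            pct = substr($NF, 1, length($NF) - 1) + 0
            if (pct >= limit + 0) printf "%s %s\n", $1, $NF
        }
    '
}

# Example usage:
#   high_usage_regions 90 < memory-output.txt
```

With a threshold of 90, the table above yields ps_survivor_space, metaspace, and direct. Note that for regions whose max is -1 or -, the percentage in this table appears to be relative to the committed total rather than a hard limit, so a high value there is not necessarily a problem.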

Viewing CPU usage

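dashboard shows CPU usage, including which threads are consuming it. To drill down, thread -n N prints the stacks of the N busiest threads, and thread <thread id> prints the stack of one specific thread.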
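For a one-off CPU snapshot, for example from a cron job or an alert hook, the two commands can be run in batch mode. In batch mode dashboard needs -n so that it terminates. The helper name cpu_snapshot_commands is hypothetical, and the jar path in the usage line assumes the /opt/arthas install location used earlier.

```shell
# Print Arthas commands for a one-off CPU snapshot: one dashboard refresh,
# then the stacks of the N busiest threads (default 3).
cpu_snapshot_commands() {
    printf 'dashboard -n 1; thread -n %s\n' "${1:-3}"
}

# Example usage (needs a running JVM with the given PID):
#   java -jar /opt/arthas/arthas-boot.jar -c "$(cpu_snapshot_commands 3)" 973263
```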

Reposted from Journey

Original link: https://segmentfault.com/a/1190000045219355

This article was published with the support of 白鯨開源!
