I. No dump file is generated in /var/crash
1. Reference configuration:
https://cloud.tencent.cn/developer/article/2367955
2. Adjust the configuration on BCoe8.2
3. Manually trigger a crash
i. Reference: parameter details
https://blog.csdn.net/tombaby_come/article/details/134038949
echo 1 > /proc/sys/kernel/sysrq
echo c > /proc/sysrq-trigger
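Note: running the commands above panics the kernel immediately. kdump then dumps the contents of memory to the /var/crash directory and the host reboots.
4. Check kdump
i. Reference: how kdump works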
https://zhuanlan.zhihu.com/p/684699511
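When nothing appears in /var/crash, two quick things to look at are whether the kernel command line reserves memory for a crash kernel and whether a crash kernel is currently loaded. The following is a minimal sketch, not a complete diagnosis: the function names has_crashkernel and kdump_report are hypothetical helpers introduced here, and the sketch assumes the standard /proc/cmdline and /sys/kernel/kexec_crash_loaded files are present on the host.

```shell
#!/bin/sh
# Hypothetical helpers (names are not from the original notes).

# Return 0 if the given kernel command line contains a crashkernel= parameter.
has_crashkernel() {
    case " $1 " in
        *" crashkernel="*) return 0 ;;
        *) return 1 ;;
    esac
}

# Print a short kdump readiness report for the running host.
kdump_report() {
    cmdline=$(cat /proc/cmdline 2>/dev/null)
    if has_crashkernel "$cmdline"; then
        echo "crashkernel= present on the kernel command line"
    else
        echo "crashkernel= missing from the kernel command line"
    fi
    # 1 means a crash (capture) kernel is loaded, 0 means it is not.
    loaded=$(cat /sys/kernel/kexec_crash_loaded 2>/dev/null || echo unknown)
    echo "kexec_crash_loaded: $loaded"
}
```

Calling kdump_report on the affected host prints both findings. If crashkernel= is missing or kexec_crash_loaded is 0, the sysrq trigger will panic the host without producing a vmcore.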
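II. Checking that the crash tool's vmlinux matches the kernel
1. Check that /boot/vmlinuz-4.19.0-240.23.35.el8_2.bclinux.x86_64 and /usr/lib/debug/usr/lib/modules/4.19.0-240.23.35.el8_2.bclinux.x86_64/vmlinux belong to the same kernel release. The release strings must be identical. The md5 values of the two files cannot be compared directly, because vmlinuz is the compressed boot image while vmlinux is the uncompressed ELF image with debug information.
2. Location of the host kernel's vmlinux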
/usr/lib/debug/usr/lib/modules/4.19.0-240.23.35.el8_2.bclinux.x86_64/vmlinux
3. Location of the vmcore file from the crash
/var/crash/127.0.0.1-2024-05-06-03\:24\:36/vmcore
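The consistency check in this section can be sketched as a comparison of the release strings embedded in the two paths. This is only an illustration under the assumption that the files follow the /boot/vmlinuz-<release> and .../modules/<release>/vmlinux layouts shown above; release_of and same_release are hypothetical helper names.

```shell
#!/bin/sh
# Hypothetical helpers (names are not from the original notes).

# Print the kernel release embedded in a vmlinuz or debuginfo vmlinux path.
release_of() {
    case "$1" in
        */vmlinuz-*)
            printf '%s\n' "${1##*/vmlinuz-}" ;;
        */modules/*/vmlinux)
            dir=${1%/vmlinux}             # strip the trailing /vmlinux
            printf '%s\n' "${dir##*/}" ;; # last path component is the release
        *)
            return 1 ;;
    esac
}

# Return 0 if both paths carry the same release string, 2 if a path is not recognized.
same_release() {
    a=$(release_of "$1") || return 2
    b=$(release_of "$2") || return 2
    [ "$a" = "$b" ]
}
```

For example, same_release /boot/vmlinuz-"$(uname -r)" /usr/lib/debug/usr/lib/modules/"$(uname -r)"/vmlinux succeeds when the debuginfo path is built from the running release. It does not verify that the file exists.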
III. Analyzing the vmcore
1. Open the vmcore with the crash tool
[root@NewOSBC8 127.0.0.1-2024-05-06-03:24:36]# crash vmcore /usr/lib/debug/usr/lib/modules/4.19.0-240.23.35.el8_2.bclinux.x86_64/vmlinux

crash 7.2.7-3.el8.1
Copyright (C) 2002-2020 Red Hat, Inc.
Copyright (C) 2004, 2005, 2006, 2010 IBM Corporation
Copyright (C) 1999-2006 Hewlett-Packard Co
Copyright (C) 2005, 2006, 2011, 2012 Fujitsu Limited
Copyright (C) 2006, 2007 VA Linux Systems Japan K.K.
Copyright (C) 2005, 2011 NEC Corporation
Copyright (C) 1999, 2002, 2007 Silicon Graphics, Inc.
Copyright (C) 1999, 2000, 2001, 2002 Mission Critical Linux, Inc.
This program is free software, covered by the GNU General Public License,
and you are welcome to change it and/or distribute copies of it under
certain conditions. Enter "help copying" to see the conditions.
This program has absolutely no warranty. Enter "help warranty" for details.

GNU gdb (GDB) 7.6
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-unknown-linux-gnu"...

WARNING: kernel relocated [178MB]: patching 97096 gdb minimal_symbol values

      KERNEL: /usr/lib/debug/usr/lib/modules/4.19.0-240.23.35.el8_2.bclinux.x86_64/vmlinux
    DUMPFILE: vmcore [PARTIAL DUMP]
        CPUS: 2
        DATE: Mon May 6 03:24:31 2024
      UPTIME: 00:12:44
LOAD AVERAGE: 0.00, 0.02, 0.03
       TASKS: 346
    NODENAME: NewOSBC8.2
     RELEASE: 4.19.0-240.23.35.el8_2.bclinux.x86_64
     VERSION: #1 SMP Wed Sep 27 10:49:35 EDT 2023
     MACHINE: x86_64 (1796 Mhz)
      MEMORY: 2 GB
       PANIC: "sysrq: SysRq : Trigger a crash"
         PID: 2289
     COMMAND: "bash"
        TASK: ffff8d1122bf0000 [THREAD_INFO: ffff8d1122bf0000]
         CPU: 0
       STATE: TASK_RUNNING (SYSRQ)

crash> bt
PID: 2289 TASK: ffff8d1122bf0000 CPU: 0 COMMAND: "bash"
 #0 [ffffa2ab80cefbe8] machine_kexec at ffffffff8c25fabe
 #1 [ffffa2ab80cefc40] __crash_kexec at ffffffff8c3658ba
 #2 [ffffa2ab80cefd00] crash_kexec at ffffffff8c36678d
 #3 [ffffa2ab80cefd18] oops_end at ffffffff8c2259fd
 #4 [ffffa2ab80cefd38] no_context at ffffffff8c26fd4e
 #5 [ffffa2ab80cefd90] do_page_fault at ffffffff8c270872
 #6 [ffffa2ab80cefdc0] page_fault at ffffffff8cc0122e
    [exception RIP: sysrq_handle_crash+18]
    RIP: ffffffff8c74eb12 RSP: ffffa2ab80cefe78 RFLAGS: 00010246
    RAX: ffffffff8c74eb00 RBX: 0000000000000063 RCX: 0000000000000000
    RDX: 0000000000000000 RSI: ffff8d1131017108 RDI: 0000000000000063
    RBP: 0000000000000004 R8: 00000000000005ce R9: 000000000000002d
    R10: 0000000000000000 R11: ffffa2ab80cefd30 R12: 0000000000000000
    R13: 0000000000000000 R14: ffffffff8d53c3e0 R15: 0000000000000000
    ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
 #7 [ffffa2ab80cefe78] __handle_sysrq.cold.10 at ffffffff8c74f6f8
 #8 [ffffa2ab80cefea8] write_sysrq_trigger at ffffffff8c74f5bb
 #9 [ffffa2ab80cefeb8] proc_reg_write at ffffffff8c55de29
#10 [ffffa2ab80cefed0] vfs_write at ffffffff8c4e0db5
#11 [ffffa2ab80ceff00] ksys_write at ffffffff8c4e102f
#12 [ffffa2ab80ceff38] do_syscall_64 at ffffffff8c2041ab
#13 [ffffa2ab80ceff50] entry_SYSCALL_64_after_hwframe at ffffffff8cc000ad
    RIP: 00007f515c78ab28 RSP: 00007ffc1172a678 RFLAGS: 00000246
    RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f515c78ab28
    RDX: 0000000000000002 RSI: 000055b65d8c05c0 RDI: 0000000000000001
    RBP: 000055b65d8c05c0 R8: 000000000000000a R9: 00007f515c81bc80
    R10: 000000000000000a R11: 0000000000000246 R12: 00007f515ca5b6c0
    R13: 0000000000000002 R14: 00007f515ca56880 R15: 0000000000000002
    ORIG_RAX: 0000000000000001 CS: 0033 SS: 002b

crash> dis -l sysrq_handle_crash+18
/usr/src/debug/kernel-4.19.0-240.23.35.el8/linux-4.19.0-240.23.35.el8_2.bclinux.x86_64/drivers/tty/sysrq.c: 159
0xffffffff8c74eb12 <sysrq_handle_crash+18>: movb $0x1,0x0

crash> dis -l 0xffffffff8c74eb12
/usr/src/debug/kernel-4.19.0-240.23.35.el8/linux-4.19.0-240.23.35.el8_2.bclinux.x86_64/drivers/tty/sysrq.c: 159
0xffffffff8c74eb12 <sysrq_handle_crash+18>: movb $0x1,0x0

crash> kmem -i
                 PAGES      TOTAL   PERCENTAGE
   TOTAL MEM    458790     1.8 GB   ----
        FREE    194411   759.4 MB   42% of TOTAL MEM
        USED    264379       1 GB   57% of TOTAL MEM
      SHARED     50717   198.1 MB   11% of TOTAL MEM
     BUFFERS       530     2.1 MB   0% of TOTAL MEM
      CACHED    103545   404.5 MB   22% of TOTAL MEM
        SLAB     31239     122 MB   6% of TOTAL MEM

  TOTAL HUGE         0          0   ----
   HUGE FREE         0          0   0% of TOTAL HUGE

  TOTAL SWAP    532479       2 GB   ----
   SWAP USED         0          0   0% of TOTAL SWAP
   SWAP FREE    532479       2 GB   100% of TOTAL SWAP

COMMIT LIMIT    761874     2.9 GB   ----
   COMMITTED    511634       2 GB   67% of TOTAL LIMIT

crash> sys
      KERNEL: /usr/lib/debug/usr/lib/modules/4.19.0-240.23.35.el8_2.bclinux.x86_64/vmlinux
    DUMPFILE: vmcore [PARTIAL DUMP]
        CPUS: 2
        DATE: Mon May 6 03:24:31 2024
      UPTIME: 00:12:44
LOAD AVERAGE: 0.00, 0.02, 0.03
       TASKS: 346
    NODENAME: NewOSBC8.2
     RELEASE: 4.19.0-240.23.35.el8_2.bclinux.x86_64
     VERSION: #1 SMP Wed Sep 27 10:49:35 EDT 2023
     MACHINE: x86_64 (1796 Mhz)
      MEMORY: 2 GB
       PANIC: "sysrq: SysRq : Trigger a crash"

crash> p cpu_info:1
per_cpu(cpu_info, 1) = $1 = {
  x86 = 23 '\027',
  x86_vendor = 2 '\002',
  x86_model = 104 'h',
  x86_stepping = 1 '\001',
  x86_tlbsize = 3072,
  x86_virt_bits = 48 '0',
  x86_phys_bits = 45 '-',
  x86_coreid_bits = 0 '\000',
  cu_id = 255 '\377',
  extended_cpuid_level = 2147483680,
  cpuid_level = 16,
  x86_capability = {126614527, 802421759, 0, 129319184, 4277678595, 0, 4195321, 376123396, 557056, 563872169, 15, 0, 0, 17584641, 4, 0, 4194308, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 229696, 0},
  x86_vendor_id = "AuthenticAMD\000\000\000",
  x86_model_id = "AMD Ryzen 7 5700U with Radeon Graphics\000 \000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000",
  x86_cache_size = 512,
  x86_cache_alignment = 64,
  x86_cache_max_rmid = -1,
  x86_cache_occ_scale = -1,
  x86_power = 256,
  loops_per_jiffy = 1796624,
  x86_max_cores = 1,
  apicid = 2,
  initial_apicid = 2,
  x86_clflush_size = 64,
  booted_cores = 1,
  phys_proc_id = 2,
  logical_proc_id = 1,
  cpu_core_id = 0,
  cpu_index = 1,
  microcode = 0,
  x86_cache_bits = 45 '-',
  initialized = 1,
  cpuinfo_x86_extended_size_rh = 0,
  _rh = {
    cpu_die_id = 0,
    logical_die_id = 1,
    vmx_capability = {0, 0, 0}
  }
}

crash> ps 1489
   PID   PPID  CPU  TASK              ST  %MEM  VSZ      RSS     COMM
   1489  1382  0    ffff8d110eb20000  IN  11.9  3106588  249348  llvmpipe-1
crash>
crash vmcore /usr/lib/debug/usr/lib/modules/4.19.0-240.23.35.el8_2.bclinux.x86_64/vmlinux
Time the vmcore was generated: DATE: Mon May 6 03:24:31 2024
Cause of the crash: PANIC: "sysrq: SysRq : Trigger a crash"
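The same commands can also be collected non-interactively. The sketch below writes the commands used in this session to a file that crash can read through its -i option, and pulls the PANIC line out of captured sys output. write_crash_cmds and panic_of are hypothetical helper names, and the file name cmds.txt is only an example.

```shell
#!/bin/sh
# Hypothetical helpers (names are not from the original notes).

# Write the crash commands used in this session to the file given as $1.
# The final "exit" makes crash quit instead of waiting at the prompt.
write_crash_cmds() {
    cat > "$1" <<'EOF'
sys
bt
kmem -i
exit
EOF
}

# Read crash "sys" output on stdin and print only the panic string.
panic_of() {
    sed -n 's/^ *PANIC: //p'
}
```

A possible invocation, assuming the paths from section II, is: write_crash_cmds cmds.txt, then crash -s /usr/lib/debug/usr/lib/modules/4.19.0-240.23.35.el8_2.bclinux.x86_64/vmlinux vmcore -i cmds.txt > report.txt, and finally panic_of < report.txt to print the panic string.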
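2. Inspect the registers at the time of the crash and the faulting function (RIP)
i. Determine which application was running and called sysrq_handle_crash, causing the crash. In the bt output this is PID 2289 (bash), which reached sysrq_handle_crash through write_sysrq_trigger.
ii. Reference: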
https://blog.csdn.net/weixin_43564241/article/details/130692946
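3. Inspect the code invoked on behalf of the user-space application
i. Take the symbol from the "[exception RIP: sysrq_handle_crash+18]" line of the bt output and pass it to dis -l to see the calling code. Here it resolves to drivers/tty/sysrq.c line 159, the instruction movb $0x1,0x0, which writes to address 0.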
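Picking the symbol out of a saved backtrace can be sketched as a small text filter. exception_rip is a hypothetical helper name, and the sketch assumes the bt output has been saved to a text file or is piped in.

```shell
#!/bin/sh
# Hypothetical helper (name is not from the original notes).

# Read crash "bt" output on stdin and print the symbol+offset from the
# "[exception RIP: ...]" line, ready to be passed to "dis -l".
exception_rip() {
    sed -n 's/.*\[exception RIP: \(.*\)\].*/\1/p'
}
```

For the backtrace in this document the filter yields sysrq_handle_crash+18, which is the argument given to dis -l above.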
4. Check memory usage at the time of the crash (the kmem -i output)
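The PAGES column of kmem -i can be cross-checked against the TOTAL and PERCENTAGE columns with a little arithmetic. The sketch below assumes a 4096-byte page size, which is the usual x86_64 default but is not stated in the output. pages_to_mb and pct_of are hypothetical helper names, and the integer truncation in pct_of is an assumption that happens to agree with the percentages shown here.

```shell
#!/bin/sh
# Hypothetical helpers (names are not from the original notes).

# Convert a page count to MB. $2 is the page size in bytes (assumed 4096 if omitted).
pages_to_mb() {
    LC_ALL=C awk -v p="$1" -v sz="${2:-4096}" \
        'BEGIN { printf "%.1f\n", p * sz / 1048576 }'
}

# Print $1 as an integer percentage of $2, truncating the fraction.
pct_of() {
    LC_ALL=C awk -v a="$1" -v b="$2" 'BEGIN { printf "%d\n", a * 100 / b }'
}
```

With the values from this dump, pages_to_mb 194411 gives 759.4 and pct_of 194411 458790 gives 42, matching the FREE row.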
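5. Triggered from the user side
i. The dump of memory to /var/crash was triggered manually, which is consistent with PANIC: "sysrq: SysRq : Trigger a crash" and COMMAND: "bash".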