Linux Basics: Troubleshooting an Abnormal Crash on BClinux8.2 with vmcore

Posted by gkhost on 2024-05-06

I. No file is generated under /var/crash

1. Reference configuration:

https://cloud.tencent.cn/developer/article/2367955

2. Adjust the configuration on BCoe8.2
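The adjusted settings themselves are not reproduced here. As a rough sketch of what is usually checked at this step, the snippet below looks for a crashkernel= reservation on the kernel command line and for the dump directory configured in /etc/kdump.conf. The helper names has_crashkernel and kdump_path are made up for this illustration, and the fallback to /var/crash assumes the default described in the kdump.conf man page.

```shell
#!/bin/sh
# Hypothetical helpers for illustration; they are not part of kdump itself.

# Return 0 if the given kernel command line contains a crashkernel= parameter.
has_crashkernel() {
    case " $1 " in
        *" crashkernel="*) return 0 ;;
        *) return 1 ;;
    esac
}

# Print the dump directory set by the last "path" line of a kdump.conf-style file.
# Assumption: when no "path" line is set, kdump uses /var/crash.
kdump_path() {
    conf=${1:-/etc/kdump.conf}
    p=$(awk '$1 == "path" { print $2 }' "$conf" 2>/dev/null | tail -n 1)
    echo "${p:-/var/crash}"
}

if has_crashkernel "$(cat /proc/cmdline 2>/dev/null)"; then
    echo "crashkernel= is present on the kernel command line"
else
    echo "crashkernel= is missing, so no memory is reserved for the capture kernel"
fi
echo "dump directory: $(kdump_path)"
```

If crashkernel= is missing, the capture kernel has no reserved memory to load into, which is one common reason for an empty /var/crash.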

3. Manually trigger a crash

i. Reference: detailed explanation of the parameters

https://blog.csdn.net/tombaby_come/article/details/134038949

echo 1 > /proc/sys/kernel/sysrq

echo c > /proc/sysrq-trigger

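Note: running the commands above panics the kernel immediately. The kdump capture kernel then dumps the contents of memory into the /var/crash directory, and the host reboots.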

4. Check kdump

i. Reference: how kdump works

https://zhuanlan.zhihu.com/p/684699511
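A minimal sketch of such a check, assuming a systemd-based host where the service is named kdump: the kernel reports whether a capture kernel is loaded through /sys/kernel/kexec_crash_loaded (1 means loaded). The function name crash_kernel_loaded is only illustrative.

```shell
#!/bin/sh
# Return 0 when a capture (crash) kernel is loaded.
# The optional argument exists only so the check can be pointed at another file.
crash_kernel_loaded() {
    f=${1:-/sys/kernel/kexec_crash_loaded}
    [ "$(cat "$f" 2>/dev/null)" = "1" ]
}

if crash_kernel_loaded; then
    echo "capture kernel is loaded"
else
    echo "capture kernel is NOT loaded"
fi

# Service state, only where systemctl is available (assumed unit name: kdump).
if command -v systemctl >/dev/null 2>&1; then
    echo "kdump service: $(systemctl is-active kdump 2>/dev/null)"
fi
```

If the capture kernel is not loaded, triggering a crash will reboot or hang the host without writing a vmcore.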

II. Consistency check between the crash tool and the vmlinux kernel

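1. Check that /boot/vmlinuz-4.19.0-240.23.35.el8_2.bclinux.x86_64 and /usr/lib/debug/usr/lib/modules/4.19.0-240.23.35.el8_2.bclinux.x86_64/vmlinux come from the same kernel build, that is, the release string 4.19.0-240.23.35.el8_2.bclinux.x86_64 must be identical. vmlinuz is a compressed boot image and the debuginfo vmlinux is an uncompressed ELF file with symbols, so the md5 values of the two files themselves will not match; it is the kernel version that has to agree.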
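One way to confirm that the debug vmlinux belongs to the running kernel is to compare release strings. The sketch below assumes the vmlinux embeds the usual "Linux version <release> ..." banner and that the strings utility (binutils) is installed; release_from_banner is a hypothetical helper written for this example.

```shell
#!/bin/sh
# Extract the release (third field) from a "Linux version <release> ..." banner line.
release_from_banner() {
    echo "$1" | awk '$1 == "Linux" && $2 == "version" { print $3; exit }'
}

vmlinux="/usr/lib/debug/usr/lib/modules/$(uname -r)/vmlinux"
if [ -r "$vmlinux" ] && command -v strings >/dev/null 2>&1; then
    banner=$(strings "$vmlinux" | grep -m 1 '^Linux version ')
    if [ "$(release_from_banner "$banner")" = "$(uname -r)" ]; then
        echo "vmlinux matches the running kernel"
    else
        echo "vmlinux does NOT match the running kernel"
    fi
else
    echo "no debug vmlinux found for $(uname -r); is kernel-debuginfo installed?"
fi
```

This compares against the running kernel, so it is only meaningful when the host has booted back into the same kernel that crashed. The crash tool also prints the kernel's release in its RELEASE field when it opens the dump.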

2. Location of the host kernel vmlinux

/usr/lib/debug/usr/lib/modules/4.19.0-240.23.35.el8_2.bclinux.x86_64/vmlinux

3. Location of the vmcore file from the abnormal crash

/var/crash/127.0.0.1-2024-05-06-03\:24\:36/vmcore

III. Analyze the vmcore

1. Open the vmcore with the crash tool

[root@NewOSBC8 127.0.0.1-2024-05-06-03:24:36]# crash vmcore /usr/lib/debug/usr/lib/modules/4.19.0-240.23.35.el8_2.bclinux.x86_64/vmlinux

crash 7.2.7-3.el8.1
Copyright (C) 2002-2020  Red Hat, Inc.
Copyright (C) 2004, 2005, 2006, 2010  IBM Corporation
Copyright (C) 1999-2006  Hewlett-Packard Co
Copyright (C) 2005, 2006, 2011, 2012  Fujitsu Limited
Copyright (C) 2006, 2007  VA Linux Systems Japan K.K.
Copyright (C) 2005, 2011  NEC Corporation
Copyright (C) 1999, 2002, 2007  Silicon Graphics, Inc.
Copyright (C) 1999, 2000, 2001, 2002  Mission Critical Linux, Inc.
This program is free software, covered by the GNU General Public License,
and you are welcome to change it and/or distribute copies of it under
certain conditions.  Enter "help copying" to see the conditions.
This program has absolutely no warranty.  Enter "help warranty" for details.

GNU gdb (GDB) 7.6
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-unknown-linux-gnu"...

WARNING: kernel relocated [178MB]: patching 97096 gdb minimal_symbol values

      KERNEL: /usr/lib/debug/usr/lib/modules/4.19.0-240.23.35.el8_2.bclinux.x86_64/vmlinux
    DUMPFILE: vmcore  [PARTIAL DUMP]
        CPUS: 2
        DATE: Mon May  6 03:24:31 2024
      UPTIME: 00:12:44
LOAD AVERAGE: 0.00, 0.02, 0.03
       TASKS: 346
    NODENAME: NewOSBC8.2
     RELEASE: 4.19.0-240.23.35.el8_2.bclinux.x86_64
     VERSION: #1 SMP Wed Sep 27 10:49:35 EDT 2023
     MACHINE: x86_64  (1796 Mhz)
      MEMORY: 2 GB
       PANIC: "sysrq: SysRq : Trigger a crash"
         PID: 2289
     COMMAND: "bash"
        TASK: ffff8d1122bf0000  [THREAD_INFO: ffff8d1122bf0000]
         CPU: 0
       STATE: TASK_RUNNING (SYSRQ)

crash> bt
PID: 2289   TASK: ffff8d1122bf0000  CPU: 0   COMMAND: "bash"
 #0 [ffffa2ab80cefbe8] machine_kexec at ffffffff8c25fabe
 #1 [ffffa2ab80cefc40] __crash_kexec at ffffffff8c3658ba
 #2 [ffffa2ab80cefd00] crash_kexec at ffffffff8c36678d
 #3 [ffffa2ab80cefd18] oops_end at ffffffff8c2259fd
 #4 [ffffa2ab80cefd38] no_context at ffffffff8c26fd4e
 #5 [ffffa2ab80cefd90] do_page_fault at ffffffff8c270872
 #6 [ffffa2ab80cefdc0] page_fault at ffffffff8cc0122e
    [exception RIP: sysrq_handle_crash+18]
    RIP: ffffffff8c74eb12  RSP: ffffa2ab80cefe78  RFLAGS: 00010246
    RAX: ffffffff8c74eb00  RBX: 0000000000000063  RCX: 0000000000000000
    RDX: 0000000000000000  RSI: ffff8d1131017108  RDI: 0000000000000063
    RBP: 0000000000000004   R8: 00000000000005ce   R9: 000000000000002d
    R10: 0000000000000000  R11: ffffa2ab80cefd30  R12: 0000000000000000
    R13: 0000000000000000  R14: ffffffff8d53c3e0  R15: 0000000000000000
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #7 [ffffa2ab80cefe78] __handle_sysrq.cold.10 at ffffffff8c74f6f8
 #8 [ffffa2ab80cefea8] write_sysrq_trigger at ffffffff8c74f5bb
 #9 [ffffa2ab80cefeb8] proc_reg_write at ffffffff8c55de29
#10 [ffffa2ab80cefed0] vfs_write at ffffffff8c4e0db5
#11 [ffffa2ab80ceff00] ksys_write at ffffffff8c4e102f
#12 [ffffa2ab80ceff38] do_syscall_64 at ffffffff8c2041ab
#13 [ffffa2ab80ceff50] entry_SYSCALL_64_after_hwframe at ffffffff8cc000ad
    RIP: 00007f515c78ab28  RSP: 00007ffc1172a678  RFLAGS: 00000246
    RAX: ffffffffffffffda  RBX: 0000000000000002  RCX: 00007f515c78ab28
    RDX: 0000000000000002  RSI: 000055b65d8c05c0  RDI: 0000000000000001
    RBP: 000055b65d8c05c0   R8: 000000000000000a   R9: 00007f515c81bc80
    R10: 000000000000000a  R11: 0000000000000246  R12: 00007f515ca5b6c0
    R13: 0000000000000002  R14: 00007f515ca56880  R15: 0000000000000002
    ORIG_RAX: 0000000000000001  CS: 0033  SS: 002b
crash> dis -l sysrq_handle_crash+18
/usr/src/debug/kernel-4.19.0-240.23.35.el8/linux-4.19.0-240.23.35.el8_2.bclinux.x86_64/drivers/tty/sysrq.c: 159
0xffffffff8c74eb12 <sysrq_handle_crash+18>:     movb   $0x1,0x0
crash> dis -l 0xffffffff8c74eb12
/usr/src/debug/kernel-4.19.0-240.23.35.el8/linux-4.19.0-240.23.35.el8_2.bclinux.x86_64/drivers/tty/sysrq.c: 159
0xffffffff8c74eb12 <sysrq_handle_crash+18>:     movb   $0x1,0x0
crash> kmem -i
                 PAGES        TOTAL      PERCENTAGE
    TOTAL MEM   458790       1.8 GB         ----
         FREE   194411     759.4 MB   42% of TOTAL MEM
         USED   264379         1 GB   57% of TOTAL MEM
       SHARED    50717     198.1 MB   11% of TOTAL MEM
      BUFFERS      530       2.1 MB    0% of TOTAL MEM
       CACHED   103545     404.5 MB   22% of TOTAL MEM
         SLAB    31239       122 MB    6% of TOTAL MEM

   TOTAL HUGE        0            0         ----
    HUGE FREE        0            0    0% of TOTAL HUGE

   TOTAL SWAP   532479         2 GB         ----
    SWAP USED        0            0    0% of TOTAL SWAP
    SWAP FREE   532479         2 GB  100% of TOTAL SWAP

 COMMIT LIMIT   761874       2.9 GB         ----
    COMMITTED   511634         2 GB   67% of TOTAL LIMIT
crash> sys
      KERNEL: /usr/lib/debug/usr/lib/modules/4.19.0-240.23.35.el8_2.bclinux.x86_64/vmlinux
    DUMPFILE: vmcore  [PARTIAL DUMP]
        CPUS: 2
        DATE: Mon May  6 03:24:31 2024
      UPTIME: 00:12:44
LOAD AVERAGE: 0.00, 0.02, 0.03
       TASKS: 346
    NODENAME: NewOSBC8.2
     RELEASE: 4.19.0-240.23.35.el8_2.bclinux.x86_64
     VERSION: #1 SMP Wed Sep 27 10:49:35 EDT 2023
     MACHINE: x86_64  (1796 Mhz)
      MEMORY: 2 GB
       PANIC: "sysrq: SysRq : Trigger a crash"
crash> p cpu_info:1
per_cpu(cpu_info, 1) = $1 = {
  x86 = 23 '\027',
  x86_vendor = 2 '\002',
  x86_model = 104 'h',
  x86_stepping = 1 '\001',
  x86_tlbsize = 3072,
  x86_virt_bits = 48 '0',
  x86_phys_bits = 45 '-',
  x86_coreid_bits = 0 '\000',
  cu_id = 255 '\377',
  extended_cpuid_level = 2147483680,
  cpuid_level = 16,
  x86_capability = {126614527, 802421759, 0, 129319184, 4277678595, 0, 4195321, 376123396, 557056, 563872169, 15, 0, 0, 17584641, 4, 0, 4194308, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 229696, 0},
  x86_vendor_id = "AuthenticAMD\000\000\000",
  x86_model_id = "AMD Ryzen 7 5700U with Radeon Graphics\000        \000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000",
  x86_cache_size = 512,
  x86_cache_alignment = 64,
  x86_cache_max_rmid = -1,
  x86_cache_occ_scale = -1,
  x86_power = 256,
  loops_per_jiffy = 1796624,
  x86_max_cores = 1,
  apicid = 2,
  initial_apicid = 2,
  x86_clflush_size = 64,
  booted_cores = 1,
  phys_proc_id = 2,
  logical_proc_id = 1,
  cpu_core_id = 0,
  cpu_index = 1,
  microcode = 0,
  x86_cache_bits = 45 '-',
  initialized = 1,
  cpuinfo_x86_extended_size_rh = 0,
  _rh = {
    cpu_die_id = 0,
    logical_die_id = 1,
    vmx_capability = {0, 0, 0}
  }
}
crash>  ps 1489
   PID    PPID  CPU       TASK        ST  %MEM     VSZ    RSS  COMM
   1489   1382   0  ffff8d110eb20000  IN  11.9 3106588 249348  llvmpipe-1
crash>

crash vmcore /usr/lib/debug/usr/lib/modules/4.19.0-240.23.35.el8_2.bclinux.x86_64/vmlinux

Time the vmcore was generated: DATE: Mon May 6 03:24:31 2024

Crash reason: PANIC: "sysrq: SysRq : Trigger a crash"

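2. Inspect the registers at the time of the exception and the function at RIP

i. Work out which application was running at the time and called sysrq_handle_crash, causing the crash. In the bt output above this is PID 2289, COMMAND "bash", which reached sysrq_handle_crash through write_sysrq_trigger;

ii. Reference: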

https://blog.csdn.net/weixin_43564241/article/details/130692946

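3. Inspect the code called on behalf of the user-space application

i. Use the "[exception RIP: sysrq_handle_crash+18]" line of the bt output to find the code involved: dis -l sysrq_handle_crash+18 maps it to drivers/tty/sysrq.c, line 159;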
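The interactive commands used in this analysis can also be replayed non-interactively. The sketch below writes them to a command file and hands it to crash with -i (run the file before accepting input) and -s (skip the startup banner). The helper name write_crash_cmds is hypothetical, and the paths are the ones from this host.

```shell
#!/bin/sh
# Write the crash commands used in this analysis to the file given as $1.
write_crash_cmds() {
    cat > "$1" <<'EOF'
bt
dis -l sysrq_handle_crash+18
kmem -i
exit
EOF
}

cmds=$(mktemp)
write_crash_cmds "$cmds"

vmlinux=/usr/lib/debug/usr/lib/modules/4.19.0-240.23.35.el8_2.bclinux.x86_64/vmlinux
vmcore="/var/crash/127.0.0.1-2024-05-06-03:24:36/vmcore"

# Only run crash when the tool and both files are actually present.
if command -v crash >/dev/null 2>&1 && [ -r "$vmlinux" ] && [ -r "$vmcore" ]; then
    crash -s -i "$cmds" "$vmlinux" "$vmcore"
else
    echo "crash or the dump files are not available; commands written to $cmds"
fi
```

The trailing exit in the command file makes crash quit after the last command instead of dropping to the crash> prompt.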

4. Check memory usage at the time of the crash
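The kmem -i figures can be cross-checked by hand. The sketch below assumes 4 KiB pages (the x86_64 base page size) and that the percentages are truncated rather than rounded, which is consistent with the output above (USED is 57.6% of the total but is shown as 57%). Both function names are made up for this example.

```shell
#!/bin/sh
# Convert a page count to MB, assuming 4 KiB pages.
pages_to_mb() {
    awk -v p="$1" 'BEGIN { printf "%.1f\n", p * 4096 / 1048576 }'
}

# Integer percentage of $1 relative to $2, truncated.
pct_of() {
    echo $(( $1 * 100 / $2 ))
}

# FREE row from the kmem -i output: 194411 pages out of 458790 total.
echo "FREE: $(pages_to_mb 194411) MB, $(pct_of 194411 458790)% of TOTAL MEM"
# prints: FREE: 759.4 MB, 42% of TOTAL MEM
```

With 42% of memory free and no swap in use, memory pressure can be ruled out as a factor in this crash.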

5. Triggered from the user side

i. The dump of memory into /var/crash was triggered manually.
