Analyzing and tracing CRS-4124 / CRS-4000 errors when applying a PSU to an Oracle 11g database on Red Hat 7
1 Starting the cluster failed with the following errors
[root@testdb2 ~]# /u01/app/11.2.0/grid/bin/crsctl start crs
CRS-4124: Oracle High Availability Services startup failed.
CRS-4000: Command Start failed, or completed with errors.
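2 Following documents found online for this error, I deleted /var/tmp/.oracle/npohasd (in practice the whole /var/tmp/.oracle directory, as shown below), but this did not solve the problem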
[root@testdb2 ~]# cd /var/tmp
[root@testdb2 tmp]# rm -rf .oracle
[root@testdb2 ~]# /u01/app/11.2.0/grid/bin/crsctl start crs
CRS-4124: Oracle High Availability Services startup failed.
CRS-4000: Command Start failed, or completed with errors.
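Before or after removing anything under /var/tmp/.oracle, it can help to confirm whether the npohasd file is actually there. The following is a minimal sketch; the function name check_npohasd is hypothetical, and the default path is simply the one named in this step.

```shell
# Hypothetical helper: report whether the npohasd file exists.
# The default path is the one mentioned in step 2; pass another path to override it.
check_npohasd() {
    pipe="${1:-/var/tmp/.oracle/npohasd}"
    if [ -e "$pipe" ]; then
        echo "present: $pipe"
        return 0
    fi
    echo "missing: $pipe"
    return 1
}
```

Calling check_npohasd with no argument checks the default path; a non-zero exit status means the file is missing.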
3 Next, trace the relevant process with strace
[root@testdb2 ~]# ps -ef|grep crsctl
root 32511 31666 0 17:34 pts/0 00:00:00 /u01/app/11.2.0/grid/bin/crsctl.bin start crs
root 36355 36109 0 17:36 pts/1 00:00:00 grep --color=auto crsctl
[root@testdb2 ~]# strace -p 32511
strace: Process 32511 attached
restart_syscall(<... resuming interrupted nanosleep ...>) = 0
open("/proc/self/status", O_RDONLY) = 3
read(3, "Name:\tcrsctl.bin\nUmask:\t0022\nSta"..., 4096) = 1334
close(3) = 0
access("/usr/lib64/qt-3.3/bin/crsctl.bin", F_OK) = -1 ENOENT (No such file or directory)
access("/usr/local/sbin/crsctl.bin", F_OK) = -1 ENOENT (No such file or directory)
access("/usr/local/bin/crsctl.bin", F_OK) = -1 ENOENT (No such file or directory)
access("/sbin/crsctl.bin", F_OK) = -1 ENOENT (No such file or directory)
access("/bin/crsctl.bin", F_OK) = -1 ENOENT (No such file or directory)
access("/usr/sbin/crsctl.bin", F_OK) = -1 ENOENT (No such file or directory)
access("/usr/bin/crsctl.bin", F_OK) = -1 ENOENT (No such file or directory)
access("/root/bin/crsctl.bin", F_OK) = -1 ENOENT (No such file or directory)
brk(NULL) = 0xefe000
brk(0xf3e000) = 0xf3e000
brk(NULL) = 0xf3e000
brk(0xfbe000) = 0xfbe000
brk(NULL) = 0xfbe000
brk(0xff6000) = 0xff6000
41999 0.000019 access("/var/tmp/.oracle/npohasd", F_OK) = -1 ENOENT (No such file or directory) <0.000007>
41999 0.000020 access("/var/tmp/.oracle/npohasd", F_OK) = -1 ENOENT (No such file or directory) <0.000008>
41999 0.000019 access("/var/tmp/.oracle/npohasd", F_OK) = -1 ENOENT (No such file or directory) <0.000007>
41999 0.000020 access("/var/tmp/.oracle/npohasd", F_OK) = -1 ENOENT (No such file or directory) <0.000008>
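Together with the Oracle note Linux: OS "init" process does not start init.ohasd in inittab (Doc ID 1591775.1), the repeated failed checks for /var/tmp/.oracle/npohasd indicate that the ohasd process cannot start, which is why crsctl cannot start the cluster.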
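The repeated probes for /var/tmp/.oracle/npohasd are easier to spot when a saved strace log is summarized by path. This is a rough sketch, assuming the trace was written to a file first (for example with strace -o); the function name summarize_enoent is hypothetical.

```shell
# Hypothetical helper: count failed access() probes per path in a saved
# strace log, most frequent first. $1 is the path to the log file.
summarize_enoent() {
    grep 'access(' "$1" | grep 'ENOENT' \
        | sed 's/.*access("\([^"]*\)".*/\1/' \
        | sort | uniq -c | sort -rn
}
```

A path that dominates the top of the output, as npohasd does in the trace above, is the file the process keeps polling for.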
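4 Based on the document above, the likely cause was that the ohasd service could not start. Because Oracle 11g is not well supported on Red Hat 7, I suspected the service itself and checked its status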
[root@testdb2 tmp]# systemctl status ohas.service
● ohas.service - Oracle High Availability Services
Loaded: loaded (/usr/lib/systemd/system/ohas.service; enabled; vendor preset: disabled)
Active: active (running) since Fri 2021-10-22 08:41:05 CST; 50min ago
Main PID: 26360 (init.ohasd)
Tasks: 1
CGroup: /system.slice/ohas.service
└─26360 /bin/sh /etc/init.d/init.ohasd run >/dev/null 2>&1 Type=simple
Oct 22 08:41:25 testdb2 clsecho[29462]: /etc/init.d/init.ohasd: ohasd.bin process 9443 died while waiting to move.
Oct 22 08:41:25 testdb2 init.ohasd[26360]: /etc/init.d/init.ohasd: ohasd.bin process 9443 died while waiting to move.
Oct 22 08:59:27 testdb2 clsecho[42022]: /etc/init.d/init.ohasd: 4999 > /sys/fs/cgroup/cpu,cpuacct/tasks
Oct 22 08:59:27 testdb2 init.ohasd[26360]: /etc/init.d/init.ohasd: 4999 > /sys/fs/cgroup/cpu,cpuacct/tasks
Oct 22 08:59:27 testdb2 init.ohasd[26360]: /bin/echo: write error: No such process
Oct 22 08:59:27 testdb2 clsecho[42025]: /etc/init.d/init.ohasd: 4999 > /sys/fs/cgroup/systemd/system.slice/oracle-ohasd.service/tasks
Oct 22 08:59:27 testdb2 init.ohasd[26360]: /etc/init.d/init.ohasd: 4999 > /sys/fs/cgroup/systemd/system.slice/oracle-ohasd.service/tasks
Oct 22 08:59:27 testdb2 init.ohasd[26360]: /bin/echo: write error: No such process
Oct 22 08:59:27 testdb2 clsecho[42032]: /etc/init.d/init.ohasd: ohasd.bin process 4999 died while waiting to move.
Oct 22 08:59:27 testdb2 init.ohasd[26360]: /etc/init.d/init.ohasd: ohasd.bin process 4999 died while waiting to move.
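The service shows as active (running), yet the log records ohasd.bin dying each time. To see how many distinct ohasd.bin processes died, the log lines can be filtered. This is a small sketch that reads log text on standard input; the function name died_ohasd_pids is hypothetical, and the pattern assumes the message wording shown in the output above.

```shell
# Hypothetical helper: read log text on stdin and print the distinct PIDs
# of ohasd.bin processes reported as "died while waiting to move".
died_ohasd_pids() {
    sed -n 's/.*ohasd\.bin process \([0-9][0-9]*\) died while waiting to move.*/\1/p' \
        | sort -un
}
```

One possible usage is journalctl -u ohas.service --no-pager | died_ohasd_pids, which prints one PID per line.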
5 After changing the host configuration as described in the document Linux: OS "init" process does not start init.ohasd in inittab (Doc ID 1591775.1) and starting the database cluster again, the cluster started normally.
[root@testdb2 tmp]# cat /etc/inittab |grep -v "#"
htfa:35:respawn:/etc/init.d/init.tfa run >/dev/null 2>&1 </dev/null
h1:35:respawn:/etc/init.d/init.ohasd run
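The same check can be scripted so that it is easy to repeat on other nodes. This is a minimal sketch, assuming the entry has the form shown above; the function name has_ohasd_respawn is hypothetical, and it only ignores lines that start with #, which is slightly narrower than grep -v "#".

```shell
# Hypothetical helper: succeed if the given inittab-style file (default
# /etc/inittab) has an uncommented respawn entry for init.ohasd.
has_ohasd_respawn() {
    file="${1:-/etc/inittab}"
    grep -v '^[[:space:]]*#' "$file" \
        | grep -q ':respawn:/etc/init\.d/init\.ohasd run'
}
```

For example, has_ohasd_respawn && echo "init.ohasd entry found" prints the message only when the entry is present and uncommented.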
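Over a dozen or so test runs, the following command started the cluster normally every time, and the CRS-4124 and CRS-4000 errors did not occur again.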
[root@testdb2 tmp]# /u01/app/11.2.0/grid/bin/crsctl start crs
CRS-4123: Oracle High Availability Services has been started.
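When repeating the start test many times, the outcome of each run can be classified from the message codes rather than read by eye. This is a rough sketch that relies only on the three codes seen in this write-up; the function name crs_start_ok is hypothetical.

```shell
# Hypothetical helper: read crsctl output on stdin and classify it by the
# message codes seen in this write-up (CRS-4123 ok, CRS-4124/CRS-4000 failure).
crs_start_ok() {
    out=$(cat)
    case "$out" in
        *CRS-4124*|*CRS-4000*) echo "failed"; return 1 ;;
        *CRS-4123*) echo "started"; return 0 ;;
        *) echo "unknown"; return 2 ;;
    esac
}
```

One possible usage is /u01/app/11.2.0/grid/bin/crsctl start crs 2>&1 | crs_start_ok, after which the exit status can be logged for each run.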
