Analyzing and tracing CRS-4124 and CRS-4000 errors when applying a PSU to an Oracle 11g database on Red Hat 7

Posted by xueshancheng on 2021-10-22

1 Starting the cluster fails with the following errors

[root@testdb2 ~]# /u01/app/11.2.0/grid/bin/crsctl start crs

CRS-4124: Oracle High Availability Services startup failed.

CRS-4000: Command Start failed, or completed with errors.


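2 Following documents found online for this error, I deleted /var/tmp/.oracle/npohasd (in practice the whole /var/tmp/.oracle directory), but that did not solve the problem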

[root@testdb2 ~]# cd /var/tmp

[root@testdb2 tmp]# rm -rf .oracle


[root@testdb2 ~]# /u01/app/11.2.0/grid/bin/crsctl start crs

CRS-4124: Oracle High Availability Services startup failed.

CRS-4000: Command Start failed, or completed with errors.
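Before deleting anything under /var/tmp/.oracle, it can help to record what state npohasd is in. The following is a minimal sketch, assuming that npohasd is normally a named pipe (FIFO) that exists once init.ohasd is running; the function name describe_pipe is hypothetical and not part of any Oracle tool.

```shell
# Hypothetical helper: report whether the ohasd pipe exists and is a FIFO.
# The default path is the one shown in this article; pass another path to override it.
describe_pipe() {
    pipe_path="${1:-/var/tmp/.oracle/npohasd}"
    if [ -p "$pipe_path" ]; then
        echo "fifo"          # the path exists and is a named pipe
    elif [ -e "$pipe_path" ]; then
        echo "not-a-fifo"    # something exists there, but it is not a pipe
    else
        echo "missing"       # nothing exists at that path
    fi
}

# Example (run as root on the affected node):
#   describe_pipe
```

A result of "missing" right after a failed start would be consistent with what the strace output in the next step shows.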


3 Next, I traced the crsctl process with strace

[root@testdb2 ~]# ps -ef|grep  crsctl

root      32511  31666  0 17:34 pts/0    00:00:00 /u01/app/11.2.0/grid/bin/crsctl.bin start crs

root      36355  36109  0 17:36 pts/1    00:00:00 grep --color=auto crsctl

[root@testdb2 ~]#  strace -p 32511

strace: Process 32511 attached

restart_syscall(<... resuming interrupted nanosleep ...>) = 0

open("/proc/self/status", O_RDONLY)     = 3

read(3, "Name:\tcrsctl.bin\nUmask:\t0022\nSta"..., 4096) = 1334

close(3)                                = 0

access("/usr/lib64/qt-3.3/bin/crsctl.bin", F_OK) = -1 ENOENT (No such file or directory)

access("/usr/local/sbin/crsctl.bin", F_OK) = -1 ENOENT (No such file or directory)

access("/usr/local/bin/crsctl.bin", F_OK) = -1 ENOENT (No such file or directory)

access("/sbin/crsctl.bin", F_OK)        = -1 ENOENT (No such file or directory)

access("/bin/crsctl.bin", F_OK)         = -1 ENOENT (No such file or directory)

access("/usr/sbin/crsctl.bin", F_OK)    = -1 ENOENT (No such file or directory)

access("/usr/bin/crsctl.bin", F_OK)     = -1 ENOENT (No such file or directory)

access("/root/bin/crsctl.bin", F_OK)    = -1 ENOENT (No such file or directory)

brk(NULL)                               = 0xefe000

brk(0xf3e000)                           = 0xf3e000

brk(NULL)                               = 0xf3e000

brk(0xfbe000)                           = 0xfbe000

brk(NULL)                               = 0xfbe000

brk(0xff6000)                           = 0xff6000

.........

41999      0.000019 access("/var/tmp/.oracle/npohasd", F_OK) = -1 ENOENT (No such file or directory) <0.000007>

41999      0.000020 access("/var/tmp/.oracle/npohasd", F_OK) = -1 ENOENT (No such file or directory) <0.000008>

41999      0.000019 access("/var/tmp/.oracle/npohasd", F_OK) = -1 ENOENT (No such file or directory) <0.000007>

41999      0.000020 access("/var/tmp/.oracle/npohasd", F_OK) = -1 ENOENT (No such file or directory) <0.000008>
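The repeated ENOENT results above are easier to read when tallied per path. Below is a minimal sketch that counts failed access() calls in saved strace output. The function name count_enoent and the trace file path in the comment are assumptions for illustration. The PID prefix, relative timestamps and `<...>` durations in the later part of the trace look like what `strace -f -r -T` produces when writing to a file, although the exact command used for that part is not shown in this article.

```shell
# Hypothetical helper: read strace output on stdin and print
# "<count> <path>" for every access() call that failed with ENOENT,
# most frequent path first.
count_enoent() {
    awk '/access\(/ && /ENOENT/ {
             split($0, parts, "\"")   # parts[2] is the quoted path
             seen[parts[2]]++
         }
         END { for (p in seen) print seen[p], p }' | sort -rn
}

# Example capture and tally (the output file name is an assumption):
#   strace -f -r -T -o /tmp/crsctl.trace -p 32511
#   count_enoent < /tmp/crsctl.trace
```

In a trace like the one above, /var/tmp/.oracle/npohasd would be expected to dominate the list, because crsctl keeps polling for it.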


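Deleting the /var/tmp/.oracle directory several times did not solve the problem, and the trace above shows that the file /var/tmp/.oracle/npohasd cannot be accessed.

I therefore searched Oracle's support site and found the document 'Linux: OS "init" process does not start init.ohasd in inittab' (Doc ID 1591775.1), which explains that crsctl cannot start the cluster when the ohasd process fails to start.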

The document reads as follows:


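4 Based on that document, I concluded that the failure was caused by the ohasd service not starting. Because Oracle 11g is not well supported on Red Hat 7, I suspected that the ohasd service I had created myself was faulty and was keeping the database cluster from starting.

Checking the status of ohas.service showed that the service was reported as running, but its log said that ohasd.bin had died: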

[root@testdb2 tmp]# systemctl status ohas.service

● ohas.service - Oracle High Availability Services

   Loaded: loaded (/usr/lib/systemd/system/ohas.service; enabled; vendor preset: disabled)

   Active: active (running) since Fri 2021-10-22 08:41:05 CST; 50min ago

 Main PID: 26360 (init.ohasd)

    Tasks: 1

   CGroup: /system.slice/ohas.service

           └─26360 /bin/sh /etc/init.d/init.ohasd run >/dev/null 2>&1 Type=simple


Oct 22 08:41:25 testdb2 clsecho[29462]: /etc/init.d/init.ohasd: ohasd.bin process 9443 died while waiting to move.

Oct 22 08:41:25 testdb2 init.ohasd[26360]: /etc/init.d/init.ohasd: ohasd.bin process 9443 died while waiting to move.

Oct 22 08:59:27 testdb2 clsecho[42022]: /etc/init.d/init.ohasd: 4999 > /sys/fs/cgroup/cpu,cpuacct/tasks

Oct 22 08:59:27 testdb2 init.ohasd[26360]: /etc/init.d/init.ohasd: 4999 > /sys/fs/cgroup/cpu,cpuacct/tasks

Oct 22 08:59:27 testdb2 init.ohasd[26360]: /bin/echo: write error: No such process

Oct 22 08:59:27 testdb2 clsecho[42025]: /etc/init.d/init.ohasd: 4999 > /sys/fs/cgroup/systemd/system.slice/oracle-ohasd.service/tasks

Oct 22 08:59:27 testdb2 init.ohasd[26360]: /etc/init.d/init.ohasd: 4999 > /sys/fs/cgroup/systemd/system.slice/oracle-ohasd.service/tasks

Oct 22 08:59:27 testdb2 init.ohasd[26360]: /bin/echo: write error: No such process

Oct 22 08:59:27 testdb2 clsecho[42032]: /etc/init.d/init.ohasd: ohasd.bin process 4999 died while waiting to move.

Oct 22 08:59:27 testdb2 init.ohasd[26360]: /etc/init.d/init.ohasd: ohasd.bin process 4999 died while waiting to move.
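Each failure appears twice in the log above (once from clsecho, once from init.ohasd), so it is easy to miscount how many ohasd.bin processes actually died. The sketch below pulls the distinct PIDs out of such log lines. The function name died_pids is hypothetical, and the message pattern is taken only from the lines shown in this article.

```shell
# Hypothetical helper: read journal lines on stdin and print each distinct
# ohasd.bin PID that "died while waiting to move", in ascending order.
died_pids() {
    sed -n 's/.*ohasd\.bin process \([0-9][0-9]*\) died while waiting to move.*/\1/p' \
        | sort -un
}

# Example on a live system (the unit name is the one used on this host):
#   journalctl -u ohas.service --no-pager | died_pids
```

For the log above this would list 4999 and 9443, meaning two separate ohasd.bin start attempts failed.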


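5 Following the document 'Linux: OS "init" process does not start init.ohasd in inittab' (Doc ID 1591775.1), I changed the host configuration and started the database cluster again. This time the cluster started normally.

The change is as follows: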

[root@testdb2 tmp]# cat /etc/inittab |grep -v "#"


htfa:35:respawn:/etc/init.d/init.tfa run >/dev/null 2>&1 </dev/null


h1:35:respawn:/etc/init.d/init.ohasd run
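To confirm on other nodes that the same entry is present and not commented out, a small check like the one below can be used. This is only a sketch: the function name has_ohasd_respawn is hypothetical, and the pattern matches the entry format shown above rather than every form Oracle might write.

```shell
# Hypothetical helper: succeed (exit status 0) if the given inittab-style
# file contains an active respawn entry for init.ohasd.
# Lines whose first field contains '#' are treated as commented out.
has_ohasd_respawn() {
    inittab_file="${1:-/etc/inittab}"
    grep -q '^[^#:]*:[^:]*:respawn:/etc/init\.d/init\.ohasd run' "$inittab_file"
}

# Example:
#   has_ohasd_respawn && echo "init.ohasd entry present"
```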


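After more than a dozen test runs, the following command started the cluster normally every time, and the CRS-4124 and CRS-4000 errors did not occur again.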

[root@testdb2 tmp]#  /u01/app/11.2.0/grid/bin/crsctl start crs

CRS-4123: Oracle High Availability Services has been started.
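When repeating the start test many times, it is convenient to classify each run by the message codes seen in this article instead of reading the output by eye. The following is a minimal sketch; the function name classify_crs_start is hypothetical, and it only knows the three codes that appear above.

```shell
# Hypothetical helper: classify the text printed by "crsctl start crs"
# using only the message codes that appear in this article.
classify_crs_start() {
    crs_output="$1"
    case "$crs_output" in
        *CRS-4124*|*CRS-4000*) echo "failed" ;;    # startup failed
        *CRS-4123*)            echo "started" ;;   # OHAS has been started
        *)                     echo "unknown" ;;   # none of the known codes
    esac
}

# Example (run as root; the Grid home is the one used on this host):
#   classify_crs_start "$(/u01/app/11.2.0/grid/bin/crsctl start crs 2>&1)"
```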




From the ITPUB blog, link: http://blog.itpub.net/69996316/viewspace-2838836/. If you repost this article, please credit the source; otherwise legal action may be taken.
