Common reasons 11gR2 clusterware processes fail to start

Posted by myownstars on 2012-08-29

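Grid Infrastructure resources fail to start: analysis of common causes

OHASD fails to start

Cause analysis:

1 The OS run level is set incorrectly

-- Linux run levels, for reference: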

rc0.d - System Halted

rc1.d - Single User Mode

rc2.d - Single User Mode with Networking

rc3.d - Multi-User Mode - boot up in text mode

rc4.d - Not yet Defined

rc5.d - Multi-User Mode - boot up in X Windows

rc6.d - Shutdown & Reboot

Check the run levels configured for ohasd

oracle@> more /etc/inittab | grep ohasd

h1:2:respawn:/etc/init.ohasd run >/dev/null 2>&1

Check the current run level of the system

oracle@> who -r

   .        run-level 2 Jul 03 07:46       2    0    S
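
The two checks above can be combined in a small script. This is only a sketch, not part of the original note: the helper names ohasd_runlevel_ok and current_runlevel are hypothetical, and it assumes the inittab entry and the "who -r" output have the layouts shown above.

```shell
# Extracts the run level from "who -r" output read on stdin,
# i.e. the word that follows "run-level".
current_runlevel() {
    awk '{ for (i = 1; i < NF; i++) if ($i == "run-level") { print $(i + 1); exit } }'
}

# Succeeds if run level $2 appears in the run-level field (field 2)
# of the colon-separated inittab entry given as $1.
ohasd_runlevel_ok() {
    entry=$1
    level=$2
    [ -n "$level" ] || return 1
    levels=$(printf '%s\n' "$entry" | cut -d: -f2)
    case "$levels" in
        *"$level"*) return 0 ;;
        *)          return 1 ;;
    esac
}
```

A possible invocation would be: ohasd_runlevel_ok "$(grep ohasd /etc/inittab)" "$(who -r | current_runlevel)" || echo "ohasd is not configured for the current run level".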

2 Whether "init.ohasd run" is running

If "init.ohasd run" is not running, ohasd.bin will not start.

oracle@> ps -ef | grep ohasd | grep -v grep

    root  7340038        1  10   Jul 03      - 1022:34 /u001/app/11.2.0.2/grid/bin/ohasd.bin reboot

root  9568408        1   0   Jul 03      -  0:00 /bin/sh /etc/init.ohasd run

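If init.ohasd cannot start in time, an error like the following is reported: "[ohasd()] CRS-0715:Oracle High Availability Service has timed out waiting for init.ohasd to be started."

Note: starting with Linux 6, inittab is deprecated and init.ohasd is configured under /etc/init.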
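
The process check in this step could be scripted roughly as follows. This is a sketch under the assumption that the process list looks like the "ps -ef" output above; the function name ohasd_process_status is hypothetical.

```shell
# Reads "ps -ef" output on stdin and reports whether the
# "init.ohasd run" wrapper and the ohasd.bin daemon are present.
ohasd_process_status() {
    awk '
        /init\.ohasd run/ { wrapper = 1 }
        /ohasd\.bin/      { daemon = 1 }
        END {
            print "init.ohasd: " (wrapper ? "running" : "missing")
            print "ohasd.bin: "  (daemon  ? "running" : "missing")
        }'
}
```

It would be used as: ps -ef | ohasd_process_status. If init.ohasd is reported missing, ohasd.bin cannot be expected to come up.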

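3 Whether clusterware autostart is enabled

Run $GRID_HOME/bin/crsctl config crs to check whether CRS is configured to start automatically.

The OS log shows entries like the following: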

Feb 29 16:20:36 racnode1 logger: Oracle Cluster Ready Services startup disabled.

Feb 29 16:20:36 racnode1 logger: Could not access /var/opt/oracle/scls_scr/racnode1/root/ohasdstr

-- this file is inaccessible or does not exist
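
A rough way to script this step is sketched below. It assumes that "crsctl config crs" prints a message containing "autostart is enabled" or "autostart is disabled"; that wording is an assumption and should be checked against your version. The function names are hypothetical.

```shell
# Classifies the output of "crsctl config crs" read on stdin.
# The matched phrases are assumed, not taken from the original note.
crs_autostart_state() {
    output=$(cat)
    case "$output" in
        *"autostart is enabled"*)  echo enabled ;;
        *"autostart is disabled"*) echo disabled ;;
        *)                         echo unknown ;;
    esac
}

# Succeeds only if the autostart control file (the ohasdstr path
# named in the OS log message) exists and is readable.
ohasdstr_readable() {
    [ -r "$1" ]
}
```

For example: $GRID_HOME/bin/crsctl config crs | crs_autostart_state, followed by ohasdstr_readable /var/opt/oracle/scls_scr/racnode1/root/ohasdstr, using the path from the log message above.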

4 Whether the Oracle Local Registry (OLR) is accessible

 ls -altr $GRID_HOME/cdata/*.olr

If the OLR is inaccessible or corrupted, ohasd.log contains entries like:

2010-01-24 22:59:10.470: [ default][1373676464] Initializing OLR

2010-01-24 22:59:10.472: [  OCROSD][1373676464]utopen:6m':failed in stat OCR file/disk /ocw/grid/cdata/rac1.olr, errno=2, os err string=No such file or directory

2010-01-24 22:59:10.472: [  OCROSD][1373676464]utopen:7:failed to open any OCR file/disk, errno=2, os err string=No such file or directory

2010-01-24 22:59:10.473: [  OCRRAW][1373676464]proprinit: Could not open raw device
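
The file-level part of this check can be sketched as a small function. It only tests whether the OLR file exists, is readable, and is non-empty; it cannot detect logical corruption. The name check_olr is hypothetical.

```shell
# Reports the basic state of the OLR file given as $1.
# Returns non-zero if the file is missing, unreadable, or empty.
check_olr() {
    olr=$1
    if [ ! -e "$olr" ]; then
        echo "missing: $olr"; return 1
    elif [ ! -r "$olr" ]; then
        echo "unreadable: $olr"; return 1
    elif [ ! -s "$olr" ]; then
        echo "empty: $olr"; return 1
    fi
    echo "ok: $olr"
}
```

For example: check_olr /ocw/grid/cdata/rac1.olr, using the path from the log above. Run it as the user that owns the clusterware stack, since root can read files that other users cannot.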

5 ohasd.bin cannot access the network socket files

Network socket files are usually located under /tmp, /var, or /opt (for example, /tmp/.oracle or /var/tmp/.oracle).

ohasd.log records the following:

2010-06-29 10:31:01.570: [ COMMCRS][1206901056]clsclisten: Permission denied for (ADDRESS=(PROTOCOL=ipc)(KEY=procr_local_conn_0_PROL))

 

2010-06-29 10:31:01.571: [  OCRSRV][1217390912]th_listen: CLSCLISTEN failed clsc_ret= 3, addr= [(ADDRESS=(PROTOCOL=ipc)(KEY=procr_local_conn_0_PROL))]

2010-06-29 10:31:01.571: [  OCRSRV][3267002960]th_init: Local listener did not reach valid state
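
To see which IPC endpoints are affected, the denied keys can be pulled out of ohasd.log. The sketch below assumes the log lines have the format shown above; denied_ipc_keys is a hypothetical name.

```shell
# Reads ohasd.log on stdin and prints each distinct IPC key for
# which clsclisten reported "Permission denied".
denied_ipc_keys() {
    grep 'clsclisten: Permission denied' |
        sed -n 's/.*(KEY=\([^)]*\)).*/\1/p' |
        sort -u
}
```

For example: denied_ipc_keys < ohasd.log. The ownership and permissions of the socket files that correspond to the listed keys are then the first thing to inspect.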

6 ohasd.bin cannot access its log directory

The OS messages file (syslog) shows the following:

Feb 20 10:47:08 racnode1 OHASD[9566]: OHASD exiting; Directory /ocw/grid/log/racnode1/ohasd not found.
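
The missing directory can be extracted from the syslog message and then checked on disk. This is a sketch that assumes the message format shown above; missing_ohasd_dirs is a hypothetical name.

```shell
# Reads syslog lines on stdin and prints each distinct directory
# that OHASD reported as "not found" before exiting.
missing_ohasd_dirs() {
    sed -n 's/.*OHASD exiting; Directory \(.*\) not found.*/\1/p' | sort -u
}
```

Each reported path can then be checked with ls -ld to see whether the directory exists and who owns it.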

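7 ohasd.bin is running but never finishes starting

ps -ef | grep ohasd.bin shows that ohasd.bin has started, but ohasd.log has not been updated for a long time. A truss trace shows: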

15058/1:         0.1995 close(2147483646)                               Err#9 EBADF

15058/1:         0.1996 close(2147483645)                               Err#9 EBADF

A pstack of ohasd.bin shows:

_close  sclssutl_closefiledescriptors  main ..

This is caused by bug 11834289, which is fixed in 11.2.0.3.
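
The "log has not been updated for a long time" symptom can be detected with a small helper. This is a sketch that assumes a find implementation supporting -mmin (GNU, BSD, and busybox find do); the name log_is_stale and the 30-minute threshold in the example are arbitrary choices.

```shell
# Succeeds if file $1 exists and was last modified more than $2
# minutes ago. A missing file counts as "not stale" (returns 1).
log_is_stale() {
    file=$1
    minutes=$2
    # find prints the path only when the age test matches.
    [ -n "$(find "$file" -mmin +"$minutes" 2>/dev/null)" ]
}
```

For example: log_is_stale "$GRID_HOME/log/$(hostname)/ohasd/ohasd.log" 30 && echo "ohasd.log has not been written for over 30 minutes". The log path here follows the directory layout seen in the messages above and may differ on your system.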

 

OHASD agents fail to start

ohasd.bin spawns four agents:

oraagent: responsible for ora.asm, ora.evmd, ora.gipcd, ora.gpnpd, ora.mdnsd etc
orarootagent: responsible for ora.crsd, ora.ctssd, ora.diskmon, ora.drivers.acfs etc
cssdagent / cssdmonitor: responsible for ora.cssd(for ocssd.bin) and ora.cssdmonitor(for cssdmonitor itself)

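1 The most common problem is that the agent's log directory lacks the required permissions.

2 The agent binary is corrupted, so the agent cannot start. The log records the following: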

2011-05-03 11:11:13.189

[ohasd(25303)]CRS-5828:Could not start agent '/ocw/grid/bin/orarootagent_grid'. Details at (:CRSAGF00130:) {0:0:2} in /ocw/grid/log/racnode1/ohasd/ohasd.log.
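
To find out which agent binaries failed to start, the paths can be pulled from the CRS-5828 messages. This sketch assumes the message format shown above; failed_agents is a hypothetical name.

```shell
# Reads log lines on stdin and prints each distinct agent binary
# path named in a CRS-5828 "Could not start agent" message.
failed_agents() {
    sed -n "s/.*CRS-5828:Could not start agent '\([^']*\)'.*/\1/p" | sort -u
}
```

Each listed binary can then be compared (size, ownership, permissions) against the same file on a healthy node.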

 

OCSSD.bin fails to start

ocssd.bin needs the following conditions to start:

1 The GPnP profile is accessible

-- the profile stores the CSS DiscoveryString

-- this matters when the voting disk is not stored in ASM

2 The voting disk is accessible

Find the DiscoveryString in the GPnP profile from step 1.

3 The network is working
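
For conditions 1 and 2, the DiscoveryString can be extracted from the profile text. The sketch below assumes the profile XML contains an element whose name includes CSS-Profile and that carries a DiscoveryString="..." attribute; that layout is an assumption about the profile format, and css_discovery_string is a hypothetical name. It also relies on grep -o, which GNU, BSD, and busybox grep provide.

```shell
# Reads GPnP profile XML on stdin and prints the DiscoveryString
# attribute of the CSS-Profile element.
css_discovery_string() {
    # Isolate the CSS-Profile element first, because other elements
    # of the profile may carry their own DiscoveryString attribute.
    tr '\n' ' ' |
        grep -o '<[^<>]*CSS-Profile[^<>]*>' |
        sed -n 's/.*DiscoveryString="\([^"]*\)".*/\1/p'
}
```

The profile could be fed in with something like $GRID_HOME/bin/gpnptool get | css_discovery_string, after which the devices matching the printed string can be checked for existence and permissions.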

 

 

CRSD.bin fails to start

1 Whether ocssd is running

2 Whether the OCR is accessible

3 The crsd.bin pid file exists and points to the crsd.bin process

oracle@ justin> pwd

/u001/app/11.2.0.2/grid/crs/init

oracle@ justin> more justin.pid

22347868

oracle@ justin> ps -ef | grep 22347868

    root 22347868        1   6   Jul 03      - 1279:53 /u001/app/11.2.0.2/grid/bin/crsd.bin reboot

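If this file does not exist, or its pid points to a process other than crsd.bin, crsd cannot start normally; see orarootagent_root.log for details.

4 Permissions on the CRSD-related executables are set incorrectly

-- check crsd.bin and crsd under $GRID_HOME/bin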
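
The pid file check from step 3 can be scripted as a filter over the process list. This is a sketch that assumes "ps -ef" output with the PID in the second column, as in the listing above; pid_is_crsd is a hypothetical name.

```shell
# Reads "ps -ef" output on stdin and succeeds only if the process
# with PID $1 exists and its command line mentions crsd.bin.
pid_is_crsd() {
    pid=$1
    awk -v pid="$pid" '$2 == pid && /crsd\.bin/ { found = 1 } END { exit !found }'
}
```

For example, on the host above: ps -ef | pid_is_crsd "$(cat /u001/app/11.2.0.2/grid/crs/init/justin.pid)" || echo "pid file does not point to crsd.bin".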

 

Reference: document 1050908.1

From "ITPUB Blog", link: http://blog.itpub.net/15480802/viewspace-742159/. If you repost this article, please credit the source; otherwise legal action may be taken.
