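Oracle 11gR2 CRS: a case where the grid cluster fails to start on one node
Environment:
Oracle 11.2.0.2 RAC + AIX 6.1
Problem description:
The grid cluster fails to start on one of the nodes.
Analysis:
To track down the problem, we checked the operating system logs and the cluster configuration,
including the host entries in /etc/hosts, disk group permissions and attributes, ssh connectivity, and /etc/inittab.
The detailed analysis follows: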
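1.1 Operating system log
errpt reports no errors.
1.2 /etc/hosts entries
The private-network IP entries were found to be commented out.
1.3 Check ASM disk group attributes and permissions
ls -ltr /dev/rhdiskpower*
lsattr -El hdiskpower*
Everything is normal.
1.4 Check /etc/inittab on both nodes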
db18:
init:2:initdefault:
brc::sysinit:/sbin/rc.boot 3 >/dev/console 2>&1 # Phase 3 of system boot
powerfail::powerfail:/etc/rc.powerfail 2>&1 | alog -tboot > /dev/console # Power Failure Detection
powermig:2:wait:/etc/rc.powermig transition >/dev/null 2>&1 # powermig startup
mkatmpvc:2:once:/usr/sbin/mkatmpvc >/dev/console 2>&1
atmsvcd:2:once:/usr/sbin/atmsvcd >/dev/console 2>&1
tunables:23456789:wait:/usr/sbin/tunrestore -R > /dev/console 2>&1 # Set tunables
securityboot:2:bootwait:/etc/rc.security.boot > /dev/console 2>&1
rc:23456789:wait:/etc/rc 2>&1 | alog -tboot > /dev/console # Multi-User checks
powermig2:2:wait:/etc/rc.powermig recover >/dev/null 2>&1 # powermig recover
powermt:2:wait:/usr/sbin/powermt load >/dev/null 2>&1 # powermt load
fbcheck:23456789:wait:/usr/sbin/fbcheck 2>&1 | alog -tboot > /dev/console # run /etc/firstboot
srcmstr:23456789:respawn:/usr/sbin/srcmstr # System Resource Controller
platform_agent:2:once:/usr/bin/startsrc -s platform_agent >/dev/null 2>&1
rctcpip:23456789:wait:/etc/rc.tcpip > /dev/console 2>&1 # Start TCP/IP daemons
rcemcp_mond:2:wait:/etc/rc.emcp_mond start > /dev/console 2>&1
sniinst:2:wait:/var/adm/sni/sniprei > /dev/console 2>&1
rcnfs:23456789:wait:/etc/rc.nfs > /dev/console 2>&1 # Start NFS Daemons
rcnsr:2:wait:sh /etc/rc.nsr
cron:23456789:respawn:/usr/sbin/cron
piobe:2:wait:/usr/lib/lpd/pioinit_cp >/dev/null 2>&1 # pb cleanup
qdaemon:23456789:wait:/usr/bin/startsrc -sqdaemon
writesrv:23456789:wait:/usr/bin/startsrc -swritesrv
uprintfd:23456789:respawn:/usr/sbin/uprintfd
shdaemon:2:off:/usr/sbin/shdaemon >/dev/console 2>&1 # High availability daemon
l2:2:wait:/etc/rc.d/rc 2
l3:3:wait:/etc/rc.d/rc 3
l4:4:wait:/etc/rc.d/rc 4
l5:5:wait:/etc/rc.d/rc 5
l6:6:wait:/etc/rc.d/rc 6
l7:7:wait:/etc/rc.d/rc 7
l8:8:wait:/etc/rc.d/rc 8
l9:9:wait:/etc/rc.d/rc 9
naudio2::boot:/usr/sbin/naudio2 > /dev/null
naudio::boot:/usr/sbin/naudio > /dev/null
ntbl_reset:2:once:/usr/bin/ntbl_reset_datafiles
rcml:2:once:/usr/ml/aix61/rc.ml > /dev/console 2>&1
rcwpars:2:once:/etc/rc.wpars > /dev/console 2>&1 # Corrals autostart
logsymp:2:once:/usr/lib/ras/logsymptom # for system dumps
perfstat:2:once:/usr/lib/perf/libperfstat_updt_dictionary >/dev/console 2>&1
diagd:2:once:/usr/lpp/diagnostics/bin/diagd >/dev/console 2>&1
artex:2:wait:/usr/sbin/artexset -c -R /etc/security/artex/config/master_profile.xml > /dev/console 2>&1
cimservices:2:once:/usr/bin/startsrc -s cimsys >/dev/null 2>&1
pconsole:2:once:/usr/bin/startsrc -s pconsole > /dev/null 2>&1
xmdaily:2:once:/usr/bin/topasrec -L -s 300 -R 1 -r 6 -o /etc/perf/daily/ -ypersistent=1 2>&1 >/dev/null #Start local binary recording
ctrmc:2:once:/usr/bin/startsrc -s ctrmc > /dev/console 2>&1
ha_star:h2:once:/etc/rc.ha_star >/dev/console 2>&1
rcnetwlm:23456789:wait:/etc/rc.netwlm start> /dev/console 2>&1 # Start netwlm
dt:2:wait:/etc/rc.dt
orapw:2:wait:/etc/loadext -L /etc
h1:2:respawn:/etc/init.ohasd run >/dev/null 2>&1
rcemcpower:2:wait:/etc/rc.emcpower set_ipldevice > /dev/console 2>&1
cons:0123456789:respawn:/usr/sbin/getty /dev/console
xntpd00:2:boot:/usr/bin/startsrc -s xntpd
db27 (node 2):
init:2:initdefault:
brc::sysinit:/sbin/rc.boot 3 >/dev/console 2>&1 # Phase 3 of system boot
powerfail::powerfail:/etc/rc.powerfail 2>&1 | alog -tboot > /dev/console # Power Failure Detection
powermig:2:wait:/etc/rc.powermig transition >/dev/null 2>&1 # powermig startup
mkatmpvc:2:once:/usr/sbin/mkatmpvc >/dev/console 2>&1
atmsvcd:2:once:/usr/sbin/atmsvcd >/dev/console 2>&1
tunables:23456789:wait:/usr/sbin/tunrestore -R > /dev/console 2>&1 # Set tunables
securityboot:2:bootwait:/etc/rc.security.boot > /dev/console 2>&1
rc:23456789:wait:/etc/rc 2>&1 | alog -tboot > /dev/console # Multi-User checks
powermig2:2:wait:/etc/rc.powermig recover >/dev/null 2>&1 # powermig recover
powermt:2:wait:/usr/sbin/powermt load >/dev/null 2>&1 # powermt load
fbcheck:23456789:wait:/usr/sbin/fbcheck 2>&1 | alog -tboot > /dev/console # run /etc/firstboot
srcmstr:23456789:respawn:/usr/sbin/srcmstr # System Resource Controller
platform_agent:2:once:/usr/bin/startsrc -s platform_agent >/dev/null 2>&1
rctcpip:23456789:wait:/etc/rc.tcpip > /dev/console 2>&1 # Start TCP/IP daemons
rcemcp_mond:2:wait:/etc/rc.emcp_mond start > /dev/console 2>&1
sniinst:2:wait:/var/adm/sni/sniprei > /dev/console 2>&1
rcnfs:23456789:wait:/etc/rc.nfs > /dev/console 2>&1 # Start NFS Daemons
cron:23456789:respawn:/usr/sbin/cron
piobe:2:wait:/usr/lib/lpd/pioinit_cp >/dev/null 2>&1 # pb cleanup
install_assist:2:wait:/usr/sbin/install_assist /dev/console 2>&1
qdaemon:23456789:wait:/usr/bin/startsrc -sqdaemon
writesrv:23456789:wait:/usr/bin/startsrc -swritesrv
uprintfd:23456789:respawn:/usr/sbin/uprintfd
shdaemon:2:off:/usr/sbin/shdaemon >/dev/console 2>&1 # High availability daemon
l2:2:wait:/etc/rc.d/rc 2
l3:3:wait:/etc/rc.d/rc 3
l4:4:wait:/etc/rc.d/rc 4
l5:5:wait:/etc/rc.d/rc 5
l6:6:wait:/etc/rc.d/rc 6
l7:7:wait:/etc/rc.d/rc 7
l8:8:wait:/etc/rc.d/rc 8
l9:9:wait:/etc/rc.d/rc 9
naudio2::boot:/usr/sbin/naudio2 > /dev/null
naudio::boot:/usr/sbin/naudio > /dev/null
ntbl_reset:2:once:/usr/bin/ntbl_reset_datafiles
rcml:2:once:/usr/ml/aix61/rc.ml > /dev/console 2>&1
rcwpars:2:once:/etc/rc.wpars > /dev/console 2>&1 # Corrals autostart
logsymp:2:once:/usr/lib/ras/logsymptom # for system dumps
perfstat:2:once:/usr/lib/perf/libperfstat_updt_dictionary >/dev/console 2>&1
diagd:2:once:/usr/lpp/diagnostics/bin/diagd >/dev/console 2>&1
artex:2:wait:/usr/sbin/artexset -c -R /etc/security/artex/config/master_profile.xml > /dev/console 2>&1
cimservices:2:once:/usr/bin/startsrc -s cimsys >/dev/null 2>&1
pconsole:2:once:/usr/bin/startsrc -s pconsole > /dev/null 2>&1
xmdaily:2:once:/usr/bin/topasrec -L -s 300 -R 1 -r 6 -o /etc/perf/daily/ -ypersistent=1 2>&1 >/dev/null #Start local binary recording
ctrmc:2:once:/usr/bin/startsrc -s ctrmc > /dev/console 2>&1
ha_star:h2:once:/etc/rc.ha_star >/dev/console 2>&1
dt:2:wait:/etc/rc.dt
rcemcpower:2:wait:/etc/rc.emcpower set_ipldevice > /dev/console 2>&1
cons:0123456789:respawn:/usr/sbin/getty /dev/console
xntpd00:2:boot:/usr/bin/startsrc -s xntpd
sshdstart:2:boot:/usr/bin/startsrc -s sshd
#h1:35:respawn:/etc/init.ohasd run >/dev/null 2>&1
h1:2:respawn:/etc/init.ohasd run >/dev/null 2>&1
[nhdb27:root]
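Two listings of this length are hard to compare by eye. The following is a minimal Python sketch, not part of the original investigation, that reports which entry identifiers appear on only one node. The function names and the shortened sample strings are illustrative assumptions; the sample lines themselves are copied from the listings above.

```python
def parse_inittab(text):
    """Return {identifier: (runlevels, action, command)} for active entries."""
    entries = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and commented-out entries such as "#h1:35:...".
        if not line or line[0] in "#:":
            continue
        parts = line.split(":", 3)
        # Skip fragments that are not identifier:runlevel:action:command.
        if len(parts) < 4:
            continue
        ident, runlevels, action, command = parts
        entries[ident] = (runlevels, action, command)
    return entries


def only_on_one_node(node_a_text, node_b_text):
    """Return (ids only on node A, ids only on node B), each sorted."""
    a = parse_inittab(node_a_text)
    b = parse_inittab(node_b_text)
    return sorted(set(a) - set(b)), sorted(set(b) - set(a))


# Shortened excerpts of the two listings (assumption: the real files are longer).
DB18 = """\
cron:23456789:respawn:/usr/sbin/cron
rcnsr:2:wait:sh /etc/rc.nsr
piobe:2:wait:/usr/lib/lpd/pioinit_cp >/dev/null 2>&1 # pb cleanup
h1:2:respawn:/etc/init.ohasd run >/dev/null 2>&1
"""

DB27 = """\
cron:23456789:respawn:/usr/sbin/cron
piobe:2:wait:/usr/lib/lpd/pioinit_cp >/dev/null 2>&1 # pb cleanup
install_assist:2:wait:/usr/sbin/install_assist /dev/console 2>&1
#h1:35:respawn:/etc/init.ohasd run >/dev/null 2>&1
h1:2:respawn:/etc/init.ohasd run >/dev/null 2>&1
"""

if __name__ == "__main__":
    print(only_on_one_node(DB18, DB27))  # (['rcnsr'], ['install_assist'])
```

In practice the two strings would be read from copies of each node's /etc/inittab; any identifier that shows up on only one side is worth a closer look.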
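1.5 Check the cluster logs on node 2
No relevant log entries were produced. The only entries came from running ocrcheck, which reported that the OCR location in the ASM disk group could not be accessed.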
2013-10-26 03:30:53.794
[client(5767576)]CRS-2302:Cannot get GPnP profile. Error CLSGPNP_NO_DAEMON (GPNPD daemon is not running).
2013-10-26 03:30:53.795
[client(5767576)]CRS-1013:The OCR location in an ASM disk group is inaccessible. Details in /grid/product/11.2.0/log/db27/client/ocrcheck_5767576.log.
2013-10-26 03:30:56.858
[client(5767576)]CRS-2302:Cannot get GPnP profile. Error CLSGPNP_NO_DAEMON (GPNPD daemon is not running).
2013-10-26 03:30:56.860
[client(5767576)]CRS-1013:The OCR location in an ASM disk group is inaccessible. Details in /grid/product/11.2.0/log/27/client/ocrcheck_5767576.log.
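When an alert log is longer than this excerpt, it can help to tally messages by CRS code to see which errors dominate. The sketch below is only an illustration: the regular expression is an assumption based on the four message lines shown above, not a documented log format, and the message text in the sample is shortened.

```python
import re
from collections import Counter

# Assumed shape of a message line: "[source]CRS-nnnn:text".
MESSAGE = re.compile(r"^\[(?P<source>[^\]]+)\](?P<code>CRS-\d+):(?P<text>.*)$")


def count_crs_codes(log_text):
    """Count how many times each CRS-nnnn code appears in the log text."""
    counts = Counter()
    for line in log_text.splitlines():
        match = MESSAGE.match(line.strip())
        if match:
            counts[match.group("code")] += 1
    return counts


# Sample taken from the excerpt above, with the message text shortened.
SAMPLE = """\
2013-10-26 03:30:53.794
[client(5767576)]CRS-2302:Cannot get GPnP profile. Error CLSGPNP_NO_DAEMON (GPNPD daemon is not running).
2013-10-26 03:30:53.795
[client(5767576)]CRS-1013:The OCR location in an ASM disk group is inaccessible.
2013-10-26 03:30:56.858
[client(5767576)]CRS-2302:Cannot get GPnP profile. Error CLSGPNP_NO_DAEMON (GPNPD daemon is not running).
2013-10-26 03:30:56.860
[client(5767576)]CRS-1013:The OCR location in an ASM disk group is inaccessible.
"""

if __name__ == "__main__":
    for code, count in sorted(count_crs_codes(SAMPLE).items()):
        print(code, count)
```

Here both codes come from the ocrcheck client itself, which matches the observation that the cluster stack wrote nothing of its own.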
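2 Resolution process
Based on the analysis above, the commented-out private network entries and the broken ssh connectivity could have affected cluster startup on node 2.
After the private network entries were uncommented and ssh connectivity between the nodes was restored, node 2 still failed to start.
2.1 Trace the startup on node 2
Execution gets no further than accessing the file /tmp/.oracle/sOHASD_UI_SOCKET: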
0.0001: access("/tmp/.oracle/sOHASD_UI_SOCKET", 0) = 0
0.0002: _nsleep(0x0FFFFFFFFFFF9920, 0x0FFFFFFFFFFF99F0) = 0
0.3201: close(5) = 0
0.0001: socket(1, 1, 0) = 5
0.0001: connext(5, 0x0FFFFFFFFFFF9F38, 1025) Err#79 ECONNREFUSED
0.0001: access("/tmp/.oracle/sOHASD_UI_SOCKET", 0) = 0
0.0002: _nsleep(0x0FFFFFFFFFFF9920, 0x0FFFFFFFFFFF99F0) = 0
0.0101: close(5) = 0
0.0001: socket(1, 1, 0) = 5
0.0001: connext(5, 0x0FFFFFFFFFFF9F38, 1025) Err#79 ECONNREFUSED
0.0001: access("/tmp/.oracle/sOHASD_UI_SOCKET", 0) = 0
0.0002: _nsleep(0x0FFFFFFFFFFF9920, 0x0FFFFFFFFFFF99F0) = 0
0.0201: close(5) = 0
0.0001: socket(1, 1, 0) = 5
0.0001: connext(5, 0x0FFFFFFFFFFF9F38, 1025) Err#79 ECONNREFUSED
0.0001: access("/tmp/.oracle/sOHASD_UI_SOCKET", 0) = 0
0.0002: _nsleep(0x0FFFFFFFFFFF9920, 0x0FFFFFFFFFFF99F0) = 0
0.0401: close(5) = 0
0.0001: socket(1, 1, 0) = 5
0.0001: connext(5, 0x0FFFFFFFFFFF9F38, 1025) Err#79 ECONNREFUSED
0.0001: access("/tmp/.oracle/sOHASD_UI_SOCKET", 0) = 0
0.0002: close(5) = 0
0.0002: close(3) = 0
0.0002: kopen("/grid/product/11.2.0/crs/mesg/crsus.msb", O_RDONLY) = 3
0.0002: kfcntl(3, F_SETFD, 0x0000000000000001) = 0
0.0001: lseek(3, 0, 0) = 0
kread(3, "1513 "011303\t\t\0\0\0\0".., 256) = 256
0.0002: lseek(3, 512, 0) = 512
kread(3, "1F D '94\0\0\0\0\0\0\0\0".., 512) = 512
0.0001: lseek(3, 1024, 0) = 1024
kread(3, "\096\0 , 512) = 512
0.0001: lseek(3, 107008, 0) = 107008
kread(3, "\0\t121B\001\0 >121C\0\0".., 512) = 512
0.0002: lseek(3, 153600, 0) = 153600
kread(3, " 0\0\0\0\0\0\0\0\0\0".., 512) = 512
0.0001: lseek(3, 154112, 0) = 154112
kread(3, " 0\0\0\0\0\0\0\0\0\0".., 512) = 512
0.0001: lseek(3, 154624, 0) = 154624
kread(3, "\0\0\0\0\0\0\0\b\0\0\0\0".., 512) = 512
0.0001: close(3) = 0
0.0002: kopen("/grid/product/11.2.0/crs/mesg/crsus.msb", O_RDONLY) = 3
0.0002: kfcntl(3, F_SETFD, 0x0000000000000001) = 0
0.0001: lseek(3, 0, 0) = 0
kread(3, "1513 "011303\t\t\0\0\0\0".., 256) = 256
0.0001: lseek(3, 512, 0) = 512
kread(3, "1F D '94\0\0\0\0\0\0\0\0".., 512) = 512
0.0001: lseek(3, 1024, 0) = 1024
kread(3, "\096\0 , 512) = 512
0.0002: lseek(3, 107008, 0) = 107008
kread(3, "\0\t121B\001\0 >121C\0\0".., 512) = 512
0.0001: lseek(3, 153600, 0) = 153600
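In the trace, access() on the socket file keeps returning 0 while every connext() call fails with ECONNREFUSED. That pattern is consistent with a socket file that exists on disk but has no process listening on it, that is, ohasd is not running. The following Python sketch reproduces the same pattern with a throwaway UNIX domain socket. The socket name sDEMO_SOCKET and both function names are hypothetical, and the demo was written with a Linux-like system in mind rather than AIX.

```python
import errno
import os
import socket
import tempfile


def probe_unix_socket(path):
    """Classify a UNIX domain socket path as 'missing', 'listening' or 'stale'."""
    if not os.path.exists(path):  # comparable to access(path, 0) in the trace
        return "missing"
    client = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)  # socket(1, 1, 0)
    try:
        client.connect(path)
        return "listening"
    except OSError as exc:
        if exc.errno == errno.ECONNREFUSED:
            return "stale"  # the file exists, but nothing accepts connections
        raise
    finally:
        client.close()


def make_stale_socket(directory):
    """Create a socket file with no listener behind it (hypothetical stand-in)."""
    path = os.path.join(directory, "sDEMO_SOCKET")
    server = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    server.bind(path)  # bind() creates the file on disk
    server.close()     # close() does not remove the file
    return path


if __name__ == "__main__":
    demo_dir = tempfile.mkdtemp()
    print(probe_unix_socket(make_stale_socket(demo_dir)))
```

A "stale" result for /tmp/.oracle/sOHASD_UI_SOCKET would point at the daemon side (why ohasd was never started) rather than at the client command.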
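2.2 Try starting the grid cluster manually
cd
nohup ./init.ohasd run &
-- the grid cluster starts normally this way
However, the regular CRS startup command, crsctl start crs, still fails.
2.3 Search Metalink for /tmp/.oracle/sOHASD_UI_SOCKET
The search turned up the following: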
Seems the AIX post-installation was not complete on this box .
Which resulted below leftover entries in /etc/inittab
# grep /etc/inittab
There was one extra line related to install.
That line was indeed present in /etc/inittab on node 2, so it was removed.
After that, crsctl start crs succeeded.
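The source does not spell out why a single leftover line blocks startup. One plausible reading, based on how init treats `wait` entries (it runs the command and waits for it to exit before moving on to later entries for that run level), is that install_assist never exited, so init never reached the h1 entry that launches init.ohasd. That would also explain why running init.ohasd by hand worked. The Python sketch below lists the `wait` entries that precede a given identifier; the function name and the shortened sample are assumptions for illustration.

```python
def wait_entries_before(inittab_text, target_id, runlevel="2"):
    """Return ids of 'wait' entries at the run level that come before target_id."""
    blockers = []
    for raw in inittab_text.splitlines():
        line = raw.strip()
        # Skip blank lines and commented-out entries.
        if not line or line[0] in "#:":
            continue
        parts = line.split(":", 3)
        if len(parts) < 4:
            continue
        ident, runlevels, action, _command = parts
        if ident == target_id:
            return blockers
        # Simple substring check on the run-level field (good enough for a sketch).
        if action == "wait" and runlevel in runlevels:
            blockers.append(ident)
    raise ValueError("entry %r not found" % target_id)


# Shortened excerpt of the node 2 listing (assumption: the real file is longer).
DB27_EXCERPT = """\
cron:23456789:respawn:/usr/sbin/cron
piobe:2:wait:/usr/lib/lpd/pioinit_cp >/dev/null 2>&1 # pb cleanup
install_assist:2:wait:/usr/sbin/install_assist /dev/console 2>&1
qdaemon:23456789:wait:/usr/bin/startsrc -sqdaemon
#h1:35:respawn:/etc/init.ohasd run >/dev/null 2>&1
h1:2:respawn:/etc/init.ohasd run >/dev/null 2>&1
"""

if __name__ == "__main__":
    print(wait_entries_before(DB27_EXCERPT, "h1"))
    # ['piobe', 'install_assist', 'qdaemon']
```

Any entry in that list whose command never exits is a candidate for holding up everything after it, including the h1 line.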
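Root cause:
Seems the AIX post-installation was not complete on this box .
Which resulted below leftover entries in /etc/inittab
It appears that the AIX graphical installation was never completed and was interrupted partway through.
A nasty trap, and I nearly fell into it.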
Source: ITPUB blog, link: http://blog.itpub.net/7199859/viewspace-776729/. If you repost this article, please credit the source; otherwise legal action may be taken.