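Oracle 11gR2 CRS: a case of the Grid cluster failing to start on one node
Environment:
Oracle 11.2.0.2 RAC + AIX 6.1
Problem description:
The Grid cluster failed to start on one of the nodes.
Analysis:
To track down the problem, we checked the operating system logs and the cluster configuration environment, including the host entries in /etc/hosts, the permissions and attributes of the ASM disk group devices, SSH connectivity, and /etc/inittab.
The detailed analysis follows: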
1.1 Operating system log
errpt reports no errors.
1.2 /etc/hosts entries
The IP entries for the private interconnect had been commented out.
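A quick way to spot this on each node is to scan /etc/hosts for commented-out lines that carry the private host names. The sketch below assumes the private names contain "-priv" (a common convention, not something stated in the original) and uses made-up sample addresses; it is a minimal sketch, not a full hosts-file parser.

```python
# Minimal sketch: list commented-out /etc/hosts lines that look like
# private-interconnect entries. The "-priv" naming convention is an
# assumption; pass a different marker if your private names differ.
def commented_private_entries(hosts_text: str, marker: str = "-priv") -> list[str]:
    hits = []
    for raw in hosts_text.splitlines():
        line = raw.strip()
        if not line.startswith("#"):
            continue  # active entries are fine; only commented-out ones matter here
        fields = line.lstrip("#").split()
        # fields[0] is the address, fields[1:] are the host name and aliases
        if len(fields) >= 2 and any(marker in name for name in fields[1:]):
            hits.append(line)
    return hits


# Hypothetical sample; addresses and names are made up for illustration.
sample = """\
127.0.0.1       loopback localhost
10.1.1.18       db18
10.1.1.27       db27
#192.168.10.18 db18-priv
#192.168.10.27 db27-priv
"""

print(commented_private_entries(sample))
# ['#192.168.10.18 db18-priv', '#192.168.10.27 db27-priv']
```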
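1.3 Checking ASM disk group device attributes and permissions
ls -ltr /dev/rhdiskpower*
lsattr -El hdiskpower*   (lsattr -El expects a single device name, so in practice run it once per hdiskpower device)
Both looked normal.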
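The ls -ltr output from this step can also be screened mechanically. This is a minimal sketch that assumes the ASM devices should be owned by grid:asmadmin with mode crw-rw---- (a typical role-separated 11gR2 setup, assumed here rather than taken from the original); adjust the expected values to your installation. The sample lines are hypothetical.

```python
# Minimal sketch: flag ASM candidate devices whose `ls -l` owner, group or
# mode differs from what is expected. The defaults (grid:asmadmin,
# crw-rw----) are assumptions for a role-separated 11gR2 install.
def bad_disk_devices(ls_output, owner="grid", group="asmadmin", mode="crw-rw----"):
    problems = []
    for line in ls_output.splitlines():
        fields = line.split()
        # skip "total N" and any line that is not a /dev entry
        if len(fields) < 5 or not fields[-1].startswith("/dev/"):
            continue
        perm, _links, own, grp = fields[:4]
        if (perm, own, grp) != (mode, owner, group):
            problems.append((fields[-1], perm, own, grp))
    return problems


# Hypothetical `ls -ltr /dev/rhdiskpower*` output.
sample = """\
crw-rw----    1 grid     asmadmin     36,  1 Oct 26 03:00 /dev/rhdiskpower0
crw-------    1 root     system       36,  2 Oct 26 03:00 /dev/rhdiskpower1
"""

print(bad_disk_devices(sample))
# [('/dev/rhdiskpower1', 'crw-------', 'root', 'system')]
```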
1.4 Checking /etc/inittab on both nodes
db18 (node 1):
init:2:initdefault:
brc::sysinit:/sbin/rc.boot 3 >/dev/console 2>&1 # Phase 3 of system boot
powerfail::powerfail:/etc/rc.powerfail 2>&1 | alog -tboot > /dev/console # Power Failure Detection
powermig:2:wait:/etc/rc.powermig transition >/dev/null 2>&1 # powermig startup
mkatmpvc:2:once:/usr/sbin/mkatmpvc >/dev/console 2>&1
atmsvcd:2:once:/usr/sbin/atmsvcd >/dev/console 2>&1
tunables:23456789:wait:/usr/sbin/tunrestore -R > /dev/console 2>&1 # Set tunables
securityboot:2:bootwait:/etc/rc.security.boot > /dev/console 2>&1
rc:23456789:wait:/etc/rc 2>&1 | alog -tboot > /dev/console # Multi-User checks
powermig2:2:wait:/etc/rc.powermig recover >/dev/null 2>&1 # powermig recover
powermt:2:wait:/usr/sbin/powermt load >/dev/null 2>&1 # powermt load
fbcheck:23456789:wait:/usr/sbin/fbcheck 2>&1 | alog -tboot > /dev/console # run /etc/firstboot
srcmstr:23456789:respawn:/usr/sbin/srcmstr # System Resource Controller
platform_agent:2:once:/usr/bin/startsrc -s platform_agent >/dev/null 2>&1
rctcpip:23456789:wait:/etc/rc.tcpip > /dev/console 2>&1 # Start TCP/IP daemons
rcemcp_mond:2:wait:/etc/rc.emcp_mond start > /dev/console 2>&1
sniinst:2:wait:/var/adm/sni/sniprei > /dev/console 2>&1
rcnfs:23456789:wait:/etc/rc.nfs > /dev/console 2>&1 # Start NFS Daemons
rcnsr:2:wait:sh /etc/rc.nsr
cron:23456789:respawn:/usr/sbin/cron
piobe:2:wait:/usr/lib/lpd/pioinit_cp >/dev/null 2>&1 # pb cleanup
qdaemon:23456789:wait:/usr/bin/startsrc -sqdaemon
writesrv:23456789:wait:/usr/bin/startsrc -swritesrv
uprintfd:23456789:respawn:/usr/sbin/uprintfd
shdaemon:2:off:/usr/sbin/shdaemon >/dev/console 2>&1 # High availability daemon
l2:2:wait:/etc/rc.d/rc 2
l3:3:wait:/etc/rc.d/rc 3
l4:4:wait:/etc/rc.d/rc 4
l5:5:wait:/etc/rc.d/rc 5
l6:6:wait:/etc/rc.d/rc 6
l7:7:wait:/etc/rc.d/rc 7
l8:8:wait:/etc/rc.d/rc 8
l9:9:wait:/etc/rc.d/rc 9
naudio2::boot:/usr/sbin/naudio2 > /dev/null
naudio::boot:/usr/sbin/naudio > /dev/null
ntbl_reset:2:once:/usr/bin/ntbl_reset_datafiles
rcml:2:once:/usr/ml/aix61/rc.ml > /dev/console 2>&1
rcwpars:2:once:/etc/rc.wpars > /dev/console 2>&1 # Corrals autostart
logsymp:2:once:/usr/lib/ras/logsymptom # for system dumps
perfstat:2:once:/usr/lib/perf/libperfstat_updt_dictionary >/dev/console 2>&1
diagd:2:once:/usr/lpp/diagnostics/bin/diagd >/dev/console 2>&1
artex:2:wait:/usr/sbin/artexset -c -R /etc/security/artex/config/master_profile.xml > /dev/console 2>&1
cimservices:2:once:/usr/bin/startsrc -s cimsys >/dev/null 2>&1
pconsole:2:once:/usr/bin/startsrc -s pconsole > /dev/null 2>&1
xmdaily:2:once:/usr/bin/topasrec -L -s 300 -R 1 -r 6 -o /etc/perf/daily/ -ypersistent=1 2>&1 >/dev/null #Start local binary recording
ctrmc:2:once:/usr/bin/startsrc -s ctrmc > /dev/console 2>&1
ha_star:h2:once:/etc/rc.ha_star >/dev/console 2>&1
rcnetwlm:23456789:wait:/etc/rc.netwlm start> /dev/console 2>&1 # Start netwlm
dt:2:wait:/etc/rc.dt
orapw:2:wait:/etc/loadext -L /etc
h1:2:respawn:/etc/init.ohasd run >/dev/null 2>&1
rcemcpower:2:wait:/etc/rc.emcpower set_ipldevice > /dev/console 2>&1
cons:0123456789:respawn:/usr/sbin/getty /dev/console
xntpd00:2:boot:/usr/bin/startsrc -s xntpd
db27 (node 2):
init:2:initdefault:
brc::sysinit:/sbin/rc.boot 3 >/dev/console 2>&1 # Phase 3 of system boot
powerfail::powerfail:/etc/rc.powerfail 2>&1 | alog -tboot > /dev/console # Power Failure Detection
powermig:2:wait:/etc/rc.powermig transition >/dev/null 2>&1 # powermig startup
mkatmpvc:2:once:/usr/sbin/mkatmpvc >/dev/console 2>&1
atmsvcd:2:once:/usr/sbin/atmsvcd >/dev/console 2>&1
tunables:23456789:wait:/usr/sbin/tunrestore -R > /dev/console 2>&1 # Set tunables
securityboot:2:bootwait:/etc/rc.security.boot > /dev/console 2>&1
rc:23456789:wait:/etc/rc 2>&1 | alog -tboot > /dev/console # Multi-User checks
powermig2:2:wait:/etc/rc.powermig recover >/dev/null 2>&1 # powermig recover
powermt:2:wait:/usr/sbin/powermt load >/dev/null 2>&1 # powermt load
fbcheck:23456789:wait:/usr/sbin/fbcheck 2>&1 | alog -tboot > /dev/console # run /etc/firstboot
srcmstr:23456789:respawn:/usr/sbin/srcmstr # System Resource Controller
platform_agent:2:once:/usr/bin/startsrc -s platform_agent >/dev/null 2>&1
rctcpip:23456789:wait:/etc/rc.tcpip > /dev/console 2>&1 # Start TCP/IP daemons
rcemcp_mond:2:wait:/etc/rc.emcp_mond start > /dev/console 2>&1
sniinst:2:wait:/var/adm/sni/sniprei > /dev/console 2>&1
rcnfs:23456789:wait:/etc/rc.nfs > /dev/console 2>&1 # Start NFS Daemons
cron:23456789:respawn:/usr/sbin/cron
piobe:2:wait:/usr/lib/lpd/pioinit_cp >/dev/null 2>&1 # pb cleanup
install_assist:2:wait:/usr/sbin/install_assist /dev/console 2>&1
qdaemon:23456789:wait:/usr/bin/startsrc -sqdaemon
writesrv:23456789:wait:/usr/bin/startsrc -swritesrv
uprintfd:23456789:respawn:/usr/sbin/uprintfd
shdaemon:2:off:/usr/sbin/shdaemon >/dev/console 2>&1 # High availability daemon
l2:2:wait:/etc/rc.d/rc 2
l3:3:wait:/etc/rc.d/rc 3
l4:4:wait:/etc/rc.d/rc 4
l5:5:wait:/etc/rc.d/rc 5
l6:6:wait:/etc/rc.d/rc 6
l7:7:wait:/etc/rc.d/rc 7
l8:8:wait:/etc/rc.d/rc 8
l9:9:wait:/etc/rc.d/rc 9
naudio2::boot:/usr/sbin/naudio2 > /dev/null
naudio::boot:/usr/sbin/naudio > /dev/null
ntbl_reset:2:once:/usr/bin/ntbl_reset_datafiles
rcml:2:once:/usr/ml/aix61/rc.ml > /dev/console 2>&1
rcwpars:2:once:/etc/rc.wpars > /dev/console 2>&1 # Corrals autostart
logsymp:2:once:/usr/lib/ras/logsymptom # for system dumps
perfstat:2:once:/usr/lib/perf/libperfstat_updt_dictionary >/dev/console 2>&1
diagd:2:once:/usr/lpp/diagnostics/bin/diagd >/dev/console 2>&1
artex:2:wait:/usr/sbin/artexset -c -R /etc/security/artex/config/master_profile.xml > /dev/console 2>&1
cimservices:2:once:/usr/bin/startsrc -s cimsys >/dev/null 2>&1
pconsole:2:once:/usr/bin/startsrc -s pconsole > /dev/null 2>&1
xmdaily:2:once:/usr/bin/topasrec -L -s 300 -R 1 -r 6 -o /etc/perf/daily/ -ypersistent=1 2>&1 >/dev/null #Start local binary recording
ctrmc:2:once:/usr/bin/startsrc -s ctrmc > /dev/console 2>&1
ha_star:h2:once:/etc/rc.ha_star >/dev/console 2>&1
dt:2:wait:/etc/rc.dt
rcemcpower:2:wait:/etc/rc.emcpower set_ipldevice > /dev/console 2>&1
cons:0123456789:respawn:/usr/sbin/getty /dev/console
xntpd00:2:boot:/usr/bin/startsrc -s xntpd
sshdstart:2:boot:/usr/bin/startsrc -s sshd
#h1:35:respawn:/etc/init.ohasd run >/dev/null 2>&1
h1:2:respawn:/etc/init.ohasd run >/dev/null 2>&1
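Comparing two listings of this length by eye is error-prone. A minimal sketch, assuming the usual AIX identifier:runlevel:action:command layout, that keys each active entry by its identifier and reports what differs between the nodes (commented-out lines and the wrapped tail of the long xmdaily comment are skipped). The samples are short excerpts of the listings above.

```python
# Minimal sketch: compare two /etc/inittab files entry by entry.
# Assumes the AIX layout identifier:runlevel:action:command.
def parse_inittab(text):
    entries = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank or commented-out entry (e.g. "#h1:35:respawn:...")
        parts = line.split(":", 3)
        if len(parts) < 4:
            continue  # not an entry, e.g. the wrapped tail of a long comment
        ident, runlevel, action, command = parts
        entries[ident] = (runlevel, action, command)
    return entries


def diff_inittab(text_a, text_b):
    a, b = parse_inittab(text_a), parse_inittab(text_b)
    only_a = sorted(a.keys() - b.keys())
    only_b = sorted(b.keys() - a.keys())
    changed = sorted(k for k in a.keys() & b.keys() if a[k] != b[k])
    return only_a, only_b, changed


# Short excerpts of the db18 and db27 listings above.
db18 = """\
cron:23456789:respawn:/usr/sbin/cron
rcnsr:2:wait:sh /etc/rc.nsr
h1:2:respawn:/etc/init.ohasd run >/dev/null 2>&1
"""
db27 = """\
cron:23456789:respawn:/usr/sbin/cron
install_assist:2:wait:/usr/sbin/install_assist /dev/console 2>&1
#h1:35:respawn:/etc/init.ohasd run >/dev/null 2>&1
h1:2:respawn:/etc/init.ohasd run >/dev/null 2>&1
"""

print(diff_inittab(db18, db27))
# (['rcnsr'], ['install_assist'], [])
```

Fed the two full listings from 1.4, it should report rcnsr, rcnetwlm and orapw as present only on db18, and install_assist and sshdstart as present only on db27.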
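1.5 Checking the cluster logs on node 2
No relevant log entries were produced; the only entries come from running ocrcheck, reporting that the ASM disk group holding the OCR could not be accessed: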
2013-10-26 03:30:53.794
[client(5767576)]CRS-2302:Cannot get GPnP profile. Error CLSGPNP_NO_DAEMON (GPNPD daemon is not running).
2013-10-26 03:30:53.795
[client(5767576)]CRS-1013:The OCR location in an ASM disk group is inaccessible. Details in /grid/product/11.2.0/log/db27/client/ocrcheck_5767576.log.
2013-10-26 03:30:56.858
[client(5767576)]CRS-2302:Cannot get GPnP profile. Error CLSGPNP_NO_DAEMON (GPNPD daemon is not running).
2013-10-26 03:30:56.860
[client(5767576)]CRS-1013:The OCR location in an ASM disk group is inaccessible. Details in /grid/product/11.2.0/log/db27/client/ocrcheck_5767576.log.
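To see which errors recur in a longer log, the CRS-nnnn codes can be tallied. This sketch assumes every message line carries the "[component(pid)]CRS-nnnn:" prefix seen above. In this excerpt, CRS-2302 says the GPnP daemon is not running, so the CRS-1013 OCR access failure is most likely a symptom of the lower stack being down rather than a disk problem in its own right.

```python
import re
from collections import Counter

# Matches the "[component(pid)]CRS-nnnn:" prefix of each message line.
CRS_LINE = re.compile(r"\[\w+\(\d+\)\](CRS-\d+):")


def crs_code_counts(log_text):
    """Count how often each CRS-nnnn code appears in clusterware log text."""
    return Counter(m.group(1) for m in map(CRS_LINE.search, log_text.splitlines()) if m)


# The first two records from the node 2 log above.
sample = """\
2013-10-26 03:30:53.794
[client(5767576)]CRS-2302:Cannot get GPnP profile. Error CLSGPNP_NO_DAEMON (GPNPD daemon is not running).
2013-10-26 03:30:53.795
[client(5767576)]CRS-1013:The OCR location in an ASM disk group is inaccessible. Details in /grid/product/11.2.0/log/db27/client/ocrcheck_5767576.log.
"""

print(crs_code_counts(sample))
```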
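2 Resolution
Based on the analysis above, the commented-out private network entries and the broken SSH connectivity could have been preventing node 2 from starting the cluster.
After uncommenting the private network entries and restoring SSH connectivity between the nodes, however, node 2 still failed to start.
2.1 Tracing the startup on node 2
The trace stalls at the file /tmp/.oracle/sOHASD_UI_SOCKET: the client repeatedly confirms with access() that the socket file exists, then gets ECONNREFUSED when it connects, which means nothing is listening on that socket. After several retries it gives up and opens the CRS message file crsus.msb, presumably to print an error: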
0.0001: access("/tmp/.oracle/sOHASD_UI_SOCKET", 0) = 0
0.0002: _nsleep(0x0FFFFFFFFFFF9920, 0x0FFFFFFFFFFF99F0) = 0
0.3201: close(5) = 0
0.0001: socket(1, 1, 0) = 5
0.0001: connext(5, 0x0FFFFFFFFFFF9F38, 1025) Err#79 ECONNREFUSED
0.0001: access("/tmp/.oracle/sOHASD_UI_SOCKET", 0) = 0
0.0002: _nsleep(0x0FFFFFFFFFFF9920, 0x0FFFFFFFFFFF99F0) = 0
0.0101: close(5) = 0
0.0001: socket(1, 1, 0) = 5
0.0001: connext(5, 0x0FFFFFFFFFFF9F38, 1025) Err#79 ECONNREFUSED
0.0001: access("/tmp/.oracle/sOHASD_UI_SOCKET", 0) = 0
0.0002: _nsleep(0x0FFFFFFFFFFF9920, 0x0FFFFFFFFFFF99F0) = 0
0.0201: close(5) = 0
0.0001: socket(1, 1, 0) = 5
0.0001: connext(5, 0x0FFFFFFFFFFF9F38, 1025) Err#79 ECONNREFUSED
0.0001: access("/tmp/.oracle/sOHASD_UI_SOCKET", 0) = 0
0.0002: _nsleep(0x0FFFFFFFFFFF9920, 0x0FFFFFFFFFFF99F0) = 0
0.0401: close(5) = 0
0.0001: socket(1, 1, 0) = 5
0.0001: connext(5, 0x0FFFFFFFFFFF9F38, 1025) Err#79 ECONNREFUSED
0.0001: access("/tmp/.oracle/sOHASD_UI_SOCKET", 0) = 0
0.0002: close(5) = 0
0.0002: close(3) = 0
0.0002: kopen("/grid/product/11.2.0/crs/mesg/crsus.msb", O_RDONLY) = 3
0.0002: kfcntl(3, F_SETFD, 0x0000000000000001) = 0
0.0001: lseek(3, 0, 0) = 0
kread(3, "1513 "011303\t\t\0\0\0\0".., 256) = 256
0.0002: lseek(3, 512, 0) = 512
kread(3, "1F D '94\0\0\0\0\0\0\0\0".., 512) = 512
0.0001: lseek(3, 1024, 0) = 1024
kread(3, "\096\0 , 512) = 512
0.0001: lseek(3, 107008, 0) = 107008
kread(3, "\0\t121B\001\0 >121C\0\0".., 512) = 512
0.0002: lseek(3, 153600, 0) = 153600
kread(3, " 0\0\0\0\0\0\0\0\0\0".., 512) = 512
0.0001: lseek(3, 154112, 0) = 154112
kread(3, " 0\0\0\0\0\0\0\0\0\0".., 512) = 512
0.0001: lseek(3, 154624, 0) = 154624
kread(3, "\0\0\0\0\0\0\0\b\0\0\0\0".., 512) = 512
0.0001: close(3) = 0
0.0002: kopen("/grid/product/11.2.0/crs/mesg/crsus.msb", O_RDONLY) = 3
0.0002: kfcntl(3, F_SETFD, 0x0000000000000001) = 0
0.0001: lseek(3, 0, 0) = 0
kread(3, "1513 "011303\t\t\0\0\0\0".., 256) = 256
0.0001: lseek(3, 512, 0) = 512
kread(3, "1F D '94\0\0\0\0\0\0\0\0".., 512) = 512
0.0001: lseek(3, 1024, 0) = 1024
kread(3, "\096\0 , 512) = 512
0.0002: lseek(3, 107008, 0) = 107008
kread(3, "\0\t121B\001\0 >121C\0\0".., 512) = 512
0.0001: lseek(3, 153600, 0) = 153600
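The retry loop in the trace can be summarized with a few lines of parsing. A minimal sketch that counts refused connect calls (AIX truss shows them as connext, as in the output above) and lists which paths were probed with access(); the sample is an excerpt of the trace.

```python
import re

ACCESS = re.compile(r'access\("([^"]+)"')
REFUSED_CONNECT = re.compile(r"\bconn(?:ect|ext)\(.*\bECONNREFUSED\b")


def summarize_truss(trace_text):
    """Return (number of refused connect calls, sorted paths probed with access())."""
    refused = sum(1 for line in trace_text.splitlines() if REFUSED_CONNECT.search(line))
    paths = sorted(set(ACCESS.findall(trace_text)))
    return refused, paths


# Excerpt of the node 2 trace above.
sample = """\
0.0001: access("/tmp/.oracle/sOHASD_UI_SOCKET", 0) = 0
0.0002: _nsleep(0x0FFFFFFFFFFF9920, 0x0FFFFFFFFFFF99F0) = 0
0.3201: close(5) = 0
0.0001: socket(1, 1, 0) = 5
0.0001: connext(5, 0x0FFFFFFFFFFF9F38, 1025) Err#79 ECONNREFUSED
0.0001: access("/tmp/.oracle/sOHASD_UI_SOCKET", 0) = 0
0.0002: _nsleep(0x0FFFFFFFFFFF9920, 0x0FFFFFFFFFFF99F0) = 0
0.0101: close(5) = 0
0.0001: socket(1, 1, 0) = 5
0.0001: connext(5, 0x0FFFFFFFFFFF9F38, 1025) Err#79 ECONNREFUSED
0.0001: access("/tmp/.oracle/sOHASD_UI_SOCKET", 0) = 0
0.0002: close(5) = 0
"""

print(summarize_truss(sample))
# (2, ['/tmp/.oracle/sOHASD_UI_SOCKET'])
```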
2.2 Trying to start the Grid cluster manually
cd
nohup ./init.ohasd run &
--the Grid cluster starts normally this way
However, starting CRS the normal way with crsctl start crs still failed.
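The trace shows socket(1, 1, 0), i.e. an AF_UNIX stream socket, followed by a refused connect, so a quick way to tell whether ohasd is actually up is to repeat that connect. A minimal sketch that treats ECONNREFUSED (socket file present, no listener) or ENOENT (no socket file) as "not listening". The default path is the one from the trace; treating it as ohasd's listener is an inference from its name.

```python
import errno
import socket

# Default path taken from the truss output above; treating it as ohasd's
# listening socket is an inference from its name.
OHASD_UI_SOCKET = "/tmp/.oracle/sOHASD_UI_SOCKET"


def ohasd_socket_listening(path=OHASD_UI_SOCKET):
    """Return True if something accepts connections on the AF_UNIX socket at path."""
    # socket(1, 1, 0) in the trace is AF_UNIX + SOCK_STREAM
    s = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        s.connect(path)
        return True
    except OSError as e:
        # ECONNREFUSED: socket file exists but no listener (what the trace shows)
        # ENOENT: socket file does not exist at all
        if e.errno in (errno.ECONNREFUSED, errno.ENOENT):
            return False
        raise
    finally:
        s.close()


if __name__ == "__main__":
    print(ohasd_socket_listening())
```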
2.3 Searching Metalink (My Oracle Support) for /tmp/.oracle/sOHASD_UI_SOCKET
We found the following:
Seems the AIX post-installation was not complete on this box.
Which resulted below leftover entries in /etc/inittab
# grep /etc/inittab
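That is, there was an extra install-related line.
That line was indeed present in /etc/inittab on node 2 (the install_assist:2:wait entry in the db27 listing in 1.4, which db18 does not have), and we removed it.
After that, crsctl start crs succeeded.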
Cause:
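It appears that the graphical AIX OS installation (the post-installation step) was interrupted partway and never completed, leaving the install_assist entry behind in /etc/inittab. Because that entry uses the wait action and sits ahead of the h1 (init.ohasd) entry, init most likely sat waiting on it and never started ohasd, which fits the refused connections to sOHASD_UI_SOCKET in the trace.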
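The mechanism can be checked against the file itself. AIX init works through /etc/inittab entries in order, and for a wait entry it waits for the command to exit before moving on, so a wait entry that never returns keeps every later entry, including h1, from starting. A minimal sketch, assuming the standard four-field layout, that lists the run-level-2 wait entries init must get through before it reaches h1. On AIX itself, lsitab and rmitab are the usual tools for viewing and removing such a record (for example rmitab install_assist).

```python
# Minimal sketch: list the "wait" entries that init must finish, in file
# order, before it reaches a target entry (h1 = init.ohasd here).
# Assumes the AIX layout identifier:runlevel:action:command.
def wait_entries_before(inittab_text, target="h1", runlevel="2"):
    blockers = []
    for raw in inittab_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # commented-out entries are ignored by init
        parts = line.split(":", 3)
        if len(parts) < 4:
            continue  # not a valid entry (e.g. a wrapped comment tail)
        ident, levels, action, _command = parts
        if ident == target:
            return blockers
        if action == "wait" and runlevel in levels:
            blockers.append(ident)
    return blockers  # target not found: every matching wait entry is listed


# Short excerpt of the db27 listing above.
db27 = """\
rc:23456789:wait:/etc/rc 2>&1 | alog -tboot > /dev/console # Multi-User checks
rcnfs:23456789:wait:/etc/rc.nfs > /dev/console 2>&1 # Start NFS Daemons
install_assist:2:wait:/usr/sbin/install_assist /dev/console 2>&1
dt:2:wait:/etc/rc.dt
#h1:35:respawn:/etc/init.ohasd run >/dev/null 2>&1
h1:2:respawn:/etc/init.ohasd run >/dev/null 2>&1
"""

print(wait_entries_before(db27))
# ['rc', 'rcnfs', 'install_assist', 'dt']
```

Entries such as rc and rcnfs normally exit quickly; the one that matters here is install_assist, which presumably starts the interactive installation assistant and waits on the console.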
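Heh, a really big pitfall; we nearly fell right into it.
From the "ITPUB Blog", link: http://blog.itpub.net/7199859/viewspace-776729/. Please credit the source when reposting; otherwise legal liability will be pursued.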