Hostname changed after a motherboard replacement: node1 cannot reach the CSS daemon

Posted by dotaddjj on 2012-08-17

After the customer rebooted both RAC machines, rac101 failed to start normally.

[root@bogon crsd]# crs_stat -t -v
Name           Type        R/RA   F/FT   Target    State     Host
----------------------------------------------------------------------
ora....o1.inst application 0/5    0/0    ONLINE    OFFLINE
ora....o2.inst application 0/5    0/0    ONLINE    ONLINE    rac102
ora....uo2.srv application 0/1    0/0    ONLINE    UNKNOWN   rac101
ora....rver.cs application 0/1    0/1    ONLINE    UNKNOWN   rac101
ora.benguo.db  application 0/1    0/1    ONLINE    ONLINE    rac101
ora....SM1.asm application 0/5    0/0    ONLINE    UNKNOWN   rac101
ora....01.lsnr application 0/5    0/0    ONLINE    UNKNOWN   rac101
ora.rac101.gsd application 0/5    0/0    ONLINE    UNKNOWN   rac101
ora.rac101.ons application 0/3    0/0    ONLINE    UNKNOWN   rac101
ora.rac101.vip application 0/0    0/0    ONLINE    ONLINE    rac101
ora....SM2.asm application 0/5    0/0    ONLINE    ONLINE    rac102
ora....02.lsnr application 0/5    0/0    ONLINE    ONLINE    rac102
ora.rac102.gsd application 0/5    0/0    ONLINE    ONLINE    rac102
ora.rac102.ons application 0/3    0/0    ONLINE    ONLINE    rac102
ora.rac102.vip application 0/0    0/0    ONLINE    ONLINE    rac102

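Nearly all of the resources on rac101 are in the UNKNOWN state. The 10.2.0.1 Clusterware and database releases have quite a few bugs to begin with, and resources stuck in UNKNOWN are a common sight.

So the first attempt was to stop CRS and then restart it: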
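To pick the affected resources out of a long listing, the crs_stat output can be filtered on its State column. The following is a minimal sketch, assuming the seven-column `crs_stat -t -v` layout shown above (State in column 6, Host in column 7); `unknown_resources` is a hypothetical helper name, not an Oracle tool.

```shell
#!/bin/sh
# Sketch: read `crs_stat -t -v` output on stdin and print the name and host
# of every resource whose State column is UNKNOWN.
# Assumption: columns are Name Type R/RA F/FT Target State Host, as in the listing above.
unknown_resources() {
    awk '$6 == "UNKNOWN" { print $1, $7 }'
}
```

It would typically be used as `crs_stat -t -v | unknown_resources`.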

[root@bogon crsd]# crsctl stop crs
clsz init failed while trying to stop resources.
Possible cause: CRSD is down.
Failure at scls_scr_create with code 1
Internal Error Information:
Category: 1234
Operation: scls_scr_create
Location: mkdir
Other: Unable to make user dir
Dep: 2

[root@bogon crsd]# crsctl check crs
Failure 1 contacting CSS daemon
CRS appears healthy
EVM appears healthy

The CSS daemon cannot be contacted. Node rac102 starts normally, and on rac101 the CSS process itself is in fact running.

[root@bogon cssd]# cat /etc/hosts
# Do not remove the following line, or various programs
# that require network functionality will fail.
127.0.0.1 localhost.localdomain localhost
192.168.0.2 rac101
192.168.0.3 rac102
192.168.1.2 priv101
192.168.1.3 priv102
192.168.0.12 vip101
192.168.0.13 vip102

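The root@bogon prompt gives the cause away: after the customer replaced the motherboard on rac101, the machine's hostname was taken straight from the domain name instead of staying rac101. With the hostname changed, the heartbeat between the nodes broke, and rac101 could then no longer locate the CSS resource.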
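A quick sanity check after any hardware change is to confirm that the name the machine reports still appears in /etc/hosts. The sketch below is only an illustration of that check; `host_in_hosts_file` and `check_node_name` are hypothetical helper names, and the check covers only the hosts file, not the cluster's own configuration.

```shell
#!/bin/sh
# Sketch: succeed if a hostname appears as a name or alias in a hosts file.
# Hypothetical helper; it inspects only the hosts file, nothing Oracle-specific.
host_in_hosts_file() {
    name=$1
    file=${2:-/etc/hosts}
    # Skip comment lines, then compare every name column (2..NF) with the hostname.
    awk -v n="$name" '
        /^[[:space:]]*#/ { next }
        { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
        END { exit (found ? 0 : 1) }
    ' "$file"
}

# Print a one-line verdict for the given hostname.
check_node_name() {
    name=$1
    file=${2:-/etc/hosts}
    if host_in_hosts_file "$name" "$file"; then
        echo "OK: $name is listed in $file"
    else
        echo "WARNING: $name is not listed in $file"
    fi
}
```

Running `check_node_name "$(hostname)"` on each node would flag a machine that came up as bogon, since the hosts file shown above has no entry for that name.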

Once CRS was starting normally again, another problem appeared.

Tomcat logged a large number of:

ORA-12519, TNS:no appropriate service handler found

[root@rac101 ~]# ps -ef|grep lsn
oracle 10641 1 0 16:04 ? 00:00:00 /db/oracle10gasm/product/10.2.0/asm/bin/tnslsnr LISTENER_RAC101 -inherit
root 16629 15404 0 16:08 pts/2 00:00:00 grep lsn

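The listener that came up is the one under the ASM home, so it only serves the ASM instance, and clients naturally cannot connect to the database. I killed this listener by hand and then started the rac101 listener manually.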
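When more than one Oracle home is installed on a node, it helps to print exactly which tnslsnr binary is running. Here is a minimal sketch, assuming `ps -ef` output in the form shown above, where the listener name follows the binary path; `listener_binaries` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: read `ps -ef` output on stdin and print, for each running listener,
# the full path of its tnslsnr binary followed by the listener name.
# Assumption: the listener name is the argument right after the binary path.
listener_binaries() {
    awk '{
        for (i = 1; i < NF; i++)
            if ($i ~ /\/bin\/tnslsnr$/) print $i, $(i + 1)
    }'
}
```

Used as `ps -ef | listener_binaries`, it makes the home visible at a glance: a path under the asm directory means the listener was started from the ASM home rather than the database home.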

[oracle@rac101 ~]$ lsnrctl start
LSNRCTL for Linux: Version 10.2.0.1.0 - Production on 17-AUG-2012 16:09:20
Copyright (c) 1991, 2005, Oracle. All rights reserved.
Starting /db/oracle10grac/product/10.2.0/db/bin/tnslsnr: please wait...
TNSLSNR for Linux: Version 10.2.0.1.0 - Production
System parameter file is /db/oracle10grac/product/10.2.0/db/network/admin/listener.ora
Log messages written to /db/oracle10grac/product/10.2.0/db/network/log/listener.log
Listening on: (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=rac101)(PORT=1521)))

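My handling of RAC and ASM problems is still mostly theoretical. The lesson here is to check the host and the network first, and I need to put more work into high availability.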


From the ITPUB blog, link: http://blog.itpub.net/25362835/viewspace-1059205/. If you repost this article, please credit the source; otherwise legal liability may be pursued.
