[kingsql Share] A Case of Repairing a Failed RAC Node

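Posted by kingsql on 2016-05-31
These virtual machines were installed a long time ago. After booting them today, the cluster services on rac2 and rac3 would not start.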
[grid@rac2 cssd]$ crs_stat -t
CRS-0184: Cannot communicate with the CRS daemon.
[grid@rac3 cssd]$ crs_stat -t
CRS-0184: Cannot communicate with the CRS daemon.

[grid@rac1 ~]$ srvctl status database -d kingsql
Instance kingsql1 is running on node rac1
Instance kingsql2 is not running on node rac2
Instance kingsql3 is not running on node rac3
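
With more instances, the "not running" lines above are easy to miss by eye. A minimal sketch, assuming the `srvctl status database` output keeps exactly the wording shown in this post; the function name `down_instances` is made up for illustration and is not part of any Oracle tool:

```python
import re

# Matches lines such as "Instance kingsql2 is not running on node rac2".
# The wording is taken from the output in this post; other versions may differ.
STATUS_RE = re.compile(r"^Instance (\S+) is (not running|running) on node (\S+)$")


def down_instances(srvctl_output):
    """Return (instance, node) pairs that srvctl reports as not running."""
    down = []
    for line in srvctl_output.splitlines():
        match = STATUS_RE.match(line.strip())
        if match and match.group(2) == "not running":
            down.append((match.group(1), match.group(3)))
    return down


# The three status lines from this post.
sample = """Instance kingsql1 is running on node rac1
Instance kingsql2 is not running on node rac2
Instance kingsql3 is not running on node rac3"""

print(down_instances(sample))
```

For the output in this post it reports kingsql2 on rac2 and kingsql3 on rac3.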


The cssd log on rac2 shows the problem: rac2 sees a disk heartbeat from rac1 and rac3, but no network heartbeat
2016-04-19 17:00:47.205: [    CSSD][3666859776]clssnmvDHBValidateNCopy: node 1, rac1, has a disk HB, but no network HB, DHB has rcfg 356447081, wrtcnt, 294277, LATS 4294697710, lastSeqNo 294274, uniqueness 1461034467, timestamp 1461056437/21174804
2016-04-19 17:00:47.205: [    CSSD][3666859776]clssnmvDHBValidateNCopy: node 3, rac3, has a disk HB, but no network HB, DHB has rcfg 356447080, wrtcnt, 41563, LATS 4294697710, lastSeqNo 41560, uniqueness 1463643843, timestamp 1463643914/18290394
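
To pick these messages out of a long cssd log, something like the following can help. This is only a sketch: it assumes the message text is exactly "has a disk HB, but no network HB" as in the two lines above, and the function name `nodes_without_network_hb` is hypothetical.

```python
import re

# Matches the clssnmvDHBValidateNCopy message seen in the rac2 log above.
HB_RE = re.compile(
    r"clssnmvDHBValidateNCopy: node (\d+), (\S+), "
    r"has a disk HB, but no network HB"
)


def nodes_without_network_hb(log_text):
    """Return sorted, de-duplicated (node_number, node_name) pairs flagged in the log."""
    found = set()
    for match in HB_RE.finditer(log_text):
        found.add((int(match.group(1)), match.group(2)))
    return sorted(found)


# Excerpts of the two log lines above, cut off after the heartbeat message.
sample_log = (
    "2016-04-19 17:00:47.205: [    CSSD][3666859776]clssnmvDHBValidateNCopy: "
    "node 1, rac1, has a disk HB, but no network HB\n"
    "2016-04-19 17:00:47.205: [    CSSD][3666859776]clssnmvDHBValidateNCopy: "
    "node 3, rac3, has a disk HB, but no network HB\n"
)

print(nodes_without_network_hb(sample_log))
```

The message repeats many times in a real log, so the set collapses the repeats into one entry per node.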

Reference article
http://www.itpub.net/thread-1766984-1-1.html

[grid@rac1 ~]$ crs_stop -all
CRS-2500: Cannot stop resource 'ora.gsd' as it is not running
Attempting to stop `ora.oc4j` on member `rac1`
Attempting to stop `ora.CRS.dg` on member `rac1`
Attempting to stop `ora.DATA.dg` on member `rac1`
Attempting to stop `ora.kingsql.db` on member `rac1`
Attempting to stop `ora.ons` on member `rac1`
CRS-2789: Cannot stop resource 'ora.gsd' as it is not running on server 'rac1'
Stop of `ora.ons` on member `rac1` succeeded.
Attempting to stop `ora.cvu` on member `rac1`
Attempting to stop `ora.rac2.vip` on member `rac1`
Attempting to stop `ora.rac3.vip` on member `rac1`
Attempting to stop `ora.LISTENER_SCAN1.lsnr` on member `rac1`
Stop of `ora.rac3.vip` on member `rac1` succeeded.
Stop of `ora.rac2.vip` on member `rac1` succeeded.
Stop of `ora.LISTENER_SCAN1.lsnr` on member `rac1` succeeded.
Attempting to stop `ora.scan1.vip` on member `rac1`
Stop of `ora.scan1.vip` on member `rac1` succeeded.
Stop of `ora.oc4j` on member `rac1` succeeded.
Stop of `ora.cvu` on member `rac1` succeeded.
Attempting to stop `ora.net1.network` on member `rac1`
Stop of `ora.net1.network` on member `rac1` succeeded.
Stop of `ora.kingsql.db` on member `rac1` succeeded.
Stop of `ora.DATA.dg` on member `rac1` succeeded.
Stop of `ora.CRS.dg` on member `rac1` succeeded.
Attempting to stop `ora.asm` on member `rac1`
Stop of `ora.asm` on member `rac1` succeeded.
Attempting to stop `ora.asm` on member `rac1`
--It hung here and made no further progress, so I ran crsctl stop crs

CRS-0184: Cannot communicate with the CRS daemon.
[root@rac1 ~]# /u01/app/11.2.3/grid/bin/crsctl stop crs
CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on 'rac1'
CRS-2673: Attempting to stop 'ora.mdnsd' on 'rac1'
CRS-2673: Attempting to stop 'ora.ctssd' on 'rac1'
CRS-2673: Attempting to stop 'ora.evmd' on 'rac1'
CRS-2673: Attempting to stop 'ora.cluster_interconnect.haip' on 'rac1'
CRS-2677: Stop of 'ora.cluster_interconnect.haip' on 'rac1' succeeded
CRS-2677: Stop of 'ora.mdnsd' on 'rac1' succeeded
CRS-2677: Stop of 'ora.evmd' on 'rac1' succeeded
CRS-2677: Stop of 'ora.ctssd' on 'rac1' succeeded
CRS-2673: Attempting to stop 'ora.cssd' on 'rac1'
CRS-2677: Stop of 'ora.cssd' on 'rac1' succeeded
CRS-2673: Attempting to stop 'ora.crf' on 'rac1'
CRS-2677: Stop of 'ora.crf' on 'rac1' succeeded
CRS-2673: Attempting to stop 'ora.gipcd' on 'rac1'
CRS-2677: Stop of 'ora.gipcd' on 'rac1' succeeded
CRS-2673: Attempting to stop 'ora.gpnpd' on 'rac1'
CRS-2677: Stop of 'ora.gpnpd' on 'rac1' succeeded
CRS-2793: Shutdown of Oracle High Availability Services-managed resources on 'rac1' has completed
CRS-4133: Oracle High Availability Services has been stopped.

[root@rac2 ~]# /u01/app/11.2.3/grid/bin/crsctl stop crs
CRS-2796: The command may not proceed when Cluster Ready Services is not running
CRS-4687: Shutdown command has completed with errors.
CRS-4000: Command Stop failed, or completed with errors.
[root@rac3 ~]# /u01/app/11.2.3/grid/bin/crsctl stop crs
CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on 'rac3'
CRS-2673: Attempting to stop 'ora.crsd' on 'rac3'
CRS-2790: Starting shutdown of Cluster Ready Services-managed resources on 'rac3'
CRS-2673: Attempting to stop 'ora.CRS.dg' on 'rac3'
CRS-2673: Attempting to stop 'ora.DATA.dg' on 'rac3'
CRS-2677: Stop of 'ora.DATA.dg' on 'rac3' succeeded
CRS-2677: Stop of 'ora.CRS.dg' on 'rac3' succeeded
CRS-2673: Attempting to stop 'ora.asm' on 'rac3'
CRS-2677: Stop of 'ora.asm' on 'rac3' succeeded
CRS-2673: Attempting to stop 'ora.ons' on 'rac3'
CRS-2677: Stop of 'ora.ons' on 'rac3' succeeded
CRS-2673: Attempting to stop 'ora.net1.network' on 'rac3'
CRS-2677: Stop of 'ora.net1.network' on 'rac3' succeeded
CRS-2792: Shutdown of Cluster Ready Services-managed resources on 'rac3' has completed
CRS-2677: Stop of 'ora.crsd' on 'rac3' succeeded
CRS-2673: Attempting to stop 'ora.ctssd' on 'rac3'
CRS-2673: Attempting to stop 'ora.evmd' on 'rac3'
CRS-2673: Attempting to stop 'ora.asm' on 'rac3'
CRS-2673: Attempting to stop 'ora.mdnsd' on 'rac3'
CRS-2677: Stop of 'ora.evmd' on 'rac3' succeeded
CRS-2677: Stop of 'ora.mdnsd' on 'rac3' succeeded
CRS-2677: Stop of 'ora.ctssd' on 'rac3' succeeded
CRS-2677: Stop of 'ora.asm' on 'rac3' succeeded
CRS-2673: Attempting to stop 'ora.cluster_interconnect.haip' on 'rac3'
CRS-2677: Stop of 'ora.cluster_interconnect.haip' on 'rac3' succeeded
CRS-2673: Attempting to stop 'ora.cssd' on 'rac3'
CRS-2677: Stop of 'ora.cssd' on 'rac3' succeeded
CRS-2673: Attempting to stop 'ora.crf' on 'rac3'
CRS-2677: Stop of 'ora.crf' on 'rac3' succeeded
CRS-2673: Attempting to stop 'ora.gipcd' on 'rac3'
CRS-2677: Stop of 'ora.gipcd' on 'rac3' succeeded
CRS-2673: Attempting to stop 'ora.gpnpd' on 'rac3'
CRS-2677: Stop of 'ora.gpnpd' on 'rac3' succeeded
CRS-2793: Shutdown of Oracle High Availability Services-managed resources on 'rac3' has completed
CRS-4133: Oracle High Availability Services has been stopped.

Start the stack
[root@rac1 ~]# /u01/app/11.2.3/grid/bin/crsctl start crs
CRS-4123: Oracle High Availability Services has been started.
[root@rac2 ~]# /u01/app/11.2.3/grid/bin/crsctl start crs
CRS-4640: Oracle High Availability Services is already active
CRS-4000: Command Start failed, or completed with errors.
--rac2 still has a problem
[root@rac3 ~]# /u01/app/11.2.3/grid/bin/crsctl start crs
CRS-4123: Oracle High Availability Services has been started.

[grid@rac1 ~]$ crs_start -all
[grid@rac1 ~]$ crs_stat -t
Name           Type           Target    State     Host        
------------------------------------------------------------
ora.CRS.dg     ora....up.type ONLINE    ONLINE    rac1        
ora.DATA.dg    ora....up.type ONLINE    ONLINE    rac1        
ora....ER.lsnr ora....er.type ONLINE    ONLINE    rac1        
ora....N1.lsnr ora....er.type ONLINE    ONLINE    rac3        
ora.asm        ora.asm.type   ONLINE    ONLINE    rac1        
ora.cvu        ora.cvu.type   ONLINE    ONLINE    rac3        
ora.gsd        ora.gsd.type   OFFLINE   OFFLINE               
ora.kingsql.db ora....se.type ONLINE    ONLINE    rac3        
ora....network ora....rk.type ONLINE    ONLINE    rac1        
ora.oc4j       ora.oc4j.type  ONLINE    ONLINE    rac3        
ora.ons        ora.ons.type   ONLINE    ONLINE    rac1        
ora....SM1.asm application    ONLINE    ONLINE    rac1        
ora....C1.lsnr application    ONLINE    ONLINE    rac1        
ora.rac1.gsd   application    OFFLINE   OFFLINE               
ora.rac1.ons   application    ONLINE    ONLINE    rac1        
ora.rac1.vip   ora....t1.type ONLINE    ONLINE    rac1        
ora.rac2.vip   ora....t1.type ONLINE    ONLINE    rac1        
ora....SM3.asm application    ONLINE    ONLINE    rac3        
ora....C3.lsnr application    ONLINE    ONLINE    rac3        
ora.rac3.gsd   application    OFFLINE   OFFLINE               
ora.rac3.ons   application    ONLINE    ONLINE    rac3        
ora.rac3.vip   ora....t1.type ONLINE    ONLINE    rac3        
ora.scan1.vip  ora....ip.type ONLINE    ONLINE    rac3   

Reboot the operating system on rac2
[root@rac2 ~]# reboot
[root@rac2 ~]#
Broadcast message from root@rac2
        (/dev/pts/0) at 17:40 ...

The system is going down for reboot NOW!

[grid@rac1 ~]$ crs_stat -t
Name           Type           Target    State     Host        
------------------------------------------------------------
ora.CRS.dg     ora....up.type ONLINE    ONLINE    rac1        
ora.DATA.dg    ora....up.type ONLINE    ONLINE    rac1        
ora....ER.lsnr ora....er.type ONLINE    ONLINE    rac1        
ora....N1.lsnr ora....er.type ONLINE    ONLINE    rac3        
ora.asm        ora.asm.type   ONLINE    ONLINE    rac1        
ora.cvu        ora.cvu.type   ONLINE    ONLINE    rac3        
ora.gsd        ora.gsd.type   OFFLINE   OFFLINE               
ora.kingsql.db ora....se.type ONLINE    ONLINE    rac3        
ora....network ora....rk.type ONLINE    ONLINE    rac1        
ora.oc4j       ora.oc4j.type  ONLINE    ONLINE    rac3        
ora.ons        ora.ons.type   ONLINE    ONLINE    rac1        
ora....SM1.asm application    ONLINE    ONLINE    rac1        
ora....C1.lsnr application    ONLINE    ONLINE    rac1        
ora.rac1.gsd   application    OFFLINE   OFFLINE               
ora.rac1.ons   application    ONLINE    ONLINE    rac1        
ora.rac1.vip   ora....t1.type ONLINE    ONLINE    rac1        
ora....SM2.asm application    ONLINE    ONLINE    rac2        
ora....C2.lsnr application    ONLINE    ONLINE    rac2        
ora.rac2.gsd   application    OFFLINE   OFFLINE               
ora.rac2.ons   application    ONLINE    ONLINE    rac2        
ora.rac2.vip   ora....t1.type ONLINE    ONLINE    rac2        
ora....SM3.asm application    ONLINE    ONLINE    rac3        
ora....C3.lsnr application    ONLINE    ONLINE    rac3        
ora.rac3.gsd   application    OFFLINE   OFFLINE               
ora.rac3.ons   application    ONLINE    ONLINE    rac3        
ora.rac3.vip   ora....t1.type ONLINE    ONLINE    rac3        
ora.scan1.vip  ora....ip.type ONLINE    ONLINE    rac3     

[grid@rac1 ~]$ crsctl status res -t
--------------------------------------------------------------------------------
NAME           TARGET  STATE        SERVER                   STATE_DETAILS       
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.CRS.dg
               ONLINE  ONLINE       rac1                                         
               ONLINE  ONLINE       rac2                                         
               ONLINE  ONLINE       rac3                                         
ora.DATA.dg
               ONLINE  ONLINE       rac1                                         
               ONLINE  ONLINE       rac2                                         
               ONLINE  ONLINE       rac3                                         
ora.LISTENER.lsnr
               ONLINE  ONLINE       rac1                                         
               ONLINE  ONLINE       rac2                                         
               ONLINE  ONLINE       rac3                                         
ora.asm
               ONLINE  ONLINE       rac1                     Started             
               ONLINE  ONLINE       rac2                     Started             
               ONLINE  ONLINE       rac3                     Started             
ora.gsd
               OFFLINE OFFLINE      rac1                                         
               OFFLINE OFFLINE      rac2                                         
               OFFLINE OFFLINE      rac3                                         
ora.net1.network
               ONLINE  ONLINE       rac1                                         
               ONLINE  ONLINE       rac2                                         
               ONLINE  ONLINE       rac3                                         
ora.ons
               ONLINE  ONLINE       rac1                                         
               ONLINE  ONLINE       rac2                                         
               ONLINE  ONLINE       rac3                                         
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.LISTENER_SCAN1.lsnr
      1        ONLINE  ONLINE       rac3                                         
ora.cvu
      1        ONLINE  ONLINE       rac3                                         
ora.kingsql.db
      1        ONLINE  ONLINE       rac3                     Open                
      2        ONLINE  ONLINE       rac2                     Open                
      3        ONLINE  ONLINE       rac1                     Open                
ora.oc4j
      1        ONLINE  ONLINE       rac3                                         
ora.rac1.vip
      1        ONLINE  ONLINE       rac1                                         
ora.rac2.vip
      1        ONLINE  ONLINE       rac2                                         
ora.rac3.vip
      1        ONLINE  ONLINE       rac3                                         
ora.scan1.vip
      1        ONLINE  ONLINE       rac3                                        

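Conclusion: when
[root@rac2 ~]# /u01/app/11.2.3/grid/bin/crsctl stop crs

[root@rac2 ~]# /u01/app/11.2.3/grid/bin/crsctl start crs
still cannot restart the stack (here the stop failed with CRS-2796 and the start returned CRS-4640), the last resort is to reboot the node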
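
The decision made in this post can be written down as a small check. This is a rough sketch of the pattern observed on rac2 only, not an Oracle-recommended procedure. The function name `suggest_next_step` and its return strings are invented for illustration; the CRS codes are the ones that appear in the output above.

```python
def suggest_next_step(stop_output, start_output):
    """Suggest an action from the text printed by crsctl stop crs / crsctl start crs.

    Heuristic taken from this post only:
    - stop refused with CRS-2796 and start answered CRS-4640: the stack is
      half up and cannot be stopped or started, so a reboot is the last resort.
    - start printed CRS-4123: the stack was started, go and check resources.
    - anything else: look at the logs before doing anything drastic.
    """
    stop_refused = "CRS-2796" in stop_output
    already_active = "CRS-4640" in start_output
    if stop_refused and already_active:
        return "reboot"
    if "CRS-4123" in start_output:
        return "check resources"
    return "inspect logs"


# Outputs copied from the rac2 session in this post.
rac2_stop = (
    "CRS-2796: The command may not proceed when Cluster Ready Services is not running\n"
    "CRS-4687: Shutdown command has completed with errors.\n"
    "CRS-4000: Command Stop failed, or completed with errors.\n"
)
rac2_start = (
    "CRS-4640: Oracle High Availability Services is already active\n"
    "CRS-4000: Command Start failed, or completed with errors.\n"
)

print(suggest_next_step(rac2_stop, rac2_start))
```

For rac1 and rac3, where the start printed CRS-4123, the same function would only suggest checking the resources.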


Shared by kingsql
May 31, 2016
Please credit the source when reposting



From "ITPUB Blog", link: http://blog.itpub.net/28389881/viewspace-2109655/. If you repost, please credit the source; otherwise legal liability will be pursued.
