【RAC】Recovering Oracle 11g RAC after losing the CRS disk

xysoul_雲龍 posted on 2015-06-15
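I. Overview

To make it easier to test various issues, I set up a RAC environment on my own machine. Yesterday, however, after powering it on I found that RAC would not start. Fine, I'll treat it as a hands-on drill.

Test environment: Redhat 6.3 x64 + Oracle 11gR2 RAC

II. Troubleshooting process

Some time after the virtual machines had booted, I checked the cluster with the following commands: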

[grid@rac01 ~]$ crs_stat -t

CRS-0184: Cannot communicate with the CRS daemon.
[grid@rac01 ~]$ crsctl status res -t

CRS-4535: Cannot communicate with Cluster Ready Services

CRS-4000: Command Status failed, or completed with errors.

Check the CRS stack status:

[root@rac01 rac-cluster]# crsctl check crs

CRS-4638: Oracle High Availability Services is online

CRS-4535: Cannot communicate with Cluster Ready Services

CRS-4530: Communications failure contacting Cluster Synchronization Services daemon

CRS-4534: Cannot communicate with Event Manager
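The three failure messages above point at three different daemons (CRSD, CSSD and EVMD), while OHASD itself is online. A small filter can make this first triage step repeatable. This is a minimal sketch: `crs_triage` is a hypothetical helper, not an Oracle tool, and it only knows the three CRS-45xx codes that appear in this transcript.

```shell
# crs_triage: hypothetical helper, not part of Oracle Grid Infrastructure.
# Reads `crsctl check crs` output on stdin and prints one line per daemon
# that could not be contacted, based only on the CRS-45xx codes seen above.
# Returns 1 if any such message was found, 0 otherwise.
crs_triage() {
  local line rc=0
  while IFS= read -r line; do
    case "$line" in
      CRS-4535:*) echo "CRSD (Cluster Ready Services) unreachable"; rc=1 ;;
      CRS-4530:*) echo "CSSD (Cluster Synchronization Services) unreachable"; rc=1 ;;
      CRS-4534:*) echo "EVMD (Event Manager) unreachable"; rc=1 ;;
    esac
  done
  return "$rc"
}
```

Usage: `crsctl check crs | crs_triage`. On this system all three lines would be printed, which is consistent with CSSD being unable to come up and everything above it failing in turn.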

Start the cluster resources:

[root@rac01 bin]#crsctl start cluster

CRS-2800: Cannot start resource 'ora.asm' as it is already in the INTERMEDIATE state on server 'rac01'

CRS-4000: Command Start failed, or completed with errors.

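Checking the relevant logs turned up the following; I found nothing more useful in the other logs. If you have a good suggestion, please get in touch:

---Clusterware alert log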

[ohasd(2017)]CRS-2807:Resource 'ora.crsd' failed to start automatically.

---ocssd.log
2015-06-12 03:07:14.722: [    CLSF][2402883328]Allocated CLSF context

2015-06-12 03:07:14.723: [   SKGFD][2402883328]Handle 0x16f57d0 from lib :UFS:: for disk :/dev/asm-diskb:

2015-06-12 03:07:14.723: [    CSSD][2402883328]clssnmlalloccx:phyname rac01

2015-06-12 03:07:14.742: [    CSSD][2402883328]clssnmvDiskAvailabilityChange: voting file /dev/asm-diskb now online

2015-06-12 03:07:14.742: [    CSSD][2402883328]clssnmlgetfileslot: found expired slot 1 for host rac01 leasename rac01

2015-06-12 03:07:14.747: [   SKGFD][2381424384]NOTE: No asm libraries found in the system

2015-06-12 03:07:14.747: [    CLSF][2381424384]Allocated CLSF context

2015-06-12 03:07:14.748: [   SKGFD][2381424384]Handle 0x7f4d7008e6b0 from lib :UFS:: for disk :/dev/asm-diskb:

2015-06-12 03:07:14.748: [   SKGFD][2381424384]Lib :UFS:: closing handle 0x7f4d7008e6b0 for disk :/dev/asm-diskb:

2015-06-12 03:07:15.749: [   SKGFD][2381424384]NOTE: No asm libraries found in the system
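ocssd.log is noisy, so when hunting for voting-disk problems it helps to pull out only the voting-file state changes. A minimal sketch, assuming the timestamp and message layout shown in the excerpt above; `voting_file_events` is a hypothetical helper.

```shell
# voting_file_events: hypothetical helper, not an Oracle tool.
# Reads ocssd.log on stdin and prints "<timestamp> <voting file> <state>"
# for every clssnmvDiskAvailabilityChange message; other lines are ignored.
voting_file_events() {
  sed -n 's/^\([0-9-]* [0-9:.]*\): .*clssnmvDiskAvailabilityChange: voting file \([^ ]*\) now \([a-z]*\).*/\1 \2 \3/p'
}
```

Usage, assuming the usual 11.2 location of the CSSD log: `voting_file_events < $GRID_HOME/log/rac01/cssd/ocssd.log`. For the excerpt above it reports `/dev/asm-diskb` coming online at 03:07:14.742.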

Check the CSS (voting disk) information:

[grid@rac01 ~]$ crsctl query css votedisk

##  STATE    File Universal Id                File Name Disk group

--  -----    -----------------                --------- ---------

    1. ONLINE   aaaf9f57bc9c4fc7bfb57ac937d2d149 (/dev/asm-diskb) [CRS]
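To compare this view with what ASM reports next, the listing can be reduced to state, path and disk group. A sketch assuming the column layout printed above, with the path in parentheses and the disk group in square brackets; `votedisk_summary` is a hypothetical helper.

```shell
# votedisk_summary: hypothetical helper, not an Oracle tool.
# Reads `crsctl query css votedisk` output on stdin and prints
# "<state> <path> <diskgroup>" for each numbered voting-file row.
votedisk_summary() {
  awk '$1 ~ /^[0-9]+\.$/ {
         path = substr($4, 2, length($4) - 2)   # strip "(" and ")"
         dg   = substr($5, 2, length($5) - 2)   # strip "[" and "]"
         print $2, path, dg
       }'
}
```

For the output above this prints `ONLINE /dev/asm-diskb CRS`: CSS considers the voting file usable, which is exactly what makes the mount failure below puzzling.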

Next, I checked the ASM disk groups from the ASM instance:

SQL> select NAME , STATE FROM V$ASM_DISKGROUP;

NAME                           STATE

------------------------------ -----------

DATA                           DISMOUNTED

CRS                            DISMOUNTED

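OK, try to mount the CRS disk group. (Later, while writing this up, I noticed something odd: the CSS query above reported the voting disk as ONLINE, yet the disk group could not be mounted. I did not try a forced mount; this needs further investigation.)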

SQL> alter diskgroup crs mount;

alter diskgroup crs mount

*

ERROR at line 1:

ORA-15032: not all alterations performed

ORA-15040: diskgroup is incomplete

ORA-15042: ASM disk "1" is missing from group number "1"

Try mounting the DATA disk group:

SQL> alter diskgroup data mount;

Diskgroup altered.

SQL> select NAME , STATE FROM V$ASM_DISKGROUP;

NAME                           STATE

------------------------------ -----------

DATA                           MOUNTED

CRS                            DISMOUNTED

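Note: this is a record of what I did at the time; I did not dig deeply into the cause. Writing it up raised further questions, which I will set aside for now.

Since the DATA disk group is usable, the plan is to move the clusterware files (OCR and voting disk) into DATA. I had never taken a manual backup of the OCR, so the only option was to restore from the automatic backups.

Stop the clusterware stack; run this on both nodes:

[root@rac01 rac-cluster]# crsctl stop has -f

Start it again in exclusive mode without CRSD (-excl -nocrs); run this on node 1 only: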

[root@rac01 rac-cluster]# crsctl start crs -excl -nocrs

CRS-4123: Oracle High Availability Services has been started.

CRS-2672: Attempting to start 'ora.mdnsd' on 'rac01'

CRS-2676: Start of 'ora.mdnsd' on 'rac01' succeeded

CRS-2672: Attempting to start 'ora.gpnpd' on 'rac01'

CRS-2676: Start of 'ora.gpnpd' on 'rac01' succeeded

CRS-2672: Attempting to start 'ora.cssdmonitor' on 'rac01'

CRS-2672: Attempting to start 'ora.gipcd' on 'rac01'

CRS-2676: Start of 'ora.cssdmonitor' on 'rac01' succeeded

CRS-2676: Start of 'ora.gipcd' on 'rac01' succeeded

CRS-2672: Attempting to start 'ora.cssd' on 'rac01'

CRS-2672: Attempting to start 'ora.diskmon' on 'rac01'

CRS-2676: Start of 'ora.diskmon' on 'rac01' succeeded

CRS-2676: Start of 'ora.cssd' on 'rac01' succeeded

CRS-2672: Attempting to start 'ora.drivers.acfs' on 'rac01'

CRS-2679: Attempting to clean 'ora.cluster_interconnect.haip' on 'rac01'

CRS-2672: Attempting to start 'ora.ctssd' on 'rac01'

CRS-2681: Clean of 'ora.cluster_interconnect.haip' on 'rac01' succeeded

CRS-2672: Attempting to start 'ora.cluster_interconnect.haip' on 'rac01'

CRS-2676: Start of 'ora.drivers.acfs' on 'rac01' succeeded

CRS-2676: Start of 'ora.ctssd' on 'rac01' succeeded

CRS-2676: Start of 'ora.cluster_interconnect.haip' on 'rac01' succeeded

CRS-2672: Attempting to start 'ora.asm' on 'rac01'

CRS-2676: Start of 'ora.asm' on 'rac01' succeeded

Edit /etc/oracle/ocr.loc and change the OCR location to +DATA; this must be done on both nodes.
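Because the ocr.loc edit has to be repeated on both nodes, it is easy to miss one. A minimal sketch of scripting it, assuming the 11.2 file format in which the OCR location sits on its own line as `ocrconfig_loc=<location>`; `set_ocr_location` is a hypothetical helper that keeps a timestamped copy of the original file and rewrites it in place so ownership and permissions are preserved.

```shell
# set_ocr_location: hypothetical helper, not an Oracle tool.
# Usage: set_ocr_location FILE NEW_LOCATION
# Rewrites the ocrconfig_loc= line of FILE, after saving FILE.<timestamp>.bak.
set_ocr_location() {
  [ $# -eq 2 ] || { echo "usage: set_ocr_location FILE NEW_LOCATION" >&2; return 2; }
  local file=$1 new_loc=$2 updated
  grep -q '^ocrconfig_loc=' "$file" || { echo "no ocrconfig_loc line in $file" >&2; return 1; }
  cp -p "$file" "$file.$(date +%Y%m%d%H%M%S).bak" || return 1
  updated=$(sed "s|^ocrconfig_loc=.*|ocrconfig_loc=$new_loc|" "$file") || return 1
  # Writing through ">" keeps the existing inode, owner and mode.
  printf '%s\n' "$updated" > "$file"
}
```

Run as root on each node, e.g. `set_ocr_location /etc/oracle/ocr.loc +DATA`, then `cat /etc/oracle/ocr.loc` to confirm the change.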
Then review the available backups and pick a recent one to restore from.

Command to list the backups: ocrconfig -showbackup
[root@rac01 rac-cluster]# ocrconfig -restore /grid/crs_home/product/11.2.0/cdata/rac-cluster/week.ocr
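The restore above used week.ocr. If you would rather pick the newest automatic backup than read through the list, the listing can be sorted mechanically. A sketch assuming each backup line has the form `<node> <yyyy/mm/dd> <hh:mm:ss> <path>` with zero-padded dates and no spaces in the path, so that plain string sorting matches time order; `latest_ocr_backup` is a hypothetical helper.

```shell
# latest_ocr_backup: hypothetical helper, not an Oracle tool.
# Reads `ocrconfig -showbackup` output on stdin and prints the path of the
# most recent backup; lines that do not match the assumed layout are ignored.
latest_ocr_backup() {
  awk 'NF == 4 && $2 ~ /^[0-9]+\/[0-9]+\/[0-9]+$/ { print $2, $3, $4 }' |
    LC_ALL=C sort -r |
    head -n 1 |
    awk '{ print $3 }'
}
```

Usage: `ocrconfig -restore "$(ocrconfig -showbackup | latest_ocr_backup)"`, after checking that the printed path is the one you expect.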

[root@rac01 rac-cluster]# ocrcheck

Status of Oracle Cluster Registry is as follows :

         Version                  :          3

         Total space (kbytes)     :     262120

         Used space (kbytes)      :       3088

         Available space (kbytes) :     259032

         ID                       :  471595559

         Device/File Name         :      +DATA

                                    Device/File integrity check succeeded

                                    Device/File not configured

                                    Device/File not configured

                                    Device/File not configured

                                    Device/File not configured

         Cluster registry integrity check succeeded

         Logical corruption check succeeded
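For scripting, ocrcheck's verdict can be turned into an exit status. A minimal sketch; `ocrcheck_ok` is a hypothetical helper that looks only for the three success messages shown above, so any missing one counts as a failure. As far as I know, ocrcheck performs the logical corruption check only when run as root, which is one more reason to run it as root here.

```shell
# ocrcheck_ok: hypothetical helper, not an Oracle tool.
# Reads ocrcheck output on stdin; exits 0 only if a device integrity check,
# the registry integrity check and the logical corruption check all succeeded.
ocrcheck_ok() {
  awk '/Device\/File integrity check succeeded/     { dev = 1 }
       /Cluster registry integrity check succeeded/ { reg = 1 }
       /Logical corruption check succeeded/         { logical = 1 }
       END { exit !(dev && reg && logical) }'
}
```

Usage: `ocrcheck | ocrcheck_ok && echo "OCR looks healthy"`.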

Re-create the voting disk

The first attempt failed as follows; the fix comes after it:

[root@rac01 rac-cluster]# crsctl replace votedisk +DATA

CRS-4602: Failed 27 to add voting file 7255773670ae4fa9bf64a150a9fd5915.

Failure 27 with Cluster Synchronization Services while deleting voting disk.

Failed to replace voting disk group with +DATA.

CRS-4000: Command Replace failed, or completed with errors.

Set the ASM disk discovery string (asm_diskstring), which was empty:

SQL> show parameter asm_diskstring

NAME                                 TYPE        VALUE

------------------------------------ ----------- ------------------------------

asm_diskstring                       string

SQL> alter system set asm_diskstring = '/dev/asm*';

System altered.

SQL> create spfile='+DATA' from memory;

File created.

SQL> startup force mount;
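Before setting a discovery string, it can help to confirm which device paths the pattern actually matches on each node. A sketch assuming that a simple pattern such as `/dev/asm*` behaves like a shell glob; ASM's own discovery additionally depends on device permissions and disk headers, so this only shows which paths the pattern would even consider. `diskstring_preview` is a hypothetical helper.

```shell
# diskstring_preview: hypothetical helper, not an Oracle tool.
# Usage: diskstring_preview PATTERN   (quote the pattern, e.g. '/dev/asm*')
# Prints each existing path the glob expands to; returns 1 if none exist.
diskstring_preview() {
  [ $# -eq 1 ] || { echo "usage: diskstring_preview PATTERN" >&2; return 2; }
  local pattern=$1 f found=0
  for f in $pattern; do          # unquoted on purpose so the glob expands
    [ -e "$f" ] || continue      # an unmatched glob stays literal; skip it
    printf '%s\n' "$f"
    found=1
  done
  [ "$found" -eq 1 ]
}
```

Usage: `diskstring_preview '/dev/asm*'` on both nodes; the same devices (here /dev/asm-diskb and the DATA disks) should appear on each.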


Create the voting disk again:

[root@rac01 rac-cluster]# crsctl replace votedisk +DATA

Successful addition of voting disk 383b8c3e4db34f72bf9dedd15e47471b.

Successful deletion of voting disk aaaf9f57bc9c4fc7bfb57ac937d2d149.

Successfully replaced voting disk group with +DATA.

CRS-4266: Voting file(s) successfully replaced

Stop the cluster stack, then start it again:

[root@rac01 rac-cluster]# crsctl stop has -f
……………………
--
Start the stack on the two nodes one after the other:
[root@rac01 rac-cluster]# crsctl start crs

CRS-4123: Oracle High Availability Services has been started.
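The node-1 side of the sequence followed above can be captured as a dry-run script, which makes it easier to rehearse before touching a real cluster. A minimal sketch: `run`, `recover_crs_files`, `RUN_MODE`, `BACKUP` and `DG` are all hypothetical names, the ocr.loc edit and the asm_diskstring fix are left as manual steps exactly as in the article, and by default the script only prints the commands.

```shell
# Hypothetical dry-run sketch of the node-1 recovery sequence used above.
# RUN_MODE=dry (default) prints the commands; RUN_MODE=exec runs them (as root).
BACKUP=${BACKUP:-/grid/crs_home/product/11.2.0/cdata/rac-cluster/week.ocr}
DG=${DG:-+DATA}

run() {
  if [ "${RUN_MODE:-dry}" = exec ]; then
    "$@"
  else
    printf '+ %s\n' "$*"
  fi
}

recover_crs_files() {
  run crsctl stop has -f               # run on both nodes
  run crsctl start crs -excl -nocrs    # node 1 only: exclusive mode, no CRSD
  # Manual steps here: mount DATA in ASM and point /etc/oracle/ocr.loc
  # at "$DG" on both nodes.
  run ocrconfig -restore "$BACKUP"
  run ocrcheck
  # If this fails with CRS-4602, fix asm_diskstring as above and retry.
  run crsctl replace votedisk "$DG"
  run crsctl stop has -f
  run crsctl start crs                 # node 1, then node 2
}
```

Usage: `recover_crs_files` prints the plan; `RUN_MODE=exec recover_crs_files` would execute it, which only makes sense once the manual steps in the comments are done.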


 
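The cluster status check below shows the CRS disk group resource (ora.CRS.dg) as OFFLINE; its disks still need to be sorted out with the ASM management tools.

[root@rac01 bin]# crs_stat -t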

Name           Type           Target    State     Host       

------------------------------------------------------------

ora.CRS.dg     ora....up.type ONLINE    OFFLINE             

ora.DATA.dg    ora....up.type ONLINE    ONLINE    rac01      

ora....ER.lsnr ora....er.type ONLINE    ONLINE    rac01      

ora....N1.lsnr ora....er.type ONLINE    ONLINE    rac01      

ora.asm        ora.asm.type   ONLINE    ONLINE    rac01      

ora.cvu        ora.cvu.type   ONLINE    ONLINE    rac01      

ora.gsd        ora.gsd.type   OFFLINE   OFFLINE              

ora....network ora....rk.type ONLINE    ONLINE    rac01      

ora.oc4j       ora.oc4j.type  ONLINE    ONLINE    rac01      

ora.ons        ora.ons.type   ONLINE    ONLINE    rac01      

ora....SM1.asm application    ONLINE    ONLINE    rac01      

ora....01.lsnr application    ONLINE    ONLINE    rac01      

ora.rac01.gsd  application    OFFLINE   OFFLINE              

ora.rac01.ons  application    ONLINE    ONLINE    rac01      

ora.rac01.vip  ora....t1.type ONLINE    ONLINE    rac01      

ora....SM2.asm application    ONLINE    ONLINE    rac02      

ora....02.lsnr application    ONLINE    ONLINE    rac02      

ora.rac02.gsd  application    OFFLINE   OFFLINE              

ora.rac02.ons  application    ONLINE    ONLINE    rac02      

ora.rac02.vip  ora....t1.type ONLINE    ONLINE    rac02      

ora.racdb.db   ora....se.type OFFLINE   OFFLINE              

ora....ry.acfs ora....fs.type ONLINE    ONLINE    rac01      

ora.scan1.vip  ora....ip.type ONLINE    ONLINE    rac01 
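In a listing like this, the rows that matter are the ones whose Target is ONLINE but whose State is not. A sketch assuming the `crs_stat -t` column order above (Name, Type, Target, State, Host); `not_at_target` is a hypothetical helper. Note that in 11.2 `crs_stat` is deprecated in favour of `crsctl status resource -t`, whose layout is different, so this filter only applies to `crs_stat -t` output.

```shell
# not_at_target: hypothetical helper, not an Oracle tool.
# Reads `crs_stat -t` output on stdin and prints the names of resources
# whose Target is ONLINE but whose State is something else.
not_at_target() {
  awk 'NF >= 4 && $3 == "ONLINE" && $4 != "ONLINE" { print $1 }'
}
```

For the listing above it prints only `ora.CRS.dg`; resources with Target OFFLINE (the gsd resources and ora.racdb.db) are deliberately left out.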

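III. Summary

On this test system, the problem was solved mainly by restoring the cluster's earlier automatic OCR backup into a different disk group and moving the voting disk there as well. I only fixed the symptom and did not find the root cause, which needs further investigation. Virtual environments are prone to this kind of problem, and handling it is good practice for troubleshooting. This time the failed disk group was CRS, which could be recovered from backup. But what if it had been the DATA disk group? First, the data needs a proper backup plan; second, a problem like this should be handled more cautiously and with a better plan.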

From "ITPUB Blog", link: http://blog.itpub.net/29487349/viewspace-1699535/. Please credit the source when reposting; otherwise legal liability will be pursued.
