Repairing Oracle OCR and VOTEDISK Failures

Posted by chenoracle on 2017-11-08


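Purpose of the experiment
Simulate the loss of the CRS disk group in ASM (which holds the OCR and the voting disks) and repair it.
Simulated failure: overwrite every disk in the CRS disk group with dd.
Test environment:
(1) OS: Enterprise Linux Enterprise Linux Server release 5.4
(2) DB: Oracle Database 11g Enterprise Edition Release 11.2.0.3.0
OCR and voting disk recovery procedure:
(1) Force-stop CRS on all nodes
(2) Start CRS on node 1 in exclusive mode
(3) Connect to the ASM instance and create a new CRS disk group
(4) Restore the OCR
(5) Restore the voting disks
(6) Restart CRS on all nodes
(7) Restart the Oracle instances on all nodes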
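The order of the seven steps can be sketched as a dry-run shell script. This is a minimal sketch, not the author's script: it assumes it runs as root on node 1, that GRID_BIN points at the Grid home used in this lab (/u01/app/11.2.0/grid/bin), and that steps (3) and (7) stay manual. The helpers run and recover_ocr_votedisk are hypothetical names. With DRY_RUN=1 (the default) it only prints the commands.

```shell
#!/bin/sh
# Dry-run sketch of the seven recovery steps (assumed: run as root on node 1).
# GRID_BIN defaults to the Grid home used in this lab; DRY_RUN=1 only prints.
GRID_BIN=${GRID_BIN:-/u01/app/11.2.0/grid/bin}
DRY_RUN=${DRY_RUN:-1}

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = 1 ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Hypothetical helper; $1 is the OCR backup to restore (from "ocrconfig -showbackup").
recover_ocr_votedisk() {
  backup=${1:?usage: recover_ocr_votedisk OCR_BACKUP_FILE}

  # (1) Force-stop CRS here; it must also be stopped on every other node.
  run "$GRID_BIN/crsctl" stop crs -f
  # (2) Node 1 only: exclusive mode, without starting CRSD.
  run "$GRID_BIN/crsctl" start crs -excl -nocrs
  # (3) Manual: create the CRS disk group in the ASM instance AS SYSASM.
  echo "# (3) create the CRS disk group as SYSASM, then continue"
  if [ "$DRY_RUN" != 1 ]; then
    printf 'Press Enter once the CRS disk group is mounted: '
    read -r reply
  fi
  # (4) Restore the OCR into the new +CRS disk group.
  run "$GRID_BIN/ocrconfig" -restore "$backup"
  # (5) Recreate the voting files in +CRS.
  run "$GRID_BIN/crsctl" replace votedisk +CRS
  # (6) Leave exclusive mode and restart normally; start CRS on the other nodes too.
  run "$GRID_BIN/crsctl" stop crs -f
  run "$GRID_BIN/crsctl" start crs
  # (7) Manual: start the database instances on all nodes.
  echo "# (7) start the database instances on all nodes"
}
```

Usage: recover_ocr_votedisk /u01/app/11.2.0/grid/cdata/cluster/backup00.ocr prints the six clusterware commands in order. Keeping the SQL step manual avoids scripting the SYSASM session, and the pause in non-dry-run mode stops the OCR restore from running before the disk group exists.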

The detailed recovery procedure is as follows.
I. Check the current configuration
(1) Check the OCR
[root@host01 bin]# ./ocrcheck
Status of Oracle Cluster Registry is as follows :
         Version                  :          3
         Total space (kbytes)     :     262120
         Used space (kbytes)      :       3004
         Available space (kbytes) :     259116
         ID                       : 1940676675
         Device/File Name         :       +CRS
                                    Device/File integrity check succeeded
                                    Device/File not configured
                                    Device/File not configured
                                    Device/File not configured
                                    Device/File not configured
         Cluster registry integrity check succeeded
         Logical corruption check succeeded

(2) List the OCR backups
[root@host01 bin]# ./ocrconfig -showbackup
host01     2017/11/05 21:42:56     /u01/app/11.2.0/grid/cdata/cluster/backup00.ocr
host01     2017/11/05 21:42:56     /u01/app/11.2.0/grid/cdata/cluster/day.ocr
host01     2017/11/05 21:42:56     /u01/app/11.2.0/grid/cdata/cluster/week.ocr
PROT-25: Manual backups for the Oracle Cluster Registry are not available
 
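Back up the OLR manually; the -local option operates on the Oracle Local Registry (OLR), not the OCR, and the command lists the resulting OLR backups:
[root@host01 bin]# ./ocrconfig -local -manualbackup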
host01     2017/11/06 18:55:05     /u01/app/11.2.0/grid/cdata/host01/backup_20171106_185505.olr
host01     2017/11/05 17:29:37     /u01/app/11.2.0/grid/cdata/host01/backup_20171105_172937.olr
 
(3) Check the disk groups
SQL> select name,state from v$asm_diskgroup;
NAME                           STATE
------------------------------ -----------
CRS                            MOUNTED
DATA                           CONNECTED
FRA                            CONNECTED

SQL> select a.group_number, a.name, b.path
  from v$asm_diskgroup a, v$asm_disk b
 where a.group_number = b.group_number
 order by 1, 3;

GROUP_NUMBER NAME   PATH
------------ ------ ---------------
           1 CRS    ORCL:ASMDISK1
           1 CRS    ORCL:ASMDISK2
           1 CRS    ORCL:ASMDISK3
           2 DATA   ORCL:ASMDISK5
           2 DATA   ORCL:ASMDISK6
           2 DATA   ORCL:ASMDISK7
           2 DATA   ORCL:ASMDISK8
           2 DATA   ORCL:ASMDISK9
           3 FRA    ORCL:ASMDISK10
           3 FRA    ORCL:ASMDISK11
           3 FRA    ORCL:ASMDISK12

(4) Check the voting disks
[oracle@host01 bin]$ ./crsctl query css votedisk
##  STATE    File Universal Id                File Name Disk group
--  -----    -----------------                --------- ---------
 1. ONLINE   8063959a6aaf4fcfbfc6e80d9b9babf5 (ORCL:ASMDISK1) [CRS]
 2. ONLINE   aaf1dd0274a24fcdbfedafa9cef41155 (ORCL:ASMDISK2) [CRS]
 3. ONLINE   1908a480ec324f10bf3c1f52cc1660da (ORCL:ASMDISK3) [CRS]
Located 3 voting disk(s).
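After the recovery the same query should again show three ONLINE voting files, because a NORMAL redundancy disk group holds three of them. A hedged sketch with a hypothetical helper, count_online_votedisks, that counts the ONLINE rows in crsctl query css votedisk output read from stdin; it assumes the row layout shown above (a numbered row followed by the state).

```shell
#!/bin/sh
# Counts voting files reported as ONLINE by "crsctl query css votedisk"
# (read from stdin). Data rows start with "1.", "2.", ... followed by the state;
# header, separator and "Located N voting disk(s)." lines are ignored.
count_online_votedisks() {
  awk '$1 ~ /^[0-9]+\.$/ && $2 == "ONLINE" { n++ } END { print n + 0 }'
}
```

Usage: ./crsctl query css votedisk | count_online_votedisks. For the listing above it prints 3; for the "Located 0 voting disk(s)." output seen at the start of step (5) it prints 0.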

II. Simulate the failure
(1) Overwrite every disk in the CRS disk group with dd
[root@host01 ~]# dd if=/dev/zero of=/dev/sdb1 bs=1024k count=10
10+0 records in
10+0 records out
10485760 bytes (10 MB) copied, 0.369057 seconds, 28.4 MB/s
[root@host01 ~]# dd if=/dev/zero of=/dev/sdb2 bs=1024k count=10
10+0 records in
10+0 records out
10485760 bytes (10 MB) copied, 0.0136208 seconds, 770 MB/s
[root@host01 ~]# dd if=/dev/zero of=/dev/sdb3 bs=1024k count=10
10+0 records in
10+0 records out
10485760 bytes (10 MB) copied, 0.030256 seconds, 347 MB/s

At this point ocrcheck has not detected the failure yet:
[root@host01 bin]# ./ocrcheck
Status of Oracle Cluster Registry is as follows :
         Version                  :          3
         Total space (kbytes)     :     262120
         Used space (kbytes)      :       3004
         Available space (kbytes) :     259116
         ID                       : 1940676675
         Device/File Name         :       +CRS
                                    Device/File integrity check succeeded
                                    Device/File not configured
                                    Device/File not configured
                                    Device/File not configured
                                    Device/File not configured
         Cluster registry integrity check succeeded
         Logical corruption check succeeded

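After HAS (Oracle High Availability Services) is stopped on node 1, ocrcheck on node 2 detects the failure: the physical storage can no longer be accessed.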
[root@host01 bin]# ./crsctl stop has
[root@host02 bin]# ./ocrcheck
PROT-602: Failed to retrieve data from the cluster registry
PROC-26: Error while accessing the physical storage
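Whether ocrcheck reports a healthy registry can be checked from a script. A minimal sketch; ocr_healthy is a hypothetical helper that reads ocrcheck output on stdin and succeeds only if both "check succeeded" summary lines are present and no PROT-/PROC- error code appears. As the transcript shows, host01 still reported success right after its disks were zeroed, so a pass only means ocrcheck could read the registry at that moment. Run ocrcheck as root, as here: for a non-root user ocrcheck skips the logical corruption check, and this function then reports failure.

```shell
#!/bin/sh
# Reads ocrcheck output on stdin. Exits 0 only when both integrity checks
# report success and no PROT-/PROC- error code appears anywhere.
ocr_healthy() {
  awk '
    /PRO[TC]-[0-9]+:/                            { err = 1 }
    /Cluster registry integrity check succeeded/ { registry = 1 }
    /Logical corruption check succeeded/         { logical = 1 }
    END { exit !(registry && logical && !err) }
  '
}
```

Usage: ./ocrcheck | ocr_healthy && echo "OCR readable". With the PROT-602/PROC-26 output from host02 above, the function returns a non-zero status.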

III. OCR and voting disk recovery
(1) Force-stop CRS on all nodes
[root@host01 bin]# ./crsctl stop crs -f
CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on 'host01'
CRS-2673: Attempting to stop 'ora.mdnsd' on 'host01'
CRS-2673: Attempting to stop 'ora.crf' on 'host01'
CRS-2677: Stop of 'ora.crf' on 'host01' succeeded
CRS-2673: Attempting to stop 'ora.gipcd' on 'host01'
CRS-2677: Stop of 'ora.mdnsd' on 'host01' succeeded
CRS-2677: Stop of 'ora.gipcd' on 'host01' succeeded
CRS-2673: Attempting to stop 'ora.gpnpd' on 'host01'
CRS-2677: Stop of 'ora.gpnpd' on 'host01' succeeded
CRS-2793: Shutdown of Oracle High Availability Services-managed resources on 'host01' has completed
CRS-4133: Oracle High Availability Services has been stopped.
[root@host02 bin]# ./crsctl stop crs -f

The -f option is required to force the shutdown. Because Cluster Ready Services is not running, a plain stop fails:
[root@host02 bin]# ./crsctl stop crs
CRS-2796: The command may not proceed when Cluster Ready Services is not running
CRS-4687: Shutdown command has completed with errors.
CRS-4000: Command Stop failed, or completed with errors.
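Because a plain stop fails with CRS-2796 when Cluster Ready Services is not running, a wrapper could try the normal stop first and only then force it. A hedged sketch: stop_crs_with_fallback and the CRSCTL variable are hypothetical, with CRSCTL defaulting to the Grid home used in this lab. The author simply runs crsctl stop crs -f directly, which is sufficient here.

```shell
#!/bin/sh
# Hypothetical helper: try a normal "crsctl stop crs" first and fall back to
# "-f" when it fails (for example with CRS-2796 because CRSD is not running).
CRSCTL=${CRSCTL:-/u01/app/11.2.0/grid/bin/crsctl}

stop_crs_with_fallback() {
  if "$CRSCTL" stop crs; then
    echo "stopped normally"
  else
    echo "normal stop failed; retrying with -f" >&2
    # Returns crsctl's non-zero status if the forced stop fails as well.
    "$CRSCTL" stop crs -f && echo "stopped with -f"
  fi
}
```

Trying the normal stop first lets a healthy stack shut down cleanly; the forced stop is only used when the normal one is refused.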

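(2) Start CRS on node 1 in exclusive mode
-excl: exclusive mode (no other node can join the cluster)
-nocrs: do not start CRSD, the Cluster Ready Services daemon, which needs the OCR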
[root@host01 bin]# ./crsctl start crs -excl -nocrs
CRS-4123: Oracle High Availability Services has been started.
CRS-2672: Attempting to start 'ora.mdnsd' on 'host01'
CRS-2676: Start of 'ora.mdnsd' on 'host01' succeeded
CRS-2672: Attempting to start 'ora.gpnpd' on 'host01'
CRS-2676: Start of 'ora.gpnpd' on 'host01' succeeded
CRS-2672: Attempting to start 'ora.cssdmonitor' on 'host01'
CRS-2672: Attempting to start 'ora.gipcd' on 'host01'
CRS-2676: Start of 'ora.cssdmonitor' on 'host01' succeeded
CRS-2676: Start of 'ora.gipcd' on 'host01' succeeded
CRS-2672: Attempting to start 'ora.cssd' on 'host01'
CRS-2672: Attempting to start 'ora.diskmon' on 'host01'
CRS-2676: Start of 'ora.diskmon' on 'host01' succeeded
CRS-2676: Start of 'ora.cssd' on 'host01' succeeded
CRS-2672: Attempting to start 'ora.drivers.acfs' on 'host01'
CRS-2679: Attempting to clean 'ora.cluster_interconnect.haip' on 'host01'
CRS-2672: Attempting to start 'ora.ctssd' on 'host01'
CRS-2681: Clean of 'ora.cluster_interconnect.haip' on 'host01' succeeded
CRS-2672: Attempting to start 'ora.cluster_interconnect.haip' on 'host01'
CRS-2676: Start of 'ora.ctssd' on 'host01' succeeded
CRS-2676: Start of 'ora.drivers.acfs' on 'host01' succeeded
CRS-2676: Start of 'ora.cluster_interconnect.haip' on 'host01' succeeded
CRS-2672: Attempting to start 'ora.asm' on 'host01'
CRS-2676: Start of 'ora.asm' on 'host01' succeeded

(3) Connect to the ASM instance and create a new CRS disk group
[oracle@host01 ~]$ export ORACLE_BASE=/u01/app/grid
[oracle@host01 ~]$ export ORACLE_HOME=/u01/app/11.2.0/grid
[oracle@host01 ~]$ export PATH=$ORACLE_HOME/bin:$PATH
[oracle@host01 ~]$ export ORACLE_SID=+ASM1
[oracle@host01 ~]$ sqlplus / as sysasm
SQL> select name,state from v$asm_diskgroup;
NAME                           STATE
------------------------------ -----------
FRA                            DISMOUNTED
DATA                           DISMOUNTED

[oracle@host01 ~]$ sqlplus / as sysasm
SQL> create diskgroup CRS normal redundancy disk 'ORCL:ASMDISK13','ORCL:ASMDISK14','ORCL:ASMDISK15' attribute 'COMPATIBLE.ASM' = '11.2.0';

Diskgroup created.

SQL> select name,state from v$asm_diskgroup;
NAME                           STATE
------------------------------ -----------
FRA                            DISMOUNTED
DATA                           DISMOUNTED
CRS                            MOUNTED

The disk group must be created while connected AS SYSASM; when connected AS SYSDBA, the statement fails with a permission error:
SQL> create diskgroup CRS normal redundancy disk 'ORCL:ASMDISK10','ORCL:ASMDISK11','ORCL:ASMDISK12' attribute 'COMPATIBLE.ASM' = '11.2.0';
create diskgroup CRS normal redundancy disk 'ORCL:ASMDISK10','ORCL:ASMDISK11','ORCL:ASMDISK12' attribute 'COMPATIBLE.ASM' = '11.2.0'
*
ERROR at line 1:
ORA-15260: permission denied on ASM disk group
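The SQL in step (3) can also be fed to SQL*Plus from the shell. A minimal sketch with a hypothetical helper, crs_diskgroup_sql, that prints the same CREATE DISKGROUP statement the author typed, followed by exit. It refuses fewer than three disks: voting files in a NORMAL redundancy disk group need at least three failure groups, and without a FAILGROUP clause each disk forms its own failure group. COMPATIBLE.ASM must be at least 11.2 for the disk group to hold the OCR and voting files, which is why the attribute is set.

```shell
#!/bin/sh
# Prints the CREATE DISKGROUP statement used in this article for the given
# ASM disks, then "exit", so the output can be piped into "sqlplus / as sysasm".
# Usage: crs_diskgroup_sql DISKGROUP DISK1 DISK2 DISK3 [...]
crs_diskgroup_sql() (
  dg=$1
  shift
  # NORMAL redundancy voting files need >= 3 failure groups (one per disk here).
  if [ "$#" -lt 3 ]; then
    echo "crs_diskgroup_sql: need at least 3 disks, got $#" >&2
    exit 1
  fi
  list=""
  for disk in "$@"; do
    # Quote each disk path and join with commas.
    list="${list:+$list,}'$disk'"
  done
  printf "create diskgroup %s normal redundancy disk %s attribute 'COMPATIBLE.ASM' = '11.2.0';\n" "$dg" "$list"
  echo "exit"
)
```

Usage, as the Grid owner with ORACLE_SID=+ASM1: crs_diskgroup_sql CRS ORCL:ASMDISK13 ORCL:ASMDISK14 ORCL:ASMDISK15 | sqlplus -S / as sysasm.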

(4) Restore the OCR
[root@host02 bin]# ./ocrconfig -showbackup
PROT-26: Oracle Cluster Registry backup locations were retrieved from a local copy

host01     2017/11/05 21:42:56     /u01/app/11.2.0/grid/cdata/cluster/backup00.ocr
host01     2017/11/05 21:42:56     /u01/app/11.2.0/grid/cdata/cluster/day.ocr
host01     2017/11/05 21:42:56     /u01/app/11.2.0/grid/cdata/cluster/week.ocr
PROT-25: Manual backups for the Oracle Cluster Registry are not available

[root@host01 ~]# cd /u01/app/11.2.0/grid/bin/
[root@host01 bin]# ./ocrconfig -restore /u01/app/11.2.0/grid/cdata/cluster/backup00.ocr

[root@host01 bin]# ./ocrcheck
Status of Oracle Cluster Registry is as follows :
         Version                  :          3
         Total space (kbytes)     :     262120
         Used space (kbytes)      :       3036
         Available space (kbytes) :     259084
         ID                       : 1940676675
         Device/File Name         :       +CRS
                                    Device/File integrity check succeeded
                                    Device/File not configured
                                    Device/File not configured
                                    Device/File not configured
                                    Device/File not configured
         Cluster registry integrity check succeeded
         Logical corruption check succeeded
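Step (4) restores backup00.ocr from the -showbackup listing. Picking the newest automatic backup can be sketched with a hypothetical helper, latest_ocr_backup, that reads that listing on stdin, keeps only rows of the form "node date time path.ocr", and sorts them by date and time. All three timestamps in this listing are identical, so the stable sort keeps the listing order and the helper returns backup00.ocr, the same file restored here. The first column names the node that holds the backup file, so ocrconfig -restore has to run on that node (host01 here), with CRS in exclusive mode as above.

```shell
#!/bin/sh
# Reads "ocrconfig -showbackup" output on stdin and prints the path of the
# newest automatic .ocr backup. Rows look like: node YYYY/MM/DD HH:MM:SS path.
# PROT- messages and blank lines are ignored; equal timestamps keep listing order.
latest_ocr_backup() {
  awk 'NF == 4 && $4 ~ /\.ocr$/ { print $2, $3, $4 }' \
    | LC_ALL=C sort -s -r -k1,2 \
    | head -n 1 \
    | awk '{ print $3 }'
}
```

Usage: ./ocrconfig -showbackup | latest_ocr_backup. Forcing LC_ALL=C keeps the comparison byte-wise, which orders these zero-padded timestamps correctly.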

(5) Restore the voting disks
[root@host01 bin]# ./crsctl query css votedisk
Located 0 voting disk(s).

[root@host01 bin]# ./crsctl replace votedisk +CRS
Successful addition of voting disk 35d4a3b952f84fe6bf0a1260c5f147ed.
Successful addition of voting disk 21fc8a3162ba4fddbf7478ca6c9e8a32.
Successful addition of voting disk 3abb687e95d04f32bf732f359fca48c1.
Successfully replaced voting disk group with +CRS.
CRS-4266: Voting file(s) successfully replaced

[root@host01 bin]# ./crsctl query css votedisk
##  STATE    File Universal Id                File Name Disk group
--  -----    -----------------                --------- ---------
 1. ONLINE   35d4a3b952f84fe6bf0a1260c5f147ed (ORCL:ASMDISK13) [CRS]
 2. ONLINE   21fc8a3162ba4fddbf7478ca6c9e8a32 (ORCL:ASMDISK14) [CRS]
 3. ONLINE   3abb687e95d04f32bf732f359fca48c1 (ORCL:ASMDISK15) [CRS]
Located 3 voting disk(s).

(6) Restart CRS on all nodes
[root@host01 bin]# ./crsctl stop crs -f
CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on 'host01'
CRS-2673: Attempting to stop 'ora.mdnsd' on 'host01'
CRS-2673: Attempting to stop 'ora.drivers.acfs' on 'host01'
CRS-2673: Attempting to stop 'ora.ctssd' on 'host01'
CRS-2673: Attempting to stop 'ora.asm' on 'host01'
CRS-2677: Stop of 'ora.mdnsd' on 'host01' succeeded
CRS-2677: Stop of 'ora.asm' on 'host01' succeeded
CRS-2673: Attempting to stop 'ora.cluster_interconnect.haip' on 'host01'
CRS-2677: Stop of 'ora.drivers.acfs' on 'host01' succeeded
CRS-2677: Stop of 'ora.cluster_interconnect.haip' on 'host01' succeeded
CRS-2677: Stop of 'ora.ctssd' on 'host01' succeeded
CRS-2673: Attempting to stop 'ora.cssd' on 'host01'
CRS-2677: Stop of 'ora.cssd' on 'host01' succeeded
CRS-2673: Attempting to stop 'ora.gipcd' on 'host01'
CRS-2677: Stop of 'ora.gipcd' on 'host01' succeeded
CRS-2673: Attempting to stop 'ora.gpnpd' on 'host01'
CRS-2677: Stop of 'ora.gpnpd' on 'host01' succeeded
CRS-2793: Shutdown of Oracle High Availability Services-managed resources on 'host01' has completed
CRS-4133: Oracle High Availability Services has been stopped.

[root@host01 bin]# ./crsctl start crs
CRS-4123: Oracle High Availability Services has been started.

[root@host02 bin]# ./crsctl start crs
CRS-4123: Oracle High Availability Services has been started.

[root@host01 bin]# ./crs_stat -t -v
Name           Type           R/RA   F/FT   Target    State     Host        
----------------------------------------------------------------------
ora.CRS.dg     ora....up.type 0/5    0/     ONLINE    ONLINE    host01      
ora.DATA.dg    ora....up.type 0/5    0/     ONLINE    ONLINE    host01      
ora.FRA.dg     ora....up.type 0/5    0/     ONLINE    ONLINE    host01      
ora....ER.lsnr ora....er.type 0/5    0/     ONLINE    ONLINE    host01      
ora....N1.lsnr ora....er.type 0/5    0/0    ONLINE    ONLINE    host01      
ora.asm        ora.asm.type   0/5    0/     ONLINE    ONLINE    host01      
ora.cvu        ora.cvu.type   0/5    0/0    ONLINE    ONLINE    host01      
ora.gsd        ora.gsd.type   0/5    0/     OFFLINE   OFFLINE               
ora....SM1.asm application    0/5    0/0    ONLINE    ONLINE    host01      
ora....01.lsnr application    0/5    0/0    ONLINE    ONLINE    host01      
ora.host01.gsd application    0/5    0/0    OFFLINE   OFFLINE               
ora.host01.ons application    0/3    0/0    ONLINE    ONLINE    host01      
ora.host01.vip ora....t1.type 0/0    0/0    ONLINE    ONLINE    host01      
ora....SM2.asm application    0/5    0/0    ONLINE    ONLINE    host02      
ora....02.lsnr application    0/5    0/0    OFFLINE   OFFLINE               
ora.host02.gsd application    0/5    0/0    OFFLINE   OFFLINE               
ora.host02.ons application    0/3    0/0    ONLINE    ONLINE    host02      
ora.host02.vip ora....t1.type 0/0    0/0    ONLINE    ONLINE    host02      
ora....network ora....rk.type 0/5    0/     ONLINE    ONLINE    host01      
ora.oc4j       ora.oc4j.type  0/1    0/2    ONLINE    ONLINE    host01      
ora.ons        ora.ons.type   0/3    0/     ONLINE    ONLINE    host01      
ora.racdb.db   ora....se.type 0/2    0/1    OFFLINE   OFFLINE               
ora....ry.acfs ora....fs.type 0/5    0/     ONLINE    ONLINE    host01      
ora.scan1.vip  ora....ip.type 0/0    0/0    ONLINE    ONLINE    host01
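To spot what is still down after the restart, the crs_stat -t -v table can be filtered on its State column. A minimal sketch; offline_resources is a hypothetical helper that assumes the seven-column layout above (Name Type R/RA F/FT Target State Host) and prints the possibly truncated names exactly as crs_stat shows them. crs_stat is deprecated in 11.2 in favor of crsctl stat res -t, whose output has a different layout, so this parser does not apply to it.

```shell
#!/bin/sh
# Reads "crs_stat -t -v" output on stdin and prints the names of resources
# whose State column (6th field) is OFFLINE. The two header lines are skipped;
# rows with an empty Host column still have State as the 6th field.
offline_resources() {
  awk 'NR > 2 && $6 == "OFFLINE" { print $1 }'
}
```

For the listing above it prints the three GSD resources (GSD is normally offline in 11.2), ora....02.lsnr and ora.racdb.db; the database resource being offline is why step (7) starts the instances.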

(7) Restart the Oracle instances on all nodes
SQL> startup
ORACLE instance started.

Total System Global Area  318046208 bytes
Fixed Size                  1344680 bytes
Variable Size             171969368 bytes
Database Buffers          138412032 bytes
Redo Buffers                6320128 bytes
Database mounted.
Database opened.

SQL> select open_mode from v$database;
OPEN_MODE
--------------------
READ WRITE

SQL> show parameter name

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
db_file_name_convert                 string
db_name                              string      RACDB
db_unique_name                       string      RACDB
global_names                         boolean     FALSE
instance_name                        string      RACDB1
lock_name_space                      string
log_file_name_convert                string
processor_group_name                 string
service_names                        string      RACDB

From the ITPUB blog, link: http://blog.itpub.net/29785807/viewspace-2146998/. Please credit the source when reposting; otherwise legal action will be taken.
