Oracle 10g RAC: Two Cases of CRS Failing to Start Because of OCR Access Problems

Posted by 還不算暈 on 2015-11-20

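I recently ran into two cases where CRS would not start after the host of a RAC node was rebooted.

Case 1: Linux + 10.2.0.5 RAC. After the reboot, the permissions on the raw devices backing OCR were wrong, because the script that sets the raw device permissions was misconfigured.

Case 2: The host runs HP-UX B.11.31 with HP-UX Service Guard as the clusterware. After the server crashed and rebooted, the VGs were not activated, so the disk holding OCR could not be accessed.

The details are recorded below: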

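Case 1:

Linux + 10.2.0.5 RAC. After the reboot, the permissions on the raw devices backing OCR were wrong, because the script that sets the raw device permissions was misconfigured.

The symptoms were as follows: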

[root@rac02 ~]# ps -ef|grep css
root     16820     1  0 May25 ?        00:00:00 /bin/sh /etc/init.d/init.cssd fatal
root     16872 16818  0 May25 ?        00:01:48 /bin/sh /etc/init.d/init.cssd startcheck
root     16924 16820  0 May25 ?        00:01:38 /bin/sh /etc/init.d/init.cssd startcheck
root     17062 16823  0 May25 ?        00:01:50 /bin/sh /etc/init.d/init.cssd startcheck
root     17866 17636  0 19:32 pts/1    00:00:00 grep css

[root@rac02 ~]# tail /var/log/messages
Sep 11 19:33:04 rac02 logger: Cluster Ready Services waiting on dependencies. Diagnostics in /tmp/crsctl.16924.
Sep 11 19:33:04 rac02 logger: Cluster Ready Services waiting on dependencies. Diagnostics in /tmp/crsctl.17062.
Sep 11 19:33:04 rac02 logger: Cluster Ready Services waiting on dependencies. Diagnostics in /tmp/crsctl.16872.
Sep 11 19:34:04 rac02 logger: Cluster Ready Services waiting on dependencies. Diagnostics in /tmp/crsctl.16924.
Sep 11 19:34:04 rac02 logger: Cluster Ready Services waiting on dependencies. Diagnostics in /tmp/crsctl.17062.
Sep 11 19:34:04 rac02 logger: Cluster Ready Services waiting on dependencies. Diagnostics in /tmp/crsctl.16872.
[root@rac02 log]# cat /tmp/crsctl.17062
OCR initialization failed accessing OCR device: PROC-26: Error while accessing the physical storage Operating System error [Permission denied] [13]
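
Each startcheck process writes its own /tmp/crsctl.<pid> file, so it can be handy to summarize them all in one pass. The following is a minimal sketch, assuming the files contain PROC-26 messages in the format shown above. The function name summarize_crsctl_diag is hypothetical and is not part of Oracle Clusterware.

```shell
#!/bin/sh
# Hypothetical helper: list the distinct OS errors found in crsctl.<pid>
# diagnostic files under a directory (default /tmp).
summarize_crsctl_diag() {
    dir=${1:-/tmp}
    for f in "$dir"/crsctl.*; do
        # Skip the unexpanded glob when no file matches.
        [ -f "$f" ] || continue
        # Turn "... Operating System error [text] [N]" into "errno=N (text)".
        sed -n 's/.*Operating System error \[\(.*\)\] \[\([0-9][0-9]*\)\].*/errno=\2 (\1)/p' "$f"
    done | sort -u
}
```

Running `summarize_crsctl_diag /tmp` on the node prints one line per distinct error, which makes it easy to see whether all the startcheck processes are failing for the same reason.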


[root@rac02 ~]# ls -al /dev/raw*
crw------- 1 root root 162, 0 May 25 01:46 /dev/rawctl

/dev/raw:
total 0
drwxr-xr-x  2 root   root         140 May 25 01:46 .
drwxr-xr-x 14 root   root        5860 May 25 01:46 ..
crw-------  1 root   root     162, 10 May 25 01:46 raw10
crw-------  1 oracle oinstall 162,  3 May 25 01:46 raw3
crw-------  1 oracle oinstall 162,  4 May 25 01:46 raw4
crw-------  1 oracle oinstall 162,  5 May 25 01:46 raw5
crw-------  1 root   root     162,  9 May 25 01:46 raw9


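After fixing the script so that the permissions are as shown below, CRS started normally. Note: make sure the script itself is correct, so that the permissions are still right after the next host reboot.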
[root@rac02 ~]# ls -al /dev/raw*
crw------- 1 root root 162, 0 May 25 01:46 /dev/rawctl

/dev/raw:
total 0
drwxr-xr-x  2 root   root         140 May 25 01:46 .
drwxr-xr-x 14 root   root        5860 May 25 01:46 ..
crw-r-----  1 root   oinstall 162, 10 May 25 01:46 raw10
crw-r--r--  1 oracle oinstall 162,  3 May 25 01:46 raw3
crw-r--r--  1 oracle oinstall 162,  4 May 25 01:46 raw4
crw-r--r--  1 oracle oinstall 162,  5 May 25 01:46 raw5
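
To confirm after a reboot that the script really did its job, the expected owner and mode of each device can be checked mechanically. This is only a sketch: it assumes GNU stat (as found on Linux), and the function names check_dev_perm and check_raw_devices are hypothetical. The expected values are copied from the corrected listing above.

```shell
#!/bin/sh
# Hypothetical helper: compare a device's owner:group and octal mode
# with the expected values. Assumes GNU stat (-c format option).
check_dev_perm() {
    dev=$1
    want_owner=$2
    want_mode=$3
    have=$(stat -c '%U:%G %a' "$dev" 2>/dev/null) || {
        echo "MISSING $dev"
        return 1
    }
    if [ "$have" = "$want_owner $want_mode" ]; then
        echo "OK $dev $have"
    else
        echo "MISMATCH $dev have=[$have] want=[$want_owner $want_mode]"
        return 1
    fi
}

# Expected values taken from the corrected ls output in this post.
check_raw_devices() {
    check_dev_perm /dev/raw/raw10 root:oinstall 640
    for n in 3 4 5; do
        check_dev_perm "/dev/raw/raw$n" oracle:oinstall 644
    done
}
```

Calling `check_raw_devices` as root after boot prints one OK, MISMATCH, or MISSING line per device, so a wrong setting is visible before CRS gets stuck in startcheck.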




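Case 2:

The host runs HP-UX B.11.31 with HP-UX Service Guard as the clusterware. After the server crashed and rebooted, the VGs were not activated, so the disk holding OCR could not be accessed.

Fault analysis: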

rac#[/etc]ps -ef|grep crs
    root  2249     1  0  Nov  5  ?         0:00 /bin/sh /sbin/init.d/init.crsd run
    root 29242 26214  0 16:12:54 pts/0     0:00 grep crs
rac#[/etc]ps -ef|grep init
    root     1     0  0  Nov  5  ?         0:01 init
    root    23     0  0  Nov  5  ?         0:00 pagetable_init_daemon
    root 29368 26214  0 16:15:29 pts/0     0:00 grep init
    root  2247     1  0  Nov  5  ?         0:00 /bin/sh /sbin/init.d/init.evmd run
    root  2248     1  0  Nov  5  ?         0:00 /bin/sh /sbin/init.d/init.cssd fatal
    root  2249     1  0  Nov  5  ?         0:00 /bin/sh /sbin/init.d/init.crsd run
    root  2281  2248  0  Nov  5  ?         0:08 /bin/sh /sbin/init.d/init.cssd startcheck
    root  2274  2249  0  Nov  5  ?         0:08 /bin/sh /sbin/init.d/init.cssd startcheck
    root  2284  2247  0  Nov  5  ?         0:08 /bin/sh /sbin/init.d/init.cssd startcheck

rac$[/tmp]ls -lrt crsctl*
-rw-rw-rw-   1 oracle     dba            155 Nov  9 15:35 crsctl.2274
-rw-rw-rw-   1 oracle     dba            155 Nov  9 15:35 crsctl.2281
-rw-rw-rw-   1 oracle     dba            155 Nov  9 15:35 crsctl.2284
rac$[/tmp]cat crsctl.2284
OCR initialization failed accessing OCR device: PROC-26: Error while accessing the physical storage Operating System error [No such device or address] [6]
rac$[/tmp]cat crsctl.2281
OCR initialization failed accessing OCR device: PROC-26: Error while accessing the physical storage Operating System error [No such device or address] [6]
rac$[/tmp]cat  crsctl.2274
OCR initialization failed accessing OCR device: PROC-26: Error while accessing the physical storage Operating System error [No such device or address] [6]
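
The two cases in this post show the same PROC-26 message with different OS error numbers, and the number is what points to the cause. As a rough sketch, the mapping can be written down as a small lookup. The function name ocr_errno_hint is hypothetical, and the hints cover only the two situations described here.

```shell
#!/bin/sh
# Hypothetical helper: map the errno from a PROC-26 diagnostic to the
# check that found the cause in the two cases of this post.
ocr_errno_hint() {
    case "$1" in
        13) echo "Permission denied: check owner and mode of the OCR device" ;;
        6)  echo "No such device or address: check that the VG holding OCR is activated" ;;
        *)  echo "errno $1: not covered by these two cases" ;;
    esac
}
```

For example, `ocr_errno_hint 6` prints the hint that applies to this HP-UX case.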
Check the OCR information:
nbrbdb2$[/home/oracle]ocrcheck
Status of Oracle Cluster Registry is as follows :
         Version                  :          2
         Total space (kbytes)     :     130852
         Used space (kbytes)      :       3312
         Available space (kbytes) :     127540
         ID                       :  245644703
         Device/File Name         : /dev/vgora/rocr0
                                    Device/File integrity check succeeded

                                    Device/File not configured

         Cluster registry integrity check succeeded

nbrbdb2$[/home/oracle]ls -al /dev/vgora/rocr0
crw-r-----   1 oracle     dba         64 0x020001 Jun 14  2013 /dev/vgora/rocr0


Check the same information on node 1:
rac$[/oracle/product/10.2.0/crs_1/log/rac/cssd]ls -al /dev/vgora/rocr0
crw-r-----   1 oracle     dba         64 0x020001 Sep 28  2012 /dev/vgora/rocr0

rac#[/]vgdisplay
--- Volume groups ---
VG Name                     /dev/vg00
VG Write Access             read/write     
VG Status                   available                 
Max LV                      255    
Cur LV                      10     
Open LV                     10     
Max PV                      16     
Cur PV                      1      
Act PV                      1      
Max PE per PV               4353         
VGDA                        2   
PE Size (Mbytes)            32              
Total PE                    4343    
Alloc PE                    4073    
Free PE                     270     
Total PVG                   0        
Total Spare PVs             0              
Total Spare PVs in use      0                     

vgdisplay: Volume group not activated.
vgdisplay: Cannot display volume group "/dev/vglog".
vgdisplay: Volume group not activated.
vgdisplay: Cannot display volume group "/dev/vglock".
vgdisplay: Volume group not activated.
vgdisplay: Cannot display volume group "/dev/vgora".

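Resolution:
The output above shows that the VGs were not activated, which is why OCR could not be read or written.
After activating the VGs with the following commands (-a s activates a VG in shared mode), CRS returned to normal: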
#[/]vgchange -a s vgora
#[/]vgchange -a s vglog
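
To see at a glance which VGs still need activating, the names can be pulled out of the vgdisplay messages. This is a minimal sketch that assumes the message wording shown in the vgdisplay output above. The function name list_inactive_vgs is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read vgdisplay output on stdin and print the
# volume groups it reported as not displayable (not activated).
list_inactive_vgs() {
    sed -n 's/^vgdisplay: Cannot display volume group "\(.*\)"\.$/\1/p'
}
```

A possible invocation is `vgdisplay 2>&1 | list_inactive_vgs`. The `2>&1` is there on the assumption that these messages may be written to stderr. With the output shown earlier, this would list /dev/vglog, /dev/vglock, and /dev/vgora.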




