A case of Oracle CRS voting disk corruption (ASM + RAC)

Posted by polestar123 on 2010-03-26
-- There is an old saying among DBAs: no corruption is scary; what is scary is having no valid backup.
-- That is exactly the situation here: a recovery performed without any backup.
First, check the CRS processes:

$ ps -ef|grep crs
root 10469 1 0 16:52:16 ? 0:00 /bin/sh /etc/init.d/init.crsd run
oracle 14164 9725 0 17:05:48 pts/1 0:00 grep crs


At this point the CRS daemons need to be started as root (via init.crs, or with crsctl):

# /etc/init.d/init.crs start
Startup will be queued to init within 30 seconds.

# id
uid=0(root) gid=1(other)
# ps -ef|grep crs

-- Starting CRS with crsctl start crs fails with the following errors:
clsscfg_vhinit: unable(1) to open disk (/dev/rdsk/c1t16d12s5)
Internal Error Information:
Category: 1234
Operation: scls_block_open
Location: open
Other: open failed /dev/rdsk/c1t16d12s5
Dep: 9
Failure 1 checking the Cluster Synchronization Services voting disk '/dev/rdsk/c1t16d12s5'.
Not able to read adequate number of voting disks
PRKH-1010 : Unable to communicate with CRS services.

Meanwhile, files like the following are generated in the /tmp directory:
/tmp/crsctl.1177
/tmp/crsctl.1179


#crs_stat -t
CRS-0184: Cannot communicate with the CRS daemon.

-- Check the disk (/dev/rdsk/c1t16d12s5):
dd if=/dev/rdsk/c1t16d12s5 of=/opt/oracle/test
It returns:
dd: /dev/rdsk/c1t16d12s5: open: I/O error
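Before blaming the media, it can also be worth confirming that the open failure is not simply a permissions or ownership problem on the device node; a quick check, assuming the Solaris raw device path above:
ls -lL /dev/rdsk/c1t16d12s5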

So the voting disk is damaged.
With the voting disk corrupted, the only option is to rebuild it.
There are two ways to rebuild the voting disk:
1. If a backup of the voting disk exists, restore it from the backup (see the sketch after this list).
2. If there is no backup, the clusterware has to be reinstalled.
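For case 1, restoring is just a matter of copying the dd image back onto the voting device (the same dd approach described in the backup notes near the end of this article); a minimal sketch, assuming a previously saved image at /oraclebackup/vote_disk and that CRS is stopped on all nodes first:
crsctl stop crs
dd if=/oraclebackup/vote_disk of=/dev/rdsk/c1t16d12s5
crsctl start crs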


Prepare the clusterware installation software under /home/oracle/clusterware.
The original configuration is to be preserved:
Node 1: 10.253.20.168, node name ahniosdb1, instance name niosdb1, hostname AHNIOSDB1
Node 2: 10.253.20.169, node name ahniosdb2, instance name niosdb2, hostname AHNIOSDB2

Remove the old clusterware software.
Make a backup first:
mv /opt/oracle/crs /opt/oracle/crsbak

Prepare two new raw devices, 300 MB each, for the CRS installation (OCR) and the voting disk:
/dev/rdsk/c1t16d14s4 -- for the CRS installation (OCR)
/dev/rdsk/c1t16d14s5 -- for the voting disk
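Before running the installer, the new raw slices can be zeroed so that no stale headers are left from earlier use; a sketch using the same dd initialization shown in the OCR/voting disk notes later in this article (the sizes here are assumptions matching those notes):
dd if=/dev/zero of=/dev/rdsk/c1t16d14s4 bs=8192 count=12800
dd if=/dev/zero of=/dev/rdsk/c1t16d14s5 bs=8192 count=2560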

During the installation the following error appeared:
The ONS configuration failed to create

-- If the following message appears when root.sh is run later:
# ./root.sh
Checking to see if Oracle CRS stack is already configured
Oracle CRS stack is already configured and will be running under init(1M)
-- it means the old CRS was not cleaned out completely when it was removed.
-- Clean up the old configuration (run on every node):

rm -rf /etc/oracle/*
rm -rf /var/tmp/.oracle
Edit /etc/inittab and remove the following three lines:
h1:2:respawn:/etc/init.evmd run >/dev/null 2>&1
h2:2:respawn:/etc/init.cssd fatal >/dev/null 2>&1
h3:2:respawn:/etc/init.crsd run >/dev/null 2>&1
ps -ef|grep init.d
-- The following processes kept running; kill them and reboot the machine:
/etc/init.d/init.crsd
/etc/init.d/init.evmd
/etc/init.d/init.cssd
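A rough way to get rid of the respawned scripts before the reboot (a sketch; pkill is available on Solaris, but the exact matching patterns here are an assumption):
ps -ef | grep init.cssd
pkill -9 -f init.crsd
pkill -9 -f init.evmd
pkill -9 -f init.cssd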


Then run root.sh again; this time it completes successfully:
# ./root.sh

-- crs_stat -t now shows that gsd, ons, and vip are all ONLINE:
# crs_stat -t
Name Type Target State Host
------------------------------------------------------------
ora....db1.gsd application ONLINE ONLINE ahniosdb1
ora....db1.ons application ONLINE ONLINE ahniosdb1
ora....db1.vip application ONLINE ONLINE ahniosdb1
ora....db2.gsd application ONLINE ONLINE ahniosdb2
ora....db2.ons application ONLINE ONLINE ahniosdb2
ora....db2.vip application ONLINE ONLINE ahniosdb2

-- Next, register the listener, ASM, and database instances with CRS
-- asm: needs the node name, the ASM instance name, and the ORACLE_HOME
srvctl add asm -n ahniosdb1 -i +ASM1 -o /opt/oracle/product/10gr2
srvctl add asm -n ahniosdb2 -i +ASM2 -o /opt/oracle/product/10gr2
--db
srvctl add database -d niosdb -o /opt/oracle/product/10gr2
--instance
srvctl add instance -d niosdb -i niosdb1 -n ahniosdb1
srvctl add instance -d niosdb -i niosdb2 -n ahniosdb2
-- start asm
srvctl start asm -n ahniosdb1
srvctl start asm -n ahniosdb2
-- start the instances
srvctl start instance -d niosdb -i niosdb1
srvctl start instance -d niosdb -i niosdb2
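The listeners still have to be registered again (NETCA handles that, as noted in METHOD 2 further down). Individual resources can also be checked one at a time with srvctl; a couple of examples using the names from this setup:
srvctl status asm -n ahniosdb1
srvctl status database -d niosdb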
-- check the status of the resources on each node
# crs_stat -t
Name Type Target State Host
------------------------------------------------------------
ora....SM1.asm application ONLINE ONLINE ahniosdb1
ora....B1.lsnr application ONLINE ONLINE ahniosdb1
ora....db1.gsd application ONLINE ONLINE ahniosdb1
ora....db1.ons application ONLINE ONLINE ahniosdb1
ora....db1.vip application ONLINE ONLINE ahniosdb1
ora....SM2.asm application ONLINE ONLINE ahniosdb2
ora....B2.lsnr application ONLINE ONLINE ahniosdb2
ora....db2.gsd application ONLINE ONLINE ahniosdb2
ora....db2.ons application ONLINE ONLINE ahniosdb2
ora....db2.vip application ONLINE ONLINE ahniosdb2
ora.niosdb.db application ONLINE ONLINE ahniosdb2
ora....b1.inst application ONLINE ONLINE ahniosdb1
ora....b2.inst application ONLINE ONLINE ahniosdb2

All resources are ONLINE.
At this point the recovery of the two-node Oracle RAC from the voting disk crash is complete.

Customer: "It's up! It's up! We can connect to it ourselves too."

I said: test the application first and make sure it can connect.

Supplementary notes on this environment:

-- back up the voting disk with dd (note: /dev/rdsk/c1t16d12s5 is the original, failed slice; the image should be taken from the current voting disk device)
dd if=/dev/rdsk/c1t16d12s5 of=/opt/oracle/voting.bak

Clusterware installation media:
/home/oracle/clusterware
# more /etc/hosts shows the following configuration:
127.0.0.1 localhost
10.253.20.168 AHNIOSDB1 loghost
10.253.20.173 AHNIOSDB1-VIP
10.11.0.11 AHNIOSDB1-PIV

10.253.20.169 AHNIOSDB2 loghost
10.253.20.174 AHNIOSDB2-VIP
10.11.0.12 AHNIOSDB2-PIV


Details (see full log at /opt/oracle/oraInventory/logs/installActions2010-03-25_07-58-57PM.log):

/opt/oracle/crs/install/onsconfig add_config AHNIOSDB1:6251 AHNIOSDB2:6251

/opt/oracle/crs/bin/oifcfg setif -global bge0/10.253.20.160:public bge1/10.11.0.0:cluster_interconnect sppp0/192.168.254.0:cluster_interconnect

/opt/oracle/crs/bin/cluvfy stage -post crsinst -n AHNIOSDB1,AHNIOSDB2



An alternative solution found online:
1) Clean up the old configuration information (run on every node)

rm -rf /etc/oracle/*

rm -rf /var/tmp/.oracle

Edit /etc/inittab and remove the following three lines.

h1:2:respawn:/etc/init.evmd run >/dev/null 2>&1
h2:2:respawn:/etc/init.cssd fatal >/dev/null 2>&1
h3:2:respawn:/etc/init.crsd run >/dev/null 2>&1
Then run init q to make init re-read the file.

If the following message appears when root.sh is run later, the old configuration was not cleaned out completely.

Checking to see if Oracle CRS stack is already configured

Oracle CRS stack is already configured and will be running under init(1M)

2) Clean up what is still loaded in memory (on AIX):
slibclean
Check with genkld | grep crs; if anything is still listed, run slibclean again until it is clean.

3) Modify /CRS_HOME/install/rootconfig,

updating the OCR and voting disk locations it references.

4) As the oracle user, touch the new OCR and voting disk files (this is required, see the sketch after this list), otherwise root.sh fails with:

# /opt/oracle/10g/crs/root.sh
WARNING: directory '/opt/oracle/10g' is not owned by root
WARNING: directory '/opt/oracle' is not owned by root
"/ocr/vote" does not exist. Create it before proceeding.
Make sure that this file is shared across cluster nodes.
1

5) Run /CRS_HOME/root.sh on each node.
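A minimal sketch of step 4, using the /ocr/vote path from the error message above (the real paths are whatever was configured in rootconfig, and the OCR file is handled the same way):
-- as the oracle user, on storage visible to every node
touch /ocr/vote
-- repeat for the OCR file path configured in rootconfig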

METHOD 2 - RE-INSTALL CRS
-------------------------

The only safe and sure way to re-create the voting disk in 10gR1 is to reinstall
CRS. The deinstallation procedure is the only way we have to undo the CSS fatal
mode, which in turn makes it safe to reinstall.

Only do this after consulting with Oracle Support Services and only when there is no
reasonable way to fix the inconsistency.

Once you re-install CRS, you can restore OCR from one of its automatic backups.
Then, you can back up the voting disk, and also back it up again after any node
addition or deletion operations.
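Restoring the OCR from one of its automatic backups, as mentioned above, is done with ocrconfig (the same commands appear in the backup notes at the end of this article); a sketch, assuming CRS is stopped and a backup exists under the default cdata directory:
ocrconfig -showbackup
ocrconfig -restore /u01/app/oracle/product/10.2.0/crs/cdata/crs/backup01.ocr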


1. Use Note 239998.1 to completely remove the CRS installation.
2. Re-install CRS
3. Run the CRS root.sh as prompted at the end of the CRS install.
4. Run the root.sh in the database $ORACLE_HOME to re-run VIPCA. This will
re-create the VIP, GSD, and ONS resources.
5. Use NETCA to re-add any listeners.
6. Add databases and instances with SRVCTL, syntax is in Note 259301.1

Some ORACLE RAC backup notes
Oracle automatically backs up the OCR, which holds the CRS configuration information.
The backups can be listed with ocrconfig -showbackup.

The voting disk can be backed up with the dd command.
Its location can be shown with crsctl query css votedisk.
To back it up:
dd if=/dev/votedisk of=/oraclebackup/vote_disk
To restore it:
dd if=/oraclebackup/vote_disk of=/dev/votedisk

Backing up the ASM instance
It is enough to back up the ASM $ORACLE_HOME.

The OCR can be backed up with the following commands:
ocrconfig -export myfile
ocrdump -backupfile myfile
To restore it:
crsctl stop crs
ocrconfig -import myfile


Managing the OCR and the voting disk
The OCR and the voting disk occupy very little space, but they are extremely important to RAC.

After fixing the customer's problem, I also took backups of the OCR and the voting disk. The management and backup procedures for both are summarized here for reference.

The voting disk records node membership information, such as which nodes are cluster members and records of node additions and deletions; its size is 20 MB.
To see the voting disk location, use crsctl query css votedisk:
$ crsctl query css votedisk
0. 0 /dev/ocrbackup
If a CRS installation fails and has to be redone, the voting disk must be reinitialized, either with dd or by re-creating the volume:
dd if=/dev/zero of=/dev/ocrbackup bs=8192 count=2560
Back up the votedisk: dd if=/dev/ocrbackup of=/tmp/votedisk.bak
Restore the votedisk: dd if=/tmp/votedisk.bak of=/dev/ocrbackup
Add a voting disk mirror:
crsctl add css votedisk /dev/ocrbackup -force
Remove a voting disk mirror:
crsctl delete css votedisk /dev/ocrbackup -force


The OCR
The OCR records the configuration information of CRS resources such as the database, ASM, instances, listeners, and VIPs. It can be stored on raw devices or on a cluster file system, and the recommended size is 100 MB.
If raw devices are used, allocate one raw slice, for example /dev/ocrbackup.
If a CRS installation fails and has to be redone, the OCR device (raw) must be reinitialized, either with dd or by re-creating the volume:
dd if=/dev/zero of=/dev/ocrbackup bs=8192 count=12800
Oracle automatically backs up the OCR every four hours and keeps three versions, but only on one node:
$ ocrconfig -showbackup


Restore the OCR: ocrconfig -restore /u01/app/oracle/product/10.2.0/crs/cdata/crs/backup01.ocr

Manual OCR export: ocrconfig -export /tmp/ocr_bak
Manual OCR import: ocrconfig -import /tmp/ocr_bak

Adding an OCR mirror:
1. Stop the CRS services with crsctl stop crs.
2. Create the raw device that will hold the OCR mirror, for example /dev/rhdisk6.
3. Export the OCR contents with ocrconfig -export.
4. Edit /etc/oracle/ocr.loc and add the ocrmirrorconfig_loc line:
$ cat ocr.loc
ocrconfig_loc=/dev/ocrbackup
ocrmirrorconfig_loc=/dev/ocrmirror
local_only=FALSE
5. Import the OCR contents with ocrconfig -import.
6. Check the OCR configuration:
$ ocrcheck
Status of Oracle Cluster Registry is as follows :
Version : 2
Total space (kbytes) : 103724
Used space (kbytes) : 3824
Available space (kbytes) : 99900
ID : 1086971606
Device/File Name : /dev/ocrbackup
Device/File integrity check succeeded
Device/File Name : /dev/ocrmirror
Device/File integrity check succeeded
Cluster registry integrity check succeeded
7. Finally, start the CRS services with crsctl start crs.
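Putting the mirror steps together, a sketch using the device names above and a hypothetical export file /tmp/ocr.exp (step 2, creating the raw device, is storage-specific and omitted):
crsctl stop crs                  -- step 1, on all nodes
ocrconfig -export /tmp/ocr.exp   -- step 3
-- step 4: edit /etc/oracle/ocr.loc on every node as shown above
ocrconfig -import /tmp/ocr.exp   -- step 5
ocrcheck                         -- step 6
crsctl start crs                 -- step 7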

