A case of OCR file corruption caused by an Oracle upgrade

Posted by wenaini on 2009-03-16

So for a production database, backups really matter...

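A standby RAC cluster crashed unexpectedly. While the hardware was being checked, I looked into the Oracle side and found that the database was at 10.2.0.3 but CRS was still at 10.2.0.1. I could not be sure the version mismatch was the cause, but by process of elimination it was worth trying an upgrade first. The attempt left me in a cold sweat...

The upgrade itself went smoothly. CRS started and the node services came up normally on every node, but the database never came up. crs_stat showed that none of the instances or services had started. Starting the instances manually with sqlplus worked and the database opened fine, but starting through srvctl failed with the following errors: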

[oracle@bj15-75 ~]$ srvctl start database -d membj
PRKP-1001 : Error starting instance membj1 on node bj15-74
CRS-0212: Resource 'ora.membj.membj1.inst' is not registered.
PRKP-1001 : Error starting instance membj2 on node bj15-75
CRS-0212: Resource 'ora.membj.membj2.inst' is not registered.
PRKP-1001 : Error starting instance membj3 on node bj15-77
CRS-0212: Resource 'ora.membj.membj3.inst' is not registered.
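
With several instances failing at once, it can help to pull the resource names out of the error text so they can be compared against the crs_stat listing. The following is only a minimal sketch: the function name unregistered_resources is hypothetical, and it relies on nothing beyond the CRS-0212 message format shown above.

```shell
#!/bin/sh
# Hypothetical helper: print the resource names mentioned in CRS-0212 lines.
# Reads srvctl output from the files given as arguments, or from stdin if
# no file is given.
unregistered_resources() {
    # -n suppresses default printing; the trailing p prints only lines where
    # the substitution matched, leaving just the text between the quotes.
    sed -n "s/^CRS-0212: Resource '\([^']*\)' is not registered.*/\1/p" "$@"
}
```

One possible use is to pipe the failing command into it, for example `srvctl start database -d membj 2>&1 | unregistered_resources`, and then look for the same names in the crs_stat output.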

Yet the crs_stat output clearly listed these resources... My heart sank. I immediately searched Metalink and found this note:

Applies to:
Oracle Server - Enterprise Edition - Version: 10.2.0.1 to 10.2.0.3
This problem can occur on any platform.

Symptoms
The database and/or instances are not starting up using the srvctl command, reporting the following errors when invoked, e.g.:

PRKP-1001: Error starting instance inslo1 on node rias-ins-dba01
CRS-0212: Resource 'ora.inslo.inslo1.inst' is not registered.

The errors suggest that the clusterware does not know about the resource because it is not registered in the OCR.
Cause
The issue is caused by corruption of the database and/or instance entries in the OCR.
The following output shows that the resource is not registered in the clusterware, but at the same time the CRS can get its status using the crs_stat command, so it is preventing any update, start, or stop of this resource.

PRKP-1001: Error starting instance inslo1 on node rias-ins-dba01
CRS-0212: Resource 'ora.inslo.inslo1.inst' is not registered.
Solution
Because of the corruption of this resource entry in the OCR, you can simply remove the resource with all of its corrupted information from the OCR using the "srvctl remove" command for this resource, then add the resource again, which should make it work again.

1. Removing the resource:
srvctl remove database -d

2. Add the resources again:
srvctl add database -d -o
srvctl add instance -d -i -n
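
The two steps in the note can be laid out as a dry run before anything is changed. This is a sketch under assumptions, not the note's own script: the function name print_reregister_plan and the instance:node argument format are hypothetical, and the Oracle home is a placeholder the caller must supply. It only prints the commands; it does not run srvctl.

```shell
#!/bin/sh
# Dry run: print the srvctl commands for removing and re-adding a database
# and its instances. Nothing is executed.
# $1 = database name
# $2 = Oracle home (placeholder, supplied by the caller)
# remaining arguments = instance:node pairs (hypothetical format)
print_reregister_plan() {
    db=$1
    home=$2
    shift 2
    echo "srvctl remove database -d $db"
    echo "srvctl add database -d $db -o $home"
    for pair in "$@"; do
        inst=${pair%%:*}   # text before the first colon
        node=${pair#*:}    # text after the first colon
        echo "srvctl add instance -d $db -i $inst -n $node"
    done
}
```

For the cluster in this post the call might look like `print_reregister_plan membj /path/to/oracle_home membj1:bj15-74 membj2:bj15-75 membj3:bj15-77`, where the Oracle home path is a stand-in. Reviewing the printed commands first makes it easier to spot a wrong instance-to-node mapping before the OCR is touched.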

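So the upgrade had damaged my OCR? Then restore the OCR first. I used ocrconfig to restore the most recent OCR backup taken before the upgrade. That backup turned out to be bad itself and sent me on a long detour: after the restore the same problem appeared, which led me to believe that all the backups were broken.

It looked like the only option was to try the method in the note. But after running srvctl remove database, crs_stat still showed the original database resource. Adding a new database resource failed with an error saying it already existed. The OCR was evidently badly damaged, and the entries could not be removed by normal means. The remaining option was to dd the OCR file out and edit it. There were too many entries, though; I had been lucky enough to edit it successfully last time, but this time it kept failing. So I tried removing the instances instead. The first two succeeded, but the last one reported an error: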
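
The post does not show the restore commands that were used, so the following is only a sketch of how an OCR restore is commonly laid out on 10.2, printed as a dry run. The function name print_ocr_restore_plan and the backup path are hypothetical. It assumes the commands are run as root and that CRS is stopped on every node before the restore.

```shell
#!/bin/sh
# Dry run: print a typical OCR restore sequence. Nothing is executed.
# $1 = path to the OCR backup file to restore (placeholder)
print_ocr_restore_plan() {
    backup=$1
    echo "ocrconfig -showbackup"        # list the automatic backups first
    echo "crsctl stop crs"              # assumed: repeat on every node
    echo "ocrconfig -restore $backup"   # restore the chosen backup
    echo "crsctl start crs"             # assumed: repeat on every node
    echo "ocrcheck"                     # check OCR integrity afterwards
}
```

Calling `print_ocr_restore_plan /path/to/backup.ocr` prints the five steps in order. Listing the backups first matters here because, as this case shows, the most recent backup is not necessarily a good one.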

[oracle@bj15-74 ~]$ srvctl remove instance -i membj1 -d membj
Remove instance membj1 from the database membj? (y/[n]) y
PRKP-1075 : Instance membj1 is the last preferred instance for service membjapp.

I tried removing the service:

[oracle@bj15-74 ~]$ srvctl remove service -s membjapp -d membj
service membjapp is running

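That failed too, even though this service clearly was not running...

Out of options, I could only restore last week's OCR backup and hope for the best. If that failed, CRS would have to be reinstalled...

After the restore, I first removed the instances and services for the first two nodes and re-registered the services for those two nodes. Then I tried removing the third node's instance, and it worked. After that I used dbca to reconfigure the service, and that worked too!

So apart from the problem with the OCR backup, writing OCR information through srvctl is still less reliable than going through dbca. This is the second time I have run into entries that could not be cleared...

From the ITPUB blog, link: http://blog.itpub.net/79686/viewspace-1018846/. If you repost, please credit the source; otherwise legal action may be pursued.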