Common OCFS2 Filesystem Problems and How to Fix Them

Posted by tolywang on 2009-10-02


Symptom 1:
mount -t ocfs2 -o datavolume,nointr /dev/sdb1 /webdata

mount.ocfs2: Transport endpoint is not connected while mounting /dev/sdb1 on /webdata. Check 'dmesg' for more information on this error.

 

 

Possible causes:

1. The firewall is enabled and is blocking the heartbeat port.

2. The values entered in /etc/init.d/o2cb configure differ between the nodes.


3. One node already has the volume mounted while the other node, which has just been configured, restarted the ocfs2 service. Simply restart the service on both nodes and the mount will complete.

4. SELinux has not been disabled. (A quick set of checks for these four causes is sketched below.)
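
A minimal sketch of the quick checks implied by the list above, assuming an Enterprise Linux system with the standard iptables, SELinux, and o2cb tooling (adjust paths to your distribution):

# 1. Firewall: should be stopped, or must allow the OCFS2 heartbeat port (7777 by default)
service iptables status

# 2. SELinux: should report Disabled or Permissive
getenforce

# 3. o2cb settings: this file must be identical on every node (heartbeat threshold, etc.)
cat /etc/sysconfig/o2cb

# 4. Cluster stack state on this node
/etc/init.d/o2cb status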

Here is a case:

[root@test02 ~]# mount -t ocfs2 /dev/vg_ocfs/lv_u02 /u02
mount.ocfs2: Transport endpoint is not connected while mounting /dev/vg_ocfs/lv_u02 on /u02. Check 'dmesg' for more information on this error.

This error was caused by the O2CB_HEARTBEAT_THRESHOLD value being different on the nodes when OCFS was configured. When I ran /etc/init.d/o2cb configure the values were in fact already identical on every node, but I had forgotten to restart o2cb on the first node, and it took a long time to track that down. The next step, of course, was to unmount the OCFS directory that was already mounted, which failed as well:

[root@test01 u02]# umount -f /u02
umount2: Device or resource busy
umount: /u02: device is busy
umount2: Device or resource busy
umount: /u02: device is busy

At this point you have to stop OCFS2 and O2CB with /etc/init.d/ocfs2 stop and /etc/init.d/o2cb stop before the umount will succeed. Once OCFS2 and O2CB have been started again, the other nodes can mount the OCFS volume without problems.
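
A minimal sketch of that sequence, using the /u02 mount point from this case:

/etc/init.d/ocfs2 stop     # stop the OCFS2 service first
/etc/init.d/o2cb stop      # then stop the O2CB cluster stack
umount /u02                # the unmount now succeeds
/etc/init.d/o2cb start     # bring the stack back up
/etc/init.d/ocfs2 start    # other nodes can now mount the volume normally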

 

Symptom 2:
# /etc/init.d/o2cb online ocfs2

Starting cluster ocfs2: Failed

Cluster ocfs2 created

o2cb_ctl: Configuration error discovered while populating cluster ocfs2. None of its nodes were considered local. A node is considered local when its node name in the configuration matches this machine's host name.

Stopping cluster ocfs2: OK

This is a hostname problem. Check the contents of /etc/ocfs2/cluster.conf and /etc/hosts and correct the corresponding hostnames.

Note: to ensure the ocfs2 filesystem is mounted automatically at boot, after adding the entry to /etc/fstab you must also add the hostname-to-IP mappings of both nodes to /etc/hosts, and those hostnames must be exactly the same as the ones configured in /etc/ocfs2/cluster.conf.
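
As an illustrative sketch only (the device, mount point, hostnames, and addresses below are placeholders, not values from this case):

# /etc/fstab -- _netdev defers the mount until networking and the o2cb stack are up
/dev/sdb1    /webdata    ocfs2    _netdev,datavolume,nointr    0 0

# /etc/hosts -- both nodes, names identical to those in /etc/ocfs2/cluster.conf
192.168.1.101    rac1
192.168.1.102    rac2

# make the cluster services start at boot
chkconfig o2cb on
chkconfig ocfs2 on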

Symptom 3

Starting O2CB cluster ocfs2: Failed
After installing ocfs2, configuring o2cb fails:
[root@rac1 ocfs2]# /etc/init.d/o2cb configure
Configuring the O2CB driver.

This will configure the on-boot properties of the O2CB driver.
The following questions will determine whether the driver is loaded on
boot. The current values will be shown in brackets ('[]'). Hitting
<ENTER> without typing an answer will keep that current value. Ctrl-C
will abort.

Load O2CB driver on boot (y/n) [y]:
Cluster to start on boot (Enter "none" to clear) [ocfs2]:
Specify heartbeat dead threshold (>=7) [7]:
Writing O2CB configuration: OK
Starting O2CB cluster ocfs2: Failed
Cluster ocfs2 created
     o2cb_ctl: Configuration error discovered while populating cluster ocfs2. None of its nodes were considered local. A node is considered local when its node name in the configuration matches this machine's host name.
Stopping O2CB cluster ocfs2: OK

 

 

When this happens, OCFS2 has probably not been configured yet. There is a graphical OCFS2 configuration tool (ocfs2console); configure the cluster with it first, and it is best to use IP addresses rather than hostnames.

In other words, the OCFS2 node configuration file must be set up correctly before ocfs2 is started; otherwise this error is reported. Also, when configuring through the graphical interface, /etc/ocfs2/cluster.conf should ideally be empty beforehand, or it can throw errors as well. A correctly populated file looks roughly like the sketch below.
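
For reference, a sketch of a populated /etc/ocfs2/cluster.conf (node names and IP addresses here are placeholders; each name field must match that machine's hostname):

node:
        ip_port = 7777
        ip_address = 192.168.1.101
        number = 0
        name = rac1
        cluster = ocfs2

node:
        ip_port = 7777
        ip_address = 192.168.1.102
        number = 1
        name = rac2
        cluster = ocfs2

cluster:
        node_count = 2
        name = ocfs2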


Symptom 4
When mounting an ocfs2 filesystem you get:
mount.ocfs2: Error when attempting to run /sbin/ocfs2_hb_ctl: "Operation not permitted"
mount -t ocfs2 -o datavolume /dev/sdb1 /u02/oradata/orcl
ocfs2_hb_ctl: Bad magic number in superblock while reading uuid
mount.ocfs2: Error when attempting to run /sbin/ocfs2_hb_ctl: "Operation not permitted"


This error is caused by the partition intended for the ocfs2 filesystem not having been formatted. Before mounting an ocfs2 filesystem, the partition used for it must be formatted.
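
A minimal formatting sketch, reusing the /dev/sdb1 volume from Symptom 1 (run it on one node only; the label and node-slot count are illustrative):

mkfs.ocfs2 -b 4K -C 32K -N 4 -L webdata /dev/sdb1
mount -t ocfs2 -o datavolume,nointr /dev/sdb1 /webdata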

 

Symptom 5:
 Configuration assistant "Oracle Cluster Verification Utility" failed
A question about a 10g RAC installation: Oracle 10.2.0.1 on Solaris 5.9, two nodes. The last configuration step of the CRS installation fails; how can it be resolved?

Log information:
INFO: Configuration assistant "Oracle Cluster Verification Utility" failed
-----------------------------------------------------------------------------
*** Starting OUICA ***
Oracle Home set to /orabase/product/10.2
Configuration directory is set to /orabase/product/10.2/cfgtoollogs. All xml files under the directory will be processed
INFO: The "/orabase/product/10.2/cfgtoollogs/configToolFailedCommands" script contains all commands that failed, were skipped or were cancelled. This file may be used to run these configuration assistants outside of OUI. Note that you may have to update this script with passwords (if any) before executing the same.
-----------------------------------------------------------------------------
SEVERE: OUI-25031:Some of the configuration assistants failed. It is strongly recommended that you retry the configuration assistants at this time. Not successfully running any "Recommended" assistants means your system will not be correctly configured.
1. Check the Details panel on the Configuration Assistant Screen to see the errors resulting in the failures.
2. Fix the errors causing these failures.
3. Select the failed assistants and click the 'Retry' button to retry them.
INFO: User Selected: Yes/OK

This is caused by the VIP addresses not having been started. After running orainstRoot.sh and root.sh, try opening a new window and running vipca; once all of the CRS services are up, rerun the final verification step.

Go to the CRS bin directory and run crs_stat -t to see whether all of the services are up; in this situation it is usually the VIP that has not started.
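
A rough sketch of that sequence, assuming $CRS_HOME points at the Clusterware home (the exact path is not shown in the log above):

# as root, in a new window, after orainstRoot.sh and root.sh have completed
$CRS_HOME/bin/vipca          # configure and start the VIP, GSD and ONS resources

# then confirm everything is ONLINE before retrying the verification assistant
$CRS_HOME/bin/crs_stat -t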

 

Symptom 6:
Failed to upgrade Oracle Cluster Registry configuration
While installing CRS, running ./root.sh on the second node produced the output below, even though it ran fine on the first node. Any pointers would be greatly appreciated, thanks!
[root@RACtest2 crs]# ./root.sh
WARNING: directory '/app/oracle/product/10.2.0' is not owned by root
WARNING: directory '/app/oracle/product' is not owned by root
WARNING: directory '/app/oracle' is not owned by root
WARNING: directory '/app' is not owned by root
Checking to see if Oracle CRS stack is already configured

Setting the permissions on OCR backup directory
Setting up NS directories
PROT-1: Failed to initialize ocrconfig
Failed to upgrade Oracle Cluster Registry configuration

 

Cause of the error:

The permissions on the devices used by the CRS installation are wrong. For example, my setup places the OCR and voting disk on raw devices, so the permissions on these devices and on the files linked to them must be set correctly. Here is my environment:

[root@rac2 oracrs]#

lrwxrwxrwx  1 root root 13 Jan 27 12:49 ocr.crs -> /dev/raw/raw1

lrwxrwxrwx  1 root root 13 Jan 26 13:31 vote.crs -> /dev/raw/raw2

 

 

chown root:oinstall /dev/raw/raw1

chown root:oinstall /dev/raw/raw2

chmod 660 /dev/raw/raw1

chmod 660 /dev/raw/raw2

Here /dev/sdb1 holds the OCR and /dev/sdb2 holds the voting disk.
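
For reference, a sketch of how those bindings are usually made persistent on this kind of system, assuming the RHEL-style /etc/sysconfig/rawdevices mechanism and the devices above:

# /etc/sysconfig/rawdevices
/dev/raw/raw1  /dev/sdb1
/dev/raw/raw2  /dev/sdb2

# reapply ownership and mode after each boot, e.g. from /etc/rc.local
chown root:oinstall /dev/raw/raw1 /dev/raw/raw2
chmod 660 /dev/raw/raw1 /dev/raw/raw2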

[root@rac2 oracrs]# service rawdevices reload

Assigning devices:

           /dev/raw/raw1  -->  /dev/sdb1

/dev/raw/raw1:  bound to major 8, minor 17

           /dev/raw/raw2  -->  /dev/sdb2

/dev/raw/raw2:  bound to major 8, minor 18

Done

Then run it again and it completes successfully:

[root@rac2 oracrs]# /oracle/app/oracle/product/crs/root.sh

WARNING: directory '/oracle/app/oracle/product' is not owned by root

WARNING: directory '/oracle/app/oracle' is not owned by root

Checking to see if Oracle CRS stack is already configured

 

Setting the permissions on OCR backup directory

Setting up NS directories

Oracle Cluster Registry configuration upgraded successfully

WARNING: directory '/oracle/app/oracle/product' is not owned by root

WARNING: directory '/oracle/app/oracle' is not owned by root

clscfg: EXISTING configuration version 3 detected.

clscfg: version 3 is 10G Release 2.

assigning default hostname rac1 for node 1.

assigning default hostname rac2 for node 2.

Successfully accumulated necessary OCR keys.

Using ports: CSS=49895 CRS=49896 EVMC=49898 and EVMR=49897.

node :

node 1: rac1 priv1 rac1

node 2: rac2 priv2 rac2

clscfg: Arguments check out successfully.

 

Symptom 7
 Startup will be queued to init within 90 seconds
Running root.sh on node A of the installation gives:
[root@rac2 OraHome1]# ./root.sh
WARNING: directory '/oracle' is not owned by root
Checking to see if Oracle CRS stack is already configured
/etc/oracle does not exist. Creating it now.

Setting the permissions on OCR backup directory
Setting up NS directories
Oracle Cluster Registry configuration upgraded successfully
WARNING: directory '/oracle' is not owned by root
clscfg: EXISTING configuration version 3 detected.
clscfg: version 3 is 10G Release 2.
assigning default hostname rac1 for node 1.
assigning default hostname rac2 for node 2.
Successfully accumulated necessary OCR keys.
Using ports: CSS=49895 CRS=49896 EVMC=49898 and EVMR=49897.
node :
node 1: rac1 vip1 rac1
node 2: rac2 vip2 rac2
clscfg: Arguments check out successfully.

NO KEYS WERE WRITTEN. Supply -force parameter to override.
-force is destructive and will destroy any previous cluster
configuration.
Oracle Cluster Registry for cluster has already been initialized
Startup will be queued to init within 90 seconds.
and then it just hangs. Checking the log ocrconfig_7758.log:
::::::::::::::
Oracle Database 10g CRS Release 10.2.0.1.0 Production Copyright 1996, 2005 Oracle. All rights reserved.
2006-10-29 22:47:09.537: [ OCRCONF][3086919360]ocrconfig starts...
2006-10-29 22:47:09.541: [ OCRCONF][3086919360]Upgrading OCR data
2006-10-29 22:47:09.649: [ OCRRAW][3086919360]propriogid:1: INVALID FORMAT
2006-10-29 22:47:09.660: [ OCRRAW][3086919360]ibctx:1:ERROR: INVALID FORMAT
2006-10-29 22:47:09.660: [ OCRRAW][3086919360]proprinit:problem reading the bootblock or superbloc 22

2006-10-29 22:47:09.661: [ default][3086919360]a_init:7!: Backend init unsuccessful : [22]
2006-10-29 22:47:09.662: [ OCRCONF][3086919360]Exporting OCR data to [OCRUPGRADEFILE]
2006-10-29 22:47:09.663: [ OCRAPI][3086919360]a_init:7!: Backend init unsuccessful : [33]
2006-10-29 22:47:09.663: [ OCRCONF][3086919360]There was no previous version of OCR. error:[PROC-33: Oracle Cluster Registry is not
configured]
2006-10-29 22:47:09.666: [ OCRRAW][3086919360]propriogid:1: INVALID FORMAT
2006-10-29 22:47:09.668: [ OCRRAW][3086919360]ibctx:1:ERROR: INVALID FORMAT
2006-10-29 22:47:09.668: [ OCRRAW][3086919360]proprinit:problem reading the bootblock or superbloc 22

2006-10-29 22:47:09.668: [ default][3086919360]a_init:7!: Backend init unsuccessful : [22]
2006-10-29 22:47:09.672: [ OCRRAW][3086919360]propriogid:1: INVALID FORMAT
2006-10-29 22:47:09.673: [ OCRRAW][3086919360]ibctx:1:ERROR: INVALID FORMAT
2006-10-29 22:47:09.673: [ OCRRAW][3086919360]proprinit:problem reading the bootblock or superbloc 22

First, check whether the firewall is disabled:

Check for and disable UDP ICMP rejection

During the Linux installation I chose not to configure the firewall. By default, the installer selects the option to configure a firewall. This has burned me more than once, so I always double-check that the firewall option is left unconfigured and make sure that UDP ICMP filtering is turned off.

If UDP ICMP traffic is blocked or rejected by the firewall, the Oracle Clusterware software will crash after running for a few minutes. When an Oracle Clusterware process fails, your _evmocr.log file will contain entries similar to the following:

08/29/2005 22:17:19
 oac_init:2: Could not connect to server, clsc retcode = 9
 08/29/2005 22:17:19
 a_init:12!: Client init unsuccessful : [32]
 ibctx:1:ERROR: INVALID FORMAT
 proprinit:problem reading the bootblock or superbloc 22

If you run into this kind of error, the solution is to remove the UDP ICMP (iptables) rejection rules, or simply to turn off the firewall option. After that, the Oracle Clusterware software will run normally instead of crashing. The following commands should be run as the root user:

1. Check that the firewall option is turned off. If the firewall is already disabled (as in the example below), you do not need to continue with the remaining steps.
# /etc/rc.d/init.d/iptables status
Firewall is stopped.

2. If the firewall option is enabled, you first need to stop it manually to disable the UDP ICMP rejection:
# /etc/rc.d/init.d/iptables stop
Flushing firewall rules: [ OK ]
Setting chains to policy ACCEPT: filter [ OK ]
Unloading iptables modules: [ OK ]

3. Then disable the UDP ICMP rejection for the next server reboot (it should stay off permanently):
# chkconfig iptables off

Again, if the problem is not the one described above,

it is recommended to first wipe the OCR and voting disk with dd, then set the permissions, and then run root.sh again; see the sketch below.
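
A sketch of that cleanup, reusing the raw devices from Symptom 6 (warning: this destroys any existing OCR and voting disk contents):

dd if=/dev/zero of=/dev/raw/raw1 bs=1M count=100   # wipe the OCR device
dd if=/dev/zero of=/dev/raw/raw2 bs=1M count=100   # wipe the voting disk device
chown root:oinstall /dev/raw/raw1 /dev/raw/raw2    # restore permissions as in Symptom 6
chmod 660 /dev/raw/raw1 /dev/raw/raw2
# then rerun root.sh on the first node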

For comparison, here is the output from running root.sh on my first node:
[root@node1 crs10.2.0]# ./root.sh
WARNING: directory '/ora10g/product' is not owned by root
WARNING: directory '/ora10g' is not owned by root
Checking to see if Oracle CRS stack is already configured
/etc/oracle does not exist. Creating it now.

Setting the permissions on OCR backup directory
Setting up NS directories
Oracle Cluster Registry configuration upgraded successfully
WARNING: directory '/ora10g/product' is not owned by root
WARNING: directory '/ora10g' is not owned by root
assigning default hostname node1 for node 1.
assigning default hostname node2 for node 2.
Successfully accumulated necessary OCR keys.
Using ports: CSS=49895 CRS=49896 EVMC=49898 and EVMR=49897.
node :
node 1: node1 privnode1 node1
node 2: node2 privnode2 node2
Creating OCR keys for user 'root', privgrp 'root'..
Operation successful.
Now formatting voting device: /ocfs/votedisk/votedisk.dat
Format of 1 voting devices complete.
Startup will be queued to init within 90 seconds.
Adding daemons to inittab
Expecting the CRS daemons to be up within 600 seconds.
CSS is active on these nodes.
node1
CSS is inactive on these nodes.
node2
Local node checking complete.
Run root.sh on remaining nodes to start CRS daemons.
[root@node1 crs10.2.0]#

Symptom 8:
 CRS-0215: Could not start resource 'ora.orcl.orcl1.inst'.
$ srvctl start instance -d orcl -i orcl1
PRKP-1001 : Error starting instance orcl1 on node znawdb1
CRS-0215: Could not start resource 'ora.orcl.orcl1.inst'.

The cause of this problem is that the ocfs2 filesystem holding the database datafiles, or the ASM storage, is not mounted. For example, my environment is raw + ASM: the ASM diskgroup was not mounted, and although srvctl start asm -n rac1 started ASM, that start did not mount the diskgroup, so running srvctl start instance -d orcl -i orcl1 again produced CRS-0215: Could not start resource 'ora.orcl.orcl1.inst'.

At that point, run ALTER DISKGROUP dgroup1 MOUNT; and then rerun srvctl start instance -d orcl -i orcl1, and the instance starts successfully.
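
A sketch of that recovery, assuming the local ASM instance is named +ASM1 (the instance name is not given above) and the diskgroup is dgroup1:

export ORACLE_SID=+ASM1
sqlplus / as sysdba
SQL> ALTER DISKGROUP dgroup1 MOUNT;
SQL> exit

srvctl start instance -d orcl -i orcl1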

 

Symptom 9
CRS-0223: Resource 'ora.rac1.LISTENER_RAC1.lsnr' has placement error.
The error usually looks like this:

[oracle@rac1 admin]$ srvctl start nodeapps -n rac1

CRS-1028: Dependency analysis failed because of:

CRS-0223: Resource 'ora.rac1.gsd' has placement error.

CRS-1028: Dependency analysis failed because of:

CRS-0223: Resource 'ora.rac1.vip' has placement error.

CRS-1028: Dependency analysis failed because of:

CRS-0223: Resource 'ora.rac1.ons' has placement error.

CRS-1028: Dependency analysis failed because of:

CRS-0223: Resource 'ora.rac1.LISTENER_RAC1.lsnr' has placement error.


Cause:

The main cause of this problem is resource contention: the resources of both instances end up placed on the same node, so the other node cannot obtain the resources it needs.

Solution:

When this happens, it is best to start the relevant CRS services manually from the command line and see what error is actually reported.
When starting the services, shut down the services on all nodes first, then start a single node and watch its status with crs_stat. Once all services on that node are healthy, start the other node. Finally, use crs_stat to check the status across all nodes; a sketch follows below.
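
A sketch of that staged restart, using the standard 10gR2 commands (crsctl must be run as root; node names follow the example above):

# on every node, as root: stop the whole CRS stack
crsctl stop crs

# on rac1 only: start CRS, then watch until all of its resources are ONLINE
crsctl start crs
crs_stat -t

# once rac1 is healthy, run crsctl start crs on rac2 and check crs_stat -t again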

 

From the ITPUB blog, link: http://blog.itpub.net/35489/viewspace-616223/. If you repost this article, please credit the source.
