Spending a Day and a Night Fixing a RAC Failure (Part 2)

Posted by anycall2010 on 2008-07-06

Preliminary analysis:

 

Something is wrong with CRS. Reconfigure CRS, then reconfigure the nodes?

 

Start with the OCFS2 file system:

 

rac1-> more /etc/ocfs2/cluster.conf

node:

        ip_port = 7777

        ip_address = 192.168.0.3

        number = 0

        name = rac1

        cluster = ocfs2

 

node:

        ip_port = 7777

        ip_address = 192.168.0.4

        number = 1

        name = rac2

        cluster = ocfs2

 

cluster:

        node_count = 2

        name = ocfs2

 

 

rac2-> more /etc/ocfs2/cluster.conf

node:

        ip_port = 7777

        ip_address = 192.168.0.3

        number = 0

        name = rac1

        cluster = ocfs2

 

node:

        ip_port = 7777

        ip_address = 192.168.0.4

        number = 1

        name = rac2

        cluster = ocfs2

 

cluster:

        node_count = 2

        name = ocfs2

 

The configuration is identical on both nodes, so no problem there! Keep troubleshooting...
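Comparing the two files by eye works for two nodes, but it is easy to miss a one-digit difference in an IP address. Below is a minimal sketch that parses the stanza layout shown above (a `node:` or `cluster:` header followed by indented `key = value` lines, as assumed from this output) and reports differences plus a `node_count` mismatch. The function names are hypothetical, and fetching the file from each node (for example with scp) is left out.

```python
from typing import Dict, List, Tuple

Stanza = Tuple[str, Dict[str, str]]


def parse_cluster_conf(text: str) -> List[Stanza]:
    """Parse the 'section:' / 'key = value' stanza layout of cluster.conf."""
    stanzas: List[Stanza] = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()  # also strips non-breaking spaces left by copy/paste
        if not line:
            continue
        if line.endswith(":") and "=" not in line:
            # A header such as "node:" or "cluster:" opens a new stanza.
            current = {}
            stanzas.append((line[:-1], current))
        elif "=" in line and current is not None:
            key, _, value = line.partition("=")
            current[key.strip()] = value.strip()
    return stanzas


def compare_cluster_conf(text_a: str, text_b: str) -> List[str]:
    """Return a list of problems; an empty list means the two files agree."""
    a, b = parse_cluster_conf(text_a), parse_cluster_conf(text_b)
    problems = []
    if a != b:
        problems.append("cluster.conf differs between the two nodes")
    nodes = [d for name, d in a if name == "node"]
    for name, cluster in a:
        if name != "cluster":
            continue
        # node_count should equal the number of node stanzas in this cluster.
        members = [n for n in nodes if n.get("cluster") == cluster.get("name")]
        if int(cluster.get("node_count", "0")) != len(members):
            problems.append(
                f"node_count for cluster {cluster.get('name')} does not match its node stanzas"
            )
    return problems
```

Called as `compare_cluster_conf(rac1_text, rac2_text)`, an empty list corresponds to the "no problem there" conclusion above.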

Check the heartbeat:

[root@rac1 ~]# /etc/init.d/o2cb status

Module "configfs": Loaded

Filesystem "configfs": Mounted

Module "ocfs2_nodemanager": Loaded

Module "ocfs2_dlm": Loaded

Module "ocfs2_dlmfs": Loaded

Filesystem "ocfs2_dlmfs": Mounted

Checking O2CB cluster ocfs2: Online

  Heartbeat dead threshold: 61

  Network idle timeout: 10000

  Network keepalive delay: 5000

  Network reconnect delay: 2000

Checking O2CB heartbeat: Active

The heartbeat is fine as well! The second node shows the same result.
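The status output is easy to check mechanically, and the two timing values are worth translating into seconds. The sketch below assumes every line has the `key: value` shape shown above. The conversion for "Heartbeat dead threshold" uses the formula commonly documented for OCFS2 1.2, timeout = (threshold - 1) * 2 seconds, so 61 means about 120 seconds; treat that formula as an assumption to verify against your OCFS2 version. "Network idle timeout" is in milliseconds (10000 ms = 10 s). The function names are hypothetical.

```python
from typing import Dict


def parse_o2cb_status(text: str) -> Dict[str, str]:
    """Turn the 'key: value' lines of `/etc/init.d/o2cb status` into a dict."""
    info: Dict[str, str] = {}
    for raw in text.splitlines():
        key, sep, value = raw.strip().partition(":")
        if sep:
            info[key.strip()] = value.strip()
    return info


def o2cb_looks_healthy(info: Dict[str, str]) -> bool:
    """True if every module, filesystem, cluster and heartbeat line reports a good state."""
    good = {"Loaded", "Mounted", "Online", "Active"}
    status_keys = [k for k in info if k.startswith(("Module", "Filesystem", "Checking"))]
    return bool(status_keys) and all(info[k] in good for k in status_keys)


def disk_heartbeat_timeout_seconds(dead_threshold: int) -> int:
    # Assumed OCFS2 1.2 formula: timeout = (threshold - 1) * 2 seconds.
    return (dead_threshold - 1) * 2
```

For the output above, `disk_heartbeat_timeout_seconds(61)` gives 120, so a node has roughly two minutes of missed disk heartbeats before it is declared dead.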

 

Mount the file system on both nodes with the following command:

mount -t ocfs2 -o datavolume,nointr /dev/sdb1 /ocfs
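To confirm the mount actually took effect on each node, one option is to look up `/ocfs` in `/proc/mounts`, whose lines read `device mountpoint fstype options dump pass`. The sketch below checks the file system type and, optionally, a list of options. Whether the kernel echoes `datavolume` and `nointr` back in `/proc/mounts` depends on the OCFS2 version, so the option check defaults to off; the sample line used in testing is hypothetical, and the helper names are made up for illustration.

```python
from typing import Dict, Optional, Tuple


def find_mount(mounts_text: str, mountpoint: str) -> Optional[Dict[str, object]]:
    """Find a mount point in /proc/mounts-style text (escaped spaces are not handled)."""
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[1] == mountpoint:
            return {"device": fields[0], "fstype": fields[2], "options": fields[3].split(",")}
    return None


def ocfs_mount_ok(mounts_text: str, mountpoint: str = "/ocfs",
                  required: Tuple[str, ...] = ()) -> bool:
    """True if mountpoint is mounted as ocfs2 and carries every option in `required`."""
    entry = find_mount(mounts_text, mountpoint)
    if entry is None or entry["fstype"] != "ocfs2":
        return False
    return all(opt in entry["options"] for opt in required)
```

On a node, `ocfs_mount_ok(open("/proc/mounts").read())` returns False if the volume is not mounted there.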

 

Recheck CRS:

 

rac1-> /u01/oracle/product/10.2.0/crs_1/bin/cluvfy stage -post crsinst -n rac1,rac2

 

Performing post-checks for cluster services setup

 

Checking node reachability...

Node reachability check passed from node "rac1".

 

 

Checking user equivalence...

User equivalence check passed for user "oracle".

 

Checking Cluster manager integrity...

 

 

Checking CSS daemon...

Daemon status check passed for "CSS daemon".

 

Cluster manager integrity check passed.

 

Checking cluster integrity...

 

 

Cluster integrity check passed

 

 

Checking OCR integrity...

 

Checking the absence of a non-clustered configuration...

All nodes free of non-clustered, local-only configurations.

 

Uniqueness check for OCR device passed.

 

Checking the version of OCR...

OCR of correct Version "2" exists.

 

Checking data integrity of OCR...

Data integrity check for OCR passed.

 

OCR integrity check passed.

 

Checking CRS integrity...

 

Checking daemon liveness...

Liveness check passed for "CRS daemon".

 

Checking daemon liveness...

Liveness check passed for "CSS daemon".

 

Checking daemon liveness...

Liveness check passed for "EVM daemon".

 

Checking CRS health...

CRS health check passed.

 

CRS integrity check passed.

 

Checking node application existence...

 

 

Checking existence of VIP node application (required)

Check passed.

 

Checking existence of ONS node application (optional)

Check passed.

 

Checking existence of GSD node application (optional)

Check passed.

 

 

Post-check for cluster services setup was successful.

 

This shows that CRS is now running normally.
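The cluvfy report is long, and the verdict comes down to its last line plus the absence of any failed check. A minimal sketch, assuming plain keyword matching on "failed"/"unsuccessful" is good enough for this report (it is not an official parser, and the function name is hypothetical):

```python
from typing import List, Tuple


def summarize_cluvfy(output: str) -> Tuple[bool, List[str]]:
    """Return (overall_ok, failure_lines) for cluvfy output, by keyword matching."""
    lines = [ln.strip() for ln in output.splitlines() if ln.strip()]
    failures = [ln for ln in lines
                if "failed" in ln.lower() or "unsuccessful" in ln.lower()]
    # The post-check ends with "... was successful." when everything passed.
    finished_ok = any(ln.endswith("was successful.") for ln in lines)
    return finished_ok and not failures, failures
```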

 

Recheck the resource status:

rac1->  crs_stat -t

Name           Type           Target    State     Host

------------------------------------------------------------

ora.dbvdb.db   application    ONLINE    UNKNOWN   rac1

ora....b1.inst application    ONLINE    OFFLINE

ora....b2.inst application    ONLINE    OFFLINE

ora....SM1.asm application    ONLINE    UNKNOWN   rac1

ora....C1.lsnr application    ONLINE    UNKNOWN   rac1

ora.rac1.gsd   application    ONLINE    UNKNOWN   rac1

ora.rac1.ons   application    ONLINE    UNKNOWN   rac1

ora.rac1.vip   application    ONLINE    ONLINE    rac1

ora....SM2.asm application    ONLINE    UNKNOWN   rac2

ora....C2.lsnr application    ONLINE    UNKNOWN   rac2

ora.rac2.gsd   application    ONLINE    UNKNOWN   rac2

ora.rac2.ons   application    ONLINE    UNKNOWN   rac2

ora.rac2.vip   application    ONLINE    ONLINE    rac2

This looks rather different from the status at the start...
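With every Target set to ONLINE, the interesting rows are the ones whose State differs. A short sketch that parses the `crs_stat -t` table and lists them, assuming whitespace-separated columns as above (note that `-t` abbreviates long names, e.g. `ora....b1.inst`, so names come back truncated; function names are hypothetical):

```python
from typing import Dict, List, Optional


def parse_crs_stat_t(text: str) -> List[Dict[str, Optional[str]]]:
    """Parse the tabular output of `crs_stat -t` (Name, Type, Target, State, Host)."""
    rows = []
    for raw in text.splitlines():
        fields = raw.split()
        # Skip blank lines, the dashed separator and the header row.
        if len(fields) < 4 or fields[0] == "Name":
            continue
        rows.append({
            "name": fields[0],
            "type": fields[1],
            "target": fields[2],
            "state": fields[3],
            "host": fields[4] if len(fields) > 4 else None,  # OFFLINE rows show no host
        })
    return rows


def not_at_target(rows: List[Dict[str, Optional[str]]]) -> List[Optional[str]]:
    """Names of resources whose current State differs from their Target."""
    return [r["name"] for r in rows if r["state"] != r["target"]]
```

Applied to the listing above, 11 of the 13 resources are not at target; only the two VIPs are ONLINE, which matches the `srvctl` result that follows.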

Check the node applications:

rac1-> srvctl status nodeapps -n rac1

VIP is running on node: rac1

GSD is not running on node: rac1

Listener is not running on node: rac1

ONS daemon is not running on node: rac1

The services still have not started. Could ASM be the problem? Next, I'll work on getting the GSD service started...
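When checking nodeapps on several nodes, it helps to reduce each report to running/not running per service. A sketch assuming every line follows the `<service> is [not ]running on node: <node>` pattern seen above (the regex and function name are hypothetical, not part of srvctl):

```python
import re
from typing import Dict

# Matches e.g. "GSD is not running on node: rac1" and "VIP is running on node: rac1".
_STATUS = re.compile(r"^(?P<service>.+?) is (?P<negation>not )?running on node: (?P<node>\S+)$")


def parse_nodeapps_status(text: str) -> Dict[str, bool]:
    """Map each node application in `srvctl status nodeapps` output to True if running."""
    status: Dict[str, bool] = {}
    for raw in text.splitlines():
        m = _STATUS.match(raw.strip())
        if m:
            status[m.group("service")] = m.group("negation") is None
    return status
```

For the rac1 output above this yields VIP running and GSD, Listener and ONS daemon down.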

rac1-> srvctl start asm -n rac1

PRKS-1009 : Failed to start ASM instance "+ASM1" on node "rac1", [PRKS-1009 : Failed to start ASM instance "+ASM1" on node "rac1", [CRS-1028: Dependency analysis failed because of:

CRS-0223: Resource 'ora.rac1.ASM1.asm' has placement error.]]

  [PRKS-1009 : Failed to start ASM instance "+ASM1" on node "rac1", [CRS-1028: Dependency analysis failed because of:

CRS-0223: Resource 'ora.rac1.ASM1.asm' has placement error.]]

ASM can't even be started manually? It looks like the problem lies with ASM.
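The error above nests the same PRKS message several times, which hides how short the chain really is. A small sketch that extracts the distinct error codes in order of first appearance, assuming codes look like 3-4 capital letters, a hyphen and four digits; the last distinct code (here CRS-0223, the placement error) is usually the most specific, though that is a heuristic rather than a rule:

```python
import re
from typing import List

# Oracle-style error codes such as PRKS-1009 or CRS-0223 (assumed pattern).
_CODE = re.compile(r"\b([A-Z]{3,4}-\d{4})\b")


def error_chain(message: str) -> List[str]:
    """Distinct error codes in the order they first appear."""
    seen: List[str] = []
    for code in _CODE.findall(message):
        if code not in seen:
            seen.append(code)
    return seen
```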

From the ITPUB blog, link: http://blog.itpub.net/8334342/viewspace-373143/. Please credit the source when reposting; otherwise legal action will be taken.
