Oracle RAC: a case of intermittent listener outages on the first node

Posted by paulyibinyi on 2011-07-06

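Problem summary: On the morning of 2010-5-31, the listener_p730a resource on node p730a went offline, causing the application to switch over to node p730b. Afterwards, the listener on node p730a continued to drop intermittently, roughly every four or five days.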

Operating system: AIX 6100

Database: Oracle 10.2.0.5 RAC

Storage: EMC CX4-960

After the application switched over to node p730b, the listener on node p730a went offline roughly every four or five days, and each time it had to be started manually.

3          Troubleshooting process

1.   The following commands resolve the problem temporarily

  srvctl stop listener -n p730a

  srvctl start listener -n p730a

2.   Check the operating system logs

The most recent entries are dated 2011.5.18; the operating system reported no related errors after that.

3.   Review the CRS log on node p730a

2011-05-30 09:19:37.294: [  CRSAPP][11834]32CheckResource error for ora.p730b.vip error code = 1

2011-05-30 09:19:37.308: [  CRSRES][11834]32In stateChanged, ora.p730b.vip target is ONLINE

2011-05-30 09:19:37.309: [  CRSRES][11834]32ora.p730b.vip on p730a went OFFLINE unexpectedly

2011-05-30 09:19:37.309: [  CRSRES][11834]32StopResource: setting CLI values

2011-05-30 09:19:37.321: [  CRSRES][11834]32Attempting to stop `ora.p730b.vip` on member `p730a`

2011-05-30 09:19:37.689: [  CRSRES][11834]32Stop of `ora.p730b.vip` on member `p730a` succeeded.

2011-05-30 09:19:37.690: [  CRSRES][11834]32ora.p730b.vip RESTART_COUNT=0 RESTART_ATTEMPTS=0

2011-05-30 09:19:37.692: [  CRSRES][11834]32ora.p730b.vip failed on p730a relocating.

2011-05-30 09:19:37.755: [  CRSRES][11834]32Attempting to start `ora.p730b.vip` on member `p730b`

2011-05-30 09:19:44.705: [  CRSRES][11834]32Start of `ora.p730b.vip` on member `p730b` failed.

2011-05-30 09:21:08.879: [  CRSAPP][11841]32CheckResource error for ora.p730a.vip error code = 1

2011-05-30 09:21:08.883: [  CRSRES][11841]32In stateChanged, ora.p730a.vip target is ONLINE

2011-05-30 09:21:08.883: [  CRSRES][11841]32ora.p730a.vip on p730a went OFFLINE unexpectedly

2011-05-30 09:21:08.883: [  CRSRES][11841]32StopResource: setting CLI values

2011-05-30 09:21:08.903: [  CRSRES][11841]32Attempting to stop `ora.p730a.vip` on member `p730a`

2011-05-30 09:21:09.280: [  CRSRES][11841]32Stop of `ora.p730a.vip` on member `p730a` succeeded.

2011-05-30 09:21:09.280: [  CRSRES][11841]32ora.p730a.vip RESTART_COUNT=0 RESTART_ATTEMPTS=0

2011-05-30 09:21:09.283: [  CRSRES][11841]32ora.p730a.vip failed on p730a relocating.

2011-05-30 09:21:09.321: [  CRSRES][11841]32StopResource: setting CLI values

2011-05-30 09:21:09.330: [  CRSRES][11841]32Attempting to stop `ora.p730a.LISTENER_P730A.lsnr` on member `p730a`

2011-05-30 09:22:26.511: [  CRSRES][11841]32Stop of `ora.p730a.LISTENER_P730A.lsnr` on member `p730a` succeeded.

2011-05-30 09:22:26.527: [  CRSRES][11841]32Attempting to start `ora.p730a.vip` on member `p730b`

2011-05-30 09:22:28.006: [  CRSRES][11841]32Start of `ora.p730a.vip` on member `p730b` succeeded.

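The log shows that the listener on node p730a went offline mainly because the VIP on node p730a went offline; the p730a VIP resource then relocated automatically to node p730b.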
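
In a long CRS log, the unexpected OFFLINE events can be pulled out mechanically. The following is only a minimal sketch and was not part of the original case: the function name `offline_events` is hypothetical, and it assumes that log lines follow the format of the excerpt above and that resource names begin with `ora.`.

```shell
#!/bin/sh
# Hypothetical helper: list resources that CRS reported as going
# OFFLINE unexpectedly. Reads the log file(s) given as arguments,
# or standard input when no file is given.
# Assumes the line format of the excerpt above and that resource
# names begin with "ora.".
offline_events() {
    # Prints one line per event: <date> <time> <resource> <node>
    sed -n 's/^\([0-9-]* [0-9:.]*\): .*\][0-9]*\(ora\.[^ ]*\) on \([^ ]*\) went OFFLINE unexpectedly.*$/\1 \2 \3/p' "$@"
}
```

It could be called as `offline_events "$ORA_CRS_HOME/log/p730a/crsd/crsd.log"`. The `crsd/crsd.log` part of that path is an assumption about where the CRS daemon log lives on this system. Listing the events with their timestamps makes it easier to see that the VIP failures precede the listener stop.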

 

4.   Enable debug on the VIP resource to generate a trace

 crsctl debug log res "ora.p730a.vip:5"

        The resulting trace files are written under the $ORA_CRS_HOME/log/p730a/ directory
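
Once debug is enabled, it helps to see which files under that directory are actually being written. A minimal sketch, assuming a POSIX `find`; the function name `recent_traces` is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: list files under a log directory that were
# modified within the last N days (default 1), so that traces written
# after enabling debug are easy to spot.
recent_traces() {
    dir=$1
    days=${2:-1}
    find "$dir" -type f -mtime "-$days" -print
}
```

For example, `recent_traces "$ORA_CRS_HOME/log/p730a"` lists everything touched in the last day.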

5.      Consult Metalink document ID 1297867.1

    Modify the racgvip script by following the steps below:

1. Stop all node applications.
% srvctl stop nodeapps -n <node_name>

2. Back up, then modify the racgvip script.

Change:
# timeout of ping in number of loops (1 sec)
PING_TIMEOUT=" -c 1 -w 1"

To:
# timeout of ping in number of loops (3 sec)
PING_TIMEOUT=" -c 1 -w 3"

3. Start the node applications and other necessary resources.
% srvctl start nodeapps -n <node_name>
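
The backup-and-modify step (step 2 above) could be scripted roughly as follows. This is a sketch under stated assumptions rather than anything taken from the Metalink note: the function name `patch_ping_timeout` and the `.bak` suffix are hypothetical, and the script is assumed to contain the `PING_TIMEOUT` line exactly as quoted above.

```shell
#!/bin/sh
# Hypothetical helper: back up a racgvip script, then raise the ping
# wait time from 1 second to 3 seconds.
# Avoids "sed -i", which is not part of POSIX sed.
patch_ping_timeout() {
    script=$1
    backup="$script.bak"

    # Refuse to overwrite an existing backup.
    if [ -e "$backup" ]; then
        echo "backup $backup already exists" >&2
        return 1
    fi
    cp -p "$script" "$backup" || return 1

    # Writing with ">" reuses the existing file, so its owner and
    # permissions are kept.
    sed -e 's/^PING_TIMEOUT=" -c 1 -w 1"$/PING_TIMEOUT=" -c 1 -w 3"/' \
        -e 's/^# timeout of ping in number of loops (1 sec)$/# timeout of ping in number of loops (3 sec)/' \
        "$backup" > "$script"
}
```

A call might look like `patch_ping_timeout "$ORA_CRS_HOME/bin/racgvip"`, where the `bin/racgvip` location is an assumption. It should be run only while the node applications are stopped, as in steps 1 and 3.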

6.   Disable debug

crsctl debug log res "ora.p730a.vip:0"

 In a later phone call, the customer confirmed that since the racgvip script was modified, the listener outage on p730a has not recurred.

4          Conclusion and recommendations

For unusual CRS problems, enabling debug produces trace logs that help pinpoint the cause.

Enable debug:


crsctl debug log res "ora.p730a.vip:5"
crsctl debug log res "ora.p730b.vip:5"


Disable debug:

 

crsctl debug log res "ora.p730a.vip:0"
crsctl debug log res "ora.p730b.vip:0"
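
The enable and disable commands differ only in the trailing level, so they can be wrapped in one small function. A minimal sketch: the function name `set_vip_debug` and the `CRSCTL` override variable are hypothetical conveniences, and the only `crsctl` syntax used is the one shown above.

```shell
#!/bin/sh
# Hypothetical helper: set the CRS debug level for the VIP resource of
# each listed node. CRSCTL can be overridden (e.g. CRSCTL=echo) for a
# dry run that only prints the arguments.
set_vip_debug() {
    level=$1
    shift
    for node in "$@"; do
        "${CRSCTL:-crsctl}" debug log res "ora.$node.vip:$level" || return 1
    done
}
```

`set_vip_debug 5 p730a p730b` would enable tracing on both VIPs and `set_vip_debug 0 p730a p730b` would turn it off again. Keeping both directions in one function makes it harder to forget the second step.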

Source: ITPUB blog, link: http://blog.itpub.net/7199859/viewspace-701543/. If you repost this article, please credit the source; otherwise legal action may be taken.
