Oracle 9i RAC: Handling an "IPC Send timeout detected" Problem

Posted by wtjiang2008 on 2014-09-03
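I recently helped a customer expand the capacity of an IBM P780.
After the expansion, the first node of the core database started normally, but the second node was extremely slow to start: the database took about an hour to OPEN. This appeared to be the result of the customer shutting the database down abnormally. The alert log during that startup: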
Sun Aug 24 12:30:20 2014
SMON: enabling cache recovery
Sun Aug 24 12:30:20 2014
ARC0: Evaluating archive   log 5 thread 2 sequence 9860
ARC0: Unable to archive log 5 thread 2 sequence 9860
      Log actively being archived by another process
Sun Aug 24 12:30:21 2014
ARC1: Completed archiving  log 5 thread 2 sequence 9860
Sun Aug 24 12:37:01 2014
Successfully onlined Undo Tablespace 6.
Sun Aug 24 12:37:01 2014
SMON: enabling tx recovery
Sun Aug 24 12:37:28 2014
Database Characterset is US7ASCII
Sun Aug 24 13:00:04 2014
replication_dependency_tracking turned off (no async multimaster replication found)
Sun Aug 24 13:07:10 2014
Waited too long for library cache load lock. More info in file /oracle/app/oracle/admin/xxxx/udump/xxxx2_ora_925808.trc.
Sun Aug 24 13:08:51 2014
Waited too long for library cache load lock. More info in file /oracle/app/oracle/admin/xxxx/udump/xxxx2_ora_966808.trc.
Sun Aug 24 13:09:25 2014
Waited too long for library cache load lock. More info in file /oracle/app/oracle/admin/xxxx/udump/xxxx2_ora_995504.trc.
Sun Aug 24 13:09:25 2014
Waited too long for library cache load lock. More info in file /oracle/app/oracle/admin/xxxx/udump/xxxx2_ora_970794.trc.
Sun Aug 24 13:09:51 2014
Completed: alter database open
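
One way to see where the startup time went is to measure the silent intervals between consecutive alert-log entries. The sketch below is an illustration only, not part of the original diagnosis. The names `parse_events` and `largest_gaps` are hypothetical, `SAMPLE` is an abridged copy of the excerpt above, and the timestamp parsing assumes an English locale.

```python
from datetime import datetime

# Oracle 9i alert-log timestamp lines look like "Sun Aug 24 12:30:20 2014".
# %a and %b are locale-dependent; this assumes English day and month names.
TS_FORMAT = "%a %b %d %H:%M:%S %Y"

# Abridged copy of the startup excerpt (not the full log).
SAMPLE = """\
Sun Aug 24 12:30:20 2014
SMON: enabling cache recovery
Sun Aug 24 12:30:21 2014
ARC1: Completed archiving  log 5 thread 2 sequence 9860
Sun Aug 24 12:37:01 2014
Successfully onlined Undo Tablespace 6.
Sun Aug 24 12:37:28 2014
Database Characterset is US7ASCII
Sun Aug 24 13:00:04 2014
replication_dependency_tracking turned off (no async multimaster replication found)
Sun Aug 24 13:09:51 2014
Completed: alter database open
""".splitlines()


def parse_events(lines):
    """Pair each timestamp line with the first message line that follows it."""
    events = []
    current = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        try:
            current = datetime.strptime(line, TS_FORMAT)
        except ValueError:
            # Not a timestamp: treat it as the message for the pending timestamp.
            if current is not None:
                events.append((current, line))
                current = None
    return events


def largest_gaps(events, top=3):
    """Return the longest waits as (seconds, earlier_message, later_message)."""
    gaps = [
        ((t1 - t0).total_seconds(), m0, m1)
        for (t0, m0), (t1, m1) in zip(events, events[1:])
    ]
    return sorted(gaps, key=lambda g: g[0], reverse=True)[:top]


if __name__ == "__main__":
    for seconds, before, after in largest_gaps(parse_events(SAMPLE)):
        print(f"{seconds:7.0f}s  between '{before}' and '{after}'")
```

On the abridged excerpt, the longest silence is 1356 seconds (about 22.6 minutes), between the character set message at 12:37:28 and the next entry at 13:00:04. A gap only shows that the log was quiet; it does not say which operation was responsible.
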
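On Monday morning the customer reported that traffic on the Oracle heartbeat network (the RAC private interconnect) was much higher than usual. We monitored it for a while and saw a rate of 1 MB/s to 4 MB/s. Since this is a 10 Gb fiber network, that rate seemed normal to me. Then, shortly after 10 o'clock, the customer reported that node 2 had gone down.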
ARC0: Completed archiving  log 5 thread 2 sequence 9864
Mon Aug 25 10:07:15 2014
IPC Send timeout detected. Sender ospid 2961608
Mon Aug 25 10:07:17 2014
Communications reconfiguration: instance 0
Mon Aug 25 10:07:17 2014
Trace dumping is performing id=[cdmp_20140825100717]
Mon Aug 25 10:09:02 2014
Waiting for clusterware split-brain resolution
Mon Aug 25 10:19:02 2014
USER: terminating instance due to error 481
Mon Aug 25 10:19:02 2014
Errors in file /oracle/app/oracle/admin/xxxx/bdump/xxxx2_lmon_1118210.trc:
ORA-29740: evicted by member 1, group incarnation 7
Mon Aug 25 10:19:35 2014
Trace dumping is performing id=[cdmp_20140825101905]
DIAG: terminating instance due to error 1092
Instance terminated by DIAG, pid = 1179834
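
To make the eviction sequence easier to read, the key messages can be pulled out of the alert log together with their timestamps. The following is a minimal sketch under stated assumptions: the marker strings are copied from the excerpt above, while the labels (`ipc_timeout`, `evicted`, and so on) and the function names are this sketch's own and are not Oracle terminology. Timestamp parsing assumes an English locale.

```python
from datetime import datetime

TS_FORMAT = "%a %b %d %H:%M:%S %Y"  # e.g. "Mon Aug 25 10:07:15 2014"

# Message fragment -> hypothetical label used only in this sketch.
MARKERS = {
    "IPC Send timeout detected": "ipc_timeout",
    "Waiting for clusterware split-brain resolution": "split_brain_wait",
    "ORA-29740": "evicted",
    "terminating instance due to error": "terminated",
}

SAMPLE = """\
Mon Aug 25 10:07:15 2014
IPC Send timeout detected. Sender ospid 2961608
Mon Aug 25 10:07:17 2014
Communications reconfiguration: instance 0
Mon Aug 25 10:09:02 2014
Waiting for clusterware split-brain resolution
Mon Aug 25 10:19:02 2014
USER: terminating instance due to error 481
Mon Aug 25 10:19:02 2014
Errors in file /oracle/app/oracle/admin/xxxx/bdump/xxxx2_lmon_1118210.trc:
ORA-29740: evicted by member 1, group incarnation 7
Mon Aug 25 10:19:35 2014
DIAG: terminating instance due to error 1092
""".splitlines()


def eviction_timeline(lines):
    """Return (timestamp, label) for each marker, using the nearest preceding timestamp."""
    timeline = []
    current = None
    for raw in lines:
        line = raw.strip()
        try:
            current = datetime.strptime(line, TS_FORMAT)
            continue  # timestamp lines carry no message
        except ValueError:
            pass
        for fragment, label in MARKERS.items():
            if fragment in line:
                timeline.append((current, label))
    return timeline


def seconds_between(timeline, start_label, end_label):
    """Seconds from the first start_label to the first end_label, or None if missing."""
    start = next((t for t, label in timeline if label == start_label), None)
    end = next((t for t, label in timeline if label == end_label), None)
    if start is None or end is None:
        return None
    return (end - start).total_seconds()


if __name__ == "__main__":
    timeline = eviction_timeline(SAMPLE)
    for when, label in timeline:
        print(when, label)
    print(seconds_between(timeline, "ipc_timeout", "terminated"))
```

On the excerpt above this gives 707 seconds from the IPC send timeout (10:07:15) to the first termination message (10:19:02), of which 600 seconds elapsed after the split-brain wait message at 10:09:02.
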
Checking the AIX error log with errpt:
# errpt |more
IDENTIFIER TIMESTAMP  T C RESOURCE_NAME  DESCRIPTION
173C787F   0825150514 I S topsvcs        Possible malfunction on local adapter
173C787F   0824122814 I S topsvcs        Possible malfunction on local adapter
173C787F   0824122714 I S topsvcs        Possible malfunction on local adapter
1BA7DF4E   0824111914 P S SRC            SOFTWARE PROGRAM ERROR
CB4A951F   0824111914 I S SRC            SOFTWARE PROGRAM ERROR
CB4A951F   0824111914 I S SRC            SOFTWARE PROGRAM ERROR
There are three "Possible malfunction on local adapter" entries.
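
The TIMESTAMP column in the errpt summary is in MMDDhhmmYY form, which is easy to misread. The sketch below decodes it and counts entries per identifier. It is only an illustration: `parse_errpt_summary` is a hypothetical helper, the sample text is the summary shown above, and two-digit years are assumed to mean 20YY.

```python
from collections import Counter
from datetime import datetime

SAMPLE = """\
IDENTIFIER TIMESTAMP  T C RESOURCE_NAME  DESCRIPTION
173C787F   0825150514 I S topsvcs        Possible malfunction on local adapter
173C787F   0824122814 I S topsvcs        Possible malfunction on local adapter
173C787F   0824122714 I S topsvcs        Possible malfunction on local adapter
1BA7DF4E   0824111914 P S SRC            SOFTWARE PROGRAM ERROR
CB4A951F   0824111914 I S SRC            SOFTWARE PROGRAM ERROR
CB4A951F   0824111914 I S SRC            SOFTWARE PROGRAM ERROR
"""


def parse_errpt_summary(text):
    """Parse errpt summary rows into dicts with a decoded `when` datetime."""
    entries = []
    for line in text.splitlines():
        fields = line.split(None, 5)
        # Skip the header, blank lines, and anything that is not a data row.
        if len(fields) < 6 or len(fields[1]) != 10 or not fields[1].isdigit():
            continue
        ident, stamp, err_type, err_class, resource, description = fields
        when = datetime(
            2000 + int(stamp[8:10]),  # assumption: two-digit years are 20YY
            int(stamp[0:2]),          # month
            int(stamp[2:4]),          # day
            int(stamp[4:6]),          # hour
            int(stamp[6:8]),          # minute
        )
        entries.append({
            "id": ident,
            "when": when,
            "type": err_type,
            "class": err_class,
            "resource": resource,
            "description": description.strip(),
        })
    return entries


if __name__ == "__main__":
    entries = parse_errpt_summary(SAMPLE)
    print(Counter(e["id"] for e in entries))
    for e in entries:
        if e["id"] == "173C787F":
            print(e["when"], e["resource"], e["description"])
```

Decoded, the three adapter entries fall at 12:27 and 12:28 on August 24 and at 15:05 on August 25. Assuming the errpt and alert-log clocks agree, none of them coincides with the IPC send timeout logged at 10:07 on August 25.
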
# errpt -aj 173C787F
---------------------------------------------------------------------------
LABEL:          TS_LOC_DOWN_ST
IDENTIFIER:     173C787F


Date/Time:       Mon Aug 25 15:05:54 BEIST 2014
Sequence Number: 43985
Machine Id:      00F880244C00
Node Id:         yxyxyx2
Class:           S
Type:            INFO
Resource Name:   topsvcs         


Description
Possible malfunction on local adapter


Probable Causes
Local adapter mal-functioned
Local adapter lost connection to network
Local adapter mis-configured


Failure Causes
Local adapter mal-functioned
Local adapter lost connection to network
Local adapter mis-configured


        Recommended Actions
        Verify adapter configuration
        Verify network connectivity


Detail Data
DETECTING MODULE
rsct,nim_control.C,1.39.1.39,6703             
ERROR ID 
6zV5DL.G/iyH/tE.0J66e.1...................
REFERENCE CODE
                                          
Adapter interface name
en13
Adapter offset
           2
Adapter IP address
xx.xx.xx.xx

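The error is not conclusive. If the network were really at fault, there would normally be explicit network adapter errors. en13 is an EtherChannel interface built by bonding two other network adapters, but nothing here stands out, and there is no explicit link-down error.
After analysis, the likely culprits were the network adapters, the fiber cables, or the switch ports.
So we waited until after working hours, then replaced all the fiber cables and moved them to different fiber ports.
We restarted the database, and it came up without any trouble.
I checked again at noon the next day, and everything was normal.
A very strange problem!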

Source: ITPUB Blog, http://blog.itpub.net/527318/viewspace-1262820/. If you repost this article, please credit the source; otherwise legal liability will be pursued.
