Oracle 9i RAC: Troubleshooting "IPC Send timeout detected"
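I recently helped a customer expand the capacity of their IBM P780.
After the expansion, the first node of the core database started normally, but the second node was extremely slow to start: it took about an hour before the database reached OPEN. It looked as if this was caused by the customer having shut the database down abnormally. The alert log on the second node showed: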
Sun Aug 24 12:30:20 2014
SMON: enabling cache recovery
Sun Aug 24 12:30:20 2014
ARC0: Evaluating archive log 5 thread 2 sequence 9860
ARC0: Unable to archive log 5 thread 2 sequence 9860
Log actively being archived by another process
Sun Aug 24 12:30:21 2014
ARC1: Completed archiving log 5 thread 2 sequence 9860
Sun Aug 24 12:37:01 2014
Successfully onlined Undo Tablespace 6.
Sun Aug 24 12:37:01 2014
SMON: enabling tx recovery
Sun Aug 24 12:37:28 2014
Database Characterset is US7ASCII
Sun Aug 24 13:00:04 2014
replication_dependency_tracking turned off (no async multimaster replication found)
Sun Aug 24 13:07:10 2014
Waited too long for library cache load lock. More info in file /oracle/app/oracle/admin/xxxx/udump/xxxx2_ora_925808.trc.
Sun Aug 24 13:08:51 2014
Waited too long for library cache load lock. More info in file /oracle/app/oracle/admin/xxxx/udump/xxxx2_ora_966808.trc.
Sun Aug 24 13:09:25 2014
Waited too long for library cache load lock. More info in file /oracle/app/oracle/admin/xxxx/udump/xxxx2_ora_995504.trc.
Sun Aug 24 13:09:25 2014
Waited too long for library cache load lock. More info in file /oracle/app/oracle/admin/xxxx/udump/xxxx2_ora_970794.trc.
Sun Aug 24 13:09:51 2014
Completed: alter database open
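To see where the time went during the open, the timestamps in the excerpt above can be diffed. The following is a minimal Python sketch and was not part of the original troubleshooting. The names `parse_entries` and `largest_gaps` and the trimmed `SAMPLE` are illustrative, and the code assumes the alert-log timestamp layout shown above (`Sun Aug 24 12:30:20 2014`).

```python
from datetime import datetime

# Timestamp layout used in the alert-log excerpt, e.g. "Sun Aug 24 12:30:20 2014".
TS_FORMAT = "%a %b %d %H:%M:%S %Y"

def parse_entries(lines):
    """Group alert-log lines into (timestamp, [messages]) entries."""
    entries = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            ts = datetime.strptime(line, TS_FORMAT)
        except ValueError:
            # Not a timestamp line: attach the message to the latest entry.
            if entries:
                entries[-1][1].append(line)
            continue
        entries.append((ts, []))
    return entries

def largest_gaps(entries, top=3):
    """Return the `top` longest silences as (seconds, message_before, message_after)."""
    gaps = []
    for (t0, m0), (t1, m1) in zip(entries, entries[1:]):
        before = m0[-1] if m0 else ""
        after = m1[0] if m1 else ""
        gaps.append(((t1 - t0).total_seconds(), before, after))
    return sorted(gaps, key=lambda g: g[0], reverse=True)[:top]

# Trimmed copy of the excerpt above (hypothetical variable name).
SAMPLE = """\
Sun Aug 24 12:30:20 2014
SMON: enabling cache recovery
Sun Aug 24 12:37:01 2014
SMON: enabling tx recovery
Sun Aug 24 12:37:28 2014
Database Characterset is US7ASCII
Sun Aug 24 13:00:04 2014
replication_dependency_tracking turned off (no async multimaster replication found)
Sun Aug 24 13:09:51 2014
Completed: alter database open
""".splitlines()

if __name__ == "__main__":
    for seconds, before, after in largest_gaps(parse_entries(SAMPLE)):
        print(f"{seconds / 60:5.1f} min between '{before}' and '{after}'")
```

On the lines shown here, the longest silence is about 22.6 minutes, between the character set message and the replication_dependency_tracking message. The alert log alone does not say what the instance was doing in that window.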
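On Monday morning the customer reported that traffic on the Oracle heartbeat (interconnect) network was much higher than usual. We monitored it for a while and saw a rate of 1 to 4 MB/s. Since this is a 10 Gb fibre network, that rate seemed normal to me. Shortly after 10:00, however, the customer reported that node 2 had gone down.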
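As a rough sanity check on that judgement, the observed rate can be expressed as a share of link capacity. This is only a sketch under two assumptions that the post does not state: that the link is 10 Gbit/s Ethernet, and that 1 MB means 10^6 bytes. The function name `link_utilization` is illustrative.

```python
def link_utilization(rate_mb_per_s, link_gbit_per_s=10.0):
    """Fraction of link capacity used by a traffic rate given in MB/s.

    Assumes 1 MB = 10**6 bytes and a link speed given in Gbit/s.
    """
    bits_per_s = rate_mb_per_s * 1_000_000 * 8
    capacity_bits_per_s = link_gbit_per_s * 1_000_000_000
    return bits_per_s / capacity_bits_per_s

if __name__ == "__main__":
    for rate in (1, 4):
        print(f"{rate} MB/s is {link_utilization(rate):.2%} of a 10 Gbit/s link")
```

Under these assumptions the traffic stays well below 1% of capacity, so raw throughput was not the constraint. A low average rate does not rule out packet loss or latency on the link.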
ARC0: Completed archiving log 5 thread 2 sequence 9864
Mon Aug 25 10:07:15 2014
IPC Send timeout detected. Sender ospid 2961608
Mon Aug 25 10:07:17 2014
Communications reconfiguration: instance 0
Mon Aug 25 10:07:17 2014
Trace dumping is performing id=[cdmp_20140825100717]
Mon Aug 25 10:09:02 2014
Waiting for clusterware split-brain resolution
Mon Aug 25 10:19:02 2014
USER: terminating instance due to error 481
Mon Aug 25 10:19:02 2014
Errors in file /oracle/app/oracle/admin/xxxx/bdump/xxxx2_lmon_1118210.trc:
ORA-29740: evicted by member 1, group incarnation 7
Mon Aug 25 10:19:35 2014
Trace dumping is performing id=[cdmp_20140825101905]
DIAG: terminating instance due to error 1092
Instance terminated by DIAG, pid = 1179834
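The stages of the eviction are easier to follow as a timeline measured from the first IPC timeout. The sketch below is a hedged illustration rather than the procedure used at the time. The pattern list is simply the set of messages visible in the excerpt above, and `key_events` and `SAMPLE` are hypothetical names.

```python
import re
from datetime import datetime

TS_FORMAT = "%a %b %d %H:%M:%S %Y"  # e.g. "Mon Aug 25 10:07:15 2014"

# Messages from the excerpt that mark the stages of the eviction.
KEY_PATTERNS = re.compile(
    r"IPC Send timeout detected"
    r"|Communications reconfiguration"
    r"|split-brain resolution"
    r"|terminating instance"
    r"|ORA-29740"
)

def key_events(lines):
    """Return (timestamp, message) pairs for eviction-related messages."""
    current_ts = None
    events = []
    for line in lines:
        line = line.strip()
        try:
            current_ts = datetime.strptime(line, TS_FORMAT)
            continue
        except ValueError:
            pass  # an ordinary message line
        if current_ts is not None and KEY_PATTERNS.search(line):
            events.append((current_ts, line))
    return events

# Trimmed copy of the excerpt above.
SAMPLE = """\
Mon Aug 25 10:07:15 2014
IPC Send timeout detected. Sender ospid 2961608
Mon Aug 25 10:07:17 2014
Communications reconfiguration: instance 0
Mon Aug 25 10:09:02 2014
Waiting for clusterware split-brain resolution
Mon Aug 25 10:19:02 2014
USER: terminating instance due to error 481
Mon Aug 25 10:19:02 2014
Errors in file /oracle/app/oracle/admin/xxxx/bdump/xxxx2_lmon_1118210.trc:
ORA-29740: evicted by member 1, group incarnation 7
Mon Aug 25 10:19:35 2014
DIAG: terminating instance due to error 1092
""".splitlines()

if __name__ == "__main__":
    events = key_events(SAMPLE)
    start = events[0][0]
    for ts, message in events:
        offset = int((ts - start).total_seconds())
        print(f"+{offset:4d}s  {message}")
```

In this excerpt the instance is terminated 707 seconds after the first IPC send timeout, and exactly 600 seconds after it began waiting for split-brain resolution.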
Checking the AIX error report (errpt):
# errpt |more
IDENTIFIER TIMESTAMP  T C RESOURCE_NAME  DESCRIPTION
173C787F   0825150514 I S topsvcs        Possible malfunction on local adapter
173C787F   0824122814 I S topsvcs        Possible malfunction on local adapter
173C787F   0824122714 I S topsvcs        Possible malfunction on local adapter
1BA7DF4E   0824111914 P S SRC            SOFTWARE PROGRAM ERROR
CB4A951F   0824111914 I S SRC            SOFTWARE PROGRAM ERROR
CB4A951F   0824111914 I S SRC            SOFTWARE PROGRAM ERROR
There are three "Possible malfunction on local adapter" entries.
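The errpt summary prints its TIMESTAMP column as MMDDhhmmYY, which makes it awkward to line up against the alert log. Below is a minimal Python sketch for converting those entries. The function names are illustrative, and mapping the two-digit year to 20YY is an assumption.

```python
from datetime import datetime

def parse_errpt_timestamp(stamp):
    """Convert an errpt summary timestamp (MMDDhhmmYY) to a datetime.

    Assumption: the two-digit year belongs to the 2000s.
    """
    if len(stamp) != 10 or not stamp.isdigit():
        raise ValueError(f"unexpected errpt timestamp: {stamp!r}")
    month, day, hour, minute, yy = (int(stamp[i:i + 2]) for i in range(0, 10, 2))
    return datetime(2000 + yy, month, day, hour, minute)

def parse_errpt_summary(lines):
    """Parse errpt summary lines into dicts, skipping the header row."""
    rows = []
    for line in lines:
        fields = line.split(None, 5)
        if len(fields) < 6 or fields[0] == "IDENTIFIER":
            continue
        ident, stamp, err_type, err_class, resource, description = fields
        rows.append({
            "identifier": ident,
            "time": parse_errpt_timestamp(stamp),
            "type": err_type,
            "class": err_class,
            "resource": resource,
            "description": description.strip(),
        })
    return rows

# Copy of the errpt summary shown above.
SAMPLE = """\
IDENTIFIER TIMESTAMP T C RESOURCE_NAME DESCRIPTION
173C787F 0825150514 I S topsvcs Possible malfunction on local adapter
173C787F 0824122814 I S topsvcs Possible malfunction on local adapter
173C787F 0824122714 I S topsvcs Possible malfunction on local adapter
1BA7DF4E 0824111914 P S SRC SOFTWARE PROGRAM ERROR
CB4A951F 0824111914 I S SRC SOFTWARE PROGRAM ERROR
CB4A951F 0824111914 I S SRC SOFTWARE PROGRAM ERROR
""".splitlines()

if __name__ == "__main__":
    for row in parse_errpt_summary(SAMPLE):
        if row["resource"] == "topsvcs":
            print(row["time"].strftime("%Y-%m-%d %H:%M"), row["description"])
```

Read this way, two of the adapter entries fall at 12:27 and 12:28 on Aug 24, just before the slow open logged from 12:30, and the third falls at 15:05 on Aug 25, after the eviction. This is only a correlation in time and does not by itself prove a cause.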
# errpt -aj 173C787F
---------------------------------------------------------------------------
LABEL: TS_LOC_DOWN_ST
IDENTIFIER: 173C787F
Date/Time: Mon Aug 25 15:05:54 BEIST 2014
Sequence Number: 43985
Machine Id: 00F880244C00
Node Id: yxyxyx2
Class: S
Type: INFO
Resource Name: topsvcs
Description
Possible malfunction on local adapter
Probable Causes
Local adapter mal-functioned
Local adapter lost connection to network
Local adapter mis-configured
Failure Causes
Local adapter mal-functioned
Local adapter lost connection to network
Local adapter mis-configured
Recommended Actions
Verify adapter configuration
Verify network connectivity
Detail Data
DETECTING MODULE
rsct,nim_control.C,1.39.1.39,6703
ERROR ID
6zV5DL.G/iyH/tE.0J66e.1...................
REFERENCE CODE
Adapter interface name
en13
Adapter offset
2
Adapter IP address
xx.xx.xx.xx
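The error is not conclusive. If the network itself had failed, there would normally be explicit adapter errors. en13 is an EtherChannel built from two other network adapters, but nothing here clearly points at it, and there is no explicit link-down error.
After analysis, the likely suspects were the network adapters, the fibre cables, or the switch ports.
So we waited until after business hours, replaced all the fibre cables, and moved to different fibre ports.
We then restarted the database, and it came up without any trouble.
We checked again at noon the next day and everything was normal.
A strange problem!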
Source: "ITPUB Blog", link: http://blog.itpub.net/527318/viewspace-1262820/. If you repost this article, please credit the source; otherwise legal liability will be pursued.