Oracle 9i RAC: Handling an "IPC Send timeout detected" Problem
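Recently I helped a customer expand the capacity of an IBM P780.
After the expansion, the first node of the core database started normally, but the second node was extremely slow to start: it took about an hour before the database opened. This appeared to be caused by the customer shutting the database down abnormally.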
Sun Aug 24 12:30:20 2014
SMON: enabling cache recovery
Sun Aug 24 12:30:20 2014
ARC0: Evaluating archive log 5 thread 2 sequence 9860
ARC0: Unable to archive log 5 thread 2 sequence 9860
Log actively being archived by another process
Sun Aug 24 12:30:21 2014
ARC1: Completed archiving log 5 thread 2 sequence 9860
Sun Aug 24 12:37:01 2014
Successfully onlined Undo Tablespace 6.
Sun Aug 24 12:37:01 2014
SMON: enabling tx recovery
Sun Aug 24 12:37:28 2014
Database Characterset is US7ASCII
Sun Aug 24 13:00:04 2014
replication_dependency_tracking turned off (no async multimaster replication found)
Sun Aug 24 13:07:10 2014
Waited too long for library cache load lock. More info in file /oracle/app/oracle/admin/xxxx/udump/xxxx2_ora_925808.trc.
Sun Aug 24 13:08:51 2014
Waited too long for library cache load lock. More info in file /oracle/app/oracle/admin/xxxx/udump/xxxx2_ora_966808.trc.
Sun Aug 24 13:09:25 2014
Waited too long for library cache load lock. More info in file /oracle/app/oracle/admin/xxxx/udump/xxxx2_ora_995504.trc.
Sun Aug 24 13:09:25 2014
Waited too long for library cache load lock. More info in file /oracle/app/oracle/admin/xxxx/udump/xxxx2_ora_970794.trc.
Sun Aug 24 13:09:51 2014
Completed: alter database open
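To see where the startup time went, the timestamps in the alert log above can be diffed. The following is a minimal Python sketch, not part of the original troubleshooting. The names `ALERT_LOG`, `parse_events` and `gaps` are hypothetical, and the excerpt is trimmed to a few of the messages shown above.

```python
from datetime import datetime

# Trimmed excerpt of the alert log above: a timestamp line, then its message.
ALERT_LOG = """\
Sun Aug 24 12:30:20 2014
SMON: enabling cache recovery
Sun Aug 24 12:37:01 2014
SMON: enabling tx recovery
Sun Aug 24 12:37:28 2014
Database Characterset is US7ASCII
Sun Aug 24 13:00:04 2014
replication_dependency_tracking turned off (no async multimaster replication found)
Sun Aug 24 13:09:51 2014
Completed: alter database open
"""

TS_FORMAT = "%a %b %d %H:%M:%S %Y"  # e.g. "Sun Aug 24 12:30:20 2014"


def parse_events(text):
    """Return (timestamp, message) pairs; a message gets the last timestamp seen."""
    events, current = [], None
    for line in text.splitlines():
        line = line.strip()
        try:
            current = datetime.strptime(line, TS_FORMAT)
        except ValueError:
            # Not a timestamp line, so treat it as a message.
            if current is not None and line:
                events.append((current, line))
    return events


def gaps(events):
    """Return (seconds, earlier message, later message) for consecutive events."""
    return [
        ((b[0] - a[0]).total_seconds(), a[1], b[1])
        for a, b in zip(events, events[1:])
    ]


if __name__ == "__main__":
    for seconds, before, after in gaps(parse_events(ALERT_LOG)):
        print(f"{seconds:7.0f}s  {before}  ->  {after}")
```

On this excerpt the longest silent stretch is the 1356 seconds between the character set message and the replication_dependency_tracking message. The log itself does not say what the instance was doing during that time.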
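On Monday morning the customer reported that traffic on the Oracle heartbeat (interconnect) network was much higher than usual. We monitored it for a while and saw rates of 1 MB/s to 4 MB/s. Since this is a 10Gb fiber network, that rate looked normal to me. Then, a little after 10:00, the customer reported that node 2 had gone down.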
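As a rough check on the "this rate is normal" judgment, the observed throughput can be compared with the link capacity. This is only a sketch under two assumptions: that the link is 10 Gbit/s and that decimal units apply (1 MB = 1,000,000 bytes). The function name `utilization_percent` is hypothetical.

```python
def utilization_percent(rate_mbytes_per_s, link_gbits_per_s):
    """Share of link capacity used, in percent (decimal units assumed)."""
    bits_per_s = rate_mbytes_per_s * 8 * 1_000_000
    capacity_bits_per_s = link_gbits_per_s * 1_000_000_000
    return 100.0 * bits_per_s / capacity_bits_per_s


if __name__ == "__main__":
    for rate in (1, 4):
        print(f"{rate} MB/s on a 10 Gbit/s link: {utilization_percent(rate, 10):.2f}%")
```

Under these assumptions even 4 MB/s is well below 1% of the link, so bandwidth saturation is unlikely. Low utilization says nothing about packet loss or link flapping, though, which is what the rest of this post ends up suspecting.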
ARC0: Completed archiving log 5 thread 2 sequence 9864
Mon Aug 25 10:07:15 2014
IPC Send timeout detected. Sender ospid 2961608
Mon Aug 25 10:07:17 2014
Communications reconfiguration: instance 0
Mon Aug 25 10:07:17 2014
Trace dumping is performing id=[cdmp_20140825100717]
Mon Aug 25 10:09:02 2014
Waiting for clusterware split-brain resolution
Mon Aug 25 10:19:02 2014
USER: terminating instance due to error 481
Mon Aug 25 10:19:02 2014
Errors in file /oracle/app/oracle/admin/xxxx/bdump/xxxx2_lmon_1118210.trc:
ORA-29740: evicted by member 1, group incarnation 7
Mon Aug 25 10:19:35 2014
Trace dumping is performing id=[cdmp_20140825101905]
DIAG: terminating instance due to error 1092
Instance terminated by DIAG, pid = 1179834
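The eviction sequence is easier to read as elapsed times between key messages. Below is a small Python sketch that looks up the timestamp of the first line containing a given substring. The names `EVICTION_LOG` and `first_seen` are hypothetical, and the excerpt keeps only four of the messages above.

```python
from datetime import datetime

# Trimmed excerpt of the node 2 alert log above.
EVICTION_LOG = """\
Mon Aug 25 10:07:15 2014
IPC Send timeout detected. Sender ospid 2961608
Mon Aug 25 10:07:17 2014
Communications reconfiguration: instance 0
Mon Aug 25 10:09:02 2014
Waiting for clusterware split-brain resolution
Mon Aug 25 10:19:02 2014
USER: terminating instance due to error 481
"""


def first_seen(text, needle):
    """Return the timestamp in effect for the first message containing needle."""
    current = None
    for line in text.splitlines():
        try:
            current = datetime.strptime(line.strip(), "%a %b %d %H:%M:%S %Y")
        except ValueError:
            # Message line: check it against the search string.
            if needle in line:
                return current
    return None


if __name__ == "__main__":
    timeout = first_seen(EVICTION_LOG, "IPC Send timeout detected")
    waiting = first_seen(EVICTION_LOG, "split-brain resolution")
    killed = first_seen(EVICTION_LOG, "terminating instance due to error 481")
    print("timeout -> termination:", (killed - timeout).total_seconds(), "s")
    print("split-brain wait -> termination:", (killed - waiting).total_seconds(), "s")
```

For this log the instance was terminated 707 seconds after the first IPC Send timeout, and exactly 600 seconds after it started waiting for split-brain resolution. The round figure suggests a fixed wait, but that is my reading of the timestamps, not something the log states.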
Checking the AIX errpt error log:
# errpt |more
IDENTIFIER TIMESTAMP  T C RESOURCE_NAME  DESCRIPTION
173C787F   0825150514 I S topsvcs        Possible malfunction on local adapter
173C787F   0824122814 I S topsvcs        Possible malfunction on local adapter
173C787F   0824122714 I S topsvcs        Possible malfunction on local adapter
1BA7DF4E   0824111914 P S SRC            SOFTWARE PROGRAM ERROR
CB4A951F   0824111914 I S SRC            SOFTWARE PROGRAM ERROR
CB4A951F   0824111914 I S SRC            SOFTWARE PROGRAM ERROR
There are three "Possible malfunction on local adapter" entries.
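The errpt TIMESTAMP column uses the MMDDhhmmYY layout, which makes it awkward to compare with the alert log by eye. The Python sketch below decodes it and lists the adapter entries. The names `ERRPT_ROWS`, `parse_errpt_timestamp` and `adapter_events` are hypothetical, and the rows are copied from the summary output above.

```python
from datetime import datetime

# Rows copied from the errpt summary above (header line omitted).
ERRPT_ROWS = """\
173C787F 0825150514 I S topsvcs Possible malfunction on local adapter
173C787F 0824122814 I S topsvcs Possible malfunction on local adapter
173C787F 0824122714 I S topsvcs Possible malfunction on local adapter
1BA7DF4E 0824111914 P S SRC SOFTWARE PROGRAM ERROR
CB4A951F 0824111914 I S SRC SOFTWARE PROGRAM ERROR
CB4A951F 0824111914 I S SRC SOFTWARE PROGRAM ERROR
"""


def parse_errpt_timestamp(stamp):
    """Decode an errpt summary timestamp in MMDDhhmmYY form."""
    return datetime.strptime(stamp, "%m%d%H%M%y")


def adapter_events(text, identifier="173C787F"):
    """Return decoded timestamps of rows with the given error identifier."""
    found = []
    for line in text.splitlines():
        # Six columns; the description may contain spaces, so cap the split.
        fields = line.split(None, 5)
        if len(fields) == 6 and fields[0] == identifier:
            found.append(parse_errpt_timestamp(fields[1]))
    return found


if __name__ == "__main__":
    for when in adapter_events(ERRPT_ROWS):
        print(when.strftime("%Y-%m-%d %H:%M"))
```

Decoded this way, two of the adapter entries fall at 12:27 and 12:28 on Aug 24, just before the slow startup that the alert log records from 12:30. That fits a network problem on the interconnect, although the timing alone does not prove it.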
# errpt -aj 173C787F
---------------------------------------------------------------------------
LABEL: TS_LOC_DOWN_ST
IDENTIFIER: 173C787F
Date/Time: Mon Aug 25 15:05:54 BEIST 2014
Sequence Number: 43985
Machine Id: 00F880244C00
Node Id: yxyxyx2
Class: S
Type: INFO
Resource Name: topsvcs
Description
Possible malfunction on local adapter
Probable Causes
Local adapter mal-functioned
Local adapter lost connection to network
Local adapter mis-configured
Failure Causes
Local adapter mal-functioned
Local adapter lost connection to network
Local adapter mis-configured
Recommended Actions
Verify adapter configuration
Verify network connectivity
Detail Data
DETECTING MODULE
rsct,nim_control.C,1.39.1.39,6703
ERROR ID
6zV5DL.G/iyH/tE.0J66e.1...................
REFERENCE CODE
Adapter interface name
en13
Adapter offset
2
Adapter IP address
xx.xx.xx.xx
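The error is not conclusive. If the network were really at fault, I would expect explicit network adapter errors. en13 is an EtherChannel interface built by bonding two other adapters, but nothing that explicit shows up here: there is no clear link-down error.
After analysis, the likely culprits were the network adapter, the fiber, or the switch port.
So we waited until after business hours, replaced all the fiber cables and moved to different fiber ports,
then restarted the database, and it came up without any trouble.
I checked again at noon the next day and everything was normal.
A strange problem!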
From the ITPUB blog, link: http://blog.itpub.net/527318/viewspace-1262820/. If you repost this, please credit the source; otherwise legal action may be pursued.