IPC Send timeout detected. Receiver ospid 25822

Posted by tolywang on 2010-02-13

NODE1 alert log message:

Thu Feb 11 16:12:16 2010
Thread 1 advanced to log sequence 8254 (LGWR switch)
  Current log# 28 seq# 8254 mem# 0: /ocfs_ctrl_redo/mxdell/redo28.log
  Current log# 28 seq# 8254 mem# 1: /ocfs_data/mxdell/redo28.log
Thu Feb 11 16:30:02 2010
IPC Send timeout detected. Receiver ospid 25822
Thu Feb 11 16:30:02 2010
Errors in file /u01/product/admin/mxdell/udump/mxdell1_ora_25822.trc:
IPC Send timeout detected. Receiver ospid 25822
Thu Feb 11 16:30:03 2010
Errors in file /u01/product/admin/mxdell/udump/mxdell1_ora_25822.trc:
Thu Feb 11 16:34:05 2010
Thread 1 advanced to log sequence 8255 (LGWR switch)
  Current log# 30 seq# 8255 mem# 0: /ocfs_ctrl_redo/mxdell/redo30.log
  Current log# 30 seq# 8255 mem# 1: /ocfs_data/mxdell/redo30.log
Thu Feb 11 16:46:21 2010
Thread 1 advanced to log sequence 8256 (LGWR switch)
  Current log# 29 seq# 8256 mem# 0: /ocfs_ctrl_redo/mxdell/redo29.log
  Current log# 29 seq# 8256 mem# 1: /ocfs_data/mxdell/redo29.log


----------------------------------------------------------------------------------------------------------


NODE3 alert log message:

Thu Feb 11 16:12:15 2010
Thread 3 advanced to log sequence 7401 (LGWR switch)
  Current log# 21 seq# 7401 mem# 0: /ocfs_ctrl_redo/mxdell/redo21.log
  Current log# 21 seq# 7401 mem# 1: /ocfs_data/mxdell/redo21.log
Thu Feb 11 16:22:11 2010
Thread 3 advanced to log sequence 7402 (LGWR switch)
  Current log# 22 seq# 7402 mem# 0: /ocfs_ctrl_redo/mxdell/redo22.log
  Current log# 22 seq# 7402 mem# 1: /ocfs_data/mxdell/redo22.log
Thu Feb 11 16:30:02 2010
IPC Send timeout detected.Sender: ospid 15764
Receiver: inst 1 binc 6 ospid 25822
Thu Feb 11 16:30:02 2010
IPC Send timeout detected.Sender: ospid 15766
Receiver: inst 1 binc 6 ospid 25822
Thu Feb 11 16:34:41 2010
Thread 3 advanced to log sequence 7403 (LGWR switch)
  Current log# 23 seq# 7403 mem# 0: /ocfs_ctrl_redo/mxdell/redo23.log
  Current log# 23 seq# 7403 mem# 1: /ocfs_data/mxdell/redo23.log

----------------------------------------------------------------------------------------------------------

NODE4 alert log message:

Thu Feb 11 15:57:24 2010
Thread 4 advanced to log sequence 3193 (LGWR switch)
  Current log# 8 seq# 3193 mem# 0: /ocfs_ctrl_redo/mxdell/redo08_1.log
  Current log# 8 seq# 3193 mem# 1: /ocfs_data/mxdell/redo08_2.log
Thu Feb 11 16:12:16 2010
Thread 4 advanced to log sequence 3194 (LGWR switch)
  Current log# 9 seq# 3194 mem# 0: /ocfs_ctrl_redo/mxdell/redo09_1.log
  Current log# 9 seq# 3194 mem# 1: /ocfs_data/mxdell/redo09_2.log
Thu Feb 11 16:30:02 2010
IPC Send timeout detected.Sender: ospid 6046
Receiver: inst 1 binc 6 ospid 25822
Thu Feb 11 16:30:02 2010
IPC Send timeout detected.Sender: ospid 6071
Receiver: inst 1 binc 6 ospid 25822
Thu Feb 11 16:30:02 2010
IPC Send timeout detected.Sender: ospid 6003
Receiver: inst 1 binc 6 ospid 25822
Thu Feb 11 16:30:02 2010
IPC Send timeout detected.Sender: ospid 6035
Receiver: inst 1 binc 6 ospid 25822
Thu Feb 11 16:30:02 2010
IPC Send timeout detected.Sender: ospid 6016
Receiver: inst 1 binc 6 ospid 25822
Thu Feb 11 16:30:03 2010
IPC Send timeout detected.Sender: ospid 6067
Receiver: inst 1 binc 6 ospid 25822
Thu Feb 11 16:42:08 2010
Thread 4 advanced to log sequence 3195 (LGWR switch)
  Current log# 10 seq# 3195 mem# 0: /ocfs_ctrl_redo/mxdell/redo10_1.log
  Current log# 10 seq# 3195 mem# 1: /ocfs_data/mxdell/redo10_2.log
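
The three alert logs above are easier to compare once the sender/receiver pairs are pulled out. The following is a minimal Python sketch, not an Oracle tool: the function names `parse_ipc_timeouts` and `count_by_receiver` are hypothetical, and the regular expressions assume the exact two-line message layout seen in the NODE3 and NODE4 logs. The sample holds only a few lines copied from those logs.

```python
import re
from collections import Counter

# A few lines copied from the NODE3 and NODE4 alert logs above.
SAMPLE = """\
Thu Feb 11 16:30:02 2010
IPC Send timeout detected.Sender: ospid 15764
Receiver: inst 1 binc 6 ospid 25822
Thu Feb 11 16:30:02 2010
IPC Send timeout detected.Sender: ospid 6046
Receiver: inst 1 binc 6 ospid 25822
"""

# Assumed layout: the sender is on the "IPC Send timeout" line,
# the receiver on the line that follows it.
SENDER_RE = re.compile(r"IPC Send timeout detected\.\s*Sender: ospid (\d+)")
RECEIVER_RE = re.compile(r"Receiver: inst (\d+) binc (\d+) ospid (\d+)")


def parse_ipc_timeouts(text):
    """Return (sender_ospid, receiver_inst, receiver_ospid) tuples."""
    events = []
    pending_sender = None
    for line in text.splitlines():
        match = SENDER_RE.search(line)
        if match:
            pending_sender = int(match.group(1))
            continue
        match = RECEIVER_RE.search(line)
        if match and pending_sender is not None:
            events.append((pending_sender, int(match.group(1)), int(match.group(3))))
            pending_sender = None
    return events


def count_by_receiver(events):
    """Count timed-out sends per (receiver instance, receiver ospid)."""
    return Counter((inst, ospid) for _, inst, ospid in events)


if __name__ == "__main__":
    for (inst, ospid), n in count_by_receiver(parse_ipc_timeouts(SAMPLE)).most_common():
        print(f"receiver inst {inst} ospid {ospid}: {n} timed-out sends")
```

Run against the full NODE3 and NODE4 logs, a tally like this should show every timed-out send pointing at the same receiver, instance 1 ospid 25822, which is the process whose trace file is shown for NODE1.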

 

----------------------------------------------------------------------------------------------------------


NODE1 trace file message:

/u01/product/admin/mxdell/udump/mxdell1_ora_25822.trc
Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bit Production
With the Partitioning, Real Application Clusters, OLAP, Data Mining
and Real Application Testing options
ORACLE_HOME = /u01/product/oracle
System name:    Linux
Node name:      mxrac01
Release:        2.6.18-128.el5
Version:        #1 SMP Wed Dec 17 11:41:38 EST 2008
Machine:        x86_64
Instance name: mxdell1
Redo thread mounted by this instance: 1
Oracle process number: 583
Unix process pid: 25822, image:

*** 2010-02-11 16:24:05.672
*** ACTION NAME:() 2010-02-11 16:24:05.672
*** MODULE NAME:(TOAD 9.0.1.8) 2010-02-11 16:24:05.672
*** SERVICE NAME:(mxdell) 2010-02-11 16:24:05.672
*** SESSION ID:(2015.6536) 2010-02-11 16:24:05.672
SKGXPSEGRCV: MESSAGE TRUNCATED user data 48 bytes payload 2104 bytes
SKGXPSEGRCV: trucated message buffer data skgxpmsg meta. data header 0x0x7fff7775a678 len 48 bytes
SKGXPLOSTACK: message truncation expected
SKGXPLOSTACK: data sent to port with no buffers queued from
SKGXPGPID 0x7fff7775a7c8        Internet address 192.168.1.14   UDP port number 17506
SKGXPLOSTACK: sent seq 32787 expecting 32788
SKGXPLOSTACK: lost ack detected retransmit ack
SKGXPSEGRCV: MESSAGE TRUNCATED user data 48 bytes payload 2104 bytes
SKGXPSEGRCV: trucated message buffer data skgxpmsg meta. data header 0x0x7fff7775a678 len 48 bytes
SKGXPLOSTACK: message truncation expected
SKGXPLOSTACK: data sent to port with no buffers queued from
SKGXPGPID 0x7fff7775a7c8        Internet address 192.168.1.14   UDP port number 13121
SKGXPLOSTACK: sent seq 32787 expecting 32788
SKGXPLOSTACK: lost ack detected retransmit ack
SKGXPSEGRCV: MESSAGE TRUNCATED user data 48 bytes payload 2104 bytes
SKGXPSEGRCV: trucated message buffer data skgxpmsg meta. data header 0x0x7fff7775a678 len 48 bytes
SKGXPLOSTACK: message truncation expected
SKGXPLOSTACK: data sent to port with no buffers queued from
SKGXPGPID 0x7fff7775a7c8        Internet address 192.168.1.14   UDP port number 30817
SKGXPLOSTACK: sent seq 32787 expecting 32788
SKGXPLOSTACK: lost ack detected retransmit ack
SKGXPSEGRCV: MESSAGE TRUNCATED user data 48 bytes payload 2104 bytes
SKGXPSEGRCV: trucated message buffer data skgxpmsg meta. data header 0x0x7fff7775a678 len 48 bytes
SKGXPLOSTACK: message truncation expected
SKGXPLOSTACK: data sent to port with no buffers queued from
SKGXPGPID 0x7fff7775a7c8        Internet address 192.168.1.14   UDP port number 47606
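
The trace repeats the same block many times, so a summary of how many truncated messages were logged and which peer address and UDP ports they came from can be easier to read. This is a rough Python sketch under the assumption that every peer line has the `Internet address ... UDP port number ...` shape shown above; `summarize_trace` is a hypothetical helper name, and the sample is two blocks copied from the trace.

```python
import re
from collections import defaultdict

# Two blocks copied from the NODE1 trace file above.
SAMPLE = """\
SKGXPSEGRCV: MESSAGE TRUNCATED user data 48 bytes payload 2104 bytes
SKGXPLOSTACK: data sent to port with no buffers queued from
SKGXPGPID 0x7fff7775a7c8        Internet address 192.168.1.14   UDP port number 17506
SKGXPLOSTACK: sent seq 32787 expecting 32788
SKGXPSEGRCV: MESSAGE TRUNCATED user data 48 bytes payload 2104 bytes
SKGXPLOSTACK: data sent to port with no buffers queued from
SKGXPGPID 0x7fff7775a7c8        Internet address 192.168.1.14   UDP port number 13121
SKGXPLOSTACK: sent seq 32787 expecting 32788
"""

PEER_RE = re.compile(r"Internet address (\S+)\s+UDP port number (\d+)")
TRUNC_RE = re.compile(r"MESSAGE TRUNCATED user data (\d+) bytes payload (\d+) bytes")


def summarize_trace(text):
    """Return (truncated_message_count, {peer_address: sorted UDP ports})."""
    ports_by_address = defaultdict(set)
    truncated = 0
    for line in text.splitlines():
        if TRUNC_RE.search(line):
            truncated += 1
        match = PEER_RE.search(line)
        if match:
            ports_by_address[match.group(1)].add(int(match.group(2)))
    return truncated, {addr: sorted(ports) for addr, ports in ports_by_address.items()}


if __name__ == "__main__":
    count, peers = summarize_trace(SAMPLE)
    print(f"{count} truncated messages")
    for addr, ports in peers.items():
        print(f"{addr}: UDP ports {ports}")
```

In the trace shown here, all peer lines carry the address 192.168.1.14 with different UDP ports, so the summary would list one address with several ports. Which node owns that interconnect address is not stated in the post.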

 

=======================================


Link:


 

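IPC Send timeout is a real headache in Oracle 10g RAC. When resources are tight or the network is congested, IPC timeouts can occur; RAC then evicts the problem node, which triggers a round of reconfiguration.

The good news is that Metalink has a patch for 10.2.0.3 that fixes it, and the problem is fixed completely in 10.2.0.4.
A typical error message looks like this: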

Thu Nov 27 11:32:05 2008
IPC Send timeout detected. Receiver ospid 4001974
Thu Nov 27 11:33:08 2008
Trace dumping is performing id=[cdmp_20081127113236]
Thu Nov 27 11:34:37 2008
Errors in file /oracle/app/product/admin/srs/bdump/srs1_lms1_4001974.trc:
Thu Nov 27 11:34:38 2008
Errors in file /oracle/app/product/admin/srs/bdump/srs1_lmon_3977348.trc:
ORA-29740: evicted by member 1, group incarnation 32
Thu Nov 27 11:34:38 2008
LMON: terminating instance due to error 29740
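
To make the sequence in this excerpt concrete, the sketch below measures the time from the first IPC Send timeout to the LMON termination. It is an illustration only: `seconds_to_eviction` is a hypothetical helper, the timestamp format string is an assumption based on the alert-log lines shown here, and `%a`/`%b` parsing assumes the default English (C) locale.

```python
from datetime import datetime

# Lines copied from the example alert log above.
SAMPLE = """\
Thu Nov 27 11:32:05 2008
IPC Send timeout detected. Receiver ospid 4001974
Thu Nov 27 11:34:38 2008
LMON: terminating instance due to error 29740
"""

# Assumed alert-log timestamp layout, e.g. "Thu Nov 27 11:32:05 2008".
TS_FORMAT = "%a %b %d %H:%M:%S %Y"


def parse_timestamp(line):
    """Return a datetime if the line is a timestamp line, else None."""
    try:
        return datetime.strptime(line.strip(), TS_FORMAT)
    except ValueError:
        return None


def seconds_to_eviction(text):
    """Seconds from the first IPC Send timeout to the LMON termination, or None."""
    current_ts = None
    first_timeout = None
    for line in text.splitlines():
        ts = parse_timestamp(line)
        if ts is not None:
            current_ts = ts
            continue
        if current_ts is None:
            continue
        if "IPC Send timeout detected" in line and first_timeout is None:
            first_timeout = current_ts
        elif "terminating instance due to error 29740" in line and first_timeout is not None:
            return (current_ts - first_timeout).total_seconds()
    return None


if __name__ == "__main__":
    print(seconds_to_eviction(SAMPLE))  # 153.0
```

For a log with IPC Send timeouts but no ORA-29740 termination, the helper returns `None`.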

The bug number for this issue is .

In my experience, 10.2.0.3 does hit this problem often, whereas it is rarely seen on 10.2.0.4.

Source: "ITPUB Blog", link: http://blog.itpub.net/35489/viewspace-627246/. If you repost this, please credit the source; otherwise legal action will be pursued.
