LMON TERMINIATING WITH ERROR ORA-29702
Bug No. | 6028803 | ||
Filed | 02-MAY-2007 | Updated | 12-JUN-2007 |
Product | Oracle Server - Enterprise Edition | Product Version | 9.2.0.8 |
Platform | Linux x86 | Platform Version | RHAS 3 |
Database Version | 9.2.0.8 | Affects Platforms | Port-Specific |
Severity | Minimal Loss of Service | Status | Not a Bug. To Filer |
Base Bug | N/A | Fixed in Product Version | No Data |
Problem statement:
LMON TERMINIATING WITH ERROR ORA-29702
*** 05/02/07 05:40 am ***
TAR:
----
.
PROBLEM:
--------
On 9.2.0.8.0 in Production:
Every now and then LMON is terminating the instance,
the following error occurs:
.
ERROR
-----------------------
ORA-29702 error occurred in Cluster Group Service operation
.
-- Steps To Reproduce:
The issue can be reproduced at will with the following steps:
1. Suddenly the error ORA-29702 appeared.
2. LMON terminated the instance.
.
DIAGNOSTIC ANALYSIS:
--------------------
Alert Log: node#2
-------------
.
Sun Apr 15 00:21:22 2007
Errors in file
/home/oracle/app/product/9.2.0.5.0/rdbms/log/somprd2_lmon_21839.trc:
ORA-29702 : error occurred in Cluster Group Service operation
Sun Apr 15 00:21:22 2007
LMON: terminating instance due to error 29702
Sun Apr 15 00:21:22 2007
System state dump is made for local instance
Sun Apr 15 00:21:27 2007
Instance terminated by LMON, pid = 21839
.
WORKAROUND:
-----------
N/A
.
RELATED BUGS:
-------------
.
REPRODUCIBILITY:
----------------
occurs every now and then.
.
TEST CASE:
----------
N/A
.
STACK TRACE:
------------
.
SUPPORTING INFORMATION:
-----------------------
.
24 HOUR CONTACT INFORMATION FOR P1 BUGS:
----------------------------------------
.
DIAL-IN INFORMATION:
--------------------
.
IMPACT DATE:
------------
.
*** 05/02/07 05:41 am ***
Log File: cm.log on node#2
-----------
.
>WARNING: ReadCommPort: socket closed by peer on recv()., tid = 850771996
file = unixinc.c, line = 833 {Sun Apr 15 00:19:16 2007 }
>WARNING: ReadCommPort: socket closed by peer on recv()., tid = 850788369
file = unixinc.c, line = 833 {Sun Apr 15 00:19:52 2007 }
>WARNING: ReadCommPort: socket closed by peer on recv()., tid = 850804764
file = unixinc.c, line = 833 {Sun Apr 15 00:19:53 2007 }
>WARNING: ReadCommPort: socket closed by peer on recv()., tid = 850821137
file = unixinc.c, line = 833 {Sun Apr 15 00:20:41 2007 }
>WARNING: ReadCommPort: socket closed by peer on recv()., tid = 850837532
file = unixinc.c, line = 833 {Sun Apr 15 00:20:41 2007 }
>WARNING: ReadCommPort: socket closed by peer on recv()., tid = 850853905
file = unixinc.c, line = 833 {Sun Apr 15 00:20:51 2007 }
>WARNING: ReadCommPort: socket closed by peer on recv()., tid = 850870300
file = unixinc.c, line = 833 {Sun Apr 15 00:20:51 2007 }
>WARNING: ReadCommPort: received error=104 on recv()., tid = 756793359 file
= unixinc.c, line = 841 {Sun Apr 15 00:21:22 2007 }
>WARNING: ReadCommPort: socket closed by peer on recv()., tid = 848773149
file = unixinc.c, line = 833 {Sun Apr 15 00:21:22 2007 }
>WARNING: ReadCommPort: socket closed by peer on recv()., tid = 850116629
file = unixinc.c, line = 833 {Sun Apr 15 00:21:22 2007 }
>WARNING: ReadCommPort: socket closed by peer on recv()., tid = 757022747
file = unixinc.c, line = 833 {Sun Apr 15 00:21:22 2007 }
>WARNING: ReadCommPort: socket closed by peer on recv()., tid = 756989978
file = unixinc.c, line = 833 {Sun Apr 15 00:21:22 2007 }
>WARNING: ReadCommPort: socket closed by peer on recv()., tid = 784973857
file = unixinc.c, line = 833 {Sun Apr 15 00:21:22 2007 }
>WARNING: ReadCommPort: socket closed by peer on recv()., tid = 757055504
file = unixinc.c, line = 833 {Sun Apr 15 00:21:22 2007 }
>WARNING: ReadCommPort: socket closed by peer on recv()., tid = 847069206
file = unixinc.c, line = 833 {Sun Apr 15 00:21:22 2007 }
>WARNING: ReadCommPort: socket closed by peer on recv()., tid = 816824340
file = unixinc.c, line = 833 {Sun Apr 15 00:21:22 2007 }
>WARNING: ReadCommPort: socket closed by peer on recv()., tid = 756973593
file = unixinc.c, line = 833 {Sun Apr 15 00:21:22 2007 }
>WARNING: ReadCommPort: socket closed by peer on recv()., tid = 757317642
file = unixinc.c, line = 833 {Sun Apr 15 00:21:22 2007 }
>WARNING: ReadCommPort: socket closed by peer on recv()., tid = 756957208
file = unixinc.c, line = 833 {Sun Apr 15 00:21:22 2007 }
>WARNING: ReadCommPort: socket closed by peer on recv()., tid = 757006345
file = unixinc.c, line = 833 {Sun Apr 15 00:21:22 2007 }
>WARNING: ReadCommPort: socket closed by peer on recv()., tid = 757039122
file = unixinc.c, line = 833 {Sun Apr 15 00:21:2 2 2007 }
>WARNING: ReadCommPort: socket closed by peer on recv()., tid = 757088275
file = unixinc.c, line = 833 {Sun Apr 15 00:21:22 2007 }
>ERROR: WriteEventPort: write failed with error 32., tid = 757088275 file
= unixinc.c, line = 981 {Sun Apr 15 00:21:22 2007 }
>WARNING: ReadCommPort: socket closed by peer on recv()., tid = 756940823
file = unixinc.c, line = 833 {Sun Apr 15 00:21:22 2007 }
>WARNING: ReadCommPort: socket closed by peer on recv()., tid = 756760590
file = unixinc.c, line = 833 {Sun Apr 15 00:21:22 2007 }
>WARNING: ReadCommPort: socket closed by peer on recv()., tid = 850886665
file = unixinc.c, line = 833 {Sun Apr 15 05:54:16 2007 }
>WARNING: ReadCommPort: socket closed by peer on recv()., tid = 850903050
file = unixinc.c, line = 833 {Sun Apr 15 05:54:16 2007 }
*** 05/02/07 05:42 am ***
.
.
libskgxn9: node#1:
-rwxr-xr-x 1 oracle oinstall 65236 Apr 8 06:34
/home/oracle/app/product/9.2.0.5.0/lib/libskgxn9.so
.
libskgxn9: node#2:
-rwxr-xr-x 1 oracle oinstall 64274 Mar 11 2004
/home/oracle/app/product/9.2.0.5.0/lib/libskgxn9.so
.
ls -al $ORACLE_HOME/bin/ora* : node#1
-rwxr-xr-x 1 oracle oinstall 46 Nov 20 2001
/home/oracle/app/product/9.2.0.5.0/bin/oracg
-rwsr-s--x 1 oracle oinstall 53734840 Apr 8 07:23
/home/oracle/app/product/9.2.0.5.0/bin/oracle
-rwsr-s--x 1 oracle oinstall 53734840 Apr 8 07:20
/home/oracle/app/product/9.2.0.5.0/bin/oracleO
-rwxr-xr-x 1 oracle oinstall 2548 Jul 26 2003
/home/oracle/app/product/9.2.0.5.0/bin/oraenv
.
.
ls -al $ORACLE_HOME/bin/ora* : node#2
-rwxr-xr-x 1 oracle oinstall 46 Nov 20 2001
/home/oracle/app/product/9.2.0.5.0/bin/oracg
-rwsr-s--x 1 oracle oinstall 53734840 Apr 8 07:23
/home/oracle/app/product/9.2.0.5.0/bin/oracle
-rwsr-s--x 1 oracle oinstall 53735479 Apr 8 06:35
/home/oracle/app/product/9.2.0.5.0/bin/oracleO
-rwsr-s--x 1 oracle oinstall 52999216 May 15 2005
/home/oracle/app/product/9.2.0.5.0/bin/oracleO.bak
-rwxr-xr-x 1 oracle oinstall 2548 Jul 26 2003
/home/oracle/app/product/9.2.0.5.0/bin/oraenv
*** 05/09/07 08:26 am *** (CHG: Sta->16)
*** 05/09/07 08:26 am ***
uploaded the following files:
RDA.RDA_somprd1.zip
RDA.RDA_somprd2.zip
alert_traces.zip
.
Thanks in advance
*** 05/09/07 09:13 pm *** (CHG: Sta->10)
*** 05/09/07 09:14 pm ***
*** 05/14/07 05:09 am ***
This is what happenened in a previous time. The same ORA-29702 happened, it
can be seen in LMON trace log:
*** SESSION ID:(3.1) 2007-04-13 10:45:27.324
GES IPC: Receivers 3 Senders 3
GES IPC: Buffers Receive 1000 Send (i:660 b:660) Reserve 430
GES IPC: Msg Size Regular 396 Batch 2048
Batch msg size = 2048
Batching factor: enqueue replay 48, ack 53
Batching factor: cache replay 34 size per lock 56
kjxggin: receive buffer size = 32768
kjxgmin: SKGXN ver (2 1 Oracle 9i Reference CM)
CMCLI WARNING: CMInitContext: init ctx(0xccfb358)
*** 2007-04-13 10:45:31.439
kjxgmrcfg: Reconfiguration started, reason 1
kjxgmcs: Setting state to 0 0.
*** 2007-04-13 10:45:31.441
Name Service frozen
kjxgmcs: Setting state to 0 1.
kjfcpiora: publish my weight 62874
kjxgmps: proposing substate 2
kjxgmcs: Setting state to 6 2.
Performed the unique instance identification check
kjxgmps: proposing substate 3
kjxgmcs: Setting state to 6 3.
Name Service recovery started
Deleted all dead-instance name entries
kjxgmps: proposing substate 4
kjxgmcs: Setting state to 6 4.
Multicasted all local name entries for publish
Replayed all pending requests
kjxgmps: proposing substate 5
kjxgmcs: Setting state to 6 5.
Name Service normal
Name Service recovery done
*** 2007-04-13 10:45:32.808
kjxgmps: proposing substate 6
kjxgmcs: Setting state to 6 6.
*** 2007-04-13 10:45:32.920
*** 2007-04-13 10:45:32.920
Reconfiguration started (old inc 0, new inc 6)
Synchronization timeout interval: 660 sec
List of nodes:
0 1
Global Resource Directory frozen
node 0
release 9 2 0 8
node 1
release 9 2 0 8
res_master_weight for node 0 is 62874
res_master_weight for node 1 is 62874
Total master weight = 125748
Dead inst
Join inst 0 1
Exist inst
Active Sendback Threshold = 50 %
Communication channels reestablished
Master broadcasted resource hash value bitmaps
Non-local Process blocks cleaned out
Resources and enqueues cleaned out
Resources remastered 0
0 GCS shadows traversed, 0 cancelled, 0 closed
0 GCS resources traversed, 0 cancelled
set master node info
Submitted all remote-enqueue requests
kjfcrfg: Number of mesgs sent to node 0 = 0
Update rdomain variables
Dwn-cvts replayed, VALBLKs dubious
All grantable enqueues granted
*** 2007-04-13 10:45:34.508
0 GCS shadows traversed, 0 replayed, 0 unopened
Submitted all GCS cache requests
0 write requests issued in 31086 GCS resources
7 PIs marked suspect, 0 flush PI msgs
*** 2007-04-13 10:45:34.775
Reconfiguration complete
*** 2007-04-13 10:45:39.636
kjxgrtmc2: Member 1 thread 2 mounted
CMCLI WARNING: ReadCommPort: poll() failed
kjxggpoll: received an error event from DBALL_DB
Return code from kjxggpoll: 10
error 29702 detected in background process
ORA-29702: error occurred in Cluster Group Service operation
ksuitm: waiting for [5] seconds before killing DIAG
.
.
--
The concern here that when checking the libskgxn9.so library on both nodes,
they are not the same. So may be when the reconfiguration happened the issue
appeared.
.
The same time the ORA-29702 appeared in alert log these errors appeared in
cm.log:
>WARNING: ReadCommPort: received error=104 on recv()., tid = 756793359 file
= unixinc.c, line = 841 {Sun Apr 15 00:21:22 2007 }
>ERROR: WriteEventPort: write failed with error 32., tid = 757088275 file
= unixinc.c, line = 981 {Sun Apr 15 00:21:22 2007 }
.
This happened in the node which has a wrong libskgxn9.so library.
.
Thanks
*** 05/14/07 05:10 am *** (CHG: Sta->16)
*** 05/23/07 07:24 am ***
Hi,
Could you please provide an update?
.
Thanks
Amr
*** 05/30/07 10:45 am *** (CHG: G/P->P Asg->NEW OWNER)
*** 05/30/07 10:45 am ***
*** 06/04/07 06:42 am *** (CHG: Asg->NEW OWNER)
*** 06/04/07 06:42 am ***
*** 06/04/07 08:34 am ***
*** 06/04/07 08:34 am *** (CHG: Sta->10)
*** 06/11/07 08:43 am *** (CHG: Sta->16)
*** 06/11/07 08:43 am ***
*** 06/12/07 07:51 am ***
*** 06/12/07 07:52 am *** (CHG: Sta->32)
來自 “ ITPUB部落格 ” ,連結:http://blog.itpub.net/35489/viewspace-996629/,如需轉載,請註明出處,否則將追究法律責任。
相關文章
- LMON: terminating instance due to error 29702 -- ORA-29702Error
- ORA-29702: error occurred in Cluster Group Service operationError
- ORA-29702:error occurred in Cluster Group Service operation錯誤解決Error
- [總結]9i RAC LMON: terminating instance due to error 29702Error
- [總結]9i RAC LMON: terminating instance due to error 29702Error
- NON-RAC Database Startup Giving Error ORA-29702 (Doc ID 433310.1)DatabaseError
- Oracle9.2.0.4 RAC 升級到Oracle9.2.0.7 ,LMON: terminating instance due to error 29702OracleError
- 基於oracle 10.2.0.1 rac學習lmon程式系列六Oracle
- ORA-29702複製RAC Oracle軟體啟動單例項Oracle單例
- 基於oracle 10.2.0.1 rac學習lmon程式系列六之增補一Oracle
- rac節點無法啟動ORA-29702的問題及分析
- RAC環境單例項啟動資料庫收到ORA-29702報錯單例資料庫
- [重慶思莊技術分享]-資料庫啟動報:ORA-29702錯誤資料庫
- mount error(5): Input/output errorError
- 【ERROR】OPatch failed with error code 73ErrorAI
- error:slave communication error with ASMErrorASM
- ERROR: slave communication error with ASMErrorASM
- MMON encountered error 959, clearing the errorError
- 如何解決"Parse error: syntax error"Error
- DB error due to HP-UX Error:23ErrorUX
- CRS ERROR - PRKC-1073 - ErrorError
- Error page: / Error infos: DedeCms錯誤警告Error
- Error for iOSErroriOS
- error事件Error事件
- Error: no such columnError
- DepthClipEnable errorError
- Error StackError
- innobackupex: Error: noError
- ERROR 1045 (28000): ProxySQL Error: 報錯ErrorSQL
- Original error: Error: socket hang upError
- ERROR 2026 (HY000): SSL connection error: unknown error numberError
- Golang 學習——error 和建立 error 原始碼解析GolangError原始碼
- Setup had an error Error: At least one of these paths should existErrorAST
- MySQL ERROR Got an error reading communication packetsMySqlErrorGo
- [ERROR] InnoDB: Unable to lock ./ibdata1, error: 11Error
- Install Error : ERROR:SQLException null Resultset [ID 875002.1]ErrorSQLExceptionNull
- FATAL ERROR IN TWO-TASK SERVER: error = 12569ErrorServer
- Last_IO_Error: Got fatal error 1236ASTErrorGo