ORA-07445: exception encountered: core dump [skgxpdmpctx

Posted by jichengjie on 2014-06-09

Error message: ORA-07445: exception encountered: core dump [skgxpdmpctx()+183] [SIGFPE] [Integer divide by zero] [0x4001DE3F] [] []
Environment: SUSE 9 + Oracle 9.2.0.6 RAC
One instance of a two-node RAC went down during the night.
The alert log shows the following errors:
Thread 2 advanced to log sequence 1764
  Current log# 6 seq# 1764 mem# 0: /opt/oracle/oradata/psm12/redo203.log
Thu May 29 00:51:03 2014
ARC0: Evaluating archive   log 5 thread 2 sequence 1763
ARC0: Beginning to archive log 5 thread 2 sequence 1763
Creating archive destination LOG_ARCHIVE_DEST_1: '/oraarch/arch01/2_1763.dbf'
ARC0: Completed archiving  log 5 thread 2 sequence 1763
Thu Jun  5 23:24:06 2014
Errors in file /opt/oracle/admin/psm12/bdump/psm122_lck0_6154.trc:
ORA-07445: exception encountered: core dump [skgxpdmpctx()+183] [SIGFPE] [Integer divide by zero] [0x4001DE3F] [] []
Thu Jun  5 23:24:08 2014
Errors in file /opt/oracle/admin/psm12/bdump/psm122_pmon_6103.trc:
ORA-00480: LCK* process terminated with error
Thu Jun  5 23:24:08 2014
PMON: terminating instance due to error 480
Thu Jun  5 23:24:08 2014
Errors in file /opt/oracle/admin/psm12/bdump/psm122_lms0_6113.trc:
ORA-00480: LCK* process terminated with error
Thu Jun  5 23:24:08 2014
Errors in file /opt/oracle/admin/psm12/bdump/psm122_lmon_6107.trc:
ORA-00480: LCK* process terminated with error
Instance terminated by PMON, pid = 6103
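The instance could be restarted normally, and no errors were reported during startup. Searching online suggests that this is bug 4059639.
The characteristic error signature is: skgxpdmpctx()+183] [SIGFPE
The bug is reported to be related to the use of the UDP protocol in RAC and to be triggered by network errors.
It mostly occurs in versions 9.2.0.5 and 9.2.0.6
and is reported as fixed in 9.2.0.7.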
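As a quick check of whether an installation falls in the version range reported above, the comparison can be sketched in Python. This is only an illustration: the helper names are hypothetical, and the range 9.2.0.5 to 9.2.0.6 is taken from the online reports mentioned here, not from an authoritative patch list.

```python
def parse_version(text):
    """Turn a dotted version string such as '9.2.0.6.0' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def in_reported_range(version_text):
    """Return True if the first four components lie in 9.2.0.5 .. 9.2.0.6.

    The bounds are an assumption based on the reports cited in this post.
    """
    version = parse_version(version_text)[:4]
    return (9, 2, 0, 5) <= version <= (9, 2, 0, 6)


# Version string as printed in the trace file headers of this instance.
print(in_reported_range("9.2.0.6.0"))
```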
Checking the messages file from that time revealed a network problem:

Jun  5 21:59:00 psm3 /USR/SBIN/CRON[9486]: (root) CMD ( rm -f /var/spool/cron/lastrun/cron.hourly)
Jun  5 22:17:01 psm3 -- MARK --
Jun  5 22:37:01 psm3 -- MARK --
Jun  5 22:57:01 psm3 -- MARK --
Jun  5 22:59:00 psm3 /USR/SBIN/CRON[9577]: (root) CMD ( rm -f /var/spool/cron/lastrun/cron.hourly)
Jun  5 23:17:01 psm3 -- MARK --
Jun  5 23:24:02 psm3 kernel: tg3: eth1: Link is down.
Jun  5 23:24:02 psm3 kernel: tg3: eth0: Link is down.
Jun  5 23:25:04 psm3 kernel: tg3: eth0: Link is up at 1000 Mbps, full duplex.
Jun  5 23:25:04 psm3 kernel: tg3: eth0: Flow control is off for TX and off for RX.
Jun  5 23:25:04 psm3 kernel: tg3: eth1: Link is up at 1000 Mbps, full duplex.
Jun  5 23:25:04 psm3 kernel: tg3: eth1: Flow control is off for TX and off for RX.
Jun  5 23:25:17 psm3 kernel: tg3: eth1: Link is down.
Jun  5 23:25:17 psm3 kernel: tg3: eth0: Link is down.
Jun  5 23:26:15 psm3 kernel: tg3: eth1: Link is up at 1000 Mbps, full duplex.
Jun  5 23:26:15 psm3 kernel: tg3: eth1: Flow control is off for TX and off for RX.
Jun  5 23:26:15 psm3 kernel: tg3: eth0: Link is up at 1000 Mbps, full duplex.
Jun  5 23:26:15 psm3 kernel: tg3: eth0: Flow control is off for TX and off for RX.
Jun  5 23:37:01 psm3 -- MARK --
Jun  5 23:57:01 psm3 -- MARK --
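To make the timing relationship explicit, a small Python sketch can compare the link-down events in the messages file with the ORA-07445 timestamp from the alert log. The sample lines are copied from the logs above; the function names and the 60-second window are assumptions chosen for illustration, and the year 2014 is supplied by hand because syslog timestamps do not carry one.

```python
from datetime import datetime, timedelta

# A few lines copied from the messages file shown above.
SYSLOG_LINES = [
    "Jun  5 23:17:01 psm3 -- MARK --",
    "Jun  5 23:24:02 psm3 kernel: tg3: eth1: Link is down.",
    "Jun  5 23:24:02 psm3 kernel: tg3: eth0: Link is down.",
    "Jun  5 23:25:04 psm3 kernel: tg3: eth0: Link is up at 1000 Mbps, full duplex.",
]

# Timestamp line that precedes the ORA-07445 entry in the alert log.
ALERT_TIMESTAMP = "Thu Jun  5 23:24:06 2014"


def parse_alert_time(text):
    """Parse an alert log timestamp after collapsing repeated spaces."""
    return datetime.strptime(" ".join(text.split()), "%a %b %d %H:%M:%S %Y")


def parse_syslog_time(line, year):
    """Parse the leading 'Mon D HH:MM:SS' of a syslog line; the year is supplied."""
    stamp = " ".join(line.split()[:3])
    return datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")


def link_down_before(lines, crash_time, window):
    """Return (seconds_before_crash, line) for link-down events inside the window."""
    hits = []
    for line in lines:
        if "Link is down" not in line:
            continue
        gap = crash_time - parse_syslog_time(line, crash_time.year)
        if timedelta(0) <= gap <= window:
            hits.append((gap.total_seconds(), line))
    return hits


crash = parse_alert_time(ALERT_TIMESTAMP)
events = link_down_before(SYSLOG_LINES, crash, timedelta(seconds=60))
for seconds, line in events:
    print(f"{seconds:.0f}s before ORA-07445: {line}")
```

With these inputs both interfaces are seen going down 4 seconds before the LCK0 process raised ORA-07445, which fits the description of the bug as network-triggered.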
The relevant Oracle trace output follows:
> cat psm122_lms1_6115.trc
/opt/oracle/admin/psm12/bdump/psm122_lms1_6115.trc
Oracle9i Enterprise Edition Release 9.2.0.6.0 - Production
With the Partitioning, Real Application Clusters, OLAP and Oracle Data Mining options
JServer Release 9.2.0.6.0 - Production
ORACLE_HOME = /opt/oracle/product/9ir2
System name:    Linux
Node name:      psm3
Release:        2.6.5-7.244-bigsmp
Version:        #1 SMP Mon Dec 12 18:32:25 UTC 2005
Machine:        i686
Instance name: psm122
Redo thread mounted by this instance: 2
Oracle process number: 7
Unix process pid: 6115, image: (LMS1)

*** 2014-06-05 23:24:08.544
*** SESSION ID:(7.1) 2014-06-05 23:24:08.543
error 480 detected in background process
ORA-00480: LCK* process terminated with error
> cat  psm122_pmon_6103.trc
/opt/oracle/admin/psm12/bdump/psm122_pmon_6103.trc
Oracle9i Enterprise Edition Release 9.2.0.6.0 - Production
With the Partitioning, Real Application Clusters, OLAP and Oracle Data Mining options
JServer Release 9.2.0.6.0 - Production
ORACLE_HOME = /opt/oracle/product/9ir2
System name:    Linux
Node name:      psm3
Release:        2.6.5-7.244-bigsmp
Version:        #1 SMP Mon Dec 12 18:32:25 UTC 2005
Machine:        i686
Instance name: psm122
Redo thread mounted by this instance: 2
Oracle process number: 2
Unix process pid: 6103, image: (PMON)

*** 2014-06-05 23:24:08.528
*** SESSION ID:(1.1) 2014-06-05 23:24:08.528
error 480 detected in background process
ORA-00480: LCK* process terminated with error
ksuitm: waiting for [5] seconds before killing DIAG
> cat  psm122_diag_6105.trc
/opt/oracle/admin/psm12/bdump/psm122_diag_6105.trc
Oracle9i Enterprise Edition Release 9.2.0.6.0 - Production
With the Partitioning, Real Application Clusters, OLAP and Oracle Data Mining options
JServer Release 9.2.0.6.0 - Production
ORACLE_HOME = /opt/oracle/product/9ir2
System name:    Linux
Node name:      psm3
Release:        2.6.5-7.244-bigsmp
Version:        #1 SMP Mon Dec 12 18:32:25 UTC 2005
Machine:        i686
Instance name: psm122
Redo thread mounted by this instance: 0
Oracle process number: 3
Unix process pid: 6105, image: (DIAG)

*** SESSION ID:(2.1) 2011-09-02 11:07:53.315
kjzcprt:rcv port created
Node id: 1
List of nodes: 1,
*** 2011-09-02 11:07:53.328
Reconfiguration starts [incarn=0]
I'm the master node
*** 2011-09-02 11:07:53.328
Reconfiguration completes [incarn=1]
cluster reconfiguration is ongoing: 0, 1,
*** 2011-09-02 11:08:34.444
Reconfiguration starts [incarn=2]
I'm the voting node
Send my bitmap to master 0
Rcfg confirmation is received from master 0
I agree with the rcfg confirmation
*** 2011-09-02 11:08:44.075
Reconfiguration completes [incarn=2]
cluster reconfiguration is ongoing: 0, 1, 2,
*** 2011-09-20 13:31:06.483
Reconfiguration starts [incarn=3]
I'm the voting node
Send my bitmap to master 0
Rcfg confirmation is received from master 0
I agree with the rcfg confirmation
*** 2011-09-20 13:31:16.186
Reconfiguration completes [incarn=3]
cluster reconfiguration is ongoing: 0, 1,
*** 2011-09-20 13:34:42.517
Reconfiguration starts [incarn=4]
I'm the voting node
Send my bitmap to master 0
Rcfg confirmation is received from master 0
I agree with the rcfg confirmation
*** 2011-09-20 13:34:52.203
Reconfiguration completes [incarn=4]
cluster reconfiguration is ongoing: 0, 1, 2,
*** 2012-01-05 16:23:41.115
Reconfiguration starts [incarn=5]
I'm the voting node
Send my bitmap to master 0
Rcfg confirmation is received from master 0
I agree with the rcfg confirmation
*** 2012-01-05 16:23:50.740
Reconfiguration completes [incarn=5]
cluster reconfiguration is ongoing: 0, 1,
*** 2012-01-05 16:38:37.406
Reconfiguration starts [incarn=6]
I'm the voting node
Send my bitmap to master 0
Rcfg confirmation is received from master 0
I agree with the rcfg confirmation
*** 2012-01-05 16:38:47.293
Reconfiguration completes [incarn=6]
cluster reconfiguration is ongoing: 0, 1, 2,
*** 2013-12-10 14:46:41.210
Reconfiguration starts [incarn=7]
I'm the voting node
Send my bitmap to master 0
Rcfg confirmation is received from master 0
I agree with the rcfg confirmation
*** 2013-12-10 14:46:50.834
Reconfiguration completes [incarn=7]
cluster reconfiguration is ongoing: 0, 1,
*** 2013-12-10 14:47:07.673
Reconfiguration starts [incarn=8]
I'm the voting node
Send my bitmap to master 0
Rcfg confirmation is received from master 0
I agree with the rcfg confirmation
*** 2013-12-10 14:47:17.655
Reconfiguration completes [incarn=8]
*** 2014-06-05 23:24:07.854
Multi-instance trace dumping is requested by a fatal process
Instance is terminating by an unknown process
Performing diagnostic data dump for this instance
> tail -10 psm122_lmon_6107.trc
 25787 GCS shadows traversed, 12570 replayed, 428 unopened
 Submitted all GCS cache requests
 0 write requests issued in 12789 GCS resources
 0 PIs marked suspect, 0 flush PI msgs
* kjdrqrnums: node 0 resnum could not be queried (ret 7).
*** 2011-09-02 11:08:39.873
Reconfiguration complete
*** 2014-06-05 23:24:08.546
error 480 detected in background process
ORA-00480: LCK* process terminated with error
> vi  psm122_lck0_6154.trc

/opt/oracle/admin/psm12/bdump/psm122_lck0_6154.trc
Oracle9i Enterprise Edition Release 9.2.0.6.0 - Production
With the Partitioning, Real Application Clusters, OLAP and Oracle Data Mining options
JServer Release 9.2.0.6.0 - Production
ORACLE_HOME = /opt/oracle/product/9ir2
System name:    Linux
Node name:      psm3
Release:        2.6.5-7.244-bigsmp
Version:        #1 SMP Mon Dec 12 18:32:25 UTC 2005
Machine:        i686
Instance name: psm122
Redo thread mounted by this instance: 2
Oracle process number: 19
Unix process pid: 6154, image: (LCK0)

*** 2014-06-05 23:24:06.441
*** SESSION ID:(19.1) 2014-06-05 23:24:06.440
------ Dumping SKGXP context ------
SKGXPCTX: 0x0xaef392c ctx
wait delta 4294967 sec (-8 msec) ctx ts 0x4485d8d2 last ts 0x4485d8da
Exception signal: 8 (SIGFPE), code: 1 (Integer divide by zero), addr: 0x4001de3f, PC: [0x4001de3f, skgxpdmpctx()+183]
Registers:
%eax: 0x00000000 %ebx: 0x4002d894 %ecx: 0x00000000
%edx: 0x00000000 %edi: 0x00000000 %esi: 0x4002a4c0
%esp: 0xbfffd928 %ebp: 0xbfffd970 %eip: 0x4001de3f
%efl: 0x00010246
  skgxpdmpctx()+170 (0x4001de32) lea 0xffffcc2c(%ebx),%eax
  skgxpdmpctx()+176 (0x4001de38) mov %eax,0xffffffe8(%ebp)
  skgxpdmpctx()+179 (0x4001de3b) mov %eax,%esi
  skgxpdmpctx()+181 (0x4001de3d) mov %ecx,%eax
> skgxpdmpctx()+183 (0x4001de3f) div %edi,%eax
  skgxpdmpctx()+185 (0x4001de41) xor %edx,%edx
  skgxpdmpctx()+187 (0x4001de43) mov %eax,0xffffffe4(%ebp)
  skgxpdmpctx()+190 (0x4001de46) mov 0xfffffff4(%ebp),%eax
  skgxpdmpctx()+193 (0x4001de49) div %edi,%eax
*** 2014-06-05 23:24:06.484
ksedmp: internal or fatal error
ORA-07445: exception encountered: core dump [skgxpdmpctx()+183] [SIGFPE] [Integer divide by zero] [0x4001DE3F] [] []
----- Call Stack Trace -----
calling              call     entry                argument values in hex
location             type     point                (? means dubious value)
-------------------- -------- -------------------- ----------------------------
ksedmp()+274         call     ksedst()             1 ? 0 ? 0 ? 1 ? 6165252C ?
                                                   25280078 ?
ssexhd()+1113        call     ksedmp()             3 ? 0 ? 0 ? 0 ? 0 ? 0 ?
__pthread_sighandle  call     00000000             8 ? 4070FC90 ? 4070FD10 ? 0 ?
r_rt()+121                                         0 ? 0 ?
skgxpdmpctx()+183    signal   018DCECA             8 ? 4070FC90 ? 4070FD10 ?
kjugscn()+6282       call     08213474             AEF392C ? 0 ? 2 ? A6C9A18 ?
kcssct()+54          call     kjugscn()            64833CB0 ? BFFFDF50 ?
                                                   A4768495 ? BFFFDD01 ?
                                                   AEF8600 ? A63E000 ?
kcsciln()+94         call     kcssct()             6CA12DD4 ? 146 ? AEC2D3C ?
                                                   0 ? A4768369 ? 12C ?
kcsmto()+30          call     kcsciln()            0 ? A4768369 ? BFFFDF10 ?
                                                   85288B1 ? 0 ? 0 ?
"psm122_lck0_6154.trc" 21404L, 1240450C                                                     

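The odd-looking line "wait delta 4294967 sec (-8 msec)" in the SKGXP context dump is consistent with a negative millisecond difference being treated as an unsigned 32-bit value. The sketch below reproduces that arithmetic from the two timestamps in the trace and also reads the register dump, where %edi, the operand of the faulting div instruction, is zero. This is one interpretation that matches the numbers, not a statement about how skgxpdmpctx is implemented; all names in the code are illustrative.

```python
import re

CTX_TS = 0x4485D8D2   # "ctx ts" from the trace
LAST_TS = 0x4485D8DA  # "last ts" from the trace

# Register lines copied from the LCK0 trace.
REGISTER_DUMP = """
%eax: 0x00000000 %ebx: 0x4002d894 %ecx: 0x00000000
%edx: 0x00000000 %edi: 0x00000000 %esi: 0x4002a4c0
"""


def unsigned32(value):
    """Reinterpret a Python int as an unsigned 32-bit quantity."""
    return value & 0xFFFFFFFF


# Signed difference in milliseconds: the context timestamp is behind the last one.
delta_ms = CTX_TS - LAST_TS
# The same difference after wrapping to 32 bits, then converted to whole seconds.
wrapped_seconds = unsigned32(delta_ms) // 1000

# Collect register values so the divisor of "div %edi,%eax" can be inspected.
registers = {
    name: int(value, 16)
    for name, value in re.findall(r"%(\w+): (0x[0-9a-f]+)", REGISTER_DUMP)
}

print(delta_ms)          # -8
print(wrapped_seconds)   # 4294967
print(registers["edi"])  # 0
```

The values -8 and 4294967 match the trace line exactly, and a zero in %edi explains the SIGFPE raised at skgxpdmpctx()+183.
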
Source: ITPUB blog, link: http://blog.itpub.net/26870952/viewspace-1178832/. Please credit the source when reposting; otherwise legal action will be pursued.
