RAC Instance Crashes During Startup Due To Error 495 - GEN0 process terminated with error (Doc ID 1547091.1)

Applies to:

Oracle Database - Enterprise Edition - Version 11.2.0.1 to 11.2.0.3 [Release 11.2]
Information in this document applies to any platform.

Symptoms

o GI & RDBMS 11.2.0.3.0 running on a RAC cluster
o database instances on all nodes keep crashing during startup
o the database alert log shows that PMON terminated the instance:

PMON (ospid: 2660): terminating the instance due to error 495
System state dump requested by (instance=1, osid=2660 (PMON)), summary=[abnormal instance termination].

ORA-495 means "GEN0 process terminated with error". The corresponding GEN0 trace file shows that the process terminated with ORA-7445:

Exception [type: SIGSEGV, Address not mapped to object] [ADDR:0x7FFF00000001] [PC:0x38024797C0, strlen()+16] [flags: 0x0, count: 1]
Errors in file /u01/app/oracle/diag/rdbms/racdb/RACDB1/trace/RACDB1_gen0_2674.trc (incident=2616070):
ORA-07445: exception encountered: core dump [strlen()+16] [SIGSEGV] [ADDR:0x7FFF00000001] [PC:0x38024797C0] [Address not mapped to object] []
Incident details in: /u01/app/oracle/diag/rdbms/racdb/RACDB1/incident/incdir_2616070/RACDB1_gen0_2674_i2616070.trc

The corresponding incident trace file shows the following call stack:

========= Dump for incident 2616070 (ORA-7445 [strlen()+16]) ========
----- Beginning of Customized Incident Dump(s) -----
Exception [type: SIGSEGV, Address not mapped to object] [ADDR:0x7FFF00000001]
[PC:0x38024797C0, strlen()+16] [flags: 0x0, count: 1]
<<..>>
*** 2011-12-08 20:17:55.983
dbkedDefDump(): Starting a non-incident diagnostic dump (flags=0x3, level=3,
mask=0x0)
----- SQL Statement (None) -----
Current SQL information unavailable - no cursor.
.
----- Call Stack Trace -----
....
_IO_vfprintf()+1752 call strlen() 7FFF00000001 ? 00000002D ?
4 7FFF1575C420 ? 000000007 ?
FEFEFEFEFEFEFEFF ?
415441444B434952 ?
vsnprintf()+149 call _IO_vfprintf() 7FFF1575BDB0 ?
2AC347D5D740 ?
7FFF1575C3C8 ? 000000007 ?
FEFEFEFEFEFEFEFF ?
415441444B434952 ?
clsrapii_print2()+1 call vsnprintf() 7FFF1575BF10 ?
2AC347D5D740 ?
95 2AC347D5D740 ?
7FFF1575C3C8 ?
FEFEFEFEFEFEFEFF ?
415441444B434952 ?
clsrapii_upd_db_res call clsrapii_print2() 7FFF1575BF10 ?
2AC347D5D740 ?
()+1916 2AC347D5D740 ? 000000046 ?
7FFF1575C484 ?
7FFF1575CC84 ?
clsr_add_db_dg_dep2 call clsrapii_upd_db_res 2AC34A2D0450 ? 01560C0D0 ?
()+4345 () 2AC347D59AA8 ?
7FFF1575E3B4 ?
2000000001 ? 2AC347D5A62C
?
kjha_add_db_dg_dep( call clsr_add_db_dg_dep2 2AC34A2D0450 ? 060009C50 ?
)+626 () 7FFF1575FAF0 ? 000000001 ?
00659A692 ? 7FFF1575FB10 ?
kfgbDGRes()+795 call kjha_add_db_dg_dep( 060009C50 ? 7FFF157609C0 ?
) 000000001 ? 000000001 ?
00659A692 ? 7FFF1575FB10 ?
ksbabs()+1878 call kfgbDGRes() 7FFF157609A8 ? 000000050 ?
000000001 ? 000000001 ?
00659A692 ? 7FFF1575FB10 ?
ksbrdp()+1613 call ksbabs() 7FFF157609A8 ? 000000050 ?
000000001 ? 000000001 ?
00659A692 ? 7FFF1575FB10 ?
opirip()+994 call ksbrdp() 7FFF157609A8 ? 000000050 ?
000000001 ? 000000001 ?
00659A692 ? 7FFF1575FB10 ?
opidrv()+1139 call opirip() 000000032 ? 000000004 ?
sou2o()+143 call opidrv() 000000032 ? 000000004 ?
7FFF15761E18 ? 000000001 ?
00659A692 ? 7FFF1575FB10 ?
opimai_real()+884 call sou2o() 7FFF15761DE0 ? 000000032 ?
000000004 ? 7FFF15761E18 ?
00659A692 ? 7FFF1575FB10 ?
ssthrdmain()+473 call opimai_real() 7FFF15762F9B ? 000000000 ?
000000004 ? 7FFF15761E18 ?
00659A692 ? 7FFF1575FB10 ?
main()+196 call ssthrdmain() 000000003 ? 7FFF15762020 ?
000000001 ? 000000000 ?
00659A692 ? 7FFF1575FB10 ?
__libc_start_main() call main() 000000003 ? 7FFF157621C0 ?
+244 000000001 ? 000000000 ?
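
The symptoms above can be checked mechanically. The following is a minimal sketch, not an official diagnostic: the function name and its two arguments (path to the alert log, path to the GEN0 incident trace) are assumptions made for illustration. It matches only the fixed strings shown in this note and ignores offsets and addresses, which can differ between systems.

```shell
#!/bin/sh
# Hypothetical helper: returns 0 when the files show the signature
# described in this note, non-zero otherwise.
#   $1 = database alert log
#   $2 = GEN0 incident trace file
matches_gen0_signature() {
    alert_log=$1
    incident_trc=$2

    # PMON terminated the instance because GEN0 died (ORA-495).
    grep -qF 'terminating the instance due to error 495' "$alert_log" || return 1

    # GEN0 died with ORA-7445 inside strlen().
    grep -qF 'ORA-07445: exception encountered: core dump [strlen()' "$alert_log" || return 1

    # The call stack passes through the diskgroup dependency code path.
    grep -qF 'kfgbDGRes()' "$incident_trc" || return 1
    grep -qF 'clsrapii_upd_db_res' "$incident_trc" || return 1

    return 0
}
```

A match only indicates that the symptoms resemble this note. It does not prove that Bug 13483672 is the cause.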

Changes

Added new tablespaces on new ASM diskgroups to the database.

Cause

The issue is caused by internal, unpublished Bug 13483672, "ORA-7445 [strlen()+16] creating database dependencies for large number of disk groups". The bug causes a buffer overflow when the diskgroup dependencies of the database resource exceed a certain size.
Bug 13483672 has been fixed in 11.2.0.3 PSU 3 and in the Windows 11.2.0.3.7 patch bundle. An interim patch has also been provided for 11.2.0.2 on certain platforms.

Solution

A. Short Term (workaround)

Set the hidden parameter "_notify_crs" to FALSE. This prevents the database instance from notifying the CRS daemon process when diskgroups are being mounted:

Set "_notify_crs"=FALSE in the pfile or spfile, then restart the database.
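
As an illustration of the pfile variant, the sketch below appends the setting to a text parameter file. The function name, the backup file name, and the choice of the "*." prefix (apply to all instances) are assumptions for this example. For an spfile, the usual form would be ALTER SYSTEM SET "_notify_crs"=FALSE SCOPE=SPFILE SID='*', which needs a running instance, so the pfile route may be simpler while instances keep crashing.

```shell
#!/bin/sh
# Hypothetical helper: add the workaround to a text parameter file (pfile).
#   $1 = path to the pfile; a backup copy is written to "$1.bak" first.
add_notify_crs_workaround() {
    pfile=$1

    # If the parameter is already mentioned, change nothing and leave
    # the existing entry for manual review.
    if grep -qF '_notify_crs' "$pfile"; then
        return 0
    fi

    cp "$pfile" "$pfile.bak" || return 1

    # "*." applies the setting to every instance of the RAC database.
    printf '%s\n' '*._notify_crs=FALSE' >> "$pfile"
}
```

After editing, the database would be restarted with that pfile so the setting takes effect. Hidden parameters are normally set only on the advice of Oracle Support, as in this note.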

B. Long Term

Apply Grid Infrastructure 11.2.0.3 PSU 3 (or Windows 11.2.0.3 patch bundle 7) or higher, which contains the fix for Bug 13483672.

Note that the GI PSU must be applied to both the Grid Infrastructure home and the Database home.
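
Because the fix must be present in both homes, it can help to check each home's patch inventory. The sketch below is illustrative only: the function names are hypothetical, and it assumes that the OPatch inventory listing for the home includes bug number 13483672 once the PSU is applied.

```shell
#!/bin/sh
# Hypothetical helper: reads OPatch inventory output on stdin and
# succeeds if bug 13483672 appears as a whole word.
inventory_lists_bug_fix() {
    grep -qw '13483672'
}

# Hypothetical helper: run the check against one Oracle home.
#   $1 = path to the Grid Infrastructure or Database home
check_home() {
    home=$1
    "$home/OPatch/opatch" lsinventory -bugs_fixed -oh "$home" | inventory_lists_bug_fix
}
```

A possible use is to call check_home once for the Grid Infrastructure home and once for the Database home, and to treat a non-zero status from either as a sign that the PSU still needs to be applied there.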
