環境:
OS:AIX5.3
DB:oracle 9.2.0.8
現象:
在關閉資料時報下面的錯誤:
LGECHPOPHZ1Q:ora9i:HZCDRMES:/db02/app/oracle/product/9.2.0/dbs> sqlplus "/as sysdba"
SQL*Plus: Release 9.2.0.8.0 - Production on Mon Mar 7 02:30:39 2011
Copyright (c) 1982, 2002, Oracle Corporation. All rights reserved.
Connected to an idle instance.
SQL> startup
ORACLE instance started.
Total System Global Area 2258618344 bytes
Fixed Size 743400 bytes
Variable Size 1593835520 bytes
Database Buffers 637534208 bytes
Redo Buffers 26505216 bytes
Database mounted.
Database opened.
SQL> show parameter cpu
NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
cpu_count integer 80
parallel_threads_per_cpu integer 2
SQL> shutdown immediate
Database closed.
Database dismounted.
ORA-00600: internal error code, arguments: [504], [0x70000000006CA40], [640], [7], [shared pool], [3], [0], [0x70000000006C860]
SQL> exit
--伺服器配置
LGECHPOPHZ1Q:root::/> prtconf | more
System Model: IBM,9119-595
Machine Serial Number: 83F4A4C
Processor Type: PowerPC_POWER5
Number Of Processors: 40
Processor Clock Speed: 1654 MHz
CPU Type: 64-bit
Kernel Type: 64-bit
LPAR Info: 1 LGECHPOPHZ1Q
Memory Size: 115200 MB
Good Memory Size: 115200 MB
Platform. Firmware level: Not Available
Firmware Version: IBM,SF240_358
Console Login: enable
Auto Restart: true
Full Core: true
分析:
從網上到的資料:
是由於oracle 9.2.0.8的bug引起的,新server的cpu增多,使得kgl_latch_count > 31
Symptoms
Mon Nov 22 08:50:45 2010
Errors in file /home/oracle/admin/orcl/udump/orcl_ora_127.trc:
ORA-00600: internal error code, arguments: [504], [0x380068D90], [160], [7], [shared pool], [2], [0], [0x380068CA0]
Mon Nov 22 08:51:03 2010
Errors in file /home/oracle/admin/orcl/udump/orcl_ora_444.trc:
ORA-00600: internal error code, arguments: [504], [0x380068E80], [160], [7], [shared pool], [3], [0], [0x380068D90]
The following query returns _kgl_latch_count > 31
SQL> select a.ksppinm aa, b.ksppstvl bb
from x$ksppi a, x$ksppsv b
where a.indx=b.indx
and a.ksppinm like '%latch%count%';
OR
the query may return query return that _kgl_latch_count is 0, but when issuing a 'show parameter cpu_count', it returns a value of 32 or greater.
SQL> show parameter cpu_count;
NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
cpu_count integer 128
Cause
This is Bug 5888835 ORA-600 [504] DURING FLUSH SHARED_POOL
closed as a duplicate of
Base Bug 5508574 ORA-00600 [99999] , ORA-07445 [KGSCDUMP()+673]
The latch directory size exceeds 255 when _kgl_latch_count > 31.
However, even when the _kgl_latch_count is equal to 0 (default value), if the cpu_count is >=32 the bug still applies.
This is due to the as the _kgl_latch_count default value is calculated as next prime number after the value returned by cpu_count. So, this bug could still apply if the cpu_count=32 as the _kgl_latch_count would be calcuated to the next prime number would be 37.
Solution
1. Upgrade to the 10.2.0.4 patchset or the 11g release.
OR
2. You can use workaround of setting parameter _kgl_latch_count <= 31 in your pfile/spfile.
(please note that this may adversely affect the concurrency)
OR
3. If available for your platform/version, download and apply Patch 5508574
解決:
最終通過修改initHZCDRMES.ora檔案裡
_kgl_latch_count=30
--再關閉資料庫時不會出現ora-600錯誤了
SQL> startup
ORACLE instance started.
Total System Global Area 1.0496E+10 bytes
Fixed Size 755264 bytes
Variable Size 7314866176 bytes
Database Buffers 3154116608 bytes
Redo Buffers 26505216 bytes
Database mounted.
Database opened.
SQL> shutdown immediate
Database closed.
Database dismounted.
ORACLE instance shut down.
SQL> startup
ORACLE instance started.
Total System Global Area 1.0496E+10 bytes
Fixed Size 755264 bytes
Variable Size 7314866176 bytes
Database Buffers 3154116608 bytes
Redo Buffers 26505216 bytes
Database mounted.
Database opened.
SQL> show parameter cpu
NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
cpu_count integer 80
parallel_threads_per_cpu integer 2
相關:matalink查到資訊
Bug 5888835: ORA-600 [504] DURING FLUSH SHARED_POOL
--------------------------------------------------------------------------------
Bug 屬性
--------------------------------------------------------------------------------
型別 B - Defect 已在產品版本中修復 -
嚴重性 2 - Severe Loss of Service 產品版本 9.2.0.8
狀態 36 - Duplicate Bug. To Filer 平臺 59 - HP-UX PA-RISC (64-bit)
建立時間 17-Feb-2007 平臺版本 11.11
更新時間 18-Feb-2007 基本 Bug 5508574
資料庫版本 9.2.0.8
影響平臺 Generic
產品源 Oracle
相關產品
--------------------------------------------------------------------------------
產品線 Oracle Database Products 系列 Oracle Database
區域 Oracle Database 產品 5 - Oracle Server - Enterprise Edition
Hdr: 5888835 9.2.0.8 RDBMS 9.2.0.8 VOS HEAP MGMT PRODID-5 PORTID-59 ORA-600 5508574
Abstract: ORA-600 [504] DURING FLUSH SHARED_POOL
*** 02/17/07 03:49 am ***
TAR:
----
PROBLEM:
--------
Ct got the following error when upgrading 8.1.7.4 to 9.2.0.8. This seems
to occur when flushing shared_pool on the upgrade script.
ORA-600: internal error code, arguments: [504], [0xC00000038E5B2530],
[640], [7], [shared pool], [2], [0], [0xC00000038E5B2418]
This can also be reproduced by manually flushing shared pool.
The customer's system has 64 CPUs. The ORA-600 [504] was suppressed
when _kgl_latch_count=30 was set.
DIAGNOSTIC ANALYSIS:
--------------------
The heapdump level 2 information when ORA-600[504] occurred says the
following.
The number of next slot reaches to 255.
It seems to be the same issue as bug 5562921 (base bug 5508574).
Thanks.
====
/home/oracle/udump/dz00001_ora_22436.trc
Oracle9i Enterprise Edition Release 9.2.0.8.0 - 64bit Production
With the Partitioning option
JServer Release 9.2.0.8.0 - Production
ORACLE_HOME = /opt/oracle/product/9.2.0
System name: HP-UX
Node name: DC-0-001
Release: B.11.11
Version: U
Machine: 9000/800
Instance name: DZ00001
Redo thread mounted by this instance: 1
Oracle process number: 17
Unix process pid: 22436, image: oracle@DC-0-001 (TNS V1-V3)
*** 16:17:31.085
*** ID:(20.280) 2007-02-17 16:17:31.083
KGH Latch Directory Information
ldir state: 2 next slot: 255
Slot [ 1] Latch: c0000001a654c5c0 Index: 2 Flags: 3 State: 2 next:
0000000000000000
Slot [ 2] Latch: c000000182a04468 Index: 1 Flags: 3 State: 2 next:
0000000000000000
Slot [ 3] Latch: c0000001a654dcb0 Index: 2 Flags: 3 State: 2 next:
0000000000000000
Slot [ 4] Latch: c000000182a047e8 Index: 1 Flags: 3 State: 2 next:
0000000000000000
Slot [ 5] Latch: c000000182a04940 Index: 2 Flags: 3 State: 2 next:
0000000000000000
Slot [ 6] Latch: c0000001a654e340 Index: 1 Flags: 3 State: 2 next:
0000000000000000
Slot [ 7] Latch: c0000001a6553528 Index: 2 Flags: 3 State: 2 next:
0000000000000000
Slot [ 8] Latch: c0000001a65557f8 Index: 1 Flags: 3 State: 2 next:
0000000000000000
Slot [ 9] Latch: c0000001a6556988 Index: 2 Flags: 3 State: 2 next:
c0000001a65aff98
Slot [ 10] Latch: c0000001a6558328 Index: 1 Flags: 3 State: 2 next:
c0000001a65affb0
.
.
WORKAROUND:
-----------
Setting _kgl_latch_count=30
RELATED BUGS:
-------------
bug 5562921
bug 5508574
REPRODUCIBILITY:
----------------
It can be reproduced by manually flushing shared pool.
(without _kgl_latch_count=30)
TEST CASE:
----------
none
STACK TRACE:
------------
ksedmp ksddoa ksdpcg ksdpec ksfpec kgeriv kgesiv ksesic7 kslgetl ksfglt
kghupr_flg kghupr kglrfcl kglobcl kglobfr kglobf0 kglhpd_internal kglhpd
kghfrx kghfrunp kghfsh_helper kghfsh kkyasy opiexe opiall0 kpoal8 opiodr
ttcpip opitsk opiino opiodr opidrv sou2o main
SUPPORTING INFORMATION:
-----------------------
I will trace files later.
24 HOUR CONTACT INFORMATION FOR P1 BUGS:
----------------------------------------
DIAL-IN INFORMATION:
--------------------
IMPACT DATE:
------------
*** 02/17/07 03:58 am *** (CHG: Sta->16)
*** 02/17/07 03:58 am ***
*** 02/17/07 04:40 am *** (CHG: Asg->NEW OWNER OWNER)
*** 02/17/07 05:45 am ***
Well, this seems to be same issue with bug#5562921.
heapdump level 2 shows
-------
*** ID:(20.280) 2007-02-17 16:17:31.083
KGH Latch Directory Information
ldir state: 2 next slot: 255
:
Slot [253] Latch: c000000190c02cb0 Index: 2 Flags: 3 State: 2 next:
0000
01a65b1420
Slot [254] Latch: c000000190c02b98 Index: 1 Flags: 3 State: 2 next:
0000
01a65b1438
This means we have registered 255 latches to kgh latch directory, and
it is full. (See bug#5508574 update by Joan at 09/08/06 12:24 pm, there
is small bug in this print routine and 255 is not next slot, but last slot)
ORA-600 means latch hierarchy violation. We tried to get 2nd child shared
pool latch with wait get, that is not permitted.
This happend when freeing library cache object from shared pool, we need
to get correct library cache latch to do it. However, probablly due to
kgl latch is not registered correctly to kgh latch directory, we end up
with requesting latch with wrong order, or strange latch.
kgl latch is not registered correctly because kgh latch directory is full.
It has 255 slots, and it seems full. We regsiter many latches to kgh latch
directory, however, if kgl latch number is big, we will fill kgh latch
directory and some latch is not registered correctly.
*** 02/17/07 05:47 am ***
In my 9.2.0.8 env (with 2 cpu) we register 208 latches.
number of latches which will be allocated to kgh latch directory is not
calculated easily. because we register many kind of latches.
heapdump level 2 will print latch address registered to kgh, so you can
find the name of latch via v$latch, v$latch_children and v$latchname.
To see if an instance gets this problem, taking heapdump level 2 and
check "next slot:" value. If it is 255, most likely that instance gets
this problem.
*** 02/17/07 05:48 am ***
*** 02/17/07 06:09 am ***
*** 02/17/07 06:35 am ***
*** 02/18/07 04:14 pm *** (CHG: Sta->36 SubComp->VOS HEAP MGMT)
*** 02/18/07 06:51 pm ***
From 9.2.0.8, we allocate many row cache latches compared to elder PSR.
This is due to enhauncement introduced in 5040691.
So, disable this fix is another workaround.
If I set _more_rowcache_latches = false in init.ora,
I see (only) 238 latches are registered to kgh latch directory
even I set
_kgl_latch_count=67
_kghdsidx_count=3
If I don't set _more_rowcache_latches = false,
I can see ORA-600 simply startup and shutdown database