昨天處理了一起ora-00600的錯誤，其中也經歷了各種曲折，真是霧裡看花，看透了之後發現很多問題都是有原因的。
起初是開發說有一個job執行的時候報錯了，資料庫版本是11.2.0.2.0

等到問題提交到我這，客戶已經檢查了一些資訊了。但是還是沒有結論。
對於這個問題，我還是照例開始檢查資料庫日誌。
在那個時間段內裡出現了ora-00600的錯誤。
Wed Jun 10 13:47:17 2015
Errors in file /opt/app/oracle/dbccbspr1/diag/rdbms/PRODB/PRODB/trace/PRODB_p070_1200.trc (incident=2124332):
ORA-00600: internal error code, arguments: [srsnext_3], [], [], [], [], [], [], [], [], [], [], []
Incident details in: /opt/app/oracle/dbccbspr1/diag/rdbms/PRODB/PRODB/incident/incdir_2124332/PRODB_p070_1200_i2124332.trc
Wed Jun 10 13:48:04 2015
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.

在這個ora-00600錯誤前後又碰到了幾個奇怪的ora錯誤,我把錯誤日誌按照時間先後來排列一下。

Thread 1 advanced to log sequence 82829 (LGWR switch)
Current log# 1 seq# 82829 mem# 0: /dbccbPR1/oracle/PRODB/redolog_A1/redo/redo01A.log
Current log# 1 seq# 82829 mem# 1: /dbccbPR1/oracle/PRODB/redolog_B1/redo/redo01B.log
Archived Log entry 82894 added for thread 1 sequence 82828 ID 0xb8c6d509 dest 1:
Wed Jun 10 07:10:17 2015
ORA-00060: Deadlock detected. More info in file /opt/app/oracle/dbccbspr1/diag/rdbms/PRODB/PRODB/trace/PRODB_ora_18508.trc.
Wed Jun 10 07:10:21 2015
ORA-00060: Deadlock detected. More info in file /opt/app/oracle/dbccbspr1/diag/rdbms/PRODB/PRODB/trace/PRODB_ora_18593.trc.
Wed Jun 10 07:12:14 2015
ORA-00060: Deadlock detected. More info in file /opt/app/oracle/dbccbspr1/diag/rdbms/PRODB/PRODB/trace/PRODB_ora_16505.trc.
Wed Jun 10 07:26:54 2015
ORA-00060: Deadlock detected. More info in file /opt/app/oracle/dbccbspr1/diag/rdbms/PRODB/PRODB/trace/PRODB_ora_18558.trc.
Wed Jun 10 07:52:47 2015
Thread 1 advanced to log sequence 82830 (LGWR switch)
Current log# 2 seq# 82830 mem# 0: /dbccbPR1/oracle/PRODB/redolog_A2/redo/redo02A.log
Current log# 2 seq# 82830 mem# 1: /dbccbPR1/oracle/PRODB/redolog_B2/redo/redo02B.log

Wed Jun 10 12:09:40 2015
Exception [type: SIGSEGV, Address not mapped to object] [ADDR:0x210] [PC:0x193F4DF, kxfpqrclb()+71] [flags: 0x0, count: 1]
Errors in file /opt/app/oracle/dbccbspr1/diag/rdbms/PRODB/PRODB/trace/PRODB_p135_15651.trc (incident=2116204):
ORA-07445: exception encountered: core dump [kxfpqrclb()+71] [SIGSEGV] [ADDR:0x210] [PC:0x193F4DF] [Address not mapped to object] []
ORA-10382: parallel query server interrupt (reset)
Incident details in: /opt/app/oracle/dbccbspr1/diag/rdbms/PRODB/PRODB/incident/incdir_2116204/PRODB_p135_15651_i2116204.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Wed Jun 10 12:09:41 2015
Dumping diagnostic data in directory=[cdmp_20150610120941], requested by (instance=1, osid=15651 (P135)), summary=[incident=2116204].
Wed Jun 10 12:09:44 2015
Sweep [inc][2116204]: completed
Sweep [inc2][2116204]: completed
Wed Jun 10 12:09:59 2015

Wed Jun 10 13:46:27 2015
Archived Log entry 82967 added for thread 1 sequence 82901 ID 0xb8c6d509 dest 1:
Wed Jun 10 13:47:17 2015
Errors in file /opt/app/oracle/dbccbspr1/diag/rdbms/PRODB/PRODB/trace/PRODB_p070_1200.trc (incident=2124332):
ORA-00600: internal error code, arguments: [srsnext_3], [], [], [], [], [], [], [], [], [], [], []
Incident details in: /opt/app/oracle/dbccbspr1/diag/rdbms/PRODB/PRODB/incident/incdir_2124332/PRODB_p070_1200_i2124332.trc
Wed Jun 10 13:48:04 2015

Archived Log entry 82993 added for thread 1 sequence 82927 ID 0xb8c6d509 dest 1:
Wed Jun 10 14:38:53 2015
Errors in file /opt/app/oracle/dbccbspr1/diag/rdbms/PRODB/PRODB/trace/PRODB_ora_15548.trc (incident=2110436):
ORA-04030: out of process memory when trying to allocate 16328 bytes (koh-kghu sessi,pmucalm coll)
Incident details in: /opt/app/oracle/dbccbspr1/diag/rdbms/PRODB/PRODB/incident/incdir_2110436/PRODB_ora_15548_i2110436.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.

為了更加清晰，我把對應的ORA錯誤和資料庫負載聯絡在一起。
對於deadlock的錯誤，很可能是應用死鎖造成的，簡單檢視了下trace日誌，做了基本確認，就交給開發去分析這部分了。
從後續的錯誤情況來看，似乎和後續的問題沒有直接關係。我們暫時先放下這個deaklock的錯誤。所以在圖中沒有標註出來。
其它三個都用紅色標註出來。可以看出在負載開始增加的幾個時間點裡，依次發生了幾個ORA錯誤。

第一個錯誤。
ORA-07445: exception encountered: core dump [kxfpqrclb()+71] [SIGSEGV] [ADDR:0x210] [PC:0x193F4DF] [Address not mapped to object] []
ORA-10382: parallel query server interrupt (reset)
可以從日誌資訊看出，似乎是和並行是相關的。
對於這個錯誤。在metalink 中查到一篇有些相似的文章。
ORA-07445:[kxfpqrclb()+72] [SIGSEGV] And ORA-10382 (Doc ID 1987833.1)
看了solution讓我有些失望，需要升級到12.2版本。

SOLUTION

Unpublished Bug 16682786 : HIT ORA-7445 [KXFPQRCLB+1691] WHEN RUN SHARED CURSOR TEST

The base bug is fixed in 12.2 release.

Check patch downling link for availability of patch, if patch does not exists then create an SR with oracle support with output of "opatch lsinventory -details" to get a patch..

暫時也不明朗，繼續跳過看下一個錯誤。
第二個錯誤
ORA-00600: internal error code, arguments: [srsnext_3], [], [], [], [], [], [], [], [], [], [], []
對於這個問題，metalink中確實有一個相關的文章 Query Fails with ORA-00600: Internal Error Code, Arguments: [srsnext_3] (Doc ID 1589589.1)
但是對於這個問題，提供的solution讓我也有些無奈。因為段時間內確實沒有升級的計劃。

CAUSE

Bug 11852469 : TS11.2.0.3V3 - TRC - SRSNEXT.

Rediscovery information:

If the srsnext_3 internal error is raised and the query involves statistical functions or other aggregates that are treated as distinct aggregates then you may be encountering this problem.

SOLUTION

Apply patch 11852469 if it exists for your version/platform

or

Apply patchset 11.2.0.3 where the fix is included

第三個錯誤。
Archived Log entry 82993 added for thread 1 sequence 82927 ID 0xb8c6d509 dest 1:
Wed Jun 10 14:38:53 2015
Errors in file /opt/app/oracle/dbccbspr1/diag/rdbms/PRODB/PRODB/trace/PRODB_ora_15548.trc (incident=2110436):
ORA-04030: out of process memory when trying to allocate 16328 bytes (koh-kghu sessi,pmucalm coll)
Incident details in: /opt/app/oracle/dbccbspr1/diag/rdbms/PRODB/PRODB/incident/incdir_2110436/PRODB_ora_15548_i2110436.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.

對於這個錯誤，metalink中有一篇相關的文章。診斷並解決 ORA-4030 錯誤 (Doc ID 1548826.1)

從metalink的描述來看，該錯誤意味著 Oracle Server 程式無法從作業系統分配更多記憶體。該記憶體由 PGA（Program Global Area）組成，其內容取決於伺服器配置。對於專用的伺服器程式，記憶體包含堆疊以及用於儲存使用者會話資料、遊標資訊和排序區的 UGA（User Global Area）。在多執行緒配置中（共享伺服器），UGA 被分配在 SGA（System Global Area）中，所以在這種配置下 UGA 不是造成 ORA-4030 錯誤的原因。
因此，ORA-4030 表示程式需要更多記憶體（堆疊 UGA 或 PGA）來執行其任務。
看起來是快取配置出問題了。
先賣個關子，其實事實並非如此，而且ora-00600的錯誤，如果不是反覆出現，嚴重影響，是不會直接去考慮打補丁，可能透過一些其它的方式去做。後續會結合一些分析方法來看。

一次ORA-00600問題的排查和分析(上）

SOLUTION

CAUSE

SOLUTION

相關文章