一次ORA-00600問題的排查和分析(上)
昨天處理了一起ora-00600的錯誤,其中也經歷了各種曲折,真是霧裡看花,看透了之後發現很多問題都是有原因的。
起初是開發說有一個job執行的時候報錯了,資料庫版本是11.2.0.2.0
等到問題提交到我這,客戶已經檢查了一些資訊了。但是還是沒有結論。
對於這個問題,我還是照例開始檢查資料庫日誌。
在那個時間段內裡出現了ora-00600的錯誤。
Wed Jun 10 13:47:17 2015
Errors in file /opt/app/oracle/dbccbspr1/diag/rdbms/PRODB/PRODB/trace/PRODB_p070_1200.trc (incident=2124332):
ORA-00600: internal error code, arguments: [srsnext_3], [], [], [], [], [], [], [], [], [], [], []
Incident details in: /opt/app/oracle/dbccbspr1/diag/rdbms/PRODB/PRODB/incident/incdir_2124332/PRODB_p070_1200_i2124332.trc
Wed Jun 10 13:48:04 2015
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
在這個ora-00600錯誤前後又碰到了幾個奇怪的ora錯誤,我把錯誤日誌按照時間先後來排列一下。
Thread 1 advanced to log sequence 82829 (LGWR switch)
Current log# 1 seq# 82829 mem# 0: /dbccbPR1/oracle/PRODB/redolog_A1/redo/redo01A.log
Current log# 1 seq# 82829 mem# 1: /dbccbPR1/oracle/PRODB/redolog_B1/redo/redo01B.log
Archived Log entry 82894 added for thread 1 sequence 82828 ID 0xb8c6d509 dest 1:
Wed Jun 10 07:10:17 2015
ORA-00060: Deadlock detected. More info in file /opt/app/oracle/dbccbspr1/diag/rdbms/PRODB/PRODB/trace/PRODB_ora_18508.trc.
Wed Jun 10 07:10:21 2015
ORA-00060: Deadlock detected. More info in file /opt/app/oracle/dbccbspr1/diag/rdbms/PRODB/PRODB/trace/PRODB_ora_18593.trc.
Wed Jun 10 07:12:14 2015
ORA-00060: Deadlock detected. More info in file /opt/app/oracle/dbccbspr1/diag/rdbms/PRODB/PRODB/trace/PRODB_ora_16505.trc.
Wed Jun 10 07:26:54 2015
ORA-00060: Deadlock detected. More info in file /opt/app/oracle/dbccbspr1/diag/rdbms/PRODB/PRODB/trace/PRODB_ora_18558.trc.
Wed Jun 10 07:52:47 2015
Thread 1 advanced to log sequence 82830 (LGWR switch)
Current log# 2 seq# 82830 mem# 0: /dbccbPR1/oracle/PRODB/redolog_A2/redo/redo02A.log
Current log# 2 seq# 82830 mem# 1: /dbccbPR1/oracle/PRODB/redolog_B2/redo/redo02B.log
Wed Jun 10 12:09:40 2015
Exception [type: SIGSEGV, Address not mapped to object] [ADDR:0x210] [PC:0x193F4DF, kxfpqrclb()+71] [flags: 0x0, count: 1]
Errors in file /opt/app/oracle/dbccbspr1/diag/rdbms/PRODB/PRODB/trace/PRODB_p135_15651.trc (incident=2116204):
ORA-07445: exception encountered: core dump [kxfpqrclb()+71] [SIGSEGV] [ADDR:0x210] [PC:0x193F4DF] [Address not mapped to object] []
ORA-10382: parallel query server interrupt (reset)
Incident details in: /opt/app/oracle/dbccbspr1/diag/rdbms/PRODB/PRODB/incident/incdir_2116204/PRODB_p135_15651_i2116204.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Wed Jun 10 12:09:41 2015
Dumping diagnostic data in directory=[cdmp_20150610120941], requested by (instance=1, osid=15651 (P135)), summary=[incident=2116204].
Wed Jun 10 12:09:44 2015
Sweep [inc][2116204]: completed
Sweep [inc2][2116204]: completed
Wed Jun 10 12:09:59 2015
Wed Jun 10 13:46:27 2015
Archived Log entry 82967 added for thread 1 sequence 82901 ID 0xb8c6d509 dest 1:
Wed Jun 10 13:47:17 2015
Errors in file /opt/app/oracle/dbccbspr1/diag/rdbms/PRODB/PRODB/trace/PRODB_p070_1200.trc (incident=2124332):
ORA-00600: internal error code, arguments: [srsnext_3], [], [], [], [], [], [], [], [], [], [], []
Incident details in: /opt/app/oracle/dbccbspr1/diag/rdbms/PRODB/PRODB/incident/incdir_2124332/PRODB_p070_1200_i2124332.trc
Wed Jun 10 13:48:04 2015
Archived Log entry 82993 added for thread 1 sequence 82927 ID 0xb8c6d509 dest 1:
Wed Jun 10 14:38:53 2015
Errors in file /opt/app/oracle/dbccbspr1/diag/rdbms/PRODB/PRODB/trace/PRODB_ora_15548.trc (incident=2110436):
ORA-04030: out of process memory when trying to allocate 16328 bytes (koh-kghu sessi,pmucalm coll)
Incident details in: /opt/app/oracle/dbccbspr1/diag/rdbms/PRODB/PRODB/incident/incdir_2110436/PRODB_ora_15548_i2110436.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
為了更加清晰,我把對應的ORA錯誤和資料庫負載聯絡在一起。
對於deadlock的錯誤,很可能是應用死鎖造成的,簡單檢視了下trace日誌,做了基本確認,就交給開發去分析這部分了。
從後續的錯誤情況來看,似乎和後續的問題沒有直接關係。我們暫時先放下這個deaklock的錯誤。所以在圖中沒有標註出來。
其它三個都用紅色標註出來。可以看出在負載開始增加的幾個時間點裡,依次發生了幾個ORA錯誤。
第一個錯誤。
ORA-07445: exception encountered: core dump [kxfpqrclb()+71] [SIGSEGV] [ADDR:0x210] [PC:0x193F4DF] [Address not mapped to object] []
ORA-10382: parallel query server interrupt (reset)
可以從日誌資訊看出,似乎是和並行是相關的。
對於這個錯誤。在metalink 中查到一篇有些相似的文章。
ORA-07445:[kxfpqrclb()+72] [SIGSEGV] And ORA-10382 (Doc ID 1987833.1)
看了solution讓我有些失望,需要升級到12.2版本。
第二個錯誤
ORA-00600: internal error code, arguments: [srsnext_3], [], [], [], [], [], [], [], [], [], [], []
對於這個問題,metalink中確實有一個相關的文章 Query Fails with ORA-00600: Internal Error Code, Arguments: [srsnext_3] (Doc ID 1589589.1)
但是對於這個問題,提供的solution讓我也有些無奈。因為段時間內確實沒有升級的計劃。
起初是開發說有一個job執行的時候報錯了,資料庫版本是11.2.0.2.0
等到問題提交到我這,客戶已經檢查了一些資訊了。但是還是沒有結論。
對於這個問題,我還是照例開始檢查資料庫日誌。
在那個時間段內裡出現了ora-00600的錯誤。
Wed Jun 10 13:47:17 2015
Errors in file /opt/app/oracle/dbccbspr1/diag/rdbms/PRODB/PRODB/trace/PRODB_p070_1200.trc (incident=2124332):
ORA-00600: internal error code, arguments: [srsnext_3], [], [], [], [], [], [], [], [], [], [], []
Incident details in: /opt/app/oracle/dbccbspr1/diag/rdbms/PRODB/PRODB/incident/incdir_2124332/PRODB_p070_1200_i2124332.trc
Wed Jun 10 13:48:04 2015
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
在這個ora-00600錯誤前後又碰到了幾個奇怪的ora錯誤,我把錯誤日誌按照時間先後來排列一下。
Thread 1 advanced to log sequence 82829 (LGWR switch)
Current log# 1 seq# 82829 mem# 0: /dbccbPR1/oracle/PRODB/redolog_A1/redo/redo01A.log
Current log# 1 seq# 82829 mem# 1: /dbccbPR1/oracle/PRODB/redolog_B1/redo/redo01B.log
Archived Log entry 82894 added for thread 1 sequence 82828 ID 0xb8c6d509 dest 1:
Wed Jun 10 07:10:17 2015
ORA-00060: Deadlock detected. More info in file /opt/app/oracle/dbccbspr1/diag/rdbms/PRODB/PRODB/trace/PRODB_ora_18508.trc.
Wed Jun 10 07:10:21 2015
ORA-00060: Deadlock detected. More info in file /opt/app/oracle/dbccbspr1/diag/rdbms/PRODB/PRODB/trace/PRODB_ora_18593.trc.
Wed Jun 10 07:12:14 2015
ORA-00060: Deadlock detected. More info in file /opt/app/oracle/dbccbspr1/diag/rdbms/PRODB/PRODB/trace/PRODB_ora_16505.trc.
Wed Jun 10 07:26:54 2015
ORA-00060: Deadlock detected. More info in file /opt/app/oracle/dbccbspr1/diag/rdbms/PRODB/PRODB/trace/PRODB_ora_18558.trc.
Wed Jun 10 07:52:47 2015
Thread 1 advanced to log sequence 82830 (LGWR switch)
Current log# 2 seq# 82830 mem# 0: /dbccbPR1/oracle/PRODB/redolog_A2/redo/redo02A.log
Current log# 2 seq# 82830 mem# 1: /dbccbPR1/oracle/PRODB/redolog_B2/redo/redo02B.log
Wed Jun 10 12:09:40 2015
Exception [type: SIGSEGV, Address not mapped to object] [ADDR:0x210] [PC:0x193F4DF, kxfpqrclb()+71] [flags: 0x0, count: 1]
Errors in file /opt/app/oracle/dbccbspr1/diag/rdbms/PRODB/PRODB/trace/PRODB_p135_15651.trc (incident=2116204):
ORA-07445: exception encountered: core dump [kxfpqrclb()+71] [SIGSEGV] [ADDR:0x210] [PC:0x193F4DF] [Address not mapped to object] []
ORA-10382: parallel query server interrupt (reset)
Incident details in: /opt/app/oracle/dbccbspr1/diag/rdbms/PRODB/PRODB/incident/incdir_2116204/PRODB_p135_15651_i2116204.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Wed Jun 10 12:09:41 2015
Dumping diagnostic data in directory=[cdmp_20150610120941], requested by (instance=1, osid=15651 (P135)), summary=[incident=2116204].
Wed Jun 10 12:09:44 2015
Sweep [inc][2116204]: completed
Sweep [inc2][2116204]: completed
Wed Jun 10 12:09:59 2015
Wed Jun 10 13:46:27 2015
Archived Log entry 82967 added for thread 1 sequence 82901 ID 0xb8c6d509 dest 1:
Wed Jun 10 13:47:17 2015
Errors in file /opt/app/oracle/dbccbspr1/diag/rdbms/PRODB/PRODB/trace/PRODB_p070_1200.trc (incident=2124332):
ORA-00600: internal error code, arguments: [srsnext_3], [], [], [], [], [], [], [], [], [], [], []
Incident details in: /opt/app/oracle/dbccbspr1/diag/rdbms/PRODB/PRODB/incident/incdir_2124332/PRODB_p070_1200_i2124332.trc
Wed Jun 10 13:48:04 2015
Archived Log entry 82993 added for thread 1 sequence 82927 ID 0xb8c6d509 dest 1:
Wed Jun 10 14:38:53 2015
Errors in file /opt/app/oracle/dbccbspr1/diag/rdbms/PRODB/PRODB/trace/PRODB_ora_15548.trc (incident=2110436):
ORA-04030: out of process memory when trying to allocate 16328 bytes (koh-kghu sessi,pmucalm coll)
Incident details in: /opt/app/oracle/dbccbspr1/diag/rdbms/PRODB/PRODB/incident/incdir_2110436/PRODB_ora_15548_i2110436.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
為了更加清晰,我把對應的ORA錯誤和資料庫負載聯絡在一起。
對於deadlock的錯誤,很可能是應用死鎖造成的,簡單檢視了下trace日誌,做了基本確認,就交給開發去分析這部分了。
從後續的錯誤情況來看,似乎和後續的問題沒有直接關係。我們暫時先放下這個deaklock的錯誤。所以在圖中沒有標註出來。
其它三個都用紅色標註出來。可以看出在負載開始增加的幾個時間點裡,依次發生了幾個ORA錯誤。
第一個錯誤。
ORA-07445: exception encountered: core dump [kxfpqrclb()+71] [SIGSEGV] [ADDR:0x210] [PC:0x193F4DF] [Address not mapped to object] []
ORA-10382: parallel query server interrupt (reset)
可以從日誌資訊看出,似乎是和並行是相關的。
對於這個錯誤。在metalink 中查到一篇有些相似的文章。
ORA-07445:[kxfpqrclb()+72] [SIGSEGV] And ORA-10382 (Doc ID 1987833.1)
看了solution讓我有些失望,需要升級到12.2版本。
SOLUTION
Unpublished Bug 16682786 : HIT ORA-7445 [KXFPQRCLB+1691] WHEN RUN SHARED CURSOR TEST
The base bug is fixed in 12.2 release.
Check patch downling link for availability of patch, if patch does not exists then create an SR with oracle support with output of "opatch lsinventory -details" to get a patch..
暫時也不明朗,繼續跳過看下一個錯誤。第二個錯誤
ORA-00600: internal error code, arguments: [srsnext_3], [], [], [], [], [], [], [], [], [], [], []
對於這個問題,metalink中確實有一個相關的文章 Query Fails with ORA-00600: Internal Error Code, Arguments: [srsnext_3] (Doc ID 1589589.1)
但是對於這個問題,提供的solution讓我也有些無奈。因為段時間內確實沒有升級的計劃。
CAUSE
Bug 11852469 : TS11.2.0.3V3 - TRC - SRSNEXT.
Rediscovery information:
If the srsnext_3 internal error is raised and the query involves statistical functions or other aggregates that are treated as distinct aggregates then you may be encountering this problem.
SOLUTION
Apply patch 11852469 if it exists for your version/platform
or
Apply patchset 11.2.0.3 where the fix is included
第三個錯誤。
Archived Log entry 82993 added for thread 1 sequence 82927 ID 0xb8c6d509 dest 1:
Wed Jun 10 14:38:53 2015
Errors in file /opt/app/oracle/dbccbspr1/diag/rdbms/PRODB/PRODB/trace/PRODB_ora_15548.trc (incident=2110436):
ORA-04030: out of process memory when trying to allocate 16328 bytes (koh-kghu sessi,pmucalm coll)
Incident details in: /opt/app/oracle/dbccbspr1/diag/rdbms/PRODB/PRODB/incident/incdir_2110436/PRODB_ora_15548_i2110436.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Archived Log entry 82993 added for thread 1 sequence 82927 ID 0xb8c6d509 dest 1:
Wed Jun 10 14:38:53 2015
Errors in file /opt/app/oracle/dbccbspr1/diag/rdbms/PRODB/PRODB/trace/PRODB_ora_15548.trc (incident=2110436):
ORA-04030: out of process memory when trying to allocate 16328 bytes (koh-kghu sessi,pmucalm coll)
Incident details in: /opt/app/oracle/dbccbspr1/diag/rdbms/PRODB/PRODB/incident/incdir_2110436/PRODB_ora_15548_i2110436.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
對於這個錯誤,metalink中有一篇相關的文章。診斷並解決 ORA-4030 錯誤 (Doc ID 1548826.1)
從metalink的描述來看,該錯誤意味著 Oracle Server 程式無法從作業系統分配更多記憶體。該記憶體由 PGA(Program Global Area)組成,其內容取決於伺服器配置。對於專用的伺服器程式,記憶體包含堆疊以及用於儲存使用者會話資料、遊標資訊和排序區的 UGA(User Global Area)。在多執行緒配置中(共享伺服器),UGA 被分配在 SGA(System Global Area)中,所以在這種配置下 UGA 不是造成 ORA-4030 錯誤的原因。
因此,ORA-4030 表示程式需要更多記憶體(堆疊 UGA 或 PGA)來執行其任務。
看起來是快取配置出問題了。
先賣個關子,其實事實並非如此,而且ora-00600的錯誤,如果不是反覆出現,嚴重影響,是不會直接去考慮打補丁,可能透過一些其它的方式去做。後續會結合一些分析方法來看。
從metalink的描述來看,該錯誤意味著 Oracle Server 程式無法從作業系統分配更多記憶體。該記憶體由 PGA(Program Global Area)組成,其內容取決於伺服器配置。對於專用的伺服器程式,記憶體包含堆疊以及用於儲存使用者會話資料、遊標資訊和排序區的 UGA(User Global Area)。在多執行緒配置中(共享伺服器),UGA 被分配在 SGA(System Global Area)中,所以在這種配置下 UGA 不是造成 ORA-4030 錯誤的原因。
因此,ORA-4030 表示程式需要更多記憶體(堆疊 UGA 或 PGA)來執行其任務。
看起來是快取配置出問題了。
先賣個關子,其實事實並非如此,而且ora-00600的錯誤,如果不是反覆出現,嚴重影響,是不會直接去考慮打補丁,可能透過一些其它的方式去做。後續會結合一些分析方法來看。
來自 “ ITPUB部落格 ” ,連結:http://blog.itpub.net/23718752/viewspace-1696076/,如需轉載,請註明出處,否則將追究法律責任。
相關文章
- 一次ORA-00600問題的排查和分析(下)
- 關於SQLRecoverableException問題的排查和分析SQLException
- 記一次oom問題排查OOM
- 記錄一次問題排查
- 一次容器MySQL的效能問題排查MySql
- 記一次排查CPU高的問題
- 記一次 Laravel MethodNotAllowedHttpException 問題排查LaravelHTTPException
- 一次快取效能問題排查快取
- 記一次線上FGC問題排查GC
- 記一次OOM問題排查過程OOM
- 一次線上問題的排查解決過程
- 一次線上CPU高的問題排查實踐
- 一次線上問題排查所引發的思考
- 排查問題的思路和清單
- 【問題排查篇】一次業務問題對 ES 的 cardinality 原理探究
- 記一次線上websocket返回400問題排查Web
- 記一次 MySQL 資料庫問題排查MySql資料庫
- 一次ygc越來越慢的問題排查過程GC
- 記一次線上崩潰問題的排查過程
- 記一次棧溢位異常問題的排查
- 從一次問題排查聊聊問什麼要懂原理
- 記一次記憶體溢位問題的排查、分析過程及解決思路記憶體溢位
- 一次資料庫響應緩慢的問題排查資料庫
- 記一次 Kafka 重啟失敗問題排查Kafka
- Arthas常用功能及一次線上問題排查
- 一次IOS通知推送問題排查全過程iOS
- 記一次線上報錯日誌問題排查
- 線上問題排查:記一次 Redis Cluster Pipeline 導致的死鎖問題Redis
- 記一次協助排查許可權問題的經歷
- 測不準原理?記一次Guava佇列問題的排查Guava佇列
- 一次資料庫無法登陸的問題及排查資料庫
- 一次資料庫無法登陸的"問題"及排查資料庫
- job處理緩慢的效能問題排查與分析
- 一次郵件傳送協議SMTP問題排查協議
- Java線上問題排查神器Arthas實戰分析Java
- 記一次SparkStreaming不產生新的batchJob的問題排查SparkBAT
- Dev 日誌 | 一次 Segmentation Fault 和 GCC Illegal Instruction 編譯問題排查devSegmentationGCStruct編譯
- java問題排查Java