頻發:故障排除之又見 ORA-4031丨雲和恩墨技術通訊
親愛的讀者朋友:
為了及時共享行業案例,通知共性問題,達成共享和提前預防,我們整理和編輯了《雲和恩墨技術通訊》,透過對過去一段時間的知識回顧,故障歸納,以期提供有價值的資訊供大家參考。同時,我們也希望能夠將熱點事件、新的產品特性及其他有價值的資訊聚集起來,為您提供具有前瞻性的支援資訊,保持對於當前最新的資料庫新聞和事件的瞭解,其中包括重要資料庫產品釋出、警報、更新、新版本、補丁等。
本期目錄:
新聞:2019年11月資料庫流行度排行
經驗:Oracle RAC跨節點訪問資料塊,節點長事務加劇gc等待
經驗:IBM MQ通道連線數達到最大故障分析
問題:子游標過多導致資料庫HANG
頻發:再談Library Cache Lock
頻發:故障排除之又見ORA-4031
警示:強制關閉OGG程式觸發bug致abended
公告:首屆墨天輪年度十大突出貢獻人物評選活動
雲和恩墨技術通訊集錦:
部分精選-頻發:故障排除之又見 ORA-4031 作者:候靜遠
當遇到ORA-4031錯誤時,你會不會內心一緊。Oracle程式在向SGA申請記憶體時,如果申請失敗,則會丟擲這個錯誤,大部分情況下是在向SGA中的 shared pool申請記憶體時失敗。嚴重情況下,可能導致資料庫出現異常崩潰。本文分享客戶近期碰到的一起由於ORA-4031問題導致資料庫異常當機的案例,供大家參考。
問題描述
2019年9月4日凌晨3點左右,接到監控系統告警:資料庫出現異常,無法連線。登陸到資料庫1節點檢視後臺alert日誌發現有大量ORA-04031報錯,2節點有少量報錯。為了儘快恢復業務,嘗試直接重啟1節點資料庫,重啟完成之後恢復正常。
問題分析
1. 節點後臺對應alert日誌:
Wed Sep 04 03:57:50 2019
Errors in file /u01/app/oracle/diag/rdbms/xxxxx/xxxxx/trace/xxxxx_smon_29747.trc:
ORA-00604: error occurred at recursive SQL level 1
ORA-04031: unable to allocate 3896 bytes of shared memory ("shared pool","update sys.col_usage$ set ...","sga heap(2,0)","kglsim object batch")
Wed Sep 04 03:58:10 2019
Errors in file /u01/app/oracle/diag/rdbms/xxxxx/xxxxx/trace/xxxxx_smon_29747.trc:
ORA-00604: error occurred at recursive SQL level 1
ORA-04031: unable to allocate 3896 bytes of shared memory ("shared pool","update sys.col_usage$ set ...","sga heap(1,0)","kglsim object batch")
Wed Sep 04 03:58:26 2019
Errors in file /u01/app/oracle/diag/rdbms/xxxxx/xxxxx/trace/xxxxx_smon_29747.trc:
ORA-00604: error occurred at recursive SQL level 1
ORA-04031: unable to allocate 3896 bytes of shared memory ("shared pool","update sys.col_usage$ set ...","sga heap(7,0)","kglsim object batch")
Wed Sep 04 03:58:42 2019
Errors in file /u01/app/oracle/diag/rdbms/xxxxx/xxxxx/trace/xxxxx_smon_29747.trc:
ORA-00604: error occurred at recursive SQL level 1
ORA-04031: unable to allocate 3896 bytes of shared memory ("shared pool","update sys.col_usage$ set ...","sga heap(6,0)","kglsim object batch")
Wed Sep 04 03:58:57 2019
Errors in file /u01/app/oracle/diag/rdbms/xxxxx/xxxxx/trace/xxxxx_smon_29747.trc:
ORA-00604: error occurred at recursive SQL level 1
ORA-04031: unable to allocate 3896 bytes of shared memory ("shared pool","update sys.mon_mods$ set ins...","sga heap(5,0)","kglsim object batch")
Wed Sep 04 03:59:08 2019
Errors in file /u01/app/oracle/diag/rdbms/xxxxx/xxxxx/trace/xxxxx_xxx0_42548.trc:
ORA-00604: error occurred at recursive SQL level 1
ORA-04031: unable to allocate 3896 bytes of shared memory ("shared pool","select count(*) from sys.job...","sga heap(3,0)","kglsim object batch")
Wed Sep 04 03:59:10 2019
License high water mark = 97
USER (ospid: 28750): terminating the instance
統計1節點每個子池及duration出現04031的次數,sga heap(n,0)-n代表第幾個子池,0代表是第幾個duration:
根據alert日誌可以看出,所有的ora-4031都發生在shared pool子池的第0個duration上。
Summary of resize operations history:
shared pool start 3.19 GB now 3.19 GB 0 grows 0 shrinks
large pool start 0.50 GB now 0.50 GB 0 grows 0 shrinks
java pool start 0.50 GB now 0.50 GB 0 grows 0 shrinks
SGA Target start 32.00 GB now 32.00 GB 0 grows 0 shrinks
DEFAULT buffer cache start 27.59 GB now 27.59 GB 0 grows 0 shrinks
PGA Target start 11.00 GB now 11.00 GB 0 grows 0 shrinks
發現shared pool並沒有進行resize。
==============================================
TOP 20 MEMORY USES ACROSS SGA HEAP 1 - 7
----------------------------------------------
"KGLH0 " 1103 MB 19%
"SQLA " 1081 MB 18%
"free memory " 835 MB 14%
"gcs resources " 794 MB 14%
"gcs shadows " 550 MB 9%
"db_block_hash_buckets " 178 MB 3%
"ASH buffers " 160 MB 3%
"KGLHD " 157 MB 3%
"Checkpoint queue " 156 MB 3%
"kglsim object batch " 90 MB 2%
"kglsim heap " 56 MB 1%
"ges resource " 53 MB 1%
"ges enqueues " 43 MB 1%
"KGLDA " 41 MB 1%
"dbwriter coalesce buffer " 40 MB 1%
"dirty object counts array " 40 MB 1%
"object queue " 35 MB 1%
"gcs res hash bucket " 32 MB 1%
"dbktb: trace buffer " 31 MB 1%
"FileOpenBlock " 30 MB 1%
TOTALS ---------------------------------------
Total free memory 830 MB
Total memory alloc. 5026 MB
Grand total 5856 MB
==============================================
2. 節點後臺對應的alert日誌:
Wed Sep 04 03:23:18 2019
Emon ping encountered error 12801
Errors in file /u01/app/oracle/diag/rdbms/xxxxx/xxxxx2/trace/xxxxx2_q002_35378.trc:
ORA-12801: error signaled in parallel query server PZ99, instance dnfwglpt1:xxxxx (1)
ORA-04031: unable to allocate 3896 bytes of shared memory ("shared pool","select inst_id, reg_id, num_...","sga heap(4,0)","kglsim object batch")
Wed Sep 04 03:23:23 2019
Emon ping encountered error 12801
Errors in file /u01/app/oracle/diag/rdbms/xxxxx/xxxxx2/trace/xxxxx2_q003_35453.trc:
ORA-12801: error signaled in parallel query server PZ99, instance dnfwglpt1:xxxxx (1)
ORA-04031: unable to allocate 3896 bytes of shared memory ("shared pool","select inst_id, reg_id, num_...","sga heap(1,0)","kglsim object batch")
Wed Sep 04 03:23:29 2019
Emon ping encountered error 12801
Errors in file /u01/app/oracle/diag/rdbms/xxxxx/xxxxx2/trace/xxxxx2_q004_35725.trc:
ORA-12801: error signaled in parallel query server PZ99, instance dnfwglpt1:xxxxx (1)
ORA-04031: unable to allocate 3896 bytes of shared memory ("shared pool","select inst_id, reg_id, num_...","sga heap(5,0)","kglsim object batch")
Wed Sep 04 03:23:34 2019
Emon ping encountered error 12801
Errors in file /u01/app/oracle/diag/rdbms/xxxxx/xxxxx2/trace/xxxxx2_q001_35778.trc:
ORA-12801: error signaled in parallel query server PZ99, instance dnfwglpt1:xxxxx (1)
ORA-04031: unable to allocate 3896 bytes of shared memory ("shared pool","select inst_id, reg_id, num_...","sga heap(2,0)","kglsim object batch")
Wed Sep 04 03:23:39 2019
Emon ping encountered error 12801
Errors in file /u01/app/oracle/diag/rdbms/xxxxx/xxxxx2/trace/xxxxx2_q002_36069.trc:
ORA-12801: error signaled in parallel query server PZ99, instance dnfwglpt1:xxxxx (1)
ORA-04031: unable to allocate 3896 bytes of shared memory ("shared pool","select inst_id, reg_id, num_...","sga heap(6,0)","kglsim object batch")
Wed Sep 04 03:23:45 2019
Emon ping encountered error 12801
Errors in file /u01/app/oracle/diag/rdbms/xxxxx/xxxxx2/trace/xxxxx2_q003_36151.trc:
ORA-12801: error signaled in parallel query server PZ99, instance dnfwglpt1:xxxxx (1)
ORA-04031: unable to allocate 3896 bytes of shared memory ("shared pool","select inst_id, reg_id, num_...","sga heap(3,0)","kglsim object batch")
Wed Sep 04 03:23:50 2019
Emon ping encountered error 12801
Errors in file /u01/app/oracle/diag/rdbms/xxxxx/xxxxx2/trace/xxxxx2_q004_36242.trc:
ORA-12801: error signaled in parallel query server PZ99, instance dnfwglpt1:xxxxx (1)
ORA-04031: unable to allocate 3896 bytes of shared memory ("shared pool","select inst_id, reg_id, num_...","sga heap(7,0)","kglsim object batch")
Wed Sep 04 03:23:55 2019
Emon ping encountered error 12801
Errors in file /u01/app/oracle/diag/rdbms/xxxxx/xxxxx2/trace/xxxxx2_q001_36305.trc:
ORA-12801: error signaled in parallel query server PZ99, instance dnfwglpt1:xxxxx (1)
ORA-04031: unable to allocate 3896 bytes of shared memory ("shared pool","select inst_id, reg_id, num_...","sga heap(4,0)","kglsim object batch")
統計2節點每個子池及duration出現04031的次數:
根據alert日誌可以看出,所有的ora-4031同樣都發生在shared pool子池的第0個duration上,導致4031的根本原因是因為shared pool子池的第0個duration記憶體不足。
透過設定sga_target的ASMM管理後,共享池(shared_pool)和流池(streams pool)每個子池都是4個duration。它們分別是:instance,session,cursor,execution,只有第四個duration,也就是execution是可以resize的,而當第0個duration記憶體不足的時候不能resize,就直接會報錯ora-4031。
問題解決
透過禁用duration,必須設定引數"_enable_shared_pool_durations=fales",並重啟資料庫。
alter system set "_enable_shared_pool_durations"=false scope=spfile;
透過該引數設定後,把它們四個duration都合併到一個池中,不會再出現一個duration的記憶體被耗盡,而另外一個duration仍具有空閒記憶體,對於共享池和流池都是這樣;設定sga_target之後,所有池都會透過buffer cache來傳輸granules(顆粒)整數倍大小的記憶體,如果shrink,則返回buffer cache,沒有從一個pool到另外一個pool的直接傳輸,所有的記憶體resize都會以buffer cache作為源和目標。
設定該引數的唯一負面影響是SGA resize的時候,不能從shared pool中取記憶體到其他的pool。
來自 “ ITPUB部落格 ” ,連結:http://blog.itpub.net/31556440/viewspace-2666444/,如需轉載,請註明出處,否則將追究法律責任。
相關文章
- 《雲和恩墨技術通訊-2020.03》.pdf
- 串列埠通訊常見的錯誤和故障排除方法串列埠
- 【雲和恩墨大講堂】故障分析 | library cache latch 競爭案例分享
- 澳洲皇家墨爾本理工大學:技術故障是導致無人機事故頻發的原因無人機
- 雲資料庫的雲端故障排除策略:關鍵技術與實施方案資料庫
- 雲和恩墨獨家搶先測試In-Memory Option 特性!
- 併發技術3:管道通訊
- 筆記本常見故障與排除方法筆記
- 常見的網路故障排除辦法
- 通過redis的monitor命令排除故障Redis
- 【雲和恩墨】內外兼修:Oracle ACED熊軍談Oracle學習Oracle
- 【雲和恩墨】一次 truncate 核心表衍生的安全管理思考
- 從Oracle DBA出發,走進GaussDB的世界 - 雲和恩墨大講堂GaussDB專題Oracle
- 直播預告丨先睹為快!Oracle 20c新特性解析 - 2020雲和恩墨大講堂Oracle
- 電腦常見故障排除學習方法彙總
- Magic Box雲萌魔盒系統技術開發分析丨DAPP丨DEFI丨NFTAPP
- 【恩墨學院】恩墨學院獲得Oracle WDP全國授權Oracle
- 直播預告丨一名PGer帶你走進PostgreSQL的世界 - 雲和恩墨大講堂PG系列分享(1)SQL
- TCP/IP故障排除TCP
- 常見物聯網近距離無線通訊技術解析
- 樂訊通雲通訊:物聯卡,現代智慧家居技術的必不可缺
- 【雲和恩墨】嵌入雲端:12c Policy-Managed Cluster為Oracle DBaaS助力Oracle
- 【雲和恩墨】Oracle初學者入門指南-什麼是 Metalink 或 MOS ?Oracle
- 故障排除提示:5 個最常見的 Linux 問題Linux
- Java併發技術05:傳統執行緒同步通訊技術Java執行緒
- Longhorn 雲原生容器分散式儲存 - 故障排除指南分散式
- 訊號處理技術:現代通訊技術的基石
- 實時通訊技術大亂鬥
- 前端常用的通訊技術前端
- 雲和恩墨版Oracle Database 12c 最新體系結構圖下載OracleDatabase
- 常見電腦記憶體故障現象與排除方法記憶體
- 寬頻連線錯誤651怎麼辦 寬頻連線錯誤651故障排除方法
- 【恩墨學院】經典故障分析 - ASSM引發的索引爭用與 enq HW -contention 等待事件SSM索引ENQ事件
- ORACLE RAC GUARD故障排除——RAC GUARD概念和管理Oracle
- 雲和恩墨蓋國強:2020,這是資料庫最好的時代!資料庫
- 英飛凌成為GSMA會員:為行動通訊行業奉獻安全和寬頻技術專長行業
- 網路通訊技術基礎
- 無線通訊的分集技術