ORA-04031錯誤導致當機案例分析

學步159發表於2020-12-16

原文網址 : https://blog.csdn.net/wgmes/article/details/111268346

ORA-04031錯誤導致當機案例分析

2015-11-03 15:34 瀟湘隱者閱讀(10870) 評論(1) 編輯收藏

今天遇到一起ORACLE資料庫當機案例，下面是對這起資料庫當機案例的原因進行分析、解讀。分析過程中順便記錄一下這個案例的前因後果，攢點經驗值，培養一下分析、解決問題的能力。

案例環境：

作業系統：Oracle Linux Server release 5.7 64 bit

資料庫版本：Oracle Database 10g Release 10.2.0.4.0 - 64bit Production

案例分析：

收到告警去檢查資料庫時，發現例項已經當機。檢查告警日誌，發現下面錯誤資訊：

ORA-00604: error occurred at recursive SQL level 1

ORA-04031: unable to allocate 32 bytes of shared memory ("shared pool","select count(*) from sys.job...","sql area","tmp")

Mon Nov  2 11:43:00 2015

Errors in file /u01/app/oracle/admin/SCM2/bdump/scm2_cjq0_6571.trc:

ORA-00604: error occurred at recursive SQL level 1

ORA-04031: unable to allocate 32 bytes of shared memory ("shared pool","select job, nvl2(last_date, ...","sql area","tmp")

Mon Nov  2 11:43:00 2015

Errors in file /u01/app/oracle/admin/SCM2/bdump/scm2_cjq0_6571.trc:

ORA-00604: error occurred at recursive SQL level 1

ORA-04031: unable to allocate 32 bytes of shared memory ("shared pool","select count(*) from sys.job...","sql area","tmp")

Mon Nov  2 11:43:05 2015

Errors in file /u01/app/oracle/admin/SCM2/bdump/scm2_cjq0_6571.trc:

ORA-00604: error occurred at recursive SQL level 1

ORA-04031: unable to allocate 32 bytes of shared memory ("shared pool","select job, nvl2(last_date, ...","sql area","tmp")

Mon Nov  2 11:43:05 2015

Errors in file /u01/app/oracle/admin/SCM2/bdump/scm2_cjq0_6571.trc:

ORA-00604: error occurred at recursive SQL level 1

ORA-04031: unable to allocate 32 bytes of shared memory ("shared pool","select count(*) from sys.job...","sql area","tmp")

Mon Nov  2 11:43:08 2015

Errors in file /u01/app/oracle/admin/SCM2/bdump/scm2_reco_6569.trc:

ORA-04031: unable to allocate 32 bytes of shared memory ("shared pool","select host,userid,password,...","sql area","tmp")

Mon Nov  2 11:43:08 2015

RECO: terminating instance due to error 4031

Mon Nov  2 11:43:08 2015

Errors in file /u01/app/oracle/admin/SCM2/bdump/scm2_pmon_6555.trc:

ORA-04031: unable to allocate  bytes of shared memory ("","","","")

Instance terminated by RECO, pid = 6569

從告警日誌我們可以看到ORA-00604與ORA-04031錯誤導致了這次當機事故（RECO: terminating instance due to error 4031）：

$ oerr ora 4031

04031, 00000, "unable to allocate %s bytes of shared memory (\"%s\",\"%s\",\"%s\",\"%s\")"

// *Cause: More shared memory is needed than was allocated in the shared

// pool.

// *Action: If the shared pool is out of memory, either use the

// dbms_shared_pool package to pin large packages,

// reduce your use of shared memory, or increase the amount of

// available shared memory by increasing the value of the

// INIT.ORA parameters "shared_pool_reserved_size" and

// "shared_pool_size".

// If the large pool is out of memory, increase the INIT.ORA

// parameter "large_pool_size".

一般出現ORA-04031錯誤可能由兩個原因引起：

1：記憶體中存在大量碎片，導致在分配記憶體的時候，沒有連續的記憶體可存放，此問題一般是需要在開發的角度上入手，比如增加繫結變數，減少硬解析來改善和避免；

2.記憶體容量不足，需要擴大記憶體。

這臺機器分配的實體記憶體為8G，結果檢查發現SGA只分配了1168M，不到2G，瞬時碉堡了。此時真是很無語。ASH Report分析當機前後的Buffer Cache和Shared Pool大小如下所示。

檢視跟蹤檔案，可以看到SGA: allocation forcing component growth等待事件，可以確認的是由於SGA無法增長導致，也就是SGA被撐爆了，結合ASH Report我們可以看到當時Shared Pool的大小已經接近SGA的69.6%大小。

SO: 0xa617d9c0, type: 4, owner: 0xa8a26c68, flag: INIT/-/-/0x00

  (session) sid: 932 trans: (nil), creator: 0xa8a26c68, flag: (51) USR/- BSY/-/-/-/-/-

            DID: 0001-000A-00000003, short-term DID: 0000-0000-00000000

            txn branch: (nil)

            oct: 0, prv: 0, sql: (nil), psql: (nil), user: 0/SYS

  last wait for 'SGA: allocation forcing component growth' blocking sess=0x(nil) seq=51324 wait_time=10714 seconds since wait started=0

          =0, =0, =0

  Dumping Session Wait History

   for 'SGA: allocation forcing component growth' count=1 wait_time=10714

          =0, =0, =0

   for 'SGA: allocation forcing component growth' count=1 wait_time=10512

          =0, =0, =0

   for 'latch: shared pool' count=1 wait_time=892

          address=600e7320, number=d6, tries=0

   for 'latch: shared pool' count=1 wait_time=28

          address=600e7320, number=d6, tries=0

   for 'latch: shared pool' count=1 wait_time=51

          address=600e7320, number=d6, tries=0

   for 'latch: shared pool' count=1 wait_time=114

          address=600e7320, number=d6, tries=0

   for 'latch: shared pool' count=1 wait_time=120

          address=600e7320, number=d6, tries=0

   for 'latch: library cache' count=1 wait_time=33

          address=a3fa46e8, number=d7, tries=1

結合上面的一些分析，可以斷定SGA的不合理設定導致shared pool的記憶體被全部耗盡，SGA被撐爆了。於是調整SGA的引數才是解決問題的正確對策。另外考慮到這個資料庫也正常執行了較長一段時間，也分析了一下awr、addm報告，發現系統的硬解析相當嚴重。另外通過下面指令碼觀察了一段時間shared pool的變化，發現其收縮、增長較頻繁。

SELECT start_time,

       component,

       oper_type,

       oper_mode,

       initial_size / 1024 / 1024 "INITIAL",

       final_size / 1024 / 1024   "FINAL",

       end_time

FROM   v$sga_resize_ops

WHERE  component IN ( 'DEFAULT buffer cache', 'shared pool' )

       AND status = 'COMPLETE'

ORDER  BY start_time,

          component;

這個可以通過設定資料庫引數SHARED_POOL_SIZE，保證SHARED_POOL_SIZE大小不會由於記憶體緊張而低於這個大小，另外可以設定SGA resize的時間間隔

ALTER SYSTEM SET “_memory_broker_stat_interval”=n SCOPE=SPFILE;

問題雖然解決了，但是真正需要反思的是為什麼這個SGA_MAX_SIZE設定為1168M大小的事情！而且沒有在巡檢當中被發現。

參考資料：

http://blog.csdn.net/wenzhongyan/article/details/29866845

http://blog.chinaunix.net/uid-20802110-id-4188357.html

http://www.oraclefreebase.com/blog/2015/10/%E6%95%B0%E6%8D%AE%E5%BA%93ora-4031%E6%95%B0%E6%8D%AE%E5%BA%93crash/

作者：瀟湘隱者

出處： http://www.cnblogs.com/kerrycode/

win10系統提示dcom遇到錯誤1068導致當機的解決步驟
2020-08-30
Win10
如何解決ORA-04031 錯誤(轉)
2019-07-11
又見想當然導致的誤譯
2018-08-10
動態建立 @ViewChild 導致執行時錯誤的原因分析
2022-03-05
View
win10系統出現dcom錯誤1068導致藍色畫面當機如何解決
2020-04-27
Win10
MySQL 網路導致的複製報錯案例
2022-08-02
MySql
Lombok 的@ToString導致的Maven編譯錯誤
2021-06-01
LombokMaven編譯
MySQL Online DDL導致全域性鎖表案例分析
2019-05-11
MySql
git合併丟失程式碼問題分析與解決(錯誤操作導致)
2020-11-27
Git
升級Xcode10導致的編譯錯誤
2018-11-01
XCode編譯
[譯] RxJS: 避免因濫用 switchMap 而導致錯誤
2019-04-14
JS
php輸出json資料，導致前端js判斷錯誤的分析及解決
2022-04-01
PHPJSON前端
[20190104]sga_target 的設定和ORA-04031錯誤.txt
2019-01-04
12.2.0.1bug導致的Failed to register in OCRLOCAL group.錯誤
2018-07-24
AI
smt加工這些失誤會導致smt加工出錯
2020-07-29
PHP程式導致伺服器當機怎麼辦
2022-06-01
PHP伺服器
利用Windbg分析Magicodes.IE一次錯誤編寫導致記憶體劇增
2021-12-12
記憶體
LGWR寫操作會導致效能全域性卡頓案例分析
2024-02-20
【北亞伺服器資料恢復】LUN對映出錯導致檔案系統一致性錯誤的資料恢復案例
2022-03-31
伺服器資料恢復
案例解析：執行緒池使用不當導致的系統崩潰
2019-06-16
執行緒
Runtime PM 處理不當導致的 external abort on non-linefetch 案例分享
2021-06-02
PostgreSQL DBA(29) - Backup&Recovery#2(日期格式導致的錯誤)
2019-03-11
SQL
Session物件改變請求頭值導致的401錯誤
2023-11-20
Session物件
什麼會導致HTTP代理出現400錯誤請求
2021-10-15
HTTP
記錄一次homestead意外關閉導致的錯誤
2021-05-31
Oracle 12c因bug導致ORA-04031問題處理過程
2019-08-23
Oracle
Apache Solr錯誤預設配置導致的RCE（CVE-2019-12409）
2020-11-22
ApacheSolr
什麼會導致HTTP出現429請求過多錯誤？
2021-11-22
HTTP
laravel Route RESTful 因路由先後順序導致的解析錯誤
2021-11-23
LaravelREST路由
故障分析 | 手動 rm 掉 binlog 導致主從報錯
2022-05-16
Hadoop錯誤之namenode當機的資料恢復
2018-04-08
Hadoop資料恢復
Oracle當number型別超過一定長度直方圖限制導致SQL執行計劃錯誤
2021-01-05
Oracle型別直方圖SQL
解決 PBootCMS 中因資料庫名稱錯誤導致的“執行 SQL 發生錯誤！錯誤：no such table: ay_config”問題
2024-10-01
boot資料庫SQL
伺服器當機會導致Kafka訊息丟失嗎
2022-08-06
伺服器Kafka
sql最佳化-錯誤強制型別轉換導致索引失效
2024-08-26
SQL型別索引
Redis使用不當導致應用卡死
2022-12-08
Redis
複製錯誤案例分享（一）
2018-11-12
複製錯誤案例分享（二）
2018-11-12

ORA-04031錯誤導致當機案例分析

ORA-04031錯誤導致當機案例分析

相關文章