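Case Analysis: Instance Crash Caused by ORA-04031
Today I ran into an Oracle database crash. Below I analyze and explain its cause. Along the way I also record the background and sequence of events, to build up experience and practice analyzing and solving problems.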
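Environment:
Operating system: Oracle Linux Server release 5.7 64-bit
Database version: Oracle Database 10g Release 10.2.0.4.0 - 64bit Production
Analysis:
I received an alert and checked the database, and the instance had already crashed. The alert log contained the following errors: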
ORA-00604: error occurred at recursive SQL level 1
ORA-04031: unable to allocate 32 bytes of shared memory ("shared pool","select count(*) from sys.job...","sql area","tmp")
Mon Nov 2 11:43:00 2015
Errors in file /u01/app/oracle/admin/SCM2/bdump/scm2_cjq0_6571.trc:
ORA-00604: error occurred at recursive SQL level 1
ORA-04031: unable to allocate 32 bytes of shared memory ("shared pool","select job, nvl2(last_date, ...","sql area","tmp")
Mon Nov 2 11:43:00 2015
Errors in file /u01/app/oracle/admin/SCM2/bdump/scm2_cjq0_6571.trc:
ORA-00604: error occurred at recursive SQL level 1
ORA-04031: unable to allocate 32 bytes of shared memory ("shared pool","select count(*) from sys.job...","sql area","tmp")
Mon Nov 2 11:43:05 2015
Errors in file /u01/app/oracle/admin/SCM2/bdump/scm2_cjq0_6571.trc:
ORA-00604: error occurred at recursive SQL level 1
ORA-04031: unable to allocate 32 bytes of shared memory ("shared pool","select job, nvl2(last_date, ...","sql area","tmp")
Mon Nov 2 11:43:05 2015
Errors in file /u01/app/oracle/admin/SCM2/bdump/scm2_cjq0_6571.trc:
ORA-00604: error occurred at recursive SQL level 1
ORA-04031: unable to allocate 32 bytes of shared memory ("shared pool","select count(*) from sys.job...","sql area","tmp")
Mon Nov 2 11:43:08 2015
Errors in file /u01/app/oracle/admin/SCM2/bdump/scm2_reco_6569.trc:
ORA-04031: unable to allocate 32 bytes of shared memory ("shared pool","select host,userid,password,...","sql area","tmp")
Mon Nov 2 11:43:08 2015
RECO: terminating instance due to error 4031
Mon Nov 2 11:43:08 2015
Errors in file /u01/app/oracle/admin/SCM2/bdump/scm2_pmon_6555.trc:
ORA-04031: unable to allocate bytes of shared memory ("","","","")
Instance terminated by RECO, pid = 6569
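With a long alert log, it helps to summarize which background processes hit ORA-04031, in which pool, and which process finally terminated the instance. The following is a minimal sketch of that summary. It assumes the trace file names follow the <sid>_<process>_<ospid>.trc pattern seen above, and the function and pattern names are hypothetical. The sample lines are copied from the log excerpt above.

```python
import os
import re
from collections import Counter

# "Errors in file <path>.trc:" names the trace file of the process that hit the error.
TRACE_FILE = re.compile(r"Errors in file (\S+\.trc):")
# ORA-04031 and its four quoted arguments; the byte count may be empty (see the PMON line).
ORA4031 = re.compile(
    r'ORA-04031: unable to allocate (\d*) ?bytes of shared memory '
    r'\("([^"]*)","([^"]*)","([^"]*)","([^"]*)"\)'
)
TERMINATING = re.compile(r"^(\w+): terminating instance due to error (\d+)")


def summarize_alert_log(text):
    """Count ORA-04031 errors per (process, pool) and find who terminated the instance."""
    counts = Counter()
    process = None
    terminator = None
    for line in text.splitlines():
        m = TRACE_FILE.search(line)
        if m:
            # Assumed naming: <sid>_<process>_<ospid>.trc, e.g. scm2_cjq0_6571.trc
            stem = os.path.basename(m.group(1))[: -len(".trc")]
            parts = stem.rsplit("_", 2)
            process = parts[1] if len(parts) == 3 else stem
            continue
        m = ORA4031.search(line)
        if m:
            pool = m.group(2) or "<empty>"
            counts[(process, pool)] += 1
            continue
        m = TERMINATING.search(line)
        if m:
            terminator = (m.group(1), int(m.group(2)))
    return counts, terminator


# A few lines copied from the alert log above.
SAMPLE = """\
Errors in file /u01/app/oracle/admin/SCM2/bdump/scm2_cjq0_6571.trc:
ORA-00604: error occurred at recursive SQL level 1
ORA-04031: unable to allocate 32 bytes of shared memory ("shared pool","select count(*) from sys.job...","sql area","tmp")
Errors in file /u01/app/oracle/admin/SCM2/bdump/scm2_reco_6569.trc:
ORA-04031: unable to allocate 32 bytes of shared memory ("shared pool","select host,userid,password,...","sql area","tmp")
RECO: terminating instance due to error 4031
Errors in file /u01/app/oracle/admin/SCM2/bdump/scm2_pmon_6555.trc:
ORA-04031: unable to allocate bytes of shared memory ("","","","")
"""

counts, terminator = summarize_alert_log(SAMPLE)
print(dict(counts))
print(terminator)  # ('RECO', 4031)
```

Running it on the whole alert log shows at a glance that the failures were all in the shared pool and that RECO was the process that brought the instance down.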
The alert log shows that the ORA-00604 and ORA-04031 errors caused this crash (RECO: terminating instance due to error 4031). The oerr description of ORA-04031:
$ oerr ora 4031
04031, 00000, "unable to allocate %s bytes of shared memory (\"%s\",\"%s\",\"%s\",\"%s\")"
// *Cause: More shared memory is needed than was allocated in the shared
// pool.
// *Action: If the shared pool is out of memory, either use the
// dbms_shared_pool package to pin large packages,
// reduce your use of shared memory, or increase the amount of
// available shared memory by increasing the value of the
// INIT.ORA parameters "shared_pool_reserved_size" and
// "shared_pool_size".
// If the large pool is out of memory, increase the INIT.ORA
// parameter "large_pool_size".
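ORA-04031 is generally caused by one of two things:
1. The memory is heavily fragmented. When memory is allocated, no single contiguous free chunk is large enough to hold the request. This usually has to be tackled from the development side, for example by using more bind variables to reduce hard parsing.
2. The memory area is too small and needs to be enlarged.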
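Cause 1 can be confusing because the total free memory may look sufficient. The toy model below shows why that is not enough. It uses a simple first-fit allocator over a list of free chunks, which is an assumption for illustration only. Oracle's shared pool uses its own heap manager with free lists, an LRU and a reserved area, not this algorithm. The chunk sizes are hypothetical.

```python
def first_fit(free_chunks, request):
    """Toy first-fit allocator over a list of free chunk sizes in bytes.

    Carves `request` bytes out of the first chunk that is large enough and
    returns its index. Returns None if no single chunk can hold the request,
    even when the total free space would be enough.
    """
    for i, size in enumerate(free_chunks):
        if size >= request:
            free_chunks[i] = size - request
            return i
    return None


# Hypothetical heap: 64 KB free in total, scattered as sixteen 4 KB pieces.
fragmented = [4096] * 16
print(sum(fragmented))              # 65536 bytes free in total
print(first_fit(fragmented, 8192))  # None: no contiguous 8 KB chunk

# The same 64 KB as one contiguous chunk satisfies the request.
contiguous = [65536]
print(first_fit(contiguous, 8192))  # 0
print(contiguous)                   # [57344]
```

Reducing hard parsing attacks this from the demand side, because fewer distinct cursors are allocated and freed, so the shared pool churns less.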
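This machine has 8 GB of physical memory, yet the SGA had been allocated only 1168 MB, less than 2 GB, which left me speechless. I then used the ASH report to look at the Buffer Cache and Shared Pool sizes before and after the crash.
The trace file shows the wait event 'SGA: allocation forcing component growth'. This confirms that the SGA could not grow any further; in other words, the SGA had been exhausted. According to the ASH report, the Shared Pool had grown to roughly 69.6% of the SGA at that point. The relevant excerpt from the trace file: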
SO: 0xa617d9c0, type: 4, owner: 0xa8a26c68, flag: INIT/-/-/0x00
(session) sid: 932 trans: (nil), creator: 0xa8a26c68, flag: (51) USR/- BSY/-/-/-/-/-
DID: 0001-000A-00000003, short-term DID: 0000-0000-00000000
txn branch: (nil)
oct: 0, prv: 0, sql: (nil), psql: (nil), user: 0/SYS
last wait for 'SGA: allocation forcing component growth' blocking sess=0x(nil) seq=51324 wait_time=10714 seconds since wait started=0
=0, =0, =0
Dumping Session Wait History
for 'SGA: allocation forcing component growth' count=1 wait_time=10714
=0, =0, =0
for 'SGA: allocation forcing component growth' count=1 wait_time=10512
=0, =0, =0
for 'latch: shared pool' count=1 wait_time=892
address=600e7320, number=d6, tries=0
for 'latch: shared pool' count=1 wait_time=28
address=600e7320, number=d6, tries=0
for 'latch: shared pool' count=1 wait_time=51
address=600e7320, number=d6, tries=0
for 'latch: shared pool' count=1 wait_time=114
address=600e7320, number=d6, tries=0
for 'latch: shared pool' count=1 wait_time=120
address=600e7320, number=d6, tries=0
for 'latch: library cache' count=1 wait_time=33
address=a3fa46e8, number=d7, tries=1
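The wait history is easier to read when it is totalled per event. The sketch below sums wait_time per event from the "Dumping Session Wait History" lines and keeps the trace's own units, which it does not interpret. The regular expression assumes the `for '<event>' count=<n> wait_time=<t>` layout shown above. It deliberately skips the single "last wait for ..." line, because that line has no count= field. The function name is hypothetical.

```python
import re
from collections import defaultdict

# History entries look like:  for 'latch: shared pool' count=1 wait_time=892
# The "last wait for ... blocking sess=..." line does not match (no count= after the name).
WAIT_ENTRY = re.compile(r"for '([^']+)' count=(\d+) wait_time=(\d+)")


def total_wait_time(trace_text):
    """Sum wait_time (in whatever unit the trace reports) per wait event."""
    totals = defaultdict(int)
    for m in WAIT_ENTRY.finditer(trace_text):
        totals[m.group(1)] += int(m.group(3))
    return dict(totals)


# The wait history lines copied from the trace excerpt above.
HISTORY = """\
for 'SGA: allocation forcing component growth' count=1 wait_time=10714
for 'SGA: allocation forcing component growth' count=1 wait_time=10512
for 'latch: shared pool' count=1 wait_time=892
for 'latch: shared pool' count=1 wait_time=28
for 'latch: shared pool' count=1 wait_time=51
for 'latch: shared pool' count=1 wait_time=114
for 'latch: shared pool' count=1 wait_time=120
for 'latch: library cache' count=1 wait_time=33
"""

for event, total in sorted(total_wait_time(HISTORY).items(), key=lambda kv: -kv[1]):
    print(f"{event:45s} {total}")
# SGA: allocation forcing component growth       21226
# latch: shared pool                             1205
# latch: library cache                           33
```

In this session, the time spent waiting for the SGA to grow a component dominates the shared pool latch waits.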
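From the analysis above, it is clear that the unreasonable SGA setting let the shared pool exhaust all of its memory and blow out the SGA. Adjusting the SGA parameters is therefore the correct fix. This database had also been running normally for quite a long time, so I analyzed the AWR and ADDM reports as well and found that hard parsing was severe. I then used the following script to watch the shared pool over a period of time and found that it shrank and grew quite frequently: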
SELECT start_time,
component,
oper_type,
oper_mode,
initial_size / 1024 / 1024 "INITIAL",
final_size / 1024 / 1024 "FINAL",
end_time
FROM v$sga_resize_ops
WHERE component IN ( 'DEFAULT buffer cache', 'shared pool' )
AND status = 'COMPLETE'
ORDER BY start_time,
component;
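"Quite frequently" can be turned into a number once the query's rows have been fetched, for example with an Oracle client driver (not shown). The sketch below counts operations per component and type and counts how often a component flips between GROW and SHRINK. It assumes each row is a tuple in the query's column order and that the rows keep the query's ORDER BY start_time. The rows in the example are made up for illustration and are not output from the crashed instance.

```python
from collections import Counter


def summarize_resize_ops(rows):
    """Count operations per (component, oper_type).

    Each row follows the column order of the v$sga_resize_ops query above:
    (start_time, component, oper_type, oper_mode, initial_mb, final_mb, end_time).
    """
    return Counter((row[1], row[2]) for row in rows)


def direction_changes(rows, component):
    """Number of times `component` switches between GROW and SHRINK.

    Assumes rows are already ordered by start_time, as the query's ORDER BY does.
    """
    ops = [row[2] for row in rows if row[1] == component and row[2] in ("GROW", "SHRINK")]
    return sum(1 for prev, cur in zip(ops, ops[1:]) if prev != cur)


# Made-up rows for illustration only.
rows = [
    ("11:00", "DEFAULT buffer cache", "SHRINK", "IMMEDIATE", 352, 336, "11:00"),
    ("11:00", "shared pool", "GROW", "IMMEDIATE", 784, 800, "11:00"),
    ("11:05", "DEFAULT buffer cache", "GROW", "DEFERRED", 336, 352, "11:05"),
    ("11:05", "shared pool", "SHRINK", "DEFERRED", 800, 784, "11:05"),
    ("11:10", "DEFAULT buffer cache", "SHRINK", "IMMEDIATE", 352, 336, "11:10"),
    ("11:10", "shared pool", "GROW", "IMMEDIATE", 784, 800, "11:10"),
]

print(summarize_resize_ops(rows))
print(direction_changes(rows, "shared pool"))  # 2
```

A high number of direction changes over a short window indicates that the shared pool and the buffer cache keep trading the same memory back and forth.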
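This can be addressed by setting the database parameter SHARED_POOL_SIZE. With automatic SGA management it acts as a minimum, so the shared pool will not be shrunk below that size when memory is tight. You can also set the interval between SGA resize operations through a hidden parameter (n is a placeholder for the value you choose):
ALTER SYSTEM SET "_memory_broker_stat_interval"=n SCOPE=SPFILE;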
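The effect of a SHARED_POOL_SIZE floor can be pictured with a toy model. The memory broker requests grows and shrinks, and each shrink is clamped so the pool never drops below the floor. This is only a sketch of the behaviour described above, not of how Oracle's memory broker actually decides resizes. The function name, the starting size and the resize requests are all hypothetical.

```python
def simulate_shared_pool(start_mb, deltas_mb, floor_mb=0):
    """Toy model of a shared pool that is repeatedly grown and shrunk.

    Each delta is a hypothetical resize request in MB (negative = shrink).
    A shrink is clamped so the size never drops below floor_mb, the role
    SHARED_POOL_SIZE plays as a minimum under automatic SGA management.
    Returns (final_size_mb, smallest_size_seen_mb).
    """
    size = start_mb
    smallest = size
    for delta in deltas_mb:
        size = max(size + delta, floor_mb)
        smallest = min(smallest, size)
    return size, smallest


requests = [-64, -64, 16, -64]  # hypothetical resize requests in MB
print(simulate_shared_pool(800, requests))                # (624, 624)
print(simulate_shared_pool(800, requests, floor_mb=700))  # (700, 700)
```

The floor does not add memory. It only stops the shared pool from giving memory away, so it has to be combined with an SGA that is large enough overall.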
The problem is solved, but what really deserves reflection is why SGA_MAX_SIZE was set to only 1168M, and why this was never caught during routine inspections.
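A routine inspection could include a simple sanity check that compares SGA_MAX_SIZE with the host's physical memory. The sketch below flags an SGA that is small relative to RAM. The 40% threshold is a hypothetical default for illustration, not an Oracle recommendation. The right value depends on the PGA, other processes and the workload on the host.

```python
def check_sga_sizing(physical_mb, sga_max_size_mb, min_fraction=0.4):
    """Return a warning string if SGA_MAX_SIZE is small relative to physical memory.

    min_fraction is a hypothetical threshold; choose one that matches the host.
    Returns None when the SGA is at or above the threshold.
    """
    fraction = sga_max_size_mb / physical_mb
    if fraction < min_fraction:
        return (f"SGA_MAX_SIZE={sga_max_size_mb}M is only {fraction:.1%} of "
                f"{physical_mb}M physical memory (threshold {min_fraction:.0%})")
    return None


# Figures from this case: 8 GB of RAM, SGA_MAX_SIZE of 1168M.
print(check_sga_sizing(8 * 1024, 1168))
# SGA_MAX_SIZE=1168M is only 14.3% of 8192M physical memory (threshold 40%)
```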
References:
http://blog.csdn.net/wenzhongyan/article/details/29866845
http://blog.chinaunix.net/uid-20802110-id-4188357.html