頻發:故障排除之又見 ORA-4031丨雲和恩墨技術通訊

資料和雲發表於2019-12-02


親愛的讀者朋友:


為了及時共享行業案例,通知共性問題,達成共享和提前預防,我們整理和編輯了《雲和恩墨技術通訊》,透過對過去一段時間的知識回顧,故障歸納,以期提供有價值的資訊供大家參考。同時,我們也希望能夠將熱點事件、新的產品特性及其他有價值的資訊聚集起來,為您提供具有前瞻性的支援資訊,保持對於當前最新的資料庫新聞和事件的瞭解,其中包括重要資料庫產品釋出、警報、更新、新版本、補丁等。


本期目錄:


新聞:2019年11月資料庫流行度排行

經驗:Oracle RAC跨節點訪問資料塊,節點長事務加劇gc等待

經驗:IBM MQ通道連線數達到最大故障分析

問題:子游標過多導致資料庫HANG

頻發:再談Library Cache Lock

頻發:故障排除之又見ORA-4031

警示:強制關閉OGG程式觸發bug致abended

公告:首屆墨天輪年度十大突出貢獻人物評選活動


雲和恩墨技術通訊集錦:


部分精選-頻發:故障排除之又見 ORA-4031  作者:候靜遠



當遇到ORA-4031錯誤時,你會不會內心一緊。Oracle程式在向SGA申請記憶體時,如果申請失敗,則會丟擲這個錯誤,大部分情況下是在向SGA中的 shared pool申請記憶體時失敗。嚴重情況下,可能導致資料庫出現異常崩潰。本文分享客戶近期碰到的一起由於ORA-4031問題導致資料庫異常當機的案例,供大家參考。


問題描述


2019年9月4日凌晨3點左右,接到監控系統告警:資料庫出現異常,無法連線。登陸到資料庫1節點檢視後臺alert日誌發現有大量ORA-04031報錯,2節點有少量報錯。為了儘快恢復業務,嘗試直接重啟1節點資料庫,重啟完成之後恢復正常。


問題分析


1. 節點後臺對應alert日誌:


Wed Sep 04 03:57:50 2019

Errors in file /u01/app/oracle/diag/rdbms/xxxxx/xxxxx/trace/xxxxx_smon_29747.trc:

ORA-00604: error occurred at recursive SQL level 1

ORA-04031: unable to allocate 3896 bytes of shared memory ("shared pool","update sys.col_usage$ set   ...","sga heap(2,0)","kglsim object batch")

Wed Sep 04 03:58:10 2019

Errors in file /u01/app/oracle/diag/rdbms/xxxxx/xxxxx/trace/xxxxx_smon_29747.trc:

ORA-00604: error occurred at recursive SQL level 1

ORA-04031: unable to allocate 3896 bytes of shared memory ("shared pool","update sys.col_usage$ set   ...","sga heap(1,0)","kglsim object batch")

Wed Sep 04 03:58:26 2019

Errors in file /u01/app/oracle/diag/rdbms/xxxxx/xxxxx/trace/xxxxx_smon_29747.trc:

ORA-00604: error occurred at recursive SQL level 1

ORA-04031: unable to allocate 3896 bytes of shared memory ("shared pool","update sys.col_usage$ set   ...","sga heap(7,0)","kglsim object batch")

Wed Sep 04 03:58:42 2019

Errors in file /u01/app/oracle/diag/rdbms/xxxxx/xxxxx/trace/xxxxx_smon_29747.trc:

ORA-00604: error occurred at recursive SQL level 1

ORA-04031: unable to allocate 3896 bytes of shared memory ("shared pool","update sys.col_usage$ set   ...","sga heap(6,0)","kglsim object batch")

Wed Sep 04 03:58:57 2019

Errors in file /u01/app/oracle/diag/rdbms/xxxxx/xxxxx/trace/xxxxx_smon_29747.trc:

ORA-00604: error occurred at recursive SQL level 1

ORA-04031: unable to allocate 3896 bytes of shared memory ("shared pool","update sys.mon_mods$ set ins...","sga heap(5,0)","kglsim object batch")

Wed Sep 04 03:59:08 2019

Errors in file /u01/app/oracle/diag/rdbms/xxxxx/xxxxx/trace/xxxxx_xxx0_42548.trc:

ORA-00604: error occurred at recursive SQL level 1

ORA-04031: unable to allocate 3896 bytes of shared memory ("shared pool","select count(*) from sys.job...","sga heap(3,0)","kglsim object batch")

Wed Sep 04 03:59:10 2019

License high water mark = 97

USER (ospid: 28750): terminating the instance


統計1節點每個子池及duration出現04031的次數,sga heap(n,0)-n代表第幾個子池,0代表是第幾個duration:



根據alert日誌可以看出,所有的ora-4031都發生在shared pool子池的第0個duration上。


Summary of resize operations history:

shared pool            start   3.19 GB  now   3.19 GB  0 grows   0 shrinks

large pool             start   0.50 GB  now   0.50 GB  0 grows   0 shrinks

java pool              start   0.50 GB  now   0.50 GB  0 grows   0 shrinks

SGA Target             start  32.00 GB  now  32.00 GB  0 grows   0 shrinks

DEFAULT buffer cache   start  27.59 GB  now  27.59 GB  0 grows   0 shrinks

PGA Target             start  11.00 GB  now  11.00 GB  0 grows   0 shrinks


發現shared pool並沒有進行resize。


==============================================

TOP 20 MEMORY USES ACROSS SGA HEAP 1 - 7

----------------------------------------------

"KGLH0                     "       1103 MB 19%

"SQLA                      "       1081 MB 18%

"free memory               "        835 MB 14%

"gcs resources             "        794 MB 14%

"gcs shadows               "        550 MB  9%

"db_block_hash_buckets     "        178 MB  3%

"ASH buffers               "        160 MB  3%

"KGLHD                     "        157 MB  3%

"Checkpoint queue          "        156 MB  3%

"kglsim object batch       "         90 MB  2%

"kglsim heap               "         56 MB  1%

"ges resource              "         53 MB  1%

"ges enqueues              "         43 MB  1%

"KGLDA                     "         41 MB  1%

"dbwriter coalesce buffer  "         40 MB  1%

"dirty object counts array "         40 MB  1%

"object queue              "         35 MB  1%

"gcs res hash bucket       "         32 MB  1%

"dbktb: trace buffer       "         31 MB  1%

"FileOpenBlock             "         30 MB  1%

TOTALS ---------------------------------------

Total free memory                   830 MB

Total memory alloc.                5026 MB

Grand total                        5856 MB

==============================================


2. 節點後臺對應的alert日誌:


Wed Sep 04 03:23:18 2019

Emon ping encountered error 12801

Errors in file /u01/app/oracle/diag/rdbms/xxxxx/xxxxx2/trace/xxxxx2_q002_35378.trc:

ORA-12801: error signaled in parallel query server PZ99, instance dnfwglpt1:xxxxx (1)

ORA-04031: unable to allocate 3896 bytes of shared memory ("shared pool","select inst_id, reg_id, num_...","sga heap(4,0)","kglsim object batch")

Wed Sep 04 03:23:23 2019

Emon ping encountered error 12801

Errors in file /u01/app/oracle/diag/rdbms/xxxxx/xxxxx2/trace/xxxxx2_q003_35453.trc:

ORA-12801: error signaled in parallel query server PZ99, instance dnfwglpt1:xxxxx (1)

ORA-04031: unable to allocate 3896 bytes of shared memory ("shared pool","select inst_id, reg_id, num_...","sga heap(1,0)","kglsim object batch")

Wed Sep 04 03:23:29 2019

Emon ping encountered error 12801

Errors in file /u01/app/oracle/diag/rdbms/xxxxx/xxxxx2/trace/xxxxx2_q004_35725.trc:

ORA-12801: error signaled in parallel query server PZ99, instance dnfwglpt1:xxxxx (1)

ORA-04031: unable to allocate 3896 bytes of shared memory ("shared pool","select inst_id, reg_id, num_...","sga heap(5,0)","kglsim object batch")

Wed Sep 04 03:23:34 2019

Emon ping encountered error 12801

Errors in file /u01/app/oracle/diag/rdbms/xxxxx/xxxxx2/trace/xxxxx2_q001_35778.trc:

ORA-12801: error signaled in parallel query server PZ99, instance dnfwglpt1:xxxxx (1)

ORA-04031: unable to allocate 3896 bytes of shared memory ("shared pool","select inst_id, reg_id, num_...","sga heap(2,0)","kglsim object batch")

Wed Sep 04 03:23:39 2019

Emon ping encountered error 12801

Errors in file /u01/app/oracle/diag/rdbms/xxxxx/xxxxx2/trace/xxxxx2_q002_36069.trc:

ORA-12801: error signaled in parallel query server PZ99, instance dnfwglpt1:xxxxx (1)

ORA-04031: unable to allocate 3896 bytes of shared memory ("shared pool","select inst_id, reg_id, num_...","sga heap(6,0)","kglsim object batch")

Wed Sep 04 03:23:45 2019

Emon ping encountered error 12801

Errors in file /u01/app/oracle/diag/rdbms/xxxxx/xxxxx2/trace/xxxxx2_q003_36151.trc:

ORA-12801: error signaled in parallel query server PZ99, instance dnfwglpt1:xxxxx (1)

ORA-04031: unable to allocate 3896 bytes of shared memory ("shared pool","select inst_id, reg_id, num_...","sga heap(3,0)","kglsim object batch")

Wed Sep 04 03:23:50 2019

Emon ping encountered error 12801

Errors in file /u01/app/oracle/diag/rdbms/xxxxx/xxxxx2/trace/xxxxx2_q004_36242.trc:

ORA-12801: error signaled in parallel query server PZ99, instance dnfwglpt1:xxxxx (1)

ORA-04031: unable to allocate 3896 bytes of shared memory ("shared pool","select inst_id, reg_id, num_...","sga heap(7,0)","kglsim object batch")

Wed Sep 04 03:23:55 2019

Emon ping encountered error 12801

Errors in file /u01/app/oracle/diag/rdbms/xxxxx/xxxxx2/trace/xxxxx2_q001_36305.trc:

ORA-12801: error signaled in parallel query server PZ99, instance dnfwglpt1:xxxxx (1)

ORA-04031: unable to allocate 3896 bytes of shared memory ("shared pool","select inst_id, reg_id, num_...","sga heap(4,0)","kglsim object batch")


統計2節點每個子池及duration出現04031的次數:



根據alert日誌可以看出,所有的ora-4031同樣都發生在shared pool子池的第0個duration上,導致4031的根本原因是因為shared pool子池的第0個duration記憶體不足。


透過設定sga_target的ASMM管理後,共享池(shared_pool)和流池(streams pool)每個子池都是4個duration。它們分別是:instance,session,cursor,execution,只有第四個duration,也就是execution是可以resize的,而當第0個duration記憶體不足的時候不能resize,就直接會報錯ora-4031。


問題解決


透過禁用duration,必須設定引數"_enable_shared_pool_durations=fales",並重啟資料庫。

alter system set "_enable_shared_pool_durations"=false scope=spfile;

透過該引數設定後,把它們四個duration都合併到一個池中,不會再出現一個duration的記憶體被耗盡,而另外一個duration仍具有空閒記憶體,對於共享池和流池都是這樣;設定sga_target之後,所有池都會透過buffer cache來傳輸granules(顆粒)整數倍大小的記憶體,如果shrink,則返回buffer cache,沒有從一個pool到另外一個pool的直接傳輸,所有的記憶體resize都會以buffer cache作為源和目標。

設定該引數的唯一負面影響是SGA resize的時候,不能從shared pool中取記憶體到其他的pool。



來自 “ ITPUB部落格 ” ,連結:http://blog.itpub.net/31556440/viewspace-2666444/,如需轉載,請註明出處,否則將追究法律責任。

相關文章