親愛的讀者朋友：

為了及時共享行業案例，通知共性問題，達成共享和提前預防，我們整理和編輯了《雲和恩墨技術通訊》，透過對過去一段時間的知識回顧，故障歸納，以期提供有價值的資訊供大家參考。同時，我們也希望能夠將熱點事件、新的產品特性及其他有價值的資訊聚集起來，為您提供具有前瞻性的支援資訊，保持對於當前最新的資料庫新聞和事件的瞭解，其中包括重要資料庫產品釋出、警報、更新、新版本、補丁等。

本期目錄：

新聞：2019年11月資料庫流行度排行

經驗：Oracle RAC跨節點訪問資料塊，節點長事務加劇gc等待

經驗：IBM MQ通道連線數達到最大故障分析

問題：子游標過多導致資料庫HANG

頻發：再談Library Cache Lock

頻發：故障排除之又見ORA-4031

警示：強制關閉OGG程式觸發bug致abended

公告：首屆墨天輪年度十大突出貢獻人物評選活動

雲和恩墨技術通訊集錦：

部分精選-頻發：故障排除之又見 ORA-4031 作者：候靜遠

當遇到ORA-4031錯誤時，你會不會內心一緊。Oracle程式在向SGA申請記憶體時，如果申請失敗，則會丟擲這個錯誤，大部分情況下是在向SGA中的 shared pool申請記憶體時失敗。嚴重情況下，可能導致資料庫出現異常崩潰。本文分享客戶近期碰到的一起由於ORA-4031問題導致資料庫異常當機的案例，供大家參考。

問題描述

2019年9月4日凌晨3點左右，接到監控系統告警：資料庫出現異常，無法連線。登陸到資料庫1節點檢視後臺alert日誌發現有大量ORA-04031報錯，2節點有少量報錯。為了儘快恢復業務，嘗試直接重啟1節點資料庫，重啟完成之後恢復正常。

問題分析

1. 節點後臺對應alert日誌：

Wed Sep 04 03:57:50 2019

Errors in file /u01/app/oracle/diag/rdbms/xxxxx/xxxxx/trace/xxxxx_smon_29747.trc:

ORA-00604: error occurred at recursive SQL level 1

ORA-04031: unable to allocate 3896 bytes of shared memory ("shared pool","update sys.col_usage$ set ...","sga heap(2,0)","kglsim object batch")

Wed Sep 04 03:58:10 2019

Errors in file /u01/app/oracle/diag/rdbms/xxxxx/xxxxx/trace/xxxxx_smon_29747.trc:

ORA-00604: error occurred at recursive SQL level 1

ORA-04031: unable to allocate 3896 bytes of shared memory ("shared pool","update sys.col_usage$ set ...","sga heap(1,0)","kglsim object batch")

Wed Sep 04 03:58:26 2019

Errors in file /u01/app/oracle/diag/rdbms/xxxxx/xxxxx/trace/xxxxx_smon_29747.trc:

ORA-00604: error occurred at recursive SQL level 1

ORA-04031: unable to allocate 3896 bytes of shared memory ("shared pool","update sys.col_usage$ set ...","sga heap(7,0)","kglsim object batch")

Wed Sep 04 03:58:42 2019

Errors in file /u01/app/oracle/diag/rdbms/xxxxx/xxxxx/trace/xxxxx_smon_29747.trc:

ORA-00604: error occurred at recursive SQL level 1

ORA-04031: unable to allocate 3896 bytes of shared memory ("shared pool","update sys.col_usage$ set ...","sga heap(6,0)","kglsim object batch")

Wed Sep 04 03:58:57 2019

Errors in file /u01/app/oracle/diag/rdbms/xxxxx/xxxxx/trace/xxxxx_smon_29747.trc:

ORA-00604: error occurred at recursive SQL level 1

ORA-04031: unable to allocate 3896 bytes of shared memory ("shared pool","update sys.mon_mods$ set ins...","sga heap(5,0)","kglsim object batch")

Wed Sep 04 03:59:08 2019

Errors in file /u01/app/oracle/diag/rdbms/xxxxx/xxxxx/trace/xxxxx_xxx0_42548.trc:

ORA-00604: error occurred at recursive SQL level 1

ORA-04031: unable to allocate 3896 bytes of shared memory ("shared pool","select count(*) from sys.job...","sga heap(3,0)","kglsim object batch")

Wed Sep 04 03:59:10 2019

License high water mark = 97

USER (ospid: 28750): terminating the instance

統計1節點每個子池及duration出現04031的次數，sga heap(n,0)-n代表第幾個子池，0代表是第幾個duration：

根據alert日誌可以看出，所有的ora-4031都發生在shared pool子池的第0個duration上。

Summary of resize operations history:

shared pool start 3.19 GB now 3.19 GB 0 grows 0 shrinks

large pool start 0.50 GB now 0.50 GB 0 grows 0 shrinks

java pool start 0.50 GB now 0.50 GB 0 grows 0 shrinks

SGA Target start 32.00 GB now 32.00 GB 0 grows 0 shrinks

DEFAULT buffer cache start 27.59 GB now 27.59 GB 0 grows 0 shrinks

PGA Target start 11.00 GB now 11.00 GB 0 grows 0 shrinks

發現shared pool並沒有進行resize。

==============================================

TOP 20 MEMORY USES ACROSS SGA HEAP 1 - 7

----------------------------------------------

"KGLH0 " 1103 MB 19%

"SQLA " 1081 MB 18%

"free memory " 835 MB 14%

"gcs resources " 794 MB 14%

"gcs shadows " 550 MB 9%

"db_block_hash_buckets " 178 MB 3%

"ASH buffers " 160 MB 3%

"KGLHD " 157 MB 3%

"Checkpoint queue " 156 MB 3%

"kglsim object batch " 90 MB 2%

"kglsim heap " 56 MB 1%

"ges resource " 53 MB 1%

"ges enqueues " 43 MB 1%

"KGLDA " 41 MB 1%

"dbwriter coalesce buffer " 40 MB 1%

"dirty object counts array " 40 MB 1%

"object queue " 35 MB 1%

"gcs res hash bucket " 32 MB 1%

"dbktb: trace buffer " 31 MB 1%

"FileOpenBlock " 30 MB 1%

TOTALS ---------------------------------------

Total free memory 830 MB

Total memory alloc. 5026 MB

Grand total 5856 MB

==============================================

2. 節點後臺對應的alert日誌：

Wed Sep 04 03:23:18 2019

Emon ping encountered error 12801

Errors in file /u01/app/oracle/diag/rdbms/xxxxx/xxxxx2/trace/xxxxx2_q002_35378.trc:

ORA-12801: error signaled in parallel query server PZ99, instance dnfwglpt1:xxxxx (1)

ORA-04031: unable to allocate 3896 bytes of shared memory ("shared pool","select inst_id, reg_id, num_...","sga heap(4,0)","kglsim object batch")

Wed Sep 04 03:23:23 2019

Emon ping encountered error 12801

Errors in file /u01/app/oracle/diag/rdbms/xxxxx/xxxxx2/trace/xxxxx2_q003_35453.trc:

ORA-12801: error signaled in parallel query server PZ99, instance dnfwglpt1:xxxxx (1)

ORA-04031: unable to allocate 3896 bytes of shared memory ("shared pool","select inst_id, reg_id, num_...","sga heap(1,0)","kglsim object batch")

Wed Sep 04 03:23:29 2019

Emon ping encountered error 12801

Errors in file /u01/app/oracle/diag/rdbms/xxxxx/xxxxx2/trace/xxxxx2_q004_35725.trc:

ORA-12801: error signaled in parallel query server PZ99, instance dnfwglpt1:xxxxx (1)

ORA-04031: unable to allocate 3896 bytes of shared memory ("shared pool","select inst_id, reg_id, num_...","sga heap(5,0)","kglsim object batch")

Wed Sep 04 03:23:34 2019

Emon ping encountered error 12801

Errors in file /u01/app/oracle/diag/rdbms/xxxxx/xxxxx2/trace/xxxxx2_q001_35778.trc:

ORA-12801: error signaled in parallel query server PZ99, instance dnfwglpt1:xxxxx (1)

ORA-04031: unable to allocate 3896 bytes of shared memory ("shared pool","select inst_id, reg_id, num_...","sga heap(2,0)","kglsim object batch")

Wed Sep 04 03:23:39 2019

Emon ping encountered error 12801

Errors in file /u01/app/oracle/diag/rdbms/xxxxx/xxxxx2/trace/xxxxx2_q002_36069.trc:

ORA-12801: error signaled in parallel query server PZ99, instance dnfwglpt1:xxxxx (1)

ORA-04031: unable to allocate 3896 bytes of shared memory ("shared pool","select inst_id, reg_id, num_...","sga heap(6,0)","kglsim object batch")

Wed Sep 04 03:23:45 2019

Emon ping encountered error 12801

Errors in file /u01/app/oracle/diag/rdbms/xxxxx/xxxxx2/trace/xxxxx2_q003_36151.trc:

ORA-12801: error signaled in parallel query server PZ99, instance dnfwglpt1:xxxxx (1)

ORA-04031: unable to allocate 3896 bytes of shared memory ("shared pool","select inst_id, reg_id, num_...","sga heap(3,0)","kglsim object batch")

Wed Sep 04 03:23:50 2019

Emon ping encountered error 12801

Errors in file /u01/app/oracle/diag/rdbms/xxxxx/xxxxx2/trace/xxxxx2_q004_36242.trc:

ORA-12801: error signaled in parallel query server PZ99, instance dnfwglpt1:xxxxx (1)

ORA-04031: unable to allocate 3896 bytes of shared memory ("shared pool","select inst_id, reg_id, num_...","sga heap(7,0)","kglsim object batch")

Wed Sep 04 03:23:55 2019

Emon ping encountered error 12801

Errors in file /u01/app/oracle/diag/rdbms/xxxxx/xxxxx2/trace/xxxxx2_q001_36305.trc:

ORA-12801: error signaled in parallel query server PZ99, instance dnfwglpt1:xxxxx (1)

ORA-04031: unable to allocate 3896 bytes of shared memory ("shared pool","select inst_id, reg_id, num_...","sga heap(4,0)","kglsim object batch")

統計2節點每個子池及duration出現04031的次數：

根據alert日誌可以看出，所有的ora-4031同樣都發生在shared pool子池的第0個duration上，導致4031的根本原因是因為shared pool子池的第0個duration記憶體不足。

透過設定sga_target的ASMM管理後，共享池(shared_pool)和流池(streams pool)每個子池都是4個duration。它們分別是：instance，session，cursor，execution，只有第四個duration，也就是execution是可以resize的，而當第0個duration記憶體不足的時候不能resize，就直接會報錯ora-4031。

問題解決

透過禁用duration，必須設定引數"_enable_shared_pool_durations=fales"，並重啟資料庫。

alter system set "_enable_shared_pool_durations"=false scope=spfile;

透過該引數設定後，把它們四個duration都合併到一個池中，不會再出現一個duration的記憶體被耗盡，而另外一個duration仍具有空閒記憶體，對於共享池和流池都是這樣；設定sga_target之後，所有池都會透過buffer cache來傳輸granules（顆粒）整數倍大小的記憶體，如果shrink，則返回buffer cache，沒有從一個pool到另外一個pool的直接傳輸，所有的記憶體resize都會以buffer cache作為源和目標。

設定該引數的唯一負面影響是SGA resize的時候，不能從shared pool中取記憶體到其他的pool。

頻發：故障排除之又見 ORA-4031丨雲和恩墨技術通訊

相關文章