Handling a RAC resource failure: skgpspawn5, status 11 and 12

Posted by murkey on 2013-12-18

1 Incident overview

  On September 9, business applications could not connect to the RAC database.

1.1 Time

   September 9, 2013

1.2 Location

   Beijing; on-site and remote operations

 

1.3

1.4 Incident

  The user reported that the RAC database could not accept business connections.

 

 

 

 

2. Analysis

An engineer went to the site urgently, collected the logs, and found the following errors in the alert log:

 

     Sun Sep  8 16:31:35 2013

Process startup failed, error stack:

Sun Sep  8 16:31:35 2013

Errors in file /oracle/admin/drutt/bdump/drutt1_psp0_3820.trc:

ORA-27300: OS system dependent operation:fork failed with status: 12

ORA-27301: OS failure message: Not enough space

ORA-27302: failure occurred at: skgpspawn3

Sun Sep  8 16:31:35 2013

Process J005 died, see its trace file

Sun Sep  8 16:31:35 2013

kkjcre1p: unable to spawn jobq slave process

Sun Sep  8 16:31:35 2013

Errors in file /oracle/admin/drutt/bdump/drutt1_cjq0_3881.trc:

 

 

 

 

Mon Sep  9 01:40:34 2013

Process startup failed, error stack:

Mon Sep  9 01:40:34 2013

Errors in file /oracle/admin/drutt/bdump/drutt1_psp0_3820.trc:

ORA-27300: OS system dependent operation:fork failed with status: 11

ORA-27301: OS failure message: Resource temporarily unavailable

ORA-27302: failure occurred at: skgpspawn5

Mon Sep  9 01:40:35 2013

Process J005 died, see its trace file

Mon Sep  9 01:40:35 2013

kkjcre1p: unable to spawn jobq slave process

Mon Sep  9 01:40:35 2013

Errors in file /oracle/admin/drutt/bdump/drutt1_cjq0_3881.trc:
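
When an alert log is long, it can help to pull out just the failed operation, the status code, and the failure location from each ORA-27300/ORA-27302 pair. The following is a minimal sketch, not part of the original troubleshooting: the function name `extract_spawn_failures` and the `SAMPLE` text are illustrative assumptions, and the regular expressions are written only against the line formats shown in the excerpt above.

```python
import re

# Patterns match the ORA-27300 / ORA-27302 line formats seen in the excerpt.
ORA_27300 = re.compile(
    r"ORA-27300: OS system dependent operation:(\w+) failed with status: (\d+)"
)
ORA_27302 = re.compile(r"ORA-27302: failure occurred at: (\w+)")


def extract_spawn_failures(log_text):
    """Return a list of (operation, status, location) tuples (hypothetical helper)."""
    failures = []
    pending = None  # holds (operation, status) until the ORA-27302 line arrives
    for line in log_text.splitlines():
        match = ORA_27300.search(line)
        if match:
            pending = (match.group(1), int(match.group(2)))
            continue
        match = ORA_27302.search(line)
        if match and pending is not None:
            failures.append((pending[0], pending[1], match.group(1)))
            pending = None
    return failures


# Shortened copy of the alert log lines quoted in this post.
SAMPLE = """\
ORA-27300: OS system dependent operation:fork failed with status: 12
ORA-27301: OS failure message: Not enough space
ORA-27302: failure occurred at: skgpspawn3
ORA-27300: OS system dependent operation:fork failed with status: 11
ORA-27301: OS failure message: Resource temporarily unavailable
ORA-27302: failure occurred at: skgpspawn5
"""

for operation, status, location in extract_spawn_failures(SAMPLE):
    print(operation, status, location)
```

On the sample, this lists the two fork failures seen in the log: status 12 at skgpspawn3 and status 11 at skgpspawn5.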

 

 

3. Problem identification

  We concluded that the system could not allocate new memory to handle session connections, which caused the connection failures:

  We looked up the related documents on Metalink (Oracle's official support site):

Troubleshooting ORA-27300 ORA-27301 ORA-27302 errors (Doc ID 579365.1)

Ora-27300 OS system dependent operation:fork failed with status: 11 (Doc ID 392006.1)

Database Crashes With ORA-04030 ORA-07445 ORA-27300 ORA-27301 ORA-27302 (Doc ID 580552.1)


Skgpspawn Errors In Alert Log, New Connections to Database Fail (Doc ID 435787.1)

 

  

   The analysis is as follows:

   Status 11: EAGAIN. The system lacked the necessary resources to create another process, or the system-imposed limit on the total number of processes under execution system-wide or by a single user {CHILD_MAX} would be exceeded. EAGAIN corresponds to status 11.

            Maximum number of PROCESSES allowed per user may be too low

 

 

   Status 12: ENOMEM, not enough core / memory.
During an exec or a break, the program asked for more memory than the system has available. This error also occurs when there are too many segmentation registers which are required for the arrangement of text data or stack segments.

            Insufficient swap space allocated
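
The two status codes and their likely causes can be kept as a small lookup for use while reading an alert log. This is only a sketch built from the two codes discussed in this post. `STATUS_HINTS` and `explain_status` are hypothetical names. The numeric values are the ones reported in this incident, and errno numbering differs between operating systems, so the table should not be treated as general.

```python
# Status codes and causes as identified in this post; not a general errno table.
STATUS_HINTS = {
    11: ("EAGAIN", "per-user or system-wide process limit may be too low"),
    12: ("ENOMEM", "not enough memory; swap space may be insufficient"),
}


def explain_status(status):
    """Return a one-line hint for a fork failure status (hypothetical helper)."""
    name, hint = STATUS_HINTS.get(status, ("UNKNOWN", "not covered in this post"))
    return f"status {status} ({name}): {hint}"


for code in (11, 12):
    print(explain_status(code))
```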

 

 

 

4. Recommendations

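   1. Check the value of the system parameter nproc. According to the Oracle installation documentation, nproc should be at least 4096 and maxuprc should be nproc*9/10. If the current number of processes exceeds the configured values, readjust both values to match actual needs.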

 

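   2. Swap was insufficient at the time of the failure. Check swap usage and keep an eye on system performance. Swap is currently 8 GB, and total physical memory is 16 GB.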

  

   3. A memory overflow bug in the system that leads to resource allocation problems cannot be ruled out.

   4. If this problem occurs again, observe memory and swap usage and the system logs. Restarting the server to release resources is the suggested fix.
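
The arithmetic behind recommendations 1 and 2 can be sketched as follows. All function names are hypothetical, rounding maxuprc down to a whole number is an assumption, and the process count in the usage lines is made up for illustration. Only the figures 4096, 9/10, 8 GB and 16 GB come from the text above.

```python
def recommended_maxuprc(nproc):
    """maxuprc as nproc * 9 / 10, rounded down (the rounding is an assumption)."""
    return nproc * 9 // 10


def exceeds_limit(current_processes, limit):
    """True when the observed process count is above the configured limit."""
    return current_processes > limit


def swap_to_ram_ratio(swap_gb, ram_gb):
    """Swap size relative to physical memory."""
    return swap_gb / ram_gb


nproc = 4096                             # minimum nproc cited above
maxuprc = recommended_maxuprc(nproc)
print(maxuprc)
print(exceeds_limit(3700, maxuprc))      # 3700 is an illustrative process count
print(swap_to_ram_ratio(8, 16))          # 8 GB swap, 16 GB physical memory
```

With nproc at 4096, the derived maxuprc is 3686, and the swap allocation in this incident is half the size of physical memory.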

Source: ITPUB blog, link: http://blog.itpub.net/500314/viewspace-1063633/. If you repost this article, please credit the source; otherwise legal liability will be pursued.
