【故障處理】ORA- 2730*,status 12故障分析與處理
今天有朋友在AIX作業系統上的10g資料庫環境中遭遇了ORA-2730*一系列錯誤。導致系統使用一段時間後就無法連線,客戶端亦無法登陸,伺服器端也無法提交任何命令。
最後使用簡單的“重啟資料庫”方法暫時處理了這個問題(這種處理方法要堅決取締!)。我們簡單看一下這個故障。
1.警告日誌提示的錯誤資訊
Errors in file /home/oracle/admin/ora10g/bdump/ora10g_psp0_147610.trc:
ORA-27300: OS system dependent operation:fork failed with status: 12
ORA-27301: OS failure message: Not enough space
ORA-27302: failure occurred at: skgpspawn3
Fri Apr 16 06:14:42 2010
Process m000 died, see its trace file
Fri Apr 16 06:14:42 2010
ksvcreate: Process(m000) creation failed
Fri Apr 16 06:15:19 2010
Process startup failed, error stack:
Fri Apr 16 06:15:19 2010
Errors in file /home/oracle/admin/ora10g/bdump/ora10g_psp0_147610.trc:
ORA-27300: OS system dependent operation:fork failed with status: 12
ORA-27301: OS failure message: Not enough space
ORA-27302: failure occurred at: skgpspawn3
Fri Apr 16 06:15:19 2010
Process m000 died, see its trace file
Fri Apr 16 06:15:19 2010
ksvcreate: Process(m000) creation failed
Fri Apr 16 06:15:44 2010
Process startup failed, error stack:
Fri Apr 16 06:15:44 2010
2.Trace檔案中的記錄
/home/oracle/admin/ora10g/bdump/ora10g_cjq0_516352.trc
Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 - 64bit Production
With the Partitioning, OLAP and Data Mining options
ORACLE_HOME = /home/oracle/product/10g
System name: AIX
Node name: smartcard1
Release: 3
Version: 5
Machine: 0006A849D600
Instance name: ora10g
Redo thread mounted by this instance: 1
Oracle process number: 10
Unix process pid: 516352, image: oracle@smartcard1 (CJQ0)
*** 2010-04-28 23:04:44.012
*** SERVICE NAME:(SYS$BACKGROUND) 2010-04-28 23:04:43.776
*** SESSION ID:(162.1) 2010-04-28 23:04:43.776
Waited for process J000 to initialize for 60 seconds
*** 2010-04-28 23:04:44.012
Dumping diagnostic information for J000:
*** 2010-04-28 23:06:08.301
OS pid = 425984
loadavg : 0.62 0.46 0.23
swap info: free_mem = 20.43M rsv = 32.00M
alloc = 4999.81M avail = 8192.00M swap_free = 3192.19M
skgpgpstack: fgets() timed out after 60 seconds
ERROR: process 425984 is not alive
*** 2010-04-28 23:06:08.332
*** 2010-04-29 23:12:22.828
Waited for process J000 to initialize for 60 seconds
*** 2010-04-29 23:12:24.493
Dumping diagnostic information for J000:
OS pid = 635374
loadavg : 1.23 1.04 0.62
swap info: free_mem = 11.09M rsv = 32.00M
alloc = 3162.42M avail = 8192.00M swap_free = 5029.58M
skgpgpstack: fgets() timed out after 60 seconds
open: Permission denied
procstack: open(/proc/635374/ctl): Permission denied
*** 2010-04-29 23:13:58.926
*** 2010-05-02 23:35:36.230
Waited for process J000 to initialize for 60 seconds
*** 2010-05-02 23:35:38.121
Dumping diagnostic information for J000:
OS pid = 553290
loadavg : 0.34 0.26 0.45
swap info: free_mem = 11.81M rsv = 32.00M
alloc = 6194.64M avail = 8192.00M swap_free = 1997.36M
F S UID PID PPID C PRI NI ADDR SZ WCHAN STIME TTY TIME CMD
240001 A loohcs 553290 1 0 255 20 6064f510 87244 23:34:36 - 0:00 [oracle]
ERROR: process 553290 is not alive
*** 2010-05-02 23:37:02.247
*** 2010-05-02 23:37:02.247
Process J000 is dead (pid=553290, state=5):
3.問題分析
重點應該關注以下報錯資訊:
1)alert中的報錯
Errors in file /home/oracle/admin/ora10g/bdump/ora10g_psp0_147610.trc:
ORA-27300: OS system dependent operation:fork failed with status: 12
ORA-27301: OS failure message: Not enough space
ORA-27302: failure occurred at: skgpspawn3
2)Trace中關鍵資訊
swap info: free_mem = 11.81M rsv = 32.00M
alloc = 6194.64M avail = 8192.00M swap_free = 1997.36M
顯然是因為交換空間被耗盡導致的問題。
4.MOS中對此問題也有描述
Database Cannot Start Due to Lack of Memory [ID 560309.1]
Applies to:
Oracle Server - Enterprise Edition - Version: 10.2.0.1 to 10.2.0.4
This problem can occur on any platform.
Symptoms
The database can not start up due to the following errors:
*** SERVICE NAME:(SYS$BACKGROUND) 2008-03-24 17:02:34.855
*** SESSION ID:(1104.1) 2008-03-24 17:02:34.855
*** 2008-03-24 17:02:34.855
Process startup failed, error stack:
ORA-27300: OS system dependent operation:fork failed with status: 12
ORA-27301: OS failure message: Not enough space
ORA-27302: failure occurred at: skgpspawn3
*** 2008-03-24 17:02:38.158
Process startup failed, error stack:
ORA-27300: OS system dependent operation:fork failed with status: 12
ORA-27301: OS failure message: Not enough space
ORA-27302: failure occurred at: skgpspawn3
Cause
This issue is mainly caused by lack of memory / swap. Checking the memory configuration on the server, we have found the following:
Total Physical Memory 38912 MB
Swap: Max Size 17664 MiB
So, RAM is 38 GB, SWAP space is only 17 GB
Solution
1- We should increase the server swap space (paging space) . The general rule of thumb is that swap space should be:
RAM SWAP
1GB to 2GB 1.5 times RAM
> 2GB and <= 8GB 1 times RAM
> 8GB .75 times RAM
So in our case, the recommended swap space should be 28 GB Instead of 17 GB
2- We can also try to increase physical memory, if possible.
3- In Unix Platforms , The user limits for user oracle should be checked , using the "ulimit -a " command.
4- We should also check memory parameters in the pfile/spfile that may add more load to the memory consumption on the server. For example setting the following parameters can add more overhead to memory consumption
-lock_sga=true
- db_keep_cache_size=
5.小結
既然找到了問題的癥結,處理起來就方便了,可以適當的增加交換空間,或最佳化資料庫以便減少記憶體的使用。
透過這個案例我們應該吸取些什麼經驗和教訓呢?
1)在系統上線的時候要充分考慮到應用的型別,是CPU密集型、MEM密集型還是磁碟讀寫密集型,據此給出系統科學有效的最佳化方式;
2)遇到問題,不可簡單的使用“重啟資料庫”的方法來處理,要充分挖掘出問題背後的真實原因,防止問題的再一次出現;
3)“化風險於無形”,也就是說要加強平時的系統監控,隨時發現問題進行最佳化,防止病入膏肓時無計可施。
Good luck.
secooler
10.05.11
-- The End --
最後使用簡單的“重啟資料庫”方法暫時處理了這個問題(這種處理方法要堅決取締!)。我們簡單看一下這個故障。
1.警告日誌提示的錯誤資訊
Errors in file /home/oracle/admin/ora10g/bdump/ora10g_psp0_147610.trc:
ORA-27300: OS system dependent operation:fork failed with status: 12
ORA-27301: OS failure message: Not enough space
ORA-27302: failure occurred at: skgpspawn3
Fri Apr 16 06:14:42 2010
Process m000 died, see its trace file
Fri Apr 16 06:14:42 2010
ksvcreate: Process(m000) creation failed
Fri Apr 16 06:15:19 2010
Process startup failed, error stack:
Fri Apr 16 06:15:19 2010
Errors in file /home/oracle/admin/ora10g/bdump/ora10g_psp0_147610.trc:
ORA-27300: OS system dependent operation:fork failed with status: 12
ORA-27301: OS failure message: Not enough space
ORA-27302: failure occurred at: skgpspawn3
Fri Apr 16 06:15:19 2010
Process m000 died, see its trace file
Fri Apr 16 06:15:19 2010
ksvcreate: Process(m000) creation failed
Fri Apr 16 06:15:44 2010
Process startup failed, error stack:
Fri Apr 16 06:15:44 2010
2.Trace檔案中的記錄
/home/oracle/admin/ora10g/bdump/ora10g_cjq0_516352.trc
Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 - 64bit Production
With the Partitioning, OLAP and Data Mining options
ORACLE_HOME = /home/oracle/product/10g
System name: AIX
Node name: smartcard1
Release: 3
Version: 5
Machine: 0006A849D600
Instance name: ora10g
Redo thread mounted by this instance: 1
Oracle process number: 10
Unix process pid: 516352, image: oracle@smartcard1 (CJQ0)
*** 2010-04-28 23:04:44.012
*** SERVICE NAME:(SYS$BACKGROUND) 2010-04-28 23:04:43.776
*** SESSION ID:(162.1) 2010-04-28 23:04:43.776
Waited for process J000 to initialize for 60 seconds
*** 2010-04-28 23:04:44.012
Dumping diagnostic information for J000:
*** 2010-04-28 23:06:08.301
OS pid = 425984
loadavg : 0.62 0.46 0.23
swap info: free_mem = 20.43M rsv = 32.00M
alloc = 4999.81M avail = 8192.00M swap_free = 3192.19M
skgpgpstack: fgets() timed out after 60 seconds
ERROR: process 425984 is not alive
*** 2010-04-28 23:06:08.332
*** 2010-04-29 23:12:22.828
Waited for process J000 to initialize for 60 seconds
*** 2010-04-29 23:12:24.493
Dumping diagnostic information for J000:
OS pid = 635374
loadavg : 1.23 1.04 0.62
swap info: free_mem = 11.09M rsv = 32.00M
alloc = 3162.42M avail = 8192.00M swap_free = 5029.58M
skgpgpstack: fgets() timed out after 60 seconds
open: Permission denied
procstack: open(/proc/635374/ctl): Permission denied
*** 2010-04-29 23:13:58.926
*** 2010-05-02 23:35:36.230
Waited for process J000 to initialize for 60 seconds
*** 2010-05-02 23:35:38.121
Dumping diagnostic information for J000:
OS pid = 553290
loadavg : 0.34 0.26 0.45
swap info: free_mem = 11.81M rsv = 32.00M
alloc = 6194.64M avail = 8192.00M swap_free = 1997.36M
F S UID PID PPID C PRI NI ADDR SZ WCHAN STIME TTY TIME CMD
240001 A loohcs 553290 1 0 255 20 6064f510 87244 23:34:36 - 0:00 [oracle]
ERROR: process 553290 is not alive
*** 2010-05-02 23:37:02.247
*** 2010-05-02 23:37:02.247
Process J000 is dead (pid=553290, state=5):
3.問題分析
重點應該關注以下報錯資訊:
1)alert中的報錯
Errors in file /home/oracle/admin/ora10g/bdump/ora10g_psp0_147610.trc:
ORA-27300: OS system dependent operation:fork failed with status: 12
ORA-27301: OS failure message: Not enough space
ORA-27302: failure occurred at: skgpspawn3
2)Trace中關鍵資訊
swap info: free_mem = 11.81M rsv = 32.00M
alloc = 6194.64M avail = 8192.00M swap_free = 1997.36M
顯然是因為交換空間被耗盡導致的問題。
4.MOS中對此問題也有描述
Database Cannot Start Due to Lack of Memory [ID 560309.1]
Applies to:
Oracle Server - Enterprise Edition - Version: 10.2.0.1 to 10.2.0.4
This problem can occur on any platform.
Symptoms
The database can not start up due to the following errors:
*** SERVICE NAME:(SYS$BACKGROUND) 2008-03-24 17:02:34.855
*** SESSION ID:(1104.1) 2008-03-24 17:02:34.855
*** 2008-03-24 17:02:34.855
Process startup failed, error stack:
ORA-27300: OS system dependent operation:fork failed with status: 12
ORA-27301: OS failure message: Not enough space
ORA-27302: failure occurred at: skgpspawn3
*** 2008-03-24 17:02:38.158
Process startup failed, error stack:
ORA-27300: OS system dependent operation:fork failed with status: 12
ORA-27301: OS failure message: Not enough space
ORA-27302: failure occurred at: skgpspawn3
Cause
This issue is mainly caused by lack of memory / swap. Checking the memory configuration on the server, we have found the following:
Total Physical Memory 38912 MB
Swap: Max Size 17664 MiB
So, RAM is 38 GB, SWAP space is only 17 GB
Solution
1- We should increase the server swap space (paging space) . The general rule of thumb is that swap space should be:
RAM SWAP
1GB to 2GB 1.5 times RAM
> 2GB and <= 8GB 1 times RAM
> 8GB .75 times RAM
So in our case, the recommended swap space should be 28 GB Instead of 17 GB
2- We can also try to increase physical memory, if possible.
3- In Unix Platforms , The user limits for user oracle should be checked , using the "ulimit -a " command.
4- We should also check memory parameters in the pfile/spfile that may add more load to the memory consumption on the server. For example setting the following parameters can add more overhead to memory consumption
-lock_sga=true
- db_keep_cache_size=
5.小結
既然找到了問題的癥結,處理起來就方便了,可以適當的增加交換空間,或最佳化資料庫以便減少記憶體的使用。
透過這個案例我們應該吸取些什麼經驗和教訓呢?
1)在系統上線的時候要充分考慮到應用的型別,是CPU密集型、MEM密集型還是磁碟讀寫密集型,據此給出系統科學有效的最佳化方式;
2)遇到問題,不可簡單的使用“重啟資料庫”的方法來處理,要充分挖掘出問題背後的真實原因,防止問題的再一次出現;
3)“化風險於無形”,也就是說要加強平時的系統監控,隨時發現問題進行最佳化,防止病入膏肓時無計可施。
Good luck.
secooler
10.05.11
-- The End --
來自 “ ITPUB部落格 ” ,連結:http://blog.itpub.net/519536/viewspace-662488/,如需轉載,請註明出處,否則將追究法律責任。
相關文章
- 故障分析 | Greenplum Segment 故障處理
- 【故障處理】ORA-600:[13013],[5001]故障處理
- linux故障處理Linux
- GPON網路故障如何處理?GPON網路故障處理流程
- MySQL show processlist故障處理MySql
- 微服務的故障處理微服務
- Oracle更新Opatch故障處理Oracle
- teams登入故障處理
- 線上故障處理手冊
- 【故障處理】TNS-04610問題
- GaussDB(分散式)例項故障處理分散式
- Oracle 10g RAC故障處理Oracle 10g
- ORA-01591錯誤故障處理
- 如何處理HTTP 503故障問題?HTTP
- Oracle 11.2.0.4 Dataguard兩則故障處理Oracle
- 【CHECKPOINT】Oracle檢查點優化與故障處理Oracle優化
- hbase 故障的處理方案。 (轉載文章)
- Oracle DG同步失敗故障處理(二)Oracle
- NO.A.0001——zabbix常見故障的處理
- 體檢伺服器nginx故障處理伺服器Nginx
- Oracle client安裝the jre is 0故障處理Oracleclient
- 【故障處理】ORA-3113 "end of file on communication channel"
- hillstone現場故障處理指導手冊
- 金融行業現場故障處理實錄行業
- TS - 處理故障的一些通用方法
- OracleORA-03113 ORA-600 [4193]故障處理Oracle
- 【故障處理】ORA-28547: connection to server failed, probableServerAI
- 叢集故障處理之處理思路以及健康狀態檢查(三十二)
- Bumblebee之負載、限流和故障處理實踐負載
- TiDB故障處理之讓人迷惑的Region is UnavailableTiDBAI
- 專案02(Mysql gtid複製故障處理01)MySql
- 不停機處理oracle超過最大processes數故障Oracle
- 銀河麒麟系統部署keepalived.故障處理
- 故障處理】佇列等待之enq: US - contention案例佇列ENQ
- 【故障處理】佇列等待之enq: US - contention案例佇列ENQ
- 經典乾貨:Docker 常見故障排查處理Docker
- 轉載ORA-01591錯誤故障處理(ji)
- 故障分析 | MySQL convert 函式導致的字符集報錯處理MySql函式
- 資料中心儲存系統故障的處理方式