Case study: p570a host hang incident report
Environment: Oracle 11.2.0.1 RAC on AIX 6.1, hosting two databases
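At around 15:00 on November 29, 2010, the p570a host stopped accepting telnet logins and new application connections failed, seriously affecting the business. We arrived on site at 16:00 and began emergency recovery.
This report summarizes the emergency handling of the database failure and the subsequent problem analysis.
We logged in to p570a through the HMC console; every command failed with an out-of-memory error: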
root@p570a:/> errpt|more
ksh: 0403-031 The fork function failed. There is not enough memory available.
ksh: 0403-031 The fork function failed. There is not enough memory available.
root@p570a:/> ps -ef | grep LOCAL=NO|wc -l
ksh: 0403-031 The fork function failed. There is not enough memory available.
root@p570a:/> ls
ksh: 0403-031 The fork function failed. There is not enough memory available.
With the customer's consent, we restarted p570a from the HMC console.
Fault analysis
p570a@root#errpt|more
IDENTIFIER TIMESTAMP T C RESOURCE_NAME DESCRIPTION
A6DF45AA 1129164210 I O RMCdaemon The daemon is started.
EC0BCCD4 1129164110 T H ent1 ETHERNET DOWN
67145A39 1129163910 U S SYSDUMP SYSTEM DUMP
F48137AC 1129163810 U O minidump COMPRESSED MINIMAL DUMP
1104AA28 1129163810 T S SYSPROC SYSTEM RESET INTERRUPT RECEIVED
9DBCFDEE 1129164110 T O errdemon ERROR LOGGING TURNED ON
B6267342 1126235510 P H hdisk3 DISK OPERATION ERROR
B6267342 1125235510 P H hdisk3 DISK OPERATION ERROR
C5C09FFA 1125062110 P S SYSVMM SOFTWARE PROGRAM ABNORMALLY TERMINATED
C5C09FFA 1125051010 P S SYSVMM SOFTWARE PROGRAM ABNORMALLY TERMINATED
C5C09FFA 1124144010 P S SYSVMM SOFTWARE PROGRAM ABNORMALLY TERMINATED
C5C09FFA 1124144010 P S SYSVMM SOFTWARE PROGRAM ABNORMALLY TERMINATED
C5C09FFA 1124144010 P S SYSVMM SOFTWARE PROGRAM ABNORMALLY TERMINATED
C5C09FFA 1124144010 P S SYSVMM SOFTWARE PROGRAM ABNORMALLY TERMINATED
C5C09FFA 1124144010 P S SYSVMM SOFTWARE PROGRAM ABNORMALLY TERMINATED
C5C09FFA 1124144010 P S SYSVMM SOFTWARE PROGRAM ABNORMALLY TERMINATED
C5C09FFA 1124144010 P S SYSVMM SOFTWARE PROGRAM ABNORMALLY TERMINATED
C5C09FFA 1124144010 P S SYSVMM SOFTWARE PROGRAM ABNORMALLY TERMINATED
C5C09FFA 1124144010 P S SYSVMM SOFTWARE PROGRAM ABNORMALLY TERMINATED
C5C09FFA 1124144010 P S SYSVMM SOFTWARE PROGRAM ABNORMALLY TERMINATED
C5C09FFA 1124144010 P S SYSVMM SOFTWARE PROGRAM ABNORMALLY TERMINATED
C5C09FFA 1124144010 P S SYSVMM SOFTWARE PROGRAM ABNORMALLY TERMINATED
C5C09FFA 1124144010 P S SYSVMM SOFTWARE PROGRAM ABNORMALLY TERMINATED
C5C09FFA 1124144010 P S SYSVMM SOFTWARE PROGRAM ABNORMALLY TERMINATED
C5C09FFA 1124144010 P S SYSVMM SOFTWARE PROGRAM ABNORMALLY TERMINATED
C5C09FFA 1124144010 P S SYSVMM SOFTWARE PROGRAM ABNORMALLY TERMINATED
C5C09FFA 1124144010 P S SYSVMM SOFTWARE PROGRAM ABNORMALLY TERMINATED
p570a@root#errpt -aj C5C09FFA |more
---------------------------------------------------------------------------
LABEL: PGSP_KILL
IDENTIFIER: C5C09FFA
Date/Time: Thu Nov 25 06:21:13 BEIST 2010
Sequence Number: 99122
Machine Id: 00C6E9C54C00
Node Id: p570a
Class: S
Type: PERM
WPAR: Global
Resource Name: SYSVMM
Description
SOFTWARE PROGRAM ABNORMALLY TERMINATED
Probable Causes
SYSTEM RUNNING OUT OF PAGING SPACE
Failure Causes
INSUFFICIENT PAGING SPACE DEFINED FOR THE SYSTEM
PROGRAM USING EXCESSIVE AMOUNT OF PAGING SPACE
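From November 24 onward the system had been reporting that it was running out of paging space, which means physical memory had been exhausted well before the hang.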
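To see when the paging-space kills (label PGSP_KILL, identifier C5C09FFA) started and how they accumulated, the errpt summary can be bucketed by day. The sketch below uses hypothetical helper names and is not an AIX tool: it assumes the summary layout shown above (IDENTIFIER, TIMESTAMP in MMDDhhmmYY form, T, C, RESOURCE_NAME, DESCRIPTION) and works on text already captured, for example with `errpt > errpt.txt`.

```python
from collections import Counter
from datetime import datetime

def parse_errpt_summary(text):
    """Yield (identifier, timestamp, resource) for each errpt summary row.

    errpt prints TIMESTAMP as MMDDhhmmYY, so 1124144010 is 2010-11-24 14:40.
    """
    for line in text.splitlines():
        fields = line.split()
        # Skip the header line and anything too short to be a summary row.
        if len(fields) < 5 or fields[0] == "IDENTIFIER":
            continue
        ident, stamp, _type, _cls, resource = fields[:5]
        try:
            ts = datetime.strptime(stamp, "%m%d%H%M%y")
        except ValueError:
            continue  # second field is not a timestamp: not a summary row
        yield ident, ts, resource

def first_and_daily_counts(text, ident="C5C09FFA"):
    """Return (earliest timestamp, Counter of hits per day) for one identifier."""
    hits = [ts for i, ts, _ in parse_errpt_summary(text) if i == ident]
    per_day = Counter(ts.date().isoformat() for ts in hits)
    return (min(hits) if hits else None), per_day

# A few rows copied from the errpt output above.
sample = """IDENTIFIER TIMESTAMP  T C RESOURCE_NAME  DESCRIPTION
B6267342   1125235510 P H hdisk3         DISK OPERATION ERROR
C5C09FFA   1125062110 P S SYSVMM         SOFTWARE PROGRAM ABNORMALLY TERMINATED
C5C09FFA   1125051010 P S SYSVMM         SOFTWARE PROGRAM ABNORMALLY TERMINATED
C5C09FFA   1124144010 P S SYSVMM         SOFTWARE PROGRAM ABNORMALLY TERMINATED
"""

first, per_day = first_and_daily_counts(sample)
print(first)                    # 2010-11-24 14:40:00
print(sorted(per_day.items()))  # [('2010-11-24', 1), ('2010-11-25', 2)]
```

Run against the full errpt output, the earliest C5C09FFA entry is what dates the start of memory exhaustion to November 24.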
Starting November 24, alert_gzjb1.log also contains a large number of errors such as the following:
Wed Nov 24 22:36:15 2010
ORA-27302: failure occurred at: skgpspawn3
ORA-27301: OS failure message: Not enough space
ORA-27300: OS system dependent operation:fork failed with status: 12
Errors in file /oracle/app/oracle/diag/rdbms/gdjb/gdjb1/trace/gdjb1_psp0_352314.trc:
Process startup failed, error stack:
Thu Nov 25 02:56:24 2010
Process q000 died, see its trace file
Thu Nov 25 02:56:13 2010
ORA-27302: failure occurred at: skgpspawn3
ORA-27301: OS failure message: Not enough space
ORA-27300: OS system dependent operation:fork failed with status: 12
Errors in file /oracle/app/oracle/diag/rdbms/gdjb/gdjb1/trace/gdjb1_psp0_352314.trc:
Process startup failed, error stack:
Instance terminated by USER, pid = 144242
USER (ospid: 144242): terminating the instance due to error 443
Process LMHB died, see its trace file
ORA-27302: failure occurred at: skgpspawn3
ORA-27301: OS failure message: Not enough space
ORA-27300: OS system dependent operation:fork failed with status: 12
Errors in file /oracle/app/oracle/diag/rdbms/gdjb/gdjb1/trace/gdjb1_ora_144242.trc:
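The database instance on p570a went down because physical memory and paging space were both exhausted, so its requests for memory (here, forking new processes) could not be satisfied.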
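The "status: 12" in ORA-27300 is the operating-system errno, and on AIX errno 12 is ENOMEM, the same "Not enough space" that the listener log entries that follow report as "Error: 12". A minimal sketch, assuming alert-log lines in exactly the format shown above (the function name is hypothetical), that extracts the failing operation and maps the status to its errno name with Python's standard `errno` module:

```python
import errno
import os
import re

# Matches alert-log lines such as:
#   ORA-27300: OS system dependent operation:fork failed with status: 12
ORA_27300 = re.compile(r"ORA-27300: .*?:(\w+) failed with status: (\d+)")

def decode_ora27300(log_text):
    """Return (operation, status, errno name) for every ORA-27300 line.

    Assumes, as this log suggests, that the status is an OS errno value.
    """
    results = []
    for m in ORA_27300.finditer(log_text):
        op, status = m.group(1), int(m.group(2))
        results.append((op, status, errno.errorcode.get(status, "UNKNOWN")))
    return results

sample = """ORA-27302: failure occurred at: skgpspawn3
ORA-27301: OS failure message: Not enough space
ORA-27300: OS system dependent operation:fork failed with status: 12
"""

print(decode_ora27300(sample))  # [('fork', 12, 'ENOMEM')]
# The message text is platform dependent; AIX renders ENOMEM as "Not enough space".
print(os.strerror(errno.ENOMEM))
```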
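TNS-12500: TNS:listener failed to start a dedicated server process
TNS-12540: TNS:internal limit restriction exceeded
TNS-12560: TNS:protocol adapter error
TNS-00510: Internal limit restriction exceeded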
IBM/AIX RISC System/6000 Error: 12: Not enough space
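The listener log likewise shows that the listener could not start server processes for incoming external connections.
Memory parameters
Physical memory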
p570a
AIX
System Model: IBM,9117-MMA
Machine Serial Number: 066E9C5
Processor Type: PowerPC_POWER6
Processor Implementation Mode: POWER 6
Processor Version: PV_6_Compat
Number Of Processors: 8
Processor Clock Speed: 3504 MHz
CPU Type: 64-bit
Kernel Type: 64-bit
LPAR Info: 1 06-6E9C5
Memory Size: 15232 MB
Good Memory Size: 15232 MB
Platform Firmware level: EM350_038
Firmware Version: IBM,EM350_038
Console Login: enable
Auto Restart: true
Full Core: false
Total physical memory is therefore about 15 GB (15232 MB).
Database A
SQL> show sga
Total System Global Area 2137886720 bytes
Fixed Size 2208496 bytes
Variable Size 1207962896 bytes
Database Buffers 922746880 bytes
Redo Buffers 4968448 bytes
SQL> show parameter sga
NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
lock_sga boolean FALSE
pre_page_sga boolean FALSE
sga_max_size big integer 2G
sga_target big integer 2G
SQL> show parameter pga
NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
pga_aggregate_target big integer 1G
SQL> show parameter instance_name
NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
instance_name string gd1
Database A therefore takes about 3 GB of physical memory (2 GB SGA + 1 GB PGA target).
Database B
SQL> show sga
Total System Global Area 8551575552 bytes
Fixed Size 2223904 bytes
Variable Size 1778385120 bytes
Database Buffers 6761218048 bytes
Redo Buffers 9748480 bytes
SQL> show parameter sga
NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
lock_sga boolean FALSE
pre_page_sga boolean FALSE
sga_max_size big integer 8G
sga_target big integer 8G
SQL> show parameter instance_name
NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
instance_name string gd2
SQL> show parameter pga
NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
pga_aggregate_target big integer 2G
Database B therefore takes about 10 GB of physical memory (8 GB SGA + 2 GB PGA target), a large share of the host's total.
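The "about 3 GB" and "about 10 GB" figures can be checked directly against the SQL*Plus output above. This sketch (hypothetical helper names) confirms that each instance's SGA components add up to its Total System Global Area, then adds pga_aggregate_target to get a rough per-instance budget in GiB:

```python
GIB = 1024 ** 3

def sga_total(fixed, variable, buffers, redo):
    """Sum the four components that SQL*Plus `show sga` prints."""
    return fixed + variable + buffers + redo

def footprint_gib(sga_bytes, pga_target_gib):
    """Rough instance budget: SGA plus pga_aggregate_target.

    pga_aggregate_target is a target, not a hard cap, so real PGA use can exceed it.
    """
    return sga_bytes / GIB + pga_target_gib

# Component values copied from the `show sga` output of each database.
sga_a = sga_total(2208496, 1207962896, 922746880, 4968448)
sga_b = sga_total(2223904, 1778385120, 6761218048, 9748480)

print(sga_a, sga_b)                       # 2137886720 8551575552
print(round(footprint_gib(sga_a, 1), 2))  # 2.99
print(round(footprint_gib(sga_b, 2), 2))  # 9.96
```

Both totals match the "Total System Global Area" lines exactly, so the 3 GB and 10 GB estimates are sound as configured budgets.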
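Physical memory totals about 15 GB, of which 13 GB is allocated to the two databases, leaving only about 2 GB for the operating system. As the number of application connections grows, or sessions are not released, physical memory and paging space are easily exhausted, which brings down the database and hangs the host.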
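1) The Oracle memory parameters of the gzcdc database were set too high and needed to be reduced. After consulting the application vendor and the customer, the gzcdc SGA was reduced to 5 GB and the PGA to 1 GB, which leaves about 6 GB for the operating system (15 GB - 3 GB - 6 GB).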
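The before/after budget reduces to simple arithmetic. A minimal sketch, assuming that gzcdc is database B (the 8 GB SGA instance; the report does not say so explicitly, but it is the only instance for which an SGA of 5 GB is a reduction) and counting each instance as SGA plus PGA target in GiB:

```python
# Physical memory from the "Memory Size: 15232 MB" line above.
PHYSICAL_GIB = 15232 / 1024  # 14.875 GiB

def os_headroom_gib(instances):
    """Physical memory left after the SGA and PGA targets of every instance.

    `instances` maps a label to (sga_gib, pga_target_gib). Per-session process
    overhead is NOT included; that is the part that grows with connection count.
    """
    reserved = sum(sga + pga for sga, pga in instances.values())
    return PHYSICAL_GIB - reserved

# Assumption: gzcdc is database B (the 8 GB SGA instance).
before = {"A": (2, 1), "B": (8, 2)}
after = {"A": (2, 1), "B/gzcdc": (5, 1)}

print(os_headroom_gib(before))  # 1.875
print(os_headroom_gib(after))   # 5.875
```

Because the headroom ignores per-session memory, it overstates what AIX actually has available once many dedicated server (LOCAL=NO) processes are running.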
Source: ITPUB blog, http://blog.itpub.net/7199859/viewspace-680613/. Please credit the source when reposting; otherwise legal liability will be pursued.