ASM空間爆滿導致資料庫掛起

space6212發表於2019-03-31

今天用客戶端連結一個測試庫時出現錯誤ORA-00257: archiver error. Connect internal only, until freed。
資料庫環境是SOLARIS 10 + 10203 RAC + ASM。


之前遇到過幾次這個問題,是因為資料庫出了問題,導致無法歸檔。登上資料庫伺服器,用sys連線進去檢視日誌的狀態:
SQL> select * from v$log;

GROUP# THREAD# SEQUENCE# BYTES MEMBERS ARC STATUS FIRST_CHANGE# FIRST_TIME
---------- ---------- ---------- ---------- ---------- --- ---------------- ------------- --------------
1 1 54 1073741824 2 NO CURRENT 5678852562 11-10??-07
2 1 51 1073741824 2 NO INACTIVE 5678784185 11-10??-07
3 1 53 1073741824 2 NO INACTIVE 5678835269 11-10??-07
4 1 52 1073741824 2 NO INACTIVE 5678800967 11-10??-07
5 2 22 1073741824 2 YES INACTIVE 5678750740 11-10??-07
6 2 23 1073741824 2 NO CURRENT 5678835302 11-10??-07
7 2 20 1073741824 2 YES INACTIVE 5678603818 11-10??-07
8 2 21 1073741824 2 YES INACTIVE 5678685075 11-10??-07

果然是有很多日誌無法歸檔。再檢視alert檔案:
ASM的ALERT報警:
Sun Sep 30 02:16:22 2007
WARNING: allocation failure on disk DATA_0000 for file 364 xnum 2147483648
Sun Sep 30 03:26:59 2007
WARNING: allocation failure on disk DATA_0000 for file 370 xnum 2147483648
........

資料庫其中一個節點的ALERT
Sun Sep 30 02:16:22 2007
Errors in file /oracle/app/admin/pre/bdump/prerac1_arc1_13668.trc:
ORA-19504: failed to create file "+DATA/archivelog/1_19_634432026.dbf"
ORA-17502: ksfdcre:4 Failed to create file +DATA/archivelog/1_19_634432026.dbf
ORA-15041: diskgroup space exhausted
Sun Sep 30 02:16:22 2007
ARC1: Error 19504 Creating archive log file to '+DATA/archivelog/1_19_634432026.dbf'
ARCH: Archival stopped, error occurred. Will continue retrying
Sun Sep 30 02:16:22 2007
ORACLE Instance prerac1 - Archival Error
........
Thu Oct 11 23:55:03 2007
Errors in file /oracle/app/admin/pre/bdump/prerac1_arc2_15831.trc:
ORA-00313: open failed for members of log group 1 of thread 1
ORA-00312: online log 1 thread 1: '+DATA/onlinelog/redo1_1_2.log'
ORA-17503: ksfdopn:2 Failed to open file +DATA/onlinelog/redo1_1_2.log
ORA-15055: unable to connect to ASM instance
ORA-00020: maximum number of processes () exceeded
ORA-00312: online log 1 thread 1: '+DATA/onlinelog/redo1_1_1.log'
ORA-17503: ksfdopn:2 Failed to open file +DATA/onlinelog/redo1_1_1.log
ORA-15055: unable to connect to ASM instance
ORA-00020: maximum number of processes () exceeded

從alert檔案可以知道,不能歸檔的原因是因為ASM DISK的空間被用完了,但後面的錯誤:
ORA-00020: maximum number of processes () exceeded是怎麼回事呢?
當時整個資料庫都已經頓住了,查詢v$process完全沒有反映,只能從作業系統上查:
bash-3.00$ ps -ef|grep ora|wc -l
84

這個程式數離process引數設定值還差很遠,所以基本可以排除是資料庫的程式數滿了,再看一下asm的程式:
bash-3.00$ ps -ef|grep asm|wc -l
39
asm的程式設定值是40,所以應該是asm的程式達到的設定值。正常情況下asm只有十幾個程式,現在出現39個程式肯定是不正常的。
查了一會資料發現,原因還是因為歸檔引起的。因為我的歸檔是放在ASM上的,當oracle不能歸檔時,oracle會新起程式嘗試進行歸檔,而不釋放之前的歸檔程式,而歸檔程式是需要連結ASM例項的,所以導致ASM的程式數增大,最終出現ORA-00020錯誤。

為了解決空間不足問題,我用asmcmd刪除了大部分的歸檔日誌,用在RMAN中清除了相關的歸檔日誌資訊。
空間雖然有了,但此時問題讓人存在,估計是因為歸檔程式並不清楚已經有空間了。此時解決這一系列問題有兩種方法:
1、設定log_archive_dest_state_n,讓歸檔暫時放在其他路徑
2、重啟資料庫

在我解決過程中,ASM的連線數不斷增加,最終我用sys連線資料庫無法shutdown資料庫:
Errors in file /oracle/app/admin/pre/bdump/prerac1_arc0_13665.trc:
ORA-00313: open failed for members of log group 1 of thread 1
ORA-00312: online log 1 thread 1: '+DATA/onlinelog/redo1_1_2.log'
ORA-17503: ksfdopn:2 Failed to open file +DATA/onlinelog/redo1_1_2.log
ORA-15055: unable to connect to ASM instance
ORA-00020: maximum number of processes () exceeded
ORA-15055: unable to connect to ASM instance
ORA-00020: maximum number of processes () exceeded
ORA-00312: online log 1 thread 1: '+DATA/onlinelog/redo1_1_1.log'
ORA-17503: ksfdopn:2 Failed to open file +DATA/onlinelog/redo1_1_1.log
ORA-15055: unable to connect to ASM instance
ORA-00020: maximum number of processes () exceeded
ORA-15055: unable to connect to ASM instance
ORA-00020: maximum number of processes () exceeded

最後透過重啟crs的方式把資料庫重啟了:
bash-3.00# ./crsctl stop crs
Stopping resources.
Successfully stopped CRS resources
Stopping CSSD.
Shutting down CSS daemon.
Shutdown request successfully issued.

bash-3.00# ./crsctl start crs
Attempting to start CRS stack
The CRS stack will be started shortly

重啟後v$log的情況如下:
SQL> select * from v$log;

GROUP# THREAD# SEQUENCE# BYTES MEMBERS ARCHIV STATUS FIRST_CHANGE# FIRST_TIME
---------- ---------- ---------- ---------- ---------- ------ -------------------------------- ------------- --------------
1 1 54 1073741824 2 YES INACTIVE 5678852562 11-10? -07
2 1 55 1073741824 2 YES INACTIVE 5678932182 11-10? -07
3 1 53 1073741824 2 YES INACTIVE 5678835269 11-10? -07
4 1 56 1073741824 2 NO CURRENT 5678958358 12-10? -07
5 2 22 1073741824 2 YES INACTIVE 5678750740 11-10? -07
6 2 23 1073741824 2 YES INACTIVE 5678835302 11-10? -07
7 2 24 1073741824 2 NO CURRENT 5678958359 12-10? -07
8 2 21 1073741824 2 YES INACTIVE 5678685075 11-10? -07

8 rows selected.

可見,資料庫已經恢復正常。

來自 “ ITPUB部落格 ” ,連結:http://blog.itpub.net/231499/viewspace-63861/,如需轉載,請註明出處,否則將追究法律責任。

相關文章