10g ORACLE_HOME空間滿導致SYSAUX表空間離線

yangtingkun發表於2010-12-08

客戶伺服器上資料庫的SYSAUX表空間自動離線,幫客戶診斷導致問題的原因。

 

 

問題是SYSAUX表空間自動離線並需要恢復,有歸檔的存在,表空間的恢復很簡單,需要找出是什麼問題導致了表空間離線。

Wed Dec 01 16:49:59 2010
LGWR: Standby redo logfile selected for thread 1 sequence 66443 for destination LOG_ARCHIVE_DEST_2
Thread 1 advanced to log sequence 66443
  Current log# 6 seq# 66443 mem# 0: +DG00/ora00/onlinelog/group_6.5108.697636779
Wed Dec 01 17:14:11 2010
Errors in file /db/oracle/app/oracle/diag/rdbms/ora00/ora001/trace/ora001_ora_4262.trc:
ORA-03137: TTC
協議內部錯誤: [12333] [5] [113] [105] [] [] [] []
Wed Dec 01 17:14:12 2010
Trace dumping is performing id=[cdmp_20101201171412]
Wed Dec 01 17:27:39 2010
LGWR: Standby redo logfile selected for thread 1 sequence 66444 for destination LOG_ARCHIVE_DEST_2
Thread 1 advanced to log sequence 66444
  Current log# 2 seq# 66444 mem# 0: +DG00/ora00/onlinelog/group_2.262.696608025
Wed Dec 01 18:01:15 2010
Errors in file /db/oracle/app/oracle/diag/rdbms/ora00/ora001/trace/ora001_ora_5879.trc  (incident=409507):
ORA-00600:
內部錯誤程式碼, 引數: [kkocxj : pjpCtx], [], [], [], [], [], [], []
Incident details in: /db/oracle/app/oracle/diag/rdbms/ora00/ora001/incident/incdir_409507/ora001_ora_5879_i409507.trc
Wed Dec 01 18:01:38 2010
Trace dumping is performing id=[cdmp_20101201180138]
Wed Dec 01 18:01:40 2010
Sweep Incident[409507]: completed
Wed Dec 01 18:02:07 2010
Trace dumping is performing id=[cdmp_20101201180100]
Wed Dec 01 18:21:00 2010
LGWR: Standby redo logfile selected for thread 1 sequence 66445 for destination LOG_ARCHIVE_DEST_2
Thread 1 advanced to log sequence 66445
  Current log# 1 seq# 66445 mem# 0: +DG00/ora00/onlinelog/group_1.261.696608025
Wed Dec 01 20:04:34 2010
Errors in file /db/oracle/app/oracle/diag/rdbms/ora00/ora001/trace/ora001_j000_2866.trc:
ORA-12012:
自動執行作業 29 出錯
ORA-00376:
此時無法讀取檔案 2
ORA-01110:
資料檔案 2: '+DG00/ora00/datafile/sysaux.257.696607929'
ORA-06512:
"SYSMAN.EMD_MAINTENANCE", line 53
ORA-06512:
"SYSMAN.EMD_MAINTENANCE", line 1157
ORA-06512:
line 1
Wed Dec 01 20:04:41 2010
Errors in file /db/oracle/app/oracle/diag/rdbms/ora00/ora001/trace/ora001_q000_22441.trc:
ORA-00376:
此時無法讀取檔案 2
ORA-01110:
資料檔案 2: '+DG00/ora00/datafile/sysaux.257.696607929'
Wed Dec 01 20:05:12 2010
Errors in file /db/oracle/app/oracle/diag/rdbms/ora00/ora001/trace/ora001_q003_27619.trc:
ORA-00376:
此時無法讀取檔案 2
ORA-01110:
資料檔案 2: '+DG00/ora00/datafile/sysaux.257.696607929'
Wed Dec 01 20:05:34 2010
Errors in file /db/oracle/app/oracle/diag/rdbms/ora00/ora001/trace/ora001_j000_2866.trc:
ORA-12012:
自動執行作業 29 出錯
ORA-00376:
此時無法讀取檔案 2
ORA-01110:
資料檔案 2: '+DG00/ora00/datafile/sysaux.257.696607929'
ORA-06512:
"SYSMAN.EMD_MAINTENANCE", line 53
ORA-06512:
"SYSMAN.EMD_MAINTENANCE", line 1157
ORA-06512:
line 1
Wed Dec 01 20:05:44 2010
Errors in file /db/oracle/app/oracle/diag/rdbms/ora00/ora001/trace/ora001_q002_22446.trc:
ORA-00376:
此時無法讀取檔案 2
ORA-01110:
資料檔案 2: '+DG00/ora00/datafile/sysaux.257.696607929'
Wed Dec 01 20:06:14 2010
Errors in file /db/oracle/app/oracle/diag/rdbms/ora00/ora001/trace/ora001_j000_2866.trc:
ORA-12012: error on auto execute of job 4002
ORA-00376: file 2 cannot be read at this time
ORA-01110: data file 2: '+DG00/ora00/datafile/sysaux.257.696607929'
ORA-06512: at "FLOWS_030000.WWV_FLOW_PLATFORM", line 9
ORA-06512: at line 1
Wed Dec 01 20:06:14 2010
Errors in file /db/oracle/app/oracle/diag/rdbms/ora00/ora001/trace/ora001_q001_4013.trc:
ORA-00376:
此時無法讀取檔案 2
ORA-01110:
資料檔案 2: '+DG00/ora00/datafile/sysaux.257.696607929'
Wed Dec 01 20:06:34 2010
Errors in file /db/oracle/app/oracle/diag/rdbms/ora00/ora001/trace/ora001_j000_2866.trc:
ORA-12012:
自動執行作業 29 出錯
ORA-00376:
此時無法讀取檔案 2
ORA-01110:
資料檔案 2: '+DG00/ora00/datafile/sysaux.257.696607929'
ORA-06512:
"SYSMAN.EMD_MAINTENANCE", line 53
ORA-06512:
"SYSMAN.EMD_MAINTENANCE", line 1157
ORA-06512:
line 1
Wed Dec 01 20:06:45 2010
Errors in file /db/oracle/app/oracle/diag/rdbms/ora00/ora001/trace/ora001_q000_6265.trc:
ORA-00376:
此時無法讀取檔案 2
ORA-01110:
資料檔案 2: '+DG00/ora00/datafile/sysaux.257.696607929'
Wed Dec 01 20:07:16 2010
Errors in file /db/oracle/app/oracle/diag/rdbms/ora00/ora001/trace/ora001_q003_6575.trc:
ORA-00376:
此時無法讀取檔案 2
ORA-01110:
資料檔案 2: '+DG00/ora00/datafile/sysaux.257.696607929'
Wed Dec 01 20:07:34 2010
Errors in file /db/oracle/app/oracle/diag/rdbms/ora00/ora001/trace/ora001_j000_2866.trc:
ORA-12012:
自動執行作業 29 出錯
ORA-00376:
此時無法讀取檔案 2
ORA-01110:
資料檔案 2: '+DG00/ora00/datafile/sysaux.257.696607929'
ORA-06512:
"SYSMAN.EMD_MAINTENANCE", line 53
ORA-06512:
"SYSMAN.EMD_MAINTENANCE", line 1157
ORA-06512:
line 1
Wed Dec 01 20:07:48 2010
Errors in file /db/oracle/app/oracle/diag/rdbms/ora00/ora001/trace/ora001_q002_8014.trc:
ORA-00376:
此時無法讀取檔案 2
ORA-01110:
資料檔案 2: '+DG00/ora00/datafile/sysaux.257.696607929'
Wed Dec 01 20:08:14 2010
Errors in file /db/oracle/app/oracle/diag/rdbms/ora00/ora001/trace/ora001_j000_2866.trc:
ORA-12012: error on auto execute of job 4002
ORA-00376: file 2 cannot be read at this time
ORA-01110: data file 2: '+DG00/ora00/datafile/sysaux.257.696607929'
ORA-06512: at "FLOWS_030000.WWV_FLOW_PLATFORM", line 9
ORA-06512: at line 1
Wed Dec 01 20:08:18 2010
Errors in file /db/oracle/app/oracle/diag/rdbms/ora00/ora001/trace/ora001_q001_9342.trc:
ORA-00376:
此時無法讀取檔案 2
ORA-01110:
資料檔案 2: '+DG00/ora00/datafile/sysaux.257.696607929'
Wed Dec 01 20:08:34 2010
Errors in file /db/oracle/app/oracle/diag/rdbms/ora00/ora001/trace/ora001_j000_2866.trc:
ORA-12012:
自動執行作業 29 出錯
ORA-00376:
此時無法讀取檔案 2
ORA-01110:
資料檔案 2: '+DG00/ora00/datafile/sysaux.257.696607929'
ORA-06512:
"SYSMAN.EMD_MAINTENANCE", line 53
ORA-06512:
"SYSMAN.EMD_MAINTENANCE", line 1157
ORA-06512:
line 1

後面類似的錯誤還有很多,不過後面的所有錯誤都是表現,導致錯誤的原因是資料檔案2離線造成的。而在錯誤第一次發生之前,存在一個ORA-03137和一個ORA-00600[kkocxj : pjpCtx]錯誤。

這兩個錯誤都是Oraclebug,但是顯然都不是造成表空間離線的主要原因。

我不是直接從客戶處獲取資訊,因此所有的資訊都來自日誌檔案,雖然獲取的日誌檔案只有最後兩三天的,但是從中已經足夠找到一些有用的內容了。

從上面貼出的部分內容就可以看到,這個資料庫配置了STANDBY環境,從歸檔的路徑可以看出,這個資料庫使用了ASM磁碟組。

一般來說,使用ASM的資料庫99%RAC環境,確實ASM支援單例項環境,但是正式環境沒有人會為單例項資料庫去部署ASM

而且在錯誤日誌中還看到這樣的資訊:

Wed Dec 01 15:12:05 2010
Global Enqueue Services Deadlock detected. More info in file
 /db/oracle/app/oracle/diag/rdbms/ora00/ora001/trace/ora001_lmd0_21641.trc.

顯然這是一個RAC環境資料庫,那麼導致問題的多半是其他的例項,聯絡客戶獲得例項2上的告警日誌,導致錯誤的原因這次一目瞭然了:

Wed Dec 01 17:26:26 2010
LGWR: Standby redo logfile selected for thread 2 sequence 25580 for destination LOG_ARCHIVE_DEST_2
Thread 2 advanced to log sequence 25580
  Current log# 15 seq# 25580 mem# 0: +DG00/ora00/onlinelog/group_15.774.729989389
Wed Dec 01 18:00:32 2010
Trace dumping is performing id=[cdmp_20101201180138]
Wed Dec 01 18:00:40 2010
Errors in file /db/oracle/app/oracle/diag/rdbms/ora00/ora002/trace/ora002_ora_28318.trc  (incident=217991):
ORA-00600:
內部錯誤程式碼, 引數: [kkocxj : pjpCtx], [], [], [], [], [], [], []
Incident details in: /db/oracle/app/oracle/diag/rdbms/ora00/ora002/incident/incdir_217991/ora002_ora_28318_i217991.trc
Wed Dec 01 18:01:00 2010
Trace dumping is performing id=[cdmp_20101201180100]
Wed Dec 01 18:01:01 2010
Sweep Incident[217991]: completed
Wed Dec 01 18:42:10 2010
Thread 2 advanced to log sequence 25581
  Current log# 16 seq# 25581 mem# 0: +DG00/ora00/onlinelog/group_16.775.729989389
Wed Dec 01 19:15:51 2010
Non critical error ORA-00001 caught while writing to trace file "/db/oracle/app/oracle/diag/rdbms/ora00/ora002/trace/ora002_dia0_1187.trc"
Error message: SVR4 Error: 28: No space left on device
Additional information: 1
Writing to the above trace file is disabled for now on...
Wed Dec 01 19:59:15 2010
Non critical error ORA-00001 caught while writing to trace file "/db/oracle/app/oracle/diag/rdbms/ora00/ora002/trace/ora002_m000_23401.trc"
Error message: SVR4 Error: 28: No space left on device
Additional information: 1
Writing to the above trace file is disabled for now on...
Wed Dec 01 20:03:17 2010
Non critical error ORA-00001 caught while writing to trace file "/db/oracle/app/oracle/diag/rdbms/ora00/ora002/trace/ora002_dbw0_1221.trc"
Error message: SVR4 Error: 28: No space left on device
Additional information: 1
Writing to the above trace file is disabled for now on...
Errors in file /db/oracle/app/oracle/diag/rdbms/ora00/ora002/trace/ora002_dbw0_1221.trc:
ORA-01148:
無法重新整理資料檔案 2 的檔案大小
ORA-01110:
資料檔案 2: '+DG00/ora00/datafile/sysaux.257.696607929'
ORA-09817: Write to audit file failed.
SVR4 Error: 28: No space left on device
Automatic datafile offline due to media error on
file 2: +DG00/ora00/datafile/sysaux.257.696607929
Wed Dec 01 20:08:10 2010
Non critical error ORA-0Thu Dec 02 08:59:07 2010
Errors in file /db/oracle/app/oracle/diag/rdbms/ora00/ora002/trace/ora002_smon_1235.trc:
ORA-00604:
遞迴 SQL 級別 1 出現錯誤
ORA-00376:
此時無法讀取檔案 2
ORA-01110:
資料檔案 2: '+DG00/ora00/datafile/sysaux.257.696607929'

導致錯誤的原因很簡單,節點2上的ORACLE_BASE/ORACLE_HOME目錄沒有空間了。在錯誤的最開始也產生了一個ORA-600錯誤,同樣這個錯誤不是導致問題的真正原因,但是這是誘因,由於這個ORA-600產生的trace檔案,使得ORACLE_BASE目錄沒有空閒空間,當隨後的內部事件發生,Oracle將事件記錄到SYSAUX表空間的時候,發現出現了問題,由於目錄已經沒有空間,Oracle無法向ORACLE_BASE寫入資訊,使得寫SYSAUX表空間的內部事務無法正確完成,於是Oracle將資料檔案置於了OFFLINE狀態。

這個檔案本身並沒有任何的問題,而導致問題的原因僅僅是ORACLE_BASE磁碟空間不足,而不是資料檔案所在的ASM磁碟組空間不足,Oraclemetalink文件ID 759486.1描述了這個問題。

 

來自 “ ITPUB部落格 ” ,連結:http://blog.itpub.net/4227/viewspace-681316/,如需轉載,請註明出處,否則將追究法律責任。

相關文章