The following errors appeared in the alert log, after which the database crashed.

Thu Aug 16 20:21:25 2012

Detected change in CPU count to 8

Thu Aug 16 20:23:29 2012

Process J000 died, see its trace file

Thu Aug 16 20:23:29 2012

kkjcre1p: unable to spawn jobq slave process

Thu Aug 16 20:23:29 2012

Errors in file /db/oracle10g/admin/benguo/bdump/benguo_cjq0_3249.trc:

Thu Aug 16 20:25:25 2012

Detected change in CPU count to 8

Thu Aug 16 12:25:55 2012

Errors in file :

ORA-00600: Message 600 not found; No message file for product=RDBMS, facility=ORA; arguments:

Thu Aug 16 20:28:25 2012

!

Detected change in CPU count to 1

Thu Aug 16 20:29:15 2012

Process J000 died, see its trace file

Thu Aug 16 20:29:15 2012

kkjcre1p: unable to spawn jobq slave process

Thu Aug 16 20:29:15 2012

Errors in file /db/oracle10g/admin/benguo/bdump/benguo_cjq0_3249.trc:

Thu Aug 16 20:29:26 2012

Detected change in CPU count to 8

Thu Aug 16 20:29:50 2012

OER 7451 in Load Indicator : Error Code = Linux-x86_64 Error: 11086: Unknown system error

Thu Aug 16 20:30:26 2012

Determining CPU socket count failed!

Detected change in CPU count to 1

Thu Aug 16 20:31:26 2012

Detected change in CPU count to 8

Thu Aug 16 22:22:49 2012

Errors in file /db/oracle10g/admin/benguo/bdump/benguo_lgwr_3241.trc:

ORA-00471: DBWR process terminated with error

Instance terminated by DBW0, pid = 3239

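An ORA-00600 [keltnfy-ldmInit] [46] [1] error is usually caused by a hostname mismatch that prevents the database from starting. However, I checked the /etc/hosts file on the database server against the output of hostname, and the two are indeed consistent. When I went to look at the corresponding trace files, I found that they no longer existed, which left me really puzzled.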
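The hostname check described above can be sketched as a small shell helper. This is only an illustration: `host_in_hosts_file` is a hypothetical function name, not an Oracle or operating system tool, and it assumes a hosts file in the usual "address name [alias...]" layout.

```shell
#!/bin/sh
# Sketch: report whether a host name has an entry in a hosts-format file.
# host_in_hosts_file is a hypothetical helper introduced for illustration.
host_in_hosts_file() {
    name=$1
    file=${2:-/etc/hosts}
    awk -v name="$name" '
        { sub(/#.*/, "") }                 # drop comments before matching
        {
            # field 1 is the address; names and aliases follow it
            for (i = 2; i <= NF; i++)
                if ($i == name) found = 1
        }
        END { exit (found ? 0 : 1) }
    ' "$file"
}

# Example use against the live system (commented out so the block has no side effects):
# if host_in_hosts_file "$(hostname)"; then echo "consistent"; else echo "missing"; fi
```

The function returns success only when the name appears as a host name or alias on an uncommented line, which is the condition the manual comparison of hostname and /etc/hosts is checking.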

[oracle@server127 bdump]$ uptime

10:31:30 up 24 days, 23:33, 3 users, load average: 2.12, 2.29, 2.55

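The system had not been rebooted either, and fortunately the database could still be started normally. The problem seems to be related to CPU messages such as "Detected change in CPU count to 8".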
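To see how often the reported CPU count flips, the "Detected change in CPU count to N" lines in the alert log can be tallied. The sketch below assumes the alert log is a plain text file. `cpu_count_changes` is a hypothetical helper name, and the path passed to it is whatever alert log you are inspecting.

```shell
#!/bin/sh
# Sketch: tally "Detected change in CPU count to N" messages per value of N.
# cpu_count_changes is a hypothetical helper introduced for illustration.
cpu_count_changes() {
    grep 'Detected change in CPU count to' "$1" |
        awk '{ print $NF }' |              # keep only the reported count
        sort -n | uniq -c |                # count occurrences of each value
        awk '{ print "cpu_count=" $2 " occurrences=" $1 }'
}

# Example use (ALERT_LOG is a placeholder for the alert log path):
# cpu_count_changes "$ALERT_LOG"
```

A log that alternates between two values, as in the excerpt above where the count switches between 8 and 1, shows up as two output lines, one per value.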

Before long, the database on this server crashed again:

Fri Aug 17 10:46:02 2012

Process J000 died, see its trace file

Fri Aug 17 10:46:02 2012

kkjcre1p: unable to spawn jobq slave process

Fri Aug 17 10:46:02 2012

Errors in file /db/oracle10g/admin/benguo/bdump/benguo_cjq0_19052.trc:

Fri Aug 17 10:48:47 2012

Errors in file /db/oracle10g/admin/benguo/udump/benguo_ora_19187.trc:

ORA-01242: data file suffered media failure: database in NOARCHIVELOG mode

Fri Aug 17 10:48:48 2012

Errors in file /db/oracle10g/admin/benguo/bdump/benguo_lgwr_19044.trc:

ORA-01242: data file suffered media failure: database in NOARCHIVELOG mode

Instance terminated by CKPT, pid = 19046

[oracle@server127 bdump]$ oerr ora 01242

01242, 00000, "data file suffered media failure: database in NOARCHIVELOG mode"

// *Cause: The database is in NOARCHIVELOG mode and a database file was

// detected as inaccessible due to media failure.

// *Action: Restore accessibility to the file mentioned in the error stack

// and restart the instance.

I have previously run into an ORA-01242 error caused by bad sectors on a disk:

http://blog.itpub.net/post/43172/527958

Metalink gives the following explanation:

The File suffered media failure as before that there was some I/O error in writing to the datafile as seen in the alert.log. The root-cause is that the datafile was locked by an OS-tool making a filesystem backup, like Netbackup or ArcServ. The RDBMS could not open the datafile and failed accordingly .

The instance will crash in NOARCHIVELOG-mode, while in ARCHIVELOG-mode, the instance will remain running, but the datafile will be put OFFLINE and will require recovery.

Solution

If the Media recovery is required then
-- restore the old backup of the datafile
-- recover the datafile/tablespace
If there was no logswitch after the failure then the file can be recovered from the current redo log and no need to restore the old backup , so just recover database/tablespace will do

Also make sure that the backup window does not exceed and does not clash with the db open time

Online backup should be recommended , to avoid these problems

This database does not use NetBackup, though, so the disk is still the more likely cause.

The following errors appeared in the Linux system log:

end_request: I/O error, dev sr0, sector 6979968

Buffer I/O error on device sr0, logical block 872496

sr 1:0:0:0: SCSI error: return code = 0x08000002

sr0: Current: sense key: Medium Error

Add. Sense: No seek complete

end_request: I/O error, dev sr0, sector 0

Buffer I/O error on device sr0, logical block 0

Buffer I/O error on device sr0, logical block 1

Buffer I/O error on device sr0, logical block 2

Buffer I/O error on device sr0, logical block 3

Buffer I/O error on device sr0, logical block 4

sr 1:0:0:0: SCSI error: return code = 0x08000002

sr0: Current: sense key: Medium Error

Add. Sense: No seek complete

end_request: I/O error, dev sr0, sector 0

printk: 3 messages suppressed.

Buffer I/O error on device sr0, logical block 0

sr 1:0:0:0: SCSI error: return code = 0x08000002

sr0: Current: sense key: Medium Error

Add. Sense: No seek complete

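So it does look like a disk problem caused the database to shut down unexpectedly, although what Linux reports could still be the result of a bug.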
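When the system log contains many such lines, it helps to count the kernel I/O errors per device. The following is a minimal sketch, assuming the log uses the "end_request: I/O error, dev NAME, sector N" format shown above. `io_errors_by_device` is a hypothetical helper name, and the log path is supplied by the caller.

```shell
#!/bin/sh
# Sketch: count "I/O error, dev NAME" kernel messages per device.
# io_errors_by_device is a hypothetical helper introduced for illustration.
io_errors_by_device() {
    # Extract the device name from lines such as
    # "end_request: I/O error, dev sr0, sector 0".
    sed -n 's/.*I\/O error, dev \([^, ]*\),.*/\1/p' "$1" |
        sort | uniq -c |
        awk '{ print $2 " " $1 }'          # print "device count"
}

# Example use (SYSLOG is a placeholder for the system log path):
# io_errors_by_device "$SYSLOG"
```

Grouping the errors by device makes it easier to check whether the failing device is the one that actually holds the datafiles.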


ORA-00600 [keltnfy-ldmInit] [46] [1] puzzle recurs, along with ORA-01242
