EXT3-fs error ( device sd(8,5)) in start_transaction:Journal has aborted

Posted by tolywang on 2009-01-14

Red Hat Linux AS 3.0 U6, DELL 2950

Oracle RAC 9.2.0.7, OCFS, two-node cluster

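Around 11 o'clock today I found that /var/log/messages and the Oracle alert log on NODE02 could not be accessed: both were in read-only mode. Node 1 had no problem. I then tried other commands and found that some of them also failed with errors. My first guess was an OS problem. I hurried to the server room to check the console for errors and to see whether a disk had failed. The disks looked normal, and the server console showed the following errors: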

EXT3-fs error ( device sd(8,5)) in start_transaction:Journal has aborted
EXT3-fs error ( device sd(8,5)) in start_transaction:Journal has aborted
EXT3-fs error ( device sd(8,8)) in start_transaction:Journal has aborted
EXT3-fs error ( device sd(8,8)) in start_transaction:Journal has aborted
EXT3-fs error ( device sd(8,8)) in start_transaction:Journal has aborted
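
The `sd(8,5)` and `sd(8,8)` in these messages are decimal (major, minor) device numbers. As a minimal sketch, assuming the conventional Linux numbering in which major 8 is the first SCSI disk major and every disk owns 16 consecutive minors, the pairs can be turned into device names as follows. The function names `scsi_device_name` and `affected_devices` are hypothetical helpers made up for this illustration.

```python
import re

# Assumption: major 8 is the first SCSI disk major, with 16 minors per disk
# (minor 0 of each group is the whole disk, 1-15 are its partitions).
SCSI_DISK_MAJOR = 8
MINORS_PER_DISK = 16

# Matches console lines such as:
#   EXT3-fs error ( device sd(8,5)) in start_transaction:Journal has aborted
EXT3_ERROR = re.compile(r"EXT3-fs error \(\s*device sd\((\d+),(\d+)\)\)")


def scsi_device_name(major, minor):
    """Translate a decimal (major, minor) pair into a name like 'sda5'."""
    if major != SCSI_DISK_MAJOR or not 0 <= minor <= 255:
        raise ValueError(f"not a first-major SCSI disk device: ({major},{minor})")
    disk, partition = divmod(minor, MINORS_PER_DISK)
    name = "sd" + chr(ord("a") + disk)
    return name + str(partition) if partition else name


def affected_devices(console_text):
    """Return the distinct devices named in EXT3-fs error lines, in first-seen order."""
    seen = []
    for match in EXT3_ERROR.finditer(console_text):
        name = scsi_device_name(int(match.group(1)), int(match.group(2)))
        if name not in seen:
            seen.append(name)
    return seen


console = """\
EXT3-fs error ( device sd(8,5)) in start_transaction:Journal has aborted
EXT3-fs error ( device sd(8,5)) in start_transaction:Journal has aborted
EXT3-fs error ( device sd(8,8)) in start_transaction:Journal has aborted
"""

print(affected_devices(console))  # ['sda5', 'sda8']
```

Under that assumption, both aborted journals sit on partitions of the first disk (sda5 and sda8).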

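I wanted to read the log messages and the Linux dmesg output on node 2, but neither could be opened, and ftp did not work either. With no other option, I copied messages and dmesg from /var/log on node 2 straight to the shared disk, then copied them from the healthy node 1 to my office PC and sent them to the DELL service engineer. Part of the dmesg output follows: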

Device 08:40 not ready.
 I/O error: dev 08:40, sector 0
Device 08:80 not ready.
 I/O error: dev 08:80, sector 0
Device 08:60 not ready.
 I/O error: dev 08:60, sector 0
Device 08:80 not ready.
 I/O error: dev 08:80, sector 0
Device 08:60 not ready.
 I/O error: dev 08:60, sector 0
Device 08:a0 not ready.
 I/O error: dev 08:a0, sector 0
Device 08:a0 not ready.
 I/O error: dev 08:a0, sector 0
Device 08:80 not ready.
 I/O error: dev 08:80, sector 0
Device 08:80 not ready.
 I/O error: dev 08:80, sector 0
Device 08:c0 not ready.
 I/O error: dev 08:c0, sector 0
Device 08:a0 not ready.
 I/O error: dev 08:a0, sector 0
Device 08:c0 not ready.
 I/O error: dev 08:c0, sector 0
Device 08:a0 not ready.
 I/O error: dev 08:a0, sector 0
Device 08:c0 not ready.
 I/O error: dev 08:c0, sector 0
Device 08:c0 not ready.
 I/O error: dev 08:c0, sector 0
Attached scsi generic sg0 at scsi0, channel 0, id 8, lun 0,  type 13
SCSI device sdb: 555745280 512-byte hdwr sectors (284542 MB)
 sdb: sdb1
SCSI device sdd: 524288000 512-byte hdwr sectors (268435 MB)
 sdd: sdd1
SCSI device sdb: 555745280 512-byte hdwr sectors (284542 MB)
 sdb: sdb1
SCSI device sdf: 545259520 512-byte hdwr sectors (279173 MB)
 sdf: sdf1 sdf2 sdf3 sdf4
SCSI device sdd: 524288000 512-byte hdwr sectors (268435 MB)
 sdd: sdd1
SCSI device sdh: 555745280 512-byte hdwr sectors (284542 MB)
 sdh: sdh1
SCSI device sdf: 545259520 512-byte hdwr sectors (279173 MB)
 sdf: sdf1 sdf2 sdf3 sdf4
SCSI device sdj: 524288000 512-byte hdwr sectors (268435 MB)
 sdj: sdj1
SCSI device sdh: 555745280 512-byte hdwr sectors (284542 MB)
 sdh: sdh1
SCSI device sdl: 545259520 512-byte hdwr sectors (279173 MB)
 sdl: sdl1 sdl2 sdl3 sdl4
SCSI device sdj: 524288000 512-byte hdwr sectors (268435 MB)
 sdj: sdj1
kjournald starting.  Commit interval 5 seconds
EXT3 FS 2.4-0.9.19, 19 August 2002 on sd(8,3), internal journal
EXT3-fs: mounted filesystem with ordered data mode.
kjournald starting.  Commit interval 5 seconds
EXT3 FS 2.4-0.9.19, 19 August 2002 on sd(8,7), internal journal
EXT3-fs: mounted filesystem with ordered data mode.
kjournald starting.  Commit interval 5 seconds
EXT3 FS 2.4-0.9.19, 19 August 2002 on sd(8,5), internal journal
EXT3-fs: mounted filesystem with ordered data mode.
kjournald starting.  Commit interval 5 seconds
EXT3 FS 2.4-0.9.19, 19 August 2002 on sd(8,6), internal journal
EXT3-fs: mounted filesystem with ordered data mode.
kjournald starting.  Commit interval 5 seconds
EXT3 FS 2.4-0.9.19, 19 August 2002 on sd(8,2), internal journal
EXT3-fs: mounted filesystem with ordered data mode.
ide-floppy driver 0.99.newide
hda: attached ide-cdrom driver.
hda: ATAPI 24X DVD-ROM drive, 2048kB Cache, UDMA(33)
Uniform CD-ROM driver Revision: 3.12
hda: DMA disabled
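
The `dev 08:40` style entries in the dmesg output give the device as hexadecimal major:minor. The sketch below tallies the I/O errors per device and decodes the numbers, again assuming that major 8 is the first SCSI disk major with 16 minors per disk. `decode_dev` and `io_error_counts` are hypothetical names, and the sample text is only a short subset of the excerpt.

```python
import re
from collections import Counter

# Matches dmesg lines such as: " I/O error: dev 08:40, sector 0"
IO_ERROR = re.compile(r"I/O error: dev ([0-9a-f]{2}):([0-9a-f]{2}), sector \d+")


def decode_dev(major_hex, minor_hex):
    """Decode a hex major:minor pair; fall back to 'major:minor' if not major 8."""
    major, minor = int(major_hex, 16), int(minor_hex, 16)
    if major != 8:  # assumption: only the first SCSI disk major is decoded
        return f"{major}:{minor}"
    disk, partition = divmod(minor, 16)
    name = "sd" + chr(ord("a") + disk)
    return name + str(partition) if partition else name


def io_error_counts(dmesg_text):
    """Count I/O error lines per decoded device name."""
    counts = Counter()
    for match in IO_ERROR.finditer(dmesg_text):
        counts[decode_dev(match.group(1), match.group(2))] += 1
    return counts


sample = """\
Device 08:40 not ready.
 I/O error: dev 08:40, sector 0
Device 08:80 not ready.
 I/O error: dev 08:80, sector 0
Device 08:80 not ready.
 I/O error: dev 08:80, sector 0
Device 08:a0 not ready.
 I/O error: dev 08:a0, sector 0
"""

for device, count in sorted(io_error_counts(sample).items()):
    print(device, count)
# sde 1
# sdi 2
# sdk 1
```

Decoded this way, the five codes in the excerpt (08:40, 08:60, 08:80, 08:a0, 08:c0) correspond to the whole disks sde, sdg, sdi, sdk and sdm. None of them is sda, the disk holding the EXT3 partitions that reported the journal abort, and none appears among the "SCSI device" lines listed in the same excerpt (sdb, sdd, sdf, sdh, sdj, sdl).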

 

/var/log/messages showed nothing useful: it contained only entries generated by the monitoring system.

 

Jan 13 19:44:46 DELL-RAC02 telnetd[10671]: ttloop: peer died: EOF
Jan 13 19:49:45 DELL-RAC02 telnetd[10889]: ttloop: peer died: EOF
Jan 13 19:54:45 DELL-RAC02 telnetd[11127]: ttloop: peer died: EOF
Jan 13 19:59:45 DELL-RAC02 telnetd[11379]: ttloop: peer died: EOF
Jan 13 20:04:46 DELL-RAC02 telnetd[11599]: ttloop: peer died: EOF
Jan 13 20:09:45 DELL-RAC02 telnetd[11765]: ttloop: peer died: EOF
Jan 13 20:14:45 DELL-RAC02 telnetd[11871]: ttloop: peer died: EOF
Jan 13 20:19:45 DELL-RAC02 telnetd[12028]: ttloop: peer died: EOF
Jan 13 20:24:45 DELL-RAC02 telnetd[12140]: ttloop: peer died: EOF
Jan 13 20:29:45 DELL-RAC02 telnetd[12308]: ttloop: peer died: EOF
Jan 13 20:34:46 DELL-RAC02 telnetd[12484]: ttloop: peer died: EOF
Jan 13 20:39:45 DELL-RAC02 telnetd[12678]: ttloop: peer died: EOF
Jan 13 20:44:45 DELL-RAC02 telnetd[12808]: ttloop: peer died: EOF
Jan 13 20:49:45 DELL-RAC02 telnetd[12975]: ttloop: peer died: EOF
Jan 13 20:54:45 DELL-RAC02 telnetd[13131]: ttloop: peer died: EOF
Jan 13 20:59:45 DELL-RAC02 telnetd[13301]: ttloop: peer died: EOF
Jan 13 21:04:47 DELL-RAC02 telnetd[13439]: ttloop: peer died: EOF
Jan 13 21:09:46 DELL-RAC02 telnetd[13650]: ttloop: peer died: EOF
Jan 13 21:14:46 DELL-RAC02 telnetd[13890]: ttloop: peer died: EOF
Jan 13 21:19:46 DELL-RAC02 telnetd[14099]: ttloop: peer died: EOF
Jan 13 21:24:45 DELL-RAC02 telnetd[14213]: ttloop: peer died: EOF
Jan 13 21:29:45 DELL-RAC02 telnetd[14347]: ttloop: peer died: EOF
Jan 13 21:34:46 DELL-RAC02 telnetd[14593]: ttloop: peer died: EOF
Jan 13 21:39:46 DELL-RAC02 telnetd[14885]: ttloop: peer died: EOF
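
One way to check that these telnetd entries really are periodic monitoring probes is to compute the gaps between consecutive timestamps. This is a minimal sketch: `entry_intervals` is a hypothetical helper, and because syslog lines carry no year, 2009 is assumed from the date of this post.

```python
from datetime import datetime

MONTHS = {name: number for number, name in enumerate(
    ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
     "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"], start=1)}


def entry_intervals(lines, year=2009):
    """Return the gaps in seconds between consecutive telnetd log entries.

    The year is an assumption, since the syslog timestamp omits it.
    """
    times = []
    for line in lines:
        if "telnetd" not in line:
            continue
        month, day, clock = line.split()[:3]  # e.g. "Jan", "13", "19:44:46"
        hour, minute, second = (int(part) for part in clock.split(":"))
        times.append(datetime(year, MONTHS[month], int(day), hour, minute, second))
    return [int((later - earlier).total_seconds())
            for earlier, later in zip(times, times[1:])]


log_lines = [
    "Jan 13 19:44:46 DELL-RAC02 telnetd[10671]: ttloop: peer died: EOF",
    "Jan 13 19:49:45 DELL-RAC02 telnetd[10889]: ttloop: peer died: EOF",
    "Jan 13 19:54:45 DELL-RAC02 telnetd[11127]: ttloop: peer died: EOF",
    "Jan 13 19:59:45 DELL-RAC02 telnetd[11379]: ttloop: peer died: EOF",
]

print(entry_intervals(log_lines))  # [299, 300, 300]
```

Gaps of roughly 300 seconds fit a probe that opens and drops a telnet connection every five minutes.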

 

---------------------------------------------------------------------------- 

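A hard disk failure can basically be ruled out, because the server's front panel reported no errors. My preliminary conclusion is that the file system hit an error, and that the cause lies on the Linux side.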

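I searched Baidu and Google, and in most reports the system was fine again after a reboot, so I did not expect a serious problem. Still, to confirm this and to settle the question of responsibility, I asked the DELL engineer, whose advice was also to reboot the system. I emailed them as well, asking them to look into the root cause. They replied that the 2.6 Linux kernels have some file system bugs, but that no file system bugs are currently known for 2.4. They also asked me to run EMCGrab (a script they sent) and send them the resulting logs. The root cause is still under investigation.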

 

From "ITPUB Blog", link: http://blog.itpub.net/35489/viewspace-539597/. If you repost this article, please credit the source; otherwise legal action will be pursued.
