System Tablespace I/O Error: A Data Corruption Case

Posted by wmlm on 2008-05-09

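A colleague recently ran into a database problem: the system tablespace was reported to have a corrupt block, and the alert log kept filling up with the following errors: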

[oracle@gdmstest bdump]$ tail -20 alert_mydb.log
Linux Error: 4: Interrupted system call
Additional information: 23710
Wed Oct 25 16:47:44 2006
Errors in file /opt/oracle/admin/mydb/bdump/mydb_smon_19646.trc:
ORA-00604: error occurred at recursive SQL level 1
ORA-01115: IO error reading block from file 1 (block # 23712)
ORA-01110: data file 1: '/opt/oracle/oradata/mydb/system01.dbf'
ORA-27091: skgfqio: unable to queue I/O
ORA-27072: skgfdisp: I/O error
Linux Error: 4: Interrupted system call
Additional information: 23710
Wed Oct 25 16:47:59 2006
Errors in file /opt/oracle/admin/mydb/bdump/mydb_smon_19646.trc:
ORA-00604: error occurred at recursive SQL level 1
ORA-01115: IO error reading block from file 1 (block # 23712)
ORA-01110: data file 1: '/opt/oracle/oradata/mydb/system01.dbf'
ORA-27091: skgfqio: unable to queue I/O
ORA-27072: skgfdisp: I/O error
Linux Error: 4: Interrupted system call
Additional information: 23710

Yet a check with dbv reported no corrupt data blocks:

[oracle@gdmstest mydb]$ dbv file=system01.dbf blocksize=8192

DBVERIFY: Release 9.2.0.4.0 - Production on Thu Oct 26 11:36:42 2006

Copyright (c) 1982, 2002, Oracle Corporation. All rights reserved.

DBVERIFY - Verification starting : FILE = system01.dbf


DBVERIFY - Verification complete

Total Pages Examined : 23709
Total Pages Processed (Data) : 13000
Total Pages Failing (Data) : 0
Total Pages Processed (Index): 2090
Total Pages Failing (Index): 0
Total Pages Processed (Other): 1377
Total Pages Processed (Seg) : 0
Total Pages Failing (Seg) : 0
Total Pages Empty : 7242
Total Pages Marked Corrupt : 0
Total Pages Influx : 0

Let's work through the problem. Start with the error log: it does not actually describe a corrupt data block.
ORA-01115: IO error reading block from file 1 (block # 23712)

This is an I/O error: the data block could not be read at all.
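To see which blocks the alert log is complaining about, the ORA-01115 lines can be pulled out with a short script. The following is only a minimal sketch: the function name `failing_blocks` and the sample text are illustrative assumptions, and the pattern is written against the exact message format shown in the excerpt above.

```python
import re

# Matches the ORA-01115 message exactly as it appears in the alert log excerpt.
ORA_01115 = re.compile(
    r"ORA-01115: IO error reading block from file (\d+) \(block # (\d+)\)"
)


def failing_blocks(log_text):
    """Return the distinct (file_no, block_no) pairs named by ORA-01115, sorted."""
    return sorted({(int(f), int(b)) for f, b in ORA_01115.findall(log_text)})


# Hypothetical sample built from lines in the alert log above.
sample = """\
ORA-00604: error occurred at recursive SQL level 1
ORA-01115: IO error reading block from file 1 (block # 23712)
ORA-01110: data file 1: '/opt/oracle/oradata/mydb/system01.dbf'
ORA-01115: IO error reading block from file 1 (block # 23712)
"""

print(failing_blocks(sample))
```

If every occurrence names the same file and block, as here, that points to one unreadable location rather than scattered logical corruption.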

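The dbv output, in turn, only says that 23709 data blocks were examined and that those blocks are fine. The block actually reported in the error is block 23712. In other words, dbv got as far as the neighborhood of that block, could not read any further, and exited.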

The system tablespace file, however, is far larger than 23709 * 8 KB / 1024 = 185 MB.
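The arithmetic can be checked in a few lines. This is a small sketch using only numbers that appear in this post: the 8192-byte block size passed to dbv, the "Total Pages Examined" count, and the size of system01.dbf from the directory listing further down. The variable names are illustrative.

```python
BLOCK_SIZE = 8192        # blocksize passed to dbv
PAGES_EXAMINED = 23709   # "Total Pages Examined" reported by dbv
FILE_SIZE = 524296192    # size of system01.dbf in bytes, from the ls listing

# How much of the file dbv actually verified, in MiB.
verified_mib = PAGES_EXAMINED * BLOCK_SIZE / (1024 * 1024)

# How many 8 KB blocks the file holds in total.
file_blocks = FILE_SIZE // BLOCK_SIZE

print(f"verified by dbv: {verified_mib:.1f} MiB ({PAGES_EXAMINED} blocks)")
print(f"file size:       {FILE_SIZE / (1024 * 1024):.1f} MiB ({file_blocks} blocks)")
print(f"fraction checked: {PAGES_EXAMINED / file_blocks:.1%}")
```

So dbv reported "Verification complete" after covering well under half of the file, which is why a clean dbv summary alone should not be taken as proof that the whole file is readable.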

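Checking the system log at this point, dmesg shows a large number of I/O errors, all against the same disk sector. In other words, the hardware has failed: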

[maintain@gdmstest bdump]$ dmesg
: error=0x40 { UncorrectableError }, LBAsect=58847319, high=3, low=8515671, sector=14266880
end_request: I/O error, dev 03:06 (hda), sector 14266880
hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
hda: dma_intr: error=0x40 { UncorrectableError }, LBAsect=58847319, high=3, low=8515671, sector=14266880
end_request: I/O error, dev 03:06 (hda), sector 14266880
hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
hda: dma_intr: error=0x40 { UncorrectableError }, LBAsect=58847319, high=3, low=8515671, sector=14266880
end_request: I/O error, dev 03:06 (hda), sector 14266880
hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
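Counting the `end_request` lines per device and sector makes it obvious when the kernel keeps failing on the same location. The sketch below assumes the message format shown in the dmesg output above; the function name `bad_sectors` and the sample string are illustrative, and other kernel versions may format these lines differently.

```python
import re
from collections import Counter

# Matches lines such as:
#   end_request: I/O error, dev 03:06 (hda), sector 14266880
END_REQUEST = re.compile(
    r"end_request: I/O error, dev (\S+) \((\w+)\), sector (\d+)"
)


def bad_sectors(dmesg_text):
    """Count I/O errors per (device_name, sector) pair in dmesg output."""
    counts = Counter()
    for _devno, name, sector in END_REQUEST.findall(dmesg_text):
        counts[(name, int(sector))] += 1
    return counts


# Hypothetical sample built from lines in the dmesg output above.
sample_dmesg = """\
end_request: I/O error, dev 03:06 (hda), sector 14266880
hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
end_request: I/O error, dev 03:06 (hda), sector 14266880
end_request: I/O error, dev 03:06 (hda), sector 14266880
"""

for (device, sector), n in bad_sectors(sample_dmesg).most_common():
    print(f"{device} sector {sector}: {n} I/O errors")
```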

That pins down the problem.

If we try to cp the system tablespace file, we hit the same hardware error:

[oracle@gdmstest mydb]$ cp system01.dbf system01.dbf.bk
cp: reading `system01.dbf': Input/output error
[oracle@gdmstest mydb]$ ll
total 2173060
....
-rw-r----- 1 oracle dba 524296192 Oct 25 16:49 system01.dbf
-rw-r----- 1 oracle dba 194236416 Oct 25 17:00 system01.dbf.bk
...............

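Only 194236416 bytes could be copied, that is 194236416 / 8192 = 23710.5 blocks. The copy, too, stopped at roughly block 23709, the same place where dbv gave up. A hardware fault like this has to be dealt with by other means at the system level.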
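The same probe that cp performed by accident can be done deliberately: read the file sequentially and report how many bytes come back before the operating system raises an I/O error. The following is a minimal sketch, not a recovery tool. `probe_readable_prefix` and `FailingReader` are hypothetical names, and `FailingReader` only simulates a bad disk so the example can run anywhere.

```python
import errno
import io


def probe_readable_prefix(fileobj, block_size=8192):
    """Read fileobj sequentially in block_size chunks.

    Returns (bytes_read, error). error is None if end of file was reached
    cleanly, otherwise the OSError raised by the failing read.
    """
    bytes_read = 0
    while True:
        try:
            chunk = fileobj.read(block_size)
        except OSError as exc:
            return bytes_read, exc
        if not chunk:
            return bytes_read, None
        bytes_read += len(chunk)


class FailingReader:
    """Simulated file whose reads fail once they would pass fail_at bytes."""

    def __init__(self, data, fail_at):
        self._buf = io.BytesIO(data)
        self._fail_at = fail_at

    def read(self, n):
        if self._buf.tell() + n > self._fail_at:
            raise OSError(errno.EIO, "Input/output error")
        return self._buf.read(n)


# Simulate a 5-block file whose 4th block is unreadable.
demo = FailingReader(b"\0" * (5 * 8192), fail_at=3 * 8192)
n, err = probe_readable_prefix(demo)
print(f"read {n} bytes = {n / 8192} blocks, error: {err}")
```

On a real file this would be called as `probe_readable_prefix(open(path, "rb", buffering=0))`. The byte count should be read as approximate, since the kernel may read ahead in units other than the database block size, which is consistent with cp stopping at a half block (23710.5) here.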

 



Source: 王旺的書房 | System Tablespace I/O Error: A Data Corruption Case


Link URL: http://wworacle.blog.163.com/blog/static/21268725200849114220744
