ORA-19599 When Backing up an Archivelog that is Corrupt

Posted by 潇湘隐者 on 2024-04-19

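A few days ago I ran into a backup failure: during an RMAN backup, one of the archived logs turned out to be corrupt. It was the first time I had seen this, so I am recording the details of the case here.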

The backup job failed. The RMAN backup output log showed that one archived log file was corrupt:

RMAN-08137: warning: archived log not deleted, needed for standby or upstream capture process
RMAN-08515: archived log file name=/eapdblog/eap_1_666_1155313416.arc thread=1 sequence=666
RMAN-08137: warning: archived log not deleted, needed for standby or upstream capture process
RMAN-08515: archived log file name=/eapdblog/eap_1_667_1155313416.arc thread=1 sequence=667
RMAN-03009: failure of backup command on dev_0 channel at 04/09/2024 09:44:50
ORA-27192: skgfcls: sbtclose2 returned error - failed to close file
ORA-19511: non RMAN, but media manager or vendor specific failure, error text:
Vendor specific error: OB2_EndObjectBackup() failed ERR(-2)
ORA-19599: block number 316064 is corrupt in archived log /eapdblog/eap_1_660_1155313416.arc

Validating the archived logs confirmed that eap_1_660_1155313416.arc is indeed corrupt:

RMAN> validate archivelog all;

Starting validate at 09-APR-24
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=261 device type=DISK
channel ORA_DISK_1: starting validation of archived log
channel ORA_DISK_1: specifying archived log(s) for validation
input archived log thread=1 sequence=660 RECID=645 STAMP=1165788069
input archived log thread=1 sequence=663 RECID=648 STAMP=1165824445
input archived log thread=1 sequence=664 RECID=649 STAMP=1165828881
input archived log thread=1 sequence=665 RECID=650 STAMP=1165829178
input archived log thread=1 sequence=666 RECID=651 STAMP=1165829976
input archived log thread=1 sequence=667 RECID=652 STAMP=1165830268
channel ORA_DISK_1: validation complete, elapsed time: 00:00:01
List of Archived Logs
=====================
Thrd Seq     Status Blocks Failing Blocks Examined Name
---- ------- ------ -------------- --------------- ---------------
1    660     FAILED 8              346599          /eapdblog/eap_1_660_1155313416.arc
1    663     OK     0              382900          /eapdblog/eap_1_663_1155313416.arc
1    664     OK     0              94593           /eapdblog/eap_1_664_1155313416.arc
1    665     OK     0              1748            /eapdblog/eap_1_665_1155313416.arc
1    666     OK     0              17557           /eapdblog/eap_1_666_1155313416.arc
1    667     OK     0              4226            /eapdblog/eap_1_667_1155313416.arc
validate found one or more corrupt blocks
See trace file /eapdb/diag/rdbms/eap/eap/trace/eap_ora_917867.trc for details
Finished validate at 09-APR-24

RMAN> exit
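For routine checks, the "List of Archived Logs" table in the VALIDATE report can be scanned automatically for failing logs. The following is a minimal Python sketch, assuming the report text has already been captured into a string (for example from an RMAN log file). The names `ArchLogResult` and `failed_archivelogs` are hypothetical, and the parsing assumes the whitespace-separated column layout shown above, a status of OK or FAILED, and file names without spaces.

```python
import re
from typing import NamedTuple


class ArchLogResult(NamedTuple):
    thread: int
    sequence: int
    status: str
    failing_blocks: int
    examined_blocks: int
    name: str


# Assumed layout of one data row of the "List of Archived Logs" table:
# Thrd  Seq  Status  BlocksFailing  BlocksExamined  Name
ROW = re.compile(r"^\s*(\d+)\s+(\d+)\s+(OK|FAILED)\s+(\d+)\s+(\d+)\s+(\S+)\s*$")


def failed_archivelogs(report: str) -> list[ArchLogResult]:
    """Return the table rows of an RMAN VALIDATE report whose status is FAILED."""
    failed = []
    for line in report.splitlines():
        m = ROW.match(line)  # header, separator and "input archived log" lines do not match
        if m and m.group(3) == "FAILED":
            thread, seq, status, failing, examined, name = m.groups()
            failed.append(ArchLogResult(int(thread), int(seq), status,
                                        int(failing), int(examined), name))
    return failed


sample = """\
Thrd Seq     Status Blocks Failing Blocks Examined Name
---- ------- ------ -------------- --------------- ---------------
1    660     FAILED 8              346599          /eapdblog/eap_1_660_1155313416.arc
1    663     OK     0              382900          /eapdblog/eap_1_663_1155313416.arc
"""

for r in failed_archivelogs(sample):
    print(f"thread {r.thread} seq {r.sequence}: {r.failing_blocks} failing blocks in {r.name}")
```

With the sample above this reports only sequence 660. Matching on the row shape rather than on fixed column offsets keeps the sketch tolerant of the column spacing RMAN happens to emit.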

The alert log contains the following entries as well:

2024-04-08T23:15:05.730996+08:00

***
Corrupt block seq: 660 blocknum=316064.
Bad header found during backing up archived log
Data in bad block - flag:1. format:34. bno:93696. seq:649
beg:16 cks:21324
calculated check value: 21324

Reread of seq=660, blocknum=316064, file=/eapdblog/eap_1_660_1155313416.arc, found same corrupt data
Reread of seq=660, blocknum=316064, file=/eapdblog/eap_1_660_1155313416.arc, found same corrupt data
Reread of seq=660, blocknum=316064, file=/eapdblog/eap_1_660_1155313416.arc, found same corrupt data
Reread of seq=660, blocknum=316064, file=/eapdblog/eap_1_660_1155313416.arc, found same corrupt data
Reread of seq=660, blocknum=316064, file=/eapdblog/eap_1_660_1155313416.arc, found same corrupt data
2024-04-08T23:15:21.671470+08:00

***
Corrupt block seq: 660 blocknum=316064.
Bad header found during backing up archived log
Data in bad block - flag:1. format:34. bno:93696. seq:649
beg:16 cks:21324
calculated check value: 21324

Reread of seq=660, blocknum=316064, file=/eapdblog/eap_1_660_1155313416.arc, found same corrupt data
Reread of seq=660, blocknum=316064, file=/eapdblog/eap_1_660_1155313416.arc, found same corrupt data
Reread of seq=660, blocknum=316064, file=/eapdblog/eap_1_660_1155313416.arc, found same corrupt data
Reread of seq=660, blocknum=316064, file=/eapdblog/eap_1_660_1155313416.arc, found same corrupt data
Reread of seq=660, blocknum=316064, file=/eapdblog/eap_1_660_1155313416.arc, found same corrupt data
2024-04-08T23:15:36.695623+08:00
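The alert-log entries show what RMAN found: it expected sequence 660, block 316064, but the header fields printed for the bad block ("bno:93696. seq:649") name a different block number and sequence. The following is a minimal Python sketch that extracts both pairs from alert-log text and flags such mismatches. The names `CorruptBlock` and `parse_corrupt_blocks` are hypothetical, and the regular expressions are fitted to the lines above rather than to any documented alert-log format.

```python
import re
from typing import NamedTuple


class CorruptBlock(NamedTuple):
    expected_seq: int
    expected_block: int
    header_seq: int
    header_block: int

    @property
    def mismatched(self) -> bool:
        # True when the header inside the block names a different sequence/block
        return (self.header_seq, self.header_block) != (self.expected_seq, self.expected_block)


# Patterns fitted to the alert-log lines quoted in this post (an assumption)
EXPECTED = re.compile(r"Corrupt block seq: (\d+) blocknum=(\d+)\.")
FOUND = re.compile(r"Data in bad block - .*?bno:(\d+)\. seq:(\d+)")


def parse_corrupt_blocks(alert_text: str) -> list[CorruptBlock]:
    """Pair each 'Corrupt block' line with the next 'Data in bad block' line."""
    blocks = []
    pending = None  # (seq, blocknum) from the last 'Corrupt block' line
    for line in alert_text.splitlines():
        if m := EXPECTED.search(line):
            pending = (int(m.group(1)), int(m.group(2)))
        elif (m := FOUND.search(line)) and pending:
            bno, seq = int(m.group(1)), int(m.group(2))
            blocks.append(CorruptBlock(pending[0], pending[1], seq, bno))
            pending = None
    return blocks


sample = """\
Corrupt block seq: 660 blocknum=316064.
Bad header found during backing up archived log
Data in bad block - flag:1. format:34. bno:93696. seq:649
beg:16 cks:21324
"""

for b in parse_corrupt_blocks(sample):
    status = "mismatch" if b.mismatched else "ok"
    print(b.expected_seq, b.expected_block, "->", b.header_seq, b.header_block, status)
```

Run against the full excerpt above, the sketch would return one entry per backup attempt (two here), both for the same block.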

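Although I knew the archived log was corrupt, I did not know what had caused it. I had previously seen a case shared by others, ORA-1578 ORA-353 ORA-19599 Corrupt blocks with zeros when filesystemio_options=SETALL on ext4 file system using Linux (Doc ID 1487957.1), but the current environment, shown below, is completely different from the one in Doc ID 1487957.1: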

Operating system: Red Hat Enterprise Linux release 8.8 (Ootpa)

Database version: Oracle 19c Enterprise Edition 19.20.0.0.0

File system: xfs

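I opened a Service Request and uploaded the relevant logs together with a dump of the corrupt archived log. Oracle Support finally replied that the behavior closely matches two unpublished bugs (see screenshot). The problem occurs very rarely; this was the first time I had seen this error. Support's recommendation was to keep monitoring if it happens only occasionally, and, if it happens frequently, to work around it by changing filesystemio_options to directio.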

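The dump of the corrupt archived log was generated as follows:

sqlplus / as sysdba
-- Attach oradebug to the current session and print its trace file name
oradebug setmypid
oradebug tracefile_name
-- Validate the archived log; the results are written to that trace file
alter system dump logfile '/eapdblog/eap_1_660_1155313416.arc' VALIDATE;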

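I handled it as shown below, then took a new full RMAN backup (once sequence 660 is gone, earlier backups can no longer be rolled forward past it) and kept watching for several days. So far the error has not come back.

Manually delete the corrupt archived log file, then let CROSSCHECK mark the missing file as EXPIRED and remove its record:
RMAN> crosscheck archivelog all;
RMAN> DELETE EXPIRED ARCHIVELOG SEQUENCE 660;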
