Database version: 11.2.0.3.15
Operating system: CentOS 6.5
File system: ext4

Recently I found that all 3 standby databases of one of our production databases were reporting the same errors, shown below:
INCIDENT_ID  PROBLEM_KEY                                                  CREATE_TIME
-----------  -----------------------------------------------------------  ---------------------------------
352249       ORA 445                                                      2016-04-07 22:18:02.350000 +08:00
640297       ORA 600 [3020]                                               2016-11-08 20:09:44.612000 +08:00
640217       ORA 600 [ORA-00600: internal error code, arguments: [3020],  2016-11-08 20:10:19.945000 +08:00
640298       ORA 600 [3020]                                               2016-11-14 11:40:52.532000 +08:00
640177       ORA 600 [ORA-00600: internal error code, arguments: [3020],  2016-11-14 11:41:32.540000 +08:00

Dump continued from file: /home/oracle/database/diag/rdbms/yjfhisd/yjfhis/trace/yjfhis_mrp0_27052.trc
ORA-00600: internal error code, arguments: [3020], [3], [33188], [12616100], [], [], [], [], [], [], [], []
ORA-10567: Redo is inconsistent with data block (file# 3, block# 33188, file offset is 271876096 bytes)
ORA-10564: tablespace UNDOTBS1
ORA-01110: da
========= Dump for incident 640177 (ORA 600 [ORA-00600: internal error code, arguments: [3020], [3], [33188], [12616100], [], [], [], [], [], [], [], []

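When I started researching this error, it looked like it might be caused by a bug, with rebuilding the standby as the only remedy. But on reflection, why did only this one database's standbys run into the problem?
Then I remembered that disk I/O on these machines was poor, so I had enabled asynchronous I/O to speed up Oracle's I/O. (At the time I thought of it as pushing I/O through the kernel cache; in fact SETALL also enables direct I/O, which bypasses that cache.)
That is:
filesystemio_options=SETALL
The intention was good, but searching MOS and posts by other users turned up the same problem.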
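MOS:
ORA-1578 ORA-353 ORA-19599 Corrupt blocks with zeros when filesystemio_options=SETALL on ext4 file system using Linux (Doc ID 1487957.1)
The note points out that it is not only the standby: all kinds of files on the primary can be affected as well, which makes this a much bigger problem. The only option was to change the parameter and restart the database.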

I also noticed this passage:
Database files on ext4 File System on Linux and Database parameter filesystemio_options is set to SETALL.
This is a Linux defect when using O_SYNC|O_DIRECT on ext4 file systems
(filesystemio_options=SETALL open the database files using O_SYNC|O_DIRECT).

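In other words, with SETALL Oracle passes both O_SYNC and O_DIRECT as flags when it calls the Linux open(2) function. The flags are chosen by the caller, not by Linux.
This probably also explains why the parameter is static:
open() returns a file descriptor to the Oracle process, and the I/O mode is tied to that descriptor. The process keeps reading and writing through it
until it terminates, so the mode cannot simply be switched while the instance is running.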
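
To make the point about open(2) flags concrete, here is a minimal Python sketch. It is not Oracle's code; it only shows that the caller combines the flags into one bitmask before calling open. The helper name describe_open_flags is made up for illustration, and O_DIRECT is looked up with getattr because it only exists on platforms that support it (Linux does).

```python
import os

def describe_open_flags(flags):
    """Return which of O_SYNC / O_DIRECT are set in an open(2) flag word."""
    names = []
    for name in ("O_SYNC", "O_DIRECT"):
        bit = getattr(os, name, 0)  # 0 if this platform lacks the flag
        if bit and (flags & bit) == bit:
            names.append(name)
    return names

# The combination the MOS note attributes to filesystemio_options=SETALL.
# The caller builds this value; the kernel only receives it.
setall_flags = os.O_RDWR | os.O_SYNC | getattr(os, "O_DIRECT", 0)
print(describe_open_flags(setall_flags))
```

The flag word is fixed at the moment of the open call, which fits the observation that every file would have to be reopened to change the mode.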

Next, a look at O_SYNC and O_DIRECT, as described in the open(2) man page:
O_DIRECT (Since Linux 2.4.10)
Try to minimize cache effects of the I/O to and from this file. In general this will degrade performance, but it is useful in special
situations, such as when applications do their own caching. File I/O is done directly to/from user space buffers. The I/O is synchronous,
that is, at the completion of a read(2) or write(2), data is guaranteed to have been transferred. See NOTES below for further
discussion.
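Roughly: it minimizes the effect of the system cache, which here should mean the kernel buffer cache. This generally hurts performance, but it is reasonable when the application has its own cache.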

O_SYNC The file is opened for synchronous I/O. Any write(2)s on the resulting file descriptor will block the calling process until the data
has been physically written to the underlying hardware. But see NOTES below.
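
As a small illustration of the O_SYNC behaviour quoted above, the following Python sketch writes through a descriptor opened with O_SYNC. The function name sync_write and the file name demo.dat are made up for this example. O_DIRECT is deliberately left out, since it imposes buffer alignment restrictions and is not supported on every file system.

```python
import os
import tempfile

def sync_write(path, data):
    """Write data through a descriptor opened with O_SYNC; return bytes written."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC | os.O_SYNC, 0o600)
    try:
        # With O_SYNC, write(2) blocks until the data has been handed to the
        # underlying hardware, not merely copied into the kernel cache.
        return os.write(fd, data)
    finally:
        os.close(fd)

with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "demo.dat")
    written = sync_write(target, b"redo block")
    with open(target, "rb") as fh:
        content = fh.read()
```

The write takes longer than a buffered one, which is the price paid for knowing the data is on disk when the call returns.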

Now for Oracle's own description of this parameter:

FILESYSTEMIO_OPTIONS can be set to one of the following values:
ASYNCH: enable asynchronous I/O on file system files, which has no timing requirement for transmission.
DIRECTIO: enable direct I/O on file system files, which bypasses the buffer cache.
SETALL: enable both asynchronous and direct I/O on file system files.
NONE: disable both asynchronous and direct I/O on file system files.
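
The four values boil down to two independent switches. A minimal Python sketch of that reading, taken only from the descriptions above (the names IO_MODES and io_modes are made up for illustration and are not part of any Oracle API):

```python
# (asynchronous I/O, direct I/O) for each documented value
IO_MODES = {
    "ASYNCH": (True, False),
    "DIRECTIO": (False, True),
    "SETALL": (True, True),
    "NONE": (False, False),
}

def io_modes(value):
    """Return (async_io, direct_io) for a filesystemio_options value."""
    key = value.strip().upper()
    if key not in IO_MODES:
        raise ValueError("unknown filesystemio_options value: %r" % value)
    return IO_MODES[key]

async_io, direct_io = io_modes("SETALL")
print(async_io, direct_io)
```

Seen this way, SETALL is the only value that turns on both switches at once, which is the combination the MOS note warns about on ext4.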

Then, why the default on Linux is NONE:
Linux none
Oracle only supports native Linux asynchronous I/O,
which requires also using direct I/O.
Older Linux kernels do not support native asynchronous I/O.

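Older Linux kernels do not support (do not have) native asynchronous I/O. Come to think of it,
early versions of MySQL simulated asynchronous I/O themselves in the MySQL layer, and only later
did MySQL switch to Linux native asynchronous I/O.
Oracle set the default to NONE with this in mind. They really should add one more sentence: that it can trigger
the problem described in this post.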

Finally, MySQL InnoDB has a broadly similar parameter:
innodb_flush_method, which is usually set to O_DIRECT to bypass the kernel buffer cache
http://blog.itpub.net/7728585/viewspace-1980262/

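In short, on Linux 5/6 with an ext4 file system, do not set the following on either the primary or the standby:
filesystemio_options=SETALL
Also, before setting low-level parameters like this, check the MOS recommendations and look for known bugs or
serious issues. Never just assume it is fine, unless everyone else is already running it that way.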
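
As a rough pre-flight check for the combination this post warns about, here is a Python sketch that looks up the file system type for a data file path in /proc/mounts-style text. Everything here is hypothetical: the function names, the sample mount table, and the /u01/oradata path are invented for illustration, and the check encodes only this post's conclusion, not an official Oracle rule.

```python
def fs_type_for(path, mounts_text):
    """Return the file system type of the mount that contains path."""
    best_mount, best_type = "", None
    target = path.rstrip("/") + "/"
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        mount_point, fs_type = fields[1], fields[2]
        prefix = mount_point.rstrip("/") + "/"
        # Keep the longest mount point that is a prefix of the path
        if target.startswith(prefix) and len(mount_point) >= len(best_mount):
            best_mount, best_type = mount_point, fs_type
    return best_type

def is_risky(fs_type, filesystemio_options):
    """True for the ext4 + SETALL combination discussed in this post."""
    return fs_type == "ext4" and filesystemio_options.strip().upper() == "SETALL"

# Made-up mount table in /proc/mounts format
sample_mounts = """\
/dev/sda1 / ext4 rw,relatime 0 0
/dev/sdb1 /home xfs rw,relatime 0 0
/dev/sdc1 /u01/oradata ext4 rw,relatime 0 0
"""

fs = fs_type_for("/u01/oradata/system01.dbf", sample_mounts)
print(fs, is_risky(fs, "SETALL"))
```

On a real host the mount table could be read from /proc/mounts and the parameter value taken from the instance. Mount points containing spaces are escaped in that file and are not handled by this sketch.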

CentOS 6.5 ext4 filesystemio_options=SETALL: standby archivelog corruption
