Database version: Oracle 12.1.2. This is a migration between identical versions.

Operating system on the X4: RHEL 5.10.

Operating system on the X7: RHEL 7.4.

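We copied the X4 database parameter file to the X7 and modified it there. The database started to the NOMOUNT stage without problems, but when we created an SPFILE in an ASM disk group, the instance terminated abnormally. The alert log shows: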

ORACLE_BASE from environment = /u01/app/oracle

Wed Jul 10 15:13:08 2019

WARNING: unknown state for DB spfile location resource, Return Value: 3

The spfile name is ?/dbs/spfile@.ora

Wed Jul 10 15:13:15 2019

DSKM process appears to be hung. Initiating system state dump.

Wed Jul 10 15:13:15 2019

System state dump requested by (instance=1, osid=305061 (GEN0)), summary=[system state dump request (ksz_check_ds)].

System State dumped to trace file /u01/app/oracle/diag/rdbms/gncdb/gncdb1/trace/gncdb1_diag_305067_20190710151315.trc

Wed Jul 10 15:13:17 2019

Decreasing number of real time LMS from 3 to 0

Wed Jul 10 15:13:45 2019

Errors in file /u01/app/oracle/diag/rdbms/gncdb/gncdb1/trace/gncdb1_dskm_305077.trc:

ORA-56867: Cannot connect to Master Diskmon on pipe "default pipe"

ORA-27300: OS system dependent operation:connect failed with status: 2

ORA-27301: OS failure message: No such file or directory

ORA-27302: failure occurred at: skgznpcon6

Wed Jul 10 15:13:45 2019

USER (ospid: 305077): terminating the instance due to error 56867

Wed Jul 10 15:13:45 2019

System state dump requested by (instance=1, osid=305077 (DSKM)), summary=[abnormal instance termination].

System State dumped to trace file /u01/app/oracle/diag/rdbms/gncdb/gncdb1/trace/gncdb1_diag_305067_20190710151345.trc

Wed Jul 10 15:13:45 2019

Dumping diagnostic data in directory=[cdmp_20190710151345], requested by (instance=1, osid=305077 (DSKM)), summary=[abnormal instance termination].

Wed Jul 10 15:13:46 2019

Instance terminated by USER, pid = 305077

Wed Jul 10 15:15:43 2019

WARNING: unknown state for DB spfile location resource, Return Value: 3
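The log line "The spfile name is ?/dbs/spfile@.ora" uses Oracle's path shorthand, in which "?" stands for ORACLE_HOME and "@" for ORACLE_SID, so it names the spfile path $ORACLE_HOME/dbs/spfile<SID>.ora. The minimal Python sketch below expands that shorthand under stated assumptions: the ORACLE_HOME value is hypothetical, the SID gncdb1 is taken from what appear to be the instance directories in the trace file paths above, and only these two substitutions are modeled.

```python
def expand_oracle_path(path: str, oracle_home: str, oracle_sid: str) -> str:
    """Expand Oracle's shorthand: a leading '?' means ORACLE_HOME, '@' means ORACLE_SID."""
    if path.startswith("?"):
        # Substitute the SID in the remainder only, so ORACLE_HOME itself is never rewritten.
        return oracle_home.rstrip("/") + path[1:].replace("@", oracle_sid)
    return path.replace("@", oracle_sid)


if __name__ == "__main__":
    # Hypothetical ORACLE_HOME; SID gncdb1 as seen in the trace file paths of the log.
    print(expand_oracle_path("?/dbs/spfile@.ora",
                             "/u01/app/oracle/product/12.1.0/dbhome_1",
                             "gncdb1"))
    # prints: /u01/app/oracle/product/12.1.0/dbhome_1/dbs/spfilegncdb1.ora
```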

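According to the alert log, the DSKM process appeared to hang, which triggered an Oracle system state dump. (DSKM: "This process is active only if Exadata Storage is used. DSKM performs operations related to Exadata I/O fencing and Exadata cell failure handling.") The DSKM trace file reports ORA-56867, and DSKM itself (ospid 305077) then terminated the instance with that error; the DIAG process only wrote the system state dumps.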

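ORA-27300 errors are most often caused by operating system resource limits. However, checking system resources and the system messages log turned up nothing relevant, and since another database was already running on the same system, resource limits were ruled out as the cause.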
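In the error stack above, ORA-27300 reports "status: 2" and ORA-27301 gives the matching OS message "No such file or directory". On Linux, error number 2 is ENOENT: a path that was looked up does not exist, which points to a missing file rather than an exhausted resource. The minimal Python sketch below reproduces the same OS error by connecting to a Unix-domain socket path that does not exist. The socket file name is hypothetical, and the sketch only assumes that the Diskmon "default pipe" is reached through a file-system endpoint, as the tmpfiles.d fix below suggests.

```python
import errno
import os
import socket
import tempfile


def try_connect(sock_path: str) -> int:
    """Try to connect to a Unix-domain socket; return 0 on success or the OS errno."""
    s = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        s.connect(sock_path)
        return 0
    except OSError as exc:
        return exc.errno
    finally:
        s.close()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        # Hypothetical endpoint that no longer exists (e.g. removed by a tmp cleaner).
        missing = os.path.join(d, "diskmon_default_pipe")
        code = try_connect(missing)
        # On Linux prints: 2 ENOENT No such file or directory
        print(code, errno.errorcode[code], os.strerror(code))
```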

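We were able to create a directory in the ASM disk group, so we copied the parameter file into the disk group and tried to start the instance with it. The instance terminated again with the same errors as before.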

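Searching My Oracle Support (MOS) for these errors turned up a similar issue, Bug 29164963, which affects Exadata Storage Server Software 19. The collected logs showed that this Exadata runs version 19, so it falls within the affected range. We applied the changes from the corresponding MOS note: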

On the database servers, perform the following steps:

1. Add the following lines to the tmpfiles.d(5) configuration file /usr/lib/tmpfiles.d/tmp.conf:

x /tmp/.oracle*

x /var/tmp/.oracle*

x /usr/tmp/.oracle*

2. Restart the systemd-tmpfiles-clean.timer service by running the following command as the root user:

# systemctl restart systemd-tmpfiles-clean.timer

3. If the system has already been affected by one of the errors described in the Symptoms section above, then restart clusterware.

4. Review open Advanced Intrusion Detection Environment (AIDE) alerts.

The change to /usr/lib/tmpfiles.d/tmp.conf must be registered in the AIDE database so critical software alerts are not generated as a result of the change. Before updating the AIDE database, review and resolve open AIDE alerts by running the following DBMCLI command:

DBMCLI> list alerthistory where alertDescription like '.*AIDE.*' and endTime = null;

For details about AIDE, see the Security Guide for Exadata Database Machine.

5. Update the AIDE database by running the following command as the root user:

# /opt/oracle.SupportTools/exadataAIDE -update
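In tmpfiles.d(5), an "x" line tells systemd-tmpfiles to skip a path during age-based cleanup; the path may be a shell-style glob, and a matching directory's contents are skipped too. The minimal Python sketch below models step 1: it appends the three "x" entries only when they are missing, so rerunning it changes nothing, and it checks whether a given path would be protected. It approximates systemd's matching rather than reimplementing it; the function names and the sample socket path are hypothetical, and on a real system the file is edited as root as the note describes.

```python
from pathlib import PurePosixPath

# Exclusion entries from step 1 of the MOS note.
ORACLE_EXCLUDES = ["x /tmp/.oracle*", "x /var/tmp/.oracle*", "x /usr/tmp/.oracle*"]


def _normalize(line: str) -> str:
    # Collapse whitespace so "x  /tmp/.oracle*" counts as the same entry.
    return " ".join(line.split())


def add_missing_lines(conf_text: str, required: list[str]) -> str:
    """Append any required line not already present; rerunning makes no further change."""
    present = {_normalize(line) for line in conf_text.splitlines()}
    missing = [line for line in required if _normalize(line) not in present]
    if not missing:
        return conf_text
    if conf_text and not conf_text.endswith("\n"):
        conf_text += "\n"
    return conf_text + "\n".join(missing) + "\n"


def excluded_by_x(path: str, conf_text: str) -> bool:
    """Approximate check: does an 'x' glob match path or one of its parent directories?"""
    patterns = [fields[1]
                for fields in (line.split() for line in conf_text.splitlines())
                if len(fields) >= 2 and fields[0] == "x"]
    p = PurePosixPath(path)
    # 'x' also protects the contents of a matching directory, so test every ancestor.
    return any(c.match(pat) for c in (p, *p.parents) for pat in patterns)


if __name__ == "__main__":
    conf = add_missing_lines("# existing tmp.conf content\n", ORACLE_EXCLUDES)
    print(conf, end="")
    # Hypothetical socket file under one of the protected directories.
    print(excluded_by_x("/var/tmp/.oracle/example_socket", conf))  # True
    print(excluded_by_x("/var/tmp/other_file", conf))               # False
```

Matching is done per path component with PurePosixPath.match, so "*" does not cross a "/"; the ancestor loop is what extends protection to files inside a matching directory.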

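After making these changes and restarting the CRS cluster, the database that was already running on this system failed to start, with the following errors: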

ORA-00210: cannot open control file

ORA-00202: error in writing''+RECODG/utsdb/controlfile/current.256.732754521''

ORA-17503: ksfdopn: 2 Failed to open file +RECODG/utsdb/controlfile/current.256.732754521

ORA-15001: diskgroup "RECODG" does not exist or is not mounted

ORA-15055: unable to connect to ASM instance

ORA-27140: attach to post/wait facility failed

ORA-27300: OS system dependent operation:invalid_euid failed with status: 1

ORA-27301: OS failure message: Not owner

ORA-27302: failure occurred at: skgpwinit5

ORA-27303: additional information: startup euid = 100 (grid), current euid = 101 (oracle)

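These errors are usually caused by incorrect permissions on the Grid Infrastructure oracle executable. Its permissions were -rwxrwxr-x; after correcting them and restarting the CRS cluster, everything returned to normal.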

The command used was: chmod 6751 oracle
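ORA-27303 above shows the process started with the grid user's effective UID (100) but was running as oracle (101). Mode 6751 adds the setuid and setgid bits to rwxr-x--x, so a process executing the binary takes on the file owner's effective UID and group; assuming the executable is owned by grid, as the ORA-27303 text suggests, that is what aligns the UIDs. The broken -rwxrwxr-x mode (0775) has neither bit. The minimal Python sketch below decodes both modes with the standard library's stat.filemode and reports which special bits are missing; the function name is hypothetical.

```python
import stat

# Special bits that mode 6751 sets on top of the ordinary rwxr-x--x permissions.
REQUIRED_BITS = {"setuid": stat.S_ISUID, "setgid": stat.S_ISGID}


def describe_mode(mode: int) -> tuple[str, list[str]]:
    """Return the ls-style mode string and the names of special bits that are not set."""
    # OR in S_IFREG so filemode() renders a regular file with a leading '-'.
    text = stat.filemode(stat.S_IFREG | stat.S_IMODE(mode))
    missing = [name for name, bit in REQUIRED_BITS.items() if not mode & bit]
    return text, missing


if __name__ == "__main__":
    for mode in (0o775, 0o6751):
        text, missing = describe_mode(mode)
        print(oct(mode), text, "missing:", missing or "none")
    # prints:
    # 0o775 -rwxrwxr-x missing: ['setuid', 'setgid']
    # 0o6751 -rwsr-s--x missing: none
```

To inspect a real file, pass os.stat(path).st_mode for the executable in question; which path that is on a given system is left to the reader.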

Once the database was back up, we resumed the original task, and creating the SPFILE in the ASM disk group from the PFILE then succeeded.

References

(EX50) Exadata 19.1 / Oracle Linux 7 systemd-tmpfiles cleanup can cause database startup/connection failure, or clusterware connection failure (Doc ID 2498572.1)

Startup Instance Failed with ORA-27140 ORA-27300 ORA-27301 ORA-27302 and ORA-27303 on skgpwinit6 (Doc ID 1274030.1)

Handling a Database Migration Issue on an X7 Database Machine
