【DataGuard】錯誤的log_file_name_convert引數導致物理Data Guard配置故障分析與處理

secooler發表於2011-06-17
  在同一臺主機上部署物理Data Guard,或部署的Data Guard對應的資料庫檔案、日誌檔案目錄不一致時,便會使用到db_file_name_convert引數和log_file_name_convert引數。

  本文討論當log_file_name_convert引數未設定或者設定不當而導致的Data Guard部署故障場景及處理分析過程。

  關於同一臺主機部署物理Data Guard的方法請參考文章《【DataGuard】同一臺主機實現物理Data Guard配置安裝》(http://space.itpub.net/519536/viewspace-578181)。

  這裡有關Data Guard配置過程中需要準備的資料檔案、日誌檔案以及Standby控制檔案均已準備完畢,準備方法這裡不贅述。這裡我們將備庫pfile檔案中的log_file_name_convert引數內容註釋掉。

1.調整後的備庫pfile內容
secdg@secdb /home/oracle$ cat $ORACLE_HOME/dbs/initsecdg.ora
*.audit_file_dest='/u01/app/oracle/admin/secdg/adump'
*.background_dump_dest='/u01/app/oracle/admin/secdg/bdump'
*.control_files='/u01/app/oracle/oradata/secdg/cfile/control01.ctl','/u01/app/oracle/oradata/secdg/cfile/control02.ctl','/u01/app/oracle/oradata/secdg/cfile/control03.ctl'
*.core_dump_dest='/u01/app/oracle/admin/secdg/cdump'
*.db_file_multiblock_read_count=8# SMALL
*.db_files=80# SMALL
*.db_name='secdb'
*.global_names=TRUE
*.job_queue_processes=5
*.log_buffer=32768# SMALL
*.log_checkpoint_interval=10000
*.max_dump_file_size='10240'# limit trace file size to 5 Meg each
*.parallel_max_servers=5# SMALL
*.processes=200# SMALL
*.sga_max_size=300M
*.sga_target=300M
*.undo_management='AUTO'
*.user_dump_dest='/u01/app/oracle/admin/secdg/udump'

## Parameters for Standby Database.
db_name='secdb'
db_unique_name=secdg
log_archive_format=log%t_%s_%r.arc
log_archive_config='DG_CONFIG=(secdb,secdg)'
fal_server=secdb
fal_client=secdg
standby_file_management=AUTO
db_file_name_convert='/u01/app/oracle/oradata/secdb/dfile/','/u01/app/oracle/oradata/secdg/dfile/'
#log_file_name_convert='/u01/app/oracle/oradata/secdb/lfile/','/u01/app/oracle/oradata/secdg/lfile/'
standby_archive_dest='/home/oracle/arch/secdg'
## Parameters which using for switch over from Standby to Primary.
LOG_ARCHIVE_DEST_1='LOCATION=/home/oracle/arch/secdg VALID_FOR=(ALL_LOGFILES,ALL_ROLES) DB_UNIQUE_NAME=secdg'
LOG_ARCHIVE_DEST_2='SERVICE=secdb LGWR SYNC  VALID_FOR=(ONLINE_LOGFILES,PRIMARY_ROLE) DB_UNIQUE_NAME=secdb'
log_archive_dest_state_1=ENABLE
log_archive_dest_state_2=ENABLE
remote_login_passwordfile=EXCLUSIVE

注意,這裡我們註釋掉了“log_file_name_convert”引數,表示在啟動Data Guard庫的時候日誌檔案將不被轉換。看一下結果。

2.啟動備庫物理Data Guard
secdb@secdb /home/oracle$ export ORACLE_SID=secdg
secdg@secdb /home/oracle$ sqlplus / as sysdba

SQL*Plus: Release 10.2.0.1.0 - Production on Fri Jun 17 18:11:48 2011

Copyright (c) 1982, 2005, Oracle.  All rights reserved.

Connected to an idle instance.

NotConnected@> startup nomount
ORACLE instance started.

Total System Global Area  314572800 bytes
Fixed Size                  1219184 bytes
Variable Size              96470416 bytes
Database Buffers          213909504 bytes
Redo Buffers                2973696 bytes
NotConnected@> alter database mount standby database;

Database altered.

3.確認log_file_name_convert引數資訊
NotConnected@> show parameter convert

NAME                   TYPE    VALUE
---------------------- ------- -------------------------------------
db_file_name_convert   string  /u01/app/oracle/oradata/secdb/dfile/,
                               /u01/app/oracle/oradata/secdg/dfile/
log_file_name_convert  string

此處顯示“log_file_name_convert”內容為空,表示未設定。

4.由log_file_name_convert引發的配置故障結果
NotConnected@> col GROUP# for 999999
NotConnected@> col STATUS for a5
NotConnected@> col TYPE for a7
NotConnected@> col MEMBER for a50
NotConnected@> select * From v$logfile order by 1,4;

 GROUP# STATU TYPE    MEMBER                                             IS_
------- ----- ------- -------------------------------------------------- ---
      1 STALE ONLINE  /u01/app/oracle/oradata/secdb/lfile/redo01.log     NO
      1 STALE ONLINE  /u01/app/oracle/oradata/secdb/lfile/redo01_b.log   NO
      2       ONLINE  /u01/app/oracle/oradata/secdb/lfile/redo02.log     NO
      2       ONLINE  /u01/app/oracle/oradata/secdb/lfile/redo02_b.log   NO
      3       ONLINE  /u01/app/oracle/oradata/secdb/lfile/redo03.log     NO
      3       ONLINE  /u01/app/oracle/oradata/secdb/lfile/redo03_b.log   NO
      4       ONLINE  /u01/app/oracle/oradata/secdb/lfile/redo04_a.log   NO
      4       ONLINE  /u01/app/oracle/oradata/secdb/lfile/redo04_b.log   NO
      5       ONLINE  /u01/app/oracle/oradata/secdb/lfile/redo05_a.log   NO
      5       ONLINE  /u01/app/oracle/oradata/secdb/lfile/redo05_b.log   NO
      6       STANDBY /u01/app/oracle/oradata/secdb/lfile/log6a.rdo      NO
      6       STANDBY /u01/app/oracle/oradata/secdb/lfile/log6b.rdo      NO
      7       STANDBY /u01/app/oracle/oradata/secdb/lfile/log7a.rdo      NO
      7       STANDBY /u01/app/oracle/oradata/secdb/lfile/log7b.rdo      NO
      8       STANDBY /u01/app/oracle/oradata/secdb/lfile/log8a.rdo      NO
      8       STANDBY /u01/app/oracle/oradata/secdb/lfile/log8b.rdo      NO
      9       STANDBY /u01/app/oracle/oradata/secdb/lfile/log9a.rdo      NO
      9       STANDBY /u01/app/oracle/oradata/secdb/lfile/log9b.rdo      NO
     10       STANDBY /u01/app/oracle/oradata/secdb/lfile/log10a.rdo     NO
     10       STANDBY /u01/app/oracle/oradata/secdb/lfile/log10b.rdo     NO
     11       STANDBY /u01/app/oracle/oradata/secdb/lfile/log11a.rdo     NO
     11       STANDBY /u01/app/oracle/oradata/secdb/lfile/log11b.rdo     NO

22 rows selected.


這裡便體現了這種配置的問題,由於沒有設定log_file_name_convert引數,導致備庫使用到的日誌檔案不正確,這裡顯示的是主庫日誌資訊!

為方便比較,正確的內容如下:
NotConnected@> select * From v$logfile order by 1,4;

 GROUP# STATU TYPE    MEMBER                                             IS_
------- ----- ------- -------------------------------------------------- ---
      1       ONLINE  /u01/app/oracle/oradata/secdg/lfile/redo01.log     NO
      1       ONLINE  /u01/app/oracle/oradata/secdg/lfile/redo01_b.log   NO
      2       ONLINE  /u01/app/oracle/oradata/secdg/lfile/redo02.log     NO
      2       ONLINE  /u01/app/oracle/oradata/secdg/lfile/redo02_b.log   NO
      3       ONLINE  /u01/app/oracle/oradata/secdg/lfile/redo03.log     NO
      3       ONLINE  /u01/app/oracle/oradata/secdg/lfile/redo03_b.log   NO
      4       ONLINE  /u01/app/oracle/oradata/secdg/lfile/redo04_a.log   NO
      4       ONLINE  /u01/app/oracle/oradata/secdg/lfile/redo04_b.log   NO
      5       ONLINE  /u01/app/oracle/oradata/secdg/lfile/redo05_a.log   NO
      5       ONLINE  /u01/app/oracle/oradata/secdg/lfile/redo05_b.log   NO
      6       STANDBY /u01/app/oracle/oradata/secdg/lfile/log6a.rdo      NO
      6       STANDBY /u01/app/oracle/oradata/secdg/lfile/log6b.rdo      NO
      7       STANDBY /u01/app/oracle/oradata/secdg/lfile/log7a.rdo      NO
      7       STANDBY /u01/app/oracle/oradata/secdg/lfile/log7b.rdo      NO
      8       STANDBY /u01/app/oracle/oradata/secdg/lfile/log8a.rdo      NO
      8       STANDBY /u01/app/oracle/oradata/secdg/lfile/log8b.rdo      NO
      9       STANDBY /u01/app/oracle/oradata/secdg/lfile/log9a.rdo      NO
      9       STANDBY /u01/app/oracle/oradata/secdg/lfile/log9b.rdo      NO
     10       STANDBY /u01/app/oracle/oradata/secdg/lfile/log10a.rdo     NO
     10       STANDBY /u01/app/oracle/oradata/secdg/lfile/log10b.rdo     NO
     11       STANDBY /u01/app/oracle/oradata/secdg/lfile/log11a.rdo     NO
     11       STANDBY /u01/app/oracle/oradata/secdg/lfile/log11b.rdo     NO

22 rows selected.


5.嘗試主備庫Switch Over後在secdg例項切換日誌
secdg例項的日誌中將會記在如下報錯
Fri Jun 17 22:51:56 2011
Errors in file /u01/app/oracle/admin/secdg/bdump/secdg_lgwr_1165.trc:
ORA-00314: log 2 of thread 1, expected sequence# 44 doesn't match 0
ORA-00312: online log 2 thread 1: '/u01/app/oracle/oradata/secdb/lfile/redo02_b.log'
ORA-00314: log 2 of thread 1, expected sequence# 44 doesn't match 0
ORA-00312: online log 2 thread 1: '/u01/app/oracle/oradata/secdb/lfile/redo02.log'
Fri Jun 17 22:51:56 2011
Errors in file /u01/app/oracle/admin/secdg/bdump/secdg_lgwr_1165.trc:
ORA-00314: log 2 of thread 1, expected sequence# 44 doesn't match 0
ORA-00312: online log 2 thread 1: '/u01/app/oracle/oradata/secdb/lfile/redo02_b.log'
ORA-00314: log 2 of thread 1, expected sequence# 44 doesn't match 0
ORA-00312: online log 2 thread 1: '/u01/app/oracle/oradata/secdb/lfile/redo02.log'
Fri Jun 17 22:51:56 2011
LGWR: terminating instance due to error 314
Instance terminated by LGWR, pid = 1165

這便是由於log_file_name_convert引數沒有設定導致備庫錯誤地用到了“原主庫日誌檔案”的嚴重後果!
secdg資料庫例項因此而被殺死!嘗試重新啟動的話,仍然會報同樣的錯誤。
secdg@secdb /home/oracle$ sqlplus / as sysdba

SQL*Plus: Release 10.2.0.1.0 - Production on Fri Jun 17 23:00:47 2011

Copyright (c) 1982, 2005, Oracle.  All rights reserved.

Connected to an idle instance.

NotConnected@> startup;
ORACLE instance started.

Total System Global Area  314572800 bytes
Fixed Size                  1219184 bytes
Variable Size              96470416 bytes
Database Buffers          213909504 bytes
Redo Buffers                2973696 bytes
Database mounted.
ORA-00314: log 1 of thread 1, expected sequence# 43 doesn't match 0
ORA-00312: online log 1 thread 1: '/u01/app/oracle/oradata/secdb/lfile/redo01_b.log'
ORA-00314: log 1 of thread 1, expected sequence# 43 doesn't match 0
ORA-00312: online log 1 thread 1: '/u01/app/oracle/oradata/secdb/lfile/redo01.log'

到此切換後的主庫(原備庫)將不在可用!

6.故障處理
既然問題出在備庫的配置上,並且備庫已經不在可用。最後的可行的出路便是將現在的備庫(secdb例項,及原主庫)調整為主庫以供生產所用。
我們能採用的方法便是強制將備庫做Failover處理。具體操作如下。
NotConnected@> alter database recover managed standby database finish force;

Database altered.

NotConnected@> alter database commit to switchover to primary;

Database altered.

sys@secdb> select DATABASE_ROLE from v$database;

DATABASE_ROLE
----------------
PRIMARY

sys@secdb> select * from v$version;

BANNER
----------------------------------------------------------------
Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 - Prod
PL/SQL Release 10.2.0.1.0 - Production
CORE    10.2.0.1.0      Production
TNS for Linux: Version 10.2.0.1.0 - Production
NLSRTL Version 10.2.0.1.0 - Production

後續的拯救措施便是,根據這個Failover出來的主庫重新按照正確的方法重建Data Guard備庫。

7.另外一種故障場景
  另外一種故障場景是當主庫切換日誌時遇到類似故障,主庫將不再可用。此時的處理方式只能是放棄主庫,然後將備庫切換為主庫,然後再重建整個Data Guard環境。
類似的報錯資訊如下:
Errors in file /u01/app/oracle/admin/secdb/udump/secdb_ora_2201.trc:
ORA-00314: log 1 of thread 1, expected sequence# 60 doesn't match 0
ORA-00312: online log 1 thread 1: '/u01/app/oracle/oradata/secdb/lfile/redo01_b.log'
ORA-00314: log 1 of thread 1, expected sequence# 60 doesn't match 0
ORA-00312: online log 1 thread 1: '/u01/app/oracle/oradata/secdb/lfile/redo01.log'

8.小結
  透過這個故障場景的再現及處理過程可以得出一個結論,我們在配置Data Guard環境時一定要對每個引數的含義和作用做到了如指掌,防止出現故障素手無策。
  另外一個老生常談的結論是:“有備無患”!,在準備對生產庫做重大調整前一定要對資料庫做好有效的備份。這樣在出現類似文章提到的Data Guard配置故障後,可以非常高效的進行恢復。

Good luck.

secooler
11.06.17

-- The End --

來自 “ ITPUB部落格 ” ,連結:http://blog.itpub.net/519536/viewspace-700320/,如需轉載,請註明出處,否則將追究法律責任。

相關文章