Oracle 19c透過recover standby database from service修復GAP案例

潇湘隐者發表於2024-08-13

原文網址 : https://www.cnblogs.com/kerrycode/p/18356564

案例介紹

環境介紹

作業系統： Red Hat Enterprise Linux release 8.10 (Ootpa)
資料庫版本： Oracle 19.23.0.0.0

上週五，系統管理員需要給Linux升級補丁，UAT環境下的一套DG，資料庫沒有正常關閉的情況下，作業系統升級補丁後強制reboot了，週一早上處理的過程中遇到下面錯誤：

備庫的告警日誌有下面錯誤資訊(GAP sequence提示資訊)：

PR00 (PID:145361): FAL: Failed to request gap sequence
PR00 (PID:145361):  GAP - thread 1 sequence 94-94
PR00 (PID:145361):  DBID 1790039322 branch 1173452819
PR00 (PID:145361): FAL: All defined FAL servers have been attempted
PR00 (PID:145361): -------------------------------------------------------------------------
PR00 (PID:145361): Check that the CONTROL_FILE_RECORD_KEEP_TIME initialization
PR00 (PID:145361): parameter is defined to a value that's sufficiently large
PR00 (PID:145361): enough to maintain adequate log switch information to resolve
PR00 (PID:145361): archived redo log gaps.
PR00 (PID:145361): -------------------------------------------------------------------------
2024-07-22T10:26:06.796447+08:00

主庫查詢


set linesize 720
col name for a60
col creator for a12
select name,creator,sequence#,applied,completion_time from v$archived_log where sequence#=94;


SQL> set linesize 720
SQL> col name for a60
SQL> col creator for a12
SQL> select name,creator,sequence#,applied,completion_time from v$archived_log where sequence#=94;

NAME                                                         CREATOR       SEQUENCE# APPLIED   COMPLETIO
------------------------------------------------------------ ------------ ---------- --------- ---------
/gspdblog/gspprod_1173452819_1_94.arc                        ARCH                 94 NO        19-JUL-24

SQL>

檢查確認主庫上的歸檔日誌gspprod_1173452819_1_94.arc已經被刪除了。

檢查備庫最後應用的歸檔日誌資訊：


SQL> set pages 1000 lines 1000
SQL> SELECT al.thrd "Thread", almax "Last Seq Received", lhmax "Last Seq Applied"
  2  FROM (select thread# thrd, MAX(sequence#) almax
  3  FROM v$archived_log
  4  WHERE resetlogs_change#=(SELECT resetlogs_change# FROM v$database) GROUP BY thread#) al,
  5  (SELECT thread# thrd,   MAX(sequence#) lhmax
  6  FROM v$log_history
  7  WHERE resetlogs_change#=(SELECT resetlogs_change# FROM v$database) GROUP BY thread#) lh
  8  WHERE al.thrd = lh.thrd;

    Thread Last Seq Received Last Seq Applied
---------- ----------------- ----------------
         1               111               93

SQL> 



SQL> col client_pid for a10
SQL> select inst_id, thread#, process, pid, status, client_process, client_pid,
  2  sequence#, block#, active_agents, known_agents from gv$managed_standby order by thread#, pid;

   INST_ID    THREAD# PROCESS   PID                      STATUS       CLIENT_P CLIENT_PID  SEQUENCE#     BLOCK# ACTIVE_AGENTS KNOWN_AGENTS
---------- ---------- --------- ------------------------ ------------ -------- ---------- ---------- ---------- ------------- ------------
         1          0 DGRD      143439                   ALLOCATED    N/A      N/A                 0          0             0       0
         1          0 DGRD      143441                   ALLOCATED    N/A      N/A                 0          0             0       0
         1          0 RFS       147165                   IDLE         UNKNOWN  148946              0          0             0       0
         1          0 RFS       147167                   IDLE         UNKNOWN  148942              0          0             0       0
         1          0 RFS       147169                   IDLE         UNKNOWN  148944              0          0             0       0
         1          1 ARCH      143437                   CLOSING      ARCH     143437             91     700416             0       0
         1          1 ARCH      143443                   CLOSING      ARCH     143443             92     704512             0       0
         1          1 ARCH      143445                   CLOSING      ARCH     143445             93     696320             0       0
         1          1 ARCH      143447                   CLOSING      ARCH     143447             90     708608             0       0
         1          1 MRP0      145359                   WAIT_FOR_GAP N/A      N/A                94          0             9       9
         1          1 RFS       145885                   IDLE         Archival 148936              0          0             0       0
         1          1 RFS       147161                   IDLE         LGWR     148948            112     321083             0       0

12 rows selected.

SQL>

因為UAT環境沒有備份歸檔日誌，而主庫上都設定了一個作業清除兩天前的歸檔日誌(資源不足，需要定期清理歸檔日誌)，正常情況下，這個作業並不會帶來什麼問題，而由於上週五升級系統補丁，資料庫停了2天，但是這個作業並沒有停止(crontab作業)，週一處理的時候，這個作業已經將兩天前的歸檔日誌給清理了。導致備用資料庫無法獲取序列號(SEQUENCE#)為94的歸檔日誌。此時由於出現歸檔日誌的GAP導致備用資料庫無法同步資料，這種情況下，我們打算用Oracle 18.1提供的新特性來恢復物理備庫，如下所示

RMAN> RECOVER STANDBY DATABASE FROM SERVICE primary_connect_identifier;

This command will internally keep track of standby file locations, refresh standby controlfile from primary, 
update the new standby controlfile with standby file names, perform incremental backup on primary, transfer 
the backup-pieces over network to standby and perform recovery on standby

主要是這種新特性來恢復備用資料庫非常方便，一條命令即可搞定，相比之前的增量備份/還原要簡單很多。

操作步驟：

取消redo應用(備用資料庫)

SQL> alter database recover managed standby database cancel;

Database altered.

將備用資料庫啟動到MOUNT狀態

SQL> shutdown immediate;
Database closed.
Database dismounted.
ORACLE instance shut down.
SQL> startup mount;
ORACLE instance started.

Total System Global Area 8589932424 bytes
Fixed Size                 13932424 bytes
Variable Size            2348810240 bytes
Database Buffers         6207569920 bytes
Redo Buffers               19619840 bytes
Database mounted.

執行下面命令

# 注意，不執行此命令，會遇到RMAN-05150
SQL> recover managed standby database cancel;  
Media recovery complete.
SQL>

如果不執行上面命令，在RMAN做recover standby database時會遇到RMAN-05150錯誤，如下案例所示：

RMAN> recover standby database from service gsp;

Starting recover at 22-JUL-24
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of recover command at 07/22/2024 10:46:14
RMAN-05150: Managed Recovery Process must be disabled before running RECOVER STANDBY DATABASE.

4: 備庫執行修復命令，開始線上重新整理備庫

RMAN> recover standby database from service gsp;

注意，這裡RMAN連線資料庫的方式請選擇賬號密碼，不要使用系統認證方式，否則可能會遇到錯誤案例1.

正常情況下，你會看到類似這樣的輸出資訊

.............................
starting media recovery

media recovery complete, elapsed time: 00:00:00
Finished recover at 22-JUL-24
Finished recover at 22-JUL-24

5：啟動資料庫恢復同步

在sqlplus中執行下面命令

alter databae open;
alter pluggable database all open;
alter database recover managed standby database using current logfile disconnect;

下面還介紹一下，在操作過程中容易踩到的坑或錯誤：

錯誤案例1

使用RMAN恢復備庫時，遇到ORA-17629: Cannot connect to the remote database server錯誤，如下所示

$ rman target /

Recovery Manager: Release 19.0.0.0.0 - Production on Mon Jul 22 10:59:06 2024
Version 19.23.0.0.0

Copyright (c) 1982, 2019, Oracle and/or its affiliates.  All rights reserved.

connected to target database: gsp (DBID=1790039322, not open)

RMAN> recover standby database from service gsp;

Starting recover at 22-JUL-24
Oracle instance started

Total System Global Area    8589932424 bytes

Fixed Size                    13932424 bytes
Variable Size               2348810240 bytes
Database Buffers            6207569920 bytes
Redo Buffers                  19619840 bytes

contents of Memory Script:
{
   restore standby controlfile from service  'gsp';
   alter database mount standby database;
}
executing Memory Script

Starting restore at 22-JUL-24
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=5 device type=DISK

RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of recover command at 07/22/2024 10:59:33
RMAN-03015: error occurred in stored script Memory Script
ORA-17629: Cannot connect to the remote database server
ORA-17627: 
ORA-17629: Cannot connect to the remote database server

RMAN> exit

遇到這個錯誤，是因為RMAN連線資料庫使用作業系統認證，這種方式連線遠端資料庫(remote database server)就會有問題，應該改成賬號密碼認證方式連線資料庫。這樣就不會遇到這個錯誤了。

正確方式

rman target sys/user_password

錯誤方式

rman target /

錯誤案例2

還原恢復過程遇到下面一系列ORA錯誤。具體如下所示：

.......................................................................
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of recover command at 07/22/2024 11:08:40
RMAN-03015: error occurred in stored script Memory Script
ORA-19849: error while reading backup piece from service gsp
ORA-19573: cannot obtain exclusive enqueue for datafile 18
ORA-19890: data file already in use
ORA-45909: restore, recover or block media recovery may be in progress
ORA-19660: some files in the backup set could not be verified
ORA-19661: datafile 18 could not be verified due to corrupt blocks
ORA-19849: error while reading backup piece from service gsp
ORA-19573: cannot obtain exclusive enqueue for datafile 18
ORA-19890: data file already in use
ORA-45909: restore, recover or block media recovery may be in progress

RMAN>

檢查備庫

select process,status,sequence#,thread# from gv$managed_standby where process like 'MRP%';

SQL> select process,status,sequence#,thread# from gv$managed_standby where process like 'MRP%';

PROCESS   STATUS        SEQUENCE#    THREAD#
--------- ------------ ---------- ----------
MRP0      WAIT_FOR_GAP         94          1

在dgmgr中執行命令

DGMGRL> edit database gspro set state='APPLY-OFF';
Succeeded.
DGMGRL>

重新驗證(備用資料庫)MRP程序是否已經結束。

SQL> select process,status,sequence#,thread# from gv$managed_standby where process like 'MRP%';

no rows selected

SQL>

如上所示，MRP程序已經不存在了，就可以重新進行還原恢復操作。

參考資料:

ORA-19573 when trying to restore to standby with incremental backup From Primary or During any RMAN restore operation (Doc ID 1646232.1)
ORA-17629 with RMAN 'From Service' Command (Doc ID 2960469.1)

透過RMAN備份standby database成功恢復還原
2018-05-08
Database
Oracle DG Standby Database型別
2022-02-03
OracleDatabase型別
Oracle DG建立Physical Standby Database
2022-02-03
OracleDatabase
Oracle DG建立Logical Standby Database
2022-02-03
OracleDatabase
Oracle 12.2 使用Database Link優化Standby Database Workload
2019-01-24
OracleDatabase優化
Convert a Physical Standby Database into a Snapshot Standby Database
2019-05-10
Database
Oracle Database Cloud - Database as a Service Quick Start
2018-12-24
OracleDatabaseCloudUI
Oracle 19c Concepts(01)：Introduction to Oracle Database
2019-03-08
OracleDatabase
Oracle 19c Concepts(13)：Oracle Database Instance
2019-03-27
OracleDatabase
Oracle 19c Database Management Tools
2020-12-12
OracleDatabase
【12c】12c RMAN新特性之通過網路遠端恢復資料庫（RESTORE/Recover from Service）
2018-04-09
資料庫REST
Sqlcl 連線Oracle DataBase 19c
2020-02-05
SQLOracleDatabase
1 Oracle Database 19c 新特性
2020-03-24
OracleDatabase
Converting Oracle Database from Linux to Windows using RMAN
2018-08-27
OracleDatabaseLinuxWindows
Oracle database 升級（文件）to 10.2.0.4 from 10.2.0.1
2018-06-05
OracleDatabase
Oracle 19c Concepts(00)：Changes in This Release for Oracle Database Concepts
2019-03-07
OracleDatabase
在Oracle DG Standby庫上啟用flashback database功能
2019-03-21
OracleDatabase
Oracle 19c Concepts(18)：Concepts for Database Administrators
2019-04-03
OracleDatabase
Oracle 19c Concepts(19)：Concepts for Database Developers
2019-04-04
OracleDatabaseDeveloper
Oracle Database 19c安裝Sample Schemas
2020-02-06
OracleDatabase
Oracle 19c透過SCAN連線ORA-12520
2020-05-01
Oracle
Performing a Failover to a Physical Standby Database
2020-02-24
ORMAIDatabase
Oracle 19C Data Guard基礎運維-05Failovers (GAP)
2020-04-19
Oracle運維AI
recover database using backup controlfile理解
2021-11-19
Database
Oracle 19c Concepts(17)：Topics for Database Administrators and Developers
2019-04-02
OracleDatabaseDeveloper
Oracle Database 19c中的自動索引
2019-08-20
OracleDatabase索引
Disable Database Audit In Oracle 19c RAC-20220111
2022-03-21
DatabaseOracle
Oracle 19c Database Configure the HTTPS Port for EM Express
2020-12-10
OracleDatabaseHTTPExpress
Oracle 12c系列(八)｜RMAN (FROM SERVICE)
2018-06-15
Oracle
oracle12.2 adg ORA-46952: standby database format mismatch for password file
2020-03-26
OracleDatabaseORM
Setup Standby Database on One PC(轉)
2019-03-15
Database
Oracle Database 19c(19.9) RAC On RedHat 8.3 Using VirtualBox and MacBook
2022-03-21
OracleDatabaseRedhatMac
Oracle 19c DBA's Guide(01): Getting Started with Database Administration
2021-08-30
OracleGUIIDEDatabase
Oracle 12c DG備庫啟動報錯standby database requires recovery
2022-02-17
OracleDatabaseUI
Oracle 19C Data Guard基礎運維-01安裝物理standby
2020-04-16
Oracle運維
Oracle 19c standby 建立資料檔案報錯ORA-01111
2021-12-11
Oracle
Oracle 18C,19C standby CHECKPOINT_CHANGE# 不更新問題
2021-05-25
Oracle
Guide to Database as a Service (DBaaS)
2018-12-24
GUIIDEDatabase