ADG無法同步:TT00程式報錯 Error 12514

AlfredZhao發表於2023-01-29

環境: Oracle 19.16 ADG (Single Instance -> RAC)
在配置ADG的場景,發現ADG不能同步。

1.檢視報錯資訊

1)主庫alert日誌:

2023-01-29T11:18:41.489164+08:00
TT00 (PID:27840): Error 12514 received logging on to the standby
TT00 (PID:27840): Attempting LAD:2 network reconnect (12514)
TT00 (PID:27840): LAD:2 network reconnect abandoned
2023-01-29T11:18:41.489623+08:00
Errors in file /u01/app/oracle/diag/rdbms/demo/demo/trace/demo_tt00_27840.trc:
ORA-12514: TNS:listener does not currently know of service requested in connect descriptor
TT00 (PID:27840): krsg_check_connection: Error 12514 connecting to standby 'demorac'

2)ADG配置鏈路的error資訊:

SQL> select error from v$archive_dest where dest_id = 2;

ERROR
-----------------------------------------------------------------
ORA-12514: TNS:listener does not currently know of service
requested in connect descriptor

錯誤號一致,均為12514。

2.oerr檢視該錯誤說明

[oracle@bogon 2023_01_28]$ oerr ora 12514
12514, 00000, "TNS:listener does not currently know of service requested in connect descriptor"
// *Cause:  The listener received a request to establish a connection to a
// database or other service. The connect descriptor received by the listener
// specified a service name for a service (usually a database service)
// that either has not yet dynamically registered with the listener or has
// not been statically configured for the listener.  This may be a temporary
// condition such as after the listener has started, but before the database
// instance has registered with the listener.
// *Action:
//  - Wait a moment and try to connect a second time.
//  - Check which services are currently known by the listener by executing:
//    lsnrctl services <listener name>
//  - Check that the SERVICE_NAME parameter in the connect descriptor of the
//    net service name used specifies a service known by the listener.
//  - If an easy connect naming connect identifier was used, check that
//    the service name specified is a service known by the listener.
//  - Check for an event in the listener.log file.

3.嘗試sqlplus連線到standby

[oracle@bogon 2023_01_28]$ sqlplus sys/oracle@demorac as sysdba
...
ERROR:
ORA-12514: TNS:listener does not currently know of service requested in connect
descriptor

分別連線兩個節點:

[oracle@bogon 2023_01_28]$ sqlplus sys/oracle@demorac1 as sysdba
...
Connected to:
Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production
Version 19.16.0.0.0

SQL> 

[oracle@bogon 2023_01_28]$ sqlplus sys/oracle@demorac2 as sysdba
...
ERROR:
ORA-12514: TNS:listener does not currently know of service requested in connect
descriptor

4.嘗試relocate監聽

看來是因為RAC的例項2連線有問題?
嗯,現在例項2還不存在,那直接把scan配置到節點1機器上試試吧:

[grid@db01rac1 ~]$ srvctl status scan
SCAN VIP scan1 is enabled
SCAN VIP scan1 is running on node db01rac2
[grid@db01rac1 ~]$ srvctl relocate scan_listener -scannumber 1 -node db01rac1
[grid@db01rac1 ~]$ srvctl status scan
SCAN VIP scan1 is enabled
SCAN VIP scan1 is running on node db01rac1

結果依然報錯,說明並不是relocate的問題。
其實在relocate之前,就發現scan在節點2時的狀態有問題,沒有監聽到任何服務:

[grid@db01rac2 ~]$ lsnrctl status LISTENER_SCAN1

LSNRCTL for Linux: Version 19.0.0.0.0 - Production on 28-JAN-2023 18:54:31

Copyright (c) 1991, 2022, Oracle.  All rights reserved.

Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=IPC)(KEY=LISTENER_SCAN1)))
STATUS of the LISTENER
------------------------
Alias                     LISTENER_SCAN1
Version                   TNSLSNR for Linux: Version 19.0.0.0.0 - Production
Start Date                28-JAN-2023 18:53:51
Uptime                    0 days 0 hr. 0 min. 40 sec
Trace Level               off
Security                  ON: Local OS Authentication
SNMP                      OFF
Listener Parameter File   /u01/app/19.3.0/grid/network/admin/listener.ora
Listener Log File         /u01/app/grid/diag/tnslsnr/db01rac2/listener_scan1/alert/log.xml
Listening Endpoints Summary...
  (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=LISTENER_SCAN1)))
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=192.168.1.15)(PORT=1521)))
The listener supports no services
The command completed successfully

5.繼續排查發現是引數問題

為什麼LISTENER_SCAN1監聽不到任何服務呢?嘗試使用手工alter system register;也不行。
與scan有關的資料庫配置就是那個remote引數了,難道設定有問題,oracle使用者登陸到資料中檢視確認:

SQL> show parameter remote_listener

NAME				     TYPE	 VALUE
------------------------------------ ----------- ------------------------------
remote_listener 		     string

為什麼會是空呢?實際是因為該環境搭建ADG,主庫是單例項,引數中沒有設定remote_listener引數,備庫是RAC,雖然配置了其他RAC獨有引數,但漏掉了這個引數。
知道原因後,解決很簡單,直接根據當前環境在備庫配置正確的 remote_listener 引數值:

alter system set remote_listener='db01rac-scan:1521';

再次檢視LISTENER_SCAN1監聽狀態,已經正常有服務註冊了:

[grid@db01rac1 admin]$ lsnrctl status LISTENER_SCAN1

LSNRCTL for Linux: Version 19.0.0.0.0 - Production on 28-JAN-2023 19:26:13

Copyright (c) 1991, 2022, Oracle.  All rights reserved.

Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=IPC)(KEY=LISTENER_SCAN1)))
STATUS of the LISTENER
------------------------
Alias                     LISTENER_SCAN1
Version                   TNSLSNR for Linux: Version 19.0.0.0.0 - Production
Start Date                28-JAN-2023 19:15:24
Uptime                    0 days 0 hr. 10 min. 49 sec
Trace Level               off
Security                  ON: Local OS Authentication
SNMP                      OFF
Listener Parameter File   /u01/app/19.3.0/grid/network/admin/listener.ora
Listener Log File         /u01/app/grid/diag/tnslsnr/db01rac1/listener_scan1/alert/log.xml
Listening Endpoints Summary...
  (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=LISTENER_SCAN1)))
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=192.168.1.15)(PORT=1521)))
Services Summary...
Service "demorac" has 1 instance(s).
  Instance "jydb1", status READY, has 1 handler(s) for this service...
The command completed successfully

此時再次測試連線,已恢復正常,ADG鏈路的報錯資訊也已經隨之解決。

[oracle@bogon 2023_01_28]$ sqlplus sys/oracle@demorac as sysdba
...
Connected to:
Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production
Version 19.16.0.0.0

SQL>

6.總結和延伸

本次問題定位後發現很簡單,只是因為備庫的一個引數設定導致:

# primary 檢視配置鏈路的錯誤資訊:
SQL> select error from v$archive_dest where dest_id = 2;

ERROR
-----------------------------------------------------------------
ORA-12514: TNS:listener does not currently know of service
requested in connect descriptor

# standby 修正引數設定:
alter system set remote_listener='db01rac-scan:1521';

延伸:其實很多ADG同步問題,日誌中都會丟擲各類錯誤,除了按照錯誤排查之外,還有一些經驗性的思路,可以在troubleshooting過程中選擇嘗試,下面給出示例,希望能對大家有所幫助。

1)ADG配置鏈路defer後再enable:

SQL> alter system set log_archive_dest_state_2=defer;
SQL> alter system set log_archive_dest_state_2=enable;

2)密碼檔案同步:

[oracle@bogon dbs]$ scp orapwdemo 192.168.1.11:/u01/app/oracle/product/19.3.0/db_1/dbs/orapwjydb1
[oracle@bogon dbs]$ scp orapwdemo 192.168.1.12:/u01/app/oracle/product/19.3.0/db_1/dbs/orapwjydb2

3)重啟MRP程式:

RECOVER MANAGED STANDBY DATABASE CANCEL;
RECOVER MANAGED STANDBY DATABASE DISCONNECT;

注意:有時在備庫open read only狀態下無法啟動MRP的情況,可以選擇到mount狀態下嘗試,遇到過很多情況下都有效。

4)嘗試重建SRLs:
遇到過個別情況,實時同步有問題是因為SRLs有問題,按照你的環境要求重建即可:

#recreate:drop and create SRLs:
SQL> recover managed standby database cancel;

alter database drop standby logfile group 14;
alter database drop standby logfile group 13;
alter database drop standby logfile group 12;
alter database drop standby logfile group 11;

#確認asm磁碟組中清理,注意千萬核對清楚,別誤刪除其他的:
ASMCMD> rm standby_group_*
ASMCMD> pwd
+datadg/DEMORAC

#新建SRLs:
SQL> select * from v$standby_log;

alter database add standby logfile thread 1 group 11 '+DATADG' size 209715200;
alter database add standby logfile thread 1 group 12 '+DATADG' size 209715200;
alter database add standby logfile thread 1 group 13 '+DATADG' size 209715200;
alter database add standby logfile thread 1 group 14 '+DATADG' size 209715200;

SQL> recover managed standby database disconnect;

5)檢查基本引數配置:

很多時候,環境有可能被人修改過,要檢查基本引數,確認和你的環境要求匹配:

set linesize 500
col value for a70
col name for a50
 
select name, value
from v$parameter
where name in ('db_name','db_unique_name','log_archive_config', 'log_archive_dest_1','log_archive_dest_2',
               'log_archive_dest_state_1','log_archive_dest_state_2', 'remote_login_passwordfile',
               'log_archive_format','log_archive_max_processes','fal_server','db_file_name_convert',
                     'log_file_name_convert', 'standby_file_management');

6)瞭解主要程式:

遇到過給生產庫配置ADG,但無論如何折騰備庫也無法實現實時同步,最終發現是主庫有異常,重啟後就自動解決了。
但很多情況申請重啟主庫很難,其實可以先申請嘗試重啟相關程式,比如主庫的arc,tt 等程式。
這些程式被殺掉通常不會導致主庫crash,而是會重新啟動新的程式。

相關文章