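ORA-16198 Reported on the Primary Database in an Oracle DataGuard Environment
In one customer's Oracle Active DataGuard environment, the primary database reports the following error every day during the peak-load period: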
Fri Apr 24 17:25:59 2015
ORA-16198: LGWR received timedout error from KSR
LGWR: Attempting destination LOG_ARCHIVE_DEST_2 network reconnect (16198)
LGWR: Destination LOG_ARCHIVE_DEST_2 network reconnect abandoned
Error 16198 for archive log file 1 to 'afabdg01'
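Counting ORA-16198 occurrences per hour in the alert log is one way to confirm that the errors cluster at peak time. The Python sketch below is not an Oracle tool. It assumes the plain-text alert log layout shown above, where a timestamp line such as "Fri Apr 24 17:25:59 2015" precedes each group of messages. It also assumes English day and month names. The function name `ora16198_by_hour` is hypothetical.

```python
import re
from collections import Counter
from datetime import datetime

# Timestamp lines in alert logs of this era look like "Fri Apr 24 17:25:59 2015".
# Assumes English day/month names (C locale), as in the excerpts above.
TS_RE = re.compile(
    r"^(Mon|Tue|Wed|Thu|Fri|Sat|Sun) [A-Z][a-z]{2}\s+\d{1,2} \d{2}:\d{2}:\d{2} \d{4}$"
)


def ora16198_by_hour(lines):
    """Count ORA-16198 lines per hour, attributing each one to the last timestamp seen."""
    counts = Counter()
    current = None
    for raw in lines:
        line = raw.strip()
        if TS_RE.match(line):
            # strptime treats whitespace in the format as \s+, so "Feb  6" also parses.
            current = datetime.strptime(line, "%a %b %d %H:%M:%S %Y")
        elif "ORA-16198" in line and current is not None:
            # Truncate to the hour so errors in the same hour share one bucket.
            counts[current.replace(minute=0, second=0)] += 1
    return counts
```

The function accepts any iterable of lines, including an open file object for the alert log. If the counts concentrate in the same hours every day, that pattern is consistent with a load-related transport delay rather than a one-off failure.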
Refer to the following MOS note:
Redo Transport Services fails with ORA-16198 when using SYNC (synchronous) mode (Doc ID 808469.1)
Applies to:
Oracle Database - Enterprise Edition - Version 10.2.0.1 and later
Information in this document applies to any platform.
***Checked for relevance on 26-Feb-2014***
This will affect LGWR SYNC transport mode in 10.2.0.x databases and SYNC transport mode in 11.2.0.x databases
Symptoms
Redo Transport Services on the primary database failed with ORA-16198
while shipping redo in LGWR SYNC mode to either a physical standby
database or a logical standby database.
The primary alert log file showed:
ORA-16198: LGWR received timedout error from KSR
LGWR: Attempting destination LOG_ARCHIVE_DEST_2 network reconnect (16198)
LGWR: Destination LOG_ARCHIVE_DEST_2 network reconnect abandoned
Fri Feb 6 21:22:26 2009
Errors in file /u01/app/oracle/admin/crthpd01/bdump/crthpd01_lgwr_2793488.trc:
ORA-16198: Timeout incurred on internal channel during remote archival
LGWR: Network asynch I/O wait error 16198 log 2 service '(DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=tcp)(HOST=abc)(PORT=1521)))(CONNECT_DATA=(SERVICE_NAME=xyz_STANDBY_XPT.world)(INSTANCE_NAME=xyz)(SERVER=dedicated)))'
Fri Feb 6 21:22:26 2009
Destination LOG_ARCHIVE_DEST_2 is UNSYNCHRONIZED
LGWR: Failed to archive log 2 thread 1 sequence 628 (16198)
Fri Feb 6 21:22:27 2009
If you use Data Guard Broker, then the primary drc log showed:
DG 2009-04-12-12:12:08 0 2 0 RSM detected log transport problem: log transport for database 'xyz_STANDBY' has the following error.
DG 2009-04-12-12:12:08 0 2 0 ORA-16198: Timeout incurred on internal channel during remote archival
DG 2009-04-12-12:12:08 0 2 0 RSM0: HEALTH CHECK ERROR: ORA-16737: the redo transport service for standby database "xyz_STANDBY" has an error
DG 2009-04-12-12:12:08 0 2 678445062 Operation CTL_GET_STATUS cancelled during phase 2, error = ORA-16778
DG 2009-04-12-12:12:08 0 2 678445062 Operation CTL_GET_STATUS cancelled during phase 2, error = ORA-16778
Cause
The NET_TIMEOUT attribute of LOG_ARCHIVE_DEST_2 on the primary is set too low. In this example it is 10 seconds,
and LNS could not finish sending a redo block within that time.
log_archive_dest_2 service="(DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=tcp)(HOST=abc)(PORT=1521)))(CONNECT_DATA=(SERVICE_NAME=xyz_STANDBY_XPT.world)(INSTANCE_NAME=xyz)(SERVER=dedicated)))",
LGWR SYNC AFFIRM delay=0 OPTIONAL max_failure=0
max_connections=1 reopen=300 db_unique_name="xyz_STANDBY" register net_timeout=10
valid_for=(online_logfile,primary_role)
Note that LGWR SYNC transport mode is in use and NET_TIMEOUT is set to 10.
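The misconfiguration described above can be checked mechanically. The following Python sketch flags a LOG_ARCHIVE_DEST_n value that combines SYNC transport with a short NET_TIMEOUT. It is not an Oracle API. The helper name `check_dest` is hypothetical, and the 15-second floor is an assumption taken from this note's advice of "at least 15 to 20 seconds".

```python
import re

# Assumed floor, taken from the note's "at least 15 to 20 seconds" advice.
DEFAULT_FLOOR = 15


def check_dest(value, floor=DEFAULT_FLOOR):
    """Return a warning if a LOG_ARCHIVE_DEST_n value uses SYNC with a low NET_TIMEOUT.

    Returns None when the destination is not SYNC, or when its explicit
    NET_TIMEOUT is at least `floor` seconds.
    """
    # \bSYNC\b does not match inside "ASYNC": there is no word boundary between A and S.
    if not re.search(r"\bSYNC\b", value, re.IGNORECASE):
        return None
    m = re.search(r"\bNET_TIMEOUT\s*=\s*(\d+)", value, re.IGNORECASE)
    if m is None:
        return "SYNC destination without an explicit NET_TIMEOUT (database default applies)"
    timeout = int(m.group(1))
    if timeout < floor:
        return f"SYNC destination with NET_TIMEOUT={timeout}s (< {floor}s)"
    return None
```

The check is a text heuristic. For example, it would be fooled by the word SYNC appearing as a standalone token inside a quoted service name.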
Solution
You'll need to increase the NET_TIMEOUT value in LOG_ARCHIVE_DEST_2 on the primary to at least 15 to 20 seconds, depending on your network speed.
If you don't use Data Guard Broker, then you could change LOG_ARCHIVE_DEST_2 from SQL*Plus using ALTER SYSTEM command. For example,
SQL> ALTER SYSTEM SET LOG_ARCHIVE_DEST_2='SERVICE=xyz_STANDBY LGWR SYNC DB_UNIQUE_NAME=xyz_STANDBY NET_TIMEOUT=30 VALID_FOR=(ONLINE_LOGFILES,PRIMARY_ROLE)';
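The ALTER SYSTEM command above replaces the entire destination value. Any attributes that were set before (such as AFFIRM, REOPEN or MAX_FAILURE in the Cause example) are therefore dropped unless you repeat them. The minimal Python sketch below changes only NET_TIMEOUT in the current value and builds the statement. It assumes you start from the value shown by SHOW PARAMETER. The function names `with_net_timeout` and `alter_statement` are hypothetical.

```python
import re

NET_TIMEOUT_RE = re.compile(r"\bNET_TIMEOUT\s*=\s*\d+", re.IGNORECASE)


def with_net_timeout(value, seconds):
    """Return `value` with NET_TIMEOUT set to `seconds`, keeping every other attribute."""
    if NET_TIMEOUT_RE.search(value):
        # Replace only the first (and normally only) NET_TIMEOUT attribute.
        return NET_TIMEOUT_RE.sub(f"NET_TIMEOUT={seconds}", value, count=1)
    # No explicit NET_TIMEOUT yet: append one.
    return f"{value} NET_TIMEOUT={seconds}"


def alter_statement(dest_n, value, scope="BOTH"):
    """Build an ALTER SYSTEM statement; single quotes in the value are doubled for SQL."""
    quoted = value.replace("'", "''")
    return f"ALTER SYSTEM SET LOG_ARCHIVE_DEST_{dest_n}='{quoted}' SCOPE={scope};"
```

SCOPE=BOTH assumes the instance runs from an spfile. Review the generated statement before running it in SQL*Plus.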
If you use Data Guard Broker, then you will need to modify NetTimeout property from DGMGRL or Grid Control.
For example, connect to the DGMGRL command-line interface from the primary machine,
DGMGRL> connect sys/
DGMGRL> EDIT DATABASE '
=======================================================================
Note: If the NET_TIMEOUT attribute is already set to 30 and you still get ORA-16198, then
LNS could not finish sending a redo block within 30 seconds.
The slowness may be caused by:
1. The operating system. Monitor OS usage (for example, with iostat).
2. The network. Monitor network traffic (for example, with tcpdump).
The goal is to determine whether the slowness comes from a temporary OS glitch or a temporary network glitch.
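One way to make that determination is to compare the ORA-16198 timestamps from the alert log with the times at which your own monitoring recorded a spike. Examples of such spikes are iostat samples above a threshold you choose, or slow round trips seen in a tcpdump capture. The Python sketch below is built on assumptions. The way spikes are detected, the 60-second window and the name `correlate` are all hypothetical choices. It shows only the matching step.

```python
from datetime import datetime, timedelta


def correlate(errors, os_spikes, net_spikes, window=timedelta(seconds=60)):
    """Label each error timestamp with the kinds of spike seen within +/- window."""

    def near(t, samples):
        return any(abs(t - s) <= window for s in samples)

    labels = {}
    for t in errors:
        kinds = [name for name, samples in (("os", os_spikes), ("network", net_spikes))
                 if near(t, samples)]
        # An error with no nearby spike of either kind stays unexplained.
        labels[t] = kinds or ["unexplained"]
    return labels


if __name__ == "__main__":
    t0 = datetime(2015, 4, 24, 17, 25, 59)       # ORA-16198 time from the alert log
    iostat_hot = [t0 - timedelta(seconds=30)]    # hypothetical: await above your threshold
    slow_rtt = []                                # hypothetical: no network spikes recorded
    print(correlate([t0], iostat_hot, slow_rtt))
```

The result labels each error as OS, network, both or unexplained. Errors that repeatedly line up with one kind of spike suggest where to look next.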
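This error occurs when the primary's LGWR process does not finish sending redo to the standby within the NET_TIMEOUT period (10 seconds in the MOS example). Setting NET_TIMEOUT to 15 or 30 seconds gives LGWR more time to transmit redo to the standby and reduces the chance of hitting the error. If the problem persists with NET_TIMEOUT set to 30 seconds, check whether the network between the primary and the standby has a performance problem or a fault. For a standby database reached over a WAN, it is better not to use LGWR SYNC for real-time synchronization. Asynchronous (ASYNC) transport is more appropriate in that case.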
--end--
From the ITPUB blog, link: http://blog.itpub.net/23135684/viewspace-1713244/. Please credit the source when reposting; otherwise legal action may be taken.