網路故障造成備庫standby logfile的損壞
某生產庫exadata,oracle 11.2.0.4,客戶說物理備庫與主庫不能進行同步,如是要客戶把生產庫與備庫的alert_sid.log檔案打包傳給我。檢視了一個主庫的alert.log日誌檔案發現從15號開始網路就斷斷續續的,資訊如下:
Wed Jun 15 14:25:48 2016 *********************************************************************** Fatal NI connect error 12543, connecting to: (DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)(HOST=10.137.8.101)(PORT=1521))) (CONNECT_DATA=(SERVICE_NAME=syyjm)(CID=(PROGRAM=oracle)(HOST=yb01dbadm01.dnsserver) (USER=oracle)))) VERSION INFORMATION: TNS for Linux: Version 11.2.0.4.0 - Production TCP/IP NT Protocol Adapter for Linux: Version 11.2.0.4.0 - Production Time: 15-JUN-2016 14:25:48 Tracing not turned on. Tns error struct: ns main err code: 12543 TNS-12543: TNS:destination host unreachable ns secondary err code: 12560 nt main err code: 513 TNS-00513: Destination host unreachable nt secondary err code: 113 nt OS err code: 0 *********************************************************************** Fatal NI connect error 12543, connecting to: (DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)(HOST=10.137.8.101)(PORT=1521))) (CONNECT_DATA=(SERVICE_NAME=syyjm)(CID=(PROGRAM=oracle)(HOST=yb01dbadm01.dnsserver) (USER=oracle)))) VERSION INFORMATION: TNS for Linux: Version 11.2.0.4.0 - Production TCP/IP NT Protocol Adapter for Linux: Version 11.2.0.4.0 - Production Time: 15-JUN-2016 14:25:51 Tracing not turned on. Tns error struct: ns main err code: 12543 TNS-12543: TNS:destination host unreachable ns secondary err code: 12560 nt main err code: 513 TNS-00513: Destination host unreachable nt secondary err code: 113 nt OS err code: 0 Wed Jun 15 14:26:02 2016 ...省略...
對應的備庫這個時間點的alert.log日誌檔案資訊如下:
Wed Jun 15 14:24:47 2016 *********************************************************************** Fatal NI connect error 12170. Wed Jun 15 14:24:47 2016 *********************************************************************** VERSION INFORMATION: TNS for Linux: Version 11.2.0.4.0 - Production Oracle Bequeath NT Protocol Adapter for Linux: Version 11.2.0.4.0 - Production TCP/IP NT Protocol Adapter for Linux: Version 11.2.0.4.0 - Production Fatal NI connect error 12170. Time: 15-JUN-2016 14:24:47 VERSION INFORMATION: TNS for Linux: Version 11.2.0.4.0 - Production Oracle Bequeath NT Protocol Adapter for Linux: Version 11.2.0.4.0 - Production TCP/IP NT Protocol Adapter for Linux: Version 11.2.0.4.0 - Production Tracing not turned on. Time: 15-JUN-2016 14:24:47 Tns error struct: Tracing not turned on. ns main err code: 12535 Tns error struct: ns main err code: 12535 TNS-12535: TNS:operation timed out TNS-12535: TNS:operation timed out ns secondary err code: 12560 ns secondary err code: 12560 nt main err code: 505 nt main err code: 505 TNS-00505: Operation timed out TNS-00505: Operation timed out nt secondary err code: 110 nt secondary err code: 110 nt OS err code: 0 nt OS err code: 0 Client address: (ADDRESS=(PROTOCOL=tcp)(HOST=10.137.8.73)(PORT=12282)) Client address: (ADDRESS=(PROTOCOL=tcp)(HOST=10.137.8.73)(PORT=12091)) RFS[4]: Possible network disconnect with primary database RFS[5]: Possible network disconnect with primary database Wed Jun 15 14:30:55 2016 Killing 1 processes with pids 25908 (idle RFS by thread/sequence) in order to allow the in-transit gap to be queued. Requested by OS process 58349 Wed Jun 15 14:30:57 2016 RFS[10]: Assigned to RFS process 58352 RFS[10]: Selected log 8 for thread 1 sequence 40085 dbid 643576469 branch 867103448 Wed Jun 15 14:30:58 2016 Archived Log entry 3997 added for thread 1 sequence 40085 ID 0x265be592 dest 1: Wed Jun 15 14:30:59 2016 Primary database is in MAXIMUM PERFORMANCE mode RFS[11]: Assigned to RFS process 58363 RFS[11]: Selected log 7 for thread 1 sequence 40087 dbid 643576469 branch 867103448 RFS[10]: Selected log 8 for thread 1 sequence 40086 dbid 643576469 branch 867103448 Wed Jun 15 14:31:01 2016 Killing 1 processes with pids 25910 (idle RFS by thread/sequence) in order to retry receiving a log after reattaching. Requested by OS process 58367 Wed Jun 15 14:31:01 2016 Archived Log entry 3998 added for thread 1 sequence 40086 ID 0x265be592 dest 1: RFS[12]: Assigned to RFS process 58367 RFS[12]: Selected log 10 for thread 2 sequence 35718 dbid 643576469 branch 867103448
從資訊RFS[4]: Possible network disconnect with primary database可以看到備庫不能與主庫進行連線。從資訊Killing 1 processes with pids 25908 (idle RFS by thread/sequence) in order to allow the in-transit gap to be queued. Requested by OS process 58349,可知因為不能從主庫接受歸檔日誌,且超時而被程式58349給kill掉了,而58349程式是另一個RFS程式用於接受主庫所傳輸過來的歸檔日誌。也就是說,因為網路連線出現了問題,RFS程式在不斷的kill,restart重複這樣的操作
在2016-06-15 14:34:25這個時間點由於網路恢復日誌傳輸恢復正常,但在2016-05-15 16:59:00這個時間點網路又斷了,不能進行日誌傳輸
Wed Jun 15 14:34:25 2016 ARC3: Standby redo logfile selected for thread 1 sequence 40085 for destination LOG_ARCHIVE_DEST_2 Wed Jun 15 14:34:27 2016 Thread 1 advanced to log sequence 40087 (LGWR switch) Current log# 1 seq# 40087 mem# 0: +DATAC1/yyjm/onlinelog/group_1.261.867103449 Wed Jun 15 14:34:27 2016 Archived Log entry 139166 added for thread 1 sequence 40086 ID 0x265be592 dest 1: Wed Jun 15 14:34:27 2016 LNS: Standby redo logfile selected for thread 1 sequence 40087 for destination LOG_ARCHIVE_DEST_2 ARC3: Standby redo logfile selected for thread 1 sequence 40086 for destination LOG_ARCHIVE_DEST_2 Wed Jun 15 15:58:19 2016 Thread 1 advanced to log sequence 40088 (LGWR switch) Current log# 2 seq# 40088 mem# 0: +DATAC1/yyjm/onlinelog/group_2.262.867103449 Wed Jun 15 15:58:19 2016 LNS: Standby redo logfile selected for thread 1 sequence 40088 for destination LOG_ARCHIVE_DEST_2 Wed Jun 15 15:58:20 2016 Archived Log entry 139173 added for thread 1 sequence 40087 ID 0x265be592 dest 1: Wed Jun 15 16:59:00 2016 *********************************************************************** Fatal NI connect error 12170. VERSION INFORMATION: TNS for Linux: Version 11.2.0.4.0 - Production Oracle Bequeath NT Protocol Adapter for Linux: Version 11.2.0.4.0 - Production TCP/IP NT Protocol Adapter for Linux: Version 11.2.0.4.0 - Production Time: 15-JUN-2016 16:59:00 Tracing not turned on. Tns error struct: ns main err code: 12535 TNS-12535: TNS:operation timed out ns secondary err code: 12560 nt main err code: 505 TNS-00505: Operation timed out nt secondary err code: 110 nt OS err code: 0 Client address: (ADDRESS=(PROTOCOL=tcp)(HOST=10.137.32.4)(PORT=1222)) Wed Jun 15 17:05:01 2016
從備庫日誌資訊來看,備庫其實在2016-06-15 16:08:43這個時間點網路就已經斷開了
Wed Jun 15 16:08:43 2016 RFS[9]: Possible network disconnect with primary database Wed Jun 15 16:08:43 2016 RFS[6]: Possible network disconnect with primary database Wed Jun 15 17:56:36 2016
這個DG環境的網路時斷時連的情況從2016-06-15一直到2016-06-19號,從備庫的日誌資訊可以看到如下資訊,不能在備庫建立歸檔日誌檔案,從資訊RFS[23]: No standby redo logfiles available for thread 1 與RFS[24]: No standby redo logfiles available for thread 2,以及ORA-00312: online log 9 thread 1: '/data/syyjm/datafile/standy03'可知因為備庫的standby log的standy03出現了損壞。
Sun Jun 19 12:15:21 2016 RFS[23]: Assigned to RFS process 18969 RFS[23]: No standby redo logfiles available for thread 1 Creating archive destination file : /arch/syyjm/1_40277_867103448.dbf (12905 blocks) Sun Jun 19 12:15:56 2016 Archiver process freed from errors. No longer stopped Sun Jun 19 12:15:56 2016 Primary database is in MAXIMUM PERFORMANCE mode Re-archiving standby log 11 thread 2 sequence 36153 RFS[24]: Assigned to RFS process 18976 RFS[24]: No standby redo logfiles available for thread 2 ...省略... Sun Jun 19 12:21:03 2016 Unable to create archive log file '/arch/syyjm/1_40254_867103448.dbf' ARC3: Error 19504 Creating archive log file to '/arch/syyjm/1_40254_867103448.dbf' ARCH: Archival stopped, error occurred. Will continue retrying ORACLE Instance syyjm - Archival Error ORA-16038: log 9 sequence# 40254 cannot be archived ORA-19504: failed to create file "" ORA-00312: online log 9 thread 1: '/data/syyjm/datafile/standy03' Sun Jun 19 12:21:03 2016 ...省略... Sun Jun 19 12:31:03 2016 Suppressing further error logging of LOG_ARCHIVE_DEST_1. Suppressing further error logging of LOG_ARCHIVE_DEST_1. ARCH: Archival stopped, error occurred. Will continue retrying ORACLE Instance syyjm - Archival Error ORA-16038: log 9 sequence# 40254 cannot be archived ORA-19504: failed to create file "" ORA-00312: online log 9 thread 1: '/data/syyjm/datafile/standy03' Sun Jun 19 12:31:03 2016 ARCH: Archival stopped, error occurred. Will continue retrying ORACLE Instance syyjm - Archival Error ORA-16014: log 9 sequence# 40254 not archived, no available destinations ORA-00312: online log 9 thread 1: '/data/syyjm/datafile/standy03' ...省略... Wed Jun 22 11:22:11 2016 ARCH: Archival stopped, error occurred. Will continue retrying ORACLE Instance syyjm - Archival Error ORA-16038: log 9 sequence# 40254 cannot be archived ORA-19504: failed to create file "" ORA-00312: online log 9 thread 1: '/data/syyjm/datafile/standy03' Wed Jun 22 11:22:11 2016 ARCH: Archival stopped, error occurred. Will continue retrying ORACLE Instance syyjm - Archival Error ORA-16014: log 9 sequence# 40254 not archived, no available destinations ORA-00312: online log 9 thread 1: '/data/syyjm/datafile/standy03'
從主庫alert日誌檔案也可以看到與不能進行對遠端目錄進行歸檔操作
Wed Jun 22 11:05:22 2016 Thread 1 advanced to log sequence 41141 (LGWR switch) Current log# 2 seq# 41141 mem# 0: +DATAC1/yyjm/onlinelog/group_2.262.867103449 FAL[server, ARC0]: FAL archive failed, see trace file. ARCH: FAL archive failed. Archiver continuing ORACLE Instance yyjm1 - Archival Error. Archiver continuing. FAL[server, ARC1]: FAL archive failed, see trace file. ARCH: FAL archive failed. Archiver continuing ORACLE Instance yyjm1 - Archival Error. Archiver continuing. Wed Jun 22 11:05:23 2016 Archived Log entry 142098 added for thread 1 sequence 41140 ID 0x265be592 dest 1:
透過主庫與備庫的alert.log檔案資訊可以清楚的知道原因是因為DG環境的網路時斷時連引起了備庫的RFS程式在重複kill,restart操作過程中損壞了standby log檔案,解決方法很簡單就是重建備庫被損不的standby log檔案。
來自 “ ITPUB部落格 ” ,連結:http://blog.itpub.net/26015009/viewspace-2121637/,如需轉載,請註明出處,否則將追究法律責任。
相關文章
- 故障分析 | MySQL 備份檔案靜默損壞一例分析MySql
- 成功恢復某公司伺服器故障導致的資料庫損壞伺服器資料庫
- 硬碟物理故障解決方法之電路板損壞修復方案硬碟
- 【故障公告】取代 memcached 的 redis 出現問題造成網站故障Redis網站
- 【故障公告】資料庫伺服器 CPU 100% 造成全站故障資料庫伺服器
- 故障分析 | DROP 大表造成資料庫假死資料庫
- Oracle 12.2 physical standby備庫收集AWR報告Oracle
- 4 Creating a Logical Standby Database 建立邏輯備庫Database
- 網路故障排除工具 | 快速定位網路故障
- 搭建windows到linux的oracle 12c physical standby備庫WindowsLinuxOracle
- InterBase資料庫檔案損壞的修復方法資料庫
- IBM X3650伺服器系統損壞故障的資料恢復IBM伺服器資料恢復
- standby_file_management為manual造成dataguard延遲
- PostgreSQL資料庫toast表損壞解決SQL資料庫AST
- SQLite資料庫損壞及其修復探究SQLite資料庫
- WPS文件損壞如何修復?WPS文件損壞的修復方法
- xtrabackup備份mysql“ib_logfile0 is ofMySql
- 伺服器Oracle資料庫損壞修復伺服器Oracle資料庫
- GPON網路故障如何處理?GPON網路故障處理流程
- Shen攻擊:亞太港口重大網路攻擊可能造成1100億美元損失
- Oracle資料庫不同損壞級別的恢復詳解Oracle資料庫
- 如何查詢和顯示網站的損壞連結?網站
- Oracle 無備份情況下undo檔案損壞處理Oracle
- 伺服器資料庫損壞能修復嘛伺服器資料庫
- redo損壞修復啟動資料庫辦法資料庫
- LogFile
- 當前日誌損壞的案例(轉)
- C:/inetpub/logs/logfile/路徑詳解
- FBI:2021年網際網路犯罪造成損失超過69億美元 比2020年猛增20多億美元
- 【故障公告】阿里雲OSS故障造成圖片無法上傳阿里
- Oracle 12c DG備庫啟動報錯standby database requires recoveryOracleDatabaseUI
- STANDBY_FILE_MANAGEMENT引數未設定auto導致的ADG備庫異常
- 【LINUX】Oracle資料庫 linux磁碟頭資料損壞修復LinuxOracle資料庫
- Redis 的 KEYS 命令引起 RDS 資料庫雪崩,當機 2 次,造成幾百萬損失Redis資料庫
- 學習這篇Oracle資料庫檔案壞塊損壞的恢復方法,擴充你的知識面Oracle資料庫
- raid5癱瘓導致資料庫損壞的恢復過程AI資料庫
- linux下修復磁碟損壞Linux
- 控制檔案損壞處理
- 織夢資料庫配置檔案資料庫損壞:嘗試修復資料庫資料庫