網路故障造成備庫standby logfile的損壞
某生產庫exadata,oracle 11.2.0.4,客戶說物理備庫與主庫不能進行同步,如是要客戶把生產庫與備庫的alert_sid.log檔案打包傳給我。檢視了一個主庫的alert.log日誌檔案發現從15號開始網路就斷斷續續的,資訊如下:
Wed Jun 15 14:25:48 2016 *********************************************************************** Fatal NI connect error 12543, connecting to: (DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)(HOST=10.137.8.101)(PORT=1521))) (CONNECT_DATA=(SERVICE_NAME=syyjm)(CID=(PROGRAM=oracle)(HOST=yb01dbadm01.dnsserver) (USER=oracle)))) VERSION INFORMATION: TNS for Linux: Version 11.2.0.4.0 - Production TCP/IP NT Protocol Adapter for Linux: Version 11.2.0.4.0 - Production Time: 15-JUN-2016 14:25:48 Tracing not turned on. Tns error struct: ns main err code: 12543 TNS-12543: TNS:destination host unreachable ns secondary err code: 12560 nt main err code: 513 TNS-00513: Destination host unreachable nt secondary err code: 113 nt OS err code: 0 *********************************************************************** Fatal NI connect error 12543, connecting to: (DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)(HOST=10.137.8.101)(PORT=1521))) (CONNECT_DATA=(SERVICE_NAME=syyjm)(CID=(PROGRAM=oracle)(HOST=yb01dbadm01.dnsserver) (USER=oracle)))) VERSION INFORMATION: TNS for Linux: Version 11.2.0.4.0 - Production TCP/IP NT Protocol Adapter for Linux: Version 11.2.0.4.0 - Production Time: 15-JUN-2016 14:25:51 Tracing not turned on. Tns error struct: ns main err code: 12543 TNS-12543: TNS:destination host unreachable ns secondary err code: 12560 nt main err code: 513 TNS-00513: Destination host unreachable nt secondary err code: 113 nt OS err code: 0 Wed Jun 15 14:26:02 2016 ...省略...
對應的備庫這個時間點的alert.log日誌檔案資訊如下:
Wed Jun 15 14:24:47 2016 *********************************************************************** Fatal NI connect error 12170. Wed Jun 15 14:24:47 2016 *********************************************************************** VERSION INFORMATION: TNS for Linux: Version 11.2.0.4.0 - Production Oracle Bequeath NT Protocol Adapter for Linux: Version 11.2.0.4.0 - Production TCP/IP NT Protocol Adapter for Linux: Version 11.2.0.4.0 - Production Fatal NI connect error 12170. Time: 15-JUN-2016 14:24:47 VERSION INFORMATION: TNS for Linux: Version 11.2.0.4.0 - Production Oracle Bequeath NT Protocol Adapter for Linux: Version 11.2.0.4.0 - Production TCP/IP NT Protocol Adapter for Linux: Version 11.2.0.4.0 - Production Tracing not turned on. Time: 15-JUN-2016 14:24:47 Tns error struct: Tracing not turned on. ns main err code: 12535 Tns error struct: ns main err code: 12535 TNS-12535: TNS:operation timed out TNS-12535: TNS:operation timed out ns secondary err code: 12560 ns secondary err code: 12560 nt main err code: 505 nt main err code: 505 TNS-00505: Operation timed out TNS-00505: Operation timed out nt secondary err code: 110 nt secondary err code: 110 nt OS err code: 0 nt OS err code: 0 Client address: (ADDRESS=(PROTOCOL=tcp)(HOST=10.137.8.73)(PORT=12282)) Client address: (ADDRESS=(PROTOCOL=tcp)(HOST=10.137.8.73)(PORT=12091)) RFS[4]: Possible network disconnect with primary database RFS[5]: Possible network disconnect with primary database Wed Jun 15 14:30:55 2016 Killing 1 processes with pids 25908 (idle RFS by thread/sequence) in order to allow the in-transit gap to be queued. Requested by OS process 58349 Wed Jun 15 14:30:57 2016 RFS[10]: Assigned to RFS process 58352 RFS[10]: Selected log 8 for thread 1 sequence 40085 dbid 643576469 branch 867103448 Wed Jun 15 14:30:58 2016 Archived Log entry 3997 added for thread 1 sequence 40085 ID 0x265be592 dest 1: Wed Jun 15 14:30:59 2016 Primary database is in MAXIMUM PERFORMANCE mode RFS[11]: Assigned to RFS process 58363 RFS[11]: Selected log 7 for thread 1 sequence 40087 dbid 643576469 branch 867103448 RFS[10]: Selected log 8 for thread 1 sequence 40086 dbid 643576469 branch 867103448 Wed Jun 15 14:31:01 2016 Killing 1 processes with pids 25910 (idle RFS by thread/sequence) in order to retry receiving a log after reattaching. Requested by OS process 58367 Wed Jun 15 14:31:01 2016 Archived Log entry 3998 added for thread 1 sequence 40086 ID 0x265be592 dest 1: RFS[12]: Assigned to RFS process 58367 RFS[12]: Selected log 10 for thread 2 sequence 35718 dbid 643576469 branch 867103448
從資訊RFS[4]: Possible network disconnect with primary database可以看到備庫不能與主庫進行連線。從資訊Killing 1 processes with pids 25908 (idle RFS by thread/sequence) in order to allow the in-transit gap to be queued. Requested by OS process 58349,可知因為不能從主庫接受歸檔日誌,且超時而被程式58349給kill掉了,而58349程式是另一個RFS程式用於接受主庫所傳輸過來的歸檔日誌。也就是說,因為網路連線出現了問題,RFS程式在不斷的kill,restart重複這樣的操作
在2016-06-15 14:34:25這個時間點由於網路恢復日誌傳輸恢復正常,但在2016-05-15 16:59:00這個時間點網路又斷了,不能進行日誌傳輸
Wed Jun 15 14:34:25 2016 ARC3: Standby redo logfile selected for thread 1 sequence 40085 for destination LOG_ARCHIVE_DEST_2 Wed Jun 15 14:34:27 2016 Thread 1 advanced to log sequence 40087 (LGWR switch) Current log# 1 seq# 40087 mem# 0: +DATAC1/yyjm/onlinelog/group_1.261.867103449 Wed Jun 15 14:34:27 2016 Archived Log entry 139166 added for thread 1 sequence 40086 ID 0x265be592 dest 1: Wed Jun 15 14:34:27 2016 LNS: Standby redo logfile selected for thread 1 sequence 40087 for destination LOG_ARCHIVE_DEST_2 ARC3: Standby redo logfile selected for thread 1 sequence 40086 for destination LOG_ARCHIVE_DEST_2 Wed Jun 15 15:58:19 2016 Thread 1 advanced to log sequence 40088 (LGWR switch) Current log# 2 seq# 40088 mem# 0: +DATAC1/yyjm/onlinelog/group_2.262.867103449 Wed Jun 15 15:58:19 2016 LNS: Standby redo logfile selected for thread 1 sequence 40088 for destination LOG_ARCHIVE_DEST_2 Wed Jun 15 15:58:20 2016 Archived Log entry 139173 added for thread 1 sequence 40087 ID 0x265be592 dest 1: Wed Jun 15 16:59:00 2016 *********************************************************************** Fatal NI connect error 12170. VERSION INFORMATION: TNS for Linux: Version 11.2.0.4.0 - Production Oracle Bequeath NT Protocol Adapter for Linux: Version 11.2.0.4.0 - Production TCP/IP NT Protocol Adapter for Linux: Version 11.2.0.4.0 - Production Time: 15-JUN-2016 16:59:00 Tracing not turned on. Tns error struct: ns main err code: 12535 TNS-12535: TNS:operation timed out ns secondary err code: 12560 nt main err code: 505 TNS-00505: Operation timed out nt secondary err code: 110 nt OS err code: 0 Client address: (ADDRESS=(PROTOCOL=tcp)(HOST=10.137.32.4)(PORT=1222)) Wed Jun 15 17:05:01 2016
從備庫日誌資訊來看,備庫其實在2016-06-15 16:08:43這個時間點網路就已經斷開了
Wed Jun 15 16:08:43 2016 RFS[9]: Possible network disconnect with primary database Wed Jun 15 16:08:43 2016 RFS[6]: Possible network disconnect with primary database Wed Jun 15 17:56:36 2016
這個DG環境的網路時斷時連的情況從2016-06-15一直到2016-06-19號,從備庫的日誌資訊可以看到如下資訊,不能在備庫建立歸檔日誌檔案,從資訊RFS[23]: No standby redo logfiles available for thread 1 與RFS[24]: No standby redo logfiles available for thread 2,以及ORA-00312: online log 9 thread 1: '/data/syyjm/datafile/standy03'可知因為備庫的standby log的standy03出現了損壞。
Sun Jun 19 12:15:21 2016 RFS[23]: Assigned to RFS process 18969 RFS[23]: No standby redo logfiles available for thread 1 Creating archive destination file : /arch/syyjm/1_40277_867103448.dbf (12905 blocks) Sun Jun 19 12:15:56 2016 Archiver process freed from errors. No longer stopped Sun Jun 19 12:15:56 2016 Primary database is in MAXIMUM PERFORMANCE mode Re-archiving standby log 11 thread 2 sequence 36153 RFS[24]: Assigned to RFS process 18976 RFS[24]: No standby redo logfiles available for thread 2 ...省略... Sun Jun 19 12:21:03 2016 Unable to create archive log file '/arch/syyjm/1_40254_867103448.dbf' ARC3: Error 19504 Creating archive log file to '/arch/syyjm/1_40254_867103448.dbf' ARCH: Archival stopped, error occurred. Will continue retrying ORACLE Instance syyjm - Archival Error ORA-16038: log 9 sequence# 40254 cannot be archived ORA-19504: failed to create file "" ORA-00312: online log 9 thread 1: '/data/syyjm/datafile/standy03' Sun Jun 19 12:21:03 2016 ...省略... Sun Jun 19 12:31:03 2016 Suppressing further error logging of LOG_ARCHIVE_DEST_1. Suppressing further error logging of LOG_ARCHIVE_DEST_1. ARCH: Archival stopped, error occurred. Will continue retrying ORACLE Instance syyjm - Archival Error ORA-16038: log 9 sequence# 40254 cannot be archived ORA-19504: failed to create file "" ORA-00312: online log 9 thread 1: '/data/syyjm/datafile/standy03' Sun Jun 19 12:31:03 2016 ARCH: Archival stopped, error occurred. Will continue retrying ORACLE Instance syyjm - Archival Error ORA-16014: log 9 sequence# 40254 not archived, no available destinations ORA-00312: online log 9 thread 1: '/data/syyjm/datafile/standy03' ...省略... Wed Jun 22 11:22:11 2016 ARCH: Archival stopped, error occurred. Will continue retrying ORACLE Instance syyjm - Archival Error ORA-16038: log 9 sequence# 40254 cannot be archived ORA-19504: failed to create file "" ORA-00312: online log 9 thread 1: '/data/syyjm/datafile/standy03' Wed Jun 22 11:22:11 2016 ARCH: Archival stopped, error occurred. Will continue retrying ORACLE Instance syyjm - Archival Error ORA-16014: log 9 sequence# 40254 not archived, no available destinations ORA-00312: online log 9 thread 1: '/data/syyjm/datafile/standy03'
從主庫alert日誌檔案也可以看到與不能進行對遠端目錄進行歸檔操作
Wed Jun 22 11:05:22 2016 Thread 1 advanced to log sequence 41141 (LGWR switch) Current log# 2 seq# 41141 mem# 0: +DATAC1/yyjm/onlinelog/group_2.262.867103449 FAL[server, ARC0]: FAL archive failed, see trace file. ARCH: FAL archive failed. Archiver continuing ORACLE Instance yyjm1 - Archival Error. Archiver continuing. FAL[server, ARC1]: FAL archive failed, see trace file. ARCH: FAL archive failed. Archiver continuing ORACLE Instance yyjm1 - Archival Error. Archiver continuing. Wed Jun 22 11:05:23 2016 Archived Log entry 142098 added for thread 1 sequence 41140 ID 0x265be592 dest 1:
透過主庫與備庫的alert.log檔案資訊可以清楚的知道原因是因為DG環境的網路時斷時連引起了備庫的RFS程式在重複kill,restart操作過程中損壞了standby log檔案,解決方法很簡單就是重建備庫被損不的standby log檔案。
來自 “ ITPUB部落格 ” ,連結:http://blog.itpub.net/26015009/viewspace-2121637/,如需轉載,請註明出處,否則將追究法律責任。
相關文章
- 陣列櫃故障造成控制檔案損壞,資料檔案損壞陣列
- 磁碟損壞造成RMAN備份檔案有壞塊的恢復案例
- 為物理備庫新增standby logfile group相關事則
- 遇到了 客戶生產庫 current online logfile 損壞
- 【故障分析】通過壞塊提示資訊確定損壞的資料庫物件資訊資料庫物件
- 故障分析 | MySQL 備份檔案靜默損壞一例分析MySql
- Oracle資料庫塊的物理損壞與邏輯損壞Oracle資料庫
- 硬碟物理故障解決方法之電路板損壞修復方案硬碟
- 成功恢復某公司伺服器故障導致的資料庫損壞伺服器資料庫
- [ORACLE] 系統故障資料庫恢復--資料檔案無損壞Oracle資料庫
- 資料庫損壞解決:資料庫已損壞,無法分配空間資料庫
- 11g 最大保護模式 standby database網路故障導致主庫當機模式Database
- 物理DG從庫損壞後的重建
- oracle 9i刪除public使用者造成資料字典損壞所有sql語句不能操作的故障處理OracleSQL
- 【故障公告】取代 memcached 的 redis 出現問題造成網站故障Redis網站
- 儲存壞道造成資料庫hang住資料庫
- Oracle塊損壞恢復(有rman備份)Oracle
- SMON故障造成的parallel rollbackParallel
- 故障分析 | DROP 大表造成資料庫假死資料庫
- 【恢復】使用控制檔案的trace備份恢復因異常斷電導致所有控制檔案損壞的資料庫故障資料庫
- 解決物理standby 歸檔日誌損壞ORA-00334
- Oracle資料庫UNDO損壞後的恢復Oracle資料庫
- master資料庫損壞之後的恢復AST資料庫
- 使用RMAN恢復完全損壞的資料庫資料庫
- 沒有備份的情況下處理undo損壞
- CentOS 6.5 ext4 filesystemio_options=SETALL 備庫archivelog 損壞問題CentOSHive
- ORA-600 [12700]故障處理一則(線上重建損壞的索引)索引
- 網路故障排除工具 | 快速定位網路故障
- snapshot standby快照備庫角色
- Lloyd’s of London:網路犯罪每年給北美企業造成960億美元損失
- 故障的機器修好後重啟,狂拉主庫binlog,導致網路問題,造成一定影響
- MySQL資料庫表損壞後的修復方法MySql資料庫
- PostgreSQL資料庫toast表損壞解決SQL資料庫AST
- SQLite資料庫損壞及其修復探究SQLite資料庫
- 備份&恢復之十三:損壞全部控制檔案
- 資料庫在沒有備份的情況下的資料檔案損壞的恢復資料庫
- Oracle REDO損壞Oracle Redo
- SQL Server 2008在資料庫損壞時備份事務日誌SQLServer資料庫