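Tracing and Resolving a Database Failure from a Backup Error Found During Inspection
Reposted from: http://www.cnblogs.com/jasoname/p/5474159.html
A backup job for one of our business systems recently failed with the following error: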
Starting Control File and SPFILE Autobackup at 09-MAY-16
piece handle=c-335040995-20160509-00 comment=API Version 2.0,MMS Version 5.0.0.0
Finished Control File and SPFILE Autobackup at 09-MAY-16
sql statement: alter system archive log current
released channel: ch00
released channel: ch01
allocated channel: ch00
channel ch00: sid=491 instance=dlsc1 devtype=SBT_TAPE
channel ch00: Veritas NetBackup for Oracle - Release 7.0 (2010010501)
released channel: ch00
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-12001: could not open channel ch01
RMAN-10008: could not create channel context
RMAN-10003: unable to connect to target database
ORA-12170: TNS:Connect timeout occurred
RMAN> RMAN>
Recovery Manager complete.
Script /oracle/nbu_scripts/hot_database_backup.sh
==== ended in error on Mon May 9 09:26:39 BEIST 2016 ====
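The backup log shows that the Control File and SPFILE autobackup finished and the "alter system archive log current" statement ran; the failure occurred in the archived log backup that follows. Channel ch00 opened on instance dlsc1, but channel ch01 could not be opened.
RMAN-10003: unable to connect to target database means RMAN could not connect to the target database; the ORA-12170 beneath it shows the connection attempt timed out.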
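As a rough illustration of reading the error stack above, the following Python sketch pulls the RMAN-/ORA- codes out of a backup log and reports the bottom entry, which is usually where to start reading. The function names and the sample text are assumptions for illustration only; they are not part of NetBackup or RMAN.

```python
import re

# Matches lines such as "RMAN-12001: could not open channel ch01"
# or "ORA-12170: TNS:Connect timeout occurred".
ERROR_LINE = re.compile(r"^(RMAN|ORA)-(\d{5}):\s*(.*)$")

# Banner codes that only frame the error stack and carry no diagnosis.
BANNER_CODES = {"RMAN-00569", "RMAN-00571"}


def extract_error_stack(log_text):
    """Return (code, message) pairs from an RMAN log, skipping banner lines."""
    stack = []
    for line in log_text.splitlines():
        match = ERROR_LINE.match(line.strip())
        if not match:
            continue
        code = f"{match.group(1)}-{match.group(2)}"
        if code in BANNER_CODES:
            continue
        stack.append((code, match.group(3)))
    return stack


def innermost_error(stack):
    """The stack is read bottom-up, so the last entry is the innermost error."""
    return stack[-1] if stack else None


# Sample text copied from the log in this article.
sample_log = """\
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-12001: could not open channel ch01
RMAN-10008: could not create channel context
RMAN-10003: unable to connect to target database
ORA-12170: TNS:Connect timeout occurred
"""

stack = extract_error_stack(sample_log)
print(innermost_error(stack))
```

Here the innermost entry is the ORA-12170 timeout, which points at the database connection rather than at the tape channel itself.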
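Node 1 checked out fine. On node 2, sqlplus and rman could log in but found the instance down, and startup failed because the spfile device /dev/rspfile could not be opened: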
oracle@xxxxdb2:/oracle$ sqlplus / as sysdba
SQL*Plus: Release 10.2.0.3.0 - Production on Mon May 9 14:46:01 2016
Copyright (c) 1982, 2006, Oracle. All Rights Reserved.
Connected to an idle instance.
SQL> startup
ORA-01078: failure in processing system parameters
ORA-01565: error in identifying file '/dev/rspfile'
ORA-27041: unable to open file
IBM AIX RISC System/6000 Error: 6: No such device or address
Additional information: 11
SQL>
SQL> exit
Disconnected
oracle@xxxxdb2:/oracle$ rman target /
Recovery Manager: Release 10.2.0.3.0 - Production on Mon May 9 14:46:22 2016
Copyright (c) 1982, 2005, Oracle. All rights reserved.
connected to target database (not started)
RMAN> exit
Recovery Manager complete.
oracle@xxxxdb2:/oracle$
Check the CRS status:
oracle@xxxxdb2:/oracle$ crs_stat -t
CRS-0184: Cannot communicate with the CRS daemon.
oracle@xxxxdb2:/oracle$
Hmm, could it be that CRS is not running?
Try starting CRS:
root@xxxxdb2:/oracle/product/10.2.0/crs/bin# ./crsctl start crs
Attempting to start CRS stack
The CRS stack will be started shortly
I waited a while, expecting CRS to come up cleanly, but it still failed to start:
root@xxxxdb2:/oracle/product/10.2.0/crs/bin# ./crsctl check crs
Failure 1 contacting CSS daemon
Cannot communicate with CRS
Cannot communicate with EVM
root@xxxxdb2:/oracle/product/10.2.0/crs/bin# crs_stat -t
CRS-0184: Cannot communicate with the CRS daemon.
What is going on? Run ocrcheck to look at the state of the OCR:
root@xxxxdb2:/oracle/product/10.2.0/crs/bin# ./ocrcheck
PROT-602: Failed to retrieve data from the cluster registry
At this point the cause is clear: node 2 cannot see the shared storage.
Check the volume group that holds the data:
root@xxxxdb2:/oracle/product/10.2.0/crs/bin# lsvg
rootvg
datavg
archvg
root@xxxxdb2:/oracle/product/10.2.0/crs/bin# lsvg datavg
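0516-010 : Volume group must be varied on; use varyonvg command.
Sure enough, datavg is not varied on. Do not rush to vary it on by hand, though:
an inactive datavg most likely means HACMP is not running. Checking the HACMP status confirms that cluster services are down on this node: the RSCT subsystems topsvcs and grpsvcs are inoperative, and node xxxxdb2 is reported DOWN.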
root@xxxxdb2:/# /usr/es/sbin/cluster/utilities/clshowsrv -v
Status of the RSCT subsystems used by HACMP:
Subsystem Group PID Status
topsvcs topsvcs inoperative
grpsvcs grpsvcs inoperative
grpglsm grpsvcs inoperative
emsvcs emsvcs inoperative
emaixos emsvcs inoperative
ctrmc rsct 262394 active
Status of the HACMP subsystems:
Subsystem Group PID Status
clcomdES clcomdES 200944 active
clstrmgrES cluster 311646 active
Status of the optional HACMP subsystems:
Subsystem Group PID Status
clinfoES cluster inoperative
Obtaining information via SNMP from Node: xxxxdb1...
_____________________________________________________________________________
Cluster Name: xxxxdb
Cluster State: UP
Cluster Substate: STABLE
_____________________________________________________________________________
Node Name: xxxxdb1 State: UP
Network Name: net_ether_02 State: UP
Address: 192.168.77.194 Label: xxxxdb1_priv State: UP
Node Name: xxxxdb2 State: DOWN
Network Name: net_ether_02 State:
Address: xxxx Label: xxxxdb2 State: DOWN
Address: 192.168.77.195 Label: xxxxdb2_priv State: DOWN
Cluster Name: xxxxdb
Resource Group Name: orarg
Startup Policy: Online On All Available Nodes
Fallover Policy: Bring Offline (On Error Node Only)
Fallback Policy: Never Fallback
Site Policy: ignore
Priority Override Information:
Primary Instance POL:
[MORE...5]
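The two signals worth reading in the clshowsrv output above are the inoperative subsystems and the per-node state. A minimal Python sketch of that check follows, assuming the report has already been captured as text; the function names are hypothetical and the sample is an excerpt of the output shown in this article.

```python
import re

# Matches "Node Name: xxxxdb2 State: DOWN" style lines.
NODE_LINE = re.compile(r"^Node Name:\s*(\S+)\s+State:\s*(\S+)")


def node_states(report):
    """Map each node name to the state reported on its 'Node Name:' line."""
    states = {}
    for line in report.splitlines():
        match = NODE_LINE.match(line.strip())
        if match:
            states[match.group(1)] = match.group(2)
    return states


def inoperative_subsystems(report):
    """List subsystem names whose status column reads 'inoperative'."""
    names = []
    for line in report.splitlines():
        fields = line.split()
        # Rows are "name group [pid] status"; the PID is absent when stopped.
        if len(fields) in (3, 4) and fields[-1] == "inoperative":
            names.append(fields[0])
    return names


# Excerpt of the report shown in this article.
sample_report = """\
Subsystem Group PID Status
topsvcs topsvcs inoperative
grpsvcs grpsvcs inoperative
ctrmc rsct 262394 active
clstrmgrES cluster 311646 active
Node Name: xxxxdb1 State: UP
Network Name: net_ether_02 State: UP
Node Name: xxxxdb2 State: DOWN
"""

print(inoperative_subsystems(sample_report))
print(node_states(sample_report))
```

Note that clstrmgrES shows as active even though the node is DOWN, so the node state and the RSCT subsystems are the more telling lines in this report.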
The problem is found. Start HACMP on node 2:
smitty clstart
Start Cluster Services on these nodes
After startup completes:
root@xxxxdb2:/oracle/product/10.2.0/crs/bin# lsvg datavg
VOLUME GROUP: datavg VG IDENTIFIER: 00c63cf200004c000000011d0937bc9f
VG STATE: active PP SIZE: 64 megabyte(s)
VG PERMISSION: read/write TOTAL PPs: 7990 (511360 megabytes)
MAX LVs: 256 FREE PPs: 661 (42304 megabytes)
LVs: 44 USED PPs: 7329 (469056 megabytes)
OPEN LVs: 5 QUORUM: 3
TOTAL PVs: 5 VG DESCRIPTORS: 5
STALE PVs: 0 STALE PPs: 0
ACTIVE PVs: 5 AUTO ON: no
Concurrent: Enhanced-Capable Auto-Concurrent: Disabled
VG Mode: Concurrent
Node ID: 2 Active Nodes: 1
MAX PPs per VG: 32768 MAX PVs: 1024
LTG size (Dynamic): 256 kilobyte(s) AUTO SYNC: no
HOT SPARE: no BB POLICY: relocatable
root@xxxxdb2:/oracle/product/10.2.0/crs/bin# ./ocrcheck
Status of Oracle Cluster Registry is as follows :
Version : 2
Total space (kbytes) : 130852
Used space (kbytes) : 3300
Available space (kbytes) : 127552
ID : 222055846
Device/File Name : /dev/rocr
Device/File integrity check succeeded
Device/File not configured
Cluster registry integrity check succeeded
It looks like datavg is now varied on: the volume group is active and ocrcheck succeeds.
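For a scripted version of the same check, one option is to look for the final integrity line and read the space figures from the ocrcheck text. This is only a sketch under the assumption that the output has the layout shown in this article; the function names are made up for illustration.

```python
import re

# Matches "Total space (kbytes) : 130852" and the Used/Available variants.
SPACE_LINE = re.compile(r"\s*(Total|Used|Available) space \(kbytes\)\s*:\s*(\d+)")


def ocr_is_healthy(output):
    """True when ocrcheck reports that the overall integrity check succeeded."""
    return "Cluster registry integrity check succeeded" in output


def space_summary(output):
    """Return the total/used/available figures (in kbytes) as integers."""
    values = {}
    for line in output.splitlines():
        match = SPACE_LINE.match(line)
        if match:
            values[match.group(1).lower()] = int(match.group(2))
    return values


# Both samples are copied from the ocrcheck runs in this article.
failed_output = "PROT-602: Failed to retrieve data from the cluster registry"
good_output = """\
Status of Oracle Cluster Registry is as follows :
Version : 2
Total space (kbytes) : 130852
Used space (kbytes) : 3300
Available space (kbytes) : 127552
ID : 222055846
Device/File Name : /dev/rocr
Device/File integrity check succeeded
Device/File not configured
Cluster registry integrity check succeeded
"""

print(ocr_is_healthy(failed_output), ocr_is_healthy(good_output))
print(space_summary(good_output))
```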
Next, start CRS:
root@xxxxdb2:/oracle/product/10.2.0/crs/bin# ./crsctl start crs
Attempting to start CRS stack
The CRS stack will be started shortly
root@xxxxdb2:/oracle/product/10.2.0/crs/bin# ./crs_stat -t
Name Type Target State Host
------------------------------------------------------------
ora.xxxx.db application ONLINE ONLINE xxxxdb1
ora....c1.inst application ONLINE ONLINE xxxxdb1
ora....c2.inst application ONLINE ONLINE xxxxdb2
ora....B1.lsnr application ONLINE ONLINE xxxxdb1
ora....db1.gsd application ONLINE ONLINE xxxxdb1
ora....db1.ons application ONLINE ONLINE xxxxdb1
ora....db1.vip application ONLINE ONLINE xxxxdb1
ora....B2.lsnr application ONLINE ONLINE xxxxdb2
ora....db2.gsd application ONLINE ONLINE xxxxdb2
ora....db2.ons application ONLINE ONLINE xxxxdb2
ora....db2.vip application ONLINE ONLINE xxxxdb2
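The crs_stat -t table above can be verified by comparing the Target and State columns for each resource. The Python sketch below does that on captured text; the function name is hypothetical, and the OFFLINE row in the second sample is invented purely to show the failure case.

```python
def offline_resources(crs_stat_output):
    """Return (name, state) for resources that should be ONLINE but are not."""
    problems = []
    for line in crs_stat_output.splitlines():
        fields = line.split()
        # Skip the header, the separator, and anything that is not a resource row.
        if len(fields) < 4 or not fields[0].startswith("ora."):
            continue
        name, _type, target, state = fields[:4]
        if target == "ONLINE" and state != "ONLINE":
            problems.append((name, state))
    return problems


# Excerpt of the table shown in this article.
healthy_output = """\
Name Type Target State Host
------------------------------------------------------------
ora.xxxx.db application ONLINE ONLINE xxxxdb1
ora....c1.inst application ONLINE ONLINE xxxxdb1
ora....c2.inst application ONLINE ONLINE xxxxdb2
"""

# Hypothetical row (not from the article) showing an instance that is down.
degraded_output = healthy_output + "ora....B2.lsnr application ONLINE OFFLINE\n"

print(offline_resources(healthy_output))
print(offline_resources(degraded_output))
```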
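OK, the problem is solved at this point. To recap the troubleshooting path:
The backup log showed that NBU could not connect to the target database on node 2 --> sqlplus and rman on node 2 both found the instance down --> CRS would not start --> ocrcheck failed --> datavg was not varied on --> HACMP was not running on node 2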
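The recap reads top-down, from symptom to cause, but the fix has to be applied bottom-up. One way to picture that is as an ordered list of layers in which the lowest failing layer is the one to repair first. The sketch below is an illustration only: the layer names and the shape of the results dictionary are assumptions, and the real checks would be the commands used in this article.

```python
# Layers ordered from the bottom of the stack to the top, following the recap.
LAYERS = ["hacmp", "datavg", "ocr", "crs", "instance", "backup"]


def lowest_failing_layer(results):
    """Return the first layer, bottom-up, whose check failed, or None if all passed.

    A layer missing from `results` is treated as failed, since it was never verified.
    """
    for layer in LAYERS:
        if not results.get(layer, False):
            return layer
    return None


# State at the time of the incident: every layer on node 2 was failing.
incident = {layer: False for layer in LAYERS}
print(lowest_failing_layer(incident))

# After HACMP was started, datavg and the OCR recovered; CRS still had to be started.
after_hacmp = {"hacmp": True, "datavg": True, "ocr": True,
               "crs": False, "instance": False, "backup": False}
print(lowest_failing_layer(after_hacmp))
```

This mirrors the warning in the article against running varyonvg by hand: datavg was failing, but it was not the lowest failing layer.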
From the ITPUB blog: http://blog.itpub.net/25462274/viewspace-2097381/ . If you repost this article, please credit the source.