RAC環境出現Waiting for instances to leave資訊(一)
在RAC環境的其中一個節點上的alert檔案中,出現了這個資訊:Waiting for instances to leave。
這一篇描述問題的發現過程。
首先描述一下問題的發現過程。程式設計師報告RAC環境影響比較緩慢,而這時同事正好在這個RAC環境的一個節點上檢查RMAN的日誌,發現RMAN連線資料庫很慢,半天都沒有響應。而這時,另一個節點幾乎無法訪問了,一個普通的ssh登陸都需要1分鐘以上,更不要說執行命令了。
檢查當前還可以訪問的節點的alert檔案,發現了上面的Waiting for instances to leave資訊:
Mon Jul 27 15:52:14 2009
Waiting for instances to leave:
1
由於這個問題產生的原因比較複雜,為了將問題描述清楚,還需要重頭說起。
首先是硬體維護任意發現RAC伺服器的其中一個節點DOWN機,這個伺服器無法透過遠端登陸。
在資料庫中檢查,節點1關閉,節點2正常執行。從節點2上的alert檔案中,可以看到,節點2對資料庫進行了一些恢復的操作,併成功接管了全部資料庫的管理操作:
Fri Jul 24 16:25:09 2009
Reconfiguration started (old inc 4, new inc 6)
List of nodes:
1
Global Resource Directory frozen
* dead instance detected - domain 0 invalid = TRUE
Communication channels reestablished
Master broadcasted resource hash value bitmaps
Non-local Process blocks cleaned out
Fri Jul 24 16:25:09 2009
LMS 0: 6 GCS shadows cancelled, 0 closed
Fri Jul 24 16:25:09 2009
LMS 1: 14 GCS shadows cancelled, 1 closed
Set master node info
Submitted all remote-enqueue requests
Dwn-cvts replayed, VALBLKs dubious
All grantable enqueues granted
Post SMON to start 1st pass IR
Fri Jul 24 16:25:11 2009
Instance recovery: looking for dead threads
Fri Jul 24 16:25:11 2009
Beginning instance recovery of 1 threads
Fri Jul 24 16:25:12 2009
LMS 0: 378032 GCS shadows traversed, 0 replayed
Fri Jul 24 16:25:12 2009
LMS 1: 381728 GCS shadows traversed, 0 replayed
Fri Jul 24 16:25:12 2009
Submitted all GCS remote-cache requests
Fix write in gcs resources
Reconfiguration complete
Fri Jul 24 16:25:14 2009
parallel recovery started with 7 processes
Fri Jul 24 16:25:15 2009
Started redo scan
Fri Jul 24 16:25:15 2009
Completed redo scan
3990 redo blocks read, 986 data blocks need recovery
Fri Jul 24 16:25:15 2009
Started redo application at
Thread 1: logseq 1678, block 572689
Fri Jul 24 16:25:15 2009
Recovery of Online Redo Log: Thread 1 Group 1 Seq 1678 Reading mem 0
Mem# 0: /dev/vx/rdsk/datavg/newtrade_redo1_1_1_1g
Mem# 1: /dev/vx/rdsk/datavg/newtrade_redo1_1_2_1g
Fri Jul 24 16:25:15 2009
Completed redo application
Fri Jul 24 16:25:16 2009
Completed instance recovery at
Thread 1: logseq 1678, block 576679, scn 1112404710247
862 data blocks read, 1014 data blocks written, 3990 redo blocks read
Switch log for thread 1 to sequence 1679
隨後,硬體維護人員在現場啟動了節點1的伺服器,在啟動的時候似乎報了一個和記憶體有關的錯誤(我不在現場,聽說的),隨後檢查沒有發現系統的錯誤資訊。
檢查Oracle的alert資訊,沒有發現任何的錯誤。懷疑節點突然崩潰和硬體異常有關。
檢查系統的dmest資訊,也沒有找到系統異常DOWN機的原因。
由於沒有找到錯誤,在節點1啟動後,將節點1上資料庫的例項1也啟動了。
但是沒過多長時間,就碰到了前面描述的錯誤。節點2的alert檔案中詳細的錯誤資訊如下:
Mon Jul 27 15:20:42 2009
MMNL absent for 1202 secs; Foregrounds taking over
Mon Jul 27 15:37:40 2009
MMNL absent for 1599 secs; Foregrounds taking over
Mon Jul 27 15:48:19 2009
IPC Send timeout detected.Sender: ospid 2284
Receiver: inst 1 binc 27438 ospid 3739
Mon Jul 27 15:49:24 2009
IPC Send timeout detected.Sender: ospid 5165
Receiver: inst 1 binc 27438 ospid 3739
Mon Jul 27 15:49:31 2009
IPC Send timeout detected.Sender: ospid 5179
Receiver: inst 1 binc 27438 ospid 3739
.
.
.
Mon Jul 27 15:49:58 2009
IPC Send timeout detected.Sender: ospid 23757
Receiver: inst 1 binc 27438 ospid 3739
Mon Jul 27 15:49:59 2009
IPC Send timeout to 0.2 inc 12 for msg type 32 from opid 46
Mon Jul 27 15:49:59 2009
Communications reconfiguration: instance_number 1
Mon Jul 27 15:50:15 2009
IPC Send timeout to 0.2 inc 12 for msg type 32 from opid 43
Mon Jul 27 15:50:32 2009
Trace dumping is performing id=[cdmp_20090727154959]
Mon Jul 27 15:51:11 2009
Trace dumping is performing id=[cdmp_20090727154959]
Mon Jul 27 15:51:49 2009
Evicting instance 1 from cluster
Mon Jul 27 15:52:14 2009
Waiting for instances to leave:
1
Mon Jul 27 15:52:34 2009
Waiting for instances to leave:
1
Mon Jul 27 15:52:54 2009
Waiting for instances to leave:
1
.
.
.
Mon Jul 27 16:07:35 2009
Waiting for instances to leave:
1
Mon Jul 27 16:07:55 2009
Waiting for instances to leave:
1
Mon Jul 27 16:08:15 2009
Waiting for instances to leave:
1
開始是IPC Send timeout detected錯誤,隨後是Waiting for instances to leave資訊。根據錯誤資訊,以及節點1的響應時間判斷,問題多半就出在節點1上:多半是由於節點1太忙,導致響應時間太長,而節點2要與節點1直接進行互動,在互動時出現了超時錯誤。
為了儘快的解決這個問題,在節點2上執行svrctl嘗試關閉節點1上的例項:
bash-3.00$ srvctl stop inst -d newtrade -i newtrade1
等待了一段時間,節點1上的例項1終於被關閉。而這時節點1上的響應終於恢復正常了。檢查節點1上的alert檔案,發現了大量的錯誤資訊:
Mon Jul 27 14:42:45 2009
WARNING: inbound connection timed out (ORA-3136)
Mon Jul 27 14:43:21 2009
WARNING: inbound connection timed out (ORA-3136)
Mon Jul 27 14:43:21 2009
WARNING: inbound connection timed out (ORA-3136)
.
.
.
Mon Jul 27 14:51:35 2009
WARNING: inbound connection timed out (ORA-3136)
Mon Jul 27 14:52:17 2009
WARNING: inbound connection timed out (ORA-3136)
Mon Jul 27 14:52:35 2009
Process PZ99 died, see its trace file
Mon Jul 27 14:52:35 2009
WARNING: inbound connection timed out (ORA-3136)
Mon Jul 27 14:52:35 2009
WARNING: inbound connection timed out (ORA-3136)
Mon Jul 27 14:53:29 2009
WARNING: inbound connection timed out (ORA-3136)
Mon Jul 27 14:54:18 2009
WARNING: inbound connection timed out (ORA-3136)
Mon Jul 27 14:54:42 2009
ksvcreate: Process(q002) creation failed
Mon Jul 27 14:55:27 2009
WARNING: inbound connection timed out (ORA-3136)
Mon Jul 27 14:55:44 2009
WARNING: inbound connection timed out (ORA-3136)
.
.
.
Mon Jul 27 15:02:27 2009
WARNING: inbound connection timed out (ORA-3136)
Mon Jul 27 15:02:48 2009
WARNING: inbound connection timed out (ORA-3136)
Mon Jul 27 15:05:00 2009
GES: Potential blocker (pid=3815) on resource CI-0000000A-00000002;
enqueue info in file /data/oracle/admin/newtrade/bdump/newtrade1_lmd0_3704.trc and DIAG trace file
Mon Jul 27 15:05:41 2009
WARNING: inbound connection timed out (ORA-3136)
Mon Jul 27 15:05:53 2009
WARNING: inbound connection timed out (ORA-3136)
Mon Jul 27 15:08:05 2009
WARNING: inbound connection timed out (ORA-3136)
Mon Jul 27 15:08:11 2009
WARNING: inbound connection timed out (ORA-3136)
Mon Jul 27 15:08:16 2009
Process PZ98 died, see its trace file
Mon Jul 27 15:08:40 2009
WARNING: inbound connection timed out (ORA-3136)
Mon Jul 27 15:08:41 2009
WARNING: inbound connection timed out (ORA-3136)
.
.
.
Mon Jul 27 15:10:36 2009
WARNING: inbound connection timed out (ORA-3136)
Mon Jul 27 15:10:55 2009
ksvcreate: Process(q001) creation failed
Mon Jul 27 15:11:50 2009
WARNING: inbound connection timed out (ORA-3136)
Mon Jul 27 15:12:26 2009
WARNING: inbound connection timed out (ORA-3136)
Mon Jul 27 15:14:05 2009
WARNING: inbound connection timed out (ORA-3136)
Mon Jul 27 15:14:59 2009
kkjcre1p: unable to spawn jobq slave process
Mon Jul 27 15:14:59 2009
Errors in file /data/oracle/admin/newtrade/bdump/newtrade1_cjq0_3879.trc:
Mon Jul 27 15:15:33 2009
WARNING: inbound connection timed out (ORA-3136)
Mon Jul 27 15:16:12 2009
WARNING: inbound connection timed out (ORA-3136)
Mon Jul 27 15:24:04 2009
WARNING: inbound connection timed out (ORA-3136)
Mon Jul 27 15:24:44 2009
GES: Potential blocker (pid=3815) on resource CI-0000000A-00000002;
enqueue info in file /data/oracle/admin/newtrade/bdump/newtrade1_lmd0_3704.trc and DIAG trace file
Mon Jul 27 15:25:10 2009
GES: Potential blocker (pid=3815) on resource CI-00000001-00000002;
enqueue info in file /data/oracle/admin/newtrade/bdump/newtrade1_lmd0_3704.trc and DIAG trace file
Mon Jul 27 15:25:13 2009
WARNING: inbound connection timed out (ORA-3136)
Mon Jul 27 15:27:07 2009
WARNING: inbound connection timed out (ORA-3136)
Mon Jul 27 15:27:13 2009
WARNING: inbound connection timed out (ORA-3136)
Mon Jul 27 15:27:40 2009
ksvcreate: Process(q001) creation failed
Mon Jul 27 15:27:55 2009
WARNING: inbound connection timed out (ORA-3136)
Mon Jul 27 15:28:16 2009
WARNING: inbound connection timed out (ORA-3136)
Mon Jul 27 15:28:41 2009
WARNING: inbound connection timed out (ORA-3136)
Mon Jul 27 15:33:07 2009
Process startup failed, error stack:
Mon Jul 27 15:33:07 2009
Errors in file /data/oracle/admin/newtrade/bdump/newtrade1_psp0_3671.trc:
ORA-27300: OS system dependent operation:fork failed with status: 11
ORA-27301: OS failure message: Resource temporarily unavailable
ORA-27302: failure occurred at: skgpspawn3
Mon Jul 27 15:33:08 2009
Process q001 died, see its trace file
Mon Jul 27 15:33:46 2009
ksvcreate: Process(q001) creation failed
Mon Jul 27 15:37:54 2009
ksvcreate: Process(q001) creation failed
Mon Jul 27 15:39:41 2009
WARNING: inbound connection timed out (ORA-3136)
Mon Jul 27 15:39:50 2009
WARNING: inbound connection timed out (ORA-3136)
.
.
.
Mon Jul 27 15:43:30 2009
WARNING: inbound connection timed out (ORA-3136)
Mon Jul 27 15:45:30 2009
ksvcreate: Process(q001) creation failed
Mon Jul 27 15:45:40 2009
WARNING: inbound connection timed out (ORA-3136)
Mon Jul 27 15:45:40 2009
WARNING: inbound connection timed out (ORA-3136)
Mon Jul 27 15:46:28 2009
IPC Send timeout detected. Receiver ospid 3739
Mon Jul 27 15:46:48 2009
Errors in file /data/oracle/admin/newtrade/bdump/newtrade1_lms1_3739.trc:
Mon Jul 27 15:47:32 2009
IPC Send timeout detected. Receiver ospid 3739
Mon Jul 27 15:47:39 2009
Errors in file /data/oracle/admin/newtrade/bdump/newtrade1_lms1_3739.trc:
Mon Jul 27 15:48:16 2009
WARNING: inbound connection timed out (ORA-3136)
Mon Jul 27 15:49:27 2009
WARNING: inbound connection timed out (ORA-3136)
.
.
.
Mon Jul 27 15:50:00 2009
WARNING: inbound connection timed out (ORA-3136)
Mon Jul 27 15:50:34 2009
Trace dumping is performing id=[cdmp_20090727154959]
Mon Jul 27 15:52:29 2009
WARNING: inbound connection timed out (ORA-3136)
Mon Jul 27 15:52:30 2009
WARNING: inbound connection timed out (ORA-3136)
.
.
.
Mon Jul 27 15:57:04 2009
WARNING: inbound connection timed out (ORA-3136)
Mon Jul 27 15:57:32 2009
kkjcre1p: unable to spawn jobq slave process
Mon Jul 27 15:57:32 2009
Errors in file /data/oracle/admin/newtrade/bdump/newtrade1_cjq0_3879.trc:
Mon Jul 27 15:57:45 2009
WARNING: inbound connection timed out (ORA-3136)
Mon Jul 27 15:58:33 2009
WARNING: inbound connection timed out (ORA-3136)
Mon Jul 27 15:59:08 2009
WARNING: inbound connection timed out (ORA-3136)
Mon Jul 27 16:00:00 2009
ksvcreate: Process(q001) creation failed
Mon Jul 27 16:01:23 2009
WARNING: inbound connection timed out (ORA-3136)
Mon Jul 27 16:01:23 2009
WARNING: inbound connection timed out (ORA-3136)
.
.
.
Mon Jul 27 16:05:55 2009
WARNING: inbound connection timed out (ORA-3136)
Mon Jul 27 16:06:22 2009
Errors in file /data/oracle/admin/newtrade/bdump/newtrade1_arc0_4240.trc:
ORA-00600: internal error code, arguments: [2103], [1], [0], [1], [900], [], [], []
Mon Jul 27 16:06:24 2009
Errors in file /data/oracle/admin/newtrade/bdump/newtrade1_arc1_4267.trc:
ORA-00600: internal error code, arguments: [2103], [1], [0], [1], [900], [], [], []
Mon Jul 27 16:06:26 2009
Trace dumping is performing id=[cdmp_20090727160625]
Mon Jul 27 16:06:30 2009
Errors in file /data/oracle/admin/newtrade/bdump/newtrade1_arc0_4240.trc:
ORA-00600: internal error code, arguments: [2103], [1], [0], [1], [900], [], [], []
.
.
.
Mon Jul 27 16:06:30 2009
Errors in file /data/oracle/admin/newtrade/bdump/newtrade1_arc1_4267.trc:
ORA-00600: internal error code, arguments: [2103], [1], [0], [1], [900], [], [], []
Mon Jul 27 16:06:54 2009
USER: terminating instance due to error 481
Mon Jul 27 16:06:54 2009
Errors in file /data/oracle/admin/newtrade/bdump/newtrade1_lms1_3739.trc:
ORA-00481: LMON process terminated with error
Mon Jul 27 16:06:54 2009
Errors in file /data/oracle/admin/newtrade/bdump/newtrade1_lms0_3714.trc:
ORA-00481: LMON process terminated with error
Mon Jul 27 16:06:54 2009
Errors in file /data/oracle/admin/newtrade/bdump/newtrade1_ckpt_3815.trc:
ORA-00481: LMON process terminated with error
Mon Jul 27 16:06:54 2009
Errors in file /data/oracle/admin/newtrade/udump/newtrade1_ora_16645.trc:
ORA-00481: LMON process terminated with error
Mon Jul 27 16:06:54 2009
Errors in file /data/oracle/admin/newtrade/udump/newtrade1_ora_16926.trc:
ORA-00481: LMON 程式因錯誤而終止
Mon Jul 27 16:06:54 2009
System state dump is made for local instance
Mon Jul 27 16:06:54 2009
Errors in file /data/oracle/admin/newtrade/udump/newtrade1_ora_16834.trc:
ORA-00481: LMON 程式因錯誤而終止
Mon Jul 27 16:06:55 2009
Errors in file /data/oracle/admin/newtrade/bdump/newtrade1_dbw0_3779.trc:
ORA-00481: LMON process terminated with error
Mon Jul 27 16:06:55 2009
Errors in file /data/oracle/admin/newtrade/bdump/newtrade1_lmd0_3704.trc:
ORA-00481: LMON process terminated with error
Mon Jul 27 16:06:55 2009
Errors in file /data/oracle/admin/newtrade/bdump/newtrade1_pmon_3633.trc:
ORA-00481: LMON process terminated with error
Mon Jul 27 16:06:55 2009
Errors in file /data/oracle/admin/newtrade/bdump/newtrade1_cjq0_3879.trc:
ORA-00604: error occurred at recursive SQL level 1
ORA-00481: LMON process terminated with error
Mon Jul 27 16:06:55 2009
Errors in file /data/oracle/admin/newtrade/bdump/newtrade1_psp0_3671.trc:
ORA-00481: LMON process terminated with error
Mon Jul 27 16:06:55 2009
Errors in file /data/oracle/admin/newtrade/bdump/newtrade1_smon_3842.trc:
ORA-00481: LMON process terminated with error
Mon Jul 27 16:06:55 2009
Errors in file /data/oracle/admin/newtrade/bdump/newtrade1_lck0_4026.trc:
ORA-00481: LMON process terminated with error
Mon Jul 27 16:06:55 2009
Errors in file /data/oracle/admin/newtrade/bdump/newtrade1_lgwr_3805.trc:
ORA-00481: LMON process terminated with error
Mon Jul 27 16:06:55 2009
Errors in file /data/oracle/admin/newtrade/bdump/newtrade1_lmon_3682.trc:
ORA-29740: evicted by member 1, group incarnation 14
Mon Jul 27 16:06:56 2009
Errors in file /data/oracle/admin/newtrade/bdump/newtrade1_smon_3842.trc:
ORA-00481: LMON process terminated with error
Mon Jul 27 16:06:56 2009
Errors in file /data/oracle/admin/newtrade/bdump/newtrade1_mman_3757.trc:
ORA-00481: LMON process terminated with error
System State dumped to trace file /data/oracle/admin/newtrade/bdump/newtrade1_diag_3644.trc
Mon Jul 27 16:07:05 2009
Dump system state for local instance only
Mon Jul 27 16:07:05 2009
Trace dumping is performing id=[cdmp_20090727160705]
Mon Jul 27 16:07:08 2009
Shutting down instance (abort)
License high water mark = 104
Mon Jul 27 16:07:09 2009
Instance terminated by USER, pid = 26184
Mon Jul 27 16:07:13 2009
Instance terminated by USER, pid = 17258
不但有常規的錯誤,還有ORA-600錯誤,看來要解決的問題還真是不少。
來自 “ ITPUB部落格 ” ,連結:http://blog.itpub.net/4227/viewspace-618350/,如需轉載,請註明出處,否則將追究法律責任。
相關文章
- RAC環境出現Waiting for instances to leave資訊(二)AI
- 【RAC】使用一條“ps”命令獲取Linux環境下全部RAC叢集程式資訊Linux
- 一鍵RAC環境配置指令碼指令碼
- rac環境vip在linux下的連線資訊Linux
- 單例項opatch出現RAC資訊單例
- 【Oracle】 RAC 環境刪除oracle 之一Oracle
- RAC環境利用備份恢復RAC資料庫(一)資料庫
- rac環境下standby資料庫的實現資料庫
- rac環境下使用impdp匯入資料出錯
- DB遷移RAC環境
- oracle rac 環境檢測Oracle
- 提取java環境資訊Java
- 11gr2RAC環境解除安裝(一)
- RAC環境重建控制檔案
- ORACLE RAC 環境下修改IPOracle
- RAC環境的恢復策略
- Oracle RAC + Data Guard 環境搭建Oracle
- RAC環境下dataguard的搭建
- 【RAC】在RAC環境下需要惟一設定的重要引數列表
- Oracle RAC 10g for Solaris環境解除安裝(一)Oracle
- 【RAC】使用VMware虛擬機器搭建RAC環境虛擬機
- 存檔-環境資訊aptAPT
- RAC和ASM環境下打patchASM
- RAC 環境下修改歸檔模式模式
- RAC環境下配置OGG同步
- RAC環境中的TNSNAMES檔案
- RAC環境網路故障測試
- RAC環境STANDBY的SWITCHOVER切換
- 【RAC】RAC環境下歸檔日誌格式約定
- RAC環境中的儲存部分管理——RAC管理
- 記錄一次遷移環境 .env 出現的問題
- RAC環境在ASM上建立表空間出錯ORA-569ASM
- Oracle RAC一鍵部署001(主機環境校驗)Oracle
- RAC環境下的RMAN 備份指令碼一例指令碼
- Windows環境下的資訊收集Windows
- DataGuard環境程式資訊追蹤
- 搭建測試環境後出現的問題
- Oracle RAC 環境下的連線管理Oracle