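AIX RAC 9i heartbeat cable disconnection test (continued)
Yesterday's test showed that when the heartbeat cable is disconnected, the cluster evicts node 2 and its database instance shuts down, but HACMP does not shut down. I also forgot to check whether the network adapter takeover happened.
Today's test checks the network adapter takeover and client access.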
[oracle@P61A:/u01/app/oracle]$ifconfig -a
en0: flags=4e080863,80
inet 12.0.0.61 netmask 0xffffff00 broadcast 12.0.0.255
en1: flags=4e080863,80
inet 10.10.1.61 netmask 0xffffff00 broadcast 10.10.1.255
inet 10.10.3.201 netmask 0xffffff00 broadcast 10.10.3.255
inet 10.10.3.101 netmask 0xffffff00 broadcast 10.10.3.255
lo0: flags=e08084b
inet 127.0.0.1 netmask 0xff000000 broadcast 127.255.255.255
inet6 ::1/0
tcp_sendspace 131072 tcp_recvspace 131072 rfc1323 1
[oracle@P61A:/u01/app/oracle]$rsh P61B ifconfig -a
en0: flags=4e080863,80
inet 12.0.0.62 netmask 0xffffff00 broadcast 12.0.0.255
en1: flags=4e080863,80
inet 10.10.1.62 netmask 0xffffff00 broadcast 10.10.1.255
inet 10.10.3.202 netmask 0xffffff00 broadcast 10.10.3.255
inet 10.10.3.102 netmask 0xffffff00 broadcast 10.10.3.255
lo0: flags=e08084b
inet 127.0.0.1 netmask 0xff000000 broadcast 127.255.255.255
inet6 ::1/0
tcp_sendspace 131072 tcp_recvspace 131072 rfc1323 1
21:59:45 Unplugged the heartbeat network cable on P61B
[oracle@P61A:/u01/app/oracle]$date
Tue May 12 21:59:48 CDT 2009
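Remote connections to instances 1 and 2 still work, and some views can be queried, but not all of them. The system hangs, and statements that modify data or the data dictionary cannot be executed.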
P61B
Tue May 12 22:04:55 2009
IPC Send timeout detected. Sender ospid 340088
Tue May 12 22:05:20 2009
IPC Send timeout detected. Sender ospid 385220
Tue May 12 22:05:27 2009
Communications reconfiguration: instance 0
Tue May 12 22:05:32 2009
IPC Send timeout detected. Sender ospid 364746
Tue May 12 22:05:32 2009
IPC Send timeout detected. Sender ospid 254204
Tue May 12 22:05:57 2009
Trace dumping is performing id=[cdmp_20090512220527]
Tue May 12 22:06:00 2009
IPC Send timeout detected. Sender ospid 389318
Tue May 12 22:06:23 2009
Waiting for clusterware split-brain resolution
Tue May 12 22:06:52 2009
IPC Send timeout detected. Sender ospid 356598
Tue May 12 22:07:54 2009
Trace dumping is performing id=[cdmp_20090512220724]
Tue May 12 22:10:34 2009
IPC Send timeout detected. Sender ospid 327754
Tue May 12 22:16:23 2009
Errors in file /u01/app/oracle/admin/rac/bdump/rac2_lmon_356598.trc:
ORA-29740: evicted by member 1, group incarnation 3
Tue May 12 22:16:23 2009
LMON: terminating instance due to error 29740
Instance terminated by LMON, pid = 356598
P61A
Tue May 12 22:05:11 2009
IPC Send timeout detected. Sender ospid 450792
Tue May 12 22:05:23 2009
IPC Send timeout detected. Sender ospid 233720
Tue May 12 22:05:23 2009
IPC Send timeout detected. Sender ospid 266430
Tue May 12 22:05:42 2009
IPC Send timeout detected. Sender ospid 225300
Communications reconfiguration: instance 1
Waiting for clusterware split-brain resolution
Tue May 12 22:06:44 2009
Trace dumping is performing id=[cdmp_20090512220614]
Tue May 12 22:16:13 2009
Evicting instance 2 from cluster
Tue May 12 22:16:19 2009
Reconfiguration started (old inc 2, new inc 4)
List of nodes:
0
Nested/batched reconfiguration detected.
Global Resource Directory frozen
one node partition
Communication channels reestablished
Master broadcasted resource hash value bitmaps
Non-local Process blocks cleaned out
Resources and enqueues cleaned out
Resources remastered 699
745 GCS shadows traversed, 0 cancelled, 0 closed
304 GCS resources traversed, 0 cancelled
set master node info
Submitted all remote-enqueue requests
Update rdomain variables
Dwn-cvts replayed, VALBLKs dubious
All grantable enqueues granted
745 GCS shadows traversed, 0 replayed, 0 unopened
Submitted all GCS remote-cache requests
24 write requests issued in 745 GCS resources
6 PIs marked suspect, 0 flush PI msgs
Tue May 12 22:16:19 2009
Reconfiguration complete
Post SMON to start 1st pass IR
Tue May 12 22:16:19 2009
Instance recovery: looking for dead threads
Tue May 12 22:16:19 2009
Beginning instance recovery of 1 threads
Tue May 12 22:16:19 2009
Started redo scan
Tue May 12 22:16:19 2009
Completed redo scan
569 redo blocks read, 28 data blocks need recovery
Tue May 12 22:16:22 2009
Started recovery at
Thread 2: logseq 7, block 3, scn 0.0
Tue May 12 22:16:22 2009
Recovery of Online Redo Log: Thread 2 Group 3 Seq 7 Reading mem 0
Mem# 0 errs 0: /dev/rtrac_redo2_11
Tue May 12 22:16:22 2009
Completed redo application
Tue May 12 22:16:22 2009
Ended recovery at
Thread 2: logseq 7, block 572, scn 0.271410
2 data blocks read, 28 data blocks written, 569 redo blocks read
Ending instance recovery of 1 threads
SMON: about to recover undo segment 11
SMON: mark undo segment 11 as available
SMON: about to recover undo segment 12
SMON: mark undo segment 12 as available
SMON: about to recover undo segment 13
SMON: mark undo segment 13 as available
SMON: about to recover undo segment 14
SMON: mark undo segment 14 as available
SMON: about to recover undo segment 15
SMON: mark undo segment 15 as available
SMON: about to recover undo segment 16
SMON: mark undo segment 16 as available
SMON: about to recover undo segment 17
SMON: mark undo segment 17 as available
SMON: about to recover undo segment 18
SMON: mark undo segment 18 as available
SMON: about to recover undo segment 19
SMON: mark undo segment 19 as available
SMON: about to recover undo segment 20
SMON: mark undo segment 20 as available
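It takes about 5 minutes for the split-brain to be detected (the first IPC send timeout is logged at 22:04:55) and about 17 minutes for it to be resolved (instance 2 is evicted at 22:16).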
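The two durations above can be checked against the timestamps in the logs. The following is a minimal sketch in Python, assuming the cable-pull time noted in this post (21:59:45) and the alert-log timestamps come from clocks that agree with each other; the helper name `elapsed_minutes` is made up for illustration.

```python
from datetime import datetime

# Timestamp layout used by the alert log lines, e.g. "Tue May 12 22:04:55 2009".
ALERT_LOG_FORMAT = "%a %b %d %H:%M:%S %Y"


def elapsed_minutes(start: str, end: str) -> float:
    """Return the number of minutes between two alert-log style timestamps."""
    delta = datetime.strptime(end, ALERT_LOG_FORMAT) - datetime.strptime(
        start, ALERT_LOG_FORMAT
    )
    return delta.total_seconds() / 60


# Assumption: the cable-pull time is the author's note, rewritten in log format.
cable_pulled = "Tue May 12 21:59:45 2009"
first_ipc_timeout = "Tue May 12 22:04:55 2009"    # P61B: first IPC send timeout
eviction_started = "Tue May 12 22:16:13 2009"     # P61A: evicting instance 2
instance_terminated = "Tue May 12 22:16:23 2009"  # P61B: terminated by LMON

detect = elapsed_minutes(cable_pulled, first_ipc_timeout)
resolve = elapsed_minutes(cable_pulled, instance_terminated)
print(f"detected after {detect:.1f} min, resolved after {resolve:.1f} min")
```

With these timestamps the detection delay comes out slightly above 5 minutes and the total time to eviction between 16 and 17 minutes, which matches the rough figures in the text.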
[oracle@P61A:/u01/app/oracle]$ifconfig -a
en0: flags=4e080863,80
inet 12.0.0.61 netmask 0xffffff00 broadcast 12.0.0.255
en1: flags=4e080863,80
inet 10.10.1.61 netmask 0xffffff00 broadcast 10.10.1.255
inet 10.10.3.201 netmask 0xffffff00 broadcast 10.10.3.255
inet 10.10.3.101 netmask 0xffffff00 broadcast 10.10.3.255
lo0: flags=e08084b
inet 127.0.0.1 netmask 0xff000000 broadcast 127.255.255.255
inet6 ::1/0
tcp_sendspace 131072 tcp_recvspace 131072 rfc1323 1
[oracle@P61A:/u01/app/oracle]$rsh P61B ifconfig -a
en0: flags=4e080863,80
inet 12.0.0.62 netmask 0xffffff00 broadcast 12.0.0.255
en1: flags=4e080863,80
inet 10.10.1.62 netmask 0xffffff00 broadcast 10.10.1.255
inet 10.10.3.202 netmask 0xffffff00 broadcast 10.10.3.255
inet 10.10.3.102 netmask 0xffffff00 broadcast 10.10.3.255
lo0: flags=e08084b
inet 127.0.0.1 netmask 0xff000000 broadcast 127.255.255.255
inet6 ::1/0
tcp_sendspace 131072 tcp_recvspace 131072 rfc1323 1
The IP addresses did not fail over, and there is no way they could have.
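Comparing the two `ifconfig -a` captures by eye is error-prone, so the check can be scripted. The sketch below is one possible way to do it in Python, assuming the output is laid out as pasted in this post (an interface header line followed by `inet` lines); the function names `parse_ifconfig` and `address_changes` are hypothetical helpers, not part of AIX or HACMP.

```python
import re


def parse_ifconfig(text):
    """Map each interface name to the list of IPv4 addresses listed under it."""
    addrs = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        header = re.match(r"(\w+): flags=", line)
        if header:
            current = header.group(1)
            addrs[current] = []
        elif line.startswith("inet ") and current is not None:
            # "inet6" lines and the tcp_sendspace line are skipped on purpose.
            addrs[current].append(line.split()[1])
    return addrs


def address_changes(before, after):
    """Return {interface: (added, removed)} for interfaces whose addresses changed."""
    changes = {}
    for iface in sorted(set(before) | set(after)):
        old = set(before.get(iface, []))
        new = set(after.get(iface, []))
        if old != new:
            changes[iface] = (sorted(new - old), sorted(old - new))
    return changes


# P61A output as captured before the cable was pulled.
P61A_BEFORE = """\
en0: flags=4e080863,80
inet 12.0.0.61 netmask 0xffffff00 broadcast 12.0.0.255
en1: flags=4e080863,80
inet 10.10.1.61 netmask 0xffffff00 broadcast 10.10.1.255
inet 10.10.3.201 netmask 0xffffff00 broadcast 10.10.3.255
inet 10.10.3.101 netmask 0xffffff00 broadcast 10.10.3.255
lo0: flags=e08084b
inet 127.0.0.1 netmask 0xff000000 broadcast 127.255.255.255
inet6 ::1/0
tcp_sendspace 131072 tcp_recvspace 131072 rfc1323 1
"""

# The capture taken after the eviction is identical in this post.
P61A_AFTER = P61A_BEFORE

print(address_changes(parse_ifconfig(P61A_BEFORE), parse_ifconfig(P61A_AFTER)))
```

For the captures in this post the result is an empty dict, meaning no address was added to or removed from any interface on P61A. If a takeover had happened, the addresses 10.10.3.202 and 10.10.3.102 from P61B would be expected to show up as added on `en1`.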
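I ran the same test on Linux. There, too, it takes about 15 minutes to resolve the split-brain; one node is forcibly shut down, but its host does not reboot.
The ORACM process crashes, while the GSD process keeps running.
Configuring a service IP for 9i seems to have little value. 9i does not handle split-brain by rebooting the host, so a service IP controlled by HACMP would need a dedicated script to fail over, which is hardly worth the effort.
Compare how the Oracle clusterware on Linux handles this: it has no concept of a service IP either.
9i resolves a split-brain far too slowly, taking roughly 15 minutes. During that time some queries can still run (presumably those whose data is already in the SGA and which need no new SQL parse); any other new query simply hangs until it times out.
Under 9i, having HACMP manage the volume groups, as with 10g, should be enough.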
Source: ITPUB blog, link: http://blog.itpub.net/8242091/viewspace-594945/. If you reproduce this article, please credit the source; otherwise legal action will be pursued.