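Cleared IP Addresses Cause Instance Restarts
A customer's 10.2.0.4 RAC environment on Solaris 10 suddenly started experiencing instance restarts.
The database ran normally until about 3 p.m., after which the two nodes restarted one after the other, and the instance on one of them failed to start automatically. The alert logs of both instances show that, before the restarts, both nodes reported clear ORA-27504 errors: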
Wed Apr 10 15:00:05 2013
Errors in file /oracle/admin/orcl/udump/orcl1_ora_10997.trc:
ORA-00603: ORACLE server session terminated by fatal error
ORA-27504: IPC error creating OSD context
ORA-27300: OS system dependent operation:if_not_found failed with status: 0
ORA-27301: OS failure message: Error 0
ORA-27302: failure occurred at: skgxpvaddr9
ORA-27303: additional information: requested interface 192.168.168.3 not found.
Check output from ifconfig command
Wed Apr 10 15:00:06 2013
Errors in file /oracle/admin/orcl/udump/orcl1_ora_11007.trc:
ORA-00603: ORACLE server session terminated by fatal error
ORA-27504: IPC error creating OSD context
ORA-27300: OS system dependent operation:if_not_found failed with status: 0
ORA-27301: OS failure message: Error 0
ORA-27302: failure occurred at: skgxpvaddr9
ORA-27303: additional information: requested interface 192.168.168.3 not found.
Check output from ifconfig command
Wed Apr 10 15:00:06 2013
Errors in file /oracle/admin/orcl/udump/orcl1_ora_11009.trc:
ORA-00603: ORACLE server session terminated by fatal error
ORA-27504: IPC error creating OSD context
ORA-27300: OS system dependent operation:if_not_found failed with status: 0
ORA-27301: OS failure message: Error 0
ORA-27302: failure occurred at: skgxpvaddr9
ORA-27303: additional information: requested interface 192.168.168.3 not found.
Check output from ifconfig command
Wed Apr 10 15:00:06 2013
Errors in file /oracle/admin/orcl/udump/orcl1_ora_11011.trc:
ORA-00603: ORACLE server session terminated by fatal error
ORA-27504: IPC error creating OSD context
ORA-27300: OS system dependent operation:if_not_found failed with status: 0
ORA-27301: OS failure message: Error 0
ORA-27302: failure occurred at: skgxpvaddr9
ORA-27303: additional information: requested interface 192.168.168.3 not found.
Check output from ifconfig command
.
.
.
Wed Apr 10 15:07:08 2013
IPC Send timeout detected.Sender: ospid 25688
Receiver: inst 2 binc 427282 ospid 11838
Wed Apr 10 15:07:08 2013
IPC Send timeout detected.Sender: ospid 25724
Wed Apr 10 15:07:08 2013
IPC Send timeout detected.Sender: ospid 25680
Receiver: inst 2 binc 431591 ospid 11822
Receiver: inst 2 binc 431795 ospid 11874
Wed Apr 10 15:07:08 2013
IPC Send timeout detected.Sender: ospid 25684
Receiver: inst 2 binc 428985 ospid 11826
Wed Apr 10 15:07:08 2013
IPC Send timeout detected.Sender: ospid 25708
Receiver: inst 2 binc 430048 ospid 11858
Wed Apr 10 15:07:09 2013
ospid 25678: network interface with IP address 192.168.168.3 no longer
operational
requested interface 192.168.168.3 not found. Check output from ifconfig command
Wed Apr 10 15:07:35 2013
IPC Send timeout to 1.1 inc 4 for msg type 44 from opid 7
Wed Apr 10 15:07:35 2013
IPC Send timeout to 1.12 inc 4 for msg type 44 from opid 21
Wed Apr 10 15:07:35 2013
IPC Send timeout to 1.2 inc 4 for msg type 44 from opid 8
Wed Apr 10 15:07:35 2013
IPC Send timeout to 1.3 inc 4 for msg type 44 from opid 10
Wed Apr 10 15:07:35 2013
IPC Send timeout to 1.8 inc 4 for msg type 44 from opid 15
Wed Apr 10 15:08:13 2013
ospid 25678: network interface with IP address 192.168.168.3 no longer
operational
requested interface 192.168.168.3 not found. Check output from ifconfig command
Wed Apr 10 15:08:16 2013
IPC Send timeout detected.Sender: ospid 25748
Receiver: inst 2 binc 430164 ospid 11890
.
.
.
Wed Apr 10 15:08:53 2013
IPC Send timeout to 1.13 inc 4 for msg type 36 from opid 176
Wed Apr 10 15:08:53 2013
IPC Send timeout to 1.15 inc 4 for msg type 36 from opid 167
Wed Apr 10 15:08:57 2013
IPC Send timeout to 1.4 inc 4 for msg type 32 from opid 180
.
.
.
Wed Apr 10 15:15:51 2013
Evicting instance 2 from cluster
Wed Apr 10 15:16:09 2013
ospid 25678: network interface with IP address 192.168.168.3 no longer
operational
requested interface 192.168.168.3 not found. Check output from ifconfig command
Wed Apr 10 15:16:40 2013
Waiting for instances to leave:
2
Wed Apr 10 15:17:00 2013
Waiting for instances to leave:
2
Wed Apr 10 15:17:09 2013
ospid 25678: network interface with IP address 192.168.168.3 no longer
operational
requested interface 192.168.168.3 not found. Check output from ifconfig command
Wed Apr 10 15:17:20 2013
Waiting for instances to leave:
2
The error messages on node 2 are similar:
.
.
.
Wed Apr 10 15:19:07 2013
Errors in file /oracle/admin/orcl/udump/orcl2_ora_14065.trc:
ORA-00603: ORACLE server session terminated by fatal error
ORA-27504: IPC error creating OSD context
ORA-27300: OS system dependent operation:if_not_found failed with status: 0
ORA-27301: OS failure message: Error 0
ORA-27302: failure occurred at: skgxpvaddr9
ORA-27303: additional information: requested interface 192.168.168.4 not found.
Check output from ifconfig command
Wed Apr 10 15:19:08 2013
Errors in file /oracle/admin/orcl/udump/orcl2_ora_14057.trc:
ORA-00603: ORACLE server session terminated by fatal error
ORA-27504: IPC error creating OSD context
ORA-27300: OS system dependent operation:if_not_found failed with status: 0
ORA-27301: OS failure message: Error 0
ORA-27302: failure occurred at: skgxpvaddr9
ORA-27303: additional information: requested interface 192.168.168.4 not found.
Check output from ifconfig command
Wed Apr 10 15:19:46 2013
ospid 11820: network interface with IP address 192.168.168.4 no longer
operational
requested interface 192.168.168.4 not found. Check output from ifconfig command
Wed Apr 10 15:20:46 2013
ospid 11820: network interface with IP address 192.168.168.4 no longer
operational
requested interface 192.168.168.4 not found. Check output from ifconfig command
Wed Apr 10 15:20:55 2013
Errors in file /oracle/admin/orcl/bdump/orcl2_lmon_11818.trc:
ORA-29740: evicted by member 0, group incarnation 6
Wed Apr 10 15:20:55 2013
LMON: terminating instance due to error 29740
Wed Apr 10 15:20:55 2013
Errors in file /oracle/admin/orcl/bdump/orcl2_smon_11924.trc:
ORA-29740: evicted by member , group incarnation
Wed Apr 10 15:20:55 2013
Errors in file /oracle/admin/orcl/bdump/orcl2_lmse_11886.trc:
ORA-29740: evicted by member , group incarnation
Wed Apr 10 16:11:37 2013
Starting ORACLE instance (normal)
Wed Apr 10 16:11:45 2013
sculkget: failed to lock /oracle/products/10.2/db_1/dbs/lkinstorcl2 exclusive
Wed Apr 10 16:11:45 2013
sculkget: lock held by PID: 6912
Wed Apr 10 16:11:45 2013
Oracle Instance Startup operation failed. Another process may be attempting to
startup or shutdown this Instance.
Wed Apr 10 16:11:45 2013
Failed to acquire instance startup/shutdown serialization primitive
Wed Apr 10 16:11:50 2013
sculkget: failed to lock /oracle/products/10.2/db_1/dbs/lkinstorcl2 exclusive
Wed Apr 10 16:11:50 2013
sculkget: lock held by PID: 6912
Wed Apr 10 16:11:50 2013
Oracle Instance Startup operation failed. Another process may be attempting to
startup or shutdown this Instance.
Wed Apr 10 16:11:50 2013
Failed to acquire instance startup/shutdown serialization primitive
Wed Apr 10 16:11:54 2013
sculkget: failed to lock /oracle/products/10.2/db_1/dbs/lkinstorcl2 exclusive
Wed Apr 10 16:11:54 2013
sculkget: lock held by PID: 6912
Wed Apr 10 16:11:54 2013
Oracle Instance Startup operation failed. Another process may be attempting to
startup or shutdown this Instance.
Wed Apr 10 16:11:54 2013
Failed to acquire instance startup/shutdown serialization primitive
Wed Apr 10 16:12:29 2013
sculkget: failed to lock /oracle/products/10.2/db_1/dbs/lkinstorcl2 exclusive
Wed Apr 10 16:12:29 2013
sculkget: lock held by PID: 6912
Wed Apr 10 16:12:29 2013
Oracle Instance Startup operation failed. Another process may be attempting to
startup or shutdown this Instance.
Wed Apr 10 16:12:29 2013
Failed to acquire instance startup/shutdown serialization primitive
Wed Apr 10 16:12:47 2013
sculkget: failed to lock /oracle/products/10.2/db_1/dbs/lkinstorcl2 exclusive
Wed Apr 10 16:12:47 2013
sculkget: lock held by PID: 6912
Wed Apr 10 16:12:47 2013
Oracle Instance Startup operation failed. Another process may be attempting to
startup or shutdown this Instance.
Wed Apr 10 16:12:47 2013
Failed to acquire instance startup/shutdown serialization primitive
Wed Apr 10 16:12:52 2013
sculkget: failed to lock /oracle/products/10.2/db_1/dbs/lkinstorcl2 exclusive
Wed Apr 10 16:12:52 2013
sculkget: lock held by PID: 6912
Wed Apr 10 16:12:52 2013
Oracle Instance Startup operation failed. Another process may be attempting to
startup or shutdown this Instance.
Wed Apr 10 16:12:52 2013
Failed to acquire instance startup/shutdown serialization primitive
Wed Apr 10 16:12:56 2013
sculkget: failed to lock /oracle/products/10.2/db_1/dbs/lkinstorcl2 exclusive
Wed Apr 10 16:12:56 2013
sculkget: lock held by PID: 6912
Wed Apr 10 16:12:56 2013
Oracle Instance Startup operation failed. Another process may be attempting to
startup or shutdown this Instance.
Wed Apr 10 16:12:56 2013
Failed to acquire instance startup/shutdown serialization primitive
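The cause is easy to work out from the error messages: the IP address on node 2 was changed, which broke the heartbeat communication. Node 1 tried to evict node 2 from the cluster, but because it could not communicate with node 2, it could only wait for node 2 to restart.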
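The reading of the alert log above can be sketched as a small script. This is a minimal sketch, not a tool used in the original investigation: the function name summarize_missing_interfaces and the sample text are illustrative assumptions, and the patterns only cover the timestamp and "requested interface ... not found" lines in the form quoted in this article.

```python
import re
from collections import Counter

# Matches the "requested interface <ip> not found" text that appears both in
# the ORA-27303 lines and in the "no longer operational" messages.
MISSING_IF = re.compile(r"requested interface (\d{1,3}(?:\.\d{1,3}){3}) not found")
# Matches alert-log timestamp lines such as "Wed Apr 10 15:00:05 2013".
TIMESTAMP = re.compile(r"^[A-Z][a-z]{2} [A-Z][a-z]{2} +\d{1,2} \d{2}:\d{2}:\d{2} \d{4}$")


def summarize_missing_interfaces(alert_log_text):
    """Return {ip: (first_timestamp, count)} for each interface reported missing."""
    first_seen = {}
    counts = Counter()
    current_ts = None
    for raw in alert_log_text.splitlines():
        line = raw.strip()
        if TIMESTAMP.match(line):
            current_ts = line  # remember the most recent timestamp line
            continue
        match = MISSING_IF.search(line)
        if match:
            ip = match.group(1)
            counts[ip] += 1
            first_seen.setdefault(ip, current_ts)
    return {ip: (first_seen[ip], counts[ip]) for ip in counts}


# Hypothetical sample assembled from lines quoted in this article.
sample = """Wed Apr 10 15:00:05 2013
Errors in file /oracle/admin/orcl/udump/orcl1_ora_10997.trc:
ORA-27303: additional information: requested interface 192.168.168.3 not found.
Check output from ifconfig command
Wed Apr 10 15:07:09 2013
ospid 25678: network interface with IP address 192.168.168.3 no longer
operational
requested interface 192.168.168.3 not found. Check output from ifconfig command
"""

print(summarize_missing_interfaces(sample))
# {'192.168.168.3': ('Wed Apr 10 15:00:05 2013', 2)}
```

Running the same function over each node's alert log gives the first time each interconnect address was reported missing, which can then be compared with the operating system log.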
Checking the operating system log on node 2:
Apr 10 15:00:04 bj-sst-xhm-3f2-m5k-02 ip: [ID 482227 kern.notice] ip_arp_done: init failed
Apr 10 15:07:37 bj-sst-xhm-3f2-m5k-02 Had[4135]: [ID 702911 daemon.notice] VCS CRITICAL V-16-1-50086 CPU usage on bj-sst-xhm-3f2-m5k-02 is 92%
Apr 10 15:18:41 bj-sst-xhm-3f2-m5k-02 sshd[13485]: [ID 800047 auth.error] error: Failed to allocate internet-domain X11 display socket.
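The ip_arp_done: init failed message logged at 15:00:04 indicates that the network interface was configured using host name information and that the host's IP address was modified online.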
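Finally, the shell history confirmed it: someone had logged in as root and meant to run ifconfig -a6 to check the IPv6 addresses, but mistyped the command as ifconfig -a 6, with an extra space between the a and the 6. That set every IP address on the host to 0.0.0.0 and caused the errors above.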
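The difference between the two commands comes down to how the arguments are split: -a6 is a single option, while -a 6 passes 6 as a separate operand. A minimal sketch of a pre-flight check based on that observation follows. The function looks_read_only is a hypothetical helper, not part of ifconfig or of any Oracle or Solaris tool, and its rule (more than one argument means the call may change configuration) is a deliberately conservative assumption rather than a full parser of ifconfig syntax.

```python
import shlex


def looks_read_only(ifconfig_args):
    """Conservative heuristic: treat the call as display-only only when it has
    at most one argument (for example "-a", "-a6", or one interface name).
    Anything after that first argument is assumed to be a possible change."""
    return len(list(ifconfig_args)) <= 1


def check_command(command_line):
    """Split a command line like the shell would and classify an ifconfig call."""
    words = shlex.split(command_line)
    if not words or words[0] != "ifconfig":
        raise ValueError("not an ifconfig command")
    return looks_read_only(words[1:])


print(check_command("ifconfig -a6"))   # True: a single option, display only
print(check_command("ifconfig -a 6"))  # False: "6" is a separate operand
```

A wrapper built on such a check could ask for confirmation before passing a flagged command to the real ifconfig. It would also flag some harmless commands, which is an acceptable cost for a guard used in a root session.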
This shows once again that for a privileged user such as root, any carelessness can lead to very serious consequences.
From the ITPUB blog, link: http://blog.itpub.net/4227/viewspace-1060787/. Please credit the source when reposting; otherwise legal liability will be pursued.