Linux: NetworkManager causes abnormal node reboots

Posted by westzq1984 on 2013-09-05
My inference is that NetworkManager is the cause; the underlying reason is still to be investigated.

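Today I installed a test RAC on VMware virtual machines and ran into another problem I could not make sense of.
Once CRS was installed, the node rebooted as soon as I logged in to the graphical desktop, and it happened every time.
Without a graphical login I watched the node for an hour with no problems; as soon as I logged in, it rebooted immediately.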

In the OS log, the entries written after a graphical login and just before the reboot were:
Sep  5 19:29:18 dm01db01 nm-system-settings: Loaded plugin ifcfg-rh: (c) 2007 - 2008 Red Hat, Inc.  To report bugs please use the NetworkManager mailing list.
Sep  5 19:29:18 dm01db01 nm-system-settings:    ifcfg-rh: parsing /etc/sysconfig/network-scripts/ifcfg-lo ...
Sep  5 19:29:18 dm01db01 nm-system-settings:    ifcfg-rh: parsing /etc/sysconfig/network-scripts/ifcfg-eth1 ...
Sep  5 19:29:18 dm01db01 nm-system-settings:    ifcfg-rh:     read connection 'System eth1'
Sep  5 19:29:18 dm01db01 nm-system-settings:    ifcfg-rh: parsing /etc/sysconfig/network-scripts/ifcfg-eth0 ...
Sep  5 19:29:18 dm01db01 nm-system-settings:    ifcfg-rh:     read connection 'System eth0'

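The clusterware wrote no log entries at all. It was as if someone had reset the machine directly, and I could find no cause.
Pinging the heartbeat network occasionally showed latencies above 200 ms, but just before the reboot the ping times were all within a few ms.
Monitoring with vmstat showed no problem with CPU utilization either.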
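
To make the occasional slow heartbeat replies easier to spot, the ping output can be filtered by round-trip time. This is a minimal sketch and not what was used in the original test: the function name slow_pings, the default threshold of 100 ms, and the hostname in the usage comment are all assumptions.

```shell
#!/bin/sh
# Hypothetical helper: read ping output on stdin and print only the reply
# lines whose round-trip time is above a threshold in milliseconds.
slow_pings() {
    threshold="${1:-100}"   # assumed default; choose what suits the interconnect
    awk -v t="$threshold" '
        match($0, /time=[0-9.]+/) {
            # Skip the 5 characters of "time=" to get the numeric value.
            rtt = substr($0, RSTART + 5, RLENGTH - 5)
            if (rtt + 0 > t) print
        }'
}

# Usage (node2-priv is a placeholder for the other node's heartbeat address):
#   ping -c 600 node2-priv | slow_pings 100
```

Lines without a time= field, such as the ping header and summary, are dropped.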

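I tested the following adjustments:
1. Increased misscount: no effect.
2. Adjusted diagwait: still no log entries.
3. Disabled unneeded services: no effect.
4. Switched to a different network segment: no effect.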

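I kept suspecting the network, so I searched for the keyword ifcfg-rh and found the note OEL: Error: Missing Or Invalid IP4 Prefix '0' On Linux Server (Doc ID 1522095.1).
The symptom described there is unrelated to my problem, but as a last resort I followed the note and disabled NetworkManager: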

1. Add NM_CONTROLLED="no" to /etc/sysconfig/network-scripts/ifcfg-eth*
2.chkconfig NetworkManager off
3.reboot
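
Step 1 can be scripted when there are several interfaces. The following is a rough sketch, assuming GNU sed (for -i); the function name set_nm_uncontrolled is made up for illustration, and the directory argument exists only so the function can be tried on a copy of the files first.

```shell
#!/bin/sh
# Hypothetical helper: make sure every ifcfg-eth* file in a directory
# contains NM_CONTROLLED="no". The default directory is the one from step 1.
set_nm_uncontrolled() {
    dir="${1:-/etc/sysconfig/network-scripts}"
    for f in "$dir"/ifcfg-eth*; do
        [ -f "$f" ] || continue
        if grep -q '^NM_CONTROLLED=' "$f"; then
            # A setting already exists: overwrite it in place.
            sed -i 's/^NM_CONTROLLED=.*/NM_CONTROLLED="no"/' "$f"
        else
            # Add a newline first if the file does not end with one.
            [ -n "$(tail -c1 "$f")" ] && echo >> "$f"
            echo 'NM_CONTROLLED="no"' >> "$f"
        fi
    done
}

# Usage, as root, followed by steps 2 and 3:
#   set_nm_uncontrolled
#   chkconfig NetworkManager off
#   reboot
```

Only ifcfg-eth* files are touched, matching step 1; ifcfg-lo is left alone.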

After the reboot the host ran normally. The OS log showed:
Sep  5 19:41:06 dm01db01 nm-system-settings: Loaded plugin ifcfg-rh: (c) 2007 - 2008 Red Hat, Inc.  To report bugs please use the NetworkManager mailing list.
Sep  5 19:41:06 dm01db01 nm-system-settings:    ifcfg-rh: parsing /etc/sysconfig/network-scripts/ifcfg-lo ...
Sep  5 19:41:06 dm01db01 nm-system-settings:    ifcfg-rh: parsing /etc/sysconfig/network-scripts/ifcfg-eth1 ...
Sep  5 19:41:06 dm01db01 nm-system-settings:    ifcfg-rh:     read connection 'System eth1'
Sep  5 19:41:06 dm01db01 nm-system-settings:    ifcfg-rh: Ignoring connection 'System eth1' and its device because NM_CONTROLLED was false.
Sep  5 19:41:06 dm01db01 nm-system-settings:    ifcfg-rh: parsing /etc/sysconfig/network-scripts/ifcfg-eth0 ...
Sep  5 19:41:06 dm01db01 nm-system-settings:    ifcfg-rh:     read connection 'System eth0'
Sep  5 19:41:06 dm01db01 nm-system-settings:    ifcfg-rh: Ignoring connection 'System eth0' and its device because NM_CONTROLLED was false.

As you can see, the connections are now ignored.
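
The same check can be done without reading the log by eye. This sketch extracts the connection names from the "Ignoring connection" messages shown above; the function name nm_ignored_connections is made up, and /var/log/messages as the default log file is an assumption, since the post only says "OS log".

```shell
#!/bin/sh
# Hypothetical helper: list the connections that nm-system-settings reported
# as ignored because NM_CONTROLLED was false, one name per line.
nm_ignored_connections() {
    log="${1:-/var/log/messages}"   # assumed location of the OS log
    sed -n "s/.*ifcfg-rh: Ignoring connection '\([^']*\)' and its device because NM_CONTROLLED was false.*/\1/p" "$log" | sort -u
}

# Usage:
#   nm_ignored_connections
```

Against the log excerpt above, this should list System eth0 and System eth1.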

I am recording this for now and will look into it later.

Version information
[root@dm01db01 network-scripts]# cat /etc/issue
Oracle Linux Server release 5.9
Kernel \r on an \m

[root@dm01db01 network-scripts]# uname -a
Linux dm01db01 2.6.39-300.26.1.el5uek #1 SMP Thu Jan 3 18:31:38 PST 2013 x86_64 x86_64 x86_64 GNU/Linux
[root@dm01db01 network-scripts]# /u01/app/oracle/product/crs/bin/crsctl query crs activeversion
CRS active version on the cluster is [10.2.0.5.0]

Source: "ITPUB Blog", link: http://blog.itpub.net/8242091/viewspace-772247/. If you repost this, please credit the source; otherwise legal action will be pursued.
