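For an IOS upgrade, the two Cisco 4006 core switches were rebooted at 2:00 a.m. on September 23. This caused the two-node cluster for the SMP service (SUN E6500 + A3500FC + Solaris 8 + Sun Cluster 2.2 (SC2.2)) to hang.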

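Neither SMP host was reachable on the network, and ping failed. We connected a serial cable to each host's serial port and ran go at the ok prompt. There was no response, which meant the systems had already hung.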

Rebooting the two Cisco 4006 core switches interrupted the network, and the interruption caused the two-node cluster to hang.

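1. Log in through the serial console and reboot both hosts.

2. After the systems have booted normally, check that their status is healthy.

3. Start the primary and standby nodes of the cluster as follows: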

1) Check the current cluster status:

Smp1#hastat

2) Start the cluster on the primary node:

Smp1#scadmin startcluster smp smpcluster   # syntax: scadmin startcluster localnodename clustname

3) Check that the primary has started:

Smp1#hastat

4) Start the standby node:

Smp2#scadmin startnode

5) Check that the cluster is running normally:

Smp1#hastat

Getting Information from all the nodes ......

HIGH AVAILABILITY CONFIGURATION AND STATUS

-------------------------------------------

LIST OF NODES CONFIGURED IN CLUSTER

smp1 smp2

CURRENT MEMBERS OF THE CLUSTER

smp1 is a cluster member

smp2 is a cluster member

CONFIGURATION STATE OF THE CLUSTER

Configuration State on smp1: Stable

Configuration State on smp2: Stable

UPTIME OF NODES IN THE CLUSTER

uptime of smp1: 4:49pm up 85 day(s), 23:41, 13 users, load average: 1.59, 1.59, 1.58

uptime of smp2: 4:49pm up 463 day(s), 1:35, 1 user, load average: 1.43, 1.40, 1.41

LOGICAL HOSTS MASTERED BY THE CLUSTER MEMBERS

Logical Hosts Mastered on smp1:

smp

Logical Hosts for which smp1 is Backup Node:

None

Logical Hosts Mastered on smp2:

None

Logical Hosts for which smp2 is Backup Node:

smp

LOGICAL HOSTS IN MAINTENANCE STATE

None

STATUS OF PRIVATE NETS IN THE CLUSTER

Status of Interconnects on smp1:

interconnect0: selected

interconnect1: up

Status of private nets on smp1:

To smp1 - UP

To smp2 - UP

Status of Interconnects on smp2:

interconnect0: selected

interconnect1: up

Status of private nets on smp2:

To smp1 - UP

To smp2 - UP

STATUS OF PUBLIC NETS IN THE CLUSTER

Status of Public Network On smp1:

bkggrp r_adp status fo_time live_adp

nafo0 hme0:hme3 OK NEVER hme0

Status of Public Network On smp2:

bkggrp r_adp status fo_time live_adp

nafo0 hme0:hme3 OK NEVER hme0

STATUS OF DATA SERVICES RUNNING IN THE CLUSTER

Status Of Registered Data Services

iinsmp: On

Status Of Data Services Running On smp1

No Status Method for Data Service "iinsmp"

Status Of Data Services Running On smp2

Data Service "iinsmp":

Not being managed on this system

RECENT ERROR MESSAGES FROM THE CLUSTER

Recent Error Messages on smp1

Aug 14 17:57:29 smp1 snmpXdmid: Error in parsing packet from agent.

Aug 14 17:59:28 smp1 snmpdx: agent 1.3.6.1.4.1.42 not responding

Aug 15 09:03:44 smp1 su: 'su root' failed for ruser on /dev/pts/6

Aug 15 09:03:54 smp1 last message repeated 1 time

Aug 15 11:45:05 smp1 su: 'su root' failed for ruser on /dev/pts/9

Aug 15 12:07:40 smp1 automountd[482]: No network locking on 192.19.3.6 : contact admin to install server change

Aug 15 16:48:17 smp1 su: 'su root' failed for ruser on /dev/pts/6

Recent Error Messages on smp2

Aug 14 17:44:10 smp2 snmpdx: session_send_loopback_request() failed

Aug 14 17:44:12 smp2 snmpdx: error while receiving a pdu from 136.100.168.119.2481: The message has a wrong version (1)

Aug 14 17:44:24 smp2 last message repeated 4 times

Aug 14 17:59:10 smp2 snmpdx: agent relay-agent not responding

Aug 14 17:59:10 smp2 snmpXdmid: Error in parsing packet from agent.

Aug 14 18:01:10 smp2 snmpdx: agent 1.3.6.1.4.1.42 not responding

Aug 15 10:46:19 smp2 login: REPEATED LOGIN FAILURES ON /dev/pts/0 FROM smp1
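A report this long is easy to misread in the middle of the night. The sketch below assumes that SC2.2 hastat keeps the line wording shown above. For each node, it checks only two things: that the node is listed as a cluster member and that its configuration state is Stable. check_hastat is a hypothetical helper name and is not part of Sun Cluster.

```shell
#!/bin/sh
# Sketch only: check a saved or piped hastat report for basic cluster health.
# Assumes the SC2.2 hastat wording "<node> is a cluster member" and
# "Configuration State on <node>: Stable" (as in the sample output above).
# Usage: hastat | check_hastat smp1 smp2
check_hastat() {
    out=$(cat)                      # read the whole report from stdin
    rc=0
    for node in "$@"; do
        # every configured node must appear as a current member
        if ! printf '%s\n' "$out" | grep -q "^[[:space:]]*$node is a cluster member"; then
            echo "NOT MEMBER: $node"
            rc=1
        fi
        # and its configuration state must be Stable
        if ! printf '%s\n' "$out" | grep -q "^[[:space:]]*Configuration State on $node: Stable"; then
            echo "NOT STABLE: $node"
            rc=1
        fi
    done
    return $rc                      # 0 only if every node passed both checks
}
```

For example, hastat | check_hastat smp1 smp2 && echo "cluster OK" prints "cluster OK" only when both nodes pass. Otherwise it lists each failing check.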

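Recommendation: always carry out a risk assessment before rebooting switches, to make sure the reboot will not affect the host systems.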

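Summary: the SUN cluster in the Intelligent Network (IN) system runs SC2.2, which differs considerably from the SC3.0 used in the PHS (Xiaolingtong) system. It is therefore important to master the common SC2.2 cluster operations: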

root@smp12 # scadmin

Missing required parameter

Usage:

scadmin [-a] [-f] startcluster localnodename clustname

scadmin [-a] [-f] startnode [clustname]

scadmin [-a] stopnode [clustname]

scadmin abortpartition localnodename clustname

scadmin continuepartition localnodename clustname

scadmin reldisks [clustname]

scadmin resdisks [clustname]

scadmin reserve cXtYdZ

scadmin switch clustname [-m] logical-hosts ...

scadmin switch clustname dest-host logical-hosts ...

scadmin switch clustname -r
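According to the usage text above, the dest-host form of scadmin switch takes the cluster name first, then the destination host, then one or more logical hosts. The wrapper below is a hypothetical helper; sc_switch is not a Sun Cluster command. It enforces that argument order and, by default, only prints the command so that you can review it before anything moves. The example values come from the hastat output in this post: cluster smpcluster, and logical host smp, which is currently mastered on smp1.

```shell
#!/bin/sh
# Sketch only: build an SC2.2 switchover command in the order given by the
# usage line "scadmin switch clustname dest-host logical-hosts ...".
# DRY_RUN defaults to 1, which means print the command without running it.
sc_switch() {
    if [ $# -lt 3 ]; then
        echo "usage: sc_switch clustname dest-host logical-host..." >&2
        return 2
    fi
    clust=$1
    dest=$2
    shift 2                         # the remaining arguments are logical hosts
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "scadmin switch $clust $dest $*"
    else
        scadmin switch "$clust" "$dest" "$@"
    fi
}
```

sc_switch smpcluster smp2 smp prints the command that would move logical host smp to smp2. Setting DRY_RUN=0 makes the function call scadmin instead.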

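For example:

Start the primary node:

Smp11# scadmin startcluster smp smpcluster #run on the primary node

Start the standby node:

Smp12#scadmin startnode #run on the standby node

Switch over the cluster:

Smp11# scadmin switch smp12 smpwork #switch from smp11 to smp12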
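The procedure starts the cluster on the primary first and runs startnode on the standby only after that. The sketch below shows how to wait between those two steps. It assumes that hastat prints "<node> is a cluster member", as in the output earlier. wait_until_member is a hypothetical helper that re-runs a command until the node appears as a member, so the standby is started only once the primary has formed the cluster. Because the command is passed in as arguments, you can also test the function without a real cluster.

```shell
#!/bin/sh
# Sketch only: wait_until_member NODE TRIES DELAY CMD [ARGS...]
# Re-runs CMD (normally: hastat) until its output reports NODE as a cluster
# member. Gives up after TRIES attempts spaced DELAY seconds apart.
wait_until_member() {
    node=$1
    tries=$2
    delay=$3
    shift 3                         # what is left is the command to run
    i=1
    while [ "$i" -le "$tries" ]; do
        if "$@" 2>/dev/null | grep -q "^[[:space:]]*$node is a cluster member"; then
            return 0                # node is a member
        fi
        i=$((i + 1))
        # sleep only if another attempt follows
        if [ "$i" -le "$tries" ]; then sleep "$delay"; fi
    done
    return 1                        # never saw the node as a member
}
```

On the primary, you could run scadmin startcluster smp smpcluster, then wait_until_member smp1 30 10 hastat, and only then run scadmin startnode on the standby. The 30 tries and 10-second delay are arbitrary values.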


Handling a hung SUN E6500 two-node cluster
