Troubleshooting a CRS problem on a customer's Oracle 10.2.0.4 cluster

Posted by 羽化殘虹 on 2014-10-07

After the customer replaced an HBA card and a fibre-channel switch port, the database did not come back up. Below is how it was handled.

Customer environment: two IBM p570 servers, AIX oslevel 6100-04-01-0944, Oracle 10.2.0.4.

 

Connecting remotely, I found that the filesystem holding the Oracle software on node 2 was 100% full. Ugh, some process must have been writing frantically until it filled the LV.
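On AIX, a quick way to confirm which filesystem is full and what filled it looks roughly like this (a sketch only; the CRS home path is taken from the log paths later in this post, and the runaway file in a case like this is typically crsd.log):

```shell
# Which filesystem is full? (AIX df, sizes in GB)
df -g /oracle
# Largest files under the CRS log tree; crsd.log is the usual suspect
du -am /oracle/product/10.2.0/crs/log | sort -rn | head -10
# Reclaim space without removing the inode the daemon holds open:
# cp /dev/null /oracle/product/10.2.0/crs/log/jxsmdb2/crsd/crsd.log
```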

Looking at the CRS log, nearly every entry was this:

 

2014-10-02 21:54:15.523: [  OCRRAW][1]proprdc_propr_fcl: proprhandle_fcl->propr_fcl_page[3980]=0x0

2014-10-02 21:54:15.523: [  OCRRAW][1]proprdc_propr_fcl: proprhandle_fcl->propr_fcl_page[3981]=0x0

2014-10-02 21:54:15.523: [  OCRRAW][1]proprdc_propr_fcl: proprhandle_fcl->propr_fcl_page[3982]=0x0

Google turned up nothing at all for this error. Fine, off to MOS then. There was not much there either, but a few similar issues suggested a 10.2.0.4 bug.

 

First, the CRS alert log, which turned up the important clue:

 

crsd(201070)]CRS-1006:The OCR location /dev/rhdisk2 is inaccessible. Details in /oracle/product/10.2.0/crs/log/jxsmdb2/crsd/crsd.log.

2014-10-02 22:32:28.215

[crsd(164818)]CRS-1006:The OCR location /dev/rhdisk2 is inaccessible. Details in /oracle/product/10.2.0/crs/log/jxsmdb2/crsd/crsd.log.

 

So the disk has a problem...
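When CRS reports the OCR location as inaccessible, a direct read test against the raw devices, plus an OCR integrity check, quickly shows whether the problem is at the device level (a sketch; run as root, with the CRS home's bin directory on PATH):

```shell
# Can the OCR devices be read at all? A short raw read is enough.
dd if=/dev/rhdisk2 of=/dev/null bs=8192 count=128
dd if=/dev/rhdisk6 of=/dev/null bs=8192 count=128
# Logical check of the OCR contents and mirror locations
ocrcheck
```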

On node 2, the attributes, owner, and permissions of hdisk2, hdisk6, and the rest of the disk group all looked normal:

 

crw-rw----    1 oracle   oinstall     24, 10 Oct 03 09:55 /dev/rhdisk10

crw-rw----    1 oracle   oinstall     24, 11 Oct 03 09:55 /dev/rhdisk11

crw-rw----    1 oracle   oinstall     24, 12 Oct 03 09:48 /dev/rhdisk12

crw-rw----    1 oracle   oinstall     24, 13 Oct 03 09:48 /dev/rhdisk13

crw-rw----    1 oracle   oinstall     24, 14 Oct 03 09:47 /dev/rhdisk14

crw-rw----    1 oracle   oinstall     24, 15 Oct 03 09:45 /dev/rhdisk15

crw-rw----    1 root     oinstall     24,  2 Oct 03 09:55 /dev/rhdisk2

crw-rw----    1 oracle   oinstall     24,  3 Oct 03 09:55 /dev/rhdisk3

crw-rw----    1 oracle   oinstall     24,  4 Oct 03 09:55 /dev/rhdisk4

crw-rw----    1 oracle   oinstall     24,  5 Oct 03 09:55 /dev/rhdisk5

crw-rw----    1 root     oinstall     24,  6 Oct 03 09:55 /dev/rhdisk6

crw-rw----    1 oracle   oinstall     24,  7 Oct 03 09:55 /dev/rhdisk7

crw-rw----    1 oracle   oinstall     24,  8 Oct 03 09:30 /dev/rhdisk8

crw-rw----    1 oracle   oinstall     24,  9 Oct 03 09:09 /dev/rhdisk9

 

Then node 1:

crw-rw----    1 oracle   system     24, 10 Oct 03 09:55 /dev/rhdisk10

crw-rw----    1 oracle   system     24, 11 Oct 03 09:55 /dev/rhdisk11

crw-rw----    1 root   system     24, 12 Oct 03 09:48 /dev/rhdisk12

crw-rw----    1 root   system     24, 13 Oct 03 09:48 /dev/rhdisk13

crw-rw----    1 root   system     24, 14 Oct 03 09:47 /dev/rhdisk14

crw-rw----    1 root   system     24, 15 Oct 03 09:45 /dev/rhdisk15

crw-rw----    1 root     system     24,  2 Oct 03 09:55 /dev/rhdisk2

crw-rw----    1 root   system     24,  3 Oct 03 09:55 /dev/rhdisk3

crw-rw----    1 root   system     24,  4 Oct 03 09:55 /dev/rhdisk4

crw-rw----    1 root   system     24,  5 Oct 03 09:55 /dev/rhdisk5

crw-rw----    1 root     system     24,  6 Oct 03 09:55 /dev/rhdisk6

crw-rw----    1 root   system     24,  7 Oct 03 09:55 /dev/rhdisk7

crw-rw----    1 root   system     24,  8 Oct 03 09:30 /dev/rhdisk8

crw-rw----    1 root   system     24,  9 Oct 03 09:09 /dev/rhdisk9
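Eyeballing two long listings like these is error-prone; diffing just the mode, owner, and group columns makes the mismatch jump out. A generic sketch, where node1.ls and node2.ls stand in for `ls -l /dev/rhdisk*` captured on each host (one sample line inlined here for illustration):

```shell
# Stand-ins for `ls -l /dev/rhdisk*` output captured from each node.
cat > node1.ls <<'EOF'
crw-rw----    1 root   system     24,  3 Oct 03 09:55 /dev/rhdisk3
EOF
cat > node2.ls <<'EOF'
crw-rw----    1 oracle   oinstall     24,  3 Oct 03 09:55 /dev/rhdisk3
EOF
# Keep only mode, owner, group, and device name, then compare.
awk '{print $1, $3, $4, $NF}' node1.ls > node1.cols
awk '{print $1, $3, $4, $NF}' node2.ls > node2.cols
if ! diff node1.cols node2.cols; then
    echo "ownership/permissions differ between nodes"
fi
```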

 

I changed node 1's disk permissions and group ownership to match node 2:

crw-rw----    1 oracle   oinstall     24, 10 Oct 03 09:55 /dev/rhdisk10

crw-rw----    1 oracle   oinstall     24, 11 Oct 03 09:55 /dev/rhdisk11

crw-rw----    1 oracle   oinstall     24, 12 Oct 03 09:48 /dev/rhdisk12

crw-rw----    1 oracle   oinstall     24, 13 Oct 03 09:48 /dev/rhdisk13

crw-rw----    1 oracle   oinstall     24, 14 Oct 03 09:47 /dev/rhdisk14

crw-rw----    1 oracle   oinstall     24, 15 Oct 03 09:45 /dev/rhdisk15

crw-rw----    1 root     oinstall     24,  2 Oct 03 09:55 /dev/rhdisk2

crw-rw----    1 oracle   oinstall     24,  3 Oct 03 09:55 /dev/rhdisk3

crw-rw----    1 oracle   oinstall     24,  4 Oct 03 09:55 /dev/rhdisk4

crw-rw----    1 oracle   oinstall     24,  5 Oct 03 09:55 /dev/rhdisk5

crw-rw----    1 root     oinstall     24,  6 Oct 03 09:55 /dev/rhdisk6

crw-rw----    1 oracle   oinstall     24,  7 Oct 03 09:55 /dev/rhdisk7

crw-rw----    1 oracle   oinstall     24,  8 Oct 03 09:30 /dev/rhdisk8

crw-rw----    1 oracle   oinstall     24,  9 Oct 03 09:09 /dev/rhdisk9
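The fix itself is a handful of chown/chmod calls; a sketch of what would have been run as root on node 1, with disk numbers taken from the listings above (the OCR devices hdisk2 and hdisk6 stay root-owned, as on node 2):

```shell
# ASM/voting devices go to oracle:oinstall
for i in 3 4 5 7 8 9 10 11 12 13 14 15
do
    chown oracle:oinstall /dev/rhdisk$i
done
# OCR devices stay owned by root, group oinstall
chown root:oinstall /dev/rhdisk2 /dev/rhdisk6
chmod 660 /dev/rhdisk[2-9] /dev/rhdisk1[0-5]
```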

 

But node 2 still would not come up, failing with the same error.

Next I compared the hdisk2/hdisk6 attributes on the two nodes. Node 2 first:

PCM             PCM/friend/otherapdisk                                         Path Control Module              False

PR_key_value    none                                                           Persistant Reserve Key Value     True

algorithm       fail_over                                                      Algorithm                        True

autorecovery    no                                                             Path/Ownership Autorecovery      True

clr_q           no                                                             Device CLEARS its Queue on error True

cntl_delay_time 0                                                              Controller Delay Time            True

cntl_hcheck_int 0                                                              Controller Health Check Interval True

dist_err_pcnt   0                                                              Distributed Error Percentage     True

dist_tw_width   50                                                             Distributed Error Sample Time    True

hcheck_cmd      inquiry                                                        Health Check Command             True

hcheck_interval 60                                                             Health Check Interval            True

hcheck_mode     nonactive                                                      Health Check Mode                True

location                                                                       Location Label                   True

lun_id          0x0                                                            Logical Unit Number ID           False

lun_reset_spt   yes                                                            LUN Reset Supported              True

max_retry_delay 60                                                             Maximum Quiesce Time             True

max_transfer    0x40000                                                        Maximum TRANSFER Size            True

node_name       0x200400a0b811758c                                             FC Node Name                     False

pvid            none                                                           Physical volume identifier       False

q_err           yes                                                            Use QERR bit                     True

q_type          simple                                                         Queuing TYPE                     True

queue_depth     10                                                             Queue DEPTH                      True

reassign_to     120                                                            REASSIGN time out value          True

reserve_policy  no_reserve                                                     Reserve Policy                   True

rw_timeout      30                                                             READ/WRITE time out value        True

scsi_id         0x10300                                                        SCSI ID                          False

start_timeout   60                                                             START unit time out value        True

unique_id       3E213600A0B800011758C0000C04C4BBE8D500F1815      FAStT03IBMfcp Unique device identifier         False

ww_name         0x201500a0b811758c                                             FC World Wide Name               False

Now node 1:

PCM             PCM/friend/otherapdisk                                         Path Control Module              False

PR_key_value    none                                                           Persistant Reserve Key Value     True

algorithm       fail_over                                                      Algorithm                        True

autorecovery    no                                                             Path/Ownership Autorecovery      True

clr_q           no                                                             Device CLEARS its Queue on error True

cntl_delay_time 0                                                              Controller Delay Time            True

cntl_hcheck_int 0                                                              Controller Health Check Interval True

dist_err_pcnt   0                                                              Distributed Error Percentage     True

dist_tw_width   50                                                             Distributed Error Sample Time    True

hcheck_cmd      inquiry                                                        Health Check Command             True

hcheck_interval 60                                                             Health Check Interval            True

hcheck_mode     nonactive                                                      Health Check Mode                True

location                                                                       Location Label                   True

lun_id          0x0                                                            Logical Unit Number ID           False

lun_reset_spt   yes                                                            LUN Reset Supported              True

max_retry_delay 60                                                             Maximum Quiesce Time             True

max_transfer    0x40000                                                        Maximum TRANSFER Size            True

node_name       0x200400a0b811758c                                             FC Node Name                     False

pvid            none                                                           Physical volume identifier       False

q_err           yes                                                            Use QERR bit                     True

q_type          simple                                                         Queuing TYPE                     True

queue_depth     10                                                             Queue DEPTH                      True

reassign_to     120                                                            REASSIGN time out value          True

reserve_policy  single_path                                                     Reserve Policy                   True

rw_timeout      30                                                             READ/WRITE time out value        True

scsi_id         0x10300                                                        SCSI ID                          False

start_timeout   60                                                             START unit time out value        True

unique_id       3E213600A0B800011758C0000C04C4BBE8D500F1815      FAStT03IBMfcp Unique device identifier         False

ww_name         0x201500a0b811758c                                             FC World Wide Name               False

 

On node 1, hdisk2 and hdisk6 (the OCR disks) had reserve_policy set to single_path. How? These are supposed to be shared. It then turned out all of node 1's RAC disks were like this.
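Rather than paging through the full `lsattr -El` output for every disk, the single attribute can be pulled per disk and compared across the two hosts (a sketch; a single_path value on either node is the red flag):

```shell
# reserve_policy per shared disk; run on both nodes and compare
for i in 2 3 4 5 6 7 8 9 10 11 12 13 14 15
do
    printf "hdisk%-3s " $i
    lsattr -El hdisk$i -a reserve_policy -F value
done
```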

Fix it immediately.

As root:

for i in 2 3 4 5 6 7 8 9 10 11 12 13 14 15
do
    chdev -l hdisk$i -a reserve_policy=no_reserve
done

But hdisk2 and hdisk6 could not be changed; the devices were busy:

0514-062 Cannot perform the requested function because the

                 specified device is busy.

Removing the device definition did not work either:

#  rmdev -dl hdisk6

Method error (/usr/lib/methods/ucfgdevice):

        0514-062 Cannot perform the requested function because the

                 specified device is busy.

My guess was that node 2 was holding the OCR disks, which was why nothing I tried was allowed.

Checking the CRS processes:

oracle 196786 155908   0 09:05:18      -  0:00 /oracle/product/10.2.0/crs/bin/oclsomon.bin

root 103266 102694   1 09:05:17      -  0:47 /oracle/product/10.2.0/crs/bin/crsd.bin reboot

  oracle 107362 192550   0 09:05:19      -  0:05 /oracle/product/10.2.0/crs/bin/ocssd.bin

I stopped CRS on node 1, but the CRS processes were still there. A side note: ever since the HBA was replaced a few days earlier, `crsctl stop crs` on node 1 had not seemed to work properly.

Even after rebooting node 1, the disks still could not be changed and CRS could not be stopped, so as root I disabled CRS auto-start and rebooted both machines.

As root on both nodes:

cd /etc/

# ./init.crs disable

 

After that reboot there were no CRS processes at all, and this time changing the disk attributes on node 1 worked. Ha!

# ps -ef |grep crs

    root 102694      1   0 08:54:41      -  0:00 /bin/sh /etc/init.crsd run

root 151958 180262   0 08:59:54  pts/0  0:00 grep crs

# chdev -l hdisk2 -a reserve_policy=no_reserve

hdisk2 changed

# chdev -l hdisk6 -a reserve_policy=no_reserve

hdisk6 changed

#

Now start CRS on node 2:

# ./crsctl start crs

Check the CRS alert log:

 [crsd(201070)]CRS-1006:The OCR location /dev/rhdisk2 is inaccessible. Details in /oracle/product/10.2.0/crs/log/jxsmdb2/crsd/crsd.log.

2014-10-02 22:32:28.215

[crsd(164818)]CRS-1006:The OCR location /dev/rhdisk2 is inaccessible. Details in /oracle/product/10.2.0/crs/log/jxsmdb2/crsd/crsd.log.

2014-10-02 22:32:28.476

[crsd(164818)]CRS-1005:The OCR upgrade was completed. Version has changed from 169870336 to 169870336. Details in /oracle/product/10.2.0/crs/log/jxsmdb2/crsd/crsd.log.

2014-10-02 22:32:28.477

[crsd(164818)]CRS-1012:The OCR service started on node jxsmdb2.

2014-10-02 22:32:28.751

[crsd(164818)]CRS-1201:CRSD started on node jxsmdb2.

[cssd(70408)]CRS-1603:CSSD on node jxsmdb2 shutdown by user.

2014-10-03 09:05:23.615

[cssd(107362)]CRS-1605:CSSD voting file is online: /dev/rhdisk4. Details in /oracle/product/10.2.0/crs/log/jxsmdb2/cssd/ocssd.log.

2014-10-03 09:05:23.815

[cssd(107362)]CRS-1605:CSSD voting file is online: /dev/rhdisk3. Details in /oracle/product/10.2.0/crs/log/jxsmdb2/cssd/ocssd.log.

2014-10-03 09:05:23.815

[cssd(107362)]CRS-1605:CSSD voting file is online: /dev/rhdisk5. Details in /oracle/product/10.2.0/crs/log/jxsmdb2/cssd/ocssd.log.

[cssd(107362)]CRS-1601:CSSD Reconfiguration complete. Active nodes are jxsmdb2 .

2014-10-03 09:08:44.541

[evmd(99266)]CRS-1401:EVMD started on node jxsmdb2.

2014-10-03 09:08:44.585

[crsd(103266)]CRS-1005:The OCR upgrade was completed. Version has changed from 169870336 to 169870336. Details in /oracle/product/10.2.0/crs/log/jxsmdb2/crsd/crsd.log.

2014-10-03 09:08:44.586

[crsd(103266)]CRS-1012:The OCR service started on node jxsmdb2.

2014-10-03 09:08:46.874

[crsd(103266)]CRS-1201:CRSD started on node jxsmdb2.

2014-10-03 09:08:47.163

[crsd(103266)]CRS-1205:Auto-start failed for the CRS resource . Details in jxsmdb2.

2014-10-03 09:08:47.183

[crsd(103266)]CRS-1205:Auto-start failed for the CRS resource . Details in jxsmdb2.

2014-10-03 09:09:43.287

[crsd(103266)]CRS-1205:Auto-start failed for the CRS resource . Details in jxsmdb2.

2014-10-03 09:09:43.297

[crsd(103266)]CRS-1205:Auto-start failed for the CRS resource . Details in jxsmdb2.

2014-10-03 09:09:45.746

[crsd(103266)]CRS-1205:Auto-start failed for the CRS resource . Details in jxsmdb2.

Check crsd.log:

 

2014-10-03 09:05:19.356: [ CSSCLNT][1]clsssInitNative: connect failed, rc 9

 

2014-10-03 09:05:19.357: [  CRSRTI][1]32CSS is not ready. Received status 3 from CSS. Waiting for good status ..

 

2014-10-03 09:05:20.702: [ COMMCRS][261]clsc_connect: (1106704d0) no listener at (ADDRESS=(PROTOCOL=ipc)(KEY=OCSSD_LL_jxsmdb2_crs))

 

2014-10-03 09:05:20.702: [ CSSCLNT][1]clsssInitNative: connect failed, rc 9

 

2014-10-03 09:05:20.702: [  CRSRTI][1]32CSS is not ready. Received status 3 from CSS. Waiting for good status ..

 

2014-10-03 09:05:22.041: [ COMMCRS][263]clsc_connect: (1106704d0) no listener at (ADDRESS=(PROTOCOL=ipc)(KEY=OCSSD_LL_jxsmdb2_crs))

 

2014-10-03 09:05:22.041: [ CSSCLNT][1]clsssInitNative: connect failed, rc 9

 

2014-10-03 09:05:22.041: [  CRSRTI][1]32CSS is not ready. Received status 3 from CSS. Waiting for good status ..

 

2014-10-03 09:05:23.380: [ COMMCRS][265]clsc_connect: (1106704d0) no listener at (ADDRESS=(PROTOCOL=ipc)(KEY=OCSSD_LL_jxsmdb2_crs))

 

2014-10-03 09:05:23.380: [ CSSCLNT][1]clsssInitNative: connect failed, rc 9

 

2014-10-03 09:05:23.380: [  CRSRTI][1]32CSS is not ready. Received status 3 from CSS. Waiting for good status ..

 

2014-10-03 09:08:44.482: [  CLSVER][1]32Active Version from OCR:10.2.0.4.0

2014-10-03 09:08:44.482: [  CLSVER][1]32Active Version and Software Version are same

2014-10-03 09:08:44.485: [ CRSMAIN][1]32Initializing OCR

2014-10-03 09:08:44.491: [  OCRRAW][1]proprioo: for disk 0 (/dev/rhdisk2), id match (1), my id set (1551842756,1866535888) total id sets (1), 1st set (1551842756,1866535888), 2nd set (0,0) my votes (1), total votes (2)

2014-10-03 09:08:44.491: [  OCRRAW][1]proprioo: for disk 1 (/dev/rhdisk6), id match (1), my id set (1551842756,1866535888) total id sets (1), 1st set (1551842756,1866535888), 2nd set (0,0) my votes (1), total votes (2)

2014-10-03 09:08:44.574: [  OCRMAS][3352]th_master:12: I AM THE NEW OCR MASTER at incar 1. Node Number 2

2014-10-03 09:08:44.575: [  OCRRAW][3352]proprioo: for disk 0 (/dev/rhdisk2), id match (1), my id set (1551842756,1866535888) total id sets (1), 1st set (1551842756,1866535888), 2nd set (0,0) my votes (1), total votes (2)

2014-10-03 09:08:44.575: [  OCRRAW][3352]proprioo: for disk 1 (/dev/rhdisk6), id match (1), my id set (1551842756,1866535888) total id sets (1), 1st set (1551842756,1866535888), 2nd set (0,0) my votes (1), total votes (2)

2014-10-03 09:08:44.596: [  OCRMAS][3352]th_master: Deleted ver keys from cache (master)

2014-10-03 09:08:44.596: [    CRSD][1]32ENV Logging level for Module: allcomp  0

2014-10-03 09:08:44.597: [    CRSD][1]32ENV Logging level for Module: default  0

2014-10-03 09:08:44.598: [    CRSD][1]32ENV Logging level for Module: COMMCRS  0

2014-10-03 09:08:44.598: [    CRSD][1]32ENV Logging level for Module: COMMNS  0

2014-10-03 09:08:44.599: [    CRSD][1]32ENV Logging level for Module: CRSUI  0

2014-10-03 09:08:44.600: [    CRSD][1]32ENV Logging level for Module: CRSCOMM  0

2014-10-03 09:08:44.600: [    CRSD][1]32ENV Logging level for Module: CRSRTI  0

2014-10-03 09:08:44.601: [    CRSD][1]32ENV Logging level for Module: CRSMAIN  0

2014-10-03 09:08:44.602: [    CRSD][1]32ENV Logging level for Module: CRSPLACE  0

2014-10-03 09:08:44.603: [    CRSD][1]32ENV Logging level for Module: CRSAPP  0

2014-10-03 09:08:44.603: [    CRSD][1]32ENV Logging level for Module: CRSRES  0

2014-10-03 09:08:44.604: [    CRSD][1]32ENV Logging level for Module: CRSOCR  0

2014-10-03 09:08:44.605: [    CRSD][1]32ENV Logging level for Module: CRSTIMER  0

2014-10-03 09:08:44.605: [    CRSD][1]32ENV Logging level for Module: CRSEVT  0

2014-10-03 09:08:44.606: [    CRSD][1]32ENV Logging level for Module: CRSD  0

2014-10-03 09:08:44.607: [    CRSD][1]32ENV Logging level for Module: CLUCLS  0

2014-10-03 09:08:44.607: [    CRSD][1]32ENV Logging level for Module: CLSVER  0

2014-10-03 09:08:44.608: [    CRSD][1]32ENV Logging level for Module: OCRRAW  0

2014-10-03 09:08:44.609: [    CRSD][1]32ENV Logging level for Module: OCROSD  0

2014-10-03 09:08:44.609: [    CRSD][1]32ENV Logging level for Module: CSSCLNT  0

2014-10-03 09:08:44.610: [    CRSD][1]32ENV Logging level for Module: OCRAPI  0

2014-10-03 09:08:44.611: [    CRSD][1]32ENV Logging level for Module: OCRUTL  0

2014-10-03 09:08:44.612: [    CRSD][1]32ENV Logging level for Module: OCRMSG  0

2014-10-03 09:08:44.612: [    CRSD][1]32ENV Logging level for Module: OCRCLI  0

2014-10-03 09:08:44.613: [    CRSD][1]32ENV Logging level for Module: OCRCAC  0

2014-10-03 09:08:44.614: [    CRSD][1]32ENV Logging level for Module: OCRSRV  0

2014-10-03 09:08:44.614: [    CRSD][1]32ENV Logging level for Module: OCRMAS  0

2014-10-03 09:08:44.615: [ CRSMAIN][1]32Filename is /oracle/product/10.2.0/crs/crs/init/jxsmdb2.pid

2014-10-03 09:08:44.651: [ CRSMAIN][1]32Using Authorizer location: /oracle/product/10.2.0/crs/crs/auth/

[  clsdmt][8235]Listening to (ADDRESS=(PROTOCOL=ipc)(KEY=jxsmdb2DBG_CRSD))

2014-10-03 09:08:44.667: [ CRSMAIN][1]32Initializing RTI

2014-10-03 09:08:44.719: [CRSTIMER][8749]32Timer Thread Starting.

2014-10-03 09:08:44.740: [  CRSRES][1]32Parameter SECURITY = 1, running in USER Mode

2014-10-03 09:08:44.743: [ CRSMAIN][1]32Initializing EVMMgr

2014-10-03 09:08:44.942: [ COMMCRS][9006]clsc_connect: (1139c41d0) no listener at (ADDRESS=(PROTOCOL=ipc)(KEY=SYSTEM.evm.acceptor.auth))

 

2014-10-03 09:08:46.745: [ CRSMAIN][1]32CRSD locked during state recovery, please wait.

2014-10-03 09:08:46.824: [ CRSMAIN][1]32CRSD recovered, unlocked.

2014-10-03 09:08:46.847: [ CRSMAIN][1]32QS socket on: (ADDRESS=(PROTOCOL=ipc)(KEY=ora_crsqs))

2014-10-03 09:08:46.847: [ CRSMAIN][1]32QS socket on: (ADDRESS=(PROTOCOL=ipc)(KEY=ora_crsqs))

2014-10-03 09:08:46.855: [ CRSMAIN][1]32CRSD UI socket on: (ADDRESS=(PROTOCOL=ipc)(KEY=CRSD_UI_SOCKET))

2014-10-03 09:08:46.873: [ CRSMAIN][1]32E2E socket on: (ADDRESS=(PROTOCOL=tcp)(HOST=jxsmdb2_priv)(PORT=49896))

2014-10-03 09:08:46.873: [ CRSMAIN][1]32Starting Threads

2014-10-03 09:08:46.874: [ CRSMAIN][10292]32Starting runCommandServer for (UI = 1, E2E = 0). 0

2014-10-03 09:08:46.874: [ CRSMAIN][10549]32Starting runCommandServer for (UI = 1, E2E = 0). 1

2014-10-03 09:08:46.874: [ CRSMAIN][1]32CRS Daemon Started.

2014-10-03 09:08:46.888: [  CRSRES][1]32 startup = 1

2014-10-03 09:08:46.901: [  CRSRES][1]32 startup = 1

2014-10-03 09:08:46.911: [  CRSRES][1]32 startup = 1

2014-10-03 09:08:46.925: [  CRSRES][1]32 startup = 1

2014-10-03 09:08:46.934: [  CRSRES][1]32 startup = 1

2014-10-03 09:08:46.942: [  CRSRES][1]32 startup = 1

2014-10-03 09:08:46.950: [  CRSRES][1]32 startup = 1

2014-10-03 09:08:46.958: [  CRSRES][1]32 startup = 1

2014-10-03 09:08:46.966: [  CRSRES][1]32 startup = 1

2014-10-03 09:08:46.974: [  CRSRES][1]32 startup = 1

2014-10-03 09:08:46.983: [  CRSRES][1]32 startup = 1

2014-10-03 09:08:46.991: [  CRSRES][1]32 startup = 1

2014-10-03 09:08:46.999: [  CRSRES][1]32 startup = 1

2014-10-03 09:08:47.173: [  CRSRES][11834]32startRunnable: setting CLI values

2014-10-03 09:08:47.188: [  CRSRES][11834]32Attempting to start `ora.jxsmdb2.vip` on member `jxsmdb2`

2014-10-03 09:08:47.189: [  CRSRES][11577]32startRunnable: setting CLI values

2014-10-03 09:08:47.199: [  CRSRES][11577]32Attempting to start `ora.jxsmdb2.ASM2.asm` on member `jxsmdb2`

2014-10-03 09:08:49.742: [  CRSRES][11834]32Start of `ora.jxsmdb2.vip` on member `jxsmdb2` succeeded.

2014-10-03 09:08:49.775: [  CRSRES][11834]32startRunnable: setting CLI values

2014-10-03 09:08:49.783: [  CRSRES][11834]32Attempting to start `ora.jxsmdb2.LISTENER_JXSMDB2.lsnr` on member `jxsmdb2`

2014-10-03 09:08:53.948: [  CRSRES][11834]32Start of `ora.jxsmdb2.LISTENER_JXSMDB2.lsnr` on member `jxsmdb2` succeeded.

2014-10-03 09:08:54.410: [  CRSRES][12619]32CRS-1002: Resource 'ora.jxsmdb2.LISTENER_JXSMDB2.lsnr' is already running on member 'jxsmdb2'

 

2014-10-03 09:09:08.992: [  CRSRES][12625]32startRunnable: setting CLI values

2014-10-03 09:09:08.999: [  CRSRES][12625]32Attempting to start `ora.jxsmdb2.ons` on member `jxsmdb2`

2014-10-03 09:09:11.139: [  CRSRES][12625]32Start of `ora.jxsmdb2.ons` on member `jxsmdb2` succeeded.

2014-10-03 09:09:11.216: [  CRSRES][11577]32Start of `ora.jxsmdb2.ASM2.asm` on member `jxsmdb2` succeeded.

2014-10-03 09:09:11.239: [  CRSRES][11577]32startRunnable: setting CLI values

2014-10-03 09:09:11.244: [  CRSRES][11577]32Attempting to start `ora.jxsmk.jxsmk2.inst` on member `jxsmdb2`

2014-10-03 09:09:43.269: [  CRSRES][11577]32Start of `ora.jxsmk.jxsmk2.inst` on member `jxsmdb2` succeeded.

2014-10-03 09:09:43.277: [  CRSRES][12894]32Skip online resource: ora.jxsmdb2.ons

2014-10-03 09:09:43.319: [  CRSRES][13151]32startRunnable: setting CLI values

2014-10-03 09:09:43.345: [  CRSRES][12637]32startRunnable: setting CLI values

2014-10-03 09:09:43.349: [  CRSRES][13151]32Attempting to start `ora.jxsmk.db` on member `jxsmdb2`

2014-10-03 09:09:43.358: [  CRSRES][11610]32startRunnable: setting CLI values

2014-10-03 09:09:43.365: [  CRSRES][12637]32Attempting to start `ora.jxsmdb2.gsd` on member `jxsmdb2`

2014-10-03 09:09:43.371: [  CRSRES][11610]32Attempting to start `ora.jxsmdb1.vip` on member `jxsmdb2`

2014-10-03 09:09:43.916: [  CRSRES][13151]32Start of `ora.jxsmk.db` on member `jxsmdb2` succeeded.

2014-10-03 09:09:44.378: [  CRSRES][12637]32Start of `ora.jxsmdb2.gsd` on member `jxsmdb2` succeeded.

2014-10-03 09:09:44.416: [  CRSRES][13668]32CRS-1002: Resource 'ora.jxsmk.db' is already running on member 'jxsmdb2'

 

2014-10-03 09:09:45.730: [  CRSRES][11610]32Start of `ora.jxsmdb1.vip` on member `jxsmdb2` succeeded.

Check ocssd.log:

jxsmdb2->cd cssd

jxsmdb2->tail -f ocssd.log

[    CSSD]2014-10-03 09:05:23.603 [1] >TRACE:   clssnmFatalInit: fatal mode enabled

[    CSSD]2014-10-03 09:05:23.692 [2829] >TRACE:   clssnmClusterListener: Listening on (ADDRESS=(PROTOCOL=tcp)(HOST=jxsmdb2_priv)(PORT=49895))

 

[    CSSD]2014-10-03 09:05:23.699 [2829] >TRACE:   clssnmconnect: connecting to node(1), con(1112d8b10), flags 0x0003

[    CSSD]2014-10-03 09:05:23.700 [2829] >TRACE:   clssnmDiscHelper: jxsmdb1, node(1) connection failed, con (1112d8b10), probe(0)

[    CSSD]2014-10-03 09:05:23.741 [3086] >TRACE:   clssgmclientlsnr: listening on (ADDRESS=(PROTOCOL=ipc)(KEY=Oracle_CSS_LclLstnr_crs_2))

[    CSSD]2014-10-03 09:05:23.741 [3086] >TRACE:   clssgmclientlsnr: listening on (ADDRESS=(PROTOCOL=ipc)(KEY=OCSSD_LL_jxsmdb2_crs))

[    CSSD]2014-10-03 09:05:23.752 [3857] >TRACE:   clssgmPeerListener: Listening on (ADDRESS=(PROTOCOL=tcp)(DEV=25)(HOST=191.191.191.101)(PORT=32823))

[    CSSD]2014-10-03 09:05:23.804 [1544] >TRACE:   clssnmReadDskHeartbeat: node(1) is down. rcfg(3) wrtcnt(78639) LATS(190241056) Disk lastSeqNo(78639)

[    CSSD]2014-10-03 09:05:30.781 [4628] >TRACE:   clssnmRcfgMgrThread: Local Join

[    CSSD]2014-10-03 09:08:44.082 [4628] >WARNING: clssnmLocalJoinEvent: takeover succ

[    CSSD]2014-10-03 09:08:44.082 [4628] >TRACE:   clssnmDoSyncUpdate: Initiating sync 1

[    CSSD]2014-10-03 09:08:44.082 [4628] >TRACE:   clssnmDoSyncUpdate: diskTimeout set to (27000)ms

[    CSSD]2014-10-03 09:08:44.082 [4628] >TRACE:   clssnmSetupAckWait: Ack message type (11)

[    CSSD]2014-10-03 09:08:44.082 [4628] >TRACE:   clssnmSetupAckWait: node(2) is ALIVE

[    CSSD]2014-10-03 09:08:44.082 [4628] >TRACE:   clssnmSendSync: syncSeqNo(1)

[    CSSD]2014-10-03 09:08:44.082 [4628] >TRACE:   clssnmWaitForAcks: Ack message type(11), ackCount(1)

[    CSSD]2014-10-03 09:08:44.082 [2829] >TRACE:   clssnmHandleSync: diskTimeout set to (27000)ms

[    CSSD]2014-10-03 09:08:44.082 [2829] >TRACE:   clssnmHandleSync: Acknowledging sync: src[2] srcName[jxsmdb2] seq[1] sync[1]

[    CSSD]2014-10-03 09:08:44.082 [4628] >TRACE:   clssnmWaitForAcks: done, msg type(11)

[    CSSD]2014-10-03 09:08:44.082 [4628] >TRACE:   clssnmDoSyncUpdate: node(2) is transitioning from joining state to active state

[    CSSD]2014-10-03 09:08:44.082 [4628] >TRACE:   clssnmSetupAckWait: Ack message type (13)

[    CSSD]2014-10-03 09:08:44.082 [4628] >TRACE:   clssnmSetupAckWait: node(2) is ACTIVE

[    CSSD]2014-10-03 09:08:44.082 [4628] >TRACE:   clssnmWaitForAcks: Ack message type(13), ackCount(1)

[    CSSD]2014-10-03 09:08:44.082 [2829] >TRACE:   clssnmSendVoteInfo: node(2) syncSeqNo(1)

[    CSSD]2014-10-03 09:08:44.082 [4628] >TRACE:   clssnmWaitForAcks: done, msg type(13)

[    CSSD]2014-10-03 09:08:44.082 [4628] >TRACE:   clssnmCheckDskInfo: Checking disk info...

[    CSSD]2014-10-03 09:08:44.082 [4628] >TRACE:   clssnmCheckDskInfo: diskTimeout set to (200000)ms

[    CSSD]2014-10-03 09:08:44.082 [4628] >TRACE:   clssnmEvict: Start

[    CSSD]2014-10-03 09:08:44.082 [4628] >TRACE:   clssnmWaitOnEvictions: Start

[    CSSD]2014-10-03 09:08:44.082 [4628] >TRACE:   clssnmSetupAckWait: Ack message type (15)

[    CSSD]2014-10-03 09:08:44.082 [4628] >TRACE:   clssnmSetupAckWait: node(2) is ACTIVE

[    CSSD]2014-10-03 09:08:44.082 [4628] >TRACE:   clssnmSendUpdate: syncSeqNo(1)

[    CSSD]2014-10-03 09:08:44.083 [4628] >TRACE:   clssnmWaitForAcks: Ack message type(15), ackCount(1)

[    CSSD]2014-10-03 09:08:44.083 [2829] >TRACE:   clssnmUpdateNodeState: node 0, state (0/0) unique (0/0) prevConuni(0) birth (0/0) (old/new)

[    CSSD]2014-10-03 09:08:44.083 [2829] >TRACE:   clssnmUpdateNodeState: node 1, state (0/0) unique (0/0) prevConuni(0) birth (0/0) (old/new)

[    CSSD]2014-10-03 09:08:44.083 [2829] >TRACE:   clssnmUpdateNodeState: node 2, state (2/3) unique (1412298321/1412298321) prevConuni(0) birth (1/1) (old/new)

[    CSSD]2014-10-03 09:08:44.083 [2829] >USER:    clssnmHandleUpdate: SYNC(1) from node(2) completed

[    CSSD]2014-10-03 09:08:44.083 [2829] >USER:    clssnmHandleUpdate: NODE 2 (jxsmdb2) IS ACTIVE MEMBER OF CLUSTER

[    CSSD]2014-10-03 09:08:44.083 [2829] >TRACE:   clssnmHandleUpdate: diskTimeout set to (200000)ms

[    CSSD]2014-10-03 09:08:44.083 [4628] >TRACE:   clssnmWaitForAcks: done, msg type(15)

[    CSSD]2014-10-03 09:08:44.083 [4628] >TRACE:   clssnmDoSyncUpdate: Sync 1 complete!

[    CSSD]2014-10-03 09:08:44.101 [1] >USER:    NMEVENT_SUSPEND [00][00][00][00]

[    CSSD]2014-10-03 09:08:44.105 [4885] >TRACE:   clssgmReconfigThread:  started for reconfig (1)

[    CSSD]2014-10-03 09:08:44.105 [4885] >USER:    NMEVENT_RECONFIG [00][00][00][04]

[    CSSD]2014-10-03 09:08:44.105 [4885] >TRACE:   clssgmEstablishConnections: 1 nodes in cluster incarn 1

[    CSSD]2014-10-03 09:08:44.105 [3857] >TRACE:   clssgmPeerListener: connects done (1/1)

[    CSSD]2014-10-03 09:08:44.105 [4885] >TRACE:   clssgmEstablishMasterNode: MASTER for 1 is node(2) birth(1)

[    CSSD]2014-10-03 09:08:44.105 [4885] >TRACE:   clssgmChangeMasterNode: requeued 0 RPCs

[    CSSD]2014-10-03 09:08:44.105 [4885] >TRACE:   clssgmMasterCMSync: Synchronizing group/lock status

[    CSSD]2014-10-03 09:08:44.105 [4885] >TRACE:   clssgmMasterSendDBDone: group/lock status synchronization complete

[    CSSD]CLSS-3000: reconfiguration successful, incarnation 1 with 1 nodes

 

[    CSSD]CLSS-3001: local node number 2, master node number 2

 

[    CSSD]2014-10-03 09:08:44.105 [4885] >TRACE:   clssgmReconfigThread:  completed for reconfig(1), with status(1)

[    CSSD]2014-10-03 09:08:44.266 [3086] >TRACE:   clssgmCommonAddMember: clsomon joined (2/0x1000000/#CSS_CLSSOMON

 

 

Check the CRS resources:

crs_stat -t

Name           Type           Target    State     Host       

------------------------------------------------------------

ora....SM1.asm application    ONLINE    OFFLINE               

ora....B1.lsnr application    ONLINE    OFFLINE              

ora....db1.gsd application    ONLINE    OFFLINE              

ora....db1.ons application    ONLINE    OFFLINE              

ora....db1.vip application    ONLINE    ONLINE    jxsmdb2    

ora....SM2.asm application    ONLINE    ONLINE    jxsmdb2    

ora....B2.lsnr application    ONLINE    ONLINE    jxsmdb2    

ora....db2.gsd application    ONLINE    ONLINE    jxsmdb2    

ora....db2.ons application    ONLINE    ONLINE    jxsmdb2    

ora....db2.vip application    ONLINE    ONLINE    jxsmdb2    

ora.jxsmk.db   application    ONLINE    ONLINE    jxsmdb2    

ora....k1.inst application    ONLINE    OFFLINE              

ora....k2.inst application    ONLINE    ONLINE    jxsmdb2 

The database is finally up on node 2.

Thinking back to the bug those MOS notes mentioned at the start: the real cause here was most likely just that the OCR was inaccessible. The patch MOS recommends presumably applies only when the disks, hardware, and OS are all healthy.

Start CRS on node 1:
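Since auto-start was disabled earlier with `init.crs disable`, node 1 also needs it re-enabled; a sketch (assuming the CRS home is /oracle/product/10.2.0/crs, as in the log paths above):

```shell
# As root on node 1: undo the earlier "disable", then start CRS manually
/etc/init.crs enable
/oracle/product/10.2.0/crs/bin/crsctl start crs
```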

 

->tail -f al*

[cssd(143604)]CRS-1605:CSSD voting file is online: /dev/rhdisk5. Details in /oracle/product/10.2.0/crs/log/jxsmdb1/cssd/ocssd.log.

[cssd(143604)]CRS-1601:CSSD Reconfiguration complete. Active nodes are jxsmdb1 jxsmdb2 .

2014-09-30 17:39:11.803

[crsd(139286)]CRS-1012:The OCR service started on node jxsmdb1.

2014-09-30 17:39:12.848

[evmd(151694)]CRS-1401:EVMD started on node jxsmdb1.

2014-09-30 17:39:15.807

[crsd(139286)]CRS-1201:CRSD started on node jxsmdb1.

2014-10-02 00:05:49.042

[crsd(159746)]CRS-1011:OCR cannot determine that the OCR content contains the latest updates. Details in /oracle/product/10.2.0/crs/log/jxsmdb1/crsd/crsd.log.

Terminated
You can see that CRS came up.


From the ITPUB blog, link: http://blog.itpub.net/26175573/viewspace-1290649/. If reposting, please cite the source; otherwise legal liability will be pursued.
