ONS goes offline automatically - Authentication OSD error, op: scls_auth_client_response_set

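Posted by tolywang on 2010-11-15
A 4-node 10.2.0.4 RAC (64-bit) on Linux AS 5.3 64-bit. Node 2 had a problem and was taken offline, so nodes 1, 3, 4 and 5 remain.
One memory module on node 3 had failed and the node had to be shut down to replace it. Node 3 was shut down at around 17:15, the memory was replaced, and the node was started again. After that, all CRS services on all nodes were running normally. The remaining operations were carried out by an overseas DBA and were not closely monitored. He appears to have rebuilt an index on a table, and the logs also suggest that an expdp backup was run. The ONS log shows clear error messages, but it is still not obvious what caused ONS to fail at 22:30:39.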





mxrac01$crs_stat -t  
Name           Type           Target    State     Host        
------------------------------------------------------------
ora.mxdell.db  application    ONLINE    ONLINE    mxrac01     
ora....l1.inst application    ONLINE    ONLINE    mxrac01     
ora....l3.inst application    ONLINE    ONLINE    mxrac03     
ora....l4.inst application    ONLINE    ONLINE    mxrac04     
ora....l5.inst application    ONLINE    ONLINE    mxrac05     
ora....01.lsnr application    ONLINE    ONLINE    mxrac01     
ora....c01.gsd application    ONLINE    ONLINE    mxrac01     
ora....c01.ons application    ONLINE    ONLINE    mxrac01     
ora....c01.vip application    ONLINE    ONLINE    mxrac01     
ora....03.lsnr application    ONLINE    ONLINE    mxrac03     
ora....c03.gsd application    ONLINE    ONLINE    mxrac03     
ora....c03.ons application    ONLINE    ONLINE    mxrac03     
ora....c03.vip application    ONLINE    ONLINE    mxrac03     
ora....04.lsnr application    ONLINE    ONLINE    mxrac04     
ora....c04.gsd application    ONLINE    ONLINE    mxrac04     
ora....c04.ons application    ONLINE    ONLINE    mxrac04     
ora....c04.vip application    ONLINE    ONLINE    mxrac04     
ora....05.lsnr application    ONLINE    ONLINE    mxrac05     
ora....c05.gsd application    ONLINE    ONLINE    mxrac05     
ora....c05.ons application    ONLINE    OFFLINE              
ora....c05.vip application    ONLINE    ONLINE    mxrac05     
mxrac01$
mxrac01$
mxrac01$
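With this many resources, the one line where State differs from Target is easy to miss. The following is a minimal sketch, assuming the `crs_stat -t` output has been captured as text in the column layout shown above; the function name `offline_resources` is made up for illustration and is not part of any Oracle tool.

```python
def offline_resources(crs_stat_output):
    """Return names of resources whose Target is ONLINE but whose State is not."""
    mismatched = []
    for line in crs_stat_output.splitlines():
        fields = line.split()
        # Data rows start with "ora." and carry at least Name, Type, Target, State.
        # The Host column is empty for OFFLINE resources, so it is not required.
        if len(fields) < 4 or not fields[0].startswith("ora."):
            continue
        name, _type, target, state = fields[:4]
        if target == "ONLINE" and state != "ONLINE":
            mismatched.append(name)
    return mismatched


# A few rows copied from the listing above.
sample = """\
Name           Type           Target    State     Host
------------------------------------------------------------
ora....c05.gsd application    ONLINE    ONLINE    mxrac05
ora....c05.ons application    ONLINE    OFFLINE
ora....c05.vip application    ONLINE    ONLINE    mxrac05
"""

print(offline_resources(sample))
```

For the rows above, this reports only `ora....c05.ons`, the ONS resource on node 5.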






The ONS log on node 5. The errors look like permission-related messages.


mxrac05$vi ora.mxrac05.ons.log   

2010-11-14 22:30:39.763: [ CSSCLNT][3030990112]clsssInitNative: connect failed, rc 2
2010-11-14 22:30:39.772: [    RACG][3030990112] [29004][3030990112][ora.mxrac05.ons]: clsrccssgetctx: clsssinit() failed. rc=3
2010-11-14 22:30:39.773: [ COMMCRS][3030990112]Authentication OSD error, op: scls_auth_client_response_set
loc: write
info: len -1 != expected 4
dep: 28
2010-11-14 22:30:39.773: [    RACG][3030990112] [29004][3030990112][ora.mxrac05.ons]: clsrcgetprsrctx: prsr_init_ext returned rc = 3
2010-11-14 22:30:39.978: [    RACG][3030990112] [29004][3030990112][ora.mxrac05.ons]: clsrons_init failed, stat = 504, crerr = 32
ons is not running ...

2010-11-14 22:30:39.978: [    RACG][3030990112] [29004][3030990112][ora.mxrac05.ons]: clsrcexecut: cmd = /u01/product/crs/bin/racgeut -e _USR_ORA_DEBUG=0 540 /u01/product/crs/bin/onsctl ping
2010-11-14 22:30:39.978: [    RACG][3030990112] [29004][3030990112][ora.mxrac05.ons]: clsrcexecut: rc = 1, time = 0.210s
2010-11-14 22:30:39.978: [    RACG][3030990112] [29004][3030990112][ora.mxrac05.ons]: end for resource = ora.mxrac05.ons, action = check, status = 1, time = 0.250s
2010-11-14 22:30:40.646: [ CSSCLNT][2823294240]clsssInitNative: connect failed, rc 2
2010-11-14 22:30:40.646: [    RACG][2823294240] [29024][2823294240][ora.mxrac05.ons]: clsrccssgetctx: clsssinit() failed. rc=3
2010-11-14 22:30:40.647: [ COMMCRS][2823294240]Authentication OSD error, op: scls_auth_client_response_set
loc: write
info: len -1 != expected 4
dep: 28
2010-11-14 22:30:40.647: [    RACG][2823294240] [29024][2823294240][ora.mxrac05.ons]: clsrcgetprsrctx: prsr_init_ex
2010-11-15 01:43:21.975: [    RACG][807959840] [18202][807959840][ora.mxrac05.ons]: Number of onsconfiguration retrieved, numcfg = 4
onscfg[0]
   {node = mxrac01, port = 6200}
Adding remote host mxrac01:6200
onscfg[1]
   {node = mxrac03, port = 6200}
Adding remote host mxrac03:6200
onscfg[2]
   {node = mxrac04, port = 6200}
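The `dep: 28` field in the COMMCRS message is not explained anywhere in the log. If it carries the operating-system errno of the failed `write` (an assumption; nothing in the log confirms it), it can be decoded with the Python standard library. The sketch below only maps the number to its symbolic name. The helper name `decode_dep` is hypothetical.

```python
import errno
import os
import re


def decode_dep(log_text):
    """Map each 'dep: N' line to (N, errno name, OS message).

    Assumption: 'dep' is an OS errno. The log itself does not say so.
    """
    results = []
    for match in re.finditer(r"^dep:\s*(\d+)\s*$", log_text, flags=re.MULTILINE):
        code = int(match.group(1))
        name = errno.errorcode.get(code, "UNKNOWN")
        results.append((code, name, os.strerror(code)))
    return results


# Lines copied from the ONS log above.
sample = """\
2010-11-14 22:30:39.773: [ COMMCRS][3030990112]Authentication OSD error, op: scls_auth_client_response_set
loc: write
info: len -1 != expected 4
dep: 28
"""

for code, name, message in decode_dep(sample):
    print(code, name, message)
```

On Linux, errno 28 is `ENOSPC` (no space left on device). If the assumption holds, free space on the file systems used by CRS would be worth checking alongside permissions. This is a hint about where to look, not a confirmed cause.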




The CRS log on node 5 (node 3 had a bad memory module, which was replaced, so the errors in the log below are expected).

2010-09-06 02:20:52.588
[crsd(11964)]CRS-1201:CRSD started on node mxrac05.
2010-11-07 02:32:47.411
[cssd(12637)]CRS-1605:CSSD voting file is online: /ocfs_data/crs/votingdisk. Details in /u01/product/crs/log/mxrac05/cssd/ocssd.log.
[cssd(12637)]CRS-1601:CSSD Reconfiguration complete. Active nodes are mxrac01 mxrac03 mxrac04 mxrac05 .
2010-11-07 02:32:48.583
[crsd(12013)]CRS-1012:The OCR service started on node mxrac05.
2010-11-07 02:32:48.656
[evmd(11875)]CRS-1401:EVMD started on node mxrac05.
2010-11-07 02:32:50.115
[crsd(12013)]CRS-1201:CRSD started on node mxrac05.
2010-11-14 17:13:50.019
[cssd(12637)]CRS-1612:node mxrac03 (3) at 50% heartbeat fatal, eviction in 29.020 seconds
2010-11-14 17:14:04.048
[cssd(12637)]CRS-1611:node mxrac03 (3) at 75% heartbeat fatal, eviction in 14.222 seconds
2010-11-14 17:14:05.050
[cssd(12637)]CRS-1611:node mxrac03 (3) at 75% heartbeat fatal, eviction in 13.222 seconds
2010-11-14 17:14:13.066
[cssd(12637)]CRS-1610:node mxrac03 (3) at 90% heartbeat fatal, eviction in 5.202 seconds
2010-11-14 17:14:14.068
[cssd(12637)]CRS-1610:node mxrac03 (3) at 90% heartbeat fatal, eviction in 4.202 seconds
2010-11-14 17:14:15.070
[cssd(12637)]CRS-1610:node mxrac03 (3) at 90% heartbeat fatal, eviction in 3.202 seconds
2010-11-14 17:14:16.072
[cssd(12637)]CRS-1610:node mxrac03 (3) at 90% heartbeat fatal, eviction in 2.202 seconds
2010-11-14 17:14:17.074
[cssd(12637)]CRS-1610:node mxrac03 (3) at 90% heartbeat fatal, eviction in 1.202 seconds
2010-11-14 17:14:18.075
[cssd(12637)]CRS-1610:node mxrac03 (3) at 90% heartbeat fatal, eviction in 0.192 seconds
[cssd(12637)]CRS-1601:CSSD Reconfiguration complete. Active nodes are mxrac01 mxrac04 mxrac05 .
[cssd(12637)]CRS-1601:CSSD Reconfiguration complete. Active nodes are mxrac01 mxrac03 mxrac04 mxrac05 .
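The eviction countdown for node 3 can be pulled out of the CRS log to confirm that it matches the planned shutdown at around 17:15. This is a rough sketch, assuming the message layout is exactly as in the excerpt above. The regular expression and the function name `heartbeat_warnings` are illustrative only.

```python
import re

# Matches the CRS-1610 / CRS-1611 / CRS-1612 missed-heartbeat warnings shown above.
HEARTBEAT = re.compile(
    r"CRS-16(?:10|11|12):node (?P<node>\S+) \((?P<num>\d+)\) at (?P<pct>\d+)% "
    r"heartbeat fatal, eviction in (?P<secs>[\d.]+) seconds"
)


def heartbeat_warnings(crs_log):
    """Return (node, percent, seconds_to_eviction) for each heartbeat warning."""
    return [
        (m.group("node"), int(m.group("pct")), float(m.group("secs")))
        for m in HEARTBEAT.finditer(crs_log)
    ]


# Three lines copied from the CRS log above.
sample = """\
[cssd(12637)]CRS-1612:node mxrac03 (3) at 50% heartbeat fatal, eviction in 29.020 seconds
[cssd(12637)]CRS-1611:node mxrac03 (3) at 75% heartbeat fatal, eviction in 14.222 seconds
[cssd(12637)]CRS-1610:node mxrac03 (3) at 90% heartbeat fatal, eviction in 0.192 seconds
"""

for node, pct, secs in heartbeat_warnings(sample):
    print(f"{node}: {pct}% of heartbeat timeout, eviction in {secs:.3f} s")
```

Only mxrac03 appears in these warnings, which matches a single node being shut down for maintenance.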





The Oracle alert log on node 5. Earlier testing showed that whenever expdp is run on a node, commands that modify service_names appear in that node's alert log.

Sun Nov 14 23:09:05 2010
Thread 5 advanced to log sequence 14077 (LGWR switch)
  Current log# 49 seq# 14077 mem# 0: /ocfs_ctrl_redo/mxdell/redo49_a.log
  Current log# 49 seq# 14077 mem# 1: /ocfs_data/mxdell/redo49_b.log
Sun Nov 14 23:09:16 2010
ALTER SYSTEM SET service_names='SYS$SYS.KUPC$S_5_20101114225709.MXDELL','mxdell' SCOPE=MEMORY SID='mxdell5';
Sun Nov 14 23:09:16 2010
ALTER SYSTEM SET service_names='mxdell' SCOPE=MEMORY SID='mxdell5';
Sun Nov 14 23:10:24 2010
Thread 5 cannot allocate new log, sequence 14078
Checkpoint not complete
  Current log# 49 seq# 14077 mem# 0: /ocfs_ctrl_redo/mxdell/redo49_a.log
  Current log# 49 seq# 14077 mem# 1: /ocfs_data/mxdell/redo49_b.log
Sun Nov 14 23:10:27 2010
Thread 5 advanced to log sequence 14078 (LGWR switch)
  Current log# 50 seq# 14078 mem# 0: /ocfs_ctrl_redo/mxdell/redo50_a.log
  Current log# 50 seq# 14078 mem# 1: /ocfs_data/mxdell/redo50_b.log
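Since expdp leaves `ALTER SYSTEM SET service_names=...` entries containing `KUPC$` queue names, those entries can be used to find when Data Pump was active. The sketch below assumes the alert log uses the timestamp format shown above and that Python runs in an English or C locale, so that `%a` and `%b` match `Sun` and `Nov`. The function name `datapump_service_changes` is hypothetical.

```python
import re
from datetime import datetime

# Alert-log timestamp line, e.g. "Sun Nov 14 23:09:16 2010".
TIMESTAMP = re.compile(r"^[A-Z][a-z]{2} [A-Z][a-z]{2} [ \d]\d \d{2}:\d{2}:\d{2} \d{4}$")


def datapump_service_changes(alert_log):
    """Return (timestamp, [KUPC$ service names]) for each change that lists one."""
    events = []
    current = None
    for raw in alert_log.splitlines():
        line = raw.strip()
        if TIMESTAMP.match(line):
            current = datetime.strptime(line, "%a %b %d %H:%M:%S %Y")
            continue
        # Only look at the value list between service_names= and SCOPE=,
        # so that SID='mxdell5' is not mistaken for a service name.
        found = re.search(r"service_names=(.*?)\s+SCOPE=", line)
        if found and current is not None:
            names = re.findall(r"'([^']*)'", found.group(1))
            kupc = [n for n in names if "KUPC$" in n]
            if kupc:
                events.append((current, kupc))
    return events


# Lines copied from the alert log above.
sample = """\
Sun Nov 14 23:09:16 2010
ALTER SYSTEM SET service_names='SYS$SYS.KUPC$S_5_20101114225709.MXDELL','mxdell' SCOPE=MEMORY SID='mxdell5';
Sun Nov 14 23:09:16 2010
ALTER SYSTEM SET service_names='mxdell' SCOPE=MEMORY SID='mxdell5';
"""

for when, services in datapump_service_changes(sample):
    print(when.isoformat(sep=" "), services)
```

The second `ALTER SYSTEM` line is skipped because it only restores `mxdell` and no longer lists a Data Pump queue.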




Checking the processes on node 5:

mxrac05$ps -ef |grep ons
oracle   13281     1  0 Nov07 ?        00:00:00 /u01/product/crs/opmn/bin/ons -d
oracle   13282 13281  0 Nov07 ?        00:00:00 /u01/product/crs/opmn/bin/ons -d
oracle   19743 15970  0 00:33 pts/0    00:00:00 grep ons




The Oracle alert log on node 5, entries from around 22:30:
Sun Nov 14 22:17:12 2010
Thread 5 advanced to log sequence 13988 (LGWR switch)
  Current log# 50 seq# 13988 mem# 0: /ocfs_ctrl_redo/mxdell/redo50_a.log
  Current log# 50 seq# 13988 mem# 1: /ocfs_data/mxdell/redo50_b.log
Sun Nov 14 22:27:28 2010
Thread 5 advanced to log sequence 13989 (LGWR switch)
  Current log# 51 seq# 13989 mem# 0: /ocfs_ctrl_redo/mxdell/redo51_a.log
  Current log# 51 seq# 13989 mem# 1: /ocfs_data/mxdell/redo51_b.log
Sun Nov 14 22:28:18 2010
ALTER SYSTEM SET service_names='SYS$SYS.KUPC$S_5_20101114221629.MXDELL','mxdell' SCOPE=MEMORY SID='mxdell5';
Sun Nov 14 22:28:18 2010
ALTER SYSTEM SET service_names='mxdell' SCOPE=MEMORY SID='mxdell5';
Sun Nov 14 22:34:50 2010
Thread 5 advanced to log sequence 13990 (LGWR switch)
  Current log# 52 seq# 13990 mem# 0: /ocfs_ctrl_redo/mxdell/redo52_a.log
  Current log# 52 seq# 13990 mem# 1: /ocfs_data/mxdell/redo52_b.log
Sun Nov 14 22:35:34 2010
The value (30) of MAXTRANS parameter ignored.
Sun Nov 14 22:35:35 2010
ALTER SYSTEM SET service_names='mxdell','SYS$SYS.KUPC$C_5_20101114223535.MXDELL' SCOPE=MEMORY SID='mxdell5';
Sun Nov 14 22:35:35 2010
ALTER SYSTEM SET service_names='SYS$SYS.KUPC$C_5_20101114223535.MXDELL','mxdell','SYS$SYS.KUPC$S_5_20101114223535.MXDELL' SCOPE=MEMORY SID='mxdell5';
kupprdp: master process DM00 started with pid=92, OS id=31287
         to execute - SYS.KUPM$MCP.MAIN('SYS_EXPORT_TABLE_02', 'SYSTEM', 'KUPC$C_5_20101114223535', 'KUPC$S_5_20101114223535', 0);
kupprdp: worker process DW01 started with worker id=1, pid=104, OS id=31300
         to execute - SYS.KUPW$WORKER.MAIN('SYS_EXPORT_TABLE_02', 'SYSTEM');
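To see how the alert-log entries line up with the first ONS check failure at 22:30:39, the offsets can be computed directly. This is only a timeline sketch with timestamps copied by hand from the logs above. It does not establish cause, and the names `ONS_FAILURE`, `ALERT_EVENTS` and `offsets_from_failure` are made up for illustration.

```python
from datetime import datetime

# First ONS check failure, taken from the ONS log (fractional seconds dropped).
ONS_FAILURE = datetime(2010, 11, 14, 22, 30, 39)

# Alert-log events near that time, copied from the excerpt above.
ALERT_EVENTS = [
    ("Sun Nov 14 22:28:18 2010", "service_names reset to 'mxdell'"),
    ("Sun Nov 14 22:34:50 2010", "log switch to sequence 13990"),
    ("Sun Nov 14 22:35:35 2010", "Data Pump job SYS_EXPORT_TABLE_02 starts"),
]


def offsets_from_failure(events, failure=ONS_FAILURE):
    """Return (seconds relative to the failure, description) for each event."""
    result = []
    for stamp, description in events:
        # Assumes an English or C locale so that %a and %b match "Sun" and "Nov".
        when = datetime.strptime(stamp, "%a %b %d %H:%M:%S %Y")
        result.append((int((when - failure).total_seconds()), description))
    return result


for seconds, description in offsets_from_failure(ALERT_EVENTS):
    print(f"{seconds:+5d} s  {description}")
```

Negative offsets are events before the failure. In this excerpt, the nearest alert-log entry is a bit more than two minutes earlier, and the `SYS_EXPORT_TABLE_02` job starts about five minutes later.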






Manually starting ONS on node 5:
mxrac05$crs_start    ora.mxrac05.ons
Attempting to start `ora.mxrac05.ons` on member `mxrac05`
Start of `ora.mxrac05.ons` on member `mxrac05` succeeded.
mxrac05$


The ONS log after starting ONS manually:
2010-11-15 01:43:21.975: [    RACG][807959840] [18202][807959840][ora.mxrac05.ons]: Adding remote host mxrac04:6200
onscfg[3]
   {node = mxrac05, port = 6200}
Adding remote host mxrac05:6200
onsctl: ons is already running
2010-11-15 01:43:21.975: [    RACG][807959840] [18202][807959840][ora.mxrac05.ons]: clsrcexecut: env ORACLE_CONFIG_HOME=/u01/product/crs
2010-11-15 01:43:21.975: [    RACG][807959840] [18202][807959840][ora.mxrac05.ons]: clsrcexecut: cmd = /u01/product/crs/bin/racgeut -e _USR_ORA_DEBUG=0 540 /u01/product/crs/bin/onsctl start
2010-11-15 01:43:21.975: [    RACG][807959840] [18202][807959840][ora.mxrac05.ons]: clsrcexecut: rc = 1, time = 0.210s
2010-11-15 01:43:22.180: [    RACG][807959840] [18202][807959840][ora.mxrac05.ons]: Number of onsconfiguration retrieved, numcfg = 4
onscfg[0]
   {node = mxrac01, port = 6200}
Adding remote host mxrac01:6200
onscfg[1]
   {node = mxrac03, port = 6200}
Adding remote host mxrac03:6200
onscfg[2]
   {node = mxrac04, port = 6200}
2010-11-15 01:43:22.180: [    RACG][807959840] [18202][807959840][ora.mxrac05.ons]: Adding remote host mxrac04:6200
onscfg[3]
   {node = mxrac05, port = 6200}
Adding remote host mxrac05:6200
ons is running ...






ONS on node 5 is back to normal.


mxrac05$crs_stat -t  
Name           Type           Target    State     Host        
------------------------------------------------------------
ora.mxdell.db  application    ONLINE    ONLINE    mxrac01     
ora....l1.inst application    ONLINE    ONLINE    mxrac01     
ora....l3.inst application    ONLINE    ONLINE    mxrac03     
ora....l4.inst application    ONLINE    ONLINE    mxrac04     
ora....l5.inst application    ONLINE    ONLINE    mxrac05     
ora....01.lsnr application    ONLINE    ONLINE    mxrac01     
ora....c01.gsd application    ONLINE    ONLINE    mxrac01     
ora....c01.ons application    ONLINE    ONLINE    mxrac01     
ora....c01.vip application    ONLINE    ONLINE    mxrac01     
ora....03.lsnr application    ONLINE    ONLINE    mxrac03     
ora....c03.gsd application    ONLINE    ONLINE    mxrac03     
ora....c03.ons application    ONLINE    ONLINE    mxrac03     
ora....c03.vip application    ONLINE    ONLINE    mxrac03     
ora....04.lsnr application    ONLINE    ONLINE    mxrac04     
ora....c04.gsd application    ONLINE    ONLINE    mxrac04     
ora....c04.ons application    ONLINE    ONLINE    mxrac04     
ora....c04.vip application    ONLINE    ONLINE    mxrac04     
ora....05.lsnr application    ONLINE    ONLINE    mxrac05     
ora....c05.gsd application    ONLINE    ONLINE    mxrac05     
ora....c05.ons application    ONLINE    ONLINE    mxrac05     
ora....c05.vip application    ONLINE    ONLINE    mxrac05

From the ITPUB blog, link: http://blog.itpub.net/35489/viewspace-678154/. If you repost this article, please credit the source; otherwise legal action may be taken.
