A recent problem: one instance of a RAC database could not be connected to

Posted by empo007 on 2007-07-04

Recently I got a call from a customer saying that one instance of their RAC had gone down and could be neither shut down nor started up.

SQL> shutdown immediate

ORA-01034: ORACLE not available

ORA-27101: shared memory realm does not exist

IBM AIX RISC System/6000 Error: 2: No such file or directory

SQL> startup

ORACLE instance started.

Total System Global Area 2451540256 bytes

Fixed Size 743712 bytes

Variable Size 1375731712 bytes

Database Buffers 1073741824 bytes

Redo Buffers 1323008 bytes

After that it just hung.


So I asked the user to send over the alert.log file, and found that the instance had hit an ORA-29740 error:

Sun Jul 1 14:03:18 2007
IPC Send timeout detected. Sender ospid 860212
Communications reconfiguration: instance 1
Sun Jul 1 14:03:51 2007
Errors in file /u01/app/oracle/admin/jxdc/bdump/jxdc1_lmon_860212.trc:
ORA-29740: evicted by member 1, group incarnation 5
LMON: terminating instance due to error 29740
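The two signatures in this excerpt (an IPC send timeout, then an ORA-29740 eviction that makes LMON terminate the instance) can be picked out mechanically. The following is a minimal Python sketch, assuming the message formats are exactly those shown above; the regular expressions and the "suspect the interconnect" rule are illustrative assumptions, not an Oracle-documented diagnostic.

```python
import re
from dataclasses import dataclass, field

# Patterns matched to the alert.log lines quoted above (assumed formats).
IPC_TIMEOUT = re.compile(r"IPC Send timeout detected\. Sender ospid (\d+)")
EVICTION = re.compile(r"ORA-29740: evicted by member (\d+), group incarnation (\d+)")
TERMINATED = "terminating instance due to error 29740"


@dataclass
class AlertFindings:
    ipc_timeout_ospids: list = field(default_factory=list)  # sender OS pids
    evictions: list = field(default_factory=list)  # (member, incarnation) pairs
    terminated: bool = False


def scan_alert_log(text: str) -> AlertFindings:
    """Collect IPC send timeouts and ORA-29740 evictions from alert.log text."""
    findings = AlertFindings()
    for line in text.splitlines():
        m = IPC_TIMEOUT.search(line)
        if m:
            findings.ipc_timeout_ospids.append(int(m.group(1)))
        m = EVICTION.search(line)
        if m:
            findings.evictions.append((int(m.group(1)), int(m.group(2))))
        if TERMINATED in line:
            findings.terminated = True
    return findings


def suspect_interconnect(findings: AlertFindings) -> bool:
    # Heuristic (assumption): an IPC send timeout together with an eviction
    # makes the cluster interconnect (heartbeat) the first thing to check.
    return bool(findings.ipc_timeout_ospids) and bool(findings.evictions)


if __name__ == "__main__":
    sample = (
        "IPC Send timeout detected. Sender ospid 860212\n"
        "ORA-29740: evicted by member 1, group incarnation 5\n"
        "LMON: terminating instance due to error 29740\n"
    )
    f = scan_alert_log(sample)
    print(f, suspect_interconnect(f))
```

A line-by-line regex scan is used rather than parsing timestamps, since only the presence of these messages matters for the first diagnosis.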

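The "IPC Send timeout detected" message suggests that the heartbeat (the cluster interconnect) might be down. Looking at the trace file that was generated, I found the following: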

C6562718:0000000A 4 0 10401 29 KSXPUNMAP: client 1
C6562719:0000000B 4 0 10401 28 KSXPMAP: client 1 base 0x7000000000b6000 size 0x92f4a000
C6562922:0000000C 4 3 10429 7 MB SO Al: Allocated MBSO 70000004b1e1bb0
C68EB5F6:0000002D 4 3 10401 40 KSXPICON: submited con request 1104b3c58 to tid(2,4,0x65d3e446)
C68EB5F7:0000002E 4 3 10401 42 KSXPCNHENT: linking new con 110350810 to tid(2,4,0x65d3e446) into cache
C68EEF6D:0000002F 4 3 10401 24 KSXPOPEN: client 5 completed connection 1104b3c58 to inst 2 srqh 1104b3d08
C6A25C1B:00000030 4 3 10427 14 Dest Init: ctx 70000004dfea880, incarn 4, inst 1, receiver 0, tickets 1000
C6A25C2C:00000031 4 3 10401 40 KSXPICON: submited con request 11034b818 to tid(2,1,0x65d3e404)
C6A25C2C:00000032 4 3 10401 42 KSXPCNHENT: linking new con 110350510 to tid(2,1,0x65d3e404) into cache
C6A25C32:00000033 4 3 10427 14 Dest Init: ctx 70000004dfeb8d8, incarn 4, inst 1, receiver 1, tickets 1000
C6A25C38:00000034 4 3 10401 40 KSXPICON: submited con request 1103488d8 to tid(2,2,0x65d3e40a)
C6A25C39:00000035 4 3 10401 42 KSXPCNHENT: linking new con 110350210 to tid(2,2,0x65d3e40a) into cache
C6A25C3D:00000036 4 3 10427 14 Dest Init: ctx 70000004dfeb5d0, incarn 4, inst 1, receiver 2, tickets 1000
C6A25C4D:00000037 4 3 10401 40 KSXPICON: submited con request 11034c3e8 to tid(2,3,0x65d3e40f)
C6A25C4E:00000038 4 3 10401 42 KSXPCNHENT: linking new con 11034ff10 to tid(2,3,0x65d3e40f) into cache
C6A25C6A:00000039 4 3 10427 7 Connect : Connect to inst 1, receiver 0
C6A25C6C:0000003A 4 3 10427 7 Connect : Connect to inst 1, receiver 1
C6A25C6C:0000003B 4 3 10427 7 Connect : Connect to inst 1, receiver 2
C6A25EB0:00000045 4 3 10401 24 KSXPOPEN: client 2 completed connection 1103488d8 to inst 2 srqh 110348988
C6A25EB6:00000046 4 3 10401 24 KSXPOPEN: client 2 completed connection 11034b818 to inst 2 srqh 11034b8c8
C6A25EB8:00000047 4 3 10401 24 KSXPOPEN: client 2 completed connection 11034c3e8 to inst 2 srqh 11034c498
5B32B967:015FA7AF 4 3 10401 16 KSXPSRVDT: snd TO inst 2 ptid 4 to ad735935 ts ad735955 krqh 110597218 buf 0x11052c648
5B32B99A:015FA7B0 4 3 10401 20 KSXPCLOSE: cancel send krqh 110597218 mhno 26514 rqno 5801
5B32D24F:015FA7B1 4 3 10401 20 KSXPCLOSE: cancel send krqh 1105af5d8 mhno 26512 rqno 5799
5B32D26B:015FA7B2 4 3 10401 20 KSXPCLOSE: cancel send krqh 110602738 mhno 26513 rqno 5800
5B32D287:015FA7B3 4 3 10401 20 KSXPCLOSE: cancel send krqh 1105eb728 mhno 26515 rqno 5802
5B32D2A3:015FA7B4 4 3 10401 20 KSXPCLOSE: cancel send krqh 1103c2338 mhno 26516 rqno 5803
5B32D2BE:015FA7B5 4 3 10401 20 KSXPCLOSE: cancel send krqh 110493638 mhno 26517 rqno 5804
5B32D2DB:015FA7B6 4 3 10401 20 KSXPCLOSE: cancel send krqh 110377868 mhno 26518 rqno 5805
5B32D2F6:015FA7B7 4 3 10401 20 KSXPCLOSE: cancel send krqh 110616358 mhno 26519 rqno 5806
5B32D312:015FA7B8 4 3 10401 20 KSXPCLOSE: cancel send krqh 1105e9b98 mhno 26520 rqno 5807
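To summarize a trace like this, one can count the connections opened to each instance, the send timeouts, and the cancelled sends. The Python sketch below assumes that each line is "<hex>:<hex>" followed by four numeric fields and then the message, and that "KSXPSRVDT: snd TO" denotes a send timeout; both are readings of this excerpt, since the KSXP trace format is not publicly documented, and all function names are hypothetical.

```python
import re
from collections import Counter

# Assumed layout: "<hex>:<hex> <n> <n> <n> <n> <message>"; the leading fields
# are treated as opaque and only the message text is inspected.
LINE = re.compile(
    r"^[0-9A-Fa-f]+:[0-9A-Fa-f]+\s+\d+\s+\d+\s+\d+\s+\d+\s+(?P<msg>.*)$"
)
SEND_TIMEOUT = re.compile(r"KSXPSRVDT: snd TO inst (\d+)\b.*?\bkrqh (\w+)")
CANCEL_SEND = re.compile(r"KSXPCLOSE: cancel send krqh (\w+)")
OPENED = re.compile(r"KSXPOPEN: client \d+ completed connection \w+ to inst (\d+)")


def summarize_trace(text: str) -> dict:
    """Count opened connections, send timeouts and cancelled sends."""
    connections_by_inst = Counter()
    timeouts_by_inst = Counter()
    timed_out_krqh = []
    cancelled_krqh = []
    for raw in text.splitlines():
        m = LINE.match(raw.strip())
        if not m:
            continue  # not a KSXP-style trace line
        msg = m.group("msg")
        t = SEND_TIMEOUT.search(msg)
        if t:
            timeouts_by_inst[int(t.group(1))] += 1
            timed_out_krqh.append(t.group(2))
            continue
        c = CANCEL_SEND.search(msg)
        if c:
            cancelled_krqh.append(c.group(1))
            continue
        o = OPENED.search(msg)
        if o:
            connections_by_inst[int(o.group(1))] += 1
    cancelled = set(cancelled_krqh)
    return {
        "connections_by_inst": dict(connections_by_inst),
        "send_timeouts_by_inst": dict(timeouts_by_inst),
        "cancelled_sends": len(cancelled_krqh),
        # request handles that timed out and were then cancelled
        "timed_out_then_cancelled": [k for k in timed_out_krqh if k in cancelled],
    }
```

Read this way, the excerpt shows connections to instance 2 being established normally, then a send to instance 2 timing out (krqh 110597218) and that request plus the queued ones being cancelled, which is consistent with the interconnect going dead.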

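At this point it was fairly certain that the heartbeat was broken. Pinging the other instance's heartbeat address confirmed it: there was no response. After the user unplugged and re-plugged the heartbeat cable, everything returned to normal.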
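The ping check above can be scripted so it is easy to repeat from each node. This is a minimal Python sketch, assuming the system `ping` accepts `-c <count>` (true on AIX and Linux) and that a zero exit status means replies were received; the address `10.0.0.2` and the function name are hypothetical placeholders for the other node's private interconnect address.

```python
import subprocess


def ping_ok(host: str, count: int = 3, timeout_s: float = 10.0,
            run=subprocess.run) -> bool:
    """Return True if `ping -c count host` exits with status 0.

    `run` is injectable so the check can be exercised without a network.
    """
    cmd = ["ping", "-c", str(count), host]
    try:
        result = run(cmd, stdout=subprocess.DEVNULL,
                     stderr=subprocess.DEVNULL, timeout=timeout_s)
    except (subprocess.TimeoutExpired, FileNotFoundError):
        # ping hung or is not installed: treat as unreachable
        return False
    return result.returncode == 0


if __name__ == "__main__":
    # Hypothetical private interconnect address of the other RAC node.
    print(ping_ok("10.0.0.2"))
```

Note that a successful ping only shows basic IP reachability on the private network; it does not prove that the interconnect is healthy under load.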

From the "ITPUB Blog", link: http://blog.itpub.net/85922/viewspace-923853/. Please credit the source when reposting; otherwise legal action may be taken.
