紅色警報 ORACLE RAC 11.2.0.4 FOR SOLARIS 10 ASM 和DB因叢集心跳丟失重啟
ORACLE RAC 11.2.0.4 FOR SOLARIS 10 ASM 和DB因叢集心跳丟失重啟。該問題是BUG 10194190 18740837
導致的,修復該問題需要打patch 25142535。有一客戶的solaris oracle 11.2.0.4 rac所有節點的DB已經打上補丁,但是
主機執行248天后,跑在上邊的Oracle 11.2.0.4依然因叢集心跳丟失而重啟。根據ORACLE MOS官方回覆,修復該BUG
的一個補丁包 patch 18740837是需要同時在GI軟體上應用。
下邊是相關客戶案例的問題分析及解決處理總結。
1、問題節點DB告警日誌報錯提示
2、問題節點ASM告警日誌報錯提示
3、 +ASM2_lmhb_4609_i78497.trc檔案內容
4、OCSSD日誌提示心跳超時
5、+ASM2_lmhb_4609_i78497.trc檔案區域性資訊
===[ Session State Object ]===
----------------------------------------
SO: 0x3ffdb38d8, type: 4, owner: 0x400b0c258, flag: INIT/-/-/0x00 if: 0x3 c: 0x3
proc=0x400b0c258, name=session, file=ksu.h LINE:12729 ID:, pg=0
(session) sid: 145 ser: 1 trans: 0x0, creator: 0x400b0c258
flags: (0x51) USR/- flags_idl: (0x1) BSY/-/-/-/-/-
flags2: (0x409) -/-/INC
DID: , short-term DID:
txn branch: 0x0
edition#: 0 oct: 0, prv: 0, sql: 0x0, psql: 0x0, user: 0/SYS
ksuxds FALSE at location: 0
service name: SYS$BACKGROUND
Current Wait Stack:
0: waiting for 'rdbms ipc message'
timeout=0xa, =0x0, =0x0
wait_id=432476951 seq_num=11521 snap_id=1
wait times: snap=2 min 46 sec, exc=2 min 46 sec, total=2 min 46 sec
wait times: max=0.100000 sec, heur=2 min 46 sec
wait counts: calls=1 os=1
in_wait=1 iflags=0x5a8
Wait State:
fixed_waits=0 flags=0x22 boundary=0x0/-1
Session Wait History:
elapsed time of 0.000015 sec since current wait
0: waited for 'CGS wait for IPC msg'
=0x0, =0x0, =0x0
wait_id=432476950 seq_num=11520 snap_id=1
wait times: snap=0.000027 sec, exc=0.000027 sec, total=0.000027 sec
wait times: max=0.000000 sec
wait counts: calls=1 os=1
occurred after 0.000138 sec of elapsed time
1: waited for 'rdbms ipc message'
timeout=0xa, =0x0, =0x0
wait_id=432476949 seq_num=11519 snap_id=1
wait times: snap=0.102094 sec, exc=0.102094 sec, total=0.102094 sec
wait times: max=0.100000 sec
wait counts: calls=1 os=1
occurred after 0.000015 sec of elapsed time
2: waited for 'CGS wait for IPC msg'
=0x0, =0x0, =0x0
wait_id=432476948 seq_num=11518 snap_id=1
wait times: snap=0.000022 sec, exc=0.000022 sec, total=0.000022 sec
wait times: max=0.000000 sec
wait counts: calls=1 os=1
occurred after 0.000133 sec of elapsed time
3: waited for 'rdbms ipc message'
timeout=0xa, =0x0, =0x0
wait_id=432476947 seq_num=11517 snap_id=1
wait times: snap=0.102074 sec, exc=0.102074 sec, total=0.102074 sec
wait times: max=0.100000 sec
wait counts: calls=1 os=1
occurred after 0.000013 sec of elapsed time
4: waited for 'CGS wait for IPC msg'
=0x0, =0x0, =0x0
wait_id=432476946 seq_num=11516 snap_id=1
wait times: snap=0.000023 sec, exc=0.000023 sec, total=0.000023 sec
wait times: max=0.000000 sec
wait counts: calls=1 os=1
occurred after 0.000119 sec of elapsed time
5: waited for 'rdbms ipc message'
timeout=0xa, =0x0, =0x0
wait_id=432476945 seq_num=11515 snap_id=1
wait times: snap=0.103086 sec, exc=0.103086 sec, total=0.103086 sec
wait times: max=0.100000 sec
wait counts: calls=1 os=1
occurred after 0.000012 sec of elapsed time
6: waited for 'CGS wait for IPC msg'
=0x0, =0x0, =0x0
wait_id=432476944 seq_num=11514 snap_id=1
wait times: snap=0.000025 sec, exc=0.000025 sec, total=0.000025 sec
wait times: max=0.000000 sec
wait counts: calls=1 os=1
occurred after 0.000152 sec of elapsed time
7: waited for 'rdbms ipc message'
timeout=0xa, =0x0, =0x0
wait_id=432476943 seq_num=11513 snap_id=1
wait times: snap=0.103090 sec, exc=0.103090 sec, total=0.103090 sec
wait times: max=0.100000 sec
wait counts: calls=1 os=1
occurred after 0.000016 sec of elapsed time
8: waited for 'CGS wait for IPC msg'
=0x0, =0x0, =0x0
wait_id=432476942 seq_num=11512 snap_id=1
wait times: snap=0.000025 sec, exc=0.000025 sec, total=0.000025 sec
wait times: max=0.000000 sec
wait counts: calls=1 os=1
occurred after 0.000272 sec of elapsed time
9: waited for 'rdbms ipc message'
timeout=0xa, =0x0, =0x0
wait_id=432476941 seq_num=11511 snap_id=1
wait times: snap=0.102084 sec, exc=0.102084 sec, total=0.102084 sec
wait times: max=0.100000 sec
wait counts: calls=1 os=1
occurred after 0.000013 sec of elapsed time
Sampled Session History of session 145 serial 1
---------------------------------------------------
The sampled session history is constructed by sampling
the target session every 1 second. The sampling process
captures at each sample if the session is in a non-idle wait,
an idle wait, or not in a wait. If the session is in a
non-idle wait then one interval is shown for all the samples
the session was in the same non-idle wait. If the
session is in an idle wait or not in a wait for
consecutive samples then one interval is shown for all
the consecutive samples. Though we display these consecutive
samples in a single interval the session may NOT be continuously
idle or not in a wait (the sampling process does not know).
The history is displayed in reverse chronological order.
sample interval: 1 sec, max history 120 sec
---------------------------------------------------
[121 samples, 21:08:20 - 21:10:20]
idle wait at each sample
temporary object counter: 0
----------------------------------------
Virtual Thread:
kgskvt: 3fc434ce8, sess: 3ffdb38d8 sid: 145 ser: 1
vc: 0, proc: 400b0c258, id: 145
consumer group cur: (upd? 0), mapped: _ORACLE_BACKGROUND_GROUP_, orig:
vt_state: 0x100, vt_flags: 0x4030, blkrun: 0, numa: 1
inwait: 0, short wait event: 0 posted_run: 0
location where insched last set: kgskthrrun
location where insched last cleared: kgskthrrun1
location where inwait last set: NULL
location where inwait last cleared: NULL
is_assigned: 0, in_sched: 0 (0)
qcls: 0, qlink: FALSE
vt_active: 0 (pending: 0)
vt_pq_active: 0, dop: 0
used quanta (usecs):
stmt: 0, accum: 0, mapped: 0, tot: 0
cpu start time: 0
idle time: 0, active time: 0 (cg: 0)
cpu yields:
stmt: 0, accum: 0, mapped: 0, tot: 0
cpu waits:
stmt: 0, accum: 0, mapped: 0, tot: 0
cpu wait time (usecs):
stmt: 0, accum: 0, mapped: 0, tot: 0
io waits:
stmt: 0, accum: 0, mapped: 0, tot: 0
io wait time:
stmt: 0, accum: 0, mapped: 0, tot: 0
ASL queued time outs: 0, time: 0 (cur 0, cg 0)
PQQ queued time outs: 0, time: 0 (cur 0, cg 0)
Queue timeout violation: 0
calls aborted: 0, num est exec limit hit: 0
undo current: 0k max: 0k
I/O credits acquired:small=0 large=0
I/O credits waiting for:small=0 large=0
KTU Session Commit Cache Dump for IDLs:
KTU Session Commit Cache Dump for Non-IDLs:
----------------------------------------
KKS-UOL used : 0 locks(used=0, free=0)
KGX Atomic Operation Log 3eae04a58
Mutex 0(0, 0) idn 0 oper NONE(0)
FSO mutex uid 145 efd 0 whr 0 slp 0
KGX Atomic Operation Log 3eae04aa8
Mutex 0(0, 0) idn 0 oper NONE(0)
FSO mutex uid 145 efd 0 whr 0 slp 0
KGX Atomic Operation Log 3eae04af8
Mutex 0(0, 0) idn 0 oper NONE(0)
FSO mutex uid 145 efd 0 whr 0 slp 0
KGX Atomic Operation Log 3eae04b48
Mutex 0(0, 0) idn 0 oper NONE(0)
FSO mutex uid 145 efd 0 whr 0 slp 0
----------------------------------------
KGL-UOL SO Cache(total=0, free=0)
KGX Atomic Operation Log 3eae047a0
Mutex 0(0, 0) idn 0 oper NONE(0)
Library Cache uid 145 efd 0 whr 0 slp 0
oper=0 pt1=0 pt2=0 pt3=0
pt4=0 pt5=0 ub4=0
KGX Atomic Operation Log 3eae047f8
Mutex 0(0, 0) idn 0 oper NONE(0)
Library Cache uid 145 efd 0 whr 0 slp 0
oper=0 pt1=0 pt2=0 pt3=0
pt4=0 pt5=0 ub4=0
KGX Atomic Operation Log 3eae04850
Mutex 0(0, 0) idn 0 oper NONE(0)
Library Cache uid 145 efd 0 whr 0 slp 0
oper=0 pt1=0 pt2=0 pt3=0
pt4=0 pt5=0 ub4=0
KGX Atomic Operation Log 3eae048a8
Mutex 0(0, 0) idn 0 oper NONE(0)
Library Cache uid 145 efd 0 whr 0 slp 0
oper=0 pt1=0 pt2=0 pt3=0
pt4=0 pt5=0 ub4=0
KGX Atomic Operation Log 3eae04900
Mutex 0(0, 0) idn 0 oper NONE(0)
Library Cache uid 145 efd 0 whr 0 slp 0
oper=0 pt1=0 pt2=0 pt3=0
pt4=0 pt5=0 ub4=0
KGX Atomic Operation Log 3eae04958
Mutex 0(0, 0) idn 0 oper NONE(0)
Library Cache uid 145 efd 0 whr 0 slp 0
oper=0 pt1=0 pt2=0 pt3=0
pt4=0 pt5=0 ub4=0
KGX Atomic Operation Log 3eae049b0
Mutex 0(0, 0) idn 0 oper NONE(0)
Library Cache uid 145 efd 0 whr 0 slp 0
oper=0 pt1=0 pt2=0 pt3=0
pt4=0 pt5=0 ub4=0
----------------------------------------
SO: 0x3f464e838, type: 57, owner: 0x3ffdb38d8, flag: INIT/-/-/0x00 if: 0x3 c: 0x3
proc=0x400b0c258, name=dummy, file=ktccts.h LINE:415 ID:, pg=0
(dummy) nxc=0, nlb=0
===[ Callstack ]===
*** 2019-05-21 21:10:20.275
Process diagnostic dump for oracle@sm0ora10 (LMON), OS id=4604,
pid: 9, proc_ser: 1, sid: 145, sess_ser: 1
-------------------------------------------------------------------------------
os thread scheduling delay history: (sampling every 1.000000 secs)
0.000000 secs at [ 21:10:19 ]
NOTE: scheduling delay has not been sampled for 0.505539 secs 0.000000 secs from [ 21:10:15 - 21:10:20 ], 5 sec avg
0.000000 secs from [ 21:09:20 - 21:10:20 ], 1 min avg
0.000000 secs from [ 21:05:21 - 21:10:20 ], 5 min avg
*** 2019-05-21 21:10:21.374
loadavg : 20.48 16.00 13.13
swap info: free_mem = 194124.91M rsv = 258509.04M
alloc = 239225.71M avail = 235374.84M swap_free = 254658.16M
F S UID PID PPID C PRI NI ADDR SZ WCHAN STIME TTY TIME CMD
0 O grid 4604 4596 1 79 20 ? 168184 Sep 15 ? 781:33 asm_lmon_+ASM2
Short stack dump:
ksedsts()+380<-ksdxfstk()+52<-ksdxcb()+3592<-sspuser()+140<-__sighndlr()+12<-call_user_handler()+868<-sigacthandler()+92<-_syscall6()+32<- thr_sigsetmask()+512<-sslssalck()+152<-sskgxp_alarm_set()+44<-skgxp_twait()+376<-sslsshandler()+712<-__sighndlr()+12<-call_user_handler()+868<-sigacthandler()+92<-__pollsys()+8<-_pollsys()+232<-_poll()+140<-ssskgxp_poll()+36<-sskgxp_selectex()+224 <-skgxpiwait()+6456<-skgxpwaiti()+5044<-skgxpwait()+984<-ksxpwait()+3360<-ksliwat()+10676<-kslwait()+240<-ksarcv()+212<-kjclwmg()+36<-kjfcln()+4284<-ksbrdp()+1720<-opirip()+1680<-opidrv()+748<-sou2o()+88<-opimai_real()+512<-ssthrdmain()+324<-main()+316<-_start()+380
*** 2019-05-21 21:10:22.421
==============================
LMON (ospid: 4604) has not moved for 176 sec (1558444222.1558444046)
kjzduptcctx: Notifying DIAG for crash event
----- Abridged Call Stack Trace -----
ksedsts()+1320<-kjzdicrshnfy()+388<-ksuitm()+1084<-kjgcr_KillInstance()+148<-kjgcr_Main()+7564<-ksbrdp()+1720<-opirip()+1680<-opidrv()+748<-sou2o()+88<-opimai_real()+512<-ssthrdmain()+324<-main()+316<-_start()+380
----- End of Abridged Call Stack Trace -----
6、主機啟動時間
root@0ora10# hostname
ora10
root@ora10# uname -a
SunOS ora10 5.11 11.3 sun4v sparc sun4v
root@# uptime
11:28pm up 248 day(s) , 15:48, 8 users, load average: 6.21, 7.27, 8.09
6、根據ORA-15046和ORA-29770查詢和TRC 堆疊呼叫資訊提示:
ksedsts()+380<-ksdxfstk()+52<-ksdxcb()+3592<-sspuser()+140<-__sighndlr()+12<-call_user_handler()+868<-sigacthandler()+92<-_syscall6()+32<- thr_sigsetmask()+512<-sslssalck()+152<-sskgxp_alarm_set()+44<-skgxp_twait()+376<-sslsshandler()+712<-__sighndlr()+12<-call_user_handler()+868<-sigacthandler()+92<-__pollsys()+8<-_pollsys()+232<-_poll()+140<-ssskgxp_poll()+36<-sskgxp_selectex()+224 <-skgxpiwait()+6456<-skgxpwaiti()+5044<-skgxpwait()+984<-ksxpwait()+3360<-ksliwat()+10676<-kslwait()+240<-ksarcv()+212<-kjclwmg()+36<-kjfcln()+4284<-ksbrdp()+1720<-opirip()+1680<-opidrv()+748<-sou2o()+88<-opimai_real()+512<-ssthrdmain()+324<-main()+316<-_start()+380 ORACLE MOS官網,發現完全匹配上: (Doc ID 2159643.1)
Solaris: Process spins/ASM and DB Crash if RAC Instance Is Up For > 248 Days by LMHB with ORA-29770
7、後續檢視問題系統補丁安裝情況,發現rac所有節點的DB已經應用上 Doc ID 2159643.1 提到的補丁 25142535
oracle@ora10$opatch lsinventory|grep 25142535
Patch 25142535 : applied on Tue Jul 31 16:26:30 GMT+08:00 2018
oracle@ora10$
oracle@ora11$opatch lsinventory|grep 25142535
Patch 25142535 : applied on Tue Jul 31 16:26:30 GMT+08:00 2018
oracle@ora11$
但是GRID GI沒有應用補丁 25142535
grid@ora10$ opatch lsinventory|grep 25142535
grid@ora10$
grid@ora11$ opatch lsinventory|grep 25142535
grid@ora11$
8、就此問題諮詢MOS,ORACLE MOS給出的官方回覆也沒明確提示補丁 25142535確實是否需要在GI上應用
9、補丁 25142535是補丁 18740837和 10194190的合集,其中 18740837小補丁的readme中有提示RAC環境需要在GI應用
10、問題分析結論:
由目前的實際問題情況及8和9的資訊彙總,可以確定 補丁 25142535需要在oracle 11.2.0.4 rac for solaris的所有節點的
GI叢集軟體上應用。可以肯定的是:當前的 oracle 11.2.0.4 rac for solaris asm and db crash and restart確實是rac在
solaris上執行248天導致的,在叢集所有節點DB軟體上打補丁25142535不能解決,官方也沒有明確說在GI軟體上同時應
用補丁25142535後到底能否解決:
Solaris: Process spins/ASM and DB Crash if RAC Instance Is Up For > 248 Days by LMHB with ORA-29770
如果有類似問題的客戶看到博文,如果有通過GI打補丁25142535解決此問題,或者通過其他途徑解決此問題歡迎:
評論留言,感謝!
來自 “ ITPUB部落格 ” ,連結:http://blog.itpub.net/29357786/viewspace-2645319/,如需轉載,請註明出處,否則將追究法律責任。
相關文章
- ORACLE RAC 11.2.0.4 ASM加盤導致叢集重啟之ASM sga設定過小OracleASM
- Solaris下Oracle RAC 11.2.0.4 安裝方法Oracle
- oracle 11.2.0.4 rac叢集等待事件enq: TM - contentionOracle事件ENQ
- ORACLE RAC 11.2.0.4 FOR RHEL6叢集無法啟動的處理Oracle
- Oracle Linux 6.7中 Oracle 11.2.0.4 RAC叢集CRS異常處理OracleLinux
- RAC安裝【AIX 7 + 11.2.0.4 + ASM】AIASM
- 私有IP丟失造成Oracle 12C RAC叢集節點不能啟動Oracle
- Solaris叢集節點重啟
- L2-013 紅色警報【並查集】並查集
- 【ASM】Oracle RAC css啟動報錯"Duplicate voting file found"ASMOracleCSS
- Oracle叢集(RAC)時間同步(ntp和CTSS)Oracle
- 記一次oracle 19c RAC叢集重啟單節點DB啟動異常(二)Oracle
- 【RAC】Oracle RAC如何修改心跳網路Oracle
- L2-013 紅色警報 (25 分)(並查集)並查集
- 並查集 - 紅色警報 - 天梯賽 L2-013並查集
- ORACLE11.2.0.4 RAC+ ASM安裝方法 (作業系統CENTOS7.6)OracleASM作業系統CentOS
- ORACLE 11.2.0.4 for solaris更換硬體後主機時間改變導致一節點叢集服務無法啟動Oracle
- Oracle 叢集心跳及其引數misscount/disktimeout/reboottimeOracleboot
- L2-013 紅色警報
- oracle rac 11.2.0.3 升級到11.2.0.4Oracle
- 【ASM】RAC19C因引數設定不當,asm無法啟動ASM
- oracle RAC 診斷叢集狀態命令Oracle
- 紅色警戒開啟黑屏如何解決 window10開啟紅警黑屏怎麼辦
- Oracle:Oracle RAC 11.2.0.4 升級為 19cOracle
- 重建共享(db或asm)密碼檔案 in Oracle 19c RAC-20220209ASM密碼Oracle
- 【DB寶44】Oracle rac叢集中的IP型別簡介Oracle型別
- 【NETWORK】Oracle RAC 心跳地址配置說明Oracle
- Oracle 11.2.0.4 rac for aix acfs異常環境的克隆環境ASM磁碟組掛載緩慢OracleAIASM
- Oracle RAC啟動因CTSS導致的異常Oracle
- Oracle RAC日常運維-NetworkManager導致叢集故障Oracle運維
- 【ASK_ORACLE】Relink RAC叢集詳細步驟Oracle
- oracle RAC+DG 擴容ASM和表空間(Linux)OracleASMLinux
- Oracle 11gR2 RAC 叢集服務啟動與關閉總結Oracle
- Solaris 10下遷移10G RAC (六)
- Solaris 10下遷移10G RAC (八)
- Solaris 10下遷移10G RAC (四)
- Solaris 10下遷移10G RAC (二)
- Solaris 10下遷移10G RAC (七)