ORACLE RAC 一節點出現Suspending MMON slave action kewrmrfsa_ for 82800 seconds
今天上午,廣州一Oracle 11g rac叢集的節點1出現告警提示:Suspending MMON slave action kewrmrfsa_ for 82800 seconds
出現的現象是,普通使用者登入系統掛起:
[oracle@gzbmi01 ~]$ sqlplus test/test
SQL*Plus: Release 11.2.0.4.0 Production on Fri May 19 11:18:33 2017
Copyright (c) 1982, 2013, Oracle. All rights reserved.
從作業系統看,系統的cpu和記憶體資源都很空閒,但是oracle有比較多的遠端訪問連線掛起
[oracle@gzbmi01 ~]$ps -ef|grep LOCAL=NO|wc -l
137
[oracle@gzbmi01 ~]$
節點1使用sys使用者能正常登入,但是不能發起任何查詢,節點1發起的查詢全部掛起:
[oracle@gzbmi01 ~]$ sqlplus / as sysdba
SQL*Plus: Release 11.2.0.4.0 Production on Fri May 19 11:18:33 2017
Copyright (c) 1982, 2013, Oracle. All rights reserved.
SQL> select event,count(*) from v$session where wait_class<>'Idle' group by event;
但是rac的另外一個節點則是正常的,從節點2查詢資料庫的等待事件:
SQL> select event,count(*) from v$session where wait_class<>'Idle' group by event;
EVENT COUNT(*)
---------------------------------------------------------------- ----------
enq: HW - contention 1
SQL*Net message to client 1
SQL>select sid,username,sql_id,module,machine,program from v$session where event='enq: HW - contention' and sql_id is not null
SID USERNAME SQL_ID MODULE MACHINE PROGRAM
---------- ------------------------------ ------------- ---------------------------------------------------------------- ---------------------------------------------------------------- ------------------------------------------------
430 BMICNEW 4vs91dcv7u1p6 plsqldev.exe Q3RF184E plsqldev.exe
SQL> set long 100000
SQL> select sql_fulltext from v$sqlarea where sql_id='4vs91dcv7u1p6';
SQL_FULLTEXT
--------------------------------------------------------------------------------
insert into sys.aud$( sessionid,entryid,statement,ntimestamp#, userid,userhost,t
erminal,action#,returncode, obj$creator,obj$name,auth$privileges,auth$grantee, n
ew$owner,new$name,ses$actions,ses$tid,logoff$pread, logoff$lwrite,logoff$dead,co
mment$text,spare1,spare2, priv$used,clientid,sessioncpu,proxy$sid,user$guid, in
stance#,process#,xid,scn,auditid, sqlbind,sqltext,obj$edition,dbid) values(:1,
:2,:3,SYS_EXTRACT_UTC(SYSTIMESTAMP), :4,:5,:6,:7,:8, :9,:10,:11,:12,
:13,:14,:15,:16,:17, :18,:19,:20,:21,:22, :23,:24,:25,:26,:27, :28,
:29,:30,:31,:32, :33,:34,:35,:36)
SQL> select sid,spid, osuser, s.program from v$session s,v$process p where
s.paddr=p.addr and sid=&sid; 2
Enter value for sid: 430
old 2: s.paddr=p.addr and sid=&sid
new 2: s.paddr=p.addr and sid=430
SID SPID OSUSER PROGRAM
---------- ------------------------ ------------------------------ ------------------------------------------------
430 33139 Administrator plsqldev.exe
SQL>
查詢到的sql是資料庫自身的基表進行的Insert,於是嘗試查殺該會話對應的系統程式:kill -9 33139
但是,查殺會話程式後問題依然存在。
有人說可能跟歸檔路徑有關係,經檢視資料庫的歸檔路徑是+ARCH,在節點一發起:alter system switch logfile;
能正常執行並且能在歸檔的asm磁碟組arch中查到該檔案:
Fri May 19 10:43:11 2017
Thread 1 advanced to log sequence 21922 (LGWR switch)
Current log# 7 seq# 21922 mem# 0: +DATA/gzbmi/onlinelog/group_7.349.918726525
經過與地市技術人員溝通決定,重新啟動節點1上的資料庫例項:
[oracle@gzbmi01 ~]$ ps -ef|grep ora_
oracle 36354 1 0 Jan13 ? 01:19:43 ora_pmon_gzbmi1
oracle 36356 1 0 Jan13 ? 00:41:35 ora_psp0_gzbmi1
oracle 36358 1 0 Jan13 ? 1-05:44:24 ora_vktm_gzbmi1
oracle 36362 1 0 Jan13 ? 00:08:05 ora_gen0_gzbmi1
oracle 36364 1 0 Jan13 ? 02:44:09 ora_diag_gzbmi1
oracle 36366 1 0 Jan13 ? 00:33:55 ora_dbrm_gzbmi1
oracle 36368 1 0 Jan13 ? 00:17:22 ora_ping_gzbmi1
oracle 36370 1 0 Jan13 ? 00:06:58 ora_acms_gzbmi1
oracle 36372 1 0 Jan13 ? 12:43:37 ora_dia0_gzbmi1
oracle 36374 1 0 Jan13 ? 16:38:02 ora_lmon_gzbmi1
oracle 36376 1 0 Jan13 ? 05:14:26 ora_lmd0_gzbmi1
oracle 36378 1 0 Jan13 ? 13:07:21 ora_lms0_gzbmi1
oracle 36382 1 0 Jan13 ? 13:04:36 ora_lms1_gzbmi1
oracle 36386 1 0 Jan13 ? 13:06:21 ora_lms2_gzbmi1
oracle 36390 1 0 Jan13 ? 00:06:41 ora_rms0_gzbmi1
oracle 36392 1 0 Jan13 ? 00:17:44 ora_lmhb_gzbmi1
oracle 36394 1 0 Jan13 ? 00:57:45 ora_mman_gzbmi1
oracle 36396 1 0 Jan13 ? 01:37:48 ora_dbw0_gzbmi1
oracle 36398 1 0 Jan13 ? 01:39:29 ora_dbw1_gzbmi1
oracle 36400 1 0 Jan13 ? 01:39:10 ora_dbw2_gzbmi1
oracle 36402 1 0 Jan13 ? 01:38:37 ora_dbw3_gzbmi1
oracle 36404 1 0 Jan13 ? 03:27:51 ora_lgwr_gzbmi1
oracle 36406 1 0 Jan13 ? 04:29:21 ora_ckpt_gzbmi1
oracle 36408 1 0 Jan13 ? 01:26:17 ora_smon_gzbmi1
oracle 36410 1 0 Jan13 ? 00:04:03 ora_reco_gzbmi1
oracle 36412 1 0 Jan13 ? 00:09:02 ora_rbal_gzbmi1
oracle 36414 1 0 Jan13 ? 00:08:44 ora_asmb_gzbmi1
oracle 36422 1 0 Jan13 ? 00:03:50 ora_d000_gzbmi1
oracle 36424 1 0 Jan13 ? 00:03:48 ora_s000_gzbmi1
oracle 36426 1 0 Jan13 ? 00:12:38 ora_mark_gzbmi1
oracle 36432 1 0 Jan13 ? 06:48:47 ora_lck0_gzbmi1
oracle 36434 1 0 Jan13 ? 00:15:49 ora_rsmn_gzbmi1
oracle 36537 1 0 Jan13 ? 00:25:46 ora_arc0_gzbmi1
oracle 36539 1 0 Jan13 ? 00:29:15 ora_arc1_gzbmi1
oracle 36541 1 0 Jan13 ? 00:07:45 ora_arc2_gzbmi1
oracle 36543 1 0 Jan13 ? 00:25:44 ora_arc3_gzbmi1
oracle 36546 1 0 Jan13 ? 00:07:06 ora_gtx0_gzbmi1
oracle 36548 1 0 Jan13 ? 00:45:36 ora_rcbg_gzbmi1
oracle 41230 1 0 10:01 ? 00:00:03 ora_p000_gzbmi1
oracle 51435 47466 0 11:06 pts/2 00:00:00 grep ora_
[oracle@gzbmi01 ~]$ ps -ef|grep asm_
grid 22587 1 0 Jan13 ? 00:56:55 asm_pmon_+ASM1
grid 22589 1 0 Jan13 ? 00:40:03 asm_psp0_+ASM1
grid 22591 1 0 Jan13 ? 1-05:47:40 asm_vktm_+ASM1
grid 22595 1 0 Jan13 ? 00:07:35 asm_gen0_+ASM1
grid 22597 1 0 Jan13 ? 02:31:03 asm_diag_+ASM1
grid 22599 1 0 Jan13 ? 00:21:24 asm_ping_+ASM1
grid 22601 1 0 Jan13 ? 19:48:03 asm_dia0_+ASM1
grid 22603 1 0 Jan13 ? 06:51:21 asm_lmon_+ASM1
grid 22605 1 0 Jan13 ? 04:58:18 asm_lmd0_+ASM1
grid 22607 1 0 Jan13 ? 08:30:03 asm_lms0_+ASM1
grid 22611 1 0 Jan13 ? 00:15:48 asm_lmhb_+ASM1
grid 22613 1 0 Jan13 ? 00:08:13 asm_mman_+ASM1
grid 22615 1 0 Jan13 ? 00:09:25 asm_dbw0_+ASM1
grid 22617 1 0 Jan13 ? 00:09:44 asm_lgwr_+ASM1
grid 22619 1 0 Jan13 ? 00:23:57 asm_ckpt_+ASM1
grid 22621 1 0 Jan13 ? 00:06:58 asm_smon_+ASM1
grid 22623 1 0 Jan13 ? 01:34:49 asm_rbal_+ASM1
grid 22625 1 0 Jan13 ? 01:08:11 asm_gmon_+ASM1
grid 22627 1 0 Jan13 ? 00:23:08 asm_mmon_+ASM1
grid 22629 1 0 Jan13 ? 03:20:59 asm_mmnl_+ASM1
grid 22631 1 0 Jan13 ? 00:28:17 asm_lck0_+ASM1
grid 22715 1 0 Jan13 ? 00:06:40 asm_asmb_+ASM1
oracle 51438 47466 0 11:06 pts/2 00:00:00 grep asm_
[grid@gzbmi01 ~]$srvctl stop instance -d gzbmi -n gzbmi01
oracle使用者的告警日誌提示:
Fri May 19 11:01:46 2017
Shutting down instance (immediate)
Stopping background process SMCO
Shutting down instance: further logons disabled
Stopping background process QMNC
Stopping background process MMNL
License high water mark = 212
Stopping Job queue slave processes, flags = 7
Job queue slave processes stopped
Fri May 19 11:02:40 2017
Stopping background process CJQ0
Fri May 19 11:06:49 2017
Active call for process 41230 user 'oracle' program 'oracle@gzbmi01 (P000)'
SHUTDOWN: waiting for active calls to complete.
Fri May 19 11:07:34 2017
All dispatchers and shared servers shutdown
ALTER DATABASE CLOSE NORMAL /* db agent *//* {1:61387:21743} */
Fri May 19 11:09:59 2017
Reconfiguration started (old inc 4, new inc 6)
List of instances:
1 2 (myinst: 1)
Global Resource Directory frozen
Communication channels reestablished
Master broadcasted resource hash value bitmaps
Non-local Process blocks cleaned out
Fri May 19 11:09:59 2017
LMS 0: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
Fri May 19 11:09:59 2017
LMS 2: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
Fri May 19 11:09:59 2017
LMS 1: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
Set master node info
Submitted all remote-enqueue requests
Dwn-cvts replayed, VALBLKs dubious
All grantable enqueues granted
Fri May 19 11:11:48 2017
License high water mark = 212
USER (ospid: 52209): terminating the instance
Instance terminated by USER, pid = 52209
[oracle@gzbmi01 ~]$ ps -ef|grep ora_
[oracle@gzbmi01 ~]$
重新啟動節點1上的例項:
[grid@gzbmi01 ~]$srvctl start instance -d gzbmi -n gzbmi01
oracle使用者告警日誌提示:
Fri May 19 11:14:12 2017
Adjusting the default value of parameter parallel_max_servers
from 1280 to 470 due to the value of parameter processes (500)
Starting ORACLE instance (normal)
......
Starting background process QMNC
Fri May 19 11:14:53 2017
QMNC started with pid=46, OS id=52912
Completed: ALTER DATABASE OPEN /* db agent *//* {1:61387:21775} */
Fri May 19 11:14:56 2017
Starting background process CJQ0
Fri May 19 11:14:56 2017
CJQ0 started with pid=51, OS id=52941
節點1上的例項重新啟動後恢復正常:
[oracle@gzbmi01 ~]$ sqlplus bmiCnew/bmiCnew
SQL*Plus: Release 11.2.0.4.0 Production on Fri May 19 11:18:33 2017
Copyright (c) 1982, 2013, Oracle. All rights reserved.
Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Data Mining and Real Application Testing options
SQL>
應用程式啟動後功能測試恢復正常,具體的節點1例項hang的原因不太清楚,地市技術人員說前期做過資料抽取,
但是從資料庫告警日誌和叢集日誌沒有找到資源緊張引發的告警資訊。
來自 “ ITPUB部落格 ” ,連結:http://blog.itpub.net/29357786/viewspace-2139592/,如需轉載,請註明出處,否則將追究法律責任。
相關文章
- Oracle RAC新增節點Oracle
- Oracle RAC 新增節點Oracle
- Oracle Rac 刪除節點Oracle
- slave之Seconds_Behind_Master析AST
- [網摘] Oracle RAC新增節點Oracle
- 檢視oracle rac的節點Oracle
- 用xtrabackup工具建立slave節點
- oracle11g RAC新增節點Oracle
- Oracle10g RAC 加節點Oracle
- 【RAC】Oracle10g RAC 節點重配的方式Oracle
- ORACLE 10G增加一個節點rac3Oracle 10g
- oracle 11g rac新增節點前之清除節點資訊Oracle
- 刪除oracle10g rac(rhel4)節點_節點Oracle
- ORACLE RAC環境下刪除節點Oracle
- Oracle RAC某一節點異常,你該怎麼辦?Oracle
- oracle 10g rac,刪除故障節點並新增新節點Oracle 10g
- oracle 10g rac 新增節點與刪除節點步驟Oracle 10g
- oracle 11gR2 rac 兩節點有一個節點down掉問題處理Oracle
- Oracle RAC 一個節點不能自動啟動 怪問題Oracle
- Oracle10203RAC環境刪除節點(一)Oracle
- Oracle10203RAC環境新增新節點(一)Oracle
- RAC 雙節點 轉單節點流程
- oracle11g_RAC新增刪除節點Oracle
- Oracle 11g RAC重新新增節點Oracle
- Oracle 10g RAC刪除、增加節點Oracle 10g
- Oracle RAC 10g叢集節點增加Oracle
- Oracle 10g RAC增加節點例項Oracle 10g
- Oracle9.2.0.4 RAC系統加入新節點Oracle
- 【RAC】Oracle10g rac新增刪除節點命令參考Oracle
- 【RAC】Oracle11g RAC刪除節點相關事項Oracle
- 【RAC】Oracle11g RAC新增新節點相關事項Oracle
- Oracle10g RAC 單獨關閉一個節點(維護)Oracle
- oracle rac 如果硬碟很多,節點也多,一定要做linkOracle硬碟
- MySQL 5.5 Slave節點備份指令碼(mysqldump)MySql指令碼
- Oracle優化案例-新增RAC節點(二十九)Oracle優化
- Oracle RAC節點時間差過大解決方法Oracle
- oracle11GR2 RAC節點crash故障分析Oracle
- 關於Oracle RAC節點間免密碼策略Oracle密碼