ORACLE RAC 一節點出現Suspending MMON slave action kewrmrfsa_ for 82800 seconds

清風艾艾發表於2017-05-19

今天上午,廣州一Oracle 11g rac叢集的節點1出現告警提示:Suspending MMON slave action kewrmrfsa_ for 82800 seconds

出現的現象是,普通使用者登入系統掛起:

[oracle@gzbmi01 ~]$ sqlplus test/test
SQL*Plus: Release 11.2.0.4.0 Production on Fri May 19 11:18:33 2017
Copyright (c) 1982, 2013, Oracle.  All rights reserved.

從作業系統看,系統的cpu和記憶體資源都很空閒,但是oracle有比較多的遠端訪問連線掛起

[oracle@gzbmi01 ~]$ps -ef|grep LOCAL=NO|wc -l

137

[oracle@gzbmi01 ~]$

節點1使用sys使用者能正常登入,但是不能發起任何查詢,節點1發起的查詢全部掛起:

[oracle@gzbmi01 ~]$ sqlplus /  as sysdba

SQL*Plus: Release 11.2.0.4.0 Production on Fri May 19 11:18:33 2017

Copyright (c) 1982, 2013, Oracle.  All rights reserved.

SQL> select event,count(*) from v$session where wait_class<>'Idle' group by event;


但是rac的另外一個節點則是正常的,從節點2查詢資料庫的等待事件:

SQL> select event,count(*) from v$session where wait_class<>'Idle' group by event;
EVENT                                                                         COUNT(*)
---------------------------------------------------------------- ----------
enq: HW - contention                                                      1
SQL*Net message to client                                              1

SQL>select sid,username,sql_id,module,machine,program from v$session where event='enq: HW - contention' and sql_id is not null
       SID USERNAME                       SQL_ID        MODULE                              MACHINE                                                   PROGRAM
---------- ------------------------------ ------------- ---------------------------------------------------------------- ---------------------------------------------------------------- ------------------------------------------------
       430 BMICNEW                        4vs91dcv7u1p6 plsqldev.exe                        Q3RF184E                                         plsqldev.exe
SQL> set long 100000
SQL> select sql_fulltext from v$sqlarea where sql_id='4vs91dcv7u1p6';
SQL_FULLTEXT
--------------------------------------------------------------------------------
insert into sys.aud$( sessionid,entryid,statement,ntimestamp#, userid,userhost,t
erminal,action#,returncode, obj$creator,obj$name,auth$privileges,auth$grantee, n
ew$owner,new$name,ses$actions,ses$tid,logoff$pread, logoff$lwrite,logoff$dead,co
mment$text,spare1,spare2,  priv$used,clientid,sessioncpu,proxy$sid,user$guid, in
stance#,process#,xid,scn,auditid,  sqlbind,sqltext,obj$edition,dbid)  values(:1,
:2,:3,SYS_EXTRACT_UTC(SYSTIMESTAMP),     :4,:5,:6,:7,:8,     :9,:10,:11,:12,
 :13,:14,:15,:16,:17,     :18,:19,:20,:21,:22,     :23,:24,:25,:26,:27,     :28,
:29,:30,:31,:32,     :33,:34,:35,:36)


SQL> select sid,spid, osuser, s.program from v$session s,v$process p where
s.paddr=p.addr and sid=&sid;  2
Enter value for sid: 430
old   2: s.paddr=p.addr and sid=&sid
new   2: s.paddr=p.addr and sid=430
       SID SPID                     OSUSER                         PROGRAM
---------- ------------------------ ------------------------------ ------------------------------------------------
       430 33139                    Administrator                  plsqldev.exe
SQL>

    查詢到的sql是資料庫自身的基表進行的Insert,於是嘗試查殺該會話對應的系統程式:kill -9 33139

但是,查殺會話程式後問題依然存在。

    有人說可能跟歸檔路徑有關係,經檢視資料庫的歸檔路徑是+ARCH,在節點一發起:alter system switch logfile;

能正常執行並且能在歸檔的asm磁碟組arch中查到該檔案:

Fri May 19 10:43:11 2017
Thread 1 advanced to log sequence 21922 (LGWR switch)
  Current log# 7 seq# 21922 mem# 0: +DATA/gzbmi/onlinelog/group_7.349.918726525

    經過與地市技術人員溝通決定,重新啟動節點1上的資料庫例項:

[oracle@gzbmi01 ~]$ ps -ef|grep ora_
oracle   36354     1  0 Jan13 ?        01:19:43 ora_pmon_gzbmi1
oracle   36356     1  0 Jan13 ?        00:41:35 ora_psp0_gzbmi1
oracle   36358     1  0 Jan13 ?        1-05:44:24 ora_vktm_gzbmi1
oracle   36362     1  0 Jan13 ?        00:08:05 ora_gen0_gzbmi1
oracle   36364     1  0 Jan13 ?        02:44:09 ora_diag_gzbmi1
oracle   36366     1  0 Jan13 ?        00:33:55 ora_dbrm_gzbmi1
oracle   36368     1  0 Jan13 ?        00:17:22 ora_ping_gzbmi1
oracle   36370     1  0 Jan13 ?        00:06:58 ora_acms_gzbmi1
oracle   36372     1  0 Jan13 ?        12:43:37 ora_dia0_gzbmi1
oracle   36374     1  0 Jan13 ?        16:38:02 ora_lmon_gzbmi1
oracle   36376     1  0 Jan13 ?        05:14:26 ora_lmd0_gzbmi1
oracle   36378     1  0 Jan13 ?        13:07:21 ora_lms0_gzbmi1
oracle   36382     1  0 Jan13 ?        13:04:36 ora_lms1_gzbmi1
oracle   36386     1  0 Jan13 ?        13:06:21 ora_lms2_gzbmi1
oracle   36390     1  0 Jan13 ?        00:06:41 ora_rms0_gzbmi1
oracle   36392     1  0 Jan13 ?        00:17:44 ora_lmhb_gzbmi1
oracle   36394     1  0 Jan13 ?        00:57:45 ora_mman_gzbmi1
oracle   36396     1  0 Jan13 ?        01:37:48 ora_dbw0_gzbmi1
oracle   36398     1  0 Jan13 ?        01:39:29 ora_dbw1_gzbmi1
oracle   36400     1  0 Jan13 ?        01:39:10 ora_dbw2_gzbmi1
oracle   36402     1  0 Jan13 ?        01:38:37 ora_dbw3_gzbmi1
oracle   36404     1  0 Jan13 ?        03:27:51 ora_lgwr_gzbmi1
oracle   36406     1  0 Jan13 ?        04:29:21 ora_ckpt_gzbmi1
oracle   36408     1  0 Jan13 ?        01:26:17 ora_smon_gzbmi1
oracle   36410     1  0 Jan13 ?        00:04:03 ora_reco_gzbmi1
oracle   36412     1  0 Jan13 ?        00:09:02 ora_rbal_gzbmi1
oracle   36414     1  0 Jan13 ?        00:08:44 ora_asmb_gzbmi1
oracle   36422     1  0 Jan13 ?        00:03:50 ora_d000_gzbmi1
oracle   36424     1  0 Jan13 ?        00:03:48 ora_s000_gzbmi1
oracle   36426     1  0 Jan13 ?        00:12:38 ora_mark_gzbmi1
oracle   36432     1  0 Jan13 ?        06:48:47 ora_lck0_gzbmi1
oracle   36434     1  0 Jan13 ?        00:15:49 ora_rsmn_gzbmi1
oracle   36537     1  0 Jan13 ?        00:25:46 ora_arc0_gzbmi1
oracle   36539     1  0 Jan13 ?        00:29:15 ora_arc1_gzbmi1
oracle   36541     1  0 Jan13 ?        00:07:45 ora_arc2_gzbmi1
oracle   36543     1  0 Jan13 ?        00:25:44 ora_arc3_gzbmi1
oracle   36546     1  0 Jan13 ?        00:07:06 ora_gtx0_gzbmi1
oracle   36548     1  0 Jan13 ?        00:45:36 ora_rcbg_gzbmi1
oracle   41230     1  0 10:01 ?        00:00:03 ora_p000_gzbmi1
oracle   51435 47466  0 11:06 pts/2    00:00:00 grep ora_
[oracle@gzbmi01 ~]$ ps -ef|grep asm_
grid     22587     1  0 Jan13 ?        00:56:55 asm_pmon_+ASM1
grid     22589     1  0 Jan13 ?        00:40:03 asm_psp0_+ASM1
grid     22591     1  0 Jan13 ?        1-05:47:40 asm_vktm_+ASM1
grid     22595     1  0 Jan13 ?        00:07:35 asm_gen0_+ASM1
grid     22597     1  0 Jan13 ?        02:31:03 asm_diag_+ASM1
grid     22599     1  0 Jan13 ?        00:21:24 asm_ping_+ASM1
grid     22601     1  0 Jan13 ?        19:48:03 asm_dia0_+ASM1
grid     22603     1  0 Jan13 ?        06:51:21 asm_lmon_+ASM1
grid     22605     1  0 Jan13 ?        04:58:18 asm_lmd0_+ASM1
grid     22607     1  0 Jan13 ?        08:30:03 asm_lms0_+ASM1
grid     22611     1  0 Jan13 ?        00:15:48 asm_lmhb_+ASM1
grid     22613     1  0 Jan13 ?        00:08:13 asm_mman_+ASM1
grid     22615     1  0 Jan13 ?        00:09:25 asm_dbw0_+ASM1
grid     22617     1  0 Jan13 ?        00:09:44 asm_lgwr_+ASM1
grid     22619     1  0 Jan13 ?        00:23:57 asm_ckpt_+ASM1
grid     22621     1  0 Jan13 ?        00:06:58 asm_smon_+ASM1
grid     22623     1  0 Jan13 ?        01:34:49 asm_rbal_+ASM1
grid     22625     1  0 Jan13 ?        01:08:11 asm_gmon_+ASM1
grid     22627     1  0 Jan13 ?        00:23:08 asm_mmon_+ASM1
grid     22629     1  0 Jan13 ?        03:20:59 asm_mmnl_+ASM1
grid     22631     1  0 Jan13 ?        00:28:17 asm_lck0_+ASM1
grid     22715     1  0 Jan13 ?        00:06:40 asm_asmb_+ASM1
oracle   51438 47466  0 11:06 pts/2    00:00:00 grep asm_

[grid@gzbmi01 ~]$srvctl stop instance -d gzbmi -n gzbmi01

oracle使用者的告警日誌提示:

Fri May 19 11:01:46 2017
Shutting down instance (immediate)
Stopping background process SMCO
Shutting down instance: further logons disabled
Stopping background process QMNC
Stopping background process MMNL
License high water mark = 212
Stopping Job queue slave processes, flags = 7
Job queue slave processes stopped
Fri May 19 11:02:40 2017
Stopping background process CJQ0
Fri May 19 11:06:49 2017
Active call for process 41230 user 'oracle' program 'oracle@gzbmi01 (P000)'
SHUTDOWN: waiting for active calls to complete.
Fri May 19 11:07:34 2017
All dispatchers and shared servers shutdown
ALTER DATABASE CLOSE NORMAL /* db agent *//* {1:61387:21743} */
Fri May 19 11:09:59 2017
Reconfiguration started (old inc 4, new inc 6)
List of instances:
 1 2 (myinst: 1) 
 Global Resource Directory frozen
 Communication channels reestablished
 Master broadcasted resource hash value bitmaps
 Non-local Process blocks cleaned out
Fri May 19 11:09:59 2017
 LMS 0: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
Fri May 19 11:09:59 2017
 LMS 2: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
Fri May 19 11:09:59 2017
 LMS 1: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
 Set master node info 
 Submitted all remote-enqueue requests
 Dwn-cvts replayed, VALBLKs dubious
 All grantable enqueues granted
Fri May 19 11:11:48 2017
License high water mark = 212
USER (ospid: 52209): terminating the instance
Instance terminated by USER, pid = 52209

[oracle@gzbmi01 ~]$ ps -ef|grep ora_

[oracle@gzbmi01 ~]$

重新啟動節點1上的例項:

[grid@gzbmi01 ~]$srvctl start instance -d gzbmi -n gzbmi01

oracle使用者告警日誌提示:

Fri May 19 11:14:12 2017
Adjusting the default value of parameter parallel_max_servers
from 1280 to 470 due to the value of parameter processes (500)
Starting ORACLE instance (normal)

......

Starting background process QMNC
Fri May 19 11:14:53 2017
QMNC started with pid=46, OS id=52912 
Completed: ALTER DATABASE OPEN /* db agent *//* {1:61387:21775} */
Fri May 19 11:14:56 2017
Starting background process CJQ0
Fri May 19 11:14:56 2017
CJQ0 started with pid=51, OS id=52941

節點1上的例項重新啟動後恢復正常:

[oracle@gzbmi01 ~]$ sqlplus bmiCnew/bmiCnew

SQL*Plus: Release 11.2.0.4.0 Production on Fri May 19 11:18:33 2017

Copyright (c) 1982, 2013, Oracle.  All rights reserved.
Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Data Mining and Real Application Testing options
SQL>

應用程式啟動後功能測試恢復正常,具體的節點1例項hang的原因不太清楚,地市技術人員說前期做過資料抽取,

但是從資料庫告警日誌和叢集日誌沒有找到資源緊張引發的告警資訊。

來自 “ ITPUB部落格 ” ,連結:http://blog.itpub.net/29357786/viewspace-2139592/,如需轉載,請註明出處,否則將追究法律責任。

相關文章