ORA-00494: enqueue [CF] held for too long (more than 900 seconds)

parknkjun發表於2016-11-16
近日某客戶資料庫凌晨5點左右crash,報如下錯誤:
Fri Nov 11 03:17:51 2016
Thread 1 advanced to log sequence 583682 (LGWR switch)
  Current log# 16 seq# 583682 mem# 0: /jzh/jzh_u38/jzh/redo16a.log
Fri Nov 11 05:23:03 2016
ALTER SYSTEM ARCHIVE LOG
Fri Nov 11 05:23:03 2016
Thread 1 cannot allocate new log, sequence 583683
Private strand flush not complete
  Current log# 16 seq# 583682 mem# 0: /jzh/jzh_u38/jzh/redo16a.log
Fri Nov 11 05:23:03 2016
Thread 1 advanced to log sequence 583683 (LGWR switch)
  Current log# 6 seq# 583683 mem# 0: /jzh/jzh_u38/jzh/redo06a.log
Fri Nov 11 05:24:13 2016
ALTER SYSTEM ARCHIVE LOG
Fri Nov 11 05:26:58 2016
Thread 1 advanced to log sequence 583684 (LGWR switch)
  Current log# 7 seq# 583684 mem# 0: /jzh/jzh_u38/jzh/redo07a.log
Fri Nov 11 05:27:27 2016
Starting control autobackup
Fri Nov 11 05:27:54 2016
Control autobackup written to SBT_TAPE device
        comment 'API Version 2.0,MMS Version 5.0.0.0',
        media '@aaaag'
        handle 'c-1734772114-20161111-03'
Fri Nov 11 05:48:24 2016
Errors in file /u01/oracle/admin/jzh/bdump/jzh_arc1_1940.trc:
ORA-00494: enqueue [CF] held for too long (more than 900 seconds) by 'inst 1, osid 1921'
Fri Nov 11 05:48:25 2016
System State dumped to trace file /u01/oracle/admin/jzh/bdump/jzh_arc1_1940.trc
Fri Nov 11 05:48:35 2016
Killing enqueue blocker (pid=1921) on resource CF-00000000-00000000
 by killing session 824.1
Killing enqueue blocker (pid=1921) on resource CF-00000000-00000000
 by terminating the process
ARC1: terminating instance due to error 2103
Termination issued to instance processes. Waiting for the processes to exit
Fri Nov 11 05:48:46 2016
Instance termination failed to kill one or more processes
Fri Nov 11 05:55:48 2016
Instance terminated by ARC1, pid = 1940
Fri Nov 11 07:05:18 2016
Starting ORACLE instance (normal)
Fri Nov 11 07:05:18 2016
Specified value of sga_max_size is too small, bumping to 12247367680
說明:1、pid 1921/sid 824程式持有enqueue [CF]超過900s,R持有資源CF-00000000-00000000
          2、pid 1940 ARC1程式kill pid 1921失敗,並關閉例項。
Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bit Production
With the Partitioning, Oracle Label Security, OLAP, Data Mining Scoring Engine
and Real Application Testing options
ORACLE_HOME = /u01/oracle/product/db10gr2
System name:    HP-UX
Node name:      JZHDB2
Release:        B.11.31
Version:        U
Machine:        9000/800
Instance name: jzh
Redo thread mounted by this instance: 1
Oracle process number: 9
Unix process pid: 1940, image: oracle@JZHDB2 (ARC1)
*** SERVICE NAME:(SYS$BACKGROUND) 2016-11-11 05:48:24.968
*** SESSION ID:(1651.1) 2016-11-11 05:48:24.968
*** 2016-11-11 05:48:24.968
Unable to get enqueue on resource CF-00000000-00000000 (ges mode req=3 held=6)
Possible local blocker ospid=1921 sid=824 sser=1 time_held=910 secs (ges mode req=6 held=4)
DUMP LOCAL BLOCKER: initiate state dump for KILL BLOCKER
  possible owner[10.1921] on resource CF-00000000-00000000
Dumping process info of pid[10.1921] requested by pid[9.1940]
ORA-00494: enqueue [CF] held for too long (more than 900 seconds) by 'inst 1, osid 1921'
-------------------------------------------------------------------------------
ENQUEUE [CF] HELD FOR TOO LONG
enqueue holder: 'inst 1, osid 1921'
Process 'inst 1, osid 1921' is holding an enqueue for maximum allowed time.
The process will be terminated. 
Oracle Support Services triaging information: to find the root-cause, look
at the call stack of process 'inst 1, osid 1921' located below. Ask the
developer that owns the first NON-service layer in the stack to investigate.
Common service layers are enqueues (ksq), latches (ksl), library cache
pins and locks (kgl), and row cache locks (kqr). 
Dumping process 10.1921 info:
*** 2016-11-11 05:48:24.982
Dumping diagnostic information for ospid 1921:
OS pid = 1921
loadavg : 0.17 0.18 0.22
Swapinfo : 
        Avail = 111554.56Mb Used = 37997.61Mb
        Swap free = 73556.95Mb Kernel rsvd = 10888.46Mb
        Free Mem  = 22945.59Mb 
  F S      UID   PID  PPID  C PRI NI             ADDR   SZ            WCHAN    STIME TTY       TIME COMD
11001 R   oracle  1921     1  0 152 20        41489b600 29229                -  Apr 15  ?        8080:31 ora_ckpt_jzh
11001 S   oracle 19215     1  0 154 20        6dc245080 28763        3821d9b68  Nov  6  ?         0:13 oraclejzh (LOCAL=NO)
11001 S   oracle 19213     1  0 154 20        e9d85dd00 28827         dcd59268  Nov  6  ?         0:15 oraclejzh (LOCAL=NO)
11001 S   oracle 19211     1  0 154 20        29d10aa80 28826        cb19bce68  Nov  6  ?         0:15 oraclejzh (LOCAL=NO)
12401 S   patrol 15982     1  0 153 20        48f97f080 7320        a5c32a0a0 05:39:35 ?         0:00 /opt/langtools/bin/gdb64 -quie
t /u01/oracle/product/db10gr2/bin/oracle 1921
*** 2016-11-11 05:48:25.581
----------------------------------------
SO: c000000301002038, type: 2, owner: 0000000000000000, flag: INIT/-/-/0x00
  (process) Oracle pid=10, calls cur/top: c0000003015c8548/c0000003015c8548, flag: (6) SYSTEM
            int error: 0, call error: 0, sess error: 0, txn error 0
  (post info) last post received: 0 0 24
              last post received-location: ksasnd
              last process to post me: c000000301009f38 159 0
              last post sent: 0 0 25
              last post sent-location: ksasnr
              last process posted by me: c000000301009f38 159 0
  (latch info) wait_event=0 bits=0
    Process Group: DEFAULT, pseudo proc: c00000038c9cf2b8
    O/S info: user: oracle, term: UNKNOWN, ospid: 1921 
    OSD pid info: Unix process pid: 1921, image: oracle@JZHDB2 (CKPT)
Dump of memory from 0xC00000038B934648 to 0xC00000038B934850
C00000038B934640                   00000010 00000000          [........]
C00000038B934650 C0000003 04851500 00000007 000313A7  [................]
C00000038B934660 C0000003 048515E0 00000007 000313A7  [................]
C00000038B934670 C0000003 048517B8 00000007 000313A7  [................]
C00000038B934680 C0000003 0484A998 00000007 000313A7  [................]
C00000038B934690 C0000003 048ECBB8 00000007 000313A7  [................]
C00000038B9346A0 C0000003 048F6960 00000007 000313A7  [......i`........]
C00000038B9346B0 C0000003 048F8210 00000007 000313A7  [................]
C00000038B9346C0 C0000003 048F8E68 00000007 000313A7  [.......h........]
C00000038B9346D0 C0000003 04901360 00000007 000313A7  [.......`........]
C00000038B9346E0 C0000003 04901A78 00000007 000313A7  [.......x........]
C00000038B9346F0 C0000003 04901ED8 00000007 000313A7  [................]
C00000038B934700 C0000003 04902178 00000007 000313A7  [......!x........]
C00000038B934710 C0000003 0647F1B8 00000007 000313A7  [.....G..........]
C00000038B934720 C0000003 0647F298 00000007 000313A7  [.....G..........]
C00000038B934730 C0000003 048386E0 00000007 000313A7  [................]
C00000038B934740 C0000003 048387C0 00000007 000313A7  [................]
C00000038B934750 C0000003 0484D840 00000007 000313A7  [.......@........]
C00000038B934760 C0000003 0484D920 00000007 000313A7  [....... ........]
C00000038B934770 C0000003 0484F8D0 00000007 000313A7  [................]
C00000038B934780 C0000003 0484F9B0 00000007 000313A7  [................]
C00000038B934790 C0000003 0484FAA8 00000007 000313A7  [................]
C00000038B9347A0 C0000003 0484FB88 00000007 000313A7  [................]
C00000038B9347B0 C0000003 0484FC68 00000007 000313A7  [.......h........]
C00000038B9347C0 C0000003 0484FE28 00000007 000313A7  [.......(........]
C00000038B9347D0 C0000003 0484FF08 00000007 000313A7  [................]
C00000038B9347E0 C0000003 0484FFE8 00000007 000313A7  [................]
C00000038B9347F0 C0000003 049266D0 00000007 000313A7  [......f.........]
C00000038B934800 C0000003 049267B0 00000007 000313A7  [......g.........]
C00000038B934810 C0000003 04926890 00000007 000313A7  [......h.........]
C00000038B934820 C0000003 04926988 00000007 000313A7  [......i.........]
C00000038B934830 C0000003 0492B9A0 00000007 000313A7  [................]
C00000038B934840 C0000003 04851420 00000007 000313A7  [....... ........]
----------------------------------------
SO: c0000003015c0b00, type: 4, owner: c000000301002038, flag: INIT/-/-/0x00
  (session) sid: 824 trans: 0000000000000000, creator: c000000301002038, flag: (51) USR/- BSY/-/-/-/-/-
            DID: 0001-000A-00000004, short-term DID: 0000-0000-00000000
            txn branch: 0000000000000000
            oct: 0, prv: 0, sql: 0000000000000000, psql: 0000000000000000, user: 0/SYS
  waiting for 'db file sequential read' blocking sess=0x0000000000000000 seq=52963 wait_time=0 seconds since wait started=910
          file#=7c, block#=1, blocks=1
  Dumping Session Wait History
   for 'db file single write' count=1 wait_time=488
          file#=7b, block#=1, blocks=1
   for 'db file sequential read' count=1 wait_time=4382
          file#=7b, block#=1, blocks=1
   for 'db file single write' count=1 wait_time=333
          file#=7a, block#=1, blocks=1
   for 'db file sequential read' count=1 wait_time=4759
          file#=7a, block#=1, blocks=1
   for 'db file single write' count=1 wait_time=587
          file#=79, block#=1, blocks=1
   for 'db file sequential read' count=1 wait_time=7210
          file#=79, block#=1, blocks=1
   for 'db file single write' count=1 wait_time=554
          file#=78, block#=1, blocks=1
   for 'db file sequential read' count=1 wait_time=8361
          file#=78, block#=1, blocks=1
   for 'db file single write' count=1 wait_time=330
          file#=77, block#=1, blocks=1
   for 'db file sequential read' count=1 wait_time=2886
          file#=77, block#=1, blocks=1
  temporary object counter: 0
KTU Session Commit Cache Dump for IDLs: 
KTU Session Commit Cache Dump for Non-IDLs: 
----------------------------------------
UOL used : 0 locks(used=0, free=2)
KGX Atomic Operation Log c000000388ff8820
 Mutex 0000000000000000(0, 0) idn 0 oper NONE
 Cursor Parent uid 824 efd 13 whr 22 slp 0
 oper=NONE pt1=c000000383fb0c50 pt2=c000000383fb0e38 pt3=c000000383fb1358
 pt4=0000000000000000 u41=0 stt=0
KGX Atomic Operation Log c000000388ff8868
 Mutex c000000382b23650(1073741824, 13490240) idn c0000003 oper NONE
 Cursor Stat uid 824 efd 13 whr 2 slp 0
 oper=NONE pt1=0000000000000000 pt2=0000000000000000 pt3=0000000000000000
 pt4=0000000000000000 u41=0 stt=8
KGX Atomic Operation Log c000000388ff88b0
 Mutex 0000000000000000(0, 0) idn 0 oper NONE
 Library Cache uid 824 efd 0 whr 0 slp 0
Session 824: no row
===================================================
SYSTEM STATE
------------
System global information:
     processes: base c00000038b930d40, size 1500, cleanup c000000301000078
     allocation: free sessions c00000038c9d7738, free calls 0000000000000000
     control alloc errors: 0 (process), 0 (session), 0 (call)
     PMON latch cleanup depth: 0
     seconds since PMON's last scan for dead processes: 18
說明:1、ARCH1程式請求的資源正是CF-00000000-00000000
         2、pid 1921為oracle後臺程式ckpt,阻塞了ARCH1程式
         3、pid 1921 ckpt程式持有enqueue [CF] on resource CF-00000000-00000000為910s.
        ORA-00494錯誤通常是由於系統I/O原因,結合故障時,本地磁碟I/O較高,MOS上找到文件ORA-00494: Enqueue [CF] Error when        using rman (文件 ID 1598848.1)
CAUSE
The snapshot controlfile setting for rman was using a slow filesystem.
SOLUTION
The snapshot controlfile setting for rman was changed to a different filesystem that performs better, for instance:
CONFIGURE SNAPSHOT CONTROLFILE NAME TO '/new_mount/snapcf.ctl';
         說明: 1、當前系統已開啟control file自動備份
                   2、snapshot controlfile備份存在本地磁碟,即$ORACLE_HOME/dbs目錄下
                   3、snapshot controlfile大小約74M
-rw-r-----   1 oracle     oinstall   74252288 Nov 23  2015 c-1734772114-20151123-14
-rw-r-----   1 oracle     oinstall   74252288 Jan  7  2016 c-1734772114-20160107-11
-rw-r-----   1 oracle     oinstall   74252288 Jan  7  2016 c-1734772114-20160107-12
-rw-r-----   1 oracle     oinstall   74252288 Jan 13  2016 c-1734772114-20160113-12
-rw-r-----   1 oracle     oinstall   74252288 Feb 17  2016 c-1734772114-20160217-12
-rw-r-----   1 oracle     oinstall   74252288 Feb 17  2016 c-1734772114-20160217-13
-rw-r-----   1 oracle     oinstall   74252288 Mar 21  2016 c-1734772114-20160321-13
-rw-r-----   1 oracle     oinstall      5236 Apr 11  2016 alert_jzh.log
-rw-r-----   1 oracle     oinstall   74252288 Apr 22  2016 c-1734772114-20160422-13
-rw-r-----   1 oracle     oinstall   74252288 Apr 22  2016 c-1734772114-20160422-14
-rw-r-----   1 oracle     oinstall   74252288 Apr 22  2016 c-1734772114-20160422-16
-rw-r-----   1 oracle     oinstall   74252288 May  3  2016 c-1734772114-20160503-07
-rw-r-----   1 oracle     oinstall      3072 Nov  5 21:02 orapwjzh
-rw-rw----   1 oracle     oinstall      1544 Nov 11 05:48 hc_jzh.dat
-rw-r-----   1 oracle     oinstall   74170368 Nov 11 16:25 snapcf_jzh.f
總結:由於本地磁碟I/O效能較差,導致snapshot controlfile備份很慢,導致資料庫crash
建議:將snapshot存放於I/O效能較好的磁碟或儲存上,另外archivelog條目超過25000個,建議備份時crosscheck,並delete obsolete/epired archivelog







來自 “ ITPUB部落格 ” ,連結:http://blog.itpub.net/10271187/viewspace-2128566/,如需轉載,請註明出處,否則將追究法律責任。

相關文章