近日某客戶資料庫凌晨5點左右crash，報如下錯誤：
Fri Nov 11 03:17:51 2016
Thread 1 advanced to log sequence 583682 (LGWR switch)
Current log# 16 seq# 583682 mem# 0: /jzh/jzh_u38/jzh/redo16a.log
Fri Nov 11 05:23:03 2016
ALTER SYSTEM ARCHIVE LOG
Fri Nov 11 05:23:03 2016
Thread 1 cannot allocate new log, sequence 583683
Private strand flush not complete
Current log# 16 seq# 583682 mem# 0: /jzh/jzh_u38/jzh/redo16a.log
Fri Nov 11 05:23:03 2016
Thread 1 advanced to log sequence 583683 (LGWR switch)
Current log# 6 seq# 583683 mem# 0: /jzh/jzh_u38/jzh/redo06a.log
Fri Nov 11 05:24:13 2016
ALTER SYSTEM ARCHIVE LOG
Fri Nov 11 05:26:58 2016
Thread 1 advanced to log sequence 583684 (LGWR switch)
Current log# 7 seq# 583684 mem# 0: /jzh/jzh_u38/jzh/redo07a.log
Fri Nov 11 05:27:27 2016
Starting control autobackup
Fri Nov 11 05:27:54 2016
Control autobackup written to SBT_TAPE device
comment 'API Version 2.0,MMS Version 5.0.0.0',
media '@aaaag'
handle 'c-1734772114-20161111-03'
Fri Nov 11 05:48:24 2016
Errors in file /u01/oracle/admin/jzh/bdump/jzh_arc1_1940.trc:
ORA-00494: enqueue [CF] held for too long (more than 900 seconds) by 'inst 1, osid 1921'
Fri Nov 11 05:48:25 2016
System State dumped to trace file /u01/oracle/admin/jzh/bdump/jzh_arc1_1940.trc
Fri Nov 11 05:48:35 2016
Killing enqueue blocker (pid=1921) on resource CF-00000000-00000000
by killing session 824.1
Killing enqueue blocker (pid=1921) on resource CF-00000000-00000000
by terminating the process
ARC1: terminating instance due to error 2103
Termination issued to instance processes. Waiting for the processes to exit
Fri Nov 11 05:48:46 2016
Instance termination failed to kill one or more processes
Fri Nov 11 05:55:48 2016
Instance terminated by ARC1, pid = 1940
Fri Nov 11 07:05:18 2016
Starting ORACLE instance (normal)
Fri Nov 11 07:05:18 2016
Specified value of sga_max_size is too small, bumping to 12247367680
說明：1、pid 1921/sid 824程式持有enqueue [CF]超過900s,R持有資源CF-00000000-00000000
2、pid 1940 ARC1程式kill pid 1921失敗，並關閉例項。
Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bit Production
With the Partitioning, Oracle Label Security, OLAP, Data Mining Scoring Engine
and Real Application Testing options
ORACLE_HOME = /u01/oracle/product/db10gr2
System name: HP-UX
Node name: JZHDB2
Release: B.11.31
Version: U
Machine: 9000/800
Instance name: jzh
Redo thread mounted by this instance: 1
Oracle process number: 9
Unix process pid: 1940, image: oracle@JZHDB2 (ARC1)
*** SERVICE NAME:(SYS$BACKGROUND) 2016-11-11 05:48:24.968
*** SESSION ID:(1651.1) 2016-11-11 05:48:24.968
*** 2016-11-11 05:48:24.968
Unable to get enqueue on resource CF-00000000-00000000 (ges mode req=3 held=6)
Possible local blocker ospid=1921 sid=824 sser=1 time_held=910 secs (ges mode req=6 held=4)
DUMP LOCAL BLOCKER: initiate state dump for KILL BLOCKER
possible owner[10.1921] on resource CF-00000000-00000000
Dumping process info of pid[10.1921] requested by pid[9.1940]
ORA-00494: enqueue [CF] held for too long (more than 900 seconds) by 'inst 1, osid 1921'
-------------------------------------------------------------------------------
ENQUEUE [CF] HELD FOR TOO LONG
enqueue holder: 'inst 1, osid 1921'
Process 'inst 1, osid 1921' is holding an enqueue for maximum allowed time.
The process will be terminated.
Oracle Support Services triaging information: to find the root-cause, look
at the call stack of process 'inst 1, osid 1921' located below. Ask the
developer that owns the first NON-service layer in the stack to investigate.
Common service layers are enqueues (ksq), latches (ksl), library cache
pins and locks (kgl), and row cache locks (kqr).
Dumping process 10.1921 info:
*** 2016-11-11 05:48:24.982
Dumping diagnostic information for ospid 1921:
OS pid = 1921
loadavg : 0.17 0.18 0.22
Swapinfo :
Avail = 111554.56Mb Used = 37997.61Mb
Swap free = 73556.95Mb Kernel rsvd = 10888.46Mb
Free Mem = 22945.59Mb
F S UID PID PPID C PRI NI ADDR SZ WCHAN STIME TTY TIME COMD
11001 R oracle 1921 1 0 152 20 41489b600 29229 - Apr 15 ? 8080:31 ora_ckpt_jzh
11001 S oracle 19215 1 0 154 20 6dc245080 28763 3821d9b68 Nov 6 ? 0:13 oraclejzh (LOCAL=NO)
11001 S oracle 19213 1 0 154 20 e9d85dd00 28827 dcd59268 Nov 6 ? 0:15 oraclejzh (LOCAL=NO)
11001 S oracle 19211 1 0 154 20 29d10aa80 28826 cb19bce68 Nov 6 ? 0:15 oraclejzh (LOCAL=NO)
12401 S patrol 15982 1 0 153 20 48f97f080 7320 a5c32a0a0 05:39:35 ? 0:00 /opt/langtools/bin/gdb64 -quie
t /u01/oracle/product/db10gr2/bin/oracle 1921
*** 2016-11-11 05:48:25.581
----------------------------------------
SO: c000000301002038, type: 2, owner: 0000000000000000, flag: INIT/-/-/0x00
(process) Oracle pid=10, calls cur/top: c0000003015c8548/c0000003015c8548, flag: (6) SYSTEM
int error: 0, call error: 0, sess error: 0, txn error 0
(post info) last post received: 0 0 24
last post received-location: ksasnd
last process to post me: c000000301009f38 159 0
last post sent: 0 0 25
last post sent-location: ksasnr
last process posted by me: c000000301009f38 159 0
(latch info) wait_event=0 bits=0
Process Group: DEFAULT, pseudo proc: c00000038c9cf2b8
O/S info: user: oracle, term: UNKNOWN, ospid: 1921
OSD pid info: Unix process pid: 1921, image: oracle@JZHDB2 (CKPT)
Dump of memory from 0xC00000038B934648 to 0xC00000038B934850
C00000038B934640 00000010 00000000 [........]
C00000038B934650 C0000003 04851500 00000007 000313A7 [................]
C00000038B934660 C0000003 048515E0 00000007 000313A7 [................]
C00000038B934670 C0000003 048517B8 00000007 000313A7 [................]
C00000038B934680 C0000003 0484A998 00000007 000313A7 [................]
C00000038B934690 C0000003 048ECBB8 00000007 000313A7 [................]
C00000038B9346A0 C0000003 048F6960 00000007 000313A7 [......i`........]
C00000038B9346B0 C0000003 048F8210 00000007 000313A7 [................]
C00000038B9346C0 C0000003 048F8E68 00000007 000313A7 [.......h........]
C00000038B9346D0 C0000003 04901360 00000007 000313A7 [.......`........]
C00000038B9346E0 C0000003 04901A78 00000007 000313A7 [.......x........]
C00000038B9346F0 C0000003 04901ED8 00000007 000313A7 [................]
C00000038B934700 C0000003 04902178 00000007 000313A7 [......!x........]
C00000038B934710 C0000003 0647F1B8 00000007 000313A7 [.....G..........]
C00000038B934720 C0000003 0647F298 00000007 000313A7 [.....G..........]
C00000038B934730 C0000003 048386E0 00000007 000313A7 [................]
C00000038B934740 C0000003 048387C0 00000007 000313A7 [................]
C00000038B934750 C0000003 0484D840 00000007 000313A7 [.......@........]
C00000038B934760 C0000003 0484D920 00000007 000313A7 [....... ........]
C00000038B934770 C0000003 0484F8D0 00000007 000313A7 [................]
C00000038B934780 C0000003 0484F9B0 00000007 000313A7 [................]
C00000038B934790 C0000003 0484FAA8 00000007 000313A7 [................]
C00000038B9347A0 C0000003 0484FB88 00000007 000313A7 [................]
C00000038B9347B0 C0000003 0484FC68 00000007 000313A7 [.......h........]
C00000038B9347C0 C0000003 0484FE28 00000007 000313A7 [.......(........]
C00000038B9347D0 C0000003 0484FF08 00000007 000313A7 [................]
C00000038B9347E0 C0000003 0484FFE8 00000007 000313A7 [................]
C00000038B9347F0 C0000003 049266D0 00000007 000313A7 [......f.........]
C00000038B934800 C0000003 049267B0 00000007 000313A7 [......g.........]
C00000038B934810 C0000003 04926890 00000007 000313A7 [......h.........]
C00000038B934820 C0000003 04926988 00000007 000313A7 [......i.........]
C00000038B934830 C0000003 0492B9A0 00000007 000313A7 [................]
C00000038B934840 C0000003 04851420 00000007 000313A7 [....... ........]
----------------------------------------
SO: c0000003015c0b00, type: 4, owner: c000000301002038, flag: INIT/-/-/0x00
(session) sid: 824 trans: 0000000000000000, creator: c000000301002038, flag: (51) USR/- BSY/-/-/-/-/-
DID: 0001-000A-00000004, short-term DID: 0000-0000-00000000
txn branch: 0000000000000000
oct: 0, prv: 0, sql: 0000000000000000, psql: 0000000000000000, user: 0/SYS
waiting for 'db file sequential read' blocking sess=0x0000000000000000 seq=52963 wait_time=0 seconds since wait started=910
file#=7c, block#=1, blocks=1
Dumping Session Wait History
for 'db file single write' count=1 wait_time=488
file#=7b, block#=1, blocks=1
for 'db file sequential read' count=1 wait_time=4382
file#=7b, block#=1, blocks=1
for 'db file single write' count=1 wait_time=333
file#=7a, block#=1, blocks=1
for 'db file sequential read' count=1 wait_time=4759
file#=7a, block#=1, blocks=1
for 'db file single write' count=1 wait_time=587
file#=79, block#=1, blocks=1
for 'db file sequential read' count=1 wait_time=7210
file#=79, block#=1, blocks=1
for 'db file single write' count=1 wait_time=554
file#=78, block#=1, blocks=1
for 'db file sequential read' count=1 wait_time=8361
file#=78, block#=1, blocks=1
for 'db file single write' count=1 wait_time=330
file#=77, block#=1, blocks=1
for 'db file sequential read' count=1 wait_time=2886
file#=77, block#=1, blocks=1
temporary object counter: 0
KTU Session Commit Cache Dump for IDLs:
KTU Session Commit Cache Dump for Non-IDLs:
----------------------------------------
UOL used : 0 locks(used=0, free=2)
KGX Atomic Operation Log c000000388ff8820
Mutex 0000000000000000(0, 0) idn 0 oper NONE
Cursor Parent uid 824 efd 13 whr 22 slp 0
oper=NONE pt1=c000000383fb0c50 pt2=c000000383fb0e38 pt3=c000000383fb1358
pt4=0000000000000000 u41=0 stt=0
KGX Atomic Operation Log c000000388ff8868
Mutex c000000382b23650(1073741824, 13490240) idn c0000003 oper NONE
Cursor Stat uid 824 efd 13 whr 2 slp 0
oper=NONE pt1=0000000000000000 pt2=0000000000000000 pt3=0000000000000000
pt4=0000000000000000 u41=0 stt=8
KGX Atomic Operation Log c000000388ff88b0
Mutex 0000000000000000(0, 0) idn 0 oper NONE
Library Cache uid 824 efd 0 whr 0 slp 0
Session 824: no row
===================================================
SYSTEM STATE
------------
System global information:
processes: base c00000038b930d40, size 1500, cleanup c000000301000078
allocation: free sessions c00000038c9d7738, free calls 0000000000000000
control alloc errors: 0 (process), 0 (session), 0 (call)
PMON latch cleanup depth: 0
seconds since PMON's last scan for dead processes: 18

說明：1、ARCH1程式請求的資源正是CF-00000000-00000000
2、pid 1921為oracle後臺程式ckpt，阻塞了ARCH1程式
3、pid 1921 ckpt程式持有enqueue [CF] on resource CF-00000000-00000000為910s.
ORA-00494錯誤通常是由於系統I/O原因，結合故障時，本地磁碟I/O較高，MOS上找到文件ORA-00494: Enqueue [CF] Error when using rman (文件 ID 1598848.1)

CAUSE
The snapshot controlfile setting for rman was using a slow filesystem.
SOLUTION
The snapshot controlfile setting for rman was changed to a different filesystem that performs better, for instance:
CONFIGURE SNAPSHOT CONTROLFILE NAME TO '/new_mount/snapcf.ctl';
說明： 1、當前系統已開啟control file自動備份
2、snapshot controlfile備份存在本地磁碟,即$ORACLE_HOME/dbs目錄下
3、snapshot controlfile大小約74M
-rw-r----- 1 oracle oinstall 74252288 Nov 23 2015 c-1734772114-20151123-14
-rw-r----- 1 oracle oinstall 74252288 Jan 7 2016 c-1734772114-20160107-11
-rw-r----- 1 oracle oinstall 74252288 Jan 7 2016 c-1734772114-20160107-12
-rw-r----- 1 oracle oinstall 74252288 Jan 13 2016 c-1734772114-20160113-12
-rw-r----- 1 oracle oinstall 74252288 Feb 17 2016 c-1734772114-20160217-12
-rw-r----- 1 oracle oinstall 74252288 Feb 17 2016 c-1734772114-20160217-13
-rw-r----- 1 oracle oinstall 74252288 Mar 21 2016 c-1734772114-20160321-13
-rw-r----- 1 oracle oinstall 5236 Apr 11 2016 alert_jzh.log
-rw-r----- 1 oracle oinstall 74252288 Apr 22 2016 c-1734772114-20160422-13
-rw-r----- 1 oracle oinstall 74252288 Apr 22 2016 c-1734772114-20160422-14
-rw-r----- 1 oracle oinstall 74252288 Apr 22 2016 c-1734772114-20160422-16
-rw-r----- 1 oracle oinstall 74252288 May 3 2016 c-1734772114-20160503-07
-rw-r----- 1 oracle oinstall 3072 Nov 5 21:02 orapwjzh
-rw-rw---- 1 oracle oinstall 1544 Nov 11 05:48 hc_jzh.dat
-rw-r----- 1 oracle oinstall 74170368 Nov 11 16:25 snapcf_jzh.f
總結：由於本地磁碟I/O效能較差，導致snapshot controlfile備份很慢，導致資料庫crash
建議：將snapshot存放於I/O效能較好的磁碟或儲存上，另外archivelog條目超過25000個，建議備份時crosscheck，並delete obsolete/epired archivelog

ORA-00494: enqueue [CF] held for too long (more than 900 seconds)

相關文章