ORA-00494: enqueue [CF] held for too long (more than 900 seconds) -RAC
客戶的一臺rac一個月hang死一次,上次重啟機器後一切正常,這次必須一次性解決,生產庫不可能老是重啟賽。
##############################################
系統軟體日誌:syslog.log
##############
Jan 6 04:57:51 db2 cimserver[1971]: PGS13606: A failure was detected in provider module HPUXStorageProviderModule. The generation of indications by pro
viders in this module may be affected. To ensure these providers are serving active subscriptions, disable and then re-enable this module using the cimproviu
der command.
Jan 6 06:24:17 db2 automountd[1485]: match_addr: getaddrinfo failed: host nor service provided, or not known
Jan 6 06:24:17 db2 automountd[1485]: match_addr: getaddrinfo failed: host nor service provided, or not known
Jan 6 06:41:47 db2 above message repeats 47 times
##############
uamdb2 automountd[1485]和oracle的ORA-00494: enqueue [CF] held 同時報錯,在11月28日出問題時也是同時報錯。
諮詢HP工程師,回覆說uamdb2 automountd[1485]錯誤是由網路問題,或者是DNS服務解析出問題引起的。
停止了系統的autofs服務後,系統報錯消失,但oracle仍然報錯ORA-00494: enqueue [CF] held
##############################################
oracle日誌:alert_db2.log
同時發現db2_diag_26520.trc未生成
##############
Wed Jan 6 06:24:15 2010
Errors in file /oracle/admin/db/bdump/db2_diag_26520.trc:
ORA-00494: enqueue [CF] held for too long (more than 900 seconds) by 'inst 2, osid 22109'
Wed Jan 6 06:24:18 2010
System State dumped to trace file /oracle/admin/db/bdump/db2_diag_26520.trc
System State dumped to trace file /oracle/admin/db/bdump/db2_diag_26520.trc
Wed Jan 6 06:24:37 2010
System State dumped to trace file /oracle/admin/db/bdump/db2_diag_26520.trc
Wed Jan 6 06:25:15 2010
Killing enqueue blocker (pid=22109) on resource CF-00000000-00000000
by killing session 1061.7968
Wed Jan 6 06:25:34 2010
Killing enqueue blocker (pid=22109) on resource CF-00000000-00000000
by killing session 1061.7968
Wed Jan 6 06:27:24 2010
System State dumped to trace file /oracle/admin/db/bdump/db2_diag_26520.trc
Wed Jan 6 06:28:24 2010
Killing enqueue blocker (pid=22109) on resource CF-00000000-00000000
by killing session 1061.7968
Wed Jan 6 06:28:31 2010
System State dumped to trace file /oracle/admin/db/bdump/db2_diag_26520.trc
Wed Jan 6 06:29:31 2010
Killing enqueue blocker (pid=22109) on resource CF-00000000-00000000
by killing session 1061.7968
##############################################
oracle 監聽器日誌:/oracle/product/102/network/log/listener_db2.log
##############
06-JAN-2010 06:08:41 * service_died * uamdb1 * 12537
06-JAN-2010 06:09:01 * (CONNECT_DATA=(SERVER=DEDICATED)(SERVICE_NAME=db)(CID=(PROGRAM=rman@db1)(HOST=db1)(USER=root))) * (ADDRESS=(PROTOCOL=tcp)(HOST=136.5.74.132)(PORT=57242)) * establish * db * 0
06-JAN-2010 06:09:01 * (CONNECT_DATA=(SERVER=DEDICATED)(SERVICE_NAME=uamdb)(CID=(PROGRAM=rman)(HOST=db1)(USER=root))) * (ADDRESS=(PROTOCOL=tcp)(HOST=136.5.74.132)(PORT=57244)) * establish * db * 0
06-JAN-2010 06:09:02 * (CONNECT_DATA=(SERVER=DEDICATED)(SERVICE_NAME=db)(CID=(PROGRAM=rman)(HOST=db1)(USER=root))) * (ADDRESS=(PROTOCOL=tcp)(HOST=136.5.74.132)(PORT=57246)) * establish * uamdb * 0
06-JAN-2010 06:09:02 * (CONNECT_DATA=(SERVER=DEDICATED)(SERVICE_NAME=db)(CID=(PROGRAM=rman)(HOST=db1)(USER=root))) * (ADDRESS=(PROTOCOL=tcp)(HOST=136.5.74.132)(PORT=57247)) * establish * db * 0
06-JAN-2010 06:09:04 * service_update * db2 * 0
06-JAN-2010 06:09:22 * service_update * db2 * 0
06-JAN-2010 06:09:44 * service_register * db1 * 1184
##############################################
rac crs資源檢查
##############
node1
##############
uamdb1#[/]crs_stat -t
Name Type Target State Host
------------------------------------------------------------
ora.uamdb.db application ONLINE ONLINE uamdb2
ora....b1.inst application ONLINE ONLINE uamdb1
ora....b2.inst application ONLINE ONLINE uamdb2
ora....SM1.asm application ONLINE ONLINE uamdb1
ora....B1.lsnr application ONLINE ONLINE uamdb1
ora.uamdb1.gsd application ONLINE ONLINE uamdb1
ora.uamdb1.ons application ONLINE ONLINE uamdb1
ora.uamdb1.vip application ONLINE ONLINE uamdb1
ora....SM2.asm application ONLINE ONLINE uamdb2
ora....B2.lsnr application ONLINE ONLINE uamdb2
ora.uamdb2.gsd application ONLINE ONLINE uamdb2
ora.uamdb2.ons application ONLINE ONLINE uamdb2
ora.uamdb2.vip application ONLINE ONLINE uamdb2
uamdb1#[/]crsctl check crs
CSS appears healthy
CRS appears healthy
EVM appears healthy
##############
通過以上檢查,發現node1資源狀態完全正常
##############
node2
##############
uamdb2#[/]crs_stat -t
#掛死,不出結果
uamdb2#[/]crsctl check crs
#掛死,不出結果
uamdb2#[/]crsctl check crsd
CRS appears healthy
uamdb2#[/]crsctl check evmd
EVM appears healthy
uamdb2#[/]crsctl check cssd
#掛死,不出結果
##############
通過以上檢查,發現node2資源狀態不正常,css掛死。
##############
檢查oracle系統問題
##############
node2
##############
sys@DB_SQL> select event,count(*) from v$active_session_history group by event order by 2;
EVENT COUNT(*)
---------------------------------------------------------------- ----------
PX Deq: reap credit 1
name-service call wait 1
os thread startup 14
control file sequential read 19
ksdxexeotherwait 56
enq: WF - contention 859
control file parallel write 2782
enq: CF - contention 5493
24914
##############
enq: CF - contention :5493 CFqueue很嚴重
##############
鎖等待:
sys@DB_SQL> SELECT s.username,l.OBJECT_ID,l.SESSION_ID,s.SERIAL#,l.ORACLE_USERNAME,l.OS_USER_NAME,l.PROCESS FROM V$LOCKED_OBJECT l,V$SESSION S WHERE l.SESSION_ID=S.SID;
USERNAME OBJECT_ID SESSION_ID SERIAL# ORACLE_USERNAME OS_USER_NAME PROCESS
------------------------------ ---------- ---------- ---------- ------------------- ------------------------------
9167 1052 40224 oracle 24193
9238 1052 40224 oracle 24193
2 rows selected.
sys@DB_SQL> select sid,serial#,username,program,sql_hash_value,
2 to_char(logon_time,'yyyy/mm/dd hh24:mi:ss') as login_time
3 from v$session
4 where paddr in ( select addr from v$process where spid=&spid);
Enter value for spid: 24193
SID SERIAL# USERNAME PROGRAM hash_value LOGIN_TIME
------ ------- ---------- -------------------------------- ---------- -------------------
1052 40224 oracle@db2 (m000) 0 2010/01/07 09:33:18
1 row selected.
sys@UAMDB_SQL> SELECT l.sid, s.blocking_session blocker, s.event, l.type, l.lmode, l.request, o.object_name, o.object_type
2 FROM v$lock l, dba_objects o, v$session s
3 WHERE l.id1 = o.object_id (+)
4 AND l.sid= s.sid
5 and l.sid=&sid
6 ORDER BY sid, type;
Enter value for sid: 1052
SID BLOCKER EVENT TYPE LMODE REQUEST OBJECT_NAME OBJECT_TYPE
------ ---------- -------------------- ---------- ---------- ----------
1052 gc current request SH 6 0
1052 gc current request TM 6 0 WRH$_ACTIVE_SESSION_HISTORY TABLE
1052 gc current request TM 3 0 WRM$_DATABASE_INSTANCE TABLE
1052 gc current request TX 6 0
1052 gc current request WF 4 0
sys@DB_SQL> SELECT s.sid , s.serial#, s.status, s.server, s.username, e.event, e.time_waited
2 FROM v$session_event e, v$session s
3 WHERE e.sid=s.sid
4 and e.sid=&sid;
Enter value for sid: 1052
SID SERIAL# STATUS SERVER USERNAME EVENT TIME_WAITED
------ ------- -------- --------- ---------- -------------------- -----------
1052 40224 ACTIVE DEDICATED class slave wait 0
1052 40224 ACTIVE DEDICATED control file sequential read 0
1052 40224 ACTIVE DEDICATED gc cr block 2-way 0
1052 40224 ACTIVE DEDICATED gc current block 2-way 0
1052 40224 ACTIVE DEDICATED gc current block busy 0
1052 40224 ACTIVE DEDICATED gc current grant busy 0
1052 40224 ACTIVE DEDICATED enq: TM - contention 4
1052 40224 ACTIVE DEDICATED row cache lock 0
1052 40224 ACTIVE DEDICATED library cache lock 0
1052 40224 ACTIVE DEDICATED events in waitclass Other 0
該session 有4個TM等待,TM等待一般是由於 外來鍵沒有索引 引起的
################
外來鍵沒有索引的列
################
sys@DB_SQL> SELECT * FROM (
2 SELECT c.table_name, cc.column_name, cc.position column_position
3 FROM user_constraints c, user_cons_columns cc
4 WHERE c.constraint_name = cc.constraint_name
5 AND c.constraint_type = 'R'
6 MINUS
7 SELECT i.table_name, ic.column_name, ic.column_position
8 FROM user_indexes i, user_ind_columns ic
9 WHERE i.index_name = ic.index_name
10 )
11 ORDER BY table_name, column_position;
TABLE_NAME COLUMN_NAME COLUMN_POSITION
------------------------------ ---------------------------------------- ---------------
HS$_CLASS_CAPS CAP_NUMBER 1
HS$_CLASS_DD DD_TABLE_ID 1
HS$_FDS_INST FDS_CLASS_ID 1
HS$_INST_CAPS CAP_NUMBER 1
HS$_INST_DD DD_TABLE_ID 1
OLAP_OLEDB_MDPROPS VALUE 1
REGISTRY$ PID 2
WRM$_SNAPSHOT INSTANCE_NUMBER 2
WRM$_SNAPSHOT STARTUP_TIME 3
9 rows selected.
#######################
enqueue的狀態
#######################
sys@DB_SQL> select * from v$enqueue_stat where cum_wait_time>0 order by inst_id,cum_wait_time;
INST_ID EQ TOTAL_REQ# TOTAL_WAIT# SUCC_REQ# FAILED_REQ# CUM_WAIT_TIME
---------- -- ---------- ----------- ---------- ----------- -------------
2 KM 2 1 2 0 1
2 RR 2 1 2 0 1
2 PE 397651 3 397651 0 1
2 TL 2 1 2 0 1
2 SW 29 1 29 0 1
2 XR 781 1 781 0 1
2 CN 7 1 7 0 1
2 PW 14 2 2 12 2
2 RT 6 4 5 1 4
2 PG 27 10 27 0 5
2 DL 82 16 82 0 6
2 BR 102 41 102 0 11
2 KO 81 27 81 0 12
2 UL 1395 149 1395 0 32
2 WP 38 32 38 0 36
2 MR 62790 68 62790 0 38
2 SK 263 120 263 0 40
2 IR 7173 160 7173 0 46
2 TS 2904 381 2904 0 84
2 PI 600 211 600 0 94
2 MD 487 244 487 0 100
2 TO 1139 543 1139 0 142
2 RS 1332 599 1332 0 178
2 TC 1095 776 1095 0 308
2 TX 403790 502 403790 0 422
2 FB 9488 2532 9488 0 682
2 TT 362946 3822 362946 0 882
2 AF 17905 1913 17905 0 1075
2 MW 5111 3746 5111 0 1527
2 HW 610709 5172 610709 0 1632
2 PR 28957 36 28957 0 1763
2 DR 5112 5014 5112 0 2646
2 TD 21603 15045 21603 0 3441
2 RO 21953 7763 21953 0 4177
2 US 40633 32406 40633 0 6173
2 TA 35746 26669 35746 0 9295
2 JS 104933748 5264 104933573 175 16486
2 WL 580 260 358 222 178461
2 PS 1550184 690899 1537349 12835 250683
2 WF 63022 40630 44837 18185 47065384
2 FU 264 261 8 256 230306409
2 CF 20987280 11795 20970328 16933 407841619
2 TM 664700 13494 664021 679 926373131
CF和TM都是相當的高。
##############
node1
##############
sys@DB_SQL> SELECT s.username,l.OBJECT_ID,l.SESSION_ID,s.SERIAL#,l.ORACLE_USERNAME,l.OS_USER_NAME,l.PROCESS FROM V$LOCKED_OBJECT l,V$SESSION S WHERE l.SESSION_ID=S.SID;
no rows selected
##############
沒有鎖等待
sys@DB_SQL> select event,count(*) from v$active_session_history group by event order by 2;
EVENT COUNT(*)
---------------------------------------------------------------- ----------
rdbms ipc reply 1
gc current block busy 1
null event 1
gc cr grant 2-way 1
enq: PS - contention 1
gc current block 2-way 2
library cache lock 2
latch free 2
gc cr block 2-way 3
gc current grant busy 4
log file sequential read 4
Log archive I/O 4
DFS lock handle 5
gc cr disk read 5
Backup: sbtclose2 6
ges inquiry response 7
row cache lock 10
wait for scn ack 18
db file parallel write 19
log file parallel write 21
control file parallel write 22
Backup: sbtinit2 36
log file sync 41
control file sequential read 50
os thread startup 55
Backup: sbtbackup 83
db file sequential read 111
ksdxexeotherwait 186
RMAN backup & recovery I/O 328
693
Backup: sbtwrite2 1591
enq: WF - contention 2709
reliable message 3714
enq: TM - contention 10651
enq: CF - contention 15625
35 rows selected.
sys@DB_SQL> select * from v$enqueue_stat where cum_wait_time>0 order by inst_id,cum_wait_time;
INST_ID EQ TOTAL_REQ# TOTAL_WAIT# SUCC_REQ# FAILED_REQ# CUM_WAIT_TIME
---------- -- ---------- ----------- ---------- ----------- -------------
1 TL 1 1 1 0 1
1 FU 57 2 57 0 3
1 DL 62 11 62 0 5
1 SK 76 22 76 0 14
1 PI 160 53 160 0 15
1 PG 32 25 32 0 16
1 MR 23713 34 23713 0 24
1 WP 21 21 21 0 25
1 MD 134 68 134 0 32
1 TO 312 153 312 0 45
1 RS 364 160 364 0 63
1 BR 459 176 459 0 68
1 HW 622463 747 622463 0 230
1 TX 1785623 84 1785623 0 284
1 PR 9439 6 9439 0 317
1 AF 5256 573 5256 0 385
1 FB 7088 1358 7088 0 597
1 MW 1364 1364 1364 0 608
1 TC 685 203 685 0 645
1 DR 1364 1364 1364 0 902
1 TT 100900 4119 100900 0 975
1 RO 7356 2134 7356 0 1037
1 TD 5352 5352 5352 0 1511
1 US 10831 6203 10831 0 1523
1 JS 15730382 1599 15730336 46 2122
1 TA 8234 8107 8234 0 3271
1 PS 405099 156349 401698 3401 63259
1 WL 159 63 92 67 64756
1 TM 4522892 4094 4522890 0 317188
1 WF 19798 7608 13427 6371 3337224
1 CF 5556633 3787 5554145 2482 66750958
###################################################################################
問題現狀跟Bug 6857917很相像。
Description
An Oracle foreground process can cause a cluster wide hang if it is holding a critical enqueue (such as CF) and it hangs when registering with CSS. With this fix the process will attempt to register BEFORE acquiring such an enqueue. Note: This fix does not attempt to fix any process hang when registering will CSS, but instead tries to ensure that if such a hang does occur then the impact is reduced.
Description
An Oracle foreground process can cause a cluster wide hang if it is holding a critical enqueue (such as CF) and it hangs when registering with CSS. With this fix the process will attempt to register BEFORE acquiring such an enqueue. Note: This fix does not attempt to fix any process hang when registering will CSS, but instead tries to ensure that if such a hang does occur then the impact is reduced.
#####################
最終處理方式:升級到10.2.0.4.2後,資料庫正常,再也沒有發生hang死的情況。
#####################
故障在2個月後又重現,這次oracle的原廠工程師分析後,指出系統的記憶體洩漏的bug問題,同時我也處理了個oracle的記憶體洩漏的bug,期待問題能解決。
hp:建議補丁:建議安裝補丁:PHKL_40208;我最終是打上了最新的golden patch
oracle:
------------------------------------
下面是METALINK給出的解釋:
I have checked the information provided. crsd is reporting timeouts when running the checking
actions. The racgmain process on the other hand, executed by racgwrap, remain
running so the number is increasing.
This issue has been reported in Bug 7009245 ""RACGMAIN CHECK" PROCESS NOT TERMINATING", and drill down and
determine the root cause this require to get a stack trace from racgmain
process(es) as well as you should run OSWatcher with a specific parameters,
finally the issue of Bug 7009245 has been fixed by the fix of Bug 6196746
(unpublished bug -- summary is "racgwrap spawns racgmain child process. crsd
timeout kills racgwrap but bot racgmain")
So please perform. the following action plan that intend to solve the issue.
.
ACTION PLAN
============
1. Stop CRS on this node.
2. Make a copy of racgwrap located under $ORACLE_HOME/bin and $CRS_HOME/bin
3. Edit the file racgwrap and modify the last 3 lines from:
.
$ORACLE_HOME/bin/racgmain "$@"
status=$?
exit $status
.
to:
.
# Line added to test fix for Bug 6196746
exec $ORACLE_HOME/bin/racgmain "$@"
.
4. Restart CRS and make sure that all the resources are startes
This should solve the issue. also if you are not able to stop CRS right now, you can try the
solution without crs stop/start.
來自 “ ITPUB部落格 ” ,連結:http://blog.itpub.net/11088128/viewspace-675791/,如需轉載,請註明出處,否則將追究法律責任。
相關文章
- [20210113]SP2-0232 Input too long. Must be less than 241 characters.txt
- DNS: More than just namesDNS
- C# The file is too long. This operation is currently limited to supporting files less than 2 gigabytes in size.C#MIT
- data too long for column
- More than one file was found with OS independent path
- OpenKruise 2021 規劃曝光:More than workloadsUI
- Error running ‘Application’Command line is too longErrorAPP
- ORA-25307 ENQUEUE RATE TOO HIGH. ENABLE FLOW CONTROLENQ
- ORA-01489: result of string concatenation is too long
- 關於Argument list too long的問題
- Specified key was too long; max key length is 1000 bytes
- File name too long window和linux排查,解決Linux
- auto assign ipv6 for more than /64 prefix, subnet for /112 #164
- 異常解決——The server time zone value 'Öйú±ê׼ʱ¼ä' is unrecognized or represents more than oneServerZed
- JDBC:The server time zone value '�й���ʱ��' is unrecognized or represents more than one time zone.JDBCServerZed
- SAP:CX_SY_READ_SRC_LINE_TOO_LONG解決
- mysql specified key was too long與Index column size too large. The maximum column size is 767 bytes.MySqlIndex
- mysql執行報The server time zone value 'Öйú±ê׼ʱ¼ä' is unrecognized or represents more than one tim......MySqlServerZed
- 刪除大量檔案Argument list too long錯誤解決
- git拉取程式碼報錯filename too long unable to create fileGit
- 偶遇ERROR 1071 (42000): Specified key was too long; max key length is 767 bytesError
- 解決 Specified key was too long ... 767 bytes 的本質問題
- Command line is too long. Shorten command line for JooLunMallApiApplication or also for Spring Boot default configuration?APIAPPSpring Boot
- 執行遷移檔案報錯 1071 Specified key was too long.
- 表結構遷移檔案Identifier name 'xxxxxx' is too long解決IDE
- Java新增構造方法異常異常——Could not autowire. there is more than one bean of '' typeJava構造方法Bean
- cf900D. Unusual Sequences(容斥 莫比烏斯反演)
- Excel為批註設定圖片背景 出現Bad Request - Request Too longExcel
- idea遇見Command line is too long. Shorten command line for Main or also for Application default configuration?IdeaAIAPP
- IntelliJ IDEA 執行專案的時候提示 Command line is too long 錯誤IntelliJIdea
- RMAN備份出現ORA-01801: date format is too long for internal buffer錯誤ORM
- idea在使用git clone 時出現Filename too long的報錯資訊IdeaGit
- Oracle Enqueue WaitsOracleENQAI
- oracle enqueue(zt)OracleENQ
- oracle enqueue typeOracleENQ
- 【k8s】etcd叢集took too long to execute慢日誌告警問題分析K8S
- What is the "WF - Contention'' Enqueue ?ENQ
- Spring Boot Intellij 執行應用的時候 Command line is too long. Shorten command line for 錯誤Spring BootIntelliJ
- IDEA命令列縮短器助你解決此問題:Command line is too long. Shorten command line...Idea命令列