A long-lasting table-lock incident caused by gc current request
Incident description: a data-warehouse system keeps hitting table locks during its nightly batch runs. When the locking is severe it stalls the whole batch flow, which in turn affects the next day's business. At the business side's request the system needed tuning, with the goal of minimizing table-lock time (the business side's minimum requirement: the overall batch flow must not be affected).
Business requirement analysis: find the cause of the severe table locking and fix it so that the overall batch run is no longer affected.
DBA approach: symptoms => AWR/ASH/ADDM reports plus monitoring tools (Foglight/BMC) => interpret the data with experience => combine the symptoms with the underlying mechanics => pin down the root cause and a fix => observe over time to confirm the problem stays solved.
1. System environment
SQL> select * from gv$version;
INST_ID BANNER
---------- --------------------------------------------------------------------------------
2 Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
2 PL/SQL Release 11.2.0.4.0 - Production
2 CORE 11.2.0.4.0 Production
2 TNS for IBM/AIX RISC System/6000: Version 11.2.0.4.0 - Production
2 NLSRTL Version 11.2.0.4.0 - Production
1 Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
1 PL/SQL Release 11.2.0.4.0 - Production
1 CORE 11.2.0.4.0 Production
1 TNS for IBM/AIX RISC System/6000: Version 11.2.0.4.0 - Production
1 NLSRTL Version 11.2.0.4.0 - Production
10 rows selected.
2. Pinning down the incident window
Troubleshooting: let the data speak.
Here we use Foglight (a DB monitoring tool) to capture CPU/IO/memory peaks and narrow down the incident window.
2.1 Wait events as the baseline data
7-day statistics:
Screenshot omitted...
The September 13 window in which the business side reported the failure:
Screenshot omitted...
==After review, the DBA chose the windows 00:00:00--01:00:00 and 03:00:00--04:00:00 for analysis==
3. Analysis based on AWR data
00:00:00--01:00:00 AWR report
3.1 A table lock is itself a kind of wait event, so go straight to the wait events
==node A==
Top 10 Foreground Events by Total Wait Time
Event Waits Total Wait Time (sec) Wait Avg(ms) % DB time Wait Class
enq: CF - contention 3,651 13.7K 3746 57.9 Other
enq: TX - row lock contention 51 3610.8 70800 15.3 Application
direct path read 10,652 1705.3 160 7.2 User I/O
db file scattered read 35,561 1056.8 30 4.5 User I/O
direct path write 22,929 989.5 43 4.2 User I/O
DB CPU 530.4 2.2
direct path read temp 18,124 270.6 15 1.1 User I/O
db file sequential read 46,447 264.3 6 1.1 User I/O
enq: HW - contention 3,241 199.1 61 .8 Configuration
gc current block busy 3,794 164.5 43 .7 Cluster
==node B==
Top 10 Foreground Events by Total Wait Time
Event Waits Total Wait Time (sec) Wait Avg(ms) % DB time Wait Class
enq: CF - contention 7,114 6801.1 956 30.9 Other
direct path write 102,606 4393.3 43 19.9 User I/O
direct path read 28,670 4047.9 141 18.4 User I/O
db file scattered read 91,564 2344.5 26 10.6 User I/O
DB CPU 1296 5.9
gc current request 1 383.1 383095 1.7 Cluster
db file parallel read 3,286 284.7 87 1.3 User I/O
db file sequential read 86,524 247 3 1.1 User I/O
CSS initialization 298 222.5 747 1.0 Other
control file parallel write 31,876 148.7 5 .7 System I/O
Performance Degradation as a Result of 'enq: CF - contention' (Doc ID 1072417.1)
Waits for 'Enq: TX - ...' Type Events - Transaction (TX) Lock Example Scenarios (Doc ID 62354.1)
Main problems:
1. enq: CF - contention -- inference: redo is generated in large volume and at high frequency
2. enq: TX - row lock contention -- inference: flawed application design: long transactions with no batched commits
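The per-instance event counts in the next section can be reproduced from ASH history with a query along these lines (a sketch; the original post does not show the query it used, and the window predicates are assumptions):

```sql
-- Count ASH samples per instance/event in the suspect window.
-- dba_hist_active_sess_history is sampled roughly every 10s,
-- so CNT * 10 approximates seconds spent in each wait.
select *
  from (select instance_number, event, count(*) cnt
          from dba_hist_active_sess_history
         where sample_time >= timestamp '2017-09-13 00:00:00'
           and sample_time <  timestamp '2017-09-13 01:00:00'
           and event is not null
         group by instance_number, event
         order by cnt desc)
 where rownum <= 10;
```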
4. Verifying the inferences
Check the wait events
00:00:00--01:00:00 EVENT TOP 10
INSTANCE_NUMBER EVENT CNT
--------------- ---------------------------------------- ----------
1 enq: CF - contention 1338
2 enq: CF - contention 690
2 direct path write 382
1 enq: TX - row lock contention 360
2 direct path read 321
2 db file scattered read 251
1 db file scattered read 94
1 direct path read 65
2 PX Deq: Table Q Get Keys 64
==In this window transactions were still making progress; only the redo-log-related waits were severe.
00:00:00--08:00:00 EVENT TOP 10
(08:00:00 is when, after the session locking the table was killed, the batch flow resumed normally)
INSTANCE_NUMBER EVENT CNT
--------------- ---------------------------------------- ----------
1 enq: TX - row lock contention 2878
2 gc current request 2171
2 direct path read 2005
1 direct path read 1827
1 enq: CF - contention 1338
2 Disk file operations I/O 900
1 Disk file operations I/O 888
2 enq: CF - contention 690
1 db file sequential read 472
9 rows selected.
==In this window there were essentially no transactions running; the time was completely taken up by TX and gc current request waits.
00:00:00--01:00:00
==node A==
Load Profile
Per Second Per Transaction Per Exec Per Call
DB Time(s): 6.6 3.2 0.40 0.17
DB CPU(s): 0.2 0.1 0.01 0.00
Redo size (bytes): 1,669,685.0 814,982.5
==node B==
Load Profile
Per Second Per Transaction Per Exec Per Call
DB Time(s): 6.1 3.1 0.24 0.17
DB CPU(s): 0.4 0.2 0.01 0.01
Redo size (bytes): 2,170,422.3 1,082,389.6
03:00:00--04:00:00
==node A==
Load Profile
Per Second Per Transaction Per Exec Per Call
DB Time(s): 1.1 0.6 0.06 0.02
DB CPU(s): 0.0 0.0 0.00 0.00
Redo size (bytes): 329,441.6 183,969.1
==node B==
Load Profile
Per Second Per Transaction Per Exec Per Call
DB Time(s): 1.0 0.6 0.10 0.03
DB CPU(s): 0.0 0.0 0.00 0.00
Redo size (bytes): 1,489.5 840.2
Ratio of redo size per transaction between the 00:00:00--01:00:00 and 03:00:00--04:00:00 windows:
node A: 814,982 / 183,969 ≈ 4.4
node B: 1,082,389 / 840 ≈ 1288.6
==From the redo generation rate we can infer that transaction volume collapsed in the 03:00--04:00 window compared with 00:00--01:00; effectively no transactions were completing, and the database was completely hung.
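The redo figures above come from the AWR Load Profile; they can also be cross-checked against dba_hist_sysstat, roughly as follows (a sketch; the bind variables stand in for the snap IDs bounding each window):

```sql
-- Redo bytes generated between two AWR snapshots, per instance.
-- 'redo size' is cumulative since instance start, so max - min
-- over the range approximates generation (assumes no restart).
select instance_number,
       max(value) - min(value) as redo_bytes
  from dba_hist_sysstat
 where stat_name = 'redo size'
   and snap_id between :begin_snap and :end_snap
 group by instance_number;
```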
Next step: find out who hung the whole database.
00:00:00--01:00:00
enq: CF - contention =>
SQL_ID COUNT(1)
------------- ----------
b9zh67r40t24q 461
9tw8kwwj97zub 353
c404bhf4rjaum 204
66pb0fp8jt1c4 197
5dj472vrff0k6 127
SQL_TEXT
--------------------------------------------------------------------------------
INSERT /*+append parallel 32*/ INTO TRANSACTION_DETAIL(AGREEMENT_SEQ,CHARGE_CDE,
INSERT /*+append parallel 32*/ INTO SAP_INTERMEDIATE(CONTRACT_TYP,F_ACCOUNT_CD,F
INSERT /*+append parallel 32*/ INTO DC_FILE_DETAIL_RECORD(ACCOUNT_NBR_TRACE,ACCO
INSERT /*+append parallel 32*/ INTO RECEIVABLE_PAID(ACCOUNT_NBR,ACTUAL_REALIZATI
INSERT /*+append parallel 32*/ INTO INCOME_RECOGNITION(AGREEMENT_SEQ,CONTRACT_ID
enq: TX - row lock contention =>
SQL_TEXT
--------------------------------------------------------------------------------
DELETE FROM use_resource_state WHERE resource_addr=:1 AND resource_type=:2 AND r
00:00:00--08:00:00
gc current request =>
SQL_TEXT
--------------------------------------------------------------------------------
insert /*+append*/ into F_NS_PD_BMC_DETAIL(ADDRESS_ONE,ADDRESS_THREE,ADDRE
==According to the business staff, that night the database returned to normal as soon as the session locking table F_NS_PD_BMC_DETAIL was killed.
==The DBA tooling also flagged this problem table, F_NS_PD_BMC_DETAIL; the object shows up together with gc current request waits.
So this gc current request wait event deserves a closer look.
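One way to confirm which segment the gc current request samples were stuck on (not shown in the original post) is ASH's current_obj# column:

```sql
-- Map gc current request samples to the object being modified.
select o.owner, o.object_name, count(*) as samples
  from dba_hist_active_sess_history h,
       dba_objects o
 where o.object_id = h.current_obj#
   and h.event = 'gc current request'
   and h.sample_time >= timestamp '2017-09-13 00:00:00'
   and h.sample_time <  timestamp '2017-09-13 08:00:00'
 group by o.owner, o.object_name
 order by samples desc;
```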
Next, analyze where the gc current request waits came from.
Analysis plan:
A. When did the gc current request waits start?
B. When did the gc current request waits end?
C. Are these gc current request waits related to a bug?
A. When did the gc current request waits start?
gc current request: start
SQL> select * from (
2 select INSTANCE_NUMBER,event,sql_id,to_char(SAMPLE_TIME,'yyyy-mm-dd hh24:mi:ss')
3 from dba_hist_active_sess_history
4 where to_char(SAMPLE_TIME,'yyyy-mm-dd')='2017-09-13'
5 and event like 'gc current request' order by SAMPLE_TIME) where rownum<20;
INSTANCE_NUMBER EVENT SQL_ID TO_CHAR(SAMPLE_TIME
--------------- ---------------------------------------- ------------- -------------------
2 gc current request fz486aazdcx0h 2017-09-13 00:54:26
2 gc current request fz486aazdcx0h 2017-09-13 00:54:36
2 gc current request fz486aazdcx0h 2017-09-13 00:54:46
2 gc current request fz486aazdcx0h 2017-09-13 00:54:56
2 gc current request fz486aazdcx0h 2017-09-13 00:55:06
2 gc current request fz486aazdcx0h 2017-09-13 00:55:16
2 gc current request fz486aazdcx0h 2017-09-13 00:55:26
2 gc current request fz486aazdcx0h 2017-09-13 00:55:36
2 gc current request fz486aazdcx0h 2017-09-13 00:55:46
2 gc current request fz486aazdcx0h 2017-09-13 00:55:56
2 gc current request fz486aazdcx0h 2017-09-13 00:56:06
INSTANCE_NUMBER EVENT SQL_ID TO_CHAR(SAMPLE_TIME
--------------- ---------------------------------------- ------------- -------------------
2 gc current request fz486aazdcx0h 2017-09-13 00:56:16
2 gc current request fz486aazdcx0h 2017-09-13 00:56:26
2 gc current request fz486aazdcx0h 2017-09-13 00:56:36
2 gc current request fz486aazdcx0h 2017-09-13 00:56:46
2 gc current request fz486aazdcx0h 2017-09-13 00:56:56
2 gc current request fz486aazdcx0h 2017-09-13 00:57:06
2 gc current request fz486aazdcx0h 2017-09-13 00:57:16
2 gc current request fz486aazdcx0h 2017-09-13 00:57:26
B. When did the gc current request waits end?
gc current request: end
SQL> select * from (
2 select INSTANCE_NUMBER,event,sql_id,to_char(SAMPLE_TIME,'yyyy-mm-dd hh24:mi:ss')
3 from dba_hist_active_sess_history
4 where to_char(SAMPLE_TIME,'yyyy-mm-dd')='2017-09-13'
5 and event like 'gc current request' order by SAMPLE_TIME desc) where rownum<20;
INSTANCE_NUMBER EVENT SQL_ID TO_CHAR(SAMPLE_TIME
--------------- ---------------------------------------- ------------- -------------------
2 gc current request fz486aazdcx0h 2017-09-13 06:56:22
2 gc current request fz486aazdcx0h 2017-09-13 06:56:12
2 gc current request fz486aazdcx0h 2017-09-13 06:56:02
2 gc current request fz486aazdcx0h 2017-09-13 06:55:52
2 gc current request fz486aazdcx0h 2017-09-13 06:55:42
2 gc current request fz486aazdcx0h 2017-09-13 06:55:32
2 gc current request fz486aazdcx0h 2017-09-13 06:55:22
2 gc current request fz486aazdcx0h 2017-09-13 06:55:12
2 gc current request fz486aazdcx0h 2017-09-13 06:55:02
2 gc current request fz486aazdcx0h 2017-09-13 06:54:52
2 gc current request fz486aazdcx0h 2017-09-13 06:54:42
INSTANCE_NUMBER EVENT SQL_ID TO_CHAR(SAMPLE_TIME
--------------- ---------------------------------------- ------------- -------------------
2 gc current request fz486aazdcx0h 2017-09-13 06:54:32
2 gc current request fz486aazdcx0h 2017-09-13 06:54:22
2 gc current request fz486aazdcx0h 2017-09-13 06:54:12
2 gc current request fz486aazdcx0h 2017-09-13 06:54:02
2 gc current request fz486aazdcx0h 2017-09-13 06:53:52
2 gc current request fz486aazdcx0h 2017-09-13 06:53:42
2 gc current request fz486aazdcx0h 2017-09-13 06:53:32
2 gc current request fz486aazdcx0h 2017-09-13 06:53:22
==SQL_ID 'fz486aazdcx0h' shows one gc current request wait sample every 10 seconds, from 2017-09-13 00:54:26 until 2017-09-13 06:56:22, when killing the session locking the table ended the waits.
Clearly the insert /*+append*/ into F_NS_PD_BMC_DETAIL (a key step of the batch flow) never completed, and it is inseparable from the gc current request wait.
Solving the gc current request problem will solve the batch problem.
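For gc current request, the wait parameters identify the block: P1 is the file# and P2 the block# (P3 is the id#). Under that assumption, a sampled block can be tied back to its owning segment via dba_extents (a sketch; :p1/:p2 would be taken from the ASH rows' P1/P2 columns):

```sql
-- P1 = file#, P2 = block# for gc current request.
-- Find the segment that owns one sampled block.
select owner, segment_name, segment_type
  from dba_extents
 where file_id = :p1
   and :p2 between block_id and block_id + blocks - 1;
```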
C. Are these gc current request waits related to a bug?
The root problem is now located: gc current request.
ETL process hangs when using an Oracle 11g RAC database (Doc ID 1996100.1)
Fix (from the note):
Option 1
Set the "_gc_read_mostly_locking" parameter to FALSE using the following steps:
a. Stop STARETL Process. (If running on linux, you can check the process by running "ps -ef | grep star". If any process is found it can be terminated using the "kill" command)
b. Connect Oracle Database Hosting The STARUSER schema as the "SYS" User.
c. Backup the content of spfile to pfile and set _gc_read_mostly_locking to FALSE using the below commands:
SQL> create pfile='/oracle/product/base/oradb/admin/P6_test/adump/bakspfile.ora' from spfile;
SQL> alter system set "_gc_read_mostly_locking"=FALSE scope=spfile;
d. Restart Oracle Database hosting the STARUSER schema.
e. Rerun STARETL.
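After the restart, whether the underscore parameter took effect can be checked from the X$ views as SYS (a sketch, not part of the MOS note):

```sql
-- As SYS: show the current value of the hidden parameter.
select i.ksppinm  as name,
       v.ksppstvl as value
  from x$ksppi  i,
       x$ksppcv v
 where i.indx = v.indx
   and i.ksppinm = '_gc_read_mostly_locking';
```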
After 7 days of observation the batch flow has run normally with no further stalls; the business side's baseline requirement is met.
But a DBA should not treat "problem solved" as the end of the analysis: there are also problems in the architecture and the application, so let's dig a little further.
Several other problems surfaced during the analysis:
1. Severe lock contention. RAC parallelism is misused and the sharding design is flawed: the same business module (insert into) performs inserts on both nodes at the same time.
2. Severe log file contention: bulk inserts with no batching generate excessive redo and archive logs.
3. The long-lasting enq: TX - row lock contention comes from a single program module. It is hard-coded with no commit action: DELETE FROM use_resource_state WHERE resource_addr=:1 AND resource_type=:2 AND ... with no commit is truly bizarre code.
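Problems 2 and 3 both come down to missing batch commits. A common fix is to work in bounded batches with periodic commits; an illustrative PL/SQL sketch (not the application's actual code; the 10,000-row batch size is arbitrary):

```sql
-- Illustrative only: delete in bounded batches instead of one huge
-- uncommitted transaction, releasing TX row locks as we go.
begin
  loop
    delete from use_resource_state
     where resource_addr = :addr
       and resource_type = :typ
       and rownum <= 10000;          -- cap each batch
    exit when sql%rowcount = 0;
    commit;                          -- release locks, bound undo/redo per txn
  end loop;
  commit;
end;
/
```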
########################################################################################
All rights reserved. Reposting is permitted, but the source must be credited with a link; otherwise legal liability will be pursued! [QQ group: 53993419]
QQ: 14040928  E-mail: dbadoudou@163.com
Original link: http://blog.itpub.net/26442936/viewspace-2145624/
########################################################################################