針對enq: KO - fast object checkpoint的優化
測試庫在大批量更新某個表的資料,總共有2億條,每次更新20萬並sleep 1秒;
此時開發發郵件說對此表執行查詢一直沒有返回結果;登入該server進行檢視;
首先嚐試執行查詢語句
select * from justin t where rownum < 10,果然一直沒有返回結果,貌似hang住了;
檢視此時的資料庫等待事件,該sql正在等待enq: KO - fast object checkpoint;
google一把,tanel poder對此有很詳細的解釋
https://forums.oracle.com/forums/thread.jspa?threadID=855718
You see this event because object level checkpoint which happens when you run direct path reads on a segment (like with serial direct read full table scan or parallel execution full segment scans).
Now the question is how much of your sessions response time is spent waiting for that event - and whether the winning in query speed due direct read/parallel execution outweighs the time spent waiting for this checkpoint.
Waiting for KO enqueue - fast object checkpoint means that your session has sent CKPT process a message with instruction to do object level checkpoint and is waiting for a reply.
CKPT in turn asks DBWR process to perform. the checkpoint and may wait for response.
So you should check what the DBWR and CKPT processes are doing - whether they're stuck waiting for something (such controlfile enqueue) or doing IO or completely idle.
This you can do using v$session_wait or v$session_event...
You can run this:
select sid,program from v$session where program like '%CKPT%' or program like '%DBW%';
and then run
select sid,event,state,p1,p2,p3,seconds_in_wait from v$session_wait where sid in (select sid from v$session where program like '%CKPT%' or program like '%DBW%');
Please post the output here - if the CKPT or DBWR processes are hung waiting for something, these queries would show it.
If they're not completely hung it may be that they're just very slow (due bad disk write performance for example) or a rare chance that the CKPT has somehow missed the message your session sent (or vice versa). But that would be rare...
根據提供的sql檢視現在的資料庫,
SQL> select sid,program from v$session where program like '%CKPT%' or program like '%DBW%';
SID PROGRAM
---------- ------------------------------------------------
253 oracle@localhost.localdomain (CKPT)
2521 oracle@localhost.localdomain (DBW0)
2773 oracle@localhost.localdomain (DBW1)
SQL> select sid,event,state,p1,p2,p3,seconds_in_wait from v$session_wait where sid in (select sid from v$session where program like '%CKPT%' or program like '%DBW%');
SID EVENT STATE P1 P2 P3 SECONDS_IN_WAIT
---------- ------------------------------ ---------- ---------- ---------- ---------- ---------------
253 rdbms ipc message WAITING 300 0 0 3
2521 db file async I/O submit WAITING 967 0 0 1
2773 db file async I/O submit WAITING 1357 0 0 2
ckpt在等待rdbms ipc message,而兩個dbwr都在等待db file async I/O submit;
檢視metalink,1274737.1對db file async I/O submit有詳細描述,
According to the Oracle documentation, the asynch is disabled for normal file systems if FILESYSTEMIO_OPTIONS=NONE. But, when setting DISK_ASYNCH_IO to TRUE, the wait event list shows 'db file async I/O submit'.
According to unpublished the Bug 9649885 DB FILE ASYNC I/O SUBMIT EVENT NOT TRACKED WHEN DISK_ASYCH_IO = TRUE, when DISK_ASYNCH_IO=TRUE,
the wait event 'db file async I/O submit' is posted even if the IO calls cannot be performed asynchronously and this is the current behavior.
'db file async I/O submit' should be treated as 'db file parallel write' in previous releases. Changing the DISK_ASYCH_IO to FALSE removes the wait event 'db file async I/O submit'.
檢視現有引數設定
SQL> show parameter filesystemio_options
NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
filesystemio_options string NONE
SQL> show parameter disk
NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
asm_diskgroups string
asm_diskstring string
disk_asynch_io boolean TRUE
符合文件描述的情形(稍後將disk_asynch_io設定為false並重啟,dbwr程式的等待事件改為了db file parallel write),不過這都是現象而不是原因;
檢視os效能指標
開啟vmstat
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
r b swpd free buff cache si so bi bo in cs us sy id wa st
0 4 519316 101216 40428 31199052 0 0 9920 688 2292 3954 1 0 86 13 0
0 3 519316 93904 40428 31205652 0 0 3880 13856 2958 4425 0 0 92 8 0
0 4 519316 93832 40424 31205632 0 0 5268 8372 2485 3691 1 1 88 10 0
0 4 519316 93888 40436 31205620 0 0 9268 304 2168 3587 1 0 90 9 0
0 4 519316 93944 40456 31205584 0 0 9408 452 2217 3763 1 0 88 11 0
0 4 519316 94880 40448 31204720 0 0 4160 5656 1818 2442 1 0 90 9 0
發現bi/bo比較高,另外b和wa兩列均有等待,說明目前系統io存在一定問題(--b 等待佇列中的程式數(等待I/O),通常情況下是接近0的,--wa 等待i/o的時間)
檢視I/O情況
[root@localhost oracle]# iostat -d -x -k 1 5
Linux 2.6.18-128.el5 (localhost.localdomain) 08/31/2011
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await svctm %util
sda 0.00 331.68 581.19 43.56 4752.48 1508.91 20.04 3.48 5.71 1.59 99.11
sda1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sda2 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sda3 0.00 331.68 581.19 43.56 4752.48 1508.91 20.04 3.48 5.71 1.59 99.11
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await svctm %util
sda 0.00 83.00 376.00 33.00 3144.00 432.00 17.49 2.97 6.95 2.44 100.00
sda1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sda2 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sda3 0.00 83.00 376.00 33.00 3144.00 432.00 17.49 2.97 6.95 2.44 100.00
--%util: 一秒中有百分之多少的時間用於 I/O 操作,如果 %util 接近 100%,說明產生的I/O請求太多,I/O系統已經滿負荷,該磁碟可能存在瓶頸。
另外,檢視資料庫的redo log情況,該庫沒有開啟歸檔模式,狀態除了current的全部為active;
SQL> select * from V$log;
GROUP# THREAD# SEQUENCE# BYTES BLOCKSIZE MEMBERS ARC STATUS FIRST_CHANGE# FIRST_TIME NEXT_CHANGE# NEXT_TIME
---------- ---------- ---------- ---------- ---------- ---------- --- ---------------- ------------- --------------- ------------ ---------------
1 1 3501 524288000 512 1 NO ACTIVE 145309090 31-AUG-11 145345927 31-AUG-11
2 1 3502 524288000 512 1 NO CURRENT 145345927 31-AUG-11 2.8147E+14
3 1 3496 524288000 512 1 NO ACTIVE 145154118 31-AUG-11 145187494 31-AUG-11
4 1 3497 524288000 512 1 NO ACTIVE 145187494 31-AUG-11 145210653 31-AUG-11
5 1 3498 524288000 512 1 NO ACTIVE 145210653 31-AUG-11 145237393 31-AUG-11
6 1 3499 524288000 512 1 NO ACTIVE 145237393 31-AUG-11 145271423 31-AUG-11
7 1 3500 524288000 512 1 NO ACTIVE 145271423 31-AUG-11 145309090 31-AUG-11
至此有了個大致的結論:
對錶justin進行DML操作導致磁碟I/O壓力增加,此時再對錶進行select * from justin之類的全表掃描查詢(select count(*) from justin則不會,因為走的是索引掃描),導致所謂的object level checkpoint,
需將buffer cache中的髒資料寫入磁碟,此過程需要很長的等待時間,於是查詢就被阻塞了,通過hanganalyze的跟蹤檔案也可以驗證這一點;
解決方案:
磁碟優化比較麻煩,而且這是測試庫;
增大db_writer_process引數,由2調大為8,以增強資料寫入磁碟的能力;
增大log_buffer,由20多M調大為40M,同時增加3組logfile,避免因redo造成I/O進一步惡化;
調整後重啟資料庫,再次進行測試,發現查詢時候依舊進行enq: KO - fast object checkpoint等待,但是等待時間已經大大的縮短;而磁碟iostat的結果顯示%util平均值為30-40左右;
另外,重啟前db_cache_size為0,但是v$sgastat中顯示buffer_cache為14G多了,為了加快髒資料寫入磁碟的速度,將db_cache_size顯示的指定為2G
來自 “ ITPUB部落格 ” ,連結:http://blog.itpub.net/15480802/viewspace-706375/,如需轉載,請註明出處,否則將追究法律責任。
相關文章
- enq: RO fast object reuse 和 enq: KO fast object checkpointENQASTObject
- enq: KO - fast object checkpoint 等待事件與 direct path read - 1ENQASTObject事件
- enq: KO - fast object checkpoint 等待事件與 direct path read - 2ENQASTObject事件
- enq: KO - fast object checkpoint 等待事件與 direct path read - 3ENQASTObject事件
- 11.2使用KEEP池導致ENQ: KO - Fast Object Checkpoint等待ENQASTObject
- [20210527]enq KO - fast object checkpoint Final Blocker.txtENQASTObjectBloC
- enq: RO - fast object reuseENQASTObject
- 記一次enq: RO - fast object reuse等待事件ENQASTObject事件
- checkpoint 優化優化
- Spark效能優化:對RDD持久化或CheckPoint操作Spark優化持久化
- [20210527]enq RO - fast object reuse Final Blocker.txtENQASTObjectBloC
- MySQL 針對 like 條件的優化MySql優化
- 如何針對SEO優化您的WordPress Robots.txt優化
- 【譯】針對 Airbnb 清單頁的 React 效能優化AIReact優化
- 10G FAST DUAL 的優化AST優化
- 針對行動網路開發的優化建議優化
- 針對oracle效能的io調優Oracle
- mysql之 CentOS系統針對mysql引數優化MySqlCentOS優化
- fast_start_mttr_target & log_checkpoint_timeoutAST
- 你的網站需要針對移動端優化的10個理由網站優化
- object checkpoint物件檢查點小記Object物件
- 對checkpoint的理解(轉載)
- 【CHECKPOINT】Oracle檢查點優化與故障處理Oracle優化
- 對“checkpoint not completed!”的理解 (zt)
- 10個針對Web開發者的優秀Chrome外掛WebChrome
- 對IN & EXISTS NOT IN & NOT EXISTS的優化優化
- 穩定的高效能才有價值,PBlaze5針對讀請求QoS優化優化
- 針對Report Parameter 的個性化設定例項
- 寫了一個類GeneticOptimizeStrategy,針對VNPY策略遺傳演算法優化演算法優化
- Win10設定在播放視訊時針對視訊進行優化方法Win10優化
- CentOS 6.5針對MongoDB叢集應該調整和優化哪些核心引數?CentOSMongoDB優化
- HTTP Archive:全球網頁平均大小超1M,需要針對移動優化HTTPHive網頁優化
- 【Java技術專題】「效能優化系列」針對Java物件壓縮及序列化技術的探索之路Java優化物件
- Laravel_Model_Cache(針對快速更新的數值屬性進行快取優化的外掛)Laravel快取優化
- 關於mysql ..source針對大批量檔案的優勢MySql
- Oracle優化案例-系統切換引起的enq: SQ - contention(二十八)Oracle優化ENQ
- 對checkpoint的理解一文的學習
- 自己優化的對拍程式優化