Tuning for enq: KO - fast object checkpoint waits

Posted by myownstars on 2011-08-31

The test database was running a large batch update against a table: 200 million rows in total, updating 200,000 rows at a time with a 1-second sleep between batches.
The developers then emailed to say that queries against this table were not returning any results, so I logged in to the server to investigate.
First I tried running the query myself:
select * from justin t  where rownum < 10
Sure enough, it did not return - the session appeared to be hung. Checking the current database wait events showed this SQL waiting on enq: KO - fast object checkpoint.
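To confirm which sessions are stuck on this event, a query along the following lines can be used (a minimal sketch against v$session_wait; the column list is illustrative):

select sid, event, state, seconds_in_wait
from v$session_wait
where event = 'enq: KO - fast object checkpoint';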
A quick Google search shows that Tanel Poder has explained this event in detail:
https://forums.oracle.com/forums/thread.jspa?threadID=855718
You see this event because of an object level checkpoint, which happens when you run direct path reads on a segment (like with serial direct read full table scans or parallel execution full segment scans).
Now the question is how much of your session's response time is spent waiting for that event - and whether the gain in query speed due to direct read/parallel execution outweighs the time spent waiting for this checkpoint.
Waiting for the KO enqueue - fast object checkpoint means that your session has sent the CKPT process a message instructing it to do an object level checkpoint, and is waiting for a reply.
CKPT in turn asks the DBWR process to perform the checkpoint and may wait for a response.
So you should check what the DBWR and CKPT processes are doing - whether they're stuck waiting for something (such as a controlfile enqueue), doing I/O, or completely idle.
This you can do using v$session_wait or v$session_event...
You can run this:
select sid,program from v$session where program like '%CKPT%' or program like '%DBW%';
and then run
select sid,event,state,p1,p2,p3,seconds_in_wait from v$session_wait where sid in (select sid from v$session where program like '%CKPT%' or program like '%DBW%');
Please post the output here - if the CKPT or DBWR processes are hung waiting for something, these queries would show it.
If they're not completely hung it may be that they're just very slow (due to bad disk write performance, for example), or there is a rare chance that CKPT has somehow missed the message your session sent (or vice versa). But that would be rare...
Checking the current database with the SQL provided:
SQL> select sid,program from v$session where program like '%CKPT%' or program like '%DBW%';

       SID PROGRAM
---------- ------------------------------------------------
       253 oracle@localhost.localdomain (CKPT)
      2521 oracle@localhost.localdomain (DBW0)
      2773 oracle@localhost.localdomain (DBW1)

SQL> select sid,event,state,p1,p2,p3,seconds_in_wait from v$session_wait where sid in (select sid from v$session where program like '%CKPT%' or program like '%DBW%');

       SID EVENT                          STATE              P1         P2         P3 SECONDS_IN_WAIT
---------- ------------------------------ ---------- ---------- ---------- ---------- ---------------
       253 rdbms ipc message              WAITING           300          0          0               3
      2521 db file async I/O submit       WAITING           967          0          0               1
      2773 db file async I/O submit       WAITING          1357          0          0               2
CKPT is waiting on rdbms ipc message, while both DBWR processes are waiting on db file async I/O submit.
MetaLink note 1274737.1 describes db file async I/O submit in detail:
According to the Oracle documentation, asynchronous I/O is disabled for normal file systems if FILESYSTEMIO_OPTIONS=NONE. But when DISK_ASYNCH_IO is set to TRUE, the wait event list shows 'db file async I/O submit'.
According to the unpublished Bug 9649885 DB FILE ASYNC I/O SUBMIT EVENT NOT TRACKED WHEN DISK_ASYNCH_IO = TRUE, when DISK_ASYNCH_IO=TRUE
the wait event 'db file async I/O submit' is posted even if the I/O calls cannot be performed asynchronously, and this is the current behavior.
'db file async I/O submit' should be treated as 'db file parallel write' in previous releases. Changing DISK_ASYNCH_IO to FALSE removes the wait event 'db file async I/O submit'.
Checking the current parameter settings:
SQL> show parameter filesystemio_options

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
filesystemio_options                 string      NONE
SQL> show parameter disk

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
asm_diskgroups                       string
asm_diskstring                       string
disk_asynch_io                       boolean     TRUE
This matches the scenario described in the note (later I set disk_asynch_io to false and restarted, after which the DBWR processes' wait event changed to db file parallel write). But this is a symptom, not the root cause.
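For reference, that parameter change can be made roughly as follows (a sketch; disk_asynch_io is a static parameter, so it must go to the spfile and only takes effect after a restart):

alter system set disk_asynch_io = false scope = spfile;
shutdown immediate
startup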
Next, checking OS performance metrics with vmstat:
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
0  4 519316 101216  40428 31199052    0    0  9920   688 2292 3954  1  0 86 13  0
0  3 519316  93904  40428 31205652    0    0  3880 13856 2958 4425  0  0 92  8  0
0  4 519316  93832  40424 31205632    0    0  5268  8372 2485 3691  1  1 88 10  0
0  4 519316  93888  40436 31205620    0    0  9268   304 2168 3587  1  0 90  9  0
0  4 519316  93944  40456 31205584    0    0  9408   452 2217 3763  1  0 88 11  0
0  4 519316  94880  40448 31204720    0    0  4160  5656 1818 2442  1  0 90  9  0
bi/bo are fairly high, and both the b and wa columns show waits, indicating the system has some degree of I/O problem (b = number of processes in the wait queue blocked on I/O, normally close to 0; wa = percentage of CPU time spent waiting for I/O).
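One way to corroborate the OS picture from inside the database is to look at the cumulative I/O-related waits in v$system_event (a sketch; time_waited is in centiseconds):

select event, total_waits, time_waited
from v$system_event
where event like 'db file%' or event like 'log file%'
order by time_waited desc;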
Checking the I/O picture with iostat:
[root@localhost oracle]# iostat -d -x -k 1 5
Linux 2.6.18-128.el5 (localhost.localdomain)    08/31/2011

Device:         rrqm/s   wrqm/s   r/s   w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await  svctm  %util
sda               0.00   331.68 581.19 43.56  4752.48  1508.91    20.04     3.48    5.71   1.59  99.11
sda1              0.00     0.00  0.00  0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sda2              0.00     0.00  0.00  0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sda3              0.00   331.68 581.19 43.56  4752.48  1508.91    20.04     3.48    5.71   1.59  99.11

Device:         rrqm/s   wrqm/s   r/s   w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await  svctm  %util
sda               0.00    83.00 376.00 33.00  3144.00   432.00    17.49     2.97    6.95   2.44 100.00
sda1              0.00     0.00  0.00  0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sda2              0.00     0.00  0.00  0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sda3              0.00    83.00 376.00 33.00  3144.00   432.00    17.49     2.97    6.95   2.44 100.00
--%util: the percentage of time the device spent servicing I/O; if %util is close to 100%, too many I/O requests are being issued, the I/O subsystem is running at full capacity, and the disk is likely a bottleneck.
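With the disk saturated, it also helps to see which datafiles are taking the writes; one way is to join v$filestat with v$datafile (a sketch; the counters are cumulative since instance startup):

select df.name, fs.phyrds, fs.phywrts, fs.writetim
from v$filestat fs, v$datafile df
where fs.file# = df.file#
order by fs.phywrts desc;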

In addition, checking the database's redo logs: the database is not running in archivelog mode, and every group except the CURRENT one is ACTIVE;
SQL> select * from V$log;

    GROUP#    THREAD#  SEQUENCE#      BYTES  BLOCKSIZE    MEMBERS ARC STATUS           FIRST_CHANGE# FIRST_TIME      NEXT_CHANGE# NEXT_TIME
---------- ---------- ---------- ---------- ---------- ---------- --- ---------------- ------------- --------------- ------------ ---------------
         1          1       3501  524288000        512          1 NO  ACTIVE               145309090 31-AUG-11          145345927 31-AUG-11
         2          1       3502  524288000        512          1 NO  CURRENT              145345927 31-AUG-11         2.8147E+14
         3          1       3496  524288000        512          1 NO  ACTIVE               145154118 31-AUG-11          145187494 31-AUG-11
         4          1       3497  524288000        512          1 NO  ACTIVE               145187494 31-AUG-11          145210653 31-AUG-11
         5          1       3498  524288000        512          1 NO  ACTIVE               145210653 31-AUG-11          145237393 31-AUG-11
         6          1       3499  524288000        512          1 NO  ACTIVE               145237393 31-AUG-11          145271423 31-AUG-11
         7          1       3500  524288000        512          1 NO  ACTIVE               145271423 31-AUG-11          145309090 31-AUG-11
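All non-current groups being ACTIVE means DBWR has not yet completed the checkpoints those logs depend on; if the load is also switching logs quickly, the pressure compounds. Log switch frequency can be gauged from v$log_history (a sketch):

select trunc(first_time, 'HH24') hour, count(*) switches
from v$log_history
group by trunc(first_time, 'HH24')
order by 1 desc;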

At this point a rough conclusion emerges:
the DML against table justin drove up disk I/O pressure; running a full table scan query such as select * from justin at the same time (select count(*) from justin is not affected, since it uses an index scan) triggered the so-called object level checkpoint,
which has to flush the table's dirty blocks from the buffer cache to disk. That flush took a long time under the I/O load, so the query was blocked. The hanganalyze trace file also confirms this.
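For the record, the hanganalyze dump referred to above can be produced from a SYSDBA session roughly like this (a sketch; level 3 is a commonly used level, and taking two snapshots a short while apart makes the output easier to interpret):

SQL> oradebug setmypid
SQL> oradebug hanganalyze 3
-- wait a short while, then take a second snapshot
SQL> oradebug hanganalyze 3
SQL> oradebug tracefile_name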

Solution:
Tuning the disks themselves would be complicated, and this is only a test database, so instead (a sketch of the changes follows below):
Increase db_writer_processes from 2 to 8, to improve the rate at which dirty data is written to disk;
Increase log_buffer from just over 20 MB to 40 MB, and add 3 more logfile groups, to keep redo from making the I/O situation any worse.
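A sketch of the changes just described, assuming an spfile is in use (both parameters are static, hence scope = spfile and the restart; the logfile paths and group numbers are hypothetical):

alter system set db_writer_processes = 8 scope = spfile;
alter system set log_buffer = 41943040 scope = spfile;   -- 40 MB
alter database add logfile group 8  ('/u01/oradata/redo08.log') size 500M;
alter database add logfile group 9  ('/u01/oradata/redo09.log') size 500M;
alter database add logfile group 10 ('/u01/oradata/redo10.log') size 500M;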
After these adjustments and a database restart, the test was rerun: the query still waits on enq: KO - fast object checkpoint, but the wait time is greatly reduced, and iostat now shows an average %util of roughly 30-40%.

One more thing: before the restart db_cache_size was 0 (unset), yet v$sgastat showed a buffer cache of more than 14 GB; to speed up the flushing of dirty data to disk, db_cache_size was explicitly set to 2 GB.
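That change is a one-liner (a sketch; written to the spfile here since the instance was being restarted anyway):

alter system set db_cache_size = 2G scope = spfile;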

Source: ITPUB blog, http://blog.itpub.net/15480802/viewspace-706375/; please credit the source when reproducing.
