The gc current/cr block busy wait events

Posted by 賀子_DBA時代 on 2022-02-23

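I recently ran into a performance problem where the top 5 wait events included log file sync and gc cr block busy, so this post summarizes these two wait events and how they relate to each other.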

I: The gc current/cr block busy wait events

First, note that CR and current are different concepts: a read issues a CR (consistent read) request, while a modification issues a current request (current read).
1) The gc current block busy wait event
When a request needs a block in current mode, it sends a request to the master instance. The requestor eventually gets the block via a cache fusion transfer. However, the transfer is sometimes delayed, either because the block was being used by a session on another instance or because the holding instance could not write the corresponding redo records to the online logfile immediately.

The session-level dynamic performance views v$session and v$session_event show which programs or sessions wait the most on this event (largest time_waited first):

SQL> select a.sid, a.time_waited, b.program, b.module from v$session_event a, v$session b where a.sid = b.sid and a.event = 'gc current block busy' order by a.time_waited desc;


2) The gc cr block busy wait event
When a request needs a block in CR mode, it sends a request to the master instance. The requestor eventually gets the block via a cache fusion transfer. However, the transfer is sometimes delayed, either because the block was being used by a session on another instance or because the holding instance could not write the corresponding redo records to the online logfile immediately.

The session-level dynamic performance views v$session and v$session_event show which programs or sessions wait the most on this event (largest time_waited first):

SQL> select a.sid, a.time_waited, b.program, b.module from v$session_event a, v$session b where a.sid = b.sid and a.event = 'gc cr block busy' order by a.time_waited desc;

3) Notes
gc current block busy is the RAC global cache wait for contention on a current block. Its duration is made up of three parts:
 
Time to process current block request in the cache= (pin time + flush time + send time)
gc current block flush time
The current block flush time is part of the service (or processing) time for a current block. The pending redo needs to be flushed to the log file by LGWR before LMS sends it. The operation is asynchronous in that LMS queues the request, posts LGWR, and continues processing. The LMS would check its log flush queue for completions and then send the block, or go to sleep and be posted by LGWR. The redo log write time and redo log sync time can influence the overall service time significantly.
 
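Flush time exists to protect instance recovery: after a current block is modified on the local instance, the redo for that block must be written to the logfile (LGWR must complete the write) before LMS may ship the block to another node. (The transfer itself is only triggered when another RAC node asks for the block, i.e. cache fusion.) This is where log file sync comes into the picture: both waits depend on LGWR writing redo promptly, so slow log writes show up in both.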
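
As a rough way to see the three components on a live system, the cumulative LMS service-time statistics can be read from gv$sysstat. This is only a sketch: the statistic names below are assumed to exist on your version (they are the names used in recent Oracle releases), and the time statistics are assumed to be reported in centiseconds, so verify both before relying on the numbers.

```sql
-- Sketch: pin / flush / send time for current blocks served by LMS, per instance.
-- Assumption: these statistic names exist on your release and the time
-- statistics are in centiseconds (multiply by 10 to get milliseconds).
select inst_id,
       name,
       value
from   gv$sysstat
where  name in ('gc current block pin time',
                'gc current block flush time',
                'gc current block send time',
                'gc current blocks served')
order  by inst_id, name;
```

The values are cumulative since instance startup, so it is more useful to compare two samples taken some time apart than to read a single one.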

4) gc buffer busy acquire/release
gc buffer busy acquire/release is often a by-product of gc current block busy. When several processes in the same instance access the same block concurrently, the first one enters the gc current block busy wait, and the later ones, queued on the buffer waiter list, fall into gc buffer busy acquire/release. (A user on the same instance has started a remote operation on the same resource and the request has not completed yet, or the block was requested by another node and had not been released by the local instance when the new local access was made.) There is a queuing effect: if gc current block busy is slow, the gc buffer busy acquire/release waits queued behind it are slower still:

Pin time = (time to read the block into cache) + (time to modify/process the buffer)
Busy time = (average pin time) * (number of interested users waiting ahead of me)
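
To make the queuing effect concrete, the two formulas can be worked through with made-up numbers. The values below (2 ms to read the block, 1 ms to process it, 5 waiters ahead) are purely hypothetical and are not measurements from any system.

```sql
-- Worked example of the two formulas with hypothetical inputs.
with params as (
  select 2 as read_ms,        -- time to read the block into cache (assumed)
         1 as process_ms,     -- time to modify/process the buffer (assumed)
         5 as waiters_ahead   -- interested users waiting ahead of me (assumed)
  from   dual
)
select read_ms + process_ms                   as pin_time_ms,
       (read_ms + process_ms) * waiters_ahead as busy_time_ms
from   params;
-- pin_time_ms = 3, busy_time_ms = 15
```

Even a short pin time is multiplied by the length of the queue, which is why a slow gc current block busy makes the waits behind it disproportionately long.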

Flush time is not limited to current blocks (see Avg global cache current block flush time (ms) in AWR); CR blocks have a flush time as well (Avg global cache cr block flush time (ms)).

Setting _cr_server_log_flush to false stops the CR server from flushing the redo log. (LMS waits for LGWR to flush the pending redo during CR fabrication. Without going too much into detail, you can turn off this behaviour by setting _cr_server_log_flush to false.) The parameter has no effect on the flush time of current blocks, and using it is strongly discouraged.
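
Before even considering a change, it can help to look at the current value of the hidden parameter. The query below is a read-only sketch that assumes you are connected as SYS and that the undocumented x$ksppi and x$ksppcv fixed tables have the usual columns on your release; since they are undocumented, treat the column names as an assumption.

```sql
-- Read-only check of the hidden parameter (run as SYS).
-- Assumption: x$ksppi / x$ksppcv expose ksppinm, ksppdesc, ksppstvl and indx
-- as they commonly do; these tables are undocumented and may differ by release.
select i.ksppinm  as name,
       v.ksppstvl as value,
       i.ksppdesc as description
from   x$ksppi  i
join   x$ksppcv v on v.indx = i.indx
where  i.ksppinm = '_cr_server_log_flush';
```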

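II: Solutions
For the gc cr block busy wait event:
1) Change the application to avoid fetching data across nodes as far as possible. This also reduces the flush time component of gc current block busy.
2) Set _cr_server_log_flush to false to stop the CR server from flushing the redo log. The parameter has no effect on the flush time of current blocks and, as noted above, is strongly discouraged.
3) Increase the disk I/O throughput of the redo log files. This treats the symptom rather than the cause. Compare the times of log file parallel write and log file sync: if the two are close, scarce storage I/O is the main cause of log file sync, because log file parallel write covers only the I/O portion.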
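
The comparison in point 3 can be sketched with gv$system_event. The query is only an illustration: it reports averages since instance startup, so for a specific problem window the AWR figures for the same two events are the better source.

```sql
-- Average wait per event, per instance, since startup.
-- If avg_wait_ms for 'log file sync' is close to 'log file parallel write',
-- most of the commit wait is spent in redo I/O.
select inst_id,
       event,
       total_waits,
       round(time_waited_micro / nullif(total_waits, 0) / 1000, 2) as avg_wait_ms
from   gv$system_event
where  event in ('log file sync', 'log file parallel write')
order  by inst_id, event;
```

A large gap between the two averages would instead point away from storage, for example toward CPU scheduling of LGWR or a very high commit rate.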
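III: Why does setting _cr_server_log_flush to false have no effect on gc current block busy?
gc current block busy concerns updates. One node has updated a block without committing, and another node now needs to update the same block; that requires a current read, and gc current block busy may follow. Flush time exists to protect instance recovery: after a current block is modified on the local instance, the redo for that block must be written to the logfile (LGWR must complete the write) before LMS may ship the block to another node. (The transfer is only triggered when another RAC node asks for the block, i.e. cache fusion.) So gc current block busy means that the other node is about to modify the block on top of this change. Because each node in Oracle RAC has its own undo tablespace, the node that modified the block first must write the redo for that current block to the logfile so that instance recovery is guaranteed to succeed.

gc cr block busy is different: it concerns cross-node queries and does not involve cross-node instance recovery, which is why _cr_server_log_flush can be set to false to stop the CR server from flushing the redo log.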





From the ITPUB blog (ITPUB部落格), link: http://blog.itpub.net/29654823/viewspace-2857525/. If you repost this article, please credit the source; otherwise legal action will be pursued.
