latch: cache buffers chains troubleshooting steps

Posted by 531968912 on 2017-12-19


How the problem arose:
One day, while checking v$session_wait, I found many sessions waiting on cache buffers chains. The situation was urgent, so I simply killed a few of the longer-running SQL statements, and the wait events gradually disappeared.
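For reference, the kind of quick check that surfaced the waits looks roughly like the following (a minimal sketch against v$session_wait; column list assumed as in 10g/11g):

-- sessions currently reporting the cache buffers chains latch wait
select sid, event, p1, p2, p3, wait_time, seconds_in_wait, state
  from v$session_wait
 where event = 'latch: cache buffers chains';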

Tracking down what actually caused the wait event:

1. First capture the ASH data for the problem time window
SQL> create table mao_ash as
       select *
         from dba_hist_active_sess_history
        where SAMPLE_TIME between TO_TIMESTAMP('2013-12-27 10:00:00', 'YYYY-MM-DD HH24:MI:SS')
                              and TO_TIMESTAMP('2013-12-27 12:00:00', 'YYYY-MM-DD HH24:MI:SS');
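If the problem window is recent enough to still be held in memory, the same working table can be built from the in-memory ASH buffer instead of AWR. A sketch, assuming RAC (inst_id is aliased to instance_number so the later queries keep working; the in-memory view holds far less history than dba_hist_active_sess_history):

-- sketch: same snapshot taken from the in-memory ASH buffer (recent data only)
create table mao_ash as
select s.inst_id as instance_number, s.*
  from gv$active_session_history s
 where s.sample_time between to_timestamp('2013-12-27 10:00:00', 'YYYY-MM-DD HH24:MI:SS')
                         and to_timestamp('2013-12-27 12:00:00', 'YYYY-MM-DD HH24:MI:SS');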

2. Verify the issue time frame:

select /*+ parallel 8 */ instance_number, sample_id, sample_time, count(*)
  from mao_ash t
 group by instance_number, sample_id, sample_time
 order by 3;

INSTANCE_NUMBER SAMPLE_ID SAMPLE_TIME COUNT(*)
2 72736930 2013-12-27 11:14:48.374 1
1 72762620 2013-12-27 11:14:51.059 11 <<<<< Begin -- active sessions suddenly jump to double digits and stay there for a while
1 72762630 2013-12-27 11:15:01.161 11

1 72762970 2013-12-27 11:20:44.756 10
1 72762980 2013-12-27 11:20:54.856 11
1 72762990 2013-12-27 11:21:04.956 15
1 72763000 2013-12-27 11:21:15.056 16
......
1 72763940 2013-12-27 11:37:04.830 11
1 72763950 2013-12-27 11:37:14.930 11
1 72763960 2013-12-27 11:37:25.032 11
1 72763970 2013-12-27 11:37:35.142 12
1 72763980 2013-12-27 11:37:45.242 9 <<<< End -- active sessions drop back to single digits
1 72763990 2013-12-27 11:37:55.342 8
The above pinpoints the time window in which the problem occurred. (Each row in dba_hist_active_sess_history is one active session captured at a sample point roughly every 10 seconds, so COUNT(*) per sample approximates the number of concurrently active sessions.)

3. Verify the wait events:

select t.instance_number,
       t.sample_id,
       t.sample_time,
       t.event,
       t.session_state,
       --t.r,
       t.c
  from (select t.*,
               --row_number() over(partition by instance_number, sample_id order by c desc) r
               rank() over(partition by instance_number, sample_id order by c desc) r
          from (select /*+ parallel 8 */ t.*,
                       count(*) over(partition by instance_number, sample_id, event) c,
                       row_number() over(partition by instance_number, sample_id, event order by 1) r1
                  from mao_ash t) t
         where r1 = 1) t
 where r < 3
 order by sample_time, r;

INSTANCE_NUMBER SAMPLE_ID SAMPLE_TIME EVENT SESSION_STATE C
2 72736930 2013-12-27 11:14:48.374 ON CPU 1 ---at this sample time, one session is ON CPU
1 72762620 2013-12-27 11:14:51.059 ON CPU 9 ---at this sample time, nine sessions are ON CPU
1 72762620 2013-12-27 11:14:51.059 library cache lock WAITING 1 ---at this sample time, one session is waiting on library cache lock
1 72762620 2013-12-27 11:14:51.059 cursor: pin S wait on X WAITING 1
......
1 72763100 2013-12-27 11:22:56.079 ON CPU 7
1 72763100 2013-12-27 11:22:56.079 library cache lock WAITING 4
......
1 72763290 2013-12-27 11:26:08.193 ON CPU 10
1 72763300 2013-12-27 11:26:18.291 ON CPU 12
2 72737620 2013-12-27 11:26:25.403 ON CPU 1
1 72763310 2013-12-27 11:26:28.391 ON CPU 11
......
1 72763720 2013-12-27 11:33:22.568 ON CPU 17
1 72763730 2013-12-27 11:33:32.689 ON CPU 18
1 72763740 2013-12-27 11:33:42.788 ON CPU 18
......
Note: the wait event seen was cache buffers chains, but here it was actually caused by library cache lock, so it merely looked like a cache buffers chains problem; in this case the issue cannot be located through p1/p2.
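For comparison, when the latch itself really is the root cause, P1 of latch: cache buffers chains is the child latch address, and the hot blocks can usually be traced from it. A rough sketch (x$bh requires SYS access; &child_latch_addr stands for the ADDR / P1RAW value found in the first query):

-- which cache buffers chains child latches are sleeping the most
select addr, child#, gets, misses, sleeps
  from v$latch_children
 where name = 'cache buffers chains'
 order by sleeps desc;

-- hottest buffers hashed to a given child latch (run as SYS)
select hladdr, obj, dbarfil, dbablk, tch
  from x$bh
 where hladdr = hextoraw('&child_latch_addr')
 order by tch desc;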


4. Find out the holders:
select t.instance_number,
       t.sample_time,
       t.sample_id,
       t.session_id,
       t.sql_id,
       t.session_type,
       t.event,
       t.session_state,
       --t.blocking_session,
       --t.blocking_inst_id,
       --t.blocking_session_status,
       --t.lv,
       --t.r,
       t.c
  from (select t.*,
               row_number() over(partition by instance_number, sample_id order by c desc) r
               --rank() over(partition by instance_number, sample_id order by c desc) r
          from (select t.*,
                       count(*) over(partition by instance_number, sample_id, session_id) c,
                       row_number() over(partition by instance_number, sample_id, session_id order by 1) r1
                  from (select /*+ parallel 8 */
                               level lv, connect_by_isleaf isleaf, t.*
                          from mao_ash t
                         start with blocking_session is not null
                        connect by nocycle
                              prior blocking_session = session_id
                          and prior t.blocking_session_serial# = session_serial#
                          and ((prior sample_time) - sample_time between
                               interval '-3' second and interval '3' second)) t
                 where t.isleaf = 1) t
         where r1 = 1) t
 where r < 3
 order by sample_time, r;

INSTANCE_NUMBER SAMPLE_TIME SAMPLE_ID SESSION_ID SQL_ID SESSION_TYPE EVENT SESSION_STATE C
1 2013-12-27 11:09:47.982 72762320 2697 62h7yux977dmw FOREGROUND db file parallel read WAITING 1
1 2013-12-27 11:09:58.082 72762330 2697 62h7yux977dmw FOREGROUND gc cr multi block request WAITING 1
1 2013-12-27 11:10:08.183 72762340 2697 62h7yux977dmw FOREGROUND ON CPU 1
1 2013-12-27 11:10:18.282 72762350 2697 62h7yux977dmw FOREGROUND ON CPU 1
1 2013-12-27 11:10:28.382 72762360 2697 62h7yux977dmw FOREGROUND gc current block 2-way WAITING 1
1 2013-12-27 11:10:38.482 72762370 2697 62h7yux977dmw FOREGROUND ON CPU 1
...... SID 2697 is executing SQL 62h7yux977dmw; meanwhile, at the 11:09:47 sample point, one session is waiting on it

1 2013-12-27 11:28:39.723 72763440 2720 dts1t1fjha4m2 FOREGROUND gc current block 2-way WAITING 1
1 2013-12-27 11:43:18.608 72764310 2753 BACKGROUND log file parallel write WAITING 1
Note: this query is very powerful; it finds the culprit SQL, which here is 62h7yux977dmw, since it caused the most session waits.
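A quick cross-check on top of the same working table: list the sessions that reported SID 2697 as their blocker, to confirm it really is the root of the wait chain (a sketch; columns as in dba_hist_active_sess_history):

select sample_time, session_id, event, session_state, sql_id
  from mao_ash
 where blocking_session = 2697
 order by sample_time;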


5. Find out which SQL caused the most CPU usage:

select sql_id, count(*)
  from mao_ash t
 where sample_time > to_timestamp('2013-12-27 11:30:40', 'yyyy-mm-dd hh24:mi:ss')
   and session_state = 'ON CPU'
 group by sql_id
 order by 2 desc;

SQL_ID COUNT(*)
58xvzzydq83f1 350
4fk8mz3jx2898 63
6zwy49juu8wxa 52
ayvngp9bb3dum 48
a3v2gkv5r4gj6 47
451xth6g96cx7 35

Conclusion:
1. Tune 58xvzzydq83f1 so that the SQL finishes quickly instead of running on and on and burning CPU.
2. Find the sql_text of 62h7yux977dmw and tune it as well (see the sketch below). In fact, 62h7yux977dmw may have executed only once, and since the shared pool was quite busy, it may well no longer be found in v$sql.
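To pull the text, v$sql can be tried first, with the AWR repository as a fallback once the cursor has aged out of the shared pool (a sketch; dba_hist_sqltext only contains statements captured by AWR):

-- shared pool first...
select sql_id, sql_fulltext from v$sql where sql_id = '62h7yux977dmw';

-- ...then the AWR repository if the cursor has already been flushed
select sql_id, sql_text from dba_hist_sqltext where sql_id = '62h7yux977dmw';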


 

From the ITPUB Blog. Link: http://blog.itpub.net/25462274/viewspace-2148893/ . Please credit the source when reprinting; otherwise legal liability may be pursued.
