latch: cache buffers chains troubleshooting steps

Posted by 531968912 on 2017-12-19


How the problem arose:
One day, while checking v$session_wait, I found many sessions waiting on cache buffers chains. The situation was urgent, so I simply killed a few of the longer-running SQL statements, and the wait events gradually disappeared.
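For reference, the kind of quick check that surfaced the waits looks roughly like the following (a minimal sketch against v$session_wait; column list assumed as in 10g/11g):

-- sessions currently reporting the cache buffers chains latch wait
select sid, event, p1, p2, p3, wait_time, seconds_in_wait, state
  from v$session_wait
 where event = 'latch: cache buffers chains';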

Tracking down what actually caused the wait event:

1. First capture the ASH data for the problem time window
SQL> create table mao_ash as
       select *
         from dba_hist_active_sess_history
        where SAMPLE_TIME between TO_TIMESTAMP('2013-12-27 10:00:00', 'YYYY-MM-DD HH24:MI:SS')
                              and TO_TIMESTAMP('2013-12-27 12:00:00', 'YYYY-MM-DD HH24:MI:SS');
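If the problem window is recent enough to still be held in memory, the same working table can be built from the in-memory ASH buffer instead of AWR. A sketch, assuming RAC (inst_id is aliased to instance_number so the later queries keep working; the in-memory view holds far less history than dba_hist_active_sess_history):

-- sketch: same snapshot taken from the in-memory ASH buffer (recent data only)
create table mao_ash as
select s.inst_id as instance_number, s.*
  from gv$active_session_history s
 where s.sample_time between to_timestamp('2013-12-27 10:00:00', 'YYYY-MM-DD HH24:MI:SS')
                         and to_timestamp('2013-12-27 12:00:00', 'YYYY-MM-DD HH24:MI:SS');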

2. Verify the issue time frame:

select /*+ parallel 8 */ instance_number, sample_id, sample_time, count(*)
  from mao_ash t
 group by instance_number, sample_id, sample_time
 order by 3;

INSTANCE_NUMBER SAMPLE_ID SAMPLE_TIME COUNT(*)
2 72736930 2013-12-27 11:14:48.374 1
1 72762620 2013-12-27 11:14:51.059 11 <<<<< Begin -- active sessions suddenly jump to double digits and stay there for a while
1 72762630 2013-12-27 11:15:01.161 11

1 72762970 2013-12-27 11:20:44.756 10
1 72762980 2013-12-27 11:20:54.856 11
1 72762990 2013-12-27 11:21:04.956 15
1 72763000 2013-12-27 11:21:15.056 16
......
1 72763940 2013-12-27 11:37:04.830 11
1 72763950 2013-12-27 11:37:14.930 11
1 72763960 2013-12-27 11:37:25.032 11
1 72763970 2013-12-27 11:37:35.142 12
1 72763980 2013-12-27 11:37:45.242 9 <<<< End -- active sessions drop back to single digits
1 72763990 2013-12-27 11:37:55.342 8
The above pinpoints the time window in which the problem occurred. (Each row in dba_hist_active_sess_history is one active session captured at a sample point roughly every 10 seconds, so COUNT(*) per sample approximates the number of concurrently active sessions.)

3. Verify the wait events:

select t.instance_number,
       t.sample_id,
       t.sample_time,
       t.event,
       t.session_state,
       --t.r,
       t.c
  from (select t.*,
               --row_number() over(partition by instance_number, sample_id order by c desc) r
               rank() over(partition by instance_number, sample_id order by c desc) r
          from (select /*+ parallel 8 */ t.*,
                       count(*) over(partition by instance_number, sample_id, event) c,
                       row_number() over(partition by instance_number, sample_id, event order by 1) r1
                  from mao_ash t) t
         where r1 = 1) t
 where r < 3
 order by sample_time, r;

INSTANCE_NUMBER SAMPLE_ID SAMPLE_TIME EVENT SESSION_STATE C
2 72736930 2013-12-27 11:14:48.374 ON CPU 1 ---at this sample time, one session is ON CPU
1 72762620 2013-12-27 11:14:51.059 ON CPU 9 ---at this sample time, nine sessions are ON CPU
1 72762620 2013-12-27 11:14:51.059 library cache lock WAITING 1 ---at this sample time, one session is waiting on library cache lock
1 72762620 2013-12-27 11:14:51.059 cursor: pin S wait on X WAITING 1
......
1 72763100 2013-12-27 11:22:56.079 ON CPU 7
1 72763100 2013-12-27 11:22:56.079 library cache lock WAITING 4
......
1 72763290 2013-12-27 11:26:08.193 ON CPU 10
1 72763300 2013-12-27 11:26:18.291 ON CPU 12
2 72737620 2013-12-27 11:26:25.403 ON CPU 1
1 72763310 2013-12-27 11:26:28.391 ON CPU 11
......
1 72763720 2013-12-27 11:33:22.568 ON CPU 17
1 72763730 2013-12-27 11:33:32.689 ON CPU 18
1 72763740 2013-12-27 11:33:42.788 ON CPU 18
......
Note: the wait event seen was cache buffers chains, but here it was actually caused by library cache lock, so it merely looked like a cache buffers chains problem; in this case the issue cannot be located through p1/p2.
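For comparison, when the latch itself really is the root cause, P1 of latch: cache buffers chains is the child latch address, and the hot blocks can usually be traced from it. A rough sketch (x$bh requires SYS access; &child_latch_addr stands for the ADDR / P1RAW value found in the first query):

-- which cache buffers chains child latches are sleeping the most
select addr, child#, gets, misses, sleeps
  from v$latch_children
 where name = 'cache buffers chains'
 order by sleeps desc;

-- hottest buffers hashed to a given child latch (run as SYS)
select hladdr, obj, dbarfil, dbablk, tch
  from x$bh
 where hladdr = hextoraw('&child_latch_addr')
 order by tch desc;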


4. Find out the holders:
select t.instance_number,
       t.sample_time,
       t.sample_id,
       t.session_id,
       t.sql_id,
       t.session_type,
       t.event,
       t.session_state,
       --t.blocking_session,
       --t.blocking_inst_id,
       --t.blocking_session_status,
       --t.lv,
       --t.r,
       t.c
  from (select t.*,
               row_number() over(partition by instance_number, sample_id order by c desc) r
               --rank() over(partition by instance_number, sample_id order by c desc) r
          from (select t.*,
                       count(*) over(partition by instance_number, sample_id, session_id) c,
                       row_number() over(partition by instance_number, sample_id, session_id order by 1) r1
                  from (select /*+ parallel 8 */
                               level lv, connect_by_isleaf isleaf, t.*
                          from mao_ash t
                         start with blocking_session is not null
                        connect by nocycle
                              prior blocking_session = session_id
                          and prior t.blocking_session_serial# = session_serial#
                          and ((prior sample_time) - sample_time between
                               interval '-3' second and interval '3' second)) t
                 where t.isleaf = 1) t
         where r1 = 1) t
 where r < 3
 order by sample_time, r;

INSTANCE_NUMBER SAMPLE_TIME SAMPLE_ID SESSION_ID SQL_ID SESSION_TYPE EVENT SESSION_STATE C
1 2013-12-27 11:09:47.982 72762320 2697 62h7yux977dmw FOREGROUND db file parallel read WAITING 1
1 2013-12-27 11:09:58.082 72762330 2697 62h7yux977dmw FOREGROUND gc cr multi block request WAITING 1
1 2013-12-27 11:10:08.183 72762340 2697 62h7yux977dmw FOREGROUND ON CPU 1
1 2013-12-27 11:10:18.282 72762350 2697 62h7yux977dmw FOREGROUND ON CPU 1
1 2013-12-27 11:10:28.382 72762360 2697 62h7yux977dmw FOREGROUND gc current block 2-way WAITING 1
1 2013-12-27 11:10:38.482 72762370 2697 62h7yux977dmw FOREGROUND ON CPU 1
...... SID 2697 is executing SQL 62h7yux977dmw; meanwhile, at the 11:09:47 sample point, one session is waiting on it

1 2013-12-27 11:28:39.723 72763440 2720 dts1t1fjha4m2 FOREGROUND gc current block 2-way WAITING 1
1 2013-12-27 11:43:18.608 72764310 2753 BACKGROUND log file parallel write WAITING 1
Note: this query is very powerful; it finds the culprit SQL, which here is 62h7yux977dmw, since it caused the most session waits.
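A quick cross-check on top of the same working table: list the sessions that reported SID 2697 as their blocker, to confirm it really is the root of the wait chain (a sketch; columns as in dba_hist_active_sess_history):

select sample_time, session_id, event, session_state, sql_id
  from mao_ash
 where blocking_session = 2697
 order by sample_time;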


5. Find out which SQL caused the most CPU usage:

select sql_id, count(*)
  from mao_ash t
 where sample_time > to_timestamp('2013-12-27 11:30:40', 'yyyy-mm-dd hh24:mi:ss')
   and session_state = 'ON CPU'
 group by sql_id
 order by 2 desc;

SQL_ID COUNT(*)
58xvzzydq83f1 350
4fk8mz3jx2898 63
6zwy49juu8wxa 52
ayvngp9bb3dum 48
a3v2gkv5r4gj6 47
451xth6g96cx7 35

Conclusion:
1. Tune 58xvzzydq83f1 so that the SQL finishes quickly instead of running on and on and burning CPU.
2. Find the sql_text of 62h7yux977dmw and tune it as well (see the sketch below). In fact, 62h7yux977dmw may have executed only once, and since the shared pool was quite busy, it may well no longer be found in v$sql.
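To pull the text, v$sql can be tried first, with the AWR repository as a fallback once the cursor has aged out of the shared pool (a sketch; dba_hist_sqltext only contains statements captured by AWR):

-- shared pool first...
select sql_id, sql_fulltext from v$sql where sql_id = '62h7yux977dmw';

-- ...then the AWR repository if the cursor has already been flushed
select sql_id, sql_text from dba_hist_sqltext where sql_id = '62h7yux977dmw';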


 

From the ITPUB Blog. Link: http://blog.itpub.net/25462274/viewspace-2148893/ . Please credit the source when reprinting; otherwise legal liability may be pursued.
