一次awr報告分析(密碼錯誤引發sql執行時間過長)
預備知識:
1)row cache lock事件
--記憶體share pool分為library cache,dictionary cache;而row cache lock物件分佈在dictionary cache中,是由於對於字典緩衝的訪問造成的;
--這類屬於latch類的資源競爭,相當耗CPU,如果併發量大的話,容易down機
如果發現這個等待十分高,一般來說可能由於2種原因,一是共享池太小了,需要增加共享池,另外一種情況是SQL分析過於頻繁,對於共享池的併發訪問量過大。對於任何一種情況,絕大多數情況下加大共享池會有助於降低該等待,不過加大共享池的時候也要注意,並不一定所有的情況下增加共享池都會有明顯的效果。特別是對於第二種情況,精確的分析十分重要。另外進一步分析,弄清楚哪些ROW CACHE的等待最為嚴重,有助於解決問題。
row cache lock事件的調優基於每一個佇列鎖型別的行為,常見的佇列鎖型別有:
-------DC_SEQUENCES:在使用序列的時候將發生該行緩衝佇列鎖。調優方式是檢查序列是否指定了緩衝選項並確定這個緩衝值可以承受預期的併發insert操作。
Check for appropriate caching of sequences for the application requirements.
-------DC_USED_EXTENTS和DC_FREE_EXTENTS:該行緩衝佇列鎖可能在空間管理碰到表空間分裂或者沒有足夠區大小時發生。調優方法是檢查表空間是否分裂了、區大小是否太小或者表空間是人工管理。
-------DC_TABLESPACES:該行緩衝佇列鎖會在分配新區是發生。如果區大小設定得過小,程式將經常申請新區,這將導致衝突。調優方法是快速地增加區的數量。
Probably the most likely cause is the allocation of new extents. If extent sizes are set low then the application may constantly be requesting new extents and causing contention. Do you have objects with small extent sizes that are rapidly growing? (You may be able to spot these by looking for objects with large numbers of extents). Check the trace for insert/update activity, check the objects inserted into for number of extents.
-------DC_OBJECTS:該行緩衝佇列鎖會在重編譯物件的時候發生。當物件編譯時將申請一個排他鎖阻塞其他行為。透過檢查非法物件和依賴關係來調優。
-------DC_SEGMENTS:該行緩衝佇列鎖會在段分配的時候發生,觀察持有這個佇列鎖的會話在做什麼。
This is likely to be down to segment allocation. Identify what the session holding the enqueue is doing and use errorstacks to diagnose.
-------DC_USERS:Deadlock and resulting “WAITED TOO LONG FOR A ROW CACHE ENQUEUE LOCK!” can occur if a session issues a GRANT to a user, and that user is in the process of logging on to the database.
-------DB_ROLLBACK_SEGMENTS:This is due to rollback segment allocation. Just like dc_segments,identify what is holding the enqueue and also generate errorstacks. Remember that on a multi-node system (RAC) the holder may be on another node and so multiple systemstates from each node will be required.
-------DC_AWR_CONTROL:This enqueue is related to control of the Automatic Workload Repository. As such any operation manipulating the repository may hold this so look for processes blocking these.
2)相關檢視欄位說明:
ROW CACHE LOCK基礎說明
P1 – Cache Id
P2 – Mode Held
P3 – Mode Requested
mode 和REQUEST的取值:
KQRMNULL 0 null mode – not locked
KQRMS 3 share mode
KQRMX 5 exclusive mode
KQRMFAIL 10 fail to acquire instance lock
3)SQL查詢
-----查詢row cache lock等待
select * from v$session_wait where wait_class = 'row cache lock';
查出p1的值
-----查詢rowcache 名稱
根據p1的值來查詢
select * from v$rowcache where cache# = &p1;
其他:
select event,p1 from v$session a where a.username is not null and a.status='ACTIVE';
4)dba_hist_active_sess_history檢視
dba_hist_active_sess_history檢視記錄了記憶體中活動會話的歷史資訊,而動態效能檢視V$ACTIVE_SESSION_HISTORY則記錄了當前活動會話資訊。 透過dba_hist_active_sess_history檢視關聯v$sqlarea和DBA_HIST_SNAPSHOT可以跟蹤到某個時間段內的sql。當然能跟蹤到的sql多少取決於v$sqlarea,畢竟只有還保留在v$sqlarea中的sql才能跟蹤到。
AWR報告分析
資料庫出現狀況是8月30號下午3點到4點,於是拉取了兩份AWR報告,一份是8月30號下午3點到4點,另一份是31號下午3點到4點,將兩份報告比對著看,不同的地方應該就是有問題的地方。
1)發現top 5的前臺事件中,排第一的不是DB CPU。預設正常情況下,DB CPU會排第一。
2)繼續看Wait Events Statistics ,發現排第一也不是DB CPU,而是connection management call elapsed time,看名字感覺和資料庫連線有關。
3)繼續比對,發現dc_users的Pct Miss比較大,超過了20%
4)檢視可疑的sql語句:
標紅的sql執行速度很慢,有些異常。
SQL> select to_char(sample_time, 'YY-MM-DD HH24:MI:SS') sample_time,
2 instance_number,
3 sql_id,
4 P1,
5 event,
6 wait_class
7 from dba_hist_active_sess_history
8 where sample_time between
9 to_date('16-08-29 13:00:00', 'YY-MM-DD HH24:MI:SS') and
10 to_date('16-08-29 13:30:00', 'YY-MM-DD HH24:MI:SS')
11 and sql_id in ('bvtu633rnwrwv',
12 '4a6uhr508t0p6',
13 'fpm2zazkfqhy6',
14 'bhrcaykh5tzsw',
15 '9fxv1px768bd5')
16 order by 1
17 ;
SAMPLE_TIME INSTANCE_NUMBER SQL_ID P1 EVENT WAIT_CLASS
----------------- --------------- ------------- ---------- --------------------- ---------------------
16-08-29 13:00:59 1 4a6uhr508t0p6 10 row cache lock Concurrency
16-08-29 13:01:09 1 4a6uhr508t0p6 10 row cache lock Concurrency
16-08-29 13:01:19 1 4a6uhr508t0p6 10 row cache lock Concurrency
16-08-29 13:01:29 1 4a6uhr508t0p6 7 row cache lock Concurrency
16-08-29 13:01:39 1 bvtu633rnwrwv 10 row cache lock Concurrency
16-08-29 13:01:49 1 bvtu633rnwrwv 10 row cache lock Concurrency
16-08-29 13:01:59 1 bvtu633rnwrwv 10 row cache lock Concurrency
16-08-29 13:02:10 1 bvtu633rnwrwv 7 row cache lock Concurrency
16-08-29 13:02:20 1 fpm2zazkfqhy6 10 row cache lock Concurrency
16-08-29 13:02:30 1 fpm2zazkfqhy6 10 row cache lock Concurrency
16-08-29 13:02:40 1 fpm2zazkfqhy6 10 row cache lock Concurrency
16-08-29 13:02:41 2 bvtu633rnwrwv 7 row cache lock Concurrency
16-08-29 13:02:50 1 fpm2zazkfqhy6 7 row cache lock Concurrency
16-08-29 13:04:31 1 bhrcaykh5tzsw 10 row cache lock Concurrency
16-08-29 13:04:41 1 bhrcaykh5tzsw 10 row cache lock Concurrency
16-08-29 13:04:51 1 bhrcaykh5tzsw 7 row cache lock Concurrency
16-08-29 13:06:32 1 9fxv1px768bd5 10 row cache lock Concurrency
16-08-29 13:06:42 1 9fxv1px768bd5 10 row cache lock Concurrency
18 rows selected
根據上面的sql,發現sql導致的等待事件是row cache lock。再根據P1=7或者10,查出等待事件是發生在具體哪個類別上
SQL> select cache#,parameter from v$rowcache where cache# in ('7','10');
CACHE# PARAMETER
---------- --------------------------------
10 dc_users
7 dc_users
7 dc_users
7 dc_users
根據上面的sql,發現引數類別是dc_users,網上說dc_users是和使用者用錯誤密碼登陸有關:
In 11g there is an intentional delay between allowing failed logon attempts to retry. For some specific application types this can cause a problem as the row cache entry is locked for the duration of the delay . This can lead to excessive row cache lock waits for DC_USERS for specific users / schemas.
…
After 3 successive failures a sleep delay is introduced starting at 3 seconds and extending to 10 seconds max. During each delay the user X row cache lock is held in exclusive mode preventing any concurrent logon attempt as user X (and preventing any other operation which would need the row cache lock for user X).
驗證一下,看是否在發生row cache lock期間,有使用者登陸失敗。
SQL> select username,
2 userhost,
3 to_char(timestamp, 'YY-MM-DD HH24:MI:SS') timestamp,
4 action_name
5 from dba_audit_trail
6 where action_name = 'LOGON'
7 and priv_used is null
8 and timestamp between
9 to_date('16-08-29 13:00:00', 'YY-MM-DD HH24:MI:SS') and
10 to_date('16-08-29 13:30:00', 'YY-MM-DD HH24:MI:SS');
USERNAME USERHOST TIMESTAMP ACTION_NAME
------------------------------ --------------------- ----------------- ----------------------------
MAPP_PLATFORM IDC-APP-02 16-08-29 13:03:51 LOGON
MAPP_PLATFORM IDC-APP-01 16-08-29 13:03:41 LOGON
MAPP_PLATFORM IDC-APP-02 16-08-29 13:03:31 LOGON
MAPP_PLATFORM IDC-APP-01 16-08-29 13:03:21 LOGON
MAPP_PLATFORM IDC-APP-02 16-08-29 13:03:11 LOGON
MAPP_PLATFORM IDC-APP-01 16-08-29 13:03:01 LOGON
MAPP_PLATFORM IDC-APP-02 16-08-29 13:02:51 LOGON
MAPP_PLATFORM IDC-APP-01 16-08-29 13:02:41 LOGON
MAPP_PLATFORM IDC-APP-02 16-08-29 13:02:31 LOGON
。。。。
發現使用者MAPP_PLATFORM從8.14到8.31,一直以錯誤密碼嘗試登陸資料庫,驗證了以錯誤密碼登陸會導致row cache lock同時sql執行慢的現象發生。
進一步查詢登入審計資訊:
select * from
(
select os_username,userhost,terminal,username,count(*) failures
from dba_audit_trail
where returncode = 1017 and timestamp between to_date ('2016-8-29 11:30:00','yyyy-mm-dd hh24:mi:ss') and to_date ('2016-8-29 13:30:00','yyyy-mm-dd hh24:mi:ss')
group by os_username,userhost,username,terminal
order by 5 desc );
returncode = 1017 這裡的1017就是oracle內部定義的錯誤返回程式碼值。
實際上不管使用者提供的密碼是否正確,Oracle都會為新的connection分配一個shadow process,該服務程式為了進一步驗證使用者資訊都不得不獲取少量資源;如果以此為契機,即便在無法正常登入的情況下依然能在短期內造成例項僵死。
1)row cache lock事件
--記憶體share pool分為library cache,dictionary cache;而row cache lock物件分佈在dictionary cache中,是由於對於字典緩衝的訪問造成的;
--這類屬於latch類的資源競爭,相當耗CPU,如果併發量大的話,容易down機
如果發現這個等待十分高,一般來說可能由於2種原因,一是共享池太小了,需要增加共享池,另外一種情況是SQL分析過於頻繁,對於共享池的併發訪問量過大。對於任何一種情況,絕大多數情況下加大共享池會有助於降低該等待,不過加大共享池的時候也要注意,並不一定所有的情況下增加共享池都會有明顯的效果。特別是對於第二種情況,精確的分析十分重要。另外進一步分析,弄清楚哪些ROW CACHE的等待最為嚴重,有助於解決問題。
row cache lock事件的調優基於每一個佇列鎖型別的行為,常見的佇列鎖型別有:
-------DC_SEQUENCES:在使用序列的時候將發生該行緩衝佇列鎖。調優方式是檢查序列是否指定了緩衝選項並確定這個緩衝值可以承受預期的併發insert操作。
Check for appropriate caching of sequences for the application requirements.
-------DC_USED_EXTENTS和DC_FREE_EXTENTS:該行緩衝佇列鎖可能在空間管理碰到表空間分裂或者沒有足夠區大小時發生。調優方法是檢查表空間是否分裂了、區大小是否太小或者表空間是人工管理。
-------DC_TABLESPACES:該行緩衝佇列鎖會在分配新區是發生。如果區大小設定得過小,程式將經常申請新區,這將導致衝突。調優方法是快速地增加區的數量。
Probably the most likely cause is the allocation of new extents. If extent sizes are set low then the application may constantly be requesting new extents and causing contention. Do you have objects with small extent sizes that are rapidly growing? (You may be able to spot these by looking for objects with large numbers of extents). Check the trace for insert/update activity, check the objects inserted into for number of extents.
-------DC_OBJECTS:該行緩衝佇列鎖會在重編譯物件的時候發生。當物件編譯時將申請一個排他鎖阻塞其他行為。透過檢查非法物件和依賴關係來調優。
-------DC_SEGMENTS:該行緩衝佇列鎖會在段分配的時候發生,觀察持有這個佇列鎖的會話在做什麼。
This is likely to be down to segment allocation. Identify what the session holding the enqueue is doing and use errorstacks to diagnose.
-------DC_USERS:Deadlock and resulting “WAITED TOO LONG FOR A ROW CACHE ENQUEUE LOCK!” can occur if a session issues a GRANT to a user, and that user is in the process of logging on to the database.
-------DB_ROLLBACK_SEGMENTS:This is due to rollback segment allocation. Just like dc_segments,identify what is holding the enqueue and also generate errorstacks. Remember that on a multi-node system (RAC) the holder may be on another node and so multiple systemstates from each node will be required.
-------DC_AWR_CONTROL:This enqueue is related to control of the Automatic Workload Repository. As such any operation manipulating the repository may hold this so look for processes blocking these.
2)相關檢視欄位說明:
ROW CACHE LOCK基礎說明
P1 – Cache Id
P2 – Mode Held
P3 – Mode Requested
mode 和REQUEST的取值:
KQRMNULL 0 null mode – not locked
KQRMS 3 share mode
KQRMX 5 exclusive mode
KQRMFAIL 10 fail to acquire instance lock
3)SQL查詢
-----查詢row cache lock等待
select * from v$session_wait where wait_class = 'row cache lock';
查出p1的值
-----查詢rowcache 名稱
根據p1的值來查詢
select * from v$rowcache where cache# = &p1;
其他:
select event,p1 from v$session a where a.username is not null and a.status='ACTIVE';
4)dba_hist_active_sess_history檢視
dba_hist_active_sess_history檢視記錄了記憶體中活動會話的歷史資訊,而動態效能檢視V$ACTIVE_SESSION_HISTORY則記錄了當前活動會話資訊。 透過dba_hist_active_sess_history檢視關聯v$sqlarea和DBA_HIST_SNAPSHOT可以跟蹤到某個時間段內的sql。當然能跟蹤到的sql多少取決於v$sqlarea,畢竟只有還保留在v$sqlarea中的sql才能跟蹤到。
AWR報告分析
資料庫出現狀況是8月30號下午3點到4點,於是拉取了兩份AWR報告,一份是8月30號下午3點到4點,另一份是31號下午3點到4點,將兩份報告比對著看,不同的地方應該就是有問題的地方。
1)發現top 5的前臺事件中,排第一的不是DB CPU。預設正常情況下,DB CPU會排第一。
2)繼續看Wait Events Statistics ,發現排第一也不是DB CPU,而是connection management call elapsed time,看名字感覺和資料庫連線有關。
3)繼續比對,發現dc_users的Pct Miss比較大,超過了20%
4)檢視可疑的sql語句:
標紅的sql執行速度很慢,有些異常。
SQL> select to_char(sample_time, 'YY-MM-DD HH24:MI:SS') sample_time,
2 instance_number,
3 sql_id,
4 P1,
5 event,
6 wait_class
7 from dba_hist_active_sess_history
8 where sample_time between
9 to_date('16-08-29 13:00:00', 'YY-MM-DD HH24:MI:SS') and
10 to_date('16-08-29 13:30:00', 'YY-MM-DD HH24:MI:SS')
11 and sql_id in ('bvtu633rnwrwv',
12 '4a6uhr508t0p6',
13 'fpm2zazkfqhy6',
14 'bhrcaykh5tzsw',
15 '9fxv1px768bd5')
16 order by 1
17 ;
SAMPLE_TIME INSTANCE_NUMBER SQL_ID P1 EVENT WAIT_CLASS
----------------- --------------- ------------- ---------- --------------------- ---------------------
16-08-29 13:00:59 1 4a6uhr508t0p6 10 row cache lock Concurrency
16-08-29 13:01:09 1 4a6uhr508t0p6 10 row cache lock Concurrency
16-08-29 13:01:19 1 4a6uhr508t0p6 10 row cache lock Concurrency
16-08-29 13:01:29 1 4a6uhr508t0p6 7 row cache lock Concurrency
16-08-29 13:01:39 1 bvtu633rnwrwv 10 row cache lock Concurrency
16-08-29 13:01:49 1 bvtu633rnwrwv 10 row cache lock Concurrency
16-08-29 13:01:59 1 bvtu633rnwrwv 10 row cache lock Concurrency
16-08-29 13:02:10 1 bvtu633rnwrwv 7 row cache lock Concurrency
16-08-29 13:02:20 1 fpm2zazkfqhy6 10 row cache lock Concurrency
16-08-29 13:02:30 1 fpm2zazkfqhy6 10 row cache lock Concurrency
16-08-29 13:02:40 1 fpm2zazkfqhy6 10 row cache lock Concurrency
16-08-29 13:02:41 2 bvtu633rnwrwv 7 row cache lock Concurrency
16-08-29 13:02:50 1 fpm2zazkfqhy6 7 row cache lock Concurrency
16-08-29 13:04:31 1 bhrcaykh5tzsw 10 row cache lock Concurrency
16-08-29 13:04:41 1 bhrcaykh5tzsw 10 row cache lock Concurrency
16-08-29 13:04:51 1 bhrcaykh5tzsw 7 row cache lock Concurrency
16-08-29 13:06:32 1 9fxv1px768bd5 10 row cache lock Concurrency
16-08-29 13:06:42 1 9fxv1px768bd5 10 row cache lock Concurrency
18 rows selected
根據上面的sql,發現sql導致的等待事件是row cache lock。再根據P1=7或者10,查出等待事件是發生在具體哪個類別上
SQL> select cache#,parameter from v$rowcache where cache# in ('7','10');
CACHE# PARAMETER
---------- --------------------------------
10 dc_users
7 dc_users
7 dc_users
7 dc_users
根據上面的sql,發現引數類別是dc_users,網上說dc_users是和使用者用錯誤密碼登陸有關:
In 11g there is an intentional delay between allowing failed logon attempts to retry. For some specific application types this can cause a problem as the row cache entry is locked for the duration of the delay . This can lead to excessive row cache lock waits for DC_USERS for specific users / schemas.
…
After 3 successive failures a sleep delay is introduced starting at 3 seconds and extending to 10 seconds max. During each delay the user X row cache lock is held in exclusive mode preventing any concurrent logon attempt as user X (and preventing any other operation which would need the row cache lock for user X).
驗證一下,看是否在發生row cache lock期間,有使用者登陸失敗。
SQL> select username,
2 userhost,
3 to_char(timestamp, 'YY-MM-DD HH24:MI:SS') timestamp,
4 action_name
5 from dba_audit_trail
6 where action_name = 'LOGON'
7 and priv_used is null
8 and timestamp between
9 to_date('16-08-29 13:00:00', 'YY-MM-DD HH24:MI:SS') and
10 to_date('16-08-29 13:30:00', 'YY-MM-DD HH24:MI:SS');
USERNAME USERHOST TIMESTAMP ACTION_NAME
------------------------------ --------------------- ----------------- ----------------------------
MAPP_PLATFORM IDC-APP-02 16-08-29 13:03:51 LOGON
MAPP_PLATFORM IDC-APP-01 16-08-29 13:03:41 LOGON
MAPP_PLATFORM IDC-APP-02 16-08-29 13:03:31 LOGON
MAPP_PLATFORM IDC-APP-01 16-08-29 13:03:21 LOGON
MAPP_PLATFORM IDC-APP-02 16-08-29 13:03:11 LOGON
MAPP_PLATFORM IDC-APP-01 16-08-29 13:03:01 LOGON
MAPP_PLATFORM IDC-APP-02 16-08-29 13:02:51 LOGON
MAPP_PLATFORM IDC-APP-01 16-08-29 13:02:41 LOGON
MAPP_PLATFORM IDC-APP-02 16-08-29 13:02:31 LOGON
。。。。
發現使用者MAPP_PLATFORM從8.14到8.31,一直以錯誤密碼嘗試登陸資料庫,驗證了以錯誤密碼登陸會導致row cache lock同時sql執行慢的現象發生。
進一步查詢登入審計資訊:
select * from
(
select os_username,userhost,terminal,username,count(*) failures
from dba_audit_trail
where returncode = 1017 and timestamp between to_date ('2016-8-29 11:30:00','yyyy-mm-dd hh24:mi:ss') and to_date ('2016-8-29 13:30:00','yyyy-mm-dd hh24:mi:ss')
group by os_username,userhost,username,terminal
order by 5 desc );
returncode = 1017 這裡的1017就是oracle內部定義的錯誤返回程式碼值。
實際上不管使用者提供的密碼是否正確,Oracle都會為新的connection分配一個shadow process,該服務程式為了進一步驗證使用者資訊都不得不獲取少量資源;如果以此為契機,即便在無法正常登入的情況下依然能在短期內造成例項僵死。
來自 “ ITPUB部落格 ” ,連結:http://blog.itpub.net/28497416/viewspace-2124662/,如需轉載,請註明出處,否則將追究法律責任。
相關文章
- 執行awrrpt.sql生成awr報告報ora-06502錯誤SQL
- 執行AWR報告出現ORA-6502錯誤
- Oracle 11.2.0.3.0中執行awrrpt.sql生成awr報告報ora-06502錯誤OracleSQL
- 在Oracle 11.2.0.3.0中執行awrrpt.sql生成awr報告報ora-06502錯誤OracleSQL
- 【AWR】通過AWR報告中記錄的 SQL Id獲得SQL語句的執行計劃SQL
- Oracle AWR報告分析之–SQL ordered byOracleSQL
- 記錄一次錯誤的使用當前時間new Date()引發的錯誤
- 定位rac環境中某條sql執行時間過長SQL
- AWR報告的收集和分析執行計劃的方式
- 指令碼:定時生成awr報告指令碼
- 執行SQL發生錯誤!錯誤:disk I/O errorSQLError
- 生成 Linux 執行時間報告的 Bash 指令碼Linux指令碼
- AWR 報告修改moving window 出錯分析
- 通過shell指令碼抓取awr報告中的問題sql指令碼SQL
- 透過shell指令碼抓取awr報告中的問題sql指令碼SQL
- AWR解析報告分析
- 通過AWR報告處理故障一次心得
- springboot整合Batis,執行報錯,資料庫連線密碼錯誤Spring BootBAT資料庫密碼
- PbootCMS執行SQL發生錯誤!錯誤:no such column: def1bootSQL
- PbootCMS錯誤提示:執行SQL發生錯誤!錯誤:no such column: def1bootSQL
- 手工生成AWR分析報告
- oracle AWR報告提取分析Oracle
- 執行指令碼寫入中間表錯誤返回錯誤資訊指令碼
- AWR報告分析之二:ges inquiry response 過高UI
- Job長時間執行不能退出的分析
- 一次定時任務配置錯誤引發的思考
- 由密碼錯誤引發的ORA-03135密碼
- 記一次utlrp.sql指令碼執行引發的結果SQL指令碼
- PbootCMS 執行SQL發生錯誤!錯誤: no such table:ay_configbootSQL
- 升級提示 執行SQL發生錯誤!錯誤:duplicate column name: picstitleSQL
- sql執行過程分析SQL
- AWR報告分析之二:ges inquiry response 過高-eygleUI
- 【AWR】Oracle批量生成awr報告指令碼Oracle指令碼
- ORACLE AWR報告詳細分析Oracle
- 透過pl/sql計算程式的執行時間SQL
- 通過pl/sql計算程式的執行時間SQL
- oracle查詢sql執行耗時、執行時間、sql_idOracleSQL
- pbootcms升級提示 執行SQL發生錯誤!錯誤:duplicate column name: picstitlebootSQL