undo表空間使用率過高解決
文章內容詳實,收集至筆記本中供需要者和自己後續探究學習,轉載自http://blog.itpub.net/69996316/viewspace-2886746/
1 在巡檢的過程中,多次發現UNDOTBS12表空間使用率過高,多次查詢資料庫的alert日誌,未發現資料庫有undo表空間告警的相關資訊。作為一個從業多年的DBA,已經認為這有可能有問題了,需要持續關注。
TABLESPACE_NAME TOT_SIZE TOT_FREE PCT_FREE MAX_FREE CHUNKS_FREE
------------------------------ ---------------- ---------------- ---------------- ---------------- ----------------
UNDOTBS11 184,320 134,760 73 3,968 1,697
UNDOTBS12 184,320 42 0 8 7
2 檢視那個使用者在使用undo資料塊,發現只使用了一個資料庫塊,當前會話沒有影響undo表空間資源耗盡。
set linesize 300
col username for a20
col machine for a20
col PROGRAM for a20
col name for a30
SELECT S.SID, S.USERNAME, S.PROGRAM, S.MACHINE, U.NAME, T.USED_UBLK
FROM V$TRANSACTION T, V$ROLLSTAT R, V$ROLLNAME U, V$SESSION S
WHERE S.TADDR = T.ADDR
AND T.XIDUSN = R.USN
AND R.USN = U.USN
ORDER BY S.USERNAME;
SID USERNAME PROGRAM MACHINE NAME USED_UBLK
---------- -------------------- -------------------- -------------------- ------------------------------ ---------
183 TEST_SN JDBC Thin Client yxnewapp6 _SYSSMU1787_2063356590$ 1
3 檢視undo表空間UNEXPIRED、EXPIRED、ACTIVE 使用情況,發現過期的Undo資料塊有177G,但undo表空間使用率依舊過高。
set linesize 300
col TABLESPACE_NAME for a20
SELECT TABLESPACE_NAME,STATUS,TRUNC(SUM(BLOCKS) * 8 / 1024) AS "Size M",COUNT(*) Undo_Extent_Num FROM DBA_UNDO_EXTENTS
where TABLESPACE_NAME='UNDOTBS12'
GROUP BY TABLESPACE_NAME, STATUS order by TABLESPACE_NAME,STATUS;
TABLESPACE_NAME STATUS Size M UNDO_EXTENT_NUM
-------------------- --------------------------- ---------- ---------------
UNDOTBS12 EXPIRED 177223 44795
UNDOTBS12 UNEXPIRED 6275 942
4 檢視是否由於undo自動除錯,引起的undo不足,經檢查,沒有自動除錯,故認為是正常的
SELECT BEGIN_TIME, END_TIME, UNDOTSN, UNDOBLKS, TXNCOUNT,TUNED_UNDORETENTION,
MAXCONCURRENCY AS "MAXCON" FROM V$UNDOSTAT;
BEGIN_TIME END_TIME UNDOTSN UNDOBLKS TXNCOUNT TUNED_UNDORETENTION MAXCON
------------------ ------------------ ---------- ---------- ---------- ------------------- ----------
21-JAN-21 30-NOV-21 8 3418760733 2073276409 0 821
5 查詢哪些undo段被用於回滾,查詢結果為空
select b.name "UNDO Segment Name", b.inst# "Instance ID", b.status$ STATUS, a.ktuxesiz "UNDO Blocks", a.ktuxeusn, a.ktuxeslt xid_slot, a.ktuxesqn xid_seq
from x$ktuxe a, undo$ b
where a.ktuxesta = 'ACTIVE' and a.ktuxecfl like '%DEAD%' and a.ktuxeusn = b.us#;
6 檢視資料庫的版本,為11.2.0.4.171017 (26392168),故認為不可能是由於PSU引起的,
由於檢視Undo相關引數設定是否正常,經檢查,認為都是正常的。
檢視Undo相關的隱含引數
NAME VALUE DESCRIB
------------------------------ -------------------- ------------------------------------------------------------
_flush_undo_after_tx_recovery TRUE if TRUE, flush undo buffers after TX recovery
_gc_undo_affinity FALSE if TRUE, enable dynamic undo affinity
_gc_undo_block_disk_reads TRUE if TRUE, enable undo block disk reads
_undo_block_compression TRUE enable undo block compression
undo_management AUTO instance runs in SMU mode if TRUE, else in RBU mode
undo_tablespace UNDOTBS12 use/switch undo tablespace
_collect_undo_stats TRUE Collect Statistics v$undostat
_undo_debug_mode 0 debug flag for undo related operations
_verify_undo_quota FALSE TRUE - verify consistency of undo quota statistics
_undo_autotune FALSE enable auto tuning of undo_retention
_highthreshold_undoretention 4294967294 high threshold undo_retention in seconds
undo_retention 7200 undo retention in seconds
_undo_debug_usage 0 invoke undo usage functions for testing
_disable_undo_tablespace_alerts FALSE disable tablespace alerts for UNDO tablespaces
_smon_undo_seg_rescan_limit 10 limit of SMON continous undo segments re-scan
_optimizer_undo_cost_change 11.2.0.4 optimizer undo cost change
_optimizer_undo_changes FALSE undo changes to query optimizer
_enable_default_undo_threshold TRUE Enable Default Tablespace Utilization Threshold for TEMPORARY Tablespaces
7 檢視資料檔案是否因為關閉自動擴充套件引起的undo除錯時間變化,根據以上引數及設定,認為沒有
SQL> set lines 300
SQL> col file_name for a60
SQL> select file_id,file_name,tablespace_name,bytes/1024/1024,MAXBYTES/1024/1024,autoextensible from dba_data_files
where tablespace_name in ('UNDOTBS12') order by 1;
FILE_ID FILE_NAME TABLESPACE_NAME BYTES/1024/1024 MAXBYTES/1024/1024 AUTOEXTEN
---------- ------------------------------------------------------------ ------------------------------ --------------- ------------------ ---------
218 +DATA_S/test/datafile/undotbs12.269.1022257191 UNDOTBS12 30720 0 NO
220 +DATA_S/test/datafile/undotbs12.268.1022257243 UNDOTBS12 30720 0 NO
271 +DATAVG/test/datafile/undotbs12.1972.1027500943 UNDOTBS12 30720 0 NO
272 +DATAVG/test/datafile/undotbs12.1973.1027500985 UNDOTBS12 30720 0 NO
344 +DATAVG/test/datafile/undotbs12.2056.1087226619 UNDOTBS12 30720 0 NO
345 +DATAVG/test/datafile/undotbs12.2055.1087226655 UNDOTBS12 30720 0 NO
6 rows selected.
8 檢視undo表空間是否為NOGUARANTEE
SQL> col tablespace_name for a20
SQL> select tablespace_name,block_size,extent_management,segment_space_management,contents,
retention from dba_tablespaces where tablespace_name='UNDOTBS12';
TABLESPACE_NAME BLOCK_SIZE EXTENT_MANAGEMENT SEGMENT_SPACE_MANA CONTENTS RETENTION
-------------------- ---------- ------------------------------ ------------------ --------------------------- ---------------------------------
UNDOTBS12 8192 LOCAL MANUAL UNDO NOGUARANTEE
9 透過此檢視,驗證Undo是否有爭用,發現undo header等待事件過長。
SQL> select * from v$waitstat;
CLASS COUNT TIME
------------------------------------------------------ ---------- ----------
data block 1054253604 977861564
sort block 0 0
save undo block 0 0
segment header 1256109 339064
save undo header 0 0
free list 0 0
extent map 3536 196
1st level bmb 15523793 1738257
2nd level bmb 3159275 17816519
3rd level bmb 847 102
bitmap block 0 0
bitmap index block 57 0
file header block 1630526 11230885
unused 0 0
system undo header 1225 183
system undo block 23 0
undo header 60728323 13951843
undo block 14025728 760038
18 rows selected.
10 檢視回滾段的統計資訊,透過統計資訊,確定是否有異常。XACTS 為0,代表事務已經提交。不為0,則代表活動的事務。
Ratio的百分比如果過高,則代表undo資料頭有爭用。
透過此統計資訊,認為資料庫在undo使用上沒有問題的
set linesize 300
col name for a25
col status for a15
SELECT n.name, s.extents, s.rssize, s.optsize, s.hwmsize, s.xacts,s.waits,s.gets,100*(s.waits/s.gets) "Ratio%", s.status FROM v$rollname n, v$rollstat s WHERE n.usn = s.usn;
NAME EXTENTS RSSIZE OPTSIZE HWMSIZE XACTS WAITS GETS Ratio% STATUS
------------------------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------------
SYSTEM 7 450560 450560 0 1072 186417 .57505485 ONLINE
_SYSSMU1291_1397830725$ 33 260169728 3860979712 0 50354 287278824 .017527919 ONLINE
_SYSSMU1292_252107360$ 181 199942144 2438062080 0 47865 280260848 .017078732 ONLINE
_SYSSMU1293_1495173289$ 51 190963712 3893485568 0 45848 295071129 .015537948 ONLINE
_SYSSMU1294_3251668230$ 58 214032384 2281824256 0 41967 260615769 .016103016 ONLINE
_SYSSMU1295_4122107009$ 34 225566720 4284604416 0 39520 255348602 .015476881 ONLINE
_SYSSMU1296_3699510013$ 39 178380800 4231127040 0 36448 270844107 .013457188 ONLINE
_SYSSMU1297_3481746686$ 45 228712448 1849483264 0 35460 250105401 .014178022 ONLINE
...
_SYSSMU2234_569768215$ 54 277995520 328327168 0 3029 34127401 .008875566 ONLINE
_SYSSMU2235_3141883639$ 37 183623680 316137472 0 3370 35601358 .009465931 ONLINE
823 rows selected.
11 以下查詢用於透過查詢,來確定將undo設定多大比較合適,經驗證,認為將undo設定為7G,就滿足目前資料庫的使用。
SQL> SELECT (UR * (UPS * DBS)) AS "Bytes"
2 FROM (SELECT value AS UR FROM v$parameter WHERE name = 'undo_retention'),
3 (SELECT undoblks/((end_time-begin_time)*86400) AS UPS
4 FROM v$undostat
5 WHERE undoblks = (SELECT MAX(undoblks) FROM v$undostat)),
6 (SELECT block_size AS DBS
7 FROM dba_tablespaces
8 WHERE tablespace_name = (SELECT UPPER(value) FROM v$parameter WHERE name = 'undo_tablespace'));
Bytes
----------
7558168221
12 仔細考慮Oracle的架構,回收資源是mmon程式,既然alert日誌沒有相關資訊,那麼檢視mmon程式是否有有用的資訊, 果然發現undo 服務處於停滯狀態,這估計就是導致undo表空間使用率降不下來的根本原因
*** 2021-11-30 16:49:56.094
Unable to schedule a MMON slave at: Auto DBFUS Main Slave action has been temporarily suspended
- Slave action had prior policy violations.
Unknown return code: 101
*** 2021-12-01 00:50:16.197
Unable to schedule a MMON slave at: Auto DBFUS Main
Slave action has been temporarily suspended
- Slave action had prior policy violations.
Unknown return code: 101
檢視相關相關引數, _swrf_mmon_dbfus為true,是開啟的,為正常。
NAME VALUE DESCRIB
------------------------------ -------------------- ------------------------------------------------------------
_swrf_mmon_dbfus TRUE Enable/disable SWRF MMON DB Feature Usage
_swrf_test_dbfus FALSE Enable/disable DB Feature Usage Testing
13 透過如下資訊,可以確定在2020年7月16日後,undo就沒有正常的工作過
col version for a20
col description for a30
col FIRST for a20
col LAST for a20
select a.name,a.version,a.detected_usages,to_char(a.first_usage_date,'yyyy-mm-dd hh24:mi:ss') "FIRST",
to_char(a.last_sample_date,'yyyy-mm-dd hh24:mi:ss') "LAST",a.description from dba_feature_usage_statistics a
where a.detected_usages >0 and a.name like '%Undo%' order by a.name,a.version;
NAME VERSION DETECTED_USAGES FIRST LAST DESCRIPTION
------------------------------ -------------------- --------------- -------------------- -------------------- ------------------------------
Automatic Undo Management 11.2.0.4.0 95 2018-09-26 11:29:52 2020-07-16 19:19:24 Oracle automatically manages u
ndo data using an UNDO tablesp
14 觸發UNDO自動回收的機制為:一為undo表空間過高,二為調整undo的時間,但此係統已經有176G的undo資料塊為
過期,故認為透過調整undo保留時間,不一定有效果。
但undo過高,按理會觸發一次,按照目前系統的情況,認為資料庫沒有進行回收,故準備透過修改undo表空間閾值,
來觸發資料庫自動回收undo。
系統Undo表空間閾值預設值
SQL> SELECT OBJECT_NAME,warning_value, critical_value, status FROM dba_thresholds WHERE metrics_name = 'Tablespace Space Usage' AND object_name like 'UNDO%';
OBJECT_NAME WARNING_VALUE CRITICAL_VALUE STATUS
-------------------- ------------------------------ -------------------- ---------------------
UNDOTBS1 <SYSTEM-GENERATED THRESHOLD> 0 VALID
UNDOTBS11 <SYSTEM-GENERATED THRESHOLD> 0 VALID
UNDOTBS2 <SYSTEM-GENERATED THRESHOLD> 0 VALID
UNDOTBS12 <SYSTEM-GENERATED THRESHOLD> 0 VALID
執行如下語句,設定UNDOTBS12表空間閾值為50%收集一次,70%收集一次,
SQL> begin
2 DBMS_SERVER_ALERT.SET_THRESHOLD(
3 metrics_id => dbms_server_alert.tablespace_pct_full,
4 warning_operator => dbms_server_alert.operator_ge,
5 warning_value => '50',
6 critical_operator => dbms_server_alert.operator_ge,
7 critical_value => '70',
8 observation_period => 1,
9 consecutive_occurrences => 1,
10 instance_name => NULL,
11 object_type => dbms_server_alert.object_type_tablespace,
12 object_name => 'UNDOTBS12');
13 end;
14 /
OBJECT_NAME WARNING_VALUE CRITICAL_VALUE STATUS
-------------------- ------------------------------ -------------------- ---------------------
UNDOTBS1 <SYSTEM-GENERATED THRESHOLD> 0 VALID
UNDOTBS11 <SYSTEM-GENERATED THRESHOLD> 0 VALID
UNDOTBS2 <SYSTEM-GENERATED THRESHOLD> 0 VALID
UNDOTBS12 50 70 VALID
經過調整,發現資料庫已經緩慢的收集undo,但Unable to schedule a MMON slave at: Auto DBFUS Main
Slave action has been temporarily suspended 還是每過幾個小時觸發一次。
15 DBFUS不正常,根據多年經驗,認為有可能同資料庫例項相關檢視的統計資訊有關,於是檢視相關統計資訊,
發現從2020年7月20日後就沒有進行更新過,而undo統計資訊是從2020-7-16日為最後一次。
故認為有可能是資料庫例項相關檢視統計資訊過期,效能下降,導致undo自動回收工作異常。
15.1 檢視資料庫例項相關統計資訊,發現最後一次為2020-7-20日。
OWNER TABLE_NAME NUM_ROWS LAST_ANALYZED
------------------------------ ------------------------------ ---------- -------------
SYS X$KSPPO 131 2020/7/20
SYS X$KSOLSSTAT 22 2020/7/20
SYS X$KSOLSFTS 566126 2020/7/20
SYS X$KSPVLD_VALUES 617 2020/7/20
SYS X$KSPSPFH 1 2020/7/20
SYS X$KSPPI 2914 2020/7/20
SYS X$KSPPSV2 2919 2020/7/20
SYS X$KSPPCV2 2919 2020/7/20
SYS X$KSPPSV 2914 2020/7/20
SYS X$KSPPCV 2914 2020/7/20
SYS X$KSPSPFILE 2932 2020/7/20
SYS X$KSUCPUSTAT 12 2020/7/20
SYS X$KSULL 1 2020/7/20
SYS X$KSUCF 10 2020/7/20
SYS X$KSUINSTSTAT 0 2020/7/20
SYS X$KSQST 320 2020/7/20
SYS X$KSQDN 1 2020/7/20
SYS X$KSQRS 24272 2020/7/20
SYS X$KSQEQTYP 210 2020/7/20
SYS X$KSUCLNDPCC 0 2020/7/20
15.2 手動收集相關統計資訊,收集dbms_stats.gather_fixed_objects_stats統計資訊,執行幾天都沒有結束,
對會話進行追蹤,發現一直卡在X$KSOLSFTS表上,
查詢此表的資料,使用select count(1) from X$KSOLSFTS 執行7天都沒有結束,初步估計表的資料有可能有上百億條資料,
故導致收集統計資訊執行不成功。
SQL> exec dbms_stats.gather_table_stats('SYS', 'X$KEWRATTRNEW');
PL/SQL procedure successfully completed.
SQL> exec dbms_stats.gather_table_stats('SYS', 'X$KEWRTSQLPLAN');
PL/SQL procedure successfully completed.
SQL> exec dbms_stats.gather_table_stats('SYS', 'X$KEWRATTRNEW')
PL/SQL procedure successfully completed.
SQL> exec dbms_stats.gather_table_stats('SYS', 'X$KEWRSQLIDTAB');
PL/SQL procedure successfully completed.
SQL> exec dbms_stats.gather_fixed_objects_stats; --執行2天沒有完成
15.3 經過觀察,自從收集統計資訊後,Unable to schedule a MMON slave at: Auto DBFUS Main
Slave action has been temporarily suspended 告警從每4小時
發生一次,變為到目前為止在沒有發生過,故認為Unable to schedule a MMON slave at:
Auto DBFUS Main Slave action has been temporarily suspended 告警
確實是由於統計資訊過期導致的。
15.4 自從收集統計資訊後,發現undo表空間的使用率已經慢慢的降下來,目前剩餘79G,使用率為57% 。
TABLESPACE_NAME TOT_SIZE TOT_FREE PCT_FREE MAX_FREE CHUNKS_FREE
------------------------------ ---------------- ---------------- ---------------- ---------------- ----------------
UNDOTBS12 184,320 79,552 43 2,343 6,799
15.5 查詢Undo自動管理的JOB,發現在2021-12-12日再次執行過,證明透過收集統計資訊,
Automatic Undo Management工作已經變為正常,雖然由於X$KSOLSFTS太大,
導致統計資訊還有部分過期,但資料庫已經緩慢恢復正常。
NAME VERSION DETECTED_USAGES FIRST LAST DESCRIPTION
------------------------------ -------------------- --------------- -------------------- -------------------- ------------------------------
Automatic Undo Management 11.2.0.4.0 97 2018-09-26 11:29:52 2021-12-12 19:17:17 Oracle automatically manages undo data using an UNDO tablespace.
16 本計劃使用如下命令,針對Auto-DBFUS Action事件生成相關trace檔案,找到明確的資訊,
來證明是由於統計資訊過期引起的異常,但自從收集統計資訊後,DBFUS不在告警了,
故心裡那個恨呀,為什麼就手動執行了收集統計資訊的操作呢,破壞了現場環境,
導致沒有明確的資訊來證明是由於統計資訊過期引起的。
begin
dbms_monitor.serv_mod_act_trace_enable(service_name=>'SYS$BACKGROUND',
module_name=>'MMON_SLAVE',
action_name=>'Auto-DBFUS Action');
end;
/
begin
dbms_monitor.serv_mod_act_trace_disable(service_name=>'SYS$BACKGROUND',
module_name=>'MMON_SLAVE',
action_name=>'Auto-DBFUS Action');
end;
/
17 重啟資料庫,釋放相關資源,收集統計資訊,經過驗證,undo表空間已經恢復正常。
重啟後第3天,表空間的使用情況
SQL> select a.tablespace_name,sum(a.tots) Tot_Size, sum(a.sumb) Tot_Free, sum(a.sumb)*100/sum(a.tots) Pct_Free, sum(a.largest) Max_Free,sum(a.chunks) Chunks_Free from ( select tablespace_name,0 tots,sum(bytes/1024/1024) sumb, max(bytes/1024/1024) largest,count(*) chunks from dba_free_space a group by tablespace_name union select tablespace_name,sum(bytes/1024/1024) tots,0,0,0 from dba_data_files group by tablespace_name) a group by a.tablespace_name order by Pct_Free desc;
TABLESPACE_NAME TOT_SIZE TOT_FREE PCT_FREE MAX_FREE CHUNKS_FREE
------------------------------ ---------------- ---------------- ---------------- ---------------- ----------------
UNDOTBS11 184,320 172,153 93 3,968 1,686
UNDOTBS12 184,320 149,383 81 1,381 2,623
9 rows selected.
來自 “ ITPUB部落格 ” ,連結:http://blog.itpub.net/69994536/viewspace-2939472/,如需轉載,請註明出處,否則將追究法律責任。
相關文章
- 【UNDO】使用重建UNDO表空間方法解決UNDO表空間過大問題
- 【UNDO】Oracle undo表空間使用率過高,因為一個查詢Oracle
- 臨時表空間使用率過高的解決辦法
- UNDO表空間不足解決方法
- ORACLE 臨時表空間使用率過高的原因及解決方案Oracle
- ORACLE臨時表空間使用率過高的原因及解決方法Oracle
- Oracle 10g UNDO表空間過大的解決方案Oracle 10g
- undo表空間不能回收的解決方法
- Oracle undo表空間爆滿的解決Oracle
- 10g+ undo表空間使用率計算
- ORACLE 中undo表空間爆滿的解決方法Oracle
- undo表空間資源緊張的解決方案
- oracle 11g sysaux表空間使用率非常高的問題解決OracleUX
- oracle undo 表空間Oracle
- 理解UNDO表空間
- undo表空間太大解決辦法
- oracle UNDO表空間一個bug——undo表空間快速擴充套件Oracle套件
- 自動undo表空間模式下切換新的undo表空間模式
- SYSAUX表空間使用率高問題處理UX
- Oracle undo 表空間管理Oracle
- oracle重建UNDO表空間Oracle
- oracle undo表空間管理Oracle
- undo表空間總結
- Oracle 11gR2 Database UNDO表空間使用率居高不下OracleDatabase
- 【Oracle】-oracle 10g undo表空間使用率居高不下bugOracle 10g
- undo表空間損壞的處理過程
- Oracle 釋放過度使用的Undo表空間Oracle
- 18_深入解析Oracle undo原理(2)_undo表空間使用率100%問題處理Oracle
- ORACLE 臨時表空間使用率過高分析Oracle
- MySQL InnoDB Undo表空間配置MySql
- undo表空間故障處理
- oracle 表空間,臨時表空間使用率查詢Oracle
- Oracle SYSAUX表空間使用率超過警戒閥值OracleUX
- 還原表空間過大的解決方案
- 臨時表空間過大的解決方法
- UNDO表空間的ORA-1122錯誤解決(三)
- ORACLE線上切換undo表空間Oracle
- Oracle undo表空間切換(ZT)Oracle