Oracle SYSAUX 表空間使用率100% 導致的DB 故障

盛源線上16228719999發表於2021-08-25

1 故障現象
資料庫例項當機,檢視日誌資訊如下:

Sun Aug 22 09:02:17 2021
SMCO started with pid=27, OS id=12247
Sun Aug 22 09:03:41 2021
ORA-1688: unable to extend table SYS.WRH$_ACTIVE_SESSION_HISTORY partition WRH$_ACTIVE_2459712562_0 by 128 in tablespace SYSAUX
MMON Flush encountered SYSAUX out of space error(1688).
MMON (emergency) purge of WR snapshots (5684) and older
ORA-1688: unable to extend table SYS.WRH$_ACTIVE_SESSION_HISTORY partition WRH$_ACTIVE_2459712562_0 by 128 in tablespace SYSAUX
MMON Flush encountered SYSAUX out of space error(1688).
MMON (emergency) purge - nothing to purge, pgsid=0.
Sun Aug 22 09:09:39 2021
Errors in file /u01/app/oracle/diag/rdbms/llt_pd/llt/trace/llt_ora_16131.trc:
Errors in file /u01/app/oracle/diag/rdbms/llt_pd/llt/trace/llt_ora_16131.trc:
Errors in file /u01/app/oracle/diag/rdbms/llt_pd/llt/trace/llt_ora_16131.trc:
因為SYSAUX 是輔助表空間,所以直接啟動例項後檢視錶空間資訊:

[dave@www.cndba.cn ~]# orz tsfree
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
| TABLESPACE_NAME | TOTAL_SIZE(Mb) | FREE_SIZE(Mb) | USED_SIZE(Mb) | USED_RATE(%) |
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
| UNDOTBS1 | 287 | 284.56 | 2.44 | .85% |
| USERS | 500 | 499 | 1 | .2% |
| SYSTEM | 325 | 59.06 | 265.94 | 81.83% |
| JLB | 8192 | 8190 | 2 | .02% |
| TEMPTS1 | 20 | 14 | 6 | 30% |
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
但是沒有檢視到SYSAUX 表空間。 檢視等待事件:

[dave@www.cndba.cn ~]# orz event
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
| EVENT | COUNT |
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
| resmgr:cpu quantum | 63 |
| read by other session | 56 |
| SQLNet message from client | 18 |
| rdbms ipc message | 16 |
| direct path write temp | 10 |
| direct path read | 3 |
| Space Manager: slave idle wait | 3 |
| DIAG idle wait | 2 |
| Streams AQ: qmn slave idle wait | 1 |
| smon timer | 1 |
| SQL
Net message to client | 1 |
| VKTM Logical Idle Wait | 1 |
| pmon timer | 1 |
| LGWR real time apply sync | 1 |
| Streams AQ: waiting for time management or cleanup tasks | 1 |
| VKRM Idle | 1 |
| Streams AQ: qmn coordinator idle wait | 1 |
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
[dave@www.cndba.cn ~]#
排第一的是:resmgr:cpu quantum 這個在MOS中有說明,在11.2.0.3 之前有bug會導致這個問題,但我們這裡是11.2.0.4。 所以不應該是bug。 但關聯的物件是一致的,都和資源管理有關,問題的根源還是在SYSAUX 表空間滿了有關。 但奇怪的是通過命令,查詢不到SYSAUX 表空間。
因為這個例項部署在雲主機上,是通過靜默安裝的,當時並沒有配置SYSAUX 表空間的自動擴充套件。 先配置自動擴充套件:

[oracle@db1 ~]$ sqlplus / as sysdba

SQL*Plus: Release 11.2.0.4.0 Production on Sun Aug 22 09:51:19 2021

Copyright (c) 1982, 2013, Oracle. All rights reserved.

Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options

SQL> alter database datafile 2 autoextend on;

Database altered.
修改後還是無法檢視到SYSAUX 表空間的使用率。 手工刪除表空間中的一些物件。

truncate table WRH$_ACTIVE_SESSION_HISTORY;
truncate table WRH$_EVENT_HISTOGRAM;
truncate table WRH$_SQLSTAT;
truncate table WRH$_LATCH_MISSES_SUMMARY;
truncate table WRH$_LATCH;
truncate table WRH$_SYSSTAT;
truncate table WRH$_SEG_STAT;
truncate table WRH$_PARAMETER;
truncate table WRH$_SYSTEM_EVENT;
truncate table WRH$_SQL_PLAN;
truncate table WRH$_DLM_MISC;
truncate table WRH$_SERVICE_STAT;
truncate table WRH$_TABLESPACE_STAT;
truncate table WRH$_ROWCACHE_SUMMARY;
truncate table WRH$_MVPARAMETER;

SQL> truncate table SYS.WRH$_ACTIVE_SESSION_HISTORY;

Table truncated.
在SQL 命令中也可以查詢到表空間的資訊了:
[dave@www.cndba.cn ~]# orz tsfree
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
| TABLESPACE_NAME | TOTAL_SIZE(Mb) | FREE_SIZE(Mb) | USED_SIZE(Mb) | USED_RATE(%) |
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
| SYSAUX | 325 | .88 | 324.13 | 99.73% |
| UNDOTBS1 | 287 | 284.56 | 2.44 | .85% |
| USERS | 500 | 499 | 1 | .2% |
| SYSTEM | 325 | 59.06 | 265.94 | 81.83% |
| JLB | 8192 | 8190 | 2 | .02% |
| TEMPTS1 | 20 | 14 | 6 | 30% |
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
[dave@www.cndba.cn ~]#

[dave@www.cndba.cn ~]# orz datafile sysaux
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
| FILE# | NAME | SIZE(Mb) | AUTOEXTENSIBLE | MAXBYTES(Mb) | STATUS |
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
| 2 | /u01/app/oracle/oradata/jlb/sysaux01.dbf | 325 | YES | 32767.98 | ONLINE |
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
[dave@www.cndba.cn ~]#
等待事件也恢復正常。
[dave@www.cndba.cn llt]# orz event
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
| EVENT | COUNT |
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
| SQLNet message from client | 34 |
| rdbms ipc message | 16 |
| direct path read | 6 |
| DIAG idle wait | 2 |
| Space Manager: slave idle wait | 2 |
| SQL
Net message to client | 1 |
| VKTM Logical Idle Wait | 1 |
| Streams AQ: qmn coordinator idle wait | 1 |
| Streams AQ: waiting for time management or cleanup tasks | 1 |
| Streams AQ: qmn slave idle wait | 1 |
| pmon timer | 1 |
| smon timer | 1 |
| LGWR real time apply sync | 1 |
| VKRM Idle | 1 |
| log file sync | 1 |
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
[dave@www.cndba.cn llt]#
其實問題的根源還是SYSAUX 表空間滿了導致的後續問題。 小問題隨筆記之。

本作品採用《CC 協議》,轉載必須註明作者和本文連結

相關文章