Oracle X7一體機儲存節點重啟導致u01使用率不斷增大

xfhuangfu發表於2023-03-09

背景:儲存節點發生儲存,資料檔案狀態不支援,產生trace過多導致u01使用率不斷增高

基礎環境:Oracle X7一體機 資料庫版本:12.2.0.1.180717

以下是分析和處理過程:

## 1、資料庫日誌

...

ORA-00376: file 7 cannot be read at this time

ORA-01110: data file 7: '+DATAC1/SKEDCDB/DATAFILE/undotbs2.982.1066835405'

2023-02-15T16:00:58.833384+08:00

Errors in file /u01/app/oracle/diag/rdbms/SKEDCDB/SKEDCDB2/trace/SKEDCDB2_cl18_107783.trc:

ORA-00376: file 7 cannot be read at this time

ORA-01110: data file 7: '+DATAC1/SKEDCDB/DATAFILE/undotbs2.982.1066835405'

2023-02-15T16:00:58.833398+08:00

Errors in file /u01/app/oracle/diag/rdbms/SKEDCDB/SKEDCDB2/trace/SKEDCDB2_cl11_267126.trc:

ORA-00376: file 7 cannot be read at this time

ORA-01110: data file 7: '+DATAC1/SKEDCDB/DATAFILE/undotbs2.982.1066835405'

2023-02-15T16:00:58.833460+08:00

Errors in file /u01/app/oracle/diag/rdbms/SKEDCDB/SKEDCDB2/trace/SKEDCDB2_cl07_6783.trc:

ORA-00376: file 7 cannot be read at this time

ORA-01110: data file 7: '+DATAC1/SKEDCDB/DATAFILE/undotbs2.982.1066835405'

2023-02-15T16:00:58.833478+08:00

Errors in file /u01/app/oracle/diag/rdbms/SKEDCDB/SKEDCDB2/trace/SKEDCDB2_cl17_163546.trc:

ORA-00376: file 7 cannot be read at this time

ORA-01110: data file 7: '+DATAC1/SKEDCDB/DATAFILE/undotbs2.982.1066835405'

2023-02-15T16:00:58.833498+08:00

Errors in file /u01/app/oracle/diag/rdbms/SKEDCDB/SKEDCDB2/trace/SKEDCDB2_cl32_372326.trc:

ORA-00376: file 7 cannot be read at this time

ORA-01110: data file 7: '+DATAC1/SKEDCDB/DATAFILE/undotbs2.982.1066835405'

2023-02-15T16:00:58.833511+08:00

Errors in file /u01/app/oracle/diag/rdbms/SKEDCDB/SKEDCDB2/trace/SKEDCDB2_cl00_372255.trc:

ORA-00376: file 7 cannot be read at this time

ORA-01110: data file 7: '+DATAC1/SKEDCDB/DATAFILE/undotbs2.982.1066835405'

2023-02-15T16:00:58.833515+08:00

Errors in file /u01/app/oracle/diag/rdbms/SKEDCDB/SKEDCDB2/trace/SKEDCDB2_cl24_54355.trc:

ORA-00376: file 7 cannot be read at this time

...

資料庫日誌忠報錯為ORA-00376,ORA-0110,從日誌看讀取undo表空間的'+DATAC1/SKEDCDB/DATAFILE/undotbs2.982.1066835405'資料檔案沒有成功,產生多個trace。

##  2、trace日誌


```skedcdb2_cl00_372255.trc

*** 2023-02-15T15:25:46.303098+08:00 (CDB$ROOT(1))

kssxdl signal 376: 0xbcd85410 = transaction (78) @ false_loc 0

  ----------------------------------------

  SO: 0xbcd85410, type: 78, owner: 0x79849860, flag: INIT/-/-/-/0x00 if: 0x3 c: 0x3

   proc=0xe81373f0, name=transaction, file=ktccts.h LINE:451, pg=0, conuid=1

   SGA version=(1,0)

ORA-00376: file 7 cannot be read at this time

ORA-01110: data file 7: '+DATAC1/SKEDCDB/DATAFILE/undotbs2.982.1066835405'

kssxdl success: so 0x3fb9b5a00 = session

PMON waiting for 1000 csecs


trace日誌的內容和alert一致,讀書資料檔案失敗,PMON程式有等待。

## 3、ASM日誌

2023-02-06T01:05:00.756765+08:00

Exadata cell: o/..... is no longer accessible. I/O errors to disks on this might get suppressed

2023-02-06T01:05:00.868575+08:00

Some DDE async actions failed or were cancelled

NOTE: failed resync of disk group 1 disks

DATAC1_CD_02_CRMPCELADM03 (4)

DATAC1_CD_00_CRMPCELADM03 (8)

DATAC1_CD_03_CRMPCELADM03 (10)

DATAC1_CD_04_CRMPCELADM03 (16)

DATAC1_CD_09_CRMPCELADM03 (22)

DATAC1_CD_10_CRMPCELADM03 (33)

DATAC1_CD_05_CRMPCELADM03 (39)

DATAC1_CD_11_CRMPCELADM03 (40)


ORA-15032: not all alterations performed

ORA-15080: synchronous I/O operation failed to write block 0 of disk 22 in disk group DATAC1

ORA-15080: synchronous I/O operation failed to write block 0 of disk 4 in disk group DATAC1

ORA-15080: synchronous I/O operation failed to write block 0 of disk 33 in disk group DATAC1

ORA-15080: synchronous I/O operation failed to write block 0 of disk 39 in disk group DATAC1


2023-02-06T01:05:01.106577+08:00

ERROR: /* Exadata Auto Mgmt: ONLINE ASM Disk */

alter diskgroup DATAC1 online disk DATAC1_CD_04_CRMPCELADM03

  , DATAC1_CD_02_CRMPCELADM03

  , DATAC1_CD_05_CRMPCELADM03

  , DATAC1_CD_03_CRMPCELADM03

  , DATAC1_CD_00_CRMPCELADM03

  , DATAC1_CD_10_CRMPCELADM03

  , DATAC1_CD_11_CRMPCELADM03

  , DATAC1_CD_09_CRMPCELADM03

nowait


WARNING: Exadata Auto Management: OS PID: 331356 Operation ID: 911602: ONLINE disk RECOC1_CD_04_CRMPCELADM03 in diskgroup RECOC1 Failed

ASM日誌中有I/O錯誤,自動新增磁碟沒有成功。 ORA-15080 Synchronous I/O Operation Failed With Exadata error:'HARD check failed' (Doc ID 2412871.1)

對於該錯誤按官方建議可以打補丁,也可以透過以下方式

Workaround:-


關閉ASM例項的磁碟檢測

```sql

SQL>alter diskgroup RECOC1 set attribute 'hard_check.enabled' = 'FALSE';


ALTER DISKGROUP RECOC1 REBALANCE POWER 32;

完成rebalance後再開啟磁碟檢測

SQL>alter diskgroup RECOC1 set attribute 'hard_check.enabled' = 'TRUE';

```

## 4、處理過程

(1)對資料庫進行rman全備。

(2)檢測資料檔案狀態

```sql

SQL> select file#,name,status,enabled from v$datafile where status='RECOVER';                                                      


FILE#                                  NAME               STATUS     ENABLED

------             --------------------------------     ---------   ---------

3  +DATAC1/SKEDCDB/DATAFILE/sysaux.1002.1066835393       RECOVER   READ WRITE

7  +DATAC1/SKEDCDB/DATAFILE/undotbs2.982.1066835405      RECOVER   READ WRITE

10 +DATAC1/SKEDCDB/PDB/DATAFILE/sysaux.975.1066838165    RECOVER   READ WRITE

```

有3個資料檔案狀態為RECOVER狀態,其中CDB中2個sysaux和udnotbs2表空間各1個,pdb中sysaux表空間1個資料檔案。

無法查詢表空間使用率

```sql

SQL> select a.tablespace_name,

        round(a.s,2) "CURRENT_TOTAL(MB)",

        round(a.s - f.s,2) "USED(MB)",

        f.s "FREE(MB)",

        round((a.s-f.s)/ a.s * 100, 2) "USED%",

        g.autoextensible,

        round(a.ms,2) "MAX_TOTAL(MB)",

        round((a.s-f.s)/ a.ms * 100, 2) "MAX USED%"

   from (select d.tablespace_name,

                sum(bytes / 1024 / 1024) s,

                sum(decode(maxbytes, 0, bytes, maxbytes) / 1024 / 1024) ms

           from dba_data_files d

          group by d.tablespace_name) a,

        (select f.tablespace_name, sum(f.bytes / 1024 / 1024) s

           from dba_free_space f

          group by f.tablespace_name) f,

        (select distinct tablespace_name, autoextensible

           from DBA_DATA_FILES

          where autoextensible = 'YES'

         union

         select distinct tablespace_name, autoextensible

           from DBA_DATA_FILES

          where autoextensible = 'NO'

            and tablespace_name not in

                (select distinct tablespace_name

                   from DBA_DATA_FILES

                  where autoextensible = 'YES')) g

    where a.tablespace_name = f.tablespace_name

    and g.tablespace_name = f.tablespace_name order by "MAX USED%";

         from dba_data_files d

              *

```

ERROR at line 12:

ORA-00376: file 3 cannot be read at this time

ORA-01110: data file 3: '+DATAC1/SKEDCDB/DATAFILE/sysaux.1002.1066835393'

(3)重建undo表空間

```sql

 SQL> create undo tablespace UNDOTBS3 datafile '+DATAC1/SKEDCDB/DATAFILE/undotbs3_01.dbf' size 5G autoextend on;

```

(4)修復資料檔案

```sql

rman target /

Recovery Manager: Release 12.2.0.1.0 - Production on Mon Feb 20 22:22:50 2023

Copyright (c) 1982, 2017, Oracle and/or its affiliates.  All rights reserved.

connected to target database: PLANNING (DBID=2741129465)

RMAN> 

RMAN> recover datafile 3;

Starting recover at 20-FEB-23

using target database control file instead of recovery catalog

allocated channel: ORA_DISK_1

channel ORA_DISK_1: SID=2185 instance=SKEDCDB1 device type=DISK


starting media recovery


archived log for thread 1 with sequence 14340 is already on disk as file +DATAC1/SKEDCDB/ARCHIVELOG/2023_02_20/thread_1_seq_14340.794.1129321103

...

```

透過rman修復3個資料檔案,然後將資料檔案onlin

在CDB中進行3號資料檔案online

```sql

SQL> alter database datafile 3 online;



SQL>  alter database datafile 10 online;

 alter database datafile 10 online

*

ERROR at line 1:

ORA-01516: nonexistent log file, data file, or temporary file "10" in the

current container

PDB的資料檔案需要在pdb中進行online

```

(5)檢查資料檔案狀態

```sql

SQL> select file#,name,status,enabled from v$datafile where status='RECOVER';                                                      


FILE#                                  NAME               STATUS     ENABLED

------             --------------------------------     ---------   ---------

3  +DATAC1/SKEDCDB/DATAFILE/sysaux.1002.1066835393       ONLINE   READ WRITE

7  +DATAC1/SKEDCDB/DATAFILE/undotbs2.982.1066835405      ONLINE   READ WRITE

10 +DATAC1/SKEDCDB/PDB/DATAFILE/sysaux.975.1066838165    ONLINE   READ WRITE

```

資料檔案狀態正常

(6)刪除undotbs2表空間

```sql

SQL> drop tablespace undotbs2 including contents and datafiles;

drop tablespace undotbs2 including contents and datafiles

*

ERROR at line 1:

ORA-01548: active rollback segment '_SYSSMU11_759994052$' found, terminate

dropping tablespace



SQL> Select segment_name, status, tablespace_name

  2    from dba_rollback_segs

  3   where status not in ('ONLINE', 'OFFLINE');


SEGMENT_NAME                   STATUS           TABLESPACE_NAME

------------------------------ ---------------- ------------------------------

_SYSSMU11_759994052$           NEEDS RECOVERY   UNDOTBS2

_SYSSMU12_3581412579$          NEEDS RECOVERY   UNDOTBS2

_SYSSMU13_398964664$           NEEDS RECOVERY   UNDOTBS2

_SYSSMU14_2753826883$          NEEDS RECOVERY   UNDOTBS2

_SYSSMU15_1616211766$          NEEDS RECOVERY   UNDOTBS2

_SYSSMU16_2927909258$          NEEDS RECOVERY   UNDOTBS2

_SYSSMU17_4161369910$          NEEDS RECOVERY   UNDOTBS2

_SYSSMU18_2116035987$          NEEDS RECOVERY   UNDOTBS2

_SYSSMU19_2637726138$          NEEDS RECOVERY   UNDOTBS2

_SYSSMU20_3386602257$          NEEDS RECOVERY   UNDOTBS2


10 rows selected.

```

引數檔案加入以下內容

_CORRUPTED_ROLLBACK_SEGMENTS=(_SYSSMU11_759994052$,_SYSSMU12_3581412579$,_SYSSMU13_398964664$,_SYSSMU14_2753826883$,_SYSSMU15_1616211766$,_SYSSMU16_2927909258$,_SYSSMU17_4161369910$,_SYSSMU18_2116035987$,_SYSSMU19_2637726138$,_SYSSMU20_3386602257$)

使用pfile重新啟動資料庫

```sql

SQL> startup pfile='/home/oracle/pfile_skedcdb_0220.ora';

ORA-32006: SEC_CASE_SENSITIVE_LOGON initialization parameter has been deprecated

ORACLE instance started.


Total System Global Area 2.6844E+10 bytes

Fixed Size                 26502656 bytes

Variable Size            1.2176E+10 bytes

Database Buffers         1.4563E+10 bytes

Redo Buffers               77959168 bytes

Database mounted.

Database opened.

SQL> 

SQL> 

SQL> show pdbs    


    CON_ID CON_NAME                       OPEN MODE  RESTRICTED

---------- ------------------------------ ---------- ----------

         2 PDB$SEED                       READ ONLY  NO

         3 SKEDCDB                       MOUNTED

刪除undotbs2表空間

SQL> drop tablespace undotbs2 including contents and datafiles;


Tablespace dropped.


SQL> 

SQL> 

重啟資料庫

SQL>  shutdown immediate

Database closed.

Database dismounted.

ORACLE instance shut down.

SQL> 

SQL> 

SQL> 

SQL> 

SQL> 

SQL> 

SQL> startup 

ORA-32004: obsolete or deprecated parameter(s) specified for RDBMS instance

ORA-29707: inconsistent value 0 for initialization parameter _lm_use_tx_tsn with other instances


undo引數2個節點不一致導致該報錯

SQL> 

SQL> 

SQL> create pfile='/tmp/pfile.ora' from spfile;

create pfile='/tmp/pfile.ora' from spfile

*

ERROR at line 1:

ORA-01565: error in identifying file '?=/dbs/spfile@.ora'

ORA-27037: unable to obtain file status

Linux-x86_64 Error: 2: No such file or directory

Additional information: 7



SQL> startup pfile='/home/oracle/pfileskedcdb_0220.ora';

ORA-32006: SEC_CASE_SENSITIVE_LOGON initialization parameter has been deprecated

ORACLE instance started.


Total System Global Area 2.6844E+10 bytes

Fixed Size                 26502656 bytes

Variable Size            1.2176E+10 bytes

Database Buffers         1.4563E+10 bytes

Redo Buffers               77959168 bytes

Database mounted.

Database opened.

SQL> 

【小結】資料檔案狀態修復後,資料庫alert日誌不在產生讀取資料檔案失敗相關的trace,u01目錄使用率正常;建議生產環境資料庫日誌和資料檔案狀態等關鍵指標應該加強監控和巡檢,發現問題及時處理。

-the end-


來自 “ ITPUB部落格 ” ,連結:http://blog.itpub.net/28373936/viewspace-2938782/,如需轉載,請註明出處,否則將追究法律責任。

相關文章