Oracle X7一體機儲存節點重啟導致u01使用率不斷增大
背景:儲存節點發生儲存,資料檔案狀態不支援,產生trace過多導致u01使用率不斷增高
基礎環境:Oracle X7一體機 資料庫版本:12.2.0.1.180717
以下是分析和處理過程:
## 1、資料庫日誌
...
ORA-00376: file 7 cannot be read at this time
ORA-01110: data file 7: '+DATAC1/SKEDCDB/DATAFILE/undotbs2.982.1066835405'
2023-02-15T16:00:58.833384+08:00
Errors in file /u01/app/oracle/diag/rdbms/SKEDCDB/SKEDCDB2/trace/SKEDCDB2_cl18_107783.trc:
ORA-00376: file 7 cannot be read at this time
ORA-01110: data file 7: '+DATAC1/SKEDCDB/DATAFILE/undotbs2.982.1066835405'
2023-02-15T16:00:58.833398+08:00
Errors in file /u01/app/oracle/diag/rdbms/SKEDCDB/SKEDCDB2/trace/SKEDCDB2_cl11_267126.trc:
ORA-00376: file 7 cannot be read at this time
ORA-01110: data file 7: '+DATAC1/SKEDCDB/DATAFILE/undotbs2.982.1066835405'
2023-02-15T16:00:58.833460+08:00
Errors in file /u01/app/oracle/diag/rdbms/SKEDCDB/SKEDCDB2/trace/SKEDCDB2_cl07_6783.trc:
ORA-00376: file 7 cannot be read at this time
ORA-01110: data file 7: '+DATAC1/SKEDCDB/DATAFILE/undotbs2.982.1066835405'
2023-02-15T16:00:58.833478+08:00
Errors in file /u01/app/oracle/diag/rdbms/SKEDCDB/SKEDCDB2/trace/SKEDCDB2_cl17_163546.trc:
ORA-00376: file 7 cannot be read at this time
ORA-01110: data file 7: '+DATAC1/SKEDCDB/DATAFILE/undotbs2.982.1066835405'
2023-02-15T16:00:58.833498+08:00
Errors in file /u01/app/oracle/diag/rdbms/SKEDCDB/SKEDCDB2/trace/SKEDCDB2_cl32_372326.trc:
ORA-00376: file 7 cannot be read at this time
ORA-01110: data file 7: '+DATAC1/SKEDCDB/DATAFILE/undotbs2.982.1066835405'
2023-02-15T16:00:58.833511+08:00
Errors in file /u01/app/oracle/diag/rdbms/SKEDCDB/SKEDCDB2/trace/SKEDCDB2_cl00_372255.trc:
ORA-00376: file 7 cannot be read at this time
ORA-01110: data file 7: '+DATAC1/SKEDCDB/DATAFILE/undotbs2.982.1066835405'
2023-02-15T16:00:58.833515+08:00
Errors in file /u01/app/oracle/diag/rdbms/SKEDCDB/SKEDCDB2/trace/SKEDCDB2_cl24_54355.trc:
ORA-00376: file 7 cannot be read at this time
...
資料庫日誌忠報錯為ORA-00376,ORA-0110,從日誌看讀取undo表空間的'+DATAC1/SKEDCDB/DATAFILE/undotbs2.982.1066835405'資料檔案沒有成功,產生多個trace。
## 2、trace日誌
```skedcdb2_cl00_372255.trc
*** 2023-02-15T15:25:46.303098+08:00 (CDB$ROOT(1))
kssxdl signal 376: 0xbcd85410 = transaction (78) @ false_loc 0
----------------------------------------
SO: 0xbcd85410, type: 78, owner: 0x79849860, flag: INIT/-/-/-/0x00 if: 0x3 c: 0x3
proc=0xe81373f0, name=transaction, file=ktccts.h LINE:451, pg=0, conuid=1
SGA version=(1,0)
ORA-00376: file 7 cannot be read at this time
ORA-01110: data file 7: '+DATAC1/SKEDCDB/DATAFILE/undotbs2.982.1066835405'
kssxdl success: so 0x3fb9b5a00 = session
PMON waiting for 1000 csecs
trace日誌的內容和alert一致,讀書資料檔案失敗,PMON程式有等待。
## 3、ASM日誌
2023-02-06T01:05:00.756765+08:00
Exadata cell: o/..... is no longer accessible. I/O errors to disks on this might get suppressed
2023-02-06T01:05:00.868575+08:00
Some DDE async actions failed or were cancelled
NOTE: failed resync of disk group 1 disks
DATAC1_CD_02_CRMPCELADM03 (4)
DATAC1_CD_00_CRMPCELADM03 (8)
DATAC1_CD_03_CRMPCELADM03 (10)
DATAC1_CD_04_CRMPCELADM03 (16)
DATAC1_CD_09_CRMPCELADM03 (22)
DATAC1_CD_10_CRMPCELADM03 (33)
DATAC1_CD_05_CRMPCELADM03 (39)
DATAC1_CD_11_CRMPCELADM03 (40)
ORA-15032: not all alterations performed
ORA-15080: synchronous I/O operation failed to write block 0 of disk 22 in disk group DATAC1
ORA-15080: synchronous I/O operation failed to write block 0 of disk 4 in disk group DATAC1
ORA-15080: synchronous I/O operation failed to write block 0 of disk 33 in disk group DATAC1
ORA-15080: synchronous I/O operation failed to write block 0 of disk 39 in disk group DATAC1
2023-02-06T01:05:01.106577+08:00
ERROR: /* Exadata Auto Mgmt: ONLINE ASM Disk */
alter diskgroup DATAC1 online disk DATAC1_CD_04_CRMPCELADM03
, DATAC1_CD_02_CRMPCELADM03
, DATAC1_CD_05_CRMPCELADM03
, DATAC1_CD_03_CRMPCELADM03
, DATAC1_CD_00_CRMPCELADM03
, DATAC1_CD_10_CRMPCELADM03
, DATAC1_CD_11_CRMPCELADM03
, DATAC1_CD_09_CRMPCELADM03
nowait
WARNING: Exadata Auto Management: OS PID: 331356 Operation ID: 911602: ONLINE disk RECOC1_CD_04_CRMPCELADM03 in diskgroup RECOC1 Failed
ASM日誌中有I/O錯誤,自動新增磁碟沒有成功。 ORA-15080 Synchronous I/O Operation Failed With Exadata error:'HARD check failed' (Doc ID 2412871.1)
對於該錯誤按官方建議可以打補丁,也可以透過以下方式
Workaround:-
關閉ASM例項的磁碟檢測
```sql
SQL>alter diskgroup RECOC1 set attribute 'hard_check.enabled' = 'FALSE';
ALTER DISKGROUP RECOC1 REBALANCE POWER 32;
完成rebalance後再開啟磁碟檢測
SQL>alter diskgroup RECOC1 set attribute 'hard_check.enabled' = 'TRUE';
```
## 4、處理過程
(1)對資料庫進行rman全備。
(2)檢測資料檔案狀態
```sql
SQL> select file#,name,status,enabled from v$datafile where status='RECOVER';
FILE# NAME STATUS ENABLED
------ -------------------------------- --------- ---------
3 +DATAC1/SKEDCDB/DATAFILE/sysaux.1002.1066835393 RECOVER READ WRITE
7 +DATAC1/SKEDCDB/DATAFILE/undotbs2.982.1066835405 RECOVER READ WRITE
10 +DATAC1/SKEDCDB/PDB/DATAFILE/sysaux.975.1066838165 RECOVER READ WRITE
```
有3個資料檔案狀態為RECOVER狀態,其中CDB中2個sysaux和udnotbs2表空間各1個,pdb中sysaux表空間1個資料檔案。
無法查詢表空間使用率
```sql
SQL> select a.tablespace_name,
round(a.s,2) "CURRENT_TOTAL(MB)",
round(a.s - f.s,2) "USED(MB)",
f.s "FREE(MB)",
round((a.s-f.s)/ a.s * 100, 2) "USED%",
g.autoextensible,
round(a.ms,2) "MAX_TOTAL(MB)",
round((a.s-f.s)/ a.ms * 100, 2) "MAX USED%"
from (select d.tablespace_name,
sum(bytes / 1024 / 1024) s,
sum(decode(maxbytes, 0, bytes, maxbytes) / 1024 / 1024) ms
from dba_data_files d
group by d.tablespace_name) a,
(select f.tablespace_name, sum(f.bytes / 1024 / 1024) s
from dba_free_space f
group by f.tablespace_name) f,
(select distinct tablespace_name, autoextensible
from DBA_DATA_FILES
where autoextensible = 'YES'
union
select distinct tablespace_name, autoextensible
from DBA_DATA_FILES
where autoextensible = 'NO'
and tablespace_name not in
(select distinct tablespace_name
from DBA_DATA_FILES
where autoextensible = 'YES')) g
where a.tablespace_name = f.tablespace_name
and g.tablespace_name = f.tablespace_name order by "MAX USED%";
from dba_data_files d
*
```
ERROR at line 12:
ORA-00376: file 3 cannot be read at this time
ORA-01110: data file 3: '+DATAC1/SKEDCDB/DATAFILE/sysaux.1002.1066835393'
(3)重建undo表空間
```sql
SQL> create undo tablespace UNDOTBS3 datafile '+DATAC1/SKEDCDB/DATAFILE/undotbs3_01.dbf' size 5G autoextend on;
```
(4)修復資料檔案
```sql
rman target /
Recovery Manager: Release 12.2.0.1.0 - Production on Mon Feb 20 22:22:50 2023
Copyright (c) 1982, 2017, Oracle and/or its affiliates. All rights reserved.
connected to target database: PLANNING (DBID=2741129465)
RMAN>
RMAN> recover datafile 3;
Starting recover at 20-FEB-23
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=2185 instance=SKEDCDB1 device type=DISK
starting media recovery
archived log for thread 1 with sequence 14340 is already on disk as file +DATAC1/SKEDCDB/ARCHIVELOG/2023_02_20/thread_1_seq_14340.794.1129321103
...
```
透過rman修復3個資料檔案,然後將資料檔案onlin
在CDB中進行3號資料檔案online
```sql
SQL> alter database datafile 3 online;
SQL> alter database datafile 10 online;
alter database datafile 10 online
*
ERROR at line 1:
ORA-01516: nonexistent log file, data file, or temporary file "10" in the
current container
PDB的資料檔案需要在pdb中進行online
```
(5)檢查資料檔案狀態
```sql
SQL> select file#,name,status,enabled from v$datafile where status='RECOVER';
FILE# NAME STATUS ENABLED
------ -------------------------------- --------- ---------
3 +DATAC1/SKEDCDB/DATAFILE/sysaux.1002.1066835393 ONLINE READ WRITE
7 +DATAC1/SKEDCDB/DATAFILE/undotbs2.982.1066835405 ONLINE READ WRITE
10 +DATAC1/SKEDCDB/PDB/DATAFILE/sysaux.975.1066838165 ONLINE READ WRITE
```
資料檔案狀態正常
(6)刪除undotbs2表空間
```sql
SQL> drop tablespace undotbs2 including contents and datafiles;
drop tablespace undotbs2 including contents and datafiles
*
ERROR at line 1:
ORA-01548: active rollback segment '_SYSSMU11_759994052$' found, terminate
dropping tablespace
SQL> Select segment_name, status, tablespace_name
2 from dba_rollback_segs
3 where status not in ('ONLINE', 'OFFLINE');
SEGMENT_NAME STATUS TABLESPACE_NAME
------------------------------ ---------------- ------------------------------
_SYSSMU11_759994052$ NEEDS RECOVERY UNDOTBS2
_SYSSMU12_3581412579$ NEEDS RECOVERY UNDOTBS2
_SYSSMU13_398964664$ NEEDS RECOVERY UNDOTBS2
_SYSSMU14_2753826883$ NEEDS RECOVERY UNDOTBS2
_SYSSMU15_1616211766$ NEEDS RECOVERY UNDOTBS2
_SYSSMU16_2927909258$ NEEDS RECOVERY UNDOTBS2
_SYSSMU17_4161369910$ NEEDS RECOVERY UNDOTBS2
_SYSSMU18_2116035987$ NEEDS RECOVERY UNDOTBS2
_SYSSMU19_2637726138$ NEEDS RECOVERY UNDOTBS2
_SYSSMU20_3386602257$ NEEDS RECOVERY UNDOTBS2
10 rows selected.
```
引數檔案加入以下內容
_CORRUPTED_ROLLBACK_SEGMENTS=(_SYSSMU11_759994052$,_SYSSMU12_3581412579$,_SYSSMU13_398964664$,_SYSSMU14_2753826883$,_SYSSMU15_1616211766$,_SYSSMU16_2927909258$,_SYSSMU17_4161369910$,_SYSSMU18_2116035987$,_SYSSMU19_2637726138$,_SYSSMU20_3386602257$)
使用pfile重新啟動資料庫
```sql
SQL> startup pfile='/home/oracle/pfile_skedcdb_0220.ora';
ORA-32006: SEC_CASE_SENSITIVE_LOGON initialization parameter has been deprecated
ORACLE instance started.
Total System Global Area 2.6844E+10 bytes
Fixed Size 26502656 bytes
Variable Size 1.2176E+10 bytes
Database Buffers 1.4563E+10 bytes
Redo Buffers 77959168 bytes
Database mounted.
Database opened.
SQL>
SQL>
SQL> show pdbs
CON_ID CON_NAME OPEN MODE RESTRICTED
---------- ------------------------------ ---------- ----------
2 PDB$SEED READ ONLY NO
3 SKEDCDB MOUNTED
刪除undotbs2表空間
SQL> drop tablespace undotbs2 including contents and datafiles;
Tablespace dropped.
SQL>
SQL>
重啟資料庫
SQL> shutdown immediate
Database closed.
Database dismounted.
ORACLE instance shut down.
SQL>
SQL>
SQL>
SQL>
SQL>
SQL>
SQL> startup
ORA-32004: obsolete or deprecated parameter(s) specified for RDBMS instance
ORA-29707: inconsistent value 0 for initialization parameter _lm_use_tx_tsn with other instances
undo引數2個節點不一致導致該報錯
SQL>
SQL>
SQL> create pfile='/tmp/pfile.ora' from spfile;
create pfile='/tmp/pfile.ora' from spfile
*
ERROR at line 1:
ORA-01565: error in identifying file '?=/dbs/spfile@.ora'
ORA-27037: unable to obtain file status
Linux-x86_64 Error: 2: No such file or directory
Additional information: 7
SQL> startup pfile='/home/oracle/pfileskedcdb_0220.ora';
ORA-32006: SEC_CASE_SENSITIVE_LOGON initialization parameter has been deprecated
ORACLE instance started.
Total System Global Area 2.6844E+10 bytes
Fixed Size 26502656 bytes
Variable Size 1.2176E+10 bytes
Database Buffers 1.4563E+10 bytes
Redo Buffers 77959168 bytes
Database mounted.
Database opened.
SQL>
【小結】資料檔案狀態修復後,資料庫alert日誌不在產生讀取資料檔案失敗相關的trace,u01目錄使用率正常;建議生產環境資料庫日誌和資料檔案狀態等關鍵指標應該加強監控和巡檢,發現問題及時處理。
-the end-
來自 “ ITPUB部落格 ” ,連結:http://blog.itpub.net/28373936/viewspace-2938782/,如需轉載,請註明出處,否則將追究法律責任。
相關文章
- Linux,Network manager 導致節點異常重啟Linux
- 儲存裝置許可權不對導致crs啟動出錯
- service network restart 命令使用時導致叢集該節點重啟REST
- oracle兩節點RAC,由於gipc導致某節點crs無法啟動問題分析Oracle
- 一次oracle 節點重啟問題的定位Oracle
- 電源紋波偏高導致產品不斷自動重啟經驗案例分析
- ORACLE 11.2.0.4 for solaris更換硬體後主機時間改變導致一節點叢集服務無法啟動Oracle
- oracle 11.2.0.2 版本產生的私網地址不通會導致第2個節點自動重啟Oracle
- asm例項自動dismount導致rac一個節點當機ASM
- vmware配置的共享儲存_節點間fdisk資訊同步不一致
- Microsoft Visual Studio 2010導致系統C盤不斷增大問題處理。ROS
- 同時開啟節點導致資料DDL操作慢 ??
- 一節點啟動不起來(ocssd)CSS
- ORACLE windows伺服器斷電重啟後計劃任務變更導致rman備份中斷OracleWindows伺服器
- EVA4400儲存斷電導致資料丟失如何恢復
- 【儲存資料恢復案例】儲存斷電後無法成功重啟,虛擬機器無法啟動-資料恢復資料恢復虛擬機
- 記php-fpm重啟導致的一個bugPHP
- RAC資料庫一節點更換HBA卡導致emc儲存裝置序號變動處理記錄資料庫
- 關於沒有熔斷降級導致服務重啟問題
- 伺服器意外斷電導致無法重啟資料恢復伺服器資料恢復
- 關於Oracle 11G RAC雙節點之間存在防火牆導致只能一個節點執行Oracle防火牆
- 儲存意外斷電導致raid資訊丟失的解決過程AI
- ORACLE 11.2.0.4 rac for linux 鏈路宕導致的單節點異常當機OracleLinux
- Oracle sysman.mgmt_jobs導致資料庫自動重啟Oracle資料庫
- 區塊鏈3.0儲存礦機-YottaChain芝麻雲節點區塊鏈AI
- SandStone HuaYan安全儲存一體機
- Linux主機名修改後導致mysql重啟失敗LinuxMySql
- 記一次儲存問題導致的rac故障案例
- 效能分析(5)- 軟中斷導致 CPU 使用率過高的案例
- RAC節點hang住, oracle bug導致了cpu過高,無法啟動叢集隔離Oracle
- ORACLE一體機pcie 快取卡損壞導致資料庫dang機Oracle快取資料庫
- multicast導致節點無法加入叢集AST
- SQL SERVER 臨時表導致儲存過程重編譯(recompile)的一些探討SQLServer儲存過程編譯Compile
- Exadata虛擬機器模擬--安裝儲存節點虛擬機
- exadata vmwate 安裝儲存節點
- IP地址被清空導致例項重啟
- 【Oracle】11gR2 grid單機使用asm儲存修改主機名後導致的故障處理OracleASM
- 【RAC】Oracle19.13之後的grid,節點重啟後不會自動驅動Oracle