Oracle X7一體機儲存節點重啟導致u01使用率不斷增大
背景:儲存節點發生儲存,資料檔案狀態不支援,產生trace過多導致u01使用率不斷增高
基礎環境:Oracle X7一體機 資料庫版本:12.2.0.1.180717
以下是分析和處理過程:
## 1、資料庫日誌
...
ORA-00376: file 7 cannot be read at this time
ORA-01110: data file 7: '+DATAC1/SKEDCDB/DATAFILE/undotbs2.982.1066835405'
2023-02-15T16:00:58.833384+08:00
Errors in file /u01/app/oracle/diag/rdbms/SKEDCDB/SKEDCDB2/trace/SKEDCDB2_cl18_107783.trc:
ORA-00376: file 7 cannot be read at this time
ORA-01110: data file 7: '+DATAC1/SKEDCDB/DATAFILE/undotbs2.982.1066835405'
2023-02-15T16:00:58.833398+08:00
Errors in file /u01/app/oracle/diag/rdbms/SKEDCDB/SKEDCDB2/trace/SKEDCDB2_cl11_267126.trc:
ORA-00376: file 7 cannot be read at this time
ORA-01110: data file 7: '+DATAC1/SKEDCDB/DATAFILE/undotbs2.982.1066835405'
2023-02-15T16:00:58.833460+08:00
Errors in file /u01/app/oracle/diag/rdbms/SKEDCDB/SKEDCDB2/trace/SKEDCDB2_cl07_6783.trc:
ORA-00376: file 7 cannot be read at this time
ORA-01110: data file 7: '+DATAC1/SKEDCDB/DATAFILE/undotbs2.982.1066835405'
2023-02-15T16:00:58.833478+08:00
Errors in file /u01/app/oracle/diag/rdbms/SKEDCDB/SKEDCDB2/trace/SKEDCDB2_cl17_163546.trc:
ORA-00376: file 7 cannot be read at this time
ORA-01110: data file 7: '+DATAC1/SKEDCDB/DATAFILE/undotbs2.982.1066835405'
2023-02-15T16:00:58.833498+08:00
Errors in file /u01/app/oracle/diag/rdbms/SKEDCDB/SKEDCDB2/trace/SKEDCDB2_cl32_372326.trc:
ORA-00376: file 7 cannot be read at this time
ORA-01110: data file 7: '+DATAC1/SKEDCDB/DATAFILE/undotbs2.982.1066835405'
2023-02-15T16:00:58.833511+08:00
Errors in file /u01/app/oracle/diag/rdbms/SKEDCDB/SKEDCDB2/trace/SKEDCDB2_cl00_372255.trc:
ORA-00376: file 7 cannot be read at this time
ORA-01110: data file 7: '+DATAC1/SKEDCDB/DATAFILE/undotbs2.982.1066835405'
2023-02-15T16:00:58.833515+08:00
Errors in file /u01/app/oracle/diag/rdbms/SKEDCDB/SKEDCDB2/trace/SKEDCDB2_cl24_54355.trc:
ORA-00376: file 7 cannot be read at this time
...
資料庫日誌忠報錯為ORA-00376,ORA-0110,從日誌看讀取undo表空間的'+DATAC1/SKEDCDB/DATAFILE/undotbs2.982.1066835405'資料檔案沒有成功,產生多個trace。
## 2、trace日誌
```skedcdb2_cl00_372255.trc
*** 2023-02-15T15:25:46.303098+08:00 (CDB$ROOT(1))
kssxdl signal 376: 0xbcd85410 = transaction (78) @ false_loc 0
----------------------------------------
SO: 0xbcd85410, type: 78, owner: 0x79849860, flag: INIT/-/-/-/0x00 if: 0x3 c: 0x3
proc=0xe81373f0, name=transaction, file=ktccts.h LINE:451, pg=0, conuid=1
SGA version=(1,0)
ORA-00376: file 7 cannot be read at this time
ORA-01110: data file 7: '+DATAC1/SKEDCDB/DATAFILE/undotbs2.982.1066835405'
kssxdl success: so 0x3fb9b5a00 = session
PMON waiting for 1000 csecs
trace日誌的內容和alert一致,讀書資料檔案失敗,PMON程式有等待。
## 3、ASM日誌
2023-02-06T01:05:00.756765+08:00
Exadata cell: o/..... is no longer accessible. I/O errors to disks on this might get suppressed
2023-02-06T01:05:00.868575+08:00
Some DDE async actions failed or were cancelled
NOTE: failed resync of disk group 1 disks
DATAC1_CD_02_CRMPCELADM03 (4)
DATAC1_CD_00_CRMPCELADM03 (8)
DATAC1_CD_03_CRMPCELADM03 (10)
DATAC1_CD_04_CRMPCELADM03 (16)
DATAC1_CD_09_CRMPCELADM03 (22)
DATAC1_CD_10_CRMPCELADM03 (33)
DATAC1_CD_05_CRMPCELADM03 (39)
DATAC1_CD_11_CRMPCELADM03 (40)
ORA-15032: not all alterations performed
ORA-15080: synchronous I/O operation failed to write block 0 of disk 22 in disk group DATAC1
ORA-15080: synchronous I/O operation failed to write block 0 of disk 4 in disk group DATAC1
ORA-15080: synchronous I/O operation failed to write block 0 of disk 33 in disk group DATAC1
ORA-15080: synchronous I/O operation failed to write block 0 of disk 39 in disk group DATAC1
2023-02-06T01:05:01.106577+08:00
ERROR: /* Exadata Auto Mgmt: ONLINE ASM Disk */
alter diskgroup DATAC1 online disk DATAC1_CD_04_CRMPCELADM03
, DATAC1_CD_02_CRMPCELADM03
, DATAC1_CD_05_CRMPCELADM03
, DATAC1_CD_03_CRMPCELADM03
, DATAC1_CD_00_CRMPCELADM03
, DATAC1_CD_10_CRMPCELADM03
, DATAC1_CD_11_CRMPCELADM03
, DATAC1_CD_09_CRMPCELADM03
nowait
WARNING: Exadata Auto Management: OS PID: 331356 Operation ID: 911602: ONLINE disk RECOC1_CD_04_CRMPCELADM03 in diskgroup RECOC1 Failed
ASM日誌中有I/O錯誤,自動新增磁碟沒有成功。 ORA-15080 Synchronous I/O Operation Failed With Exadata error:'HARD check failed' (Doc ID 2412871.1)
對於該錯誤按官方建議可以打補丁,也可以透過以下方式
Workaround:-
關閉ASM例項的磁碟檢測
```sql
SQL>alter diskgroup RECOC1 set attribute 'hard_check.enabled' = 'FALSE';
ALTER DISKGROUP RECOC1 REBALANCE POWER 32;
完成rebalance後再開啟磁碟檢測
SQL>alter diskgroup RECOC1 set attribute 'hard_check.enabled' = 'TRUE';
```
## 4、處理過程
(1)對資料庫進行rman全備。
(2)檢測資料檔案狀態
```sql
SQL> select file#,name,status,enabled from v$datafile where status='RECOVER';
FILE# NAME STATUS ENABLED
------ -------------------------------- --------- ---------
3 +DATAC1/SKEDCDB/DATAFILE/sysaux.1002.1066835393 RECOVER READ WRITE
7 +DATAC1/SKEDCDB/DATAFILE/undotbs2.982.1066835405 RECOVER READ WRITE
10 +DATAC1/SKEDCDB/PDB/DATAFILE/sysaux.975.1066838165 RECOVER READ WRITE
```
有3個資料檔案狀態為RECOVER狀態,其中CDB中2個sysaux和udnotbs2表空間各1個,pdb中sysaux表空間1個資料檔案。
無法查詢表空間使用率
```sql
SQL> select a.tablespace_name,
round(a.s,2) "CURRENT_TOTAL(MB)",
round(a.s - f.s,2) "USED(MB)",
f.s "FREE(MB)",
round((a.s-f.s)/ a.s * 100, 2) "USED%",
g.autoextensible,
round(a.ms,2) "MAX_TOTAL(MB)",
round((a.s-f.s)/ a.ms * 100, 2) "MAX USED%"
from (select d.tablespace_name,
sum(bytes / 1024 / 1024) s,
sum(decode(maxbytes, 0, bytes, maxbytes) / 1024 / 1024) ms
from dba_data_files d
group by d.tablespace_name) a,
(select f.tablespace_name, sum(f.bytes / 1024 / 1024) s
from dba_free_space f
group by f.tablespace_name) f,
(select distinct tablespace_name, autoextensible
from DBA_DATA_FILES
where autoextensible = 'YES'
union
select distinct tablespace_name, autoextensible
from DBA_DATA_FILES
where autoextensible = 'NO'
and tablespace_name not in
(select distinct tablespace_name
from DBA_DATA_FILES
where autoextensible = 'YES')) g
where a.tablespace_name = f.tablespace_name
and g.tablespace_name = f.tablespace_name order by "MAX USED%";
from dba_data_files d
*
```
ERROR at line 12:
ORA-00376: file 3 cannot be read at this time
ORA-01110: data file 3: '+DATAC1/SKEDCDB/DATAFILE/sysaux.1002.1066835393'
(3)重建undo表空間
```sql
SQL> create undo tablespace UNDOTBS3 datafile '+DATAC1/SKEDCDB/DATAFILE/undotbs3_01.dbf' size 5G autoextend on;
```
(4)修復資料檔案
```sql
rman target /
Recovery Manager: Release 12.2.0.1.0 - Production on Mon Feb 20 22:22:50 2023
Copyright (c) 1982, 2017, Oracle and/or its affiliates. All rights reserved.
connected to target database: PLANNING (DBID=2741129465)
RMAN>
RMAN> recover datafile 3;
Starting recover at 20-FEB-23
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=2185 instance=SKEDCDB1 device type=DISK
starting media recovery
archived log for thread 1 with sequence 14340 is already on disk as file +DATAC1/SKEDCDB/ARCHIVELOG/2023_02_20/thread_1_seq_14340.794.1129321103
...
```
透過rman修復3個資料檔案,然後將資料檔案onlin
在CDB中進行3號資料檔案online
```sql
SQL> alter database datafile 3 online;
SQL> alter database datafile 10 online;
alter database datafile 10 online
*
ERROR at line 1:
ORA-01516: nonexistent log file, data file, or temporary file "10" in the
current container
PDB的資料檔案需要在pdb中進行online
```
(5)檢查資料檔案狀態
```sql
SQL> select file#,name,status,enabled from v$datafile where status='RECOVER';
FILE# NAME STATUS ENABLED
------ -------------------------------- --------- ---------
3 +DATAC1/SKEDCDB/DATAFILE/sysaux.1002.1066835393 ONLINE READ WRITE
7 +DATAC1/SKEDCDB/DATAFILE/undotbs2.982.1066835405 ONLINE READ WRITE
10 +DATAC1/SKEDCDB/PDB/DATAFILE/sysaux.975.1066838165 ONLINE READ WRITE
```
資料檔案狀態正常
(6)刪除undotbs2表空間
```sql
SQL> drop tablespace undotbs2 including contents and datafiles;
drop tablespace undotbs2 including contents and datafiles
*
ERROR at line 1:
ORA-01548: active rollback segment '_SYSSMU11_759994052$' found, terminate
dropping tablespace
SQL> Select segment_name, status, tablespace_name
2 from dba_rollback_segs
3 where status not in ('ONLINE', 'OFFLINE');
SEGMENT_NAME STATUS TABLESPACE_NAME
------------------------------ ---------------- ------------------------------
_SYSSMU11_759994052$ NEEDS RECOVERY UNDOTBS2
_SYSSMU12_3581412579$ NEEDS RECOVERY UNDOTBS2
_SYSSMU13_398964664$ NEEDS RECOVERY UNDOTBS2
_SYSSMU14_2753826883$ NEEDS RECOVERY UNDOTBS2
_SYSSMU15_1616211766$ NEEDS RECOVERY UNDOTBS2
_SYSSMU16_2927909258$ NEEDS RECOVERY UNDOTBS2
_SYSSMU17_4161369910$ NEEDS RECOVERY UNDOTBS2
_SYSSMU18_2116035987$ NEEDS RECOVERY UNDOTBS2
_SYSSMU19_2637726138$ NEEDS RECOVERY UNDOTBS2
_SYSSMU20_3386602257$ NEEDS RECOVERY UNDOTBS2
10 rows selected.
```
引數檔案加入以下內容
_CORRUPTED_ROLLBACK_SEGMENTS=(_SYSSMU11_759994052$,_SYSSMU12_3581412579$,_SYSSMU13_398964664$,_SYSSMU14_2753826883$,_SYSSMU15_1616211766$,_SYSSMU16_2927909258$,_SYSSMU17_4161369910$,_SYSSMU18_2116035987$,_SYSSMU19_2637726138$,_SYSSMU20_3386602257$)
使用pfile重新啟動資料庫
```sql
SQL> startup pfile='/home/oracle/pfile_skedcdb_0220.ora';
ORA-32006: SEC_CASE_SENSITIVE_LOGON initialization parameter has been deprecated
ORACLE instance started.
Total System Global Area 2.6844E+10 bytes
Fixed Size 26502656 bytes
Variable Size 1.2176E+10 bytes
Database Buffers 1.4563E+10 bytes
Redo Buffers 77959168 bytes
Database mounted.
Database opened.
SQL>
SQL>
SQL> show pdbs
CON_ID CON_NAME OPEN MODE RESTRICTED
---------- ------------------------------ ---------- ----------
2 PDB$SEED READ ONLY NO
3 SKEDCDB MOUNTED
刪除undotbs2表空間
SQL> drop tablespace undotbs2 including contents and datafiles;
Tablespace dropped.
SQL>
SQL>
重啟資料庫
SQL> shutdown immediate
Database closed.
Database dismounted.
ORACLE instance shut down.
SQL>
SQL>
SQL>
SQL>
SQL>
SQL>
SQL> startup
ORA-32004: obsolete or deprecated parameter(s) specified for RDBMS instance
ORA-29707: inconsistent value 0 for initialization parameter _lm_use_tx_tsn with other instances
undo引數2個節點不一致導致該報錯
SQL>
SQL>
SQL> create pfile='/tmp/pfile.ora' from spfile;
create pfile='/tmp/pfile.ora' from spfile
*
ERROR at line 1:
ORA-01565: error in identifying file '?=/dbs/spfile@.ora'
ORA-27037: unable to obtain file status
Linux-x86_64 Error: 2: No such file or directory
Additional information: 7
SQL> startup pfile='/home/oracle/pfileskedcdb_0220.ora';
ORA-32006: SEC_CASE_SENSITIVE_LOGON initialization parameter has been deprecated
ORACLE instance started.
Total System Global Area 2.6844E+10 bytes
Fixed Size 26502656 bytes
Variable Size 1.2176E+10 bytes
Database Buffers 1.4563E+10 bytes
Redo Buffers 77959168 bytes
Database mounted.
Database opened.
SQL>
【小結】資料檔案狀態修復後,資料庫alert日誌不在產生讀取資料檔案失敗相關的trace,u01目錄使用率正常;建議生產環境資料庫日誌和資料檔案狀態等關鍵指標應該加強監控和巡檢,發現問題及時處理。
-the end-
來自 “ ITPUB部落格 ” ,連結:http://blog.itpub.net/28373936/viewspace-2938782/,如需轉載,請註明出處,否則將追究法律責任。
相關文章
- oracle兩節點RAC,由於gipc導致某節點crs無法啟動問題分析Oracle
- ORACLE 11.2.0.4 for solaris更換硬體後主機時間改變導致一節點叢集服務無法啟動Oracle
- 電源紋波偏高導致產品不斷自動重啟經驗案例分析
- ORACLE 11.2.0.4 rac for linux 鏈路宕導致的單節點異常當機OracleLinux
- EVA4400儲存斷電導致資料丟失如何恢復
- Oracle sysman.mgmt_jobs導致資料庫自動重啟Oracle資料庫
- 關於Oracle 11G RAC雙節點之間存在防火牆導致只能一個節點執行Oracle防火牆
- Oracle SYSAUX 表空間使用率100% 導致的DB 故障OracleUX
- 【儲存資料恢復案例】儲存斷電後無法成功重啟,虛擬機器無法啟動-資料恢復資料恢復虛擬機
- 【RAC】Oracle19.13之後的grid,節點重啟後不會自動驅動Oracle
- runc hang 導致 Kubernetes 節點 NotReady
- 儲存意外斷電導致raid資訊丟失的解決過程AI
- RAC節點hang住, oracle bug導致了cpu過高,無法啟動叢集隔離Oracle
- ORACLE一體機pcie 快取卡損壞導致資料庫dang機Oracle快取資料庫
- 關於沒有熔斷降級導致服務重啟問題
- 伺服器意外斷電導致無法重啟資料恢復伺服器資料恢復
- 記php-fpm重啟導致的一個bugPHP
- 效能分析(5)- 軟中斷導致 CPU 使用率過高的案例
- SQL SERVER 臨時表導致儲存過程重編譯(recompile)的一些探討SQLServer儲存過程編譯Compile
- 記一次儲存問題導致的rac故障案例
- SandStone HuaYan安全儲存一體機
- 區塊鏈3.0儲存礦機-YottaChain芝麻雲節點區塊鏈AI
- Android之點選Home鍵後再次開啟導致APP重啟問題AndroidAPP
- alicdn邊緣節點不穩定導致頁面崩潰問題
- Solaris叢集節點重啟
- mstar因裝置讀不到導致,待機重啟問題
- 【ASK_ORACLE】因process用盡導致的rac重啟的解決方法Oracle
- PyTorch儲存模型斷點以及載入斷點繼續訓練PyTorch模型斷點
- 記一次Oracle RAC for aix 儲存雙控鎖盤導致ASM控制檔案損壞恢復OracleAIASM
- 記一次oracle 19c RAC叢集重啟單節點DB啟動異常(二)Oracle
- ORACLE RAC 兩節點db_32k_cache_size設定不當導致表truncate失敗之ORA-00379Oracle
- 記一次 Mac 意外重啟導致的 Homestead 問題Mac
- 為什麼說UFS儲存晶片是手機儲存的未來?宏旺半導體一文解析晶片
- 多路徑配置問題和ACFS啟用原因導致rac二節點不能正常啟動
- Oracle RAC啟動因CTSS導致的異常Oracle
- Oracle儲存過程乾貨(一):儲存過程基礎Oracle儲存過程
- ORACLE RAC 11.2.0.4 ASM加盤導致叢集重啟之ASM sga設定過小OracleASM
- 網站主機CPU或記憶體使用率過高導致網站無法訪問網站記憶體