oracle 案例-控制檔案丟失故障處理過程

paulyibinyi發表於2009-06-20

      今天下午在酒店休息時,下午5點左右接到使用者電話,說資料庫出問題了,導致應用不能訪問
環境 aix 5.3+oracle 9.2.0.8+adtx儲存 ,就趕緊用電腦遠端連線上去
一.出現以下現象:
1.sqlplus "/as sysdba"  一直hang在那裡
2.more alert_stjj.log  一直hang在那裡
3.errpt|more 一直hang在那裡
做任何操作都不行

二.查詢故障
沒辦法,只有重啟主機了,
重啟主機後,馬上啟動資料庫
su - oracle
sqlplus "/as sysdba"
startup
ORACLE instance started.
Total System Global Area                       4449853568 bytes
Fixed Size                                          62848 bytes
Variable Size                                    65683456 bytes
Database Buffers                                2621440000 bytes
Redo Buffers                                       172032 bytes
ORA-00205: error in identifying controlfile, check alert log for more info

檢查資料庫日誌,報以下錯誤
Sat Jun 20 18:30:20 2009
ARC0: Becoming the heartbeat ARCH
ARC0: Becoming the heartbeat ARCHARC0: Thread not mounted
Sat Jun 20 18:30:20 2009
ALTER DATABASE   MOUNT
Sat Jun 20 18:30:20 2009
ORA-00202: controlfile: '/ora9idata02/oradata/stjj/control02.ctl'
ORA-27037: unable to obtain file status
IBM AIX RISC System/6000 Error: 2: No such file or directory
Additional information: 3

說明/ora9idata02/oradata/stjj/control02.ctl不可訪問
檔案系統/ora9idata02沒有正常mount起來,所屬vg不能正常varyon
透過檢查/ora9idata02vg 有幾個硬碟丟失,導致不能正常訪問
其他邏輯備份和物理備份vg 都有硬碟丟失,導致備份不可用。
幸運的是存放oracle程式碼vg和ora9idata01 vg可以正常mount,
ora9idata01 vg存放的是資料檔案,控制檔案,和redo。
ora9idata02 vg存放的是控制檔案的鏡象和redo組的成員。
問題確定是adtx儲存部分硬碟丟失,導致存放在/ora9idata02檔案系統下的控制檔案丟失,而引起
資料庫例項down掉。


三.問題解決
sqlplus "/as sysdba"
create pfile from spfile;
編輯$ORACLE_HOME/dbs/initstjj.ora檔案,去掉丟失的控制檔案,用好的控制檔案啟動。
儲存如下:
[oracle@p550:/oracle/app/oracle/product/9.2.0/dbs]$more initstjj.ora
*.background_dump_dest='/oracle/app/oracle/admin/stjj/bdump'
*.compatible='9.2.0.0.0'
*.control_files='/ora9idata01/oradata/stjj/control01.ctl'    
*.core_dump_dest='/oracle/app/oracle/admin/stjj/cdump'
*.db_block_size=8192
*.db_cache_size=2621440000
*.db_domain=''
*.db_file_multiblock_read_count=16
*.db_files=800
*.db_name='stjj'
*.fast_start_mttr_target=300
*.hash_join_enabled=TRUE
*.instance_name='stjj'
*.java_pool_size=10485760
*.job_queue_processes=10
*.large_pool_size=20971520
*.log_archive_dest_1='location=/oraarch/arch'
*.log_archive_start=TRUE
*.open_cursors=800
*.pga_aggregate_target=1073741824
*.processes=800
*.query_rewrite_enabled='FALSE'
*.remote_login_passwordfile='EXCLUSIVE'
*.shared_pool_size=268435456
*.sort_area_size=524288
*.star_transformation_enabled='FALSE'
*.timed_statistics=TRUE
*.undo_management='AUTO'
*.undo_retention=10800
*.undo_tablespace='UNDOTBS1'
*.user_dump_dest='/oracle/app/oracle/admin/stjj/udump'

然後用pfile啟動
startup pfile=/oracle/app/oracle/product/9.2.0/dbs/initstjj.ora
ORACLE instance started.
Total System Global Area                       4449853568 bytes
Fixed Size                                          62848 bytes
Variable Size                                    65683456 bytes
Database Buffers                               2621440000 bytes
Redo Buffers                                       172032 bytes
database mounted
database open
資料庫正常啟動
日誌切換也正常

資料庫日誌中還有日誌組成員不能更新的錯誤提示
Errors in file /oracle/app/oracle/admin/stjj/bdump/stjj_arc1_168176.trc:
ORA-00313: open failed for members of log group 4 of thread 1
ARC1: Beginning to archive log 4 thread 1 sequence 1584
Creating archive destination LOG_ARCHIVE_DEST_1: '/oraarch/arch/1_1584.dbf'
ARC1: Completed archiving  log 4 thread 1 sequence 1584
Sat Jun 20 19:16:39 2009
Errors in file /oracle/app/oracle/admin/stjj/bdump/stjj_lgwr_282680.trc:
ORA-00313: open failed for members of log group 2 of thread 1
ORA-00312: online log 2 thread 1: '/ora9idata02/oradata/stjj/redo22.log'
ORA-27037: unable to obtain file status
IBM AIX RISC System/6000 Error: 2: No such file or directory
Additional information: 3
Sat Jun 20 19:16:39 2009
Errors in file /oracle/app/oracle/admin/stjj/bdump/stjj_lgwr_282680.trc:
ORA-00321: log 2 of thread 1, cannot update log file header
ORA-00312: online log 2 thread 1: '/ora9idata02/oradata/stjj/redo22.log'
Sat Jun 20 19:16:39 2009
Errors in file /oracle/app/oracle/admin/stjj/bdump/stjj_lgwr_282680.trc:
ORA-00313: open failed for members of log group 2 of thread 1
Sat Jun 20 19:16:39 2009

採用如下方法,刪除有問題的日誌組成員

alter database drop logfile member '/ora9idata02/oradata/stjj/redo22.log';
alter database drop logfile member '/ora9idata02/oradata/stjj/redo12.log';
alter database drop logfile member '/ora9idata02/oradata/stjj/redo32.log';
alter database drop logfile member '/ora9idata02/oradata/stjj/redo42.log';

就趕緊做了個資料庫邏輯備份匯出,透過ftp傳送到pc機器上,確保一份有效的備份,

就建議使用者找主機工程師去解決硬碟丟失的故障和原因。

總結:這次資料庫比較幸運的能正常開啟,主要是存放資料檔案,控制檔案,日誌檔案的vg能正常起來,
但做好控制檔案和redo的鏡象也是非常有必要的。備份也最好放幾份在不同的儲存上。

 

 

 

來自 “ ITPUB部落格 ” ,連結:http://blog.itpub.net/7199859/viewspace-607020/,如需轉載,請註明出處,否則將追究法律責任。

相關文章