10.2.0.3: RMAN delete obsolete fails with Segmentation fault

Posted by yangtingkun on 2010-12-09

Oracle errors of all kinds are common, but an outright Segmentation fault really is unusual.

Database version: 10.2.0.3 for Linux x86-64.

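A database belonging to another department in the company had stopped responding, and I was asked to take a look. After logging in and running a quick check, I found that the archive log destination was full, so every write operation had to wait for archiving to complete.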

Further checking showed that the filesystem holding $ORACLE_BASE had run out of space.

[oracle@sqdata backupset]$ env|grep ORACLE
ORACLE_SID=bjsqdb
ORACLE_BASE=/data/oracle
ORACLE_HOME=/data/oracle/product/10.2.0/db_1
[oracle@sqdata ~]$ df -k
Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/sda2             24797412    560700  22956736   3% /
/dev/sda1               194442     15937    168466   9% /boot
/dev/sda9            313414864 297188540     48872 100% /data
tmpfs                  4064128         0   4064128   0% /dev/shm
/dev/sda5             17856888    176896  16758268   2% /opt
/dev/sda8              9920592    153884   9254640   2% /tmp
/dev/sda7             11904588   4089392   7200712  37% /usr
/dev/sda3             19840924    334428  18482356   2% /var
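
As a rough sketch of how the check above could be automated, the function below reads `df -kP` style output on standard input and prints the mount points at or above a usage threshold. The name full_filesystems and the threshold argument are hypothetical, not something from the original system, and the parsing assumes mount points contain no spaces.

```shell
#!/bin/sh
# Hypothetical helper (not part of the original post): list mount points
# whose usage is at or above a threshold, reading `df -kP` output on stdin.
# Assumes the usual six columns and mount points without spaces.
full_filesystems() {
    threshold=$1
    awk -v limit="$threshold" '
        NR == 1 { next }              # skip the header line
        {
            use = $5                  # usage column, e.g. "100%"
            sub(/%$/, "", use)        # strip the trailing percent sign
            if (use + 0 >= limit + 0)
                print $6, use "%", $4 " KB free"
        }
    '
}
```

It could be called as `df -kP | full_filesystems 90`. The -P option asks df for the POSIX format, which keeps each filesystem on one line so the column positions stay stable.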

I intended to use RMAN to remove some backups, but unexpectedly hit an error:

[oracle@sqdata backupset]$ rman target /

Recovery Manager: Release 10.2.0.3.0 - Production on Thu Dec 9 09:25:50 2010

Copyright (c) 1982, 2005, Oracle.  All rights reserved.

connected to target database: BJSQDB (DBID=657759334)

RMAN> delete obsolete;

using target database control file instead of recovery catalog
Segmentation fault

This problem is hard to pin down. First, because the disk was full, no core file was produced. Second, the error cannot be reproduced every time:

[oracle@sqdata backupset]$ df -k
Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/sda2             24797412    560700  22956736   3% /
/dev/sda1               194442     15937    168466   9% /boot
/dev/sda9            313414864 297238912         0 100% /data
tmpfs                  4064128         0   4064128   0% /dev/shm
/dev/sda5             17856888    176896  16758268   2% /opt
/dev/sda8              9920592    153884   9254640   2% /tmp
/dev/sda7             11904588   4089392   7200712  37% /usr
/dev/sda3             19840924    334468  18482316   2% /var
[oracle@sqdata backupset]$ rman target /

Recovery Manager: Release 10.2.0.3.0 - Production on Thu Dec 9 10:00:43 2010

Copyright (c) 1982, 2005, Oracle.  All rights reserved.

connected to target database: BJSQDB (DBID=657759334)

RMAN> delete obsolete;

using target database control file instead of recovery catalog
RMAN retention policy will be applied to the command
RMAN retention policy is set to recovery window of 21 days
allocated channel: ORA_DISK_1
channel ORA_DISK_1: sid=424 devtype=DISK
no obsolete backups found

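Nor does the full filesystem by itself explain the problem: when the error occurred, /data still had a little free space (48872 KB available), whereas now it has none at all, yet rman ran successfully.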

I searched Metalink and found several Segmentation fault bugs reported against 9.2, but they were all fixed in 10.1, and I saw nothing similar reported for 10.2.

Fortunately, the problem occurs only occasionally and has little impact on using the system.

I then checked the log file from the last automatic backup:

Recovery Manager: Release 10.2.0.3.0 - Production on Thu Dec 9 00:30:03 2010

Copyright (c) 1982, 2005, Oracle.  All rights reserved.

RMAN-00571: ================================================connected to target database: BJSQDB (DBID=657759334)

RMAN> 2> 3> 4> 5> 6> 7>
using target database control file instead of recovery catalog
allocated channel: d1
channel d1: sid=481 devtype=DISK

sql statement: alter system archive log current
released channel: d1
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03009: failure of sql command on default channel at 12/09/2010 00:30:08
RMAN-11003: failure during parse/execution of SQL statement: alter system archive log current
ORA-16014: log 3 sequence# 1674 not archived, no available destinations
ORA-00312: online log 3 thread 1: '/data/oracle/oradata/bjsqdb/REDOC01.LOG'
ORA-00312: online log 3 thread 1: '/data/oracle/oradata/bjsqdb/REDOC02.LOG'
ORA-00312: online log 3 thread 1: '/data/oracle/oradata/bjsqdb/REDOC03.LOG'

RMAN>

Recovery Manager complete.
---rman_archivelog and controlfile end---
---rman delete obsolete backupset---

Recovery Manager: Release 10.2.0.3.0 - Production on Thu Dec 9 00:30:08 2010

Copyright (c) 1982, 2005, Oracle.  All rights reserved.

RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-00554: initialization of internal recovery manager package failed
RMAN-04005: error from target database:
ORA-09945: Unable to initialize the audit trail file
Linux-x86_64 Error: 28: No space left on device
---rman delete obsolete backupset end---
---ftp file to 172.0.2.85---
---ftp end---

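Because there was no space left, the log switch failed. The next step then failed while connecting to the database, because the audit trail file could not be written, so the rman connection itself reported an error.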
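
The backup log above shows the delete obsolete and ftp steps still being attempted after the first RMAN step had failed. As a minimal sketch, assuming the backup script can read each step's log, a function like the following could detect the error stack. The name rman_log_errors is hypothetical, and the patterns are based only on the message formats visible in this log.

```shell
#!/bin/sh
# Hypothetical helper (not part of the original backup script): print the
# RMAN-/ORA- error lines found in a log read from stdin, dropping the
# "=====" banner lines. The exit status is 0 if at least one error line
# was printed and non-zero otherwise.
rman_log_errors() {
    grep -E '^(RMAN|ORA)-[0-9]{5}: ' |
        grep -v -E '^RMAN-(00571|00569): ='
}
```

A script could then run something like `rman_log_errors < step.log && exit 1` after each step, where step.log is a placeholder name. Lines such as "Linux-x86_64 Error: 28" are not matched by this pattern; they only appear alongside the ORA- lines that are.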

For now I can only suspect that RMAN was left in an abnormal state by that last connection attempt, and that this caused the Segmentation fault.

 

From "ITPUB Blog", link: http://blog.itpub.net/4227/viewspace-681468/. If you repost this article, please credit the source; otherwise legal action may be taken.
