ORACLE_07445[_kill()+48][SIGIOT]錯誤_附巢狀表研究

seagull76發表於2008-01-29
使用者環境: hp-ux B.11.23 + oracle9206 rac
1)錯誤的發現
今天在一個客戶這裡巡檢,發現日誌檔案有以下錯誤:

Thu Jan 24 02:12:36 2008

Errors in file /opt/app/oracle/admin/gzgov/udump/gzgov1_ora_8417.trc:

ORA-07445: exception encountered: core dump [_kill()+48] [SIGIOT] [unknown code] [0x0000020E1] [] []

相應的trc檔案中,有一些資訊:

*** 2008-01-24 02:12:36.394

*** SESSION ID:(20.39311) 2008-01-24 02:12:36.355

Exception signal: 6 (SIGIOT), code: -1 (unknown code), addr: 0x20e1, exception issued by pid: 8417, uid: 200, PC: [0xc00000000031a89

0, _kill()+48]

r1: 9fffffffbf470128 r20: 0 br5: 0

r2: 0 r21: 0 br6: 0

r3: 0 r22: 0 br7: 0

r4: c00000006bad1598 r23: 0 ip: c00000000031a890

r5: 9fffffffffff669c r24: 0 iipa: 0

r6: e r25: 0 cfm: 2

r7: 9fffffffbf166590 r26: 0 um: 1a

r8: 25 r27: 0 rsc: 1f

r9: 2 r28: 0 bsp: 9fffffffbf800d50

r10: 20 r29: 0 bspstore: 9fffffffbf800d50

r11: c000000000000185 r30: 0 rnat: 0

r12: 9fffffffffff21b0 r31: 0 ccv: 0

r13: 9fffffffbf7ed420 NaTs: ffffc00c unat: 0

r14: 0 PRs: 29 fpsr: 9804c8277433f

r15: 0 br0: c00000000023f1d0 pfs: c000000000000002

r16: 0 br1: 4000000001205520 lc: 0

r17: 0 br2: 0 ec: 0

r18: 0 br3: 0 isr: 9fffffffbf800d50

r19: 0 br4: 0 ifa: 0

Reason code: 0000

*** 2008-01-24 02:12:36.421

ksedmp: internal or fatal error

ORA-07445: exception encountered: core dump [_kill()+48] [SIGIOT] [unknown code] [0x0000020E1] [] []

Current SQL statement for this session:

SELECT /*+NESTED_TABLE_GET_REFS+*/ "GZGOVII"."GZGOV_NEWSCLICK".* FROM "GZGOVII"."GZGOV_NEWSCLICK"

………..

LIBRARY OBJECT LOCK: lock=c00000006d17d8a0 handle=c000000073a51cd0 mode=N

2)分析sql
trc檔案中的
SELECT /*+NESTED_TABLE_GET_REFS+*/ "GZGOVII"."GZGOV_NEWSCLICK".* FROM "GZGOVII"."GZGOV_NEWSCLICK"
語句,有NESTED_TABLE_GET_REFS提示,後來發現,這個提示是oracle處理巢狀表時需要的,有時候,在exp,imp.load資料時,oracle會自動加上這個hint.
This hint specifies that nested tables should be retrieved as refs.It is used for exporting, importing and loading. It should never be issued by user queries, and in fact is unlikely to make a difference if tried.

This is also consistent with seeing large amounts of queries with this hint, and that those queries are not reused. If this is a problem for you, try to export at another time of day. If the exports are really long-running or the system load is constant, try to find out what you are using exports for and suggest alternatives (eg. backup and recovery purposes, etc). If this is a DSS system, you probably have a smaller shared pool and use import/export heavily. If so, it might be a good idea to use a bigger shared pool when doing this activity.
3)分析使用者的邏輯備份
經過檢視使用者的邏輯備份日誌,最近的邏輯備份都是失敗的,日誌最後如下:

. . exporting table GZGOV_MESSAGE 11789 rows exported . . exporting table GZGOV_NEWSCLICK EXP-00056: ORACLE error 3113 encountered ORA-03113: end-of-file on communication channel EXP-00056: ORACLE error 24324 encountered ORA-24324: service handle not initialized EXP-00056: ORACLE error 24324 encountered ORA-24324: service handle not initialized EXP-00056: ORACLE error 24324 encountered ORA-24324: service handle not initialized EXP-00056: ORACLE error 24324 encountered ORA-24324: service handle not initialized EXP-00000: Export terminated unsuccessfully ORA-24324: service handle not initialized EXP-00000: Export terminated unsuccessfully $

4)第一步結論

正式由於在執行exp的過程中,當開始匯出GZGOV_NEWSCLICK到一定時間後,資料庫就報了上面的07445的錯誤。

經過手工執行使用者的邏輯備份命令:

exp gzgovii/db110g file=`date +%Y%m%d`.dmp buffer=204800 compress=y grants=y indexes=yes constraints=yes log=exp.log &

確實發現GZGOV_NEWSCLICK到一定時間後,exp就異常退出了,且資料庫日誌裡報了以上的07445的錯誤。

5)巢狀表?

根據官方文件的說明,這個oracle自動加上的hints跟巢狀表有關,於是研究了一下巢狀表,過程如下:

CREATE TABLE dept
(deptno NUMBER(2) PRIMARY KEY,
dname VARCHAR2(14),
loc VARCHAR2(13)
);
CREATE TABLE emp
(empno NUMBER(4) PRIMARY KEY,
ename VARCHAR2(10),
job VARCHAR2(9),
mgr NUMBER(4) REFERENCES emp,
hiredate DATE,
sal NUMBER(7, 2),
comm NUMBER(7, 2),
deptno NUMBER(2) REFERENCES dept
);

INSERT INTO dept SELECT * FROM scott.dept;
INSERT INTO emp SELECT * FROM scott.emp;
commit;
CREATE OR REPLACE TYPE emp_type
AS OBJECT
(empno NUMBER(4),
ename VARCHAR2(10),
job VARCHAR2(9),
mgr NUMBER(4),
hiredate DATE,
sal NUMBER(7, 2),
comm NUMBER(7, 2)
);
/
CREATE OR REPLACE TYPE emp_tab_type
AS TABLE OF emp_type;
使用巢狀表:
CREATE TABLE dept_and_emp
(deptno NUMBER(2) PRIMARY KEY,
dname VARCHAR2(14),
loc VARCHAR2(13),
emps emp_tab_type
)
NESTED TABLE emps STORE AS emps_nt;
可以在巢狀表上增加約束:
ALTER TABLE emps_nt ADD CONSTRAINT emps_empno_unique
UNIQUE(empno) ;
巢狀表不支援參照完整性約束,不能參考任何其他表甚至自己:
ALTER TABLE emps_nt ADD CONSTRAINT mgr_fk
FOREIGN KEY(mgr) REFERENCES emps_nt(empno);
會產生錯誤ORA-30730。
INSERT INTO dept_and_emp
SELECT dept.*,
CAST( MULTISET( SELECT empno, ename, job, mgr, hiredate, sal, comm
FROM emp
WHERE emp.deptno = dept.deptno ) AS emp_tab_type )
FROM dept;
commit;
MULTISET用來告訴Oracle子查詢返回不止一行,CAST用來告訴Oracle將返回設定為一個集合型別。
查詢時,巢狀表中的資料將在同一列中:
SELECT deptno, dname, loc, d.emps AS employees
FROM dept_and_emp d
WHERE deptno = 10;
Oracle同樣提供方法去掉集合的巢狀,像關係型表一樣處理(能夠將EMPS列當作一個表,並自然連線且不需要連線條件):
SELECT d.deptno, d.dname, emp.* FROM dept_and_emp d, TABLE(d.emps) emp;
按照“每行實際是一張表”的思想來更新:
UPDATE
TABLE( SELECT emps
FROM dept_and_emp
WHERE deptno = 10
)
SET comm = 100;
但如果返回SELECT emps FROM dept_and_emp WHERE deptno = 10少於一行,更新將失敗(普通情況下更新0行是許可的),並返回ORA-22908錯誤——如同更新語句沒有寫表名一樣;如果返回多於一行,更新也會失敗,返回ORA-01427錯誤。這說明Oracle在使用了巢狀表後認為每一行指向另一個表,而不是如同關係型模型那樣認為是另一個行集。
插入與刪除的語法:
INSERT INTO TABLE
(SELECT emps FROM dept_and_emps WHERE deptno=10)
VALUES
(1234,'NewEmp','Clerk',7782,SYSDATE,1200,NULL);
DELETE FROM TABLE
(SELECT emps FROM dept_and_emps WHERE deptno=20)
WHERE ename='SCOTT';
一般而言,必須總是連線,而不能單獨查詢巢狀表(如EMPS)中的資料,但是如果確實需要,是可以的。提示NESTED_TABLE_GET_REFS被用於EXP和IMP處理巢狀表。
SELECT /*+NESTED_TABLE_GET_REFS+*/
NESTED_TABLE_ID, SYS_NC_ROWINFO$
FROM "EMPS_NT";
而我們察看EMPS_NT的表結構是看不到NESTED_TABLE_ID,SYS_NC_ROWINFO$兩列的。對父表DEPT_AND_EMP來說NESTED_TABLE_ID是一個外來鍵。
使用這個提示就可以直接操作巢狀表了:
UPDATE /*+NESTED_TABLE_GET_REFS+*/ emps_nt
SET ename=INITCAP(ename);
6)進一步分析

查詢該表的定義:

-- Create table create table GZGOVII.GZGOV_NEWSCLICK ( CONTENT_ID NUMBER, CLICK_DATE DATE, CLICK_COUNT NUMBER ) tablespace FSDATA pctfree 10 pctused 40 initrans 1 maxtrans 255 storage ( initial 200K minextents 1 maxextents unlimited );

這個表根本不是巢狀表,oracle為什麼會加上這個hint?呢,不得而知。

於是開始懷疑exp的引數有問題,因為這個表的記錄非常多,但buffer=204800似乎太小了,

所以修改命令,進行單獨對這個表匯出的測試:

$ exp gzgovii/db110g file=exptest.dmp buffer=4096000 tables=GZGOV_NEWSCLICK log=exptest.log feedback=5000

匯出過程中,監控v$session_longops:

SID TARGET MESSAGE START_TIME LAST_UPDATE_TIME
--- ------------------------ -------------------------------------------------------------------- ----------- ----------------
138 GZGOVII.GZGOV_NEWSCLICK Table Scan: GZGOVII.GZGOV_NEWSCLICK: 76410 out of 76410 Blocks done 2008-1-23 0 2008-1-23 2:26:2
138 GZGOVII.GZGOV_NEWSCLICK Table Scan: GZGOVII.GZGOV_NEWSCLICK: 73843 out of 77529 Blocks done 2008-1-29 1 2008-1-29 15:29:

發現當前正在的匯出的操作記錄還在,還有23號的匯出成功的記錄還在,23號後面的記錄都沒有了,說明都不正常執行完畢就退出了。

繼續監控匯出過程,最後又報錯:

Assertion failed: bd->seqno_skgxprbd == bid->seqno_skgxpbid, file skgxpu.c, line 2859

EXP-00056: ORACLE error 3113 encountered
ORA-03113: end-of-file on communication channel
EXP-00056: ORACLE error 24324 encountered
ORA-24324: service handle not initialized
EXP-00056: ORACLE error 24324 encountered
ORA-24324: service handle not initialized
EXP-00056: ORACLE error 24324 encountered
ORA-24324: service handle not initialized
EXP-00056: ORACLE error 24324 encountered
ORA-24324: service handle not initialized
EXP-00000: Export terminated unsuccessfully
$

此時v$session_ops顯示,其實已經快匯出完了:75943 out of 77529 Blocks done

同事分析,該表可能有不合法記錄,或者資料檔案有壞塊,於是檢查資料庫檔案:

$ dbv file=/dev/vgfsdata/rfsdata1 blocksize=8192

DBVERIFY: Release 9.2.0.6.0 - Production on Tue Jan 29 15:56:32 2008

Copyright (c) 1982, 2002, Oracle Corporation. All rights reserved.

DBVERIFY - Verification starting : FILE = /dev/vgfsdata/rfsdata1


DBVERIFY - Verification complete

Total Pages Examined : 254720
Total Pages Processed (Data) : 76530
Total Pages Failing (Data) : 0
Total Pages Processed (Index): 26362
Total Pages Failing (Index): 0
Total Pages Processed (Other): 177
Total Pages Processed (Seg) : 0
Total Pages Failing (Seg) : 0
Total Pages Empty : 151651
Total Pages Marked Corrupt : 0
Total Pages Influx : 0
Highest block SCN : 109856332 (0.109856332)

$ dbv file=/dev/vgfsdata/rfsdata2 blocksize=8192

DBVERIFY: Release 9.2.0.6.0 - Production on Tue Jan 29 16:00:43 2008

Copyright (c) 1982, 2002, Oracle Corporation. All rights reserved.

DBVERIFY - Verification starting : FILE = /dev/vgfsdata/rfsdata2


DBVERIFY - Verification complete

Total Pages Examined : 254720
Total Pages Processed (Data) : 78363
Total Pages Failing (Data) : 0
Total Pages Processed (Index): 28169
Total Pages Failing (Index): 0
Total Pages Processed (Other): 414
Total Pages Processed (Seg) : 0
Total Pages Failing (Seg) : 0
Total Pages Empty : 147774
Total Pages Marked Corrupt : 0
Total Pages Influx : 0
Highest block SCN : 109857157 (0.109857157)
$

可能是有不合法記錄,只能改天對錶做遊標一步一步的挪移了????

7)第二天,繼續檢查

發現該天的備份居然又成功了。

用bdf命令,發現exp的dump檔案所存目錄為nfs:

172.16.1.165:/data/gzgov_db_bak
85728728 66324200 15049712 82% /root/db_exp

登陸到172.16.1.165機器,發現該機器是linux,且用df -m命令程式會掛掉,並且檢視

chkconfig --list nfs,發現nfs服務顯示是關閉的(其實顯示不正常的)。

重啟了一下nfs服務

service nfs stop

service nfs start

最後建議使用者重啟該機器,使用者決定暫時不重啟,等適當時候重啟該nfs伺服器。

[@more@]

來自 “ ITPUB部落格 ” ,連結:http://blog.itpub.net/8861952/viewspace-1034694/,如需轉載,請註明出處,否則將追究法律責任。

相關文章