【故障診斷】cr塊slot notfound解決過程

yellowlee發表於2014-11-12
在一次嚴重故障中控制檔案重建,強行拉起庫,前面的問題解決了(HA引發的血案,涉及他人誤操作,具體略)
資料庫和應用起來後卻
遇到大量cr塊slot notfound的問題,時間正是業務高峰的白天,一陣頭皮發麻,具體報錯內容:
alert:
ksedmp: internal or fatal error
ORA-00600: internal error code, arguments: [kdsgrp1], [], [], [], [], [], [], []
相關trace
tpioadb1$ls -trl *.trc
-rw-r-----    1 oracle   oinstall  181401269 Oct 14 12:27 oa_ora_17825870.trc
-rw-r-----    1 oracle   oinstall  181462560 Oct 14 12:28 oa_ora_49872920.trc
-rw-r-----    1 oracle   oinstall  181776046 Oct 14 12:29 oa_ora_52166770.trc
-rw-r-----    1 oracle   oinstall  182010875 Oct 14 12:29 oa_ora_47317200.trc
-rw-r-----    1 oracle   oinstall  182080391 Oct 14 12:30 oa_ora_198714.trc
-rw-r-----    1 oracle   oinstall  182205907 Oct 14 12:30 oa_ora_47448310.trc
-rw-r-----    1 oracle   oinstall  182159439 Oct 14 12:31 oa_ora_48234660.trc
-rw-r-----    1 oracle   oinstall  182186772 Oct 14 12:31 oa_ora_41617502.trc
-rw-r-----    1 oracle   oinstall  182186742 Oct 14 12:31 oa_ora_48824328.trc
-rw-r-----    1 oracle   oinstall  182186771 Oct 14 12:31 oa_ora_52166772.trc
-rw-r-----    1 oracle   oinstall  182258339 Oct 14 12:32 oa_ora_39913614.trc
-rw-r-----    1 oracle   oinstall  182258369 Oct 14 12:32 oa_ora_30148840.trc
-rw-r-----    1 oracle   oinstall  182375997 Oct 14 12:32 oa_ora_17825876.trc
-rw-r-----    1 oracle   oinstall  182408523 Oct 14 12:32 oa_ora_66256924.trc
-rw-r-----    1 oracle   oinstall  182408580 Oct 14 12:32 oa_ora_20121758.trc
-rw-r-----    1 oracle   oinstall  182490385 Oct 14 12:32 oa_ora_6424806.trc
-rw-r-----    1 oracle   oinstall  182490387 Oct 14 12:33 oa_ora_15272010.trc
-rw-r-----    1 oracle   oinstall  182490356 Oct 14 12:33 oa_ora_4851758.trc
-rw-r-----    1 oracle   oinstall  182516313 Oct 14 12:33 oa_ora_16713804.trc
-rw-r-----    1 oracle   oinstall  182594432 Oct 14 12:33 oa_ora_28510210.trc
-rw-r-----    1 oracle   oinstall  182616536 Oct 14 12:33 oa_ora_30015720.trc
-rw-r-----    1 oracle   oinstall  182700158 Oct 14 12:33 oa_ora_25102374.trc
-rw-r-----    1 oracle   oinstall  182734746 Oct 14 12:34 oa_ora_28641506.trc
-rw-r-----    1 oracle   oinstall  182734715 Oct 14 12:34 oa_ora_3016860.trc
-rw-r-----    1 oracle   oinstall  182841440 Oct 14 12:34 oa_ora_52429006.trc
-rw-r-----    1 oracle   oinstall  182865907 Oct 14 12:34 oa_ora_47185978.trc
-rw-r-----    1 oracle   oinstall  182865909 Oct 14 12:34 oa_ora_57212934.trc
-rw-r-----    1 oracle   oinstall  182865907 Oct 14 12:34 oa_ora_23267338.trc
-rw-r-----    1 oracle   oinstall  182865938 Oct 14 12:34 oa_ora_14354628.trc
-rw-r-----    1 oracle   oinstall   70326197 Oct 14 12:34 oa_ora_7276572.trc
-rw-r-----    1 oracle   oinstall  152186229 Oct 14 12:34 oa_ora_52953126.trc
-rw-r-----    1 oracle   oinstall   96766948 Oct 14 12:34 oa_ora_47448312.trc
-rw-r-----    1 oracle   oinstall   25950203 Oct 14 12:34 oa_ora_44040304.trc
-rw-r-----    1 oracle   oinstall     124525 Oct 14 12:34 oa_ora_30015724.trc
-rw-r-----    1 oracle   oinstall   11891981 Oct 14 12:34 oa_ora_23332896.trc
-rw-r-----    1 oracle   oinstall   67996842 Oct 14 12:34 oa_ora_20580500.trc
-rw-r-----    1 oracle   oinstall   50813235 Oct 14 12:34 oa_ora_17301742.trc
-rw-r-----    1 oracle   oinstall  171017368 Oct 14 12:34 oa_ora_14811386.trc


tpioadb1$more oa_ora_52429006.trc
Trace file /home/oracle/app/diag/rdbms/oa/oa/trace/oa_ora_52429006.trc
Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Partitioning, Oracle Label Security, OLAP, Data Mining,
Oracle Database Vault and Real Application Testing options
ORACLE_HOME = /home/oracle/app/product/11.2.0/db_1
System name:    AIX
Node name:      tpioadb1
Release:        1
Version:        6
Machine:        00CB77B54C00
Instance name: oa
Redo thread mounted by this instance: 1
Oracle process number: 169
Unix process pid: 52429006, image: oracle@tpioadb1




*** 2014-10-14 12:33:21.941
*** SESSION ID:(1289.36435) 2014-10-14 12:33:21.941
*** CLIENT ID:() 2014-10-14 12:33:21.941
*** SERVICE NAME:(SYS$USERS) 2014-10-14 12:33:21.941
*** MODULE NAME:(JDBC Thin Client) 2014-10-14 12:33:21.941
*** ACTION NAME:() 2014-10-14 12:33:21.941
 
* kdsgrp1-1: *************************************************
            row 0x024e73b1.97 continuation at
            0x024e73b1.97 file# 9 block# 947121 slot 151 not found
KDSTABN_GET: 0 ..... ntab: 1
curSlot: 151 ..... nrows: 216
kdsgrp - dump CR block dba=0x024e73b1
Block header dump:  0x024e73b1
 Object id on Block? Y
 seg/obj: 0x1ce1e  csc: 0x638.28d23ba0  itc: 2  flg: E  typ: 1 - DATA
     brn: 0  bdba: 0x24e7281 ver: 0x01 opc: 0
     inc: 0  exflg: 0
 
 Itl           Xid                  Uba         Flag  Lck        Scn/Fsc
0x01   0x003b.012.0000246a  0x00c048a5.011d.32  --U-    1  fsc 0x0016.28d23ba1
0x02   0x0002.006.00021b71  0x00c03c7f.1478.18  C---    0  scn 0x0638.28d21c9c
bdba: 0x024e73b1
data_block_dump,data header at 0x70001028890e064
===============

通過語句找到該物件和塊:
Select owner, segment_name, segment_type, partition_name,tablespace_name 
From dba_extents 
Where relative_fno = 9
And  947121 between block_id and (block_id+blocks-1);

dump該塊,發現該塊裡面只有1個table的216行,由於系統已經帶病執行,且無可用備份,無停機時間,意味著最多可能丟失這個表的216行資料。
dump內容如下
tpioadb1$more oa_ora_38142130.trc
Trace file /home/oracle/app/diag/rdbms/oa/oa/trace/oa_ora_38142130.trc
Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Partitioning, Oracle Label Security, OLAP, Data Mining,
Oracle Database Vault and Real Application Testing options
ORACLE_HOME = /home/oracle/app/product/11.2.0/db_1
System name:    AIX
Node name:      tpioadb1
Release:        1
Version:        6
Machine:        00CB77B54C00
Instance name: oa
Redo thread mounted by this instance: 1
Oracle process number: 177
Unix process pid: 38142130, image: oracle@tpioadb1




*** 2014-10-14 14:39:49.105
*** SESSION ID:(167.64629) 2014-10-14 14:39:49.105
*** CLIENT ID:() 2014-10-14 14:39:49.105
*** SERVICE NAME:(SYS$USERS) 2014-10-14 14:39:49.105
*** MODULE NAME:(JDBC Thin Client) 2014-10-14 14:39:49.105
*** ACTION NAME:() 2014-10-14 14:39:49.105
 
* kdsgrp1-1: *************************************************
            row 0x024e73b1.97 continuation at
            0x024e73b1.97 file# 9 block# 947121 slot 151 not found
KDSTABN_GET: 0 ..... ntab: 1
curSlot: 151 ..... nrows: 216
kdsgrp - dump CR block dba=0x024e73b1
Block header dump:  0x024e73b1
 Object id on Block? Y
 seg/obj: 0x1ce1e  csc: 0x638.29bb4112  itc: 2  flg: E  typ: 1 - DATA
     brn: 0  bdba: 0x24e7281 ver: 0x01 opc: 0
     inc: 0  exflg: 0
 
 Itl           Xid                  Uba         Flag  Lck        Scn/Fsc
0x01   0x002d.01a.0000a6d7  0x04c08774.04a9.29  --U-    1  fsc 0x0017.29bb4115
0x02   0x002f.007.0000945e  0x00c02f56.0412.0d  C---    0  scn 0x0638.29bb08d6
bdba: 0x024e73b1
data_block_dump,data header at 0x7000105876a6064
===============
tsiz: 0x1f98
hsiz: 0x1c2
pbl: 0x7000105876a6064
     76543210
flag=--------
ntab=1
nrow=216
frre=29
fsbo=0x1c2
fseo=0x7d4
avsp=0xbe9
tosp=0xc02
0xe:pti[0]      nrow=216        offs=0



後通過業務分析,找到白天有dml的行,通過create as select查詢者部分行,發現會話hang,報錯依舊,採用逐行insert的方式,最終定位出有問題的只有一行,通過dump的行資料,將關鍵欄位轉化出來給業務確認,確認為該行為無效行,剔除該行後,安排在下班後維護視窗內重建該表,問題解決,業務未受到影響。


來自 “ ITPUB部落格 ” ,連結:http://blog.itpub.net/16179598/viewspace-1329351/,如需轉載,請註明出處,否則將追究法律責任。

相關文章