Analyzing two kinds of ORA errors in the alert log

Posted by jeanron100 on 2015-08-07
During a routine system inspection today, I found two types of ORA errors in the alert log.
Errors in file /U01/app/oracle/diag/rdbms/XX/XX/trace/xxdb_j002_20401.trc:
ORA-12012: error on auto execute of job "XXDATA"."S_XXXX_HIST_OPS_SERINFO_K"
ORA-12170: TNS:Connect timeout occurred
ORA-06512: at "XXDATA.L_XXXX_HIST_OPS_SERINFO_K", line 6
...
DBMS_STATS: GATHER_STATS_JOB encountered errors.  Check the trace file.
Errors in file /U01/app/oracle/diag/rdbms/XX/XX/trace/xxdb_j003_21375.trc:
ORA-20011: Approximate NDV failed: ORA-06564: object IMPDP20130506 does not exist
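The two errors were logged only a short time apart, so my first impression was that they might be related. Let's take them apart one at a time.
Start with the trace file for the first error. The message says a job failed during execution and even points at the code involved (line 6 of L_XXXX_HIST_OPS_SERINFO_K), and the failure is a connect timeout. That code turns out to use a db link whose connection details have since changed, so the database behind the db link is no longer reachable. The connection attempt times out and the job raises the error when it runs.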
*** 2015-08-04 06:01:00.448
*** SESSION ID:(389.3953) 2015-08-04 06:01:00.448
*** CLIENT ID:() 2015-08-04 06:01:00.448
*** SERVICE NAME:(SYS$USERS) 2015-08-04 06:01:00.448
*** MODULE NAME:(DBMS_SCHEDULER) 2015-08-04 06:01:00.448
*** ACTION NAME:(S_XXXX_HIST_OPS_SERINFO_K) 2015-08-04 06:01:00.448
ORA-12012: error on auto execute of job "XXDATA"."S_XXXX_HIST_OPS_SERINFO_K"
ORA-12170: TNS:Connect timeout occurred
ORA-06512: at "XXDATA.L_XXXX_HIST_OPS_SERINFO_K", line 6
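To confirm that the db link is the cause, one could read the code around the line named in the ORA-06512 message and then probe the link by hand. The following is a minimal sketch rather than the exact steps taken in this case: it assumes access to the DBA_ views, uses the masked owner and object names from the trace, and REMOTE_LINK is a hypothetical placeholder for whatever link name appears in the code.

```sql
-- Show the first lines of the code unit named in ORA-06512 (the error points at line 6).
SELECT type, line, text
FROM   dba_source
WHERE  owner = 'XXDATA'
AND    name  = 'L_XXXX_HIST_OPS_SERINFO_K'
AND    line BETWEEN 1 AND 10
ORDER  BY type, line;

-- List the database links that schema can use: its own private links plus PUBLIC ones.
SELECT owner, db_link, username, host, created
FROM   dba_db_links
WHERE  owner IN ('XXDATA', 'PUBLIC');

-- Probe the link. REMOTE_LINK is a hypothetical name; substitute the real one.
-- Run this as the link's owner, because a private link is not usable from another schema.
-- If the target is still unreachable, this fails with the same ORA-12170.
SELECT 1 FROM dual@REMOTE_LINK;
```

The HOST column shows the connect descriptor or TNS alias the link resolves through, which is usually where a changed address becomes visible.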
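With the cause understood, the fix is fairly straightforward. There are two options: repair the db link's connection, or disable or drop the job. After confirming, we chose the second.
I first looked for the job in dba_jobs and, surprisingly, found nothing. The place to look is actually the scheduler views: 10g introduced DBMS_SCHEDULER, and jobs created through it do not appear in dba_jobs, which only lists jobs created with DBMS_JOB.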
select job_name, status, owner from DBA_SCHEDULER_JOB_LOG where owner='xxxxx';
That query finds the job. Next, connected as SYS, call dbms_scheduler directly to disable it.
SQL>   exec dbms_scheduler.DISABLE('S_XXXX_HIST_OPS_SERINFO_INUSE',force=>true);
BEGIN dbms_scheduler.DISABLE('S_XXXX_HIST_OPS_SERINFO_INUSE',force=>true); END;
*
ERROR at line 1:
ORA-27476: "SYS.S_XXXX_HIST_OPS_SERINFO_INUSE" does not exist
ORA-06512: at "SYS.DBMS_ISCHED", line 4407
ORA-06512: at "SYS.DBMS_SCHEDULER", line 2737
ORA-06512: at line 1
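The error looks odd at first, but reading the message carefully explains it: by default the job is looked up in the current schema, which here is SYS (hence "SYS.S_XXXX_HIST_OPS_SERINFO_INUSE"). Qualifying the job name with its schema fixes it.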
SQL>   exec dbms_scheduler.DISABLE('XXDATA.S_XXXX_HIST_OPS_SERINFO_INUSE',force=>true);
PL/SQL procedure successfully completed.
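As a follow-up check, job definitions (as opposed to the run log queried earlier) live in DBA_SCHEDULER_JOBS, and its ENABLED column shows whether the disable call took effect. A short sketch, assuming the masked owner XXDATA from the log above; the dictionary stores unquoted names in upper case, so the owner literal must be upper case too.

```sql
-- Job definitions for the schema; ENABLED reads FALSE for a job that has been disabled.
SELECT owner, job_name, enabled, state, failure_count, last_start_date
FROM   dba_scheduler_jobs
WHERE  owner = 'XXDATA'
ORDER  BY job_name;

-- Recent failed runs with their error numbers, newest first.
SELECT job_name, status, error#, log_date
FROM   dba_scheduler_job_run_details
WHERE  owner  = 'XXDATA'
AND    status = 'FAILED'
ORDER  BY log_date DESC;
```

Disabling rather than dropping keeps the job definition around, so it can be re-enabled with dbms_scheduler.enable once the db link is repaired.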
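That wraps up the first problem. Now for the second: is it related to the first?
The trace for the second error also contains little, but it does show that the error was raised while gathering statistics. On that basis it should have no direct connection to the first problem. The message says an object cannot be found, and the object's name (IMPDP20130506) suggests it has something to do with Data Pump.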
DBMS_STATS: GATHER_STATS_JOB encountered errors.  Check the trace file.
Errors in file /U01/app/oracle/diag/rdbms/xxxx/xxxx/trace/bidb_j003_21375.trc:
ORA-20011: Approximate NDV failed: ORA-06564: object IMPDP20130506 does not exist
The problem with this object is easy to reproduce.

SQL> select count(*) from "ET$00E73C1D0001";
select count(*) from "ET$00E73C1D0001"
                     *
ERROR at line 1:
ORA-06564: object IMPDP20130506 does not exist
Since the object supposedly does not exist, try desc to see whether that works at all. It does.
So there is clearly something unusual about this object.
SQL> desc "ET$00E73C1D0001";
 Name                                                                                                              Null?    Type
 ----------------------------------------------------------------------------------------------------------------- -------- ----------------------------------------------------------------------------
 ID                                                                                                                         NUMBER(15)
 SN                                                                                                                         VARCHAR2(24)
 GROUP_ID                                                                                                                   NUMBER(6)
 SERVER_IP                                                                                                                  VARCHAR2(15)
 SERVER_NAME                                                                                                                VARCHAR2(40)
 WORD                                                                                                                       NUMBER(4)
 SERVER                                                                                                                     NUMBER(4)
 SCENE                                                                                                                      NUMBER(4)
 CN_GUID                                                                                                                    VARCHAR2(30)
 BUY_TIME                                                                                                                   DATE
 JEWEL_TOTAL                                                                                                                NUMBER(7)
 CN                                                                                                                         VARCHAR2(80)
 CHARACTER_PUT                                                                                                              VARCHAR2(50)
 IP                                                                                                                         VARCHAR2(15)
 WEAPONID                                                                                                                   NUMBER(15)
 PUT_DATE                                                                                                                   DATE
 WEAPONID_NEW                                                                                                               NUMBER(15)
 COUNT                                                                                                                      NUMBER
 USER_CLASS                                                                                                                 NUMBER
 CONSUME_WAY                                                                                                                VARCHAR2(40)
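From the output above it is natural to guess that this is some kind of temporary table used by Data Pump: an earlier expdp or impdp run probably hit a problem and left the table behind, and statistics gathering now fails on it.
That is still only a guess. To verify it, query the data dictionary view dba_external_tables.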
select * from dba_external_tables where table_name='ET$00E73C1D0001';

OWNER                          TABLE_NAME                     TYP TYPE_NAME                      DEF DEFAULT_DIRECTORY_NAME         REJECT_LIMIT                      ACCESS_
------------------------------ ------------------------------ --- ------------------------------ --- ------------------------------ ---------------------------------------- -------
ACCESS_PARAMETERS                                                                PROPERTY
-------------------------------------------------------------------------------- ----------
SYS                            ET$00E73C1D0001                SYS ORACLE_DATAPUMP                SYS IMPDP20130506                  UNLIMITED                         CLOB
DEBUG = (0 , 0) DATAPUMP INTERNAL TABLE "XXDATA"."CONSUME_LOG_XXXX_BEFORE201201   ALL
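This shows clearly that an earlier impdp run failed and that this table is a temporary table created during that import: its type is ORACLE_DATAPUMP, and its default directory is IMPDP20130506, the very object named in the ORA-06564 message.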
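The dictionary can also help explain why desc succeeds while select fails: desc only reads the table's metadata, whereas a select on an external table has to go through its directory object. A hedged sketch of that check, using only the names that appear in the output above:

```sql
-- Data Pump external tables still registered in the dictionary.
SELECT owner, table_name, type_name, default_directory_name
FROM   dba_external_tables
WHERE  type_name = 'ORACLE_DATAPUMP'
ORDER  BY owner, table_name;

-- Does the directory object the table points at still exist?
-- No rows here would be consistent with the ORA-06564 raised by the SELECT.
SELECT owner, directory_name, directory_path
FROM   dba_directories
WHERE  directory_name = 'IMPDP20130506';
```

The first query is deliberately not restricted to one table name, so any other leftovers of the same kind show up at the same time.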
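Another line of inquiry: operations such as expdp/impdp also leave some identifying information in the database logs. I tried to check, but this problem dates back several years and the alert log from that period has long since been cleared, so there is no way to confirm what actually happened at the time.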
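Even without the old alert log, the dictionary may still hold some evidence. The sketch below is one possible check, not something shown in the original investigation: DBA_OBJECTS records when the leftover table was created, and DBA_DATAPUMP_JOBS lists Data Pump jobs the database still knows about, where a job in state NOT RUNNING with no attached sessions is usually an orphan.

```sql
-- When was the leftover external table created? This dates the failed import.
SELECT owner, object_name, object_type, created, last_ddl_time
FROM   dba_objects
WHERE  object_name = 'ET$00E73C1D0001';

-- Data Pump jobs still registered in the database.
SELECT owner_name, job_name, operation, job_mode, state, attached_sessions
FROM   dba_datapump_jobs
ORDER  BY owner_name, job_name;

-- Master tables belonging to jobs that are no longer running.
SELECT o.owner, o.object_name, o.object_type, o.status, o.created
FROM   dba_objects o
JOIN   dba_datapump_jobs j
       ON  o.owner       = j.owner_name
       AND o.object_name = j.job_name
WHERE  j.state = 'NOT RUNNING';
```

If the last two queries return no rows, the job's master table was already cleaned up and only the external table remains.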
The fix itself is simple: just drop the external table.

SQL> drop table "ET$00E73C1D0001";
Table dropped.
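To confirm the cleanup, one could check that the table is gone and then watch the next automatic statistics run. This is a sketch under one stated assumption: the database is 11g or later, where automatic statistics gathering runs as an autotask client and its history is exposed in DBA_AUTOTASK_JOB_HISTORY.

```sql
-- A count of 0 means no external table of that name remains in any schema.
SELECT COUNT(*) AS leftover
FROM   dba_external_tables
WHERE  table_name = 'ET$00E73C1D0001';

-- Recent automatic statistics runs, newest first (assumes 11g or later).
SELECT job_name, job_status, job_start_time, job_error
FROM   dba_autotask_job_history
WHERE  client_name = 'auto optimizer stats collection'
ORDER  BY job_start_time DESC;
```

The ORA-20011 message should also stop appearing in the alert log after the next maintenance window.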
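This case shows that ORA errors like these are best worked through step by step from the logs, one at a time. Guess boldly, but verify carefully. Once the problem is properly understood, the fix is usually easy.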

From the ITPUB blog, link: http://blog.itpub.net/23718752/viewspace-1765141/. If you repost this article, please credit the source; otherwise legal liability will be pursued.
