0624使用10035事件跟蹤無法執行的sql語句

lfree發表於2016-06-24

[20160624]使用10035事件跟蹤無法執行的sql語句.txt

--昨天看一份awr報表,連結如下:
http://www.itpub.net/thread-2061952-1-1.html

--摘要如下:

Top 10 Foreground Events by Total Wait Time

Event                              Waits     Total Wait Time (sec)    Wait Avg(ms)  % DB time    Wait Class
library cache lock               840,813                     19.9K             24        68.6    Concurrency
library cache: mutex X         1,098,698                    5199.6              5        17.9    Concurrency
DB CPU                                                      3138.3           10.8    
cursor: pin S wait on X           32,646                     302.1              9         1.0    Concurrency
direct path write temp           127,841                     270.1              2          .9    User I/O
SQL*Net break/reset to client    827,306                     208.6              0          .7    Application
...

Time Model Statistics

Total time in database user-calls (DB Time): 29052.3s
Statistics including the word "background" measure background process time, and so do not contribute to the DB time statistic
Ordered by % or DB time desc, Statistic name

Statistic Name               Time (s)    % of DB Time
parse time elapsed           25,768.03    88.70
failed parse elapsed time    25,750.45    88.63
DB CPU                        3,138.32    10.80
sql execute elapsed time      2,729.15    9.39
hard parse elapsed time         167.67    0.58
PL/SQL execution elapsed time    16.13    0.06

--如果結合兩個看可以發現failed parse elapsed time時間異常高,說明有sql語句無法執行或者執行報錯.而且這個有問題的語句sql語句執行很頻繁,導致還出現了
--library cache lock ,library cache: mutex X.

--如何定位有問題的sql語句是解決這個問題的關鍵,實際上查詢v$active_session_history的event='SQL*Net break/reset to client'能夠知道sql_id,但是查詢
--v$sql以及v$sqlarea是無法找到有問題的sql語句的.我以前提高使用x$kglob的kglobt03 = '&sql_id' 可以定位問題,作者也提出透過10035事件可以定位sql語句.
--自己也做一些測試:

1.環境:
SCOTT@book> @ &r/ver1

PORT_STRING                    VERSION        BANNER
------------------------------ -------------- --------------------------------------------------------------------------------
x86_64/Linux 2.4.xx            11.2.0.4.0     Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production

2.測試1:
SCOTT@book> select sysdate from xxxxxxxxxxxxxx;
select sysdate from xxxxxxxxxxxxxx
                    *
ERROR at line 1:
ORA-00942: table or view does not exist

--我的系統不存在這個表,sql語句分析沒有錯誤.
--如果呼叫很頻繁,event='SQL*Net break/reset to client'會有記錄.我以前遇到一例就是共享池設太小,在1個小時的週期裡面已經從共享池清除.連結:
http://blog.itpub.net/267265/viewspace-1749265/
http://blog.itpub.net/267265/viewspace-1760052/
http://blog.itpub.net/267265/viewspace-1760158/

--這3個連結記錄3種可能性:1.共享池設定太小 2. lob欄位 3.語句執行有問題.語義正確,但是分析錯誤無法執行.
--也就是知道sql_id是無法找到sql語句,但是查詢基表x$kglob都可以查詢到對應的sql語句.

--獲取sql_id,我這裡執行次數太小,在檢視v$active_session_history的event='SQL*Net break/reset to client'無法找到.
SYS@book> select sys.dbms_sqltune_util0.sqltext_to_sqlid('select sysdate from xxxxxxxxxxxxxx'||chr(0)) c20 from dual;
C20
--------------------
bb98w6dj7wz8p

SYS@book> @ &r/sharepool/shp4 bb98w6dj7wz8p 0
old  17:  WHERE kglobt03 = '&1'  or kglhdpar='&1' or kglhdadr='&1' or KGLNAHSH= &2
new  17:  WHERE kglobt03 = 'bb98w6dj7wz8p'  or kglhdpar='bb98w6dj7wz8p' or kglhdadr='bb98w6dj7wz8p' or KGLNAHSH= 0
TEXT           KGLHDADR         KGLHDPAR         C40                                      KGLOBHD0         KGLOBHD6           KGLOBHS0   KGLOBHS6   KGLOBT16   N0_6_16        N20   KGLNAHSH KGLOBT03        KGLOBT09
-------------- ---------------- ---------------- ---------------------------------------- ---------------- ---------------- ---------- ---------- ---------- --------- ---------- ---------- ------------- ----------
子游標控制程式碼地址 000000007CA54F80 000000007C96DE08 select sysdate from xxxxxxxxxxxxxx       00               00                        0          0       3067      3067       3067 1652456725 bb98w6dj7wz8p          0
父遊標控制程式碼地址 000000007C96DE08 000000007C96DE08 select sysdate from xxxxxxxxxxxxxx       000000007B9B1A88 00                     4752          0          0      4752       4752 1652456725 bb98w6dj7wz8p      65535

--作者提到kglnaobj欄位僅僅看到一部分,實際上1000個位元組,實際上查詢KGLFNOBJ 欄位是可以看到全部的.至少我的測試可以.

SYS@book> select kglnaobj c60 from x$kglob where kglobt03='bb98w6dj7wz8p' and rownum=1;
C60
------------------------------------------------------------
select sysdate from xxxxxxxxxxxxxx

3.測試2:使用10035事件:
$ oerr ora 10035
10035, 00000, "Write parse failures to alert log file"
// *Cause:

SCOTT@book> alter session set events '10035 trace name context forever, level 12' ;
Session altered.

SCOTT@book> select sysdate from xxxxxxxxxxxxxx;
select sysdate from xxxxxxxxxxxxxx
                    *
ERROR at line 1:
ORA-00942: table or view does not exist

--檢查alert檔案發現:
Fri Jun 24 10:14:19 2016
PARSE ERROR: ospid=51356, error=942 for statement:
select sysdate from xxxxxxxxxxxxxx^@
Additional information: hd=0x7ca54f80 phd=0x7c96de08 flg=0x20 cisid=83 sid=83 ciuid=83 uid=83

--如果你仔細看裡面的資訊,可以發現hd=0x7ca54f80 對應的就是子游標控制程式碼地址 000000007CA54F80
--而phd=0x7c96de08 對應的就是 父遊標控制程式碼地址 000000007C96DE08.
--注意一個細節: ^@ 表示 chr(0).

--後面的cisid=83 sid=83 ciuid=83 uid=83,對應執行使用者的id.
SYS@book> select user# from sys.user$ where name='SCOTT';
     USER#
----------
        83

4.測試3:使用oradebug看看.
--一般應用的執行是另外的程式.使用oradebug看看.

--session 1:
SCOTT@book> @ &r/spid

       SID    SERIAL# SPID       PID  P_SERIAL# C50
---------- ---------- ------ ------- ---------- --------------------------------------------------
        46       2503 51552       28        118 alter system kill session '46,2503' immediate;

--登入sys使用者執行如下:
--session 2:
SYS@book> oradebug setospid 51552
Oracle pid: 28, Unix process pid: 51552, image: oracle@gxqyydg4 (TNS V1-V3)
SYS@book> oradebug event 10035 trace name context forever,level 12
Statement processed.

----session 1,注意修改分號前的字元y,與前面執行的不同:
SCOTT@book> select sysdate from xxxxxxxxxxxxxy;
select sysdate from xxxxxxxxxxxxxy
                    *
ERROR at line 1:
ORA-00942: table or view does not exist

--//再次檢查alert.log檔案,可以發現:
Fri Jun 24 10:22:00 2016
PARSE ERROR: ospid=51552, error=942 for statement:
select sysdate from xxxxxxxxxxxxxy^@
Additional information: hd=0x7cbefb30 phd=0x7b65aa28 flg=0x20 cisid=83 sid=83 ciuid=83 uid=83

SYS@book> select sys.dbms_sqltune_util0.sqltext_to_sqlid('select sysdate from xxxxxxxxxxxxxy'||chr(0)) c20 from dual;
C20
--------------------
3vqsf0t328a8x

SYS@book> @ &r/sharepool/shp4 3vqsf0t328a8x 0
old  17:  WHERE kglobt03 = '&1'  or kglhdpar='&1' or kglhdadr='&1' or KGLNAHSH= &2
new  17:  WHERE kglobt03 = '3vqsf0t328a8x'  or kglhdpar='3vqsf0t328a8x' or kglhdadr='3vqsf0t328a8x' or KGLNAHSH= 0
TEXT           KGLHDADR         KGLHDPAR         C40                                      KGLOBHD0         KGLOBHD6           KGLOBHS0   KGLOBHS6   KGLOBT16   N0_6_16        N20   KGLNAHSH KGLOBT03        KGLOBT09
-------------- ---------------- ---------------- ---------------------------------------- ---------------- ---------------- ---------- ---------- ---------- --------- ---------- ---------- ------------- ----------
子游標控制程式碼地址 000000007CBEFB30 000000007B65AA28 select sysdate from xxxxxxxxxxxxxy       00               00                        0          0       3067      3067       3067 1176774941 3vqsf0t328a8x          0
父遊標控制程式碼地址 000000007B65AA28 000000007B65AA28 select sysdate from xxxxxxxxxxxxxy       000000007C53F148 00                     4720          0          0      4720       4720 1176774941 3vqsf0t328a8x      65535

--跟蹤獲取的資訊一樣.這樣就能獲得一個定位問題的方法,透過10035事件查詢失敗的sql語句.
--另外如果程式大量使用非繫結變數,這些無法執行sql語句一樣耗費大量的共享池記憶體.

--附上shp4.sql的執行指令碼:

$ cat shp4.sql
column N0_6_16 format 99999999
SELECT DECODE (kglhdadr,
               kglhdpar, '父遊標控制程式碼地址',
               '子游標控制程式碼地址')
          text,
       kglhdadr,
       kglhdpar,
       substr(kglnaobj,1,40) c40,
       kglobhd0,
       kglobhd6,
       kglobhs0,kglobhs6,kglobt16,
       kglobhs0+kglobhs6+kglobt16 N0_6_16,
           kglobhs0+kglobhs1+kglobhs2+kglobhs3+kglobhs4+kglobhs5+kglobhs6+kglobt16 N20,
           kglnahsh,
           kglobt03 ,
           kglobt09
  FROM x$kglob
WHERE kglobt03 = '&1'  or kglhdpar='&1' or kglhdadr='&1' or KGLNAHSH= &2;

5.另外還有一個方法,透過觸發器記錄sql語句的執行錯誤:

--建立一張表SYS.ERROR_LOG.
CREATE TABLE SYS.ERROR_LOG
(
  IP_ADDRESS       VARCHAR2(30 BYTE),
  USERNAME         VARCHAR2(30 BYTE),
  INSTANCE_NUMBER  NUMBER(2),
  THEDATE          DATE,
  ERRORMSG         VARCHAR2(4000 BYTE),
  OTHERMSG         VARCHAR2(4000 BYTE),
  SQLSTMT          VARCHAR2(4000 BYTE)
)
TABLESPACE TOOLS;


CREATE OR REPLACE TRIGGER SYS.log_errors
   AFTER SERVERERROR ON DATABASE
DECLARE
   sqltext       ora_name_list_t;
   errmsg        VARCHAR2 (4000) := NULL;
   errstmt       VARCHAR2 (4000) := NULL;
   errothermsg   VARCHAR2 (4000) := NULL;
BEGIN
   IF ( -- NOT is_servererror (904)
-- AND NOT is_servererror (942)
--  AND NOT is_servererror (937)
--  AND NOT is_servererror (1002)
--  AND NOT is_servererror (1003)
--  AND
       NOT is_servererror (942) AND NOT is_servererror (1407)
--       AND NOT is_servererror (1430)
--       AND NOT is_servererror (54)
--       AND NOT is_servererror (6550)
-- AND NOT is_servererror (936)
--AND NOT (is_servererror (1) AND ora_dict_obj_owner = 'EQUIPMENT')
      )
   THEN
      FOR DEPTH IN 1 .. ora_server_error_depth
      LOOP
         errmsg := errmsg || ora_server_error_msg (DEPTH);
      END LOOP;

      FOR cntr IN 1 .. ora_sql_txt (sqltext)
      LOOP
         errstmt := errstmt || sqltext (cntr);
      END LOOP;

      SELECT 'sid:' || SID || ' SERIAL#: ' || serial# || ' terminal:' || terminal || ' program:' || program || ' module:'
             || module
        INTO errothermsg
        FROM v$session
       WHERE SID = (SELECT SID
                      FROM v$mystat
                     WHERE ROWNUM = 1);

      INSERT INTO error_log
                  (ip_address, username, instance_number, thedate, errormsg, othermsg,
                   sqlstmt
                  )
           VALUES (SYS_CONTEXT ('USERENV', 'IP_ADDRESS'), ora_login_user, ora_instance_num, SYSDATE, errmsg, errothermsg,
                   errstmt
                  );
   END IF;
END;
/
--ALTER TRIGGER SYS.LOG_ERRORS DISABLE;

--注我的指令碼錯誤條件排除了ora-942,ora-1407,你可以根據自己需要修改指令碼.

來自 “ ITPUB部落格 ” ,連結:http://blog.itpub.net/267265/viewspace-2120884/,如需轉載,請註明出處,否則將追究法律責任。

相關文章