[20221128]再談防水牆(檢視訪問效能問題).txt

lfree發表於2022-12-02

[20221128]再談防水牆(檢視訪問效能問題).txt

1.環境:
SYS> @ ver1
PORT_STRING                    VERSION        BANNER
------------------------------ -------------- --------------------------------------------------------------------------------
x86_64/Linux 2.4.xx            11.2.0.4.0     Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production

2.問題分析:
--//上個星期看了生產系統awr報表,發現一些問題.原來防水牆不斷呼叫執行如下:
begin :con := "TASSETACL"."QUERYACL"(:sn, :on); end;
--//該語句已經消失,我問了同事,據說升級過該產品,當時awr的記錄如下:
SQL ordered by Executions

    %CPU - CPU Time as a percentage of Elapsed Time
    %IO - User I/O Time as a percentage of Elapsed Time
    Total Executions: 23,199,582
    Captured SQL account for 34.2% of Total

Executions      Rows Processed  Rows per Exec   Elapsed Time (s)%CPU    %IO   SQL Id          SQL Module      SQL Text
6,522,522       6,521,177       1.00            1,660.15        99.4    0     190q1sz3ywrd7   xxxxxx.eee      begin :con := "TASSETACL".QUE...
2,228,287       2,228,115       1.00            107.98          99.5    0     g7ytdh9mxt1s0   xxxxxx.eee      select count ( :"SYS_B_0" ) fr...

--//而現在變成了如下:
SQL ordered by Elapsed Time

    Resources reported for PL/SQL code includes the resources used by all SQL statements called by the code.
    為PL/SQL程式碼報告的資源包括該程式碼呼叫的所有SQL語句所使用的資源。
    % Total DB Time is the Elapsed Time of the SQL statement divided into the Total Database Time multiplied by 100
    %Total - Elapsed Time as a percentage of Total DB time
    %CPU - CPU Time as a percentage of Elapsed Time
    %IO - User I/O Time as a percentage of Elapsed Time
    Captured SQL account for 65.1% of Total DB Time (s): 33,342
    Captured PL/SQL account for 41.7% of Total DB Time (s): 33,342

Elapsed Time (s)  Executions   Elapsed Time per Exec (s)  %Total  %CPU   %IO  SQL Id         SQL Module                                 SQL Text
5,156.39          33,682       0.15                       15.47   98.35  0.01 3pw59r6rkw9gb  oraagent.bin@dm01dbadm01.XXX.com (TNS V1-  declare PRIVS_ERROR exception;...
5,153.22          33,683       0.15                       15.46   98.34  0.01 0xwxau19hznj8  oraagent.bin@dm01dbadm01.XXX.com (TNS V1-  begin dbagent.try_login(); end...
5,089.67          134,713      0.04                       15.26   98.34  0.00 8vmu6k690g87k  oraagent.bin@dm01dbadm01.XXX.com (TNS V1-  SELECT UPPER(NVL(PROGRAM, 'nul...

--//原來的問題消失了,但是消耗的時間卻增加了,很明顯對方的產品沒有經過嚴格的測試,就給使用者使用了.
--//取出sql語句,主要便於檢視並且格式化如下:
SYS> @ sql_id 3pw59r6rkw9gb
--SQL_ID = 3pw59r6rkw9gb
declare
  PRIVS_ERROR exception; --raise error,if rule exception,will trigger privs_error
  pragma exception_init(PRIVS_ERROR, -1031);
begin
  execute immediate 'begin dbagent.try_login(); end;';
exception
  when PRIVS_ERROR then
    raise;
  when others then
    rollback;
end dbagent_logon;;
--//注意後面多了1個分號,是我寫的指令碼無法區分PL/SQL語句與sql語句造成的問題.

SYS> @sql_id 0xwxau19hznj8
--SQL_ID = 0xwxau19hznj8
begin dbagent.try_login(); end;;

SYS> @ sql_id 8vmu6k690g87k
--SQL_ID = 8vmu6k690g87k
SELECT UPPER (NVL (PROGRAM, 'null'))
      ,UPPER (MODULE)
      ,TYPE
      ,DECODE
       (
          NVL (INSTR (PROCESS, ':'), 0)
         ,0, NVL (PROCESS, 1234)
         ,SUBSTR (PROCESS, 1, INSTR (PROCESS, ':') - 1)
       )
      ,OSUSER
      ,MACHINE
      ,SCHEMANAME
      ,USERNAME
      ,SERVICE_NAME
      ,SID
      ,SERIAL#
  FROM SYS.V_$SESSION
 WHERE SID = SYS_CONTEXT ('userenv', 'sid');

--//134713/33682 = 3.99955.平均執行3pw59r6rkw9gb1次,執行8vmu6k690g87k 4次.
--//可以看出呼叫順序3pw59r6rkw9gb,0xwxau19hznj8,8vmu6k690g87k 分別執行次數的比例是1:1:4.
--//很明顯開發沒有寫好程式程式碼.僅僅需要呼叫1次8vmu6k690g87k就ok了.
--//如果修改程式碼可以提高效率4倍. 僅僅需要5156/4 = 1289秒.但是我的問題並不再這裡,細節很重要,為什麼8vmu6k690g87k的執行有點
--//慢.再仔細看sql_id='8vmu6k690g87k'每次執行需要0.04秒=40ms(估計存在四捨五入問題,因為0.04*4=0.16),有點慢,因為where條件
--//查詢使用sid,不應該存在這麼"慢"的情況.

--//開始以為都是Module=oraagent.bin執行的,後面我發現實際上幾乎全部模組都存在相似呼叫,看程式碼可以猜出是登陸資料庫時觸發器
--//呼叫執行的程式碼.這時我突然想起我自己寫的登陸觸發器用於設定cursor_sharing=force也存在類似的呼叫v$session檢視的情況,很
--//明顯我寫的程式碼呼叫次數應該也有33XXX次數.Elapsed Time至少應該 5089/4 = 1272秒,但是並沒有出現在awr報表中.另外我想到是
--//否因為select部分消耗cpu資源太多,我馬上否定我的猜測,因為僅僅返回1行,無論如何不應該這麼"慢".

--//當我看我寫的觸發器程式碼馬上明白問題在那裡.我是分開寫的,可以參考連結
--// http://blog.itpub.net/267265/viewspace-2768591/ => [20210418]查詢v$檢視問題.txt
--// connor-mcdonald.com/2021/04/12/better-performance-when-querying-the-v-views

3.繼續:
$ cat a.txt
SELECT UPPER (NVL (PROGRAM, 'null'))
      ,UPPER (MODULE)
      ,TYPE
      ,DECODE
       (
          NVL (INSTR (PROCESS, ':'), 0)
         ,0, NVL (PROCESS, 1234)
         ,SUBSTR (PROCESS, 1, INSTR (PROCESS, ':') - 1)
       )
      ,OSUSER
      ,MACHINE
      ,SCHEMANAME
      ,USERNAME
      ,SERVICE_NAME
      ,SID
      ,SERIAL#
  FROM SYS.V_$SESSION
 WHERE SID = SYS_CONTEXT ('userenv', 'sid');

--//檢視起對應的執行計劃:
SYS> @ sl all
alter session set statistics_level = all;
Session altered.

SYS> @ a.txt
SYS> @ dpc '' '' ''
PLAN_TABLE_OUTPUT
---------------------------
Plan hash value: 2422122865
---------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                 | Name            | Starts | E-Rows |E-Bytes| Cost (%CPU)| A-Rows |   A-Time   |  OMem |  1Mem | Used-Mem |
---------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT          |                 |      1 |        |       |     1 (100)|      1 |00:00:00.02 |       |       |          |
|   1 |  MERGE JOIN CARTESIAN     |                 |      1 |      1 |    62 |     0   (0)|      1 |00:00:00.02 |       |       |          |
|   2 |   NESTED LOOPS            |                 |      1 |      1 |    12 |     0   (0)|      1 |00:00:00.01 |       |       |          |
|*  3 |    FIXED TABLE FULL       | X$KSLWT         |      1 |      1 |     8 |     0   (0)|      1 |00:00:00.01 |       |       |          |
|*  4 |    FIXED TABLE FIXED INDEX| X$KSLED (ind:2) |      1 |      1 |     4 |     0   (0)|      1 |00:00:00.01 |       |       |          |
|   5 |   BUFFER SORT             |                 |      1 |      1 |    50 |     0   (0)|      1 |00:00:00.02 |  2048 |  2048 | 2048  (0)|
|*  6 |    FIXED TABLE FULL       | X$KSUSE         |      1 |      1 |    50 |     0   (0)|      1 |00:00:00.02 |       |       |          |
---------------------------------------------------------------------------------------------------------------------------------------------
Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------
   1 - SEL$88122447
   3 - SEL$88122447 / W@SEL$4
   4 - SEL$88122447 / E@SEL$4
   6 - SEL$88122447 / S@SEL$4
Predicate Information (identified by operation id):
---------------------------------------------------
   3 - filter("W"."KSLWTSID"=TO_NUMBER(SYS_CONTEXT('userenv','sid')))
   4 - filter("W"."KSLWTEVT"="E"."INDX")
   6 - filter((BITAND("S"."KSSPAFLG",1)<>0 AND BITAND("S"."KSUSEFLG",1)<>0 AND "S"."INDX"=TO_NUMBER(SYS_CONTEXT('userenv','sid'))
              AND "S"."INST_ID"=USERENV('INSTANCE')))
--//你可以發現ID=3,全表掃描X$KSLWT,而過濾條件是 filter("W"."KSLWTSID"=TO_NUMBER(SYS_CONTEXT('userenv','sid'))),也就是無法把
--//SYS_CONTEXT ('userenv', 'sid')當作常量處理.
--//而生產庫X$KSLWT 記錄的數量不是一般測試環境的數量.我的測試環境僅僅返回24條.而生產系統返回很多.
SYS> select count(*) from X$KSLWT;
  COUNT(*)
----------
      8793
--//訪問X$KSUSE也是出現類似的情況.

--//解決很簡單,分開寫:
--// Get user SID information
SELECT SYS_CONTEXT('userenv', 'sid')   INTO v_sid  FROM dual;

--// Get Program executable,OSUSER Details,Machine Details for this session
SELECT UPPER(NVL(PROGRAM, 'NULL')),
       UPPER(MODULE),
       DECODE(NVL(INSTR(PROCESS, ':'), 0), 0, NVL(PROCESS, 1234), SUBSTR(PROCESS, 1, INSTR(PROCESS, ':') - 1))
       TYPE, OSUSER, MACHINE, SCHEMANAME, USERNAME, SERVICE_NAME, SID, SERIAL#
  FROM SYS.V_$SESSION
  where sid = v_sid;

SYS> SELECT SYS_CONTEXT('userenv', 'sid')    FROM dual;
SYS_CONTEXT('USERENV','SID')
----------------------------
17017

$ cat a.txt
SELECT UPPER (NVL (PROGRAM, 'null'))
      ,UPPER (MODULE)
      ,TYPE
      ,DECODE
       (
          NVL (INSTR (PROCESS, ':'), 0)
         ,0, NVL (PROCESS, 1234)
         ,SUBSTR (PROCESS, 1, INSTR (PROCESS, ':') - 1)
       )
      ,OSUSER
      ,MACHINE
      ,SCHEMANAME
      ,USERNAME
      ,SERVICE_NAME
      ,SID
      ,SERIAL#
  FROM SYS.V_$SESSION
 WHERE
SID= 17017
--SID = SYS_CONTEXT ('userenv', 'sid')
;

SYS> @a.txt
..

SYS> @ dpc '' '' ''
PLAN_TABLE_OUTPUT
-------------------------------------
SQL_ID  gxdr0gq6nyums, child number 0
-------------------------------------
Plan hash value: 589956979
----------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                  | Name            | Starts | E-Rows |E-Bytes| Cost (%CPU)| A-Rows |   A-Time   |  OMem |  1Mem | Used-Mem |
----------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT           |                 |      1 |        |       |     1 (100)|      1 |00:00:00.01 |       |       |          |
|   1 |  NESTED LOOPS              |                 |      1 |      1 |    62 |     0   (0)|      1 |00:00:00.01 |       |       |          |
|   2 |   MERGE JOIN CARTESIAN     |                 |      1 |      1 |    58 |     0   (0)|      1 |00:00:00.01 |       |       |          |
|*  3 |    FIXED TABLE FIXED INDEX | X$KSUSE (ind:1) |      1 |      1 |    50 |     0   (0)|      1 |00:00:00.01 |       |       |          |
|   4 |    BUFFER SORT             |                 |      1 |      1 |     8 |     0   (0)|      1 |00:00:00.01 |  2048 |  2048 | 2048  (0)|
|*  5 |     FIXED TABLE FIXED INDEX| X$KSLWT (ind:1) |      1 |      1 |     8 |     0   (0)|      1 |00:00:00.01 |       |       |          |
|*  6 |   FIXED TABLE FIXED INDEX  | X$KSLED (ind:2) |      1 |      1 |     4 |     0   (0)|      1 |00:00:00.01 |       |       |          |
----------------------------------------------------------------------------------------------------------------------------------------------
Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------
   1 - SEL$88122447
   3 - SEL$88122447 / S@SEL$4
   5 - SEL$88122447 / W@SEL$4
   6 - SEL$88122447 / E@SEL$4
Predicate Information (identified by operation id):
---------------------------------------------------
   3 - filter(("S"."INDX"=17017 AND BITAND("S"."KSSPAFLG",1)<>0 AND BITAND("S"."KSUSEFLG",1)<>0 AND
              "S"."INST_ID"=USERENV('INSTANCE')))
   5 - filter("W"."KSLWTSID"=17017)
   6 - filter("W"."KSLWTEVT"="E"."INDX")

--//connor-mcdonald.com 連結給出了另外的寫法,使用with+materialize提示:
$ cat b.txt
with mysid as ( select /*+ materialize */ userenv('SID') n from dual )
SELECT UPPER(NVL(PROGRAM, 'NULL')),
       UPPER(MODULE),
       DECODE(NVL(INSTR(PROCESS, ':'), 0), 0, NVL(PROCESS, 1234), SUBSTR(PROCESS, 1, INSTR(PROCESS, ':') - 1))
       TYPE, OSUSER, MACHINE, SCHEMANAME, USERNAME, SERVICE_NAME, SID, SERIAL#
  FROM SYS.V_$SESSION,mysid
 WHERE SID = mysid.n;

SYS> @ b.txt
..

Plan hash value: 2230424401
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                      | Name                        | Starts | E-Rows |E-Bytes| Cost (%CPU)| E-Time   | A-Rows |   A-Time   | Buffers | Reads  | Writes |  OMem |  1Mem | Used-Mem |
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT               |                             |      1 |        |       |     4 (100)|          |      1 |00:00:00.03 |      15 |      1 |      1 |       |       |          |
|   1 |  TEMP TABLE TRANSFORMATION     |                             |      1 |        |       |            |          |      1 |00:00:00.03 |      15 |      1 |      1 |       |       |          |
|   2 |   LOAD AS SELECT               |                             |      1 |        |       |            |          |      0 |00:00:00.02 |       4 |      0 |      1 |   270K|   270K|  270K (0)|
|   3 |    FAST DUAL                   |                             |      1 |      1 |       |     2   (0)| 00:00:01 |      1 |00:00:00.01 |       0 |      0 |      0 |       |       |          |
|   4 |   NESTED LOOPS                 |                             |      1 |      1 |    75 |     2   (0)| 00:00:01 |      1 |00:00:00.01 |       5 |      1 |      0 |       |       |          |
|   5 |    NESTED LOOPS                |                             |      1 |      1 |    71 |     2   (0)| 00:00:01 |      1 |00:00:00.01 |       5 |      1 |      0 |       |       |          |
|   6 |     NESTED LOOPS               |                             |      1 |      1 |    63 |     2   (0)| 00:00:01 |      1 |00:00:00.01 |       5 |      1 |      0 |       |       |          |
|   7 |      VIEW                      |                             |      1 |      1 |    13 |     2   (0)| 00:00:01 |      1 |00:00:00.01 |       5 |      1 |      0 |       |       |          |
|   8 |       TABLE ACCESS STORAGE FULL| SYS_TEMP_0FD9D6D71_CF54F429 |      1 |      1 |    13 |     2   (0)| 00:00:01 |      1 |00:00:00.01 |       5 |      1 |      0 |  1025K|  1025K|          |
|*  9 |      FIXED TABLE FIXED INDEX   | X$KSUSE (ind:1)             |      1 |      1 |    50 |     0   (0)|          |      1 |00:00:00.01 |       0 |      0 |      0 |       |       |          |
|* 10 |     FIXED TABLE FIXED INDEX    | X$KSLWT (ind:1)             |      1 |      1 |     8 |     0   (0)|          |      1 |00:00:00.01 |       0 |      0 |      0 |       |       |          |
|* 11 |    FIXED TABLE FIXED INDEX     | X$KSLED (ind:2)             |      1 |      1 |     4 |     0   (0)|          |      1 |00:00:00.01 |       0 |      0 |      0 |       |       |          |
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------
   1 - SEL$71D7A081
   2 - SEL$1
   3 - SEL$1        / DUAL@SEL$1
   7 - SEL$D67CB2D2 / MYSID@SEL$2
   8 - SEL$D67CB2D2 / T1@SEL$D67CB2D2
   9 - SEL$71D7A081 / S@SEL$5
  10 - SEL$71D7A081 / W@SEL$5
  11 - SEL$71D7A081 / E@SEL$5
Predicate Information (identified by operation id):
---------------------------------------------------
   9 - filter((BITAND("S"."KSSPAFLG",1)<>0 AND BITAND("S"."KSUSEFLG",1)<>0 AND "S"."INST_ID"=USERENV('INSTANCE') AND "S"."INDX"="MYSID"."N"))
  10 - filter("S"."INDX"="W"."KSLWTSID")
  11 - filter("W"."KSLWTEVT"="E"."INDX")

4.效能對比:
--//我在toad下測試,多次執行觀察
--//呼叫b.txt 大約需要2Xms.
--//呼叫a.txt 大約需要5Xms.
--//有時候兩者差異不大,感覺這樣比較不行,沒有考慮網路的往返.另外寫一篇測試該語句效能的blog.
--//注:我後面測試with+materialize要產生redo日誌,不建議在生產系統應用這樣的寫法,主要原因是執行頻率太高了,累積起來redo很大!!

5.為什麼呼叫1次0xwxau19hznj8,執行4次8vmu6k690g87k.
--//我看了一下原始碼,原始碼這次沒有加密.
--//透過過程collect_app_info呼叫sql_id=8vmu6k690g87k
--//而try_login程式碼透過4個函式分別取得對應值

--get login app name
function get_app_name return varchar2 is
begin
  collect_app_info;
  return l_appname;
end;
--get login app module name
function get_app_module_name return varchar2 is
begin
  collect_app_info;
  return l_module;
end;

--get login app type
function get_app_type return varchar2 is
begin
  collect_app_info;
  return l_type;
end;

--get app process number
function get_app_process return number is
begin
  collect_app_info;
  return l_process;
end;
--//這樣就很好解析為什麼呼叫1次執行4次sql_id=8vmu6k690g87k.而實際上
  procedure collect_app_info is
  begin
    if not app_info_collect_status then
      select upper(nvl(program, 'null')),
             upper(module),
             type,
             decode(nvl(instr(process, ':'), 0),
                    0,
                    nvl(process, 1234),
                    substr(process, 1, instr(process, ':') - 1)),
             osuser,
             machine,
             SCHEMANAME,
             USERNAME,
             SERVICE_NAME,
             sid,
             serial#
        into l_appname,
             l_module,
             l_type,
             l_process,
             l_osuser,
             l_machine,
             l_SCHEMANAME,
             l_username,
             l_service_name,
             l_sid,
             l_serial#
        from sys.v_$session
       where sid = sys_context('userenv', 'sid');
    end if;

  end;

--//已經賦值給對應變數,根本不需要再透過什麼函式返回對應值.直接使用對應變數應該可以,當然我沒有測試.
--//最簡單修改:
  procedure collect_app_info is
  begin
    if not app_info_collect_status then
      select userenv('SID') into l_sid from dual;
      select upper(nvl(program, 'null')),
             upper(module),
             type,
             decode(nvl(instr(process, ':'), 0),
                    0,
                    nvl(process, 1234),
                    substr(process, 1, instr(process, ':') - 1)),
             osuser,
             machine,
             SCHEMANAME,
             USERNAME,
             SERVICE_NAME,
             sid,
             serial#
        into l_appname,
             l_module,
             l_type,
             l_process,
             l_osuser,
             l_machine,
             l_SCHEMANAME,
             l_username,
             l_service_name,
             l_sid,
             l_serial#
        from sys.v_$session
       where sid = l_sid;
    end if;

  end;
--//我的感覺可能分開寫效率更高!!因為使用with+materialize會生成一個臨時表,會產生日誌,我還給測試看看.

6.總結:
--//很明顯對方缺乏蠻力測試就把產品做了升級,我曾經對乙方工程師講過類似的問題,儘量不要讓客戶先於你發現產品的問題,這樣是很
--//丟臉的事情.你內部測試怎麼犯錯都可以,客戶是看不見的,一旦你犯錯,對方先發現,對方會........

--//實際上只要大量使用者連線資料庫,該語句的問題才明顯的暴露出來.

來自 “ ITPUB部落格 ” ,連結:http://blog.itpub.net/267265/viewspace-2926302/,如需轉載,請註明出處,否則將追究法律責任。

相關文章