[20200808]優化的困惑10.txt

lfree發表於2020-08-09

[20200808]優化的困惑10.txt

--//前一段時間遇到的問題,連結http://www.itpub.net/thread-2137868-1-1.html,CPU使用率100%.
--//我下載對方提供的awr報表,順便說一下.對方提供的awr html檔案無法正常跳轉.必須使用vim執行如下:
--// :%s+http://172.19.11.136/awr/dongguan/DGMESDB-mesdb/202007/mesdb_73613_73614_202007220700_202007220800.html++

--//很容易定位裡面的問題,邏輯讀太高,消耗大量CPU資源,而裡面sql語句使用rule提示.
--//但是我開始看時還是遇到1點點疑惑.
--//注:資料庫版本是10.2.0.5.0,作業系統:windows 64位版本的,實際上我個人建議不要在windows下跑資料庫,特別在生產環境,無形增
--//加問題定位的難度.

1.DB Time問題:
            Snap Id  Snap Time           Sessions  Cursors/Session
Begin Snap: 73613    22-Jul-20 07:00:36  206       24.9
End Snap:   73614    22-Jul-20 08:00:16  212       23.2
Elapsed:             59.66 (mins)                                     
DB Time:             1,070.16 (mins)         

--//取樣1個小時,而DB Time達到1070/60 = 17.83 小時.而實際的CPU數量是8.:
Operating System Statistics
Statistic               Total
AVG_BUSY_TIME           354,972
AVG_IDLE_TIME           2,432
AVG_SYS_TIME            4,388
AVG_USER_TIME           350,487
BUSY_TIME               2,840,695
IDLE_TIME               19,653
SYS_TIME                36,011
USER_TIME               2,804,684
RSRC_MGR_CPU_WAIT_TIME  0
VM_IN_BYTES             -1.5E+19
VM_OUT_BYTES            2.1E+16
PHYSICAL_MEMORY_BYTES   85,862,932,480
NUM_CPUS                8
NUM_CPU_CORES           8
--//實際的CPU數量僅僅8個,感覺不大可能DB time大於8小時.
--//再回過頭看.

Top 5 Timed Events
Event                           Waits    Time(s)  Avg Wait(ms) % Total Call Time  Wait Class
CPU time                                 29,020                45.2                
latch: cache buffers chains     101,977  4,455    44           6.9                Concurrency
db file sequential read         261,308  3,864    15           6.0                User I/O
log file sync                   124,032  3,540    29           5.5                Commit
direct path read temp           38,162   1,321    35           2.1                User I/O

--//注意觀察列 Total Call Time,CPU time僅僅佔45.2. 這些前5名相加僅僅45.2+6.9+6.0+5.5+2.1 = 65.7.
--//還有什麼其它事件佔3X%呢?
--//另外latch: cache buffers chains 的Avg Wait(ms) 達到 44ms,真心覺得有一些誇張...CPU已經消耗殆盡了.
--//順便說一下,我以前學習oracle一直認為CPU資源很難消耗100%.除非人為"破壞".直到隨著學習的深入...

2.繼續:
--//再看SQL ordered by Elapsed Time部分.

SQL ordered by Elapsed Time

    Resources reported for PL/SQL code includes the resources used by all SQL statements called by the code.
    % Total DB Time is the Elapsed Time of the SQL statement divided into the Total Database Time multiplied by 100
    Total DB Time (s): 64,209
    Captured SQL account for 55.0% of Total
--//注意Resources reported for PL/SQL code includes the resources used by all SQL statements called by the code.

Elapsed Time (s)  CPU Time (s)  Executions  Elap per Exec (s)  % Total DB Time  SQL Id         SQL Module  SQL Text
6,346             2,655         164,771     0.04               9.88             gyc05t01rz49h  TGS2.exe    declare p_owner varchar(30) :=...
6,110             2,551         164,791     0.04               9.52             0tpg5r73s1146  TGS2.exe    SELECT /*+ RULE */ COUNT(*) FR...
5,389             2,601         164,835     0.03               8.39             8cd1y3p3rnsys  TGS2.exe    declare p_owner varchar(30) :=...
5,316             2,563         164,829     0.03               8.28             6x6aknt5p0uqd  TGS2.exe    SELECT /*+ RULE */ ARGUMENT_NA...

--//一定是sql_id=gyc05t01rz49h裡面呼叫sql_id=0tpg5r73s1146語句.
--//sql_id=gyc05t01rz49h格式化如下:
DECLARE
   p_owner         VARCHAR (30) := :owner;
   p_object_name   VARCHAR (30) := :object_name;
   p_proc_name     VARCHAR (30) := :procedure_name;
   p_overload      NUMBER := :overload;
   p_object_type   VARCHAR (19);
   p_count         NUMBER;
BEGIN
   IF p_owner IS NULL
   THEN
      BEGIN
         SELECT /*+ RULE */
               object_type, owner
           INTO p_object_type, p_owner
           FROM sys.all_objects
          WHERE     owner = SYS_CONTEXT ('USERENV', 'CURRENT_SCHEMA')
                AND object_type IN
                       ('PROCEDURE'
                       ,'FUNCTION'
                       ,'PACKAGE'
                       ,'PACKAGE BODY'
                       ,'SYNONYM')
                AND object_name = p_object_name
                AND ROWNUM <= 1;
      EXCEPTION
         WHEN NO_DATA_FOUND
         THEN
            BEGIN
               SELECT /*+ RULE */
                     object_type, owner
                 INTO p_object_type, p_owner
                 FROM sys.all_objects
                WHERE     owner = 'PUBLIC'
                      AND object_type IN
                             ('PROCEDURE'
                             ,'FUNCTION'
                             ,'PACKAGE'
                             ,'PACKAGE BODY'
                             ,'SYNONYM')
                      AND object_name = p_object_name
                      AND ROWNUM <= 1;
            EXCEPTION
               WHEN NO_DATA_FOUND
               THEN
                  IF p_proc_name IS NOT NULL
                  THEN
                     p_owner := p_object_name;
                     p_object_name := p_proc_name;
                     p_proc_name := NULL;
                  ELSE
                     RAISE;
                  END IF;
            END;
      END;
   END IF;

   IF p_object_type IS NULL
   THEN
      SELECT /*+ RULE */
            object_type
        INTO p_object_type
        FROM sys.all_objects
       WHERE     object_type IN
                    ('PROCEDURE'
                    ,'FUNCTION'
                    ,'PACKAGE'
                    ,'PACKAGE BODY'
                    ,'SYNONYM')
             AND owner = p_owner
             AND object_name = p_object_name
             AND ROWNUM <= 1;
   END IF;

   WHILE p_object_type = 'SYNONYM'
   LOOP
      SELECT /*+ RULE */
            table_owner, table_name
        INTO p_owner, p_object_name
        FROM sys.all_synonyms
       WHERE owner = p_owner AND synonym_name = p_object_name;

      SELECT /*+ RULE */
            object_type
        INTO p_object_type
        FROM sys.all_objects
       WHERE owner = p_owner AND object_name = p_object_name AND ROWNUM <= 1;
   END LOOP;

   SELECT /*+ RULE */
         COUNT (*)
     INTO p_count
     FROM sys.all_procedures
    WHERE     owner = p_owner
          AND object_name = p_object_name
          AND (   procedure_name = p_proc_name
               OR p_proc_name IS NULL AND procedure_name IS NULL);

   IF p_count < p_overload
   THEN
      RAISE NO_DATA_FOUND;
   END IF;

   :owner := p_owner;
   :object_name := p_object_name;
   :procedure_name := p_proc_name;

   SELECT /*+ RULE */
         COUNT (*)
     INTO :param_count
     FROM sys.all_arguments
    WHERE     owner = p_owner
          AND (       p_proc_name IS NULL
                  AND package_name IS NULL
                  AND object_name = p_object_name
               OR     p_proc_name IS NOT NULL
                  AND package_name = p_object_name
                  AND object_name = p_proc_name
                  AND NVL (overload, 1) = p_overload)
          AND data_type IS NOT NULL;
END;

--//sql_id = 0tpg5r73s1146.

SELECT /*+ RULE */
      COUNT (*)
  FROM SYS.ALL_ARGUMENTS
 WHERE     OWNER = :B4
       AND (   :B2 IS NULL AND PACKAGE_NAME IS NULL AND OBJECT_NAME = :B3
            OR     :B2 IS NOT NULL
               AND PACKAGE_NAME = :B3
               AND OBJECT_NAME = :B2
               AND NVL (OVERLOAD, 1) = :B1)
       AND DATA_TYPE IS NOT NULL;

--//仔細比較可以發現sql_id = 0tpg5r73s1146正是前面儲存過程最後要執行的語句.
--//我有點奇怪的是使用rule提示.這些儲存過程是開發寫的嗎?還是程式後臺自動生成執行的,這個也是我的疑問.
--//這樣就很好理解我為什麼看到db time很高了,db time的執行時間被重複計算了.

--//再看SQL ordered by Gets部分.
SQL ordered by Gets

Resources reported for PL/SQL code includes the resources used by all SQL statements called by the code.
Total Buffer Gets: 8,982,272,506
Captured SQL account for 29.5% of Total

Buffer Gets  Executions  Gets per Exec  %Total  CPU Time (s)  Elapsed Time (s)  SQL Id                 SQL Module  SQL Text
945,581,260  164,771     5,738.76       10.53   2655.28       6345.51           gyc05t01rz49h  TGS2.exe    declare p_owner varchar(30) :=...
940,649,793  164,835     5,706.61       10.47   2600.58       5389.32           8cd1y3p3rnsys  TGS2.exe    declare p_owner varchar(30) :=...
940,453,081  164,791     5,706.94       10.47   2550.73       6109.69           0tpg5r73s1146  TGS2.exe    SELECT /*+ RULE */ COUNT(*) FR...
940,296,517  164,829     5,704.68       10.47   2562.72       5316.13           6x6aknt5p0uqd  TGS2.exe    SELECT /*+ RULE */ ARGUMENT_NA...
--//可以相互驗證.其實每次並不是很多,僅僅57XX 邏輯讀,但是執行頻率非常高.
--//164771/3600 = 45.76次/秒.

3.解決方案:
--//設定 alter system set "_optimizer_ignore_hints"=true看看.

SYS@test> alter system set "_optimizer_ignore_hints"=true scope=memory;
System altered.

--//也可以在session 設定"_optimizer_ignore_hints"=true;引數. 通過登陸觸發器設定.

--//分析資料字典以及固定物件.
execute sys.dbms_stats.GATHER_FIXED_OBJECTS_STATS()
execute sys.dbms_stats.GATHER_DICTIONARY_STATS()

--//可惜提出解決方案後,樓主一直沒給出反饋建議.也不知道我分析是否正確.
--//還有1個非常無賴的方案就是玩一點點欺騙,改名檢視sys.all_arguments,然後建立一張真實的表代替.
--//sys.all_arguments裡面實際上記錄呼叫函式,儲存過程的引數.

--//還有我記憶遇到程式大量呼叫SYS.ALL_ARGUMENTS是使用deliph程式開發的.
--//不過我以前遇到的情況不同,執行語句裡面沒owner,而是直接寫ALL_ARGUMENTS,這樣更加簡單,我直接在使用者的schema下建議一張
--//ALL_ARGUMENTS表,來欺騙應用程式,風險就是如果改儲存過程帶入引數發生變化,ALL_ARGUMENTS也要同步更新.不知道不更新情況會怎
--//樣??

--//當然解決完上述問題,還可能需要優化剩下的sql語句,不過lz沒反饋,只能分析到這裡了.比如:
a4g0jrkkr3ggn

--//格式化:
  SELECT COUNT (*) AS fail_QTY, T1.WORK_ORDER, T2.DEFECT_DESC
    FROM (SELECT a.recid
                ,a.DEFECT_id
                ,A.WORK_ORDER
                ,a.SERIAL_NUMBER
                ,a.rec_time
                ,B.Defect_Code
                ,ROW_NUMBER () OVER ( PARTITION BY A.WORK_ORDER, A.SERIAL_NUMBER ORDER BY A.rec_time ASC, B.Defect_Code DESC)
                    AS rowindex
            FROM sajet.G_SN_DEFECT a, sajet.sys_defect B
           WHERE     A.PROCESS_ID = :ProcessId
                 AND A.WORK_ORDER = :WO
                 AND a.defect_id = B.defect_id
                 AND a.SERIAL_NUMBER IN
                        (SELECT SERIAL_NUMBER
                           FROM (SELECT A.WORK_ORDER
                                       ,a.SERIAL_NUMBER
                                       ,a.CURRENT_STATUS
                                       ,a.OUT_PROCESS_TIME
                                       ,RANK () OVER ( PARTITION BY A.WORK_ORDER ,A.SERIAL_NUMBER ORDER BY a.OUT_PROCESS_TIME)
                                           AS rowindex
                                   FROM sajet.G_SN_TRAVEL a
                                  WHERE     A.WORK_ORDER = :WO
                                        AND A.PROCESS_ID = :ProcessId) b
                          WHERE     b.CURRENT_STATUS = 1
                                AND b.rowindex = 1
                                AND TO_CHAR ( b.OUT_PROCESS_TIME ,'yyyymmddHH24') BETWEEN :STime AND :ETime)) T1
        ,sajet.SYS_DEFECT T2
   WHERE T1.rowindex = 1 AND T1.DEFECT_id = T2.DEFECT_id
GROUP BY T1.WORK_ORDER, T2.DEFECT_DESC
ORDER BY COUNT (*) DESC;

--//使用TO_CHAR函式,以及內層IN使用分析函式是否多餘,理論講只要exists應該就可以了.看看是否有優化餘地.

4.繼續分析:
--//看看Load Profile部分:
Load Profile

                  Per Second    Per Transaction
Redo size:        338,989.88    2,438.14
Logical reads:    2,509,093.86  18,046.33
Block changes:    3,404.38      24.49
Physical reads:   284.63        2.05
Physical writes:  257.32        1.85
User calls:       975.01        7.01
Parses:           925.49        6.66
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Hard parses:      33.63         0.24
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Sorts:            940.09        6.76
Logons:           0.64          0.00
Executes:         10,063.58     72.38
Transactions:     139.04                           

--//硬分析不是很大.但是在SQL ordered by CPU Time部分出現:

CPU Time (s)  Elapsed Time (s)  Executions  CPU per Exec (s)  % Total  % Total DB Time  SQL Id           SQL Module     SQL Text
2,655         6,346             164,771     0.02              9.15     9.88             gyc05t01rz49h  TGS2.exe       declare p_owner varchar(30) :=...
2,601         5,389             164,835     0.02              8.96     8.39             8cd1y3p3rnsys  TGS2.exe       declare p_owner varchar(30) :=...
2,563         5,316             164,829     0.02              8.83     8.28             6x6aknt5p0uqd  TGS2.exe       SELECT /*+ RULE */ ARGUMENT_NA...
2,551         6,110             164,791     0.02              8.79     9.52             0tpg5r73s1146  TGS2.exe       SELECT /*+ RULE */ COUNT(*) FR...
848           984               30          28.26             2.92     1.53             67086ft5pmkbn                 DECLARE job BINARY_INTEGER := ...
582           1,174             6,607       0.09              2.01     1.83             04t2q2nf3pd4b  APService.exe  begin SAJET.SJ_CKRT_INPUTMATER...
579           1,165             6,607       0.09              1.99     1.81             cca7qc96c2k2h  APService.exe  SELECT COUNT (DISTINCT STATION...
293           736               128,775     0.00              1.01     1.15             7dupqruy76j0u  TGS2.exe       BEGIN COMMAND_CODE(:TSAJET1, :...
213           380               690         0.31              0.73     0.59             gdb3vjjk0m4xs  APService.exe  select max(END_TIME) from SAJE...
211           392               690         0.31              0.73     0.61             88jc1wztmqbky  APService.exe  select min(END_TIME) from SAJE...

--//sql_id=gdb3vjjk0m4xs
gdb3vjjk0m4xs     select max(END_TIME) from SAJET.G_WH_HR A where A.WORK_ORDER= '20405946' AND PROCESS_ID='100085' AND SHIFT_ID IN ( 10000017) AND SHIFT_DATE ='2020/07/21' AND PDLINE_ID = 10075

SELECT max(END_TIME)
  FROM SAJET.G_WH_HR A
 WHERE A.WORK_ORDER = '20405946'
   AND PROCESS_ID   = '100085'
   AND SHIFT_ID IN ( 10000017)
   AND SHIFT_DATE   = '2020/07/21'
   AND PDLINE_ID    = 10075
--//沒有使用繫結變數,應該可以有優化餘地.

88jc1wztmqbky     select min(END_TIME) from SAJET.G_WH_HR A where A.WORK_ORDER= '20405946' AND PROCESS_ID='100085' AND SHIFT_ID IN ( 10000017) AND SHIFT_DATE ='2020/07/21' AND PDLINE_ID = 10075

SELECT min(END_TIME)
  FROM SAJET.G_WH_HR A
 WHERE A.WORK_ORDER = '20405946'
   AND PROCESS_ID   = '100085'
   AND SHIFT_ID IN ( 10000017)
   AND SHIFT_DATE   = '2020/07/21'
   AND PDLINE_ID    = 10075
--//沒有使用繫結變數,應該可以有優化餘地.我估計程式應該有很多類似的語句.awr報表僅僅出現2次.

來自 “ ITPUB部落格 ” ,連結:http://blog.itpub.net/267265/viewspace-2710302/,如需轉載,請註明出處,否則將追究法律責任。

相關文章