一次資料庫硬解析的分析全過程

YallonKing發表於2012-09-25
ps:其實這個case很常見,而且在正規的開發流程之下也不會出現這種情況,更不會出現在上線之後,這篇文章主要是體現一個發現問題和分析問題解決問題的思路。希望對大家能有多幫助。

說明:

經過開發方面的努力,以及對問題sql的優化之後,截止目前,之前發現的問題sql已經得到了緩解。之後,我們對資料庫2個節點進行了整體分析,目前得出以下問題。

依據材料:2個節點的awr報告,見附件


經過對資料庫2個節點的整體分析,發現2個節點的資源承載相差較大。以下為2個節點的對比:

 

負載情況比對

節點1


Per SecondPer Transaction
Redo size: 289,642.11 2,459.01
Logical reads: 20,588.01 174.79
Block changes: 1,452.02 12.33
Physical reads: 7.15 0.06
Physical writes: 47.59 0.40
User calls: 730.83 6.20
Parses: 268.36 2.28
Hard parses: 2.26 0.02
Sorts: 211.82 1.80
Logons: 0.51 0.00
Executes: 471.03 4.00
Transactions: 117.79 

節點2


Per SecondPer Transaction
Redo size: 115,937.47 4,102.67
Logical reads: 40,494.67 1,432.98
Block changes: 334.81 11.85
Physical reads: 19.53 0.69
Physical writes: 18.26 0.65
User calls: 4,874.41 172.49
Parses: 1,242.13 43.96
Hard parses: 2.03 0.07
Sorts: 203.80 7.21
Logons: 0.66 0.02
Executes: 4,298.22 152.10
Transactions: 28.26 

 

說明:

無論從redosize還是transactions明顯可以看出節點1比節點2忙碌,雖然2個節點的hard parses相差貌似不是很大,但是從其executes數以及parses分析,初步斷定是節點1的大量硬解析導致的。

 

例項效率比對

節點1

Buffer Nowait %: 99.94Redo NoWait %: 100.00
Buffer Hit %: 99.97In-memory Sort %: 100.00
Library Hit %: 57.85Soft Parse %: 99.16
Execute to Parse %: 43.03Latch Hit %: 99.94
Parse CPU to Parse Elapsd %: 20.39% Non-Parse CPU: 98.76

節點2

Buffer Nowait %: 100.00Redo NoWait %: 100.00
Buffer Hit %: 99.95In-memory Sort %: 100.00
Library Hit %: 99.92Soft Parse %: 99.84
Execute to Parse %: 71.10Latch Hit %: 99.96
Parse CPU to Parse Elapsd %: 2.62% Non-Parse CPU: 98.64

 

說明:

通過library hit%parse cpu to parse elapsd%來看,基本能夠確定是節點1的大量硬解析導致的。

 

經過以上分析,我們在sql統計中,節點1如下所示:

(注:SQL ordered by Elapsed Time

Elapsed Time (s)CPU Time (s)Executions Elap per Exec (s) % Total DB Time SQL IdSQL ModuleSQL Text
8,866 7 1 8865.61 20.41 4y67m1n5ru9h9 timeTask@xjitnmapp2 (TNS V1-V3) begin PKP_XJ_BUSI_MONITOR_UPD...
2,017 0 11 183.32 4.64 at30z235zpx7a timeTask@xjitnmapp2 (TNS V1-V3) begin xj_db_inas_proc; end;
1,355 1,312 0   3.12 ax33m7btymb2r PL/SQL Developer begin PKP_XJ_CSD_TEST.agent_pe...
1,189 365 716,555 0.00 2.74 765mqyxhskk9d ValveMgmt@xjitnmapp2 (TNS V1-V3) insert into ne_perf_msg (ne_pe...
1,049 0 10 104.90 2.41 cyyyswcsmvfqx timeTask@xjitnmapp2 (TNS V1-V3) SELECT COUNT(1), 'IOM-N' FROM ...
1,022 398 716,629 0.00 2.35 9qfdgm3c702rs ValveMgmt@xjitnmapp2 (TNS V1-V3) begin Proc_NE_Perf_Msg_UpdateR...
940 0 10 94.00 2.16 dzt58v2bnamc4 timeTask@xjitnmapp2 (TNS V1-V3) SELECT COUNT(1), 'T' FROM INAS...
923 935 24 38.45 2.12 4uxbs8fhb6yv5 timeTask@xjitnmapp2 (TNS V1-V3) begin pkp_maint_job.hastenNot...
923 935 24 38.45 2.12 g2zqp9kt9fhhx timeTask@xjitnmapp2 (TNS V1-V3) SELECT A.JOB_NAME, B.DUTY_ID, ...
916 929 32,717 0.03 2.11 657k3nvp6d77r timeTask@xjitnmapp2 (TNS V1-V3) SELECT COUNT(*) FROM (SELECT :...
914 293 716,627 0.00 2.10 607t3jmh1rjgy ValveMgmt@xjitnmapp2 (TNS V1-V3) UPDATE NE_PERF_MSG_REAL SET PE...
517 0 1 517.19 1.19 3v8usax04tqx6 timeTask@xjitnmapp2 (TNS V1-V3) SELECT /*+parallel(t, 12)*/t.s...
517 0 1 516.51 1.19 f3d7w5j1khh59 timeTask@xjitnmapp2 (TNS V1-V3) SELECT /*+parallel(t, 12)*/t.s...
490 0 1 490.14 1.13 2aw24jjhsbz1z timeTask@xjitnmapp2 (TNS V1-V3) SELECT /*+parallel(t, 12)*/t.s...
482 0 1 482.15 1.11 frgk27g6p6auh timeTask@xjitnmapp2 (TNS V1-V3) SELECT /*+parallel(t, 12)*/t.s...
482 0 1 481.85 1.11 2mjp53agwxb10 timeTask@xjitnmapp2 (TNS V1-V3) SELECT /*+parallel(t, 12)*/t.s...
479 0 1 479.35 1.10 4ha87wy9k784t timeTask@xjitnmapp2 (TNS V1-V3) SELECT /*+parallel(t, 12)*/t.s...
479 0 1 478.68 1.10 c8utt9uwda7mv timeTask@xjitnmapp2 (TNS V1-V3) SELECT /*+parallel(t, 12)*/t.s...
466 0 1 466.07 1.07 g2s1s0vubjuba timeTask@xjitnmapp2 (TNS V1-V3) SELECT /*+parallel(t, 12)*/t.s...
466 0 1 465.99 1.07 57pq01hujfa6d timeTask@xjitnmapp2 (TNS V1-V3) SELECT /*+parallel(t, 12)*/t.s...
464 0 1 463.93 1.07 4z5gn1sh2h3p9 timeTask@xjitnmapp2 (TNS V1-V3) SELECT /*+parallel(t, 12)*/t.s...
463 0 1 463.24 1.07 3xtjfvgq9hw3u timeTask@xjitnmapp2 (TNS V1-V3) SELECT /*+parallel(t, 12)*/t.s...
454 0 1 453.65 1.04 cyc352n70sndr timeTask@xjitnmapp2 (TNS V1-V3) SELECT /*+parallel(t, 12)*/t.s...
447 0 1 447.39 1.03 6dz4h3cpngvfk timeTask@xjitnmapp2 (TNS V1-V3) SELECT /*+parallel(t, 12)*/t.s...
445 0 1 445.12 1.02 3maa9dy6ybfq0 timeTask@xjitnmapp2 (TNS V1-V3) SELECT /*+parallel(t, 12)*/t.s...
445 0 1 444.96 1.02 5pabqbfux2bdd timeTask@xjitnmapp2 (TNS V1-V3) SELECT /*+parallel(t, 12)*/t.s...
443 0 1 443.12 1.02 cu3f3fwntq6q5 timeTask@xjitnmapp2 (TNS V1-V3) SELECT /*+parallel(t, 12)*/t.s...
441 0 1 440.93 1.02 5v6z9vx2anap4 timeTask@xjitnmapp2 (TNS V1-V3) SELECT /*+parallel(t, 12)*/t.s...
438 0 1 437.93 1.01 3skc5wbqaa13d timeTask@xjitnmapp2 (TNS V1-V3) SELECT /*+parallel(t, 12)*/t.s...
437 0 1 437.22 1.01 ffzq6ucvadp5n timeTask@xjitnmapp2 (TNS V1-V3) SELECT /*+parallel(t, 12)*/t.s...

說明:在上圖(節點1)中發現,有大量的以下語句出現:

SELECT /*+parallel(t, 12)*/t.s…..

具體分析後,這種情況在節點2中不存在。

對上邊的大量相同的sql具體分析如下:

SELECT /*+parallel(t, 12)*/

 t.source_id,

 t.start_date,

 SUM(t.num_cdrs),

 SUM(t.settlement_duration),

 SUM(t.sett_fee)

  FROM (SELECT d.source_id,

               substr(d.start_date, 1, 8) start_date,

               d.num_cdrs,

               d.settlement_duration,

               d.sett_fee

          FROM zhjs_app.tl_y_voice_sett_201208@xjzhjs d,

               (SELECT element_id

                  FROM zhjs_param.tp_relation_element@xjzhjs

                 WHERE rpt_code = '15'

                   AND tnode IN

                       ('415102', '415105', '415108', '416103', '416106')) c

         WHERE d.account_item = c.element_id

        UNION ALL

        SELECT d.source_id,

               substr(d.start_date, 1, 8) start_date,

               d.num_cdrs,

               d.settlement_duration,

               d.sett_fee

          FROM zhjs_app.tl_y_voice_sett_201209@xjzhjs d,

               (SELECT element_id

                  FROM zhjs_param.tp_relation_element@xjzhjs

                 WHERE rpt_code = '15'

                   AND tnode IN

                       ('415102', '415105', '415108', '416103', '416106')) c

         WHERE d.account_item = c.element_id) t

 GROUP BY t.source_id, t.start_date;

 

說明:

在節點1,存在這樣大量的sql,其中除了在in 子句中的物件個數不同外,其他基本全部一致。這也是導致節點1大量硬解析,消耗主機資源,拖低節點1各種效能指標的原因。

 

建議:

對以上語句進行繫結變數處理。

 

總結:

這次對XXXX資料庫效能的監控主要從其整體結構來進行分析的,並不是將全部精力放在盯某個或者某類sql上,我們認為這樣可以避免瞎子摸象的問題產生,至於最後還是落在具體sql上,這個只是一種分析的結果,而非以往的一種慣性“有效能問題就找sql”這樣的分析過程。

         最後,希望在和開發等同事的一起努力下,XXXX效能有所大的提高。

ps:很不習慣寫文件,而且還那種給人家找事的文件,基本形成了格式,首先肯定人家前期的努力和結果,其次不但提出自己的見解,還得表明自己的分析依據和分析過程,完了再加以一定的鼓勵和前瞻性。哎..真勞神,其實也就一句話的事兒..

Normal 0 false 7.8 磅 0 2 false false false EN-US ZH-CN X-NONE

相關文章