最近幾個SQL最佳化案例(水一波部落格,當段子看😼😼)

小至尖尖發表於2024-05-14

某國產資料庫原廠高階工程師找我最佳化SQL,以下是他給的三個案例。😼

案例一:

慢SQL和執行計劃:

SELECT c.*
  FROM aaaaa a
 INNER JOIN bbbbbbbbbbb b
    ON a.attend_rule_id = b.attend_rule_id
 INNER JOIN cccccccccc c
    ON b.work_place_id = c.id
 INNER JOIN ddddddddddd d
    ON a.attend_rule_id = d.id
 WHERE a.staff_id = '0001A11000000005DVBN'
   AND nvl(a.dr, 0) = '0'
   AND nvl(b.dr, 0) = '0'
   AND nvl(c.dr, 0) = '0'
   AND nvl(d.dr, 0) = '0';
   

                                                           QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=0.55..258.52 rows=1 width=2253) (actual time=320.842..320.845 rows=0 loops=1)
   ->  Nested Loop  (cost=0.28..251.02 rows=1 width=33) (actual time=320.841..320.843 rows=0 loops=1)
         ->  Nested Loop  (cost=0.00..235.87 rows=1 width=99) (actual time=0.050..83.992 rows=3629 loops=1)
               Join Filter: ((b.attend_rule_id)::text = (d.id)::text)
               Rows Removed by Join Filter: 540055
               ->  Seq Scan on bbbbbbbbbbb b  (cost=0.00..210.09 rows=20 width=66) (actual time=0.021..2.125 rows=3630 loops=1)
                     Filter: (NVL(int4(dr), 0) = 0)
                     Rows Removed by Filter: 314
               ->  Materialize  (cost=0.00..25.19 rows=2 width=33) (actual time=0.000..0.006 rows=150 loops=3630)
                     ->  Seq Scan on ddddddddddd d  (cost=0.00..25.18 rows=2 width=33) (actual time=0.013..0.171 rows=302 loops=1)
                           Filter: (NVL(int4(dr), 0) = 0)
                           Rows Removed by Filter: 45
         ->  Index Scan using scopepsnindex on aaaaa a  (cost=0.28..15.14 rows=1 width=33) (actual time=0.065..0.065 rows=0 loops=3629)
               Index Cond: ((attend_rule_id)::text = (b.attend_rule_id)::text)
               Filter: (((staff_id)::text = '0001A11000000005DVBN'::text) AND (NVL(int4(dr), 0) = 0))
               Rows Removed by Filter: 5
   ->  Index Scan using pk_attend_place on cccccccccc c  (cost=0.28..5.90 rows=1 width=2253) (never executed)
         Index Cond: ((id)::text = (b.work_place_id)::text)
         Filter: (NVL(int4(dr), 0) = 0)
 Planning Time: 3.101 ms
 Execution Time: 321.016 ms
(21 rows)

他將 enable_nestloop 關閉以後強行後hash,速度快了不少,但是需要使用hint干預計劃。

ncc=# set enable_nestloop=off;
SET
ncc=# explain  analyze  SELECT c.*
ncc-#   FROM aaaaa a
ncc-#  INNER JOIN bbbbbbbbbbb b
ncc-#     ON a.attend_rule_id = b.attend_rule_id
ncc-#  INNER JOIN cccccccccc c
ncc-#     ON b.work_place_id = c.id
ncc-#  INNER JOIN ddddddddddd d
ncc-#     ON a.attend_rule_id = d.id
ncc-#  WHERE a.staff_id = '0001A11000000005DVBN'
ncc-#    AND nvl(a.dr, 0) = '0'
ncc-#    AND nvl(b.dr, 0) = '0'
ncc-#    AND nvl(c.dr, 0) = '0'
ncc-#    AND nvl(d.dr, 0) = '0';
                                                           QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=144.46..354.61 rows=1 width=2253) (actual time=0.497..0.499 rows=0 loops=1)
   Hash Cond: ((a.attend_rule_id)::text = (d.id)::text)
   ->  Hash Join  (cost=119.26..329.41 rows=1 width=2319) (actual time=0.282..0.283 rows=0 loops=1)
         Hash Cond: ((b.attend_rule_id)::text = (a.attend_rule_id)::text)
         ->  Hash Join  (cost=71.19..281.33 rows=1 width=2286) (never executed)
               Hash Cond: ((b.work_place_id)::text = (c.id)::text)
               ->  Seq Scan on bbbbbbbbbbb b  (cost=0.00..210.09 rows=20 width=66) (never executed)
                     Filter: (NVL(int4(dr), 0) = 0)
               ->  Hash  (cost=71.11..71.11 rows=6 width=2253) (never executed)
                     ->  Seq Scan on cccccccccc c  (cost=0.00..71.11 rows=6 width=2253) (never executed)
                           Filter: (NVL(int4(dr), 0) = 0)
         ->  Hash  (cost=48.05..48.05 rows=1 width=33) (actual time=0.278..0.279 rows=0 loops=1)
               Buckets: 1024  Batches: 1  Memory Usage: 8kB
               ->  Seq Scan on aaaaa a  (cost=0.00..48.05 rows=1 width=33) (actual time=0.278..0.278 rows=0 loops=1)
                     Filter: (((staff_id)::text = '0001A11000000005DVBN'::text) AND (NVL(int4(dr), 0) = 0))
                     Rows Removed by Filter: 806
   ->  Hash  (cost=25.18..25.18 rows=2 width=33) (actual time=0.199..0.199 rows=302 loops=1)
         Buckets: 1024  Batches: 1  Memory Usage: 28kB
         ->  Seq Scan on ddddddddddd d  (cost=0.00..25.18 rows=2 width=33) (actual time=0.013..0.148 rows=302 loops=1)
               Filter: (NVL(int4(dr), 0) = 0)
               Rows Removed by Filter: 45
 Planning Time: 1.556 ms
 Execution Time: 0.585 ms

這種簡單的語句我大概看了2秒種,給出改寫方案,不帶思考的。

 SELECT  c.*
   FROM aaaaa a
 INNER JOIN bbbbbbbbbbb b
     ON a.attend_rule_id = b.attend_rule_id
 INNER JOIN cccccccccc c
     ON b.work_place_id = c.id
  INNER JOIN ddddddddddd d
     ON a.attend_rule_id = d.id
  WHERE a.staff_id = '0001A11000000005DVBN'
    AND CASE WHEN a.dr IS NULL THEN 0 ELSE a.dr END = 0
    AND CASE WHEN b.dr IS NULL THEN 0 ELSE b.dr END = 0
    AND CASE WHEN c.dr IS NULL THEN 0 ELSE c.dr END = 0
    AND CASE WHEN d.dr IS NULL THEN 0 ELSE d.dr END = 0;

案例二:

SQL和執行計劃:

SELECT to_char(t1.signbegintime, 'YYYY-MM-DD HH24:MI:SS') AS workSignTime,
       to_char(t1.signendtime, 'YYYY-MM-DD HH24:MI:SS') AS workoffSignTime,
       t1.f_v_8 AS signType,
       to_char(t1.calendar, 'YYYY-MM-DD') AS calendar,
       to_char(t2.begintime, 'YYYY-MM-DD HH24:MI:SS') AS workTime,
       to_char(t2.endtime, 'YYYY-MM-DD HH24:MI:SS') AS workOffTime
  FROM ts_daystat t1
 INNER JOIN ts_shift t2
    ON t1.shift_id = t2.id
 WHERE t1.staff_id = '0001A110000000062QPG'
   AND t1.calendar LIKE '%2024-05%'
 ORDER BY calendar ASC;

                                                           QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=98960.15..98960.15 rows=1 width=167) (actual time=436.621..493.088 rows=9 loops=1)
   Sort Key: (to_char((t1.calendar)::timestamp without time zone, 'YYYY-MM-DD'::text))
   Sort Method: quicksort  Memory: 26kB
   ->  Nested Loop  (cost=1000.14..98960.14 rows=1 width=167) (actual time=123.310..493.061 rows=9 loops=1)
         ->  Gather  (cost=1000.00..98951.77 rows=1 width=63) (actual time=123.209..492.857 rows=9 loops=1)
               Workers Planned: 2
               Workers Launched: 2
               ->  Parallel Seq Scan on ts_daystat t1  (cost=0.00..97951.67 rows=1 width=63) (actual time=184.979..430.729 rows=3 loops=3)
                     Filter: (((staff_id)::text = '0001A110000000062QPG'::text) AND ((calendar)::text ~~ '%2024-05%'::text))
                     Rows Removed by Filter: 199107
         ->  Index Scan using pk_shift on ts_shift t2  (cost=0.14..8.16 rows=1 width=48) (actual time=0.006..0.006 rows=1 loops=9)
               Index Cond: ((id)::text = (t1.shift_id)::text)
 Planning Time: 0.890 ms
 Execution Time: 493.383 ms
(14 rows)

這條SQL我也是簡單喵了一眼,讓原廠高階工程師加個索引 t1 表對 staff_id,shift_id 兩個列加個索引,但是他對我的建議表示懷疑。😂

畢竟人家是原廠工程師,對他們家的產品更加專業,更加權威,專業性這塊是毋庸置疑的。

我只能放低姿態,懇請原廠老師加條索引驗證下我的錯誤,只有認識到自己的錯誤,才能更好得進步。🤓

加了索引以後的執行計劃,已經從 493ms 降低到 1.5ms了。

我確實也很想反思一下自己,被原廠老師指正一下,認識自己的錯誤,才能進步。無奈可惜哥真的是太猛了,沒啥值得反思的。😁

案例三:

SQL和執行計劃:

SELECT a.*
   FROM bbbbbb b
  INNER JOIN eeeeee e
     ON b.pk_psndoc = e.pk_psndoc
  INNER JOIN (SELECT max(orgrelaid) AS orgrelaid, pk_psndoc
                FROM eeeeee
               WHERE indocflag = 'Y'
               GROUP BY pk_psndoc) tmp
     ON e.pk_psndoc = tmp.pk_psndoc
    AND e.orgrelaid = tmp.orgrelaid
  INNER JOIN aaaaaa a
     ON a.pk_psnorg = e.pk_psnorg
    AND a.lastflag = 'Y'
    AND a.ismainjob = 'Y'
  INNER JOIN oooooo o
     ON o.PK_ADMINORG = a.PK_ORG
  WHERE e.indocflag = 'Y'
    AND e.psntype = 0
    AND e.endflag = 'N'
    AND a.endflag = 'N'
    AND a.pk_group = '0001A110000000000JQ6'
    AND length(b.birthdate) = 10
    AND a.pk_hrorg IN ('0001A11000000000L7XF')
    AND substring(b.birthdate, 6, 5) >= '05-10'
    AND substring(b.birthdate, 6, 5) <= '06-09'
  ORDER BY substring(b.birthdate, 6, 5),
           substring(b.birthdate, 1, 4),
           a.PK_ORG,
           a.PK_DEPT,
           b.CODE;
           
           
                                                           QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=43067.37..43067.37 rows=1 width=1887) (actual time=21375.999..21376.005 rows=38 loops=1)
   Sort Key: ("substring"((b.birthdate)::text, 6, 5)), ("substring"((b.birthdate)::text, 1, 4)), a.pk_org, a.pk_dept, b.code
   Sort Method: quicksort  Memory: 63kB
   ->  Nested Loop  (cost=8706.95..43067.36 rows=1 width=1887) (actual time=137.485..21375.586 rows=38 loops=1)
         ->  Nested Loop  (cost=8706.68..43067.03 rows=1 width=1834) (actual time=137.458..21374.882 rows=38 loops=1)
               ->  Nested Loop  (cost=8706.26..43064.13 rows=1 width=45) (actual time=54.801..21358.779 rows=620 loops=1)
                     Join Filter: ((b.pk_psndoc)::text = (e.pk_psndoc)::text)
                     ->  Nested Loop  (cost=8705.85..43062.82 rows=1 width=68) (actual time=54.750..21244.441 rows=2961 loops=1)
                           Join Filter: ((b.pk_psndoc)::text = (eeeeee.pk_psndoc)::text)
                           Rows Removed by Join Filter: 66180959
                           ->  Seq Scan on bbbbbb b  (cost=0.00..32921.80 rows=1 width=45) (actual time=0.014..79.239 rows=2969 loops=1)
                                 Filter: (("substring"((birthdate)::text, 6, 5) >= '05-10'::text) AND ("substring"((birthdate)::text, 6, 5) <= '06-09'::text) AND (length((birthdate)::text)
= 10))
                                 Rows Removed by Filter: 41327
                           ->  HashAggregate  (cost=8705.85..9147.44 rows=44159 width=23) (actual time=0.014..4.810 rows=22292 loops=2969)
                                 Group Key: eeeeee.pk_psndoc
                                 ->  Seq Scan on eeeeee  (cost=0.00..8484.91 rows=44187 width=23) (actual time=0.005..21.398 rows=44216 loops=1)
                                       Filter: (indocflag = 'Y'::bpchar)
                                       Rows Removed by Filter: 105
                     ->  Index Scan using i_hi_psnorg_1 on eeeeee e  (cost=0.41..1.30 rows=1 width=44) (actual time=0.033..0.033 rows=0 loops=2961)
                           Index Cond: ((pk_psndoc)::text = (eeeeee.pk_psndoc)::text)
                           Filter: ((indocflag = 'Y'::bpchar) AND (psntype = 0) AND (endflag = 'N'::bpchar) AND ((max(eeeeee.orgrelaid)) = orgrelaid))
                           Rows Removed by Filter: 1
               ->  Index Scan using i_hi_psnjob_8 on aaaaaa a  (cost=0.41..2.89 rows=1 width=1810) (actual time=0.024..0.024 rows=0 loops=620)
                     Index Cond: ((pk_psnorg)::text = (e.pk_psnorg)::text)
                     Filter: ((lastflag = 'Y'::bpchar) AND (ismainjob = 'Y'::bpchar) AND (endflag = 'N'::bpchar) AND (pk_group = '0001A110000000000JQ6'::bpchar) AND ((pk_hrorg)::text = '000
1A11000000000L7XF'::text))
                     Rows Removed by Filter: 1
         ->  Index Only Scan using i_pk_adminorg on oooooo o  (cost=0.27..0.32 rows=1 width=21) (actual time=0.011..0.011 rows=1 loops=38)
               Index Cond: (pk_adminorg = a.pk_org)
               Heap Fetches: 0
 Planning Time: 4.773 ms
 Execution Time: 21376.659 ms
(31 rows)

原廠大佬覺得可以加索引解決問題,但是在下確不這麼認為,還是等價改了一個版本讓原廠大佬測試下:

SELECT a.*
   FROM bbbbbb b
  INNER JOIN eeeeee e
     ON b.pk_psndoc = e.pk_psndoc
  INNER JOIN (SELECT max(orgrelaid) AS orgrelaid, pk_psndoc
                FROM eeeeee
               WHERE indocflag = 'Y' AND ROWNUM > 0
               GROUP BY pk_psndoc) tmp
     ON e.pk_psndoc = tmp.pk_psndoc
    AND e.orgrelaid = tmp.orgrelaid
  INNER JOIN aaaaaa a
     ON a.pk_psnorg = e.pk_psnorg
    AND a.lastflag = 'Y'
    AND a.ismainjob = 'Y'
  INNER JOIN oooooo o
     ON o.PK_ADMINORG = a.PK_ORG
  WHERE e.indocflag = 'Y'
    AND e.psntype = 0
    AND e.endflag = 'N'
    AND a.endflag = 'N'
    AND a.pk_group = '0001A110000000000JQ6'
    AND length(b.birthdate) = 10
    AND a.pk_hrorg IN ('0001A11000000000L7XF')
    AND substring(b.birthdate, 6, 5) >= '05-10'
    AND substring(b.birthdate, 6, 5) <= '06-09'
  ORDER BY substring(b.birthdate, 6, 5),
           substring(b.birthdate, 1, 4),
           a.PK_ORG,
           a.PK_DEPT,
           b.CODE;
           

改寫後這條語句 2.3 秒能跑出結果,比之前速度快了10倍。

原廠大佬最後還不遺餘力來請教我這條SQL最佳化的原理,這愛上進,愛學習的精神,真的是領人感到敬佩!!!😁😁😁

本文內容都是開玩笑的,大家當段子看就行。

相關文章