你還在這樣寫SQL嗎?趕緊改改吧

大能貓貓發表於2020-12-19

今天我們來盤點一下SQL常見錯誤用法以及如何去改正~

霍霍~開搞

SQL常見錯誤用法

用法一 分頁查詢

這是最常用的場景之一,同樣也是最容易出問題的地方,看這個sql

SELECT *
FROM   test_paper
WHERE  type = 0
AND name = '測試試卷'
ORDER  BY create_time
LIMIT  1000, 10;

你一般想到的辦法應該是給 type、name、create_time欄位加組合索引,這樣條件排序就能有效利用到索引,效能迅速提升~

BUT

這隻能解決90%的問題

假如 LIMIT子句變成“LIMIT 10000000,10”呢?

要知道DB不知道10000000是從什麼地方開始,即使你加了索引也需要從頭計算,進而出現效能問題

場景一解決方案

在前端瀏覽翻頁資料時或者需要大資料批量匯出等場景下,我們可以將上一頁的最大值當成引數作為查詢條件的~

SELECT   *
FROM     test_paper
WHERE    type = 0 
AND      name = '測試試卷' 
AND      create_time > '2017-03-16 14:00:00' 
ORDER BY create_time limit 10;

動動你的小手試試吧

用法二 中間結果集下推

SELECT    a.*,
          c.allocated
FROM      (
                   SELECT   resourceid
                   FROM     my_distribute d
                   WHERE    isdelete = 0 
                   AND      cusmanagercode = '1234567' 
                   ORDER BY salecode limit 20) a
LEFT JOIN 
          (
                   SELECT   resourcesid, sum(ifnull(allocation, 0) * 12345) allocated
                   FROM     my_resources
                   GROUP BY resourcesid) c
ON        a.resourceid = c.resourcesid

那麼該語句還存在其它問題嗎?不難看出子查詢 c 是全表聚合查詢,在表數量特別大的情況下會導致整個語句的效能下降。

其實對於子查詢 c,左連線最後結果集只關心能和主表resourceid能匹配的資料。因此我們可以重寫語句如下,執行時間從原來的2秒下降到2毫秒。

SELECT    a.*,
          c.allocated
FROM      (
                   SELECT   resourceid
                   FROM     my_distribute d
                   WHERE    isdelete = 0 
                   AND      cusmanagercode = '1234567' 
                   ORDER BY salecode limit 20) a
LEFT JOIN 
          (
                   SELECT   resourcesid, sum(ifnull(allocation, 0) * 12345) allocated
                   FROM     my_resources r,
                            (
                                     SELECT   resourceid
                                     FROM     my_distribute d
                                     WHERE    isdelete = 0 
                                     AND      cusmanagercode = '1234567' 
                                     ORDER BY salecode limit 20) a
                   WHERE    r.resourcesid = a.resourcesid
                   GROUP BY resourcesid) c
ON        a.resourceid = c.resourcesid

但是子查詢 a 在我們的SQL語句中出現了多次。這種寫法不僅存在額外的開銷,還使得整個語句顯的繁雜。使用WITH語句再次重寫:

WITH a AS
(
         SELECT   resourceid
         FROM     my_distribute d
         WHERE    isdelete = 0 
         AND      cusmanagercode = '1234567' 
         ORDER BY salecode limit 20)
SELECT    a.*,
          c.allocated
FROM      a
LEFT JOIN 
          (
                   SELECT   resourcesid, sum(ifnull(allocation, 0) * 12345) allocated
                   FROM     my_resources r,
                            a
                   WHERE    r.resourcesid = a.resourcesid
                   GROUP BY resourcesid) c
ON        a.resourceid = c.resourcesid

用法三 關聯更新與刪除

UPDATE operation o
SET    status = 'applying'
WHERE  o.id IN (SELECT id
FROM   (SELECT o.id,
o.status
FROM   operation o
WHERE  o.group = 123
AND o.status NOT IN ( 'done' )
ORDER  BY o.parent,
o.id
LIMIT  1) t);

上邊語句,MySQL 實際執行的是迴圈/巢狀子查詢(DEPENDENT SUBQUERY),其執行時間可想而知。

重寫為JOIN之後,子查詢的選擇模式從DEPENDENT SUBQUERY變成DERIVED,執行速度大大加快,從7秒降低到2毫秒。

UPDATE operation o
       JOIN  (SELECT o.id,
                            o.status
                     FROM   operation o
                     WHERE  o.group = 123 
                            AND o.status NOT IN ( 'done' )
                     ORDER  BY o.parent,
                               o.id
                     LIMIT  1) t
         ON o.id = t.id
SET    status = 'applying'

用法四 混合排序

SELECT *
FROM   my_order o
       INNER JOIN my_appraise a ON a.orderid = o.id
ORDER  BY a.is_reply ASC,
          a.appraise_time DESC 
LIMIT  0, 20

由於is_reply只有0和1兩種狀態,我們按照下面的方法重寫後,執行時間從1.58秒降低到2毫秒。

SELECT *
FROM   ((SELECT *
         FROM   my_order o
                INNER JOIN my_appraise a
                        ON a.orderid = o.id
                           AND is_reply = 0 
         ORDER  BY appraise_time DESC 
         LIMIT  0, 20)
        UNION ALL
        (SELECT *
         FROM   my_order o
                INNER JOIN my_appraise a
                        ON a.orderid = o.id
                           AND is_reply = 1 
         ORDER  BY appraise_time DESC 
         LIMIT  0, 20)) t
ORDER  BY  is_reply ASC,
          appraisetime DESC 
LIMIT  20;

用法五 EXISTS

SELECT *
FROM   my_neighbor n
       LEFT JOIN my_neighbor_apply sra
              ON n.id = sra.neighbor_id
                 AND sra.user_id = 'xxx' 
WHERE  n.topic_status < 4 
       AND EXISTS(SELECT 1 
                  FROM   message_info m
                  WHERE  n.id = m.neighbor_id
                         AND m.inuser = 'xxx')
       AND n.topic_type <> 5

去掉exists更改為join,能夠避免巢狀子查詢,將執行時間從1.93秒降低為1毫秒。

SELECT *
FROM   my_neighbor n
       INNER JOIN message_info m
               ON n.id = m.neighbor_id
                  AND m.inuser = 'xxx' 
       LEFT JOIN my_neighbor_apply sra
              ON n.id = sra.neighbor_id
                 AND sra.user_id = 'xxx' 
WHERE  n.topic_status < 4 
       AND n.topic_type <> 5

用法六 複雜檢視

外部查詢條件不能夠下推到複雜的檢視或子查詢的情況有:

  1. 聚合子查詢;
  2. 含有LIMIT的子查詢;
  3. UNION 或UNION ALL子查詢;
  4. 輸出欄位中的子查詢;
SELECT *
FROM   (SELECT target,
               Count(*)
        FROM   operation
        GROUP  BY target) t
WHERE  target = 'rm-xxxx'

確定從語義上查詢條件可以直接下推後,重寫如下:

SELECT target,
       Count(*)
FROM   operation
WHERE  target = 'rm-xxxx' 
GROUP  BY target

用法七 提前縮小範圍

SELECT *
FROM   my_order o
       LEFT JOIN my_userinfo u
              ON o.uid = u.uid
       LEFT JOIN my_productinfo p
              ON o.pid = p.pid
WHERE  ( o.display = 0 )
       AND ( o.ostaus = 1 )
ORDER  BY o.selltime DESC 
LIMIT  0, 15

初始SQL
該SQL語句原意是:先做一系列的左連線,然後排序取前15條記錄。從執行計劃也可以看出,最後一步估算排序記錄數為90萬,時間消耗為12秒。

由於最後WHERE條件以及排序均針對最左主表,因此可以先對my_order排序提前縮小資料量再做左連線。SQL重寫後如下,執行時間縮小為1毫秒左右。

SELECT *
FROM (
SELECT *
FROM   my_order o
WHERE  ( o.display = 0 )
       AND ( o.ostaus = 1 )
ORDER  BY o.selltime DESC 
LIMIT  0, 15
) o
     LEFT JOIN my_userinfo u
              ON o.uid = u.uid
     LEFT JOIN my_productinfo p
              ON o.pid = p.pid
ORDER BY  o.selltime DESC
limit 0, 15

再檢查執行計劃:子查詢物化後(select_type=DERIVED)參與JOIN。雖然估算行掃描仍然為90萬,但是利用了索引以及LIMIT 子句後,實際執行時間變得很小。

總結:

資料庫編譯器產生執行計劃,決定著SQL的實際執行方式。但是編譯器只是盡力服務,所有資料庫的編譯器都不是盡善盡美的。上述提到的多數場景,在其它資料庫中也存在效能問題。瞭解資料庫編譯器的特性,才能避規其短處,寫出高效能的SQL語句。

程式設計師在設計資料模型以及編寫SQL語句時,要把演算法的思想或意識帶進來。編寫複雜SQL語句要養成使用WITH語句的習慣。簡潔且思路清晰的SQL語句也能減小資料庫的負擔

相關文章