今天晚上下班回來才有空看群,群友發了一條很簡單的慢SQL問怎麼最佳化。
非常簡單,我自己模擬的資料。
表結構:
-- auto-generated definition CREATE TABLE HHHHHH ( ID NUMBER NOT NULL PRIMARY KEY, NAME VARCHAR2(20), PARAGRAPH_ID NUMBER ) / CREATE INDEX IDX_1_2_PARAGRAPH_HIST_RULE ON HHHHHH (PARAGRAPH_ID) / CREATE INDEX IDX_1_2_NAME_HIST_RULE ON HHHHHH (NAME) /
資料量:
SQL> select count(1) from HHHHHH; COUNT(1) ---------- 200002 Elapsed: 00:00:00.00
慢SQL:
SELECT a.* FROM hhhhhh a WHERE a.name IN ( SELECT name from hhhhhh b GROUP BY b.name HAVING count(DISTINCT b.paragraph_id) = 1 ); Plan hash value: 1063187735 ------------------------------------------------------------------------------------ | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ------------------------------------------------------------------------------------ | 0 | SELECT STATEMENT | | 1 | 38 | 5 (20)| 00:00:01 | |* 1 | FILTER | | | | | | | 2 | TABLE ACCESS FULL | HHHHHH | 1 | 38 | 2 (0)| 00:00:01 | |* 3 | FILTER | | | | | | | 4 | HASH GROUP BY | | 1 | 25 | 3 (34)| 00:00:01 | | 5 | VIEW | VM_NWVW_1 | 1 | 25 | 3 (34)| 00:00:01 | | 6 | SORT GROUP BY | | 1 | 25 | 3 (34)| 00:00:01 | | 7 | TABLE ACCESS FULL| HHHHHH | 1 | 25 | 2 (0)| 00:00:01 | ------------------------------------------------------------------------------------ Predicate Information (identified by operation id): --------------------------------------------------- " 1 - filter( EXISTS (SELECT 0 FROM (SELECT ""B"".""PARAGRAPH_ID"" " " ""$vm_col_1"",""B"".""NAME"" ""$vm_col_2"" FROM ""HHHHHH"" ""B"" GROUP BY " " ""B"".""NAME"",""B"".""PARAGRAPH_ID"") ""VM_NWVW_1"" GROUP BY ""$vm_col_2"" HAVING " " ""$vm_col_2""=:B1 AND COUNT(""$vm_col_1"")=1))" " 3 - filter(""$vm_col_2""=:B1 AND COUNT(""$vm_col_1"")=1)"
跑了24秒沒出結果我就幹掉了,正常來說Oracle 這種遙遙領先的資料庫,不能100毫秒以內出結果都有問題。
簡單看了下上面的計劃 Predicate Information 謂詞資訊,裡面資訊很複雜,懶得解釋(其實我也不懂為啥CBO為啥這樣亂分組過濾),並沒啥卵用,感覺很SB。
一句話就是CBO等價改寫了 EXISTS 還有 :B1這種變數,每次都是傳個值到:B1 然後進行filter , 重點是每次。反正各位讀者以後在計劃中看到這種 :B1 變數都是每次每次,就是一次一次的傳值,比較完一個資料繼續傳。
這種按照 PG 的說法就是複雜的子連線無法提升, GROUP BY b.name HAVING count(DISTINCT b.paragraph_id) = 1 惹得鍋。
複雜的子連線無法提升參考 <<PostgreSQL技術內幕:查詢最佳化深度探索 >>這本書 3.2篇章。
加個HINT:
SELECT a.* FROM hhhhhh a WHERE a.name IN ( SELECT /*+ unnest */ name from hhhhhh b GROUP BY b.name HAVING count(DISTINCT b.paragraph_id) = 1 5 ); ID NAME PARAGRAPH_ID ---------- -------------------- ------------ 200002 aaaaa 10000001 Elapsed: 00:00:00.05 Plan hash value: 3353221841 ------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 50 | 5 (20)| 00:00:01 | |* 1 | HASH JOIN SEMI | | 1 | 50 | 5 (20)| 00:00:01 | | 2 | TABLE ACCESS FULL | HHHHHH | 1 | 38 | 2 (0)| 00:00:01 | | 3 | VIEW | VW_NSO_1 | 1 | 12 | 3 (34)| 00:00:01 | |* 4 | FILTER | | | | | | | 5 | HASH GROUP BY | | 1 | 25 | 3 (34)| 00:00:01 | | 6 | VIEW | VM_NWVW_2 | 1 | 25 | 3 (34)| 00:00:01 | | 7 | HASH GROUP BY | | 1 | 25 | 3 (34)| 00:00:01 | | 8 | TABLE ACCESS FULL| HHHHHH | 1 | 25 | 2 (0)| 00:00:01 | ------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- " 1 - access(""A"".""NAME""=""NAME"")" " 4 - filter(COUNT(""$vm_col_1"")=1)"
使用HINT將子連結強行提升(展開)以後,秒出。
但是使用HINT容易將執行計劃固定住,非必要情況下不推薦。
等價改寫該SQL 方式1:
SELECT A.* FROM HHHHHH A INNER JOIN (SELECT COUNT(1) BB, NAME FROM HHHHHH B 5 GROUP BY NAME) B ON A.NAME = B.NAME AND B.BB = 1; ID NAME PARAGRAPH_ID ---------- -------------------- ------------ 200002 aaaaa 10000001 Elapsed: 00:00:00.03 Plan hash value: 3909860973 -------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | -------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 50 | 5 (20)| 00:00:01 | |* 1 | HASH JOIN | | 1 | 50 | 5 (20)| 00:00:01 | | 2 | TABLE ACCESS FULL | HHHHHH | 1 | 38 | 2 (0)| 00:00:01 | | 3 | VIEW | | 1 | 12 | 3 (34)| 00:00:01 | |* 4 | FILTER | | | | | | | 5 | HASH GROUP BY | | 1 | 12 | 3 (34)| 00:00:01 | | 6 | TABLE ACCESS FULL| HHHHHH | 1 | 12 | 2 (0)| 00:00:01 | -------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- " 1 - access(""A"".""NAME""=""B"".""NAME"")" 4 - filter(COUNT(*)=1)
改寫成 join 以後也是秒出。
等價改寫該SQL 方式2:
SELECT X.ID, X.NAME, X.PARAGRAPH_ID FROM (SELECT A.*, COUNT(DISTINCT PARAGRAPH_ID) OVER (PARTITION BY NAME) CNT FROM HHHHHH A) X 5 WHERE X.CNT = 1; ID NAME PARAGRAPH_ID ---------- -------------------- ------------ 200002 aaaaa 10000001 Elapsed: 00:00:00.07 Plan hash value: 2750561680 ------------------------------------------------------------------------------ | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ------------------------------------------------------------------------------ | 0 | SELECT STATEMENT | | 1 | 51 | 3 (34)| 00:00:01 | |* 1 | VIEW | | 1 | 51 | 3 (34)| 00:00:01 | | 2 | WINDOW SORT | | 1 | 38 | 3 (34)| 00:00:01 | | 3 | TABLE ACCESS FULL| HHHHHH | 1 | 38 | 2 (0)| 00:00:01 | ------------------------------------------------------------------------------ Predicate Information (identified by operation id): --------------------------------------------------- " 1 - filter(""X"".""CNT""=1)"
改寫成開窗函式以後也是秒出。
<<PostgreSQL技術內幕:查詢最佳化深度探索 >> 這本書是真的不錯,偷偷刷了好幾次,每次看完都有新的理解。