德哥的最佳化思路巨牛逼,這種遞迴思維真的太吊了,我目前就缺遞迴思路。
下面SQL1000W行資料,列的選擇性很低,只有兩個值('1'和'11')都是字串型別,'1'只有一條資料,'11'有9999999行資料。
慢SQL:
select distinct col from tt; QUERY PLAN ---------------------------------------------------------------------------------------------------------------------- HashAggregate (cost=169247.11..169247.12 rows=1 width=3) (actual time=5082.733..5082.735 rows=2 loops=1) Group Key: col -> Seq Scan on tt (cost=0.00..144247.29 rows=9999929 width=3) (actual time=0.005..275.906 rows=10000000 loops=1) Planning Time: 0.365 ms Execution Time: 5082.772 ms (5 行記錄)
CTE遞迴最佳化:
WITH RECURSIVE t AS ( (SELECT col FROM tt ORDER BY col LIMIT 1) UNION ALL SELECT (SELECT col FROM tt WHERE col > t.col ORDER BY col LIMIT 1) FROM t WHERE t.col IS NOT NULL ) SELECT col FROM t WHERE col IS NOT NULL; QUERY PLAN ---------------------------------------------------------------------------------------------------------------------------------------------------------- CTE Scan on t (cost=50.84..52.86 rows=100 width=38) (actual time=0.024..0.079 rows=2 loops=1) Filter: (col IS NOT NULL) Rows Removed by Filter: 1 CTE t -> Recursive Union (cost=0.43..50.84 rows=101 width=38) (actual time=0.022..0.076 rows=3 loops=1) -> Limit (cost=0.43..0.46 rows=1 width=3) (actual time=0.021..0.021 rows=1 loops=1) -> Index Only Scan using idx_1_2_tt on tt tt_1 (cost=0.43..260443.37 rows=9999929 width=3) (actual time=0.020..0.020 rows=1 loops=1) Heap Fetches: 0 -> WorkTable Scan on t t_1 (cost=0.00..4.84 rows=10 width=38) (actual time=0.017..0.017 rows=1 loops=3) Filter: (col IS NOT NULL) Rows Removed by Filter: 0 SubPlan 1 -> Limit (cost=0.43..0.46 rows=1 width=3) (actual time=0.024..0.024 rows=0 loops=2) -> Index Only Scan using idx_1_2_tt on tt (cost=0.43..95149.36 rows=3333310 width=3) (actual time=0.024..0.024 rows=0 loops=2) Index Cond: (col > (t_1.col)::text) Heap Fetches: 0 Planning Time: 0.096 ms Execution Time: 0.096 ms (18 行記錄)
裡面的邏輯是:
(SELECT col FROM tt ORDER BY col LIMIT 1)
根節點透過order by 升序 找到最小的一條資料作為起點。
遞迴查詢:
SELECT (SELECT col FROM tt WHERE col > t.col ORDER BY col LIMIT 1) FROM t WHERE t.col IS NOT NULL
在第一次迭代中,CTE t 包含值'1'。這個查詢將在tt表中尋找col大於'1'的最小值。在資料集中,這將是'11'。
在第二次迭代,CTE t 將包含'11'。此時,查詢將嘗試找到大於'11'的最小值,但沒有這樣的值,所以返回NULL。
遞迴結束:
當遞迴查詢返回NULL時,遞迴結束。這時,CTE t 將包含'1'和'11',返回和distinct 一樣邏輯的資料。
理解了整個邏輯後我都嚇尿了,就一道演算法題,確實要跟巨佬學習才行,加深遞迴思維。