PostgreSQL 原始碼解讀(42)- 查詢語句#27(等價類)

husthxd發表於2018-09-14

上一小節介紹了函式query_planner中子函式子函式build_base_rel_tlists/find_placeholders_in_jointree/find_lateral_references的實現邏輯,本節介紹下一個子函式deconstruct_jointree函式中涉及的數學知識:等價類以及等價類在PG中的應用例項。

一、數學基礎

等價關係
等價關係(equivalence relation)即設R是某個集合A上的一個二元關係。若R滿足以下條件:
1.自反性:∀x∈A,xRx
2.對稱性:∀x,y∈A,xRy ⇒ yRx
3.傳遞性:∀x,y,z∈A,(xRy ∧ yRz) ⇒ xRz
則稱R是一個定義在A上的等價關係。習慣上會把等價關係的符號由R改寫為∼
詳細的說明和例子詳見維基百科

等價類
假設在一個集合A上定義一個等價關係(用∼表示),則A中的某個元素x的等價類就是在A中等價於x的所有元素所形成的子集:
[x] = {y∈A,y∼x}
詳細的說明和例子詳見維基百科

二、應用

PG在函式deconstruct_jointree中對約束條件子句(qual clauses)進行掃描,可能會在非外連線的連線條件中找到等式,如A=B,這時候會建立一個等價類(記作{A,B}).如果在後面發現另一個等式,如B=C,則把C加到已存在的等價類{A,B}中,即{A,B,C}。透過這個步驟,就產生了一些等價類,這些等價類中任何一對成員的等值關係都可以作為約束或者連線的條件.
這樣做的好處,一是可以透過這樣的簡化,只需要驗證部分等值條件,而不需要驗證所有的等值條件,就可以去掉一些多餘的等價類條件約束;二是從相反的方向上考量,最佳化器在準備一個約束或者連線條件列表的時候,檢查每個等價類是否能夠派生新的滿足當前約束或者連線的隱式條件。例如在上面的例子中,可以依據等價類{A,B,C}產生一個A=C的條件,這樣最佳化器就可以嘗試探索新的最佳化連線路徑。

例一:

testdb=# explain verbose select t1.dwbh,t2.grbh
testdb-# from t_dwxx t1,t_grxx t2
testdb-# where t1.dwbh = t2.dwbh and t1.dwbh = '1001';
                               QUERY PLAN                               
------------------------------------------------------------------------
 Nested Loop  (cost=0.00..16.06 rows=2 width=76)
   Output: t1.dwbh, t2.grbh -- 連線,無需執行filter操作
   ->  Seq Scan on public.t_dwxx t1  (cost=0.00..1.04 rows=1 width=38)
         Output: t1.dwmc, t1.dwbh, t1.dwdz
         Filter: ((t1.dwbh)::text = '1001'::text) -- 連線前完成選擇操作
   ->  Seq Scan on public.t_grxx t2  (cost=0.00..15.00 rows=2 width=76)
         Output: t2.dwbh, t2.grbh, t2.xm, t2.xb, t2.nl
         Filter: ((t2.dwbh)::text = '1001'::text) -- 連線前完成選擇操作
(8 rows)

約束條件:t1.dwbh = t2.dwbh and t1.dwbh = '1001',其相應的等價類可以簡略的理解為:{t1.dwbh,t2.dwbh,'1001'},根據傳遞性,那麼可以推匯出隱性約束條件:t2.dwbh='1001',在連線前把謂詞t1.dwbh='1001'和t2.dwbh='1002'下推到基表掃描中,減少連線的元組數量,從而實現查詢最佳化.從查詢計劃來看,PG實際也是這樣處理的.
下面的SQL語句則不具備以上條件,因此在連線過程中還需要執行join filter.

testdb=# explain verbose select t1.dwbh,t2.grbh
testdb-# from t_dwxx t1,t_grxx t2
testdb-# where t1.dwbh = t2.dwbh and t1.dwdz like '廣東省%';
                                QUERY PLAN                                
--------------------------------------------------------------------------
 Nested Loop  (cost=0.00..20.04 rows=2 width=76)
   Output: t1.dwbh, t2.grbh
   Join Filter: ((t1.dwbh)::text = (t2.dwbh)::text)
   ->  Seq Scan on public.t_dwxx t1  (cost=0.00..1.04 rows=1 width=38)
         Output: t1.dwmc, t1.dwbh, t1.dwdz
         Filter: ((t1.dwdz)::text ~~ '廣東省%'::text)
   ->  Seq Scan on public.t_grxx t2  (cost=0.00..14.00 rows=400 width=76)
         Output: t2.dwbh, t2.grbh, t2.xm, t2.xb, t2.nl
(8 rows)

例二:
測試指令碼如下:

testdb=# explain verbose select t1.dwbh,t2.grbh
from t_dwxx t1,t_grxx t2
where t1.dwbh = t2.dwbh 
order by t2.dwbh;
                                    QUERY PLAN                                     
-----------------------------------------------------------------------------------
 Sort  (cost=20.18..20.19 rows=6 width=114)
   Output: t1.dwbh, t2.grbh, t2.dwbh
   Sort Key: t1.dwbh
   ->  Hash Join  (cost=1.07..20.10 rows=6 width=114)
         Output: t1.dwbh, t2.grbh, t2.dwbh
         Inner Unique: true
         Hash Cond: ((t2.dwbh)::text = (t1.dwbh)::text)
         ->  Seq Scan on public.t_grxx t2  (cost=0.00..14.00 rows=400 width=76)
               Output: t2.dwbh, t2.grbh, t2.xm, t2.xb, t2.nl
         ->  Hash  (cost=1.03..1.03 rows=3 width=38)
               Output: t1.dwbh
               ->  Seq Scan on public.t_dwxx t1  (cost=0.00..1.03 rows=3 width=38)
                     Output: t1.dwbh
(13 rows)

從執行計劃來看,雖然明確指定按t2.dwbh進行排序,但實際上按t1.dwbh進行排序.原因是存在等價類{t1.dwbh,t2.dwbh},不管在t1.dwbh還是t2.dwbh上排序,結果都是一樣的,但t1.dwbh上排序,成本更低,因此優先選擇此Key.

值得注意的是:PG的等價類只有在有等式條件(不包括外連線)和排序的情況下才產生,作為最佳化的一個方向可以考慮存在不等式時,把謂詞進行下推.

testdb=# explain verbose select t1.dwbh,t2.grbh
from t_dwxx t1,t_grxx t2
where t1.dwbh = t2.dwbh and t1.dwbh > '1001';
                                QUERY PLAN                                
--------------------------------------------------------------------------
 Nested Loop  (cost=0.00..20.04 rows=2 width=76)
   Output: t1.dwbh, t2.grbh
   Join Filter: ((t1.dwbh)::text = (t2.dwbh)::text) -- 這一步必不可少
   ->  Seq Scan on public.t_dwxx t1  (cost=0.00..1.04 rows=1 width=38)
         Output: t1.dwmc, t1.dwbh, t1.dwdz
         Filter: ((t1.dwbh)::text > '1001'::text) -- Filter過濾
   ->  Seq Scan on public.t_grxx t2  (cost=0.00..14.00 rows=400 width=76)
         Output: t2.dwbh, t2.grbh, t2.xm, t2.xb, t2.nl -- 全表掃描,可以考慮加入Filter
(8 rows)

三、參考資料

等價關係
等價類
關係代數
查詢最佳化基礎

附:query_planner中的程式碼片段

     //...
     /*
      * Examine the targetlist and join tree, adding entries to baserel
      * targetlists for all referenced Vars, and generating PlaceHolderInfo
      * entries for all referenced PlaceHolderVars.  Restrict and join clauses
      * are added to appropriate lists belonging to the mentioned relations. We
      * also build EquivalenceClasses for provably equivalent expressions. The
      * SpecialJoinInfo list is also built to hold information about join order
      * restrictions.  Finally, we form a target joinlist for make_one_rel() to
      * work from.
      */
     build_base_rel_tlists(root, tlist);//構建"base rels"的投影列
 
     find_placeholders_in_jointree(root);//處理jointree中的PHI
 
     find_lateral_references(root);//處理jointree中Lateral依賴
 
     joinlist = deconstruct_jointree(root);//分解jointree

     /*
      * Reconsider any postponed outer-join quals now that we have built up
      * equivalence classes.  (This could result in further additions or
      * mergings of classes.)
      */
     reconsider_outer_join_clauses(root);//已建立等價類,那麼需要重新考慮被下推後處理的外連線表示式
 
     /*
      * If we formed any equivalence classes, generate additional restriction
      * clauses as appropriate.  (Implied join clauses are formed on-the-fly
      * later.)
      */
     generate_base_implied_equalities(root);//等價類構建後,生成因此外加的約束語句
 
     //...

來自 “ ITPUB部落格 ” ,連結:http://blog.itpub.net/6906/viewspace-2374869/,如需轉載,請註明出處,否則將追究法律責任。

相關文章