PostgreSQL Source Code Walkthrough (98) - Partitioned Tables #4 (Query Routing #1: "Expanding" Partitioned Tables)

Posted by husthxd on 2018-11-28

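When a partitioned table is queried, how does PG decide which partitions to scan, and what mechanism does it rely on? The next several installments cover this step by step; this is the first part.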

0. Implementation Mechanism

Let's start with an example: two ordinary tables, t_normal_1 and t_normal_2, combined with UNION ALL:

drop table if exists t_normal_1;
drop table if exists t_normal_2;
create table t_normal_1 (c1 int not null,c2  varchar(40),c3 varchar(40));
create table t_normal_2 (c1 int not null,c2  varchar(40),c3 varchar(40));

insert into t_normal_1(c1,c2,c3) VALUES(0,'HASH0','HAHS0');
insert into t_normal_2(c1,c2,c3) VALUES(0,'HASH0','HAHS0');

testdb=# explain verbose select * from t_normal_1 where c1 = 0
testdb-# union all
testdb-# select * from t_normal_2 where c1 <> 0;
                                 QUERY PLAN                                 
----------------------------------------------------------------------------
 Append  (cost=0.00..34.00 rows=350 width=200)
   ->  Seq Scan on public.t_normal_1  (cost=0.00..14.38 rows=2 width=200)
         Output: t_normal_1.c1, t_normal_1.c2, t_normal_1.c3
         Filter: (t_normal_1.c1 = 0)
   ->  Seq Scan on public.t_normal_2  (cost=0.00..14.38 rows=348 width=200)
         Output: t_normal_2.c1, t_normal_2.c2, t_normal_2.c3
         Filter: (t_normal_2.c1 <> 0)
(7 rows)

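For a UNION ALL over two ordinary tables, PG uses an Append node to concatenate the result set of the sequential scan on t_normal_1 with the result set of the sequential scan on t_normal_2, and returns the combined rows as the final result.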

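Queries against a partitioned table work in much the same way: the result sets of the individual partitions are appended together and returned as the final result, as the following example shows: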

testdb=# explain verbose select * from t_hash_partition where c1 = 1 OR c1 = 2;
                                     QUERY PLAN                                      
-------------------------------------------------------------------------------------
 Append  (cost=0.00..30.53 rows=6 width=200)
   ->  Seq Scan on public.t_hash_partition_1  (cost=0.00..15.25 rows=3 width=200)
         Output: t_hash_partition_1.c1, t_hash_partition_1.c2, t_hash_partition_1.c3
         Filter: ((t_hash_partition_1.c1 = 1) OR (t_hash_partition_1.c1 = 2))
   ->  Seq Scan on public.t_hash_partition_3  (cost=0.00..15.25 rows=3 width=200)
         Output: t_hash_partition_3.c1, t_hash_partition_3.c2, t_hash_partition_3.c3
         Filter: ((t_hash_partition_3.c1 = 1) OR (t_hash_partition_3.c1 = 2))
(7 rows)

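The query on the partitioned table t_hash_partition has the condition c1 = 1 OR c1 = 2. The plan shows that the sequential-scan results of t_hash_partition_1 and t_hash_partition_3 are appended together to form the final result set.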

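Several problems have to be solved to make this work:
1. Recognize a partitioned table and find all of its child partitions;
2. Use the query's restriction clauses to identify which partitions actually need to be scanned (this is a performance optimization);
3. Append the result sets and return them as the final output.
This installment covers the first item: how PG recognizes a partitioned table and finds all of its partitions. The function that does this is expand_inherited_tables.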

1. Data Structures

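AppendRelInfo
Append-relation information.
When an inheritable table (which includes a partitioned table) or a UNION ALL subquery is expanded into an "append relation" (essentially a list of child RTEs), one AppendRelInfo is built for each child RTE.
The list of AppendRelInfos indicates which child RTEs must be included when the parent is expanded, and each node carries the information needed to translate Vars that reference the parent into Vars that reference that child.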

/*
 * Append-relation info.
 * 
 * When we expand an inheritable table or a UNION-ALL subselect into an
 * "append relation" (essentially, a list of child RTEs), we build an
 * AppendRelInfo for each child RTE.  The list of AppendRelInfos indicates
 * which child RTEs must be included when expanding the parent, and each node
 * carries information needed to translate Vars referencing the parent into
 * Vars referencing that child.
 * 
 * These structs are kept in the PlannerInfo node's append_rel_list.
 * Note that we just throw all the structs into one list, and scan the
 * whole list when desiring to expand any one parent.  We could have used
 * a more complex data structure (eg, one list per parent), but this would
 * be harder to update during operations such as pulling up subqueries,
 * and not really any easier to scan.  Considering that typical queries
 * will not have many different append parents, it doesn't seem worthwhile
 * to complicate things.
 * 
 * Note: after completion of the planner prep phase, any given RTE is an
 * append parent having entries in append_rel_list if and only if its
 * "inh" flag is set.  We clear "inh" for plain tables that turn out not
 * to have inheritance children, and (in an abuse of the original meaning
 * of the flag) we set "inh" for subquery RTEs that turn out to be
 * flattenable UNION ALL queries.  This lets us avoid useless searches
 * of append_rel_list.
 * 
 * Note: the data structure assumes that append-rel members are single
 * baserels.  This is OK for inheritance, but it prevents us from pulling
 * up a UNION ALL member subquery if it contains a join.  While that could
 * be fixed with a more complex data structure, at present there's not much
 * point because no improvement in the plan could result.
 */

typedef struct AppendRelInfo
{
    NodeTag     type;

    /*
     * These fields uniquely identify this append relationship.  There can be
     * (in fact, always should be) multiple AppendRelInfos for the same
     * parent_relid, but never more than one per child_relid, since a given
     * RTE cannot be a child of more than one append parent.
     */
    Index       parent_relid;   /* RT index of append parent rel */
    Index       child_relid;    /* RT index of append child rel */

    /*
     * For an inheritance appendrel, the parent and child are both regular
     * relations, and we store their rowtype OIDs here for use in translating
     * whole-row Vars.  For a UNION-ALL appendrel, the parent and child are
     * both subqueries with no named rowtype, and we store InvalidOid here.
     */
    Oid         parent_reltype; /* OID of parent's composite type */
    Oid         child_reltype;  /* OID of child's composite type */

    /*
     * The N'th element of this list is a Var or expression representing the
     * child column corresponding to the N'th column of the parent. This is
     * used to translate Vars referencing the parent rel into references to
     * the child.  A list element is NULL if it corresponds to a dropped
     * column of the parent (this is only possible for inheritance cases, not
     * UNION ALL).  The list elements are always simple Vars for inheritance
     * cases, but can be arbitrary expressions in UNION ALL cases.
     *
     * Notice we only store entries for user columns (attno > 0).  Whole-row
     * Vars are special-cased, and system columns (attno < 0) need no special
     * translation since their attnos are the same for all tables.
     *
     * Caution: the Vars have varlevelsup = 0.  Be careful to adjust as needed
     * when copying into a subquery.
     */
    List       *translated_vars;    /* Expressions in the child's Vars */

    /*
     * We store the parent table's OID here for inheritance, or InvalidOid for
     * UNION ALL.  This is only needed to help in generating error messages if
     * an attempt is made to reference a dropped parent column.
     * 我們將父表的OID儲存在這裡用於繼承,
     *   如為UNION ALL,則這裡儲存的是InvalidOid。
     * 只有在試圖引用已刪除的父列時,才需要這樣做來幫助生成錯誤訊息。
     */
    Oid         parent_reloid;  /* OID of parent relation */
} AppendRelInfo;
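
To make the roles of the flat append_rel_list and of translated_vars more concrete, here is a minimal stand-alone C sketch. It is not PostgreSQL code: ToyAppendRel, toy_find_children and toy_translate_attno are hypothetical names, and the column mapping is reduced to plain attribute numbers (with 0 standing in for the NULL entry of a dropped parent column) instead of Var nodes. Whole-row references (attno 0) are left out, since the real code special-cases them.

```c
#include <assert.h>

/* Toy stand-in for AppendRelInfo; NOT the PostgreSQL definition. */
typedef struct ToyAppendRel
{
    unsigned    parent_relid;   /* RT index of the append parent */
    unsigned    child_relid;    /* RT index of the append child */
    const int  *child_attnos;   /* entry n-1: child column for parent column n,
                                 * or 0 if that parent column was dropped */
    int         nattrs;         /* number of user columns in the parent */
} ToyAppendRel;

/*
 * Collect the children of one parent by scanning the whole flat list,
 * which mirrors the "one list for all parents" design described above.
 * Returns the number of child RT indexes written to out[].
 */
int
toy_find_children(const ToyAppendRel *list, int len, unsigned parent_relid,
                  unsigned *out, int out_cap)
{
    int         n = 0;

    for (int i = 0; i < len; i++)
    {
        if (list[i].parent_relid == parent_relid && n < out_cap)
            out[n++] = list[i].child_relid;
    }
    return n;
}

/*
 * Translate a parent column number into the child's column number.
 * System columns (attno < 0) are returned unchanged because they are
 * the same for all tables; user columns go through the mapping.
 */
int
toy_translate_attno(const ToyAppendRel *rel, int parent_attno)
{
    if (parent_attno < 0)
        return parent_attno;
    assert(parent_attno >= 1 && parent_attno <= rel->nattrs);
    return rel->child_attnos[parent_attno - 1];
}
```

With a mapping of {1, 0, 2}, for example, parent column 3 maps to child column 2, while parent column 2 yields 0 because the toy treats it as dropped.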

PlannerInfo
This structure holds the per-query information used while a statement is being planned and optimized.

/*----------
 * PlannerInfo
 *      Per-query information for planning/optimization
 * 
 * This struct is conventionally called "root" in all the planner routines.
 * It holds links to all of the planner's working state, in addition to the
 * original Query.  Note that at present the planner extensively modifies
 * the passed-in Query data structure; someday that should stop.
 *----------
 */
struct AppendRelInfo;

typedef struct PlannerInfo
{
    NodeTag     type;
    Query      *parse;          /* the Query being planned */
    PlannerGlobal *glob;        /* global info for current planner run */
    Index       query_level;    /* 1 at the outermost Query */
    struct PlannerInfo *parent_root;    /* NULL at outermost Query */

    /*
     * plan_params contains the expressions that this query level needs to
     * make available to a lower query level that is currently being planned.
     * outer_params contains the paramIds of PARAM_EXEC Params that outer
     * query levels will make available to this query level.
     */
    List       *plan_params;    /* list of PlannerParamItems, see below */
    Bitmapset  *outer_params;

    /*
     * simple_rel_array holds pointers to "base rels" and "other rels" (see
     * comments for RelOptInfo for more info).  It is indexed by rangetable
     * index (so entry 0 is always wasted).  Entries can be NULL when an RTE
     * does not correspond to a base relation, such as a join RTE or an
     * unreferenced view RTE; or if the RelOptInfo hasn't been made yet.
     */
    struct RelOptInfo **simple_rel_array;   /* All 1-rel RelOptInfos */
    int         simple_rel_array_size;  /* allocated size of array */

    /*
     * simple_rte_array is the same length as simple_rel_array and holds
     * pointers to the associated rangetable entries.  This lets us avoid
     * rt_fetch(), which can be a bit slow once large inheritance sets have
     * been expanded.
     */
    RangeTblEntry **simple_rte_array;   /* rangetable as an array */

    /*
     * append_rel_array is the same length as the above arrays, and holds
     * pointers to the corresponding AppendRelInfo entry indexed by
     * child_relid, or NULL if none.  The array itself is not allocated if
     * append_rel_list is empty.
     */
    /* Used for both UNION ALL appendrels and inheritance/partitioned tables */
    struct AppendRelInfo **append_rel_array;

    /*
     * all_baserels is a Relids set of all base relids (but not "other"
     * relids) in the query; that is, the Relids identifier of the final join
     * we need to form.  This is computed in make_one_rel, just before we
     * start making Paths.
     */
    Relids      all_baserels;

    /*
     * nullable_baserels is a Relids set of base relids that are nullable by
     * some outer join in the jointree; these are rels that are potentially
     * nullable below the WHERE clause, SELECT targetlist, etc.  This is
     * computed in deconstruct_jointree.
     */
    Relids      nullable_baserels;

    /*
     * join_rel_list is a list of all join-relation RelOptInfos we have
     * considered in this planning run.  For small problems we just scan the
     * list to do lookups, but when there are many join relations we build a
     * hash table for faster lookups.  The hash table is present and valid
     * when join_rel_hash is not NULL.  Note that we still maintain the list
     * even when using the hash table for lookups; this simplifies life for
     * GEQO.
     */
    List       *join_rel_list;  /* list of join-relation RelOptInfos */
    struct HTAB *join_rel_hash; /* optional hashtable for join relations */

    /*
     * When doing a dynamic-programming-style join search, join_rel_level[k]
     * is a list of all join-relation RelOptInfos of level k, and
     * join_cur_level is the current level.  New join-relation RelOptInfos are
     * automatically added to the join_rel_level[join_cur_level] list.
     * join_rel_level is NULL if not in use.
     */
    List      **join_rel_level; /* lists of join-relation RelOptInfos */
    int         join_cur_level; /* index of list being extended */

    List       *init_plans;     /* init SubPlans for query */

    List       *cte_plan_ids;   /* per-CTE-item list of subplan IDs */

    List       *multiexpr_params;   /* List of Lists of Params for MULTIEXPR
                                     * subquery outputs */

    List       *eq_classes;     /* list of active EquivalenceClasses */

    List       *canon_pathkeys; /* list of "canonical" PathKeys */

    List       *left_join_clauses;  /* list of RestrictInfos for mergejoinable
                                     * outer join clauses w/nonnullable var on
                                     * left */

    List       *right_join_clauses; /* list of RestrictInfos for mergejoinable
                                     * outer join clauses w/nonnullable var on
                                     * right */

    List       *full_join_clauses;  /* list of RestrictInfos for mergejoinable
                                     * full join clauses */

    List       *join_info_list; /* list of SpecialJoinInfos */

    List       *append_rel_list;    /* list of AppendRelInfos */

    List       *rowMarks;       /* list of PlanRowMarks */

    List       *placeholder_list;   /* list of PlaceHolderInfos */

    List       *fkey_list;      /* list of ForeignKeyOptInfos */

    List       *query_pathkeys; /* desired pathkeys for query_planner() */

    List       *group_pathkeys; /* groupClause pathkeys, if any */
    List       *window_pathkeys;    /* pathkeys of bottom window, if any */
    List       *distinct_pathkeys;  /* distinctClause pathkeys, if any */
    List       *sort_pathkeys;  /* sortClause pathkeys, if any */

    List       *part_schemes;   /* Canonicalised partition schemes used in the
                                 * query. */

    List       *initial_rels;   /* RelOptInfos we are now trying to join */

    /* Use fetch_upper_rel() to get any particular upper rel */
    List       *upper_rels[UPPERREL_FINAL + 1]; /* upper-rel RelOptInfos */

    /* Result tlists chosen by grouping_planner for upper-stage processing */
    struct PathTarget *upper_targets[UPPERREL_FINAL + 1];

    /*
     * grouping_planner passes back its final processed targetlist here, for
     * use in relabeling the topmost tlist of the finished Plan.
     */
    List       *processed_tlist;

    /* Fields filled during create_plan() for use in setrefs.c */
    AttrNumber *grouping_map;   /* for GroupingFunc fixup */
    List       *minmax_aggs;    /* List of MinMaxAggInfos */

    MemoryContext planner_cxt;  /* context holding PlannerInfo */

    double      total_table_pages;  /* # of pages in all tables of query */

    double      tuple_fraction; /* tuple_fraction passed to query_planner */
    double      limit_tuples;   /* limit_tuples passed to query_planner */

    Index       qual_security_level;    /* minimum security_level for quals */
    /* Note: qual_security_level is zero if there are no securityQuals */

    InheritanceKind inhTargetKind;  /* indicates if the target relation is an
                                     * inheritance child or partition or a
                                     * partitioned table */
    bool        hasJoinRTEs;    /* true if any RTEs are RTE_JOIN kind */
    bool        hasLateralRTEs; /* true if any RTEs are marked LATERAL */
    bool        hasDeletedRTEs; /* true if any RTE was deleted from jointree */
    bool        hasHavingQual;  /* true if havingQual was non-null */
    bool        hasPseudoConstantQuals; /* true if any RestrictInfo has
                                         * pseudoconstant = true */
    bool        hasRecursion;   /* true if planning a recursive WITH item */

    /* These fields are used only when hasRecursion is true: */
    int         wt_param_id;    /* PARAM_EXEC ID for the work table */
    struct Path *non_recursive_path;    /* a path for non-recursive term */

    /* These fields are workspace for createplan.c */
    Relids      curOuterRels;   /* outer rels above current node */
    List       *curOuterParams; /* not-yet-assigned NestLoopParams */

    /* optional private data for join_search_hook, e.g., GEQO */
    void       *join_search_private;

    /* Does this query modify any partition key columns? */
    bool        partColsUpdated;
} PlannerInfo;
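
The relationship between append_rel_list and append_rel_array can be illustrated with a small stand-alone C sketch. The names ToyAppInfo and toy_build_append_rel_array are hypothetical, and the real planner uses its own allocator and node types. The only points taken from the comments above are that the array is indexed by child_relid, has the same length as the other per-RTE arrays (so slot 0 is unused), and is not allocated when the list is empty.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy stand-in for AppendRelInfo, reduced to the two identifying fields. */
typedef struct ToyAppInfo
{
    unsigned    parent_relid;   /* RT index of the append parent */
    unsigned    child_relid;    /* RT index of the append child */
} ToyAppInfo;

/*
 * Build a lookup array indexed by child RT index from a flat list.
 * nrtes is the number of range table entries; the array has nrtes + 1
 * slots because RT indexes start at 1.  Returns NULL when the list is
 * empty (or on allocation failure).  The caller frees the array.
 */
const ToyAppInfo **
toy_build_append_rel_array(const ToyAppInfo *list, int len, unsigned nrtes)
{
    const ToyAppInfo **arr;

    if (len == 0)
        return NULL;            /* nothing to index, so no array at all */

    arr = calloc((size_t) nrtes + 1, sizeof(*arr));
    if (arr == NULL)
        return NULL;

    for (int i = 0; i < len; i++)
    {
        unsigned    child = list[i].child_relid;

        assert(child >= 1 && child <= nrtes);
        /* a given RTE can be the child of at most one append parent */
        assert(arr[child] == NULL);
        arr[child] = &list[i];
    }
    return arr;
}
```

The point of the array form is that finding the AppendRelInfo of a known child becomes a single index operation, while the flat list remains the structure that gets scanned when a parent is expanded.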

2. Source Code Walkthrough

expand_inherited_tables expands each range table entry that represents an inheritance set into an "append relation".

/*
 * expand_inherited_tables
 *      Expand each rangetable entry that represents an inheritance set
 *      into an "append relation".  At the conclusion of this process,
 *      the "inh" flag is set in all and only those RTEs that are append
 *      relation parents.
 */
void
expand_inherited_tables(PlannerInfo *root)
{
    Index       nrtes;
    Index       rti;
    ListCell   *rl;

    /*
     * expand_inherited_rtentry may add RTEs to parse->rtable. The function is
     * expected to recursively handle any RTEs that it creates with inh=true.
     * So just scan as far as the original end of the rtable list.
     */
    nrtes = list_length(root->parse->rtable);
    rl = list_head(root->parse->rtable);
    for (rti = 1; rti <= nrtes; rti++)
    {
        RangeTblEntry *rte = (RangeTblEntry *) lfirst(rl);

        expand_inherited_rtentry(root, rte, rti);
        rl = lnext(rl);
    }
}
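
The loop above records the list length before it starts, so the entries appended during expansion are not visited again. The following stand-alone C sketch shows the same pattern on a fixed-size array. ToyRte, ToyRangeTable and both functions are hypothetical simplifications: the real code walks List cells and builds full RangeTblEntry nodes, and the assumption that each expansion adds one entry for the parent itself plus one per child is taken from the plain-inheritance description in the comments.

```c
#include <assert.h>

#define TOY_MAX_RTES 32

/* Toy range table entry: just the "inh" flag and a child count. */
typedef struct ToyRte
{
    int         inh;            /* nonzero if entry may be an inheritance set */
    int         nchildren;      /* number of child tables (toy assumption) */
} ToyRte;

typedef struct ToyRangeTable
{
    ToyRte      rtes[TOY_MAX_RTES];
    int         len;            /* number of entries currently in use */
} ToyRangeTable;

/* Expand one entry: append an entry for the parent itself and each child. */
void
toy_expand_rtentry(ToyRangeTable *rt, int idx)
{
    ToyRte     *rte = &rt->rtes[idx];
    int         nnew;

    if (!rte->inh)
        return;
    if (rte->nchildren == 0)
    {
        rte->inh = 0;           /* childless: clear the flag and stop */
        return;
    }

    nnew = rte->nchildren + 1;  /* parent as plain member + children */
    assert(rt->len + nnew <= TOY_MAX_RTES);
    for (int i = 0; i < nnew; i++)
    {
        rt->rtes[rt->len].inh = 0;      /* new entries are never re-expanded */
        rt->rtes[rt->len].nchildren = 0;
        rt->len++;
    }
}

/* Scan only as far as the original end of the table; return the new length. */
int
toy_expand_all(ToyRangeTable *rt)
{
    int         original_len = rt->len; /* snapshot before any growth */

    for (int i = 0; i < original_len; i++)
        toy_expand_rtentry(rt, i);
    return rt->len;
}
```

Starting from two entries, one with three children and one childless, the table grows to six entries and only the first entry keeps its inh flag.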

/*
 * expand_inherited_rtentry
 *      Check whether a rangetable entry represents an inheritance set.
 *      If so, add entries for all the child tables to the query's
 *      rangetable, and build AppendRelInfo nodes for all the child tables
 *      and add them to root->append_rel_list.  If not, clear the entry's
 *      "inh" flag to prevent later code from looking for AppendRelInfos.
 *
 * Note that the original RTE is considered to represent the whole
 * inheritance set.  The first of the generated RTEs is an RTE for the same
 * table, but with inh = false, to represent the parent table in its role
 * as a simple member of the inheritance set.
 *
 * A childless table is never considered to be an inheritance set. For
 * regular inheritance, a parent RTE must always have at least two associated
 * AppendRelInfos: one corresponding to the parent table as a simple member of
 * inheritance set and one or more corresponding to the actual children.
 * Since a partitioned table is not scanned, it might have only one associated
 * AppendRelInfo.
 */
static void
expand_inherited_rtentry(PlannerInfo *root, RangeTblEntry *rte, Index rti)
{
    Oid         parentOID;
    PlanRowMark *oldrc;
    Relation    oldrelation;
    LOCKMODE    lockmode;
    List       *inhOIDs;
    ListCell   *l;

    /* Does RT entry allow inheritance? */
    if (!rte->inh)
        return;
    /* Ignore any already-expanded UNION ALL nodes */
    if (rte->rtekind != RTE_RELATION)
    {
        Assert(rte->rtekind == RTE_SUBQUERY);
        return;
    }
    /* Fast path for common case of childless table */
    parentOID = rte->relid;
    if (!has_subclass(parentOID))
    {
        /* Clear flag before returning */
        rte->inh = false;
        return;
    }

    /*
     * The rewriter should already have obtained an appropriate lock on each
     * relation named in the query.  However, for each child relation we add
     * to the query, we must obtain an appropriate lock, because this will be
     * the first use of those relations in the parse/rewrite/plan pipeline.
     * Child rels should use the same lockmode as their parent.
     */
    lockmode = rte->rellockmode;

    /* Scan for all members of inheritance set, acquire needed locks */
    inhOIDs = find_all_inheritors(parentOID, lockmode, NULL);

    /*
     * Check that there's at least one descendant, else treat as no-child
     * case.  This could happen despite above has_subclass() check, if table
     * once had a child but no longer does.
     */
    if (list_length(inhOIDs) < 2)
    {
        /* Clear flag before returning */
        rte->inh = false;
        return;
    }

    /*
     * If parent relation is selected FOR UPDATE/SHARE, we need to mark its
     * PlanRowMark as isParent = true, and generate a new PlanRowMark for each
     * child.
     */
    oldrc = get_plan_rowmark(root->rowMarks, rti);
    if (oldrc)
        oldrc->isParent = true;

    /*
     * Must open the parent relation to examine its tupdesc.  We need not lock
     * it; we assume the rewriter already did.
     */
    oldrelation = heap_open(parentOID, NoLock);

    /* Scan the inheritance set and expand it */
    if (RelationGetPartitionDesc(oldrelation) != NULL)
    {
        Assert(rte->relkind == RELKIND_PARTITIONED_TABLE);

        /*
         * If this table has partitions, recursively expand them in the order
         * in which they appear in the PartitionDesc.  While at it, also
         * extract the partition key columns of all the partitioned tables.
         */
        expand_partitioned_rtentry(root, rte, rti, oldrelation, oldrc,
                                   lockmode, &root->append_rel_list);
    }
    else
    {
        List       *appinfos = NIL;
        RangeTblEntry *childrte;
        Index       childRTindex;

        /*
         * This table has no partitions.  Expand any plain inheritance
         * children in the order the OIDs were returned by
         * find_all_inheritors.
         */
        foreach(l, inhOIDs)
        {
            Oid         childOID = lfirst_oid(l);
            Relation    newrelation;

            /* Open rel if needed; we already have required locks */
            if (childOID != parentOID)
                newrelation = heap_open(childOID, NoLock);
            else
                newrelation = oldrelation;

            /*
             * It is possible that the parent table has children that are temp
             * tables of other backends.  We cannot safely access such tables
             * (because of buffering issues), and the best thing to do seems
             * to be to silently ignore them.
             */
            if (childOID != parentOID && RELATION_IS_OTHER_TEMP(newrelation))
            {
                heap_close(newrelation, lockmode);
                continue;
            }

            expand_single_inheritance_child(root, rte, rti, oldrelation, oldrc,
                                            newrelation,
                                            &appinfos, &childrte,
                                            &childRTindex);

            /* Close child relations, but keep locks */
            if (childOID != parentOID)
                heap_close(newrelation, NoLock);
        }

        /*
         * If all the children were temp tables, pretend it's a
         * non-inheritance situation; we don't need Append node in that case.
         * The duplicate RTE we added for the parent table is harmless, so we
         * don't bother to get rid of it; ditto for the useless PlanRowMark
         * node.
         */
        if (list_length(appinfos) < 2)
            rte->inh = false;
        else
            root->append_rel_list = list_concat(root->append_rel_list,
                                                appinfos);

    }

    heap_close(oldrelation, NoLock);
}

/*
 * expand_partitioned_rtentry
 *      Recursively expand an RTE for a partitioned table.
 */
static void
expand_partitioned_rtentry(PlannerInfo *root, RangeTblEntry *parentrte,
                           Index parentRTindex, Relation parentrel,
                           PlanRowMark *top_parentrc, LOCKMODE lockmode,
                           List **appinfos)
{
    int         i;
    RangeTblEntry *childrte;
    Index       childRTindex;
    PartitionDesc partdesc = RelationGetPartitionDesc(parentrel);

    check_stack_depth();

    /* A partitioned table should always have a partition descriptor. */
    Assert(partdesc);

    Assert(parentrte->inh);

    /*
     * Note down whether any partition key cols are being updated. Though it's
     * the root partitioned table's updatedCols we are interested in, we
     * instead use parentrte to get the updatedCols. This is convenient
     * because parentrte already has the root partrel's updatedCols translated
     * to match the attribute ordering of parentrel.
     */
    if (!root->partColsUpdated)
        root->partColsUpdated =
            has_partition_attrs(parentrel, parentrte->updatedCols, NULL);

    /* First expand the partitioned table itself. */
    expand_single_inheritance_child(root, parentrte, parentRTindex, parentrel,
                                    top_parentrc, parentrel,
                                    appinfos, &childrte, &childRTindex);

    /*
     * If the partitioned table has no partitions, treat this as the
     * non-inheritance case.
     */
    if (partdesc->nparts == 0)
    {
        parentrte->inh = false;
        return;
    }

    for (i = 0; i < partdesc->nparts; i++)
    {
        Oid         childOID = partdesc->oids[i];
        Relation    childrel;

        /* Open rel; we already have required locks */
        childrel = heap_open(childOID, NoLock);

        /*
         * Temporary partitions belonging to other sessions should have been
         * disallowed at definition, but for paranoia's sake, let's double
         * check.
         */
        if (RELATION_IS_OTHER_TEMP(childrel))
            elog(ERROR, "temporary relation from another session found as partition");
        expand_single_inheritance_child(root, parentrte, parentRTindex,
                                        parentrel, top_parentrc, childrel,
                                        appinfos, &childrte, &childRTindex);

        /* If this child is itself partitioned, recurse */
        if (childrel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
            expand_partitioned_rtentry(root, childrte, childRTindex,
                                       childrel, top_parentrc, lockmode,
                                       appinfos);

        /* Close child relation, but keep locks */
        heap_close(childrel, NoLock);
    }
}
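
The recursion in expand_partitioned_rtentry can be pictured with a stand-alone C sketch that only counts how many range table entries an expansion would add. ToyRel and toy_expand_partitioned are hypothetical. The counting rule follows the calls visible above: one call to expand_single_inheritance_child for the partitioned table itself, one per partition, and a recursive expansion for every partition that is itself partitioned. Locking, row marks and AppendRelInfo construction are deliberately left out.

```c
/* Toy relation: either a leaf partition or a partitioned table. */
typedef struct ToyRel
{
    int         is_partitioned;     /* nonzero for a partitioned table */
    int         nparts;             /* number of direct partitions */
    const struct ToyRel **parts;    /* direct partitions, in descriptor order */
} ToyRel;

/*
 * Return the number of range table entries added when expanding the
 * partitioned table "parent", level by level.
 */
int
toy_expand_partitioned(const ToyRel *parent)
{
    int         added = 1;      /* entry for the partitioned table itself */

    /* With no partitions the loop body never runs, as in the early return. */
    for (int i = 0; i < parent->nparts; i++)
    {
        const ToyRel *child = parent->parts[i];

        added++;                /* entry for this partition */

        /* If this child is itself partitioned, recurse into it. */
        if (child->is_partitioned)
            added += toy_expand_partitioned(child);
    }
    return added;
}
```

For a root with one leaf partition and one sub-partitioned partition that has two leaves, the count is six: one for the root, one for the leaf, one for the sub-partitioned child as a partition of the root, and three more from expanding that child.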


/*
 * expand_single_inheritance_child
 *      Build a RangeTblEntry and an AppendRelInfo, if appropriate, plus
 *      maybe a PlanRowMark.
 *
 * We now expand the partition hierarchy level by level, creating a
 * corresponding hierarchy of AppendRelInfos and RelOptInfos, where each
 * partitioned descendant acts as a parent of its immediate partitions.
 * (This is a difference from what older versions of PostgreSQL did and what
 * is still done in the case of table inheritance for unpartitioned tables,
 * where the hierarchy is flattened during RTE expansion.)
 * 現在我們逐層擴充套件分割槽層次結構,建立一個對應的AppendRelInfos和RelOptInfos層次結構,
 *   其中每個分割槽的後代充當其直接分割槽的父級。
 * (在未分割槽表的表繼承中,
 *    層次結構在RTE擴充套件期間被扁平化,這與老版本的PostgreSQL有所不同。)
 *
 * PlanRowMarks still carry the top-parent's RTI, and the top-parent's
 * allMarkTypes field still accumulates values from all descendents.
 * PlanRowMarks仍然具有頂級父類的RTI資訊,
 *   而頂級父類的allMarkTypes欄位仍然從所有子類累積。
 * 
 * "parentrte" and "parentRTindex" are immediate parent's RTE and
 * RTI. "top_parentrc" is top parent's PlanRowMark.
 * “parentrte”和“parentRTindex”是直接父級的RTE和RTI。
 * “top_parentrc”是top父類的PlanRowMark。
 *
 * The child RangeTblEntry and its RTI are returned in "childrte_p" and
 * "childRTindex_p" resp.
 * 子RTE及其RTI在“childrte_p”和“childRTindex_p”resp中返回。
 */
static void
expand_single_inheritance_child(PlannerInfo *root, RangeTblEntry *parentrte,
                                Index parentRTindex, Relation parentrel,
                                PlanRowMark *top_parentrc, Relation childrel,
                                List **appinfos, RangeTblEntry **childrte_p,
                                Index *childRTindex_p)
{
    Query      *parse = root->parse;
    Oid         parentOID = RelationGetRelid(parentrel);    /* OID of the parent relation */
    Oid         childOID = RelationGetRelid(childrel);  /* OID of the child relation */
    RangeTblEntry *childrte;
    Index       childRTindex;
    AppendRelInfo *appinfo;

    /*
     * Build an RTE for the child, and attach to query's rangetable list. We
     * copy most fields of the parent's RTE, but replace relation OID and
     * relkind, and set inh = false.  Also, set requiredPerms to zero since
     * all required permissions checks are done on the original RTE. Likewise,
     * set the child's securityQuals to empty, because we only want to apply
     * the parent's RLS conditions regardless of what RLS properties
     * individual children may have.  (This is an intentional choice to make
     * inherited RLS work like regular permissions checks.) The parent
     * securityQuals will be propagated to children along with other base
     * restriction clauses, so we don't need to do it here.
     */
    childrte = copyObject(parentrte);
    *childrte_p = childrte;
    childrte->relid = childOID;
    childrte->relkind = childrel->rd_rel->relkind;
    /* A partitioned child will need to be expanded further. */
    if (childOID != parentOID &&
        childrte->relkind == RELKIND_PARTITIONED_TABLE)
        childrte->inh = true;
    else
        childrte->inh = false;
    childrte->requiredPerms = 0;
    childrte->securityQuals = NIL;
    parse->rtable = lappend(parse->rtable, childrte);
    childRTindex = list_length(parse->rtable);
    *childRTindex_p = childRTindex;

    /*
     * We need an AppendRelInfo if paths will be built for the child RTE. If
     * childrte->inh is true, then we'll always need to generate append paths
     * for it.  If childrte->inh is false, we must scan it if it's not a
     * partitioned table; but if it is a partitioned table, then it never has
     * any data of its own and need not be scanned.
     */
    if (childrte->relkind != RELKIND_PARTITIONED_TABLE || childrte->inh)
    {
        appinfo = makeNode(AppendRelInfo);
        appinfo->parent_relid = parentRTindex;
        appinfo->child_relid = childRTindex;
        appinfo->parent_reltype = parentrel->rd_rel->reltype;
        appinfo->child_reltype = childrel->rd_rel->reltype;
        make_inh_translation_list(parentrel, childrel, childRTindex,
                                  &appinfo->translated_vars);
        appinfo->parent_reloid = parentOID;
        *appinfos = lappend(*appinfos, appinfo);

        /*
         * Translate the column permissions bitmaps to the child's attnums (we
         * have to build the translated_vars list before we can do this). But
         * if this is the parent table, leave copyObject's result alone.
         *
         * Note: we need to do this even though the executor won't run any
         * permissions checks on the child RTE.  The insertedCols/updatedCols
         * bitmaps may be examined for trigger-firing purposes.
         */
        if (childOID != parentOID)
        {
            childrte->selectedCols = translate_col_privs(parentrte->selectedCols,
                                                         appinfo->translated_vars);
            childrte->insertedCols = translate_col_privs(parentrte->insertedCols,
                                                         appinfo->translated_vars);
            childrte->updatedCols = translate_col_privs(parentrte->updatedCols,
                                                        appinfo->translated_vars);
        }
    }

    /*
     * Build a PlanRowMark if parent is marked FOR UPDATE/SHARE.
     */
    if (top_parentrc)
    {
        PlanRowMark *childrc = makeNode(PlanRowMark);

        childrc->rti = childRTindex;
        childrc->prti = top_parentrc->rti;
        childrc->rowmarkId = top_parentrc->rowmarkId;
        /* Reselect rowmark type, because relkind might not match parent */
        childrc->markType = select_rowmark_type(childrte,
                                                top_parentrc->strength);
        childrc->allMarkTypes = (1 << childrc->markType);
        childrc->strength = top_parentrc->strength;
        childrc->waitPolicy = top_parentrc->waitPolicy;

        /*
         * We mark RowMarks for partitioned child tables as parent RowMarks so
         * that the executor ignores them (except their existence means that
         * the child tables be locked using appropriate mode).
         */
        childrc->isParent = (childrte->relkind == RELKIND_PARTITIONED_TABLE);

        /* Include child's rowmark type in top parent's allMarkTypes */
        top_parentrc->allMarkTypes |= childrc->allMarkTypes;

        root->rowMarks = lappend(root->rowMarks, childrc);
    }
}

3. Trace Analysis

The test query is as follows:

testdb=# explain verbose select * from t_hash_partition where c1 = 1 OR c1 = 2;
                                     QUERY PLAN                                      
-------------------------------------------------------------------------------------
 Append  (cost=0.00..30.53 rows=6 width=200)
   ->  Seq Scan on public.t_hash_partition_1  (cost=0.00..15.25 rows=3 width=200)
         Output: t_hash_partition_1.c1, t_hash_partition_1.c2, t_hash_partition_1.c3
         Filter: ((t_hash_partition_1.c1 = 1) OR (t_hash_partition_1.c1 = 2))
   ->  Seq Scan on public.t_hash_partition_3  (cost=0.00..15.25 rows=3 width=200)
         Output: t_hash_partition_3.c1, t_hash_partition_3.c2, t_hash_partition_3.c3
         Filter: ((t_hash_partition_3.c1 = 1) OR (t_hash_partition_3.c1 = 2))
(7 rows)

Start gdb and set a breakpoint:

(gdb) b expand_inherited_tables
Breakpoint 1 at 0x7e53ba: file prepunion.c, line 1483.
(gdb) c
Continuing.

Breakpoint 1, expand_inherited_tables (root=0x28fcdc8) at prepunion.c:1483
1483        nrtes = list_length(root->parse->rtable);

Get the number of RTEs and the head cell of the range table list:

(gdb) n
1484        rl = list_head(root->parse->rtable);
(gdb) 
1485        for (rti = 1; rti <= nrtes; rti++)
(gdb) p nrtes
$1 = 1
(gdb) p *rl
$2 = {data = {ptr_value = 0x28d83d0, int_value = 42828752, oid_value = 42828752}, next = 0x0}
(gdb) 

Loop over the RTEs:

(gdb) n
1487            RangeTblEntry *rte = (RangeTblEntry *) lfirst(rl);
(gdb) 
1489            expand_inherited_rtentry(root, rte, rti);
(gdb) p *rte
$3 = {type = T_RangeTblEntry, rtekind = RTE_RELATION, relid = 16986, relkind = 112 'p', tablesample = 0x0, subquery = 0x0, 
  security_barrier = false, jointype = JOIN_INNER, joinaliasvars = 0x0, functions = 0x0, funcordinality = false, 
  tablefunc = 0x0, values_lists = 0x0, ctename = 0x0, ctelevelsup = 0, self_reference = false, coltypes = 0x0, 
  coltypmods = 0x0, colcollations = 0x0, enrname = 0x0, enrtuples = 0, alias = 0x0, eref = 0x28d84e8, lateral = false, 
  inh = true, inFromCl = true, requiredPerms = 2, checkAsUser = 0, selectedCols = 0x28d8c40, insertedCols = 0x0, 
  updatedCols = 0x0, securityQuals = 0x0}

Step into expand_inherited_rtentry:

(gdb) step
expand_inherited_rtentry (root=0x28fcdc8, rte=0x28d83d0, rti=1) at prepunion.c:1517
1517        Query      *parse = root->parse;

expand_inherited_rtentry -> the RTE of the partitioned table has inh = true

1526        if (!rte->inh)
(gdb) p rte->inh
$4 = true

expand_inherited_rtentry -> run the preliminary checks

(gdb) n
1529        if (rte->rtekind != RTE_RELATION)
(gdb) p rte->rtekind
$5 = RTE_RELATION
(gdb) n
1535        parentOID = rte->relid;
(gdb) 
1536        if (!has_subclass(parentOID))
(gdb) p parentOID
$6 = 16986
(gdb) n
1556        oldrc = get_plan_rowmark(root->rowMarks, rti);
(gdb) 
1557        if (rti == parse->resultRelation)
(gdb) p *oldrc
Cannot access memory at address 0x0

expand_inherited_rtentry -> scan all members of the inheritance set, acquire the needed locks, and build the list of OIDs

(gdb) n
1559        else if (oldrc && RowMarkRequiresRowShareLock(oldrc->markType))
(gdb) 
1562            lockmode = AccessShareLock;
(gdb) 
1565        inhOIDs = find_all_inheritors(parentOID, lockmode, NULL);
(gdb) 
1572        if (list_length(inhOIDs) < 2)
(gdb) p inhOIDs
$7 = (List *) 0x28fd208
(gdb) p *inhOIDs
$8 = {type = T_OidList, length = 7, head = 0x28fd1e0, tail = 0x28fd778}
(gdb) 

expand_inherited_rtentry -> open the relation

(gdb) n
1584        if (oldrc)
(gdb) 
1591        oldrelation = heap_open(parentOID, NoLock);

expand_inherited_rtentry -> the partition descriptor is found, so expand_partitioned_rtentry is called

(gdb) 
1594        if (RelationGetPartitionDesc(oldrelation) != NULL)
(gdb) 
1596            Assert(rte->relkind == RELKIND_PARTITIONED_TABLE);
(gdb) 
1603            expand_partitioned_rtentry(root, rte, rti, oldrelation, oldrc,
(gdb) 

expand_inherited_rtentry -> step into expand_partitioned_rtentry

(gdb) step
expand_partitioned_rtentry (root=0x28fcdc8, parentrte=0x28d83d0, parentRTindex=1, parentrel=0x7f4e66827980, 
    top_parentrc=0x0, lockmode=1, appinfos=0x28fce98) at prepunion.c:1684
1684        PartitionDesc partdesc = RelationGetPartitionDesc(parentrel);

expand_partitioned_rtentry -> get the partition descriptor

(gdb) n
1686        check_stack_depth();
(gdb) p *partdesc
$9 = {nparts = 6, oids = 0x298e4f8, boundinfo = 0x298e530}

expand_partitioned_rtentry -> run the sanity checks

(gdb) n
1689        Assert(partdesc);
(gdb) 
1691        Assert(parentrte->inh);
(gdb) 
1700        if (!root->partColsUpdated)
(gdb) 
1702                has_partition_attrs(parentrel, parentrte->updatedCols, NULL);
(gdb) 
1701            root->partColsUpdated =
(gdb) 
1705        expand_single_inheritance_child(root, parentrte, parentRTindex, parentrel,

expand_partitioned_rtentry -> first expand the partitioned table itself; step into expand_single_inheritance_child

(gdb) step
expand_single_inheritance_child (root=0x28fcdc8, parentrte=0x28d83d0, parentRTindex=1, parentrel=0x7f4e66827980, 
    top_parentrc=0x0, childrel=0x7f4e66827980, appinfos=0x28fce98, childrte_p=0x7ffd1928d2f8, childRTindex_p=0x7ffd1928d2f4)
    at prepunion.c:1778
1778        Query      *parse = root->parse;

expand_single_inheritance_child -> initialize childrte

(gdb) n
1779        Oid         parentOID = RelationGetRelid(parentrel);
(gdb) 
1780        Oid         childOID = RelationGetRelid(childrel);
(gdb) 
1797        childrte = copyObject(parentrte);
(gdb) p parentOID
$10 = 16986
(gdb) p childOID
$11 = 16986
(gdb) n
1798        *childrte_p = childrte;
(gdb) 
1799        childrte->relid = childOID;
(gdb) 
1800        childrte->relkind = childrel->rd_rel->relkind;
(gdb) 
1802        if (childOID != parentOID &&
(gdb) 
1806            childrte->inh = false;
(gdb) 
1807        childrte->requiredPerms = 0;
(gdb) 
1808        childrte->securityQuals = NIL;
(gdb) 
1809        parse->rtable = lappend(parse->rtable, childrte);
(gdb) 
1810        childRTindex = list_length(parse->rtable);
(gdb) 
1811        *childRTindex_p = childRTindex;
(gdb) p *childrte --> relid = 16986, still the partitioned table itself
$12 = {type = T_RangeTblEntry, rtekind = RTE_RELATION, relid = 16986, relkind = 112 'p', tablesample = 0x0, subquery = 0x0, 
  security_barrier = false, jointype = JOIN_INNER, joinaliasvars = 0x0, functions = 0x0, funcordinality = false, 
  tablefunc = 0x0, values_lists = 0x0, ctename = 0x0, ctelevelsup = 0, self_reference = false, coltypes = 0x0, 
  coltypmods = 0x0, colcollations = 0x0, enrname = 0x0, enrtuples = 0, alias = 0x0, eref = 0x28fd268, lateral = false, 
  inh = false, inFromCl = true, requiredPerms = 0, checkAsUser = 0, selectedCols = 0x28fd898, insertedCols = 0x0, 
  updatedCols = 0x0, securityQuals = 0x0}
(gdb) p *childRTindex_p
$13 = 0

expand_single_inheritance_child -> the expansion of the partitioned table itself is done; back in expand_partitioned_rtentry

(gdb) n
1820        if (childrte->relkind != RELKIND_PARTITIONED_TABLE || childrte->inh)
(gdb) 
1855        if (top_parentrc)
(gdb) 
1881    }
(gdb) 
expand_partitioned_rtentry (root=0x28fcdc8, parentrte=0x28d83d0, parentRTindex=1, parentrel=0x7f4e66827980, 
    top_parentrc=0x0, lockmode=1, appinfos=0x28fce98) at prepunion.c:1713
1713        if (partdesc->nparts == 0)

expand_partitioned_rtentry -> start iterating over the partitions listed in the partition descriptor

(gdb) n
1719        for (i = 0; i < partdesc->nparts; i++)
(gdb) 
1721            Oid         childOID = partdesc->oids[i];
(gdb) 
1725            childrel = heap_open(childOID, NoLock);
(gdb) 
1732            if (RELATION_IS_OTHER_TEMP(childrel))
(gdb) 
1735            expand_single_inheritance_child(root, parentrte, parentRTindex,
(gdb) p childOID
$14 = 16989 
----------------------------------------
testdb=# select relname from pg_class where oid=16989;
      relname       
--------------------
 t_hash_partition_1
(1 row)
----------------------------------------

expand_partitioned_rtentry -> step into expand_single_inheritance_child again, this time for the first partition

(gdb) step
expand_single_inheritance_child (root=0x28fcdc8, parentrte=0x28d83d0, parentRTindex=1, parentrel=0x7f4e66827980, 
    top_parentrc=0x0, childrel=0x7f4e668306a0, appinfos=0x28fce98, childrte_p=0x7ffd1928d2f8, childRTindex_p=0x7ffd1928d2f4)
    at prepunion.c:1778
1778        Query      *parse = root->parse;

expand_single_inheritance_child -> start building the AppendRelInfo

...
1820        if (childrte->relkind != RELKIND_PARTITIONED_TABLE || childrte->inh)
(gdb) 
1822            appinfo = makeNode(AppendRelInfo);
(gdb) p *childrte
$17 = {type = T_RangeTblEntry, rtekind = RTE_RELATION, relid = 16989, relkind = 114 'r', tablesample = 0x0, subquery = 0x0, 
  security_barrier = false, jointype = JOIN_INNER, joinaliasvars = 0x0, functions = 0x0, funcordinality = false, 
  tablefunc = 0x0, values_lists = 0x0, ctename = 0x0, ctelevelsup = 0, self_reference = false, coltypes = 0x0, 
  coltypmods = 0x0, colcollations = 0x0, enrname = 0x0, enrtuples = 0, alias = 0x0, eref = 0x28fd9d0, lateral = false, 
  inh = false, inFromCl = true, requiredPerms = 0, checkAsUser = 0, selectedCols = 0x28fdbc8, insertedCols = 0x0, 
  updatedCols = 0x0, securityQuals = 0x0}
(gdb) p *childrte->relkind
Cannot access memory at address 0x72
(gdb) p childrte->relkind
$18 = 114 'r'
(gdb) p childrte->inh
$19 = false

expand_single_inheritance_child -> the AppendRelInfo is built; inspect the struct

(gdb) n
1823            appinfo->parent_relid = parentRTindex;
(gdb) 
1824            appinfo->child_relid = childRTindex;
(gdb) 
1825            appinfo->parent_reltype = parentrel->rd_rel->reltype;
(gdb) 
1826            appinfo->child_reltype = childrel->rd_rel->reltype;
(gdb) 
1827            make_inh_translation_list(parentrel, childrel, childRTindex,
(gdb) 
1829            appinfo->parent_reloid = parentOID;
(gdb) 
1830            *appinfos = lappend(*appinfos, appinfo);
(gdb) 
1841            if (childOID != parentOID)
(gdb) 
1843                childrte->selectedCols = translate_col_privs(parentrte->selectedCols,
(gdb) 
1845                childrte->insertedCols = translate_col_privs(parentrte->insertedCols,
(gdb) 
1847                childrte->updatedCols = translate_col_privs(parentrte->updatedCols,
(gdb) 
1855        if (top_parentrc)
(gdb) p *appinfo
$20 = {type = T_AppendRelInfo, parent_relid = 1, child_relid = 3, parent_reltype = 16988, child_reltype = 16991, 
  translated_vars = 0x28fdc90, parent_reloid = 16986}

expand_single_inheritance_child -> the call completes and returns

(gdb) n
1881    }
(gdb) 
expand_partitioned_rtentry (root=0x28fcdc8, parentrte=0x28d83d0, parentRTindex=1, parentrel=0x7f4e66827980, 
    top_parentrc=0x0, lockmode=1, appinfos=0x28fce98) at prepunion.c:1740
1740            if (childrel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)

expand_inherited_rtentry -> the call to expand_partitioned_rtentry completes (finish runs the remaining loop iterations); back in expand_inherited_rtentry

(gdb) finish
Run till exit from #0  expand_partitioned_rtentry (root=0x28fcdc8, parentrte=0x28d83d0, parentRTindex=1, 
    parentrel=0x7f4e66827980, top_parentrc=0x0, lockmode=1, appinfos=0x28fce98) at prepunion.c:1740
0x00000000007e55e3 in expand_inherited_rtentry (root=0x28fcdc8, rte=0x28d83d0, rti=1) at prepunion.c:1603
1603            expand_partitioned_rtentry(root, rte, rti, oldrelation, oldrc,
(gdb) 

expand_inherited_rtentry -> expand_inherited_rtentry completes; back in expand_inherited_tables

(gdb) n
1665        heap_close(oldrelation, NoLock);
(gdb) 
1666    }
(gdb) 
expand_inherited_tables (root=0x28fcdc8) at prepunion.c:1490
1490            rl = lnext(rl);
(gdb) 

expand_inherited_tables -> expand_inherited_tables completes; back in subquery_planner

(gdb) n
1485        for (rti = 1; rti <= nrtes; rti++)
(gdb) 
1492    }
(gdb) 
subquery_planner (glob=0x28fcd30, parse=0x28d82b8, parent_root=0x0, hasRecursion=false, tuple_fraction=0) at planner.c:719
719     root->hasHavingQual = (parse->havingQual != NULL);
(gdb) 

DONE!

4. References

Parallel Append implementation
Partition Elimination in PostgreSQL 11

Source: ITPUB blog, link: http://blog.itpub.net/6906/viewspace-2374792/. If you repost this article, please credit the source; otherwise legal action may be taken.
