PostgreSQL 原始碼解讀(78)- 查詢語句#63(create_plan函式#2-cr...

husthxd發表於2018-11-05

本節簡單介紹了建立執行計劃中的create_plan->create_plan_recurse->create_projection_plan和create_sort_plan函式實現邏輯。

一、資料結構

Plan
所有計劃節點透過將Plan結構作為第一個欄位從Plan結構“派生”。這確保了在將節點轉換為計劃節點時,一切都能正常工作。(在執行器中以通用方式傳遞時,節點指標經常被轉換為Plan *)

/* ----------------
 *      Plan node
 *
 * All plan nodes "derive" from the Plan structure by having the
 * Plan structure as the first field.  This ensures that everything works
 * when nodes are cast to Plan's.  (node pointers are frequently cast to Plan*
 * when passed around generically in the executor)
 * 所有計劃節點透過將Plan結構作為第一個欄位從Plan結構“派生”。
 * 這確保了在將節點轉換為計劃節點時,一切都能正常工作。
 * (在執行器中以通用方式傳遞時,節點指標經常被轉換為Plan *)
 *
 * We never actually instantiate any Plan nodes; this is just the common
 * abstract superclass for all Plan-type nodes.
 * 從未例項化任何Plan節點;這只是所有Plan-type節點的通用抽象超類。
 * ----------------
 */
typedef struct Plan
{
    NodeTag     type;//節點型別

    /*
     * 成本估算資訊;estimated execution costs for plan (see costsize.c for more info)
     */
    Cost        startup_cost;   /* 啟動成本;cost expended before fetching any tuples */
    Cost        total_cost;     /* 總成本;total cost (assuming all tuples fetched) */

    /*
     * 最佳化器估算資訊;planner's estimate of result size of this plan step
     */
    double      plan_rows;      /* 行數;number of rows plan is expected to emit */
    int         plan_width;     /* 平均行大小(Byte為單位);average row width in bytes */

    /*
     * 並行執行相關的資訊;information needed for parallel query
     */
    bool        parallel_aware; /* 是否參與並行執行邏輯?engage parallel-aware logic? */
    bool        parallel_safe;  /* 是否並行安全;OK to use as part of parallel plan? */

    /*
     * Plan型別節點通用的資訊.Common structural data for all Plan types.
     */
    int         plan_node_id;   /* unique across entire final plan tree */
    List       *targetlist;     /* target list to be computed at this node */
    List       *qual;           /* implicitly-ANDed qual conditions */
    struct Plan *lefttree;      /* input plan tree(s) */
    struct Plan *righttree;
    List       *initPlan;       /* Init Plan nodes (un-correlated expr
                                 * subselects) */

    /*
     * Information for management of parameter-change-driven rescanning
     * parameter-change-driven重掃描的管理資訊.
     * 
     * extParam includes the paramIDs of all external PARAM_EXEC params
     * affecting this plan node or its children.  setParam params from the
     * node's initPlans are not included, but their extParams are.
     *
     * allParam includes all the extParam paramIDs, plus the IDs of local
     * params that affect the node (i.e., the setParams of its initplans).
     * These are _all_ the PARAM_EXEC params that affect this node.
     */
    Bitmapset  *extParam;
    Bitmapset  *allParam;
} Plan;

二、原始碼解讀

create_plan->create_plan_recurse->create_projection_plan函式建立計劃樹,執行投影操作並透過遞迴的方式為子訪問路徑生成執行計劃。create_sort_plan函式建立Sort計劃節點。


//---------------------------------------------------------------- create_projection_plan

/*
 * create_projection_plan
 *
 *    Create a plan tree to do a projection step and (recursively) plans
 *    for its subpaths.  We may need a Result node for the projection,
 *    but sometimes we can just let the subplan do the work.
 *    建立計劃樹,執行投影操作並透過遞迴的方式為子訪問路徑生成執行計劃.
 *    一般來說需要一個Result節點用於投影操作,但有時候可以讓子計劃執行此項任務.
 */
static Plan *
create_projection_plan(PlannerInfo *root, ProjectionPath *best_path, int flags)
{
    Plan       *plan;
    Plan       *subplan;
    List       *tlist;
    bool        needs_result_node = false;

    /*
     * Convert our subpath to a Plan and determine whether we need a Result
     * node.
     * 轉換subpath為Plan,並確定是否需要Result節點.
     *
     * In most cases where we don't need to project, creation_projection_path
     * will have set dummypp, but not always.  First, some createplan.c
     * routines change the tlists of their nodes.  (An example is that
     * create_merge_append_plan might add resjunk sort columns to a
     * MergeAppend.)  Second, create_projection_path has no way of knowing
     * what path node will be placed on top of the projection path and
     * therefore can't predict whether it will require an exact tlist. For
     * both of these reasons, we have to recheck here.
     * 在大多數情況下,我們不需要投影運算,creation_projection_path將設定dummypp標誌,但並不總是如此。
     * 首先,一些createplan.c中的函式更改其節點的tlist。
     * (例如,create_merge_append_plan可能會向MergeAppend新增resjunk sort列)。
     * 其次,create_projection_path無法知道將在投影路徑頂部放置哪些路徑節點,因此無法預測它是否需要一個確切的tlist。
     * 由於這兩個原因,我們不得不在這裡重新檢查。
     */
    if (use_physical_tlist(root, &best_path->path, flags))
    {
        /*
         * Our caller doesn't really care what tlist we return, so we don't
         * actually need to project.  However, we may still need to ensure
         * proper sortgroupref labels, if the caller cares about those.
         * 如果我們的呼叫者並不關心返回的結果tlist,所以實際上不需要投影運算。
         * 然而,如果呼叫者關心sortgroupref標籤,可能仍然需要確保正確的sortgroupref資料。
         */
        subplan = create_plan_recurse(root, best_path->subpath, 0);
        tlist = subplan->targetlist;
        if (flags & CP_LABEL_TLIST)
            apply_pathtarget_labeling_to_tlist(tlist,
                                               best_path->path.pathtarget);
    }
    else if (is_projection_capable_path(best_path->subpath))
    {
        /*
         * Our caller requires that we return the exact tlist, but no separate
         * result node is needed because the subpath is projection-capable.
         * Tell create_plan_recurse that we're going to ignore the tlist it
         * produces.
         * 呼叫者要求返回精確的tlist,但是不需要單獨的Result節點,因為子路徑支援投影。
         * 呼叫create_plan_recurse時忽略它生成的tlist。
         */
        subplan = create_plan_recurse(root, best_path->subpath,
                                      CP_IGNORE_TLIST);
        tlist = build_path_tlist(root, &best_path->path);
    }
    else
    {
        /*
         * It looks like we need a result node, unless by good fortune the
         * requested tlist is exactly the one the child wants to produce.
         * 看起來需要一個Result節點,除非幸運的是,請求的tlist正是子節點想要生成的。
         */
        subplan = create_plan_recurse(root, best_path->subpath, 0);
        tlist = build_path_tlist(root, &best_path->path);
        needs_result_node = !tlist_same_exprs(tlist, subplan->targetlist);
    }

    /*
     * If we make a different decision about whether to include a Result node
     * than create_projection_path did, we'll have made slightly wrong cost
     * estimates; but label the plan with the cost estimates we actually used,
     * not "corrected" ones.  (XXX this could be cleaned up if we moved more
     * of the sortcolumn setup logic into Path creation, but that would add
     * expense to creating Paths we might end up not using.)
     * 如果我們對是否包含一個Result節點做出與create_projection_path不同的決定,
     * 我們就會做出略微錯誤的成本估算;
     * 但是在這個計劃上標上我們實際使用的成本估算,而不是“修正的”成本估算。
     * (如果我們將更多的sortcolumn設定邏輯移到路徑建立中,這個問題就可以解決了,
     *  但是這會增加建立路徑的成本,而最終可能不會使用這些路徑。)
     */
    if (!needs_result_node)
    {
        /* Don't need a separate Result, just assign tlist to subplan */
        //不需要單獨的Result節點,把tlist賦值給subplan
        plan = subplan;
        plan->targetlist = tlist;

        /* Label plan with the estimated costs we actually used */
        //標記估算成本
        plan->startup_cost = best_path->path.startup_cost;
        plan->total_cost = best_path->path.total_cost;
        plan->plan_rows = best_path->path.rows;
        plan->plan_width = best_path->path.pathtarget->width;
        plan->parallel_safe = best_path->path.parallel_safe;
        /* ... but don't change subplan's parallel_aware flag */
    }
    else
    {
        /* We need a Result node */
        //需要Result節點
        plan = (Plan *) make_result(tlist, NULL, subplan);

        copy_generic_path_info(plan, (Path *) best_path);
    }

    return plan;
}

//---------------------------------------------------------------- create_sort_plan
 /*
  * create_sort_plan
  *
  *    Create a Sort plan for 'best_path' and (recursively) plans
  *    for its subpaths.
  *    建立Sort計劃節點
  */
 static Sort *
 create_sort_plan(PlannerInfo *root, SortPath *best_path, int flags)
 {
     Sort       *plan;
     Plan       *subplan;
 
     /*
      * We don't want any excess columns in the sorted tuples, so request a
      * smaller tlist.  Otherwise, since Sort doesn't project, tlist
      * requirements pass through.
      * 我們不希望在排序元組中有任何多餘的列,所以希望得到一個更小的tlist。
      * 否則,由於Sort不執行投影運算,tlist就會透過完整的保留傳遞。
      */
     subplan = create_plan_recurse(root, best_path->subpath,
                                   flags | CP_SMALL_TLIST);
 
     /*
      * make_sort_from_pathkeys() indirectly calls find_ec_member_for_tle(),
      * which will ignore any child EC members that don't belong to the given
      * relids. Thus, if this sort path is based on a child relation, we must
      * pass its relids.
      * make_sort_from_pathkeys()間接呼叫find_ec_member_for_tle(),它將忽略不屬於給定relid的任何子EC成員。
      * 因此,如果這個排序路徑是基於子關係的,我們必須傳遞它的relids。
      */
     plan = make_sort_from_pathkeys(subplan, best_path->path.pathkeys,
                                    IS_OTHER_REL(best_path->subpath->parent) ?
                                    best_path->path.parent->relids : NULL);
 
     copy_generic_path_info(&plan->plan, (Path *) best_path);
 
     return plan;
 }


//------------------------------------------------ build_path_tlist

/*
 * Build a target list (ie, a list of TargetEntry) for the Path's output.
 * 構建用於輸出的target連結串列(比如TargetEntry節點連結串列)
 *
 * This is almost just make_tlist_from_pathtarget(), but we also have to
 * deal with replacing nestloop params.
 * 該函式幾乎就是make_tlist_from_pathtarget()的實現邏輯,但還必須處理替換nestloop引數。
 */
static List *
build_path_tlist(PlannerInfo *root, Path *path)
{
    List       *tlist = NIL;
    Index      *sortgrouprefs = path->pathtarget->sortgrouprefs;
    int         resno = 1;
    ListCell   *v;

    foreach(v, path->pathtarget->exprs)
    {
        Node       *node = (Node *) lfirst(v);
        TargetEntry *tle;

        /*
         * If it's a parameterized path, there might be lateral references in
         * the tlist, which need to be replaced with Params.  There's no need
         * to remake the TargetEntry nodes, so apply this to each list item
         * separately.
         * 如果是引數化路徑,那麼tlist中可能有橫向引用,需要用Params替換。
         * 不需要重新構建TargetEntry節點,因此可以將其單獨應用於每個連結串列項。
         */
        if (path->param_info)
            node = replace_nestloop_params(root, node);

        tle = makeTargetEntry((Expr *) node,
                              resno,
                              NULL,
                              false);
        if (sortgrouprefs)
            tle->ressortgroupref = sortgrouprefs[resno - 1];

        tlist = lappend(tlist, tle);
        resno++;
    }
    return tlist;
}

三、跟蹤分析

測試指令碼如下

testdb=# explain select dw.*,grjf.grbh,grjf.xm,grjf.ny,grjf.je 
testdb-# from t_dwxx dw,lateral (select gr.grbh,gr.xm,jf.ny,jf.je 
testdb(#                         from t_grxx gr inner join t_jfxx jf 
testdb(#                                        on gr.dwbh = dw.dwbh 
testdb(#                                           and gr.grbh = jf.grbh) grjf
testdb-# order by dw.dwbh;
                                        QUERY PLAN                                        
------------------------------------------------------------------------------------------
 Sort  (cost=20070.93..20320.93 rows=100000 width=47)
   Sort Key: dw.dwbh
   ->  Hash Join  (cost=3754.00..8689.61 rows=100000 width=47)
         Hash Cond: ((gr.dwbh)::text = (dw.dwbh)::text)
         ->  Hash Join  (cost=3465.00..8138.00 rows=100000 width=31)
               Hash Cond: ((jf.grbh)::text = (gr.grbh)::text)
               ->  Seq Scan on t_jfxx jf  (cost=0.00..1637.00 rows=100000 width=20)
               ->  Hash  (cost=1726.00..1726.00 rows=100000 width=16)
                     ->  Seq Scan on t_grxx gr  (cost=0.00..1726.00 rows=100000 width=16)
         ->  Hash  (cost=164.00..164.00 rows=10000 width=20)
               ->  Seq Scan on t_dwxx dw  (cost=0.00..164.00 rows=10000 width=20)
(11 rows)

啟動gdb,設定斷點,進入create_projection_plan函式

(gdb) b create_projection_plan
Breakpoint 2 at 0x7b95a8: file createplan.c, line 1627.
(gdb) c
Continuing.

Breakpoint 2, create_projection_plan (root=0x26c1258, best_path=0x2722d00, flags=1) at createplan.c:1627
1627        bool        needs_result_node = false;

轉換subpath為Plan,並確定是否需要Result節點,並且判斷是否需要生成Result節點

...
(gdb) n
1642        if (use_physical_tlist(root, &best_path->path, flags))
(gdb) n
1655        else if (is_projection_capable_path(best_path->subpath))
(gdb) 
1673            subplan = create_plan_recurse(root, best_path->subpath, 0);

檢視best_path&best_path->subpath變數

(gdb) p *best_path
$3 = {path = {type = T_ProjectionPath, pathtype = T_Result, parent = 0x2722998, pathtarget = 0x27226f8, param_info = 0x0, 
    parallel_aware = false, parallel_safe = true, parallel_workers = 0, rows = 100000, startup_cost = 20070.931487218411, 
    total_cost = 20320.931487218411, pathkeys = 0x26cfe98}, subpath = 0x2722c68, dummypp = true}
(gdb) p *(SortPath *)best_path->subpath
$16 = {path = {type = T_SortPath, pathtype = T_Sort, parent = 0x2722998, pathtarget = 0x27212d8, param_info = 0x0, 
    parallel_aware = false, parallel_safe = true, parallel_workers = 0, rows = 100000, startup_cost = 20070.931487218411, 
    total_cost = 20320.931487218411, pathkeys = 0x26cfe98}, subpath = 0x2721e60}

建立subpath(SortPath)的執行計劃

(gdb) step
create_plan_recurse (root=0x26c1258, best_path=0x2722c68, flags=0) at createplan.c:364
364     check_stack_depth();
(gdb) n
366     switch (best_path->pathtype)
(gdb) 
447             plan = (Plan *) create_sort_plan(root,

進入create_sort_plan

(gdb) step
create_sort_plan (root=0x26c1258, best_path=0x2722c68, flags=0) at createplan.c:1759
1759        subplan = create_plan_recurse(root, best_path->subpath,

SortPath的subpath是HashPath

(gdb) p best_path->subpath->type
$17 = T_HashPath
(gdb) p *(HashPath *)best_path->subpath
$18 = {jpath = {path = {type = T_HashPath, pathtype = T_HashJoin, parent = 0x27210c0, pathtarget = 0x27212d8, 
      param_info = 0x0, parallel_aware = false, parallel_safe = true, parallel_workers = 0, rows = 100000, 
      startup_cost = 3754, total_cost = 8689.6112499999999, pathkeys = 0x0}, jointype = JOIN_INNER, inner_unique = true, 
    outerjoinpath = 0x2720f68, innerjoinpath = 0x26d0598, joinrestrictinfo = 0x2722068}, path_hashclauses = 0x27223c0, 
  num_batches = 1, inner_rows_total = 10000}

完成SortPath執行計劃的構建

(gdb) 
1774        return plan;
(gdb) 
1775    }
(gdb) p *plan
$20 = {plan = {type = T_Sort, startup_cost = 20070.931487218411, total_cost = 20320.931487218411, plan_rows = 100000, 
    plan_width = 47, parallel_aware = false, parallel_safe = true, plan_node_id = 0, targetlist = 0x2723208, qual = 0x0, 
    lefttree = 0x27243d0, righttree = 0x0, initPlan = 0x0, extParam = 0x0, allParam = 0x0}, numCols = 1, 
  sortColIdx = 0x27222a0, sortOperators = 0x2724468, collations = 0x2724488, nullsFirst = 0x27244a8}

回到上一層

(gdb) n
create_plan_recurse (root=0x26c1258, best_path=0x2722c68, flags=0) at createplan.c:450
450             break;

回到create_projection_plan函式

(gdb) n
504     return plan;
(gdb) 
505 }
(gdb) 
create_projection_plan (root=0x26c1258, best_path=0x2722d00, flags=1) at createplan.c:1674
1674            tlist = build_path_tlist(root, &best_path->path);

執行完畢,返回create_plan,結果,最外層的Plan為Sort

(gdb) 
1708        return plan;
(gdb) 
1709    }
(gdb) p *plan
$22 = {type = T_Sort, startup_cost = 20070.931487218411, total_cost = 20320.931487218411, plan_rows = 100000, 
  plan_width = 47, parallel_aware = false, parallel_safe = true, plan_node_id = 0, targetlist = 0x2724548, qual = 0x0, 
  lefttree = 0x27243d0, righttree = 0x0, initPlan = 0x0, extParam = 0x0, allParam = 0x0}
(gdb) n
create_plan_recurse (root=0x26c1258, best_path=0x2722d00, flags=1) at createplan.c:504
504     return plan;
(gdb) p *plan
$23 = {type = T_Sort, startup_cost = 20070.931487218411, total_cost = 20320.931487218411, plan_rows = 100000, 
  plan_width = 47, parallel_aware = false, parallel_safe = true, plan_node_id = 0, targetlist = 0x2724548, qual = 0x0, 
  lefttree = 0x27243d0, righttree = 0x0, initPlan = 0x0, extParam = 0x0, allParam = 0x0}
(gdb) n
505 }
(gdb) 
create_plan (root=0x26c1258, best_path=0x2722d00) at createplan.c:329
329     if (!IsA(plan, ModifyTable))

DONE!

四、參考資料

createplan.c
PG Document:Query Planning

來自 “ ITPUB部落格 ” ,連結:http://blog.itpub.net/6906/viewspace-2374812/,如需轉載,請註明出處,否則將追究法律責任。

相關文章