PostgreSQL Source Code Walkthrough (74) - Query Statements #59 (Review - subquery_...

Posted by husthxd on 2018-11-01

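This section goes back to review the implementation logic of subquery_planner, the function that plans a (sub)query. The planner recurses into it for every (sub)query (sub-SELECT) found in the query tree.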

I. Source Code Walkthrough

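subquery_planner is called by standard_planner and produces the final result relation together with its lowest-cost path; that output is the input for generating the actual execution plan. Inside this function, grouping_planner is called to carry out the main planning work.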

/*--------------------
 * subquery_planner
 *    Invokes the planner on a subquery.  We recurse to here for each
 *    sub-SELECT found in the query tree.
 *    Plans a subquery; the planner recurses into this function for every sub-SELECT in the query tree.
 *
 * glob is the global state for the current planner run.
 * parse is the querytree produced by the parser & rewriter.
 * parent_root is the immediate parent Query's info (NULL at the top level).
 * hasRecursion is true if this is a recursive WITH query.
 * tuple_fraction is the fraction of tuples we expect will be retrieved.
 * tuple_fraction is interpreted as explained for grouping_planner, below.
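 * glob: global state of the current planner run.
 * parse: the query tree produced by the parser and the rewriter.
 * parent_root: planning info of the parent query (NULL at the top level).
 * hasRecursion: true if this is a recursive WITH query.
 * tuple_fraction: the fraction of tuples expected to be retrieved; its interpretation is explained at grouping_planner.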
 *
 * Basically, this routine does the stuff that should only be done once
 * per Query object.  It then calls grouping_planner.  At one time,
 * grouping_planner could be invoked recursively on the same Query object;
 * that's not currently true, but we keep the separation between the two
 * routines anyway, in case we need it again someday.
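 * In short, this function does the work that needs to happen only once per Query
 * and then calls grouping_planner. grouping_planner used to be callable recursively
 * on the same Query; that is no longer the case, but the two routines
 * (subquery_planner and grouping_planner) are kept separate in case the split is needed again.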
 * 
 * subquery_planner will be called recursively to handle sub-Query nodes
 * found within the query's expressions and rangetable.
 * subquery_planner is called recursively for the sub-Query nodes found in the query's expressions and range table.
 *
 * Returns the PlannerInfo struct ("root") that contains all data generated
 * while planning the subquery.  In particular, the Path(s) attached to
 * the (UPPERREL_FINAL, NULL) upperrel represent our conclusions about the
 * cheapest way(s) to implement the query.  The top level will select the
 * best Path and pass it through createplan.c to produce a finished Plan.
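 * Returns the PlannerInfo struct ("root") holding all data generated while planning the subquery.
 * The Paths attached to the (UPPERREL_FINAL, NULL) upper rel are the cheapest ways the optimizer found to implement the query.
 * The top level picks the best Path and passes it through createplan.c to build the finished Plan.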
 *--------------------
 */
/*
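Input:
    glob - PlannerGlobal
    parse - pointer to the Query struct
    parent_root - PlannerInfo (root) of the parent query
    hasRecursion - is this a recursive WITH query?
    tuple_fraction - fraction of tuples expected to be retrieved
Output:
    pointer to PlannerInfo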
*/
PlannerInfo *
subquery_planner(PlannerGlobal *glob, Query *parse,
                 PlannerInfo *parent_root,
                 bool hasRecursion, double tuple_fraction)
{
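    PlannerInfo *root;//return value
    List       *newWithCheckOptions;//WITH CHECK OPTION entries whose qual survives preprocessing
    List       *newHaving;//clauses that stay in HAVING
    bool        hasOuterJoins;//is there any outer join?
    RelOptInfo *final_rel;//the (UPPERREL_FINAL, NULL) upper rel
    ListCell   *l;//loop variable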

    /* Create a PlannerInfo data structure for this subquery */
    //create the planner data structure: PlannerInfo
    root = makeNode(PlannerInfo);//build the return value
    root->parse = parse;
    root->glob = glob;
    root->query_level = parent_root ? parent_root->query_level + 1 : 1;
    root->parent_root = parent_root;
    root->plan_params = NIL;
    root->outer_params = NULL;
    root->planner_cxt = CurrentMemoryContext;
    root->init_plans = NIL;
    root->cte_plan_ids = NIL;
    root->multiexpr_params = NIL;
    root->eq_classes = NIL;
    root->append_rel_list = NIL;
    root->rowMarks = NIL;
    memset(root->upper_rels, 0, sizeof(root->upper_rels));
    memset(root->upper_targets, 0, sizeof(root->upper_targets));
    root->processed_tlist = NIL;
    root->grouping_map = NULL;
    root->minmax_aggs = NIL;
    root->qual_security_level = 0;
    root->inhTargetKind = INHKIND_NONE;
    root->hasRecursion = hasRecursion;
    if (hasRecursion)
        root->wt_param_id = SS_assign_special_param(root);
    else
        root->wt_param_id = -1;
    root->non_recursive_path = NULL;
    root->partColsUpdated = false;

    /*
     * If there is a WITH list, process each WITH query and build an initplan
     * SubPlan structure for it.
     * Each query in the WITH list is processed here and gets its own initplan SubPlan structure.
     */
    if (parse->cteList)
        SS_process_ctes(root);//process the WITH clause

    /*
     * Look for ANY and EXISTS SubLinks in WHERE and JOIN/ON clauses, and try
     * to transform them into joins.  Note that this step does not descend
     * into subqueries; if we pull up any subqueries below, their SubLinks are
     * processed just before pulling them up.
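     * ANY and EXISTS SubLinks in WHERE and JOIN/ON are turned into joins where possible.
     * This step does not descend into subqueries; if a subquery is pulled up below,
     * its SubLinks are processed just before it is pulled up.
     */
    if (parse->hasSubLinks)
        pull_up_sublinks(root); //pull up SubLinks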

    /*
     * Scan the rangetable for set-returning functions, and inline them if
     * possible (producing subqueries that might get pulled up next).
     * Recursion issues here are handled in the same way as for SubLinks.
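     * The range table is scanned for set-returning functions, which are inlined
     * where possible; this produces subqueries that may be pulled up in the next step.
     * Recursion is handled the same way as for SubLinks.
     */
    inline_set_returning_functions(root);//inline set-returning functions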

    /*
     * Check to see if any subqueries in the jointree can be merged into this
     * query.
     * Subqueries in the join tree are merged into this query where possible (subquery pull-up).
     */
    pull_up_subqueries(root);//pull up subqueries

    /*
     * If this is a simple UNION ALL query, flatten it into an appendrel. We
     * do this now because it requires applying pull_up_subqueries to the leaf
     * queries of the UNION ALL, which weren't touched above because they
     * weren't referenced by the jointree (they will be after we do this).
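     * A simple UNION ALL query is flattened into an appendrel.
     * It is done at this point because pull_up_subqueries has to be applied to the
     * leaf queries of the UNION ALL, which were not touched above because the
     * jointree did not reference them (it will once this step is done).
     */
    if (parse->setOperations)
        flatten_simple_union_all(root);//flatten a simple UNION ALL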

    /*
     * Detect whether any rangetable entries are RTE_JOIN kind; if not, we can
     * avoid the expense of doing flatten_join_alias_vars().  Also check for
     * outer joins --- if none, we can skip reduce_outer_joins().  And check
     * for LATERAL RTEs, too.  This must be done after we have done
     * pull_up_subqueries(), of course.
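     * If no range table entry is of kind RTE_JOIN, the cost of flatten_join_alias_vars() can be avoided.
     * If there are no outer joins, reduce_outer_joins() can be skipped.
     * LATERAL RTEs are detected in the same pass.
     * All of this has to happen after the pull_up_subqueries() call.
     */
    //scan the range table for RTE_JOIN entries, outer joins and LATERAL RTEs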
    root->hasJoinRTEs = false;
    root->hasLateralRTEs = false;
    hasOuterJoins = false;
    foreach(l, parse->rtable)
    {
        RangeTblEntry *rte = lfirst_node(RangeTblEntry, l);

        if (rte->rtekind == RTE_JOIN)
        {
            root->hasJoinRTEs = true;
            if (IS_OUTER_JOIN(rte->jointype))
                hasOuterJoins = true;
        }
        if (rte->lateral)
            root->hasLateralRTEs = true;
    }

    /*
     * Preprocess RowMark information.  We need to do this after subquery
     * pullup (so that all non-inherited RTEs are present) and before
     * inheritance expansion (so that the info is available for
     * expand_inherited_tables to examine and modify).
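     * RowMark information is preprocessed here.
     * This must run after subquery pull-up (so that all non-inherited RTEs are present)
     * and before inheritance expansion (so that expand_inherited_tables can examine
     * and modify the information).
     */
    //preprocess RowMark information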
    preprocess_rowmarks(root);

    /*
     * Expand any rangetable entries that are inheritance sets into "append
     * relations".  This can add entries to the rangetable, but they must be
     * plain base relations not joins, so it's OK (and marginally more
     * efficient) to do it after checking for join RTEs.  We must do it after
     * pulling up subqueries, else we'd fail to handle inherited tables in
     * subqueries.
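     * Range table entries that are inheritance sets are expanded into "append relations".
     * This can add entries to the range table, but they are plain base relations, not joins,
     * so doing it after the join-RTE check is fine (and marginally more efficient).
     * It must come after subquery pull-up, otherwise inherited tables inside
     * subqueries would not be handled.
     */
    //expand inherited tables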
    expand_inherited_tables(root);

    /*
     * Set hasHavingQual to remember if HAVING clause is present.  Needed
     * because preprocess_expression will reduce a constant-true condition to
     * an empty qual list ... but "HAVING TRUE" is not a semantic no-op.
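     * hasHavingQual records whether a HAVING clause is present.
     * preprocess_expression reduces a constant-true condition to an empty qual list,
     * yet "HAVING TRUE" is not a semantic no-op (it still makes the query a grouped query),
     * so its presence has to be remembered before preprocessing.
     */
    //is there a HAVING clause?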
    root->hasHavingQual = (parse->havingQual != NULL);

    /* Clear this flag; might get set in distribute_qual_to_rels */
    //clear hasPseudoConstantQuals; distribute_qual_to_rels may set it later
    root->hasPseudoConstantQuals = false;

    /*
     * Do expression preprocessing on targetlist and quals, as well as other
     * random expressions in the querytree.  Note that we do not need to
     * handle sort/group expressions explicitly, because they are actually
     * part of the targetlist.
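     * Expressions in the targetlist and quals are preprocessed, along with the other
     * miscellaneous expressions in the query tree.
     * Sort/group expressions need no explicit handling because they are part of the targetlist.
     */
    //preprocess the target list (the projected columns)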
    parse->targetList = (List *)
        preprocess_expression(root, (Node *) parse->targetList,
                              EXPRKIND_TARGET);

    /* Constant-folding might have removed all set-returning functions */
    //constant folding may have removed every set-returning function
    if (parse->hasTargetSRFs)
        parse->hasTargetSRFs = expression_returns_set((Node *) parse->targetList);

    newWithCheckOptions = NIL;
    foreach(l, parse->withCheckOptions)//WITH CHECK OPTION quals
    {
        WithCheckOption *wco = lfirst_node(WithCheckOption, l);

        wco->qual = preprocess_expression(root, wco->qual,
                                          EXPRKIND_QUAL);
        if (wco->qual != NULL)
            newWithCheckOptions = lappend(newWithCheckOptions, wco);
    }
    parse->withCheckOptions = newWithCheckOptions;
    //RETURNING list
    parse->returningList = (List *)
        preprocess_expression(root, (Node *) parse->returningList,
                              EXPRKIND_TARGET);
    //preprocess the qual conditions in the jointree
    preprocess_qual_conditions(root, (Node *) parse->jointree);
    //preprocess the HAVING expression
    parse->havingQual = preprocess_expression(root, parse->havingQual,
                                              EXPRKIND_QUAL);
    //window clauses (frame start/end offsets)
    foreach(l, parse->windowClause)
    {
        WindowClause *wc = lfirst_node(WindowClause, l);

        /* partitionClause/orderClause are sort/group expressions */
        wc->startOffset = preprocess_expression(root, wc->startOffset,
                                                EXPRKIND_LIMIT);
        wc->endOffset = preprocess_expression(root, wc->endOffset,
                                              EXPRKIND_LIMIT);
    }
    //LIMIT/OFFSET clause
    parse->limitOffset = preprocess_expression(root, parse->limitOffset,
                                               EXPRKIND_LIMIT);
    parse->limitCount = preprocess_expression(root, parse->limitCount,
                                              EXPRKIND_LIMIT);
    //ON CONFLICT clause
    if (parse->onConflict)
    {
        parse->onConflict->arbiterElems = (List *)
            preprocess_expression(root,
                                  (Node *) parse->onConflict->arbiterElems,
                                  EXPRKIND_ARBITER_ELEM);
        parse->onConflict->arbiterWhere =
            preprocess_expression(root,
                                  parse->onConflict->arbiterWhere,
                                  EXPRKIND_QUAL);
        parse->onConflict->onConflictSet = (List *)
            preprocess_expression(root,
                                  (Node *) parse->onConflict->onConflictSet,
                                  EXPRKIND_TARGET);
        parse->onConflict->onConflictWhere =
            preprocess_expression(root,
                                  parse->onConflict->onConflictWhere,
                                  EXPRKIND_QUAL);
        /* exclRelTlist contains only Vars, so no preprocessing needed */
    }
    //append_rel_list (AppendRelInfo nodes)
    root->append_rel_list = (List *)
        preprocess_expression(root, (Node *) root->append_rel_list,
                              EXPRKIND_APPINFO);
     //RTE
    /* Also need to preprocess expressions within RTEs */
    foreach(l, parse->rtable)
    {
        RangeTblEntry *rte = lfirst_node(RangeTblEntry, l);
        int         kind;
        ListCell   *lcsq;

        if (rte->rtekind == RTE_RELATION)
        {
            if (rte->tablesample)
                rte->tablesample = (TableSampleClause *)
                    preprocess_expression(root,
                                          (Node *) rte->tablesample,
                                          EXPRKIND_TABLESAMPLE);//TABLESAMPLE clause
        }
        else if (rte->rtekind == RTE_SUBQUERY)//subquery
        {
            /*
             * We don't want to do all preprocessing yet on the subquery's
             * expressions, since that will happen when we plan it.  But if it
             * contains any join aliases of our level, those have to get
             * expanded now, because planning of the subquery won't do it.
             * That's only possible if the subquery is LATERAL.
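             * The subquery's expressions are not fully preprocessed yet, because that
             * happens when the subquery itself is planned.
             * Join aliases of the current level must be expanded now, though, since planning
             * the subquery will not do it; this can only occur for a LATERAL subquery.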
             */
            if (rte->lateral && root->hasJoinRTEs)
                rte->subquery = (Query *)
                    flatten_join_alias_vars(root, (Node *) rte->subquery);
        }
        else if (rte->rtekind == RTE_FUNCTION)//function
        {
            /* Preprocess the function expression(s) fully */
            kind = rte->lateral ? EXPRKIND_RTFUNC_LATERAL : EXPRKIND_RTFUNC;
            rte->functions = (List *)
                preprocess_expression(root, (Node *) rte->functions, kind);
        }
        else if (rte->rtekind == RTE_TABLEFUNC)//TABLE FUNC
        {
            /* Preprocess the function expression(s) fully */
            kind = rte->lateral ? EXPRKIND_TABLEFUNC_LATERAL : EXPRKIND_TABLEFUNC;
            rte->tablefunc = (TableFunc *)
                preprocess_expression(root, (Node *) rte->tablefunc, kind);
        }
        else if (rte->rtekind == RTE_VALUES)//VALUES clause
        {
            /* Preprocess the values lists fully */
            kind = rte->lateral ? EXPRKIND_VALUES_LATERAL : EXPRKIND_VALUES;
            rte->values_lists = (List *)
                preprocess_expression(root, (Node *) rte->values_lists, kind);
        }

        /*
         * Process each element of the securityQuals list as if it were a
         * separate qual expression (as indeed it is).  We need to do it this
         * way to get proper canonicalization of AND/OR structure.  Note that
         * this converts each element into an implicit-AND sublist.
         * Each element of securityQuals is preprocessed as a separate qual expression,
         * which is what yields a properly canonicalized AND/OR structure.
         * Each element ends up as an implicit-AND sublist.
         */
        foreach(lcsq, rte->securityQuals)
        {
            lfirst(lcsq) = preprocess_expression(root,
                                                 (Node *) lfirst(lcsq),
                                                 EXPRKIND_QUAL);
        }
    }

    /*
     * Now that we are done preprocessing expressions, and in particular done
     * flattening join alias variables, get rid of the joinaliasvars lists.
     * They no longer match what expressions in the rest of the tree look
     * like, because we have not preprocessed expressions in those lists (and
     * do not want to; for example, expanding a SubLink there would result in
     * a useless unreferenced subplan).  Leaving them in place simply creates
     * a hazard for later scans of the tree.  We could try to prevent that by
     * using QTW_IGNORE_JOINALIASES in every tree scan done after this point,
     * but that doesn't sound very reliable.
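     * With expression preprocessing finished, and join alias variables flattened in
     * particular, the joinaliasvars lists can be dropped.
     * They no longer match the expressions in the rest of the tree, because the
     * expressions in those lists were not preprocessed (deliberately: expanding a
     * SubLink there would produce a useless unreferenced subplan).
     * Leaving them in place would only be a hazard for later scans of the tree.
     * Using QTW_IGNORE_JOINALIASES in every later tree scan could prevent that,
     * but it does not sound very reliable.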
     */
    if (root->hasJoinRTEs)
    {
        foreach(l, parse->rtable)
        {
            RangeTblEntry *rte = lfirst_node(RangeTblEntry, l);

            rte->joinaliasvars = NIL;
        }
    }

    /*
     * In some cases we may want to transfer a HAVING clause into WHERE. We
     * cannot do so if the HAVING clause contains aggregates (obviously) or
     * volatile functions (since a HAVING clause is supposed to be executed
     * only once per group).  We also can't do this if there are any nonempty
     * grouping sets; moving such a clause into WHERE would potentially change
     * the results, if any referenced column isn't present in all the grouping
     * sets.  (If there are only empty grouping sets, then the HAVING clause
     * must be degenerate as discussed below.)
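     * In some cases a HAVING clause can be transferred into WHERE.
     * This is not possible if the clause contains aggregates (obviously) or volatile
     * functions (a HAVING clause is supposed to run only once per group).
     * It is also not possible when there are nonempty grouping sets: if a referenced
     * column is missing from some grouping set, moving the clause into WHERE could change the results.
     * (With only empty grouping sets, the HAVING clause must be degenerate, as discussed below.)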
     *
     * Also, it may be that the clause is so expensive to execute that we're
     * better off doing it only once per group, despite the loss of
     * selectivity.  This is hard to estimate short of doing the entire
     * planning process twice, so we use a heuristic: clauses containing
     * subplans are left in HAVING.  Otherwise, we move or copy the HAVING
     * clause into WHERE, in hopes of eliminating tuples before aggregation
     * instead of after.
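     * In addition, a clause may be so expensive to execute that running it only once
     * per group is better, despite the loss of selectivity.
     * Estimating that would take running the whole planning process twice, so a heuristic is used:
     * clauses containing subplans are left in HAVING.
     * Every other clause is moved or copied into WHERE, in the hope of eliminating
     * tuples before aggregation rather than after it.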
     * 
     * If the query has explicit grouping then we can simply move such a
     * clause into WHERE; any group that fails the clause will not be in the
     * output because none of its tuples will reach the grouping or
     * aggregation stage.  Otherwise we must have a degenerate (variable-free)
     * HAVING clause, which we put in WHERE so that query_planner() can use it
     * in a gating Result node, but also keep in HAVING to ensure that we
     * don't emit a bogus aggregated row. (This could be done better, but it
     * seems not worth optimizing.)
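     * If the query has explicit grouping, such a clause can simply be moved into WHERE:
     * a group that fails the clause never shows up in the output, because none of
     * its tuples reaches the grouping or aggregation stage.
     * Otherwise the HAVING clause must be degenerate (variable-free). It is put in WHERE
     * so that query_planner() can use it in a gating Result node, and it is also kept
     * in HAVING so that no bogus aggregated row is emitted.
     * (This could be done better, but it does not seem worth optimizing.)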
     *
     * Note that both havingQual and parse->jointree->quals are in
     * implicitly-ANDed-list form at this point, even though they are declared
     * as Node *.
     * Note that havingQual and parse->jointree->quals are both implicitly-ANDed lists
     * at this point, even though they are declared as Node *.
     */
    newHaving = NIL;
    foreach(l, (List *) parse->havingQual)
    {
        Node       *havingclause = (Node *) lfirst(l);

        if ((parse->groupClause && parse->groupingSets) ||
            contain_agg_clause(havingclause) ||
            contain_volatile_functions(havingclause) ||
            contain_subplans(havingclause))
        {
            /* keep it in HAVING */
            newHaving = lappend(newHaving, havingclause);
        }
        else if (parse->groupClause && !parse->groupingSets)
        {
            /* move it to WHERE */
            parse->jointree->quals = (Node *)
                lappend((List *) parse->jointree->quals, havingclause);
        }
        else
        {
            /* put a copy in WHERE, keep it in HAVING */
            parse->jointree->quals = (Node *)
                lappend((List *) parse->jointree->quals,
                        copyObject(havingclause));
            newHaving = lappend(newHaving, havingclause);
        }
    }
    parse->havingQual = (Node *) newHaving;

    /* Remove any redundant GROUP BY columns */
    //remove redundant GROUP BY columns
    remove_useless_groupby_columns(root);

    /*
     * If we have any outer joins, try to reduce them to plain inner joins.
     * This step is most easily done after we've done expression
     * preprocessing.
     * If outer joins exist, try to reduce them to plain inner joins.
     * This is easiest once expression preprocessing is done.
     */
    if (hasOuterJoins)
        reduce_outer_joins(root);

    /*
     * Do the main planning.  If we have an inherited target relation, that
     * needs special processing, else go straight to grouping_planner.
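     * This is the main planning step.
     * An inherited target relation needs special processing (inheritance_planner);
     * otherwise grouping_planner is called directly.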
     */
    if (parse->resultRelation &&
        rt_fetch(parse->resultRelation, parse->rtable)->inh)
        inheritance_planner(root);
    else
        grouping_planner(root, false, tuple_fraction);

    /*
     * Capture the set of outer-level param IDs we have access to, for use in
     * extParam/allParam calculations later.
     * Collect the outer-level param IDs accessible here, for the later extParam/allParam calculations.
     */
    SS_identify_outer_params(root);

    /*
     * If any initPlans were created in this query level, adjust the surviving
     * Paths' costs and parallel-safety flags to account for them.  The
     * initPlans won't actually get attached to the plan tree till
     * create_plan() runs, but we must include their effects now.
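     * If initPlans were created at this query level, the costs and parallel-safety
     * flags of the surviving Paths are adjusted to account for them.
     * The initPlans are not attached to the plan tree until create_plan() runs,
     * but their effects have to be included now.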
     */
    final_rel = fetch_upper_rel(root, UPPERREL_FINAL, NULL);
    SS_charge_for_initplans(root, final_rel);

    /*
     * Make sure we've identified the cheapest Path for the final rel.  (By
     * doing this here not in grouping_planner, we include initPlan costs in
     * the decision, though it's unlikely that will change anything.)
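     * Make sure the cheapest Path of the final rel has been identified.
     * (Doing it here rather than in grouping_planner lets the decision include
     * initPlan costs, although that is unlikely to change anything.)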
     */
    set_cheapest(final_rel);

    return root;
}
 

II. References

allpaths.c
PG Document: Query Planning

From the "ITPUB Blog", link: http://blog.itpub.net/6906/viewspace-2374817/. If you repost this article, please credit the source; otherwise legal liability will be pursued.
