PostgreSQL Source Code Walkthrough (100) - Partitioned Tables #6 (Query Routing #3 - prune part...

Posted by husthxd on 2018-12-03

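This installment explains how PG decides which partitions a query on a partitioned table actually has to scan. During planning, set_rel_size checks whether the RTE is a partitioned table (rte->inh = true). If it is, set_rel_size calls set_append_rel_size, which calls prune_append_rel_partitions to obtain the partitions that "survive" pruning. Below we walk through the main logic of prune_append_rel_partitions and the function it relies on, gen_partprune_steps.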

I. Data Structures

PartitionScheme
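The partition scheme. By design, a partition scheme contains only the general properties of the partitioning method: list vs. range, the number of partition columns, and the type information of each partition column. It does not contain the specific partition bounds.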

/*
 * If multiple relations are partitioned the same way, all such partitions
 * will have a pointer to the same PartitionScheme.  A list of PartitionScheme
 * objects is attached to the PlannerInfo.  By design, the partition scheme
 * incorporates only the general properties of the partition method (LIST vs.
 * RANGE, number of partitioning columns and the type information for each)
 * and not the specific bounds.
 *
 * We store the opclass-declared input data types instead of the partition key
 * datatypes since the former rather than the latter are used to compare
 * partition bounds. Since partition key data types and the opclass declared
 * input data types are expected to be binary compatible (per ResolveOpClass),
 * both of those should have same byval and length properties.
 */
typedef struct PartitionSchemeData
{
    char        strategy;       /* partition strategy */
    int16       partnatts;      /* number of partition attributes */
    Oid        *partopfamily;   /* OIDs of operator families */
    Oid        *partopcintype;  /* OIDs of opclass declared input data types */
    Oid        *partcollation;  /* OIDs of partitioning collations */

    /* Cached information about partition key data types. */
    int16      *parttyplen;
    bool       *parttypbyval;

    /* Cached information about partition comparison functions. */
    FmgrInfo   *partsupfunc;
}           PartitionSchemeData;

typedef struct PartitionSchemeData *PartitionScheme;
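The comment above says that relations partitioned the same way point at one shared PartitionScheme. Here is a minimal sketch of that equality test. It assumes the test compares exactly the general properties stored in the struct: strategy, number of key columns, and the per-column operator family, opclass input type and collation. `MiniPartScheme`, `MINI_MAX_KEYS` and `mini_scheme_equal` are hypothetical names, not PostgreSQL's API. Fixed-size arrays replace the palloc'd ones.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

typedef unsigned int Oid;          /* local stand-in for PostgreSQL's Oid */

#define MINI_MAX_KEYS 32           /* hypothetical cap, like PARTITION_MAX_KEYS */

/* Hypothetical, simplified PartitionSchemeData: general properties only,
 * no partition bounds. */
typedef struct MiniPartScheme
{
    char  strategy;                      /* 'h', 'l' or 'r' */
    short partnatts;                     /* number of partition key columns */
    Oid   partopfamily[MINI_MAX_KEYS];   /* operator family per key column */
    Oid   partopcintype[MINI_MAX_KEYS];  /* opclass input type per key column */
    Oid   partcollation[MINI_MAX_KEYS];  /* collation per key column */
} MiniPartScheme;

/* True when two relations could share one scheme: every general property
 * matches.  Partition bounds are deliberately not compared. */
static bool
mini_scheme_equal(const MiniPartScheme *a, const MiniPartScheme *b)
{
    size_t n;

    if (a->strategy != b->strategy || a->partnatts != b->partnatts)
        return false;

    assert(a->partnatts > 0 && a->partnatts <= MINI_MAX_KEYS);
    n = sizeof(Oid) * (size_t) a->partnatts;
    return memcmp(a->partopfamily, b->partopfamily, n) == 0 &&
           memcmp(a->partopcintype, b->partopcintype, n) == 0 &&
           memcmp(a->partcollation, b->partcollation, n) == 0;
}
```

Bounds are not part of the comparison. In this sketch, two tables hash-partitioned on an int4 column with the same operator family therefore compare equal even if they have different numbers of partitions.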

PartitionPruneXXX
The data structures used while pruning: PartitionPruneStep, PartitionPruneStepOp, PartitionPruneCombineOp and PartitionPruneStepCombine.

/*
 * Abstract Node type for partition pruning steps (there are no concrete
 * Nodes of this type).
 * 
 * step_id is the global identifier of the step within its pruning context.
 */
typedef struct PartitionPruneStep
{
    NodeTag     type;
    int         step_id;
} PartitionPruneStep;

/*
 * PartitionPruneStepOp - Information to prune using a set of mutually AND'd
 *                          OpExpr clauses
 *
 * This contains information extracted from up to partnatts OpExpr clauses,
 * where partnatts is the number of partition key columns.  'opstrategy' is the
 * strategy of the operator in the clause matched to the last partition key.
 * 'exprs' contains expressions which comprise the lookup key to be passed to
 * the partition bound search function.  'cmpfns' contains the OIDs of
 * comparison functions used to compare aforementioned expressions with
 * partition bounds.  Both 'exprs' and 'cmpfns' contain the same number of
 * items, up to partnatts items.
 *
 * Once we find the offset of a partition bound using the lookup key, we
 * determine which partitions to include in the result based on the value of
 * 'opstrategy'.  For example, if it were equality, we'd return just the
 * partition that would contain that key or a set of partitions if the key
 * didn't consist of all partitioning columns.  For non-equality strategies,
 * we'd need to include other partitions as appropriate.
 *
 * 'nullkeys' is the set containing the offset of the partition keys (0 to
 * partnatts - 1) that were matched to an IS NULL clause.  This is only
 * considered for hash partitioning as we need to pass which keys are null
 * to the hash partition bound search function.  It is never possible to
 * have an expression be present in 'exprs' for a given partition key and
 * the corresponding bit set in 'nullkeys'.
 */
typedef struct PartitionPruneStepOp
{
    PartitionPruneStep step;

    StrategyNumber opstrategy;
    List       *exprs;
    List       *cmpfns;
    Bitmapset  *nullkeys;
} PartitionPruneStepOp;

/*
 * PartitionPruneStepCombine - Information to prune using a BoolExpr clause
 *
 * For BoolExpr clauses, we combine the set of partitions determined for each
 * of the argument clauses.
 */
typedef enum PartitionPruneCombineOp
{
    PARTPRUNE_COMBINE_UNION,
    PARTPRUNE_COMBINE_INTERSECT
} PartitionPruneCombineOp;

typedef struct PartitionPruneStepCombine
{
    PartitionPruneStep step;

    PartitionPruneCombineOp combineOp;
    List       *source_stepids;
} PartitionPruneStepCombine;
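The step nodes fit together roughly like this: op steps produce a set of surviving partitions, and combine steps merge the results of earlier steps, referenced by step_id, with UNION or INTERSECT. The sketch below is a simplified evaluator, not get_matching_partitions() itself. `MiniStep`, `mini_run_steps` and the 64-partition bitmask are assumptions. The op-step result is supplied precomputed instead of coming from a bound search. A combine step with no sources means "cannot prune, keep everything", which is the role of the dummy combine step built in the OR branch of gen_partprune_steps_internal. The last step's result is taken as the final answer.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t PartSet;          /* bit i set => partition i survives */

typedef enum { MINI_STEP_OP, MINI_STEP_COMBINE } MiniStepKind;
typedef enum { MINI_COMBINE_UNION, MINI_COMBINE_INTERSECT } MiniCombineOp;

#define MINI_MAX_SOURCES 8
#define MINI_MAX_STEPS   64

/* Hypothetical flattening of PartitionPruneStepOp / PartitionPruneStepCombine. */
typedef struct MiniStep
{
    MiniStepKind  kind;
    int           step_id;                   /* equals its index in the array */
    PartSet       op_result;                 /* OP: precomputed bound-search result */
    MiniCombineOp combine_op;                /* COMBINE: how to merge sources */
    int           nsources;                  /* COMBINE: number of source steps */
    int           source_stepids[MINI_MAX_SOURCES];
} MiniStep;

/* Runs the steps in order and returns the result of the last one. */
static PartSet
mini_run_steps(const MiniStep *steps, int nsteps, int nparts)
{
    PartSet results[MINI_MAX_STEPS];
    PartSet all;
    int     s;

    assert(nsteps > 0 && nsteps <= MINI_MAX_STEPS);
    assert(nparts >= 0 && nparts <= 64);
    all = (nparts == 64) ? ~(PartSet) 0 : (((PartSet) 1 << nparts) - 1);

    for (s = 0; s < nsteps; s++)
    {
        const MiniStep *st = &steps[s];

        assert(st->step_id == s);
        if (st->kind == MINI_STEP_OP)
            results[s] = st->op_result;
        else if (st->nsources == 0)
            results[s] = all;           /* no sources: cannot prune anything */
        else
        {
            /* start from the identity element of the chosen operation */
            PartSet acc = (st->combine_op == MINI_COMBINE_UNION) ? 0 : all;
            int     k;

            assert(st->nsources <= MINI_MAX_SOURCES);
            for (k = 0; k < st->nsources; k++)
            {
                int src = st->source_stepids[k];

                assert(src >= 0 && src < s);    /* only earlier steps */
                if (st->combine_op == MINI_COMBINE_UNION)
                    acc |= results[src];
                else
                    acc &= results[src];
            }
            results[s] = acc;
        }
    }
    return results[nsteps - 1];
}
```

Combine steps can only refer to smaller step IDs, so a single forward pass is enough to evaluate the whole list.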


II. Source Code Walkthrough

prune_append_rel_partitions returns the RT indexes of the minimum set of child partitions that must be scanned to satisfy the rel's restriction clauses (baserestrictinfo quals).

 
/*
 * prune_append_rel_partitions
 *      Returns RT indexes of the minimum set of child partitions which must
 *      be scanned to satisfy rel's baserestrictinfo quals.
 *
 * Callers must ensure that 'rel' is a partitioned table.
 */
Relids
prune_append_rel_partitions(RelOptInfo *rel)
{
    Relids      result;
    List       *clauses = rel->baserestrictinfo;
    List       *pruning_steps;
    bool        contradictory;
    PartitionPruneContext context;
    Bitmapset  *partindexes;
    int         i;

    Assert(clauses != NIL);
    Assert(rel->part_scheme != NULL);

    /* If there are no partitions, return the empty set */
    if (rel->nparts == 0)
        return NULL;

    /*
     * Process clauses.  If the clauses are found to be contradictory, we can
     * return the empty set.
     */
    pruning_steps = gen_partprune_steps(rel, clauses, &contradictory);
    if (contradictory)
        return NULL;

    /* Set up PartitionPruneContext */
    context.strategy = rel->part_scheme->strategy;
    context.partnatts = rel->part_scheme->partnatts;
    context.nparts = rel->nparts;
    context.boundinfo = rel->boundinfo;
    context.partcollation = rel->part_scheme->partcollation;
    context.partsupfunc = rel->part_scheme->partsupfunc;
    context.stepcmpfuncs = (FmgrInfo *) palloc0(sizeof(FmgrInfo) *
                                                context.partnatts *
                                                list_length(pruning_steps));
    context.ppccontext = CurrentMemoryContext;

    /* These are not valid when being called from the planner */
    context.planstate = NULL;
    context.exprstates = NULL;
    context.exprhasexecparam = NULL;
    context.evalexecparams = false;

    /* Actual pruning happens here. */
    partindexes = get_matching_partitions(&context, pruning_steps);

    /* Add selected partitions' RT indexes to result. */
    i = -1;
    result = NULL;
    while ((i = bms_next_member(partindexes, i)) >= 0)
        result = bms_add_member(result, rel->part_rels[i]->relid);

    return result;
}
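The loop at the end of prune_append_rel_partitions() does one thing: it converts partition indexes (positions in rel->part_rels) into range-table indexes. Below is a minimal sketch with a 64-bit mask standing in for Bitmapset. `mini_next_member` is a hypothetical stand-in for bms_next_member, which returns -2 rather than -1 when no member is left; only the "< 0" test matters here. `part_relids[i]` plays the role of rel->part_rels[i]->relid.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t MiniBitmap;       /* stand-in for Bitmapset, members 0..63 */

/* Returns the smallest member greater than prevbit, or -1 if there is none. */
static int
mini_next_member(MiniBitmap set, int prevbit)
{
    int i;

    for (i = prevbit + 1; i < 64; i++)
        if (set & ((MiniBitmap) 1 << i))
            return i;
    return -1;
}

/* Maps surviving partition indexes to RT indexes, mirroring
 *   while ((i = bms_next_member(partindexes, i)) >= 0)
 *       result = bms_add_member(result, rel->part_rels[i]->relid); */
static MiniBitmap
mini_partindexes_to_relids(MiniBitmap partindexes, const int *part_relids,
                           int nparts)
{
    MiniBitmap result = 0;
    int        i = -1;

    while ((i = mini_next_member(partindexes, i)) >= 0)
    {
        assert(i < nparts);
        assert(part_relids[i] >= 0 && part_relids[i] < 64);
        result |= (MiniBitmap) 1 << part_relids[i];
    }
    return result;
}
```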



/*
 * gen_partprune_steps
 *      Process 'clauses' (a rel's baserestrictinfo list of clauses) and return
 *      a list of "partition pruning steps"
 * 
 * If the clauses in the input list are contradictory or there is a
 * pseudo-constant "false", *contradictory is set to true upon return.
 */
static List *
gen_partprune_steps(RelOptInfo *rel, List *clauses, bool *contradictory)
{
    GeneratePruningStepsContext context;

    context.next_step_id = 0;
    context.steps = NIL;

    /* The clauses list may be modified below, so better make a copy. */
    clauses = list_copy(clauses);

    /*
     * For sub-partitioned tables there's a corner case where if the
     * sub-partitioned table shares any partition keys with its parent, then
     * it's possible that the partitioning hierarchy allows the parent
     * partition to only contain a narrower range of values than the
     * sub-partitioned table does.  In this case it is possible that we'd
     * include partitions that could not possibly have any tuples matching
     * 'clauses'.  The possibility of such a partition arrangement is perhaps
     * unlikely for non-default partitions, but it may be more likely in the
     * case of default partitions, so we'll add the parent partition table's
     * partition qual to the clause list in this case only.  This may result
     * in the default partition being eliminated.
     */
    if (partition_bound_has_default(rel->boundinfo) &&
        rel->partition_qual != NIL)
    {
        List       *partqual = rel->partition_qual;//partition qual list

        partqual = (List *) expression_planner((Expr *) partqual);

        /* Fix Vars to have the desired varno */
        if (rel->relid != 1)
            ChangeVarNodes((Node *) partqual, 1, rel->relid, 0);

        clauses = list_concat(clauses, partqual);//append to the clause list
    }

    /* Down into the rabbit-hole. */
    //i.e. this is where the pruning steps are actually generated
    gen_partprune_steps_internal(&context, rel, clauses, contradictory);

    return context.steps;
}
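gen_partprune_steps() only initializes next_step_id and steps. It then passes the same context to every recursive gen_partprune_steps_internal() call. That shared counter is why step IDs stay unique across recursion. A combine step is also created after the steps it refers to, so its ID is always larger than the IDs of its sources. Below is a minimal sketch of that pattern. `MiniStepsContext`, `mini_new_step` and the toy tree generator `mini_gen_tree` are hypothetical, and the list of steps is reduced to an array of IDs.

```c
#include <assert.h>

#define MINI_MAX_STEPS 64

/* Hypothetical stand-in for GeneratePruningStepsContext. */
typedef struct MiniStepsContext
{
    int next_step_id;                 /* shared by all recursion levels */
    int nsteps;
    int step_ids[MINI_MAX_STEPS];     /* IDs in creation order ("context->steps") */
} MiniStepsContext;

/* What gen_partprune_steps() does before recursing. */
static void
mini_context_init(MiniStepsContext *ctx)
{
    ctx->next_step_id = 0;
    ctx->nsteps = 0;
}

/* Every new step takes the next ID and is appended to the shared list. */
static int
mini_new_step(MiniStepsContext *ctx)
{
    int id;

    assert(ctx->nsteps < MINI_MAX_STEPS);
    id = ctx->next_step_id++;
    ctx->step_ids[ctx->nsteps++] = id;
    return id;
}

/* Toy recursion over a complete tree: leaves become op steps, inner nodes
 * become combine steps created after (and so numbered after) their children. */
static int
mini_gen_tree(MiniStepsContext *ctx, int depth, int fanout)
{
    int c;

    if (depth == 0)
        return mini_new_step(ctx);      /* leaf: an op step */

    for (c = 0; c < fanout; c++)
    {
        int child_id = mini_gen_tree(ctx, depth - 1, fanout);

        assert(child_id < ctx->next_step_id);
    }
    return mini_new_step(ctx);          /* combine step over the children */
}
```

For a two-level tree with fanout 2, the four leaves and two inner nodes take IDs 0 to 5, and the root combine step gets ID 6.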
 


/*
 * gen_partprune_steps_internal
 *      Processes 'clauses' to generate partition pruning steps.
 *
 * From OpExpr clauses that are mutually AND'd, we find combinations of those
 * that match to the partition key columns and for every such combination,
 * we emit a PartitionPruneStepOp containing a vector of expressions whose
 * values are used as a look up key to search partitions by comparing the
 * values with partition bounds.  Relevant details of the operator and a
 * vector of (possibly cross-type) comparison functions is also included with
 * each step.
 *
 * For BoolExpr clauses, we recursively generate steps for each argument, and
 * return a PartitionPruneStepCombine of their results.
 *
 * The return value is a list of the steps generated, which are also added to
 * the context's steps list.  Each step is assigned a step identifier, unique
 * even across recursive calls.
 *
 * If we find clauses that are mutually contradictory, or a pseudoconstant
 * clause that contains false, we set *contradictory to true and return NIL
 * (that is, no pruning steps).  Caller should consider all partitions as
 * pruned in that case.  Otherwise, *contradictory is set to false.
 *
 * Note: the 'clauses' List may be modified inside this function. Callers may
 * like to make a copy of it before passing them to this function.
 */
static List *
gen_partprune_steps_internal(GeneratePruningStepsContext *context,
                             RelOptInfo *rel, List *clauses,
                             bool *contradictory)
{
    PartitionScheme part_scheme = rel->part_scheme;
    List       *keyclauses[PARTITION_MAX_KEYS];
    Bitmapset  *nullkeys = NULL,
               *notnullkeys = NULL;
    bool        generate_opsteps = false;
    List       *result = NIL;
    ListCell   *lc;

    *contradictory = false;

    memset(keyclauses, 0, sizeof(keyclauses));
    foreach(lc, clauses)
    {
        Expr       *clause = (Expr *) lfirst(lc);
        int         i;

        /* Look through RestrictInfo, if any */
        if (IsA(clause, RestrictInfo))
            clause = ((RestrictInfo *) clause)->clause;

        /* Constant-false-or-null is contradictory */
        //constant FALSE or NULL: mark as contradictory and return NIL
        if (IsA(clause, Const) &&
            (((Const *) clause)->constisnull ||
             !DatumGetBool(((Const *) clause)->constvalue)))
        {
            *contradictory = true;
            return NIL;
        }

        /* Get the BoolExpr's out of the way. */
        //BoolExpr
        if (IsA(clause, BoolExpr))
        {
            /*
             * Generate steps for arguments.
             *
             * While steps generated for the arguments themselves will be
             * added to context->steps during recursion and will be evaluated
             * independently, collect their step IDs to be stored in the
             * combine step we'll be creating.
             */
            if (or_clause((Node *) clause))
            {
                //OR
                List       *arg_stepids = NIL;
                bool        all_args_contradictory = true;
                ListCell   *lc1;

                /*
                 * Get pruning step for each arg.  If we get contradictory for
                 * all args, it means the OR expression is false as a whole.
                 */
                foreach(lc1, ((BoolExpr *) clause)->args)//iterate over the OR arguments
                {
                    Expr       *arg = lfirst(lc1);
                    bool        arg_contradictory;
                    List       *argsteps;

                    argsteps =
                        gen_partprune_steps_internal(context, rel,
                                                     list_make1(arg),
                                                     &arg_contradictory);
                    if (!arg_contradictory)
                        all_args_contradictory = false;

                    if (argsteps != NIL)
                    {
                        PartitionPruneStep *step;

                        Assert(list_length(argsteps) == 1);
                        step = (PartitionPruneStep *) linitial(argsteps);
                        arg_stepids = lappend_int(arg_stepids, step->step_id);
                    }
                    else
                    {
                        /*
                         * No steps either means that arg_contradictory is
                         * true or the arg didn't contain a clause matching
                         * this partition key.
                         *
                         * In case of the latter, we cannot prune using such
                         * an arg.  To indicate that to the pruning code, we
                         * must construct a dummy PartitionPruneStepCombine
                         * whose source_stepids is set to an empty List.
                         * However, if we can prove using constraint exclusion
                         * that the clause refutes the table's partition
                         * constraint (if it's sub-partitioned), we need not
                         * bother with that.  That is, we effectively ignore
                         * this OR arm.
                         */
                        List       *partconstr = rel->partition_qual;
                        PartitionPruneStep *orstep;

                        /* Just ignore this argument. */
                        if (arg_contradictory)
                            continue;

                        if (partconstr)
                        {
                            partconstr = (List *)
                                expression_planner((Expr *) partconstr);
                            if (rel->relid != 1)
                                ChangeVarNodes((Node *) partconstr, 1,
                                               rel->relid, 0);
                            if (predicate_refuted_by(partconstr,
                                                     list_make1(arg),
                                                     false))//arg refutes the partition constraint: skip this OR arm
                                continue;
                        }
                        //build a dummy PARTPRUNE_COMBINE_UNION step with no source steps
                        orstep = gen_prune_step_combine(context, NIL,
                                                        PARTPRUNE_COMBINE_UNION);
                        //record its step ID
                        arg_stepids = lappend_int(arg_stepids, orstep->step_id);
                    }
                }
                //set the output parameter
                *contradictory = all_args_contradictory;

                /* Check if any contradicting clauses were found */
                if (*contradictory)
                    return NIL;

                if (arg_stepids != NIL)
                {
                    PartitionPruneStep *step;
                    //build the OR (UNION) combine step
                    step = gen_prune_step_combine(context, arg_stepids,
                                                  PARTPRUNE_COMBINE_UNION);
                    result = lappend(result, step);
                }
                continue;
            }
            else if (and_clause((Node *) clause))
            {
                //AND
                List       *args = ((BoolExpr *) clause)->args;//argument list
                List       *argsteps,
                           *arg_stepids = NIL;
                ListCell   *lc1;

                /*
                 * args may itself contain clauses of arbitrary type, so just
                 * recurse and later combine the component partitions sets
                 * using a combine step.
                 */
                argsteps = gen_partprune_steps_internal(context, rel, args,
                                                        contradictory);
                if (*contradictory)
                    return NIL;//contradictory, return NIL

                foreach(lc1, argsteps)//iterate over the steps
                {
                    PartitionPruneStep *step = lfirst(lc1);

                    arg_stepids = lappend_int(arg_stepids, step->step_id);
                }

                if (arg_stepids != NIL)//combine the steps with INTERSECT
                {
                    PartitionPruneStep *step;

                    step = gen_prune_step_combine(context, arg_stepids,
                                                  PARTPRUNE_COMBINE_INTERSECT);
                    result = lappend(result, step);
                }
                continue;
            }

            /*
             * Fall-through for a NOT clause, which if it's a Boolean clause,
             * will be handled in match_clause_to_partition_key(). We
             * currently don't perform any pruning for more complex NOT
             * clauses.
             */
        }

        /*
         * Must be a clause for which we can check if one of its args matches
         * the partition key.
         */
        for (i = 0; i < part_scheme->partnatts; i++)
        {
            Expr       *partkey = linitial(rel->partexprs[i]);//partition key
            bool        clause_is_not_null = false;
            PartClauseInfo *pc = NULL;//partition clause info
            List       *clause_steps = NIL;
            //try to match the clause against partition key i
            switch (match_clause_to_partition_key(rel, context,
                                                  clause, partkey, i,
                                                  &clause_is_not_null,
                                                  &pc, &clause_steps))
            {
                //matched; the output parameter is the clause info (pc)
                case PARTCLAUSE_MATCH_CLAUSE:
                    Assert(pc != NULL);

                    /*
                     * Since we only allow strict operators, check for any
                     * contradicting IS NULL.
                     */
                    if (bms_is_member(i, nullkeys))
                    {
                        *contradictory = true;
                        return NIL;
                    }
                    generate_opsteps = true;
                    keyclauses[i] = lappend(keyclauses[i], pc);
                    break;
                //matched; the clause is "a IS NULL" or "a IS NOT NULL"
                case PARTCLAUSE_MATCH_NULLNESS:
                    if (!clause_is_not_null)
                    {
                        /* check for conflicting IS NOT NULL */
                        if (bms_is_member(i, notnullkeys))
                        {
                            *contradictory = true;
                            return NIL;
                        }
                        nullkeys = bms_add_member(nullkeys, i);
                    }
                    else
                    {
                        /* check for conflicting IS NULL */
                        if (bms_is_member(i, nullkeys))
                        {
                            *contradictory = true;
                            return NIL;
                        }
                        notnullkeys = bms_add_member(notnullkeys, i);
                    }
                    break;
                //matched; the output parameter is a list of steps
                case PARTCLAUSE_MATCH_STEPS:
                    Assert(clause_steps != NIL);
                    result = list_concat(result, clause_steps);
                    break;

                case PARTCLAUSE_MATCH_CONTRADICT:
                    /* We've nothing more to do if a contradiction was found. */
                    *contradictory = true;
                    return NIL;
                //no match
                case PARTCLAUSE_NOMATCH:

                    /*
                     * Clause didn't match this key, but it might match the
                     * next one.
                     */
                    continue;
                case PARTCLAUSE_UNSUPPORTED:
                    /* This clause cannot be used for pruning. */
                    break;
            }

            /* done; go check the next clause. */
            break;
        }
    }

    /*-----------
     * Now generate some (more) pruning steps.  We have three strategies:
     *
     * 1) Generate pruning steps based on IS NULL clauses:
     *   a) For list partitioning, null partition keys can only be found in
     *      the designated null-accepting partition, so if there are IS NULL
     *      clauses containing partition keys we should generate a pruning
     *      step that gets rid of all partitions but that one.  We can
     *      disregard any OpExpr we may have found.
     *   b) For range partitioning, only the default partition can contain
     *      NULL values, so the same rationale applies.
     *   c) For hash partitioning, we only apply this strategy if we have
     *      IS NULL clauses for all the keys.  Strategy 2 below will take
     *      care of the case where some keys have OpExprs and others have
     *      IS NULL clauses.
     *
     * 2) If not, generate steps based on OpExprs we have (if any).
     *
     * 3) If this doesn't work either, we may be able to generate steps to
     *    prune just the null-accepting partition (if one exists), if we have
     *    IS NOT NULL clauses for all partition keys.
     */
    if (!bms_is_empty(nullkeys) &&
        (part_scheme->strategy == PARTITION_STRATEGY_LIST ||
         part_scheme->strategy == PARTITION_STRATEGY_RANGE ||
         (part_scheme->strategy == PARTITION_STRATEGY_HASH &&
          bms_num_members(nullkeys) == part_scheme->partnatts)))
    {
        PartitionPruneStep *step;

        /* Strategy 1 */
        step = gen_prune_step_op(context, InvalidStrategy,
                                 false, NIL, NIL, nullkeys);
        result = lappend(result, step);
    }
    else if (generate_opsteps)
    {
        PartitionPruneStep *step;

        /* Strategy 2 */
        step = gen_prune_steps_from_opexps(part_scheme, context,
                                           keyclauses, nullkeys);
        if (step != NULL)
            result = lappend(result, step);
    }
    else if (bms_num_members(notnullkeys) == part_scheme->partnatts)
    {
        PartitionPruneStep *step;

        /* Strategy 3 */
        step = gen_prune_step_op(context, InvalidStrategy,
                                 false, NIL, NIL, NULL);
        result = lappend(result, step);
    }

    /*
     * Finally, results from all entries appearing in result should be
     * combined using an INTERSECT combine step, if more than one.
     */
    if (list_length(result) > 1)
    {
        List       *step_ids = NIL;

        foreach(lc, result)
        {
            PartitionPruneStep *step = lfirst(lc);

            step_ids = lappend_int(step_ids, step->step_id);
        }

        if (step_ids != NIL)
        {
            PartitionPruneStep *step;

            step = gen_prune_step_combine(context, step_ids,
                                          PARTPRUNE_COMBINE_INTERSECT);
            result = lappend(result, step);
        }
    }

    return result;
}
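The choice between the three strategies at the end of gen_partprune_steps_internal() can be modelled with bitmasks of key columns, along with the IS NULL / IS NOT NULL conflict checks. In this simplified sketch, `nullkeys`, `notnullkeys` and `opkeys` are bitmasks: `opkeys` marks keys matched by a strict operator clause, which is roughly what sets generate_opsteps. The function and enum names are hypothetical. The real code detects conflicts while visiting clauses one by one. This sketch instead checks them symmetrically after all clauses are known, so it may flag conflicts the real code does not notice because of clause order.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t KeySet;           /* bit i => partition key column i */

typedef enum { MINI_LIST, MINI_RANGE, MINI_HASH } MiniStrategy;

typedef enum
{
    MINI_PICK_CONTRADICT,          /* conflicting NULL-ness on one key */
    MINI_PICK_NULL_STEP,           /* strategy 1: IS NULL based step */
    MINI_PICK_OP_STEPS,            /* strategy 2: steps from OpExprs */
    MINI_PICK_NOT_NULL_STEP,       /* strategy 3: drop the null-accepting partition */
    MINI_PICK_NONE                 /* nothing usable for pruning */
} MiniPick;

static int
mini_popcount(KeySet s)
{
    int n = 0;

    while (s)
    {
        n += (int) (s & 1u);
        s >>= 1;
    }
    return n;
}

static MiniPick
mini_pick_strategy(MiniStrategy strategy, int partnatts,
                   KeySet nullkeys, KeySet notnullkeys, KeySet opkeys)
{
    assert(partnatts > 0 && partnatts <= 32);

    /* IS NULL contradicts IS NOT NULL and any strict operator on the same key */
    if ((nullkeys & notnullkeys) != 0 || (nullkeys & opkeys) != 0)
        return MINI_PICK_CONTRADICT;

    if (nullkeys != 0 &&
        (strategy == MINI_LIST || strategy == MINI_RANGE ||
         (strategy == MINI_HASH && mini_popcount(nullkeys) == partnatts)))
        return MINI_PICK_NULL_STEP;
    if (opkeys != 0)
        return MINI_PICK_OP_STEPS;
    if (mini_popcount(notnullkeys) == partnatts)
        return MINI_PICK_NOT_NULL_STEP;
    return MINI_PICK_NONE;
}
```

The hash case shows why strategy 1 requires IS NULL on every key. With two hash keys, one IS NULL and an operator on the other, the sketch falls through to strategy 2, which in the real code receives `nullkeys` along with the op clauses.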
 

III. Trace Analysis

The test script is as follows:

testdb=# explain verbose select * from t_hash_partition where c1 = 1 OR c1 = 2;
                                     QUERY PLAN                                      
-------------------------------------------------------------------------------------
 Append  (cost=0.00..30.53 rows=6 width=200)
   ->  Seq Scan on public.t_hash_partition_1  (cost=0.00..15.25 rows=3 width=200)
         Output: t_hash_partition_1.c1, t_hash_partition_1.c2, t_hash_partition_1.c3
         Filter: ((t_hash_partition_1.c1 = 1) OR (t_hash_partition_1.c1 = 2))
   ->  Seq Scan on public.t_hash_partition_3  (cost=0.00..15.25 rows=3 width=200)
         Output: t_hash_partition_3.c1, t_hash_partition_3.c2, t_hash_partition_3.c3
         Filter: ((t_hash_partition_3.c1 = 1) OR (t_hash_partition_3.c1 = 2))
(7 rows)
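Only two Seq Scans remain because of how hash partitioning assigns rows. Assuming all partitions share one modulus, a row belongs to the partition whose remainder equals the key's hash modulo that modulus. Each equality arm of the OR can therefore match only one partition, and the OR takes the union of the two results. The sketch below illustrates that reasoning with a toy hash. `toy_hash_int4`, `prune_hash_eq` and `prune_hash_or_eq` are hypothetical. PostgreSQL uses the opclass's extended hash support function (hashint4extended in the trace below), so the partition numbers this sketch yields will not match t_hash_partition_1 and t_hash_partition_3.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t PartSet;          /* bit r set => partition with remainder r survives */

/* Hypothetical toy hash, NOT PostgreSQL's hashint4extended. */
static uint32_t
toy_hash_int4(int32_t v)
{
    uint32_t x = (uint32_t) v;

    x ^= x >> 16;
    x *= 0x7feb352dU;
    x ^= x >> 15;
    return x;
}

/* key = const keeps only the partition whose remainder matches. */
static PartSet
prune_hash_eq(int32_t key, int modulus)
{
    assert(modulus > 0 && modulus <= 64);
    return (PartSet) 1 << (toy_hash_int4(key) % (uint32_t) modulus);
}

/* key = c1 OR key = c2 OR ... is the union of the per-arm results. */
static PartSet
prune_hash_or_eq(const int32_t *keys, int nkeys, int modulus)
{
    PartSet result = 0;
    int     i;

    for (i = 0; i < nkeys; i++)
        result |= prune_hash_eq(keys[i], modulus);
    return result;
}
```

Each arm keeps exactly one partition, so an OR of n equalities keeps at most n partitions. The plan above is consistent with that: two arms, two surviving partitions.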

Start gdb and set a breakpoint

(gdb) b prune_append_rel_partitions 
Breakpoint 1 at 0x804b07: file partprune.c, line 555.
(gdb) c
Continuing.

Breakpoint 1, prune_append_rel_partitions (rel=0x20faba0) at partprune.c:555
555     List       *clauses = rel->baserestrictinfo;

Fetch the restriction clauses (baserestrictinfo)

(gdb) n
562     Assert(clauses != NIL);
(gdb) 
563     Assert(rel->part_scheme != NULL);
(gdb) 
566     if (rel->nparts == 0)
(gdb) 
573     pruning_steps = gen_partprune_steps(rel, clauses, &contradictory);

Step into gen_partprune_steps

(gdb) step
gen_partprune_steps (rel=0x20faba0, clauses=0x21c4d20, contradictory=0x7ffe1953a8d7) at partprune.c:505
505     context.next_step_id = 0;

gen_partprune_steps -> check whether there is a default partition (there is none)

(gdb) n
506     context.steps = NIL;
(gdb) n
509     clauses = list_copy(clauses);
(gdb) 
524     if (partition_bound_has_default(rel->boundinfo) &&
(gdb) 

gen_partprune_steps_internal -> step into gen_partprune_steps_internal

(gdb) step
gen_partprune_steps_internal (context=0x7ffe1953a830, rel=0x20faba0, clauses=0x21c4e00, contradictory=0x7ffe1953a8d7)
    at partprune.c:741
741     PartitionScheme part_scheme = rel->part_scheme;

gen_partprune_steps_internal -> inspect the partition scheme (PartitionScheme)

(gdb) n
743     Bitmapset  *nullkeys = NULL,
(gdb) p *part_scheme
$1 = {strategy = 104 'h', partnatts = 1, partopfamily = 0x21c3180, partopcintype = 0x21c31a0, partcollation = 0x21c31c0, 
  parttyplen = 0x21c31e0, parttypbyval = 0x21c3200, partsupfunc = 0x21c3220}
(gdb) p *part_scheme->partopfamily
$2 = 1977
(gdb) p *part_scheme->partopcintype
$3 = 23
(gdb) p *part_scheme->partcollation
$4 = 0
(gdb) p *part_scheme->parttyplen
$5 = 4
(gdb) p *part_scheme->parttypbyval
$6 = true
(gdb) p *part_scheme->partsupfunc
$7 = {fn_addr = 0x4c85e7 <hashint4extended>, fn_oid = 425, fn_nargs = 2, fn_strict = true, fn_retset = false, 
  fn_stats = 2 '\002', fn_extra = 0x0, fn_mcxt = 0x20f8db0, fn_expr = 0x0}  

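gen_partprune_steps_internal -> look up the operator family in the catalog
OID 1977 is integer_ops, the integer operator family (opfmethod 405 is the hash access method). In the gdb output, strategy 104 is 'h' (hash partitioning) and partopcintype 23 is int4.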

testdb=# select * from pg_opfamily where oid=1977;
 opfmethod |   opfname   | opfnamespace | opfowner 
-----------+-------------+--------------+----------
       405 | integer_ops |           11 |       10
(1 row)

gen_partprune_steps_internal -> initialize local variables

(gdb) n
744                *notnullkeys = NULL;
(gdb) 
745     bool        generate_opsteps = false;
(gdb) 
746     List       *result = NIL;
(gdb) 
749     *contradictory = false;
(gdb) 
751     memset(keyclauses, 0, sizeof(keyclauses));
(gdb) 

gen_partprune_steps_internal -> loop over the clauses

752     foreach(lc, clauses)
(gdb) n
754         Expr       *clause = (Expr *) lfirst(lc);
(gdb) p *clauses
$8 = {type = T_List, length = 1, head = 0x21c4dd8, tail = 0x21c4dd8}
(gdb) p *clause
$9 = {type = T_RestrictInfo}
(gdb) n
759             clause = ((RestrictInfo *) clause)->clause;
(gdb) 
762         if (IsA(clause, Const) &&
(gdb) p *clause
$10 = {type = T_BoolExpr}

gen_partprune_steps_internal -> the clause is a BoolExpr, so take the BoolExpr branch

(gdb) n
771         if (IsA(clause, BoolExpr))
(gdb) 
781             if (or_clause((Node *) clause))
(gdb) 

gen_partprune_steps_internal -> it is an OR clause, so take the OR branch

(gdb) 
783                 List       *arg_stepids = NIL;
(gdb) 
(gdb) 
784                 bool        all_args_contradictory = true;
(gdb) 
791                 foreach(lc1, ((BoolExpr *) clause)->args)
(gdb) 
793                     Expr       *arg = lfirst(lc1);
(gdb) 
798                         gen_partprune_steps_internal(context, rel,
(gdb) 

gen_partprune_steps_internal->details of the OR clause

(gdb) p *((BoolExpr *) clause)->args
$3 = {type = T_List, length = 2, head = 0x21bf138, tail = 0x21bf198}
(gdb) p *(OpExpr *)arg
$4 = {xpr = {type = T_OpExpr}, opno = 96, opfuncid = 65, opresulttyp

gen_partprune_steps_internal->the recursive call to gen_partprune_steps_internal returns the argsteps list

797                     argsteps =
(gdb) n
801                     if (!arg_contradictory)
(gdb) 
802                         all_args_contradictory = false;
(gdb) 
804                     if (argsteps != NIL)
(gdb) 
808                         Assert(list_length(argsteps) == 1);
(gdb) p argsteps
$6 = (List *) 0x21c29b0
(gdb) p *argsteps
$7 = {type = T_List, length = 1, head = 0x21c2988, tail = 0x21c2988}
(gdb) p *(Node *)argsteps->head->data.ptr_value
$8 = {type = T_PartitionPruneStepOp}
(gdb) p *(PartitionPruneStepOp *)argsteps->head->data.ptr_value
$9 = {step = {type = T_PartitionPruneStepOp, step_id = 0}, opstrategy = 1, exprs = 0x21c2830, cmpfns = 0x21c27d0, 
  nullkeys = 0x0}

gen_partprune_steps_internal->take the step, record its step_id, then continue with the next argument of the OR clause

(gdb) n
809                         step = (PartitionPruneStep *) linitial(argsteps);
(gdb) 
810                         arg_stepids = lappend_int(arg_stepids, step->step_id);
(gdb) 
(gdb) 
791                 foreach(lc1, ((BoolExpr *) clause)->args)

gen_partprune_steps_internal->step into the recursive call of gen_partprune_steps_internal

(gdb) step
gen_partprune_steps_internal (context=0x7ffe1953a830, rel=0x20fab08, clauses=0x21c2a70, contradictory=0x7ffe1953a60f)
    at partprune.c:741
741     PartitionScheme part_scheme = rel->part_scheme;
(gdb) 
...

recursive gen_partprune_steps_internal->iterate over the clauses

752     foreach(lc, clauses)
(gdb) 
754         Expr       *clause = (Expr *) lfirst(lc);
(gdb) 
758         if (IsA(clause, RestrictInfo))
(gdb) p *(Expr *)clause
$14 = {type = T_OpExpr}
(gdb) p *(OpExpr *)clause
$15 = {xpr = {type = T_OpExpr}, opno = 96, opfuncid = 65, opresulttype = 16, opretset = false, opcollid = 0, 
  inputcollid = 0, args = 0x21becf8, location = 50}
(gdb) n
762         if (IsA(clause, Const) &&
(gdb) 
771         if (IsA(clause, BoolExpr))
(gdb) 
918         for (i = 0; i < part_scheme->partnatts; i++)

recursive gen_partprune_steps_internal->loop over the partition keys of the partition scheme

(gdb) 
920             Expr       *partkey = linitial(rel->partexprs[i]);
(gdb) 
921             bool        clause_is_not_null = false;
(gdb) p *(Expr *)partkey
$16 = {type = T_Var}
(gdb) p *(Var *)partkey
$17 = {xpr = {type = T_Var}, varno = 1, varattno = 1, vartype = 23, vartypmod = -1, varcollid = 0, varlevelsup = 0, 
  varnoold = 1, varoattno = 1, location = -1}

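recursive gen_partprune_steps_internal->try to match the given clause against the specified partition key; match_clause_to_partition_key returns PARTCLAUSE_MATCH_CLAUSE (a match was found, and the matched clause is returned through the output parameter)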

(gdb) n
922             PartClauseInfo *pc = NULL;
(gdb) 
923             List       *clause_steps = NIL;
(gdb) 
925             switch (match_clause_to_partition_key(rel, context,
(gdb) 
931                     Assert(pc != NULL);
(gdb) 
937                     if (bms_is_member(i, nullkeys))
942                     generate_opsteps = true;
(gdb) 
943                     keyclauses[i] = lappend(keyclauses[i], pc);
(gdb) 
944                     break;
(gdb) p keyclauses[i]
$18 = (List *) 0x21c2b08
(gdb) p *keyclauses[i]
$19 = {type = T_List, length = 1, head = 0x21c2ae0, tail = 0x21c2ae0}
(gdb) p *(Node *)keyclauses[i]->head->data.ptr_value
$20 = {type = T_Invalid}

recursive gen_partprune_steps_internal->the clause loop is finished; generate the pruning steps with the second strategy (build steps from the available OpExprs)

(gdb) n
752     foreach(lc, clauses)
(gdb) n
1019        if (!bms_is_empty(nullkeys) &&
(gdb) 
1032        else if (generate_opsteps)
(gdb) 
1037            step = gen_prune_steps_from_opexps(part_scheme, context,
(gdb) n
1039            if (step != NULL)
(gdb) p *step
$21 = {type = T_PartitionPruneStepOp, step_id = 1}
(gdb) n
1040                result = lappend(result, step);
(gdb) 
1056        if (list_length(result) > 1)
(gdb) p *result
$22 = {type = T_List, length = 1, head = 0x21c2da0, tail = 0x21c2da0}
(gdb) n
1077        return result;
(gdb) 
1078    }
(gdb)     
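The three branches at lines 1019, 1032 and 1042 are tried in order, and only the first one whose condition holds generates a step. In the recursive call above, the first condition is false and generate_opsteps is true, so the second strategy is taken. The sketch below models only that ordering. The first condition is reduced to a single boolean; the real test also looks at the partitioning strategy. PruneStrategy and choose_strategy are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical labels for the three step-generation strategies. */
typedef enum PruneStrategy
{
    PRUNE_NONE = 0,          /* no step generated after the clause loop */
    PRUNE_FROM_NULLKEYS = 1, /* branch at line 1019 */
    PRUNE_FROM_OPEXPRS = 2,  /* branch at line 1032 */
    PRUNE_ALL_NOTNULL = 3    /* branch at line 1042 */
} PruneStrategy;

/* Mirrors only the if / else-if order of the three branches.
 * nullkeys_usable stands in for the whole first condition. */
static PruneStrategy
choose_strategy(bool nullkeys_usable, bool generate_opsteps,
                int n_notnullkeys, int partnatts)
{
    assert(partnatts > 0);
    if (nullkeys_usable)
        return PRUNE_FROM_NULLKEYS;
    else if (generate_opsteps)
        return PRUNE_FROM_OPEXPRS;
    else if (n_notnullkeys == partnatts)
        return PRUNE_ALL_NOTNULL;
    return PRUNE_NONE;
}
```

In the outer call that follows, all three conditions are false, so no extra step is appended there. The OR clause has already produced its step before the continue at line 869.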

gen_partprune_steps_internal->the recursive call returns; finish handling the OR clause

(gdb) 
801                     if (!arg_contradictory)
(gdb) 
802                         all_args_contradictory = false;
(gdb) 
804                     if (argsteps != NIL)
(gdb) 
808                         Assert(list_length(argsteps) == 1);
(gdb) 
809                         step = (PartitionPruneStep *) linitial(argsteps);
(gdb) 
810                         arg_stepids = lappend_int(arg_stepids, step->step_id);
(gdb) 
791                 foreach(lc1, ((BoolExpr *) clause)->args)
(gdb) 
(gdb) 
855                 *contradictory = all_args_contradictory;
(gdb) 
858                 if (*contradictory)
(gdb) p all_args_contradictory
$23 = false
(gdb) n
861                 if (arg_stepids != NIL)
(gdb) 
865                     step = gen_prune_step_combine(context, arg_stepids,
(gdb) 
867                     result = lappend(result, step);
(gdb) 
869                 continue;
(gdb) p *step
$24 = {<text variable, no debug info>} 0x7f4522678be0 <__step>
(gdb) p *result
$25 = {type = T_List, length = 1, head = 0x21c2e88, tail = 0x21c2e88}
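Both OR arms have now been processed. Their step ids 0 and 1 were collected in arg_stepids, and gen_prune_step_combine wraps them in a third step (step_id 2). In PostgreSQL 11 the OR branch requests PARTPRUNE_COMBINE_UNION for this step. The whole OR is contradictory only if every arm is. The sketch below shows both rules on a simplified model: each step's result is a 64-bit mask of partition indexes, stored at results[step_id]. PartMask, combine_union and or_is_contradictory are hypothetical names. The per-arm results used in the example are assumed, because the transcript does not show which partition each arm matched.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified model: bit i set = partition index i is still alive. */
typedef uint64_t PartMask;

/* Union combine: OR together the results of the source steps. */
static PartMask
combine_union(const PartMask *results, const int *source_stepids, int nsources)
{
    PartMask out = 0;

    for (int i = 0; i < nsources; i++)
        out |= results[source_stepids[i]];
    return out;
}

/* An OR clause is contradictory only if every one of its arms is. */
static bool
or_is_contradictory(const bool *arg_contradictory, int nargs)
{
    assert(nargs > 0);
    for (int i = 0; i < nargs; i++)
        if (!arg_contradictory[i])
            return false;
    return true;
}
```

Suppose step 0 selects partition 0 and step 1 selects partition 2. A union over source step ids {0, 1} then yields 5, the same bitmap that get_matching_partitions returns later in this walkthrough.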

gen_partprune_steps_internal->all condition clauses have been processed; return result

(gdb) n
752     foreach(lc, clauses)
(gdb) 
1019        if (!bms_is_empty(nullkeys) &&
(gdb) 
1032        else if (generate_opsteps)
(gdb) 
1042        else if (bms_num_members(notnullkeys) == part_scheme->partnatts)
(gdb) 
1056        if (list_length(result) > 1)
(gdb) 
1077        return result;
(gdb) 
1078    }
(gdb) 

gen_partprune_steps->back in gen_partprune_steps, return the steps list

(gdb) 
gen_partprune_steps (rel=0x20fab08, clauses=0x21c2390, contradictory=0x7ffe1953a8d7) at partprune.c:541
541     return context.steps;
(gdb) p *result
$26 = 0 '\000'
(gdb) p context.steps
$27 = (List *) 0x21c2890
(gdb) p *context.steps
$28 = {type = T_List, length = 3, head = 0x21c2868, tail = 0x21c2e60}
$29 = {type = T_PartitionPruneStepOp}
(gdb) p *(PartitionPruneStepOp *)context.steps->head->data.ptr_value
$30 = {step = {type = T_PartitionPruneStepOp, step_id = 0}, opstrategy = 1, exprs = 0x21c2830, cmpfns = 0x21c27d0, 
  nullkeys = 0x0}
(gdb) p *(PartitionPruneStepOp *)context.steps->head->next->data.ptr_value
$31 = {step = {type = T_PartitionPruneStepOp, step_id = 1}, opstrategy = 1, exprs = 0x21c2c28, cmpfns = 0x21c2bc8, 
  nullkeys = 0x0}
(gdb) p *(PartitionPruneStepOp *)context.steps->head->next->next->data.ptr_value
$32 = {step = {type = T_PartitionPruneStepCombine, step_id = 2}, opstrategy = 0, exprs = 0x21c2a10, cmpfns = 0x7e, 
  nullkeys = 0x10}
(gdb) 
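$32 is not really a PartitionPruneStepOp. Its step.type is T_PartitionPruneStepCombine, and the gdb command only cast it to the Op type, so only the common step header is meaningful. On a typical build, opstrategy and exprs likely overlay the combine node's combineOp field (0, i.e. union) and its source_stepids field. cmpfns and nullkeys probably read memory beyond the smaller combine node. The sketch below uses simplified, hypothetical mirror structs, not the real PostgreSQL headers. It shows the safe pattern of checking the tag before casting, plus a compile-time check of the field overlap.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the PostgreSQL node types. */
typedef enum StepTag
{
    T_StepOp,
    T_StepCombine
} StepTag;

typedef struct Step
{
    StepTag type;     /* like NodeTag: always the first field */
    int     step_id;
} Step;

typedef struct StepOp
{
    Step     step;
    uint16_t opstrategy;  /* StrategyNumber (uint16) in the real struct */
    void    *exprs;
    void    *cmpfns;
    void    *nullkeys;
} StepOp;

typedef struct StepCombine
{
    Step  step;
    int   combineOp;      /* 0 = union, 1 = intersect */
    void *source_stepids;
} StepCombine;

/* The misread field in $32 sits where the combine node keeps its source ids. */
_Static_assert(offsetof(StepOp, exprs) == offsetof(StepCombine, source_stepids),
               "exprs overlays source_stepids");

/* Inspect the tag first, then cast to the matching concrete type. */
static const char *
describe_step(const Step *s)
{
    assert(s != NULL);
    switch (s->type)
    {
        case T_StepOp:
            return "op";
        case T_StepCombine:
            return ((const StepCombine *) s)->combineOp == 0 ? "union" : "intersect";
    }
    return "unknown";
}
```

In gdb, the same idea means printing the third element as `(PartitionPruneStepCombine *)` once the tag has been read.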

gen_partprune_steps->return to prune_append_rel_partitions

(gdb) n
542 }
(gdb) 
prune_append_rel_partitions (rel=0x20fab08) at partprune.c:574
574     if (contradictory)
(gdb) 

prune_append_rel_partitions->set up the pruning context

(gdb) 
578     context.strategy = rel->part_scheme->strategy;
(gdb) 
579     context.partnatts = rel->part_scheme->partnatts;
...

prune_append_rel_partitions->call get_matching_partitions to get the indexes of the matching partitions
The result is 5 (binary 101), i.e. the Rels at indexes 0 and 2 of the part_rels array

597     partindexes = get_matching_partitions(&context, pruning_steps);
(gdb) 
600     i = -1;
(gdb) p partindexes
$33 = (Bitmapset *) 0x21c2ff8
(gdb) p *partindexes
$34 = {nwords = 1, words = 0x21c2ffc}
(gdb) p *partindexes->words
$35 = 5
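A Bitmapset stores member k as bit k % BITS_PER_BITMAPWORD of words[k / BITS_PER_BITMAPWORD]. A single word equal to 5 (binary 101) therefore means members 0 and 2. The helper below decodes one word in ascending order, which is the order in which bms_next_member visits members. It assumes a 64-bit word for simplicity; the real bitmapword width depends on the PostgreSQL version and platform. word_members is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/* Write the indexes of the set bits of one 64-bit word into out[],
 * lowest first, and return how many were written (at most max_out). */
static int
word_members(uint64_t word, int *out, int max_out)
{
    int n = 0;

    assert(max_out >= 0);
    for (int bit = 0; bit < 64 && n < max_out; bit++)
        if (word & ((uint64_t) 1 << bit))
            out[n++] = bit;
    return n;
}
```

Decoding 5 gives {0, 2}, the partition indexes that the loop at line 602 visits. Decoding 40 gives {3, 5}, the relids that this function finally returns.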

prune_append_rel_partitions->build the Relids
The result is 40 = 8 + 32 (bits 3 and 5 set), i.e. the Rels with relids 3 and 5

(gdb) n
601     result = NULL;
(gdb) 
602     while ((i = bms_next_member(partindexes, i)) >= 0)
(gdb) 
603         result = bms_add_member(result, rel->part_rels[i]->relid);
(gdb) p i
$39 = 0
(gdb) n
602     while ((i = bms_next_member(partindexes, i)) >= 0)
(gdb) 
603         result = bms_add_member(result, rel->part_rels[i]->relid);
(gdb) p i
$40 = 2
(gdb) n
602     while ((i = bms_next_member(partindexes, i)) >= 0)
(gdb) 
605     return result;
(gdb) p result
$41 = (Relids) 0x21c3018
(gdb) p *result
$42 = {nwords = 1, words = 0x21c301c}
(gdb) p result->words[0]
$43 = 40
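The loop at lines 602-603 translates partition indexes into planner relids: each index i contributes bit rel->part_rels[i]->relid to the result. Here index 0 presumably maps to relid 3 and index 2 to relid 5, giving 2^3 + 2^5 = 40. The sketch below replays that translation with plain 64-bit masks. The part_relids array stands in for rel->part_rels[i]->relid, and relid 4 for index 1 is a hypothetical filler value. partindexes_to_relids is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/* Map a mask of partition indexes to a mask of relids.
 * part_relids[i] plays the role of rel->part_rels[i]->relid. */
static uint64_t
partindexes_to_relids(uint64_t partindexes, const int *part_relids, int nparts)
{
    uint64_t result = 0;

    assert(nparts >= 0 && nparts <= 64);
    for (int i = 0; i < nparts; i++)
    {
        if (partindexes & ((uint64_t) 1 << i))
        {
            assert(part_relids[i] >= 0 && part_relids[i] < 64);
            result |= (uint64_t) 1 << part_relids[i];
        }
    }
    return result;
}
```

With part_relids = {3, 4, 5} and partindexes = 5, the function returns 40, matching $43 above.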

prune_append_rel_partitions->call complete

606 }
(gdb) 
set_append_rel_size (root=0x2120378, rel=0x20fab08, rti=1, rte=0x20fa3d0) at allpaths.c:922
922         did_pruning = true;
(gdb) 

DONE!

IV. References

Parallel Append implementation
Partition Elimination in PostgreSQL 11

Source: ITPUB blog, link: http://blog.itpub.net/6906/viewspace-2374790/. Please credit the source when reposting; otherwise legal liability will be pursued.
