PostgreSQL 原始碼解讀(33)- 查詢語句#18(查詢優化-表示式預處理#3)
本節簡單介紹了PG查詢優化表示式預處理中的規範化過程。規範化具體的做法一是忽略NULL以及OR中的False,And中的True(實現函式find_duplicate_ors),二是拉平謂詞(實現函式:pull_ors/pull_ands),三是清除重複的ORs(實現函式process_duplicate_ors)。這些函式位於檔案src/backend/optimizer/prep/prepqual.c中。
一、布林代數基礎
規範化處理基於布林/邏輯代數運算的相關基本定律:
冪等律
A∪A = A
A∩A = A
交換律
A∪B = B∪A
A∩B = B∩A
結合律
A∪(B∪С) = (A∪B) ∪ С
A∩(B∩С) = (A∩B)∩ С
吸收律
A∪(A∩B) = A
A∩(A∪B) = A
分配律
A∪(B∩С)=(A∪B)∩(A∪С)
A∩(B∪С)=(A∩B)∪(A∩С)
么元律
0∪A = A
1∩A = A
1∪A = 1
0∩A = 0
補餘律
A∪A' = 1
A∩A' = 0
二、基本概念
PG原始碼對規範化表示式的註釋如下:
/*
* canonicalize_qual
* Convert a qualification expression to the most useful form.
* 轉換為最常規形式的表示式
*
* This is primarily intended to be used on top-level WHERE (or JOIN/ON)
* clauses. It can also be used on top-level CHECK constraints, for which
* pass is_check = true. DO NOT call it on any expression that is not known
* to be one or the other, as it might apply inappropriate simplifications.
* 規範化注意用於最高層的Where/Check(輸入引數:is_check = true)語句中,如果不是這類語句,不要使用否則會產生不匹配的簡化(PG這裡加入了DEBUG資訊)
*
* The name of this routine is a holdover from a time when it would try to
* force the expression into canonical AND-of-ORs or OR-of-ANDs form.
* Eventually, we recognized that that had more theoretical purity than
* actual usefulness, and so now the transformation doesn't involve any
* notion of reaching a canonical form.
*
* NOTE: we assume the input has already been through eval_const_expressions
* and therefore possesses AND/OR flatness. Formerly this function included
* its own flattening logic, but that requires a useless extra pass over the
* tree.
*
* Returns the modified qualification.
*/
忽略NULL以及OR中的False/AND中的TRUE
Where條件語句中的NULL/FALSE/TRUE,如能忽略,忽略之.如:NULL OR FALSE OR dwbh = '1001',則忽略NULL/ FALSE
testdb=# explain verbose select * from t_dwxx where NULL OR FALSE OR dwbh = '1001' ;
QUERY PLAN
----------------------------------------------------------------
Seq Scan on public.t_dwxx (cost=0.00..12.00 rows=1 width=474)
Output: dwmc, dwbh, dwdz
Filter: ((t_dwxx.dwbh)::text = '1001'::text)
(3 rows)
拉平謂詞
SQL語句中的X1 OR/AND (X2 OR/AND X3),拉平簡化為OR/AND(X1,X2,X3)
X1 OR/AND (X2 OR/AND X3),在查詢樹中為樹狀結構,第一層節點是BoolExpr,該Node中的args連結串列有2個元素,args->1=X1,args->2=BoolExpr,args->2->1=X2,args->2->2=X3,組成樹狀結構.簡化後args->1/2/3=X1/X2/X3,所有條件處於同一個層次上,並不是樹狀結構.
testdb=# explain select * from t_dwxx where dwbh = '1001' OR (dwbh = '1002' OR dwbh = '1003');
QUERY PLAN
-------------------------------------------------------------------------------------------------------------
Seq Scan on t_dwxx (cost=0.00..12.80 rows=3 width=474)
Filter: (((dwbh)::text = '1001'::text) OR ((dwbh)::text = '1002'::text) OR ((dwbh)::text = '1003'::text))
(2 rows)
清除重複ORs
清除重複ORs的數學基礎是布林(邏輯)代數:
(X1 AND X2) OR (X1 AND X3) 應用分配律可以改寫為 X1 AND (X2 OR X3),這樣改寫的目的是把X1抽取出來,為後續下推謂詞X1作準備.
如(dwbh = '1001' AND dwbh = '1002') OR (dwbh = '1001' AND dwbh = '1003')條件,會改寫為dwbh = '1001' AND (dwbh = '1002' OR dwbh = '1003')
testdb=# explain verbose select * from t_dwxx where (dwbh = '1001' AND dwbh = '1002') OR (dwbh = '1001' AND dwbh = '1003');
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------
--------
Seq Scan on public.t_dwxx (cost=0.00..12.80 rows=1 width=474)
Output: dwmc, dwbh, dwdz
Filter: (((t_dwxx.dwbh)::text = '1001'::text) AND (((t_dwxx.dwbh)::text = '1002'::text) OR ((t_dwxx.dwbh)::text = '1003'::
text)))
(3 rows)
三、原始碼解讀
主函式入口:
subquery_planner
/*
輸入:
glob-PlannerGlobal
parse-Query結構體指標
parent_root-父PlannerInfo Root節點
hasRecursion-是否遞迴?
tuple_fraction-掃描Tuple比例
輸出:
PlannerInfo指標
*/
PlannerInfo *
subquery_planner(PlannerGlobal *glob, Query *parse,
PlannerInfo *parent_root,
bool hasRecursion, double tuple_fraction)
{
PlannerInfo *root;//返回值
List *newWithCheckOptions;//
List *newHaving;//Having子句
bool hasOuterJoins;//是否存在Outer Join?
RelOptInfo *final_rel;//
ListCell *l;//臨時變數
/* Create a PlannerInfo data structure for this subquery */
root = makeNode(PlannerInfo);//構造返回值
root->parse = parse;
root->glob = glob;
root->query_level = parent_root ? parent_root->query_level + 1 : 1;
root->parent_root = parent_root;
root->plan_params = NIL;
root->outer_params = NULL;
root->planner_cxt = CurrentMemoryContext;
root->init_plans = NIL;
root->cte_plan_ids = NIL;
root->multiexpr_params = NIL;
root->eq_classes = NIL;
root->append_rel_list = NIL;
root->rowMarks = NIL;
memset(root->upper_rels, 0, sizeof(root->upper_rels));
memset(root->upper_targets, 0, sizeof(root->upper_targets));
root->processed_tlist = NIL;
root->grouping_map = NULL;
root->minmax_aggs = NIL;
root->qual_security_level = 0;
root->inhTargetKind = INHKIND_NONE;
root->hasRecursion = hasRecursion;
if (hasRecursion)
root->wt_param_id = SS_assign_special_param(root);
else
root->wt_param_id = -1;
root->non_recursive_path = NULL;
root->partColsUpdated = false;
/*
* If there is a WITH list, process each WITH query and build an initplan
* SubPlan structure for it.
*/
if (parse->cteList)
SS_process_ctes(root);//處理With 語句
/*
* Look for ANY and EXISTS SubLinks in WHERE and JOIN/ON clauses, and try
* to transform them into joins. Note that this step does not descend
* into subqueries; if we pull up any subqueries below, their SubLinks are
* processed just before pulling them up.
*/
if (parse->hasSubLinks)
pull_up_sublinks(root); //上拉子連結
/*
* Scan the rangetable for set-returning functions, and inline them if
* possible (producing subqueries that might get pulled up next).
* Recursion issues here are handled in the same way as for SubLinks.
*/
inline_set_returning_functions(root);//
/*
* Check to see if any subqueries in the jointree can be merged into this
* query.
*/
pull_up_subqueries(root);//上拉子查詢
/*
* If this is a simple UNION ALL query, flatten it into an appendrel. We
* do this now because it requires applying pull_up_subqueries to the leaf
* queries of the UNION ALL, which weren't touched above because they
* weren't referenced by the jointree (they will be after we do this).
*/
if (parse->setOperations)
flatten_simple_union_all(root);//扁平化處理UNION ALL
/*
* Detect whether any rangetable entries are RTE_JOIN kind; if not, we can
* avoid the expense of doing flatten_join_alias_vars(). Also check for
* outer joins --- if none, we can skip reduce_outer_joins(). And check
* for LATERAL RTEs, too. This must be done after we have done
* pull_up_subqueries(), of course.
*/
//判斷RTE中是否存在RTE_JOIN?
root->hasJoinRTEs = false;
root->hasLateralRTEs = false;
hasOuterJoins = false;
foreach(l, parse->rtable)
{
RangeTblEntry *rte = lfirst_node(RangeTblEntry, l);
if (rte->rtekind == RTE_JOIN)
{
root->hasJoinRTEs = true;
if (IS_OUTER_JOIN(rte->jointype))
hasOuterJoins = true;
}
if (rte->lateral)
root->hasLateralRTEs = true;
}
/*
* Preprocess RowMark information. We need to do this after subquery
* pullup (so that all non-inherited RTEs are present) and before
* inheritance expansion (so that the info is available for
* expand_inherited_tables to examine and modify).
*/
//預處理RowMark資訊
preprocess_rowmarks(root);
/*
* Expand any rangetable entries that are inheritance sets into "append
* relations". This can add entries to the rangetable, but they must be
* plain base relations not joins, so it's OK (and marginally more
* efficient) to do it after checking for join RTEs. We must do it after
* pulling up subqueries, else we'd fail to handle inherited tables in
* subqueries.
*/
//展開繼承表
expand_inherited_tables(root);
/*
* Set hasHavingQual to remember if HAVING clause is present. Needed
* because preprocess_expression will reduce a constant-true condition to
* an empty qual list ... but "HAVING TRUE" is not a semantic no-op.
*/
//是否存在Having表示式
root->hasHavingQual = (parse->havingQual != NULL);
/* Clear this flag; might get set in distribute_qual_to_rels */
root->hasPseudoConstantQuals = false;
/*
* Do expression preprocessing on targetlist and quals, as well as other
* random expressions in the querytree. Note that we do not need to
* handle sort/group expressions explicitly, because they are actually
* part of the targetlist.
*/
//預處理表示式:targetList(投影列)
parse->targetList = (List *)
preprocess_expression(root, (Node *) parse->targetList,
EXPRKIND_TARGET);
/* Constant-folding might have removed all set-returning functions */
if (parse->hasTargetSRFs)
parse->hasTargetSRFs = expression_returns_set((Node *) parse->targetList);
newWithCheckOptions = NIL;
foreach(l, parse->withCheckOptions)//witch Check Options
{
WithCheckOption *wco = lfirst_node(WithCheckOption, l);
wco->qual = preprocess_expression(root, wco->qual,
EXPRKIND_QUAL);
if (wco->qual != NULL)
newWithCheckOptions = lappend(newWithCheckOptions, wco);
}
parse->withCheckOptions = newWithCheckOptions;
//返回列資訊returningList
parse->returningList = (List *)
preprocess_expression(root, (Node *) parse->returningList,
EXPRKIND_TARGET);
//預處理條件表示式
preprocess_qual_conditions(root, (Node *) parse->jointree);
//預處理Having表示式
parse->havingQual = preprocess_expression(root, parse->havingQual,
EXPRKIND_QUAL);
//視窗函式
foreach(l, parse->windowClause)
{
WindowClause *wc = lfirst_node(WindowClause, l);
/* partitionClause/orderClause are sort/group expressions */
wc->startOffset = preprocess_expression(root, wc->startOffset,
EXPRKIND_LIMIT);
wc->endOffset = preprocess_expression(root, wc->endOffset,
EXPRKIND_LIMIT);
}
//Limit子句
parse->limitOffset = preprocess_expression(root, parse->limitOffset,
EXPRKIND_LIMIT);
parse->limitCount = preprocess_expression(root, parse->limitCount,
EXPRKIND_LIMIT);
//On Conflict子句
if (parse->onConflict)
{
parse->onConflict->arbiterElems = (List *)
preprocess_expression(root,
(Node *) parse->onConflict->arbiterElems,
EXPRKIND_ARBITER_ELEM);
parse->onConflict->arbiterWhere =
preprocess_expression(root,
parse->onConflict->arbiterWhere,
EXPRKIND_QUAL);
parse->onConflict->onConflictSet = (List *)
preprocess_expression(root,
(Node *) parse->onConflict->onConflictSet,
EXPRKIND_TARGET);
parse->onConflict->onConflictWhere =
preprocess_expression(root,
parse->onConflict->onConflictWhere,
EXPRKIND_QUAL);
/* exclRelTlist contains only Vars, so no preprocessing needed */
}
//集合操作(AppendRelInfo)
root->append_rel_list = (List *)
preprocess_expression(root, (Node *) root->append_rel_list,
EXPRKIND_APPINFO);
//RTE
/* Also need to preprocess expressions within RTEs */
foreach(l, parse->rtable)
{
RangeTblEntry *rte = lfirst_node(RangeTblEntry, l);
int kind;
ListCell *lcsq;
if (rte->rtekind == RTE_RELATION)
{
if (rte->tablesample)
rte->tablesample = (TableSampleClause *)
preprocess_expression(root,
(Node *) rte->tablesample,
EXPRKIND_TABLESAMPLE);//資料表取樣語句
}
else if (rte->rtekind == RTE_SUBQUERY)//子查詢
{
/*
* We don't want to do all preprocessing yet on the subquery's
* expressions, since that will happen when we plan it. But if it
* contains any join aliases of our level, those have to get
* expanded now, because planning of the subquery won't do it.
* That's only possible if the subquery is LATERAL.
*/
if (rte->lateral && root->hasJoinRTEs)
rte->subquery = (Query *)
flatten_join_alias_vars(root, (Node *) rte->subquery);
}
else if (rte->rtekind == RTE_FUNCTION)//函式
{
/* Preprocess the function expression(s) fully */
kind = rte->lateral ? EXPRKIND_RTFUNC_LATERAL : EXPRKIND_RTFUNC;
rte->functions = (List *)
preprocess_expression(root, (Node *) rte->functions, kind);
}
else if (rte->rtekind == RTE_TABLEFUNC)//TABLE FUNC
{
/* Preprocess the function expression(s) fully */
kind = rte->lateral ? EXPRKIND_TABLEFUNC_LATERAL : EXPRKIND_TABLEFUNC;
rte->tablefunc = (TableFunc *)
preprocess_expression(root, (Node *) rte->tablefunc, kind);
}
else if (rte->rtekind == RTE_VALUES)//VALUES子句
{
/* Preprocess the values lists fully */
kind = rte->lateral ? EXPRKIND_VALUES_LATERAL : EXPRKIND_VALUES;
rte->values_lists = (List *)
preprocess_expression(root, (Node *) rte->values_lists, kind);
}
/*
* Process each element of the securityQuals list as if it were a
* separate qual expression (as indeed it is). We need to do it this
* way to get proper canonicalization of AND/OR structure. Note that
* this converts each element into an implicit-AND sublist.
*/
foreach(lcsq, rte->securityQuals)
{
lfirst(lcsq) = preprocess_expression(root,
(Node *) lfirst(lcsq),
EXPRKIND_QUAL);
}
}
...//其他
return root;
}
preprocess_expression
/*
* preprocess_expression
* Do subquery_planner's preprocessing work for an expression,
* which can be a targetlist, a WHERE clause (including JOIN/ON
* conditions), a HAVING clause, or a few other things.
*/
static Node *
preprocess_expression(PlannerInfo *root, Node *expr, int kind)
{
/*
* Fall out quickly if expression is empty. This occurs often enough to
* be worth checking. Note that null->null is the correct conversion for
* implicit-AND result format, too.
*/
if (expr == NULL)
return NULL;
/*
* If the query has any join RTEs, replace join alias variables with
* base-relation variables. We must do this first, since any expressions
* we may extract from the joinaliasvars lists have not been preprocessed.
* For example, if we did this after sublink processing, sublinks expanded
* out from join aliases would not get processed. But we can skip this in
* non-lateral RTE functions, VALUES lists, and TABLESAMPLE clauses, since
* they can't contain any Vars of the current query level.
*/
if (root->hasJoinRTEs &&
!(kind == EXPRKIND_RTFUNC ||
kind == EXPRKIND_VALUES ||
kind == EXPRKIND_TABLESAMPLE ||
kind == EXPRKIND_TABLEFUNC))
expr = flatten_join_alias_vars(root, expr);//扁平化處理joinaliasvars,上節已介紹
/*
* Simplify constant expressions.
*
* Note: an essential effect of this is to convert named-argument function
* calls to positional notation and insert the current actual values of
* any default arguments for functions. To ensure that happens, we *must*
* process all expressions here. Previous PG versions sometimes skipped
* const-simplification if it didn't seem worth the trouble, but we can't
* do that anymore.
*
* Note: this also flattens nested AND and OR expressions into N-argument
* form. All processing of a qual expression after this point must be
* careful to maintain AND/OR flatness --- that is, do not generate a tree
* with AND directly under AND, nor OR directly under OR.
*/
expr = eval_const_expressions(root, expr);//簡化常量表示式
/*
* If it's a qual or havingQual, canonicalize it.
*/
if (kind == EXPRKIND_QUAL)
{
expr = (Node *) canonicalize_qual((Expr *) expr, false);//表示式規範化
#ifdef OPTIMIZER_DEBUG
printf("After canonicalize_qual()\n");
pprint(expr);
#endif
}
/* Expand SubLinks to SubPlans */
if (root->parse->hasSubLinks)//擴充套件子連結為子計劃
expr = SS_process_sublinks(root, expr, (kind == EXPRKIND_QUAL));
/*
* XXX do not insert anything here unless you have grokked the comments in
* SS_replace_correlation_vars ...
*/
/* Replace uplevel vars with Param nodes (this IS possible in VALUES) */
if (root->query_level > 1)
expr = SS_replace_correlation_vars(root, expr);//使用Param節點替換上層的Vars
/*
* If it's a qual or havingQual, convert it to implicit-AND format. (We
* don't want to do this before eval_const_expressions, since the latter
* would be unable to simplify a top-level AND correctly. Also,
* SS_process_sublinks expects explicit-AND format.)
*/
if (kind == EXPRKIND_QUAL)
expr = (Node *) make_ands_implicit((Expr *) expr);
return expr;
}
canonicalize_qual
/*
* canonicalize_qual
* Convert a qualification expression to the most useful form.
*
* This is primarily intended to be used on top-level WHERE (or JOIN/ON)
* clauses. It can also be used on top-level CHECK constraints, for which
* pass is_check = true. DO NOT call it on any expression that is not known
* to be one or the other, as it might apply inappropriate simplifications.
*
* The name of this routine is a holdover from a time when it would try to
* force the expression into canonical AND-of-ORs or OR-of-ANDs form.
* Eventually, we recognized that that had more theoretical purity than
* actual usefulness, and so now the transformation doesn't involve any
* notion of reaching a canonical form.
*
* NOTE: we assume the input has already been through eval_const_expressions
* and therefore possesses AND/OR flatness. Formerly this function included
* its own flattening logic, but that requires a useless extra pass over the
* tree.
*
* Returns the modified qualification.
*/
Expr *
canonicalize_qual(Expr *qual, bool is_check)//規範化表示式
{
Expr *newqual;
/* Quick exit for empty qual */
if (qual == NULL)
return NULL;
/* This should not be invoked on quals in implicit-AND format */
Assert(!IsA(qual, List));
/*
* Pull up redundant subclauses in OR-of-AND trees. We do this only
* within the top-level AND/OR structure; there's no point in looking
* deeper. Also remove any NULL constants in the top-level structure.
*/
newqual = find_duplicate_ors(qual, is_check);//執行實際處理邏輯
return newqual;
}
find_duplicate_ors
/*
* find_duplicate_ors
* Given a qualification tree with the NOTs pushed down, search for
* OR clauses to which the inverse OR distributive law might apply.
* Only the top-level AND/OR structure is searched.
*
* While at it, we remove any NULL constants within the top-level AND/OR
* structure, eg in a WHERE clause, "x OR NULL::boolean" is reduced to "x".
* In general that would change the result, so eval_const_expressions can't
* do it; but at top level of WHERE, we don't need to distinguish between
* FALSE and NULL results, so it's valid to treat NULL::boolean the same
* as FALSE and then simplify AND/OR accordingly. Conversely, in a top-level
* CHECK constraint, we may treat a NULL the same as TRUE.
*
* Returns the modified qualification. AND/OR flatness is preserved.
*/
static Expr *
find_duplicate_ors(Expr *qual, bool is_check)
{
if (or_clause((Node *) qual))//OR語句
{
List *orlist = NIL;
ListCell *temp;
/* Recurse */
foreach(temp, ((BoolExpr *) qual)->args)//遍歷args連結串列
{
Expr *arg = (Expr *) lfirst(temp);//獲取連結串列中的元素
arg = find_duplicate_ors(arg, is_check);//遞迴呼叫
/* Get rid of any constant inputs */
if (arg && IsA(arg, Const))//arg為常量
{
Const *carg = (Const *) arg;
if (is_check)//Check語句
{
/* Within OR in CHECK, drop constant FALSE */
if (!carg->constisnull && !DatumGetBool(carg->constvalue))
continue;//不為NULL而且為FALSE,繼續迴圈
//arg為NULL或者TURE,即NULL OR TRUE,簡化為TRUE
/* Constant TRUE or NULL, so OR reduces to TRUE */
return (Expr *) makeBoolConst(true, false);
}
else//Where條件語句
{
/* Within OR in WHERE, drop constant FALSE or NULL */
if (carg->constisnull || !DatumGetBool(carg->constvalue))
continue;//arg為NULL或者FALSE,繼續迴圈
//arg不為NULL而且為TRUE,即TRUE常量,直接返回arg
/* Constant TRUE, so OR reduces to TRUE */
return arg;
}
}
//加入結果連結串列
orlist = lappend(orlist, arg);
}
/* Flatten any ORs pulled up to just below here */
orlist = pull_ors(orlist);//扁平化ORs
/* Now we can look for duplicate ORs */
return process_duplicate_ors(orlist);//處理重複的ORs
}
else if (and_clause((Node *) qual))//AND語句
{
List *andlist = NIL;
ListCell *temp;
/* Recurse */
foreach(temp, ((BoolExpr *) qual)->args)//遍歷連結串列
{
Expr *arg = (Expr *) lfirst(temp);
arg = find_duplicate_ors(arg, is_check);
/* Get rid of any constant inputs */
if (arg && IsA(arg, Const))
{
Const *carg = (Const *) arg;
if (is_check)
{
/* Within AND in CHECK, drop constant TRUE or NULL */
if (carg->constisnull || DatumGetBool(carg->constvalue))
continue;//CHECK語句
/* Constant FALSE, so AND reduces to FALSE */
//不為空且值為FALSE,返回FALSE
return arg;
}
else
{
/* Within AND in WHERE, drop constant TRUE */
if (!carg->constisnull && DatumGetBool(carg->constvalue))
continue;
/* Constant FALSE or NULL, so AND reduces to FALSE */
//NULL OR FALSE,返回FALSE
return (Expr *) makeBoolConst(false, false);
}
}
andlist = lappend(andlist, arg);//加入到結果列表
}
/* Flatten any ANDs introduced just below here */
andlist = pull_ands(andlist);//扁平化處理AND
/* AND of no inputs reduces to TRUE */
if (andlist == NIL)//為空指標
return (Expr *) makeBoolConst(true, false);//返回TRUE
/* Single-expression AND just reduces to that expression */
if (list_length(andlist) == 1)//單個表示式
return (Expr *) linitial(andlist);//返回此表示式連結串列
/* Else we still need an AND node */
return make_andclause(andlist);//否則返回結果連結串列
}
else
return qual;//非AND/OR語句,直接返回結果
}
process_duplicate_ors
/*
* process_duplicate_ors
* Given a list of exprs which are ORed together, try to apply
* the inverse OR distributive law.
*
* Returns the resulting expression (could be an AND clause, an OR
* clause, or maybe even a single subexpression).
*/
static Expr *
process_duplicate_ors(List *orlist)
{
List *reference = NIL;
int num_subclauses = 0;
List *winners;
List *neworlist;
ListCell *temp;
/* OR of no inputs reduces to FALSE */
if (orlist == NIL)
return (Expr *) makeBoolConst(false, false);
/* Single-expression OR just reduces to that expression */
if (list_length(orlist) == 1)
return (Expr *) linitial(orlist);
/*
* Choose the shortest AND clause as the reference list --- obviously, any
* subclause not in this clause isn't in all the clauses. If we find a
* clause that's not an AND, we can treat it as a one-element AND clause,
* which necessarily wins as shortest.
*/
//遍歷OR連結串列,找到AND語句中約束條件最少的那個表示式
//與求解最小公約數同理,公共的謂詞只可能在此最小的表示式中產生
foreach(temp, orlist)
{
Expr *clause = (Expr *) lfirst(temp);
if (and_clause((Node *) clause))//AND語句
{
List *subclauses = ((BoolExpr *) clause)->args;
int nclauses = list_length(subclauses);
if (reference == NIL || nclauses < num_subclauses)
{
reference = subclauses;
num_subclauses = nclauses;
}
}
else//單個約束條件或者帶有表示式的約束條件,如(X1 AND X2) OR X3等
{
reference = list_make1(clause);
break;
}
}
/*
* Just in case, eliminate any duplicates in the reference list.
*/
reference = list_union(NIL, reference);//去掉重複的謂詞
/*
* Check each element of the reference list to see if it's in all the OR
* clauses. Build a new list of winning clauses.
*/
winners = NIL;
foreach(temp, reference)//遍歷連結串列
{
Expr *refclause = (Expr *) lfirst(temp);
bool win = true;
ListCell *temp2;
foreach(temp2, orlist)
{
Expr *clause = (Expr *) lfirst(temp2);
if (and_clause((Node *) clause))//該謂詞是否在連結串列中存在?
{
if (!list_member(((BoolExpr *) clause)->args, refclause))
{
win = false;
break;
}
}
else//該謂詞是否與單個條件表示式等價?
{
if (!equal(refclause, clause))
{
win = false;
break;
}
}
}
if (win)//找到了公共的謂詞
winners = lappend(winners, refclause);//加入到結果中
}
/*
* If no winners, we can't transform the OR
*/
if (winners == NIL)
return make_orclause(orlist);//如果找到,原樣返回
/*
* Generate new OR list consisting of the remaining sub-clauses.
*
* 找到了,產生新的條件
*
* If any clause degenerates to empty, then we have a situation like (A
* AND B) OR (A), which can be reduced to just A --- that is, the
* additional conditions in other arms of the OR are irrelevant.
*
* Note that because we use list_difference, any multiple occurrences of a
* winning clause in an AND sub-clause will be removed automatically.
*/
neworlist = NIL;
foreach(temp, orlist)//遍歷OR連結串列
{
Expr *clause = (Expr *) lfirst(temp);
if (and_clause((Node *) clause))//AND語句
{
List *subclauses = ((BoolExpr *) clause)->args;//獲取條件語句引數
subclauses = list_difference(subclauses, winners);//剔除相同部分
if (subclauses != NIL)//成功剔除,產生新的AND語句
{
if (list_length(subclauses) == 1)
neworlist = lappend(neworlist, linitial(subclauses));
else
neworlist = lappend(neworlist, make_andclause(subclauses));
}
else
{
neworlist = NIL; /* degenerate case, see above */
break;
}
}
else//不是AND語句
{
if (!list_member(winners, clause))//單個條件語句,直接新增條件
neworlist = lappend(neworlist, clause);
else
//OR語句?公共部分已提出,無需加入其他條件,直接返回
//根據吸收律,X AND (X OR B) 等價於X
{
neworlist = NIL; /* degenerate case, see above */
break;
}
}
}
/*
* Append reduced OR to the winners list, if it's not degenerate, handling
* the special case of one element correctly (can that really happen?).
* Also be careful to maintain AND/OR flatness in case we pulled up a
* sub-sub-OR-clause.
*/
if (neworlist != NIL)//新產生的連結串列
{
if (list_length(neworlist) == 1)
winners = lappend(winners, linitial(neworlist));
else
winners = lappend(winners, make_orclause(pull_ors(neworlist)));//拉平OR
}
/*
* And return the constructed AND clause, again being wary of a single
* element and AND/OR flatness.
*/
//返回結果
if (list_length(winners) == 1)
return (Expr *) linitial(winners);
else
return make_andclause(pull_ands(winners));//拉平AND
}
pull_ors
/*
* pull_ors
* Recursively flatten nested OR clauses into a single or-clause list.
*
* Input is the arglist of an OR clause.
* Returns the rebuilt arglist (note original list structure is not touched).
*/
static List *
pull_ors(List *orlist)
{
List *out_list = NIL;
ListCell *arg;
foreach(arg, orlist)
{
Node *subexpr = (Node *) lfirst(arg);
/*
* Note: we can destructively concat the subexpression's arglist
* because we know the recursive invocation of pull_ors will have
* built a new arglist not shared with any other expr. Otherwise we'd
* need a list_copy here.
*/
if (or_clause(subexpr))
out_list = list_concat(out_list,
pull_ors(((BoolExpr *) subexpr)->args));//遞迴拉平
else
out_list = lappend(out_list, subexpr);
}
return out_list;
}
pull_ands
/*
* pull_ands
* Recursively flatten nested AND clauses into a single and-clause list.
*
* Input is the arglist of an AND clause.
* Returns the rebuilt arglist (note original list structure is not touched).
*/
static List *
pull_ands(List *andlist)
{
List *out_list = NIL;
ListCell *arg;
foreach(arg, andlist)
{
Node *subexpr = (Node *) lfirst(arg);
/*
* Note: we can destructively concat the subexpression's arglist
* because we know the recursive invocation of pull_ands will have
* built a new arglist not shared with any other expr. Otherwise we'd
* need a list_copy here.
*/
if (and_clause(subexpr))
out_list = list_concat(out_list,
pull_ands(((BoolExpr *) subexpr)->args));//遞迴拉平
else
out_list = lappend(out_list, subexpr);
}
return out_list;
}
四、跟蹤分析
測試指令碼:
select * from t_dwxx
where (FALSE OR dwbh = '1001')
AND ((dwbh = '1001' AND dwbh = '1002')
OR (dwbh = '1001' AND dwbh = '1003'));
該語句經規範化後與以下SQL語句無異:
select * from t_dwxx
where dwbh = '1001' AND (dwbh='1002' OR dwbh='1003');
-- 執行計劃
testdb=# explain verbose select * from t_dwxx
where (FALSE OR dwbh = '1001')
AND ((dwbh = '1001' AND dwbh = '1002')
OR (dwbh = '1001' AND dwbh = '1003'));
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------
--------
Seq Scan on public.t_dwxx (cost=0.00..12.80 rows=1 width=474)
Output: dwmc, dwbh, dwdz
Filter: (((t_dwxx.dwbh)::text = '1001'::text) AND (((t_dwxx.dwbh)::text = '1002'::text) OR ((t_dwxx.dwbh)::text = '1003'::
text)))
(3 rows)
gdb跟蹤:
(gdb) b find_duplicate_ors
Breakpoint 1 at 0x7811b4: file prepqual.c, line 418.
(gdb) c
Continuing.
Breakpoint 1, find_duplicate_ors (qual=0x2727528, is_check=false) at prepqual.c:418
418 if (or_clause((Node *) qual))
#輸入引數,BoolExpr
#args連結串列有兩個元素,一個是(FALSE OR dwbh = '1001'),
#另外一個是((dwbh = '1001' AND dwbh = '1002')
OR (dwbh = '1001' AND dwbh = '1003'))
(gdb) p *(BoolExpr *)qual
$2 = {xpr = {type = T_BoolExpr}, boolop = AND_EXPR, args = 0x2726c88, location = -1}
(gdb) p *((BoolExpr *)qual)->args
$3 = {type = T_List, length = 2, head = 0x2726c68, tail = 0x2727508}
...
#進入AND分支
462 else if (and_clause((Node *) qual))
...
#獲得連結串列第一個元素
470 Expr *arg = (Expr *) lfirst(temp);
#再次進入find_duplicate_ors
#FALSE OR dwbh='1001'在此前已被簡化為dwbh='1001',所以在這裡直接返回
Breakpoint 1, find_duplicate_ors (qual=0x2726bc8, is_check=false) at prepqual.c:418
418 if (or_clause((Node *) qual))
(gdb) n
462 else if (and_clause((Node *) qual))
(gdb)
#
515 return qual;
(gdb) n
516 }
(gdb)
475 if (arg && IsA(arg, Const))
...
497 andlist = lappend(andlist, arg);
(gdb) n
468 foreach(temp, ((BoolExpr *) qual)->args)
(gdb) n
470 Expr *arg = (Expr *) lfirst(temp);
(gdb)
472 arg = find_duplicate_ors(arg, is_check);
#連結串列的下一個元素,即
#((dwbh = '1001' AND dwbh = '1002')
OR (dwbh = '1001' AND dwbh = '1003'))
#由兩個BoolExpr組成
(gdb) p *(BoolExpr *)arg
$23 = {xpr = {type = T_BoolExpr}, boolop = OR_EXPR, args = 0x27270d8, location = -1}
(gdb) p *((BoolExpr *)arg)->args
$24 = {type = T_List, length = 2, head = 0x27270b8, tail = 0x27274b8}
(gdb) p *(Node *)((BoolExpr *)arg)->args->head->data.ptr_value
$25 = {type = T_BoolExpr}
(gdb) p *(Node *)((BoolExpr *)arg)->args->head->next->data.ptr_value
$26 = {type = T_BoolExpr}
...
#args左右兩邊的的arg處理完畢
424 foreach(temp, ((BoolExpr *) qual)->args)
(gdb)
457 orlist = pull_ors(orlist);
...
#進入process_duplicate_ors
460 return process_duplicate_ors(orlist);
(gdb) step
process_duplicate_ors (orlist=0x2727858) at prepqual.c:529
529 List *reference = NIL;
...
#獲得最少謂詞的連結串列
(gdb) p *reference
$36 = {type = T_List, length = 2, head = 0x27278a8, tail = 0x27278f8}
...
#獲得了公共的謂詞winner!即dwbh = '1001'
(gdb) p *winners
$40 = {type = T_List, length = 1, head = 0x2727918, tail = 0x2727918}
...
(gdb)
canonicalize_qual (qual=0x2727528, is_check=false) at prepqual.c:309
309 return newqual;
(gdb) p *newqual
$43 = {type = T_BoolExpr}
(gdb) p *(BoolExpr *)newqual
$45 = {xpr = {type = T_BoolExpr}, boolop = AND_EXPR, args = 0x2727c18, location = -1}
#DONE!
五、小結
1、優化的數學基礎:布林代數以及相關的定律;
2、表示式規範化的過程:表示式扁平化處理以及公共謂詞提取等的處理邏輯。
六、參考資料
布林代數
prepqual.c
來自 “ ITPUB部落格 ” ,連結:http://blog.itpub.net/6906/viewspace-2374879/,如需轉載,請註明出處,否則將追究法律責任。
相關文章
- PostgreSQL 原始碼解讀(31)- 查詢語句#16(查詢優化-表示式預處理#1)SQL原始碼優化
- PostgreSQL 原始碼解讀(32)- 查詢語句#17(查詢優化-表示式預處理#2)SQL原始碼優化
- PostgreSQL 原始碼解讀(34)- 查詢語句#19(查詢優化-表示式預處理#4)SQL原始碼優化
- PostgreSQL 原始碼解讀(30)- 查詢語句#15(查詢優化-扁平化處理UNION ALL)SQL原始碼優化
- PostgreSQL 原始碼解讀(18)- 查詢語句#3(SQL Parse)SQL原始碼
- PostgreSQL 原始碼解讀(29)- 查詢語句#14(查詢優化-上拉子查詢)SQL原始碼優化
- PostgreSQL 原始碼解讀(17)- 查詢語句#2(查詢優化基礎)SQL原始碼優化
- PostgreSQL 原始碼解讀(25)- 查詢語句#10(查詢優化概覽)SQL原始碼優化
- PostgreSQL 原始碼解讀(37)- 查詢語句#22(查詢優化-grouping_plan...SQL原始碼優化
- PostgreSQL 原始碼解讀(28)- 查詢語句#13(查詢優化-上拉子連結#3)SQL原始碼優化
- PostgreSQL 原始碼解讀(36)- 查詢語句#21(查詢優化-消除外連線)SQL原始碼優化
- PostgreSQL 原始碼解讀(48)- 查詢語句#33(query_planner函式#9)SQL原始碼函式
- PostgreSQL 原始碼解讀(35)- 查詢語句#20(查詢優化-簡化Having和Grou...SQL原始碼優化
- PostgreSQL 原始碼解讀(93)- 查詢語句#77(ExecHashJoin函式#3)SQL原始碼函式
- PostgreSQL 原始碼解讀(24)- 查詢語句#9(查詢重寫)SQL原始碼
- PostgreSQL 原始碼解讀(26)- 查詢語句#11(查詢優化-上拉子連結#1)SQL原始碼優化
- PostgreSQL 原始碼解讀(27)- 查詢語句#12(查詢優化-上拉子連結#2)SQL原始碼優化
- PostgreSQL 原始碼解讀(20)- 查詢語句#5(查詢樹Query詳解)SQL原始碼
- PostgreSQL 原始碼解讀(83)- 查詢語句#68(PortalStart函式)SQL原始碼函式
- PostgreSQL 原始碼解讀(40)- 查詢語句#25(query_planner函式#3)SQL原始碼函式
- PostgreSQL 原始碼解讀(164)- 查詢#84(表示式求值)SQL原始碼
- PostgreSQL 原始碼解讀(68)- 查詢語句#53(make_one_rel函式#18-...SQL原始碼函式
- PostgreSQL 原始碼解讀(82)- 查詢語句#67(PortalXXX系列函式)SQL原始碼函式
- PostgreSQL 原始碼解讀(89)- 查詢語句#74(SeqNext函式#2)SQL原始碼函式
- PostgreSQL 原始碼解讀(90)- 查詢語句#75(ExecHashJoin函式#1)SQL原始碼函式
- PostgreSQL 原始碼解讀(91)- 查詢語句#76(ExecHashJoin函式#2)SQL原始碼函式
- PostgreSQL 原始碼解讀(88)- 查詢語句#73(SeqNext函式#1)SQL原始碼函式
- PostgreSQL 原始碼解讀(19)- 查詢語句#4(ParseTree詳解)SQL原始碼
- PostgreSQL 原始碼解讀(73)- 查詢語句#58(grouping_planner函式...SQL原始碼函式
- PostgreSQL 原始碼解讀(79)- 查詢語句#64(create_plan函式#3-Se...SQL原始碼函式
- PostgreSQL 原始碼解讀(75)- 查詢語句#60(Review - standard_...SQL原始碼View
- PostgreSQL 原始碼解讀(74)- 查詢語句#59(Review - subquery_...SQL原始碼View
- PostgreSQL 原始碼解讀(42)- 查詢語句#27(等價類)SQL原始碼
- PostgreSQL 原始碼解讀(95)- 查詢語句#78(ExecHashJoin函式#4-H...SQL原始碼函式
- PostgreSQL 原始碼解讀(97)- 查詢語句#79(ExecHashJoin函式#5-H...SQL原始碼函式
- PostgreSQL 原始碼解讀(43)- 查詢語句#28(query_planner函式#5)SQL原始碼函式
- PostgreSQL 原始碼解讀(45)- 查詢語句#30(query_planner函式#6)SQL原始碼函式
- PostgreSQL 原始碼解讀(46)- 查詢語句#31(query_planner函式#7)SQL原始碼函式