PostgreSQL 原始碼解讀(102)- 分割槽表#8(資料查詢路由#5-構建APPEND訪問路徑)
本節介紹了PG如何為append relation(分割槽表)構建訪問路徑,主要的實現邏輯在函式set_rel_pathlist中實現,該函式首先判斷RTE是否分割槽表,如是則呼叫set_append_rel_pathlist函式對仍“存活”的子關係構建訪問路徑,然後把這些訪問路徑Merge到APPEND訪問路徑中。
一、資料結構
AppendRelInfo
當我們將可繼承表(分割槽表)或UNION-ALL子查詢展開為“追加關係”(本質上是子RTE的連結串列)時,為每個子RTE構建一個AppendRelInfo。
/*
* Append-relation info.
* Append-relation資訊.
*
* When we expand an inheritable table or a UNION-ALL subselect into an
* "append relation" (essentially, a list of child RTEs), we build an
* AppendRelInfo for each child RTE. The list of AppendRelInfos indicates
* which child RTEs must be included when expanding the parent, and each node
* carries information needed to translate Vars referencing the parent into
* Vars referencing that child.
* 當我們將可繼承表(分割槽表)或UNION-ALL子查詢展開為“追加關係”(本質上是子RTE的連結串列)時,
* 為每個子RTE構建一個AppendRelInfo。
* AppendRelInfos連結串列指示在展開父節點時必須包含哪些子rte,
* 每個節點具有將引用父節點的Vars轉換為引用該子節點的Vars所需的所有資訊。
*
* These structs are kept in the PlannerInfo node's append_rel_list.
* Note that we just throw all the structs into one list, and scan the
* whole list when desiring to expand any one parent. We could have used
* a more complex data structure (eg, one list per parent), but this would
* be harder to update during operations such as pulling up subqueries,
* and not really any easier to scan. Considering that typical queries
* will not have many different append parents, it doesn't seem worthwhile
* to complicate things.
* 這些結構體儲存在PlannerInfo節點的append_rel_list中。
* 注意,只是將所有的結構體放入一個連結串列中,並在希望展開任何父類時掃描整個連結串列。
* 本可以使用更復雜的資料結構(例如,每個父節點一個列表),
* 但是在提取子查詢之類的操作中更新它會更困難,
* 而且實際上也不會更容易掃描。
* 考慮到典型的查詢不會有很多不同的附加項,因此似乎不值得將事情複雜化。
*
* Note: after completion of the planner prep phase, any given RTE is an
* append parent having entries in append_rel_list if and only if its
* "inh" flag is set. We clear "inh" for plain tables that turn out not
* to have inheritance children, and (in an abuse of the original meaning
* of the flag) we set "inh" for subquery RTEs that turn out to be
* flattenable UNION ALL queries. This lets us avoid useless searches
* of append_rel_list.
* 注意:計劃準備階段完成後,
* 當且僅當它的“inh”標誌已設定時,給定的RTE是一個append parent在append_rel_list中的一個條目。
* 我們為沒有child的平面表清除“inh”標記,
* 同時(有濫用標記的嫌疑)為UNION ALL查詢中的子查詢RTEs設定“inh”標記。
* 這樣可以避免對append_rel_list進行無用的搜尋。
*
* Note: the data structure assumes that append-rel members are single
* baserels. This is OK for inheritance, but it prevents us from pulling
* up a UNION ALL member subquery if it contains a join. While that could
* be fixed with a more complex data structure, at present there's not much
* point because no improvement in the plan could result.
* 注意:資料結構假定附加的rel成員是獨立的baserels。
* 這對於繼承來說是可以的,但是如果UNION ALL member子查詢包含一個join,
* 那麼它將阻止我們提取UNION ALL member子查詢。
* 雖然可以用更復雜的資料結構解決這個問題,但目前沒有太大意義,因為該計劃可能不會有任何改進。
*/
typedef struct AppendRelInfo
{
NodeTag type;
/*
* These fields uniquely identify this append relationship. There can be
* (in fact, always should be) multiple AppendRelInfos for the same
* parent_relid, but never more than one per child_relid, since a given
* RTE cannot be a child of more than one append parent.
* 這些欄位惟一地標識這個append relationship。
* 對於同一個parent_relid可以有(實際上應該總是)多個AppendRelInfos,
* 但是每個child_relid不能有多個AppendRelInfos,
* 因為給定的RTE不能是多個append parent的子節點。
*/
Index parent_relid; /* parent rel的RT索引;RT index of append parent rel */
Index child_relid; /* child rel的RT索引;RT index of append child rel */
/*
* For an inheritance appendrel, the parent and child are both regular
* relations, and we store their rowtype OIDs here for use in translating
* whole-row Vars. For a UNION-ALL appendrel, the parent and child are
* both subqueries with no named rowtype, and we store InvalidOid here.
* 對於繼承appendrel,父類和子類都是普通關係,
* 我們將它們的rowtype OIDs儲存在這裡,用於轉換whole-row Vars。
* 對於UNION-ALL appendrel,父查詢和子查詢都是沒有指定行型別的子查詢,
* 我們在這裡儲存InvalidOid。
*/
Oid parent_reltype; /* OID of parent's composite type */
Oid child_reltype; /* OID of child's composite type */
/*
* The N'th element of this list is a Var or expression representing the
* child column corresponding to the N'th column of the parent. This is
* used to translate Vars referencing the parent rel into references to
* the child. A list element is NULL if it corresponds to a dropped
* column of the parent (this is only possible for inheritance cases, not
* UNION ALL). The list elements are always simple Vars for inheritance
* cases, but can be arbitrary expressions in UNION ALL cases.
* 這個列表的第N個元素是一個Var或表示式,表示與父元素的第N列對應的子列。
* 這用於將引用parent rel的Vars轉換為對子rel的引用。
* 如果連結串列元素與父元素的已刪除列相對應,則該元素為NULL
* (這隻適用於繼承情況,而不是UNION ALL)。
* 對於繼承情況,連結串列元素總是簡單的變數,但是可以是UNION ALL情況下的任意表示式。
*
* Notice we only store entries for user columns (attno > 0). Whole-row
* Vars are special-cased, and system columns (attno < 0) need no special
* translation since their attnos are the same for all tables.
* 注意,我們只儲存使用者列的條目(attno > 0)。
* Whole-row Vars是大小寫敏感的,系統列(attno < 0)不需要特別的轉換,
* 因為它們的attno對所有表都是相同的。
*
* Caution: the Vars have varlevelsup = 0. Be careful to adjust as needed
* when copying into a subquery.
* 注意:Vars的varlevelsup = 0。
* 在將資料複製到子查詢時,要注意根據需要進行調整。
*/
//child's Vars中的表示式
List *translated_vars; /* Expressions in the child's Vars */
/*
* We store the parent table's OID here for inheritance, or InvalidOid for
* UNION ALL. This is only needed to help in generating error messages if
* an attempt is made to reference a dropped parent column.
* 我們將父表的OID儲存在這裡用於繼承,
* 如為UNION ALL,則這裡儲存的是InvalidOid。
* 只有在試圖引用已刪除的父列時,才需要這樣做來幫助生成錯誤訊息。
*/
Oid parent_reloid; /* OID of parent relation */
} AppendRelInfo;
RelOptInfo
規劃器/最佳化器使用的關係資訊結構體
參見PostgreSQL 原始碼解讀(99)- 分割槽表#5(資料查詢路由#2-RelOptInfo資料結構)
二、原始碼解讀
set_rel_pathlist函式為基礎關係構建訪問路徑.
//是否DUMMY訪問路徑
#define IS_DUMMY_PATH(p) \
(IsA((p), AppendPath) && ((AppendPath *) (p))->subpaths == NIL)
/* A relation that's been proven empty will have one path that is dummy */
//已被證明為空的關係會含有一個虛擬dummy的訪問路徑
#define IS_DUMMY_REL(r) \
((r)->cheapest_total_path != NULL && \
IS_DUMMY_PATH((r)->cheapest_total_path))
/*
* set_rel_pathlist
* Build access paths for a base relation
* 為基礎關係構建訪問路徑
*/
static void
set_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
Index rti, RangeTblEntry *rte)
{
if (IS_DUMMY_REL(rel))
{
/* We already proved the relation empty, so nothing more to do */
//不需要做任何處理
}
else if (rte->inh)
{
/* It's an "append relation", process accordingly */
//append relation,呼叫set_append_rel_pathlist處理
set_append_rel_pathlist(root, rel, rti, rte);
}
else
{//其他型別的關係
switch (rel->rtekind)
{
case RTE_RELATION://基礎關係
if (rte->relkind == RELKIND_FOREIGN_TABLE)
{
/* Foreign table */
//外部表
set_foreign_pathlist(root, rel, rte);
}
else if (rte->tablesample != NULL)
{
/* Sampled relation */
//資料表取樣
set_tablesample_rel_pathlist(root, rel, rte);
}
else
{
/* Plain relation */
//常規關係
set_plain_rel_pathlist(root, rel, rte);
}
break;
case RTE_SUBQUERY:
/* Subquery --- fully handled during set_rel_size */
//子查詢
break;
case RTE_FUNCTION:
/* RangeFunction */
set_function_pathlist(root, rel, rte);
break;
case RTE_TABLEFUNC:
/* Table Function */
set_tablefunc_pathlist(root, rel, rte);
break;
case RTE_VALUES:
/* Values list */
set_values_pathlist(root, rel, rte);
break;
case RTE_CTE:
/* CTE reference --- fully handled during set_rel_size */
break;
case RTE_NAMEDTUPLESTORE:
/* tuplestore reference --- fully handled during set_rel_size */
break;
default:
elog(ERROR, "unexpected rtekind: %d", (int) rel->rtekind);
break;
}
}
/*
* If this is a baserel, we should normally consider gathering any partial
* paths we may have created for it.
* 如為基礎關係,通常來說需要考慮聚集gathering所有的部分路徑(先前已構建)
*
* However, if this is an inheritance child, skip it. Otherwise, we could
* end up with a very large number of gather nodes, each trying to grab
* its own pool of workers. Instead, we'll consider gathering partial
* paths for the parent appendrel.
* 但是,如果這是一個繼承關係中的子表,忽略它.
* 否則的話,我們最好可能會有很大量的Gather節點,每一個都嘗試grab它自己worker的輸出.
* 相反我們的處理是為父append relation生成gathering部分訪問路徑.
*
* Also, if this is the topmost scan/join rel (that is, the only baserel),
* we postpone this until the final scan/join targelist is available (see
* grouping_planner).
* 另外,如果這是最高層的scan/join關係(基礎關係),
* 在完成了最後的scan/join投影列處理後才進行相應的處理.
*/
if (rel->reloptkind == RELOPT_BASEREL &&
bms_membership(root->all_baserels) != BMS_SINGLETON)
generate_gather_paths(root, rel, false);
/*
* Allow a plugin to editorialize on the set of Paths for this base
* relation. It could add new paths (such as CustomPaths) by calling
* add_path(), or delete or modify paths added by the core code.
* 呼叫鉤子函式.
* 外掛可以透過呼叫add_path新增新的訪問路徑(比如CustomPaths等),
* 或者刪除/修改核心程式碼生成的訪問路徑.
*/
if (set_rel_pathlist_hook)
(*set_rel_pathlist_hook) (root, rel, rti, rte);
/* Now find the cheapest of the paths for this rel */
//為rel找到成本最低的訪問路徑
set_cheapest(rel);
#ifdef OPTIMIZER_DEBUG
debug_print_rel(root, rel);
#endif
}
/*
* set_append_rel_pathlist
* Build access paths for an "append relation"
* 為append relation構建訪問路徑
*/
static void
set_append_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
Index rti, RangeTblEntry *rte)
{
int parentRTindex = rti;
List *live_childrels = NIL;
ListCell *l;
/*
* Generate access paths for each member relation, and remember the
* non-dummy children.
* 為每個成員關係生成訪問路徑,並"記住"非虛擬子節點。
*/
foreach(l, root->append_rel_list)//遍歷連結串列
{
AppendRelInfo *appinfo = (AppendRelInfo *) lfirst(l);//獲取AppendRelInfo
int childRTindex;
RangeTblEntry *childRTE;
RelOptInfo *childrel;
/* append_rel_list contains all append rels; ignore others */
//append_rel_list含有所有的append relations,忽略其他的rels
if (appinfo->parent_relid != parentRTindex)
continue;
/* Re-locate the child RTE and RelOptInfo */
//重新定位子RTE和RelOptInfo
childRTindex = appinfo->child_relid;
childRTE = root->simple_rte_array[childRTindex];
childrel = root->simple_rel_array[childRTindex];
/*
* If set_append_rel_size() decided the parent appendrel was
* parallel-unsafe at some point after visiting this child rel, we
* need to propagate the unsafety marking down to the child, so that
* we don't generate useless partial paths for it.
* 如果set_append_rel_size()函式在訪問了子關係後,
* 在某些點上斷定父append relation是非並行安全的,
* 我們需要分發不安全的標記到子關係中,以免產生無用的部分訪問路徑.
*/
if (!rel->consider_parallel)
childrel->consider_parallel = false;
/*
* Compute the child's access paths.
* "計算"子關係的訪問路徑
*/
set_rel_pathlist(root, childrel, childRTindex, childRTE);
/*
* If child is dummy, ignore it.
* 如為虛擬關係,忽略之
*/
if (IS_DUMMY_REL(childrel))
continue;
/* Bubble up childrel's partitioned children. */
//
if (rel->part_scheme)
rel->partitioned_child_rels =
list_concat(rel->partitioned_child_rels,
list_copy(childrel->partitioned_child_rels));
/*
* Child is live, so add it to the live_childrels list for use below.
* 新增到live_childrels連結串列中
*/
live_childrels = lappend(live_childrels, childrel);
}
/* Add paths to the append relation. */
//新增訪問路徑到append relation中
add_paths_to_append_rel(root, rel, live_childrels);
}
/*
* add_paths_to_append_rel
* Generate paths for the given append relation given the set of non-dummy
* child rels.
* 基於非虛擬子關係集合為給定的append relation生成訪問路徑
*
* The function collects all parameterizations and orderings supported by the
* non-dummy children. For every such parameterization or ordering, it creates
* an append path collecting one path from each non-dummy child with given
* parameterization or ordering. Similarly it collects partial paths from
* non-dummy children to create partial append paths.
* 該函式收集所有的非虛擬關係支援的引數化和排序訪問路徑.
* 對於每一個引數化或排序的訪問路徑,它建立一個附加路徑,
* 從每個具有給定引數化或排序的非虛擬子節點收集相關資訊。
* 類似的,從非虛擬子關係中收集部分訪問路徑用以建立部分append路徑.
*/
void
add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
List *live_childrels)
{
List *subpaths = NIL;
bool subpaths_valid = true;
List *partial_subpaths = NIL;
List *pa_partial_subpaths = NIL;
List *pa_nonpartial_subpaths = NIL;
bool partial_subpaths_valid = true;
bool pa_subpaths_valid;
List *all_child_pathkeys = NIL;
List *all_child_outers = NIL;
ListCell *l;
List *partitioned_rels = NIL;
double partial_rows = -1;
/* If appropriate, consider parallel append */
//如可以,考慮並行append
pa_subpaths_valid = enable_parallel_append && rel->consider_parallel;
/*
* AppendPath generated for partitioned tables must record the RT indexes
* of partitioned tables that are direct or indirect children of this
* Append rel.
* 為分割槽表生成的AppendPath必須記錄屬於該分割槽表(Append Rel)的直接或間接子關係的RT索引
*
* AppendPath may be for a sub-query RTE (UNION ALL), in which case, 'rel'
* itself does not represent a partitioned relation, but the child sub-
* queries may contain references to partitioned relations. The loop
* below will look for such children and collect them in a list to be
* passed to the path creation function. (This assumes that we don't need
* to look through multiple levels of subquery RTEs; if we ever do, we
* could consider stuffing the list we generate here into sub-query RTE's
* RelOptInfo, just like we do for partitioned rels, which would be used
* when populating our parent rel with paths. For the present, that
* appears to be unnecessary.)
* AppendPath可能是子查詢RTE(UNION ALL),在這種情況下,rel自身並不代表分割槽表,
* 但child sub-queries可能含有分割槽表的依賴.
* 以下的迴圈會尋找這樣的子關係,儲存在連結串列中,並作為引數傳給訪問路徑建立函式.
* (這是假設我們不需要遍歷多個層次的子查詢RTEs,如果我們曾經這樣做了,
* 這好比分割槽表的做法,可以考慮把生成的連結串列放到子查詢的RTE's RelOptInfo結構體中,
* 用於使用訪問路徑填充父關係.不過目前來說,這樣做是不需要的.)
*/
if (rel->part_scheme != NULL)
{
if (IS_SIMPLE_REL(rel))
partitioned_rels = list_make1(rel->partitioned_child_rels);
else if (IS_JOIN_REL(rel))
{
int relid = -1;
List *partrels = NIL;
/*
* For a partitioned joinrel, concatenate the component rels'
* partitioned_child_rels lists.
* 對於分割槽連線rel,連線rels的partitioned_child_rels連結串列
*
*/
while ((relid = bms_next_member(rel->relids, relid)) >= 0)
{
RelOptInfo *component;
Assert(relid >= 1 && relid < root->simple_rel_array_size);
component = root->simple_rel_array[relid];
Assert(component->part_scheme != NULL);
Assert(list_length(component->partitioned_child_rels) >= 1);
partrels =
list_concat(partrels,
list_copy(component->partitioned_child_rels));
}
partitioned_rels = list_make1(partrels);
}
Assert(list_length(partitioned_rels) >= 1);
}
/*
* For every non-dummy child, remember the cheapest path. Also, identify
* all pathkeys (orderings) and parameterizations (required_outer sets)
* available for the non-dummy member relations.
* 對於每一個非虛擬子關係,記錄成本最低的訪問路徑.
* 同時,為每一個非虛擬成員關係標識所有的路徑鍵(排序)和可用的引數化資訊(required_outer集合)
*/
foreach(l, live_childrels)//遍歷
{
RelOptInfo *childrel = lfirst(l);
ListCell *lcp;
Path *cheapest_partial_path = NULL;
/*
* For UNION ALLs with non-empty partitioned_child_rels, accumulate
* the Lists of child relations.
* 對於包含非空的partitioned_child_rels的UNION ALLs操作,
* 累積子關係連結串列
*/
if (rel->rtekind == RTE_SUBQUERY && childrel->partitioned_child_rels != NIL)
partitioned_rels = lappend(partitioned_rels,
childrel->partitioned_child_rels);
/*
* If child has an unparameterized cheapest-total path, add that to
* the unparameterized Append path we are constructing for the parent.
* If not, there's no workable unparameterized path.
* 如果子關係存在非引數化的總成本最低的訪問路徑,
* 新增此路徑到我們為父關係構建的非引數化的Append訪問路徑中.
*
* With partitionwise aggregates, the child rel's pathlist may be
* empty, so don't assume that a path exists here.
* 使用partitionwise聚合,子關係的訪問路徑連結串列可能是空的,不能假設其中存在訪問路徑
*/
if (childrel->pathlist != NIL &&
childrel->cheapest_total_path->param_info == NULL)
accumulate_append_subpath(childrel->cheapest_total_path,
&subpaths, NULL);
else
subpaths_valid = false;
/* Same idea, but for a partial plan. */
//同樣的思路,處理並行處理中的部分計劃
if (childrel->partial_pathlist != NIL)
{
cheapest_partial_path = linitial(childrel->partial_pathlist);
accumulate_append_subpath(cheapest_partial_path,
&partial_subpaths, NULL);
}
else
partial_subpaths_valid = false;
/*
* Same idea, but for a parallel append mixing partial and non-partial
* paths.
* 同樣的,處理並行append混合並行/非並行訪問路徑
*/
if (pa_subpaths_valid)
{
Path *nppath = NULL;
nppath =
get_cheapest_parallel_safe_total_inner(childrel->pathlist);
if (cheapest_partial_path == NULL && nppath == NULL)
{
/* Neither a partial nor a parallel-safe path? Forget it. */
//不是部分路徑,也不是並行安全的路徑,跳過
pa_subpaths_valid = false;
}
else if (nppath == NULL ||
(cheapest_partial_path != NULL &&
cheapest_partial_path->total_cost < nppath->total_cost))
{
/* Partial path is cheaper or the only option. */
//部分路徑成本更低或者是唯一的選項
Assert(cheapest_partial_path != NULL);
accumulate_append_subpath(cheapest_partial_path,
&pa_partial_subpaths,
&pa_nonpartial_subpaths);
}
else
{
/*
* Either we've got only a non-partial path, or we think that
* a single backend can execute the best non-partial path
* faster than all the parallel backends working together can
* execute the best partial path.
* 這時候,要麼得到了一個非並行訪問路徑,或者我們認為一個單獨的後臺程式
* 執行最好的非並行訪問路徑會比索引的並行程式一起執行最好的部分路徑還要好.
*
* It might make sense to be more aggressive here. Even if
* the best non-partial path is more expensive than the best
* partial path, it could still be better to choose the
* non-partial path if there are several such paths that can
* be given to different workers. For now, we don't try to
* figure that out.
* 在這裡,採取更積極的態度是有道理的.
* 甚至最好的非部分路徑比最好的並行部分路徑成本更高,仍然需要選擇非並行路徑,
* 如果多個這樣的路徑可能會分派到不同的worker上.
* 現在不需要指出這一點.
*/
accumulate_append_subpath(nppath,
&pa_nonpartial_subpaths,
NULL);
}
}
/*
* Collect lists of all the available path orderings and
* parameterizations for all the children. We use these as a
* heuristic to indicate which sort orderings and parameterizations we
* should build Append and MergeAppend paths for.
* 收集子關係所有可用的排序和引數化路徑連結串列.
* 我們採用啟發式的規則判斷Append和MergeAppend訪問路徑使用哪個排序和引數化資訊
*/
foreach(lcp, childrel->pathlist)
{
Path *childpath = (Path *) lfirst(lcp);
List *childkeys = childpath->pathkeys;
Relids childouter = PATH_REQ_OUTER(childpath);
/* Unsorted paths don't contribute to pathkey list */
//未排序的訪問路徑,不需要分發到路徑鍵連結串列中
if (childkeys != NIL)
{
ListCell *lpk;
bool found = false;
/* Have we already seen this ordering? */
foreach(lpk, all_child_pathkeys)
{
List *existing_pathkeys = (List *) lfirst(lpk);
if (compare_pathkeys(existing_pathkeys,
childkeys) == PATHKEYS_EQUAL)
{
found = true;
break;
}
}
if (!found)
{
/* No, so add it to all_child_pathkeys */
all_child_pathkeys = lappend(all_child_pathkeys,
childkeys);
}
}
/* Unparameterized paths don't contribute to param-set list */
//非引數訪問路徑無需分發到引數化集合連結串列中
if (childouter)
{
ListCell *lco;
bool found = false;
/* Have we already seen this param set? */
foreach(lco, all_child_outers)
{
Relids existing_outers = (Relids) lfirst(lco);
if (bms_equal(existing_outers, childouter))
{
found = true;
break;
}
}
if (!found)
{
/* No, so add it to all_child_outers */
all_child_outers = lappend(all_child_outers,
childouter);
}
}
}
}
/*
* If we found unparameterized paths for all children, build an unordered,
* unparameterized Append path for the rel. (Note: this is correct even
* if we have zero or one live subpath due to constraint exclusion.)
* 如存在子關係的非引數化訪問路徑,構建未排序/未引數化的Append訪問路徑.
* (注意:如果存在約束排除,我們只剩下有0或1個存活的subpath,這樣的處理也說OK的)
*/
if (subpaths_valid)
add_path(rel, (Path *) create_append_path(root, rel, subpaths, NIL,
NULL, 0, false,
partitioned_rels, -1));
/*
* Consider an append of unordered, unparameterized partial paths. Make
* it parallel-aware if possible.
* 嘗試未排序/未引數化的部分Append訪問路徑.
* 如可能,構建parallel-aware訪問路徑.
*/
if (partial_subpaths_valid)
{
AppendPath *appendpath;
ListCell *lc;
int parallel_workers = 0;
/* Find the highest number of workers requested for any subpath. */
//為子訪問路徑尋找最多數量的wokers
foreach(lc, partial_subpaths)
{
Path *path = lfirst(lc);
parallel_workers = Max(parallel_workers, path->parallel_workers);
}
Assert(parallel_workers > 0);
/*
* If the use of parallel append is permitted, always request at least
* log2(# of children) workers. We assume it can be useful to have
* extra workers in this case because they will be spread out across
* the children. The precise formula is just a guess, but we don't
* want to end up with a radically different answer for a table with N
* partitions vs. an unpartitioned table with the same data, so the
* use of some kind of log-scaling here seems to make some sense.
* 如允許使用並行append,那麼至少需要log2(子關係個數)個workers.
* 經過擴充套件後,假定可以使用額外的workers.
* 並不存在精確的計算公式,目前只是猜測而已,但是對於有相同資料的N個分割槽的分割槽表和非分割槽表來說,
* 答案是不一樣的,因此在這裡使用對數的計算方法是OK的.
*/
if (enable_parallel_append)
{
parallel_workers = Max(parallel_workers,
fls(list_length(live_childrels)));//上限值
parallel_workers = Min(parallel_workers,
max_parallel_workers_per_gather);//下限值
}
Assert(parallel_workers > 0);
/* Generate a partial append path. */
//生成並行部分append訪問路徑
appendpath = create_append_path(root, rel, NIL, partial_subpaths,
NULL, parallel_workers,
enable_parallel_append,
partitioned_rels, -1);
/*
* Make sure any subsequent partial paths use the same row count
* estimate.
* 確保所有的子部分路徑使用相同的行數估算
*/
partial_rows = appendpath->path.rows;
/* Add the path. */
//新增路徑
add_partial_path(rel, (Path *) appendpath);
}
/*
* Consider a parallel-aware append using a mix of partial and non-partial
* paths. (This only makes sense if there's at least one child which has
* a non-partial path that is substantially cheaper than any partial path;
* otherwise, we should use the append path added in the previous step.)
* 使用混合的部分和非部分並行的append.
*/
if (pa_subpaths_valid && pa_nonpartial_subpaths != NIL)
{
AppendPath *appendpath;
ListCell *lc;
int parallel_workers = 0;
/*
* Find the highest number of workers requested for any partial
* subpath.
*/
foreach(lc, pa_partial_subpaths)
{
Path *path = lfirst(lc);
parallel_workers = Max(parallel_workers, path->parallel_workers);
}
/*
* Same formula here as above. It's even more important in this
* instance because the non-partial paths won't contribute anything to
* the planned number of parallel workers.
*/
parallel_workers = Max(parallel_workers,
fls(list_length(live_childrels)));
parallel_workers = Min(parallel_workers,
max_parallel_workers_per_gather);
Assert(parallel_workers > 0);
appendpath = create_append_path(root, rel, pa_nonpartial_subpaths,
pa_partial_subpaths,
NULL, parallel_workers, true,
partitioned_rels, partial_rows);
add_partial_path(rel, (Path *) appendpath);
}
/*
* Also build unparameterized MergeAppend paths based on the collected
* list of child pathkeys.
* 基於收集的子路徑鍵,構建非引數化的MergeAppend訪問路徑
*/
if (subpaths_valid)
generate_mergeappend_paths(root, rel, live_childrels,
all_child_pathkeys,
partitioned_rels);
/*
* Build Append paths for each parameterization seen among the child rels.
* (This may look pretty expensive, but in most cases of practical
* interest, the child rels will expose mostly the same parameterizations,
* so that not that many cases actually get considered here.)
* 為每一個引數化的子關係構建Append訪問路徑.
* (看起來成本客觀,但在大多數情況下,子關係使用同樣的引數化資訊,因此實際上並不是經常發現)
*
* The Append node itself cannot enforce quals, so all qual checking must
* be done in the child paths. This means that to have a parameterized
* Append path, we must have the exact same parameterization for each
* child path; otherwise some children might be failing to check the
* moved-down quals. To make them match up, we can try to increase the
* parameterization of lesser-parameterized paths.
*/
foreach(l, all_child_outers)
{
Relids required_outer = (Relids) lfirst(l);
ListCell *lcr;
/* Select the child paths for an Append with this parameterization */
subpaths = NIL;
subpaths_valid = true;
foreach(lcr, live_childrels)
{
RelOptInfo *childrel = (RelOptInfo *) lfirst(lcr);
Path *subpath;
if (childrel->pathlist == NIL)
{
/* failed to make a suitable path for this child */
subpaths_valid = false;
break;
}
subpath = get_cheapest_parameterized_child_path(root,
childrel,
required_outer);
if (subpath == NULL)
{
/* failed to make a suitable path for this child */
subpaths_valid = false;
break;
}
accumulate_append_subpath(subpath, &subpaths, NULL);
}
if (subpaths_valid)
add_path(rel, (Path *)
create_append_path(root, rel, subpaths, NIL,
required_outer, 0, false,
partitioned_rels, -1));
}
}
三、跟蹤分析
測試指令碼如下
testdb=# explain verbose select * from t_hash_partition where c1 = 1 OR c1 = 2;
QUERY PLAN
-------------------------------------------------------------------------------------
Append (cost=0.00..30.53 rows=6 width=200)
-> Seq Scan on public.t_hash_partition_1 (cost=0.00..15.25 rows=3 width=200)
Output: t_hash_partition_1.c1, t_hash_partition_1.c2, t_hash_partition_1.c3
Filter: ((t_hash_partition_1.c1 = 1) OR (t_hash_partition_1.c1 = 2))
-> Seq Scan on public.t_hash_partition_3 (cost=0.00..15.25 rows=3 width=200)
Output: t_hash_partition_3.c1, t_hash_partition_3.c2, t_hash_partition_3.c3
Filter: ((t_hash_partition_3.c1 = 1) OR (t_hash_partition_3.c1 = 2))
(7 rows)
啟動gdb,設定斷點
(gdb) b set_rel_pathlist
Breakpoint 1 at 0x796823: file allpaths.c, line 425.
(gdb) c
Continuing.
Breakpoint 1, set_rel_pathlist (root=0x1f1e400, rel=0x1efaba0, rti=1, rte=0x1efa3d0) at allpaths.c:425
425 if (IS_DUMMY_REL(rel))
(gdb)
透過rte->inh判斷是否分割槽表或者UNION ALL
(gdb) p rte->inh
$1 = true
(gdb)
進入set_append_rel_pathlist函式
(gdb) n
429 else if (rte->inh)
(gdb)
432 set_append_rel_pathlist(root, rel, rti, rte);
(gdb) step
set_append_rel_pathlist (root=0x1f1e400, rel=0x1efaba0, rti=1, rte=0x1efa3d0) at allpaths.c:1296
1296 int parentRTindex = rti;
遍歷子關係
(gdb) n
1297 List *live_childrels = NIL;
(gdb)
1304 foreach(l, root->append_rel_list)
(gdb)
(gdb) p *root->append_rel_list
$2 = {type = T_List, length = 6, head = 0x1fc1f98, tail = 0x1fc2ae8}
獲取AppendRelInfo,判斷父關係是否正在處理的父關係
(gdb) n
1306 AppendRelInfo *appinfo = (AppendRelInfo *) lfirst(l);
(gdb)
1312 if (appinfo->parent_relid != parentRTindex)
(gdb) p *appinfo
$3 = {type = T_AppendRelInfo, parent_relid = 1, child_relid = 3, parent_reltype = 16988, child_reltype = 16991,
translated_vars = 0x1fc1e60, parent_reloid = 16986}
(gdb)
獲取子關係的相關資訊,遞迴呼叫set_rel_pathlist
(gdb) n
1316 childRTindex = appinfo->child_relid;
(gdb)
1317 childRTE = root->simple_rte_array[childRTindex];
(gdb)
1318 childrel = root->simple_rel_array[childRTindex];
(gdb)
1326 if (!rel->consider_parallel)
(gdb)
1332 set_rel_pathlist(root, childrel, childRTindex, childRTE);
(gdb)
(gdb)
Breakpoint 1, set_rel_pathlist (root=0x1f1e400, rel=0x1f1c8a0, rti=3, rte=0x1efa658) at allpaths.c:425
425 if (IS_DUMMY_REL(rel))
(gdb) finish
Run till exit from #0 set_rel_pathlist (root=0x1f1e400, rel=0x1f1c8a0, rti=3, rte=0x1efa658) at allpaths.c:425
set_append_rel_pathlist (root=0x1f1e400, rel=0x1efaba0, rti=1, rte=0x1efa3d0) at allpaths.c:1337
如為虛擬關係,則忽略之
1337 if (IS_DUMMY_REL(childrel))
該子關係不是虛擬關係,繼續處理,加入到rel->partitioned_child_rels和live_childrels連結串列中
(gdb) n
1341 if (rel->part_scheme)
(gdb)
1344 list_copy(childrel->partitioned_child_rels));
(gdb)
1343 list_concat(rel->partitioned_child_rels,
(gdb)
1342 rel->partitioned_child_rels =
(gdb)
1349 live_childrels = lappend(live_childrels, childrel);
(gdb) p *rel->partitioned_child_rels
$4 = {type = T_IntList, length = 1, head = 0x1fc4d78, tail = 0x1fc4d78}
(gdb) p rel->partitioned_child_rels->head->data.int_value
$6 = 1
(gdb)
(gdb) n
1304 foreach(l, root->append_rel_list)
(gdb) p live_childrels
$7 = (List *) 0x1fd0a60
(gdb) p *live_childrels
$8 = {type = T_List, length = 1, head = 0x1fd0a38, tail = 0x1fd0a38}
(gdb) p *(Node *)live_childrels->head->data.ptr_value
$9 = {type = T_RelOptInfo}
(gdb) p *(RelOptInfo *)live_childrels->head->data.ptr_value
$10 = {type = T_RelOptInfo, reloptkind = RELOPT_OTHER_MEMBER_REL, relids = 0x1fc3590, rows = 3, consider_startup = false,
consider_param_startup = false, consider_parallel = true, reltarget = 0x1fc35b0, pathlist = 0x1fd0940, ppilist = 0x0,
partial_pathlist = 0x1fd09a0, cheapest_startup_path = 0x1fc44f8, cheapest_total_path = 0x1fc44f8,
cheapest_unique_path = 0x0, cheapest_parameterized_paths = 0x1fd0a00, direct_lateral_relids = 0x0, lateral_relids = 0x0,
relid = 3, reltablespace = 0, rtekind = RTE_RELATION, min_attr = -7, max_attr = 3, attr_needed = 0x1fc2e38,
attr_widths = 0x1fc3628, lateral_vars = 0x0, lateral_referencers = 0x0, indexlist = 0x0, statlist = 0x0, pages = 10,
tuples = 350, allvisfrac = 0, subroot = 0x0, subplan_params = 0x0, rel_parallel_workers = -1, serverid = 0, userid = 0,
useridiscurrent = false, fdwroutine = 0x0, fdw_private = 0x0, unique_for_rels = 0x0, non_unique_for_rels = 0x0,
baserestrictinfo = 0x1fc68d8, baserestrictcost = {startup = 0, per_tuple = 0.0050000000000000001},
baserestrict_min_security = 0, joininfo = 0x0, has_eclass_joins = false, consider_partitionwise_join = false,
top_parent_relids = 0x1fc3608, part_scheme = 0x0, nparts = 0, boundinfo = 0x0, partition_qual = 0x0, part_rels = 0x0,
partexprs = 0x0, nullable_partexprs = 0x0, partitioned_child_rels = 0x0}
對於虛擬子關係(上一節介紹的被裁剪的分割槽),直接跳過
(gdb) n
1306 AppendRelInfo *appinfo = (AppendRelInfo *) lfirst(l);
(gdb)
1312 if (appinfo->parent_relid != parentRTindex)
(gdb)
1316 childRTindex = appinfo->child_relid;
(gdb)
1317 childRTE = root->simple_rte_array[childRTindex];
(gdb)
1318 childrel = root->simple_rel_array[childRTindex];
(gdb)
1326 if (!rel->consider_parallel)
(gdb)
1332 set_rel_pathlist(root, childrel, childRTindex, childRTE);
(gdb)
1337 if (IS_DUMMY_REL(childrel))
(gdb)
1338 continue;
設定斷點,進入add_paths_to_append_rel函式
(gdb) b add_paths_to_append_rel
Breakpoint 2 at 0x797d88: file allpaths.c, line 1372.
(gdb) c
Continuing.
Breakpoint 2, add_paths_to_append_rel (root=0x1f1cdb8, rel=0x1fc1800, live_childrels=0x1fcfb10) at allpaths.c:1372
1372 List *subpaths = NIL;
(gdb)
輸入引數,其中rel是父關係,live_childrels是經裁剪後仍存活的分割槽(子關係)
(gdb) n
1373 bool subpaths_valid = true;
(gdb) p *rel
$11 = {type = T_RelOptInfo, reloptkind = RELOPT_BASEREL, relids = 0x1fc1a18, rows = 6, consider_startup = false,
consider_param_startup = false, consider_parallel = true, reltarget = 0x1fc1a38, pathlist = 0x0, ppilist = 0x0,
partial_pathlist = 0x0, cheapest_startup_path = 0x0, cheapest_total_path = 0x0, cheapest_unique_path = 0x0,
cheapest_parameterized_paths = 0x0, direct_lateral_relids = 0x0, lateral_relids = 0x0, relid = 1, reltablespace = 0,
rtekind = RTE_RELATION, min_attr = -7, max_attr = 3, attr_needed = 0x1fc1a90, attr_widths = 0x1fc1b28,
lateral_vars = 0x0, lateral_referencers = 0x0, indexlist = 0x0, statlist = 0x0, pages = 0, tuples = 6, allvisfrac = 0,
subroot = 0x0, subplan_params = 0x0, rel_parallel_workers = -1, serverid = 0, userid = 0, useridiscurrent = false,
fdwroutine = 0x0, fdw_private = 0x0, unique_for_rels = 0x0, non_unique_for_rels = 0x0, baserestrictinfo = 0x1fc3e20,
baserestrictcost = {startup = 0, per_tuple = 0}, baserestrict_min_security = 0, joininfo = 0x0, has_eclass_joins = false,
consider_partitionwise_join = false, top_parent_relids = 0x0, part_scheme = 0x1fc1b80, nparts = 6, boundinfo = 0x1fc1d30,
partition_qual = 0x0, part_rels = 0x1fc2000, partexprs = 0x1fc1f08, nullable_partexprs = 0x1fc1fe0,
partitioned_child_rels = 0x1fc3ea0}
初始化變數
(gdb) n
1374 List *partial_subpaths = NIL;
(gdb)
1375 List *pa_partial_subpaths = NIL;
(gdb)
1376 List *pa_nonpartial_subpaths = NIL;
(gdb)
1377 bool partial_subpaths_valid = true;
(gdb)
1379 List *all_child_pathkeys = NIL;
(gdb)
1380 List *all_child_outers = NIL;
(gdb)
1382 List *partitioned_rels = NIL;
(gdb)
1383 double partial_rows = -1;
(gdb)
1386 pa_subpaths_valid = enable_parallel_append && rel->consider_parallel;
(gdb)
1404 if (rel->part_scheme != NULL)
(gdb) p pa_subpaths_valid
$17 = true
構建partitioned_rels連結串列
(gdb) n
1406 if (IS_SIMPLE_REL(rel))
(gdb)
1407 partitioned_rels = list_make1(rel->partitioned_child_rels);
(gdb)
1433 Assert(list_length(partitioned_rels) >= 1);
(gdb)
1441 foreach(l, live_childrels)
(gdb) p *partitioned_rels
$18 = {type = T_List, length = 1, head = 0x1fcff40, tail = 0x1fcff40}
開始遍歷live_childrels,對於每一個非虛擬子關係,記錄成本最低的訪問路徑.
如果子關係存在非引數化的總成本最低的訪問路徑,新增此路徑到我們為父關係構建的非引數化的Append訪問路徑中.
(gdb) n
1443 RelOptInfo *childrel = lfirst(l);
(gdb)
1445 Path *cheapest_partial_path = NULL;
(gdb)
1451 if (rel->rtekind == RTE_SUBQUERY && childrel->partitioned_child_rels != NIL)
(gdb)
1463 if (childrel->pathlist != NIL &&
(gdb)
1464 childrel->cheapest_total_path->param_info == NULL)
(gdb)
1463 if (childrel->pathlist != NIL &&
(gdb)
1465 accumulate_append_subpath(childrel->cheapest_total_path,
同樣的思路,處理並行處理中的部分計劃
(gdb) n
1471 if (childrel->partial_pathlist != NIL)
(gdb)
1473 cheapest_partial_path = linitial(childrel->partial_pathlist);
(gdb)
1474 accumulate_append_subpath(cheapest_partial_path,
同樣的,處理並行append混合並行/非並行訪問路徑
(gdb) n
1486 Path *nppath = NULL;
(gdb)
1489 get_cheapest_parallel_safe_total_inner(childrel->pathlist);
(gdb)
1488 nppath =
(gdb)
1491 if (cheapest_partial_path == NULL && nppath == NULL)
(gdb)
1496 else if (nppath == NULL ||
(gdb)
1498 cheapest_partial_path->total_cost < nppath->total_cost))
(gdb)
1497 (cheapest_partial_path != NULL &&
(gdb)
1501 Assert(cheapest_partial_path != NULL);
(gdb)
1502 accumulate_append_subpath(cheapest_partial_path,
收集子關係所有可用的排序和引數化路徑連結串列.
(gdb)
1534 foreach(lcp, childrel->pathlist)
(gdb)
(gdb) n
1536 Path *childpath = (Path *) lfirst(lcp);
(gdb)
1537 List *childkeys = childpath->pathkeys;
(gdb)
1538 Relids childouter = PATH_REQ_OUTER(childpath);
(gdb)
1541 if (childkeys != NIL)
(gdb)
1567 if (childouter)
(gdb)
1534 foreach(lcp, childrel->pathlist)
繼續下一個子關係,完成處理
...
(gdb)
1441 foreach(l, live_childrels)
(gdb)
1598 if (subpaths_valid)
如存在子關係的非引數化訪問路徑,構建未排序/未引數化的Append訪問路徑.
(gdb) n
1599 add_path(rel, (Path *) create_append_path(root, rel, subpaths, NIL,
(gdb) p *rel->pathlist
Cannot access memory at address 0x0
(gdb) n
1607 if (partial_subpaths_valid)
(gdb) p *rel->pathlist
$22 = {type = T_List, length = 1, head = 0x1fd0230, tail = 0x1fd0230}
嘗試未排序/未引數化的部分Append訪問路徑.如可能,構建parallel-aware訪問路徑.
...
(gdb)
1641 appendpath = create_append_path(root, rel, NIL, partial_subpaths,
(gdb)
1650 partial_rows = appendpath->path.rows;
(gdb)
1653 add_partial_path(rel, (Path *) appendpath);
(gdb)
使用混合的部分和非部分並行的append.
1662 if (pa_subpaths_valid && pa_nonpartial_subpaths != NIL)
(gdb)
基於收集的子路徑鍵,構建非引數化的MergeAppend訪問路徑
1701 if (subpaths_valid)
(gdb)
1702 generate_mergeappend_paths(root, rel, live_childrels,
完成呼叫
(gdb)
1719 foreach(l, all_child_outers)
(gdb)
1757 }
(gdb)
set_append_rel_pathlist (root=0x1f1cdb8, rel=0x1fc1800, rti=1, rte=0x1efa3d0) at allpaths.c:1354
1354 }
(gdb) p *rel->pathlist
$23 = {type = T_List, length = 1, head = 0x1fd0230, tail = 0x1fd0230}
(gdb)
(gdb) p *(Node *)rel->pathlist->head->data.ptr_value
$24 = {type = T_AppendPath}
(gdb) p *(AppendPath *)rel->pathlist->head->data.ptr_value
$25 = {path = {type = T_AppendPath, pathtype = T_Append, parent = 0x1fc1800, pathtarget = 0x1fc1a38, param_info = 0x0,
parallel_aware = false, parallel_safe = true, parallel_workers = 0, rows = 6, startup_cost = 0,
total_cost = 30.530000000000001, pathkeys = 0x0}, partitioned_rels = 0x1fd01f8, subpaths = 0x1fcffc8,
first_partial_path = 2}
結束呼叫
(gdb) n
set_rel_pathlist (root=0x1f1cdb8, rel=0x1fc1800, rti=1, rte=0x1efa3d0) at allpaths.c:495
495 if (rel->reloptkind == RELOPT_BASEREL &&
(gdb)
496 bms_membership(root->all_baserels) != BMS_SINGLETON)
(gdb)
495 if (rel->reloptkind == RELOPT_BASEREL &&
(gdb)
504 if (set_rel_pathlist_hook)
(gdb)
508 set_cheapest(rel);
(gdb)
513 }
(gdb)
DONE!
四、參考資料
Parallel Append implementation
Partition Elimination in PostgreSQL 11
來自 “ ITPUB部落格 ” ,連結:http://blog.itpub.net/6906/viewspace-2374788/,如需轉載,請註明出處,否則將追究法律責任。
相關文章
- PostgreSQL 原始碼解讀(98)- 分割槽表#4(資料查詢路由#1-“擴充套件”分割槽表)SQL原始碼路由套件
- PostgreSQL 原始碼解讀(99)- 分割槽表#5(資料查詢路由#2-RelOptInfo數...SQL原始碼路由
- PostgreSQL 原始碼解讀(103)- 分割槽表#9(資料查詢路由#6-APPEND初始化和實現)SQL原始碼路由APP
- PostgreSQL 原始碼解讀(100)- 分割槽表#6(資料查詢路由#3-prune part...SQL原始碼路由
- PostgreSQL 原始碼解讀(92)- 分割槽表#1(資料插入路由#1)SQL原始碼路由
- PostgreSQL 原始碼解讀(94)- 分割槽表#2(資料插入路由#2)SQL原始碼路由
- PostgreSQL 原始碼解讀(101)- 分割槽表#7(資料查詢路由#4-prune part...SQL原始碼路由
- PostgreSQL 原始碼解讀(96)- 分割槽表#3(資料插入路由#3-獲取分割槽鍵值)SQL原始碼路由
- PostgreSQL 原始碼解讀(76)- 查詢語句#61(獲取成本最低的訪問路徑)SQL原始碼
- PostgreSQL:內建分割槽表SQL
- PostgreSQL 原始碼解讀(205)- 查詢#118(資料結構RangeTblEntry)SQL原始碼資料結構
- PostgreSQL 原始碼解讀(178)- 查詢#95(聚合函式)#1相關資料結構SQL原始碼函式資料結構
- PostgreSQL 原始碼解讀(204)- 查詢#117(資料結構SelectStmt&Value)SQL原始碼資料結構
- Oracle查詢Interval partition分割槽表內資料Oracle
- PostgreSQL 原始碼解讀(8)- 插入資料#7(ExecutePlan)SQL原始碼
- 分割槽表分割槽索引查詢效率探究索引
- 如何查詢分割槽表的分割槽及子分割槽
- PostgreSQL 原始碼解讀(55)- 查詢語句#40(make_one_rel函式#5-索...SQL原始碼函式
- PostgreSQL 原始碼解讀(161)- 查詢#81(如何構建表示式解析步驟#1)SQL原始碼
- PostgreSQL 原始碼解讀(162)- 查詢#82(如何構建表示式解析步驟#2)SQL原始碼
- PostgreSQL 原始碼解讀(163)- 查詢#83(如何構建表示式解析步驟#3)SQL原始碼
- postgresql分割槽表修改資料表欄位SQL
- PostgreSQL 原始碼解讀(206)- 查詢#119(資料結構RangSubselect等)SQL原始碼資料結構
- PostgreSQL 原始碼解讀(44)- 查詢語句#29(等價類相關資料結構)SQL原始碼資料結構
- PostgreSQL 原始碼解讀(23)- 查詢語句#8(PlannedStmt與QUERY P...SQL原始碼
- PostgreSQL 原始碼解讀(24)- 查詢語句#9(查詢重寫)SQL原始碼
- postgresql分割槽表SQL
- PostgreSQL 原始碼解讀(207)- 查詢#120(資料結構FromExpr&JoinExpr)SQL原始碼資料結構
- PostgreSQL 原始碼解讀(20)- 查詢語句#5(查詢樹Query詳解)SQL原始碼
- PostgreSQL10.0內建分割槽表SQL
- PostgreSQL 原始碼解讀(187)- 查詢#103(聚合函式#8 - Struct Review)SQL原始碼函式StructView
- PostgreSQL 原始碼解讀(47)- 查詢語句#32(query_planner函式#8)SQL原始碼函式
- PostgreSQL 原始碼解讀(186)- 查詢#102(聚合函式#7-advance_aggregates)SQL原始碼函式
- PostgreSQL 原始碼解讀(112)- WAL#8(XLogCtrl資料結構)SQL原始碼GC資料結構
- PostgreSQL 原始碼解讀(19)- 查詢語句#4(ParseTree詳解)SQL原始碼
- PostgreSQL 原始碼解讀(25)- 查詢語句#10(查詢優化概覽)SQL原始碼優化
- PostgreSQL 原始碼解讀(17)- 查詢語句#2(查詢優化基礎)SQL原始碼優化
- Laravel 原始碼閱讀指南 -- Database 查詢構建器Laravel原始碼Database