【GreatSQL最佳化器-05】條件過濾condition_fanout_filter
一、condition_fanout_filter介紹
GreatSQL 的最佳化器對於 join 的表需要根據行數和 cost 來確定最後哪張表先執行哪張表後執行,這裡面就涉及到預估滿足條件的表資料,condition_fanout_filter
會根據一系列方法計算出一個資料過濾百分比,這個比百分比就是 filtered 係數,這個值區間在[0,1],值越小代表過濾效果越好。用這個係數乘以總的行數就可以得出最後需要掃描的錶行數的數量,可以大幅節省開銷和執行時間。
這個功能是由 OPTIMIZER_SWITCH_COND_FANOUT_FILTER這個OPTIMIZER_SWITCH
來控制的,預設是開啟的。因此一般情況下不需要特意去關閉,但是如果遇到執行特別慢的一些情況可以考慮關閉。
下面用一個簡單的例子來說明 condition_fanout_filter
是什麼:
CREATE TABLE t1 (c1 INT PRIMARY KEY, c2 INT,date1 DATETIME);
INSERT INTO t1 VALUES (1,10,'2021-03-25 16:44:00.123456'),(2,1,'2022-03-26 16:44:00.123456'),(3,4,'2023-03-27 16:44:00.123456'),(5,5,'2024-03-25 16:44:00.123456');
CREATE TABLE t2 (cc1 INT PRIMARY KEY, cc2 INT);
INSERT INTO t2 VALUES (1,3),(2,1),(3,2),(4,3),(5,15);
# 為了檢視過濾係數,需要建立一張沒有主鍵的表用來做過濾估計。
CREATE TABLE t3 (ccc1 INT, ccc2 varchar(100));
INSERT INTO t3 VALUES (1,'aa1'),(2,'bb1'),(3,'cc1'),(4,'dd1'),(null,'ee');
CREATE INDEX idx1 ON t1(c2);
CREATE INDEX idx2 ON t1(c2,date1);
CREATE INDEX idx2_1 ON t2(cc2);
CREATE INDEX idx3_1 ON t3(ccc1);
看到下面的 t3 的過濾百分比46.66%,意味著兩張表 join 一共20行,因為 t3 的過濾百分比為 46.66%,因此最後只需要讀取 20*46.66%=9 行資料。
注意,這裡沒有建立直方圖,因此結果不包含直方圖過濾的因素,關於直方圖後面會專門開一章講。
greatsql> EXPLAIN SELECT * FROM t1 JOIN t3 ON t1.c1=t3.ccc1 OR t3.ccc1 <3;
+----+-------------+-------+------------+-------+---------------+------+---------+------+------+----------+--------------------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+-------+---------------+------+---------+------+------+----------+--------------------------------------------+
| 1 | SIMPLE | t1 | NULL | index | PRIMARY | idx2 | 11 | NULL | 4 | 100.00 | Using index |
| 1 | SIMPLE | t3 | NULL | ALL | idx3_1 | NULL | NULL | NULL | 5 | 46.66 | Using where; Using join buffer (hash join) |
# 這裡顯示的值就是 t3 的過濾資料百分比
+----+-------------+-------+------------+-------+---------------+------+---------+------+------+----------+--------------------------------------------+
greatsql> EXPLAIN FORMAT=TREE SELECT * FROM t1 JOIN t3 ON t1.c1=t3.ccc1 OR t3.ccc1<3;
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| EXPLAIN |
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| -> Filter: ((t3.ccc1 = t1.c1) or (t3.ccc1 < 3)) (cost=3.65 rows=9)
-> Inner hash join (no condition) (cost=3.65 rows=9) # 這裡結果為讀取9行資料,跟上面算出來的資料一致
-> Table scan on t3 (cost=0.12 rows=5)
-> Hash
-> Index scan on t1 using idx2 (cost=1.40 rows=4)
|
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
二、best_access_path程式碼解釋
condition_fanout_filte
的計算在 best_access_path
函式實現,用來預估不同 join 連線時候的 join 表的讀取行數和可能的 cost。
void Optimize_table_order::best_access_path(JOIN_TAB *tab,
const table_map remaining_tables,
const uint idx, bool disable_jbuf,
const double prefix_rowcount,
POSITION *pos) {
# 如果根據前面的結果keyuse_array陣列有值的話,那麼根據find_best_ref()函式先找出最優索引,按照索引的方式計算cost
if (tab->keyuse() != nullptr &&
(table->file->ha_table_flags() & HA_NO_INDEX_ACCESS) == 0)
best_ref =
find_best_ref(tab, remaining_tables, idx, prefix_rowcount,
&found_condition, &ref_depend_map, &used_key_parts);
# 最主要計算下面3個值
pos->filter_effect = filter_effect = std:: (1.0, tab->found_records * calculate_condition_filter() / rows_after_filtering);
pos->rows_fetched = rows_fetched = rows_after_filtering;
pos->read_cost = scan_read_cost = calculate_scan_cost();
}
下面是程式碼裡面涉及的計算公式,這裡是 keyuse_array 陣列為空的情況,也就是掃描方式 "access_type" 非 "eq_ref" 和 "ref" 的情況,或者說是沒有找到最優索引的情況。keyuse_array 陣列有值的情況,在函式find_best_ref()計算,結果有可能也走下面的計算方式,詳細在後面的章節詳細介紹。
關鍵引數 | 解釋 | 值 |
---|---|---|
rows_fetched | 總共需要讀取多少行 | rows_after_filtering 見下表一 |
filter_effect | 條件過濾百分比係數 | std::min(1.0, tab->found_records * calculate_condition_filter() / rows_after_filtering) 見下表一和二 |
read_cost | 讀的開銷 | calculate_scan_cost() 見下表四 |
prefix_rowcount | join左表的行數,意味著多少行會被join到右表 對於第一張表來說prefix_rowcount=1 | 第一張表:prefix_rowcount = rows_fetched * filter_effect 非第一張表:prefix_rowcount = 上一張表的prefix_rowcount * rows_fetched * filter_effect |
prefix_cost | join左表的cost,row_evaluate_cost()計算公式=0.1 * 行數,0.1是讀一行的開銷 | 第一張表:read_cost + cm->row_evaluate_cost(prefix_rowcount) 非第一張表:上一張表的prefix_cost + read_cost + cm->row_evaluate_cost(上一張表的prefix_rowcount*rows_fetched) |
表一,rows_after_filtering 計算方式
場景 | 解釋 | 值 |
---|---|---|
OPTIMIZER_SWITCH_COND_FANOUT_FILTER模式(預設ON) | 條件過濾模式開啟 | tab->found_records * calculate_condition_filter() 見下表二 |
table->quick_condition_rows != tab->found_records | 透過別的方法獲取的表滿足的行數 | table->quick_condition_rows |
keyuse_array有值(參考前面<<最佳化器02>>) | REF掃描模式 | tab->found_records * 0.75 |
以上都不符合。預設情況 | 全錶行數 | tab->found_records |
表二,calculate_condition_filter() 計算方式
場景 | 值 | 說明 |
---|---|---|
滿足條件的表的行數為0 | filter = COND_FILTER_ALLPASS 見表三 | |
使用索引 | filter = filter * std::min(table->quick_rows[keyno] / tab->records() , 1.0) | |
帶有條件並且條件裡涉及不含有索引的列 | filter = filter * Item_cond->get_filtering_effect() 見表三 | |
合併結果 | filter = max(filter, 1.0 / tab->records()) | |
(filter * fanout) < 0.05 | filter = 0.05 / fanout | fanout是滿足條件的表的行數 |
注:filter是條件過濾百分比,這個值區間在[0,1],這個值越小代表過濾效果越好
表三,get_filtering_effect() 算出來的過濾係數
Item的係數 | 解釋 | 係數 | 說明 |
---|---|---|---|
COND_FILTER_ALLPASS | Always true | 1.0f | 代表全部資料都符合,全部都要掃描 |
COND_FILTER_EQUALITY | col1 = col2 | 0.1f | 代表預估10%資料符合條件 |
COND_FILTER_INEQUALITY | col1 > col2 | 0.3333f | 代表預估1/3資料符合條件 |
COND_FILTER_BETWEEN | col1 BETWEEN a AND b | 0.1111f | 代表預估1/9資料符合條件 |
表四,calculate_scan_cost() 計算方式
場景 | 值 | 說明 |
---|---|---|
如果是範圍掃描 | prefix_rowcount * (tab->range_scan()->cost + cost_model->row_evaluate_cost(tab->found_records - *rows_after_filtering)) | |
非join buffer模式 | prefix_rowcount * (single_scan_read_cost + cost_model->row_evaluate_cost(tab->records() - *rows_after_filtering)) single_scan_read_cost計算如下: force_index模式: table->file->read_cost(tab->ref().key, 1,tab->records() 見表五) 非force_index模式: table->file->table_scan_cost() 見表五 | |
join buffer模式(預設ON) | buffer_count * (single_scan_read_cost+ cost_model->row_evaluate_cost(tab->records() - *rows_after_filtering)) buffer_count計算如下: single_scan_read_cost計算如下: 1.0 + ((double)cache_record_length(join, idx) * prefix_rowcount / thd->variables.join_buff_size force_index模式: table->file->read_cost(tab->ref().key, 1,tab->records()) 見表五 非force_index模式: table->file->table_scan_cost() 見表五 | 預設這個模式開啟 |
表五 引擎計算相關
場景 | 值 | 說明 |
---|---|---|
table->file->read_cost(index, ranges, rows) | read_time(index, ranges, rows) *table->cost_model()->page_read_cost(1.0) | index:index序號,ranges:range範圍,rows:錶行數 |
table->file->table_scan_cost() | (stats.data_file_length / IO_SIZE + 2) * table->cost_model()->page_read_cost(1.0) | IO_SIZE=4096 |
三、實際例子說明
接下來看一個例子來說明上面的程式碼。這裡的例子不涉及keyuse_array陣列有值的情況。
greatsql> SELECT * FROM t1 join t3 ON t1.c1=t3.ccc1 or t3.ccc1<3;
+----+------+---------------------+------+------+
| c1 | c2 | date1 | ccc1 | ccc2 |
+----+------+---------------------+------+------+
| 1 | 10 | 2021-03-25 16:44:00 | 1 | aa1 |
| 5 | 5 | 2024-03-25 16:44:00 | 1 | aa1 |
| 3 | 4 | 2023-03-27 16:44:00 | 1 | aa1 |
| 2 | 1 | 2022-03-26 16:44:00 | 1 | aa1 |
| 1 | 10 | 2021-03-25 16:44:00 | 2 | bb1 |
| 5 | 5 | 2024-03-25 16:44:00 | 2 | bb1 |
| 3 | 4 | 2023-03-27 16:44:00 | 2 | bb1 |
| 2 | 1 | 2022-03-26 16:44:00 | 2 | bb1 |
| 3 | 4 | 2023-03-27 16:44:00 | 3 | cc1 |
+----+------+---------------------+------+------+
greatsql> EXPLAIN FORMAT=TREE SELECT * FROM t1 join t3 ON t1.c1=t3.ccc1 or t3.ccc1<3;
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| EXPLAIN |
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| -> Filter: ((t3.ccc1 = t1.c1) or (t3.ccc1 < 3)) (cost=3.65 rows=9)
-> Inner hash join (no condition) (cost=3.65 rows=9)
-> Table scan on t3 (cost=0.12 rows=5)
-> Hash
-> Index scan on t1 using idx2 (cost=1.40 rows=4)
|
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
greatsql> SELECT * FROM INFORMATION_SCHEMA.OPTIMIZER_TRACE;
"considered_execution_plans": [
{
"plan_prefix": [
],
"table": "`t1`",
"best_access_path": {
"considered_access_paths": [
{
"rows_to_scan": 4,
"filtering_effect": [
],
"final_filtering_effect": 1,
"access_type": "scan",
"resulting_rows": 4,# 這個值就是rows_after_filtering
"cost": 0.65, # 計算公式=read_cost(0.25)+cost_model->row_evaluate_cost(1 * rows_after_filtering)
"chosen": true
}
]
},
"condition_filtering_pct": 100, # 這個值就是filter_effect * 100
"rows_for_plan": 4, # 這個值就是prefix_rowcount=rows_to_scan * filter_effect / 100
"cost_for_plan": 0.65, # 這個值就是prefix_cost=read_cost(0.25) + cm->row_evaluate_cost(prefix_rowcount=4)
"rest_of_plan": [
{
"plan_prefix": [ # 當前左連線表為t1
"`t1`"
],
"table": "`t3`", # 右連線表為t3
"best_access_path": {
"considered_access_paths": [
{
"rows_to_scan": 5,
"filtering_effect": [
],
"final_filtering_effect": 1,
"access_type": "scan",
"using_join_cache": true,
"buffers_needed": 1,
"resulting_rows": 5, # 這個值就是rows_after_filtering
"cost": 2.25005, # 計算公式read_cost(0.25)+cost_model->row_evaluate_cost(prefix_rowcount * rows_after_filtering=4 * 5)
"chosen": true
}
]
},
"condition_filtering_pct": 46.664, # 這個值計算過程見下面<<附錄:t3的filter_effect值計算>> ※bug?用explain看到的和直接敲命令看到的不一樣,前者46.664後者為100,而"rows_for_plan"會變為沒有過濾的20
"rows_for_plan": 9.3328, # 這個值就是prefix_rowcount=4(上一張表的prefix_rowcount) * 5 * 46.664 / 100
"cost_for_plan": 2.90005, # 這個值就是prefix_cost=0.65(上一張表的prefix_cost)+read_cost(0.25) + cm->row_evaluate_cost(20=4*5)
"chosen": true
}
]
},
{
"plan_prefix": [
],
"table": "`t3`",
"best_access_path": {
"considered_access_paths": [
{
"rows_to_scan": 5,
"filtering_effect": [
],
"final_filtering_effect": 1,
"access_type": "scan",
"resulting_rows": 5,
"cost": 0.75, # 計算公式read_cost(0.25)+5行*0.1
"chosen": true
}
]
},
"condition_filtering_pct": 100,
"rows_for_plan": 5, # 這個值就是prefix_rowcount
"cost_for_plan": 0.75, # 這個值就是prefix_cost
"rest_of_plan": [
{
"plan_prefix": [ # 當前左連線表為t3
"`t3`"
],
"table": "`t1`", # 右連線表為t1
"best_access_path": {
"considered_access_paths": [
{
"rows_to_scan": 4,
"filtering_effect": [
],
"final_filtering_effect": 1,
"access_type": "scan",
"using_join_cache": true,
"buffers_needed": 1,
"resulting_rows": 4,
"cost": 3.00776, # 計算公式read_cost(1.007)+cost_model->row_evaluate_cost(prefix_rowcount * rows_after_filtering=5 * 4)
"chosen": true
}
]
},
"condition_filtering_pct": 100,
"rows_for_plan": 20, # 這個值就是prefix_rowcount=5(上一張表的prefix_rowcount) * 4 * 100 / 100
"cost_for_plan": 3.75776, # 這個值就是prefix_cost=0.75(上一張表的prefix_cost)+read_cost(1.007) + cm->row_evaluate_cost(20=5*4)
"pruned_by_cost": true # 因為這裡算出來的3.75776 > 2.90005,因此被裁剪掉了
}
]
}
]
},
{
"attaching_conditions_to_tables": { # 新增一些附加條件
"original_condition": "((`t3`.`ccc1` = `t1`.`c1`) or (`t3`.`ccc1` < 3))",
"attached_conditions_computation": [
],
"attached_conditions_summary": [ # t1作為驅動表,執行全表掃描,因此不需要任何條件
{
"table": "`t1`",
"attached": null
},
{
"table": "`t3`",
"attached": "((`t3`.`ccc1` = `t1`.`c1`) or (`t3`.`ccc1` < 3))" # t3作為連線表,按照條件過濾
}
]
}
},
{
"finalizing_table_conditions": [
{
"table": "`t3`",
"original_table_condition": "((`t3`.`ccc1` = `t1`.`c1`) or (`t3`.`ccc1` < 3))",
"final_table_condition ": "((`t3`.`ccc1` = `t1`.`c1`) or (`t3`.`ccc1` < 3))"
}
]
},
{
"refine_plan": [
{
"table": "`t1`"
},
{
"table": "`t3`"
}
]
}
]
}
},
{
"join_explain": {
"select#": 1,
"steps": [
]
}
}
]
} |
# 附錄:t3的filter_effect值計算
# 這個函式計算t1.c1=t3.ccc1和t3.ccc1<3的filter_effect值,計算過程見下面
float Item_cond_or::get_filtering_effect(THD *thd, table_map filter_for_table,
table_map read_tables,
const MY_BITMAP *fields_to_ignore,
double rows_in_table) {
while ((item = it++)) {
const float cur_filter = item->get_filtering_effect(
thd, filter_for_table, read_tables, fields_to_ignore, rows_in_table);
# 第一次:計算t1.c1=t3.ccc1,返回 1/5=20%,其中5是總行數。
# 第二次:計算t3.ccc1<3,返回COND_FILTER_INEQUALITY=0.333
# 最後的filter=0.2+0.333 - (0.2*0.333)=0.4664
filter = filter + cur_filter - (filter * cur_filter);
}
return filter;
}
下面把OPTIMIZER_SWITCH_COND_FANOUT_FILTER
模式關閉看看執行效果什麼變化
greatsql> explain SELECT /*+ set_var(optimizer_switch='condition_fanout_filter=off') */ * FROM t1 join t3 ON t1.c1=t3.ccc1 or t3.ccc1<3;
+----+-------------+-------+------------+-------+---------------+------+---------+------+------+----------+--------------------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+-------+---------------+------+---------+------+------+----------+--------------------------------------------+
| 1 | SIMPLE | t1 | NULL | index | PRIMARY | idx2 | 11 | NULL | 4 | 100.00 | Using index |
| 1 | SIMPLE | t3 | NULL | ALL | idx3_1 | NULL | NULL | NULL | 5 | 100.00 | Using where; Using join buffer (hash join) |
# 看到這裡選擇了不做條件過濾,全表資料連線
+----+-------------+-------+------------+-------+---------------+------+---------+------+------+----------+--------------------------------------------+
greatsql> EXPLAIN FORMAT=TREE SELECT /*+ set_var(optimizer_switch='condition_fanout_filter=off') */ * FROM t1 join t3 ON t1.c1=t3.ccc1 or t3.ccc1<3;
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| EXPLAIN |
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| -> Filter: ((t3.ccc1 = t1.c1) or (t3.ccc1 < 3)) (cost=3.65 rows=20)# 因為表裡資料少,跟剛才比cost沒有變化,但是要是資料大的話就很明顯
-> Inner hash join (no condition) (cost=3.65 rows=20) # 這裡變為了20行,意味著讀取全部行資料
-> Table scan on t3 (cost=0.19 rows=5)
-> Hash
-> Index scan on t1 using idx2 (cost=1.40 rows=4)
|
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
{
"considered_execution_plans": [
{
"plan_prefix": [
],
"table": "`t1`",
"best_access_path": {
"considered_access_paths": [
{
"rows_to_scan": 4,
"access_type": "scan",
"resulting_rows": 4,
"cost": 1.4,
"chosen": true
}
]
},
"condition_filtering_pct": 100,
"rows_for_plan": 4,
"cost_for_plan": 1.4,
"rest_of_plan": [
{
"plan_prefix": [ # 當前左連線表為t1
"`t1`"
],
"table": "`t3`",
"best_access_path": {
"considered_access_paths": [
{
"rows_to_scan": 5,
"access_type": "scan",
"using_join_cache": true,
"buffers_needed": 1,
"resulting_rows": 5,
"cost": 2.25005,
"chosen": true
}
]
},
"condition_filtering_pct": 100, # 這裡的過濾係數變為100,說明不進行條件過濾,讀取全部行
"rows_for_plan": 20, # 由上面的9行變為全部資料連線,20行資料
"cost_for_plan": 3.65005,
"chosen": true
}
]
},
{
"plan_prefix": [
],
"table": "`t3`",
"best_access_path": {
"considered_access_paths": [
{
"rows_to_scan": 5,
"access_type": "scan",
"resulting_rows": 5,
"cost": 0.75,
"chosen": true
}
]
},
"condition_filtering_pct": 100,
"rows_for_plan": 5,
"cost_for_plan": 0.75,
"rest_of_plan": [
{
"plan_prefix": [
"`t3`"
],
"table": "`t1`",
"best_access_path": {
"considered_access_paths": [
{
"rows_to_scan": 4,
"access_type": "scan",
"using_join_cache": true,
"buffers_needed": 1,
"resulting_rows": 4,
"cost": 3.00776,
"chosen": true
}
]
},
"condition_filtering_pct": 100,
"rows_for_plan": 20,
"cost_for_plan": 3.75776,
"pruned_by_cost": true
}
]
}
]
},
{
"attaching_conditions_to_tables": {
"original_condition": "((`t3`.`ccc1` = `t1`.`c1`) or (`t3`.`ccc1` < 3))",
"attached_conditions_computation": [
],
"attached_conditions_summary": [
{
"table": "`t1`",
"attached": null
},
{
"table": "`t3`",
"attached": "((`t3`.`ccc1` = `t1`.`c1`) or (`t3`.`ccc1` < 3))"
}
]
}
},
{
"finalizing_table_conditions": [
{
"table": "`t3`",
"original_table_condition": "((`t3`.`ccc1` = `t1`.`c1`) or (`t3`.`ccc1` < 3))",
"final_table_condition ": "((`t3`.`ccc1` = `t1`.`c1`) or (`t3`.`ccc1` < 3))"
}
]
},
{
"refine_plan": [
{
"table": "`t1`"
},
{
"table": "`t3`"
}
]
}
]
}
},
{
"join_explain": {
"select#": 1,
"steps": [
]
}
}
]
}
現在把之前03的內容拿過來回顧一下當時的結果,看看為何是當時的結果。從下面結果可以看出,當存在keyuse的時候,最佳化器最後走哪條索引是透過find_best_ref()函式來決定的。
greatsql> EXPLAIN SELECT * FROM t1 join t2 ON t1.c1=t2.cc1 and t1.c1<5;
+----+-------------+-------+------------+--------+---------------+---------+---------+-----------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+--------+---------------+---------+---------+-----------+------+----------+-------------+
| 1 | SIMPLE | t1 | NULL | range | PRIMARY | PRIMARY | 4 | NULL | 3 | 100.00 | Using where |
| 1 | SIMPLE | t2 | NULL | eq_ref | PRIMARY | PRIMARY | 4 | db1.t1.c1 | 1 | 100.00 | NULL |
# t2這個結果看下面的解釋
+----+-------------+-------+------------+--------+---------------+---------+---------+-----------+------+----------+-------------+
{\
"considered_execution_plans": [\
{\
"plan_prefix": [\
],\
"table": "`t1`",\
"best_access_path": {\
"considered_access_paths": [\
{\
"access_type": "ref",\ # t1表不選擇"ref"方式,因為cost比"range"方式大
"index": "PRIMARY",\
"usable": false,\
"chosen": false\
},\
{\
"rows_to_scan": 3,\
"filtering_effect": [\
],\
"final_filtering_effect": 1,\
"access_type": "range",\ # t1表最後選擇了"range"方式進行掃描
"range_details": {\
"used_index": "PRIMARY"\
},\
"resulting_rows": 3,\
"cost": 0.860732,\
"chosen": true\
}\
]\
},\
"condition_filtering_pct": 100,\
"rows_for_plan": 3,\
"cost_for_plan": 0.860732,\
"rest_of_plan": [\
"rest_of_plan": [\
{\
"plan_prefix": [\
"`t1`"\
],\
"table": "`t2`",\
"best_access_path": {\
"considered_access_paths": [\
{\
"access_type": "eq_ref",\ # 在t1 join t2的連線方式裡,判斷用"eq_ref"方式掃描cost更小。這裡透過find_best_ref()獲取
"index": "PRIMARY",\
"rows": 1,\
"cost": 1.05,\
"chosen": true,\
"cause": "clustered_pk_chosen_by_heuristics"\
},\
{\
"access_type": "range",\
"range_details": {\
"used_index": "PRIMARY"\
},\
"chosen": false,\
"cause": "heuristic_index_cheaper"\
}\
]\
},\
"condition_filtering_pct": 100,\
"rows_for_plan": 3,\
"cost_for_plan": 1.91073,\
"chosen": true\
}\
{\
"refine_plan": [\ # 根據最後比較結果,按照t1,t2方式連線cost最小,因此選擇這種順序進行join
{\
"table": "`t1`"\
},\
{\
"table": "`t2`"\
}\
]\
}\
四、總結
從上面最佳化器最早的步驟我們認識了條件過濾condition_fanout_filter
的定義以及使用方法,也學會了計算join時候的cost,然後根據cost選擇最優的連線順序。回到開頭提到的這個功能在一些場景有可能會帶來選擇錯誤的問題,這個下一期展開細講。用這種過濾方法算出來的結果也不精確,GreatSQL 用了另一個功能直方圖來做資料統計,讓結果更加精確,這個直方圖的分析放後面章節講。
Enjoy GreatSQL 😃
關於 GreatSQL
GreatSQL是適用於金融級應用的國內自主開源資料庫,具備高效能、高可靠、高易用性、高安全等多個核心特性,可以作為MySQL或Percona Server的可選替換,用於線上生產環境,且完全免費併相容MySQL或Percona Server。
相關連結: GreatSQL社群 Gitee GitHub Bilibili
GreatSQL社群:
社群部落格有獎徵稿詳情:https://greatsql.cn/thread-100-1-1.html
技術交流群:
微信:掃碼新增
GreatSQL社群助手
微信好友,傳送驗證資訊加群
。