【GreatSQL最佳化器-05】條件過濾condition_fanout_filter

GreatSQL發表於2024-12-06

原文網址 : https://www.cnblogs.com/greatsql/p/18589995

【GreatSQL最佳化器-05】條件過濾condition_fanout_filter

一、condition_fanout_filter介紹

GreatSQL 的最佳化器對於 join 的表需要根據行數和 cost 來確定最後哪張表先執行哪張表後執行，這裡面就涉及到預估滿足條件的表資料，condition_fanout_filter 會根據一系列方法計算出一個資料過濾百分比，這個比百分比就是 filtered 係數，這個值區間在[0,1]，值越小代表過濾效果越好。用這個係數乘以總的行數就可以得出最後需要掃描的錶行數的數量，可以大幅節省開銷和執行時間。

這個功能是由 OPTIMIZER_SWITCH_COND_FANOUT_FILTER這個OPTIMIZER_SWITCH 來控制的，預設是開啟的。因此一般情況下不需要特意去關閉，但是如果遇到執行特別慢的一些情況可以考慮關閉。

下面用一個簡單的例子來說明 condition_fanout_filter 是什麼：

CREATE TABLE t1 (c1 INT PRIMARY KEY, c2 INT,date1 DATETIME);
INSERT INTO t1 VALUES (1,10,'2021-03-25 16:44:00.123456'),(2,1,'2022-03-26 16:44:00.123456'),(3,4,'2023-03-27 16:44:00.123456'),(5,5,'2024-03-25 16:44:00.123456');
CREATE TABLE t2 (cc1 INT PRIMARY KEY, cc2 INT);
INSERT INTO t2 VALUES (1,3),(2,1),(3,2),(4,3),(5,15);
# 為了檢視過濾係數，需要建立一張沒有主鍵的表用來做過濾估計。
CREATE TABLE t3 (ccc1 INT, ccc2 varchar(100));
INSERT INTO t3 VALUES (1,'aa1'),(2,'bb1'),(3,'cc1'),(4,'dd1'),(null,'ee');
CREATE INDEX idx1 ON t1(c2);
CREATE INDEX idx2 ON t1(c2,date1);
CREATE INDEX idx2_1 ON t2(cc2);
CREATE INDEX idx3_1 ON t3(ccc1);

看到下面的 t3 的過濾百分比46.66%，意味著兩張表 join 一共20行，因為 t3 的過濾百分比為 46.66%，因此最後只需要讀取 20*46.66%=9 行資料。

注意，這裡沒有建立直方圖，因此結果不包含直方圖過濾的因素，關於直方圖後面會專門開一章講。

greatsql> EXPLAIN SELECT * FROM t1 JOIN t3 ON t1.c1=t3.ccc1 OR t3.ccc1 <3;
+----+-------------+-------+------------+-------+---------------+------+---------+------+------+----------+--------------------------------------------+
| id | select_type | table | partitions | type  | possible_keys | key  | key_len | ref  | rows | filtered | Extra                                      |
+----+-------------+-------+------------+-------+---------------+------+---------+------+------+----------+--------------------------------------------+
|  1 | SIMPLE      | t1    | NULL       | index | PRIMARY       | idx2 | 11      | NULL |    4 |   100.00 | Using index                                |
|  1 | SIMPLE      | t3    | NULL       | ALL   | idx3_1        | NULL | NULL    | NULL |    5 |    46.66 | Using where; Using join buffer (hash join) |
# 這裡顯示的值就是 t3 的過濾資料百分比
+----+-------------+-------+------------+-------+---------------+------+---------+------+------+----------+--------------------------------------------+

greatsql> EXPLAIN FORMAT=TREE SELECT * FROM t1 JOIN t3 ON t1.c1=t3.ccc1 OR t3.ccc1<3;
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| EXPLAIN                                                                                                                                                                                                                                                       |
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| -> Filter: ((t3.ccc1 = t1.c1) or (t3.ccc1 < 3))  (cost=3.65 rows=9)
    -> Inner hash join (no condition)  (cost=3.65 rows=9) # 這裡結果為讀取9行資料，跟上面算出來的資料一致
        -> Table scan on t3  (cost=0.12 rows=5)
        -> Hash
            -> Index scan on t1 using idx2  (cost=1.40 rows=4)
 |
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

二、best_access_path程式碼解釋

condition_fanout_filte的計算在 best_access_path函式實現，用來預估不同 join 連線時候的 join 表的讀取行數和可能的 cost。

void Optimize_table_order::best_access_path(JOIN_TAB *tab,
                                            const table_map remaining_tables,
                                            const uint idx, bool disable_jbuf,
                                            const double prefix_rowcount,
                                            POSITION *pos) {
# 如果根據前面的結果keyuse_array陣列有值的話，那麼根據find_best_ref()函式先找出最優索引，按照索引的方式計算cost
  if (tab->keyuse() != nullptr &&
      (table->file->ha_table_flags() & HA_NO_INDEX_ACCESS) == 0)
    best_ref =
        find_best_ref(tab, remaining_tables, idx, prefix_rowcount,
                      &found_condition, &ref_depend_map, &used_key_parts);
  # 最主要計算下面3個值
  pos->filter_effect = filter_effect = std::  (1.0, tab->found_records * calculate_condition_filter() / rows_after_filtering);
  pos->rows_fetched = rows_fetched = rows_after_filtering;
  pos->read_cost = scan_read_cost = calculate_scan_cost();    
}

下面是程式碼裡面涉及的計算公式，這裡是 keyuse_array 陣列為空的情況，也就是掃描方式 "access_type" 非 "eq_ref" 和 "ref" 的情況，或者說是沒有找到最優索引的情況。keyuse_array 陣列有值的情況，在函式find_best_ref()計算，結果有可能也走下面的計算方式，詳細在後面的章節詳細介紹。

關鍵引數	解釋	值
rows_fetched	總共需要讀取多少行	rows_after_filtering 見下表一
filter_effect	條件過濾百分比係數	std::min(1.0, tab->found_records * calculate_condition_filter() / rows_after_filtering) 見下表一和二
read_cost	讀的開銷	calculate_scan_cost() 見下表四
prefix_rowcount	join左表的行數，意味著多少行會被join到右表對於第一張表來說prefix_rowcount=1	第一張表：prefix_rowcount = rows_fetched * filter_effect 非第一張表：prefix_rowcount = 上一張表的prefix_rowcount * rows_fetched * filter_effect
prefix_cost	join左表的cost，row_evaluate_cost()計算公式=0.1 * 行數，0.1是讀一行的開銷	第一張表：read_cost + cm->row_evaluate_cost(prefix_rowcount) 非第一張表：上一張表的prefix_cost + read_cost + cm->row_evaluate_cost(上一張表的prefix_rowcount*rows_fetched)

表一，rows_after_filtering 計算方式

場景	解釋	值
OPTIMIZER_SWITCH_COND_FANOUT_FILTER模式(預設ON)	條件過濾模式開啟	tab->found_records * calculate_condition_filter() 見下表二
table->quick_condition_rows != tab->found_records	透過別的方法獲取的表滿足的行數	table->quick_condition_rows
keyuse_array有值(參考前面<<最佳化器02>>)	REF掃描模式	tab->found_records * 0.75
以上都不符合。預設情況	全錶行數	tab->found_records

表二，calculate_condition_filter() 計算方式

場景	值	說明
滿足條件的表的行數為0	filter = COND_FILTER_ALLPASS 見表三
使用索引	filter = filter * std::min(table->quick_rows[keyno] / tab->records() , 1.0)
帶有條件並且條件裡涉及不含有索引的列	filter = filter * Item_cond->get_filtering_effect() 見表三
合併結果	filter = max(filter, 1.0 / tab->records())
(filter * fanout) < 0.05	filter = 0.05 / fanout	fanout是滿足條件的表的行數

注：filter是條件過濾百分比，這個值區間在[0,1]，這個值越小代表過濾效果越好

表三，get_filtering_effect() 算出來的過濾係數

Item的係數	解釋	係數	說明
COND_FILTER_ALLPASS	Always true	1.0f	代表全部資料都符合，全部都要掃描
COND_FILTER_EQUALITY	col1 = col2	0.1f	代表預估10%資料符合條件
COND_FILTER_INEQUALITY	col1 > col2	0.3333f	代表預估1/3資料符合條件
COND_FILTER_BETWEEN	col1 BETWEEN a AND b	0.1111f	代表預估1/9資料符合條件

表四，calculate_scan_cost() 計算方式

場景	值	說明
如果是範圍掃描	prefix_rowcount * (tab->range_scan()->cost + cost_model->row_evaluate_cost(tab->found_records - *rows_after_filtering))
非join buffer模式	prefix_rowcount * (single_scan_read_cost + cost_model->row_evaluate_cost(tab->records() - *rows_after_filtering)) single_scan_read_cost計算如下： force_index模式: table->file->read_cost(tab->ref().key, 1,tab->records() 見表五) 非force_index模式: table->file->table_scan_cost() 見表五
join buffer模式(預設ON)	buffer_count * (single_scan_read_cost+ cost_model->row_evaluate_cost(tab->records() - rows_after_filtering)) buffer_count計算如下： single_scan_read_cost計算如下： 1.0 + ((double)cache_record_length(join, idx) prefix_rowcount / thd->variables.join_buff_size force_index模式: table->file->read_cost(tab->ref().key, 1,tab->records()) 見表五非force_index模式: table->file->table_scan_cost() 見表五	預設這個模式開啟

表五引擎計算相關

場景	值	說明
table->file->read_cost(index, ranges, rows)	read_time(index, ranges, rows) *table->cost_model()->page_read_cost(1.0)	index:index序號，ranges:range範圍，rows:錶行數
table->file->table_scan_cost()	(stats.data_file_length / IO_SIZE + 2) * table->cost_model()->page_read_cost(1.0)	IO_SIZE=4096

三、實際例子說明

接下來看一個例子來說明上面的程式碼。這裡的例子不涉及keyuse_array陣列有值的情況。

greatsql> SELECT * FROM t1 join t3 ON t1.c1=t3.ccc1 or t3.ccc1<3;
+----+------+---------------------+------+------+
| c1 | c2   | date1               | ccc1 | ccc2 |
+----+------+---------------------+------+------+
|  1 |   10 | 2021-03-25 16:44:00 |    1 | aa1  |
|  5 |    5 | 2024-03-25 16:44:00 |    1 | aa1  |
|  3 |    4 | 2023-03-27 16:44:00 |    1 | aa1  |
|  2 |    1 | 2022-03-26 16:44:00 |    1 | aa1  |
|  1 |   10 | 2021-03-25 16:44:00 |    2 | bb1  |
|  5 |    5 | 2024-03-25 16:44:00 |    2 | bb1  |
|  3 |    4 | 2023-03-27 16:44:00 |    2 | bb1  |
|  2 |    1 | 2022-03-26 16:44:00 |    2 | bb1  |
|  3 |    4 | 2023-03-27 16:44:00 |    3 | cc1  |
+----+------+---------------------+------+------+

greatsql> EXPLAIN FORMAT=TREE SELECT * FROM t1 join t3 ON t1.c1=t3.ccc1 or t3.ccc1<3;
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| EXPLAIN                                                                                                                                                                                                                                                       |
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| -> Filter: ((t3.ccc1 = t1.c1) or (t3.ccc1 < 3))  (cost=3.65 rows=9)
    -> Inner hash join (no condition)  (cost=3.65 rows=9)
        -> Table scan on t3  (cost=0.12 rows=5)
        -> Hash
            -> Index scan on t1 using idx2  (cost=1.40 rows=4)
 |
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

greatsql> SELECT * FROM INFORMATION_SCHEMA.OPTIMIZER_TRACE;
            "considered_execution_plans": [
              {
                "plan_prefix": [
                ],
                "table": "`t1`",
                "best_access_path": {
                  "considered_access_paths": [
                    {
                      "rows_to_scan": 4,
                      "filtering_effect": [
                      ],
                      "final_filtering_effect": 1,
                      "access_type": "scan",
                      "resulting_rows": 4,# 這個值就是rows_after_filtering
                      "cost": 0.65, # 計算公式=read_cost(0.25)+cost_model->row_evaluate_cost(1 * rows_after_filtering)
                      "chosen": true
                    }
                  ]
                },
                "condition_filtering_pct": 100, # 這個值就是filter_effect * 100
                "rows_for_plan": 4, # 這個值就是prefix_rowcount=rows_to_scan * filter_effect / 100
                "cost_for_plan": 0.65, # 這個值就是prefix_cost=read_cost(0.25) + cm->row_evaluate_cost(prefix_rowcount=4)
                "rest_of_plan": [
                  {
                    "plan_prefix": [ # 當前左連線表為t1
                      "`t1`"
                    ],
                    "table": "`t3`", # 右連線表為t3
                    "best_access_path": {
                      "considered_access_paths": [
                        {
                          "rows_to_scan": 5,
                          "filtering_effect": [
                          ],
                          "final_filtering_effect": 1,
                          "access_type": "scan",
                          "using_join_cache": true,
                          "buffers_needed": 1,
                          "resulting_rows": 5, # 這個值就是rows_after_filtering
                          "cost": 2.25005, # 計算公式read_cost(0.25)+cost_model->row_evaluate_cost(prefix_rowcount * rows_after_filtering=4 * 5)
                          "chosen": true
                        }
                      ]
                    },
                    "condition_filtering_pct": 46.664, # 這個值計算過程見下面<<附錄：t3的filter_effect值計算>> ※bug？用explain看到的和直接敲命令看到的不一樣，前者46.664後者為100，而"rows_for_plan"會變為沒有過濾的20
                    "rows_for_plan": 9.3328, # 這個值就是prefix_rowcount=4(上一張表的prefix_rowcount) * 5 * 46.664 / 100
                    "cost_for_plan": 2.90005, # 這個值就是prefix_cost=0.65(上一張表的prefix_cost)+read_cost(0.25) + cm->row_evaluate_cost(20=4*5)
                    "chosen": true
                  }
                ]
              },
              {
                "plan_prefix": [
                ],
                "table": "`t3`",
                "best_access_path": {
                  "considered_access_paths": [
                    {
                      "rows_to_scan": 5,
                      "filtering_effect": [
                      ],
                      "final_filtering_effect": 1,
                      "access_type": "scan",
                      "resulting_rows": 5,
                      "cost": 0.75, # 計算公式read_cost(0.25)+5行*0.1
                      "chosen": true
                    }
                  ]
                },
                "condition_filtering_pct": 100,
                "rows_for_plan": 5, # 這個值就是prefix_rowcount
                "cost_for_plan": 0.75, # 這個值就是prefix_cost
                "rest_of_plan": [
                  {
                    "plan_prefix": [ # 當前左連線表為t3
                      "`t3`"
                    ],
                    "table": "`t1`", # 右連線表為t1
                    "best_access_path": {
                      "considered_access_paths": [
                        {
                          "rows_to_scan": 4,
                          "filtering_effect": [
                          ],
                          "final_filtering_effect": 1,
                          "access_type": "scan",
                          "using_join_cache": true,
                          "buffers_needed": 1,
                          "resulting_rows": 4,
                          "cost": 3.00776, # 計算公式read_cost(1.007)+cost_model->row_evaluate_cost(prefix_rowcount * rows_after_filtering=5 * 4)
                          "chosen": true
                        }
                      ]
                    },
                    "condition_filtering_pct": 100,
                    "rows_for_plan": 20, # 這個值就是prefix_rowcount=5(上一張表的prefix_rowcount) * 4 * 100 / 100
                    "cost_for_plan": 3.75776, # 這個值就是prefix_cost=0.75(上一張表的prefix_cost)+read_cost(1.007) + cm->row_evaluate_cost(20=5*4)
                    "pruned_by_cost": true # 因為這裡算出來的3.75776 > 2.90005，因此被裁剪掉了
                  }
                ]
              }
            ]
          },
          {
            "attaching_conditions_to_tables": { # 新增一些附加條件
              "original_condition": "((`t3`.`ccc1` = `t1`.`c1`) or (`t3`.`ccc1` < 3))",
              "attached_conditions_computation": [
              ],
              "attached_conditions_summary": [ # t1作為驅動表，執行全表掃描，因此不需要任何條件
                {
                  "table": "`t1`",
                  "attached": null
                },
                {
                  "table": "`t3`",
                  "attached": "((`t3`.`ccc1` = `t1`.`c1`) or (`t3`.`ccc1` < 3))" #  t3作為連線表，按照條件過濾
                }
              ]
            }
          },
          {
            "finalizing_table_conditions": [
              {
                "table": "`t3`",
                "original_table_condition": "((`t3`.`ccc1` = `t1`.`c1`) or (`t3`.`ccc1` < 3))",
                "final_table_condition   ": "((`t3`.`ccc1` = `t1`.`c1`) or (`t3`.`ccc1` < 3))"
              }
            ]
          },
          {
            "refine_plan": [
              {
                "table": "`t1`"
              },
              {
                "table": "`t3`"
              }
            ]
          }
        ]
      }
    },
    {
      "join_explain": {
        "select#": 1,
        "steps": [
        ]
      }
    }
  ]
} |           

# 附錄：t3的filter_effect值計算
# 這個函式計算t1.c1=t3.ccc1和t3.ccc1<3的filter_effect值，計算過程見下面
float Item_cond_or::get_filtering_effect(THD *thd, table_map filter_for_table,
                                         table_map read_tables,
                                         const MY_BITMAP *fields_to_ignore,
                                         double rows_in_table) {
  while ((item = it++)) {
    const float cur_filter = item->get_filtering_effect(
        thd, filter_for_table, read_tables, fields_to_ignore, rows_in_table);
    # 第一次：計算t1.c1=t3.ccc1，返回 1/5=20%，其中5是總行數。
    # 第二次：計算t3.ccc1<3，返回COND_FILTER_INEQUALITY=0.333
    # 最後的filter=0.2+0.333 - (0.2*0.333)=0.4664
    filter = filter + cur_filter - (filter * cur_filter);
  }
  return filter;
}

下面把OPTIMIZER_SWITCH_COND_FANOUT_FILTER模式關閉看看執行效果什麼變化

greatsql> explain SELECT /*+ set_var(optimizer_switch='condition_fanout_filter=off') */ * FROM t1 join t3 ON t1.c1=t3.ccc1 or t3.ccc1<3;
+----+-------------+-------+------------+-------+---------------+------+---------+------+------+----------+--------------------------------------------+
| id | select_type | table | partitions | type  | possible_keys | key  | key_len | ref  | rows | filtered | Extra                                      |
+----+-------------+-------+------------+-------+---------------+------+---------+------+------+----------+--------------------------------------------+
|  1 | SIMPLE      | t1    | NULL       | index | PRIMARY       | idx2 | 11      | NULL |    4 |   100.00 | Using index                                |
|  1 | SIMPLE      | t3    | NULL       | ALL   | idx3_1        | NULL | NULL    | NULL |    5 |   100.00 | Using where; Using join buffer (hash join) |
                                                                                            # 看到這裡選擇了不做條件過濾，全表資料連線
+----+-------------+-------+------------+-------+---------------+------+---------+------+------+----------+--------------------------------------------+

greatsql> EXPLAIN FORMAT=TREE SELECT /*+ set_var(optimizer_switch='condition_fanout_filter=off') */ * FROM t1 join t3 ON t1.c1=t3.ccc1 or t3.ccc1<3;
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| EXPLAIN                                                                                                                                                                                                                                                         |
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| -> Filter: ((t3.ccc1 = t1.c1) or (t3.ccc1 < 3))  (cost=3.65 rows=20)# 因為表裡資料少，跟剛才比cost沒有變化，但是要是資料大的話就很明顯
    -> Inner hash join (no condition)  (cost=3.65 rows=20) # 這裡變為了20行，意味著讀取全部行資料
        -> Table scan on t3  (cost=0.19 rows=5)
        -> Hash
            -> Index scan on t1 using idx2  (cost=1.40 rows=4)
 |
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

          {
            "considered_execution_plans": [
              {
                "plan_prefix": [
                ],
                "table": "`t1`",
                "best_access_path": {
                  "considered_access_paths": [
                    {
                      "rows_to_scan": 4,
                      "access_type": "scan",
                      "resulting_rows": 4,
                      "cost": 1.4,
                      "chosen": true
                    }
                  ]
                },
                "condition_filtering_pct": 100,
                "rows_for_plan": 4,
                "cost_for_plan": 1.4,
                "rest_of_plan": [
                  {
                    "plan_prefix": [ # 當前左連線表為t1
                      "`t1`"
                    ],
                    "table": "`t3`",
                    "best_access_path": {
                      "considered_access_paths": [
                        {
                          "rows_to_scan": 5,
                          "access_type": "scan",
                          "using_join_cache": true,
                          "buffers_needed": 1,
                          "resulting_rows": 5,
                          "cost": 2.25005,
                          "chosen": true
                        }
                      ]
                    },
                    "condition_filtering_pct": 100, # 這裡的過濾係數變為100，說明不進行條件過濾，讀取全部行
                    "rows_for_plan": 20, # 由上面的9行變為全部資料連線，20行資料
                    "cost_for_plan": 3.65005, 
                    "chosen": true
                  }
                ]
              },
              {
                "plan_prefix": [
                ],
                "table": "`t3`",
                "best_access_path": {
                  "considered_access_paths": [
                    {
                      "rows_to_scan": 5,
                      "access_type": "scan",
                      "resulting_rows": 5,
                      "cost": 0.75,
                      "chosen": true
                    }
                  ]
                },
                "condition_filtering_pct": 100,
                "rows_for_plan": 5,
                "cost_for_plan": 0.75,
                "rest_of_plan": [
                  {
                    "plan_prefix": [
                      "`t3`"
                    ],
                    "table": "`t1`",
                    "best_access_path": {
                      "considered_access_paths": [
                        {
                          "rows_to_scan": 4,
                          "access_type": "scan",
                          "using_join_cache": true,
                          "buffers_needed": 1,
                          "resulting_rows": 4,
                          "cost": 3.00776,
                          "chosen": true
                        }
                      ]
                    },
                    "condition_filtering_pct": 100,
                    "rows_for_plan": 20,
                    "cost_for_plan": 3.75776,
                    "pruned_by_cost": true
                  }
                ]
              }
            ]
          },
          {
            "attaching_conditions_to_tables": {
              "original_condition": "((`t3`.`ccc1` = `t1`.`c1`) or (`t3`.`ccc1` < 3))",
              "attached_conditions_computation": [
              ],
              "attached_conditions_summary": [
                {
                  "table": "`t1`",
                  "attached": null
                },
                {
                  "table": "`t3`",
                  "attached": "((`t3`.`ccc1` = `t1`.`c1`) or (`t3`.`ccc1` < 3))"
                }
              ]
            }
          },
          {
            "finalizing_table_conditions": [
              {
                "table": "`t3`",
                "original_table_condition": "((`t3`.`ccc1` = `t1`.`c1`) or (`t3`.`ccc1` < 3))",
                "final_table_condition   ": "((`t3`.`ccc1` = `t1`.`c1`) or (`t3`.`ccc1` < 3))"
              }
            ]
          },
          {
            "refine_plan": [
              {
                "table": "`t1`"
              },
              {
                "table": "`t3`"
              }
            ]
          }
        ]
      }
    },
    {
      "join_explain": {
        "select#": 1,
        "steps": [
        ]
      }
    }
  ]
}

現在把之前03的內容拿過來回顧一下當時的結果，看看為何是當時的結果。從下面結果可以看出，當存在keyuse的時候，最佳化器最後走哪條索引是透過find_best_ref()函式來決定的。

greatsql> EXPLAIN SELECT * FROM t1 join t2 ON t1.c1=t2.cc1 and t1.c1<5;
+----+-------------+-------+------------+--------+---------------+---------+---------+-----------+------+----------+-------------+
| id | select_type | table | partitions | type   | possible_keys | key     | key_len | ref       | rows | filtered | Extra       |
+----+-------------+-------+------------+--------+---------------+---------+---------+-----------+------+----------+-------------+
|  1 | SIMPLE      | t1    | NULL       | range  | PRIMARY       | PRIMARY | 4       | NULL      |    3 |   100.00 | Using where |
|  1 | SIMPLE      | t2    | NULL       | eq_ref | PRIMARY       | PRIMARY | 4       | db1.t1.c1 |    1 |   100.00 | NULL        |
                                   # t2這個結果看下面的解釋
+----+-------------+-------+------------+--------+---------------+---------+---------+-----------+------+----------+-------------+

          {\
            "considered_execution_plans": [\
              {\
                "plan_prefix": [\
                ],\
                "table": "`t1`",\
                "best_access_path": {\
                  "considered_access_paths": [\
                    {\
                      "access_type": "ref",\ # t1表不選擇"ref"方式，因為cost比"range"方式大
                      "index": "PRIMARY",\
                      "usable": false,\
                      "chosen": false\
                    },\
                    {\
                      "rows_to_scan": 3,\
                      "filtering_effect": [\
                      ],\
                      "final_filtering_effect": 1,\
                      "access_type": "range",\ # t1表最後選擇了"range"方式進行掃描
                      "range_details": {\
                        "used_index": "PRIMARY"\
                      },\
                      "resulting_rows": 3,\
                      "cost": 0.860732,\
                      "chosen": true\
                    }\
                  ]\
                },\
                "condition_filtering_pct": 100,\
                "rows_for_plan": 3,\
                "cost_for_plan": 0.860732,\
                "rest_of_plan": [\
                "rest_of_plan": [\
                  {\
                    "plan_prefix": [\
                      "`t1`"\
                    ],\
                    "table": "`t2`",\
                    "best_access_path": {\
                      "considered_access_paths": [\
                        {\
                          "access_type": "eq_ref",\ # 在t1 join t2的連線方式裡，判斷用"eq_ref"方式掃描cost更小。這裡透過find_best_ref()獲取
                          "index": "PRIMARY",\
                          "rows": 1,\
                          "cost": 1.05,\
                          "chosen": true,\
                          "cause": "clustered_pk_chosen_by_heuristics"\
                        },\
                        {\
                          "access_type": "range",\
                          "range_details": {\
                            "used_index": "PRIMARY"\
                          },\
                          "chosen": false,\
                          "cause": "heuristic_index_cheaper"\
                        }\
                      ]\
                    },\
                    "condition_filtering_pct": 100,\
                    "rows_for_plan": 3,\
                    "cost_for_plan": 1.91073,\
                    "chosen": true\
                  }\
          {\
            "refine_plan": [\ # 根據最後比較結果，按照t1,t2方式連線cost最小，因此選擇這種順序進行join
              {\
                "table": "`t1`"\
              },\
              {\
                "table": "`t2`"\
              }\
            ]\
          }\

四、總結

從上面最佳化器最早的步驟我們認識了條件過濾condition_fanout_filter的定義以及使用方法，也學會了計算join時候的cost，然後根據cost選擇最優的連線順序。回到開頭提到的這個功能在一些場景有可能會帶來選擇錯誤的問題，這個下一期展開細講。用這種過濾方法算出來的結果也不精確，GreatSQL 用了另一個功能直方圖來做資料統計，讓結果更加精確，這個直方圖的分析放後面章節講。

Enjoy GreatSQL 😃

關於 GreatSQL

GreatSQL是適用於金融級應用的國內自主開源資料庫，具備高效能、高可靠、高易用性、高安全等多個核心特性，可以作為MySQL或Percona Server的可選替換，用於線上生產環境，且完全免費併相容MySQL或Percona Server。

相關連結： GreatSQL社群 Gitee GitHub Bilibili

GreatSQL社群：

社群部落格有獎徵稿詳情：https://greatsql.cn/thread-100-1-1.html

技術交流群：

微信：掃碼新增GreatSQL社群助手微信好友，傳送驗證資訊加群。

條件過濾檢索
2024-11-07
最佳化數倉業務檢視：過濾條件傳遞
2023-02-08
SpamSieve for mac(郵件過濾器)
2020-11-03
Mac過濾器
java8 多條件的filter過濾
2024-05-22
JavaFilter
MySQL全面瓦解7：查詢的過濾條件
2020-11-09
MySql
《MySQL 入門教程》第 08 篇過濾條件
2022-01-30
MySql
[20210420]19c奇怪的過濾條件.txt
2021-04-20
Stream中filter過濾條件問題記錄
2020-12-04
Filter
34. 過濾條件、多表查詢、子查詢
2024-10-28
【TcaplusDB知識庫】條件過濾說明與更新
2021-11-23
【GreatSQL最佳化器-01】const_table
2024-11-08
SQL
mysql多條件過濾查詢之mysq高階查詢
2021-12-30
MySql
帶你讀 MySQL 原始碼：where 條件怎麼過濾記錄？
2023-04-25
MySql原始碼
【GreatSQL最佳化器-02】索引和Sargable謂詞
2024-11-15
SQL索引
【GreatSQL最佳化器-03】查詢開銷估算
2024-11-20
SQL
過濾器
2024-03-07
過濾器
SpamSieve for mac(郵件過濾工具)
2021-10-15
Mac
4、過濾器的使用及自定義過濾器
2024-03-21
過濾器
點雲濾波器與過濾器
2022-09-13
過濾器
CAN過濾器
2018-06-23
過濾器
Filter過濾器
2020-10-26
Filter過濾器
vue 過濾器
2020-11-02
Vue過濾器
NetCore過濾器
2019-04-24
NetCore過濾器
代理過濾器
2019-04-22
過濾器
Vue過濾器
2024-09-03
Vue過濾器
DataV過濾器
2024-05-24
過濾器
hbase過濾器
2021-12-06
過濾器
Xor過濾器：比布隆Bloom過濾器更快，更小
2019-12-20
過濾器OOM
JS中的陣列過濾，從簡單篩選到多條件篩選
2018-10-15
JS陣列
SpringBoot整合Jpa對資料進行排序、分頁、條件查詢和過濾
2023-05-12
Spring Boot排序
Spring Cloud Gateway ---GatewayFilter過濾器、過濾器工廠(入門)
2019-03-12
SpringCloudGatewayFilter過濾器
PHP 過濾器（Filter）
2018-11-25
PHP過濾器Filter
vue---過濾器
2018-09-07
Vue過濾器
Vue中過濾器
2020-03-16
Vue過濾器
vue filters過濾器
2018-03-07
VueFilter過濾器
布隆過濾器
2021-12-31
過濾器
SpringSecurity過濾器原理
2021-12-02
SpringGse過濾器
GreatSQL最佳化技巧：半連線(semijoin)最佳化
2024-04-17
SQL

【GreatSQL最佳化器-05】條件過濾condition_fanout_filter

【GreatSQL最佳化器-05】條件過濾condition_fanout_filter

一、condition_fanout_filter介紹

二、best_access_path程式碼解釋

三、實際例子說明

四、總結

關於 GreatSQL

GreatSQL社群：

技術交流群：

相關文章