【GreatSQL最佳化器-05】條件過濾condition_fanout_filter

GreatSQL發表於2024-12-06

【GreatSQL最佳化器-05】條件過濾condition_fanout_filter

一、condition_fanout_filter介紹

GreatSQL 的最佳化器對於 join 的表需要根據行數和 cost 來確定最後哪張表先執行哪張表後執行,這裡面就涉及到預估滿足條件的表資料,condition_fanout_filter 會根據一系列方法計算出一個資料過濾百分比,這個比百分比就是 filtered 係數,這個值區間在[0,1],值越小代表過濾效果越好。用這個係數乘以總的行數就可以得出最後需要掃描的錶行數的數量,可以大幅節省開銷和執行時間。

這個功能是由 OPTIMIZER_SWITCH_COND_FANOUT_FILTER這個OPTIMIZER_SWITCH 來控制的,預設是開啟的。因此一般情況下不需要特意去關閉,但是如果遇到執行特別慢的一些情況可以考慮關閉。

下面用一個簡單的例子來說明 condition_fanout_filter 是什麼:

CREATE TABLE t1 (c1 INT PRIMARY KEY, c2 INT,date1 DATETIME);
INSERT INTO t1 VALUES (1,10,'2021-03-25 16:44:00.123456'),(2,1,'2022-03-26 16:44:00.123456'),(3,4,'2023-03-27 16:44:00.123456'),(5,5,'2024-03-25 16:44:00.123456');
CREATE TABLE t2 (cc1 INT PRIMARY KEY, cc2 INT);
INSERT INTO t2 VALUES (1,3),(2,1),(3,2),(4,3),(5,15);
# 為了檢視過濾係數,需要建立一張沒有主鍵的表用來做過濾估計。
CREATE TABLE t3 (ccc1 INT, ccc2 varchar(100));
INSERT INTO t3 VALUES (1,'aa1'),(2,'bb1'),(3,'cc1'),(4,'dd1'),(null,'ee');
CREATE INDEX idx1 ON t1(c2);
CREATE INDEX idx2 ON t1(c2,date1);
CREATE INDEX idx2_1 ON t2(cc2);
CREATE INDEX idx3_1 ON t3(ccc1);

看到下面的 t3 的過濾百分比46.66%,意味著兩張表 join 一共20行,因為 t3 的過濾百分比為 46.66%,因此最後只需要讀取 20*46.66%=9 行資料。

注意,這裡沒有建立直方圖,因此結果不包含直方圖過濾的因素,關於直方圖後面會專門開一章講。

greatsql> EXPLAIN SELECT * FROM t1 JOIN t3 ON t1.c1=t3.ccc1 OR t3.ccc1 <3;
+----+-------------+-------+------------+-------+---------------+------+---------+------+------+----------+--------------------------------------------+
| id | select_type | table | partitions | type  | possible_keys | key  | key_len | ref  | rows | filtered | Extra                                      |
+----+-------------+-------+------------+-------+---------------+------+---------+------+------+----------+--------------------------------------------+
|  1 | SIMPLE      | t1    | NULL       | index | PRIMARY       | idx2 | 11      | NULL |    4 |   100.00 | Using index                                |
|  1 | SIMPLE      | t3    | NULL       | ALL   | idx3_1        | NULL | NULL    | NULL |    5 |    46.66 | Using where; Using join buffer (hash join) |
# 這裡顯示的值就是 t3 的過濾資料百分比
+----+-------------+-------+------------+-------+---------------+------+---------+------+------+----------+--------------------------------------------+

greatsql> EXPLAIN FORMAT=TREE SELECT * FROM t1 JOIN t3 ON t1.c1=t3.ccc1 OR t3.ccc1<3;
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| EXPLAIN                                                                                                                                                                                                                                                       |
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| -> Filter: ((t3.ccc1 = t1.c1) or (t3.ccc1 < 3))  (cost=3.65 rows=9)
    -> Inner hash join (no condition)  (cost=3.65 rows=9) # 這裡結果為讀取9行資料,跟上面算出來的資料一致
        -> Table scan on t3  (cost=0.12 rows=5)
        -> Hash
            -> Index scan on t1 using idx2  (cost=1.40 rows=4)
 |
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

二、best_access_path程式碼解釋

condition_fanout_filte的計算在 best_access_path函式實現,用來預估不同 join 連線時候的 join 表的讀取行數和可能的 cost。

void Optimize_table_order::best_access_path(JOIN_TAB *tab,
                                            const table_map remaining_tables,
                                            const uint idx, bool disable_jbuf,
                                            const double prefix_rowcount,
                                            POSITION *pos) {
# 如果根據前面的結果keyuse_array陣列有值的話,那麼根據find_best_ref()函式先找出最優索引,按照索引的方式計算cost
  if (tab->keyuse() != nullptr &&
      (table->file->ha_table_flags() & HA_NO_INDEX_ACCESS) == 0)
    best_ref =
        find_best_ref(tab, remaining_tables, idx, prefix_rowcount,
                      &found_condition, &ref_depend_map, &used_key_parts);
  # 最主要計算下面3個值
  pos->filter_effect = filter_effect = std::  (1.0, tab->found_records * calculate_condition_filter() / rows_after_filtering);
  pos->rows_fetched = rows_fetched = rows_after_filtering;
  pos->read_cost = scan_read_cost = calculate_scan_cost();    
}

下面是程式碼裡面涉及的計算公式,這裡是 keyuse_array 陣列為空的情況,也就是掃描方式 "access_type" 非 "eq_ref" 和 "ref" 的情況,或者說是沒有找到最優索引的情況。keyuse_array 陣列有值的情況,在函式find_best_ref()計算,結果有可能也走下面的計算方式,詳細在後面的章節詳細介紹。

關鍵引數 解釋
rows_fetched 總共需要讀取多少行 rows_after_filtering 見下表一
filter_effect 條件過濾百分比係數 std::min(1.0, tab->found_records * calculate_condition_filter() / rows_after_filtering) 見下表一和二
read_cost 讀的開銷 calculate_scan_cost() 見下表四
prefix_rowcount join左表的行數,意味著多少行會被join到右表 對於第一張表來說prefix_rowcount=1 第一張表:prefix_rowcount = rows_fetched * filter_effect 非第一張表:prefix_rowcount = 上一張表的prefix_rowcount * rows_fetched * filter_effect
prefix_cost join左表的cost,row_evaluate_cost()計算公式=0.1 * 行數,0.1是讀一行的開銷 第一張表:read_cost + cm->row_evaluate_cost(prefix_rowcount) 非第一張表:上一張表的prefix_cost + read_cost + cm->row_evaluate_cost(上一張表的prefix_rowcount*rows_fetched)

表一,rows_after_filtering 計算方式

場景 解釋
OPTIMIZER_SWITCH_COND_FANOUT_FILTER模式(預設ON) 條件過濾模式開啟 tab->found_records * calculate_condition_filter() 見下表二
table->quick_condition_rows != tab->found_records 透過別的方法獲取的表滿足的行數 table->quick_condition_rows
keyuse_array有值(參考前面<<最佳化器02>>) REF掃描模式 tab->found_records * 0.75
以上都不符合。預設情況 全錶行數 tab->found_records

表二,calculate_condition_filter() 計算方式

場景 說明
滿足條件的表的行數為0 filter = COND_FILTER_ALLPASS 見表三
使用索引 filter = filter * std::min(table->quick_rows[keyno] / tab->records() , 1.0)
帶有條件並且條件裡涉及不含有索引的列 filter = filter * Item_cond->get_filtering_effect() 見表三
合併結果 filter = max(filter, 1.0 / tab->records())
(filter * fanout) < 0.05 filter = 0.05 / fanout fanout是滿足條件的表的行數

注:filter是條件過濾百分比,這個值區間在[0,1],這個值越小代表過濾效果越好

表三,get_filtering_effect() 算出來的過濾係數

Item的係數 解釋 係數 說明
COND_FILTER_ALLPASS Always true 1.0f 代表全部資料都符合,全部都要掃描
COND_FILTER_EQUALITY col1 = col2 0.1f 代表預估10%資料符合條件
COND_FILTER_INEQUALITY col1 > col2 0.3333f 代表預估1/3資料符合條件
COND_FILTER_BETWEEN col1 BETWEEN a AND b 0.1111f 代表預估1/9資料符合條件

表四,calculate_scan_cost() 計算方式

場景 說明
如果是範圍掃描 prefix_rowcount * (tab->range_scan()->cost + cost_model->row_evaluate_cost(tab->found_records - *rows_after_filtering))
非join buffer模式 prefix_rowcount * (single_scan_read_cost + cost_model->row_evaluate_cost(tab->records() - *rows_after_filtering)) single_scan_read_cost計算如下: force_index模式: table->file->read_cost(tab->ref().key, 1,tab->records() 見表五) 非force_index模式: table->file->table_scan_cost() 見表五
join buffer模式(預設ON) buffer_count * (single_scan_read_cost+ cost_model->row_evaluate_cost(tab->records() - *rows_after_filtering)) buffer_count計算如下: single_scan_read_cost計算如下: 1.0 + ((double)cache_record_length(join, idx) * prefix_rowcount / thd->variables.join_buff_size force_index模式: table->file->read_cost(tab->ref().key, 1,tab->records()) 見表五 非force_index模式: table->file->table_scan_cost() 見表五 預設這個模式開啟

表五 引擎計算相關

場景 說明
table->file->read_cost(index, ranges, rows) read_time(index, ranges, rows) *table->cost_model()->page_read_cost(1.0) index:index序號,ranges:range範圍,rows:錶行數
table->file->table_scan_cost() (stats.data_file_length / IO_SIZE + 2) * table->cost_model()->page_read_cost(1.0) IO_SIZE=4096

三、實際例子說明

接下來看一個例子來說明上面的程式碼。這裡的例子不涉及keyuse_array陣列有值的情況。

greatsql> SELECT * FROM t1 join t3 ON t1.c1=t3.ccc1 or t3.ccc1<3;
+----+------+---------------------+------+------+
| c1 | c2   | date1               | ccc1 | ccc2 |
+----+------+---------------------+------+------+
|  1 |   10 | 2021-03-25 16:44:00 |    1 | aa1  |
|  5 |    5 | 2024-03-25 16:44:00 |    1 | aa1  |
|  3 |    4 | 2023-03-27 16:44:00 |    1 | aa1  |
|  2 |    1 | 2022-03-26 16:44:00 |    1 | aa1  |
|  1 |   10 | 2021-03-25 16:44:00 |    2 | bb1  |
|  5 |    5 | 2024-03-25 16:44:00 |    2 | bb1  |
|  3 |    4 | 2023-03-27 16:44:00 |    2 | bb1  |
|  2 |    1 | 2022-03-26 16:44:00 |    2 | bb1  |
|  3 |    4 | 2023-03-27 16:44:00 |    3 | cc1  |
+----+------+---------------------+------+------+

greatsql> EXPLAIN FORMAT=TREE SELECT * FROM t1 join t3 ON t1.c1=t3.ccc1 or t3.ccc1<3;
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| EXPLAIN                                                                                                                                                                                                                                                       |
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| -> Filter: ((t3.ccc1 = t1.c1) or (t3.ccc1 < 3))  (cost=3.65 rows=9)
    -> Inner hash join (no condition)  (cost=3.65 rows=9)
        -> Table scan on t3  (cost=0.12 rows=5)
        -> Hash
            -> Index scan on t1 using idx2  (cost=1.40 rows=4)
 |
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

greatsql> SELECT * FROM INFORMATION_SCHEMA.OPTIMIZER_TRACE;
            "considered_execution_plans": [
              {
                "plan_prefix": [
                ],
                "table": "`t1`",
                "best_access_path": {
                  "considered_access_paths": [
                    {
                      "rows_to_scan": 4,
                      "filtering_effect": [
                      ],
                      "final_filtering_effect": 1,
                      "access_type": "scan",
                      "resulting_rows": 4,# 這個值就是rows_after_filtering
                      "cost": 0.65, # 計算公式=read_cost(0.25)+cost_model->row_evaluate_cost(1 * rows_after_filtering)
                      "chosen": true
                    }
                  ]
                },
                "condition_filtering_pct": 100, # 這個值就是filter_effect * 100
                "rows_for_plan": 4, # 這個值就是prefix_rowcount=rows_to_scan * filter_effect / 100
                "cost_for_plan": 0.65, # 這個值就是prefix_cost=read_cost(0.25) + cm->row_evaluate_cost(prefix_rowcount=4)
                "rest_of_plan": [
                  {
                    "plan_prefix": [ # 當前左連線表為t1
                      "`t1`"
                    ],
                    "table": "`t3`", # 右連線表為t3
                    "best_access_path": {
                      "considered_access_paths": [
                        {
                          "rows_to_scan": 5,
                          "filtering_effect": [
                          ],
                          "final_filtering_effect": 1,
                          "access_type": "scan",
                          "using_join_cache": true,
                          "buffers_needed": 1,
                          "resulting_rows": 5, # 這個值就是rows_after_filtering
                          "cost": 2.25005, # 計算公式read_cost(0.25)+cost_model->row_evaluate_cost(prefix_rowcount * rows_after_filtering=4 * 5)
                          "chosen": true
                        }
                      ]
                    },
                    "condition_filtering_pct": 46.664, # 這個值計算過程見下面<<附錄:t3的filter_effect值計算>> ※bug?用explain看到的和直接敲命令看到的不一樣,前者46.664後者為100,而"rows_for_plan"會變為沒有過濾的20
                    "rows_for_plan": 9.3328, # 這個值就是prefix_rowcount=4(上一張表的prefix_rowcount) * 5 * 46.664 / 100
                    "cost_for_plan": 2.90005, # 這個值就是prefix_cost=0.65(上一張表的prefix_cost)+read_cost(0.25) + cm->row_evaluate_cost(20=4*5)
                    "chosen": true
                  }
                ]
              },
              {
                "plan_prefix": [
                ],
                "table": "`t3`",
                "best_access_path": {
                  "considered_access_paths": [
                    {
                      "rows_to_scan": 5,
                      "filtering_effect": [
                      ],
                      "final_filtering_effect": 1,
                      "access_type": "scan",
                      "resulting_rows": 5,
                      "cost": 0.75, # 計算公式read_cost(0.25)+5行*0.1
                      "chosen": true
                    }
                  ]
                },
                "condition_filtering_pct": 100,
                "rows_for_plan": 5, # 這個值就是prefix_rowcount
                "cost_for_plan": 0.75, # 這個值就是prefix_cost
                "rest_of_plan": [
                  {
                    "plan_prefix": [ # 當前左連線表為t3
                      "`t3`"
                    ],
                    "table": "`t1`", # 右連線表為t1
                    "best_access_path": {
                      "considered_access_paths": [
                        {
                          "rows_to_scan": 4,
                          "filtering_effect": [
                          ],
                          "final_filtering_effect": 1,
                          "access_type": "scan",
                          "using_join_cache": true,
                          "buffers_needed": 1,
                          "resulting_rows": 4,
                          "cost": 3.00776, # 計算公式read_cost(1.007)+cost_model->row_evaluate_cost(prefix_rowcount * rows_after_filtering=5 * 4)
                          "chosen": true
                        }
                      ]
                    },
                    "condition_filtering_pct": 100,
                    "rows_for_plan": 20, # 這個值就是prefix_rowcount=5(上一張表的prefix_rowcount) * 4 * 100 / 100
                    "cost_for_plan": 3.75776, # 這個值就是prefix_cost=0.75(上一張表的prefix_cost)+read_cost(1.007) + cm->row_evaluate_cost(20=5*4)
                    "pruned_by_cost": true # 因為這裡算出來的3.75776 > 2.90005,因此被裁剪掉了
                  }
                ]
              }
            ]
          },
          {
            "attaching_conditions_to_tables": { # 新增一些附加條件
              "original_condition": "((`t3`.`ccc1` = `t1`.`c1`) or (`t3`.`ccc1` < 3))",
              "attached_conditions_computation": [
              ],
              "attached_conditions_summary": [ # t1作為驅動表,執行全表掃描,因此不需要任何條件
                {
                  "table": "`t1`",
                  "attached": null
                },
                {
                  "table": "`t3`",
                  "attached": "((`t3`.`ccc1` = `t1`.`c1`) or (`t3`.`ccc1` < 3))" #  t3作為連線表,按照條件過濾
                }
              ]
            }
          },
          {
            "finalizing_table_conditions": [
              {
                "table": "`t3`",
                "original_table_condition": "((`t3`.`ccc1` = `t1`.`c1`) or (`t3`.`ccc1` < 3))",
                "final_table_condition   ": "((`t3`.`ccc1` = `t1`.`c1`) or (`t3`.`ccc1` < 3))"
              }
            ]
          },
          {
            "refine_plan": [
              {
                "table": "`t1`"
              },
              {
                "table": "`t3`"
              }
            ]
          }
        ]
      }
    },
    {
      "join_explain": {
        "select#": 1,
        "steps": [
        ]
      }
    }
  ]
} |           

# 附錄:t3的filter_effect值計算
# 這個函式計算t1.c1=t3.ccc1和t3.ccc1<3的filter_effect值,計算過程見下面
float Item_cond_or::get_filtering_effect(THD *thd, table_map filter_for_table,
                                         table_map read_tables,
                                         const MY_BITMAP *fields_to_ignore,
                                         double rows_in_table) {
  while ((item = it++)) {
    const float cur_filter = item->get_filtering_effect(
        thd, filter_for_table, read_tables, fields_to_ignore, rows_in_table);
    # 第一次:計算t1.c1=t3.ccc1,返回 1/5=20%,其中5是總行數。
    # 第二次:計算t3.ccc1<3,返回COND_FILTER_INEQUALITY=0.333
    # 最後的filter=0.2+0.333 - (0.2*0.333)=0.4664
    filter = filter + cur_filter - (filter * cur_filter);
  }
  return filter;
}

下面把OPTIMIZER_SWITCH_COND_FANOUT_FILTER模式關閉看看執行效果什麼變化

greatsql> explain SELECT /*+ set_var(optimizer_switch='condition_fanout_filter=off') */ * FROM t1 join t3 ON t1.c1=t3.ccc1 or t3.ccc1<3;
+----+-------------+-------+------------+-------+---------------+------+---------+------+------+----------+--------------------------------------------+
| id | select_type | table | partitions | type  | possible_keys | key  | key_len | ref  | rows | filtered | Extra                                      |
+----+-------------+-------+------------+-------+---------------+------+---------+------+------+----------+--------------------------------------------+
|  1 | SIMPLE      | t1    | NULL       | index | PRIMARY       | idx2 | 11      | NULL |    4 |   100.00 | Using index                                |
|  1 | SIMPLE      | t3    | NULL       | ALL   | idx3_1        | NULL | NULL    | NULL |    5 |   100.00 | Using where; Using join buffer (hash join) |
                                                                                            # 看到這裡選擇了不做條件過濾,全表資料連線
+----+-------------+-------+------------+-------+---------------+------+---------+------+------+----------+--------------------------------------------+

greatsql> EXPLAIN FORMAT=TREE SELECT /*+ set_var(optimizer_switch='condition_fanout_filter=off') */ * FROM t1 join t3 ON t1.c1=t3.ccc1 or t3.ccc1<3;
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| EXPLAIN                                                                                                                                                                                                                                                         |
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| -> Filter: ((t3.ccc1 = t1.c1) or (t3.ccc1 < 3))  (cost=3.65 rows=20)# 因為表裡資料少,跟剛才比cost沒有變化,但是要是資料大的話就很明顯
    -> Inner hash join (no condition)  (cost=3.65 rows=20) # 這裡變為了20行,意味著讀取全部行資料
        -> Table scan on t3  (cost=0.19 rows=5)
        -> Hash
            -> Index scan on t1 using idx2  (cost=1.40 rows=4)
 |
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

          {
            "considered_execution_plans": [
              {
                "plan_prefix": [
                ],
                "table": "`t1`",
                "best_access_path": {
                  "considered_access_paths": [
                    {
                      "rows_to_scan": 4,
                      "access_type": "scan",
                      "resulting_rows": 4,
                      "cost": 1.4,
                      "chosen": true
                    }
                  ]
                },
                "condition_filtering_pct": 100,
                "rows_for_plan": 4,
                "cost_for_plan": 1.4,
                "rest_of_plan": [
                  {
                    "plan_prefix": [ # 當前左連線表為t1
                      "`t1`"
                    ],
                    "table": "`t3`",
                    "best_access_path": {
                      "considered_access_paths": [
                        {
                          "rows_to_scan": 5,
                          "access_type": "scan",
                          "using_join_cache": true,
                          "buffers_needed": 1,
                          "resulting_rows": 5,
                          "cost": 2.25005,
                          "chosen": true
                        }
                      ]
                    },
                    "condition_filtering_pct": 100, # 這裡的過濾係數變為100,說明不進行條件過濾,讀取全部行
                    "rows_for_plan": 20, # 由上面的9行變為全部資料連線,20行資料
                    "cost_for_plan": 3.65005, 
                    "chosen": true
                  }
                ]
              },
              {
                "plan_prefix": [
                ],
                "table": "`t3`",
                "best_access_path": {
                  "considered_access_paths": [
                    {
                      "rows_to_scan": 5,
                      "access_type": "scan",
                      "resulting_rows": 5,
                      "cost": 0.75,
                      "chosen": true
                    }
                  ]
                },
                "condition_filtering_pct": 100,
                "rows_for_plan": 5,
                "cost_for_plan": 0.75,
                "rest_of_plan": [
                  {
                    "plan_prefix": [
                      "`t3`"
                    ],
                    "table": "`t1`",
                    "best_access_path": {
                      "considered_access_paths": [
                        {
                          "rows_to_scan": 4,
                          "access_type": "scan",
                          "using_join_cache": true,
                          "buffers_needed": 1,
                          "resulting_rows": 4,
                          "cost": 3.00776,
                          "chosen": true
                        }
                      ]
                    },
                    "condition_filtering_pct": 100,
                    "rows_for_plan": 20,
                    "cost_for_plan": 3.75776,
                    "pruned_by_cost": true
                  }
                ]
              }
            ]
          },
          {
            "attaching_conditions_to_tables": {
              "original_condition": "((`t3`.`ccc1` = `t1`.`c1`) or (`t3`.`ccc1` < 3))",
              "attached_conditions_computation": [
              ],
              "attached_conditions_summary": [
                {
                  "table": "`t1`",
                  "attached": null
                },
                {
                  "table": "`t3`",
                  "attached": "((`t3`.`ccc1` = `t1`.`c1`) or (`t3`.`ccc1` < 3))"
                }
              ]
            }
          },
          {
            "finalizing_table_conditions": [
              {
                "table": "`t3`",
                "original_table_condition": "((`t3`.`ccc1` = `t1`.`c1`) or (`t3`.`ccc1` < 3))",
                "final_table_condition   ": "((`t3`.`ccc1` = `t1`.`c1`) or (`t3`.`ccc1` < 3))"
              }
            ]
          },
          {
            "refine_plan": [
              {
                "table": "`t1`"
              },
              {
                "table": "`t3`"
              }
            ]
          }
        ]
      }
    },
    {
      "join_explain": {
        "select#": 1,
        "steps": [
        ]
      }
    }
  ]
}  

現在把之前03的內容拿過來回顧一下當時的結果,看看為何是當時的結果。從下面結果可以看出,當存在keyuse的時候,最佳化器最後走哪條索引是透過find_best_ref()函式來決定的。

greatsql> EXPLAIN SELECT * FROM t1 join t2 ON t1.c1=t2.cc1 and t1.c1<5;
+----+-------------+-------+------------+--------+---------------+---------+---------+-----------+------+----------+-------------+
| id | select_type | table | partitions | type   | possible_keys | key     | key_len | ref       | rows | filtered | Extra       |
+----+-------------+-------+------------+--------+---------------+---------+---------+-----------+------+----------+-------------+
|  1 | SIMPLE      | t1    | NULL       | range  | PRIMARY       | PRIMARY | 4       | NULL      |    3 |   100.00 | Using where |
|  1 | SIMPLE      | t2    | NULL       | eq_ref | PRIMARY       | PRIMARY | 4       | db1.t1.c1 |    1 |   100.00 | NULL        |
                                   # t2這個結果看下面的解釋
+----+-------------+-------+------------+--------+---------------+---------+---------+-----------+------+----------+-------------+

          {\
            "considered_execution_plans": [\
              {\
                "plan_prefix": [\
                ],\
                "table": "`t1`",\
                "best_access_path": {\
                  "considered_access_paths": [\
                    {\
                      "access_type": "ref",\ # t1表不選擇"ref"方式,因為cost比"range"方式大
                      "index": "PRIMARY",\
                      "usable": false,\
                      "chosen": false\
                    },\
                    {\
                      "rows_to_scan": 3,\
                      "filtering_effect": [\
                      ],\
                      "final_filtering_effect": 1,\
                      "access_type": "range",\ # t1表最後選擇了"range"方式進行掃描
                      "range_details": {\
                        "used_index": "PRIMARY"\
                      },\
                      "resulting_rows": 3,\
                      "cost": 0.860732,\
                      "chosen": true\
                    }\
                  ]\
                },\
                "condition_filtering_pct": 100,\
                "rows_for_plan": 3,\
                "cost_for_plan": 0.860732,\
                "rest_of_plan": [\
                "rest_of_plan": [\
                  {\
                    "plan_prefix": [\
                      "`t1`"\
                    ],\
                    "table": "`t2`",\
                    "best_access_path": {\
                      "considered_access_paths": [\
                        {\
                          "access_type": "eq_ref",\ # 在t1 join t2的連線方式裡,判斷用"eq_ref"方式掃描cost更小。這裡透過find_best_ref()獲取
                          "index": "PRIMARY",\
                          "rows": 1,\
                          "cost": 1.05,\
                          "chosen": true,\
                          "cause": "clustered_pk_chosen_by_heuristics"\
                        },\
                        {\
                          "access_type": "range",\
                          "range_details": {\
                            "used_index": "PRIMARY"\
                          },\
                          "chosen": false,\
                          "cause": "heuristic_index_cheaper"\
                        }\
                      ]\
                    },\
                    "condition_filtering_pct": 100,\
                    "rows_for_plan": 3,\
                    "cost_for_plan": 1.91073,\
                    "chosen": true\
                  }\
          {\
            "refine_plan": [\ # 根據最後比較結果,按照t1,t2方式連線cost最小,因此選擇這種順序進行join
              {\
                "table": "`t1`"\
              },\
              {\
                "table": "`t2`"\
              }\
            ]\
          }\

四、總結

從上面最佳化器最早的步驟我們認識了條件過濾condition_fanout_filter的定義以及使用方法,也學會了計算join時候的cost,然後根據cost選擇最優的連線順序。回到開頭提到的這個功能在一些場景有可能會帶來選擇錯誤的問題,這個下一期展開細講。用這種過濾方法算出來的結果也不精確,GreatSQL 用了另一個功能直方圖來做資料統計,讓結果更加精確,這個直方圖的分析放後面章節講。


Enjoy GreatSQL 😃

關於 GreatSQL

GreatSQL是適用於金融級應用的國內自主開源資料庫,具備高效能、高可靠、高易用性、高安全等多個核心特性,可以作為MySQL或Percona Server的可選替換,用於線上生產環境,且完全免費併相容MySQL或Percona Server。

相關連結: GreatSQL社群 Gitee GitHub Bilibili

GreatSQL社群:

社群部落格有獎徵稿詳情:https://greatsql.cn/thread-100-1-1.html

image-20230105161905827

技術交流群:

微信:掃碼新增GreatSQL社群助手微信好友,傳送驗證資訊加群

image-20221030163217640

相關文章