MYSQL 5.6 5.7處理資料分佈不均的問題(eq_range_index_dive_limit引數)
處理資料分佈不均,orace資料庫使用額外的統計資料直方圖來完成,而MYSQL
中統計資料只有索引的不同值這樣一個統計資料,那麼我們製出如下資料:
mysql> select * from test.testf;
+------+----------+
| id | name |
+------+----------+
| 1 | gaopeng |
| 2 | gaopeng1 |
| 3 | gaopeng1 |
| 4 | gaopeng1 |
| 5 | gaopeng1 |
| 6 | gaopeng1 |
| 7 | gaopeng1 |
| 8 | gaopeng1 |
| 9 | gaopeng1 |
| 10 | gaopeng1 |
+------+----------+
10 rows in set (0.00 sec)
name 上有一個普通二級索引
mysql> analyze table test.testf;
+------------+---------+----------+----------+
| Table | Op | Msg_type | Msg_text |
+------------+---------+----------+----------+
| test.testf | analyze | status | OK |
+------------+---------+----------+----------+
1 row in set (0.21 sec)
分別作出如下執行計劃:
mysql> explain select * from test.testf where name='gaopeng';
+----+-------------+-------+------------+------+---------------+------+---------+-------+------+----------+-------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+------+---------------+------+---------+-------+------+----------+-------+
| 1 | SIMPLE | testf | NULL | ref | name | name | 63 | const | 1 | 100.00 | NULL |
+----+-------------+-------+------------+------+---------------+------+---------+-------+------+----------+-------+
1 row in set, 1 warning (0.00 sec)
mysql> explain select * from test.testf where name='gaopeng1';
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+-------------+
| 1 | SIMPLE | testf | NULL | ALL | name | NULL | NULL | NULL | 10 | 90.00 | Using where |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+-------------+
1 row in set, 1 warning (0.00 sec)
可以看到執行計劃是正確的,name='gaopeng'的只有一行選擇了索引,name='gaopeng1'的有9行走了全表。
按理說如果只是記錄不同的那麼這兩個語句的選擇均為1/2,應該會造成執行計劃錯誤,而MYSQL 5.6 5.7中
都做了正確的選擇,那是為什麼呢?
其實原因就在於 eq_range_index_dive_limit這個引數,我們來看一下trace
T@2: | | | | | | | | | | | opt: (null): "gaopeng1 <= name <= | T@3: | | | | | | | | | | | opt: (null): "gaopeng <= name <= g
T@2: | | | | | | | | | | | opt: ranges: ending struct | T@3: | | | | | | | | | | | opt: ranges: ending struct
T@2: | | | | | | | | | | | opt: index_dives_for_eq_ranges: 1 | T@3: | | | | | | | | | | | opt: index_dives_for_eq_ranges: 1
T@2: | | | | | | | | | | | opt: rowid_ordered: 1 | T@3: | | | | | | | | | | | opt: rowid_ordered: 1
T@2: | | | | | | | | | | | opt: using_mrr: 0 | T@3: | | | | | | | | | | | opt: using_mrr: 0
T@2: | | | | | | | | | | | opt: index_only: 0 | T@3: | | | | | | | | | | | opt: index_only: 0
T@2: | | | | | | | | | | | opt: rows: 9 | T@3: | | | | | | | | | | | opt: rows: 1
T@2: | | | | | | | | | | | opt: cost: 11.81 | T@3: | | | | | | | | | | | opt: cost: 2.21
我們可以看到 index_dives_for_eq_ranges均為1,rows: 9 rows: 1都是正確的,那麼可以確定是index_dives_for_eq_ranges的作用,實際上
這是一個引數eq_range_index_dive_limit來決定的(equality range optimization of many-valued comparisions),預設為
mysql> show variables like '%eq%';
+--------------------------------------+-------+
| Variable_name | Value |
+--------------------------------------+-------+
| eq_range_index_dive_limit | 200 |
在官方文件說這個取值是等值範圍比較的時候有多少個需要比較的值
如:
id=1 or id=2 or id=3 那麼他取值就是3+1=4
而這種方法會得到精確的資料,但是增加的是時間成本,如果將
eq_range_index_dive_limit 設定為1:則禁用此功能
eq_range_index_dive_limit 設定為0:則始終開啟
eq_range_index_dive_limit 設定為N:則滿足N-1個這樣的域。
那麼我們設定為eq_range_index_dive_limit=1 後看看
mysql> explain select * from test.testf where name='gaopeng1';
+----+-------------+-------+------------+------+---------------+------+---------+-------+------+----------+-------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+------+---------------+------+---------+-------+------+----------+-------+
| 1 | SIMPLE | testf | NULL | ref | name | name | 63 | const | 5 | 100.00 | NULL |
+----+-------------+-------+------------+------+---------------+------+---------+-------+------+----------+-------+
1 row in set, 1 warning (0.00 sec)
mysql> explain select * from test.testf where name='gaopeng';
+----+-------------+-------+------------+------+---------------+------+---------+-------+------+----------+-------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+------+---------------+------+---------+-------+------+----------+-------+
| 1 | SIMPLE | testf | NULL | ref | name | name | 63 | const | 5 | 100.00 | NULL |
+----+-------------+-------+------------+------+---------------+------+---------+-------+------+----------+-------+
1 row in set, 1 warning (0.00 sec)
可以看到執行計劃已經錯誤 name='gaopeng1' 明顯不應該使用索引,我們再來看看trace
T@3: | | | | | | | | | | | opt: ranges: ending struct
T@3: | | | | | | | | | | | opt: index_dives_for_eq_ranges: 0
T@3: | | | | | | | | | | | opt: rowid_ordered: 1
T@3: | | | | | | | | | | | opt: using_mrr: 0
T@3: | | | | | | | | | | | opt: index_only: 0
T@3: | | | | | | | | | | | opt: rows: 5
T@3: | | | | | | | | | | | opt: cost: 7.01
index_dives_for_eq_ranges: 0 rows: 5這個5就是10*1/2導致的,而index_dives_for_eq_ranges=0就是禁用了
在5.7官方文件 p1231頁也有相應說明
中統計資料只有索引的不同值這樣一個統計資料,那麼我們製出如下資料:
mysql> select * from test.testf;
+------+----------+
| id | name |
+------+----------+
| 1 | gaopeng |
| 2 | gaopeng1 |
| 3 | gaopeng1 |
| 4 | gaopeng1 |
| 5 | gaopeng1 |
| 6 | gaopeng1 |
| 7 | gaopeng1 |
| 8 | gaopeng1 |
| 9 | gaopeng1 |
| 10 | gaopeng1 |
+------+----------+
10 rows in set (0.00 sec)
name 上有一個普通二級索引
mysql> analyze table test.testf;
+------------+---------+----------+----------+
| Table | Op | Msg_type | Msg_text |
+------------+---------+----------+----------+
| test.testf | analyze | status | OK |
+------------+---------+----------+----------+
1 row in set (0.21 sec)
分別作出如下執行計劃:
mysql> explain select * from test.testf where name='gaopeng';
+----+-------------+-------+------------+------+---------------+------+---------+-------+------+----------+-------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+------+---------------+------+---------+-------+------+----------+-------+
| 1 | SIMPLE | testf | NULL | ref | name | name | 63 | const | 1 | 100.00 | NULL |
+----+-------------+-------+------------+------+---------------+------+---------+-------+------+----------+-------+
1 row in set, 1 warning (0.00 sec)
mysql> explain select * from test.testf where name='gaopeng1';
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+-------------+
| 1 | SIMPLE | testf | NULL | ALL | name | NULL | NULL | NULL | 10 | 90.00 | Using where |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+-------------+
1 row in set, 1 warning (0.00 sec)
可以看到執行計劃是正確的,name='gaopeng'的只有一行選擇了索引,name='gaopeng1'的有9行走了全表。
按理說如果只是記錄不同的那麼這兩個語句的選擇均為1/2,應該會造成執行計劃錯誤,而MYSQL 5.6 5.7中
都做了正確的選擇,那是為什麼呢?
其實原因就在於 eq_range_index_dive_limit這個引數,我們來看一下trace
T@2: | | | | | | | | | | | opt: (null): "gaopeng1 <= name <= | T@3: | | | | | | | | | | | opt: (null): "gaopeng <= name <= g
T@2: | | | | | | | | | | | opt: ranges: ending struct | T@3: | | | | | | | | | | | opt: ranges: ending struct
T@2: | | | | | | | | | | | opt: index_dives_for_eq_ranges: 1 | T@3: | | | | | | | | | | | opt: index_dives_for_eq_ranges: 1
T@2: | | | | | | | | | | | opt: rowid_ordered: 1 | T@3: | | | | | | | | | | | opt: rowid_ordered: 1
T@2: | | | | | | | | | | | opt: using_mrr: 0 | T@3: | | | | | | | | | | | opt: using_mrr: 0
T@2: | | | | | | | | | | | opt: index_only: 0 | T@3: | | | | | | | | | | | opt: index_only: 0
T@2: | | | | | | | | | | | opt: rows: 9 | T@3: | | | | | | | | | | | opt: rows: 1
T@2: | | | | | | | | | | | opt: cost: 11.81 | T@3: | | | | | | | | | | | opt: cost: 2.21
我們可以看到 index_dives_for_eq_ranges均為1,rows: 9 rows: 1都是正確的,那麼可以確定是index_dives_for_eq_ranges的作用,實際上
這是一個引數eq_range_index_dive_limit來決定的(equality range optimization of many-valued comparisions),預設為
mysql> show variables like '%eq%';
+--------------------------------------+-------+
| Variable_name | Value |
+--------------------------------------+-------+
| eq_range_index_dive_limit | 200 |
在官方文件說這個取值是等值範圍比較的時候有多少個需要比較的值
如:
id=1 or id=2 or id=3 那麼他取值就是3+1=4
而這種方法會得到精確的資料,但是增加的是時間成本,如果將
eq_range_index_dive_limit 設定為1:則禁用此功能
eq_range_index_dive_limit 設定為0:則始終開啟
eq_range_index_dive_limit 設定為N:則滿足N-1個這樣的域。
那麼我們設定為eq_range_index_dive_limit=1 後看看
mysql> explain select * from test.testf where name='gaopeng1';
+----+-------------+-------+------------+------+---------------+------+---------+-------+------+----------+-------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+------+---------------+------+---------+-------+------+----------+-------+
| 1 | SIMPLE | testf | NULL | ref | name | name | 63 | const | 5 | 100.00 | NULL |
+----+-------------+-------+------------+------+---------------+------+---------+-------+------+----------+-------+
1 row in set, 1 warning (0.00 sec)
mysql> explain select * from test.testf where name='gaopeng';
+----+-------------+-------+------------+------+---------------+------+---------+-------+------+----------+-------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+------+---------------+------+---------+-------+------+----------+-------+
| 1 | SIMPLE | testf | NULL | ref | name | name | 63 | const | 5 | 100.00 | NULL |
+----+-------------+-------+------------+------+---------------+------+---------+-------+------+----------+-------+
1 row in set, 1 warning (0.00 sec)
可以看到執行計劃已經錯誤 name='gaopeng1' 明顯不應該使用索引,我們再來看看trace
T@3: | | | | | | | | | | | opt: ranges: ending struct
T@3: | | | | | | | | | | | opt: index_dives_for_eq_ranges: 0
T@3: | | | | | | | | | | | opt: rowid_ordered: 1
T@3: | | | | | | | | | | | opt: using_mrr: 0
T@3: | | | | | | | | | | | opt: index_only: 0
T@3: | | | | | | | | | | | opt: rows: 5
T@3: | | | | | | | | | | | opt: cost: 7.01
index_dives_for_eq_ranges: 0 rows: 5這個5就是10*1/2導致的,而index_dives_for_eq_ranges=0就是禁用了
在5.7官方文件 p1231頁也有相應說明
來自 “ ITPUB部落格 ” ,連結:http://blog.itpub.net/7728585/viewspace-2134179/,如需轉載,請註明出處,否則將追究法律責任。
相關文章
- mysql5.7安裝及問題處理MySql
- MySQL 5.7資料庫引數優化MySql資料庫優化
- MySQL 5.6,5.7的優化器對於count(*)的處理方式MySql優化
- mongodb資料庫範圍分片資料分佈不均勻MongoDB資料庫
- MySQL:5.6 升級 5.7MySql
- mysql的處理能力問題MySql
- 【資料庫】mysql5.6升級至5.7(物理方式)資料庫MySql
- 資料處理--pandas問題
- 【MySQL】MySQL 5.6 引數之 extra_portMySql
- 資料採集,微軟控制元件分頁問題的處理微軟控制元件
- CentOS tengine mysql 5.7 php 5.6CentOSMySqlPHP
- MySQL自定義變數處理行號問題MySql變數
- mysql的處理能力問題(2)MySql
- 資料庫升級到mysql5.6遇到的資料表ENGINE問題資料庫MySql
- MySQL 5.7 複製的過濾引數MySql
- MYSQL 5.6 安裝時cmake引數MySql
- 分類任務中的樣本不均衡問題
- mysql問題處理兩則MySql
- mysql 問題處理二則MySql
- mysql 5.7 執行緒阻塞處理MySql執行緒
- mysql5.6主從引數詳解MySql
- mysqldump 引數詳解(基於MySQL 5.6)MySql
- 函式訪問引數受限的處理方法函式
- 在資料分佈嚴重不均的列上使用繫結變數容易錯過更好的執行計劃變數
- 資料清理的遺留問題處理(二)
- mysql的日誌引數修改的問題.MySql
- 如何解決迴歸任務資料不均衡的問題?
- 多工學習中的資料分佈問題(一)
- 多工學習中的資料分佈問題(二)
- ceph解讀:crush分散式資料分佈的問題分散式
- 處理分頁的result型別問題型別
- 關於在request請求時,處理請求引數的問題
- MySQL:亂碼問題處理流程MySql
- MySQL OOM問題處理一則MySqlOOM
- MySQL資料庫引數MySql資料庫
- 關於mysql5.6 的排序問題.MySql排序
- MyBatis 引數處理MyBatis
- 大資料處理需留意哪些問題大資料