mysql loose index scan的實現

myownstars發表於2015-02-05
 Loose index scan

各種資料關於loose index scan的解釋很拗口,其實等同於oracleindex skip scan

對於複合索引(sex, id),前導列sex的值只有兩個:MF

(`F',98)

(`F',100)

(`F',102)

(`F',104)

(`M',101)

(`M',103)

(`M',105)

則索引可以邏輯的分成兩個子索引

當執行select  * from t where id = N時,等價於

select  * from t where id = N and sex =‘M’

union

select  * from t where id = N and sex = 'F'

進而有效利用到索引進行查詢,
如果換成mysql,則只能執行全表掃描;
不考慮欄位選擇性的前提下,即便將sql寫為select  * from t where id = N and sex in(‘M’,'F'),pre-5.6在索引掃描時也不會使用到id列過濾,而是在讀取出行記錄後再進行id=N的篩選。



但是5.0之後group by在特定條件下可能使用到loose index scan,以下是來自http://explainextended.com/2010/05/08/max-and-min-on-a-composite-index/

的案例。

CREATE TABLE log_table (

id INT NOT NULL PRIMARY KEY,

log_machine VARCHAR(20) NOT NULL,

log_time DATETIME NOT NULL

) ENGINE=InnoDB DEFAULT CHARSET=utf8;

CREATE INDEX ix_log_machine_time ON log_table (log_machine, log_time);

1

SELECT  MAX(log_time) FROM log_table

SELECT MAX(log_time) FROM log_table WHERE log_machine IN ('Machine 1')

這兩條sql都只需一次index seek便可返回,源於索引的有序排序,優化器意識到min/max位於最左/右塊,從而避免範圍掃描;

extra顯示Select tables optimized away 

2

SELECT MAX(log_time) FROM log_table WHERE log_machine IN (‘Machine 1’,’Machine 2’,’Machine 3’,’Machine 4’);

執行計劃type range(extra顯示using where; using index),即執行索引範圍掃描,先讀取所有滿足log_machine約束的記錄,然後對其遍歷找出max value

改進

SELECT MAX(log_time) FROM log_table WHERE log_machine IN (‘Machine 1’,’Machine 2’,’Machine 3’,’Machine 4’)  group by log_machine order by 1 desc limit 1;

這滿足group by選擇loose index scan的要求,執行計劃的extra顯示using index for group-by,執行效果等值於

SELECT MAX(log_time) FROM log_table WHERE log_machine IN (‘Machine 1’)

Union

SELECT MAX(log_time) FROM log_table WHERE log_machine IN (‘Machine 2’)

…..

即對每個log_machine執行loose index scanrows從原來的82636下降為16(該表總共1,000,000條記錄)

Group by何時使用loose index scan?

適用條件:

1  針對單表操作

 Group by使用索引的最左字首列

3  只支援聚集函式min()/max()

4  Where條件出現的列必須為=constant操作 , 沒出現在group by中的索引列必須使用constant

5  不支援字首索引,即部分列索引 ,如index(c1(10))

執行計劃的extra應該顯示using index for group-by

假定表t1有個索引idx(c1,c2,c3)

SELECT c1, c2 FROM t1 GROUP BY c1, c2;

SELECT DISTINCT c1, c2 FROM t1;

SELECT c1, MIN(c2) FROM t1 GROUP BY c1;

SELECT c1, c2 FROM t1 WHERE c1 < const GROUP BY c1, c2;

SELECT MAX(c3), MIN(c3), c1, c2 FROM t1 WHERE c2 > const GROUP BY c1, c2;

SELECT c2 FROM t1 WHERE c1 < const GROUP BY c1, c2;

SELECT c1, c2 FROM t1 WHERE c3 = const GROUP BY c1, c2

SELECT c1, c3 FROM t1 GROUP BY c1, c2;--無法使用鬆散索引

SELECT c1, c3 FROM t1  where c3= const GROUP BY c1, c2;則可以

緊湊索引掃描tight index scan

Group by在無法使用loose index scan,還可以選擇tight,若兩者都不可選,則只能藉助臨時表;

掃描索引時,須讀取所有滿足條件的索引鍵,要麼是全索引掃描,要麼是範圍索引掃描;

Group by的索引列不連續;或者不是從最左字首開始,但是where條件裡出現最左列;

SELECT c1, c2, c3 FROM t1 WHERE c2 = 'a' GROUP BY c1, c3;

SELECT c1, c2, c3 FROM t1 WHERE c1 = 'a' GROUP BY c2, c3;

http://blog.itpub.net/15480802/viewspace-757562/ 


5.6的改進

事實上,5.6index condition push down可以彌補loose index scan缺失帶來的效能損失。

KEY(age,zip)

mysql> explain SELECT  name FROM people WHERE age BETWEEN 18 AND 20 AND zip IN (12345,12346, 12347);

+----+-------------+--------+-------+---------------+------+---------+------+-------+-------------+

| id | select_type | table  | type  | possible_keys | key  | key_len | ref  | rows  | Extra       |

+----+-------------+--------+-------+---------------+------+---------+------+-------+-------------+

|  1 | SIMPLE      | people | range | age           | age  | 4       | NULL | 90556 | Using where |

+----+-------------+--------+-------+---------------+------+---------+------+-------+-------------+

1 row in set (0.01 sec)

根據key_len=4可以推測出sql只用到索引的第一列,即先通過索引查出滿足age (18,20)的行記錄,然後從server層篩選出滿足zip約束的行;

pre-5.6,對於複合索引,只有當引導列使用"="時才有機會在索引掃描時使用到後面的索引列。

mysql> explain SELECT  name FROM people WHERE age=18 AND zip IN (12345,12346, 12347);

+----+-------------+--------+-------+---------------+------+---------+------+------+-------------+

| id | select_type | table  | type  | possible_keys | key  | key_len | ref  | rows | Extra       |

+----+-------------+--------+-------+---------------+------+---------+------+------+-------------+

|  1 | SIMPLE      | people | range | age           | age  | 8       | NULL |    3 | Using where |

+----+-------------+--------+-------+---------------+------+---------+------+------+-------------+

1 row in set (0.00 sec)

對比一下查詢效率

mysql> SELECT  sql_no_cache name FROM people WHERE age=19 AND zip IN (12345,12346, 12347);

+----------------------------------+

| name                             |

+----------------------------------+

| 888ba838661aff00bbbce114a2a22423 |

+----------------------------------+

1 row in set (0.06 sec)

mysql> SELECT  SQL_NO_CACHE name FROM people WHERE age BETWEEN 18 AND 22 AND zip IN (12345,12346, 12347);

+----------------------------------+

| name                             |

+----------------------------------+

| ed4481336eb9adca222fd404fa15658e |

| 888ba838661aff00bbbce114a2a22423 |

+----------------------------------+

2 rows in set (1 min 56.09 sec)

對於第二條sql,可以使用union改寫,

mysql> SELECT  name FROM people WHERE age=18 AND zip IN (12345,12346, 12347)

    -> UNION ALL

    -> SELECT  name FROM people WHERE age=19 AND zip IN (12345,12346, 12347)

    -> UNION ALL

    -> SELECT  name FROM people WHERE age=20 AND zip IN (12345,12346, 12347)

    -> UNION ALL

    -> SELECT  name FROM people WHERE age=21 AND zip IN (12345,12346, 12347)

    -> UNION ALL

-> SELECT  name FROM people WHERE age=22 AND zip IN (12345,12346, 12347);

mysql5.6引入了index condition pushdown,從優化器層面解決了此類問題。

http://www.percona.com/blog/2006/08/10/using-union-to-implement-loose-index-scan-to-mysql/

http://blog.itpub.net/15480802/viewspace-1117157/

 

來自 “ ITPUB部落格 ” ,連結:http://blog.itpub.net/15480802/viewspace-1427989/,如需轉載,請註明出處,否則將追究法律責任。

相關文章