Mysql最佳化器對in list的處理

Steven1981發表於2010-08-05

select * from table where id in (....)
這樣的查詢,是走範圍索引還是走等值索引?
select * from table where key_part1 in (....) and key_part2='XX';
這樣的查詢,第二部分還走不走索引?

測試目的,想知道,MYSQL對IN LIST是如何選擇執行計劃的;在單欄位索引和複合索引中;

[@more@]


mysql 5.1.40
os:rhel 5.4
engine=innodb
innodb_file_per_table


# 先來建立測試環境:
create table index_test ( id int auto_increment , col1 int ,col2 varchar(200) ,content varchar(500),primary key (id) ,key col1 (col1) ) engine=innodb default charset=latin1;
# repeat insert operation 12 times

insert into index_test (col1,col2) select @rownum:=@rownum+1,column_name from information_schema.COLUMNS c , (select @rownum:=0 ) id limit 500 ;


# 測試1:先測對主鍵的IN操作;

# 測試用例:
reset query cache; --清空QUERY_CAHCE
show status like 'Innodb_buffer_pool_read_requests' ; --用來查詢邏輯讀
select * from index_test where id in (2,10,1000,2000,9000);
show status like 'Innodb_buffer_pool_read_requests' ; --與前面的結果相減,就得到SQL執行所帶來的邏輯讀 ;
為了邏輯讀的準確性, 對同一個SQL你應該多跑幾次,以去掉物理讀 ;


: test 16:02:16> explain select * from index_test where id in (2,10,1000,2000);
+----+-------------+------------+-------+---------------+---------+---------+------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------+-------+---------------+---------+---------+------+------+-------------+
| 1 | SIMPLE | index_test | range | PRIMARY | PRIMARY | 4 | NULL | 4 | Using where |
+----+-------------+------------+-------+---------------+---------+---------+------+------+-------------+
1 row in set (0.00 sec)

# 從執行計劃看,走的是範圍條件;但我們看看實際情況 :
# 注意,為了減少篇幅,我把各個查詢的結果給刪減了。


select * from index_test where id in (2,10);
RESULTs: 2 rows
LIO  : 4

select * from index_test where id in (2,1000);
RESULTs: 2 rows
LIO  : 4

select * from index_test where id in (2,10,100);
RESULTs: 3 rows
LIO  : 6

select * from index_test where id in (2,10,1000,2000);
RESULTs: 4 rows
LIO  : 8

select * from index_test where id in (2,10,1000,2000,9000);
RESULTs: 5 rows
LIO  : 10

### 在這裡看到,邏輯讀根據IN LIST裡KEY的數量成線性增加,而沒有根據KEY值的大小變化,所以我們判斷,對主鍵的IN操作,其實都轉成了OR操作。


# 測試2:對非主鍵的IN操作;
# 測試用例:
reset query cache;
show status like 'Innodb_buffer_pool_read_requests' ;
select * from index_test where col1 in (100,500,300,400);
show status like 'Innodb_buffer_pool_read_requests' ;


: test 16:06:33> explain select * from index_test where col1 in (100,200);
+----+-------------+------------+-------+---------------+------+---------+------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------+-------+---------------+------+---------+------+------+-------------+
| 1 | SIMPLE | index_test | range | col1 | col1 | 5 | NULL | 24 | Using where |
+----+-------------+------------+-------+---------------+------+---------+------+------+-------------+
1 row in set (0.00 sec)

select * from index_test where col1 in (100,101);
RESULTs: 24 rows
LIO : 86

select * from index_test where col1 in (100,500);
RESULTs: 24 rows
LIO : 86

select * from index_test where col1 in (100,500,300);
RESULTs: 36 rows
LIO : 139

select * from index_test where col1 in (100,500,300,400);
RESULTs: 48 rows
LIO : 172

分析: 這個結果與測試1的結果是一樣的;


# 測試3:對複合索引的前列IN操作;
alter table index_test drop index col1 ,add index col1col2(col1,col2) ;
update index_test set content=concat(col2,col3,col1) ;

主要是測一下,索引的第一個欄位用IN後,最佳化器還會不會使用第二個欄位來進行索引搜尋;

: test 18:41:38> explain select content from index_test where col1 in (100,500,300,400) and col2='aaaa';
+----+-------------+------------+-------+---------------+----------+---------+------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------+-------+---------------+----------+---------+------+------+-------------+
| 1 | SIMPLE | index_test | range | col1col2 | col1col2 | 208 | NULL | 4 | Using where |
+----+-------------+------------+-------+---------------+----------+---------+------+------+-------------+
1 row in set (0.00 sec)



select count(*) from index_test where col1 in (100,500,300,400) and col2='aaaa';
RESULTs: 0 rows
LIO : 24

select content from index_test where col1 in (100,500,300,400) and col2='aaaa';
RESULTs: 0 rows
LIO : 24

分析:
#我們發現,兩個查詢的邏輯讀是一樣,其實這已經表明最佳化器用上了索引的第二個欄位,在索引搜尋部分就完成了對COL2的過濾;

總結:MYSQL最佳化器對in list是轉成“or” 的“多個等值”查詢來處理的;並沒有轉成範圍查詢 ;

來自 “ ITPUB部落格 ” ,連結:http://blog.itpub.net/703656/viewspace-1036220/,如需轉載,請註明出處,否則將追究法律責任。

相關文章