Understanding MySQL Indexes Through Examples

Using indexes

First create a database, then create a table with the following structure:

mysql> create database test1;
Query OK, 0 rows affected (0.01 sec)

mysql> use test1;
Database changed

mysql> create table yw (
    -> id int unsigned not null auto_increment,
    -> c1 int not null default 0,
    -> c2 int not null default 0,
    -> c3 int not null default 0,
    -> c4 int not null default 0,
    -> c5 timestamp not null,
    -> c6 varchar(200) not null default '',
    -> primary key(id)
    -> );
Query OK, 0 rows affected (0.01 sec)

Import an SQL file with the following contents:

[root@mysql_node1 test]# cat suoyin_test.sql
drop table yw; # drop the table created above, then recreate it
create table yw (
  id int unsigned not null primary key auto_increment,
  c1 int not null default 0,
  c2 int not null default 0,
  c3 int not null default 0,
  c4 int not null default 0,
  c5 timestamp not null,
  c6 varchar(200) not null default ''
);
delimiter $$
drop procedure if exists `insert_yw` $$
create procedure `insert_yw`(in row_num int)
begin
  declare i int default 0;
  while i < row_num do
    insert into yw(c1, c2, c3, c4, c5, c6) values(floor(rand()*row_num), floor(rand()*row_num), floor(rand()*row_num), floor(rand()*row_num), now(), repeat('wubx', floor(rand()*20)));
    set i = i+1;
  end while;
end$$
delimiter ;
# insert 3 million rows
call insert_yw(3000000);
delimiter $$
drop procedure if exists `update_yw` $$
create procedure `update_yw`(in row_num int)
begin
  declare i int default 0;
  while i < row_num do
    update yw set c3 = floor(rand()*row_num) where id = i;
    set i = i+1;
  end while;
end$$
delimiter ;

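Change a parameter so that InnoDB flushes its log to disk about once per second instead of at every commit, which speeds up the bulk insert:

mysql> set global innodb_flush_log_at_trx_commit=2;

Import the file:

mysql> source /root/test/suoyin.sql
Query OK, 0 rows affected (0.11 sec)
Query OK, 0 rows affected (0.01 sec)
Query OK, 0 rows affected (0.00 sec)
Query OK, 1 row affected (4 min 20.75 sec)
Query OK, 0 rows affected (0.00 sec)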

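The import is noticeably slow, although 3 million rows is not a small amount of data either. This brings us to the question: why is the following query so slow?

mysql> select * from yw a, (select c2 from yw where id=10) b where a.c2 = b.c2;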

+---------+---------+--------+---------+---------+---------------------+------------------------------------------------------------------+--------+
| id | c1 | c2 | c3 | c4 | c5 | c6 | c2 |
+---------+---------+--------+---------+---------+---------------------+------------------------------------------------------------------+--------+
| 10 | 2833881 | 185188 | 1424297 | 565924 | 2014-09-24 14:30:31 | wubxwubxwubxwubxwubxwubxwubxwubxwubxwubx | 185188 |
| 1530223 | 1345871 | 185188 | 2888330 | 1886085 | 2014-09-24 14:32:44 | wubxwubxwubxwubxwubx | 185188 |
| 1623964 | 1289414 | 185188 | 57699 | 2732932 | 2014-09-24 14:32:52 | wubxwubxwubxwubxwubxwubxwubxwubxwubx | 185188 |
| 2825263 | 729557 | 185188 | 1737273 | 2130798 | 2014-09-24 14:34:37 | wubxwubxwubxwubxwubxwubxwubxwubxwubxwubxwubxwubxwubxwubxwubxwubx | 185188 |
+---------+---------+--------+---------+---------+---------------------+------------------------------------------------------------------+--------+
4 rows in set (7.28 sec)

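The last line shows that the query took 7.28 seconds: a very simple query that nevertheless runs slowly.

In the output we see the value 2825263, so let's turn it into an even simpler statement and look at that.

This is a very simple statement; on a 3-million-row table with a suitable index it should be very fast. With the actual table structure, however, it is still slow, taking 7.96 seconds in total:

mysql> select * from yw where c1 = 2825263;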

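+---------+---------+---------+---------+---------+---------------------+------------------------------------------------------+
| id | c1 | c2 | c3 | c4 | c5 | c6 |
+---------+---------+---------+---------+---------+---------------------+------------------------------------------------------+
| 1421241 | 2825263 | 2015825 | 1603339 | 1969218 | 2014-09-24 14:32:35 | wubxwubxwubxwubxwubxwubxwubxwubxwubxwubxwubxwubxwubx |
+---------+---------+---------+---------+---------+---------------------+------------------------------------------------------+
1 row in set (7.96 sec)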

It is slow because it performs a full table scan.

In this situation, the query can be optimized by adding an index.
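To see the full-scan-versus-index difference in a plan, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for MySQL (an assumption for illustration: SQLite's planner and its EXPLAIN QUERY PLAN wording differ from MySQL's EXPLAIN). The table only loosely mirrors yw, and the index name idx_c1 is hypothetical. It prints the plan for the c1 lookup before and after adding the index.

```python
import sqlite3


def plan(conn, sql, params=()):
    """Return SQLite's EXPLAIN QUERY PLAN detail text for a query as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(str(row[-1]) for row in rows)  # last column holds the detail


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE yw (id INTEGER PRIMARY KEY, c1 INT, c2 INT, c3 INT, c4 INT)")
conn.executemany(
    "INSERT INTO yw (c1, c2, c3, c4) VALUES (?, ?, ?, ?)",
    [(i * 7 % 1000, i % 97, i % 89, i % 83) for i in range(5000)],
)

query = "SELECT * FROM yw WHERE c1 = ?"
before = plan(conn, query, (700,))  # no index on c1 yet: a full table scan
conn.execute("CREATE INDEX idx_c1 ON yw (c1)")  # hypothetical index name
after = plan(conn, query, (700,))   # the lookup can now seek through idx_c1

print("before:", before)
print("after: ", after)
```

SQLite reports the first plan as a SCAN of the table and the second as a SEARCH using idx_c1; the rows returned are the same either way, only the access path changes.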

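Append another 3 million rows

Add them by calling the insert_yw procedure:

mysql> call insert_yw(3000000);
Query OK, 1 row affected (4 min 21.74 sec)

Adding the index and loading the 3 million rows took about 7 minutes in total.

PS: in production work, always test with simulated data on the order of millions of rows.

When it finishes, check the size of the table's data file (data plus indexes): about 476 MB.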

[root@mysql_node1 test1]# ll -th
total 477M
-rw-rw----. 1 mysql mysql 476M Sep 24 14:49 yw.ibd
-rw-rw----. 1 mysql mysql 8.6K Sep 24 14:30 yw.frm
-rw-rw----. 1 mysql mysql 61 Sep 24 14:00 db.opt

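mysql> desc select * from yw a, (select c2 from yw where id=10) b where a.c2 = b.c2;
+----+-------------+-------+-------+---------------+---------+---------+-----+------+-------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+---------------+---------+---------+-----+------+-------+
| 2 | DERIVED | yw | const | PRIMARY | PRIMARY | 4 | | 1 | |
+----+-------------+-------+-------+---------------+---------+---------+-----+------+-------+
3 rows in set (0.00 sec)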

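Using indexes

Introduction to indexes

An index is essentially a B-tree structure.

In some production environments, especially master-slave replication setups, statements that cannot use an index cause replication lag. When a slave falls behind, first check whether the statements it is applying are running without an index; if they are, adding the index is usually enough.

Cases like this come up very often when troubleshooting master-slave setups.

DELETE can use indexes too; without a suitable index, a DELETE also does a full table scan.

For example, when a large amount of data has to be deleted, the usual recommendation is to use a tool or a stored procedure to delete it in segments (batches), such as:

delete from tb where addtime > xxxx and addtime < xxxx;

Statements like this delete the data one range at a time.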
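A minimal sketch of that batched, range-by-range deletion, again with sqlite3 standing in for MySQL. The table tb, the integer addtime column, the window size and the function name delete_in_batches are assumptions for illustration. It uses half-open windows (>= lower bound, < upper bound) so consecutive windows neither overlap nor leave gaps, and commits each window separately so every transaction stays small.

```python
import sqlite3


def delete_in_batches(conn, start, end, step):
    """Delete rows with start <= addtime < end, one window per transaction.

    Hypothetical schema: tb(addtime) with an index on addtime, so each
    window is a range seek rather than a full table scan."""
    total = 0
    lo = start
    while lo < end:
        hi = min(lo + step, end)
        with conn:  # commits this window before moving on to the next
            cur = conn.execute(
                "DELETE FROM tb WHERE addtime >= ? AND addtime < ?", (lo, hi)
            )
        total += cur.rowcount
        lo = hi
    return total


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tb (id INTEGER PRIMARY KEY, addtime INT)")
conn.execute("CREATE INDEX idx_addtime ON tb (addtime)")
conn.executemany("INSERT INTO tb (addtime) VALUES (?)", [(t,) for t in range(10000)])
conn.commit()

deleted = delete_in_batches(conn, 2000, 8000, 500)
remaining = conn.execute("SELECT COUNT(*) FROM tb").fetchone()[0]
print(deleted, remaining)
```

In MySQL the same loop would typically live in a script or stored procedure; the point is that each statement touches only one indexed range.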

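Indexes can speed up UPDATE, SELECT and DELETE, but adding indexes weighs considerably on write performance.

Do not update primary-key values. In production, avoid indexing columns that are updated frequently: updating a primary key reorganizes the table, and changing any indexed value forces the B-tree to re-sort its entries, which is very slow.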

Creating indexes and comparing

Rename the table created earlier, then create a new table with indexes:

mysql> rename table yw to yw_1;
ERROR 2006 (HY000): MySQL server has gone away
No connection. Trying to reconnect...
Connection id: 12
Current database: test1
Query OK, 0 rows affected (0.11 sec)

mysql> show tables;
+-----------------+
| Tables_in_test1 |
+-----------------+
| yw_1 |
+-----------------+
1 row in set (0.00 sec)

Create the new table:

create table yw (
  id int unsigned not null auto_increment,
  c1 int not null default 0,
  c2 int not null default 0,
  c3 int not null default 0,
  c4 int not null default 0,
  c5 timestamp not null,
  c6 varchar(200) not null default '',
  primary key(`id`),
  KEY `idx_c2`(`c2`),
  KEY `idx_c3`(`c3`)
);
Query OK, 0 rows affected (0.03 sec)

Check the table structure:

mysql> desc yw;
+-------+------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-------+------------------+------+-----+---------+----------------+
| id | int(10) unsigned | NO | PRI | NULL | auto_increment |
| c1 | int(11) | NO | | 0 | |
| c2 | int(11) | NO | MUL | 0 | |
| c3 | int(11) | NO | MUL | 0 | |
| c4 | int(11) | NO | | 0 | |
| c6 | varchar(200) | NO | | | |
+-------+------------------+------+-----+---------+----------------+
7 rows in set (0.07 sec)

[root@mysql_node1 test]# cat 2.sql
delimiter $$
drop procedure if exists `update_yw` $$
create procedure `update_yw`(in row_num int)
begin
  declare i int default 0;
  while i < row_num do
    update yw set c3 = floor(rand()*row_num) where id = i;
    set i = i+1;
  end while;
end$$
delimiter ;

Import it:

mysql> source /root/test/2.sql
Query OK, 0 rows affected (0.00 sec)

Insert 3 million rows again and check how long it takes:

mysql> call insert_yw(3000000);
Query OK, 1 row affected (8 min 11.57 sec)

Restore the table backed up earlier and run the test again; here one million rows are inserted:

mysql> rename table yw to yw_idx;
Query OK, 0 rows affected (0.06 sec)

mysql> rename table yw_1 to yw;
Query OK, 0 rows affected (0.01 sec)

mysql> show tables;
+-----------------+
| Tables_in_test1 |
+-----------------+
| yw |
| yw_idx |
+-----------------+
2 rows in set (0.00 sec)

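Now one table has indexes and the other does not.

Call the procedure again on the table structure that contains indexes:

mysql> call update_yw(3000000);
Query OK, 1 row affected (4 min 32.31 sec)

The comparison:

Table     Uses index   Elapsed time
yw        No           11.57 sec
yw_idx    Yes          32.31 sec

The two differ in speed by a factor of roughly 3.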

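If c3 has an index, then the following statement:

select c3 from yw where id=1;

is also slow while c3 is being updated, because every update of the third column has to maintain its index at the same time.

An index is also a B-tree structure: whenever something changes, the B-tree has to be updated and re-sorted, and that extra work is what makes the writes slow.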
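To make that maintenance cost concrete, here is a toy model in Python. It is an assumption for illustration, not how InnoDB is implemented: each secondary index is a sorted list of (value, id) pairs standing in for a B-tree, and the hypothetical Table class counts how many index entries each write touches. Updating an indexed column costs a removal plus a sorted re-insert in that index; updating an unindexed column costs nothing extra.

```python
from bisect import bisect_left, insort


class Table:
    """Toy table: rows keyed by id, plus one sorted list of (value, id) per indexed column."""

    def __init__(self, indexed_columns):
        self.rows = {}
        self.indexes = {col: [] for col in indexed_columns}
        self.index_writes = 0  # index entries inserted or removed so far

    def insert(self, row_id, row):
        self.rows[row_id] = dict(row)
        for col, entries in self.indexes.items():
            insort(entries, (row[col], row_id))  # keep each index sorted
            self.index_writes += 1

    def update(self, row_id, col, value):
        old = self.rows[row_id][col]
        self.rows[row_id][col] = value
        if col in self.indexes and old != value:
            entries = self.indexes[col]
            del entries[bisect_left(entries, (old, row_id))]  # remove the old key
            insort(entries, (value, row_id))                  # re-insert at its new position
            self.index_writes += 2


plain = Table([])               # like yw: no secondary indexes
indexed = Table(["c2", "c3"])   # like yw_idx: idx_c2 and idx_c3
for table in (plain, indexed):
    for i in range(100):
        table.insert(i, {"c2": i % 10, "c3": i})
    for i in range(100):
        table.update(i, "c3", 1000 - i)  # same kind of workload as update_yw: rewrite c3

print(plain.index_writes, indexed.index_writes)
```

The unindexed table does no index work at all, while the indexed one writes 200 entries on insert and another 200 to keep idx_c3 sorted during the updates.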

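What kinds of indexes does MySQL support?

Generally speaking, there are five kinds:

ordinary indexes, unique indexes, primary keys, composite indexes, and full-text indexes (full-text indexes on InnoDB are a MySQL 5.6 feature).

Third-party full-text search tool: Sphinx

Create an index:

create index idx_xxx on tb(xxx);

Add an index with ALTER TABLE:

alter table tb add index idx_xxx(xxx);

Drop an index:

DROP [ONLINE|OFFLINE] INDEX index_name ON tbl_name

Ordinary indexes include prefix indexes. If a column's values are long, you can index only their leading characters, as long as those characters are enough to tell the values apart; the shorter index means less I/O when it is scanned. For example:

alter table yw add index idx_c6_6(c6(6));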
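Whether a prefix is "enough to tell the values apart" can be measured as prefix selectivity: distinct prefixes divided by rows. The sketch below computes it in plain Python; the function names and the sample names list are hypothetical. Applied to this article's own test data, where c6 is repeat('wubx', k) for k from 0 to 19, a 6-character prefix can only take three distinct values ('', 'wubx', 'wubxwu'), so idx_c6_6 would discriminate poorly on that data.

```python
def prefix_selectivity(values, n):
    """Distinct n-character prefixes divided by the number of rows (1.0 = all distinct)."""
    if not values:
        return 0.0
    return len({v[:n] for v in values}) / len(values)


def shortest_full_prefix(values):
    """Smallest prefix length whose selectivity matches the whole column's (values non-empty)."""
    full = len(set(values)) / len(values)
    longest = max(len(v) for v in values)
    for n in range(1, longest + 1):
        if prefix_selectivity(values, n) >= full:
            return n
    return longest


names = ["alice01", "alice02", "bob", "bobby", "carol"]  # hypothetical sample values
for n in (1, 3, 5, 7):
    print(n, prefix_selectivity(names, n))
print("shortest prefix:", shortest_full_prefix(names))

# c6 in the test data above is repeat('wubx', k) for k in 0..19.
c6_values = ["wubx" * k for k in range(20)]
print("c6(6) selectivity:", prefix_selectivity(c6_values, 6))
```

In MySQL the same ratio can be estimated directly with count(distinct left(col, n)) / count(*) for a few candidate lengths.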

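Columns in an ordinary index may be NULL.

Unique indexes:

In schema design a unique index is a kind of constraint: it makes a column unique, or a combination of columns unique when the index is a composite one.

For example:

select * from table_name where idx_xxx=xxx;

With an ordinary index, after finding a matching entry the engine reads the next entry to check whether it is also a match, and returns it if so; an ordinary index therefore has to read a few more entries, which is its overhead.

A unique index only needs an equality match and does not keep reading, which saves I/O.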
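The difference can be modeled with a sorted list standing in for the leaf level of a B-tree (a simplification for illustration; index_lookup is a hypothetical name). An ordinary index must keep reading until it hits a key that no longer matches, while a unique index can stop at the first match.

```python
from bisect import bisect_left


def index_lookup(keys, key, unique):
    """Look up key in a sorted list of index keys (stand-in for B-tree leaf entries).

    Returns (matching positions, number of entries read)."""
    i = bisect_left(keys, key)  # descend to the first entry >= key
    matches, reads = [], 0
    while i < len(keys):
        reads += 1
        if keys[i] != key:
            break          # past the last match: stop
        matches.append(i)
        if unique:
            break          # a unique index can stop at the first match
        i += 1             # ordinary index: read on to check the next entry
    return matches, reads


ordinary = [1, 2, 2, 2, 3, 4]
unique = [1, 2, 3, 4]
print(index_lookup(ordinary, 2, unique=False))
print(index_lookup(unique, 2, unique=True))
print(index_lookup([1, 2, 3], 2, unique=False))
```

The last call shows the extra read even when an ordinary index holds just one match: it still has to look at the following entry to know it is done.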

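A unique index column may contain NULL; in MySQL a UNIQUE index permits multiple NULL values, not just one.

Primary keys

In InnoDB the primary key is the clustered index key: all rows are stored sorted by the primary key.

Primary key columns cannot be NULL.

Composite indexes (also called joint indexes), for example: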

select * from yw where c4=XXXX order by c3;

Use EXPLAIN to look at the execution plan:

mysql> explain select * from yw_idx where c3=251609 order by c4;
1 row in set (0.03 sec)

Actually executing the SQL is still slow. key_len is 4, yet the query remains slow: this plan is misleading, because possible_keys actually contains nothing, so watch out for this kind of deceptive output.
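One common way to let a single index serve both the c3 filter and the ORDER BY c4 sort is a composite index on (c3, c4). The sketch below compares the two layouts using sqlite3 as a stand-in for MySQL (an assumption for illustration: SQLite reports an extra sort step as "USE TEMP B-TREE FOR ORDER BY", roughly where MySQL's EXPLAIN would show "Using filesort"). The index names are hypothetical.

```python
import sqlite3


def plan(conn, sql, params=()):
    """Return SQLite's EXPLAIN QUERY PLAN detail text for a query as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(str(row[-1]) for row in rows)


def make_table(index_ddl):
    """Fresh in-memory table loosely shaped like yw_idx, with one given index."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE yw_idx (id INTEGER PRIMARY KEY, c3 INT, c4 INT, c6 TEXT)")
    conn.execute(index_ddl)
    return conn


query = "SELECT * FROM yw_idx WHERE c3 = ? ORDER BY c4"
single = plan(make_table("CREATE INDEX idx_c3 ON yw_idx (c3)"), query, (251609,))
composite = plan(make_table("CREATE INDEX idx_c3_c4 ON yw_idx (c3, c4)"), query, (251609,))

print(single)     # filter via idx_c3, then a separate sort on c4
print(composite)  # entries for one c3 value are already ordered by c4
```

Because a (c3, c4) index stores all entries for a given c3 value already sorted by c4, the engine can return rows in order without a separate sort.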

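Conditions in the WHERE clause

If the leftmost column of the index, c3, is used, can the c4 column still be used as well?

select * from yw where c3=xxx or c4=xxxx;

select * from yw where c3=xxx union all select * from yw where c4=xxxx;

As it turns out, it cannot: in this condition c3 can use the index but c4 cannot, so c4 needs a full table scan. If any part of the condition requires a full table scan, the whole query ends up doing one; this is a characteristic of MySQL.

If c3 and c4 are indexed independently, each with its own index, the indexes can be used.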
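The effect of giving c4 its own index can be seen in a plan. This sketch uses sqlite3 as a stand-in for MySQL (an assumption for illustration; SQLite evaluates such an OR by seeking each index separately, much as MySQL's index merge does, but the plan wording differs), with hypothetical index names.

```python
import sqlite3


def plan(conn, sql, params=()):
    """Return SQLite's EXPLAIN QUERY PLAN detail text for a query as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(str(row[-1]) for row in rows)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE yw (id INTEGER PRIMARY KEY, c3 INT, c4 INT)")
conn.execute("CREATE INDEX idx_c3 ON yw (c3)")

query = "SELECT * FROM yw WHERE c3 = ? OR c4 = ?"
only_c3 = plan(conn, query, (1, 2))  # c4 has no index, so the whole OR falls back to a scan
conn.execute("CREATE INDEX idx_c4 ON yw (c4)")
both = plan(conn, query, (1, 2))     # each OR branch can now seek its own index

print("only idx_c3:", only_c3)
print("both indexes:", both)
```

With only idx_c3, the index is not used at all, which matches the point above: one unindexed branch forces a full scan for the whole condition.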

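The following is an index use that makes little sense:

select count(*) from yw group by c3, c4;

With an independent index on each of the two columns, both indexes get used.

Using index merging

In MySQL 5.5 and later, UNION can be used to combine multiple indexes:

mysql> select * from yw where c3=xxx union all select * from yw where c4=xxxx;

With indexes on both c3 and c4, this SQL is very fast. Note that UNION ALL returns a row twice if it satisfies both conditions; use UNION when the result must match the OR version exactly.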
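The duplicate-row difference between the OR form, UNION ALL and UNION is easy to check on a tiny table. The sketch uses sqlite3 as a stand-in for MySQL; the rows are made-up sample data in which row 1 satisfies both conditions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE yw (id INTEGER PRIMARY KEY, c3 INT, c4 INT)")
# Row 1 matches both c3 = 5 and c4 = 7.
conn.executemany("INSERT INTO yw VALUES (?, ?, ?)", [(1, 5, 7), (2, 5, 9), (3, 6, 7), (4, 8, 8)])


def ids(sql):
    """Run a query and return the sorted ids it produced (duplicates kept)."""
    return sorted(row[0] for row in conn.execute(sql))


or_ids = ids("SELECT * FROM yw WHERE c3 = 5 OR c4 = 7")
union_all_ids = ids("SELECT * FROM yw WHERE c3 = 5 UNION ALL SELECT * FROM yw WHERE c4 = 7")
union_ids = ids("SELECT * FROM yw WHERE c3 = 5 UNION SELECT * FROM yw WHERE c4 = 7")
print(or_ids, union_all_ids, union_ids)
```

UNION removes the duplicate at the cost of a deduplication step; UNION ALL is cheaper and is fine when the two conditions cannot overlap.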

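Consider the following statement:

mysql> select count(*) from yw group by c3, c4;
2999999 rows in set (57.26 sec)

Although it uses an index, it still scans the whole table: the amount of I/O is so large that using the index hardly helps.

In an OLTP environment, any SQL whose result set exceeds 10,000 rows can be considered problematic, and you can consider killing it.

So it is best to keep result sets under 500 rows, which avoids large full table scans and excessive I/O.

Using LIMIT

Let's add limit 10 at the end and look at the effect:

mysql> explain select count(*) from yw_idx group by c3,c4 order by id limit 10;
1 row in set (0.00 sec)

It is no faster, because all of the groups still have to be built and sorted before LIMIT can take the first 10; SQL like this is also fairly common in production.

With a covering index, the requested data can be read from the index itself.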
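A covering index is one that contains every column a query needs, so the table rows never have to be read. This sketch shows the distinction in a plan using sqlite3 as a stand-in for MySQL (an assumption for illustration: SQLite says "USING COVERING INDEX", where MySQL's EXPLAIN shows "Using index" in the Extra column). The index name is hypothetical.

```python
import sqlite3


def plan(conn, sql, params=()):
    """Return SQLite's EXPLAIN QUERY PLAN detail text for a query as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(str(row[-1]) for row in rows)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE yw_idx (id INTEGER PRIMARY KEY, c3 INT, c4 INT, c6 TEXT)")
conn.execute("CREATE INDEX idx_c3_c4 ON yw_idx (c3, c4)")

covered = plan(conn, "SELECT c3, c4 FROM yw_idx WHERE c3 = ?", (1,))  # index holds c3 and c4
not_covered = plan(conn, "SELECT * FROM yw_idx WHERE c3 = ?", (1,))   # c6 requires the table row

print(covered)
print(not_covered)
```

Both queries seek the same index; only the SELECT * version has to go back to the table for c6.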

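In production, the set of queries run against any one table is usually small enough to enumerate, even though a business system, even a very small one, can be very complex.

For example, a full-text index:

Create fulltext index idx_xxx on TbName(xxxx);

select * from tb where match(xxxx) against('wubx');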

Things to watch out for when using indexes

First create a table:

mysql> create table tb_1 (
    -> id int unsigned not null auto_increment,
    -> c1 varchar(200) default null,
    -> c2 int not null,
    -> primary key (id)
    -> );
Query OK, 0 rows affected (0.06 sec)

Insert some data:

mysql> insert into tb_1(c1, c2) values(NULL,1),(1,2),(NULL,3);
Query OK, 3 rows affected (0.01 sec)
Records: 3 Duplicates: 0 Warnings: 0

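Count the values in each column:

mysql> select count(c1), count(*), count(1), count(c2), count(id) from tb_1;
+-----------+----------+----------+-----------+-----------+
| count(c1) | count(*) | count(1) | count(c2) | count(id) |
+-----------+----------+----------+-----------+-----------+
| 1 | 3 | 3 | 3 | 3 |
+-----------+----------+----------+-----------+-----------+
1 row in set (0.00 sec)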

mysql> select * from tb_1;
+----+------+----+
| id | c1 | c2 |
+----+------+----+
| 1 | NULL | 1 |
| 2 | 1 | 2 |
| 3 | NULL | 3 |
+----+------+----+
3 rows in set (0.00 sec)

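This shows that NULL values are not counted. Also, if a column is allowed to be NULL, each row needs extra space to flag whether the value is NULL, so a nullable column costs an extra marker.

So keep in mind:

1. COUNT(column) skips NULL values. Contrary to a common belief, however, MySQL indexes do store NULL entries, and col IS NULL lookups can use an index.

2. Columns in an ordinary index may contain NULL.

Choose the index with the highest selectivity:

Take a gender column (male/female) as an example: across tens of millions of rows it has only two distinct values, so its selectivity is tiny. The column with the highest selectivity in the table is something like the user ID, user_id: every user's ID is unique, so this column makes a good index because its selectivity is the highest. Also prefer short indexes: if the username is defined as varchar(32) but in practice the first 15 characters are enough to tell users apart, we can use:

create index idx_username on table_name (username(15));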
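Selectivity here means distinct values divided by total rows. A minimal sketch in plain Python, using a hypothetical two-valued gender column and a unique user_id column of the same length, shows why user_id is the better index candidate.

```python
def selectivity(values):
    """Distinct values divided by row count; 1.0 means every value is unique."""
    return len(set(values)) / len(values) if values else 0.0


n = 10_000
columns = {
    "gender": ["M" if i % 2 else "F" for i in range(n)],  # hypothetical two-valued column
    "user_id": list(range(1, n + 1)),                      # unique per user
}
scores = {name: selectivity(vals) for name, vals in columns.items()}
best = max(scores, key=scores.get)
print(scores, "->", best)
```

On a real table the same figure can be estimated in MySQL with count(distinct col) / count(*).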

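Using LIKE in queries

For example:

like '%aaa%'   # a pattern like this cannot use an index

With an index idx_c6(c6), on the other hand, the condition

where c6 like 'av%'

can use the index. In short: a LIKE pattern with % at both the beginning and the end cannot use an index, while one that starts with literal characters and ends with % can.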
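The reason is that a leading literal turns LIKE 'av%' into a range on the sorted index, roughly 'av' <= c6 < 'aw', while '%av%' gives no range to seek. The sketch below models this with bisect over a sorted list. It assumes a non-empty prefix without wildcards and plain code-point ordering; real collations (for example case-insensitive ones) make the range bounds more involved. Function names and data are hypothetical.

```python
from bisect import bisect_left


def like_prefix(sorted_values, prefix):
    """Emulate col LIKE 'prefix%' on a sorted index as a range seek.

    Matches are the values v with prefix <= v < upper, where upper is the
    prefix with its last character incremented."""
    if not prefix:
        raise ValueError("prefix must be non-empty")
    upper = prefix[:-1] + chr(ord(prefix[-1]) + 1)
    lo = bisect_left(sorted_values, prefix)
    hi = bisect_left(sorted_values, upper)
    return sorted_values[lo:hi]


def like_contains(values, fragment):
    """Emulate col LIKE '%fragment%': there is no range to seek, so every value is checked."""
    return [v for v in values if fragment in v]


c6 = sorted(["aa", "av", "avb", "avz", "aw", "b", "xavy"])
print(like_prefix(c6, "av"))
print(like_contains(c6, "av"))
```

like_prefix touches only the slice between two binary searches, whereas like_contains has to examine every entry, which is the index-versus-scan difference described above.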

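Note: if there is already a column with very high selectivity, such as user_id, nothing more is needed, so combining a high-selectivity column with other columns in one index is not recommended. Combining them is appropriate only for special cases such as a covering index, because that avoids going back to the table.

Don't perform calculations on columns

After sorting, the final rows still have to be fetched: for ORDER BY, GROUP BY, SELECT * and similar statements, whenever the index does not contain all the needed data, the engine has to go back to the table (a lookup by primary key).

This is especially true of SELECT *: unless an index covers every column, it always has to go back to the table.

For example:

select * from users where YEAR(adddate) < 2007;

where adddate is a timestamp column.

Most SQL like this cannot use an index. Rewriting it as a comparison on the time value itself lets the index be used (2007 is passed in as a parameter anyway, so the boundary can be computed before the query runs):

select * from users where adddate < '2007-01-01 00:00:00';

select * from tb where addtime > '2000-01-01 00:00:00' and addtime < '2014-XX-XX XX:XX:XX';

Narrowing it to a smaller time range like this makes the index more useful.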
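The rewrite can be checked for both plan and result. This sketch uses sqlite3 as a stand-in for MySQL, with SQLite's strftime('%Y', ...) playing the role of MySQL's YEAR(); the users rows, the extra name column (added so that SELECT * is not covered by the index) and the index name are assumptions for illustration.

```python
import sqlite3


def plan(conn, sql, params=()):
    """Return SQLite's EXPLAIN QUERY PLAN detail text for a query as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(str(row[-1]) for row in rows)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, adddate TEXT)")
conn.execute("CREATE INDEX idx_adddate ON users (adddate)")
conn.executemany(
    "INSERT INTO users (name, adddate) VALUES (?, ?)",
    [("a", "2005-03-01 10:00:00"), ("b", "2006-12-31 23:59:59"),
     ("c", "2007-01-01 00:00:00"), ("d", "2010-06-15 08:30:00")],
)

# A function wrapped around the column hides it from the index.
by_func = "SELECT * FROM users WHERE CAST(strftime('%Y', adddate) AS INTEGER) < ?"
# The same condition as a range on the bare column.
by_range = "SELECT * FROM users WHERE adddate < ?"

year = 2007
boundary = f"{year:04d}-01-01 00:00:00"  # computed once, before the query runs
func_plan = plan(conn, by_func, (year,))
range_plan = plan(conn, by_range, (boundary,))
func_rows = conn.execute(by_func, (year,)).fetchall()
range_rows = conn.execute(by_range, (boundary,)).fetchall()
print(func_plan)
print(range_plan)
```

Both forms return the same two rows, but only the range form can seek idx_adddate.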

Examples of bad SQL:

Using not-equal, for example id != 1:

select * from yw where id!=1;

This means: return every row whose id is not 1.

mysql> explain select * from yw_idx where id!=1;
1 row in set (0.08 sec)

This amounts to a full table scan.

Look at the number of scanned rows: EXPLAIN has a rows column, and its value shows that nearly the whole table is processed.

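There is another case: add LIMIT and look at the effect:

mysql> select * from yw where id!=1 limit 1;
+----+---------+--------+---------+--------+---------------------+--------------+
| id | c1 | c2 | c3 | c4 | c5 | c6 |
+----+---------+--------+---------+--------+---------------------+--------------+
| 2 | 2333997 | 269341 | 2459005 | 915557 | 2014-09-24 15:38:29 | wubxwubxwubx |
+----+---------+--------+---------+--------+---------------------+--------------+
1 row in set (0.00 sec)

Compare these two:

select * from yw where id!=1;

select * from yw where id!=1 limit 1;

The plan looks unchanged, with just as many rows, but LIMIT produces its output in a streaming fashion: each row read goes into a buffer, which is sorted once it holds enough rows, and as soon as the condition is satisfied no more rows are matched.

With LIMIT, therefore, the query is much faster; in this situation the index may well be used, which is another scenario where indexes help.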
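The early stop can be modeled with a lazy generator in Python: rows are pulled one at a time, and reading ends once LIMIT has enough matches. This is a simplified model for illustration (select_not_equal is a hypothetical name); it does not reproduce MySQL's buffering or sorting.

```python
from itertools import islice


def select_not_equal(rows, excluded_id, limit=None):
    """Return rows whose id != excluded_id, plus how many rows had to be read.

    Rows are pulled lazily, so with a limit the scan stops as soon as enough
    matches have been produced."""
    read = 0

    def scan():
        nonlocal read
        for row in rows:
            read += 1
            yield row

    matches = (row for row in scan() if row["id"] != excluded_id)
    result = list(matches if limit is None else islice(matches, limit))
    return result, read


table = [{"id": i} for i in range(1, 100_001)]
everything, read_all = select_not_equal(table, 1)
first, read_first = select_not_equal(table, 1, limit=1)
print(len(everything), read_all)  # every row is read
print(first, read_first)          # stops right after the first match (id 2)
```

Without a limit all 100,000 rows are read; with limit 1 only two are, because the first row (id 1) is rejected and the second one satisfies the request.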

Using NOT IN

NOT IN selects the records whose values fall outside a given set:

mysql> select * from yw_idx where c2 not in(4262384,3605632);

mysql> explain select * from yw_idx where c2 not in(4262384,3605632);
1 row in set (0.10 sec)

Optimizing it:

Generally, NOT IN should not be used directly like this; it is a pathological SQL pattern.

When optimizing, you can add a LIMIT to reduce I/O.

If the LIMIT result is still large, or the result is otherwise unsatisfactory, rewrite it as a LEFT JOIN: join on the primary key id to a derived table b, and keep only the rows where b.id is null:

mysql> select * from yw a left join (select id from yw where c2 in (4262384, 3605632)) b on a.id=b.id where b.id is null limit 10;

+----+---------+---------+---------+---------+---------------------+------------------------------------------------------+------+
| id | c1 | c2 | c3 | c4 | c5 | c6 | id |
+----+---------+---------+---------+---------+---------------------+------------------------------------------------------+------+
| 1 | 463681 | 1098981 | 1817518 | 2222359 | 2014-09-24 15:38:29 | wubxwubxwubxwubxwubxwubxwubxwubxwubxwubxwubxwubx | NULL |
| 2 | 2333997 | 269341 | 2459005 | 915557 | 2014-09-24 15:38:29 | wubxwubxwubx | NULL |
| 3 | 2971523 | 1226698 | 842469 | 414525 | 2014-09-24 15:38:29 | wubxwubxwubxwubxwubxwubxwubxwubxwubx | NULL |
| 4 | 2835700 | 930937 | 2835332 | 1945110 | 2014-09-24 15:38:29 | wubx | NULL |
| 5 | 1578655 | 1044887 | 2649255 | 2307696 | 2014-09-24 15:38:29 | wubxwubxwubxwubxwubxwubxwubx | NULL |
| 6 | 1442242 | 992011 | 1740281 | 190626 | 2014-09-24 15:38:29 | wubxwubxwubxwubxwubxwubxwubxwubxwubxwubxwubxwubxwubx | NULL |
| 7 | 693798 | 309586 | 753637 | 2403923 | 2014-09-24 15:38:29 | wubxwubxwubxwubxwubxwubxwubxwubxwubxwubx | NULL |
| 8 | 888272 | 2581335 | 1547343 | 1465295 | 2014-09-24 15:38:29 | wubxwubxwubxwubx | NULL |
| 9 | 1608599 | 240304 | 2475805 | 2157717 | 2014-09-24 15:38:29 | wubxwubxwubxwubx | NULL |
| 10 | 2833881 | 185188 | 1736996 | 565924 | 2014-09-24 15:38:29 | wubxwubxwubxwubxwubxwubxwubxwubxwubxwubx | NULL |
+----+---------+---------+---------+---------+---------------------+------------------------------------------------------+------+
10 rows in set (17.04 sec)

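Switching to sequential I/O

Another option is to read the first few rows using sequential I/O:

simply ORDER BY id and take only the first 10 rows; if those 10 rows already meet the need, return them and do no further work.

mysql> select * from yw where c2 not in(4262384,3605632) order by id limit 10;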

+----+---------+---------+---------+---------+---------------------+------------------------------------------------------+
| id | c1 | c2 | c3 | c4 | c5 | c6 |
+----+---------+---------+---------+---------+---------------------+------------------------------------------------------+
| 1 | 463681 | 1098981 | 1817518 | 2222359 | 2014-09-24 15:38:29 | wubxwubxwubxwubxwubxwubxwubxwubxwubxwubxwubxwubx |
| 2 | 2333997 | 269341 | 2459005 | 915557 | 2014-09-24 15:38:29 | wubxwubxwubx |
| 3 | 2971523 | 1226698 | 842469 | 414525 | 2014-09-24 15:38:29 | wubxwubxwubxwubxwubxwubxwubxwubxwubx |
| 4 | 2835700 | 930937 | 2835332 | 1945110 | 2014-09-24 15:38:29 | wubx |
| 5 | 1578655 | 1044887 | 2649255 | 2307696 | 2014-09-24 15:38:29 | wubxwubxwubxwubxwubxwubxwubx |
| 6 | 1442242 | 992011 | 1740281 | 190626 | 2014-09-24 15:38:29 | wubxwubxwubxwubxwubxwubxwubxwubxwubxwubxwubxwubxwubx |
| 7 | 693798 | 309586 | 753637 | 2403923 | 2014-09-24 15:38:29 | wubxwubxwubxwubxwubxwubxwubxwubxwubxwubx |
| 8 | 888272 | 2581335 | 1547343 | 1465295 | 2014-09-24 15:38:29 | wubxwubxwubxwubx |
| 9 | 1608599 | 240304 | 2475805 | 2157717 | 2014-09-24 15:38:29 | wubxwubxwubxwubx |
| 10 | 2833881 | 185188 | 1736996 | 565924 | 2014-09-24 15:38:29 | wubxwubxwubxwubxwubxwubxwubxwubxwubxwubx |
+----+---------+---------+---------+---------+---------------------+------------------------------------------------------+
10 rows in set (0.02 sec)

For statements with large result sets, add a LIMIT first, then look for ways to optimize them:

select * from yw a where a.id not in (select id from yw where id<100000) limit 100;

select * from yw a left join (select id from yw where id<100000) b on a.id = b.id where b.id is null limit 100;
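The two forms above are equivalent here because the subquery returns primary-key ids, which are never NULL (with a nullable column, NOT IN returns no rows at all once the subquery yields a NULL, while the LEFT JOIN form does not). The sketch below checks the equivalence with sqlite3 as a stand-in for MySQL, scaled down to id < 100, and adds ORDER BY so that LIMIT picks a deterministic set of rows; the data is made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE yw (id INTEGER PRIMARY KEY, c2 INT)")
conn.executemany("INSERT INTO yw (id, c2) VALUES (?, ?)", [(i, i % 7) for i in range(1, 301)])

not_in_sql = """
    SELECT a.id FROM yw a
    WHERE a.id NOT IN (SELECT id FROM yw WHERE id < 100)
    ORDER BY a.id LIMIT 100
"""
anti_join_sql = """
    SELECT a.id FROM yw a
    LEFT JOIN (SELECT id FROM yw WHERE id < 100) b ON a.id = b.id
    WHERE b.id IS NULL
    ORDER BY a.id LIMIT 100
"""
not_in_rows = conn.execute(not_in_sql).fetchall()
anti_join_rows = conn.execute(anti_join_sql).fetchall()
print(len(not_in_rows), not_in_rows[0], not_in_rows[-1])
```

Both queries return ids 100 through 199; the LEFT JOIN ... IS NULL form (an anti-join) keeps a row exactly when no partner was found in b.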

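Summary

What kinds of indexes does MySQL support, and how do they differ?

MySQL supports ordinary indexes, composite indexes, unique indexes, primary keys, full-text indexes, and prefix indexes (a kind of ordinary index).

Unique index: acts as a constraint, and when reading it does not go on to read further entries.

Primary key: in InnoDB it serves as the clustered key.

Short index: a short index is a prefix index, which is a kind of ordinary index.

When a table has an int index and a varchar index, which index will a query use (the int index or the varchar index first)?

This comes down to understanding the cost of index lookups.

For example:

select * from tb where c1_int=xxx and c2_varchar=xxx;

with both idx_c1 and idx_c2 present.

Suppose the overall query cost is 1, the cost via the int index is 0.6, and the cost via the varchar index is 0.7.

With the varchar index at 0.7, MySQL could choose either index;

since both costs are below the query cost of 1, either one is acceptable;

which one is actually chosen then depends entirely on MySQL's mechanism: MySQL picks them in order.

For example:

create index varchar

if the varchar index is created first, MySQL will choose the varchar index, and so on.

So the choice follows that order; this is how MySQL works.

Overall framework for optimization

Core concept: reduce I/O

In practice: control the size of the result set and aim to finish within 1 second

To reduce I/O, consider whether sequential I/O can be used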
