Alibaba interviewer: How do you design MySQL indexes to be more efficient?

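Heart and substance: search 【三太子敖丙】 on WeChat to follow this programmer who is a little different.
This article is included in GitHub https://github.com/JavaFamily, along with complete interview topics and materials from top-tier companies and my article series.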

Preface

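The database series has come far enough that I think everyone has a rough grasp of the concepts by now. While reading comments this week I found a reader's question I thought was interesting: "Shuai Bing, how do you design an index? How do you all design indexes? How do you make them more efficient?"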

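My first thought was: I have written about indexes plenty, surely readers know this already. But when I looked back, it was true: I had only covered what indexes are and their pros and cons, never how to design them. So this article was born.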

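This article repeats several concepts covered before, but that is so you can better understand the principles behind the main kinds of index design in MySQL.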

Main text

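As we know, an index is a tree structure (in InnoDB, a B+Tree whose leaf pages are also chained together by a linked list) that makes data retrieval fast. Almost every database implements indexes: MySQL uses B+Tree indexes, MongoDB uses B-tree indexes, and so on.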

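In day-to-day development, whether an index is well designed determines how efficiently the SQL behind an API runs. An efficient index lowers the API's response time and also lowers cost. The chain we are after is: index design -> lower API response time -> smaller server specs -> lower cost. Everything ultimately comes down to cost, because cost is what the boss cares about most.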

Today let's talk about indexes in MySQL and how to design and use them so that they lower API response time (RT) and improve the user experience.

Indexes in MySQL

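MySQL's InnoDB engine stores indexes as B+Trees, which keeps the number of disk I/Os per lookup as small as possible. The height of the tree directly affects query performance, and it usually stays at 3 to 4 levels.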

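A B+Tree consists of three parts: the root, branch nodes, and leaf nodes. The root and branch nodes store no row data, only keys and pointers to child pages; all data is stored in the leaf nodes, and neighboring leaf nodes are linked by a doubly linked list.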

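In this structure, each leaf node consists of three parts: a previous pointer p_prev, the data, and a next pointer p_next. The data is kept sorted (ascending, ASC, by default), so keys further to the right in the B+Tree are always greater than those on the left. Every leaf is the same distance from the root, so reaching any leaf node costs the same number of I/Os: the height of the index tree, i.e. Level + 1 I/O operations (Level being the root's level, with leaves at level 0).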
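As a rough illustration of why the height stays at 3 to 4 levels, here is a back-of-the-envelope estimate in Python. The inputs are assumptions, not measurements: 16 KB pages (InnoDB's default innodb_page_size), about 14 bytes per non-leaf entry (an 8-byte BIGINT key plus a 6-byte child page pointer), and 1 KB rows. Page headers and fill factor are ignored, so real capacity is lower.

```python
PAGE_SIZE = 16 * 1024  # default innodb_page_size, in bytes


def max_rows(height, key_bytes=8, pointer_bytes=6, row_bytes=1024, page_size=PAGE_SIZE):
    """Rough upper bound on the rows a B+Tree of `height` levels can hold.

    Assumes every page is completely full and ignores page headers, so the
    result only shows the order of magnitude.
    """
    fanout = page_size // (key_bytes + pointer_bytes)  # child pointers per non-leaf page
    rows_per_leaf = page_size // row_bytes              # rows per leaf page
    return fanout ** (height - 1) * rows_per_leaf


for h in (1, 2, 3, 4):
    print(h, max_rows(h))
```

Under these assumptions a 3-level tree already holds about 21.9 million rows, and each extra level multiplies that by roughly 1170.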

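We can think of an index in MySQL as a small table that takes up disk space. Building an index is essentially sorting by the indexed columns: the sort happens in a sort buffer (for InnoDB index builds its size is set by innodb_sort_buffer_size), and if the data is too large to fit, temporary files are used. Most importantly, because an index is already sorted, it lets queries avoid sort operations (DISTINCT, GROUP BY, ORDER BY).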
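The buffer-then-spill idea can be sketched in Python as an external merge sort: sort chunks that fit in a fixed-size buffer, then merge the sorted runs. This is a simplification, not InnoDB's code: runs are kept in memory instead of temporary files, and `buffer_entries` is a hypothetical stand-in for the byte size of the sort buffer.

```python
import heapq


def build_index(entries, buffer_entries):
    """Sort (key, id) entries with a bounded buffer, like an external merge sort.

    Each chunk of `buffer_entries` entries is sorted on its own (one "run");
    the runs are then merged. A real engine would write each run to a temp file.
    """
    runs = []
    for start in range(0, len(entries), buffer_entries):
        runs.append(sorted(entries[start:start + buffer_entries]))
    return list(heapq.merge(*runs)), len(runs)


rows = [("Lccc", 3), ("Laaa", 7), ("Lbbb", 1), ("Laaa", 2), ("Lddd", 5)]
index, run_count = build_index(rows, buffer_entries=2)
print(index, run_count)
```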

Clustered index

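InnoDB tables in MySQL are IOTs (index-organized tables): rows are stored in primary key id order (logically contiguous, not physically contiguous), and the primary key is the clustered index, which stores the entire row. If no primary key is defined explicitly, InnoDB uses the first UNIQUE index whose columns are all NOT NULL as the clustered index; if there is none, it generates a hidden 6-byte row ID (DB_ROW_ID) and clusters the table on that. For example, in the table users(id, user_id, user_name, phone, primary key(id)), id is the clustered index and stores the whole row: id, user_id, user_name, phone.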

Secondary index

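A secondary index stores the primary key id in addition to the indexed columns. So the index idx_user_name(user_name) is effectively idx_user_name(user_name, id): MySQL automatically appends the primary key id to the end of every secondary index. Anyone familiar with Oracle knows that an Oracle index likewise stores, besides the indexed columns, a ROWID (the row's physical location, made up of four parts: data object number + data file number + data block number + row number). When creating a secondary index in MySQL we can also add the primary key id explicitly: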
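A toy model of this point in Python: if the secondary index is viewed as a sorted list of (user_name, id) tuples, equal user_name values are already ordered by id, so declaring the index as (user_name) or (user_name, id) yields the same structure. The sample rows and the `lookup` helper are hypothetical, for illustration only.

```python
import bisect

INF = float("inf")

# Clustered index: primary key id -> full row (hypothetical sample data)
users = {
    100101: {"user_id": 9001, "user_name": "Laaa", "phone": "138-0000"},
    100102: {"user_id": 9002, "user_name": "Lbbb", "phone": "138-0001"},
    100100: {"user_id": 9003, "user_name": "Laaa", "phone": "138-0002"},
}

# Secondary index idx_user_name: (user_name, id) entries kept in sorted order
idx_user_name = sorted((row["user_name"], pk) for pk, row in users.items())


def lookup(index, name):
    """Binary-search the index and return the primary keys whose user_name equals `name`."""
    lo = bisect.bisect_left(index, (name,))
    hi = bisect.bisect_right(index, (name, INF))
    return [pk for _, pk in index[lo:hi]]


# Table lookup: fetch columns the index does not have through the clustered index
print([users[pk]["phone"] for pk in lookup(idx_user_name, "Laaa")])
```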

-- Create an index on the user_name column
mysql> create index idx_user_name on users(user_name);
-- Create an index with the primary key id added explicitly
mysql> create index idx_user_name_id on users(user_name,id);
-- Compare the metadata of the two indexes
mysql> select a.space as tbl_spaceid, a.table_id, a.name as table_name, row_format, space_type,  b.index_id , b.name as index_name, n_fields, page_no, b.type as index_type  from information_schema.INNODB_TABLES a left join information_schema.INNODB_INDEXES b  on a.table_id =b.table_id where a.name = 'test/users';
+-------------+----------+------------+------------+------------+----------+------------------+----------+---------+------------+
| tbl_spaceid | table_id | table_name | row_format | space_type | index_id | index_name       | n_fields | page_no | index_type |
+-------------+----------+------------+------------+------------+----------+------------------+----------+---------+------------+
|         518 |     1586 | test/users | Dynamic    | Single     |     1254 | PRIMARY          |        9 |       4 |          3 |
|         518 |     1586 | test/users | Dynamic    | Single     |     4003 | idx_user_name    |        2 |       5 |          0 |
|         518 |     1586 | test/users | Dynamic    | Single     |     4004 | idx_user_name_id |        2 |      45 |          0 |
+-------------+----------+------------+------------+------------+----------+------------------+----------+---------+------------+
mysql> select index_name, last_update, stat_name, stat_value, stat_description from mysql.innodb_index_stats where index_name in ('idx_user_name','idx_user_name_id');
+------------------+---------------------+--------------+------------+-----------------------------------+
| index_name       | last_update         | stat_name    | stat_value | stat_description                  |
+------------------+---------------------+--------------+------------+-----------------------------------+
| idx_user_name    | 2021-01-02 17:14:48 | n_leaf_pages |       1358 | Number of leaf pages in the index |
| idx_user_name    | 2021-01-02 17:14:48 | size         |       1572 | Number of pages in the index      |
| idx_user_name_id | 2021-01-02 17:14:48 | n_leaf_pages |       1358 | Number of leaf pages in the index |
| idx_user_name_id | 2021-01-02 17:14:48 | size         |       1572 | Number of pages in the index      |
+------------------+---------------------+--------------+------------+-----------------------------------+

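Compare the two indexes: n_fields is the number of columns in the index, n_leaf_pages the number of leaf pages, and size the total number of pages in the index. The numbers show that the secondary index does contain the primary key id, and that the two indexes are identical: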

Index_name	n_fields	n_leaf_pages	size
idx_user_name	2	1358	1572
idx_user_name_id	2	1358	1572

Table lookups from an index

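We have just shown that a secondary index contains the primary key id. When data is filtered through a secondary index column, a lookup back to the table may be needed. For example, the business needs to query user information from the users table by user_name, and the API's SQL is: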

select  user_id, user_name, phone from users where user_name = 'Laaa';

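As we know, the index idx_user_name is really a small table idx_user_name(user_name, id). If a query selects only columns that are in the index, scanning the index alone returns the data and no table lookup is needed, as in these SQL statements: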

SQL 1: select id, user_name from users where user_name = 'Laaa';

SQL 2: select id from users where user_name = 'Laaa';

mysql> explain select id, user_name from users where user_name = 'Laaa';
+----+-------------+-------+------------+------+---------------+---------------+---------+-------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key           | key_len | ref   | rows | filtered | Extra       |
+----+-------------+-------+------------+------+---------------+---------------+---------+-------+------+----------+-------------+
|  1 | SIMPLE      | users | NULL       | ref  | idx_user_name | idx_user_name | 82      | const |    1 |   100.00 | Using index |
+----+-------------+-------+------------+------+---------------+---------------+---------+-------+------+----------+-------------+
mysql> explain select id from users where user_name = 'Laaa';
+----+-------------+-------+------------+------+---------------+---------------+---------+-------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key           | key_len | ref   | rows | filtered | Extra       |
+----+-------------+-------+------------+------+---------------+---------------+---------+-------+------+----------+-------------+
|  1 | SIMPLE      | users | NULL       | ref  | idx_user_name | idx_user_name | 82      | const |    1 |   100.00 | Using index |
+----+-------------+-------+------------+------+---------------+---------------+---------+-------+------+----------+-------------+

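In the execution plans of SQL 1 and SQL 2, Extra = Using index means a covering index scan is used and no table lookup is needed. Now back to the business SQL above: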

select user_id, user_name, phone from users where user_name = 'Laaa';

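The user_id and phone columns in the select list are not in idx_user_name, so MySQL has to look the row up in the table through the primary key id. Internally this is processed in two steps: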

Step 1: select id from users where user_name = 'Laaa';  -- returns id = 100101

Step 2: select user_id, user_name, phone from users where id = 100101;

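Step 2 is called a table lookup: using the primary key id stored in the secondary index to fetch the row from the original table (the clustered index).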
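The covering rule above can be written as a small set check. `needs_table_lookup` is a hypothetical helper for reasoning about queries, not a MySQL API; it treats a secondary index as its declared columns plus the primary key and ignores special cases such as prefix indexes, which can never cover a query.

```python
def needs_table_lookup(referenced_columns, index_columns, primary_key=("id",)):
    """True if a query must go back to the clustered index for some column.

    An InnoDB secondary index implicitly stores the primary key, so a query is
    covered when every column it references is an index column or a PK column.
    """
    available = set(index_columns) | set(primary_key)
    return not set(referenced_columns) <= available


idx_user_name = ["user_name"]
print(needs_table_lookup(["id", "user_name"], idx_user_name))               # SQL 1
print(needs_table_lookup(["user_id", "user_name", "phone"], idx_user_name))  # business SQL
```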

Index height

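MySQL indexes are B+Trees, so even with hundreds of millions of rows in a table, the index height stays low, usually around 3 to 4 levels. Let's work out the height of the index idx_user_name. From the metadata above, index_id = 4003 and page_no = 5 (the index's root page). PAGE_LEVEL sits 64 bytes into a page (after the 38-byte file header, at offset 26 of the page header), so its offset in the file is page_no x innodb_page_size + 64 = 5 x 16384 + 64 = 81984. We can inspect it with hexdump: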

$hexdump -s 81984 -n 10 /usr/local/var/mysql/test/users.ibd
0014040 00 02 00 00 00 00 00 00 0f a3                  
001404a

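The first two bytes, 00 02, are the root page's PAGE_LEVEL, i.e. level 2. Leaf pages are level 0, so idx_user_name has a height of 2 + 1 = 3 levels, in line with the 3 to 4 levels mentioned above. The last two bytes, 0f a3, end the 8-byte index id: 0x0fa3 is 4003 in decimal, exactly the index_id.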
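The offset arithmetic and the byte decoding can be checked in a few lines of Python. The layout constants are assumptions based on InnoDB's on-disk page format: a 38-byte FIL header, then the index page header, in which PAGE_LEVEL is a 2-byte big-endian integer at offset 26, immediately followed by the 8-byte PAGE_INDEX_ID; the page size is the 16 KB default.

```python
import struct

PAGE_SIZE = 16384        # default innodb_page_size
FIL_HEADER_SIZE = 38     # file header at the start of every page
PAGE_LEVEL_OFFSET = 26   # PAGE_LEVEL within the index page header


def page_level_offset(page_no, page_size=PAGE_SIZE):
    """Byte offset of PAGE_LEVEL for page `page_no` in the .ibd file."""
    return page_no * page_size + FIL_HEADER_SIZE + PAGE_LEVEL_OFFSET


def decode(raw):
    """Decode the 10 bytes read there: PAGE_LEVEL (2 bytes) + PAGE_INDEX_ID (8 bytes)."""
    level, index_id = struct.unpack(">HQ", raw)
    return level, index_id, level + 1  # leaves are level 0, so height = root level + 1


raw = bytes.fromhex("00 02 00 00 00 00 00 00 0f a3")  # the bytes printed by hexdump
print(page_level_offset(5), decode(raw))
```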

Data access methods

Full table scan

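Scans the whole B+Tree from left to right to read every row of the table. It has a large I/O cost, is slow, causes heavy lock waits, and hurts MySQL concurrency.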

For OLAP workloads, which scan and return large amounts of data, a full table scan is more efficient thanks to its sequential I/O.

Index scan

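An index is usually smaller than its table, so an index scan reads less data, consumes less I/O, runs faster, causes almost no lock waits, and improves MySQL concurrency.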

For OLTP systems, the ideal is for every SQL statement to hit a suitable index.

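The main differences are the amount of data scanned and the I/O pattern: a full table scan does sequential I/O, while an index scan that has to look rows up in the table does random I/O. MySQL has features that reduce this cost, such as Multi-Range Read (MRR), which sorts the primary keys collected from the index before the table lookups so that the reads become closer to sequential. (The change buffer, by contrast, reduces random I/O on writes to secondary indexes.)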

Index optimization cases

Optimizing paginated queries

The business needs to query transaction records within a time range. The API's original SQL is:

select  * from trade_info where status = 0 and create_time >= '2020-10-01 00:00:00' and create_time <= '2020-10-07 23:59:59' order by id desc limit 102120, 20;

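Table trade_info has the index idx_status_create_time(status, create_time), which, as shown above, is equivalent to (status, create_time, id). With typical limit m, n pagination, the further you page, the slower it gets: the larger m is, the more rows must be scanned and thrown away to reach position m, so the I/O cost keeps growing. We can optimize this with a covering scan of the secondary index: first fetch only the ids, which is a covering index scan with no table lookups, then join those ids back to trade_info. The rewritten SQL is: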

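select a.* from trade_info a,
  (select id from trade_info where status = 0 and create_time >= '2020-10-01 00:00:00' and create_time <= '2020-10-07 23:59:59' order by id desc limit 102120, 20) as b  -- covering index scan, no table lookup
where a.id = b.id
order by a.id desc;  -- the join does not guarantee the subquery's order, so sort the 20 rows again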

Many people know this rewrite is faster without knowing why. Understanding how indexes work is essential for writing high-quality SQL.
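To check that the deferred-join rewrite returns the same page as the original query, here is a small Python script that uses the standard-library sqlite3 module as a stand-in for MySQL. It verifies only result equivalence on made-up data (the table layout and values are hypothetical); the speedup comes from MySQL's covering index scan and is not measured here.

```python
import sqlite3

WHERE = "status = 0 AND create_time >= ? AND create_time <= ?"
RANGE = ("2020-10-01 00:00:00", "2020-10-07 23:59:59")


def make_db(n=500):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE trade_info (id INTEGER PRIMARY KEY, status INT, "
                 "create_time TEXT, amount INT)")
    conn.execute("CREATE INDEX idx_status_create_time ON trade_info(status, create_time)")
    rows = [(i, i % 2, f"2020-10-{1 + i % 7:02d} 12:00:00", i * 10) for i in range(1, n + 1)]
    conn.executemany("INSERT INTO trade_info VALUES (?, ?, ?, ?)", rows)
    return conn


def page_direct(conn, offset, size):
    sql = f"SELECT * FROM trade_info WHERE {WHERE} ORDER BY id DESC LIMIT ?, ?"
    return conn.execute(sql, (*RANGE, offset, size)).fetchall()


def page_deferred(conn, offset, size):
    # Inner query needs only (status, create_time, id); the outer join fetches full rows.
    sql = (f"SELECT a.* FROM trade_info a JOIN (SELECT id FROM trade_info WHERE {WHERE} "
           "ORDER BY id DESC LIMIT ?, ?) b ON a.id = b.id ORDER BY a.id DESC")
    return conn.execute(sql, (*RANGE, offset, size)).fetchall()


conn = make_db()
print(page_direct(conn, 100, 20) == page_deferred(conn, 100, 20))
```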

Divide and conquer always pays off

A marketing system needs to invalidate a batch of expired coupons. The core SQL is:

-- About 5 million rows need to be updated
update coupons set status = 1 where status =0 and create_time >= '2020-10-01 00:00:00' and create_time <= '2020-10-07 23:59:59';

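In Oracle, updating 5 million rows is fast because the work can be spread across multiple CPU cores. MySQL needs more care: a single SQL statement is processed on only one CPU core. If the statement is complex or slow, it blocks the requests queued behind it, the number of active connections soars, MySQL CPU hits 100%, and the corresponding APIs time out. It gets worse with a master-slave replication setup and read/write splitting: if the update takes 5 minutes on the master, replaying the binlog on the slave also takes 5 minutes, so the slave lags by 5 minutes. During that window the business reads stale data, which can cause problems such as duplicate orders.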

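The optimization: first get the minimum and maximum id matching the WHERE condition, then update in batches of 1000 rows. Each batch is a short transaction, so the update completes quickly without blocking other requests, and the slave can keep up instead of falling minutes behind.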

The optimized steps:

First get the minimum and maximum id in the range to be updated (the table has no physical deletes, so the ids are contiguous):

mysql> explain select min(id) min_id, max(id) max_id from coupons where status =0 and create_time >= '2020-10-01 00:00:00' and create_time <= '2020-10-07 23:59:59';
+----+-------------+---------+------------+-------+------------------------+------------------------+---------+------+--------+----------+--------------------------+
| id | select_type | table   | partitions | type  | possible_keys          | key                    | key_len | ref  | rows   | filtered | Extra                    |
+----+-------------+---------+------------+-------+------------------------+------------------------+---------+------+--------+----------+--------------------------+
|  1 | SIMPLE      | coupons | NULL       | range | idx_status_create_time | idx_status_create_time | 6       | NULL | 180300 |   100.00 | Using where; Using index |
+----+-------------+---------+------------+-------+------------------------+------------------------+---------+------+--------+----------+--------------------------+

Extra = Using where; Using index: the index idx_status_create_time is used, and all the data the query needs is in the index, so no table lookup is required.

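Then loop over the id range, updating and committing 1000 rows per batch. The original loop was pseudocode; below is a minimal sketch of it as a MySQL stored procedure (the procedure name expire_coupons and its parameters are hypothetical; call it as CALL expire_coupons(min_id, max_id)). Batches are half-open, [current_id, current_id + 1000), so no boundary row is updated twice, and the original WHERE filters are kept, because ids inside [min_id, max_id] may belong to rows that must not change:

DELIMITER //
CREATE PROCEDURE expire_coupons(IN p_min_id BIGINT, IN p_max_id BIGINT)
BEGIN
  DECLARE current_id BIGINT DEFAULT p_min_id;
  WHILE current_id <= p_max_id DO
    UPDATE coupons SET status = 1 WHERE id >= current_id AND id < current_id + 1000 AND status = 0
      AND create_time >= '2020-10-01 00:00:00' AND create_time <= '2020-10-07 23:59:59';  -- 1000 rows by PK range: fast
    COMMIT;  -- one short transaction per batch
    SET current_id = current_id + 1000;
  END WHILE;
END //
DELIMITER ;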
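The batching logic itself can be tried out in Python against an in-memory sqlite3 database (a stand-in for MySQL; the table contents are made up). The check confirms that half-open batches update every matching row exactly once and leave alone rows that fail the WHERE condition, even when their id falls inside [min_id, max_id].

```python
import sqlite3

FILTER = ("status = 0 AND create_time >= '2020-10-01 00:00:00' "
          "AND create_time <= '2020-10-07 23:59:59'")


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE coupons (id INTEGER PRIMARY KEY, status INT, create_time TEXT)")
    # Every 5th coupon was created in September, so it must NOT be expired.
    rows = [(i, 0, "2020-09-30 10:00:00" if i % 5 == 0 else "2020-10-03 10:00:00")
            for i in range(1, 4501)]
    conn.executemany("INSERT INTO coupons VALUES (?, ?, ?)", rows)
    return conn


def expire_in_batches(conn, batch=1000):
    """Update matching rows in half-open id batches; return the number of rows updated."""
    min_id, max_id = conn.execute(
        f"SELECT MIN(id), MAX(id) FROM coupons WHERE {FILTER}").fetchone()
    updated, current = 0, min_id
    while current <= max_id:
        cur = conn.execute(
            f"UPDATE coupons SET status = 1 WHERE id >= ? AND id < ? AND {FILTER}",
            (current, current + batch))
        conn.commit()  # one short transaction per batch
        updated += cur.rowcount
        current += batch
    return updated


conn = make_db()
print(expire_in_batches(conn))
```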

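These two cases show that we should make full use of the fact that secondary indexes contain the primary key id: first fetch the primary key ids through a covering index scan, with no table lookups, then operate on the rows by id, which is efficient. Likewise, given how MySQL works, divide and conquer gets the job done efficiently while avoiding the data inconsistencies caused by replication lag.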

MySQL index design

Once you are familiar with how indexes work, you can design high-quality indexes during development and lower API response times.

Prefix indexes

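For InnoDB tables that use the REDUNDANT or COMPACT row format, the index key prefix length is limited to 767 bytes. Assuming the utf8mb4 character set, with up to 4 bytes per character, a column prefix index on a TEXT or VARCHAR column longer than 191 characters can hit this limit.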
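The 191-character figure is simply the byte limit divided by the worst-case bytes per character. The last line below uses the 3072-byte limit that applies to the DYNAMIC and COMPRESSED row formats when large index prefixes are enabled (the default from MySQL 5.7.7 on).

```python
def max_prefix_chars(limit_bytes, max_bytes_per_char):
    """Longest column prefix, in characters, whose worst-case size fits the key limit."""
    return limit_bytes // max_bytes_per_char


print(max_prefix_chars(767, 4))   # 191: utf8mb4 under the 767-byte limit
print(max_prefix_chars(767, 3))   # 255: utf8 (utf8mb3) under the 767-byte limit
print(max_prefix_chars(3072, 4))  # 768: utf8mb4 under the 3072-byte limit
```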

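In MySQL 5.7, the innodb_large_prefix parameter controls whether InnoDB tables using the DYNAMIC or COMPRESSED row format may have index key prefixes longer than 767 bytes (up to 3072 bytes); the parameter was removed in MySQL 8.0, where the 3072-byte limit always applies to those row formats. With innodb_large_prefix=OFF, the 767-byte limit applies: a non-unique index on a longer column can still be created, but only with a warning, because MySQL shortens the key to fit the limit. In any case, a very long index key makes the index large and its scans I/O-heavy, so even if the optimizer chooses such an index, it will not be very efficient.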

-- With innodb_large_prefix=OFF the 767-byte limit applies: the index is created, but with a warning
mysql> create index idx_nickname on users(nickname);    -- `nickname` varchar(255)
Records: 0  Duplicates: 0  Warnings: 1
mysql> show warnings;
+---------+------+---------------------------------------------------------+
| Level   | Code | Message                                                 |
+---------+------+---------------------------------------------------------+
| Warning | 1071 | Specified key was too long; max key length is 767 bytes |
+---------+------+---------------------------------------------------------+

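Early in a business, to ship features quickly, column lengths are often defined generously. For example, the nickname column of the users table is defined as varchar(128), and some business APIs query by nickname. After the system has been running for a while, the longest nickname in users turns out to be 30 characters. At this point we can create a prefix index to shorten the index key and improve performance.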

-- Execution plan with `nickname` varchar(128) DEFAULT NULL
mysql> explain select * from users where nickname = 'Laaa';
+----+-------------+-------+------------+------+---------------+--------------+---------+-------+------+----------+-------+
| id | select_type | table | partitions | type | possible_keys | key          | key_len | ref   | rows | filtered | Extra |
+----+-------------+-------+------------+------+---------------+--------------+---------+-------+------+----------+-------+
|  1 | SIMPLE      | users | NULL       | ref  | idx_nickname  | idx_nickname | 515     | const |    1 |   100.00 | NULL  |
+----+-------------+-------+------------+------+---------------+--------------+---------+-------+------+----------+-------+

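key_len = 515: the table and column use utf8mb4, at 4 bytes per character; a variable-length type adds 2 bytes, and allowing NULL adds 1 more byte, so 128 x 4 + 2 + 1 = 515 bytes. When creating a prefix index, the prefix length does not have to be the longest value currently in the column; it should be the length that gives high selectivity, and reaching about 90% of the full column's selectivity is generally enough. For example, if an email column stores values like xxxx@yyy.com, the prefix length can be the maximum length of the xxxx part.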
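One common way to pick the prefix length is to compare the number of distinct prefixes with the number of distinct full values, which is what COUNT(DISTINCT LEFT(col, n)) versus COUNT(DISTINCT col) gives in SQL. The Python sketch below does that on a few made-up emails; the 0.9 threshold mirrors the 90% rule of thumb above.

```python
def prefix_selectivity(values, n):
    """Distinct n-character prefixes divided by distinct full values (1.0 = as selective as the full column)."""
    return len({v[:n] for v in values}) / len(set(values))


def shortest_prefix(values, target=0.9):
    """Smallest prefix length whose selectivity reaches `target`."""
    longest = max(len(v) for v in values)
    for n in range(1, longest + 1):
        if prefix_selectivity(values, n) >= target:
            return n
    return longest


emails = ["alice@x.com", "alan@y.com", "bob@x.com", "bobby@z.com"]
print(shortest_prefix(emails))
```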

-- Create a prefix index with a prefix length of 30
mysql> create index idx_nickname_part on users(nickname(30));
-- Check the execution plan
mysql> explain select * from users where nickname = 'Laaa';
+----+-------------+-------+------------+------+--------------------------------+-------------------+---------+-------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys                  | key               | key_len | ref   | rows | filtered | Extra       |
+----+-------------+-------+------------+------+--------------------------------+-------------------+---------+-------+------+----------+-------------+
|  1 | SIMPLE      | users | NULL       | ref  | idx_nickname_part,idx_nickname | idx_nickname_part | 123     | const |    1 |   100.00 | Using where |
+----+-------------+-------+------------+------+--------------------------------+-------------------+---------+-------+------+----------+-------------+

The optimizer now chooses the prefix index, with key_len = 123, i.e. 30 x 4 + 2 + 1 = 123 bytes, less than a quarter of the original size.
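The key_len arithmetic as a small Python helper. It assumes the rule stated above (characters x maximum bytes per character, +2 for a variable-length type, +1 if the column allows NULL) and covers only string columns.

```python
def key_len(chars, max_bytes_per_char=4, variable_length=True, nullable=True):
    """key_len that EXPLAIN reports for a string index column (or column prefix)."""
    length = chars * max_bytes_per_char
    if variable_length:
        length += 2  # length bytes of a VARCHAR value
    if nullable:
        length += 1  # NULL flag
    return length


print(key_len(128))  # full varchar(128) nickname index
print(key_len(30))   # nickname(30) prefix index
```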

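A prefix index reduces the index size, but because it stores only part of each value, it cannot be used to avoid sorting or grouping (ORDER BY, GROUP BY), and it can never act as a covering index.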

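mysql> explain select gender,count(*) from users where nickname like 'User100%' group by nickname limit 10;
+----+-------------+-------+------------+-------+--------------------------------+--------------+---------+------+------+----------+-----------------------+
| id | select_type | table | partitions | type  | possible_keys                  | key          | key_len | ref  | rows | filtered | Extra                 |
+----+-------------+-------+------------+-------+--------------------------------+--------------+---------+------+------+----------+-----------------------+
|  1 | SIMPLE      | users | NULL       | range | idx_nickname_part,idx_nickname | idx_nickname | 515     | NULL |  899 |   100.00 | Using index condition |
+----+-------------+-------+------------+-------+--------------------------------+--------------+---------+------+------+----------+-----------------------+
-- Extra = Using index condition: the full-column index idx_nickname is used and rows are still fetched from the table, but no temporary table or sort is needed for the GROUP BY.
-- With only the prefix index available (possible_keys no longer lists idx_nickname):
mysql> explain select gender,count(*) from users where nickname like 'User100%' group by nickname limit 10;
+----+-------------+-------+------------+-------+-------------------+-------------------+---------+------+------+----------+------------------------------+
| id | select_type | table | partitions | type  | possible_keys     | key               | key_len | ref  | rows | filtered | Extra                        |
+----+-------------+-------+------------+-------+-------------------+-------------------+---------+------+------+----------+------------------------------+
|  1 | SIMPLE      | users | NULL       | range | idx_nickname_part | idx_nickname_part | 123     | NULL |  899 |   100.00 | Using where; Using temporary |
+----+-------------+-------+------------+-------+-------------------+-------------------+---------+------+------+----------+------------------------------+
-- Extra = Using where; Using temporary: the prefix index is used, but the full nickname must be read from the table to check the condition, and a temporary table is needed for the GROUP BY.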

Composite indexes

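When a single-column index cannot filter data well, you can combine it with other columns from the WHERE clause into a composite index that filters better and reduces the amount of I/O. For example, the business needs to query transaction records by time period with the following SQL: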

select  * from trade_info where status = 1 and create_time >= '2020-10-01 00:00:00' and create_time <= '2020-10-07 23:59:59';

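Based on past experience with composite index design, a developer reasons that the column with many distinct values (good selectivity) should lead the composite index, so idx_create_time_status must be efficient: create_time has one value per second and very good selectivity, while status has only 6 discrete values. But this rule of thumb only applies to equality conditions, not to range conditions. For example, idx_user_id_status(user_id, status) is fine, but the rule does not carry over to a composite index that involves a range on create_time. Let's compare the two column orders, idx_status_create_time and idx_create_time_status.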

-- Create the two composite indexes with different column orders
mysql> create index idx_status_create_time on trade_info(status, create_time);
mysql> create index idx_create_time_status on trade_info(create_time,status);
-- Check the execution plan
mysql> explain select * from trade_info where status = 1 and create_time >='2021-10-01 00:00:00' and create_time <= '2021-10-07 23:59:59';
+----+-------------+------------+------------+-------+-----------------------------------------------+------------------------+---------+------+-------+----------+-----------------------+
| id | select_type | table      | partitions | type  | possible_keys                                 | key                    | key_len | ref  | rows  | filtered | Extra                 |
+----+-------------+------------+------------+-------+-----------------------------------------------+------------------------+---------+------+-------+----------+-----------------------+
|  1 | SIMPLE      | trade_info | NULL       | range | idx_status_create_time,idx_create_time_status | idx_status_create_time | 6       | NULL | 98518 |   100.00 | Using index condition |
+----+-------------+------------+------------+-------+-----------------------------------------------+------------------------+---------+------+-------+----------+-----------------------+

The execution plan shows that with both composite indexes present, the MySQL optimizer chooses idx_status_create_time. Why not idx_create_time_status? We can use optimizer_trace to follow the optimizer's decision.

-- Enable optimizer_trace
mysql> set session optimizer_trace="enabled=on",end_markers_in_json=on;
-- Run the SQL statement
mysql> select * from trade_info where status = 1 and create_time >='2021-10-01 00:00:00' and create_time <= '2021-10-07 23:59:59';
-- View the trace
mysql> SELECT trace FROM information_schema.OPTIMIZER_TRACE\G

Comparing the two indexes in the trace:

Composite index	Type	Rows	Index columns used for filtering	Chosen	Cause
idx_status_create_time	Index Range Scan	98518	status AND create_time	True	Lower cost
idx_create_time_status	Index Range Scan	98518	create_time	False	Higher cost

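The MySQL optimizer is cost-based. Cost consists mainly of IO_COST and CPU_COST, and MySQL's CBO (Cost-Based Optimizer) always picks the plan with the lowest cost for execution. From the analysis above, the CBO chose the composite index idx_status_create_time because both status and create_time take part in filtering, so its cost is lower. With idx_create_time_status, only create_time filters the data: once the leading column has a range condition, the columns after it can no longer narrow the scanned range, so status is ignored. The CBO effectively treats it as the single-column index idx_create_time, whose selectivity is worse than that of idx_status_create_time.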
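A toy model makes the difference concrete: treat each index as a sorted list and count how many entries a range scan walks between its start and end keys. This is a simplification of what the optimizer estimates, and the data is synthetic (6000 rows, one per second, with 6 status values); `scanned_entries` is a hypothetical helper.

```python
import bisect

INF = float("inf")


def scanned_entries(rows, order, status, t_from, t_to):
    """Entries a range scan touches for status = ? AND create_time BETWEEN t_from AND t_to.

    rows: (status, create_time) pairs. With order="status_first" both columns bound
    the scan; with order="time_first" only create_time does, and status is checked
    on every entry afterwards.
    """
    if order == "status_first":
        index = sorted(rows)                                   # (status, create_time)
        lo = bisect.bisect_left(index, (status, t_from))
        hi = bisect.bisect_right(index, (status, t_to))
    else:
        index = sorted((t, s) for s, t in rows)                # (create_time, status)
        lo = bisect.bisect_left(index, (t_from,))
        hi = bisect.bisect_right(index, (t_to, INF))
    return hi - lo


rows = [(i % 6, i) for i in range(6000)]
print(scanned_entries(rows, "status_first", 1, 1000, 1999),
      scanned_entries(rows, "time_first", 1, 1000, 1999))
```

Both orders return the same 167 matching rows, but the (create_time, status) order has to walk all 1000 entries in the time range to find them.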

Composite index design principles

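Put columns used in range conditions at the end of the composite index, e.g. idx_status_create_time.
Columns that are filtered on more frequently and have better selectivity should lead the composite index; this applies to equality conditions, e.g. idx_user_id_status.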

These two principles do not contradict each other; they complement each other.

Index skip scan

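Normally, if table users has the composite index idx_status_create_time, we all know that a query on create_time alone cannot use that index, so we would also need a single-column index idx_create_time. Those who have used Oracle know that it can use an Index Skip Scan in this case. MySQL 8.0 (since 8.0.13) implements a similar index skip scan, and skip_scan=on can be seen in the optimizer switches: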

| optimizer_switch             |use_invisible_indexes=off,skip_scan=on,hash_join=on |

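Skip scan suits composite indexes whose leading column has few distinct values and whose following column has many. If the leading column gains many distinct values, the MySQL CBO will not choose skip scan; it depends on how the index column data is distributed. Skip scan also requires the query to reference only columns that are in the index.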
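The access pattern behind skip scan, sketched in Python: for each distinct value of the leading column, do one range seek on the second column. This illustrates the idea only, not MySQL's implementation; the index is a sorted list of (status, create_time, id) tuples with synthetic data, and `skip_scan` is a hypothetical helper. The number of seeks equals the number of distinct leading values, which is why few distinct values in the leading column make skip scan attractive.

```python
import bisect

INF = float("inf")


def skip_scan(index, t_from, t_to):
    """Return (ids, seeks) for create_time BETWEEN t_from AND t_to on index (status, create_time, id)."""
    ids, seeks, i = [], 0, 0
    while i < len(index):
        status = index[i][0]                                   # next distinct leading value
        lo = bisect.bisect_left(index, (status, t_from), i)
        hi = bisect.bisect_right(index, (status, t_to, INF), i)
        ids.extend(entry[2] for entry in index[lo:hi])
        seeks += 1
        i = bisect.bisect_right(index, (status, INF), i)       # jump to the next status
    return ids, seeks


index = sorted((i % 3, i, i) for i in range(30))
print(skip_scan(index, 10, 14))
```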

mysql> explain select id, status, create_time from users where create_time >='2021-01-02 23:01:00' and create_time <= '2021-01-03 23:01:00';
+----+-------------+-------+------------+-------+------------------------+------------------------+---------+------+-------+----------+----------------------------------------+
| id | select_type | table | partitions | type  | possible_keys          | key                    | key_len | ref  | rows  | filtered | Extra                                  |
+----+-------------+-------+------------+-------+------------------------+------------------------+---------+------+-------+----------+----------------------------------------+
|  1 | SIMPLE      | users | NULL       | range | idx_status_create_time | idx_status_create_time | NULL    | NULL | 15636 |    11.11 | Using where; Using index for skip scan |
+----+-------------+-------+------------+-------+------------------------+------------------------+---------+------+-------+----------+----------------------------------------+

You can also disable index skip scan with SET optimizer_switch='skip_scan=off'.

Summary

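This article introduced indexes in MySQL, including clustered and secondary indexes. A secondary index contains the primary key id, which is used for table lookups, and covering index scans can be used to optimize SQL further.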

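It also covered how to design MySQL indexes well, including prefix indexes, the column order of composite indexes, and the index skip scan introduced in MySQL 8.0. Indexes speed up data retrieval and reduce I/O, but they take up disk space: they trade space for time. Updates also cause frequent page splits and merges in the index, which affects its performance. In real development, designing the right indexes for the business scenario is very important. That's all for today; I hope it helps.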

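I'm Ao Bing. The more you know, the more you realize you don't know. Thanks for the likes, favorites, and shares, and see you next time.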
