背景：
某業務DB從5.5升級5.6後，同一sql執行時間由毫秒級飆升到了20秒，sql文字如下

點選(此處)摺疊或開啟

select * from big_table as t
where ( t.plan_TYPE = 1 or t.plan_TYPE=3 )
and t.limit_TYPE in (10)
and t.xx_ID = 25773208367
and t.USER_ID in (133174222100)
plan by t.gmt_create desc , t.ID desc limit 1,10

摸索過程：

點選(此處)摺疊或開啟

檢視當前執行計劃
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: t
type: range
possible_keys: idx_xx_id,idx_gmt_create
key: idx_gmt_create
key_len: 17
ref: NULL
rows: 6816016
Extra: Using index condition; Using where
1 row in set (0.00 sec)

表上索引分佈
PRIMARY KEY (`id`),
KEY `idx_xx_id` (`xx_id`,`plan_type`,`user_id`),
……
KEY `idx_gmt_create` (`limit_type`,`xx_id`,`gmt_create`)
該sql選擇索引idx_gmt_create，因其符合最左字首策略，故排序沒有使用到filesort，其訪問路徑大致如下：
1 反向掃描idx_gmt_create葉子結點，搜尋(limit_type,xx_id)=(10,25773208367)的元組；
2 回表，驗證該元組對應的主鍵記錄是否滿足約束(plan_type,user_id)=(1 or 3, 133174222100)，滿足則計數加1否則丟棄，繼續掃描下一個元組；
3 當計數達到10時，停止掃描，將對應的10條記錄返回給客戶端；

點選(此處)摺疊或開啟

root@ 03:20:56>select limit_type,count(*) from big_table group by limit_type;
+-------------+----------+
| limit_type | count(*) |
+-------------+----------+
| NULL | 226865 |
| 9 | 463346 |
| 10 | 13353116 |
+-------------+----------+
3 rows in set (3.13 sec)

作為複合索引的引導列，limit_type欄位的選擇性驚人的低，這是查詢變慢的主要原因之一。
之所以要強調”之一”，是因為針對本例，只要其他欄位足夠給力，即便limit_type=10也能很快執行完畢，
檢視xx_id的分佈情況，也是比較畸形，該sql又很不幸的選擇了候選行最多的那個

點選(此處)摺疊或開啟

root@ 04:01:12>select xx_id,count(*) from big_table where limit_type =10 group by xx_id order by xx_id desc;
+-------------+----------+
| xx_id | count(*) |
+-------------+----------+
| 25773208367 | 13352433 |
| 25770261347 | 2 |
| 258809681 | 148 |
| 1 | 2100 |
+-------------+----------+
4 rows in set (5.79 sec)

如果xx_id=1，該sql最多隻需要比較2100條記錄即可返回，會很快執行完畢；
即便xx= 25773208367，如果能快速找出滿足非索引欄位約束的主鍵記錄，sql也會很快執行完畢，mysql是在驗證了海量的(limit_type,xx_id)=(10,25773208367)元組後，才湊齊10條同時滿足(plan_type,user_id)約束的主鍵記錄，據此我們可以反推出最早滿足所有約束條件的user_id，其查詢邏輯如下

點選(此處)摺疊或開啟

select user_id,count(*) from big_table t
where limit_type =10 and xx_id =25773208367
and ( t.plan_TYPE = 1 or t.plan_TYPE=3 )
group by user_id having count(*)>=10 plan by gmt_create desc limit 1,5;
執行結果
+------------+----------+
| user_id | count(*) |
+------------+----------+
| 1851362558 | 15 |
| 2118141658 | 11 |
| 2641244918 | 14 |
| 2448823838 | 17 |
| 16375410 | 32 |
+------------+----------+
5 rows in set (1 min 12.42 sec)

隨便挑一個替換25773208367，比如1851362558，執行計劃沒有變，原本需要執行20多秒的sql卻在200毫秒內執行完畢。
而5.5版本的執行計劃為

點選(此處)摺疊或開啟

*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: t
type: range
possible_keys: idx_xx_id
key: idx_xx_id
key_len: 18
ref: NULL
rows: 116
Extra: Using where; Using filesort

雖然採用了filesort，但是經索引idx_xx_id過濾後的候選行非常少，故執行時間很短。

解決方案
1 修改sql，新增force index (idx_xx_id)，此方案不夠靈活；
2 修改sql，將排序欄位從gmt_create改為gmt_modified，因無法採用索引排序5.6會選擇idx_xx_id，此方案可能造成返回資料有誤；
3 修改sql，將t.limit_TYPE in (10)改為t.limit_TYPE >9 and t.limit_TYPE <11，最佳化器會認為sql沒有滿足索引最左字首便不再使用idx_gmt_create,這招似乎有點賤，同時說明mysql最佳化器還不夠智慧。

結束語
5.6最佳化器做了大量改進，以本sql為例，讓其選擇了idx_gmt_create從而省去了filesort，之所以執行變慢了是因為表欄位資料分佈太不均勻，而本sql又湊巧滿足了各種坑，這算是一個意外吧。

排序sql升級5.6變慢原因分析

相關文章