深入淺出 MySQL 優先佇列(你一定會踩到的order by limit 問題)

英語和演算法是程式設計師的兩條腿

本文適用於 MySQL 5.6 及以上版本

0.先拋問題

假設欄位category無索引且有重複值，order by category 和 limit 組合使用的結果會和預期不符。

問題復現：

表結構（就是兩個欄位）

CREATE TABLE `ratings` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `category` int(11) DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=11 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_general_ci;

對所有資料按category欄位排序： select * from ratings order by category;

id	category
1	1
5	1
10	1
3	2
4	2
6	2
9	2
2	3
7	3
8	3

當我們想分頁展示前5條時使用select * from ratings order by category limit 5;

期望得到的ID順序是1 5 10 3 4。

但實際結果如下：

id	category
1	1
10	1
5	1
3	2
4	2

怎麼肥似？MySQL 出 Bug 了？

可能有同學遇到過這個問題，百度或谷歌一下解決了，你有沒有想過，你查到的辦法是最優解嗎？別人是怎麼得出這個辦法的？MySQL 為什麼會這樣做，跟版本有關嗎？

先拋結論：

最優解是後面再加個列值唯一的排序欄位，如：order by category,id；
MySQL 為什麼這樣做？答案是為了快！（MySQL 5.6及其之後才有此優化）
次優解是對order by後面的category 加索引（為什麼是次優解？看完本文你將會有答案）；

下面課代表將還原一下這 3 條結論的產出過程。

1. 最優解

MySQL 文件 8.2.1.19 LIMIT Query Optimization 中對此場景有如下描述：

If multiple rows have identical values in the ORDER BY columns, the server is free to return those rows in any order, and may do so differently depending on the overall execution plan. In other words, the sort order of those rows is nondeterministic with respect to the nonordered columns.
One factor that affects the execution plan is LIMIT, so an ORDER BY query with and without LIMIT may return rows in different orders.

總結來說就是：

當 ORDER BY 列的欄位值存在重複，那麼這條 ORDER BY 語句返回的資料順序會因為LIMIT的存在而變得不一樣

這是 MySQL 預設對該場景做的優化，如果你需要保證加不加 LIMIT 順序都要一致，官方也給出了辦法：

If it is important to ensure the same row order with and without LIMIT, include additional columns in the ORDER BY clause to make the order deterministic.

就是在ORDER BY 後面再多加一個排序欄位（比如 ID 欄位）。

以上描述最早出現在MySQL 5.6文件中，從這個版本開始，引入了這個針對ORDER BY LIMIT 的優化。

好了，針對文中的場景，我們只需要select * from ratings order by category,id;即可解決。

那麼問題來了，MySQL 為什麼要做這麼一個看似是 Bug 的優化？

2.MySQL 的 ORDER BY 邏輯

顧名思義，ORDER BY 就是排序。

執行一下explain select * from ratings order by category limit 5;

*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: ratings
   partitions: NULL
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 10
     filtered: 100.00
        Extra: Using filesort
1 row in set, 1 warning (0.00 sec)

可以看到 Extra: Using filesort 表示需要排序。

正常情況下， MySQL 會有記憶體排序和外部排序兩種：

如果待排序的資料量小於sort buffer size，排序就在記憶體中完成(快速排序)；
如果待排序的資料量大於sort buffer size，就使用臨時檔案進行外部排序(歸併排序);

很明顯，這兩種排序都是對所有結果全部排序，講道理，不管有沒有LIMIT，都是從排完序的結果中按順序取需要的條數，有沒有LIMIT是不會影響返回的結果順序的。

但是，MySQL 5.6 版本針對 ORDER BY LIMIT做了個小優化（排序欄位無索引，且列值不唯一時）：優化器在遇到 ORDER BY LIMIT語句的時候，使用了priority queue。

filesort.cc 中有如下虛擬碼描述該優化：

while (get_next_sortkey())
   {
     if (using priority queue)
       push sort key into queue
     else
     {
       try to put sort key into buffer;
       if (no free space in sort buffer)
       {
         do {
           allocate new, larger buffer;
           retry putting sort key into buffer;
         } until (record fits or no space for new buffer)
         if (no space for new buffer)
         {
           sort record pointers (all buffers);
           dump sorted sequence to 'tempfile';
           dump Merge_chunk describing sequence location into 'chunk_file';
         }
       }
       if (key was packed)
         tell sort buffer the actual number of bytes used;
     }
   }
   if (buffer has some elements && dumped at least once)
     sort-dump-dump as above;
   else
     don't sort, leave sort buffer to be sorted by caller.

並在 WL#1393: Optimizing filesort with small limit 中闡述了該優化邏輯:

Many web customers have to do
"SELECT ... ORDER BY non_index_column LIMIT X",

When X *  is smaller than sort_buff_size we can use
the following algoritm to speed up the sort:

- Create a queue to hold 'limit' keys.
- Scan through the table and store the first (last if DESC) keys in the queue
- Return values from queue

This is much faster than the current algoritm that works as:

該 WorkLog 中記錄了優化後的效果：10 to 20 times faster than a quicksort(感興趣的同學可以去閱讀原文)。

所以，就是為了快！

MySQL 認為這種場景就是求 TOP N 的問題，使用 priority queue 就能解決。

3.priority queue(優先順序佇列)

priority queue 其實就是堆，Java 中有java.util.PriorityQueue類，其本質就是堆這種資料結構。

簡單解釋一下什麼是堆：

堆是一個完全二叉樹；

堆中每一個節點的值都必須大於等於（大頂堆）或小於等於（小頂堆）其子樹中每個節點的值。

如果 MySQL 使用歸併或快排，需要把所有資料都排好序，再取LIMIT 的前幾條，剩餘已排序的資料就白白浪費了。

而採用 priority queue 可以根據 LIMIT的條數維護一個堆，只需要把所有資料在這個堆裡過一遍就能得到結果。

使用如下語句可以驗證 MySQL 使用了 priority queue：

SET optimizer_trace='enabled=on';
select * from ratings order by category limit 5;
SELECT * FROM `information_schema`.`OPTIMIZER_TRACE`\G;

 "filesort_priority_queue_optimization": {
              "limit": 5,
              "chosen": true
            },

可以看到 filesort_priority_queue_optimization.chosen = true

下面用流程圖還原一下 priority queue 的執行邏輯（以LIMIT 5為例）：

友情提示：圖中的小頂堆以 category 值的大小排序

取前五條資料構成一個小頂堆：

取下一行資料(6,2),發現 2 小於當前堆中最大的category 3，於是把(2,3)從堆中刪掉，把(6,2) 入堆：

重複步驟 2，直至符合查詢條件的資料都經歷過比較入堆，最終堆中資料如圖：

以上就是通過 priority queue 找到最小的 5 行 category 資料的執行過程。

最後我們將其出堆即可得到結果，每次出堆最小元素後將最後一個元素放入堆頂，按照小頂堆重新堆化，過程如圖：

可以看到，這個結果和select * from ratings order by category limit 5;的輸出一致

4.加索引為什麼是次優解

顯然，按照ORDER BY 的邏輯，直接對排序欄位加索引也可以省去記憶體排序步驟，從而解決這個問題。

但索引也不是銀彈，多出來的category索引會增加表的維護成本，如果沒有明顯的業務需要，單純為了繞過這個priority queue的優化而加索引，課代表認為有點得不償失。

尤其是當表資料量非常大的時候，索引的體量會很可觀。而且，針對文中場景，category作為分類欄位，重複率會比較高，即使有按分類查詢的業務 SQL ，MySQL 也不一定會選取這條索引。

綜上，針對本場景，個人認為order by category,id才是該問題的最優解。

PS：會不會有人問：關我鳥事，我從沒寫過帶 LIMIT 的 SQL 啊！

難道你寫的 CRUD 功能都不帶分頁的嗎？PageHelper 原始碼去了解一下？

5. 總結

本文案例是課代表上線過程中遭遇到的實際問題，諮詢了下週圍同學，有好幾個都遇到過此問題，網上文章大多淺入淺出，讀完有隔靴搔癢之感，無法解答心中疑惑。遂整理此文。

其中涉及資料結構，PageHelper，MySQL 文件，相關參考資料羅列在文末，如果有時間能順著文章思路親自讀一遍參考文件，相信會有更深的收穫。

6.參考資料：

《資料結構與演算法之美》---王爭第 28,29 講
《MySQL實戰45講》---林曉斌第 04、05、10、16、17 講
8.2.1.16 LIMIT Query Optimization---https://dev.mysql.com/doc/ref...
MySQL · 答疑解惑 · MySQL Sort 分頁---http://mysql.taobao.org/month...
filesort.cc---https://dev.mysql.com/doc/dev...
WL#1393: Optimizing filesort with small limit---https://dev.mysql.com/worklog...

?關注 Java課代表，獲取最新 Java 乾貨?