MySQL:關於排序order by limit值不穩定的說明（1）

gaopengtttt發表於2018-12-11

原文網址 : http://blog.itpub.net/7728585/viewspace-2284889/

導讀

作者：高鵬（重慶八怪）
原文地址：
https://www.jianshu.com/p/8c2154872f83
本系列文章將持續更新，歡迎關注~
作者按：如果有誤請諒解和指正，本文僅僅作為拋磚引玉，歡迎交流拍磚，謝謝！

原始碼版本：5.7.14
本文約定：PQ 就是 Priority Queue 及優先佇列其實現演算法是堆排序。

一、問題丟擲

資料如下：

CREATE TABLE `testse` (
  `id` int(11) NOT NULL,
  `nu` int(11) DEFAULT NULL,
  `name` varchar(20) DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `nu` (`nu`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;


INSERT INTO `testse` VALUES (-1,14,'gaopeng'),(0,14,'gaopeng'),(1,1,'gaopeng'),(2,99,'gaopeng'),(3,55,'gaopeng'),(4,20,'gaopeng'),(5,24,'gaopeng'),(6,14,'gaopeng'),(7,1
4,'gaopeng'),(8,13,'gaopeng'),(9,9,'gaopeng'),(10,19,'gaopeng'),(11,20,'gaopeng'),(12,14,'gaopeng'),(13,15,'gaopeng'),(14,16,'gaopeng'),(15,20,'gaopeng'),(100,14,'gaope
ng'),(111,14,'gaopeng');

問題如下：

mysql>  select *  from testse order by  nu limit 3,1;
+-----+------+---------+
| id  | nu   | name    |
+-----+------+---------+
| 111 |   14 | gaopeng |
+-----+------+---------+
1 row in set (2.76 sec)

mysql>  select *  from testse force index(nu) order by  nu limit 3,1;
+----+------+---------+
| id | nu   | name    |
+----+------+---------+
| -1 |   14 | gaopeng |
+----+------+---------+
1 row in set (0.00 sec)

問為什麼這兩個語句得到的資料不一樣？

二、問題原因

這裡首先給出原因：在MySQL排序的時候可以使用索引來避免排序及不需要使用filesort，其次當使用filesort的時候可能在記憶體中出現兩種情況及堆排序列和快速排序兩種方式。所以MySQL記憶體排序可以使用的途徑包含：

直接利用索引避免排序
快速排序
堆排序

具體使用哪一種排序方式是優化器決定的，總的說來如下：

直接利用索引避免排序：用於有索引且回表效率高的情況下
快速排序演算法：如果沒有索引大量排序的情況下
堆排序演算法：如果沒有索引排序量不大的情況下

而快速排序和堆排序是不穩定的排序演算法，也就是對於重複值是不能保證順序的。而直接利用索引的話其返回資料是穩定的，因為索引的B+樹葉子結點的順序是唯一且一定的。如上的key nu，其葉子結點包含了nu+id，它是唯一的順序也是遞增的。因此在這種不穩定的演算法情況下上面的查詢出現了不一樣的結果，歸根結底就是使用索引避免排序和堆排序對於重複值的處理上是不同的。
也許你會問為什麼存在兩種排序方式，實際上在大量排序的情況下快速排序是有優勢的，而堆排序使用優先佇列只完成少量的排序是有優勢的，因為它根本不需要排序完成只排序你需要的資料量就可以了。

MySQL認為快速排序的速度是堆排序的3倍如下：

/*
    How much Priority Queue sort is slower than qsort.
    Measurements (see unit test) indicate that PQ is roughly 3 times slower.
  */
  const double PQ_slowness= 3.0;

那麼在使用排序演算法的時候會根據待排序資料量的大小進行切換具體根據函式check_if_pq_applicable進行判定的。在filesort函式裡面有如下程式碼：

if (check_if_pq_applicable(trace, &param, &table_sort,
                             table, num_rows, memory_available,
                             subselect != NULL))
  {
    DBUG_PRINT("info", ("filesort PQ is applicable")); //使用堆排序
    /*
      For PQ queries (with limit) we know exactly how many pointers/records
      we have in the buffer, so to simplify things, we initialize
      all pointers here. (We cannot pack fields anyways, so there is no
      point in doing lazy initialization).
     */
    table_sort.init_record_pointers();
    .....
   filesort->using_pq= true;
    param.using_pq= true;
  }
  else//使用快速排序
  {
    DBUG_PRINT("info", ("filesort PQ is not applicable"));
    filesort->using_pq= false;
    param.using_pq= false;

 .....
  }

三、如何確定是使用的哪種方法返回排序的結果

對於直接利用索引避免排序，這個直接看執行計劃就好了，不會出現filesort字樣。但是對於到底使用額快速排序還是堆排序則不好判斷因為執行計劃是一樣的，我想到的只能是在除錯的時候做斷點，不知道還有其他辦法沒有。因此我在上面程式碼的if分支裡面做了2個斷點，其中一個斷點在堆排序演算法裡面，一個斷點在快速排序演算法裡面如下：

3       breakpoint     keep y   0x0000000000f50e62 in filesort(THD*, Filesort*, bool, ha_rows*, ha_rows*, ha_rows*)
                                               at /root/mysql5.7.14/percona-server-5.7.14-7/sql/filesort.cc:359
        breakpoint already hit 3 times
4       breakpoint     keep y   0x0000000000f50d41 in filesort(THD*, Filesort*, bool, ha_rows*, ha_rows*, ha_rows*)
                                               at /root/mysql5.7.14/percona-server-5.7.14-7/sql/filesort.cc:333
        breakpoint already hit 1 time

其中斷點3代表快速排序命中，斷點4代表堆排序命中。

四、額外的測試

如上所述我們可以將結果的返回定義為三種方式，我們將在這裡測試這三種方式的得到資料不同。

使用索引避免排序的結果

語句：
select * from testse force index(nu) order by nu;

mysql> desc select *  from testse force index(nu) order by  nu;
+----+-------------+--------+------------+-------+---------------+------+---------+------+------+----------+-------+
| id | select_type | table  | partitions | type  | possible_keys | key  | key_len | ref  | rows | filtered | Extra |
+----+-------------+--------+------------+-------+---------------+------+---------+------+------+----------+-------+
|  1 | SIMPLE      | testse | NULL       | index | NULL          | nu   | 5       | NULL |   19 |   100.00 | NULL  |
+----+-------------+--------+------------+-------+---------------+------+---------+------+------+----------+-------+
1 row in set, 1 warning (0.00 sec)

mysql> select *  from testse force index(nu) order by  nu;
+-----+------+---------+
| id  | nu   | name    |
+-----+------+---------+
|   1 |    1 | gaopeng |
|   9 |    9 | gaopeng |
|   8 |   13 | gaopeng |
|  -1 |   14 | gaopeng |
|   0 |   14 | gaopeng |
|   6 |   14 | gaopeng |
|   7 |   14 | gaopeng |
|  12 |   14 | gaopeng |
| 100 |   14 | gaopeng |
| 111 |   14 | gaopeng |
|  13 |   15 | gaopeng |
|  14 |   16 | gaopeng |
|  10 |   19 | gaopeng |
|   4 |   20 | gaopeng |
|  11 |   20 | gaopeng |
|  15 |   20 | gaopeng |
|   5 |   24 | gaopeng |
|   3 |   55 | gaopeng |
|   2 |   99 | gaopeng |
+-----+------+---------+
19 rows in set (0.01 sec)

使用快速排序的結果

語句：
select * from testse order by nu;

因為我前面設定了斷點，其斷點命中如下：

Breakpoint 3, filesort (thd=0x7fff300128c0, filesort=0x7fff30963e90, sort_positions=false, examined_rows=0x7ffff01158a0, found_rows=0x7ffff0115898, 
    returned_rows=0x7ffff0115890) at /root/mysql5.7.14/percona-server-5.7.14-7/sql/filesort.cc:359
359         DBUG_PRINT("info", ("filesort PQ is not applicable"));

可以看到PQ 沒有開啟，也就是堆排序沒有使用使用的優先佇列堆排序方式。那麼它的結果是：

mysql> desc select *  from testse order by  nu;
+----+-------------+--------+------------+------+---------------+------+---------+------+------+----------+----------------+
| id | select_type | table  | partitions | type | possible_keys | key  | key_len | ref  | rows | filtered | Extra          |
+----+-------------+--------+------------+------+---------------+------+---------+------+------+----------+----------------+
|  1 | SIMPLE      | testse | NULL       | ALL  | NULL          | NULL | NULL    | NULL |   19 |   100.00 | Using filesort |
+----+-------------+--------+------------+------+---------------+------+---------+------+------+----------+----------------+
1 row in set, 1 warning (0.00 sec)

mysql> select *  from testse order by  nu;
+-----+------+---------+
| id  | nu   | name    |
+-----+------+---------+
|   1 |    1 | gaopeng |
|   9 |    9 | gaopeng |
|   8 |   13 | gaopeng |
| 111 |   14 | gaopeng |
| 100 |   14 | gaopeng |
|  12 |   14 | gaopeng |
|   7 |   14 | gaopeng |
|   6 |   14 | gaopeng |
|   0 |   14 | gaopeng |
|  -1 |   14 | gaopeng |
|  13 |   15 | gaopeng |
|  14 |   16 | gaopeng |
|  10 |   19 | gaopeng |
|   4 |   20 | gaopeng |
|  11 |   20 | gaopeng |
|  15 |   20 | gaopeng |
|   5 |   24 | gaopeng |
|   3 |   55 | gaopeng |
|   2 |   99 | gaopeng |
+-----+------+---------+
19 rows in set (1.74 sec)

使用堆排序

語句：
select * from testse order by nu limit 8;

其斷點命中

Breakpoint 4, filesort (thd=0x7fff300128c0, filesort=0x7fff3095ecc8, sort_positions=false, examined_rows=0x7ffff01158a0, found_rows=0x7ffff0115898, 
    returned_rows=0x7ffff0115890) at /root/mysql5.7.14/percona-server-5.7.14-7/sql/filesort.cc:333
333         DBUG_PRINT("info", ("filesort PQ is applicable"));

可以看到PQ 開啟了，也就是堆排序。

mysql> desc  select *  from testse order by  nu limit 8;
+----+-------------+--------+------------+------+---------------+------+---------+------+------+----------+----------------+
| id | select_type | table  | partitions | type | possible_keys | key  | key_len | ref  | rows | filtered | Extra          |
+----+-------------+--------+------------+------+---------------+------+---------+------+------+----------+----------------+
|  1 | SIMPLE      | testse | NULL       | ALL  | NULL          | NULL | NULL    | NULL |   19 |   100.00 | Using filesort |
+----+-------------+--------+------------+------+---------------+------+---------+------+------+----------+----------------+
1 row in set, 1 warning (0.00 sec)

mysql> select *  from testse order by  nu limit 8;
+-----+------+---------+
| id  | nu   | name    |
+-----+------+---------+
|   1 |    1 | gaopeng |
|   9 |    9 | gaopeng |
|   8 |   13 | gaopeng |
|  -1 |   14 | gaopeng |
|   0 |   14 | gaopeng |
|  12 |   14 | gaopeng |
| 111 |   14 | gaopeng |
|   6 |   14 | gaopeng |
+-----+------+---------+
8 rows in set (2.20 sec)

可以看到從前面8行來看，每種方法返回的資料是不一樣的：

使用索引避免排序：
+-----+------+---------+
| id  | nu   | name    |
+-----+------+---------+
|   1 |    1 | gaopeng |
|   9 |    9 | gaopeng |
|   8 |   13 | gaopeng |
|  -1 |   14 | gaopeng |
|   0 |   14 | gaopeng |
|   6 |   14 | gaopeng |
|   7 |   14 | gaopeng |
|  12 |   14 | gaopeng |


使用快速排序：

+-----+------+---------+
| id  | nu   | name    |
+-----+------+---------+
|   1 |    1 | gaopeng |
|   9 |    9 | gaopeng |
|   8 |   13 | gaopeng |
| 111 |   14 | gaopeng |
| 100 |   14 | gaopeng |
|  12 |   14 | gaopeng |
|   7 |   14 | gaopeng |
|   6 |   14 | gaopeng |

使用堆排序：
+-----+------+---------+
| id  | nu   | name    |
+-----+------+---------+
|   1 |    1 | gaopeng |
|   9 |    9 | gaopeng |
|   8 |   13 | gaopeng |
|  -1 |   14 | gaopeng |
|   0 |   14 | gaopeng |
|  12 |   14 | gaopeng |
| 111 |   14 | gaopeng |
|   6 |   14 | gaopeng |
+-----+------+---------+

五、總結

可以看到在不同的獲取資料的方式得到的資料是不一樣的，但是這只是重複資料排序的部分，這是排序演算法的不穩定性決定的。快速排序適合大資料量排序，堆排序在少量排序上有優勢，因此當order by limit n，n到了一個數量級的時候會切換排序演算法，這個在執行計劃是看不到的，具體使用那種演算法是優化器通過函式check_if_pq_applicable進行判定的。
其次這只是一種造成排序資料不一致的情況還有一種情況是由於引數 max_length_for_sort_data的引數影響的一次排序和二次排序，這個有機會在研究一下程式碼。一般預設1024個位元組還是很少會超過的。

六、堆排序演算法簡單說明

最後我還是想簡單描述一下堆排序演算法，也當複習一下。具體可以參考演算法導論等書籍，我們以大頂堆為例，實際上任何一個待排序的陣列都可以看成一棵完全二叉樹，用演算法導論的截圖如下：

640?wx_fmt=png

這棵樹是滿足大頂堆定義的，在大頂堆中有如下特性：

必須滿足完全二叉樹
關於完全二叉樹參考
http://blog.itpub.net/7728585/viewspace-2125889/
很方便的根據父節點的位置計算出兩個葉子結點的位置
如果父節點的位置為i/2，左子節點為 i，右子節點為i+1這是完全二叉樹的特性決定
所有子節點都可以看做一個子堆那麼所有結點都有
父節點>左子節點 && 父節點>右節點
很明顯的可以找到最大的元素，就是整個堆的根結點

在這個演算法中，最重要也是最主要的就是堆的維護，堆的構建。

維護：
維護：採用的是自上而下的維護，可以用遞迴完成。
這裡電子版有點不清晰，黑色結點就是值4

640?wx_fmt=other

構建
構建：是採用自下而上的構建，構建就是不斷迴圈的對各個父結點做維護，以達到對任何一個無序陣列滿足大頂堆的條件。因為下層的子樹滿足了大頂堆條件那麼上層就一定滿足大頂堆的條件。

640?wx_fmt=other

排序

實際上排序就是將陣列中的第一個數字也就是最大的數字和最後一個數字交換，然後再次做一次維護做好大頂堆結構即可，如果反覆不斷做這個操作那麼整個陣列都會排序完成。

640?wx_fmt=other

640?wx_fmt=png

對於MySQL的原始碼中堆排序的程式碼存在於priority_queue.h檔案中，其中可以看到一些方法如：

維護 heapify函式
構建 build_heap函式
排序 sort函式
當然原始碼的實現複雜得多，有興趣的朋友可以深入。棧幀備查：

#0  Priority_queue<uchar*, std::vector<uchar*, Malloc_allocator<uchar*> >, <unnamed>::Mem_compare>::heapify(size_t, size_t) (this=0x7ffff0115650, i=0, last=8)
    at /root/mysql5.7.14/percona-server-5.7.14-7/include/priority_queue.h:124
#1  0x0000000000f5807a in Priority_queue<uchar*, std::vector<uchar*, Malloc_allocator<uchar*> >, <unnamed>::Mem_compare>::heapify(size_t) (this=0x7ffff0115650, i=0)
    at /root/mysql5.7.14/percona-server-5.7.14-7/include/priority_queue.h:147
#2  0x0000000000f57d1e in Priority_queue<uchar*, std::vector<uchar*, Malloc_allocator<uchar*> >, <unnamed>::Mem_compare>::update_top(void) (this=0x7ffff0115650)
    at /root/mysql5.7.14/percona-server-5.7.14-7/include/priority_queue.h:354
#3  0x0000000000f57814 in Bounded_queue<uchar*, uchar*, Sort_param, <unnamed>::Mem_compare>::push(uchar *) (this=0x7ffff0115650, element=0x7fff309166a0 "o")
    at /root/mysql5.7.14/percona-server-5.7.14-7/sql/bounded_queue.h:106
#4  0x0000000000f52da7 in find_all_keys(Sort_param *, QEP_TAB *, Filesort_info *, IO_CACHE *, IO_CACHE *, Bounded_queue<uchar*, uchar*, Sort_param, <unnamed>::Mem_compare> *, ha_rows *) (param=0x7ffff01154c0, qep_tab=0x7fff309268e8, fs_info=0x7ffff0115550, chunk_file=0x7ffff0115200, tempfile=0x7ffff0115360, pq=0x7ffff0115650, 
    found_rows=0x7ffff0115898) at /root/mysql5.7.14/percona-server-5.7.14-7/sql/filesort.cc:1013
#5  0x0000000000f51165 in filesort (thd=0x7fff30000bc0, filesort=0x7fff30926bd8, sort_positions=false, examined_rows=0x7ffff01158a0, found_rows=0x7ffff0115898, 
    returned_rows=0x7ffff0115890) at /root/mysql5.7.14/percona-server-5.7.14-7/sql/filesort.cc:425

附：如果需要演算法實現可參見原文:
https://www.jianshu.com/p/8c2154872f83

對本文有任何疑問可掃碼新增原文作者微信

640?wx_fmt=jpeg