MySQL、Elasticsearch 深度分頁

KerryWu發表於2023-04-16

1. 深度分頁

1.1. 定義

早些年在企業表格類的頁面中,我們總能見到分頁的那種表格設計。當分頁的頁數太多,還設計了“跳轉到x頁”的功能。

我們這裡講的“深度分頁”,就是指當分頁太多(假設有幾十萬頁),當訪問第幾萬頁時,必然會查詢很慢。因為針對通常對於分頁的搜尋,都是將幾萬頁的資料查出來,然後取差集,只返回當前頁的資料。只為了返回一頁的資料,卻查詢的很深入,帶來的效能代價很大。

1.2. 需求上杜絕

1. 少分頁

首先,我們先想想這樣的需求設計是否合理,誰會在一個表格中跳到第幾萬頁檢視資料。我們如果想要找到想要的內容,應該是透過搜尋,而非每頁都點進去去看。

所以看看百度、谷歌搜尋引擎的設計,都是最多隻展示10頁。

2. “翻頁”替代“跳頁”

那如果有些場景,我就需要檢視所有頁面的內容呢?例如:貼吧的評論等。ok,就算有這種情況,也不應該存在“跳頁”的需求,可以設計成一頁一頁的“翻頁”。

例如我們身邊很多app的設計:下拉重新整理更多。每下拉一次,就多展示一頁資料。

總結來看,實在想不到需要深度“跳頁”的需求,那就應該在需求上杜絕。如果為了一個沒必要的需求,需要對研發、伺服器投入大量的資源消耗,是很浪費的。

1.3. 技術實現

下面是從 mysql 和 elasticsearch 的維度講講深度分頁的實現。這裡先總結一下:

  • mysql:深度分頁,包括跳頁也能做,可以透過減少回表提高效能
  • elasticsearch:天然不建議跳頁,但提供了翻頁的最佳化方案

2. mysql深度分頁最佳化

2.1. 減少回表

最佳化前

表格查詢,基本上查詢的欄位都要求很多。按照通常分頁查詢的場景,假設正常分頁查詢的sql如下:

select * from table_name where userId = 'xxx' limit 10000, 10

就算我們基於 userId 建立普通索引,因為 select * ,我們依然要先基於普通索引查詢 10010 條資料,然後回表查詢 10010 次,但最後再將前 10000條資料丟掉,只取10條。

最佳化後

最佳化的sql是:

select * from table_name where id in (select id from table_name where userId = 'xxx'  limit 10000, 10)

這裡建立子查詢,子查詢中只查詢了主鍵,可以走普通索引直接查到。而最終的回表查詢,只是查 10 條資料。所以,避免了 10000 次的回表,而且避免了 10000 次 select * 帶來的 io 浪費。

這裡的最佳化,適用於翻頁、跳頁的場景。那麼下面針對於翻頁,還可以進一步最佳化。

2.2. 翻頁最佳化

如果是翻頁,那每一次查詢,都能拿到上一次查詢的最後一條資料。這裡有個前提,就看我們分頁搜尋是根據什麼欄位排序的。這裡先假設是根據id排序的,id是順序的(自增長或雪花ID),假設上一次查詢的id為 15000,那麼sql可以是:

select * from table_name where userId = 'xxx' and id > 15000  limit 0, 10

3. es深度分頁

3.1. from/size

這個是我們最常用的 es 分頁語法了。但是它天然不支援深度分頁。

要求:from + size < max_result_window ,否則es查詢就會報錯。而es max_result_window預設值為 10000。為什麼呢?

假設 es 索引建立了8個分片,假設我們查詢 from=1000,size=10,第101頁的10條資料,實際需要查詢多少資料呢?

因為 es 不知道這第 101頁的資料在哪個分片,所以協調節點發請求命令到每個分片對應的資料節點上,從每個分片都獲取了 1010 條資料。

此時協調節點中一共獲取到了 1010 * 8 = 8080 條資料,再基於協調節點的記憶體做排序,拿到符合 from=1000,size=10 的10條資料,最終將 8070 條資料丟棄。

這裡就分頁到了101頁,如果分頁上萬呢,如果es的索引分片更多呢?就為了獲取這10條資料,對es的記憶體等效能損害太大了,所以才有了 max_result_window 的限制。

3.2. scroll遍歷

ES官方不再推薦使用Scroll API 進行深度分頁。 如果您需要在分頁超過 10,000 個點選時保留索引狀態,請使用帶有時間點 (PIT) 的 search_after 引數。所以這裡就不多說了。

Scroll API 原理上是對某次查詢生成一個遊標 scroll_id , 後續的查詢只需要根據這個遊標去取資料,直到結果集中返回的 hits 欄位為空,就表示遍歷結束。scroll_id 的生成可以理解為建立了一個臨時的歷史快照,在此之後的增刪改查等操作不會影響到這個快照的結果。

所有文件獲取完畢之後,需要手動清理掉 scroll_id 。雖然es 會有自動清理機制,但是 srcoll_id 的存在會耗費大量的資源來儲存一份當前查詢結果集映像,並且會佔用檔案描述符。所以用完之後要及時清理。使用 es 提供的 CLEAR_API 來刪除指定的 scroll_id。

3.3. search after

當我們在es搜尋時用到 sort 排序,預設返回資料中每個物件都會帶上 sort值,如:

1. 全部資料

請求:

GET operation_log/_search
{
  "query": {
    "match_all": {}
  },
  "sort": [
    {
      "operation_time": {
        "order": "desc"
      }
    }
  ]
}

返回:

{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 7,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [
      {
        "_index" : "operation_log",
        "_type" : "_doc",
        "_id" : "0nPH_4MBMgWicIn2Nxwj",
        "_score" : null,
        "_source" : {
          "ip" : "22.1.11.0",
          "trace_id" : "780821e89b2dc653",
          "operation_time" : "2022-10-02 12:31:10",
          "module" : "資源中心",
          "action_code" : "DELETE",
          "location" : "資源中心->檔案管理",
          "object_id" : "fffff-1",
          "object_name" : "《2022員工績效打分細則》",
          "operator_id" : "operator_id-3",
          "operator_name" : "王五",
          "operator_dept_id" : "operator_dept_id-2",
          "operator_dept_name" : "人力資源部",
          "changes" : [ ]
        },
        "sort" : [
          1664713870000
        ]
      },
      {
        "_index" : "operation_log",
        "_type" : "_doc",
        "_id" : "1nPI_4MBMgWicIn2-hzS",
        "_score" : null,
        "_source" : {
          "ip" : "10.0.0.0",
          "trace_id" : "670021ff9a28768",
          "operation_time" : "2022-10-02 09:32:00",
          "module" : "資源中心",
          "action_code" : "DELETE",
          "location" : "資源中心->檔案管理",
          "object_id" : "fffff-b",
          "object_name" : "《有NULL的文件》",
          "operator_id" : "operator_id-b",
          "operator_name" : "路人B",
          "changes" : [ ]
        },
        "sort" : [
          1664703120000
        ]
      },
      {
        "_index" : "operation_log",
        "_type" : "_doc",
        "_id" : "1XPI_4MBMgWicIn21xwt",
        "_score" : null,
        "_source" : {
          "ip" : "10.0.0.0",
          "trace_id" : "670021ff9a28ei6",
          "operation_time" : "2022-10-02 09:31:00",
          "module" : "資源中心",
          "action_code" : "DELETE",
          "location" : "資源中心->檔案管理",
          "object_id" : "fffff-a",
          "object_name" : "《有空字串的文件》",
          "operator_id" : "operator_id-a",
          "operator_dept_id" : "",
          "operator_dept_name" : "",
          "operator_name" : "路人A",
          "changes" : [ ]
        },
        "sort" : [
          1664703060000
        ]
      },
      {
        "_index" : "operation_log",
        "_type" : "_doc",
        "_id" : "0XPG_4MBMgWicIn2_Bxn",
        "_score" : null,
        "_source" : {
          "trace_id" : "990821e89a2dc653",
          "operator_id" : "operator_id-2",
          "ip" : "22.1.11.0",
          "module" : "資源中心",
          "action_code" : "UPDATE",
          "changes" : [
            {
              "old_value" : "僅李四可檢視",
              "new_value" : "全員可檢視",
              "field_name" : "檢視許可權"
            },
            {
              "old_value" : "僅李四可檢視",
              "new_value" : "人力資源部可檢視",
              "field_name" : "編輯許可權"
            }
          ],
          "operator_dept_id" : "operator_dept_id-2",
          "object_id" : "fffff-1",
          "operator_dept_name" : "人力資源部",
          "operator_name" : "李四",
          "operation_time" : "2022-09-05 11:31:10",
          "object_name" : "《2022員工績效打分細則》",
          "location" : "資源中心->檔案管理->檔案許可權"
        },
        "sort" : [
          1662377470000
        ]
      },
      {
        "_index" : "operation_log",
        "_type" : "_doc",
        "_id" : "1HPI_4MBMgWicIn2rhyD",
        "_score" : null,
        "_source" : {
          "trace_id" : "670021e89a2dc655",
          "operator_id" : "operator_id-2",
          "ip" : "10.1.11.5",
          "module" : "企業組織",
          "action_code" : "DELETE",
          "changes" : [ ],
          "operator_dept_id" : "operator_dept_id-2",
          "object_id" : "xxxxx-1",
          "operator_dept_name" : "人力資源部",
          "operator_name" : "李四",
          "operation_time" : "2022-05-05 10:35:12",
          "object_name" : "成德善",
          "location" : "企業組織->員工管理->身份管理"
        },
        "sort" : [
          1651746912000
        ]
      },
      {
        "_index" : "operation_log",
        "_type" : "_doc",
        "_id" : "03PI_4MBMgWicIn2chxb",
        "_score" : null,
        "_source" : {
          "ip" : "10.1.11.1",
          "trace_id" : "670021e89a2dc7b6",
          "operation_time" : "2022-05-03 09:35:10",
          "module" : "企業組織",
          "action_code" : "ADD",
          "location" : "企業組織->員工管理->身份管理",
          "object_id" : "xxxxx-2",
          "object_name" : "成寶拉",
          "operator_id" : "operator_id-1",
          "operator_name" : "張三",
          "operator_dept_id" : "operator_dept_id-1",
          "operator_dept_name" : "研發中心-後端一部",
          "changes" : [
            {
              "field_name" : "姓名",
              "new_value" : "成寶拉"
            },
            {
              "field_name" : "性別",
              "new_value" : "女"
            },
            {
              "field_name" : "手機號碼",
              "new_value" : "13055770002"
            },
            {
              "field_name" : "郵箱",
              "new_value" : "baola@qq.com"
            }
          ]
        },
        "sort" : [
          1651570510000
        ]
      },
      {
        "_index" : "operation_log",
        "_type" : "_doc",
        "_id" : "0HPG_4MBMgWicIn2yxxE",
        "_score" : null,
        "_source" : {
          "ip" : "10.1.11.1",
          "trace_id" : "670021ff9a2dc6b7",
          "operation_time" : "2022-05-02 09:31:18",
          "module" : "企業組織",
          "action_code" : "UPDATE",
          "location" : "企業組織->員工管理->身份管理",
          "object_id" : "xxxxx-1",
          "object_name" : "成德善",
          "operator_id" : "operator_id-1",
          "operator_name" : "張三",
          "operator_dept_id" : "operator_dept_id-1",
          "operator_dept_name" : "研發中心-後端一部",
          "changes" : [
            {
              "field_name" : "手機號碼",
              "old_value" : "13055660000",
              "new_value" : "13055770001"
            },
            {
              "field_name" : "姓名",
              "old_value" : "成德善",
              "new_value" : "成秀妍"
            }
          ]
        },
        "sort" : [
          1651483878000
        ]
      }
    ]
  }
}

可以查到一共7條資料,然後每條資料的 sort

2. 查 0,3 條資料

請求:

GET operation_log/_search
{
  "query": {
    "match_all": {}
  },
  "sort": [
    {
      "operation_time": {
        "order": "desc"
      }
    }
  ],
  "from": 0,
  "size": 3
}

返回:

{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 7,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [
      {
        "_index" : "operation_log",
        "_type" : "_doc",
        "_id" : "0nPH_4MBMgWicIn2Nxwj",
        "_score" : null,
        "_source" : {
          "ip" : "22.1.11.0",
          "trace_id" : "780821e89b2dc653",
          "operation_time" : "2022-10-02 12:31:10",
          "module" : "資源中心",
          "action_code" : "DELETE",
          "location" : "資源中心->檔案管理",
          "object_id" : "fffff-1",
          "object_name" : "《2022員工績效打分細則》",
          "operator_id" : "operator_id-3",
          "operator_name" : "王五",
          "operator_dept_id" : "operator_dept_id-2",
          "operator_dept_name" : "人力資源部",
          "changes" : [ ]
        },
        "sort" : [
          1664713870000
        ]
      },
      {
        "_index" : "operation_log",
        "_type" : "_doc",
        "_id" : "1nPI_4MBMgWicIn2-hzS",
        "_score" : null,
        "_source" : {
          "ip" : "10.0.0.0",
          "trace_id" : "670021ff9a28768",
          "operation_time" : "2022-10-02 09:32:00",
          "module" : "資源中心",
          "action_code" : "DELETE",
          "location" : "資源中心->檔案管理",
          "object_id" : "fffff-b",
          "object_name" : "《有NULL的文件》",
          "operator_id" : "operator_id-b",
          "operator_name" : "路人B",
          "changes" : [ ]
        },
        "sort" : [
          1664703120000
        ]
      },
      {
        "_index" : "operation_log",
        "_type" : "_doc",
        "_id" : "1XPI_4MBMgWicIn21xwt",
        "_score" : null,
        "_source" : {
          "ip" : "10.0.0.0",
          "trace_id" : "670021ff9a28ei6",
          "operation_time" : "2022-10-02 09:31:00",
          "module" : "資源中心",
          "action_code" : "DELETE",
          "location" : "資源中心->檔案管理",
          "object_id" : "fffff-a",
          "object_name" : "《有空字串的文件》",
          "operator_id" : "operator_id-a",
          "operator_dept_id" : "",
          "operator_dept_name" : "",
          "operator_name" : "路人A",
          "changes" : [ ]
        },
        "sort" : [
          1664703060000
        ]
      }
    ]
  }
}

拿到最後一條(第3條)資料的 sort[1664703060000],查詢後續的3條。

3. 查 3,6 條資料

請求:

GET operation_log/_search
{
  "query": {
    "match_all": {}
  },
  "sort": [
    {
      "operation_time": {
        "order": "desc"
      }
    }
  ],
  "from": 0,
  "size": 3,
  "search_after": [
    1664703060000
  ]
}

返回:

{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 7,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [
      {
        "_index" : "operation_log",
        "_type" : "_doc",
        "_id" : "0XPG_4MBMgWicIn2_Bxn",
        "_score" : null,
        "_source" : {
          "trace_id" : "990821e89a2dc653",
          "operator_id" : "operator_id-2",
          "ip" : "22.1.11.0",
          "module" : "資源中心",
          "action_code" : "UPDATE",
          "changes" : [
            {
              "old_value" : "僅李四可檢視",
              "new_value" : "全員可檢視",
              "field_name" : "檢視許可權"
            },
            {
              "old_value" : "僅李四可檢視",
              "new_value" : "人力資源部可檢視",
              "field_name" : "編輯許可權"
            }
          ],
          "operator_dept_id" : "operator_dept_id-2",
          "object_id" : "fffff-1",
          "operator_dept_name" : "人力資源部",
          "operator_name" : "李四",
          "operation_time" : "2022-09-05 11:31:10",
          "object_name" : "《2022員工績效打分細則》",
          "location" : "資源中心->檔案管理->檔案許可權"
        },
        "sort" : [
          1662377470000
        ]
      },
      {
        "_index" : "operation_log",
        "_type" : "_doc",
        "_id" : "1HPI_4MBMgWicIn2rhyD",
        "_score" : null,
        "_source" : {
          "trace_id" : "670021e89a2dc655",
          "operator_id" : "operator_id-2",
          "ip" : "10.1.11.5",
          "module" : "企業組織",
          "action_code" : "DELETE",
          "changes" : [ ],
          "operator_dept_id" : "operator_dept_id-2",
          "object_id" : "xxxxx-1",
          "operator_dept_name" : "人力資源部",
          "operator_name" : "李四",
          "operation_time" : "2022-05-05 10:35:12",
          "object_name" : "成德善",
          "location" : "企業組織->員工管理->身份管理"
        },
        "sort" : [
          1651746912000
        ]
      },
      {
        "_index" : "operation_log",
        "_type" : "_doc",
        "_id" : "03PI_4MBMgWicIn2chxb",
        "_score" : null,
        "_source" : {
          "ip" : "10.1.11.1",
          "trace_id" : "670021e89a2dc7b6",
          "operation_time" : "2022-05-03 09:35:10",
          "module" : "企業組織",
          "action_code" : "ADD",
          "location" : "企業組織->員工管理->身份管理",
          "object_id" : "xxxxx-2",
          "object_name" : "成寶拉",
          "operator_id" : "operator_id-1",
          "operator_name" : "張三",
          "operator_dept_id" : "operator_dept_id-1",
          "operator_dept_name" : "研發中心-後端一部",
          "changes" : [
            {
              "field_name" : "姓名",
              "new_value" : "成寶拉"
            },
            {
              "field_name" : "性別",
              "new_value" : "女"
            },
            {
              "field_name" : "手機號碼",
              "new_value" : "13055770002"
            },
            {
              "field_name" : "郵箱",
              "new_value" : "baola@qq.com"
            }
          ]
        },
        "sort" : [
          1651570510000
        ]
      }
    ]
  }
}

3.4. pit(search after)

使用 search_after 需要具有相同查詢和排序值的多個搜尋請求。 如果在這些請求之間發生重新整理,結果的順序可能會發生變化,從而導致跨頁面的結果不一致。 為防止出現這種情況,您可以建立一個時間點 (PIT) 以保留搜尋中的當前索引狀態。

1. 建立pit

請求

POST operation_log/_pit?keep_alive=10m

返回:

{
  "id" : "i9W1AwENb3BlcmF0aW9uX2xvZxZYRllDdWw4d1NFbTZDaXJhYXU1VGhnABY1SVZXdWt3Y1JEaUNndzMwNHFZNkhBAAAAAAAAACTZFmtMZ2xzTTVZVGI2V1JxaDV6dFdXMVEAARZYRllDdWw4d1NFbTZDaXJhYXU1VGhnAAA="
}
2. 帶上pit去查詢 0,3=
GET _search
{
  "query": {
    "match_all": {}
  },
  "sort": [
    {
      "operation_time": {
        "order": "desc"
      }
    }
  ],
  "from": 0,
  "size": 3,
  "pit": {
    "id": "i9W1AwENb3BlcmF0aW9uX2xvZxZYRllDdWw4d1NFbTZDaXJhYXU1VGhnABY1SVZXdWt3Y1JEaUNndzMwNHFZNkhBAAAAAAAAACTZFmtMZ2xzTTVZVGI2V1JxaDV6dFdXMVEAARZYRllDdWw4d1NFbTZDaXJhYXU1VGhnAAA=",
    "keep_alive": "10m"
  }
}

這裡要注意的是 GET _search,並沒有指定索引。因為從設計上來看,透過 pit id,就已經能找到當初建立pit的索引了。

2. 帶上pit去查詢 3,6=
GET _search
{
  "query": {
    "match_all": {}
  },
  "sort": [
    {
      "operation_time": {
        "order": "desc"
      }
    }
  ],
  "from": 0,
  "size": 3,
  "search_after": [
    1664703060000
  ],
  "pit": {
    "id": "i9W1AwENb3BlcmF0aW9uX2xvZxZYRllDdWw4d1NFbTZDaXJhYXU1VGhnABY1SVZXdWt3Y1JEaUNndzMwNHFZNkhBAAAAAAAAACTZFmtMZ2xzTTVZVGI2V1JxaDV6dFdXMVEAARZYRllDdWw4d1NFbTZDaXJhYXU1VGhnAAA=",
    "keep_alive": "10m"
  }
}

3.5. 效能對比

分別分頁獲取1 - 10,49000 - 49010,99000 - 99010範圍各10條資料(前提10w條),效能大致是這樣:

分頁方式1~1049000~4901099000~99010
from/size8ms30ms117ms
scroll7ms66ms36ms
search_after5ms8ms7ms

相關文章