1. 深度分頁
1.1. 定義
早些年在企業表格類的頁面中,我們總能見到分頁的那種表格設計。當分頁的頁數太多,還設計了“跳轉到x頁”的功能。
我們這裡講的“深度分頁”,就是指當分頁太多(假設有幾十萬頁),當訪問第幾萬頁時,必然會查詢很慢。因為針對通常對於分頁的搜尋,都是將幾萬頁的資料查出來,然後取差集,只返回當前頁的資料。只為了返回一頁的資料,卻查詢的很深入,帶來的效能代價很大。
1.2. 需求上杜絕
1. 少分頁
首先,我們先想想這樣的需求設計是否合理,誰會在一個表格中跳到第幾萬頁檢視資料。我們如果想要找到想要的內容,應該是透過搜尋,而非每頁都點進去去看。
所以看看百度、谷歌搜尋引擎的設計,都是最多隻展示10頁。
2. “翻頁”替代“跳頁”
那如果有些場景,我就需要檢視所有頁面的內容呢?例如:貼吧的評論等。ok,就算有這種情況,也不應該存在“跳頁”的需求,可以設計成一頁一頁的“翻頁”。
例如我們身邊很多app的設計:下拉重新整理更多。每下拉一次,就多展示一頁資料。
總結來看,實在想不到需要深度“跳頁”的需求,那就應該在需求上杜絕。如果為了一個沒必要的需求,需要對研發、伺服器投入大量的資源消耗,是很浪費的。
1.3. 技術實現
下面是從 mysql 和 elasticsearch 的維度講講深度分頁的實現。這裡先總結一下:
- mysql:深度分頁,包括跳頁也能做,可以透過減少回表提高效能
- elasticsearch:天然不建議跳頁,但提供了翻頁的最佳化方案
2. mysql深度分頁最佳化
2.1. 減少回表
最佳化前
表格查詢,基本上查詢的欄位都要求很多。按照通常分頁查詢的場景,假設正常分頁查詢的sql如下:
select * from table_name where userId = 'xxx' limit 10000, 10
就算我們基於 userId 建立普通索引,因為 select * ,我們依然要先基於普通索引查詢 10010 條資料,然後回表查詢 10010 次,但最後再將前 10000條資料丟掉,只取10條。
最佳化後
最佳化的sql是:
select * from table_name where id in (select id from table_name where userId = 'xxx' limit 10000, 10)
這裡建立子查詢,子查詢中只查詢了主鍵,可以走普通索引直接查到。而最終的回表查詢,只是查 10 條資料。所以,避免了 10000 次的回表,而且避免了 10000 次 select *
帶來的 io 浪費。
這裡的最佳化,適用於翻頁、跳頁的場景。那麼下面針對於翻頁,還可以進一步最佳化。
2.2. 翻頁最佳化
如果是翻頁,那每一次查詢,都能拿到上一次查詢的最後一條資料。這裡有個前提,就看我們分頁搜尋是根據什麼欄位排序的。這裡先假設是根據id排序的,id是順序的(自增長或雪花ID),假設上一次查詢的id為 15000,那麼sql可以是:
select * from table_name where userId = 'xxx' and id > 15000 limit 0, 10
3. es深度分頁
3.1. from/size
這個是我們最常用的 es 分頁語法了。但是它天然不支援深度分頁。
要求:from + size < max_result_window ,否則es查詢就會報錯。而es max_result_window預設值為 10000。為什麼呢?
假設 es 索引建立了8個分片,假設我們查詢 from=1000,size=10
,第101頁的10條資料,實際需要查詢多少資料呢?
因為 es 不知道這第 101頁的資料在哪個分片,所以協調節點發請求命令到每個分片對應的資料節點上,從每個分片都獲取了 1010 條資料。
此時協調節點中一共獲取到了 1010 * 8 = 8080 條資料,再基於協調節點的記憶體做排序,拿到符合 from=1000,size=10
的10條資料,最終將 8070 條資料丟棄。
這裡就分頁到了101頁,如果分頁上萬呢,如果es的索引分片更多呢?就為了獲取這10條資料,對es的記憶體等效能損害太大了,所以才有了 max_result_window 的限制。
3.2. scroll遍歷
ES官方不再推薦使用Scroll API 進行深度分頁。 如果您需要在分頁超過 10,000 個點選時保留索引狀態,請使用帶有時間點 (PIT) 的 search_after 引數。所以這裡就不多說了。
Scroll API 原理上是對某次查詢生成一個遊標 scroll_id , 後續的查詢只需要根據這個遊標去取資料,直到結果集中返回的 hits 欄位為空,就表示遍歷結束。scroll_id 的生成可以理解為建立了一個臨時的歷史快照,在此之後的增刪改查等操作不會影響到這個快照的結果。
所有文件獲取完畢之後,需要手動清理掉 scroll_id 。雖然es 會有自動清理機制,但是 srcoll_id 的存在會耗費大量的資源來儲存一份當前查詢結果集映像,並且會佔用檔案描述符。所以用完之後要及時清理。使用 es 提供的 CLEAR_API 來刪除指定的 scroll_id。
3.3. search after
當我們在es搜尋時用到 sort
排序,預設返回資料中每個物件都會帶上 sort
值,如:
1. 全部資料
請求:
GET operation_log/_search
{
"query": {
"match_all": {}
},
"sort": [
{
"operation_time": {
"order": "desc"
}
}
]
}
返回:
{
"took" : 0,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 7,
"relation" : "eq"
},
"max_score" : null,
"hits" : [
{
"_index" : "operation_log",
"_type" : "_doc",
"_id" : "0nPH_4MBMgWicIn2Nxwj",
"_score" : null,
"_source" : {
"ip" : "22.1.11.0",
"trace_id" : "780821e89b2dc653",
"operation_time" : "2022-10-02 12:31:10",
"module" : "資源中心",
"action_code" : "DELETE",
"location" : "資源中心->檔案管理",
"object_id" : "fffff-1",
"object_name" : "《2022員工績效打分細則》",
"operator_id" : "operator_id-3",
"operator_name" : "王五",
"operator_dept_id" : "operator_dept_id-2",
"operator_dept_name" : "人力資源部",
"changes" : [ ]
},
"sort" : [
1664713870000
]
},
{
"_index" : "operation_log",
"_type" : "_doc",
"_id" : "1nPI_4MBMgWicIn2-hzS",
"_score" : null,
"_source" : {
"ip" : "10.0.0.0",
"trace_id" : "670021ff9a28768",
"operation_time" : "2022-10-02 09:32:00",
"module" : "資源中心",
"action_code" : "DELETE",
"location" : "資源中心->檔案管理",
"object_id" : "fffff-b",
"object_name" : "《有NULL的文件》",
"operator_id" : "operator_id-b",
"operator_name" : "路人B",
"changes" : [ ]
},
"sort" : [
1664703120000
]
},
{
"_index" : "operation_log",
"_type" : "_doc",
"_id" : "1XPI_4MBMgWicIn21xwt",
"_score" : null,
"_source" : {
"ip" : "10.0.0.0",
"trace_id" : "670021ff9a28ei6",
"operation_time" : "2022-10-02 09:31:00",
"module" : "資源中心",
"action_code" : "DELETE",
"location" : "資源中心->檔案管理",
"object_id" : "fffff-a",
"object_name" : "《有空字串的文件》",
"operator_id" : "operator_id-a",
"operator_dept_id" : "",
"operator_dept_name" : "",
"operator_name" : "路人A",
"changes" : [ ]
},
"sort" : [
1664703060000
]
},
{
"_index" : "operation_log",
"_type" : "_doc",
"_id" : "0XPG_4MBMgWicIn2_Bxn",
"_score" : null,
"_source" : {
"trace_id" : "990821e89a2dc653",
"operator_id" : "operator_id-2",
"ip" : "22.1.11.0",
"module" : "資源中心",
"action_code" : "UPDATE",
"changes" : [
{
"old_value" : "僅李四可檢視",
"new_value" : "全員可檢視",
"field_name" : "檢視許可權"
},
{
"old_value" : "僅李四可檢視",
"new_value" : "人力資源部可檢視",
"field_name" : "編輯許可權"
}
],
"operator_dept_id" : "operator_dept_id-2",
"object_id" : "fffff-1",
"operator_dept_name" : "人力資源部",
"operator_name" : "李四",
"operation_time" : "2022-09-05 11:31:10",
"object_name" : "《2022員工績效打分細則》",
"location" : "資源中心->檔案管理->檔案許可權"
},
"sort" : [
1662377470000
]
},
{
"_index" : "operation_log",
"_type" : "_doc",
"_id" : "1HPI_4MBMgWicIn2rhyD",
"_score" : null,
"_source" : {
"trace_id" : "670021e89a2dc655",
"operator_id" : "operator_id-2",
"ip" : "10.1.11.5",
"module" : "企業組織",
"action_code" : "DELETE",
"changes" : [ ],
"operator_dept_id" : "operator_dept_id-2",
"object_id" : "xxxxx-1",
"operator_dept_name" : "人力資源部",
"operator_name" : "李四",
"operation_time" : "2022-05-05 10:35:12",
"object_name" : "成德善",
"location" : "企業組織->員工管理->身份管理"
},
"sort" : [
1651746912000
]
},
{
"_index" : "operation_log",
"_type" : "_doc",
"_id" : "03PI_4MBMgWicIn2chxb",
"_score" : null,
"_source" : {
"ip" : "10.1.11.1",
"trace_id" : "670021e89a2dc7b6",
"operation_time" : "2022-05-03 09:35:10",
"module" : "企業組織",
"action_code" : "ADD",
"location" : "企業組織->員工管理->身份管理",
"object_id" : "xxxxx-2",
"object_name" : "成寶拉",
"operator_id" : "operator_id-1",
"operator_name" : "張三",
"operator_dept_id" : "operator_dept_id-1",
"operator_dept_name" : "研發中心-後端一部",
"changes" : [
{
"field_name" : "姓名",
"new_value" : "成寶拉"
},
{
"field_name" : "性別",
"new_value" : "女"
},
{
"field_name" : "手機號碼",
"new_value" : "13055770002"
},
{
"field_name" : "郵箱",
"new_value" : "baola@qq.com"
}
]
},
"sort" : [
1651570510000
]
},
{
"_index" : "operation_log",
"_type" : "_doc",
"_id" : "0HPG_4MBMgWicIn2yxxE",
"_score" : null,
"_source" : {
"ip" : "10.1.11.1",
"trace_id" : "670021ff9a2dc6b7",
"operation_time" : "2022-05-02 09:31:18",
"module" : "企業組織",
"action_code" : "UPDATE",
"location" : "企業組織->員工管理->身份管理",
"object_id" : "xxxxx-1",
"object_name" : "成德善",
"operator_id" : "operator_id-1",
"operator_name" : "張三",
"operator_dept_id" : "operator_dept_id-1",
"operator_dept_name" : "研發中心-後端一部",
"changes" : [
{
"field_name" : "手機號碼",
"old_value" : "13055660000",
"new_value" : "13055770001"
},
{
"field_name" : "姓名",
"old_value" : "成德善",
"new_value" : "成秀妍"
}
]
},
"sort" : [
1651483878000
]
}
]
}
}
可以查到一共7條資料,然後每條資料的 sort
。
2. 查 0,3 條資料
請求:
GET operation_log/_search
{
"query": {
"match_all": {}
},
"sort": [
{
"operation_time": {
"order": "desc"
}
}
],
"from": 0,
"size": 3
}
返回:
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 7,
"relation" : "eq"
},
"max_score" : null,
"hits" : [
{
"_index" : "operation_log",
"_type" : "_doc",
"_id" : "0nPH_4MBMgWicIn2Nxwj",
"_score" : null,
"_source" : {
"ip" : "22.1.11.0",
"trace_id" : "780821e89b2dc653",
"operation_time" : "2022-10-02 12:31:10",
"module" : "資源中心",
"action_code" : "DELETE",
"location" : "資源中心->檔案管理",
"object_id" : "fffff-1",
"object_name" : "《2022員工績效打分細則》",
"operator_id" : "operator_id-3",
"operator_name" : "王五",
"operator_dept_id" : "operator_dept_id-2",
"operator_dept_name" : "人力資源部",
"changes" : [ ]
},
"sort" : [
1664713870000
]
},
{
"_index" : "operation_log",
"_type" : "_doc",
"_id" : "1nPI_4MBMgWicIn2-hzS",
"_score" : null,
"_source" : {
"ip" : "10.0.0.0",
"trace_id" : "670021ff9a28768",
"operation_time" : "2022-10-02 09:32:00",
"module" : "資源中心",
"action_code" : "DELETE",
"location" : "資源中心->檔案管理",
"object_id" : "fffff-b",
"object_name" : "《有NULL的文件》",
"operator_id" : "operator_id-b",
"operator_name" : "路人B",
"changes" : [ ]
},
"sort" : [
1664703120000
]
},
{
"_index" : "operation_log",
"_type" : "_doc",
"_id" : "1XPI_4MBMgWicIn21xwt",
"_score" : null,
"_source" : {
"ip" : "10.0.0.0",
"trace_id" : "670021ff9a28ei6",
"operation_time" : "2022-10-02 09:31:00",
"module" : "資源中心",
"action_code" : "DELETE",
"location" : "資源中心->檔案管理",
"object_id" : "fffff-a",
"object_name" : "《有空字串的文件》",
"operator_id" : "operator_id-a",
"operator_dept_id" : "",
"operator_dept_name" : "",
"operator_name" : "路人A",
"changes" : [ ]
},
"sort" : [
1664703060000
]
}
]
}
}
拿到最後一條(第3條)資料的 sort
值 [1664703060000]
,查詢後續的3條。
3. 查 3,6 條資料
請求:
GET operation_log/_search
{
"query": {
"match_all": {}
},
"sort": [
{
"operation_time": {
"order": "desc"
}
}
],
"from": 0,
"size": 3,
"search_after": [
1664703060000
]
}
返回:
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 7,
"relation" : "eq"
},
"max_score" : null,
"hits" : [
{
"_index" : "operation_log",
"_type" : "_doc",
"_id" : "0XPG_4MBMgWicIn2_Bxn",
"_score" : null,
"_source" : {
"trace_id" : "990821e89a2dc653",
"operator_id" : "operator_id-2",
"ip" : "22.1.11.0",
"module" : "資源中心",
"action_code" : "UPDATE",
"changes" : [
{
"old_value" : "僅李四可檢視",
"new_value" : "全員可檢視",
"field_name" : "檢視許可權"
},
{
"old_value" : "僅李四可檢視",
"new_value" : "人力資源部可檢視",
"field_name" : "編輯許可權"
}
],
"operator_dept_id" : "operator_dept_id-2",
"object_id" : "fffff-1",
"operator_dept_name" : "人力資源部",
"operator_name" : "李四",
"operation_time" : "2022-09-05 11:31:10",
"object_name" : "《2022員工績效打分細則》",
"location" : "資源中心->檔案管理->檔案許可權"
},
"sort" : [
1662377470000
]
},
{
"_index" : "operation_log",
"_type" : "_doc",
"_id" : "1HPI_4MBMgWicIn2rhyD",
"_score" : null,
"_source" : {
"trace_id" : "670021e89a2dc655",
"operator_id" : "operator_id-2",
"ip" : "10.1.11.5",
"module" : "企業組織",
"action_code" : "DELETE",
"changes" : [ ],
"operator_dept_id" : "operator_dept_id-2",
"object_id" : "xxxxx-1",
"operator_dept_name" : "人力資源部",
"operator_name" : "李四",
"operation_time" : "2022-05-05 10:35:12",
"object_name" : "成德善",
"location" : "企業組織->員工管理->身份管理"
},
"sort" : [
1651746912000
]
},
{
"_index" : "operation_log",
"_type" : "_doc",
"_id" : "03PI_4MBMgWicIn2chxb",
"_score" : null,
"_source" : {
"ip" : "10.1.11.1",
"trace_id" : "670021e89a2dc7b6",
"operation_time" : "2022-05-03 09:35:10",
"module" : "企業組織",
"action_code" : "ADD",
"location" : "企業組織->員工管理->身份管理",
"object_id" : "xxxxx-2",
"object_name" : "成寶拉",
"operator_id" : "operator_id-1",
"operator_name" : "張三",
"operator_dept_id" : "operator_dept_id-1",
"operator_dept_name" : "研發中心-後端一部",
"changes" : [
{
"field_name" : "姓名",
"new_value" : "成寶拉"
},
{
"field_name" : "性別",
"new_value" : "女"
},
{
"field_name" : "手機號碼",
"new_value" : "13055770002"
},
{
"field_name" : "郵箱",
"new_value" : "baola@qq.com"
}
]
},
"sort" : [
1651570510000
]
}
]
}
}
3.4. pit(search after)
使用 search_after 需要具有相同查詢和排序值的多個搜尋請求。 如果在這些請求之間發生重新整理,結果的順序可能會發生變化,從而導致跨頁面的結果不一致。 為防止出現這種情況,您可以建立一個時間點 (PIT) 以保留搜尋中的當前索引狀態。
1. 建立pit
請求
POST operation_log/_pit?keep_alive=10m
返回:
{
"id" : "i9W1AwENb3BlcmF0aW9uX2xvZxZYRllDdWw4d1NFbTZDaXJhYXU1VGhnABY1SVZXdWt3Y1JEaUNndzMwNHFZNkhBAAAAAAAAACTZFmtMZ2xzTTVZVGI2V1JxaDV6dFdXMVEAARZYRllDdWw4d1NFbTZDaXJhYXU1VGhnAAA="
}
2. 帶上pit去查詢 0,3=
GET _search
{
"query": {
"match_all": {}
},
"sort": [
{
"operation_time": {
"order": "desc"
}
}
],
"from": 0,
"size": 3,
"pit": {
"id": "i9W1AwENb3BlcmF0aW9uX2xvZxZYRllDdWw4d1NFbTZDaXJhYXU1VGhnABY1SVZXdWt3Y1JEaUNndzMwNHFZNkhBAAAAAAAAACTZFmtMZ2xzTTVZVGI2V1JxaDV6dFdXMVEAARZYRllDdWw4d1NFbTZDaXJhYXU1VGhnAAA=",
"keep_alive": "10m"
}
}
這裡要注意的是 GET _search
,並沒有指定索引。因為從設計上來看,透過 pit id,就已經能找到當初建立pit的索引了。
2. 帶上pit去查詢 3,6=
GET _search
{
"query": {
"match_all": {}
},
"sort": [
{
"operation_time": {
"order": "desc"
}
}
],
"from": 0,
"size": 3,
"search_after": [
1664703060000
],
"pit": {
"id": "i9W1AwENb3BlcmF0aW9uX2xvZxZYRllDdWw4d1NFbTZDaXJhYXU1VGhnABY1SVZXdWt3Y1JEaUNndzMwNHFZNkhBAAAAAAAAACTZFmtMZ2xzTTVZVGI2V1JxaDV6dFdXMVEAARZYRllDdWw4d1NFbTZDaXJhYXU1VGhnAAA=",
"keep_alive": "10m"
}
}
3.5. 效能對比
分別分頁獲取1 - 10,49000 - 49010,99000 - 99010範圍各10條資料(前提10w條),效能大致是這樣:
分頁方式 | 1~10 | 49000~49010 | 99000~99010 |
---|---|---|---|
from/size | 8ms | 30ms | 117ms |
scroll | 7ms | 66ms | 36ms |
search_after | 5ms | 8ms | 7ms |