1. Use Cases
Beyond ordinary full-text search, Elasticsearch is used in many business scenarios: each business module builds query conditions to fit its own needs, executes them through Elasticsearch, and collects the ids of all matching records. Once the number of hits reaches the tens of thousands, query performance degrades noticeably, especially when very large documents are matched.
There are currently three ways to fetch the record ids:
1. request only the id via "_source": ["id"];
2. set "_source": false and split the device id out of the _id metadata returned by Elasticsearch;
3. store the device id separately with store=true in the mapping and query it with stored_fields=['id'].
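As a sketch, the three approaches correspond to the following search bodies (my_index and the id field are illustrative names, not from the test setup; the third variant additionally requires "store": true on the id field in the mapping):

```
# 1) _source filtering: return only the id field
GET my_index/_search
{ "_source": ["id"] }

# 2) no _source at all: derive the id from each hit's _id metadata
GET my_index/_search
{ "_source": false }

# 3) stored field: return the separately stored id
GET my_index/_search
{ "_source": false, "stored_fields": ["id"] }
```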
2. The store mapping parameter
By default, field values are indexed to make them searchable, but they are not stored. This means the field can be queried, but the original field value cannot be retrieved.
Usually this does not matter. The field value is already part of the _source field, which is stored by default. If you only want to retrieve the value of a single field or a few fields, rather than the whole _source, this can be achieved with _source filtering.
In certain situations it can make sense to store a field. For example, if you have a document with a title, a date, and a very large content field, you may want to retrieve just the title and the date without having to extract those fields from a large _source field:
Set the store parameter of the relevant fields to true and create the mapping:
PUT my_store_test
{
  "mappings": {
    "_doc": {
      "properties": {
        "title": {
          "type": "text",
          "store": true
        },
        "date": {
          "type": "date",
          "store": true
        },
        "content": {
          "type": "text"
        }
      }
    }
  }
}
{
  "acknowledged" : true,
  "shards_acknowledged" : true,
  "index" : "my_store_test"
}
Index a document with a PUT request:
PUT my_store_test/_doc/1
{
  "title": "Some short title",
  "date": "2015-01-01",
  "content": "A very long content field..."
}
{
  "_index" : "my_store_test",
  "_type" : "_doc",
  "_id" : "1",
  "_version" : 1,
  "result" : "created",
  "_shards" : {
    "total" : 2,
    "successful" : 1,
    "failed" : 0
  },
  "_seq_no" : 0,
  "_primary_term" : 1
}
Select the fields to return by setting stored_fields in the query; the fields section of the Elasticsearch response contains the corresponding values:
GET my_store_test/_search
{
  "stored_fields": [ "title", "date" ]
}
{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "my_store_test",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 1.0,
        "fields" : {
          "date" : [
            "2015-01-01T00:00:00.000Z"
          ],
          "title" : [
            "Some short title"
          ]
        }
      }
    ]
  }
}
3. Test Setup
For the test we use my_store_index, which holds 500,000 documents, including some particularly large ones.
We test with the fetch_ids_query function below:
By default, the record id is taken from the _source field of the Elasticsearch response;
take_from__id switches to parsing the record id out of the _id metadata returned by Elasticsearch;
task_stored_fields switches to taking the record id from the fields section of the Elasticsearch response.
from elasticsearch import Elasticsearch
from elasticsearch_dsl import Search, Q
import time

def fetch_ids_query(client, take_from__id=False, task_stored_fields=False):
    start = time.time()
    s = Search(using=client, index="my_store_index")
    s = s.params(http_auth=["test", "test"], request_timeout=50)
    q = Q('bool',
          must_not=[Q('match_phrase_prefix', name='us')])
    s = s.query(q)
    # Either skip _source entirely or request only the id field.
    s = s.source(False) if take_from__id else s.source(['id'])
    if task_stored_fields:
        s = s.extra(stored_fields=['id'])
        s = s.source(False)
    s = s[0:40000]
    response = s.execute()
    print(f'hit total {response.hits.total}')
    print(f'fetch total {len(response.hits.hits)}')
    ids = []
    if take_from__id:
        # Parse the record id out of the _id metadata.
        for hit in response.hits.hits:
            id = hit['_id'][37:]
            ids.append(id)
    elif task_stored_fields:
        # Take the id from the stored fields in the response.
        for hit in response.hits.hits:
            id = hit.fields['id'][0]
            ids.append(id)
    else:
        # Take the id from the returned _source.
        for hit in response.hits.hits:
            id = hit._source['id']
            ids.append(id)
    end = time.time()
    print(f"all execute time {end - start}s")

client = Elasticsearch(hosts=['http://127.0.0.1:9200'], http_auth=["test", "test"])

print('fetch id from source')
fetch_ids_query(client)
print()
print('fetch id from _id and set source = false')
fetch_ids_query(client, True)
print()
print('fetch id from stored id and set source = false')
fetch_ids_query(client, False, True)
4. Results
In a test with 484,970 hits and 40,000 records fetched, the latter two approaches ran noticeably faster. Parsing the id out of the _id metadata is the friendlier choice: it not only saves storage space, but also avoids memory and CPU churn at query time.
fetch id from source
hit total 484970
fetch total 40000
all execute time 28.691869497299194s
fetch id from _id and set source = false
hit total 484970
fetch total 40000
all execute time 11.315539121627808s
fetch id from stored id and set source = false
hit total 484970
fetch total 40000
all execute time 13.930094957351685s
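The _id parsing in the test script presumably relies on the document _id embedding a 36-character UUID prefix plus a one-character separator in front of the record id, which is why hit['_id'][37:] strips the first 37 characters. A minimal sketch of that slice (the sample _id value below is made up for illustration):

```python
def record_id_from_es_id(es_id: str) -> str:
    # Drop the assumed 36-character UUID prefix plus the 1-character separator.
    return es_id[37:]

# Hypothetical _id layout: "<36-char uuid>-<record id>"
sample = "123e4567-e89b-12d3-a456-426614174000" + "-" + "device-42"
print(record_id_from_es_id(sample))  # -> device-42
```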