Elasticsearch: beware the performance cost of inner hits

Posted by 無風聽海 on 2022-01-06

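I. Introduction to inner hits

Elasticsearch provides the nested data type for handling parent and child documents. It solves the problem of a child document's fields being split apart and flattened, which loses the association between fields that belong to the same child object.

Inner hits complement this: when a query matches on child documents, inner hits make it easy to control which of the matching child documents are returned.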

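II. Data description

For the data structure and the index setup, see the article "elasticsearch支援大table格式資料的搜尋" (Elasticsearch support for searching large table-format data).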

III. The problem

A simple search by IP matched a single parent document and returned ten of its child documents with highlighting.

The query:

{
  "_source": {
    "excludes": [
      "content"
    ]
  },
  "query": {
    "bool": {
      "should": {
        "nested": {
          "path": "content",
          "query": {
            "query_string": {
              "query": "192.168.1.1*",
              "fields": [
                "content.*"
              ]
            }
          },
          "inner_hits": {
            "from": 0,
            "size": 10,
            "highlight": {
              "fields": {
                "*": {}
              },
              "fragment_size": 1000
            }
          },
          "score_mode": "avg",
          "ignore_unmapped": true
        }
      }
    }
  },
  "size": 20,
  "timeout": "20s"
}

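The request took 3111 ms. For a query that matches a single document and returns only ten highlighted child documents, that is far longer than it should be.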

{
    "took":3111,
    "timed_out":false,
    "_shards":{
        "total":1,
        "successful":1,
        "skipped":0,
        "failed":0
    },
    "hits":{
        "total":1,
        "max_score":0.001722915,
        "hits":[
		]
    }
}

IV. Locating the problem

Run the following request, which uses the profile API to see how long the query itself takes:


{
  "profile": true,
  "_source": {
    "excludes": [
      "content"
    ]
  },
  "query": {
    "bool": {
      "should": {
        "nested": {
          "path": "content",
          "query": {
            "query_string": {
              "query": "192.168.1.1*",
              "fields": [
                "content.*"
              ]
            }
          },
          "inner_hits": {
            "from": 0,
            "size": 10,
            "highlight": {
              "fields": {
                "*": {}
              },
              "fragment_size": 1000
            }
          },
          "score_mode": "avg",
          "ignore_unmapped": true
        }
      }
    }
  },
  "size": 20,
  "timeout": "20s"
}

The profile section shows that the whole search took less than 20 ms, so the query itself is clearly not the problem.

{
    "took":2859,
    "timed_out":false,
    "profile":{
        "shards":[
            {
                "searches":[
                    {
                        "query":[
                            {
                                "type":"BooleanQuery",
                                "time":"9.9ms",
                                "time_in_nanos":9945310,
                                "breakdown":{
                                    "score":9349172,
                                    "build_scorer_count":6,
                                    "match_count":0,
                                    "create_weight":398951,
                                    "next_doc":1262,
                                    "match":0,
                                    "create_weight_count":1,
                                    "next_doc_count":1,
                                    "score_count":1,
                                    "build_scorer":176010,
                                    "advance":19905,
                                    "advance_count":1
                                }
                            }
                        ],
                        "rewrite_time":41647,
                        "collector":[
                            {
                                "name":"CancellableCollector",
                                "reason":"search_cancelled",
                                "time":"9.3ms",
                                "time_in_nanos":9376796,
                                "children":[
                                    {
                                        "name":"SimpleTopScoreDocCollector",
                                        "reason":"search_top_hits",
                                        "time":"9.3ms",
                                        "time_in_nanos":9355874
                                    }
                                ]
                            }
                        ]
                    }
                ],
                "aggregations":[

                ]
            }
        ]
    }
}
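The gap between the profiled time and `took` can be made explicit with a few lines of Python. This is only a sketch: the helper name `profiled_query_ms` is made up for illustration, and the dictionary is a trimmed copy of the response above that keeps just the fields the helper reads. Note that the profile shown here contains only query, rewrite, and collector timings, so whatever happens after the query phase is not represented in it.

```python
def profiled_query_ms(response):
    """Sum rewrite time and top-level query time over all shards, in milliseconds."""
    nanos = 0
    for shard in response["profile"]["shards"]:
        for search in shard["searches"]:
            nanos += search["rewrite_time"]
            nanos += sum(q["time_in_nanos"] for q in search["query"])
    return nanos / 1_000_000


# Trimmed copy of the profile response above (illustrative subset of fields).
response = {
    "took": 2859,
    "profile": {
        "shards": [
            {
                "searches": [
                    {
                        "query": [
                            {"type": "BooleanQuery", "time_in_nanos": 9945310}
                        ],
                        "rewrite_time": 41647,
                    }
                ]
            }
        ]
    },
}

query_ms = profiled_query_ms(response)
unaccounted_ms = response["took"] - query_ms

print(f"profiled query time: {query_ms:.2f} ms")
print(f"not covered by the query profile: {unaccounted_ms:.2f} ms")
```

Almost all of the 2859 ms falls outside the profiled query time, which is why the investigation moves on to what is being returned rather than how it is matched.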

Is highlighting the problem?

Remove the highlight section from the query and run the following:

{
  "_source": {
    "excludes": [
      "content"
    ]
  },
  "query": {
    "bool": {
      "should": {
        "nested": {
          "path": "content",
          "query": {
            "query_string": {
              "query": "192.168.1.1*",
              "fields": [
                "content.*"
              ]
            }
          },
          "inner_hits": {
            "from": 0,
            "size": 10
          },
          "score_mode": "avg",
          "ignore_unmapped": true
        }
      }
    }
  },
  "size": 20,
  "timeout": "20s"
}

The execution time barely changes:

{
    "took":3117,
    "timed_out":false,
    "_shards":{
        "total":1,
        "successful":1,
        "skipped":0,
        "failed":0
    },
    "hits":{
        "total":1,
        "max_score":0.001722915,
        "hits":[
            {
                 "inner_hits":{
                    "content":{
                        "hits":{
                            "total":400000,
                            "max_score":0.001722915,
                            "hits":[
                             ]
                        }
                    }
                }
            }
        ]
    }
}

The only remaining suspect is the documents being returned.

Disable returning the parent document's source and run the following:

{
  "_source": false,
  "query": {
    "bool": {
      "should": {
        "nested": {
          "path": "content",
          "query": {
            "query_string": {
              "query": "192.168.1.1*",
              "fields": [
                "content.*"
              ]
            }
          },
          "inner_hits": {
            "from": 0,
            "size": 10
          },
          "score_mode": "avg",
          "ignore_unmapped": true
        }
      }
    }
  },
  "size": 20,
  "timeout": "20s"
}

The time still barely changes:

{
    "took":2915,
    "timed_out":false,
    "_shards":{
        "total":1,
        "successful":1,
        "skipped":0,
        "failed":0
    },
    "hits":{
        "total":1,
        "max_score":0.001722915,
        "hits":[
            {
                 "inner_hits":{
                    "content":{
                        "hits":{
                            "total":400000,
                            "max_score":0.001722915,
                            "hits":[
                             ]
                        }
                    }
                }
            }
        ]
    }
}

Change the query so that no child documents are returned (inner hits size set to 0) and run the following:

{
  "_source": false,
  "query": {
    "bool": {
      "should": {
        "nested": {
          "path": "content",
          "query": {
            "query_string": {
              "query": "192.168.1.1*",
              "fields": [
                "content.*"
              ]
            }
          },
          "inner_hits": {
            "from": 0,
            "size": 0
          },
          "score_mode": "avg",
          "ignore_unmapped": true
        }
      }
    }
  },
  "size": 20,
  "timeout": "20s"
}

This time the request completes in 10 ms:

{
    "took":10,
    "timed_out":false,
    "_shards":{
        "total":1,
        "successful":1,
        "skipped":0,
        "failed":0
    },
    "hits":{
        "total":1,
        "max_score":0.001722915,
        "hits":[
            {
                "_type":"_doc",
                "_score":0.001722915,
                "inner_hits":{
                    "content":{
                        "hits":{
                            "total":400000,
                            "max_score":0,
                            "hits":[

                            ]
                        }
                    }
                }
            }
        ]
    }
}
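Even with the inner hits size set to 0, the response still reports how many child documents matched. The sketch below reads those totals from a response; the helper name `inner_hit_totals` is hypothetical, and the sample dictionary is a trimmed copy of the response above. It also accepts the object form of `total` (`{"value": ..., "relation": ...}`) that newer Elasticsearch versions return.

```python
def inner_hit_totals(response, name):
    """Return the number of matching child documents for each parent hit."""
    totals = []
    for hit in response["hits"]["hits"]:
        total = hit["inner_hits"][name]["hits"]["total"]
        # Newer versions wrap the count in an object; older ones use a plain int.
        if isinstance(total, dict):
            total = total["value"]
        totals.append(total)
    return totals


# Trimmed copy of the size-0 response above.
response = {
    "took": 10,
    "hits": {
        "total": 1,
        "hits": [
            {
                "_score": 0.001722915,
                "inner_hits": {
                    "content": {
                        "hits": {"total": 400000, "max_score": 0, "hits": []}
                    }
                },
            }
        ],
    },
}

totals = inner_hit_totals(response, "content")
print(totals)
```

This can be useful when only the count of matching children is needed, since it avoids the cost of returning the children themselves.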

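V. Root cause analysis

The analysis above shows that returning ten child documents is what inflates the execution time. Intuitively, simply returning ten small documents should not take anywhere near this long.

Inner hits provide from and size to control how many child documents are returned. It is tempting to use them like the paging parameters of an ordinary query, but here size defaults to 3 and from + size must not exceed 100 unless the index.max_inner_result_window index setting is changed. Going over the limit produces the following error: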

{
                "type":"illegal_argument_exception",
                "reason":"Inner result window is too large, the inner hit definition's [null]'s from + size must be less than or equal to: [100] but was [101]. This limit can be set by changing the [index.max_inner_result_window] index level setting."
            }
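The rule stated in the error message is simple enough to check on the client before sending a request. The following is a minimal sketch under the assumption that the index keeps the limit of 100 named in the message; the function name `fits_inner_result_window` and the constant are made up for illustration and are not part of any Elasticsearch client.

```python
# Limit taken from the error message above; an index may override it
# through the index.max_inner_result_window setting.
DEFAULT_MAX_INNER_RESULT_WINDOW = 100


def fits_inner_result_window(from_, size, max_window=DEFAULT_MAX_INNER_RESULT_WINDOW):
    """Return True if from + size stays within the inner result window."""
    if from_ < 0 or size < 0:
        raise ValueError("from and size must be non-negative")
    return from_ + size <= max_window


print(fits_inner_result_window(0, 10))    # the query used in this article
print(fits_inner_result_window(0, 100))   # exactly at the limit
print(fits_inner_result_window(1, 100))   # 101, the case in the error message
```

The last call corresponds to the rejected request: from + size is 101, one past the limit.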

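A limit like this suggests that inner hits are not cheap, and that the cost is tied to how the nested type is stored and how inner hits are implemented. The parent document and all of its child documents are stored together in the parent (root) document's _source field, so returning a child document means loading and parsing the root document's _source and then locating the child within it. The responses above show that although only one parent document matched, 400,000 of its child documents matched. That many children make the _source very large, which is what makes the execution time balloon.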

Nested documents don’t have a _source field, because the entire source of document is stored with the root document under its _source field. To include the source of just the nested document, the source of the root document is parsed and just the relevant bit for the nested document is included as source in the inner hit. Doing this for each matching nested document has an impact on the time it takes to execute the entire search request, especially when size and the inner hits' size are set higher than the default. To avoid the relatively expensive source extraction for nested inner hits, one can disable including the source and solely rely on doc values fields.
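A request body that follows this suggestion might look like the sketch below, which turns off `_source` inside `inner_hits` and asks for doc values instead. The helper name and the field name `content.ip` are hypothetical: the article does not list the actual fields under `content`, so substitute fields that exist in the mapping and have doc values enabled.

```python
import json


def nested_query_with_doc_values(path, query_text, doc_value_fields, size=3):
    """Build a nested query whose inner hits skip _source and return doc values."""
    return {
        "_source": False,
        "query": {
            "nested": {
                "path": path,
                "query": {
                    "query_string": {
                        "query": query_text,
                        "fields": [f"{path}.*"],
                    }
                },
                "inner_hits": {
                    "size": size,
                    # Skip source extraction for the nested hits.
                    "_source": False,
                    # Hypothetical field list; must be doc-values-enabled fields.
                    "docvalue_fields": list(doc_value_fields),
                },
            }
        },
    }


body = nested_query_with_doc_values("content", "192.168.1.1*", ["content.ip"])
print(json.dumps(body, indent=2))
```

Whether this helps depends on the data, so it is worth measuring `took` before and after on the real index.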

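VI. Solutions

  1. A single document is stored on a single shard, so adding shards cannot speed up this query;
  2. The documentation suggests disabling _source and relying on doc values fields, but in testing this made almost no difference to the query time;
  3. Returning fewer child documents reduces the query time significantly, for example returning 3 as below: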
{
    "took":967,
    "timed_out":false,
    "_shards":{
        "total":1,
        "successful":1,
        "skipped":0,
        "failed":0
    },
    "hits":{
        "total":1,
        "max_score":0.001722915,
        "hits":[
            {
                "_type":"_doc",
                "_score":0.001722915,
                "inner_hits":{
                    "content":{
                        "hits":{
                            "total":100008,
                            "max_score":0.001722915
                        }
                    }
                }
            }
        ]
    }
}
