[Repost] 23 Most Useful Elasticsearch Search Techniques

Posted by 小旋鋒 on 2018-08-24

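Preface

This article covers 23 of the most useful Elasticsearch search techniques, with detailed worked examples and matching Java API implementations. It is intended as a practical resource for learning and using Elasticsearch.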

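Data Preparation

To demonstrate the different types of ES queries, we will search a collection of documents with the following fields:

title               title of the book
authors             authors
summary             summary
publish_date        publication date
num_reviews         number of reviews
publisher           publisher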

First, we create a new index and load the data in bulk using the bulk API.

# Index settings
PUT /bookdb_index
{ "settings": { "number_of_shards": 1 }}

# Submit the data with the bulk API
POST /bookdb_index/book/_bulk
{"index":{"_id":1}}
{"title":"Elasticsearch: The Definitive Guide","authors":["clinton gormley","zachary tong"],"summary":"A distibuted real-time search and analytics engine","publish_date":"2015-02-07","num_reviews":20,"publisher":"oreilly"}
{"index":{"_id":2}}
{"title":"Taming Text: How to Find, Organize, and Manipulate It","authors":["grant ingersoll","thomas morton","drew farris"],"summary":"organize text using approaches such as full-text search, proper name recognition, clustering, tagging, information extraction, and summarization","publish_date":"2013-01-24","num_reviews":12,"publisher":"manning"}
{"index":{"_id":3}}
{"title":"Elasticsearch in Action","authors":["radu gheorge","matthew lee hinman","roy russo"],"summary":"build scalable search applications using Elasticsearch without having to do complex low-level programming or understand advanced data science algorithms","publish_date":"2015-12-03","num_reviews":18,"publisher":"manning"}
{"index":{"_id":4}}
{"title":"Solr in Action","authors":["trey grainger","timothy potter"],"summary":"Comprehensive guide to implementing a scalable search engine using Apache Solr","publish_date":"2014-04-05","num_reviews":23,"publisher":"manning"}

Note: the experiments in this article were run on ES 6.3.0.

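1. Basic Match Query

1.1 Full-text search

There are two ways to run a full-text search:

1) Use the search API with the parameters passed as part of the URL

Example: the following runs a full-text search for "guide"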

GET bookdb_index/book/_search?q=guide

[Results]
  "hits": {
    "total": 2,
    "max_score": 1.3278645,
    "hits": [
      {
        "_index": "bookdb_index",
        "_type": "book",
        "_id": "4",
        "_score": 1.3278645,
        "_source": {
          "title": "Solr in Action",
          "authors": [
            "trey grainger",
            "timothy potter"
          ],
          "summary": "Comprehensive guide to implementing a scalable search engine using Apache Solr",
          "publish_date": "2014-04-05",
          "num_reviews": 23,
          "publisher": "manning"
        }
      },
      {
        "_index": "bookdb_index",
        "_type": "book",
        "_id": "1",
        "_score": 1.2871116,
        "_source": {
          "title": "Elasticsearch: The Definitive Guide",
          "authors": [
            "clinton gormley",
            "zachary tong"
          ],
          "summary": "A distibuted real-time search and analytics engine",
          "publish_date": "2015-02-07",
          "num_reviews": 20,
          "publisher": "oreilly"
        }
      }
    ]
  }

2) Use the full ES DSL, with a JSON body as the request body. The results are the same as with method 1).

GET bookdb_index/book/_search
{
  "query": {
    "multi_match": {
      "query": "guide",
      "fields" : ["_all"]
    }
  }
}

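Explanation: the multi_match keyword is used in place of match as a convenient shorthand for running the same query against multiple fields. The fields property specifies which fields to query; in this case we want to query all fields in the document.

Note: ES 6.x does not enable the _all field by default. When fields is not specified, the search runs against all fields.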

1.2 Searching a specific field

Both APIs also let you specify which fields to search.
For example, to search for books with "in action" in the title field (title):

1) URL search

GET bookdb_index/book/_search?q=title:in action

[Results]
  "hits": {
    "total": 2,
    "max_score": 1.6323128,
    "hits": [
      {
        "_index": "bookdb_index",
        "_type": "book",
        "_id": "3",
        "_score": 1.6323128,
        "_source": {
          "title": "Elasticsearch in Action",
          "authors": [
            "radu gheorge",
            "matthew lee hinman",
            "roy russo"
          ],
          "summary": "build scalable search applications using Elasticsearch without having to do complex low-level programming or understand advanced data science algorithms",
          "publish_date": "2015-12-03",
          "num_reviews": 18,
          "publisher": "manning"
        }
      },
      {
        "_index": "bookdb_index",
        "_type": "book",
        "_id": "4",
        "_score": 1.6323128,
        "_source": {
          "title": "Solr in Action",
          "authors": [
            "trey grainger",
            "timothy potter"
          ],
          "summary": "Comprehensive guide to implementing a scalable search engine using Apache Solr",
          "publish_date": "2014-04-05",
          "num_reviews": 23,
          "publisher": "manning"
        }
      }
    ]
  }

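2) DSL search

The full-body DSL, however, gives you more flexibility for building more complex queries (as we will see later) and for specifying how results are returned. In the example below we specify the number of results to return, the offset (useful for pagination), the document fields to return, and highlighting.

Number of results: size
Offset: from
Fields to return: _source
Highlighting: highlight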

GET bookdb_index/book/_search
{
  "query": {
    "match": {
      "title": "in action"
    }
  },
  "size": 2,
  "from": 0,
  "_source": ["title", "summary", "publish_date"],
  "highlight": {
    "fields": {
      "title": {}
    }
  }
}

[Results]
  "hits": {
    "total": 2,
    "max_score": 1.6323128,
    "hits": [
      {
        "_index": "bookdb_index",
        "_type": "book",
        "_id": "3",
        "_score": 1.6323128,
        "_source": {
          "summary": "build scalable search applications using Elasticsearch without having to do complex low-level programming or understand advanced data science algorithms",
          "title": "Elasticsearch in Action",
          "publish_date": "2015-12-03"
        },
        "highlight": {
          "title": [
            "Elasticsearch <em>in</em> <em>Action</em>"
          ]
        }
      },
      {
        "_index": "bookdb_index",
        "_type": "book",
        "_id": "4",
        "_score": 1.6323128,
        "_source": {
          "summary": "Comprehensive guide to implementing a scalable search engine using Apache Solr",
          "title": "Solr in Action",
          "publish_date": "2014-04-05"
        },
        "highlight": {
          "title": [
            "Solr <em>in</em> <em>Action</em>"
          ]
        }
      }
    ]
  }

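Notes:

  1. For multi-word queries, the match query lets you specify the and operator instead of the default or operator ---> "operator" : "and"
  2. You can also specify the minimum_should_match option to tune the relevance of the returned results. Details can be found in the Elasticsearch guide.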
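
As a rough illustration of the two options in the notes above, the following Python sketch attaches them to a match query body. The helper name build_match_query is hypothetical. It only builds the JSON request body and does not talk to a cluster.

```python
import json

def build_match_query(field, text, operator=None, minimum_should_match=None):
    """Build a match query body (hypothetical helper, for illustration only)."""
    options = {"query": text}
    if operator is not None:
        options["operator"] = operator  # "and" requires every term to match
    if minimum_should_match is not None:
        options["minimum_should_match"] = minimum_should_match
    return {"query": {"match": {field: options}}}

if __name__ == "__main__":
    body = build_match_query("title", "in action", operator="and")
    print(json.dumps(body, indent=2))
```

The printed body could be pasted after GET bookdb_index/book/_search in place of the plain match query shown earlier.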

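2. Multi-field Search

As we have already seen, to query more than one document field in a search (for example, searching for the same query string in the title and the summary), use the multi_match query.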

GET bookdb_index/book/_search
{
  "query": {
    "multi_match": {
      "query": "elasticsearch guide",
      "fields": ["title", "summary"]
    }
  }
}

[Results]
  "hits": {
    "total": 3,
    "max_score": 2.0281231,
    "hits": [
      {
        "_index": "bookdb_index",
        "_type": "book",
        "_id": "1",
        "_score": 2.0281231,
        "_source": {
          "title": "Elasticsearch: The Definitive Guide",
          "authors": [
            "clinton gormley",
            "zachary tong"
          ],
          "summary": "A distibuted real-time search and analytics engine",
          "publish_date": "2015-02-07",
          "num_reviews": 20,
          "publisher": "oreilly"
        }
      },
      {
        "_index": "bookdb_index",
        "_type": "book",
        "_id": "4",
        "_score": 1.3278645,
        "_source": {
          "title": "Solr in Action",
          "authors": [
            "trey grainger",
            "timothy potter"
          ],
          "summary": "Comprehensive guide to implementing a scalable search engine using Apache Solr",
          "publish_date": "2014-04-05",
          "num_reviews": 23,
          "publisher": "manning"
        }
      },
      {
        "_index": "bookdb_index",
        "_type": "book",
        "_id": "3",
        "_score": 1.0333893,
        "_source": {
          "title": "Elasticsearch in Action",
          "authors": [
            "radu gheorge",
            "matthew lee hinman",
            "roy russo"
          ],
          "summary": "build scalable search applications using Elasticsearch without having to do complex low-level programming or understand advanced data science algorithms",
          "publish_date": "2015-12-03",
          "num_reviews": 18,
          "publisher": "manning"
        }
      }
    ]
  }

Note: in the results above, document 4 (_id=4) matches because the word "guide" appears in its summary.

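3. Boosting

Since we are searching across multiple fields, we may want to boost the score of a particular field. In the example below we boost the score of the summary field by a factor of 3 to increase its importance, which in turn raises the relevance of document 4.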

GET bookdb_index/book/_search
{
  "query": {
    "multi_match": {
      "query": "elasticsearch guide", 
      "fields": ["title", "summary^3"]
    }
  },
  "_source": ["title", "summary", "publish_date"]
}

[Results]
  "hits": {
    "total": 3,
    "max_score": 3.9835935,
    "hits": [
      {
        "_index": "bookdb_index",
        "_type": "book",
        "_id": "4",
        "_score": 3.9835935,
        "_source": {
          "summary": "Comprehensive guide to implementing a scalable search engine using Apache Solr",
          "title": "Solr in Action",
          "publish_date": "2014-04-05"
        }
      },
      {
        "_index": "bookdb_index",
        "_type": "book",
        "_id": "3",
        "_score": 3.1001682,
        "_source": {
          "summary": "build scalable search applications using Elasticsearch without having to do complex low-level programming or understand advanced data science algorithms",
          "title": "Elasticsearch in Action",
          "publish_date": "2015-12-03"
        }
      },
      {
        "_index": "bookdb_index",
        "_type": "book",
        "_id": "1",
        "_score": 2.0281231,
        "_source": {
          "summary": "A distibuted real-time search and analytics engine",
          "title": "Elasticsearch: The Definitive Guide",
          "publish_date": "2015-02-07"
        }
      }
    ]
  }

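Note: boosting does not simply multiply the computed score by the boost factor. The boost value that is actually applied goes through normalization and some internal optimization. See the Elasticsearch guide for more details.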

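4. Bool Query

The AND / OR / NOT operators can be used to fine-tune our search queries in order to return more relevant or more specific results.

In the search API this is done with the bool query. The bool query accepts a must parameter (equivalent to AND), a must_not parameter (equivalent to NOT), and a should parameter (equivalent to OR).

For example, to search for a book with "Elasticsearch" OR "Solr" in the title, AND authored by "clinton gormley", but NOT authored by "radu gheorge":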

GET bookdb_index/book/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "bool": {
            "should": [
              {"match": {"title": "Elasticsearch"}},
              {"match": {"title": "Solr"}}
            ]
          }
        },
        {
          "match": {"authors": "clinton gormely"}
        }
      ],
      "must_not": [
        {
          "match": {"authors": "radu gheorge"}
        }
      ]
    }
  }
}

[Results]
  "hits": {
    "total": 1,
    "max_score": 2.0749094,
    "hits": [
      {
        "_index": "bookdb_index",
        "_type": "book",
        "_id": "1",
        "_score": 2.0749094,
        "_source": {
          "title": "Elasticsearch: The Definitive Guide",
          "authors": [
            "clinton gormley",
            "zachary tong"
          ],
          "summary": "A distibuted real-time search and analytics engine",
          "publish_date": "2015-02-07",
          "num_reviews": 20,
          "publisher": "oreilly"
        }
      }
    ]
  }

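The should clause of a bool query behaves in one of two ways:

  • When a must clause exists at the same level, the should conditions are optional; the more of them a document satisfies, the higher its score
  • When there is no must clause, at least one of the should conditions must be satisfied by default

Note: as you can see, a bool query can wrap any other query type, including other bool queries, to build arbitrarily complex or deeply nested queries.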
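
The must / should / must_not semantics can be sketched in plain Python over a few in-memory documents. This is a simplified model and not how Elasticsearch evaluates queries: the tokens, match and bool_query helpers are hypothetical, scoring is ignored, and the tokenizer is a rough stand-in for an analyzer.

```python
import re

def tokens(value):
    """Lowercase word tokens of a string or list of strings (rough analyzer stand-in)."""
    values = value if isinstance(value, list) else [value]
    return {t for v in values for t in re.findall(r"[a-z0-9]+", v.lower())}

def match(field, text):
    """Like a match query with the default OR operator: any query term may hit."""
    return lambda doc: bool(tokens(text) & tokens(doc.get(field, "")))

def bool_query(must=(), should=(), must_not=()):
    """Combine clauses with AND (must), OR (should) and NOT (must_not)."""
    def run(doc):
        if any(clause(doc) for clause in must_not):
            return False
        if not all(clause(doc) for clause in must):
            return False
        # With no must clause, at least one should clause has to match.
        if should and not must:
            return any(clause(doc) for clause in should)
        return True
    return run

BOOKS = {
    1: {"title": "Elasticsearch: The Definitive Guide", "authors": ["clinton gormley", "zachary tong"]},
    3: {"title": "Elasticsearch in Action", "authors": ["radu gheorge", "matthew lee hinman", "roy russo"]},
    4: {"title": "Solr in Action", "authors": ["trey grainger", "timothy potter"]},
}

query = bool_query(
    must=[
        bool_query(should=[match("title", "Elasticsearch"), match("title", "Solr")]),
        match("authors", "clinton gormley"),
    ],
    must_not=[match("authors", "radu gheorge")],
)

matching_ids = [doc_id for doc_id, doc in BOOKS.items() if query(doc)]
print(matching_ids)
```

Only document 1 satisfies all three conditions, which agrees with the single hit returned by the bool query in this section.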

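5. Fuzzy Queries

Fuzzy matching can be enabled on match and multi_match queries to catch spelling errors. The degree of fuzziness is specified as a Levenshtein distance from the original word.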

GET bookdb_index/book/_search
{
  "query": {
    "multi_match": {
      "query": "comprihensiv guide",
      "fields": ["title","summary"],
      "fuzziness": "AUTO"
    }
  },
  "_source": ["title","summary","publish_date"],
  "size": 2
}

[Results]
  "hits": {
    "total": 2,
    "max_score": 2.4344182,
    "hits": [
      {
        "_index": "bookdb_index",
        "_type": "book",
        "_id": "4",
        "_score": 2.4344182,
        "_source": {
          "summary": "Comprehensive guide to implementing a scalable search engine using Apache Solr",
          "title": "Solr in Action",
          "publish_date": "2014-04-05"
        }
      },
      {
        "_index": "bookdb_index",
        "_type": "book",
        "_id": "1",
        "_score": 1.2871116,
        "_source": {
          "summary": "A distibuted real-time search and analytics engine",
          "title": "Elasticsearch: The Definitive Guide",
          "publish_date": "2015-02-07"
        }
      }
    ]
  }
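
To make the edit-distance idea concrete, here is a minimal Python sketch of the classic Levenshtein distance, together with the thresholds that "AUTO" uses by default (0 edits for terms of up to 2 characters, 1 edit for 3 to 5 characters, 2 edits beyond that). The function names are hypothetical. Elasticsearch additionally counts a swap of two adjacent characters as a single edit by default, which this plain version does not model.

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]

def auto_fuzziness(term):
    """Edits allowed by "AUTO" with its default length thresholds."""
    if len(term) <= 2:
        return 0
    return 1 if len(term) <= 5 else 2

term, indexed = "comprihensiv", "comprehensive"
print(levenshtein(term, indexed), auto_fuzziness(term))
```

The misspelled query term is two edits away from the indexed term "comprehensive", and "AUTO" allows two edits for a term of that length, which is why document 4 is found.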
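
A fuzziness value of "AUTO" is equivalent to specifying 2 when the term is longer than 5 characters. However, an edit distance of 1 is enough to catch about 80% of human misspellings, so setting fuzziness to 1 may improve overall search performance. For more information, see the "Typos and Misspellings" chapter of the Elasticsearch guide.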

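6. Wildcard Query

Wildcard queries let you specify a pattern to match instead of an entire term.

  • ? matches any single character
  • * matches zero or more characters

For example, to find all records with an author whose name begins with the letter "t":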

GET bookdb_index/book/_search
{
  "query": {
    "wildcard": {
      "authors": {
        "value": "t*"
      }
    }
  },
  "_source": ["title", "authors"],
  "highlight": {
    "fields": {
      "authors": {}
    }
  }
}

[Results]
  "hits": {
    "total": 3,
    "max_score": 1,
    "hits": [
      {
        "_index": "bookdb_index",
        "_type": "book",
        "_id": "1",
        "_score": 1,
        "_source": {
          "title": "Elasticsearch: The Definitive Guide",
          "authors": [
            "clinton gormley",
            "zachary tong"
          ]
        },
        "highlight": {
          "authors": [
            "zachary <em>tong</em>"
          ]
        }
      },
      {
        "_index": "bookdb_index",
        "_type": "book",
        "_id": "2",
        "_score": 1,
        "_source": {
          "title": "Taming Text: How to Find, Organize, and Manipulate It",
          "authors": [
            "grant ingersoll",
            "thomas morton",
            "drew farris"
          ]
        },
        "highlight": {
          "authors": [
            "<em>thomas</em> morton"
          ]
        }
      },
      {
        "_index": "bookdb_index",
        "_type": "book",
        "_id": "4",
        "_score": 1,
        "_source": {
          "title": "Solr in Action",
          "authors": [
            "trey grainger",
            "timothy potter"
          ]
        },
        "highlight": {
          "authors": [
            "<em>trey</em> grainger",
            "<em>timothy</em> potter"
          ]
        }
      }
    ]
  }
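
The highlighted hits above show that the pattern is applied to individual terms rather than to the whole field value ("zachary tong" matches through the term "tong"). A small Python sketch of that behaviour, using fnmatch from the standard library as a stand-in for the wildcard syntax; splitting on whitespace is an assumption that approximates the analyzer, and fnmatch also gives [ ] a special meaning that the wildcard query does not.

```python
from fnmatch import fnmatchcase

AUTHORS = ["clinton gormley", "zachary tong", "thomas morton",
           "trey grainger", "timothy potter"]

def wildcard_terms(values, pattern):
    """Return the individual lowercased terms that match the wildcard pattern."""
    terms = [term for value in values for term in value.lower().split()]
    return [term for term in terms if fnmatchcase(term, pattern)]

print(wildcard_terms(AUTHORS, "t*"))
```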

7. Regexp Query

Regular expressions let you specify more complex search patterns than wildcard queries, for example:

POST bookdb_index/book/_search
{
  "query": {
    "regexp": {
      "authors": "t[a-z]*y"
    }
  },
  "_source": ["title", "authors"],
  "highlight": {
    "fields": {
      "authors": {}
    }
  }
}

[Results]
  "hits": {
    "total": 1,
    "max_score": 1,
    "hits": [
      {
        "_index": "bookdb_index",
        "_type": "book",
        "_id": "4",
        "_score": 1,
        "_source": {
          "title": "Solr in Action",
          "authors": [
            "trey grainger",
            "timothy potter"
          ]
        },
        "highlight": {
          "authors": [
            "<em>trey</em> grainger",
            "<em>timothy</em> potter"
          ]
        }
      }
    ]
  }
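
The pattern has to match a whole term, not just part of one. The following Python sketch mimics that with re.fullmatch over whitespace-split terms. Python's re syntax is only an approximation of the regular expression syntax Elasticsearch uses, and the helper name is hypothetical.

```python
import re

AUTHORS = ["trey grainger", "timothy potter", "thomas morton", "zachary tong"]

def regexp_terms(values, pattern):
    """Return the terms that the pattern matches in full (whole term, not a substring)."""
    compiled = re.compile(pattern)
    terms = [term for value in values for term in value.lower().split()]
    return [term for term in terms if compiled.fullmatch(term)]

print(regexp_terms(AUTHORS, "t[a-z]*y"))
```

"trey" and "timothy" start with "t" and end with "y", so only the authors of document 4 match; "thomas" and "tong" do not.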

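8. Match Phrase Query

The match phrase query requires that all the terms in the query string be present in the document, in the order given in the query string, and close to each other.

By default the terms must be exactly adjacent, but you can specify a slop value that indicates how far apart the terms may be while the document is still considered a match.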

GET bookdb_index/book/_search
{
  "query": {
    "multi_match": {
      "query": "search engine",
      "fields": ["title", "summary"],
      "type": "phrase",
      "slop": 3
    }
  },
  "_source": [ "title", "summary", "publish_date" ]
}

[Results]
  "hits": {
    "total": 2,
    "max_score": 0.88067603,
    "hits": [
      {
        "_index": "bookdb_index",
        "_type": "book",
        "_id": "4",
        "_score": 0.88067603,
        "_source": {
          "summary": "Comprehensive guide to implementing a scalable search engine using Apache Solr",
          "title": "Solr in Action",
          "publish_date": "2014-04-05"
        }
      },
      {
        "_index": "bookdb_index",
        "_type": "book",
        "_id": "1",
        "_score": 0.51429313,
        "_source": {
          "summary": "A distibuted real-time search and analytics engine",
          "title": "Elasticsearch: The Definitive Guide",
          "publish_date": "2015-02-07"
        }
      }
    ]
  }

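Note: in the example above, for a non-phrase query, document _id 1 would normally score higher and appear ahead of document _id 4 because its field length is shorter.

As a phrase query, however, the proximity of the terms is taken into account, so document _id 4 scores better.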
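
A simplified way to think about slop is as the number of positions the query terms have to be moved to line up with the document. The Python sketch below computes the smallest slop at which a phrase would match, under that simplified model. The analyze and required_slop helpers are hypothetical, and the tokenizer (lowercase, split on non-alphanumerics) is only an approximation of the standard analyzer.

```python
import re
from itertools import product

def analyze(text):
    """Rough stand-in for an analyzer: lowercase, split on non-alphanumerics."""
    return re.findall(r"[a-z0-9]+", text.lower())

def required_slop(phrase, text):
    """Smallest slop at which the phrase matches the text, or None if a term is missing."""
    doc_terms = analyze(text)
    shifted = []
    for offset, term in enumerate(analyze(phrase)):
        # Shift each occurrence by the term's offset in the phrase, so an
        # exact phrase occurrence yields identical values for all terms.
        hits = [pos - offset for pos, tok in enumerate(doc_terms) if tok == term]
        if not hits:
            return None
        shifted.append(hits)
    # One occurrence per term; the cost is the spread of the shifted positions.
    return min(max(combo) - min(combo) for combo in product(*shifted))

print(required_slop("search engine",
                    "Comprehensive guide to implementing a scalable search engine using Apache Solr"))
print(required_slop("search engine",
                    "A distibuted real-time search and analytics engine"))
```

Under this model document 4 needs no slop at all, while document 1 needs a slop of 2 ("and analytics" sits between the two terms), so both fall within the slop of 3 used in the query, with document 4 being the closer match.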

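9. Match Phrase Prefix Query

The match phrase prefix query provides search-as-you-type, or a relatively simple version of autocomplete, at query time without the need to prepare your data in any way.

Like the match_phrase query, it accepts a slop parameter to make the word order and relative positions less strict. It also accepts the max_expansions parameter to limit the number of terms matched and so reduce resource usage.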

GET bookdb_index/book/_search
{
  "query": {
    "match_phrase_prefix": {
      "summary": {
        "query": "search en",
        "slop": 3,
        "max_expansions": 10
      }
    }
  },
  "_source": ["title","summary","publish_date"]
}

Note: query-time search-as-you-type has a performance cost. A better solution is index-time search-as-you-type. For more on this, see the Completion Suggester API or Edge-Ngram filters.
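
The idea behind the index-time approach can be sketched in Python: every token is expanded into its prefixes once, when the document is indexed, so that a prefix lookup at query time becomes a plain dictionary hit. The function names and the min_gram / max_gram defaults below are assumptions chosen for illustration, not Elasticsearch settings.

```python
def edge_ngrams(token, min_gram=1, max_gram=10):
    """Prefixes of a token, similar to what an edge n-gram filter emits at index time."""
    upper = min(max_gram, len(token))
    return [token[:size] for size in range(min_gram, upper + 1)]

def build_prefix_index(docs, min_gram=1, max_gram=10):
    """Map every prefix to the set of document ids containing a token with that prefix."""
    index = {}
    for doc_id, text in docs.items():
        for token in text.lower().split():
            for gram in edge_ngrams(token, min_gram, max_gram):
                index.setdefault(gram, set()).add(doc_id)
    return index

SUMMARIES = {
    1: "A distibuted real-time search and analytics engine",
    4: "Comprehensive guide to implementing a scalable search engine using Apache Solr",
}
index = build_prefix_index(SUMMARIES)
print(sorted(index["en"]))  # ids of documents with a token starting with "en"
```

The trade-off is a larger index in exchange for cheaper queries, which is the performance argument made in the note above.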

10. Query String

The query_string query provides a concise shorthand syntax for running multi_match queries, bool queries, boosting, fuzzy matching, wildcards, regexp, and range queries.

In the example below, we run a fuzzy search for the terms "search algorithm" where one of the book authors is "grant ingersoll" or "tom morton". We search several fields but apply a boost of 2 to the summary field.

GET bookdb_index/book/_search
{
  "query": {
    "query_string": {
      "query": "(saerch~1 algorithm~1) AND (grant ingersoll)  OR (tom morton)",
      "fields": ["summary^2","title","authors","publisher"]
    }
  },
  "_source": ["title","summary","authors"],
  "highlight": {
    "fields": {
      "summary": {}
    }
  }
}

[Results]
  "hits": {
    "total": 1,
    "max_score": 3.571021,
    "hits": [
      {
        "_index": "bookdb_index",
        "_type": "book",
        "_id": "2",
        "_score": 3.571021,
        "_source": {
          "summary": "organize text using approaches such as full-text search, proper name recognition, clustering, tagging, information extraction, and summarization",
          "title": "Taming Text: How to Find, Organize, and Manipulate It",
          "authors": [
            "grant ingersoll",
            "thomas morton",
            "drew farris"
          ]
        },
        "highlight": {
          "summary": [
            "organize text using approaches such as full-text <em>search</em>, proper name recognition, clustering, tagging"
          ]
        }
      }
    ]
  }

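11. Simple Query String

The simple_query_string query is a version of the query_string query that is better suited to a single search box exposed to users, because it replaces AND / OR / NOT with + / | / - respectively, and it discards invalid parts of a query instead of throwing an exception when the user makes a mistake.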

GET bookdb_index/book/_search
{
  "query": {
    "simple_query_string": {
      "query": "(saerch~1 algorithm~1) + (grant ingersoll)  | (tom morton)",
      "fields": ["summary^2","title","authors","publisher"]
    }
  },
  "_source": ["title","summary","authors"],
  "highlight": {
    "fields": {
      "summary": {}
    }
  }
}

[Results]
# Same results as above

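12. Term/Terms Query

The examples in sections 1-11 above are all full-text searches. Sometimes we are more interested in a structured search, in which we want to find exact matches and return the results.

In the example below, we search for all books in the index published by Manning Publications (using the term and terms queries).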

GET bookdb_index/book/_search
{
  "query": {
    "term": {
      "publisher": {
        "value": "manning"
      }
    }
  },
  "_source" : ["title","publish_date","publisher"]
}

[Results]
  "hits": {
    "total": 3,
    "max_score": 0.35667494,
    "hits": [
      {
        "_index": "bookdb_index",
        "_type": "book",
        "_id": "2",
        "_score": 0.35667494,
        "_source": {
          "publisher": "manning",
          "title": "Taming Text: How to Find, Organize, and Manipulate It",
          "publish_date": "2013-01-24"
        }
      },
      {
        "_index": "bookdb_index",
        "_type": "book",
        "_id": "3",
        "_score": 0.35667494,
        "_source": {
          "publisher": "manning",
          "title": "Elasticsearch in Action",
          "publish_date": "2015-12-03"
        }
      },
      {
        "_index": "bookdb_index",
        "_type": "book",
        "_id": "4",
        "_score": 0.35667494,
        "_source": {
          "publisher": "manning",
          "title": "Solr in Action",
          "publish_date": "2014-04-05"
        }
      }
    ]
  }

Multiple terms can be specified with the terms query:

GET bookdb_index/book/_search
{
  "query": {
    "terms": {
      "publisher": ["oreilly", "manning"]
    }
  }
}

13. Term Query - Sorted

Term query results, like those of any other query, can easily be sorted. Multi-level sorting is also allowed.

GET bookdb_index/book/_search
{
  "query": {
    "term": {
      "publisher": {
        "value": "manning"
      }
    }
  },
  "_source" : ["title","publish_date","publisher"],
  "sort": [{"publisher.keyword": { "order": "desc"}},
    {"title.keyword": {"order": "asc"}}]
}

[Results]
  "hits": {
    "total": 3,
    "max_score": null,
    "hits": [
      {
        "_index": "bookdb_index",
        "_type": "book",
        "_id": "3",
        "_score": null,
        "_source": {
          "publisher": "manning",
          "title": "Elasticsearch in Action",
          "publish_date": "2015-12-03"
        },
        "sort": [
          "manning",
          "Elasticsearch in Action"
        ]
      },
      {
        "_index": "bookdb_index",
        "_type": "book",
        "_id": "4",
        "_score": null,
        "_source": {
          "publisher": "manning",
          "title": "Solr in Action",
          "publish_date": "2014-04-05"
        },
        "sort": [
          "manning",
          "Solr in Action"
        ]
      },
      {
        "_index": "bookdb_index",
        "_type": "book",
        "_id": "2",
        "_score": null,
        "_source": {
          "publisher": "manning",
          "title": "Taming Text: How to Find, Organize, and Manipulate It",
          "publish_date": "2013-01-24"
        },
        "sort": [
          "manning",
          "Taming Text: How to Find, Organize, and Manipulate It"
        ]
      }
    ]
  }

Note: in Elasticsearch 6.x, full-text search uses text fields, while sorting should use non-text fields, such as the keyword sub-fields used above.

14. Range Query

Another example of a structured query is the range query. In the example below, we search for books published in 2015.

GET bookdb_index/book/_search
{
  "query": {
    "range": {
      "publish_date": {
        "gte": "2015-01-01",
        "lte": "2015-12-31"
      }
    }
  },
  "_source" : ["title","publish_date","publisher"]
}

[Results]
  "hits": {
    "total": 2,
    "max_score": 1,
    "hits": [
      {
        "_index": "bookdb_index",
        "_type": "book",
        "_id": "1",
        "_score": 1,
        "_source": {
          "publisher": "oreilly",
          "title": "Elasticsearch: The Definitive Guide",
          "publish_date": "2015-02-07"
        }
      },
      {
        "_index": "bookdb_index",
        "_type": "book",
        "_id": "3",
        "_score": 1,
        "_source": {
          "publisher": "manning",
          "title": "Elasticsearch in Action",
          "publish_date": "2015-12-03"
        }
      }
    ]
  }

Note: range queries work on date, number, and string fields.

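15. Filtered Query

(Removed as of version 5.0; can be skipped.)

Filtered queries let you filter the results of a query. In the example below, we query for books with "Elasticsearch" in the title or summary, but we want to restrict the results to those with 20 or more reviews.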

POST /bookdb_index/book/_search
{
    "query": {
        "filtered": {
            "query" : {
                "multi_match": {
                    "query": "elasticsearch",
                    "fields": ["title","summary"]
                }
            },
            "filter": {
                "range" : {
                    "num_reviews": {
                        "gte": 20
                    }
                }
            }
        }
    },
    "_source" : ["title","summary","publisher", "num_reviews"]
}


[Results]
"hits": [
      {
        "_index": "bookdb_index",
        "_type": "book",
        "_id": "1",
        "_score": 0.5955761,
        "_source": {
          "summary": "A distibuted real-time search and analytics engine",
          "publisher": "oreilly",
          "num_reviews": 20,
          "title": "Elasticsearch: The Definitive Guide"
        }
      }
    ]

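Note: filtered queries do not require a query to filter on. If no query is specified, a match_all query is run, which basically returns all documents in the index, and these are then filtered. In practice the filter is run first, reducing the surface area that needs to be queried. In addition, the filter is cached after its first use, which makes it very efficient.

Update: filtered queries were removed in Elasticsearch 5.x in favor of the bool query. Here is the same example as above, rewritten to use a bool query. The returned results are exactly the same.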

GET bookdb_index/book/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "multi_match": {
            "query": "elasticsearch",
            "fields": ["title","summary"]
          }
        }
      ],
      "filter": {
        "range": {
          "num_reviews": {
            "gte": 20
          }
        }
      }
    }
  },
  "_source" : ["title","summary","publisher", "num_reviews"]
}

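16. Multiple Filters

(No longer supported as of 5.x; can be skipped.)

Multiple filters can be combined through the use of the bool filter.

In the next example, the filter determines that the returned results must have at least 20 reviews, must not have been published before 2015, and should be published by oreilly.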

POST /bookdb_index/book/_search
{
    "query": {
        "filtered": {
            "query" : {
                "multi_match": {
                    "query": "elasticsearch",
                    "fields": ["title","summary"]
                }
            },
            "filter": {
                "bool": {
                    "must": {
                        "range" : { "num_reviews": { "gte": 20 } }
                    },
                    "must_not": {
                        "range" : { "publish_date": { "lte": "2014-12-31" } }
                    },
                    "should": {
                        "term": { "publisher": "oreilly" }
                    }
                }
            }
        }
    },
    "_source" : ["title","summary","publisher", "num_reviews", "publish_date"]
}


[Results]
"hits": [
      {
        "_index": "bookdb_index",
        "_type": "book",
        "_id": "1",
        "_score": 0.5955761,
        "_source": {
          "summary": "A distibuted real-time search and analytics engine",
          "publisher": "oreilly",
          "num_reviews": 20,
          "title": "Elasticsearch: The Definitive Guide",
          "publish_date": "2015-02-07"
        }
      }
    ]

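17. Function Score: Field Value Factor

There may be cases where you want to factor the value of a particular field in a document into the calculation of the relevance score. A typical scenario is boosting the relevance of a document based on its popularity.

In our example, we want to boost the more popular books (judged by the number of reviews). This can be done with the field_value_factor function score.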

GET bookdb_index/book/_search
{
  "query": {
    "function_score": {
      "query": {
        "multi_match": {
          "query": "search engine",
          "fields": ["title","summary"]
        }
      },
      "field_value_factor": {
        "field": "num_reviews",
        "modifier": "log1p",
        "factor": 2
      }
    }
  },
  "_source": ["title", "summary", "publish_date", "num_reviews"]
}

[Results]
    "hits": [
      {
        "_index": "bookdb_index",
        "_type": "book",
        "_id": "1",
        "_score": 1.5694137,
        "_source": {
          "summary": "A distibuted real-time search and analytics engine",
          "num_reviews": 20,
          "title": "Elasticsearch: The Definitive Guide",
          "publish_date": "2015-02-07"
        }
      },
      {
        "_index": "bookdb_index",
        "_type": "book",
        "_id": "4",
        "_score": 1.4725765,
        "_source": {
          "summary": "Comprehensive guide to implementing a scalable search engine using Apache Solr",
          "num_reviews": 23,
          "title": "Solr in Action",
          "publish_date": "2014-04-05"
        }
      },
      {
        "_index": "bookdb_index",
        "_type": "book",
        "_id": "3",
        "_score": 0.14181662,
        "_source": {
          "summary": "build scalable search applications using Elasticsearch without having to do complex low-level programming or understand advanced data science algorithms",
          "num_reviews": 18,
          "title": "Elasticsearch in Action",
          "publish_date": "2015-12-03"
        }
      },
      {
        "_index": "bookdb_index",
        "_type": "book",
        "_id": "2",
        "_score": 0.13297246,
        "_source": {
          "summary": "organize text using approaches such as full-text search, proper name recognition, clustering, tagging, information extraction, and summarization",
          "num_reviews": 12,
          "title": "Taming Text: How to Find, Organize, and Manipulate It",
          "publish_date": "2013-01-24"
        }
      }
    ]
  }

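Note 1: we could have run a regular multi_match query and sorted by the num_reviews field, but then we would lose the benefits of relevance scoring.
Note 2: a number of additional parameters (such as "modifier", "factor", and "boost_mode") adjust how strongly the boost affects the original relevance score.
See the Elasticsearch guide for details.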
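
For intuition, the multiplier that the query above produces can be sketched in Python. With "modifier": "log1p" and "factor": 2, the field value is first multiplied by the factor and the common logarithm of one plus that product is taken; with the default boost_mode the result is multiplied into the query score. The function below is a hypothetical re-implementation that covers only the "none" and "log1p" modifiers.

```python
import math

def field_value_factor(value, factor=1.0, modifier="none"):
    """Factor applied to a document's score for a given field value."""
    scaled = factor * value
    if modifier == "log1p":
        return math.log10(1 + scaled)  # common logarithm of (1 + factor * value)
    if modifier == "none":
        return scaled
    raise ValueError("modifier not covered by this sketch: " + modifier)

for num_reviews in (12, 18, 20, 23):
    print(num_reviews, round(field_value_factor(num_reviews, factor=2, modifier="log1p"), 4))
```

Because of the logarithm, a book with 23 reviews receives only a slightly larger multiplier than one with 20, so text relevance still dominates the final ranking.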

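18. Function Score: Decay Functions

Suppose that, instead of boosting the score incrementally by the value of a field, you have an ideal value to target and want the score to decay the further a document's value moves away from it. This is typically useful for price ranges, numeric field ranges, and date ranges. In our example, we are searching for books on "search engines" published around June 2014.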

GET bookdb_index/book/_search
{
  "query": {
    "function_score": {
      "query": {
        "multi_match": {
          "query": "search engine",
          "fields": ["title", "summary"]
        }
      },
      "functions": [
        {
          "exp": {
            "publish_date": {
              "origin": "2014-06-15",
              "scale": "30d",
              "offset": "7d"
            }
          }
        }
      ],
      "boost_mode": "replace"
    }
  },
  "_source": ["title", "summary", "publish_date", "num_reviews"]
}

[Results]
  "hits": {
    "total": 4,
    "max_score": 0.22793062,
    "hits": [
      {
        "_index": "bookdb_index",
        "_type": "book",
        "_id": "4",
        "_score": 0.22793062,
        "_source": {
          "summary": "Comprehensive guide to implementing a scalable search engine using Apache Solr",
          "num_reviews": 23,
          "title": "Solr in Action",
          "publish_date": "2014-04-05"
        }
      },
      {
        "_index": "bookdb_index",
        "_type": "book",
        "_id": "1",
        "_score": 0.0049215667,
        "_source": {
          "summary": "A distibuted real-time search and analytics engine",
          "num_reviews": 20,
          "title": "Elasticsearch: The Definitive Guide",
          "publish_date": "2015-02-07"
        }
      },
      {
        "_index": "bookdb_index",
        "_type": "book",
        "_id": "2",
        "_score": 0.000009612435,
        "_source": {
          "summary": "organize text using approaches such as full-text search, proper name recognition, clustering, tagging, information extraction, and summarization",
          "num_reviews": 12,
          "title": "Taming Text: How to Find, Organize, and Manipulate It",
          "publish_date": "2013-01-24"
        }
      },
      {
        "_index": "bookdb_index",
        "_type": "book",
        "_id": "3",
        "_score": 0.0000049185574,
        "_source": {
          "summary": "build scalable search applications using Elasticsearch without having to do complex low-level programming or understand advanced data science algorithms",
          "num_reviews": 18,
          "title": "Elasticsearch in Action",
          "publish_date": "2015-12-03"
        }
      }
    ]
  }
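
Because the query uses "boost_mode": "replace", each _score above is simply the value of the decay function. The following Python sketch recomputes it, assuming the default decay of 0.5 (not set explicitly in the query) and whole-day distances; the function name is hypothetical.

```python
from datetime import date

def exp_decay(value, origin, scale_days, offset_days=0, decay=0.5):
    """Exponential decay: 1.0 within the offset, `decay` at offset + scale, shrinking beyond."""
    distance = max(0, abs((value - origin).days) - offset_days)
    return decay ** (distance / scale_days)

origin = date(2014, 6, 15)
for title, published in [("Solr in Action", date(2014, 4, 5)),
                         ("Elasticsearch: The Definitive Guide", date(2015, 2, 7))]:
    print(title, exp_decay(published, origin, scale_days=30, offset_days=7))
```

"Solr in Action" was published 71 days before the origin; after subtracting the 7-day offset, the remaining 64 days give 0.5 ** (64 / 30), which is approximately 0.2279 and agrees with the top hit's score.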

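19. Function Score: Script Scoring

When the built-in scoring functions do not meet your needs, you have the option of specifying a Groovy script to use for scoring.

In our example, we want to specify a script that takes the publish_date into consideration before deciding how much to factor in the number of reviews. Newer books may not have as many reviews yet, so they should not be penalized for that.

The scoring script looks like this: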

publish_date = doc['publish_date'].value
num_reviews = doc['num_reviews'].value

if (publish_date > Date.parse('yyyy-MM-dd', threshold).getTime()) {
  my_score = Math.log(2.5 + num_reviews)
} else {
  my_score = Math.log(1 + num_reviews)
}
return my_score
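
For readers who want to experiment with the scoring logic outside Elasticsearch, here is a plain Python rendering of the same branching. It is an illustrative port and not something Elasticsearch can execute; the function name is hypothetical and dates are compared as date objects rather than epoch milliseconds.

```python
import math
from datetime import date

def script_score(publish_date, num_reviews, threshold):
    """Same branching as the script above: newer books get a larger constant inside the log."""
    if publish_date > threshold:
        return math.log(2.5 + num_reviews)
    return math.log(1 + num_reviews)

threshold = date(2015, 7, 30)
print(script_score(date(2015, 12, 3), 18, threshold))  # published after the threshold
print(script_score(date(2014, 4, 5), 23, threshold))   # published before the threshold
```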

To use the scoring script dynamically, we use the script_score parameter.

GET /bookdb_index/book/_search
{
  "query": {
    "function_score": {
      "query": {
        "multi_match": {
          "query": "search engine",
          "fields": ["title","summary"]
        }
      },
      "functions": [
        {
          "script_score": {
            "script": {
              "params": {
                "threshold": "2015-07-30"
              },  
              "lang": "groovy", 
              "source": "publish_date = doc['publish_date'].value; num_reviews = doc['num_reviews'].value; if (publish_date > Date.parse('yyyy-MM-dd', threshold).getTime()) { return log(2.5 + num_reviews) }; return log(1 + num_reviews);"
            }
          }
        }
      ]
    }
  },
  "_source": ["title","summary","publish_date", "num_reviews"]
}

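Note 1: to use dynamic scripting, it must be enabled for your Elasticsearch instance in the config/elasticsearch.yml file. It is also possible to use scripts that have been stored on the Elasticsearch server. See the Elasticsearch reference docs for more information.
Note 2: JSON cannot include embedded newline characters, so semicolons are used to separate statements.
Original author: Tim Ojo, Aug. 05, 16 · Big Data Zone
Original article: dzone.com/articles/23…

Note: how can Groovy scripts be enabled on ES 6.3? The following configuration did not work. Groovy support was removed in Elasticsearch 6.0 in favor of Painless, which would explain why the request above fails on 6.3.
script.allowed_types: inline & script.allowed_contexts: search, update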

Java API Implementation

Java API implementations of the queries above can be found at github.com/whirlys/ela…

References:
銘毅天下: [譯]你必須知道的23個最有用的Elasticseaerch檢索技巧
Original English article: 23 Useful Elasticsearch Example Queries

For more content, visit my personal blog: laijianfeng.org
