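When a search is executed, the request is broadcast to all shards of the index. You can control which shards are searched by supplying a routing parameter. For example, when searching the twitter index, the routing parameter can be set to the user name: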
curl -X POST "localhost:9200/twitter/_search?routing=kimchy" -H 'Content-Type: application/json' -d' { "query": { "bool" : { "must" : { "query_string" : { "query" : "some query string here" } }, "filter" : { "term" : { "user" : "kimchy" } } } } } '
1. Search
A search can be run either with a simple query string passed as a URL parameter or with a request body.
1.1. URI Search
This form is rarely used, so I won't go into detail. Here is one example:
curl -X GET "localhost:9200/product/_search?q=category:honor&sort=price:asc"
1.2. Request Body Search
Again, an example:
curl -X GET "localhost:9200/twitter/_search" -H 'Content-Type: application/json' -d' { "query" : { "term" : { "user" : "kimchy" } } } '
1.2.1. Query
The query element of the request body defines a query using the Query DSL:
curl -X GET "localhost:9200/_search" -H 'Content-Type: application/json' -d' { "query" : { "term" : { "user" : "kimchy" } } } '
1.2.2. From / Size
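Results can be paginated with the from and size parameters. from is the offset of the first hit to return, and size is the maximum number of hits to return. from defaults to 0 and size defaults to 10.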
curl -X GET "localhost:9200/product/_search" -H 'Content-Type: application/json' -d' { "from" : 0, "size" : 10, "query" : { "term" : { "user" : "kimchy" } } } '
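To make the from / size semantics concrete, here is a minimal pure-Python sketch. It is not Elasticsearch code: the paginate helper and the list of hit ids are hypothetical, and it only models "skip from hits, then return at most size hits".

```python
def paginate(hits, from_=0, size=10):
    """Return the window of hits selected by from/size.

    The defaults mirror the ones stated in the text (from=0, size=10).
    """
    return hits[from_:from_ + size]


# 25 hypothetical hit ids, assumed to be already sorted
all_hits = list(range(1, 26))

page1 = paginate(all_hits)              # from=0, size=10
page3 = paginate(all_hits, from_=20)    # from=20, size=10, only 5 hits remain

print(page1)
print(page3)
```

The last page is simply shorter than size when fewer hits remain.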
1.2.3. Sort
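Results can be sorted by one or more fields.
There are also some special sort fields: _score sorts by relevance score, and _doc sorts by index order.
Suppose we have this index: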
curl -X PUT "localhost:9200/my_index" -H 'Content-Type: application/json' -d' { "mappings": { "_doc": { "properties": { "post_date": { "type": "date" }, "user": { "type": "keyword" }, "name": { "type": "keyword" }, "age": { "type": "integer" } } } } } '
Against this index, we can query like this:
curl -X GET "localhost:9200/my_index/_search" -H 'Content-Type: application/json' -d' { "sort" : [ { "post_date" : {"order" : "asc"}}, "user", { "name" : "desc" }, { "age" : "desc" }, "_score" ], "query" : { "term" : { "user" : "kimchy" } } } '
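This example sorts by post_date ascending, then user ascending, then name descending, then age descending, and finally by _score descending (_score defaults to descending order; every other field defaults to ascending).
(PS: _doc is the most efficient sort order if you don't care about the order in which documents are returned.)
Elasticsearch supports sorting by array or multi-valued fields. The mode option controls which value of the array is used to sort the document. The possible values of mode are:
- min: the lowest value
- max: the highest value
- sum: the sum of all values is used as the sort value
- avg: the average of all values is used as the sort value
- median: the median of all values is used as the sort value
For example: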
curl -X PUT "localhost:9200/my_index/_doc/1?refresh" -H 'Content-Type: application/json' -d' { "product": "chocolate", "price": [20, 4] } '
curl -X POST "localhost:9200/_search" -H 'Content-Type: application/json' -d' { "query" : { "term" : { "product" : "chocolate" } }, "sort" : [ {"price" : {"order" : "asc", "mode" : "avg"}} ] } '
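What does this mean? A field's value may be an array, or the field may have several values. When we sort by such a field, we have to decide which single value represents the field during sorting, that is, the field's sort value.
The mode option determines that final sort value: for example, take the smallest element of the array, or the largest, or the average, and so on.
In the example above, price is an array of two elements and the query specifies mode avg, so the sort value of price is (20+4)/2=12.
The result set is therefore sorted in ascending order by the average of each document's price values.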
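The effect of each mode on the price array [20, 4] can be sketched in plain Python. This is only an illustration of the arithmetic: the MODES table and the sort_value helper are hypothetical names, and the real implementation may round results for integer fields.

```python
from statistics import mean, median

# Each mode name from the list above mapped to a plain-Python reducer.
MODES = {"min": min, "max": max, "sum": sum, "avg": mean, "median": median}


def sort_value(values, mode):
    """Collapse a multi-valued field into the single value used for sorting."""
    return MODES[mode](values)


price = [20, 4]
for mode in MODES:
    print(mode, sort_value(price, mode))
```

With only two elements, avg and median coincide; they differ for arrays such as [1, 2, 30].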
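Missing
The missing parameter specifies how documents that lack the sort field are treated. It can be set to _last or _first; the default is _last.
This is similar to a relational database placing the rows whose column is NULL at the end.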
curl -X GET "localhost:9200/_search" -H 'Content-Type: application/json' -d' { "sort" : [ { "price" : {"missing" : "_last"} } ], "query" : { "term" : { "product" : "chocolate" } } } '
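The behaviour of missing can be modelled with a few lines of Python. This is a sketch under assumptions: sort_with_missing and the three sample documents are hypothetical, and only the _last and _first values are handled.

```python
def sort_with_missing(docs, field, missing="_last", reverse=False):
    """Sort docs by field, placing docs without that field last or first."""
    present = [d for d in docs if field in d]
    absent = [d for d in docs if field not in d]
    present.sort(key=lambda d: d[field], reverse=reverse)
    return present + absent if missing == "_last" else absent + present


docs = [
    {"id": "a", "price": 30},
    {"id": "b"},               # no price field
    {"id": "c", "price": 10},
]

print([d["id"] for d in sort_with_missing(docs, "price")])
print([d["id"] for d in sort_with_missing(docs, "price", missing="_first")])
```

Note that in this sketch the documents without the field stay at the chosen end regardless of the sort direction.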
1.2.4. Source filtering
You can control how the _source field is returned with each hit.
By default the contents of _source are returned. You can turn this off, for example:
curl -X GET "localhost:9200/_search" -H 'Content-Type: application/json' -d' { "_source": false, "query" : { "term" : { "user" : "kimchy" } } } '
Normally, a hit looks like this:
{ "_index" : "product", "_type" : "_doc", "_id" : "3", "_score" : 1.0, "_source" : { "productName" : "Honor Note10", "category" : "Honor", "price" : 2499 } }
With _source disabled, it looks like this:
{ "_index" : "product", "_type" : "_doc", "_id" : "3", "_score" : 1.0 }
You can also use wildcard patterns to control which fields of _source are returned:
curl -X GET "localhost:9200/product/_search?pretty" -H 'Content-Type: application/json' -d' { "_source": "product*", "query" : { "match_all" : {} } } '
Or:
curl -X GET "localhost:9200/product/_search?pretty" -H 'Content-Type: application/json' -d' { "_source": ["product*", "abc*"], "query" : { "match_all" : {} } } '
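As a rough model of wildcard source filtering, the sketch below applies the same patterns to the sample hit shown earlier using Python's fnmatch module. The filter_source helper is hypothetical and only handles top-level field names; it is meant to show which keys survive, not how Elasticsearch implements the filter.

```python
from fnmatch import fnmatchcase


def filter_source(source, patterns):
    """Keep only the top-level fields whose name matches one of the patterns."""
    if isinstance(patterns, str):
        patterns = [patterns]
    return {
        name: value
        for name, value in source.items()
        if any(fnmatchcase(name, pattern) for pattern in patterns)
    }


source = {"productName": "Honor Note10", "category": "Honor", "price": 2499}

print(filter_source(source, "product*"))
print(filter_source(source, ["product*", "abc*"]))
```

Both calls keep only productName, because no field of the sample document starts with abc.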
1.2.5. Highlighting
curl -X GET "localhost:9200/product/_search?pretty" -H 'Content-Type: application/json' -d' { "query" : { "match" : { "category" : "MI" } }, "highlight" : { "fields" : { "productName": {} } } } '
https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-highlighting.html
1.2.6. Explain
Setting explain to true returns, for each hit, an explanation of how its score was computed.
curl -X GET "localhost:9200/_search" -H 'Content-Type: application/json' -d' { "explain": true, "query" : { "term" : { "user" : "kimchy" } } } '
1.3. Count
curl -X GET "localhost:9200/product/_doc/_count?pretty&q=category:honor"
curl -X GET "localhost:9200/product/_doc/_count?pretty" -H 'Content-Type: application/json' -d' { "query" : { "term" : { "category" : "honor" } } } '
Response:
{ "count" : 3, "_shards" : { "total" : 5, "successful" : 5, "skipped" : 0, "failed" : 0 } }
2. Aggregations
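Aggregations are the counterpart of aggregate functions in a relational database.
Aggregations can be nested! Aggregations can be nested!! Aggregations can be nested!!!
There are four main families of aggregations:
- Bucketing
- Metric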
- Matrix
- Pipeline
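The basic structure of an aggregation is as follows. aggregations is a JSON object that holds the aggregations to compute. (PS: the key can also be written as aggs.)
- Each aggregation is associated with a logical name (for example, if the aggregation computes the average price, I could name it "avg_price")
- In the response, these logical names uniquely identify each aggregation
- Each aggregation has a specific type (for example sum, avg, max, min and so on)
- Each aggregation type defines its own body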
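The name / type / body layering described in the list above, plus nesting, can be sketched by building a request body as a Python dictionary. The aggregation helper is a hypothetical convenience function; the resulting JSON uses only the aggs, terms and avg keys that appear in this article.

```python
import json


def aggregation(name, agg_type, body, sub_aggs=None):
    """Build {name: {type: body}} and optionally nest sub-aggregations."""
    node = {agg_type: body}
    if sub_aggs:
        node["aggs"] = sub_aggs
    return {name: node}


# An avg aggregation named avg_price, nested inside a terms aggregation
avg_price = aggregation("avg_price", "avg", {"field": "price"})
by_category = aggregation(
    "group_by_category", "terms", {"field": "category"}, sub_aggs=avg_price
)

request_body = {"size": 0, "aggs": by_category}
print(json.dumps(request_body, indent=2))
```

The printed JSON could be sent as the body of a _search request; the logical names group_by_category and avg_price are what you would look for in the response.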
2.1. Metrics Aggregations
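Aggregations in this family compute metrics from values extracted, in one way or another, from the documents being aggregated. The values are usually taken from a document field, but they can also be generated by a script.
Numeric metrics aggregations are a special kind of metrics aggregation that output numeric values. Depending on how many values they output, they are either single-value numeric metrics aggregations (for example, avg) or multi-value numeric metrics aggregations (for example, stats).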
2.1.1. Avg
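Computes the average of the values extracted from a numeric field of the documents.
Suppose our documents are student exam grades (0 to 100). We can compute the average grade: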
curl -X POST "localhost:9200/exams/_search?size=0" -H 'Content-Type: application/json' -d' { "aggs":{ "avg_grade":{ "avg":{ "field":"grade" } } } } '
The aggregation above computes the average grade over all students. The aggregation type is avg, and field specifies which field the calculation uses.
Another example:
Request:
curl -X POST "localhost:9200/product/_search?size=0" -H 'Content-Type: application/json' -d' { "aggs":{ "avg_price":{ "avg":{ "field":"price" } } } } '
Response:
{ "took":13, "timed_out":false, "_shards":{ "total":5, "successful":5, "skipped":0, "failed":0 }, "hits":{ "total":7, "max_score":0, "hits":[ ] }, "aggregations":{ "avg_price":{ "value":2341.5714285714284 } } }
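By default, documents that lack the field are ignored (PS: just as a relational database skips NULLs when computing an average). We can instead give them a value with the missing parameter, for example: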
curl -X POST "localhost:9200/exams/_search?size=0" -H 'Content-Type: application/json' -d' { "aggs" : { "grade_avg" : { "avg" : { "field" : "grade", "missing": 10 } } } } '
Documents without a grade field are then treated as if their grade were 10.
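The difference between ignoring documents without the field and substituting a missing value can be checked with a small Python sketch. The avg helper and the three exam documents are hypothetical.

```python
def avg(docs, field, missing=None):
    """Average a field; docs lacking it are skipped unless missing is given."""
    values = []
    for doc in docs:
        if field in doc:
            values.append(doc[field])
        elif missing is not None:
            values.append(missing)
    return sum(values) / len(values) if values else None


exams = [{"grade": 50}, {"grade": 100}, {"student": "no grade recorded"}]

print(avg(exams, "grade"))              # the third document is ignored
print(avg(exams, "grade", missing=10))  # the third document counts as 10
```

Without missing the average is taken over two documents; with missing set to 10 it is taken over all three, which pulls the result down.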
2.1.2. Sum
Computes the sum of the values extracted from a numeric field of the documents.
Request:
curl -X POST "localhost:9200/product/_search?size=0" -H 'Content-Type: application/json' -d' { "query":{ "constant_score":{ "filter":{ "match":{ "category":"vivo" } } } }, "aggs":{ "vivo_prices":{ "sum":{ "field":"price" } } } } '
Response:
{ "took":3, "timed_out":false, "_shards":{ "total":5, "successful":5, "skipped":0, "failed":0 }, "hits":{ "total":2, "max_score":0, "hits":[ ] }, "aggregations":{ "vivo_prices":{ "value":3796 } } }
This sums the prices of the products whose category field matches vivo.
Roughly equivalent to: select sum(price) from product where category like '%vivo%'
2.1.3. Max
Returns the maximum of the values extracted from a numeric field of the documents.
curl -X POST "localhost:9200/sales/_search?size=0" -H 'Content-Type: application/json' -d' { "aggs" : { "max_price" : { "max" : { "field" : "price" } } } } '
2.1.4. Stats
This is a multi-value aggregation. It returns min, max, sum, count and avg together.
curl -X POST "localhost:9200/exams/_search?size=0" -H 'Content-Type: application/json' -d' { "aggs" : { "grades_stats" : { "stats" : { "field" : "grade" } } } } '
The response might look like this:
{ ... "aggregations": { "grades_stats": { "count": 2, "min": 50.0, "max": 100.0, "avg": 75.0, "sum": 150.0 } } }
Another example:
Request:
curl -X POST "localhost:9200/product/_search?size=0" -H 'Content-Type: application/json' -d' { "aggs" : { "product_stats" : { "stats" : { "field" : "price" } } } } '
Response:
{ "took":4, "timed_out":false, "_shards":{ "total":5, "successful":5, "skipped":0, "failed":0 }, "hits":{ "total":7, "max_score":0, "hits":[ ] }, "aggregations":{ "product_stats":{ "count":7, "min":998, "max":4299, "avg":2341.5714285714284, "sum":16391 } } }
2.2. Bucket Aggregations
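Bucket aggregations group documents into buckets. Each bucket has a criterion (such as a range, a term or a filter) that decides whether a document falls into it, so the result comes back segment by segment, one bucket after another.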
2.2.1. Range
Each range includes the from value and excludes the to value.
That is, a half-open interval [from, to).
curl -X GET "localhost:9200/_search" -H 'Content-Type: application/json' -d' { "aggs" : { "price_ranges" : { "range" : { "field" : "price", "ranges" : [ { "to" : 100.0 }, { "from" : 100.0, "to" : 200.0 }, { "from" : 200.0 } ] } } } } '
The response might look like this:
{ ... "aggregations": { "price_ranges" : { "buckets": [ { "key": "*-100.0", "to": 100.0, "doc_count": 2 }, { "key": "100.0-200.0", "from": 100.0, "to": 200.0, "doc_count": 2 }, { "key": "200.0-*", "from": 200.0, "doc_count": 3 } ] } } }
Another example:
Request:
curl -X GET "localhost:9200/product/_search" -H 'Content-Type: application/json' -d' { "aggs" : { "price_ranges" : { "range" : { "field" : "price", "ranges" : [ { "to" : 1000 }, { "from" : 1000, "to" : 2000 }, { "from" : 2000 } ] } } } } '
Response:
{ "took":1, "timed_out":false, "_shards":{ "total":5, "successful":5, "skipped":0, "failed":0 }, "hits":{ "total":7, "max_score":1, "hits":[ { "_index":"product", "_type":"_doc", "_id":"5", "_score":1, "_source":{ "productName":"MI 8", "category":"MI", "price":2499 } }, { "_index":"product", "_type":"_doc", "_id":"2", "_score":1, "_source":{ "productName":"Honor Magic2", "category":"Honor", "price":4299 } }, { "_index":"product", "_type":"_doc", "_id":"4", "_score":1, "_source":{ "productName":"MI Max2", "category":"MI", "price":1099 } }, { "_index":"product", "_type":"_doc", "_id":"6", "_score":1, "_source":{ "productName":"vivo X23", "category":"vivo", "price":2798 } }, { "_index":"product", "_type":"_doc", "_id":"1", "_score":1, "_source":{ "productName":"Honor 10", "category":"Honor", "price":2199 } }, { "_index":"product", "_type":"_doc", "_id":"7", "_score":1, "_source":{ "productName":"vivo Z1", "category":"vivo", "price":998 } }, { "_index":"product", "_type":"_doc", "_id":"3", "_score":1, "_source":{ "productName":"Honor Note10", "category":"Honor", "price":2499 } } ] }, "aggregations":{ "price_ranges":{ "buckets":[ { "key":"*-1000.0", "to":1000, "doc_count":1 }, { "key":"1000.0-2000.0", "from":1000, "to":2000, "doc_count":1 }, { "key":"2000.0-*", "from":2000, "doc_count":5 } ] } } }
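The half-open [from, to) rule can be verified against the seven sample prices with a short Python sketch. range_buckets is a hypothetical helper that mimics the bucket counting only; it does not generate the key strings.

```python
def range_buckets(values, ranges):
    """Count values per range; from is inclusive, to is exclusive."""
    buckets = []
    for r in ranges:
        lo, hi = r.get("from"), r.get("to")
        count = sum(
            1
            for v in values
            if (lo is None or v >= lo) and (hi is None or v < hi)
        )
        buckets.append({**r, "doc_count": count})
    return buckets


prices = [2199, 4299, 2499, 1099, 2499, 2798, 998]
ranges = [{"to": 1000}, {"from": 1000, "to": 2000}, {"from": 2000}]

for bucket in range_buckets(prices, ranges):
    print(bucket)
```

A price of exactly 1000 would land in the second bucket, not the first, because to is exclusive and from is inclusive.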
Instead of returning an array, you can set keyed to true, which associates a unique string key with each bucket. For example:
curl -X GET "localhost:9200/_search" -H 'Content-Type: application/json' -d' { "aggs" : { "price_ranges" : { "range" : { "field" : "price", "keyed" : true, "ranges" : [ { "to" : 100 }, { "from" : 100, "to" : 200 }, { "from" : 200 } ] } } } } '
The response then becomes:
{ ... "aggregations": { "price_ranges" : { "buckets": { "*-100.0": { "to": 100.0, "doc_count": 2 }, "100.0-200.0": { "from": 100.0, "to": 200.0, "doc_count": 2 }, "200.0-*": { "from": 200.0, "doc_count": 3 } } } } }
Of course, we can also give each range a custom key:
curl -X GET "localhost:9200/_search" -H 'Content-Type: application/json' -d' { "aggs" : { "price_ranges" : { "range" : { "field" : "price", "keyed" : true, "ranges" : [ { "key" : "cheap", "to" : 100 }, { "key" : "average", "from" : 100, "to" : 200 }, { "key" : "expensive", "from" : 200 } ] } } } } '
Response:
{ ... "aggregations": { "price_ranges" : { "buckets": { "cheap": { "to": 100.0, "doc_count": 2 }, "average": { "from": 100.0, "to": 200.0, "doc_count": 2 }, "expensive": { "from": 200.0, "doc_count": 3 } } } } }
An example:
Request:
curl -X GET "localhost:9200/product/_search" -H 'Content-Type: application/json' -d' { "query": { "match" : { "category" : "honor"} }, "aggs" : { "price_ranges" : { "range" : { "field" : "price", "keyed" : true, "ranges" : [ { "key" : "low", "to" : 1000 }, { "key" : "medium", "from" : 1000, "to" : 2000 }, { "key" : "high", "from" : 2000 } ] } } } } '
Response:
{ "took":1, "timed_out":false, "_shards":{ "total":5, "successful":5, "skipped":0, "failed":0 }, "hits":{ "total":3, "max_score":0.9808292, "hits":[ { "_index":"product", "_type":"_doc", "_id":"2", "_score":0.9808292, "_source":{ "productName":"Honor Magic2", "category":"Honor", "price":4299 } }, { "_index":"product", "_type":"_doc", "_id":"1", "_score":0.6931472, "_source":{ "productName":"Honor 10", "category":"Honor", "price":2199 } }, { "_index":"product", "_type":"_doc", "_id":"3", "_score":0.2876821, "_source":{ "productName":"Honor Note10", "category":"Honor", "price":2499 } } ] }, "aggregations":{ "price_ranges":{ "buckets":{ "low":{ "to":1000, "doc_count":0 }, "medium":{ "from":1000, "to":2000, "doc_count":0 }, "high":{ "from":2000, "doc_count":3 } } } } }
2.2.2. Filter
Filter first, then aggregate: the filter defines a single bucket of matching documents, and the nested aggregation runs over that bucket only.
Request:
curl -X POST "localhost:9200/product/_search?size=0" -H 'Content-Type: application/json' -d' { "aggs":{ "vivo":{ "filter":{ "term":{ "category":"vivo" } }, "aggs":{ "avg_price":{ "avg":{ "field":"price" } } } } } } '
Response:
{ "took":2, "timed_out":false, "_shards":{ "total":5, "successful":5, "skipped":0, "failed":0 }, "hits":{ "total":7, "max_score":0, "hits":[ ] }, "aggregations":{ "vivo":{ "doc_count":2, "avg_price":{ "value":1898 } } } }
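The "filter, then aggregate" flow and the figures in this response can be reproduced in plain Python. This is a sketch with hypothetical names; the lower() call stands in for the lowercasing that the analyzer of the text field is assumed to apply to category.

```python
# (category, price) pairs of the seven sample products
products = [
    ("Honor", 2199), ("Honor", 4299), ("Honor", 2499),
    ("MI", 1099), ("MI", 2499),
    ("vivo", 2798), ("vivo", 998),
]


def filter_then_avg(rows, term):
    """Keep rows whose category matches term, then average their prices."""
    matched = [price for category, price in rows if category.lower() == term]
    avg_price = sum(matched) / len(matched) if matched else None
    return {"doc_count": len(matched), "avg_price": {"value": avg_price}}


vivo = filter_then_avg(products, "vivo")
print(vivo)
```

Note that hits.total in the response is still 7: the filter only narrows the documents seen by this aggregation, not the search results.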
2.2.3. Terms Aggregation
Equivalent to grouping (GROUP BY) in a relational database.
curl -X GET "localhost:9200/_search" -H 'Content-Type: application/json' -d' { "aggs" : { "genres" : { "terms" : { "field" : "genre" } } } } '
The response might look like this:
{ ... "aggregations" : { "genres" : { "doc_count_error_upper_bound": 0, "sum_other_doc_count": 0, "buckets" : [ { "key" : "electronic", "doc_count" : 6 }, { "key" : "rock", "doc_count" : 3 }, { "key" : "jazz", "doc_count" : 2 } ] } } }
One more example:
Request:
curl -X GET "localhost:9200/product/_search" -H 'Content-Type: application/json' -d' { "aggs" : { "group_by_category" : { "terms" : { "field" : "category" } } } } '
Response:
{ "took":16, "timed_out":false, "_shards":{ "total":5, "successful":5, "skipped":0, "failed":0 }, "hits":{ "total":7, "max_score":1, "hits":[ { "_index":"product", "_type":"_doc", "_id":"5", "_score":1, "_source":{ "productName":"MI 8", "category":"MI", "price":2499 } }, { "_index":"product", "_type":"_doc", "_id":"2", "_score":1, "_source":{ "productName":"Honor Magic2", "category":"Honor", "price":4299 } }, { "_index":"product", "_type":"_doc", "_id":"4", "_score":1, "_source":{ "productName":"MI Max2", "category":"MI", "price":1099 } }, { "_index":"product", "_type":"_doc", "_id":"6", "_score":1, "_source":{ "productName":"vivo X23", "category":"vivo", "price":2798 } }, { "_index":"product", "_type":"_doc", "_id":"1", "_score":1, "_source":{ "productName":"Honor 10", "category":"Honor", "price":2199 } }, { "_index":"product", "_type":"_doc", "_id":"7", "_score":1, "_source":{ "productName":"vivo Z1", "category":"vivo", "price":998 } }, { "_index":"product", "_type":"_doc", "_id":"3", "_score":1, "_source":{ "productName":"Honor Note10", "category":"Honor", "price":2499 } } ] }, "aggregations":{ "group_by_category":{ "doc_count_error_upper_bound":0, "sum_other_doc_count":0, "buckets":[ { "key":"honor", "doc_count":3 }, { "key":"mi", "doc_count":2 }, { "key":"vivo", "doc_count":2 } ] } } }
size specifies how many term buckets are returned:
Request:
curl -X GET "localhost:9200/product/_search" -H 'Content-Type: application/json' -d' { "aggs" : { "group_by_category" : { "terms" : { "field" : "category", "size" : 2 } } } } '
Response:
{ ... "aggregations":{ "group_by_category":{ "doc_count_error_upper_bound":0, "sum_other_doc_count":2, "buckets":[ { "key":"honor", "doc_count":3 }, { "key":"mi", "doc_count":2 } ] } } }
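How size trims the bucket list and where sum_other_doc_count comes from can be sketched in Python. terms_agg is a hypothetical helper: it lowercases values to imitate the analyzed text field and breaks count ties by key, which is an assumption made so that the output matches the response above.

```python
from collections import Counter

# category value of each of the seven sample products
categories = ["Honor", "Honor", "Honor", "MI", "MI", "vivo", "vivo"]


def terms_agg(values, size=10):
    """Group by term, keep the size largest buckets, sum up the rest."""
    counts = Counter(value.lower() for value in values)
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return {
        "sum_other_doc_count": sum(count for _, count in ordered[size:]),
        "buckets": [{"key": k, "doc_count": c} for k, c in ordered[:size]],
    }


top2 = terms_agg(categories, size=2)
print(top2)
```

The two vivo documents are not lost: they are counted in sum_other_doc_count, which is 2 here and 0 when all three buckets are returned.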
3. Examples
Sorting
The three requests below all sort the honor products by price. The first uses the default ascending order; the second and third are two equivalent ways of asking for descending order.
curl -X POST "localhost:9200/product/_search" -H 'Content-Type: application/json' -d' { "query" : { "term" : { "category" : "honor"} }, "sort" : "price" } '
curl -X POST "localhost:9200/product/_search" -H 'Content-Type: application/json' -d' { "query" : { "term" : { "category" : "honor"} }, "sort" : { "price" : "desc" } } '
curl -X POST "localhost:9200/product/_search" -H 'Content-Type: application/json' -d' { "query" : { "term" : { "category" : "honor"} }, "sort" : { "price" : { "order" : "desc" } } } '
Response (for the descending sort):
{ "took":1, "timed_out":false, "_shards":{ "total":5, "successful":5, "skipped":0, "failed":0 }, "hits":{ "total":3, "max_score":null, "hits":[ { "_index":"product", "_type":"_doc", "_id":"2", "_score":null, "_source":{ "productName":"Honor Magic2", "category":"Honor", "price":4299 }, "sort":[ 4299 ] }, { "_index":"product", "_type":"_doc", "_id":"3", "_score":null, "_source":{ "productName":"Honor Note10", "category":"Honor", "price":2499 }, "sort":[ 2499 ] }, { "_index":"product", "_type":"_doc", "_id":"1", "_score":null, "_source":{ "productName":"Honor 10", "category":"Honor", "price":2199 }, "sort":[ 2199 ] } ] } }
Group and average
Request:
curl -X GET "localhost:9200/product/_search" -H 'Content-Type: application/json' -d' { "aggs" : { "group_by_category" : { "terms" : { "field" : "category" }, "aggs" : { "avg_price" : { "avg" : { "field" : "price" } } } } } } '
Response:
{ "took":2, "timed_out":false, "_shards":{ "total":5, "successful":5, "skipped":0, "failed":0 }, "hits":{ "total":7, "max_score":1, "hits":[ { "_index":"product", "_type":"_doc", "_id":"5", "_score":1, "_source":{ "productName":"MI 8", "category":"MI", "price":2499 } }, { "_index":"product", "_type":"_doc", "_id":"2", "_score":1, "_source":{ "productName":"Honor Magic2", "category":"Honor", "price":4299 } }, { "_index":"product", "_type":"_doc", "_id":"4", "_score":1, "_source":{ "productName":"MI Max2", "category":"MI", "price":1099 } }, { "_index":"product", "_type":"_doc", "_id":"6", "_score":1, "_source":{ "productName":"vivo X23", "category":"vivo", "price":2798 } }, { "_index":"product", "_type":"_doc", "_id":"1", "_score":1, "_source":{ "productName":"Honor 10", "category":"Honor", "price":2199 } }, { "_index":"product", "_type":"_doc", "_id":"7", "_score":1, "_source":{ "productName":"vivo Z1", "category":"vivo", "price":998 } }, { "_index":"product", "_type":"_doc", "_id":"3", "_score":1, "_source":{ "productName":"Honor Note10", "category":"Honor", "price":2499 } } ] }, "aggregations":{ "group_by_category":{ "doc_count_error_upper_bound":0, "sum_other_doc_count":0, "buckets":[ { "key":"honor", "doc_count":3, "avg_price":{ "value":2999 } }, { "key":"mi", "doc_count":2, "avg_price":{ "value":1799 } }, { "key":"vivo", "doc_count":2, "avg_price":{ "value":1898 } } ] } } }
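The per-category averages in this response can be cross-checked with a plain-Python group-by over the seven sample products. avg_price_by_category is a hypothetical helper, and lowercasing the category is an assumption that imitates the analyzed text field.

```python
from collections import defaultdict

# (category, price) pairs of the seven sample products
products = [
    ("Honor", 2199), ("Honor", 4299), ("Honor", 2499),
    ("MI", 1099), ("MI", 2499),
    ("vivo", 2798), ("vivo", 998),
]


def avg_price_by_category(rows):
    """Group prices by lowercased category, then average each group."""
    groups = defaultdict(list)
    for category, price in rows:
        groups[category.lower()].append(price)
    return {key: sum(v) / len(v) for key, v in sorted(groups.items())}


averages = avg_price_by_category(products)
print(averages)
```

This is the nested pattern in miniature: the terms aggregation creates the groups, and the avg sub-aggregation runs once inside each group.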
4. Sample index
curl -X PUT "localhost:9200/product" -H 'Content-Type: application/json' -d' { "mappings" : { "_doc" : { "properties": { "productName": {"type": "text"}, "category": {"type": "text", "fielddata": true}, "price": {"type": "integer"} } } } } '
curl -X POST "localhost:9200/product/_doc/_bulk" -H 'Content-Type: application/json' --data-binary "@product.json"
Contents of product.json (one JSON object per line, with a newline after the last line):
{"index" : {"_id" : "1" } }
{"productName" : "Honor 10", "category" : "Honor", "price" : 2199}
{"index" : {"_id" : "2" } }
{"productName" : "Honor Magic2", "category" : "Honor", "price" : 4299}
{"index" : {"_id" : "3" } }
{"productName" : "Honor Note10", "category" : "Honor", "price" : 2499}
{"index" : {"_id" : "4" } }
{"productName" : "MI Max2", "category" : "MI", "price" : 1099}
{"index" : {"_id" : "5" } }
{"productName" : "MI 8", "category" : "MI", "price" : 2499}
{"index" : {"_id" : "6" } }
{"productName" : "vivo X23", "category" : "vivo", "price" : 2798}
{"index" : {"_id" : "7" } }
{"productName" : "vivo Z1", "category" : "vivo", "price" : 998}
5. References
https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-metrics.html
https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket.html