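- Metric: a family of statistical calculations over a set of documents
- Bucket: a set of documents that satisfy a given condition
- Aggregations are part of the Search API. When only the aggregation results are needed, it is generally recommended to set `size` to 0 so that no hits are returned
Examples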
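- Single-value analysis: outputs a single result
  - min, max, avg, sum
  - cardinality (similar to a distinct count)
- Multi-value analysis: outputs multiple results
  - stats, extended stats
  - percentiles, percentile ranks
  - top hits (the top-ranked sample documents)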
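- Find the lowest salary
- Find the highest salary
- One aggregation that outputs multiple values
- One query that contains multiple aggregations
- View the lowest, highest, and average salary at the same time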
PUT /employees/
{
  "mappings": {
    "properties": {
      "age": { "type": "integer" },
      "gender": { "type": "keyword" },
      "job": {
        "type": "text",
        "fields": {
          "keyword": { "type": "keyword", "ignore_above": 50 }
        }
      },
      "name": { "type": "keyword" },
      "salary": { "type": "integer" }
    }
  }
}

PUT /employees/_bulk
{ "index" : { "_id" : "1" } }
{ "name" : "Emma","age":32,"job":"Product Manager","gender":"female","salary":35000 }
{ "index" : { "_id" : "2" } }
{ "name" : "Underwood","age":41,"job":"Dev Manager","gender":"male","salary": 50000}
{ "index" : { "_id" : "3" } }
{ "name" : "Tran","age":25,"job":"Web Designer","gender":"male","salary":18000 }
{ "index" : { "_id" : "4" } }
{ "name" : "Rivera","age":26,"job":"Web Designer","gender":"female","salary": 22000}
{ "index" : { "_id" : "5" } }
{ "name" : "Rose","age":25,"job":"QA","gender":"female","salary":18000 }
{ "index" : { "_id" : "6" } }
{ "name" : "Lucy","age":31,"job":"QA","gender":"female","salary": 25000}
{ "index" : { "_id" : "7" } }
{ "name" : "Byrd","age":27,"job":"QA","gender":"male","salary":20000 }
{ "index" : { "_id" : "8" } }
{ "name" : "Foster","age":27,"job":"Java Programmer","gender":"male","salary": 20000}
{ "index" : { "_id" : "9" } }
{ "name" : "Gregory","age":32,"job":"Java Programmer","gender":"male","salary":22000 }
{ "index" : { "_id" : "10" } }
{ "name" : "Bryant","age":20,"job":"Java Programmer","gender":"male","salary": 9000}
{ "index" : { "_id" : "11" } }
{ "name" : "Jenny","age":36,"job":"Java Programmer","gender":"female","salary":38000 }
{ "index" : { "_id" : "12" } }
{ "name" : "Mcdonald","age":31,"job":"Java Programmer","gender":"male","salary": 32000}
{ "index" : { "_id" : "13" } }
{ "name" : "Jonthna","age":30,"job":"Java Programmer","gender":"female","salary":30000 }
{ "index" : { "_id" : "14" } }
{ "name" : "Marshall","age":32,"job":"Javascript Programmer","gender":"male","salary": 25000}
{ "index" : { "_id" : "15" } }
{ "name" : "King","age":33,"job":"Java Programmer","gender":"male","salary":28000 }
{ "index" : { "_id" : "16" } }
{ "name" : "Mccarthy","age":21,"job":"Javascript Programmer","gender":"male","salary": 16000}
{ "index" : { "_id" : "17" } }
{ "name" : "Goodwin","age":25,"job":"Javascript Programmer","gender":"male","salary": 16000}
{ "index" : { "_id" : "18" } }
{ "name" : "Catherine","age":29,"job":"Javascript Programmer","gender":"female","salary": 20000}
{ "index" : { "_id" : "19" } }
{ "name" : "Boone","age":30,"job":"DBA","gender":"male","salary": 30000}
{ "index" : { "_id" : "20" } }
{ "name" : "Kathy","age":29,"job":"DBA","gender":"female","salary": 20000}

# Query: several aggregations in one request
POST employees/_search
{
  "size": 0,
  "aggs": {
    "min": { "min": { "field": "salary" } },
    "max": { "max": { "field": "salary" } },
    "avg": { "avg": { "field": "salary" } }
  }
}

# Response
{
  "took" : 111,
  "timed_out" : false,
  "_shards" : { "total" : 1, "successful" : 1, "skipped" : 0, "failed" : 0 },
  "hits" : {
    "total" : { "value" : 20, "relation" : "eq" },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "avg" : { "value" : 24700.0 },
    "min" : { "value" : 9000.0 },
    "max" : { "value" : 50000.0 }
  }
}

# One aggregation that outputs multiple values
POST employees/_search
{
  "size": 0,
  "aggs": {
    "stats_salary": {
      "stats": { "field": "salary" }
    }
  }
}

# Response (aggregations section only)
"aggregations" : {
  "stats_salary" : {
    "count" : 20,
    "min" : 9000.0,
    "max" : 50000.0,
    "avg" : 24700.0,
    "sum" : 494000.0
  }
}
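To make the difference between single-value and multi-value metrics concrete, here is a minimal Python sketch that recomputes the same numbers locally from the sample salaries. It does not talk to Elasticsearch; the function names `single_value_metrics` and `stats` are illustrative, and the cardinality here is exact, whereas the real `cardinality` aggregation is an approximation.

```python
# Salaries of the 20 sample employees, in _id order (copied from the bulk request).
salaries = [35000, 50000, 18000, 22000, 18000, 25000, 20000, 20000, 22000, 9000,
            38000, 32000, 30000, 25000, 28000, 16000, 16000, 20000, 30000, 20000]


def single_value_metrics(values):
    """Each entry mirrors one single-value aggregation: one number out."""
    return {
        "min": float(min(values)),
        "max": float(max(values)),
        "avg": sum(values) / len(values),
        "sum": float(sum(values)),
        # Exact distinct count; Elasticsearch only approximates this.
        "cardinality": len(set(values)),
    }


def stats(values):
    """Mirrors the multi-value `stats` aggregation: several numbers out."""
    return {
        "count": len(values),
        "min": float(min(values)),
        "max": float(max(values)),
        "avg": sum(values) / len(values),
        "sum": float(sum(values)),
    }


if __name__ == "__main__":
    print(single_value_metrics(salaries))
    print(stats(salaries))
```

The `stats` result should line up with the `stats_salary` response shown above, which is a quick way to sanity-check the sample data.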
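- Bucket aggregations assign documents to different buckets according to a rule, which classifies them. Common bucket aggregations provided by ES:
  - Terms
  - Numeric types
    - Range, Date Range
    - Histogram, Date Histogram
- Nesting is supported: a bucket can be split further into sub-buckets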
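- A field needs doc_values or fielddata enabled before it can be used in a Terms aggregation
  - keyword fields support doc_values by default
  - text fields need `fielddata` enabled in the mapping; buckets are then built from the analyzed tokens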
- Demo
  - Aggregate on job and on job.keyword
  - Run a Terms aggregation on gender
  - Specify the bucket size
# Terms aggregation on the keyword sub-field
POST employees/_search
{
  "size": 0,
  "aggs": {
    "jobs": {
      "terms": { "field": "job.keyword" }
    }
  }
}

# Response (aggregations section only)
"aggregations" : {
  "jobs" : {
    "doc_count_error_upper_bound" : 0,
    "sum_other_doc_count" : 0,
    "buckets" : [
      { "key" : "Java Programmer", "doc_count" : 7 },
      { "key" : "Javascript Programmer", "doc_count" : 4 },
      { "key" : "QA", "doc_count" : 3 },
      { "key" : "DBA", "doc_count" : 2 },
      { "key" : "Web Designer", "doc_count" : 2 },
      { "key" : "Dev Manager", "doc_count" : 1 },
      { "key" : "Product Manager", "doc_count" : 1 }
    ]
  }
}

# Enable fielddata on the text field so that it supports terms aggregations
PUT employees/_mapping
{
  "properties" : {
    "job": {
      "type": "text",
      "fielddata": true
    }
  }
}

# Terms aggregation on the text field: the buckets are the analyzed tokens
POST employees/_search
{
  "size": 0,
  "aggs": {
    "jobs": {
      "terms": { "field": "job" }
    }
  }
}

# Terms aggregations on job.keyword and on job yield different numbers of buckets;
# cardinality counts the distinct values of job.keyword
POST employees/_search
{
  "size": 0,
  "aggs": {
    "cardinate": {
      "cardinality": { "field": "job.keyword" }
    }
  }
}

# Terms aggregation on gender (a keyword field)
POST employees/_search
{
  "size": 0,
  "aggs": {
    "gender": {
      "terms": { "field": "gender" }
    }
  }
}
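The following Python sketch illustrates why aggregating on `job` and on `job.keyword` gives different buckets. It runs locally on the sample job titles and assumes, as a simplification, that the analyzer on the text field behaves like lowercasing plus splitting on whitespace; the helper names are hypothetical.

```python
from collections import Counter

# Job titles of the 20 sample employees (the order does not matter for counting).
jobs = (["Product Manager", "Dev Manager"] + ["Web Designer"] * 2 + ["QA"] * 3
        + ["Java Programmer"] * 7 + ["Javascript Programmer"] * 4 + ["DBA"] * 2)


def keyword_buckets(values):
    """keyword field: the whole string is one term, so one bucket per title."""
    return Counter(values)


def text_buckets(values):
    """text field: one bucket per token.

    Assumption: the analyzer is approximated by lowercase + whitespace split.
    Each document is counted at most once per token, like doc_count.
    """
    counts = Counter()
    for value in values:
        counts.update(set(value.lower().split()))
    return counts


if __name__ == "__main__":
    print(len(keyword_buckets(jobs)), keyword_buckets(jobs).most_common(3))
    print(len(text_buckets(jobs)), text_buckets(jobs).most_common(3))
```

Under this assumption the token `programmer` collects every "Java Programmer" and "Javascript Programmer" document into a single bucket, which is usually not what a "group by job" report wants; that is the reason the demo aggregates on `job.keyword`.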
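- The cardinality aggregation is similar to DISTINCT in SQL
- Use case for top hits: after bucketing, return the list of top-matching documents inside each bucket
- Size: bucket by age and return only the specified number of buckets
- Top Hits: view the three oldest employees in each job type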
# Specify the number of buckets to return
POST employees/_search
{
  "size": 0,
  "aggs": {
    "ages_5": {
      "terms": {
        "field": "age",
        "size": 3
      }
    }
  }
}

# Specify size in top_hits: details of the three oldest employees in each job type
POST employees/_search
{
  "size": 0,
  "aggs": {
    "jobs": {
      "terms": { "field": "job.keyword" },
      "aggs": {
        "old_employee": {
          "top_hits": {
            "size": 3,
            "sort": [
              { "age": { "order": "desc" } }
            ]
          }
        }
      }
    }
  }
}
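As a rough local equivalent of the terms plus top_hits request, this Python sketch groups the sample employees by job and keeps the three oldest in each group. The function name `top_hits_per_job` is illustrative, and tie-breaking between employees of the same age is an assumption of this sketch, not something taken from Elasticsearch.

```python
from collections import defaultdict

# (name, age, job) for the 20 sample employees, in _id order.
employees = [
    ("Emma", 32, "Product Manager"), ("Underwood", 41, "Dev Manager"),
    ("Tran", 25, "Web Designer"), ("Rivera", 26, "Web Designer"),
    ("Rose", 25, "QA"), ("Lucy", 31, "QA"), ("Byrd", 27, "QA"),
    ("Foster", 27, "Java Programmer"), ("Gregory", 32, "Java Programmer"),
    ("Bryant", 20, "Java Programmer"), ("Jenny", 36, "Java Programmer"),
    ("Mcdonald", 31, "Java Programmer"), ("Jonthna", 30, "Java Programmer"),
    ("Marshall", 32, "Javascript Programmer"), ("King", 33, "Java Programmer"),
    ("Mccarthy", 21, "Javascript Programmer"), ("Goodwin", 25, "Javascript Programmer"),
    ("Catherine", 29, "Javascript Programmer"), ("Boone", 30, "DBA"), ("Kathy", 29, "DBA"),
]


def top_hits_per_job(rows, size=3):
    """Bucket by job (terms), then keep the `size` oldest members (top_hits)."""
    buckets = defaultdict(list)
    for name, age, job in rows:
        buckets[job].append((name, age))
    return {
        job: sorted(members, key=lambda m: m[1], reverse=True)[:size]
        for job, members in buckets.items()
    }


if __name__ == "__main__":
    for job, oldest in top_hits_per_job(employees).items():
        print(job, oldest)
```

Buckets with fewer than three members, such as DBA, simply return everyone they contain.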
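- Applies when aggregations run frequently, performance requirements are high, and the index is continuously being written to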
- Bucket documents by numeric range
- In a Range aggregation, you can define a custom key
- Demo:
  - Bucket by salary range
  - Bucket by salary interval (Histogram)
# Salary ranges: a custom key can be defined for each bucket
POST employees/_search
{
  "size": 0,
  "aggs": {
    "salary_range": {
      "range": {
        "field": "salary",
        "ranges": [
          { "to": 10000 },
          { "from": 10000, "to": 20000 },
          { "key": ">20000", "from": 20000 }
        ]
      }
    }
  }
}

# Response (aggregations section only)
"aggregations" : {
  "salary_range" : {
    "buckets" : [
      { "key" : "*-10000.0", "to" : 10000.0, "doc_count" : 1 },
      { "key" : "10000.0-20000.0", "from" : 10000.0, "to" : 20000.0, "doc_count" : 4 },
      { "key" : ">20000", "from" : 20000.0, "doc_count" : 15 }
    ]
  }
}

# Salary histogram: salaries from 0 to 100,000, bucketed in intervals of 10,000
POST employees/_search
{
  "size": 0,
  "aggs": {
    "salary_histogram": {
      "histogram": {
        "field": "salary",
        "interval": 10000,
        "extended_bounds": {
          "min": 0,
          "max": 100000
        }
      }
    }
  }
}
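The bucketing rules can be sketched in plain Python to show where each salary lands. This is a local approximation, not Elasticsearch code: it assumes `from` is inclusive and `to` is exclusive for ranges, that a histogram key is the value rounded down to a multiple of the interval, and that `extended_bounds` adds empty buckets up to and including the bucket that contains `max`. The helper names are hypothetical.

```python
# Salaries of the 20 sample employees, in _id order.
salaries = [35000, 50000, 18000, 22000, 18000, 25000, 20000, 20000, 22000, 9000,
            38000, 32000, 30000, 25000, 28000, 16000, 16000, 20000, 30000, 20000]


def _bound(value):
    """Render an open bound as '*' and a closed one as a float, e.g. 10000.0."""
    return "*" if value is None else str(float(value))


def range_buckets(values, ranges):
    """Count values per range; `from` is inclusive, `to` is exclusive."""
    result = {}
    for r in ranges:
        lo, hi = r.get("from"), r.get("to")
        key = r.get("key") or f"{_bound(lo)}-{_bound(hi)}"
        result[key] = sum(
            1 for v in values
            if (lo is None or v >= lo) and (hi is None or v < hi)
        )
    return result


def histogram_buckets(values, interval, bounds_min, bounds_max):
    """Fixed-width buckets keyed by the lower edge, pre-filled across the bounds."""
    first = bounds_min // interval * interval
    last = bounds_max // interval * interval
    counts = {key: 0 for key in range(first, last + 1, interval)}
    for v in values:
        key = v // interval * interval
        counts[key] = counts.get(key, 0) + 1
    return counts


if __name__ == "__main__":
    ranges = [{"to": 10000}, {"from": 10000, "to": 20000},
              {"key": ">20000", "from": 20000}]
    print(range_buckets(salaries, ranges))
    print(histogram_buckets(salaries, 10000, 0, 100000))
```

Note that because `from` is inclusive, the bucket labelled `>20000` also contains the employees who earn exactly 20000.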
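- A bucket aggregation can be analyzed further by adding sub-aggregations. A sub-aggregation can be
  - Bucket
  - Metric
- Demo
  - Bucket by job type and compute salary statistics
  - Bucket by job type, then by gender, and compute salary statistics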
# Nested aggregation 1: bucket by job type and compute salary statistics
POST employees/_search
{
  "size": 0,
  "aggs": {
    "Job_salary_stats": {
      "terms": { "field": "job.keyword" },
      "aggs": {
        "salary": {
          "stats": { "field": "salary" }
        }
      }
    }
  }
}

# Multiple levels of nesting: bucket by job type, then by gender, and compute salary statistics
POST employees/_search
{
  "size": 0,
  "aggs": {
    "Job_gender_stats": {
      "terms": { "field": "job.keyword" },
      "aggs": {
        "gender_stats": {
          "terms": { "field": "gender" },
          "aggs": {
            "salary_stats": {
              "stats": { "field": "salary" }
            }
          }
        }
      }
    }
  }
}
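A nested aggregation is essentially a group-by inside a group-by with a metric at the leaves. The Python sketch below reproduces that shape locally for the job, then gender, then salary stats case; `job_gender_stats` and `stats` are illustrative names and the output layout is simplified compared with a real Elasticsearch response.

```python
from collections import defaultdict

# (job, gender, salary) for the 20 sample employees, in _id order.
employees = [
    ("Product Manager", "female", 35000), ("Dev Manager", "male", 50000),
    ("Web Designer", "male", 18000), ("Web Designer", "female", 22000),
    ("QA", "female", 18000), ("QA", "female", 25000), ("QA", "male", 20000),
    ("Java Programmer", "male", 20000), ("Java Programmer", "male", 22000),
    ("Java Programmer", "male", 9000), ("Java Programmer", "female", 38000),
    ("Java Programmer", "male", 32000), ("Java Programmer", "female", 30000),
    ("Javascript Programmer", "male", 25000), ("Java Programmer", "male", 28000),
    ("Javascript Programmer", "male", 16000), ("Javascript Programmer", "male", 16000),
    ("Javascript Programmer", "female", 20000), ("DBA", "male", 30000),
    ("DBA", "female", 20000),
]


def stats(values):
    """Leaf metric, shaped like the `stats` aggregation output."""
    return {
        "count": len(values),
        "min": float(min(values)),
        "max": float(max(values)),
        "avg": sum(values) / len(values),
        "sum": float(sum(values)),
    }


def job_gender_stats(rows):
    """Outer bucket: job. Inner bucket: gender. Leaf: salary stats."""
    nested = defaultdict(lambda: defaultdict(list))
    for job, gender, salary in rows:
        nested[job][gender].append(salary)
    return {
        job: {gender: stats(vals) for gender, vals in genders.items()}
        for job, genders in nested.items()
    }


if __name__ == "__main__":
    for job, genders in job_gender_stats(employees).items():
        print(job, genders)
```

An inner bucket only exists where documents exist, so a job held by employees of a single gender ends up with one inner bucket.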
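# Summary
- The concrete syntax of aggregations
  - One aggregation query can contain multiple aggregations, and each bucket aggregation can contain multiple sub-aggregations
- Metric
  - Single-value output and multi-value output
- Bucket
  - Terms and numeric ranges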