ES Notes 38: Bucket & Metric Aggregations and Nested Aggregations

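Posted by CrazyZard on 2019-12-28
  • Metric: a family of statistical methods computed over a set of documents
  • Bucket: a set of documents that satisfy a given condition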

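  • Aggregations are part of the Search API. Unless you also need the matching documents, it is recommended to set size to 0 so that the response contains only the aggregation results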

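  • Single-value analysis: outputs a single result
    • min, max, avg, sum
    • cardinality (similar to a distinct count)
  • Multi-value analysis: outputs multiple results
    • stats, extended_stats
    • percentiles, percentile_ranks
    • top_hits (the top-ranked sample documents)
  • Find the lowest salary
  • Find the highest salary
  • One aggregation that outputs multiple values
  • One query that contains multiple aggregations
    • Get the lowest, highest, and average salary at once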
      PUT /employees
      {
        "mappings": {
          "properties": {
            "age": {
              "type": "integer"
            },
            "gender": {
              "type": "keyword"
            },
            "job": {
              "type": "text",
              "fields": {
                "keyword": {
                  "type": "keyword",
                  "ignore_above": 50
                }
              }
            },
            "name": {
              "type": "keyword"
            },
            "salary": {
              "type": "integer"
            }
          }
        }
      }
      PUT /employees/_bulk
      { "index" : {  "_id" : "1" } }
      { "name" : "Emma","age":32,"job":"Product Manager","gender":"female","salary":35000 }
      { "index" : {  "_id" : "2" } }
      { "name" : "Underwood","age":41,"job":"Dev Manager","gender":"male","salary": 50000}
      { "index" : {  "_id" : "3" } }
      { "name" : "Tran","age":25,"job":"Web Designer","gender":"male","salary":18000 }
      { "index" : {  "_id" : "4" } }
      { "name" : "Rivera","age":26,"job":"Web Designer","gender":"female","salary": 22000}
      { "index" : {  "_id" : "5" } }
      { "name" : "Rose","age":25,"job":"QA","gender":"female","salary":18000 }
      { "index" : {  "_id" : "6" } }
      { "name" : "Lucy","age":31,"job":"QA","gender":"female","salary": 25000}
      { "index" : {  "_id" : "7" } }
      { "name" : "Byrd","age":27,"job":"QA","gender":"male","salary":20000 }
      { "index" : {  "_id" : "8" } }
      { "name" : "Foster","age":27,"job":"Java Programmer","gender":"male","salary": 20000}
      { "index" : {  "_id" : "9" } }
      { "name" : "Gregory","age":32,"job":"Java Programmer","gender":"male","salary":22000 }
      { "index" : {  "_id" : "10" } }
      { "name" : "Bryant","age":20,"job":"Java Programmer","gender":"male","salary": 9000}
      { "index" : {  "_id" : "11" } }
      { "name" : "Jenny","age":36,"job":"Java Programmer","gender":"female","salary":38000 }
      { "index" : {  "_id" : "12" } }
      { "name" : "Mcdonald","age":31,"job":"Java Programmer","gender":"male","salary": 32000}
      { "index" : {  "_id" : "13" } }
      { "name" : "Jonthna","age":30,"job":"Java Programmer","gender":"female","salary":30000 }
      { "index" : {  "_id" : "14" } }
      { "name" : "Marshall","age":32,"job":"Javascript Programmer","gender":"male","salary": 25000}
      { "index" : {  "_id" : "15" } }
      { "name" : "King","age":33,"job":"Java Programmer","gender":"male","salary":28000 }
      { "index" : {  "_id" : "16" } }
      { "name" : "Mccarthy","age":21,"job":"Javascript Programmer","gender":"male","salary": 16000}
      { "index" : {  "_id" : "17" } }
      { "name" : "Goodwin","age":25,"job":"Javascript Programmer","gender":"male","salary": 16000}
      { "index" : {  "_id" : "18" } }
      { "name" : "Catherine","age":29,"job":"Javascript Programmer","gender":"female","salary": 20000}
      { "index" : {  "_id" : "19" } }
      { "name" : "Boone","age":30,"job":"DBA","gender":"male","salary": 30000}
      { "index" : {  "_id" : "20" } }
      { "name" : "Kathy","age":29,"job":"DBA","gender":"female","salary": 20000}
      // Query: min, max, and avg salary in a single request
      POST employees/_search
      {
        "size": 0,
        "aggs": {
          "min": {
            "min": {
              "field": "salary"
            }
          },
          "max": {
            "max": {
              "field": "salary"
            }
          },
          "avg": {
            "avg": {
              "field": "salary"
            }
          }
        }
      }
      // Response
      {
      "took" : 111,
      "timed_out" : false,
      "_shards" : {
      "total" : 1,
      "successful" : 1,
      "skipped" : 0,
      "failed" : 0
      },
      "hits" : {
      "total" : {
      "value" : 20,
      "relation" : "eq"
      },
      "max_score" : null,
      "hits" : [ ]
      },
      "aggregations" : {
      "avg" : {
      "value" : 24700.0
      },
      "min" : {
      "value" : 9000.0
      },
      "max" : {
      "value" : 50000.0
      }
      }
      }
      # One aggregation that outputs multiple values
      POST employees/_search
      {
        "size": 0,
        "aggs": {
          "stats_salary": {
            "stats": {
              "field": "salary"
            }
          }
        }
      }
      // Response (aggregations section only)
      "aggregations" : {
      "stats_salary" : {
      "count" : 20,
      "min" : 9000.0,
      "max" : 50000.0,
      "avg" : 24700.0,
      "sum" : 494000.0
      }
      }
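As a cross-check on the numbers returned above, the following is a minimal plain-Python sketch of what the min, max, avg, and stats aggregations compute. It is only an illustration, not how Elasticsearch implements them; the `stats` helper and the `salaries` list (copied from the bulk request) are names introduced here for the example.

```python
# Salaries of the 20 sample employees, in _id order (copied from the bulk request).
salaries = [35000, 50000, 18000, 22000, 18000, 25000, 20000, 20000, 22000, 9000,
            38000, 32000, 30000, 25000, 28000, 16000, 16000, 20000, 30000, 20000]


def stats(values):
    """Return the five numbers that a stats aggregation reports (hypothetical helper)."""
    return {
        "count": len(values),
        "min": float(min(values)),
        "max": float(max(values)),
        "avg": sum(values) / len(values),
        "sum": float(sum(values)),
    }


salary_stats = stats(salaries)
print(salary_stats)
```

The single-value aggregations (min, max, avg) each correspond to one entry of this dictionary, while stats returns all five at once.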
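  • Bucket aggregations assign documents to different buckets according to a rule, which classifies the documents. Common Bucket aggregations provided by ES:
    • Terms
    • Numeric types
      • Range, Date Range
      • Histogram, Date Histogram
  • Nesting is supported: the documents inside a bucket can be bucketed again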

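  • A Terms aggregation needs a field backed by doc_values or fielddata
    • Keyword fields support it by default through doc_values
    • Text fields need fielddata enabled in the mapping; the buckets are then built from the analyzed tokens
  • Demo
    • Aggregate on job and on job.keyword
    • Run a Terms aggregation on gender
    • Specify the bucket size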
      POST employees/_search
      {
        "size": 0,
        "aggs": {
          "jobs": {
            "terms": {
              "field": "job.keyword"
            }
          }
        }
      }
      //return 
      "aggregations" : {
      "jobs" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
      {
        "key" : "Java Programmer",
        "doc_count" : 7
      },
      {
        "key" : "Javascript Programmer",
        "doc_count" : 4
      },
      {
        "key" : "QA",
        "doc_count" : 3
      },
      {
        "key" : "DBA",
        "doc_count" : 2
      },
      {
        "key" : "Web Designer",
        "doc_count" : 2
      },
      {
        "key" : "Dev Manager",
        "doc_count" : 1
      },
      {
        "key" : "Product Manager",
        "doc_count" : 1
      }
      ]
      }
      }
      # Enable fielddata on the text field so that it supports terms aggregation
      PUT employees/_mapping
      {
        "properties": {
          "job": {
            "type": "text",
            "fielddata": true
          }
        }
      }
      # Terms aggregation on the text field; the buckets are the analyzed tokens
      POST employees/_search
      {
      "size": 0,
      "aggs": {
      "jobs": {
        "terms": {
          "field":"job"
        }
      }
      }
      }
      # Terms aggregations on job.keyword and on job produce different numbers of buckets; cardinality returns the distinct count for job.keyword
      POST employees/_search
      {
      "size": 0,
      "aggs": {
      "cardinate": {
        "cardinality": {
          "field": "job.keyword"
        }
      }
      }
      }
      # Terms aggregation on the gender keyword field
      POST employees/_search
      {
      "size": 0,
      "aggs": {
      "gender": {
        "terms": {
          "field":"gender"
        }
      }
      }
      }
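To see why job.keyword and job yield different bucket counts, here is a small plain-Python sketch that mimics both Terms aggregations on the sample data. It rests on two assumptions that are labeled in the code: the text field's analyzer is approximated by lowercasing and splitting on whitespace (enough for these job titles), and buckets are ordered by document count with ties broken by key, which matches the response shown earlier. The `terms` helper is hypothetical.

```python
from collections import Counter

# The "job" value of each of the 20 sample documents (order does not affect the counts).
jobs = (["Product Manager", "Dev Manager"] + ["Web Designer"] * 2 + ["QA"] * 3
        + ["Java Programmer"] * 7 + ["Javascript Programmer"] * 4 + ["DBA"] * 2)


def terms(values, size=10):
    """Count documents per term; sort by doc_count descending, then key ascending."""
    counts = Counter(values)
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return ordered[:size]


# job.keyword: the whole string is one term.
keyword_buckets = terms(jobs)

# job (text + fielddata): assumed analyzer = lowercase, split on whitespace.
# set() makes each document count at most once per token.
token_buckets = terms(tok for job in jobs for tok in set(job.lower().split()))

print(len(keyword_buckets), keyword_buckets)
print(len(token_buckets), token_buckets)
```

The keyword field produces 7 buckets (the same number a cardinality aggregation on job.keyword reports), while the tokenized field produces 10, with "programmer" covering both the Java and the Javascript programmers.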
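  • Cardinality: similar to a distinct count in SQL
  • Top hits use case: after bucketing, get the list of top-ranked documents inside each bucket
  • Size: bucket by age and return only the specified number of buckets
  • Top Hits: find the 3 oldest employees in each job type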
    # Specify the number of buckets to return
    POST employees/_search
    {
      "size": 0,
      "aggs": {
        "ages_5": {
          "terms": {
            "field":"age",
            "size":3
          }
        }
      }
    }
    # With size specified: details of the 3 oldest employees in each job type
    POST employees/_search
    {
      "size": 0,
      "aggs": {
        "jobs": {
          "terms": {
            "field": "job.keyword"
          },
          "aggs": {
            "old_employee": {
              "top_hits": {
                "size": 3,
                "sort": [{
                  "age": {
                    "order": "desc"
                  }
                }]
              }
            }
          }
        }
      }
    }
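The nested terms plus top_hits request above can be read as "group by job, sort each group by age descending, keep the first 3". A minimal plain-Python sketch of that reading on the sample data follows; `top_hits_by_job` is a hypothetical helper, and the order among employees of equal age is simply input order here, which Elasticsearch does not guarantee.

```python
from collections import defaultdict

# (name, age, job) for the 20 sample employees, copied from the bulk request.
employees = [
    ("Emma", 32, "Product Manager"), ("Underwood", 41, "Dev Manager"),
    ("Tran", 25, "Web Designer"), ("Rivera", 26, "Web Designer"),
    ("Rose", 25, "QA"), ("Lucy", 31, "QA"), ("Byrd", 27, "QA"),
    ("Foster", 27, "Java Programmer"), ("Gregory", 32, "Java Programmer"),
    ("Bryant", 20, "Java Programmer"), ("Jenny", 36, "Java Programmer"),
    ("Mcdonald", 31, "Java Programmer"), ("Jonthna", 30, "Java Programmer"),
    ("Marshall", 32, "Javascript Programmer"), ("King", 33, "Java Programmer"),
    ("Mccarthy", 21, "Javascript Programmer"), ("Goodwin", 25, "Javascript Programmer"),
    ("Catherine", 29, "Javascript Programmer"), ("Boone", 30, "DBA"), ("Kathy", 29, "DBA"),
]


def top_hits_by_job(docs, size=3):
    """Bucket by job, then keep the `size` oldest employees of each bucket."""
    buckets = defaultdict(list)
    for name, age, job in docs:
        buckets[job].append((name, age))
    return {job: sorted(members, key=lambda m: m[1], reverse=True)[:size]
            for job, members in buckets.items()}


oldest = top_hits_by_job(employees)
for job, members in oldest.items():
    print(job, members)
```

Buckets with fewer than 3 documents, such as DBA, simply return all of their members.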
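  • Applicable scenario: aggregations run frequently, high performance is required, and the index is continuously being written to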
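  • Bucket documents by numeric range
  • In a Range aggregation you can define a custom key for each bucket
  • Demo:

    • Bucket salaries with Range
    • Bucket salaries at fixed intervals with Histogram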

      // Bucket salaries by range; each range can be given a custom key
      POST employees/_search
      {
        "size": 0,
        "aggs": {
          "salary_range": {
            "range": {
              "field": "salary",
              "ranges": [
                {
                  "to": 10000
                },
                {
                  "from": 10000,
                  "to": 20000
                },
                {
                  "key": ">20000",
                  "from": 20000
                }
              ]
            }
          }
        }
      }
      //return 
      "aggregations" : {
      "salary_range" : {
      "buckets" : [
      {
        "key" : "*-10000.0",
        "to" : 10000.0,
        "doc_count" : 1
      },
      {
        "key" : "10000.0-20000.0",
        "from" : 10000.0,
        "to" : 20000.0,
        "doc_count" : 4
      },
      {
        "key" : ">20000",
        "from" : 20000.0,
        "doc_count" : 15
      }
      ]
      }
      }
      // Salary histogram: salaries from 0 to 100,000, bucketed in intervals of 10,000
      POST employees/_search
      {
        "size": 0,
        "aggs": {
          "salary_histogram": {
            "histogram": {
              "field": "salary",
              "interval": 10000,
              "extended_bounds": {
                "min": 0,
                "max": 100000
              }
            }
          }
        }
      }
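The bucketing rules behind the Range and Histogram requests can be sketched in plain Python. The sketch assumes the usual semantics: in a range, "from" is inclusive and "to" is exclusive, and a histogram key is the value rounded down to a multiple of the interval, with extended_bounds forcing empty buckets to appear. `range_buckets` and `histogram` are hypothetical helpers, not Elasticsearch APIs.

```python
# Salaries of the 20 sample employees, in _id order (copied from the bulk request).
salaries = [35000, 50000, 18000, 22000, 18000, 25000, 20000, 20000, 22000, 9000,
            38000, 32000, 30000, 25000, 28000, 16000, 16000, 20000, 30000, 20000]


def range_buckets(values, ranges):
    """ranges: list of (key, lo, hi); lo is inclusive, hi is exclusive, None = unbounded."""
    return {key: sum(1 for v in values
                     if (lo is None or v >= lo) and (hi is None or v < hi))
            for key, lo, hi in ranges}


def histogram(values, interval, bound_min, bound_max):
    """Fixed-width buckets keyed by their lower edge; bounds create empty buckets."""
    counts = {key: 0 for key in range(bound_min, bound_max + 1, interval)}
    for v in values:
        key = (v // interval) * interval  # round down to a multiple of the interval
        counts[key] = counts.get(key, 0) + 1
    return counts


by_range = range_buckets(salaries, [
    ("*-10000.0", None, 10000),
    ("10000.0-20000.0", 10000, 20000),
    (">20000", 20000, None),
])
by_histogram = histogram(salaries, 10000, 0, 100000)

print(by_range)
print(by_histogram)
```

The range counts reproduce the 1 / 4 / 15 split shown in the response above; note that a salary of exactly 20000 falls into the ">20000" bucket because "from" is inclusive.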
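  • Bucket aggregations allow further analysis through sub-aggregations, which can be
    • Bucket
    • Metric
  • Demo
    • Bucket by job type and compute salary statistics
    • Bucket by job type, then by gender, and compute salary statistics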
      
      # Nested aggregation 1: bucket by job type and compute salary statistics
      POST employees/_search
      {
        "size": 0,
        "aggs": {
          "Job_salary_stats": {
            "terms": {
              "field": "job.keyword"
            },
            "aggs": {
              "salary": {
                "stats": {
                  "field": "salary"
                }
              }
            }
          }
        }
      }
      # Multi-level nesting: bucket by job type, then by gender, and compute salary statistics
      POST employees/_search
      {
        "size": 0,
        "aggs": {
          "Job_gender_stats": {
            "terms": {
              "field": "job.keyword"
            },
            "aggs": {
              "gender_stats": {
                "terms": {
                  "field": "gender"
                },
                "aggs": {
                  "salary_stats": {
                    "stats": {
                      "field": "salary"
                    }
                  }
                }
              }
            }
          }
        }
      }
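The two-level request above (job, then gender, then salary stats) is equivalent in spirit to a nested group-by. The following plain-Python sketch computes the same shape of result on the sample data so the expected buckets can be checked by hand; `nested_stats` is a hypothetical helper and the output layout is a simplification of the real response.

```python
from collections import defaultdict

# (job, gender, salary) for the 20 sample employees, copied from the bulk request.
employees = [
    ("Product Manager", "female", 35000), ("Dev Manager", "male", 50000),
    ("Web Designer", "male", 18000), ("Web Designer", "female", 22000),
    ("QA", "female", 18000), ("QA", "female", 25000), ("QA", "male", 20000),
    ("Java Programmer", "male", 20000), ("Java Programmer", "male", 22000),
    ("Java Programmer", "male", 9000), ("Java Programmer", "female", 38000),
    ("Java Programmer", "male", 32000), ("Java Programmer", "female", 30000),
    ("Javascript Programmer", "male", 25000), ("Java Programmer", "male", 28000),
    ("Javascript Programmer", "male", 16000), ("Javascript Programmer", "male", 16000),
    ("Javascript Programmer", "female", 20000),
    ("DBA", "male", 30000), ("DBA", "female", 20000),
]


def nested_stats(docs):
    """Bucket by job, then by gender, then compute salary stats per inner bucket."""
    groups = defaultdict(lambda: defaultdict(list))
    for job, gender, salary in docs:
        groups[job][gender].append(salary)
    return {
        job: {
            gender: {"count": len(s), "min": min(s), "max": max(s),
                     "avg": sum(s) / len(s), "sum": sum(s)}
            for gender, s in by_gender.items()
        }
        for job, by_gender in groups.items()
    }


result = nested_stats(employees)
print(result["Java Programmer"])
```

Inner buckets only exist where documents exist: "Dev Manager" has a "male" bucket but no "female" bucket, just as the nested Terms aggregation would return.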

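# Summary
- Syntax of aggregations
    - One aggregation query can contain multiple aggregations, and each Bucket aggregation can contain multiple sub-aggregations
- Metric
    - Single-value output & multi-value output
- Bucket
    - Terms & numeric ranges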