Elasticsearch: Analytics

Posted by 孤芳不自賞 on 2017-08-31

Elasticsearch has a feature called aggregations, which lets you generate sophisticated analytics over your data. It is similar to GROUP BY in SQL, but much more powerful.

For example, let's find the most popular interests among all employees:

        GET /megacorp/employee/_search
        {
            "aggs" : {
                "all_interests" : {
                    "terms" : { "field" : "interests" }
                }
            }
        }

Ignore the syntax for now and just look at the results:

        {
            ...
            "hits" : {...},
            "aggregations" : {
                "all_interests" : {
                    "buckets" : [
                        {
                            "key" : "music",
                            "doc_count" : 2
                        },
                        {
                            "key" : "sports",
                            "doc_count" : 1
                        }
                    ]
                }
            }
        }

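We can see that two employees are interested in music and one in sports. These figures are not precomputed; they are calculated on the fly from the documents that match the query.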
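To make the bucket counts above concrete, here is a minimal pure-Python sketch of what a terms aggregation computes. It does not call Elasticsearch. The `employees` list and the `terms_buckets` helper are hypothetical, chosen only so that the output lines up with the result shown above, and breaking ties by key is a choice made for this sketch.

```python
from collections import Counter

# Hypothetical sample documents, chosen only to match the buckets shown above.
employees = [
    {"last_name": "Smith", "age": 25, "interests": ["sports", "music"]},
    {"last_name": "Smith", "age": 32, "interests": ["music"]},
]


def terms_buckets(docs, field):
    """Count how many documents contain each distinct value of `field`."""
    counts = Counter()
    for doc in docs:
        # A document counts once per distinct value, even if a value repeats.
        for value in set(doc.get(field, [])):
            counts[value] += 1
    # Largest buckets first; ties are broken by key (a choice for this sketch).
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [{"key": key, "doc_count": n} for key, n in ordered]


buckets = terms_buckets(employees, "interests")
print(buckets)
```

Each distinct interest becomes one bucket, and `doc_count` is the number of documents that carry that interest, so an employee with two interests contributes to two buckets.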


To find the most popular interests among people whose last name is "Smith", add a query to the same request:

        GET /megacorp/employee/_search
        {
            "query" : {
                "match" : {
                    "last_name" : "smith"
                }
            },
            "aggs" : {
                "all_interests" : {
                    "terms" : {
                        "field" : "interests"
                    }
                }
            }
        }
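The point of the request above is that the aggregation only sees documents matched by the query. A rough pure-Python sketch of that scoping follows. The sample data, including the third employee, and both helper functions are hypothetical, and the case-insensitive comparison is only a loose stand-in for how a match query treats an analyzed field.

```python
from collections import Counter

# Hypothetical sample documents; the third employee exists only to show the filter.
employees = [
    {"last_name": "Smith", "interests": ["sports", "music"]},
    {"last_name": "Smith", "interests": ["music"]},
    {"last_name": "Fir", "interests": ["forestry"]},
]


def match_last_name(docs, name):
    """Keep documents whose last_name equals `name`, ignoring case."""
    return [d for d in docs if d["last_name"].lower() == name.lower()]


def count_interests(docs):
    """Count documents per distinct interest."""
    return Counter(i for d in docs for i in set(d["interests"]))


all_counts = count_interests(employees)
smith_counts = count_interests(match_last_name(employees, "smith"))

print(dict(all_counts))
print(dict(smith_counts))
```

With this sample data, `forestry` appears in the unfiltered counts but not in the counts for the Smiths, because the filter runs before the counting.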


Aggregations also allow hierarchical rollups. For example, let's find the average age of the employees who share each interest:

        GET /megacorp/employee/_search
        {
            "aggs" : {
                "all_interests" : {
                    "terms" : { "field" : "interests" },
                    "aggs" : {
                        "avg_age" : {
                            "avg" : { "field" : "age" }
                        }
                    }
                }
            }
        }

The aggregation results returned this time are a little more complex, but still easy to understand:

        ...
        "all_interests": {
            "buckets": [
                {
                    "key": "music",
                    "doc_count": 2,
                    "avg_age": {
                        "value": 28.5
                    }
                },
                {
                    "key": "sports",
                    "doc_count": 1,
                    "avg_age": {
                        "value": 25
                    }
                }
            ]
        }

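This output is a richer version of the earlier aggregation result. We still get the list of interests and their counts (the number of employees who share each interest), but each interest now has an additional avg_age field showing the average age of the employees who share it.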
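The nested result can be reproduced by hand to check the arithmetic. The sketch below is plain Python, not an Elasticsearch client call. The two sample employees (ages 25 and 32) are assumed values picked because they are consistent with the averages shown above, and `interests_with_avg_age` is a hypothetical helper.

```python
from collections import defaultdict

# Hypothetical sample documents consistent with the averages shown above.
employees = [
    {"age": 25, "interests": ["sports", "music"]},
    {"age": 32, "interests": ["music"]},
]


def interests_with_avg_age(docs):
    """Group documents by interest, then average `age` inside each group."""
    ages_by_interest = defaultdict(list)
    for doc in docs:
        for interest in set(doc["interests"]):
            ages_by_interest[interest].append(doc["age"])
    buckets = [
        {
            "key": interest,
            "doc_count": len(ages),
            "avg_age": {"value": sum(ages) / len(ages)},
        }
        for interest, ages in ages_by_interest.items()
    ]
    # Largest buckets first; ties are broken by key (a choice for this sketch).
    buckets.sort(key=lambda b: (-b["doc_count"], b["key"]))
    return buckets


result = interests_with_avg_age(employees)
for bucket in result:
    print(bucket)
```

For music the average is (25 + 32) / 2 = 28.5, and for sports it is simply 25, the age of the single employee in that bucket.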
