[08] Using Elasticsearch as a Database: Compute First, Then Aggregate

Posted by TaoWen on 2016-02-20

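With https://github.com/taowen/es-monitor you can query Elasticsearch using SQL. Earlier posts showed that many kinds of transformations can be applied before aggregating, with the transformed key then used for bucketing. The ultimate form of such a transformation is a custom expression, although a custom expression is also the least efficient option.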

GROUP BY ipo_year % 5 AS ipo_year_rem

SQL

$ cat << EOF | ./es_query.py http://127.0.0.1:9200
SELECT ipo_year_rem, COUNT(*) FROM symbol GROUP BY ipo_year % 5 AS ipo_year_rem  
EOF

{"COUNT(*)": 715, "ipo_year_rem": 4.0}
{"COUNT(*)": 677, "ipo_year_rem": 0.0}
{"COUNT(*)": 537, "ipo_year_rem": 2.0}
{"COUNT(*)": 523, "ipo_year_rem": 3.0}
{"COUNT(*)": 446, "ipo_year_rem": 1.0}

Elasticsearch

{
  "aggs": {
    "ipo_year_rem": {
      "terms": {
        "field": "ipo_year", 
        "size": 0, 
        "script": {
          "lang": "expression", 
          "inline": "_value % 5"
        }
      }, 
      "aggs": {}
    }
  }, 
  "size": 0
}

{
  "hits": {
    "hits": [], 
    "total": 6714, 
    "max_score": 0.0
  }, 
  "_shards": {
    "successful": 1, 
    "failed": 0, 
    "total": 1
  }, 
  "took": 2, 
  "aggregations": {
    "ipo_year_rem": {
      "buckets": [
        {
          "key": 4.0, 
          "doc_count": 715
        }, 
        {
          "key": 0.0, 
          "doc_count": 677
        }, 
        {
          "key": 2.0, 
          "doc_count": 537
        }, 
        {
          "key": 3.0, 
          "doc_count": 523
        }, 
        {
          "key": 1.0, 
          "doc_count": 446
        }
      ], 
      "sum_other_doc_count": 0, 
      "doc_count_error_upper_bound": 0
    }
  }, 
  "timed_out": false
}
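
To make the bucketing concrete, here is a small Python sketch of what a terms aggregation with the `_value % 5` script computes. The function name `group_by_remainder` and the sample years are made up for illustration; they are not part of es-monitor or Elasticsearch.

```python
from collections import Counter


def group_by_remainder(ipo_years, divisor=5):
    """Mimic a terms aggregation whose script is `_value % divisor`."""
    # The script turns each field value into a bucket key (a double).
    counts = Counter(float(year) % divisor for year in ipo_years)
    # A terms aggregation orders buckets by doc_count, descending, by default.
    return [{"key": key, "doc_count": n} for key, n in counts.most_common()]


# Hypothetical sample data, not the real `symbol` index.
sample_years = [2014, 2014, 2010, 2015, 2009, 2004]
buckets = group_by_remainder(sample_years)
for bucket in buckets:
    print(bucket)
```

The keys come out as floats, which matches the `4.0`, `0.0`, ... keys in the response above.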

Profile

[
  {
    "query": [
      {
        "query_type": "MatchAllDocsQuery",
        "lucene": "*:*",
        "time": "1.046876000ms",
        "breakdown": {
          "score": 0,
          "create_weight": 16550,
          "next_doc": 835828,
          "match": 0,
          "build_scorer": 194498,
          "advance": 0
        }
      }
    ],
    "rewrite_time": 8693,
    "collector": [
      {
        "name": "MultiCollector",
        "reason": "search_multi",
        "time": "2.994827000ms",
        "children": [
          {
            "name": "TotalHitCountCollector",
            "reason": "search_count",
            "time": "0.3955230000ms"
          },
          {
            "name": "DoubleTermsAggregator: [ipo_year_rem]",
            "reason": "aggregation",
            "time": "0.9664500000ms"
          }
        ]
      }
    ]
  }
]

GROUP BY floor(market_cap / last_sale)

SQL

$ cat << EOF | ./es_query.py http://127.0.0.1:9200
SELECT shares_count, COUNT(*) FROM symbol GROUP BY floor(market_cap / last_sale / 1000000)  AS shares_count ORDER BY shares_count LIMIT 3
EOF

{"shares_count": "0.0", "COUNT(*)": 6007}
{"shares_count": "1.0", "COUNT(*)": 328}
{"shares_count": "10.0", "COUNT(*)": 6}

Elasticsearch

{
  "aggs": {
    "shares_count": {
      "terms": {
        "size": 3, 
        "order": {
          "_term": "asc"
        }, 
        "script": {
          "lang": "expression", 
          "inline": "floor(doc['market_cap'].value / doc['last_sale'].value / 1000000)"
        }
      }, 
      "aggs": {}
    }
  }, 
  "size": 0
}

When the expression references more than one field, _value is not available; each field has to be referenced in the form doc['market_cap'].value instead.
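
The difference between the two request shapes can be sketched as a small Python helper. `terms_script_agg` is a hypothetical name used only for illustration, not a function from es-monitor, and the `"inline"` script key follows the Elasticsearch version used in this post (other versions may expect a different key).

```python
import json


def terms_script_agg(name, expression, field=None, size=0):
    """Build a search body with one script-based terms aggregation.

    Pass `field` when the expression uses `_value` (single field).
    Leave it out when the expression reads fields via doc['...'].value.
    """
    terms = {
        "size": size,
        "script": {"lang": "expression", "inline": expression},
    }
    if field is not None:
        # `_value` in the script refers to the value of this field.
        terms["field"] = field
    return {"aggs": {name: {"terms": terms, "aggs": {}}}, "size": 0}


single = terms_script_agg("ipo_year_rem", "_value % 5", field="ipo_year")
multi = terms_script_agg(
    "shares_count",
    "floor(doc['market_cap'].value / doc['last_sale'].value / 1000000)",
    size=3,
)
print(json.dumps(multi, indent=2))
```

With a `field` set, the script only sees that one value, which is why an expression over two fields has to drop `field` and read each value through `doc`.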

{
  "hits": {
    "hits": [], 
    "total": 6714, 
    "max_score": 0.0
  }, 
  "_shards": {
    "successful": 1, 
    "failed": 0, 
    "total": 1
  }, 
  "took": 6, 
  "aggregations": {
    "shares_count": {
      "buckets": [
        {
          "key": "0.0", 
          "doc_count": 6007
        }, 
        {
          "key": "1.0", 
          "doc_count": 328
        }, 
        {
          "key": "10.0", 
          "doc_count": 6
        }
      ], 
      "sum_other_doc_count": 373, 
      "doc_count_error_upper_bound": 0
    }
  }, 
  "timed_out": false
}

Profile

[
  {
    "query": [
      {
        "query_type": "MatchAllDocsQuery",
        "lucene": "*:*",
        "time": "0.5422700000ms",
        "breakdown": {
          "score": 0,
          "create_weight": 14630,
          "next_doc": 475299,
          "match": 0,
          "build_scorer": 52341,
          "advance": 0
        }
      }
    ],
    "rewrite_time": 5085,
    "collector": [
      {
        "name": "MultiCollector",
        "reason": "search_multi",
        "time": "7.627612000ms",
        "children": [
          {
            "name": "TotalHitCountCollector",
            "reason": "search_count",
            "time": "0.4945110000ms"
          },
          {
            "name": "StringTermsAggregator: [shares_count]",
            "reason": "aggregation",
            "time": "5.193886000ms"
          }
        ]
      }
    ]
  }
]

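Note that a StringTermsAggregator is used here. This means the computed results are sorted as strings, which is why "10.0" comes right after "1.0" in the ascending order above.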
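
The effect of string ordering is easy to reproduce in Python. In this sketch the keys "0.0", "1.0" and "10.0" come from the response above, while "2.0" is a hypothetical extra bucket added only to make the difference visible.

```python
# Bucket keys as strings; "2.0" is hypothetical, the rest appear in the response.
keys = ["10.0", "2.0", "0.0", "1.0"]

# Lexicographic order, which is how string terms are compared.
as_strings = sorted(keys)

# Numeric order, which is what the SQL ORDER BY was probably meant to give.
as_numbers = sorted(keys, key=float)

print(as_strings)
print(as_numbers)
```

If numeric order matters, one option is to re-sort the returned buckets on the client side as above, keeping in mind that `size` has already cut off the bucket list on the server.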
