【01】Using Elasticsearch as a Database: Defining the Table Schema

Posted by TaoWen on 2016-02-14

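Elasticsearch offers excellent query performance and a very powerful query syntax, and in some scenarios it can replace an RDBMS for OLAP workloads. Its official query language, however, is not SQL but a DSL of Elasticsearch's own design, which has two main parts: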

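Query DSL (https://www.elastic.co/guide/…): the counterpart of the WHERE clause in SQL, providing many different ways to filter documents
Aggregation DSL (https://www.elastic.co/guide/…): the counterpart of GROUP BY in SQL, grouping documents by conditions and computing metrics for each group, such as sums and averages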
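To make the analogy concrete, here is a minimal sketch of a single request body that uses both DSLs, written as a Python dict. It assumes the `symbol` index defined later in this article and ES 2.x-era syntax; the exchange value `"NASDAQ"` is a hypothetical example, since the exact stored value depends on the import script.

```python
import json

# Roughly equivalent SQL (illustrative only):
#   SELECT sector, AVG(market_cap) FROM symbol
#   WHERE exchange = 'NASDAQ' GROUP BY sector
request_body = {
    # Query DSL, the WHERE part: a "term" query matches the exact,
    # not_analyzed value stored in the field.
    "query": {"term": {"exchange": "NASDAQ"}},
    # Aggregation DSL, the GROUP BY part.
    "aggs": {
        "per_sector": {                      # bucket name, chosen freely
            "terms": {"field": "sector"},    # GROUP BY sector
            "aggs": {
                # AVG(market_cap) computed inside each sector bucket
                "avg_market_cap": {"avg": {"field": "market_cap"}}
            },
        }
    },
    "size": 0,  # only the aggregation results are wanted, not the matching documents
}

print(json.dumps(request_body, indent=2))
```

One difference from SQL worth keeping in mind: a `terms` aggregation returns only the top 10 buckets by default, whereas GROUP BY returns every group.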

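Frankly, these two DSLs are hard to learn and understand, and even once mastered they are tedious to write, but they are extremely powerful. This series of articles has two goals: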

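Learn the syntax and semantics of the Elasticsearch aggregation DSL through experiments, by analogy with SQL concepts
Implement a translator in Python that lets you use SQL to do what the Elasticsearch aggregation DSL does. This small script can serve as a handy tool in day-to-day work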
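As a rough illustration of the second goal (this is not the author's translator, which the series develops separately), here is a hypothetical toy that recognizes exactly one SQL shape with a regular expression: `SELECT <col>, <FUNC>(<col>) FROM <index> [WHERE <col> = '<value>'] GROUP BY <col>`. The function name `sql_to_dsl` and the naming of the generated aggregations are assumptions made for illustration.

```python
import re

# SQL aggregate function -> Elasticsearch metric aggregation type.
METRICS = {"SUM": "sum", "AVG": "avg", "MIN": "min", "MAX": "max", "COUNT": "value_count"}

# The single SQL shape this toy understands; keywords are case-insensitive.
PATTERN = re.compile(
    r"SELECT\s+(\w+)\s*,\s*(\w+)\((\w+)\)\s+FROM\s+(\w+)"
    r"(?:\s+WHERE\s+(\w+)\s*=\s*'([^']*)')?"
    r"\s+GROUP\s+BY\s+(\w+)\s*$",
    re.IGNORECASE,
)

def sql_to_dsl(sql):
    """Translate one simple GROUP BY query into (index, request body)."""
    m = PATTERN.match(sql.strip())
    if m is None:
        raise ValueError("unsupported SQL: " + sql)
    group_select, func, metric_field, index, where_field, where_value, group_by = m.groups()
    # The selected column must be the grouping column, and the function must be known.
    if group_select != group_by or func.upper() not in METRICS:
        raise ValueError("unsupported SQL: " + sql)
    metric_name = func.lower() + "_" + metric_field
    body = {
        "size": 0,
        "aggs": {
            group_by: {
                "terms": {"field": group_by},  # GROUP BY <column>
                "aggs": {metric_name: {METRICS[func.upper()]: {"field": metric_field}}},
            }
        },
    }
    if where_field is not None:  # WHERE <column> = '<value>'
        body["query"] = {"term": {where_field: where_value}}
    return index, body

index, body = sql_to_dsl(
    "SELECT sector, AVG(market_cap) FROM symbol WHERE exchange = 'NASDAQ' GROUP BY sector"
)
```

A real translator needs a proper SQL parser, multiple GROUP BY columns, and richer WHERE conditions; the regex approach only shows how directly the simplest SQL clauses map onto `query`, `terms`, and metric aggregations.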

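Basic Elasticsearch knowledge (such as what a document or an index is) is not covered here; the focus is on the query and aggregation syntax. In this chapter we first prepare the sample data: a list of all US-listed stocks (http://www.nasdaq.com/screeni…). This data set was chosen because it has a rich set of dimensions (IPO year, sector, exchange, and so on) as well as numeric fields to aggregate on (last sale price, market capitalization). The data can be downloaded in CSV format (https://github.com/taowen/es-…), and there is an import script (https://github.com/taowen/es-…).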
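The import script itself is linked above and not reproduced here. As a hedged sketch of what such a script typically does, the code below turns CSV rows into the newline-delimited body accepted by the Elasticsearch `_bulk` API. It assumes the CSV header already uses the mapping's field names (the real NASDAQ export may use different column headers, so a renaming step could be needed), that missing values appear as `n/a`, and that the ticker is a convenient document ID; all three are assumptions, and `csv_to_bulk` is a hypothetical name.

```python
import csv
import io
import json

# Numeric columns are parsed from CSV text; every other column stays a string.
NUMERIC_FIELDS = {"market_cap": float, "last_sale": float, "ipo_year": int}

def csv_to_bulk(csv_text, index="symbol", doc_type="symbol"):
    """Build an NDJSON _bulk body: one action line plus one source line per row."""
    lines = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        doc = {}
        for field, value in row.items():
            if value in ("", "n/a"):  # assumed marker for missing data: omit the field
                continue
            convert = NUMERIC_FIELDS.get(field)
            doc[field] = convert(value) if convert else value
        action = {"_index": index, "_type": doc_type}  # ES 2.x still requires a type
        if "symbol" in doc:
            action["_id"] = doc["symbol"]  # hypothetical choice: the ticker as document ID
        lines.append(json.dumps({"index": action}))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"  # a bulk body must end with a newline
```

The resulting string would then be sent with a POST to `/_bulk` on the node.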

Below is the mapping used for importing into Elasticsearch (the equivalent of a table definition in a relational database):

{
    "symbol": {
        "properties": {
            "sector": {
                "index": "not_analyzed", 
                "type": "string"
            }, 
            "market_cap": {
                "index": "not_analyzed", 
                "type": "long"
            }, 
            "name": {
                "index": "analyzed", 
                "type": "string"
            }, 
            "ipo_year": {
                "index": "not_analyzed", 
                "type": "integer"
            }, 
            "exchange": {
                "index": "not_analyzed", 
                "type": "string"
            }, 
            "symbol": {
                "index": "not_analyzed", 
                "type": "string"
            }, 
            "last_sale": {
                "index": "not_analyzed", 
                "type": "long"
            }, 
            "industry": {
                "index": "not_analyzed", 
                "type": "string"
            }
        }, 
        "_source": {
            "enabled": true
        }, 
        "_all": {
            "enabled": false
        }
    }
}
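A mapping like this is normally supplied when the index is created. The following hedged sketch, using only the Python standard library, builds the `PUT /symbol` request that would do so. It assumes a local node at `http://127.0.0.1:9200` (the address used later in this article) running an ES 2.x-era version that accepts this mapping syntax; only two fields are spelled out to keep it short, and `build_create_index_request` is a hypothetical helper name.

```python
import json
import urllib.request

# Abbreviated copy of the mapping above; the remaining fields follow the same pattern.
SYMBOL_MAPPING = {
    "symbol": {
        "properties": {
            "sector": {"index": "not_analyzed", "type": "string"},
            "name": {"index": "analyzed", "type": "string"},
        },
        "_source": {"enabled": True},
        "_all": {"enabled": False},
    }
}

def build_create_index_request(base_url, index, mappings):
    """Return (but do not send) a PUT request that creates `index` with `mappings`."""
    body = json.dumps({"mappings": mappings}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/" + index,
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )

req = build_create_index_request("http://127.0.0.1:9200", "symbol", SYMBOL_MAPPING)
# urllib.request.urlopen(req) would send it to a running node.
```

Building the request separately from sending it makes the payload easy to inspect before touching a live cluster.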

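When using Elasticsearch as a database, we apply the following defaults:

Set fields to not_analyzed (in the mapping above, only name is left analyzed)
Enable _source, so individual fields do not have to be stored separately; in most cases this is more efficient
Disable _all, because every lookup is a k=v query against a known field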
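These defaults are mechanical enough to generate. The helper below is a hypothetical sketch (the name `make_mapping` and its parameters are illustrative, not taken from the original scripts): it emits ES 2.x-style field definitions, marks every field not_analyzed except those explicitly listed as full-text, and applies the `_source` and `_all` settings listed above.

```python
def make_mapping(doc_type, field_types, analyzed=()):
    """Build a mapping dict that follows the 'Elasticsearch as a database' defaults."""
    properties = {}
    for field, es_type in field_types.items():
        # Full-text fields (such as a company name) stay analyzed; all others are exact-match.
        index_mode = "analyzed" if field in analyzed else "not_analyzed"
        properties[field] = {"type": es_type, "index": index_mode}
    return {
        doc_type: {
            "properties": properties,
            "_source": {"enabled": True},  # keep the original document for retrieval
            "_all": {"enabled": False},    # queries always name their fields
        }
    }

mapping = make_mapping(
    "symbol",
    {"symbol": "string", "name": "string", "market_cap": "long", "ipo_year": "integer"},
    analyzed={"name"},
)
```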

After running python import-symbol.py to import the data, run:

GET http://127.0.0.1:9200/symbol/_count

{"count":6714,"_shards":{"total":3,"successful":3,"failed":0}}
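The same check can be scripted. This sketch uses only the standard library; `count_documents` and `parse_count_response` are hypothetical helper names, and the parsing step is kept separate so it can be tried on the response shown above without a running node.

```python
import json
import urllib.request

def parse_count_response(text):
    """Extract the document count, failing loudly if any shard failed."""
    payload = json.loads(text)
    shards = payload["_shards"]
    if shards["failed"]:
        raise RuntimeError("%d of %d shards failed" % (shards["failed"], shards["total"]))
    return payload["count"]

def count_documents(base_url, index):
    """Issue GET /<index>/_count against a running node and return the count."""
    with urllib.request.urlopen(base_url.rstrip("/") + "/" + index + "/_count") as resp:
        return parse_count_response(resp.read().decode("utf-8"))

sample = '{"count":6714,"_shards":{"total":3,"successful":3,"failed":0}}'
print(parse_count_response(sample))  # → 6714
```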

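This shows that the documents have been imported into the index. Besides the stock list, we can also import historical stock prices into the database. This data set is fairly large, so it is hosted on cloud drives for download (https://yunpan.cn/cxRN6gLX7f9md access code 571c) (http://pan.baidu.com/s/1nufbLMx access code bes2). Run python import-quote.py to import it. The quote mapping is: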

{
    "quote": {
        "_all": {
            "enabled": false
        },
        "_source": {
            "enabled": true
        },
        "properties": {
            "date": {
                "format": "strict_date_optional_time||epoch_millis",
                "type": "date"
            },
            "volume": {
                "type": "long"
            },
            "symbol": {
                "index": "not_analyzed",
                "type": "string"
            },
            "high": {
                "type": "long"
            },
            "low": {
                "type": "long"
            },
            "adj_close": {
                "type": "long"
            },
            "close": {
                "type": "long"
            },
            "open": {
                "type": "long"
            }
        }
    }
}
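With the quote index in place, the `date` field enables time-based grouping. The following sketch builds a request body roughly equivalent to `SELECT month, AVG(close) FROM quote WHERE symbol = 'AAPL' GROUP BY month`. The ticker `AAPL` is only an example, `monthly_avg_close` is a hypothetical helper name, and the syntax assumes an ES 2.x-era `date_histogram` that takes an `interval` parameter (newer releases replace it with `calendar_interval`).

```python
def monthly_avg_close(symbol):
    """Request body: average closing price per calendar month for one ticker."""
    return {
        "size": 0,                                   # aggregation results only
        "query": {"term": {"symbol": symbol}},       # WHERE symbol = ?
        "aggs": {
            "per_month": {
                "date_histogram": {"field": "date", "interval": "month"},  # GROUP BY month
                "aggs": {"avg_close": {"avg": {"field": "close"}}},        # AVG(close)
            }
        },
    }

body = monthly_avg_close("AAPL")
```

Because the mapping declares the date format as `strict_date_optional_time||epoch_millis`, documents may carry dates either as ISO strings such as `2016-02-12` or as epoch milliseconds.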

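In terms of the mapping, this is very similar to a table definition: apart from the _source, _all and analyzed concepts, there is essentially no difference. The biggest differences in using Elasticsearch as a database lie in the relationship between index and mapping, and in features such as index wildcards.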
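To illustrate what index wildcards add, suppose (hypothetically) that quotes were split into one index per year, such as `quote-2015` and `quote-2016`. A request like `GET http://127.0.0.1:9200/quote-*/_search` would then search all of them at once, much like a UNION ALL over identically structured tables. The sketch below uses Python's `fnmatch` only to approximate which index names a simple `*` pattern selects; Elasticsearch resolves the pattern itself on the server side, and the index names are invented for illustration.

```python
from fnmatch import fnmatch

# Hypothetical index names in a cluster.
indices = ["quote-2014", "quote-2015", "quote-2016", "symbol"]

def resolve(pattern, names):
    """Approximate Elasticsearch's '*' index wildcard with shell-style matching."""
    return [name for name in names if fnmatch(name, pattern)]

print(resolve("quote-*", indices))  # → ['quote-2014', 'quote-2015', 'quote-2016']
```

A relational database would need an explicit view or UNION to get the same effect; with Elasticsearch the "table" a query runs against can be chosen by pattern at query time.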
