Druid, a time-series database that also uses inverted-index optimization techniques
Cattell [6] maintains a great summary of existing scalable SQL and NoSQL data stores. Hu [18] contributed another great summary for streaming databases. Feature-wise, Druid sits somewhere between Google’s Dremel [28] and PowerDrill [17]. Druid has most of the features implemented in Dremel (Dremel handles arbitrary nested data structures, while Druid only allows for a single level of array-based nesting) and many of the interesting compression algorithms mentioned in PowerDrill.
Although Druid builds on many of the same principles as other distributed columnar data stores [15], many of these data stores are designed to be more generic key-value stores [23] and do not support computation directly in the storage layer. There are also other data stores designed for some of the same data warehousing issues that Druid is meant to solve. These systems include in-memory databases such as SAP’s HANA [14] and VoltDB [43]. These data stores lack Druid’s low latency ingestion characteristics. Druid also has native analytical features baked in, similar to ParAccel [34]; however, Druid allows system-wide rolling software updates with no downtime.
Druid is similar to C-Store [38] and LazyBase [8] in that it has two subsystems, a read-optimized subsystem in the historical nodes and a write-optimized subsystem in real-time nodes. Real-time nodes are designed to ingest a high volume of append-heavy data, and do not support data updates. Unlike the two aforementioned systems, Druid is meant for OLAP transactions and not OLTP transactions.
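The split between a write-optimized and a read-optimized subsystem can be pictured with a small sketch. This is only an illustration of the general idea, not Druid's implementation: the class names `RealtimeBuffer` and `ImmutableSegment`, the `hand_off` method, and the two-column row layout are all assumptions made for the example.

```python
from bisect import bisect_left


class ImmutableSegment:
    """Read-optimized side (hypothetical): sorted by time, stored per column."""

    def __init__(self, rows):
        ordered = sorted(rows)
        # Tuples cannot be modified after creation, mirroring "no updates".
        self.timestamps = tuple(t for t, _ in ordered)
        self.values = tuple(v for _, v in ordered)

    def sum_between(self, start, end):
        # Binary search narrows the scan to rows with start <= timestamp < end.
        lo = bisect_left(self.timestamps, start)
        hi = bisect_left(self.timestamps, end)
        return sum(self.values[lo:hi])


class RealtimeBuffer:
    """Write-optimized side (hypothetical): append-only, no updates or deletes."""

    def __init__(self):
        self._rows = []  # rows kept in arrival order, so appends stay cheap

    def append(self, timestamp, value):
        self._rows.append((timestamp, value))

    def sum_between(self, start, end):
        # Linear scan: the buffer is unsorted because it favours cheap writes.
        return sum(v for t, v in self._rows if start <= t < end)

    def hand_off(self):
        """Freeze the buffered rows into an immutable segment and clear the buffer."""
        segment = ImmutableSegment(self._rows)
        self._rows = []
        return segment


def query_sum(segments, buffer, start, end):
    """Answer a query by combining frozen segments with the live buffer."""
    historical = sum(s.sum_between(start, end) for s in segments)
    return historical + buffer.sum_between(start, end)
```

In this sketch, writes only ever touch the buffer, while queries read both sides, so recently ingested rows are visible before they are frozen into a segment.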
Druid’s low latency data ingestion features share some similarities with Trident/Storm [27] and Spark Streaming [45]; however, both systems are focused on stream processing whereas Druid is focused on ingestion and aggregation. Stream processors are great complements to Druid as a means of pre-processing the data before the data enters Druid.
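The division of labour described here, a stream processor that cleans events followed by a store that ingests and aggregates them, might look roughly like the sketch below. Everything in it is hypothetical: the event fields `ts`, `page`, and `clicks`, the 60-second bucket, and both function names are assumptions chosen for illustration, not anything specified by the paper.

```python
from collections import defaultdict


def preprocess(events):
    """Stream-processing stage (hypothetical): drop malformed events, normalise fields."""
    for event in events:
        if "ts" not in event or "page" not in event:
            continue  # skip records missing the required fields
        yield {
            "ts": int(event["ts"]),
            "page": event["page"].lower(),
            "clicks": int(event.get("clicks", 0)),
        }


def ingest_and_aggregate(events, bucket_seconds=60):
    """Ingestion stage (hypothetical): sum clicks per (time bucket, page)."""
    totals = defaultdict(int)
    for event in events:
        # Truncate the timestamp to the start of its bucket.
        bucket = event["ts"] - event["ts"] % bucket_seconds
        totals[(bucket, event["page"])] += event["clicks"]
    return dict(totals)
```

Chaining the two, as in `ingest_and_aggregate(preprocess(raw_events))`, keeps the cleaning logic out of the store, which only sees well-formed events.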
There is a class of systems that specialize in queries on top of cluster computing frameworks. Shark [13] is such a system for queries on top of Spark, and Cloudera’s Impala [9] is another system focused on optimizing query performance on top of HDFS. Druid historical nodes download data locally and only work with native Druid indexes. We believe this setup allows for faster query latencies.
Druid leverages a unique combination of algorithms in its architecture. Although we believe no other data store has the same set of functionality as Druid, some of Druid’s optimization techniques, such as using inverted indices to perform fast filters, are also used in other data stores [26].
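To make the inverted-index filtering idea concrete, here is a minimal sketch. It assumes a column is a plain Python list and uses Python sets of row ids in place of whatever compressed bitmap representation a real store would use. The function names `build_inverted_index` and `filter_rows` are invented for this example.

```python
from collections import defaultdict


def build_inverted_index(column):
    """Map each distinct value in a column to the set of row ids that contain it."""
    index = defaultdict(set)
    for row_id, value in enumerate(column):
        index[value].add(row_id)
    return index


def filter_rows(indexes, conditions):
    """Return sorted row ids matching every (column, value) condition (logical AND).

    With no conditions at all, this sketch returns an empty list.
    """
    matching = None
    for column_name, value in conditions.items():
        rows = indexes[column_name].get(value, set())
        # Intersect row-id sets instead of scanning the rows themselves.
        matching = set(rows) if matching is None else matching & rows
    return sorted(matching or [])
```

For a table with columns `city = ["SF", "NY", "SF", "LA"]` and `gender = ["m", "f", "f", "m"]`, filtering on `city == "SF"` and `gender == "f"` intersects the sets `{0, 2}` and `{1, 2}`, so only row 2 has to be read.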
Druid white paper: http://static.druid.io/docs/druid.pdf
This article is reposted from the cnblogs blog of 張昺華-sky. Original link: http://www.cnblogs.com/bonelee/p/6433333.html (to repost, please contact the original author).