前言
- 本文測試環境
ClickHouse
版本為22.3 LTS
SummingMergeTree
官方文件:https://clickhouse.com/docs/e...AggregatingMergeTree
官方文件:https://clickhouse.com/docs/e...- 原理就是預聚合,可以把統計時掃表條數減少幾個數量級。有點
flink
流式計算的味道。
初始化
測試資料生成
建立庫
CREATE DATABASE qbit;
建立表
CREATE TABLE download ( userid UInt32, when DateTime, size Float32 ) ENGINE = MergeTree PARTITION BY toYYYYMM(when) ORDER BY(userid, when);
插入資料
INSERT INTO qbit.download SELECT rand() % 5, now() + number * 60 as when, rand() % 100000000 FROM system.numbers LIMIT 10000;
檢視資料
SELECT * FROM qbit.download LIMIT 10;
檢視每天的下載量(掃描 10000 條資料)
SELECT userid, toYYYYMMDD(when) AS day, count() AS cnt, sum(size) AS size FROM qbit.download GROUP BY userid, day ORDER BY cnt DESC LIMIT 10
SummingMergeTree 準實時看板
方式一(view1)
建立檢視
view1
CREATE MATERIALIZED VIEW qbit.view1 ENGINE = SummingMergeTree((cnt, size)) ORDER BY (userid, day) POPULATE AS SELECT userid, toYYYYMMDD(when) AS day, toInt64(1) AS cnt, toFloat64(size) AS size FROM qbit.download
查詢檢視
view1
(可以看到跟前面的查詢結果是一致的)SELECT * FROM qbit.view1 ORDER BY cnt DESC LIMIT 10
方式二(view2)
建立檢視
view2
CREATE MATERIALIZED VIEW qbit.view2 ENGINE = SummingMergeTree PARTITION BY day ORDER BY (userid, day) POPULATE AS SELECT userid, toYYYYMMDD(when) AS day, count() AS cnt, sum(size) AS size FROM qbit.download GROUP BY userid, day
查詢檢視
view2
(可以看到跟前面的查詢結果是一致的)SELECT * FROM qbit.view2 ORDER BY cnt DESC LIMIT 10
AggregatingMergeTree 準實時看板
建立檢視(view3)
CREATE MATERIALIZED VIEW qbit.view3 ENGINE = AggregatingMergeTree PARTITION BY day ORDER BY (userid, day) POPULATE AS SELECT userid, toYYYYMMDD(when) AS day, countState() AS cnt, sumState(size) AS size FROM qbit.download GROUP BY userid, day
查詢檢視
view3
(可以看到跟前面的查詢結果是一致的)SELECT userid, day, countMerge(cnt) AS cnt, sumMerge(size) AS size FROM qbit.view3 GROUP BY userid, day ORDER BY cnt DESC LIMIT 10
插入更新資料
再插入 20000 條資料
INSERT INTO qbit.download SELECT rand() % 5, now() + (number * 60) AS when, rand() % 100000000 FROM system.numbers LIMIT 20000
- 檢視底表資料量
直接查詢底表統計
SELECT userid, toYYYYMMDD(when) AS day, count() AS cnt, sum(size) AS size FROM qbit.download GROUP BY userid, day ORDER BY cnt DESC LIMIT 10
查詢
view1
統計SELECT * FROM qbit.view1 ORDER BY cnt DESC LIMIT 10
查詢結果與直接查詢底表不一致,
OPTIMIZE
後與底表查詢結果一致OPTIMIZE TABLE qbit.view1 FINAL
如果不想強制合併彙總,也可以改變查詢語句
SELECT userid, day, sum(cnt) AS cnt, sum(size) AS size FROM qbit.view1 GROUP BY userid, day ORDER BY cnt DESC LIMIT 10
可以直接
select count
驗證,即使沒有合併彙總,檢視的資料條數也會比底表的資料條數少很多查詢
view2
統計SELECT * FROM qbit.view2 ORDER BY cnt DESC LIMIT 10
查詢結果與直接查詢底表不一致,
OPTIMIZE
後與底表查詢結果一致OPTIMIZE TABLE qbit.view2 FINAL
如果不想強制合併彙總,也可以改變查詢語句
SELECT userid, day, sum(cnt) AS cnt, sum(size) AS size FROM qbit.view2 GROUP BY userid, day ORDER BY cnt DESC LIMIT 10
查詢
view3
統計。AggregatingMergeTree
本身用了group by
,所以不用OPTIMIZE
可以得到最終結果。SELECT userid, day, countMerge(cnt) AS cnt, sumMerge(size) AS size FROM qbit.view3 GROUP BY userid, day ORDER BY cnt DESC LIMIT 10
參考文獻
- ClickHouse Materialized Views Illuminated, Part 1
- ClickHouse 通過 SummingMergeTree 優化查詢
- AggregatingMergeTree 等價替換 SummingMergeTree
本文出自 qbit snap