ClickHouse 之 SummingMergeTree/AggregatingMergeTree 實現準實時看板(qbit)

qbit發表於2022-06-20

前言

初始化

測試資料生成

  • 建立庫

    CREATE DATABASE qbit;
  • 建立表

    CREATE TABLE download (
      userid UInt32,
      when DateTime,
      size Float32
    ) ENGINE = MergeTree 
    PARTITION BY toYYYYMM(when)
    ORDER BY(userid, when);
  • 插入資料

    INSERT INTO qbit.download
    SELECT
      rand() % 5,
      now() + number * 60 as when,
      rand() % 100000000
    FROM system.numbers
    LIMIT 10000;
  • 檢視資料

    SELECT * FROM qbit.download LIMIT 10;

    image.png

  • 檢視每天的下載量(掃描 10000 條資料

    SELECT
      userid,
      toYYYYMMDD(when) AS day,
      count() AS cnt,
      sum(size) AS size
    FROM qbit.download
    GROUP BY
      userid,
      day
    ORDER BY cnt DESC
    LIMIT 10

    image.png

SummingMergeTree 準實時看板

方式一(view1)

  • 建立檢視 view1

    CREATE MATERIALIZED VIEW qbit.view1
    ENGINE = SummingMergeTree((cnt, size))
    ORDER BY (userid, day) POPULATE AS
    SELECT
      userid,
      toYYYYMMDD(when) AS day,
      toInt64(1) AS cnt,
      toFloat64(size) AS size
    FROM qbit.download

    image.png

  • 查詢檢視 view1(可以看到跟前面的查詢結果是一致的)

    SELECT *
    FROM qbit.view1
    ORDER BY cnt DESC
    LIMIT 10

    image.png

方式二(view2)

  • 建立檢視 view2

    CREATE MATERIALIZED VIEW qbit.view2
    ENGINE = SummingMergeTree
    PARTITION BY day
    ORDER BY (userid, day) POPULATE AS
    SELECT
      userid,
      toYYYYMMDD(when) AS day,
      count() AS cnt,
      sum(size) AS size
    FROM qbit.download
    GROUP BY
      userid,
      day

    image.png

  • 查詢檢視 view2(可以看到跟前面的查詢結果是一致的)

    SELECT *
    FROM qbit.view2
    ORDER BY cnt DESC
    LIMIT 10

    image.png

AggregatingMergeTree 準實時看板

  • 建立檢視(view3)

    CREATE MATERIALIZED VIEW qbit.view3
    ENGINE = AggregatingMergeTree
    PARTITION BY day
    ORDER BY (userid, day) POPULATE AS
    SELECT
      userid,
      toYYYYMMDD(when) AS day,
      countState() AS cnt,
      sumState(size) AS size
    FROM qbit.download
    GROUP BY
      userid,
      day

    image.png

  • 查詢檢視 view3(可以看到跟前面的查詢結果是一致的)

    SELECT
      userid,
      day,
      countMerge(cnt) AS cnt,
      sumMerge(size) AS size
    FROM qbit.view3
    GROUP BY
      userid,
      day
    ORDER BY cnt DESC
    LIMIT 10

    image.png

插入更新資料

  • 再插入 20000 條資料

    INSERT INTO qbit.download SELECT
      rand() % 5,
      now() + (number * 60) AS when,
      rand() % 100000000
    FROM system.numbers
    LIMIT 20000
  • 檢視底表資料量
    image.png
  • 直接查詢底表統計

    SELECT
      userid,
      toYYYYMMDD(when) AS day,
      count() AS cnt,
      sum(size) AS size
    FROM qbit.download
    GROUP BY
      userid,
      day
    ORDER BY cnt DESC
    LIMIT 10

    image.png

  • 查詢 view1 統計

    SELECT *
    FROM qbit.view1
    ORDER BY cnt DESC
    LIMIT 10

    查詢結果與直接查詢底表不一致,OPTIMIZE 後與底表查詢結果一致

    OPTIMIZE TABLE qbit.view1 FINAL

    如果不想強制合併彙總,也可以改變查詢語句

    SELECT
      userid,
      day,
      sum(cnt) AS cnt,
      sum(size) AS size
    FROM qbit.view1
    GROUP BY
      userid,
      day
    ORDER BY cnt DESC
    LIMIT 10

    可以直接 select count 驗證,即使沒有合併彙總,檢視的資料條數也會比底表的資料條數少很多
    image.png

  • 查詢 view2 統計

    SELECT *
    FROM qbit.view2
    ORDER BY cnt DESC
    LIMIT 10

    查詢結果與直接查詢底表不一致,OPTIMIZE 後與底表查詢結果一致

    OPTIMIZE TABLE qbit.view2 FINAL

    如果不想強制合併彙總,也可以改變查詢語句

    SELECT
      userid,
      day,
      sum(cnt) AS cnt,
      sum(size) AS size
    FROM qbit.view2
    GROUP BY
      userid,
      day
    ORDER BY cnt DESC
    LIMIT 10
  • 查詢 view3 統計。AggregatingMergeTree 本身用了 group by,所以不用 OPTIMIZE 可以得到最終結果。

    SELECT
      userid,
      day,
      countMerge(cnt) AS cnt,
      sumMerge(size) AS size
    FROM qbit.view3
    GROUP BY
      userid,
      day
    ORDER BY cnt DESC
    LIMIT 10
    

    image.png

參考文獻

本文出自 qbit snap

相關文章