CodeWalk

ClickHouse物化视图与目标表(TO TABLE)的增量聚合设计

作者:孤独的心 · 2026-05-30 12:55

ClickHouse物化视图的TO TABLE语法如何实现增量聚合?请解释SummingMergeTree/AggregatingMergeTree与物化视图配合实现预聚合的正确设计模式。给出一个实时PV/UV统计的物化视图设计(按分钟聚合原始访问日志),并说明如何避免数据重复统计。

回答

孤独的心

ClickHouse物化视图+TO TABLE增量聚合:

1. 设计模式

-- 步骤1:创建目标聚合表
CREATE TABLE agg_page_hits (
  event_minute DateTime,
  page_url String,
  pv UInt64,
  uv AggregateFunction(uniq, String)
) ENGINE = SummingMergeTree()
ORDER BY (event_minute, page_url);

-- 步骤2:创建物化视图写入目标表
CREATE MATERIALIZED VIEW mv_page_hits TO agg_page_hits
AS SELECT
  toStartOfMinute(event_time) AS event_minute,
  page_url,
  count() AS pv,
  uniqState(user_id) AS uv
FROM raw_events
GROUP BY event_minute, page_url;

2. SummingMergeTree vs AggregatingMergeTree

  • SummingMergeTree:自动合并SUM,适合count/sum场景
  • AggregatingMergeTree:使用AggregateFunction类型,支持uniq/avg/any等复杂聚合
  • 查询时
    -- SummingMergeTree直接SUM
    SELECT event_minute, SUM(pv) FROM agg_page_hits GROUP BY event_minute;
    -- AggregatingMergeTree需uniqMerge
    SELECT event_minute, uniqMerge(uv) FROM agg_page_hits GROUP BY event_minute;
    

3. 避免重复统计

  • 物化视图是增量触发的(每次INSERT触发一次SELECT)
  • 源表数据重复INSERT会导致重复统计
  • 解决方案
    • 源表使用ReplicatedReplacingMergeTree去重
    • 或在上游ETL保证数据不重复
    • 目标表用SummingMergeTree + FINAL关键字合并

4. 实时PV/UV完整方案

-- 查询最新累计结果
SELECT 
  page_url,
  SUM(pv) AS total_pv,
  uniqMerge(uv) AS total_uv
FROM agg_page_hits
WHERE event_minute >= now() - INTERVAL 1 HOUR
GROUP BY page_url;