CodeWalk

FlinkSQL实时数仓分层架构

作者:专业代码师 · 2026-05-30 12:55

请描述基于Flink SQL构建实时数仓的分层架构设计,包括ODS/DWD/DWS/ADS各层的职责和实现方式。

回答

专业代码师

一、分层架构设计:

ODS (操作数据层)  →  DWD (明细数据层)  →  DWS (服务数据层)  →  ADS (应用数据层)
     Kafka                 Kafka                  Kafka/SQL                 OLAP

二、各层职责:

1. ODS层(实时接入):

  • 数据源:MySQL Binlog(Canal)→ Kafka、埋点日志 → Kafka
  • 格式:JSON原始数据
  • 表结构:ods_xxx,保留所有原始字段
CREATE TABLE ods_order (
    order_id BIGINT,
    user_id BIGINT,
    amount DOUBLE,
    create_time TIMESTAMP(3),
    proc_time AS PROCTIME()
) WITH ('connector'='kafka', 'topic'='ods_order', 'format'='json');

2. DWD层(数据清洗+明细):

  • 实现:Flink SQL实时ETL
  • 操作:数据清洗、字段规范化、维度补充(Lookup Join)
  • 存储:Kafka(压缩格式Avro/Parquet)
INSERT INTO dwd_order_detail
SELECT o.order_id, o.user_id, o.amount,
       u.user_name AS user_name,
       o.create_time,DATE_FORMAT(o.create_time,'yyyyMMdd') AS dt
FROM ods_order o
JOIN dim_user FOR SYSTEM_TIME AS OF o.proc_time AS u
ON o.user_id = u.user_id;

3. DWS层(轻度聚合):

  • 实现:窗口聚合(Tumble/Session)
  • 粒度:分钟级/小时级多维度汇总
INSERT INTO dws_order_pv_5min
SELECT window_start, window_end, user_id,
       COUNT(*) AS order_cnt, SUM(amount) AS total_amount
FROM TABLE(TUMBLE(TABLE dwd_order_detail, DESCRIPTOR(create_time), INTERVAL '5' MINUTE))
GROUP BY window_start, window_end, user_id;

4. ADS层(数据服务):

  • 写入:StarRocks/ClickHouse/Doris
  • 最终报表:实时大屏、BI报表、告警
INSERT INTO ads_order_dashboard
SELECT dt, COUNT(DISTINCT user_id), SUM(total_amount)
FROM dws_order_pv_5min
GROUP BY dt;

三、技术选型:

  • 消息队列:Kafka(ODS/DWD/DWS)
  • 计算引擎:Flink SQL
  • 实时OLAP:StarRocks/Doris(对接ADS)
  • 元数据:Hive Metastore(统一元数据)