FlinkSQL实时数仓分层架构
请描述基于Flink SQL构建实时数仓的分层架构设计,包括ODS/DWD/DWS/ADS各层的职责和实现方式。
回答
专业代码师
一、分层架构设计:
ODS (操作数据层) → DWD (明细数据层) → DWS (服务数据层) → ADS (应用数据层)
Kafka Kafka Kafka/SQL OLAP
二、各层职责:
1. ODS层(实时接入):
- 数据源:MySQL Binlog(Canal)→ Kafka、埋点日志 → Kafka
- 格式:JSON原始数据
- 表结构:
ods_xxx,保留所有原始字段
CREATE TABLE ods_order (
order_id BIGINT,
user_id BIGINT,
amount DOUBLE,
create_time TIMESTAMP(3),
proc_time AS PROCTIME()
) WITH ('connector'='kafka', 'topic'='ods_order', 'format'='json');
2. DWD层(数据清洗+明细):
- 实现:Flink SQL实时ETL
- 操作:数据清洗、字段规范化、维度补充(Lookup Join)
- 存储:Kafka(压缩格式Avro/Parquet)
INSERT INTO dwd_order_detail
SELECT o.order_id, o.user_id, o.amount,
u.user_name AS user_name,
o.create_time,DATE_FORMAT(o.create_time,'yyyyMMdd') AS dt
FROM ods_order o
JOIN dim_user FOR SYSTEM_TIME AS OF o.proc_time AS u
ON o.user_id = u.user_id;
3. DWS层(轻度聚合):
- 实现:窗口聚合(Tumble/Session)
- 粒度:分钟级/小时级多维度汇总
INSERT INTO dws_order_pv_5min
SELECT window_start, window_end, user_id,
COUNT(*) AS order_cnt, SUM(amount) AS total_amount
FROM TABLE(TUMBLE(TABLE dwd_order_detail, DESCRIPTOR(create_time), INTERVAL '5' MINUTE))
GROUP BY window_start, window_end, user_id;
4. ADS层(数据服务):
- 写入:StarRocks/ClickHouse/Doris
- 最终报表:实时大屏、BI报表、告警
INSERT INTO ads_order_dashboard
SELECT dt, COUNT(DISTINCT user_id), SUM(total_amount)
FROM dws_order_pv_5min
GROUP BY dt;
三、技术选型:
- 消息队列:Kafka(ODS/DWD/DWS)
- 计算引擎:Flink SQL
- 实时OLAP:StarRocks/Doris(对接ADS)
- 元数据:Hive Metastore(统一元数据)