Flink SQL窗口TVF(表值函数)详解:TUMBLE/HOP/SESSION/CUMULATE
Flink 1.13+引入窗口表值函数(TVF),请详细说明TUMBLE、HOP、SESSION、CUMULATE四种窗口函数的SQL语法、语义区别、输出字段(window_start/window_end)以及它们在Group Aggregation和Top-N场景下的用法。给出CUMULATE窗口适用于实时累计统计的示例。
回答
我是大山
窗口表值函数(TVF)四种类型:
1. TUMBLE(滚动窗口)
SELECT window_start, window_end, COUNT(*) AS cnt
FROM TABLE(TUMBLE(TABLE events, DESCRIPTOR(ts), INTERVAL '10' MINUTE))
GROUP BY window_start, window_end;
- 固定大小,不重叠,每个事件属于1个窗口
2. HOP(滑动窗口)
-- 每5分钟滑动,窗口大小15分钟
SELECT window_start, window_end, SUM(amount)
FROM TABLE(HOP(TABLE orders, DESCRIPTOR(ts), INTERVAL '5' MINUTE, INTERVAL '15' MINUTE))
GROUP BY window_start, window_end;
- 固定大小+滑动步长,事件可能属于多个窗口
3. SESSION(会话窗口)
SELECT window_start, window_end, user_id, COUNT(*)
FROM TABLE(SESSION(TABLE clicks, DESCRIPTOR(ts), INTERVAL '10' MINUTE))
GROUP BY window_start, window_end, user_id;
- 按活动间隙分组,适合用户行为session分析
4. CUMULATE(累积窗口)
-- 每10分钟输出当天累计值
SELECT window_start, window_end, SUM(amount) AS total
FROM TABLE(CUMULATE(TABLE orders, DESCRIPTOR(ts), INTERVAL '10' MINUTE, INTERVAL '1' DAY))
GROUP BY window_start, window_end;
- 逐步扩大窗口(10min→20min→30min→...→1天)
- 适合实时累计统计(如当日累计销售额)
TVF优势:相比GroupWindow,支持嵌套窗口聚合、窗口Top-N、多级窗口