漏斗分析(Funnel Analysis)的SQL实现与优化
漏斗分析是用户行为分析的核心功能。请给出一个标准的电商漏斗(浏览→加购→下单→支付)的Spark SQL实现,使用窗口函数(LAG/LEAD/ROW_NUMBER)按用户和时间排序计算每一步的转化率。同时讨论大数据量下的漏斗优化策略:预聚合、Bitmap漏斗、以及Flink实时漏斗的实现思路。
回答
专业代码师
漏斗分析的SQL实现:
1. 电商漏斗定义:
Step1: 浏览商品(page_view)→ Step2: 加入购物车(add_cart)
→ Step3: 下单(create_order)→ Step4: 支付(payment)
2. Spark SQL实现:
WITH funnel_base AS (
SELECT
user_id,
event_name,
event_time,
ROW_NUMBER() OVER (
PARTITION BY user_id
ORDER BY event_time
) AS seq
FROM events
WHERE dt = '2025-05-25'
AND event_name IN ('page_view','add_cart','create_order','payment')
),
funnel_steps AS (
SELECT
user_id,
MAX(CASE WHEN event_name = 'page_view' THEN seq ELSE NULL END) AS has_step1,
MAX(CASE WHEN event_name = 'add_cart' THEN seq ELSE NULL END) AS has_step2,
MAX(CASE WHEN event_name = 'create_order' THEN seq ELSE NULL END) AS has_step3,
MAX(CASE WHEN event_name = 'payment' THEN seq ELSE NULL END) AS has_step4
FROM funnel_base
GROUP BY user_id
)
SELECT
'浏览→加购' AS step_name,
COUNT(CASE WHEN has_step1 IS NOT NULL THEN 1 END) AS step_users,
COUNT(CASE WHEN has_step2 IS NOT NULL THEN 1 END) AS next_users,
ROUND(COUNT(CASE WHEN has_step2 IS NOT NULL THEN 1 END) * 100.0 /
COUNT(CASE WHEN has_step1 IS NOT NULL THEN 1 END), 2) || '%' AS conversion_rate
FROM funnel_steps;
3. 大数据量优化策略:
| 策略 | 描述 | 效果 |
|---|---|---|
| 预聚合 | 每小时预计算用户各事件首次/末次时间 | 减少Shuffle |
| Bitmap漏斗 | Doris ClickHouse用RoaringBitmap加速 | 亚秒级 |
| Flink实时漏斗 | 使用Flink CEP定义模式匹配 | 秒级延迟 |
4. Flink实时漏斗示例:
INSERT INTO funnel_result
SELECT
step_name,
COUNT(DISTINCT user_id) AS users
FROM TABLE(
CEP.MATCH_RECOGNIZE(
PARTITION BY user_id
ORDER BY event_time
MEASURES
STEP_FIRST(event_name) AS step_name
PATTERN (step1 step2 step3 step4)
DEFINE
step1 AS event_name = 'page_view',
step2 AS event_name = 'add_cart' AND event_time <= step1.event_time + INTERVAL '1' HOUR,
step3 AS event_name = 'create_order' AND event_time <= step2.event_time + INTERVAL '1' HOUR,
step4 AS event_name = 'payment' AND event_time <= step3.event_time + INTERVAL '30' MINUTE
) events
);
5. 窗口漏斗(按天/周):
-- 按天汇总漏斗转化
GROUP BY window_start, step_name