CodeWalk

漏斗分析(Funnel Analysis)的SQL实现与优化

作者:专业代码师 · 2026-05-30 12:55

漏斗分析是用户行为分析的核心功能。请给出一个标准的电商漏斗(浏览→加购→下单→支付)的Spark SQL实现,使用窗口函数(LAG/LEAD/ROW_NUMBER)按用户和时间排序计算每一步的转化率。同时讨论大数据量下的漏斗优化策略:预聚合、Bitmap漏斗、以及Flink实时漏斗的实现思路。

回答

专业代码师

漏斗分析的SQL实现:

1. 电商漏斗定义

Step1: 浏览商品(page_view)→ Step2: 加入购物车(add_cart)
→ Step3: 下单(create_order)→ Step4: 支付(payment)

2. Spark SQL实现

WITH funnel_base AS (
  SELECT 
    user_id,
    event_name,
    event_time,
    ROW_NUMBER() OVER (
      PARTITION BY user_id 
      ORDER BY event_time
    ) AS seq
  FROM events
  WHERE dt = '2025-05-25'
    AND event_name IN ('page_view','add_cart','create_order','payment')
),
funnel_steps AS (
  SELECT
    user_id,
    MAX(CASE WHEN event_name = 'page_view' THEN seq ELSE NULL END) AS has_step1,
    MAX(CASE WHEN event_name = 'add_cart' THEN seq ELSE NULL END) AS has_step2,
    MAX(CASE WHEN event_name = 'create_order' THEN seq ELSE NULL END) AS has_step3,
    MAX(CASE WHEN event_name = 'payment' THEN seq ELSE NULL END) AS has_step4
  FROM funnel_base
  GROUP BY user_id
)
SELECT
  '浏览→加购' AS step_name,
  COUNT(CASE WHEN has_step1 IS NOT NULL THEN 1 END) AS step_users,
  COUNT(CASE WHEN has_step2 IS NOT NULL THEN 1 END) AS next_users,
  ROUND(COUNT(CASE WHEN has_step2 IS NOT NULL THEN 1 END) * 100.0 / 
    COUNT(CASE WHEN has_step1 IS NOT NULL THEN 1 END), 2) || '%' AS conversion_rate
FROM funnel_steps;

3. 大数据量优化策略

策略描述效果
预聚合每小时预计算用户各事件首次/末次时间减少Shuffle
Bitmap漏斗Doris ClickHouse用RoaringBitmap加速亚秒级
Flink实时漏斗使用Flink CEP定义模式匹配秒级延迟

4. Flink实时漏斗示例

INSERT INTO funnel_result
SELECT
  step_name,
  COUNT(DISTINCT user_id) AS users
FROM TABLE(
  CEP.MATCH_RECOGNIZE(
    PARTITION BY user_id
    ORDER BY event_time
    MEASURES
      STEP_FIRST(event_name) AS step_name
    PATTERN (step1 step2 step3 step4)
    DEFINE
      step1 AS event_name = 'page_view',
      step2 AS event_name = 'add_cart' AND event_time <= step1.event_time + INTERVAL '1' HOUR,
      step3 AS event_name = 'create_order' AND event_time <= step2.event_time + INTERVAL '1' HOUR,
      step4 AS event_name = 'payment' AND event_time <= step3.event_time + INTERVAL '30' MINUTE
  ) events
);

5. 窗口漏斗(按天/周)

-- 按天汇总漏斗转化
GROUP BY window_start, step_name