DOC: spectrum-a
STATUS: ● PUBLISHED
SYSTEM SPECTRUM

Funnels and Retention in Pure SQL

The analytics views that power the dashboards: no Python, no dbt, just SQL.

Cover image — Funnels and Retention in Pure SQL

Spectrum’s dashboards show conversion funnels, weekly retention, revenue by account, and experiment results. Every number is computed by a SQL materialized view against the star schema. This is build log 05: the view definitions and the logic behind each one.

// 01 — CONVERSION FUNNELS

A funnel answers: “of users who did step A, how many also did step B within N minutes, and then step C?”

The pattern is a sequence of correlated subqueries or self-joins, filtered by occurred_at windows:

CREATE MATERIALIZED VIEW analytics.mv_conversion_funnels AS
SELECT
    fe1.account_id,
    fe1.event_type_id                         AS step1_type,
    count(DISTINCT fe1.user_id)               AS step1_users,
    count(DISTINCT fe2.user_id)               AS step2_users,
    count(DISTINCT fe3.user_id)               AS step3_users
FROM warehouse.fact_events fe1
LEFT JOIN warehouse.fact_events fe2
    ON fe2.user_id    = fe1.user_id
    AND fe2.account_id = fe1.account_id
    AND fe2.event_type_id = /* step 2 type id */
    AND fe2.occurred_at BETWEEN fe1.occurred_at AND fe1.occurred_at + INTERVAL '30 min'
LEFT JOIN warehouse.fact_events fe3
    ON fe3.user_id    = fe1.user_id
    AND fe3.account_id = fe1.account_id
    AND fe3.event_type_id = /* step 3 type id */
    AND fe3.occurred_at BETWEEN fe2.occurred_at AND fe2.occurred_at + INTERVAL '30 min'
GROUP BY fe1.account_id, fe1.event_type_id
WITH NO DATA;

The materialized view stores the aggregated counts; Grafana reads the percentages as step2_users / step1_users.

// 02 — WEEKLY RETENTION

Retention asks: “of users who appeared in week W, what fraction returned in W+1, W+2, W+4?”

CREATE MATERIALIZED VIEW analytics.mv_retention_cohorts AS
WITH cohorts AS (
    SELECT
        account_id,
        user_id,
        date_trunc('week', MIN(occurred_at))::DATE AS cohort_week
    FROM warehouse.fact_events
    GROUP BY account_id, user_id
),
activity AS (
    SELECT DISTINCT
        account_id,
        user_id,
        date_trunc('week', occurred_at)::DATE AS event_week
    FROM warehouse.fact_events
)
SELECT
    c.account_id,
    c.cohort_week,
    count(DISTINCT c.user_id)                                                       AS cohort_size,
    count(DISTINCT a1.user_id) FILTER (WHERE a1.event_week = c.cohort_week + 7)     AS w1,
    count(DISTINCT a2.user_id) FILTER (WHERE a2.event_week = c.cohort_week + 14)    AS w2,
    count(DISTINCT a4.user_id) FILTER (WHERE a4.event_week = c.cohort_week + 28)    AS w4
FROM cohorts c
LEFT JOIN activity a1 ON a1.user_id = c.user_id AND a1.account_id = c.account_id
LEFT JOIN activity a2 ON a2.user_id = c.user_id AND a2.account_id = c.account_id
LEFT JOIN activity a4 ON a4.user_id = c.user_id AND a4.account_id = c.account_id
GROUP BY c.account_id, c.cohort_week
WITH NO DATA;

The account_id on every join is the cross-tenant isolation guard. Missing it was the anomaly documented in the retention leak log.

// 03 — REVENUE AND EVENT VOLUME

Simpler aggregations keyed by account and day:

CREATE MATERIALIZED VIEW analytics.mv_revenue_summary AS
SELECT
    f.account_id,
    date_trunc('day', f.occurred_at)::DATE AS event_day,
    count(*)                                AS total_events,
    sum((f.properties->>'amount_cents')::INT) FILTER (WHERE f.properties ? 'amount_cents') AS revenue_cents
FROM warehouse.fact_events f
GROUP BY f.account_id, event_day
WITH NO DATA;

properties ? 'amount_cents' is the JSONB “key exists” operator. It skips non-revenue events cleanly rather than casting NULL to 0.

// 04 — THE MATERIALIZED VIEW REFRESH

All four views are created WITH NO DATA and populated on the first pipeline run with REFRESH MATERIALIZED VIEW. Subsequent runs use REFRESH MATERIALIZED VIEW CONCURRENTLY, which requires a unique index on each view:

CREATE UNIQUE INDEX ON analytics.mv_retention_cohorts (account_id, cohort_week);
CREATE UNIQUE INDEX ON analytics.mv_revenue_summary (account_id, event_day);
-- and so on for each view

Without a unique index, CONCURRENTLY refuses to run.

NEXT

@frogwebp brand mark
ANTHONY PENA · @FROGWEBP
I build data systems and write about everything around them, the architecture, the failures, what each one teaches me. Documenting in public since 2021: the process, not just the result.

// NEWSLETTER — THE BUILD LOG SIGNAL

When I ship something or learn something worth keeping, it lands here first — build logs, concepts, and the honest process behind them. Come along; no spam, leave anytime.