SYS · PRODUCT ANALYTICS WAREHOUSE · ADVISORY LOCKS · COMPLETE

SPECTRUM

Product analytics warehouse built on PostgreSQL advisory locks, a proper star schema, and a built-in experiment framework. Zero dropped events at high concurrency.

FOR UPDATE SKIP LOCKED STAR SCHEMA A/B EXPERIMENTS PARQUET EXPORT

// SPECTRUM · QUEUE WORKER · ADVISORY LOCK ● RUNNING

-- Worker picks next available job
-- without blocking or double-claiming
WITH next_job AS (
  SELECT id, event_type, payload
  FROM   event_queue
  WHERE  status = 'pending'
  ORDER BY created_at ASC
  FOR UPDATE SKIP LOCKED
  LIMIT  1
)
UPDATE event_queue
SET    status  = 'processing',
       claimed = NOW()
FROM   next_job
WHERE  event_queue.id = next_job.id
RETURNING *;

// 01 · THE PROBLEM

CONCURRENT
WRITES, ZERO
DROPS

Product analytics at any meaningful scale is a queue problem. Events arrive concurrently from multiple sources. Workers must claim and process them without overlap. If two workers claim the same event, the metric counts are wrong. If a worker crashes mid-claim, the event cannot be lost.

Most systems solve this with application-level locks or a message broker. Spectrum solves it at the database level using PostgreSQL's FOR UPDATE SKIP LOCKED. Each worker atomically claims one row, skipping any row another worker has locked. No external coordinator. No message broker for the hot path.

The downstream requirement was a proper data warehouse structure for analytical queries. Not a log store. A star schema with dimension tables, a central fact table, and the ability to run cohort and experiment queries directly in SQL without ETL preprocessing.

// QUEUE STATE · LIVE COUNTERS

LOCK MECHANISMFOR UPDATE SKIP LOCKED

WORKERSN concurrent · no coordinator

DOUBLE-CLAIM RISKZERO · DB-level atomicity

SCHEMAStar · 1 fact + 4 dims

EXPORTParquet via pandas · Arrow

// 02 · ETL PIPELINE

THREE
PHASES

INGEST

EVENT CAPTURE

TRACK ENDPOINT

FastAPI · POST /track · Pydantic

Single ingestion endpoint for all event types. Schema validated on arrival. Enqueued atomically into event_queue with status pending.

BATCH LOADER

CSV/JSON · bulk insert · upsert

Historical event import. Same schema as live events. Idempotent via ON CONFLICT DO NOTHING on event_id.

DEDUPLICATION

PostgreSQL · UNIQUE · event_id

Every event carries a client-generated UUID. DB constraint enforces uniqueness. Retries are safe, replay is safe.

FOR UPDATE SKIP LOCKED · WORKER CLAIMS JOB · ATOMICALLY

TRANSFORM

DIMENSIONAL LOAD

DIM UPSERT

INSERT ... ON CONFLICT · stable IDs

User, product, session, and time dimensions upserted before fact insert. Foreign keys guaranteed by load order.

FACT INSERT

FACT_EVENTS · FK resolution

Central fact row written with resolved FK references to all four dimensions. Includes experiment_id if event is part of an active variant.

QUEUE COMMIT

UPDATE status='done' · RETURNING

Job marked done in the same transaction as the fact insert. If the fact insert fails, the job stays pending. No half-processed events.

TRANSACTION COMMITTED · STATUS UPDATED · PARQUET EXPORT QUEUED

SURFACE

ANALYTICS OUTPUT

STREAMLIT

Live SQL · funnel · retention

Dashboard queries star schema directly. Funnel, retention cohort, revenue, and experiment views. No pre-aggregation.

PARQUET EXPORT

pandas · PyArrow · partitioned

Scheduled export of FACT_EVENTS + dimension tables to Parquet. Partitioned by event date. Compatible with any Arrow-based query engine.

REST API

FastAPI · /metrics · /experiments

Endpoint layer over star schema. /metrics for aggregated event counts, /experiments for A/B variant results with statistical significance.

// 03 · DATA MODEL

STAR
SCHEMA

One central fact table surrounded by four dimension tables. Queries against any dimension join directly to FACT_EVENTS. No denormalized lookup columns in the fact row. No pre-built aggregates needed for the dashboard views.

FACT_EVENTS CENTRAL FACT TABLE

event_id PK

user_id FK → DIM_USERS

product_id FK → DIM_PRODUCTS

session_id FK → DIM_SESSIONS

time_id FK → DIM_TIME

experiment_id

event_type · value

// 4 DIMENSION TABLES · STAR SCHEMA · DIRECT FK JOIN

→ DIM_USERS USER DIMENSION

user_id PK

user_uuid

first_seen

segment

→ DIM_TIME TIME DIMENSION

time_id PK

event_date

hour · minute

day_of_week

→ DIM_PRODUCTS PRODUCT DIMENSION

product_id PK

sku · name

A/B AT THE
WAREHOUSE LAYER

Every event can carry an experiment_id and a variant. The fact table stores both. Aggregations and significance tests run as SQL queries directly over FACT_EVENTS. No separate experiment database.

VARIANT ASSIGNMENT

hash-based · deterministic

User bucketing via SHA-256 hash of user_uuid + experiment_id. Same user always gets the same variant. No state store needed for assignment.

EVENT LOGGING

FACT_EVENTS · experiment_id

Every tracked event optionally includes experiment_id and variant. Stored as columns on FACT_EVENTS. Query by experiment with a single WHERE clause.

SIGNIFICANCE TEST

chi-square · p-value · scipy

Statistical significance computed via chi-square test on conversion counts per variant. Result exposed through /experiments endpoint. Thresholds configurable.

RESULT VIEW

Streamlit · live variant data

Experiment dashboard shows active variants, conversion rates, p-value, and whether the result clears the significance threshold. Updates on each page refresh.

// 05 · TECH STACK

THE FULL
STACK

PostgreSQL

ADVISORY LOCKS · STAR SCHEMA

FOR UPDATE SKIP LOCKED for the job queue. UNIQUE constraints for deduplication. Star schema with 4 dimension tables and 1 fact table. All analytics queries run here.

FastAPI

ASYNCIO · /TRACK · /METRICS

POST /track for event ingestion. GET /metrics for aggregated results. GET /experiments for A/B variant data with significance. Pydantic models throughout.

Streamlit

ANALYTICS DASHBOARDS

Funnel analysis, retention cohorts, revenue charts, and experiment results. Direct SQL queries against the star schema. No pre-aggregation or separate cache layer.

Apache Parquet

PANDAS · PYARROW · EXPORT

Scheduled export of fact and dimension tables to columnar Parquet files. Partitioned by event date. Queryable with Spark, DuckDB, or any Arrow-based engine.

SQLAlchemy

ORM · ASYNC · MIGRATIONS

Model definitions for all five tables. Async session management for the FastAPI workers. Alembic for schema versioning and migration history.

SciPy

CHI-SQUARE · P-VALUE

Statistical significance computation for A/B experiment results. Chi-square test on conversion contingency tables. Significance threshold configurable per experiment.