SYS · PRODUCT ANALYTICS WAREHOUSE · ADVISORY LOCKS · COMPLETE

SPECTRUM

Product analytics warehouse built on PostgreSQL advisory locks, a proper star schema, and a built-in experiment framework. Zero dropped events at high concurrency.

FOR UPDATE SKIP LOCKED STAR SCHEMA A/B EXPERIMENTS PARQUET EXPORT
// SPECTRUM · QUEUE WORKER · ADVISORY LOCK ● RUNNING
-- Worker picks next available job
-- without blocking or double-claiming
WITH next_job AS (
  SELECT id, event_type, payload
  FROM   event_queue
  WHERE  status = 'pending'
  ORDER BY created_at ASC
  FOR UPDATE SKIP LOCKED
  LIMIT  1
)
UPDATE event_queue
SET    status  = 'processing',
       claimed = NOW()
FROM   next_job
WHERE  event_queue.id = next_job.id
RETURNING *;
// 01 · THE PROBLEM

CONCURRENT
WRITES, ZERO
DROPS

Product analytics at any meaningful scale is a queue problem. Events arrive concurrently from multiple sources. Workers must claim and process them without overlap. If two workers claim the same event, the metric counts are wrong. If a worker crashes mid-claim, the event cannot be lost.

Most systems solve this with application-level locks or a message broker. Spectrum solves it at the database level using PostgreSQL's FOR UPDATE SKIP LOCKED. Each worker atomically claims one row, skipping any row another worker has locked. No external coordinator. No message broker for the hot path.

The downstream requirement was a proper data warehouse structure for analytical queries. Not a log store. A star schema with dimension tables, a central fact table, and the ability to run cohort and experiment queries directly in SQL without ETL preprocessing.

// QUEUE STATE · LIVE COUNTERS
LOCK MECHANISMFOR UPDATE SKIP LOCKED
WORKERSN concurrent · no coordinator
DOUBLE-CLAIM RISKZERO · DB-level atomicity
SCHEMAStar · 1 fact + 4 dims
EXPORTParquet via pandas · Arrow
// 02 · ETL PIPELINE

THREE
PHASES

INGEST
EVENT CAPTURE
TRACK ENDPOINT
FastAPI · POST /track · Pydantic
Single ingestion endpoint for all event types. Schema validated on arrival. Enqueued atomically into event_queue with status pending.
BATCH LOADER
CSV/JSON · bulk insert · upsert
Historical event import. Same schema as live events. Idempotent via ON CONFLICT DO NOTHING on event_id.
DEDUPLICATION
PostgreSQL · UNIQUE · event_id
Every event carries a client-generated UUID. DB constraint enforces uniqueness. Retries are safe, replay is safe.
FOR UPDATE SKIP LOCKED · WORKER CLAIMS JOB · ATOMICALLY
TRANSFORM
DIMENSIONAL LOAD
DIM UPSERT
INSERT ... ON CONFLICT · stable IDs
User, product, session, and time dimensions upserted before fact insert. Foreign keys guaranteed by load order.
FACT INSERT
FACT_EVENTS · FK resolution
Central fact row written with resolved FK references to all four dimensions. Includes experiment_id if event is part of an active variant.
QUEUE COMMIT
UPDATE status='done' · RETURNING
Job marked done in the same transaction as the fact insert. If the fact insert fails, the job stays pending. No half-processed events.
TRANSACTION COMMITTED · STATUS UPDATED · PARQUET EXPORT QUEUED
SURFACE
ANALYTICS OUTPUT
STREAMLIT
Live SQL · funnel · retention
Dashboard queries star schema directly. Funnel, retention cohort, revenue, and experiment views. No pre-aggregation.
PARQUET EXPORT
pandas · PyArrow · partitioned
Scheduled export of FACT_EVENTS + dimension tables to Parquet. Partitioned by event date. Compatible with any Arrow-based query engine.
REST API
FastAPI · /metrics · /experiments
Endpoint layer over star schema. /metrics for aggregated event counts, /experiments for A/B variant results with statistical significance.
// 03 · DATA MODEL

STAR
SCHEMA

One central fact table surrounded by four dimension tables. Queries against any dimension join directly to FACT_EVENTS. No denormalized lookup columns in the fact row. No pre-built aggregates needed for the dashboard views.

FACT_EVENTS CENTRAL FACT TABLE
event_id PK
user_id FK → DIM_USERS
product_id FK → DIM_PRODUCTS
session_id FK → DIM_SESSIONS
time_id FK → DIM_TIME
experiment_id
event_type · value
// 4 DIMENSION TABLES · STAR SCHEMA · DIRECT FK JOIN
DIM_USERS USER DIMENSION
user_id PK
user_uuid
first_seen
segment
DIM_TIME TIME DIMENSION
time_id PK
event_date
hour · minute
day_of_week
DIM_PRODUCTS PRODUCT DIMENSION
product_id PK
sku · name
category
price_usd
DIM_SESSIONS SESSION DIMENSION
session_id PK
session_uuid
entry_page
duration_s
// 04 · EXPERIMENT FRAMEWORK

A/B AT THE
WAREHOUSE LAYER

Every event can carry an experiment_id and a variant. The fact table stores both. Aggregations and significance tests run as SQL queries directly over FACT_EVENTS. No separate experiment database.

VARIANT ASSIGNMENT
hash-based · deterministic
User bucketing via SHA-256 hash of user_uuid + experiment_id. Same user always gets the same variant. No state store needed for assignment.
EVENT LOGGING
FACT_EVENTS · experiment_id
Every tracked event optionally includes experiment_id and variant. Stored as columns on FACT_EVENTS. Query by experiment with a single WHERE clause.
SIGNIFICANCE TEST
chi-square · p-value · scipy
Statistical significance computed via chi-square test on conversion counts per variant. Result exposed through /experiments endpoint. Thresholds configurable.
RESULT VIEW
Streamlit · live variant data
Experiment dashboard shows active variants, conversion rates, p-value, and whether the result clears the significance threshold. Updates on each page refresh.
// 05 · TECH STACK

THE FULL
STACK

PostgreSQL
ADVISORY LOCKS · STAR SCHEMA
FOR UPDATE SKIP LOCKED for the job queue. UNIQUE constraints for deduplication. Star schema with 4 dimension tables and 1 fact table. All analytics queries run here.
FastAPI
ASYNCIO · /TRACK · /METRICS
POST /track for event ingestion. GET /metrics for aggregated results. GET /experiments for A/B variant data with significance. Pydantic models throughout.
Streamlit
ANALYTICS DASHBOARDS
Funnel analysis, retention cohorts, revenue charts, and experiment results. Direct SQL queries against the star schema. No pre-aggregation or separate cache layer.
Apache Parquet
PANDAS · PYARROW · EXPORT
Scheduled export of fact and dimension tables to columnar Parquet files. Partitioned by event date. Queryable with Spark, DuckDB, or any Arrow-based engine.
SQLAlchemy
ORM · ASYNC · MIGRATIONS
Model definitions for all five tables. Async session management for the FastAPI workers. Alembic for schema versioning and migration history.
SciPy
CHI-SQUARE · P-VALUE
Statistical significance computation for A/B experiment results. Chi-square test on conversion contingency tables. Significance threshold configurable per experiment.
// 06 · OUTCOMES

BY THE
NUMBERS

0 DROPPED EVENTS AT CONCURRENCY
5 TABLE STAR SCHEMA
4 STREAMLIT DASHBOARD VIEWS
SQL ONLY ANALYTICS LAYER
1-TX FACT INSERT + QUEUE COMMIT
PARQUET COLUMNAR EXPORT READY
// 07 · THE PLATFORM

DASHBOARD
IN PRODUCTION

Spectrum dashboard overview
// SPECTRUM · DASHBOARD OVERVIEW · LIVE METRICS · FUNNEL + RETENTION + REVENUE
Funnel analysis chart
// FUNNEL ANALYSIS · CONVERSION STEPS · DROP-OFF RATES
Retention cohorts
// RETENTION COHORTS · WEEKLY · STAR SCHEMA QUERY
Revenue chart
// REVENUE · PRODUCT DIMENSION JOIN · DIRECT SQL
Experiment results
// A/B EXPERIMENTS · VARIANT RESULTS · CHI-SQUARE · P-VALUE
// 08 · SOURCE CODE

OPEN
SOURCE

$ git clone https://github.com/frogwebp/spectrum
$ docker-compose up -d
$ curl -X POST http://localhost:8000/track \
-d '{"event":"page_view","user_id":"u-001"}'
# postgresql · for update skip locked · star schema · parquet

→ VIEW ON GITHUB
SPECTRUM · PRODUCT ANALYTICS WAREHOUSE · ADVISORY LOCKS · @FROGWEBP