
2.5MB Through a 48KB Door: The XCom Size Limit

A silent empty result, and why Airflow swallowed the data without an error.

The Airflow pipeline reported success. The load task ran, finished green, and upserted zero chunks. No error, no exception, no failed task. The data just wasn’t there. A silent empty result is the worst kind, because nothing tells you to go looking.

// 01 — THE SETUP

The DAG is validate >> extract >> transform >> load. The transform task embedded the chunks and returned them; load received them from transform and upserted them into ChromaDB. Tasks pass data through Airflow’s XCom: a task’s return value is pushed to XCom by default, and the downstream task pulls it.

// 02 — THE SYMPTOM

load got no chunks and wrote nothing, but raised nothing either. Every task showed success. The only symptom was a collection that stayed empty after a “successful” run.

// 03 — THE CULPRIT

XCom stores inter-task data in Airflow’s metadata database, and it caps payloads at roughly 48 KB. The transform task was returning the full embedded chunk list: about 800 chunks × 384 floats, which at 8 bytes per float comes to roughly 2.5 MB. Payloads over the cap aren’t loudly rejected; they’re silently truncated or dropped. So transform “returned” 2.5 MB, XCom quietly discarded it, and load received nothing, with no error anywhere in the chain. XCom is a channel for small control values, and I’d used it as a data bus.
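
The size estimate is easy to check by hand. The sketch below redoes the arithmetic under two assumptions that the post does not state outright: each float takes 8 bytes, and "roughly 48 KB" means 48 × 1024 bytes.

```python
# Back-of-envelope payload size versus the XCom cap.
NUM_CHUNKS = 800            # from the post
EMBEDDING_DIM = 384         # floats per chunk, from the post
BYTES_PER_FLOAT = 8         # assumption: 64-bit floats
XCOM_CAP_BYTES = 48 * 1024  # assumption: "roughly 48 KB" read as 48 KiB

payload_bytes = NUM_CHUNKS * EMBEDDING_DIM * BYTES_PER_FLOAT
ratio = payload_bytes / XCOM_CAP_BYTES

print(f"payload: {payload_bytes:,} bytes ({payload_bytes / 1e6:.2f} MB)")  # 2,457,600 bytes (2.46 MB)
print(f"cap:     {XCOM_CAP_BYTES:,} bytes")                                # 49,152 bytes
print(f"ratio:   {ratio:.0f}x over the cap")                               # 50x
```

This counts only the raw floats. A list serialized as JSON text would typically be larger still, because each number is written out as a string of digits.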

// 04 — THE FIX

Pass a pointer, not the payload. transform writes the chunks to a timestamped JSON file on the shared /data volume and returns only the file path through XCom; load reads the file, upserts, and deletes it:

# transform: write data to disk, hand XCom a path
path = f"/data/chunks_{ts}.json"
write_json(path, chunks)
return path
# load: read the path (pulled from XCom), do the work, clean up
chunks = read_json(path)
batch_upsert(chunks)
os.remove(path)

XCom now carries a few dozen bytes. The data never touches the metadata database.
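
The fragment in this section leaves its helpers undefined, so here is a self-contained sketch of the same pointer-passing pattern in plain Python, with no Airflow import. Several things in it are assumptions made for illustration: the bodies of write_json and read_json, the timestamp format, the temporary directory standing in for /data, and the upsert callback standing in for the ChromaDB write.

```python
import json
import os
import tempfile
from datetime import datetime, timezone


def write_json(path, obj):
    """Serialize obj to path as JSON (hypothetical helper body)."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(obj, f)


def read_json(path):
    """Load and return the JSON document stored at path."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def transform(chunks, data_dir):
    """Write chunks to a timestamped file and return only its path."""
    ts = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%S")  # assumed format
    path = os.path.join(data_dir, f"chunks_{ts}.json")
    write_json(path, chunks)
    return path  # this short string is all that would travel through XCom


def load(path, upsert):
    """Read chunks from path, pass them to upsert, then delete the file."""
    chunks = read_json(path)
    upsert(chunks)
    os.remove(path)
    return len(chunks)


# Stand-in data and sink; a real load task would upsert into ChromaDB.
chunks = [{"id": i, "embedding": [0.0] * 384} for i in range(800)]
stored = []

with tempfile.TemporaryDirectory() as data_dir:  # stands in for /data
    path = transform(chunks, data_dir)
    pointer_bytes = len(path.encode("utf-8"))
    loaded = load(path, stored.extend)
    file_removed = not os.path.exists(path)

print(f"pointer size: {pointer_bytes} bytes, chunks loaded: {loaded}")
```

The trade-off is that both tasks must see the same filesystem, which is why the original relies on a shared volume, and that load is now responsible for cleaning up the file it consumes.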


ANTHONY PENA · @FROGWEBP
I build data systems and write about everything around them: the architecture, the failures, and what each one teaches me. Documenting in public since 2021: the process, not just the result.
