Real-Time Circuit Breaking for AI Agents — What Batch Tools Can't Do

Detect and halt a runaway agent in under 500ms, before the blast radius grows.

An AI agent in a loop doesn’t fail politely. It calls a tool, gets a result it misreads, and calls again and again. By the time a dashboard shows you the spike, it has made a thousand API calls and spent real money. The tools built to watch agents (LangSmith, Helicone, Arize) are reactive loggers: they batch their aggregations every 30–60 seconds. That window is the whole problem.

Phronis closes it. It’s middleware that sits between your agent and its infrastructure, detects behavioral anomalies, and raises an alert in under 500ms so a circuit breaker can trip while the incident is still small. This is the first entry in its codex: the gap it fills, and the shape of the system.

// 01 — THE GAP

Batch observability answers “what happened?” minutes later. For a runaway agent, minutes means thousands of calls too late. Named after phronēsis (practical wisdom), Phronis is built to act, not just log: it detects the storm and stops it before it cascades.

The number that matters is event-to-alert latency: p99 under 500ms. Enforcement lands on the agent’s next attempted tool call, so the total time from first bad call to enforced stop is about 600ms.
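To make the gap concrete, here is a back-of-the-envelope sketch of how many calls a looping agent gets through before something stops it. The loop rate of 20 calls per second is an assumption chosen for illustration, not a measured Phronis figure; the 30s, 60s, and 0.6s stop times come from the numbers above.

```python
def calls_before_stop(calls_per_second: float, seconds_to_stop: float) -> int:
    """Estimate how many calls a looping agent makes before it is stopped."""
    return round(calls_per_second * seconds_to_stop)


# Assumed loop rate, for illustration only.
RATE = 20.0

batch_best = calls_before_stop(RATE, 30.0)   # alert after a 30s batch window
batch_worst = calls_before_stop(RATE, 60.0)  # alert after a 60s batch window
streaming = calls_before_stop(RATE, 0.6)     # enforced stop at about 600ms

print(batch_best, batch_worst, streaming)
```

Under that assumed rate, a batch window lets hundreds of calls through before anyone is even alerted, while a stop at roughly 600ms caps the damage at about a dozen.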

// 02 — THE FLOW

Agent (instrumented with @agent / @tool)
   │  JSON events, ~1ms emit
   ▼
Redpanda  (Kafka-compatible broker, ~10ms p99)
   │  streaming ingest
   ▼
RisingWave  (incremental materialized views, not micro-batch)
   │  TUMBLE window detects a call storm → publishes an alert
   ▼
AlertExecutor → Kill Switch (PostgreSQL-backed) → CircuitBreakerTripped

Each layer is chosen to keep latency under budget: Redpanda for low-latency ingest, RisingWave for incremental SQL (state updates as each event lands, not every 30s), and a circuit breaker that raises inside the agent the moment the next tool call is attempted.
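The detection step can be sketched in plain Python. This is an in-memory stand-in for what a tumbling-window view does, not RisingWave code and not Phronis's actual detector: the class name, the event fields, the 1-second window, and the 20-call threshold are all assumptions. The point it illustrates is the incremental part: the count updates on every event, and the alert fires on the event that crosses the threshold rather than at the end of a batch.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolCallEvent:
    """Hypothetical event shape; the real Phronis schema is not shown here."""
    agent_id: str
    tool: str
    ts_ms: int  # event time in milliseconds


class TumblingStormDetector:
    """Counts calls per (agent, window) and alerts once per window on a storm."""

    def __init__(self, window_ms: int = 1000, max_calls: int = 20):
        self.window_ms = window_ms
        self.max_calls = max_calls
        self._counts = Counter()
        self._alerted = set()

    def observe(self, event: ToolCallEvent):
        # Tumbling windows are fixed-size and non-overlapping, so each
        # event maps to exactly one window start.
        window_start = event.ts_ms - (event.ts_ms % self.window_ms)
        key = (event.agent_id, window_start)
        self._counts[key] += 1
        # State is updated per event; alert as soon as the threshold is crossed.
        if self._counts[key] > self.max_calls and key not in self._alerted:
            self._alerted.add(key)
            return {
                "agent_id": event.agent_id,
                "window_start_ms": window_start,
                "calls": self._counts[key],
            }
        return None


# Usage: 25 calls, 10ms apart, all inside the first 1-second window.
detector = TumblingStormDetector(window_ms=1000, max_calls=20)
alerts = []
for i in range(25):
    alert = detector.observe(ToolCallEvent("agent-1", "search", i * 10))
    if alert is not None:
        alerts.append(alert)
```

This sketch never evicts old windows, which a long-running process would need to do. A streaming database handles that state management for you, which is part of why the detection lives there.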

// 03 — WHY IT’S DIFFERENT

The competitors aren’t wrong. They’re built for a different question. They answer “how is my agent behaving over time?” Phronis answers “is my agent misbehaving right now, and can I stop it?” That requires streaming SQL that updates continuously, a broker measured in milliseconds, and an enforcement path that runs inside the agent’s own call loop.
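Here is a minimal sketch of what "inside the agent's own call loop" can look like. The decorator name `guarded_tool`, its signature, and the `InMemoryKillSwitch` class are hypothetical stand-ins: the real @tool decorator and the PostgreSQL-backed kill switch are not shown in this entry, so treat this as one possible shape rather than the actual API. Only the exception name `CircuitBreakerTripped` comes from the flow above.

```python
import functools


class CircuitBreakerTripped(RuntimeError):
    """Raised when a halted agent attempts another tool call."""


class InMemoryKillSwitch:
    """In-memory stand-in for a persistent (e.g. PostgreSQL-backed) kill switch."""

    def __init__(self):
        self._tripped = {}

    def trip(self, agent_id: str, reason: str) -> None:
        self._tripped[agent_id] = reason

    def reason_for(self, agent_id: str):
        return self._tripped.get(agent_id)


def guarded_tool(kill_switch: InMemoryKillSwitch, agent_id: str):
    """Hypothetical decorator: check the kill switch before every tool call."""

    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            reason = kill_switch.reason_for(agent_id)
            if reason is not None:
                # Enforcement happens here, before the tool body runs.
                raise CircuitBreakerTripped(f"{agent_id} halted: {reason}")
            return fn(*args, **kwargs)

        return wrapper

    return decorator


# Usage: the first call runs; after the switch trips, the next call is refused.
switch = InMemoryKillSwitch()


@guarded_tool(switch, agent_id="agent-1")
def search(query: str) -> str:
    return f"results for {query}"


first = search("status")
switch.trip("agent-1", "call storm")  # what an alert executor would do
try:
    search("status")
    halted = False
except CircuitBreakerTripped:
    halted = True
```

The design choice worth noticing is that the check sits on the call path itself. Nothing has to reach into the agent process to stop it; the agent stops itself the next time it tries to act.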

It’s production-ready: 10 build phases complete, 11 containers, Kubernetes manifests and a Helm chart.

ANTHONY PENA · @FROGWEBP
I build data systems and write about everything around them: the architecture, the failures, and what each one teaches me. Documenting in public since 2021: the process, not just the result.
