Real-Time Circuit Breaking for AI Agents — What Batch Tools Can't Do

An AI agent in a loop doesn’t fail politely. It calls a tool, misreads the result, and calls again and again. By the time a dashboard shows you the spike, it has made a thousand API calls and spent real money. The tools built to watch agents (LangSmith, Helicone, Arize) are reactive loggers that batch their aggregations every 30–60 seconds. That window is the whole problem.

Phronis closes it. It’s middleware that sits between your agent and its infrastructure, detects behavioral anomalies in under 500ms, and trips a circuit breaker while the incident is still small. This is the first entry in its codex: the gap it fills and the shape of the system.

// 01 — THE GAP

Batch observability answers “what happened?” minutes later. For a runaway agent, minutes is thousands of calls too late. Named after phronēsis, practical wisdom, Phronis is built to act, not just log: detect the storm and stop it before it cascades.

The number that matters: event-to-alert p99 under 500ms. Total time from first bad call to enforced stop is ~600ms.
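
To make the gap concrete, here is a rough back-of-the-envelope sketch. The call rate of 50 calls per second is a hypothetical figure chosen for illustration (the numbers above do not include one); the 30s, 60s, and ~600ms windows come from the text.

```python
def calls_before_stop(calls_per_second: float, detection_seconds: float) -> int:
    """Calls a runaway agent completes before enforcement lands."""
    return round(calls_per_second * detection_seconds)

# Hypothetical loop rate, assumed for illustration only.
RATE = 50.0

batch_low = calls_before_stop(RATE, 30.0)   # 30s batch window
batch_high = calls_before_stop(RATE, 60.0)  # 60s batch window
streaming = calls_before_stop(RATE, 0.6)    # ~600ms from first bad call to enforced stop

print(batch_low, batch_high, streaming)
```

At that assumed rate, a batch window lets through 1,500 to 3,000 calls, while a ~600ms path lets through about 30. The ratio holds at any call rate; only the absolute numbers change.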

// 02 — THE FLOW

Agent (instrumented with @agent / @tool)
   │  JSON events, ~1ms emit
   ▼
Redpanda  (Kafka-compatible broker, ~10ms p99)
   │  streaming ingest
   ▼
RisingWave  (incremental materialized views — not micro-batch)
   │  TUMBLE window detects a call storm → publishes an alert
   ▼
AlertExecutor → Kill Switch (PostgreSQL-backed) → CircuitBreakerTripped

Each layer is chosen to keep latency under budget: Redpanda for low-latency ingest, RisingWave for incremental SQL (state updates as each event lands, not every 30s), and a circuit breaker that raises inside the agent the moment the next tool call is attempted.
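
The difference between incremental and micro-batch processing can be sketched in plain Python. This is not RisingWave SQL and not Phronis code; it only illustrates the idea of a tumbling window whose count updates on every event. The class name, the 1000ms window, and the threshold are assumptions made for the example.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class TumblingStormDetector:
    window_ms: int = 1000  # hypothetical window size
    threshold: int = 20    # hypothetical calls per window that count as a storm
    counts: dict = field(default_factory=lambda: defaultdict(int))
    alerted: set = field(default_factory=set)

    def on_event(self, agent_id: str, ts_ms: int):
        """Update state for one event; return an alert the first time a window crosses the threshold."""
        # Tumbling windows are fixed and non-overlapping: each timestamp maps to one window.
        window_start = ts_ms - ts_ms % self.window_ms
        key = (agent_id, window_start)
        self.counts[key] += 1  # state changes as the event lands, not on a timer
        if self.counts[key] >= self.threshold and key not in self.alerted:
            self.alerted.add(key)
            return {"agent_id": agent_id, "window_start": window_start, "calls": self.counts[key]}
        return None  # old windows are never evicted in this sketch


detector = TumblingStormDetector(window_ms=1000, threshold=5)
alerts = []
for ts in range(0, 700, 100):  # seven calls inside one window
    alert = detector.on_event("agent-1", ts)
    if alert is not None:
        alerts.append(alert)
```

The alert fires on the fifth event rather than at the end of the window, which is the property that keeps detection latency tied to event arrival instead of to a batch interval.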

// 03 — WHY IT’S DIFFERENT

The competitors aren’t wrong. They’re built for a different question. They answer “how is my agent behaving over time?” Phronis answers “is my agent misbehaving right now, and can I stop it?” That requires streaming SQL that updates continuously, a broker measured in milliseconds, and an enforcement path that runs inside the agent’s own call loop.
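
An enforcement path inside the call loop might look roughly like the sketch below. It is not the Phronis SDK: the `tool` decorator signature, `InMemoryKillSwitch` (standing in for the PostgreSQL-backed kill switch), and the `EVENTS` list (standing in for the broker) are all hypothetical. Only the `CircuitBreakerTripped` name is taken from the flow above.

```python
import functools
import json
import time


class CircuitBreakerTripped(Exception):
    """Raised when a tripped agent attempts another tool call."""


class InMemoryKillSwitch:
    """Hypothetical stand-in for a PostgreSQL-backed kill switch."""

    def __init__(self):
        self._tripped = set()

    def trip(self, agent_id):
        self._tripped.add(agent_id)

    def is_tripped(self, agent_id):
        return agent_id in self._tripped


EVENTS = []  # stand-in for the broker: collected JSON event strings


def tool(agent_id, kill_switch):
    """Hypothetical decorator: check the kill switch, emit an event, then call the tool."""
    def decorate(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            # Enforcement happens before the call, so a tripped agent does no further work.
            if kill_switch.is_tripped(agent_id):
                raise CircuitBreakerTripped(f"agent {agent_id} is tripped")
            event = {"agent_id": agent_id, "tool": fn.__name__, "ts_ms": int(time.time() * 1000)}
            EVENTS.append(json.dumps(event))
            return fn(*args, **kwargs)
        return wrapper
    return decorate


switch = InMemoryKillSwitch()


@tool("agent-1", switch)
def search(query):
    return f"results for {query}"


first = search("weather")  # allowed; one event is emitted
switch.trip("agent-1")     # what the alert path would do after a detected storm
try:
    search("weather")
    blocked = False
except CircuitBreakerTripped:
    blocked = True
```

Because the check runs in the same wrapper that makes the call, the stop takes effect on the very next tool call, with no separate process that has to reach into the agent.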

It’s production-ready: 10 build phases complete, 11 containers, Kubernetes manifests and a Helm chart.

TAKEAWAYS

For runaway agents, detection latency is everything. A 30–60s batch window is thousands of calls of blast radius; sub-500ms catches the incident while it’s small.
Real-time enforcement needs three things batch tools lack: a millisecond broker, incremental (not micro-batch) stream processing, and an enforcement hook inside the agent loop.
Observability that only describes isn’t enough for autonomous systems. Phronis is built to act.

NEXT

Build log 02 covers the SDK: @agent / @tool decorators and ~1ms event emission.
