An AI agent in a loop doesn’t fail politely. It calls a tool, misreads the result, and calls again and again; by the time a dashboard shows you the spike, it has made a thousand API calls and spent real money. The tools built to watch agents (LangSmith, Helicone, Arize) are reactive loggers: they batch their aggregations every 30–60 seconds. That window is the whole problem.
Phronis closes it. It’s middleware between your agent and its infrastructure that detects behavioral anomalies in under 500ms and trips a circuit breaker while the incident is still small. This is the first entry in its codex: the gap it fills and the shape of the system.
// 01 — THE GAP
Batch observability answers “what happened?” minutes later. For a runaway agent, minutes later is thousands of calls too late. Named after phronēsis (practical wisdom), Phronis is built to act, not just log: detect the storm and stop it before it cascades.
The number that matters: event-to-alert p99 under 500ms. Total time from first bad call to enforced stop is ~600ms.
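To make the difference concrete, here is a back-of-the-envelope sketch of blast radius: the number of calls a looping agent makes between its first bad call and the moment it is stopped. The call rate of 20 calls per second is a hypothetical figure chosen for illustration, not a measurement; the 30–60s and ~600ms windows come from the numbers above.

```python
# Blast radius = calls made between the first bad call and the enforced stop.
# CALLS_PER_SECOND is a hypothetical rate, assumed purely for illustration.
CALLS_PER_SECOND = 20


def blast_radius(calls_per_second: float, seconds_to_stop: float) -> int:
    """Approximate number of calls a looping agent makes before it is stopped."""
    return round(calls_per_second * seconds_to_stop)


batch_best = blast_radius(CALLS_PER_SECOND, 30)    # 30s batch window
batch_worst = blast_radius(CALLS_PER_SECOND, 60)   # 60s batch window
streaming = blast_radius(CALLS_PER_SECOND, 0.6)    # ~600ms, first bad call to stop

print(batch_best, batch_worst, streaming)  # prints: 600 1200 12
```

At this assumed rate, the batch window costs hundreds to over a thousand calls before anyone can even see the spike, while a ~600ms stop caps the damage at about a dozen.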
// 02 — THE FLOW
Agent (instrumented with @agent / @tool)
│ JSON events, ~1ms emit
▼
Redpanda (Kafka-compatible broker, ~10ms p99)
│ streaming ingest
▼
RisingWave (incremental materialized views — not micro-batch)
│ TUMBLE window detects a call storm → publishes an alert
▼
AlertExecutor → Kill Switch (PostgreSQL-backed) → CircuitBreakerTripped
Each layer is chosen to keep latency under budget: Redpanda for low-latency ingest, RisingWave for incremental SQL (state updates as each event lands, not every 30s), and a circuit breaker that raises CircuitBreakerTripped inside the agent the moment the next tool call is attempted.
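The detection step in the flow above can be sketched in plain Python. This is a stand-in for the idea of an incrementally updated tumbling-window count, not the RisingWave view Phronis actually uses. The class name `StormDetector`, the `Alert` record, the one-second window, and the threshold of 10 calls are all assumptions made for this example.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional

WINDOW_MS = 1_000       # hypothetical tumbling window size
STORM_THRESHOLD = 10    # hypothetical calls-per-window limit


@dataclass(frozen=True)
class Alert:
    agent_id: str
    window_start_ms: int
    call_count: int


class StormDetector:
    """Counts tool calls per (agent, tumbling window), updating on every event."""

    def __init__(self, window_ms: int = WINDOW_MS, threshold: int = STORM_THRESHOLD):
        self.window_ms = window_ms
        self.threshold = threshold
        self._counts = defaultdict(int)
        self._alerted = set()

    def on_event(self, agent_id: str, ts_ms: int) -> Optional[Alert]:
        # Tumbling windows are fixed-size and non-overlapping: each timestamp
        # belongs to exactly one window, identified by its start time.
        window_start = ts_ms - (ts_ms % self.window_ms)
        key = (agent_id, window_start)
        self._counts[key] += 1
        # Alert once, as soon as the count crosses the threshold,
        # rather than waiting for a periodic batch job to run.
        if self._counts[key] > self.threshold and key not in self._alerted:
            self._alerted.add(key)
            return Alert(agent_id, window_start, self._counts[key])
        return None


detector = StormDetector()
# Simulate an agent calling a tool every 50ms for one second.
alerts = [a for t in range(0, 1000, 50) if (a := detector.on_event("agent-1", t))]
print(alerts)
```

In this simulation the alert fires on the 11th call, at t=500ms, because state is updated per event. A job that aggregated only every 30 seconds would see the same 20 calls, just much later.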
// 03 — WHY IT’S DIFFERENT
The competitors aren’t wrong. They’re built for a different question. They answer “how is my agent behaving over time?” Phronis answers “is my agent misbehaving right now, and can I stop it?” That requires streaming SQL that updates continuously, a broker measured in milliseconds, and an enforcement path that runs inside the agent’s own call loop.
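A minimal sketch of what an enforcement path inside the call loop could look like follows. The names `tool` and `CircuitBreakerTripped` appear in the flow above, but the signatures here are assumptions, and `InMemoryKillSwitch` is a hypothetical stand-in for the PostgreSQL-backed kill switch so the example runs on its own.

```python
import functools


class CircuitBreakerTripped(RuntimeError):
    """Raised inside the agent when its kill switch has been set."""


class InMemoryKillSwitch:
    """Stand-in for the PostgreSQL-backed kill switch (hypothetical interface)."""

    def __init__(self):
        self._tripped = set()

    def trip(self, agent_id: str) -> None:
        self._tripped.add(agent_id)

    def is_tripped(self, agent_id: str) -> bool:
        return agent_id in self._tripped


kill_switch = InMemoryKillSwitch()


def tool(agent_id: str):
    """Hypothetical decorator: check the kill switch before every tool call."""
    def decorate(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            if kill_switch.is_tripped(agent_id):
                raise CircuitBreakerTripped(f"agent {agent_id} halted")
            return fn(*args, **kwargs)
        return wrapper
    return decorate


@tool(agent_id="agent-1")
def search(query: str) -> str:
    return f"results for {query}"


calls = 0
try:
    while True:  # a runaway loop
        search("same query")
        calls += 1
        if calls == 3:
            kill_switch.trip("agent-1")  # what the alert path would do
except CircuitBreakerTripped as exc:
    print(f"stopped after {calls} calls: {exc}")
```

Because the check sits in the wrapper around each tool, the loop cannot make another call once the switch is set: the exception surfaces on the very next attempt, with no cooperation needed from the agent's own logic.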
It’s production-ready: 10 build phases complete, 11 containers, Kubernetes manifests and a Helm chart.
TAKEAWAYS
- For runaway agents, detection latency is everything. A 30–60s batch window means a blast radius of thousands of calls; sub-500ms detection catches the incident while it’s small.
- Real-time enforcement needs three things batch tools lack: a millisecond broker, incremental (not micro-batch) stream processing, and an enforcement hook inside the agent loop.
- Observability that only describes isn’t enough for autonomous systems. Phronis is built to act.
NEXT
- Build log 02: the SDK, covering the @agent / @tool decorators and ~1ms event emission.
