The circuit breaker is one of the best-established resilience patterns in distributed systems. AI agents give it a new and urgent home.
The classic pattern
The idea is borrowed from electrical engineering. When calls to a downstream service keep failing, a circuit breaker “trips” and short-circuits further calls, failing fast instead of piling on a service that’s already struggling. After a cooldown it lets a trial call through: if that succeeds, the breaker closes again; if it fails, the breaker reopens and the cooldown restarts. The point is to stop a local problem from cascading.
Three states: closed (calls flow), open (calls are blocked), half-open (a tentative trial).
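
To make the three states concrete, here is a minimal single-threaded sketch in Python. It is an illustration, not a reference implementation: the names `CircuitBreaker` and `CircuitOpenError`, the consecutive-failure threshold, and the injectable `clock` are all assumptions made for this example.

```python
import time


class CircuitOpenError(Exception):
    """Raised when a call is rejected because the breaker is open."""


class CircuitBreaker:
    """Illustrative breaker: trips after N consecutive failures."""

    def __init__(self, failure_threshold=3, cooldown_seconds=30.0,
                 clock=time.monotonic):
        self.failure_threshold = failure_threshold  # assumed trip rule
        self.cooldown_seconds = cooldown_seconds
        self.clock = clock  # injectable so the cooldown can be tested
        self.state = "closed"
        self.failures = 0
        self.opened_at = None

    def call(self, fn, *args, **kwargs):
        if self.state == "open":
            if self.clock() - self.opened_at < self.cooldown_seconds:
                # Fail fast without touching the downstream service.
                raise CircuitOpenError("breaker is open")
            # Cooldown elapsed: let one trial call through.
            self.state = "half-open"
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self._on_failure()
            raise
        self._on_success()
        return result

    def _on_success(self):
        # Any success (including the half-open trial) closes the breaker.
        self.failures = 0
        self.state = "closed"

    def _on_failure(self):
        self.failures += 1
        # A failed trial reopens at once; otherwise wait for the threshold.
        if self.state == "half-open" or self.failures >= self.failure_threshold:
            self.state = "open"
            self.opened_at = self.clock()
```

A caller wraps each downstream request as `breaker.call(fetch, url)`, where `fetch` stands in for whatever function performs the request. Production breakers usually add locking and often trip on a failure rate rather than a consecutive count.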
Why agents need it
An AI agent is a process that decides its own next action. When it misreads a result, it doesn’t crash. It keeps going, often faster, calling tools and spending tokens in a tight loop. Nothing in the classic failure model catches “working exactly as coded, toward the wrong end, at high speed.” You need a breaker that trips on behavior (a call-rate storm, runaway cost, output drift), not just on errors.
What changes in the AI version
- The trip condition is behavioral, not failure-based. “120 tool calls in 10 seconds” or “cost spiking”: the agent isn’t erroring; it’s misbehaving.
- Detection has to be real-time. A classic breaker reacts to failures as they return; an agent breaker has to detect a pattern across a sliding window of events, fast enough to matter. That makes detection a streaming-aggregation problem.
- Enforcement lives inside the agent’s loop. The breaker is checked before each tool call, so an open breaker raises immediately rather than after the next failure.
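
The three differences above can be sketched together in Python. This is a hedged illustration under stated assumptions: `AgentBreaker`, `BreakerTripped`, `guarded_tool_call`, the cost budget, and the manual `reset()` are hypothetical names and choices for this example, and the default thresholds are placeholders rather than recommendations.

```python
import time
from collections import deque


class BreakerTripped(Exception):
    """Raised when the agent's behavior has crossed a configured limit."""


class AgentBreaker:
    """Illustrative behavioral breaker: call rate plus cumulative cost."""

    def __init__(self, max_calls=120, window_seconds=10.0, max_cost=5.0,
                 clock=time.monotonic):
        self.max_calls = max_calls          # calls allowed per window
        self.window_seconds = window_seconds
        self.max_cost = max_cost            # assumed budget, arbitrary units
        self.clock = clock
        self.call_times = deque()           # timestamps inside the window
        self.total_cost = 0.0
        self.tripped_reason = None          # None means the breaker is closed

    def check(self):
        """Run before every tool call; raises if the breaker is open."""
        if self.tripped_reason is not None:
            raise BreakerTripped(self.tripped_reason)
        now = self.clock()
        # Sliding window: drop timestamps that have aged out.
        while self.call_times and now - self.call_times[0] >= self.window_seconds:
            self.call_times.popleft()
        if len(self.call_times) >= self.max_calls:
            self.tripped_reason = "call-rate limit exceeded"
            raise BreakerTripped(self.tripped_reason)
        self.call_times.append(now)

    def record_cost(self, amount):
        """Run after a call; an overrun blocks the next check()."""
        self.total_cost += amount
        if self.total_cost > self.max_cost:
            self.tripped_reason = "cost budget exceeded"

    def reset(self):
        """Assumed policy: only an operator or supervisor closes the breaker."""
        self.tripped_reason = None
        self.call_times.clear()
        self.total_cost = 0.0


def guarded_tool_call(breaker, tool, *args, cost=0.0, **kwargs):
    """Enforcement inside the loop: check first, then call, then account."""
    breaker.check()
    result = tool(*args, **kwargs)
    breaker.record_cost(cost)
    return result
```

In this sketch nothing has to fail for the breaker to open: the tool calls all succeed, and the trip comes from the pattern of calls. Keeping the breaker open until `reset()` is one possible policy; a timed half-open trial, as in the classic pattern, is another.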
Takeaway
The circuit breaker survives because its core idea (detect a bad pattern, stop fast, prevent cascade) generalizes. Pointed at AI agents, it trips on behavior instead of errors, needs streaming detection instead of return-value checks, and enforces inside the agent loop. Same pattern, new failure mode.
