Phronis can only stop what it can see, and it can only see what’s instrumented. The instrumentation has to be nearly free, both to add and to run, or no one will use it on a hot agent loop. The answer is two decorators.
// 01 — TWO DECORATORS
You wrap your agent and tool functions:
import phronis

@phronis.agent
def run_support_agent(query): ...

@phronis.tool
def search_inventory(sku): ...
@agent and @tool record timing, token counts, status, and errors, and emit a structured event for each call. There are no base classes to inherit and no framework to adopt. The decorators wrap plain functions, so they work whether you’re on LangChain, raw API calls, or your own loop. Avoiding framework lock-in is a deliberate design goal.
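To make the idea concrete, here is a minimal sketch of how a decorator of this kind could record timing, status, and errors around a plain function. It is not Phronis's actual implementation: `instrument`, `emit`, and the `EVENTS` list are hypothetical names, the emitter is synchronous for simplicity, and token counts are left out because they depend on the model client in use.

```python
import functools
import time

EVENTS = []  # hypothetical in-memory sink standing in for the real event stream


def emit(event):
    """Stand-in emitter; a real one would publish asynchronously."""
    EVENTS.append(event)


def instrument(kind):
    """Build a decorator that records timing, status, and errors per call."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            status, error = "ok", None
            try:
                return fn(*args, **kwargs)
            except Exception as exc:
                status, error = "error", repr(exc)
                raise  # never swallow the caller's exception
            finally:
                # Runs on both success and failure, so every call emits one event.
                emit({
                    "kind": kind,
                    "name": fn.__name__,
                    "status": status,
                    "error": error,
                    "latency_ms": (time.perf_counter() - start) * 1000.0,
                })
        return wrapper
    return decorator


agent = instrument("agent")
tool = instrument("tool")


@tool
def search_inventory(sku):
    if not sku:
        raise ValueError("empty sku")
    return {"sku": sku, "in_stock": True}
```

Because the wrapper only relies on `functools.wraps` and a `try`/`finally`, the decorated function keeps its name and signature and behaves the same to any caller, which is what lets this style of instrumentation sit on top of any framework or none.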
// 02 — STANDARD TRACE CONTEXT
Every event carries OpenTelemetry-style identifiers: trace_id, span_id, and parent_span_id, so a tool call links to the agent call that spawned it, and a whole run reconstructs as a tree. The event schema also captures model_id, input/output token counts, latency_ms, status_code, and an event_time watermark that RisingWave uses for windowing.
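The parent/child linking can be sketched with Python's `contextvars`: each wrapped call reads the current span, creates its own span pointing at it, and restores the previous one on exit. This is an illustration under assumptions, not the Phronis schema: `traced`, `SPANS`, `_current_span`, and `build_tree` are hypothetical names, and only the three identifiers named above are modelled.

```python
import contextvars
import functools
import uuid

# Hypothetical names: _current_span, SPANS, and traced are illustrative only.
_current_span = contextvars.ContextVar("current_span", default=None)
SPANS = []


def traced(fn):
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        parent = _current_span.get()
        span = {
            "name": fn.__name__,
            # A root call starts a new trace; nested calls inherit it.
            "trace_id": parent["trace_id"] if parent else uuid.uuid4().hex,
            "span_id": uuid.uuid4().hex[:16],
            "parent_span_id": parent["span_id"] if parent else None,
        }
        token = _current_span.set(span)
        try:
            return fn(*args, **kwargs)
        finally:
            _current_span.reset(token)  # restore the caller's span
            SPANS.append(span)
    return wrapper


def build_tree(spans):
    """Group span names by parent_span_id so a run can be walked as a tree."""
    children = {}
    for s in spans:
        children.setdefault(s["parent_span_id"], []).append(s["name"])
    return children


@traced
def search_inventory(sku):
    return sku.upper()


@traced
def run_support_agent(query):
    return search_inventory(query)
```

After `run_support_agent("ab1")`, the tool span's `parent_span_id` equals the agent span's `span_id` and both share one `trace_id`, so `build_tree(SPANS)` is enough to recover the call hierarchy from what were otherwise flat records.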
// 03 — ~1ms EMIT, VALIDATED
Emission is asynchronous and fast, around 1ms, so instrumentation doesn’t slow the agent. Each event is validated against a JSON Schema before it’s published; anything malformed goes to a dead-letter queue rather than polluting the stream. The circuit-breaker check rides on the same path: @tool asks the registry “is this agent halted?” at the start of every call, and raises immediately if so, before any external API is touched.
TAKEAWAYS
- Instrumentation that people will actually keep needs to be near-free to add (one decorator) and near-free to run (~1ms async emit).
- Standard trace context (trace/span/parent) turns flat events into reconstructable run trees, which is the same model the rest of the observability world speaks.
- Validate at the edge: schema-check events before publishing so bad data is quarantined, not propagated.
NEXT
- Build log 03: sub-500ms detection with TUMBLE windows in RisingWave.
