Observability and governance for agentic systems
AI agents make decisions, call tools, and coordinate with other agents. Waxell Observe captures what happens — automatically — so teams can understand, debug, and govern agent behavior in production.
Built for Operational Experience (OX)
Built for Developer Experience (DX)
Auto-instrumentation — two lines
import waxell_observe as waxell
waxell.init(api_key="wax_sk_...", api_url="https://api.waxell.dev")
# Two lines. Every OpenAI / Anthropic / Groq call is now traced.
Everyone observes. Only Waxell governs.
Other tools show you what happened. Waxell Observe does that too — but it also enforces what's allowed to happen next. A dashboard after the fact is not governance. It's an autopsy.
Waxell introduces dynamic governance: policies that evaluate agent behavior in real time — before execution, between steps, and after completion. When a policy triggers, the agent receives structured feedback: retry, escalate, or halt.
Observability tells you what your agents did. Governance ensures they only do what they should.
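The retry / escalate / halt loop can be sketched in a few lines. This is not the Waxell API — the names here (`Verdict`, `PolicyResult`, `evaluate`) are invented for illustration — but it shows the shape of structured feedback a policy engine returns to an agent:

```python
from dataclasses import dataclass
from enum import Enum

class Verdict(Enum):
    ALLOW = "allow"
    RETRY = "retry"        # re-run with corrective guidance
    ESCALATE = "escalate"  # hand off to a human
    HALT = "halt"          # stop the agent immediately

@dataclass
class PolicyResult:
    verdict: Verdict
    reason: str = ""

# A policy is just a callable from an action record to a result.
def max_cost_policy(action: dict) -> PolicyResult:
    if action.get("cost_usd", 0.0) > 1.00:
        return PolicyResult(Verdict.HALT, "per-call cost ceiling exceeded")
    return PolicyResult(Verdict.ALLOW)

SEVERITY = [Verdict.ALLOW, Verdict.RETRY, Verdict.ESCALATE, Verdict.HALT]

def evaluate(policies, action) -> PolicyResult:
    """Run every policy against the action; the most severe verdict wins."""
    worst = PolicyResult(Verdict.ALLOW)
    for policy in policies:
        result = policy(action)
        if SEVERITY.index(result.verdict) > SEVERITY.index(worst.verdict):
            worst = result
    return worst

result = evaluate([max_cost_policy], {"tool": "web_search", "cost_usd": 2.40})
print(result.verdict.value, "-", result.reason)
```

Because the verdict is structured data rather than a log line, the agent runtime can act on it mid-execution instead of after the fact.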



What we govern
Eleven policy categories. Each one is a class of problem you no longer solve with hope. Configure rules in the dashboard, enforce them during execution.
Audit
Configure logging and compliance. Every decision, every call, every cost — recorded immutably for review.
Kill
Emergency stop controls. Halt any agent, any workflow, immediately. The button you need when autonomy goes wrong.
Rate-Limit
Control how often workflows can run. Prevent runaway loops, enforce cooldowns, throttle expensive operations.
Content
Input/output content scanning and filtering. Block PII, detect prompt injection, redact sensitive data before it leaves your stack.
LLM
Model-specific constraints. Restrict which models an agent can call, set token ceilings per model, enforce provider allowlists.
Safety
Set safety limits and controls. Define boundaries for what agents are allowed to do — and what they must never do.
Control
Flow control and notifications. Define escalation paths, approval gates, and alerting rules for specific agent behaviors.
Operations
Timeouts, retries, and circuit breakers. Define how agents fail — gracefully, with structure, not silently.
Scheduling
Control when workflows can run. Business hours only, maintenance windows, time-zone-aware execution rules.
Cost
Set spending and token limits. Per-agent, per-user, per-session. Budgets that enforce themselves — not spreadsheets you check on Friday.
Quality
Output validation and quality gates. Score outputs automatically, flag low-confidence responses, block inadequate results.
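As a rough illustration — the rule names and fields below are invented, not the dashboard's actual schema — a policy set spanning several of these categories is just structured data the runtime can check before each run:

```python
# Hypothetical policy set; field names are illustrative, not Waxell's schema.
policies = {
    "cost":       {"max_usd_per_session": 5.00, "max_tokens_per_call": 8000},
    "rate_limit": {"max_runs_per_minute": 10, "cooldown_seconds": 30},
    "scheduling": {"allowed_hours_utc": range(8, 18)},  # business hours only
    "llm":        {"provider_allowlist": ["openai", "anthropic"]},
}

def allowed_now(policies: dict, provider: str, hour_utc: int) -> bool:
    """Check scheduling and provider-allowlist rules before a run starts."""
    if hour_utc not in policies["scheduling"]["allowed_hours_utc"]:
        return False
    return provider in policies["llm"]["provider_allowlist"]

print(allowed_now(policies, "openai", hour_utc=9))
```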
Worried about runaway spend?
Cost policies enforce per-agent and per-user budgets in real time. Set a ceiling and forget about it.
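The enforcement mechanism is simple to picture. A minimal sketch — not the Waxell implementation, just the idea of a budget that refuses the call rather than reporting the overage later:

```python
from collections import defaultdict

class Budget:
    """Per-key spending ceilings checked before each call (illustrative only)."""

    def __init__(self, ceiling_usd: float):
        self.ceiling = ceiling_usd
        self.spent = defaultdict(float)

    def charge(self, key: str, cost_usd: float) -> bool:
        """Return True if the spend is allowed; refuse once the ceiling is hit."""
        if self.spent[key] + cost_usd > self.ceiling:
            return False
        self.spent[key] += cost_usd
        return True

budget = Budget(ceiling_usd=1.00)
print(budget.charge("agent-42", 0.60))  # allowed
print(budget.charge("agent-42", 0.50))  # refused: would total $1.10
```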
Start in five minutes
As easy as pip install.
Install the SDK. Set your API key. Initialize before your imports. From that point on, every LLM call, tool invocation, and agent decision is captured automatically — with cost, latency, and token counts attached.
No decorators required to start. No wrapper classes. No changes to your agent logic. When you need more structure, add decorators and context managers incrementally. Governance, scoring, and prompt management are there when you're ready.
Works with any Python agent framework. Supports sync, async, Jupyter notebooks, and production servers. If your agent runs Python, you can observe it.
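Initializing before your imports matters because auto-instrumentation of this kind typically works by patching client methods before any client object is created. A minimal sketch of the general mechanism, using a stand-in `FakeClient` rather than a real provider SDK:

```python
import functools
import time

class FakeClient:
    """Stand-in for a provider SDK client; not a real library."""
    def complete(self, prompt: str) -> str:
        return f"echo: {prompt}"

def instrument(cls, method_name: str, spans: list) -> None:
    """Wrap a method so every call is recorded with its latency."""
    original = getattr(cls, method_name)

    @functools.wraps(original)
    def traced(self, *args, **kwargs):
        start = time.perf_counter()
        result = original(self, *args, **kwargs)
        spans.append({"method": method_name,
                      "latency_s": time.perf_counter() - start})
        return result

    setattr(cls, method_name, traced)

spans = []
instrument(FakeClient, "complete", spans)  # must run before clients are built
print(FakeClient().complete("hi"), len(spans))
```

Patch after the client class is already in use and early calls escape tracing — which is why init-before-import is the rule.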

Need agents that stay inside the lines?
Safety and content policies scan inputs and outputs in real time. PII detection, prompt injection blocking, and output filtering — enforced, not suggested.
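To make the scanning step concrete, here is a deliberately minimal redaction pass. Real content policies use far more than two regexes — this only illustrates the mechanism of rewriting text before it leaves your stack:

```python
import re

# Two common PII shapes; production scanners cover many more.
PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "ssn":   re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def redact(text: str) -> str:
    """Replace each matched PII span with a labeled placeholder."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label.upper()} REDACTED]", text)
    return text

print(redact("Reach me at jane@example.com, SSN 123-45-6789."))
```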
What Waxell captures
LLM Calls
Every model invocation — tokens, latency, cost, provider — captured automatically across OpenAI, Anthropic, Groq, and 200+ providers.
Decisions
Routing choices, classifications, and agent dispatching — with the options considered, the choice made, and the reasoning.
Reasoning
Chain-of-thought steps — the thought, the evidence, and the conclusion — captured so they can be inspected and replayed.
Tool Calls
Function calls, API requests, database queries, vector searches — recorded with inputs, outputs, timing, and status.
Retrieval
Document searches across vector databases — queries, results, relevance scores — structured for inspection and evaluation.
Traces
Full execution trees with parent-child span relationships, powered by OpenTelemetry. Every agent run becomes a navigable trace.
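The trace structure above is easy to picture: each span records its parent, so a flat list of spans reconstructs the full execution tree. A self-contained sketch (plain dataclasses standing in for OpenTelemetry spans):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Span:
    span_id: str
    name: str
    parent_id: Optional[str] = None

def render(spans, parent_id=None, depth=0):
    """Rebuild the tree by following parent-child links, depth-first."""
    lines = []
    for span in spans:
        if span.parent_id == parent_id:
            lines.append("  " * depth + span.name)
            lines.extend(render(spans, span.span_id, depth + 1))
    return lines

spans = [
    Span("1", "agent-run"),
    Span("2", "llm-call", parent_id="1"),
    Span("3", "tool: web_search", parent_id="1"),
    Span("4", "llm-call", parent_id="3"),
]
print("\n".join(render(spans)))
```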
Observability that enforces policy
Waxell Observe is not a standalone analytics tool. It feeds directly into the Waxell governance plane.
Policies evaluate agent behavior in real time — before execution, between steps, and after completion. When a policy triggers, the agent receives structured feedback: retry with guidance, escalate to a human, or halt.


Tired of silent failures?
Operations policies define how agents fail — with timeouts, retries, circuit breakers, and structured fallback. Not silently. Not at 3am.
Built for multi-agent systems
Production agents don't run alone. A coordinator dispatches to a planner, which spawns researchers, which call tools. Waxell Observe traces the full tree — every child agent, every span, every decision — linked by session and lineage.
Parent-child relationships are detected automatically. Session IDs, user context, and the observe client propagate through nested calls without manual wiring.
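Context propagation without manual wiring is usually built on Python's `contextvars`: set a value once at the top of the tree and every nested call sees it. A minimal sketch of that mechanism (the function names are illustrative):

```python
import contextvars

# Ambient session context: set once, visible to every nested call.
session_id = contextvars.ContextVar("session_id", default=None)

def child_agent() -> str:
    # No session argument threaded through: the context carries it.
    return f"child sees session {session_id.get()}"

def coordinator(sid: str) -> str:
    session_id.set(sid)
    return child_agent()

print(coordinator("sess-123"))
```

The same approach survives async code, because each task gets a copy of the current context automatically.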

Agents stuck in infinite loops?
Rate-limit policies prevent runaway execution. Set cooldowns, enforce throttles, and cap invocation frequency — per agent, per workflow, per user.
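The simplest such policy is a per-key cooldown: at most one run per key every N seconds. A sketch of the mechanism (illustrative, not the Waxell rule engine):

```python
import time

class Cooldown:
    """Allow at most one run per key every `seconds`."""

    def __init__(self, seconds: float):
        self.seconds = seconds
        self.last_run = {}

    def try_acquire(self, key: str, now=None) -> bool:
        """Return True and record the run, or False if still cooling down."""
        now = time.monotonic() if now is None else now
        last = self.last_run.get(key)
        if last is not None and now - last < self.seconds:
            return False
        self.last_run[key] = now
        return True

limiter = Cooldown(seconds=30)
print(limiter.try_acquire("workflow-7"))  # first run: allowed
print(limiter.try_acquire("workflow-7"))  # immediate retry: refused
```

Token buckets and sliding windows follow from the same idea; the point is that the check runs before execution, so a looping agent is stopped rather than merely graphed.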
Works with what you already use
Auto-instruments 200+ libraries. Initialize before importing — no code changes required.
LLM PROVIDERS
OpenAI
Anthropic
Azure
+MORE
VECTOR DATABASES
Pinecone
Chroma
Weaviate
+MORE
FRAMEWORKS
LangChain
LlamaIndex
CrewAI
+MORE
INFRASTRUCTURE
PostgreSQL
Redis
MongoDB
+MORE
Autonomy, observed
Install the SDK, connect to your Waxell instance, and start capturing what your agents actually do.



