Observability and governance for agentic systems
AI agents make decisions, call tools, and coordinate with other agents. Waxell Observe captures what happens — automatically — so teams can understand, debug, and govern agent behavior in production.
Built for Operational Experience (OX)
Built for Developer Experience (DX)
Auto-instrumentation — two lines
import waxell_observe as waxell
waxell.init(api_key="wax_sk_...", api_url="https://api.waxell.dev")
# Two lines. Every OpenAI / Anthropic / Groq call is now traced.
Everyone observes. Only Waxell governs.
Other tools show you what happened. Waxell Observe does that too — but it also enforces what's allowed to happen next. A dashboard after the fact is not governance. It's an autopsy.
Waxell introduces dynamic governance: policies that evaluate agent behavior in real time — before execution, between steps, and after completion. When a policy triggers, the agent receives structured feedback: retry, escalate, or halt.
Observability tells you what your agents did. Governance ensures they only do what they should.
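The retry / escalate / halt loop can be sketched in a few lines. This is not the Waxell API — the names here (`Verdict`, `PolicyResult`, `evaluate`) are invented for illustration — but it shows the shape of structured feedback a policy engine returns to an agent:

```python
from dataclasses import dataclass
from enum import Enum

class Verdict(Enum):
    ALLOW = "allow"
    RETRY = "retry"        # re-run with corrective guidance
    ESCALATE = "escalate"  # hand off to a human
    HALT = "halt"          # stop the agent immediately

@dataclass
class PolicyResult:
    verdict: Verdict
    reason: str = ""

# A policy is just a callable from an action record to a result.
def max_cost_policy(action: dict) -> PolicyResult:
    if action.get("cost_usd", 0.0) > 1.00:
        return PolicyResult(Verdict.HALT, "per-call cost ceiling exceeded")
    return PolicyResult(Verdict.ALLOW)

SEVERITY = [Verdict.ALLOW, Verdict.RETRY, Verdict.ESCALATE, Verdict.HALT]

def evaluate(policies, action) -> PolicyResult:
    """Run every policy against the action; the most severe verdict wins."""
    worst = PolicyResult(Verdict.ALLOW)
    for policy in policies:
        result = policy(action)
        if SEVERITY.index(result.verdict) > SEVERITY.index(worst.verdict):
            worst = result
    return worst

result = evaluate([max_cost_policy], {"tool": "web_search", "cost_usd": 2.40})
print(result.verdict.value, "-", result.reason)
```

Because the verdict is structured data rather than a log line, the agent runtime can act on it mid-execution instead of after the fact.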



What we govern
Eleven policy categories. Each one is a class of problem you no longer solve with hope. Configure rules in the dashboard, enforce them during execution.
Audit
Configure logging and compliance. Every decision, every call, every cost — recorded immutably for review.
Kill
Emergency stop controls. Halt any agent, any workflow, immediately. The button you need when autonomy goes wrong.
Rate-Limit
Control how often workflows can run. Prevent runaway loops, enforce cooldowns, throttle expensive operations.
Content
Input/output content scanning and filtering. Block PII, detect prompt injection, redact sensitive data before it leaves your stack.
LLM
Model-specific constraints. Restrict which models an agent can call, set token ceilings per model, enforce provider allowlists.
Safety
Set safety limits and controls. Define boundaries for what agents are allowed to do — and what they must never do.
Control
Flow control and notifications. Define escalation paths, approval gates, and alerting rules for specific agent behaviors.
Operations
Timeouts, retries, and circuit breakers. Define how agents fail — gracefully, with structure, not silently.
Scheduling
Control when workflows can run. Business hours only, maintenance windows, time-zone-aware execution rules.
Cost
Set spending and token limits. Per-agent, per-user, per-session. Budgets that enforce themselves — not spreadsheets you check on Friday.
Quality
Output validation and quality gates. Score outputs automatically, flag low-confidence responses, block inadequate results.
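As a rough illustration — the rule names and fields below are invented, not the dashboard's actual schema — a policy set spanning several of these categories is just structured data the runtime can check before each run:

```python
# Hypothetical policy set; field names are illustrative, not Waxell's schema.
policies = {
    "cost":       {"max_usd_per_session": 5.00, "max_tokens_per_call": 8000},
    "rate_limit": {"max_runs_per_minute": 10, "cooldown_seconds": 30},
    "scheduling": {"allowed_hours_utc": range(8, 18)},  # business hours only
    "llm":        {"provider_allowlist": ["openai", "anthropic"]},
}

def allowed_now(policies: dict, provider: str, hour_utc: int) -> bool:
    """Check scheduling and provider-allowlist rules before a run starts."""
    if hour_utc not in policies["scheduling"]["allowed_hours_utc"]:
        return False
    return provider in policies["llm"]["provider_allowlist"]

print(allowed_now(policies, "openai", hour_utc=9))
```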
Worried about runaway spend?
Cost policies enforce per-agent and per-user budgets in real time. Set a ceiling and forget about it.
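The enforcement mechanism is simple to picture. A minimal sketch — not the Waxell implementation, just the idea of a budget that refuses the call rather than reporting the overage later:

```python
from collections import defaultdict

class Budget:
    """Per-key spending ceilings checked before each call (illustrative only)."""

    def __init__(self, ceiling_usd: float):
        self.ceiling = ceiling_usd
        self.spent = defaultdict(float)

    def charge(self, key: str, cost_usd: float) -> bool:
        """Return True if the spend is allowed; refuse once the ceiling is hit."""
        if self.spent[key] + cost_usd > self.ceiling:
            return False
        self.spent[key] += cost_usd
        return True

budget = Budget(ceiling_usd=1.00)
print(budget.charge("agent-42", 0.60))  # allowed
print(budget.charge("agent-42", 0.50))  # refused: would total $1.10
```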
Start in five minutes
As easy as pip install.
Install the SDK. Set your API key. Initialize before your imports. From that point on, every LLM call, tool invocation, and agent decision is captured automatically — with cost, latency, and token counts attached.
No decorators required to start. No wrapper classes. No changes to your agent logic. When you need more structure, add decorators and context managers incrementally. Governance, scoring, and prompt management are there when you're ready.
Works with any Python agent framework. Supports sync, async, Jupyter notebooks, and production servers. If your agent runs Python, you can observe it.
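Initializing before your imports matters because auto-instrumentation of this kind typically works by patching client methods before any client object is created. A minimal sketch of the general mechanism, using a stand-in `FakeClient` rather than a real provider SDK:

```python
import functools
import time

class FakeClient:
    """Stand-in for a provider SDK client; not a real library."""
    def complete(self, prompt: str) -> str:
        return f"echo: {prompt}"

def instrument(cls, method_name: str, spans: list) -> None:
    """Wrap a method so every call is recorded with its latency."""
    original = getattr(cls, method_name)

    @functools.wraps(original)
    def traced(self, *args, **kwargs):
        start = time.perf_counter()
        result = original(self, *args, **kwargs)
        spans.append({"method": method_name,
                      "latency_s": time.perf_counter() - start})
        return result

    setattr(cls, method_name, traced)

spans = []
instrument(FakeClient, "complete", spans)  # must run before clients are built
print(FakeClient().complete("hi"), len(spans))
```

Patch after the client class is already in use and early calls escape tracing — which is why init-before-import is the rule.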

Need agents that stay inside the lines?
Safety and content policies scan inputs and outputs in real time. PII detection, prompt injection blocking, and output filtering — enforced, not suggested.
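To make the scanning step concrete, here is a deliberately minimal redaction pass. Real content policies use far more than two regexes — this only illustrates the mechanism of rewriting text before it leaves your stack:

```python
import re

# Two common PII shapes; production scanners cover many more.
PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "ssn":   re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def redact(text: str) -> str:
    """Replace each matched PII span with a labeled placeholder."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label.upper()} REDACTED]", text)
    return text

print(redact("Reach me at jane@example.com, SSN 123-45-6789."))
```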
What Waxell captures
LLM Calls
Every model invocation — tokens, latency, cost, provider — captured automatically across OpenAI, Anthropic, Groq, and 200+ providers.
Decisions
Routing choices, classifications, and agent dispatching — with the options considered, the choice made, and the reasoning.
Reasoning
Chain-of-thought steps — the thought, the evidence, and the conclusion — captured so they can be inspected and replayed.
Tool Calls
Function calls, API requests, database queries, vector searches — recorded with inputs, outputs, timing, and status.
Retrieval
Document searches across vector databases — queries, results, relevance scores — structured for inspection and evaluation.
Traces
Full execution trees with parent-child span relationships, powered by OpenTelemetry. Every agent run becomes a navigable trace.
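The trace structure above is easy to picture: each span records its parent, so a flat list of spans reconstructs the full execution tree. A self-contained sketch (plain dataclasses standing in for OpenTelemetry spans):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Span:
    span_id: str
    name: str
    parent_id: Optional[str] = None

def render(spans, parent_id=None, depth=0):
    """Rebuild the tree by following parent-child links, depth-first."""
    lines = []
    for span in spans:
        if span.parent_id == parent_id:
            lines.append("  " * depth + span.name)
            lines.extend(render(spans, span.span_id, depth + 1))
    return lines

spans = [
    Span("1", "agent-run"),
    Span("2", "llm-call", parent_id="1"),
    Span("3", "tool: web_search", parent_id="1"),
    Span("4", "llm-call", parent_id="3"),
]
print("\n".join(render(spans)))
```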
Observability that enforces policy
Waxell Observe is not a standalone analytics tool. It feeds directly into the Waxell governance plane.
Policies evaluate agent behavior in real time — before execution, between steps, and after completion. When a policy triggers, the agent receives structured feedback: retry with guidance, escalate to a human, or halt.


Tired of silent failures?
Operations policies define how agents fail — with timeouts, retries, circuit breakers, and structured fallback. Not silently. Not at 3am.
Built for multi-agent systems
Production agents don't run alone. A coordinator dispatches to a planner, which spawns researchers, which call tools. Waxell Observe traces the full tree — every child agent, every span, every decision — linked by session and lineage.
Parent-child relationships are detected automatically. Session IDs, user context, and the observe client propagate through nested calls without manual wiring.
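Context propagation without manual wiring is usually built on Python's `contextvars`: set a value once at the top of the tree and every nested call sees it. A minimal sketch of that mechanism (the function names are illustrative):

```python
import contextvars

# Ambient session context: set once, visible to every nested call.
session_id = contextvars.ContextVar("session_id", default=None)

def child_agent() -> str:
    # No session argument threaded through: the context carries it.
    return f"child sees session {session_id.get()}"

def coordinator(sid: str) -> str:
    session_id.set(sid)
    return child_agent()

print(coordinator("sess-123"))
```

The same approach survives async code, because each task gets a copy of the current context automatically.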

Agents stuck in infinite loops?
Rate-limit policies prevent runaway execution. Set cooldowns, enforce throttles, and cap invocation frequency — per agent, per workflow, per user.
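The simplest such policy is a per-key cooldown: at most one run per key every N seconds. A sketch of the mechanism (illustrative, not the Waxell rule engine):

```python
import time

class Cooldown:
    """Allow at most one run per key every `seconds`."""

    def __init__(self, seconds: float):
        self.seconds = seconds
        self.last_run = {}

    def try_acquire(self, key: str, now=None) -> bool:
        """Return True and record the run, or False if still cooling down."""
        now = time.monotonic() if now is None else now
        last = self.last_run.get(key)
        if last is not None and now - last < self.seconds:
            return False
        self.last_run[key] = now
        return True

limiter = Cooldown(seconds=30)
print(limiter.try_acquire("workflow-7"))  # first run: allowed
print(limiter.try_acquire("workflow-7"))  # immediate retry: refused
```

Token buckets and sliding windows follow from the same idea; the point is that the check runs before execution, so a looping agent is stopped rather than merely graphed.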
Works with what you already use
Auto-instruments 200+ libraries. Initialize before importing — no code changes required.
LLM PROVIDERS
OpenAI
Anthropic
Azure
+MORE
VECTOR DATABASES
Pinecone
Chroma
Weaviate
+MORE
FRAMEWORKS
LangChain
LlamaIndex
CrewAI
+MORE
INFRASTRUCTURE
PostgreSQL
Redis
MongoDB
+MORE
Autonomy, observed
Install the SDK, connect to your Waxell instance, and start capturing what your agents actually do.



