Logan Kelly
LangSmith gives you unmatched visibility into LangChain agents. Waxell governs what agents can do at runtime. Here's how to choose — and when you need both.

Two teams, same problem: they need visibility into their production AI agents. Team A is all-in on LangChain, building a customer support bot with limited external tool access. Team B has a multi-framework agent fleet that touches customer PII, calls financial APIs, and operates under compliance requirements. Both teams evaluate LangSmith.
Team A deploys it and never looks back. The native LangChain integration requires almost no setup, the trace explorer is excellent for debugging, and the dashboards show everything they need.
Team B runs into a wall. LangSmith shows them everything their agents are doing — and gives them no way to stop it. When an agent starts calling a PII-adjacent endpoint it shouldn't have access to, they see it in the trace. They can't block it. When they need to demonstrate to their compliance team that certain data handling policies are enforced at runtime, not just monitored, there's nothing to show. LangSmith is doing exactly what it's designed to do. It just isn't designed for what Team B needs.
The distinction here — between observability and runtime governance — is what this comparison is actually about.
LangSmith is an observability and evaluation platform built primarily for LangChain applications. It gives you traces, debugging, and quality dashboards. Waxell is a runtime governance platform that combines observability with policy enforcement, guardrails, and compliance controls that operate before agent actions execute. LangSmith tells you what your agent did. Waxell governs what your agent can do. If you need one, you probably know it. If you need both — that's a different conversation.
What does LangSmith actually do well?
LangSmith's core value is its deep, native integration with LangChain, and it delivers on that promise. For teams building on LangChain, the developer experience is the best in the observability space.
Tracing and debugging. LangChain chains stream to LangSmith automatically once tracing is enabled, and a single decorator (`@traceable`) instruments any other Python function. The trace explorer renders the full execution tree: each LLM call, its inputs and outputs, latency, token count, and cost. For debugging a misbehaving chain, the visibility is exceptional. You can see exactly where the model went wrong, replay specific steps, and compare outputs across runs.
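In practice the decorator-based setup looks like this. This is a minimal sketch: the fallback stub exists only so the snippet runs without the SDK installed, and the environment variable names reflect recent versions of the langsmith SDK.

```python
# Minimal sketch of LangSmith's decorator-based tracing. With the SDK
# installed and tracing enabled (LANGSMITH_API_KEY plus LANGSMITH_TRACING,
# or LANGCHAIN_TRACING_V2 in older versions), each call to a decorated
# function is recorded as a run in the LangSmith dashboard.
try:
    from langsmith import traceable  # the real decorator from the langsmith SDK
except ImportError:
    # No-op fallback so this sketch runs without the SDK installed.
    def traceable(fn=None, **kwargs):
        def wrap(f):
            return f
        return wrap(fn) if callable(fn) else wrap

@traceable
def summarize(text: str) -> str:
    # Stand-in for an LLM call; a real chain would invoke a model here.
    return text[:40] + "..." if len(text) > 40 else text

print(summarize("short input"))
```

With tracing enabled, the run appears in the dashboard with inputs, outputs, latency, and token counts attached; without it, the code behaves identically, which is why the integration overhead stays near zero.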
Evaluation framework. LangSmith includes built-in tooling to score outputs, run test datasets against your production prompts, and track quality regressions over time. If you're iterating on prompts and want to measure whether your changes actually improved things, the eval workflow is well-integrated with the tracing data.
Cost and latency dashboards. Per-run cost tracking, latency distributions, and alert configuration are first-class features. For teams optimizing their LangChain applications, the operational dashboards are solid.
For a pure LangChain shop, LangSmith is hard to beat on the observability dimension. The integration overhead is nearly zero, the UX is polished, and the debugging workflow is purpose-built for the way LangChain structures execution.
The limits appear when you step outside that narrow focus.
Where does LangSmith fall short?
No governance. This is the fundamental gap. LangSmith has no runtime policy enforcement, no tool access controls, no output filtering, and no rate limiting. It can show you that an agent called an endpoint it shouldn't have — after the call completed. It cannot stop that call from happening. For teams that need "prevent this, not just detect this," LangSmith is the wrong tool.
LangChain vendor lock-in. LangSmith is designed around LangChain's execution model. If you use CrewAI, LlamaIndex, or custom Python agents, the integration still works, but the native developer experience disappears. Multi-framework teams end up with uneven coverage: LangChain components visible in LangSmith, everything else instrumented manually at significant overhead.
No MCP governance. As the Model Context Protocol (MCP) becomes a standard for agent tool definitions, LangSmith has no native instrumentation for MCP-based tool calls and no MCP-mediated policy enforcement. Teams building MCP-native agent stacks need a different integration point.
Compliance limitations. LangSmith provides logs, but not the enforcement record that compliance audits require. "Here are our traces" is different from "here is documented evidence that Policy X was evaluated and enforced before every relevant action." Enterprise teams operating under SOC 2, HIPAA, or financial compliance requirements find the gap significant.
Feature comparison
| Capability | Waxell | LangSmith |
|---|---|---|
| **Observability** | | |
| Trace collection | ✅ Yes (3-line SDK) | ✅ Yes (native LangChain) |
| Cost & latency dashboards | ✅ Yes | ✅ Yes |
| Debugging UI | ✅ Yes | ✅ Yes (best-in-class for LangChain) |
| Evaluation framework | ⚠️ Manual | ✅ Yes (built-in) |
| **Governance & Runtime Control** | | |
| Runtime policy enforcement | ✅ Yes (core capability) | ❌ No |
| Tool access control | ✅ Yes | ❌ No |
| Output filtering / guardrails | ✅ Yes | ❌ No |
| Rate limiting (per session) | ✅ Yes | ❌ No |
| Compliance audit trail | ✅ Yes | ⚠️ Limited |
| Human-in-the-loop approval gates | ✅ Yes | ❌ No |
| **Framework & Stack** | | |
| LangChain | ✅ Yes | ✅ Yes (native) |
| CrewAI, LlamaIndex, custom Python | ✅ Yes (core design) | ⚠️ Clunky |
| MCP-native support | ✅ Yes | ❌ No |
| **Deployment** | | |
| Cloud SaaS | ✅ Yes | ✅ Yes |
| Self-hosted / on-premises | ✅ Yes | ⚠️ Enterprise only |
What governance actually looks like in production
The governance gap between the two platforms isn't abstract. Here's what it looks like in practice.
Scenario: An agent with access to a customer database starts querying fields it shouldn't.
LangSmith: You see the query in the trace — after it executed.
Waxell: The tool call is checked against a scope policy before execution. If the field is out of scope, the call is blocked and logged. The database is never queried.
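A pre-execution scope check of this kind can be sketched in a few lines of plain Python. This is illustrative only, assuming a simple allow-list policy; the names and policy shape are not Waxell's actual API.

```python
# Hypothetical sketch of a pre-execution field-scope check. The allow-list,
# exception name, and function shape are illustrative, not Waxell's API.
ALLOWED_FIELDS = {"customer_id", "order_status", "last_contact"}

class PolicyViolation(Exception):
    pass

def guarded_query(fields, run_query):
    """Evaluate the scope policy BEFORE the database is touched."""
    out_of_scope = set(fields) - ALLOWED_FIELDS
    if out_of_scope:
        # Blocked and logged; run_query is never called.
        raise PolicyViolation(f"out-of-scope fields: {sorted(out_of_scope)}")
    return run_query(fields)

# An in-scope call proceeds; a request for "ssn" would raise before querying.
result = guarded_query(["customer_id", "order_status"],
                       lambda fields: {f: "<value>" for f in fields})
```

The essential property is ordering: the policy verdict is computed before the side effect, so a violation produces an exception and an audit entry instead of a completed query.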
Scenario: A cost spike in an agent session — one session is making hundreds of tool calls.
LangSmith: You see the spend spike in the dashboard. The session runs until it hits an external rate limit or you manually intervene.
Waxell: A cost policy fires when the session exceeds its threshold. The session is halted. Other sessions continue unaffected.
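The per-session budget logic can be sketched the same way. Thresholds and class names here are illustrative assumptions, not Waxell's actual API; the point is that each session carries its own budget, so halting one leaves the others untouched.

```python
# Hypothetical per-session cost policy; names and limits are illustrative.
class SessionHalted(Exception):
    pass

class CostPolicy:
    def __init__(self, max_usd_per_session: float):
        self.max_usd = max_usd_per_session
        self.spend = {}  # session_id -> accumulated spend in USD

    def record(self, session_id: str, cost_usd: float):
        total = self.spend.get(session_id, 0.0) + cost_usd
        self.spend[session_id] = total
        if total > self.max_usd:
            # Only this session is halted; other sessions keep running.
            raise SessionHalted(f"{session_id} exceeded ${self.max_usd:.2f}")

policy = CostPolicy(max_usd_per_session=1.00)
policy.record("session-a", 0.40)  # within budget
policy.record("session-b", 0.90)  # a different session, independent budget
# A further policy.record("session-a", 0.70) would halt session-a only.
```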
Scenario: A compliance audit requires documentation that PII handling policies were enforced at runtime.
LangSmith: You can show traces. You cannot show evidence that policies were evaluated and enforced before each relevant action — because they weren't.
Waxell: The execution trace includes policy evaluation records alongside every tool call. The audit trail demonstrates enforcement, not just logging.
These aren't edge cases. They're standard operational questions for any production agent deployment at scale.
What migration looks like
If you're moving from LangSmith to Waxell — or running them in parallel:
From LangSmith (LangChain):
The instrumentation change is minimal. The difference is that Waxell evaluates governance policies at each step of execution rather than simply recording it.
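Conceptually, a governed trace interleaves policy evaluation with recording, so every step yields both an execution record and an enforcement verdict. The sketch below illustrates that pattern in plain Python; every name in it is a placeholder, not Waxell's published SDK.

```python
from contextlib import contextmanager

# Hypothetical governed-trace sketch: each step is recorded (observability)
# and checked against policy BEFORE it runs (governance). All names here
# are illustrative, not Waxell's actual SDK.
@contextmanager
def governed_trace(session_id, policies):
    audit = []  # becomes the enforcement record for this session

    def step(name, action, **ctx):
        for policy in policies:
            verdict = policy(name, ctx)  # evaluated before execution
            audit.append({"step": name, "policy": policy.__name__,
                          "verdict": verdict})
            if verdict == "block":
                raise PermissionError(f"{name} blocked by policy")
        return action()  # only reached if every policy allowed it

    yield step
    # On exit, `audit` holds every step plus the verdicts that preceded it.

def deny_db_writes(name, ctx):
    return "block" if name == "db_write" else "allow"

with governed_trace("sess-1", [deny_db_writes]) as step:
    step("llm_call", lambda: "ok")    # allowed: runs and is recorded
    # step("db_write", lambda: None)  # would raise before the write happens
```

The design point the sketch captures: the policy check sits between the trace entry and the action, so blocking and logging happen in the same place, and the audit trail records verdicts rather than just outcomes.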
Running both: You can run LangSmith for its debugging and evaluation UX while adding Waxell as the governance layer. The two tools don't conflict — LangSmith observes, Waxell governs. If your team relies on LangSmith's trace explorer for day-to-day debugging, you don't have to give it up.
When to choose LangSmith
- Your stack is LangChain end-to-end and you have no plans to change that.
- Debugging and trace exploration are your primary use case; LangSmith's UI is purpose-built for this.
- You're in development or iteration mode rather than production governance mode.
- You don't have compliance requirements that need runtime enforcement documentation.
- Evaluation tooling (scoring, regression tracking) is important to your workflow.
When to choose Waxell
- Your agent fleet spans multiple frameworks (LangChain plus anything else).
- You need runtime governance: tool access control, output filtering, cost enforcement, rate limiting.
- You operate in a regulated environment where compliance documentation requires enforcement records, not just logs.
- Your agents use MCP for tool definitions and you need governance at the MCP layer.
- You need the ability to stop specific running sessions without credential revocation or service restarts.
- You want a single platform for observability and governance rather than two separate systems.
How Waxell handles this: Waxell's governance plane operates above agent code — evaluated before each tool call and output, independent of the agent's own logic. Governance policies enforce tool access scope, cost limits, output filters, and approval gates at runtime. Execution tracing captures the full session record including policy evaluations and enforcement actions, producing an audit trail that documents both what the agent did and what the governance layer enforced. Framework-agnostic: the same three-line integration works across LangChain, CrewAI, and custom Python agents.
LangSmith is good at what it does. If you're a LangChain team that needs debugging visibility and evaluation tooling, it's the right choice and there's no reason to move off it.
The limit isn't a product shortcoming — it's a design decision. LangSmith is built to help you understand agent behavior. Waxell is built to control it. When your agents are in production at scale, touching sensitive data, or operating under compliance requirements, "understand" and "control" aren't the same requirement.
If you're evaluating both: build the prototype with whatever gives you the fastest feedback loop. When you're ready to deploy to production with governance, get early access to Waxell.
Frequently Asked Questions
Is Waxell a LangSmith alternative? It depends on what you mean by "alternative." For observability and tracing, Waxell is a full replacement: it provides traces, spans, cost tracking, and session monitoring across any framework, not just LangChain. For evaluation and debugging workflows — scoring outputs, running test datasets, A/B comparing prompts — LangSmith has more purpose-built tooling. The more important comparison is governance: Waxell adds runtime policy enforcement, tool access control, output filtering, and compliance audit trails that LangSmith doesn't offer. Teams that need governance need Waxell; teams that only need observability can use either.
Can I use Waxell and LangSmith together? Yes, and many teams do. LangSmith provides excellent debugging and evaluation tooling for LangChain applications; Waxell provides the governance layer that operates at runtime. The two tools don't conflict — LangSmith records execution after the fact, Waxell enforces policies before tool calls execute. If your team depends on LangSmith's trace explorer for debugging, you don't have to give it up to add Waxell governance.
Does Waxell work with LangChain? Yes. Waxell integrates with LangChain, CrewAI, LlamaIndex, and custom Python agents through the same SDK. For LangChain specifically, the instrumentation replaces LangSmith's @traceable decorator with a Waxell trace context manager — a minimal change that adds governance enforcement to the existing observability. LangChain support is part of Waxell's core framework compatibility, not a special integration.
What governance features does LangSmith lack? LangSmith has no runtime policy enforcement: it records what agents do but cannot prevent actions, block tool calls, enforce tool access scope, filter outputs, or rate-limit sessions. It also has limited compliance audit trail capability — it produces logs but not enforcement records that demonstrate policies were evaluated and applied before sensitive actions executed. For teams with compliance requirements that need evidence of runtime controls, not just post-hoc logs, LangSmith's audit coverage is insufficient.
What does migration from LangSmith to Waxell involve? For LangChain applications, the instrumentation change is minimal: replace LangSmith's @traceable decorators with Waxell trace context managers. The SDK initialization adds three lines at setup. Framework detection is automatic for LangChain chains. The main effort is configuration: defining governance policies (tool access scope, cost limits, output filters) that match your production requirements. LangSmith trace history stays in LangSmith; Waxell begins a fresh record. Teams can run both in parallel during transition to validate coverage parity.