Logan Kelly

Mar 19, 2026

The Kill Switch Problem: How to Stop an AI Agent That's Gone Wrong

Every production agent eventually needs to be stopped — mid-run, immediately, or permanently. Most teams have no defined way to do it. Here's what an emergency stop layer actually looks like.

An email triage agent is deployed to a company's customer support operation. It reads incoming tickets, categorizes them, routes them to queues, and sends acknowledgment emails. Testing goes well. Staging behaves correctly. Two weeks into production, a configuration error in the routing logic creates a loop: the agent misclassifies a ticket, the misclassification triggers a re-routing rule, the re-routing rule causes the agent to read the ticket again. The acknowledgment email fires on each iteration. By the time a support manager notices that the same customer has received 89 identical emails in 31 minutes, the agent is mid-cycle on its ninetieth. The manager calls the on-call engineer and asks how to stop it.

There is no documented answer. The engineer revokes the API key. The agent stops — but not before routing cycles that were already in flight complete, and the customer's final count reaches 104 emails. The key revocation also cuts off two other agent sessions that were behaving correctly.

This scenario plays out in different forms every time an agent fleet hits production at scale. The tool changes — it might be a database write, an external API call, a downstream notification — but the structure is the same: something goes wrong, and the team has no defined, clean way to stop it.

Software has answers to this. Services have health checks, circuit breakers, and graceful shutdown procedures. Databases have transaction rollbacks. Distributed systems have dead man's switches. Agents — systems that run autonomously, take sequences of actions, and produce effects in the world — typically have whatever ad hoc mechanism the engineer reached for the first time something went sideways.

An agent circuit breaker is a mechanism that automatically detects and halts agent execution when specified behavioral thresholds are exceeded — action counts, cost limits, error rates, or repeated-call patterns — without requiring manual intervention. A kill switch is the manual counterpart: a defined procedure for immediately and cleanly terminating a specific running agent session and preventing further tool calls from executing. Together they form the emergency stop layer that every production agentic system requires. The key word is layer: neither mechanism works reliably when it lives inside the agent's own code or context.

Why is stopping an agent harder than stopping other software?

The standard answer to runaway software is straightforward: kill the process. For most components — a web server, a batch job, a microservice — that answer is accurate and sufficient. The process terminates, the work stops, and you have a restart procedure if the state needs recovering.

Agents differ in two ways that make naive process termination dangerous.

The first is tool calls in flight. An agent isn't just a process — it's an orchestrator of external effects. At the moment you decide to terminate it, the agent may have initiated tool calls that haven't returned yet. A database write is mid-transaction. An external API call is pending. An email is queued. Kill the agent process and those calls either complete without supervision or fail mid-execution, potentially leaving external state inconsistent. The process termination answers the computational problem. It doesn't answer the external state problem, which is the one that requires cleanup.
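One way to make in-flight calls visible at stop time is to register each call before it is dispatched and clear it when it returns. The sketch below is a minimal illustration under that assumption; `InFlightRegistry` and the tool names are hypothetical, not part of any particular framework.

```python
import threading
import uuid


class InFlightRegistry:
    """Tracks tool calls that have started but not yet returned."""

    def __init__(self):
        self._calls = {}  # call_id -> (session_id, tool name)
        self._lock = threading.Lock()

    def begin(self, session_id, tool):
        """Register a call just before it is dispatched; returns its id."""
        call_id = uuid.uuid4().hex
        with self._lock:
            self._calls[call_id] = (session_id, tool)
        return call_id

    def end(self, call_id):
        """Mark a call as returned (successfully or not)."""
        with self._lock:
            self._calls.pop(call_id, None)

    def pending(self, session_id):
        """Tool names still in flight for one session, sorted for stable output."""
        with self._lock:
            return sorted(t for s, t in self._calls.values() if s == session_id)


registry = InFlightRegistry()
write_id = registry.begin("session-a", "db.write")
registry.begin("session-a", "email.send")
registry.end(write_id)  # the database write has returned
print(registry.pending("session-a"))  # what a stop would have to account for
```

With a registry like this, whoever stops the session at least knows which external effects are unaccounted for, rather than discovering them afterward.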

The second is multi-step task context. Agents execute multi-step workflows where intermediate steps create commitments: records opened but not closed, webhooks triggered with expected follow-up calls that won't arrive, downstream agents or systems notified of a workflow that was never completed. Stopping mid-sequence without cleanup leaves the surrounding systems in a state they weren't designed to handle. The agent is gone; the partial effects remain.

This is the kill switch problem in its precise form: it's not technically difficult to stop an agent. It's difficult to stop an agent cleanly, in a way that's scoped to the right session, that handles in-flight calls gracefully, and that leaves the external world in a known state. Most teams first discover this distinction under pressure, which is the worst possible moment to be working it out.

What breaks when there's no emergency stop layer?

Runaway loops without rate control. Without an automated circuit breaker, a looping agent runs until it exhausts an external limit — API rate limit, account balance, infrastructure quota — or until a human intervenes manually. In the interval, it executes at full speed: burning compute, writing to databases, sending messages, triggering downstream workflows. Loops that look benign in low-traffic testing become expensive quickly at production volume. A circuit breaker is a rate-limiting device at its simplest; at a more sophisticated level, it detects patterns that pure rate limits miss — the same parameters called repeatedly, tool calls outside the expected sequence, cost accumulating faster than the task should require.
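The "same parameters called repeatedly" pattern can be detected by fingerprinting each tool call and counting consecutive matches. This is a rough sketch, assuming call parameters are JSON-serializable; `call_fingerprint` and `RepeatDetector` are illustrative names, and the limit of three is an arbitrary default.

```python
import hashlib
import json


def call_fingerprint(tool, params):
    """Stable hash of a tool call; key order in params does not matter."""
    canonical = json.dumps({"tool": tool, "params": params},
                           sort_keys=True, default=str)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class RepeatDetector:
    """Counts consecutive identical tool calls within one session."""

    def __init__(self, max_repeats=3):
        self.max_repeats = max_repeats
        self._last = None
        self._streak = 0

    def observe(self, tool, params):
        """Record a call; return True once the streak reaches the limit."""
        fp = call_fingerprint(tool, params)
        if fp == self._last:
            self._streak += 1
        else:
            self._last, self._streak = fp, 1
        return self._streak >= self.max_repeats


detector = RepeatDetector(max_repeats=3)
for _ in range(3):
    looping = detector.observe("send_ack", {"ticket": 4512, "to": "customer"})
print(looping)  # True on the third identical call
```

A plain rate limit would not catch this case at low volume; the fingerprint check fires on the pattern rather than the speed.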

Improvised incident response. Without a defined kill switch procedure, every incident requiring a stop becomes an improvised response. Engineers need to locate the running session, identify the right credential to revoke, assess whether revoking it causes downstream problems, and verify that the agent actually stopped. This improvisation takes minutes, during which the agent continues running. In the email scenario above, the 15 emails sent after the manager raised the alarm (89 at detection, 104 by the time the agent stopped) weren't caused by technical complexity; they were caused by the absence of a procedure. Documented, rehearsed kill switch procedures turn that gap into seconds.

Unclean state on termination. An agent stopped via credential revocation doesn't know it was stopped. In-flight tool calls may complete after termination; partial state written during the session remains; cleanup logic that would have run on graceful shutdown doesn't execute. The external world is left in whatever condition the agent happened to be in at the moment of interruption. Teams that encounter this typically add "manual state audit after agent incidents" to their runbooks — which isn't a solution, it's a record of a repeated operational problem.

Opaque post-incident analysis. "Stop the agent" and "understand what it did before we stopped it" are separate problems. Without execution tracing, the second problem is archaeological reconstruction: reading logs, inferring intent from outputs, trying to determine what state existed at each step. If you have a replay-capable trace of the agent's execution captured up to the termination point, post-incident analysis is forensics on a known record. If you don't, it's speculation. The absence of a kill switch and the absence of a trace aren't independent failures — they compound each other.

What does an emergency stop layer actually look like?

There are three distinct capabilities a production emergency stop layer requires. They're often conflated in ways that lead teams to believe they have coverage when they don't.

A session-scoped kill switch with a defined procedure. This is the simplest component and the one most teams believe they have until they need it. The procedure needs to specify: how to identify the specific session to terminate (not the agent class, the running instance), how to prevent new tool calls from executing immediately, what handling applies to in-flight calls, and how to verify that the session is stopped. "Revoke the API key" is a kill switch with bad scope: it terminates all sessions for that credential regardless of which one is misbehaving, it does nothing about tool calls already in flight, and it leaves other correct-behaving sessions offline as collateral damage. A properly scoped kill switch terminates a named session, signals in-flight calls to abort or complete without further chaining, and leaves other sessions running.
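The scoping difference is easier to see in code. Here is a minimal sketch of a kill switch keyed by session ID rather than by credential; `KillSwitch`, `SessionKilled`, and the session names are hypothetical, and handling of calls already in flight is deliberately left out.

```python
import threading


class SessionKilled(Exception):
    """Raised when a tool call is attempted on a terminated session."""


class KillSwitch:
    """Registry of terminated sessions, keyed by session ID."""

    def __init__(self):
        self._killed = {}  # session_id -> reason
        self._lock = threading.Lock()

    def kill(self, session_id, reason):
        """Mark one named session as terminated; other sessions are untouched."""
        with self._lock:
            self._killed.setdefault(session_id, reason)

    def is_killed(self, session_id):
        with self._lock:
            return session_id in self._killed

    def check(self, session_id):
        """Call before every tool call; raises if the session was terminated."""
        with self._lock:
            reason = self._killed.get(session_id)
        if reason is not None:
            raise SessionKilled(f"session {session_id} terminated: {reason}")


switch = KillSwitch()
switch.kill("triage-7f3a", reason="acknowledgment email loop")
switch.check("triage-02bc")      # a healthy session keeps running
try:
    switch.check("triage-7f3a")  # the looping session is blocked
except SessionKilled as exc:
    print(exc)
```

Verification, the last item in the procedure, could be as simple as confirming `is_killed` for the session and that no further tool calls appear in its trace.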

An automated circuit breaker with behavioral triggers. The kill switch handles situations where a human has noticed a problem. The circuit breaker handles situations where the problem is moving faster than human detection, or where the behavioral anomaly is too subtle to be immediately obvious. Circuit breaker policy operates at the infrastructure layer: if the agent makes more than N tool calls in M minutes, suspend and alert. If the same tool call fires with identical parameters three times in sequence, halt and require human confirmation. If a cost threshold is exceeded mid-session, pause and surface the alert. These aren't instructions the agent follows — they're checks that execute regardless of what the agent decided to do, at a layer the agent can't bypass.
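As a rough illustration, two of those triggers (more than N calls in a time window, and a cost ceiling) fit in a small class. This is a sketch under simplifying assumptions: a single session, in-memory state, and a caller that reports the cost of each call. `CircuitBreaker` and its parameters are illustrative names only.

```python
import time
from collections import deque


class CircuitBreaker:
    """Halts a session when call rate or accumulated cost exceeds a limit."""

    def __init__(self, max_calls, window_seconds, max_cost, clock=time.monotonic):
        self.max_calls = max_calls
        self.window_seconds = window_seconds
        self.max_cost = max_cost
        self._clock = clock      # injectable so the breaker can be tested
        self._calls = deque()    # timestamps of calls inside the window
        self._cost = 0.0
        self.tripped_reason = None

    def record(self, cost=0.0):
        """Record one attempted tool call; return "allow" or "halt"."""
        if self.tripped_reason:
            return "halt"        # stays open until a human resets it
        now = self._clock()
        self._calls.append(now)
        while self._calls and now - self._calls[0] > self.window_seconds:
            self._calls.popleft()
        self._cost += cost
        if len(self._calls) > self.max_calls:
            self.tripped_reason = "call_rate"
        elif self._cost > self.max_cost:
            self.tripped_reason = "cost"
        return "halt" if self.tripped_reason else "allow"


breaker = CircuitBreaker(max_calls=3, window_seconds=60, max_cost=5.0)
decisions = [breaker.record(cost=0.02) for _ in range(4)]
print(decisions, breaker.tripped_reason)
```

The important property is where `record` runs: in the layer that dispatches tool calls, not in anything the agent itself can skip.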

A rollback definition for what "stopped" means. Terminating an agent doesn't reverse what it already did. The question of how far to roll back external state, and to what target, needs to be defined before an incident. Not all agent actions are reversible: a sent email is permanent, an external API call with side effects usually is too, and a third-party notification may or may not be retractable depending on the service. The rollback definition documents the realistic scope: which effects can be undone and which can't, what the target state is for effects that are reversible, and what procedure triggers rollback. This isn't a technical mechanism; it's an operational document. Its value is that it turns incident response into execution rather than improvisation. Teams that define rollback scope in advance make different decisions when they configure tool access, because the question "what happens if we need to stop this mid-run?" has a concrete answer.
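Although the rollback definition is a document, it can help to keep it in a machine-readable form next to the tool configuration. The sketch below assumes a hypothetical catalogue of three tools for the email triage example; the tool names and rollback targets are invented for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Reversibility(Enum):
    REVERSIBLE = "reversible"
    PERMANENT = "permanent"


@dataclass(frozen=True)
class EffectPolicy:
    tool: str
    reversibility: Reversibility
    rollback_target: str  # target state for reversible effects, else "n/a"


# Hypothetical rollback scope for an email triage agent.
ROLLBACK_SCOPE = {
    p.tool: p
    for p in [
        EffectPolicy("db.update_ticket", Reversibility.REVERSIBLE, "last confirmed checkpoint"),
        EffectPolicy("queue.route_ticket", Reversibility.REVERSIBLE, "original queue"),
        EffectPolicy("email.send_ack", Reversibility.PERMANENT, "n/a"),
    ]
}


def plan_rollback(executed_tools):
    """Split executed tools into effects to undo and effects needing manual review.

    Tools missing from the catalogue are treated conservatively as manual review.
    """
    plan = {"undo": [], "manual_review": []}
    for tool in executed_tools:
        policy = ROLLBACK_SCOPE.get(tool)
        if policy is not None and policy.reversibility is Reversibility.REVERSIBLE:
            plan["undo"].append(tool)
        else:
            plan["manual_review"].append(tool)
    return plan


print(plan_rollback(["db.update_ticket", "email.send_ack"]))
```

Writing the catalogue forces the question of reversibility to be answered per tool, before the tool is granted to an agent.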

What does this architecture unlock?

An emergency stop layer changes what's operationally possible in ways that compound with scale.

Broad tool access becomes less risky. One of the practical constraints on how aggressively teams give agents tool access is the question of blast radius: if something goes wrong, how bad can it get? An agent with narrow tool access has limited blast radius by construction. An agent behind a reliable circuit breaker has bounded blast radius by policy: it is halted when its behavior diverges from expectations, regardless of what tools it has access to. The circuit breaker substitutes for artificially narrow scope in many scenarios, allowing agents to be more capable because they're more safely stoppable.

Human-in-the-loop checkpoints become enforceable. The "agent runs autonomously until a threshold, then pauses for confirmation" pattern requires that pause actually works. Without a reliable pause mechanism at the infrastructure level, human-in-the-loop is a best-effort arrangement — the agent will try to pause, unless it's in a state where it doesn't reach the pause logic. With a circuit breaker that operates outside the agent's execution context, pause is infrastructure-enforced, not agent-behavior-dependent.
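A sketch of what infrastructure-enforced pause could look like: a gate that sits in front of every action and refuses to proceed once a threshold is reached, until a human approves. `ApprovalGate` is a hypothetical name, and the threshold of two actions is only for demonstration.

```python
class ApprovalGate:
    """Pauses a session after a fixed number of actions until a human approves."""

    def __init__(self, actions_per_approval):
        self.actions_per_approval = actions_per_approval
        self._since_approval = 0
        self.paused = False
        self.last_approver = None

    def before_action(self):
        """Return True if the action may proceed, False if the session is paused."""
        if self.paused:
            return False
        if self._since_approval >= self.actions_per_approval:
            self.paused = True  # the agent cannot clear this flag itself
            return False
        self._since_approval += 1
        return True

    def approve(self, approver):
        """Called by a human operator, outside the agent's execution context."""
        self.paused = False
        self._since_approval = 0
        self.last_approver = approver


gate = ApprovalGate(actions_per_approval=2)
print([gate.before_action() for _ in range(4)])  # third and fourth are refused
gate.approve("on-call engineer")
print(gate.before_action())
```

Because the agent only ever sees a refused action, the pause does not depend on the agent reaching its own pause logic.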

Incidents leave clean records. When an agent is stopped via a defined procedure, the execution trace records a clean termination event: the state at stop, the policy that triggered it, the in-flight calls and their resolution. When an agent is stopped by credential revocation under pressure, the trace — if it exists — shows an abrupt interruption with no context for why or what was happening. The difference matters for post-incident analysis, compliance documentation, and determining whether the same failure can recur.
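For illustration, a clean termination event might carry fields like the ones below. The schema is an assumption made for this sketch (`TerminationEvent`, `InFlightResolution`, and the field names are invented), not a description of any specific tracing format.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class InFlightResolution:
    tool: str
    resolution: str  # "completed", "aborted", or "unknown"


@dataclass
class TerminationEvent:
    session_id: str
    triggered_by: str         # policy name or operator identity
    reason: str
    last_completed_step: int
    in_flight: list = field(default_factory=list)

    def to_json(self):
        """Serialize the event for appending to the session's trace."""
        return json.dumps(asdict(self), sort_keys=True)

    def unresolved(self):
        """Tools whose in-flight calls still need a manual check."""
        return [c.tool for c in self.in_flight if c.resolution == "unknown"]


event = TerminationEvent(
    session_id="triage-7f3a",
    triggered_by="repeated-call policy",
    reason="send_ack fired 3 times with identical parameters",
    last_completed_step=41,
    in_flight=[
        InFlightResolution("email.send_ack", "aborted"),
        InFlightResolution("crm.update", "unknown"),
    ],
)
print(event.to_json())
print(event.unresolved())
```

An abrupt credential revocation produces none of these fields; the trace simply ends.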

How Waxell handles this: Waxell's governance plane includes a kill policy type that terminates agent sessions at the infrastructure layer — scoped to specific named sessions rather than credentials, with configurable handling for in-flight tool calls at the point of termination. Operations policies implement circuit breaker behavior: action count limits, cost thresholds, and repeated-call detection that halt execution and surface alerts before manual intervention becomes necessary. Execution tracing captures the full execution state at the point of any termination, making every stopped session a replay-capable forensic record rather than a gap in the trace.

The kill switch problem sounds like an edge case until the first 31-minute loop. Most teams encounter it once, add something improvised to their runbooks, and treat it as resolved. The improvisation works until the loop runs faster, the external effects are harder to reverse, or the incident happens outside business hours when the one engineer who knows which key to revoke is unavailable.

Circuit breakers and kill switches aren't features you add once infrastructure is mature. They're prerequisites for production deployment, in the same category as execution tracing and policy enforcement. An agent without a defined emergency stop procedure isn't production-ready — it's deployed with no off-ramp.

The teams that build emergency stop infrastructure before they need it spend an incident resolving a technical problem. The teams that don't spend the same incident improvising a way to stop the agent.

If you're building agent infrastructure and want kill policy and circuit breaker capabilities at the governance layer, get early access to Waxell.

Frequently Asked Questions

What is an AI agent kill switch? An AI agent kill switch is a defined procedure for immediately and cleanly terminating a specific running agent session, preventing further tool calls from executing, and leaving external systems in a known state. A properly scoped kill switch differs from credential revocation in that it targets a named session rather than all sessions sharing a credential, signals in-flight tool calls to abort or complete without further chaining, and preserves the execution trace up to the termination point. The kill switch is the manual component of the emergency stop layer; the automated component is the circuit breaker.

What is an AI agent circuit breaker? An AI agent circuit breaker is an automated mechanism that monitors agent behavior against defined thresholds — action counts, cost limits, repeated-call patterns, anomaly scores — and halts execution when those thresholds are exceeded, without requiring manual intervention. Unlike a kill switch, which is triggered by a human who has noticed a problem, a circuit breaker fires automatically based on behavioral rules defined in advance. It operates at the infrastructure layer, outside the agent's execution context, so it fires regardless of what the agent's own logic does or fails to do.

Why is stopping an AI agent harder than stopping other software? Standard software can usually be stopped by killing the process. Agents are harder to stop cleanly because they have tool calls in flight at the moment of termination (external API calls, database writes, downstream notifications) that may complete or fail mid-execution, leaving external state inconsistent. Agents also execute multi-step workflows where stopping mid-sequence leaves partially completed chains that other systems may expect to be complete. A robust kill switch needs to address both: preventing new tool calls from starting and handling in-flight calls gracefully, not just terminating the local process.

What should an agent rollback procedure include? A rollback procedure should define three things: what external effects are reversible (database writes, internal state) versus permanent (sent emails, external API calls, third-party notifications); what the rollback target is (last confirmed checkpoint, last known-good state); and who is authorized to initiate rollback. Rollback procedures that imply full reversibility fail when applied to actions that are inherently permanent. The goal is to document the realistic scope of recovery before an incident, so the response during an incident is execution rather than improvisation under pressure.

How do I implement an emergency stop for my AI agent? The minimum implementation requires two components: a session-scoped termination mechanism (not credential revocation) and a policy check before every tool call executes. The termination mechanism marks a specific running session for shutdown so that the next pre-call policy check sees the signal and stops execution without completing the call. The policy check must run at the infrastructure layer — not inside the agent code, where a misbehaving agent could bypass it. For the circuit breaker, you need a state store that tracks per-session action counts, call patterns, and cost, and can trigger the termination mechanism automatically. Many teams start with a wrapper around tool functions that performs pre-call checks; a production-grade implementation puts this logic in a dedicated policy layer outside the agent's execution context, so it can't be circumvented by the agent it governs.
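The "wrapper around tool functions" starting point might look like the sketch below. It assumes a `stop_requested` callable that reports whether a session has been marked for shutdown; `guarded`, `SessionStopped`, and `send_ack` are hypothetical names. As noted above, a wrapper like this lives in the agent's process, so it is a first step rather than the production-grade version.

```python
import functools


class SessionStopped(Exception):
    """Raised instead of executing a tool call on a stopped session."""


def guarded(session_id, stop_requested):
    """Decorator factory: check stop_requested(session_id) before each tool call."""
    def decorate(tool_fn):
        @functools.wraps(tool_fn)
        def wrapper(*args, **kwargs):
            if stop_requested(session_id):
                raise SessionStopped(session_id)  # the call never starts
            return tool_fn(*args, **kwargs)
        return wrapper
    return decorate


stopped_sessions = set()  # stand-in for a shared state store
sent = []


@guarded("triage-7f3a", lambda sid: sid in stopped_sessions)
def send_ack(ticket_id):
    sent.append(ticket_id)
    return "sent"


send_ack(4512)                         # runs normally
stopped_sessions.add("triage-7f3a")    # operator or circuit breaker marks the session
try:
    send_ack(4512)
except SessionStopped:
    print("blocked; emails sent so far:", sent)
```

The same `stop_requested` hook can be fed by a manual kill switch and by an automated circuit breaker, which keeps both paths converging on one termination mechanism.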
