Logan Kelly
Feb 10, 2026
"Guardrails" means different things in different contexts. Here's how policy enforcement for AI agents actually works — pre-execution blocking, mid-execution interception, and post-execution audit.

"Guardrails" is the word that does the most work in the AI safety conversation while carrying the least specification.
Ask ten engineers what they mean by guardrails and you'll get ten different answers. Some mean output filtering — checking what the model says before it goes to the user. Some mean system prompt instructions that tell the model to behave well. Some mean topic restrictions that prevent the model from engaging with certain domains. These are all real things. None of them is what I mean by policy enforcement, and none of them constitutes a governance strategy.
This post is about what policy enforcement for AI agents actually looks like when it needs to be reliable, auditable, and effective in production — not just plausible in a demo.
AI agent policy enforcement is the practice of implementing rules that govern agent behavior through technical mechanisms outside the agent's own reasoning — not through system prompt instructions, but through enforcement layers that act regardless of what the model decides. Effective enforcement operates at three temporal moments: pre-execution (blocking a proposed action before it fires), mid-execution (intercepting the action stream as it's happening), and post-execution (auditing completed actions and triggering remediation). Each moment catches different risks; together they constitute a complete governance layer. (See also: What is agentic governance → · The governance gap →)
The Fundamental Problem with Prompt-Based Guardrails
The most common approach to agent behavior control is the system prompt. You include instructions like "do not share confidential information," "always ask for confirmation before sending emails," "do not call the payment API without explicit user approval." These feel authoritative. They're not.
System prompt instructions are suggestions to a probabilistic system. LLMs follow them most of the time. They don't follow them all of the time. Under adversarial conditions — prompt injection, unusual input formats, carefully constructed edge cases — compliance rates with system prompt instructions drop significantly. Under distribution shift (inputs that don't match your training or testing distribution), they drop unpredictably.
This doesn't mean you shouldn't use system prompts thoughtfully. You should. But system prompts are not a governance layer. They're part of the user experience design. Governance requires enforcement mechanisms that exist outside the model's reasoning process — mechanisms that act regardless of what the model decides.
Those three enforcement moments — pre-execution, mid-execution, and post-execution — each catch different risks. Together they constitute a complete governance layer.
Pre-Execution: Block Before It Happens
Pre-execution enforcement intercepts a proposed action before it's executed. The agent has decided it wants to do something. The policy layer evaluates whether it's allowed to. If not, the action is blocked, and the agent receives a response indicating the block and why.
This is the most powerful enforcement position because it prevents consequences before they occur. No data is transmitted. No tool is called. No cost is incurred. The bad action simply doesn't happen.
What pre-execution enforcement covers:
Input inspection. Before the agent's input is processed by the LLM, it can be scanned for content that violates policy — PII that shouldn't enter the context, injection patterns, content categories you've flagged as restricted. If the input fails inspection, you can sanitize it, reject it, or route it differently before it ever reaches the model.
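As a minimal sketch of that inspection step — the patterns and return shape here are illustrative, not a real detection service; a production deployment would use a dedicated PII/injection classifier rather than two regexes:

```python
import re

# Hypothetical patterns -- stand-ins for a real PII/injection detection service.
PII_PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
}
INJECTION_PATTERNS = [re.compile(r"ignore (all )?previous instructions", re.I)]

def inspect_input(text: str) -> dict:
    """Scan raw input before it reaches the model; return a verdict plus findings."""
    findings = [name for name, pat in PII_PATTERNS.items() if pat.search(text)]
    if any(p.search(text) for p in INJECTION_PATTERNS):
        return {"action": "reject", "findings": findings + ["injection"]}
    if findings:
        # Sanitize rather than reject: redact the matched spans, then pass through.
        for name, pat in PII_PATTERNS.items():
            text = pat.sub(f"[REDACTED:{name}]", text)
        return {"action": "sanitize", "findings": findings, "text": text}
    return {"action": "pass", "findings": [], "text": text}
```

The important structural point is that all three outcomes — reject, sanitize, pass — are decided before the model ever sees the input.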
Action authorization. Before a tool call is executed — before an API is hit, a file is written, a database is updated, an email is sent — a policy check determines whether this action is permitted. The authorization decision can be based on the action type, the parameters of the action, the session context, the user's permission level, or any combination. This is where you enforce "do not call the payment API without explicit user approval" in a way that actually works — not through a prompt instruction, but through an enforcement gate that the model cannot reason its way around.
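A sketch of such an authorization gate, assuming hypothetical `action` and `session` shapes (the tool names and session flags here are made up for illustration):

```python
def authorize_action(action: dict, session: dict) -> dict:
    """Evaluate a proposed tool call against explicit rules before execution."""
    tool = action["tool"]
    # Hard constraint: the payment API requires explicit, recorded user approval.
    if tool == "payment_api" and not session.get("user_approved_payment"):
        return {"allowed": False, "reason": "payment requires explicit user approval"}
    # Conditional permission: email send is gated on the user's permission level.
    if tool == "send_email" and session.get("role") not in ("agent_operator", "admin"):
        return {"allowed": False, "reason": f"role {session.get('role')!r} may not send email"}
    return {"allowed": True, "reason": "no matching block rule"}
```

Because this check runs in the execution path rather than in the prompt, the rule holds no matter what the model's reasoning concluded.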
Spend pre-authorization. Before initiating an operation that will incur cost — a long context call, an expensive tool invocation — a budget check determines whether the session has remaining allocation. If not, the operation is blocked before cost is incurred.
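The budget check is the simplest of the three gates. A sketch, assuming the session carries a token budget and a running spend counter (both field names illustrative):

```python
def preauthorize_spend(estimated_tokens: int, session: dict) -> dict:
    """Block an operation before cost is incurred if it would exceed the session budget."""
    budget = session["token_budget"]
    spent = session["tokens_spent"]
    if spent + estimated_tokens > budget:
        return {"allowed": False, "remaining": budget - spent}
    return {"allowed": True, "remaining": budget - spent - estimated_tokens}
```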
Mid-Execution: Intercept In-Flight
Pre-execution enforcement assumes you can predict what actions an agent will want to take. For simple, well-defined agents, you can. For more complex agents with multi-step reasoning and dynamic tool selection, there will be cases where an action sequence you didn't fully anticipate emerges.
Mid-execution enforcement intercepts the agent's action stream as it's happening and applies policies in real time, including policies based on accumulated context that wasn't available at the start of the session.
What mid-execution enforcement covers:
Tool result inspection. A tool call was made and permitted. The result comes back. Before that result is appended to the agent's context, it's inspected — for PII that shouldn't enter context, for injection patterns, for content policy violations, for schema anomalies that indicate something unexpected happened. If the result fails inspection, it can be redacted, replaced with a sanitized version, or flagged.
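A sketch of result inspection covering two of those checks — schema anomaly detection and PII redaction. The expected-keys contract and the single email pattern are illustrative assumptions:

```python
import re

EMAIL = re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b")

def inspect_tool_result(result: dict, expected_keys: set) -> dict:
    """Inspect a tool result before it is appended to the agent's context."""
    # Schema anomaly: unexpected keys suggest the tool returned something unusual.
    extra = set(result) - expected_keys
    if extra:
        return {"verdict": "flag", "detail": f"unexpected keys: {sorted(extra)}"}
    # Redact PII from string fields before the result enters context.
    sanitized = {
        k: EMAIL.sub("[REDACTED:email]", v) if isinstance(v, str) else v
        for k, v in result.items()
    }
    verdict = "redacted" if sanitized != result else "pass"
    return {"verdict": verdict, "result": sanitized}
```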
Sequence-level policy evaluation. Some policies only make sense at the sequence level, not the individual action level. If an agent has made five different external API calls in a single session, that pattern may be a policy violation even if each individual call was permitted. Mid-execution monitoring can track patterns across a session and trigger policy responses based on accumulated behavior.
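The five-external-calls example above can be sketched as a small per-session tracker; the action-type label and the limit are illustrative:

```python
from collections import Counter

class SequencePolicy:
    """Track per-session action counts; flag when accumulated behavior crosses a limit."""

    def __init__(self, max_external_calls: int = 5):
        self.max_external_calls = max_external_calls
        self.counts: Counter = Counter()

    def record(self, action_type: str) -> dict:
        """Record one action and evaluate the sequence-level rule."""
        self.counts[action_type] += 1
        if (action_type == "external_api"
                and self.counts["external_api"] > self.max_external_calls):
            return {"violation": True,
                    "reason": f"more than {self.max_external_calls} external API calls this session"}
        return {"violation": False}
```

Each individual call passes; the violation only exists at the level of the accumulated sequence, which is exactly what pre-execution checks on single actions cannot see.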
Budget enforcement. As token spend accumulates within a session, mid-execution monitoring tracks against the budget ceiling and triggers predefined responses — compression, warning, capping — as thresholds are approached and crossed.
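The threshold-to-response mapping can be sketched as a simple tiering function; the specific thresholds and response names here are assumptions, not fixed values:

```python
def budget_response(spent: int, budget: int) -> str:
    """Map accumulated spend to a predefined response tier (thresholds illustrative)."""
    ratio = spent / budget
    if ratio >= 1.0:
        return "cap"        # hard stop: no further model calls this session
    if ratio >= 0.9:
        return "compress"   # compress context to stretch remaining budget
    if ratio >= 0.75:
        return "warn"       # emit a warning event for operators
    return "ok"
```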
Post-Execution: Audit and Remediate
Post-execution isn't enforcement in the sense of preventing actions — the action has already occurred. It's the foundation of your audit trail and the trigger for remediation workflows.
What post-execution covers:
Audit record creation. Every action the agent took, every policy evaluation that was performed, every enforcement decision that was made — logged with full context. The audit record should be sufficient to reconstruct what happened and why, including what the agent's context was at the time a decision was made. "The call was made" is not sufficient. "The call was made, here is the full context at that moment, here is the policy that was evaluated, here is the outcome" is sufficient.
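A sketch of that "sufficient" record — field names are illustrative; the point is that the context snapshot, the policy evaluated, and the outcome are all captured alongside the action itself:

```python
import json
import time

def audit_record(action: dict, context_snapshot: dict,
                 policy_id: str, decision: str) -> str:
    """Build an audit record sufficient to reconstruct the decision later."""
    record = {
        "timestamp": time.time(),
        "action": action,
        "context_at_decision": context_snapshot,  # what the agent could see
        "policy_evaluated": policy_id,            # which rule was applied
        "decision": decision,                     # what the enforcement layer did
    }
    return json.dumps(record, sort_keys=True)
```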
Violation flagging. Actions that completed but should be reviewed — either because they barely passed policy or because they fit a pattern that warrants attention — get flagged for human review. This creates a workflow for operationalizing governance, not just logging it.
Retrospective detection. For behavioral patterns that are only apparent in aggregate — a class of queries where the agent consistently underperforms, a tool call pattern that's technically within policy but warrants investigation, a cost distribution that's shifted in a concerning direction — post-execution analysis surfaces signals that weren't visible at the individual event level.
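The cost-distribution example can be sketched as a deliberately simple aggregate signal; a real retrospective pipeline would apply proper statistics over the full audit log rather than a mean comparison with an arbitrary multiplier:

```python
from statistics import mean

def cost_shift(baseline: list, recent: list, threshold: float = 1.5) -> dict:
    """Flag when recent per-session cost has shifted well above the baseline mean."""
    base, now = mean(baseline), mean(recent)
    return {"shifted": now > base * threshold,
            "baseline_mean": base, "recent_mean": now}
```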
Policy Definition: The Work Before the Enforcement
None of this enforcement machinery matters if you haven't done the harder work of defining what your policies actually are.
Policy definition requires answering questions that feel abstract but have concrete implications:
What are the hard constraints — the things the agent must never do, regardless of context? These become blocking rules at the pre-execution layer.
What are the conditional permissions — things the agent may do under certain conditions? These become conditional authorization rules with context-dependent evaluation logic.
What are the budget parameters — token allocation per session, per user, per day; cost ceilings at various granularities? These become spend guardrails with defined response actions at threshold crossings.
What needs to be auditable — which actions, which data flows, which policy evaluations? These determine your audit log schema and retention requirements.
Policies need to be explicit, versioned, testable, and documented. An implicit policy — "the agent shouldn't do X" based on a system prompt instruction — is not a policy in the governance sense. An explicit policy — "action type Y with parameter P matching pattern Z is blocked for sessions with context flag Q" — is.
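That explicit form lends itself to being data rather than prose. A sketch of a versioned policy as a structure, using the Y/P/Z/Q shape from the sentence above (all field names and the example entry are illustrative):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Policy:
    """An explicit, versioned, testable policy record (fields illustrative)."""
    policy_id: str
    version: int
    action_type: str            # action type Y, e.g. "payment_api"
    blocked_param_pattern: str  # pattern Z, evaluated against parameter P
    required_context_flag: str  # flag Q: block only when this session flag is set

POLICIES = [
    Policy("block-payments-unapproved", 3, "payment_api", r".*", "payment_unapproved"),
]
```

Because each policy is a frozen, versioned value, it can be diffed between deployments, referenced by ID in audit records, and exercised directly by a test suite.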
Testing Your Policies
A policy layer that only gets exercised in production incidents is not sufficient. Your policies need to be tested against known scenarios before they're deployed, and they need to be validated against your real production traffic on an ongoing basis.
This means having a test suite for your governance layer — not just for your agent's core behavior. Tests that verify:
Known bad inputs are blocked at the right layer
Known good inputs pass without unnecessary friction
Budget guardrails trigger at the right thresholds with the right responses
PII detection catches the patterns you care about with acceptable false positive rates
Audit records are created for the events that need records
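A sketch of what a few of those tests look like, using plain assertions against a trivially small stand-in policy function (a real suite would target your actual enforcement layer, likely via pytest):

```python
# A trivially small policy under test -- an illustrative stand-in for a real layer.
def evaluate(action: dict, session: dict) -> str:
    if action["tool"] == "payment_api" and not session.get("approved"):
        return "blocked"
    if session.get("tokens_spent", 0) >= session.get("token_budget", 0):
        return "capped"
    return "allowed"

def test_known_bad_blocked():
    assert evaluate({"tool": "payment_api"},
                    {"approved": False, "token_budget": 100}) == "blocked"

def test_known_good_passes():
    assert evaluate({"tool": "search"},
                    {"token_budget": 100, "tokens_spent": 10}) == "allowed"

def test_budget_caps_at_threshold():
    assert evaluate({"tool": "search"},
                    {"token_budget": 100, "tokens_spent": 100}) == "capped"

for t in (test_known_bad_blocked, test_known_good_passes, test_budget_caps_at_threshold):
    t()
```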
The teams that have done this work have a dramatically different experience during incidents than the teams that haven't. When something goes wrong, the question isn't "do our policies work?" — it's "which policy handled this, and did it handle it correctly?"
That's a much better problem to have.
How Waxell handles this: Waxell implements all three enforcement layers — pre-execution blocking (input inspection, action authorization, spend pre-authorization), mid-execution interception (tool result inspection, sequence-level policy evaluation, budget enforcement), and post-execution audit (full decision context, violation flagging, retrospective analysis) — as a deployable layer over your existing agents. You define policies explicitly; Waxell enforces them in production. No rewrites required. See the policy enforcement docs →
Frequently Asked Questions
What is AI agent policy enforcement? AI agent policy enforcement is implementing governance rules through technical mechanisms that act outside an agent's reasoning process, regardless of what the model decides. It's distinct from system prompt guardrails, which are suggestions to a probabilistic model and fail under adversarial conditions. Policy enforcement requires an external layer — typically at the infrastructure level — that evaluates and acts on agent behavior independently of the model.
How is AI agent policy enforcement different from system prompt guardrails? System prompt instructions are suggestions to a probabilistic model. LLMs follow them most of the time — not all of the time. Under adversarial conditions (prompt injection, unusual inputs), compliance with system prompt constraints drops significantly. Policy enforcement uses mechanisms outside the model's reasoning: pre-execution gates that block actions before they fire, mid-execution interceptors that apply rules regardless of what the model decided, and post-execution audit that documents every governance decision. The model cannot reason its way around these.
What are the three layers of AI agent policy enforcement? Pre-execution enforcement blocks proposed actions before they execute — input inspection, action authorization, spend pre-authorization. This is the most powerful position because it prevents consequences before they occur. Mid-execution enforcement intercepts the agent's action stream in real time — tool result inspection, sequence-level policy evaluation, budget tracking. Post-execution enforcement creates the audit trail and triggers remediation — logging every governance decision with full context, flagging violations for human review, enabling retrospective behavioral analysis.
What is pre-execution enforcement for AI agents? Pre-execution enforcement intercepts a proposed action before it executes. The agent has decided it wants to take an action; the policy layer evaluates whether it's permitted to. If not, the action is blocked and the agent receives a response explaining the block. This is the strongest enforcement position — no data transmitted, no tool called, no cost incurred, no consequences to remediate. Pre-execution covers input inspection (PII, injection patterns), action authorization (is this tool call permitted under current conditions?), and spend pre-authorization (does this session have remaining budget?).
How do you test AI agent policies? Policies need a test suite separate from your agent's core behavior tests. The suite should verify: known bad inputs are blocked at the right layer; known good inputs pass without friction; budget guardrails trigger at the correct thresholds with the correct responses; PII detection catches targeted patterns at acceptable false positive rates; and audit records are created for all events that require them. Running this suite before every deployment that changes policies is the difference between finding out a policy broke in production and finding out in CI.