Logan Kelly
AWS Security Agent went GA on March 31, 2026. It runs autonomous penetration tests at $50/task-hour with no built-in human approval gate before high-risk actions. Here's what that means for governance.

On March 31, 2026, AWS announced that AWS Security Agent — its autonomous AI penetration tester — is generally available in six regions (US East, US West, Europe Ireland, Europe Frankfurt, Asia Pacific Sydney, and Asia Pacific Tokyo), charging $50 per task-hour with a full application security evaluation running up to $1,200 for a 24-hour engagement.
That's a compelling price point. External pen testing firms charge between roughly $15,000 and $50,000 for mid-range enterprise engagements, take weeks to schedule, and hand back a PDF. AWS Security Agent operates 24/7, scales to your development velocity, and starts testing immediately. For security teams that have been rationing pen tests to once a year due to cost and lead time, this is transformative.
Here's what the launch announcement didn't lead with: AWS describes the Security Agent as a "frontier agent" that operates "without constant human oversight," executing "sophisticated attack chains" autonomously and exploiting identified vulnerabilities "with targeted payloads" — without a required human confirmation gate before it proceeds to each step in an exploit sequence. AWS's own agentic AI governance blog has separately noted that "for fully autonomous systems, humans must maintain supervisory oversight with the ability to provide strategic guidance, course corrections, or interventions" — a requirement with no built-in enforcement mechanism in the Security Agent itself.
An autonomous agent that can enumerate vulnerabilities, chain exploit sequences, and take actions with real consequences in production-adjacent environments — without a required human gate before the high-risk steps — is not a security problem waiting to happen. It's a governance problem that already arrived.
Agentic governance for autonomous security agents is the set of runtime policies and approval workflows that determine which actions an agent can take autonomously versus which require human confirmation before execution. It is distinct from the agent's underlying capability (what it can do) and from after-the-fact logging (what it did do). Without human-in-the-loop approval gates, a security agent's scope and blast radius are bounded only by what it was allowed to access — not by what a human decided was appropriate for each engagement.
What does AWS Security Agent actually do autonomously?
AWS Security Agent is one of what AWS calls its "frontier agents" — a class of autonomous AI systems designed to perform multi-step work without human hand-holding at each step. The Security Agent specifically handles application security testing: it receives a target scope, performs reconnaissance, identifies vulnerabilities, chains exploit sequences, and produces a report.
In preview, AWS and its customers reported that the agent "compresses penetration testing timelines from weeks to hours" and delivers results with "significantly fewer false positives" than traditional automated scanners. LG CNS reported 50% faster testing and ~30% lower costs. Wayspring and HENNGE reported similar results.
What the performance data doesn't answer is the governance question that every enterprise deploying this needs to answer: at what point in a testing engagement does a human need to confirm before the agent proceeds?
The difference between a routine reconnaissance scan and an active exploit attempt is significant. A routine scan discovers your attack surface. An active exploit attempt — even in a sandboxed test environment — can cause downtime, expose data, trigger IDS alerts, and in misconfigured environments, cross into production systems. The blast radius between these two actions is not the same, and the appropriate oversight threshold is not the same.
AWS Security Agent executes both. With the same autonomy. Without a built-in requirement to surface the step change to a human reviewer.
Why can't you just set the scope and trust the agent?
The instinct to answer the governance question with scope configuration is understandable. Define the target scope tightly enough, and the agent can't wander outside it. AWS's own policy framework for frontier agents notes that "policy defines what an agent can and cannot do — enforced externally so even a misaligned LLM cannot bypass it."
This is scope governance. It's necessary but not sufficient.
Scope defines the space the agent can operate in. It doesn't determine when within that space a human should review a decision. Consider three actions all within a correctly scoped engagement:
Action 1: Port scan against the target IP range. Low risk, no side effects, generates reconnaissance data.
Action 2: Attempt SQL injection against an identified form endpoint. Moderate risk, contained to the test target, might produce noise in application logs.
Action 3: Chain the SQL injection with a discovered path traversal to extract a configuration file that includes credentials to an adjacent system. High risk — even in a test environment, this credential exposure has real-world consequences if credentials are shared across environments.
All three actions are within scope. None require scope expansion. But Action 3 is the kind of step that, in a human-led pen test, the tester would typically call out to the client before proceeding: "We've found a chain that gives us access to credentials — do you want us to continue and demonstrate full impact, or stop here?"
An autonomous agent executing Action 3 without surfacing that decision is not violating its scope. It's operating without the approval gate that the moment requires.
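The distinction between scope and approval can be made concrete. Here is a minimal sketch of an externally enforced gate that classifies actions by risk and holds escalation-class steps for human sign-off — a hypothetical governance layer for illustration, not an AWS Security Agent API:

```python
from enum import Enum

class RiskClass(Enum):
    RECON = "recon"            # e.g. port scans, fingerprinting (Action 1)
    EXPLOIT = "exploit"        # single-step exploit attempts (Action 2)
    ESCALATION = "escalation"  # chaining, credential extraction (Action 3)

# Action classes that must pause for human sign-off, even when in scope.
REQUIRES_APPROVAL = {RiskClass.ESCALATION}

def gate(action: str, risk: RiskClass, in_scope: bool, approved: bool = False) -> str:
    """Decide whether an agent action may proceed.

    Scope answers *where* the agent may act; the approval gate answers
    *when* a human must confirm before it acts.
    """
    if not in_scope:
        return "DENY: out of scope"
    if risk in REQUIRES_APPROVAL and not approved:
        return "HOLD: human approval required before proceeding"
    return "ALLOW"

# All three in-scope example actions pass the scope check,
# but only the escalation step is held for a human.
print(gate("port scan target range", RiskClass.RECON, in_scope=True))         # ALLOW
print(gate("SQLi against form endpoint", RiskClass.EXPLOIT, in_scope=True))   # ALLOW
print(gate("chain SQLi + traversal to extract credentials",
           RiskClass.ESCALATION, in_scope=True))                              # HOLD: ...
```

The key design property is that the gate runs outside the agent: the agent proposes, the policy layer disposes, and an escalation step cannot self-authorize no matter how the agent reasons about its scope.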
This is the human-in-the-loop problem in security agents specifically, and it's not unique to AWS Security Agent. It's the governance gap that opens any time an autonomous agent acquires multi-step capability in a domain where intermediate steps have asymmetric consequences.
What does the $50/task-hour model mean for cost governance?
There's a second governance dimension the launch coverage has mostly ignored: cost.
At $50 per task-hour, a full 24-hour AWS Security Agent evaluation costs up to $1,200. That's dramatically cheaper than traditional pen testing — but it's still a metered agentic workload with real per-session cost accumulation.
The question teams should be asking: what controls prevent an engagement from running significantly longer than planned? If the agent discovers an unusually complex attack surface midway through an evaluation, what stops it from continuing to accrue hours against the original task without a human confirming the expanded scope and cost?
Per-session cost enforcement — a ceiling that triggers a human review or terminates the session before it exceeds a defined threshold — is not a default feature of the AWS Security Agent pricing model. It's a governance control that teams need to build into how they invoke and monitor the agent.
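A per-session ceiling of this kind is simple to express. The sketch below uses AWS's published $50/task-hour rate; the ceiling and the 80% review threshold are deployment choices, not product defaults:

```python
def check_budget(hours_accrued: float, rate_per_hour: float = 50.0,
                 ceiling: float = 1200.0, review_fraction: float = 0.8) -> str:
    """Per-session cost control for a metered agentic workload.

    Illustrative only: the rate is AWS's published price; the ceiling
    and review threshold are values the deploying team must choose.
    """
    spend = hours_accrued * rate_per_hour
    if spend >= ceiling:
        return "TERMINATE: session ceiling reached"
    if spend >= review_fraction * ceiling:
        return "REVIEW: human must confirm continued spend"
    return "CONTINUE"

print(check_budget(10))   # $500  -> CONTINUE
print(check_budget(20))   # $1,000 -> REVIEW (at or past 80% of $1,200)
print(check_budget(24))   # $1,200 -> TERMINATE
```

The point of the review threshold is to surface the "unusually complex attack surface" case mid-engagement: a human confirms the expanded cost before the session runs to its ceiling, rather than discovering the overage on the invoice.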
For teams running multiple concurrent Security Agent evaluations across a large application portfolio, this adds up quickly. An uncontrolled fleet of security agents could cost $5,000–$10,000 in a single business day without any individual evaluation appearing obviously wrong.
What does 81% adoption with 14% governance coverage mean for security agents specifically?
According to Gravitee's 2026 State of AI Agent Security report, 81% of enterprise teams have moved past the planning phase with AI agents, but only 14.4% have full security approval processes in place for those agents. The same report found that more than half of all agents operate without any security oversight or logging.
Apply that ratio to security agents specifically, and the picture gets uncomfortable. A security agent without an approval audit trail creates a scenario where an autonomous system is taking actions against your infrastructure — actions that include vulnerability enumeration, exploit attempts, and credential exposure — with no durable record of which steps were approved, by whom, at what time, against what reasoning.
This is the compliance gap that happens before the regulatory gap. For organizations in financial services, healthcare, or government contracting, operating autonomous security agents without a human-in-the-loop approval trail for high-risk actions is an audit finding waiting to happen. For SOC 2 Type II, ISO 27001, and FedRAMP auditors, the question is not just "did the pen test find vulnerabilities?" — it's "who authorized each stage of the testing, and what is your documentation?"
An autonomous agent that self-authorizes its own escalation steps doesn't produce that documentation by default.
How Waxell handles this
Waxell's approval policies allow you to define escalation triggers that apply to agentic workloads regardless of the underlying agent framework. For a security agent deployment, this means configuring human sign-off rules that fire before the agent proceeds to a higher-risk action class — for example, requiring explicit approval before any exploit chaining step, before any credential extraction, or before any action that touches adjacent systems outside the originally defined target. The approval gate is enforced at the governance layer, not inside the agent's code, which means it can't be bypassed by an LLM that reasons its way to "this is within scope."
On the cost dimension: Waxell's human oversight guarantees extend to cost enforcement — per-session budget policies that trigger a mandatory human review when a session approaches a defined spend threshold, rather than letting a metered agentic workload run to completion unmonitored.
The approval audit trail embedded in Waxell's execution tracing produces the documentation that compliance requires: every policy evaluation, every approval gate triggered, every human decision recorded alongside the agent action it preceded. When your auditor asks who authorized the credential extraction step, you have an answer.
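As an illustration of the shape such a policy might take — the keys below are hypothetical and are not Waxell's actual configuration schema — a combined escalation-trigger and cost policy for a security agent deployment could look like this:

```python
# Hypothetical governance policy (illustrative shape only; these keys
# are NOT Waxell's actual configuration schema or any AWS API).
policy = {
    "agent": "aws-security-agent",
    "escalation_triggers": [
        {"action_class": "exploit_chaining",       "require": "human_approval"},
        {"action_class": "credential_extraction",  "require": "human_approval"},
        {"action_class": "adjacent_system_access", "require": "human_approval"},
    ],
    "cost": {
        "session_ceiling_usd": 1200,
        "review_threshold_usd": 960,   # 80% of ceiling
        "on_threshold": "mandatory_human_review",
    },
    "audit": {
        # Record every policy evaluation, gate trigger, and human decision.
        "record": ["policy_evaluation", "approval_event", "human_decision"],
    },
}

# A governance layer would evaluate each proposed agent action against
# this policy before execution, enforcing gates outside the agent itself.
assert all(t["require"] == "human_approval" for t in policy["escalation_triggers"])
print("escalation triggers:", len(policy["escalation_triggers"]))
```

Whatever the concrete schema, the structural requirement is the same: the triggers name action classes, not scope boundaries, so the gate fires on *what kind* of step the agent is about to take rather than *where* it is taking it.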
Frequently Asked Questions
What is AWS Security Agent?
AWS Security Agent is an autonomous AI penetration testing system from AWS, generally available as of March 31, 2026, in six regions (US East, US West, Europe Ireland, Europe Frankfurt, Asia Pacific Sydney, Asia Pacific Tokyo). It performs on-demand application security testing — including vulnerability enumeration and exploit sequencing — at $50 per task-hour, with a full 24-hour evaluation costing up to $1,200. AWS classifies it as a "frontier agent": an autonomous system capable of multi-step operation that runs "without constant human oversight."
Does AWS Security Agent require human approval before high-risk actions?
Not by default. AWS describes the Security Agent as operating autonomously through exploit sequences without surfacing each step for human review — this is the intentional design as a "frontier agent." Teams deploying AWS Security Agent need to implement their own human-in-the-loop approval gates at the governance layer. AWS's own agentic AI governance documentation acknowledges that "for fully autonomous systems, humans must maintain supervisory oversight with the ability to provide strategic guidance, course corrections, or interventions" — but implementing this requirement is left to the deploying team, not enforced by the product.
What is the human-in-the-loop problem for security agents?
The human-in-the-loop problem for security agents is the absence of required human confirmation before an autonomous agent escalates to higher-risk actions within a correctly scoped engagement. Scope configuration defines where an agent can operate; approval workflows determine when a human must confirm before the agent proceeds. A security agent that can chain exploits, extract credentials, and access adjacent systems has asymmetric risk across its action classes — and the appropriate oversight threshold differs by action class, not just by scope.
How much does AWS Security Agent cost to run?
AWS Security Agent charges $50 per task-hour. A small API test costs approximately $173; a full application penetration test costs up to $1,200 for a 24-hour engagement. AWS reports that customers are saving 70–90% compared to traditional external pen testing firms ($15,000–$50,000 per engagement). Cost governance — per-session ceilings and human review triggers when spend approaches a threshold — is not built into the default pricing model and needs to be implemented at the governance layer.
What documentation should compliance teams require from autonomous security agent deployments?
Compliance teams should require: (1) a record of who authorized each engagement and its defined scope, (2) an audit trail of approval gate triggers — specifically, which action classes required human sign-off and which were self-authorized by the agent, (3) evidence of what the agent did versus what it was explicitly approved to do, and (4) documentation of any scope expansions requested or rejected during the engagement. This is materially different from a traditional pen test report — it's the governance documentation layer that autonomous agents require on top of the technical findings.
Is AWS Security Agent safe to use in production environments?
AWS Security Agent is designed for security testing, not production traffic manipulation, but the governance gap around approval workflows applies regardless of environment labeling. The primary risk is not the agent's capability — it's the absence of required human confirmation before high-impact actions. In well-governed deployments, mandatory approval gates before exploit chaining and credential exposure steps limit the blast radius. In ungoverned deployments operating purely within scope configuration, the agent's autonomy extends to the full range of its permitted actions without intermediate human review.
If you're deploying autonomous agents — security or otherwise — and need approval gates that enforce before high-risk actions execute, not after they're logged, get early access to Waxell.
Sources
AWS, AWS Weekly Roundup: AWS DevOps Agent & Security Agent GA, Product Lifecycle updates, and more (April 6, 2026) — verified April 7, 2026 — https://aws.amazon.com/blogs/aws/aws-weekly-roundup-aws-devops-agent-security-agent-ga-product-lifecycle-updates-and-more-april-6-2026/
AWS, AWS Security Agent on-demand penetration testing is now generally available (March 31, 2026) — verified April 7, 2026 — https://aws.amazon.com/about-aws/whats-new/2026/03/aws-security-agent-ondemand-penetration/
AWS Machine Learning Blog, AWS launches frontier agents for security testing and cloud operations — verified April 7, 2026 — https://aws.amazon.com/blogs/machine-learning/aws-launches-frontier-agents-for-security-testing-and-cloud-operations/
AWS Security Blog, The Agentic AI Security Scoping Matrix: A framework for securing autonomous AI systems — verified April 7, 2026 — https://aws.amazon.com/blogs/security/the-agentic-ai-security-scoping-matrix-a-framework-for-securing-autonomous-ai-systems/
AWS Machine Learning Blog, Can your governance keep pace with your AI ambitions? AI risk intelligence in the agentic era — verified April 7, 2026 — https://aws.amazon.com/blogs/machine-learning/can-your-governance-keep-pace-with-your-ai-ambitions-ai-risk-intelligence-in-the-agentic-era/
MPT Solutions, AWS Frontier Agents: What $50/Hour Pen Testing and $30/Hour SRE Means for Platform Teams — verified April 7, 2026 — https://www.mpt.solutions/aws-frontier-agents-what-50-hour-pen-testing-and-30-hour-sre-means-for-platform-teams/
Gravitee, State of AI Agent Security 2026 Report: When Adoption Outpaces Control — verified April 7, 2026 — https://www.gravitee.io/blog/state-of-ai-agent-security-2026-report-when-adoption-outpaces-control
Tenable, 2026 Cloud Security and AI Security Risk Report — verified April 7, 2026 — https://www.tenable.com/blog/cloud-ai-research-report-2026-governance-vs-innovation
Agentic Governance, Explained