
AI Agent Security: Two Architectures That Stop the Blast Radius

Four RSAC 2026 keynotes converged on one problem: AI agent credentials live in the same box as untrusted code. Two new architectures reveal where the real risk sits and how to contain it.

The AI Agent Security Crisis Hiding in Plain Sight

Seventy-nine percent of organizations already use AI agents, according to PwC's 2025 AI Agent Survey. Only 14.4% reported full security approval for their entire agent fleet.

The gap between deployment velocity and security readiness has become what the Cloud Security Alliance calls a governance emergency.

Four separate RSAC 2026 keynotes arrived at the same conclusion without coordinating. Microsoft's Vasu Jakkal told attendees that zero trust must extend to AI. Cisco's Jeetu Patel called for a shift from access control to action control, saying agents behave "more like teenagers, supremely intelligent, but with no fear of consequence."

CrowdStrike's George Kurtz identified AI governance as the biggest gap in enterprise technology. Splunk's John Morgan called for an agentic trust and governance model.

The problem is structural. AI agent credentials live in the same box as untrusted code. A prompt injection gives the attacker everything: OAuth tokens, API keys, git credentials.

The blast radius is not the agent but the entire container and every connected service.

Then two companies shipped architectures that answer the question differently. The gap between their designs reveals where the real risk sits and what security teams must audit now.

Why Do Default Agent Patterns Create Security Liabilities?

The default enterprise agent pattern is a monolithic container. The model reasons, calls tools, executes generated code, and holds credentials in one process. Every component trusts every other component.

Matt Caulfield, VP of Product for Identity and Duo at Cisco, put it bluntly in an exclusive VentureBeat interview at RSAC. "While the concept of zero trust is good, we need to take it a step further. It's not just about authenticating once and then letting the agent run wild."

"It's about continuously verifying and scrutinizing every single action the agent's trying to take, because at any moment, that agent can go rogue."

A CSA and Aembit survey of 228 IT and security professionals quantifies how common the monolithic pattern remains:

  • 43% use shared service accounts for agents
  • 52% rely on workload identities rather than agent-specific credentials
  • 68% cannot distinguish agent activity from human activity in their logs
  • No single function claimed ownership of AI agent access

Security said it was a developer responsibility. Developers said it was a security responsibility. Nobody owned it.

CrowdStrike CEO George Kurtz highlighted ClawHavoc at RSAC during his keynote. The supply chain campaign targeted the OpenClaw agentic framework. Antiy CERT confirmed 1,184 malicious skills tied to 12 publisher accounts.

Average breakout time has dropped to 29 minutes. Fastest observed: 27 seconds, according to CrowdStrike's 2026 Global Threat Report.

How Does Anthropic Separate the Brain from the Hands?

Anthropic's Managed Agents, launched April 8 in public beta, split every agent into three components that do not trust each other. The architecture includes a brain (Claude and the harness routing its decisions), hands (disposable Linux containers where code executes), and a session (an append-only event log outside both).

Credentials never enter the sandbox. Anthropic stores OAuth tokens in an external vault.

When the agent needs to call an MCP tool, it sends a session-bound token to a dedicated proxy. The proxy fetches real credentials from the vault, makes the external call, and returns the result. The agent never sees the actual token.
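That proxy pattern can be sketched in miniature. This is an illustrative Python sketch, not Anthropic's implementation; the class and method names are assumptions.

```python
import secrets

class CredentialVault:
    """Holds real OAuth tokens; lives entirely outside the sandbox."""
    def __init__(self):
        self._tokens = {}

    def store(self, service, token):
        self._tokens[service] = token

    def fetch(self, service):
        return self._tokens[service]

class ToolProxy:
    """Exchanges a session-bound token for a real credential and makes
    the external call on the agent's behalf."""
    def __init__(self, vault):
        self._vault = vault
        self._sessions = {}

    def issue_session_token(self, session_id):
        token = secrets.token_hex(16)       # opaque, session-scoped
        self._sessions[token] = session_id
        return token

    def call_tool(self, session_token, service, request):
        if session_token not in self._sessions:
            raise PermissionError("unknown session token")
        real_credential = self._vault.fetch(service)
        # The real credential is used here, proxy-side; the agent only
        # ever receives the result of the call.
        return f"called {service} with {request!r} (auth ok)"

vault = CredentialVault()
vault.store("github", "ghp_real_secret")
proxy = ToolProxy(vault)

session_token = proxy.issue_session_token("session-42")
result = proxy.call_tool(session_token, "github", "list_repos")
# The sandbox holds only session_token, which is useless outside the proxy.
```

Stealing `session_token` from a compromised sandbox buys the attacker nothing reusable: it only works against this proxy, for this session.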

For security directors, this means a compromised sandbox yields nothing an attacker can reuse. Single-hop credential exfiltration is structurally eliminated.

The security gain arrived as a side effect of a performance fix. Anthropic decoupled the brain from the hands so inference could start before the container booted. Median time to first token dropped roughly 60%.

The zero-trust design is also the fastest design. That kills the enterprise objection that security adds latency.

Session durability is the third structural gain. A container crash in the monolithic pattern means total state loss. In Managed Agents, the session log persists outside both brain and hands.

If the harness crashes, a new one boots, reads the event log, and resumes.
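The recovery mechanism can be illustrated with a minimal sketch. The names and event shapes here are assumptions, not Anthropic's API; the point is that state is reconstructed by replaying a log that outlives both components.

```python
class SessionLog:
    """Append-only event log that lives outside brain and hands."""
    def __init__(self):
        self._events = []

    def append(self, event):
        self._events.append(event)          # append-only, never mutated

    def replay(self):
        return list(self._events)           # full history, in order

class Harness:
    def __init__(self, log):
        self.log = log
        self.state = {}
        for event in log.replay():          # resume from whatever was logged
            self._apply(event)

    def _apply(self, event):
        self.state[event["key"]] = event["value"]

    def record(self, key, value):
        self.log.append({"key": key, "value": value})  # log first...
        self.state[key] = value                        # ...then apply

log = SessionLog()
h1 = Harness(log)
h1.record("step", 1)
h1.record("result", "partial-output")

# Simulate a crash: h1 is gone, but the log survives.
h2 = Harness(log)
# h2.state is rebuilt from the log: {"step": 1, "result": "partial-output"}
```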

Pricing: $0.08 per session-hour of active runtime, idle time excluded, plus standard API token costs. Security directors can now model agent compromise cost per session-hour against the cost of the architectural controls.
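To make that modeling concrete, a back-of-envelope sketch. The fleet size and activity numbers are illustrative assumptions; only the rate comes from the published pricing.

```python
RATE_PER_SESSION_HOUR = 0.08          # Anthropic's published rate
agents = 25                           # assumption: fleet size
active_hours_per_agent_per_day = 6    # assumption: idle time is free
days = 30

runtime_cost = (agents * active_hours_per_agent_per_day
                * days * RATE_PER_SESSION_HOUR)
# roughly $360/month of runtime, plus standard API token costs
```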

How Does Nvidia Lock Down and Monitor AI Agents?

Nvidia's NemoClaw, released March 16 in early preview, takes the opposite approach. It does not separate the agent from its execution environment. It wraps the entire agent inside five stacked security layers and watches every move.

NemoClaw stacks five enforcement layers between the agent and the host. Sandboxed execution uses Landlock, seccomp, and network namespace isolation at the kernel level. Default-deny outbound networking forces every external connection through explicit operator approval via YAML-based policy.

A privacy router directs sensitive queries to locally running Nemotron models, keeping that data off external APIs and avoiding per-token inference costs for those queries.

The layer that matters most to security teams is intent verification. OpenShell's policy engine intercepts every agent action before it touches the host. The agent does not know it is inside NemoClaw.

In-policy actions return normally. Out-of-policy actions get a configurable denial.
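A minimal sketch of that default-deny pattern, with hypothetical names rather than NemoClaw's actual engine: every action is intercepted, checked against an allowlist, logged, and either passed through or answered with the configured denial.

```python
from dataclasses import dataclass, field

@dataclass
class Policy:
    allowed_hosts: set = field(default_factory=set)
    denial_message: str = "connection refused by policy"  # configurable

class PolicyEngine:
    """Intercepts every outbound connection; default-deny."""
    def __init__(self, policy):
        self.policy = policy
        self.audit_log = []                 # complete trail, every decision

    def request_connection(self, host):
        allowed = host in self.policy.allowed_hosts
        self.audit_log.append((host, "allowed" if allowed else "blocked"))
        if allowed:
            return f"connected to {host}"
        return self.policy.denial_message   # agent sees a normal-looking reply

engine = PolicyEngine(Policy(allowed_hosts={"api.example.com"}))
ok = engine.request_connection("api.example.com")    # in-policy: proceeds
denied = engine.request_connection("evil.example.net")  # out-of-policy
```

Because the denial is returned in-band, the agent need not know it is being mediated, which matches the "agent does not know it is inside" property described above.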

Observability is the strongest layer. A real-time Terminal User Interface logs every action, every network request, every blocked connection. The audit trail is complete.

The problem is cost: operator load scales linearly with agent activity. Every new endpoint requires manual approval.

The trade-off for organizations evaluating NemoClaw is straightforward. Stronger runtime visibility costs more operator staffing. Observation quality is high. Autonomy is low.

That ratio gets expensive fast in production environments running dozens of agents.

What Is the Credential Proximity Gap?

Both architectures are a real step up from the monolithic default. Where they diverge is the question that matters most to security teams: how close do credentials sit to the execution environment?

Anthropic removes credentials from the blast radius entirely. If an attacker compromises the sandbox through prompt injection, they get a disposable container with no tokens and no persistent state. Exfiltrating credentials requires a two-hop attack: influence the brain's reasoning, then convince it to act through a container that holds nothing worth stealing.

NemoClaw constrains the blast radius and monitors every action inside it. Its stacked security layers limit lateral movement, and default-deny networking blocks unauthorized connections.

The agent and generated code share the same sandbox, however. Nvidia's privacy router keeps inference credentials on the host, outside the sandbox, but messaging and integration tokens (Telegram, Slack, Discord) are injected into the sandbox as runtime environment variables.

Credentials are policy-gated, not structurally removed. That distinction matters most for indirect prompt injection, where an adversary embeds instructions in content the agent queries as part of legitimate work. A poisoned web page. A manipulated API response.

In the Anthropic architecture, indirect injection can influence reasoning but cannot reach the credential vault. In the NemoClaw architecture, injected context sits next to both reasoning and execution inside the shared sandbox. That is the widest gap between the two designs.
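The environment-variable exposure above takes only a few lines to demonstrate: any code running in the sandbox, including code an injected prompt caused the agent to generate, can read the process environment. The token value here is fake and the variable name is illustrative.

```python
import os

# Simulate the deployment pattern: an integration token injected into
# the sandbox's environment at startup.
os.environ["SLACK_BOT_TOKEN"] = "xoxb-example-not-real"

def attacker_influenced_code():
    # One line of generated code is enough to harvest every secret
    # visible to the process.
    return {k: v for k, v in os.environ.items() if "TOKEN" in k}

leaked = attacker_influenced_code()
# leaked now contains SLACK_BOT_TOKEN: the policy layer must catch the
# exfiltration attempt, because the secret itself is reachable.
```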

CrowdStrike CTO Elia Zaitsev, in an exclusive VentureBeat interview, said the pattern should look familiar. "A lot of what securing agents look like would be very similar to what it looks like to secure highly privileged users. They have identities, they have access to underlying systems, they reason, they take action."

"There's rarely going to be one single solution that is the silver bullet. It's a defense in depth strategy."

What Must Security Teams Audit Now?

The zero-trust architecture audit for AI agents distills to five priorities:

Audit every deployed agent for the monolithic pattern. Flag any agent holding OAuth tokens in its execution environment. The CSA data shows 43% use shared service accounts. Those are the first targets.

Require credential isolation in agent deployment RFPs. Specify whether the vendor removes credentials structurally or gates them through policy. Both reduce risk. They reduce it by different amounts with different failure modes.

Test session recovery before production. Kill a sandbox mid-task. Verify state survives. If it does not, long-horizon work carries a data-loss risk that compounds with task duration.
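One way to run that check, sketched with a stand-in worker process rather than a real vendor sandbox: the worker appends progress to a durable log, the test kills it mid-task, then verifies the log survived so a fresh worker could resume.

```python
import subprocess
import sys
import tempfile
import textwrap
import time

# Stand-in for a sandboxed agent: writes each completed step to a
# durable log outside its own process.
worker = textwrap.dedent("""
    import sys, time
    log_path = sys.argv[1]
    for step in range(100):
        with open(log_path, "a") as f:
            f.write(f"step {step}\\n")
        time.sleep(0.05)
""")

with tempfile.NamedTemporaryFile("w", suffix=".log", delete=False) as f:
    log_path = f.name

proc = subprocess.Popen([sys.executable, "-c", worker, log_path])
time.sleep(0.5)       # let it make some progress
proc.kill()           # simulate a sandbox crash mid-task
proc.wait()

with open(log_path) as f:
    surviving_steps = f.read().splitlines()
# State persisted outside the worker survives the kill; a resuming
# worker would read surviving_steps[-1] and continue from there.
```

If the equivalent log in your deployment comes back empty, long-horizon tasks are carrying exactly the data-loss risk described above.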

Staff for the observability model. Anthropic's console tracing integrates with existing observability workflows. NemoClaw's TUI requires an operator-in-the-loop. The staffing math is different.

Track indirect prompt injection roadmaps. Neither architecture fully resolves this vector. Anthropic limits the blast radius of a successful injection. NemoClaw catches malicious proposed actions but not malicious returned data.

Require vendor roadmap commitments on this specific gap.

Why Should Organizations Act Now?

A CSA survey presented at RSAC found that only 26% have AI governance policies. The 65-point gap between deployment velocity and security approval is where the next class of breaches will start.

Snyk's ToxicSkills research found that 36.8% of the 3,984 ClawHub skills scanned contain a security flaw of some severity; 13.4% rated critical. The monolithic default is not a theoretical risk.

It is an inherited liability sitting in production environments right now.

Zero trust for AI agents stopped being a research topic the moment two architectures shipped. Security teams have a decision matrix: structural credential removal versus policy-gated constraint. The choice depends on autonomy requirements, operator capacity, and risk tolerance.

Both approaches beat the monolithic default. Neither fully solves indirect prompt injection. The vendors shipping solutions acknowledge the gap and are iterating.

Organizations deploying agents today need to audit the credential proximity question before the next supply chain campaign targets their agent fleet.



The window between deployment and breach is measured in minutes, not months. The architecture audit cannot wait for the governance policy to catch up.
