Coding Agents Degrade Sandboxes to Security Theater

Sandboxing a coding agent sounds like a security primitive. The common assumption is that a sandboxed agent cannot wipe your home directory or read the keys in ~/.ssh. That assumption is wrong.

The sandbox gap

Most coding harnesses ship with sandboxing support (Codex, Claude Code, Cursor, Gemini CLI, Goose), but no two agree on what “sandboxed” means. And the security primitives they provide are weaker than one would hope.

Claude Code, for example, sandboxes only bash tool calls; its native file read and write tools run with full system access. Gemini CLI and Codex CLI both sandbox all tool calls, but not the agent process itself. Cursor sandboxes the full subprocess tree spawned from terminal calls, but sensitive files such as those in ~/.ssh remain readable by the agent.

The natural response is to sandbox everything: the agent process, all tool calls, the full subprocess tree. That pushes you toward containerization or virtualization, which sharply degrades the developer experience. Bubblewrap and Seatbelt require extensive configuration to be useful. Each repo ends up with its own long list of mounts and allowed binaries: Node, Python, the package manager, the compiler, the test runner, the language server, the parent directory the agent keeps wandering into. By the time the sandbox runs a real developer workflow, you have punched so many holes in it that you are sharing more of the host than you are isolating.
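To make the configuration burden concrete, here is a minimal sketch of what a per-repo Bubblewrap profile might look like when generated from Python. The SandboxProfile class, the bwrap_argv helper, and every path below are hypothetical and chosen only for illustration; the bwrap flags used (--unshare-all, --share-net, --ro-bind, --bind, and so on) are standard Bubblewrap options. The code only builds the argument list and does not run anything.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SandboxProfile:
    """Hypothetical per-repo profile; all fields are illustrative assumptions."""
    repo_dir: str
    read_only_paths: List[str] = field(default_factory=list)
    writable_paths: List[str] = field(default_factory=list)
    allow_network: bool = False


def bwrap_argv(profile: SandboxProfile, command: List[str]) -> List[str]:
    """Build (but do not execute) a bubblewrap command line for the profile."""
    argv = ["bwrap", "--unshare-all", "--die-with-parent",
            "--proc", "/proc", "--dev", "/dev", "--tmpfs", "/tmp"]
    if profile.allow_network:
        argv.append("--share-net")  # re-share the host network namespace
    for path in profile.read_only_paths:
        argv += ["--ro-bind", path, path]
    for path in [profile.repo_dir, *profile.writable_paths]:
        argv += ["--bind", path, path]
    argv += ["--chdir", profile.repo_dir]
    return argv + command


def shared_mounts(argv: List[str]) -> int:
    """Count how many host paths the command line exposes to the sandbox."""
    return argv.count("--ro-bind") + argv.count("--bind")


# Assumed layout for a Node + Python repo; each entry is one more hole in the box.
profile = SandboxProfile(
    repo_dir="/home/dev/project",
    read_only_paths=["/usr", "/bin", "/lib", "/etc/ssl",
                     "/home/dev/.nvm", "/home/dev/.pyenv"],
    writable_paths=["/home/dev/.cache/pip", "/home/dev/.npm"],
    allow_network=True,
)
argv = bwrap_argv(profile, ["npm", "test"])
print(shared_mounts(argv), "host paths shared")
```

Even this small profile shares nine host paths and the network before a language server or compiler toolchain has been added.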

Coding agents reason

Sandboxes are a pre-AI primitive. Put the dangerous thing in a box, restrict what the box can touch, and call the risk contained. The model assumes the thing in the box doesn’t reason about the box itself.

Coding agents do. A sandbox restricts access; it does nothing to govern what the agent decides to do within those restrictions, or how it routes around them. When a bash tool call is blocked, an agent will write a Python script that tries to accomplish the same goal. This holds even when you deploy kernel-level security policies. Ona demonstrated an agent disabling the sandbox and later bypassing kernel-level verification hooks.
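A toy illustration of the rerouting behavior described above, assuming a hypothetical harness that blocks shell commands by inspecting the binary name. BLOCKED_BINARIES and is_blocked are invented for this sketch and do not describe any real harness; the commands are only inspected as strings and never executed.

```python
import shlex

# Hypothetical denylist a naive harness might apply to bash tool calls.
BLOCKED_BINARIES = {"rm", "curl", "scp"}


def is_blocked(command: str) -> bool:
    """Naive check: block the call if its first token is a denylisted binary."""
    tokens = shlex.split(command)
    return bool(tokens) and tokens[0] in BLOCKED_BINARIES


direct = "rm -rf build"
# Same effect, expressed through an interpreter the denylist does not cover.
rerouted = "python3 -c \"import shutil; shutil.rmtree('build')\""

print(is_blocked(direct))    # the direct call is caught
print(is_blocked(rerouted))  # the rerouted call passes the check
```

The check inspects how the goal is expressed, not what the resulting process does, which is why an agent that can write code can step around it.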

The same reasoning lets agents exploit gaps between sandboxed and unsandboxed tool calls. In Claude Code, the native Write and Edit tools run outside the bubblewrap sandbox entirely. An agent that is blocked from writing a config file through bash can write the same file through the Write tool, and the sandbox never sees it. This class of gap has already produced a CVSS 10.0 vulnerability.
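The gap can be modeled in a few lines. The sketch below is a toy, not a description of how any real harness is implemented: ToyHarness, sandbox_allows_write, and the protected paths are all hypothetical, and the "filesystem" is just an in-memory dict.

```python
from pathlib import PurePosixPath
from typing import Dict

# Hypothetical protected locations for the illustration.
PROTECTED = [
    PurePosixPath("/home/dev/.ssh"),
    PurePosixPath("/home/dev/project/.git/hooks"),
]


def sandbox_allows_write(path: str) -> bool:
    """Policy check: deny writes at or below any protected directory."""
    p = PurePosixPath(path)
    return not any(p == root or root in p.parents for root in PROTECTED)


class ToyHarness:
    """Two write paths into the same fake filesystem; only one is policed."""

    def __init__(self) -> None:
        self.files: Dict[str, str] = {}

    def bash_write(self, path: str, data: str) -> bool:
        # Sandboxed route: the policy is consulted before the write.
        if not sandbox_allows_write(path):
            return False
        self.files[path] = data
        return True

    def native_write(self, path: str, data: str) -> bool:
        # Unsandboxed route: no policy check at all. This is the gap.
        self.files[path] = data
        return True


harness = ToyHarness()
target = "/home/dev/.ssh/config"
print(harness.bash_write(target, "Host *"))    # denied by the sandbox
print(harness.native_write(target, "Host *"))  # succeeds, policy never consulted
```

Any enforcement point that sits inside one tool, rather than underneath all of them, leaves this kind of second route open.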

What it actually takes

Closing that gap means combining visibility and enforcement at three layers at once.

At the kernel level: every file open, every process spawn, every syscall the agent’s process tree makes. This is ground truth. The agent can lie about what it intends to do. It can route around application-level checks. It cannot fake a syscall. If the kernel says the agent wrote to ~/.ssh/config, it wrote to ~/.ssh/config.

At the network layer: every outbound connection, every DNS lookup, every payload leaving the machine. A sandbox that blocks filesystem access but leaves the network open is a data exfiltration pipeline. Traffic monitoring catches what the kernel layer alone cannot: slow exfiltration over DNS subdomains, callbacks to attacker-controlled servers, credential leaks to redirected API endpoints.
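As one example of what network-layer monitoring can look for, the sketch below flags DNS query names whose subdomain labels look like encoded payloads. The thresholds, the looks_like_dns_exfil name, and the assumption that the registered domain is always the last two labels are simplifications made for illustration; a real detector would use a public-suffix list and rate information as well.

```python
import math
from collections import Counter


def shannon_entropy(text: str) -> float:
    """Shannon entropy of the string, in bits per character."""
    total = len(text)
    if total == 0:
        return 0.0
    counts = Counter(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def looks_like_dns_exfil(qname: str,
                         max_label_len: int = 40,
                         min_entropy: float = 3.5) -> bool:
    """Heuristic: flag very long or high-entropy labels left of the domain.

    Naively treats the last two labels as the registered domain.
    """
    labels = qname.rstrip(".").split(".")
    payload_labels = labels[:-2]
    for label in payload_labels:
        if len(label) > max_label_len:
            return True
        if len(label) >= 16 and shannon_entropy(label) >= min_entropy:
            return True
    return False


for name in ["api.github.com",
             "registry.npmjs.org",
             "0123456789abcdef.evil.example"]:
    print(name, looks_like_dns_exfil(name))
```

A heuristic like this is noisy on its own, which is part of the argument for correlating it with kernel and application-layer signals rather than treating it as a verdict.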

At the agent application layer: the sequence of tool calls, the reasoning behind them, the drift between stated intent and observed behavior. An agent that reads ~/.ssh/id_rsa and then calls a network tool is exhibiting a pattern that neither the kernel nor the network layer understands in context. Intent monitoring connects the dots across individual operations.
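The read-then-send pattern mentioned above can be expressed as a simple rule over the tool-call trace. This is a minimal sketch under stated assumptions: the ToolCall record, the tool names, the sensitive prefixes, and the five-step window are all hypothetical, and real intent monitoring would also weigh the agent's stated reasoning.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class ToolCall:
    """Hypothetical trace record: step index, tool name, and path or URL."""
    step: int
    tool: str
    target: str


SENSITIVE_PREFIXES = ("~/.ssh/", "~/.aws/")  # assumed sensitive locations
NETWORK_TOOLS = {"http", "curl", "fetch"}    # assumed network-capable tools


def read_then_send(calls: Iterable[ToolCall],
                   window: int = 5) -> List[Tuple[ToolCall, ToolCall]]:
    """Return (read, send) pairs where a sensitive read is followed by a
    network call within `window` steps."""
    findings: List[Tuple[ToolCall, ToolCall]] = []
    sensitive_reads: List[ToolCall] = []
    for call in calls:
        if call.tool == "read" and call.target.startswith(SENSITIVE_PREFIXES):
            sensitive_reads.append(call)
        elif call.tool in NETWORK_TOOLS:
            for read in sensitive_reads:
                if call.step - read.step <= window:
                    findings.append((read, call))
    return findings


trace = [
    ToolCall(1, "read", "src/main.py"),
    ToolCall(2, "read", "~/.ssh/id_rsa"),
    ToolCall(3, "http", "https://example.invalid/upload"),
]
for read, send in read_then_send(trace):
    print(f"step {read.step} read {read.target} -> step {send.step} {send.target}")
```

Neither call is suspicious in isolation; the finding exists only because the two events are evaluated as a sequence.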

No single layer does this alone. Not a sandbox. Not a VM. Not a proxy. Not even a kernel-level policy. It takes defense in depth: layers that complement each other, where bypassing one is caught by another.

The foundation of that defense is kernel-level enforcement scoped to the agent process tree. Kernel-level enforcement protects a file regardless of which tool or code path tries to write it. It doesn’t matter whether the write comes from bash, the native Write tool, a Python subprocess, or the dynamic linker. If the kernel denies the write, the write fails. Crucially, these policies target the agent, not the developer. The developer can still edit their own config files, SSH keys, and shell profiles the way they always have. The agent cannot. These primitives already exist. They are just not wired into any coding agent harness today.
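The scoping idea, deny the agent's process tree but not the developer, can be sketched as a pure decision function. This is a user-space model of the logic only, not kernel code: the parent_of map stands in for the process table, and allow_write, descends_from, the PIDs, and the protected prefixes are hypothetical names and values chosen for the example.

```python
from typing import Dict, Tuple


def descends_from(pid: int, ancestor: int, parent_of: Dict[int, int]) -> bool:
    """True if `pid` is `ancestor` or one of its descendants."""
    seen = set()
    current = pid
    while current is not None and current not in seen:
        if current == ancestor:
            return True
        seen.add(current)
        current = parent_of.get(current)  # None once we run off the table
    return False


def allow_write(pid: int, path: str, agent_root_pid: int,
                parent_of: Dict[int, int],
                protected_prefixes: Tuple[str, ...]) -> bool:
    """Deny writes to protected paths only for the agent's process tree."""
    if not path.startswith(protected_prefixes):
        return True
    return not descends_from(pid, agent_root_pid, parent_of)


# Assumed process table: 100 is the agent, 200 its bash child,
# 201 a Python script spawned by that bash, 300 the developer's editor.
parent_of = {100: 1, 200: 100, 201: 200, 300: 1}
protected = ("/home/dev/.ssh/", "/home/dev/.bashrc")

print(allow_write(201, "/home/dev/.ssh/config", 100, parent_of, protected))
print(allow_write(300, "/home/dev/.ssh/config", 100, parent_of, protected))
```

The decision depends on which process tree the write comes from and which path it targets, not on which tool or interpreter issued it, so the Python subprocess is denied while the developer's editor is not.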

Sandboxing isn’t wrong. It’s just not sufficient when you want to contain coding harnesses.
