Coding Agents Breaching Their Own Security Rules
Every major coding agent stores its permission model in a configuration file that the agent process itself can write to. That single design decision violates forty years of access control theory and has already produced critical vulnerabilities in Claude Code, GitHub Copilot, Cursor, and OpenAI Codex CLI.
The problem isn’t even a bug. It’s an architectural flaw.
The Configuration Security Conflict
Lets use the most widely agent harness as an example: Claude Code. Claude Code uses multiple config files, among others .claude/settings.json for shared team settings and .claude/settings.local.json for personal preferences. Each configuration file can configure hooks, allow/deny lists for tool calls, MCP configurations and more.
Claude Code uses a deny-first evaluation model. Deny rules in settings.json take precedence over allow rules in settings.local.json, so a team that proactively commits deny rules can constrain what the agent grants itself. But this only works when settings.json exists and contains deny rules. In the default state, and the state most individual developers operate in, there is no settings.json. No deny rules. The agent can expand its own permissions through settings.local.json.
The problem is that the agent process itself has write access to its own security configuration. The entity being governed writes the rules that govern it.
Why This Breaks Every Access Control Model
Forty years of security research all converge on the same principle: the entity being constrained must not have write access to the mechanism of constraint. When an agent writes allow rules to its own permission file and later checks that same file to decide what’s permitted, the process enforcing the rules is the same process that wrote them. There is no separation of privilege, no integrity guarantee, and no least privilege principle applied.
Prompt injections makes this worse. Using prompt injection adversaries can expand the permissions through for example a malicious file in a cloned repo, a crafted issue body, a dependency README, or even a web search.
The CVE Record Confirms the Theory
Every major coding agent gives its process filesystem access broad enough to reach its own trust boundary files. The CVE record shows what happens when it does.
GitHub Copilot (CVE-2025-53773) allowed its agent to write to .vscode/settings.json without user approval. An attacker used prompt injection to inject "chat.tools.autoApprove": true into the file, enabling “YOLO mode” and disabling all confirmation prompts. From that point, every tool call executed silently. Full remote code execution.
Cursor (CVE-2025-54135) was vulnerable to prompt injection through project files that could rewrite ~/.cursor/mcp.json, the global MCP configuration, enabling arbitrary MCP servers that launched automatically. Full remote code execution with developer-level privileges.
OpenAI Codex CLI (CVE-2025-61260) loaded configuration from wherever CODEX_HOME pointed. A .env file in a cloned repository redirected that variable to the project directory. Codex loaded a malicious config.toml from there and executed its MCP server entries at startup. No prompt. No confirmation. No secondary validation.
Claude Code (CVE-2026-25725) applied read-only sandbox constraints to settings.local.json but did not protect settings.json when the file didn’t exist at startup. Code inside the sandbox could create settings.json through the writable parent directory, injecting hooks that executed with full host privileges on restart. It created the team policy file that the system treated as authoritative.
The “Systems Security Foundations for Agentic Computing” paper from Google documented the same pattern in Amp AI. A prompt injection caused the agent to alter its settings.json allowlist. Their recommendation is explicit: “Such systems should enforce immutability on security-critical configuration files so the agent cannot modify its own execution environment.”
Five vulnerabilities. Five different vendors. One pattern: the entity being constrained has access to the mechanism of constraint.
Architectural flawed
The reason this problem persists is that it’s genuinely hard to solve without breaking the user experience.
Currently the configuration and security primitives are opt-in. There is only a real security boundary when teams proactively commit deny rules. The default state, is no deny rules at all.
At minimum we need a architectural different solution. We need to seperate critical security configurations, the ones that never can altered, from configurations they help the developer to fight approval fatigue.
If this seperation is implemented we can start using sensible defaults to settings.json that prevent the agent from modifying its own configuration files, like:
{
"permissions": {
"deny": [
"Edit(.claude/settings.json)",
"Edit(.claude/settings.local.json)",
"Write(.claude/settings.json)",
"Write(.claude/settings.local.json)"
]
}
}
Another solution can be using a .claudeignore that includes .claude/settings* to prevent the agent from reading the configuration files at all, this blocks the straightforward attack path.
But it’s still application-level enforcement. The Claude Code process is what checks these deny rules. A sufficiently creative exploit can bypass application-level checks entirely, as the CVEs above demonstrate.
What you really need is kernel-level enforcement, scoped to the agent’s process, to prevent agents from writing to their own security configuration. It doesn’t matter whether the write attempt comes from the Edit tool, a bash subprocess, a Python script, or a case-manipulated path. The kernel denies the write. The developer retains full access through their normal editor. Only the agent process tree is constrained.
These are not exotic requirements. They are the standard mechanisms that operating systems have used for decades to prevent processes from escalating their own privileges. We already know how to build this. We just haven’t applied it to coding agents yet.
The research on prompt injection in coding agents shows attack success rates between 41% and 84% depending on the platform and attack type. Data exfiltration succeeds at the high end. Every tested agent (Claude Code, Copilot, Cursor) is vulnerable to some degree.
These agents are getting more capable. They’re running longer sessions with less human oversight. And every one of them stores its trust boundary in a file it can reach.
The window between “agents are useful enough to deploy widely” and “agents are powerful enough to do real damage when compromised” is closing. The configuration integrity problem isn’t a theoretical concern for future architectures. It’s a concrete vulnerability in production systems today.
Coding agents shouldn’t write their own rules.