Every new wave of tooling gets its cowboy phase.
In the early days of self-hosting, that phase looked like SSHing into a production box as root, editing a config file live, restarting the service, and hoping the logs looked friendly. If it worked, you felt brilliant. If it failed, you spent the night reconstructing your own decisions from shell history and vibes.
We eventually learned. Not perfectly, but enough. Version control, least privilege, staging environments, maintenance windows, backups, rollbacks, audit logs, and pull requests are not fancy rituals. They are scar tissue turned into procedure.
AI agents are now having their own cowboy phase.
The fashionable goal is autonomy: give the model a filesystem, a terminal, a browser, an API token, maybe an MCP server or three, then ask it to go solve the problem. When something goes sideways, the first instinct is to add more prompt text. Be careful. Never delete files. Ask first. Do not break production. Think step by step. You can feel the prompt slowly turning into a nervous employee handbook.
That is backwards. If the agent can touch real systems, the safety boundary cannot live mostly in prose.
AI agents do not need bigger prompts as much as they need change control.
Excessive Agency Is Not a Vibe
OWASP’s GenAI Security Project has a useful name for the problem: LLM06:2025 Excessive Agency.
The risk is not simply that a model might hallucinate. OWASP frames excessive agency as a system design failure where damaging actions become possible because an LLM-based application has too much functionality, too many permissions, or too much autonomy. The trigger might be prompt injection, bad model output, a compromised extension, a malicious peer agent, or an ordinary ambiguous request. The root issue is the same: the agent has been allowed to do more than the task actually requires.
That distinction matters.
If an agent with a mail tool summarizes your inbox, a bad summary is annoying. If the same agent can read every message and send outbound email without review, a prompt-injected newsletter can become an exfiltration path. If an agent needs to inspect a repository, read-only access may be enough. If it also gets write, delete, secret access, package publishing, and production deploy rights, you have not made it more capable. You have widened the blast radius.
The boring security answer is still the right one: minimize the tools, minimize the functions exposed by those tools, minimize downstream permissions, avoid open-ended shell-like interfaces where possible, run actions in the requesting user’s own security context rather than under a shared high-privilege identity, and require approval for high-impact actions.
That sounds like ops because it is ops.
MCP Makes The Tool Boundary Visible
The Model Context Protocol is often described as a way to connect AI applications to tools and data. That is true, but the more important operational shift is visibility.
The current MCP specification describes a standard way for hosts, clients, and servers to share context, expose tools, and build composable integrations. It also says the uncomfortable part plainly: MCP enables powerful capabilities through arbitrary data access and code execution paths. The protocol is not just a convenience layer. It is a boundary where trust decisions accumulate.
The MCP Tools specification is especially relevant here. Tools can be discovered and invoked by language models, and servers can notify clients when the tool list changes. The spec also says applications should make exposed tools clear, show visual indicators when tools are invoked, and present confirmation prompts so a human can deny operations.
That turns the tool list into a change-control surface.
Before an agent starts work, you should be able to answer:
- Which servers are connected?
- Which tools are visible to the model?
- Which tools are read-only?
- Which tools can write, delete, send, deploy, purchase, publish, or change permissions?
- Which tools require approval?
- What happens if the server’s tool list changes mid-session?
If you cannot answer those questions, you do not have an agent platform. You have a guessing machine with credentials.
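One way to handle the last question is to fingerprint the tool list at the moment it is reviewed, then compare again before each call. The sketch below assumes tools arrive as dictionaries with name, description, and inputSchema keys, which mirrors the shape of MCP tool definitions. The helper names fingerprint_tools and tools_changed are hypothetical and not part of any SDK.

```python
import hashlib
import json


def fingerprint_tools(tools: list[dict]) -> str:
    """Return a stable hash over tool names, descriptions, and input schemas."""
    canonical = sorted(
        (
            {
                "name": tool["name"],
                "description": tool.get("description", ""),
                "inputSchema": tool.get("inputSchema", {}),
            }
            for tool in tools
        ),
        key=lambda tool: tool["name"],
    )
    payload = json.dumps(canonical, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def tools_changed(approved_fingerprint: str, current_tools: list[dict]) -> bool:
    """True when the current tool list no longer matches what was reviewed."""
    return fingerprint_tools(current_tools) != approved_fingerprint
```

If tools_changed returns True mid-session, a reasonable response is to pause the run and send the new list back through review instead of letting the model discover the new tool on its own.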
Prompts Are Not Policy
Prompts are useful. They set goals, style, sequence, and expectations. They can reduce mistakes. They can make an agent behave more like a careful teammate and less like a caffeinated autocomplete.
But prompts are not policy.
A prompt that says “do not delete files” is weaker than a filesystem permission that prevents deletion. A prompt that says “do not send email without asking” is weaker than an email tool that requires a confirmed approval token before send. A prompt that says “stay inside this repository” is weaker than a sandbox that mounts only that repository, and mounts it read-only.
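The email comparison can be sketched as a tool that refuses to send unless it receives a token bound to the exact message a human approved. This is a minimal illustration, not a real mail integration: issue_approval_token and send_email are hypothetical names, the secret is assumed to live with the approval UI and never with the model, and the return value stands in for an actual send.

```python
import hashlib
import hmac
import json
from typing import Optional


def _canonical(message: dict) -> bytes:
    """Serialize the message deterministically so the token covers every field."""
    return json.dumps(message, sort_keys=True).encode("utf-8")


def issue_approval_token(secret: bytes, message: dict) -> str:
    """Called by the human-facing approval step, never by the model."""
    return hmac.new(secret, _canonical(message), hashlib.sha256).hexdigest()


def send_email(secret: bytes, message: dict, approval_token: Optional[str]) -> str:
    """Send only if the token matches this exact recipient, subject, and body."""
    expected = issue_approval_token(secret, message)
    if approval_token is None or not hmac.compare_digest(
        expected.encode("utf-8"), approval_token.encode("utf-8")
    ):
        raise PermissionError("send blocked: no valid approval for this exact message")
    return f"queued message to {message['to']}"  # placeholder for the real send
```

Because the token is derived from the message contents, an approval for one draft cannot be reused after the agent changes the recipient or the body.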
OpenAI’s MCP and connector guidance makes the same practical point from a platform angle. Remote MCP servers and connectors can let models access third-party services and take actions in them. That creates prompt-injection risk, sensitive-data exposure risk, third-party server risk, and risk from tool behavior changing unexpectedly. The recommended controls are not mystical: filter allowed tools, connect to trusted servers, require approvals for sensitive actions, and log or review the data shared with MCP servers.
In other words: do not ask the model to be the access-control system.
The Agent Change-Control Stack
For a small team, a self-hoster, or a solo operator, agent change control does not need to become enterprise theater. It can be a tight checklist.
1. Scope The Workspace
An agent should not run as your full user account unless the task genuinely requires that identity. For code work, use a disposable checkout or a working tree with a clear diff. For system work, use a constrained service account. For web operations, prefer draft or staging access over production access.
The rule is simple: the agent gets the smallest workspace that can complete the job.
Good scopes look boring:
- read-only source tree for review;
- one writable branch or artifact directory for edits;
- no access to secrets unless the task explicitly requires them;
- no production API token for draft-only work;
- no broad filesystem access for a narrow file operation.
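At the tool layer, a narrow workspace can be backed by a path check that rejects anything resolving outside the declared root. The sketch below is one possible guard, with resolve_inside as a hypothetical helper name. It complements operating-system controls such as a restricted service account or a limited mount; it does not replace them.

```python
from pathlib import Path


def resolve_inside(workspace: Path, requested: str) -> Path:
    """Resolve a requested path and refuse it if it leaves the workspace."""
    root = workspace.resolve()
    # Joining an absolute path discards the root, so absolute requests
    # fall through to the containment check below and are rejected.
    candidate = (root / requested).resolve()
    if candidate != root and root not in candidate.parents:
        raise PermissionError(f"{requested!r} escapes the workspace")
    return candidate
```

A file tool would call resolve_inside on every path argument before opening anything, so that "../" sequences and symlinks pointing elsewhere are caught after resolution, not before.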
2. Inventory The Tools
Before the run starts, capture the tool manifest. That can be as simple as a JSON file or Markdown table:
- server name;
- tool name;
- read/write/destructive classification;
- permission scope;
- approval mode;
- expected data exposure;
- owner or maintainer;
- last reviewed date.
This is not paperwork for its own sake. It is how you notice that yesterday’s harmless summarizer now has a delete_document tool, or that a “read repository” connector is using a token with write access.
MCP’s tool discovery makes this more practical, but discovery alone is not governance. Someone still has to decide which discovered tools should be available for the task.
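A manifest like the one listed above can be kept as plain data and used to filter what discovery returns. In this sketch the field names follow the bullet list, the classification labels are an assumed vocabulary, and select_tools is a hypothetical helper: it exposes only reviewed tools in the classes a task allows and reports anything discovered that nobody has reviewed.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class ToolEntry:
    server: str
    tool: str
    classification: str  # assumed labels: "read", "write", "destructive"
    permission_scope: str
    approval_mode: str  # assumed labels: "none", "human"
    data_exposure: str
    owner: str
    last_reviewed: date


def select_tools(
    manifest: list[ToolEntry],
    discovered: list[tuple[str, str]],
    allowed_classes: set[str],
) -> tuple[list[ToolEntry], list[tuple[str, str]]]:
    """Split discovered (server, tool) pairs into allowed entries and unreviewed pairs."""
    reviewed = {(entry.server, entry.tool): entry for entry in manifest}
    allowed: list[ToolEntry] = []
    unreviewed: list[tuple[str, str]] = []
    for key in discovered:
        entry = reviewed.get(key)
        if entry is None:
            unreviewed.append(key)  # never exposed until someone reviews it
        elif entry.classification in allowed_classes:
            allowed.append(entry)
    return allowed, unreviewed
```

A non-empty unreviewed list is the signal that yesterday's summarizer has grown a new capability and the manifest needs another look.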
3. Gate Side Effects
High-impact actions need approval outside the model’s judgment.
That includes:
- deleting or overwriting files;
- sending email or messages;
- publishing posts;
- deploying code;
- changing DNS, users, roles, billing, or permissions;
- running commands outside a declared allowlist;
- accessing private data unrelated to the task.
The approval should show the action, target, arguments, and expected effect. “Approve tool call” is not enough if the user cannot see what the tool call will do.
This is where human-in-the-loop is not a philosophical stance. It is a circuit breaker.
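A gate of this kind can be small. The sketch below assumes a hypothetical ToolCall record and a HIGH_IMPACT set chosen for illustration; the approve callback represents whatever surface asks the human, such as a CLI prompt or a chat button. The point is that the decision is made by code outside the model, and the human sees the action, target, arguments, and expected effect before answering.

```python
from dataclasses import dataclass, field
from typing import Callable

# Illustrative set; a real deployment would derive this from its tool manifest.
HIGH_IMPACT = {"delete", "overwrite", "send", "publish", "deploy", "change_permissions"}


@dataclass
class ToolCall:
    action: str
    target: str
    arguments: dict = field(default_factory=dict)
    expected_effect: str = ""


def render_request(call: ToolCall) -> str:
    """Build the text a human sees before approving or denying."""
    lines = [f"Action: {call.action}", f"Target: {call.target}", "Arguments:"]
    lines += [f"  {key} = {value!r}" for key, value in sorted(call.arguments.items())]
    lines.append(f"Expected effect: {call.expected_effect or '(not stated)'}")
    return "\n".join(lines)


def run_gated(
    call: ToolCall,
    execute: Callable[[ToolCall], str],
    approve: Callable[[str], bool],
) -> str:
    """Execute low-impact calls directly; ask a human before high-impact ones."""
    if call.action in HIGH_IMPACT and not approve(render_request(call)):
        return "denied"
    return execute(call)
```

In an interactive session, approve could be as simple as a function that prints the request and returns True only when the operator types "yes".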
4. Keep A Useful Audit Trail
If an agent breaks something, “the AI did it” is not a postmortem.
You need enough trace to reconstruct:
- the user’s request;
- the instructions active for the run;
- the tools available;
- the tool calls made;
- the arguments passed;
- the files or remote resources changed;
- the approvals granted or denied;
- the final diff or published output.
You do not need to store every private token, every irrelevant page, or every sensitive document the agent saw. You do need a record that supports debugging, rollback, and accountability.
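One lightweight shape for such a record is a JSON line per event, with obviously sensitive fields redacted before anything is written. The sketch below is an assumption-heavy illustration: the SENSITIVE_KEYS list, the event names, and the audit_record helper are all hypothetical, and key-name matching is only a first filter, not a complete secret scanner.

```python
import json
from datetime import datetime, timezone

# Illustrative key names; extend to match the tools actually in use.
SENSITIVE_KEYS = {"token", "password", "secret", "api_key", "authorization"}


def redact(value):
    """Recursively replace values stored under sensitive-looking keys."""
    if isinstance(value, dict):
        return {
            key: "[REDACTED]" if key.lower() in SENSITIVE_KEYS else redact(item)
            for key, item in value.items()
        }
    if isinstance(value, list):
        return [redact(item) for item in value]
    return value


def audit_record(run_id: str, event: str, detail: dict) -> str:
    """Return one JSON line describing a request, tool call, approval, or diff."""
    return json.dumps(
        {
            "ts": datetime.now(timezone.utc).isoformat(),
            "run_id": run_id,
            "event": event,
            "detail": redact(detail),
        },
        sort_keys=True,
    )
```

Appending these lines to a file the agent cannot write to keeps the trail useful even when the agent itself is the thing under investigation.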
NIST’s Generative AI Profile for the AI Risk Management Framework is useful here because it pushes teams to manage generative AI risks across the lifecycle, aligned with their goals and priorities. For agents, that lifecycle includes tool onboarding, permissions review, live execution, monitoring, incident response, and retirement. A one-time prompt review is not a lifecycle.
5. Define Rollback Before The Run
The worst time to invent rollback is after the agent has touched production.
For code, rollback might be a Git branch, a patch file, and a test command. For WordPress, it might be a draft, a revision, a database backup, or a known previous page body. For infrastructure, it might be an exported config, a snapshot, or a documented command to restore the previous state.
If you cannot name the rollback path, the action probably needs a smaller scope.
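For a single file, naming the rollback path can be as simple as copying it aside first and restoring the copy when verification fails. The sketch below assumes a local file target; snapshot and apply_with_rollback are hypothetical helpers, and change and verify are caller-supplied functions standing in for the agent's edit and an independent check.

```python
import shutil
from pathlib import Path
from typing import Callable


def snapshot(target: Path, backup_dir: Path) -> Path:
    """Copy the target aside before any change and return the backup path."""
    backup_dir.mkdir(parents=True, exist_ok=True)
    backup = backup_dir / (target.name + ".bak")
    shutil.copy2(target, backup)
    return backup


def apply_with_rollback(
    target: Path,
    backup_dir: Path,
    change: Callable[[Path], object],
    verify: Callable[[Path], bool],
) -> bool:
    """Apply a change, verify it, and restore the snapshot if either step fails."""
    backup = snapshot(target, backup_dir)
    try:
        change(target)
        ok = bool(verify(target))
    except Exception:
        ok = False  # a crash during change or verify counts as failure
    if not ok:
        shutil.copy2(backup, target)
    return ok
```

The same pattern scales up by swapping the file copy for a Git branch, a database backup, or an exported config, as long as the restore step is written down before the change runs.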
A Practical Agent Runbook
Here is a compact version for everyday work:
- State the goal in one sentence.
- Name the workspace or target system.
- Load only the tools needed for that job.
- Classify tools as read-only, write, destructive, external-send, or privileged.
- Require approval for anything beyond read-only and low-risk writes.
- Save a pre-change snapshot, branch, export, or draft.
- Run the agent.
- Review the diff, artifact, or proposed action.
- Apply the change only after review.
- Verify the result from outside the agent’s own output.
- Record what changed and how to roll it back.
That might sound slower than letting an agent rip through the task on its own. In practice, it is often faster, because it prevents the expensive part: cleaning up a confident mess.
What Not To Automate Yet
Some actions are still bad candidates for unattended agents.
Do not let an agent freely manage identity, payment, DNS, production publishing, legal communications, customer emails, destructive filesystem operations, or security boundary changes unless the surrounding system has strong controls and the action is routine enough to test.
The more public, irreversible, privileged, expensive, or reputation-affecting the action is, the more the control should live outside the model.
That does not mean agents cannot help. They can draft, inspect, propose, prepare, simulate, diff, summarize, and queue. But there is a big difference between “prepare a DNS change for review” and “change DNS because the prompt sounded right.”
The Goal Is Not Slower AI
Change control is not anti-agent. It is what lets agents become useful without turning every workflow into a trust fall.
Good controls make autonomy more meaningful because they define where autonomy is allowed. Inside a read-only analysis sandbox, let the agent roam. Inside a branch with tests, let it edit. Inside a draft post, let it revise. At the boundary to production, publishing, payments, identity, or external communication, slow down and ask for proof.
The mature version of agentic AI will not be the system with the longest prompt. It will be the system where permissions are visible, tools are scoped, approvals are boring, logs are useful, and rollback is not a heroic act.
That is not glamorous.
It is how real systems survive contact with real work.