Securing LLM Agents - The Security Trifecta in Dokugent Plans

Why AI Workflow Security Matters

When an agent can access private data, process untrusted content, and call external APIs, you have a recipe for data exfiltration. This is what Simon Willison recently called the “lethal trifecta” of agent security risks, and Andrej Karpathy just boosted the signal.

“I should clarify that the risk is highest if you're running local LLM agents (e.g. Cursor, Claude Code, etc.). If you're just talking to an LLM on a website (e.g. ChatGPT), the risk is much lower unless you start turning on Connectors.”

“For example I just saw ChatGPT is adding MCP support. This will combine especially poorly with all the recently added memory features — e.g. imagine ChatGPT telling everything it knows about you to some attacker on the internet just because you checked the wrong box in the Connectors settings.”

— Andrej Karpathy

At Dokugent, we treat this as a first-class design problem, not an afterthought.


Security Metadata at the Plan Level

As of June 2025, every step in a Dokugent plan now supports a security object:

"security": {
  "trifecta": ["privateData", "untrustedContent"],
  "riskLevel": "High"
}

This metadata is collected interactively when designing an agent via dokugent plan, and is embedded into the resulting plan.json. This same plan.json also forms the core of the final signed .cert.json file used in Dokugent's certification system — enabling cryptographic traceability without requiring a blockchain layer.
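
To make that concrete, here is a rough sketch of how a plan step carrying this metadata could look. Only the security shape matches the example above; the surrounding step fields, type names, and the non-High risk levels are illustrative assumptions, not Dokugent's documented schema:

// Illustrative sketch only. The "security" shape mirrors the example above;
// the surrounding step fields and the non-High risk levels are assumptions.
type TrifectaFlag = "privateData" | "untrustedContent" | "externalComms";

interface StepSecurity {
  trifecta: TrifectaFlag[];             // which legs of the trifecta this step touches
  riskLevel: "Low" | "Medium" | "High"; // only "High" appears in the examples here
}

interface PlanStep {
  id: string;                           // hypothetical step identifier
  description: string;                  // hypothetical human-readable summary
  security: StepSecurity;               // collected interactively by dokugent plan
}

const webLookup: PlanStep = {
  id: "web_lookup",
  description: "Fetch and summarize an external page",
  security: {
    trifecta: ["untrustedContent", "externalComms"],
    riskLevel: "High",
  },
};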


How It Works in the CLI

During the plan wizard:

Involves any of the following? (space to select)
  [x] Access to Private Data
  [x] External Communication

During dokugent simulate, this gets evaluated automatically:

⚠️ Step "web_lookup" is marked HIGH RISK due to:

  • untrustedContent
  • externalComms

→ Skipping this step unless --force is passed.
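
Conceptually, the gate behind that output amounts to something like the sketch below. This is not Dokugent's actual code; the function name and the exact skip rule are assumptions based only on the behavior shown above:

// Conceptual sketch of the simulate-time gate; not Dokugent's implementation.
// A step flagged high risk is skipped unless --force was passed.
function shouldRunStep(
  step: { id: string; security: { trifecta: string[]; riskLevel: string } },
  force: boolean
): boolean {
  if (step.security.riskLevel === "High" && !force) {
    console.warn(`⚠️ Step "${step.id}" is marked HIGH RISK due to:`);
    for (const flag of step.security.trifecta) {
      console.warn(`  • ${flag}`);
    }
    console.warn("→ Skipping this step unless --force is passed.");
    return false;
  }
  return true;
}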


Why This Matters

Just like secure coding, secure agent design needs to be traceable, auditable, and testable. Dokugent enables that by:

  • Making risks explicit
  • Logging security metadata for every step
  • Allowing simulation to enforce policy (or warn)
  • Enabling future plugins to gate deployment

What’s Next

This is just the beginning. We’re currently experimenting with:

  • --secure mode in simulate to skip or sandbox risky steps
  • Security scoring in dokugent certify (rough sketch after this list)
  • A future security --doctor command to lint entire agent workflows
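
What security scoring should look like is still open. As a naive illustration only, and not a committed design for dokugent certify, a plan-level score could be aggregated from the per-step flags:

// Naive illustration of a plan-level security score; not how certify will work.
// Each trifecta flag adds weight, and high-risk steps weigh more (weights assumed).
function naiveSecurityScore(
  steps: { security: { trifecta: string[]; riskLevel: string } }[]
): number {
  return steps.reduce((total, step) => {
    const flagWeight = step.security.trifecta.length;              // 0 to 3 flags
    const riskWeight = step.security.riskLevel === "High" ? 3 : 1; // assumed multiplier
    return total + flagWeight * riskWeight;
  }, 0);
}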

Try It Yourself

npx dokugent init
dokugent plan
dokugent simulate

Agent workflows should be safe by default and traceable when they aren’t.