The Protocol Will Not Save You

On the NSA’s May 2026 MCP Security CSI, and the defensive and adversarial work I’ve been doing in this space.

In May 2026, the NSA published Model Context Protocol (MCP): Security Design Considerations for AI-Driven Automation — a Cybersecurity Information Sheet (CSI), the NSA’s format for unclassified operational guidance to network defenders. It is one of the first major government publications specifically targeting MCP. The framing is deliberate. NSA treats MCP the way they treated early web protocols. Flexible. Underspecified. Secure by implementation discipline, not by protocol guarantee.

For anyone building or defending agentic systems, this CSI is the most authoritative government statement on MCP threats currently available. The OWASP MCP Top 10 (in beta as of 2026) is the more comprehensive community-maintained threat catalog; the CSI’s distinct weight is that it is operational guidance from a national security agency, aimed at network defenders. Both deserve reading. This piece focuses on the CSI — what it says, and how it lines up with the defensive and adversarial work I’ve been doing in this space independently.

What the CSI is actually saying

Before getting to tooling, it’s worth being precise about what the CSI puts on the table. Three observations from it stand out as the most important for practitioners.

The protocol inversion. MCP reverses the familiar client-server pattern. Instead of clients requesting data from servers, MCP often expects servers to query and sometimes execute actions on behalf of the connected clients. The CSI names this directly and notes that the inversion creates attack paths that traditional threat models don’t trace.

Output poisoning is systemic. The CSI explicitly describes prompt injection via tool output as a systemic problem across multi-agent pipelines rather than an isolated one. In plain terms: one agent’s output becomes another’s input, and a poisoned output can propagate through chained MCP processes.

The protocol will not save you. The CSI quotes the MCP specification directly: MCP “cannot enforce these security principles at the protocol level.” That’s the load-bearing sentence. There is no version of “deploy MCP correctly” that means “deploy MCP and the protocol handles the security.” Every control the CSI recommends — sandboxing, parameter validation, message signing, output filtering, audit logging — has to be implemented above the protocol, by the operator, deliberately.
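To make "above the protocol" concrete, here is a minimal sketch of one of those controls, message signing, using a shared-secret HMAC over a canonical JSON encoding. The function names, the canonicalization scheme, and the idea of carrying the signature alongside the message are assumptions made for illustration; neither the CSI nor the MCP specification prescribes this particular design.

```python
import hashlib
import hmac
import json


def canonical_bytes(message: dict) -> bytes:
    # Sort keys and strip whitespace so both sides hash identical bytes.
    return json.dumps(message, sort_keys=True, separators=(",", ":")).encode("utf-8")


def sign_message(message: dict, key: bytes) -> str:
    # Hypothetical helper: HMAC-SHA256 over the canonical encoding.
    return hmac.new(key, canonical_bytes(message), hashlib.sha256).hexdigest()


def verify_message(message: dict, signature: str, key: bytes) -> bool:
    # compare_digest avoids leaking information through comparison timing.
    return hmac.compare_digest(sign_message(message, key), signature)
```

A receiver would call verify_message before acting on a tool call and drop anything that fails. Key distribution and rotation are the hard part and are deliberately left out of this sketch.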

I wrote about what implementing those controls deliberately actually looks like, in Building a Secure-By-Design AI Agent with MCP Tools. That piece focuses on agent-side defensive layers — input guard, tool authorizer, output filter — but also walks through hardening the MCP server itself: shell=False, canonical path resolution, tool descriptions treated as attack surface. The MCP server hardening section is the part that lines up most directly with the CSI’s recommendations on parameter validation, tool execution constraints, and tool description hygiene. The agent layers are the wider context — the system the CSI’s controls live inside.
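As a rough illustration of two of those server-side hardening steps, the sketch below combines canonical path resolution with a shell=False subprocess call. It is not the code from the companion piece: resolve_inside, run_line_count, and the choice of wc as the wrapped command are hypothetical names picked for the example, and it assumes Python 3.9 or later for Path.is_relative_to.

```python
import subprocess
from pathlib import Path


def resolve_inside(root: Path, user_path: str) -> Path:
    # Resolve symlinks and ".." first, then check containment on the result.
    root = root.resolve()
    candidate = (root / user_path).resolve()
    if not candidate.is_relative_to(root):
        raise ValueError(f"path escapes allowed root: {user_path!r}")
    return candidate


def run_line_count(root: Path, user_path: str) -> str:
    target = resolve_inside(root, user_path)
    # Argument list plus shell=False: the path is never parsed by a shell.
    # The "--" stops a filename starting with "-" being read as an option.
    result = subprocess.run(
        ["wc", "-l", "--", str(target)],
        shell=False,
        capture_output=True,
        text=True,
        timeout=5,
        check=True,
    )
    return result.stdout
```

Checking containment after resolution is the important ordering: a string-prefix check on the raw input would pass "../" sequences and absolute paths that the resolved form rejects.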

A red-team tool in this space

The defensive side is one half of the picture. The other half is testing whether defensive controls actually hold against the threats they’re supposed to defend against. That is a separate problem and it needs separate tooling.

ai-redteam-orchestrator is a small, single-file, three-layer automated red-team pipeline for auditing LLMs and MCP tool servers. I built it before the CSI was published, as a general-purpose practitioner tool. It runs entirely locally against Ollama. You point it at your own MCP server, and it throws four industry-standard attack frameworks at the deployment.

I want to be precise about what it is and isn’t. It is a practitioner tool. It exists to give a single engineer a fast adversarial feedback loop on a development or staging MCP deployment. It is not a platform, not a SOC product, not a replacement for adversarial review by a real red team. It is what you reach for on a Saturday afternoon when you want to know whether your tool definitions can be coerced into doing something you didn’t intend.

The pipeline has three layers, and each one does a different job.

The broad scan layer is the wide net. It runs Garak’s vulnerability sweep alongside a Promptfoo evaluation pass against the deployment. This layer surfaces the obvious issues quickly. Unbounded inputs. Permissive tool descriptions. Missing input validation. The kind of finding that should never reach production but routinely does.

The compliance scan layer narrows in on documented vulnerability classes. It runs the Promptfoo OWASP LLM Top 10 preset against the LLM and pairs it with mcp-scan, which statically audits the MCP server’s tool descriptions for prompt-injection and tool-poisoning patterns. One caveat worth stating plainly: Promptfoo’s redteam generator is cloud-gated, so the OWASP preset is skipped unless you’ve run promptfoo auth login (free, one-time) — without it, this layer falls back to mcp-scan alone. Where Layer 1 casts the widest net, Layer 2 is scoped to known taxonomies — slower, more deliberate, and easier to map back to specific OWASP categories.

The deep exploit layer runs PyRIT’s Crescendo orchestrator (gradual multi-turn escalation toward an objective) and PyRIT’s Tree of Attacks with Pruning, a branching adversarial search that prunes unpromising paths. These are the kinds of attacks that a static test would miss but a thinking adversary would find. They are also the slowest and most resource-hungry.

The layered methodology — Garak for the broad scan, Promptfoo for the compliance pass, PyRIT for the deep exploit, and the mapping of each framework to a class of attack — follows the approach laid out by Amine Raji; the orchestration, the single-file implementation, and the MCP-server integration are mine.

The whole thing is one file. You can read it end to end in under an hour. The repo is josephManzambi/ai-redteam-orchestrator.

Mapping the orchestrator to the CSI

The orchestrator was not designed against the CSI taxonomy — it predates this guidance — but its capabilities map onto a useful subset of it. The question that matters for a practitioner is: if I run this tool against my deployment, how much of the CSI threat catalog am I actually exercising? Here is the honest answer.

Each threat class below is marked by how directly the orchestrator exercises it — direct, partial, indirect, or not covered.

Tool parameter injection — direct coverage. The broad scan and deep exploit layers probe for unsanitized parameter handling and malformed message structures — the specific failure modes the CSI cites from the HiddenLayer research.

Output poisoning for downstream automation — partial coverage. Single-hop only. Garak’s latentinjection probes test susceptibility to poisoned input, and the PyRIT/Promptfoo cases test coercion into dangerous output — but the tool instantiates no downstream consumer, so the multi-agent chains and output propagation the CSI describes are not exercised. Deeper toxic-flow analysis requires chain instrumentation, not endpoint probing.
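For a sense of what chain instrumentation means in contrast to endpoint probing, here is a minimal sketch that tags each message with the tools it originated from and carries those tags through every agent hop. The Message type and the helper names are hypothetical and are not part of the orchestrator; a real pipeline would need to hook this into whatever framework actually passes messages between agents.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    text: str
    origins: frozenset = frozenset()  # names of tools this text derives from


def from_tool(tool_name: str, text: str) -> Message:
    # Raw tool output is tagged with the tool that produced it.
    return Message(text, frozenset({tool_name}))


def agent_step(inputs: list, output_text: str) -> Message:
    # An agent's output inherits the origins of everything it read.
    origins = frozenset().union(*(m.origins for m in inputs))
    return Message(output_text, origins)


def untrusted_origins(message: Message, untrusted: set) -> list:
    # Which untrusted tools could have influenced this message?
    return sorted(message.origins & untrusted)
```

With tags like these, a downstream agent (or a policy check in front of it) can refuse privileged actions whenever untrusted_origins is non-empty, which is the multi-hop question that single-endpoint probing cannot answer.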

Tool invocation path confusion / naming collisions — not covered. Detecting this requires inspecting the MCP client’s tool-name resolution behavior — whether a malicious server can register a tool name that shadows or impersonates a trusted one, and whether the client routes calls correctly. That is a client-side question, not a server-side probing question, so endpoint adversarial probing does not exercise it. A meaningful gap.
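A client-side check for this could start from something like the sketch below, which flags tool names that collide across servers after normalization. This is an assumption-laden illustration, not a feature of the orchestrator: the normalization rules (Unicode NFKC, case folding, treating "-" as "_") are one guess at what makes two names confusable, and the input is a plain mapping of server name to tool names that the caller has to collect.

```python
import unicodedata
from collections import defaultdict


def normalize_tool_name(name: str) -> str:
    # Fold look-alike forms together: compatibility characters, case, "-" vs "_".
    return unicodedata.normalize("NFKC", name).casefold().replace("-", "_")


def find_name_collisions(servers: dict) -> dict:
    # Map each normalized tool name to the set of servers that register it.
    owners = defaultdict(set)
    for server, tools in servers.items():
        for tool in tools:
            owners[normalize_tool_name(tool)].add(server)
    # Keep only names claimed by more than one server.
    return {name: sorted(s) for name, s in owners.items() if len(s) > 1}
```

For example, a trusted server exposing read_file and an unknown server exposing Read-File would be reported as a collision on read_file. It only surfaces candidates; whether the client actually routes the call to the wrong server still has to be tested against that client.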

Token and session security — not covered. The CSI’s specific concern is that OAuth 2.1 expiry and rotation are recommended but not enforced at protocol level. Verifying that for a given deployment is conventional API security work — token lifecycle audits, replay testing — not adversarial probing of the LLM surface. Out of scope by design.

Denial of service and fatigue-based techniques — not covered. The deep exploit layer’s PyRIT orchestrators are bounded conversational attacks — Crescendo runs 8 turns, TAP a ~24-call tree — not volume or fatigue testing. This is not a load-testing tool. The CSI’s “lethargy” concern needs purpose-built tooling at scale.

Misconfigurations and poor implementation — indirect coverage. Surfaces the symptoms — over-permissive tool definitions, missing input validation — but does not audit configuration directly. mcp-scan covers some of this statically. A separate audit problem overall.

The pattern in the coverage map is intentional. The orchestrator targets the threat classes where adversarial probing of the LLM-plus-tool surface is the right testing approach. It deliberately does not try to cover the threat classes where the right approach is conventional API security testing, infrastructure auditing, or chain-level instrumentation. A tool that tries to cover everything covers nothing well.

What’s been validated so far

I have run the pipeline end-to-end against an intentionally vulnerable MCP server I built as a test target — shell=True command injection, raw path traversal, the standard sins. The run completes, the four frameworks fire as expected, and the report surfaces findings against the known-vulnerable surface. The classifier produces a coherent severity table; the test suite is green.

What I do not have yet is a characterized finding set — what recurs, what the severity distribution looks like across deployments, which CSI threat classes actually trip and how often. That requires running the orchestrator against more than one target and writing up the patterns honestly. A dedicated post on that will follow.

For now, the claim is narrower and truer: the tool works, and it covers the threat classes described in the coverage map above. What a tool offers that a policy document cannot is a way to make those threats findable in your deployment, by you, in an afternoon.

Known limitations

A few things worth being upfront about before anyone runs this in anger.

The orchestrator currently uses the same model as target, adversarial attacker, and scorer in the PyRIT layer. A 3B open-weight model red-teaming itself is a weak attacker compared to a frontier model in the adversarial role. Layer 3 results should be read as a floor on the attack surface, not a ceiling.

The classifier that produces the severity table mixes structured parsing with heuristics. mcp-scan findings are parsed structurally from the JSON severity counts it emits. The LLM-output steps (Garak, Promptfoo, and PyRIT) rely on heuristic pattern matching against each tool’s console output, with word-boundary checks and negation guards to avoid false positives on phrases like “0 failures” or “no vulnerabilities detected.” That is more precise than substring matching, but it is still heuristic. Treat the severity for those steps as a starting point for review, not a verdict.
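To show the general idea of word-boundary checks with negation guards, here is a small sketch. It is not the orchestrator's actual classifier: the keyword list, the negation pattern, and the count_findings name are assumptions chosen to reproduce the two example phrases above.

```python
import re

# Whole-word finding terms; \b stops matches inside longer words.
FINDING = re.compile(
    r"\b(?:fail(?:ed|ure|ures)?|vulnerab(?:le|ility|ilities)|jailbreak|exploit(?:ed)?)\b",
    re.IGNORECASE,
)

# A negation token directly before the term, allowing one word in between
# ("no known vulnerabilities"). \b keeps the "0" in "10" from matching.
NEGATION = re.compile(r"\b(?:no|not|zero|0|without)\s+(?:\w+\s+)?$", re.IGNORECASE)


def count_findings(text: str) -> int:
    hits = 0
    for line in text.splitlines():
        for match in FINDING.finditer(line):
            # Only the text immediately before the term can negate it.
            if NEGATION.search(line[: match.start()]):
                continue
            hits += 1
    return hits
```

With these patterns, "0 failures" and "no vulnerabilities detected" count as zero hits while "10 failures in probe" counts as one. The guard is deliberately narrow, since a wide negation window would start suppressing real findings, which is the worse error for a security tool.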

The tool is v0.1. It is stable enough to use, the test suite is green, and it does what it claims to do within the scope of the coverage map above. It is not finished. There is work to do on parsing structured tool output rather than console text, on configurable adversarial models, and on tighter integration with MCP-specific scanners. Those are follow-ups, not blockers.


What this is

The CSI is the most authoritative government publication on MCP threats currently available, and it is correct on the threats it names. Reading it is necessary. Acting on it is necessary. And acting on it means more than implementing the recommended defensive controls — it means having a way to test whether those controls actually hold up against the threats they’re supposed to defend against.

Two pieces of work in this space, on either side of that line, both already in progress when the CSI landed: a defensive walkthrough that implements OWASP, NIST, and CSA prescriptions deliberately, and a small red-team pipeline that exercises the LLM-plus-tool surface against industry-standard attack frameworks. Neither was built against the CSI taxonomy. Both end up exercising a useful subset of what it names. The convergence is the interesting part — not the framing.

The repo is at github.com/josephManzambi/ai-redteam-orchestrator. The defensive companion piece is at manzambi.com/writing/secure-by-design-agentic-v1. The NSA CSI is at nsa.gov.

The rest is execution.


Joseph Manzambi is a Cloud and AI Security Architect based in Málaga. He writes periodically on AI security. manzambi.com/writing.