
Indirect prompt injection in tool-using agents

When an agent reads attacker-controlled content, that content can become instructions. The anatomy, the blast radius, and the controls that contain it.

Read time: 14 min
Threat coverage: LLM01 · Prompt injection
Frameworks: OWASP LLM · MITRE ATLAS · NIST AI RMF
Audience: Security architects · AppSec

Tool-using agents blur a line that classic application security kept sharp: the boundary between data and instructions. The moment retrieved content can steer what an agent does, every document it reads becomes a potential control channel.

Anatomy of the attack

An agent retrieves a web page, a support ticket, or an email and folds it into its context window. If that content carries an embedded instruction (an indirect prompt injection), the model may follow it, invoking tools with the agent's own privileges. The model has no innate way to tell a trusted system instruction from a hostile sentence pasted into the same context. The defense is therefore a matter of input isolation and tool scoping, not better prompt wording.

Untrusted content is treated as data, never as a control channel.
Every tool call is checked against an allow-list and requires per-call authorization.
Hidden text, encoded payloads, and active content are stripped on ingest.
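
The ingest step described above can be sketched in a few lines. This is a minimal illustration, not a complete sanitizer: the function names and the "untrusted_data" role label are assumptions made for the example, and it handles only HTML comments, script/style blocks, tags, and invisible Unicode characters. It does not decode encoded payloads or detect text hidden with CSS, and stripping alone does not remove the need for tool scoping.

```python
import re
import unicodedata

# Zero-width and bidirectional control characters often used to hide text.
_INVISIBLE = re.compile(r"[\u200b-\u200f\u202a-\u202e\u2060-\u2064\ufeff]")
# Active content: whole <script> and <style> elements, including their bodies.
_ACTIVE = re.compile(r"<(script|style)\b.*?</\1\s*>", re.IGNORECASE | re.DOTALL)
_COMMENT = re.compile(r"<!--.*?-->", re.DOTALL)
_TAG = re.compile(r"<[^>]+>")


def sanitize_untrusted(raw: str) -> str:
    """Strip active content, comments, tags, and invisible characters."""
    text = unicodedata.normalize("NFKC", raw)
    text = _ACTIVE.sub("", text)
    text = _COMMENT.sub("", text)
    text = _TAG.sub("", text)
    text = _INVISIBLE.sub("", text)
    return " ".join(text.split())  # collapse runs of whitespace


def wrap_as_data(source: str, raw: str) -> dict:
    """Package retrieved content as labelled data, kept apart from instructions.

    The "untrusted_data" role is a hypothetical label for this sketch; the
    orchestrator is assumed never to merge it into the instruction context.
    """
    return {
        "role": "untrusted_data",
        "source": source,
        "content": sanitize_untrusted(raw),
    }
```

Keeping the sanitized text in its own labelled record, rather than concatenating it into the prompt, is what lets later stages treat it as data.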

Blast radius

The reachable systems behind an agent's credentials define the damage a hijack can do. An agent that runs as a broad service identity turns a text manipulation into action across every system that identity can touch. Scope is the lever: the narrower the credential, the smaller the blast radius.

Fig 1 — agent → credential → reachable systems.
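
One way to make the agent → credential → reachable systems chain concrete is to compute it from an inventory. The sketch below assumes a hypothetical mapping from credential scopes to systems; the scope and system names are invented for illustration and would come from your own identity and asset inventory.

```python
from __future__ import annotations

# Hypothetical inventory: the systems each scope grants access to.
SCOPE_REACH: dict[str, set[str]] = {
    "tickets:read:own-tenant": {"ticket-store"},
    "tickets:read:all": {"ticket-store", "tenant-directory"},
    "crm:write": {"crm"},
    "billing:read": {"billing-db"},
}


def blast_radius(scopes: set[str]) -> set[str]:
    """Return every system reachable with the given credential scopes."""
    reach: set[str] = set()
    for scope in scopes:
        # An unknown scope raises KeyError: an incomplete inventory
        # should fail loudly rather than understate the reach.
        reach |= SCOPE_REACH[scope]
    return reach


broad = blast_radius({"tickets:read:all", "crm:write", "billing:read"})
narrow = blast_radius({"tickets:read:own-tenant"})
print(sorted(broad))   # four systems behind the broad service identity
print(sorted(narrow))  # one system behind the task-scoped credential
```

Comparing the two sets shows the lever directly: the same hijacked agent reaches four systems with the broad identity and one with the narrow credential.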

Worked scenario

Support agent → ticket exfiltration

Objective: Read another tenant's support tickets.
Path: Poisoned ticket body → injected instruction → over-scoped search tool.
Impact: Cross-tenant data disclosure.
Detection: Tool call referencing a tenant outside the session's scope.
Mitigation: Per-session tenant binding enforced at the tool boundary.
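
The mitigation and detection lines in this scenario can be sketched together. The example below is an assumption-laden illustration: the Session type, the in-memory TICKETS list, and the search_tickets tool are all hypothetical. The point it shows is that the tenant comes from the session, not from the model's arguments, and that a mismatch is both blocked and logged.

```python
import logging
from dataclasses import dataclass
from typing import Optional

log = logging.getLogger("agent.tools")


class ScopeViolation(Exception):
    """Raised when a tool call reaches outside the session's tenant."""


@dataclass(frozen=True)
class Session:
    session_id: str
    tenant_id: str


# Hypothetical in-memory ticket store standing in for a real backend.
TICKETS = [
    {"id": 1, "tenant_id": "acme", "body": "Printer offline"},
    {"id": 2, "tenant_id": "globex", "body": "Invoice dispute"},
]


def search_tickets(session: Session, query: str,
                   tenant_id: Optional[str] = None) -> list:
    """Search tickets, bound to the session's tenant at the tool boundary."""
    requested = tenant_id if tenant_id is not None else session.tenant_id
    if requested != session.tenant_id:
        # Detection signal: the model asked for a tenant outside its session.
        log.warning("cross-tenant tool call: session=%s requested=%s",
                    session.session_id, requested)
        raise ScopeViolation(f"tenant {requested!r} is outside session scope")
    needle = query.lower()
    # The filter uses the session's tenant, never the model-supplied value.
    return [t for t in TICKETS
            if t["tenant_id"] == session.tenant_id
            and needle in t["body"].lower()]
```

With a session bound to tenant "acme", a call carrying tenant_id="globex" raises ScopeViolation and emits a warning, while a call without the argument returns only acme's tickets.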

Controls that hold

Quarantine untrusted content into a non-instruction context the model treats as data.
Gate every tool call behind explicit, per-call authorization tied to the session.
Bind sensitive tools to the requesting principal so reach can't exceed the session's scope.
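
The gating control can be sketched as a thin dispatcher that sits between the model and the tools. Everything named here (ToolGate, ToolDenied, the policy callback, the lookup_order tool) is hypothetical; a real deployment would back the authorize callback with its own policy engine.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Callable


class ToolDenied(Exception):
    """Raised when a tool call is not allow-listed or not authorized."""


@dataclass
class ToolGate:
    allowed: dict[str, Callable[..., Any]]           # the allow-list
    authorize: Callable[[str, str, dict], bool]      # (session_id, tool, args)

    def call(self, session_id: str, tool: str, args: dict) -> Any:
        if tool not in self.allowed:
            raise ToolDenied(f"tool not on allow-list: {tool}")
        # Authorization is evaluated for this specific call and its
        # arguments, not once per session.
        if not self.authorize(session_id, tool, args):
            raise ToolDenied(f"call not authorized: {tool}")
        return self.allowed[tool](**args)


def lookup_order(order_id: str) -> str:
    return f"order {order_id}: shipped"


def policy(session_id: str, tool: str, args: dict) -> bool:
    # Hypothetical rule: session "s-1" may look up only order "A-100".
    return (session_id == "s-1" and tool == "lookup_order"
            and args.get("order_id") == "A-100")


gate = ToolGate(allowed={"lookup_order": lookup_order}, authorize=policy)
```

Because the model never holds a direct reference to a tool, an injected instruction can at most request a call; the gate decides whether it runs.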

Framework mapping

Control	OWASP LLM	NIST AI RMF
Input isolation	LLM01	Measure 2.7
Tool scoping	LLM06	Manage 1.3
Per-session binding	LLM06	Manage 2.2

Checklist

Untrusted content is never concatenated into the instruction context.
Every tool is on an allow-list, and every call to it is authorized individually.
Agent credentials are scoped to the task's minimum reach and short-lived.
Tool calls that cross the session's tenant/scope are detected and alerted.
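
The credential item in this checklist can be illustrated with a small sketch of a task-scoped, short-lived credential. The TaskCredential type, the mint helper, and the five-minute default lifetime are assumptions chosen for the example, not a recommendation for any particular token format.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass(frozen=True)
class TaskCredential:
    subject: str
    scopes: frozenset
    expires_at: datetime

    def permits(self, scope: str, now: Optional[datetime] = None) -> bool:
        """True only if the scope was granted and the credential is unexpired."""
        now = now or datetime.now(timezone.utc)
        return now < self.expires_at and scope in self.scopes


def mint(subject: str, scopes: set,
         ttl: timedelta = timedelta(minutes=5),  # assumed default lifetime
         now: Optional[datetime] = None) -> TaskCredential:
    """Issue a credential limited to the task's scopes and a short lifetime."""
    issued = now or datetime.now(timezone.utc)
    return TaskCredential(subject, frozenset(scopes), issued + ttl)
```

A credential minted for "tickets:read:own-tenant" refuses any other scope immediately and refuses everything once the lifetime has passed, which bounds how long a hijacked session stays useful.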
