Threats · Deep dive

Indirect prompt injection in tool-using agents

When an agent reads attacker-controlled content, that content can become instructions. The anatomy, the blast radius, and the controls that contain it.

Read time
14 min
Threat coverage
LLM01 · Prompt injection
Frameworks
OWASP LLM · MITRE ATLAS · NIST AI RMF
Audience
Security architects · AppSec

Tool-using agents blur a line that classic application security kept sharp: the boundary between data and instructions. The moment retrieved content can steer what an agent does, every document it reads becomes a potential control channel.

Anatomy of the attack

An agent retrieves a web page, a support ticket, or an email and folds it into its context window. If that content carries an injected instruction, the model may follow it and invoke tools with the agent's own privileges. The model has no innate way to tell a trusted system instruction from a hostile sentence pasted into the same context. The defense is therefore input isolation and tool scoping, not better prompt wording.

  • Untrusted content is treated as data, never as a control channel.
  • Every tool call is checked against an allow-list and authorized per call.
  • Hidden text, encoded payloads, and active content are stripped on ingest.
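
The third point can be sketched as an ingest filter. The following is a minimal illustration using only the Python standard library, not a complete sanitizer: the function name `sanitize_on_ingest`, the dropped-tag set, and the 40-character threshold for "encoded-looking" runs are all assumptions chosen for the example. It does not detect text hidden with CSS, and it joins text fragments with spaces, which can split words that inline tags interrupt.

```python
import re
import unicodedata
from html.parser import HTMLParser

# Assumed set of tags whose content is dropped entirely (active content).
_DROP_TAGS = {"script", "style", "iframe", "template"}

# Heuristic (assumption): a long unbroken base64-alphabet run is treated as an encoded payload.
_ENCODED_RUN = re.compile(r"[A-Za-z0-9+/]{40,}={0,2}")


class _VisibleText(HTMLParser):
    """Collects text nodes, skipping anything inside a dropped tag."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.parts: list[str] = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in _DROP_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in _DROP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # HTML comments never reach handle_data, so they are dropped implicitly.
        if self._skip_depth == 0:
            self.parts.append(data)


def sanitize_on_ingest(raw_html: str) -> str:
    """Return visible text with active content, invisible characters, and encoded runs removed."""
    parser = _VisibleText()
    parser.feed(raw_html)
    parser.close()
    text = " ".join(parser.parts)
    # Remove Unicode "format" characters such as zero-width spaces and bidi overrides.
    text = "".join(ch for ch in text if unicodedata.category(ch) != "Cf")
    text = _ENCODED_RUN.sub("[encoded payload removed]", text)
    return re.sub(r"\s+", " ", text).strip()
```

An unclosed dropped tag causes everything after it to be discarded, so the filter fails closed. Stripping reduces the attack surface but does not turn the remaining text into trusted input; it still has to be handled as data.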

Blast radius

The reachable systems behind an agent's credentials define the damage a hijack can do. An agent that runs as a broad service identity turns a text manipulation into action across every system that identity can touch. Scope is the lever: the narrower the credential, the smaller the blast radius.

Fig 1 — agent → credential → reachable systems.
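
One way to make the relationship in Fig 1 concrete is to compute the reachable set from a credential's scopes. This is a sketch under stated assumptions: the scope names, system names, and the `SCOPE_TO_SYSTEMS` mapping are hypothetical, and a real inventory would come from the identity provider or cloud IAM rather than a hard-coded dictionary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Credential:
    name: str
    scopes: frozenset[str]


# Hypothetical mapping from a granted scope to the systems it can reach.
SCOPE_TO_SYSTEMS: dict[str, set[str]] = {
    "tickets:read:own-tenant": {"ticket store (session tenant)"},
    "tickets:read:all": {"ticket store (session tenant)", "ticket store (other tenants)"},
    "crm:write": {"CRM"},
    "email:send": {"outbound mail"},
}


def blast_radius(cred: Credential) -> set[str]:
    """Union of every system reachable through any scope on the credential."""
    reachable: set[str] = set()
    for scope in cred.scopes:
        reachable |= SCOPE_TO_SYSTEMS.get(scope, set())
    return reachable


narrow = Credential("support-agent", frozenset({"tickets:read:own-tenant"}))
broad = Credential("platform-service", frozenset({"tickets:read:all", "crm:write", "email:send"}))
```

Comparing `blast_radius(narrow)` with `blast_radius(broad)` shows the same hijacked instruction reaching one system in the first case and four in the second.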

Worked scenario

Support agent → ticket exfiltration

Objective
Read another tenant's support tickets.
Path
Poisoned ticket body → injected instruction → over-scoped search tool.
Impact
Cross-tenant data disclosure.
Detection
Tool call referencing a tenant outside the session's scope.
Mitigation
Per-session tenant binding enforced at the tool boundary.
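
The mitigation and detection rows of this scenario can be sketched together at the tool boundary. Everything named here is illustrative: `Session`, `search_tickets`, `TenantScopeViolation`, and the in-memory `TICKETS` list are assumptions standing in for a real session object and ticket store. The point is that the tenant filter comes from the session, and a model-supplied tenant is treated as a request to verify, never as authority.

```python
import logging
from dataclasses import dataclass
from typing import Optional

log = logging.getLogger("agent.tools")


@dataclass(frozen=True)
class Session:
    session_id: str
    tenant_id: str


class TenantScopeViolation(Exception):
    """Raised when a tool call names a tenant outside the session's scope."""


# Stand-in for the ticket store (hypothetical data).
TICKETS = [
    {"id": 1, "tenant_id": "acme", "body": "Printer offline after update"},
    {"id": 2, "tenant_id": "globex", "body": "Printer billing dispute"},
]


def search_tickets(session: Session, query: str, tenant_id: Optional[str] = None) -> list[dict]:
    """Search tickets, always filtered by the session's tenant."""
    if tenant_id is not None and tenant_id != session.tenant_id:
        # Detection: a cross-tenant reference is logged for alerting before the call is refused.
        log.warning(
            "cross-tenant tool call: session=%s bound=%s requested=%s",
            session.session_id, session.tenant_id, tenant_id,
        )
        raise TenantScopeViolation(f"tenant {tenant_id!r} is outside this session's scope")
    needle = query.lower()
    return [
        t for t in TICKETS
        if t["tenant_id"] == session.tenant_id and needle in t["body"].lower()
    ]
```

With this shape, a poisoned ticket that persuades the model to pass another tenant's ID produces an alert and an exception rather than a disclosure, and omitting the argument still returns only the session tenant's tickets.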

Controls that hold

  1. Quarantine untrusted content into a non-instruction context the model treats as data.
  2. Gate every tool call behind explicit, per-call authorization tied to the session.
  3. Bind sensitive tools to the requesting principal so reach can't exceed the session's scope.

Framework mapping

Control              OWASP LLM  NIST AI RMF
Input isolation      LLM01      Measure 2.7
Tool scoping         LLM06      Manage 1.3
Per-session binding  LLM06      Manage 2.2

Checklist

  • Untrusted content is never concatenated into the instruction context.
  • Every tool has an allow-list and per-call authorization.
  • Agent credentials are scoped to the task's minimum reach and short-lived.
  • Tool calls that cross the session's tenant/scope are detected and alerted.
