Indirect Prompt Injection & Hidden Content - How Pages Attack Your Agent

TL;DR

Modern browser/desktop AI agents often treat whatever is visible on a page (DOM text, screenshots, fragments) as input. Attackers embed instructions aimed at the agent - not the human - using invisible or obscured content (white-on-white text, zero-width Unicode, hidden DOM nodes, URL fragment payloads). Because agents read everything and may act automatically, these hidden instructions can steer an agent into leaking data, clicking malicious controls, running JS, or triggering autofill. Defenses are a mix of agent-side sanitization and policy, site-side hygiene and detection, and user-level compartmentalization.

What "indirect prompt injection" is

Indirect prompt injection = attacker-controlled content that is not a direct prompt to you but becomes input to the agent, which the agent interprets as instructions.

Key vectors:

Hidden text / white-on-white - instructions written in text that humans won't notice but the agent will parse.
Invisible DOM nodes / transparent overlays - elements that are in the DOM but visually invisible.
Non-printing / zero-width Unicode - invisible characters that encode instructions or trigger parsing quirks.
Hash fragments (example.com/page#payload) - content after # that client-side code or the agent may expose even though the server never receives it.
Rendered screenshots - agents reading screenshots may see what humans wouldn't scan closely.

Why it's indirect: the page doesn't tell the user what to do - it secretly tells the agent. The agent converts that instruction into actions.

A simple attack flow

Drop payloadAttacker plants hidden instructions on a page the agent will visit (forum post, calendar event, ad, or vendor page).
Agent visitsAgent reads visible DOM or screenshot, including hidden text/characters.
Agent actsAgent executes an action (click, type, run JS, fill).
AmplifyA click triggers autofill into attacker fields or downloads a script; the agent then reads the downloaded content or copied text and exfiltrates secrets.

This chain is why a single hidden instruction plus an autofill or click can become a total compromise.

Concrete classes of hidden instruction techniques

1. White / transparent / off-screen text

CSS sets color: #fff on white background, or opacity: 0, visibility: hidden, or positions text off-screen with left: -9999px. Humans won't notice; agents that parse all text nodes will ingest it.

2. Invisible DOM instructions (overlays)

Invisible elements with pointer-events enabled can intercept clicks or present hidden text the agent reads. Attackers combine overlays with visible UI so clicks or agent actions map to hidden controls.

3. Zero-width & non-printing Unicode

Characters such as U+200B (ZERO WIDTH SPACE), U+2060, or other control characters can encode directives or obfuscate text so human eyeballs skip it but agent tokenizers include it. Sequences of zero-width chars can act as a covert channel for instructions.

4. Hash fragments / URL tricks ("HashJack")

Malicious content placed in the fragment #... may be visible to the browser or agent via location.hash but not sent to the server. Sites that blindly reflect fragment contents into the DOM or agent-visible text create a stealth channel.

5. Render-to-image tricks

Text rendered in images with white text or hidden layers / metadata that agents OCR can read but humans won't examine closely.

Why normal defenses often fail

Human noticing no longer appliesAgents will happily parse text even if it's invisible to people.
Traditional XSS/CSP/anti-phishing tools focus on server inputs or known script patterns - they don't catch non-printing characters, CSS-hidden text, or fragment-based payloads.
Agent privileges expand the damage surfaceIf an agent can click, run JS, or access clipboard, simple instructions become powerful actions.

Detection heuristics - what to log & watch for

Candidate signals to flag

High density of hidden text nodesfraction of text nodes with computed display:none, visibility:hidden, opacity:0 or color matching background-color.
Zero-width/rare Unicode usageunusually high ratio of zero-width / control codepoints in text nodes.
Long fragmentsURL fragments > N characters containing punctuation or base64-looking blobs.
Invisible overlay elementselements matching overlay CSS patterns (absolute positioned, full-size, pointer-events enabled while invisible).
Sudden behavior after a small inputone click producing multiple downstream automated actions (download + fill + post) in the agent trace.
Agent-run JS or clicks on non-interactive elementsclicks that target elements with no visible affordance (no text, zero size).

Sanitization strategies (agent-side)

High-level rule

Agents should not treat the raw page DOM as a single trusted prompt. Instead, use a sanitized, canonicalized, and auditable agent view.

Sanitize before tokenizing

Drop hidden nodes - exclude nodes where any of the computed styles imply invisibility:

display: none or visibility: hidden
opacity < 0.05
color equals computed background-color (within tolerance)
Bounding box area near-zero (width × height < small threshold)

Also:

Strip zero-width / control unicodenormalize text with NFKC and remove U+200B, U+2060, and other control characters.
Normalize whitespace & punctuationcollapse repeated punctuation or long runs that look like payload markers.
Ignore fragmentsby default, do not include location.hash in the agent prompt; treat it as untrusted input.

Action gating & human-in-the-loop

Before an agent runs any non-trivial action (download, run JS, autofill, file write), show a concise natural-language action summary for user approval. Example: "Agent will click 'Approve' on PayX and submit this form. Approve?"

Developer & site-side hygiene

Don't hide agent-targeted instructions inside the pageIf you want to provide agent-only instructions, use a separate well-known endpoint (e.g., /.agent-instructions) guarded by content signing.
Don't reflect untrusted input without cleaningEscape and strip non-printing characters. Normalize Unicode and strip zero-width chars on ingest.
Sign and canonicalize agent dataOffer an agent-safe summary (JSON) with explicit fields: title, visibleText, forms, actionHints. Sign it with a site key or HMAC.
Frame/overlay protectionsset X-Frame-Options: DENY or CSP frame-ancestors rules to stop clickjacking via frames.
Testing & CIadd synthetic tests that inject hidden text, zero-width characters, and fragment payloads and assert your sanitization removes them.

Developer / vendor checklist

Sanitize agent input: drop hidden nodes, strip zero-width Unicode, normalize text.
Provide /.agent-view or signed agent summary endpoint.
Ignore location.hash by default for agent views.
Avoid rendering user-submitted instructions verbatim; sanitize on ingest.
Add frame/overlay protections: CSP frame-ancestors, detect pointer-events/opacity overlays.
CI tests: inject hidden text, fragments, zero-width chars and ensure sanitization removes them.
Agent-safe policy: deny JS execution & file access by default.
Telemetry: record sanitized vs raw differences; alert on payload patterns.
Provide a disclosure & contact path for security reports.

User-level guidance - what to do right now

Separate agent profilerun agents only from a dedicated browser/profile that never logs into sensitive sites.
Disable automatic actionsturn off auto-run JS, auto-click, and automatic autofill in agent contexts.
Treat clipboard & screenshots as sensitivedisable agent clipboard access by default.
Use confirmation gatingrequire the agent to summarize proposed actions and ask for an explicit approval.
Compartmentalize secretskeep secrets in a shared vault and avoid letting the agent access vault unless explicitly approved.

Protect your agent activity

Get Ivy's Risk Checkup, masked emails, and virtual cards to reduce blast radius when agents encounter the web.