If an Agent Is Hijacked: Detection, Containment & Forensics

TL;DR

If an AI agent appears to have been steered by malicious content (hidden text, overlays, fragments) or performed unexpected sensitive actions (autofill into strange fields, unapproved clicks, data exfil), stop the agent immediately, preserve evidence (raw DOM, sanitized diffs, agent action logs), rotate exposed credentials (prioritize by Risk Checkup), and run a forensics path that captures provenance while minimizing further contamination.

Quick safeguards - do these first:

</p><ul><li>Pause the agent (kill switch / bookmarklet).</li><li>Lock your vault and revoke agent autofill permissions.</li><li>Revoke agent screenshot/clipboard permissions.</li><li>Rotate any credentials the agent may have accessed (use Risk Checkup to prioritize).</li></ul><p>

Short threat recap

Agent-mediated incidents typically follow one of these patterns:

Prompt injection → actionhidden/obscured content tricks the agent into issuing an action that leaks data or performs an operation.
Overlay / clickjacking → autofillinvisible UI tricks plus autofill produce credentials in attacker-controlled fields.
Chained automationa sequence of small, legitimate-sounding actions culminates in exfiltration or destructive behavior.

Because the agent acts faster and may skip the "human noticing" defense, response must focus on immediate containment and careful evidence preservation.

Early detection signals

Fast signals (user or agent telemetry)

Unexpected agent action logsfill/click/run-JS/file-write on domains outside expected scope.
Action without user approvalagent executes action but no recorded confirmation.
High hidden-content densitysanitizer flagged many hidden nodes / zero-width sequences.
Rapid click chainmany deterministic clicks in short time from an agent session.
Outgoing calls to unknown endpoints after an agent action (post requests to unusual domains).
Autofill into non-standard selectors (inputs not matching expected id/name patterns).

Longer-term signals

Increased number of Do-Not-Automate overrides.
Multiple users reporting similar artifacts.
Spikes in masked-email/virtual-card cancellations.

First 10 minutes - stop the bleeding

Immediate containment checklist

Pause / kill the agentClick the agent toolbar toggle or run the Kill Switch bookmarklet (localStorage.agent_paused='1') or trigger the global hotkey. If desktop agent, pause process / revoke permissions.
Isolate the host/browser profileDisconnect from network (airplane mode) only if you need to prevent ongoing exfil; otherwise preserve network logs for forensics.
Lock password vault & revoke autofillLock any unlocked vaults, disable autofill, and change master session to "locked."
Revoke ephemeral credentials used by agents (API keys, ephemeral tokens)Cancel any virtual cards or masked aliases created during the incident.
Record the time & preserve volatile evidencetimestamp everything, take screenshots, and avoid further actions that would alter evidence.

Evidence capture: what to collect and how

Collect a complete chain of evidence. Use a preserve-first mindset - capture artifacts before you try to fix things.

Minimum artifact set (ordered by priority)

Agent action log (sanitized prompt → decision → action summary → user approval): export as JSON with timestamps and agent version.
Raw DOM snapshot of the page(s) the agent visited (full HTML). Preserve with sha256 hash.
Sanitized diff: the agent's sanitized view and a diff showing removed nodes/characters (hidden-node counts, zero-width chars).
Browser console logs and network logs (HAR file) for the session.
Screenshots (full-page and viewport) preserved as files.
Agent process logs (stdout, stderr) and any local extension logs.
Password manager audit logs: fill events, timestamps, origin, and whether user approved.
OS-level artifacts (running processes, open files, clipboard history if relevant).
Timeline entries: every manual step performed (who, when, why) - record as structured notes.

Provenance & integrity

Compute sha256 hashes for each artifact and store in an evidence manifest. Preserve chain-of-custody: who collected, when, where stored. Store artifacts in a write-once secure evidence store with access controls.

0–24 hours: triage & remediation

Triage steps

Scope mappingwhich users/devices/profiles interacted with the agent? Were vaults or virtual cards used? Use agent logs to map.
Impact classificationwas data exfiltrated (confirmed by network logs or external endpoint receipts)? Were credentials filled into attacker forms?
Risk Checkup runrun Risk Checkup to discover exposed or reused credentials and prioritize rotation.
Prioritize secrets rotationrotate highest-risk secrets first - primary email, admin accounts, financial accounts.

Remediation actions

Rotate exposed credentials (prioritize high-value accounts).
Cancel affected virtual cards / masked aliases and reissue if necessary.
Revoke agent tokens / suspend agent accounts and update to a patched version if vendor fixes are available.
Notify stakeholders (security ops, legal, compliance, impacted business owners) with a short internal incident report.

24–72 hours: deep forensics & notifications

Forensics

Recreate the agent's input timeline (sanitized prompt → removed content → action).
Correlate network destinations with known malicious domains; capture exfil endpoints and timestamps.
Check vault logs for any fill events and cross-reference with agent action trace.

Communications templates

INTERNAL INCIDENT ALERT (IM)

INCIDENT - Agent-Mediated Suspicious Activity Time: [ts] - Detected: agent performed [action] on [domain]. Status: Contained (agent paused). Immediate actions: paused agent, locked vaults, cancelled virtual cards. Responsible: [Recovery Lead]. Next update by [time].

PUBLIC / CUSTOMER STATEMENT

On [date] we detected unusual automated behavior originating from an AI agent used within our environment. We paused the agent, contained the incident, and are working with vendor/forensics to investigate. We have rotated impacted credentials and will notify affected users directly if their accounts are impacted. [Contact & status page]

Containment playbook

Individual / small team

Pause/kill agent.
Lock vault & disable autofill.
Cancel virtual cards and disable masked aliases.
Change primary email password and enable MFA.
Run Risk Checkup to prioritize other changes.

Enterprise (richer controls)

Global agent suspend via MDM / central admin.
Block agent egress to suspicious endpoints at the proxy.
Rotate secrets via shared vault automation (admin/email/billing first).
Run enterprise Risk Checkup and revoke sessions.
Execute communications plan.

Post-incident: remediation, lessons & policy updates

Post-mortemtimeline, root cause, who did what, what failed and what worked - publish internally.
Triage listcode fixes (sanitize, agent-safe endpoint), vendor patches, policy changes (do-not-automate lists, agent privilege limits).
User remediationnotify affected users, help with vault recovery, and provide a "what to do" guide.
Measure & tuneif detection false positives are high, tune thresholds; if misses occurred, lower thresholds or add new telemetry.

Post-incident KPIs

Containment time (goal: < X minutes for critical incidents).
Time to rotate top 10 exposed credentials (goal: under 24h for priority creds).
Number of artifacts captured & verified (for forensic completeness).
% incidents with full chain-of-custody preserved.
Reduction in similar incidents after policy changes.

For a general incident response framework that covers all incident types beyond agent-mediated ones - detection signals, 24-hour triage, week-long recovery, and escalation templates - see our Incident Detection & Response playbook.

Agent IR Quick Checklist

FIRST 10 MINUTES

Pause/kill the agent (Kill Switch).
Lock password vault & disable autofill.
Record timestamp & take full-page screenshots.
Preserve raw DOM & sanitized diff.
If ongoing exfil suspected - isolate host (coordinate for forensics).

0–24 HOURS

Export agent action log (sanitized + raw hashes).
Export browser HAR & console logs.
Cancel virtual cards / masked aliases used.
Run Risk Checkup and rotate highest-priority credentials.

24–72 HOURS

Image hosts if needed; capture memory, disk.
Notify legal & execs; prepare public statement if required.
Collect vendor & SIEM artifacts and begin root-cause analysis.

Need an incident playbook or 1:1 runbook review?

Ivy's Risk Checkup helps you prioritize and rotate credentials fast after an agent-mediated incident.