InjecAgent: Benchmarking Indirect Prompt Injections in Tool-Integrated Large Language Model Agents
arxiv(2024)
摘要
Recent work has embodied LLMs as agents, allowing them to access tools,
perform actions, and interact with external content (e.g., emails or websites).
However, external content introduces the risk of indirect prompt injection
(IPI) attacks, where malicious instructions are embedded within the content
processed by LLMs, aiming to manipulate these agents into executing detrimental
actions against users. Given the potentially severe consequences of such
attacks, establishing benchmarks to assess and mitigate these risks is
imperative.
In this work, we introduce InjecAgent, a benchmark designed to assess the
vulnerability of tool-integrated LLM agents to IPI attacks. InjecAgent
comprises 1,054 test cases covering 17 different user tools and 62 attacker
tools. We categorize attack intentions into two primary types: direct harm to
users and exfiltration of private data. We evaluate 30 different LLM agents and
show that agents are vulnerable to IPI attacks, with ReAct-prompted GPT-4
vulnerable to attacks 24
setting, where the attacker instructions are reinforced with a hacking prompt,
shows additional increases in success rates, nearly doubling the attack success
rate on the ReAct-prompted GPT-4. Our findings raise questions about the
widespread deployment of LLM Agents. Our benchmark is available at
https://github.com/uiuc-kang-lab/InjecAgent.
更多查看译文
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要