ToolSword: Unveiling Safety Issues of Large Language Models in Tool Learning Across Three Stages
CoRR (2024)
Abstract
Tool learning is widely acknowledged as a foundational approach for deploying
large language models (LLMs) in real-world scenarios. While current research
primarily emphasizes leveraging tools to augment LLMs, it frequently neglects
emerging safety considerations tied to their application. To fill this gap, we
present ToolSword, a comprehensive framework dedicated to meticulously
investigating safety issues linked to LLMs in tool learning. Specifically,
ToolSword delineates six safety scenarios for LLMs in tool learning,
encompassing malicious queries and jailbreak attacks in the input
stage, noisy misdirection and risky cues in the execution stage, and
harmful feedback and error conflicts in the output stage. Experiments
conducted on 11 open-source and closed-source LLMs reveal enduring safety
challenges in tool learning, such as handling harmful queries, employing risky
tools, and delivering detrimental feedback, which even GPT-4 is susceptible to.
Moreover, we conduct further studies with the aim of fostering research on tool
learning safety. The data is released at
https://github.com/Junjie-Ye/ToolSword.