SpeechAgents: Human-Communication Simulation with Multi-Modal Multi-Agent Systems
CoRR(2024)
摘要
Human communication is a complex and diverse process that not only involves
multiple factors such as language, commonsense, and cultural backgrounds but
also requires the participation of multimodal information, such as speech.
Large Language Model (LLM)-based multi-agent systems have demonstrated
promising performance in simulating human society. Can we leverage LLM-based
multi-agent systems to simulate human communication? However, current LLM-based
multi-agent systems mainly rely on text as the primary medium. In this paper,
we propose SpeechAgents, a multi-modal LLM based multi-agent system designed
for simulating human communication. SpeechAgents utilizes multi-modal LLM as
the control center for individual agent and employes multi-modal signals as the
medium for exchanged messages among agents. Additionally, we propose
Multi-Agent Tuning to enhance the multi-agent capabilities of LLM without
compromising general abilities. To strengthen and evaluate the effectiveness of
human communication simulation, we build the Human-Communication Simulation
Benchmark. Experimental results demonstrate that SpeechAgents can simulate
human communication dialogues with consistent content, authentic rhythm, and
rich emotions and demonstrate excellent scalability even with up to 25 agents,
which can apply to tasks such as drama creation and audio novels generation.
Code and models will be open-sourced at https://github.
com/0nutation/SpeechAgents
更多查看译文
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要