LLMs Meet Multimodal Generation and Editing: A Survey
CoRR(2024)
摘要
With the recent advancement in large language models (LLMs), there is a
growing interest in combining LLMs with multimodal learning. Previous surveys
of multimodal large language models (MLLMs) mainly focus on understanding. This
survey elaborates on multimodal generation across different domains, including
image, video, 3D, and audio, where we highlight the notable advancements with
milestone works in these fields. Specifically, we exhaustively investigate the
key technical components behind methods and multimodal datasets utilized in
these studies. Moreover, we dig into tool-augmented multimodal agents that can
use existing generative models for human-computer interaction. Lastly, we also
comprehensively discuss the advancement in AI safety and investigate emerging
applications as well as future prospects. Our work provides a systematic and
insightful overview of multimodal generation, which is expected to advance the
development of Artificial Intelligence for Generative Content (AIGC) and world
models. A curated list of all related papers can be found at
https://github.com/YingqingHe/Awesome-LLMs-meet-Multimodal-Generation
更多查看译文
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要