ClickDiffusion: Harnessing LLMs for Interactive Precise Image Editing
arXiv (2024)
Abstract
Recently, researchers have proposed powerful systems for generating and
manipulating images using natural language instructions. However, it is
difficult to precisely specify many common classes of image transformations
with text alone. For example, a user may wish to change the location and breed
of a particular dog in an image with several similar dogs. This task is
difficult to express in natural language alone: it would require the user to
write a laboriously complex prompt that both disambiguates the target dog and
describes its destination. We propose ClickDiffusion, a system for precise image
manipulation and generation that combines natural language instructions with
visual feedback provided by the user through a direct manipulation interface.
We demonstrate that by serializing both an image and a multi-modal instruction
into a textual representation, it is possible to leverage LLMs to perform
precise transformations of the layout and appearance of an image. Code is
available at https://github.com/poloclub/ClickDiffusion.
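To make the serialization idea concrete, here is a minimal Python sketch under assumed conventions. It is not the format ClickDiffusion actually uses. The `Box` type, the `name: label [x, y, w, h]` line format, the `{obj}` and `{region}` placeholders, and all function names are hypothetical, and the LLM call itself is omitted. The sketch turns a layout of labeled bounding boxes, plus an instruction that refers to objects the user clicked and regions the user drew, into a single text prompt. It then parses a layout written in the same format back out of a model reply.

```python
from dataclasses import dataclass


@dataclass
class Box:
    """Hypothetical labeled bounding box: (x, y) is the top-left corner,
    w and h are width and height in pixels."""
    label: str
    x: int
    y: int
    w: int
    h: int


def serialize_layout(objects: dict[str, Box]) -> str:
    """Render each object as one text line, e.g. 'dog_1: dog [10, 20, 50, 40]'."""
    return "\n".join(
        f"{name}: {b.label} [{b.x}, {b.y}, {b.w}, {b.h}]"
        for name, b in objects.items()
    )


def serialize_instruction(text: str, selected: list[str], drawn: list[Box]) -> str:
    """Replace visual references in the instruction with textual ones.

    Hypothetical convention: each '{obj}' placeholder is filled, in order, with
    the name of an object the user clicked, and each '{region}' placeholder with
    the coordinates of a box the user drew.
    """
    for name in selected:
        text = text.replace("{obj}", name, 1)
    for b in drawn:
        text = text.replace("{region}", f"[{b.x}, {b.y}, {b.w}, {b.h}]", 1)
    return text


def build_prompt(objects: dict[str, Box], text: str,
                 selected: list[str], drawn: list[Box]) -> str:
    """Combine the serialized layout and instruction into one LLM prompt."""
    return (
        "Current layout:\n"
        + serialize_layout(objects)
        + "\n\nInstruction: "
        + serialize_instruction(text, selected, drawn)
        + "\n\nNew layout:"
    )


def parse_layout(response: str) -> dict[str, Box]:
    """Parse 'name: label [x, y, w, h]' lines (e.g. an LLM reply) into boxes."""
    objects = {}
    for line in response.strip().splitlines():
        name, rest = line.split(":", 1)
        label, coords = rest.strip().split("[", 1)
        x, y, w, h = (int(v) for v in coords.rstrip("]").split(","))
        objects[name.strip()] = Box(label.strip(), x, y, w, h)
    return objects


if __name__ == "__main__":
    # Two similar dogs; the user clicks dog_2 and draws a destination box.
    layout = {"dog_1": Box("dog", 10, 20, 50, 40),
              "dog_2": Box("dog", 100, 20, 50, 40)}
    print(build_prompt(layout, "Move {obj} to {region} and make it a poodle",
                       ["dog_2"], [Box("region", 300, 200, 80, 60)]))
```

In this sketch, the click resolves the ambiguity between the two dogs, so the instruction text only has to say what to do, not which dog. Asking the model to answer in the same line format it was given also lets one parser handle both directions, as the round trip `parse_layout(serialize_layout(layout)) == layout` illustrates.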