FlexEControl: Flexible and Efficient Multimodal Control for Text-to-Image Generation
arXiv (2024)
Abstract
Controllable text-to-image (T2I) diffusion models generate images conditioned
on both text prompts and semantic inputs of other modalities like edge maps.
Nevertheless, current controllable T2I methods commonly face challenges related
to efficiency and faithfulness, especially when conditioning on multiple inputs
from either the same or diverse modalities. In this paper, we propose a novel
Flexible and Efficient method, FlexEControl, for controllable T2I generation.
At the core of FlexEControl is a unique weight decomposition strategy, which
allows for streamlined integration of various input types. This approach not
only enhances the faithfulness of the generated image to the control, but also
significantly reduces the computational overhead typically associated with
multimodal conditioning. Our approach achieves a 41% reduction in trainable
parameters and a 30% reduction in memory usage, doubles data efficiency, and can
flexibly generate images under the guidance of multiple input conditions of
various modalities.
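The abstract attributes the efficiency gains to a weight decomposition strategy shared across condition modalities, but does not spell out the factorization. As a rough illustration only, the sketch below uses a Kronecker-product factorization in NumPy: a large shared factor is reused by every condition branch, while each modality trains only a small per-condition factor. All names, shapes, and the choice of factorization here are assumptions for illustration, not FlexEControl's actual design.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical layer sizes: the effective weight for each condition
# branch is d_out x d_in, factored as kron(A_shared, B_cond).
d_out, d_in = 64, 64
a_out, a_in = 16, 16                       # shared factor, reused by all modalities
b_out, b_in = d_out // a_out, d_in // a_in # small per-condition factor (4 x 4)

A_shared = rng.standard_normal((a_out, a_in))
conditions = ["edge", "depth", "segmentation"]
B = {c: rng.standard_normal((b_out, b_in)) for c in conditions}

def effective_weight(cond):
    """Reconstruct the full-size weight for one condition branch."""
    return np.kron(A_shared, B[cond])

# Parameter count if every branch kept its own full weight,
# versus the shared-factor decomposition.
full_params = len(conditions) * d_out * d_in
decomposed_params = A_shared.size + sum(b.size for b in B.values())

print(effective_weight("edge").shape)      # (64, 64)
print(full_params, decomposed_params)      # 12288 vs. 304
```

Under these toy shapes the decomposition stores 304 parameters instead of 12288 while still producing a full-rank-sized weight per modality, which is the general mechanism by which such factorizations cut trainable parameters and memory when conditioning on many inputs.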