Learning semantic alignment from image for text-guided image inpainting

The Visual Computer (2022)

Abstract
In this paper, we propose a method called LSAI (learning semantic alignment from image) to recover corrupted image patches for text-guided image inpainting. First, a multimodal preliminary (MP) module is designed to effectively encode global features for images and textual descriptions, where each local image patch and word is taken into account via multi-head self-attention. Second, non-Euclidean semantic relations between images and textual descriptions are captured with a graph structure by building a semantic relation graph (SRG). The constructed SRG is able to identify meaningful words describing the image content and alleviate the impact of distracting words, which is achieved by aggregating the semantic relations with graph convolution. In addition, a text-image matching loss is devised to penalize restored images whose textual and visual semantics diverge. Quantitative and qualitative experiments conducted on two public datasets show that our proposed LSAI outperforms existing methods (e.g., the FID value is reduced from 30.87 to 16.73 on the CUB-200-2011 dataset).
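The graph-convolution aggregation over the semantic relation graph can be illustrated with a minimal sketch. All names, dimensions, and the single-layer row-normalized form below are assumptions for illustration, not LSAI's actual implementation: nodes stand for image patches and words, edges for their semantic relations, and each node's feature is updated by averaging over its neighbors.

```python
import numpy as np

def graph_conv(H, A, W):
    """One illustrative graph-convolution layer: H' = ReLU(D^-1 (A + I) H W).

    H : (n, d_in)  node features (image patches and word embeddings)
    A : (n, n)     adjacency of the semantic relation graph
    W : (d_in, d_out) learnable weights
    """
    A_hat = A + np.eye(A.shape[0])                   # add self-loops
    D_inv = 1.0 / A_hat.sum(axis=1, keepdims=True)   # row-normalize by degree
    return np.maximum(D_inv * A_hat @ H @ W, 0.0)    # aggregate + ReLU

rng = np.random.default_rng(0)
n_nodes, d_in, d_out = 6, 8, 4                       # e.g., 4 patch nodes + 2 word nodes
H = rng.normal(size=(n_nodes, d_in))                 # initial node features
A = (rng.random((n_nodes, n_nodes)) > 0.5).astype(float)  # hypothetical relations
W = rng.normal(size=(d_in, d_out))
H_out = graph_conv(H, A, W)
print(H_out.shape)  # (6, 4)
```

With such aggregation, a distracting word node that is weakly connected to patch nodes contributes little to the updated patch features, which matches the abstract's stated motivation.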
Keywords
Text-guided image inpainting, Graph convolution, Generative adversarial networks