Discriminative Probing and Tuning for Text-to-Image Generation

2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024

Abstract
Despite advancements in text-to-image generation (T2I), prior methods often face text-image misalignment problems such as relation confusion in generated images. Existing solutions involve cross-attention manipulation for better compositional understanding or integrating large language models for improved layout planning. However, the inherent alignment capabilities of T2I models are still inadequate. By reviewing the link between generative and discriminative modeling, we posit that T2I models' discriminative abilities may reflect their text-image alignment proficiency during generation. In this light, we advocate bolstering the discriminative abilities of T2I models to achieve more precise text-to-image alignment for generation. We present a discriminative adapter built on T2I models to probe their discriminative abilities on two representative tasks and leverage discriminative fine-tuning to improve their text-image alignment. As a bonus of the discriminative adapter, a self-correction mechanism can leverage discriminative gradients to better align generated images to text prompts during inference. Comprehensive evaluations across three benchmark datasets, including both in-distribution and out-of-distribution scenarios, demonstrate our method's superior generation performance. Meanwhile, it achieves state-of-the-art discriminative performance on the two discriminative tasks compared to other generative models.
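The self-correction mechanism described above steers generation at inference time by following gradients of a discriminative score. A minimal sketch of that idea, using a toy linear scorer in place of the paper's actual discriminative adapter (the matrix `W`, target `t`, and loss below are illustrative assumptions, not the authors' model):

```python
import numpy as np

def alignment_loss(z, W, t):
    """Toy text-image mismatch score for latent z: ||W z - t||^2.
    Stands in for the discriminative adapter's alignment objective."""
    r = W @ z - t
    return float(r @ r)

def self_correct(z, W, t, lr=0.05, steps=50):
    """Nudge the latent down the gradient of the mismatch score,
    in the spirit of gradient-guided self-correction at inference."""
    for _ in range(steps):
        grad = 2.0 * W.T @ (W @ z - t)  # analytic gradient of the quadratic loss
        z = z - lr * grad
    return z

rng = np.random.default_rng(0)
W = rng.standard_normal((4, 8)) * 0.3  # hypothetical scorer weights
t = rng.standard_normal(4)             # hypothetical text-side target
z0 = rng.standard_normal(8)            # initial latent
z1 = self_correct(z0, W, t)
print(alignment_loss(z0, W, t), "->", alignment_loss(z1, W, t))
```

In the paper's setting the gradient would come from backpropagating the adapter's discriminative loss through the diffusion latent rather than from this closed-form quadratic, but the update pattern is the same.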
Keywords
Comprehensive Evaluation, Ability Of The Model, Model Discrimination, Discrimination Task, Discrimination Performance, Image Classification, Feature Maps, Object Detection, Intermediate State, Bounding Box, Diffusion Model, Latent Space, Semantic Similarity, Semantic Representations, Basic Abilities, Matching Performance, Semantic Space, Foundation Model, Counting Error, Local Ground, Global Matching, Projection Layer, Transformer Decoder